Local Server (Server Mode)

Activate “Server Mode”

Open PowerShell and enter:

flm serve llama3.2:1b

To change the default server port (52625), go to System Properties → Environment Variables and edit the value of FLM_SERVE_PORT.

⚠️ If you change this value, also update the port configured in any application that connects to FLM so that the two match.
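One way to keep a client application in step with the server port is to have it read the same environment variable and fall back to the documented default. The following is a minimal sketch, not part of FLM: the helper names `resolve_flm_port` and `base_url`, the `http` scheme, and the `127.0.0.1` host are assumptions made for illustration.

```python
import os

DEFAULT_FLM_PORT = 52625  # default port stated in this guide


def resolve_flm_port(env=None):
    """Return the port a client should use, honoring FLM_SERVE_PORT if set.

    Hypothetical helper: `env` is any mapping of environment variables
    (defaults to os.environ), which makes the function easy to test.
    """
    env = os.environ if env is None else env
    raw = env.get("FLM_SERVE_PORT", "").strip()
    if not raw:
        return DEFAULT_FLM_PORT
    port = int(raw)  # raises ValueError for non-numeric values
    if not 1 <= port <= 65535:
        raise ValueError(f"FLM_SERVE_PORT out of range: {port}")
    return port


def base_url(env=None, host="127.0.0.1"):
    """Build the address a client would connect to (scheme and host are assumptions)."""
    return f"http://{host}:{resolve_flm_port(env)}"
```

With this approach the application follows FLM_SERVE_PORT automatically instead of hard-coding 52625. On Windows, a variable edited in System Properties is only visible to processes started afterwards, so restart both FLM and the application after changing it.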

NPU Model Loading Behavior

FLM can keep one NPU model loaded per type at a time:

asr
llm
embedding

Different model types can run together (for example, one LLM and one embedding model).

Load all three model types in one server process (default port)

flm serve lfm2:1.2b -e 1 -a 1

Run model types in separate server processes (different ports)

flm serve -e 1 --port 52625
flm serve -a 1 --port 52627
flm serve lfm2:1.2b --port 52628

Set Context Length at Launch

The default context length for each model can be found here.

To change it at launch, in PowerShell, run:

flm serve llama3.2:1b --ctx-len 8192

Internally, FLM enforces a minimum context length of 512. If you specify a smaller value, it will automatically be adjusted up to 512.

If you enter a context length that is not a power of 2, FLM automatically rounds it up to the nearest power of 2. For example: input 8000 → adjusted to 8192.
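The two adjustment rules above can be reproduced with a few lines of arithmetic, which is handy for predicting the context length FLM will actually use. This is a sketch of the documented behavior only, not FLM's own code: the function name `effective_ctx_len` is hypothetical, and any per-model upper limit is not modeled.

```python
MIN_CTX_LEN = 512  # minimum context length stated in this guide


def effective_ctx_len(requested: int) -> int:
    """Apply the documented rules: at least 512, rounded up to a power of 2."""
    if requested < 1:
        raise ValueError("context length must be a positive integer")
    value = max(requested, MIN_CTX_LEN)
    # (value - 1).bit_length() is the exponent of the smallest power of 2
    # that is greater than or equal to value.
    return 1 << (value - 1).bit_length()


print(effective_ctx_len(8000))  # 8192
print(effective_ctx_len(100))   # 512
print(effective_ctx_len(8192))  # 8192
```

Because 512 is itself a power of 2, applying the minimum first or the rounding first gives the same result.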

Show Server Port

To show the current default FLM port, run in PowerShell:

  flm port

Set Server Port at Launch

Set a custom port at launch:

  flm serve llama3.2:1b --port 8000
  flm serve llama3.2:1b -p 8000

⚠️ --port (-p) only affects the current run; it won’t change the default port.

Set Request Queue in Server Mode

Since v0.9.10, FLM server mode includes a request queue that prevents overload under high traffic.
It keeps processing stable and orderly when multiple requests arrive at once.

Default: 10
Change with: --q-len (or -q)

To change it at launch, in PowerShell, run:

flm serve llama3.2:1b --q-len 20

Customizable Socket Connections in Server Mode

Set the maximum number of concurrent socket connections to control network resource usage.
👉 Recommended: set sockets equal to or greater than the queue length.

Default: 10
Change with: --socket (or -s)

To change it at launch, in PowerShell, run:

flm serve llama3.2:1b --socket 20
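When a launcher script sets both values, the recommendation that sockets be equal to or greater than the queue length can be checked before starting the server. The sketch below is illustrative only: `serve_args` is a hypothetical helper, and passing `--q-len` and `--socket` together in one command is an assumption, since this guide shows each flag only on its own.

```python
import warnings


def serve_args(model, q_len=10, sockets=None):
    """Build an `flm serve` argument list with queue and socket limits.

    Hypothetical helper: if `sockets` is omitted it defaults to `q_len`,
    which satisfies the recommendation sockets >= queue length.
    """
    if q_len < 1:
        raise ValueError("q_len must be at least 1")
    if sockets is None:
        sockets = q_len
    if sockets < q_len:
        # Allowed, but contrary to the recommendation in this guide.
        warnings.warn(f"sockets ({sockets}) is below queue length ({q_len})")
    return ["flm", "serve", model, "--q-len", str(q_len), "--socket", str(sockets)]


print(" ".join(serve_args("llama3.2:1b", q_len=20)))
# flm serve llama3.2:1b --q-len 20 --socket 20
```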

Cross-Origin Resource Sharing (CORS)

CORS lets browser apps hosted on a different origin call your FLM server safely.

Enable CORS

flm serve --cors 1

Disable CORS

flm serve --cors 0

⚠️ Default: CORS is enabled.
🔒 Security tip: Disable CORS (or restrict it at your proxy) if your server is reachable from beyond localhost (127.0.0.1).

Suppress Logs for Higher-Level Applications

When FLM is run as a subprocess inside another application, use quiet mode to reduce FLM log output:

flm serve --quiet

This keeps the parent application’s logs cleaner and easier to read.
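A parent application might start FLM in quiet mode along the following lines. This is a minimal sketch under stated assumptions: `build_quiet_command` and `start_flm` are hypothetical helpers, `flm` is assumed to be on the PATH, and combining `--quiet` with a model tag or `--port` is an assumption, since this guide only shows `flm serve --quiet` on its own.

```python
import subprocess


def build_quiet_command(model=None, port=None):
    """Return the argument list for starting FLM with reduced log output."""
    cmd = ["flm", "serve"]
    if model:
        cmd.append(model)
    if port is not None:
        cmd += ["--port", str(port)]
    cmd.append("--quiet")
    return cmd


def start_flm(model=None, port=None):
    """Start FLM as a child process and return the Popen handle.

    The caller is responsible for stopping it, for example with
    proc.terminate() followed by proc.wait().
    """
    return subprocess.Popen(build_quiet_command(model, port))
```

Keeping command construction separate from process launch makes the command line easy to log and to test without FLM installed.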
