Nanbeige4.1 · FastFlowLM

⚡ Performance and Efficiency Benchmarks

This section reports the performance on NPU with FastFlowLM (FLM).

Note:

Results are based on FastFlowLM v0.9.38.

Under FLM’s default NPU power mode (Performance)

Newer versions may deliver improved performance.

Fine-tuned models show performance comparable to their base models.

Test System 1:

AMD Ryzen™ AI 7 350 (Kraken Point) with 32 GB DRAM; performance is comparable to other Kraken Point systems.

🚀 Decoding Speed (TPS, or Tokens per Second, starting @ different context lengths)

Model	HW	1k	2k	4k	8k	16k	32k
Nanbeige4.1-3B	NPU (FLM)	23.5	22.3	20.4	17.3	13.3	9.0

🚀 Prefill Speed (TPS, or Tokens per Second, with different prompt lengths)

Model	HW	1k	2k	4k	8k	16k	32k
Nanbeige4.1-3B	NPU (FLM)	612	731	742	686	523	343