Qwen2.5 · FastFlowLM

⚡ Performance and Efficiency Benchmarks

This section reports the performance on NPU with FastFlowLM (FLM).

Note:

Results are based on FastFlowLM v0.9.33.

Under FLM’s default NPU power mode (Performance)

Newer versions may deliver improved performance.

Fine-tuned models show performance comparable to their base models.

Test System 1:

AMD Ryzen™ AI 7 350 (Kraken Point) with 32 GB DRAM; performance is comparable to other Kraken Point systems.

🚀 Decoding Speed (TPS, or Tokens per Second, starting @ different context lengths)

Model	HW	1k	2k	4k	8k	16k	32k
Qwen2.5-3B-Instruct	NPU (FLM)	23.5	22.5	19.8	16.8	12.5	8.4
Qwen2.5-VL-3B-Instruct	NPU (FLM)	23.5	22.5	19.8	16.8	12.5	8.4

🚀 Prefill Speed (TPS, or Tokens per Second, with different prompt lengths)

Model	HW	1k	2k	4k	8k	16k	32k
Qwen2.5-3B-Instruct	NPU (FLM)	660	809	899	891	741	532
Qwen2.5-VL-3B-Instruct	NPU (FLM)	660	809	899	891	741	532

🚀 Prefill TTFT with Image Input (Seconds)

Prefill time-to-first-token (TTFT) for Qwen2.5-VL-3B-Instruct on NPU (FastFlowLM) with different image resolutions.

Mid Resolution Images:

Model	HW	720p (1280×720)	1080p (1920×1080)
Qwen2.5-VL-3B-Instruct	NPU (FLM)	4.3	7.9

High Resolution Images:

Model	HW	2K (2560×1440)	4K (3840×2160)
Qwen2.5-VL-3B-Instruct	NPU (FLM)	13.3	36.4

This test uses a short prompt: “Describe this image.”