⚡ Performance and Efficiency Benchmarks
This section reports inference performance on AMD Ryzen™ AI NPUs using FastFlowLM (FLM).
Note:
- Results are based on FastFlowLM v0.9.30.
- All measurements were taken under FLM’s default NPU power mode (Performance).
- Newer versions may deliver improved performance.
- Fine-tuned models show performance comparable to their base models (e.g., LFM2-1.2B vs. LFM2.5-Instruct-1.2B/LFM2.5-Thinking-1.2B, and LFM2-2.6B vs. LFM2-2.6B-Transcript).
Test System 1:
AMD Ryzen™ AI 7 350 (Kraken Point) with 32 GB DRAM; performance is comparable to other Kraken Point systems.
🚀 Decoding Speed (TPS, or tokens per second, starting at different context lengths)
| Model | HW | 1k | 2k | 4k | 8k | 16k | 32k |
|---|---|---|---|---|---|---|---|
| LFM2-1.2B | NPU (FLM) | 62 | 61 | 59 | 56 | 52 | 46 |
| LFM2-2.6B | NPU (FLM) | 30 | 30 | 30 | 29 | 27 | 25 |
🚀 Prefill Speed (TPS, or tokens per second, at different prompt lengths)
| Model | HW | 1k | 2k | 4k | 8k | 16k | 32k |
|---|---|---|---|---|---|---|---|
| LFM2-1.2B | NPU (FLM) | 1537 | 2172 | 2521 | 2677 | 2359 | 1916 |
| LFM2-2.6B | NPU (FLM) | 747 | 1004 | 1193 | 1284 | 1210 | 1053 |
Test System 2:
AMD Ryzen™ AI 9 370 (Strix Point) with 32 GB DRAM; performance is comparable to other Strix Point and Strix Halo systems.
🚀 Decoding Speed (TPS, or tokens per second, starting at different context lengths)
| Model | HW | 1k | 2k | 4k | 8k | 16k | 32k |
|---|---|---|---|---|---|---|---|
| LFM2-1.2B | NPU (FLM) | 56 | 55 | 54 | 51 | 49 | 45 |
| LFM2-2.6B | NPU (FLM) | 27 | 26 | 26 | 25 | 24 | 22 |
🚀 Prefill Speed (TPS, or tokens per second, at different prompt lengths)
| Model | HW | 1k | 2k | 4k | 8k | 16k | 32k |
|---|---|---|---|---|---|---|---|
| LFM2-1.2B | NPU (FLM) | 1487 | 1992 | 2436 | 2518 | 2257 | 1839 |
| LFM2-2.6B | NPU (FLM) | 715 | 932 | 1107 | 1206 | 1152 | 1008 |
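To turn the throughput figures into a rough response time, prefill time and decode time can be added together. The sketch below is an illustration only, not part of FastFlowLM: the function name `estimate_latency_seconds` is hypothetical, "4k" is assumed to mean 4096 tokens, and the decode speed is held constant even though the tables show it falling slowly as the context grows.

```python
def estimate_latency_seconds(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Rough end-to-end latency in seconds: prefill time plus decode time."""
    if prefill_tps <= 0 or decode_tps <= 0:
        raise ValueError("throughput values must be positive")
    prefill_s = prompt_tokens / prefill_tps  # time to process the prompt
    decode_s = output_tokens / decode_tps    # time to generate the reply
    return prefill_s + decode_s


# LFM2-1.2B figures from the 4k column of the tables above.
# Assumption: "4k" is treated as 4096 tokens.
SYSTEMS = {
    "Ryzen AI 7 350 (Kraken Point)": {"prefill_tps": 2521, "decode_tps": 59},
    "Ryzen AI 9 370 (Strix Point)": {"prefill_tps": 2436, "decode_tps": 54},
}

for name, tps in SYSTEMS.items():
    seconds = estimate_latency_seconds(4096, 256, **tps)
    print(f"{name}: ~{seconds:.1f} s for a 4096-token prompt and 256 output tokens")
```

Because prefill runs at thousands of tokens per second and decoding at tens, the decode term dominates this estimate for all but very short replies.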