Phi

🧩 Model Card: microsoft/Phi-4-mini-instruct

  • Type: Text-to-Text
  • Think: No
  • Tool Calling Support: No
  • Base Model: microsoft/Phi-4-mini-instruct
  • Quantization: Q4_1
  • Max Context Length: 128k tokens
  • Default Context Length: 32k tokens (the default can be changed when the model is launched; see the example below)

▶️ Run with FastFlowLM in PowerShell:

flm run phi4-mini-it:4b
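
⚙️ Set Context Length at Launch:

The default 32k window can be overridden when the model is launched. A minimal sketch, assuming a context-length option on flm run; the flag name below is illustrative only, so confirm the exact option in the CLI docs before relying on it.

# Launch with a 16k context window instead of the 32k default.
# NOTE: --ctx is an assumed flag name for illustration; check the
# "Set Context Length at Launch" docs for the real option.
flm run phi4-mini-it:4b --ctx 16384

A smaller window shrinks the KV cache and eases memory pressure on the NPU; a larger one (up to the 128k maximum) lets the model attend to longer prompts.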

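🌐 Query the Model over the Local API:

Beyond the interactive CLI, FastFlowLM also has a server mode (see the Server Mode and API / Client Usage docs). Below is a minimal PowerShell sketch, assuming flm serve starts the server and exposes an OpenAI-compatible endpoint at localhost:11434; verify the command form, port, and route against the Server Basics docs.

# Start the server in one terminal (assumed command form).
flm serve phi4-mini-it:4b

# Then, from another PowerShell session, send a chat completion request.
$body = @{
    model    = "phi4-mini-it:4b"
    messages = @(@{ role = "user"; content = "Explain NPU inference in one sentence." })
} | ConvertTo-Json -Depth 5

Invoke-RestMethod -Uri "http://localhost:11434/v1/chat/completions" `
    -Method Post -ContentType "application/json" -Body $body

If the endpoint is OpenAI-compatible as assumed, the generated text is under choices[0].message.content in the response.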