FastVLM: Apple's Extremely Fast Vision Language Model

Runs directly on iPhone, with time-to-first-token (TTFT) up to 85x faster!

How does it work?

Understand Image
Image → Tokens
Tokens → Language

FastVLM encodes an image into a compact set of visual tokens, then passes those tokens to a language model that quickly generates accurate text descriptions or answers.
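The three stages above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not FastVLM's actual architecture: `encode_image`, `project_tokens`, and `generate_text` are hypothetical stand-ins, and the average-pooling "encoder" only demonstrates why a larger downsampling stride yields fewer visual tokens for the language model to read.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def encode_image(image: Image, stride: int) -> List[float]:
    """Toy 'vision encoder': average-pool each stride x stride patch into one token."""
    height, width = len(image), len(image[0])
    tokens = []
    for top in range(0, height - stride + 1, stride):
        for left in range(0, width - stride + 1, stride):
            patch = [image[r][c]
                     for r in range(top, top + stride)
                     for c in range(left, left + stride)]
            tokens.append(sum(patch) / len(patch))
    return tokens


def project_tokens(tokens: List[float], scale: float = 1.0) -> List[float]:
    """Toy 'projector': map visual tokens into the language model's input space."""
    return [scale * t for t in tokens]


def generate_text(visual_tokens: List[float], prompt: str) -> str:
    """Toy 'language model': report how many visual tokens it had to read."""
    return f"{prompt} -> answered after reading {len(visual_tokens)} visual tokens"


def describe(image: Image, prompt: str, stride: int) -> str:
    """Run the full toy pipeline: image -> tokens -> language."""
    return generate_text(project_tokens(encode_image(image, stride)), prompt)


if __name__ == "__main__":
    # A 64 x 64 checkerboard stands in for a real photo.
    image = [[float((r + c) % 2) for c in range(64)] for r in range(64)]
    print(describe(image, "Describe the image.", stride=8))   # 64 visual tokens
    print(describe(image, "Describe the image.", stride=16))  # 16 visual tokens
```

Doubling the stride cuts the token count by four, which is the general lever a compact encoder pulls to reduce the work done before the first output token.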

Core Advantages

Extreme Speed

Astonishing first token output speed! FastVLM-0.5B produces its first token 85x faster than LLaVA-OneVision-0.5B. FastVLM-7B (with Qwen2) produces its first token 7.9x faster than Cambrian-1-8B at similar accuracy.
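To make the "first token" metric concrete, here is a minimal sketch of how time-to-first-token and a speedup ratio could be measured. It is not the benchmark behind the numbers above: `fake_model` is a hypothetical stand-in that simply sleeps to imitate image encoding and prompt prefill, and the delays are made-up values.

```python
import time
from typing import Callable, Iterator, Tuple


def time_to_first_token(stream: Callable[[], Iterator[str]]) -> Tuple[str, float]:
    """Return the first token of a stream and the seconds elapsed before it arrived."""
    start = time.perf_counter()
    first = next(iter(stream()))
    return first, time.perf_counter() - start


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


def fake_model(prefill_seconds: float) -> Callable[[], Iterator[str]]:
    """Hypothetical stand-in for a model: sleeps during 'prefill', then streams tokens."""
    def stream() -> Iterator[str]:
        time.sleep(prefill_seconds)  # image encoding + prompt prefill would happen here
        yield "A"
        yield " cat"
    return stream


if __name__ == "__main__":
    _, slow = time_to_first_token(fake_model(0.20))
    _, fast = time_to_first_token(fake_model(0.02))
    print(f"TTFT speedup: {speedup(slow, fast):.1f}x")
```

With a real model, the callable passed to `time_to_first_token` would wrap whatever streaming generation call the runtime provides; only the time before the first yielded token is counted.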

Compact & Efficient

Small model size for easier deployment. The vision encoder of FastVLM-0.5B is 3.4x smaller than that of LLaVA-OneVision-0.5B. Ideal for on-device use on iPhone, iPad, and Mac.

On-Device Intelligence

No cloud dependency: FastVLM runs directly on your Apple device, which protects privacy and shortens response time.

Perfectly adapted to the iOS/Mac ecosystem, empowering edge AI applications.

Examples Showcase

Object Counting (image: FastVLM counting example)

Handwriting Recognition (image: FastVLM handwriting recognition example)

Emoji Understanding (image: FastVLM emoji understanding example)

Performance Comparison

(image: FastVLM accuracy vs. latency comparison chart)

Application Scenarios

Image Captioning

Automatically generate vivid and accurate text descriptions for images.

Visual Question Answering (VQA)

Understand image content and answer questions about the image.

Image Recognition & Analysis

Recognize objects, text, or data in images for intelligent analysis.

Especially suitable for scenarios requiring real-time image and text interaction.

Model Downloads

PyTorch Checkpoints

Model          Stage   Download Link
FastVLM-0.5B   2       fastvlm_0.5b_stage2
FastVLM-0.5B   3       fastvlm_0.5b_stage3
FastVLM-1.5B   2       fastvlm_1.5b_stage2
FastVLM-1.5B   3       fastvlm_1.5b_stage3
FastVLM-7B     2       fastvlm_7b_stage2
FastVLM-7B     3       fastvlm_7b_stage3
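
The checkpoint names listed above follow a regular pattern, which a download script could rebuild instead of hard-coding. The sketch below assumes that pattern (`fastvlm_<size>_stage<n>`) as inferred from the list; `checkpoint_name` is a hypothetical helper, and no download URL is assumed.

```python
from itertools import product

MODEL_SIZES = ("0.5B", "1.5B", "7B")
STAGES = (2, 3)


def checkpoint_name(size: str, stage: int) -> str:
    """Build a checkpoint name following the pattern seen in the list above."""
    if size not in MODEL_SIZES or stage not in STAGES:
        raise ValueError(f"unknown checkpoint: size={size!r}, stage={stage!r}")
    return f"fastvlm_{size.lower()}_stage{stage}"


if __name__ == "__main__":
    # Enumerate every size/stage combination listed on this page.
    for size, stage in product(MODEL_SIZES, STAGES):
        print(f"FastVLM-{size}  stage {stage}  {checkpoint_name(size, stage)}")
```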

Apple Silicon Compatible Models

To make it easier to run FastVLM on Apple Silicon devices, we provide pre-converted models:

FastVLM-0.5B (Stage 3, fp16) Download
FastVLM-1.5B (Stage 3, int8) Download
FastVLM-7B (Stage 3, int4) Download
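
The precisions listed above trade accuracy headroom for storage. As a rough back-of-the-envelope sketch, weight storage is approximately parameters times bits per weight divided by 8. The parameter counts below are assumptions taken from the model names (0.5B, 1.5B, 7B); the estimate ignores the vision encoder, quantization scales, and file overhead, so actual download sizes will differ.

```python
BITS_PER_WEIGHT = {"fp16": 16, "int8": 8, "int4": 4}


def weight_size_gb(num_params_billion: float, precision: str) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes), weights only."""
    bits = BITS_PER_WEIGHT[precision]
    return num_params_billion * bits / 8


if __name__ == "__main__":
    # Nominal parameter counts are read off the model names, not measured.
    for name, params, precision in [
        ("FastVLM-0.5B", 0.5, "fp16"),
        ("FastVLM-1.5B", 1.5, "int8"),
        ("FastVLM-7B", 7.0, "int4"),
    ]:
        print(f"{name} ({precision}): ~{weight_size_gb(params, precision):.1f} GB")
```

Under these assumptions, the larger models stay within a few gigabytes because they are paired with lower-precision weights.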

Learn More

Explore the technical details of FastVLM, view the source code, or read the research paper.

© 2025 FastVLM. All rights reserved.