Llama 3.2 1B

mobile-small

Compact Llama for mobile. Good extraction quality with low resource use.

Parameters: 1.0B

Context: 131,072 tokens

Download: 0.7 GB

Min RAM: 4 GB

Architecture: llama

Available quantizations

Quant	Size	Quality
Q4_K_M	0.7 GB	good
Q5_K_M	0.8 GB	better
Q8_0	1.3 GB	best
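
As a rough illustration of how the table might be used, the sketch below picks the largest listed quantization whose file fits a memory budget. The sizes are copied from the table; the `headroom_gb` allowance for the KV cache and runtime buffers is an assumption for illustration, not a figure from this page, and `pick_quant` is a hypothetical helper name.

```python
# Sizes (GB) and quality labels copied from the quantization table.
QUANTS = [
    ("Q4_K_M", 0.7, "good"),
    ("Q5_K_M", 0.8, "better"),
    ("Q8_0", 1.3, "best"),
]


def pick_quant(free_gb, headroom_gb=1.0):
    """Return the largest listed quant whose file fits in the budget, or None.

    headroom_gb is an ASSUMED allowance for the KV cache and runtime buffers;
    tune it for your own context length and runtime.
    """
    budget = free_gb - headroom_gb
    fitting = [q for q in QUANTS if q[1] <= budget]
    if not fitting:
        return None
    # Larger file means higher listed quality in the table above.
    return max(fitting, key=lambda q: q[1])[0]


print(pick_quant(4.0))   # Q8_0
print(pick_quant(2.0))   # Q5_K_M
print(pick_quant(1.2))   # None
```

File size is only a lower bound on memory use, so treat the result as a starting point and check it against measured usage.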

About this model

By Meta

Meta's smallest Llama 3.2 — built for on-device use. Quick summaries, short chats, and lightweight tasks on hardware where bigger models won't fit.

What it's good at

Extraction — Reliable at pulling structured facts out of text.
Classification — Good at labeling and routing text.
Tool use — Not tuned for tool use; better as a reasoning or chat model.
Draft model — Small and fast, which makes it a good draft model for speculative decoding.
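
To make the draft-model role concrete, here is a toy sketch of greedy speculative decoding: a cheap draft model proposes a few tokens, and the larger target model keeps the longest prefix it agrees with, then supplies its own next token. The two "models" are stand-in functions over integers, invented for illustration; real engines such as llama.cpp verify all proposed tokens in one batched pass and may use a probabilistic acceptance rule rather than this exact-match one.

```python
from typing import Callable, List

# A "model" here maps a token prefix to its greedy next token (toy stand-in).
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. The draft model cheaply proposes k tokens.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks each proposed position in order.
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if expected != tok:
                out.extend(proposal[:i])   # keep the agreed prefix
                out.append(expected)       # target's token replaces the miss
                break
        else:
            out.extend(proposal)           # every draft token was accepted
            out.append(target(out))        # plus one token from the target
    return out[:goal]


# Toy models: the draft agrees with the target except after multiples of 5.
def target_model(prefix: List[int]) -> int:
    return (prefix[-1] * 3 + 1) % 17


def draft_model(prefix: List[int]) -> int:
    guess = target_model(prefix)
    return (guess + 1) % 17 if prefix[-1] % 5 == 0 else guess


fast = speculative_decode(target_model, draft_model, [2], n_new=12)
slow = greedy_decode(target_model, [2], n_new=12)
print(fast == slow)  # True
```

The output matches target-only greedy decoding by construction; the speedup in practice comes from the target accepting several draft tokens per verification pass.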

Real-world performance

Runs well on an M1 (16 GB RAM) and an M5 Max (128 GB RAM), at 30–240 tok/s across those machines.
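
For a sense of what the 30–240 tok/s range means in practice, the arithmetic below estimates how long a reply takes at each end of the range. The 500-token reply length is an arbitrary example, and the estimate ignores prompt-processing time.

```python
def seconds_to_generate(n_tokens, tok_per_s):
    """Generation time only; prompt processing is not included."""
    return n_tokens / tok_per_s


for speed in (30, 240):  # ends of the range quoted above
    print(speed, "tok/s:", round(seconds_to_generate(500, speed), 1), "s")
# 30 tok/s: 16.7 s
# 240 tok/s: 2.1 s
```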

Running it with llama.cpp

Q4_K_M is the sweet spot for most machines; Q8_0 squeezes out the best quality if you have the RAM to spare.
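
A minimal sketch of assembling a `llama-cli` invocation from Python. The `-m`, `-p`, `-n`, and `-c` options are llama.cpp's model path, prompt, tokens-to-generate, and context-size flags. The GGUF filename is a hypothetical placeholder for whichever quantization you downloaded, and the 4096-token context is an assumed default chosen to keep KV-cache memory modest rather than a recommendation from this page.

```python
import shlex


def llama_cli_command(model_path, prompt, n_predict=128, ctx_size=4096):
    """Build (but do not run) an argv list for llama.cpp's llama-cli."""
    return [
        "llama-cli",
        "-m", model_path,        # path to the GGUF file
        "-p", prompt,            # prompt text
        "-n", str(n_predict),    # number of tokens to generate
        "-c", str(ctx_size),     # context window to allocate
    ]


# Hypothetical filename; substitute the file you actually downloaded.
cmd = llama_cli_command("Llama-3.2-1B-Q4_K_M.gguf", "Summarize this text: ...")
print(shlex.join(cmd))
```

Pass the list to `subprocess.run(cmd)` to execute it, assuming `llama-cli` is on your PATH.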

License: Meta Llama 3.2 License

Built with Llama

Check whether your machine can run this model — with real measured speeds from anonymous community benchmarks — on the Central-Intel compatibility page.