Llama 3.2 1B
mobile-small
Compact Llama for mobile. Good extraction quality with low resource use.
Parameters: 1.0B
Context: 131,072 tokens
Download: 0.7 GB
Min RAM: 4 GB
Architecture: llama
Available quantizations
| Quant | Size | Quality |
|---|---|---|
| Q4_K_M | 0.7 GB | good |
| Q5_K_M | 0.8 GB | better |
| Q8_0 | 1.3 GB | best |
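
As a rough illustration of how the table might be used, the sketch below picks the highest-quality quantization whose file fits a given size budget. The `QUANTS` list mirrors the table above; the `pick_quant` helper and the budget values are hypothetical. File size is only a proxy here, since actual memory use also depends on context length and runtime overhead.

```python
# Sizes (GB) and quality labels copied from the quantization table.
# Ordered from lowest to highest quality.
QUANTS = [
    ("Q4_K_M", 0.7, "good"),
    ("Q5_K_M", 0.8, "better"),
    ("Q8_0", 1.3, "best"),
]


def pick_quant(budget_gb):
    """Return the highest-quality quant whose file size fits budget_gb.

    Hypothetical helper: returns None when even the smallest file is too big.
    """
    fitting = [name for name, size_gb, _ in QUANTS if size_gb <= budget_gb]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for budget in (0.5, 0.75, 1.0, 2.0):
        print(budget, "GB ->", pick_quant(budget))
```

With a 1.0 GB budget this selects Q5_K_M, because Q8_0 at 1.3 GB no longer fits.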
About this model
By Meta
Meta's smallest Llama 3.2 — built for on-device use. Quick summaries, short chats, and lightweight tasks on hardware where bigger models won't fit.
What it's good at
- Extraction — Reliable at pulling structured facts out of text.
- Classification — Good at labeling and routing text.
- Tool use — Not tuned for tool use — better as a reasoning or chat model.
- Draft model — Small and fast — a good speculative-decoding draft model.
Real-world performance
Runs great on M1 (16 GB RAM) and M5 Max (128 GB RAM): 30–240 tok/s.
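
To put the 30–240 tok/s range in terms of waiting time, here is a small arithmetic sketch. The 300-token reply length is an assumption chosen for illustration, and the calculation ignores prompt-processing time, so real latency will be somewhat higher.

```python
def generation_seconds(num_tokens, tokens_per_second):
    """Time to generate num_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    reply_tokens = 300  # assumed reply length, for illustration only
    for rate in (30, 240):  # low and high end of the reported range
        seconds = generation_seconds(reply_tokens, rate)
        print(f"{rate} tok/s: {seconds:.2f} s")
```

At the low end of the range a 300-token reply takes about 10 seconds; at the high end it takes about 1.25 seconds.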
Running it with llama.cpp
- Q4_K_M is the sweet spot for most machines; Q8_0 squeezes out the best quality if you have the RAM to spare.
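
A minimal sketch of launching the model from Python, assuming a llama.cpp build that provides the `llama-cli` binary on your PATH. The GGUF file name, the prompt, and the `build_llama_cli_command` helper are hypothetical. The flags (`-m`, `-p`, `-n`, `-c`) are the common llama.cpp options as I understand them; confirm with `llama-cli --help` for your build.

```python
import shlex


def build_llama_cli_command(model_path, prompt, max_tokens=128, ctx_size=4096):
    """Build an argv list for llama.cpp's llama-cli binary.

    -m: GGUF model file, -p: prompt, -n: tokens to generate, -c: context size.
    """
    return [
        "llama-cli",
        "-m", model_path,
        "-p", prompt,
        "-n", str(max_tokens),
        "-c", str(ctx_size),
    ]


if __name__ == "__main__":
    # Hypothetical file name; use whatever your downloaded GGUF is called.
    cmd = build_llama_cli_command(
        "llama-3.2-1b-q4_k_m.gguf",
        "Summarize in one sentence: llama.cpp runs GGUF models locally.",
    )
    print(shlex.join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

The sketch defaults to a 4,096-token context rather than the full 131,072, since a smaller context keeps memory use down on constrained hardware. Raise `ctx_size` only if the task needs it.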
License: Meta Llama 3.2 License
Built with Llama
Check whether your machine can run this model — with real measured speeds from anonymous community benchmarks — on the Central-Intel compatibility page.