Qwen 2.5 1.5B

mobile-small

Best quality-to-size ratio for mobile. Strong at structured output.

Parameters: 1.5B
Context: 32,768 tokens
Download: 1.0 GB
Min RAM: 4 GB
Architecture: qwen2

Available quantizations

Quant    Size     Quality
Q4_K_M   1.0 GB   good
Q5_K_M   1.1 GB   better
Q8_0     1.6 GB   best

About this model

By Alibaba Cloud — Qwen team

A small Qwen 2.5 model that punches above its weight for everyday chat and utility work on modest hardware. A good first local model for 8 GB machines.

What it's good at

  • Extraction: Reliable at pulling structured facts out of text.
  • Classification: Good at labeling and routing text.
  • Tool use: Not tuned for tool use; better suited to reasoning or chat.
  • Draft model: Small and fast, which makes it a good draft model for speculative decoding.
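
For the extraction use case, a common pattern is to ask for JSON and validate the reply before using it. The sketch below assumes a hypothetical `generate` callable that sends a prompt to the model and returns its text (for example, a thin wrapper around whatever local runtime you use). The function name, field names, and prompt wording are illustrative and not part of any library.

```python
import json
from typing import Callable, Dict, List


def extract_fields(text: str, fields: List[str],
                   generate: Callable[[str], str]) -> Dict[str, object]:
    """Ask the model for a JSON object with the given keys and validate it.

    `generate` is a hypothetical stand-in for a real model call.
    """
    prompt = (
        "Extract the following fields from the text and reply with a single "
        f"JSON object with exactly these keys: {', '.join(fields)}.\n"
        "Use null for anything not stated.\n\n"
        f"Text: {text}"
    )
    reply = generate(prompt)
    data = json.loads(reply)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [f for f in fields if f not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    # Keep only the requested keys and drop anything extra the model added.
    return {f: data[f] for f in fields}


# Usage with a canned reply in place of a real model:
def fake_generate(prompt: str) -> str:
    return '{"name": "Ada", "city": "London", "extra": 1}'


print(extract_fields("Ada lives in London.", ["name", "city"], fake_generate))
```

Validating the keys up front means a malformed reply fails loudly at the call site instead of surfacing later as a missing field.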

Real-world performance

Runs well on M1 (16 GB RAM) and M5 Max (128 GB RAM), at 25–190 tok/s across those machines.
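
To put the 25–190 tok/s range in practical terms, generation time is roughly the number of output tokens divided by throughput. The sketch below works through that arithmetic; the 500-token reply length is an assumed example, and the estimate ignores prompt processing and model load time.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Estimate decode time only; ignores prompt processing and model load."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


REPLY_TOKENS = 500  # assumed reply length, for illustration only

# Ends of the throughput range quoted on this page.
slow = generation_seconds(REPLY_TOKENS, 25)
fast = generation_seconds(REPLY_TOKENS, 190)
print(f"{REPLY_TOKENS} tokens: about {slow:.0f} s at 25 tok/s, "
      f"{fast:.1f} s at 190 tok/s")
```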

Running it with llama.cpp

  • Q4_K_M is the sweet spot for most machines; Q8_0 squeezes out the best quality if you have the RAM to spare.
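
The "if you have the RAM to spare" part of that guidance can be sketched as a small helper that picks the largest listed quantization fitting a memory budget. The file sizes come from the quantization table on this page. The function name `pick_quant` and the 1.5x headroom factor (for KV cache and runtime overhead) are illustrative assumptions, not measured values or part of llama.cpp.

```python
from typing import Optional

# File sizes in GB, taken from the quantization table on this page.
QUANT_SIZES_GB = {"Q4_K_M": 1.0, "Q5_K_M": 1.1, "Q8_0": 1.6}


def pick_quant(free_ram_gb: float, headroom: float = 1.5) -> Optional[str]:
    """Return the largest quant whose size times headroom fits in free RAM.

    `headroom` is an assumed multiplier, not a measured figure.
    Returns None if even the smallest quant does not fit.
    """
    fitting = [
        (size, name)
        for name, size in QUANT_SIZES_GB.items()
        if size * headroom <= free_ram_gb
    ]
    # Tuples compare by size first, so max() selects the largest fitting file.
    return max(fitting)[1] if fitting else None


print(pick_quant(2.0))  # Q5_K_M fits; Q8_0 would need about 2.4 GB
print(pick_quant(4.0))  # Q8_0
print(pick_quant(1.0))  # None
```

If you would rather default to Q4_K_M, as the guidance suggests for most machines, treat the helper's result as an upper bound and not as a recommendation.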

License: Apache 2.0

Check whether your machine can run this model — with real measured speeds from anonymous community benchmarks — on the Central-Intel compatibility page.