
Bonsai: ternary-weight 500M LLM running in the browser via WebGPU

Key Takeaways

Bonsai is a 500M-parameter ternary-weight language model from deepgrove (Apache-2.0 license). Its weights take only three values: -1, 0, and +1. It is built on the Llama architecture with the Mistral tokenizer and was trained on fewer than 5B tokens (one to two orders of magnitude fewer than typical 0.5B models). Even so, it is competitive with Qwen 2.5 0.5B: a 46.96 average versus Qwen's 48.22 across ARC, HellaSwag, PIQA, MMLU, and WinoGrande.
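How real-valued weights end up as -1, 0, or +1: below is a minimal sketch of the "absmean" scheme described for BitNet b1.58. Each weight is divided by the tensor's mean absolute value γ, rounded, and clipped to [-1, 1], so that the weight is approximately γ times a trit. This is an assumption for illustration. The note does not say which exact recipe Bonsai uses, and the names `quantizeAbsmean` and `dequantize` are hypothetical.

```typescript
// Sketch of BitNet b1.58-style "absmean" ternary quantization.
// Assumption: Bonsai's exact recipe may differ; this only shows how
// real-valued weights collapse to {-1, 0, +1} plus one per-tensor scale.

type Ternary = -1 | 0 | 1;

interface TernaryTensor {
  values: Ternary[]; // one trit per weight
  scale: number; // gamma = mean(|w|), used to dequantize
}

function quantizeAbsmean(weights: number[], eps = 1e-5): TernaryTensor {
  // gamma: mean absolute value over the whole tensor
  const gamma =
    weights.reduce((acc, w) => acc + Math.abs(w), 0) / Math.max(weights.length, 1);
  const values = weights.map((w) => {
    // scale, round to the nearest integer, then clip to [-1, 1]
    const q = Math.max(-1, Math.min(1, Math.round(w / (gamma + eps))));
    // normalize -0 to 0 so the output only ever contains -1, 0, +1
    return (q === 0 ? 0 : q) as Ternary;
  });
  return { values, scale: gamma };
}

// Approximate reconstruction: w is roughly gamma * trit
function dequantize(t: TernaryTensor): number[] {
  return t.values.map((v) => v * t.scale);
}
```

For example, `quantizeAbsmean([0.9, -0.05, -1.2, 0.3])` gives values `[1, 0, -1, 0]` with a scale of about 0.6125. Small weights fall to 0 and large ones saturate at ±1. The single float scale per tensor is the only non-ternary information kept.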

The Hugging Face Space webml-community/bonsai-webgpu runs this model entirely in the browser via WebGPU: no server, no API key, no data upload. Inference is hardware-accelerated on the user's GPU, accessed through the browser. This is the transformers.js + ONNX Runtime Web pipeline hitting a sweet spot: a small quantized model plus WebGPU compute yields a usable client-side LLM.
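A client-side deployment like this depends on the browser actually exposing WebGPU. Here is a minimal sketch of a pre-flight check using the standard WebGPU entry points `navigator.gpu` and `requestAdapter()`. The Space's own detection code is not shown in this note. The structural types and the `hasWebGPU` name are hypothetical, written so the snippet compiles without the `@webgpu/types` package.

```typescript
// Pre-flight check before loading an in-browser model.
// The interfaces are a hand-written subset of the WebGPU API
// (navigator.gpu and GPU.requestAdapter()), not the official typings.

interface GPULike {
  requestAdapter(): Promise<unknown | null>;
}

interface NavigatorLike {
  gpu?: GPULike;
}

async function hasWebGPU(nav: NavigatorLike): Promise<boolean> {
  // Browsers without WebGPU support do not define navigator.gpu at all.
  if (!nav.gpu) return false;
  try {
    // requestAdapter() resolves to null when no suitable adapter is available
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null;
  } catch {
    // treat any failure as "no usable WebGPU" and fall back elsewhere
    return false;
  }
}

// In a page: hasWebGPU(navigator as unknown as NavigatorLike).then((ok) => { ... })
```

Taking the navigator as a parameter keeps the check testable outside a browser. A page can then choose between the WebGPU path, a slower fallback, or a server call.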

Why it matters

Browser LLMs raise the privacy ceiling. For web apps, "on-device AI" usually meant "call our API and trust our privacy policy." WebGPU plus small quantized models make it literal: the model file downloads once, runs in the tab, and the data never leaves the device. That is the same privacy guarantee as Apple Foundation Models on iOS, but for the web.

Ternary quantization vs. 1-bit: "1-bit" is a common misnomer. So-called 1-bit LLMs (BitNet b1.58, Bonsai) actually use ternary weights, which carry log2(3) ≈ 1.58 bits each. The trade-off is roughly 10× less weight memory than fp16 (16 / 1.58 ≈ 10) and dramatically lower memory bandwidth, but efficient inference requires custom kernels. Bonsai currently runs at 16-bit precision while mixed-precision kernels are in development, so the full efficiency gain has not been realized yet.
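To make the memory and kernel points concrete, here is a sketch with three parts. It assumes one common packing scheme, five trits per byte (3^5 = 243 ≤ 256, so 1.6 bits per weight). This is not necessarily Bonsai's on-disk format. It also shows the add/subtract-only inner loop that ternary kernels exploit, and a back-of-envelope weight-size estimate. All function names are hypothetical. The estimate assumes every parameter is ternary, although real checkpoints may keep some tensors, such as embeddings, at higher precision. A real WebGPU kernel would be a WGSL shader working directly on packed data. This CPU-side TypeScript is only illustrative.

```typescript
// Ternary storage and compute, illustrated on the CPU.
// Assumption: 5 trits per byte via base-3 encoding (1.6 bits/weight).

type Trit = -1 | 0 | 1;
const TRITS_PER_BYTE = 5;

// Encode each group of 5 trits as a base-3 number in [0, 242].
function packTrits(trits: Trit[]): Uint8Array {
  const out = new Uint8Array(Math.ceil(trits.length / TRITS_PER_BYTE));
  for (let i = 0; i < out.length; i++) {
    let byte = 0;
    // walk the group backwards so trit 0 lands in the least significant digit
    for (let j = TRITS_PER_BYTE - 1; j >= 0; j--) {
      const idx = i * TRITS_PER_BYTE + j;
      const t = idx < trits.length ? trits[idx] : 0; // pad the tail with 0
      byte = byte * 3 + (t + 1); // map -1, 0, +1 to digits 0, 1, 2
    }
    out[i] = byte;
  }
  return out;
}

// Decode the first n trits back out of the packed bytes.
function unpackTrits(packed: Uint8Array, n: number): Trit[] {
  const out: Trit[] = [];
  for (let i = 0; i < packed.length && out.length < n; i++) {
    let byte = packed[i];
    for (let j = 0; j < TRITS_PER_BYTE && out.length < n; j++) {
      out.push(((byte % 3) - 1) as Trit);
      byte = Math.floor(byte / 3);
    }
  }
  return out;
}

// Dot product with no per-weight multiplies: +1 adds the activation,
// -1 subtracts it, 0 skips it. One multiply applies the per-tensor scale.
function ternaryDot(w: Trit[], x: number[], scale: number): number {
  let acc = 0;
  for (let i = 0; i < w.length; i++) {
    if (w[i] === 1) acc += x[i];
    else if (w[i] === -1) acc -= x[i];
  }
  return acc * scale;
}

// Weight storage in megabytes (1e6 bytes) for a parameter count and bit width.
function weightMegabytes(params: number, bitsPerWeight: number): number {
  return (params * bitsPerWeight) / 8 / 1e6;
}
```

With these helpers, `weightMegabytes(500e6, 16)` gives 1000 MB, versus about 99 MB at log2(3) bits per weight or about 100 MB with 1.6-bit packing. That is the ~10× gap. The add/subtract loop is why ternary inference needs dedicated kernels: stock fp16 matmul kernels cannot exploit it.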

Training efficiency: reaching competitive quality at 0.5B parameters with fewer than 5B tokens is unusual. Bonsai was trained primarily on DCLM-Pro and Fineweb-Edu, which suggests that data quality matters more than scale at this tier. This parallels our own SGR thesis: inference-time structure beats more training.
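For scale, here is a quick tokens-per-parameter check. The reference point of roughly 20 tokens per parameter is the compute-optimal heuristic from the Chinchilla scaling study (Hoffmann et al., 2022). It is a rule of thumb, not a law, and many small models are trained far beyond it. The constant and helper names are illustrative.

```typescript
// Back-of-envelope data budget: training tokens per parameter.
// CHINCHILLA_TOKENS_PER_PARAM is the ~20x compute-optimal rule of thumb.

const CHINCHILLA_TOKENS_PER_PARAM = 20;

function tokensPerParam(tokens: number, params: number): number {
  return tokens / params;
}

// Bonsai, per this note: under 5B tokens for ~500M parameters,
// so this ratio is an upper bound.
const bonsaiRatio = tokensPerParam(5e9, 500e6);

// Tokens a Chinchilla-optimal run would use at the same size.
const chinchillaTokens = CHINCHILLA_TOKENS_PER_PARAM * 500e6;

console.log(bonsaiRatio, chinchillaTokens / 1e9); // prints: 10 10
```

At no more than 10 tokens per parameter, Bonsai sits below even the compute-optimal budget of about 10B tokens for its size, which is what makes its benchmark numbers notable.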

Founder implication: web MVPs with AI features no longer need the OpenAI API on the critical path. A 500M model in the browser can handle classification, extraction, simple QA, and schema-guided output. Inference costs the app owner nothing per call, so freemium actually works: there is no unit cost to subsidize.
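A sketch of what "schema-guided output" can look like for a classification feature with a small local model. The prompt asks for JSON, and the reply is validated against a fixed label set and confidence range, with a retry on malformed output. This is validate-and-retry, not constrained decoding. `generate` is a hypothetical stand-in for whatever call runs the in-browser model (for example a transformers.js text-generation pipeline). It is injected so the logic runs without a GPU. The labels are illustrative.

```typescript
// Schema-guided classification with validate-and-retry.
// Assumption: `generate` is a hypothetical wrapper around the local model.

type Generate = (prompt: string) => Promise<string>;

const LABELS = ["bug", "feature", "question"] as const; // illustrative label set
type Label = (typeof LABELS)[number];

interface Classification {
  label: Label;
  confidence: number;
}

function buildPrompt(text: string): string {
  return [
    `Classify the message into one of: ${LABELS.join(", ")}.`,
    'Reply with JSON only: {"label": "<label>", "confidence": <0..1>}.',
    `Message: ${text}`,
  ].join("\n");
}

// Extract the JSON object from the reply and check it against the schema.
// Returns null instead of trusting malformed or out-of-range output.
function parseClassification(raw: string): Classification | null {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    const obj = JSON.parse(raw.slice(start, end + 1)) as Record<string, unknown>;
    const label = obj.label;
    const confidence = obj.confidence;
    if (typeof label !== "string" || !(LABELS as readonly string[]).includes(label)) return null;
    if (typeof confidence !== "number" || confidence < 0 || confidence > 1) return null;
    return { label: label as Label, confidence };
  } catch {
    return null; // not valid JSON
  }
}

// Ask the model, validate, and retry a bounded number of times.
async function classify(
  text: string,
  generate: Generate,
  retries = 1,
): Promise<Classification | null> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const result = parseClassification(await generate(buildPrompt(text)));
    if (result) return result;
  }
  return null;
}
```

Because each call costs nothing server-side, a retry on bad output is cheap. The cost is latency on the user's device rather than money.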
