2026-04-22 concept privacy pii ml openai on-device gpt-oss redaction moe token-classification

privacy-filter — A Real Model for the PII Problem

openai/privacy-filter (Apache-2.0, Python, 173★) is OpenAI’s open-source PII detection and masking model. It’s the boring-but-important piece that was missing from most agent pipelines: not a regex library, not an LLM prompt — a dedicated 1.5B-parameter model fine-tuned specifically for token-level PII classification, runnable on a laptop.

Up to now the PII step in any agent stack was either a fragile regex (misses “John Smith” anywhere, over-matches every date), or a round-trip to GPT-4/Claude with a “detect PII in this text” prompt (slow, expensive, non-deterministic, leaks the PII to the provider you’re trying to protect from). privacy-filter is the third option: a small local model that does one job well.

The architecture is interesting

The model is a pre-norm transformer encoder with 8 blocks, grouped-query attention (14 Q heads, 2 KV heads), and a sparse MoE layer (128 experts, top-4 routing — only 50M of the 1.5B parameters are active per token). 128K context window. Per-token logits over 33 classes (8 PII categories × 4 boundary tags via BIOES + background), decoded into coherent spans with a constrained Viterbi decoder using tunable linear-chain transitions.

The clever bit: the checkpoint was bootstrapped from gpt-oss (their open autoregressive model) and converted into a bidirectional classifier, instead of training a masked-LM from scratch. They’re reusing pretrained weights across form-factors — one more data point for the “pretrain once, adapt many” pattern. Supervised fine-tuning on labeled PII spans does the rest; they call it “data-efficient finetuning.”

The 8 PII categories

account_number   private_address   private_email      private_person
private_phone    private_url       private_date       secret

“Private” as a prefix is the signal: the model is trying to distinguish personal references (Alice Smith lives at 12 Oak St) from public entities (Tim Cook, CEO of Apple). This is exactly where regex falls over.

How to use it

pip install -e .
opf "Alice was born on 1990-01-02 at 123 Main St."
opf -f ./conversation.txt

Model weights auto-download to ~/.opf/privacy_filter. At runtime you can tune the six transition-bias parameters of the Viterbi decoder to trade precision for recall — useful because agent pipelines have very different tolerances (batch logging wants high recall even at the cost of some over-redaction; live customer-facing sanitization wants high precision so you don’t mangle legitimate text).

Where it fits in our stack

This closes a real gap in the privacy-as-architecture approach we’ve been pushing. That page argues for “no accounts, on-device, zero data”; privacy-filter is the component that makes that tractable when you do have to send text to an LLM provider — strip PII locally first, send the redacted version, reconstitute on return if needed. It sits alongside Apple Foundation Models as another “real local model for a real privacy primitive” building block.

For anything client-facing in Invoice CRM or the 6-crm/ workspace, or the Gmail/Drive sync in Solograph’s personal layer, this is the sanitizer I want between “user text” and “provider API call.” The alternatives (presidio, spacy NER, regex stacks) all have worse precision and worse multilingual fallback. privacy-filter is primarily English but reports “selected multilingual robustness” — worth benchmarking on Russian before relying on it.

What it is NOT

OpenAI is careful in the README: “not an anonymization, compliance, or safety guarantee.” It’s a detection/redaction model, not a certification. That means:

No differential privacy, no k-anonymity — purely span-level masking
Static label policy — can’t add new PII categories without fine-tuning
Drops on non-Latin scripts and underrepresented naming patterns — false negatives on unusual names, fragmented boundaries on mixed content
Over-redacts public figures — Tim Cook may get masked as a private person

You still need the policy layer (what to do with a flagged span), the audit log (what was redacted), and the legal framing. The model is one leaf of the tree.

The pattern worth noticing

Every few months OpenAI open-sources exactly the commodity piece you’d have otherwise built yourself badly. Whisper killed everyone’s homegrown STT. gpt-oss is now the cheap pretraining base for downstream classifiers — this repo is the first concrete example of that. Expect more: a small local toxicity classifier, a small local moderation model, a small local summarizer — all bootstrapped from gpt-oss, all Apache-2.0, all undercutting the “call our API for this” business.

For solo founders the calculus is: stop building the commodity layer, compose from these.