LLM Providers & Startup Credits
Where to get LLMs: free tiers, startup credit programs, open-source models. Organized from cheapest to most powerful.
Startup Credit Programs
Apply for credits when you have a registered company or VC backing.
| Provider | Credits | Requirements | What you get | Link |
|---|---|---|---|---|
| Together AI | $15K–$50K | Build (<$5M raised): $15K + 3h eng. Scale ($5-10M): $30K + 6h. Grow (>$10M): $50K + 10h | Open model inference + fine-tuning + GTM + VC network | together.ai/startup-accelerator |
| Modal | $25K | Seed–Series A, VC-backed or raised >$1M | GPU, inference, training, batch, sandboxes | modal.com/startups |
| Cerebras | $22.5K | Startup deal | Inference value + priority support + co-marketing | cerebras.ai/startup-deal |
| Fireworks | Varies | AI-native startups | Platform, tools, expertise | fireworks.ai/startup-program |
| Groq | $10K | Groq for Startups eligibility | Fast inference credits (LPU) | console.groq.com/docs/billing-faqs |
| Google Cloud | $2K–$350K | Google for Startups | Vertex AI, Gemini API, GCP credits | cloud.google.com/startup |
| AWS | $1K–$100K | AWS Activate | Bedrock (Claude, Llama, etc), SageMaker | aws.amazon.com/activate |
| Azure | $1K–$150K | Microsoft for Startups | Azure OpenAI, Cognitive Services | startups.microsoft.com |
| OpenAI | $2.5K+ (VC referral) | VC partner referral required, varies by stage. Grove program for pre-idea founders (closed cohorts) | GPT-4/5 API credits | openai.com/startups |
| Anthropic | Varies | Claude for Startups — credits, resources, community | Claude API credits | claude.com/programs/startups |
| Cloudflare | $5K–$250K | Bootstrapped: $5K. Early: $25K. Seed ($1-5M): $100K. High growth/Tier 1 VC: $250K. Workers AI capped at $50K | CDN, Workers AI (free models), R2 storage, Pages, zero egress | cloudflare.com/forstartups |
| Replicate | $1K–$10K | Startup program | GPU inference for open models | replicate.com |
Community Signal (Reddit, Apr 2026)
- Together AI — wide model selection, good price/quality. Few post-mortem reviews of accelerator, but tiers are clear
- Cerebras — complaints about limits/queues, signup pauses due to load. Great speed when access is available
- Groq — very high speed, good price/perf. Rate-limit and burst error discussions
- Fireworks — technically solid, discussions focus on pricing/stability rather than startup program
- Cloudflare — Workers AI has free tier with open models, startup credits up to $250K. Already on CF Pages = easy upgrade path
- xAI / Grok — no public startup program found (check manually)
- OpenRouter — no accelerator program, standard billing model
No-VC / Bootstrapped Friendly
| Provider | Credits | Requirements | Link |
|---|---|---|---|
| DeepInfra DeepStart | 1B free tokens | Startup application, no VC required | deepinfra.com/deepstart |
| NVIDIA Inception | Free membership → unlocks $100K AWS + $150K Nebius | 1+ dev, website, incorporated, <10 years. No equity, no VC | nvidia.com/startups |
| Mistral (Mistralship) | Up to $30K | Startup program application | mistral.ai |
| Microsoft Founders Hub | $1K–$150K | No VC required, business verification | startups.microsoft.com |
| DigitalOcean Hatch | Up to $100K | Raised <$10M, website, AI-native preferred | digitalocean.com/startups |
| OVHcloud | Up to €100K | Pre-seed/seed (START) or Series A+ (SCALE) | startup.ovhcloud.com |
| Scaleway | Up to €36K | EU-based or EU-focused | scaleway.com/startup-program |
| Intel Liftoff | Cloud credits (varies) | Early-stage to Series B, must have product | intel.com/liftoff |
| xAI | $25 signup + $150/mo | Opt-in data sharing (irreversible) | x.ai/api |
Credit Stacking Strategy
Apply in this order for maximum runway (no VC required):
- NVIDIA Inception (free) → unlocks $100K AWS + $150K Nebius partner credits
- Microsoft Founders Hub → up to $150K Azure (incl Azure OpenAI)
- DeepInfra DeepStart → 1B tokens free
- Mistral → $30K
- Modal → $25K (if seed-funded)
- Stack the rest: Cerebras $22.5K + Groq $10K + CF $5-250K
Total potential: $500K+ without VC funding.
Credits and requirements change — check provider pages for current terms.
Free Tiers (no application needed)
Start building today for $0.
| Provider | Free Tier | Models | Limits | Best for |
|---|---|---|---|---|
| Google AI Studio | Free | Gemini 2.5 Pro/Flash | 15 RPM, rate limited | Prototyping, testing |
| Groq | Free | Llama 3, Mixtral, Gemma | Rate limited | Fast inference (LPU, 840 tok/s) |
| Cerebras | 1M tok/day free | Llama, Qwen, GPT-OSS | 1M tokens/day, no card | Fastest inference (2,200 tok/s) |
| SambaNova | $5 credits (30 days) | Llama, Qwen, DeepSeek | Limited | Fast RDU inference |
| OpenRouter | 29 free models | GPT-OSS 120B, Nemotron 120B, Llama 70B, DeepSeek R1, Gemma 4 | 20 RPM, 200 req/day per model | Routing + fallback, no card |
| Cloudflare Workers AI | 10K neurons/day free | Llama, Mistral, Gemma, Phi, etc | 10K neurons/day, zero egress | Edge inference, startup credits up to $50K |
| NVIDIA NIM | Free preview | Llama, Mistral, CodeLlama, DeepSeek, etc | Rate limited, API key required | Code gen, inference preview |
| Hugging Face | Free inference | All open models | Rate limited, queue | Testing open models |
| Ollama | Free (local) | Any GGUF model | Your hardware | Privacy, offline |
| LM Studio | Free (local) | Any GGUF model | Your hardware | GUI for local models |
| MLX | Free (local) | Apple Silicon models | M1+ Mac | On-device, fast |
Inference Pricing ($/1M tokens, Apr 2026)
Prices dropped ~80% in 12 months. Format: input / output.
Frontier (closed)
| Provider | Model | Input | Output |
|---|---|---|---|
| OpenAI | GPT-5 | ~$5 | ~$15 |
| Anthropic | Claude Opus 4 | ~$15 | ~$75 |
| Anthropic | Claude Sonnet 4 | ~$3 | ~$15 |
| Gemini 2.5 Pro | ~$1.25 | ~$10 |
Open Models — 70B class
| Provider | Llama 3.3 70B | Qwen 2.5 72B | DeepSeek V3 | Speed |
|---|---|---|---|---|
| Novita AI | $0.14 / $0.40 | $0.38 / $0.40 | $0.27 / $0.40 | GPU |
| DeepInfra | $0.23 / $0.40 | $0.12 / $0.39 | $0.27 / $1.10 | GPU |
| Hyperbolic | $0.40 | $0.40 | $0.25 | GPU |
| Groq | $0.59 / $0.79 | — | — | 394 tok/s (LPU) |
| Together AI | $0.88 | — | $0.60 / $1.70 | GPU |
| OpenRouter | FREE | $0.23 | FREE (R1) | varies |
Open Models — 120B+ class
| Provider | GPT-OSS 120B | Nemotron 120B | Qwen3 235B |
|---|---|---|---|
| Groq | $0.15 / $0.60 | — | — |
| Together AI | $0.15 / $0.60 | — | — |
| Fireworks | $0.15 / $0.60 | — | — |
| Cerebras | $0.35 / $0.75 | — | $0.60 / $1.20 |
| OpenRouter | FREE | FREE | — |
Nemotron 3 Super 120B — Full Provider Comparison
Best open model for price/quality ratio (MoE: 12B active / 120B total, 1M context).
| Provider | Input $/1M | Output $/1M | 100M tokens (50/50) | tok/s | Source |
|---|---|---|---|---|---|
| DeepInfra | $0.10 | $0.50 | $30 | 459 | deepinfra.com |
| Hyperbolic | $0.30 | $0.30 | $30 | ~400 | hyperbolic.xyz |
| W&B | ~$0.15 | ~$0.55 | $35 | ~440 | artificialanalysis.ai |
| Baseten | ~$0.18 | ~$0.65 | $41 | 485 | baseten.co |
| Nebius | ~$0.20 | ~$0.70 | $45 | 464 | nebius.ai |
| Digital Ocean | $0.30 | $0.65 | $48 | — | NVIDIA NIM partners |
| Bitdeer AI | $0.20 | $0.80 | $50 | — | NVIDIA NIM partners |
| CoreWeave | $0.20 | $0.80 | $50 | — | NVIDIA NIM partners |
| Cloudflare | $0.50 | $1.50 | $100 | ~80 | CF Workers AI |
| OpenRouter | FREE | FREE | $0 | varies | rate limited (20 RPM) |
| NVIDIA NIM | FREE | FREE | $0 | 449 | 1000 credits, 40 RPM |
Winner: DeepInfra — $30 per 100M tokens, 459 tok/s. If speed critical: Baseten ($41, 485 tok/s). Cloudflare 3.3x more expensive and 6x slower — use only for edge/prototype.
Top Open-Source Models — Price/Quality Leaders (Apr 2026)
Best bang for buck across all open models and providers.
| Model | Params (active) | Best Provider | In $/1M | Out $/1M | 100M (50/50) | tok/s |
|---|---|---|---|---|---|---|
| Qwen3 235B (A22B) | 235B (22B) | DeepInfra | $0.12 | $0.39 | $26 | ~300 |
| Qwen3 32B | 32B | Novita AI | $0.10 | $0.45 | $28 | ~500 |
| Nemotron 120B (A12B) | 120B (12B) | DeepInfra | $0.10 | $0.50 | $30 | 459 |
| DeepSeek V3.2 | 671B MoE | DeepSeek API | $0.28 | $0.42 | $35 | ~200 |
| GPT-OSS 120B | 120B | Groq | $0.15 | $0.60 | $38 | ~400 |
| Qwen3 Coder 480B (A35B) | 480B (35B) | DeepInfra Turbo | $0.22 | $0.90 | $56 | 173 |
| Llama 4 Maverick | 400B MoE | Groq | $0.50 | $0.77 | $64 | ~300 |
| DeepSeek R1 (reasoning) | 671B MoE | Novita AI | $0.70 | $2.50 | $160 | — |
Best picks (at 80/20 input/output split — typical for agents):
| Use Case | Model | Provider | 100M cost (80/20) | Speed |
|---|---|---|---|---|
| Best overall | Qwen3 235B | DeepInfra | $17 | ~300 tok/s |
| Fastest powerful | Nemotron 120B | DeepInfra | $18 | 459 tok/s |
| Cheapest with cache | DeepSeek V3.2 | DeepSeek API | $11 | ~200 tok/s |
| Fastest overall | GPT-OSS 120B | Groq | $24 | 500 tok/s |
| Best for coding | Qwen3 Coder 480B | DeepInfra | $36 | 173 tok/s |
| Best reasoning | DeepSeek R1 | Novita AI | $106 | — |
| Budget $100 | Qwen3 235B | DeepInfra | ~600M tokens | — |
Model ID for API: Qwen/Qwen3-235B-A22B (OpenAI-compatible endpoint at api.deepinfra.com/v1/openai)
Nebius Token Factory — Batch = 50% Off
Nebius offers base (cheap) and fast (2x price) flavors. Batch inference = automatic 50% discount on base prices.
| Model | Nebius base in/out | Nebius batch in/out | 100M batch (80/20) |
|---|---|---|---|
| Qwen3 235B | $0.20 / $0.60 | $0.10 / $0.30 | $14 |
| Qwen3 32B | $0.10 / $0.30 | $0.05 / $0.15 | $7 |
| Qwen3 30B-A3B | $0.10 / $0.30 | $0.05 / $0.15 | $7 |
| Qwen3 Coder 480B | $0.40 / $1.80 | $0.20 / $0.90 | $34 |
| GPT-OSS 120B | $0.15 / $0.60 | $0.08 / $0.30 | $12 |
Nebius batch Qwen3 235B = $14/100M — cheaper than DeepInfra realtime ($17). Best for non-realtime workloads (research, bulk analysis, data processing).
Also notable: Qwen3 30B-A3B (only 3B active params, 30B total MoE) at $0.10/$0.30 — frontier quality at micro-model cost.
DeepSeek Cache Hack
DeepSeek auto-caches prompt prefixes. Cache hit = $0.03/M instead of $0.28/M (10x savings). For repeated system prompts this means ~$5 effective per 100M tokens instead of $35. Best for agents with stable system prompts.
For comparison: frontier models at 100M tokens (80/20 split)
| Model | 100M cost | vs Qwen3 235B |
|---|---|---|
| Qwen3 235B (open, DeepInfra) | $17 | baseline |
| Gemini 2.5 Flash (Google) | $132 | 8x more |
| Claude Sonnet 4 (Anthropic) | $540 | 32x more |
| GPT-5 (OpenAI) | $700 | 41x more |
| Claude Opus 4 (Anthropic) | $2,700 | 159x more |
Speed Leaderboard (custom hardware)
| Provider | Hardware | 8B tok/s | 70B tok/s | 120B tok/s | Note |
|---|---|---|---|---|---|
| Cerebras | WSE-3 (wafer, 4T transistors) | ~2,200 | ~1,500 | ~3,000 | 6x faster than Groq. OpenAI partner |
| Groq | LPU (SRAM ASIC) | 840 | 394 | 500 | Sub-100ms TTFT. Wide model selection |
| SambaNova | SN50 (RDU, Feb 2026) | ~988 | ~536 | — | 5x claim, 10M context support |
| GPU providers | H100/B200 | ~300 | ~150 | ~80 | DeepInfra, Together, Fireworks |
Groq Pricing (from LPU, exact Apr 2026)
| Model | TPS | Input $/1M | Output $/1M | 100M (50/50) |
|---|---|---|---|---|
| GPT-OSS 20B | 1,000 | $0.075 | $0.30 | $19 |
| Llama 4 Scout | 594 | $0.11 | $0.34 | $23 |
| GPT-OSS 120B | 500 | $0.15 | $0.60 | $38 |
| Qwen3 32B | 662 | $0.29 | $0.59 | $44 |
| Llama 3.3 70B | 394 | $0.59 | $0.79 | $69 |
| Llama 3.1 8B | 840 | $0.05 | $0.08 | $7 |
Best Value Picks
| Use Case | Provider | Why |
|---|---|---|
| Cheapest 70B | Novita AI or OpenRouter (free) | $0.14/M or $0 |
| Cheapest 120B+ | OpenRouter free (GPT-OSS, Nemotron) | $0 with rate limits |
| Fastest | Cerebras | 2-6x faster than competition |
| Fast + cheap | Groq | Good speed/price, $0.05-0.79 |
| Most models | Together AI or OpenRouter | 200+ models |
| DeepSeek R1 cheap | Novita AI | $0.70 / $2.50 |
| Free no card | Cerebras free tier | 1M tokens/day |
| 29 free models | OpenRouter | Including GPT-OSS, Nemotron, Llama, DeepSeek R1 |
Key Providers
| Provider | Differentiator | Link |
|---|---|---|
| OpenAI | Best ecosystem, GPT-5 | openai.com |
| Anthropic | Best coding, Claude | anthropic.com |
| 1M+ context, multimodal | ai.google.dev | |
| OpenRouter | Aggregator, 29 free models, fallback routing | openrouter.ai |
| Groq | LPU custom ASIC, very fast | groq.com |
| Cerebras | Wafer-scale, fastest inference | cerebras.ai |
| Together AI | Open models, fine-tuning, ATLAS engine | together.ai |
| Fireworks | Low latency, batch 50% discount | fireworks.ai |
| SambaNova | Custom RDU chip, fast | sambanova.ai |
| Novita AI | Cheapest prices across the board | novita.ai |
| Hyperbolic | Good prices, open models | hyperbolic.xyz |
| DeepInfra | H100/B200, competitive pricing | deepinfra.com |
| Nebius | Wide Qwen3 selection, batch 50% off, base+fast flavors | nebius.com/token-factory |
| Mistral AI | Own models (Large/Medium/Small), EU data residency, free experiment tier | mistral.ai |
| xAI (Grok) | Grok 4, 2M context window (largest), X/Twitter integration | x.ai/api |
| Cohere | RAG-optimized, Rerank model, multilingual Aya, 1K free calls/mo | cohere.com |
| SiliconFlow | Chinese provider, very cheap, free Qwen3-8B/DeepSeek-7B | siliconflow.com |
| Featherless AI | Flat-rate: $10/mo (15B), $200/mo (all 25K+ HuggingFace models, unlimited) | featherless.ai |
| GMI Cloud | H200 at $2.60/hr, some free models, 37% cheaper than hyperscalers | gmicloud.ai |
| Vast.ai | GPU marketplace, H100 from $1.87/hr, 3-5x cheaper than hyperscalers | vast.ai |
| RunPod | Serverless inference, scale-to-zero, H100 $2.69/hr flex | runpod.io |
| Modal | GPU cloud, BYO model, autoscaling | modal.com |
Open-Source Models (self-host)
Run locally or on your own infrastructure. Zero API cost.
| Model Family | Sizes | License | Best for |
|---|---|---|---|
| Llama 3.x (Meta) | 8B, 70B, 405B | Llama License | General purpose, coding |
| Qwen 2.5 (Alibaba) | 0.5B–72B | Apache 2.0 | Multilingual, coding |
| Mistral / Mixtral | 7B, 8x7B, 8x22B | Apache 2.0 | Fast, efficient |
| DeepSeek V3/R1 | 67B, 671B MoE | MIT | Reasoning, math, coding |
| Gemma 2 (Google) | 2B, 9B, 27B | Gemma License | Lightweight, on-device |
| Phi-3/4 (Microsoft) | 3.8B, 14B | MIT | Small, efficient |
| Command R+ (Cohere) | 104B | CC-BY-NC | RAG-optimized |
Local runners: Ollama, LM Studio, llama.cpp, vLLM, MLX (Apple Silicon)
Strategy: From $0 to Scale
Phase 1: Validation ($0)
- Google AI Studio (Gemini 2.5 Flash) for prototyping
- Ollama/MLX for local development
- OpenRouter free models for testing
Phase 2: MVP ($50-200/mo)
- Anthropic Claude for coding agent work
- OpenRouter for model routing + fallback
- Together AI for cheap open model inference
Phase 3: Growth (startup credits)
- Apply to 2-3 startup programs ($25K-100K total)
- Modal for GPU inference at scale
- Mix providers: expensive model for hard tasks, cheap for routine
Phase 4: Scale (optimize cost)
- Self-host open models for high-volume tasks
- Keep API providers for frontier capabilities
- Optimize token usage at every level
See Also
- token-efficient-web-requests — reduce API costs 80% with content negotiation
- infra-two-tools — infrastructure strategy (SST + Pulumi)
- privacy-as-architecture — when to self-host vs use API
- apple-on-device-ai — zero-cost on-device inference for iOS
Pricing Comparison Sites
| Site | What | Link |
|---|---|---|
| Artificial Analysis | Quality + speed + price benchmarks, industry standard | artificialanalysis.ai |
| Price Per Token | 300+ models, cheapest finder, coding/RAG leaderboards | pricepertoken.com |
| CostGoat | 309 APIs, quality/price/value scoring | costgoat.com |
| Infrabase | 53 providers, EU flagged, hosting type filters | infrabase.ai |
| Epoch AI | Historical price trend analysis (academic-grade) | epoch.ai |
Catalog growing. Last updated: 2026-04-10. Add providers as you discover them.