← Wiki

LLM Providers & Startup Credits

Where to get LLMs: free tiers, startup credit programs, open-source models. Organized from cheapest to most powerful.


Startup Credit Programs

Apply for credits when you have a registered company or VC backing.

Provider Credits Requirements What you get Link
Together AI $15K–$50K Build (<$5M raised): $15K + 3h eng. Scale ($5-10M): $30K + 6h. Grow (>$10M): $50K + 10h Open model inference + fine-tuning + GTM + VC network together.ai/startup-accelerator
Modal $25K Seed–Series A, VC-backed or raised >$1M GPU, inference, training, batch, sandboxes modal.com/startups
Cerebras $22.5K Startup deal Inference value + priority support + co-marketing cerebras.ai/startup-deal
Fireworks Varies AI-native startups Platform, tools, expertise fireworks.ai/startup-program
Groq $10K Groq for Startups eligibility Fast inference credits (LPU) console.groq.com/docs/billing-faqs
Google Cloud $2K–$350K Google for Startups Vertex AI, Gemini API, GCP credits cloud.google.com/startup
AWS $1K–$100K AWS Activate Bedrock (Claude, Llama, etc), SageMaker aws.amazon.com/activate
Azure $1K–$150K Microsoft for Startups Azure OpenAI, Cognitive Services startups.microsoft.com
OpenAI $2.5K+ (VC referral) VC partner referral required, varies by stage. Grove program for pre-idea founders (closed cohorts) GPT-4/5 API credits openai.com/startups
Anthropic Varies Claude for Startups — credits, resources, community Claude API credits claude.com/programs/startups
Cloudflare $5K–$250K Bootstrapped: $5K. Early: $25K. Seed ($1-5M): $100K. High growth/Tier 1 VC: $250K. Workers AI capped at $50K CDN, Workers AI (free models), R2 storage, Pages, zero egress cloudflare.com/forstartups
Replicate $1K–$10K Startup program GPU inference for open models replicate.com

Community Signal (Reddit, Apr 2026)

No-VC / Bootstrapped Friendly

Provider Credits Requirements Link
DeepInfra DeepStart 1B free tokens Startup application, no VC required deepinfra.com/deepstart
NVIDIA Inception Free membership → unlocks $100K AWS + $150K Nebius 1+ dev, website, incorporated, <10 years. No equity, no VC nvidia.com/startups
Mistral (Mistralship) Up to $30K Startup program application mistral.ai
Microsoft Founders Hub $1K–$150K No VC required, business verification startups.microsoft.com
DigitalOcean Hatch Up to $100K Raised <$10M, website, AI-native preferred digitalocean.com/startups
OVHcloud Up to €100K Pre-seed/seed (START) or Series A+ (SCALE) startup.ovhcloud.com
Scaleway Up to €36K EU-based or EU-focused scaleway.com/startup-program
Intel Liftoff Cloud credits (varies) Early-stage to Series B, must have product intel.com/liftoff
xAI $25 signup + $150/mo Opt-in data sharing (irreversible) x.ai/api

Credit Stacking Strategy

Apply in this order for maximum runway (no VC required):

  1. NVIDIA Inception (free) → unlocks $100K AWS + $150K Nebius partner credits
  2. Microsoft Founders Hub → up to $150K Azure (incl Azure OpenAI)
  3. DeepInfra DeepStart → 1B tokens free
  4. Mistral → $30K
  5. Modal → $25K (if seed-funded)
  6. Stack the rest: Cerebras $22.5K + Groq $10K + CF $5-250K

Total potential: $500K+ without VC funding.

Credits and requirements change — check provider pages for current terms.


Free Tiers (no application needed)

Start building today for $0.

Provider Free Tier Models Limits Best for
Google AI Studio Free Gemini 2.5 Pro/Flash 15 RPM, rate limited Prototyping, testing
Groq Free Llama 3, Mixtral, Gemma Rate limited Fast inference (LPU, 840 tok/s)
Cerebras 1M tok/day free Llama, Qwen, GPT-OSS 1M tokens/day, no card Fastest inference (2,200 tok/s)
SambaNova $5 credits (30 days) Llama, Qwen, DeepSeek Limited Fast RDU inference
OpenRouter 29 free models GPT-OSS 120B, Nemotron 120B, Llama 70B, DeepSeek R1, Gemma 4 20 RPM, 200 req/day per model Routing + fallback, no card
Cloudflare Workers AI 10K neurons/day free Llama, Mistral, Gemma, Phi, etc 10K neurons/day, zero egress Edge inference, startup credits up to $50K
NVIDIA NIM Free preview Llama, Mistral, CodeLlama, DeepSeek, etc Rate limited, API key required Code gen, inference preview
Hugging Face Free inference All open models Rate limited, queue Testing open models
Ollama Free (local) Any GGUF model Your hardware Privacy, offline
LM Studio Free (local) Any GGUF model Your hardware GUI for local models
MLX Free (local) Apple Silicon models M1+ Mac On-device, fast

Inference Pricing ($/1M tokens, Apr 2026)

Prices dropped ~80% in 12 months. Format: input / output.

Frontier (closed)

Provider Model Input Output
OpenAI GPT-5 ~$5 ~$15
Anthropic Claude Opus 4 ~$15 ~$75
Anthropic Claude Sonnet 4 ~$3 ~$15
Google Gemini 2.5 Pro ~$1.25 ~$10

Open Models — 70B class

Provider Llama 3.3 70B Qwen 2.5 72B DeepSeek V3 Speed
Novita AI $0.14 / $0.40 $0.38 / $0.40 $0.27 / $0.40 GPU
DeepInfra $0.23 / $0.40 $0.12 / $0.39 $0.27 / $1.10 GPU
Hyperbolic $0.40 $0.40 $0.25 GPU
Groq $0.59 / $0.79 394 tok/s (LPU)
Together AI $0.88 $0.60 / $1.70 GPU
OpenRouter FREE $0.23 FREE (R1) varies

Open Models — 120B+ class

Provider GPT-OSS 120B Nemotron 120B Qwen3 235B
Groq $0.15 / $0.60
Together AI $0.15 / $0.60
Fireworks $0.15 / $0.60
Cerebras $0.35 / $0.75 $0.60 / $1.20
OpenRouter FREE FREE

Nemotron 3 Super 120B — Full Provider Comparison

Best open model for price/quality ratio (MoE: 12B active / 120B total, 1M context).

Provider Input $/1M Output $/1M 100M tokens (50/50) tok/s Source
DeepInfra $0.10 $0.50 $30 459 deepinfra.com
Hyperbolic $0.30 $0.30 $30 ~400 hyperbolic.xyz
W&B ~$0.15 ~$0.55 $35 ~440 artificialanalysis.ai
Baseten ~$0.18 ~$0.65 $41 485 baseten.co
Nebius ~$0.20 ~$0.70 $45 464 nebius.ai
Digital Ocean $0.30 $0.65 $48 NVIDIA NIM partners
Bitdeer AI $0.20 $0.80 $50 NVIDIA NIM partners
CoreWeave $0.20 $0.80 $50 NVIDIA NIM partners
Cloudflare $0.50 $1.50 $100 ~80 CF Workers AI
OpenRouter FREE FREE $0 varies rate limited (20 RPM)
NVIDIA NIM FREE FREE $0 449 1000 credits, 40 RPM

Winner: DeepInfra — $30 per 100M tokens, 459 tok/s. If speed critical: Baseten ($41, 485 tok/s). Cloudflare 3.3x more expensive and 6x slower — use only for edge/prototype.

Top Open-Source Models — Price/Quality Leaders (Apr 2026)

Best bang for buck across all open models and providers.

Model Params (active) Best Provider In $/1M Out $/1M 100M (50/50) tok/s
Qwen3 235B (A22B) 235B (22B) DeepInfra $0.12 $0.39 $26 ~300
Qwen3 32B 32B Novita AI $0.10 $0.45 $28 ~500
Nemotron 120B (A12B) 120B (12B) DeepInfra $0.10 $0.50 $30 459
DeepSeek V3.2 671B MoE DeepSeek API $0.28 $0.42 $35 ~200
GPT-OSS 120B 120B Groq $0.15 $0.60 $38 ~400
Qwen3 Coder 480B (A35B) 480B (35B) DeepInfra Turbo $0.22 $0.90 $56 173
Llama 4 Maverick 400B MoE Groq $0.50 $0.77 $64 ~300
DeepSeek R1 (reasoning) 671B MoE Novita AI $0.70 $2.50 $160

Best picks (at 80/20 input/output split — typical for agents):

Use Case Model Provider 100M cost (80/20) Speed
Best overall Qwen3 235B DeepInfra $17 ~300 tok/s
Fastest powerful Nemotron 120B DeepInfra $18 459 tok/s
Cheapest with cache DeepSeek V3.2 DeepSeek API $11 ~200 tok/s
Fastest overall GPT-OSS 120B Groq $24 500 tok/s
Best for coding Qwen3 Coder 480B DeepInfra $36 173 tok/s
Best reasoning DeepSeek R1 Novita AI $106
Budget $100 Qwen3 235B DeepInfra ~600M tokens

Model ID for API: Qwen/Qwen3-235B-A22B (OpenAI-compatible endpoint at api.deepinfra.com/v1/openai)

Nebius Token Factory — Batch = 50% Off

Nebius offers base (cheap) and fast (2x price) flavors. Batch inference = automatic 50% discount on base prices.

Model Nebius base in/out Nebius batch in/out 100M batch (80/20)
Qwen3 235B $0.20 / $0.60 $0.10 / $0.30 $14
Qwen3 32B $0.10 / $0.30 $0.05 / $0.15 $7
Qwen3 30B-A3B $0.10 / $0.30 $0.05 / $0.15 $7
Qwen3 Coder 480B $0.40 / $1.80 $0.20 / $0.90 $34
GPT-OSS 120B $0.15 / $0.60 $0.08 / $0.30 $12

Nebius batch Qwen3 235B = $14/100M — cheaper than DeepInfra realtime ($17). Best for non-realtime workloads (research, bulk analysis, data processing).

Also notable: Qwen3 30B-A3B (only 3B active params, 30B total MoE) at $0.10/$0.30 — frontier quality at micro-model cost.

DeepSeek Cache Hack

DeepSeek auto-caches prompt prefixes. Cache hit = $0.03/M instead of $0.28/M (10x savings). For repeated system prompts this means ~$5 effective per 100M tokens instead of $35. Best for agents with stable system prompts.

For comparison: frontier models at 100M tokens (80/20 split)

Model 100M cost vs Qwen3 235B
Qwen3 235B (open, DeepInfra) $17 baseline
Gemini 2.5 Flash (Google) $132 8x more
Claude Sonnet 4 (Anthropic) $540 32x more
GPT-5 (OpenAI) $700 41x more
Claude Opus 4 (Anthropic) $2,700 159x more

Speed Leaderboard (custom hardware)

Provider Hardware 8B tok/s 70B tok/s 120B tok/s Note
Cerebras WSE-3 (wafer, 4T transistors) ~2,200 ~1,500 ~3,000 6x faster than Groq. OpenAI partner
Groq LPU (SRAM ASIC) 840 394 500 Sub-100ms TTFT. Wide model selection
SambaNova SN50 (RDU, Feb 2026) ~988 ~536 5x claim, 10M context support
GPU providers H100/B200 ~300 ~150 ~80 DeepInfra, Together, Fireworks

Groq Pricing (from LPU, exact Apr 2026)

Model TPS Input $/1M Output $/1M 100M (50/50)
GPT-OSS 20B 1,000 $0.075 $0.30 $19
Llama 4 Scout 594 $0.11 $0.34 $23
GPT-OSS 120B 500 $0.15 $0.60 $38
Qwen3 32B 662 $0.29 $0.59 $44
Llama 3.3 70B 394 $0.59 $0.79 $69
Llama 3.1 8B 840 $0.05 $0.08 $7

Best Value Picks

Use Case Provider Why
Cheapest 70B Novita AI or OpenRouter (free) $0.14/M or $0
Cheapest 120B+ OpenRouter free (GPT-OSS, Nemotron) $0 with rate limits
Fastest Cerebras 2-6x faster than competition
Fast + cheap Groq Good speed/price, $0.05-0.79
Most models Together AI or OpenRouter 200+ models
DeepSeek R1 cheap Novita AI $0.70 / $2.50
Free no card Cerebras free tier 1M tokens/day
29 free models OpenRouter Including GPT-OSS, Nemotron, Llama, DeepSeek R1

Key Providers

Provider Differentiator Link
OpenAI Best ecosystem, GPT-5 openai.com
Anthropic Best coding, Claude anthropic.com
Google 1M+ context, multimodal ai.google.dev
OpenRouter Aggregator, 29 free models, fallback routing openrouter.ai
Groq LPU custom ASIC, very fast groq.com
Cerebras Wafer-scale, fastest inference cerebras.ai
Together AI Open models, fine-tuning, ATLAS engine together.ai
Fireworks Low latency, batch 50% discount fireworks.ai
SambaNova Custom RDU chip, fast sambanova.ai
Novita AI Cheapest prices across the board novita.ai
Hyperbolic Good prices, open models hyperbolic.xyz
DeepInfra H100/B200, competitive pricing deepinfra.com
Nebius Wide Qwen3 selection, batch 50% off, base+fast flavors nebius.com/token-factory
Mistral AI Own models (Large/Medium/Small), EU data residency, free experiment tier mistral.ai
xAI (Grok) Grok 4, 2M context window (largest), X/Twitter integration x.ai/api
Cohere RAG-optimized, Rerank model, multilingual Aya, 1K free calls/mo cohere.com
SiliconFlow Chinese provider, very cheap, free Qwen3-8B/DeepSeek-7B siliconflow.com
Featherless AI Flat-rate: $10/mo (15B), $200/mo (all 25K+ HuggingFace models, unlimited) featherless.ai
GMI Cloud H200 at $2.60/hr, some free models, 37% cheaper than hyperscalers gmicloud.ai
Vast.ai GPU marketplace, H100 from $1.87/hr, 3-5x cheaper than hyperscalers vast.ai
RunPod Serverless inference, scale-to-zero, H100 $2.69/hr flex runpod.io
Modal GPU cloud, BYO model, autoscaling modal.com

Open-Source Models (self-host)

Run locally or on your own infrastructure. Zero API cost.

Model Family Sizes License Best for
Llama 3.x (Meta) 8B, 70B, 405B Llama License General purpose, coding
Qwen 2.5 (Alibaba) 0.5B–72B Apache 2.0 Multilingual, coding
Mistral / Mixtral 7B, 8x7B, 8x22B Apache 2.0 Fast, efficient
DeepSeek V3/R1 67B, 671B MoE MIT Reasoning, math, coding
Gemma 2 (Google) 2B, 9B, 27B Gemma License Lightweight, on-device
Phi-3/4 (Microsoft) 3.8B, 14B MIT Small, efficient
Command R+ (Cohere) 104B CC-BY-NC RAG-optimized

Local runners: Ollama, LM Studio, llama.cpp, vLLM, MLX (Apple Silicon)


Strategy: From $0 to Scale

Phase 1: Validation ($0)

Phase 2: MVP ($50-200/mo)

Phase 3: Growth (startup credits)

Phase 4: Scale (optimize cost)


See Also


Pricing Comparison Sites

Site What Link
Artificial Analysis Quality + speed + price benchmarks, industry standard artificialanalysis.ai
Price Per Token 300+ models, cheapest finder, coding/RAG leaderboards pricepertoken.com
CostGoat 309 APIs, quality/price/value scoring costgoat.com
Infrabase 53 providers, EU flagged, hosting type filters infrabase.ai
Epoch AI Historical price trend analysis (academic-grade) epoch.ai

Catalog growing. Last updated: 2026-04-10. Add providers as you discover them.