Inference-as-a-Service — Market
Updated 6/25/2026
Verified claims and product-axis read for Inference-as-a-Service. Every fact below is sourced; every product judgment traces back to underlying signals.
Verified facts
- Baseten, an inference-serving platform, reached a ~$13B valuation after a ~$1.5B raise. ↗ (financial)
- Fireworks AI, a fast open-model inference platform, was reportedly raising at a ~$15B valuation. ↗ (financial)
- Together AI reached ~$1B in annual recurring revenue serving open models via API. ↗ (financial)
- Nebius's Token Factory offers managed inference; Nebius's Q1 2026 revenue grew ~684% year-over-year. ↗ (financial)
- Cerebras, a wafer-scale inference chipmaker, completed its IPO in 2026 (~$66B day-one market cap). ↗ (financial)
- Groq builds custom LPU inference silicon; NVIDIA struck a ~$20B non-exclusive LPU license and hired Groq's founder (Dec 2025). ↗ (historical_event)
- Inference now accounts for ~2/3 of AI accelerator demand in 2026, up from ~1/2 in 2025 and ~1/3 in 2023 (Deloitte). ↗ (other)
- Inference-as-a-service decouples model serving from raw GPU rental — buyers pay per token, not per GPU-hour. ↗ (other)
- Open-weight models (GLM, Qwen, DeepSeek, Llama) at roughly 1/6 the cost of frontier models drive inference-service economics. ↗ (other)
- The inference-service cohort — Baseten, Fireworks, Together, Nebius — is among the best-funded categories in AI infrastructure. ↗ (other)
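The per-token billing model and the roughly 1/6 open-weight cost ratio noted above can be illustrated with a small arithmetic sketch. All prices and volumes passed to these functions are hypothetical placeholders, not figures from any provider; only the 1/6 ratio comes from the list above, and the function names are illustrative.

```python
# Minimal sketch comparing per-token billing with per-GPU-hour rental.
# All prices used with these helpers are hypothetical; only the ~1/6
# open-weight-to-frontier cost ratio is taken from the facts above.

OPEN_WEIGHT_COST_RATIO = 1 / 6  # rough ratio cited in the fact list


def per_token_cost(tokens: int, price_per_million_tokens: float) -> float:
    """Cost of serving `tokens` tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million_tokens


def gpu_rental_cost(hours: float, price_per_gpu_hour: float, gpus: int = 1) -> float:
    """Cost of renting `gpus` GPUs for `hours`, regardless of utilization."""
    return hours * price_per_gpu_hour * gpus


def breakeven_tokens(hours: float, price_per_gpu_hour: float,
                     price_per_million_tokens: float, gpus: int = 1) -> float:
    """Token volume at which per-token billing equals the GPU rental bill."""
    rental = gpu_rental_cost(hours, price_per_gpu_hour, gpus)
    return rental / price_per_million_tokens * 1_000_000


def open_weight_price(frontier_price_per_million: float) -> float:
    """Estimated open-weight price, assuming the rough 1/6 ratio holds."""
    return frontier_price_per_million * OPEN_WEIGHT_COST_RATIO
```

Below the break-even volume, a buyer pays less on per-token billing because idle GPU time is not billed; above it, renting capacity directly may be cheaper, assuming the buyer can keep the hardware utilized.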
See the Products and Strategy modules for the full product list and forward-looking judgment.