Inference-as-a-Service — Market

Updated 6/25/2026

Verified claims and product-axis read for Inference-as-a-Service. Every fact below is sourced; every product judgment traces back to underlying signals.

Verified facts

Baseten, an inference-serving platform, reached a ~$13B valuation after a ~$1.5B raise. ↗ (financial)
Fireworks AI, a fast open-model inference platform, was reportedly raising at a ~$15B valuation. ↗ (financial)
Together AI reached ~$1B in annual recurring revenue serving open models via API. ↗ (financial)
Nebius's Token Factory offers managed inference; Nebius's Q1 2026 revenue grew ~684% year-over-year. ↗ (financial)
Cerebras, a wafer-scale inference chipmaker, completed its IPO in 2026 (~$66B day-one market cap). ↗ (financial)
Groq builds custom LPU inference silicon; NVIDIA struck a ~$20B non-exclusive LPU license and hired Groq's founder (Dec 2025). ↗ (historical_event)
Inference now accounts for ~2/3 of AI accelerator demand in 2026, up from ~1/2 in 2025 and ~1/3 in 2023 (Deloitte). ↗ (other)
Inference-as-a-service decouples model serving from raw GPU rental — buyers pay per token, not per GPU-hour. ↗ (other)
Open-weight models (GLM, Qwen, DeepSeek, Llama), priced at roughly 1/6 the cost of frontier models, drive inference-service economics. ↗ (other)
The inference-service cohort — Baseten, Fireworks, Together, Nebius — is among the best-funded categories in AI infrastructure. ↗ (other)
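
The per-token versus per-GPU-hour distinction noted above can be made concrete with a small calculation. The following is a minimal sketch in Python, not a pricing model from any vendor named here: every price, throughput, and utilization figure is a hypothetical placeholder, and the function names are illustrative only.

```python
def per_token_cost(tokens: int, price_per_million_tokens: float) -> float:
    """Cost when the buyer pays only for tokens actually processed."""
    return tokens / 1_000_000 * price_per_million_tokens


def gpu_rental_cost(
    tokens: int,
    tokens_per_second: float,
    price_per_gpu_hour: float,
    utilization: float,
) -> float:
    """Cost when the buyer rents a GPU and pays for idle time too.

    utilization is the fraction of billed hours spent serving tokens (0 < u <= 1).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    busy_hours = tokens / tokens_per_second / 3600
    billed_hours = busy_hours / utilization
    return billed_hours * price_per_gpu_hour


# Hypothetical workload and prices (assumptions, not sourced figures).
monthly_tokens = 500_000_000
service_bill = per_token_cost(monthly_tokens, price_per_million_tokens=0.50)
rental_bill = gpu_rental_cost(
    monthly_tokens,
    tokens_per_second=2000,
    price_per_gpu_hour=2.50,
    utilization=0.3,
)

print(f"per-token service: ${service_bill:,.2f}")
print(f"GPU rental at 30% utilization: ${rental_bill:,.2f}")
```

Under these assumed inputs the rental bill depends heavily on utilization, while the per-token bill does not. That sensitivity is what per-token pricing removes for the buyer.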

See the Products and Strategy modules for the full product list and forward-looking judgment.
