Inference-as-a-Service — Timeline
25 milestones, source-traced.
Aug 5, 2024
Funding: Groq raised $640 million in Series D at a $2.8 billion valuation, led by BlackRock Private Equity Partners with participation from Neuberger Berman, Type One Ventures, Cisco Investments, KDDI Open Innovation Fund III, and Samsung Catalyst Fund.
funding-research
Nov 1, 2024
Partnership: Together AI announced a partnership with Hypertec to co-build a cluster of 36,000 Nvidia GB200 NVL72 GPUs.
funding-research
Dec 1, 2024
Funding: Baseten closed a $150M Series D at a $2.15B valuation led by BOND, with participation from Conviction, CapitalG, Premji Invest, 01A, IVP, Spark, Greylock, Scribble Ventures, BoxGroup, and Kevin & Elizabeth Weil.
funding-research
Feb 1, 2025
Product Launch: Together AI launched Together GPU Clusters powered by Nvidia Blackwell GPUs.
funding-research
Feb 1, 2025
Partnership: Groq secured a $1.5 billion commitment from Saudi Arabia to expand AI chip distribution in the country, with a projected $500 million in 2025 revenue.
funding-research
Feb 20, 2025
Funding: Together AI closed a $305 million Series B round led by General Catalyst with co-lead Prosperity7 at a $3.3 billion valuation.
funding-research
Sep 1, 2025
Product Launch: Together AI launched Together Instant Clusters, enabling automated GPU cluster provisioning from a single node up to hundreds of GPUs.
funding-research
Sep 17, 2025
Funding: Groq closed a $750 million Series E round at a $6.9 billion post-money valuation, led by Disruptive with participation from BlackRock, Neuberger Berman, and DTCP.
funding-research
Sep 30, 2025
Funding: Cerebras Systems raised $1.1 billion in Series G led by Fidelity Management & Research Company and Atreides Management at a post-money valuation of $8.1 billion.
funding-research
Oct 1, 2025
Funding: Fireworks AI raised $250M Series C co-led by Lightspeed Venture Partners, Index Ventures, and Evantic at a $4B post-money valuation.
funding-research
Oct 1, 2025
Customer Win: Fireworks AI's customer base grew to over 10,000 companies by October 2025, up from ~1,000 at Series B.
funding-research
Dec 1, 2025
Groq builds custom LPU inference silicon; NVIDIA struck a ~$20B non-exclusive LPU license and hired Groq's founder (Dec 2025).
Knowledge base
Feb 1, 2026
Cerebras, a wafer-scale inference chipmaker, completed its IPO in 2026 (~$66B day-one market cap).
Knowledge base
Feb 3, 2026
Funding: Cerebras Systems raised $1 billion in Series H led by Tiger Global at a post-money valuation of approximately $23 billion.
funding-research
Mar 1, 2026
Funding: Baseten raised a $300M Series E at a $5B valuation led by IVP and CapitalG, with NVIDIA contributing approximately $150M alongside 01A, Altimeter, Battery Ventures, BOND, BoxGroup, Blackbird Ventures, Conviction, and Greylock.
funding-research
Apr 15, 2026
Funding: Cerebras Systems secured an $850 million revolving credit facility arranged by Morgan Stanley, Citi, Barclays, UBS and others.
funding-research
May 1, 2026
Baseten, an inference-serving platform, reached a ~$13B valuation after a ~$1.5B raise.
Knowledge base
May 1, 2026
Fireworks AI, a fast open-model inference platform, was reportedly raising at a valuation of around $15B.
Knowledge base
May 1, 2026
Together AI reached roughly $1B in annual recurring revenue serving open models via API.
Knowledge base
May 1, 2026
Nebius's Token Factory offers managed inference; Nebius's Q1 2026 revenue grew ~684% year-over-year.
Knowledge base
Jun 1, 2026
The inference-service cohort (Baseten, Fireworks, Together, Nebius) is among the best-funded categories in AI infrastructure.
Knowledge base
Jun 1, 2026
Open-weight models (GLM, Qwen, DeepSeek, Llama), at roughly 1/6 the cost of frontier models, drive inference-service economics.
Knowledge base
Jun 1, 2026
Inference now accounts for ~2/3 of AI accelerator demand in 2026, up from ~1/2 in 2025 and ~1/3 in 2023 (Deloitte).
Knowledge base
Jun 1, 2026
Inference-as-a-service decouples model serving from raw GPU rental: buyers pay per token, not per GPU-hour.
Knowledge base
Jun 1, 2026
Customer Win: Cerebras announced a $10 billion compute deal with OpenAI to deliver 750 megawatts of AI compute capacity by 2028.
funding-research
Milestones merged from 0 curated events, 10 verified facts (with observed dates), and 15 business signals from the last 24 months. Deduped by date + label; curated entries take precedence.