
Inference-as-a-Service — Timeline

25 milestones, source-traced.

  1. Aug 5, 2024

    Funding: Groq raised $640 million in Series D at a $2.8 billion valuation, led by BlackRock Private Equity Partners with participation from Neuberger Berman, Type One Ventures, Cisco Investments, KDDI Open Innovation Fund III, and Samsung Catalyst Fund.

    funding-research
  2. Nov 1, 2024

    Partnership: Together AI announced a partnership with Hypertec to co-build a cluster of 36,000 Nvidia GB200 NVL72 GPUs.

    funding-research
  3. Dec 1, 2024

    Funding: Baseten closed a $150M Series D at a $2.15B valuation led by BOND, with participation from Conviction, CapitalG, Premji Invest, 01A, IVP, Spark, Greylock, Scribble Ventures, BoxGroup, and Kevin & Elizabeth Weil.

    funding-research
  4. Feb 1, 2025

    Product Launch: Together AI launched Together GPU Clusters powered by Nvidia Blackwell GPUs.

    funding-research
  5. Feb 1, 2025

    Partnership: Groq secured a $1.5 billion commitment from Saudi Arabia to expand AI chip distribution in the country, with a projected $500 million in 2025 revenue.

    funding-research
  6. Feb 20, 2025

    Funding: Together AI closed a $305 million Series B round led by General Catalyst with co-lead Prosperity7 at a $3.3 billion valuation.

    funding-research
  7. Sep 1, 2025

    Product Launch: Together AI launched Together Instant Clusters, enabling automated GPU cluster provisioning from a single node up to hundreds of GPUs.

    funding-research
  8. Sep 17, 2025

    Funding: Groq closed a $750 million Series E round at a $6.9 billion post-money valuation, led by Disruptive with participation from BlackRock, Neuberger Berman, and DTCP.

    funding-research
  9. Sep 30, 2025

    Funding: Cerebras Systems raised $1.1 billion in Series G led by Fidelity Management & Research Company and Atreides Management at a post-money valuation of $8.1 billion.

    funding-research
  10. Oct 1, 2025

    Funding: Fireworks AI raised $250M Series C co-led by Lightspeed Venture Partners, Index Ventures, and Evantic at a $4B post-money valuation.

    funding-research
  11. Oct 1, 2025

    Customer Win: Fireworks AI's customer base grew to over 10,000 companies by October 2025, up from ~1,000 at Series B.

    funding-research
  12. Dec 1, 2025

    Groq builds custom LPU inference silicon; NVIDIA struck a ~$20B non-exclusive LPU license and hired Groq's founder (Dec 2025).

    Knowledge base
  13. Feb 1, 2026

    Cerebras, a wafer-scale inference chipmaker, completed its IPO in 2026 (~$66B day-one market cap).

    Knowledge base
  14. Feb 3, 2026

    Funding: Cerebras Systems raised $1 billion in Series H led by Tiger Global at a post-money valuation of approximately $23 billion.

    funding-research
  15. Mar 1, 2026

    Funding: Baseten raised a $300M Series E at a $5B valuation led by IVP and CapitalG, with NVIDIA contributing approximately $150M alongside 01A, Altimeter, Battery Ventures, BOND, BoxGroup, Blackbird Ventures, Conviction, and Greylock.

    funding-research
  16. Apr 15, 2026

    Funding: Cerebras Systems secured an $850 million revolving credit facility arranged by Morgan Stanley, Citi, Barclays, UBS and others.

    funding-research
  17. May 1, 2026

    Baseten, an inference-serving platform, reached a ~$13B valuation after a ~$1.5B raise.

    Knowledge base
  18. May 1, 2026

    Fireworks AI, a fast open-model inference platform, was reportedly raising at a valuation of around $15B.

    Knowledge base
  19. May 1, 2026

    Together AI reached roughly $1B in annual recurring revenue serving open models via API.

    Knowledge base
  20. May 1, 2026

    Nebius's Token Factory offers managed inference; Nebius's Q1 2026 revenue grew ~684% year-over-year.

    Knowledge base
  21. Jun 1, 2026

    The inference-service cohort (Baseten, Fireworks, Together, Nebius) is among the best-funded categories in AI infrastructure.

    Knowledge base
  22. Jun 1, 2026

    Open-weight models (GLM, Qwen, DeepSeek, Llama) at roughly 1/6 the cost of frontier models drive inference-service economics.

    Knowledge base
  23. Jun 1, 2026

    Inference now accounts for ~2/3 of AI accelerator demand in 2026, up from ~1/2 in 2025 and ~1/3 in 2023 (Deloitte).

    Knowledge base
  24. Jun 1, 2026

    Inference-as-a-service decouples model serving from raw GPU rental: buyers pay per token, not per GPU-hour.

    Knowledge base
  25. Jun 1, 2026

    Customer Win: Cerebras announced a $10 billion compute deal with OpenAI to deliver 750 megawatts of AI compute capacity by 2028.

    funding-research
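
The per-token versus per-GPU-hour distinction in milestone 24 can be made concrete with a small calculation. The sketch below converts a GPU-hour rental price into an effective price per million tokens so it can be compared with a per-token quote. The function name, the parameters, and every number in the usage example are hypothetical illustrations, not figures from this timeline or from any provider.

```python
def cost_per_million_tokens(gpu_hour_price: float,
                            tokens_per_second: float,
                            utilization: float) -> float:
    """Effective dollars per 1M tokens when a GPU is rented by the hour.

    gpu_hour_price    -- rental price in dollars per GPU-hour (hypothetical)
    tokens_per_second -- sustained throughput of the deployment (hypothetical)
    utilization       -- fraction of each hour spent serving traffic, in (0, 1]
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    # Tokens actually served in one rented hour.
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hour_price / tokens_per_hour * 1_000_000


# Hypothetical numbers: $3.60 per GPU-hour, 1,000 tokens/s, 50% utilization.
rented = cost_per_million_tokens(3.60, 1000, 0.5)
print(f"{rented:.2f}")  # 2.00
```

Under these assumptions, idle time is the buyer's cost when renting by the hour: halving utilization doubles the effective per-token price, whereas a per-token price stays fixed regardless of how busy the hardware is.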

Milestones merged from 0 curated events, 10 verified facts (with observed dates), and 15 business signals from the last 24 months. Deduped by date + label; curated entries take precedence.
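
The merge rule stated above (dedupe by date + label, curated entries win) can be sketched as follows. This is a minimal illustration, not the page's actual implementation: the `Milestone` type, the `source` strings, and the choice to keep the first-seen entry when neither duplicate is curated are all assumptions, and `label` is treated as an opaque string because the page does not define it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Milestone:
    date: str    # ISO date string, e.g. "2025-09-17" (assumed format)
    label: str   # opaque dedupe label (its exact meaning is an assumption)
    text: str
    source: str  # "curated", "verified_fact", or "business_signal" (assumed names)


def merge_milestones(*groups):
    """Merge milestone lists, keeping one entry per (date, label) key.

    A curated entry replaces a non-curated duplicate. Otherwise the
    first entry seen for a key is kept (an assumption of this sketch).
    """
    best = {}
    for group in groups:
        for m in group:
            key = (m.date, m.label)
            kept = best.get(key)
            if kept is None:
                best[key] = m
            elif m.source == "curated" and kept.source != "curated":
                best[key] = m
    # ISO date strings sort chronologically as plain strings.
    return sorted(best.values(), key=lambda m: (m.date, m.label))


signals = [Milestone("2025-09-17", "Funding", "signal text", "business_signal")]
curated = [Milestone("2025-09-17", "Funding", "curated text", "curated")]
merged = merge_milestones(signals, curated)
print([m.text for m in merged])  # ['curated text']
```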
