Datacenter/GPU-cluster infra engineer benchmarking Colossus-class 100k+ GPU buildouts at the power/optical/networking layer.
Audience Profile
- Age / Experience: 8–18 years in infra/platform engineering
- Current role: Datacenter Infra / GPU-Cluster / Platform Engineer (hyperscaler / AI lab / neocloud)
- Top pain points:
  - Hundreds-of-MW interconnect queue + on-site generation timelines
  - Direct-to-chip cooling past ~50 kW/rack at 100k-GPU scale
  - Optical fabric + collective-comm bandwidth contention at cluster scale
- Top decision blockers:
  - Vendor lock on switch silicon / optical modules reducing leverage
  - Public sourcing on Colossus internals flagged low-confidence, not confirmed
What This Segment Needs
- Information: Confirmed (S-1/10-Q/10-K) site power, topology, and kW/rack figures — not press numbers — plus GPU-supplier confirmation language vs AI-prior claims.
- Tools: Comparators for datacenter-segment revenue, R&D %, customer concentration, and contracted-compute vs run-rate, mapped onto switch/optical vendor-lock exposure.
- Services: Comp benchmarks (no comp disclosed here) and interview scripts to probe on-call for 100k-GPU fabric incidents and on-site-generation permitting status.
Top 5 Companies for You (Fit Score)
| Rank | Company | Score | Why |
|------|---------|-------|-----|
| 1 | NVIDIA | 86/100 | DC revenue ~$115.2B FY2025 (+142% YoY, ~88% of $130.5B); GB200 NVL72 ramp Q4 FY2025 → Rubin (GTC Mar 2025); Spectrum-X Photonics co-packaged optics. But ~40–50% DC revenue from a few buyers; Colossus internals unconfirmed. |
| 2 | Anthropic | 84/100 | Run-rate ~5x $1B→>$5B in 2025; $13B Series F at $183B post (2025-09-02); ~$28B compute backing (MSFT $5B, NVIDIA $10B); tri-cloud infra. But $30B+ compute liabilities dwarf run-rate; not profitable. |
| 3 | Astera Labs | 83/100 | 10-Q (Q ended 2026-03-31): revenue $308.4M (+93.4% YoY), net income $80.3M; R&D $125.6M (40.7% of revenue); UALink/UCIe/PCIe Gen6/CXL/SerDes ladders open. But top-5 customers ~90%, China 29% export risk. |
| 4 | SpaceX | 81/100 | Sequenced IPO: confidential S-1 2026-04-01, public S-1 ~5/20, SPCX NASDAQ 6/12 at $1.75–2T. Colossus 100K+ Hopper-class Memphis, ~150MW+ site power w/ on-site gas turbines; Anthropic first external customer 2026-05-06. 1M-GPU ambition unconfirmed. |
| 5 | Coherent | 76/100 | 10-Q: Q3 revenue $1.81B (+20.6%), 9M $5.07B (+18.5%), net earnings ~3.9x; R&D $506.6M 9M (~10% rev) for hyperscaler AI optical transceivers; integrated InP fab→datacom. But operating cash only $10.1M, 2023/2025 restructuring. |
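The percentage shares quoted in the table can be re-derived from the dollar figures in the same rows. The sketch below is only a sanity check of that arithmetic; the helper name `pct` is hypothetical, and the inputs are copied from the table rather than pulled from any filing.

```python
def pct(part: float, whole: float) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 1)


# Inputs copied from the table rows; units are noted per line.
nvidia_dc_share = pct(115.2, 130.5)      # $B datacenter revenue vs $B total, FY2025
astera_rnd_share = pct(125.6, 308.4)     # $M R&D vs $M revenue, one quarter
coherent_rnd_share = pct(506.6, 5070.0)  # $M R&D vs $M revenue, nine months

print(nvidia_dc_share, astera_rnd_share, coherent_rnd_share)
```

The three results line up with the "~88%", "40.7%", and "~10%" figures in the table, so the rows are at least internally consistent.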
Deal-Breakers (Your Hard Preferences)
No hard preferences declared for this segment.
How to Evaluate Any Company in this Niche (Checklist)
- [ ] Check growth signals: latest 10-Q datacenter-segment revenue YoY (>40% = real AI pull) and explicit GPU-supplier/cluster confirmation language
- [ ] Check comp data: levels.fyi / Blind for Staff Infra/Platform TC bands (no comp disclosed here for any of the 5)
- [ ] Check learning signals: R&D as % of revenue (>15%) and named standards work (UALink, UCIe, PCIe Gen6, CXL, co-packaged optics)
- [ ] Check stability signals: customer concentration (top-5 >80% = fragile) and contracted-compute liabilities vs current run-rate
- [ ] Check culture signals: ratio of engineering reqs to sales reqs; in interview ask about on-call for 100k-GPU fabric/collective-comm incidents
- [ ] Check power reality: site MW confirmed in S-1/10-K vs "reported," and on-site generation (gas-turbine) permitting status
- [ ] Check vendor leverage: switch-silicon / optical-module lock-in and collective-comm bandwidth headroom at full cluster scale
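The numeric thresholds in the checklist (>40% YoY growth, R&D >15% of revenue, top-5 customers >80%) can be applied mechanically once the figures are pulled from a filing. The following is a minimal sketch under stated assumptions: the `Filing` dataclass and `screen` function are hypothetical names, the checklist gives no threshold for contracted compute vs run-rate so only the raw ratio is reported, and the example uses Astera Labs' total revenue growth as a stand-in for a datacenter-segment figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Filing:
    """Hypothetical container for figures read out of a 10-Q/10-K/S-1."""
    name: str
    revenue_yoy_pct: float             # ideally datacenter-segment YoY growth
    rnd_pct_of_revenue: float
    top5_customer_pct: float
    contracted_compute_usd_b: Optional[float] = None
    run_rate_usd_b: Optional[float] = None


def screen(f: Filing) -> dict:
    """Apply the checklist thresholds; report the liability ratio unjudged."""
    result = {
        "growth_pull": f.revenue_yoy_pct > 40.0,              # >40% = real AI pull
        "learning_signal": f.rnd_pct_of_revenue > 15.0,       # R&D >15% of revenue
        "fragile_concentration": f.top5_customer_pct > 80.0,  # top-5 >80% = fragile
        "liability_to_run_rate": None,
    }
    if f.contracted_compute_usd_b is not None and f.run_rate_usd_b:
        result["liability_to_run_rate"] = round(
            f.contracted_compute_usd_b / f.run_rate_usd_b, 1
        )
    return result


# Astera Labs figures from the table: +93.4% YoY, R&D 40.7%, top-5 ~90%.
astera = screen(Filing("Astera Labs", 93.4, 40.7, 90.0))

# Illustrative only: $30B liabilities against a $5B run-rate, the lower
# bounds quoted for Anthropic.
ratio_only = screen(Filing("illustrative", 0.0, 0.0, 0.0, 30.0, 5.0))

print(astera)
print(ratio_only["liability_to_run_rate"])
```

Astera passes the growth and learning checks while tripping the concentration flag, which is the same tension the table's "But" clause describes. The power, vendor-leverage, comp, and culture items have no numeric threshold in the checklist, so they are left out of the sketch.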
Reverse-Hype Watch
- Colossus capacity claims (100k→200k, "~1M-GPU" Memphis) are AI-prior/aspirational, explicitly NOT confirmed.
- NVIDIA customer concentration: ~40–50% DC revenue from a handful of buyers; a 2–3 customer capex pause hits revenue sharply.
- Anthropic contracted compute liabilities ($30B Azure + Google TPU + up to 1GW NVIDIA) dwarf the >$5B run-rate.
- Coherent capex +77% / inventory +48% built ahead of AI-optics orders, but operating cash collapsed to $10.1M with no customer-win signals supplied.
Under-reported for this segment: the exact things you benchmark — confirmed site-level power topology, on-site generation permitting timelines, real kW/rack cooling limits past 50 kW, and optical-fabric/collective-comm contention — almost never appear in S-1s or 10-Qs. Headlines optimize for GPU count; switch-silicon and optical-module vendor-lock leverage, which determines your real negotiating position, is effectively invisible in public sourcing.