HBM Memory layer
HBM3 vs HBM3e vs HBM4
Bandwidth, capacity, and availability cross-cut for HBM generations across SK hynix / Micron / Samsung.
The engineer question
Which HBM generation matches my training workload economics?
Result
- Recommended generationshipping today
- HBM3e
- HBM3 — bandwidth / stack6.4 Gt/s/pin
- 819 GB/s
- HBM3 — max capacity / stack8-Hi / 12-Hi
- 24 GB
- HBM3 — availabilityGA since 2022 (H100, MI300)
- Shipping
- HBM3 — stacks for footprint~1,260 GB working set
- 53 stacks
- HBM3e — bandwidth / stack9.6 Gt/s/pin
- 1.20 TB/s
- HBM3e — max capacity / stack8-Hi / 12-Hi
- 36 GB
- HBM3e — availabilityGA since 2024 (H200, B200, MI325X)
- Shipping
- HBM3e — stacks for footprint~1,260 GB working set
- 35 stacks
- HBM4 (target) — bandwidth / stack8 Gt/s/pin
- 1.80 TB/s
- HBM4 (target) — max capacity / stack12-Hi / 16-Hi
- 64 GB
- HBM4 (target) — availabilitySampling 2025–26, volume 2026+ (forward-looking)
- Target / forward-looking
- HBM4 (target) — stacks for footprint~1,260 GB working set
- 20 stacks
Recommendation
Working set ≈ 1,260 GB before sharding across GPUs. HBM3e is the mid-2026 sweet spot: ~1.2 TB/s/stack and up to 36 GB/stack are shipping in volume (H200 / B200 / MI325X), so you avoid the supply + maturity risk of HBM4. Pay the HBM4 premium only if you specifically need >1.6 TB/s or 48–64 GB/stack and can wait for volume.
Assumptions
- · Source: JEDEC standards (HBM3 JESD238, HBM4 JESD270-4) + public SK hynix / Samsung / Micron datasheets & roadmap slides, mid-2026. All figures are approximate / typical — not audited.
- · HBM3 ≈ 6.4 Gt/s/pin × 1024-bit = ~819 GB/s; HBM3e ≈ 9.2–9.6 Gt/s/pin → ~1.18–1.23 TB/s. Vendor speed grades vary ±5–10%.
- · HBM4 is forward-looking: JESD270-4 doubles the interface to 2048-bit; per-stack bandwidth target ~1.6–2.0 TB/s and capacity ~48–64 GB (12-Hi/16-Hi) are vendor ROADMAP TARGETS, not shipping silicon. Treated with an 0.8 risk discount in the recommendation.
- · Footprint heuristic: ~18 GB/B-param (training, mixed precision incl. optimizer+grads+activations) and ~2.5 GB/B-param (inference, fp8 weights + KV cache). Real usage swings ±50% with parallelism strategy, precision, batch and context length.
- · Cost index is RELATIVE (HBM3e = 1.0), reflecting mid-2026 contract/street $/GB direction, not a quoted price. HBM4 carries a launch premium that erodes over time.
- · Excluded: GPU/accelerator package limits (how many stacks a part actually carries, e.g. 6–8 on current top parts), interposer/CoWoS capacity, NVLink/Infinity-Fabric topology, power & cooling, total $/token, and supply allocation. This sizes a per-stack spec fit only, not a full BOM.
Worked example (default inputs)
Result
- Recommended generationshipping today
- HBM3e
- HBM3 — bandwidth / stack6.4 Gt/s/pin
- 819 GB/s
- HBM3 — max capacity / stack8-Hi / 12-Hi
- 24 GB
- HBM3 — availabilityGA since 2022 (H100, MI300)
- Shipping
- HBM3 — stacks for footprint~1,260 GB working set
- 53 stacks
- HBM3e — bandwidth / stack9.6 Gt/s/pin
- 1.20 TB/s
- HBM3e — max capacity / stack8-Hi / 12-Hi
- 36 GB
- HBM3e — availabilityGA since 2024 (H200, B200, MI325X)
- Shipping
- HBM3e — stacks for footprint~1,260 GB working set
- 35 stacks
- HBM4 (target) — bandwidth / stack8 Gt/s/pin
- 1.80 TB/s
- HBM4 (target) — max capacity / stack12-Hi / 16-Hi
- 64 GB
- HBM4 (target) — availabilitySampling 2025–26, volume 2026+ (forward-looking)
- Target / forward-looking
- HBM4 (target) — stacks for footprint~1,260 GB working set
- 20 stacks
Recommendation
Working set ≈ 1,260 GB before sharding across GPUs. HBM3e is the mid-2026 sweet spot: ~1.2 TB/s/stack and up to 36 GB/stack are shipping in volume (H200 / B200 / MI325X), so you avoid the supply + maturity risk of HBM4. Pay the HBM4 premium only if you specifically need >1.6 TB/s or 48–64 GB/stack and can wait for volume.
Assumptions
- · Source: JEDEC standards (HBM3 JESD238, HBM4 JESD270-4) + public SK hynix / Samsung / Micron datasheets & roadmap slides, mid-2026. All figures are approximate / typical — not audited.
- · HBM3 ≈ 6.4 Gt/s/pin × 1024-bit = ~819 GB/s; HBM3e ≈ 9.2–9.6 Gt/s/pin → ~1.18–1.23 TB/s. Vendor speed grades vary ±5–10%.
- · HBM4 is forward-looking: JESD270-4 doubles the interface to 2048-bit; per-stack bandwidth target ~1.6–2.0 TB/s and capacity ~48–64 GB (12-Hi/16-Hi) are vendor ROADMAP TARGETS, not shipping silicon. Treated with an 0.8 risk discount in the recommendation.
- · Footprint heuristic: ~18 GB/B-param (training, mixed precision incl. optimizer+grads+activations) and ~2.5 GB/B-param (inference, fp8 weights + KV cache). Real usage swings ±50% with parallelism strategy, precision, batch and context length.
- · Cost index is RELATIVE (HBM3e = 1.0), reflecting mid-2026 contract/street $/GB direction, not a quoted price. HBM4 carries a launch premium that erodes over time.
- · Excluded: GPU/accelerator package limits (how many stacks a part actually carries, e.g. 6–8 on current top parts), interposer/CoWoS capacity, NVLink/Infinity-Fabric topology, power & cooling, total $/token, and supply allocation. This sizes a per-stack spec fit only, not a full BOM.