HBM Memory layer

HBM3 vs HBM3e vs HBM4

A side-by-side comparison of bandwidth, capacity, and availability across HBM generations from SK hynix, Micron, and Samsung.

The engineer question
Which HBM generation matches my training workload economics?

Inputs

Training needs more HBM per parameter (optimizer state and gradients); inference needs less, but the KV cache grows with context length.

Total parameter count in billions (e.g. 70 for a 70B model).

Relative $/GB you'll tolerate; HBM3e is the reference (index 1.0).

Result

Recommended generation (shipping today): HBM3e

HBM3
  • Bandwidth / stack (6.4 GT/s/pin): 819 GB/s
  • Max capacity / stack (8-Hi / 12-Hi): 24 GB
  • Availability: Shipping, GA since 2022 (H100, MI300)
  • Stacks for footprint (~1,260 GB working set): 53 stacks

HBM3e
  • Bandwidth / stack (9.2–9.6 GT/s/pin): ~1.20 TB/s
  • Max capacity / stack (8-Hi / 12-Hi): 36 GB
  • Availability: Shipping, GA since 2024 (H200, B200, MI325X)
  • Stacks for footprint (~1,260 GB working set): 35 stacks

HBM4 (target)
  • Bandwidth / stack (up to 8 GT/s/pin, within the ~1.6–2.0 TB/s target range): 1.80 TB/s
  • Max capacity / stack (12-Hi / 16-Hi): 64 GB
  • Availability: Target / forward-looking, sampling 2025–26, volume 2026+
  • Stacks for footprint (~1,260 GB working set): 20 stacks
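
The stack counts above are the working set divided by per-stack capacity, rounded up. A minimal Python sketch of that arithmetic follows. It assumes the ~18 GB per billion parameters training heuristic listed under Assumptions and a 70B-parameter model, which together reproduce the 1,260 GB working set. The function and variable names are illustrative and are not taken from the tool itself.

```python
import math

# Max capacity per stack in GB, copied from the result rows above.
CAPACITY_GB = {"HBM3": 24, "HBM3e": 36, "HBM4 (target)": 64}

# Footprint heuristic in GB per billion parameters (see Assumptions).
GB_PER_B_PARAM = {"training": 18.0, "inference": 2.5}


def working_set_gb(params_b: float, workload: str) -> float:
    """Approximate HBM working set before sharding across GPUs."""
    return params_b * GB_PER_B_PARAM[workload]


def stacks_needed(footprint_gb: float, capacity_gb: float) -> int:
    """Whole stacks required to hold the footprint (capacity only)."""
    return math.ceil(footprint_gb / capacity_gb)


if __name__ == "__main__":
    footprint = working_set_gb(70, "training")  # assumed 70B training run
    for generation, capacity in CAPACITY_GB.items():
        count = stacks_needed(footprint, capacity)
        print(f"{generation}: {count} stacks for {footprint:,.0f} GB")
```

This counts capacity only. It ignores how many stacks a single accelerator package carries, which the Assumptions list as out of scope.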

Recommendation

Working set ≈ 1,260 GB before sharding across GPUs. HBM3e is the mid-2026 sweet spot: ~1.2 TB/s per stack and up to 36 GB per stack are shipping in volume (H200 / B200 / MI325X), so you avoid the supply and maturity risk of HBM4. Pay the HBM4 premium only if you specifically need more than 1.6 TB/s or 48–64 GB per stack and can wait for volume.
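
The rule of thumb in this recommendation can be written as a small decision function. The sketch below is one reading of the sentence above, not the tool's actual scoring, which also weighs the cost-tolerance input and a risk discount. The function name, parameter names, and the choice to treat any per-stack capacity need above HBM3e's 36 GB as an HBM4 requirement are assumptions made for illustration.

```python
HBM3E_MAX_GB_PER_STACK = 36      # from the HBM3e result rows
HBM4_BANDWIDTH_TRIGGER_TBPS = 1.6  # threshold quoted in the recommendation


def recommend_generation(need_tbps_per_stack: float,
                         need_gb_per_stack: float,
                         can_wait_for_volume: bool) -> str:
    """Hypothetical helper: pick a generation from per-stack requirements."""
    needs_hbm4 = (
        need_tbps_per_stack > HBM4_BANDWIDTH_TRIGGER_TBPS
        # Assumption: anything beyond HBM3e's 36 GB falls in HBM4's range.
        or need_gb_per_stack > HBM3E_MAX_GB_PER_STACK
    )
    if needs_hbm4 and can_wait_for_volume:
        return "HBM4 (target)"
    # Either HBM3e meets the need, or HBM4 is needed but not yet available
    # in volume; in both cases HBM3e is the best part shipping today.
    return "HBM3e"


if __name__ == "__main__":
    print(recommend_generation(1.2, 36, can_wait_for_volume=True))
    print(recommend_generation(1.8, 36, can_wait_for_volume=True))
    print(recommend_generation(1.8, 36, can_wait_for_volume=False))
```

When the function falls back to HBM3e because volume HBM4 is not available, the stated requirement is not actually met; a fuller version would report that gap rather than hide it.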

Assumptions

  • Source: JEDEC standards (HBM3 JESD238, HBM4 JESD270-4) plus public SK hynix / Samsung / Micron datasheets and roadmap slides, mid-2026. All figures are approximate or typical, not audited.
  • HBM3 ≈ 6.4 GT/s/pin × 1024 bits ÷ 8 bits/byte ≈ 819 GB/s; HBM3e ≈ 9.2–9.6 GT/s/pin on the same 1024-bit interface → ~1.18–1.23 TB/s. Vendor speed grades vary ±5–10%.
  • HBM4 is forward-looking: JESD270-4 doubles the interface to 2048 bits; the per-stack bandwidth target of ~1.6–2.0 TB/s and capacity of ~48–64 GB (12-Hi/16-Hi) are vendor ROADMAP TARGETS, not shipping silicon. At 8 GT/s/pin the 2048-bit interface works out to ~2.05 TB/s; the 1.80 TB/s used in the results sits inside the target range. HBM4 is treated with a 0.8 risk discount in the recommendation.
  • Footprint heuristic: ~18 GB per billion parameters for training (mixed precision, including optimizer state, gradients, and activations) and ~2.5 GB per billion parameters for inference (fp8 weights plus KV cache). Real usage swings ±50% with parallelism strategy, precision, batch size, and context length.
  • Cost index is RELATIVE (HBM3e = 1.0), reflecting the direction of mid-2026 contract/street $/GB, not a quoted price. HBM4 carries a launch premium that erodes over time.
  • Excluded: GPU/accelerator package limits (how many stacks a part actually carries, e.g. 6–8 on current top parts), interposer/CoWoS capacity, NVLink/Infinity Fabric topology, power and cooling, total $/token, and supply allocation. This sizes a per-stack spec fit only, not a full BOM.
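
The per-stack bandwidth figures in these assumptions come from one formula: pin rate in GT/s times interface width in bits, divided by 8 bits per byte. A short Python sketch of that calculation follows, using the pin rates and interface widths quoted above. The function name is illustrative, and the results are peak theoretical numbers rather than measured throughput.

```python
def stack_bandwidth_gbs(pin_rate_gtps: float, bus_width_bits: int) -> float:
    """Peak per-stack bandwidth in GB/s from pin rate and interface width."""
    return pin_rate_gtps * bus_width_bits / 8  # 8 bits per byte


# (label, pin rate in GT/s, interface width in bits) as quoted in the text.
CASES = [
    ("HBM3 @ 6.4 GT/s", 6.4, 1024),
    ("HBM3e @ 9.2 GT/s", 9.2, 1024),
    ("HBM3e @ 9.6 GT/s", 9.6, 1024),
    ("HBM4 @ 8.0 GT/s", 8.0, 2048),
]

if __name__ == "__main__":
    for label, rate, width in CASES:
        print(f"{label}: {stack_bandwidth_gbs(rate, width):.1f} GB/s")
```

The HBM4 case shows why a lower pin rate can still deliver more bandwidth than HBM3e: the doubled interface width more than offsets it.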
