What does the Inference Cost Calculator tool output?

$ / 1M tokens; GPU hours / day; Recommended cluster shape

What inputs does the Inference Cost Calculator need?

Model size + variant; Throughput target (QPS); Hardware mix

Chips & Compute layer

Cost per million tokens for self-hosted inference across H100, H200, B200, and MI300 accelerators.

The engineer question
What does it cost to self-host a 70B model at 100k QPS?
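
The calculator itself is not yet available, so the following is only a minimal sketch of the arithmetic behind such an estimate, not the tool's actual method. All function names are hypothetical, and the hourly price, per-GPU throughput, and tokens-per-request figures are placeholder assumptions, not measured values for any listed chip.

```python
import math

def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_sec_per_gpu: float) -> float:
    """USD per 1M tokens for one GPU, assuming it runs at full utilization."""
    tokens_per_hour = tokens_per_sec_per_gpu * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000

def gpus_needed(qps: float, tokens_per_request: float, tokens_per_sec_per_gpu: float) -> int:
    """GPUs required to sustain the target QPS, rounded up to a whole GPU."""
    required_tokens_per_sec = qps * tokens_per_request
    return math.ceil(required_tokens_per_sec / tokens_per_sec_per_gpu)

def gpu_hours_per_day(num_gpus: int) -> int:
    """GPU hours consumed per day by a cluster that runs around the clock."""
    return num_gpus * 24

# Placeholder inputs (assumptions for illustration only).
price = 2.0          # USD per GPU hour
throughput = 1000.0  # effective tokens/sec per GPU for the chosen model
qps = 100.0          # target queries per second
tokens = 500.0       # average tokens generated per request

n = gpus_needed(qps, tokens, throughput)
print(f"$ / 1M tokens: {cost_per_million_tokens(price, throughput):.4f}")
print(f"GPUs needed: {n}")
print(f"GPU hours / day: {gpu_hours_per_day(n)}")
```

The sketch folds model size and hardware choice into a single effective per-GPU throughput number; a real estimate would also account for how many GPUs one model replica spans, batching, and utilization below 100%.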

Status · Coming soon

Inputs

Outputs