What faster inference does to the cost of a token

Price per million tokens (PPMT) vs. throughput, log scale.

NVIDIA H200 · GLM‑5.2 (744B MoE, ~40B active) @ FP8 · combined in+out tokens · provider price $1.18–$1.83 / 1M (OpenRouter, 8:1)

Series: Full TCO · CapEx-free TCO · OpenRouter price

Chart labels: cost · power elasticity · equipment life · equipment life with high wear

Readouts at 500 tokens/s: cost / 1M tokens · tokens per watt (values not captured)
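The relationship between throughput and cost per 1M tokens can be sketched with simple arithmetic. This is a minimal illustration, not the chart's actual model: it assumes a flat all-in hourly cost for the serving hardware, and the function name `cost_per_million_tokens` and the $3.60/hour figure are hypothetical values chosen only to make the numbers easy to follow. The power-elasticity and equipment-life effects named in the chart labels are not modeled.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Cost in USD to produce 1M tokens at a constant throughput.

    Assumes a flat hourly cost (hypothetical simplification); the chart's
    TCO model is not specified in the text and may differ.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd * 1_000_000 / tokens_per_hour


# Hypothetical hourly cost, for illustration only.
HOURLY_COST_USD = 3.60

for throughput in (250, 500, 1000, 2000):
    cost = cost_per_million_tokens(HOURLY_COST_USD, throughput)
    print(f"{throughput:>5} tokens/s -> ${cost:.2f} / 1M tokens")
```

Under this assumption, each doubling of throughput halves the cost per 1M tokens, so the curve is a straight line with slope -1 when both axes are logarithmic.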