What faster inference does to the cost of a token

Price per million tokens (PPMT) vs. throughput, log scale.

NVIDIA H200 · GLM‑5.2 (744B MoE, ~40B active) @ FP8 · combined in+out tokens · provider price $1.18–$1.83 / 1M (OpenRouter, 8:1)

[Chart: cost per 1M tokens against throughput. Series: Full TCO, CapEx-free TCO, OpenRouter price. Labels: cost at 500 tokens/s, cost / 1M tokens, tokens per watt.]
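
The relationship the chart plots can be sketched with simple arithmetic: at a fixed hourly cost, cost per 1M tokens falls in inverse proportion to throughput. The sketch below is illustrative only. The function name and the hourly cost figure are hypothetical placeholders, not values from the chart, and it assumes the Full TCO and CapEx-free TCO curves differ only in the hourly cost fed in.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """USD cost of producing 1M tokens at a steady throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical all-in hourly cost of the serving hardware (placeholder value,
# not taken from the source).
HYPOTHETICAL_HOURLY_COST_USD = 3.60

for tps in (125, 250, 500, 1000):
    cost = cost_per_million_tokens(HYPOTHETICAL_HOURLY_COST_USD, tps)
    print(f"{tps:>5} tokens/s -> ${cost:.2f} / 1M tokens")
```

Doubling throughput halves the cost per 1M tokens under this model, which is why the curve appears as a straight line on log-log axes.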