NVIDIA H200
Hopper architecture · 141GB memory · 990 FP16 TFLOPS · 700W TDP
About the NVIDIA H200
The NVIDIA H200 is an evolution of the H100 that addresses one of its key limitations: memory capacity. By upgrading from 80GB HBM3 to 141GB HBM3e, the H200 provides 76% more memory while maintaining the same 990 FP16 TFLOPS of compute performance.
This memory upgrade is particularly significant for large language model inference, where the entire model must fit in GPU memory for efficient serving. Models that required multi-GPU setups on the H100 can run on fewer H200 GPUs, reducing both cost and latency.
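As a rough illustration of the sizing math, here is a minimal sketch assuming weights-only memory (no KV cache, activations, or framework overhead) and a hypothetical 120B-parameter model; the function and numbers are illustrative, not benchmarks:

```python
import math

# Minimal sizing sketch: how many GPUs are needed just to hold the weights.
# Assumptions: weights-only (ignores KV cache, activations, and framework
# overhead) and only a fraction of each GPU's memory is usable for weights.
def gpus_needed(params_b: float, bytes_per_param: float,
                gpu_mem_gb: float, usable_fraction: float = 0.9) -> int:
    weights_gb = params_b * bytes_per_param  # 1B params at N bytes ~= N GB
    return math.ceil(weights_gb / (gpu_mem_gb * usable_fraction))

# Hypothetical 120B-parameter model served in FP16 (2 bytes/param ~= 240 GB):
print(gpus_needed(120, 2, 80))   # H100 80GB  -> 4 GPUs
print(gpus_needed(120, 2, 141))  # H200 141GB -> 2 GPUs
```

Under these assumptions, the same model that needs four H100s fits on two H200s, which is where the cost and latency savings come from.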
The H200 also benefits from higher memory bandwidth — 4.8 TB/s compared to the H100's 3.35 TB/s — which improves performance on memory-bandwidth-bound operations common in transformer inference. For training workloads, the H200 offers modest improvements through better memory utilisation, though the compute performance remains identical to the H100.
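To see why bandwidth matters for inference, a back-of-envelope roofline bound helps: when autoregressive decode is memory-bound, every generated token requires streaming all model weights from HBM once, so tokens per second cannot exceed bandwidth divided by model size. A sketch under that standard approximation:

```python
# Upper bound on decode throughput when generation is bandwidth-bound:
# each generated token requires reading all model weights from HBM once.
def max_decode_tokens_per_s(params_b: float, bytes_per_param: float,
                            bandwidth_tb_s: float) -> float:
    weight_bytes_gb = params_b * bytes_per_param      # GB read per token
    return bandwidth_tb_s * 1000 / weight_bytes_gb    # (GB/s) / (GB/token)

# 70B-parameter model in FP16 (~140 GB of weights):
print(f"H100 bound: {max_decode_tokens_per_s(70, 2, 3.35):.0f} tok/s")  # ~24
print(f"H200 bound: {max_decode_tokens_per_s(70, 2, 4.8):.0f} tok/s")   # ~34
```

The ~43% gap in the bound mirrors the bandwidth ratio; real throughput is lower, but memory-bound decode scales roughly with bandwidth.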
In the cloud market, the H200 commands a premium over the H100 but can deliver better total cost of ownership for memory-intensive workloads by reducing the number of GPUs required.
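A toy calculation makes the TCO point concrete. Both hourly rates below are hypothetical placeholders rather than quotes (the H200 rate sits inside the $8–$14 range cited later on this page), and the GPU counts come from the sizing sketch above:

```python
# Illustrative cost comparison: fewer, pricier GPUs vs more, cheaper ones.
# Both hourly rates are hypothetical placeholders; substitute live prices.
h100_rate, h200_rate = 6.00, 10.00   # assumed $/GPU-hour, not real quotes
h100_count, h200_count = 4, 2        # GPUs needed for the same model (above)

print(f"H100 fleet: ${h100_rate * h100_count:.2f}/h")  # $24.00/h
print(f"H200 fleet: ${h200_rate * h200_count:.2f}/h")  # $20.00/h
```

Under these assumptions the H200 deployment is cheaper per hour despite the higher per-GPU rate.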
Key Facts
- Manufacturer: NVIDIA
- Architecture: Hopper
- Accelerator Type: GPU
- Primary Use: Training
- Memory (VRAM): 141 GB
- FP16 Performance: 990 TFLOPS
- Thermal Design Power: 700W
Frequently Asked Questions
How much does an H200 cost per hour?
Blended cloud pricing for the NVIDIA H200 typically ranges from $8–$14 per hour, depending on region and pricing model. It commands a significant premium over the H100 because of its higher memory capacity and bandwidth.
What is the difference between H100 and H200?
The H200 has 76% more memory (141GB vs 80GB) and 43% higher memory bandwidth (4.8 vs 3.35 TB/s) compared to the H100. Compute performance (990 FP16 TFLOPS) is identical. The H200 excels at memory-intensive workloads like large model inference.
Is the H200 better than the H100 for inference?
Yes, for large model inference the H200 is generally better due to its larger memory (141GB vs 80GB) and higher memory bandwidth. This allows it to serve larger models on fewer GPUs, reducing overall inference cost and latency.
Related Accelerators
Compare NVIDIA H200
vs NVIDIA H100: Same 990 TFLOPS compute but 76% more memory (141GB vs 80GB). The H200 is better for memory-bound workloads; the H100 is more widely available and cheaper per hour.
vs NVIDIA B200: The B200 offers ~1.8x the compute (1,800 TFLOPS) with 192GB of memory. The B200 is the better choice for training; the H200 offers a more established ecosystem.
Calculate NVIDIA H200 ROI
Estimate payback period, annual returns, and 3-year ROI with live Signwl pricing data.
Track NVIDIA H200 pricing over time
Get access to historical pricing data, regional analysis, and custom alerts.