NVIDIA H200
Hopper architecture · 141GB memory · 990 FP16 TFLOPS · 700W TDP
About the NVIDIA H200
The NVIDIA H200 is an evolution of the H100 that addresses one of its key limitations: memory capacity. By upgrading from 80GB HBM3 to 141GB HBM3e, the H200 provides 76% more memory while maintaining the same 990 FP16 TFLOPS of compute performance.
This memory upgrade is particularly significant for large language model inference, where the entire model must fit in GPU memory for efficient serving. Models that required multi-GPU setups on the H100 can run on fewer H200 GPUs, reducing both cost and latency.
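As a rough illustration of the sizing math, here is a minimal sketch assuming weights-only memory (no KV cache, activations, or framework overhead) and a hypothetical 120B-parameter model; the function and numbers are illustrative, not benchmarks:

```python
import math

# Minimal sizing sketch: how many GPUs are needed just to hold the weights.
# Assumptions: weights-only (ignores KV cache, activations, and framework
# overhead) and only a fraction of each GPU's memory is usable for weights.
def gpus_needed(params_b: float, bytes_per_param: float,
                gpu_mem_gb: float, usable_fraction: float = 0.9) -> int:
    weights_gb = params_b * bytes_per_param  # 1B params at N bytes ~= N GB
    return math.ceil(weights_gb / (gpu_mem_gb * usable_fraction))

# Hypothetical 120B-parameter model served in FP16 (2 bytes/param ~= 240 GB):
print(gpus_needed(120, 2, 80))   # H100 80GB  -> 4 GPUs
print(gpus_needed(120, 2, 141))  # H200 141GB -> 2 GPUs
```

Under these assumptions, the same model that needs four H100s fits on two H200s, which is where the cost and latency savings come from.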
The H200 also benefits from higher memory bandwidth — 4.8 TB/s compared to the H100's 3.35 TB/s — which improves performance on memory-bandwidth-bound operations common in transformer inference. For training workloads, the H200 offers modest improvements through better memory utilisation, though the compute performance remains identical to the H100.
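To see why bandwidth matters for inference, a back-of-envelope roofline bound helps: when autoregressive decode is memory-bound, every generated token requires streaming all model weights from HBM once, so tokens per second cannot exceed bandwidth divided by model size. A sketch under that standard approximation:

```python
# Upper bound on decode throughput when generation is bandwidth-bound:
# each generated token requires reading all model weights from HBM once.
def max_decode_tokens_per_s(params_b: float, bytes_per_param: float,
                            bandwidth_tb_s: float) -> float:
    weight_bytes_gb = params_b * bytes_per_param      # GB read per token
    return bandwidth_tb_s * 1000 / weight_bytes_gb    # (GB/s) / (GB/token)

# 70B-parameter model in FP16 (~140 GB of weights):
print(f"H100 bound: {max_decode_tokens_per_s(70, 2, 3.35):.0f} tok/s")  # ~24
print(f"H200 bound: {max_decode_tokens_per_s(70, 2, 4.8):.0f} tok/s")   # ~34
```

The ~43% gap in the bound mirrors the bandwidth ratio; real throughput is lower, but memory-bound decode scales roughly with bandwidth.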
In the cloud market, the H200 commands a premium over the H100 but can deliver better total cost of ownership for memory-intensive workloads by reducing the number of GPUs required.
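A toy calculation makes the TCO point concrete. Both hourly rates below are hypothetical placeholders rather than quotes (the H200 rate sits inside the $8–$14 range cited later on this page), and the GPU counts come from the sizing sketch above:

```python
# Illustrative cost comparison: fewer, pricier GPUs vs more, cheaper ones.
# Both hourly rates are hypothetical placeholders; substitute live prices.
h100_rate, h200_rate = 6.00, 10.00   # assumed $/GPU-hour, not real quotes
h100_count, h200_count = 4, 2        # GPUs needed for the same model (above)

print(f"H100 fleet: ${h100_rate * h100_count:.2f}/h")  # $24.00/h
print(f"H200 fleet: ${h200_rate * h200_count:.2f}/h")  # $20.00/h
```

Under these assumptions the H200 deployment is cheaper per hour despite the higher per-GPU rate.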
Key Facts
- Manufacturer: NVIDIA
- Architecture: Hopper
- Accelerator Type: GPU
- Primary Use: Training
- Memory (VRAM): 141 GB
- FP16 Performance: 990 TFLOPS
- Thermal Design Power: 700W
Frequently Asked Questions
How much does an H200 cost per hour?
Blended cloud pricing for the NVIDIA H200 typically ranges from $8–$14 per hour, depending on region and pricing model. It commands a significant premium over the H100 because of its higher memory capacity and bandwidth.
What is the difference between H100 and H200?
The H200 has 76% more memory (141GB vs 80GB) and 43% higher memory bandwidth (4.8 vs 3.35 TB/s) compared to the H100. Compute performance (990 FP16 TFLOPS) is identical. The H200 excels at memory-intensive workloads like large model inference.
Is the H200 better than the H100 for inference?
Yes, for large model inference the H200 is generally better due to its larger memory (141GB vs 80GB) and higher memory bandwidth. This allows it to serve larger models on fewer GPUs, reducing overall inference cost and latency.
Related Accelerators
Compare NVIDIA H200
vs NVIDIA H100: Same 990 TFLOPS compute but 76% more memory (141GB vs 80GB). The H200 is better for memory-bound workloads; the H100 is more widely available and cheaper per hour.
vs NVIDIA B200: The B200 offers ~1.8x the compute (1,800 TFLOPS) with 192GB of memory. The B200 is the better choice for training; the H200 offers a more established ecosystem.
Calculate NVIDIA H200 ROI
Estimate payback period, annual returns, and 3-year ROI with live Signwl pricing data.
Track NVIDIA H200 pricing over time
Get access to historical pricing data, regional analysis, and custom alerts.