What is VRAM? GPU Memory Explained
VRAM (Video RAM) is the dedicated memory on a GPU, used to store model weights, activations, and data during AI training and inference. VRAM capacity determines the maximum size of AI model a GPU can handle. According to Signwl data, cloud accelerators currently range from 8GB (Inferentia) to 288GB (GB300) of VRAM, with the most popular training GPU, the H100, offering 80GB.
Why VRAM Matters for AI
During AI model training, the entire model — its weights, gradients, optimiser states, and a batch of training data — must fit in GPU memory. If a model exceeds available VRAM, you either need to split it across multiple GPUs (which adds complexity and cost) or use memory-saving techniques that slow down training.
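To make this concrete, here is a minimal sketch of the arithmetic, assuming FP16 values throughout and an Adam-style optimiser with two states per parameter (an illustrative rule of thumb, not Signwl data; activation and batch memory are ignored):

```python
# Rough training VRAM estimate: weights + gradients (same size as the weights)
# + Adam optimiser states (two extra values per parameter), all held in FP16.
# Keeping optimiser states in FP32 or adding activation memory pushes this higher.
def training_vram_gb(params_billions: float, bytes_per_value: float = 2) -> float:
    weights = params_billions * bytes_per_value
    gradients = params_billions * bytes_per_value
    adam_states = 2 * params_billions * bytes_per_value  # momentum + variance
    return weights + gradients + adam_states

print(f"7B model:  ~{training_vram_gb(7):.0f}GB")   # ~56GB, vs ~14GB to serve it
print(f"70B model: ~{training_vram_gb(70):.0f}GB")  # ~560GB, far beyond any single GPU
```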
For inference, the model weights must reside in VRAM to serve predictions efficiently. A 70 billion parameter model in FP16 (2 bytes per parameter) requires approximately 140GB of VRAM, which exceeds the capacity of a single H100 (80GB) but fits on an H200 (141GB).
Types of GPU Memory
Data centre GPUs use two main types of memory:
**HBM (High Bandwidth Memory)** — used in training GPUs like the H100 (80GB HBM3), A100 (80GB HBM2e), and MI300X (192GB HBM3). HBM provides extremely high bandwidth (2-5 TB/s) essential for training large models.
**GDDR (Graphics DDR)** — used in inference GPUs like the L40S (48GB GDDR6), A10G (24GB GDDR6), and T4 (16GB GDDR6). GDDR offers lower bandwidth but at lower cost, making it suitable for inference workloads that are less bandwidth-sensitive.
How Much VRAM Do You Need?
A rough guide for model inference in FP16:
- 7B parameter model: ~14GB VRAM → fits on T4 (16GB) or larger
- 13B parameter model: ~26GB VRAM → needs L40S (48GB) or A100 (40/80GB)
- 70B parameter model: ~140GB VRAM → needs H200 (141GB) or multi-GPU setup
- 405B parameter model: ~810GB VRAM → requires multi-GPU cluster
Quantisation (INT8, INT4) can halve or quarter these requirements, enabling larger models on smaller GPUs.
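The figures above follow directly from multiplying the parameter count by the bytes per parameter at each precision. Here is a minimal sketch of that rule of thumb (the function and precision table are illustrative; real deployments need extra headroom for activations and the KV cache):

```python
# Rule-of-thumb VRAM for the weights alone: parameters x bytes per parameter.
BYTES_PER_PARAM = {"fp16": 2, "int8": 1, "int4": 0.5}

def weights_vram_gb(params_billions: float, precision: str = "fp16") -> float:
    """Approximate VRAM (GB) occupied by the model weights."""
    return params_billions * BYTES_PER_PARAM[precision]

for size in (7, 13, 70, 405):
    print(f"{size}B model: FP16 ~{weights_vram_gb(size):.0f}GB, "
          f"INT8 ~{weights_vram_gb(size, 'int8'):.0f}GB, "
          f"INT4 ~{weights_vram_gb(size, 'int4'):.0f}GB")
```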
Frequently Asked Questions
How much VRAM do I need for AI?
It depends on your model size. For inference: 7B models need ~14GB, 13B models need ~26GB, 70B models need ~140GB (in FP16). Quantisation can reduce these by 2-4x. For training, you need 2-4x more VRAM than the model size due to gradients and optimiser states.
What is the difference between VRAM and RAM?
VRAM is memory on the GPU, directly accessible by GPU cores at extremely high bandwidth. System RAM is on the motherboard, accessible by the CPU. AI models must be loaded into VRAM for efficient GPU processing — system RAM is too slow for GPU compute.
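As an illustration of the difference, the PyTorch sketch below (assuming a CUDA GPU is available; the layer sizes are arbitrary) builds a model in system RAM, copies its weights into VRAM, and reports how much GPU memory they occupy:

```python
import torch
import torch.nn as nn

# The model's parameters live in system RAM when constructed on the CPU.
model = nn.Sequential(nn.Linear(4096, 4096), nn.Linear(4096, 4096))

if torch.cuda.is_available():
    model = model.to("cuda")  # copies the weights into the GPU's VRAM
    used_mb = torch.cuda.memory_allocated() / 1024**2
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"Weights now occupy ~{used_mb:.0f}MB of {total_gb:.0f}GB total VRAM")
```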
Which GPU has the most VRAM?
According to Signwl data, the NVIDIA GB300 offers the most VRAM at 288GB of HBM3e. The AMD MI300X offers 192GB, and the NVIDIA H200 offers 141GB. Among widely available cloud GPUs, the H100 offers 80GB.
Explore Signwl's GPU Data
Live pricing, regional analysis, and comparisons for 39 GPU and AI accelerator types.