
NVIDIA T4

Turing architecture · 16GB memory · 65 FP16 TFLOPS · 70W TDP

Cloud Pricing Today

GPU          Hourly price   7-day change
H100         $6.39/hr       1.2%
A100 80GB    $2.45/hr       0.5%
H200         $10.29/hr      0.8%
L40S         $1.28/hr       0.3%
T4           $0.24/hr       0.6%
L4           $0.45/hr       1.1%

About the NVIDIA T4

The NVIDIA T4 is one of the most widely available and affordable GPUs in the cloud, making it a popular entry point for AI inference workloads. Based on the Turing architecture with 16GB of GDDR6 memory, it delivers 65 FP16 TFLOPS while consuming only 70W — one of the lowest power draws of any data centre GPU.

Despite being built on a legacy architecture, the T4 remains relevant because of its exceptional availability and low cost. It is offered in virtually every cloud region by all the major providers, and its low power consumption makes it cost-effective for always-on inference services.

The T4 introduced INT8 and INT4 precision support for inference workloads, enabling quantised model serving that can significantly improve throughput. It can efficiently serve models up to approximately 7 billion parameters with appropriate quantisation.
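
As a rough illustration of why quantisation matters at this size, the sketch below compares the weight memory of a ~7-billion-parameter model at FP16, INT8, and INT4 against the T4's 16 GB of VRAM. The figures are back-of-the-envelope estimates only and ignore the KV cache, activations, and framework overhead.

```python
# Rough VRAM estimate for serving a ~7B-parameter model on a 16 GB T4.
# Illustrative only: real deployments also need room for the KV cache,
# activations, and framework overhead.
PARAMS = 7e9    # ~7 billion parameters
VRAM_GB = 16    # NVIDIA T4 memory

BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

for precision, nbytes in BYTES_PER_PARAM.items():
    weights_gb = PARAMS * nbytes / 1024**3
    headroom_gb = VRAM_GB - weights_gb
    print(f"{precision}: weights ~{weights_gb:.1f} GB, "
          f"~{headroom_gb:.1f} GB left for KV cache and overhead")
```

At FP16 the weights alone leave only a few gigabytes of headroom, which is why INT8 or INT4 quantisation is usually needed for comfortable 7B-class serving on this card.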

For budget-conscious deployments, the T4 offers strong value. While newer GPUs like the L4 provide better performance, the T4's extremely low cost per hour makes it the go-to choice for lightweight inference, development environments, and cost-optimised serving.

Memory (VRAM): 16 GB
FP16 Performance: 65 TFLOPS
Power (TDP): 70W
Architecture: Turing

Common Use Cases

Budget inference · Lightweight AI serving · Video transcoding · Entry-level ML

Key Facts

Manufacturer: NVIDIA
Architecture: Turing
Accelerator Type: GPU
Primary Use: Inference
Memory (VRAM): 16 GB
FP16 Performance: 65 TFLOPS
Thermal Design Power: 70W

Frequently Asked Questions

How much does a T4 GPU cost per hour?

The NVIDIA T4 is one of the most affordable cloud GPUs, with blended pricing typically between $0.15 and $0.35 per hour. Spot pricing can drop below $0.10/hr in some regions.

What models can run on a T4?

The T4's 16GB of memory supports models up to approximately 7B parameters with quantisation (INT8/INT4). It can run inference on models like Llama 2 7B, Mistral 7B, and similar-sized models. Larger models require multi-GPU setups or higher-memory GPUs.
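
As a concrete example of what quantised serving can look like, here is a minimal sketch that loads a ~7B model with INT8 weights on a single 16 GB GPU, assuming the Hugging Face transformers, accelerate, and bitsandbytes packages are installed. The model ID and generation settings are illustrative placeholders rather than a recommendation.

```python
# Minimal INT8 inference sketch for a ~7B model on a single 16 GB GPU.
# Assumes `transformers`, `accelerate`, and `bitsandbytes` are installed;
# the model ID below is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.1"  # any ~7B causal LM from the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # INT8 weights
    device_map="auto",          # place layers on the GPU automatically
    torch_dtype=torch.float16,  # FP16 activations
)

inputs = tokenizer("The NVIDIA T4 is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```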

T4 vs L4 — which is better?

The L4 is the T4's successor, offering ~1.9x FP16 performance (121 vs 65 TFLOPS), 50% more memory (24GB vs 16GB), and similar low power consumption (72W vs 70W). The L4 is better for most inference workloads, but the T4 is cheaper and more widely available.
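
One way to make that trade-off concrete is to compare throughput per rental dollar and per watt using the figures quoted on this page (65 vs 121 FP16 TFLOPS, 70W vs 72W TDP, and blended rates of roughly $0.24/hr vs $0.45/hr). The snippet below is a rough sketch; actual prices vary by provider, region, and commitment.

```python
# Rough FP16 throughput per rental dollar and per watt, using this page's
# figures. Treat the output as indicative only.
gpus = {
    #       FP16 TFLOPS, $/hr, TDP (W)
    "T4": (65, 0.24, 70),
    "L4": (121, 0.45, 72),
}

for name, (tflops, price, tdp) in gpus.items():
    print(f"{name}: {tflops / price:.0f} TFLOPS per $/hr, "
          f"{tflops / tdp:.2f} TFLOPS per watt")
```

On these particular numbers the two cards land within a few percent of each other in TFLOPS per rental dollar, while the L4 is well ahead in TFLOPS per watt, so the choice usually comes down to memory, availability, and per-GPU throughput.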


Investment Tool

Calculate NVIDIA T4 ROI

Estimate payback period, annual returns, and 3-year ROI with live Signwl pricing data.
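
For a feel of what such an estimate involves, here is a deliberately simplified sketch. The purchase price, utilisation, and overhead figures are placeholder assumptions rather than Signwl data, and a real calculation should use live pricing.

```python
# Highly simplified GPU ROI sketch. All inputs are placeholder assumptions
# for illustration; they are NOT live market data.
PURCHASE_PRICE = 1500.0   # hypothetical T4 acquisition cost ($)
RENTAL_RATE = 0.24        # $/hr, the blended rate quoted on this page
UTILISATION = 0.60        # fraction of hours actually rented out
OVERHEAD = 0.30           # power, hosting, and fees as a fraction of revenue

hours_per_year = 24 * 365
revenue = RENTAL_RATE * hours_per_year * UTILISATION
net_income = revenue * (1 - OVERHEAD)

payback_years = PURCHASE_PRICE / net_income
roi_3yr = (net_income * 3 - PURCHASE_PRICE) / PURCHASE_PRICE

print(f"Annual net income: ${net_income:,.0f}")
print(f"Payback period:    {payback_years:.1f} years")
print(f"3-year ROI:        {roi_3yr:.0%}")
```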

Track NVIDIA T4 pricing over time

Get access to historical pricing data, regional analysis, and custom alerts.