NVIDIA T4
Turing architecture · 16GB memory · 65 FP16 TFLOPS · 70W TDP
About the NVIDIA T4
The NVIDIA T4 is one of the most widely available and affordable GPUs in the cloud, making it a popular entry point for AI inference workloads. Based on the Turing architecture with 16GB of GDDR6 memory, it delivers 65 FP16 TFLOPS while consuming only 70W — one of the lowest power draws of any data centre GPU.
Despite being built on a legacy architecture, the T4 remains relevant because of its exceptional availability and low cost. It is available in virtually every cloud region from all major providers, and its low power consumption makes it cost-effective for always-on inference services.
The T4 introduced INT8 and INT4 precision support for inference workloads, enabling quantised model serving that can significantly improve throughput. It can efficiently serve models up to approximately 7 billion parameters with appropriate quantisation.
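The claim that a ~7B-parameter model needs quantisation to fit in 16GB can be checked with a back-of-envelope weight-memory estimate. This is an illustrative sketch: the bytes-per-parameter figures are standard for each precision, but the 20% overhead factor for KV cache, activations, and runtime is an assumption, not a measured value.

```python
# Rough VRAM estimate for serving a ~7B-parameter model at different precisions.
# OVERHEAD is an assumed ~20% allowance for KV cache, activations, and runtime.

PARAMS = 7e9  # ~7B parameters (Llama 2 7B / Mistral 7B class)
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}
OVERHEAD = 1.2
T4_VRAM_GB = 16

for precision, nbytes in BYTES_PER_PARAM.items():
    gb = PARAMS * nbytes * OVERHEAD / 1e9
    verdict = "fits" if gb <= T4_VRAM_GB else "does not fit"
    print(f"{precision}: ~{gb:.1f} GB -> {verdict} in {T4_VRAM_GB} GB")
```

Under these assumptions, a 7B model at FP16 (~16.8 GB) just exceeds the T4's 16GB, while INT8 (~8.4 GB) and INT4 (~4.2 GB) fit comfortably, which is why quantised serving is the standard approach on this card.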
For budget-conscious deployments, the T4 offers strong value. While newer GPUs like the L4 provide better performance, the T4's extremely low cost per hour makes it the go-to choice for lightweight inference, development environments, and cost-optimised serving.
Key Facts
- Manufacturer: NVIDIA
- Architecture: Turing
- Accelerator Type: GPU
- Primary Use: Inference
- Memory (VRAM): 16 GB
- FP16 Performance: 65 TFLOPS
- Thermal Design Power: 70W
Frequently Asked Questions
How much does a T4 GPU cost per hour?
The NVIDIA T4 is one of the most affordable cloud GPUs, with blended pricing typically between $0.15–$0.35 per hour. Spot pricing can drop below $0.10/hr in some regions.
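Those hourly rates translate into a modest always-on bill. The sketch below assumes a 730-hour month and uses the illustrative price range quoted above, not live provider pricing.

```python
# Back-of-envelope monthly cost for an always-on T4 instance.
# Hourly rates are the illustrative range quoted above, not live prices.

HOURS_PER_MONTH = 730  # average hours in a month

def monthly_cost(hourly_rate: float) -> float:
    """Cost in dollars of running one instance non-stop for a month."""
    return hourly_rate * HOURS_PER_MONTH

low, high = 0.15, 0.35
print(f"~${monthly_cost(low):.0f} to ~${monthly_cost(high):.0f} per month")
```

At these assumed rates, an always-on T4 lands in roughly the $110-$255/month range, which is what makes it attractive for continuously running inference endpoints.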
What models can run on a T4?
The T4's 16GB of memory supports models up to approximately 7B parameters with quantisation (INT8/INT4). It can run inference on models like Llama 2 7B, Mistral 7B, and similar-sized models. Larger models require multi-GPU setups or higher-memory GPUs.
T4 vs L4 — which is better?
The L4 is the T4's successor, offering ~1.9x FP16 performance (121 vs 65 TFLOPS), 50% more memory (24GB vs 16GB), and similar low power consumption (72W vs 70W). The L4 is better for most inference workloads, but the T4 is cheaper and more widely available.
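The ratios quoted above follow directly from the spec-sheet numbers, as this small check shows (the figures are the ones stated in this page, not independently measured):

```python
# Spec-sheet comparison of T4 vs L4 using the figures quoted above.
specs = {
    "T4": {"fp16_tflops": 65, "vram_gb": 16, "tdp_w": 70},
    "L4": {"fp16_tflops": 121, "vram_gb": 24, "tdp_w": 72},
}

perf_ratio = specs["L4"]["fp16_tflops"] / specs["T4"]["fp16_tflops"]
mem_ratio = specs["L4"]["vram_gb"] / specs["T4"]["vram_gb"]

print(f"L4/T4 FP16 ratio:   {perf_ratio:.2f}x")  # ~1.86x, the "~1.9x" quoted
print(f"L4/T4 memory ratio: {mem_ratio:.2f}x")   # 1.50x, i.e. 50% more memory
```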
Related Accelerators
Compare NVIDIA T4
L4 delivers ~1.9x performance (121 vs 65 TFLOPS) with 50% more memory (24GB vs 16GB) at similar power. L4 is better for most workloads; T4 is cheaper and more widely available.
A10G delivers 1.9x the performance (125 vs 65 TFLOPS) with 50% more memory (24GB vs 16GB). A10G costs roughly 2x per hour but handles larger models and higher throughput.