What are TFLOPS? GPU Performance Explained

TFLOPS (Tera Floating-Point Operations Per Second) measure a GPU's computational throughput — how many trillions of mathematical operations it can perform each second. FP16 TFLOPS is the standard benchmark for AI workloads. According to Signwl data, cloud GPUs range from 11 TFLOPS (P4) to 2,250 TFLOPS (GB300), with the popular H100 delivering 990 FP16 TFLOPS.

Understanding Floating Point Precision

GPUs perform calculations at different precision levels, each suited to different tasks:

- **FP32 (32-bit)** — full precision, used for scientific computing and some training
- **FP16 (16-bit)** — half precision, the standard for AI training and inference
- **BF16 (Brain Float 16)** — alternative half precision, popular for training
- **FP8 (8-bit)** — introduced with H100's Transformer Engine, doubles throughput vs FP16
- **INT8/INT4** — integer precision, used for quantised inference

Lower precision means faster computation and less memory usage, but potentially reduced accuracy. Modern AI training uses mixed precision (FP16/FP32) to balance speed and accuracy.
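The memory side of this trade-off is easy to estimate: each parameter costs a fixed number of bytes at a given precision. The sketch below is illustrative only — real training also stores optimizer state, gradients, and activations on top of the weights:

```python
# Bytes needed per parameter at each precision level.
BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "BF16": 2, "FP8": 1, "INT4": 0.5}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory needed just to hold the model weights, in gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

params = 7e9  # a 7B-parameter model, for illustration
for precision in ("FP32", "FP16", "FP8"):
    print(f"{precision}: {weight_memory_gb(params, precision):.0f} GB")
```

Halving the precision halves the footprint: a 7B model drops from 28 GB of weights at FP32 to 14 GB at FP16 and 7 GB at FP8, which is why lower precision lets larger models fit on a single GPU.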

TFLOPS as a Performance Metric

TFLOPS measures peak theoretical throughput — the maximum operations per second under ideal conditions. Real-world performance depends on memory bandwidth, software optimisation, and workload characteristics.
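One way to see why peak TFLOPS is only an upper bound: a matrix multiply of shape (m, k) × (k, n) needs roughly 2·m·k·n floating-point operations, so dividing by the peak rate gives a best-case runtime that real kernels never quite reach. A minimal sketch, using the 990 FP16 TFLOPS H100 figure cited above (actual runtimes are longer due to memory bandwidth and kernel overhead):

```python
def matmul_flops(m: int, k: int, n: int) -> float:
    # Each output element needs k multiplies and k adds -> ~2*m*k*n FLOPs.
    return 2.0 * m * k * n

def best_case_seconds(flops: float, peak_tflops: float) -> float:
    # Ideal runtime if the GPU sustained its peak rate, which it never does.
    return flops / (peak_tflops * 1e12)

# An 8192 x 8192 x 8192 matmul against the H100's 990 FP16 TFLOPS peak:
flops = matmul_flops(8192, 8192, 8192)  # ~1.1e12 FLOPs
t = best_case_seconds(flops, 990)
print(f"{flops:.2e} FLOPs, best case {t * 1e3:.2f} ms")
```

Measured throughput as a fraction of this theoretical ceiling (often called MFU, model FLOPs utilisation) is how practitioners judge whether a workload is compute-bound or bottlenecked elsewhere.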

However, TFLOPS remains the most useful single metric for comparing GPU performance. Signwl uses FP16 TFLOPS as the standard benchmark across all GPU profiles because it best represents AI workload performance.

Performance per Dollar

Raw TFLOPS alone don't tell the full story — cost matters. Signwl calculates TFLOPS per dollar per hour for each GPU, revealing which accelerators deliver the best performance for their price.

A GPU with lower TFLOPS but much lower cost can deliver better value than a flagship GPU. For example, the L40S delivers 366 FP16 TFLOPS at a fraction of the H100's price, making it more cost-effective for workloads that don't need maximum performance.
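The value calculation itself is simple division. A sketch using the figures quoted in this article — 990 FP16 TFLOPS for the H100 and 366 for the L40S, with illustrative hourly prices rather than a live feed:

```python
def tflops_per_dollar_hour(fp16_tflops: float, price_per_hour: float) -> float:
    """Performance-per-dollar metric: higher means better value."""
    return fp16_tflops / price_per_hour

# Illustrative (FP16 TFLOPS, $/hr) pairs; live prices vary by provider.
gpus = {"H100": (990, 6.39), "L40S": (366, 1.28)}

for name, (tflops, price) in gpus.items():
    print(f"{name}: {tflops_per_dollar_hour(tflops, price):.0f} TFLOPS per $/hr")
```

At these sample prices the L40S delivers nearly twice the TFLOPS per dollar of the H100, despite having roughly a third of its raw throughput.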

Frequently Asked Questions

What does TFLOPS mean?

TFLOPS stands for Tera Floating-Point Operations Per Second — trillions of mathematical operations per second. FP16 TFLOPS is the standard AI performance metric. The NVIDIA H100 delivers 990 FP16 TFLOPS.

Are more TFLOPS always better?

More TFLOPS means more raw compute, but cost and memory also matter. A cheaper GPU with fewer TFLOPS can deliver better value (TFLOPS per dollar). Memory bandwidth and capacity are equally important for many AI workloads.

What is the difference between FP16 and FP32 TFLOPS?

FP16 uses half the precision of FP32, allowing GPUs to perform roughly twice as many operations per second. FP16 is the standard for AI training and inference, while FP32 is used for scientific computing requiring full precision.
