Comparison

NVIDIA L40S vs NVIDIA A10G

Mid-tier inference — Ada Lovelace vs Ampere

The L40S delivers 2.9x the FP16 performance of the A10G (366 vs 125 TFLOPS) with double the memory (48GB vs 24GB). Both are strong inference GPUs at different price points.

Specifications

Specification      | NVIDIA L40S   | NVIDIA A10G
Manufacturer       | NVIDIA        | NVIDIA
Architecture       | Ada Lovelace  | Ampere
Accelerator Type   | GPU           | GPU
Primary Use        | Inference     | Inference
Memory (VRAM)      | 48 GB         | 24 GB
FP16 Performance   | 366 TFLOPS    | 125 TFLOPS
TDP                | 350 W         | 300 W
Perf per Watt      | 1.05 TFLOPS/W | 0.42 TFLOPS/W
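The Perf per Watt row is simply FP16 TFLOPS divided by TDP. A quick sketch reproducing the figures in the table (the `specs` dict is just the numbers above, not an API):

```python
# Perf per watt = FP16 TFLOPS / TDP (W), rounded as in the table above.
specs = {
    "L40S": {"fp16_tflops": 366, "tdp_w": 350},
    "A10G": {"fp16_tflops": 125, "tdp_w": 300},
}

for name, s in specs.items():
    ppw = round(s["fp16_tflops"] / s["tdp_w"], 2)
    print(f"{name}: {ppw} TFLOPS/W")  # L40S: 1.05, A10G: 0.42
```

The Ada Lovelace part delivers roughly 2.5x the compute per watt, which matters for dense inference deployments where rack power is the binding constraint.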

Detailed Analysis

The L40S and A10G serve the mid-tier inference market but at different capability levels. The L40S, based on Ada Lovelace, offers a substantial performance uplift over the Ampere-based A10G.

The L40S's 48GB of GDDR6 memory enables it to handle models up to approximately 25B parameters with quantisation, while the A10G's 24GB limits it to roughly 13B. This memory advantage makes the L40S significantly more versatile for modern AI workloads.
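The parameter limits above follow from a rough sizing rule: weight memory is parameter count times bytes per parameter, plus headroom for the KV cache and activations. A minimal sketch (the function name, the 20% overhead factor, and the per-precision byte counts are illustrative assumptions, not vendor guidance):

```python
def fits_in_vram(params_b: float, vram_gb: float,
                 bytes_per_param: float = 2.0,
                 overhead: float = 1.2) -> bool:
    """Rough fit check: weights (params_b billions x bytes each)
    plus ~20% headroom for KV cache/activations vs available VRAM."""
    needed_gb = params_b * bytes_per_param * overhead
    return needed_gb <= vram_gb

# A 13B model in FP16 (2 bytes/param) needs ~31 GB -> over the A10G's 24 GB,
# but at INT4 (0.5 bytes/param) it needs ~8 GB and fits comfortably.
print(fits_in_vram(13, 24, bytes_per_param=2.0))  # False
print(fits_in_vram(13, 24, bytes_per_param=0.5))  # True
print(fits_in_vram(25, 48, bytes_per_param=1.0))  # True (INT8 on the L40S)
```

Real serving stacks vary in overhead (longer contexts inflate the KV cache substantially), so treat this as a first-pass estimate rather than a guarantee.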

The L40S also delivers 2.9x the FP16 TFLOPS (366 vs 125), meaning higher throughput for any given model. Its Ada Lovelace architecture includes hardware-accelerated ray tracing, making it a dual-purpose GPU for mixed AI and graphics workloads.

The A10G remains popular due to its lower cost and wide availability. For inference workloads that fit within 24GB of memory, the A10G can deliver acceptable performance at a significantly lower price point.

Verdict

Best for Training

L40S for fine-tuning tasks that need 48GB. A10G for small-scale fine-tuning within 24GB.

Best for Inference

L40S for larger models and higher throughput. A10G for cost-sensitive deployments with smaller models.

Best Value

A10G for workloads within its 24GB memory. L40S when you need the extra memory and throughput.

Frequently Asked Questions

Is the L40S good for inference?

Yes — the L40S is one of the best price/performance inference GPUs with 48GB memory and 366 FP16 TFLOPS. It can handle models up to ~25B parameters with quantisation.
