NVIDIA L40S
Ada Lovelace architecture · 48GB memory · 366 FP16 TFLOPS · 350W TDP
About the NVIDIA L40S
The NVIDIA L40S is a versatile Ada Lovelace-generation GPU that bridges the gap between inference-optimised and training-capable accelerators. With 48GB of GDDR6 memory and 366 FP16 TFLOPS, it offers more FP16 compute than the A100 at a significantly lower price point.
The L40S is particularly well-suited for AI inference and fine-tuning workloads where 48GB of memory is sufficient. Its Ada Lovelace architecture includes fourth-generation Tensor Cores and hardware-accelerated ray tracing, making it a strong choice for mixed AI/graphics workloads.
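Whether a given model fits in the L40S's 48GB can be gauged with a back-of-the-envelope VRAM estimate. The sketch below is illustrative only: the 20% overhead multiplier and per-parameter byte sizes are assumptions, and real usage also depends on batch size, sequence length, KV cache, and framework buffers.

```python
def estimate_vram_gb(params_billion: float, bytes_per_param: float = 2.0,
                     overhead: float = 1.2) -> float:
    """Rough VRAM needed to serve a model for inference.

    bytes_per_param: 2.0 for FP16/BF16, 1.0 for INT8, 0.5 for 4-bit.
    overhead: assumed multiplier for activations, KV cache, and buffers.
    """
    return params_billion * bytes_per_param * overhead

L40S_VRAM_GB = 48

for size in (7, 13, 34, 70):
    need = estimate_vram_gb(size)
    verdict = "fits" if need <= L40S_VRAM_GB else "does not fit"
    print(f"{size}B params at FP16: ~{need:.0f} GB -> {verdict} on one L40S")
```

Under these assumptions, models up to roughly the 13B class fit comfortably at FP16 on a single L40S, while larger models need quantisation or multiple GPUs.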
Compared to inference-focused GPUs like the L4 and T4, the L40S offers substantially more memory and compute, enabling it to handle larger models and higher batch sizes. Its 350W TDP is moderate for a data centre GPU, offering a reasonable balance between performance and power consumption.
In the cloud GPU market, the L40S occupies a cost-effective middle ground. It delivers approximately 1.17x the FP16 performance of the A100 at a fraction of the cost, making it increasingly popular for inference deployments and model fine-tuning.
Key Facts
- Manufacturer: NVIDIA
- Architecture: Ada Lovelace
- Accelerator Type: GPU
- Primary Use: Inference
- Memory (VRAM): 48 GB
- FP16 Performance: 366 TFLOPS
- Thermal Design Power: 350W
Frequently Asked Questions
How much does an L40S cost per hour?
The NVIDIA L40S blended cloud pricing typically ranges from $0.80–$1.80 per hour, making it one of the most cost-effective GPUs for inference and fine-tuning workloads.
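Translating that hourly range into a budget is straightforward arithmetic. A minimal sketch, assuming on-demand pricing with no reservations or committed-use discounts:

```python
def monthly_cost(rate_per_hour: float, hours_per_day: float = 24,
                 days: int = 30) -> float:
    """Project on-demand spend for a GPU at a flat hourly rate."""
    return rate_per_hour * hours_per_day * days

# Blended range quoted above for the L40S.
low, high = 0.80, 1.80
print(f"Running 24/7 for a month: "
      f"${monthly_cost(low):,.0f}-${monthly_cost(high):,.0f}")
```

At the low end of the range, a continuously running L40S works out to under $600 per month; at the high end, roughly $1,300.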
Is the L40S good for AI training?
The L40S can handle fine-tuning and smaller-scale training, but its 48GB GDDR6 memory (vs HBM on training GPUs) limits it for large model training. It excels at inference and medium-scale fine-tuning.
What is the difference between L40S and A100?
The L40S has slightly higher FP16 performance (366 vs 312 TFLOPS) but less memory (48GB GDDR6 vs 80GB HBM2e). The A100's HBM memory offers higher bandwidth, making it better for training. The L40S is typically cheaper and better suited for inference.
Related Accelerators
Compare NVIDIA L40S
vs NVIDIA A100: L40S has slightly higher FP16 TFLOPS (366 vs 312) but less memory (48GB GDDR6 vs 80GB HBM2e). A100 is better for training; L40S is more cost-effective for inference.
L40S delivers 2.9x the performance (366 vs 125 TFLOPS) with double the memory (48GB vs 24GB). L40S costs more but handles significantly larger models and batch sizes.