NVIDIA L40S
Ada Lovelace architecture · 48GB memory · 366 FP16 TFLOPS · 350W TDP
About the NVIDIA L40S
The NVIDIA L40S is a versatile Ada Lovelace-generation GPU that bridges the gap between inference-optimised and training-capable accelerators. With 48GB of GDDR6 memory and 366 FP16 TFLOPS, it offers more FP16 compute than the A100 at a significantly lower price point.
The L40S is particularly well-suited for AI inference and fine-tuning workloads where 48GB of memory is sufficient. Its Ada Lovelace architecture includes fourth-generation Tensor Cores and hardware-accelerated ray tracing, making it a strong choice for mixed AI/graphics workloads.
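Whether a given model fits in the L40S's 48GB can be gauged with a back-of-the-envelope VRAM estimate. The sketch below is illustrative only: the 20% overhead multiplier and per-parameter byte sizes are assumptions, and real usage also depends on batch size, sequence length, KV cache, and framework buffers.

```python
def estimate_vram_gb(params_billion: float, bytes_per_param: float = 2.0,
                     overhead: float = 1.2) -> float:
    """Rough VRAM needed to serve a model for inference.

    bytes_per_param: 2.0 for FP16/BF16, 1.0 for INT8, 0.5 for 4-bit.
    overhead: assumed multiplier for activations, KV cache, and buffers.
    """
    return params_billion * bytes_per_param * overhead

L40S_VRAM_GB = 48

for size in (7, 13, 34, 70):
    need = estimate_vram_gb(size)
    verdict = "fits" if need <= L40S_VRAM_GB else "does not fit"
    print(f"{size}B params at FP16: ~{need:.0f} GB -> {verdict} on one L40S")
```

Under these assumptions, models up to roughly the 13B class fit comfortably at FP16 on a single L40S, while larger models need quantisation or multiple GPUs.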
Compared to inference-focused GPUs like the L4 and T4, the L40S offers substantially more memory and compute, enabling it to handle larger models and higher batch sizes. Its 350W TDP is moderate for a data centre GPU, offering a reasonable balance between performance and power consumption.
In the cloud GPU market, the L40S occupies a cost-effective middle ground. It delivers approximately 1.17x the FP16 performance of the A100 at a fraction of the cost, making it increasingly popular for inference deployments and model fine-tuning.
Key Facts
- Manufacturer: NVIDIA
- Architecture: Ada Lovelace
- Accelerator Type: GPU
- Primary Use: Inference
- Memory (VRAM): 48 GB
- FP16 Performance: 366 TFLOPS
- Thermal Design Power: 350W
Frequently Asked Questions
How much does an L40S cost per hour?
The NVIDIA L40S blended cloud pricing typically ranges from $0.80–$1.80 per hour, making it one of the most cost-effective GPUs for inference and fine-tuning workloads.
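Translating that hourly range into a budget is straightforward arithmetic. A minimal sketch, assuming on-demand pricing with no reservations or committed-use discounts:

```python
def monthly_cost(rate_per_hour: float, hours_per_day: float = 24,
                 days: int = 30) -> float:
    """Project on-demand spend for a GPU at a flat hourly rate."""
    return rate_per_hour * hours_per_day * days

# Blended range quoted above for the L40S.
low, high = 0.80, 1.80
print(f"Running 24/7 for a month: "
      f"${monthly_cost(low):,.0f}-${monthly_cost(high):,.0f}")
```

At the low end of the range, a continuously running L40S works out to under $600 per month; at the high end, roughly $1,300.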
Is the L40S good for AI training?
The L40S can handle fine-tuning and smaller-scale training, but its 48GB GDDR6 memory (vs HBM on training GPUs) limits it for large model training. It excels at inference and medium-scale fine-tuning.
What is the difference between L40S and A100?
The L40S has slightly higher FP16 performance (366 vs 312 TFLOPS) but less memory (48GB GDDR6 vs 80GB HBM2e). The A100's HBM memory offers higher bandwidth, making it better for training. The L40S is typically cheaper and better suited for inference.
Related Accelerators
Compare NVIDIA L40S
vs NVIDIA A100: L40S has slightly higher FP16 TFLOPS (366 vs 312) but less memory (48GB GDDR6 vs 80GB HBM2e). A100 is better for training; L40S is more cost-effective for inference.
L40S delivers 2.9x the performance (366 vs 125 TFLOPS) with double the memory (48GB vs 24GB). L40S costs more but handles significantly larger models and batch sizes.