AWS Inferentia2
Inferentia2 architecture · 32GB memory
About AWS Inferentia2
AWS Inferentia2 is the second generation of Amazon's custom inference chip, offering significantly improved performance and support for larger models compared to the original Inferentia.
Memory (VRAM): 32 GB
Architecture: Inferentia2
Common Use Cases
- Large model inference
- Generative AI serving
- Cost-effective inference
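As a rough illustration of what "large model inference" means against the chip's 32 GB of memory, a weights-only footprint estimate is simply parameters × bytes per parameter. This is an illustrative sketch, not an official AWS sizing guideline; real deployments also need room for activations and KV cache.

```python
def weights_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate weights-only memory footprint in GiB."""
    return num_params * bytes_per_param / 1024**3

# A 13B-parameter model in FP16/BF16 (2 bytes per parameter)
# needs roughly 24.2 GiB for weights alone, which fits in 32 GB:
print(round(weights_gib(13e9, 2), 1))  # → 24.2
```

In practice, headroom for the KV cache and runtime buffers means the largest model that comfortably serves on a single chip is somewhat smaller than this back-of-the-envelope number suggests; larger models are typically sharded across multiple Inferentia2 chips.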
Key Facts
- Manufacturer: AWS
- Architecture: Inferentia2
- Accelerator Type: AI accelerator (ASIC, not a GPU)
- Primary Use: Inference
- Memory (VRAM): 32 GB
Related Accelerators
- NVIDIA A10G: Ampere · 24GB · 125 TFLOPS · inference
- NVIDIA A10: Ampere · 24GB · 125 TFLOPS · inference
- NVIDIA L40S: Ada Lovelace · 48GB · 366 TFLOPS · inference
- NVIDIA L4: Ada Lovelace · 24GB · 121 TFLOPS · inference
- NVIDIA T4: Turing · 16GB · 65 TFLOPS · inference
- NVIDIA T4G: Turing · 16GB · 65 TFLOPS · inference