AWS Inferentia
Inferentia architecture · 8GB memory
About the AWS Inferentia
AWS Inferentia is a custom machine learning inference chip designed by Amazon. It is optimised for high-throughput, low-cost inference deployments on AWS, where it powers Amazon EC2 Inf1 instances.
Memory (VRAM)
8 GB
Architecture
Inferentia
Common Use Cases
- High-throughput inference
- Cost-optimised ML serving
- Production deployment
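A common first question when targeting an 8 GB inference accelerator is whether a given model's weights fit in device memory at all. The sketch below is a rough back-of-the-envelope check, not a Neuron SDK utility; the 1.5x runtime-overhead factor and the example parameter counts are illustrative assumptions.

```python
# Rough check: does a model fit in AWS Inferentia's 8 GB of device memory?
# The 1.5x overhead factor (activations, runtime buffers) is an assumption,
# not a measured value.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}

def model_memory_gb(num_params: int, dtype: str = "fp16",
                    overhead: float = 1.5) -> float:
    """Estimate device memory needed: weights plus a rough runtime overhead."""
    weights_gb = num_params * BYTES_PER_PARAM[dtype] / 1e9
    return weights_gb * overhead

def fits_on_inferentia(num_params: int, dtype: str = "fp16") -> bool:
    """Compare the estimate against Inferentia's 8 GB (per the spec above)."""
    return model_memory_gb(num_params, dtype) <= 8.0

# BERT-large (~340M params) in fp16: ~1 GB estimated -> fits comfortably.
print(fits_on_inferentia(340_000_000, "fp16"))    # True
# A 7B-parameter model in fp16: ~21 GB estimated -> exceeds 8 GB.
print(fits_on_inferentia(7_000_000_000, "fp16"))  # False
```

In practice, sharding across multiple NeuronCores or quantising to int8 changes the picture, but a weights-plus-overhead estimate like this is a reasonable first filter.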
Key Facts
- Manufacturer: AWS
- Architecture: Inferentia
- Accelerator Type: Custom inference chip (ASIC)
- Primary Use: Inference
- Memory (VRAM): 8 GB
Related Accelerators
- NVIDIA A10G — Ampere · 24 GB · 125 TFLOPS · inference
- NVIDIA A10 — Ampere · 24 GB · 125 TFLOPS · inference
- NVIDIA L40S — Ada Lovelace · 48 GB · 366 TFLOPS · inference
- NVIDIA L4 — Ada Lovelace · 24 GB · 121 TFLOPS · inference
- NVIDIA T4 — Turing · 16 GB · 65 TFLOPS · inference
- NVIDIA T4G — Turing · 16 GB · 65 TFLOPS · inference
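One way to compare the related accelerators above is compute density: TFLOPS per GB of VRAM. The sketch below uses only the figures listed on this page; ranking by this single ratio ignores price, power, and memory bandwidth, so treat it as one axis of comparison, not a verdict.

```python
# Compute density (TFLOPS per GB of VRAM) for the NVIDIA inference
# accelerators listed above, using the specs from this page.

specs = {  # name: (TFLOPS, VRAM in GB)
    "A10G": (125, 24),
    "A10":  (125, 24),
    "L40S": (366, 48),
    "L4":   (121, 24),
    "T4":   (65, 16),
    "T4G":  (65, 16),
}

density = {name: tflops / vram for name, (tflops, vram) in specs.items()}

# Print highest compute density first.
for name in sorted(density, key=density.get, reverse=True):
    print(f"{name:5s} {density[name]:5.2f} TFLOPS/GB")
```

By this measure the L40S leads (about 7.6 TFLOPS/GB) and the T4/T4G trail (about 4.1 TFLOPS/GB), which matches their generational gap.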