AWS Inferentia2
Inferentia2 architecture · 32GB memory
About AWS Inferentia2
AWS Inferentia2 is the second generation of Amazon's custom inference chip, offering significantly improved performance and support for larger models compared to the original Inferentia.
Memory (VRAM): 32 GB
Architecture: Inferentia2
Common Use Cases
- Large model inference
- Generative AI serving
- Cost-effective inference
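As a rough illustration of what "large model inference" means against the chip's 32 GB of memory, a weights-only footprint estimate is simply parameters × bytes per parameter. This is an illustrative sketch, not an official AWS sizing guideline; real deployments also need room for activations and KV cache.

```python
def weights_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate weights-only memory footprint in GiB."""
    return num_params * bytes_per_param / 1024**3

# A 13B-parameter model in FP16/BF16 (2 bytes per parameter)
# needs roughly 24.2 GiB for weights alone, which fits in 32 GB:
print(round(weights_gib(13e9, 2), 1))  # → 24.2
```

In practice, headroom for the KV cache and runtime buffers means the largest model that comfortably serves on a single chip is somewhat smaller than this back-of-the-envelope number suggests; larger models are typically sharded across multiple Inferentia2 chips.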
Key Facts
- Manufacturer: AWS
- Architecture: Inferentia2
- Accelerator Type: AI accelerator (ASIC, not a GPU)
- Primary Use: Inference
- Memory (VRAM): 32 GB
Related Accelerators
- NVIDIA A10G: Ampere · 24GB · 125 TFLOPS · inference
- NVIDIA A10: Ampere · 24GB · 125 TFLOPS · inference
- NVIDIA L40S: Ada Lovelace · 48GB · 366 TFLOPS · inference
- NVIDIA L4: Ada Lovelace · 24GB · 121 TFLOPS · inference
- NVIDIA T4: Turing · 16GB · 65 TFLOPS · inference
- NVIDIA T4G: Turing · 16GB · 65 TFLOPS · inference