AWS Inferentia
Inferentia architecture · 8GB memory
About the AWS Inferentia
AWS Inferentia is a custom machine learning inference chip designed by Amazon. It is optimised for high-throughput, low-cost inference deployments on AWS, where it powers Amazon EC2 Inf1 instances.
Memory (VRAM)
8 GB
Architecture
Inferentia
Common Use Cases
- High-throughput inference
- Cost-optimised ML serving
- Production deployment
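A common first question when targeting an 8 GB inference accelerator is whether a given model's weights fit in device memory at all. The sketch below is a rough back-of-the-envelope check, not a Neuron SDK utility; the 1.5x runtime-overhead factor and the example parameter counts are illustrative assumptions.

```python
# Rough check: does a model fit in AWS Inferentia's 8 GB of device memory?
# The 1.5x overhead factor (activations, runtime buffers) is an assumption,
# not a measured value.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}

def model_memory_gb(num_params: int, dtype: str = "fp16",
                    overhead: float = 1.5) -> float:
    """Estimate device memory needed: weights plus a rough runtime overhead."""
    weights_gb = num_params * BYTES_PER_PARAM[dtype] / 1e9
    return weights_gb * overhead

def fits_on_inferentia(num_params: int, dtype: str = "fp16") -> bool:
    """Compare the estimate against Inferentia's 8 GB (per the spec above)."""
    return model_memory_gb(num_params, dtype) <= 8.0

# BERT-large (~340M params) in fp16: ~1 GB estimated -> fits comfortably.
print(fits_on_inferentia(340_000_000, "fp16"))    # True
# A 7B-parameter model in fp16: ~21 GB estimated -> exceeds 8 GB.
print(fits_on_inferentia(7_000_000_000, "fp16"))  # False
```

In practice, sharding across multiple NeuronCores or quantising to int8 changes the picture, but a weights-plus-overhead estimate like this is a reasonable first filter.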
Key Facts
- Manufacturer: AWS
- Architecture: Inferentia
- Accelerator Type: Custom inference chip (ASIC)
- Primary Use: Inference
- Memory (VRAM): 8 GB
Related Accelerators
- NVIDIA A10G — Ampere · 24 GB · 125 TFLOPS · inference
- NVIDIA A10 — Ampere · 24 GB · 125 TFLOPS · inference
- NVIDIA L40S — Ada Lovelace · 48 GB · 366 TFLOPS · inference
- NVIDIA L4 — Ada Lovelace · 24 GB · 121 TFLOPS · inference
- NVIDIA T4 — Turing · 16 GB · 65 TFLOPS · inference
- NVIDIA T4G — Turing · 16 GB · 65 TFLOPS · inference
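One way to compare the related accelerators above is compute density: TFLOPS per GB of VRAM. The sketch below uses only the figures listed on this page; ranking by this single ratio ignores price, power, and memory bandwidth, so treat it as one axis of comparison, not a verdict.

```python
# Compute density (TFLOPS per GB of VRAM) for the NVIDIA inference
# accelerators listed above, using the specs from this page.

specs = {  # name: (TFLOPS, VRAM in GB)
    "A10G": (125, 24),
    "A10":  (125, 24),
    "L40S": (366, 48),
    "L4":   (121, 24),
    "T4":   (65, 16),
    "T4G":  (65, 16),
}

density = {name: tflops / vram for name, (tflops, vram) in specs.items()}

# Print highest compute density first.
for name in sorted(density, key=density.get, reverse=True):
    print(f"{name:5s} {density[name]:5.2f} TFLOPS/GB")
```

By this measure the L40S leads (about 7.6 TFLOPS/GB) and the T4/T4G trail (about 4.1 TFLOPS/GB), which matches their generational gap.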