AWS Bedrock Pricing 2026: Per-Token Cost & Throughput

Amazon Bedrock charges $0.003 to $0.075 per 1,000 input tokens depending on the foundation model selected, with provisioned throughput priced at $21 to $63 per hour per model unit. A 1,000-seat enterprise deployment using Claude 3.5 Sonnet on Bedrock typically lands at $180,000 to $420,000 in annual run-rate cost. The two cost variables that decide that number are model selection and the on-demand versus provisioned throughput decision. This page documents Bedrock list prices for 2026, the model-by-model economics, and the patterns that keep Bedrock bills inside the budget.

Bedrock 2026 model pricing table

Bedrock is a model marketplace, not a single product. Each foundation model on Bedrock is priced separately by its provider and routed through the AWS billing line. The price ranges in the table below reflect AWS public pricing as of Q1 2026 for the us-east-1 region.

Foundation model	Input per 1K tokens	Output per 1K tokens	Provider
Claude 3.5 Sonnet v2	$0.003	$0.015	Anthropic
Claude 3.5 Haiku	$0.0008	$0.004	Anthropic
Claude 3 Opus	$0.015	$0.075	Anthropic
Amazon Nova Pro	$0.0008	$0.0032	Amazon
Amazon Nova Lite	$0.00006	$0.00024	Amazon
Amazon Nova Micro	$0.000035	$0.00014	Amazon
Amazon Titan Text Premier	$0.0005	$0.0015	Amazon
Meta Llama 3.3 70B	$0.00072	$0.00072	Meta
Meta Llama 3.1 405B	$0.00532	$0.016	Meta
Mistral Large 2	$0.002	$0.006	Mistral
Cohere Command R+	$0.0025	$0.01	Cohere
AI21 Jamba 1.5 Large	$0.002	$0.008	AI21

The price spread is wide. Claude 3 Opus at $0.075 per 1,000 output tokens costs 1,250 times more than Nova Micro at $0.00006 per 1,000 input tokens. Model selection is therefore the single largest cost lever in any Bedrock deployment. A retrieval-augmented generation pipeline that defaults to Opus for every query when Haiku would suffice will inflate cost by 18x with no quality benefit on the routine traffic.

Negotiation lever: Bedrock on-demand list pricing is non-negotiable through the AWS console. Discounts on Bedrock are realised through the Enterprise Discount Program, where Bedrock consumption counts toward EDP commit at full list and burns down with the same discount applied to other AWS services. See our AWS Enterprise Agreement and EDP guide for the commit structure that captures Bedrock spend.

On-demand versus provisioned throughput

Bedrock has two consumption modes. On-demand charges per token consumed, with no minimum commitment, and shares model capacity with all other AWS customers in the region. Provisioned throughput reserves dedicated model capacity (measured in model units) at an hourly rate, regardless of how much you use it.

The break-even between the two is workload-specific. For Claude 3.5 Sonnet, a single model unit (MU) lists at $39.60 per hour, or $28,512 per month for a one-month no-commit reservation. That same MU at on-demand pricing produces approximately 16 million input plus 4 million output tokens per hour under typical mixed loads. Workloads that consume 8 to 10 million tokens per hour are at the break-even point.

Reservation term	Discount versus 1-month	MU/hour example (Claude 3.5 Sonnet)
1 month, no commit	baseline	$39.60
6 months commit	30 percent	$27.72
1 year commit	40 to 50 percent	$19.80 to $23.76

Provisioned throughput is the right answer for predictable, high-volume workloads where latency consistency matters. On-demand is the right answer for variable workloads, pilots, and any scenario where the volume sits below the break-even line. Mixing both is common: provisioned capacity for the steady-state floor, on-demand for spike absorption.

Custom model import and Bedrock Studio

Bedrock Custom Model Import lets enterprises run their own fine-tuned open-weights models (Llama, Mistral, Flan-T5) on Bedrock infrastructure. The pricing model is per-model-copy per-hour, with a five-minute minimum and a separate storage charge.

A custom imported Llama 3 70B model lists at $33 per hour per active copy, with a minimum scale-out of 1 copy. A 24-hour pilot at 1 copy costs $792. Production deployment at 2 copies for 24x7 operation lists at $48,180 per month before EDP discount. Storage of the model weights runs $1.95 per GB per month.

Bedrock Studio (general availability 2025) gives application teams a low-code build environment for Bedrock-backed apps. Studio itself does not add per-seat cost, but the apps built in Studio consume Bedrock tokens and Knowledge Base storage at standard rates.

Agents, Knowledge Bases, Guardrails

Bedrock Agents adds orchestration to a base model. The Agents feature itself is free of incremental fee. The token consumption underneath each agent invocation is billed at the standard model rate for the model the agent uses, multiplied by the number of model calls required to complete the chain of reasoning. A multi-step agent that calls the model six times to complete a task incurs six times the token cost of a single-shot prompt.

Bedrock Knowledge Bases adds retrieval-augmented generation. The Knowledge Base feature is free of incremental fee. Underlying costs are the vector database (OpenSearch Serverless at $0.24 per OCU-hour, Pinecone or Redis as alternatives), the embeddings model invocation (Titan Embeddings at $0.0001 per 1,000 tokens, Cohere Embed at $0.0001), and the storage of the source documents in S3.

Component	Pricing model	Typical monthly cost (mid-size deployment)
OpenSearch Serverless vector store	$0.24 per OCU-hour, 2 OCU minimum	$345 per month
Titan Embeddings ingestion	$0.0001 per 1K tokens	$200 to $800 per month for 5M docs
Source document S3 storage	$0.023 per GB-month standard tier	$50 to $500 per month
Re-embedding for document updates	Per-token, repeat charge	Variable, often underestimated

Bedrock Guardrails adds policy enforcement (PII detection, prompt injection detection, content filtering). Guardrails is priced per text unit (1,000 characters). A guardrail invocation on a typical 2,000-character prompt and 1,500-character response costs $0.00075 per request. At 10 million requests per month, Guardrails adds $7,500 per month to the bill.

Cost-control patterns that actually work

Five patterns reliably reduce Bedrock spend by 30 to 60 percent without compromising application quality.

First, model cascade routing. Route the easy 80 percent of traffic to Haiku, Nova Lite, or Llama 70B. Reserve Sonnet, Opus, or Llama 405B for the queries that fail the lightweight model's confidence check or that match a known complexity pattern. Cascade routing typically reduces token spend by 55 to 70 percent on customer-support and search use cases.

Second, prompt caching. Anthropic models on Bedrock support prompt caching at $0.00075 per 1,000 cached tokens read (75 percent off standard input pricing). Long system prompts, knowledge base context, and few-shot examples cached for repeated invocation reduce the cost of high-volume agent loops dramatically.

Third, output token cap. Set max-tokens conservatively. The default in many SDKs is 4,096 tokens output. Most production prompts produce 200 to 500 useful output tokens. The cap prevents runaway generations from inflating cost by 8x.

Fourth, batch inference. For non-real-time workloads (overnight summarisation, content review pipelines), Bedrock Batch lists at 50 percent off on-demand pricing on supported models. Batch jobs run on AWS-managed schedule with no SLA guarantee on completion time.

Fifth, reserved capacity with rightsizing. For workloads above the on-demand break-even, provisioned throughput at one-year commit captures 40 to 50 percent savings. The risk is over-provisioning. The rightsizing discipline is to measure on-demand consumption for 30 days, then commit to 70 to 80 percent of observed peak, leaving on-demand to absorb the remaining variability.

The Bedrock bill-shock pattern: A retrieval-augmented chatbot serving 50,000 queries per day at 4,000 input tokens (knowledge context) and 600 output tokens per query on Claude 3.5 Sonnet lists at $20,475 per month. The same workload on Claude 3.5 Haiku costs $5,460 per month. With cascade routing (80 percent Haiku, 20 percent Sonnet), it drops to $8,463 per month. Model selection alone delivered 59 percent saving with no quality regression on the routed traffic.

Bedrock versus SageMaker decision

Bedrock and SageMaker are not substitutes. Bedrock is the managed foundation-model service. SageMaker is the full machine-learning platform: notebooks, training, model registry, real-time and batch inference endpoints, MLOps tooling.

The decision rule. Use Bedrock when the requirement is to invoke a foundation model (Claude, Llama, Nova, Mistral) via API for a generative application. Use SageMaker when the requirement is to train, fine-tune, deploy, or operate a custom model with full control over the inference stack. Many production architectures use both: SageMaker for the proprietary recommendation model, Bedrock for the generative summary on top.

SageMaker JumpStart sits between the two. JumpStart lets teams deploy open-weights foundation models on dedicated SageMaker inference endpoints, priced per instance-hour rather than per-token. For very high-volume inference on a single open model, SageMaker JumpStart often beats Bedrock economics. Below 8 to 10 million tokens per hour, Bedrock is cheaper and operationally simpler.

Bedrock contractual position in AWS negotiations

Bedrock is a strategic AWS line. AWS account teams have explicit Bedrock growth quotas, and Bedrock consumption is one of the levers AWS uses to justify EDP renewal increases. That gives buyers more bargaining room than they typically realise.

Three negotiation moves work in 2026. First, request a Bedrock-specific discount sleeve inside the EDP, separating Bedrock consumption from general AWS infrastructure spend so the discount on Bedrock can exceed the blended EDP rate. AWS account teams have authority to grant 10 to 20 percent on Bedrock specifically when Bedrock is the strategic spend.

Second, secure a Bedrock model price-lock for the EDP term. AWS reprices Bedrock models periodically (typically downward as competition intensifies). A price-lock provision protects the customer from upward repricing without locking out the customer from beneficial price reductions. This is a clause AWS will accept when asked, and refuses when not asked.

Third, negotiate the data residency and indemnity terms. Bedrock service terms include model-provider passthrough language that varies by model. Anthropic's IP indemnity on Bedrock follows the Anthropic enterprise policy. Amazon's Nova indemnity is the AWS standard. Mistral and Cohere terms are more limited. See our AI contract clauses guide for the indemnity matrix.

For the broader cloud commercial framework, see our AWS EDP negotiation playbook and cloud cost optimization guide. For the multi-vendor AI procurement view that compares Bedrock against Azure OpenAI, Vertex AI, and direct Anthropic, see our AI procurement guide and the AWS vendor hub. Engagement starts at AI procurement advisory.

AWS EDP Negotiation: Hit the Tier, Not the Overcommit

Hit the AWS EDP discount tier without overcommitting spend.

Read the white paper

AWS Bedrock Pricing 2026

Bedrock 2026 model pricing table

On-demand versus provisioned throughput

Custom model import and Bedrock Studio

Agents, Knowledge Bases, Guardrails

Cost-control patterns that actually work

Bedrock versus SageMaker decision

Bedrock contractual position in AWS negotiations

The Licensing Edge

Get an Independent Bedrock Cost Review

AWS Bedrock Pricing 2026

Bedrock 2026 model pricing table

On-demand versus provisioned throughput

Custom model import and Bedrock Studio

Agents, Knowledge Bases, Guardrails

Cost-control patterns that actually work

Bedrock versus SageMaker decision

Bedrock contractual position in AWS negotiations

Related Intelligence

AWS Enterprise Agreement and EDP 2026 Pillar

AI Procurement Guide: The Buyer-Side Framework

Cloud Cost Optimization: 38 Percent Median Saving

The Licensing Edge

Get an Independent Bedrock Cost Review