AI Models

The Heroku Managed Inference and Agent add-on supports the following models. The add-on is hosted in two regions: us and eu. However, the add-on can be provisioned and accessed from apps in any Heroku region. Select a model to view information on rate limits, prompt caching, and implementation.

Model Documentation	Model ID	Region	Supported Inputs	Supported Outputs	API Endpoint	Model Source	Description
Amazon Rerank 1.0	amazon-rerank-1-0	US, EU	`text`	`score`	/v1/rerank	Amazon	A reliable, high-performing reranking model backed by AWS infrastructure.
Nova Lite	nova-lite	US, EU	`text`, `image`, `video`	`text`	/v1/chat/completions	Amazon	A fast and cost-effective LLM.
Nova 2 Lite	nova-2-lite	US, EU	`text`, `image`, `video`	`text`	/v1/chat/completions	Amazon	A fast and cost-effective LLM that supports conversational chat, tool-calling, and advanced reasoning with extended context.
Nova Pro	nova-pro	US, EU	`text`, `image`, `video`	`text`	/v1/chat/completions	Amazon	A high-performance LLM designed for complex tasks.
Claude 3 Haiku	claude-3-haiku	EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	A fast and affordable LLM that supports chat and tool-calling.
Claude 3.5 Haiku	claude-3-5-haiku	US, EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	An affordable and straightforward LLM that supports chat and tool-calling.
Claude 3.7 Sonnet	claude-3-7-sonnet	US, EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	An intelligent and detail-oriented LLM that supports chat, tool-calling, and enhanced reasoning.
Claude 4 Sonnet	claude-4-sonnet	US, EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	An intelligent and detail-oriented LLM that supports chat, tool-calling, and enhanced reasoning.
Claude 4.5 Haiku	claude-4-5-haiku	US, EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	A state-of-the-art LLM that supports chat, tool-calling, and enhanced reasoning.
Claude 4.5 Sonnet	claude-4-5-sonnet	US, EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	A state-of-the-art LLM optimized for enterprise apps that supports chat, tool-calling, and enhanced reasoning.
Claude Sonnet 4.6	claude-sonnet-4-6	US, EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	A state-of-the-art LLM designed for complex tasks including data processing, sales forecasting, and content generation.
Claude Opus 4.5	claude-opus-4-5	US, EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	A next-generation, frontier LLM that supports chat, tool-calling, autonomous coding, effort control, and enhanced reasoning.
Claude Opus 4.6	claude-opus-4-6	US, EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	A next-generation, frontier LLM that supports chat, tool-calling, autonomous coding, effort control, and enhanced reasoning.
Claude Opus 4.7	claude-opus-4-7	US, EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	A next-generation, frontier LLM that supports chat, tool-calling, autonomous coding, effort control, and enhanced reasoning.
Claude Opus 4.8	claude-opus-4-8	US, EU	`text`, `image`	`text`	/v1/chat/completions	Anthropic	A next-generation, frontier LLM that supports chat, tool-calling, autonomous coding, effort control, and enhanced reasoning.
Cohere Embed Multilingual	cohere-embed-multilingual	US, EU	`text`	`embedding`	/v1/embeddings	Cohere	A state-of-the-art embedding model that supports multiple languages and can be helpful for developing RAG search.
Cohere Embed V4	cohere-embed-v4	US, EU	`text`	`embedding`	/v1/embeddings	Cohere	A state-of-the-art embedding model that supports over 100 languages and can be helpful for developing RAG search.
Cohere Rerank 3.5	cohere-rerank-3-5	US, EU	`text`	`score`	/v1/rerank	Cohere	A reranking model that offers enhanced reasoning, broad data compatibility, and multilingual support.
DeepSeek V3.2	deepseek-v3-2	US	`text`	`text`	/v1/chat/completions	DeepSeek	An open-weight LLM that supports conversational chat, tool-calling, and high-efficiency reasoning.
MiniMax M2	minimax-m2	US	`text`	`text`	/v1/chat/completions	MiniMax	An open-weight LLM that supports conversational chat, tool-calling, and programming tasks.
MiniMax M2.1	minimax-m2-1	US	`text`	`text`	/v1/chat/completions	MiniMax	An open-weight LLM that supports conversational chat, tool-calling, and long-horizon reasoning.
Kimi K2 Thinking	kimi-k2-thinking	US	`text`	`text`	/v1/chat/completions	Moonshot AI	An open-weight LLM that supports conversational chat, tool-calling, and chain-of-thought processing.
Kimi K2.5	kimi-k2-5	US	`text`	`text`	/v1/chat/completions	Moonshot AI	An open-weight LLM that supports conversational chat, tool-calling, and multimodal agentic workflows.
OpenAI gpt-oss-120b	gpt-oss-120b	US, EU	`text`	`text`	/v1/chat/completions	OpenAI	An open-weight LLM that supports chat and tool-calling.
Qwen3 235B	qwen3-235b	US	`text`	`text`	/v1/chat/completions	Qwen	An open-weight LLM that supports conversational chat, tool-calling, complex reasoning, and agentic coding.
Qwen3 Coder 480B	qwen3-coder-480b	US	`text`	`text`	/v1/chat/completions	Qwen	An open-weight LLM that supports conversational chat, tool-calling, and agentic coding.
Stable Image Core	stable-image-core	US	`text`	`image`	/v1/images/generations	Stability AI	A fast and cost-effective diffusion (image generation) model.
Stable Image Ultra	stable-image-ultra	US	`text`	`image`	/v1/images/generations	Stability AI	A state-of-the-art diffusion (image generation) model.
GLM 4.7	glm-4-7	US	`text`	`text`	/v1/chat/completions	Z.ai	An open-weight LLM that supports conversational chat, tool-calling, and stable multi-step reasoning.
GLM 4.7 Flash	glm-4-7-flash	US	`text`	`text`	/v1/chat/completions	Z.ai	An open-weight LLM that supports conversational chat, tool-calling, and low-latency agentic tasks.

Deprecated Models

Heroku is deprecating the following models and will each end-of-life on the dates listed. During the deprecation period, requests to these models return a warning header. Before the EOL date, we’ll convert model-specific plans for deprecated models to the standard plan. After the EOL date, requests to these models return HTTP 410.

Model	Model ID	Deprecation Date	EOL Date	Replacement
Claude 3.5 Sonnet Latest	claude-3-5-sonnet-latest	January 22, 2026	February 22, 2026	claude-4-6-sonnet
Claude 3.7 Sonnet	claude-3-7-sonnet	March 21, 2026	April 21, 2026	claude-4-6-sonnet
Claude 3.5 Haiku	claude-3-5-haiku	May 12, 2026	June 12, 2026	claude-4-5-haiku
DeepSeek V3.2	deepseek-v3-2	June 15, 2026	July 15, 2026	gpt-oss-120b
MiniMax M2	minimax-m2	June 15, 2026	July 15, 2026	gpt-oss-120b
MiniMax M2.1	minimax-m2-1	June 15, 2026	July 15, 2026	gpt-oss-120b
Kimi K2 Thinking	kimi-k2-thinking	June 15, 2026	July 15, 2026	gpt-oss-120b
Kimi K2.5	kimi-k2-5	June 15, 2026	July 15, 2026	gpt-oss-120b
Qwen3 235B	qwen3-235b	June 15, 2026	July 15, 2026	gpt-oss-120b
Qwen3 Coder 480B	qwen3-coder-480b	June 15, 2026	July 15, 2026	gpt-oss-120b
GLM 4.7	glm-4-7	June 15, 2026	July 15, 2026	gpt-oss-120b
GLM 4.7 Flash	glm-4-7-flash	June 15, 2026	July 15, 2026	gpt-oss-120b
Claude 3 Haiku	claude-3-haiku	August 3, 2026	September 3, 2026	claude-4-5-haiku
Claude 4 Sonnet	claude-4-sonnet	September 7, 2026	October 7, 2026	claude-4-6-sonnet

Categories

AI Models

Deprecated Models