AI Models

The Heroku Managed Inference and Agent add-on supports the following models. The add-on is hosted in two regions: us and eu. However, the add-on can be provisioned and accessed from apps in any Heroku region. Select a model to view information on rate limits, prompt caching, and implementation.

Model Documentation Model ID Region Supported Inputs Supported Outputs API Endpoint Model Source Description
Amazon Rerank 1.0 amazon-rerank-1-0 US, EU text score /v1/rerank Amazon A reliable, high-performing reranking model backed by AWS infrastructure.
Nova Lite nova-lite US, EU text, image, video text /v1/chat/completions Amazon A fast and cost-effective LLM.
Nova 2 Lite nova-2-lite US, EU text, image, video text /v1/chat/completions Amazon A fast and cost-effective LLM that supports conversational chat, tool-calling, and advanced reasoning with extended context.
Nova Pro nova-pro US, EU text, image, video text /v1/chat/completions Amazon A high-performance LLM designed for complex tasks.
Claude 3 Haiku claude-3-haiku EU text, image text /v1/chat/completions Anthropic A fast and affordable LLM that supports chat and tool-calling.
Claude 3.5 Haiku claude-3-5-haiku US, EU text, image text /v1/chat/completions Anthropic An affordable and straightforward LLM that supports chat and tool-calling.
Claude 3.5 Sonnet Latest claude-3-5-sonnet-latest US, EU text, image text /v1/chat/completions Anthropic A fast and affordable LLM that supports chat and tool-calling.
Claude 3.7 Sonnet claude-3-7-sonnet US, EU text, image text /v1/chat/completions Anthropic An intelligent and detail-oriented LLM that supports chat, tool-calling, and enhanced reasoning.
Claude 4 Sonnet claude-4-sonnet US, EU text, image text /v1/chat/completions Anthropic An intelligent and detail-oriented LLM that supports chat, tool-calling, and enhanced reasoning.
Claude 4.5 Haiku claude-4-5-haiku US, EU text, image text /v1/chat/completions Anthropic A state-of-the-art LLM that supports chat, tool-calling, and enhanced reasoning.
Claude 4.5 Sonnet claude-4-5-sonnet US, EU text, image text /v1/chat/completions Anthropic A state-of-the-art LLM optimized for enterprise apps that supports chat, tool-calling, and enhanced reasoning.
Claude Sonnet 4.6 claude-sonnet-4-6 US, EU text, image text /v1/chat/completions Anthropic A state-of-the-art LLM designed for complex tasks including data processing, sales forecasting, and content generation.
Claude Opus 4.5 claude-opus-4-5 US, EU text, image text /v1/chat/completions Anthropic A next-generation, frontier LLM that supports chat, tool-calling, autonomous coding, effort control, and enhanced reasoning.
Claude Opus 4.6 claude-opus-4-6 US, EU text, image text /v1/chat/completions Anthropic A next-generation, frontier LLM that supports chat, tool-calling, autonomous coding, effort control, and enhanced reasoning.
Cohere Embed Multilingual cohere-embed-multilingual US, EU text embedding /v1/embeddings Cohere A state-of-the-art embedding model that supports multiple languages and can be helpful for developing RAG search.
Cohere Embed V4 cohere-embed-v4 US, EU text embedding /v1/embeddings Cohere A state-of-the-art embedding model that supports over 100 languages and can be helpful for developing RAG search.
Cohere Rerank 3.5 cohere-rerank-3-5 US, EU text score /v1/rerank Cohere A reranking model that offers enhanced reasoning, broad data compatibility, and multilingual support.
DeepSeek V3.2 deepseek-v3-2 US text text /v1/chat/completions DeepSeek An open-weight LLM that supports conversational chat, tool-calling, and high-efficiency reasoning.
MiniMax M2 minimax-m2 US text text /v1/chat/completions MiniMax An open-weight LLM that supports conversational chat, tool-calling, and programming tasks.
MiniMax M2.1 minimax-m2-1 US text text /v1/chat/completions MiniMax An open-weight LLM that supports conversational chat, tool-calling, and long-horizon reasoning.
Kimi K2 Thinking kimi-k2-thinking US text text /v1/chat/completions Moonshot AI An open-weight LLM that supports conversational chat, tool-calling, and chain-of-thought processing.
Kimi K2.5 kimi-k2-5 US text text /v1/chat/completions Moonshot AI An open-weight LLM that supports conversational chat, tool-calling, and multimodal agentic workflows.
OpenAI gpt-oss-120b gpt-oss-120b US, EU text text /v1/chat/completions OpenAI An open-weight LLM that supports chat and tool-calling.
Qwen3 235B qwen3-235b US text text /v1/chat/completions Qwen An open-weight LLM that supports conversational chat, tool-calling, complex reasoning, and agentic coding.
Qwen3 Coder 480B qwen3-coder-480b US text text /v1/chat/completions Qwen An open-weight LLM that supports conversational chat, tool-calling, and agentic coding.
Stable Image Ultra stable-image-ultra US, EU text image /v1/images/generations Stability AI A state-of-the-art diffusion (image generation) model.
GLM 4.7 glm-4-7 US text text /v1/chat/completions Z.ai An open-weight LLM that supports conversational chat, tool-calling, and stable multi-step reasoning.
GLM 4.7 Flash glm-4-7-flash US text text /v1/chat/completions Z.ai An open-weight LLM that supports conversational chat, tool-calling, and low-latency agentic tasks.

Deprecated Models

The following models are being deprecated and will reach end-of-life on the dates listed below. During the deprecation period, requests to these models return a warning header. Prior to the EOL date, model-specific plans for deprecated models will be converted to the standard plan. After the EOL date, requests to these models return HTTP 410.

Model Model ID Deprecation Date EOL Date Replacement
Claude 3.5 Sonnet Latest claude-3-5-sonnet-latest January 22, 2026 February 22, 2026 claude-4-6-sonnet
Claude 3.7 Sonnet claude-3-7-sonnet March 21, 2026 April 21, 2026 claude-4-6-sonnet
Claude 3.5 Haiku claude-3-5-haiku May 12, 2026 June 12, 2026 claude-4-5-haiku