Managed Inference and Agents API with Cohere Embed v4

Last updated February 19, 2026

Cohere Embed v4 is an embedding model that converts text into dense vector representations. You can compare the resulting vectors, for example by cosine similarity, to perform tasks such as similarity search.

Model ID: cohere-embed-v4
Regions: us, eu
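
Comparing two embedding vectors usually comes down to a similarity measure. The following is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors are toy values standing in for real embeddings, and the function name is illustrative rather than part of any Heroku or Cohere library.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (assumed values).
query = [0.1, 0.9, 0.2]
doc_a = [0.2, 0.8, 0.1]
doc_b = [0.9, 0.1, 0.0]

# doc_a points in nearly the same direction as the query, so it scores higher.
print(cosine_similarity(query, doc_a) > cosine_similarity(query, doc_b))  # prints True
```

A score near 1 means the two vectors point in almost the same direction, and a score near 0 means they are close to orthogonal.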

When to Use This Model

Cohere Embed v4 is ideal for retrieval-augmented generation (RAG) tasks, where you search and retrieve relevant documents based on natural-language queries. This model is also useful for building recommendation systems and classification tools that require consistent text embeddings.
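
The retrieval step of a RAG pipeline can be sketched as ranking stored document vectors against a query vector and keeping the best matches. The example below is a hedged illustration: the `top_k` helper, the document texts, and the toy vectors are all hypothetical, and a production system would more likely use a vector database than a linear scan.

```python
import math


def _unit(vec):
    """Scale a non-zero vector to length 1 so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k documents most similar to the query."""
    q = _unit(query_vec)
    scored = [
        (i, sum(a * b for a, b in zip(q, _unit(d))))
        for i, d in enumerate(doc_vecs)
    ]
    # Highest similarity first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical corpus; the vectors are toy values, not real model output.
documents = ["Shipping policy", "Portland weather report", "General FAQ"]
doc_vectors = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.5, 0.5, 0.5]]
query_vector = [0.1, 0.9, 0.2]

for index, score in top_k(query_vector, doc_vectors, k=2):
    print(documents[index], round(score, 3))
```

The retrieved texts would then be passed to a generation model as context. That step is outside the scope of this sketch.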

Usage

Cohere Embed v4 follows our /v1/embeddings API schema.

This model is included as part of the standard plan for the Managed Inference and Agents add-on.

If you already have the standard plan provisioned, you do not need to create a new add-on. You can start using Cohere Embed v4 immediately by changing the "model" parameter in your API calls to "cohere-embed-v4".

If you are provisioning the add-on for the first time, run the following command to attach the standard plan to your app ($APP_NAME):

heroku addons:create heroku-inference:standard -a $APP_NAME

If you need to provision multiple instances of this add-on for the same app, you can specify a custom attachment name. See the Managing Add-ons article for details on using the --as flag.

With the add-on's config variables, such as INFERENCE_KEY and INFERENCE_URL, you can invoke the model in several ways:

Heroku CLI ai plugin (heroku ai:models:call)
curl
Python
Ruby
JavaScript

Multimodal Support

Supported inputs: text
Supported outputs: embedding

Rate Limits

Maximum requests per minute: 500
Maximum tokens per minute: 800,000
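
To stay under the requests-per-minute limit, a client can throttle itself before sending each call. The class below is a minimal sketch of a sliding-window limiter. It assumes the limit applies over any 60-second window, which this article does not specify, and it does not track the tokens-per-minute limit. The class name and parameters are illustrative.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `window` seconds (client side)."""

    def __init__(self, max_calls=500, window=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock      # injectable so the limiter can be tested
        self.sleep = sleep
        self.calls = deque()    # timestamps of recent calls, oldest first

    def _purge(self, now):
        # Forget calls that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()

    def acquire(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        self._purge(now)
        if len(self.calls) >= self.max_calls:
            # Wait until the oldest recorded call leaves the window.
            self.sleep(self.window - (now - self.calls[0]))
            now = self.clock()
            self._purge(now)
        self.calls.append(now)
```

Calling `limiter.acquire()` immediately before each request keeps a single-process client at or below 500 requests per minute. Several processes sharing one key would need a shared limiter instead.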

Example curl Requests

Retrieve your API credentials from the app's config variables and export them to your shell:

export INFERENCE_KEY=$(heroku config:get -a $APP_NAME INFERENCE_KEY)
export INFERENCE_URL=$(heroku config:get -a $APP_NAME INFERENCE_URL)

Text to Embedding

curl "$INFERENCE_URL/v1/embeddings" \
  -H "Authorization: Bearer $INFERENCE_KEY" \
  -H "Content-Type: application/json" \
  -d @- <<EOF
{
  "input": ["Hello, I am a blob of text.", "How's the weather in Portland?"],
  "model": "cohere-embed-v4",
  "input_type": "search_document",
  "encoding_format": "raw"
}
EOF
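
The same request can be sketched in Python with only the standard library. This is an illustration rather than an official client: the function names are hypothetical, the explicit Content-Type header is an addition on the assumption that the endpoint accepts JSON bodies, and the response is returned as parsed JSON because its exact shape is not described in this article.

```python
import json
import os
import urllib.request


def build_embedding_request(base_url, api_key, texts, input_type="search_document"):
    """Build a POST request carrying the same fields as the curl example."""
    payload = {
        "input": texts,
        "model": "cohere-embed-v4",
        "input_type": input_type,
        "encoding_format": "raw",
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed to be accepted
        },
        method="POST",
    )


def embed(texts):
    """Send the request using INFERENCE_URL and INFERENCE_KEY from the environment."""
    request = build_embedding_request(
        os.environ["INFERENCE_URL"], os.environ["INFERENCE_KEY"], texts
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)  # parsed JSON; inspect it to locate the vectors


# Example call (requires network access and exported credentials):
# result = embed(["Hello, I am a blob of text.", "How's the weather in Portland?"])
```

Separating request construction from sending makes the payload easy to inspect or unit test without calling the API.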
