Apache Kafka on Heroku

Last updated June 05, 2025

Kafka Concepts
Preparing Your Development Environment
Plans and Configurations
Provisioning the Add-On
Sharing Kafka Between Applications
Viewing Cluster Information
Upgrading an Apache Kafka on Heroku Plan
Managing Kafka
Kafka Versions and Clients
Connecting to a Kafka Cluster
Connecting to a Private or Shield Kafka Cluster From an External Resource
Monitoring Via Logs
Regions
Node Failure Exercise
Removing the add-on

Apache Kafka on Heroku is an add-on that provides Kafka as a service with full integration into the Heroku platform.

Apache Kafka is a distributed commit log for fast, fault-tolerant communication between producers and consumers using message-based topics. Kafka provides the messaging backbone for building a new generation of distributed applications capable of handling billions of events and millions of transactions. Kafka is designed to move large volumes of ephemeral data with a high degree of reliability and fault tolerance.

Kafka enables you to easily design and implement architectures for many important use cases, such as:

Elastic Queuing

Kafka makes it easy for systems to accept large volumes of inbound events without putting volatile scaling demands on downstream services. These downstream services can pull from event streams in Kafka when they have capacity, instead of reacting to the “push” of events. This improves scaling, the handling of load fluctuations, and general stability.

Data Pipelines & Analytics

Applications at scale often need analytics or ETL pipelines to get the most value from their data. Kafka’s immutable event streams enable developers to build highly parallel data pipelines for transforming and aggregating data. This means that developers can achieve much faster and more stable data pipelines than would have been possible with batch systems or mutable data.

Microservice Coordination

Many applications move to microservice-style architectures as they scale, and run up against the challenges that microservices entail: service discovery, dependencies and ordering of service availability, and service interaction. Applications that use Kafka for communication between services can simplify these design concerns dramatically. Kafka makes it easy for a service to bootstrap into the microservice network, and discover which Kafka topics to send and receive messages on. Ordering and dependency challenges are reduced, and topic-based coordination lowers the overhead of service discovery when messages between services are durably managed in Kafka.

Kafka Concepts

A Kafka cluster is composed of a number of brokers, or instances running Kafka. The number of brokers in a cluster can be scaled to increase capacity, resilience, and parallelism.

Producers are clients that write to Kafka brokers, while consumers are clients that read from Kafka brokers.

Brokers manage streams of messages (events sent to Kafka) in topics. Topics are configured with a range of options (retention or compaction, replication factor, and so on) depending on the data they’re meant to support.

Topics are composed of a number of partitions, discrete subsets of a topic used to balance the concerns of parallelism and ordering. Increased numbers of partitions can increase the number of producers and consumers that can work on a given topic, increasing parallelism and throughput. Messages within a partition are ordered, but the ordering of messages across partitions isn’t guaranteed. Balancing needs of parallelism and ordering is key to proper partition configuration for a topic.

As an example, consider a topic that uses a hash of a user ID as its partition key. This guarantees that any consumer sees updates for a given user in the order they occur, but updates for different users (potentially managed on different partitions) can arrive ahead of or behind each other.
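
The following sketch illustrates this behavior in plain Python, without a Kafka client. It assumes a CRC32 hash as a stand-in for the partitioner (real Kafka clients use their own hash function), and the user IDs and events are hypothetical.

```python
import zlib

NUM_PARTITIONS = 32  # default partitions per topic on Heroku


def partition_for(user_id, num_partitions=NUM_PARTITIONS):
    """Map a user ID to a partition with a stable hash (CRC32 stand-in)."""
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


# Hypothetical stream of (user_id, event) updates, in the order produced.
events = [
    ("user-1", "signup"),
    ("user-2", "signup"),
    ("user-1", "add-to-cart"),
    ("user-2", "logout"),
    ("user-1", "checkout"),
]

# Append each event to the log of the partition its key hashes to.
partitions = {}
for user_id, event in events:
    partitions.setdefault(partition_for(user_id), []).append((user_id, event))


def events_for(user_id):
    """Read one user's events back from the partition that holds them."""
    log = partitions.get(partition_for(user_id), [])
    return [event for key, event in log if key == user_id]
```

Because every update for a given user lands in the same partition, events_for returns that user's events in produce order. Nothing in the sketch orders one user's events relative to another user's.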

Each message in a partition has an offset, or numeric identifier that denotes its position in the sequence. As of Kafka 0.10, messages can also have an optional timestamp, which can reflect either the time the message was created or the time the message was written to Kafka.

The Apache Kafka project provides a more in-depth discussion in its introduction documentation.

Preparing Your Development Environment

Apache Kafka on Heroku requires the use of our CLI plugin. This functionality will be merged into the Heroku CLI in the future, but for now, issue the following command:

$ heroku plugins:install heroku-kafka

The Kafka CLI plugin requires Python, and doesn’t work on Windows without additional configuration. To set up the plugin on Windows:

Install Python 2.7
Set PATH and PYTHONPATH for Python 2.7
Install node 8.x
Open cmd.exe as administrator and run npm install --global windows-build-tools
Install .NET Framework
Install Visual C++ Build Tools
Run heroku plugins:install heroku-kafka

Local Testing and Development

Local testing and developing with Kafka can require some care, due to its clustered configuration. The kafka-docker setup offers a good way to run a local cluster, provided that it’s configured with a low enough memory footprint to allow for comfortable local operation. Heroku is working to provide a range of development-centric plans in the near future.

Plans and Configurations

A range of plans are currently available for the platform’s runtimes. The plans now available are dedicated clusters, optimized for high throughput and high volume. We continue to extend this range of plans to cover a broader set of needs, and to make evented architectures available for applications at all stages of development. You can find a list of all the plans on the Elements Marketplace.

Multi-tenant Plans

See Multi-Tenant Apache Kafka on Heroku for Basic multi-tenant plans.

Common Runtime Plans

Plan Name	Capacity	Max Retention	vCPU	RAM	Clusters
standard-0	150 GB	2 weeks	4	16 GB	3 kafka, 5 zookeeper
standard-1	300 GB	2 weeks	4	16 GB	3 kafka, 5 zookeeper
standard-2	900 GB	2 weeks	4	16 GB	3 kafka, 5 zookeeper
extended-0	400 GB	6 weeks	4	16 GB	8 kafka, 5 zookeeper
extended-1	800 GB	6 weeks	4	16 GB	8 kafka, 5 zookeeper
extended-2	2400 GB	6 weeks	4	16 GB	8 kafka, 5 zookeeper

Private Spaces Plans

Plan Name	Capacity	Max Retention	vCPU	RAM	Clusters
private-standard-0	150 GB	2 weeks	4	16 GB	3 kafka, 5 zookeeper
private-standard-1	300 GB	2 weeks	4	16 GB	3 kafka, 5 zookeeper
private-standard-2	900 GB	2 weeks	4	16 GB	3 kafka, 5 zookeeper
private-extended-0	400 GB	6 weeks	4	16 GB	8 kafka, 5 zookeeper
private-extended-1	800 GB	6 weeks	4	16 GB	8 kafka, 5 zookeeper
private-extended-2	2400 GB	6 weeks	4	16 GB	8 kafka, 5 zookeeper

Shield Spaces Plans

Plan Name	Capacity	Max Retention	vCPU	RAM	Clusters
shield-standard-0	150 GB	2 weeks	4	16 GB	3 kafka, 5 zookeeper
shield-standard-1	300 GB	2 weeks	4	16 GB	3 kafka, 5 zookeeper
shield-standard-2	900 GB	2 weeks	4	16 GB	3 kafka, 5 zookeeper
shield-extended-0	400 GB	6 weeks	4	16 GB	8 kafka, 5 zookeeper
shield-extended-1	800 GB	6 weeks	4	16 GB	8 kafka, 5 zookeeper
shield-extended-2	2400 GB	6 weeks	4	16 GB	8 kafka, 5 zookeeper

See the documentation on Migrating between Basic and dedicated plan types.

Provisioning the Add-On

Apache Kafka on Heroku is managed in the same manner as other add-ons on the platform. A Kafka cluster can be provisioned for a Heroku application via the CLI:

$ heroku addons:create heroku-kafka:standard-0 -a kafka-demo
Creating kafka-animated-39618... done
Adding kafka-animated-39618 to kafka-demo... done
The cluster should be available in 15-45 minutes.
Run `heroku kafka:wait` to wait until the cluster is ready.

Use `heroku addons:docs heroku-kafka` to view documentation.

New clusters take some time to become available because Kafka is a large-scale, highly available service. You can track the progress by typing heroku kafka:wait.

Usage of Zookeeper beyond its role in supporting Kafka isn’t recommended, as other uses can degrade the operational stability of your services. In Private Spaces (not the Common Runtime), access to the Zookeeper that is associated with Kafka can be enabled at add-on creation time. This can be done by passing an option with the add-on creation command: heroku addons:create heroku-kafka -- --enable-zookeeper. Zookeeper access can also be enabled or disabled after creation via the following commands: heroku kafka:zookeeper enable or heroku kafka:zookeeper disable.

In Shield Spaces, Zookeeper access isn’t allowed.

After Kafka has been provisioned, the Kafka config vars are available in the app configuration. This can be confirmed using the heroku config:get KAFKA_URL command. See Connecting to a Kafka cluster for how to connect to your cluster.

$ heroku config:get KAFKA_URL

Kafka is available in the Common Runtime, Private Spaces, and Shield Spaces. Provisioning Kafka for an application in a Private or Shield Space creates a Kafka cluster in an isolated data resource network attached to that Space.

Sharing Kafka Between Applications

Kafka works well when shared across many different code bases and projects within the same group. We recommend structuring your Kafka usage as a set of independent producers and consumers, set up as either multiple applications, or as process types of one or more applications.

To share a cluster that belongs to another application, attach the add-on to the current application:

$ heroku addons:attach my-originating-app::KAFKA -a this-app

Viewing Cluster Information

You can examine the current state of your cluster by typing:

$ heroku kafka:info

This command provides you with information on the resource’s name, creation date, plan, version, status, topics, traffic, and active consumers.

More detailed per-topic throughput information for your cluster is available via the following:

$ heroku kafka:topics

Maintenance

From time to time, Heroku performs maintenance tasks on an Apache Kafka on Heroku cluster. Typical tasks include updating the underlying infrastructure of the cluster. Heroku handles these maintenance tasks automatically. Heroku doesn’t schedule maintenance events during an app’s maintenance window. We send a notice when the maintenance starts and ends. See the Apache Kafka on Heroku Maintenance FAQ for more information.

Upgrading an Apache Kafka on Heroku Plan

You can upgrade your Apache Kafka on Heroku plan by using the heroku addons:upgrade command from the CLI.

This command can be used to:

Upgrade or downgrade between multi-tenant Kafka Basic plans.
Upgrade or downgrade between dedicated cluster plans.

See the differences between multi-tenant and dedicated clusters.

This command can’t be used to:

Upgrade or downgrade between a multi-tenant plan and a dedicated plan. This upgrade requires a migration.
Upgrade or downgrade between Common Runtime plans and Private or Shield tier plans.

Downgrading to a smaller plan also isn’t allowed if the Kafka cluster’s data size is over the limit of the plan you want to downgrade to.

To upgrade your Apache Kafka on Heroku plan, first, find the resource name of the Kafka cluster you want to upgrade. The resource name is a globally unique name of the cluster across all of your apps and add-ons:

$ heroku kafka:info -a example-app
=== kafka-animated-12345
Plan:                 heroku-kafka:standard-0
Status:               available
Version:              3.7.1
Created:              2022-11-30T13:02:37.320+00.00
Topics:               84 topics, see heroku kafka:topics
Partitions:           [··········] 414 / 12000 partition replicas (partitions × replication factor)
Messages:             0 messages/s
Traffic:              32 bytes/s in / 166 bytes/s out
Data Size:            [··········] 68.38 MB / 150.00 GB (0.04%)
Add-on:               kafka-convex-12345

In the following example, kafka-animated-12345 is upgraded from a standard-0 plan to an extended-0 plan. Remember, kafka-animated-12345 is the name of the Kafka cluster and not the application in this case:

$ heroku addons:upgrade kafka-animated-12345 extended-0 -a example-app
Changing kafka-convex-12345 on example-app from heroku-kafka:standard-0 to heroku-kafka:extended-0... done, ($1800.00/month)
Kafka cluster is being upgraded, and will be ready shortly.
Please use `heroku kafka:wait` to monitor the status of your upgrade.

The process of scaling up or down plan levels of Apache Kafka on Heroku is performed in-place. However, there are a few circumstances where actual data migration is required. If there are new brokers to add or remove during an upgrade, we rebalance the partitions between them.

You can follow the upgrade process using heroku kafka:wait:

$ heroku kafka:wait -a example-app
Waiting for cluster kafka-convex-12345... ⡿ upgrading

Alternatively, you can also use heroku kafka:info:

$ heroku kafka:info -a example-app
=== kafka-animated-12345
Plan:                 heroku-kafka:extended-0
Status:               upgrading
Version:              3.7.1
Created:              2022-11-30T13:02:37.320+00.00
Topics:               84 topics, see heroku kafka:topics
Partitions:           [··········] 414 / 12000 partition replicas (partitions × replication factor)
Messages:             0 messages/s
Traffic:              25 bytes/s in / 136 bytes/s out
Data Size:            [··········] 68.38 MB / 150.00 GB (0.04%)
Add-on:               kafka-convex-12345

The time it takes to complete an upgrade depends on the difference between the plans and the size of the stream volume. If the upgrade or downgrade is between levels of the same tier (for example, standard-0 to standard-1), the upgrade is almost immediate. If the upgrade or downgrade is between different tiers (for example, standard to extended), we create or remove the extra brokers that each plan offers and rebalance partitions between the final number of brokers. There’s no downtime in this process, but it does take time to complete depending on the size of the cluster.

Managing Kafka

Topics are the structured representation of messages, and serve as the intermediary between producers and consumers. Aside from the name of a topic in Kafka, there are a few configurable properties that define how data flows through a topic.

The properties include the replication factor, the number of logical partitions, and either compaction or time-based retention. The Heroku-provided default settings are suitable for many applications, but for topics expected to handle billions of events per day, consider doing further research before entering production.

Configuration Defaults and Limits

Parameter	Default	Lower Limit	Upper Limit
Replication	3	3	Number of brokers in cluster
Retention Period	24 hours	6 hours	standard: 2 weeks, extended: 6 weeks
Partitions per Topic	32	1	256
Partitions per Cluster	NA	NA	4000 x Number of brokers in cluster

Care in topic design is encouraged. Parameters like retention or compaction can be changed relatively easily, and replication can be changed with some additional care, but partitions can’t currently be changed after creation. Cleanup policy is configured per topic, so different topics within a cluster can use time-based retention, compaction, or both.

Understanding Topics

Kafka topics can be created and managed via the web dashboard and the CLI. This section covers the basics of CLI-based topic management.

Full CLI documentation can always be accessed through the Heroku CLI itself:

$ heroku help kafka

Automatic topic creation, or “create topic on first write,” isn’t currently available on Heroku.

You can create and destroy a topic with the following CLI commands:

$ heroku kafka:topics:create my-cool-topic --partitions 100
$ heroku kafka:topics:destroy my-cool-topic

You can list all topics on a cluster with the following CLI command:

$ heroku kafka:topics

You can examine information about a topic with the following CLI command:

$ heroku kafka:topics:info my-cool-topic

To facilitate testing and inspection of topics, you can write to or tail topics from the CLI.

kafka:write and kafka:tail only work in Private and Shield Spaces if you created IP rules for allowed sources.

You can write a new message to a topic with the following CLI command:

$ heroku kafka:topics:write my-cool-topic MESSAGE

You can subscribe to a topic and read new messages from it with the following CLI command:

$ heroku kafka:topics:tail my-cool-topic

Basic multi-tenant Kafka plans require a prefix on topics and consumer groups. See the differences between multi-tenant and dedicated. When integrating Kafka consumers, ensure topics and consumer groups are prefixed with the value of the KAFKA_PREFIX environment variable. Otherwise, messages aren’t received, and errors like Broker: Topic authorization failed or Broker: Group authorization failed can appear in Kafka debug events.
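
As a minimal sketch of this prefixing, the hypothetical helper below applies the value of KAFKA_PREFIX to a topic or consumer group name. The helper name and the sample names are assumptions for illustration. On dedicated plans, where the variable isn't set, names pass through unchanged.

```python
import os


def prefixed(name, prefix=""):
    """Return name with the multi-tenant prefix applied exactly once."""
    if prefix and not name.startswith(prefix):
        return prefix + name
    return name


# KAFKA_PREFIX is only provided on Basic plans, so default to an empty string.
prefix = os.environ.get("KAFKA_PREFIX", "")

# Hypothetical topic and consumer group names used by the application.
topic = prefixed("my-cool-topic", prefix)
group = prefixed("my-consumer-group", prefix)
```

Pass the resulting topic and group values to your client library instead of the bare names.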

Understanding Partitions

Kafka topics are configured with a number of logical partitions. These divide the log into shards, each of which can be independently distributed around the cluster and consumed from.

However, Kafka’s ordering guarantee only applies within an individual partition. Messages in a single partition are consumed in the order they’re produced, but messages that span multiple partitions can be interleaved.

Most consumer libraries allocate a single consumer thread per partition. Therefore the number of partitions you choose for your topics, and how you deliver messages to them, can be crucial to the scalability of your application.

We recommend using higher numbers of partitions if your consumers are relatively “slow” compared to your producers (for example, if they’re writing into an external database). As noted in the earlier section on defaults and limits, the current plans have a default of 32 partitions per topic, a maximum of 256 partitions per topic, and a maximum of 4000 partitions times the number of brokers in a cluster (for example, 12,000 partitions for standard tier clusters, and 32,000 partitions for extended tier clusters).
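
The limits above can be checked with a little arithmetic before creating a topic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the cluster limit counts partition replicas (partitions × replication factor), as the heroku kafka:info output shown earlier suggests.

```python
MAX_PARTITIONS_PER_TOPIC = 256
PARTITIONS_PER_BROKER = 4000

# Broker counts from the plan tables: 3 for standard, 8 for extended.
BROKERS = {"standard": 3, "extended": 8}


def cluster_partition_limit(tier):
    """Cluster-wide limit: 4000 times the number of brokers."""
    return PARTITIONS_PER_BROKER * BROKERS[tier]


def partition_replicas(partitions, replication_factor):
    """Partition replicas consumed by one topic."""
    return partitions * replication_factor


def can_create_topic(tier, partitions, replication_factor, replicas_in_use=0):
    """Check the per-topic limit and the remaining cluster-wide headroom."""
    if not 1 <= partitions <= MAX_PARTITIONS_PER_TOPIC:
        return False
    needed = partition_replicas(partitions, replication_factor)
    return replicas_in_use + needed <= cluster_partition_limit(tier)
```

For example, a standard cluster that already holds 414 partition replicas has room for a topic with the default 32 partitions and a replication factor of 3, because 414 + 96 is well under 12,000.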

Cleanup Policy

Kafka supports two primary modes of cleanup on topics: time-based retention, and log compaction. As of version 0.10.1.0, Kafka supports the mixed use of these modes, for example, a topic can have time-based retention, compaction, or both modes enabled.

Time-Based Retention

This mode is the default mode of cleanup for Kafka topics. As messages are written to the partitions of a topic, they’re annotated with time values and written to log segments. The log segments are then periodically processed, and messages that have outlived the retention window are cleaned up and removed.

$ heroku kafka:topics:retention-time my-cool-topic '36 hours'

Apache Kafka on Heroku has a minimum retention time of 6 hours, and a maximum of 2 weeks for standard plans and 6 weeks for extended plans. The default retention time when creating a new topic is 24 hours.
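
If you compute retention windows in application or deployment code, it can help to validate them against these limits first. The following is a sketch under stated assumptions: the retention_ms helper is hypothetical, and it returns milliseconds because that's the unit of Kafka's retention.ms topic setting, not because the Heroku CLI requires it.

```python
from datetime import timedelta

MIN_RETENTION = timedelta(hours=6)
DEFAULT_RETENTION = timedelta(hours=24)
MAX_RETENTION = {
    "standard": timedelta(weeks=2),
    "extended": timedelta(weeks=6),
}


def retention_ms(requested, tier):
    """Validate a retention window for a plan tier and return it in ms."""
    if requested < MIN_RETENTION:
        raise ValueError("retention is below the 6 hour minimum")
    if requested > MAX_RETENTION[tier]:
        raise ValueError("retention exceeds the maximum for the %s tier" % tier)
    # Integer division by one millisecond avoids floating-point rounding.
    return requested // timedelta(milliseconds=1)
```

A 36 hour window, as in the command above, passes validation on either tier. A 3 week window is rejected for standard plans but accepted for extended plans.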

Log Compaction

Kafka supports an alternative configuration on topics known as log compaction. This configuration changes the semantics of a topic such that it keeps only the most recent message for a given key, tombstoning any predecessor. This allows for the creation of a value-stream, or table-like view of data, and is a powerful construct in modeling your data and systems.

It’s important to note that compacted topics (without time-based retention enabled) don’t automatically reclaim all storage space over time. The most recent version of a message for a given key persists until it’s actively tombstoned, either by a new message of that key being written to the topic, or an explicit tombstone being written for that key. This can cause compacted topics with unbounded keyspaces to experience unbounded growth over time, driving unexpected resource utilization. Compacted topics must be used and monitored carefully, in order to stay within plan limits.

It’s also important to note that older messages of a given key, though tombstoned, aren’t removed until the log-cleaner process clears them. This process is asynchronous, and multiple versions of a given key remain in the log until the current segment is processed.

$ heroku kafka:topics:compaction my-cool-topic enable
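
To make the compaction semantics concrete, the sketch below models a partition as a list of (key, value) records, with a value of None standing in for a tombstone. It's a simplified model and not Kafka's implementation: it shows the end state after the log cleaner has processed the whole log and removed tombstones, and it ignores segments and timing.

```python
def compact(log):
    """Keep only the latest record per key, dropping tombstoned keys.

    log is a list of (key, value) tuples in offset order. A value of
    None represents a tombstone for that key.
    """
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones

    survivors = sorted(latest.items(), key=lambda item: item[1][0])
    return [(key, value) for key, (offset, value) in survivors if value is not None]


# Hypothetical log: "a" is updated, "b" is tombstoned, "c" is written once.
log = [("a", 1), ("b", 2), ("a", 3), ("b", None), ("c", 4)]
compacted = compact(log)
```

Here compacted holds one record each for "a" and "c". A log with an unbounded set of distinct keys never shrinks under this model, which is why compacted topics need monitoring against plan limits.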

Replication

The replication factor of a topic determines the number of replicas or copies that are maintained across the brokers in the cluster. Replication provides additional durability and fault tolerance to the data in the Kafka cluster, while also increasing the volume of data managed over the cluster.

Kafka is configured to require a minimum replication factor of 3 on the standard and extended plans. Replication can be set as high as the number of brokers in the Kafka cluster.

$ heroku kafka:topics:replication-factor my-cool-topic 3

Increasing the replication factor of an existing topic can put additional load on your Kafka cluster, as it works to create additional replicas across the available brokers. This also increases the size of the data maintained in the cluster, which must be considered in the context of the plan capacity.

Producer Acknowledgment Configuration

Related to the number of replicas is the producer acknowledgment configuration, or producer acks. This determines how many in sync replicas must acknowledge a write before it’s considered successful. This setting, in conjunction with the replication factor, influences the latency and durability guarantees of your writes. This configuration resides in the application code you use to produce to Kafka, but is important to consider alongside your replication.

A configuration of acks=0 means that a producer doesn’t wait for any confirmation from the broker before continuing. This can reduce latency in producers, but no guarantees are made that the server recorded the message, and clients aren’t able to retry their writes.

A configuration of acks=1 means that a producer waits for confirmation of the write from only the broker that the message is initially written to. Producer retries can happen in this configuration, but data loss can still occur if the broker instance fails before the data is synced to replicas on other brokers.

A configuration of acks=all or acks=-1 requires that all in-sync replicas acknowledge the write before the producer continues. A high replication factor can increase the latency that this entails, but this yields the strongest guarantee of data resilience for writes.
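
The three settings can be summarized as the number of replica acknowledgments a write waits for. The helper below is a hypothetical illustration of that relationship, not part of any client library. In a real producer you set acks in the client configuration.

```python
def required_acks(acks, in_sync_replicas):
    """Return how many replica acknowledgments a write waits for.

    acks is the producer setting as a string: "0", "1", "all", or "-1".
    in_sync_replicas is the current number of in-sync replicas for the
    partition being written to, including the leader.
    """
    if acks == "0":
        return 0  # fire and forget: no confirmation at all
    if acks == "1":
        return 1  # only the broker that receives the write confirms
    if acks in ("all", "-1"):
        return in_sync_replicas  # every in-sync replica confirms
    raise ValueError("unsupported acks setting: %r" % acks)
```

With the default replication factor of 3 and all replicas in sync, acks=all waits for three acknowledgments, while acks=1 waits for one.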

Understanding ACLs

For customers using single-tenant (standard and extended) Apache Kafka on Heroku plans, we have a published, supported ACL policy. We don’t change these ACLs without publishing a new changelog update.

Your principal gets access to the following:

Read,Write,Describe on name * for resource type Topic
Read,Write,Describe,Delete on name * for resource type Group
Describe,Write on name * for resource type TransactionalId

Kafka Versions and Clients

Heroku currently offers Apache Kafka version 3.7.1 as the default.

Available Kafka Versions

Major Version	Minor Version	Status	EOL Date
2.7	2.7.1	EOL	2022-06-15
2.8	2.8.2	Available	TBD
3.7	3.7.1	Available	TBD

As a general rule, the version of the client library used must be equal to or less than the version on the cluster.

Kafka supports SSL to encrypt and authenticate connections, and this is the only connection mode supported in the Common Runtime. Because of this, you must use a library that supports SSL encryption and client certificates. In Private Spaces, plaintext connections can optionally be used, as described below. In Shield Spaces, plaintext connections aren’t allowed.

Version Lifecycle

The Apache Kafka project releases a new major version approximately every four months (3x/year). In accordance with this release schedule, we make all versions published in the last year available for new add-ons. We only support Apache Kafka versions that the upstream project maintains. One year after the most recent point release is available, we mark that major version deprecated and stop allowing new add-ons to use this version.

After a version is deprecated, your cluster continues to operate normally. However, we have found that running older versions is risky as deprecated versions don’t receive bug fixes or security patches and are no longer supported by the community. Heroku notifies customers via email about the deprecation process for their affected clusters.

We recommend that users regularly evaluate their add-on version and plan to upgrade their cluster at least once a year. By keeping up with this schedule, your cluster receives important bug fixes and notable improvements in reliability.

Upgrading Kafka Versions

To upgrade the version of a dedicated Kafka cluster, use the following from the command line:

$ heroku kafka:upgrade --version MAJOR_VERSION_NUMBER

It’s important to note that the upgrade command advances the version to the latest supported stable minor version. For example, currently, heroku kafka:upgrade --version 3.7 upgrades a cluster to version 3.7.1.

This command upgrades the Kafka brokers in the cluster to the new version. Upgrading the cluster involves several process restarts of the brokers, but those restarts are done one at a time. Assuming your application can handle broker restarts, the upgrade is relatively seamless.

We recommend reading the article about robust usage of Kafka to ensure proper handling of broker restarts. During the upgrade period, your cluster is running mixed versions. For example, you can have one broker on 0.10.2.1 and two brokers on 1.0.2.

Kafka strictly promises backward protocol compatibility: that is, you can always use a client protocol version older than the version(s) your cluster is running. You can’t, however, use a newer client version than the one your cluster is running.

During the upgrade, you must keep your client on a version equal to or lower than the version you’re upgrading from. After the upgrade is finished (denoted by the status in heroku kafka:wait), you can start using a new protocol version and any new features it supports. It isn’t required (but is recommended) to keep your client on the same protocol version your cluster is running.
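
The compatibility rule can be expressed as a simple version comparison. The sketch below uses hypothetical helper names and assumes purely numeric, dot-separated version strings like those in the table of available versions.

```python
def parse_version(version):
    """Turn a version string such as "3.7.1" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def client_is_compatible(client_version, cluster_version):
    """A client must be at a version equal to or lower than the cluster's."""
    return parse_version(client_version) <= parse_version(cluster_version)


def client_ok_during_upgrade(client_version, upgrading_from):
    """While brokers run mixed versions, compare against the older version."""
    return client_is_compatible(client_version, upgrading_from)
```

For example, during an upgrade from 2.8.2 to 3.7.1, a client at 2.8.2 passes the check, while a client already at 3.7.1 has to wait until the upgrade finishes.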

It’s recommended you keep your clusters up to date with the latest recommended version from Heroku, which is 3.7.1 at this time. We perform heavy testing and validation work on all new Kafka releases, including testing the upgrade procedure, and are careful to only recommend versions that we trust.

Language Support

There are a great number of client libraries for Kafka, across many languages. The libraries listed below are ones that we’ve either helped to improve, or that we feel have up-to-date support for Kafka features, including critical capabilities like support for SSL.

Basic multi-tenant Kafka plans require a prefix on topics and consumer groups. See the differences between multi-tenant and dedicated. When integrating Kafka consumers, ensure topics and consumer groups are prefixed with the value of the KAFKA_PREFIX environment variable. Otherwise, messages aren’t received, and errors like Broker: Topic authorization failed or Broker: Group authorization failed can appear in Kafka debug events.

Using Kafka in Ruby Applications

We recommend using rdkafka-ruby when connecting to Kafka from Ruby.

This example shows how to write and consume messages in Ruby.

When using rdkafka-ruby, you must specify "enable.ssl.certificate.verification" => false in your client configuration to connect to Apache Kafka on Heroku.

Using Kafka in Java Applications

We recommend using kafka-clients when connecting to Kafka from Java. See the Kafka project documentation for more information.

This example shows how to write and consume messages in Java. This section of the demo app provides a good example of working with the TrustStore and KeyStore for JVM applications.

On recent versions of the Java Kafka clients, you must specify ssl.endpoint.identification.algorithm= (empty string) to connect to Apache Kafka on Heroku.

Using Kafka in Go Applications

We recommend using sarama when connecting to Kafka from Go.

This example shows how to write and consume messages in Go.

Due to the strictness of the tls package in Go, you must set InsecureSkipVerify to true.

Using Kafka in Python Applications

We recommend using confluent-kafka-python when connecting to Kafka in Python. This client library wraps the C/C++ Kafka library.

We also recommend the kafka-python library, especially for scenarios where wrapping the C/C++ libraries is less than ideal. Heroku has created the kafka-helper library to make kafka-python easier to use.

Using Kafka in Node.js Applications

We recommend using kafkajs when connecting to Kafka from Node.js applications.

Using Kafka in PHP Applications

The rdkafka extension for PHP is available on Heroku. It provides bindings for the librdkafka library and supports SSL.

Using Kafka in Other Languages or Frameworks

The Confluent Client wiki is a good source for clients and code examples for other languages and frameworks.

Connecting to a Kafka Cluster

All connections to Kafka support SSL encryption and authentication. If the cluster is provisioned in a Private Space, you have the option to connect via plaintext. In Shield Spaces, plaintext connections aren’t allowed.

Connecting over SSL means all traffic is encrypted and authenticated via an SSL client certificate.

As with all Heroku add-ons, important environment and configuration variables can change. It’s important to design your application to handle updates to these values, especially if these resources are being accessed from outside of Heroku.

To connect over SSL, use the following environment variables:

KAFKA_URL: A comma-separated list of SSL URLs to the Kafka brokers making up the cluster.
KAFKA_TRUSTED_CERT: The brokers’ SSL certificate (in PEM format), to check that you’re connecting to the right servers.
KAFKA_CLIENT_CERT: The required client certificate (in PEM format) to authenticate clients against the broker.
KAFKA_CLIENT_CERT_KEY: The required client certificate key (in PEM format) to authenticate clients against the broker.

Kafka clusters require authenticating using the provided client certificate. Any requests not using the client certificate are denied.
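
Most client libraries expect a list of host:port pairs rather than URLs, so a common first step is to parse KAFKA_URL. The sketch below uses only the Python standard library. The helper names are hypothetical, and how you pass the certificates to your client depends on the library you use.

```python
import os
from urllib.parse import urlparse

REQUIRED_VARS = (
    "KAFKA_URL",
    "KAFKA_TRUSTED_CERT",
    "KAFKA_CLIENT_CERT",
    "KAFKA_CLIENT_CERT_KEY",
)


def bootstrap_servers(kafka_url):
    """Convert a comma-separated list of broker URLs to host:port strings."""
    servers = []
    for url in kafka_url.split(","):
        parsed = urlparse(url.strip())
        servers.append("%s:%s" % (parsed.hostname, parsed.port))
    return servers


def missing_vars(env=os.environ):
    """List the SSL connection variables that are absent or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Because config vars can change, read them when a connection is established rather than caching them for the lifetime of the process.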

KAFKA_PREFIX is only provided for basic plans. Don’t add the environment variable for non-basic plans. If you’re using a basic plan, see Multi-Tenant Apache Kafka on Heroku.

Rotating Credentials

It’s a good security practice to rotate the credentials for important services on a regular basis. On Apache Kafka on Heroku, this can be done with heroku kafka:credentials --reset.

$ heroku kafka:credentials HEROKU_KAFKA_GRAY_URL --reset

When you issue this command, the following steps are executed:

A new credential is generated for your cluster.
Existing topics and consumer groups receive new ACLs to allow the new client certificate credential.
When the new credentials are ready, we update the related config vars on your Heroku application.
For a period of 5 minutes, both the old client certificate and new client certificate remain valid. This allows time for the new client certificate to cycle in to your app.
After 5 minutes, the old client certificate credential is expired and no longer valid.

Connecting to a Private or Shield Kafka Cluster From an External Resource

See the “Connecting to a Private or Shield Kafka cluster from an External Resource” article.

Monitoring Via Logs

For dedicated cluster plans such as standard or extended plans, you can observe Kafka activity within your Heroku app’s log stream. Additional logging options for basic plans will be delivered in the future.

Kafka Logs

Kafka logs are visible with heroku logs.

You can filter to see the logs from a specific Apache Kafka on Heroku add-on. Use the --tail option to access real-time log entries.

For Cedar-generation apps:

$ heroku logs -p kafka-globular-94829 --tail -a example-app

For Fir-generation apps:

$ heroku logs -s heroku-kafka -a example-app

Heroku delivers log lines from your Kafka cluster with the WARN, ERROR, or FATAL levels to your app’s log stream. These log lines look like this:

2022-06-07T22:46:31+00:00 kafka[kafka-globular-94829.0]: pri=WARN  t=ReplicaFetcherThread-0-2 at=ReplicaFetcherThread [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Partition my-cool-topic-16 marked as failed

Kafka Metrics

Kafka metrics are written to the log stream with the [heroku-kafka] prefix. Metrics emitted for specific brokers in the cluster are written as [heroku-kafka.N], where N is the broker id of the node responsible for the log line.

$ heroku logs --tail --ps heroku-kafka

2024-04-15T14:53:16.000000+00:00 app[heroku-kafka.2]: source=KAFKA addon=kafka-flat-32368 sample#load-avg-1m=0.005 sample#load-avg-5m=0.005 sample#load-avg-15m=0 sample#read-iops=0 sample#write-iops=0.18462 sample#memory-total=16098868kB sample#memory-free=5667652kB sample#memory-cached=5290740kB sample#bytes-in-per-second=0.0 sample#bytes-out-per-second=0.0

These metrics apply to an individual node in your cluster.

sample#bytes-in-per-second: The number of bytes ingested by your cluster per second. This factors in replication, so you see more bytes per second here than your producers send.
sample#bytes-out-per-second: The number of bytes output by your cluster per second. This factors in replication, so you see more bytes per second here than your consumers read.

Server Metrics

These metrics come directly from the server’s operating system.

sample#load-avg-1m, sample#load-avg-5m and sample#load-avg-15m: The average system load over a period of 1 minute, 5 minutes, and 15 minutes, divided by the number of available CPUs. A load-avg of 1.0 indicates that, on average, processes were requesting CPU resources for 100% of the timespan. This number includes I/O wait.
sample#read-iops and sample#write-iops: Number of read or write operations in I/O sizes of 16-KB blocks.
sample#memory-total: Total amount of server memory in use, in KB. This includes memory used by all Kafka processes, OS memory, and disk cache.
sample#memory-free: Amount of free memory available in KB.
sample#memory-cached: Amount of memory being used by the OS for page cache, in KB.
sample#memory-percentage-used: Percentage of server memory used on the cluster, between 0.0–1.0.
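
If you forward these log lines to your own tooling, the sample# fields are straightforward to extract. The following sketch parses the example metrics line shown earlier with the standard library. The function names are hypothetical, and the patterns assume the line format matches that example.

```python
import re

# Sample metrics line copied from the log output shown earlier.
LINE = (
    "2024-04-15T14:53:16.000000+00:00 app[heroku-kafka.2]: source=KAFKA "
    "addon=kafka-flat-32368 sample#load-avg-1m=0.005 sample#load-avg-5m=0.005 "
    "sample#load-avg-15m=0 sample#read-iops=0 sample#write-iops=0.18462 "
    "sample#memory-total=16098868kB sample#memory-free=5667652kB "
    "sample#memory-cached=5290740kB sample#bytes-in-per-second=0.0 "
    "sample#bytes-out-per-second=0.0"
)

SAMPLE_RE = re.compile(r"sample#([\w-]+)=([0-9.]+)(?:kB)?")
BROKER_RE = re.compile(r"\[heroku-kafka\.(\d+)\]")


def parse_metrics(line):
    """Return a dict of metric name to numeric value (kB suffix dropped)."""
    return {name: float(value) for name, value in SAMPLE_RE.findall(line)}


def broker_id(line):
    """Return the broker id from the [heroku-kafka.N] prefix, if present."""
    match = BROKER_RE.search(line)
    return int(match.group(1)) if match else None


metrics = parse_metrics(LINE)
```

The memory values keep their original unit of KB after parsing, so convert them before comparing against figures reported in other units.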

Regions

Kafka is available in all regions currently supported by Heroku.

Heroku distributes Kafka brokers across network availability zones. It takes advantage of Kafka’s rack-aware partition assignment, in order to provide increased resilience to system or hardware failure. Some regions have different numbers of network availability zones, though, which can necessitate additional care in order to achieve the desired fault tolerance.

Node Failure Exercise

Distributed databases are designed to operate despite node failure. Unfortunately, while the database can remain available during a node failure, performance and other characteristics can be degraded. Adding nodes to a heavily loaded cluster results in similar behavior, as load is incurred while data is replicated to the new node. Apache Kafka on Heroku offers a CLI tool that can be used to cause one of the nodes in your cluster to fail.

This actually causes one of the nodes in your cluster to fail.

$ heroku kafka:fail

You can track the recovery progress by typing heroku kafka:wait.

During a failure, our automated systems work to restore normal operations. We recommend you verify that your application operates successfully under heavy load with a failed node. We recommend validating this in a staging environment.

The --catastrophic flag doesn’t just reboot a node. It destroys the node and replaces it in the cluster. This places substantial additional read traffic on the other nodes in your cluster while the replacement node resynchronizes its state.

$ heroku kafka:fail --catastrophic

Removing the add-on

Kafka can be removed via the CLI.

This destroys all associated data and can’t be undone!

$ heroku addons:destroy heroku-kafka
-----> Removing heroku-kafka from kafka-demo... done, v20 (free)
