Apache Kafka on Heroku
Last updated September 25, 2024
Table of Contents
- Kafka Concepts
- Preparing Your Development Environment
- Plans and Configurations
- Provisioning the Add-On
- Sharing Kafka Between Applications
- Viewing Cluster Information
- Upgrading an Apache Kafka on Heroku Plan
- Managing Kafka
- Kafka Versions and Clients
- Connecting to a Kafka Cluster
- Connecting to a Private or Shield Kafka Cluster From an External Resource
- Monitoring Via Logs
- Regions
- Node Failure Exercise
- Removing the add-on
Apache Kafka on Heroku is an add-on that provides Kafka as a service with full integration into the Heroku platform.
Apache Kafka is a distributed commit log for fast, fault-tolerant communication between producers and consumers using message-based topics. Kafka provides the messaging backbone for building a new generation of distributed applications capable of handling billions of events and millions of transactions. Kafka is designed to move large volumes of ephemeral data with a high degree of reliability and fault tolerance.
Kafka enables you to easily design and implement architectures for many important use cases, such as:
Elastic Queuing
Kafka makes it easy for systems to accept large volumes of inbound events without putting volatile scaling demands on downstream services. These downstream services can pull from event streams in Kafka when they have capacity, instead of being reactive to the “push” of events. This improves scaling, handling fluctuations in load, and general stability.
Data Pipelines & Analytics
Applications at scale often need analytics or ETL pipelines to get the most value from their data. Kafka’s immutable event streams enable developers to build highly parallel data pipelines for transforming and aggregating data. This means that developers can achieve much faster and more stable data pipelines than would have been possible with batch systems or mutable data.
Microservice Coordination
Many applications move to microservice-style architectures as they scale, and run up against the challenges that microservices entail: service discovery, dependencies and ordering of service availability, and service interaction. Applications that use Kafka for communication between services can simplify these design concerns dramatically. Kafka makes it easy for a service to bootstrap into the microservice network, and discover which Kafka topics to send and receive messages on. Ordering and dependency challenges are reduced, and topic-based coordination lowers the overhead of service discovery when messages between services are durably managed in Kafka.
Kafka Concepts
A Kafka cluster is composed of a number of brokers, or instances running Kafka. The number of brokers in a cluster can be scaled to increase capacity, resilience, and parallelism.
Producers are clients that write to Kafka brokers, while consumers are clients that read from Kafka brokers.
Brokers manage streams of messages (events sent to Kafka) in topics. Topics are configured with a range of options (retention or compaction, replication factor, etc) dependent on the data they’re meant to support.
Topics are composed of a number of partitions, discrete subsets of a topic used to balance the concerns of parallelism and ordering. Increased numbers of partitions can increase the number of producers and consumers that can work on a given topic, increasing parallelism and throughput. Messages within a partition are ordered, but the ordering of messages across partitions isn’t guaranteed. Balancing needs of parallelism and ordering is key to proper partition configuration for a topic.
As an example, consider a topic that uses a hash of a user ID as its partition key. This guarantees that any consumer sees updates for a given user in the order they occur, but updates for different users (potentially managed on different partitions) can arrive ahead of or behind each other.
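The effect of keying by user ID can be sketched in a few lines of Python. This is an illustration only: the function name `partition_for` is hypothetical, and it uses CRC32 from the standard library rather than the hash a real Kafka client applies (the Java client's default partitioner, for instance, uses murmur2).

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index (illustrative hash, not Kafka's)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Every update for the same user lands on the same partition,
# so a consumer of that partition sees that user's updates in order.
events = [("user-42", "login"), ("user-7", "login"), ("user-42", "purchase")]
for user_id, action in events:
    print(user_id, action, "-> partition", partition_for(user_id, 32))
```

Updates for `user-42` and `user-7` may or may not share a partition, which is why ordering across users isn't guaranteed.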
Each message in a partition has an offset, or numeric identifier that denotes its position in the sequence. As of Kafka 0.10, messages can also have an optional timestamp, which can reflect either the time the message was created or the time the message was written to Kafka.
The Apache Kafka project provides a more in-depth discussion in their introduction documentation.
Preparing Your Development Environment
Apache Kafka on Heroku requires the use of our CLI plugin. This functionality will be merged into the Heroku CLI in the future, but for now, issue the following command:
$ heroku plugins:install heroku-kafka
The Kafka CLI plugin requires Python, and doesn’t work on Windows without additional configuration.
- Install Python 2.7
- Set PATH and PYTHONPATH for Python 2.7
- Install node 8.x
- Open cmd.exe as administrator and run `npm install --global windows-build-tools`
- Install .NET Framework
- Install Visual C++ Build Tools
- Run `heroku plugins:install heroku-kafka`
Local Testing and Development
Local testing and developing with Kafka can require some care, due to its clustered configuration. The `kafka-docker` setup offers a good way to run a local cluster, provided that it’s configured with a low enough memory footprint to allow for comfortable local operation. Heroku is working to provide a range of development-centric plans in the near future.
Plans and Configurations
A range of plans are currently available for the platform’s runtimes. The plans now available are dedicated clusters, optimized for high throughput and high volume. We continue to extend this range of plans to cover a broader set of needs, and to make evented architectures available for applications at all stages of development.
Common Runtime Plans
Plan Name | Capacity | Max Retention | vCPU | RAM | Clusters |
---|---|---|---|---|---|
standard-0 | 150 GB | 2 weeks | 4 | 16 GB | 3 kafka, 5 zookeeper |
standard-1 | 300 GB | 2 weeks | 4 | 16 GB | 3 kafka, 5 zookeeper |
standard-2 | 900 GB | 2 weeks | 4 | 16 GB | 3 kafka, 5 zookeeper |
extended-0 | 400 GB | 6 weeks | 4 | 16 GB | 8 kafka, 5 zookeeper |
extended-1 | 800 GB | 6 weeks | 4 | 16 GB | 8 kafka, 5 zookeeper |
extended-2 | 2400 GB | 6 weeks | 4 | 16 GB | 8 kafka, 5 zookeeper |
Private Spaces Plans
Plan Name | Capacity | Max Retention | vCPU | RAM | Clusters |
---|---|---|---|---|---|
private-standard-0 | 150 GB | 2 weeks | 4 | 16 GB | 3 kafka, 5 zookeeper |
private-standard-1 | 300 GB | 2 weeks | 4 | 16 GB | 3 kafka, 5 zookeeper |
private-standard-2 | 900 GB | 2 weeks | 4 | 16 GB | 3 kafka, 5 zookeeper |
private-extended-0 | 400 GB | 6 weeks | 4 | 16 GB | 8 kafka, 5 zookeeper |
private-extended-1 | 800 GB | 6 weeks | 4 | 16 GB | 8 kafka, 5 zookeeper |
private-extended-2 | 2400 GB | 6 weeks | 4 | 16 GB | 8 kafka, 5 zookeeper |
Shield Spaces Plans
Plan Name | Capacity | Max Retention | vCPU | RAM | Clusters |
---|---|---|---|---|---|
shield-standard-0 | 150 GB | 2 weeks | 4 | 16 GB | 3 kafka, 5 zookeeper |
shield-standard-1 | 300 GB | 2 weeks | 4 | 16 GB | 3 kafka, 5 zookeeper |
shield-standard-2 | 900 GB | 2 weeks | 4 | 16 GB | 3 kafka, 5 zookeeper |
shield-extended-0 | 400 GB | 6 weeks | 4 | 16 GB | 8 kafka, 5 zookeeper |
shield-extended-1 | 800 GB | 6 weeks | 4 | 16 GB | 8 kafka, 5 zookeeper |
shield-extended-2 | 2400 GB | 6 weeks | 4 | 16 GB | 8 kafka, 5 zookeeper |
A list of all plans available can be found here.
Provisioning the Add-On
Apache Kafka on Heroku is managed in the same manner as other add-ons on the platform. A Kafka cluster can be provisioned for a Heroku application via the CLI:
$ heroku addons:create heroku-kafka:standard-0 -a kafka-demo
Creating kafka-animated-39618... done
Adding kafka-animated-39618 to kafka-demo... done
The cluster should be available in 15-45 minutes.
Run `heroku kafka:wait` to wait until the cluster is ready.
Use `heroku addons:docs heroku-kafka` to view documentation.
New clusters take some time to become available because Kafka is a large-scale, highly available service. You can track the progress by typing `heroku kafka:wait`.
Usage of Zookeeper beyond its role in supporting Kafka isn’t recommended, as other uses can degrade the operational stability of your services. In Private Spaces (not the Common Runtime), access to the Zookeeper that is associated with Kafka can be enabled at add-on creation time by passing an option to the add-on creation command: `heroku addons:create heroku-kafka -- --enable-zookeeper`. Zookeeper access can also be enabled or disabled after creation with `heroku kafka:zookeeper enable` or `heroku kafka:zookeeper disable`.
In Shield Spaces, Zookeeper access isn’t allowed.
After Kafka has been provisioned, the Kafka config vars are available in the app configuration. You can confirm this with the `heroku config:get KAFKA_URL` command.
See Connecting to a Kafka cluster for how to connect to your cluster.
$ heroku config:get KAFKA_URL
Kafka is available in the Common Runtime, Private Spaces, and Shield Spaces. Provisioning Kafka for an application in a Private or Shield Space creates a Kafka cluster in an isolated data resource network attached to that Space.
Sharing Kafka Between Applications
Kafka works well when shared across many different code bases and projects within the same group. We recommend structuring your Kafka usage as a set of independent producers and consumers, set up as either multiple applications, or as process types of one or more applications.
$ heroku addons:attach my-originating-app::KAFKA -a this-app
Viewing Cluster Information
You can examine the current state of your cluster by typing:
$ heroku kafka:info
This command provides you with information on the resource’s name, creation date, plan, version, status, topics, traffic, and active consumers.
More detailed per-topic throughput information for your cluster is available via the following:
$ heroku kafka:topics
Maintenance
From time to time, Heroku performs maintenance tasks on an Apache Kafka on Heroku cluster. Typical tasks include updating the underlying infrastructure of the cluster. Heroku handles these maintenance tasks automatically. Heroku doesn’t schedule maintenance events during an app’s maintenance window. We send a notice when the maintenance starts and ends. See the Apache Kafka on Heroku Maintenance FAQ for more information.
Upgrading an Apache Kafka on Heroku Plan
You can upgrade your Apache Kafka on Heroku plan by using the `heroku addons:upgrade` command from the CLI.
This command can be used to:
- Upgrade or downgrade between multi-tenant Kafka Basic plans.
- Upgrade or downgrade between dedicated cluster plans.
This command can’t be used to:
- Upgrade or downgrade between a multi-tenant plan and a dedicated plan. This upgrade requires a migration.
- Upgrade or downgrade between Common Runtime plans and Private or Shield tier plans.
Downgrading to a smaller plan also isn’t allowed if the Kafka cluster’s data size is over the limit of the plan you want to downgrade to.
To upgrade your Apache Kafka on Heroku plan, first, find the resource name of the Kafka cluster you want to upgrade. The resource name is a globally unique name of the cluster across all of your apps and add-ons:
$ heroku kafka:info -a example-app
=== kafka-animated-12345
Plan: heroku-kafka:standard-0
Status: available
Version: 3.7.1
Created: 2022-11-30T13:02:37.320+00.00
Topics: 84 topics, see heroku kafka:topics
Partitions: [··········] 414 / 12000 partition replicas (partitions × replication factor)
Messages: 0 messages/s
Traffic: 32 bytes/s in / 166 bytes/s out
Data Size: [··········] 68.38 MB / 150.00 GB (0.04%)
Add-on: kafka-convex-12345
In the following example, `kafka-animated-12345` is upgraded from a `standard-0` plan to an `extended-0` plan. Remember, `kafka-animated-12345` is the name of the Kafka cluster and not the application in this case:
$ heroku addons:upgrade kafka-animated-12345 extended-0 -a example-app
Changing kafka-convex-12345 on example-app from heroku-kafka:standard-0 to heroku-kafka:extended-0... done, ($1800.00/month)
Kafka cluster is being upgraded, and will be ready shortly.
Please use `heroku kafka:wait` to monitor the status of your upgrade.
The process of scaling up or down plan levels of Apache Kafka on Heroku is performed in-place. However, there are a few circumstances where actual data migration is required. If there are new brokers to add or remove during an upgrade, we rebalance the partitions between them.
You can follow the upgrade process using `heroku kafka:wait`:
$ heroku kafka:wait -a example-app
Waiting for cluster kafka-convex-12345... ⡿ upgrading
Alternatively, you can also use `heroku kafka:info`:
$ heroku kafka:info -a example-app
=== kafka-animated-12345
Plan: heroku-kafka:extended-0
Status: upgrading
Version: 3.7.1
Created: 2022-11-30T13:02:37.320+00.00
Topics: 84 topics, see heroku kafka:topics
Partitions: [··········] 414 / 12000 partition replicas (partitions × replication factor)
Messages: 0 messages/s
Traffic: 25 bytes/s in / 136 bytes/s out
Data Size: [··········] 68.38 MB / 150.00 GB (0.04%)
Add-on: kafka-convex-12345
The time it takes to complete an upgrade depends on the difference between the plans and the size of the stream volume. If the upgrade or downgrade is between levels of the same tier (for example, `standard-0` to `standard-1`), the upgrade is almost immediate. If the upgrade or downgrade is between different tiers (for example, `standard` to `extended`), we create or remove the extra brokers that each plan offers and rebalance partitions between the final number of brokers. There’s no downtime in this process, but it does take time to complete depending on the size of the cluster.
Managing Kafka
Topics are the structured representation of messages, and serve as the intermediary between producers and consumers. Aside from the name of a topic in Kafka, there are a few configurable properties that define how data flows through a topic.
The properties include the replication factor, the number of logical partitions, and either compaction or time-based retention. The Heroku-provided default settings are suitable for many applications, but for topics expected to handle billions of events per day, consider doing further research before entering production.
Configuration Defaults and Limits
Parameter | Default | Lower Limit | Upper Limit |
---|---|---|---|
Replication | 3 | 3 | Number of brokers in cluster |
Retention Period | 1 day | 1 day | standard: 2 weeks, extended: 6 weeks |
Partitions per Topic | 32 | 1 | 256 |
Partitions per Cluster | NA | NA | 4000 x Number of brokers in cluster |
Care in topic design is encouraged. Parameters like retention or compaction can be changed relatively easily, and replication can be changed with some additional care, but partitions can’t currently be changed after creation. Compaction and time-based retention are configured per topic, so different topics within a cluster can have a mix of these configurations.
Understanding Topics
Kafka topics can be created and managed via the web dashboard and the CLI. This section covers the basics of CLI-based topic management.
Full CLI documentation can always be accessed through the Heroku CLI itself:
$ heroku help kafka
Automatic topic creation, or “create topic on first write,” isn’t currently available on Heroku.
You can create and destroy a topic with the following CLI commands:
$ heroku kafka:topics:create my-cool-topic --partitions 100
$ heroku kafka:topics:destroy my-cool-topic
You can list all topics on a cluster with the following CLI command:
$ heroku kafka:topics
You can examine information about a topic with the following CLI command:
$ heroku kafka:topics:info my-cool-topic
To facilitate testing and inspection of topics, you can write to or tail topics from the CLI.
`kafka:write` and `kafka:tail` only work in Private and Shield Spaces if you created IP rules for allowed sources.
You can write a new message to a topic with the following CLI command:
$ heroku kafka:topics:write my-cool-topic MESSAGE
You can subscribe to a topic and read new messages from it with the following CLI command:
$ heroku kafka:topics:tail my-cool-topic
Basic multi-tenant Kafka plans require a prefix on topics and consumer groups. See the differences between multi-tenant and dedicated. When integrating Kafka consumers, ensure topics and consumer groups are prefixed with the value of the `KAFKA_PREFIX` environment variable. Otherwise, messages aren’t received, and errors like `Broker: Topic authorization failed` or `Broker: Group authorization failed` can appear in Kafka debug events.
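One way to apply the prefix consistently is a small helper that every producer and consumer calls when naming topics and consumer groups. The sketch below is an assumption about how an application might do this, not part of any Heroku library; `with_prefix` is a hypothetical name and the prefix value in the example is made up.

```python
import os


def with_prefix(name: str, env=os.environ) -> str:
    """Prepend KAFKA_PREFIX (set only on basic plans) to a topic or group name."""
    prefix = env.get("KAFKA_PREFIX", "")
    # Avoid double-prefixing a name that is already qualified.
    if prefix and not name.startswith(prefix):
        return prefix + name
    return name


# On a basic plan the variable is set; on dedicated plans it is absent.
print(with_prefix("my-cool-topic", {"KAFKA_PREFIX": "example-12345."}))
print(with_prefix("my-cool-topic", {}))
```

Because the helper is a no-op when the variable is missing, the same code can run against both multi-tenant and dedicated plans.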
Understanding Partitions
Kafka topics are configured with a number of logical partitions. These divide the log into shards, each capable of being independently distributed around the cluster and consumed from.
However, Kafka’s ordering guarantee only applies within an individual partition. This means that messages are consumed in the order they’re produced, but can be interleaved if they span multiple partitions.
Most consumer libraries allocate a single consumer thread per partition. Therefore the number of partitions you choose for your topics, and how you deliver messages to them, can be crucial to the scalability of your application.
We recommend using higher numbers of partitions if your consumers are relatively “slow” compared to your producers (for example, if they’re writing into an external database). As described in the Configuration Defaults and Limits section, the current plans have a default of 32 partitions per topic, a maximum of 256 partitions per topic, and a maximum of 4000 partitions times the number of brokers in a cluster (for example, 12,000 partitions for `standard` tier clusters, and 32,000 partitions for `extended` tier clusters).
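The arithmetic behind these limits can be sketched as follows. The broker counts come from the plan tables and the limits from the Configuration Defaults and Limits table. The function names are hypothetical, and counting the cluster limit in partition replicas (partitions × replication factor) is an assumption based on how `heroku kafka:info` reports usage.

```python
PARTITIONS_PER_BROKER = 4000        # cluster limit is 4000 x number of brokers
MAX_PARTITIONS_PER_TOPIC = 256
BROKERS = {"standard": 3, "extended": 8}  # Kafka brokers per plan tier


def cluster_partition_limit(tier: str) -> int:
    """Upper limit of partitions for a cluster of the given tier."""
    return PARTITIONS_PER_BROKER * BROKERS[tier]


def can_create_topic(tier: str, used_replicas: int, partitions: int,
                     replication_factor: int = 3) -> bool:
    """Check a planned topic against the per-topic and per-cluster limits."""
    if not 1 <= partitions <= MAX_PARTITIONS_PER_TOPIC:
        return False
    # Assumption: usage is counted as partitions x replication factor.
    needed = partitions * replication_factor
    return used_replicas + needed <= cluster_partition_limit(tier)


print(cluster_partition_limit("standard"))       # 12000
print(cluster_partition_limit("extended"))       # 32000
print(can_create_topic("standard", 414, 100))    # True
```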
Cleanup Policy
Kafka supports two primary modes of cleanup on topics: time-based retention, and log compaction. As of version 0.10.1.0, Kafka supports the mixed use of these modes, for example, a topic can have time-based retention, compaction, or both modes enabled.
Time-Based Retention
This is the default mode of cleanup for Kafka topics. As messages are written to the partitions of a topic, they’re annotated with time values and written to log segments. The log segments are then periodically processed, and messages that have outlived the retention window are cleaned up and removed.
$ heroku kafka:topics:retention-time my-cool-topic '36 hours'
Currently, Apache Kafka on Heroku has a minimum retention time of 24 hours, and a maximum of 2 weeks for `standard` plans and 6 weeks for `extended` plans.
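If your tooling sets retention programmatically, it can help to validate the value against these bounds before calling the CLI. The following is a minimal sketch; `retention_allowed` is a hypothetical helper, not a Heroku API.

```python
from datetime import timedelta

MIN_RETENTION = timedelta(days=1)
MAX_RETENTION = {
    "standard": timedelta(weeks=2),
    "extended": timedelta(weeks=6),
}


def retention_allowed(tier: str, retention: timedelta) -> bool:
    """True when the retention window is within the limits for the plan tier."""
    return MIN_RETENTION <= retention <= MAX_RETENTION[tier]


print(retention_allowed("standard", timedelta(hours=36)))  # True
print(retention_allowed("standard", timedelta(weeks=3)))   # False
print(retention_allowed("extended", timedelta(weeks=3)))   # True
```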
Log Compaction
Kafka supports an alternative configuration on topics known as log compaction. This configuration changes the semantics of a topic such that it keeps only the most recent message for a given key, tombstoning any predecessor. This allows for the creation of a value-stream, or table-like view of data, and is a powerful construct in modeling your data and systems.
It’s important to note that compacted topics (without time-based retention enabled) don’t automatically reclaim all storage space over time. The most recent version of a message for a given key persists until it’s actively tombstoned, either by a new message of that key being written to the topic, or an explicit tombstone being written for that key. This can cause compacted topics with unbounded keyspaces to experience unbounded growth over time, driving unexpected resource utilization. Compacted topics must be used and monitored carefully, in order to stay within plan limits.
It’s also important to note that older messages of a given key, though tombstoned, aren’t removed until the log-cleaner process clears them. This process is asynchronous, and multiple versions of a given key remain in the log until the current segment is processed.
$ heroku kafka:topics:compaction my-cool-topic enable
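The semantics of a compacted topic can be modeled with a small in-memory sketch. This isn't how the broker implements compaction (the real log cleaner works asynchronously on log segments, as noted above); it only shows the table-like view that a compacted topic converges to. The function name `compact` is hypothetical, and `None` stands in for a tombstone.

```python
from typing import Dict, Iterable, Optional, Tuple


def compact(log: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Return the latest value per key, dropping keys that were tombstoned."""
    latest: Dict[str, str] = {}
    for key, value in log:
        if value is None:
            # A None value stands in for an explicit tombstone: drop the key.
            latest.pop(key, None)
        else:
            # A newer message for a key supersedes the previous one.
            latest[key] = value
    return latest


log = [
    ("user-1", "a@example.com"),
    ("user-2", "b@example.com"),
    ("user-1", "c@example.com"),  # supersedes the first user-1 message
    ("user-2", None),             # tombstone for user-2
]
print(compact(log))  # {'user-1': 'c@example.com'}
```

The result holds one entry per live key, which is why a topic with an unbounded keyspace and no tombstones keeps growing.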
Replication
The replication factor of a topic determines the number of replicas or copies that are maintained across the brokers in the cluster. Replication provides additional durability and fault tolerance to the data in the Kafka cluster, while also increasing the volume of data managed over the cluster.
Kafka is configured to require a minimum replication factor of 3 on the `standard` and `extended` plans. Replication can be set as high as the number of brokers in the Kafka cluster.
$ heroku kafka:topics:replication-factor my-cool-topic 3
Increasing the replication factor of an existing topic can put additional load on your Kafka cluster, as it works to create additional replicas across the available brokers. This also increases the size of the data maintained in the cluster, which must be considered in the context of the plan capacity.
Producer Acknowledgment Configuration
Related to the number of replicas is the producer acknowledgment configuration, or producer `acks`. This determines how many in-sync replicas must acknowledge a write before it’s considered successful. This setting, in conjunction with the replication factor, influences the latency and durability guarantees of your writes. This configuration resides in the application code you use to produce to Kafka, but is important to consider alongside your replication.
A configuration of `acks=0` means that a producer doesn’t wait for any confirmation from the broker that it has attempted to write to before continuing. This can reduce latency in producers, but no guarantees are made that the server recorded the message, and clients aren’t able to retry their writes.
A configuration of `acks=1` means that a producer waits for confirmation of the write from only the broker that the message is initially written to. Producer retries can happen in this configuration, but data loss can still occur if that broker fails before the data is synced to replicas on other brokers.
A configuration of `acks=all` or `acks=-1` requires that all in-sync replicas acknowledge the write before the producer continues. A high replication factor can increase the latency that this entails, but this yields the strongest guarantee of data resilience for writes.
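The three settings can be summarized as the number of acknowledgments a producer waits for. The sketch below only models that count; `acks_required` is a hypothetical helper, and the actual behavior is implemented by your client library and the brokers.

```python
def acks_required(acks: str, in_sync_replicas: int) -> int:
    """Number of replica acknowledgments a producer waits for under each setting."""
    if acks == "0":
        return 0                  # fire and forget: no confirmation at all
    if acks == "1":
        return 1                  # only the broker that first receives the write
    if acks in ("all", "-1"):
        return in_sync_replicas   # every replica currently in sync
    raise ValueError(f"unsupported acks value: {acks!r}")


for setting in ("0", "1", "all"):
    print(f"acks={setting}: waits for {acks_required(setting, 3)} acknowledgment(s)")
```

With the minimum replication factor of 3 and all replicas in sync, `acks=all` waits for three acknowledgments, compared to one for `acks=1`.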
Understanding ACLs
For customers using single-tenant (`standard` and `extended`) Apache Kafka on Heroku plans, we have a published, supported ACL policy. We don’t change these ACLs going forward without a new changelog update.
Your principal gets access to the following:
- `Read,Write,Describe` on name `*` for resource type `Topic`
- `Read,Write,Describe,Delete` on name `*` for resource type `Group`
- `Describe,Write` on name `*` for resource type `TransactionalId`
Kafka Versions and Clients
Heroku currently offers Apache Kafka version 3.7.1 as the default.
Available Kafka Versions
Major Version | Minor Version | Status | EOL Date |
---|---|---|---|
2.7 | 2.7.1 | EOL | 2022-06-15 |
2.8 | 2.8.2 | Available | TBD |
3.7 | 3.7.1 | Available | TBD |
As a general rule, the version of the client library used must be equal to or less than the version on the cluster.
Kafka supports SSL to encrypt and authenticate connections, and this is the only connection mode supported in the Common Runtime. Because of this, you must use a library that supports SSL encryption and client certificates. In Private Spaces, plaintext connections can optionally be used, as described below. In Shield Spaces, plaintext connections aren’t allowed.
Version Lifecycle
The Apache Kafka project releases a new major version approximately every four months (3x/year). In accordance with this release schedule, we make all versions published in the last year available for new add-ons. We only support Apache Kafka versions that the upstream project maintains. One year after the most recent point release is available, we mark that major version deprecated and stop allowing new add-ons to use this version.
After a version is deprecated, your cluster continues to operate normally. However, we have found that running older versions is risky as deprecated versions don’t receive bug fixes or security patches and are no longer supported by the community. Heroku notifies customers via email about the deprecation process for their affected clusters.
We recommend that users regularly evaluate their add-on version and plan to upgrade their cluster at least once a year. By keeping up with this schedule, your cluster receives important bug fixes and notable improvements in reliability.
Upgrading Kafka Versions
To upgrade the version of a dedicated Kafka cluster, use the following from the command line:
$ heroku kafka:upgrade --version MAJOR_VERSION_NUMBER
It’s important to note that the upgrade command advances the version to the latest supported stable minor version. For example, currently, `heroku kafka:upgrade --version 3.7` upgrades a cluster to version 3.7.1.
This command upgrades the Kafka brokers in the cluster to the new version. Upgrading the cluster involves several process restarts of the brokers, but those restarts are done one at a time. Assuming your application can handle broker restarts, the upgrade is relatively seamless.
We recommend reading the article about robust usage of Kafka to ensure proper handling of broker restarts. During the upgrade period, your cluster is running mixed versions. For example, you can have one broker on 0.10.2.1 and two brokers on 1.0.2.
Kafka strictly promises backward protocol compatibility: that is, you can always use a client protocol version older than the version(s) your cluster is running. You can’t, however, use a newer client version than the one your cluster is running.
During the upgrade, you must keep your client on a version equal to or lower than the version you’re upgrading from. After the upgrade is finished (denoted by the status in `heroku kafka:wait`), you can start using a new protocol version and any new features it supports. It isn’t required (but is recommended) to keep your client on the same protocol version your cluster is running.
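The compatibility rule amounts to a version comparison, sketched below. Both function names are hypothetical, and the check assumes purely numeric, dot-separated version strings like those in the version table.

```python
def parse_version(text: str) -> tuple:
    """Turn '3.7.1' into (3, 7, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def client_is_compatible(client: str, cluster: str) -> bool:
    """True when the client protocol version is not newer than the cluster's."""
    return parse_version(client) <= parse_version(cluster)


# During an upgrade from 2.8.2 to 3.7.1, compare against the version
# you are upgrading from until the upgrade has finished.
print(client_is_compatible("2.8.2", "2.8.2"))  # True
print(client_is_compatible("3.7.1", "2.8.2"))  # False
```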
It’s recommended you keep your clusters up to date with the latest recommended version from Heroku, which is 3.7.1 at this time. We perform heavy testing and validation work on all new Kafka releases, including testing the upgrade procedure, and are careful to only recommend versions that we trust.
Language Support
There are a great number of client libraries for Kafka, across many languages. The libraries listed below are ones that we’ve either helped to improve, or that we feel have up-to-date support for Kafka features, including critical capabilities like support for SSL.
Basic multi-tenant Kafka plans require a prefix on topics and consumer groups. See the differences between multi-tenant and dedicated. When integrating Kafka consumers, ensure topics and consumer groups are prefixed with the value of the `KAFKA_PREFIX` environment variable. Otherwise, messages aren’t received, and errors like `Broker: Topic authorization failed` or `Broker: Group authorization failed` can appear in Kafka debug events.
Using Kafka in Ruby Applications
We recommend using `rdkafka-ruby` when connecting to Kafka from Ruby.
This example shows how to write and consume messages in Ruby.
When using `rdkafka-ruby`, you must specify `"enable.ssl.certificate.verification" => false` in your client configuration to connect to Apache Kafka on Heroku.
Using Kafka in Java Applications
We recommend using `kafka-clients` when connecting to Kafka from Java. See the Kafka project documentation for more information.
This example shows how to write and consume messages in Java. This section of the demo app provides a good example of working with the TrustStore and KeyStore for JVM applications.
On recent versions of the Java Kafka clients, you must specify `ssl.endpoint.identification.algorithm=` (empty string) to connect to Apache Kafka on Heroku.
Using Kafka in Go Applications
We recommend using `sarama` when connecting to Kafka from Go.
This example shows how to write and consume messages in Go.
Due to the strictness of the `tls` package in Go, you must set `InsecureSkipVerify` to `true`.
Using Kafka in Python Applications
We recommend using `confluent-kafka-python` when connecting to Kafka in Python. This client library wraps the C/C++ Kafka library.
We also recommend the `kafka-python` library, especially for scenarios where wrapping the C/C++ libraries is less than ideal. Heroku has created the `kafka-helper` library to make `kafka-python` easier to use.
Using Kafka in Node.js Applications
We recommend using `kafkajs` when connecting to Kafka from Node.js applications.
Using Kafka in PHP Applications
The `rdkafka` extension for PHP is available on Heroku. It provides bindings for the `librdkafka` library and supports SSL.
Using Kafka in Other Languages or Frameworks
The Confluent Client wiki is a good source for clients and code examples for other languages and frameworks.
Connecting to a Kafka Cluster
All connections to Kafka support SSL encryption and authentication. If the cluster is provisioned in a Private Space, you have the option to connect via plaintext. In Shield Spaces, plaintext connections aren’t allowed.
Connecting over SSL means all traffic is encrypted and authenticated via an SSL client certificate.
To connect over SSL, use the following environment variables:
As with all Heroku add-ons, important environment and configuration variables can change. It’s important to design your application to handle updates to these values, especially if these resources are being accessed from outside of Heroku.
- `KAFKA_URL`: A comma-separated list of SSL URLs to the Kafka brokers making up the cluster.
- `KAFKA_TRUSTED_CERT`: The brokers’ SSL certificate (in PEM format), to check that you’re connecting to the right servers.
- `KAFKA_CLIENT_CERT`: The required client certificate (in PEM format) to authenticate clients against the broker.
- `KAFKA_CLIENT_CERT_KEY`: The required client certificate key (in PEM format) to authenticate clients against the broker.
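Many client libraries expect a `host:port` broker list and certificate file paths rather than URLs and PEM strings. The sketch below shows one way to adapt the config vars using only the Python standard library. The helper names are hypothetical, and the `kafka+ssl://` scheme and hostnames in the example are assumptions for illustration; check the actual value of `KAFKA_URL` on your app.

```python
import os
import tempfile
from urllib.parse import urlparse


def broker_list(kafka_url: str) -> str:
    """Turn 'scheme://host:port,scheme://host:port' into 'host:port,host:port'."""
    brokers = []
    for url in kafka_url.split(","):
        parsed = urlparse(url.strip())
        brokers.append(f"{parsed.hostname}:{parsed.port}")
    return ",".join(brokers)


def write_pem(env_var: str, env=os.environ) -> str:
    """Write a PEM config var to a temporary file and return the file path."""
    handle = tempfile.NamedTemporaryFile("w", suffix=".pem", delete=False)
    with handle:
        handle.write(env[env_var])
    return handle.name


if __name__ == "__main__":
    example = "kafka+ssl://broker-1.example.com:9096,kafka+ssl://broker-2.example.com:9096"
    print(broker_list(example))
```

Reading the values at process start, rather than hard-coding them, lets the app pick up changed config vars on its next restart.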
Kafka clusters require authenticating using the provided client certificate. Any requests not using the client certificate are denied.
`KAFKA_PREFIX` is only provided for `basic` plans. Don’t add the environment variable for non-`basic` plans. If you’re using a `basic` plan, see Multi-Tenant Apache Kafka on Heroku.
Rotating Credentials
It’s a good security practice to rotate the credentials for important services on a regular basis. On Apache Kafka on Heroku, this can be done with `heroku kafka:credentials --reset`.
$ heroku kafka:credentials HEROKU_KAFKA_GRAY_URL --reset
When you issue this command, the following steps are executed:
- A new credential is generated for your cluster.
- Existing topics and consumer groups receive new ACLs to allow the new client certificate credential.
- When the new credentials are ready, we update the related config vars on your Heroku application.
- For a period of 5 minutes, both the old client certificate and new client certificate remain valid. This allows time for the new client certificate to cycle in to your app.
- After 5 minutes, the old client certificate credential is expired and no longer valid.
Connecting to a Private or Shield Kafka Cluster From an External Resource
See the “Connecting to a Private or Shield Kafka cluster from an External Resource” article.
Monitoring Via Logs
For dedicated cluster plans such as `standard` or `extended` plans, you can observe Kafka activity within your Heroku app’s log stream. Additional logging options for `basic` plans will be delivered in the future.
Kafka Logs
To view logs from the Kafka service itself, use the process type `-p` flag and the add-on name. This command indicates that you only want to see the logs from a specific Apache Kafka on Heroku add-on. You can use the `--tail` option to access real-time log entries.
$ heroku logs -p kafka-globular-94829 --tail
Heroku delivers log lines from your Kafka cluster with the `WARN`, `ERROR`, or `FATAL` levels to your app’s log stream. These log lines look like this:
2022-06-07T22:46:31+00:00 kafka[kafka-globular-94829.0]: pri=WARN t=ReplicaFetcherThread-0-2 at=ReplicaFetcherThread [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Partition my-cool-topic-16 marked as failed
Kafka Metrics
Kafka metrics are written to the log stream with the [heroku-kafka] prefix. Metrics emitted for specific brokers in the cluster are written as [heroku-kafka.N], where N is the broker id of the node responsible for the log line.
$ heroku logs --tail --ps heroku-kafka
2024-04-15T14:53:16.000000+00:00 app[heroku-kafka.2]: source=KAFKA addon=kafka-flat-32368 sample#load-avg-1m=0.005 sample#load-avg-5m=0.005 sample#load-avg-15m=0 sample#read-iops=0 sample#write-iops=0.18462 sample#memory-total=16098868kB sample#memory-free=5667652kB sample#memory-cached=5290740kB sample#bytes-in-per-second=0.0 sample#bytes-out-per-second=0.0
These metrics apply to an individual node in your cluster.
- `sample#bytes-in-per-second`: The number of bytes ingested by your cluster per second. This factors in replication, so you see more bytes per second here than your producers send.
- `sample#bytes-out-per-second`: The number of bytes output by your cluster per second. This factors in replication, so you see more bytes per second here than your consumers read.
Server Metrics
These metrics come directly from the server’s operating system.
- `sample#load-avg-1m`, `sample#load-avg-5m`, and `sample#load-avg-15m`: The average system load over a period of 1 minute, 5 minutes, and 15 minutes, divided by the number of available CPUs. A load-avg of 1.0 indicates that, on average, processes were requesting CPU resources for 100% of the timespan. This number includes I/O wait.
- `sample#read-iops` and `sample#write-iops`: Number of read or write operations in I/O sizes of 16-KB blocks.
- `sample#memory-total`: Total amount of server memory in use, in KB. This includes memory used by all Kafka processes, OS memory, and disk cache.
- `sample#memory-free`: Amount of free memory available, in KB.
- `sample#memory-cached`: Amount of memory being used by the OS for page cache, in KB.
- `sample#memory-percentage-used`: Percentage of server memory used on the cluster, between 0.0–1.0.
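If you forward the log stream to your own tooling, the `sample#` pairs in a metrics line can be pulled out with a regular expression. This is a minimal sketch assuming the line format shown in the example output above; `parse_samples` is a hypothetical helper, and unit suffixes such as `kB` are simply discarded.

```python
import re

# Matches sample#<name>=<number><optional unit>, for example sample#memory-free=5667652kB
SAMPLE = re.compile(r"sample#([\w-]+)=([\d.]+)(\w*)")


def parse_samples(line: str) -> dict:
    """Extract sample#name=value pairs from a metrics log line as floats."""
    return {name: float(value) for name, value, _unit in SAMPLE.findall(line)}


line = (
    "source=KAFKA addon=kafka-flat-32368 sample#load-avg-1m=0.005 "
    "sample#write-iops=0.18462 sample#memory-total=16098868kB "
    "sample#memory-free=5667652kB sample#bytes-in-per-second=0.0"
)
metrics = parse_samples(line)
print(metrics["load-avg-1m"], metrics["memory-free"])
```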
Regions
Kafka is available in all regions currently supported by Heroku.
Heroku distributes Kafka brokers across network availability zones. It takes advantage of Kafka’s rack-aware partition assignment, in order to provide increased resilience to system or hardware failure. Some regions have different numbers of network availability zones, though, which can necessitate additional care in order to achieve the desired fault tolerance.
Node Failure Exercise
Distributed databases are designed to operate despite node failure. Unfortunately, while the database can remain available during a node failure, performance and other characteristics can be degraded. Adding nodes to a heavily loaded cluster results in similar behavior, as load is incurred while data is replicated to the new node. Apache Kafka on Heroku offers a CLI tool that can be used to cause one of the nodes in your cluster to fail.
This actually causes one of the nodes in your cluster to fail.
$ heroku kafka:fail
You can track the recovery progress by typing `heroku kafka:wait`.
During a failure, our automated systems work to restore normal operations. We recommend you verify that your application operates successfully under heavy load with a failed node. We recommend validating this in a staging environment.
The `catastrophic` flag doesn’t only reboot a node; it destroys the node and replaces it in the cluster. This places substantial additional read traffic on the other nodes in your cluster while the replacement node resynchronizes its state.
$ heroku kafka:fail --catastrophic
Removing the add-on
Kafka can be removed via the CLI.
This destroys all associated data and can’t be undone!
$ heroku addons:destroy heroku-kafka
-----> Removing heroku-kafka from kafka-demo... done, v20 (free)