Table of Contents [expand]
Last updated June 09, 2026
Kafka Streams is a Java client library for processing streaming data with Apache Kafka on Heroku. Use Kafka Streams to develop lightweight, scalable, and fault-tolerant stream processing apps.
Apps built with Kafka Streams produce and consume data from streams, which are unbounded, replayable, ordered, and fault-tolerant sequences of events. A stream is either represented as a Kafka topic (KStream) or materialized as compacted topics (KTable). By default, the library makes sure that your app handles stream events one at a time and handles late-arriving or out-of-order events.
Heroku deploys apps using Kafka Streams as normal Java services and communicates directly with Kafka clusters. Both dedicated and basic Apache Kafka on Heroku plans support Kafka Streams with some additional setup required for basic plans.
Connecting Your App
See Connecting to a Kafka Cluster for more information.
Managing Internal Topics and Consumer Groups
Kafka Streams uses internal topics for fault tolerance and repartitioning. Apps with Kafka Streams require these topics to work properly.
Creating Kafka Streams internal topics is unrelated to Kafka’s auto.create.topics.enable config. Kafka Streams communicates with clusters directly through an admin client for topic creation.
Dedicated Kafka Plans
Dedicated Kafka clusters don’t require additional configuration.
See dedicated plans and configurations for more details.
Basic Kafka Plans
Basic Kafka plans run on multi-tenant clusters. Kafka Access Control Lists (ACLs) isolate user data and access privileges. Also, basic plans namespace topic and consumer group names with an auto-generated prefix to prevent naming collisions.
Running Kafka Streams apps on basic plans requires two preliminary steps: setting up the application.id and pre-creating internal topics and consumer groups.
Setting Your application.id
Each Kafka Streams app has an important unique identifier called the application.id that identifies it and its associated topology. If you use an Apache Kafka on Heroku basic plan, make sure that each application.id begins with your cluster’s assigned prefix:
properties.put(StreamsConfig.APPLICATION_ID_CONFIG, String.format("%saggregator-app", HEROKU_KAFKA_PREFIX));
Pre-Creating Internal Topics and Consumer Groups
Because basic plans on Apache Kafka on Heroku use limited ACLs, Kafka Streams apps can’t interact with topics and consumer groups directly. This limitation is problematic because Kafka Streams use an internal admin client to create internal topics and consumer groups transparently at runtime. This scenario primarily affects processors in Kafka Streams.
Processors are classes that implement a process method. They receive input events from a stream, process those events, and optionally produce output events to downstream processors. Stateful processors are processors that use state produced by previous events when processing subsequent ones. Kafka Streams provides built-in functionality for storage of this state.
For each stateful processor in your app, create its two internal topics for changelog and repartition. Internal topics follow the naming convention <application.id>-<operatorName>-<suffix>:
$ heroku kafka:topics:create ${KAFKA_PREFIX}aggregator-app-<OPERATOR>-changelog --app example-app
$ heroku kafka:topics:create ${KAFKA_PREFIX}aggregator-app-<OPERATOR>-repartition --app example-app
Additionally, create a single consumer group for your app that matches the application.id:
$ heroku kafka:consumer-groups:create ${KAFKA_PREFIX}aggregator-app --app example-app
See basic plans and configurations for more details.
Scaling Your App
Parallelism Model
In Kafka Streams, stream tasks are a fundamental unit of parallelism. Stream tasks can interact with one or more partitions. Because of this configuration, partitions also represent the upper bound for parallelism of a Kafka Streams service where the number of stream tasks must be less than or equal to the number of partitions.
Each instance of a Kafka Streams app contains stream threads. These threads are responsible for running one or more stream tasks. Kafka Streams makes sure that it spreads input partitions evenly across stream tasks so that it can consume and process all events.
Kafka Streams uses an application.id to make sure that it spreads partitions transparently and evenly across stream tasks. The application.id identifies your Kafka Streams service and is unique per Kafka cluster.
Vertical Scaling
By default, Kafka Streams create one stream thread per app instance. Each stream thread runs one or more stream tasks. Scale an app instance by scaling its number of stream threads. To do so, modify the num.stream.threads config value in your app. The app transparently rebalances workload across threads within each app instance.
Horizontal Scaling
Kafka Streams rebalances workload and local state across instances as the number of app instances changes. This rebalancing works transparently by distributing workload and local state across instances with the same application.id. Scale Kafka Streams apps horizontally by scaling the number of dynos with the heroku ps:scale command.
The number of input partitions is effectively the upper bound for parallelism. Make sure that the number of stream tasks doesn’t exceed the number of input partitions. Otherwise, this over-provisioning results in idle app instances.
Caveats
RocksDB Persistence
Kafka Streams uses RocksDB as the default storage engine for persistent stores. Because dynos are backed by an ephemeral filesystem, it isn’t practical to rely on the underlying disk for durable storage. This configuration presents a challenge for using RocksDB with Kafka Streams on Heroku. However, using the default RocksDB configuration isn’t a requirement. Kafka Streams treats RocksDB as a write-through cache, where the source of truth is the underlying changelog internal topic. If there’s no underlying RocksDB, the app replays the state directly from the changelog topics on startup.
By default, replaying state directly from changelog topics incurs additional latency when rebalancing your app instances or when dynos restart. To minimize latency, configure Kafka Streams to fail over stream tasks to their associated standby tasks.
Standby tasks are replicas of stream tasks that maintain fully replicated copies of state. Dynos use standby tasks to resume work immediately instead of waiting for the state to rebuild from the changelog topics.
You can modify the num.standby.replicas config in your app to change the number of Standby Tasks.