Apache Kafka on AWS User Guide

Product: Apache Kafka on AWS

Overview

Apache Kafka is a widely used open-source distributed event streaming platform. It is used to build real-time data pipelines and streaming applications, to decouple producers from consumers, and to aggregate logs and metrics at scale. This image runs Kafka in KRaft mode as a single combined broker and controller, so there is no separate ZooKeeper ensemble to operate. The broker runs as an unprivileged service user under systemd and listens for clients on port 9092.

The image bundles a web management console so you get a real Kafka administration surface in the browser without installing anything else. The console gives you a Topics view, a Nodes view and a Consumer Groups view, and it lets you browse individual messages, inspect partition assignments and watch consumer lag. It runs as its own service, bound to the loopback interface, behind an nginx reverse proxy that serves it on port 80.

Authentication is enabled by default. The broker enforces SASL/SCRAM-SHA-256, and the web console is protected by HTTP basic authentication. On the first boot of every deployed instance, a one-shot service generates a fresh administrator password unique to that instance, stores it in the KRaft cluster metadata as a SCRAM credential, and reuses it for the web console. The password is written to /root/kafka-credentials.txt with mode 0600 so that only the root user can read it. Two instances launched from the same Amazon Machine Image never share a password.

Kafka log directories and cluster metadata are stored on a dedicated EBS data volume mounted at /var/lib/kafka, separate from the operating system disk and independently resizable. The mount is pinned by filesystem identifier so it survives every reboot.

Prerequisites

Before you deploy this image you need:

  • An Amazon Web Services account where you can launch EC2 instances
  • IAM permissions to launch instances, create security groups, and subscribe to AWS Marketplace products
  • An EC2 key pair in the target Region for SSH access to the instance
  • A VPC and subnet in the target Region, with a security group allowing inbound port 22 from your management network and inbound port 80 from the networks that will reach the web console
  • If you intend to connect external Kafka clients, inbound port 9092 from your client network
  • The AWS CLI (version 2) installed locally if you plan to deploy from the command line
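The network prerequisites above could be scripted with the AWS CLI roughly as follows. This is a minimal sketch and not part of the image: the function name create_kafka_sg, the group name kafka-cloudimg-sg and the CIDR arguments are illustrative assumptions to replace with your own values.

```shell
# Sketch: create a security group with the three inbound rules listed above.
# create_kafka_sg and the group name are hypothetical, not defined by the image.
create_kafka_sg() {
  local vpc_id="$1" mgmt_cidr="$2" console_cidr="$3" client_cidr="$4" sg_id port_cidr
  sg_id=$(aws ec2 create-security-group \
    --group-name kafka-cloudimg-sg \
    --description "Apache Kafka broker and web console" \
    --vpc-id "$vpc_id" \
    --query GroupId --output text) || return 1

  # port:CIDR pairs: 22 = SSH, 80 = web console, 9092 = Kafka clients
  for port_cidr in "22:$mgmt_cidr" "80:$console_cidr" "9092:$client_cidr"; do
    aws ec2 authorize-security-group-ingress \
      --group-id "$sg_id" --protocol tcp \
      --port "${port_cidr%%:*}" --cidr "${port_cidr#*:}" >/dev/null || return 1
  done

  # Print the new group ID so it can be passed to run-instances.
  echo "$sg_id"
}
# Example: create_kafka_sg <vpc-id> <management-cidr> <console-cidr> <client-cidr>
```

Omit the 9092 rule, or narrow its CIDR range, if no external Kafka clients will connect.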

Step 1: Launch the Instance from the AWS Marketplace

Sign in to the AWS Management Console, open the EC2 service, and select Launch instance. Under Application and OS Images choose AWS Marketplace AMIs and search for Apache Kafka. Select the cloudimg listing and choose Select, then Continue on the subscription summary.

Pick an instance type of m5.large or larger, because the Kafka broker and the bundled web console are both JVM processes and each needs memory for its own heap. Choose your EC2 key pair under Key pair (login). Under Network settings select your VPC and subnet, and either create or select a security group that allows inbound port 22 from your management network, inbound port 80 from the networks that will reach the web console, and inbound port 9092 from your Kafka client network. Leave the root volume at the default size or larger.

Select Launch instance. First boot initialisation takes approximately one minute after the instance state becomes Running and the status checks pass.

Step 2: Launch the Instance from the AWS CLI

As an alternative to the console flow in Step 1, the following command launches an instance from the cloudimg Apache Kafka Marketplace AMI into an existing subnet and security group. Replace <ami-id> with the AMI ID shown on the Marketplace listing, <key-name> with your EC2 key pair name, <subnet-id> with your subnet ID, and <security-group-id> with a security group that opens ports 22, 80 and 9092 as described above.

aws ec2 run-instances \
  --image-id <ami-id> \
  --instance-type m5.large \
  --key-name <key-name> \
  --subnet-id <subnet-id> \
  --security-group-ids <security-group-id> \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":30,"VolumeType":"gp3"}}]' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=apache-kafka-01}]'

The command prints a JSON document on success. Note the instance ID, then retrieve its public address once it is running:

aws ec2 describe-instances --instance-ids <instance-id> --query "Reservations[].Instances[].PublicIpAddress" --output text
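Rather than polling by hand, the AWS CLI waiters can block until the instance is ready. The sketch below assumes a hypothetical helper name, wait_for_public_ip, and relies only on the aws ec2 wait and describe-instances subcommands.

```shell
# Sketch: block until the instance passes its status checks, then print its public IP.
# wait_for_public_ip is a hypothetical helper name.
wait_for_public_ip() {
  local instance_id="$1"
  aws ec2 wait instance-running --instance-ids "$instance_id" || return 1
  aws ec2 wait instance-status-ok --instance-ids "$instance_id" || return 1
  aws ec2 describe-instances --instance-ids "$instance_id" \
    --query "Reservations[].Instances[].PublicIpAddress" --output text
}
# Example: wait_for_public_ip <instance-id>
```

The waiters return a non-zero status if the instance does not reach the expected state within their built-in timeout, in which case the function prints nothing.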

Step 3: Connect and Retrieve Initial Credentials

Connect over SSH with the key pair you selected and the instance's public IP address (from Step 2, or from the EC2 console if you launched in Step 1). The SSH login user depends on the operating system of the AMI variant you launched:

AMI variant                     SSH login user
Apache Kafka on Ubuntu 24.04    ubuntu

The first boot service runs before the SSH daemon becomes ready, so the credentials file is always in place when you log in for the first time.

ssh -i <path-to-private-key> <login-user>@<public-ip>
sudo cat /root/kafka-credentials.txt

You will see a plain text file containing the broker bootstrap address, the administrator username (cloudimg), the administrator password, and the security protocol and SASL mechanism the broker expects. Copy these values somewhere secure (a password manager or encrypted vault). Do not commit them to source control. The same password authenticates both Kafka clients on port 9092 and the web management console on port 80.

Step 4: Confirm the Services Are Running

The image runs three services: kafka.service (the broker and controller), akhq.service (the web management console) and nginx.service (the reverse proxy that fronts the console). Confirm all three are active and check the installed Kafka version:

sudo systemctl is-active kafka.service akhq.service nginx.service
ls /opt/kafka/libs/ | grep -E '^kafka-clients-' | head -1

You should see active three times, followed by the Kafka client library version:

active
active
active
kafka-clients-4.3.0.jar
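For scripted checks, the same test can be wrapped so that it names any service that is not active and returns a non-zero status. This is a minimal sketch; check_services is a hypothetical function name, and the three unit names are the ones listed above.

```shell
# Sketch: report any of the three services that is not active.
# check_services is a hypothetical helper, not shipped with the image.
check_services() {
  local failed=0 svc
  for svc in kafka.service akhq.service nginx.service; do
    if ! systemctl is-active --quiet "$svc"; then
      echo "not active: $svc" >&2
      failed=1
    fi
  done
  return "$failed"
}
# Example: check_services && echo "all services active"
```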

Step 5: Build a Client Configuration File

Every Kafka command-line tool needs a small properties file that carries the SASL credentials. Build one from the values in the credentials file. This block reads the password directly from /root/kafka-credentials.txt so you never have to paste it:

PASS=$(sudo grep '^kafka.admin.pass=' /root/kafka-credentials.txt | cut -d= -f2-)
cat > /tmp/cloudimg.properties <<EOF
security.protocol=SASL_PLAINTEXT
sasl.mechanism=SCRAM-SHA-256
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="cloudimg" password="${PASS}";
EOF

The broker enforces authentication, so any tool invoked without --command-config /tmp/cloudimg.properties is rejected. That rejection is the authentication requirement working as intended, not a fault.
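Because the properties file holds the administrator password, it may be worth creating it with owner-only permissions. The following is a sketch of one way to do that; write_client_config is a hypothetical helper name, and the three properties are the same ones used in this step.

```shell
# Sketch: write the client properties file with mode 600.
# write_client_config is a hypothetical helper; arguments are password and output path.
write_client_config() {
  local password="$1" out="$2"
  # Create (or truncate) the file under a restrictive umask, then force mode 600
  # in case the file already existed with looser permissions.
  ( umask 077 && : > "$out" ) && chmod 600 "$out" || return 1
  cat > "$out" <<EOF
security.protocol=SASL_PLAINTEXT
sasl.mechanism=SCRAM-SHA-256
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="cloudimg" password="${password}";
EOF
}
# Example: write_client_config "$PASS" /tmp/cloudimg.properties
```

The commands in the following steps run under sudo, so root can still read the file even though other users cannot.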

Step 6: Inspect the Broker and List Topics

With the client configuration in place, query the broker API versions and list the topics. The broker API call confirms the SASL handshake succeeds end to end:

sudo /opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server 127.0.0.1:9092 --command-config /tmp/cloudimg.properties | head -3
sudo /opt/kafka/bin/kafka-topics.sh --bootstrap-server 127.0.0.1:9092 --command-config /tmp/cloudimg.properties --list

The first command prints the broker id and its supported API range; the second lists the topics, including the sample cloudimg topic created on first boot:

127.0.0.1:9092 (id: 1 rack: null isFenced: false) -> (
    Produce(0): 0 to 13 [usable: 13],
    Fetch(1): 4 to 18 [usable: 18],
__consumer_offsets
cloudimg
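Every command in this guide repeats the same --bootstrap-server and --command-config flags. A small wrapper can cut down the typing; the sketch below assumes the paths used in this guide, and the names KAFKA_BIN, BOOTSTRAP, CLIENT_CONFIG and kafka_tool are hypothetical.

```shell
# Sketch: run any Kafka CLI tool with the shared connection flags prepended.
# Variable and function names are hypothetical; the values come from this guide.
KAFKA_BIN=/opt/kafka/bin
BOOTSTRAP=127.0.0.1:9092
CLIENT_CONFIG=/tmp/cloudimg.properties

kafka_tool() {
  local tool="$1"
  shift
  sudo "$KAFKA_BIN/$tool" --bootstrap-server "$BOOTSTRAP" \
    --command-config "$CLIENT_CONFIG" "$@"
}
# Example: kafka_tool kafka-topics.sh --describe --topic cloudimg
```

The example call describes the sample topic, showing its partition count, replication factor and the leader for each partition.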

Step 7: The Web Management Console — Topics

Open a web browser and navigate to http://<public-ip>/. The browser prompts for HTTP basic authentication. Sign in as the cloudimg user with the password from /root/kafka-credentials.txt. The console opens on the Topics view, which lists every topic with its message count, on-disk size, partition count, replication factor and any consumer groups attached to it.

[Figure: Apache Kafka web management console Topics view, listing each topic with its message count, partitions and replication]

Selecting the magnifying glass on a topic row opens the topic detail, where you can browse individual messages, view per-partition offsets and edit topic configuration.

Step 8: The Web Management Console — Nodes

The Nodes view shows every broker in the cluster. On a single instance deployment there is one node: broker id 1, host 127.0.0.1:9092, marked as the controller, holding all of the cluster's partitions.

[Figure: Apache Kafka web management console Nodes view, showing broker id 1 as the cluster controller]

When you scale to a multi-broker cluster, each broker appears here with its own partition share, and the controller column identifies which broker currently holds the controller role.

Step 9: The Web Management Console — Consumer Groups

The Consumer Groups view lists every consumer group, its state, the broker coordinating it, the number of live members and the topics it consumes. The lag badge next to each topic shows how far behind the group is — a lag of zero means the group has consumed every available message.

[Figure: Apache Kafka web management console Consumer Groups view, showing groups, their state and per-topic lag]

A group whose consumers have all disconnected shows the EMPTY state but keeps its committed offsets, so consumption resumes exactly where it stopped when a consumer rejoins.
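The same lag figures are available from the command line through kafka-consumer-groups.sh --describe. As a sketch, the hypothetical helper below totals the LAG column of that output; it assumes the output carries a header row containing the word LAG, and it skips rows where the lag is shown as a dash.

```shell
# Sketch: sum the LAG column of `kafka-consumer-groups.sh --describe` output read from stdin.
# sum_lag is a hypothetical helper. The column is located by header name, not position.
sum_lag() {
  awk '
    # Until the header row is seen, look for the LAG column and skip the line.
    !col { for (i = 1; i <= NF; i++) if ($i == "LAG") col = i; next }
    # After that, add up numeric lag values; "-" and repeated headers are ignored.
    $col ~ /^[0-9]+$/ { total += $col }
    END { print total + 0 }
  '
}
# Example:
#   sudo /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server 127.0.0.1:9092 \
#     --command-config /tmp/cloudimg.properties --describe --group <group-name> | sum_lag
```

A total of zero means the group has consumed every available message on all of its partitions.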

Step 10: Produce and Consume a Message

Confirm the full streaming path from the command line. The first command pipes a single message into the cloudimg topic; the second reads it back from the start of the topic:

echo "hello cloudimg $(date +%s)" | sudo /opt/kafka/bin/kafka-console-producer.sh --bootstrap-server 127.0.0.1:9092 --command-config /tmp/cloudimg.properties --topic cloudimg
sudo /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --command-config /tmp/cloudimg.properties --topic cloudimg --from-beginning --max-messages 1 --timeout-ms 10000

The consumer prints the earliest message in the topic and then exits. Producing and consuming both succeed only because the client configuration carries valid SASL credentials.

Step 11: Create a New Topic

Create your own topics with kafka-topics.sh. Replace <topic-name> with your topic name and <partitions> and <replication> with your chosen numbers. On a single broker the replication factor cannot exceed one — add brokers first if you need replicas.

sudo /opt/kafka/bin/kafka-topics.sh --bootstrap-server 127.0.0.1:9092 --command-config /tmp/cloudimg.properties --create --topic <topic-name> --partitions <partitions> --replication-factor <replication>

After creating a topic, refresh the Topics view in the web console and it appears immediately.
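The replication constraint above can be checked before the create command runs. This is a minimal sketch; valid_replication is a hypothetical helper that only compares two integers and does not query the broker.

```shell
# Sketch: succeed only if the replication factor is between 1 and the broker count.
# valid_replication is a hypothetical helper; both arguments must be integers.
valid_replication() {
  local replication="$1" brokers="$2"
  [ "$replication" -ge 1 ] && [ "$replication" -le "$brokers" ]
}
# Example, on this single-broker image:
#   valid_replication <replication> 1 || echo "replication factor too high for one broker"
```

After creating the topic, running kafka-topics.sh with --describe --topic <topic-name> instead of --create confirms the partition count and replication factor from the command line.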

Step 12: Connect an External Client

External producers and consumers connect to the broker on port 9092 using the same SASL/SCRAM-SHA-256 credentials. Make sure your security group allows inbound port 9092 from the client network, then point the client at <public-ip>:9092 with these properties:

bootstrap.servers=<public-ip>:9092
security.protocol=SASL_PLAINTEXT
sasl.mechanism=SCRAM-SHA-256
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="cloudimg" password="<KAFKA_ADMIN_PASSWORD>";

For traffic that crosses an untrusted network, terminate TLS at an upstream Network Load Balancer (an Application Load Balancer handles only HTTP and HTTPS, not the Kafka protocol), or switch the broker listener to SASL_SSL and supply a keystore. The Kafka documentation covers the SASL_SSL configuration in detail.
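When an external client cannot connect, it helps to separate network reachability from authentication problems. The sketch below, run from the client machine, tests only whether a TCP connection to the broker port can be opened. port_open is a hypothetical helper; it assumes bash and the coreutils timeout command are available on the client.

```shell
# Sketch: succeed if a TCP connection to host:port opens within 3 seconds.
# port_open is a hypothetical helper. It uses bash's /dev/tcp redirection,
# so it needs bash but no extra packages.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
# Example: port_open <public-ip> 9092 && echo "broker port reachable"
```

If the port is not reachable, check the security group rule for 9092 before looking at the SASL settings.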

Step 13: Heap and Performance Tuning

The broker heap is set in /etc/default/kafka through the KAFKA_HEAP_OPTS variable. The image ships with a conservative heap suitable for development and light production. For a busier broker, raise the heap to roughly half of the instance memory, then restart the service:

sudo systemctl restart kafka.service

Watch the broker logs with sudo journalctl -u kafka.service -f while it restarts to confirm it comes back cleanly.
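The "roughly half of the instance memory" guideline can be turned into concrete JVM flags. This is a sketch only: suggest_heap_opts is a hypothetical helper, and because this guide does not show the exact line format used in /etc/default/kafka, keep whatever syntax the existing KAFKA_HEAP_OPTS line uses when you paste the value in.

```shell
# Sketch: print -Xms/-Xmx flags for roughly half of the given memory size in MiB.
# suggest_heap_opts is a hypothetical helper, not shipped with the image.
suggest_heap_opts() {
  local mem_mib="$1" heap_mib
  heap_mib=$(( mem_mib / 2 ))
  echo "-Xms${heap_mib}m -Xmx${heap_mib}m"
}
# Example on the instance (MemTotal in /proc/meminfo is reported in KiB):
#   suggest_heap_opts "$(( $(awk '/^MemTotal:/ {print $2}' /proc/meminfo) / 1024 ))"
```

Setting the minimum and maximum heap to the same value avoids heap resizing at runtime; the memory left over remains available to the operating system page cache and to the web console.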

Step 14: Back Up Kafka Data

Kafka data lives under /var/lib/kafka/data on the dedicated EBS data volume. The simplest consistent backup is an EBS snapshot of that volume, which you can schedule with Amazon Data Lifecycle Manager. For a file-level copy, stop the broker first so the log segments are quiescent:

sudo systemctl stop kafka.service
sudo tar czf /var/tmp/kafka-data-backup.tgz -C /var/lib/kafka data
sudo systemctl start kafka.service

For zero-downtime backups, scale to a multi-broker cluster and rely on replication, or mirror topics to a second cluster.
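An on-demand EBS snapshot of the data volume can also be taken from the AWS CLI. The following is a minimal sketch: snapshot_data_volume is a hypothetical helper, and the volume ID is an argument you supply, for example from the instance's Storage tab in the EC2 console.

```shell
# Sketch: start an EBS snapshot of the given volume and print the snapshot ID.
# snapshot_data_volume is a hypothetical helper; pass the data volume's ID.
snapshot_data_volume() {
  local volume_id="$1"
  aws ec2 create-snapshot \
    --volume-id "$volume_id" \
    --description "Kafka data volume backup $(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --query SnapshotId --output text
}
# Example: snapshot_data_volume <volume-id>
```

The command returns as soon as the snapshot is started; aws ec2 wait snapshot-completed --snapshot-ids <snapshot-id> blocks until it has finished.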

Step 15: Scale to a Multi-Broker Cluster

This image is a single broker and controller, which suits development, testing, single-tenant production streaming, edge and IoT brokering, and proof-of-concept work. To scale horizontally, launch additional brokers, give each a unique node.id, list every controller in controller.quorum.voters, and raise the replication factor on your topics so partitions are spread across brokers (for existing topics this requires a partition reassignment with kafka-reassign-partitions.sh). The Apache Kafka documentation covers multi-node KRaft cluster configuration step by step.

Troubleshooting

If the web console does not load, confirm nginx.service and akhq.service are both active with sudo systemctl is-active nginx akhq, and check the console logs with sudo journalctl -u akhq.service --no-pager | tail -40. The console takes a few seconds to finish starting after a reboot.

The image also serves an unauthenticated health endpoint at http://<public-ip>/healthz, which returns the plain text ok with HTTP 200 once nginx is up. Point an Application Load Balancer target group health check or an external uptime monitor at /healthz — it confirms the proxy is serving without exposing the management console.
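For deployment scripts, the health endpoint can be polled until it answers. The sketch below assumes curl is installed on the machine running the check; wait_for_healthz is a hypothetical helper, and the 30 attempts and 2 second pause are arbitrary defaults.

```shell
# Sketch: poll /healthz until it returns HTTP 200 or the attempts run out.
# wait_for_healthz is a hypothetical helper; arguments are host and optional attempt count.
wait_for_healthz() {
  local host="$1" attempts="${2:-30}" i=0 code
  while [ "$i" -lt "$attempts" ]; do
    # -w prints only the status code; the response body is discarded.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://${host}/healthz")
    [ "$code" = "200" ] && return 0
    i=$((i + 1))
    sleep 2
  done
  return 1
}
# Example: wait_for_healthz <public-ip> && echo "proxy is serving"
```

A successful result confirms only that nginx is serving; it does not show that the broker or the console has finished starting.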

If a Kafka command hangs or is rejected, confirm /tmp/cloudimg.properties exists and carries the current password from /root/kafka-credentials.txt — the file lives in /tmp and does not survive a reboot, so rebuild it with the Step 5 block after each reboot.

If the broker does not start, inspect sudo journalctl -u kafka.service --no-pager | tail -60. A common cause on a resized instance is insufficient memory for the configured heap; lower KAFKA_HEAP_OPTS in /etc/default/kafka and restart.

Support

This image is published by cloudimg with 24/7 technical support by email and chat. Support covers Kafka deployment, broker tuning, topic and partition design, consumer group management, the web management console and scaling to a multi-broker cluster.

All product and company names are trademarks or registered trademarks of their respective holders. Use of them does not imply any affiliation with or endorsement by them.