Debezium on Ubuntu 24.04 on Azure User Guide

Product: Debezium on Ubuntu 24.04 LTS on Azure

Overview

Debezium is the leading open-source Change Data Capture (CDC) platform. It streams row-level changes from your databases (PostgreSQL, MySQL, MongoDB and more) into Apache Kafka so you can build event-driven pipelines, cache invalidation, search indexing, audit trails and microservice data synchronisation without polling.

The cloudimg image installs Debezium 3.5.2 as a single-node appliance: Apache Kafka 3.9.2 running in KRaft mode (broker plus controller, no separate ZooKeeper) at /opt/kafka, and Kafka Connect (connect-distributed) loaded with the Debezium connector plugins at /opt/debezium-plugins, both running as a dedicated debezium system user on JDK 17.

The Kafka Connect REST API has no built-in authentication, so it is bound to loopback and fronted by nginx on TCP 80 with HTTP Basic auth, with a unique password generated on the first boot of every VM. Kafka data and the Connect internal topics persist on a dedicated Azure data disk. Backed by 24/7 cloudimg support.

What is included:

  • Debezium 3.5.2 connector plugins (PostgreSQL, MySQL, MongoDB) at /opt/debezium-plugins
  • Apache Kafka 3.9.2 in KRaft single-node mode at /opt/kafka (kafka.service)
  • Kafka Connect (connect-distributed) with the Debezium plugins on the plugin path (kafka-connect.service)
  • The Connect REST API fronted by nginx on :80 with HTTP Basic auth (the Connect REST API has no built-in authentication) and a per-VM password in a root-only file
  • A dedicated Azure data disk at /var/lib/debezium holding the Kafka logs and the Connect offset/config/status topics — separate from the OS disk and re-provisioned with every VM
  • JDK 17 (Eclipse Temurin), with the Kafka and Connect JVM heaps tuned to fit a Standard_B4ms
  • kafka.service, kafka-connect.service and nginx.service as systemd units, enabled and active
  • 24/7 cloudimg support

Prerequisites

An active Azure subscription, an SSH key pair, and a VNet + subnet in the target region. Standard_B4ms (4 vCPU / 16 GiB RAM) is recommended for the two JVMs. NSG inbound: allow 22/tcp from your management network and 80/tcp for the authenticated Connect REST API. To let database clients and consumers reach Kafka and the Connect REST API directly from off-box, also open 9092/tcp (Kafka broker) and 8083/tcp (Connect REST) — see Step 11.

Step 1 — Deploy from the Azure Marketplace

Sign in to the Azure Portal, choose Create a resource, search the Marketplace for Debezium by cloudimg, and select Create. On Basics pick your subscription, resource group, region and size (Standard_B4ms); under Administrator account choose SSH public key and paste your key; under Inbound port rules allow SSH (22) and HTTP (80). Review the dedicated data disk on the Disks tab, then select Review + create followed by Create.

Step 2 — Deploy from the Azure CLI

az vm create \
  --resource-group <your-rg> \
  --name debezium \
  --image <marketplace-image-urn> \
  --size Standard_B4ms \
  --admin-username azureuser \
  --ssh-key-values ~/.ssh/id_rsa.pub \
  --public-ip-sku Standard

After the VM is created, open the ports you need (port 80 is enough for the authenticated REST API; add 9092 and 8083 for off-box clients):

az vm open-port --resource-group <your-rg> --name debezium --port 80 --priority 900

Step 3 — Connect to your VM

ssh azureuser@<vm-public-ip>

Step 4 — Confirm the services are running

systemctl is-active kafka.service kafka-connect.service nginx.service

All three services report active. Kafka starts in a few seconds; Kafka Connect takes a little longer to load the Debezium plugins and join the cluster.
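Because Kafka Connect needs longer to start than Kafka, a script that runs straight after boot may still see a unit in the activating state. The following is a minimal sketch of a wait loop, not part of the image: the function name wait_for_units and the two-second poll interval are illustrative choices.

```shell
# Hypothetical helper: wait until every listed systemd unit reports "active".
# Usage: wait_for_units <timeout-seconds> <unit>...
wait_for_units() {
  timeout=$1
  shift
  elapsed=0
  while :; do
    pending=""
    for unit in "$@"; do
      # Check units one at a time: when given several units, "is-active"
      # succeeds as soon as any one of them is active.
      systemctl is-active --quiet "$unit" || pending="$pending $unit"
    done
    [ -z "$pending" ] && return 0
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for:$pending" >&2
      return 1
    fi
    sleep 2
    elapsed=$((elapsed + 2))
  done
}
```

For example, wait_for_units 120 kafka.service kafka-connect.service nginx.service returns 0 once all three units are active, or 1 after roughly two minutes.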

Step 5 — Retrieve your Connect password

The Connect password is generated uniquely on the first boot of your VM and written to a root-only file:

sudo cat /root/debezium-credentials.txt

This file contains DEBEZIUM_CONNECT_USER (admin), DEBEZIUM_CONNECT_PASSWORD, the Connect REST URL and the Kafka bootstrap address. Store the password somewhere safe.
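To avoid pasting the password into later commands by hand, you can load it into a shell variable. The sketch below assumes the credentials file holds one unquoted KEY=VALUE pair per line, which this guide does not confirm, so check the file layout first. The function name read_cred is hypothetical.

```shell
# Hypothetical helper: print the value of one key from KEY=VALUE lines on stdin.
# Assumes unquoted KEY=VALUE lines; adjust if the credentials file differs.
# Usage: read_cred <KEY>
read_cred() {
  # Strip "KEY=" from the matching line and keep everything after it,
  # so values that themselves contain "=" are preserved.
  sed -n "s/^$1=//p" | head -n 1
}
```

A possible use is PW=$(sudo cat /root/debezium-credentials.txt | read_cred DEBEZIUM_CONNECT_PASSWORD), after which later commands can refer to "$PW".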

Step 6 — Check the health endpoint

nginx serves an unauthenticated health endpoint for load balancers and probes:

curl -s http://localhost/health

It returns ok.

Step 7 — Confirm the Kafka and Connect versions

/opt/kafka/bin/kafka-topics.sh --version
java -version

Kafka reports 3.9.2 and Java reports a 17.x build (Eclipse Temurin).

Screenshot: Apache Kafka 3.9.2 and Debezium 3.5.2 versions with all three services active

Step 8 — List the Debezium connector plugins

The Kafka Connect REST API is reachable through nginx on port 80 behind HTTP Basic auth. List the loaded connector plugins and confirm the Debezium classes are present:

PW='<DEBEZIUM_CONNECT_PASSWORD>'
curl -s -u "admin:$PW" http://localhost/connector-plugins | python3 -m json.tool | grep -E 'class|version' | grep -i debezium

You will see io.debezium.connector.postgresql.PostgresConnector, io.debezium.connector.mysql.MySqlConnector and io.debezium.connector.mongodb.MongoDbConnector. A request without credentials returns 401; with the per-VM password it returns 200.

Screenshot: The Connect REST connector-plugins endpoint listing the Debezium PostgreSQL, MySQL and MongoDB connectors
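The 401 and 200 behaviour described in this step can be checked in a script by asking curl for the status code only. This is a minimal sketch; the function names http_status and check_basic_auth are hypothetical, and the curl options used (-s, -o, -w '%{http_code}', -u) are standard.

```shell
# Hypothetical helper: print only the HTTP status code of a GET request.
# Extra arguments (for example -u user:password) are passed through to curl.
http_status() {
  url=$1
  shift
  curl -s -o /dev/null -w '%{http_code}' "$@" "$url"
}

# Hypothetical check: succeed only if the endpoint rejects anonymous requests
# with 401 and accepts the supplied credentials with 200.
# Usage: check_basic_auth <url> <user> <password>
check_basic_auth() {
  [ "$(http_status "$1")" = "401" ] &&
    [ "$(http_status "$1" -u "$2:$3")" = "200" ]
}
```

For example, check_basic_auth http://localhost/connector-plugins admin "$PW" && echo "auth enforced" prints the message only when both conditions hold.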

Step 9 — Kafka round-trip (create a topic, produce and consume)

Prove Kafka is working end to end with the console tools:

/opt/kafka/bin/kafka-topics.sh --bootstrap-server 127.0.0.1:9092 --create --topic demo --partitions 1 --replication-factor 1
echo 'hello-cdc' | /opt/kafka/bin/kafka-console-producer.sh --bootstrap-server 127.0.0.1:9092 --topic demo
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic demo --from-beginning --max-messages 1 --timeout-ms 15000

The consumer prints hello-cdc. List your topics any time with /opt/kafka/bin/kafka-topics.sh --bootstrap-server 127.0.0.1:9092 --list.

Screenshot: A Kafka round-trip, creating a topic then producing and consuming a message

Step 10 — Register a Debezium connector via the REST API

Debezium connectors are registered by POSTing a JSON config to the Connect REST API. The example below registers a PostgreSQL connector. Replace the database host, port, credentials and database name with those of your source database, and choose a slot.name that is not already in use on that server. The source database must have logical replication enabled (wal_level=logical). The connector configuration documentation is at debezium.io.

PW='<DEBEZIUM_CONNECT_PASSWORD>'
curl -s -u "admin:$PW" -X POST http://localhost/connectors \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "inventory-postgres",
    "config": {
      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "database.hostname": "<your-postgres-host>",
      "database.port": "5432",
      "database.user": "<replication-user>",
      "database.password": "<replication-password>",
      "database.dbname": "<your-db>",
      "topic.prefix": "inventory",
      "plugin.name": "pgoutput",
      "slot.name": "debezium"
    }
  }'

Check the connector reaches RUNNING:

curl -s -u "admin:$PW" http://localhost/connectors/inventory-postgres/status | python3 -m json.tool

The connector.state and each task state report RUNNING once Debezium connects to PostgreSQL. Row-level changes then stream into Kafka topics named <topic.prefix>.<schema>.<table>, which any Kafka consumer can read. With the prefix above, a table public.customers (an illustrative name) would stream to inventory.public.customers. Delete a connector with curl -u "admin:$PW" -X DELETE http://localhost/connectors/inventory-postgres.

Screenshot: A registered Debezium PostgreSQL connector reporting state RUNNING via the Connect REST status API
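A connector usually takes a few seconds to move to RUNNING after registration, so automation may want to poll the status endpoint instead of checking once. The sketch below is one way to do that. The function names all_running and wait_for_connector are hypothetical, and the state check pattern-matches the "state" fields of the status document instead of parsing JSON, which is a deliberate simplification.

```shell
# Hypothetical helper: read a Connect status document on stdin and succeed only
# if it contains a connector state plus at least one task state, all RUNNING.
# It matches "state" fields by pattern and is not a general JSON parser.
all_running() {
  states=$(grep -o '"state"[[:space:]]*:[[:space:]]*"[A-Z_]*"' |
    sed 's/.*"\([A-Z_]*\)"$/\1/')
  total=0
  for s in $states; do
    [ "$s" = "RUNNING" ] || return 1
    total=$((total + 1))
  done
  # One connector state plus one or more task states.
  [ "$total" -ge 2 ]
}

# Hypothetical helper: poll the nginx-fronted status endpoint until
# all_running succeeds or the attempts run out.
# Usage: wait_for_connector <name> <password> [attempts]
wait_for_connector() {
  name=$1
  pass=$2
  attempts=${3:-20}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -s -u "admin:$pass" "http://localhost/connectors/$name/status" | all_running; then
      return 0
    fi
    i=$((i + 1))
    sleep 3
  done
  return 1
}
```

For example, wait_for_connector inventory-postgres "$PW" returns 0 once the connector and its tasks report RUNNING, and 1 if they have not done so after about a minute.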

Step 11 — Open Kafka and Connect to off-box clients

By default only port 80 (the authenticated Connect REST API through nginx) is needed. To let database clients, Kafka consumers, or external Connect tooling reach the VM directly, open the Kafka broker and Connect REST ports in the Azure NSG:

az vm open-port --resource-group <your-rg> --name debezium --port 9092 --priority 910
az vm open-port --resource-group <your-rg> --name debezium --port 8083 --priority 920

The Kafka broker advertises the VM's public IP on 9092 (set automatically at first boot), so external clients connect with bootstrap.servers=<vm-public-ip>:9092. Port 8083 is the raw Connect REST API with no authentication — restrict it to trusted networks in the NSG, or keep using the nginx-fronted port 80 with Basic auth.
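By default, az vm open-port creates a rule that allows traffic from any source address. To restrict a port to a trusted network as recommended above, you can create the NSG rule yourself with a source prefix. The following is a sketch under assumptions: the function name allow_from and the rule naming are hypothetical, and you need to supply the NSG name attached to your VM (az network nsg list --resource-group <your-rg> --output table lists the candidates).

```shell
# Hypothetical helper: allow one inbound TCP port from a single trusted CIDR.
# Usage: allow_from <resource-group> <nsg-name> <port> <cidr> <priority>
allow_from() {
  az network nsg rule create \
    --resource-group "$1" \
    --nsg-name "$2" \
    --name "allow-$3-from-trusted" \
    --priority "$5" \
    --direction Inbound \
    --access Allow \
    --protocol Tcp \
    --destination-port-ranges "$3" \
    --source-address-prefixes "$4"
}
```

For example, allow_from <your-rg> <your-nsg> 8083 203.0.113.0/24 920 would admit the unauthenticated Connect port only from that documentation-range network. Use this in place of the matching az vm open-port command, not in addition to it.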

Step 12 — Confirm data lives on the dedicated disk

The Kafka logs and the Connect internal topics are stored on the dedicated Azure data disk so they survive OS changes and can be resized independently:

findmnt /var/lib/debezium

The mount is backed by a separate Azure data disk captured into the image and re-provisioned on every VM.
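If you want a scripted check instead of reading the findmnt output, you can compare the source device of the data path with that of the root filesystem. This is a minimal sketch; the function name on_separate_filesystem is hypothetical, and it confirms a separate filesystem, not which physical disk backs it.

```shell
# Hypothetical check: succeed if the given path lives on a filesystem whose
# source device differs from the root filesystem's source device.
# Usage: on_separate_filesystem <path>
on_separate_filesystem() {
  # --target resolves the filesystem that contains the path, so an unmounted
  # directory resolves to the root filesystem and fails the comparison below.
  mount_src=$(findmnt -n -o SOURCE --target "$1") || return 1
  root_src=$(findmnt -n -o SOURCE /) || return 1
  [ -n "$mount_src" ] && [ "$mount_src" != "$root_src" ]
}
```

For example, on_separate_filesystem /var/lib/debezium || echo "data disk is not mounted" warns when the data path has fallen back to the OS disk.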

Maintenance

  • Configuration: Kafka's KRaft config is /opt/kafka/config/cloudimg-kraft.properties and Connect's is /opt/kafka/config/cloudimg-connect-distributed.properties. JVM heaps are set in /etc/default/kafka and /etc/default/kafka-connect. After editing any of these files, run sudo systemctl restart kafka kafka-connect to apply the changes.
  • Connectors: manage connectors through the Connect REST API (GET/POST/DELETE /connectors). Registered connectors persist in the connect-configs Kafka topic on the data disk and survive restarts.
  • Backups: snapshot the /var/lib/debezium data disk to preserve Kafka logs, connector configs and offsets.
  • Upgrades: replace the Kafka release under /opt/kafka or the Debezium plugins under /opt/debezium-plugins, then restart the services.
  • Security patches: unattended-upgrades remains enabled so the OS continues to receive security updates automatically.
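The backup item above can be scripted with the Azure CLI. The sketch below looks up the managed disk ID of the VM's data disk and snapshots it. It assumes the data disk is the first entry in the VM's dataDisks list, which this guide does not state, and the function name snapshot_data_disk is hypothetical.

```shell
# Hypothetical helper: snapshot the first data disk attached to a VM.
# Assumes the Debezium data disk is dataDisks[0]; verify this for your VM.
# Usage: snapshot_data_disk <resource-group> <vm-name> <snapshot-name>
snapshot_data_disk() {
  rg=$1
  vm=$2
  snap=$3
  # Resolve the managed disk resource ID of the first data disk.
  disk_id=$(az vm show --resource-group "$rg" --name "$vm" \
    --query 'storageProfile.dataDisks[0].managedDisk.id' --output tsv) || return 1
  [ -n "$disk_id" ] || return 1
  az snapshot create --resource-group "$rg" --name "$snap" --source "$disk_id"
}
```

For example, snapshot_data_disk <your-rg> debezium debezium-data-$(date +%Y%m%d) creates a dated snapshot. Stopping kafka-connect and kafka before the snapshot and starting them afterwards is the cautious choice if you need the Kafka logs to be cleanly closed.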

Support

cloudimg provides 24/7 expert support for this image. Contact support@cloudimg.co.uk.

Apache Kafka is a trademark of the Apache Software Foundation, and Debezium is a trademark of its respective owner. This image is produced by cloudimg and is not affiliated with or endorsed by the Debezium project or the Apache Software Foundation. Debezium and Apache Kafka are distributed under the Apache License 2.0.