Apache Airflow 3.2 on Ubuntu 22.04 on Azure User Guide
Overview
This image runs Apache Airflow 3.2.0 as a single node standalone deployment on Ubuntu 22.04 LTS. The API server (which serves the web UI + REST API), scheduler, triggerer, and DAG processor each run as a dedicated systemd unit under the airflow service user. In Airflow 3.x the DAG processor is a separate mandatory process — the scheduler no longer parses DAG files inline as it did in 2.x. The metadata backend is PostgreSQL 14 running on the same virtual machine, bound to 127.0.0.1 and reachable only by processes on the host. The executor is LocalExecutor, so task instances run as subprocesses of the scheduler with no external Celery or Kubernetes dependency.
The image ships with an admin user already created in the metadata database and one example DAG, example_hello_world, pre staged in /opt/airflow/dags. The admin password is baked at build time and is identical across all customers deploying from the same gallery image version, so the first thing you should do after logging in is rotate it per Section 12.
The image is intended for teams that want a single node Airflow up and running in minutes, without spending hours on packaging, venv plumbing, PostgreSQL role creation, systemd units, and first admin user provisioning. It is not a multi node Airflow with a shared executor, is not TLS encrypted out of the box, and does not ship with LDAP, SSO, Redis, Celery workers, or a KubernetesExecutor. Section 13 documents the recommended path for hardening and scaling beyond a single node.
The brand is lowercase cloudimg throughout this guide. All cloudimg URLs in this guide use the form https://www.cloudimg.co.uk.
Prerequisites
Before you deploy this image you need:
- A Microsoft Azure subscription where you can create resource groups, virtual networks, and virtual machines
- Azure role permissions equivalent to Contributor on the target resource group
- An SSH public key for first login to the admin user account
- A virtual network and subnet in the same region as the Azure Compute Gallery the image is published into, with an associated network security group
- The Azure CLI (az) version 2.50 or later installed locally if you intend to use the CLI deployment path in Step 2
- The cloudimg Apache Airflow 3.2 offer enabled on your tenant in Azure Marketplace
Step 1: Deploy the Virtual Machine from the Azure Portal
Navigate to Marketplace in the Azure Portal, search for Apache Airflow 3.2, and select the cloudimg publisher entry. Click Create to begin the wizard.
On the Basics tab choose your subscription, target resource group, and region. The region must match the region your Azure Compute Gallery exposes the image in. Set the virtual machine name. Choose SSH public key as the authentication type, set the username to a name of your choice, and paste your SSH public key.
On the Disks tab the recommended OS disk type is Premium SSD. Leave the OS disk size at the default. The PostgreSQL metadata directory and the Airflow logs directory both live on the OS disk on the shipped image; for any real workload you should attach a separate Premium SSD data disk and relocate at least /var/lib/postgresql onto it.
On the Networking tab select your existing virtual network and subnet. Attach a network security group that allows inbound TCP 22 from your management IP range and inbound TCP 8080 only from the CIDRs you want to allow to reach the Airflow web UI. Do not expose 8080 to the public internet. The API server runs plain HTTP on 8080 and the admin role has full write access to the metadata database, so any exposure beyond a trusted management network requires the HTTPS reverse proxy described in Section 13.
On the Management, Monitoring, and Advanced tabs the defaults are appropriate. Click Review + create, wait for validation to pass, then click Create. Deployment takes around three minutes.
Step 2: Deploy the Virtual Machine from the Azure CLI
If you prefer the command line, use the gallery image resource identifier as the source. The exact resource identifier is published on your Partner Center plan. A representative invocation:
RG="airflow-prod"
LOCATION="eastus"
VM_NAME="airflow-1"
ADMIN_USER="airflowops"
GALLERY_IMAGE_ID="/subscriptions/<sub-id>/resourceGroups/azure-cloudimg/providers/Microsoft.Compute/galleries/cloudimgGallery/images/apache-airflow-3-2-ubuntu-22-04/versions/1.0.20260417"
SSH_KEY="$(cat ~/.ssh/id_rsa.pub)"
az group create --name "$RG" --location "$LOCATION"
az network vnet create \
--resource-group "$RG" \
--name airflow-vnet \
--address-prefix 10.30.0.0/16 \
--subnet-name airflow-subnet \
--subnet-prefix 10.30.1.0/24
az network nsg create --resource-group "$RG" --name airflow-nsg
az network nsg rule create \
--resource-group "$RG" --nsg-name airflow-nsg \
--name allow-ssh-mgmt --priority 100 \
--source-address-prefixes "<your-mgmt-cidr>" \
--destination-port-ranges 22 --access Allow --protocol Tcp
az network nsg rule create \
--resource-group "$RG" --nsg-name airflow-nsg \
--name allow-airflow-ui --priority 110 \
--source-address-prefixes "<your-mgmt-cidr>" \
--destination-port-ranges 8080 --access Allow --protocol Tcp
az vm create \
--resource-group "$RG" \
--name "$VM_NAME" \
--image "$GALLERY_IMAGE_ID" \
--size Standard_D2s_v3 \
--admin-username "$ADMIN_USER" \
--ssh-key-values "$SSH_KEY" \
--vnet-name airflow-vnet --subnet airflow-subnet \
--nsg airflow-nsg \
--os-disk-size-gb 64
For a production deployment replace the <your-mgmt-cidr> placeholder with a tight management source range. The API server listens on 0.0.0.0:8080, so the NSG rule is the only thing preventing arbitrary clients from reaching the UI.
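Once `az vm create` returns, you can look up the address to connect to, reusing the `RG` and `VM_NAME` variables from the script above:

```shell
# Public IP (empty if the VM has none) and private IP of the new VM.
az vm show -d --resource-group "$RG" --name "$VM_NAME" \
  --query "{public:publicIps, private:privateIps}" -o table
```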
Step 3: Connect via SSH
After deployment, find the public or private IP of the new virtual machine. From your management host:
ssh airflowops@<vm-ip>
The first login may take a few seconds while cloud-init finalises. Once you have a shell, the API server, scheduler, triggerer, DAG processor, and PostgreSQL have all been started by systemd and the example DAG has been parsed by the DAG processor.
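A quick way to confirm this from the shell is to ask systemd for the state of each unit (unit names per Section 8):

```shell
# All five units should report "active".
for unit in airflow-apiserver airflow-scheduler airflow-triggerer \
            airflow-dag-processor postgresql; do
  printf '%-24s %s\n' "$unit" "$(systemctl is-active "$unit")"
done
```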
Step 4: Retrieve the Admin Credentials
The initial admin credentials are written to a single file at build time, shipped inside the image:
sudo cat /stage/scripts/airflow-credentials.log
You will see something like:
# Apache Airflow 3.2 on Ubuntu 22.04 — initial credentials
# Generated on: 2026-04-17T09:45:00Z
#
# These credentials are shipped inside the captured image. All customers
# deploying from this image share the same initial admin password —
# ROTATE IT IMMEDIATELY on first login (Security -> List Users -> Edit).
#
# Web UI: http://<vm public ip>:8080
# Admin username: admin
# Admin password: 4f8aB2dQ9pK1xVnE7sT
#
# PostgreSQL metadata DB (localhost only, bound to 127.0.0.1:5432):
# database: airflow
# username: airflow
# password: 8Zc4pK1vXnE7sTuYrL6c
#
# Fernet key (also in /opt/airflow/airflow.cfg):
# qHFNMfFT0aMVg1n_Vw6LLsB7Jz3GQKxX4uEo9r2bIh8=
Because this password is baked at build time, every virtual machine launched from the same gallery image version starts with the same initial admin password. Rotate it immediately per Section 12. Once rotated, this file no longer reflects the live password — it remains as a record of the shipped password only.
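If you automate first-boot configuration, the fields can be pulled out of the shipped file with standard tools. A sketch, assuming the `Label: value` comment layout shown above — the `extract_cred` helper name is ours, not something shipped on the image:

```shell
# Pull one field out of the credentials file by its label.
# Relies on the "Label: value" layout shown above; values containing ": "
# would need a stricter parser.
extract_cred() {
  awk -F': ' -v label="$2" '$0 ~ (label ": ") { print $2; exit }' "$1"
}

# On the VM (the file is root-owned, mode 0600), for example:
#   sudo bash -c '. ./extract-creds.sh; extract_cred /stage/scripts/airflow-credentials.log "Admin password"'
```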
Step 5: Open the Web UI and Log In
Point your browser at the Airflow web UI:
http://<vm-ip>:8080
Log in with username admin and the password from Step 4.
Immediately after login, rotate the password:
- Click the Security menu in the top right
- Choose List Users
- Click the edit (pencil) icon next to admin
- Enter a new password in both password fields
- Save
The rotated password is stored (hashed) in the ab_user table of the PostgreSQL metadata database. The shipped credentials file at /stage/scripts/airflow-credentials.log is not updated automatically after rotation — record the new password in your own secrets manager.
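To confirm the rotated password without the browser, you can request a token from the REST API. A sketch — Airflow 3's API server issues JWTs from the /auth/token endpoint; verify the endpoint path against the API documentation for your exact build:

```shell
# A valid username/password pair returns 201 with a JSON body containing a token;
# a wrong password returns 401.
curl -s -o /dev/null -w '%{http_code}\n' \
  -X POST "http://<vm-ip>:8080/auth/token" \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "<your-new-password>"}'
```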
Step 6: Explore the Example DAG
The image ships with one example DAG, example_hello_world, at /opt/airflow/dags/example_hello_world.py. In the web UI:
- On the DAGs page locate example_hello_world
- Toggle the pause switch on the left to unpause the DAG
- Click the play (▶) icon on the right and choose Trigger DAG
- Watch the grid view — the single task greet transitions queued → running → success within 10 seconds
You can verify the same end-to-end pipeline from the command line without the UI. The airflow dags test subcommand executes the DAG in-process without the scheduler. Run it as the airflow service user:
sudo -u airflow \
env AIRFLOW_HOME=/opt/airflow \
/opt/airflow/venv/bin/airflow dags test example_hello_world
You should see the log line hello from cloudimg — Apache Airflow 3.2 on Ubuntu 22.04 and the command should exit with status 0. This is the same path the cloudimg validate script uses to confirm the image boots into a working DAG before it is captured.
Step 7: Author Your First DAG
DAGs live at /opt/airflow/dags. The DAG processor watches the directory and picks up new files automatically (by default within 30 seconds). Drop a Python file into that directory and the DAG processor parses it, checks it for import errors, and registers it in the metadata database.
sudo tee /opt/airflow/dags/my_first_dag.py >/dev/null <<'PY'
from __future__ import annotations
import logging
from datetime import datetime
from airflow.decorators import dag, task
@dag(
dag_id="my_first_dag",
start_date=datetime(2026, 1, 1),
schedule=None,
catchup=False,
tags=["cloudimg", "example"],
)
def my_first_dag():
@task
def hello(name: str) -> str:
msg = f"hello {name} from my first DAG"
logging.info(msg)
return msg
hello("world")
my_first_dag()
PY
sudo chown airflow:airflow /opt/airflow/dags/my_first_dag.py
sudo chmod 0644 /opt/airflow/dags/my_first_dag.py
Refresh the DAGs page after about 30 seconds and my_first_dag should appear. If it does not, check for import errors:
sudo -u airflow \
env AIRFLOW_HOME=/opt/airflow \
/opt/airflow/venv/bin/airflow dags list-import-errors
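Once the DAG imports cleanly, you can exercise it without the UI. New DAGs start paused by default, so unpause first and then queue a manual run:

```shell
# Unpause the DAG, then queue a manual run; the scheduler picks it up
# on its next loop.
sudo -u airflow env AIRFLOW_HOME=/opt/airflow \
  /opt/airflow/venv/bin/airflow dags unpause my_first_dag
sudo -u airflow env AIRFLOW_HOME=/opt/airflow \
  /opt/airflow/venv/bin/airflow dags trigger my_first_dag
```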
Step 8: Server Components
The deployed image contains the following components:
| Component | Version | Purpose |
|---|---|---|
| Apache Airflow | 3.2.0 | Workflow orchestration — DAG authoring, scheduling, and monitoring |
| Python | 3.10 (Ubuntu default) | Host interpreter backing the Airflow venv |
| PostgreSQL | 14 | Metadata backend, local-only (bound to 127.0.0.1) |
| Ubuntu | 22.04 LTS | Base operating system |
| systemd units | airflow-apiserver, airflow-scheduler, airflow-triggerer, airflow-dag-processor, postgresql | Process supervision — API server serves web UI + REST API; dag-processor parses DAG files |
The Airflow processes run under the dedicated airflow system user. All four processes (API server, scheduler, triggerer, DAG processor) read configuration from /opt/airflow/airflow.cfg and connect to PostgreSQL over TCP on the loopback interface (127.0.0.1:5432).
Step 9: Filesystem Layout
| Path | Owner | Purpose |
|---|---|---|
| /opt/airflow/ | airflow:airflow 0750 | AIRFLOW_HOME — config, logs, DAGs, venv |
| /opt/airflow/airflow.cfg | airflow:airflow 0640 | Main Airflow configuration, including fernet_key and sql_alchemy_conn |
| /opt/airflow/venv/ | airflow:airflow | Python virtual environment with apache-airflow==3.2.0 installed |
| /opt/airflow/dags/ | airflow:airflow 0750 | DAG source files — the DAG processor watches this directory |
| /opt/airflow/logs/ | airflow:airflow 0750 | Task and scheduler logs |
| /opt/airflow/plugins/ | airflow:airflow 0750 | Airflow plugins (empty on shipped image) |
| /home/airflow/ | airflow:airflow 0750 | Service user home — holds operator helper scripts |
| /home/airflow/setEnv.sh | airflow:airflow 0750 | source to get an Airflow ready shell (activates venv, exports AIRFLOW_HOME) |
| /home/airflow/start_all.sh | airflow:airflow 0750 | Starts all four Airflow systemd units |
| /home/airflow/stop_all.sh | airflow:airflow 0750 | Stops all four Airflow systemd units (reverse order) |
| /etc/systemd/system/airflow-apiserver.service | root:root 0644 | API server unit (web UI + REST API on port 8080) |
| /etc/systemd/system/airflow-scheduler.service | root:root 0644 | Scheduler unit |
| /etc/systemd/system/airflow-triggerer.service | root:root 0644 | Triggerer unit |
| /etc/systemd/system/airflow-dag-processor.service | root:root 0644 | DAG processor unit (parses files in /opt/airflow/dags/) |
| /var/lib/postgresql/14/main/ | postgres:postgres | PostgreSQL data directory (contains the airflow database) |
| /stage/scripts/airflow-credentials.log | root:root 0600 | Shipped admin + PostgreSQL credentials |
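You can spot-check this layout on a running VM with GNU stat (as shipped on Ubuntu 22.04):

```shell
# Print mode, owner, and path for the security-sensitive entries in the table.
for p in /opt/airflow /opt/airflow/airflow.cfg /opt/airflow/dags \
         /stage/scripts/airflow-credentials.log; do
  sudo stat -c '%a %U:%G %n' "$p"
done
```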
Step 10: Managing Airflow
The four Airflow processes and PostgreSQL are started by systemd at boot. Manage them as follows:
# Status of all Airflow units
sudo systemctl status airflow-apiserver airflow-scheduler airflow-triggerer airflow-dag-processor
# Stop (reverse dependency order)
sudo systemctl stop airflow-triggerer airflow-dag-processor airflow-scheduler airflow-apiserver
# Start
sudo systemctl start airflow-apiserver airflow-scheduler airflow-triggerer airflow-dag-processor
# Restart a single unit
sudo systemctl restart airflow-scheduler
# Tail live logs
sudo journalctl -u airflow-apiserver -f
sudo journalctl -u airflow-scheduler -f
sudo journalctl -u airflow-triggerer -f
sudo journalctl -u airflow-dag-processor -f
Or use the shipped helper scripts:
sudo /home/airflow/start_all.sh
sudo /home/airflow/stop_all.sh
For an Airflow-ready shell to run airflow CLI subcommands ad hoc, source the env helper:
sudo -u airflow bash
source /home/airflow/setEnv.sh
airflow dags list
airflow tasks list example_hello_world
PostgreSQL is managed by its own systemd unit:
sudo systemctl status postgresql
sudo systemctl restart postgresql
Step 11: Troubleshooting
The web UI will not load. Check sudo systemctl status airflow-apiserver.service (the API server hosts the web UI and REST API on the same port 8080). If active (running), check the NSG rule on port 8080 and confirm you are connecting from an allowed source IP. If inactive, check sudo journalctl -u airflow-apiserver.service -n 200 --no-pager for the failure.
A DAG does not appear in the UI. First check for import errors:
sudo -u airflow env AIRFLOW_HOME=/opt/airflow /opt/airflow/venv/bin/airflow dags list-import-errors
If the DAG imports cleanly but still does not appear, confirm the DAG processor is running (sudo systemctl status airflow-dag-processor) and give it up to 30 seconds to parse. The DAG processor parses DAG files on a background loop; new files are not reflected in the UI until the next parse cycle.
Task runs are stuck in "queued". The single node image uses LocalExecutor. Confirm executor = LocalExecutor is set in /opt/airflow/airflow.cfg and that the scheduler unit is running. Tasks are executed as subprocesses of the scheduler; if the scheduler is stopped, queued tasks never run.
API server cannot connect to the metadata database. Check sudo systemctl status postgresql. If PostgreSQL is down, the API server will fail and restart in a loop. Restart both: sudo systemctl restart postgresql && sudo systemctl restart airflow-apiserver. Confirm the airflow database exists and the credentials in /opt/airflow/airflow.cfg match:
sudo -u postgres psql -d airflow -c "SELECT 1;"
Login fails with "Invalid login". The admin password stored in the metadata database is the source of truth. If you rotated the password per Section 5 and then lost the new one, reset it from the command line:
sudo -u airflow env AIRFLOW_HOME=/opt/airflow /opt/airflow/venv/bin/airflow users reset-password --username admin
Step 12: Rotating Credentials
Rotate the admin UI password via the web UI (Section 5) or CLI:
sudo -u airflow env AIRFLOW_HOME=/opt/airflow /opt/airflow/venv/bin/airflow users reset-password --username admin
The PostgreSQL password for the airflow role is set at build time and stored in two places: the sql_alchemy_conn line of /opt/airflow/airflow.cfg, and the airflow-credentials.log file in /stage/scripts. To rotate it:
# Stop Airflow processes so no connections linger
sudo systemctl stop airflow-triggerer airflow-dag-processor airflow-scheduler airflow-apiserver
# Change the role password
NEW_PG_PW="$(openssl rand -base64 24 | tr -d '=+/' | cut -c1-20)"
sudo -u postgres psql -c "ALTER USER airflow WITH PASSWORD '${NEW_PG_PW}';"
# Update airflow.cfg with the new connection string
sudo sed -i -E \
"s|^sql_alchemy_conn\s*=.*|sql_alchemy_conn = postgresql+psycopg2://airflow:${NEW_PG_PW}@localhost/airflow|" \
/opt/airflow/airflow.cfg
# Start Airflow back up
sudo systemctl start airflow-apiserver airflow-scheduler airflow-triggerer airflow-dag-processor
The Fernet key (used to encrypt Airflow connections and variables at rest) is set at build time and should be rotated only if you have a reason to — rotation requires decrypting all existing encrypted values with the old key and re encrypting with the new one. The Airflow documentation at https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/fernet.html describes the procedure.
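If you do need to rotate, the procedure in those docs is a multi-key rotation: list the new key first and the old key second in fernet_key, re-encrypt, then drop the old key. A sketch of that sequence on this image — check the exact steps against the linked documentation before running it:

```shell
# 1. Generate a new Fernet key and prepend it to the current one, comma-separated.
NEW_KEY="$(/opt/airflow/venv/bin/python -c \
  'from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())')"
OLD_KEY="$(sudo awk -F'= ' '/^fernet_key/ {print $2}' /opt/airflow/airflow.cfg)"
sudo sed -i "s|^fernet_key = .*|fernet_key = ${NEW_KEY},${OLD_KEY}|" /opt/airflow/airflow.cfg

# 2. Re-encrypt stored connections and variables with the new (first) key.
sudo -u airflow env AIRFLOW_HOME=/opt/airflow \
  /opt/airflow/venv/bin/airflow rotate-fernet-key

# 3. Once rotation succeeds, remove the old key from fernet_key and restart
#    the Airflow units so all processes pick up the new configuration.
```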
Step 13: Security Recommendations
- Rotate the admin password immediately after first login (Section 5)
- Restrict the network security group on port 8080 to the smallest possible source CIDR — Airflow ships plain HTTP and the admin role has full write access to the metadata DB
- Add an HTTPS reverse proxy (nginx or Azure Application Gateway) in front of the API server before exposing it beyond a private management network
- Configure RBAC roles for team access — the shipped image has only the initial admin user. The Airflow web UI at Security → List Roles lets you create Viewer, User, Op, and custom roles, and assign them to named users
- Treat /opt/airflow/airflow.cfg as a secret — it contains the PostgreSQL password and the Fernet key, and is readable by the airflow group only (mode 0640)
- Take regular snapshots of /var/lib/postgresql/14/main using Azure Disk Snapshots so the metadata database (including DAG run history, connection records, and user records) can be restored
- Subscribe to the Apache Airflow security announce list at https://airflow.apache.org/community/ and apply security patches as they are published
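The snapshot recommendation above can be scripted. A sketch reusing the resource and VM names from Step 2 (substitute your own):

```shell
# Identify the VM's OS disk (which holds /var/lib/postgresql on the shipped
# image) and snapshot it. For a crash-consistent copy, stop PostgreSQL first.
OS_DISK_ID="$(az vm show --resource-group airflow-prod --name airflow-1 \
  --query 'storageProfile.osDisk.managedDisk.id' -o tsv)"
az snapshot create --resource-group airflow-prod \
  --name "airflow-metadata-$(date +%Y%m%d%H%M)" \
  --source "$OS_DISK_ID"
```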
Step 14: Support and Licensing
Apache Airflow is licensed under the Apache License 2.0. The full text is reproduced in /opt/airflow/venv/lib/python3.10/site-packages/apache_airflow-3.2.0.dist-info/LICENSE on the deployed image. cloudimg distributes the unmodified upstream apache-airflow==3.2.0 package published to PyPI by the Apache Software Foundation, pinned via the Airflow project's official constraints file for Python 3.10, with cloudimg authored configuration, systemd units, helper scripts, and PostgreSQL backend layered on top.
For support with the cloudimg image itself contact support@cloudimg.co.uk or visit https://www.cloudimg.co.uk/support. For Apache Airflow questions outside the scope of cloudimg packaging, the Apache Airflow community resources at https://airflow.apache.org/community/ are the authoritative reference.