Apache Airflow on AWS User Guide
Overview
This image runs Apache Airflow as a single node deployment. Airflow is installed into a dedicated Python virtual environment, pinned with the Airflow project's official constraints file, so the dependency tree is reproducible. The metadata backend is PostgreSQL running on the same instance, bound to the loopback interface only. The executor is the LocalExecutor, so task instances run as subprocesses with no external Celery or Kubernetes dependency.
The Airflow API server, which serves both the web user interface and the REST API on port 8080, the scheduler, and the DAG processor each run as a dedicated systemd unit under an unprivileged airflow service account. PostgreSQL runs under its own systemd unit. All four units start automatically at boot.
Airflow administrator and PostgreSQL credentials are generated on the first boot of every deployed instance. Two instances launched from the same Amazon Machine Image never share passwords. The initial administrator password and the PostgreSQL password are written to /root/airflow-credentials.txt with mode 0600 so that only the root user can read them.
The image ships with one example DAG, example_hello_world, so a freshly launched instance has a working pipeline to trigger from the web interface immediately.
Prerequisites
Before you deploy this image you need:
- An Amazon Web Services account where you can launch EC2 instances
- IAM permissions to launch instances, create security groups, and subscribe to AWS Marketplace products
- An EC2 key pair in the target Region for SSH access to the instance
- A VPC and subnet in the target Region, with a security group allowing inbound port 22 from your management network and inbound port 8080 from the network you will reach the Airflow web interface from
- The AWS CLI (version 2) installed locally if you plan to deploy from the command line
Step 1: Launch the Instance from the AWS Marketplace
Sign in to the AWS Management Console, open the EC2 service, and select Launch instance. Under Application and OS Images choose AWS Marketplace AMIs and search for Apache Airflow. Select the cloudimg listing and choose Select, then Continue on the subscription summary.
Pick an instance type of m5.large or larger. The instance runs the Airflow scheduler, DAG processor and API server alongside a PostgreSQL database, so it benefits from at least two vCPUs and 8 GiB of memory. Choose your EC2 key pair under Key pair (login). Under Network settings select your VPC and subnet, and either create or select a security group that allows inbound port 22 from your management network and inbound port 8080 from the network you will reach the web interface from. Leave the root volume at the default size or larger.
Select Launch instance. First boot initialisation takes approximately one to two minutes after the instance state becomes Running and the status checks pass.
Step 2: Launch the Instance from the AWS CLI
The following block launches an instance from the cloudimg Apache Airflow Marketplace AMI into an existing subnet and security group. Replace <ami-id> with the AMI ID shown on the Marketplace listing, <key-name> with your EC2 key pair name, <subnet-id> with your subnet ID, and <security-group-id> with a security group that opens ports 22 and 8080 as described above.
aws ec2 run-instances \
--image-id <ami-id> \
--instance-type m5.large \
--key-name <key-name> \
--subnet-id <subnet-id> \
--security-group-ids <security-group-id> \
--block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":30,"VolumeType":"gp3"}}]' \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=apache-airflow-01}]'
The command prints a JSON document on success. Note the instance ID, then retrieve its public address once the instance is running:
aws ec2 describe-instances --instance-ids <instance-id> --query "Reservations[].Instances[].PublicIpAddress" --output text
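If you script the launch rather than typing it, the following Python sketch assembles the same aws ec2 run-instances arguments as a list. It serialises the block device mapping with json.dumps, so no shell quoting is involved. The function name and the example IDs are illustrative placeholders, not something shipped with the image.

```python
import json


def run_instances_argv(ami_id, key_name, subnet_id, security_group_id,
                       instance_type="m5.large", volume_gib=30,
                       name="apache-airflow-01"):
    """Build the argument list for `aws ec2 run-instances` (hypothetical helper)."""
    # Same root volume mapping as the CLI example: 30 GiB gp3 on /dev/sda1.
    block_devices = [{
        "DeviceName": "/dev/sda1",
        "Ebs": {"VolumeSize": volume_gib, "VolumeType": "gp3"},
    }]
    # Shorthand tag syntax, identical to the CLI example.
    tags = f"ResourceType=instance,Tags=[{{Key=Name,Value={name}}}]"
    return [
        "aws", "ec2", "run-instances",
        "--image-id", ami_id,
        "--instance-type", instance_type,
        "--key-name", key_name,
        "--subnet-id", subnet_id,
        "--security-group-ids", security_group_id,
        "--block-device-mappings", json.dumps(block_devices),
        "--tag-specifications", tags,
    ]
```

Pass the returned list to subprocess.run(argv, check=True) on a machine where the AWS CLI is installed and configured.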
Step 3: Connect and Retrieve Initial Credentials
Connect over SSH with the key pair you selected and the instance's public IP address, shown in the EC2 console or returned by the describe-instances command in Step 2. The SSH login user depends on the operating system of the AMI variant you launched:
| AMI variant | SSH login user |
|---|---|
| Apache Airflow on Ubuntu 24.04 | ubuntu |
The first boot service runs before the SSH daemon becomes ready, so the credentials file is always in place when you log in for the first time. Read it as root:
sudo cat /root/airflow-credentials.txt
You will see a plain text file containing the Airflow URL, the administrator username (admin), the administrator password, and the PostgreSQL database name, user, and password. Copy these values somewhere secure (a password manager or encrypted vault). Do not commit them to source control.
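If you want to read these values from a script, the sketch below parses the file contents into a dictionary. It assumes the file consists of key=value lines. This guide only confirms the key airflow.admin.pass (used in the REST API step), so treat any other key name as something to verify against your own file. The function name is illustrative.

```python
def parse_credentials(text):
    """Parse key=value lines into a dict, splitting on the first '=' only."""
    creds = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not a key=value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Keep everything after the first '=' so passwords containing '=' survive.
        creds[key.strip()] = value
    return creds
```

The file is readable by root only, so the script that calls this function needs root privileges to open /root/airflow-credentials.txt.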
From the same SSH session you can confirm the deployment is healthy. The Airflow health endpoint needs no authentication:
curl -fsS http://127.0.0.1:8080/api/v2/monitor/health
A JSON document reporting the metadatabase, scheduler and DAG processor components confirms the full stack is up.
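For a monitoring script, the same check can be sketched in Python with the standard library only. The sketch assumes each component in the health document is an object with a status field whose healthy value is the string healthy, and that the keys are metadatabase, scheduler and dag_processor. Confirm both against the output of the curl command above before relying on it. The function names are illustrative.

```python
import json
import urllib.request

# Components this single node deployment depends on (assumed key names).
REQUIRED = ("metadatabase", "scheduler", "dag_processor")


def unhealthy_components(health, required=REQUIRED):
    """Return the required components whose status is not 'healthy'."""
    bad = []
    for name in required:
        # A missing or null component counts as unhealthy.
        status = (health.get(name) or {}).get("status")
        if status != "healthy":
            bad.append(name)
    return bad


def fetch_health(url="http://127.0.0.1:8080/api/v2/monitor/health", timeout=5):
    """Fetch and decode the health document from the local API server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling unhealthy_components(fetch_health()) on the instance returns an empty list when all three components report healthy.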
Step 4: First Login to the Airflow Web Interface
Open a web browser and navigate to http://<public-ip>:8080/. Airflow presents the sign-in form to visitors who do not yet have a session.

Enter the administrator username admin and the administrator password from /root/airflow-credentials.txt. Select Sign In. On the first successful sign in Airflow records your session and shows the navigation rail down the left of every page.
Step 5: The DAGs List
After signing in, the Dags page lists every workflow registered on the instance. The shipped image registers one example DAG, example_hello_world, tagged cloudimg and example.

Each row shows the DAG's schedule, its most recent run, the next scheduled run, and a small bar of recent run history. The toggle on the right pauses or unpauses the DAG, and the play icon triggers a manual run. The filter controls at the top of the page narrow the list by run state, by paused status, or by tag.
Step 6: Trigger the Example DAG
To run the example DAG from the web interface, locate example_hello_world on the Dags page, make sure its toggle is set to unpaused, and select the play icon, then Trigger. Open the DAG by clicking its name to watch the run.

The DAG detail view shows the grid of task instances down the left, with run history and duration charts on the Overview tab. The single task greet transitions from queued to running to success within a few seconds.
You can run the same pipeline from the command line without the web interface. The airflow dags test subcommand executes the DAG in process. Run it as the airflow service user:
sudo -u airflow env AIRFLOW_HOME=/opt/airflow \
/opt/airflow/venv/bin/airflow dags test example_hello_world
The command exits with status 0 when the DAG completes. This is the same path the cloudimg validate script uses to confirm the image boots into a working DAG before it is captured.
Step 7: Author Your First DAG
DAG files live in /opt/airflow/dags. The DAG processor watches that directory and registers new files automatically. Create a DAG file owned by the airflow user:
sudo -u airflow tee /opt/airflow/dags/my_first_dag.py >/dev/null <<'PYEOF'
from __future__ import annotations

import logging
from datetime import datetime

from airflow.decorators import dag, task


@dag(
    dag_id="my_first_dag",
    start_date=datetime(2026, 1, 1),
    schedule=None,  # no schedule: the DAG runs only when triggered manually
    catchup=False,
    tags=["example"],
)
def my_first_dag():
    @task
    def hello(name: str) -> str:
        msg = f"hello {name} from my first DAG"
        logging.info(msg)
        return msg

    hello("world")


# Calling the decorated function registers the DAG with Airflow.
my_first_dag()
PYEOF
Refresh the Dags page after about thirty seconds and my_first_dag appears. If it does not, check for import errors:
sudo -u airflow env AIRFLOW_HOME=/opt/airflow \
/opt/airflow/venv/bin/airflow dags list-import-errors
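Many import errors are plain syntax or indentation mistakes, which you can catch before copying a file into /opt/airflow/dags. The sketch below uses the standard library ast module and does not need Airflow installed. It catches syntax errors only, so a missing package or a wrong import path still shows up only in airflow dags list-import-errors. The function name is illustrative.

```python
import ast


def syntax_error_in(source, filename="<dag>"):
    """Return a short error description, or None if the source parses cleanly."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None
```

For example, syntax_error_in(open("my_first_dag.py").read(), "my_first_dag.py") returns None for a file that parses and a file:line: message string otherwise.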
Step 8: Server Components
The deployed image contains the following components:
| Component | Purpose |
|---|---|
| Apache Airflow 3.x | Workflow orchestration: DAG authoring, scheduling, and monitoring |
| Python 3 virtual environment | Isolated interpreter at /opt/airflow/venv with Airflow and its providers installed |
| PostgreSQL | Metadata backend, local only, bound to 127.0.0.1 |
| LocalExecutor | Runs task instances as subprocesses of the scheduler |
| systemd units | airflow-apiserver, airflow-scheduler, airflow-dag-processor, postgresql |
The Airflow processes run under the dedicated airflow system user. The API server hosts both the web interface and the REST API on port 8080. The scheduler decides which task instances to run, and the DAG processor parses the files in /opt/airflow/dags.
Step 9: Filesystem Layout
| Path | Purpose |
|---|---|
| /opt/airflow/ | AIRFLOW_HOME: configuration, logs, DAGs and the virtual environment, on its own EBS volume |
| /opt/airflow/airflow.cfg | Main Airflow configuration, including the executor, the Fernet key and the database connection string |
| /opt/airflow/venv/ | Python virtual environment with Apache Airflow installed |
| /opt/airflow/dags/ | DAG source files: the DAG processor watches this directory |
| /opt/airflow/logs/ | Task and scheduler logs |
| /var/lib/postgresql/ | PostgreSQL data directory holding the Airflow metadata database, on its own EBS volume |
| /root/airflow-credentials.txt | Per instance administrator and PostgreSQL credentials, generated on first boot, root only |
The image lays the Airflow home and the PostgreSQL data directory on separate EBS volumes, so each tier can be resized independently of the operating system disk.
Step 10: Managing the Services
The three Airflow units and PostgreSQL are started by systemd at boot. Check their status with:
sudo systemctl status airflow-apiserver airflow-scheduler airflow-dag-processor postgresql
To restart a single unit, for example after editing airflow.cfg:
sudo systemctl restart airflow-scheduler
To view the last 50 log lines of a unit (replace --no-pager with -f to follow the log live):
sudo journalctl -u airflow-apiserver -n 50 --no-pager
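A scripted check of all four units can be sketched as follows. It relies on systemctl is-active printing one state word per unit, in the order the units were given, and the helper names are illustrative.

```python
import subprocess

UNITS = ("airflow-apiserver", "airflow-scheduler",
         "airflow-dag-processor", "postgresql")


def inactive_units(units, is_active_output):
    """Pair each unit with its reported state and return those not 'active'."""
    states = is_active_output.split()
    return [unit for unit, state in zip(units, states) if state != "active"]


def check_units(units=UNITS):
    """Run `systemctl is-active` and return the units that are not active."""
    # is-active exits non-zero when any unit is not active, so do not use check=True.
    proc = subprocess.run(["systemctl", "is-active", *units],
                          capture_output=True, text=True)
    return inactive_units(units, proc.stdout)
```

On a healthy instance check_units() returns an empty list. Any unit it names is a candidate for systemctl restart and a look at its journal.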
Step 11: Using the REST API
The Airflow REST API is served on the same port 8080 as the web interface. Authenticate by requesting a token with the administrator credentials, then pass the token as a bearer token. The following block authenticates and lists the DAGs:
PASS=$(sudo grep '^airflow.admin.pass=' /root/airflow-credentials.txt | cut -d= -f2-)
TOKEN=$(curl -s -X POST http://127.0.0.1:8080/auth/token \
-H 'Content-Type: application/json' \
-d "{\"username\":\"admin\",\"password\":\"${PASS}\"}" \
| sed -E 's/.*"access_token" *: *"([^"]+)".*/\1/')
curl -s http://127.0.0.1:8080/api/v2/dags -H "Authorization: Bearer ${TOKEN}"
The full REST API reference is published in the Apache Airflow documentation at https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html.
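The same two calls can be made from Python with the standard library. This is a minimal sketch using the /auth/token and /api/v2/dags endpoints and the access_token field from the shell example above. The helper names are illustrative. Building the request body with json.dumps means a password that contains quotes or backslashes is escaped correctly.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080"


def token_request(username, password, base_url=BASE_URL):
    """Build the POST request that exchanges credentials for a token."""
    body = json.dumps({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{base_url}/auth/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def dags_request(token, base_url=BASE_URL):
    """Build the GET request that lists DAGs, authenticated with a bearer token."""
    return urllib.request.Request(
        f"{base_url}/api/v2/dags",
        headers={"Authorization": f"Bearer {token}"},
    )


def call(request, timeout=10):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

On the instance, token = call(token_request("admin", password))["access_token"] followed by call(dags_request(token)) returns the decoded DAG listing.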
Step 12: Change the Administrator Password
For a production deployment rotate the administrator password that was generated on first boot. From the command line use the airflow users subcommand:
sudo -u airflow env AIRFLOW_HOME=/opt/airflow \
/opt/airflow/venv/bin/airflow users reset-password \
--username admin --password '<new-password>'
You can also create additional named users and assign them roles such as Viewer, User, or Op from Security in the web interface, so team members do not share the administrator account.
Step 13: Enable HTTPS with a Reverse Proxy
The Airflow API server listens on plain HTTP on port 8080. For any deployment beyond a trusted management network, place an HTTPS reverse proxy in front of it so session cookies and API tokens cannot be intercepted. nginx is a common choice:
sudo apt-get update && sudo apt-get install -y nginx certbot python3-certbot-nginx
sudo certbot --nginx -d airflow.your-domain.example \
--non-interactive --agree-tos -m you@your-domain.example --redirect
Configure the nginx server block to proxy to http://127.0.0.1:8080, and set base_url in the [api] section of /opt/airflow/airflow.cfg to your HTTPS URL so Airflow generates correct absolute links. Then restrict the instance security group so port 8080 is reachable only from the proxy.
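As a starting point for the server block, the sketch below renders a minimal nginx configuration that forwards every request to the local API server. The set of forwarded headers is a common choice rather than something this image prescribes. The function name is illustrative, and the sketch renders only the plain HTTP listener, leaving the TLS directives to certbot.

```python
def nginx_server_block(server_name, upstream="http://127.0.0.1:8080"):
    """Render a minimal reverse proxy server block (illustrative sketch)."""
    # Doubled braces are literal braces in an f-string; $host and friends are
    # nginx variables and pass through unchanged.
    return f"""server {{
    listen 80;
    server_name {server_name};

    location / {{
        proxy_pass {upstream};
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }}
}}
"""
```

Write the rendered text to a file under your nginx configuration directory, validate it with sudo nginx -t, and reload nginx before running certbot against the same server name.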
Step 14: Backups and Maintenance
Airflow has two data sources that must be backed up together: the PostgreSQL metadata database, which holds DAG run history, connection records, variables and user accounts, and the DAG files in /opt/airflow/dags.
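# A plain > redirect would run as your login user, who cannot write to /var/backups, so pipe through sudo tee.
sudo -u postgres pg_dump airflow | sudo tee /var/backups/airflow-db-$(date +%F).sql >/dev/null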
sudo tar -czf /var/backups/airflow-dags-$(date +%F).tgz -C /opt/airflow dags
Ship both artifacts to an Amazon S3 bucket or another object store. Because the Airflow home and the PostgreSQL data directory are on their own EBS volumes, you can also take EBS snapshots of those volumes for point in time recovery.
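For a scheduled backup job, the file names and upload commands can be derived in one place. The sketch below reproduces the two dated file names from the commands above and builds aws s3 cp argument lists for them. The bucket name, the key prefix and the function names are placeholders for illustration.

```python
import posixpath
from datetime import date


def backup_paths(day, backup_dir="/var/backups"):
    """Return the database dump and DAG archive paths for a given day."""
    stamp = day.isoformat()  # YYYY-MM-DD, the same form as `date +%F`
    return (
        f"{backup_dir}/airflow-db-{stamp}.sql",
        f"{backup_dir}/airflow-dags-{stamp}.tgz",
    )


def s3_copy_commands(paths, bucket, prefix="airflow"):
    """Build one `aws s3 cp` argument list per backup file."""
    return [
        ["aws", "s3", "cp", path,
         f"s3://{bucket}/{prefix}/{posixpath.basename(path)}"]
        for path in paths
    ]
```

Each returned list can be run with subprocess.run(cmd, check=True), provided the instance has an IAM role or credentials that allow writing to the bucket.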
For kernel and package updates, Ubuntu's unattended-upgrades is enabled by default, so security patches apply automatically. To update Airflow itself, install the new version into the virtual environment with pip, pinned to the matching Airflow constraints file, then run airflow db migrate and restart the services.
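The constraints file for a release is published per Airflow version and per Python minor version. The following sketch builds that URL and the matching pip arguments. The version numbers in the usage note are placeholders, the helper names are illustrative, and any providers or extras you rely on would need to be added to the same pip command.

```python
import sys

CONSTRAINTS = ("https://raw.githubusercontent.com/apache/airflow/"
               "constraints-{airflow}/constraints-{python}.txt")


def constraints_url(airflow_version, python_version=None):
    """Return the constraints URL for an Airflow and Python version pair."""
    if python_version is None:
        # Defaults to the interpreter running this script; pass the virtual
        # environment's Python version explicitly if it differs.
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return CONSTRAINTS.format(airflow=airflow_version, python=python_version)


def pip_upgrade_argv(airflow_version, python_version=None,
                     pip="/opt/airflow/venv/bin/pip"):
    """Build the pip arguments for a constrained Airflow install."""
    return [
        pip, "install", f"apache-airflow=={airflow_version}",
        "--constraint", constraints_url(airflow_version, python_version),
    ]
```

For example, pip_upgrade_argv("3.0.0", "3.12") yields a command to run as the airflow user, followed by airflow db migrate and a restart of the three Airflow units.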
Step 15: Scaling Beyond a Single Instance
For larger deployments decouple Airflow from the single instance pattern:
- Move the metadata database to Amazon RDS for PostgreSQL and update the connection string in /opt/airflow/airflow.cfg
- Switch from the LocalExecutor to the CeleryExecutor with Amazon ElastiCache for Redis as the broker, and run workers on a fleet of instances
- Or run Airflow on Amazon EKS with the KubernetesExecutor so each task runs in its own pod
- Put the API server behind an Application Load Balancer for high availability of the web interface and REST API
Each of these is documented in the official Apache Airflow documentation at https://airflow.apache.org/docs/ under the deployment and executor sections.
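When you move the metadata database to RDS, the connection string is the value that changes. The sketch below assembles a SQLAlchemy style PostgreSQL URL and percent-encodes the user and password, so characters such as @, / and : cannot break the URL. It assumes the psycopg2 driver and the sql_alchemy_conn option in the [database] section of airflow.cfg. The host name in the usage note is a placeholder and the function name is illustrative.

```python
from urllib.parse import quote


def sql_alchemy_conn(user, password, host, database, port=5432):
    """Build a PostgreSQL connection URL with percent-encoded credentials."""
    # safe="" also encodes '/', which quote() would otherwise leave as-is.
    return (
        f"postgresql+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )
```

For example, sql_alchemy_conn("airflow", password, "mydb.example.internal", "airflow") produces the value to set, either in airflow.cfg or through the AIRFLOW__DATABASE__SQL_ALCHEMY_CONN environment variable.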
Support
cloudimg provides 24/7/365 expert technical support for this image. A response is guaranteed within 24 hours, with a one hour average for critical issues. Contact support@cloudimg.co.uk.
For general Apache Airflow questions consult the community resources at https://airflow.apache.org/community/ and the documentation at https://airflow.apache.org/docs/.