Dagster on Ubuntu 24.04 on Azure User Guide

Product: Dagster on Ubuntu 24.04 LTS on Azure

Overview

Dagster is a data orchestrator built around data assets and designed to cover the whole development lifecycle. You define your tables, models and reports as assets; Dagster builds a lineage graph from them, runs and schedules them, and provides a web UI for observing every run, asset materialisation, schedule and sensor. It is a common choice for data engineers building observable, testable data platforms.

The cloudimg image installs Dagster 1.13.10 in a dedicated Python virtualenv and runs the Dagster webserver and daemon as systemd services bound to loopback. An nginx reverse proxy on port 80 fronts the webserver with HTTP Basic auth. Run and event storage persists on a dedicated Azure data disk, and a unique login password is generated on the first boot of every VM. The image is backed by 24/7 cloudimg support.

What is included:

  • Dagster 1.13.10 (webserver + daemon) in a Python virtualenv at /opt/dagster/venv
  • A sample asset project at /opt/dagster/project
  • The Dagster UI on :80 via nginx, with HTTP Basic auth (Dagster has no built-in auth)
  • A per-VM admin password in a root-only file
  • A dedicated Azure data disk at /var/lib/dagster holding the SQLite run/event/schedule storage - separate from the OS disk and re-provisioned with every VM
  • dagster-webserver.service, dagster-daemon.service and nginx.service as systemd units
  • 24/7 cloudimg support

Prerequisites

An active Azure subscription, an SSH key pair, and a VNet + subnet in the target region. Standard_B2ms (2 vCPU / 8 GiB RAM) is a good starting point; scale up for heavier pipelines. NSG inbound: allow 22/tcp from your management network and 80/tcp for the UI (front with TLS for public exposure - see Enabling HTTPS).

Step 1 - Deploy from the Azure Marketplace

Sign in to the Azure Portal, choose Create a resource, search the Marketplace for Dagster by cloudimg, and select Create. On Basics pick your subscription, resource group, region and size; under Administrator account choose SSH public key and paste your key; under Inbound port rules allow SSH (22) and HTTP (80). Review the dedicated data disk on the Disks tab, then Review + create -> Create.

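Step 2 - Deploy from the Azure CLI

As an alternative to the portal flow in Step 1, create the VM with the Azure CLI, replacing each placeholder in angle brackets with your own value: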

az vm create \
  --resource-group <your-rg> \
  --name dagster \
  --image <marketplace-image-urn> \
  --size Standard_B2ms \
  --admin-username azureuser \
  --ssh-key-values ~/.ssh/id_ed25519.pub \
  --vnet-name <your-vnet> --subnet <your-subnet> \
  --public-ip-sku Standard

az vm open-port --resource-group <your-rg> --name dagster --port 80 --priority 1010
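
Step 3 needs the VM's public IP address. One way to look it up from the same shell is sketched below; the function name vm_public_ip is illustrative, and the resource group and VM name are passed as arguments rather than hard-coded.

```shell
# Print the public IP address(es) of a VM.
# Usage: vm_public_ip <resource-group> <vm-name>
vm_public_ip() {
  # --show-details adds networking fields such as publicIps to the output;
  # --query selects that one field and --output tsv prints it unquoted.
  az vm show \
    --resource-group "$1" \
    --name "$2" \
    --show-details \
    --query publicIps \
    --output tsv
}
```

For the VM created above, vm_public_ip <your-rg> dagster prints the address to pass to ssh.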

Step 3 - Connect to your VM

ssh azureuser@<vm-public-ip>

Step 4 - Confirm the services are running

systemctl is-active dagster-webserver.service dagster-daemon.service nginx.service

All three should report active. The webserver serves the UI and the daemon runs schedules and sensors.
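
On a freshly booted VM the services may still be starting. A small polling loop, sketched here with a hypothetical helper name and an arbitrary retry budget, waits until every unit is active:

```shell
# Wait until every listed systemd unit is active, or give up.
# Usage: wait_for_units <max-attempts> <unit>...
wait_for_units() (
  attempts=$1
  shift
  while [ "$attempts" -gt 0 ]; do
    pending=0
    for unit in "$@"; do
      # is-active --quiet prints nothing and exits 0 only for an active unit
      systemctl is-active --quiet "$unit" || pending=1
    done
    if [ "$pending" -eq 0 ]; then
      return 0
    fi
    attempts=$((attempts - 1))
    # Pause between rounds, but not after the final failed round
    if [ "$attempts" -gt 0 ]; then
      sleep 2
    fi
  done
  return 1
)
```

For example, wait_for_units 30 dagster-webserver.service dagster-daemon.service nginx.service allows roughly a minute before reporting failure through its exit status.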

Step 5 - Retrieve your password

The admin password is generated uniquely on the first boot of your VM and written to a root-only file:

sudo cat /root/dagster-credentials.txt

This file contains DAGSTER_ADMIN_USER (admin), DAGSTER_ADMIN_PASSWORD and the URL. Store the password somewhere safe.
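
For scripting, the password can be read into a shell variable instead of being copied by hand. The sketch below assumes the file stores one KEY=VALUE pair per line, which this guide does not confirm; inspect the file first and adjust the parsing if its layout differs. The helper name credential_value is illustrative.

```shell
# Print the value stored under a key in KEY=VALUE text read from stdin.
# Assumption: one KEY=VALUE pair per line; the real file layout may differ.
# Usage: credential_value <KEY> < file
credential_value() {
  # Match the key at the start of the line and print everything after the
  # first "=", so values that themselves contain "=" stay intact.
  awk -v key="$1" '
    index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }
  '
}
```

A typical call would be DAGSTER_ADMIN_PASSWORD=$(sudo cat /root/dagster-credentials.txt | credential_value DAGSTER_ADMIN_PASSWORD), which keeps the password out of the terminal scrollback.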

Step 6 - Check the health endpoint

nginx serves an unauthenticated health endpoint for load balancers and probes:

curl -s http://localhost/health

It returns ok.
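
A probe script usually needs an exit status rather than printed text. The following sketch wraps the same request in a hypothetical check_health function that succeeds only when the response body is exactly ok:

```shell
# Succeed only if the health endpoint answers with the body "ok".
# Usage: check_health [url]
check_health() (
  url=${1:-http://localhost/health}
  # -f: fail on HTTP errors; -sS: quiet, but still print real errors;
  # --max-time: do not hang if the proxy is unresponsive
  body=$(curl -fsS --max-time 5 "$url") || return 1
  [ "$body" = "ok" ]
)
```

Because the result is carried in the exit status, it composes directly with shell conditionals, for example check_health || echo "Dagster proxy is unhealthy".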

Step 7 - Open the Dagster UI

Browse to http://<vm-public-ip>/ and sign in as admin with the password from Step 5. The Lineage view shows the global asset graph from the bundled sample project; Catalog, Runs and Automation let you drill into assets, runs, schedules and sensors.

[Screenshot: Dagster global asset lineage]

The asset catalog lists every asset and its latest materialisation:

[Screenshot: Dagster asset catalog]

The Runs page shows every run with its status, duration and logs:

[Screenshot: Dagster runs]

Step 8 - Materialise the sample assets

In the Lineage view, click Materialize all to run the sample pipeline. Dagster executes raw_numbers -> doubled_numbers -> total and records the run and its asset materialisations, which you can inspect on the Runs page. The bundled demo_pipeline schedule also runs the pipeline every 15 minutes once you turn the schedule on under Automation.

Step 9 - Confirm the server from the command line

The Dagster webserver reports its version at /server_info behind the same Basic auth:

curl -s -u 'admin:<DAGSTER_ADMIN_PASSWORD>' http://localhost/server_info; echo

It returns JSON including "dagster_version": "1.13.10".
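
To use the version in a script, for example to confirm an upgrade, the field can be pulled out of the JSON. This sketch uses sed so that it needs no extra packages; the helper name is illustrative, and a JSON-aware tool such as jq is more robust if it is installed.

```shell
# Print the value of the "dagster_version" field from JSON read on stdin.
# Usage: curl ... /server_info | dagster_version
dagster_version() {
  # Capture the quoted string that follows the "dagster_version" key
  sed -n 's/.*"dagster_version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Piping the curl command above into dagster_version should print just the version string.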

Step 10 - Confirm state lives on the dedicated disk

The Dagster instance (DAGSTER_HOME) and its SQLite storage are on the dedicated Azure data disk so they survive OS changes and can be resized independently:

findmnt /var/lib/dagster

The mount is backed by a separate Azure data disk captured into the image and re-provisioned on every VM.
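
A quick way to confirm that the directory is not simply a folder on the OS disk is to compare the device behind it with the device behind /. A minimal sketch, with an illustrative function name:

```shell
# Succeed only if a path is a mountpoint backed by a different device than "/".
# Usage: on_separate_disk <mountpoint>
on_separate_disk() (
  # -n drops the header line; -o SOURCE prints only the backing device
  data_dev=$(findmnt -n -o SOURCE "$1") || return 1
  root_dev=$(findmnt -n -o SOURCE /) || return 1
  [ -n "$data_dev" ] && [ "$data_dev" != "$root_dev" ]
)
```

Calling on_separate_disk /var/lib/dagster is expected to succeed on this image. The check compares mount sources only, so lsblk remains the tool to use if you want to see which physical disk each partition belongs to.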

Bring your own pipelines

Replace the sample definitions in /opt/dagster/project/definitions.py with your own assets and jobs (or point workspace.yaml at your own Python module), then restart the services:

sudo systemctl restart dagster-webserver dagster-daemon

Install any extra Python packages your code needs into the bundled virtualenv with sudo /opt/dagster/venv/bin/pip install <packages>.
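
As an illustration of what a replacement might contain, the sketch below writes a small definitions module with two assets, a job and a schedule. It is not the bundled sample: the asset, job and function names are hypothetical, and the file is written to whatever path you pass so that you can review it before copying it over /opt/dagster/project/definitions.py.

```shell
# Write an example Dagster definitions module to the given path.
# Usage: write_example_definitions <path>
write_example_definitions() {
  cat > "$1" <<'EOF'
from dagster import Definitions, ScheduleDefinition, asset, define_asset_job


@asset
def order_amounts():
    """Hypothetical source asset: a hard-coded list standing in for real data."""
    return [120, 80, 45]


@asset
def order_total(order_amounts):
    """Depends on order_amounts because the parameter name matches that asset."""
    return sum(order_amounts)


# With no selection given, the job targets every asset in these Definitions.
orders_job = define_asset_job("orders_job")
# Standard cron syntax: run at the top of every hour.
orders_schedule = ScheduleDefinition(job=orders_job, cron_schedule="0 * * * *")

defs = Definitions(
    assets=[order_amounts, order_total],
    jobs=[orders_job],
    schedules=[orders_schedule],
)
EOF
}
```

For example, write_example_definitions /tmp/definitions.py produces a file to inspect first. A new schedule stays off until you turn it on under Automation in the UI.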

Enabling HTTPS

The nginx reverse proxy serves plain HTTP on port 80. For public exposure, terminate TLS in front of it: add a DNS name for the VM, then either use the companion cloudimg nginx-ssl-certbot image as a TLS reverse proxy, or install certbot and extend the existing nginx site with a server block containing listen 443 ssl;. If you terminate TLS on this VM, also allow 443/tcp inbound in the NSG. Keep the Dagster webserver bound to loopback so the only public surface is the authenticated, TLS-terminated proxy.
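
The certbot route might look like the sketch below. It is an outline under several assumptions rather than a tested procedure for this image: the DNS name already resolves to the VM, the existing nginx site declares a matching server_name, and you accept the certificate authority's terms of service (which --agree-tos does on your behalf). The function name is illustrative.

```shell
# Install certbot with its nginx plugin and request a certificate.
# Usage: enable_https <dns-name> <contact-email>
enable_https() {
  sudo apt-get update &&
  sudo apt-get install -y certbot python3-certbot-nginx &&
  # --nginx lets certbot edit the matching nginx server block itself
  sudo certbot --nginx --non-interactive --agree-tos -m "$2" -d "$1"
}
```

Review the nginx configuration afterwards to confirm that HTTP Basic auth still applies to the new port 443 server block.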

Maintenance

  • Backups: snapshot the /var/lib/dagster data disk to back up run and event history.
  • Database: the image uses SQLite, ideal for a single server; for high throughput, configure PostgreSQL storage in /var/lib/dagster/dagster.yaml.
  • Upgrades: run sudo /opt/dagster/venv/bin/pip install -U dagster dagster-webserver, then sudo systemctl restart dagster-webserver dagster-daemon.
  • Security patches: unattended-upgrades remains enabled so the OS continues to receive security updates automatically.
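
The backup item above can be scripted with the Azure CLI from any machine that is signed in to your subscription. The sketch below assumes the Dagster data disk is the first entry in the VM's data disk list, which holds for a VM with a single data disk; the function name is illustrative.

```shell
# Snapshot the first data disk attached to a VM.
# Usage: snapshot_data_disk <resource-group> <vm-name> <snapshot-name>
snapshot_data_disk() (
  # Resolve the managed disk ID of the first data disk
  disk_id=$(az vm show \
    --resource-group "$1" \
    --name "$2" \
    --query 'storageProfile.dataDisks[0].managedDisk.id' \
    --output tsv) || return 1
  # Stop if the VM has no data disk rather than snapshotting nothing
  [ -n "$disk_id" ] || return 1
  az snapshot create \
    --resource-group "$1" \
    --name "$3" \
    --source "$disk_id"
)
```

For a consistent copy of the SQLite files, consider stopping dagster-webserver and dagster-daemon while the snapshot is taken.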

Support

cloudimg provides 24/7 expert support for this image. Contact support@cloudimg.co.uk.