Dagster on AWS User Guide

Product: Dagster on AWS

Overview

This image runs Dagster, an open source data orchestration platform for developing, running and observing data assets. You define assets and jobs in plain Python, and Dagster gives you a web interface to materialize them, schedule them, watch them run and inspect their lineage, with a full event history for every run.

Dagster 1.13.10 is installed into a dedicated Python virtual environment at /opt/dagster/venv and runs as an unprivileged dagster system account. Two systemd services run the platform: dagster-webserver.service serves the UI and the GraphQL API on the loopback interface (127.0.0.1:3000), and dagster-daemon.service drives schedules, sensors and the run queue. The Dagster instance directory, DAGSTER_HOME, is /var/lib/dagster, on a dedicated EBS data volume that can be resized independently; it holds the instance configuration and the run, event log and schedule storage. A small demo project ships at /opt/dagster/demo, so the UI is non-empty out of the box and you have a working example to learn from.

Dagster's webserver has no built-in authentication, so it binds to the loopback interface only and is never exposed directly. An nginx reverse proxy publishes the Dagster UI on port 80 behind HTTP Basic authentication and forwards the GraphQL WebSocket upgrade that the UI needs to render and stream live run updates. The admin password is generated on the first boot of each deployed instance, so two instances launched from the same Amazon Machine Image never share a password. It is written to /root/dagster-credentials.txt with mode 0600, so only the root user can read it.

Prerequisites

Before you deploy this image you need:

  • An Amazon Web Services account where you can launch EC2 instances
  • IAM permissions to launch instances, create security groups, and subscribe to AWS Marketplace products
  • An EC2 key pair in the target Region for SSH access to the instance
  • A VPC and subnet in the target Region, with a security group allowing inbound port 22 from your management network and port 80 for the Dagster UI
  • The AWS CLI (version 2) installed locally if you plan to deploy from the command line

Step 1: Launch the Instance from the AWS Marketplace

Sign in to the AWS Management Console, open the EC2 service, and select Launch instance. Under Application and OS Images choose AWS Marketplace AMIs and search for Dagster. Select the cloudimg listing and choose Select, then Continue on the subscription summary.

Pick an instance type of t3.large or larger. Choose your EC2 key pair under Key pair (login). Under Network settings select your VPC and subnet, and either create or select a security group that opens port 22 from your management network and port 80 for the Dagster UI. Leave the root volume at the default size or larger.

Select Launch instance. First-boot initialisation completes a few seconds after the instance state becomes Running and the status checks pass; this is when the per-instance admin password is generated.

Step 2: Launch the Instance from the AWS CLI

As an alternative to the console steps in Step 1, the following command launches an instance from the cloudimg Dagster Marketplace AMI into an existing subnet and security group. Replace <ami-id> with the AMI ID shown on the Marketplace listing, <key-name> with your EC2 key pair name, <subnet-id> with your subnet ID, and <security-group-id> with a security group that opens ports 22 and 80 as described above.

aws ec2 run-instances \
  --image-id <ami-id> \
  --instance-type t3.large \
  --key-name <key-name> \
  --subnet-id <subnet-id> \
  --security-group-ids <security-group-id> \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=dagster}]'

When the instance reaches the Running state and its status checks pass, note its public IP address or DNS name from the EC2 console or with aws ec2 describe-instances.
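If you launch from scripts, you can read the address from the JSON that aws ec2 describe-instances prints instead of copying it from the console. The sketch below is a minimal example, assuming the AWS CLI v2 is on your PATH with credentials configured; it walks the standard Reservations, Instances, PublicIpAddress and PublicDnsName fields of that output, and the function names are illustrative.

```python
import json
import subprocess


def describe_instance(instance_id: str) -> dict:
    """Run `aws ec2 describe-instances` for one instance and return the parsed JSON."""
    result = subprocess.run(
        ["aws", "ec2", "describe-instances",
         "--instance-ids", instance_id, "--output", "json"],
        check=True,           # raise CalledProcessError if the CLI call fails
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def public_addresses(describe_output: dict) -> list:
    """Return (instance_id, public_ip, public_dns) tuples from describe-instances output.

    PublicIpAddress is absent until the instance has a public IPv4 address,
    so missing values come back as empty strings.
    """
    rows = []
    for reservation in describe_output.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            rows.append((
                instance.get("InstanceId", ""),
                instance.get("PublicIpAddress", ""),
                instance.get("PublicDnsName", ""),
            ))
    return rows
```

For example, public_addresses(describe_instance("i-0123456789abcdef0")) returns one tuple per matching instance; the instance ID is a placeholder. Keeping the parsing separate from the CLI call lets you test it against saved output.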

Step 3: Connect to Your Instance

Connect over SSH using your key pair and the login user for your operating system variant.

OS variant      SSH login user
Ubuntu 24.04    ubuntu

ssh -i <key-name>.pem ubuntu@<public-ip>

Step 4: Retrieve the Admin Password

The admin password is unique to your instance and was generated on first boot. Read the credentials file as root:

sudo cat /root/dagster-credentials.txt

The file lists the Dagster UI URL, the admin user (admin) and the generated password, along with hints for pointing the workspace at your own code location and installing extra Python packages. Keep this password somewhere safe.

Step 5: Sign In to the Dagster UI

The Dagster UI is served on port 80 by nginx, which proxies to the webserver on 127.0.0.1:3000 with the GraphQL WebSocket upgrade the UI needs. In a browser, go to:

http://<instance-public-ip>/

The nginx proxy prompts you for credentials. Sign in as admin with the password from the credentials file, and the Dagster UI loads. Open the Lineage view to see the global asset graph of the demo project, with the three software-defined assets raw_records, filtered_records and summary wired together.

Screenshot: The Dagster UI global asset lineage graph of the demo project

Materialize the assets with Materialize all in the top right of the asset graph, or run the bundled demo_job from the Jobs view. Dagster queues a run, the daemon launches it, and the run executes each asset in dependency order. Open the run to watch it stream live, with a Gantt timeline of the steps and a structured event log delivered over the GraphQL WebSocket.

Screenshot: The Dagster UI run view streaming the demo run with per-step logs

The Runs view lists every run with its target assets, who launched it, its status and its duration. From here you can re-execute a run, filter by status, or drill into any run for its full event history.

Screenshot: The Dagster UI runs list showing a materialization of the demo assets
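The Runs view is driven by the GraphQL API that the proxy exposes at /graphql, so you can also list runs from a script. The following is a minimal sketch using only the Python standard library, assuming the runsOrError query, the Runs result type and the runId and status fields of Dagster's GraphQL schema as we understand it; confirm them in the GraphQL playground at /graphql before relying on them. The script authenticates with the same HTTP Basic credentials as the browser.

```python
import base64
import json
import urllib.request

# Assumed shape of Dagster's GraphQL schema: runsOrError returns a union whose
# success member is Runs, holding a list of Run objects.
RUNS_QUERY = """
query RecentRuns($limit: Int) {
  runsOrError(limit: $limit) {
    __typename
    ... on Runs {
      results { runId status }
    }
  }
}
"""


def build_payload(limit: int) -> bytes:
    """Encode the GraphQL query and its variables as a JSON request body."""
    return json.dumps({"query": RUNS_QUERY, "variables": {"limit": limit}}).encode("utf-8")


def parse_runs(body: dict) -> list:
    """Return (run_id, status) pairs, raising if the server reported an error."""
    if body.get("errors"):
        raise RuntimeError(f"GraphQL errors: {body['errors']}")
    result = body["data"]["runsOrError"]
    if result["__typename"] != "Runs":
        raise RuntimeError(f"runsOrError returned {result['__typename']}")
    return [(run["runId"], run["status"]) for run in result["results"]]


def recent_runs(base_url: str, user: str, password: str, limit: int = 10) -> list:
    """POST the query through the nginx proxy using HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/graphql",
        data=build_payload(limit),  # supplying data makes this a POST
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_runs(json.load(response))
```

For example, recent_runs("http://<public-ip>", "admin", password) returns (run ID, status) pairs for the most recent runs. A wrong password surfaces as urllib.error.HTTPError with code 401 from nginx.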

Step 6: Confirm Dagster Is Running

Over SSH, confirm the webserver, the daemon and the nginx proxy are active:

sudo systemctl is-active dagster-webserver dagster-daemon nginx

Then confirm the webserver answers on the loopback interface. The /server_info endpoint returns a small JSON document containing the Dagster version and is left unauthenticated at the proxy so that external health checks work:

curl -s http://127.0.0.1:3000/server_info

systemctl should print active once per service, three lines in total, and curl should print that JSON document. The webserver listens on 127.0.0.1:3000 (loopback only), and nginx publishes the UI on port 80 behind the login.

Confirm the installed Dagster version with:

/opt/dagster/venv/bin/dagster --version
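In scripts, for example right after restarting the services, it can help to poll /server_info until the webserver answers instead of checking once. A minimal sketch with the Python standard library, meant to run on the instance itself; the attempt count and delay are arbitrary defaults, and it returns the decoded JSON without assuming particular key names.

```python
import json
import time
import urllib.request


def parse_server_info(body: bytes) -> dict:
    """Decode a /server_info response body and check that it is a JSON object."""
    info = json.loads(body)
    if not isinstance(info, dict):
        raise ValueError(f"unexpected /server_info payload: {info!r}")
    return info


def wait_for_webserver(url: str = "http://127.0.0.1:3000/server_info",
                       attempts: int = 30, delay: float = 2.0) -> dict:
    """Poll url until it returns valid JSON, or raise TimeoutError."""
    last_error = None
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                return parse_server_info(response.read())
        except OSError as exc:  # URLError, refused connections and timeouts
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)
    raise TimeoutError(f"no answer from {url} after {attempts} attempts: {last_error}")
```

Calling wait_for_webserver() with the defaults waits up to about a minute for the loopback webserver and then returns its version document.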

Step 7: Replace the Demo Project with Your Own

A demo project ships at /opt/dagster/demo: a definitions.py with three assets and a job, and a workspace.yaml that the webserver loads on start. To run your own code, edit that workspace file so it loads your own Python file or module instead of the demo definitions.py, then restart both services:

sudo systemctl restart dagster-webserver.service dagster-daemon.service

After the restart, reload the UI in your browser; the Deployment view shows your code location loaded. Keep your project files readable by the dagster user so the services can load them. The webserver runs dagster-webserver -h 127.0.0.1 -p 3000 -w /opt/dagster/demo/workspace.yaml, so anything the workspace points at is served on port 80 through the same authenticating proxy.
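To check that your project files are readable by the dagster user, you can walk the project directory under that account and list anything it cannot read. The sketch below is a minimal example, assuming you run it as the dagster user, for example in a Python session started with sudo -u dagster /opt/dagster/venv/bin/python; os.access checks the real user ID, so it reports what that account can reach. The path /opt/myproject in the usage note is hypothetical.

```python
import os


def unreadable_paths(root: str) -> list:
    """Return paths under root that the current user cannot read.

    Files need read permission; directories need read and execute (search)
    permission so the services can list and enter them.
    """
    problems = []

    def record(error: OSError) -> None:
        # os.walk calls this when it cannot list a directory, then skips it.
        problems.append(error.filename)

    for dirpath, _dirnames, filenames in os.walk(root, onerror=record):
        if not os.access(dirpath, os.R_OK | os.X_OK):
            problems.append(dirpath)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK):
                problems.append(path)
    return problems
```

An empty list from unreadable_paths("/opt/myproject") means the services should be able to load every file in the project.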

Step 8: Install Extra Python Packages

Dagster and its dependencies are installed into a dedicated virtual environment at /opt/dagster/venv. To use additional Python libraries or Dagster integration packages in your project, install them into that same virtual environment so the services pick them up. First confirm you are calling the environment's own pip:

/opt/dagster/venv/bin/pip --version

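Then install whatever your project needs and restart the services, for example:

sudo /opt/dagster/venv/bin/pip install dagster-aws dagster-dbt pandas
sudo systemctl restart dagster-webserver.service dagster-daemon.service

Packages installed this way persist across reboots and service restarts. They live under /opt/dagster, not on the /var/lib/dagster data volume, so snapshots of the data volume do not include them; keep a list of them with your project (for example a requirements file) so you can reinstall them on a fresh instance.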
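To confirm that the services will see a package, query the virtual environment's own package metadata. A minimal sketch, assuming you run it with /opt/dagster/venv/bin/python so that it inspects that environment rather than the system Python; the distribution names in the usage note are the ones from the install example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional


def installed_versions(distributions: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found
```

For example, installed_versions(["dagster", "dagster-aws", "dagster-dbt", "pandas"]) returns a version string per package; a None value means that package is missing from the virtual environment.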

Step 9: The Data Volume

The Dagster instance directory, DAGSTER_HOME, lives on a dedicated EBS volume mounted at /var/lib/dagster. It holds the dagster.yaml instance configuration and the SQLite-backed run, event log and schedule storage, so a single-node deployment needs no external database. Keeping it on its own volume lets you resize or snapshot the run history independently of the operating system disk. Confirm the mount with:

df -h /var/lib/dagster

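To grow the instance directory, increase the size of the EBS volume in the AWS console (Modify volume), then grow the filesystem on the instance with sudo resize2fs on the underlying device; findmnt -n -o SOURCE /var/lib/dagster prints that device. For a larger or multi-node deployment, swap the default SQLite storage for PostgreSQL: install the dagster-postgres package into the virtual environment, point the storage section of dagster.yaml at your database, and restart the services.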
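To know when the volume needs growing, you can check how full it is from a cron job or monitoring script. A minimal sketch with the Python standard library; the 80 percent threshold is an arbitrary example, not a Dagster recommendation.

```python
import shutil
from typing import Tuple


def volume_usage(path: str = "/var/lib/dagster", warn_at: float = 0.8) -> Tuple[float, bool]:
    """Return (fraction of space used, whether it has reached warn_at)
    for the filesystem that holds path."""
    usage = shutil.disk_usage(path)  # named tuple of total, used, free in bytes
    fraction = usage.used / usage.total if usage.total else 0.0
    return fraction, fraction >= warn_at
```

When the second value is True, expand the EBS volume and run resize2fs as described above before the run and event log storage fills the disk.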

Step 10: Enable HTTPS

The Dagster UI is served over plain HTTP on port 80 by nginx. For production use, put it behind TLS. With an Application Load Balancer in front of the instance, terminate TLS on the load balancer with a managed certificate and forward traffic to port 80 on the instance, where nginx continues to enforce HTTP Basic authentication. With Certbot installed on the instance, obtain a certificate for your domain, then configure nginx to listen on 443 with that certificate and proxy to 127.0.0.1:3000 exactly as the bundled site does for port 80, keeping the GraphQL WebSocket upgrade headers and the HTTP Basic authentication in place so the UI continues to render and stream. In either case, restrict the security group so ports 80 and 443 are reachable only from the networks that need the UI.
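Once HTTPS is in place, you can confirm from any client that the certificate validates for your domain and see how long it has left, which matters for Certbot's short-lived certificates. A minimal sketch with the Python standard library; your-domain.example in the usage note is a placeholder, and the handshake trusts the system's CA bundle.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Convert a certificate notAfter string (e.g. 'Jan  2 00:00:00 2030 GMT')
    into days remaining relative to now (defaults to the current time)."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def certificate_not_after(host: str, port: int = 443) -> str:
    """Complete a verified TLS handshake with host and return the certificate's notAfter.

    Raises ssl.SSLCertVerificationError if the chain or hostname does not validate.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, days_until_expiry(certificate_not_after("your-domain.example")) reports the days left on the certificate that nginx or the load balancer presents.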

Step 11: Backup and Maintenance

Back up your deployment by snapshotting the /var/lib/dagster EBS volume, which captures the instance configuration and the entire run, event and schedule history. Keep your Dagster project under version control so you can redeploy it to a fresh instance to recover the code. Apply operating system security updates with:

sudo apt-get update && sudo apt-get upgrade

Reboot when a new kernel is installed; the webserver, the daemon and nginx start automatically on boot.
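The snapshot can be scripted with the AWS CLI create-snapshot command, for example from a scheduled job on an administration host. A minimal sketch, assuming the AWS CLI v2 with credentials allowed to create snapshots and tag them; look up the data volume's ID on the instance's Storage tab in the EC2 console. The Name tag value dagster-home is hypothetical.

```python
import json
import subprocess
from typing import List


def snapshot_command(volume_id: str, description: str, name_tag: str = "dagster-home") -> List[str]:
    """Build the AWS CLI argument list that snapshots one EBS volume and tags the snapshot."""
    return [
        "aws", "ec2", "create-snapshot",
        "--volume-id", volume_id,
        "--description", description,
        "--tag-specifications",
        f"ResourceType=snapshot,Tags=[{{Key=Name,Value={name_tag}}}]",
        "--output", "json",
    ]


def take_snapshot(volume_id: str, description: str) -> str:
    """Start the snapshot and return its ID; EBS completes it in the background."""
    result = subprocess.run(
        snapshot_command(volume_id, description),
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)["SnapshotId"]
```

Passing the arguments as a list avoids shell quoting of the tag specification. The returned snapshot ID can be checked later with aws ec2 describe-snapshots.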

Support

This image is published and supported by cloudimg. Support covers deployment, defining assets and jobs, code locations and workspaces, scheduling and sensors, swapping the SQLite instance storage for PostgreSQL, the authenticating proxy, TLS and performance tuning. Contact cloudimg through the support channel listed on the AWS Marketplace listing.

All product and company names are trademarks or registered trademarks of their respective holders. Use of them does not imply any affiliation with or endorsement by them.