Apache DolphinScheduler on Ubuntu 24.04 on Azure User Guide

Product: Apache DolphinScheduler on Ubuntu 24.04 LTS on Azure

Overview

Apache DolphinScheduler is a distributed, easy-to-extend visual workflow scheduler for big-data and ETL pipelines. You design DAGs of tasks (shell, SQL, Spark, Flink, Python, HTTP and more) in a drag-and-drop editor, schedule them with cron expressions, and monitor every run with retries, alerts and lineage. The cloudimg image installs DolphinScheduler 3.4.2 at /opt/dolphinscheduler as the all-in-one standalone server: master, worker, api-server, alert-server and the web UI run in a single JVM on Java 17. The server runs as a dedicated dolphinscheduler system user and is bound to loopback behind an nginx reverse proxy on TCP 80. All state persists on a dedicated Azure data disk, and the admin password is rotated to a unique value on the first boot of every VM. The image is backed by 24/7 cloudimg support.

What is included:

  • Apache DolphinScheduler 3.4.2 (standalone server) at /opt/dolphinscheduler on Java 17
  • The web UI and the REST API on context path /dolphinscheduler, fronted by nginx on :80
  • DolphinScheduler's own username/password authentication with a per-VM admin password in a root-only file
  • A dedicated Azure data disk at /var/lib/dolphinscheduler holding the bundled database, the workflow repository and run history — separate from the OS disk and re-provisioned with every VM
  • dolphinscheduler.service + nginx.service as systemd units, enabled and active
  • An unauthenticated /health endpoint for load balancers and probes
  • 24/7 cloudimg support

Prerequisites

An active Azure subscription, an SSH key pair, and a VNet + subnet in the target region. Standard_B4ms (4 vCPU / 16 GiB RAM) is a good starting point for the standalone server; scale up for higher task concurrency. NSG inbound: allow 22/tcp from your management network and 80/tcp for the web UI and API (front with TLS for public exposure — see Enabling HTTPS).

Step 1 — Deploy from the Azure Marketplace

Sign in to the Azure Portal, choose Create a resource, search the Marketplace for Apache DolphinScheduler by cloudimg, and select Create. On Basics pick your subscription, resource group, region and size; under Administrator account choose SSH public key and paste your key; under Inbound port rules allow SSH (22) and HTTP (80). Review the dedicated data disk on the Disks tab, then select Review + create followed by Create.

Step 2 — Deploy from the Azure CLI

az vm create \
  --resource-group <your-rg> \
  --name dolphinscheduler \
  --image <marketplace-image-urn> \
  --size Standard_B4ms \
  --admin-username azureuser \
  --ssh-key-values ~/.ssh/id_ed25519.pub \
  --vnet-name <your-vnet> --subnet <your-subnet> \
  --public-ip-sku Standard

az vm open-port --resource-group <your-rg> --name dolphinscheduler --port 80 --priority 1010

Step 3 — Connect to your VM

ssh azureuser@<vm-public-ip>

Step 4 — Confirm the services are running

systemctl is-active dolphinscheduler.service nginx.service

Both services should report active. The standalone server is a JVM application and takes one to two minutes to finish starting on the first boot. Because both units are enabled, they start again automatically after every reboot.

Step 5 — Check the health endpoint

nginx serves an unauthenticated health endpoint for load balancers and probes, and the DolphinScheduler API exposes a Spring Boot actuator health endpoint:

curl -s http://localhost/health; echo
curl -s -o /dev/null -w 'API health: %{http_code}\n' http://localhost/dolphinscheduler/actuator/health

The first returns ok; the second returns API health: 200 once the standalone server has finished starting.
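
Because the standalone server can take a minute or two to start, a script that runs right after boot may want to wait for the API before calling it. The following is a minimal sketch and not part of the image: retry and api_healthy are hypothetical helper names, and the attempt count and delay are arbitrary choices.

```shell
# retry: run a command until it succeeds or the attempts run out.
# usage: retry <attempts> <delay-seconds> <command...>
retry() {
  _attempts=$1
  _delay=$2
  shift 2
  _i=1
  while [ "$_i" -le "$_attempts" ]; do
    "$@" && return 0
    # Sleep between attempts, but not after the last one.
    [ "$_i" -lt "$_attempts" ] && sleep "$_delay"
    _i=$((_i + 1))
  done
  return 1
}

# api_healthy: succeeds only when the actuator endpoint answers HTTP 200.
# The optional first argument overrides the URL from this guide.
api_healthy() {
  _code=$(curl -s -o /dev/null -w '%{http_code}' \
    "${1:-http://localhost/dolphinscheduler/actuator/health}")
  [ "$_code" = "200" ]
}
```

For example, this polls every 5 seconds for up to 30 attempts:

retry 30 5 api_healthy && echo 'API is up'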

Step 6 — Retrieve your admin password

The admin password is rotated to a unique value on the first boot of your VM and written to a root-only file:

sudo cat /root/apache-dolphinscheduler-credentials.txt

This file contains DOLPHINSCHEDULER_ADMIN_USER (admin) and DOLPHINSCHEDULER_ADMIN_PASSWORD, plus the URL for the web UI. The shipped default password (dolphinscheduler123) is reset on first boot, so it no longer works. Store the per-VM password somewhere safe.
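
To use the password in scripts without copying it by hand, it can be read into a shell variable. This is a sketch under an assumption: it expects the credentials file to use unquoted KEY=value lines, which this guide does not confirm, so check the file layout first. cred_value is a hypothetical helper name.

```shell
# cred_value: read KEY=value lines on stdin and print the value of one key.
# Assumes unquoted KEY=value lines; only the first match is printed.
# usage: <producer> | cred_value <KEY>
cred_value() {
  sed -n "s/^$1=//p" | head -n 1
}
```

With that layout, the admin password can be loaded like this:

PW=$(sudo cat /root/apache-dolphinscheduler-credentials.txt | cred_value DOLPHINSCHEDULER_ADMIN_PASSWORD)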

Step 7 — Open the web UI

Browse to http://<vm-public-ip>/ (which redirects to the DolphinScheduler UI) and sign in as admin with the password from Step 6. The UI opens on the project overview; the screenshots below show the sign-in page, the project list, a workflow DAG definition and the monitoring view.

Screenshot: Apache DolphinScheduler sign-in page

The Project list shows every workflow project with its workflow and task counts:

Screenshot: Apache DolphinScheduler project list

Inside a project, the Workflow Definition canvas is a drag-and-drop DAG editor for chaining tasks:

Screenshot: Apache DolphinScheduler workflow definition DAG editor

The Monitor view reports the health of the master, worker, API and alert components and the database:

Screenshot: Apache DolphinScheduler monitoring view

Step 8 — Sign in over the REST API

DolphinScheduler exposes a full REST API behind the same login. Authenticate with your per-VM password to obtain a session token:

PW='<DOLPHINSCHEDULER_ADMIN_PASSWORD>'   # replace with the value from Step 6, keeping the single quotes
curl -s -c /tmp/ds-cookies http://localhost/dolphinscheduler/login \
  --data-urlencode 'userName=admin' --data-urlencode "userPassword=$PW"; echo

A successful response is JSON with "code":0 and a session token. The session cookie is saved to /tmp/ds-cookies for subsequent calls. A failed login (for example, with the retired default password) returns a non-zero code.
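
In a script, the response code can be checked rather than read by eye. The sketch below is a rough text match and not a JSON parser: login_ok is a hypothetical helper name, and it assumes the first "code" field in the body is the top-level result code. A JSON tool such as jq would be more robust if it is installed.

```shell
# login_ok: read a response body on stdin and succeed only if it
# contains "code":0, the success marker described in this step.
login_ok() {
  grep -Eq '"code"[[:space:]]*:[[:space:]]*0([^0-9]|$)'
}
```

For example, with PW set to the per-VM password:

curl -s -c /tmp/ds-cookies http://localhost/dolphinscheduler/login --data-urlencode 'userName=admin' --data-urlencode "userPassword=$PW" | login_ok && echo 'login succeeded'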

Step 9 — List projects over the API

Reuse the session from Step 8 to list workflow projects through the API:

curl -s -b /tmp/ds-cookies 'http://localhost/dolphinscheduler/projects?pageNo=1&pageSize=10' | head -c 400; echo

The response is JSON describing the projects; the same data backs the web UI's project list. From here the API lets you create projects, define workflows, trigger runs and read task logs. See the DolphinScheduler REST API documentation for the full surface.
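
For repeated calls it can help to wrap the base URL and cookie jar once. This is a small convenience sketch, assuming the session cookie from Step 8 is still valid: DS_BASE, DS_COOKIES, ds_url and ds_get are hypothetical names introduced here, not part of DolphinScheduler or the image.

```shell
# Base URL and cookie jar used in this guide; override via the environment.
DS_BASE=${DS_BASE:-http://localhost/dolphinscheduler}
DS_COOKIES=${DS_COOKIES:-/tmp/ds-cookies}

# ds_url: join the base URL and an API path with exactly one slash.
ds_url() {
  printf '%s/%s\n' "${DS_BASE%/}" "${1#/}"
}

# ds_get: issue an authenticated GET using the saved session cookie.
ds_get() {
  curl -s -b "$DS_COOKIES" "$(ds_url "$1")"
}
```

The project listing from this step then becomes:

ds_get 'projects?pageNo=1&pageSize=10' | head -c 400; echo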

Step 10 — Confirm data lives on the dedicated disk

All DolphinScheduler state — the bundled database, the workflow repository and run history — is stored on the dedicated Azure data disk so it survives OS changes and can be resized independently:

findmnt /var/lib/dolphinscheduler

The mount is backed by a separate Azure data disk captured into the image and re-provisioned on every VM.
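
The same check can be scripted by comparing the device behind the data path with the device behind the root filesystem. This is a minimal sketch using df; device_of and on_separate_device are hypothetical helper names, and it assumes device names contain no spaces.

```shell
# device_of: print the device (first df column) that backs a path.
device_of() {
  df -P "$1" | awk 'NR == 2 { print $1 }'
}

# on_separate_device: succeed when PATH lives on a different device than /.
on_separate_device() {
  [ "$(device_of "$1")" != "$(device_of /)" ]
}
```

For example:

on_separate_device /var/lib/dolphinscheduler && echo 'data is on its own disk'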

Enabling HTTPS

The nginx reverse proxy terminates plain HTTP on port 80. For public exposure, put a certificate in front of it. The simplest path is to add a DNS name for the VM and use the companion cloudimg nginx-ssl-certbot image as a TLS reverse proxy, or install certbot and extend the existing nginx site with a listen 443 ssl; server block and your certificate paths. Keep the DolphinScheduler standalone server bound to loopback so the only public surface is the TLS-terminated proxy.
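
As an illustration of the certbot route, the helper below prints a candidate listen 443 ssl; server block. It is a sketch under stated assumptions, not the image's actual nginx site: render_tls_server is a hypothetical name, the certificate paths follow certbot's usual /etc/letsencrypt/live/<domain>/ layout, and the upstream address must be copied from the proxy_pass line of the existing site because this guide does not state it.

```shell
# render_tls_server: print an nginx TLS server block to stdout.
# usage: render_tls_server <domain> <upstream-url>
# The upstream URL is an input on purpose: take it from the existing
# nginx site rather than guessing the loopback port.
render_tls_server() {
  cat <<EOF
server {
    listen 443 ssl;
    server_name $1;

    ssl_certificate     /etc/letsencrypt/live/$1/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/$1/privkey.pem;

    location / {
        proxy_pass $2;
        proxy_set_header Host \$host;
        proxy_set_header X-Forwarded-Proto https;
    }
}
EOF
}
```

Review the output, merge it into the existing site by hand, and validate with sudo nginx -t before reloading nginx.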

Maintenance

  • Configuration: the standalone server's settings live under /opt/dolphinscheduler/standalone-server/conf/ (application.yaml for the listener and datasource, common.properties for data paths). Edit the files, then run sudo systemctl restart dolphinscheduler to apply the changes.
  • JVM heap: the standalone heap is set in /opt/dolphinscheduler/standalone-server/bin/jvm_args_env.sh (default 2 GiB). Increase it on larger VMs for higher task concurrency.
  • Backups: snapshot the /var/lib/dolphinscheduler data disk to capture the database, workflow repository and run history.
  • Upgrades: deploy a newer DolphinScheduler release alongside the current one, migrate the database with the bundled upgrade SQL, and switch the service over.
  • Security patches: unattended-upgrades remains enabled so the OS continues to receive security updates automatically.
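
The backup item above can be scripted from a workstation where the Azure CLI is signed in. This is a sketch, not a supported backup tool: snapshot_name and snapshot_data_disk are hypothetical helper names, and it assumes the DolphinScheduler disk is the first entry in the VM's data disk list. A snapshot of a running VM may capture the database mid-write, so stopping dolphinscheduler.service first may give a cleaner copy.

```shell
# snapshot_name: build a UTC-timestamped name such as ds-data-20250101-120000.
snapshot_name() {
  printf '%s-%s\n' "${1:-ds-data}" "$(date -u +%Y%m%d-%H%M%S)"
}

# snapshot_data_disk: snapshot the first data disk attached to a VM.
# usage: snapshot_data_disk <resource-group> <vm-name>
# Assumption: the DolphinScheduler disk is dataDisks[0] on this VM.
snapshot_data_disk() {
  _disk_id=$(az vm show --resource-group "$1" --name "$2" \
    --query 'storageProfile.dataDisks[0].managedDisk.id' --output tsv) || return 1
  [ -n "$_disk_id" ] || return 1
  az snapshot create --resource-group "$1" \
    --name "$(snapshot_name)" --source "$_disk_id"
}
```

For example:

snapshot_data_disk <your-rg> dolphinscheduler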

Support

cloudimg provides 24/7 expert support for this image. Contact support@cloudimg.co.uk.

Apache DolphinScheduler and Apache are trademarks of The Apache Software Foundation. This image is produced by cloudimg and is not affiliated with or endorsed by The Apache Software Foundation. Apache DolphinScheduler is distributed under the Apache License 2.0.