Apache DolphinScheduler on Ubuntu 24.04 on Azure User Guide
Overview
Apache DolphinScheduler is a distributed, easy-to-extend visual workflow scheduler for big-data and ETL pipelines. You design DAGs of tasks (shell, SQL, Spark, Flink, Python, HTTP and more) in a drag-and-drop editor, schedule them with cron expressions, and monitor every run with retries, alerts and lineage. The cloudimg image installs DolphinScheduler 3.4.2 as the all-in-one standalone server (master, worker, api-server, alert-server and the web UI in a single JVM on Java 17) at /opt/dolphinscheduler, runs it as a dedicated dolphinscheduler system user bound to loopback behind an nginx reverse proxy on TCP 80, persists all state on a dedicated Azure data disk, and rotates the admin password to a unique value on the first boot of every VM. Backed by 24/7 cloudimg support.
What is included:
- Apache DolphinScheduler 3.4.2 (standalone server) at /opt/dolphinscheduler on Java 17
- The web UI and the REST API on context path /dolphinscheduler, fronted by nginx on :80
- DolphinScheduler's own username/password authentication with a per-VM admin password in a root-only file
- A dedicated Azure data disk at /var/lib/dolphinscheduler holding the bundled database, the workflow repository and run history, separate from the OS disk and re-provisioned with every VM
- dolphinscheduler.service + nginx.service as systemd units, enabled and active
- An unauthenticated /health endpoint for load balancers and probes
- 24/7 cloudimg support
Prerequisites
An active Azure subscription, an SSH key pair, and a VNet + subnet in the target region. Standard_B4ms (4 vCPU / 16 GiB RAM) is a good starting point for the standalone server; scale up for higher task concurrency. NSG inbound: allow 22/tcp from your management network and 80/tcp for the web UI and API (front with TLS for public exposure — see Enabling HTTPS).
Step 1 — Deploy from the Azure Marketplace
Sign in to the Azure Portal, choose Create a resource, search the Marketplace for Apache DolphinScheduler by cloudimg, and select Create. On Basics pick your subscription, resource group, region and size; under Administrator account choose SSH public key and paste your key; under Inbound port rules allow SSH (22) and HTTP (80). Review the dedicated data disk on the Disks tab, then Review + create → Create.
Step 2 — Deploy from the Azure CLI
az vm create \
  --resource-group <your-rg> \
  --name dolphinscheduler \
  --image <marketplace-image-urn> \
  --size Standard_B4ms \
  --admin-username azureuser \
  --ssh-key-values ~/.ssh/id_ed25519.pub \
  --vnet-name <your-vnet> --subnet <your-subnet> \
  --public-ip-sku Standard
az vm open-port --resource-group <your-rg> --name dolphinscheduler --port 80 --priority 1010
Step 3 — Connect to your VM
ssh azureuser@<vm-public-ip>
Step 4 — Confirm the services are running
systemctl is-active dolphinscheduler.service nginx.service
Both services should report active. The standalone server is a JVM application and takes one to two minutes to finish starting on the first boot. Both units are enabled, so they start again automatically after a reboot.
Step 5 — Check the health endpoint
nginx serves an unauthenticated health endpoint for load balancers and probes, and the DolphinScheduler API exposes a Spring Boot actuator health endpoint:
curl -s http://localhost/health; echo
curl -s -o /dev/null -w 'API health: %{http_code}\n' http://localhost/dolphinscheduler/actuator/health
The first returns ok; the second returns API health: 200 once the standalone server has finished starting.
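Because the standalone server needs a minute or two after first boot, a script that runs right after provisioning may want to wait for the 200 rather than probe once. The following is a minimal sketch of such a wait loop. The helper names wait_until and api_ready are made up for this example and are not part of the image; only the actuator URL comes from the step above.

```shell
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it succeeds or the timeout passes.
wait_until() {
  wu_timeout=$1
  wu_interval=$2
  shift 2
  wu_deadline=$(( $(date +%s) + wu_timeout ))
  until "$@"; do
    # Give up once the deadline has been reached.
    if [ "$(date +%s)" -ge "$wu_deadline" ]; then
      return 1
    fi
    sleep "$wu_interval"
  done
}

# api_ready succeeds only when the actuator health endpoint answers HTTP 200.
api_ready() {
  [ "$(curl -s -o /dev/null -w '%{http_code}' \
      http://localhost/dolphinscheduler/actuator/health)" = "200" ]
}

# Usage on the VM: wait up to 180 seconds, polling every 5 seconds.
# wait_until 180 5 api_ready && echo "DolphinScheduler is ready"
```

The 180 second timeout and 5 second interval are illustrative values; adjust them to the VM size you deployed.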
Step 6 — Retrieve your admin password
The admin password is rotated to a unique value on the first boot of your VM and written to a root-only file:
sudo cat /root/apache-dolphinscheduler-credentials.txt
This file contains DOLPHINSCHEDULER_ADMIN_USER (admin) and DOLPHINSCHEDULER_ADMIN_PASSWORD, plus the URL for the web UI. The shipped default password (dolphinscheduler123) is reset on first boot, so it no longer works. Store the per-VM password somewhere safe.
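For scripted use it can be convenient to load the password into a shell variable instead of copying it by hand. The sketch below assumes the credentials file stores one KEY=value pair per line; that layout is an assumption, so check the file on your VM first. The helper name read_cred is hypothetical.

```shell
# read_cred KEY: read KEY=value lines on stdin and print the value of the
# first line whose key is KEY (assumes a plain KEY=value layout, no quoting).
read_cred() {
  sed -n "s/^$1=//p" | head -n 1
}

# Usage on the VM (the file is root-only, hence sudo):
# PW=$(sudo cat /root/apache-dolphinscheduler-credentials.txt \
#        | read_cred DOLPHINSCHEDULER_ADMIN_PASSWORD)
```

Reading from stdin keeps the helper free of sudo, so only the cat of the root-only file runs with elevated privileges.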
Step 7 — Open the web UI
Browse to http://<vm-public-ip>/ (which redirects to the DolphinScheduler UI) and sign in as admin with the password from Step 6. The UI opens on the project overview; the screenshots below show the sign-in page, the project list, a workflow DAG definition and the monitoring view.

The Project list shows every workflow project with its workflow and task counts:

Inside a project, the Workflow Definition canvas is a drag-and-drop DAG editor for chaining tasks:

The Monitor view reports the health of the master, worker, API and alert components and the database:

Step 8 — Sign in over the REST API
DolphinScheduler exposes a full REST API behind the same login. Authenticate with your per-VM password to obtain a session token:
# Replace the placeholder with the password from Step 6; keep the single quotes.
PW='<DOLPHINSCHEDULER_ADMIN_PASSWORD>'
curl -s -c /tmp/ds-cookies http://localhost/dolphinscheduler/login \
  --data-urlencode 'userName=admin' --data-urlencode "userPassword=$PW"; echo
A successful response is JSON with "code":0 and a session token. The session cookie is saved to /tmp/ds-cookies for subsequent calls. Errors (for example, the retired default password) return a non-zero code.
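In a script, the code field is the part worth checking before you make further calls. The following sketch extracts it with grep and sed so it does not depend on jq being installed. The helper name ds_response_code is hypothetical, and the sketch assumes only that the first "code" key in the body is the top-level numeric result code described above.

```shell
# ds_response_code: read a JSON response on stdin and print the first
# numeric "code" value found (a text match, not a full JSON parser).
ds_response_code() {
  grep -o '"code"[[:space:]]*:[[:space:]]*-\{0,1\}[0-9][0-9]*' \
    | head -n 1 \
    | sed 's/.*:[[:space:]]*//'
}

# Usage on the VM, with PW set to the per-VM admin password:
# resp=$(curl -s -c /tmp/ds-cookies http://localhost/dolphinscheduler/login \
#   --data-urlencode 'userName=admin' --data-urlencode "userPassword=$PW")
# [ "$(printf '%s' "$resp" | ds_response_code)" = "0" ] \
#   || echo "login failed: $resp" >&2
```

If jq or python3 is available, a real JSON parser is the more robust choice; the text match here is only meant for quick checks.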
Step 9 — List projects over the API
Reuse the session from Step 8 to list workflow projects through the API:
curl -s -b /tmp/ds-cookies 'http://localhost/dolphinscheduler/projects?pageNo=1&pageSize=10' | head -c 400; echo
The response is JSON whose data object describes the projects; the same data backs the web UI's project list. From here the API lets you create projects, define workflows, trigger runs and read task logs. See the DolphinScheduler REST API documentation for the full surface.
Step 10 — Confirm data lives on the dedicated disk
All DolphinScheduler state — the bundled database, the workflow repository and run history — is stored on the dedicated Azure data disk so it survives OS changes and can be resized independently:
findmnt /var/lib/dolphinscheduler
The mount is backed by a separate Azure data disk captured into the image and re-provisioned on every VM.
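If you want a scriptable yes/no answer rather than reading findmnt output, one option is to compare the device numbers of the data directory and the root filesystem. This is a minimal sketch using GNU stat as shipped with Ubuntu; the helper name on_separate_fs is made up for this example.

```shell
# on_separate_fs PATH [REFERENCE]: succeed when PATH lives on a different
# filesystem than REFERENCE (default: /), judged by the device number.
on_separate_fs() {
  [ "$(stat -c %d "$1")" != "$(stat -c %d "${2:-/}")" ]
}

# Usage on the VM:
# on_separate_fs /var/lib/dolphinscheduler \
#   && echo "state is on its own filesystem" \
#   || echo "WARNING: state is on the OS disk" >&2
```

A differing device number shows only that the path is a separate filesystem; findmnt still tells you which block device backs it.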
Enabling HTTPS
The nginx reverse proxy terminates plain HTTP on port 80. For public exposure, put a certificate in front of it. The simplest path is to add a DNS name for the VM and use the companion cloudimg nginx-ssl-certbot image as a TLS reverse proxy, or install certbot and extend the existing nginx site with a listen 443 ssl; server block and your certificate paths. Keep the DolphinScheduler standalone server bound to loopback so the only public surface is the TLS-terminated proxy.
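As a rough sketch of the second option, the function below prints a listen 443 ssl; server block that you could adapt. Everything passed to it is a placeholder: the DNS name is yours to supply, and the upstream URL must be copied from the proxy_pass line in the image's existing nginx site, since this guide does not state it. The certificate paths assume certbot's default /etc/letsencrypt/live/ layout, and the function name render_tls_server_block is hypothetical.

```shell
# render_tls_server_block DOMAIN UPSTREAM_URL
# Prints an nginx TLS server block to stdout; it does not touch any files.
render_tls_server_block() {
  cat <<EOF
server {
    listen 443 ssl;
    server_name $1;

    # Assumed certbot default paths; adjust if your certificate lives elsewhere.
    ssl_certificate     /etc/letsencrypt/live/$1/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/$1/privkey.pem;

    location / {
        # UPSTREAM_URL: copy the value from the existing site's proxy_pass.
        proxy_pass $2;
        proxy_set_header Host \$host;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
    }
}
EOF
}

# Usage: review the output, add it to your nginx site, then validate and reload.
# render_tls_server_block <your-dns-name> <existing-proxy_pass-url>
# sudo nginx -t && sudo systemctl reload nginx
```

Remember to allow 443/tcp in the NSG once the TLS listener is in place.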
Maintenance
- Configuration: the standalone server's settings live under /opt/dolphinscheduler/standalone-server/conf/ (application.yaml for the listener and datasource, common.properties for data paths). Edit them and run sudo systemctl restart dolphinscheduler to apply changes.
- JVM heap: the standalone heap is set in /opt/dolphinscheduler/standalone-server/bin/jvm_args_env.sh (default 2 GiB). Increase it on larger VMs for higher task concurrency.
- Backups: snapshot the /var/lib/dolphinscheduler data disk to capture the database, workflow repository and run history.
- Upgrades: deploy a newer DolphinScheduler release alongside, migrate the database with the bundled upgrade SQL, and switch the service over.
- Security patches: unattended-upgrades remains enabled so the OS continues to receive security updates automatically.
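The backup item above can be sketched with the Azure CLI, run from a machine where az is signed in. The sketch assumes the DolphinScheduler data disk is the first entry in the VM's data disk list; verify that before relying on it. The function names snapshot_name and snapshot_data_disk are hypothetical.

```shell
# snapshot_name PREFIX: print PREFIX followed by a UTC timestamp,
# in the form PREFIX-YYYYMMDDTHHMMSSZ.
snapshot_name() {
  printf '%s-%s\n' "$1" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# snapshot_data_disk RESOURCE_GROUP VM_NAME
# Looks up the VM's first data disk (assumed to be the DolphinScheduler disk)
# and creates a managed snapshot of it in the same resource group.
snapshot_data_disk() {
  disk_id=$(az vm show --resource-group "$1" --name "$2" \
    --query 'storageProfile.dataDisks[0].managedDisk.id' --output tsv) || return 1
  az snapshot create --resource-group "$1" \
    --name "$(snapshot_name "$2-data")" --source "$disk_id"
}

# Usage:
# snapshot_data_disk <your-rg> dolphinscheduler
```

For a consistent copy of the bundled database, you may prefer to stop dolphinscheduler.service on the VM before taking the snapshot and start it again afterwards.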
Support
cloudimg provides 24/7 expert support for this image. Contact support@cloudimg.co.uk.
Apache DolphinScheduler and Apache are trademarks of The Apache Software Foundation. This image is produced by cloudimg and is not affiliated with or endorsed by The Apache Software Foundation. Apache DolphinScheduler is distributed under the Apache License 2.0.