dbt Core on Ubuntu 24.04 on Azure User Guide
Overview
dbt Core is the open-source data transformation framework that lets analytics engineers build, test and document data pipelines using SQL and Jinja. You write select statements as models; dbt handles dependency ordering, materialisation, testing and documentation, turning raw data into trusted, version-controlled datasets. The cloudimg image installs dbt Core 1.9.4 into a Python virtualenv and exposes it system-wide as the dbt command. It also ships the self-contained dbt-duckdb adapter and a worked example project that builds, tests and documents a small data mart against an embedded DuckDB database, so no external data warehouse is required. This is a command-line product; there is no web UI. The image is backed by 24/7 cloudimg support.
What is included:
- dbt Core 1.9.4 installed in a virtualenv at /opt/dbt/venv, on PATH as /usr/local/bin/dbt
- The dbt-duckdb 1.9.3 adapter so the appliance runs end to end with no external warehouse
- A worked example project at /opt/dbt/example (seeds, staging views, a mart and schema tests)
- A per-user writable copy at ~/dbt-example for every login user
- A login banner pointing at the example and the user guide
- Ubuntu 24.04 LTS base with latest security patches at build time
- 24/7 cloudimg support
Prerequisites
You need an active Azure subscription, an SSH key pair, and a VNet and subnet in the target region. Standard_B2s (2 vCPU / 4 GiB RAM) is plenty for a dbt workstation or CI runner. For NSG inbound rules, allow 22/tcp from your management network. The bundled DuckDB example needs no outbound network access; to run against a real warehouse, dbt connects out to your Snowflake / BigQuery / Redshift / Postgres endpoint over its standard port.
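If the NSG does not yet allow SSH, the inbound rule can be created from the Azure CLI. The following is a minimal sketch rather than a required step: the function name allow_ssh_from, the rule name AllowSshFromMgmt and the priority 1000 are illustrative assumptions, and the three arguments are your own resource group, NSG name and management CIDR range.

```shell
# Hypothetical helper: allow inbound SSH (22/tcp) from a single CIDR range.
# Arguments: <resource-group> <nsg-name> <source-cidr>
allow_ssh_from() {
  az network nsg rule create \
    --resource-group "$1" \
    --nsg-name "$2" \
    --name AllowSshFromMgmt \
    --priority 1000 \
    --direction Inbound \
    --access Allow \
    --protocol Tcp \
    --source-address-prefixes "$3" \
    --destination-port-ranges 22
}
```

Pick a priority that does not collide with rules already present in your NSG.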
Step 1 - Deploy from the Azure Marketplace
Sign in to the Azure Portal, choose Create a resource, search the Marketplace for dbt Core by cloudimg, and select Create. On Basics pick your subscription, resource group, region and size; under Administrator account choose SSH public key and paste your key; under Inbound port rules allow SSH (22). Then Review + create -> Create.
Step 2 - Deploy from the Azure CLI
az vm create \
  --resource-group <your-rg> \
  --name dbt \
  --image <marketplace-image-urn> \
  --size Standard_B2s \
  --admin-username azureuser \
  --ssh-key-values ~/.ssh/id_ed25519.pub \
  --vnet-name <your-vnet> --subnet <your-subnet> \
  --public-ip-sku Standard
Step 3 - Connect to your VM
ssh azureuser@<vm-public-ip>
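If you do not have the public IP address to hand, the Azure CLI can look it up. This is a small sketch; the function name vm_public_ip is an illustrative assumption, and the arguments are the resource group and VM name you used at deployment.

```shell
# Hypothetical helper: print the public IP address of a VM.
# Arguments: <resource-group> <vm-name>
vm_public_ip() {
  az vm show --show-details \
    --resource-group "$1" \
    --name "$2" \
    --query publicIps \
    --output tsv
}
```

The result can be passed straight to ssh, for example ssh azureuser@"$(vm_public_ip <your-rg> dbt)".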
Step 4 - Confirm dbt is installed
dbt --version
It reports installed: 1.9.4 for dbt Core and lists the duckdb: 1.9.3 plugin.
Step 5 - Get your writable copy of the example
Every login user gets their own writable copy of the worked example at ~/dbt-example. The DuckDB adapter writes a local database file, so the project directory must be user-writable. If the copy does not exist yet, the first command below creates it from the read-only reference at /opt/dbt/example:
[ -d ~/dbt-example ] || cp -r /opt/dbt/example ~/dbt-example
cd ~/dbt-example && dbt --version
The example ships with a profiles.yml whose dev target points at a local jaffle.duckdb, two CSV seeds, two staging models, a customer_orders mart and not_null / unique schema tests.
Step 6 - Load the seeds
dbt seed loads the bundled CSVs (raw_customers, raw_orders) into DuckDB:
cd ~/dbt-example && dbt seed
dbt reports Completed successfully with PASS=2.
Step 7 - Build the models
dbt run builds the staging views and the customer_orders mart in dependency order:
cd ~/dbt-example && dbt run
dbt creates stg_customers, stg_orders and the customer_orders table, reporting PASS=3.
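While iterating you may not want to rebuild every model. dbt's --select option with the + graph operator limits a run to one model and its upstream parents. The sketch below wraps that in a function; the names rebuild_with_parents and EXAMPLE_DIR are illustrative assumptions, not part of the image.

```shell
# Hypothetical helper: rebuild one model together with everything upstream of it.
# Argument: <model-name>, for example customer_orders.
rebuild_with_parents() {
  # Run in a subshell so the caller's working directory is left unchanged.
  ( cd "${EXAMPLE_DIR:-$HOME/dbt-example}" && dbt run --select "+$1" )
}
```

For example, rebuild_with_parents customer_orders runs the staging models that feed the mart and then the mart itself.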
Step 8 - Test the data
dbt test runs the schema tests declared in models/marts/schema.yml:
cd ~/dbt-example && dbt test
The not_null and unique tests pass with PASS=3 WARN=0 ERROR=0.
Step 9 - Build everything in one command
dbt build runs seeds, models and tests together in dependency order - the command you would wire into CI:
cd ~/dbt-example && dbt build
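In CI the useful signal is the exit code: dbt exits non-zero when a model or test fails. The following is one possible wrapper, assuming the project lives at ~/dbt-example; the names ci_dbt_build and EXAMPLE_DIR are illustrative, and --fail-fast is an optional choice that stops at the first failure.

```shell
# Hypothetical CI entry point: run the whole pipeline and report the outcome.
ci_dbt_build() {
  status=0
  # Subshell keeps the caller's working directory unchanged.
  ( cd "${EXAMPLE_DIR:-$HOME/dbt-example}" && dbt build --fail-fast ) || status=$?
  if [ "$status" -eq 0 ]; then
    echo "dbt build succeeded"
  else
    echo "dbt build failed with exit code $status" >&2
  fi
  # Propagate dbt's exit code so the CI job fails when the build does.
  return "$status"
}
```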
Step 10 - Generate documentation and lineage
dbt can generate a documentation site and a lineage graph describing every model, seed and test:
cd ~/dbt-example && dbt docs generate
dbt writes target/catalog.json and target/manifest.json. Serve the docs site locally with dbt docs serve (binds to 127.0.0.1:8080; tunnel over SSH to view it), or publish the target/ artifacts to your own static host.
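Because the docs server listens only on the VM's loopback address, viewing it from your workstation needs an SSH port forward. The sketch below shows both halves; the function names serve_docs and open_docs_tunnel and the EXAMPLE_DIR variable are illustrative assumptions.

```shell
# On the VM: serve the generated docs on port 8080 without opening a browser.
serve_docs() {
  ( cd "${EXAMPLE_DIR:-$HOME/dbt-example}" && dbt docs serve --port 8080 --no-browser )
}

# On your workstation: forward local port 8080 to 127.0.0.1:8080 on the VM.
# -N keeps the tunnel open without running a remote command.
# Argument: <vm-public-ip>
open_docs_tunnel() {
  ssh -N -L 8080:127.0.0.1:8080 "azureuser@$1"
}
```

With serve_docs running on the VM and the tunnel open on your workstation, the site is reachable at http://127.0.0.1:8080 in a local browser.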
Step 11 - Inspect your DuckDB results
The pipeline materialised a real customer_orders table you can query. List the project's models and confirm the data was built:
cd ~/dbt-example && dbt ls --resource-type model && ls -la jaffle.duckdb
You can query the DuckDB file directly from Python (import duckdb; duckdb.connect("jaffle.duckdb")) or with the DuckDB CLI.
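As a sketch of the Python route, the function below lists every table and view in the database file using the virtualenv's interpreter, which should already have the duckdb module available as a dependency of dbt-duckdb. The names list_duckdb_tables and DBT_PYTHON are illustrative assumptions. It queries information_schema rather than a specific table, so it makes no assumption about the schema or columns the example uses.

```shell
# Hypothetical helper: list tables and views in a DuckDB file.
# Argument: path to the database (defaults to the example's jaffle.duckdb).
list_duckdb_tables() {
  "${DBT_PYTHON:-/opt/dbt/venv/bin/python}" - "${1:-$HOME/dbt-example/jaffle.duckdb}" <<'PY'
import sys
import duckdb

# Open read-only so inspecting the file cannot modify dbt's output.
con = duckdb.connect(sys.argv[1], read_only=True)
rows = con.execute(
    "select table_schema, table_name, table_type "
    "from information_schema.tables "
    "order by table_schema, table_name"
).fetchall()
for schema, name, kind in rows:
    print(f"{schema}.{name} ({kind})")
PY
}
```

Run it while no dbt command is using the file, since DuckDB locks the database while a process has it open for writing.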
Step 12 - Connect to your own data warehouse
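The DuckDB target keeps the appliance self-contained, but dbt is designed to run against your production warehouse. Edit profiles.yml so that the profile's target and outputs describe your warehouse connection, and install the matching adapter into the virtualenv. Keep the top-level profile name (jaffle_duckdb here) unchanged so that it still matches the profile named in the project's dbt_project.yml. For example, for Snowflake: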
jaffle_duckdb:
  target: prod
  outputs:
    prod:
      type: snowflake
      account: <your-account>
      user: <your-user>
      password: <your-password>
      role: TRANSFORMER
      database: ANALYTICS
      warehouse: TRANSFORMING
      schema: dbt
      threads: 8
Install the adapter and point dbt at the new target:
sudo /opt/dbt/venv/bin/pip install dbt-snowflake
cd ~/dbt-example && dbt run --target prod
The same pattern applies to dbt-bigquery, dbt-redshift and dbt-postgres.
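Before running models against a new target, it can help to confirm that dbt can load the profile and reach the warehouse. dbt debug does this check. A minimal sketch, in which the names check_target and EXAMPLE_DIR are illustrative assumptions:

```shell
# Hypothetical helper: validate the profile and test the connection for a target.
# Argument: <target-name> (defaults to prod).
check_target() {
  ( cd "${EXAMPLE_DIR:-$HOME/dbt-example}" && dbt debug --target "${1:-prod}" )
}
```

If the check fails, fix the connection details in profiles.yml before running dbt run or dbt build against that target.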
Maintenance
- Reference template: the read-only example lives at /opt/dbt/example; your writable copy is ~/dbt-example.
- Virtualenv: dbt and its adapters live in /opt/dbt/venv; add adapters with sudo /opt/dbt/venv/bin/pip install dbt-<adapter>.
- Upgrades: upgrade in place with sudo /opt/dbt/venv/bin/pip install --upgrade dbt-core dbt-duckdb.
- Profiles: dbt reads profiles.yml from the project directory by default; set DBT_PROFILES_DIR to relocate it (e.g. ~/.dbt).
- Security patches: unattended-upgrades remains enabled so the OS continues to receive security updates automatically.
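Relocating the profile, as described in the Profiles item above, could look like the sketch below. The function name relocate_profiles is an illustrative assumption; the defaults mirror the paths used in this guide.

```shell
# Hypothetical helper: copy profiles.yml out of the project and point dbt at it.
# Arguments: [project-dir] [profiles-dir]
relocate_profiles() {
  mkdir -p "${2:-$HOME/.dbt}" || return 1
  cp "${1:-$HOME/dbt-example}/profiles.yml" "${2:-$HOME/.dbt}/profiles.yml" || return 1
  # Applies to the current shell session only.
  export DBT_PROFILES_DIR="${2:-$HOME/.dbt}"
}
```

Add the export line to your shell profile to make the setting persistent. If the profile refers to the DuckDB file by a relative path, keep running dbt from the project directory so that the path still resolves.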
Support
cloudimg provides 24/7 expert support for this image. Contact support@cloudimg.co.uk.