
Apache Superset on AWS User Guide

Product: Apache Superset on AWS

Overview

This image runs Apache Superset, the open source data exploration and business intelligence platform, as a complete single node deployment. Apache Superset itself runs in a constraints pinned Python virtual environment and is served by the gunicorn application server on port 8088. A local PostgreSQL database holds the Superset metadata. Redis provides caching and the broker and results backend for asynchronous queries. A Celery worker executes long running SQL Lab queries in the background. Every component runs as a managed systemd service, and the Superset web application and Celery worker run as a dedicated unprivileged superset user.

The Superset administrator password, the PostgreSQL password, and the application secret key are all generated on the first boot of every deployed instance. Two instances launched from the same Amazon Machine Image never share credentials. The initial administrator password is written to /root/superset-credentials.txt with mode 0600 so that only the root user can read it.

The PostgreSQL data directory and the Superset home both live on a dedicated EBS data volume mounted at /srv/superset, separate from the operating system disk, so the data tier can be resized independently of the root volume.

Prerequisites

Before you deploy this image you need:

  • An Amazon Web Services account where you can launch EC2 instances
  • IAM permissions to launch instances, create security groups, and subscribe to AWS Marketplace products
  • An EC2 key pair in the target Region for SSH access to the instance
  • A VPC and subnet in the target Region, with a security group allowing inbound port 22 from your management network and inbound port 8088 from the networks your analysts will use
  • The AWS CLI (version 2) installed locally if you plan to deploy from the command line
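
If you do not yet have a suitable security group, the rules described in the list above could be created from the AWS CLI roughly as follows. This is a sketch rather than part of the image: the function name create_superset_sg, the group name superset-sg, and the three arguments (VPC ID, management CIDR, analyst CIDR) are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: create a security group that opens SSH and the Superset port.
# create_superset_sg and the group name "superset-sg" are illustrative names.
create_superset_sg() {
  local vpc_id="$1" mgmt_cidr="$2" analyst_cidr="$3" sg_id
  # create-security-group returns the ID of the new group
  sg_id=$(aws ec2 create-security-group \
    --group-name superset-sg \
    --description "Apache Superset access" \
    --vpc-id "$vpc_id" \
    --query GroupId --output text) || return 1
  # SSH from the management network only
  aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port 22 --cidr "$mgmt_cidr" >/dev/null || return 1
  # Superset web interface from the analyst network
  aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port 8088 --cidr "$analyst_cidr" >/dev/null || return 1
  echo "$sg_id"
}
```

Call it as create_superset_sg <vpc-id> <management-cidr> <analyst-cidr>. It prints the new security group ID, which you can use when launching the instance.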

Step 1: Launch the Instance from the AWS Marketplace

Sign in to the AWS Management Console, open the EC2 service, and select Launch instance. Under Application and OS Images choose AWS Marketplace AMIs and search for Apache Superset. Select the cloudimg listing and choose Select, then Continue on the subscription summary.

Pick an instance type of m5.large or larger, because Apache Superset, PostgreSQL, Redis and the Celery worker all run together on the instance. Choose your EC2 key pair under Key pair (login). Under Network settings select your VPC and subnet, and either create or select a security group that allows inbound port 22 from your management network and inbound port 8088 from the networks your analysts use. Leave the root volume at the default size or larger.

Select Launch instance. First boot initialisation takes a couple of minutes after the instance state becomes Running and the status checks pass, while the per instance credentials are generated and the metadata schema is prepared.

Step 2: Launch the Instance from the AWS CLI

The following block launches an instance from the cloudimg Apache Superset Marketplace AMI into an existing subnet and security group. Replace <ami-id> with the AMI ID shown on the Marketplace listing, <key-name> with your EC2 key pair name, <subnet-id> with your subnet ID, and <security-group-id> with a security group that opens ports 22 and 8088 as described above.

aws ec2 run-instances \
  --image-id <ami-id> \
  --instance-type m5.large \
  --key-name <key-name> \
  --subnet-id <subnet-id> \
  --security-group-ids <security-group-id> \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":30,"VolumeType":"gp3"}}]' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=apache-superset-01}]'

The command prints a JSON document on success. Note the instance ID, then retrieve its public address once it is running:

aws ec2 describe-instances --instance-ids <instance-id> --query "Reservations[].Instances[].PublicIpAddress" --output text
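
Rather than polling by hand, you could let the AWS CLI wait until the status checks pass before looking up the address. The helper below is a minimal sketch. The name wait_for_superset_instance is an assumption made for illustration, and first boot initialisation may still need a couple of minutes after it returns.

```shell
#!/usr/bin/env bash
# Sketch: block until both EC2 status checks pass, then print the public IP.
# wait_for_superset_instance is an illustrative helper name.
wait_for_superset_instance() {
  local instance_id="$1"
  # Polls until the instance and system status checks both report ok
  aws ec2 wait instance-status-ok --instance-ids "$instance_id" || return 1
  aws ec2 describe-instances --instance-ids "$instance_id" \
    --query "Reservations[].Instances[].PublicIpAddress" --output text
}
```

Call it as wait_for_superset_instance <instance-id>.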

Step 3: Connecting to Your Instance

Connect over SSH with the key pair you selected and the public IP address from step 2. The SSH login user depends on the operating system of the AMI variant you launched:

AMI variant                         SSH login user
Apache Superset 6 on Ubuntu 24.04   ubuntu

ssh <login-user>@<public-ip>

The first boot service runs before the SSH daemon becomes ready, so the credentials file is always in place when you log in for the first time.

Step 4: Retrieve the Administrator Password

The Superset administrator password is generated on first boot and written to a root only file. Read it with sudo:

sudo cat /root/superset-credentials.txt

You will see a plain text file containing the Superset URL, the login URL, the administrator username (admin), and the administrator password. Copy these values somewhere secure (a password manager or encrypted vault). Do not commit them to source control.

Step 5: Verify the Deployment

From the same SSH session you can confirm every component of the stack is healthy. Confirm the four services are active:

sudo systemctl is-active superset.service superset-celery.service postgresql redis-server

Each line of the output is a service state, and all four should read active:

active
active
active
active

Confirm the first boot initialisation completed:

sudo test -f /var/lib/cloudimg/superset-firstboot.done && echo FIRSTBOOT_DONE

Confirm gunicorn is listening on port 8088:

sudo ss -tln | grep 8088

Confirm the Superset health endpoint responds:

curl -s -o /dev/null -w 'health HTTP %{http_code}\n' http://127.0.0.1:8088/health

A health HTTP 200 response confirms the Superset web application is serving requests.
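
The individual checks in this step can also be combined into one pass that stops at the first failure. The function below is a sketch under that assumption. check_superset_stack is an illustrative name and not a tool shipped on the image.

```shell
#!/usr/bin/env bash
# Sketch: run the Step 5 checks in one pass and report the first failure.
# check_superset_stack is an illustrative name, not part of the image.
check_superset_stack() {
  local svc code
  for svc in superset.service superset-celery.service postgresql redis-server; do
    if ! sudo systemctl is-active --quiet "$svc"; then
      echo "FAIL: $svc is not active"; return 1
    fi
  done
  if ! sudo test -f /var/lib/cloudimg/superset-firstboot.done; then
    echo "FAIL: first boot has not completed"; return 1
  fi
  # Ask curl for the status code only and discard the response body
  code=$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:8088/health)
  if [ "$code" != "200" ]; then
    echo "FAIL: health endpoint returned HTTP $code"; return 1
  fi
  echo "OK: Superset stack is healthy"
}
```

Run check_superset_stack from the SSH session. A non-zero exit status makes it easy to reuse in a monitoring script.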

Step 6: First Sign In to the Superset Web Interface

Open a web browser and navigate to http://<public-ip>:8088/login/. Superset presents the sign in form.

[Figure: Apache Superset sign in page]

Enter the administrator username admin and the administrator password from /root/superset-credentials.txt. Select Sign in. On the first successful sign in Superset records your session and shows the home page with the top navigation bar.

Step 7: The Superset Home Page

Once signed in, the home page gives you quick access to every area of Superset: Dashboards for interactive dashboards, Charts for individual visualizations, Datasets for the logical tables charts are built on, and the SQL menu for the SQL Lab editor. The Settings menu in the top right holds database connections, user and role administration, and your account profile.

[Figure: Apache Superset home page after first sign in]

On a freshly deployed instance the dashboards and charts lists are empty. The sections below walk through connecting a data source and building your first content.

Step 8: Change the Administrator Password

For a production deployment rotate the administrator password that was generated on first boot. Select Settings in the top navigation, then List Users, find the admin user, select the edit action, enter a new password, and save.

You can also reset the administrator password from the command line with the Superset CLI. The environment file holds the per instance secret key and database password the CLI needs, so source it first. Substitute your new password for <new-password>:

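# sudo cannot run the shell builtin ".", so read the file as root and source it in the current bash shell
set -a; . <(sudo cat /etc/superset/superset.env); set +a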
cd /srv/superset/home && sudo -u superset env \
  SUPERSET_SECRET_KEY="$SUPERSET_SECRET_KEY" \
  SUPERSET_DB_PASSWORD="$SUPERSET_DB_PASSWORD" \
  PYTHONPATH=/etc/superset FLASK_SKIP_DOTENV=1 \
  /opt/superset/venv/bin/superset fab reset-password \
  --username admin --password '<new-password>'
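
If you need a strong value for <new-password>, one option is to generate it on the instance. The helper below is a sketch. gen_password is an illustrative name, and the output is limited to letters and digits so that it stays safe inside the single quotes of the reset command.

```shell
#!/usr/bin/env bash
# Sketch: generate a random alphanumeric password of a given length.
# gen_password is an illustrative helper name; the default length is 32.
gen_password() {
  local length="${1:-32}"
  # Keep only letters and digits from the kernel random source
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$length"
  echo
}
```

For example, gen_password 24 prints a 24 character password. Store it in your password manager before using it.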

Step 9: Connect a Database

Superset explores data that lives in your own databases and data warehouses. Open Settings then Database Connections in the top right, then select + Database to open the connection wizard.

[Figure: Apache Superset Database Connections page]

Pick from the built in connectors. Common AWS pairings:

  • PostgreSQL — point at an Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL instance, or another cloudimg PostgreSQL image
  • MySQL or MariaDB — point at Amazon RDS for MySQL or MariaDB
  • Amazon Redshift — the Redshift connector is included for data warehouse workloads
  • Amazon Athena — query data in Amazon S3 through Athena
  • Snowflake, Google BigQuery, ClickHouse, Trino and Presto — modern analytical engines are first class connectors in Superset

Enter the connection details, select Test Connection to confirm Superset can reach the database, then Connect to save it.
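
When Test Connection fails, a frequent cause is that the database port is not reachable from the Superset instance, for example because of a security group rule. A quick check from the SSH session might look like the sketch below. can_reach is an illustrative helper name, and the host and port you pass are your own database endpoint.

```shell
#!/usr/bin/env bash
# Sketch: check that a TCP port is reachable from the Superset instance.
# Uses bash's built-in /dev/tcp redirection, so no extra packages are needed.
can_reach() {
  local host="$1" port="$2"
  # Give up after 5 seconds; the exit status is 0 only if the connect succeeded
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

For example, can_reach <database-host> 5432 && echo reachable || echo blocked. This only tests the network path, not the database credentials.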

Step 10: Explore Data in SQL Lab

The SQL Lab editor is the fastest path from a query to a chart. Open SQL then SQL Lab in the top navigation, choose the database you connected in step 9, write a query, and run it. Long running queries are handed to the Celery worker so they run asynchronously and do not block the web application.

Select Save then Save dataset to persist a query as a logical dataset. From a dataset you can select Create chart to open the explore view, choose a chart type, drag dimensions and metrics into the configuration panel, and save the chart. Charts are then added to dashboards under the Dashboards menu.

Step 11: Enable HTTPS with a Reverse Proxy

Apache Superset listens on plain HTTP on port 8088 by design. For any production deployment put a TLS terminating reverse proxy in front of it so session cookies and credentials cannot be intercepted. The image already sets ENABLE_PROXY_FIX in /etc/superset/superset_config.py, so Superset honours the X-Forwarded-Proto and X-Forwarded-For headers a proxy sets.

A common pattern is nginx with a Let's Encrypt certificate on the same instance. The following assumes you have a DNS record pointing your fully qualified domain name at the instance public IP address.

sudo apt-get update && sudo apt-get install -y nginx certbot python3-certbot-nginx
sudo certbot --nginx -d superset.your-domain.example \
  --non-interactive --agree-tos -m you@your-domain.example --redirect

Configure the nginx server block to proxy to http://127.0.0.1:8088, then restrict the security group so port 8088 is no longer reachable directly from the internet. An alternative is to place an Application Load Balancer with an AWS Certificate Manager certificate in front of the instance and route the target group to port 8088.
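
As an illustration of the proxy configuration, the helper below prints a location block that forwards requests to gunicorn and sets the forwarded headers that ENABLE_PROXY_FIX relies on. It is a sketch, not the configuration shipped with the image. render_superset_location is an illustrative name, and the block should be reviewed before use.

```shell
#!/usr/bin/env bash
# Sketch: print an nginx location block that proxies to gunicorn on port 8088.
# render_superset_location is an illustrative helper; review the output before use.
render_superset_location() {
  # The quoted heredoc delimiter keeps nginx variables such as $host literal
  cat <<'EOF'
location / {
    proxy_pass http://127.0.0.1:8088;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
}
EOF
}
```

Place the output inside the server block that certbot configured for your domain (on Ubuntu this is typically under /etc/nginx/sites-available/), replacing the existing location / block. Then validate and reload with sudo nginx -t && sudo systemctl reload nginx.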

Step 12: Managing the Services

Check service status:

sudo systemctl status superset.service --no-pager | head -n 15

Restart the Superset web application and the Celery worker together (use stop or start in place of restart as needed):

sudo systemctl restart superset.service superset-celery.service

View the gunicorn and Celery worker logs:

sudo tail -n 50 /var/log/superset/gunicorn.log
sudo tail -n 50 /var/log/superset/celery.log
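
When the logs are long, it can help to filter for lines that look like errors. The filter below is a sketch. recent_errors is an illustrative name, and the keyword pattern is a guess at useful terms, not a log format the image documents.

```shell
#!/usr/bin/env bash
# Sketch: show the most recent error-like lines from a log read on stdin.
# The keyword pattern is an assumption, not a documented log format.
recent_errors() {
  local count="${1:-20}"
  # Case-insensitive match, then keep only the last $count matching lines
  grep -iE 'error|critical|traceback' | tail -n "$count"
}
```

For example, sudo cat /var/log/superset/gunicorn.log | recent_errors 10 prints the ten most recent matching lines.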

Step 13: Backups and Maintenance

The Superset metadata — every database connection, dataset, chart and dashboard — lives in the PostgreSQL superset database. Back it up with pg_dump:

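# The login user cannot write to /var/backups, so let tee write the file as root
sudo -u postgres pg_dump -Fc superset | sudo tee /var/backups/superset-metadata.dump > /dev/null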

Restore on a new instance, after stopping the Superset services, with pg_restore. Replace <backup-dir> with the directory holding your dump file:

sudo -u postgres pg_restore -d superset --clean --if-exists <backup-dir>/superset-metadata.dump

Schedule the dump to run nightly and ship it to an Amazon S3 bucket for durable off instance storage.

For kernel and package updates, Ubuntu applies security patches automatically. A new kernel takes effect only after a reboot.

To update Apache Superset within the 6.x line, activate the virtual environment and upgrade the apache-superset package. The upstream upgrade procedure also applies metadata schema migrations with superset db upgrade. Restart the services afterwards.
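
The nightly dump and S3 upload described in this step could be scripted along the following lines. This is a sketch under stated assumptions: backup_superset_metadata and the object key layout are illustrative choices, and the instance is assumed to have AWS credentials (for example an instance profile) that allow writing to the bucket.

```shell
#!/usr/bin/env bash
# Sketch: dump the Superset metadata and copy it to S3.
# backup_superset_metadata and the key layout are illustrative choices.
# Assumes the instance has credentials that allow s3:PutObject on the bucket.
backup_superset_metadata() {
  local bucket="$1" stamp dump
  stamp=$(date +%Y-%m-%d)
  dump=$(mktemp "/tmp/superset-metadata-$stamp.XXXXXX") || return 1
  # pg_dump runs as the postgres OS user; the redirect writes as the caller
  if sudo -u postgres pg_dump -Fc superset > "$dump" &&
     aws s3 cp "$dump" "s3://$bucket/superset/superset-metadata-$stamp.dump"; then
    rm -f "$dump"
  else
    rm -f "$dump"; return 1
  fi
}
```

To run it nightly, one option is to save the function and a call to it in a script and reference that script from a root crontab entry such as 0 2 * * * /usr/local/sbin/superset-backup.sh. The script path and the schedule are placeholders.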

Step 14: Scaling Beyond a Single Instance

For larger deployments decouple the components onto managed services:

  • Move the metadata database to Amazon RDS for PostgreSQL and update SQLALCHEMY_DATABASE_URI in /etc/superset/superset_config.py
  • Move caching and the Celery broker to Amazon ElastiCache for Redis and update the cache and Celery settings in the configuration file
  • Run several Superset web instances behind an Application Load Balancer, and run the Celery workers on their own instances, all pointing at the shared RDS and ElastiCache endpoints

Each of these is documented in the official Apache Superset documentation at https://superset.apache.org/docs/.

Step 15: Server Components

Component                             Path
Superset Python virtual environment   /opt/superset/venv/
Superset CLI                          /opt/superset/venv/bin/superset
Application configuration             /etc/superset/superset_config.py
Environment file                      /etc/superset/superset.env
Superset home                         /srv/superset/home/
PostgreSQL data directory             /srv/superset/postgresql/16/main/
Log directory                         /var/log/superset/
gunicorn systemd unit                 /etc/systemd/system/superset.service
Celery worker systemd unit            /etc/systemd/system/superset-celery.service
First boot script                     /usr/local/sbin/superset-firstboot.sh
First boot service                    /etc/systemd/system/superset-firstboot.service
Credentials file                      /root/superset-credentials.txt
First boot sentinel                   /var/lib/cloudimg/superset-firstboot.done

Support

cloudimg provides 24/7/365 expert technical support for this image, with a guaranteed response within 24 hours and a one hour average response for critical issues. Contact support@cloudimg.co.uk.

For general Apache Superset questions consult the documentation at https://superset.apache.org/docs/ and the community at https://superset.apache.org/community/. Apache Superset is licensed under the Apache License 2.0 and is a trademark of the Apache Software Foundation.