Datasette on AWS User Guide

Product: Datasette on AWS

Overview

This image runs Datasette, the open source multi-tool for exploring and publishing data. Datasette takes one or more SQLite databases and publishes them as a browsable, searchable web interface with a faceted UI and a full JSON API over every table and custom query, so anyone can explore your data in a browser or query it programmatically.

Datasette 0.65.2 is installed into a dedicated Python virtual environment under /opt/datasette. It runs as an unprivileged datasette system account under a systemd service that starts it on boot and restarts it on failure. The served databases directory, which holds the SQLite files Datasette publishes, lives at /var/lib/datasette on a dedicated, independently resizable EBS data volume that survives an operating system disk reimage. A demo SQLite database ships at /var/lib/datasette/demo.db with continents, countries and cities tables and sample rows, so the interface is non-empty out of the box and you have a working example to explore.

Datasette ships with no built-in authentication, so the runtime binds to the loopback interface only (127.0.0.1:8001) and is never exposed directly. An nginx reverse proxy publishes the interface on port 80 behind HTTP Basic authentication. The admin password is generated on the first boot of every deployed instance, so two instances launched from the same Amazon Machine Image never share a password. It is written to /root/datasette-credentials.txt with mode 0600 so that only the root user can read it.

Prerequisites

Before you deploy this image you need:

  • An Amazon Web Services account where you can launch EC2 instances
  • IAM permissions to launch instances, create security groups, and subscribe to AWS Marketplace products
  • An EC2 key pair in the target Region for SSH access to the instance
  • A VPC and subnet in the target Region, with a security group allowing inbound port 22 from your management network and port 80 for the Datasette interface
  • The AWS CLI (version 2) installed locally if you plan to deploy from the command line
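
If you do not yet have a security group that matches the prerequisite above, the AWS CLI can create one. The following is a minimal sketch, not part of the image: the function name create_datasette_sg, the group name datasette-sg and the three arguments are illustrative assumptions, and you should substitute your own VPC ID and CIDR ranges.

```shell
# Sketch: create a security group that opens SSH (22) and HTTP (80).
# create_datasette_sg and the group name "datasette-sg" are assumptions
# made for illustration; they are not defined by the image.
create_datasette_sg() {
  vpc_id=$1      # VPC you will launch the instance into
  mgmt_cidr=$2   # management network allowed to reach SSH
  web_cidr=$3    # networks allowed to reach the Datasette interface

  # With this --query, create-security-group prints only the new group ID.
  sg_id=$(aws ec2 create-security-group \
    --group-name datasette-sg \
    --description "Datasette SSH and HTTP" \
    --vpc-id "$vpc_id" \
    --query GroupId --output text) || return 1

  # One ingress rule per port, each limited to the given CIDR range.
  aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port 22 --cidr "$mgmt_cidr" >/dev/null || return 1
  aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port 80 --cidr "$web_cidr" >/dev/null || return 1

  echo "$sg_id"
}
```

Call it as create_datasette_sg <vpc-id> <management-cidr> <web-cidr>; it prints the group ID to pass as <security-group-id> when launching from the command line.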

Step 1: Launch the Instance from the AWS Marketplace

Sign in to the AWS Management Console, open the EC2 service, and select Launch instance. Under Application and OS Images choose AWS Marketplace AMIs and search for Datasette. Select the cloudimg listing and choose Select, then Continue on the subscription summary.

Pick an instance type of t3.medium or larger. Choose your EC2 key pair under Key pair (login). Under Network settings select your VPC and subnet, and either create or select a security group that opens port 22 from your management network and port 80 for the Datasette interface. Leave the root volume at the default size or larger.

Select Launch instance. First-boot initialisation, which generates the per-instance admin password, takes a few seconds after the instance state becomes Running and the status checks pass.

Step 2: Launch the Instance from the AWS CLI

The following block launches an instance from the cloudimg Datasette Marketplace AMI into an existing subnet and security group. Replace <ami-id> with the AMI ID shown on the Marketplace listing, <key-name> with your EC2 key pair name, <subnet-id> with your subnet ID, and <security-group-id> with a security group that opens ports 22 and 80 as described above.

aws ec2 run-instances \
  --image-id <ami-id> \
  --instance-type t3.medium \
  --key-name <key-name> \
  --subnet-id <subnet-id> \
  --security-group-ids <security-group-id> \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=datasette}]'

When the instance reaches the Running state and its status checks pass, note its public IP address or DNS name from the EC2 console or with aws ec2 describe-instances.
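
Waiting for the status checks and looking up the address can be scripted. This is a sketch under the assumption that you kept the InstanceId printed by run-instances; instance_public_ip is a hypothetical helper name.

```shell
# Sketch: wait for the status checks, then print the public IP address.
# instance_public_ip is a hypothetical helper; pass it the InstanceId
# that run-instances returned.
instance_public_ip() {
  instance_id=$1
  # Blocks until both status checks pass, or fails if the waiter times out.
  aws ec2 wait instance-status-ok --instance-ids "$instance_id" || return 1
  # Print only the public IPv4 address of that instance.
  aws ec2 describe-instances --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].PublicIpAddress' --output text
}
```

For example, instance_public_ip <instance-id> prints the address to use as <public-ip> in the following steps. If the instance has no public address, the query prints None.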

Step 3: Connect to Your Instance

Connect over SSH using your key pair and the login user for your operating system variant.

OS variant      SSH login user
Ubuntu 24.04    ubuntu

ssh -i <key-name>.pem ubuntu@<public-ip>

Step 4: Retrieve the Admin Password

The admin password is unique to your instance and was generated on first boot. Read the credentials file as root:

sudo cat /root/datasette-credentials.txt

The file lists the interface URL, the admin user (admin) and the generated password, along with hints for adding your own databases and installing plugins. Keep this password somewhere safe.

Step 5: Sign In to the Datasette Interface

The Datasette interface is served on port 80 by nginx, which proxies to the runtime on 127.0.0.1:8001 behind HTTP Basic authentication. In a browser, go to:

http://<instance-public-ip>/

You are prompted for credentials by the nginx proxy. Sign in as admin with the password from the credentials file. The home page then loads, listing the bundled demo database and its tables.

Figure: The Datasette home page listing the demo database and its tables.

Click the demo database, then a table such as countries, to open the table view. Datasette renders the table as a filterable, sortable web page. Use the facet panel to break the rows down by a column value, for example faceting the countries by their continent.

Figure: The Datasette table view with a facet on the continent column.

You can also query every table with custom SQL. Open the demo database page and use the SQL editor to run an arbitrary read-only query against the data; the results render as a table, and the same data is one click away as JSON or CSV.

Figure: A custom SQL query over the demo database rendered as a result table.

Step 6: Confirm Datasette Is Running

Over SSH, confirm that the runtime and the nginx proxy are active and that the runtime answers on its loopback port:

sudo systemctl is-active datasette nginx
curl -s http://127.0.0.1:8001/-/versions.json | head -c 200

Both services should report active, and the /-/versions.json endpoint should return a JSON document reporting the Datasette, Python and SQLite versions. The runtime listens on 127.0.0.1:8001 (loopback only) and nginx publishes it on port 80 behind the login.

Confirm the installed Datasette version with:

/opt/datasette/venv/bin/datasette --version
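
Because the runtime itself has no authentication, it is also worth confirming that the proxy on port 80 rejects anonymous requests. The sketch below prints the HTTP status code for a URL, with or without credentials; http_status is a hypothetical helper name, and it assumes curl is available on the machine you run it from.

```shell
# Sketch: print the HTTP status code a URL returns, optionally sending
# HTTP Basic credentials. http_status is a hypothetical helper name.
http_status() {
  url=$1
  creds=${2:-}   # optional "user:password"
  if [ -n "$creds" ]; then
    # -o /dev/null discards the body; -w prints only the status code.
    curl -s -o /dev/null -w '%{http_code}' -u "$creds" "$url"
  else
    curl -s -o /dev/null -w '%{http_code}' "$url"
  fi
}
```

Run on the instance, http_status http://127.0.0.1/ should print 401 when Basic authentication is enforced, and passing "admin:<password>" as the second argument should print 200.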

Step 7: Publish Your Own Databases

Datasette serves every .db file in /var/lib/datasette. The image ships a demo database at /var/lib/datasette/demo.db so the interface is non-empty out of the box. To publish your own data, copy your SQLite file onto the instance (for example with scp), move it into /var/lib/datasette with a .db extension, then restart the service:

sudo systemctl restart datasette.service

After the restart, reload the interface in your browser and your database appears on the home page alongside the demo. The service runs datasette serve /var/lib/datasette/*.db bound to loopback, so every database in the directory is published on port 80 through the same authenticating proxy. Keep the files owned by the datasette user so the service can read them. Databases are served read-only by default, which is the safe model for a published data appliance.
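
The copy, ownership and restart steps above can be sketched as a pair of shell functions to run on the instance after you have uploaded the file. The function names and the 0644 file mode are assumptions for illustration. The header check relies on the fact that every SQLite 3 database file begins with the string "SQLite format 3".

```shell
# Sketch: verify that a file is a SQLite database, install it into the
# served directory owned by the datasette user, and restart the service.
# is_sqlite_file, publish_db and the 0644 mode are illustrative assumptions.
is_sqlite_file() {
  # A SQLite 3 file starts with the 16-byte header "SQLite format 3\0";
  # compare the first 15 printable bytes.
  [ "$(head -c 15 "$1" 2>/dev/null)" = "SQLite format 3" ]
}

publish_db() {
  src=$1
  is_sqlite_file "$src" || { echo "not a SQLite database: $src" >&2; return 1; }
  # install copies the file and sets its owner and mode in one step.
  sudo install -o datasette -m 0644 "$src" /var/lib/datasette/ || return 1
  sudo systemctl restart datasette.service
}
```

For example, after scp -i <key-name>.pem mydata.db ubuntu@<public-ip>:/tmp/ you could run publish_db /tmp/mydata.db on the instance. The file name mydata.db is a placeholder; the name must end in .db to match the pattern the service serves.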

Step 8: Call the JSON API

Datasette exposes the same data the web interface renders as a JSON API: append .json to any table or query URL. On the instance you can call the nginx proxy on the loopback address (port 80) with the admin credentials. Read the generated password into a shell variable and query the demo database:

PASS=$(sudo grep '^datasette.admin.pass=' /root/datasette-credentials.txt | cut -d= -f2-)
curl -s -u "admin:${PASS}" 'http://127.0.0.1/demo/countries.json?_size=3' | head -c 400; echo

From outside the instance, send the same request to http://<instance-public-ip>/demo/countries.json with Basic auth user admin and the generated password. Any custom SQL query is also available as JSON by appending .json to the query URL, so Datasette doubles as an instant read-only API over your SQLite data.
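
A custom SQL query can be sent the same way by passing it in the sql query string parameter of the database's .json URL. The following is a sketch only: datasette_sql is a hypothetical helper, and the host, password and query are supplied by the caller. It uses curl's --get and --data-urlencode options so the SQL text is URL-encoded for you.

```shell
# Sketch: run a read-only SQL query against the demo database through the
# JSON API. datasette_sql is a hypothetical helper name.
datasette_sql() {
  host=$1   # 127.0.0.1 on the instance, or the instance's public IP
  pass=$2   # the generated admin password
  sql=$3    # the SQL text to run
  # --get turns the URL-encoded values into query string parameters.
  # _shape=array asks Datasette for a plain JSON array of row objects.
  curl -s --get -u "admin:${pass}" \
    --data-urlencode "sql=${sql}" \
    --data-urlencode "_shape=array" \
    "http://${host}/demo.json"
}
```

On the instance, with PASS set as shown above, datasette_sql 127.0.0.1 "$PASS" 'select count(*) as n from countries' returns the row count of the bundled countries table as JSON.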

Step 9: Install Plugins

Datasette and its dependencies are installed into a dedicated virtual environment at /opt/datasette/venv. Plugins add features such as charting, mapping, full-text search and authentication, and must be installed into that same virtual environment so the service picks them up. Choose a plugin such as datasette-vega from the Datasette plugin directory, install it with the bundled Datasette command, sudo /opt/datasette/venv/bin/datasette install <plugin-name>, then restart the service:

sudo systemctl restart datasette.service

After the restart, reload the interface to see the plugin in effect.
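
The install, restart and check sequence can be sketched as one function. install_plugin is a hypothetical helper name; the datasette install and datasette plugins subcommands are part of the Datasette command line interface, and the latter prints the installed plugins as JSON.

```shell
# Sketch: install a plugin into the image's virtual environment, restart
# the service, then list the installed plugins.
# install_plugin is a hypothetical helper name.
DATASETTE_BIN=/opt/datasette/venv/bin/datasette

install_plugin() {
  plugin=$1
  sudo "$DATASETTE_BIN" install "$plugin" || return 1
  sudo systemctl restart datasette.service || return 1
  # Listing plugins confirms the new one is visible to Datasette.
  "$DATASETTE_BIN" plugins
}
```

For example, install_plugin datasette-vega installs the plugin named in this step and prints the plugin list once the service has restarted.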

Step 10: The Data Volume

The served databases directory lives on a dedicated EBS volume mounted at /var/lib/datasette. This keeps your SQLite databases off the operating system disk and lets you resize or snapshot them independently. Confirm the mount with:

df -h /var/lib/datasette

To grow the directory, expand the EBS volume in the AWS console, then grow the filesystem on the instance by running sudo resize2fs on the underlying device (resize2fs applies to ext2, ext3 and ext4 filesystems). Because this volume holds the demo database and any databases you add, snapshotting it captures the complete published data set.
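
The on-instance half of that procedure can be sketched as follows. This assumes the filesystem sits directly on the block device with no partition table and that it is ext4; neither is stated in this guide, so the function checks the filesystem type and stops if it is something else. grow_data_fs is a hypothetical helper name.

```shell
# Sketch: grow the filesystem at /var/lib/datasette after the EBS volume
# has been enlarged. Assumes an ext4 filesystem directly on the device
# (no partition to extend first); verify both before relying on this.
grow_data_fs() {
  mount_point=${1:-/var/lib/datasette}
  # findmnt reports the backing device and filesystem type of the mount.
  device=$(findmnt -n -o SOURCE --target "$mount_point") || return 1
  fstype=$(findmnt -n -o FSTYPE --target "$mount_point") || return 1
  if [ "$fstype" != "ext4" ]; then
    echo "found $fstype on $device, not ext4; not resizing" >&2
    return 1
  fi
  # With no size argument, resize2fs grows the filesystem to fill the device.
  sudo resize2fs "$device"
}
```

The volume itself can also be enlarged from the command line with aws ec2 modify-volume --volume-id <volume-id> --size <new-size-in-GiB> before running grow_data_fs on the instance.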

Step 11: Enable HTTPS

The Datasette interface is served over plain HTTP on port 80 by nginx. For production use, place it behind TLS. Obtain a certificate for your domain (for example with a managed certificate on an Application Load Balancer in front of the instance, or with Certbot installed on the instance), then configure nginx to listen on 443 with your certificate and proxy to 127.0.0.1:8001 exactly as the bundled site does for port 80, keeping the HTTP Basic authentication in place. Restrict the security group so ports 80 and 443 are reachable only from the networks that need the interface.

Step 12: Backup and Maintenance

Back up your deployment by snapshotting the /var/lib/datasette EBS volume, which captures the demo database and every database you have published. You can also copy the individual .db files off the instance. Apply operating system security updates with sudo apt-get update && sudo apt-get upgrade and reboot when a new kernel is installed; Datasette and nginx start automatically on boot.
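
A snapshot of the data volume can be taken from the AWS CLI. This is a minimal sketch: snapshot_data_volume is a hypothetical helper name, the description text is an assumption, and you supply the ID of the EBS volume mounted at /var/lib/datasette (shown on the instance's Storage tab in the EC2 console).

```shell
# Sketch: snapshot the Datasette data volume and print the snapshot ID.
# snapshot_data_volume and the description text are illustrative assumptions.
snapshot_data_volume() {
  volume_id=$1
  # Tag the description with today's UTC date to tell snapshots apart.
  aws ec2 create-snapshot --volume-id "$volume_id" \
    --description "datasette data $(date -u +%Y-%m-%d)" \
    --query SnapshotId --output text
}
```

For example, snapshot_data_volume <volume-id> starts the snapshot and prints its ID; the snapshot completes in the background.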

Support

This image is published and supported by cloudimg. Support covers deployment, publishing your own databases, plugins, faceting and full-text search, the JSON API, the authenticating proxy, TLS, authentication and performance tuning. Contact cloudimg through the support channel listed on the AWS Marketplace listing.

All product and company names are trademarks or registered trademarks of their respective holders. Use of them does not imply any affiliation with or endorsement by them.