MLflow on AWS User Guide

Product: MLflow on AWS

Overview

This image runs MLflow 3.13, the open source platform for the machine learning lifecycle (experiment tracking, a model registry, and model packaging and deployment), on Ubuntu 24.04 LTS. MLflow is installed in a dedicated Python 3.12 virtual environment under /opt/mlflow and runs as the unprivileged mlflow system account. A systemd service starts the tracking server on boot and restarts it on failure.

The tracking server listens on the loopback address 127.0.0.1:5000 and is never exposed directly. nginx fronts it on port 80 with HTTP Basic Authentication; the unauthenticated /health probe stays open for load balancers, and everything else requires the admin password.

On the first boot of every deployed instance a one-shot service generates a fresh admin password, unique to that instance, and writes it to /root/mlflow-credentials.txt (mode 0600, root only). The SQLite backend store and the artifact store live under /var/lib/mlflow on a dedicated, independently resizable EBS data volume.

The default security group opens port 22 (SSH) and port 80 (HTTP) only.

Prerequisites

  • An AWS account subscribed to this product in AWS Marketplace.
  • An EC2 key pair in your target region for SSH access.
  • A security group allowing inbound TCP 22 (SSH) from your IP and TCP 80 (HTTP) from your users.
  • Recommended instance type: m5.large or larger.
  • The MLflow client (pip install mlflow) on your workstation or training environment.

Connecting to your instance

OS variant      Login user   Example
Ubuntu 24.04    ubuntu       ssh -i your-key.pem ubuntu@<instance-public-ip>

Step 1 - Launch from the AWS Marketplace console

  1. Open the product page in AWS Marketplace and choose Continue to Subscribe, then Continue to Configuration.
  2. Select the MLflow 3.13 on Ubuntu 24.04 delivery option and your region, then Continue to Launch.
  3. Choose your instance type, VPC/subnet, key pair and the security group described above, and launch.

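Step 2 - Launch from the AWS CLI

As an alternative to the console launch in Step 1, launch the instance from the AWS CLI. Replace the AMI ID, key pair name and security group ID with your own values: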

aws ec2 run-instances \
  --image-id ami-xxxxxxxxxxxxxxxxx \
  --instance-type m5.large \
  --key-name your-key \
  --security-group-ids sg-xxxxxxxx \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=mlflow}]'
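
The same launch can be scripted from Python. The sketch below is not part of the image: it assumes the boto3 package is installed and AWS credentials are configured, and build_launch_params and launch are hypothetical helper names. Unlike the CLI, the boto3 run_instances call requires MinCount and MaxCount to be passed explicitly.

```python
def build_launch_params(image_id, key_name, security_group_id,
                        instance_type="m5.large", name="mlflow"):
    """Return keyword arguments for EC2 run_instances that mirror the CLI call."""
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "KeyName": key_name,
        "SecurityGroupIds": [security_group_id],
        "MinCount": 1,  # the CLI defaults to one instance; boto3 needs it spelled out
        "MaxCount": 1,
        "TagSpecifications": [
            {"ResourceType": "instance",
             "Tags": [{"Key": "Name", "Value": name}]},
        ],
    }


def launch(params, region_name=None):
    """Launch one instance and return its ID (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_launch_params works without boto3
    ec2 = boto3.client("ec2", region_name=region_name)
    response = ec2.run_instances(**params)
    return response["Instances"][0]["InstanceId"]
```

Keeping the parameter dictionary separate from the API call makes it possible to review or log the request before anything is launched.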

Step 3 - Connect to your instance

ssh -i your-key.pem ubuntu@<instance-public-ip>

Step 4 - Confirm the services are running

systemctl is-active mlflow.service nginx.service
ss -tln | grep -E ':80 |:5000 '
curl -s http://127.0.0.1/health

Expected output:

active
active
LISTEN 0      511          0.0.0.0:80        0.0.0.0:*
LISTEN 0      2048       127.0.0.1:5000      0.0.0.0:*
LISTEN 0      511             [::]:80           [::]:*
OK
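
The listener check can also be automated. The following is a minimal sketch, assuming the ss -tln output has the column layout shown above; local_addresses and tracking_server_is_private are hypothetical helper names. It confirms that port 5000 is bound to the loopback interface only.

```python
def local_addresses(ss_output):
    """Extract the local address:port column from `ss -tln` output lines."""
    addresses = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Listening rows look like: LISTEN <recv-q> <send-q> <local> <peer>
        if len(fields) >= 5 and fields[0] == "LISTEN":
            addresses.append(fields[3])
    return addresses


def tracking_server_is_private(ss_output, port=5000):
    """True if something listens on `port` and every such listener is loopback-only."""
    suffix = f":{port}"
    bound = [a for a in local_addresses(ss_output) if a.endswith(suffix)]
    return bool(bound) and all(
        a.startswith("127.") or a.startswith("[::1]") for a in bound
    )
```

Feeding the function the output of ss -tln (for example captured with subprocess.run) returns False if the tracking server is ever reconfigured to listen on 0.0.0.0.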

Step 5 - Retrieve your admin password

sudo cat /root/mlflow-credentials.txt

Expected output:

# MLflow - generated on first boot by mlflow-firstboot.service
MLFLOW_URL=http://<instance-public-ip>/
MLFLOW_ADMIN_USER=admin
MLFLOW_ADMIN_PASSWORD=<your-unique-password>
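
If a script needs these values, the file can be parsed as plain KEY=VALUE lines. This is a minimal sketch assuming the layout shown above; parse_credentials is a hypothetical helper, and because the file is readable by root only, the text must be obtained with root privileges.

```python
def parse_credentials(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so a password containing '=' stays intact.
        key, _, value = line.partition("=")
        values[key.strip()] = value
    return values
```

For example, parse_credentials(text)["MLFLOW_ADMIN_PASSWORD"] returns the generated password, where text holds the file contents.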

Step 6 - Open the MLflow UI

Browse to http://<instance-public-ip>/ and sign in as admin with the password from Step 5.

Figure: MLflow tracking UI served through nginx with HTTP Basic Authentication

Step 7 - Log an experiment from the MLflow client

Point the MLflow client at the instance with the admin credentials, then log a run:

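export MLFLOW_TRACKING_URI=http://<instance-public-ip>/
export MLFLOW_TRACKING_USERNAME=admin
export MLFLOW_TRACKING_PASSWORD=<your-unique-password>

Then, in a Python session started from the same shell:

import mlflow

# Create the experiment if it does not exist and make it the active one.
mlflow.set_experiment("demo")

# Everything logged inside the block is attached to a single run.
with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)
    mlflow.log_metric("rmse", 0.27)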
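
Where shell exports are inconvenient, for example in a notebook, the same three variables can be set from Python before the first MLflow call. This is a sketch only; tracking_environment and apply_tracking_environment are hypothetical helper names, while the variable names are the ones used in the exports above.

```python
import os


def tracking_environment(base_url, username, password):
    """Return the environment variables the MLflow client reads for Basic Auth."""
    return {
        "MLFLOW_TRACKING_URI": base_url,
        "MLFLOW_TRACKING_USERNAME": username,
        "MLFLOW_TRACKING_PASSWORD": password,
    }


def apply_tracking_environment(base_url, username, password, environ=os.environ):
    """Write the variables into `environ` (the process environment by default)."""
    environ.update(tracking_environment(base_url, username, password))
    return environ
```

Passing a plain dictionary as environ lets you inspect the result without touching the real process environment.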

The run and its parameters, metrics and artifacts appear in the UI. To version a model and stage it in the model registry, register it from a run.

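You can also call the REST API directly with the admin credentials. Run on the instance, the following reads the password from the credentials file and prints the HTTP status code of an experiment search:

PASS=$(sudo grep '^MLFLOW_ADMIN_PASSWORD=' /root/mlflow-credentials.txt | cut -d= -f2-)
curl -s -o /dev/null -w '%{http_code}\n' -u "admin:$PASS" \
  'http://127.0.0.1/api/2.0/mlflow/experiments/search?max_results=1'

Expected output:

200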
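
The same request can be made from Python with only the standard library. This is a minimal sketch that mirrors the curl call above (a GET to the same path with HTTP Basic Authentication); basic_auth_header, build_search_request and search_experiments are hypothetical helper names.

```python
import base64
import json
import urllib.request


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_search_request(base_url, username, password, max_results=1):
    """Prepare (but do not send) a GET request for experiments/search."""
    url = (f"{base_url.rstrip('/')}/api/2.0/mlflow/experiments/search"
           f"?max_results={max_results}")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


def search_experiments(base_url, username, password, max_results=1):
    """Send the request and return the decoded JSON body."""
    request = build_search_request(base_url, username, password, max_results)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

A wrong password makes urlopen raise urllib.error.HTTPError with status 401, the counterpart of curl printing 401 instead of 200.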

Step 8 - Confirm the runtime

/opt/mlflow/venv/bin/pip show mlflow | grep ^Version

Expected output:

Version: 3.13.0

Production scale - PostgreSQL and Amazon S3

The image defaults to a SQLite backend store and a local artifact store on the data volume. For team scale, repoint both in /etc/mlflow/mlflow.env:

MLFLOW_BACKEND_STORE_URI=postgresql://user:password@your-db-host:5432/mlflow
MLFLOW_ARTIFACTS_DESTINATION=s3://your-bucket/mlflow

Then restart the service with sudo systemctl restart mlflow.service. For the S3 artifact store, attach an IAM instance role that grants access to the bucket.
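
MLflow hands the backend store URI to SQLAlchemy, so characters such as @, / or : in the database user name or password must be percent-encoded or the URI will be parsed incorrectly. The sketch below builds an encoded URI; postgres_store_uri is a hypothetical helper and the host and database names are placeholders.

```python
from urllib.parse import quote_plus


def postgres_store_uri(user, password, host, database, port=5432):
    """Build a postgresql:// URI with the user and password percent-encoded."""
    return (f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/{database}")
```

For example, a password of p@ss/word is written into the URI as p%40ss%2Fword.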

Enabling HTTPS

Before requesting a certificate, point a DNS record for your domain at the instance's public IP and allow inbound TCP 443 in the security group; the default group opens only ports 22 and 80. Then install and run certbot:

sudo apt-get update && sudo apt-get install -y certbot python3-certbot-nginx
sudo certbot --nginx -d your-domain.example.com

certbot edits the nginx site at /etc/nginx/sites-available/cloudimg-mlflow to add the TLS listener and arranges automatic renewal.

Backup and maintenance

  • All MLflow state lives under /var/lib/mlflow (the SQLite backend store mlflow.db and the artifacts directory) on its own EBS volume. Snapshot that volume to back up experiments, runs and registered models.
  • The admin password is in the nginx htpasswd file /etc/nginx/.mlflow.htpasswd; rotate it with sudo htpasswd /etc/nginx/.mlflow.htpasswd admin.
  • Restart with sudo systemctl restart mlflow.service; logs: sudo journalctl -u mlflow.service.

Support

cloudimg provides 24/7 technical support for this image by email and chat, covering MLflow deployment, experiment tracking, the model registry, backend and artifact store configuration, TLS termination and scaling. Contact details are on the AWS Marketplace listing.