Prometheus on AWS User Guide

This guide covers connecting to your Prometheus instance, verifying the service, using the web interface, writing PromQL queries, adding scrape targets, and configuring alerting.

Prerequisites

  • An AWS account with an active subscription to the cloudimg Prometheus AMI.
  • An EC2 instance launched from the AMI with port 9090 open in its security group.
  • An SSH key pair associated with the instance.

Connecting to Your Instance

Connect via SSH on port 22 using the key pair you selected at launch:

OS variant      SSH login user
Ubuntu 24.04    ubuntu

ssh -i /path/to/your-key.pem ubuntu@<instance-public-ip>

Endpoint Information

On first boot, the instance writes a summary of the Prometheus version, web UI URL, and file locations to /root/prometheus-info.txt. View it with:

sudo cat /root/prometheus-info.txt

Example output:

prometheus.version=3.11.3
prometheus.web.ui=http://172.31.95.189:9090
prometheus.config=/etc/prometheus/prometheus.yml
prometheus.data=/var/lib/prometheus/data
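
The summary file uses a simple key=value layout, so a single value can be pulled out for use in scripts. The following is a minimal sketch, assuming the file keeps the format shown above; info_value is a hypothetical helper name, not something shipped with the image.

```shell
# Print the value stored under a key in a key=value file such as
# /root/prometheus-info.txt. info_value is a hypothetical helper name.
info_value() {
    key=$1
    file=${2:-/root/prometheus-info.txt}
    # Match lines that literally start with "<key>=" and print the remainder.
    awk -v k="$key" 'index($0, k "=") == 1 { print substr($0, length(k) + 2) }' "$file"
}
```

Because the file lives under /root, run the helper from a root shell or point it at a readable copy, for example: info_value prometheus.web.ui /tmp/prometheus-info.txt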

Checking the Service Status

Verify that the Prometheus service is running:

sudo systemctl status prometheus.service

Expected output:

● prometheus.service - Prometheus
     Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; preset: enabled)
     Active: active (running) since Wed 2026-05-27 19:48:50 UTC; 1min ago
       Docs: https://prometheus.io/docs/
   Main PID: 32366 (prometheus)
      Tasks: 8 (limit: 4586)
     Memory: 31.2M (peak: 31.6M)
        CPU: 326ms
     CGroup: /system.slice/prometheus.service
             └─32366 /usr/local/bin/prometheus \
                --config.file=/etc/prometheus/prometheus.yml \
                --storage.tsdb.path=/var/lib/prometheus/data \
                --web.listen-address=0.0.0.0:9090

Check the installed version:

prometheus --version
prometheus, version 3.11.3 (branch: HEAD, revision: eb173f5256d4022afba1e9bc3d19740a76859fae)
  build user:       root@83aad33dd38e
  build date:       20260427-14:45:32
  go version:       go1.26.2
  platform:         linux/amd64
  tags:             netgo,builtinassets

Web Interface

Browse to the Prometheus expression browser at:

http://<instance-public-ip>:9090

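The web interface provides query execution, target health monitoring, and read-only views of the loaded alerting rules and active alerts. Rules themselves are defined in files on the instance, not in the web interface.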

Targets Page

Navigate to Status > Target health (or /targets) to view all configured scrape targets and their current health:

[Screenshot: Prometheus Targets page showing the self-monitoring target in the UP state]

By default, Prometheus scrapes its own metrics endpoint at localhost:9090/metrics every 15 seconds. The target health shows UP when the last scrape succeeded.

Expression Browser / PromQL

Navigate to the Query tab (or /graph) to run PromQL queries:

[Screenshot: Prometheus expression browser with a PromQL query]

PromQL Basics

PromQL (Prometheus Query Language) is used to select and aggregate time series data.

Check which targets are up (returns 1 for UP, 0 for DOWN):

up

Query the cumulative count of HTTP requests (returns data only if a scraped target exposes http_requests_total):

http_requests_total

Rate of requests per second over the last 5 minutes:

rate(http_requests_total[5m])

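Average CPU usage of the scraped processes, grouped by job (process_cpu_seconds_total covers only the process that exposes it, not the whole host):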

avg(rate(process_cpu_seconds_total[5m])) by (job)

Run queries via the API:

curl -fsS 'http://localhost:9090/api/v1/query?query=up'
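
Expressions that contain brackets, braces, or quotes are awkward to place directly in a URL, because curl treats [] and {} as globbing patterns and the characters need URL encoding. The sketch below lets curl do the encoding; prom_query and the PROM_URL variable are hypothetical names chosen for illustration.

```shell
# Run an instant PromQL query through the HTTP API.
# prom_query and PROM_URL are hypothetical names chosen for this sketch.
prom_query() {
    # --get sends the URL-encoded data as a query string on a GET request,
    # so the expression needs no manual escaping.
    curl -fsS --get \
        --data-urlencode "query=$1" \
        "${PROM_URL:-http://localhost:9090}/api/v1/query"
}
```

Example usage: prom_query 'rate(http_requests_total[5m])'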

Configuration File

The main Prometheus configuration is at /etc/prometheus/prometheus.yml:

sudo cat /etc/prometheus/prometheus.yml
# cloudimg Prometheus 3 default config
# Add additional scrape jobs below the existing 'prometheus' self-monitoring entry.

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: cloudimg-prometheus

scrape_configs:
  # Prometheus monitors itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Example: uncomment + edit to scrape a node-exporter
  # - job_name: 'node'
  #   static_configs:
  #     - targets: ['10.0.1.20:9100']

  # Example: uncomment + edit to scrape a Grafana instance
  # - job_name: 'grafana'
  #   static_configs:
  #     - targets: ['<grafana-ip>:3000']

After editing the configuration, validate it with promtool:

sudo -u prometheus promtool check config /etc/prometheus/prometheus.yml

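Reload the configuration without restarting the service. The /-/reload endpoint responds only when Prometheus is started with the --web.enable-lifecycle flag; if the request is rejected, restart the service instead: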

curl -X POST http://localhost:9090/-/reload
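
The validate and reload steps can be chained so that a broken configuration is never sent to the running server. This is a minimal sketch under the paths used in this guide; safe_reload is a hypothetical helper name.

```shell
# Validate the configuration and request a reload only if validation passes.
# safe_reload is a hypothetical helper name.
safe_reload() {
    config=${1:-/etc/prometheus/prometheus.yml}
    if sudo -u prometheus promtool check config "$config"; then
        curl -fsS -X POST http://localhost:9090/-/reload && echo "reload requested"
    else
        echo "config check failed; not reloading" >&2
        return 1
    fi
}
```

Calling safe_reload with no argument checks the default configuration file; pass another path to check a different file.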

Adding Scrape Targets

To scrape metrics from another instance, edit /etc/prometheus/prometheus.yml and add a new job under scrape_configs. For example, to scrape a node-exporter:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['10.0.1.20:9100', '10.0.1.21:9100']

Then reload the configuration:

curl -X POST http://localhost:9090/-/reload

Verify the new target appears on the Targets page within one scrape interval (15 seconds).
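
Target health can also be checked from the command line through the targets API. The sketch below counts active targets per health state; it assumes the API returns compact JSON (no spaces around the colon), and target_health and PROM_URL are hypothetical names.

```shell
# Summarize the health of active scrape targets via the targets API.
# target_health and PROM_URL are hypothetical names for this sketch.
target_health() {
    # Each active target contributes one "health" field to the response;
    # count how many targets are in each state.
    curl -fsS "${PROM_URL:-http://localhost:9090}/api/v1/targets?state=active" |
        grep -o '"health":"[a-z]*"' |
        sort | uniq -c
}
```

A line containing "health":"down" in the output indicates at least one target whose last scrape failed.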

Data Storage

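Prometheus TSDB data is stored on a dedicated 30 GiB EBS volume mounted at /var/lib/prometheus/data. The volume persists independently of the OS disk and can be enlarged via the AWS Console without stopping the instance. After enlarging the volume, grow the ext4 filesystem (for example with sudo resize2fs on the device shown by df) so the extra space becomes usable.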

Check available disk space:

df -h /var/lib/prometheus/data
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme1n1     30G   72K   28G   1% /var/lib/prometheus/data

The volume is mounted via its filesystem UUID in /etc/fstab for stability across reboots:

UUID=7cd3f9af-4bc9-48ab-a211-f157cafd753d /var/lib/prometheus/data ext4 defaults,nofail 0 2

Retention Configuration

By default, Prometheus retains 15 days of data. To change the retention period, create a systemd override for the service:

sudo systemctl edit prometheus.service

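In the editor, clear the original start command with an empty ExecStart= line, then restate the full command with the retention flag appended, for example 30 days. Keep the remaining flags identical to those shown by systemctl cat prometheus.service: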

[Service]
ExecStart=
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/data \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.listen-address=0.0.0.0:9090 \
    --web.enable-lifecycle \
    --storage.tsdb.retention.time=30d

Then reload systemd and restart:

sudo systemctl daemon-reload
sudo systemctl restart prometheus.service
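
After the restart, the flags API can confirm which retention value the running server is using. A minimal sketch, assuming compact JSON in the response; retention_in_effect and PROM_URL are hypothetical names.

```shell
# Print the retention flag that the running server reports.
# retention_in_effect and PROM_URL are hypothetical names for this sketch.
retention_in_effect() {
    curl -fsS "${PROM_URL:-http://localhost:9090}/api/v1/status/flags" |
        grep -o '"storage.tsdb.retention.time":"[^"]*"'
}
```

With the override above in place, the output should show the 30d value; an unexpected value suggests the override was not picked up.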

Alerting Overview

Prometheus evaluates alerting rules defined in separate rule files. To add a rule, create a rules file (create the /etc/prometheus/rules directory first if it does not exist):

sudo nano /etc/prometheus/rules/alerts.yml

Example rule that fires when a target is down for more than 5 minutes:

groups:
  - name: availability
    rules:
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.job }} is down"
          description: "{{ $labels.instance }} has been down for more than 5 minutes."
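
Rule files can be validated with promtool before Prometheus loads them, in the same way as the main configuration. The sketch below checks every .yml file in the rules directory; check_rules is a hypothetical helper name.

```shell
# Validate every rule file in a directory before asking Prometheus to load it.
# check_rules is a hypothetical helper name.
check_rules() {
    dir=${1:-/etc/prometheus/rules}
    # promtool accepts several rule files in one invocation.
    sudo -u prometheus promtool check rules "$dir"/*.yml
}
```

A non-zero exit status means at least one file failed validation and should be fixed before reloading.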

Reference the rules files in /etc/prometheus/prometheus.yml with a top-level rule_files entry:

rule_files:
  - /etc/prometheus/rules/*.yml

Then reload Prometheus to pick up the new rules:

curl -X POST http://localhost:9090/-/reload

View active alerts at http://<instance-public-ip>:9090/alerts.

Health Endpoints

Prometheus exposes two health check endpoints:

Endpoint      Purpose
/-/healthy    Returns "Prometheus Server is Healthy." if the server is running
/-/ready      Returns "Prometheus Server is Ready." when ready to serve traffic

curl http://localhost:9090/-/healthy
Prometheus Server is Healthy.
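
The readiness endpoint is useful in scripts that must wait for Prometheus after a restart. The following sketch polls /-/ready until it succeeds or a retry limit is reached; wait_ready, PROM_URL, the default of 30 attempts, and the 1-second pause are all arbitrary choices made for illustration.

```shell
# Poll the readiness endpoint until it answers or the attempts run out.
# wait_ready and PROM_URL are hypothetical names; the pause is arbitrary.
wait_ready() {
    attempts=${1:-30}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f makes curl exit non-zero on HTTP errors such as 503.
        if curl -fs -o /dev/null "${PROM_URL:-http://localhost:9090}/-/ready"; then
            echo "ready"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "not ready after $attempts attempts" >&2
    return 1
}
```

Example usage: sudo systemctl restart prometheus.service && wait_ready 60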

Service Management

Action           Command
Start            sudo systemctl start prometheus.service
Stop             sudo systemctl stop prometheus.service
Restart          sudo systemctl restart prometheus.service
Reload config    curl -X POST http://localhost:9090/-/reload
Status           sudo systemctl status prometheus.service
View logs        sudo journalctl -u prometheus.service -f

Support

For technical support with this image, contact cloudimg at support@cloudimg.co.uk. cloudimg provides 24/7 support for deployment, configuration, PromQL queries, alerting rules and Grafana integration.