Prometheus on AWS User Guide
This guide covers connecting to your Prometheus instance, verifying the service, using the web interface, writing PromQL queries, adding scrape targets, and configuring alerting.
Prerequisites
- An AWS account with an active subscription to the cloudimg Prometheus AMI.
- An EC2 instance launched from the AMI with port 9090 open in its security group.
- An SSH key pair associated with the instance.
Connecting to Your Instance
Connect via SSH on port 22 using the key pair you selected at launch:
| OS variant | SSH login user |
|---|---|
| Ubuntu 24.04 | ubuntu |
ssh -i /path/to/your-key.pem ubuntu@<instance-public-ip>
Endpoint Information
On first boot, the image writes a summary of the endpoint URL and key paths to /root/prometheus-info.txt.
View it with:
sudo cat /root/prometheus-info.txt
Example output:
prometheus.version=3.11.3
prometheus.web.ui=http://172.31.95.189:9090
prometheus.config=/etc/prometheus/prometheus.yml
prometheus.data=/var/lib/prometheus/data
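For scripts that need a single value from this file, a small lookup helper can be convenient. The following is a minimal sketch: `prom_info_value` is a hypothetical function name, and it assumes the file keeps the `key=value` layout shown in the example output.

```shell
# Hypothetical helper: print the value stored under one key in the info file.
# Assumes the key=value layout shown in the example output above.
prom_info_value() {
  # $1 = key to look up, $2 = file to read (defaults to the path from this guide)
  awk -F= -v key="$1" '
    $1 == key { sub(/^[^=]*=/, ""); print; exit }
  ' "${2:-/root/prometheus-info.txt}"
}

# Example:
#   prom_info_value prometheus.web.ui
```

Because the guide reads the default file with sudo, the function is best called from a root shell, or with a readable copy of the file passed as the second argument.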
Checking the Service Status
Verify that the Prometheus service is running:
sudo systemctl status prometheus.service
Expected output:
● prometheus.service - Prometheus
Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; preset: enabled)
Active: active (running) since Wed 2026-05-27 19:48:50 UTC; 1min ago
Docs: https://prometheus.io/docs/
Main PID: 32366 (prometheus)
Tasks: 8 (limit: 4586)
Memory: 31.2M (peak: 31.6M)
CPU: 326ms
CGroup: /system.slice/prometheus.service
└─32366 /usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus/data \
--web.listen-address=0.0.0.0:9090
Check the installed version:
prometheus --version
prometheus, version 3.11.3 (branch: HEAD, revision: eb173f5256d4022afba1e9bc3d19740a76859fae)
build user: root@83aad33dd38e
build date: 20260427-14:45:32
go version: go1.26.2
platform: linux/amd64
tags: netgo,builtinassets
Web Interface
Browse to the Prometheus expression browser at:
http://<instance-public-ip>:9090
The web interface lets you run queries, monitor target health, and view the loaded alerting rules and their current state.
Targets Page
Navigate to Status > Target health (or /targets) to view all configured scrape targets
and their current health:

By default, Prometheus scrapes its own metrics endpoint at localhost:9090/metrics every
15 seconds. The target health shows UP when the last scrape succeeded.
Expression Browser / PromQL
Navigate to the Query tab (or /graph) to run PromQL queries:

PromQL Basics
PromQL (Prometheus Query Language) is used to select and aggregate time series data.
Check which targets are up (returns 1 for UP, 0 for DOWN):
up
Query the total number of HTTP requests (available only if a scraped application exposes this metric):
http_requests_total
Rate of requests per second over the last 5 minutes:
rate(http_requests_total[5m])
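Average CPU usage of the scraped processes, grouped by job (process_cpu_seconds_total reports the CPU time of each scraped process, not of the whole host):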
avg(rate(process_cpu_seconds_total[5m])) by (job)
Run queries via the API:
curl -fsS 'http://localhost:9090/api/v1/query?query=up'
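Expressions that contain brackets, braces, or spaces need URL encoding when passed to the API. One way to handle this, sketched here, is to let curl encode the expression with `-G` and `--data-urlencode`. The function name `prom_query` and the `PROM_URL` variable are hypothetical; the endpoint is the same `/api/v1/query` path used above.

```shell
# Hypothetical wrapper around the instant-query endpoint used above.
# PROM_URL is an assumed variable name; it defaults to the local server.
prom_query() {
  # -G sends the data as a URL query string; --data-urlencode encodes the
  # expression so characters such as [ ] { } and spaces are transmitted safely.
  curl -fsS -G --data-urlencode "query=$1" \
    "${PROM_URL:-http://localhost:9090}/api/v1/query"
}

# Example:
#   prom_query 'rate(http_requests_total[5m])'
```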
Configuration File
The main Prometheus configuration is at /etc/prometheus/prometheus.yml:
sudo cat /etc/prometheus/prometheus.yml
# cloudimg Prometheus 3 default config
# Add additional scrape jobs below the existing 'prometheus' self-monitoring entry.
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: cloudimg-prometheus
scrape_configs:
  # Prometheus monitors itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  # Example: uncomment + edit to scrape a node-exporter
  # - job_name: 'node'
  #   static_configs:
  #     - targets: ['10.0.1.20:9100']
  # Example: uncomment + edit to scrape a Grafana instance
  # - job_name: 'grafana'
  #   static_configs:
  #     - targets: ['<grafana-ip>:3000']
After editing the configuration, validate it with promtool:
sudo -u prometheus promtool check config /etc/prometheus/prometheus.yml
Reload the configuration without restarting the service (the reload endpoint responds only when Prometheus runs with --web.enable-lifecycle):
curl -X POST http://localhost:9090/-/reload
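The two steps can be chained so that the reload only happens after validation succeeds. The following is a minimal sketch built from the commands above; `prom_apply_config` is a hypothetical function name.

```shell
# Hypothetical helper: validate the configuration, then reload only on success.
prom_apply_config() {
  # $1 = config file (defaults to the path from this guide)
  if sudo -u prometheus promtool check config "${1:-/etc/prometheus/prometheus.yml}"; then
    # -f makes curl exit non-zero when the server answers with an HTTP error,
    # for example when the lifecycle endpoint is not enabled.
    curl -fsS -X POST http://localhost:9090/-/reload
  else
    echo "Configuration is invalid; reload skipped." >&2
    return 1
  fi
}

# Example:
#   prom_apply_config
```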
Adding Scrape Targets
To scrape metrics from another instance, edit /etc/prometheus/prometheus.yml and add a
new job under scrape_configs. For example, to scrape a node-exporter:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: ['10.0.1.20:9100', '10.0.1.21:9100']
Then reload the configuration:
curl -X POST http://localhost:9090/-/reload
Verify the new target appears on the Targets page within one scrape interval (15 seconds).
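If a new target shows as DOWN, one quick check is whether its metrics endpoint is reachable from the Prometheus instance. The sketch below assumes the targets serve plain HTTP on the `/metrics` path (the Prometheus defaults when a job sets neither `scheme` nor `metrics_path`); `check_targets` is a hypothetical function name.

```shell
# Hypothetical check: confirm each target answers on /metrics over plain HTTP.
check_targets() {
  # Arguments: one or more host:port pairs, as they appear under "targets".
  local rc=0 target
  for target in "$@"; do
    if curl -fsS --max-time 5 -o /dev/null "http://${target}/metrics"; then
      echo "OK   ${target}"
    else
      echo "FAIL ${target}"
      rc=1
    fi
  done
  return "$rc"
}

# Example, using the node-exporter addresses from the job above:
#   check_targets 10.0.1.20:9100 10.0.1.21:9100
```

A FAIL result usually points at a security group rule, a firewall, or an exporter that is not running, rather than at the Prometheus configuration.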
Data Storage
Prometheus TSDB data is stored on a dedicated 30 GiB EBS volume mounted at
/var/lib/prometheus/data. The volume persists independently of the OS disk and can be
resized via the AWS Console without stopping the instance. After resizing, extend the ext4 filesystem (for example with resize2fs) so the new space becomes usable.
Check available disk space:
df -h /var/lib/prometheus/data
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme1n1     30G   72K   28G   1% /var/lib/prometheus/data
The volume is mounted via its filesystem UUID in /etc/fstab for stability across reboots:
UUID=7cd3f9af-4bc9-48ab-a211-f157cafd753d /var/lib/prometheus/data ext4 defaults,nofail 0 2
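For unattended checks, for example from a cron job, the df output can be reduced to a single percentage and compared against a threshold. This is a sketch only: `disk_used_pct` and `warn_if_full` are hypothetical function names, and the 80% default is an assumed value rather than a recommendation from the image.

```shell
# Hypothetical helper: print the used percentage of the filesystem holding a path.
disk_used_pct() {
  # -P forces the POSIX single-line output format, so column 5 is the usage figure.
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical check: warn when usage reaches a threshold (80% is an assumed default).
warn_if_full() {
  # $1 = path, $2 = threshold in percent
  local used
  used=$(disk_used_pct "$1")
  if [ "$used" -ge "${2:-80}" ]; then
    echo "WARNING: $1 is ${used}% full" >&2
    return 1
  fi
}

# Example:
#   warn_if_full /var/lib/prometheus/data 80
```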
Retention Configuration
By default, Prometheus retains 15 days of data. To change the retention period, create a systemd drop-in override for the service:
sudo systemctl edit prometheus.service
Add an override that sets a custom retention period, for example 30 days. The empty ExecStart= line clears the existing command so that the line after it replaces it:
[Service]
ExecStart=
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus/data \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries \
--web.listen-address=0.0.0.0:9090 \
--web.enable-lifecycle \
--storage.tsdb.retention.time=30d
Then reload systemd and restart:
sudo systemctl daemon-reload
sudo systemctl restart prometheus.service
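Before raising retention, it is worth estimating whether the data volume can hold the extra history. The Prometheus storage documentation gives a rule of thumb: needed disk space is roughly retention time in seconds, times ingested samples per second, times bytes per sample (typically 1 to 2 bytes). The sketch below applies that formula; `estimate_tsdb_bytes` is a hypothetical function name, and the sample rate in the example is an assumption, not a measurement from this image.

```shell
# Rough capacity estimate from the rule of thumb in the Prometheus storage docs:
#   needed bytes = retention seconds * ingested samples per second * bytes per sample
# The sample rate and the 2 bytes per sample used in the example are assumptions;
# substitute figures measured on your own server.
estimate_tsdb_bytes() {
  # $1 = retention in days, $2 = samples ingested per second, $3 = bytes per sample
  echo $(( $1 * 86400 * $2 * $3 ))
}

# Example: 30 days at 1000 samples/s and 2 bytes per sample
#   estimate_tsdb_bytes 30 1000 2    # prints 5184000000 (about 4.8 GiB)
```

The actual ingestion rate can be read from the server with the PromQL expression `rate(prometheus_tsdb_head_samples_appended_total[5m])`.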
Alerting Overview
Prometheus evaluates alerting rules defined in separate rule files. To add an alerting rule, create a rules file (create the /etc/prometheus/rules directory first if it does not exist):
sudo nano /etc/prometheus/rules/alerts.yml
Example rule that fires when a target is down for more than 5 minutes:
groups:
  - name: availability
    rules:
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.job }} is down"
          description: "{{ $labels.instance }} has been down for more than 5 minutes."
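Rule files can be syntax-checked with `promtool check rules` before Prometheus loads them. The following sketch loops over a rules directory and reports failure if any file is rejected; `check_rule_files` is a hypothetical function name, and running promtool as the prometheus user simply mirrors the config check used elsewhere in this guide.

```shell
# Hypothetical helper: run promtool over every rule file in a directory.
check_rule_files() {
  # $1 = rules directory (defaults to the path used in this guide)
  local dir="${1:-/etc/prometheus/rules}" file rc=0
  for file in "$dir"/*.yml; do
    # Skip the unexpanded pattern when the directory holds no .yml files.
    [ -e "$file" ] || continue
    sudo -u prometheus promtool check rules "$file" || rc=1
  done
  return "$rc"
}

# Example:
#   check_rule_files /etc/prometheus/rules
```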
Reference the rules file in prometheus.yml:
rule_files:
  - /etc/prometheus/rules/*.yml
Then reload Prometheus to pick up the new rules:
curl -X POST http://localhost:9090/-/reload
View active alerts at http://<instance-public-ip>:9090/alerts.
Health Endpoints
Prometheus exposes two health check endpoints:
| Endpoint | Purpose |
|---|---|
| /-/healthy | Returns "Prometheus Server is Healthy." if the server is running |
| /-/ready | Returns "Prometheus Server is Ready." when ready to serve traffic |
curl http://localhost:9090/-/healthy
Prometheus Server is Healthy.
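After a restart, the readiness endpoint can be polled so that follow-up steps wait until the server accepts traffic. This is a minimal sketch: `wait_until_ready` is a hypothetical function name, and the default of 30 attempts spaced 2 seconds apart is an assumed value.

```shell
# Hypothetical helper: poll the readiness endpoint until it answers or attempts run out.
wait_until_ready() {
  # $1 = maximum attempts (assumed default 30)
  # $2 = seconds to sleep between attempts (assumed default 2)
  local attempts="${1:-30}" delay="${2:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f turns a non-2xx answer (such as 503 during startup) into a failure.
    if curl -fs -o /dev/null http://localhost:9090/-/ready; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Prometheus not ready after ${attempts} attempts" >&2
  return 1
}

# Example:
#   sudo systemctl restart prometheus.service && wait_until_ready
```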
Service Management
| Action | Command |
|---|---|
| Start | sudo systemctl start prometheus.service |
| Stop | sudo systemctl stop prometheus.service |
| Restart | sudo systemctl restart prometheus.service |
| Reload config | curl -X POST http://localhost:9090/-/reload |
| Status | sudo systemctl status prometheus.service |
| View logs | sudo journalctl -u prometheus.service -f |
Support
For technical support with this image, contact cloudimg at support@cloudimg.co.uk. cloudimg provides 24/7 support for deployment, configuration, PromQL queries, alerting rules and Grafana integration.