TensorFlow Serving 2 on AWS User Guide
TensorFlow Serving 2 is Google's flexible, high-performance serving system for machine learning models. This image delivers TF Serving 2 running under Docker, with the canonical half_plus_two sample SavedModel preloaded on a dedicated 20 GiB model volume and an nginx basic-auth gateway protecting the REST API on port 80.
Connecting to your instance
| OS variant | Default login user | Connect command |
|---|---|---|
| Ubuntu 24.04 | ubuntu | ssh -i your-key.pem ubuntu@<instance-public-ip> |
First boot
On first boot a one-shot systemd service (tfserving-firstboot.service) generates a fresh per-instance nginx password and writes it to /root/tensorflow-serving-credentials.txt. The TensorFlow Serving container and nginx then start automatically once firstboot completes.
Retrieve your credentials:
sudo cat /root/tensorflow-serving-credentials.txt
Example output:
# TensorFlow Serving 2 -- Per-VM Credentials
# Generated: Wed May 27 22:15:49 UTC 2026
TFSERVING_VERSION=2.19.1
NGINX_USER=cloudimg
password=5802da51dc7fd2e3616d5a8072c2134feeab80f0eb3137f0
TFSERVING_REST_URL=http://32.198.94.22:8501
TFSERVING_GRPC_URL=32.198.94.22:8500
TFSERVING_REST_GATED_URL=http://32.198.94.22/v1
TFSERVING_SAMPLE_MODEL=half_plus_two
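If a script needs these values, the file can be read as simple KEY=value pairs. The following is a minimal sketch, assuming the format matches the example output above; the function name parse_credentials is hypothetical and not part of the image.

```python
def parse_credentials(text):
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    creds = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if sep:
            creds[key.strip()] = value.strip()
    return creds


# Usage (the file is root-only, so the script must run with sudo):
#   with open("/root/tensorflow-serving-credentials.txt") as fh:
#       creds = parse_credentials(fh.read())
#   user, password = creds["NGINX_USER"], creds["password"]
```

Note that the password key is lowercase in the example output, while the other keys are uppercase.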
Verifying the service
Check that the systemd unit and Docker container are running:
systemctl status tfserving.service
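Example output (Active: active (exited) is expected, because the unit runs docker compose up -d and then exits while the container keeps running):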
* tfserving.service - TensorFlow Serving 2 Model Server (cloudimg)
Loaded: loaded (/etc/systemd/system/tfserving.service; enabled; preset: enabled)
Active: active (exited) since Wed 2026-05-27 22:15:52 UTC; 18s ago
Process: 33573 ExecStart=/usr/bin/docker compose -f /opt/tfserving/docker-compose.yml up -d (code=exited, status=0/SUCCESS)
Main PID: 33573 (code=exited, status=0/SUCCESS)
CPU: 99ms
May 27 22:15:52 ip-172-31-85-204 systemd[1]: Starting tfserving.service ...
May 27 22:15:52 ip-172-31-85-204 docker[33587]: Container tfserving Started
May 27 22:15:52 ip-172-31-85-204 systemd[1]: Finished tfserving.service ...
Check the model server version:
docker exec tfserving tensorflow_model_server --version
Output:
TensorFlow ModelServer: 2.19.1-rc0
TensorFlow Library: 2.19.1
Querying the model server
Health check (basic-auth gated, port 80)
PASS=$(sudo awk -F= '/^password=/{print $2}' /root/tensorflow-serving-credentials.txt)
curl -s -u "cloudimg:${PASS}" http://127.0.0.1/v1/models/half_plus_two | python3 -m json.tool
Output:
{
"model_version_status": [
{
"version": "1",
"state": "AVAILABLE",
"status": {
"error_code": "OK",
"error_message": ""
}
}
]
}
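The same status check can be scripted, for example in a deployment health probe. This is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the base URL and response shape are assumed to match the curl example above.

```python
import base64
import json
import urllib.request


def basic_auth_header(user, password):
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def available_versions(status):
    """List the versions reported as AVAILABLE in a model status response."""
    return [
        entry["version"]
        for entry in status.get("model_version_status", [])
        if entry.get("state") == "AVAILABLE"
    ]


def fetch_model_status(base_url, model, user, password, timeout=5.0):
    """GET <base_url>/models/<model> through the nginx gateway and decode the JSON."""
    request = urllib.request.Request(
        f"{base_url}/models/{model}",
        headers={"Authorization": basic_auth_header(user, password)},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage, with PASS taken from the credentials file:
#   status = fetch_model_status("http://127.0.0.1/v1", "half_plus_two", "cloudimg", PASS)
#   ready = bool(available_versions(status))
```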
REST predict (half_plus_two: y = 0.5x + 2)
PASS=$(sudo awk -F= '/^password=/{print $2}' /root/tensorflow-serving-credentials.txt)
curl -s -u "cloudimg:${PASS}" \
-X POST -H 'Content-Type: application/json' \
-d '{"instances": [1.0, 2.0, 5.0]}' \
http://127.0.0.1/v1/models/half_plus_two:predict | python3 -m json.tool
Output:
{
"predictions": [
2.5,
3.0,
4.5
]
}
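For clients written in Python, the predict call above can be sketched with the standard library alone. The function names here are hypothetical; the URL, payload, and the y = 0.5x + 2 relationship are taken from the curl example.

```python
import base64
import json
import urllib.request


def build_predict_request(base_url, model, instances, user, password):
    """Build a POST request for <base_url>/models/<model>:predict with basic auth."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/models/{model}:predict",
        data=body,  # supplying data makes urllib send a POST
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def expected_half_plus_two(instances):
    """Values the sample model should return: y = 0.5x + 2."""
    return [0.5 * x + 2 for x in instances]


def predict(request, timeout=5.0):
    """Send the request and return the 'predictions' list from the response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["predictions"]


# Usage, with PASS taken from the credentials file:
#   req = build_predict_request("http://127.0.0.1/v1", "half_plus_two",
#                               [1.0, 2.0, 5.0], "cloudimg", PASS)
#   assert predict(req) == expected_half_plus_two([1.0, 2.0, 5.0])
```

Comparing the response against expected_half_plus_two gives a quick end-to-end smoke test of the gateway, the container, and the sample model.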
Raw REST endpoint (port 8501, unauthenticated)
Port 8501 is also published directly on the host for clients that need to bypass nginx:
curl -s http://127.0.0.1:8501/v1/models/half_plus_two | python3 -m json.tool
gRPC endpoint (port 8500, unauthenticated)
The gRPC endpoint is available on port 8500. TF Serving does not authenticate gRPC clients by default, so front it with an API gateway or service mesh for public-facing workloads.
Verify the port is open:
nc -zv 127.0.0.1 8500
Output:
Connection to 127.0.0.1 8500 port [tcp/*] succeeded!
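Where nc is not installed, the same reachability check can be sketched in Python. The helper name port_is_open is hypothetical. Like nc -z, it only confirms that the port accepts TCP connections, not that the gRPC service behind it is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Usage:
#   port_is_open("127.0.0.1", 8500)
```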
Serving your own SavedModel
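TF Serving expects each model in the layout <model_base>/<model_name>/<version>/, where <version> is a numeric directory that contains saved_model.pb and, for most models, a variables/ directory.
- Copy your SavedModel directory (not just the saved_model.pb file) to the data volume:
sudo mkdir -p /var/lib/tfserving/models/my_model/1/
sudo cp -r /path/to/saved_model/. /var/lib/tfserving/models/my_model/1/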
sudo chown -R root:root /var/lib/tfserving/models/my_model/
- Update the compose file to load your model name:
sudo sed -i 's/MODEL_NAME: half_plus_two/MODEL_NAME: my_model/' /opt/tfserving/docker-compose.yml
- Restart the stack:
sudo systemctl restart tfserving.service
- Query your model:
PASS=$(sudo awk -F= '/^password=/{print $2}' /root/tensorflow-serving-credentials.txt)
curl -s -u "cloudimg:${PASS}" http://127.0.0.1/v1/models/my_model | python3 -m json.tool
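Before restarting the stack, it can help to confirm that the copied files match the expected layout. The following is a minimal sketch, assuming the model directory is /var/lib/tfserving/models/my_model as in the steps above; the function name servable_versions is hypothetical, and it checks only for saved_model.pb inside numeric version directories.

```python
from pathlib import Path


def servable_versions(model_dir):
    """Return the numeric version directories under model_dir that contain saved_model.pb."""
    root = Path(model_dir)
    if not root.is_dir():
        return []
    versions = [
        int(child.name)
        for child in root.iterdir()
        if child.is_dir()
        and child.name.isascii()
        and child.name.isdigit()
        and (child / "saved_model.pb").is_file()
    ]
    return sorted(versions)


# Usage:
#   servable_versions("/var/lib/tfserving/models/my_model")
# An empty list means no version directory is laid out as described above.
```

By default TF Serving serves the highest version number it finds, so the last element of the returned list is the version that should show up in the status response.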
Model storage volume
Model files live on a dedicated 20 GiB gp3 EBS volume mounted at /var/lib/tfserving. This volume can be resized independently of the OS disk.
df -h /var/lib/tfserving
Output:
Filesystem Size Used Avail Use% Mounted on
/dev/nvme1n1 20G 64K 19G 1% /var/lib/tfserving
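To expand the volume, resize it in the EC2 console, confirm the device name with lsblk (NVMe device names can differ between instances), and then run sudo resize2fs /dev/nvme1n1 with the matching device.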
Managing the service
Start, stop, or restart TensorFlow Serving:
sudo systemctl start tfserving.service
sudo systemctl stop tfserving.service
sudo systemctl restart tfserving.service
View container logs:
docker logs tfserving --tail 50
Enabling TLS
For production use, terminate TLS at nginx with a certificate from Let's Encrypt or your own CA:
- Install certbot:
sudo apt install certbot python3-certbot-nginx
- Obtain a certificate:
sudo certbot --nginx -d your-domain.example.com
- Certbot will update the nginx site automatically.
Security recommendations
- Ports 8500 and 8501 are published unauthenticated on the host. Restrict the security group rules for those ports to trusted internal CIDRs, or remove them and route all traffic through nginx on port 80.
- The nginx basic-auth password is stored in /root/tensorflow-serving-credentials.txt (root-only). For production deployments, replace basic auth with a more robust mechanism such as mTLS, an OAuth2 proxy, or an API gateway.
Stopping and restarting on boot
TensorFlow Serving and nginx are enabled on boot via systemd. If you need to prevent the model server from starting on the next boot:
sudo systemctl disable tfserving.service
Support
This image is supported by cloudimg. For technical assistance, contact support@cloudimg.co.uk.