TensorFlow Serving on AWS User Guide
This guide walks through launching the cloudimg TensorFlow Serving AMI on AWS, retrieving the per-instance nginx basic-auth password, querying the bundled half_plus_two sample model, and swapping in your own TensorFlow SavedModel.
cloudimg's image bundles:
- Docker CE and the docker compose plugin (from Docker's official APT repository).
- The official tensorflow/serving:latest CPU image, preloaded into the local Docker cache so the customer's first boot never pulls from Docker Hub.
- Google's canonical half_plus_two SavedModel (the formula y = 0.5 * x + 2) at /var/lib/tfserving/models/half_plus_two/1/.
- An nginx basic-auth reverse proxy on port 80 that fronts the TF Serving REST API (/v1/) so the model server is never exposed unauthenticated on the public internet.
- A tfserving-firstboot.service systemd unit that rotates the basic-auth password on first boot and writes the credentials to /root/tensorflow-serving-credentials.txt.
Connecting to your instance
After launching the AMI, connect over SSH on port 22 as the default login user for your operating system variant:
| OS variant | SSH login user |
|---|---|
| Ubuntu 24.04 | ubuntu |
The security group must allow inbound TCP 22 (SSH) and 80 (basic-auth-gated REST API). Ports 8500 (gRPC) and 8501 (raw REST) are also published by the AMI for advanced use; restrict or remove those rules in production deployments and front everything through nginx.
ssh -i /path/to/your-key.pem ubuntu@<INSTANCE_PUBLIC_IP>
Retrieve the per-instance basic-auth password
On first boot the image generates a fresh per-instance basic-auth password and writes it (along with the resolved instance URL and a sample curl command) to /root/tensorflow-serving-credentials.txt, readable only by root:
sudo cat /root/tensorflow-serving-credentials.txt
Example output (the password is per-instance):
# TensorFlow Serving — Per-VM Credentials
TFSERVING_VERSION=latest
TFSERVING_REST_URL=http://<PUBLIC_IP>/v1
TFSERVING_GRPC_URL=<PUBLIC_IP>:8500
TFSERVING_SAMPLE_MODEL=half_plus_two
TFSERVING_SAMPLE_PREDICT_URL=http://<PUBLIC_IP>/v1/models/half_plus_two:predict
username=cloudimg
password=<TFSERVING_BASIC_AUTH_PASSWORD>
For shell pipelines, extract the password into a variable:
PASS=$(sudo awk -F= '/^password/{print $2}' /root/tensorflow-serving-credentials.txt)
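If a client script is written in Python rather than shell, the same file can be parsed as plain KEY=value lines. The following is a minimal sketch, assuming the file keeps the layout shown in the example output above; the function name parse_credentials is hypothetical and not part of the AMI.

```python
def parse_credentials(text):
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    creds = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # partition() splits on the first '=' only, so a password that
        # itself contains '=' is kept intact.
        key, sep, value = line.partition("=")
        if sep:
            creds[key.strip()] = value
    return creds
```

To use it, read /root/tensorflow-serving-credentials.txt as root and pass the contents in; the returned dictionary then exposes keys such as username, password, and TFSERVING_REST_URL.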
Check the bundled model is loaded
Query the TF Serving model status through the nginx basic-auth gateway on port 80:
curl -s -u "cloudimg:$PASS" http://127.0.0.1/v1/models/half_plus_two
You should see the bundled half_plus_two SavedModel reporting state=AVAILABLE:
{
"model_version_status": [
{
"version": "1",
"state": "AVAILABLE",
"status": {
"error_code": "OK",
"error_message": ""
}
}
]
}
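For automated health checks, the status document above can be inspected programmatically instead of by eye. This is a small sketch that only relies on the model_version_status, version, and state fields shown in the sample response; the helper name available_versions is an illustrative choice.

```python
import json


def available_versions(status_json):
    """Return the version strings whose state is AVAILABLE."""
    doc = json.loads(status_json)
    return [
        entry["version"]
        for entry in doc.get("model_version_status", [])
        if entry.get("state") == "AVAILABLE"
    ]
```

An empty list means no version of the model is currently serving, which a monitoring script could treat as a failure.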
If the same request is made without basic auth, nginx returns HTTP 401:
curl -s -o /dev/null -w 'HTTP %{http_code}\n' http://127.0.0.1/v1/models/half_plus_two
HTTP 401
Run a prediction against the bundled model
The bundled SavedModel implements y = 0.5 * x + 2. Send three inputs through the predict endpoint and TF Serving returns one prediction per input:
curl -s -u "cloudimg:$PASS" -X POST -H 'Content-Type: application/json' \
-d '{"instances":[1.0,2.0,5.0]}' \
http://127.0.0.1/v1/models/half_plus_two:predict
{
"predictions": [2.5, 3.0, 4.5]
}
The output [2.5, 3.0, 4.5] confirms TF Serving correctly evaluates 0.5 * 1.0 + 2 = 2.5, 0.5 * 2.0 + 2 = 3.0, and 0.5 * 5.0 + 2 = 4.5.
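The same predict call can be made from Python using only the standard library. The sketch below builds the basic-auth header by hand and posts the instances payload used in the curl example; build_predict_request and predict are hypothetical helper names, and the base URL and password are whatever your credentials file reports.

```python
import base64
import json
import urllib.request


def build_predict_request(base_url, model, instances, username, password):
    """Build a POST request for <base_url>/models/<model>:predict."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/models/{model}:predict",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def predict(request, timeout=10.0):
    """Send the request and return the 'predictions' list from the reply."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["predictions"]
```

Calling predict(build_predict_request("http://127.0.0.1/v1", "half_plus_two", [1.0, 2.0, 5.0], "cloudimg", password)) on the instance should return the same three predictions as the curl example. A missing or wrong password surfaces as urllib.error.HTTPError with code 401.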
Inspect the running container
The model server runs as a single tfserving Docker container managed by docker compose and supervised by tfserving.service:
sudo docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
NAMES STATUS PORTS
tfserving Up 18 seconds 0.0.0.0:8500-8501->8500-8501/tcp, [::]:8500-8501->8500-8501/tcp
8500 is the gRPC predict endpoint; 8501 is the raw REST predict endpoint published by the container. nginx on port 80 proxies /v1/ to 127.0.0.1:8501/v1/ and enforces basic auth.
The systemd units cloudimg ships:
systemctl is-active docker tfserving nginx tfserving-firstboot
active
active
active
active
Predict from a gRPC client (port 8500)
The container also publishes a gRPC predict endpoint on port 8500. TF Serving's gRPC API is unauthenticated by design — front it with a service mesh or restrict the security group rule to a private CIDR in production.
A minimal Python client uses the tensorflow-serving-api package (installable via pip on the client machine, not on the AMI itself):
import grpc
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc
import tensorflow as tf
channel = grpc.insecure_channel('<INSTANCE_PUBLIC_IP>:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
req = predict_pb2.PredictRequest()
req.model_spec.name = 'half_plus_two'
req.inputs['x'].CopyFrom(tf.make_tensor_proto([1.0, 2.0, 5.0], shape=[3]))
resp = stub.Predict(req, timeout=10.0)
print(list(resp.outputs['y'].float_val)) # [2.5, 3.0, 4.5]
Swap in your own TensorFlow SavedModel
TF Serving requires a strict directory layout: <model_base>/<model_name>/<version>/saved_model.pb. To add a new model named my_model:
sudo mkdir -p /var/lib/tfserving/models/my_model/1
sudo cp -r /path/to/your/exported/saved_model.pb /var/lib/tfserving/models/my_model/1/
sudo cp -r /path/to/your/exported/variables /var/lib/tfserving/models/my_model/1/
sudo chown -R root:root /var/lib/tfserving/models/my_model
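Before restarting the service, it can help to confirm that the copied files match the layout TF Serving expects. The following sketch, with the hypothetical helper name find_servable_versions, lists the numeric version directories under a model directory that actually contain a saved_model.pb.

```python
from pathlib import Path


def find_servable_versions(model_dir):
    """Return sorted numeric versions under model_dir that hold saved_model.pb."""
    versions = []
    for child in Path(model_dir).iterdir():
        # Only numerically named directories are treated as versions.
        if child.is_dir() and child.name.isdigit() and (child / "saved_model.pb").is_file():
            versions.append(int(child.name))
    return sorted(versions)
```

For the example above, find_servable_versions("/var/lib/tfserving/models/my_model") should report version 1 once the copy has succeeded; an empty list suggests the saved_model.pb file landed in the wrong directory.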
To serve multiple models, replace the single MODEL_NAME=half_plus_two environment variable in /opt/tfserving/docker-compose.yml with a --model_config_file argument pointing to a TF Serving models.config that lists every model name and base path. Apply the change with:
sudo systemctl restart tfserving.service
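A models.config file is written in protobuf text format as a model_config_list with one config entry per model. The sketch below renders such a file from a dictionary. The base paths must be the paths as seen inside the container; the /models/... values in the usage note are an assumption based on the upstream image's default and may differ from the bind-mount target in the AMI's docker-compose.yml, so check that file first. The function name render_models_config is hypothetical.

```python
def render_models_config(models):
    """Render a model_config_list in protobuf text format.

    models maps each model name to its base path inside the container.
    """
    lines = ["model_config_list {"]
    for name, base_path in models.items():
        lines += [
            "  config {",
            f'    name: "{name}"',
            f'    base_path: "{base_path}"',
            '    model_platform: "tensorflow"',
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"
```

For example, render_models_config({"half_plus_two": "/models/half_plus_two", "my_model": "/models/my_model"}) produces a two-entry file that can be saved and referenced by --model_config_file.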
Replace the basic-auth password or add users
The image generates a single basic-auth user (cloudimg) on first boot. To rotate the password or add additional users, edit /etc/nginx/.htpasswd directly with htpasswd:
sudo htpasswd /etc/nginx/.htpasswd cloudimg # rotate
sudo htpasswd /etc/nginx/.htpasswd analytics-team # add
sudo systemctl reload nginx.service
Then update /root/tensorflow-serving-credentials.txt to reflect the new password if you want the sample curl line to keep working.
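Updating the credentials file amounts to rewriting its password= line. A minimal sketch of that edit follows, assuming the file layout shown earlier in this guide; replace_password is a hypothetical helper, and it operates on text so the caller decides how to read and write the root-owned file.

```python
def replace_password(text, new_password):
    """Return text with its password= line set to new_password."""
    out = []
    replaced = False
    for line in text.splitlines():
        if line.startswith("password="):
            out.append(f"password={new_password}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        # Refuse to silently produce a file without a password entry.
        raise ValueError("no password= line found")
    return "\n".join(out) + "\n"
```

All other lines, including the comment header and the URL entries, pass through unchanged.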
Close the gRPC and raw REST ports for public deployments
For a public-facing deployment, narrow the security group to TCP 22 and 80 only. The gRPC port 8500 and the raw REST port 8501 remain bound on the host, but the security group blocks external clients from reaching them, so the basic-auth gateway on 80 becomes the only external path to the model server.
If you also want to terminate TLS, point nginx at a Let's Encrypt certificate. certbot is not preinstalled, but it is available from apt: once your DNS resolves to the instance, run sudo apt-get install -y certbot python3-certbot-nginx, then add a listen 443 ssl; block alongside the existing port-80 server in /etc/nginx/sites-available/tfserving.
Updating TF Serving
To move to a newer TF Serving release, pull the new image tag and recreate the container:
sudo docker pull tensorflow/serving:latest
sudo systemctl restart tfserving.service
The bundled half_plus_two SavedModel and any models you've added under /var/lib/tfserving/models/ survive the restart because they live on the host filesystem and are bind-mounted into the container.
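After a restart the container needs a moment to reload its models, so scripted upgrades may want to wait for the AVAILABLE state before sending traffic. The sketch below is one way to poll for it; wait_until_available is a hypothetical helper, and fetch_state is a caller-supplied function (for example, one that queries the model status endpoint and returns the state string).

```python
import time


def wait_until_available(fetch_state, timeout=60.0, interval=2.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_state() until it returns 'AVAILABLE' or timeout elapses."""
    deadline = clock() + timeout
    while True:
        try:
            if fetch_state() == "AVAILABLE":
                return True
        except OSError:
            # Connection errors are expected while the container restarts;
            # urllib.error.URLError is a subclass of OSError.
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```

The sleep and clock parameters exist only so the loop can be exercised in tests without real delays; the defaults are appropriate for normal use.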
Support
24/7 cloudimg technical support: open a ticket at https://www.cloudimg.co.uk/support/ for help with TF Serving deployment, model swaps, nginx hardening, TLS termination, or scaling the AMI behind an Application Load Balancer.