TensorFlow Serving on AWS User Guide

This guide walks through launching the cloudimg TensorFlow Serving AMI on AWS, retrieving the per-instance nginx basic-auth password, querying the bundled half_plus_two sample model, and swapping in your own TensorFlow SavedModel.

cloudimg's image bundles:

  • Docker CE and the docker compose plugin (from Docker's official APT repository).
  • The official tensorflow/serving:latest CPU image, preloaded into the local Docker cache so the first boot on the customer's instance never pulls from Docker Hub.
  • Google's canonical half_plus_two SavedModel (the formula y = 0.5 * x + 2) at /var/lib/tfserving/models/half_plus_two/1/.
  • An nginx basic-auth reverse proxy on port 80 that fronts the TF Serving REST API (/v1/) so the model server is never exposed unauthenticated on the public internet.
  • A tfserving-firstboot.service systemd unit that rotates the basic-auth password on first boot and writes the credentials to /root/tensorflow-serving-credentials.txt.

Connecting to your instance

After launching the AMI, connect over SSH on port 22 as the default login user for your operating system variant:

OS variant      SSH login user
Ubuntu 24.04    ubuntu

The security group must allow inbound TCP 22 (SSH) and 80 (basic-auth-gated REST API). Ports 8500 (gRPC) and 8501 (raw REST) are also published by the AMI for advanced use; restrict or remove those rules in production deployments and front everything through nginx.

ssh -i /path/to/your-key.pem ubuntu@<INSTANCE_PUBLIC_IP>

Retrieve the per-instance basic-auth password

On first boot the image generates a fresh per-instance basic-auth password and writes it (along with the resolved instance URL and a sample curl command) to /root/tensorflow-serving-credentials.txt, readable only by root:

sudo cat /root/tensorflow-serving-credentials.txt

Example output (the password is per-instance):

# TensorFlow Serving — Per-VM Credentials
TFSERVING_VERSION=latest
TFSERVING_REST_URL=http://<PUBLIC_IP>/v1
TFSERVING_GRPC_URL=<PUBLIC_IP>:8500
TFSERVING_SAMPLE_MODEL=half_plus_two
TFSERVING_SAMPLE_PREDICT_URL=http://<PUBLIC_IP>/v1/models/half_plus_two:predict
username=cloudimg
password=<TFSERVING_BASIC_AUTH_PASSWORD>

For shell pipelines, extract the password into a variable:

PASS=$(sudo sed -n 's/^password=//p' /root/tensorflow-serving-credentials.txt)
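
For clients that read the credentials from Python rather than from a shell, the same extraction can be sketched as follows. This is a minimal sketch that assumes the file keeps the KEY=value layout shown above; parse_credentials is a hypothetical helper name, not something the AMI ships, and the sample values are placeholders.

```python
def parse_credentials(text):
    """Parse KEY=value lines into a dict, skipping blank lines and # comments."""
    creds = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so a value containing '=' stays intact.
        key, value = line.split("=", 1)
        creds[key.strip()] = value
    return creds


# Placeholder content mirroring the layout of the credentials file.
sample = """# sample header
TFSERVING_SAMPLE_MODEL=half_plus_two
username=cloudimg
password=abc=123
"""
creds = parse_credentials(sample)
print(creds["username"], creds["password"])
```

On the instance itself the file is readable only by root, so a real script would need to run with root privileges to open it.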

Check the bundled model is loaded

Query the TF Serving model status through the nginx basic-auth gateway on port 80:

curl -s -u "cloudimg:$PASS" http://127.0.0.1/v1/models/half_plus_two

You should see the bundled half_plus_two SavedModel reporting state=AVAILABLE:

{
 "model_version_status": [
  {
   "version": "1",
   "state": "AVAILABLE",
   "status": {
    "error_code": "OK",
    "error_message": ""
   }
  }
 ]
}

If the same request is made without basic auth, nginx returns HTTP 401:

curl -s -o /dev/null -w 'HTTP %{http_code}\n' http://127.0.0.1/v1/models/half_plus_two
HTTP 401
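
In a deployment script, the status response shown earlier in this section can be reduced to a simple readiness check. The sketch below assumes the JSON shape from that example; is_model_available is a hypothetical helper name, and the sample response is a copy of the example rather than live output.

```python
import json


def is_model_available(status_json):
    """Return True if any reported model version is in state AVAILABLE."""
    status = json.loads(status_json)
    versions = status.get("model_version_status", [])
    return any(v.get("state") == "AVAILABLE" for v in versions)


# Sample response copied from the status example in this guide.
sample_response = """
{
 "model_version_status": [
  {
   "version": "1",
   "state": "AVAILABLE",
   "status": {"error_code": "OK", "error_message": ""}
  }
 ]
}
"""
print(is_model_available(sample_response))
```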

Run a prediction against the bundled model

The bundled SavedModel implements y = 0.5 * x + 2. Send three inputs through the predict endpoint and TF Serving returns one prediction per input:

curl -s -u "cloudimg:$PASS" -X POST -H 'Content-Type: application/json' \
  -d '{"instances":[1.0,2.0,5.0]}' \
  http://127.0.0.1/v1/models/half_plus_two:predict
{
    "predictions": [2.5, 3.0, 4.5
    ]
}

The output [2.5, 3.0, 4.5] confirms TF Serving correctly evaluates 0.5 * 1.0 + 2 = 2.5, 0.5 * 2.0 + 2 = 3.0, and 0.5 * 5.0 + 2 = 4.5.
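
The same predict call can be made from Python with only the standard library. The sketch below builds the request without sending it and includes a local reference implementation of the formula for comparison. build_predict_request and half_plus_two are hypothetical helper names, and the password is a placeholder.

```python
import base64
import json
import urllib.request


def build_predict_request(base_url, model, instances, username, password):
    """Build (but do not send) a basic-auth POST to the model's :predict endpoint."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/models/{model}:predict",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def half_plus_two(xs):
    """Local reference for the bundled model's formula, y = 0.5 * x + 2."""
    return [0.5 * x + 2 for x in xs]


req = build_predict_request(
    "http://127.0.0.1/v1", "half_plus_two", [1.0, 2.0, 5.0],
    "cloudimg", "example-password",  # placeholder password
)
print(req.full_url)
print(half_plus_two([1.0, 2.0, 5.0]))  # [2.5, 3.0, 4.5]
```

To send the request from the instance, pass req to urllib.request.urlopen with a timeout and decode the JSON body of the response.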

Inspect the running container

The model server runs as a single tfserving Docker container managed by docker compose and supervised by tfserving.service:

sudo docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
NAMES       STATUS          PORTS
tfserving   Up 18 seconds   0.0.0.0:8500-8501->8500-8501/tcp, [::]:8500-8501->8500-8501/tcp

8500 is the gRPC predict endpoint; 8501 is the raw REST predict endpoint published by the container. nginx on port 80 proxies /v1/ to 127.0.0.1:8501/v1/ and enforces basic auth.

The systemd units cloudimg ships:

systemctl is-active docker tfserving nginx tfserving-firstboot
active
active
active
active

Predict from a gRPC client (port 8500)

The container also publishes a gRPC predict endpoint on port 8500. TF Serving's gRPC API is unauthenticated by design — front it with a service mesh or restrict the security group rule to a private CIDR in production.

A minimal Python client uses the tensorflow-serving-api package, which pulls in grpcio and tensorflow as dependencies (install it with pip on the client machine, not on the AMI itself):

import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Plaintext channel: the gRPC port has no TLS or authentication.
channel = grpc.insecure_channel('<INSTANCE_PUBLIC_IP>:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Address the bundled model; 'x' and 'y' are its input and output tensor names.
req = predict_pb2.PredictRequest()
req.model_spec.name = 'half_plus_two'
req.inputs['x'].CopyFrom(tf.make_tensor_proto([1.0, 2.0, 5.0], shape=[3]))

resp = stub.Predict(req, timeout=10.0)
print(list(resp.outputs['y'].float_val))  # [2.5, 3.0, 4.5]

Swap in your own TensorFlow SavedModel

TF Serving requires a strict directory layout: <model_base>/<model_name>/<version>/saved_model.pb. To add a new model named my_model:

sudo mkdir -p /var/lib/tfserving/models/my_model/1
sudo cp -r /path/to/your/exported/saved_model.pb /var/lib/tfserving/models/my_model/1/
sudo cp -r /path/to/your/exported/variables    /var/lib/tfserving/models/my_model/1/
sudo chown -R root:root /var/lib/tfserving/models/my_model
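
Before restarting the server, it can help to confirm that the copied files match the layout TF Serving expects. The sketch below lists the version directories that have a numeric name and contain saved_model.pb. find_servable_versions is a hypothetical helper name, and the demo runs against a temporary directory rather than /var/lib/tfserving/models.

```python
import tempfile
from pathlib import Path


def find_servable_versions(model_dir):
    """Return sorted numeric version directories that contain saved_model.pb."""
    versions = []
    for child in Path(model_dir).iterdir():
        if child.is_dir() and child.name.isdigit() and (child / "saved_model.pb").is_file():
            versions.append(int(child.name))
    return sorted(versions)


# Demo against a throwaway directory, so nothing under /var/lib is touched.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "my_model"
    (model_dir / "1" / "variables").mkdir(parents=True)
    (model_dir / "1" / "saved_model.pb").write_bytes(b"")
    (model_dir / "draft").mkdir()  # ignored: not a numeric version
    found = find_servable_versions(model_dir)
print(found)  # [1]
```

On the instance, calling find_servable_versions("/var/lib/tfserving/models/my_model") should return [1] after the copy commands above.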

To serve my_model instead of the bundled sample, change the MODEL_NAME=half_plus_two environment variable in /opt/tfserving/docker-compose.yml to MODEL_NAME=my_model. To serve multiple models, replace that variable with a --model_config_file argument pointing to a TF Serving models.config that lists every model name and base path. Apply either change with:

sudo systemctl restart tfserving.service
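
A models.config file is a text-format protobuf with one config entry per model. The sketch below renders such a file for a list of model names. render_models_config is a hypothetical helper name, and the /models container path is an assumption: it is the default base path of the official tensorflow/serving image, but the bind-mount target in the cloudimg compose file may differ, so check /opt/tfserving/docker-compose.yml first.

```python
def render_models_config(model_names, container_base="/models"):
    """Render a TF Serving model_config_list with one entry per model name."""
    blocks = []
    for name in model_names:
        blocks.append(
            "  config {\n"
            f'    name: "{name}"\n'
            # Assumption: models are visible inside the container under container_base.
            f'    base_path: "{container_base}/{name}"\n'
            '    model_platform: "tensorflow"\n'
            "  }\n"
        )
    return "model_config_list {\n" + "".join(blocks) + "}\n"


config_text = render_models_config(["half_plus_two", "my_model"])
print(config_text)
```

The rendered text would be saved to a file that is visible inside the container, and that in-container path passed to --model_config_file.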

Replace the basic-auth password or add users

The image generates a single basic-auth user (cloudimg) on first boot. To rotate the password or add additional users, edit /etc/nginx/.htpasswd directly with htpasswd:

sudo htpasswd /etc/nginx/.htpasswd cloudimg          # rotate
sudo htpasswd /etc/nginx/.htpasswd analytics-team    # add
sudo systemctl reload nginx.service

Then update /root/tensorflow-serving-credentials.txt to reflect the new password if you want the sample curl line to keep working.

Close the gRPC and raw REST ports for public deployments

For a public-facing deployment, narrow the security group to TCP 22 and 80 only. The gRPC port 8500 and the raw REST port 8501 remain bound on the host, but no external client can reach them, so the basic-auth gateway on port 80 becomes the only path to the model server.
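
One way to confirm the security group change is to test TCP reachability from a machine outside the VPC. The sketch below is a plain connect test using the standard library. port_is_open is a hypothetical helper name, and the demo exercises it against a throwaway local listener rather than a real instance.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Local demo: listen on an ephemeral loopback port, probe it, close it, probe again.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]
open_while_listening = port_is_open("127.0.0.1", demo_port)
listener.close()
open_after_close = port_is_open("127.0.0.1", demo_port)
print(open_while_listening, open_after_close)
```

Run from an external machine against the instance's public IP, the check should succeed for port 80 and fail for ports 8500 and 8501 once the rules are removed.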

If you also want to terminate TLS, point nginx at a Let's Encrypt certificate. certbot is available from apt: once your DNS resolves to the instance, run sudo apt-get install -y certbot python3-certbot-nginx, then add a listen 443 ssl; block alongside the existing port-80 server in /etc/nginx/sites-available/tfserving and allow inbound TCP 443 in the security group.

Updating TF Serving

To move to a newer TF Serving release, pull the new image tag and recreate the container:

sudo docker pull tensorflow/serving:latest
sudo systemctl restart tfserving.service

The bundled half_plus_two SavedModel and any models you've added under /var/lib/tfserving/models/ survive the restart because they live on the host filesystem and are bind-mounted into the container.

Support

24/7 cloudimg technical support: open a ticket at https://www.cloudimg.co.uk/support/ for help with TF Serving deployment, model swaps, nginx hardening, TLS termination, or scaling the AMI behind an Application Load Balancer.