Apache Flink on AWS User Guide
Overview
Apache Flink is an open source framework and distributed processing engine for stateful computations over unbounded and bounded data streams. This image runs Flink as a single node standalone cluster: a JobManager and a TaskManager run together on one instance, each as its own systemd service, ready for development, testing, single tenant production streaming, and proof of concept workloads.
The Flink distribution is installed at /opt/flink (a symlink to the versioned /opt/flink-2.2.1 directory). A dedicated unprivileged flink system user owns the install and runs both services. Flink's runtime working directories (the checkpoint and savepoint directories, the web upload directory, the temporary directories, and the logs) live on a separate EBS data volume mounted at /opt/flink-data, so the working set can be resized independently of the operating system disk.
The JobManager Web Dashboard, which is also the Flink REST API, listens on port 8081. Apache Flink has no built in user accounts or password authentication, so there is no per instance secret to retrieve. On the first boot of every deployed instance a one shot service records the dashboard address and the cluster paths to an information file at /stage/scripts/flink-info.log and confirms the cluster is up. Because Flink itself is unauthenticated, the security responsibility is yours: restrict the port 8081 security group rule to trusted networks, or place the dashboard behind a TLS terminating load balancer.
Prerequisites
Before you deploy this image you need:
- An Amazon Web Services account where you can launch EC2 instances
- IAM permissions to launch instances, create security groups, and subscribe to AWS Marketplace products
- An EC2 key pair in the target Region for SSH access to the instance
- A VPC and subnet in the target Region, with a security group allowing inbound port 22 from your management network and inbound port 8081 from the trusted networks that need the Flink Web Dashboard
- The AWS CLI (version 2) installed locally if you plan to deploy from the command line
Step 1: Launch the Instance from the AWS Marketplace
Sign in to the AWS Management Console, open the EC2 service, and select Launch instance. Under Application and OS Images choose AWS Marketplace AMIs and search for Apache Flink. Select the cloudimg listing and choose Select, then Continue on the subscription summary.
Pick an instance type of m5.large or larger — stream processing is memory and CPU sensitive. Choose your EC2 key pair under Key pair (login). Under Network settings select your VPC and subnet, and either create or select a security group that allows inbound port 22 from your management network and inbound port 8081 from the trusted networks that need the Web Dashboard. Leave the root volume at the default size or larger.
Select Launch instance. First boot initialisation takes well under a minute after the instance state becomes Running and the status checks pass.
Step 2: Launch the Instance from the AWS CLI
The following block launches an instance from the cloudimg Apache Flink Marketplace AMI into an existing subnet and security group. Replace <ami-id> with the AMI ID shown on the Marketplace listing, <key-name> with your EC2 key pair name, <subnet-id> with your subnet ID, and <security-group-id> with a security group that opens ports 22 and 8081 as described above.
aws ec2 run-instances \
--image-id <ami-id> \
--instance-type m5.large \
--key-name <key-name> \
--subnet-id <subnet-id> \
--security-group-ids <security-group-id> \
--block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":30,"VolumeType":"gp3"}}]' \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=apache-flink-01}]'
The command prints a JSON document on success. Note the instance ID, then retrieve its public address once it is running with aws ec2 describe-instances --instance-ids <instance-id> --query "Reservations[].Instances[].PublicIpAddress" --output text.
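Instead of re-running describe-instances by hand, the AWS CLI waiters can block until the instance is running and its status checks pass. The following is a minimal sketch of that approach; the function name wait_for_public_ip is illustrative and not part of the image.

```shell
# Sketch: block until the instance is usable, then print its public IP.
# wait_for_public_ip is a hypothetical helper name.
wait_for_public_ip() {
  local instance_id="$1"
  # Each waiter polls EC2 until the condition holds or the waiter gives up.
  aws ec2 wait instance-running --instance-ids "$instance_id" || return 1
  aws ec2 wait instance-status-ok --instance-ids "$instance_id" || return 1
  aws ec2 describe-instances \
    --instance-ids "$instance_id" \
    --query "Reservations[].Instances[].PublicIpAddress" \
    --output text
}

# Usage: wait_for_public_ip <instance-id>
```

If either waiter fails or times out, the function returns a non-zero status and prints no address.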
Step 3: Connect and Review the Cluster Information File
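Connect over SSH with the key pair you selected and the instance's public IP address, which is shown in the EC2 console and can also be retrieved with the describe-instances command in Step 2. The SSH login user depends on the operating system of the AMI variant you launched: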
| AMI variant | SSH login user |
|---|---|
| Apache Flink 2 on Ubuntu 24.04 | ubuntu |
The first boot service runs before the SSH daemon becomes ready, so the cluster information file is always in place when you log in for the first time.
sudo cat /stage/scripts/flink-info.log
The file is plain text. It records the dashboard URL, the REST API endpoint, the Flink home directory, the data volume mount point, and the names of the two systemd services. Apache Flink has no built in passwords, so this file holds no secret — it is an operator reference.
Step 4: Check the Flink Services and Version
The JobManager and the TaskManager each run as a systemd service. Confirm both are active:
sudo systemctl is-active flink-jobmanager.service flink-taskmanager.service
Both lines report active. Check the installed Flink release with the command line client:
cd /tmp && sudo -u flink /opt/flink/bin/flink --version
The Java runtime that Flink runs on is OpenJDK 17:
java -version
Step 5: REST API Health
The JobManager REST API is the programmatic face of the cluster. The overview endpoint reports the cluster health in one call:
curl -fsS http://127.0.0.1:8081/overview
The JSON response returns flink-version, the count of registered TaskManagers, the total and available task slots, and counters for running, finished, cancelled, and failed jobs. The registered TaskManagers are listed in detail at the /taskmanagers endpoint:
curl -fsS http://127.0.0.1:8081/taskmanagers | head -c 320
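For scripted health checks it can help to reduce the /overview response to a yes or no answer. The sketch below extracts integer fields with grep rather than a JSON parser, on the assumption that jq may not be installed; overview_field and cluster_ready are hypothetical helper names, and the field names taskmanagers and slots-available are those returned by the overview endpoint.

```shell
# Sketch: pull one integer field out of the /overview JSON without jq.
# overview_field is a hypothetical helper; it assumes the value is a bare
# integer, which holds for the TaskManager, slot, and job counters.
overview_field() {
  # $1 = field name; the JSON document is read from stdin
  grep -oE "\"$1\":[[:space:]]*[0-9]+" | head -n 1 | grep -oE '[0-9]+$'
}

# Succeeds only when at least one TaskManager is registered and at least
# one task slot is free.
cluster_ready() {
  local json tms free
  json=$(curl -fsS -m 5 http://127.0.0.1:8081/overview) || return 1
  tms=$(printf '%s' "$json" | overview_field taskmanagers)
  free=$(printf '%s' "$json" | overview_field slots-available)
  [ "${tms:-0}" -ge 1 ] && [ "${free:-0}" -ge 1 ]
}

# Usage: cluster_ready && echo "cluster can accept a job"
```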
Step 6: The Flink Web Dashboard — Cluster Overview
Open a web browser and navigate to http://<public-ip>:8081/. The Web Dashboard opens on the cluster Overview, which shows the available task slots, the running job count, the list of running jobs and the list of completed jobs, with the Flink version and commit in the header.

Because Apache Flink has no authentication, anyone who can reach port 8081 has full control of the cluster, including the ability to submit and cancel jobs. Keep the port 8081 security group rule restricted to trusted networks, and read Step 12 before exposing the dashboard more widely.
Step 7: Task Managers
The Task Managers page in the left navigation lists every registered TaskManager with its path and ID, data port, last heartbeat, total and free task slots, CPU cores and memory. On the cloudimg image you start with one TaskManager exposing two task slots.

Selecting a TaskManager opens its detail view with per TaskManager metrics, logs, and thread dump. You scale the cluster out by adding more TaskManager processes on additional instances and pointing them at this JobManager — see Step 11.
Step 8: Submit a Job from the Command Line
The Flink command line client at /opt/flink/bin/flink submits, lists and cancels jobs. The Flink distribution ships a set of example jobs under /opt/flink/examples. The streaming TopSpeedWindowing example is a continuous job that is useful for confirming the cluster end to end. Submit it in detached mode so the client returns immediately:
cd /tmp && sudo -u flink /opt/flink/bin/flink run -d /opt/flink/examples/streaming/TopSpeedWindowing.jar
The client prints the assigned JobID. List the running jobs to confirm it was accepted:
cd /tmp && sudo -u flink /opt/flink/bin/flink list
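When the submission is part of a script, it is convenient to capture the JobID rather than read it from the terminal. This is a minimal sketch that assumes the client reports the ID in the form "Job has been submitted with JobID <id>"; submit_detached is a hypothetical helper name.

```shell
# Sketch: submit a JAR in detached mode and print only the JobID.
# Assumes the client output contains "JobID <32 hex characters>".
submit_detached() {
  local jar="$1" out
  out=$(cd /tmp && sudo -u flink /opt/flink/bin/flink run -d "$jar" 2>&1) || {
    # Surface the client's own error text when the submission fails.
    printf '%s\n' "$out" >&2
    return 1
  }
  printf '%s\n' "$out" | grep -oE 'JobID [0-9a-f]{32}' | head -n 1 | cut -d' ' -f2
}

# Usage:
#   jid=$(submit_detached /opt/flink/examples/streaming/TopSpeedWindowing.jar)
#   echo "submitted job $jid"
```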
Step 9: Watch a Running Job in the Dashboard
Back in the Web Dashboard, select Jobs then Running Jobs and open the job you submitted. The job view shows the job graph — the streaming dataflow as a directed graph of operators — along with the job state, the start time and duration, and a per subtask metrics table with records and bytes flowing through each operator.

The Submit New Job page in the left navigation does the same thing as the command line client through the browser: upload a job JAR, set the entry class, the program arguments and the parallelism, and submit. Job submission and cancellation through the dashboard are both enabled on this image.
Step 10: Cancel a Job
Cancel a running job from the command line by passing its JobID to flink cancel. The example below cancels every job currently running, so use it only when you intend to stop all of them. Replace the loop with a single flink cancel <job-id> to stop one specific job.
cd /tmp
for jid in $(sudo -u flink /opt/flink/bin/flink list -r 2>/dev/null | grep -oE '[0-9a-f]{32}'); do
sudo -u flink /opt/flink/bin/flink cancel "$jid"
done
A job can also be cancelled from its job view in the Web Dashboard with the Cancel Job control in the top right.
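The same cancellation is available over the REST API, which is useful from automation that has no Flink client installed. The sketch below sends a PATCH request to the job resource with mode=cancel; cancel_job_rest is a hypothetical helper name, and the address assumes the call is made on the instance itself.

```shell
# Sketch: cancel one job through the REST API instead of the CLI.
cancel_job_rest() {
  local job_id="$1"
  # Refuse anything that is not a 32 character hex JobID.
  printf '%s' "$job_id" | grep -qE '^[0-9a-f]{32}$' || {
    echo "not a valid JobID: $job_id" >&2
    return 2
  }
  curl -fsS -m 10 -X PATCH "http://127.0.0.1:8081/jobs/${job_id}?mode=cancel"
}

# Usage: cancel_job_rest <job-id>
```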
Step 11: Scale Out and Tune the Cluster
The cluster configuration is /opt/flink/conf/config.yaml. After editing it, restart the services so the change takes effect:
sudo systemctl restart flink-jobmanager.service flink-taskmanager.service
The defaults on this image are sized for a modest instance:
jobmanager.memory.process.size: 1600m
taskmanager.memory.process.size: 2048m
taskmanager.numberOfTaskSlots: 2
parallelism.default: 1
For a larger instance type, raise taskmanager.memory.process.size and taskmanager.numberOfTaskSlots to match the available memory and vCPUs. To scale beyond one instance, launch further instances, install Flink, and start additional TaskManager processes pointed at this JobManager's RPC address — the cluster then has more task slots and can run jobs at higher parallelism. For production workloads, run the JobManager in a highly available configuration with multiple JobManagers and a durable metadata store, as described in the official Flink documentation.
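As a starting point for resizing, the sketch below prints candidate values from the instance's vCPU count and memory. The heuristic is an assumption made for illustration, not a rule from cloudimg or from the Flink project: one task slot per vCPU, and TaskManager memory equal to total memory minus the JobManager's 1600m and a hypothetical 1024 MiB of operating system headroom. Treat the output as a first guess to validate under load.

```shell
# Sketch of a sizing heuristic (an assumption, not an official rule):
#   slots     = number of vCPUs
#   TM memory = total memory - JobManager process size - OS headroom
suggest_taskmanager_settings() {
  local vcpus="$1" total_mib="$2"
  local jm_mib=1600        # the image default for jobmanager.memory.process.size
  local headroom_mib=1024  # hypothetical allowance for the OS and page cache
  local tm_mib=$(( total_mib - jm_mib - headroom_mib ))
  [ "$tm_mib" -ge 1024 ] || {
    echo "instance too small for this heuristic" >&2
    return 1
  }
  printf 'taskmanager.memory.process.size: %sm\n' "$tm_mib"
  printf 'taskmanager.numberOfTaskSlots: %s\n' "$vcpus"
}

# Usage on the instance itself:
#   suggest_taskmanager_settings "$(nproc)" \
#     "$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)"
```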
Step 12: Secure the Dashboard Port
Apache Flink does not authenticate Web Dashboard or REST API requests, so the network is your security boundary. Take at least one of these measures before the instance is reachable from any untrusted network:
- Keep the port 8081 inbound rule in the instance's security group restricted to specific trusted CIDR ranges, never 0.0.0.0/0
- Place the dashboard behind an Application Load Balancer that terminates TLS and is itself protected, for example by AWS WAF or by an authentication layer
- Reach the dashboard over an SSH tunnel and leave port 8081 closed to the internet entirely:
ssh -L 8081:127.0.0.1:8081 <login-user>@<public-ip>
With the tunnel open, browse to http://127.0.0.1:8081/ on your workstation.
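The first measure in the list can be applied from the AWS CLI. The sketch below removes a world-open rule for port 8081 if one exists and then authorises a single trusted range; restrict_dashboard_port is a hypothetical helper name, and the security group ID and CIDR are values you supply.

```shell
# Sketch: replace a world-open 8081 rule with one limited to a trusted CIDR.
restrict_dashboard_port() {
  local sg_id="$1" trusted_cidr="$2"
  [ "$trusted_cidr" != "0.0.0.0/0" ] || {
    echo "refusing to open port 8081 to the world" >&2
    return 2
  }
  # Remove the open rule if present; ignore the error when it does not exist.
  aws ec2 revoke-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port 8081 --cidr 0.0.0.0/0 >/dev/null 2>&1 || true
  aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port 8081 --cidr "$trusted_cidr"
}

# Usage: restrict_dashboard_port <security-group-id> 203.0.113.0/24
```

The range 203.0.113.0/24 in the usage line is a documentation placeholder; substitute your own management network.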
Step 13: Checkpoints and Savepoints
Flink's fault tolerance is built on checkpoints (periodic snapshots of running job state) and savepoints (durable snapshots that you trigger explicitly for upgrades and migrations). On this image both directories live on the dedicated data volume, under /opt/flink-data/checkpoints and /opt/flink-data/savepoints.
The example below picks the first running job from the REST API and triggers a savepoint for it, or prints a message when nothing is running:
JOB_ID=$(curl -fsS -m 5 http://127.0.0.1:8081/jobs | grep -oE '"id":"[0-9a-f]{32}","status":"RUNNING"' | head -1 | cut -d'"' -f4)
if [ -n "$JOB_ID" ]; then cd /tmp && sudo -u flink /opt/flink/bin/flink savepoint "$JOB_ID"; else echo "no running jobs to snapshot"; fi
Step 14: Backups and Maintenance
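The state that matters on a Flink instance is its savepoints, plus any job JARs and configuration you have added. The cluster configuration lives at /opt/flink/conf/config.yaml rather than on the data volume, so copy that file separately if you have edited it. Archive the data volume directory and copy the archive to durable off instance storage: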
sudo tar -czf /var/backups/flink-data-$(date +%F).tgz -C /opt flink-data
Ship the archive to an Amazon S3 bucket or another object store, and synchronise the savepoints directory on a schedule for off instance retention.
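A minimal sketch of that shipping step with the AWS CLI follows. The bucket name is a placeholder you supply, ship_flink_backups is a hypothetical helper name, and the sketch assumes the instance has credentials that can write to the bucket and that it runs as a user able to read the savepoints directory.

```shell
# Sketch: copy one archive and mirror the savepoints directory to S3.
ship_flink_backups() {
  local bucket="$1" archive="$2"
  # Upload the dated archive first; stop if that fails.
  aws s3 cp "$archive" "s3://${bucket}/archives/" || return 1
  # sync only transfers files that are new or changed since the last run.
  aws s3 sync /opt/flink-data/savepoints "s3://${bucket}/savepoints/"
}

# Usage:
#   ship_flink_backups <bucket-name> "/var/backups/flink-data-$(date +%F).tgz"
```

Calling the function from a scheduled job gives the periodic synchronisation described above.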
For kernel and package updates, Ubuntu's unattended-upgrades is enabled by default, so operating system security patches apply automatically. To move to a newer Flink release, download the new binary distribution from https://flink.apache.org/, take a savepoint of each running job, stop the services, swap the /opt/flink symlink to the new versioned directory, carry your configuration across, restart the services, and resume each job from its savepoint.
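The savepoint and resume steps of an upgrade can be sketched with the command line client as follows. Note the swap: instead of the separate savepoint command from Step 13, this uses flink stop, which takes a savepoint and then stops the job in one operation, and it may not suit every job. Both function names are hypothetical, and the savepoint path passed to the second function is whatever path the stop command reports.

```shell
# Sketch: stop a job with a savepoint written to the data volume.
stop_with_savepoint() {
  local job_id="$1"
  (cd /tmp && sudo -u flink /opt/flink/bin/flink stop \
    --savepointPath /opt/flink-data/savepoints "$job_id")
}

# Sketch: resubmit a job JAR in detached mode, restoring from a savepoint.
resume_from_savepoint() {
  local savepoint_path="$1" jar="$2"
  (cd /tmp && sudo -u flink /opt/flink/bin/flink run -d -s "$savepoint_path" "$jar")
}

# Usage:
#   stop_with_savepoint <job-id>            # note the savepoint path it prints
#   ...upgrade Flink and restart the services...
#   resume_from_savepoint <savepoint-path> <path-to-job-jar>
```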
Step 15: Logs and Troubleshooting
The JobManager and the TaskManager log to the systemd journal and to log files on the data volume:
sudo journalctl -u flink-jobmanager.service --no-pager -n 100
sudo journalctl -u flink-taskmanager.service --no-pager -n 100
sudo ls -1 /opt/flink-data/log/
If a job fails, its exceptions are shown on the Exceptions tab of the job view in the Web Dashboard. If the TaskManager does not appear on the Task Managers page, confirm flink-taskmanager.service is active and check its journal for connection errors to the JobManager.
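To scan the log files for problems without opening each one, a small filter over the log directory is often enough. The sketch below assumes the files end in .log and that each line carries its level as a space-delimited word such as ERROR or WARN; recent_flink_problems is a hypothetical helper name.

```shell
# Sketch: print the most recent ERROR and WARN lines from the Flink logs.
recent_flink_problems() {
  local log_dir="${1:-/opt/flink-data/log}" count="${2:-20}"
  # Expand the glob inside the root shell, since the directory may not be
  # readable by the login user. Errors from an empty glob are suppressed.
  sudo sh -c 'grep -hE " (ERROR|WARN) " "$1"/*.log 2>/dev/null' _ "$log_dir" |
    tail -n "$count"
}

# Usage: recent_flink_problems            # last 20 matches, default directory
#        recent_flink_problems /opt/flink-data/log 50
```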
Support
cloudimg provides 24/7/365 expert technical support for this image. Responses are guaranteed within 24 hours, with a one hour average for critical issues. Contact support@cloudimg.co.uk.
For general Apache Flink questions consult the official documentation at https://flink.apache.org/ and the project community resources linked from there.