Apache Cassandra on AWS User Guide
Overview
This image runs Apache Cassandra, the open source distributed NoSQL database built for scalability and high availability with no single point of failure. Cassandra is the only workload on the image, so the platform stays lean, predictable and easy to reason about. The image provides the current stable line of Apache Cassandra, version 5.0, installed from the official Apache Cassandra package repository.
The image ships with password authentication and authorization switched on. The Cassandra configuration sets the PasswordAuthenticator, the CassandraAuthorizer and the CassandraRoleManager, so access to the database is controlled from the moment the node starts. On the first boot of your instance a one shot service generates a fresh, strong password for the cassandra superuser, unique to that instance, applies it to the database and writes it to /root/cassandra-credentials.txt, a file that only the root user can read. No shared or default database credentials ship in the image.
Cassandra data, the commit log and saved caches live on a dedicated storage volume mounted at /var/lib/cassandra. Keeping database files on their own volume means storage can be grown, snapshotted and backed up independently of the operating system disk. The node runs as a single node cluster by default; the final section of this guide explains how to add further nodes.
This is a headless image. Cassandra has no web interface; you administer it over SSH with the cqlsh CQL shell and the nodetool cluster utility, both covered below.
Prerequisites
Before you deploy this image you need:
- An Amazon Web Services account where you can launch EC2 instances
- IAM permissions to launch instances, create security groups, and subscribe to AWS Marketplace products
- An EC2 key pair in the target Region for SSH access to the instance
- A VPC and subnet in the target Region, with a security group allowing inbound port 22 from your management network
- The AWS CLI (version 2) installed locally if you plan to deploy from the command line
Recommended instance type: m5.large (2 vCPU, 8 GB RAM) or larger. Cassandra sizes its JVM heap from available memory and benefits from additional CPU and RAM for production workloads.
Step 1: Launch the Instance from the AWS Marketplace
Sign in to the AWS Management Console, open the EC2 service, and select Launch instance. Under Application and OS Images choose AWS Marketplace AMIs and search for Apache Cassandra. Select the cloudimg listing and choose Select, then Continue on the subscription summary.
Pick an instance type of m5.large or larger. Choose your EC2 key pair under Key pair (login). Under Network settings select your VPC and subnet, and either create or select a security group that allows inbound port 22 from your management network. Leave the root volume at the default size or larger; the Cassandra data volume is attached automatically from the image.
Select Launch instance. First boot initialisation, which generates the superuser password and starts Cassandra, takes a minute or two after the instance state becomes Running and the status checks pass.
Step 2: Launch the Instance from the AWS CLI
The following block launches an instance from the cloudimg Apache Cassandra Marketplace AMI into an existing subnet and security group. Replace <ami-id> with the AMI ID shown on the Marketplace listing, <key-name> with your EC2 key pair name, <subnet-id> with your subnet ID, and <security-group-id> with a security group that opens inbound port 22.
aws ec2 run-instances \
--image-id <ami-id> \
--instance-type m5.large \
--key-name <key-name> \
--subnet-id <subnet-id> \
--security-group-ids <security-group-id> \
--metadata-options HttpTokens=required \
--block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":30,"VolumeType":"gp3"}}]' \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=cassandra-01}]'
The command prints a JSON document on success. Note the instance ID, then retrieve its public address once it is running with aws ec2 describe-instances --instance-ids <instance-id> --query "Reservations[].Instances[].PublicIpAddress" --output text.
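The wait and the address lookup can be combined in one helper. This is a minimal sketch, assuming the AWS CLI is configured for the target Region; wait_and_get_ip is a hypothetical function name, not something shipped with the image or the AWS CLI.

```shell
# Sketch only: wait_and_get_ip is a hypothetical helper name.
# It blocks until the instance passes its EC2 status checks,
# then prints the public IP address.
wait_and_get_ip() {
  local instance_id="$1"
  # "aws ec2 wait instance-status-ok" polls until the status checks report ok.
  aws ec2 wait instance-status-ok --instance-ids "$instance_id" || return 1
  aws ec2 describe-instances \
    --instance-ids "$instance_id" \
    --query "Reservations[].Instances[].PublicIpAddress" \
    --output text
}

# Example (replace the placeholder with the instance ID from run-instances):
#   wait_and_get_ip <instance-id>
```

Waiting on the status checks rather than on the running state matches the advice in step 3 to connect only after the checks have passed.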
Step 3: Connect over SSH
Connect over SSH with the key pair you selected and the public IP address from step 2. The SSH login user depends on the operating system of the AMI variant you launched:
| AMI variant | SSH login user |
|---|---|
| Apache Cassandra 5.0 on Ubuntu 24.04 | ubuntu |
ssh <login-user>@<public-ip>
Wait until the instance has passed both EC2 status checks before connecting. The first boot service runs before the SSH daemon is ready, so Cassandra is initialised by the time you can log in.
Step 4: Retrieve the Generated Superuser Password
The first boot service generates a fresh cassandra superuser password for this instance and writes it, with the connection details, to /root/cassandra-credentials.txt. The file is readable only by the root user. Display it from your SSH session:
sudo cat /root/cassandra-credentials.txt
The file looks like this, with a unique password on your instance:
# Apache Cassandra 5.0 — Per-Instance Credentials
# Generated on first boot: Thu May 21 14:05:56 UTC 2026
#
# Open the CQL shell with:
# cqlsh-cloudimg -u cassandra -p '<password below>'
#
CASSANDRA_USER=cassandra
CASSANDRA_PASSWORD=<your generated password>
CQL_HOST=127.0.0.1
CQL_PORT=9042
CLUSTER_NAME=cloudimg-cassandra
The default cassandra / cassandra account that ships with Apache Cassandra is rotated away during first boot, so it no longer works. Use the generated password for every connection.
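If you script against the node, you may prefer to read the password from the file instead of copying it by hand. The sketch below assumes the KEY=VALUE layout shown above; read_credential is a hypothetical helper name. It reads the file content on standard input, so sudo is only needed for the cat.

```shell
# Sketch only: read_credential is a hypothetical helper name.
# It prints the value of one KEY=VALUE entry read from standard input.
read_credential() {
  local key="$1"
  # Strip "KEY=" from the matching line and keep the rest verbatim,
  # so a password that itself contains "=" is preserved.
  sed -n "s/^${key}=//p" | head -n 1
}

# Example:
#   password="$(sudo cat /root/cassandra-credentials.txt | read_credential CASSANDRA_PASSWORD)"
#   cqlsh-cloudimg -u cassandra -p "$password"
```

Extracting the value with sed, rather than sourcing the file, avoids the shell interpreting any special characters in the generated password.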
Step 5: Confirm the Service and the Listener
Cassandra runs under systemd as the cassandra service and starts automatically on boot. Confirm it is active:
systemctl is-active cassandra
The command prints active. Next, confirm the CQL native transport is listening. Cassandra binds to the loopback address 127.0.0.1 on port 9042 by default, so it is reachable only from the instance itself:
ss -tln | grep 9042
You should see a listening socket on 127.0.0.1:9042.
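In a script, the same check can be made exact and repeated until the listener appears. This is a rough sketch; is_listening and wait_for_cql are hypothetical helper names, and the retry count and two second interval are arbitrary choices.

```shell
# Sketch only: both function names are hypothetical.
# is_listening reads "ss -tln" output on standard input and succeeds
# when the given address:port appears in the Local Address column.
is_listening() {
  awk -v addr="$1" '$4 == addr { found = 1 } END { exit !found }'
}

# wait_for_cql polls until the CQL port is listening or the tries run out.
wait_for_cql() {
  local tries="${1:-30}"
  while [ "$tries" -gt 0 ]; do
    ss -tln | is_listening 127.0.0.1:9042 && return 0
    tries=$((tries - 1))
    sleep 2
  done
  return 1
}

# Example:
#   wait_for_cql 30 && echo "CQL native transport is up"
```

Matching the whole address:port field avoids the false positives a plain grep for 9042 could produce on other ports or queue sizes that contain those digits.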
Step 6: Connect with the cqlsh CQL Shell
The image provides cqlsh-cloudimg, a wrapper that launches the Cassandra CQL shell against the local node. Open an interactive session with the generated superuser password. Replace <password> with the value from /root/cassandra-credentials.txt:
cqlsh-cloudimg -u cassandra -p '<password>'
You can also run a single statement without entering the interactive shell by adding -e. The screenshot below shows a complete session that reads the credentials file, opens cqlsh, creates a keyspace and a table, inserts two rows and reads them back.

To verify the connection non interactively from your SSH session, query the server version. Replace <password> with your generated password:
cqlsh-cloudimg -u cassandra -p '<password>' \
-e "SELECT cluster_name, release_version, cql_version FROM system.local;"
The query returns the cluster name, the Cassandra release and the CQL specification version:
cluster_name | release_version | cql_version
--------------------+-----------------+-------------
cloudimg-cassandra | 5.0.8 | 3.4.7
(1 rows)
Step 7: Create a Keyspace and a Table
A keyspace is the top level container for data in Cassandra and defines the replication strategy. The following statements, run inside cqlsh, create a keyspace, switch to it, create a table, insert a row and query it. They mirror the session in the screenshot above.
CREATE KEYSPACE store WITH replication = {
'class': 'SimpleStrategy',
'replication_factor': 1
};
USE store;
CREATE TABLE products (
id uuid PRIMARY KEY,
name text,
price decimal
);
INSERT INTO products (id, name, price)
VALUES (uuid(), 'Mechanical Keyboard', 89.00);
SELECT name, price FROM products;
For a single node image the SimpleStrategy class with a replication_factor of 1 is appropriate. When you add nodes, raise the replication factor and consider NetworkTopologyStrategy so that copies of the data are spread across the cluster. Type EXIT to leave the cqlsh shell.
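When you later move a keyspace to NetworkTopologyStrategy, the statement can be generated rather than typed. The sketch below only prints the CQL; nts_alter_statement is a hypothetical helper name, and datacenter1 in the example is an assumption, so use the datacenter name that nodetool status reports for your cluster.

```shell
# Sketch only: nts_alter_statement is a hypothetical helper name.
# It prints an ALTER KEYSPACE statement that switches a keyspace to
# NetworkTopologyStrategy with the given replication factor in one datacenter.
nts_alter_statement() {
  local keyspace="$1" datacenter="$2" rf="$3"
  printf "ALTER KEYSPACE %s WITH replication = {'class': 'NetworkTopologyStrategy', '%s': %s};\n" \
    "$keyspace" "$datacenter" "$rf"
}

# Example (datacenter1 is an assumed name; check nodetool status for yours):
#   cqlsh-cloudimg -u cassandra -p '<password>' -e "$(nts_alter_statement store datacenter1 3)"
```

Changing the replication settings does not move existing data by itself; running a repair on the keyspace afterwards is the usual way to bring the new replicas up to date.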
Step 8: Check Node Health with nodetool
nodetool is the cluster administration utility. nodetool status reports the state of every node. On this single node image one node is listed, and the UN prefix means it is Up and in a Normal state.
nodetool status

nodetool info reports detailed health for the local node, including the JVM heap in use, the uptime, the data load and whether gossip and the native transport are active.
nodetool info

nodetool describecluster summarises the cluster as a whole, listing the snitch, the partitioner and the keyspaces with their replication settings:
nodetool describecluster
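For monitoring, the two letter state prefix in nodetool status can be counted with a small filter. This is a sketch under the assumption that each node line begins with its state code, as in the UN example above; count_nodes_in_state is a hypothetical helper name.

```shell
# Sketch only: count_nodes_in_state is a hypothetical helper name.
# It reads "nodetool status" output on standard input and prints how many
# node lines start with the given two letter state code, for example UN.
count_nodes_in_state() {
  awk -v s="$1" '$1 == s { n++ } END { print n + 0 }'
}

# Example:
#   up="$(nodetool status | count_nodes_in_state UN)"
#   echo "nodes up and normal: $up"
```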
Step 9: The Cassandra Data Volume
Cassandra data, the commit log and saved caches are stored under /var/lib/cassandra, which is a dedicated EBS volume separate from the operating system disk. Confirm the mount:
findmnt /var/lib/cassandra
The output shows /var/lib/cassandra is its own ext4 filesystem on a separate device:
TARGET SOURCE FSTYPE OPTIONS
/var/lib/cassandra /dev/nvme1n1 ext4 rw,relatime
Because the data directory is on its own volume you can take an Amazon EBS snapshot of it on its own schedule, and you can grow it independently of the root volume. Check the available space at any time with:
df -h /var/lib/cassandra
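A scheduled check can warn you before the data volume fills. The following is a minimal sketch; usage_percent and check_disk are hypothetical helper names, and the 80 percent default threshold is an arbitrary assumption that you should adjust for your workload.

```shell
# Sketch only: both function names and the 80 percent default are assumptions.
# usage_percent prints the used capacity of the filesystem holding a path,
# as a bare integer, using the portable "df -P" layout.
usage_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_disk returns non-zero when usage is at or above the limit.
check_disk() {
  local path="$1" limit="${2:-80}" pct
  pct="$(usage_percent "$path")"
  if [ "$pct" -ge "$limit" ]; then
    echo "WARNING: $path is ${pct}% full (limit ${limit}%)"
    return 1
  fi
  echo "OK: $path is ${pct}% full"
}

# Example:
#   check_disk /var/lib/cassandra 80
```

Cassandra needs free space for compaction, so leaving generous headroom on this volume is safer than running it close to full.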
Step 10: Managing the Cassandra Service
Cassandra is managed through systemd. The service starts automatically on boot.
Check the service status:
systemctl status cassandra --no-pager
Stop, start and restart the service when needed:
sudo systemctl stop cassandra
sudo systemctl start cassandra
sudo systemctl restart cassandra
Before stopping a Cassandra node it is good practice to drain it, which flushes memtables to disk and stops accepting writes:
nodetool drain
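The drain and the stop can be chained so that the service is only stopped once the drain has succeeded. This is a small sketch; stop_cassandra_cleanly is a hypothetical helper name.

```shell
# Sketch only: stop_cassandra_cleanly is a hypothetical helper name.
# It drains the node first and stops the service only if the drain succeeded.
stop_cassandra_cleanly() {
  nodetool drain || {
    echo "drain failed; leaving cassandra running" >&2
    return 1
  }
  sudo systemctl stop cassandra
}

# Example:
#   stop_cassandra_cleanly && echo "cassandra stopped"
```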
The Cassandra log files are written under /var/log/cassandra. The main log is /var/log/cassandra/system.log; review it first when diagnosing a startup or runtime problem:
sudo tail -f /var/log/cassandra/system.log
Step 11: Backups
The data to back up on a Cassandra instance is the contents of the data directory. Cassandra creates a consistent, hard linked snapshot of every keyspace with nodetool snapshot:
nodetool snapshot
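The snapshot files are written under /var/lib/cassandra/data/<keyspace>/<table>/snapshots/. Archive them and copy the archive to durable storage, for example an Amazon S3 bucket. The following command archives the whole data directory, snapshots included, to a local file: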
sudo tar -czf /var/backups/cassandra-snapshot-$(date +%F).tgz -C /var/lib/cassandra/data .
Because /var/lib/cassandra is a dedicated EBS volume, you can also take an EBS snapshot of the volume itself for a point in time copy of all Cassandra data. Remove a snapshot you no longer need with nodetool clearsnapshot -t <tag>, or pass --all to remove every snapshot.
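To archive only the snapshot files rather than the whole data directory, you can name the snapshot with nodetool snapshot -t and collect just the directories that carry that name. The sketch below assumes GNU find and GNU tar, as found on Ubuntu; archive_snapshot, the nightly tag and the bucket placeholder are all hypothetical.

```shell
# Sketch only: archive_snapshot is a hypothetical helper name.
# It archives every snapshots/<tag> directory under the data directory
# into one compressed tar file. Pass an absolute path for the output file.
archive_snapshot() {
  local data_dir="$1" tag="$2" out="$3"
  # List matching snapshot directories relative to the data directory,
  # NUL separated so unusual names are handled safely, and feed them to tar.
  ( cd "$data_dir" && find . -type d -path "*/snapshots/$tag" -print0 ) |
    tar -C "$data_dir" -czf "$out" --null -T -
}

# Example (the tag and the bucket name are placeholders):
#   nodetool snapshot -t nightly
#   sudo bash -c "$(declare -f archive_snapshot); archive_snapshot /var/lib/cassandra/data nightly /var/backups/cassandra-nightly.tgz"
#   aws s3 cp /var/backups/cassandra-nightly.tgz s3://<bucket-name>/
```

Live SSTables and snapshots taken under other names are left out, which keeps the archive smaller and limited to one consistent set of files.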
Step 12: Adding Nodes for a Multi Node Cluster
This image runs Cassandra as a single node cluster, which suits development, testing and smaller production workloads. Cassandra is designed to scale horizontally by adding nodes. To build a multi node cluster:
- Launch additional instances from this same AMI, one per node, in the same VPC
- Allow inbound TCP port 7000 between the nodes for internode communication, and port 9042 from your application tier
- On every node, edit /etc/cassandra/cassandra.yaml and set listen_address and rpc_address to the node's own private IP address, and set the seeds value under seed_provider to the private IP addresses of one or two seed nodes
- Raise the replication_factor of your keyspaces, and consider NetworkTopologyStrategy, so that data is replicated across the cluster
- Restart Cassandra on each node and confirm with nodetool status that every node reaches the UN state
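The cassandra.yaml changes in the list above can be scripted. This is a rough sketch that assumes listen_address, rpc_address and the seeds entry each appear once, uncommented, in the file, as they do in the stock Apache layout; set_node_addresses is a hypothetical helper name and the IP addresses in the example are placeholders. Back up the file before editing it.

```shell
# Sketch only: set_node_addresses is a hypothetical helper name.
# It rewrites listen_address, rpc_address and the seeds list in place.
# Requires GNU sed for the -i option.
set_node_addresses() {
  local yaml="$1" ip="$2" seeds="$3"
  sed -i \
    -e "s/^listen_address:.*/listen_address: ${ip}/" \
    -e "s/^rpc_address:.*/rpc_address: ${ip}/" \
    -e "s/^\([[:space:]]*- seeds:\).*/\1 \"${seeds}\"/" \
    "$yaml"
}

# Example (placeholder addresses; run on each node with its own private IP):
#   sudo cp /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
#   sudo bash -c "$(declare -f set_node_addresses); set_node_addresses /etc/cassandra/cassandra.yaml 10.0.1.10 '10.0.1.10:7000,10.0.1.11:7000'"
#   sudo systemctl restart cassandra
```

Every node should use the same seeds value, while listen_address and rpc_address differ per node.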
The Apache Cassandra documentation at https://cassandra.apache.org/doc/latest/ covers cluster topology, replication, repair and performance tuning in depth.
Support
cloudimg provides 24/7/365 expert technical support for this image, with a guaranteed response within 24 hours and a one hour average response for critical issues. Contact support@cloudimg.co.uk.
For general Apache Cassandra administration questions consult the official documentation at https://cassandra.apache.org/doc/latest/.