
Apache Cassandra on AWS User Guide

Product: Apache Cassandra on AWS

Overview

This image runs Apache Cassandra, the open source distributed NoSQL database built for scalability and high availability with no single point of failure. Cassandra is the only workload on the image, so the platform stays lean, predictable and easy to reason about. The image provides the current stable line of Apache Cassandra, version 5.0, installed from the official Apache Cassandra package repository.

The image ships with password authentication and authorization switched on. The Cassandra configuration sets the PasswordAuthenticator, the CassandraAuthorizer and the CassandraRoleManager, so access to the database is controlled from the moment the node starts. On the first boot of your instance a one shot service generates a fresh, strong password for the cassandra superuser, unique to that instance, applies it to the database and writes it to /root/cassandra-credentials.txt, a file that only the root user can read. No shared or default database credentials ship in the image.

Cassandra data, the commit log and saved caches live on a dedicated storage volume mounted at /var/lib/cassandra. Keeping database files on their own volume means storage can be grown, snapshotted and backed up independently of the operating system disk. The node runs as a single node cluster by default; the final section of this guide explains how to add further nodes.

This is a headless image. Cassandra has no web interface; you administer it over SSH with the cqlsh CQL shell and the nodetool cluster utility, both covered below.

Prerequisites

Before you deploy this image you need:

  • An Amazon Web Services account where you can launch EC2 instances
  • IAM permissions to launch instances, create security groups, and subscribe to AWS Marketplace products
  • An EC2 key pair in the target Region for SSH access to the instance
  • A VPC and subnet in the target Region, with a security group allowing inbound port 22 from your management network
  • The AWS CLI (version 2) installed locally if you plan to deploy from the command line

Recommended instance type: m5.large (2 vCPU, 8 GB RAM) or larger. Cassandra sizes its JVM heap from available memory and benefits from additional CPU and RAM for production workloads.

Step 1: Launch the Instance from the AWS Marketplace

Sign in to the AWS Management Console, open the EC2 service, and select Launch instance. Under Application and OS Images choose AWS Marketplace AMIs and search for Apache Cassandra. Select the cloudimg listing and choose Select, then Continue on the subscription summary.

Pick an instance type of m5.large or larger. Choose your EC2 key pair under Key pair (login). Under Network settings select your VPC and subnet, and either create or select a security group that allows inbound port 22 from your management network. Leave the root volume at the default size or larger; the Cassandra data volume is attached automatically from the image.

Select Launch instance. First boot initialisation, which generates the superuser password and starts Cassandra, takes a minute or two after the instance state becomes Running and the status checks pass.

Step 2: Launch the Instance from the AWS CLI

The following block launches an instance from the cloudimg Apache Cassandra Marketplace AMI into an existing subnet and security group. Replace <ami-id> with the AMI ID shown on the Marketplace listing, <key-name> with your EC2 key pair name, <subnet-id> with your subnet ID, and <security-group-id> with a security group that opens inbound port 22.

aws ec2 run-instances \
  --image-id <ami-id> \
  --instance-type m5.large \
  --key-name <key-name> \
  --subnet-id <subnet-id> \
  --security-group-ids <security-group-id> \
  --metadata-options HttpTokens=required \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":30,"VolumeType":"gp3"}}]' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=cassandra-01}]'

The command prints a JSON document on success. Note the instance ID, then retrieve its public address once the instance is running:

aws ec2 describe-instances --instance-ids <instance-id> \
  --query "Reservations[].Instances[].PublicIpAddress" --output text
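If you script the launch, the AWS CLI waiters can block until the instance is ready before you look up its address. The following is a minimal sketch rather than part of the image; the function name wait_for_instance is an assumption chosen for illustration.

```shell
# Block until the instance is running and has passed its status checks,
# then print its public IP address.
# wait_for_instance is a hypothetical helper name, not an AWS CLI command.
wait_for_instance() {
  # $1 = instance ID returned by run-instances
  aws ec2 wait instance-running --instance-ids "$1" &&
    aws ec2 wait instance-status-ok --instance-ids "$1" &&
    aws ec2 describe-instances --instance-ids "$1" \
      --query "Reservations[].Instances[].PublicIpAddress" --output text
}

# Usage: wait_for_instance <instance-id>
```

Chaining the calls with && means the address lookup only runs once both waiters have succeeded.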

Step 3: Connect over SSH

Connect over SSH with the key pair you selected and the public IP address from step 2. The SSH login user depends on the operating system of the AMI variant you launched:

AMI variant                             SSH login user
Apache Cassandra 5.0 on Ubuntu 24.04    ubuntu

ssh <login-user>@<public-ip>

Wait until the instance has passed both EC2 status checks before connecting. The first boot service runs before the SSH daemon is ready, so Cassandra is initialised by the time you can log in.

Step 4: Retrieve the Generated Superuser Password

The first boot service generates a fresh cassandra superuser password for this instance and writes it, with the connection details, to /root/cassandra-credentials.txt. The file is readable only by the root user. Display it from your SSH session:

sudo cat /root/cassandra-credentials.txt

The file looks like this, with a unique password on your instance:

# Apache Cassandra 5.0 — Per-Instance Credentials
# Generated on first boot: Thu May 21 14:05:56 UTC 2026
#
# Open the CQL shell with:
#   cqlsh-cloudimg -u cassandra -p '<password below>'
#
CASSANDRA_USER=cassandra
CASSANDRA_PASSWORD=<your generated password>
CQL_HOST=127.0.0.1
CQL_PORT=9042
CLUSTER_NAME=cloudimg-cassandra

The default cassandra / cassandra account that ships with Apache Cassandra is rotated away during first boot, so it no longer works. Use the generated password for every connection.
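Because the credentials file uses plain KEY=VALUE lines, a script can pick out a single value instead of copying the password by hand. The sketch below is one possible approach; cred_value is a hypothetical helper name and is not provided by the image.

```shell
# Print the value of one KEY=VALUE entry read from standard input.
# cred_value is a hypothetical helper name, not something the image provides.
cred_value() {
  # $1 = key name, for example CASSANDRA_PASSWORD
  sed -n "s/^$1=//p" | head -n 1
}

# Usage (the file is readable only by root, hence sudo):
#   CASSANDRA_PASSWORD=$(sudo cat /root/cassandra-credentials.txt | cred_value CASSANDRA_PASSWORD)
#   cqlsh-cloudimg -u cassandra -p "$CASSANDRA_PASSWORD"
```

Comment lines in the file start with # and so never match the key pattern. Only the text up to the first = is stripped, so a password that itself contains = is returned intact.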

Step 5: Confirm the Service and the Listener

Cassandra runs under systemd as the cassandra service and starts automatically on boot. Confirm it is active:

systemctl is-active cassandra

The command prints active. Next, confirm the CQL native transport is listening. Cassandra binds to the loopback address 127.0.0.1 on port 9042 by default, so it is reachable only from the instance itself:

ss -tln | grep 9042

You should see a listening socket on 127.0.0.1:9042.
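In an automated check the listener may not be up yet when the script runs, so it can help to poll for it. The following generic retry loop is a sketch under that assumption; retry_until is a hypothetical name, and the attempt count and delay in the usage comment are arbitrary.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# retry_until is a hypothetical helper name.
retry_until() {
  # $1 = maximum attempts, $2 = seconds between attempts, rest = command
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Sleep only when another attempt will follow.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage: wait up to 30 attempts, 5 seconds apart, for the CQL port.
#   retry_until 30 5 sh -c 'ss -tln | grep -q ":9042"'
```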

Step 6: Connect with the cqlsh CQL Shell

The image provides cqlsh-cloudimg, a wrapper that launches the Cassandra CQL shell against the local node. Open an interactive session with the generated superuser password. Replace <password> with the value from /root/cassandra-credentials.txt:

cqlsh-cloudimg -u cassandra -p '<password>'

You can also run a single statement without entering the interactive shell by adding -e. The screenshot below shows a complete session that reads the credentials file, opens cqlsh, creates a keyspace and a table, inserts two rows and reads them back.

[Screenshot: a cqlsh session creating a keyspace and table, inserting rows and reading them back]

To verify the connection non interactively from your SSH session, query the server version. Replace <password> with your generated password:

cqlsh-cloudimg -u cassandra -p '<password>' \
  -e "SELECT cluster_name, release_version, cql_version FROM system.local;"

The query returns the cluster name, the Cassandra release and the CQL specification version:

 cluster_name       | release_version | cql_version
--------------------+-----------------+-------------
 cloudimg-cassandra |           5.0.8 |       3.4.7

(1 rows)

Step 7: Create a Keyspace and a Table

A keyspace is the top level container for data in Cassandra and defines the replication strategy. The following statements, run inside cqlsh, create a keyspace, switch to it, create a table, insert a row and query it. They mirror the session in the screenshot above.

CREATE KEYSPACE store WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': 1
};

USE store;

CREATE TABLE products (
  id uuid PRIMARY KEY,
  name text,
  price decimal
);

INSERT INTO products (id, name, price)
  VALUES (uuid(), 'Mechanical Keyboard', 89.00);

SELECT name, price FROM products;

For a single node image the SimpleStrategy class with a replication_factor of 1 is appropriate. When you add nodes, raise the replication factor and consider NetworkTopologyStrategy so that copies of the data are spread across the cluster. Type EXIT to leave the cqlsh shell.
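When you later move a keyspace to NetworkTopologyStrategy, the change is a single ALTER KEYSPACE statement. The sketch below only prints that statement so you can review it first. The function name is hypothetical, and the datacenter name datacenter1 in the usage comment is an assumption; use the name that nodetool status reports on its Datacenter line.

```shell
# Print an ALTER KEYSPACE statement that switches a keyspace to
# NetworkTopologyStrategy. alter_replication_cql is a hypothetical name.
alter_replication_cql() {
  # $1 = keyspace, $2 = datacenter name, $3 = replication factor
  printf "ALTER KEYSPACE %s WITH replication = {'class': 'NetworkTopologyStrategy', '%s': %s};\n" "$1" "$2" "$3"
}

# Usage (datacenter1 is an assumed datacenter name):
#   cqlsh-cloudimg -u cassandra -p '<password>' -e "$(alter_replication_cql store datacenter1 3)"
```

After raising the replication factor, existing data is only copied to the new replicas once a repair has run, for example with nodetool repair.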

Step 8: Check Node Health with nodetool

nodetool is the cluster administration utility. nodetool status reports the state of every node. On this single node image one node is listed, and the UN prefix means it is Up and in a Normal state.

nodetool status

[Screenshot: nodetool status showing the node Up and in a Normal state]
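For a scripted health check, the two letter code at the start of each node line can be tested directly. The sketch below assumes the usual nodetool status layout, in which node lines begin with U or D followed by N, L, J or M; all_nodes_un is a hypothetical helper name.

```shell
# Read `nodetool status` output on standard input and succeed only when
# at least one node is listed and every listed node is in the UN state.
# all_nodes_un is a hypothetical helper name.
all_nodes_un() {
  awk '/^[UD][NLJM] / { total++; if ($1 != "UN") bad++ }
       END { exit !(total > 0 && bad == 0) }'
}

# Usage:
#   nodetool status | all_nodes_un && echo "all nodes Up and Normal"
```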

nodetool info reports detailed health for the local node, including the JVM heap in use, the uptime, the data load and whether gossip and the native transport are active.

nodetool info

[Screenshot: nodetool info reporting node health, heap memory and uptime]

nodetool describecluster summarises the cluster as a whole, listing the snitch, the partitioner and the keyspaces with their replication settings:

nodetool describecluster

Step 9: The Cassandra Data Volume

Cassandra data, the commit log and saved caches are stored under /var/lib/cassandra, which is a dedicated EBS volume separate from the operating system disk. Confirm the mount:

findmnt /var/lib/cassandra

The output shows /var/lib/cassandra is its own ext4 filesystem on a separate device:

TARGET             SOURCE       FSTYPE OPTIONS
/var/lib/cassandra /dev/nvme1n1 ext4   rw,relatime

Because the data directory is on its own volume you can take an Amazon EBS snapshot of it on its own schedule, and you can grow it independently of the root volume. Check the available space at any time with:

df -h /var/lib/cassandra
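Compaction needs free space on the data volume, so it can be useful to alert before the volume fills. The following is a rough sketch that parses the portable df -P format; both function names are hypothetical and the 80 percent threshold in the usage comment is an arbitrary example, not a recommendation from this guide.

```shell
# Read `df -P` output on standard input and print the Capacity value of
# the last filesystem listed, without the percent sign.
usage_percent() {
  awk 'NR > 1 { gsub(/%/, "", $5); pct = $5 } END { print pct }'
}

# Succeed when usage is at or above the threshold given in $1.
# usage_percent and volume_is_full are hypothetical helper names.
volume_is_full() {
  [ "$(usage_percent)" -ge "$1" ]
}

# Usage (80 is an example threshold):
#   df -P /var/lib/cassandra | volume_is_full 80 && echo "data volume is filling up"
```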

Step 10: Managing the Cassandra Service

Cassandra is managed through systemd. The service starts automatically on boot.

Check the service status:

systemctl status cassandra --no-pager

Stop, start and restart the service when needed:

sudo systemctl stop cassandra
sudo systemctl start cassandra
sudo systemctl restart cassandra

Before stopping a Cassandra node it is good practice to drain it, which flushes memtables to disk and stops accepting writes:

nodetool drain
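The drain and stop steps can be combined so that the service is stopped only when the drain has succeeded. This is a minimal sketch; drain_and_stop is a hypothetical helper name.

```shell
# Drain the local node, then stop the service only if the drain succeeded.
# drain_and_stop is a hypothetical helper name.
drain_and_stop() {
  nodetool drain && sudo systemctl stop cassandra
}

# Usage: drain_and_stop
```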

The Cassandra log files are written under /var/log/cassandra. The main log is /var/log/cassandra/system.log; review it first when diagnosing a startup or runtime problem:

sudo tail -f /var/log/cassandra/system.log

Step 11: Backups

The data to back up on a Cassandra instance is the contents of the data directory. Cassandra creates a consistent, hard linked snapshot of every keyspace with nodetool snapshot:

nodetool snapshot

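The snapshot files are written under /var/lib/cassandra/data/<keyspace>/<table>/snapshots/. Archive them and copy the archive to durable storage, for example an Amazon S3 bucket. The following command archives the whole data directory, snapshots included, to a local file under /var/backups: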

sudo tar -czf /var/backups/cassandra-snapshot-$(date +%F).tgz -C /var/lib/cassandra/data .

Because /var/lib/cassandra is a dedicated EBS volume, you can also take an EBS snapshot of the volume itself for a point in time copy of all Cassandra data. Remove a snapshot you no longer need with nodetool clearsnapshot -t <snapshot-name>, or remove every snapshot with nodetool clearsnapshot --all.
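If you give each snapshot a tag with nodetool snapshot -t, you can archive only that snapshot's directories rather than the whole data directory. The following sketch illustrates the idea. Both function names are hypothetical, and it assumes GNU tar and that the current user can list the data directory.

```shell
# List the snapshot directories that belong to one snapshot tag.
# snapshot_dirs and archive_snapshot are hypothetical helper names.
snapshot_dirs() {
  # $1 = Cassandra data directory, $2 = snapshot tag
  find "$1" -type d -path "*/snapshots/$2" | sort
}

# Take a tagged snapshot and archive only that tag's directories.
# Assumes GNU tar (for -T -), which reads the paths to archive from stdin.
archive_snapshot() {
  # $1 = snapshot tag, $2 = path of the archive to create
  nodetool snapshot -t "$1" &&
    snapshot_dirs /var/lib/cassandra/data "$1" | sudo tar -czf "$2" -T -
}

# Usage: archive_snapshot <tag> /var/backups/cassandra-<tag>.tgz
```

The resulting archive can then be copied to a bucket with aws s3 cp <archive> s3://<bucket>/, where <bucket> is a bucket you own.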

Step 12: Adding Nodes for a Multi Node Cluster

This image runs Cassandra as a single node cluster, which suits development, testing and smaller production workloads. Cassandra is designed to scale horizontally by adding nodes. To build a multi node cluster:

  • Launch additional instances from this same AMI, one per node, in the same VPC
  • Allow inbound TCP port 7000 between the nodes for internode communication, and port 9042 from your application tier
  • On every node, edit /etc/cassandra/cassandra.yaml and set listen_address and rpc_address to the node's own private IP address, and set the seeds value under seed_provider to the private IP addresses of one or two seed nodes
  • Raise the replication_factor of your keyspaces, and consider NetworkTopologyStrategy, so that data is replicated across the cluster
  • Restart Cassandra on each node and confirm with nodetool status that every node reaches the UN state
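The internode rule from the list above can be added from the AWS CLI. The sketch below assumes that every Cassandra node is a member of the same security group, so the group is named as its own traffic source; allow_internode is a hypothetical helper name.

```shell
# Allow TCP port 7000 between all members of one security group.
# allow_internode is a hypothetical helper name; it assumes every
# Cassandra node belongs to the security group passed as $1.
allow_internode() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" --protocol tcp --port 7000 --source-group "$1"
}

# Usage: allow_internode <security-group-id>
```

Referencing the group itself as the source avoids listing each node's private IP address and keeps working as nodes are added.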

The Apache Cassandra documentation at https://cassandra.apache.org/doc/latest/ covers cluster topology, replication, repair and performance tuning in depth.

Support

cloudimg provides 24/7/365 expert technical support for this image. A response is guaranteed within 24 hours, with a one hour average for critical issues. Contact support@cloudimg.co.uk.

For general Apache Cassandra administration questions consult the official documentation at https://cassandra.apache.org/doc/latest/.