Apache Cassandra User Guide

Product: Apache Cassandra

Overview

This guide covers the deployment and configuration of Apache Cassandra on Linux using cloudimg AMIs from the AWS Marketplace. Apache Cassandra is a highly scalable, distributed NoSQL database designed for handling large volumes of data across multiple servers with no single point of failure.

What's included in this AMI:

  • Apache Cassandra 4.0.4 with CQL native transport on port 9042
  • Java runtime environment
  • Dedicated data volume at /var/lib/cassandra
  • Systemd service for automatic startup on boot
  • OS package update script for keeping the system current
  • AWS CLI v2 for AWS service integration
  • Systems Manager Agent (SSM) for remote management
  • CloudWatch Agent for monitoring
  • Latest security patches applied at build time
  • 24/7 cloudimg support with a guaranteed 24-hour response SLA

Prerequisites

Before launching this AMI, ensure you have:

  1. An active AWS account
  2. An active subscription to the Apache Cassandra listing on AWS Marketplace
  3. An EC2 key pair for SSH access
  4. Familiarity with EC2 instance management and SSH

Recommended Instance Type: m5.large (2 vCPU, 8 GB RAM) or larger. The minimum requirements are 1 vCPU, 1 GB RAM, and 20 GB disk space, but Cassandra benefits significantly from additional memory and CPU for production workloads.

Step 1: Launch the AMI

  1. Navigate to the AWS Marketplace and search for "Apache Cassandra cloudimg"
  2. Click Continue to Subscribe, accept the terms, then Continue to Configuration
  3. Select your preferred Region and Software Version
  4. Click Continue to Launch
  5. Choose Launch through EC2 for full control over instance configuration
  6. Select your instance type (m5.large recommended)
  7. Configure storage: use at least 20 GB gp3 for the root volume, and consider larger volumes for production data
  8. Configure your Security Group with the following inbound rules:
Port  Protocol  Source    Purpose
22    TCP       Your IP   SSH access
7199  TCP       VPC CIDR  JMX monitoring port
7000  TCP       VPC CIDR  Internode communication
9042  TCP       Your IP   CQL native transport port

Important: Ports 7000 and 7199 should only be accessible within your VPC for cluster communication and monitoring. Port 9042 should be restricted to trusted application servers or your IP.

  9. Select your EC2 key pair and launch the instance
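The inbound rules from step 8 can also be added from your workstation with the AWS CLI. This is a minimal sketch that assumes an existing security group; the function name, group ID, and CIDR ranges are hypothetical placeholders, and it opens 7000 and 7199 to the VPC range only, in line with the note above.

```bash
#!/usr/bin/env bash
# Sketch: add the step 8 inbound rules to an existing security group.
# Set AWS=echo for a dry run that only prints the commands.
AWS="${AWS:-aws}"

open_cassandra_ports() {
  local sg_id="$1" my_cidr="$2" vpc_cidr="$3" port
  # SSH (22) and CQL (9042): your workstation or trusted application servers only
  for port in 22 9042; do
    "$AWS" ec2 authorize-security-group-ingress --group-id "$sg_id" \
      --protocol tcp --port "$port" --cidr "$my_cidr" || return 1
  done
  # JMX (7199) and internode (7000): addresses inside your VPC only
  for port in 7199 7000; do
    "$AWS" ec2 authorize-security-group-ingress --group-id "$sg_id" \
      --protocol tcp --port "$port" --cidr "$vpc_cidr" || return 1
  done
}
```

Example call with hypothetical values: open_cassandra_ports sg-0123456789abcdef0 203.0.113.25/32 10.0.0.0/16. Setting AWS=echo first prints the four commands instead of running them.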

Step 2: Connect via SSH

Once your instance is running and has passed both status checks (2/2), connect using SSH:

ssh -i your-key.pem ec2-user@<public-ip-address>

To switch to the root user:

sudo su -

Important: Wait for the EC2 instance to reach 2/2 successful status checks before connecting. Early connection attempts may produce permission denied errors.
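If the AWS CLI is configured on your workstation, its waiters can block until both checks pass before you connect. A minimal sketch: system-status-ok and instance-status-ok each wait on one of the two checks (each gives up after roughly ten minutes of polling). The function name and instance ID are hypothetical, and AWS=echo turns it into a dry run.

```bash
#!/usr/bin/env bash
AWS="${AWS:-aws}"

# Block until both EC2 status checks report "ok" (the 2/2 shown in the console).
wait_for_status_checks() {
  local instance_id="$1"
  "$AWS" ec2 wait system-status-ok --instance-ids "$instance_id" &&
    "$AWS" ec2 wait instance-status-ok --instance-ids "$instance_id"
}
```

Example with a hypothetical instance ID: wait_for_status_checks i-0123456789abcdef0 && ssh -i your-key.pem ec2-user@<public-ip-address>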

Step 3: Verify Cassandra is Running

Check the Cassandra node status:

nodetool status

Expected output:

Datacenter: datacenter1
========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  69.05 KiB  16      100.0%            bb9d3091-c1e0-4ed2-92b4-2897099aba7d  rack1

The UN status indicates the node is Up and in a Normal state.
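To script this check (for example as a health probe), you can confirm that every node line reports UN. A minimal sketch that parses the nodetool status text layout shown above; the function name is hypothetical, and the parsing assumes node lines begin with a two-letter status/state code.

```bash
#!/usr/bin/env bash
# Reads `nodetool status` output on stdin.
# Exits 0 only if at least one node is listed and every node is UN.
all_nodes_up() {
  awk '
    # Node lines begin with a status/state code such as UN, DN, UJ or UL
    /^[UD][NLJM] / { nodes++; if ($1 != "UN") bad++ }
    END { exit (nodes == 0 || bad > 0) }
  '
}
```

Usage: nodetool status | all_nodes_up && echo "all nodes UN". If nodetool cannot connect, no node lines reach the function and it exits non-zero.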

Step 4: Connect with CQL Shell

Launch the CQL interactive shell:

cqlsh

Expected output:

Connected to Test Cluster at 127.0.0.1:9042
[cqlsh 6.0.0 | Cassandra 4.0.4 | CQL spec 3.4.5 | Native protocol v5]
Use HELP for help.
cqlsh>

Step 5: Create a Keyspace and Table

Create a keyspace:

CREATE KEYSPACE my_app WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': 1
};

Use the keyspace:

USE my_app;

Create a table:

CREATE TABLE users (
  user_id UUID PRIMARY KEY,
  name text,
  email text,
  created_at timestamp
);

Insert data:

INSERT INTO users (user_id, name, email, created_at)
VALUES (uuid(), 'John Doe', 'john@example.com', toTimestamp(now()));

Query data:

SELECT * FROM users;

Type EXIT to leave the CQL shell.
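If you would rather script the schema from this step, the statements can be written to a file and applied with cqlsh -f. A minimal sketch: the helper name write_schema and the file path in the usage line are hypothetical, the table name is qualified with the keyspace so no USE statement is needed, and IF NOT EXISTS is added so re-running the file is harmless.

```bash
#!/usr/bin/env bash
# Writes the keyspace and table definitions to the file given as $1.
write_schema() {
  local out="$1"
  cat > "$out" <<'CQL'
CREATE KEYSPACE IF NOT EXISTS my_app WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': 1
};
CREATE TABLE IF NOT EXISTS my_app.users (
  user_id UUID PRIMARY KEY,
  name text,
  email text,
  created_at timestamp
);
CQL
}
```

Usage: write_schema /tmp/my_app.cql && cqlsh -f /tmp/my_app.cql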

Server Components

Component          Install Path
Apache Cassandra   /etc/cassandra
Java               /bin/java

Note: Component versions may be updated on first boot by the automatic OS package update script.

Filesystem Layout

Mount Point          Size     Description
/                    38 GB    Root filesystem
/boot                2 GB     Operating system kernel files
/var/lib/cassandra   9.8 GB   Apache Cassandra data directory

Key Cassandra paths:

Path                            Purpose
/etc/cassandra                  Configuration files
/etc/cassandra/cassandra.yaml   Main configuration file
/var/lib/cassandra              Data, commit log, and saved caches
/var/lib/cassandra/data         SSTable data files
/var/lib/cassandra/commitlog    Commit log (write-ahead log)
/var/log/cassandra              Cassandra log files

Managing the Cassandra Service

Cassandra is managed via systemd and starts automatically on boot.

Check service status:

systemctl status cassandra

Stop Cassandra:

systemctl stop cassandra

Start Cassandra:

systemctl start cassandra

Restart Cassandra:

systemctl restart cassandra
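Cassandra can take a little while after a start or restart before it accepts CQL connections. A minimal sketch of a wait loop, assuming bash (it relies on bash's /dev/tcp redirection); the function name and the 120-second default timeout are hypothetical choices.

```bash
#!/usr/bin/env bash
# Wait until a TCP port accepts connections.
# Arguments (all optional): host, port, timeout in seconds.
wait_for_port() {
  local host="${1:-127.0.0.1}" port="${2:-9042}" timeout="${3:-120}"
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # Opening the socket in a subshell closes it again when the subshell exits
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}
```

Usage: systemctl restart cassandra && wait_for_port 127.0.0.1 9042 120 && cqlsh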

Check node status with nodetool:

nodetool status
nodetool info
nodetool describecluster

Scripts and Log Files

Script/Log                Path             Description
initial_boot_update.sh    /stage/scripts   Updates the OS with the latest packages on first boot
initial_boot_update.log   /stage/scripts   Output log for the boot update script

On Startup

An OS package update script runs on first boot to ensure the image is fully up to date. You can disable this by removing the script and its crontab entry:

rm -f /stage/scripts/initial_boot_update.sh

crontab -e
# Delete the following line, save and exit:
@reboot /stage/scripts/initial_boot_update.sh
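To make the same crontab change without an editor (for example from a provisioning script), you can filter the entry out of the existing crontab. A minimal sketch, assuming the entry lives in root's crontab as in the steps above; the helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Filters crontab text on stdin, dropping only lines that reference the update script.
drop_boot_update_entry() {
  grep -v -F '/stage/scripts/initial_boot_update.sh'
  # grep exits 1 when every line was filtered out; treat that as success
  [ $? -le 1 ]
}
```

Usage, as root: crontab -l | drop_boot_update_entry | crontab -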

Troubleshooting

Cassandra fails to start

  1. Check service status: systemctl status cassandra
  2. Review logs: tail -f /var/log/cassandra/system.log
  3. Verify disk space: df -h /var/lib/cassandra
  4. Check JVM heap settings in /etc/cassandra/cassandra-env.sh
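The disk and log checks above can be wrapped in small helpers for quick diagnosis. A minimal sketch; the function names are hypothetical, and the log filter simply matches lines containing ERROR or WARN.

```bash
#!/usr/bin/env bash
# Prints the use% (number only) of the filesystem holding a directory.
disk_use_percent() {
  df -P "${1:-/var/lib/cassandra}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Prints the last N ERROR/WARN lines from a Cassandra log file.
recent_problems() {
  local log="${1:-/var/log/cassandra/system.log}" n="${2:-20}"
  grep -E 'ERROR|WARN' "$log" | tail -n "$n"
}
```

Usage: systemctl status cassandra --no-pager; disk_use_percent; recent_problems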

nodetool status shows DN (Down/Normal)

  1. Restart Cassandra: systemctl restart cassandra
  2. Check for errors in /var/log/cassandra/system.log
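  3. Verify that internode port 7000 is open between the nodes; DN means the local node has lost gossip contact with that node (nodetool itself connects over JMX port 7199)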

CQL connection refused on port 9042

  1. Verify Cassandra is running: systemctl status cassandra
  2. Check that native_transport_port is set to 9042 in cassandra.yaml
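  3. Verify the security group allows port 9042 from the client's address
  4. For remote clients, check rpc_address in cassandra.yaml: if it is localhost (the stock default), only connections from the instance itself are accepted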
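To see which address Cassandra is actually listening on for CQL, you can filter the output of ss -ltn. A minimal sketch; the helper name is hypothetical, and the parsing assumes ss's default column layout, where the fourth column is Local Address:Port.

```bash
#!/usr/bin/env bash
# Reads `ss -ltn` output on stdin and prints local addresses listening on a port.
listen_addrs() {
  local port="${1:-9042}"
  # Skip the header; column 4 is "Local Address:Port"
  awk -v p="$port" 'NR > 1 && $4 ~ (":" p "$") { print $4 }'
}
```

Usage: ss -ltn | listen_addrs 9042. If the only result is 127.0.0.1:9042, Cassandra accepts CQL connections from the instance itself only; remote clients need rpc_address (or rpc_interface) in cassandra.yaml set to a reachable address, followed by a restart.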

High memory usage

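  1. High memory use is expected: Cassandra sizes its JVM heap from the instance's RAM and also relies on off-heap memory and the OS page cache
  2. Adjust the heap size with MAX_HEAP_SIZE and HEAP_NEWSIZE in /etc/cassandra/cassandra-env.sh
  3. Consider upgrading to a memory-optimized instance type (r5, r6i)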
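As a rough guide before overriding the heap, the comments in cassandra-env.sh (in the releases we are aware of) describe the default maximum heap as max(min(RAM/2, 1024 MB), min(RAM/4, 8192 MB)). The sketch below reproduces that arithmetic so you can compare it with an explicit MAX_HEAP_SIZE; treat the formula as an assumption and confirm it against the comments in your installed cassandra-env.sh. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Default max heap in MB: max(min(RAM/2, 1024), min(RAM/4, 8192)), integer arithmetic.
default_max_heap_mb() {
  local ram_mb="$1"
  local half=$(( ram_mb / 2 )) quarter=$(( ram_mb / 4 ))
  if (( half > 1024 )); then half=1024; fi
  if (( quarter > 8192 )); then quarter=8192; fi
  echo $(( half > quarter ? half : quarter ))
}
```

Usage: default_max_heap_mb "$(free -m | awk '/^Mem:/ {print $2}')". For a nominal 8192 MB this gives 2048; on an m5.large, free -m reports somewhat less than 8192 MB, so the result lands just under 2 GB. If you override the heap, cassandra-env.sh expects MAX_HEAP_SIZE and HEAP_NEWSIZE to be set as a pair.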

Security Recommendations

  • Restrict port access: Only allow CQL port 9042 from trusted application servers
  • Enable authentication: Configure authenticator: PasswordAuthenticator in cassandra.yaml
  • Enable authorization: Configure authorizer: CassandraAuthorizer in cassandra.yaml
  • Encrypt client connections: Enable SSL/TLS for client-to-node communication
  • Encrypt cluster communication: Enable internode encryption for port 7000
  • Change default credentials: After enabling authentication, create a new superuser role, then disable or drop the default cassandra superuser (password cassandra)
  • Keep Cassandra updated: Apply security patches when available
  • Monitor with JMX: Use nodetool and JMX for monitoring cluster health
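The authentication and authorization settings above can be applied with sed. A minimal sketch, assuming the stock cassandra.yaml layout, where authenticator and authorizer are top-level keys; the function name is hypothetical, it keeps a .bak copy of the original file, and Cassandra must be restarted for the change to take effect.

```bash
#!/usr/bin/env bash
# Switch cassandra.yaml to password authentication and role-based authorization.
enable_auth() {
  local yaml="${1:-/etc/cassandra/cassandra.yaml}"
  cp -p "$yaml" "$yaml.bak" || return 1   # keep a backup of the original
  sed -i \
    -e 's/^authenticator:.*/authenticator: PasswordAuthenticator/' \
    -e 's/^authorizer:.*/authorizer: CassandraAuthorizer/' \
    "$yaml"
}
```

Usage: enable_auth && systemctl restart cassandra. Then log in with cqlsh -u cassandra -p cassandra, create your own superuser role, and disable or drop the default cassandra role.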

Support

If you encounter any issues with this product, contact cloudimg support:

  • Email: support@cloudimg.co.uk
  • Website: www.cloudimg.co.uk
  • Support hours: 24/7 with a guaranteed 24-hour response SLA