Apache Cassandra User Guide
Overview
This guide covers the deployment and configuration of Apache Cassandra on Linux using cloudimg AMIs from the AWS Marketplace. Apache Cassandra is a highly scalable, distributed NoSQL database designed for handling large volumes of data across multiple servers with no single point of failure.
What's included in this AMI:
- Apache Cassandra 4.0.4 with CQL native transport on port 9042
- Java runtime environment
- Dedicated data volume at /var/lib/cassandra
- Systemd service for automatic startup on boot
- OS package update script for keeping the system current
- AWS CLI v2 for AWS service integration
- Systems Manager Agent (SSM) for remote management
- CloudWatch Agent for monitoring
- Latest security patches applied at build time
- 24/7 cloudimg support with guaranteed 24 hour response SLA
Prerequisites
Before launching this AMI, ensure you have:
- An active AWS account
- An active subscription to the Apache Cassandra listing on AWS Marketplace
- An EC2 key pair for SSH access
- Familiarity with EC2 instance management and SSH
Recommended Instance Type: m5.large (2 vCPU, 8 GB RAM) or larger. The minimum requirements are 1 vCPU, 1 GB RAM, and 20 GB disk space, but Cassandra benefits significantly from additional memory and CPU for production workloads.
Step 1: Launch the AMI
- Navigate to the AWS Marketplace and search for "Apache Cassandra cloudimg"
- Click Continue to Subscribe, accept the terms, then Continue to Configuration
- Select your preferred Region and Software Version
- Click Continue to Launch
- Choose Launch through EC2 for full control over instance configuration
- Select your instance type (m5.large recommended)
- Configure storage: 20 GB gp3 minimum for root; consider larger volumes for production data
- Configure your Security Group with the following inbound rules:
| Port | Protocol | Source | Purpose |
|---|---|---|---|
| 22 | TCP | Your IP | SSH access |
| 7199 | TCP | Internal | JMX monitoring port |
| 7000 | TCP | Internal | Internode communication |
| 9042 | TCP | Your IP | CQL native transport port |
Important: Ports 7000 and 7199 should only be accessible within your VPC for cluster communication and monitoring. Port 9042 should be restricted to trusted application servers or your IP.
- Select your EC2 key pair and launch the instance
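The inbound rules above could also be applied from the AWS CLI. The sketch below only prints the `aws ec2 authorize-security-group-ingress` commands so they can be reviewed before running. The function name and its three arguments (security group ID, your IP as a /32 CIDR, and your VPC CIDR) are placeholders, not values from this guide. It follows the Important note by limiting ports 7000 and 7199 to the VPC range.

```shell
# Print (not run) the AWS CLI commands for the inbound rules in the table.
# Arguments are placeholders: security group ID, your IP as a /32 CIDR,
# and the VPC CIDR used for the internal-only ports.
print_sg_rules() {
    sg_id=$1
    admin_cidr=$2
    vpc_cidr=$3
    # port:source pairs; 7000 and 7199 stay inside the VPC
    for rule in "22:$admin_cidr" "9042:$admin_cidr" "7000:$vpc_cidr" "7199:$vpc_cidr"; do
        port=${rule%%:*}
        cidr=${rule#*:}
        echo "aws ec2 authorize-security-group-ingress --group-id $sg_id --protocol tcp --port $port --cidr $cidr"
    done
}

# Example with documentation-range values:
#   print_sg_rules sg-0123456789abcdef0 203.0.113.10/32 10.0.0.0/16
```

Printing the commands rather than executing them keeps the sketch safe to try; pipe the output to `sh` only once the group ID and CIDR ranges have been checked.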
Step 2: Connect via SSH
Once your instance is running and has passed both status checks (2/2), connect using SSH:
ssh -i your-key.pem ec2-user@<public-ip-address>
To switch to the root user:
sudo su -
Important: Wait for the EC2 instance to reach 2/2 successful status checks before connecting. Early connection attempts may produce permission denied errors.
Step 3: Verify Cassandra is Running
Check the Cassandra node status:
nodetool status
Expected output:
Datacenter: datacenter1
========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  69.05 KiB  16      100.0%            bb9d3091-c1e0-4ed2-92b4-2897099aba7d  rack1
The UN status indicates the node is Up and in a Normal state.
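For scripted health checks, the two-letter status column can be parsed instead of read by eye. The following is a minimal sketch, with a hypothetical function name, that reads `nodetool status` output on standard input and lists any node that is not UN.

```shell
# Read `nodetool status` output on stdin and print the address of every node
# whose two-letter status is not UN. Exit status is 0 only if all nodes are UN.
non_un_nodes() {
    awk '
        # Node rows begin with a status letter (U/D) and a state letter (N/L/J/M)
        $1 ~ /^[UD][NLJM]$/ {
            if ($1 != "UN") { print $2; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage on the instance:
#   nodetool status | non_un_nodes || echo "some nodes need attention"
```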
Step 4: Connect with CQL Shell
Launch the CQL interactive shell:
cqlsh
Expected output:
Connected to Test Cluster at 127.0.0.1:9042
[cqlsh 6.0.0 | Cassandra 4.0.4 | CQL spec 3.4.5 | Native protocol v5]
Use HELP for help.
cqlsh>
Step 5: Create a Keyspace and Table
Create a keyspace:
CREATE KEYSPACE my_app WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': 1
};
Use the keyspace:
USE my_app;
Create a table:
CREATE TABLE users (
  user_id UUID PRIMARY KEY,
  name text,
  email text,
  created_at timestamp
);
Insert data:
INSERT INTO users (user_id, name, email, created_at)
VALUES (uuid(), 'John Doe', 'john@example.com', toTimestamp(now()));
Query data:
SELECT * FROM users;
Type EXIT to leave the CQL shell.
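The keyspace and table from this step can also be created non-interactively, since cqlsh accepts a script file with `-f`. The sketch below emits the same schema as a script. The function name and the `/tmp/schema.cql` path are illustrative, and the `IF NOT EXISTS` clauses are an addition to the walkthrough so that the script can be re-run without errors.

```shell
# Emit the Step 5 schema as a CQL script. IF NOT EXISTS is added here
# (it is not in the walkthrough) so the script can be re-run safely.
schema_cql() {
    cat <<'EOF'
CREATE KEYSPACE IF NOT EXISTS my_app WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': 1
};
CREATE TABLE IF NOT EXISTS my_app.users (
  user_id UUID PRIMARY KEY,
  name text,
  email text,
  created_at timestamp
);
EOF
}

# Usage on the instance (assumes cqlsh can reach the local node):
#   schema_cql > /tmp/schema.cql && cqlsh -f /tmp/schema.cql
```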
Server Components
| Component | Install Path |
|---|---|
| Apache Cassandra | /etc/cassandra |
| Java | /bin/java |
Note: Component versions may be updated on first boot by the automatic OS package update script.
Filesystem Layout
| Mount Point | Size | Description |
|---|---|---|
| / | 38 GB | Root filesystem |
| /boot | 2 GB | Operating system kernel files |
| /var/lib/cassandra | 9.8 GB | Apache Cassandra data directory |
Key Cassandra directories:
| Directory | Purpose |
|---|---|
| /etc/cassandra | Configuration files |
| /etc/cassandra/cassandra.yaml | Main configuration file |
| /var/lib/cassandra | Data, commitlog, and saved caches |
| /var/lib/cassandra/data | SSTable data files |
| /var/lib/cassandra/commitlog | Write ahead log |
| /var/log/cassandra | Cassandra log files |
Managing the Cassandra Service
Cassandra is managed via systemd and starts automatically on boot.
Check service status:
systemctl status cassandra
Stop Cassandra:
systemctl stop cassandra
Start Cassandra:
systemctl start cassandra
Restart Cassandra:
systemctl restart cassandra
Check node status with nodetool:
nodetool status
nodetool info
nodetool describecluster
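After a start or restart, Cassandra needs some time before it accepts CQL connections, so scripts that restart the service usually have to wait. A generic retry helper is one way to do that. The function below is a hypothetical sketch, and the attempt count and delay in the usage comment are arbitrary.

```shell
# Run a command until it succeeds, up to a maximum number of attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Sleep only when another attempt remains
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}

# Example on the instance: restart, then poll until a trivial query succeeds.
#   systemctl restart cassandra
#   retry 30 5 cqlsh -e 'SELECT now() FROM system.local'
```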
Scripts and Log Files
| Script/Log | Path | Description |
|---|---|---|
| initial_boot_update.sh | /stage/scripts | Updates the OS with the latest packages on first boot |
| initial_boot_update.log | /stage/scripts | Output log for the boot update script |
On Startup
An OS package update script runs on first boot to ensure the image is fully up to date. You can disable this by removing the script and its crontab entry:
rm -f /stage/scripts/initial_boot_update.sh
crontab -e
# Delete the following line, save and exit:
@reboot /stage/scripts/initial_boot_update.sh
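If you would rather not edit the crontab interactively, the same entry can be removed with a filter. This is a sketch with a hypothetical function name; it drops any crontab line that mentions the update script and passes everything else through.

```shell
# Filter a crontab listing on stdin, dropping any line that references the
# first-boot update script. All other entries pass through unchanged.
strip_boot_update() {
    # grep exits 1 when nothing is left to print; treat that as success
    grep -v -F 'initial_boot_update.sh' || true
}

# Usage as root on the instance (replaces the `crontab -e` step):
#   crontab -l | strip_boot_update | crontab -
```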
Troubleshooting
Cassandra fails to start
- Check service status: systemctl status cassandra
- Review logs: tail -f /var/log/cassandra/system.log
- Verify disk space: df -h /var/lib/cassandra
- Check JVM heap settings in /etc/cassandra/cassandra-env.sh
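The disk space check can be scripted, which is useful because compaction temporarily needs extra free space on the data volume. The sketch below uses hypothetical function names, and the 80% default threshold is an arbitrary illustration rather than a cloudimg recommendation.

```shell
# Print the used-space percentage (digits only) of the filesystem holding a path.
# `df -P` uses the POSIX one-line-per-filesystem format, so field 5 is the
# capacity column (for example "42%").
disk_used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage reaches a threshold; 80 is an arbitrary illustrative default.
check_disk() {
    used=$(disk_used_percent "$1")
    limit=${2:-80}
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $1 is ${used}% full"
        return 1
    fi
    echo "OK: $1 is ${used}% full"
}

# Usage on the instance:
#   check_disk /var/lib/cassandra
```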
nodetool status shows DN (Down/Normal)
- Restart Cassandra: systemctl restart cassandra
- Check for errors in /var/log/cassandra/system.log
- Verify JMX port 7199 is not blocked
CQL connection refused on port 9042
- Verify Cassandra is running: systemctl status cassandra
- Check that native_transport_port is set to 9042 in cassandra.yaml
- Verify the security group allows port 9042
High memory usage
- Cassandra uses significant heap memory by design
- Adjust heap size in /etc/cassandra/cassandra-env.sh
- Consider upgrading to a memory optimized instance type (r5, r6i)
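To judge whether the heap needs adjusting, it helps to know what Cassandra picks when no size is set. The sketch below reproduces the default formula from the upstream cassandra-env.sh, max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)). It assumes the AMI ships that script unmodified, which this guide does not confirm, and the function name is hypothetical.

```shell
# Default max heap in MB for a given amount of system RAM in MB, following the
# formula in the upstream cassandra-env.sh (assumed unmodified on this AMI):
#   max(min(1/2 ram, 1024MB), min(1/4 ram, 8192MB))
default_max_heap_mb() {
    ram_mb=$1
    half=$((ram_mb / 2))
    quarter=$((ram_mb / 4))
    if [ "$half" -gt 1024 ]; then half=1024; fi
    if [ "$quarter" -gt 8192 ]; then quarter=8192; fi
    if [ "$half" -gt "$quarter" ]; then
        echo "$half"
    else
        echo "$quarter"
    fi
}

# Example: an instance reporting 8192 MB of RAM
#   default_max_heap_mb 8192
```

On an 8 GB instance such as m5.large this works out to roughly 2 GB of heap. The upstream script also reads a MAX_HEAP_SIZE variable when you want to set the size explicitly.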
Security Recommendations
- Restrict port access: Only allow CQL port 9042 from trusted application servers
- Enable authentication: Configure authenticator: PasswordAuthenticator in cassandra.yaml
- Enable authorization: Configure authorizer: CassandraAuthorizer in cassandra.yaml
- Encrypt client connections: Enable SSL/TLS for client to node communication
- Encrypt cluster communication: Enable internode encryption for port 7000
- Change default credentials: If authentication is enabled, create new users and remove the default cassandra user
- Keep Cassandra updated: Apply security patches when available
- Monitor with JMX: Use nodetool and JMX for monitoring cluster health
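The authentication and authorization settings above can be audited with a small script. This is a sketch under the assumption that both settings appear as uncommented top-level keys in cassandra.yaml, as they do in the upstream file; the function name is hypothetical.

```shell
# Check that a cassandra.yaml enables the authenticator and authorizer
# recommended above. Prints each finding; returns 1 if either is missing.
check_auth_settings() {
    yaml=$1
    rc=0
    for want in 'authenticator: PasswordAuthenticator' 'authorizer: CassandraAuthorizer'; do
        # Match only uncommented settings that start at the beginning of a line
        if grep -q -E "^${want}[[:space:]]*(#.*)?\$" "$yaml"; then
            echo "ok: $want"
        else
            echo "missing: $want"
            rc=1
        fi
    done
    return "$rc"
}

# Usage on the instance:
#   check_auth_settings /etc/cassandra/cassandra.yaml
```

Changes to cassandra.yaml take effect only after Cassandra is restarted.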
Support
If you encounter any issues with this product, contact cloudimg support:
- Email: support@cloudimg.co.uk
- Website: www.cloudimg.co.uk
- Support hours: 24/7 with guaranteed 24 hour response SLA