Databases

DuckDB User Guide

Last updated: 2026-03-25 | Product: DuckDB

Overview

This guide covers the deployment and use of DuckDB on Linux using cloudimg AMIs from the AWS Marketplace. DuckDB is a fast, in process analytical database management system designed for running complex analytical queries on large datasets. It is optimized for Online Analytical Processing (OLAP) workloads and can directly query CSV, Parquet, and JSON files.

What's included in this AMI:

DuckDB executable at /apps/duckdb
Ready for analytical workloads and data processing
OS package update script for keeping the system current
AWS CLI v2 for AWS service integration
Systems Manager Agent (SSM) for remote management
CloudWatch Agent for monitoring
Latest security patches applied at build time
24/7 cloudimg support with guaranteed 24 hour response SLA

Prerequisites

Before launching this AMI, ensure you have:

An active AWS account
An active subscription to the DuckDB listing on AWS Marketplace
An EC2 key pair for SSH access
Familiarity with EC2 instance management and SSH

Recommended Instance Type: t3.large (2 vCPU, 8 GB RAM) or larger for analytical workloads. The minimum requirements are 1 vCPU, 1 GB RAM, and 20 GB disk space. DuckDB performance scales well with additional memory and CPU.

Step 1: Launch the AMI

Navigate to the AWS Marketplace and search for "DuckDB cloudimg"
Click Continue to Subscribe, accept the terms, then Continue to Configuration
Select your preferred Region and Software Version
Click Continue to Launch
Choose Launch through EC2 for full control over instance configuration
Select your instance type (t3.large recommended)
Configure storage: 20 GB gp3 minimum, increase for large datasets
Configure your Security Group with the following inbound rules:

Port	Protocol	Source	Purpose
22	TCP	Your IP	SSH access

Select your EC2 key pair and launch the instance

Step 2: Connect via SSH

Once your instance is running and has passed both status checks (2/2), connect using SSH:

ssh -i your-key.pem ec2-user@<public-ip-address>

To switch to the root user:

sudo su -

Step 3: Launch DuckDB

The DuckDB executable is located at /apps/duckdb/duckdb. Launch the interactive CLI:

/apps/duckdb/duckdb

You will see the DuckDB interactive prompt where you can run SQL queries directly.

Create an in memory database (default):

SELECT 'Hello from DuckDB on AWS!' AS message;

Create a persistent database:

/apps/duckdb/duckdb /apps/duckdb/mydata.db

Working with DuckDB

Create a table and insert data:

CREATE TABLE sales (
    id INTEGER,
    product VARCHAR,
    amount DECIMAL(10,2),
    sale_date DATE
);

INSERT INTO sales VALUES
    (1, 'Widget A', 29.99, '2024-01-15'),
    (2, 'Widget B', 49.99, '2024-01-16'),
    (3, 'Widget A', 29.99, '2024-01-17');

SELECT product, SUM(amount) as total_revenue, COUNT(*) as num_sales
FROM sales
GROUP BY product
ORDER BY total_revenue DESC;

Query CSV files directly:

SELECT * FROM read_csv_auto('/path/to/data.csv');

Query Parquet files:

SELECT * FROM read_parquet('/path/to/data.parquet');

Query JSON files:

SELECT * FROM read_json_auto('/path/to/data.json');

Query data directly from S3:

SELECT * FROM read_parquet('s3://your-bucket/data.parquet');

Export query results to CSV:

COPY (SELECT * FROM sales) TO '/tmp/sales_export.csv' (HEADER, DELIMITER ',');

Export to Parquet:

COPY (SELECT * FROM sales) TO '/tmp/sales_export.parquet' (FORMAT PARQUET);

Server Components

Component	Install Path
DuckDB	/apps/duckdb

Note: Component versions may be updated on first boot by the automatic OS package update script.

Filesystem Layout

Mount Point	Size	Description
/	38 GB	Root filesystem
/boot	2 GB	Operating system kernel files
/apps	9.8 GB	DuckDB installation directory

Key directories:

Directory	Purpose
/apps/duckdb	DuckDB installation and executable
/apps/duckdb/duckdb	DuckDB binary executable

Scripts and Log Files

Script/Log	Path	Description
initial_boot_update.sh	/stage/scripts	Updates the OS with the latest packages on first boot
initial_boot_update.log	/stage/scripts	Output log for the boot update script

On Startup

An OS package update script runs on first boot to ensure the image is fully up to date. You can disable this by removing the script and its crontab entry:

rm -f /stage/scripts/initial_boot_update.sh

crontab -e
# Delete the following line, save and exit:
@reboot /stage/scripts/initial_boot_update.sh

Troubleshooting

DuckDB command not found

Use the full path: /apps/duckdb/duckdb
Or add it to your PATH: export PATH=$PATH:/apps/duckdb
Verify the binary exists: ls -la /apps/duckdb/duckdb

Out of memory when querying large datasets

DuckDB processes data in memory; upgrade to a larger instance type
Use PRAGMA memory_limit='4GB'; to set explicit memory limits
Process data in smaller batches using LIMIT and OFFSET

Cannot read S3 files

Configure AWS credentials: ensure the instance has an IAM role with S3 read access
Or set credentials in DuckDB: sql SET s3_region='us-east-1'; SET s3_access_key_id='your-key'; SET s3_secret_access_key='your-secret';

Permission denied on database file

Check file ownership and permissions
Ensure the user has write access to the directory
Try creating the database in a different location: /tmp/test.db

Security Recommendations

Restrict SSH access: Only allow port 22 from trusted IP addresses
Use IAM roles for S3 access: Avoid storing AWS credentials on the instance
Protect database files: Set appropriate file permissions on persistent database files
Keep DuckDB updated: Check for new versions with security fixes
Enable Windows Update: Keep the OS patched with the latest security updates
Back up data: Regularly back up database files to S3

Support

If you encounter any issues with this product, contact cloudimg support:

Email: support@cloudimg.co.uk
Website: www.cloudimg.co.uk
Support hours: 24/7 with guaranteed 24 hour response SLA