Apache Solr on AWS User Guide

Last updated: 2026-05-22 | Product: Apache Solr on AWS

Overview

Apache Solr is an open source enterprise search platform built on Apache Lucene. It provides full text search, faceted navigation, hit highlighting, rich document indexing, vector kNN retrieval, and Streaming Expressions for analytics over indexed data. This image runs Solr in standalone mode as a dedicated systemd service, under an unprivileged solr service account, on the OpenJDK 17 runtime.

The Solr server and the Solr Admin UI both listen on TCP port 8983. Solr authentication is enabled out of the box: the Solr BasicAuthPlugin protects every REST endpoint and every API the Admin UI calls, and the RuleBasedAuthorizationPlugin grants the administrator role to the cloudimg administrator account. A demonstration search core named cloudimg is created in advance from Solr's bundled _default configset, so the server is queryable as soon as you sign in.

The Solr administrator password is generated on the first boot of every deployed instance. Two instances launched from the same Amazon Machine Image never share a password. On first boot a one shot service generates a fresh password, writes it into the Solr security configuration as a salted hash, and records the plain value at /root/solr-credentials.txt with mode 0600 so that only the root user can read it.

The Solr data directory is held on a dedicated Amazon EBS volume mounted at /var/solr, separate from the operating system disk. Indexes, cores, transaction logs and Solr logs all land on that volume, which can be resized independently of the root volume.

Prerequisites

Before you deploy this image you need:

An Amazon Web Services account where you can launch EC2 instances
IAM permissions to launch instances, create security groups, and subscribe to AWS Marketplace products
An EC2 key pair in the target Region for SSH access to the instance
A VPC and subnet in the target Region, with a security group allowing inbound port 22 from your management network and inbound port 8983 from the networks that need the Solr Admin UI or the Solr REST API
The AWS CLI (version 2) installed locally if you plan to deploy from the command line
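
The security group described in the prerequisites above can also be created from the command line. The following is a minimal sketch, assuming the AWS CLI is already configured. The function name, the group name solr-access and the CIDR arguments are placeholders chosen for illustration, not part of the image.

```shell
# Sketch: create a security group that opens SSH to a management CIDR and
# Solr (port 8983) to a client CIDR. "solr-access" is a placeholder name.
create_solr_security_group() {
  local vpc_id="$1" mgmt_cidr="$2" client_cidr="$3" sg_id
  sg_id=$(aws ec2 create-security-group \
    --group-name solr-access \
    --description "SSH and Solr access" \
    --vpc-id "$vpc_id" \
    --query GroupId --output text) || return 1
  # Port 22 for SSH from the management network only.
  aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port 22 --cidr "$mgmt_cidr" >/dev/null || return 1
  # Port 8983 for the Solr Admin UI and REST API.
  aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port 8983 --cidr "$client_cidr" >/dev/null || return 1
  echo "$sg_id"
}
```

Call it as create_solr_security_group <vpc-id> <management-cidr> <client-cidr>. It prints the new security group ID, which can be passed to the launch command in step 2.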

Step 1: Launch the Instance from the AWS Marketplace

Sign in to the AWS Management Console, open the EC2 service, and select Launch instance. Under Application and OS Images choose AWS Marketplace AMIs and search for Apache Solr. Select the cloudimg listing and choose Select, then Continue on the subscription summary.

Pick an instance type of m5.large or larger. Search workloads are memory bound, and the default JVM heap is 1024 MB; larger indexes need more memory. Choose your EC2 key pair under Key pair (login). Under Network settings select your VPC and subnet, and either create or select a security group that allows inbound port 22 from your management network and inbound port 8983 from the networks that need the Solr Admin UI or REST API. Leave the root volume at the default size or larger.

Select Launch instance. First boot initialisation takes a few seconds after the instance state becomes Running and the status checks pass.

Step 2: Launch the Instance from the AWS CLI

The following block launches an instance from the cloudimg Apache Solr Marketplace AMI into an existing subnet and security group. Replace <ami-id> with the AMI ID shown on the Marketplace listing, <key-name> with your EC2 key pair name, <subnet-id> with your subnet ID, and <security-group-id> with a security group that opens ports 22 and 8983 as described above.

aws ec2 run-instances \
  --image-id <ami-id> \
  --instance-type m5.large \
  --key-name <key-name> \
  --subnet-id <subnet-id> \
  --security-group-ids <security-group-id> \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":30,"VolumeType":"gp3"}}]' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=apache-solr-01}]'

The command prints a JSON document on success. Note the instance ID, then retrieve the instance's public address once it is running:

aws ec2 describe-instances --instance-ids <instance-id> --query "Reservations[].Instances[].PublicIpAddress" --output text
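
Rather than polling by hand, the AWS CLI waiters can block until the instance passes its status checks. The following is a sketch that wraps the waiter and the address lookup in one function. The function name is hypothetical, and the instance ID comes from the run-instances output.

```shell
# Sketch: wait for the instance status checks to pass, then print the
# public IP address of the instance.
wait_for_solr_instance() {
  local instance_id="$1"
  aws ec2 wait instance-status-ok --instance-ids "$instance_id" || return 1
  aws ec2 describe-instances --instance-ids "$instance_id" \
    --query "Reservations[].Instances[].PublicIpAddress" --output text
}
```

Call it as wait_for_solr_instance <instance-id>. The output is empty if the instance has no public address, for example in a private subnet.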

Step 3: Connect and Retrieve the Administrator Password

Connect over SSH with the key pair you selected and the public IP address from step 2. The SSH login user depends on the operating system of the AMI variant you launched:

AMI variant	SSH login user
Apache Solr 9 on Ubuntu 24.04	`ubuntu`

The first boot service runs before the SSH daemon becomes ready, so the credentials file is always in place when you log in for the first time.

ssh -i <path-to-private-key> <login-user>@<public-ip>
sudo cat /root/solr-credentials.txt

You will see a plain text file containing the Solr Admin UI URL, the administrator username (cloudimg), and the administrator password. Copy these values somewhere secure such as a password manager or an encrypted vault. Do not commit them to source control.

Step 4: Confirm the Solr Service

From the same SSH session, confirm that Solr is running and report its version. solr-firstboot.service is a one shot unit, so it shows as inactive once it has completed; that is expected.

sudo systemctl is-active solr.service
/opt/solr/bin/solr --version

solr.service reports active, and the Solr CLI reports the installed version (Apache Solr 9.10.1 in this image).

Step 5: Verify the Authentication Wall

Solr authentication is enabled, so a request to a REST endpoint without credentials is rejected, and a request with the cloudimg credentials succeeds. The following block reads the password out of the credentials file and makes both calls.

PASS=$(sudo grep '^SOLR_ADMIN_PASSWORD=' /root/solr-credentials.txt | cut -d= -f2-)
curl -s -o /dev/null -w 'no-auth: HTTP %{http_code}\n' http://127.0.0.1:8983/solr/admin/info/system
curl -sf -u "cloudimg:${PASS}" 'http://127.0.0.1:8983/solr/admin/info/system?wt=json' | head -c 300

The first call returns no-auth: HTTP 401, which confirms the BasicAuthPlugin wall is in place. The second call returns the Solr system information as JSON, including the solr-spec-version, the JVM details, and the Solr home directory.
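
The grep and cut pipeline used to read the password can be wrapped in a small helper. This is a sketch only: the function name is hypothetical, and SOLR_ADMIN_PASSWORD is the only key name this guide confirms for the credentials file.

```shell
# Sketch: read KEY=value lines on stdin and print the value of one key.
solr_cred_value() {
  local key="$1"
  # cut -f2- keeps any '=' characters that appear inside the value itself.
  grep "^${key}=" | head -n 1 | cut -d= -f2-
}
```

For example, PASS=$(sudo cat /root/solr-credentials.txt | solr_cred_value SOLR_ADMIN_PASSWORD) sets the same variable as the first line of the block above. Reading from standard input keeps sudo outside the helper, because the file is readable by root only.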

Step 6: Sign in to the Solr Admin UI

Open a web browser and navigate to http://<public-ip>:8983/solr/. Because Solr authentication is enabled, the Admin UI presents its Basic Authentication form. Enter the username cloudimg and the administrator password from /root/solr-credentials.txt, then select Login.

Once authenticated, the Dashboard shows the running Solr instance: system load and memory, the Solr and Lucene versions, the JVM arguments, and the security panel confirming that the BasicAuthPlugin and RuleBasedAuthorizationPlugin are active.

Figure: Apache Solr Admin UI dashboard, authenticated as the cloudimg administrator, showing the system, JVM and security panels.

Step 7: The cloudimg Core Overview

Select the cloudimg core in the core selector dropdown on the left, then choose Overview. The overview lists the core statistics, the on disk instance, data and index directories under /var/solr/data/cloudimg, and the index segment and replication details.

Figure: The Solr Admin UI overview of the cloudimg core, showing document counts, the on disk instance paths and replication details.

The left hand menu under the selected core gives you every per core tool: Analysis for testing how text is tokenised, Documents for indexing through the UI, Files for browsing the configset, Query for building searches, Schema for fields and field types, and Plugins / Stats for runtime metrics.

Step 8: Index Documents into a Core

Documents are added to a core through the update endpoint. The following block indexes three documents into the cloudimg core as JSON and commits them so they become searchable immediately.

PASS=$(sudo grep '^SOLR_ADMIN_PASSWORD=' /root/solr-credentials.txt | cut -d= -f2-)
curl -sf -u "cloudimg:${PASS}" -H 'Content-Type: application/json' \
  --data-binary '[
    {"id":"doc-1","title_t":"Getting started with Apache Solr","category_s":"Guide"},
    {"id":"doc-2","title_t":"Indexing and querying documents","category_s":"Guide"},
    {"id":"doc-3","title_t":"Faceted search and analytics","category_s":"Reference"}
  ]' \
  'http://127.0.0.1:8983/solr/cloudimg/update?commit=true'

A status of 0 in the response header confirms the documents were indexed and committed.
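
In a script, the status can be checked rather than read by eye. The following sketch extracts the first status value from a Solr JSON response using grep and sed, on the assumption that jq may not be installed on the instance. The function name is hypothetical.

```shell
# Sketch: read a Solr JSON response on stdin and print the first
# "status" number it contains (responseHeader.status for update calls).
solr_status() {
  grep -o '"status" *: *[0-9][0-9]*' | head -n 1 | sed 's/[^0-9]//g'
}
```

Piping the update call above into solr_status prints 0 when the documents were indexed and committed. A script can compare the output with 0 and stop on any other value.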

Step 9: Run Queries

Solr's query endpoint takes a query in the q parameter and returns matching documents. The following block runs a match-all query (*:*) with rows=0, so that only the total count is returned, then a targeted query on the title_t field.

PASS=$(sudo grep '^SOLR_ADMIN_PASSWORD=' /root/solr-credentials.txt | cut -d= -f2-)
curl -sf -u "cloudimg:${PASS}" 'http://127.0.0.1:8983/solr/cloudimg/select?q=*:*&rows=0&wt=json' | head -c 200
echo
curl -sf -u "cloudimg:${PASS}" 'http://127.0.0.1:8983/solr/cloudimg/select?q=title_t:querying&wt=json' | head -c 400

The first call reports numFound for the whole core; the second returns the documents whose title matches the search term.
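
Faceting uses the same select endpoint. The following sketch counts documents per value of a field, and uses curl's -G and --data-urlencode options so that special characters in parameters are encoded safely. The function name is hypothetical, and PASS is expected to hold the administrator password as in the block above.

```shell
# Sketch: count documents per value of a field using Solr faceting.
# Expects PASS to hold the administrator password.
solr_facet_counts() {
  local core="$1" field="$2"
  # -G sends the --data-urlencode pairs as a URL-encoded query string.
  curl -sf -u "cloudimg:${PASS}" -G \
    "http://127.0.0.1:8983/solr/${core}/select" \
    --data-urlencode 'q=*:*' \
    --data-urlencode 'rows=0' \
    --data-urlencode 'facet=true' \
    --data-urlencode "facet.field=${field}" \
    --data-urlencode 'wt=json'
}
```

Call it as solr_facet_counts cloudimg category_s. If the core holds only the three documents from step 8, the facet_counts section of the response should list Guide with a count of 2 and Reference with a count of 1.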

The same queries can be built interactively in the Admin UI. Select the cloudimg core, choose Query, set the query parameters on the left, and select Execute Query. The response document appears on the right.

Figure: The Solr Admin UI Query tab running a search against the cloudimg core and returning matching documents as JSON.

Step 10: Create a New Core

A core is a single index with its own schema and configuration. With Solr authentication enabled, create a core through the authenticated CoreAdmin API: first stage a configuration directory on disk as the solr service user by copying the bundled _default configset, then register the core with the API. The block first unloads any existing products core so it is safe to run more than once.

PASS=$(sudo grep '^SOLR_ADMIN_PASSWORD=' /root/solr-credentials.txt | cut -d= -f2-)
curl -s -o /dev/null -u "cloudimg:${PASS}" \
  'http://127.0.0.1:8983/solr/admin/cores?action=UNLOAD&core=products&deleteInstanceDir=true'
sudo -u solr cp -a /opt/solr/server/solr/configsets/_default /var/solr/data/products
curl -sf -u "cloudimg:${PASS}" \
  'http://127.0.0.1:8983/solr/admin/cores?action=CREATE&name=products&instanceDir=products&config=solrconfig.xml&schema=managed-schema.xml&wt=json'

A response containing "core":"products" confirms the core was created, and it appears in the Admin UI core selector immediately. Replace the _default configset with your own schema and configuration for a production core, then reload the core. To remove a core again, call the same endpoint with action=UNLOAD&core=products&deleteInstanceDir=true.
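
Reloading a core after a configuration change, and checking its state, both go through the same CoreAdmin endpoint. The following is a sketch of a thin wrapper. The function name is hypothetical, and PASS is expected to hold the administrator password as in the block above.

```shell
# Sketch: call a CoreAdmin action (for example RELOAD or STATUS) on a core.
# Expects PASS to hold the administrator password.
solr_core_admin() {
  local action="$1" core="$2"
  curl -sf -u "cloudimg:${PASS}" \
    "http://127.0.0.1:8983/solr/admin/cores?action=${action}&core=${core}&wt=json"
}
```

For example, solr_core_admin RELOAD products applies an edited schema or solrconfig.xml, and solr_core_admin STATUS products reports the core's directories and index details.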

Step 11: Add a Second Solr User

Solr's authentication and authorization are managed through the security API. The following block adds a second user and grants it the administrator role. Replace <new-user-password> with the password you choose for the new account.

PASS=$(sudo grep '^SOLR_ADMIN_PASSWORD=' /root/solr-credentials.txt | cut -d= -f2-)
curl -sf -u "cloudimg:${PASS}" -H 'Content-Type: application/json' \
  -d '{"set-user":{"searchadmin":"<new-user-password>"}}' \
  'http://127.0.0.1:8983/solr/admin/authentication'
curl -sf -u "cloudimg:${PASS}" -H 'Content-Type: application/json' \
  -d '{"set-user-role":{"searchadmin":["admin"]}}' \
  'http://127.0.0.1:8983/solr/admin/authorization'

The new user can sign in to the Admin UI and call the REST API straight away. Users and roles are also visible and editable under Security in the Admin UI.
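
Removing a user goes through the same two security endpoints. The following sketch clears the user's role mapping by setting it to null, then deletes the login with delete-user. The function name is hypothetical, and PASS is expected to hold the administrator password as in the block above.

```shell
# Sketch: remove a Solr user. Expects PASS to hold the admin password.
solr_delete_user() {
  local user="$1"
  # Clear the role mapping first, then delete the login itself.
  curl -sf -u "cloudimg:${PASS}" -H 'Content-Type: application/json' \
    -d "{\"set-user-role\":{\"${user}\":null}}" \
    'http://127.0.0.1:8983/solr/admin/authorization' || return 1
  curl -sf -u "cloudimg:${PASS}" -H 'Content-Type: application/json' \
    -d "{\"delete-user\":[\"${user}\"]}" \
    'http://127.0.0.1:8983/solr/admin/authentication'
}
```

Call it as solr_delete_user searchadmin. Do not use it on the cloudimg account unless another administrator already exists, otherwise no user will be left to manage security.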

Step 12: Tune the JVM Heap

The Solr JVM heap and other runtime settings are configured in /etc/default/solr.in.sh. The image ships with a 1024 MB heap, which suits development, testing and small indexes. For larger indexes raise SOLR_HEAP, then restart Solr.

sudo sed -i 's/^SOLR_HEAP=.*/SOLR_HEAP="4096m"/' /etc/default/solr.in.sh
sudo systemctl restart solr.service

As a rule of thumb give Solr a heap large enough for the working set of the index, but leave the majority of system memory free for the operating system page cache, which Lucene relies on for fast reads.
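
The rule of thumb can be turned into a starting figure with simple arithmetic. The following sketch suggests one quarter of physical memory as the heap. The one quarter fraction is an assumption made for illustration only, not a Solr recommendation, and the function name is hypothetical. Measure the index working set before settling on a value.

```shell
# Sketch: suggest a heap size in MB as one quarter of total memory.
# The one-quarter fraction is an illustrative assumption, not a Solr rule.
suggest_heap_mb() {
  local total_kb="$1"
  echo $(( total_kb / 1024 / 4 ))
}
```

On the instance, suggest_heap_mb "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" prints a figure based on installed memory. For an input of 8388608 kB (8 GiB) it prints 2048, which leaves three quarters of memory for the page cache.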

Step 13: Enable HTTPS

By default Solr serves plain HTTP on port 8983. For any production deployment, encrypt traffic to the Solr Admin UI and REST API. There are two common approaches.

The simplest is to terminate TLS at an Application Load Balancer in front of the instance: attach an AWS Certificate Manager certificate to the load balancer, forward to the instance on port 8983, and restrict the instance security group so that only the load balancer can reach 8983.

Alternatively, enable Solr's own SSL by generating a keystore, referencing it from /etc/default/solr.in.sh with the SOLR_SSL_* settings, and restarting the service. The full procedure is documented in the official Solr Reference Guide at https://solr.apache.org/guide/ under "Enabling SSL".
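
As a rough illustration of the second approach, the following sketch generates a self-signed PKCS12 keystore with the JDK keytool and shows the SOLR_SSL_* settings that would reference it. The function name, keystore path, alias and password placeholder are assumptions. A production deployment would normally use a certificate issued by a trusted authority, following the Reference Guide procedure.

```shell
# Sketch: create a self-signed keystore for Solr. Path, alias and
# password are placeholders supplied by the caller.
generate_solr_keystore() {
  local keystore="$1" password="$2" hostname="$3"
  keytool -genkeypair -alias solr-ssl -keyalg RSA -keysize 2048 \
    -validity 365 -storetype PKCS12 \
    -keystore "$keystore" -storepass "$password" -keypass "$password" \
    -dname "CN=${hostname}" -ext "SAN=DNS:${hostname}"
}

# Settings to add to /etc/default/solr.in.sh (values are placeholders).
SOLR_SSL_ENABLED=true
SOLR_SSL_KEY_STORE=/var/solr/solr-ssl.keystore.p12
SOLR_SSL_KEY_STORE_PASSWORD='<keystore-password>'
SOLR_SSL_TRUST_STORE=/var/solr/solr-ssl.keystore.p12
SOLR_SSL_TRUST_STORE_PASSWORD='<keystore-password>'
SOLR_SSL_NEED_CLIENT_AUTH=false
SOLR_SSL_WANT_CLIENT_AUTH=false
```

The keystore must be readable by the solr service account. After the restart, the Admin UI and REST API are served over https on the same port, so the http URLs used elsewhere in this guide need the https scheme.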

Whichever approach you choose, restrict the instance security group so that port 8983 is reachable only from the networks and load balancers that need it.

Step 14: Backups and Maintenance

Solr's replication handler takes a point in time backup of a core's index without downtime. Solr only writes backups to a path under its home directory unless solr.allowPaths is widened, and each snapshot needs a name that does not already exist, so the following block creates a backup directory under /var/solr/data and backs up the cloudimg core into a date stamped snapshot.

PASS=$(sudo grep '^SOLR_ADMIN_PASSWORD=' /root/solr-credentials.txt | cut -d= -f2-)
sudo -u solr install -d -m 0755 /var/solr/data/backups
SNAP="cloudimg-$(date +%Y%m%d-%H%M%S)"
curl -sf -u "cloudimg:${PASS}" \
  "http://127.0.0.1:8983/solr/cloudimg/replication?command=backup&location=/var/solr/data/backups&name=${SNAP}"

A status of OK confirms that Solr accepted the backup request; the snapshot itself is written in the background, and the replication handler's command=details reports whether it completed. Periodically copy /var/solr/data/backups to an Amazon S3 bucket for off instance retention with aws s3 sync /var/solr/data/backups s3://your-bucket/solr-backups/. Because the whole Solr data directory lives on a dedicated EBS volume, you can also take EBS snapshots of that volume on a schedule with Amazon Data Lifecycle Manager.
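
Returning to the replication handler backup above, the same endpoint also accepts the details, restore and restorestatus commands. The following is a sketch of a small wrapper for them. The function name is hypothetical, and PASS is expected to hold the administrator password as in the backup block.

```shell
# Sketch: send a command to a core's replication handler. The optional
# third argument carries extra parameters, starting with '&'.
solr_replication() {
  local core="$1" command="$2" extra="${3:-}"
  curl -sf -u "cloudimg:${PASS}" \
    "http://127.0.0.1:8983/solr/${core}/replication?command=${command}${extra}&wt=json"
}
```

For example, solr_replication cloudimg details shows the state of the most recent backup. To restore a snapshot, run solr_replication cloudimg restore "&name=${SNAP}&location=/var/solr/data/backups", then follow its progress with solr_replication cloudimg restorestatus.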

For kernel and package updates, Ubuntu's unattended-upgrades is enabled by default, so operating system security patches apply automatically. Review the Solr logs under /var/solr/logs, and the systemd journal with sudo journalctl -u solr.service.

Step 15: Scaling Beyond a Single Instance

This image runs Solr in standalone mode, which suits single tenant production search, development and testing, and embedded application search. For larger or highly available deployments, move to SolrCloud:

Stand up an Apache ZooKeeper ensemble, or use a managed ZooKeeper, to coordinate the cluster
Launch several Solr instances from this AMI and point each at the ZooKeeper ensemble through /etc/default/solr.in.sh
Create collections that are sharded and replicated across the instances, so the index is distributed and queries are served with redundancy
Put the Solr instances behind an Application Load Balancer for a single search endpoint

SolrCloud, sharding, replication and collection management are documented in the official Solr Reference Guide at https://solr.apache.org/guide/ under "SolrCloud".
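
As a rough illustration of the steps above, each instance is pointed at ZooKeeper with the ZK_HOST setting, and collections are then created through the Collections API. The following sketch only builds the request URL. The host names, collection name and counts are placeholders, and the function name is hypothetical.

```shell
# In /etc/default/solr.in.sh on each instance (host names are placeholders):
#   ZK_HOST="zk1:2181,zk2:2181,zk3:2181/solr"

# Sketch: build a Collections API URL that creates a sharded,
# replicated collection on a SolrCloud node.
solr_create_collection_url() {
  local host="$1" name="$2" shards="$3" replicas="$4"
  printf 'http://%s:8983/solr/admin/collections?action=CREATE&name=%s&numShards=%s&replicationFactor=%s&wt=json\n' \
    "$host" "$name" "$shards" "$replicas"
}
```

For example, curl -sf -u "cloudimg:${PASS}" "$(solr_create_collection_url <solr-host> products 2 2)" would request a collection with two shards and two replicas of each, provided the cluster has enough nodes and the credentials are valid in the SolrCloud security configuration.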

Support

cloudimg provides 24/7/365 expert technical support for this image, with a guaranteed response within 24 hours and a one hour average response for critical issues. Contact support@cloudimg.co.uk.

For general Apache Solr questions consult the Solr Reference Guide at https://solr.apache.org/guide/ and the project documentation at https://solr.apache.org/.