
Apache Jena Fuseki on Ubuntu 24.04 on Azure User Guide

Product: Apache Jena Fuseki 5.6.0 on Ubuntu 24.04 LTS on Azure

Overview

Apache Jena Fuseki is the SPARQL server from the Apache Jena project — a standards-compliant RDF triple store that serves the SPARQL 1.1 Query and Update protocols and the SPARQL Graph Store HTTP Protocol over an embedded Jetty web server, with a bundled browser UI for running queries, loading data, and managing datasets. The cloudimg image installs Fuseki 5.6.0 from the Apache archive with SHA-512 verification, alongside OpenJDK 17. A TDB2 persistent dataset named /ds is pre-configured and its database files live on a dedicated Azure data disk mounted at /var/lib/fuseki, so your RDF data survives reboots and is independently resizable. An nginx reverse proxy fronts Fuseki on port 80; the admin UI and dataset-management endpoints are protected by Apache Shiro with a per-VM admin password that is generated and written to a root-only file at first boot.

What is included:

  • Apache Jena Fuseki 5.6.0 at /opt/fuseki (FUSEKI_HOME), bundled SPARQL UI
  • OpenJDK 17 JRE headless from Ubuntu 24.04 noble main
  • A TDB2 persistent dataset /ds, database files on a dedicated 30 GiB Azure data disk mounted at /var/lib/fuseki (FUSEKI_BASE)
  • fuseki.service running as the unprivileged fuseki service user, bound to 127.0.0.1:3030
  • An nginx reverse proxy on port 80 with an unauthenticated /health endpoint
  • Apache Shiro admin auth (shiro.ini); the /$/ping server-liveness endpoint stays anonymous
  • A per-VM admin password generated at first boot, plain copy in /root/fuseki-credentials.txt (mode 0600)
  • 24/7 cloudimg support

Prerequisites

You need an active Azure subscription, an SSH key, and a VNet with a subnet. Standard_B2ms (8 GB RAM, 1024m heap default) is suitable for dev, test, and single-tenant production graphs. For larger graphs, raise the VM size and tune JVM_ARGS in /etc/fuseki.env. NSG inbound: allow 22/tcp from your management CIDR and 80/tcp from any client CIDR that needs the Fuseki UI or SPARQL endpoints. Fuseki serves plain HTTP on port 80, so front it with TLS (for example an Azure Application Gateway, Front Door, or a reverse proxy with a certificate) and your own domain before production.

Step 1-3: Deploy + SSH (standard pattern)

Deploy the VM from the Azure Marketplace offer, then connect over SSH as azureuser:

ssh azureuser@<vm-ip>

Step 4: Service Status + Health

Confirm the services are active and the unauthenticated health and ping endpoints respond:

sudo systemctl is-active fuseki.service nginx.service
curl -s -o /dev/null -w 'health: HTTP %{http_code}\n' http://127.0.0.1/health
printf 'ping:   '; curl -s 'http://127.0.0.1/$/ping'; echo

fuseki.service and nginx.service report active, /health returns HTTP 200 (the nginx-served liveness check), and /$/ping returns a timestamp (Fuseki's own anonymous liveness endpoint).

fuseki.service and nginx.service active; nginx /health returns HTTP 200; Fuseki /$/ping returns a timestamp; nginx listens on :80 and Fuseki on 127.0.0.1:3030
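Right after a boot or a restart, Fuseki can take a few seconds to start answering. The sketch below polls an endpoint until it returns the expected status code. The function name wait_for_http and its defaults (10 tries, 3 seconds apart) are illustrative choices, not part of the image.

```shell
# wait_for_http URL [EXPECTED_CODE] [TRIES] [DELAY_SECONDS]
# Polls URL until it answers with EXPECTED_CODE.
# Returns 0 on success and 1 if every attempt fails.
wait_for_http() {
  url=$1
  expected=${2:-200}
  tries=${3:-10}
  delay=${4:-3}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # curl prints 000 as the status code when it cannot connect at all.
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url" 2>/dev/null || true)
    if [ "$code" = "$expected" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, to wait up to a minute for the nginx health endpoint:

wait_for_http http://127.0.0.1/health 200 20 3 && echo ready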

Step 5: Read the Per-VM Admin Credentials

The admin password is unique to this VM and is written to a root-only file at first boot:

sudo cat /root/fuseki-credentials.txt

This prints FUSEKI_ADMIN_USER (admin), FUSEKI_ADMIN_PASSWORD (the per-VM password), and the FUSEKI_URL. Keep this password safe — it unlocks the admin UI and dataset management.

Step 6: Admin Auth Round-Trip

The server-management endpoints under /$/ are protected by Apache Shiro. Run the checks below from your SSH session to prove the auth wall, replacing <FUSEKI_ADMIN_PASSWORD> with the value from Step 5:

  • Without auth or with a wrong password, the datasets endpoint is rejected: curl -s -o /dev/null -w '%{http_code}\n' -u admin:wrong 'http://127.0.0.1/$/datasets' returns 401.
  • With the per-VM password it returns the datasets JSON: curl -s -u admin:<FUSEKI_ADMIN_PASSWORD> 'http://127.0.0.1/$/datasets' returns 200 and lists the TDB2-backed /ds dataset.

The per-VM admin password file, GET /$/datasets rejected with HTTP 401 for a wrong password and accepted with HTTP 200 for the per-VM password, and the /ds dataset listed
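To avoid pasting the password onto the command line, you can read it into a shell variable. The helper below is a minimal sketch that assumes the credentials file uses plain, unquoted KEY=value lines (the guide names the keys but does not show the exact layout). The function name read_cred is hypothetical.

```shell
# read_cred KEY
# Reads KEY=value lines on stdin and prints the value of the first matching KEY.
# Assumption: values are not quoted; everything after the first "=" is the value.
read_cred() {
  sed -n "s/^$1=//p" | head -n 1
}
```

With the function defined, the authenticated check from this step becomes:

FUSEKI_ADMIN_PASSWORD=$(sudo cat /root/fuseki-credentials.txt | read_cred FUSEKI_ADMIN_PASSWORD)
curl -s -u "admin:${FUSEKI_ADMIN_PASSWORD}" 'http://127.0.0.1/$/datasets'

If the variable comes back empty, check the file layout with sudo cat and adjust the pattern.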

Step 7: Version, Data Disk, and a SPARQL Round-Trip

Confirm the Fuseki version, that the /ds TDB2 dataset lives on the dedicated Azure data disk, and run an anonymous SPARQL SELECT that counts the triples in /ds:

df -h /var/lib/fuseki | awk 'NR==1 || /fuseki/'
sudo ls -1 /var/lib/fuseki/databases/ds/ | head
curl -s -G 'http://127.0.0.1/ds/query' \
  --data-urlencode 'query=SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }' \
  -H 'Accept: application/sparql-results+json'

The data disk is mounted at /var/lib/fuseki, the TDB2 database files (Data-0001, tdb.lock) back the /ds dataset, and the SPARQL query returns the current triple count.

Fuseki 5.6.0 version, the /dev/sdc data disk mounted at /var/lib/fuseki, the TDB2 database files for /ds, and a SPARQL COUNT query result
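The commands above do not print the server version. One way to read it is Fuseki's /$/server administration endpoint, which returns a JSON document that includes a version field. It sits under /$/, so it needs the admin credentials. The helper below is a small grep/sed sketch that avoids a jq dependency. It is not a general JSON parser, and the function name json_string_field is hypothetical.

```shell
# json_string_field NAME
# Prints the first string value of the field "NAME" found in JSON on stdin.
# Only handles simple string fields without escaped quotes.
json_string_field() {
  grep -o "\"$1\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" \
    | head -n 1 \
    | sed 's/^.*:[[:space:]]*"\(.*\)"$/\1/'
}
```

Usage, with the password from Step 5:

curl -s -u 'admin:<FUSEKI_ADMIN_PASSWORD>' 'http://127.0.0.1/$/server' | json_string_field version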

Step 8: Fuseki UI — Dashboard and Datasets

Open http://<vm-ip>/ in a browser. The Fuseki dashboard lists your datasets. The bundled /ds dataset is shown with query, add data, edit, and info actions. When prompted, sign in as admin with the per-VM password from Step 5.

The Apache Jena Fuseki dashboard showing version 5.6.0 and the /ds dataset with query, add data, edit, and info actions

Step 9: Run a SPARQL Query in the UI

Click query on the /ds dataset to open the SPARQL query editor (YASGUI). Paste a SELECT and run it with the play button. The example below lists every resource that has an rdfs:label:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?label WHERE { ?s rdfs:label ?label } ORDER BY ?label

The results table renders below the editor with the matching rows and the query timing.

The Fuseki SPARQL query editor running a SELECT against /ds and returning the matching label rows in the results table
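The same SELECT can be run from a shell against the /ds/query endpoint. The sketch below asks for CSV results, which are easy to post-process with standard tools. Both function names are hypothetical, and csv_first_value is only meant for simple unquoted values such as counts.

```shell
# sparql_select_csv ENDPOINT QUERY
# Runs a SELECT against ENDPOINT and prints the result table as CSV.
sparql_select_csv() {
  curl -s -G "$1" --data-urlencode "query=$2" -H 'Accept: text/csv'
}

# csv_first_value
# Prints the first cell of the first data row of CSV on stdin.
# Carriage returns are stripped so CRLF and LF line endings both work.
csv_first_value() {
  tr -d '\r' | sed -n '2p' | cut -d, -f1
}
```

For example, to print only the triple count of /ds:

sparql_select_csv 'http://127.0.0.1/ds/query' 'SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }' | csv_first_value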

Step 10: Load Data via the UI

Click add data on the /ds dataset to open the upload page. Choose one or more RDF files (Turtle, RDF/XML, TriG, N-Triples, JSON-LD), optionally name a target graph, and click upload all to load them into the dataset. You can also load data over HTTP with the SPARQL Graph Store Protocol at http://<vm-ip>/ds/data.

The Fuseki add-data page for /ds with the select files and upload all controls and the upload status table
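An upload over the Graph Store Protocol can be sketched with curl as follows. The helper picks a Content-Type from the file extension and appends the file to the default graph with POST. It covers only the triple formats (Turtle, N-Triples, RDF/XML, JSON-LD). The function names are hypothetical, and whether the write endpoint needs credentials depends on your shiro.ini.

```shell
# rdf_content_type FILE
# Prints the media type for FILE based on its extension.
# Returns 1 for extensions this sketch does not know.
rdf_content_type() {
  case "$1" in
    *.ttl)    echo 'text/turtle' ;;
    *.nt)     echo 'application/n-triples' ;;
    *.rdf)    echo 'application/rdf+xml' ;;
    *.jsonld) echo 'application/ld+json' ;;
    *)        echo "unknown RDF file type: $1" >&2; return 1 ;;
  esac
}

# gsp_post FILE [DATA_URL]
# Appends FILE to the default graph and prints the HTTP status code.
gsp_post() {
  ctype=$(rdf_content_type "$1") || return 1
  curl -s -o /dev/null -w '%{http_code}\n' -X POST \
    -H "Content-Type: $ctype" \
    --data-binary "@$1" \
    "${2:-http://127.0.0.1/ds/data}?default"
}
```

Running gsp_post mydata.ttl (a placeholder file name) should print a 2xx status on success. If it prints 401, add -u admin:<FUSEKI_ADMIN_PASSWORD> to the curl call.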

Step 11: Server Info and Statistics

Click info on the /ds dataset to see the available services (the SPARQL Query, Update, and Graph Store Protocol endpoints), per-endpoint request statistics, and the dataset size (use count triples in all graphs to tally the triples in each graph).

The Fuseki dataset info page for /ds showing available services, per-endpoint request statistics, and the dataset size controls

Step 12: Load Triples over SPARQL Update

You can insert data programmatically with SPARQL Update against http://<vm-ip>/ds/update. The update endpoint is part of the dataset service. Send an INSERT DATA request with Content-Type: application/sparql-update, then read it back with a SELECT against http://<vm-ip>/ds/query. For example, posting INSERT DATA { <http://example.org/a> <http://www.w3.org/2000/01/rdf-schema#label> "Example" } adds one triple, and a SELECT * WHERE { ?s ?p ?o } returns it.
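As a sketch of that round-trip with curl: the first helper builds the INSERT DATA request from the example above, and the second posts it with the application/sparql-update content type. The function names are hypothetical. The label is inserted verbatim, so this version assumes it contains no double quotes or backslashes.

```shell
# insert_label_update SUBJECT_IRI LABEL
# Prints an INSERT DATA request that adds one rdfs:label triple.
# LABEL is not escaped, so it must not contain double quotes or backslashes.
insert_label_update() {
  printf 'INSERT DATA { <%s> <http://www.w3.org/2000/01/rdf-schema#label> "%s" }\n' "$1" "$2"
}

# sparql_update UPDATE [UPDATE_URL]
# Posts the update string and prints the HTTP status code.
sparql_update() {
  curl -s -o /dev/null -w '%{http_code}\n' -X POST \
    -H 'Content-Type: application/sparql-update' \
    --data-binary "$1" \
    "${2:-http://127.0.0.1/ds/update}"
}
```

Insert the triple, then read it back:

sparql_update "$(insert_label_update http://example.org/a Example)"
curl -s -G 'http://127.0.0.1/ds/query' --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o }' -H 'Accept: text/csv'

A 2xx status means the update was accepted. A 401 means your Shiro rules protect the update endpoint, in which case add -u admin:<FUSEKI_ADMIN_PASSWORD> to the curl call.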

Step 13: Configure a New Dataset

To create additional datasets, open the dashboard's manage tab and click new dataset, then choose a name and a dataset type: Persistent (TDB2) for durable storage on the data disk, or In-memory for scratch. Persistent datasets are written under /var/lib/fuseki/databases/. You can also drop a .ttl assembler file into /var/lib/fuseki/configuration/ and restart the service:

sudo systemctl restart fuseki.service
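For the assembler route, the function below prints a minimal Fuseki service description for a TDB2 dataset with query, update, and read-write Graph Store endpoints. Treat it as a starting point under assumptions rather than the configuration the image ships for /ds. The function name and the example dataset name demo are hypothetical, and the endpoint names (query, update, data) are chosen to mirror the /ds URLs used in this guide.

```shell
# tdb2_assembler NAME [DB_ROOT]
# Prints a Fuseki assembler (Turtle) for a persistent TDB2 dataset called NAME.
# DB_ROOT defaults to the database directory on the data disk.
tdb2_assembler() {
  name=$1
  root=${2:-/var/lib/fuseki/databases}
  cat <<EOF
PREFIX :       <#>
PREFIX fuseki: <http://jena.apache.org/fuseki#>
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX tdb2:   <http://jena.apache.org/2016/tdb#>

:service rdf:type fuseki:Service ;
    fuseki:name "${name}" ;
    fuseki:endpoint [ fuseki:operation fuseki:query  ; fuseki:name "query" ] ;
    fuseki:endpoint [ fuseki:operation fuseki:update ; fuseki:name "update" ] ;
    fuseki:endpoint [ fuseki:operation fuseki:gsp-rw ; fuseki:name "data" ] ;
    fuseki:dataset :dataset .

:dataset rdf:type tdb2:DatasetTDB2 ;
    tdb2:location "${root}/${name}" .
EOF
}
```

Write the file as the fuseki service user so the server can read it, then restart:

tdb2_assembler demo | sudo -u fuseki tee /var/lib/fuseki/configuration/demo.ttl > /dev/null
sudo systemctl restart fuseki.service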

Step 14: Tune the JVM Heap

Fuseki's heap is set in /etc/fuseki.env (JVM_ARGS=-Xms512m -Xmx1024m by default). For larger graphs increase the maximum heap and restart:

sudo sed -i 's/-Xmx1024m/-Xmx2048m/' /etc/fuseki.env
sudo systemctl restart fuseki.service
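The sed command above only matches the default -Xmx1024m value. If the heap has already been changed once, a pattern that matches any current -Xmx setting is more robust. The sketch below validates the new size first and uses sudo only when the file is not writable by the current user. The function name set_max_heap is hypothetical.

```shell
# set_max_heap SIZE [FILE]
# Rewrites the -Xmx setting in FILE (default /etc/fuseki.env) to -Xmx<SIZE>.
# SIZE must look like 2048m or 4g.
set_max_heap() {
  size=$1
  file=${2:-/etc/fuseki.env}
  if ! printf '%s' "$size" | grep -Eq '^[0-9]+[mMgG]$'; then
    echo "invalid heap size: $size" >&2
    return 1
  fi
  if [ -w "$file" ]; then
    sed -i -E "s/-Xmx[0-9]+[kKmMgG]?/-Xmx${size}/" "$file"
  else
    # Root-owned file: run the same edit through sudo.
    sudo sed -i -E "s/-Xmx[0-9]+[kKmMgG]?/-Xmx${size}/" "$file"
  fi
}
```

For example:

set_max_heap 2048m && sudo systemctl restart fuseki.service

TDB2 reads its database through memory-mapped files that live outside the Java heap, so leave a good share of the VM's RAM unallocated rather than giving it all to -Xmx.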

Step 15: Logs and Troubleshooting

Fuseki logs to the systemd journal. Tail the service logs and check the listening socket:

sudo journalctl -u fuseki.service --no-pager | tail -30
sudo ss -tln | grep -E ':80 |:3030 '

The Shiro auth configuration is at /var/lib/fuseki/shiro.ini; the dataset assembler files are in /var/lib/fuseki/configuration/. If a query fails with a 500, check the journal for the underlying SPARQL parse or TDB2 error.

Security

The admin UI and all /$/ server-management endpoints require the per-VM admin password (Apache Shiro, authcBasic). The /$/ping liveness endpoint and the nginx /health endpoint are intentionally anonymous so load balancers can probe them. The read query path on /ds is open by default for convenience. If your deployment requires authenticated reads, tighten the [urls] section of /var/lib/fuseki/shiro.ini to require admin on /ds/** and restart the service. Fuseki serves plain HTTP on port 80; terminate TLS upstream (Application Gateway, Front Door, or a TLS-terminating reverse proxy) and restrict the NSG to known client CIDRs before exposing it to the internet. The per-VM password is generated automatically on the first boot of every VM, so no two deployments share a credential.

Support

cloudimg provides 24/7 support for this image. Email support@cloudimg.co.uk. The $0.04 per vCPU hour cloudimg charge covers packaging, security patching, image maintenance, and expert support. Apache Jena Fuseki is licensed under the Apache License 2.0 and is free; the cloudimg charge is for the managed, patched, support-backed image.