
Weaviate on Ubuntu 24.04 on Azure User Guide

Product: Weaviate on Ubuntu 24.04 LTS on Azure

Overview

Weaviate is an open source vector database for AI applications. It stores data objects together with their vector embeddings and runs fast semantic, keyword and hybrid search over them through REST and GraphQL APIs, which makes it a strong backend for retrieval-augmented generation (RAG), recommendation and similarity search. The cloudimg image installs the official Weaviate 1.38 release binary at /opt/weaviate and runs it under an unprivileged weaviate system account. An nginx reverse proxy on TCP port 80 fronts the server, all data persists on a dedicated Azure data disk, and a unique API key is generated on the first boot of every VM. The image is backed by 24/7 cloudimg support.

What is included:

  • Weaviate 1.38 server binary at /opt/weaviate, run by an unprivileged weaviate account
  • nginx reverse proxy on :80 in front of the Weaviate server (bound to loopback :8080, REST + GraphQL)
  • A dedicated Azure data disk at /var/lib/weaviate (PERSISTENCE_DATA_PATH) holding stored objects, vector indexes and metadata, separate from the OS disk; every new VM gets its own data disk
  • Secure by default: anonymous access disabled, API-key auth enabled, with a per-VM key (WEAVIATE_API_KEY) generated at first boot in a root-only file
  • weaviate.service + nginx.service as systemd units, enabled and active
  • 24/7 cloudimg support

The image ships no embedding model and is CPU only — bring your own vectors or configure an external vectorizer.

Prerequisites

You need an active Azure subscription, an SSH key pair, and a VNet and subnet in the target region. Standard_B2s (2 vCPU, 4 GiB RAM) is a good starting point; scale up for larger collections or higher query volume. In the network security group (NSG), allow inbound 22/tcp from your management network and 80/tcp from the clients that query Weaviate. If Weaviate will be reachable from the internet, put TLS in front of port 80 (see Enabling HTTPS).

Step 1 — Deploy from the Azure Marketplace

Sign in to the Azure Portal, choose Create a resource, search the Marketplace for Weaviate by cloudimg, and select Create. On the Basics tab pick your subscription, resource group, region and size; under Administrator account choose SSH public key and paste your key; under Inbound port rules allow SSH (22) and HTTP (80). Review the dedicated data disk on the Disks tab, then select Review + create, and finally Create.

Step 2 — Deploy from the Azure CLI

az vm create \
  --resource-group <your-rg> \
  --name weaviate \
  --image <marketplace-image-urn> \
  --size Standard_B2s \
  --admin-username azureuser \
  --ssh-key-values ~/.ssh/id_ed25519.pub \
  --vnet-name <your-vnet> --subnet <your-subnet> \
  --public-ip-sku Standard

az vm open-port --resource-group <your-rg> --name weaviate --port 80 --priority 1010
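If only known clients should reach port 80, a narrower NSG rule can take the place of the az vm open-port command above (if you already ran open-port, remove the rule it created, or the narrower rule changes nothing). The sketch below is a hypothetical helper, restrict_http, wrapping az network nsg rule create. It assumes the NSG is named weaviateNSG, the name az vm create typically derives from a VM called weaviate (confirm with az network nsg list), and uses an illustrative rule name, allow-http-clients, and priority 1020.

```shell
#!/bin/sh
# Hypothetical helper: allow inbound 80/tcp only from one client CIDR.
# Arguments: resource group, source CIDR, NSG name (assumed default: weaviateNSG).
restrict_http() {
  local rg="$1" cidr="$2" nsg="${3:-weaviateNSG}"
  az network nsg rule create \
    --resource-group "$rg" \
    --nsg-name "$nsg" \
    --name allow-http-clients \
    --priority 1020 \
    --direction Inbound --access Allow --protocol Tcp \
    --source-address-prefixes "$cidr" \
    --destination-port-ranges 80
}
```

For example, restrict_http <your-rg> 203.0.113.0/24 admits only that network on port 80.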

Step 3 — Connect to your VM

ssh azureuser@<vm-public-ip>
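If you did not note the public IP during deployment, you can look it up with the Azure CLI. vm_public_ip is a hypothetical helper; it relies on az vm show --show-details, which adds runtime fields such as publicIps to the VM output.

```shell
#!/bin/sh
# Hypothetical helper: print a VM's public IP address.
# Arguments: resource group, VM name (default "weaviate", as in Step 2).
vm_public_ip() {
  local rg="$1" vm="${2:-weaviate}"
  # --show-details includes publicIps; --output tsv prints the bare address.
  az vm show --show-details --resource-group "$rg" --name "$vm" \
    --query publicIps --output tsv
}
```

For example: ssh "azureuser@$(vm_public_ip <your-rg>)".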

Step 4 — Confirm the services are running

systemctl is-active weaviate.service nginx.service
ss -tln | grep -E ':80 |:8080 '
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1/v1/.well-known/ready

Both services report active, the server listens on loopback 127.0.0.1:8080 and nginx on :80, and the readiness endpoint returns 200.

Screenshot: Weaviate running on the cloudimg Azure image, with both services active, readiness returning 200 and API-key auth enforced.
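Right after a deployment or reboot the readiness check may briefly fail while Weaviate starts. A minimal polling sketch, assuming curl is installed (the commands above already use it); wait_for_weaviate and its attempts argument are illustrative names, not part of the image.

```shell
#!/bin/sh
# Hypothetical helper: poll the readiness endpoint until it returns HTTP 200.
# Arguments: URL (default: local endpoint via nginx), number of attempts (default 60).
wait_for_weaviate() {
  local url="${1:-http://127.0.0.1/v1/.well-known/ready}" attempts="${2:-60}" i=0 code=""
  while [ "$i" -lt "$attempts" ]; do
    # -o discards the body and -w prints only the status code ("000" if nothing answers);
    # --max-time stops a single hung attempt from stalling the loop.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$url" 2>/dev/null)
    if [ "$code" = "200" ]; then
      echo "ready"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not ready after $attempts attempts (last status: ${code:-none})" >&2
  return 1
}
```

Calling wait_for_weaviate with no arguments waits up to about a minute on the VM itself; it returns non-zero if the server never reports ready, so it can gate later steps in a script.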

Step 5 — Retrieve your API key

The API key is generated uniquely on the first boot of your VM and written to a root-only file:

sudo cat /root/weaviate-credentials.txt

The WEAVIATE_API_KEY value is the key; callers authenticate with the standard Authorization: Bearer <api-key> header.
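To use the key from your workstation rather than on the VM, you can read it over SSH. A sketch with two hypothetical helpers: parse_api_key extracts the WEAVIATE_API_KEY value from credentials text, and fetch_api_key runs the same sudo cat on the VM. It assumes the file holds plain KEY=value lines with an unquoted value.

```shell
#!/bin/sh
# Hypothetical helper: print the WEAVIATE_API_KEY value from credentials text on stdin.
parse_api_key() {
  # Strip the "WEAVIATE_API_KEY=" prefix and print only matching lines; keep the first.
  sed -n 's/^WEAVIATE_API_KEY=//p' | head -n 1
}

# Hypothetical helper: fetch the key from a VM over SSH.
# Argument: SSH destination, e.g. azureuser@<vm-public-ip>.
fetch_api_key() {
  ssh "$1" 'sudo cat /root/weaviate-credentials.txt' | parse_api_key
}
```

For example: KEY=$(fetch_api_key azureuser@<vm-public-ip>).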

Step 6 — Call the API

The readiness probe is open; data APIs require the key as a Bearer token. Confirm the server version:

KEY=$(sudo grep '^WEAVIATE_API_KEY=' /root/weaviate-credentials.txt | cut -d= -f2-)
curl -s -H "Authorization: Bearer $KEY" http://127.0.0.1/v1/meta | python3 -c "import sys,json; print('version:', json.load(sys.stdin)['version'])"

A request without the key is rejected with 401:

curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1/v1/schema
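Repeating the Authorization header on every call gets tedious. A sketch of a small hypothetical wrapper, wv, that adds it for you; WEAVIATE_URL is an illustrative variable that defaults to the local nginx front end, and KEY is the variable set above.

```shell
#!/bin/sh
# Hypothetical helper: send an authenticated request to the Weaviate REST API.
# Arguments: HTTP method, API path, then any extra curl arguments (e.g. -d '{...}').
wv() {
  local method="$1" path="$2"
  shift 2
  # ${KEY:?...} aborts with a message if the API key has not been set.
  curl -s -X "$method" "${WEAVIATE_URL:-http://127.0.0.1}${path}" \
    -H "Authorization: Bearer ${KEY:?set KEY to your WEAVIATE_API_KEY first}" \
    -H 'Content-Type: application/json' \
    "$@"
}
```

For example, wv GET /v1/schema lists your collections; from a workstation, set WEAVIATE_URL=http://<vm-public-ip> first.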

Step 7 — Create a collection and add objects

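Point a Weaviate client library at the VM, or call the REST API directly. Create a collection (class), then add objects with your own vectors. The commands below run from your workstation: replace <vm-public-ip>, and set KEY to the WEAVIATE_API_KEY value from Step 5 first.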

curl http://<vm-public-ip>/v1/schema -H "Authorization: Bearer $KEY" \
  -H 'Content-Type: application/json' \
  -d '{"class":"Article","vectorizer":"none"}'

curl http://<vm-public-ip>/v1/objects -H "Authorization: Bearer $KEY" \
  -H 'Content-Type: application/json' \
  -d '{"class":"Article","properties":{"title":"Hello"},"vector":[0.1,0.2,0.3]}'

Query with GraphQL at /v1/graphql, using nearVector for semantic search or hybrid for combined keyword and vector search; with "vectorizer":"none" you supply the query vector yourself. Collections you create persist on the dedicated data disk at /var/lib/weaviate.
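Building the GraphQL body by hand is error-prone because the query sits inside a JSON string. A sketch of a hypothetical helper, near_vector_body, that prints a nearVector request for the Article collection created above. The query vector must have the same length as the stored vectors (three in this example); returning title with the distance and defaulting the limit to 3 are illustrative choices.

```shell
#!/bin/sh
# Hypothetical helper: print a GraphQL nearVector request body for the Article collection.
# Arguments: comma-separated query vector, result limit (default 3).
near_vector_body() {
  local vector="$1" limit="${2:-3}"
  printf '{"query":"{ Get { Article(nearVector: {vector: [%s]}, limit: %s) { title _additional { distance } } } }"}\n' \
    "$vector" "$limit"
}

# Example (not run here):
#   curl http://<vm-public-ip>/v1/graphql -H "Authorization: Bearer $KEY" \
#     -H 'Content-Type: application/json' -d "$(near_vector_body 0.1,0.2,0.3)"
```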

Production notes

  • Configure a vectorizer/generative module and its provider key in /etc/weaviate/weaviate.env, then sudo systemctl restart weaviate.service.
  • The gRPC API is available on port 50051 for high-throughput clients; open it in the NSG if needed.
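As an illustration, lines like these in /etc/weaviate/weaviate.env would enable OpenAI-backed vectorizer and generative modules. ENABLE_MODULES, DEFAULT_VECTORIZER_MODULE and OPENAI_APIKEY are environment variables from Weaviate's module configuration; check that they match the modules and version you run. That the file uses plain KEY=value lines is an assumption about this image, and sk-REPLACE_ME is a placeholder.

```shell
# Excerpt for /etc/weaviate/weaviate.env (assumed KEY=value format).
# Leave the existing authentication settings in the file unchanged.
ENABLE_MODULES=text2vec-openai,generative-openai
# Collections created without an explicit vectorizer use this module.
DEFAULT_VECTORIZER_MODULE=text2vec-openai
# Provider key used by the OpenAI modules (placeholder value).
OPENAI_APIKEY=sk-REPLACE_ME
```

Collections already created with "vectorizer":"none" keep using your own vectors.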

Enabling HTTPS

For production, terminate TLS at nginx with a real domain pointed at the VM's public IP. Install certbot and request a certificate (replace the domain):

sudo apt-get update && sudo apt-get install -y certbot python3-certbot-nginx
sudo certbot --nginx -d your-domain.example.com

certbot edits the nginx site at /etc/nginx/sites-available/cloudimg-weaviate to add the TLS listener and arranges automatic renewal.

Backup and maintenance

All Weaviate data (stored objects, vector indexes and metadata) lives on the dedicated data disk at /var/lib/weaviate. To back up your vector store, snapshot that disk in Azure or configure Weaviate's backup module. The API key the server accepts is set in /etc/weaviate/weaviate.env (AUTHENTICATION_APIKEY_ALLOWED_KEYS). Keep the OS patched with sudo apt update && sudo apt upgrade. Restart Weaviate with sudo systemctl restart weaviate.service and read its logs with sudo journalctl -u weaviate.service.
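A sketch of a disk snapshot from the Azure CLI, assuming the Weaviate data disk is the VM's first (and only) data disk; snapshot_name and snapshot_weaviate_disk are hypothetical helpers. A snapshot of a running disk is crash-consistent, so you may prefer to stop weaviate.service while az snapshot create runs if you can afford the pause.

```shell
#!/bin/sh
# Hypothetical helper: build a snapshot name like weaviate-data-20250101-120000 (UTC).
snapshot_name() {
  printf '%s-%s\n' "${1:-weaviate-data}" "$(date -u +%Y%m%d-%H%M%S)"
}

# Hypothetical helper: snapshot the VM's first data disk.
# Arguments: resource group, VM name (default "weaviate").
snapshot_weaviate_disk() {
  local rg="$1" vm="${2:-weaviate}" disk_id
  disk_id=$(az vm show --resource-group "$rg" --name "$vm" \
    --query 'storageProfile.dataDisks[0].managedDisk.id' --output tsv)
  if [ -z "$disk_id" ]; then
    echo "no data disk found on VM $vm" >&2
    return 1
  fi
  az snapshot create --resource-group "$rg" --name "$(snapshot_name)" --source "$disk_id"
}
```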

Support

This image is backed by 24/7 cloudimg support. Contact us by email and chat for help with schema and collection design, vectorizer and module configuration, backups, TLS termination and scaling.

All product and company names are trademarks or registered trademarks of their respective holders. Use of them does not imply any affiliation with or endorsement by them.