Weaviate on Ubuntu 24.04 on Azure User Guide
Overview
Weaviate is an open-source vector database for AI workloads. It stores data objects together with their vector embeddings and runs fast semantic, keyword and hybrid search over them through GraphQL and REST APIs, which makes it a strong backend for retrieval-augmented generation (RAG), recommendation and similarity search. The cloudimg image installs Weaviate 1.38 from the official release binary at /opt/weaviate and runs it under an unprivileged weaviate system account. It fronts the server with an nginx reverse proxy on TCP 80, persists all data on a dedicated Azure data disk, and generates a unique API key on the first boot of every VM. The image is backed by 24/7 cloudimg support.
What is included:
- Weaviate 1.38 server binary at /opt/weaviate, run by an unprivileged weaviate account
- nginx reverse proxy on :80 in front of the Weaviate server (bound to loopback :8080, REST + GraphQL)
- A dedicated Azure data disk at /var/lib/weaviate holding stored objects, vector indexes and metadata (PERSISTENCE_DATA_PATH), separate from the OS disk and re-provisioned with every VM
- Secure by default: anonymous access disabled, API-key auth enabled, with a per-VM key (WEAVIATE_API_KEY) generated at first boot in a root-only file
- weaviate.service + nginx.service as systemd units, enabled and active
- 24/7 cloudimg support
The image ships no embedding model and is CPU only — bring your own vectors or configure an external vectorizer.
Prerequisites
An active Azure subscription, an SSH key pair, and a VNet + subnet in the target region. Standard_B2s (2 vCPU / 4 GiB RAM) is a good starting point; scale up for larger collections or higher query volume. NSG inbound: allow 22/tcp from your management network and 80/tcp from the clients that query Weaviate (front port 80 with TLS for public exposure — see Enabling HTTPS).
Step 1 — Deploy from the Azure Marketplace
Sign in to the Azure Portal, choose Create a resource, search the Marketplace for Weaviate by cloudimg, and select Create. On Basics pick your subscription, resource group, region and size; under Administrator account choose SSH public key and paste your key; under Inbound port rules allow SSH (22) and HTTP (80). Review the dedicated data disk on the Disks tab, then Review + create → Create.
Step 2 — Deploy from the Azure CLI
az vm create \
--resource-group <your-rg> \
--name weaviate \
--image <marketplace-image-urn> \
--size Standard_B2s \
--admin-username azureuser \
--ssh-key-values ~/.ssh/id_ed25519.pub \
--vnet-name <your-vnet> --subnet <your-subnet> \
--public-ip-sku Standard
az vm open-port --resource-group <your-rg> --name weaviate --port 80 --priority 1010
Step 3 — Connect to your VM
ssh azureuser@<vm-public-ip>
Step 4 — Confirm the services are running
systemctl is-active weaviate.service nginx.service
ss -tln | grep -E ':80 |:8080 '
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1/v1/.well-known/ready
Both services report active, the server listens on loopback 127.0.0.1:8080 and nginx on :80, and the readiness endpoint returns 200.
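When the check above is part of a provisioning script, it can help to poll the readiness endpoint until it answers 200 instead of testing it once. The following is a minimal sketch using only the Python standard library; is_ready and wait_until_ready are hypothetical helper names, and the attempt count and delay are arbitrary assumptions rather than values taken from the image.

```python
import time
import urllib.request


def is_ready(base_url, timeout=3.0):
    """Return True if GET /v1/.well-known/ready answers with HTTP 200."""
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError, so refused
        # connections, timeouts and non-2xx answers all land here.
        return False


def wait_until_ready(base_url, attempts=30, delay=2.0, probe=is_ready, sleep=time.sleep):
    """Poll the readiness probe up to `attempts` times (hypothetical helper)."""
    for _ in range(attempts):
        if probe(base_url):
            return True
        sleep(delay)
    return False


# Example call on the VM (not executed here):
#   wait_until_ready("http://127.0.0.1")
```

The probe and sleep parameters exist only so the loop can be exercised without a running server.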

Step 5 — Retrieve your API key
The API key is generated uniquely on the first boot of your VM and written to a root-only file:
sudo cat /root/weaviate-credentials.txt
The WEAVIATE_API_KEY value is the key; callers authenticate with the standard Authorization: Bearer <api-key> header.
Step 6 — Call the API
The readiness probe is open; data APIs require the key as a Bearer token. Confirm the server version:
KEY=$(sudo grep '^WEAVIATE_API_KEY=' /root/weaviate-credentials.txt | cut -d= -f2-)
curl -s -H "Authorization: Bearer $KEY" http://127.0.0.1/v1/meta | python3 -c "import sys,json; print('version:', json.load(sys.stdin)['version'])"
A request without the key is rejected with 401:
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1/v1/schema
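The same authenticated call can be made from a script. This sketch assumes the credentials file holds KEY=VALUE lines, as the grep command above implies, and uses only the Python standard library; parse_api_key and build_request are hypothetical helper names, not part of Weaviate or the image.

```python
import json
import urllib.request


def parse_api_key(text):
    """Extract the WEAVIATE_API_KEY value from KEY=VALUE lines."""
    for line in text.splitlines():
        if line.startswith("WEAVIATE_API_KEY="):
            # Split once so an '=' inside the key itself is preserved.
            return line.split("=", 1)[1].strip()
    raise ValueError("WEAVIATE_API_KEY not found")


def build_request(base_url, path, api_key, payload=None):
    """Build a request carrying the Bearer token; POST when a payload is given."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(base_url.rstrip("/") + path, data=data)
    req.add_header("Authorization", "Bearer " + api_key)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Example on the VM, run as root because the credentials file is root-only
# (not executed here):
#   key = parse_api_key(open("/root/weaviate-credentials.txt").read())
#   with urllib.request.urlopen(build_request("http://127.0.0.1", "/v1/meta", key)) as r:
#       print("version:", json.load(r)["version"])
```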
Step 7 — Create a collection and add objects
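Point a Weaviate client at the VM, or use REST. Create a collection (class), then add objects with your own vectors. Replace <vm-public-ip>; if you run these commands from another machine, first set KEY to the WEAVIATE_API_KEY value from Step 5: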
curl http://<vm-public-ip>/v1/schema -H "Authorization: Bearer $KEY" \
-H 'Content-Type: application/json' \
-d '{"class":"Article","vectorizer":"none"}'
curl http://<vm-public-ip>/v1/objects -H "Authorization: Bearer $KEY" \
-H 'Content-Type: application/json' \
-d '{"class":"Article","properties":{"title":"Hello"},"vector":[0.1,0.2,0.3]}'
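For more than a handful of objects, one request per object becomes slow. As a rough sketch, the payload below targets Weaviate's REST batch endpoint (/v1/batch/objects), which accepts a JSON body with an objects list; build_batch is a hypothetical helper, and the dimension check is an added safeguard on the assumption that every vector in a collection must have the same length.

```python
import json


def build_batch(collection, items):
    """Build a batch body from (properties, vector) pairs (hypothetical helper)."""
    objects = []
    dim = None
    for properties, vector in items:
        if dim is None:
            dim = len(vector)
        elif len(vector) != dim:
            raise ValueError("expected %d dimensions, got %d" % (dim, len(vector)))
        objects.append(
            {"class": collection, "properties": properties, "vector": list(vector)}
        )
    return {"objects": objects}


batch = build_batch(
    "Article",
    [({"title": "Hello"}, [0.1, 0.2, 0.3]), ({"title": "World"}, [0.3, 0.2, 0.1])],
)
# Serialize and POST this to /v1/batch/objects with the same
# Authorization and Content-Type headers as the curl calls above.
body = json.dumps(batch)
```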
Query with GraphQL at /v1/graphql, including nearVector semantic search and hybrid search. Collections you create persist on the dedicated data disk at /var/lib/weaviate.
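To illustrate the two query styles mentioned above, this sketch builds GraphQL request bodies for a nearVector search and a hybrid search against the Article collection from this step. The helper names are hypothetical, the alpha and limit defaults are arbitrary, and the query vector is assumed to have the same number of dimensions as the stored vectors (three in the example). Because the collection uses "vectorizer":"none", the hybrid query passes its own vector.

```python
import json
import urllib.request


def near_vector_body(collection, vector, properties, limit=3):
    """GraphQL body for a nearVector search; names are assumed to be trusted."""
    query = (
        "{ Get { %s(nearVector: {vector: %s}, limit: %d) "
        "{ %s _additional { distance } } } }"
        % (collection, json.dumps(vector), limit, " ".join(properties))
    )
    return {"query": query}


def hybrid_body(collection, text, vector, properties, alpha=0.5, limit=3):
    """GraphQL body for a hybrid (keyword + vector) search."""
    query = (
        "{ Get { %s(hybrid: {query: %s, vector: %s, alpha: %s}, limit: %d) "
        "{ %s _additional { score } } } }"
        % (collection, json.dumps(text), json.dumps(vector),
           json.dumps(alpha), limit, " ".join(properties))
    )
    return {"query": query}


def post_graphql(base_url, api_key, body, timeout=10.0):
    """POST a body to /v1/graphql with the Bearer token and return parsed JSON."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/graphql",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": "Bearer " + api_key,
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Example (not executed here):
#   post_graphql("http://<vm-public-ip>", key,
#                near_vector_body("Article", [0.1, 0.2, 0.3], ["title"]))
```

The collection and property names are interpolated directly into the query text, so this approach suits fixed, trusted names rather than user input.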
Production notes
- Configure a vectorizer/generative module and its provider key in /etc/weaviate/weaviate.env, then sudo systemctl restart weaviate.service.
- The gRPC API is available on port 50051 for high-throughput clients; open it in the NSG if needed.
Enabling HTTPS
For production, terminate TLS at nginx with a real domain pointed at the VM's public IP. Install certbot and request a certificate (replace the domain):
sudo apt-get update && sudo apt-get install -y certbot python3-certbot-nginx
sudo certbot --nginx -d your-domain.example.com
certbot edits the nginx site at /etc/nginx/sites-available/cloudimg-weaviate to add the TLS listener and arranges automatic renewal.
Backup and maintenance
All Weaviate data — stored objects, vector indexes and metadata — lives on the dedicated data disk at /var/lib/weaviate. Snapshot that disk in Azure to back up your vector store, or configure Weaviate's backup module. The API key is in /etc/weaviate/weaviate.env (AUTHENTICATION_APIKEY_ALLOWED_KEYS). Keep the OS patched with sudo apt update && sudo apt upgrade. Restart with sudo systemctl restart weaviate.service; logs: sudo journalctl -u weaviate.service.
Support
This image is backed by 24/7 cloudimg support. Contact us by email and chat for help with schema and collection design, vectorizer and module configuration, backups, TLS termination and scaling.
All product and company names are trademarks or registered trademarks of their respective holders. Use of them does not imply any affiliation with or endorsement by them.