Application Development Azure

LocalAI on Ubuntu 24.04 on Azure User Guide

| Product: LocalAI on Ubuntu 24.04 LTS on Azure

Overview

LocalAI is a free, open source, self hosted drop in replacement for the OpenAI API. It exposes the same OpenAI compatible REST API for chat completions, completions and embeddings, but runs entirely on your own infrastructure with no external API calls and no GPU required, using the llama.cpp GGUF backend for inference on commodity CPU. The cloudimg image installs LocalAI 4.4.3 as the single local-ai binary at /usr/local/bin/local-ai, runs it as a dedicated localai system user bound to loopback behind an nginx reverse proxy on TCP 80 with HTTP Basic auth, stores all models on a dedicated Azure data disk, pre pulls a small instruct model so the API and the web UI work the moment the VM boots, and generates a unique web password on the first boot of every VM. Backed by 24/7 cloudimg support.

What is included:

  • LocalAI 4.4.3 at /usr/local/bin/local-ai
  • The built in web UI and the OpenAI compatible REST API (/v1/chat/completions, /v1/embeddings, /v1/models and more), fronted by nginx on :80
  • nginx HTTP Basic auth (LocalAI has no built in authentication) with a per VM password in a root only file
  • A small CPU friendly instruct model (smollm2-135m-instruct) pre pulled so the chat API and the web UI work out of the box with no internet access
  • The CPU llama.cpp inference backend pre pulled, so inference runs offline with no GPU
  • A dedicated Azure data disk at /var/lib/localai holding the models directory, separate from the OS disk and re provisioned with every VM
  • localai.service + nginx.service as systemd units, enabled and active
  • 24/7 cloudimg support

Prerequisites

An active Azure subscription, an SSH key pair, and a VNet + subnet in the target region. Standard_B4ms (4 vCPU / 16 GiB RAM) is a good starting point for CPU inference; scale up for larger models or higher throughput. NSG inbound: allow 22/tcp from your management network and 80/tcp for the web UI and API (front with TLS for public exposure, see Enabling HTTPS).

Step 1 - Deploy from the Azure Marketplace

Sign in to the Azure Portal, choose Create a resource, search the Marketplace for LocalAI by cloudimg, and select Create. On Basics pick your subscription, resource group, region and size; under Administrator account choose SSH public key and paste your key; under Inbound port rules allow SSH (22) and HTTP (80). Review the dedicated data disk on the Disks tab, then Review + create then Create.

Step 2 - Deploy from the Azure CLI

az vm create \
  --resource-group <your-rg> \
  --name localai \
  --image <marketplace-image-urn> \
  --size Standard_B4ms \
  --admin-username azureuser \
  --ssh-key-values ~/.ssh/id_ed25519.pub \
  --vnet-name <your-vnet> --subnet <your-subnet> \
  --public-ip-sku Standard

az vm open-port --resource-group <your-rg> --name localai --port 80 --priority 1010

Step 3 - Connect to your VM

ssh azureuser@<vm-public-ip>

Step 4 - Confirm the services are running

systemctl is-active localai.service nginx.service

Both services report active. LocalAI loads the pre pulled model on first boot and is ready to serve the OpenAI compatible API immediately.

Step 5 - Retrieve your web password

The admin password is generated uniquely on the first boot of your VM and written to a root only file:

sudo cat /root/localai-credentials.txt

This file contains localai.user (admin) and localai.password, plus the URL for the web UI. Store the password somewhere safe.

Step 6 - Check the health endpoint

nginx serves an unauthenticated health endpoint for load balancers and probes:

curl -s http://localhost/health

It returns ok.

Step 7 - Open the web UI

Browse to http://<vm-public-ip>/ and sign in as admin with the password from Step 5. The home page shows system status, the pre pulled model loaded and ready, and a chat box. Use the left nav to reach Chat, Install Models, Studio and Talk.

LocalAI web UI home dashboard showing the loaded model and system status

Open Chat in the left nav, pick the smollm2-135m-instruct model, type a message and send it. The model runs on the VM CPU and streams its reply:

LocalAI chat playground showing a real CPU response from the pre pulled model

Step 8 - Call the OpenAI compatible API

The OpenAI compatible REST API is served on the same port 80 behind the same Basic auth. List the available models:

curl -s -u admin:<LOCALAI_ADMIN_PASSWORD> http://localhost/v1/models | jq .

You get a JSON response whose data array includes the pre pulled smollm2-135m-instruct model. A request without credentials returns 401.

Run a chat completion against the pre pulled model:

curl -s -u admin:<LOCALAI_ADMIN_PASSWORD> http://localhost/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"smollm2-135m-instruct","messages":[{"role":"user","content":"In one sentence, what is Azure?"}],"max_tokens":64,"temperature":0.2}' | jq -r '.choices[0].message.content'

The response is generated by the model running locally on the VM CPU. Point any OpenAI client library or tool at http://<vm-public-ip>/v1 with HTTP Basic credentials and change only the base URL to run existing applications against models on your own hardware.

Step 9 - Generate embeddings

LocalAI also serves the OpenAI compatible embeddings endpoint. Embedding models are installed from the gallery (see Step 10); once an embedding model such as bert-embeddings is installed, request a vector with:

curl -s -u admin:<LOCALAI_ADMIN_PASSWORD> http://localhost/v1/embeddings \
  -H 'Content-Type: application/json' \
  -d '{"model":"bert-embeddings","input":"The quick brown fox"}' | jq '.data[0].embedding | length'

The response carries the embedding vector under data[0].embedding.

Step 10 - Install more models from the gallery

The image ships one small model so the API works out of the box. LocalAI's gallery offers hundreds more. In the web UI, open Install Models in the left nav, search the gallery, and click install on any model:

LocalAI model gallery showing installable models with backend and size

You can also install a model from the command line with the model apply API:

curl -s -u admin:<LOCALAI_ADMIN_PASSWORD> http://localhost/models/apply \
  -H 'Content-Type: application/json' \
  -d '{"id":"localai@qwen2.5-0.5b-instruct"}'

Installed models are downloaded into the models directory on the dedicated data disk and appear in /v1/models once ready. Larger models need more RAM and CPU, so choose a VM size to match.

LocalAI Install Models gallery view with the available model catalogue

Step 11 - Confirm models live on the dedicated disk

All models are stored on the dedicated Azure data disk so they survive OS changes and can be resized independently:

findmnt /var/lib/localai

The mount is backed by a separate Azure data disk captured into the image and re provisioned on every VM. The pre pulled model and the CPU inference backend live under /var/lib/localai/models and /var/lib/localai/backends.

Enabling HTTPS

The nginx reverse proxy terminates plain HTTP on port 80. For public exposure, put a certificate in front of it. The simplest path is to add a DNS name for the VM and use the companion cloudimg nginx-ssl-certbot image as a TLS reverse proxy, or install certbot and extend the existing nginx site with a listen 443 ssl; server block and your certificate paths. Keep LocalAI itself bound to loopback so the only public surface is the authenticated, TLS terminated proxy.

Maintenance

  • Configuration: the runtime settings (models path, backends path, loopback address, context size, threads) are set in /etc/localai/localai.env. Edit it and sudo systemctl restart localai to apply changes.
  • Backups: snapshot the /var/lib/localai data disk to preserve your installed models.
  • Upgrades: replace /usr/local/bin/local-ai with a newer release and restart the service.
  • Security patches: unattended-upgrades remains enabled so the OS continues to receive security updates automatically.

Support

cloudimg provides 24/7 expert support for this image. Contact support@cloudimg.co.uk.

LocalAI is a trademark of its respective owner. This image is produced by cloudimg and is not affiliated with or endorsed by the LocalAI project. LocalAI is distributed under the MIT License.