Ollama preinstalled for AWS with NVIDIA GPU acceleration. The easiest way to run open large language models (Llama, Mistral, Gemma, Qwen, DeepSeek) on Ubuntu 24.04 behind an nginx reverse proxy, gated by a unique password generated on first boot. Backed by 24/7 cloudimg support.
## Ollama by cloudimg
Ollama is the easiest way to run open large language models locally. It downloads, quantizes and serves models such as Llama, Mistral, Gemma, Phi, Qwen and DeepSeek with a single command, exposing a REST API that is also OpenAI chat-completions compatible. This Amazon Machine Image delivers Ollama fully installed as a system service on an NVIDIA GPU instance, so a private, self-hosted LLM endpoint is running within minutes of launch. The release available is Ollama 0.30.
## GPU Accelerated
The NVIDIA datacenter driver is preinstalled and verified on real hardware, and Ollama auto-detects the GPU to offload model inference. Launch on a g4dn, g5 or g6 instance and your models run on the GPU out of the box.
## Secure First Boot
Ollama ships with no built-in authentication, so access is gated by HTTP Basic Authentication at an nginx reverse proxy, with a unique password generated for every instance on first boot and written to a root only file. No shared or default credentials ship in the image.
## Ready To Use
Pull a model, chat from the CLI, or call the REST and OpenAI-compatible endpoints from LangChain, LlamaIndex or any OpenAI SDK. A small starter model is pre-pulled and model weights live on a dedicated, resizable volume.
## cloudimg Support
cloudimg provides 24/7 technical support for this image, covering deployment, model selection, GPU sizing, quantization, the OpenAI-compatible API, TLS and scaling.