vLLM | Support by cloudimg

Machine Learning Free Trial Available

Overview

vLLM preinstalled for AWS with NVIDIA GPU acceleration. The high-throughput OpenAI-compatible LLM inference and serving engine, on Ubuntu 24.04 behind an nginx reverse proxy, secured by a unique API key generated on first boot. Backed by 24/7 cloudimg support.

Description

## vLLM by cloudimg

vLLM is a high-throughput, memory-efficient inference and serving engine for large language models. Its PagedAttention scheduler delivers state-of-the-art serving throughput and it exposes an OpenAI-compatible REST API, so existing OpenAI SDK code works unchanged. This Amazon Machine Image delivers vLLM fully installed as a system service on an NVIDIA Ampere GPU instance, so a private, self-hosted LLM inference endpoint is running within minutes of launch. The release available is vLLM 0.22.

## GPU Accelerated

The NVIDIA datacenter driver and CUDA toolkit are preinstalled and verified on real hardware, and vLLM serves the model on the GPU out of the box. Launch on a g5 (A10G) or g6 (L4) Ampere+ instance.

## Secure First Boot

vLLM's native API-key authentication is enabled, with a unique key generated for every instance on first boot and written to a root only file. No shared or default key ships in the image.

## Ready To Use

Call the OpenAI-compatible endpoints from any OpenAI SDK, LangChain or LlamaIndex. A small open-weights model is pre-downloaded; serve a different model by editing the service environment file.

## cloudimg Support

cloudimg provides 24/7 technical support for this image, covering deployment, model selection, GPU sizing, throughput tuning, the OpenAI-compatible API, TLS and scaling.

Key Features

  • vLLM, the high-throughput OpenAI-compatible LLM inference and serving engine (PagedAttention), preinstalled as a systemd service behind an nginx reverse proxy on Ubuntu 24.04
  • GPU accelerated on NVIDIA Ampere+ (g5/g6): driver and CUDA toolkit preinstalled and verified, model served on the GPU out of the box
  • Secure by default: native API-key authentication with a unique key generated for every instance on first boot, plus 24/7 cloudimg support

Related Technologies

vllm llm inference openai compatible pagedattention serving gpu ai cloudimg

Deploy on AWS

Launch this pre-configured AMI on AWS with 24/7 support from cloudimg.

View on AWS Marketplace

24/7 Support Included

Email: support@cloudimg.co.uk

Phone: (+44) 0333 006 4730

Product Details

Category
Machine Learning
Support
24/7, 365 days/year
Platform
AWS (Amazon Web Services)
Last Updated
2026-06-09