Infinity, preinstalled for AWS with NVIDIA GPU acceleration: a high-throughput server for text embeddings and reranking with an OpenAI-compatible API. It runs on Ubuntu 24.04 behind an nginx reverse proxy and is secured by a unique password generated on first boot. Backed by 24/7 cloudimg support.
## Infinity Embeddings Server by cloudimg
Infinity is a high-throughput, low-latency server for text embedding and reranking models. It serves models such as BGE, GTE, E5 and Sentence Transformers with dynamic batching and exposes an OpenAI-compatible embeddings API, so existing OpenAI SDK code works once it is pointed at the instance's URL and credentials. This Amazon Machine Image delivers Infinity fully installed as a system service on an NVIDIA GPU instance, so a private, self-hosted embeddings endpoint is running within minutes of launch. This image ships Infinity 0.0.77.
## GPU Accelerated
The NVIDIA datacenter driver is preinstalled and verified on real hardware, and Infinity runs on the GPU for fast batched embedding generation out of the box. Launch on a g4dn, g5 or g6 instance.
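To confirm the driver and GPU from a shell on the instance, a minimal Python sketch can wrap `nvidia-smi`. It assumes `nvidia-smi` is on the `PATH` of the logged-in user; the helper names `parse_gpu_csv` and `query_gpus` are illustrative and not part of the image.

```python
import subprocess

# Standard nvidia-smi query flags; the output is one CSV line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.total",
    "--format=csv,noheader",
]


def parse_gpu_csv(output: str) -> list[dict[str, str]]:
    """Parse 'name, driver_version, memory.total' CSV lines into dicts."""
    gpus = []
    for line in output.strip().splitlines():
        if not line.strip():
            continue
        # Split from the right so a comma inside the GPU name cannot shift fields.
        name, driver, memory = (field.strip() for field in line.rsplit(",", 2))
        gpus.append({"name": name, "driver_version": driver, "memory_total": memory})
    return gpus


def query_gpus() -> list[dict[str, str]]:
    """Run nvidia-smi (assumed to be on PATH) and return the parsed GPU list."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` on the instance should return one entry per GPU. An empty list or a raised `CalledProcessError` suggests the driver is not visible to the current user.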
## Secure First Boot
Access is gated by HTTP Basic Authentication at an nginx reverse proxy, with a unique password generated for every instance on first boot and written to a root-only file. No shared or default credentials ship in the image.
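Clients must send the credentials with every request. As a minimal sketch, the header that HTTP Basic Authentication expects (RFC 7617) can be built with the standard library. The helper name `basic_auth_header` is illustrative, and the username and password are placeholders to be replaced with the values for your instance.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Return an Authorization header for HTTP Basic Authentication."""
    # Basic auth is base64("username:password"); it is an encoding, not encryption,
    # so it should only travel over a TLS-protected connection.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

The returned dictionary can be passed as extra headers to any HTTP client that talks to the proxy.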
## Ready To Use
Generate embeddings from the OpenAI SDK or the native API and feed them into a vector database such as Weaviate or Chroma for retrieval-augmented generation. A small open-weights model is pre-downloaded; serve a different embedding or reranking model by editing the service environment file.
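For clients that do not use the OpenAI SDK, the following sketch builds an OpenAI-style embeddings request with the Python standard library and parses the response. The `/embeddings` path, the base URL and the model ID are assumptions for illustration; check the route prefix and the model configured on your instance before relying on them.

```python
import json
import urllib.request


def build_embeddings_request(base_url, model, texts, headers=None):
    """Build (but do not send) a POST request in the OpenAI embeddings format."""
    payload = {"model": model, "input": list(texts)}
    return urllib.request.Request(
        # Assumed route; the instance may expose it under a different prefix.
        url=base_url.rstrip("/") + "/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **(headers or {})},
        method="POST",
    )


def parse_embeddings(body):
    """Extract vectors from an OpenAI-style response, ordered by input index."""
    items = json.loads(body)["data"]
    return [item["embedding"] for item in sorted(items, key=lambda d: d["index"])]


def embed(base_url, model, texts, headers=None):
    """Send the request and return one embedding vector per input text."""
    request = build_embeddings_request(base_url, model, texts, headers)
    with urllib.request.urlopen(request) as response:
        return parse_embeddings(response.read())
```

A call such as `embed("https://your-instance.example.com", "your-model-id", ["hello world"], headers=auth)` would return a list of vectors ready to insert into a vector database, where `auth` is an Authorization header dictionary for the instance and both the host and the model ID are placeholders.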
## cloudimg Support
cloudimg provides 24/7 technical support for this image, covering deployment, model selection, GPU sizing, throughput tuning, the OpenAI-compatible API, TLS and scaling.