Infinity Embeddings Server | by cloudimg

Machine Learning | Free Trial Available

Overview

Infinity, the high-throughput server for text embeddings and reranking with an OpenAI-compatible API, preinstalled for AWS with NVIDIA GPU acceleration. It runs on Ubuntu 24.04 behind an nginx reverse proxy and is secured by a unique password generated on first boot. Backed by 24/7 cloudimg support.

Description

## Infinity Embeddings Server by cloudimg

Infinity is a high-throughput, low-latency server for text embedding and reranking models. It serves models such as BGE, GTE, E5 and Sentence Transformers with dynamic batching and an OpenAI-compatible embeddings API, so existing OpenAI SDK code can be pointed at it with minimal changes. This Amazon Machine Image delivers Infinity fully installed as a system service on an NVIDIA GPU instance, so a private, self-hosted embeddings endpoint is running within minutes of launch. This image ships Infinity 0.0.77.
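
To illustrate what "OpenAI-compatible" means in practice, the following is a minimal sketch of the request body and response shape used by the OpenAI embeddings format. The helper names (`build_embeddings_payload`, `extract_embeddings`), the model name and the sample response are illustrative assumptions, not output from this image.

```python
import json


def build_embeddings_payload(model: str, texts: list[str]) -> bytes:
    """Serialize an OpenAI-style embeddings request body as UTF-8 JSON."""
    return json.dumps({"model": model, "input": texts}).encode("utf-8")


def extract_embeddings(response: dict) -> list[list[float]]:
    """Return the embedding vectors ordered by their 'index' field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


if __name__ == "__main__":
    # Hypothetical model name; use whichever model the service is configured with.
    body = build_embeddings_payload("example-embedding-model", ["hello", "world"])
    print(body.decode("utf-8"))

    # Hand-written sample in the OpenAI response shape, not real server output.
    sample_response = {
        "object": "list",
        "data": [
            {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
            {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
        ],
    }
    print(extract_embeddings(sample_response))
```

Sorting by `index` keeps each vector aligned with the input text it belongs to, even if a server were to return items out of order.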

## GPU Accelerated

The NVIDIA datacenter driver is preinstalled and verified on real hardware, and Infinity runs on the GPU for fast batched embedding generation out of the box. Launch on a g4dn, g5 or g6 instance.

## Secure First Boot

Access is gated by HTTP Basic Authentication at an nginx reverse proxy, with a unique password generated for every instance on first boot and written to a root-only file. No shared or default credentials ship in the image.
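
Because the proxy uses HTTP Basic Authentication, every client request needs an `Authorization` header. A minimal sketch with the Python standard library might look like the following. The URL, the `/embeddings` path, the username and the password are placeholders assumed for illustration; this listing does not state the username, the endpoint path or the location of the password file.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authentication header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_authenticated_request(
    url: str, username: str, password: str, body: bytes
) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request carrying Basic credentials."""
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": basic_auth_header(username, password),
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # All values below are placeholders, not defaults shipped in the image.
    request = build_authenticated_request(
        "https://infinity.example.com/embeddings",
        "example-user",
        "password-from-first-boot",
        b'{"model": "example-embedding-model", "input": ["hello"]}',
    )
    print(request.get_method(), request.full_url)
    # Sending would be: urllib.request.urlopen(request)
```

Basic credentials are only base64-encoded, not encrypted, so this kind of request is best sent over TLS.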

## Ready To Use

Generate embeddings from the OpenAI SDK or the native API and feed them into a vector database such as Weaviate or Chroma for retrieval-augmented generation. A small open-weights model is pre-downloaded; to serve a different embedding or reranking model, edit the service environment file.
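
The retrieval step that a vector database performs can be sketched in a few lines. The example below ranks stored document vectors against a query vector by cosine similarity. It is a plain-Python illustration of the idea, not the API of Weaviate or Chroma, and the function names and toy vectors are assumptions made for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vector: list[float],
    document_vectors: list[list[float]],
    top_k: int = 3,
) -> list[tuple[int, float]]:
    """Return (document index, score) pairs for the top_k most similar documents."""
    scored = [
        (index, cosine_similarity(query_vector, vector))
        for index, vector in enumerate(document_vectors)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 2-dimensional vectors; real embeddings have hundreds of dimensions.
    documents = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    for index, score in rank_by_similarity([1.0, 0.0], documents, top_k=2):
        print(index, round(score, 3))
```

In a retrieval-augmented generation pipeline, the text of the top-ranked documents is then passed to the language model as context.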

## cloudimg Support

cloudimg provides 24/7 technical support for this image, covering deployment, model selection, GPU sizing, throughput tuning, the OpenAI-compatible API, TLS and scaling.

Key Features

  • Infinity, the high-throughput GPU server for text embeddings and reranking with an OpenAI-compatible API, preinstalled as a systemd service behind an nginx reverse proxy on Ubuntu 24.04
  • GPU accelerated on NVIDIA-equipped g4dn, g5 and g6 instances: driver preinstalled and verified, embeddings generated on the GPU out of the box
  • Secure by default: HTTP Basic Authentication with a unique password generated for every instance on first boot, plus 24/7 cloudimg support

Related Technologies

infinity embeddings reranking rag openai compatible sentence transformers gpu ai cloudimg

Deploy on AWS

Launch this pre-configured AMI on AWS with 24/7 support from cloudimg.

24/7 Support Included

Email: support@cloudimg.co.uk

Phone: (+44) 0333 006 4730

Product Details

Category: Machine Learning
Support: 24/7, 365 days/year
Platform: AWS (Amazon Web Services)
Last Updated: 2026-06-09