Infinity Embeddings Server | by cloudimg

Name: Infinity Embeddings Server | by cloudimg
Brand: cloudimg
Availability: InStock

Machine Learning, Free Trial Available

Overview

Infinity, preinstalled on AWS with NVIDIA GPU acceleration: a high-throughput server for text embeddings and reranking with an OpenAI-compatible API. It runs on Ubuntu 24.04 behind an nginx reverse proxy and is secured by a unique password generated on first boot. Backed by 24/7 cloudimg support.

Description

## Infinity Embeddings Server by cloudimg

Infinity is a high-throughput, low-latency server for text embedding and reranking models. It serves models such as BGE, GTE, E5 and Sentence Transformers with dynamic batching and an OpenAI-compatible embeddings API, so existing OpenAI SDK code works once it is pointed at the instance. This Amazon Machine Image delivers Infinity fully installed as a system service on an NVIDIA GPU instance, so a private, self-hosted embeddings endpoint is running within minutes of launch. This image ships Infinity 0.0.77.
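As a rough sketch of what OpenAI compatibility means on the wire, the snippet below builds the JSON body of an embeddings request and reads the vectors back out of a response in the OpenAI format. The model name and the sample response are illustrative placeholders, not values confirmed for this image.

```python
import json


def build_embeddings_body(model: str, texts: list[str]) -> str:
    """Serialize an OpenAI-style embeddings request body."""
    return json.dumps({"model": model, "input": texts})


def extract_vectors(response_text: str) -> list[list[float]]:
    """Return the embeddings ordered by their 'index' field."""
    payload = json.loads(response_text)
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Hypothetical model id; substitute the model the service is configured to serve.
body = build_embeddings_body("example-embedding-model", ["hello", "world"])

# Hand-written sample in the OpenAI response shape (not real server output).
sample_response = json.dumps({
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
    "model": "example-embedding-model",
})
vectors = extract_vectors(sample_response)
```

Sorting by `index` keeps each vector aligned with its input text even if the entries arrive out of order.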

## GPU Accelerated

The NVIDIA datacenter driver is preinstalled and verified on real hardware, and Infinity runs on the GPU for fast batched embedding generation out of the box. Launch on a g4dn, g5 or g6 instance.

## Secure First Boot

Access is gated by HTTP Basic Authentication at an nginx reverse proxy, with a unique password generated for every instance on first boot and written to a root-only file. No shared or default credentials ship in the image.
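A minimal sketch of how a client might attach HTTP Basic credentials to a request, using only the Python standard library. The address, the `/embeddings` path, the username and the password below are placeholders chosen for illustration; the real values come from your instance, and the request is only constructed here, not sent.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Encode credentials as an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def authed_request(url: str, username: str, password: str,
                   body: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying Basic credentials."""
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": basic_auth_header(username, password),
            "Content-Type": "application/json",
        },
    )


# Placeholder address (documentation range), path and credentials.
req = authed_request("https://203.0.113.10/embeddings", "user", "pass", b"{}")
```

Passing `req` to `urllib.request.urlopen` would send it. Basic credentials are only base64-encoded, not encrypted, so they are best sent over TLS.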

## Ready To Use

Generate embeddings from the OpenAI SDK or the native API and feed them into a vector database such as Weaviate or Chroma for retrieval-augmented generation. A small open-weights model is pre-downloaded; serve a different embedding or reranking model by editing the service environment file.
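To illustrate the retrieval step that a vector database performs on those embeddings, here is a small sketch that ranks documents against a query by cosine similarity. The two-dimensional vectors are toy values chosen for readability, and a real vector database would use an approximate index rather than this brute-force scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: list[float],
                   doc_vecs: list[list[float]]) -> list[tuple[int, float]]:
    """Return (document index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embeddings returned by the server.
query = [1.0, 0.0]
docs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ranking = rank_documents(query, docs)
```

The top-ranked documents are the ones whose text would be passed to the language model as context.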

## cloudimg Support

cloudimg provides 24/7 technical support for this image, covering deployment, model selection, GPU sizing, throughput tuning, the OpenAI-compatible API, TLS and scaling.

Key Features

Infinity, the high-throughput GPU server for text embeddings and reranking with an OpenAI-compatible API, preinstalled as a systemd service behind an nginx reverse proxy on Ubuntu 24.04
GPU accelerated on AWS g4dn, g5 and g6 instances with NVIDIA GPUs: driver preinstalled and verified, embeddings generated on the GPU out of the box
Secure by default: HTTP Basic Authentication with a unique password generated for every instance on first boot, plus 24/7 cloudimg support

Related Technologies

infinity, embeddings, reranking, rag, openai compatible, sentence transformers, gpu, ai, cloudimg

Deploy on AWS

Launch this pre-configured AMI on AWS with 24/7 support from cloudimg.

24/7 Support Included

Email: support@cloudimg.co.uk

Phone: (+44) 0333 006 4730

Product Details

Category: Machine Learning
Support: 24/7, 365 days/year
Platform: AWS (Amazon Web Services)
Last Updated: 2026-06-09
