voyage-3.5-lite Embedding Model
Text embedding model optimized for general-purpose retrieval quality, latency, and cost for AI applications. 32K context length.
Product Description
Overview
Text embedding models are neural networks that transform texts into numerical vectors. They are a crucial building block for semantic search/retrieval systems and retrieval-augmented generation (RAG), and they largely determine retrieval quality. voyage-3.5-lite is a lightweight general-purpose embedding model optimized for latency and cost that outperforms OpenAI-v3-large by 6.34% on average across evaluated domains. Enabled by Matryoshka learning and quantization-aware training, voyage-3.5-lite supports smaller output dimensions as well as int8 and binary quantization, which dramatically reduce vector database costs with minimal impact on retrieval quality. Latency is 52 ms for a single query of at most 200 tokens, and throughput is 106M tokens per hour at $0.03 per 1M tokens on an ml.g6.xlarge instance. Learn more about voyage-3.5-lite here: https://blog.voyageai.com/2025/05/20/voyage-3-5/
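As a sketch of how embeddings power retrieval, the snippet below ranks documents by cosine similarity against a query vector. The vectors are random stand-ins for real voyage-3.5-lite outputs, and all names are illustrative; in practice the embeddings would come from the model itself.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in embeddings; in a real system these come from the embedding model.
doc_embs = rng.normal(size=(5, 256)).astype(np.float32)
# Simulate a query semantically close to document 2.
query_emb = doc_embs[2] + 0.1 * rng.normal(size=256).astype(np.float32)

def normalize(x):
    """Scale vectors to unit length so dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

scores = normalize(doc_embs) @ normalize(query_emb)
ranked = np.argsort(-scores)  # indices of documents, best match first
print(ranked[0])              # → 2, the document the query was derived from
```

Retrieval quality in a RAG system depends on how well these similarity scores reflect semantic relevance, which is what the embedding model is trained to achieve.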
Highlights
Optimized for latency and cost, outperforming OpenAI-v3-large by 6.34% on average. Compared with OpenAI-v3-large (float, 3072 dimensions), voyage-3.5-lite (int8, 2048 dimensions) reduces vector database costs by 83% while achieving higher retrieval quality.
Supports embeddings of 2048, 1024, 512, and 256 dimensions and offers multiple quantization options, including float (32-bit floating point), int8 (8-bit signed integer), uint8 (8-bit unsigned integer), binary (bit-packed int8), and ubinary (bit-packed uint8).
32K token context length, well-suited for applications on long documents. Latency is 52 ms for a single query of at most 200 tokens; throughput is 106M tokens per hour at $0.03 per 1M tokens on an ml.g6.xlarge instance.
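To illustrate why smaller dimensions and quantization cut vector database costs, the sketch below quantizes float32 embeddings to int8 and to bit-packed binary with NumPy. The embeddings are random stand-ins, and the quantization scheme is a generic illustration, not Voyage's exact algorithm; the storage ratios (4x for int8, 32x for binary) are what drive the savings.

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 2048)).astype(np.float32)  # 4 vectors, 2048 dims
emb /= np.linalg.norm(emb, axis=1, keepdims=True)    # unit-normalize

# int8: map each component from roughly [-1, 1] onto [-128, 127].
emb_int8 = np.clip(np.round(emb * 127), -128, 127).astype(np.int8)

# binary: keep only the sign of each component, then pack 8 dims per byte.
emb_binary = np.packbits((emb > 0).astype(np.uint8), axis=1)

print(emb.nbytes)         # float32: 4 * 2048 * 4 = 32768 bytes
print(emb_int8.nbytes)    # 4x smaller: 8192 bytes
print(emb_binary.nbytes)  # 32x smaller: 1024 bytes
```

Combining a smaller output dimension with int8 storage is what yields reductions like the 83% cited above relative to a float, 3072-dimension baseline.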