voyage-3.5-lite Embedding Model
Text embedding model optimized for general-purpose retrieval, low latency, and cost-efficient AI applications. Supports a 32K-token context length.
Product Description
Overview:
Text embedding models convert text into numerical vectors and are a key component in semantic search, retrieval systems, and retrieval-augmented generation (RAG) pipelines, where they largely determine retrieval quality. voyage-3.5-lite is a lightweight general-purpose embedding model optimized for latency and cost, outperforming OpenAI-v3-large by an average of 6.34% across evaluated domains. Trained with Matryoshka learning and quantization-aware training, it supports smaller embedding dimensions and int8/binary quantization, significantly lowering vector database costs with minimal impact on retrieval quality. It delivers 52 ms latency per query (up to 200 tokens) and processes 106M tokens per hour at $0.03 per 1M tokens on an ml.g6.xlarge instance. Learn more: https://blog.voyageai.com/2025/05/20/voyage-3-5/
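As an illustration of how the model slots into a retrieval pipeline, here is a minimal sketch using the voyageai Python client against the hosted API (a SageMaker deployment would invoke the endpoint instead). The sample texts are ours, and the sketch assumes VOYAGE_API_KEY is set in the environment.

```python
# Minimal retrieval sketch with the voyageai Python client.
# Assumes `pip install voyageai` and VOYAGE_API_KEY in the environment;
# a SageMaker deployment would call the deployed endpoint instead.
import voyageai

vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

documents = [
    "Embedding models map text to vectors for semantic search.",
    "Quantization stores vectors in int8 or binary form to cut storage costs.",
]

# Embed documents and a query; input_type lets the model specialize.
doc_embs = vo.embed(
    documents, model="voyage-3.5-lite", input_type="document"
).embeddings
query_emb = vo.embed(
    ["How do I lower vector database costs?"],
    model="voyage-3.5-lite", input_type="query",
).embeddings[0]

# Rank documents by dot product; Voyage embeddings are normalized to unit
# length, so dot product matches cosine similarity.
scores = [sum(q * d for q, d in zip(query_emb, doc)) for doc in doc_embs]
best = max(range(len(documents)), key=scores.__getitem__)
print(documents[best])
```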
Highlights:
Optimized for latency and cost, outperforming OpenAI-v3-large by an average of 6.34%. Compared to OpenAI-v3-large (float, 3072 dimensions), voyage-3.5-lite (int8, 2048 dimensions) reduces vector database costs by 83% while delivering higher retrieval quality (2,048 int8 values take 2,048 bytes per vector versus 12,288 bytes for 3,072 floats).
Supports embedding dimensions of 2048, 1024, 512, and 256 with multiple quantization options: float (32-bit), int8, uint8, binary, and ubinary (see the sketch after this list).
Offers a 32K-token context length, ideal for long-document applications. Achieves 52 ms latency per query (≤200 tokens) and processes up to 106M tokens/hour at $0.03 per 1M tokens on an ml.g6.xlarge instance.
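To request the smaller dimensions and quantized formats listed above, the Voyage embed endpoint exposes output_dimension and output_dtype parameters. The sketch below reflects our reading of the API documentation, so treat the parameter names as assumptions if your client version differs; it also reproduces the storage arithmetic behind the 83% figure in the first highlight.

```python
# Sketch of requesting reduced-dimension, quantized embeddings.
# Parameter names (output_dimension, output_dtype) follow the Voyage API
# docs; treat them as assumptions if your client version differs.
import voyageai

vo = voyageai.Client()

result = vo.embed(
    ["Quantized embeddings shrink vector database footprints."],
    model="voyage-3.5-lite",
    output_dimension=2048,   # also supports 1024, 512, 256
    output_dtype="int8",     # or "uint8", "binary", "ubinary", "float"
)
vector = result.embeddings[0]  # 2,048 int8 values -> 2,048 bytes per vector

# Storage arithmetic behind the 83% reduction quoted above:
baseline = 3072 * 4   # OpenAI-v3-large: 3,072 float32 values = 12,288 bytes
quantized = 2048 * 1  # voyage-3.5-lite: 2,048 int8 values    =  2,048 bytes
print(f"reduction: {1 - quantized / baseline:.0%}")  # ~83%
```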