voyage-3.5 Embedding Model
General-purpose, multilingual text embedding model built for search, retrieval, and AI workloads with a 32K context window.
Overview
Text embedding models convert text into numerical vectors and are fundamental to semantic search, retrieval systems, and retrieval-augmented generation (RAG), where embedding quality directly determines retrieval performance. voyage-3.5 is a state-of-the-art, general-purpose, multilingual embedding model that outperforms OpenAI-v3-large by an average of 8.26% across evaluated domains. Leveraging Matryoshka learning and quantization-aware training, it supports lower embedding dimensions and int8 or binary quantization, significantly cutting vector database costs with minimal impact on retrieval quality. The model delivers 62.5 ms latency for single queries (≤200 tokens) and sustains 40M tokens/hour throughput at $0.08 per 1M tokens on an ml.g6.xlarge instance.
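As a quick orientation, here is a minimal sketch of embedding documents and a query with the voyageai Python client. The output_dimension and output_dtype parameters mirror the options listed under Highlights; treat their exact names and availability as assumptions that may vary by client version.

```python
# Minimal sketch: embedding documents and a query with voyage-3.5.
# Assumes the `voyageai` Python client and a VOYAGE_API_KEY environment
# variable; output_dimension/output_dtype follow the options listed
# under Highlights and may differ by client version.
import voyageai

vo = voyageai.Client()  # picks up VOYAGE_API_KEY from the environment

documents = [
    "Embedding models map text to vectors for semantic search.",
    "voyage-3.5 supports a 32K token context window.",
]

# Document embeddings: float vectors at a reduced 1024 dimensions.
doc_result = vo.embed(
    documents,
    model="voyage-3.5",
    input_type="document",
    output_dimension=1024,   # one of 2048 / 1024 / 512 / 256
    output_dtype="float",    # or "int8", "uint8", "binary", "ubinary"
)

# Query embedding with matching settings, so vectors are comparable.
query_result = vo.embed(
    ["How large is the context window?"],
    model="voyage-3.5",
    input_type="query",
    output_dimension=1024,
    output_dtype="float",
)

print(len(doc_result.embeddings), len(doc_result.embeddings[0]))  # 2 1024
```

Embedding queries and documents with the same model, dimension, and dtype keeps their vectors in one comparable space, which is what retrieval and RAG pipelines require.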
Highlights
Designed for high-quality general-purpose and multilingual retrieval, outperforming OpenAI-v3-large by 8.26% on average; with int8 quantization at 2048 dimensions, it cuts vector database costs by up to 83% relative to OpenAI-v3-large (float, 3072 dimensions) while delivering better retrieval quality (the storage arithmetic is worked through in the sketch after this list).
Supports embedding sizes of 2048, 1024, 512, and 256 dimensions, with multiple quantization options: float, int8, uint8, binary, and ubinary.
Offers a 32K token context window, ideal for long documents, with 62.5 ms latency (≤200 tokens) and 40M tokens/hour throughput at $0.08 per 1M tokens on ml.g6.xlarge.
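To make the cost figure concrete: a float32 vector at 3072 dimensions occupies 3072 × 4 = 12,288 bytes, while an int8 vector at 2048 dimensions occupies 2,048 bytes, a reduction of 1 − 2048/12288 ≈ 83.3%. The sketch below, assuming numpy and client-side handling of float32 embeddings, illustrates Matryoshka-style truncation and quantization and reproduces that arithmetic; the quantization formulas are illustrative, not Voyage's exact scheme.

```python
# Sketch of client-side Matryoshka truncation and quantization, assuming
# a float32 embedding from the model; thresholds and scaling here are
# illustrative, not Voyage's exact quantization scheme.
import numpy as np

rng = np.random.default_rng(0)
full = rng.standard_normal(2048).astype(np.float32)   # stand-in 2048-dim vector

# Matryoshka truncation: keep the leading dimensions, then re-normalize.
def truncate(vec: np.ndarray, dim: int) -> np.ndarray:
    head = vec[:dim]
    return head / np.linalg.norm(head)

v512 = truncate(full, 512)
print(v512.shape)                                     # (512,)

# int8 quantization: scale each value into [-127, 127] (illustrative).
normed = full / np.linalg.norm(full)
scale = 127.0 / np.abs(normed).max()
v_int8 = np.clip(np.round(normed * scale), -127, 127).astype(np.int8)

# Binary quantization: one sign bit per dimension, packed 8 dims per byte.
v_binary = np.packbits(normed > 0)                    # 2048 dims -> 256 bytes

# Storage per vector and the cost comparison quoted above:
print(v_int8.nbytes)                                  # 2048 bytes (int8, 2048 dims)
print(v_binary.nbytes)                                # 256 bytes (binary, 2048 dims)
baseline = 3072 * 4                                   # float32 at 3072 dims: 12288 bytes
print(1 - v_int8.nbytes / baseline)                   # ~0.833 -> up to 83% savings
```

A common pattern is to store compact binary or int8 vectors for a fast first-pass search and rescore the top candidates with higher-precision embeddings, trading a small amount of quality for large storage and latency savings.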