voyage-multimodal-3.5 Embedding Model
Rich multimodal embedding model that can vectorize interleaved text, content-rich images, and video. 32K context length.
Product Description
Overview
Multimodal embedding models are neural networks that transform multiple modalities, such as text, images, and video, into numerical vectors. They are a crucial building block for semantic search/retrieval systems and retrieval-augmented generation (RAG), and they largely determine retrieval quality.
voyage-multimodal-3.5 is a state-of-the-art multimodal embedding model capable of vectorizing not only text, images, and video individually, but also content that interleaves all three modalities. It delivers excellent performance for mixed-modality searches involving text and visual content such as PDF screenshots, figures, tables, videos, and more. Enabled by Matryoshka learning and quantization-aware training, voyage-multimodal-3.5 supports embeddings in 2048, 1024, 512, and 256 dimensions, with multiple quantization options.
Learn more about voyage-multimodal-3.5 here: https://blog.voyageai.com/2026/01/15/voyage-multimodal-3-5
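As a minimal sketch of how the model might be called, the example below uses the official voyageai Python client's multimodal_embed method, which exists for earlier voyage-multimodal models; the model name is taken from this page, and the exact parameters accepted for this release (e.g., video inputs) are assumptions that may differ from the shipped API.

```python
# pip install voyageai pillow
# A sketch only: assumes the existing voyageai client's multimodal_embed
# API also serves voyage-multimodal-3.5.
import voyageai
from PIL import Image

vo = voyageai.Client()  # reads the VOYAGE_API_KEY environment variable

# Each input is an interleaved sequence of text and images.
inputs = [
    ["A figure comparing retrieval quality across models:", Image.open("figure.png")],
    ["Page 3 of the quarterly report:", Image.open("report_p3.png")],
]

result = vo.multimodal_embed(
    inputs=inputs,
    model="voyage-multimodal-3.5",  # model name as given on this page
    input_type="document",          # use "query" when embedding search queries
)
print(len(result.embeddings), len(result.embeddings[0]))  # 2 vectors
```

Embedding documents and queries with the same model, distinguished only by input_type, is the usual pattern for retrieval with Voyage embeddings.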
Highlights
State-of-the-art multimodal embedding model that vectorizes text, images, and video individually, as well as content interleaving all three modalities, with excellent performance on mixed-modality searches over text and visual content such as PDF screenshots, figures, tables, and videos.
Supports embeddings of 2048, 1024, 512, and 256 dimensions and offers multiple quantization options, including float (32-bit floating point), int8 (8-bit signed integer), uint8 (8-bit unsigned integer), binary (bit-packed int8), and ubinary (bit-packed uint8); see the sketch after this list.
32K token context length.
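For illustration, here is a small numpy sketch of what the bit-packed binary formats above imply for client code. The random stand-in vectors and the Hamming-distance convention are assumptions for illustration; only the bit-packing itself (8 single-bit values per int8) is described on this page.

```python
# Working with bit-packed binary embeddings, as a client-side sketch.
import numpy as np

dim = 1024  # one of the supported Matryoshka dimensions

# A 1024-dim binary embedding arrives as 1024 / 8 = 128 bit-packed values.
# Random stand-ins here; real values would come from the embeddings API.
a = np.random.randint(-128, 128, size=dim // 8, dtype=np.int8)
b = np.random.randint(-128, 128, size=dim // 8, dtype=np.int8)

# Recover the 1024 single-bit values (view as uint8 before unpacking).
bits = np.unpackbits(a.view(np.uint8))
assert bits.shape == (dim,)

# Binary embeddings are typically compared by Hamming distance, which can
# be computed directly on the packed bytes via XOR + popcount.
hamming = np.unpackbits((a ^ b).view(np.uint8)).sum()
print(hamming)
```

Binary and ubinary vectors cut storage by 32x relative to float and make distance computation a matter of XOR and popcount, which is why bit-packed outputs are offered alongside the float and 8-bit integer options.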