voyage-multimodal-3.5 Embedding Model

Rich multimodal embedding model that can vectorize interleaved text, content-rich images, and video. 32K context length.

Product Description

Overview

Multimodal embedding models are neural networks that transform content from multiple modalities, such as text and images, into numerical vectors. They are a crucial building block for semantic search and retrieval systems and for retrieval-augmented generation (RAG), and they largely determine retrieval quality.
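To illustrate how such vectors are used downstream, the sketch below ranks documents against a query by cosine similarity. The vectors are synthetic placeholders standing in for model output, not real voyage-multimodal-3.5 embeddings.

```python
import numpy as np

# Placeholder embeddings; in practice these come from the embedding model.
query_vec = np.array([0.1, 0.7, 0.2])
doc_vecs = np.array([
    [0.0, 0.8, 0.1],  # document 0
    [0.9, 0.1, 0.0],  # document 1
    [0.2, 0.6, 0.3],  # document 2
])

def normalize(v):
    # Cosine similarity is the dot product of L2-normalized vectors.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

scores = normalize(doc_vecs) @ normalize(query_vec)
ranking = np.argsort(-scores)  # indices of best matches first
print(ranking, scores[ranking])
```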

voyage-multimodal-3.5 is a state-of-the-art multimodal embedding model capable of vectorizing not only text, images, and video individually, but also content that interleaves all three modalities. It delivers excellent performance for mixed-modality searches involving text and visual content such as PDF screenshots, figures, tables, videos, and more. Enabled by Matryoshka learning and quantization-aware training, voyage-multimodal-3.5 supports embeddings in 2048, 1024, 512, and 256 dimensions, with multiple quantization options.

Learn more about voyage-multimodal-3.5 here: https://blog.voyageai.com/2026/01/15/voyage-multimodal-3-5 
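For concreteness, here is a minimal sketch of embedding an interleaved text-and-image document with the voyageai Python client. The multimodal_embed call with inputs, model, and input_type follows the client's documented multimodal API; the output_dimension and output_dtype parameters are assumptions inferred from the dimension and quantization options described above, and the file name is hypothetical, so consult the official API reference before relying on them.

```python
import voyageai
from PIL import Image

vo = voyageai.Client()  # reads the VOYAGE_API_KEY environment variable

# A single input may interleave text and images,
# e.g. a caption followed by a PDF page screenshot (hypothetical file).
inputs = [
    ["Quarterly revenue table", Image.open("report_page_3.png")],
]

result = vo.multimodal_embed(
    inputs=inputs,
    model="voyage-multimodal-3.5",
    input_type="document",
    output_dimension=1024,  # assumed parameter: one of 2048/1024/512/256
    output_dtype="float",   # assumed parameter: float, int8, uint8, binary, ubinary
)
print(len(result.embeddings[0]))  # 1024-dimensional vector
```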

Highlights

  • State-of-the-art multimodal embedding model that vectorizes text, images, and video individually as well as content interleaving all three modalities, delivering excellent performance for mixed-modality searches involving text and visual content such as PDF screenshots, figures, tables, and videos.

  • Supports embeddings of 2048, 1024, 512, and 256 dimensions and offers multiple quantization options, including float (32-bit floating point), int8 (8-bit signed integer), uint8 (8-bit unsigned integer), binary (bit-packed int8), and ubinary (bit-packed uint8); see the sketch after this list.

  • 32K token context length.
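To make the dimension and quantization options concrete, the sketch below shows the two ideas at play: truncating a Matryoshka-style embedding to a leading prefix, and binarizing it for compact storage with Hamming-distance comparison. The vectors are random placeholders, and the packing scheme shown is the generic technique, not necessarily Voyage's exact format.

```python
import numpy as np

rng = np.random.default_rng(0)
full = rng.standard_normal(2048).astype(np.float32)  # stand-in for a float embedding
full /= np.linalg.norm(full)

# Matryoshka property: the leading dimensions form a usable lower-dim embedding.
short = full[:512]
short /= np.linalg.norm(short)

# Binary quantization: keep only the sign of each dimension, bit-packed 8 per byte.
packed = np.packbits((full > 0).astype(np.uint8))  # 2048 bits -> 256 bytes

# Two binary embeddings are compared via Hamming distance (count of differing bits).
other = np.packbits((rng.standard_normal(2048) > 0).astype(np.uint8))
hamming = int(np.unpackbits(packed ^ other).sum())
print(f"{packed.nbytes} bytes per vector, Hamming distance = {hamming}")
```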
