A practical guide to the best vector embedding models for RAG, semantic search, and AI applications in 2025 — including Voyage-3-large, OpenAI text-embedding-3, Stella, ModernBERT Embed, and Gemini Embedding 2. Compare MTEB scores, dimensions, and costs.
Vector embeddings are the backbone of modern AI applications — powering RAG pipelines, semantic search, and agent memory. In 2025, the landscape has shifted dramatically: larger dimensions, Matryoshka flexibility, and multimodal capabilities are now table stakes. Here's what you need to know.
Voyage-3-large is the current gold standard for retrieval quality. It leads the MTEB retrieval leaderboard with an average score of 64.9, outperforming every other model on relevance metrics [1]. It outputs 2,048 dimensions and costs $0.12 per million tokens: premium pricing, but justified when accuracy is critical.
Voyage-3-lite delivers results "very nearly as good as NVIDIA llama and OpenAI v3-large" in only 512 output dimensions at a fraction of the cost [1]. The smaller vectors make it well suited to high-throughput pipelines where search latency and storage matter more than the last point of accuracy.
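OpenAI's text-embedding-3 family remains the most widely deployed embedding option. text-embedding-3-small outputs 1,536 dimensions by default and text-embedding-3-large outputs 3,072. Both are trained with Matryoshka representation learning, meaning you can shorten the vectors at request time (through the API's `dimensions` parameter) without retraining.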
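To make the truncation idea concrete, here is a minimal sketch of shortening an embedding on the client side. It assumes the model was trained with Matryoshka representation learning, so the leading components carry most of the signal. The function name `truncate_embedding` and the toy 4-dimension vector are illustrative, not part of any vendor SDK. After slicing, the vector is rescaled to unit length so cosine and dot-product scores stay comparable.

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` components and rescale the result to unit length."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]

# Toy 4-dim unit vector standing in for a real 1,536- or 3,072-dim embedding.
full = [0.5, 0.5, 0.5, 0.5]
short = truncate_embedding(full, 2)
print(len(short), short)
```

Truncating only makes sense for models trained this way; slicing the output of an ordinary embedding model generally discards information unpredictably.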
Stella is the top-performing model on the MTEB retrieval leaderboard that allows commercial use [3]. It's fully open-source and can be self-hosted via Ollama or custom inference stacks.
ModernBERT Embed is the newest entrant — a BERT-class model optimized for modern hardware (Flash Attention, rotary embeddings) that punches well above its weight class on retrieval benchmarks.
> Infrastructure note: If you're deploying open-source embedding models like Stella or ModernBERT in production, LibertAI provides a decentralized, OpenAI-compatible inference API — giving you the flexibility of self-hosting without managing your own GPU cluster.
Gemini Embedding 2 is Google's breakthrough: a single model that embeds text, images, video, audio, and PDFs into one shared 3,072-dimension vector space [2]. It is billed as the first production-ready multimodal embedding model, and it enables cross-modal search (e.g., "find images that match this paragraph").
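Because every modality lands in the same vector space, cross-modal search reduces to ordinary nearest-neighbor ranking. The sketch below illustrates that with hand-written 3-dimension toy vectors in place of real embeddings. The index contents, file names, and the `search` helper are all hypothetical, and no embedding API is called.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical index: (modality, name, vector). Real vectors would come from
# a multimodal embedding model and have thousands of dimensions.
index = [
    ("image", "beach.jpg", [0.9, 0.1, 0.0]),
    ("audio", "podcast.mp3", [0.1, 0.9, 0.1]),
    ("pdf", "report.pdf", [0.2, 0.2, 0.9]),
]

def search(query_vec, index, modality=None, k=2):
    """Rank items by cosine similarity, optionally restricted to one modality."""
    candidates = [item for item in index if modality is None or item[0] == modality]
    ranked = sorted(candidates, key=lambda item: cosine(query_vec, item[2]), reverse=True)
    return [(m, name) for m, name, _ in ranked[:k]]

# Pretend this vector is the embedding of a text paragraph about beaches.
text_query = [0.8, 0.2, 0.1]
top = search(text_query, index)
images_only = search(text_query, index, modality="image", k=1)
print(top, images_only)
```

The modality filter is the only cross-modal-specific part: the ranking itself does not care whether a stored vector came from text, an image, or audio.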
| Model | Dimensions | Cost (per 1M tokens) | MTEB Retrieval Score | Best For |
|---|---|---|---|---|
| Voyage-3-large | 2,048 | $0.12 | 64.9 | Maximum accuracy |
| Voyage-3-lite | 512 | $0.04 | ~62.0 | High throughput |
| OpenAI text-embedding-3 | 1,536 | $0.13 | 59.4 | General purpose |
| Stella | 768 | Free (self-host) | 63.2 | Open-source stacks |
| ModernBERT Embed | 768 | Free (self-host) | ~61.5 | Modern hardware |
| Gemini Embedding 2 | 3,072 | $0.08 | 62.8 | Multimodal search |
Scores sourced from the MTEB leaderboard (May 2025) and cited benchmarks [1].
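The dimension and price columns translate directly into a budget. As a rough sketch, the snippet below estimates the one-time embedding cost and the raw vector storage for three of the hosted models, using the table's figures. The corpus size (1 million chunks of 500 tokens each) and the float32 storage format are assumptions chosen for illustration, and index overhead is ignored.

```python
def embedding_cost_usd(total_tokens, price_per_million_usd):
    """One-time API cost to embed a corpus of `total_tokens` tokens."""
    return total_tokens / 1_000_000 * price_per_million_usd

def index_size_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw vector storage, assuming float32 (4 bytes) per component."""
    return num_vectors * dims * bytes_per_value

# Assumed workload: 1M chunks at 500 tokens each.
num_chunks = 1_000_000
corpus_tokens = num_chunks * 500

# (name, dimensions, USD per 1M tokens) taken from the comparison table.
models = [
    ("Voyage-3-large", 2048, 0.12),
    ("Voyage-3-lite", 512, 0.04),
    ("Gemini Embedding 2", 3072, 0.08),
]

estimates = {}
for name, dims, price in models:
    cost = embedding_cost_usd(corpus_tokens, price)
    size_gb = index_size_bytes(num_chunks, dims) / 1e9
    estimates[name] = (cost, size_gb)
    print(f"{name}: ${cost:,.2f} to embed, {size_gb:.2f} GB of float32 vectors")
```

At this scale the storage difference (roughly 2 GB versus 12 GB of raw vectors) tends to matter more over time than the one-time embedding bill.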
The trend is clear: bigger dimensions, smarter compression. Models now routinely output 2,048+ dimensions, but Matryoshka representation learning lets you use only the first N dimensions for cheaper storage and faster search, with only a modest loss in quality [1]. This means you can store a single embedding and serve multiple use cases, from coarse filtering (128 dims) to fine-grained retrieval (full 2,048 dims).
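The coarse-then-fine pattern can be sketched as a two-stage search: shortlist candidates using only the leading dimensions, then re-rank the shortlist with the full vectors. The example below uses 4-dimension toy vectors with a 2-dimension coarse stage as a stand-in for the 128 and 2,048 dimensions mentioned above. The function names and data are illustrative assumptions, and a production system would precompute the truncated vectors and use an approximate nearest-neighbor index instead of sorting every document.

```python
import math

def unit(vec):
    """Rescale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def two_stage_search(query, docs, coarse_dims, shortlist_size, k):
    """Shortlist on the first `coarse_dims` components, then re-rank on full vectors."""
    q_coarse = unit(query[:coarse_dims])
    coarse_ranked = sorted(
        docs,
        key=lambda d: dot(q_coarse, unit(d[1][:coarse_dims])),
        reverse=True,
    )
    shortlist = coarse_ranked[:shortlist_size]

    q_full = unit(query)
    fine_ranked = sorted(shortlist, key=lambda d: dot(q_full, unit(d[1])), reverse=True)
    return [doc_id for doc_id, _ in fine_ranked[:k]]

# Toy corpus: (doc_id, embedding). Real embeddings would be far longer.
docs = [
    ("a", [0.9, 0.1, 0.3, 0.1]),
    ("b", [0.8, 0.2, -0.5, 0.2]),
    ("c", [-0.7, 0.6, 0.1, 0.1]),
    ("d", [0.1, -0.9, 0.2, 0.3]),
]
query = [1.0, 0.0, 0.4, 0.0]

results = two_stage_search(query, docs, coarse_dims=2, shortlist_size=2, k=2)
print(results)
```

The shortlist size controls the trade-off: a larger shortlist costs more full-dimension comparisons but lowers the chance that the coarse stage discards a document the full vectors would have ranked highly.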
The other major shift is open-source parity. Stella and ModernBERT now compete with proprietary leaders on MTEB scores, making self-hosted RAG pipelines viable without sacrificing quality.
Disclosure: Some links in this article are affiliate links. We may earn a commission if you purchase through these links — at no extra cost to you. Our recommendations are based on independent research and benchmark data.