Retrieval-augmented generation lives or dies on its vector store. The embedding model gets the headlines, but the database underneath decides your recall quality, your p99 latency, and your monthly bill. Pick wrong and you either overpay for managed convenience you do not need, or you self-host something that pages you at 2 a.m. when an index rebuild runs out of memory.
The category has matured. There is no single best vector database anymore; there is a best one for your scale, your team, and your deployment constraints. A hobby RAG bot over 50,000 chunks has nothing in common with a multi-tenant search platform serving a billion vectors under a strict data-residency mandate.
This guide compares the open-source and self-hostable vector databases that production teams actually deploy: Qdrant, Weaviate, Milvus (and its managed Zilliz Cloud), Chroma, pgvector on Postgres, and Redis. We look at architecture, hybrid search, filtering, scale ceilings, pricing shape, and which one fits which team. Figures are sourced from vendor pages and community benchmarks as of May 2026 and should be re-verified before you commit.
Table of Contents
- Why the Vector Store Is the RAG Bottleneck
- How to Evaluate a Vector Database
- Qdrant: The Rust-Based Performance Default
- Weaviate: Hybrid Search and Built-In Vectorizers
- Milvus and Zilliz Cloud: Billion-Scale Workhorse
- Chroma: The Developer-Experience Choice
- pgvector: Stay on Postgres as Long as You Can
- Redis: Vectors Next to Your Cache
- Head-to-Head Comparison Table
- Decision Framework: Picking by Scale and Team
- RAG Architecture: Where the Database Fits
- Why Lushbinary for RAG Infrastructure
1. Why the Vector Store Is the RAG Bottleneck
When a RAG pipeline returns bad answers, teams usually blame the LLM first. More often the failure is upstream: the retriever pulled the wrong chunks, or pulled the right chunks too slowly to matter. Both are properties of the vector database, not the model.
Three things determine whether your retrieval layer holds up in production:
- Recall under filtering. Real queries are almost never pure semantic search. They are "find documents similar to this, but only for tenant 42, only in English, only from the last 90 days." A database that degrades recall when you add metadata filters will quietly return worse answers.
- Latency at your real corpus size. A benchmark on 1M vectors tells you little about behavior at 100M. HNSW graphs grow, memory pressure rises, and p99 latency is where users feel it.
- Operational cost of staying healthy. Index rebuilds, replication, backups, and upgrades all have a cost. A managed service hides that cost in the bill; self-hosting moves it to your on-call rotation.
The right tool minimizes the sum of those three for your specific workload. That is why this comparison is organized around fit, not a single leaderboard.
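Recall under filtering is straightforward to measure offline. The sketch below is a minimal illustration, not any vendor's benchmark: it uses brute-force search as ground truth and compares it against a hypothetical naive store that takes the global top-k first and applies the tenant filter afterwards. The function names, the `tenant` field, and the random 16-dimensional vectors are all assumptions made for the example.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_filtered_top_k(query, docs, tenant, k):
    """Ground truth: apply the filter first, then rank every remaining vector."""
    allowed = [d for d in docs if d["tenant"] == tenant]
    allowed.sort(key=lambda d: cosine(query, d["vec"]), reverse=True)
    return [d["id"] for d in allowed[:k]]


def post_filtered_top_k(query, docs, tenant, k):
    """Hypothetical naive store: take the global top-k, then drop non-matching rows."""
    ranked = sorted(docs, key=lambda d: cosine(query, d["vec"]), reverse=True)
    return [d["id"] for d in ranked[:k] if d["tenant"] == tenant]


def recall_at_k(retrieved, relevant):
    """Fraction of the ground-truth ids that the retriever actually returned."""
    if not relevant:
        return 1.0
    return len(set(retrieved) & set(relevant)) / len(relevant)


# Synthetic corpus: 2,000 random vectors spread evenly over 10 tenants.
rng = random.Random(0)
docs = [
    {"id": i, "tenant": i % 10, "vec": [rng.gauss(0, 1) for _ in range(16)]}
    for i in range(2000)
]
query = [rng.gauss(0, 1) for _ in range(16)]

truth = exact_filtered_top_k(query, docs, tenant=4, k=10)
naive = post_filtered_top_k(query, docs, tenant=4, k=10)
print("filtered recall@10 of naive post-filtering:", recall_at_k(naive, truth))
```

Running the same comparison against a real database, with your own filters and corpus, is the number worth tracking; the synthetic data here only shows the shape of the measurement.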
2. How to Evaluate a Vector Database
Before looking at individual tools, fix the criteria. These are the dimensions that actually change the decision:
Architecture and indexing
- HNSW vs IVF vs GPU-accelerated indexes
- Quantization support to cut memory cost
- In-memory vs disk-backed storage
Query capabilities
- Hybrid search (dense plus keyword/BM25)
- Metadata filtering without recall loss
- Multi-tenancy isolation
Scale and operations
- Realistic ceiling before sharding pain
- Managed cloud and self-host parity
- Backup, replication, and upgrade story
Cost and lock-in
- Pricing by vectors, queries, or compute units
- Egress fees, often the hidden killer
- Open-source license and exit path
The hidden cost most teams miss
Embedding generation can add 30 to 50 percent on top of your vector database bill, and egress from the database to external endpoints can run as high as $0.09/GB on some clouds, sometimes exceeding compute cost. Model your total RAG cost, not just the database line item.
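A back-of-the-envelope model makes the point concrete. The sketch below is an assumption-laden estimate, not a pricing calculator: `monthly_rag_cost` is a hypothetical helper, the 30 to 50 percent embedding overhead and the $0.09/GB egress rate are taken from the paragraph above, and the database bill and egress volume in the example are made-up inputs.

```python
def monthly_rag_cost(db_bill, embedding_overhead=0.4, egress_gb=0.0, egress_per_gb=0.09):
    """Rough monthly total in USD.

    db_bill            -- the vector database line item
    embedding_overhead -- embedding cost as a fraction of db_bill (assumed 0.3 to 0.5)
    egress_gb          -- data leaving the database per month, in GB
    egress_per_gb      -- assumed worst-case egress rate in USD per GB
    """
    embedding = db_bill * embedding_overhead
    egress = egress_gb * egress_per_gb
    return {
        "database": round(db_bill, 2),
        "embedding": round(embedding, 2),
        "egress": round(egress, 2),
        "total": round(db_bill + embedding + egress, 2),
    }


# Illustrative inputs only: a $199/month database bill and 500 GB of egress.
low = monthly_rag_cost(199.0, embedding_overhead=0.3, egress_gb=500)
high = monthly_rag_cost(199.0, embedding_overhead=0.5, egress_gb=500)
print("low estimate: ", low)
print("high estimate:", high)
```

Even with these rough inputs, the database line item ends up as only part of the total, which is why the estimate should be built from all three terms.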
3. Qdrant: The Rust-Based Performance Default
Qdrant is the open-source database most teams reach for when latency per dollar is the priority. Written in Rust, it pairs HNSW indexing with strong payload (metadata) filtering and scalar/binary quantization to keep memory cost down. Community benchmarks repeatedly put it among the fastest open-source options for filtered approximate nearest-neighbor queries.
Strengths
- Excellent latency on filtered ANN queries
- Quantization cuts RAM cost substantially
- Clean API, simple single-binary self-host
- Managed Qdrant Cloud with a free tier
Weaknesses
- Fewer built-in vectorizer integrations
- Hybrid search less turnkey than Weaviate
- Distributed mode adds operational work
Best for: Latency-sensitive RAG and agent memory where you want self-host control with an easy managed escape hatch. A common rule of thumb in the community is that self-hosting Qdrant for low single-digit millions of vectors costs a fraction of equivalent managed throughput elsewhere, though exact numbers depend on your hardware and query mix.
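To see why quantization matters for memory cost, it helps to work out the raw arithmetic. The sketch below is a generic estimate rather than a description of Qdrant's internals: it counts only the bytes needed to hold the vectors themselves at 32 bits (float32), 8 bits (scalar quantization), and 1 bit (binary quantization) per dimension. The 10 million vectors and 768 dimensions are assumed example values, and `vector_memory_gib` is a hypothetical helper.

```python
def vector_memory_gib(num_vectors, dims, bits_per_dim):
    """Raw vector storage in GiB.

    Excludes HNSW graph links, payload/metadata, and any allocator overhead,
    so real memory use will be higher than this figure.
    """
    total_bytes = num_vectors * dims * bits_per_dim / 8
    return total_bytes / 2**30


# Assumed example workload: 10M vectors of 768 dimensions.
NUM_VECTORS = 10_000_000
DIMS = 768

for label, bits in [("float32", 32), ("int8 scalar", 8), ("binary", 1)]:
    gib = vector_memory_gib(NUM_VECTORS, DIMS, bits)
    print(f"{label:12s} {gib:8.2f} GiB")
```

The ratios (4x for scalar, 32x for binary) are the useful part. Whether the original full-precision vectors also need to stay around for rescoring, and whether they can live on disk, depends on the configuration, so treat this as a lower bound.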
4. Weaviate: Hybrid Search and Built-In Vectorizers
Weaviate's pitch is integration. It ships native hybrid search (dense vectors plus BM25 keyword), built-in vectorizer modules that call embedding providers for you, multi-tenancy, and a GraphQL-style query layer. For multimodal and hybrid-search-heavy applications it reduces glue code more than most competitors.
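Hybrid search means merging two ranked lists, one from dense vector similarity and one from keyword scoring such as BM25. A minimal sketch of one widely used merging method, reciprocal rank fusion, is shown below. It illustrates the general idea only and is not a description of how Weaviate or any other database implements fusion internally; the document ids are made up and `k=60` is a conventional default, not a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list.

    Each id scores 1 / (k + rank) for every list it appears in (rank starts
    at 1), and the scores are summed. Ties are broken by id for determinism.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], str(doc_id)))


# Hypothetical results from a dense retriever and a keyword retriever.
dense_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
print(fused)
```

Documents that both retrievers agree on rise to the top, while documents found by only one retriever still make the list. A database with native hybrid search does this step for you, which is the glue code the paragraph above refers to.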
Strengths
- Best-in-class built-in hybrid search
- Vectorizer modules reduce integration time
- Native multi-tenancy
- Self-host plus Weaviate Cloud options
Weaknesses
- BM25 index can be memory-hungry
- Higher resource footprint than Qdrant
- More concepts to learn up front
Pricing shape: Weaviate Cloud has a small sandbox tier and usage-based paid tiers that scale with stored vectors; community reports put the entry paid tier around $25/month for small collections and roughly $199/month at the low millions of vectors. Verify current numbers on Weaviate's pricing page before budgeting.
5. Milvus and Zilliz Cloud: Billion-Scale Workhorse
Milvus is the database to reach for when you are genuinely at scale: hundreds of millions to billions of vectors. It supports multiple index types, GPU-accelerated indexing (including NVIDIA CAGRA in recent releases), and a distributed architecture built for horizontal growth. Zilliz Cloud is the managed version from the same team, and Milvus Lite runs locally for prototyping.
Strengths
- Scales to billions of vectors natively
- GPU-accelerated index builds
- Multiple index types for tuning recall/speed
- Managed Zilliz Cloud with a free tier
Weaknesses
- Self-hosting adds Kubernetes complexity
- Overkill below tens of millions of vectors
- Steeper operational learning curve
Best for: Large-scale search and retrieval where the corpus is too big for a single-node store. Below roughly 10M vectors, the distributed machinery is usually more cost and complexity than the workload justifies.
6. Chroma: The Developer-Experience Choice
Chroma optimizes for time-to-first-query. pip install, a few lines of Python, and you have a working RAG retriever. For prototypes, notebooks, and small-to-medium corpora it is the fastest way to get moving, and its API is friendly enough that many teams keep it well past the prototype stage.
Strengths
- Lowest setup friction in the category
- Great for local dev and prototyping
- Clean Python-first API
- Managed Chroma Cloud now available
Weaknesses
- Slower on large filtered queries
- Fewer enterprise scale features
- Hybrid search less mature
Best for: Prototypes, internal tools, and small production corpora where developer velocity beats raw throughput.
7. pgvector: Stay on Postgres as Long as You Can
If your application already runs on Postgres, pgvector lets you add vector search without introducing a new database to operate, back up, and secure. You keep transactional integrity, joins against your relational data, and one fewer system in production. For many teams that is worth more than a latency edge.
Strengths
- No new infrastructure to operate
- Joins against existing relational data
- Mature backup, replication, and tooling
- Available on managed Postgres everywhere
Weaknesses
- p99 latency rises past a few million vectors
- Index builds compete with OLTP load
- Fewer ANN tuning knobs than dedicated stores
Best for: Teams already on Postgres with corpora in the hundreds of thousands to low millions of vectors. Plan a migration path to a dedicated store before latency at your real scale forces the issue.
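The join-against-relational-data advantage is easiest to see in a query. The sketch below builds a parameterized similarity search that filters by tenant through a join. The `chunks` and `documents` tables, their columns, and the 768-dimension size are hypothetical; the `vector` type, the `<=>` cosine-distance operator, and the `hnsw` index with `vector_cosine_ops` come from pgvector. The `%s` placeholders assume a psycopg-style driver, and nothing here connects to a database.

```python
# Hypothetical schema; assumes a documents(id, tenant_id, title) table already exists.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE chunks (
    id bigserial PRIMARY KEY,
    document_id bigint NOT NULL REFERENCES documents(id),
    body text NOT NULL,
    embedding vector(768)
);
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);
"""


def to_vector_literal(values):
    """Format a list of floats as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def build_search_query(query_vec, tenant_id, limit=5):
    """Return (sql, params) for a tenant-filtered nearest-neighbor search."""
    sql = (
        "SELECT c.id, c.body, d.title, "
        "c.embedding <=> %s::vector AS distance "
        "FROM chunks c "
        "JOIN documents d ON d.id = c.document_id "
        "WHERE d.tenant_id = %s "
        "ORDER BY c.embedding <=> %s::vector "
        "LIMIT %s"
    )
    literal = to_vector_literal(query_vec)
    return sql, (literal, tenant_id, literal, limit)


sql, params = build_search_query([0.1, 0.2, 0.3], tenant_id=42)
print(sql)
print(params)
```

With a driver such as psycopg, the pair would be passed to `cursor.execute(sql, params)`. How well the planner combines the filter with the vector index depends on your data and Postgres version, so check the query plan at your real corpus size.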
8. Redis: Vectors Next to Your Cache
Redis added vector similarity search to the in-memory database many teams already run for caching and queues. The appeal is latency and consolidation: if you need fast retrieval and you already operate Redis, you may not need a separate vector store at all. The trade-off is that everything lives in memory, so cost scales with dataset size faster than disk-backed options.
Strengths
- Very low latency for in-memory search
- Consolidates with existing Redis usage
- Hybrid search and filtering supported
Weaknesses
- Memory cost scales hard with corpus size
- Less specialized than dedicated stores
- Large datasets get expensive fast
Best for: Low-latency retrieval on modest corpora when you are already a heavy Redis shop and want to avoid adding a system.
9. Head-to-Head Comparison Table
| Database | Sweet spot | Hybrid search | Managed option |
|---|---|---|---|
| Qdrant | Low-latency filtered RAG | Yes | Qdrant Cloud (free tier) |
| Weaviate | Hybrid and multimodal | Native, strong | Weaviate Cloud |
| Milvus / Zilliz | Billion-scale search | Yes | Zilliz Cloud (free tier) |
| Chroma | Prototypes, small corpora | Basic | Chroma Cloud |
| pgvector | Already-on-Postgres teams | Via SQL + extensions | Any managed Postgres |
| Redis | In-memory low latency | Yes | Redis Cloud |
Feature availability and pricing change frequently. Treat this table as a starting point and confirm specifics against each vendor's current documentation.
10. Decision Framework: Picking by Scale and Team
Map your situation to one of these and you will be close to the right answer:
- Prototyping or under ~500K vectors: Chroma for speed of iteration, or pgvector if you are already on Postgres.
- Production RAG, 500K to ~50M vectors, latency matters: Qdrant. Reach for Weaviate instead if hybrid search and built-in vectorizers save you meaningful glue code.
- Tens to hundreds of millions of vectors and up: Milvus self-hosted or Zilliz Cloud managed.
- Already heavy on Postgres or Redis: start with the extension (pgvector or Redis vector search) and only graduate to a dedicated store when latency or recall under filtering forces it.
- Strict data residency or air-gapped: any of the self-hostable open-source options (Qdrant, Weaviate, Milvus) over a managed-only service.
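The rules above can be written down as a small function, which is a useful way to check that they cover your case. This is a sketch of the framework as stated, not a recommendation engine: `recommend_store` and its parameters are hypothetical names, and because the 500K to ~50M and "tens of millions and up" bands overlap in the text, the 50M cutoff used here is an assumption.

```python
# Extensions to try first when the team already runs the host database.
EXTENSIONS = {"postgres": "pgvector", "redis": "Redis vector search"}


def recommend_store(num_vectors, existing=None, needs_hybrid=False,
                    extension_struggling=False):
    """Map a workload to a starting-point database.

    existing             -- "postgres", "redis", or None
    needs_hybrid         -- hybrid search / built-in vectorizers save real glue code
    extension_struggling -- latency or filtered recall on the extension is unacceptable
    """
    # Already on Postgres or Redis: stay on the extension until it hurts.
    if existing in EXTENSIONS and not extension_struggling:
        return EXTENSIONS[existing]
    if num_vectors < 500_000:
        return "Chroma"
    if num_vectors <= 50_000_000:  # assumed boundary between the overlapping bands
        return "Weaviate" if needs_hybrid else "Qdrant"
    return "Milvus / Zilliz Cloud"


print(recommend_store(100_000))
print(recommend_store(100_000, existing="postgres"))
print(recommend_store(5_000_000, needs_hybrid=True))
print(recommend_store(200_000_000))
```

Data residency and air-gapped constraints are left out of the function because they narrow the deployment mode (self-hosted rather than managed) more than the choice of database.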
11. RAG Architecture: Where the Database Fits
The vector database is one stage in a pipeline. Getting the stages and their data flow right matters more than the specific database brand.
A reranking step after retrieval often improves answer quality more cheaply than upgrading the database. For the broader pipeline, see our RAG production guide.
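The stages and their data flow can be sketched in a few functions. Everything below is a toy stand-in chosen so the example runs without external services: `embed` counts terms over a tiny fixed vocabulary instead of calling an embedding model, `retrieve` does a brute-force scan where a vector database would sit, and `rerank` uses word overlap where a production system would typically use a cross-encoder. All names and sample chunks are hypothetical.

```python
import math

VOCAB = ["vector", "database", "latency", "postgres", "cache", "billing"]


def embed(text):
    """Toy embedding: term counts over VOCAB (stand-in for a real model)."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, fetch_k):
    """Stage 1: the vector database's job. Over-fetch candidates by similarity."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:fetch_k]


def rerank(query, candidates, top_n):
    """Stage 2: toy reranker scoring by shared words (stand-in for a cross-encoder)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        candidates,
        key=lambda c: len(query_words & set(c.lower().split())),
        reverse=True,
    )
    return ranked[:top_n]


def build_prompt(query, context):
    """Stage 3: assemble the prompt that would be sent to the LLM."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


chunks = [
    "pgvector adds vector search to postgres",
    "redis keeps the cache and vector index in memory",
    "p99 latency rises as the vector index grows",
    "billing is based on stored vectors or compute units",
]
query = "postgres vector latency"

candidates = retrieve(query, chunks, fetch_k=3)   # over-fetch from the store
context = rerank(query, candidates, top_n=2)      # keep only the best few
prompt = build_prompt(query, context)
print(prompt)
```

The design point is the over-fetch: the database returns more candidates than the prompt needs, and the reranker decides which ones make the cut. That keeps the database swappable without touching the later stages.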
12. Why Lushbinary for RAG Infrastructure
We build and operate retrieval systems for clients across support automation, internal search, and agent memory. We benchmark candidate databases against your real corpus and query mix rather than a generic leaderboard, then ship the pipeline end to end.
What we typically deliver:
- Vector database selection benchmarked on your data, not a synthetic dataset
- Chunking, embedding, and reranking tuned for recall on filtered queries
- Self-hosted Qdrant, Weaviate, or Milvus deployments with backups and monitoring
- Managed-service setup (Qdrant Cloud, Zilliz, Weaviate Cloud) when ops overhead is not worth it
- Cost modeling that includes embedding and egress, not just storage
Sources
- Qdrant Cloud pricing
- Weaviate Cloud pricing
- Zilliz Cloud (Milvus) pricing
- Chroma
- pgvector documentation
- Redis vector search documentation
Content was rephrased for compliance with licensing restrictions. Pricing and feature availability sourced from official vendor pages and community benchmarks as of May 2026 and may change. Benchmark numbers vary by hardware, dataset, and query mix. Always verify on the vendor's site and test against your own workload before committing.
Frequently Asked Questions
What is the best vector database for RAG in 2026?
There is no single best one. Qdrant is the strongest default for low-latency filtered RAG, Weaviate wins on built-in hybrid search and vectorizers, Milvus and Zilliz Cloud handle billion-scale workloads, Chroma is best for prototyping, and pgvector is best if you are already on Postgres. Pick by your scale, team, and deployment constraints.
When should I use pgvector instead of a dedicated vector database?
Use pgvector when you already run Postgres and your corpus is in the hundreds of thousands to low millions of vectors. It avoids operating a second database and lets you join against relational data. Move to a dedicated store like Qdrant or Milvus when p99 latency under filtering rises past acceptable levels at your real scale.
How much do managed vector databases cost?
Pricing is usually based on stored vectors, queries, or compute units. Qdrant Cloud and Zilliz Cloud offer free tiers; community reports put Weaviate Cloud's entry paid tier around $25/month for small collections and roughly $199/month at the low millions of vectors. Remember that embedding generation can add 30 to 50 percent on top of the database bill, and egress can exceed compute cost. Verify current numbers on each vendor's pricing page.
Does metadata filtering hurt vector search recall?
It can, depending on the database. Some stores degrade recall when filters are applied because the filter and the ANN index interact poorly. Qdrant is designed for strong filtered-query performance, which is why it is a common pick for multi-tenant and metadata-heavy RAG. Always benchmark filtered queries, not just pure semantic search.
When do I actually need Milvus or Zilliz?
When your corpus is in the tens to hundreds of millions of vectors or higher. Below roughly 10M vectors, Milvus's distributed architecture is usually more operational complexity and cost than the workload justifies, and a single-node store like Qdrant is simpler and cheaper.
Can Redis replace a dedicated vector database?
For modest corpora and teams already running Redis, yes: it avoids adding a system and gives very low latency. But because Redis keeps everything in memory, cost scales hard with dataset size, so large corpora get expensive fast compared to disk-backed dedicated stores.