Retrieval-augmented generation lives or dies on its vector store. The embedding model gets the headlines, but the database underneath decides your recall quality, your p99 latency, and your monthly bill. Pick wrong and you either overpay for managed convenience you do not need, or you self-host something that pages you at 2 a.m. when an index rebuild runs out of memory.

The category has matured. There is no longer a single best vector database; there is a best one for your scale, your team, and your deployment constraints. A hobby RAG bot over 50,000 chunks has nothing in common with a multi-tenant search platform serving a billion vectors under a strict data-residency mandate.

This guide compares the open-source and self-hostable vector databases that production teams actually deploy: Qdrant, Weaviate, Milvus (and its managed Zilliz Cloud), Chroma, pgvector on Postgres, and Redis. We look at architecture, hybrid search, filtering, scale ceilings, pricing shape, and which one fits which team. Figures are sourced from vendor pages and community benchmarks as of May 2026 and should be re-verified before you commit.

Table of Contents

Why the Vector Store Is the RAG Bottleneck
How to Evaluate a Vector Database
Qdrant: The Rust-Based Performance Default
Weaviate: Hybrid Search and Built-In Vectorizers
Milvus and Zilliz Cloud: Billion-Scale Workhorse
Chroma: The Developer-Experience Choice
pgvector: Stay on Postgres as Long as You Can
Redis: Vectors Next to Your Cache
Head-to-Head Comparison Table
Decision Framework: Picking by Scale and Team
RAG Architecture: Where the Database Fits
Why Lushbinary for RAG Infrastructure

1. Why the Vector Store Is the RAG Bottleneck

When a RAG pipeline returns bad answers, teams usually blame the LLM first. More often the failure is upstream: the retriever pulled the wrong chunks, or pulled the right chunks too slowly to matter. Both are properties of the vector database, not the model.

Three things determine whether your retrieval layer holds up in production:

Recall under filtering. Real queries are almost never pure semantic search. They are "find documents similar to this, but only for tenant 42, only in English, only from the last 90 days." A database that degrades recall when you add metadata filters will quietly return worse answers.
Latency at your real corpus size. A benchmark on 1M vectors tells you little about behavior at 100M. HNSW graphs grow, memory pressure rises, and p99 latency is where users feel it.
Operational cost of staying healthy. Index rebuilds, replication, backups, and upgrades all have a cost. A managed service hides that cost in the bill; self-hosting moves it to your on-call rotation.

The right tool minimizes the sum of those three for your specific workload. That is why this comparison is organized around fit, not a single leaderboard.
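
One way to check the first of those three points on your own data is to compare a filtered query against an exact brute-force baseline. The sketch below is a toy illustration, not any database's actual algorithm. It assumes a hypothetical store that post-filters (takes the global top-k by similarity, then drops rows that fail the filter) and measures how much recall that costs when the filter is selective. The function names, the `tenant` field, and the random data are all made up for the example.

```python
import math
import random

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def exact_filtered_search(query, rows, tenant, k):
    """Ground truth: apply the filter first, then rank every matching row."""
    matching = [r for r in rows if r["tenant"] == tenant]
    matching.sort(key=lambda r: cosine(query, r["vec"]), reverse=True)
    return [r["id"] for r in matching[:k]]

def post_filter_search(query, rows, tenant, k):
    """Naive strategy: take the global top-k, then drop rows that fail the filter."""
    ranked = sorted(rows, key=lambda r: cosine(query, r["vec"]), reverse=True)
    return [r["id"] for r in ranked[:k] if r["tenant"] == tenant]

def recall_at_k(found, truth):
    """Fraction of the true top-k that the candidate result contains."""
    return len(set(found) & set(truth)) / len(truth) if truth else 1.0

random.seed(0)
# Hypothetical corpus: 2,000 random 16-dimensional vectors spread over 20 tenants.
rows = [
    {"id": i, "tenant": i % 20, "vec": [random.gauss(0, 1) for _ in range(16)]}
    for i in range(2000)
]
query = [random.gauss(0, 1) for _ in range(16)]

truth = exact_filtered_search(query, rows, tenant=7, k=10)
naive = post_filter_search(query, rows, tenant=7, k=10)
recall = recall_at_k(naive, truth)
print(f"recall@10 with post-filtering: {recall:.2f}")
```

Swapping `post_filter_search` for a call to the database you are evaluating turns this into a small filtered-recall benchmark; the exact baseline stays the same.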

2. How to Evaluate a Vector Database

Before looking at individual tools, fix the criteria. These are the dimensions that actually change the decision:

Architecture and indexing

HNSW vs IVF vs GPU-accelerated indexes
Quantization support to cut memory cost
In-memory vs disk-backed storage

Query capabilities

Hybrid search (dense plus keyword/BM25)
Metadata filtering without recall loss
Multi-tenancy isolation

Scale and operations

Realistic ceiling before sharding pain
Managed cloud and self-host parity
Backup, replication, and upgrade story

Cost and lock-in

Pricing by vectors, queries, or compute units
Egress fees, often the hidden killer
Open-source license and exit path

The hidden cost most teams miss

Embedding generation can add 30 to 50 percent on top of your vector database bill, and egress from the database to external endpoints can run as high as $0.09/GB on some clouds, sometimes exceeding compute cost. Model your total RAG cost, not just the database line item.
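
To make that concrete, here is a minimal cost-model sketch. It only encodes the arithmetic described above: an embedding overhead expressed as a fraction of the database bill, plus egress priced per gigabyte. The class name and the input figures (a $200 database bill, 1,500 GB of monthly egress) are hypothetical placeholders, not vendor quotes.

```python
from dataclasses import dataclass

@dataclass
class MonthlyRagCost:
    database_usd: float        # vector database line item
    embedding_overhead: float  # fraction of the database bill, e.g. 0.30 to 0.50
    egress_gb: float           # data leaving the database per month
    egress_usd_per_gb: float   # e.g. up to 0.09 on some clouds

    def embedding_usd(self) -> float:
        return self.database_usd * self.embedding_overhead

    def egress_usd(self) -> float:
        return self.egress_gb * self.egress_usd_per_gb

    def total_usd(self) -> float:
        return self.database_usd + self.embedding_usd() + self.egress_usd()

# Hypothetical inputs, chosen only to show how the extra line items add up.
estimate = MonthlyRagCost(
    database_usd=200.0,
    embedding_overhead=0.40,
    egress_gb=1500.0,
    egress_usd_per_gb=0.09,
)
print(f"database:  ${estimate.database_usd:.2f}")
print(f"embedding: ${estimate.embedding_usd():.2f}")
print(f"egress:    ${estimate.egress_usd():.2f}")
print(f"total:     ${estimate.total_usd():.2f}")
```

With these made-up inputs the embedding and egress lines together exceed the database bill itself, which is the pattern to watch for in your own numbers.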

3. Qdrant: The Rust-Based Performance Default

Qdrant is the open-source database most teams reach for when latency per dollar is the priority. Written in Rust, it pairs HNSW indexing with strong payload (metadata) filtering and scalar/binary quantization to keep memory cost down. Community benchmarks repeatedly put it among the fastest open-source options for filtered approximate nearest-neighbor queries.

Strengths

Excellent latency on filtered ANN queries
Quantization cuts RAM cost substantially
Clean API, simple single-binary self-host
Managed Qdrant Cloud with a free tier

Weaknesses

Fewer built-in vectorizer integrations
Hybrid search less turnkey than Weaviate
Distributed mode adds operational work

Best for: Latency-sensitive RAG and agent memory where you want self-host control with an easy managed escape hatch. A common rule of thumb in the community is that self-hosting Qdrant for low single-digit millions of vectors costs a fraction of equivalent managed throughput elsewhere, though exact numbers depend on your hardware and query mix.
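
To see why quantization cuts memory cost, consider a generic scalar quantization sketch. This is not Qdrant's implementation (real systems calibrate value ranges more carefully and often keep the original vectors for rescoring). It simply maps each float to a signed 8-bit code with one scale factor per vector, assuming a hypothetical 768-dimensional embedding.

```python
import array
import random

def quantize_int8(vec):
    """Map floats to signed 8-bit codes using one scale factor per vector."""
    max_abs = max(abs(x) for x in vec) or 1.0
    scale = max_abs / 127.0
    codes = array.array("b", (round(x / scale) for x in vec))
    return codes, scale

def dequantize(codes, scale):
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]

random.seed(1)
vec = [random.uniform(-1.0, 1.0) for _ in range(768)]  # hypothetical embedding
codes, scale = quantize_int8(vec)

float32_bytes = array.array("f", vec).itemsize * len(vec)  # 4 bytes per dimension
int8_bytes = codes.itemsize * len(codes)                   # 1 byte per dimension
max_error = max(abs(a - b) for a, b in zip(vec, dequantize(codes, scale)))

print(f"float32: {float32_bytes} bytes, int8: {int8_bytes} bytes")
print(f"largest per-dimension error: {max_error:.5f}")
```

The storage drops to a quarter of the float32 size at the price of a bounded rounding error per dimension. Binary quantization pushes the same trade further, to one bit per dimension.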

4. Weaviate: Hybrid Search and Built-In Vectorizers

Weaviate's pitch is integration. It ships native hybrid search (dense vectors plus BM25 keyword), built-in vectorizer modules that call embedding providers for you, multi-tenancy, and a GraphQL-style query layer. For multimodal and hybrid-search-heavy applications it reduces glue code more than most competitors.

Strengths

Best-in-class built-in hybrid search
Vectorizer modules reduce integration time
Native multi-tenancy
Self-host plus Weaviate Cloud options

Weaknesses

BM25 index can be memory-hungry
Higher resource footprint than Qdrant
More concepts to learn up front

Pricing shape: Weaviate Cloud has a small sandbox tier and usage-based paid tiers that scale with stored vectors; community reports put the entry paid tier around $25/month for small collections and roughly $199/month at the low millions of vectors. Verify current numbers on Weaviate's pricing page before budgeting.
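
For readers new to hybrid search, the core idea is to merge a dense-vector ranking with a keyword ranking into one result list. The sketch below uses reciprocal rank fusion, a common generic fusion method. It is shown only to illustrate the concept and is not a description of Weaviate's internals. The document IDs and the constant `k=60` are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document scores 1 / (k + rank) per list it appears in; ties are
    broken by document ID so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from the two retrievers for the same query.
dense_hits = ["doc3", "doc1", "doc7", "doc2"]    # vector similarity order
keyword_hits = ["doc1", "doc9", "doc3", "doc4"]  # BM25 order

fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
print(fused)
```

Documents that appear in both lists rise to the top, which is the behavior you want when a query mixes exact terms (product codes, names) with fuzzier intent.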

5. Milvus and Zilliz Cloud: Billion-Scale Workhorse

Milvus is the database to reach for when you are genuinely at scale: hundreds of millions to billions of vectors. It supports multiple index types, GPU-accelerated indexing (including NVIDIA CAGRA in recent releases), and a distributed architecture built for horizontal growth. Zilliz Cloud is the managed version from the same team, and Milvus Lite runs locally for prototyping.

Strengths

Scales to billions of vectors natively
GPU-accelerated index builds
Multiple index types for tuning recall/speed
Managed Zilliz Cloud with a free tier

Weaknesses

Self-hosting adds Kubernetes complexity
Overkill below tens of millions of vectors
Steeper operational learning curve

Best for: Large-scale search and retrieval where the corpus is too big for a single-node store. Below roughly 10M vectors, the distributed machinery is usually more cost and complexity than the workload justifies.

6. Chroma: The Developer-Experience Choice

Chroma optimizes for time-to-first-query. pip install, a few lines of Python, and you have a working RAG retriever. For prototypes, notebooks, and small-to-medium corpora it is the fastest way to get moving, and its API is friendly enough that many teams keep it well past the prototype stage.

Strengths

Lowest setup friction in the category
Great for local dev and prototyping
Clean Python-first API
Managed Chroma Cloud now available

Weaknesses

Slower on large filtered queries
Fewer enterprise scale features
Hybrid search less mature

Best for: Prototypes, internal tools, and small production corpora where developer velocity beats raw throughput.

7. pgvector: Stay on Postgres as Long as You Can

If your application already runs on Postgres, pgvector lets you add vector search without introducing a new database to operate, back up, and secure. You keep transactional integrity, joins against your relational data, and one fewer system in production. For many teams that is worth more than a latency edge.

Strengths

No new infrastructure to operate
Joins against existing relational data
Mature backup, replication, and tooling
Available on managed Postgres everywhere

Weaknesses

p99 latency rises past a few million vectors
Index builds compete with OLTP load
Fewer ANN tuning knobs than dedicated stores

Best for: Teams already on Postgres with corpora in the hundreds of thousands to low millions of vectors. Plan a migration path to a dedicated store before latency at your real scale forces the issue.
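
The join advantage is easiest to see in a query. The sketch below only builds the SQL text and parameters; it does not connect to a database. It assumes two hypothetical tables, `chunks` (with a pgvector `embedding` column) and `documents` (with `tenant_id` and `language` columns), and psycopg-style named placeholders. The `<=>` operator is pgvector's cosine distance.

```python
def to_pgvector_literal(embedding):
    """Format a Python list in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

# Hypothetical schema: chunks(id, document_id, body, embedding vector(n))
# and documents(id, title, tenant_id, language).
FILTERED_SEARCH_SQL = """
SELECT c.id, c.body, d.title,
       c.embedding <=> %(query)s::vector AS distance
FROM chunks AS c
JOIN documents AS d ON d.id = c.document_id
WHERE d.tenant_id = %(tenant_id)s
  AND d.language = %(language)s
ORDER BY c.embedding <=> %(query)s::vector
LIMIT %(limit)s
"""

def build_search(embedding, tenant_id, language="en", limit=5):
    """Return the SQL text and parameter dict for a filtered similarity search."""
    params = {
        "query": to_pgvector_literal(embedding),
        "tenant_id": tenant_id,
        "language": language,
        "limit": limit,
    }
    return FILTERED_SEARCH_SQL, params

sql, params = build_search([0.1, 0.2, 0.3], tenant_id=42)
print(sql)
print(params)
```

The filter and the relational join live in the same statement as the similarity ranking, so tenant isolation and metadata constraints use the same access controls and transactions as the rest of your application data.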

8. Redis: Vectors Next to Your Cache

Redis added vector similarity search to the in-memory database many teams already run for caching and queues. The appeal is latency and consolidation: if you need fast retrieval and you already operate Redis, you may not need a separate vector store at all. The trade-off is that everything lives in memory, so cost scales with dataset size faster than it does for disk-backed options.

Strengths

Very low latency for in-memory search
Consolidates with existing Redis usage
Hybrid search and filtering supported

Weaknesses

Memory cost scales hard with corpus size
Less specialized than dedicated stores
Large datasets get expensive fast

Best for: Low-latency retrieval on modest corpora when you are already a heavy Redis shop and want to avoid adding a system.
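
A back-of-the-envelope estimate shows how quickly an in-memory corpus grows. The sketch below counts only the raw float32 vector payload and deliberately ignores index structures, metadata, and replicas, all of which add to the real footprint. The 768-dimension figure is an assumed embedding size, not a Redis requirement.

```python
def raw_vector_gib(num_vectors, dimensions, bytes_per_value=4):
    """Raw vector payload in GiB (float32 by default), excluding any overhead."""
    return num_vectors * dimensions * bytes_per_value / 2**30

# Assumed embedding size of 768 dimensions at a few corpus sizes.
for count in (1_000_000, 10_000_000, 100_000_000):
    print(f"{count:>11,} vectors -> {raw_vector_gib(count, 768):8.1f} GiB raw")
```

Because the relationship is linear, a tenfold larger corpus needs tenfold more RAM before overhead, which is why the in-memory approach fits modest corpora best.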

9. Head-to-Head Comparison Table

Database	Sweet spot	Hybrid search	Managed option
Qdrant	Low-latency filtered RAG	Yes	Qdrant Cloud (free tier)
Weaviate	Hybrid and multimodal	Native, strong	Weaviate Cloud
Milvus / Zilliz	Billion-scale search	Yes	Zilliz Cloud (free tier)
Chroma	Prototypes, small corpora	Basic	Chroma Cloud
pgvector	Already-on-Postgres teams	Via SQL + extensions	Any managed Postgres
Redis	In-memory low latency	Yes	Redis Cloud

Feature availability and pricing change frequently. Treat this table as a starting point and confirm specifics against each vendor's current documentation.

10. Decision Framework: Picking by Scale and Team

Map your situation to one of these and you will be close to the right answer:

Prototyping or under ~500K vectors: Chroma for speed of iteration, or pgvector if you are already on Postgres.
Production RAG, 500K to ~50M vectors, latency matters: Qdrant. Reach for Weaviate instead if hybrid search and built-in vectorizers save you meaningful glue code.
Tens to hundreds of millions of vectors and up: Milvus self-hosted or Zilliz Cloud managed.
Already heavy on Postgres or Redis: start with the extension (pgvector or Redis vector search) and only graduate to a dedicated store when latency or recall under filtering forces it.
Strict data residency or air-gapped: any of the self-hostable open-source options (Qdrant, Weaviate, Milvus) over a managed-only service.
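
The rules above can be written down as a small function, which is handy for documenting the decision in a design review. This is a sketch of the framework as stated here, not a substitute for benchmarking. The function name, the parameters, and the 5M `EXTENSION_CEILING` (a stand-in for "low millions") are assumptions introduced for the example.

```python
EXTENSION_CEILING = 5_000_000  # assumption: a stand-in for "low millions"

def shortlist(num_vectors, on_postgres=False, on_redis=False,
              needs_hybrid=False, must_self_host=False):
    """Return candidate databases in rough order of preference."""
    picks = []
    # Existing Postgres or Redis shops start with what they already run.
    if num_vectors <= EXTENSION_CEILING:
        if on_postgres:
            picks.append("pgvector")
        if on_redis:
            picks.append("Redis vector search")
    if num_vectors < 500_000:
        picks.append("Chroma")
    elif num_vectors <= 50_000_000:
        # Weaviate leads only when hybrid search and vectorizers save glue code.
        picks += ["Weaviate", "Qdrant"] if needs_hybrid else ["Qdrant", "Weaviate"]
    else:
        picks.append("Milvus (self-hosted)")
        # Residency or air-gap constraints rule out the managed service.
        if not must_self_host:
            picks.append("Zilliz Cloud")
    return picks

print(shortlist(50_000))
print(shortlist(2_000_000, on_postgres=True))
print(shortlist(10_000_000, needs_hybrid=True))
print(shortlist(1_000_000_000, must_self_host=True))
```

Treat the output as a shortlist to benchmark against your own corpus and query mix, not as a final answer.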

11. RAG Architecture: Where the Database Fits

The vector database is one stage in a pipeline. Getting the stages and their data flow right matters more than the specific database brand.

A reranking step after retrieval often improves answer quality more cheaply than upgrading the database. For the broader pipeline, see our RAG production guide.
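
The stages and their data flow can be sketched in a few functions: chunk, embed, retrieve, rerank, then build the prompt. Everything below is a toy stand-in chosen so the example runs with the standard library alone. `embed` is a bag-of-words counter rather than a real embedding model, `retrieve` is a brute-force scan playing the role of the vector database, and `rerank` is a simple term-overlap score rather than a trained reranker.

```python
import math
from collections import Counter

def chunk(text, max_words=12):
    """Split text into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, index, top_k=3):
    """The vector database's job: return the chunks nearest to the query."""
    q = embed(query)
    return sorted(index, key=lambda item: similarity(q, item[1]), reverse=True)[:top_k]

def rerank(query, candidates, keep=2):
    """Toy reranker: prefer candidates that share more distinct query terms."""
    terms = set(query.lower().split())
    overlap = lambda item: len(terms & set(item[0].lower().split()))
    return sorted(candidates, key=overlap, reverse=True)[:keep]

def build_prompt(query, passages):
    """Assemble the context and question handed to the LLM."""
    context = "\n".join(f"- {text}" for text, _ in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = (
    "Qdrant is written in Rust and uses HNSW indexing. "
    "Milvus targets billions of vectors with a distributed architecture. "
    "Chroma optimizes for time to first query in Python notebooks."
)
index = [(c, embed(c)) for c in chunk(corpus)]  # ingestion: chunk, embed, store
query = "which database is written in Rust"
prompt = build_prompt(query, rerank(query, retrieve(query, index)))
print(prompt)
```

Only `retrieve` changes when you swap databases. Keeping each stage behind its own function makes it cheap to benchmark several stores, or to add a reranker, without touching the rest of the pipeline.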

12. Why Lushbinary for RAG Infrastructure

We build and operate retrieval systems for clients across support automation, internal search, and agent memory. We benchmark candidate databases against your real corpus and query mix rather than a generic leaderboard, then ship the pipeline end to end.

What we typically deliver:

Vector database selection benchmarked on your data, not a synthetic dataset
Chunking, embedding, and reranking tuned for recall on filtered queries
Self-hosted Qdrant, Weaviate, or Milvus deployments with backups and monitoring
Managed-service setup (Qdrant Cloud, Zilliz, Weaviate Cloud) when ops overhead is not worth it
Cost modeling that includes embedding and egress, not just storage

Free Consultation

Not sure which vector database fits your scale and budget? Lushbinary benchmarks the options against your actual workload and builds the retrieval pipeline around the result, no obligation.

Sources

Content was rephrased for compliance with licensing restrictions. Pricing and feature availability sourced from official vendor pages and community benchmarks as of May 2026 and may change. Benchmark numbers vary by hardware, dataset, and query mix. Always verify on the vendor's site and test against your own workload before committing.

Frequently Asked Questions

What is the best vector database for RAG in 2026?

There is no single best one. Qdrant is the strongest default for low-latency filtered RAG, Weaviate wins on built-in hybrid search and vectorizers, Milvus and Zilliz Cloud handle billion-scale workloads, Chroma is best for prototyping, and pgvector is best if you are already on Postgres. Pick by your scale, team, and deployment constraints.

When should I use pgvector instead of a dedicated vector database?

Use pgvector when you already run Postgres and your corpus is in the hundreds of thousands to low millions of vectors. It avoids operating a second database and lets you join against relational data. Move to a dedicated store like Qdrant or Milvus when p99 latency under filtering rises past acceptable levels at your real scale.

How much do managed vector databases cost?

Pricing is usually based on stored vectors, queries, or compute units. Qdrant Cloud and Zilliz Cloud offer free tiers. For Weaviate Cloud, community reports put the entry paid tier around $25/month for small collections and roughly $199/month at low millions of vectors. Remember that embedding generation can add 30 to 50 percent on top of the database bill and that egress can sometimes exceed compute cost. Verify current numbers on each vendor page.

Does metadata filtering hurt vector search recall?

It can, depending on the database. Some stores degrade recall when filters are applied because the filter and the ANN index interact poorly. Qdrant is designed for strong filtered-query performance, which is why it is a common pick for multi-tenant and metadata-heavy RAG. Always benchmark filtered queries, not just pure semantic search.

When do I actually need Milvus or Zilliz?

When your corpus is in the tens to hundreds of millions of vectors or higher. Below roughly 10M vectors, Milvus's distributed architecture is usually more operational complexity and cost than the workload justifies, and a single-node store like Qdrant is simpler and cheaper.

Can Redis replace a dedicated vector database?

For modest corpora and teams already running Redis, yes: it avoids adding a system and gives very low latency. But because Redis is in-memory, cost scales hard with dataset size, so large corpora get expensive fast compared to disk-backed dedicated stores.

Build RAG on the Right Foundation

We benchmark vector databases against your real corpus and ship the retrieval pipeline end to end, tuned for recall, latency, and cost.


Vector Database Comparison for RAG: Qdrant vs Weaviate vs Milvus vs pgvector
