AI & Automation · May 29, 2026 · 15 min read

Vector Database Comparison for RAG: Qdrant vs Weaviate vs Milvus vs pgvector

The vector store, not the embedding model, decides your RAG recall, latency, and bill. We compare Qdrant, Weaviate, Milvus (with Zilliz Cloud), Chroma, pgvector, and Redis on architecture, hybrid search, scale ceilings, and pricing, and close with a decision framework by scale and team.

Lushbinary Team

AI & Cloud Solutions

Retrieval-augmented generation lives or dies on its vector store. The embedding model gets the headlines, but the database underneath decides your recall quality, your p99 latency, and your monthly bill. Pick wrong and you either overpay for managed convenience you do not need, or you self-host something that pages you at 2 a.m. when an index rebuild runs out of memory.

The category has matured. There is no longer a single best vector database; there is a best one for your scale, your team, and your deployment constraints. A hobby RAG bot over 50,000 chunks has nothing in common with a multi-tenant search platform serving a billion vectors under a strict data-residency mandate.

This guide compares the open-source and self-hostable vector databases that production teams actually deploy: Qdrant, Weaviate, Milvus (and its managed Zilliz Cloud), Chroma, pgvector on Postgres, and Redis. We look at architecture, hybrid search, filtering, scale ceilings, pricing shape, and which one fits which team. Figures are sourced from vendor pages and community benchmarks as of May 2026 and should be re-verified before you commit.

Table of Contents

  1. Why the Vector Store Is the RAG Bottleneck
  2. How to Evaluate a Vector Database
  3. Qdrant: The Rust-Based Performance Default
  4. Weaviate: Hybrid Search and Built-In Vectorizers
  5. Milvus and Zilliz Cloud: Billion-Scale Workhorse
  6. Chroma: The Developer-Experience Choice
  7. pgvector: Stay on Postgres as Long as You Can
  8. Redis: Vectors Next to Your Cache
  9. Head-to-Head Comparison Table
  10. Decision Framework: Picking by Scale and Team
  11. RAG Architecture: Where the Database Fits
  12. Why Lushbinary for RAG Infrastructure

1. Why the Vector Store Is the RAG Bottleneck

When a RAG pipeline returns bad answers, teams usually blame the LLM first. More often the failure is upstream: the retriever pulled the wrong chunks, or pulled the right chunks too slowly to matter. Both are properties of the vector database, not the model.

Three things determine whether your retrieval layer holds up in production:

  • Recall under filtering. Real queries are almost never pure semantic search. They are "find documents similar to this, but only for tenant 42, only in English, only from the last 90 days." A database that degrades recall when you add metadata filters will quietly return worse answers.
  • Latency at your real corpus size. A benchmark on 1M vectors tells you little about behavior at 100M. HNSW graphs grow, memory pressure rises, and p99 latency is where users feel it.
  • Operational cost of staying healthy. Index rebuilds, replication, backups, and upgrades all have a cost. A managed service hides that cost in the bill; self-hosting moves it to your on-call rotation.

The right tool minimizes the sum of those three for your specific workload. That is why this comparison is organized around fit, not a single leaderboard.
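One way to catch the first failure mode is to measure recall on filtered queries against a brute-force ground truth. The sketch below is a minimal pure-Python illustration, not any database's actual algorithm: the helper names (`exact_filtered_top_k`, `post_filtered_top_k`, `recall_at_k`) and the four-document corpus are made up for the example. It shows how a naive "search first, filter afterwards" strategy can lose every matching document.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def exact_filtered_top_k(query, corpus, predicate, k):
    """Ground truth: apply the filter first, then rank every surviving vector."""
    scored = [(cosine(query, vec), doc_id)
              for doc_id, vec, meta in corpus if predicate(meta)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

def post_filtered_top_k(query, corpus, predicate, k):
    """Naive strategy: take the global top-k, then drop rows that fail the filter."""
    ranked = sorted(corpus, key=lambda row: cosine(query, row[1]), reverse=True)
    return [doc_id for doc_id, _, meta in ranked[:k] if predicate(meta)]

def recall_at_k(retrieved, truth):
    """Fraction of the ground-truth ids that the retriever actually returned."""
    if not truth:
        return 1.0
    return len(set(retrieved) & set(truth)) / len(truth)

# Hypothetical corpus: (id, vector, metadata). Tenant 1 owns the closest vectors.
corpus = [
    ("a", [1.0, 0.0], {"tenant": 1}),
    ("b", [0.9, 0.1], {"tenant": 1}),
    ("c", [0.8, 0.2], {"tenant": 2}),
    ("d", [0.0, 1.0], {"tenant": 2}),
]
query = [1.0, 0.0]

def only_tenant_2(meta):
    return meta["tenant"] == 2

truth = exact_filtered_top_k(query, corpus, only_tenant_2, k=2)
naive = post_filtered_top_k(query, corpus, only_tenant_2, k=2)
print(truth, naive, recall_at_k(naive, truth))  # ['c', 'd'] [] 0.0
```

Against a real database, you would replace `post_filtered_top_k` with the client's filtered query and keep the brute-force function as the reference on a sample of your corpus.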

2. How to Evaluate a Vector Database

Before looking at individual tools, fix the criteria. These are the dimensions that actually change the decision:

Architecture and indexing

  • HNSW vs IVF vs GPU-accelerated indexes
  • Quantization support to cut memory cost
  • In-memory vs disk-backed storage

Query capabilities

  • Hybrid search (dense plus keyword/BM25)
  • Metadata filtering without recall loss
  • Multi-tenancy isolation

Scale and operations

  • Realistic ceiling before sharding pain
  • Managed cloud and self-host parity
  • Backup, replication, and upgrade story

Cost and lock-in

  • Pricing by vectors, queries, or compute units
  • Egress fees, often the hidden killer
  • Open-source license and exit path

The hidden cost most teams miss

Embedding generation can add 30 to 50 percent on top of your vector database bill, and egress from the database to external endpoints can run as high as $0.09/GB on some clouds, sometimes exceeding compute cost. Model your total RAG cost, not just the database line item.
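To make that concrete, here is a rough cost sketch. The function name, the default 40 percent embedding overhead (a midpoint of the 30 to 50 percent range above), and the example inputs are assumptions for illustration; substitute your own bill and traffic figures.

```python
def monthly_rag_cost(db_bill_usd, embedding_overhead=0.4,
                     egress_gb=0.0, egress_usd_per_gb=0.09):
    """Rough monthly total: database bill + embedding overhead + egress.

    embedding_overhead is a fraction of the database bill (0.3 to 0.5 per the
    range quoted in the text); egress_usd_per_gb defaults to the $0.09/GB
    upper figure quoted in the text. Both are assumptions to adjust.
    """
    if db_bill_usd < 0 or embedding_overhead < 0 or egress_gb < 0:
        raise ValueError("cost inputs must be non-negative")
    embedding = db_bill_usd * embedding_overhead
    egress = egress_gb * egress_usd_per_gb
    return {
        "database": db_bill_usd,
        "embedding": embedding,
        "egress": egress,
        "total": db_bill_usd + embedding + egress,
    }

# Hypothetical workload: $200/month database bill, 1 TB of monthly egress.
estimate = monthly_rag_cost(200.0, embedding_overhead=0.5, egress_gb=1000)
for line_item, usd in estimate.items():
    print(f"{line_item:>10}: ${usd:,.2f}")
```

In this made-up scenario the embedding and egress lines together approach the size of the database bill itself, which is the point of modeling them explicitly.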

3. Qdrant: The Rust-Based Performance Default

Qdrant is the open-source database most teams reach for when latency per dollar is the priority. Written in Rust, it pairs HNSW indexing with strong payload (metadata) filtering and scalar/binary quantization to keep memory cost down. Community benchmarks repeatedly put it among the fastest open-source options for filtered approximate nearest-neighbor queries.

Strengths

  • Excellent latency on filtered ANN queries
  • Quantization cuts RAM cost substantially
  • Clean API, simple single-binary self-host
  • Managed Qdrant Cloud with a free tier

Weaknesses

  • Fewer built-in vectorizer integrations
  • Hybrid search less turnkey than Weaviate
  • Distributed mode adds operational work

Best for: Latency-sensitive RAG and agent memory where you want self-host control with an easy managed escape hatch. A common rule of thumb in the community is that self-hosting Qdrant for low single-digit millions of vectors costs a fraction of equivalent managed throughput elsewhere, though exact numbers depend on your hardware and query mix.
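The memory effect of quantization is simple arithmetic. The sketch below estimates raw vector storage at 32 bits per dimension (float32), 8 bits (scalar quantization to int8), and 1 bit (binary quantization). The function name and the 1024-dimension example are assumptions, and the estimate deliberately ignores HNSW graph links, payloads, and any full-precision copies kept for rescoring, so treat it as a lower bound rather than a sizing guarantee.

```python
def vector_memory_gib(num_vectors, dim, bits_per_dim):
    """Raw storage for the vectors alone, in GiB (no index or payload overhead)."""
    total_bytes = num_vectors * dim * bits_per_dim / 8
    return total_bytes / 2**30

# Hypothetical corpus: about one million 1024-dimensional embeddings.
num_vectors, dim = 2**20, 1024
for label, bits in [("float32", 32), ("int8 scalar", 8), ("binary", 1)]:
    print(f"{label:>12}: {vector_memory_gib(num_vectors, dim, bits):.3f} GiB")
# float32 -> 4.000 GiB, int8 scalar -> 1.000 GiB, binary -> 0.125 GiB
```

The 4x and 32x reductions follow directly from the bit widths; what quantization costs in recall depends on the embedding model and should be measured on your own data.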

4. Weaviate: Hybrid Search and Built-In Vectorizers

Weaviate's pitch is integration. It ships native hybrid search (dense vectors plus BM25 keyword), built-in vectorizer modules that call embedding providers for you, multi-tenancy, and a GraphQL-style query layer. For multimodal and hybrid-search-heavy applications it reduces glue code more than most competitors.

Strengths

  • Best-in-class built-in hybrid search
  • Vectorizer modules reduce integration time
  • Native multi-tenancy
  • Self-host plus Weaviate Cloud options

Weaknesses

  • BM25 index can be memory-hungry
  • Higher resource footprint than Qdrant
  • More concepts to learn up front

Pricing shape: Weaviate Cloud has a small sandbox tier and usage-based paid tiers that scale with stored vectors; community reports put the entry paid tier around $25/month for small collections and roughly $199/month at the low millions of vectors. Verify current numbers on Weaviate's pricing page before budgeting.
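To illustrate what hybrid search has to do, the sketch below merges a dense-vector ranking and a keyword ranking with reciprocal rank fusion (RRF), one common fusion technique. It is a generic illustration and not a description of Weaviate's internals; the function name, the constant `k=60`, and the document ids are assumptions for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results for one query from the two retrievers.
dense_hits = ["a", "b", "c"]    # ranked by vector similarity
keyword_hits = ["c", "a", "d"]  # ranked by BM25-style keyword score
print(reciprocal_rank_fusion([dense_hits, keyword_hits]))  # ['a', 'c', 'b', 'd']
```

Documents that appear in both lists ("a" and "c") rise above those found by only one retriever, which is the behavior hybrid search is meant to provide.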

5. Milvus and Zilliz Cloud: Billion-Scale Workhorse

Milvus is the database to reach for when you are genuinely at scale: hundreds of millions to billions of vectors. It supports multiple index types, GPU-accelerated indexing (including NVIDIA CAGRA in recent releases), and a distributed architecture built for horizontal growth. Zilliz Cloud is the managed version from the same team, and Milvus Lite runs locally for prototyping.

Strengths

  • Scales to billions of vectors natively
  • GPU-accelerated index builds
  • Multiple index types for tuning recall/speed
  • Managed Zilliz Cloud with a free tier

Weaknesses

  • Self-hosting adds Kubernetes complexity
  • Overkill below tens of millions of vectors
  • Steeper operational learning curve

Best for: Large-scale search and retrieval where the corpus is too big for a single-node store. Below roughly 10M vectors, the distributed machinery is usually more cost and complexity than the workload justifies.

6. Chroma: The Developer-Experience Choice

Chroma optimizes for time-to-first-query. Run pip install, write a few lines of Python, and you have a working RAG retriever. For prototypes, notebooks, and small-to-medium corpora it is the fastest way to get moving, and its API is friendly enough that many teams keep it well past the prototype stage.

Strengths

  • Lowest setup friction in the category
  • Great for local dev and prototyping
  • Clean Python-first API
  • Managed Chroma Cloud now available

Weaknesses

  • Slower on large filtered queries
  • Fewer enterprise scale features
  • Hybrid search less mature

Best for: Prototypes, internal tools, and small production corpora where developer velocity beats raw throughput.

7. pgvector: Stay on Postgres as Long as You Can

If your application already runs on Postgres, pgvector lets you add vector search without introducing a new database to operate, back up, and secure. You keep transactional integrity, joins against your relational data, and one fewer system in production. For many teams that is worth more than a latency edge.

Strengths

  • No new infrastructure to operate
  • Joins against existing relational data
  • Mature backup, replication, and tooling
  • Available on managed Postgres everywhere

Weaknesses

  • p99 latency rises past a few million vectors
  • Index builds compete with OLTP load
  • Fewer ANN tuning knobs than dedicated stores

Best for: Teams already on Postgres with corpora in the hundreds of thousands to low millions of vectors. Plan a migration path to a dedicated store before latency at your real scale forces the issue.
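Knowing when latency "forces the issue" requires tracking tail latency rather than averages. The sketch below times a query callable and reports p50, p95, and p99 using only the standard library. `run_query` is a hypothetical stand-in for whatever executes your real search (for example a pgvector SQL query through your driver), and the warmup count is an arbitrary assumption.

```python
import statistics
import time

def percentiles(samples_ms):
    """Return p50/p95/p99 from a list of latency samples (needs at least 2)."""
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

def latency_percentiles(run_query, queries, warmup=5):
    """Time run_query(q) for each query and summarize the tail latency in ms."""
    for q in queries[:warmup]:  # warm caches and connections before measuring
        run_query(q)
    samples_ms = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return percentiles(samples_ms)

def fake_query(q):
    """Placeholder workload; replace with a call into your real retriever."""
    return sum(range(1000))

print(latency_percentiles(fake_query, list(range(200))))
```

Run it against a copy of your production-sized corpus with filters applied, and re-run it as the corpus grows; a rising p99 with a flat p50 is the usual early signal.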

8. Redis: Vectors Next to Your Cache

Redis added vector similarity search to the in-memory database many teams already run for caching and queues. The appeal is latency and consolidation: if you need fast retrieval and you already operate Redis, you may not need a separate vector store at all. The trade-off is that everything lives in memory, so cost scales with dataset size faster than disk-backed options.

Strengths

  • Very low latency for in-memory search
  • Consolidates with existing Redis usage
  • Hybrid search and filtering supported

Weaknesses

  • Memory cost scales hard with corpus size
  • Less specialized than dedicated stores
  • Large datasets get expensive fast

Best for: Low-latency retrieval on modest corpora when you are already a heavy Redis shop and want to avoid adding a system.

9. Head-to-Head Comparison Table

| Database | Sweet spot | Hybrid search | Managed option |
|---|---|---|---|
| Qdrant | Low-latency filtered RAG | Yes | Qdrant Cloud (free tier) |
| Weaviate | Hybrid and multimodal | Native, strong | Weaviate Cloud |
| Milvus / Zilliz | Billion-scale search | Yes | Zilliz Cloud (free tier) |
| Chroma | Prototypes, small corpora | Basic | Chroma Cloud |
| pgvector | Already-on-Postgres teams | Via SQL + extensions | Any managed Postgres |
| Redis | In-memory low latency | Yes | Redis Cloud |

Feature availability and pricing change frequently. Treat this table as a starting point and confirm specifics against each vendor's current documentation.

10. Decision Framework: Picking by Scale and Team

Map your situation to one of these and you will be close to the right answer:

  • Prototyping or under ~500K vectors: Chroma for speed of iteration, or pgvector if you are already on Postgres.
  • Production RAG, 500K to ~50M vectors, latency matters: Qdrant. Reach for Weaviate instead if hybrid search and built-in vectorizers save you meaningful glue code.
  • Tens to hundreds of millions of vectors and up: Milvus self-hosted or Zilliz Cloud managed.
  • Already heavy on Postgres or Redis: start with the extension (pgvector or Redis vector search) and only graduate to a dedicated store when latency or recall under filtering forces it.
  • Strict data residency or air-gapped: any of the self-hostable open-source options (Qdrant, Weaviate, Milvus) over a managed-only service.
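The rules above can be written down as a small function, which is a handy way to make a team's selection criteria explicit and reviewable. This is only a sketch of the list above: the function name and parameters are hypothetical, the 500K and 50M thresholds come from the bullets, and the 5M cutoff standing in for "low millions" on Postgres is an assumption.

```python
def recommend_vector_db(num_vectors, on_postgres=False, on_redis=False,
                        needs_hybrid=False, air_gapped=False):
    """Map corpus size and team constraints to a starting recommendation."""
    if num_vectors >= 50_000_000:
        # Very large corpora: Milvus; managed Zilliz Cloud is ruled out when air-gapped.
        return "Milvus (self-hosted)" if air_gapped else "Milvus or Zilliz Cloud"
    if on_postgres and num_vectors < 5_000_000:  # "low millions": assumed cutoff
        return "pgvector"
    if on_redis and num_vectors < 500_000:       # "modest corpora": assumed cutoff
        return "Redis vector search"
    if num_vectors < 500_000:
        return "Chroma"
    # Mid-range production RAG: Qdrant by default, Weaviate when hybrid search matters.
    return "Weaviate" if needs_hybrid else "Qdrant"

print(recommend_vector_db(100_000))                        # Chroma
print(recommend_vector_db(100_000, on_postgres=True))      # pgvector
print(recommend_vector_db(10_000_000, needs_hybrid=True))  # Weaviate
print(recommend_vector_db(500_000_000, air_gapped=True))   # Milvus (self-hosted)
```

The output is a starting point for a benchmark on your own data, not a substitute for one.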

11. RAG Architecture: Where the Database Fits

The vector database is one stage in a pipeline. Getting the stages and their data flow right matters more than the specific database brand.

[Figure: RAG pipeline. Documents and data sources -> Chunk and embed -> Vector database (index + metadata filter) -> Retriever (top-k + rerank) -> LLM generation -> Grounded answer. The user query is embedded, filtered, retrieved, then passed as context. Retrieval quality caps answer quality, no matter the model.]

A reranking step after retrieval often improves answer quality more cheaply than upgrading the database. For the broader pipeline, see our RAG production guide.
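The stages can be sketched end to end in a few dozen lines of plain Python. Everything here is a toy stand-in chosen so the example runs without dependencies: `embed` is a hashed bag-of-words rather than a real embedding model, the in-memory list plays the role of the vector database, `rerank` uses simple term overlap instead of a trained reranker, and all function names and sample documents are hypothetical.

```python
import math
import zlib

DIM = 64  # toy embedding size (assumption)

def chunk(text, size=8):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding: hashed bag-of-words, L2-normalized. Not a real model."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def build_index(documents):
    """Chunk and embed each document, keeping its metadata with every chunk."""
    return [{"text": piece, "vector": embed(piece), "meta": doc["meta"]}
            for doc in documents for piece in chunk(doc["text"])]

def retrieve(index, query, where, top_k=3):
    """Filter by metadata first, then rank the survivors by dot product."""
    qvec = embed(query)
    hits = [r for r in index
            if all(r["meta"].get(key) == value for key, value in where.items())]
    hits.sort(key=lambda r: sum(a * b for a, b in zip(qvec, r["vector"])),
              reverse=True)
    return hits[:top_k]

def rerank(query, hits):
    """Toy reranker: order hits by how many query terms they contain."""
    terms = set(query.lower().split())
    return sorted(hits, key=lambda r: len(terms & set(r["text"].lower().split())),
                  reverse=True)

def build_prompt(query, hits):
    """Assemble the context that would be passed to the LLM."""
    context = "\n".join(f"- {hit['text']}" for hit in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

documents = [
    {"text": "Refund policy allows returns within 30 days of purchase",
     "meta": {"tenant": 42}},
    {"text": "Refund policy for partners is handled by account managers",
     "meta": {"tenant": 7}},
]
index = build_index(documents)
question = "what is the refund policy"
hits = rerank(question, retrieve(index, question, where={"tenant": 42}))
print(build_prompt(question, hits))
```

Swapping any stage (a real embedding model, a real vector database, a cross-encoder reranker) leaves the data flow unchanged, which is why the pipeline shape matters more than the database brand.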

12. Why Lushbinary for RAG Infrastructure

We build and operate retrieval systems for clients across support automation, internal search, and agent memory. We benchmark candidate databases against your real corpus and query mix rather than a generic leaderboard, then ship the pipeline end to end.

What we typically deliver:

  • Vector database selection benchmarked on your data, not a synthetic dataset
  • Chunking, embedding, and reranking tuned for recall on filtered queries
  • Self-hosted Qdrant, Weaviate, or Milvus deployments with backups and monitoring
  • Managed-service setup (Qdrant Cloud, Zilliz, Weaviate Cloud) when ops overhead is not worth it
  • Cost modeling that includes embedding and egress, not just storage

Sources

Content was rephrased for compliance with licensing restrictions. Pricing and feature availability sourced from official vendor pages and community benchmarks as of May 2026 and may change. Benchmark numbers vary by hardware, dataset, and query mix. Always verify on the vendor's site and test against your own workload before committing.

Frequently Asked Questions

What is the best vector database for RAG in 2026?

There is no single best one. Qdrant is the strongest default for low-latency filtered RAG, Weaviate wins on built-in hybrid search and vectorizers, Milvus and Zilliz Cloud handle billion-scale workloads, Chroma is best for prototyping, and pgvector is best if you are already on Postgres. Pick by your scale, team, and deployment constraints.

When should I use pgvector instead of a dedicated vector database?

Use pgvector when you already run Postgres and your corpus is in the hundreds of thousands to low millions of vectors. It avoids operating a second database and lets you join against relational data. Move to a dedicated store like Qdrant or Milvus when p99 latency under filtering rises past acceptable levels at your real scale.

How much do managed vector databases cost?

Pricing is usually based on stored vectors, queries, or compute units. Qdrant Cloud and Zilliz Cloud offer free tiers. For Weaviate Cloud, community reports put the entry paid tier around $25/month for small collections and roughly $199/month at the low millions of vectors. Remember that embedding generation can add 30 to 50 percent on top of the database bill, and egress fees can sometimes exceed compute cost. Verify current numbers on each vendor's pricing page.

Does metadata filtering hurt vector search recall?

It can, depending on the database. Some stores degrade recall when filters are applied because the filter and the ANN index interact poorly. Qdrant is designed for strong filtered-query performance, which is why it is a common pick for multi-tenant and metadata-heavy RAG. Always benchmark filtered queries, not just pure semantic search.

When do I actually need Milvus or Zilliz?

When your corpus is in the tens to hundreds of millions of vectors or higher. Below roughly 10M vectors, Milvus's distributed architecture is usually more operational complexity and cost than the workload justifies, and a single-node store like Qdrant is simpler and cheaper.

Can Redis replace a dedicated vector database?

For modest corpora and teams already running Redis, yes: it avoids adding a system and gives very low latency. But because Redis keeps everything in memory, cost scales hard with dataset size, so large corpora get expensive fast compared to disk-backed dedicated stores.
