Read the landing pages and every vector database does the same five things. Sub-millisecond similarity search. Billions of vectors. Horizontal scale. Hybrid search. Production-ready. The copy is interchangeable because the benchmarks are interchangeable — everyone publishes the query that makes them look fastest, on the dataset that flatters them, against the competitor they beat.
None of that decides anything. The benchmark you care about is the one nobody runs: your data, your filter patterns, your write volume, your team's ability to operate the thing at 3am. The vector database decision is not "which is fastest." It's "which survives my actual workload and my actual on-call rotation." Those are different questions with different answers.
The axes that actually matter
Pure recall at a given latency, on a clean benchmark, is the axis everyone competes on and the one that matters least in production. Here are the ones that decide it.
Scale, honestly stated. Not the marketing ceiling — your number. Are you storing 50,000 vectors or 500 million? The answer changes everything, and most teams are at the low end. A RAG system over a company's internal docs is often under a million vectors. At that scale, the "scales to billions" pitch is selling you a problem you don't have, and the operational cost of a distributed system you'll never fill. At hundreds of millions, scale becomes real and the conversation changes.
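To put "your number" in perspective, a rough estimate of raw vector storage is count times dimensions times bytes per value. The sketch below assumes float32 values and a hypothetical 1,536-dimension embedding. It ignores index overhead, metadata, and replication, all of which add to the total.

```python
def raw_vector_bytes(num_vectors: int, dims: int = 1536, bytes_per_value: int = 4) -> int:
    """Raw storage for the vectors alone (no index, metadata, or replicas)."""
    return num_vectors * dims * bytes_per_value


def human(n_bytes: int) -> str:
    """Format a byte count using binary units, capping at TiB."""
    value = float(n_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if value < 1024 or unit == "TiB":
            return f"{value:.1f} {unit}"
        value /= 1024


# dims=1536 is an illustrative assumption; substitute your embedding size.
for count in (50_000, 1_000_000, 500_000_000):
    print(f"{count:>11,} vectors -> {human(raw_vector_bytes(count))}")
```

Under these assumptions, 50,000 vectors come to roughly 293 MiB and a million to under 6 GiB, both of which fit comfortably on one machine. 500 million is about 2.8 TiB before any index overhead, which is where a distributed design starts to earn its keep.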
Filtering. This is the axis that quietly breaks RAG systems, and it's barely on the landing pages. Real queries are never pure vector search. They're "find similar documents where tenant_id = X and status = active and created after March." How the database combines the vector search with the metadata filter determines whether you get correct results or fast garbage. Pre-filtering (narrow by metadata, then search the survivors) gives correct results but can be slow if the filter is broad. Post-filtering (search first, then drop non-matching) is fast but can return a near-empty result set when the filter is selective — you asked for 10 results, the vector search returned 10, the filter killed 9, and now you've silently lost recall. In a multi-tenant system, filtering is not a feature. It's the whole game, because tenant isolation runs through it.
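The difference between the two strategies is easy to reproduce with a toy brute-force search. This is a minimal sketch, not how any particular database implements filtering. The document layout, the tenant names, and the one-in-ten tenant split are all made up for illustration.

```python
import math


def top_k(docs, query, k):
    """Exact nearest neighbours by Euclidean distance (brute force)."""
    return sorted(docs, key=lambda d: math.dist(d["vec"], query))[:k]


def pre_filter_search(docs, query, k, keep):
    """Narrow by metadata first, then rank only the survivors."""
    return top_k([d for d in docs if keep(d)], query, k)


def post_filter_search(docs, query, k, keep):
    """Rank everything, take the top k, then drop rows that fail the filter."""
    return [d for d in top_k(docs, query, k) if keep(d)]


# Hypothetical corpus: 100 docs on a line; every tenth doc belongs to "small".
docs = [
    {"id": i, "tenant_id": "small" if i % 10 == 9 else "big", "vec": (float(i), 0.0)}
    for i in range(100)
]


def is_small_tenant(d):
    return d["tenant_id"] == "small"


query = (0.0, 0.0)
pre = pre_filter_search(docs, query, 10, is_small_tenant)
post = post_filter_search(docs, query, 10, is_small_tenant)
print("pre-filter returned:", len(pre))
print("post-filter returned:", len(post))
```

With this data the pre-filter path returns all 10 requested results for the small tenant, while the post-filter path returns one, because nine of the ten global nearest neighbours belong to the other tenant. Nothing errors. The result set is just short.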
Hybrid search. Pure semantic search misses exact-match needs — product SKUs, error codes, names, acronyms. "Find the doc about error E4012" should match on the literal string, not just on vibes. Hybrid search fuses dense vector similarity with sparse keyword matching (BM25 or equivalent). Some databases do this natively; with others you bolt on a second system and reconcile two result sets yourself. If your domain has identifiers that must match exactly, native hybrid is worth a lot.
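If you do end up reconciling two result sets yourself, one common way to merge them is reciprocal rank fusion (RRF). It is not the only fusion method, and the native hybrid implementations may work differently. The sketch below assumes you already have a dense ranking and a keyword ranking as lists of document IDs. The IDs are invented, and k = 60 is a commonly used default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs; a higher fused score ranks first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every doc it contains.
            scores[doc_id] += 1.0 / (k + rank)
    # Break score ties by doc ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for the query "error E4012".
dense_hits = ["doc-timeouts", "doc-retries", "doc-E4012"]   # semantic neighbours
keyword_hits = ["doc-E4012", "doc-changelog"]               # literal string match

fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
print(fused)
```

Here the document that mentions E4012 literally ranks first after fusion because it appears in both lists, even though it was only third in the dense ranking.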
Ops burden. Who runs this at 3am? A managed service trades money for never thinking about replication, sharding, or node failure. A self-hosted cluster trades money saved for an on-call rotation that now owns a stateful distributed system. Be honest about your team. A four-person startup that self-hosts a sharded vector cluster has just signed up to become a database operations team instead of building product.
Cost at your scale and write pattern. Managed vector databases price on stored vectors and queries, and the bill grows with the corpus whether or not anyone queries it. Self-hosted shifts cost to compute and engineering time. And write pattern matters as much as read: a corpus that's rebuilt nightly stresses bulk indexing throughput; a corpus with constant small updates stresses incremental index maintenance (in-place updates, deletes, and cleanup) instead. Price your actual pattern, not a steady-state read benchmark.
pgvector vs the dedicated stores
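pgvector (Postgres extension). If your data already lives in Postgres and you're under a few million vectors, this is very often the right answer and the boring one nobody recommends because there's no landing page selling it. You get vector search inside the database you already run, back up, and monitor. Filtering is just a SQL WHERE clause against indexed columns, and for selective filters the planner can narrow by metadata first and rank the survivors exactly, so much of the pre-filter problem dissolves. One caveat: when a query goes through an approximate index (HNSW or IVFFlat), the filter is applied after the index scan, so a selective filter can still come back short. Recent pgvector releases (0.8.0 and later) add iterative index scans to address this, and it belongs in your load test either way. Multi-tenancy is row-level security you may already have. One system, one backup story, one thing to operate. The ceiling is real (at very high scale or very high query concurrency, a dedicated store pulls ahead), but most teams hit product-market fit long before they hit pgvector's ceiling.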
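As a rough illustration of "just a WHERE clause", here is what a filtered similarity query might look like. The table, its columns, the tenant value, and the cutoff date are all hypothetical. The `<=>` operator is pgvector's cosine distance, and the `%(name)s` placeholders assume a driver such as psycopg. The code only builds the query and its parameters; it does not connect to a database.

```python
from datetime import date

# Hypothetical schema: documents(id, title, tenant_id, status, created_at, embedding vector)
TENANT_SEARCH_SQL = """
SELECT id, title, embedding <=> %(query_vec)s::vector AS cosine_distance
FROM documents
WHERE tenant_id = %(tenant_id)s
  AND status = 'active'
  AND created_at >= %(created_after)s
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT %(k)s
"""


def to_vector_literal(values):
    """Render floats in pgvector's text input format, e.g. [0.1,0.25]."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def search_params(query_vec, tenant_id, created_after, k=10):
    """Bundle named parameters for a driver that uses %(name)s placeholders."""
    return {
        "query_vec": to_vector_literal(query_vec),
        "tenant_id": tenant_id,
        "created_after": created_after,
        "k": k,
    }


# Illustrative values only; a real query vector has hundreds of dimensions.
params = search_params([0.1, 0.25, 0.9], "tenant-x", date(2024, 3, 1))
print(TENANT_SEARCH_SQL)
print(params)
# With a driver such as psycopg: cursor.execute(TENANT_SEARCH_SQL, params)
```

Whether Postgres serves this from the vector index or from a metadata index depends on your data and statistics, so it is worth running the query under EXPLAIN ANALYZE against a realistic tenant before trusting it.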
Pinecone. Fully managed, zero ops, fast to stand up. You pay for that in money and in lock-in — it's a hosted service with its own API, not something you run. Strong choice when you want vectors to be someone else's operational problem and the cost fits. Weak choice when you need the vector data to live next to relational data, or when egress and per-vector pricing at scale start to sting.
Weaviate. Native hybrid search and a built-in module ecosystem. Good when hybrid is central to your use case and you want it first-class rather than assembled. Self-hostable or managed. The module surface is powerful, and it also means more concepts to learn and operate.
Qdrant. Strong filtering, good performance, sane self-hosting story, Rust core. A common pick when you've outgrown pgvector but don't want managed lock-in, and when rich metadata filtering is a first-order requirement. Self-host it or use their cloud.
Milvus. Built for very large scale with a distributed architecture. The right tool when you genuinely have hundreds of millions of vectors and the team to run a distributed system. The wrong tool when you have two million vectors and four engineers — you've taken on a cluster's worth of operational complexity to solve a problem you don't have.
When Postgres is enough
Default to pgvector and make the dedicated store earn its place. You're past Postgres when one of these is true and you can prove it with numbers, not vibes: you're north of roughly 10 million vectors and query latency under load has degraded past your budget; your query concurrency saturates the database and vector search is contending with your transactional traffic; you need native hybrid search and assembling it on Postgres is more work than adopting a store that has it; or your write pattern (massive continuous re-indexing) is starving your relational workload.
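One way to keep this honest is to write the triggers down as a check you can run against real metrics. The sketch below is an assumption-heavy illustration. The field names are invented, peak CPU utilization stands in as a crude proxy for "concurrency saturates the database", and the 0.85 saturation threshold is a placeholder. Only the roughly 10 million vector figure comes from the text above.

```python
from dataclasses import dataclass


@dataclass
class WorkloadMetrics:
    vector_count: int
    p95_latency_ms: float          # measured under realistic load
    latency_budget_ms: float
    peak_db_cpu: float             # 0.0 to 1.0; proxy for saturation (assumption)
    needs_native_hybrid: bool      # assembling hybrid on Postgres costs more than adopting a store
    reindex_starves_oltp: bool     # continuous re-indexing hurts transactional traffic


def migration_triggers(m, vector_threshold=10_000_000, cpu_saturation=0.85):
    """Return the reasons, if any, that justify leaving Postgres."""
    reasons = []
    if m.vector_count > vector_threshold and m.p95_latency_ms > m.latency_budget_ms:
        reasons.append("scale: past ~10M vectors and latency is over budget")
    if m.peak_db_cpu >= cpu_saturation:
        reasons.append("concurrency: vector search contends with transactional traffic")
    if m.needs_native_hybrid:
        reasons.append("hybrid: native hybrid search is cheaper than assembling it")
    if m.reindex_starves_oltp:
        reasons.append("writes: re-indexing starves the relational workload")
    return reasons


# Hypothetical team: two million vectors, comfortable latency, idle database.
today = WorkloadMetrics(2_000_000, 40.0, 100.0, 0.35, False, False)
print(migration_triggers(today) or "stay on pgvector")
```

An empty list means the dedicated store has not earned its place yet. A non-empty list names the metric that justified the move.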
If none of those is true, a dedicated vector database is operational complexity you're buying on spec. Two systems to back up, two to monitor, two failure modes, a sync pipeline keeping them consistent, and a new thing on the on-call rotation. That's a real cost paid every day against a benchmark advantage you may never need.
The wrong-pick failure mode
The expensive mistake is rarely "too slow." It's choosing for a scale you imagine instead of the scale you have. A team picks Milvus because a future deck says billions, spends three months operating a distributed cluster, and ships late with two million vectors that pgvector would have served from a box they already ran. The vector store became the project instead of a component of it.
The other failure mode is filtering discovered in production. The PoC ran unfiltered queries and looked great. Then multi-tenancy arrived, post-filtering silently dropped recall, and users in busy tenants got near-empty results that no benchmark would have caught. Filtering and tenant isolation have to be load-tested with realistic selectivity before launch, against the database you actually chose — not assumed because the demo was fast.
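A load test for this can be as simple as comparing what the system returns against an exact, filtered ground truth, tenant by tenant. The sketch below is a minimal harness under stated assumptions. The corpus, the tenant split, and the `naive_post_filter` function (a stand-in for the system under test) are all invented. In a real test you would replace the stand-in with calls to the database you actually chose.

```python
import math


def exact_filtered_top_k(docs, query, tenant_id, k):
    """Ground truth: brute-force ranking over only this tenant's rows."""
    rows = [d for d in docs if d["tenant_id"] == tenant_id]
    rows.sort(key=lambda d: math.dist(d["vec"], query))
    return [d["id"] for d in rows[:k]]


def recall_at_k(returned_ids, truth_ids):
    """Fraction of the true top-k that the system actually returned."""
    if not truth_ids:
        return 1.0
    return len(set(returned_ids) & set(truth_ids)) / len(truth_ids)


def naive_post_filter(docs, query, tenant_id, k):
    """Stand-in for the system under test: global top-k, then filter."""
    ranked = sorted(docs, key=lambda d: math.dist(d["vec"], query))[:k]
    return [d["id"] for d in ranked if d["tenant_id"] == tenant_id]


def tenant_for(i):
    # Hypothetical split: about 1% "tiny", 10% "mid", and the rest "big".
    if i % 100 == 0:
        return "tiny"
    if i % 10 == 5:
        return "mid"
    return "big"


docs = [{"id": i, "tenant_id": tenant_for(i), "vec": (float(i), 0.0)} for i in range(1000)]
query, k = (0.0, 0.0), 10

report = {}
for tenant in ("tiny", "mid", "big"):
    selectivity = sum(d["tenant_id"] == tenant for d in docs) / len(docs)
    truth = exact_filtered_top_k(docs, query, tenant, k)
    got = naive_post_filter(docs, query, tenant, k)
    report[tenant] = recall_at_k(got, truth)
    print(f"{tenant:>4}: selectivity={selectivity:.0%} recall@{k}={report[tenant]:.2f}")
```

On this toy data the unfiltered-style search looks acceptable for the large tenant and collapses for the selective ones, which is the pattern an unfiltered PoC never surfaces.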
What fixed looks like
The choice is justified by your real numbers — vector count, write pattern, filter selectivity, concurrency, on-call capacity — not by a benchmark someone else ran. Filtering and multi-tenant isolation are load-tested at realistic selectivity and recall holds. The system count matches the team that operates it. If it's pgvector, it's pgvector on purpose, with the migration trigger written down so you know exactly when to move. If it's a dedicated store, it's because Postgres provably ran out, and you can name the metric that proved it.
This is for you if
You're a funded US company building retrieval into a product, you have a real corpus and real users, and the vector store choice is now load-bearing for tenant isolation, latency, and cost. This work is part of a RAG or AI-platform build, typically $50k+; designing the retrieval and storage layer inside a larger product runs $100k+.
It's not for you if you're prototyping — use pgvector or whatever is fastest to stand up and revisit when you have traffic. It's not for you if you've already chosen and it's working; "could be marginally faster on a benchmark" is not a reason to migrate a production data store. And it's not for you if the real problem is retrieval quality, not storage — the wrong chunks come back fast no matter which database returns them.