The dashboard is red. P99 query latency has tripled in a month, the database CPU sits at 85% during business hours, and your strongest engineer just dropped a doc in the channel titled "Sharding Strategy." It has a diagram. The diagram has arrows. Everyone nods.
Stop. Sharding is the most expensive, most irreversible scaling decision you can make, and nine times out of ten the team reaches for it because the database is the obvious thing that's slow, not because they've ruled out the four cheaper fixes sitting above it. A shard is forever. A missing index is a Tuesday. Let's put these in the order they actually belong.
Why "shard it" is the seductive wrong answer
Sharding feels like the grown-up move. The big companies shard. The blog posts are about sharding. It signals that you're operating at scale.
It also permanently changes the shape of every query you will ever write. Once data is split across shards by some key, cross-shard joins become application-level fan-out. Transactions that used to be one BEGIN/COMMIT now span machines and need a distributed coordination story you do not want to own. Unique constraints across the whole dataset stop being free. Every new engineer has to learn your shard topology before they can write a query that touches two entities. You have traded a database problem for a distributed-systems problem, and distributed systems are where six-figure incidents are born.
The honest test before anyone draws a shard diagram: can you name the single query, with its execution plan, that forced this decision? If the answer is "the database is just slow," you don't have a sharding problem. You have a diagnosis problem.
Step zero: find out what is actually slow
Most "the database can't scale" conversations are running on vibes. Before any architecture changes, you need numbers, and Postgres will hand them to you.
Turn on pg_stat_statements and read the top queries by total time — not by latency, by total time, which is calls times mean latency. The query that runs ten thousand times a minute at 8ms is your problem, not the report that runs once an hour at 4 seconds. Pull EXPLAIN (ANALYZE, BUFFERS) on the top offenders. Look at pg_stat_user_tables for sequential scans on large tables. Check pg_stat_activity for lock waits and idle-in-transaction connections eating slots.
Half the time the picture that comes back is embarrassing: one N+1 query in the ORM firing a thousand round-trips per page load, or a missing index turning a 200ms lookup into a full table scan. You do not shard your way out of an N+1. You fix the N+1.
Step one: indexes and query shape
The cheapest scaling lever in existence is a CREATE INDEX CONCURRENTLY that turns a sequential scan into an index scan. It is not glamorous. It is frequently a 100x improvement on the exact query melting your CPU.
Look for the patterns that index away cleanly. A WHERE status = ? AND created_at > ? filter with no composite index. A foreign key with no index on the child side, making every parent delete a table scan. An ORDER BY ... LIMIT that sorts the whole table because the sort column isn't indexed. A query filtering on a jsonb field that could use a GIN index or, better, a promoted column.
Then fix the query shape itself. Batch the N+1 into a single IN query or a join. Add pagination so nobody pulls forty thousand rows to show twenty. Replace SELECT * on a wide table with the columns you actually render. None of this changes your architecture. All of it changes your CPU graph.
This step alone resolves the majority of "we need to scale the database" panics. Spend a real week here before you spend a quarter anywhere else.
Step two: connection pooling
The second-cheapest fix is the one teams skip because it's invisible until it bites. Postgres allocates a real process and a chunk of memory per connection. A few hundred direct connections from a horizontally-scaled app tier will exhaust the server's memory and scheduler long before the data itself is the limit.
Put PgBouncer (or your managed equivalent) in front in transaction-pooling mode. Suddenly two thousand application connections multiplex down onto a few dozen real database connections, and the CPU you thought was a data problem turns out to have been a connection-management problem. Set sane pool sizes, kill the idle-in-transaction connections holding slots hostage, and watch the panic subside.
A surprising number of "the database is dying" incidents are pure connection exhaustion. Cheap to fix. Easy to miss.
Step three: read replicas
Now you've tuned queries, added indexes, and pooled connections, and the database is still hot. Look at the read/write ratio. For almost every B2B SaaS and consumer product, reads dominate writes by ten or twenty to one. That ratio is the entire case for read replicas.
A streaming replica is a near-exact copy of the primary that takes read traffic off it. You route dashboards, search, list views, analytics, and report queries to replicas, and you keep writes plus any read-your-own-writes path on the primary. The primary's CPU drops because it's no longer serving the heavy reads, and you can add more replicas as read load grows. This is horizontal read scaling without touching your data model.
The one thing you must design for, not discover: replication lag. A replica is milliseconds-to-seconds behind. If a user submits a form and the next page reads from a lagging replica, they see stale data and file a bug. The fix is a deliberate routing policy — writes and immediate read-after-write go to the primary, everything else goes to replicas — not a global "send all reads to the replica" switch. We've built this routing layer where the read/write split was load-bearing for performance. How we architect multi-tenant data layers →
Step four: partitioning
Read replicas scale reads. They do nothing for a single table that's grown so large that writes, vacuums, and index maintenance on it are the bottleneck. Before you split data across machines, split it within one machine.
Postgres native partitioning breaks one giant logical table into physical child tables by a key — usually time (created_at by month) for append-heavy event and log data, sometimes a tenant or region key. Queries that filter on the partition key only touch the relevant partitions (partition pruning), indexes stay small per partition, and you can drop a whole month of old data with a DETACH instead of a billion-row DELETE. Vacuum runs per partition instead of fighting the whole table.
This buys you most of the locality benefit people think they're getting from sharding, with none of the distributed-systems tax. It's a schema change, not a topology change. The data is still in one database with one transaction boundary.
Step five: only now, sharding
If — after correct indexes, pooling, replicas, and partitioning — your write throughput or your working-set size genuinely exceeds what one primary can hold, you have earned the right to shard. The signal is specific: the primary's write CPU is saturated, vertical scaling has a ceiling you've hit, and partitioning within one node can't keep the working set in memory.
Now do it deliberately. Choose a shard key that matches how you actually query — usually tenant_id, so a given customer's data lives on one shard and most queries stay single-shard. Accept that cross-shard analytics moves to a separate system. Decide your rebalancing story before you launch, not after a shard fills up. And consider whether a Postgres-compatible distributed layer (Citus and friends) gives you sharding semantics without hand-rolling the coordination logic.
The point isn't that sharding is wrong. It's that sharding chosen as step one, skipping the four cheaper steps, is how teams pay distributed-systems costs for a problem an index would have solved.
What fixed looks like
A database scaling decision made in order, with evidence at each gate:
pg_stat_statementsandEXPLAINdata identifying the actual slow queries, not a vibe.- Indexes and query-shape fixes applied first — the 100x-for-an-afternoon wins taken before anything else.
- Connection pooling in place so connection count isn't masquerading as a data problem.
- Read replicas with a deliberate write/read-after-write routing policy, sized to your read/write ratio.
- Partitioning on the large append-heavy tables before any cross-machine split.
- Sharding reserved for a measured write-throughput or working-set ceiling, with a shard key that matches your query patterns and a rebalancing plan that exists before launch.
Every step you can skip is a step you didn't have to pay for. Most teams can stop at step three.
This is for you if
You're funded, you have real load, and the database is the bottleneck — and someone's already drawing the shard diagram. A database scaling review runs $50k+: we instrument the system, find what's actually slow, and work the cheap fixes before the expensive ones, so you don't take on a distributed-systems liability you didn't need. A full scaling re-architecture — replica routing, partitioning, and a sharding design built to survive growth — runs $100k+.
It's especially for you if you're heading into a growth phase where load is about to multiply, or if vertical scaling has stopped buying you headroom. It is not for you if you have a few thousand rows and a quiet CPU graph — at that stage sharding is a solution to a problem you don't have, and the right move is to keep the schema simple and ship.
// diagnose before you distribute
< transmit >