← Insights
build

Six Schema Mistakes That Cost You a Rewrite at Series A

The schema you sketched in week two is load-bearing by Series A. Six database design mistakes that look harmless at launch and force a data-layer rewrite under growth

The schema you drew on a whiteboard in week two is still running your business at Series A. Nobody revisited it because it worked. Then a migration that ran in 80 milliseconds on your dev tenant takes 40 minutes and locks a table in production, an enterprise prospect asks for an audit trail you can't produce, and you realize the foundation you never looked at is the thing now holding everything up.

A schema mistake doesn't announce itself. It compounds silently until the fix is a data migration on live customer data — the most expensive, highest-risk work in software. Get six of these wrong and you're not refactoring; you're rebuilding the data layer with the lights on. Let's go through them.

1 · Premature denormalization

The mistake: copying fields across tables "for performance" before you have a measured performance problem. The customer's name stored on the order, the product price stored on the line item, the account tier stored on every user row.

The failure mechanism: denormalized data drifts. The customer renames their company; now you have three spellings across orders depending on when they were placed. Worse, you've made every read fast and every write a fan-out — updating one logical fact means touching N rows, and miss one and the data is silently inconsistent. You discover the inconsistency when a customer disputes an invoice and your numbers don't agree with themselves.

The production-grade alternative: normalize first. Reference by foreign key, join at read time, and let Postgres do what it's good at. Denormalize deliberately and later, only against a measured hot path, and only with a clear ownership rule for keeping the copy in sync — a trigger, a materialized view, or an explicit recompute job. There's one honest exception: snapshotting a value that must be historically accurate (the price at time of sale). That's not denormalization, that's recording a fact. Name the difference.

2 · No soft-delete strategy

The mistake: DELETE FROM users WHERE id = ?. The row is gone. The decision to physically delete was made implicitly, by using the default.

The failure mechanism: the row had foreign-key children — orders, comments, audit events — that now either cascade into oblivion or orphan. A customer asks "what happened to this account?" and the answer is gone. A regulator asks for records on a deleted entity and you have nothing. An admin fat-fingers a delete and there's no undo. Then GDPR shows up and asks you to actually delete specific data, and now you can't tell hard-delete-for-compliance apart from delete-by-accident because you only have one mechanism for both.

The production-grade alternative: decide your deletion model on day one. Most tables want soft delete: a deleted_at timestamptz column, a partial index WHERE deleted_at IS NULL, and queries that filter it by default (a view or RLS policy, not hope). Reserve hard deletes for an explicit, audited compliance path. The two are different operations with different authorization — model them that way.

3 · Enums as bare strings vs. lookup tables

The mistake: status text holding 'active', 'Active', 'ACTIVE', 'activ', and 'pending_review' — free text standing in for a constrained set.

The failure mechanism: with no constraint, every typo in every code path becomes a permanent value in your data. Six months in, a SELECT DISTINCT status returns eleven values for what should be four states. Reporting breaks. A WHERE status = 'active' silently misses the 'Active' rows. You can't add behavior per state because you don't actually know the states.

The production-grade alternative: for a small, stable set with no metadata, use a Postgres enum type or a CHECK constraint — the database rejects garbage at write time. For a set that has attributes (a display label, a sort order, an is-terminal flag) or that non-engineers need to manage, use a lookup table with a foreign key. Avoid the trap in the middle: native Postgres enums are genuinely painful to reorder or remove values from in a migration, so if the set will churn, the lookup table wins. Pick based on how often the set changes.

4 · UUID vs. serial, chosen by default

The mistake: reaching for serial/bigserial because it's the tutorial default — or reaching for random uuid v4 because it "feels distributed" — without thinking about either.

The failure mechanism: sequential integer IDs leak business information (a competitor sees invoice/1042 and knows your volume) and become a nightmare to merge across shards or systems. Random UUID v4 primary keys, on the other hand, destroy index locality: because they're random, every insert lands in a random spot in the B-tree, fragmenting the index and tanking write throughput as the table grows — exactly the problem you don't see until the table is large.

The production-grade alternative: use a time-ordered identifier — UUID v7 (or ULID) — as the primary key. You get the non-enumerable, merge-friendly, globally-unique properties of a UUID and the index locality of a sequential key, because the value sorts by creation time. Keep a separate internal bigint only if you have a measured need for it. The "just use serial" and "just use uuid4" instincts are both wrong defaults in 2026.

5 · The missing tenant boundary

The mistake: building multi-tenant on a shared schema where isolation lives in application code — every query remembers to add WHERE tenant_id = ? — and the database has no opinion about it.

The failure mechanism: this is a cross-tenant data leak waiting for one missed clause. A raw query in a reporting job, an ORM edge case, a bulk operation, an admin endpoint written fast under deadline — miss the filter once and Customer A sees Customer B's data. That's not a bug ticket; it's an existential incident with contractual and legal consequences. And there's no structural guard. You're trusting that every query, forever, written by every engineer, remembered the clause.

The production-grade alternative: enforce the tenant boundary in the database. Postgres row-level security policies keyed off a session variable (SET LOCAL app.tenant_id) set at the transaction boundary, so the database — not the application — refuses to return another tenant's rows even when the app forgets. The tenant_id column belongs on every tenant-scoped table from the first migration; retrofitting it across a populated schema is the rewrite this whole article is about. We've built isolation-first data layers where this was non-negotiable. How we approach multi-tenancy →

6 · JSON columns as a crutch

The mistake: metadata jsonb as a junk drawer. Anything you're not sure how to model goes in the blob. Six months later half your business logic reads from inside JSON.

The failure mechanism: JSON columns have no schema, so they have no constraints, no foreign keys, and no guarantees. Two records spell the same key differently. A field that's a string here is a number there. Querying metadata->>'plan_tier' can't use an ordinary index well, can't enforce that the value is valid, and can't join cleanly. You've turned your relational database into a slow, untyped document store — and you find out when a report needs to aggregate on a field buried in the blob and every row is shaped slightly differently.

The production-grade alternative: jsonb is the right tool for genuinely unstructured, caller-defined data — third-party webhook payloads you store verbatim, sparse user-defined custom fields, event bodies. It is the wrong tool for anything your code branches on. If you WHERE, JOIN, ORDER BY, or enforce a rule on a field, promote it to a real column with a real type and a real constraint. The test: does my application read this value to make a decision? If yes, it's a column.

What fixed looks look like

A schema designed so the week-two decisions survive the Series-A load:

  • Normalized by default; denormalize only against measured hot paths with explicit sync ownership.
  • A deliberate deletion model — soft delete with deleted_at and default-filtered reads; hard delete only on an audited compliance path.
  • Constrained states via enum/check or lookup table, never free text.
  • Time-ordered UUIDs (v7/ULID) for keys — non-enumerable and index-friendly.
  • A tenant_id on every tenant-scoped table and database-enforced row-level security from migration one.
  • jsonb reserved for truly unstructured data; anything you branch on is a typed column.

None of this is exotic. It's the difference between a schema that's load-bearing and a schema that's a liability — decided before there are four million rows on top of it.

This is for you if

You're funded, you have real customers writing real data, and the schema underneath them was sketched before any of that existed. This is the $25k+ data-layer review: auditing the foundation and migrating off the mistakes before they become a live-data rewrite — which is the $100k+ version of the same work, done in a panic.

It's especially for you if you're approaching a Series A diligence process, where investors will look at exactly these decisions, or if migrations have started getting scary. It is not for you if you're pre-product-market-fit with a hundred rows and no paying customers — at that stage, ship, keep the schema simple, and come back when the data has weight. Hardening a foundation nobody is standing on yet is premature.

// audit the foundation before it's load-bearing

< transmit >