Inherited a Broken Codebase. Here's What to Do.

You just acquired the company, or the technical co-founder left, or the agency handed over the keys. The codebase is live. Users depend on it. And nobody can explain how it works.

This is a specific kind of problem. The system is running — which means you can't just stop and think. Users are hitting it right now. Some of those users are paying. The on-call rotation, such as it is, falls on a team that inherited the system at the same time you did. And the team is afraid to touch the authentication module, because the last person who touched it caused a 4-hour outage.

What this costs per day

The direct cost is velocity. Every new feature takes twice as long, because half the time goes to figuring out what the change will break elsewhere in the code. Every deploy is a gamble: not because the team is incompetent, but because the behavior of an undocumented system is unknowable until you discover it in production.

The accumulating cost is talent. Engineers who work in systems they don't understand leave. Not immediately — they give it 6 months. Then they leave. The institutional knowledge they built up while working in the broken system leaves with them. The next engineer starts at zero.

The invisible cost is opportunity. You can't build on a foundation you don't understand. Every architectural decision is made in the dark. Some of those decisions will compound the original problems rather than fix them.

We've worked through this exact situation in detail. See how we handled the Legacy Rescue engagement — an inherited codebase with production dependencies, no documentation, and a 90-day window to stabilize before the existing team rolled off.

Read the system in the right order

The instinct is to start at the front-end — the part of the system you can see. That's the wrong order. Start at the data.

First: the data model. The schema is the system's theory of the world. Read every table. Understand the relationships. Look at the migration history: it's a changelog of every structural decision the system has made. The schema tells you what the system was designed to do, what it was extended to do under pressure, and where the seams are. Columns that are consistently null, tables that are never joined to, fields with names that no longer match their content: these are the artifacts of decisions that were made and then unmade without cleaning up. The schema is the most honest part of the system.
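Checking for consistently null columns is faster with a quick profile than by eye. The sketch below uses SQLite only so that it runs anywhere. On PostgreSQL or MySQL you would list tables and columns from information_schema.columns instead. The function name is our own and is not a library API. These are full-table counts, which get expensive on large production tables, so run them against a replica or a restored snapshot, not the primary.

```python
import sqlite3


def profile_null_columns(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Return {table: [columns that are NULL in every row]}.

    Assumes SQLite; other databases expose the same metadata via information_schema.
    """
    findings: dict[str, list[str]] = {}
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        (row_count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        if row_count == 0:
            continue  # An empty table says nothing about which columns are used.
        empty = []
        for column in columns:
            # COUNT(column) counts only non-NULL values.
            (non_null,) = conn.execute(f'SELECT COUNT("{column}") FROM "{table}"').fetchone()
            if non_null == 0:
                empty.append(column)
        if empty:
            findings[table] = empty
    return findings


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, legacy_fax TEXT);
        INSERT INTO users (email) VALUES ('a@example.com'), ('b@example.com');
        CREATE TABLE audit_log (id INTEGER PRIMARY KEY, note TEXT);
        """
    )
    print(profile_null_columns(conn))  # {'users': ['legacy_fax']}
```

A column that shows up here is a question, not a verdict. It may be written by a code path that simply hasn't run yet.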

Second: the authentication and authorization layer. This is the part you cannot break. Understand exactly what constitutes a valid session, what the permission model is, and where the boundaries are. In most inherited codebases, the auth layer was built early and never refactored — which means it may be correct but unmaintainable, or it may have accreted assumptions over time that are no longer valid. Map it before you touch anything.

Third: the API surface. If the system has external consumers — mobile apps, integrations, partners — map every public contract before you change anything. A broken internal module is a bug. A broken API contract is a production incident for someone else.
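One cheap way to map a public contract is to record real responses from the endpoints your consumers call and reduce each one to its shape: field names and types, without the data. The sketch below compares a new response against a recorded shape and reports fields that were removed or changed type. It assumes JSON responses and homogeneous arrays. The helpers shape and contract_diff are hypothetical and do not come from a library. If the system already publishes an OpenAPI or JSON Schema definition, validate against that instead.

```python
import json
from typing import Any


def shape(value: Any) -> Any:
    """Reduce a decoded JSON value to its structure: keys and types, not data."""
    if isinstance(value, dict):
        return {key: shape(item) for key, item in sorted(value.items())}
    if isinstance(value, list):
        # Assumes homogeneous arrays: the first element stands in for the rest.
        return [shape(value[0])] if value else []
    if isinstance(value, bool):  # check before int, since bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if value is None:
        return "null"
    return "string"


def contract_diff(recorded: Any, current: Any, path: str = "$") -> list[str]:
    """List removed fields and changed types; added fields are treated as safe."""
    if isinstance(recorded, dict) and isinstance(current, dict):
        problems: list[str] = []
        for key, expected in recorded.items():
            if key not in current:
                problems.append(f"{path}.{key}: removed")
            else:
                problems.extend(contract_diff(expected, current[key], f"{path}.{key}"))
        return problems
    if isinstance(recorded, list) and isinstance(current, list):
        if recorded and current:
            return contract_diff(recorded[0], current[0], f"{path}[]")
        return []
    if recorded != current:
        return [f"{path}: {recorded} -> {current}"]
    return []


if __name__ == "__main__":
    # In practice the recorded shape is saved from a known-good production response.
    recorded = shape(json.loads('{"id": 7, "email": "a@example.com", "tags": ["vip"]}'))
    current = shape(json.loads('{"id": "7", "tags": ["vip"], "plan": "pro"}'))
    for problem in contract_diff(recorded, current):
        print(problem)  # "$.email: removed", then "$.id: number -> string"
```

New fields are deliberately not flagged, because most clients ignore keys they don't know. Removals and type changes are the changes that break someone else's production.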

Fourth: the front-end. By the time you read the front-end, you already know what the system does. The front-end tells you how it's been used, what the UX assumptions are, and where the gaps between the data model and the UI have been bridged with application logic that probably shouldn't exist.

The triage framework: bomb vs. ugly

Not everything that looks broken is dangerous. The most important skill in working with an inherited codebase is distinguishing between code that is merely ugly and code that is a bomb.

A bomb: an authentication bypass that exists for a specific customer and is documented nowhere. A schema migration that was run in production but never committed to the repo. A background job that silently fails and has been silently failing for six months, accumulating a backlog that will eventually be impossible to drain. A rate limiter that was disabled "temporarily" 18 months ago and never re-enabled. A credential hardcoded in the source, and therefore present in the repository's history and in every clone of it.

Ugly: naming conventions that are inconsistent. Functions that are too long. Code that was copy-pasted instead of abstracted. Test coverage that is technically present but tests the wrong things. Architectural patterns that you wouldn't have chosen.

The bomb list gets addressed immediately, on a priority timeline, with a defined remediation window. The ugly list gets addressed as you touch each file, over time, as part of normal development. Never prioritize the ugly list over the bomb list. Never let the ugly list become an excuse to avoid the bomb list.
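Some bombs can be found mechanically. A hardcoded credential is the easiest case: a pattern scan over the working tree turns up candidates in minutes. The patterns below are illustrative and deliberately few. Dedicated scanners such as gitleaks or trufflehog cover far more credential formats and also search git history. That history search matters, because a secret deleted from the code still exists in every clone. The function names here are hypothetical. Treat every hit as compromised and rotate it.

```python
import re
from pathlib import Path

# Illustrative patterns only; dedicated scanners ship far more of these.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "assigned_secret": re.compile(
        r"""(?i)(?:password|passwd|secret|api_key|token)\w*\s*[:=]\s*['"][^'"\s]{8,}['"]"""
    ),
}
SOURCE_SUFFIXES = (".py", ".js", ".ts", ".rb", ".yml", ".yaml", ".json", ".ini", ".cfg")


def scan_text(text: str, source: str = "<text>") -> list[tuple[str, int, str]]:
    """Return (source, line number, pattern name) for each suspicious line."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((source, number, name))
    return hits


def scan_tree(root: str) -> list[tuple[str, int, str]]:
    """Scan files in the working tree under `root`. This does NOT search git history."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if ".git" in path.parts or not path.is_file():
            continue
        if path.suffix in SOURCE_SUFFIXES or path.name.startswith(".env"):
            text = path.read_text(encoding="utf-8", errors="ignore")
            hits.extend(scan_text(text, str(path)))
    return hits


if __name__ == "__main__":
    for source, line, kind in scan_tree("."):
        print(f"{source}:{line}: possible {kind}")
```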

The three things to fix immediately

Security surface. Audit every externally accessible endpoint. Check that every route that should require authentication actually does. Verify that session management is correct: sessions expire, are invalidated on logout, and can't be forged. If there are credentials in the codebase, rotate them and move them out of the repo. Deleting them from the code isn't enough, because they remain in the git history. If there is a rate limiter that's disabled, re-enable it. If there are admin routes with no access control, that's a priority-one incident.
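Once you have a route inventory, checking it against these rules is mechanical. Most frameworks can enumerate their routes; Flask, for example, exposes them through app.url_map. In this sketch the Route type, the public allowlist, the /admin prefix, and the "admin" role name are all assumptions. Substitute whatever your system actually uses. The most important design choice is the allowlist. Because public routes have to be declared explicitly, a route nobody thought about shows up as a finding instead of passing silently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Route:
    """One entry in a route inventory (hypothetical shape; adapt to your framework)."""
    method: str
    path: str
    requires_auth: bool
    # Roles allowed to call the route; empty means "any authenticated user".
    roles: frozenset[str] = field(default_factory=frozenset)


# Assumption: routes that are intentionally public. Everything else must require auth.
PUBLIC_ALLOWLIST = {("GET", "/"), ("GET", "/health"), ("POST", "/login")}


def audit_routes(routes: list[Route]) -> list[str]:
    """Flag unauthenticated routes not on the allowlist and admin routes without a role check."""
    findings = []
    for route in routes:
        if not route.requires_auth and (route.method, route.path) not in PUBLIC_ALLOWLIST:
            findings.append(f"UNAUTHENTICATED {route.method} {route.path}")
        # Assumption: admin functionality lives under /admin and needs the "admin" role.
        is_admin = route.path == "/admin" or route.path.startswith("/admin/")
        if is_admin and "admin" not in route.roles:
            findings.append(f"P1 ADMIN WITHOUT ROLE CHECK {route.method} {route.path}")
    return findings


if __name__ == "__main__":
    inventory = [
        Route("GET", "/health", False),
        Route("GET", "/api/export", False),
        Route("POST", "/admin/users", True),
        Route("DELETE", "/admin/users", True, frozenset({"admin"})),
    ]
    for finding in audit_routes(inventory):
        print(finding)
```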

Observability. You cannot stabilize a system you cannot observe. If the system doesn't have error tracking, add it before you touch anything else. If there's no logging in production, add structured logging. If there's no alerting, set up basic alerting on error rate and latency. You need to be able to tell the difference between "the system is behaving as it always has" and "I just broke something."
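If the system is written in Python and logs nothing useful, the standard library is enough to start emitting structured logs today. This is a minimal sketch. JsonFormatter writes one JSON object per line. Passing context through extra={"fields": ...} is a convention of this sketch, not a logging standard. A library such as structlog, or your error tracker's SDK, will handle this more thoroughly.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Convention of this sketch: structured context arrives via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


def configure_logging(level: int = logging.INFO) -> None:
    """Send all logs to stdout as JSON lines, replacing any existing root handlers."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)


if __name__ == "__main__":
    configure_logging()
    log = logging.getLogger("checkout")
    log.info("order placed", extra={"fields": {"order_id": 1234, "duration_ms": 87}})
    try:
        1 / 0
    except ZeroDivisionError:
        log.exception("payment capture failed", extra={"fields": {"order_id": 1234}})
```

With one JSON object per line, your log pipeline can count errors per logger or filter on order_id. That capability is the foundation for the error-rate alerting described above.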

Deployment pipeline. If the deployment process is not reproducible, documented, and safe to run, fix that before you deploy any changes. This usually means three things: a CI pipeline that runs the tests before deploy, a staging environment that mirrors production, and a rollback procedure that's been tested. If the deploy process is currently "SSH into the server and git pull," that has to change before any other change goes out.
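A tested rollback procedure also needs a rule for when to use it. This sketch assumes you can pull request and error counts for the window before and after a deploy from your observability tooling. The thresholds are placeholders, not recommendations: twice the baseline error rate, a 1% floor, and a 500-request minimum. The Window type and deploy_verdict function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Request and error counts for one observation window."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def deploy_verdict(
    before: Window,
    after: Window,
    *,
    max_ratio: float = 2.0,  # placeholder: tolerate up to 2x the baseline error rate
    floor: float = 0.01,  # placeholder: ignore error rates below 1%
    min_requests: int = 500,  # placeholder: minimum post-deploy traffic to judge
) -> str:
    """Return "wait", "rollback", or "keep" for a deploy."""
    if after.requests < min_requests:
        return "wait"  # too little traffic to tell a regression from noise
    if after.error_rate > floor and after.error_rate > max_ratio * before.error_rate:
        return "rollback"
    return "keep"


if __name__ == "__main__":
    baseline = Window(requests=10_000, errors=20)  # 0.2% errors before the deploy
    print(deploy_verdict(baseline, Window(requests=2_000, errors=100)))  # rollback
```

The "wait" verdict exists because a few minutes of low traffic can't distinguish a broken deploy from noise. Rolling back on too little evidence erodes trust in the pipeline just as quickly as failing to roll back does.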

The five things to leave alone — for now

The temptation in an inherited codebase is to fix everything at once. That is how you create a new incident.

Leave alone: the database structure (until you understand it completely, schema changes are high-risk), the authentication layer (until you've read it completely and mapped all its dependencies), any integration with an external system that's currently working (even if it looks wrong, if it's working, understand it before you change it), the background job system (understand the failure modes before you change anything), and the deployment configuration (until you've traced every environment variable and understood what each one does).
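For the deployment configuration, a first pass at tracing every environment variable is a grep with a little structure. This sketch assumes a Python or JavaScript codebase that reads variables through os.environ, os.getenv, or process.env. The helper names are hypothetical. The scan misses dynamic lookups such as os.environ[name] and configuration libraries that read the environment implicitly, so treat its output as a starting list. Then compare that list with the variables actually set in each environment.

```python
import re
from collections import defaultdict
from pathlib import Path

# Literal-name lookups only; dynamic lookups like os.environ[name] are not detected.
ENV_PATTERNS = [
    re.compile(r"""os\.environ\[\s*['"]([A-Za-z_][A-Za-z0-9_]*)['"]\s*\]"""),
    re.compile(r"""os\.(?:environ\.get|getenv)\(\s*['"]([A-Za-z_][A-Za-z0-9_]*)['"]"""),
    re.compile(r"process\.env\.([A-Za-z_][A-Za-z0-9_]*)"),
    re.compile(r"""process\.env\[\s*['"]([A-Za-z_][A-Za-z0-9_]*)['"]\s*\]"""),
]


def env_vars_in(text: str) -> set[str]:
    """Return the environment variable names a source file reads by literal name."""
    names: set[str] = set()
    for pattern in ENV_PATTERNS:
        names.update(pattern.findall(text))
    return names


def inventory(root: str, suffixes: tuple[str, ...] = (".py", ".js", ".ts")) -> dict[str, list[str]]:
    """Map each variable name to the files under `root` that read it."""
    where: dict[str, list[str]] = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for name in sorted(env_vars_in(text)):
                where[name].append(str(path))
    return dict(where)


def config_gaps(read: set[str], configured: set[str]) -> tuple[set[str], set[str]]:
    """Return (set but never read, read but never set) for one environment."""
    return configured - read, read - configured


if __name__ == "__main__":
    for name, files in sorted(inventory(".").items()):
        print(name, "->", ", ".join(files))
```

Every variable in either half of config_gaps deserves a question. A variable that's set but never read may be dead, or it may be read somewhere the scan can't see.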

"Leave alone" doesn't mean ignore. It means: understand before you change. The cost of a wrong change in a system you don't understand is a production incident plus the time to diagnose it in a system you still don't understand.

Building a map of a system nobody documented

Documentation doesn't appear. You build it.

Start with a running glossary of the domain model. As you read the codebase, write down every entity, what it means, and how it relates to other entities. This glossary will be wrong initially. Correct it as you go. In six weeks it becomes the foundation of every architecture conversation your team has.

Add comments at the point of discovery. When you figure out why a piece of code works the way it does — especially when the answer is "it was built this way because of a specific constraint or decision" — write that down in a comment or a decision record. The future engineer who reads it will thank you, and that future engineer may be you in six months.

Track the blast radius of every module you understand. Build a rough dependency map: this module is called by these three things, it calls these two things, it touches these tables. The map doesn't have to be comprehensive on day one. It gets more complete over time. It's the tool that answers "if I change X, what breaks?"
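Even a hand-maintained map can answer "if I change X, what breaks?" once it's in a form you can query. Here is a minimal sketch. Record edges from caller to callee, include tables as nodes, and walk the edges in reverse. The "table:" prefix is only a convention of this example, and the module names are made up. A map like this knows only what you've written into it. Dynamic dispatch, queues, and cron jobs won't appear unless you add them.

```python
from collections import deque


def blast_radius(calls: dict[str, set[str]], changed: str) -> list[str]:
    """Return every node that directly or transitively depends on `changed`.

    `calls` maps each module to what it calls or touches (caller -> callees).
    Results are in breadth-first order: direct dependents first.
    """
    # Invert the edges so we can walk from a node to the things that use it.
    callers: dict[str, set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    seen = {changed}
    queue = deque([changed])
    affected: list[str] = []
    while queue:
        node = queue.popleft()
        for caller in sorted(callers.get(node, set())):
            if caller not in seen:
                seen.add(caller)
                affected.append(caller)
                queue.append(caller)
    return affected


if __name__ == "__main__":
    # Hypothetical map; "table:" nodes are database tables.
    calls = {
        "web.checkout": {"billing", "auth"},
        "billing": {"table:invoices", "payments_client"},
        "reports.monthly": {"table:invoices"},
        "admin.panel": {"reports.monthly", "auth"},
    }
    print(blast_radius(calls, "table:invoices"))
    # ['billing', 'reports.monthly', 'web.checkout', 'admin.panel']
```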

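Run git blame on every file you touch. The history of a file explains why it looks the way it does. git blame shows only the last commit to touch each line; git log -L <start>,<end>:<file> walks back through every commit that changed a range of lines. "Why is this variable named temp2?" is usually answered by reading the handful of commits that modified that line.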

The 90-day plan for making it yours

Days 1–30 are comprehension and triage. No features. No refactoring. Read the system, build the map, fix the bombs, add observability. At the end of this phase you should be able to answer: what does this system do, what are the 5 most dangerous parts, and what would it take to add a specific new feature?

Days 31–60 are stabilization. Address the security surface if it wasn't addressed in phase one. Improve the test coverage on the core flows. Build out the deployment pipeline if it isn't already solid. Begin the ugly-list cleanup on the files you've touched most frequently. Start writing the documentation that your team now wishes existed.

Days 61–90 are cautious development. Pick one well-understood module and make a real improvement. Ship it. Observe the behavior in production. Learn how the system behaves under change. Build confidence in your understanding by seeing predictions confirmed.

At 90 days, the codebase is not fixed. It is understood. That is the real milestone. Understanding is what makes everything that comes after possible.

This is for you if

You're a CTO or engineering lead who has inherited a live production system — through acquisition, co-founder departure, or agency handoff — and you need a stabilization plan that doesn't break production while you implement it.

Engagements of this type run $50k–$150k depending on the complexity of the system and the scope of the stabilization work. This is not a consulting retainer to answer questions. It's a structured program: audit, triage, stabilization, documentation, and knowledge transfer.

This is not for companies that want a full rewrite on day one. If the system is live and users depend on it, the first obligation is to understand it and stabilize it. A rewrite may be the right answer at the end of that process — see our thinking on rewrite vs. refactor — but it's not the answer on day one. Systems that are rewritten before they're understood are rebuilt with the same assumptions that caused the original problems.