GDPR Data Architecture for US Companies Serving EU Users

You're a US company. You have EU users. If you offer goods or services to people in the EU, or monitor their behavior, GDPR applies to you under Article 3(2), and no amount of being headquartered in Delaware changes it. GDPR's reach depends on whose data you process, not where your servers sit. If you're signing up users in the EU, assume you're regulated.

Most US teams treat this as a legal problem. They get a privacy policy, add a cookie banner, and consider it handled. Then a user emails "delete all my data" and the engineering team discovers the request is architecturally impossible to satisfy, because the data is scattered across a primary database, three analytics tools, a data warehouse, an append-only event log, and four years of backups. The legal obligation was a one-line email. The engineering obligation is a quarter of work nobody scoped.

This is a decision document. It covers the architectural choices that determine whether GDPR is a configuration or a rewrite — and what each one costs if you defer it.

The framing that matters

GDPR is not a feature you add. It's a set of properties your data architecture either has or doesn't: you can find every piece of a person's data, you can delete it, you can export it, you can prove they consented, and you can account for where it physically lives. Naive architectures have none of these properties by default. Retrofitting them is the expensive part.

The regulation gives users rights. Each right maps to an architectural capability. The fine for failing to honor them is real: up to €20M or 4% of global annual turnover, whichever is higher. The more common cost is the enterprise deal that stalls because the buyer's privacy team asked how you handle a data subject request and your answer was a shrug.

Right to erasure breaks naive append-only

This is the decision with the deepest architectural consequences, so it goes first.

The right to erasure ("right to be forgotten") requires that, on request, you delete a person's personal data, subject to a few exceptions such as data you are legally required to retain. Sounds simple. It is catastrophic for an architecture pattern many teams adopt for otherwise sound reasons.

Event sourcing and append-only logs are popular because immutability is a virtue — you never lose history, you can replay state, you have a perfect audit trail. GDPR erasure is the direct enemy of that. You cannot have an immutable record of a person's data and also delete that person's data on demand. The two requirements collide head-on.

The naive append-only design says: every event is permanent. Erasure says: this person's events must go. You can't reconcile those by deleting events without corrupting the log's integrity and breaking every downstream projection that assumed events never disappear.

The architecture that survives this separates personal data from the event stream. Events store a reference (a subject_id), not the personal data itself. The personal data lives in a separate store keyed by that subject_id. To erase a person, you either delete their record from that store or crypto-shred it: encrypt each subject's data under a per-subject key, and "delete" by destroying the key. Either way the event log keeps its structural integrity, with the personal fields now pointing at nothing. With crypto-shredding the ciphertext can stay, including in old backups, provided the key itself is not also preserved there; without the key it's noise.
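A minimal sketch of that split, in Python, may make the mechanics concrete. The names SubjectVault, put, get, and erase are hypothetical, and the keystream function is a toy stand-in built from SHA-256 so the example runs on the standard library alone. A real system would use a vetted authenticated cipher such as AES-GCM and a managed key store.

```python
import hashlib
import secrets
from typing import Dict, Optional, Tuple


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR data with SHA-256(key || nonce || offset) blocks.

    Illustration only. Use a vetted AEAD (for example AES-GCM) in production.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


class SubjectVault:
    """Hypothetical store: one key per subject, plus data encrypted under it."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}
        # subject_id -> (nonce, ciphertext)
        self._records: Dict[str, Tuple[bytes, bytes]] = {}

    def put(self, subject_id: str, personal_data: str) -> None:
        key = self._keys.setdefault(subject_id, secrets.token_bytes(32))
        nonce = secrets.token_bytes(16)
        ciphertext = _keystream_xor(key, nonce, personal_data.encode())
        self._records[subject_id] = (nonce, ciphertext)

    def get(self, subject_id: str) -> Optional[str]:
        key = self._keys.get(subject_id)
        if key is None or subject_id not in self._records:
            return None  # key destroyed or no record: nothing recoverable
        nonce, ciphertext = self._records[subject_id]
        return _keystream_xor(key, nonce, ciphertext).decode()

    def erase(self, subject_id: str) -> None:
        """Crypto-shred: destroy the key and leave the ciphertext in place."""
        self._keys.pop(subject_id, None)


# The event log carries only references, never the personal data itself.
event_log = [
    {"type": "UserSignedUp", "subject_id": "subj-1"},
    {"type": "OrderPlaced", "subject_id": "subj-1", "order_id": 1001},
]

vault = SubjectVault()
vault.put("subj-1", "Ada Example <ada@example.com>")
before = vault.get("subj-1")   # readable while the key exists
vault.erase("subj-1")
after = vault.get("subj-1")    # None: the ciphertext remains but is unreadable
```

The event log is never touched by the erasure, so projections that replay it keep working; they only need to tolerate a subject_id that no longer resolves to anything.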

Decide this before you build the event store. Adding it after means rewriting the store and reprocessing every projection.

Data residency

EU users increasingly expect — and some enterprise contracts require — that their data is stored in the EU. The architectural question is whether your system can pin a user's data to a region.

The single-global-database design can't. Every user's data lands in us-east-1 and there's no seam to split it. Designing for residency means the user's home region is a property of the user, set at signup, and the data layer routes reads and writes for that user to storage in their region. You don't have to build full multi-region on day one. You have to build the seam: a region attribute on the tenant or user, and a data access layer that respects it, so that turning on an EU region later is a deployment, not a redesign.
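The seam can be quite small. The following Python sketch assumes a hypothetical User with a home_region attribute and a hypothetical RegionRouter that every read and write goes through; plain dictionaries stand in for the regional storage backends.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class User:
    user_id: str
    home_region: str  # assumed to be set once at signup, e.g. "us" or "eu"


class RegionRouter:
    """Hypothetical data access layer that respects the user's home region."""

    def __init__(self, stores: Dict[str, Dict[Any, Any]]) -> None:
        # region name -> storage backend (in-memory dicts in this sketch)
        self._stores = stores

    def _store_for(self, user: User) -> Dict[Any, Any]:
        if user.home_region not in self._stores:
            raise LookupError(
                f"no storage configured for region {user.home_region!r}"
            )
        return self._stores[user.home_region]

    def write(self, user: User, key: str, value: Any) -> None:
        self._store_for(user)[(user.user_id, key)] = value

    def read(self, user: User, key: str) -> Optional[Any]:
        return self._store_for(user).get((user.user_id, key))


us_store: Dict[Any, Any] = {}
eu_store: Dict[Any, Any] = {}
router = RegionRouter({"us": us_store, "eu": eu_store})

alice = User("u-1", "eu")
bob = User("u-2", "us")
router.write(alice, "email", "alice@example.com")
router.write(bob, "email", "bob@example.com")
```

Because application code only ever talks to the router, adding a region is a change to the mapping passed into it rather than a change to every call site. Moving data that already sits in the wrong region is still a migration, which this sketch does not cover.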

We go deeper on the mechanics in data residency and encryption strategy. For GDPR purposes the point is narrower: if you can't say where a given user's data physically lives, you can't answer the residency question an EU enterprise buyer will ask.

Consent management

GDPR requires a lawful basis for processing, and for many purposes — marketing, non-essential cookies, certain analytics — that basis is consent. Consent must be freely given, specific, informed, and revocable, and you must be able to prove you have it.

The architecture requirement is a consent record per user per purpose, with a timestamp and the version of the policy they agreed to, and a processing layer that actually checks it. The failure mode is a single marketing_opt_in boolean that can't distinguish purposes, can't prove when consent was given, and isn't checked at the point where data is used. When a user revokes consent for analytics but not email, your system needs to honor that split. A boolean can't.

Consent also has to gate processing. A consent record nobody checks before firing a tracking pixel is theater. The processing code must read the consent state and branch on it.
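As a rough illustration of both halves, the record and the gate, here is a Python sketch. ConsentEvent, ConsentLedger, and track_page_view are hypothetical names, the purposes are example strings, and an in-memory list stands in for durable storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class ConsentEvent:
    user_id: str
    purpose: str          # e.g. "analytics", "marketing_email"
    granted: bool         # False records a revocation
    policy_version: str   # the policy text the user actually saw
    recorded_at: datetime


class ConsentLedger:
    """Append-only history of grants and revocations, per user and purpose."""

    def __init__(self) -> None:
        self._events: List[ConsentEvent] = []

    def record(self, user_id: str, purpose: str, granted: bool,
               policy_version: str) -> ConsentEvent:
        event = ConsentEvent(user_id, purpose, granted, policy_version,
                             datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def latest(self, user_id: str, purpose: str) -> Optional[ConsentEvent]:
        for event in reversed(self._events):
            if event.user_id == user_id and event.purpose == purpose:
                return event
        return None

    def allows(self, user_id: str, purpose: str) -> bool:
        event = self.latest(user_id, purpose)
        # No record at all is treated as no consent.
        return event is not None and event.granted


def track_page_view(ledger: ConsentLedger, user_id: str, page: str,
                    sink: List[dict]) -> bool:
    """Emit an analytics event only if the user currently consents to it."""
    if not ledger.allows(user_id, "analytics"):
        return False
    sink.append({"user_id": user_id, "page": page})
    return True


ledger = ConsentLedger()
ledger.record("u-1", "analytics", True, "policy-v3")
ledger.record("u-1", "marketing_email", True, "policy-v3")
ledger.record("u-1", "analytics", False, "policy-v3")  # revokes analytics only

analytics_sink: List[dict] = []
sent = track_page_view(ledger, "u-1", "/pricing", analytics_sink)
```

Keeping every grant and revocation, rather than overwriting a flag, is what lets you show when consent was given and under which policy version. The check inside track_page_view is the part a boolean column tends to leave out.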

Data Processing Agreements and the sub-processor chain

Every vendor that touches EU personal data on your behalf is a processor, and you need a Data Processing Agreement with each one. This is mostly a legal and operational task, but it has an architectural tail: you can't sign DPAs for vendors you don't know you're using. Teams discover during a customer's vendor review that personal data is flowing to a logging service, an email provider, a session-replay tool, and an analytics pipeline nobody mapped. The architecture decision is to make data flows explicit and inventoried — a data flow map you maintain — so the processor list is knowable.
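A data flow map does not need special tooling to be useful. One possible shape, sketched in Python with made-up service and vendor names, is a checked-in list of flows from which the processor list and the DPA gaps are derived rather than remembered.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class DataFlow:
    source: str                       # internal system sending the data
    processor: str                    # external vendor receiving it
    data_categories: Tuple[str, ...]  # kinds of personal data in the flow
    dpa_signed: bool


# Hypothetical inventory; every name here is illustrative.
DATA_FLOWS = [
    DataFlow("web-app", "LogVendor", ("ip_address", "user_id"), True),
    DataFlow("web-app", "ReplayVendor", ("session_recording",), False),
    DataFlow("billing-service", "MailVendor", ("email", "name"), True),
    DataFlow("api", "LogVendor", ("user_id",), True),
]


def processor_list(flows: Iterable[DataFlow]) -> List[str]:
    """Every distinct vendor that receives personal data."""
    return sorted({flow.processor for flow in flows})


def processors_missing_dpa(flows: Iterable[DataFlow]) -> List[str]:
    """Vendors with at least one flow not covered by a signed DPA."""
    return sorted({flow.processor for flow in flows if not flow.dpa_signed})


all_processors = processor_list(DATA_FLOWS)
gaps = processors_missing_dpa(DATA_FLOWS)
```

Keeping the inventory next to the code means a new vendor integration and its entry in the map can be reviewed in the same change.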

Cross-border transfer and SCCs

Moving EU personal data to the US is a restricted transfer. With Safe Harbor and Privacy Shield both invalidated by the EU's Court of Justice (in 2015 and 2020) and the EU-US Data Privacy Framework adopted in 2023, the practical mechanisms are the DPF (if your US entity is certified) and Standard Contractual Clauses, often paired with a transfer impact assessment.

Architecturally, this is where residency pays off again. If EU user data stays in the EU, you've minimized the transfers you have to paper over. Every flow that ships EU personal data to a US system is a transfer you have to justify with SCCs and assess for risk. The fewer of those, the smaller your legal and compliance surface. Architecture reduces the legal burden directly.

The cost of bolting it on later

Retrofit GDPR onto a system that wasn't designed for it and here's the bill. Erasure becomes a custom script that hunts personal data across every store and every backup, run by hand, with no guarantee it's complete, and incompleteness is a violation. Residency becomes a migration of an entire database to a new region with a multi-region rewrite behind it. Consent becomes a backfill where you can't prove consent for existing users, so you re-collect it from your whole base. Data mapping becomes an archaeology project. I'd put the total at three to six months of a team's engineering time, spent retrofitting what would have been a handful of design decisions at the start.

What fixed looks like

A data subject request — access, export, or erasure — is a function you can run, not a project you scope. Personal data is separable from your event history, so erasure shreds keys without corrupting the log. Each user has a home region and the data layer respects it. Consent is recorded per purpose, versioned, timestamped, and checked before processing. Your data flow map names every processor, so your DPA list is complete and your transfer assessments are bounded. When an EU enterprise buyer's privacy team sends their questionnaire, you answer it from architecture, not from hope.
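One way to read "a function you can run" is a registry in which every store that holds personal data contributes an export handler and an erase handler. The Python sketch below assumes that design; SubjectRequestRunner and the two in-memory stores are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Exporter = Callable[[str], List[dict]]  # subject_id -> records held
Eraser = Callable[[str], int]           # subject_id -> records removed


class SubjectRequestRunner:
    """Hypothetical fan-out of a data subject request to registered stores."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Tuple[Exporter, Eraser]] = {}

    def register(self, store_name: str, export_fn: Exporter,
                 erase_fn: Eraser) -> None:
        self._handlers[store_name] = (export_fn, erase_fn)

    def export(self, subject_id: str) -> Dict[str, List[dict]]:
        return {name: export_fn(subject_id)
                for name, (export_fn, _) in self._handlers.items()}

    def erase(self, subject_id: str) -> Dict[str, int]:
        return {name: erase_fn(subject_id)
                for name, (_, erase_fn) in self._handlers.items()}


# Two in-memory stand-ins for real stores.
profiles = {
    "subj-1": {"email": "ada@example.com"},
    "subj-2": {"email": "bob@example.com"},
}
analytics = [
    {"subject_id": "subj-1", "page": "/pricing"},
    {"subject_id": "subj-2", "page": "/docs"},
    {"subject_id": "subj-1", "page": "/signup"},
]


def export_profile(subject_id: str) -> List[dict]:
    return [profiles[subject_id]] if subject_id in profiles else []


def erase_profile(subject_id: str) -> int:
    return 1 if profiles.pop(subject_id, None) is not None else 0


def export_analytics(subject_id: str) -> List[dict]:
    return [row for row in analytics if row["subject_id"] == subject_id]


def erase_analytics(subject_id: str) -> int:
    kept = [row for row in analytics if row["subject_id"] != subject_id]
    removed = len(analytics) - len(kept)
    analytics[:] = kept  # replace contents in place
    return removed


runner = SubjectRequestRunner()
runner.register("profiles", export_profile, erase_profile)
runner.register("analytics", export_analytics, erase_analytics)

exported = runner.export("subj-1")
erased = runner.erase("subj-1")
remaining = runner.export("subj-1")
```

The per-store counts returned by erase double as a record of what was removed and where. A store that never registered a handler is a store the request silently misses, which is why the registry and the data flow map should agree.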

This is for you if

You're a funded US founder with EU users, or about to have them, and you're at or near the architecture stage. You'd rather design erasure, residency, and consent into the data model now than discover, while filling in a 2,000-line audit response, that your append-only log made your legal obligations physically impossible.

This is part of a larger build engagement, typically $100k+, where GDPR properties are designed into the data architecture. It is not standalone privacy consulting and it does not replace your counsel — the lawful-basis and contract decisions are theirs; making the architecture able to honor them is ours.

It's not for you if you have no EU users and no plan to get them, or if you want a compliance checklist rather than a data architecture that can actually satisfy the requests the checklist implies.