Web3 Backend Indexing: The Graph vs a Custom Indexer

Your app reads chain state directly and it's slow, fragile, and lies during re-orgs. The fix is an indexing layer. Here's how to choose and build one.

Your frontend asks the chain a question and waits. To show a user their position, you call the contract, then call it again for the next field, then loop over an array of token IDs making one RPC call each. The page takes four seconds to load. Under any real traffic it takes longer, because you're hammering an RPC endpoint that rate-limits you, and when the endpoint blinks, your app shows nothing. Then a re-org happens and the data you so carefully fetched turns out to have been from a block that no longer exists.

Reading chain state directly from your application is the architecture you start with and the one that breaks first. The blockchain is an append-only event log optimized for consensus, not a database optimized for your queries. Asking it "show me every transfer involving this user, sorted by time, with running balances" is asking the wrong system the wrong way. The fix is an indexing layer: a process that consumes the chain's events, transforms them into a query-shaped database, and serves your app from there. The only real question is whether you run that layer yourself or rent it.

What an indexer actually does

An indexer subscribes to the chain, decodes the events your contracts emit, and writes them into a normal database — Postgres, usually — in a shape your application can query in milliseconds. Your frontend stops talking to the chain and starts talking to your indexed database. Page loads drop from seconds to tens of milliseconds. Complex queries that were impossible against raw chain state — joins, aggregations, sorts, full-text search — become ordinary SQL.
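As a rough illustration of the decode step, the sketch below turns one raw ERC-20 Transfer log into a database-ready row using only built-in TypeScript. The RawLog and TransferRow shapes and the function names are assumptions made for this example, not part of any library, and a production indexer would normally lean on an ABI decoder rather than slicing hex by hand.

```typescript
// keccak256("Transfer(address,address,uint256)"): topic0 of every ERC-20 Transfer log.
const TRANSFER_TOPIC =
  "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef";

// Hypothetical, already-normalized log shape. Raw JSON-RPC responses
// return blockNumber and logIndex as hex strings.
interface RawLog {
  address: string;
  topics: string[];
  data: string;
  blockNumber: number;
  transactionHash: string;
  logIndex: number;
}

// Hypothetical row shape for a `transfers` table.
interface TransferRow {
  id: string;
  token: string;
  fromAddress: string;
  toAddress: string;
  amount: string; // decimal string, suitable for a NUMERIC column
  blockNumber: number;
}

// Indexed address parameters are left-padded to 32 bytes;
// the address itself is the last 20 bytes (40 hex characters).
function topicToAddress(topic: string): string {
  return "0x" + topic.slice(-40).toLowerCase();
}

// Returns a row for an ERC-20 Transfer log, or null for anything else.
function decodeTransfer(log: RawLog): TransferRow | null {
  // ERC-721 shares topic0 but also indexes tokenId (4 topics), so require exactly 3.
  if (log.topics.length !== 3) return null;
  if (log.topics[0].toLowerCase() !== TRANSFER_TOPIC) return null;
  return {
    id: log.transactionHash + "-" + log.logIndex, // unique per event
    token: log.address.toLowerCase(),
    fromAddress: topicToAddress(log.topics[1]),
    toAddress: topicToAddress(log.topics[2]),
    amount: BigInt(log.data).toString(), // the single uint256 in the data field
    blockNumber: log.blockNumber,
  };
}
```

Each returned row maps to one INSERT, and everything the frontend later asks for (sorting, filtering by user, aggregating) becomes a query over that table instead of a series of RPC calls.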

The indexer is the boundary between two worlds. On one side, the chain: eventually-consistent, re-org-prone, append-only, slow to query. On the other, your app: needs fast, rich, consistent reads. The indexer absorbs the chain's awkwardness so your application doesn't have to. Build it well and the rest of your stack gets to pretend the blockchain is just another well-behaved data source.

The Graph: rent the indexing layer

The Graph is the default and for good reason. You write a subgraph — a manifest declaring which contracts and events to watch, a GraphQL schema describing your entities, and mapping handlers in AssemblyScript that turn events into entity writes. You deploy it, and you query a GraphQL endpoint. The indexing infrastructure, the syncing, the database — handled.

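// A mapping handler: runs for every Transfer event the subgraph sees.
// The import paths assume `graph codegen` output for a data source named "Token"; adjust them to your manifest.
import { Transfer as TransferEvent } from "../generated/Token/Token";
import { Transfer } from "../generated/schema";

export function handleTransfer(event: TransferEvent): void {
  // Entity ID = transaction hash + log index, so it stays unique when one transaction emits several events.
  let transfer = new Transfer(event.transaction.hash.concatI32(event.logIndex.toI32()));
  transfer.from = event.params.from;
  transfer.to = event.params.to;
  transfer.amount = event.params.value;
  transfer.blockNumber = event.block.number;
  transfer.save();
}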

What you get: fast time-to-first-query, a GraphQL API you didn't build, and a well-trodden path with a large community and good docs. Re-org handling comes built in — the indexer tracks block hashes and rolls back affected entities when the chain reorganizes, which is a genuinely hard thing you'd otherwise have to get right yourself.
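On the application side, reading from the subgraph is a single GraphQL POST. The sketch below assumes a Transfer entity with the fields written by the handler above (from, to, amount, blockNumber) stored as Bytes and BigInt; the endpoint URL is whatever your deployment gives you, and the helper names are hypothetical.

```typescript
interface TransfersRequest {
  query: string;
  variables: { account: string; first: number };
}

// Builds the request body. The `transfers` collection field is generated
// from the (assumed) Transfer entity in the subgraph schema.
function buildTransfersRequest(account: string, first: number): TransfersRequest {
  const query = `
    query TransfersFrom($account: Bytes!, $first: Int!) {
      transfers(
        where: { from: $account }
        first: $first
        orderBy: blockNumber
        orderDirection: desc
      ) {
        id
        from
        to
        amount
        blockNumber
      }
    }`;
  // Lowercase the address so it matches the stored hex form.
  return { query, variables: { account: account.toLowerCase(), first } };
}

// Sends the query to a subgraph endpoint (URL supplied by the caller).
async function fetchTransfers(
  endpoint: string,
  account: string,
  first: number = 25
): Promise<unknown[]> {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildTransfersRequest(account, first)),
  });
  if (!res.ok) throw new Error("subgraph query failed: HTTP " + res.status);
  const body = (await res.json()) as {
    data?: { transfers: unknown[] };
    errors?: { message: string }[];
  };
  // GraphQL reports query errors in the body, usually with HTTP 200.
  if (body.errors && body.errors.length > 0) throw new Error(body.errors[0].message);
  return body.data ? body.data.transfers : [];
}
```

One round trip replaces the per-field, per-token RPC loop, and the sort and filter happen on the indexer's side.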

What you give up: control and flexibility, in ways that bite at the edges. The mapping language is AssemblyScript, a strict TypeScript-like language that compiles to WebAssembly, and mappings run in a sandbox with no arbitrary I/O: no calling out to an external API mid-mapping, and limited ability to do work that isn't a straight event-to-entity transform. The data model is entities and GraphQL; if your app wants relational queries the GraphQL schema doesn't express cleanly, you fight the abstraction. And on the decentralized network, you're paying query fees and depending on indexer availability for your data plane, which is a dependency some teams accept and some can't.

The Graph is right when your indexing needs are event-to-entity transforms — most of them are — and you'd rather ship than operate infrastructure. For a large share of production apps, that's the correct trade.

A custom indexer: own the layer

A custom indexer is your own process, in your own language, writing to your own database. You connect to an RPC node, poll or subscribe for new blocks, decode the logs for your contracts, and run whatever logic you want before persisting. Typically a worker in TypeScript or Go, an ABI decoder, Postgres, and a cursor that remembers the last block you processed.
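The core of that worker is a small loop around the cursor. The following is a minimal sketch, assuming a chain client and a store that you supply; the ChainClient, CursorStore, nextRange, and syncOnce names are invented for the example, and re-org handling is deliberately left out here.

```typescript
// Hypothetical minimal log shape; a real one carries topics, data, and so on.
interface FetchedLog {
  blockNumber: number;
  logIndex: number;
}

// Hypothetical wrapper over your RPC node.
interface ChainClient {
  getHeadBlock(): Promise<number>;
  getLogs(fromBlock: number, toBlock: number): Promise<FetchedLog[]>;
}

// Hypothetical wrapper over your database.
interface CursorStore {
  // Last block number that was fully processed.
  getCursor(): Promise<number>;
  // Should persist the rows and the new cursor in ONE transaction,
  // so a crash can never leave them out of step.
  saveBatch(logs: FetchedLog[], newCursor: number): Promise<void>;
}

// Pure helper: the next inclusive block range to fetch, or null when caught up.
function nextRange(
  cursor: number,
  head: number,
  maxBatch: number
): [number, number] | null {
  if (cursor >= head) return null;
  const fromBlock = cursor + 1;
  const toBlock = Math.min(head, cursor + maxBatch);
  return [fromBlock, toBlock];
}

// One iteration of the indexing loop. Returns the cursor after the step.
async function syncOnce(
  chain: ChainClient,
  store: CursorStore,
  maxBatch: number = 1000
): Promise<number> {
  const cursor = await store.getCursor();
  const head = await chain.getHeadBlock();
  const range = nextRange(cursor, head, maxBatch);
  if (range === null) return cursor; // nothing new yet
  const logs = await chain.getLogs(range[0], range[1]);
  await store.saveBatch(logs, range[1]);
  return range[1];
}
```

The same loop covers backfill and live indexing: on first run the cursor sits at the contract's deployment block and syncOnce walks forward in batches, and once caught up it advances a block or two per call. The batch cap matters because many RPC providers limit the block range of a single log query.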

What you get: total control. Any database schema. Any transformation, including calling external services, enriching events with off-chain data, computing derived state with arbitrary logic, joining across contracts however you like. You serve the data through whatever API your app already speaks — REST, your existing GraphQL gateway, tRPC — instead of bolting on a second query language. And no third-party query dependency in your data plane.

What you give up: you now operate indexing infrastructure. You own the syncing, the backfill, the monitoring, and — the part everyone underestimates — re-org handling, which you must build correctly yourself.

A custom indexer is right when your transformations exceed what mappings can express, when you need the indexed data to live in the same database as the rest of your application for transactional joins, or when you can't accept a third-party dependency in your read path. The cost is real operational ownership.

Re-orgs: the thing that separates toy indexers from real ones

This is where naive indexers quietly corrupt their data. The chain is not final at the tip. The most recent blocks can be reorganized: the network reaches a different consensus, and blocks you already processed are orphaned, replaced by different blocks with different events. If your indexer wrote those orphaned events to the database and never reconciled, your data now reflects a history that didn't happen. The user sees a transfer that the chain has since undone.

Handling this correctly requires two things. First, confirmation depth. Don't treat the absolute tip as truth. Index it for low-latency provisional reads if you must, but only mark data final once it's buried under enough confirmations that a re-org is implausible — a handful of blocks on Polygon, tuned to the chain's re-org behavior. Second, rollback. Track the block hash, not just the number, of everything you index. When the parent hash of a new block doesn't match the hash you recorded for the previous block, you've detected a re-org: walk back, delete or revert the entities written for the orphaned blocks, and re-process the canonical chain from the fork point.
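Both requirements can be sketched in memory as follows. This is a minimal illustration rather than a production design: the ReorgAwareIndex class and its method names are invented for the example, rows are plain strings, and a real indexer would do the rollback as a database transaction and prune blocks once they are final.

```typescript
interface StoredBlock {
  number: number;
  hash: string;
  parentHash: string;
  rows: string[]; // stand-in for the entities written for this block
}

class ReorgAwareIndex {
  private blocks: StoredBlock[] = []; // ascending by block number
  private confirmations: number;

  constructor(confirmations: number) {
    this.confirmations = confirmations;
  }

  // Appends a block if it extends our tip. Returns false when the parent
  // hash does not match the recorded tip hash, which signals a re-org.
  tryAppend(block: StoredBlock): boolean {
    const tip = this.blocks[this.blocks.length - 1];
    if (tip !== undefined && block.parentHash !== tip.hash) return false;
    this.blocks.push(block);
    return true;
  }

  // Walks back from the tip, discarding blocks whose hash is no longer
  // canonical. Returns the fork point (last block still valid), or -1
  // if nothing we stored is on the canonical chain.
  rollback(canonicalHashAt: (n: number) => string | undefined): number {
    while (this.blocks.length > 0) {
      const tip = this.blocks[this.blocks.length - 1];
      if (canonicalHashAt(tip.number) === tip.hash) return tip.number;
      this.blocks.pop(); // real store: delete or revert this block's rows
    }
    return -1;
  }

  // Every row currently indexed, including provisional ones near the tip.
  allRows(): string[] {
    const out: string[] = [];
    for (const b of this.blocks) for (const r of b.rows) out.push(r);
    return out;
  }

  // Only rows buried under at least `confirmations` newer blocks.
  finalRows(): string[] {
    const out: string[] = [];
    if (this.blocks.length === 0) return out;
    const tipNumber = this.blocks[this.blocks.length - 1].number;
    for (const b of this.blocks) {
      if (tipNumber - b.number >= this.confirmations) {
        for (const r of b.rows) out.push(r);
      }
    }
    return out;
  }
}
```

When tryAppend returns false, the caller runs rollback with a lookup against the node, then re-processes the canonical blocks from the fork point forward. Serving finalRows to anything that moves money and allRows only to low-latency provisional views keeps the two kinds of read apart.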

The Graph does this for you. A custom indexer that skips it ships a data-corruption bug that surfaces as "a user swears they had a balance that vanished," and you'll chase it for a week before realizing your indexer trusted the tip. If you build custom, re-org handling is not a feature you add later. It's the core correctness requirement, and the reason custom indexing is more work than it looks.

The operational tradeoffs, side by side

Time to ship. The Graph wins. A working subgraph in days. A custom indexer with correct re-org handling, monitoring, and backfill is weeks.

Flexibility. Custom wins, decisively, when you need it — arbitrary logic, off-chain enrichment, your own schema, transactional joins with application data. If you don't need those, the flexibility is unused weight.

Operational burden. The Graph wins. Hosted indexing is infrastructure you don't run. A custom indexer is a stateful service you must keep synced, monitored, and correct through every chain hiccup.

Cost at scale. It depends. The Graph's query fees scale with usage. A custom indexer's cost is your infrastructure plus the engineering to operate it, which is mostly fixed. High query volume can flip the math toward custom; modest volume favors hosted.
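The break-even point is simple arithmetic once you have your own numbers. The sketch below assumes a usage-priced hosted fee and a flat monthly cost for running custom; the field names are hypothetical, and any figures you plug in should be your real quotes, since none are implied here.

```typescript
// Hypothetical cost model. Both inputs are placeholders for your own
// figures, in the same currency.
interface CostInputs {
  hostedFeePer100kQueries: number; // what the hosted option charges per 100,000 queries
  customFixedMonthly: number; // infrastructure plus engineering time to operate it
}

// Hosted cost grows linearly with query volume.
function hostedMonthlyCost(queriesPerMonth: number, c: CostInputs): number {
  return (queriesPerMonth / 100000) * c.hostedFeePer100kQueries;
}

// Monthly query volume at which hosted and custom cost the same.
// Below it hosted is cheaper; above it custom is.
function breakEvenQueriesPerMonth(c: CostInputs): number {
  return (c.customFixedMonthly / c.hostedFeePer100kQueries) * 100000;
}
```

The model ignores the one-off build cost of a custom indexer, which pushes the real break-even further out than the monthly figure suggests.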

Data sovereignty. Custom wins. Your data, your database, no third party in the read path.

What fixed looks like

Your frontend never calls the chain directly for reads. It queries an indexed database that returns in milliseconds, with joins and sorts and aggregations that were impossible against raw chain state. The indexer handles re-orgs correctly (tracking block hashes, rolling back orphaned events, marking data final only past a sensible confirmation depth), so users never see history the chain undid. You chose The Graph because your transforms are event-to-entity and you'd rather not operate indexing, or you built custom because you needed arbitrary logic and your own schema, and you accepted the operational ownership that came with it. Either way the decision was made on your actual requirements, not on whichever one you'd heard of first.

This is for you if

You're building an application on Polygon or another EVM chain that reads non-trivial contract state, and your current architecture queries the chain directly and is buckling under it. Designing and building the indexing layer — subgraph or custom, with correct re-org handling and a query API your app can actually use — is typically a $50k+ piece of a production build, and the foundation that determines whether your app feels instant or feels broken. If you're handling real value and real users, this is core infrastructure, not a nice-to-have.

This is not for you if you're a hobby project reading three values off one contract on page load. Direct reads are fine until they aren't, and you'll know when they aren't.