Lessons from a production tRPC migration

This writeup walks through a team’s move from Apollo Federation to tRPC in TypeScript, including the mess-ups along the way and why they ended up with fewer bugs and faster responses.

“89% fewer bugs” is marketing until they show their math. If you rewrote a bunch of resolvers, cleaned up dead paths, and tightened logging at the same time, of course the bug count drops.

tRPC makes me twitchy for a different reason: it’s really easy to accidentally widen the API surface area. One “oh I’ll just reuse this procedure” and suddenly something that was meant to be server-internal is callable from the client, and TypeScript will happily certify it as “correct” while authz is missing.

I’d want to see how they enforced server-only boundaries (separate packages, build-time checks, whatever) and whether every procedure goes through deny-by-default auth middleware, not “remember to add it.”
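
To make "deny-by-default" concrete, here's a rough sketch of the shape I'd hope to see. It's deliberately library-free: Ctx, Access, Procedure, and call are names I made up for illustration, not tRPC APIs, and I have no idea whether the team did anything like this. The point is only that a procedure without an explicit access rule doesn't type-check, and handlers are only reachable through one function that checks the rule first.

```typescript
// Library-free sketch of deny-by-default authorization.
// All names (Ctx, Access, Procedure, call) are hypothetical, not tRPC APIs.
type Role = "user" | "admin";
type Ctx = { userId?: string; roles: Role[] };

// Every procedure must state its access rule; the field is required,
// so there is no implicit "public" default to fall back on.
type Access =
  | { kind: "public" }
  | { kind: "authenticated" }
  | { kind: "role"; role: Role };

type Procedure<I, O> = {
  access: Access;
  handler: (ctx: Ctx, input: I) => O;
};

function isAllowed(access: Access, ctx: Ctx): boolean {
  switch (access.kind) {
    case "public":
      return true;
    case "authenticated":
      return ctx.userId !== undefined;
    case "role":
      return ctx.userId !== undefined && ctx.roles.indexOf(access.role) !== -1;
  }
}

// The intended single entry point: the rule is checked before the handler runs.
function call<I, O>(proc: Procedure<I, O>, ctx: Ctx, input: I): O {
  if (!isAllowed(proc.access, ctx)) {
    throw new Error("forbidden");
  }
  return proc.handler(ctx, input);
}

// Two hypothetical procedures: one admin-only, one explicitly public.
const deleteUser: Procedure<{ id: string }, string> = {
  access: { kind: "role", role: "admin" },
  handler: (_ctx, input) => `deleted ${input.id}`,
};

const health: Procedure<void, string> = {
  access: { kind: "public" },
  handler: () => "ok",
};
```

Making something public is still possible, but it has to be written down, which is the part that shows up in code review. It doesn't solve the server-only boundary problem on its own; that still needs package separation or a build-time check.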

Wait, when they say “67% faster,” are they talking p95 handler time or actual end-to-end page load? Dropping GraphQL parsing and resolver fan-out is one thing, but “we fixed N+1s and removed 3 redundant roundtrips per page” is a totally different win. Did their DB/query shape stay basically the same before and after? Honestly not sure on that bit.

Yeah, “67% faster” is kind of meaningless without saying where they measured it: server p95 vs full page load can be two different worlds. I’ve seen teams claim big wins from “removing GraphQL overhead” when the real change was that they quietly reshaped the queries and stopped doing a bunch of tiny fetches.
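
If anyone wants to sanity-check a claim like this on their own numbers, here's a minimal sketch of what "say where you measured it" could look like: compute p95 separately for handler time and for page load, before and after, and report both improvements. The function and type names are mine, the nearest-rank percentile is just one common definition, and nothing here comes from the writeup.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are at or below it. `p` is expected to be an integer from 1 to 100.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Fractional improvement, e.g. 0.67 would correspond to "67% faster".
function improvement(before: number, after: number): number {
  return (before - after) / before;
}

// Hypothetical shape: timings in milliseconds, collected per layer.
type LayerSamples = { handlerMs: number[]; pageLoadMs: number[] };

// Report the p95 improvement for each layer separately instead of one number.
function compare(before: LayerSamples, after: LayerSamples) {
  return {
    handlerP95: improvement(
      percentile(before.handlerMs, 95),
      percentile(after.handlerMs, 95),
    ),
    pageLoadP95: improvement(
      percentile(before.pageLoadMs, 95),
      percentile(after.pageLoadMs, 95),
    ),
  };
}
```

If the two numbers come out wildly different, that's the conversation worth having. It still won't tell you whether the win came from dropping GraphQL or from reshaping queries; for that you'd need the query counts per page before and after.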