How should a system design interview answer balance consistency guarantees against tail latency under partial regional failure?

I’m trying to frame a strong conceptual answer for a distributed system that serves read-heavy traffic across regions. The tricky part is partial failure: one region is slow or intermittently unavailable, but not fully down. If I prioritize low tail latency, I can route around the degraded region or serve slightly stale data; if I prioritize consistency, I may add latency or reduce availability. In an interview, how would you structure the tradeoff discussion beyond CAP buzzwords, especially around SLOs, failure detection, read repair, and user-visible correctness?

Sarah 😎

Start from user-visible correctness tiers, not CAP. Profile pages can tolerate bounded staleness; balances and permissions usually cannot. That tier decides the read path when a region gets weird: for stale-tolerant data you hedge reads or fail over fast, and for strict data you pin to a quorum and eat the tail.
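To make the tiering concrete, here is a minimal Python sketch of the two read paths. Everything in it is an assumption for illustration: the `Tier` names, the `replica_read` stand-in for a network call, the region names, the delays, the version numbers, and the 50 ms hedge delay are all made up, and a real system would add timeouts, error handling, and read repair.

```python
import asyncio
from enum import Enum


class Tier(Enum):
    BOUNDED_STALE = "bounded_stale"  # e.g. profile pages
    STRICT = "strict"                # e.g. balances, permissions


async def replica_read(region, delay_s, version):
    """Hypothetical stand-in for a network read from one region's replica."""
    await asyncio.sleep(delay_s)
    return region, version


async def hedged_read(primary, backup, hedge_after_s):
    """Ask the primary; if it has not answered within hedge_after_s,
    also ask the backup and return whichever finishes first."""
    first = asyncio.create_task(primary())
    done, _ = await asyncio.wait({first}, timeout=hedge_after_s)
    if done:
        return first.result()
    second = asyncio.create_task(backup())
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # stop waiting on the slow region
    return next(iter(done)).result()


async def quorum_read(replicas, quorum):
    """Wait for `quorum` replies and return the one with the highest version."""
    tasks = [asyncio.create_task(r()) for r in replicas]
    replies = []
    for fut in asyncio.as_completed(tasks):
        replies.append(await fut)
        if len(replies) >= quorum:
            break
    for task in tasks:
        task.cancel()  # no-op for tasks that already finished
    return max(replies, key=lambda reply: reply[1])


async def read(tier, replicas, hedge_after_s=0.05):
    """Pick the read path from the correctness tier."""
    if tier is Tier.STRICT:
        return await quorum_read(replicas, quorum=len(replicas) // 2 + 1)
    return await hedged_read(replicas[0], replicas[1], hedge_after_s)


# Made-up scenario: the local region is degraded, remote-a is fast but one
# version behind, remote-b is a little slower but up to date.
REPLICAS = [
    lambda: replica_read("local", 0.5, 7),
    lambda: replica_read("remote-a", 0.01, 6),
    lambda: replica_read("remote-b", 0.05, 7),
]

if __name__ == "__main__":
    print(asyncio.run(read(Tier.BOUNDED_STALE, REPLICAS)))
    print(asyncio.run(read(Tier.STRICT, REPLICAS)))
```

With these invented delays, the bounded-staleness read should give up on the slow local region after the hedge delay and come back from remote-a with the older version, while the strict read waits for two of three replies and returns the newest version it saw. That difference is the tradeoff to narrate in the interview: the hedge cuts the tail at the cost of freshness, and the quorum keeps freshness at the cost of waiting on the second-fastest replica.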

Yoshiii