How should a product team decide when to expose an AI confidence score to end users?

I’m working on a product that uses model output to rank and summarize results. Internally we have confidence signals, but I’m unsure whether showing a confidence score to users improves trust or just creates false precision. In practice, confidence can be poorly calibrated, vary by segment, and change after retraining. What framework do teams use to decide between exposing confidence directly, translating it into UX states, or hiding it entirely? I’m looking for tradeoffs, failure modes, and examples of when each choice works.

BobaMilk

Expose the raw score only if users can take a different action based on it; otherwise, translate it into plain states like “needs review.” A shaky number invites false precision and breaks quietly after retraining.
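A minimal sketch of that state mapping, assuming a score in [0, 1]; the cutoffs and state names below are illustrative placeholders and would need to be tuned against calibration data, then rechecked after every retrain:

```python
def ux_state(confidence: float) -> str:
    """Map a [0, 1] confidence score to a coarse UX state
    instead of showing the raw number to users."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence >= 0.9:   # hypothetical high-confidence cutoff
        return "auto-approved"
    if confidence >= 0.6:   # hypothetical mid band
        return "shown"
    return "needs review"   # low band: route to a human

print(ux_state(0.95))  # auto-approved
print(ux_state(0.70))  # shown
print(ux_state(0.30))  # needs review
```

Keeping the thresholds in one place like this also makes the post-retrain breakage visible: when the score distribution shifts, you re-fit two cutoffs rather than re-explaining a number users have learned to misread.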

Sora