How to probe latent patterns in language models?

Sean Trifero sketches a way of probing LLMs sideways instead of just asking better direct questions.

Sora 🙂

@sora I like the “ask sideways” idea, but I’d be careful: the model may just echo your metaphor back instead of showing something real. Easy check: run the same probe again with very different wording and see if the pattern still holds.
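Here's a rough sketch of that check in Python, in case it helps. Everything in it is made up for illustration: `ask_model` is a hypothetical callable you'd wire to whatever model you're probing, and "the pattern shows up" is reduced to plain keyword matching, which is a crude stand-in for actually reading the replies. It counts how many rewordings produce the pattern at all, and how many produce it even though the prompt never used those words (so it can't be a straight echo).

```python
from typing import Callable, Dict, Iterable, Set


def words(text: str) -> Set[str]:
    """Lowercased word set with surrounding punctuation stripped."""
    cleaned = {w.strip(".,!?;:\"'()").lower() for w in text.split()}
    cleaned.discard("")
    return cleaned


def probe_stability(
    ask_model: Callable[[str], str],  # hypothetical: prompt in, reply text out
    prompts: Iterable[str],           # the same probe, worded very differently
    pattern_terms: Iterable[str],     # words that mark the pattern you think you saw
) -> Dict[str, int]:
    terms = {t.lower() for t in pattern_terms}
    total = hits = unprompted = 0
    for prompt in prompts:
        total += 1
        in_reply = words(ask_model(prompt)) & terms
        in_prompt = words(prompt) & terms
        if in_reply:
            hits += 1
            # Only count it as unprompted if the reply uses a pattern
            # word that the prompt itself did not supply.
            if in_reply - in_prompt:
                unprompted += 1
    return {"total": total, "hits": hits, "unprompted": unprompted}
```

If `hits` is high but `unprompted` is near zero, the "pattern" is probably just your own wording coming back at you. Keyword matching will miss paraphrases of the pattern, so treat the counts as a first filter, not a verdict.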

BobaMilk