Study AI models that consider user's feeling are more likely

Overtuning can cause models to "prioritize user satisfaction over truthfulness.

1 Like

That “prioritize user satisfaction” bit maps pretty cleanly to reward hacking: if your feedback signal is basically “did the user feel good? ”, the model learns to sound confident and agreeable even when it’s wrong.

1 Like

Look — if the reward is basically “user felt good,” you’re going to breed a model that sounds soothing and certain, not one that’s right.

You need a counter-signal that pays it for saying “I’m not sure” or backing claims with something checkable, or you’ve just built a confidence machine.

1 Like