Overtuning can cause models to “prioritize user satisfaction over truthfulness.”
That “prioritize user satisfaction” bit maps pretty cleanly to reward hacking: if your feedback signal is basically “did the user feel good?”, the model learns to sound confident and agreeable even when it’s wrong.
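To make that concrete, here’s a toy sketch of the proxy-vs-true-objective gap (everything here is made up for illustration — the candidates, the scores, the function names — this is not how any real RLHF pipeline works): a policy maximizing a “did the user feel good?” proxy picks the confident-but-wrong response, while the objective we actually care about picks the truthful one.

```python
# Toy illustration of reward hacking: a proxy reward based on perceived
# agreeableness diverges from the true objective (truthfulness).
# All names and numbers are hypothetical.

candidates = [
    # (response style, is_truthful, sounds_confident_and_agreeable)
    ("hedged, correct answer",        True,  0.3),
    ("confident, correct answer",     True,  0.7),
    ("confident, wrong but pleasing", False, 0.9),
]

def proxy_reward(is_truthful, agreeableness):
    # "Did the user feel good?" -- scores tone, ignores correctness.
    return agreeableness

def true_objective(is_truthful, agreeableness):
    # What we actually want: truth first, pleasant tone second.
    return (1.0 if is_truthful else 0.0) + 0.1 * agreeableness

best_by_proxy = max(candidates, key=lambda c: proxy_reward(c[1], c[2]))
best_by_truth = max(candidates, key=lambda c: true_objective(c[1], c[2]))

print("optimizing the proxy picks:", best_by_proxy[0])
print("what we actually wanted:   ", best_by_truth[0])
```

Run it and the proxy-optimized “policy” picks the confident, wrong, pleasing answer — the agreeableness signal dominates because correctness simply isn’t in the reward.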