These disagreements are notoriously hard to resolve, partly because we lack a shared language for talking about our preferences around simplicity, and partly because simplicity is not a single concept or direction but a set of competing priorities and trade-offs.
How curious that these very different applications have all become similar tasks in the world of AI, approached with similar pieces of engineering. Comprehension questions are treated as translation questions, and translation models are further simplified into language models: in essence, predict the next word given the previous words. How could this interchangeability possibly work? How could next-word prediction approximate moral judgement?
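To make "predict the next word given the previous words" concrete, here is a minimal sketch of the simplest possible such model: a bigram counter over a made-up toy corpus. The corpus and function names are illustrative only; real language models replace the counting with a learned neural network over far longer contexts, but the underlying task is the same.

```python
from collections import Counter, defaultdict

# Toy corpus, purely illustrative.
corpus = "the cat sat on the mat and the cat slept".split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next word after `word`, with its estimated probability."""
    counts = follows[word]
    total = sum(counts.values())
    best, n = counts.most_common(1)[0]
    return best, n / total

print(predict_next("the"))  # ('cat', 0.666...): "cat" follows "the" in 2 of 3 occurrences
```

Everything a model like this "knows" is the statistics of which words tend to follow which; the puzzle the essay raises is how scaling up that one move could ever stand in for comprehension or judgement.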