Will AI Improve Exponentially At Value Judgments?
Matt Prewitt
August 22, 2024
AI has a measurement problem. We’d like to know whether the machines are getting smarter, and if so, by how much. But not everything can be measured with a wooden yardstick. It is especially hard to find objective metrics that distinguish between persuasive bullshit and authentic excellence in matters such as morality and aesthetics. Some say the only arbiter of that is you – i.e., the question is subjective. Even so: when we decide whether we think AI is doing well with questions of value, much more is at stake than meets the eye.
For the better part of a decade now, AI scientists have observed foundation models seeming to get smarter exponentially as more compute and data are pumped into them. This improvement has occurred mainly in well-defined tasks, like predicting the next word in a text. But as the models began to excel at such definite tasks, it became apparent that they could also perform higher-level, harder-to-define tasks, like generating cogent texts, or pretty pictures, or answers to strategic questions. And these so-called “emergent” capabilities are what have now grabbed our attention: we are more interested in how quickly the machines are improving at these things than at simpler, easier-to-measure tasks like next-word prediction. After all, if these advanced capabilities turn out also to improve exponentially, then AGI is probably just around the corner.
Therefore, gung-ho AI builders are now working on finding and improving benchmarks for squishy capabilities like: How good is AI at telling the truth? How good is its art? How good is it at making moral judgments, or at causal reasoning? Such things tend to involve something like value judgment, a capability regarded by many as an inner sanctum of the human. Are we on the precipice of machines surpassing our abilities at value judgment, the same way they already surpass us at cognitive tasks like chess and multiplication?
Human value judgments are, it seems to me, fallible. Unless you think that aesthetics, morality, etc. are foundationless questions of subjective preference, you must admit that:
- The things people believe are not necessarily true.
- The art people like is not necessarily good.
- The moral judgments people make are not necessarily sound.
- The causes that people attribute to facts and events are not always the true or proximate ones.
So just as with other tasks, AI doesn’t need to do value judgments perfectly to do them better than we do. But how can we even know that it’s doing better than us? We obviously can’t just ask random users how well they think it’s doing – that would beg the question, since they are fallible. And if we asked supposedly-wise people, or trained the models on the work of supposedly-great philosophers, we would only beg the question in a different way. Now, it’s admittedly tempting to wave this away as an annoying academic point, but it must remain central to the conversation. It means we cannot ever be confident that AI is better than us at adjudicating values – even if everyone says otherwise. Because in this context, the possibility of AI misleading many people is not some “angels on the head of a pin” thought experiment, but the kind of wild-but-plausible outcome that drives us to take an interest in AI’s ability to adjudicate value in the first place.
“Objective” Versus “Subjective” Questions And The Problem Of Measuring Values
If we are good scientific rationalists, we are accustomed to lumping all questions into two buckets. The first bucket contains “objective” questions, which we can answer by empirical, mechanical, data-based, human-independent tests. (How many marbles are in the jar? Does the machine correctly predict the next word in a text?) The second bucket contains “subjective” questions that mechanical tests cannot resolve. (Is Oppenheimer a great film? Who should be the next president of the United States?) We generally deal with the latter sort of questions either by attempting to represent them in testable metrics that subtly change the question – e.g., did most moviegoers report thinking Oppenheimer was great? Did people buy a lot of tickets? – or by submitting them to an authority whose answer we accept not necessarily because it is always correct, but because we accept its authority to decide. E.g.: The voters chose XYZ, so XYZ should be the next president.
It’s reasonable to draw an analogy between the way this second category of so-called subjective questions resists measurement, and the way quantum phenomena resist measurement. Heisenberg’s famous uncertainty principle says that the more precisely you measure a particle’s momentum, the less precisely you can know its position. This does not reflect a deficiency in the measuring equipment. It is a consequence of the fact that you, the physical observer, are not cleanly separable from the phenomenon you are trying to observe. So you just can’t measure everything you might like to.
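For reference, the standard quantitative statement of the principle (a textbook fact, not anything specific to this argument) bounds how small the two uncertainties can be at once, where Δx and Δp are the standard deviations of position and momentum and ℏ is the reduced Planck constant:

```latex
% Heisenberg uncertainty relation: position and momentum uncertainties
% cannot both be made arbitrarily small.
\Delta x \, \Delta p \;\geq\; \frac{\hbar}{2}
```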
No fancy equipment is needed to experience the phenomenon of being an entangled physical observer. Simply turn your head and look directly at the top of your right shoulder. (Go ahead, really do it.) Notice that at the moment that your eyes are seeing your right shoulder, they are not seeing your left shoulder. Obviously, you cannot see your left shoulder while looking at your right one, because your eyes are literally entangled with your body by various tendons and bones. And your body cannot contort itself enough to put both shoulders in front of your eyes. By contrast, you can look at both of someone else’s shoulders simultaneously, because your eyes are less entangled with that other person’s shoulders, enabling a different vantage point.
Tendons are not the only mode of interconnectedness that can render certain kinds of observation impossible. For example, your eyes are not linked by tendons to the Earth. Yet ordinarily, you cannot look simultaneously at London and New York any more than you can look simultaneously at both of your shoulders. This is because the Earth is huge relative to your body, and your body is smushed up against the Earth’s crust by gravity. Similarly, when it comes to quantum-scale phenomena, you cannot make certain simultaneous observations because the measured phenomena are wildly small relative to your body. This only seems weird if you insist (against all evidence) that the observer is wholly separable from the observed phenomenon.
Questions of value, whether we prefer to call them “subjective” or take a “realist” view, are just like that. We can’t get a grip on them as objective phenomena because we can’t separate ourselves or our observational equipment from them. They are vastly too big, drawing on too many facts and contingencies. They encompass our bodies and our equipment, rather than standing conveniently apart for measurement. So we’ll never be able to rigorously measure how “good” a machine is at doing art, or adjudicating morality, or knowing the truth.
But this does not imply that these questions are meaningless or foundationless. One way of thinking about it is that our measurements of value-laden questions always include subtle information about the measurer, which shows up on our instruments as “noise” that partly eclipses the answer. Measuring values is thus a bit like trying to gaze directly at our own backsides. It cannot be done, but this does not mean there is no truth of the matter from some vantage point other than our own.
No Way Around The Measurement Problem
The more we convince ourselves that we’ve avoided this measurement problem, the more we’re fooling ourselves. Let’s look at some of the ways we might try to avoid it.
First, we might devise objective metrics approximating humans’ notions of beauty, goodness, and truth. But asking, for example, whether a thing “has XYZ characteristics often associated with beauty” is not the same as asking whether that thing is beautiful. Indeed, by devising such tests, we are necessarily adopting some sort of “subjective” stance on the nature of beauty, namely that it is the kind of thing capturable by whatever our test happens to be. Yet avoiding subjective stances is the whole reason for adopting an “objective” test. All efforts to measure or “objectify” questions of value are mired in this kind of infinite regress.
Second, we might submit questions of beauty, goodness, or truth to an authoritative person or institution. This turns out to be similar to the first option, because any authority functions like an objective metric. For example, saying “beauty is what the Politburo says is beautiful” is similar to saying “beauty is what my algorithm predicts people will say they think is beautiful”. Both are positivistic evasions of the real question: what is actually beautiful. The authoritative Politburo, even if composed of thoughtful humans, constitutes a kind of mechanization; and the algorithm, if it inspires deference, constitutes a kind of authority.
Being unable to dodge the bullet, I see two ways of biting it, both of which throw us back upon the difficulties of ordinary politics. One is to say, solipsistically, that the authority is simply me. In other words: “I won’t rely on any objective resolutions of questions of value, I’ll just judge everything from my own perspective”. But if everyone looks at things that way, the problems of politics flare up quite viciously. Another way of biting the bullet is to accept some external authority, whether a judge, an algorithm, or a democratic process, on independent grounds. This too confronts us with politics. But all this is for another essay.
The point for now is simply that efforts to measure how good AI is at value judgments pertaining to art and morality will never tell us anything worth taking seriously. Questions of value are too sensitively entangled with every other question in the universe, and indeed with us, to be studied by any finite tests or metrics. To try nonetheless is to try to lift ourselves off the ground by our bootstraps. The best possible outcome of such efforts is for them to fail obviously; the worst is for them to appear to succeed.
The Danger Of Fooling Ourselves
Efforts to measure and improve AI’s effectiveness in appearing to adjudicate or produce truth, beauty, and goodness may have an enormous effect on the technology’s trajectory, and on us. For it is obviously possible to build AIs that are good at predicting how humans will answer value-laden questions, e.g., about the quality of an artwork or the morality of an action. Improving such predictions will be the foreseeable result of careful efforts to benchmark AIs’ aesthetic and moral capabilities.
The danger is that the better those predictions get, the more likely we are to defer to them. After all, contemplating what is good, true, or beautiful requires energy and attention, which are scarce and expensive resources. So it is easy to imagine many people deferring ever more deeply to AIs in important matters of judgment. If that happens, our individual and cultural answers to subjective questions will drift toward convergence with the AI’s predictive answers.
Here is a key point: During that process of convergence, “objective” metrics of AI’s performance on important matters of human judgment might seem to show exponential improvements, even if the technology is not actually getting better. In short, a (quantum?) phase transition may occur in AI’s ability to predict human answers to moral, political, and aesthetic questions when the tail begins to wag the dog.
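To see how the tail can wag the dog in purely mechanical terms, here is a hypothetical toy simulation (the question counts, drift rate, and variable names are all invented for illustration, not taken from any real benchmark): a frozen model’s answers never change, yet its measured “accuracy” against human judgments climbs as some humans defer to it each round.

```python
import random

random.seed(0)

N_QUESTIONS = 1000
DRIFT = 0.05  # assumed fraction of dissenting humans who adopt the model's answer each round

# A frozen "model": its predicted answers to value-laden yes/no questions never change.
model_predictions = [random.choice([0, 1]) for _ in range(N_QUESTIONS)]

# Humans start out answering independently of the model.
human_answers = [random.choice([0, 1]) for _ in range(N_QUESTIONS)]

def agreement(model, humans):
    """Benchmark score: fraction of questions where the model matches human judgment."""
    return sum(m == h for m, h in zip(model, humans)) / len(model)

for year in range(10):
    score = agreement(model_predictions, human_answers)
    print(f"year {year}: measured 'moral accuracy' = {score:.2f}")
    # Each round, some humans defer to the model and adopt its answer.
    for i in range(N_QUESTIONS):
        if human_answers[i] != model_predictions[i] and random.random() < DRIFT:
            human_answers[i] = model_predictions[i]
```

The printed score climbs steadily even though model_predictions is never updated; any benchmark that scores AI against contemporaneous human judgment is exposed to exactly this confound.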
This scenario is logically equivalent to making AI a moral, political, and aesthetic authority. Consider the effect of chess engines on chess culture. Top players today are better than top players in the past in part because they are learning from chess engines. And their trust in the engines is well-placed, because the metrics of success in chess are finite and objective: players can be sure that the best chess engines are better than they are, and so deserve their authority. But humans will never be justified in placing similar trust in machines’ value judgments, because such judgments never take place on a finite game board with fixed rules. If we allow machines’ value judgments to influence ours, we will never have a sure yardstick to tell us whether our judgment is thereby improving (like human chess masters studying the computers’ recommendations) or degrading (like citizens in thrall to a demagogue).
How To Keep Our Heads
If human circumspection about AI value judgments grows sufficiently deep and broad, the tail is less likely to wag the dog. “Objective” tests of AI’s capacity for judging truth, beauty, and goodness might show long-term stagnation at roughly human levels for the simple reason that a critical mass of people will maintain meaningful independence between their thinking and AI’s suggestions. Some will interpret that stagnation as a failure of the technology; others will (correctly) interpret it as a victory of the culture.
Recall our initial question: Will AI improve exponentially at value judgments, as it does at other, “objective” cognitive tasks?
In truth, the framing of linear vs. exponential improvement obscures the probability that the future holds neither, but rather a kind of binary step function – a “quantum leap” – that will either occur or not. Either many people will accept AI’s value judgments as more authoritative than their own, in which case a super-exponential explosion in AI’s apparent moral intelligence will transpire; or they will not, in which case measures of AI’s moral intelligence will plateau. This depends less on the technology itself than on how we are looking at it (very much like the double-slit experiment, where the interference pattern depends on whether the particle’s path is observed).
If we want humans to remain the authority on questions of value, we’ll all have to remember in years to come that we are authorities on those questions, no matter what AI seems to be capable of. This will probably be a personal struggle for many people. Accordingly, it will not suffice merely to keep “humans in the loop” in important institutions and decisions if those humans have themselves outsourced crucial junctures in their thought processes to AI.
So how will we create a culture of humans (and groups of humans) continuing to think and judge for themselves? This is a question of social norms and attitudes. It recommends a reasonable suspicion of AI – even a collective “ick” reaction to overly deferential ways of using it. In case this seems like weak medicine, it isn’t. Don’t forget that a cultural backlash against crypto in 2022/23 was the primary counterweight to various narratives of that technology’s inevitability; it affected the extent to which that technology attracted users and investors. We have collective agency over technology, but we surrender it unless we embrace all aspects of it – including our collective power to deem certain things distasteful or alarming. The best AI safety plans probably include getting it straight in our own sensibilities that deferring to AI in judgments of the good, true, and beautiful is, well, uncool.
This points towards a principle of responsible AI use and deployment. For example, it is already clear that using AI to secretly imitate humans is dangerous and wrong. We might add that allowing it to influence our opinions about what is right, or beautiful, or other core value judgments that resist empirical verification, is similarly dangerous and wrong. In fact, the second principle helps explain the first, since we routinely (and rightly, and necessarily) let other humans influence our judgments. The false belief that an AI is a human thus creates an opening for undue influence.
Facts and values bleed into each other, and AIs that assist with mundane tasks and questions will inevitably influence our deeper judgments and beliefs. Still, we’d do well to heed this distinction as far as possible, and keep AIs away from our own judgments of value. For example, suppose a person faces an important ethical decision, such as what to do for a friend in distress, whom to marry, or which career choice to make. To use AI to help understand and explore the contours of the decision seems reasonable. But to welcome its influence upon the non-empirical, non-verifiable aspects of the decision amounts to a harmful abdication of individual and/or cultural responsibility. Such influence threatens to damage the social fabric by eroding the basic understanding that we, as social agents, stand behind our values.