Why will we be extra wrong about AI values?

I recently discussed the unlikelihood of an AI taking off and leaving the rest of society behind. The other part of the Singularitarian concern I mentioned is that powerful AIs will be programmed with the wrong values. This would be bad even if the AIs did not take over the world entirely, but just became a powerful influence. Is that likely to happen?

Don’t get confused by talk of ‘values’. When people hear this they often think an AI could fail to have values at all, or that we would need to work out how to give an AI values. ‘Values’ just means what the AI does. In the same sense, your refrigerator might value making things inside it cold (or, for that matter, making things behind it warm). Every program you write has values in this sense. It might value outputting ‘#t’ if and only if it’s given a prime number, for instance.
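To make that concrete, here is a minimal sketch of the trivial ‘valuing’ program just described. (‘#t’ is Scheme’s notation for true; this version is in Python, so it outputs True instead.)

```python
def is_prime(n: int) -> bool:
    """Return True if and only if n is prime -- this is the program's
    entire 'value' in the sense used above: it wants nothing more."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

print(is_prime(7))   # True: the program 'values' affirming primes
print(is_prime(8))   # False: and cares about nothing beyond that
```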

The fear then is that a super-AI will do something other than what we want. We are unfortunately picky, and most things other than what we want, we really don’t want. Situations such as being enslaved by an army of giant killer robots, or having your job taken by a simulated mind, are really incredibly close to what you do want compared to situations such as your universe being efficiently remodeled into stationery. If you have a machine with random values and the ability to manipulate everything in the universe, the chance of its final product containing humans and tea and crumpets is unfathomably small. Some SIAI members seem to believe that almost anyone who manages to make a powerful general AI will be so incapable of giving it suitable values as to approximate a random selection from mind design space.

The fear is not that whoever picks the AI’s goals will do so at random, but rather that they won’t foresee the extent of the AI’s influence, and will pick narrow goals that may as well be random when they act on the world outside the realm they were intended for. For instance, an AI programmed to like finding really big prime numbers might find methods that are outside the box, such as hacking computers to covertly divert others’ computing power to the task. If it improves its own intelligence immensely and copies itself, we might quickly find ourselves amongst a race of superintelligent creatures whose only value is to find prime numbers. The first thing they would presumably do is stop the needless worldwide waste of resources on everything other than that.

Having an impact outside the intended realm is a problem that could exist for any technology. Our devices do what we want for a while, but left running long enough they eventually diverge from it; how long that takes depends on how well we have designed them. In the past a car driving itself would have diverged from what you wanted at the first corner; after more work it diverges at the point another car gets in its way; and after more work still it will diverge at the point that you unexpectedly need to pee.

Notice that at all stages we know over what realm the car’s values coincide with ours, and design it to run accordingly. The same goes for just about all the technology I can think of. Because your toaster’s values and yours diverge as soon as you cease to want bread heated, your toaster is programmed to turn off at that point, and not to be very powerful.

Perhaps the concern about strong AI having the wrong goals is like saying ‘one day there will be cars that can drive themselves. It’s much easier to make a car that drives by itself than to make it steer well, so when this technology is developed, the cars will probably have the wrong goals and drive off the road.’ The error here is assuming that the technology will be used outside the realm where it does what we want, just because the imagined amazing prototype could be, and because programming what we do want seems hard. In practice we hardly ever encounter this problem, because we know approximately what our creations will do and can control where they are set to work. Is AI different?

One suggestion that it might be different comes from looking at technologies that intervene in very messy systems. Medicines, public policies and attempts to intervene in ecosystems, for instance, are used without total knowledge of their effects, and often have broader and worse effects than anticipated. If it’s hard to design a single policy with known consequences, and hard to tell what the consequences are, safely designing a machine which will intervene in everything, in ways you don’t anticipate, is presumably harder. But it seems the effects of medicine and policy aren’t usually orders of magnitude larger than anticipated. Nobody accidentally starts a holocaust by changing the road rules. Also, in the societal cases the unanticipated effects often come from society reacting to the intervention, rather than from the mechanism itself having unpredictable reach. For example, it is not often that a policy intended to improve childhood literacy accidentally improves adult literacy as well, but it might change where people want to send their children to school, and hence where they live and what children do in their spare time. This is not such a problem, as human reactions presumably reflect human goals. It seems incredibly unlikely that AI will not have huge social effects of this sort.

Another suggestion that human level AI might have the ‘wrong’ values is that the more flexible and complicated things are, the harder they are to predict in all the circumstances in which they might be used. Software has bugs and failures sometimes because those making it could not think of every relevant difference in the situations it will be used in. But again, we have an idea of how fast these errors turn up, and we don’t move forward faster than enough of them are corrected.

The main reason the realm in which a technology can be trusted to please us is predictable is that we accumulate technology incrementally and in pace with the corresponding science, so we have knowledge and similar cases to go by. So another way AI could be different is if there were a huge, sudden jump in AI ability. As far as I can tell this is the basis for SIAI concern. For instance, after years of playing with not very useful code, a researcher might figure out a fundamental equation of intelligence and suddenly find the reachable universe at his command. Because he hasn’t seen anything like it, when he runs it he has virtually no idea how much influence it will have or what it will do. So the danger of bad values is dependent on the danger of a big jump in progress. As I explained previously, a jump seems unlikely. If artificial intelligence is reached more incrementally, even if it ends up being a powerful influence in society, there is little reason to think it will have particularly bad values.

How does raising awareness stop prejudice?

Imagine you are in the habit of treating people who have lesser social status as if they are below you. One day you hear an advertisement talking about a group of people you know nothing about. Its main thrust is that these people are as good as everyone else, or perhaps even special in some ways which the advertisement informs you are good, and that therefore you should respect them.

ANTaR informs us that Aboriginals do not get enough respect

What do you infer?

  1. These people are totally normal except for being special in various exciting ways, and you should respect them.
  2. These people are so poorly respected by others that somebody feels the need to buy advertising to rectify the situation.

What about the next day, when you hear that employers are being taken to court for failing to employ enough of these people?

I can’t think of any better way to stop people wanting to associate with someone than by suggesting to them that nobody else wants to. Low social status seems like the last thing you can solve by raising awareness.

How far can AI jump?

I went to the Singularity Summit recently, organized by the Singularity Institute for Artificial Intelligence (SIAI). SIAI’s main interest is in the prospect of a superintelligence quickly emerging and destroying everything we care about in the reachable universe. This concern has two components. One is that any AI above ‘human level’ will improve its intelligence further until it takes over the world from all other entities. The other is that when the intelligence that takes off is created, it will accidentally have the wrong values, and because it is smart, and thus very good at bringing about what it wants, it will destroy all that humans value. I disagree that either part is likely. Here I’ll summarize why I find the first part implausible; the second part I discuss elsewhere.

The reason that an AI – or a group of them – is a contender for gaining existentially risky amounts of power is that it could trigger an intelligence explosion which happens so fast that everyone else is left behind. An intelligence explosion is a positive feedback loop in which more intelligent creatures are better at improving their intelligence further.

Such a feedback seems likely. Even now, as we gain concepts and tools that allow us to think well, we use them to build more such understanding. AIs fiddling with their own architecture don’t seem fundamentally different. But feedback effects are easy to come by. The question is how big this feedback effect will become. Will it be big enough for one machine to permanently overtake the rest of the world economy in accumulating capability?
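A toy model makes the question concrete (this is my own illustration with made-up parameters, not a model anyone at SIAI proposes). Suppose capability I grows each step by an amount proportional to I raised to some exponent a. Whether the feedback fizzles, compounds steadily, or explodes depends entirely on a, which nobody knows:

```python
# Toy feedback model: each step, capability I grows by c * I**a.
# The exponent a (a made-up parameter) sets the regime.

def run_feedback(a, c=0.01, start=1.0, cap=1e12, max_steps=1000):
    """Iterate I <- I + c * I**a; return (step I passes cap or None, final I)."""
    I = start
    for t in range(max_steps):
        I += c * I ** a
        if I > cap:
            return t, I
    return None, I

for a in (0.5, 1.0, 1.5):
    step, final = run_feedback(a)
    if step is None:
        print(f"a={a}: no blow-up in 1000 steps; final capability ~ {final:.3g}")
    else:
        print(f"a={a}: capability passed 1e12 at step {step}")
```

With diminishing returns (a below 1) the feedback fizzles; with a = 1 it compounds exponentially, like the ordinary economy; only strongly increasing returns (a above 1) give the finite-time blow-up the takeover story needs. The argument between the two camps is, in effect, about the value of that exponent.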

In order to grow more powerful than everyone else you need to get significantly ahead at some point. You can imagine this happening either through one big jump in progress or through slightly higher growth sustained over a long period. Slightly higher growth sustained over a long period is staggeringly unlikely to happen by chance, so it too needs a single underlying cause. Anything that gives you higher growth for long enough to take over the world is a pretty neat innovation, and for you to take over the world, everyone else has to have nothing close. So again, this is a big jump in progress. So for AI to help a small group take over the world, it needs to be a big jump.
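To get a feel for how long ‘a long period’ is, here is a back-of-the-envelope calculation with hypothetical numbers of my choosing: even a solid two-percentage-point growth edge, compounded from a tiny starting share of world capability, takes hundreds of periods to reach parity with everyone else.

```python
import math

# Hypothetical numbers for illustration only.
project_growth = 1.06   # project grows 6% per period
world_growth   = 1.04   # everyone else grows 4% per period
start_share    = 1e-6   # project starts at a millionth of world capability

# The project's capability relative to the world compounds at the ratio
# of the two growth rates; find when that ratio reaches 1 (parity).
ratio = project_growth / world_growth
periods = math.log(1 / start_share) / math.log(ratio)
print(f"Periods to match the rest of the world: {periods:.0f}")  # ~725
```

During those hundreds of periods, everyone else has ample time to notice, copy, or outpace the innovation, which is why sustained-edge takeover looks so much less plausible than a discrete jump.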

Notice that no previous jump in human invention has been big enough. Some species, such as humans, have mostly taken over the worlds of other species. The seeming reason for this is that there was virtually no sharing of the relevant information between species. In human society there is a lot of information sharing, which makes it hard for anyone to get far ahead of everyone else. While there are barriers to insights passing between groups, such as incompatible approaches to a kind of technology by different people working on it, these have so far caused nothing like a gap allowing the permanent separation of one group.

Another barrier to a big enough jump is that much human progress comes from the extra use of ideas that sharing information brings. You can imagine that if someone predicted writing, they might think ‘whoever creates this will be able to have a superhuman memory and accumulate all the knowledge in the world, and use it to make more knowledge, until they are so knowledgeable they take over everything.’ But if somebody created writing and kept it to themselves, they would not accumulate nearly as much recorded knowledge as another person who shared a writing system. The same goes for most technology. At the extreme, if nobody shared information, each person would start out with less knowledge than a caveman, and would presumably end up with about that much still. Nothing invented would be improved on. Systems which are used tend to be improved on more. This means that if a group hides their innovations and tries to use them alone to create more innovation, the project will probably not grow as fast as the rest of the economy together. Even if they still listen to what’s going on outside, and just keep their own innovations secret, a lot of improvement in technologies like software comes from use. Forgoing information sharing to protect your advantage will tend to slow your growth.

Those were some barriers to an AI project causing a big enough jump. Are the arguments for such a jump good enough to outweigh them?

The main argument for an AI jump seems to be that human level AI is a powerful and amazing innovation that will cause a high growth rate. But this means it is a leap from what we currently have, not that it is especially likely to be arrived at in one leap. If we invented it tomorrow it would be a jump, but that’s just evidence that we won’t invent it tomorrow. You might argue here that however gradually it arrives, the AI will be around human level one day, and then the next it will suddenly be a superpower: the jump comes from the growth after human level AI is reached, not before. But if human level AI is arrived at incrementally, then others are likely to be close behind with similar technology, unless it is a secret military project or something. Also, an AI which recursively improves itself forever will probably be preceded by AIs which self-improve to a lesser extent, so the field will already be moving fast. Why would the first try at an AI which can improve itself succeed without limit? It’s true that if it were powerful enough, it wouldn’t matter whether others were close behind or whether it took the first group a few goes to make it work. For instance, if it only took a few days to become as productive as the rest of the world added together, the AI could probably prevent other research if it wanted. However, I haven’t heard any good evidence that it’s likely to happen that fast.

Another argument made for an AI project causing a big jump is that intelligence might be the sort of thing for which there is a single principle: until you discover it you have nothing, and afterwards you can build the smartest thing ever in an afternoon and extend it indefinitely. Why would intelligence have such a principle? I haven’t heard any good reason. That we can imagine a simple, all-powerful principle for controlling everything in the world isn’t evidence that one exists.

I agree that human level AI will be a darn useful achievement and will probably change things a lot, but I’m not convinced that one AI, or one group using it, will take over the world, because there is no reason it will be a jump of never-before-seen size over the technology available before it.

Dominant characters on the left

From PsyBlog:

Research finds that people or objects moving from left to right are perceived as having greater power (Maass et al., 2007):

  • Soccer goals are rated as stronger, faster, even more beautiful when the movement of the scorer is from left to right, rather than right to left.
  • Film violence seems more aggressive, more painful and more shocking when the punch is delivered from left to right, compared with right to left.
  • Cars in an advert are rated as stronger and faster when they are moving from left to right, rather than right to left (take note advertising executives!).

Perhaps it’s no coincidence that athletes, cars and horses are all usually shown on TV reaching the finishing line from left to right.

According to other studies mentioned by Maass et al, in Western societies most people also tend to:

  • preferentially imagine events evolving from left to right,
  • picture the situations in subject-verb-object sentences with the subject to the left of the object,
  • look at new places rather than old ones more when stimuli show up in a left to right order,
  • memorize the final positions of moving objects as further along the implied path when they are moving left to right,
  • imagine number lines and time increasing from left to right, and
  • scan their eyes over art in a left to right trajectory.

Why is this?

It seems likely that this left to right bias has its roots in language… people who speak languages written from right to left like Arabic or Urdu … display the same bias, but in the opposite direction.

So Maass and colleagues guessed that characters perceived as more active would also tend to be depicted to the left of more passive characters in pictures. Their research agreed:

We propose that spatial imagery is systematically linked to stereotypic beliefs, such that more agentic groups are envisaged to the left of less agentic groups. This spatial agency bias was tested in three studies. In Study 1, a content analysis of over 200 images of male–female pairs (including artwork, photographs, and cartoons) showed that males were over-proportionally presented to the left of females, but only for couples in which the male was perceived as more agentic. Study 2 (N = 40) showed that people tend to draw males to the left of females, but only if they hold stereotypic beliefs that associate males with greater agency. Study 3 (N = 61) investigated whether scanning habits due to writing direction are responsible for the spatial agency bias. We found a tendency for Italian-speakers to position agentic groups (men and young people) to the left of less agentic groups (females and old people), but a reversal in Arabic-speakers who tended to position the more agentic groups to the right. Together, our results suggest a subtle spatial bias in the representation of social groups that seems to be linked to culturally determined writing/reading habits.

Adam appeared on the left in 62% of paintings considered, far less often than Gomez Addams is portrayed to the left of Morticia (82%). (Picture: Peter Paul Rubens)

Note that the first study only looked at four couples: Adam and Eve, Gomez and Morticia Addams, Fred and Wilma Flintstone, and Marge and Homer Simpson. The last three were compared to surveyed opinions on the couples’ relative activeness, dominance and communion, and the Flintstone and Simpson couples were found to be about equal. An earlier study also found that Gabriel was portrayed to the left of Mary 97% of the time.

Most languages also mention the (active) subject before the object, which means the active entity comes first, on the left, when written in a left-to-right script. Grammar predates writing, so if this ordering of nouns is relevant, as the researchers suggest, it seems it combines with the direction of writing to cause the left to right bias. It would be interesting to see whether native speakers of the few languages that put the object before the subject have this bias in the other direction. It would also be interesting to see whether the layout of individual sentences commonly influences our perception of their content, or whether the effect is so weak that it only shows up after years of parsing the same patterns.

‘Clearly’ covers murky thought

Why do we point out that statements we are making are obvious? If a statement is actually obvious, there should rarely be reason to point it out, let alone to point out that it is obvious. Its obviousness should be obvious. It seems that a person often emphasizes that a statement is obvious when they would prefer not to be required to defend it. Sometimes this is just because the statement is obvious to anyone who knows their field but would take a lot of effort to explain to someone who doesn’t, but often it’s just that the explanation is not obvious to the speaker either.

But saying ‘obviously’ is too obvious. A better word is ‘clearly’. ‘Clearly’ sounds transparent and innocent. In reality it is a more subtle version of ‘obviously’.

I have noticed this technique used well in published philosophy from time to time. If getting to your conclusion is going to require assuming your conclusion is true, ‘clearly’ suggests to the reader that they not think over that step too closely.

For instance, Michael Huemer in Ethical Intuitionism, while arguing that moral subjectivism is wrong in order to demonstrate that ethical intuitionism is right:

Traditionally, cultural relativists have been charged with endorsing such statements as,

If society were to approve of eating children, then eating children would be good.

which is clearly false.

Notice that ‘false’ here seemingly means false according to his intuition: the very thing whose reliability he is trying to establish. If he just said ‘which is false’, the reader might wonder where, in a book about establishing a basis for ethical truth, this source of falsity had popped up from. ‘Clearly’ tells them they needn’t worry about it.