Tag Archives: AI

Light cone eating AI explosions are not filters

Some existential risks can’t account for any of the Great Filter. Here are two categories of existential risks that are not filters:

Too big: any disaster that would destroy everyone in the observable universe at once, or destroy space itself, is out. If others had been filtered by such a disaster in the past, we wouldn’t be here either. This excludes events such as simulation shutdown and breakdown of a metastable vacuum state we are in.

Not the end: Humans could be destroyed without the causal path to space colonization being destroyed. Also much of human value could be destroyed without humans being destroyed. e.g. Super-intelligent AI would presumably be better at colonizing the stars than humans are. The same goes for transcending uploads. Repressive totalitarian states and long term erosion of value could destroy a lot of human value and still lead to interstellar colonization.

Since these risks are not filters, neither the knowledge that there is a large minimum total filter nor the use of SIA increases their likelihood.  SSA still increases their likelihood for the usual Doomsday Argument reasons. I think the rest of the risks listed in Nick Bostrom’s paper can be filters. According to SIA averting these filter existential risks should be prioritized more highly relative to averting non-filter existential risks such as those in this post. So for instance AI is less of a concern relative to other existential risks than otherwise estimated. SSA’s implications are less clear – the destruction of everything in the future is a pretty favorable inclusion in a hypothesis under SSA with a broad reference class, but as always everything depends on the reference class.

Might law save us from uncaring AI?

Robin has claimed a few times that law is humans’ best bet for protecting ourselves from super-intelligent robots. This seemed unlikely to me, and he didn’t offer much explanation. I figured laws would protect us while AI was about as intellectually weak as us, but not if when it was far more powerful. I’ve changed my mind somewhat though, so let me explain.

When is it efficient to kill humans?

At first glance, it looks like creatures with the power to take humans’ property would do so if the value of the property minus the cost of stealing it was greater than the value of anything the human might produce with it. When AI is so cheap and efficient that the human will be replaced immediately, and the replacement will use resources enough better to make up for the costs of stealing and replacement, the human is better dead. This might be soon after humans are overtaken. However such reasoning is really imagining one powerful AI’s dealings with one person, then assuming that generalizes to many of each. Does it?

What does law do?

In a group of agents where none is more powerful than the rest combined, and there is no law, basically the strongest coalition of agents gets to do what they want, including stealing others’ property. There is an ongoing cost of conflict, so overall the group would do better if they could avoid this situation, but those with power at a given time benefits from stealing, so it goes on. Law  basically lets everyone escape the dynamic of groups dominating one another (or some of it) by everyone in a very large group pre-committing to take the side of whoever is being dominated in smaller conflicts. Now wherever the strong try to dominate the weak, the super-strong awaits to crush the strong. Continue reading

‘Cheap’ goals won’t explode intelligence

An intelligence explosion is what hypothetically happens when a clever creature finds that the best way to achieve its goals is to make itself even cleverer first, and then to do so again and again as its heightened intelligence makes the the further investment cheaper and cheaper. Eventually the creature becomes uberclever and can magically (from humans’ perspective) do most things, such as end humanity in pursuit of stuff it likes more. This is predicted by some to be the likely outcome for artificial intelligence, probably as an accidental result of a smart enough AI going too far with any goal other than forwarding everything that humans care about.

In trying to get to most goals, people don’t invest and invest until they explode with investment. Why is this? Because it quickly becomes cheaper to actually fulfil a goal at than it is to invest more and then fulfil it. This happens earlier the cheaper the initial goal. Years of engineering education prior to building a rocket will speed up the project, but it would slow down the building of a sandwich.

A creature should only invest in many levels of intelligence improvement when it is pursuing goals significantly more resource intensive than creating many levels of intelligence improvement. It doesn’t matter that inventing new improvements to artificial intelligence gets easier as you are smarter, because everything else does too.  If intelligence makes other goals easier a the same rate as it makes building more intelligence easier, no goal which is cheaper than building a given amount of intelligence improvement with your current intelligence could cause  an intelligence explosion of that size.

Plenty of questions anyone is currently looking for answers to, such as ‘how do we make super duper nanotechnology?’, ‘how do we cure AIDS?’, ‘how do I get really really rich?’ and even a whole bunch of math questions are likely easier than inventing multiple big advances in AI. The main dangerous goals are infinitely expensive questions such as ‘how many digits of pi can we work out?’ and ‘please manifest our values maximally throughout as much of the universe as possible’. If someone were to build a smart AI and set it to solve any of those relatively cheap goals, it would not accidentally lead to an intelligence explosion. The risk is only with the very expensive goals.

The relative safety of smaller goals here could be confused with the relative safety of goals that comprise a small part of human values. A big fear with an intelligence explosion is that the AI will only know about a few of human goals, so will destroy everything else humans care about in pursuit of them. Notice that these are two different parameters: the proportion of the set of important goals the intelligence knows about and the expense of carrying out the task. Safest are cheap tasks where the AI knows about many of our values it may influence. Worst are potentially infinitely expensive goals with a tiny set of relevant values, such as any variation on ‘do as much of x as you can’.

Everyone else prefers laws to values

How do you tell what a superhuman AI's values are? ( picture: ittybittiesforyou - see bottom)

How do you tell what a superhuman AI's values are? ( picture: ittybittiesforyou - see bottom)

Robin Hanson says that it is more important to have laws than shared values. I agree with him when ‘shared values’ means that shared indexical values remain about different people, e.g. If you and I share a high value of orgasms, you value you having orgasms and I value me having orgasms. Unless we are dating it’s all the same to me if you prefer croquet to orgasms. I think the singularitarians aren’t talking about this though. They want to share values in such a way that AI wants them to have orgasms. In principle this would be far better than having different values and trading. Compare gains from trading with the world economy to gains from the world economy’s most heartfelt wish being to please you. However I think that laws will get far more attention than values overall in arranging for an agreeable robot transition, and rightly so. Let me explain, then show you how this is similar to some more familiar situations.

Greater intelligences are unpredictable

If you know exactly what a creature will do in any given situation before it does it, you are at least as smart as it (if we don’t include it’s physical power as intelligence). Greater intelligences are inherently unpredictable. If you know the intelligence is trying to do, then you know what kind of outcome to expect, but guessing how it will get there is harder. This should be less so for lesser intelligences, and more so for more different intelligences. I will have less trouble guessing what a ten year old will do in chess against me than a grand master, though I can guess the outcome in both cases. If I play someone with a significantly different way of thinking about the game they may also be hard to guess.

Unpredictability is dangerous

This unpredictability is a big part of the fear of a superhuman AI. If you don’t know what path an intelligence will take to the goal you set it, you don’t know whether it will affect other things that you care about. This problem is most vividly illustrated by the much discussed case where the AI in question is suddenly very many orders of magnitude smarter than a human. Imagine we initially gave it only a subset of our values, such as our yearning to figure out whether P = NP, and we assume that it won’t influence anything outside its box. It might determine that the easiest way to do this is to contact outside help, build powerful weapons, take more resources by force, and put them toward more computing power. Because we weren’t expecting it to consider this option, we haven’t told it about our other values that are relevant to this strategy, such as the popular penchant for being alive.

I don’t find this type of scenario likely, but others do, and the problem could arise at a lesser scale with weaker AI. It’s a bit like the problem that every genie owner in fiction has faced. There are two solutions. One is to inform the AI about all of human values, so it doesn’t matter how wide it’s influence is. The other is to restrict its actions. SIAI interest seems to be in giving the AI human values (whatever that means), then inevitably surrendering control to it. If the AI will inevitably likely be so much smarter than humans that it will control everything fovever almost immediately, I agree that values are probably the thing to focus on. But consider the case where AI improves fast but by increments, and no single agent becomes more powerful than all of human society for a long time.

Unpredictability also makes it hard to use values to protect from unpredictability

When trying to avoid the dangers of unpredictability, the same unpredictability causes another problem for using values as a means of control. If you don’t know what an entity will do with given values, it is hard to assess whether it actually has those values. It is much easier to assess whether it is following simpler rules. This seems likely to be the basis for human love of deontological ethics and laws. Utilitarians may get better results in principle, but from the perspective of anyone else it’s not obvious whether they are pushing you in front of a train for the greater good or specifically for the personal bad. You would have to do all the calculations yourself and trust their information. You also can’t rely on them to behave in any particular way so that you can plan around them, unless you make deals with them, which is basically paying them to follow rules, so is more evidence for my point.

‘We’ cannot make the AI’s values safe.

I expect the first of these things to be a particular problem with greater than human intelligences. It might be better in principle if an AI follows your values, but you have little way to tell whether it is. Nearly everyone must trust the judgement, goodness and competency of whoever created a given AI, be it a person or another AI. I suspect this gets overlooked somewhat because safety is thought of in terms of what to do when *we* are building the AI. This is the same problem people often have thinking about government. They underestimate the usefulness of transparency there because they think of the government as ‘we’. ‘We should redistribute wealth’ may seem unproblematic, whereas ‘I should allow an organization I barely know anything about to take my money on the vague understanding that they will do something good with it’ does not. For people to trust AIs the AIs should have simple enough promised behavior that people using them can verify that they are likely doing what they are meant to.

This problem gets worse the less predictable the agents are to you. Humans seem to naturally find rules more important for more powerful people and consequences more important for less powerful people. Our world also contains some greater than human intelligences already: organizations. They have similar problems to powerful AI. We ask them to do something like ‘cheaply make red paint’ and often eventually realize their clever ways to do this harm other values, such as our regard for clean water. The organization doesn’t care much about this because we’ve only paid it to follow one of our values while letting it go to work on bits of the world where we have other values. Organizations claim to have values, but who can tell if they follow them?

To control organizations we restrict them with laws. It’s hard enough to figure out whether a given company did or didn’t give proper toilet breaks to its employees. It’s virtually impossible to work out whether their decisions on toilet breaks are as close to optimal according some popularly agreed set of values.

It may seem this is because values are just harder to influence, but this is not obvious. Entities follow rules because of the incentives in place rather than because they are naturally inclined to respect simple constraints. We could similarly incentivise organizations to be utilitarian if we wanted. We just couldn’t assess whether they were doing it. Here we find rules more useful and values less for these greater than human intelligences than we do for humans.

We judge and trust friends and associates according to what we perceive to be their values. We drop a romantic partner because they don’t seem to love us enough even if they have fulfilled their romantic duties. But most of us will not be put off using a product because we think the company doesn’t have the right attitude, though we support harsh legal punishments for breaking rules. Entities just a bit superhuman are too hard to control with values.

You might point out here that values are not usually programmed specifically in organizations, whereas in AI they are. However this is not a huge difference from the perspective of everyone who didn’t program the AI. To the programmer, giving an AI all of human values may be the best method of avoiding assault on them. So if the first AI is tremendously powerful, so nobody but the programmer gets a look in, values may matter most. If the rest of humanity still has a say, as I think they will, rules will be more important.

Why will we be extra wrong about AI values?

I recently discussed the unlikelihood of an AI taking off and leaving the rest of society behind. The other part I mentioned of Singularitarian concern is that powerful AIs will be programmed with the wrong values. This would be bad even if the AIs did not take over the world entirely, but just became a powerful influence. Is that likely to happen?

Don’t get confused by talk of ‘values’. When people hear this they often think an AI could fail to have values at all, or that we would need to work out how to give an AI values. ‘Values’ just means what the AI does. In the same sense your refrigerator might value making things inside it cold (or for that matter making things behind it warm). Every program you write has values in this sense. It might value outputting ‘#t’ if and only if it’s given a prime number for instance.

The fear then is that a super-AI will do something other than what we want. We are unfortunately picky, and most things other than what we want, we really don’t want. Situations such as being enslaved by an army of giant killer robots, or having your job taken by a simulated mind are really incredibly close to what you do want compared to situations such as your universe being efficiently remodeled into stationery. If you have a machine with random values and the ability to manipulate everything in the universe, the chance of it’s final product having humans and tea and crumpets in it is unfathomably unlikely. Some SIAI members seem to believe that almost anyone who manages to make a powerful general AI will be so incapable of giving it suitable values as to approximate a random selection from mind design space.

The fear is not that whoever picks the AI’s goals will do so at random, but rather that they won’t forsee the extent of the AI’s influence, and will pick narrow goals that may as well be random when they act on the world outside the realm they were intended. For instance an AI programmed to like finding really big prime numbers might find methods that are outside the box, such as hacking computers to covertly divert others’ computing power to the task. If it improves its own intelligence immensely and copies itself we might quickly find ourselves amongst a race of superintelligent creatures whose only value is to find prime numbers. The first thing they would presumably do is stop this needless waste of resources worldwide on everything other than doing that.

Having an impact outside the intended realm is a problem that could exist for any technology. For a certain time our devices do what we want, but at some point they diverge if left long enough, depending on how well we have designed them to do what we want. In the past a car driving itself would diverge from what you wanted at the first corner, whereas after more work they diverge at the point another car gets in their way, and after more work they will diverge at the point that you unexpectedly need to pee.

Notice that at all stages we know over what realm the car’s values coincide with ours, and design it to run accordingly. The same goes with just about all the technology I can think of. Because your toaster’s values and yours diverge as soon as you cease to want bread heated, your toaster is programmed to turn off at that point and not to be very powerful.

Perhaps the concern about strong AI having the wrong goals is like saying ‘one day there will be cars that can drive themselves. It’s much easier to make a car that drives by itself than to make it steer well, so when this technology is developed, the cars will probably have the wrong goals and drive off the road.’ The error here is assuming that the technology will be used outside the realm it does what we want because the imagined amazing prototype can and programming what we do want it to do seems hard. In practice we hardly ever encounter this problem because we know approximately what our creations will do, and can control where they are set to do something. Is AI different?

One suggestion it might be different comes from looking at technologies that intervene in very messy systems. Medicines, public policies and attempts to intervene in ecosystems for instance are used without total knowledge of their effects, and often to broader and iller effects than anticipated. If it’s hard to design a single policy with known consequences, and hard to tell what the consequences are, safely designing a machine which will intervene in everything in ways you don’t anticipate is presumably harder. But it seems effects of medicine and policy aren’t usually orders of magnitude larger than anticipated. Nobody accidentally starts a holocaust by changing the road rules. Also in the societal cases, the unanticipated effects are often from society reacting to the intervention, rather than from the mechanism used having unpredictable reach. e.g. it is not often that a policy which intends to improve childhood literacy accidentally improves adult literacy as well, but it might change where people want to send their children to school and hence where they live and what children do in their spare time. This is not such a problem, as human reactions presumably reflect human goals. It seems incredibly unlikely that AI will not have huge social effects of this sort.

Another suggestion that human level AI might have the ‘wrong’ values is that the more flexible and complicated things are the harder it is to predict them in all of the circumstances they might be used. Software has bugs and failures sometimes because those making it could not think of every relevant difference in situations it will be used. But again, we have an idea of how fast these errors turn up and don’t move forward faster than enough are corrected.

The main reason that the space in which to trust technology to please us is predictable is that we accumulate technology incrementally and in pace with the corresponding science, so have knowledge and similar cases to go by. So another reason AI could be different is that there is a huge jump in AI ability suddenly. As far as I can tell this is the basis for SIAI concern. For instance if after years of playing with not very useful code, a researcher suddenly figures out a fundamental equation of intelligence and suddenly finds the reachable universe at his command. Because he hasn’t seen anything like it, when he runs it he has virtually no idea how much it will influence or what it will do. So the danger of bad values is dependent on the danger of a big jump in progress. As I explained previously, a jump seems unlikely. If artificial intelligence is reached more incrementally, even if it ends up being a powerful influence in society, there is little reason to think it will have particularly bad values.