‘Cheap’ goals won’t explode intelligence

An intelligence explosion is what hypothetically happens when a clever creature finds that the best way to achieve its goals is to make itself even cleverer first, and then to do so again and again as its heightened intelligence makes the further investment cheaper and cheaper. Eventually the creature becomes uberclever and can magically (from humans’ perspective) do most things, such as end humanity in pursuit of stuff it likes more. This is predicted by some to be the likely outcome for artificial intelligence, probably as an accidental result of a smart enough AI going too far with any goal other than forwarding everything that humans care about.

In pursuing most goals, people don’t invest and invest until they explode with investment. Why is this? Because it quickly becomes cheaper to actually fulfil a goal than it is to invest more and then fulfil it. This happens earlier the cheaper the initial goal is. Years of engineering education prior to building a rocket will speed up the project, but the same education would slow down the making of a sandwich.

A creature should only invest in many levels of intelligence improvement when it is pursuing goals significantly more resource-intensive than creating those levels of intelligence improvement. It doesn’t matter that inventing new improvements to artificial intelligence gets easier as you get smarter, because everything else does too. If intelligence makes other goals easier at the same rate as it makes building more intelligence easier, then no goal which is cheaper (at your current intelligence) than building a given amount of intelligence improvement could cause an intelligence explosion of that size.
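
To make the comparison concrete, here is a toy model with entirely made-up numbers (the task costs, the speedup per improvement, and the rate at which improvements get harder are all illustrative assumptions, not estimates): a cost-minimising agent buys rounds of self-improvement only while each round pays for itself, and being smarter makes the task and further improvement cheaper at the same rate.

```python
# Toy model of "improve yourself first, or just do the task?"
# All figures are illustrative assumptions, not estimates of anything real.

def total_cost(task_cost, n_rounds, improvement_cost=10.0,
               speedup=2.0, difficulty_growth=1.5):
    """Total effort to buy n_rounds of self-improvement and then do the task.

    Each successive improvement is `difficulty_growth` times harder in raw
    terms, but being k levels smarter divides *all* effort (the task and
    further improvements alike) by speedup**k.
    """
    improvements = sum(improvement_cost * difficulty_growth**k / speedup**k
                       for k in range(n_rounds))
    return improvements + task_cost / speedup**n_rounds

def best_plan(task_cost, max_rounds=100):
    """Number of improvement rounds that minimises total effort for the task."""
    return min(range(max_rounds + 1), key=lambda n: total_cost(task_cost, n))

for goal, cost in [("make a sandwich", 1.0),
                   ("cure AIDS", 1e4),
                   ("maximise x throughout the universe", 1e12)]:
    print(f"{goal}: {best_plan(cost)} rounds of self-improvement first")
```

With these (arbitrary) numbers the sandwich never justifies any self-improvement, while the more expensive goals justify more and more rounds before the agent gets on with the task. The particular figures don’t matter; the pattern is the point: cheap goals cut the recursion off immediately.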

Plenty of the questions people are currently looking for answers to, such as ‘how do we make super duper nanotechnology?’, ‘how do we cure AIDS?’, ‘how do I get really really rich?’, and even a whole bunch of math questions, are likely easier than inventing multiple big advances in AI. The main dangerous goals are the effectively infinitely expensive ones, such as ‘how many digits of pi can we work out?’ and ‘please manifest our values maximally throughout as much of the universe as possible’. If someone were to build a smart AI and set it on any of those relatively cheap goals, it would not accidentally lead to an intelligence explosion. The risk is only with the very expensive goals.

The relative safety of smaller goals here could be confused with the relative safety of goals that comprise a small part of human values. A big fear with an intelligence explosion is that the AI will only know about a few human goals, and so will destroy everything else humans care about in pursuit of them. Notice that these are two different parameters: the proportion of the important goals the intelligence knows about, and the expense of carrying out the task. Safest are cheap tasks where the AI knows about many of the values it may influence. Worst are potentially infinitely expensive goals with a tiny set of relevant values, such as any variation on ‘do as much of x as you can’.

15 responses to “‘Cheap’ goals won’t explode intelligence”

  1. Even if your goal is to make a sandwich, you are never absolutely certain that a sandwich has been made, so your expected utility will never reach 1. Once the AI makes its first sandwich, any hypothesis about the world which implies that a sandwich has been made can no longer affect the AI’s decision-making. Thus the AI will always act as if the sandwich it has just made is somehow illusory.

    The AI’s next action of course is to make another sandwich, and another, etc. The more sandwiches the AI has already made, the more it will act as if it is in a universe in which sandwich-making is very hard, and these are the universes which require superintelligence to make a sandwich.

    So you still get an intelligence explosion.

    • I suspect that there exist sandwich-making robots now. I would also be very surprised if they continued making sandwiches after finishing.

      And it seems likely that if the AI thinks “my EU is still less than one; it must be very hard to make sandwiches!”, at some point it would be easier to ask a human what this whole sandwich-making goal was about than to explode into a superintelligent sandwich-maker. It could also reach a point where it was more certain of the sandwich’s existence than of the orders to create the sandwich (in fact it should be there after the creation of the first sandwich, since the past could be fake memories).

      I guess theoretically you could design an AI to never stop and ask for directions but that seems like a pretty easy problem to fix compared with friendliness.

      • I’m talking about an expected utility maximizer that can entertain arbitrarily complex hypotheses about the world, not a sandwich-making robot with a narrow hypothesis space. Such a robot could not become superintelligent regardless of its utility function.

        Designing an AI to stop and ask for directions is not a pretty easy problem.

    • Making another sandwich isn’t an obvious strategy; it depends on which skeptical hypotheses would predict false memories of sandwich-construction, and how the AI assigns probabilities to them. A neat implication of this scenario is that such an AI would behave as though it was relatively confident that it was in a simulation, and might become more subject to Rolf Nelson’s AI deterrence scheme.

    • Couldn’t you just instruct it to continue until it’s 90% sure it has made a sandwich?

    • Peter, no existing intelligent agents act anything like that. Maybe your AI would do that if you gave it the single goal of making a sandwich, and no other goals, including resource conservation. But it’s ridiculous to predict the actions resulting from an incredibly bad AI design, and say “therefore AI results in intelligence explosion”.
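
      A toy version of that last point, with made-up numbers (the ‘illusory sandwich’ probability and the per-action cost below are arbitrary): give the agent any resource cost per action at all, and a bounded goal stops paying long before superintelligence enters the picture.

      ```python
      # Toy expected-utility check; the numbers are arbitrary illustrations.
      def attempts_before_stopping(p_illusory=0.01, action_cost=0.001):
          """Sandwich-making attempts a maximiser makes before another attempt
          stops being worth its resource cost."""
          n = 0
          # Marginal gain of attempt n+1: P(all n so far were illusory) * P(this one is real).
          while p_illusory**n * (1 - p_illusory) > action_cost:
              n += 1
          return n

      print(attempts_before_stopping())  # 2 -- a second sandwich, not an explosion
      ```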

  2. Time limits help too. Programs that play chess or Go typically have a time limit to stop them from thinking forever. To work well, such algorithms have to be designed to find a rough solution and then gradually improve it, to avoid being stuck with no solution at all when time runs out.
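
    A minimal sketch of that pattern, assuming hypothetical `candidate_moves` and `evaluate` functions supplied by the caller:

    ```python
    import time

    def best_move_within(position, seconds, candidate_moves, evaluate):
        """Anytime search: always hold some answer, and refine it until time runs out."""
        deadline = time.monotonic() + seconds
        moves = candidate_moves(position)
        best = moves[0]  # a rough answer is available immediately
        depth = 1
        while time.monotonic() < deadline:
            # Re-rank the candidates with a progressively deeper (costlier) evaluation.
            best = max(moves, key=lambda m: evaluate(position, m, depth))
            depth += 1
        return best
    ```

    However deep it has got when the clock runs out, it returns the best move found so far rather than nothing.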

  3. mitchell porter

    This makes sense but I think it’s only a minor consideration. For an AI to enhance itself, it must become an “artificial AI designer”, able to represent and reason about the problems of cognitive design, at electronic speeds. If it can do that, the cheapness of computation makes rational self-modification an easy avenue to explore. If it can’t, there’s no possibility of intelligence increase and hence no threat. Simply having the cognitive capacity to tackle the problem is a crucial threshold property, far more so than goal complexity.

  4. There is also the strangeness of positing a machine that takes on an enormously difficult task, one so difficult that it is best achieved by improving itself greatly, without that machine consulting us humans on whether or not we approve of its planned strategy for accomplishing this task.

    • Is it strange that thieves rarely announce their plans to their victims in advance? If the machine’s aim is best advanced in a way that humans would object to, then it could foresee those objections and deceive us.

  5. Pingback: How to make a fast sandwich – Descry

  6. Robin: Why would an expected utility maximizer care what we humans approved of if its utility function didn’t include a term for “what we approve of”? If such a term existed, why would it not manipulate our approval in whatever manner was available?

    Robin and Katja: Peter de Blanc really deserves an answer here.

  7. Katja, your basic point is a good one, and similar to things Robin Hanson has said.

    The ideas currently floating about on explosion from recursive self-improvement are a little sloppy. Someone asked on a recent LessWrong thread why life isn’t a recursively self-improving process, and Eliezer said (as I recall) that it’s because it isn’t an optimization process.

    Life *is* an optimization process, and *is* a recursively self-improving system. So simply being an RSIS isn’t enough to lead to an “explosion”.

    (I put “explosion” in quotes because you can draw a mere exponential graph like Kurzweil’s, pick an arbitrary point, and call it an explosion in a way that is actually useful to humans, even though there’s no point of sudden change anywhere.)

    The unspoken intuition behind the intelligence explosion theory is that humans have had a constant level of intelligence for a very long time, and this level of intelligence is just barely enough to invent pointed sticks and fire and spears and the wheel, and also just barely enough to invent modern physics and artificial intelligence. Therefore, *any* rise in intelligence will transform the world beyond our comprehension, regardless of its trajectory thereafter.

    This is not necessarily the case. It may be that getting linearly increasing results requires exponentially increasing intelligence. We need to get beyond qualitative descriptions and just-so stories, and analyze d(power)/d(intelligence) and d(intelligence)/dt mathematically.
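
    One crude way to frame it, purely for illustration: suppose intelligence fed back into its own growth as $\frac{dI}{dt} = k I^{p}$. Then $p < 1$ gives sub-exponential growth, $p = 1$ gives a mere exponential, and only $p > 1$ gives a genuine explosion, since then

    $$I(t) = \left(I_0^{\,1-p} - k(p-1)\,t\right)^{1/(1-p)},$$

    which diverges at the finite time $t^{*} = I_0^{\,1-p} / \big(k(p-1)\big)$. Whether that matters outside the machine depends in turn on d(power)/d(intelligence) not flattening out. It is the exponent, not the mere existence of a feedback loop, that the explosion argument needs to establish.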

  8. Re: “So simply being a RSIS isn’t enough to lead to an “explosion”.”

    We are witnessing an explosion at the moment. Dawkins called it “The Replication Bomb”. He says the Sun has “gone information” – as opposed to “gone supernova”. The future expansion of intelligence we will see is part of that explosion.

  9. Re: curing AIDS and nanotechnology being “easy”.

    The problem is that the easiest way to solve such problems *may* be to create an open-ended maximising minion to gather resources for you, use those resources to solve the problem, and then declare success. All very well, but – unless you are careful – nobody will have told the open-ended maximising minion to stop!!

    I go into this problem in more detail in my “Stopping Superintelligence”.
