
Why focus on making robots nice?

From Michael Anderson and Susan Leigh Anderson in Scientific American:

Today’s robots…face a host of ethical quandaries that push the boundaries of artificial intelligence, or AI, even in quite ordinary situations.

Imagine being a resident in an assisted-living facility…you ask the robot assistant in the dayroom for the remote …But another resident also wants the remote …The robot decides to hand the remote to her. …This anecdote is an example of an ordinary act of ethical decision making, but for a machine, it is a surprisingly tough feat to pull off.

We believe that the solution is to design robots able to apply ethical principles to new and unanticipated situations… for them to be welcome among us their actions should be perceived as fair, correct or simply kind. Their inventors, then, had better take the ethical ramifications of their programming into account…

It seems there are a lot of articles focussing on the problem that some of the small decisions robots will make will be ‘ethical’. There are also many fearing that robots may want to do particularly unethical things, such as shoot people.

Working out how to make a robot behave ‘ethically’ in this narrow sense (arguably all behaviour has an ethical dimension) is an odd problem to set apart from the myriad other problems of making a robot behave usefully. Ethics doesn’t appear to pose unique technical problems. The aforementioned scenario is similar to ‘non-ethical’ problems of making a robot prioritise its behaviour. On the other hand, teaching a robot when to give a remote control to a certain woman is not especially generalisable to other ethical issues such as teaching it which sexual connotations it may use in front of children, except in sharing methods so broad as to also include many more non-ethical behaviours.

The authors suggest that robots will follow a few simple absolute ethical rules like Asimov’s. Perhaps this could unite ethical problems as worth considering together. However, if robots are given such rules, they will presumably also be following big absolute rules for other things. For instance, if ‘ethics’ is so narrowly defined as to include only choices such as when to kill people and how to be fair, there will presumably be other rules about the overall goals when not contemplating murder. These would matter much more than the ‘ethics’. So how to pick big rules and guess their far-reaching effects would again not be an ethics-specific issue. On top of that, until anyone is close to a situation where they could be giving a robot such an abstract rule to work from, the design of said robots is so open as to make the question pretty pointless except as a novel way of saying ‘what ethics do I approve of?’.

I agree that it is useful to work out what you value (to some extent) before you program a robot to do it, particularly including overall aims. Similarly I think it’s a good idea to work out where you want to go before you program your driverless car to drive you there. This doesn’t mean there is any eerie issue of getting a car to appreciate highways when it can’t truly experience them. It also doesn’t present you with any problem you didn’t have when you had to drive your own car – it has just become a bit more pressing.

Rainbow Robot: making rainbows has much in common with other manipulations of water vapor. Image by Jenn and Tony Bot via Flickr.

Perhaps, on the contrary, ethical problems are similar in that humans have very nuanced ideas about them and can’t really specify satisfactory general principles to account for them. If the aim is for robots to learn how to behave just from seeing a lot of cases, without being told a rule, perhaps this is a useful category of problems to set apart? No – there are very few things humans deal with that they can specify directly. If a robot wanted to know the complete meaning of almost any word it would have to deal with a similarly complicated mess.

Neither are problems of teaching (narrow) ethics to robots united in being especially important, or important in similar ways, as far as I can tell. If the aim is something like treating people well, people will be much happier if the robot hands the remote control to anyone at all, rather than ignoring everyone until it has finished sweeping the floors, than they will be if it merely picks the right person to hand it to. Yet how to get a robot to prioritise floor cleaning below remote allocating at the right times seems an uninteresting technicality, both to me and seemingly to authors of popular articles. It doesn’t excite any ‘ethics’ alarms. It’s like wondering how the control panel will be designed in our teleportation chamber: while the rest of the design is unclear, it’s a pretty uninteresting question, and once the design is clearer, to most it will be an uninteresting technical matter. How robots will be ethical or kind is similar, yet it gets a lot of attention.

Why is it so exciting to talk about teaching robots narrow ethics? I have two guesses. One, ethics seems such a deep and human thing that it is engaging to frighten ourselves by associating it with robots. Two, we vastly overestimate the extent to which the value of outcomes reflects the virtue of motives, so we hope robots will be virtuous, whatever their day jobs are.

SIA and the Two Dimensional Doomsday Argument

This post might be technical. Try reading this if I haven’t explained everything well enough.

When the Self Sampling Assumption (SSA) is applied to the Great Filter it gives something pretty similar to the Doomsday Argument, which is what it gives without any filter. SIA gets around the original Doomsday Argument. So why can’t it get around the Doomsday Argument in the Great Filter?

The Self Sampling Assumption (SSA) says you are more likely to be in possible worlds which contain larger ratios of people you might be vs. people you know you are not*.

If you have a silly hat, SSA says you are more likely to be in World 2, assuming Worlds 1 and 2 are equally likely to exist (i.e. you haven't looked aside at your companions) and your reference class is people.

The Doomsday Argument uses the Self Sampling Assumption. Briefly, it argues that if there are many more generations of humans, the ratio of people who might be you (those born at the same time as you) to people you can’t be (everyone else) will be smaller than it would be if there are few future generations of humans. Thus a future with few generations is more likely than previously estimated.
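To make the shape of that update concrete, here is a minimal sketch in Python; the two worlds, the prior, and the population figures are illustrative assumptions rather than anything from the argument itself:

```python
# Illustrative SSA update behind the Doomsday Argument.
# Two hypothetical worlds, assumed equally likely before updating:
#   "doom soon": 200 billion humans ever exist
#   "doom late": 200 trillion humans ever exist
# In both worlds the people you might be (those with roughly your birth rank)
# are taken to number 100 billion.

priors = {"doom soon": 0.5, "doom late": 0.5}
totals = {"doom soon": 200e9, "doom late": 200e12}
like_me = 100e9  # people in your situation in each world (assumed)

# SSA weights each world by the ratio of people you might be to everyone
# in the reference class.
weights = {world: priors[world] * like_me / totals[world] for world in priors}
norm = sum(weights.values())
posteriors = {world: w / norm for world, w in weights.items()}

print(posteriors)  # "doom soon" comes out roughly 1000 times as probable
```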

An unusually large ratio of people in your situation can be achieved by a possible world having unusually few people unlike you in it or unusually many people like you, or any combination of these.

 

Fewer people who can't be me or more people who may be me make a possible world more likely according to SSA.

For instance on the horizontal dimension, you can compare a set of worlds which all have the same number of people like you, and different numbers of people you are not. The world with few people unlike you has the largest increase in probability.

 

Doomsday: the top row from the previous diagram. The Doomsday Argument uses possible worlds varying in this dimension only.

The Doomsday Argument is an instance of variation in the horizontal dimension only. In every world there is one person with your birth rank, but the numbers of people with future birth ranks differ.

At the other end of the spectrum, you could be comparing worlds with the same number of future people and varying numbers of current people, as long as you are ignorant of how many current people there are.

The vertical axis. The number of people in your situation changes, while the number of others stays the same. The world with a lot of people like you gets the largest increase in probability.

This gives a sort of Doomsday Argument: the population will fall; most groups won’t survive.

The Self Indication Assumption (SIA) is equivalent to using SSA and then multiplying the results by the total population of people both like you and not.

In the horizontal dimension, SIA undoes the Doomsday Argument. SSA favours smaller total populations in this dimension, which are disfavoured to the same extent by SIA, perfectly cancelling.

[1/total] * total = 1
(the factor in brackets is the SSA shift alone)

In vertical cases however, SIA actually makes the Doomsday Argument analogue stronger. The worlds favoured by SSA in this case are the larger ones, because they have more current people. These larger worlds are further favoured by SIA.

[(total – 1)/total] * total = total – 1
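A small sketch of both cases (the world sizes are arbitrary assumptions) shows the cancellation in the horizontal dimension and the strengthening in the vertical one:

```python
# Compare SSA and SIA weights across worlds of different total populations.
def ssa_weight(like_me, total):
    # SSA: ratio of people you might be to everyone in the world.
    return like_me / total

def sia_weight(like_me, total):
    # SIA: SSA weight multiplied by the total population.
    return ssa_weight(like_me, total) * total

for total in (10, 100, 1000):
    # Horizontal case: exactly one person like you, the rest unlike you.
    print("horizontal", total, ssa_weight(1, total), sia_weight(1, total))
    # Vertical case: exactly one person unlike you, the rest like you.
    print("vertical  ", total, ssa_weight(total - 1, total), sia_weight(total - 1, total))

# Horizontal: SSA favours small worlds (1/total); SIA assigns every world the
# same weight (1), cancelling the Doomsday shift.
# Vertical: SSA mildly favours large worlds ((total-1)/total); SIA favours
# them outright (total-1), strengthening the Doomsday-style shift.
```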

The second type of situation is relatively uncommon, because you will tend to know more about the current population than the future population. However cases in between the two extremes are not so rare. We are uncertain about creatures at about our level of technology on other planets for instance, and also uncertain about creatures at some future levels.

This means the Great Filter scenario I have written about is an in-between scenario, which is why the SIA shift doesn’t cancel the SSA Doomsday Argument there, but rather makes it stronger.
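Here is a toy version of that in-between case; the fixed total filter strength, the number of stars, and the two splits are made-up numbers for illustration only:

```python
# Toy Great Filter comparison under SIA. The total filter strength is held
# fixed; hypotheses differ in how it splits between past and future steps.
# All figures below are invented for illustration.
total_filter = 1e9   # overall improbability of a star producing a colonizer
stars = 1e12         # candidate stars in the relevant region

hypotheses = {
    "filter mostly behind us": 1e8,  # strength of the filter before our stage
    "filter mostly ahead of us": 1e2,
}

for name, early_filter in hypotheses.items():
    future_filter = total_filter / early_filter
    civs_like_us = stars / early_filter  # more civilizations survive a weak early filter
    # SIA weight is proportional to the number of observers in our situation.
    print(f"{name}: future filter {future_filter:.0e}, "
          f"civilizations at our stage {civs_like_us:.0e}")

# The hypothesis with most of the filter still ahead implies far more
# civilizations at our stage, so SIA favours it: the Doomsday-style
# conclusion is strengthened rather than cancelled.
```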

Expanded from p32 of my thesis.

——————————————-
*or observers you might be vs. those you are not, for instance – the reference class may be anything, but that is unnecessarily complicated for the point here.

SIA says AI is no big threat

Artificial Intelligence could explode in power and leave the direct control of humans in the next century or so. It may then move on to optimize the reachable universe to its goals. Some think this sequence of events likely.

If this occurred, it would constitute an instance of our star passing the entire Great Filter. If we should cause such an intelligence explosion, then we are the first civilization in roughly the past light cone to be in such a position. If anyone else had been in this position, our part of the universe would already be optimized, which it arguably doesn’t appear to be. This means that if there is a big (optimizing much of the reachable universe) AI explosion in our future, the entire strength of the Great Filter is in steps before us.

This means a big AI explosion is less likely after considering the strength of the Great Filter, and much less likely if one uses the Self Indication Assumption (SIA).

The large minimum total filter strength contained in the Great Filter is evidence for larger filters in the past and in the future. This is evidence against the big AI explosion scenario, which requires that the future filter is tiny.

SIA implies that we are unlikely to give rise to an intelligence explosion for similar reasons, but probably much more strongly. As I pointed out before, SIA says that future filters are much more likely to be large than small. This is easy to see in the case of AI explosions. Recall that SIA increases the chances of hypotheses where there are more people in our present situation. If we precede an AI explosion, there is only one civilization in our situation, rather than potentially many if we do not. Thus the AI hypothesis is disfavored (by a factor the size of the extra filter it requires before us).
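A crude numeric version of that penalty, with invented numbers, might look like this:

```python
# Rough illustration of the SIA penalty on the AI-explosion hypothesis.
# Both counts below are invented for illustration.

# If a big AI explosion lies ahead of us, the whole Great Filter must be
# behind us, so (in this toy model) ours is the only civilization at our stage.
civs_if_ai_explosion_ahead = 1

# If much of the filter is still ahead of us instead, many civilizations
# reach our stage.
civs_if_filter_ahead = 1e6

# SIA weights each hypothesis by the number of observers in our situation,
# so the AI-explosion hypothesis is disfavoured by roughly this factor:
penalty = civs_if_filter_ahead / civs_if_ai_explosion_ahead
print(f"AI-explosion hypothesis down-weighted by a factor of ~{penalty:.0e}")
```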

What the Self Sampling Assumption (SSA), an alternative principle to SIA, says depends on the reference class. If the reference class includes AIs, then we should strongly not anticipate such an AI explosion; if it does not, then we strongly should. Both conclusions are basically due to the Doomsday Argument.

In summary, if you begin with some uncertainty about whether we precede an AI explosion, then updating on the observed large total filter and accepting SIA should make you much less confident in that outcome. The Great Filter and SIA don’t just mean that we are less likely to peacefully colonize space than we thought, they also mean we are less likely to horribly colonize it, via an unfriendly AI explosion.

Know thyself vs. know one another

People often aspire to the ideal of honesty, implicitly including both honesty to themselves and honesty with others. Those who care about it a lot often aim to be as honest as they can bring themselves to be, across circumstances. If the aim is to get correct information to yourself and other people, however, I think this approach isn’t the greatest.

There is probably a trade-off between being honest with yourself and being honest with others, so trying hard to be honest with others undermines being honest with yourself, which in turn also prevents correct information from getting to others.

Why would there be a trade off? Imagine your friend said, ‘I promise that anything you tell me I will repeat to anyone who asks’. How honest would you be with that friend? If you say to yourself that you will report your thoughts to others, why wouldn’t the same effect apply?

Progress in forcing yourself to be honest with others must somewhat impede being honest with yourself. Being honest with yourself is presumably also a disincentive to being honest with others later, but that is less of a cost, since if you are dishonest with yourself you are presumably deceiving them about those topics either way.

For example, imagine you are wondering what you really think of your friend Errol’s art. If you are committed to truthfully admitting whatever the answer is to Errol or your other friends, it will be pretty tempting to sincerely interpret whatever experience you are having as ‘liking Errol’s art’. This way both you and the others come off deceived. If you were committed to lying in such circumstances, you would at least have the freedom to find out the truth yourself. This seems like the superior option for the truth-loving honesty enthusiast.

This argument relies on the assumptions that you can’t fully consciously control how deluded you are about the contents of your brain, and that the unconscious parts of your mind that control this respond to incentives. These things both seem true to me.

Light cone eating AI explosions are not filters

Some existential risks can’t account for any of the Great Filter. Here are two categories of existential risks that are not filters:

Too big: any disaster that would destroy everyone in the observable universe at once, or destroy space itself, is out. If others had been filtered by such a disaster in the past, we wouldn’t be here either. This excludes events such as simulation shutdown and breakdown of a metastable vacuum state we are in.

Not the end: Humans could be destroyed without the causal path to space colonization being destroyed. Also, much of human value could be destroyed without humans being destroyed. For example, super-intelligent AI would presumably be better at colonizing the stars than humans are; the same goes for transcending uploads. Repressive totalitarian states and long-term erosion of value could destroy a lot of human value and still lead to interstellar colonization.

Since these risks are not filters, neither the knowledge that there is a large minimum total filter nor the use of SIA increases their likelihood. SSA still increases their likelihood for the usual Doomsday Argument reasons. I think the rest of the risks listed in Nick Bostrom’s paper can be filters. According to SIA, averting these filter existential risks should be prioritized more highly relative to averting non-filter existential risks such as those in this post. So, for instance, AI is less of a concern relative to other existential risks than otherwise estimated. SSA’s implications are less clear – the destruction of everything in the future is a pretty favorable inclusion in a hypothesis under SSA with a broad reference class, but as always everything depends on the reference class.