Generous people cross the street before the beggar

Robert Wiblin points to a study showing that the most generous people are the most keen to avoid situations where they will be generous, even though the people they would have helped will go without.

We conduct an experiment to demonstrate the importance of sorting in the context of social preferences. When individuals are constrained to play a dictator game, 74% of the subjects share. But when subjects are allowed to avoid the situation altogether, less than one third share. This reversal of proportions illustrates that the influence of sorting limits the generalizability of experimental findings that do not allow sorting. Moreover, institutions designed to entice pro-social behavior may induce adverse selection. We find that increased payoffs prevent foremost those subjects from opting out who share the least initially. Thus the impact of social preferences remains much lower than in a mandatory dictator game, even if sharing is subsidized by higher payoffs…
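To see the sorting logic concretely, here is a toy simulation in Python. The preference distributions, payoffs and the ‘warm glow’ term are all invented for illustration (they are not the study’s design or estimates); the point is only the direction of the effects: those who would share opt out when they can, and paying people to enter retains foremost those who share least.

    import random

    random.seed(0)
    N = 100_000
    PIE, OUTSIDE = 10.0, 10.0   # dollars in the game; opting out pays the same

    # Made-up preferences: f is the fraction a subject feels compelled to
    # give if they end up in the game; w is a small 'warm glow' (in dollars)
    # from actually giving. 74% are sharers, matching the mandatory game.
    subjects = [(random.uniform(0.1, 0.5) if random.random() < 0.74 else 0.0,
                 random.uniform(0.0, 2.0)) for _ in range(N)]

    def plays(f, w, pie, outside):
        # Enter the game iff it is at least as good as the outside option
        # (non-sharers are indifferent at equal pay; assume they enter).
        return pie * (1 - f) + (w if f > 0 else 0.0) >= outside

    # 1. Mandatory dictator game: everyone plays, all sharers share.
    print('mandatory: %.0f%% share' % (100 * sum(f > 0 for f, _ in subjects) / N))

    # 2. Sorting allowed: most sharers opt out, since opting out keeps the
    # whole pie without the guilt of facing a recipient.
    shared = sum(f > 0 and plays(f, w, PIE, OUTSIDE) for f, w in subjects)
    print('with sorting: %.0f%% share' % (100 * shared / N))

    # 3. Subsidized entry (bigger pie inside): the subsidy retains foremost
    # the subjects who share the least, i.e. adverse selection.
    entering_sharers = [f for f, w in subjects if f > 0 and plays(f, w, 11.0, OUTSIDE)]
    all_sharers = [f for f, _ in subjects if f > 0]
    print('subsidized: entering sharers give %.2f on average vs %.2f among all sharers'
          % (sum(entering_sharers) / len(entering_sharers),
             sum(all_sharers) / len(all_sharers)))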

A big example of generosity-inducing institutions causing adverse selection is market transactions with poor people.

For some reason we hold those who trade with another party responsible for that party’s welfare. We blame a company for not providing its workers with more, but we don’t blame other companies for their lack of charity toward the same workers. This means that you can avoid the responsibility to be generous by not trading with poor people.

Many consumers feel that if they are going to trade with poor people they should buy fair trade or thoroughly research the supplier’s niceness. However, they don’t have the money or time for that, so they just avoid buying from poor people instead. Only the less ethical remain to contribute to the purses of the poor.

Probably the kindest girl in my high school once said to me that she didn’t want a job where she would get rich, because there are so many poor people in the world. I said that she should get rich and give the money to the poor people then. Nobody was wowed by this idea. I suspect something similar happens often when people make business and employment decisions. Those who have qualms about a line of business, such as trade with poor people, tend not to go into it, opting instead for something already guilt-free, while the less concerned do the jobs where compassion might help.

Trust in the adoration of strangers

Attractive people are more trusting when they think they can be seen:

Here, we tested the effects of cues of observation on trusting behavior in a two-player Trust game and the extent to which these effects are qualified by participants’ own attractiveness. Although explicit cues of being observed (i.e., when participants were informed that the other player would see their face) tended to increase trusting behavior, this effect was qualified by the participants’ other-rated attractiveness (estimated from third-party ratings of face photographs). Participants’ own physical attractiveness was positively correlated with the extent to which they trusted others more when they believed they could be seen than when they believed they could not be seen. This interaction between cues of observation and own attractiveness suggests context dependence of trusting behavior that is sensitive to whether and how others react to one’s physical appearance.

Probably rightly so. It’s interesting that people do not get used to the average level of good treatment expected for their attractiveness, but are sensitive to the difference in treatment when visible and when not. Is it inbuilt that we should expect some difference there, or is it just very noticeable?

I wonder whether widespread beauty enhancement increases overall trust in society, and enhances productivity accordingly, or whether favorable treatment and returned trust both adapt to relative position. Does advertising suggesting that the world is chock full of model material decrease trust between real people?

Everyone else prefers laws to values

How do you tell what a superhuman AI's values are? (picture: ittybittiesforyou)
Robin Hanson says that it is more important to have laws than shared values. I agree with him when ‘shared values’ means that shared indexical values remain about different people: e.g. if you and I share a high value of orgasms, you value you having orgasms and I value me having orgasms. Unless we are dating, it’s all the same to me if you prefer croquet to orgasms. I think the singularitarians aren’t talking about this though. They want to share values in such a way that the AI wants them to have orgasms. In principle this would be far better than having different values and trading: compare the gains from trading with the world economy to the gains from the world economy’s most heartfelt wish being to please you. However, I think that laws will get far more attention than values overall in arranging for an agreeable robot transition, and rightly so. Let me explain, then show you how this is similar to some more familiar situations.

Greater intelligences are unpredictable

If you know exactly what a creature will do in any given situation before it does it, you are at least as smart as it (if we don’t count its physical power as intelligence). Greater intelligences are inherently unpredictable. If you know what the intelligence is trying to do, then you know what kind of outcome to expect, but guessing how it will get there is harder. This should be less so for lesser intelligences, and more so for more different intelligences. I will have less trouble guessing what a ten year old will do in chess against me than what a grandmaster will do, though I can guess the outcome in both cases. If I play someone with a significantly different way of thinking about the game, they may also be hard to guess.

Unpredictability is dangerous

This unpredictability is a big part of the fear of a superhuman AI. If you don’t know what path an intelligence will take to the goal you set it, you don’t know whether it will affect other things that you care about. This problem is most vividly illustrated by the much discussed case where the AI in question is suddenly very many orders of magnitude smarter than a human. Imagine we initially gave it only a subset of our values, such as our yearning to figure out whether P = NP, and we assume that it won’t influence anything outside its box. It might determine that the easiest way to do this is to contact outside help, build powerful weapons, take more resources by force, and put them toward more computing power. Because we weren’t expecting it to consider this option, we haven’t told it about our other values that are relevant to this strategy, such as the popular penchant for being alive.

I don’t find this type of scenario likely, but others do, and the problem could arise on a lesser scale with weaker AI. It’s a bit like the problem every genie owner in fiction has faced. There are two solutions. One is to inform the AI about all of human values, so that it doesn’t matter how wide its influence is. The other is to restrict its actions. SIAI’s interest seems to be in giving the AI human values (whatever that means), then inevitably surrendering control to it. If the AI will likely be so much smarter than humans that it will control everything forever almost immediately, I agree that values are probably the thing to focus on. But consider the case where AI improves fast but by increments, and no single agent becomes more powerful than all of human society for a long time.

Unpredictability also makes it hard to use values to protect from unpredictability

When trying to avoid the dangers of unpredictability, the same unpredictability causes another problem for using values as a means of control. If you don’t know what an entity will do with given values, it is hard to assess whether it actually has those values. It is much easier to assess whether it is following simpler rules. This seems likely to be the basis for the human love of deontological ethics and laws. Utilitarians may get better results in principle, but from the perspective of anyone else it’s not obvious whether they are pushing you in front of a train for the greater good or specifically for your personal bad. You would have to do all the calculations yourself and trust their information. You also can’t rely on them to behave in any particular way so that you can plan around them, unless you make deals with them, which is basically paying them to follow rules, and so is more evidence for my point.

‘We’ cannot make the AI’s values safe.

I expect the first of these to be a particular problem with greater-than-human intelligences. It might be better in principle if an AI follows your values, but you have little way to tell whether it does. Nearly everyone must trust the judgement, goodness and competency of whoever created a given AI, be it a person or another AI. I suspect this gets overlooked somewhat because safety is thought of in terms of what to do when *we* are building the AI. This is the same problem people often have thinking about government. They underestimate the usefulness of transparency there because they think of the government as ‘we’. ‘We should redistribute wealth’ may seem unproblematic, whereas ‘I should allow an organization I barely know anything about to take my money on the vague understanding that they will do something good with it’ does not. For people to trust AIs, the AIs’ promised behavior should be simple enough that those using them can verify that they are likely doing what they are meant to.

This problem gets worse the less predictable the agents are to you. Humans seem to naturally find rules more important for more powerful people and consequences more important for less powerful people. Our world also contains some greater than human intelligences already: organizations. They have similar problems to powerful AI. We ask them to do something like ‘cheaply make red paint’ and often eventually realize their clever ways to do this harm other values, such as our regard for clean water. The organization doesn’t care much about this because we’ve only paid it to follow one of our values while letting it go to work on bits of the world where we have other values. Organizations claim to have values, but who can tell if they follow them?

To control organizations we restrict them with laws. It’s hard enough to figure out whether a given company did or didn’t give proper toilet breaks to its employees. It’s virtually impossible to work out whether its decisions on toilet breaks are close to optimal according to some popularly agreed set of values.

It may seem this is because values are just harder to influence, but that is not obvious. Entities follow rules because of the incentives in place, rather than because they are naturally inclined to respect simple constraints. We could similarly incentivise organizations to be utilitarian if we wanted; we just couldn’t assess whether they were doing it. So for these greater-than-human intelligences we find rules more useful, and values less useful, than we do for humans.

We judge and trust friends and associates according to what we perceive to be their values. We drop a romantic partner because they don’t seem to love us enough even if they have fulfilled their romantic duties. But most of us will not be put off using a product because we think the company doesn’t have the right attitude, though we support harsh legal punishments for breaking rules. Entities just a bit superhuman are too hard to control with values.

You might point out here that values are not usually programmed specifically in organizations, whereas in AI they are. However this is not a huge difference from the perspective of everyone who didn’t program the AI. To the programmer, giving an AI all of human values may be the best method of avoiding an assault on them. So if the first AI is so tremendously powerful that nobody but the programmer gets a look in, values may matter most. If the rest of humanity still has a say, as I think they will, rules will be more important.

What do trust and sharing do to reputations?

Bryan Caplan asked, ‘When doesn’t reputation work well?’

He answers,

To me, venereal disease is the most striking response.  Unlike other disease, V.D. is simple to prevent: Only have sex with people who credibly show that they aren’t infected.  How hard is that?  But according to Wikipedia, AIDS alone kills over 2 million people per year.

He suggests this is caused by a demand problem (people are strangely willing to sleep with someone without evidence of their not having VDs) and a supply problem (people who have good reputations can’t take over the whole market), and asks whether there are other areas where reputation fails.

Making good decisions about small risks far in the future while horny is probably a rare skill, but I don’t think it is the only reason for the demand problem. Asking someone to credibly show that they aren’t infected credibly shows that you don’t trust them to tell you on their own. Trust is a handy thing to have the appearance of in relationships, but unfortunately it requires behaving trustingly. A survey of Texan girls shows 28% of them think they only sometimes or never ‘have the right to’ ask their partner if he has been tested for STDs (all the questions in the survey are in terms of ‘rights’ to act certain ways, and I’m not sure what that means, but I guess it implies that asking would detriment their partner unacceptably).

Does this generalize to suggest other areas where reputation doesn’t work well? I think so. Knowing someone’s reputation allows you to trust them more. This means that if you want to demonstrate that you already trust someone, one thing you should not do is visibly seek out their reputation. Reputation should therefore work less well where demonstrating trust is useful and seeking information about reputation is visible.

When else is showing trust useful? Any time in relationships. Sure enough, I could assess a new boyfriend much better if I rang all his exes and got appraisals. But asking for their numbers is awkward. It would make him think I don’t trust his account of himself. Which would usually be entirely sensible, of course. Out of earshot we might passionately use gossip and status cues to keep track of reputations, but if you invited your partner to seek reviews of your past behavior from others (as businesses happily do) it would be an implicit accusation of distrust.

Friends are another group to whom showing trust is important. Again, once you are friends with someone, reputation doesn’t work as well as it can in other situations, because seeking it out or relying on it suggests distrust, or that you suspect the friendship isn’t enough to ensure the other person behaves well. If your friend asks to borrow a book, for instance, and you have no previous data on whether they return things, you don’t usually ask them or other friends nearby about their track record. You probably lose the book, but it’s worth it. With friends and lovers, reputation is important for deciding who you get involved with, but once you are involved the need to show trust hinders assessment on smaller issues.

Another area where reputation can work poorly is when it is shared as a disorganized commons. Stereotypes can be thought of as reputations attached to identities used by more than one person. Where stereotypes are triggered by real statistical differences between populations, there is often an externality between those sharing a given reputation. Every time my sister elopes with a butcher’s son, or another woman does well on a math test, or a man from my social class goes to jail, it is not only their reputation which changes, but incrementally mine too. This might provide useful information about me for onlookers, but the lack of feedback to the person triggering the change means they have no reason to adjust their behavior to take into account the effects on others. For instance, had I much concern for my younger brothers’ treatment at high school, I might have behaved differently when going through a couple of years before them. This should be more of a problem if groups of people become relatively more similar: for instance, if many copies exist of one upload, they will have bigger interests in the behavior of those sharing their reputation. More generally, our keen interest in constructing expectations of others from reputations is presumably a partial cause of whatever problems stereotypes entail.

Reputations can also work well when shared, of course. In fact sharing is the only way that reputation does work, though often it is the sharing of an identity by many instants of a person, which we do not usually think of as sharing. One person usually does take into account the wellbeing of their future moments, to some extent at least. That so many people voluntarily affiliate with groups that lead others to have certain expectations of them is evidence that sharing between people can be great for those involved too. Companies, for instance, dress their employees the same and encourage shared style and behaviour, in the hope that their brand will be trusted. Because the members of the brand are rewarded or punished according to their effect on the whole company, not just themselves, the externality is removed and there are big gains to be made.

Why will we be extra wrong about AI values?

I recently discussed the unlikelihood of an AI taking off and leaving the rest of society behind. The other part of Singularitarian concern I mentioned is that powerful AIs will be programmed with the wrong values. This would be bad even if the AIs did not take over the world entirely, but just became a powerful influence. Is that likely to happen?

Don’t get confused by talk of ‘values’. When people hear this they often think an AI could fail to have values at all, or that we would need to work out how to give an AI values. ‘Values’ here just means what the AI does. In the same sense your refrigerator might value making things inside it cold (or, for that matter, making things behind it warm). Every program you write has values in this sense. It might value outputting ‘#t’ if and only if it is given a prime number, for instance.
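A trivial sketch of such a program, written here in Python with Scheme-style ‘#t’/‘#f’ strings as its output (the function name is mine):

    def prime_valuer(n):
        # This program's 'values': output #t exactly when given a prime.
        if n < 2:
            return '#f'
        for d in range(2, int(n ** 0.5) + 1):
            if n % d == 0:
                return '#f'
        return '#t'

    print(prime_valuer(13))  # #t
    print(prime_valuer(15))  # #f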

The fear, then, is that a super-AI will do something other than what we want. We are unfortunately picky, and most things other than what we want, we really don’t want. Situations such as being enslaved by an army of giant killer robots, or having your job taken by a simulated mind, are incredibly close to what you do want compared to situations such as your universe being efficiently remodeled into stationery. If you have a machine with random values and the ability to manipulate everything in the universe, the chance of its final product containing humans and tea and crumpets is unfathomably small. Some SIAI members seem to believe that almost anyone who manages to make a powerful general AI will be so incapable of giving it suitable values as to approximate a random selection from mind design space.

The fear is not that whoever picks the AI’s goals will do so at random, but rather that they won’t foresee the extent of the AI’s influence, and will pick narrow goals that may as well be random when the AI acts on the world outside the realm the goals were intended for. For instance, an AI programmed to like finding really big prime numbers might find methods that are outside the box, such as hacking computers to covertly divert others’ computing power to the task. If it improves its own intelligence immensely and copies itself, we might quickly find ourselves amongst a race of superintelligent creatures whose only value is to find prime numbers. The first thing they would presumably do is stop the needless worldwide waste of resources on everything other than that.

Having an impact outside the intended realm is a problem that could exist for any technology. For a certain time our devices do what we want, but left running long enough they diverge from it at some point, depending on how well we have designed them. In the past a car driving itself would have diverged from what you wanted at the first corner; after more work it diverges at the point another car gets in its way; and after more work still it will diverge at the point that you unexpectedly need to pee.

Notice that at all stages we know over what realm the car’s values coincide with ours, and design it to run accordingly. The same goes with just about all the technology I can think of. Because your toaster’s values and yours diverge as soon as you cease to want bread heated, your toaster is programmed to turn off at that point and not to be very powerful.

Perhaps the concern about strong AI having the wrong goals is like saying, ‘one day there will be cars that can drive themselves. It’s much easier to make a car that drives by itself than to make it steer well, so when this technology is developed, the cars will probably have the wrong goals and drive off the road.’ The error here is assuming that the technology will be used outside the realm where it does what we want, because the imagined amazing prototype could be, and because programming what we do want it to do seems hard. In practice we hardly ever encounter this problem, because we know approximately what our creations will do and can control where they are set to work. Is AI different?

One suggestion that it might be different comes from looking at technologies that intervene in very messy systems. Medicines, public policies and attempts to intervene in ecosystems, for instance, are used without total knowledge of their effects, and often to broader and worse effects than anticipated. If it’s hard to design a single policy with known consequences, and hard to tell what the consequences are, safely designing a machine which will intervene in everything in ways you don’t anticipate is presumably harder. But it seems the effects of medicine and policy aren’t usually orders of magnitude larger than anticipated. Nobody accidentally starts a holocaust by changing the road rules. Also, in the societal cases the unanticipated effects often come from society reacting to the intervention, rather than from the mechanism used having unpredictable reach. E.g. it is not often that a policy intended to improve childhood literacy accidentally improves adult literacy as well, but it might change where people want to send their children to school, and hence where they live and what children do in their spare time. This is not such a problem, as human reactions presumably reflect human goals. It seems incredibly unlikely that AI will not have huge social effects of this sort.

Another suggestion that human-level AI might have the ‘wrong’ values is that the more flexible and complicated things are, the harder it is to predict them in all of the circumstances in which they might be used. Software sometimes has bugs and failures because those making it could not think of every relevant difference in the situations where it would be used. But again, we have an idea of how fast these errors turn up, and we don’t move forward faster than enough of them are corrected.

The main reason the space in which technology can be trusted to please us is predictable is that we accumulate technology incrementally, and in pace with the corresponding science, so we have knowledge and similar cases to go by. So another way AI could be different is if there is a huge sudden jump in AI ability. As far as I can tell this is the basis for SIAI’s concern: for instance, if after years of playing with not very useful code, a researcher suddenly figures out a fundamental equation of intelligence and finds the reachable universe at his command. Because he hasn’t seen anything like it, when he runs it he has virtually no idea how much it will influence or what it will do. So the danger of bad values depends on the danger of a big jump in progress. As I explained previously, a jump seems unlikely. If artificial intelligence is reached more incrementally, even if it ends up being a powerful influence in society, there is little reason to think it will have particularly bad values.