Normative reductionism

Here’s a concept that seems useful, but that I don’t remember ever hearing explicitly referred to (with my own tentative name for it—if it turns out to not already have one in some extensive philosophical literature, I might think more about whether it is a good name):

Normative reductionism: The value of a world history is equal to the value of its parts (for some definition of relevant parts).

For instance, if two world histories differ only between time t and time t’, according to NR you do not need to know what happened at other times to evaluate them in full. Similarly, the value of Alice’s life, or of Alice enjoying a nap, depends on the nature of her life or the nap, and not, for instance, on other people’s lives or on events that took place before she was born with no effect on her (unless perhaps she has preferences about those events, or they involve people having preferences about her, but even then the total value can be decomposed into the value of different preferences being fulfilled or not). Straightforward hedonistic utilitarianism probably implies normative reductionism.
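One tentative way to write this down (my own formalization, not drawn from any literature): decompose a world history $H$ into its relevant parts $p_1, \dots, p_n$ and require the values to add:

```latex
V(H) = \sum_{i=1}^{n} v(p_i)
```

Then if two histories agree on every part outside some set $S$, their difference in value is just $\sum_{i \in S} \left( v(p_i) - v(p'_i) \right)$, which is the ‘you don’t need to know what happened at other times’ property above.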

My impression is that people have different intuitions about this and vary in how much they assume it, and that it isn’t entirely aligned with other axes of ethical view, either logically or sociologically, though it is related to them. So it seems maybe worth noting explicitly.

Total horse takeover

I hear a lot of talk of ‘taking over the world’. What is it to take over the world? Have I done it if I am king of the world? Have I done it if I burn the world? Have humans or the printing press or Google or the idea of ‘currency’ done it? 

Let’s start with something more tractable, and be clear on what it is to take over a horse. 

A natural theory is that to take over a horse is to be the arbiter of everything about the horse —to be the one deciding the horse’s every motion.

But you probably don’t actually want to control the horse’s every motion, because the horse’s own ability to move itself is a large part of its value-add. Flaccid horse mass isn’t that helpful, not even if we throw in the horse’s physical strength to move itself according to your commands, and some sort of magical ability for you to communicate muscle-level commands to it. If you were in command of the horse’s every muscle, it would fall over. (If you directed its cellular processes too, it would die; if you controlled its atoms, you wouldn’t even have a dead horse.) 

Information and computing capacity

The reason this isn’t so good is that balancing and maneuvering a thousand pounds of fast-moving horse flesh on flexible supports is probably hard for you, at least via an interface of individual muscles, at least without more practice being a horse. I think this is for two reasons:

  • Lack of information, e.g. about exactly where every part of the horse’s body is, and where and how hard its hooves are touching the ground
  • Lack of computing power to dedicate to calculating desired horse muscle motions from the above information and your desired high level horse behavior

(Even if you have these things, you don’t obviously know how to use them to direct the horse well, but you can probably figure this out in finite time, so it doesn’t seem like a really fundamental problem.)

Tentative claim: holding more levers is good for you only insofar as you have the information and computing capacity to calculate which directions you should want those levers pushed. 

So, you seem to be getting a lot out of the horse and various horse subcomponents making their own decisions about steering and balance and breathing and snorting and mitosis and where electrons should go. That is, you seem to be getting a lot out of not being in control of the horse. In fact so far it seems like the more you are in control of the horse in this sense, the worse things go for you. 

Is there a better concept of ‘taking over’—a horse, or the world—such that someone relatively non-omniscient might actually benefit from it? (Maybe not—maybe extreme control is just bad if you aren’t near-omniscient, which would be good to know.) 

What riding a horse is like

Perhaps a good first question: is there any sort of power that won’t make things worse for you? Surely yes: training a horse to be ridden in the usual sense seems like ‘having control over’ the horse more than you would otherwise, and seems good for you. So what is this kind of control like? 

Well, maybe you want the horse to go to London with you on it, so you get on it and pull the reins to direct it to London. You don’t run into the problems above, because aside from directing its walking toward London, it sticks to its normal patterns of activity pretty closely (for instance, it continues breathing and keeping its body in an upright position and doing walking motions in roughly the direction its head is pointed).

So maybe in general: you want to command the horse by giving it a high level goal (‘take me to London’), then you want it to do the backchaining and fill in all the details (move right leg forward, hop over this log, breathe…). That’s not quite right though, because the horse has no ability to chart a path from here to London, due to its ignorance of maps and maybe of London as a concept. So you are hoping to do the first step of the backchaining—figure out the route—and then to give the horse slightly lower level goals such as ‘turn left here’ and ‘go straight’, and for it to do the rest. Which still amounts to giving it a high level goal, then having it fill in the instrumental subgoals and do them.

But that isn’t quite right. You probably also want to steer the details there somewhat. You are moment-to-moment adjusting the horse’s motion to keep you on it, for instance. Or to avoid scaring some chickens. Or to keep to the side as another horse goes by. While not steering it entirely, at that level. You are relying on its own ability to avoid rocks and holes and to dodge if something flies toward it, and to put some effort into keeping you on it. How does this fit into our simple model? 

Perhaps you want the horse to behave as it would—rather than suddenly leaving every decision to you—but for you to be able to adjust any aspect of it, and have it again work out how to support that change with lower level choices. You push it to the left and it finds new places to put its feet to make that work, and adjusts its breathing and heart rate to make the foot motions work. You pull it to a halt, and it changes its leg muscle tautnesses and heart rate and breathing to make that work. 
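The adjust-and-support picture can be cartooned in code. This is a toy sketch of my own; the class, the adjustable aspects, and the ‘support’ responses are all invented for illustration, not claims about actual horses:

```python
# Toy model of the adjust-and-support picture: the rider adjusts an aspect,
# and the system (horse) fills in the lower-level details itself, if it can.

class Horse:
    def __init__(self):
        # Defaults the horse maintains on its own, with no rider input.
        self.heading = "straight"
        self.gait = "walk"

    def adjust(self, aspect, value):
        """Accept a rider's adjustment and 'support' it with lower-level choices.

        Returns a description of the supporting details, or None if the
        horse has no affordance for that kind of adjustment.
        """
        if aspect in ("heading", "gait"):
            setattr(self, aspect, value)
            return self._support(aspect)
        return None  # e.g. 'leg holes' or 'ride a bicycle': no affordance

    def _support(self, aspect):
        # Lower-level details the horse works out itself.
        if aspect == "heading":
            return f"re-placing feet to turn {self.heading}"
        return f"adjusting breathing and heart rate for a {self.gait}"

horse = Horse()
supported = horse.adjust("heading", "left")    # horse re-solves the footwork
unsupported = horse.adjust("leg_holes", True)  # no affordance for this
```

The point of the cartoon is that control is exercised only at levels where the system has affordances, and everything below the adjusted level is re-solved by the system rather than the rider.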


On this model, in practice your power is limited by what kinds of changes the horse can and will fill in new details for. If you point its head in a new direction, or ask it to sit down, it can probably recalculate its finer motions and support that. Whereas if you decide that it should have holes in its legs, it just doesn’t have an affordance for doing that. And if you make the holes yourself, it will bleed a lot and run into trouble rather than changing its own bloodflow. If you decide it should move via a giant horse-sized bicycle, it probably can’t support that, even if in principle its physiology might allow it. If you hold up one of its legs so its foot is high in the air, it will ‘support’ that change by moving its leg back down again, which is perhaps not what you were going for.

This suggests that taking over a thing is not zero sum. There is not a fixed amount of control to be had by intentional agents. Because perhaps you have all the control that anyone has over a horse, in the sense that if the horse ever has a choice, it will try to support your commands to it. But still it just doesn’t know how to control its own heart rate consciously or ride a giant horse-sized bicycle. Then one day it learns these skills, and can let you adjust more of its actions. You had all the control the whole time, but ‘all’ became more.


One issue with this concept of taking over is that it isn’t clear what it means to ‘support’ a change. Each change has a number of consequences, and some of them are the point while others are undesirable side effects, such that averting them is an integral part of supporting the change. For instance, moving legs faster means using up blood oxygen and also traveling faster. If you gee up the horse, you want it to support this by replacing the missing blood oxygen, but not to jump on a treadmill to offset the faster travel.

For the horse to get this right in general, it seems that it needs to know about your higher level goals. In practice with horses, they are just built so that if they decide to run faster their respiratory system supplies more oxygen and they aren’t struck by a compulsion to get on a treadmill, and if that weren’t true we would look for a different animal to ride. The fact that they always assume one kind of thing is the goal of our intervention is fine, because in practice we do basically always want legs for motion and never for using up oxygen.

Maybe there is a systematic difference between desirable consequences and ones that should be offset—in the examples that I briefly think of, the desirable consequences seem more often to do with relationships with larger scale things, and the ones that need offsetting are to do with internal things, but that isn’t always true (I might travel because I want to be healthier, but I want to be in the same relationship with those who send me mail). If the situation seems to turn inputs into outputs, then the outputs are often the point, though that is also not always true (e.g. a garbage burner seeks to get rid of garbage, not create smoke). Both of these also seem maybe contingent on our world, whereas I’m interested in a general concept. 

Total takeover

I’ll set that aside, and for now define a desirable model of controlling a system as something like: the system behaves as it would, but you can adjust aspects of the system and have it support your adjustment, such that the adjustment forwards your goals. 

There isn’t a clear notion of ‘all the control’, since at any point there will be things that you can’t adjust (e.g. currently the shape of the horse’s mitochondria, for a long time the relationship between space and time in the horse system), either because you or the system don’t have a means of making the adjustment intentionally, or the system can’t support the adjustment usefully. However ‘all of the control that anyone has’ seems more straightforward, at least if we define who is counted in ‘anyone’. (If you can’t control the viral spread, is the virus a someone who has some of the universe’s control?)

I think whether having all of the control at a particular time gets at what I usually mean by having ‘taken over’ depends on what we expect to happen with new avenues of control that appear. If they automatically go to whoever had control, then having all of the control at one time seems like having taken over. If they get distributed more randomly (e.g. the horse learns to ride a bicycle, but keeps that power for itself, or a new agent is created with a power), so that your fraction of control deteriorates over time, that seems less like having taken over. If that is how our world is, I think I want to say that one cannot take it over.


This was a lot of abstract reasoning. I especially welcome correction from someone who feels they have successfully controlled a horse to a non-negligible degree.

Politics is work and work needs breaks

A post written a few years ago, posting now during a time of irrelevance (as far as I know with my limited attention on politics or social media) so as not to be accidentally taking a side in any specific political debates.


Alexandra What has happened is shocking and everyone should oppose it.

Beatrice I’m eating a sandwich. It is better than I expected.

Alexandra I can’t believe you would write about a sandwich at a time like this. Don’t you oppose what happened?

(Claire is excited by the breakfast bread discussion but guesses that contributing a picture of her bagel is pretty inappropriate and looks at SMBC instead.)

People break up their time into work and leisure. You might think of work vs. leisure as roughly ‘doing stuff because other people pay for it’ vs. ‘doing stuff because you want to’. You can also think of it as roughly ‘effortful’ vs. ‘relaxing’. Often these categories align—other people pay you to do effortful things, and the things you want to do are more relaxing. They don’t always align. Helping your aged relative go to the doctor might be effortful but a thing you do because you want to, or your job might be so easy that you come home with a bunch of energy for challenging tasks.

I’m going to call these ‘resting-’ vs. ‘challenged-’ and ‘-boss’ vs. ‘-employee’. So entirely free time is mostly resting-boss, and paid work is usually challenged-employee. But you can also get resting-employee and challenged-boss activities. This all maybe relies on some models of rest and attention and effort and such that don’t work out, but they seem at least close to the models that most people practically rely on. For whatever reason, most people prefer to spend a decent fraction of their time doing non-effortful things, unless something terrible will happen soon if they don’t.

People mostly use social media as leisure, both in the sense that nobody would intentionally pay them for it, and in the sense that it is not effortful. When important political things are happening, social media naturally turns to discussion of them. Which, if you are lucky, might be thoughtful analysis of world affairs, with a bunch of statistics and rethinking your assumptions and learning new facts about the world. Which is all great, but it is not leisure in the ‘not effort’ sense. When I need a break from researching the likely social consequences of artificial intelligence, moving to researching the likely social consequences of changing identity politics in America does not hit the spot as well as you might hope. I assume something similar is true of many people.

When there are opportunities to move a lot of leisure time from resting-boss idle chat to challenged-boss political discussions, people can be appalled when others do not redirect their use of social media to talking about the problem. They are thinking, ‘when you are doing what you want, you should be wanting this! If you would really spend your limited time on pictures of animals that look like pastry when you can help to stop this travesty, you are terrible!’

However this means moving time that was in ‘relaxing’ to ‘effortful’, which as far as I can tell is not super sustainable. In the sense that people usually need to spend some amount of time relaxing to be happy and able to do effortful things at other times. Redistributing all of the relaxing time to effortful time makes sense when there is a very immediate threat—for instance, your house is on fire, or you have a deadline this week that will cause you to lose your job if you don’t dedicate all of your time to it. However if you have a problem on the scale of months’ or years’ worth of effort, I think most people would favor making that effort as a sustainable trek, with breaks and lunch and jokes. For instance, if you are trying to get tenure in a few years, many would predict that you are less likely to succeed if you now attempt to ban all leisure from your life and work all of the time.

When there are political events that seem to some people to warrant talking about all of the time, and some people who really don’t want to, I think this implies less of a difference in concern about the problem than you might expect. The disagreeing parties could also be framing work and leisure differently, or disagreeing over how short-lived the important problem is, or over when the high leverage times for responding to it are.

Skill and leverage

Sometimes I hear people say ‘how can I make a big difference to the world, when I can’t make a big difference to that pile of dishes in my sock drawer?’ or ‘How can I improve the sustainability of world energy usage when I can’t improve the sustainability of my own Minecraft usage?’ The basic thought is that if you can’t do ‘easy’ things that humans are meant to be able to do, on the scale of your own life, you probably lack general stuff-doing ability, and are not at the level where you can do something a million times more important.

I think this is a generally wrong model, for two reasons. One is that the difficulty of actions is not that clearly well ordered—if you have a hard time keeping your room tidy, this just doesn’t say that much about whether you can write well or design rockets or play the piano.

The second reason is that the difficulty of actions doesn’t generally scale with their consequences. I think this is more unintuitive.

Some examples:

  1. Applying for funding for a promising new anti-cancer drug is probably about as hard as applying for funding for an investigation into medieval references to toilet paper (and success is probably easier), but the former is much more valuable.  
  2. Having a good relationship with your ex Bob might be about as hard and take about the same skills as having a good relationship with your more recent ex Trevor, but if you have children with Trevor, the upside of that effort may be a lot higher.
  3. If you have a hard time making a speech at your brother’s birthday, you will probably also have a hard time making a speech to the UN. But, supposing it is fifty thousand times more important, it isn’t going to be fifty thousand times harder. It’s not even clear that it is going to be harder at all—it probably depends on the topic and your relationship with your family and the UN.
  4. Writing a good book about x-risk is not obviously much harder than writing a good book about the role of leprechauns through the ages, but is vastly more consequential in expectation.

My basic model is that you can have skills that let you do particular physical transformations (an empty file into a book, some ingredients into a cake), and there are different places you can do those tricks, and some of the places are just much higher leveraged than others. Yet the difficulty is mostly related to the skill or trick. If you are trying to start a fire, holding the burning match against the newspapers under the logs is so much better than holding it in the air nearby or on the ground or at the top of the logs, and this doesn’t involve the match being better or worse in any way.
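To put invented numbers on this model (nothing here is a real estimate; the figures are only to show its shape): the effort attaches to the skill, while the value attaches to where it is deployed.

```python
# Illustrative numbers only: the same 'trick' costs the same effort everywhere,
# but value per unit effort depends on where the trick is deployed.
effort_hours = 2000  # made-up cost of writing a good book, on either topic

value_of_result = {        # made-up value of each placement, arbitrary units
    "x-risk book": 1_000_000,
    "leprechaun history": 1_000,
}

# Leverage of each placement: value produced per hour of the same skill.
leverage = {topic: v / effort_hours for topic, v in value_of_result.items()}
# Same difficulty, ~1000x difference in value per hour.
```

On these made-up numbers the match is equally hard to strike in both places; only where you hold it differs.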

In sum, there isn’t a clear ladder of actions a person can progress through, with easy unimportant ones at the bottom and hard important ones at the top. There will be hard-for-you unimportant actions, and easy-for-you important actions. The last thing you should do if you come across a hard-for-you unimportant action is stop looking for other things to do. If you are bad at keeping your room clean and room cleanliness isn’t crucial to your wellbeing, then maybe look for the minimum version of cleanliness that lets you live happily, and as quickly as possible get to finding things that are easier for you, and places to deploy them that are worthwhile.

But exactly how complex and fragile?

This is a post about my own confusions. It seems likely that other people have discussed these issues at length somewhere, and that I am not up with current thoughts on them, because I don’t keep good track of even everything great that everyone writes. I welcome anyone kindly directing me to the most relevant things, or if such things are sufficiently well thought through that people can at this point just correct me in a small number of sentences, I’d appreciate that even more.


The traditional argument for AI alignment being hard is that human value is ‘complex’ and ‘fragile’. That is, it is hard to write down what kind of future we want, and if we get it even a little bit wrong, most futures that fit our description will be worthless. 

The illustrations I have seen of this involve a person trying to write a description of value, conceptual-analysis style, and failing to put in things like ‘boredom’ or ‘consciousness’, and so getting a universe that is highly repetitive, or unconscious.

I’m not yet convinced that this is world-destroyingly hard. 

Firstly, it seems like you could do better than imagined in these hypotheticals:

  1. These thoughts are from a while ago. If instead you used ML to learn what ‘human flourishing’ looked like in a bunch of scenarios, I expect you would get something much closer than if you tried to specify it manually. Compare manually specifying what a face looks like and then generating examples from your description, to using modern ML to learn what faces look like and generate them.
  2. Even in the manual-description case, if you had, say, a hundred people spend a hundred years writing a very detailed description of what is good, instead of a writer spending an hour imagining ways that a more ignorant person might mess up if they spent no time on it, I could imagine it actually being pretty close. I don’t have a good sense of how far away it is.

I agree that neither of these would likely get you to exactly human values.

But secondly, I’m not sure about the fragility argument: that if there is basically any distance between your description and what is truly good, you will lose everything. 

This seems to be a) based on a few examples of discrepancies between written-down values and real values where the written down values entirely exclude something, and b) assuming that there is a fast takeoff so that the relevant AI has its values forever, and takes over the world.

My guess is that values learned using ML but still somewhat off from human values are much closer, in terms of not destroying all the value in the universe, than ones that a person tries to write down. Like, the kinds of errors people have used to illustrate this problem (forgetting to put in ‘consciousness is good’) are like forgetting to say that faces have nostrils when trying to specify what a face is like, whereas a modern ML system’s imperfect impression of a face seems more likely to meet my standards for ‘very facelike’ (most of the time).

Perhaps a bigger thing for me though is the issue of whether an AI takes over the world suddenly. I agree that if that happens, lack of perfect alignment is a big problem, though not obviously an all value nullifying one (see above). But if it doesn’t abruptly take over the world, and merely becomes a large part of the world’s systems, with ongoing ability for us to modify it and modify its roles in things and make new AI systems, then the question seems to be how forcefully the non-alignment is pushing us away from good futures relative to how forcefully we can correct this. And in the longer run, how well we can correct it in a deep way before AI does come to be in control of most decisions. So something like the speed of correction vs. the speed of AI influence growing.
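That race between correction and growing influence can be caricatured in a toy simulation. This is entirely my own construction; every parameter is made up, and it estimates nothing about real AI trajectories. It only illustrates how the outcome flips depending on which speed dominates.

```python
def drift_after(steps, influence_growth, correction_rate, drift_per_influence=0.01):
    """Toy model: accumulated 'drift' away from good futures.

    Each step, misalignment adds drift in proportion to current AI influence,
    correction removes a fixed fraction of the drift, and influence grows
    multiplicatively. All parameters are invented for illustration.
    """
    influence, drift = 1.0, 0.0
    for _ in range(steps):
        drift += drift_per_influence * influence  # misalignment pushes us away
        drift *= 1 - correction_rate              # correction pushes back
        influence *= influence_growth             # AI's role in the world grows
    return drift

# If correction outpaces influence growth, drift stays bounded; otherwise it compounds.
slow_growth = drift_after(100, influence_growth=1.01, correction_rate=0.05)
fast_growth = drift_after(100, influence_growth=1.10, correction_rate=0.05)
```

Under these made-up numbers, drift stays small in the slow-growth regime and compounds in the fast-growth one; the essay’s question is which regime we are in, which a toy model cannot answer.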

These are empirical questions about the scales of different effects, rather than questions about whether a thing is analytically perfect. And I haven’t seen much analysis of them. To my own quick judgment, it’s not obvious to me that they look bad.

For one thing, these dynamics are already in place: the world is full of agents and more basic optimizing processes that are not aligned with broad human values—most individuals to a small degree, some strange individuals to a large degree, corporations, competitions, the dynamics of political processes. It is also full of forces for aligning them individually and stopping the whole show from running off the rails: law, social pressures, adjustment processes for the implicit rules of both of these, individual crusades. The adjustment processes themselves are not necessarily perfectly aligned; they are just overall forces for redirecting toward alignment. And in fairness, this is already pretty alarming. It’s not obvious to me that imperfectly aligned AI is likely to be worse than the currently misaligned processes, or even that it won’t be a net boon for the side of alignment.

So then the largest remaining worry is that it will still gain power fast, and correction processes will be slow enough that its somewhat misaligned values will be set in forever. But it isn’t obvious to me that by that point it won’t be sufficiently well aligned that we would recognize its future as a wondrous utopia, just not the very best wondrous utopia that we would have imagined if we had really carefully sat down and imagined utopias for thousands of years. This again seems like an empirical question of the scale of different effects, unless there is an argument that some effect will be totally overwhelming.