One might expect that a simple estimate would be equally likely to overestimate or underestimate the true value of interest. For instance, a back-of-the-envelope calculation of the number of pet shops in New York City seems as likely to be too high as too low.
Apparently this doesn’t work for do-gooding. The more you look into an intervention, the worse it gets. At least generally. At least in GiveWell’s experience, and my imagination. I did think it fit my own (brief) experience evaluating charities, but after listing the considerations that went into my more detailed calculation of Cool Earth’s cost-effectiveness, more are positive than negative (see the list at the end). The net effect of these complications was still negative for Cool Earth, and I still feel like the other charities also suffered many negative complications. However, I don’t trust my intuition that this is obviously and strongly true. More information welcome.
In this post I’ll assume charitable interventions do consistently look worse as you evaluate them more thoroughly, and concern myself with the question of why. Below are a number of attempted explanations for this phenomenon that I and various friends can think of.
Regression to the altruistic intervention mean
Since we are looking for the very best charities, we start with ones that look best. And like anything that looks best, these charities will tend to be less good than they look. This is an explanation Jonah mentions.
In this regression to the mean story, which mean is being regressed to? If it is the mean for charities, then ineffective charities should look better on closer inspection. I find this hard to believe. I suspect that even if a casual analysis suggests $1500 will give someone a one week introduction to food gardening, which they hope will reduce their carbon footprint by 0.1 tonnes per year, the real result of such spending will be much less than the tonne per $1000 implied by a simple calculation. The participant’s time will be lost in attending and gardening, the participants probably won’t follow up by making a garden, they probably won’t keep it up for long, or produce much food from it, and so on. There will also be some friendships formed, some leisure and mental health from any gardening that ultimately happens, some averted trips to the grocery store. My guess is that these positive factors don’t make up for the negative ones any better than they do for more apparently effective charities.
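The statistical claim at issue here – things selected for looking best regress down, and things that look worst should regress up – can be checked with a toy simulation. All numbers below are invented for illustration: each “charity” gets a true value, and a casual evaluation adds independent noise.

```python
import random

random.seed(0)
n = 100_000
# Each "charity" has a true value; a casual evaluation adds noise.
true_vals = [random.gauss(0, 1) for _ in range(n)]
estimates = [t + random.gauss(0, 1) for t in true_vals]

ranked = sorted(zip(estimates, true_vals), reverse=True)
top = ranked[:1000]      # the charities that *look* best
bottom = ranked[-1000:]  # the charities that look worst

mean = lambda pairs, i: sum(p[i] for p in pairs) / len(pairs)
top_est, top_true = mean(top, 0), mean(top, 1)
bot_est, bot_true = mean(bottom, 0), mean(bottom, 1)
# top_true < top_est: the best-looking charities disappoint on closer inspection
# bot_true > bot_est: the worst-looking ones regress upward, toward the mean
```

Under these assumptions the regression is symmetric: the best-looking thousand have true values well below their estimates, and the worst-looking thousand have true values above theirs, which is exactly the prediction being doubted in the text.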
Regression to the possible action mean + value is fragile
Instead perhaps the mean you should regress to is not that of existing charities, but rather that of possible charities, or – similarly – possible actions. This would suggest all apparently positive value charities are worse than they look – the average possible action is probably neutral or negative. There are a lot of actions that just involve swinging your arms around or putting rocks in your ears for instance. Good outcomes are a relatively small fraction of possible outcomes, and similarly, good plans are probably a relatively small fraction of possible plans.
Charities advertise
The initial calculation of a charity’s cost-effectiveness usually uses figures that the charity itself provided. This information can be expected to be selected for looking optimistic, whether through outright dishonesty or through choosing among the many possible pieces of data they could have told you about.
This seems plausible, but I find it hard to believe as the main effect. For one thing, in many of these cases it would be hard for the charity to be very selective – there are some fairly obvious metrics to measure, and they probably don’t have that many measures of them. For instance, for a tree planting charity, it is very natural for them to tell us how many trees they have planted, and natural for us to look at the overall money they have spent. They could have selected a particularly favorable operationalization of how many trees they planted, but there still doesn’t seem to be much wiggle room, and such selection wouldn’t show up (and so produce a difference between the early calculation and later ones) unless one did a very in-depth investigation.
Common relationships tend to make things worse with any change
Another reason I doubt the advertising explanation is that a very similar thing seems to happen with personal plans, which appear to be less geared toward advertising, at least on relatively non-cynical accounts. That is, if I consider the intervention of catching the bus to work, and I estimate that the bus takes ten minutes and comes every five minutes, and it takes me three minutes to walk at each end, then I might think it will take me 16-21 minutes to get to work. In reality, it will often take longer, and almost never take less time.
I don’t think this is because I go out of my way to describe the process favorably, but rather because almost every change that can be made from the basic setup – where I interact with the bus as planned and nothing else happens – makes things slower rather than faster. If the bus comes late, I will be late. If the bus comes more than a few minutes early, I will miss it and also be late. Things can get in the way of the bus and slow it down arbitrarily, but it is hard for them to get out of the way of the bus and speed it up so much. I can lose my ticket, or not have the correct change, but I can’t benefit much from finding another ticket, or having more change than I need.
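This asymmetry can be illustrated with a toy simulation of the bus example. The wait and delay distributions are my invention – the point is only that the waiting time is bounded below by zero while disruptions can add arbitrary delay:

```python
import random

random.seed(1)

def commute_minutes():
    walk = 3 + 3                        # three minutes walking at each end
    wait = random.uniform(0, 5)         # bus comes every five minutes
    delay = max(0, random.gauss(0, 3))  # disruptions only ever slow the bus
    return walk + wait + 10 + delay     # ten-minute ride

times = [commute_minutes() for _ in range(100_000)]
# The trip can never beat the best planned case (3 + 0 + 10 + 3 = 16 minutes),
# but the one-sided delay term means it regularly exceeds the worst planned case.
```

The simulated average comes out above the midpoint of the planned range, because every deviation from the plan points the same way.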
These kinds of relationships between factors that can change in the world and the things we want are common. Often a goal requires a few inputs to come together, such that having extra of an input doesn’t help if you don’t have extra of the others, yet having less of one wastes the others. Having an extra egg doesn’t help me make more cake, while missing an egg disproportionately shrinks the cake. Often two things need to meet, so moving either of them in any direction makes things worse. If I need the eggs and the flour to meet in the bowl, pouring either of them anywhere different will cause destruction.
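The egg-and-cake relationship above is a min() over complementary inputs. A minimal sketch, with recipe numbers invented:

```python
# Complementary inputs: output is limited by the scarcest input.
def cakes(eggs, cups_of_flour):
    # hypothetical recipe: 2 eggs and 3 cups of flour per cake
    return min(eggs // 2, cups_of_flour // 3)

assert cakes(2, 3) == 1
assert cakes(3, 3) == 1  # an extra egg makes no more cake
assert cakes(1, 3) == 0  # a missing egg wastes the flour
```

Having more of one input never helps, while having less of any input always hurts – so random changes to the inputs can only make things worse.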
This could be classed under value being fragile, but I think it is worth pointing out the more specific forms this takes. In the case of charities, this might apply because you need a number of people to gather in the place where a vaccination clinic has been set up, at the same time as some nurses and a large number of specific items.
This might explain why things turn out worse in practice than they did in plans, but I’m not sure that is the only thing to be explained. It seems that if you look at plans in more depth (without seeing the messy real-world instantiation), you also become more pessimistic. This might just be because you remember to account for the things that might go wrong in the real world. But looking at the Cool Earth case, these kinds of effects don’t seem to account for any of the negative considerations.
Another common kind of relationship between things in the real world is the one where if you change a thing, it produces a force which pushes it back the way it came. For instance, if you donate blankets to the poor, they will acquire fewer blankets in other ways, so in total they will not have as many more blankets as you gave them. Or if you become a vegetarian, the price of meat will go down a little, and someone else will eat more meat. This does account for a few of the factors in the Cool Earth case, so that’s promising. For instance, protecting trees changes the price of wood, and removing carbon from the atmosphere lowers the rate at which other processes remove carbon from the atmosphere.
Abstraction tends to cause overestimation
Another kind of negative consideration in the Cool Earth case is that saving trees really only means saving them from being logged with about 30% probability. Similarly, it only means saving them for some number of years, not indefinitely. I think of these as instances of a general phenomenon where a thing gets labeled as something which makes up a large fraction of it, and then reasoned about as if it entirely consists of that thing. And since the other things that really make it up don’t serve the same purpose in the reasoning, estimates tend to be wrong in the direction of the thing being smaller. For instance, if I intend to walk up a hill, I might conceptualize this as involving entirely walking up the hill, and so make a time estimate from that. Whereas in fact, it will involve some amount of pausing, zigzagging, and climbing over things, which do not have the same quality of moving me up the hill at 3mph. Similarly, an hour’s work often contains some non-work, and a 300 pound cow contains some things other than steaks.
But, you may ask, shouldn’t the costs be underestimated too? And in that case, the cost-effectiveness should come out the same. That does seem plausible. One thought is that prices are often known a lot better than the value of whatever they are prices for, so there is perhaps less room for error. e.g. you can see how much it costs to buy a cow, thanks to the cow market, but it’s less obvious what you will get out of it. This seems a bit ad hoc, but on the other hand some thought experiments suggest to me that something like this is often going on when things turn out worse than hoped.
Plans lose value with any change
If you have a plan, then that has some value. If things change at all, your plan gets less useful, because it doesn’t apply so well. Thus you should expect to consistently lose value when reality diverges from your expectations in any direction, which it always does. Again, this mostly seeks to explain why things turn out worse in reality than expected, but that could explain some details making plans look worse.
Careful evaluators attend to negatives
Suppose you are evaluating a charity, and you realize there is a speculative reason to suspect that the charity is less good than you thought. It’s hard to tell how likely it is, but you feel like it roughly halves the value. I expect you take this into account, though it may be a struggle to do so well. On the other hand, if you think of a speculative positive consideration that feels like it doubles the value of the charity, but is hard to put numbers on, it is more acceptable to ignore it. A robust, conservative estimate is often preferred to a harder-to-justify, more subjective, but more accurate one – especially in situations where you are evaluating things for others, and trying to be transparent.
This situation may arise in part because people expect most things to be worse than they appear – overestimating value seems like a greater risk than underestimating it does.
Construal level theory
We apparently tend to think of valuable things (such as our goals) as more abstract than bad things (such as impediments to those goals). At least this seems plausible, and I vaguely remember reading it in a psychology paper or two, though I can’t find any now. If so, then when we do very simple calculations, one might expect them to focus on abstract features of the issue, and so disproportionately on the positive features. This seems like a fairly incomplete explanation, as I’m not sure why good things would naturally seem more abstract. I also find it hard to cash out in any concrete cases – it’s hard to run a plausible calculation of the value of giving to Cool Earth that focuses on abstracted bad things, other than the costs, which are already included.
Appendix: some examples of more or less simple calculations
A basic calculation of Cool Earth’s cost to reduce a tonne of CO2:

825,919 pounds in 2012 x 6 years / (352,000 acres x 260 tonnes/acre) = 6 pence/tonne = 10 cents/tonne
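As a check on that arithmetic (the exchange rate is my assumption, roughly $1.6 per pound around 2012):

```python
spend_gbp = 825_919 * 6                    # pounds per year x 6 years
tonnes_co2 = 352_000 * 260                 # acres x tonnes CO2 per acre
pence_per_tonne = spend_gbp / tonnes_co2 * 100
cents_per_tonne = pence_per_tonne * 1.6    # assumed ~$1.6/GBP exchange rate
# comes out a little over 5 pence (~9 cents) per tonne,
# consistent with the rounded figures above
```
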
If we take into account:
- McKinsey suggests approach is relatively cost-effective (positive/neutral)
- Academic research suggests community-led conservation is effective (positive/neutral)
- there are plausible stories about market failures (positive/neutral)
- The CO2 emitted by above ground sources is an underestimate (positive)
- The 260 tonnes/acre figure comes from one area, but the other areas may differ (neutral/positive)
- The projects ‘shield’ other areas, which are also not logged as a result (positive)
- Cool Earth’s activities might produce other costs or benefits (neutral/positive)
- Upcoming projects will cost a different amount to those we have looked at (neutral/positive)
- When forestry is averted, those who would have felled it will do something else (neutral/negative)
- When forestry is averted, the price of wood rises, producing more forestry (negative)
- Forests are only cleared 30% less than they would be (negative)
- The cost they claim for protecting one acre is higher than that inferred from dividing their operating costs by what they have protected (negative)
- The forest may be felled later (negative)
- Similarity of past and future work (neutral)
- other effects on CO2 from averting forestry (neutral)
- CO2 sequestered in wood in long run (neutral)
- CO2 sequestered in other uses of cleared land (neutral)
…we get $1.34/tonne. However that was 8 positive (or positive/neutral), 4 completely neutral, and only 5 negative (or negative/neutral) modifications.
In case you wonder, the considerations labeled ‘…/neutral’ made no difference to the calculation, but appear to be, if anything, somewhat in the direction of their non-neutral label. Some of these were not small, but were only considered as added support for parts of the story, so they improved our confidence without changing the estimate (yeah, I didn’t do the ‘update your prior on each piece of information’ thing, but more or less just multiplied the best numbers I could find or make up together).
Examples of estimates which aren’t goodness-related:
If I try to estimate the size of a mountain, will it seem smaller as I learn more? (yes)
Simple estimate: let’s say it takes about 10h to climb, and I think I can walk uphill at about 2mph => 20 mile walk. Let’s say I think the angle is around 10 degrees. Then sin(10°) = height/20, so height = 20 x sin(10°) ≈ 3.5 miles
- the side is bumpy, so 20 miles of walking covers less than 20 miles of straight slope => the mountain is shorter
- the path up the mountain is not straight – it goes around trees and rocks and things, so my 20 miles of walking covers even less slope => the mountain is shorter
- when I measured that I could walk at about 2mph, I was probably walking the whole time, whereas going up the mountain I probably stop sometimes to look at things, or due to confusion about the path, or whatever, so probably I can’t average 2mph up the mountain => the walk is probably less than 20 miles => the mountain is shorter
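The simple estimate above can be reproduced directly, using the same assumed numbers:

```python
import math

hours = 10       # assumed climbing time
speed_mph = 2    # assumed uphill walking speed
angle_deg = 10   # assumed slope angle

walk_miles = speed_mph * hours  # 20 mile walk
height = walk_miles * math.sin(math.radians(angle_deg))
# height comes out around 3.5 miles; every correction listed above shrinks it,
# because each one reduces either the effective distance or the effective speed
```
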
If I try to estimate the temperature, will it seem lower as I learn more? (neutral)
Simple estimate: I look at the thermometer, and it says a number. I think that’s the temperature.
- the thermometer could be in the shade or the sun or the wind (neutral)
- the thermometer is probably a bit delayed, which could go either way (neutral)
- I may misread the thermometer a bit depending on whether my eye is even with it – this could go either way (neutral)
> In this regression to the mean story, which mean is being regressed to? If it is the mean for charities, then ineffective charities should look better on closer inspection. I find this hard to believe.
Why? The basic logic is sound. Regression to the mean is not exclusively for high-scoring things. The top 10% of SAT scorers will regress down toward their mean, but the bottom 10% will regress upward, by an amount based on the reliability of the test itself. What factors would selectively make regression upwards impossible?
> I suspect that even if a casual analysis suggests $1500 will give someone a one week introduction to food gardening, which they hope will reduce their carbon footprint by 0.1 tonnes per year, the real result of such spending will be much less than the tonne per $1000 implied by a simple calculation. The participant’s time will be lost in attending and gardening, the participants probably won’t follow up by making a garden, they probably won’t keep it up for long, or produce much food from it, and so on. There will also be some friendships formed, some leisure and mental health from any gardening that ultimately happens, some averted trips to the grocery store.
This example is largely irrelevant. When Givewell looks at charities, it is not looking at the median charity out of the literally hundreds of thousands or millions available; it is looking at charities which have been suggested by fans to possibly be the *top* charities in the world, in the tippy-top .001% percentile or whatever. It is not surprising that with unreliable information sources, Givewell’s evaluations tend to be lower. Is food gardening literally possibly the *worst charity in the world*, in the bottommost-bottom percentile? I rather doubt it is. I could go into Guidestar and easily find a dozen charities I’d consider worse than that (sweaters for baby seals, sending dying kids on very expensive trips to Disney World, that sort of thing) without even getting into charities embroiled in scandal.
Givewell doesn’t even bother to look at them because there’s no way such could ever be a top charity, so naturally, they never observe regression upwards, and anyone ignoring the selection effect will be misled.
You would have to look at what are reputed to be the worst charities around in order to construct an analogous charity example.
To give a real example rather than a fictional one, regression would predict that sometimes if you look closely at what you think is a terrible charity, it won’t actually be as bad as you think it is. And as it turns out, I’ve done just that; some time ago, I thought I’d look into Girl Scouts of USA, expecting to find terrible inefficiency and waste in their filings: http://www.gwern.net/Girl%20Scouts%20and%20good%20governance And… I found GSUSA was bad but not *quite* as terrible as I thought. In other words, “the more I look into an intervention, the better it gets” – when I look at things I expect to be downright awful!
> My guess is that these positive factors don’t make up for the negative ones any better than they do for more apparently effective charities.
Why? Again, what’s the actual mechanism whereby bad estimates are 100% reliable and there is no regression upwards to the mean of all charities but good estimates have reliabilities <1 and so regress down? All of your subsequent examples and mechanisms seem like they would apply equally to good and bad charities.
Charities are thought better than they are; evils are thought worse. I think that’s the contrast determining the regression, and only the first could be explained by statistical regression.
I’m not sure why you didn’t offer what seems to me the most straightforward explanation: with scant information, the construal level will be abstract, minimizing what is incidental, whether the drawbacks of charity or the silver lining in evils.
I don’t know what I was thinking here, but regression can explain both.
Thanks for reminding me about construal level theory. Added.
On the tree planting charity example, my mind immediately leapt to overattribution. Maybe they made a $100k grant towards a project that used $1M to plant trees, or spent $50k advocating for a $50m public investment in protecting forests, or counted net trees saved by their decision to use recycled paper. Charities have little incentive to underattribute benefits to themselves.