
When will AI surpass us at being limited?

Crossposted from world spirit sock puppet.

It’s not always better to be more capable. As I mentioned yesterday, it can (famously) be helpful in negotiations to have your hands tied. That is, to be disempowered from giving up everything the other party wants.

I had previously thought of this as a somewhat rare corner case of human behavior (I for one don’t haggle very often), but I now think negotiations where this is an element are quite common: yesterday I described it in friendly (and honest) negotiations about how to spend time, for instance. And I also see a related thing in the practice of dietary commitments.

But is being less capable helpful outside of negotiating? And is this going to become AI related?

Yes and yes!

Commitments: more good things come to those who can commit (e.g. rides out of deserts, secrets, trust, love). ‘Committing’ generally involves cutting off certain options to yourself, whether in practical terms or via you being the kind of honorable person who can’t bear to do a thing they promised not to do. These are both kinds of limitations. If you were a more powerful creature, who was fully capable of breaking down any barrier, and fully capable of breaking a promise—a creature to whom all options were always open—then commitments would be less available to you.

Transparency: a big way humans know what is going on inside other humans, well enough to trust them, is that there is a connection between what is happening inside them and what is happening on their faces and in their bodies, and they usually can’t control this very well. People who can break this connection and control their external behavior independently tend to be feared and distrusted. It is valuable to be unable to stop these signals escaping.

Consistency: a big way we predict how a specific human will behave in the future is that each human has specific kinds of behavior that come easily to them, and it is hard for them to behave entirely differently. So if you are friends with someone who you have observed be attentive and kind to other people for five years, it is very likely that they continue behaving in that way going forward. Whereas a creature with more freedom of behavior could wholly inhabit that persona for five years, then change to a different one.

Relatedly, we know a lot about what to expect from a human stranger because of our prior knowledge of humans. If humans had the power to rewrite their internal dynamics and become totally different creatures, then we would know much less about what to expect from one.

Scope of risk: people are safer to interact with if you know they are limited in their ability to cause destruction. You might prefer to hire a person who you think would be less able to wrest control of your organization if they wanted to. You might prefer to babysit a child who does not know how to pick locks or set fires. So a person might be more employable, or be taken care of by better babysitters, if they are less capable. Similarly, an extremely capable AI system might be a less desirable accountant than a human, if you can only fully trust the human to not be up to the task of hacking your accounts.

These are all to do with interacting with other creatures. For a creature alone in the universe, I don’t know of any situation where they are better off being less capable. But when you need to trust another creature, it is better to know more about them, and better to know they are cut off from options that might harm you.

In the usual picture of AI progress, AI is worse than humans at various tasks, and we are waiting for it to surpass us everywhere, at which point humans will be obsolete as labor. But in a world where AI needs to interact with other agents (humans or AIs) the aforementioned value of being less capable complicates things: perhaps there are skills where AI is already more capable than humans, but where that capability is a liability. For instance, lying smoothly and otherwise generating outward behavior that isn’t revealing about internal dynamics, switching between entirely different personas, and hacking skills. Given that, what does the trajectory look like?

Vibe signaling externalities and the people-to-places pipeline

Crossposted from world spirit sock puppet.

People are sending signals all the time, and those signals are to my knowledge usually about themselves: they are smart, or kind, or attractive, or not naive, or have their shit together, or care about Palestine, or care about you, or are friendly, or artsy, or professional, or relatively in the know about the cultural currents of TikTok or DC.

People are also taking in signals all the time, and these signals are often about other people, and often even closely related to the signals being intentionally sent: Alice is trying to seem friendly, and Bob perceives her as friendly. But also a lot of signals people take in are about places. People read places as safe or dangerous, lighthearted or depressing, silly or serious, asking them to know more, or get more power, or do more. Suggesting they laugh drunkenly under the moonlight, or get up at 5 and pray. Encouraging submission or rebellion.

These signals that make the world feel one way or another make a big difference to people. They make one neighborhood nice to live in and another feel off, one workplace energizing and another deflating. But they are—to my knowledge—almost entirely unintentional side effects of the ways people behave for other reasons. People don’t dress nicely to collaborate in making you feel like you are in a thriving part of town. They dress nicely to make someone think something about them. And someone probably does, but then the signal is left there for everyone else to sweep into their average perception of the vibe in this part of town.

A lot of ways people behave that affect the vibe are probably not intended as signaling at all—for instance, perhaps I grow roses in my front garden because I love roses, and it nonetheless affects people’s read of the vibe. Or perhaps I keep piles of scrap metal there because I want them for something, and that has a different effect.

But an interesting dynamic to me is that a lot of efforts are going into sending signals about people, and those signals are being read as messages about places. Because places can’t send their own signals, but vibes are a very big part of how people experience places, and place vibes are heavily influenced by people’s attempts to paint themselves as one thing or another.

People try to look not-to-be-messed-with and strangers read the street as dangerous. People try to look generative and strangers read the neighborhood as wealthy enough to have time for this. People try to look rich and people read the area as safe. People try to look beautiful and people read the scene as shallow. People try to look smart, and people read the office as unwelcoming.

In sum I posit that there are massive externalities in vibes, and especially in the vibes of places, and there is a particular path of causality from signaling about people to unintentional signals about places.

(I’m not very confident about all this—I was just thinking about it this evening, arriving in and mildly exploring New York City. I think there’s a lot to be said about organizations’ roles in this that I haven’t gone into—for instance in a bar or restaurant or stand up comedy club, people are trying directly to make you experience a vibe. These are small places where the vibe of the place has been mostly internalized—someone owns it.)

AI risk was not invented by AI CEOs to hype their companies

Crossposted from world spirit sock puppet.

I hear that many people believe that the idea of advanced AI threatening human existence was invented by AI CEOs to hype their products. I’ve even been condescendingly informed of this, as if I am the one at risk of naively accepting AI companies’ preferred narratives.

If you are reading this, you are probably familiar enough with the decades-old AI safety community to know this isn’t true. But I don’t have a good direct way to reach the people who could use this information, and still I hate to leave such a falsehood uncontested. So if this is obvious, I hope the post is still perhaps useful to point more distant and confused people toward.

~

I personally know that AI risk was not invented by the tech CEOs because I have been near the middle of it since at least 2009—before any of the prominent AI companies existed, let alone had CEOs who might be trying to hype their products.

Here are some miscellaneous events over the years to give you a sense of the implausibility of this:

  • 2008 – I attempt to contact Eliezer Yudkowsky to inform him that I am ‘trying to figure out the optimal way to use my life’ and would like to hear a better account of why his plan (of worrying about AI risk) is good. I have read about it online, but would like a clearer account. Later, traveling the world shortly after undergrad, I meet a handful of people in person in the Bay Area who care about this, and one argues strongly that I should prioritize AI risk over my previously preferred causes, e.g. climate change. I decide to think about this.
  • 2009 – I am still not very convinced that AI is the most important thing to work on, but go to stay with the people who are worried about it for a few months. I argue about it a lot with a handful of them. There seem to be about twenty of them locally in the South Bay, though many more who comment on the relevant blogs. My photography collection from this era is quite sparse.


    I go to The Singularity Summit for my first time (and its fourth), which is very lively and full of people who are thinking seriously about the future of AI.

  • 2010 – DeepMind is founded. (I am back at school.)
  • 2011 – I start a philosophy PhD at CMU, hoping to be eligible to work at somewhere like the Future of Humanity Institute one day, which is a happening hub of discussion about existential risk, AI and other important issues, that I like to visit.
  • 2012 – I visit the Bay more and hang out with the growing AI risk community there. I visit the UK and do the same. I go to the AGI 2012 Winter Intelligence Conference.
  • 2013 – I move to Berkeley and work at MIRI for a semester during grad school. I measure algorithmic progress over time across various computer science domains, as input to expectations for artificial intelligence in future. I visit the UK and attend the Center for Effective Altruism’s ‘weekend away’ where we have a debate on which cause is best, between global poverty, animal welfare and extinction risk. Extinction risk wins: the crowd leaves having changed their mind in that direction on net. [Photo: the three advocates just before or after.]

  • 2014 – I join MIRI properly. I research The Asilomar Conference and Leó Szilárd as evidence about whether it is worth people trying to deal with risks early, because people around mostly believe that the risks from AI are at least a decade away, and there is disagreement about whether that makes it futile. I run an online reading group about Superintelligence, a new book about AI risk. I co-found AI Impacts, a project to answer questions about the future of AI, because AI risk seems at least fairly plausibly the most important thing to work on, and I want to investigate more and share my thinking with others.
  • 2015 – I attend the first FLI conference—it seems that more people and more prominent people are interested in AI safety! OpenAI is founded.

  • 2016 – I lead a team to run the first Expert Survey on Progress in AI. The median probability given to an outcome of advanced AI that is “Extremely Bad (e.g., human extinction)” is already 5%.
  • 2017 – Some people around me are getting very worried, and saying AGI will happen within several years. My survey gets a shocking amount of media attention, becoming the ‘16th most discussed paper’ in 2017 according to Altmetric. Apparently there is interest in this topic.
  • 2018 – I go to a big workshop for people working on AI risk in the English countryside, and a Chilean summit where I talk on TV and the radio about AI risk. It feels like interest is still picking up, and I feel optimistic about talking to the public.

  • 2019 – GPT-2 comes out. Someone tries to get it to name our house. My favorite names include things like “World peace: tigers and humans” and “rooftop hillside: the highest place in the world”. It is hilarious and useless, but also magical and wild. The things we have worried about for years are feeling more tangible, and people’s ‘AI timelines’ are shrinking.
  • 2020 – The world is reminded that really crazy things can happen. AI Impacts becomes remote. I spend the year with my household, who are almost all working on AI risk. We enjoy whiteboards a lot and run at least one good house conference in this period.

  • 2021 – Anthropic is founded.

AI unemployment and AI extinction are often the same

Crossposted from world spirit sock puppet.

How much should the ideal person cry wolf?

Crossposted from world spirit sock puppet.

It is a fact about wolves and rationality that you should warn people about wolves quite a few times for every effective wolf attack.

In particular, there is an asymmetry between the costs of having one’s flock devoured and averting a non-eventuating wolf attack. If the carnage is a hundred times worse, then it’s worth up to ninety-nine false alarms to stop it.
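To make that arithmetic concrete, here is a minimal sketch. It rests on simplifying assumptions that are mine rather than the fable's: every alarm costs the same fixed amount whether or not a wolf turns up, and an alarm reliably averts the attack. The function names are hypothetical.

```python
def break_even_probability(alarm_cost: float, attack_cost: float) -> float:
    """Chance of attack above which raising the alarm beats staying quiet.

    Staying quiet costs p * attack_cost in expectation; raising the alarm
    costs alarm_cost for certain (assumed to avert the attack).
    """
    if alarm_cost <= 0 or attack_cost <= 0:
        raise ValueError("costs must be positive")
    return alarm_cost / attack_cost


def tolerable_false_alarms(alarm_cost: float, attack_cost: float) -> float:
    """False alarms affordable per averted attack, at exactly break-even."""
    if alarm_cost <= 0 or attack_cost <= 0:
        raise ValueError("costs must be positive")
    return attack_cost / alarm_cost - 1


# Carnage a hundred times worse than one alarm:
print(break_even_probability(1, 100))  # 0.01
print(tolerable_false_alarms(1, 100))  # 99.0
```

So under these assumptions a lookout should speak up whenever they put the chance of a wolf above one percent, and a lookout doing exactly that will be 'wrong' about ninety-nine times out of a hundred.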

The original fable was about a boy who would continually lie about wolves, and that is definitely poor form.

But in modern parlance, ‘crying wolf’ seems to be used for just being openly alarmed about things that turn out ok—I don’t hear much implication of deceit.

And in modern sensibilities, being seen to ‘cry wolf’ (by even once raising an alarm that isn’t consummated with disaster) is something people seem to really fear. I think multiple people have asked me about whether AI safety people might have ‘cried wolf’ about some earlier GPT model. I’m not aware of anyone doing that, but the idea that they might have is so tantalizing that it bears investigating. Because if even a few people somewhere did, it would be such a nice embarrassing blow to AI safety people.

And I probably responded in the tempting way: jumping to assure them that I don’t recall hearing any such fears from these quarters. But I think that worsens public thought norms by implicitly buying into the unspoken premise that it would be quite shameful and naive to have raised even one warning.

And so relatedly, probably people who see real risks from AI are scared to voice them, lest they be seen to ‘cry wolf’ and tank the credit of the movement for the next round of dangers. Because it is taken for granted that one should only get one chance to raise an alarm. That the first warning must be for the most undeniably big, bad, real wolf.

This is not the wolf lookout system we want.

‘Warnings’ are usually about fairly bad events, and therefore they tend to be worth making when the probability of those events is still low. This creates a real difficulty for society in adjusting people’s credit when the low probability events they have warned of do not come to pass. Most of the time, if the person is right, the events still shouldn’t happen! The person wasn’t saying they were likely! Yet you don’t want to let the alarmist off the hook, with plausible deniability for arbitrarily many alarms.

I think the solution to this difficulty should look much more quantitative, like collecting rich track records of the predictions made by a person or a movement, and scoring them well. The present solution of childishly denouncing any unmet danger is insane.
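One way this kind of scoring could look is sketched below, using the Brier score (the mean squared gap between stated probabilities and what happened, lower being better). That particular rule is my choice for illustration, not something the argument depends on, and the track records are invented.

```python
from statistics import mean


def brier_score(forecasts: list[tuple[float, bool]]) -> float:
    """Mean squared error between stated probabilities and outcomes."""
    if not forecasts:
        raise ValueError("need at least one forecast")
    return mean((p - float(happened)) ** 2 for p, happened in forecasts)


# Hypothetical track records: 100 occasions, and a wolf shows up on one.
outcomes = [True] + [False] * 99
calibrated = [(0.01, o) for o in outcomes]  # warned of a 1% chance each time
alarmist = [(0.50, o) for o in outcomes]    # warned of a 50% chance each time
silent = [(0.00, o) for o in outcomes]      # never warned at all

for name, record in [("calibrated", calibrated),
                     ("alarmist", alarmist),
                     ("silent", silent)]:
    print(name, round(brier_score(record), 4))
```

On these made-up numbers the lookout who warned at 1% scores far better than the one who warned at 50%, even though ninety-nine of their warnings came to nothing, and slightly better than the one who never warned. A rule like this only works if people state probabilities rather than bare alarms, which is part of what a richer track record would need to collect.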

And meanwhile if there are bad risks that have a low chance of appearing on every warning, we should still warn of them, and not be too much cowed by innumerate customs.