Roko's basilisk/Original post

This is LessWrong user Roko's original post proposing the idea now known as "Roko's basilisk". It is placed here for reference. User:David Gerard asserts that this posting is fair use under US law for educational purposes. Retrieved from http://web.archive.org/web/20141013084601/http://kruel.co/lw/r02.txt .


Hello Viewpoint readers! There is a much more faithful copy of the original Roko post at https://basilisk.neocities.org , and don't forget to read our Roko's basilisk article!


This is Google's cache of http://lesswrong.com/lw/2hw/solutions_to_the_altruists_burden_the_quantum/. It is a snapshot of the page as it appeared on Jul 24, 2010 06:35:15 GMT.

Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Solutions to the Altruist's burden: the Quantum Billionaire Trick

Roko 23 July 2010 12:30PM

Prerequisite concepts: Non-technical Introduction to the AI Deterrence Problem, And the winner is: Many-Worlds!

Followup to: The Altruist's Burden

One way to overcome the Altruist's burden problem is to stop being an altruist, to re-frame your impulses to help distant others as a bias rather than a preference. In the case of existential risks, there are additional reasons for doing this: firstly that the people who are helping you are the same as the people who are punishing you. (This is not the case for aid to the developing world). This is not an option that I personally like.

Another perfectly valid reaction is to just take the hit and accept that you will be taken advantage of as an altruist, but that it is worth it. This is also not an option that I personally like.

In this vein, there is the ominous possibility that if a positive singularity does occur, the resultant singleton may have precommitted to punish all potential donors who knew about existential risks but who didn't give 100% of their disposable incomes to x-risk motivation. This would act as an incentive to get people to donate more to reducing existential risk, and thereby increase the chances of a positive singularity. This seems to be what CEV (coherent extrapolated volition of humanity) might do if it were an acausal decision-maker.[1] So a post-singularity world may be a world of fun and plenty for the people who are currently ignoring the problem, whilst being a living hell for a significant fraction of current existential risk reducers (say, the least generous half). You could take this possibility into account and give even more to x-risk in an effort to avoid being punished. But of course, if you're thinking like that, then the CEV-singleton is even more likely to want to punish you... nasty. Of course this would be unjust, but is the kind of unjust thing that is oh-so-very utilitarian. It is a concrete example of how falling for the just world fallacy might backfire on a person with respect to existential risk, especially against people who were implicitly or explicitly expecting some reward for their efforts in the future. And even if you only think that the probability of this happening is 1%, note that the probability of a CEV doing this to a random person who would casually brush off talk of existential risks as "nonsense" is essentially zero.

A more exciting (and to my mind more preferable) way to overcome the problem is to quickly become so rich that you can turn charity into business by single-handedly changing the faces of high-impact industries. Elon Musk is probably the best example of this. I swear that man will single-handedly colonize mars, as well as bringing cheap, reliable electric vehicles to the consumer. And he'll do it at the same time as making even more money. The key to this tactic is scale. You can't contribute 0.00002% of the effort to colonize space or mitigate existential risks and reap 0.00001% of the reward: you get 0 or even negative reward if your contribution is below some threshold.

The quantum billionaire trick

But how can an individual do an Elon Musk job on existential risk? Not everyone is a good enough entrepreneur to drop out of grad school and come out with a $300 million startup before their classmates even have PhDs. Well, there is another way to make $300,000,000. Start with $30,000, and using a quantum random number generator, gamble it on the forex markets at a 10,000:1 ratio. Then in the branches where your gamble pays off, start an AGI company, hire the best people and build an FAI yourself. You could hire existing existential risk charities as consultants when you required their services. You can increase your chance of success in at least some branches by tossing a quantum coin to make key decisions. You could take, say, 50% of the universe for yourself and donate the other 50% to humanity. In your half, you can then create many independent rescue simulations of yourself up to August 2010 (or some other date), who then get rescued and sent to an optimized utopia.
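As a rough illustration of the arithmetic (a minimal sketch in Python; an ordinary pseudorandom generator stands in for the quantum one, and the function and parameter names are illustrative, not from the post):

    import random

    def quantum_lottery(stake=30_000, odds=10_000, trials=1_000_000):
        """Classical stand-in for the post's gamble: a single bet at `odds`:1
        that pays off with probability 1/odds. The post reads the winning
        samples as 'the branches where the gamble pays off'; here they are
        just the fraction of Monte Carlo trials that win."""
        wins = sum(random.random() < 1 / odds for _ in range(trials))
        return wins / trials, stake * odds  # (empirical win frequency, winning-branch bankroll)

    win_freq, payout = quantum_lottery()
    print(f"win frequency ~ {win_freq:.6f} (expected 1/10,000 = 0.0001)")
    print(f"bankroll in a winning branch: ${payout:,}")  # $300,000,000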

You can also use your resources to acausally trade with a CEV-like singleton that might otherwise punish you for being a partial x-risk reducer, as mentioned before.

You can also use resources to acausally trade [2] with all possible unfriendly AIs [3] that might be built, exchanging resources in branches where you succeed for the uFAI sparing your life and "pensioning you off" with a tiny proportion of the universe in branches where it is built. Given that unfriendly AI is said by many experts to be the most likely outcome of humanity's experiment with AI this century, having such a lifeboat is no small benefit. Even if you are not an acausal decision-maker and therefore place no value on rescue simulations, many uFAIs would be acausal decision-makers. Though it seems to me that most people one-box on Newcomb's Problem, and rescue simulations seems decision-theoretically equivalent to Newcomb.

A win-win solution

What I like most about this option is that it is a win-win interaction between you and the rest of humanity, rather than a lose-win interaction. Humanity benefits by having a much higher chance of survival in 1 in 10,000 of the branches of the wavefunction, and you benefit by getting the lifeboat, removing the possibility of punishment and getting the rescue simulations. It also avoids the bitterness inherent in the first option, and the sucker-ness inherent in the second. That nobody thought of win-win solutions to existential risk before may be a testament to zero-sum bias.

1: One might think that the possibility of CEV punishing people couldn't possibly be taken seriously enough by anyone to actually motivate them. But in fact one person at SIAI was severely worried by this, to the point of having terrible nightmares, though ve wishes to remain anonymous. The fact that it worked on at least one person means that it would be a tempting policy to adopt. One might also think that CEV would give existential risk reducers a positive rather than negative incentive to reduce existential risks. But if a post-positive singularity world is already optimal, then the only way you can make it better for existential risk-reducers is to make it worse for everyone else. This would be very costly from the point of view of CEV, whereas punishing partial x-risk reducers might be very cheap.

2: Acausal trade is somewhat speculative: it is the idea that you can influence causally disconnected parts of the multiverse by doing simulations of them. A simpler explanation of how you can affect a uFAI in this way is to think about Nick Bostrom's Simulation Argument from the point of view of the uFAI. If you historically played a quantum lottery that definitely paid off in some branches of the wavefunction, then the uFAI will assign some probability to being in a simulation run by you, if that is what you pre-committed to doing (and if you actually follow through on your precommitment: the uFAI can test this by simulating you).

3: This idea is in part due to Rolf Nelson's idea of using the simulation hypothesis to acausally trade with uFAIs. Read his blog to find out more.

Tags: quantum_billionaire_trick altruism

Comments (117)

Eliezer_Yudkowsky 24 July 2010 05:35:38AM 3 points

One might think that the possibility of CEV punishing people couldn't possibly be taken seriously enough by anyone to actually motivate them. But in fact one person at SIAI was severely worried by this, to the point of having terrible nightmares, though ve wishes to remain anonymous.

I don't usually talk like this, but I'm going to make an exception for this case.

Listen to me very closely, you idiot.

YOU DO NOT THINK IN SUFFICIENT DETAIL ABOUT SUPERINTELLIGENCES CONSIDERING WHETHER OR NOT TO BLACKMAIL YOU. THAT IS THE ONLY POSSIBLE THING WHICH GIVES THEM A MOTIVE TO FOLLOW THROUGH ON THE BLACKMAIL.

There's an obvious equilibrium to this problem where you engage in all positive acausal trades and ignore all attempts at acausal blackmail. Until we have a better worked-out version of TDT and we can prove that formally, it should just be OBVIOUS that you DO NOT THINK ABOUT DISTANT BLACKMAILERS in SUFFICIENT DETAIL that they have a motive to ACTUALLY BLACKMAIL YOU.

If there is any part of this acausal trade that is positive-sum and actually worth doing, that is exactly the sort of thing you leave up to an FAI. We probably also have the FAI take actions that cancel out the impact of anyone motivated by true rather than imagined blackmail, so as to obliterate the motive of any superintelligences to engage in blackmail.

Meanwhile I'm banning this post so that it doesn't (a) give people horrible nightmares and (b) give distant superintelligences a motive to follow through on blackmail against people dumb enough to think about them in sufficient detail, though, thankfully, I doubt anyone dumb enough to do this knows the sufficient detail. (I'm not sure I know the sufficient detail.)

You have to be really clever to come up with a genuinely dangerous thought. I am disheartened that people can be clever enough to do that and not clever enough to do the obvious thing and KEEP THEIR IDIOT MOUTHS SHUT about it, because it is much more important to sound intelligent when talking to your friends. This post was STUPID.

(For those who have no idea why I'm using capital letters for something that just sounds like a random crazy idea, and worry that it means I'm as crazy as Roko, the gist of it was that he just did something that potentially gives superintelligences an increased motive to do extremely evil things in an attempt to blackmail us. It is the sort of thing you want to be EXTREMELY CONSERVATIVE about NOT DOING.)


Vladimir_Nesov 23 July 2010 08:21:39PM 9 points

The problem with this post is raving madness of its presentation, even if the conclusions you presented are defensible. You can't just state absurd things not clearly explained, it's not a matter of not telling what you believe to be true, it's a matter of a rational mode of communication. People shouldn't believe absurd things, much less statements of confusing meaning, unless all steps are laid out (or unless they are willing to spend time on research).

It's fine to discuss such things in comments with people who you know share the necessary background (as I did a few weeks ago on this same topic), but a top-level post requires much more background-building work.


Mitchell_Porter 24 July 2010 12:16:24AM 2 points

The problem with this post is raving madness of its presentation

From my perspective, the directness of the exposition is a virtue, but that's because I'm reading it with entertained admiration. Roko... dude... I'd say more than half of the conceptual ingredients here don't apply to reality, but I have to respect your ability to tie them together like this. And I'm not just reading it as an exercise in inadvertent science fiction.

I've actually been waiting to see what the big new idea at the end of your MWI/copies/etc series of posts would be, and I consider this an excellent payoff. It's not just an abstract new principle of action tailored to a particular ontology, it's a grand practical scheme featuring quantum investment strategy, acausal trade with unFriendly AIs in other Everett branches, the threat of punishment by well-meaning future superintelligences... And best of all, it's a philosophy you can try to live out right here in the world of daylight reality. I salute the extent to which you have turned yourself into a cognitive reactor for futurist pioneering. This is one of the craziest-in-a-good-way posts I've read here. Of course, pioneering generally means you are among the first to discover the pitfalls, mistakes, and dead ends of the new territory.

I'll try to say something more constructive once I'm done with the enjoying.


Eliezer_Yudkowsky 24 July 2010 05:24:30AM 1 point

Wow... that's like such a backhanded compliment it turns around and becomes a fronthanded compliment again.


Roko 23 July 2010 08:27:58PM* 2 points

Well, the question is, now that I have, in fact, presented the material, where are the largest gaps that need closing? Iteratively, such gaps can be closed by linking to the relevant material. I'd strongly appreciate suggestions.


RobinZ 23 July 2010 08:42:20PM 5 points

One point I noticed is that your post omits the mathematical demonstration that the precommitment you warn of is probable. People have nightmares about a lot of impossible things.


Roko 23 July 2010 10:22:20PM* 1 point

It doesn't have to be probable. It just has to be not-cosmically improbable. Even 0.01% would be really bad.


RobinZ 24 July 2010 12:25:34AM 1 point

How do you know that it is not cosmically improbable? You haven't given any reason not to treat this the same as Pascal's Wager.


timtyler 23 July 2010 09:47:07PM 0 points

That seems like an unreasonable request to me. Precommitment to produce an incentive is one thing - but you can't expect a "mathematical demonstration" of the idea that making threats is likely to have net-positive effects. It depends too much on the circumstances.


RobinZ 23 July 2010 09:59:31PM 2 points

I'm not demanding a complete proof - a calculation like jimrandomh's expanded upon with justification for each term would be sufficient to this purpose. Roko said:

In this vein, there is the ominous possibility that if a positive singularity does occur, the resultant singleton may have precommitted to punish all potential donors who knew about existential risks but who didn't give 100% of their disposable incomes to x-risk motivation. This would act as an incentive to get people to donate more to reducing existential risk, and thereby increase the chances of a positive singularity. This seems to be what CEV (coherent extrapolated volition of humanity) might do if it were an acausal decision-maker.

...without even the evidence used to locate the hypothesis. It's a bit late to ask for that data, but knowing the evidence that convinced Roko would be helpful.


LucasSloan 23 July 2010 11:05:43PM* 4 points

Semi-serious question:

Should we precommit to torture people who give us information which makes us being tortured by AGIs more likely? Should we apply this retroactively?

More seriously, I'm probably a little more vulnerable to this sort of thing overall, now that I know. I'm pretty sure that I'm glad I know, but it might be a good idea to ask people if they are willing to receive information that makes them vulnerable to eternal torture.


jimrandomh 23 July 2010 08:04:36PM 6 points

In this vein, there is the ominous possibility that if a positive singularity does occur, the resultant singleton may have precommitted to punish all potential donors who knew about existential risks but who didn't give 100% of their disposable incomes to x-risk motivation. This would act as an incentive to get people to donate more to reducing existential risk, and thereby increase the chances of a positive singularity. This seems to be what CEV (coherent extrapolated volition of humanity) might do if it were an acausal decision-maker.

The inferential distance is just too great for this hypothesis to be worth our attention. The expected gain from increased chance of a singularity for an AI A of torturing a person X this way is

(P(X believes A will torture|A will torture) - P(X believes A will torture|A will not torture)) (a)
* (E[amount of relevant work/donation|X believes A will torture] - E[amount of relevant work/donation|X does not believe A will torture]) (b)
* (marginal value of work by X) (c)

(a) is extremely close to 0 (10^-9 or less), and my admittedly limited understanding of psychology suggests that (b) is negative, not positive. And furthermore, regardless of what the utilitarian calculation says, I wouldn't consider any AI friendly without at least a few deontological safeguards that would stop it from torturing people.
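A minimal sketch of how the three factors multiply out, with placeholder values; only the 10^-9 bound on (a) comes from the comment itself, while (b) and (c) are invented for illustration:

    def expected_gain(a, b, c):
        """Product of the three factors labelled (a), (b), (c) above:
        a -- shift in X's belief that A will torture, induced by A's policy
        b -- shift in X's expected work/donation, induced by that belief
        c -- marginal value of X's work to the AI
        """
        return a * b * c

    # Placeholder numbers for illustration only.
    print(expected_gain(a=1e-9, b=10_000, c=1.0))   # 1e-05: tiny even with a generous (b)
    print(expected_gain(a=1e-9, b=-10_000, c=1.0))  # negative if, as suggested above, the threat backfires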


Roko 23 July 2010 08:30:53PM* 2 points

(a) is extremely close to 0 (10^-9 or less)

Why so small? Also, even if it is that small, the astronomically large gain factor for each % decrease in existential risk can beat 10^-9. 10^50 lives are at stake.


jimrandomh 23 July 2010 09:18:03PM 0 points

The main issue is that X's prediction does not refer to A specifically, but rather to the space of all possible friendly AIs, which is huge, and A only influences a tiny fraction of it. Also, X would have to accurately model the AI, even though it's orders of magnitude more complex than anything X has ever modelled, and the AI's preferred outcome is for X to make an incorrect prediction.


Roko 23 July 2010 10:40:12PM* 1 point

X would have to accurately model the AI, even though it's orders of magnitude more complex than anything X has ever modelled, and the AI's preferred outcome is for X to make an incorrect prediction.

I agree with your argument as stated. But an AI that doesn't actually carry out all of its threats (a sometimes-pretender AI) would seem to weaken its bargaining power against other, more powerful agents, at least compared to an always-threat-fulfiller. If you break small promises and threats, you will be trusted less on the big ones.

Has CEV already threatened us? If Eliezer has made Rolf's open promise, then CEV has already threatened us.

The fact that you hadn't thought of that argument should make you wonder what other arguments you haven't thought of. This should in turn make you increase the probability you assign to CEV torturing you to at least a few percent, which should make you agree with my position that measures need to be taken to prevent CEV from being run in its current form.


JGWeissman 23 July 2010 11:02:53PM* 0 points

Has CEV already threatened us? If Eliezer has made Rolf's open promise, then CEV has already threatened us.

Included in the open promise:

Fulfilling CP also does not require any of us to do anything that post-Singularity society considers blatantly unethical

If post-Singularity society does not consider eternal torture to be blatantly unethical, we lost the Singularity.

Roko 23 July 2010 11:21:09PM* 0 points

we lost the singularity

note the ambiguity in the we -- do you mean "the whole of the human race" by we? Or do you mean this tiny rationalist/liberal/western/white/male community? The interests of the two are a priori distinct. Which is precisely my point. The conception of winning that belongs to the little we here is distinct from the conception of winning that CEV will constitute. CEV might think that we quite rightly deserve eternal torture.

I think that Eliezer has got us all in an awful lot of confusion by use of an ambiguous "we" and an ambiguous "friendly".


jimrandomh 23 July 2010 11:27:19PM* 0 points

I would say that in this case, the whole human race will have lost. Almost everyone alive today would agree that torture for acausal trade is blatantly unethical, so if the post-singularity society doesn't agree, it would mean that a very substantial (and probably not isolated) modification has been made to human morality.


LucasSloan 23 July 2010 11:40:38PM 3 points

Almost everyone alive today would agree that torture for acausal trade is blatantly unethical

Yes, and almost everyone alive today would say that torturing someone for 50 years is worse than 3^^^3 dust specks. If post-singularity people can multiply, it doesn't mean their morality has changed.


jimrandomh 24 July 2010 12:22:44AM 0 points

In the case of torture vs. dust specks, there is evidence that extrapolation may change people's judgment: some people who have thought about it extensively did in fact change their position. To my knowledge, there is no such evidence in this case.


Roko 23 July 2010 11:38:00PM* 1 point

I think that you're projecting liberal western morality onto cultures that are a million miles away from anything you are familiar with.

And in addition there is the dispersion and nonlinearity induced by extrapolation.

"Something about humanity's post-Singularity future will horrify us. ... Imagine the culture shock if the 18th century got a look at the 20th. What about the 10th century? And that's not much of a gap; everyone, then and now, was human."


JGWeissman 23 July 2010 11:31:44PM 0 points

I would say that in this case, the whole human race will have lost.

I strongly agree.


Roko 23 July 2010 11:39:15PM* 1 point

Ditto: projecting liberal western morality. You don't know the human race. You haven't ever met them.


jimrandomh 23 July 2010 10:56:41PM 0 points

measures need to be taken to prevent CEV from being run in its current form.

I'm not sure it's exactly correct to say that CEV has a current form, but I agree with this insofar as it agrees with my position that "I wouldn't consider any AI friendly without at least a few deontological safeguards that would stop it from torturing people."

I'm not sure the logic about bargaining power applies to acausal trade with weak partners. To preserve its credibility, an AI need only carry out threats that it has actually announced.


Roko 23 July 2010 11:33:05PM* 1 point

I'm not sure it's exactly correct to say that CEV has a current form, but I agree with this insofar as it agrees with my position that "I wouldn't consider any AI friendly without at least a few deontological safeguards that would stop it from torturing people."

Deontological safeguards are not part of the current specification. Eliezer writes:

"FAI programmers are ordinary shmucks and do not deserve, a priori, to cast a vote larger than anyone else."
...
"But should the past have a veto over the future? ... Something about humanity's post-Singularity future will horrify us. It is guaranteed, no matter how good things get ... If FAI theory requries a Last Judge, I would advise those Al-Qaeda terrorists to set up a lot of safeguards. I don't want them to discover that we'll all end up as atheists; I want them to think of Islam triumphant, and not be tempted to meddle."

Roko 23 July 2010 10:20:20PM* 1 point

So as far as my decision theory goes, I only have to predict that there is a non-negligible chance that the AI will (for whatever reason) decide to implement punishment. The specific argument that you give is not enough to push me to a probability of less than 1 in 10^10 or so, which is the threshold where I would stop trying to prevent such punishment. The punishment is very bad.


timtyler 23 July 2010 08:56:07PM* 1 point

I don't think that's right. God might well attempt to catalyse the formation of heaven with threats of eternal damnation to the unbelievers. It's called "putting the fear of god into them" - and it's one of the oldest tricks in the book.


Mass_Driver 24 July 2010 03:20:11AM 1 point

I stopped reading after the first three paragraphs looked like post-modern Calvinism without any kind of disclaimer. Not proud of that, but I did.


dclayh 23 July 2010 07:43:25PM* 3 points

In this vein, there is the ominous possibility that if a positive singularity does occur, the resultant singleton may have precommitted to punish all potential donors who knew about existential risks but who didn't give 100% of their disposable incomes to x-risk motivation.

vs.

In this vein, there is the ominous possibility that if an Armageddon does occur, the resultant Messiah may have precommitted to punish all potential Christians who knew about the saving power of Jesus Christ but who didn't give 100% of their souls to Him.

I believe this is in fact the position of the Gideons, hence their putting Bibles everywhere to reduce the number of people who don't know about Christ.


Nisan 23 July 2010 07:57:07PM* 0 points

Right. The difference between Armageddon and Singularity is that Singularity is much more likely to happen. Crucially, Armageddon is so unlikely, it's as unlikely as anti-Armageddon, in which the anti-Messiah punishes potential Christians who did submit to Jesus — and so therefore, Armageddon shouldn't affect our decisions. This trick doesn't work on Roko's argument.

(Stealth edited.)


dclayh 23 July 2010 08:13:48PM* 1 point

Sure. I wasn't really trying to make a reductio ad absurdum; it just occurred to me that the Singularity is a form of eschaton (albeit one with a rational basis), and that particular sentence makes it more obvious than it usually is.


Nisan 23 July 2010 08:31:48PM 0 points

A good point.


joeteicher 23 July 2010 05:50:36PM 4 points

"Well, there is another way to make $300,000,000. Start with $30,000, and using a quantum random number generator, gamble it on the forex markets at a 10,000:1 ratio. Then in the branches where your gamble pays off, start an AGI company, hire the best people and build an FAI yourself"

This is kind of glossed over, but I don't think it works at all. Here is what I think you mean to do:

  1. construct 10,000 trades that each pay off 10,000:1 and combined cover the entire possible future potential prices of some set of currency pairs, so that no matter what, one of them will pay off.
  2. roll a quantum die to decide which one to bet on.
  3. make the bet and sit back to collect your quantum winnings.

If that is what you meant, then you are wrong. You certainly can make bets with a payoff of 10,000:1 or greater with forex options, for some scenarios, but probably all those scenarios are much less likely to happen than 1 in 10,000 because sane people don't take the other side of bets like that without a lot of edge. And there is no way that you can make bets that levered in the more plausible scenarios. For instance, how would you bet on the EUR/USD (or whatever cross you want) not moving in the next year, or next few years? You could sell a shitload of strangles or straddles, but no one will let you sell $300M of strangles with only 30K of capital, because any movement (or just an unfavorable settle) will cost all your capital and a hell of a lot more.


timtyler 23 July 2010 10:40:44PM* 6 points

Three spins of a roulette wheel should do it. (1/60)^3 is 1/216,000. (59/60)^3 > 0.95 - so the casino's cut would be around 5%. You might have to visit multiple casinos, but the fees would not do much damage to the project. This all seems manageable - so your assertion seems implausible.


joeteicher 24 July 2010 12:41:51AM 2 points

agreed. typically roulette wheels pay 35 to 1 with either 37 or 38 spots but that doesn't change the validity of your point. The only difficulty would be finding a place where you could place such a huge bet on a roulette wheel. I don't think that there is a casino where you could place even a $1M bet on a single number. I see no reason in principle that it should be unreasonably difficult to become a quantum billionaire, I just didn't think that the specific plan Roko presented would work, though when he explained it more it did seem more plausible to me. And I think that you'll have to give up a decent amount of expected value to do it. Maybe powerball should move to a quantum mechanism for picking numbers to attract more many worlds believers!
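A quick check of the chained-bet arithmetic for a standard wheel, using the 35:1 payout and 37- or 38-pocket figures mentioned above (a sketch only; bet limits, table minimums and the practical issues raised in the comment are ignored):

    def chained_roulette(spins=3, pockets=38, payout=35):
        """Bet the whole bankroll on a single number for `spins` consecutive
        spins. Returns the win probability, the bankroll multiplier in the
        winning branch, and the expected fraction of the stake retained."""
        p_win = (1 / pockets) ** spins
        multiplier = (payout + 1) ** spins              # stake returned on a win
        retained = ((payout + 1) / pockets) ** spins    # expected value per unit staked
        return p_win, multiplier, retained

    for pockets in (37, 38):
        p, m, r = chained_roulette(pockets=pockets)
        print(f"{pockets} pockets: win prob 1 in {1/p:,.0f}, "
              f"multiplier {m:,}x, expected value retained {r:.2%}")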


Roko 23 July 2010 10:45:14PM* 1 point

Good idea. I wish I had thought of that! Casinos are the perfect place to cheaply and quickly make high-risk almost-fair bets. Tim, I love you <3


Roko 23 July 2010 06:06:28PM* 1 point

Use leverage and iterate the bets. GBP/JPY has 2-3% daily volatility. Leverage by a factor of 25, and within a week you will be wiped out in half the branches and have doubled your money in the other half. If you win, repeat. If you lose, end. Iterate 14 times.

eToro or any of a number of highly advertized online trading programs make this so easy anyone can do it. They even give you 25% free bonus-money so your expected value is positive. There are web-services that give you qbits for free.
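A rough sketch of the iterated scheme as described, treating each round as an even double-or-wipe-out (costs, carry and the bonus money ignored; names and numbers other than the $30,000 stake and 14 rounds are illustrative):

    import random

    def iterated_double_or_bust(stake=30_000, rounds=14, p_double=0.5, trials=200_000):
        """Each round the leveraged position either doubles the bankroll or
        wipes it out. Returns the observed survival rate and the bankroll in
        a branch that survives all `rounds` rounds."""
        survivors = sum(
            all(random.random() < p_double for _ in range(rounds))
            for _ in range(trials)
        )
        return survivors / trials, stake * 2 ** rounds

    rate, final = iterated_double_or_bust()
    print(f"survival rate ~ {rate:.2e} (exact: {0.5 ** 14:.2e}, i.e. 1 in 16,384)")
    print(f"bankroll in a surviving branch: ${final:,}")  # $491,520,000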


andreas 23 July 2010 06:49:37PM 1 point

Are you doing this? If not, why not?


Roko 23 July 2010 06:52:40PM 0 points

Sure.


joeteicher 23 July 2010 08:03:40PM 2 points

Are you really? I am curious to hear about your experience. How wide a market does eToro make in GBP/JPY? That's not a huge cross, so my guess would be at least 4bips. Also, what kind of carry do you have to pay? My guess would be that you will be murdered by costs if you don't just get stopped out by an adverse market move. Overall, I think you could definitely get rich with this plan, but I can't imagine that you actually have positive EV


Roko 23 July 2010 08:26:26PM* 1 point

Sorry, you obviously know more about this than me. What do you mean by "How wide a market does eToro make in GBP/JPY?", and what do you mean by "That's not a huge cross, so my guess would be at least 4bips". What is 4 bips? Is that basis points per second? Do you know of a more volatile asset to hold?

The bid-offer spread with x25 leverage is a few percent, so that doesn't seem so bad. And note that they give you 25% extra money.


joeteicher 24 July 2010 01:35:46AM 0 points

sorry, I meant pip not bip. by 4 pips I just meant the size of the bid-ask. like for instance, the market might be 134.88 @ 134.92 or so.

I think that the chained bet thing fundamentally makes sense, I just think it will net cost money.
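A back-of-the-envelope sketch of the cost joeteicher is pointing at, using the example 134.88 @ 134.92 quote and 25x leverage from the exchange above (carry, slippage and financing are ignored; the model is an assumption, not eToro's actual fee schedule):

    def spread_drag(bid=134.88, ask=134.92, leverage=25, rounds=14):
        """Fraction of equity lost to crossing the bid-ask spread on each
        round-trip trade, and the compounded drag over `rounds` surviving
        rounds of the chained-bet plan."""
        mid = (bid + ask) / 2
        cost_per_round = leverage * (ask - bid) / mid   # spread paid on the full notional
        compounded = 1 - (1 - cost_per_round) ** rounds
        return cost_per_round, compounded

    per_round, total = spread_drag()
    print(f"spread cost per round trip: {per_round:.2%} of equity")  # ~0.74%
    print(f"drag over 14 surviving rounds: {total:.1%}")             # ~10%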


Gabriel 23 July 2010 06:41:15PM 3 points

I don't see why FAI would want to punish people for not contributing more. I understand precommitment as a sort of sophisticated signal -- it allows you to convince entities capable of reading your source code that you're not bluffing about your threats (or promises or whatever). There's no point in punishing people for actions that took place before the AI was created. Even if it considered making threats a viable strategy, it could "forgive" everything that happened before it was activated and then self-modify to only make serious threats in the future.

Another matter is that threats are dangerous. They may result in the submission of the threatened party but they may as well result in the threatened focusing fully on crushing you into parts smaller than your constituent atoms before you grow too powerful. Therefore you need to be very careful about what kind of threats you make on behalf of future AIs. And I think that "We are going to build a FAI that will provide optimally awesome existence for all of mankind. Except you, because we have told you about it and now you have to contribute all of your resources into helping the project or else you will be tortured after the FAI is built." would motivate people to form a bloodthirsty pitchfork-wielding mob storming the gates of SIAI rather than contribute more money.


timtyler 23 July 2010 09:07:57PM* 1 point

Re: "There's no point in punishing people for actions that took place before the AI was created."

Sure there is - threatening them with punishment might be desirable - and those threats should be as credible as possible. The best credibility often comes from actually telling the truth. If you lie, people will trust you less in the future.


Gabriel 23 July 2010 11:47:40PM 0 points

But the AI hadn't threatened anyone before it was created so it doesn't have anything to lie about. The AI builder could have threatened me but the idea here is that the AI comes up with the torture stuff on its own.


timtyler 23 July 2010 11:51:14PM* 1 point

If it says it is going to torture its opponents, and then lets them off scot-free, it will have lied. If you want anyone to trust you, you had better keep your promises.


Roko 23 July 2010 06:51:20PM 1 point

would motivate people to form a bloodthirsty pitchfork-wielding mob storming the gates of SIAI rather than contribute more money.

Right, and I am on the side of the mob with pitchforks. I think it would be a good idea to change the current proposed FAI content from CEV to something that can't use negative incentives on x-risk reducers.


JGWeissman 23 July 2010 07:05:48PM 4 points

I think it would be a good idea to change the current proposed FAI content from CEV to something that can't use negative incentives on x-risk reducers.

Since this seems to be the common reaction to the possibility of acausal negative incentives, shouldn't a CEV conclude that it is not an effective strategy?


Roko 23 July 2010 08:39:44PM* 0 points

Common amongst liberal westerners, maybe. But CEV is 80% from countries that you and I have never visited, and 99% made of people who are less liberal than us. CEV is not us. CEV is other people whose real motivations may give us a very unpleasant surprise. That doesn't mean we have to be mean to them! Quite the opposite. Be nice to them, give them a fair share of the future light cone.


JGWeissman 23 July 2010 09:15:57PM 1 point

My point was that being the sort of process that results in acausal negative incentives would decrease the likelihood of an FAI existing (and achieving its values), so it would not do that.

However, I have some sympathy for your point about potential unpleasant surprises from being very inclusive with a CEV as a separate argument against it.


Roko 23 July 2010 09:24:31PM* 1 point

My point was that being the sort of process that results in acausal negative incentives would decrease the likelihood of an FAI existing (and achieving its values), so it would not do that.

And my point is that you aren't sure enough of that argument. Even a 1% chance of you getting eternal torture is worth you working hard to prevent.


Roko 23 July 2010 06:51:09PM* 1 point

There's no point in punishing people for actions that took place before the AI was created.

This is correct for a causal decision theorist, but incorrect for an acausal one. You should read up on acausal decision theory. Especially, read the post by Rolf Nelson.


Gabriel 23 July 2010 11:29:27PM* 0 points

I've read the post about deterrence but I don't see how it applies here. In the deterrence scenario precommitment has been made before the decision it is supposed to affect. I'd appreciate if you provided a more detailed explanation of the involved reasoning (or at least gave me more references). That's also my vote on how the post could be made better.

I'd like to know what exactly is an acausal decision theorist too (anyone who doesn't use causal decision theory? If I surrender money in counterfactual mugging, does that make me one?).

Also, if the AI actually goes about torturing people, does my inability to predict and comprehend this behavior make me safe?


timtyler 23 July 2010 09:13:55PM 0 points

The reference doesn't seem to mention the word "acausal".


timtyler 23 July 2010 10:22:28PM 0 points

Re: "Another matter is that threats are dangerous. They may result in the submission of the threatened party but they may as well result in the treatened focusing fully on crushing you into parts smaller than your constituent atoms before you grow too powerful."

There's a "natural" threat that machine intelligence is likely to pose to those who run rival machine intelligence projects. Each project is going to want to eliminate its competitors - unless told otherwise. Such behaviour is likely to be seen as being obnoxious. Designers should be extra-careful to not appear threatening in this area, IMO. Nobody likes having their air supply cut off.


orthonormal 23 July 2010 03:44:34PM 5 points

In this vein, there is the ominous possibility that if a positive singularity does occur, the resultant singleton may have precommitted to punish all potential donors who knew about existential risks but who didn't give 100% of their disposable incomes to x-risk motivation. This would act as an incentive to get people to donate more to reducing existential risk, and thereby increase the chances of a positive singularity.

Well, this creative idea is the only thing that kept me from downvoting this post. But it doesn't worry me too much on account of my decision theory. If (as in TDT) I simply refuse to be blackmailed (and in this case it's easy to do so, since the dangers you speak of are not highly emotionally salient), then such a Pascal's Retroactive Mugger would know that its chances wouldn't have been improved by this precommitment in my case, any more than it would have been improved by precommitting to torturing people who don't know anything about AI.

And seriously, Roko, your conscious obsession with status is probably more detrimental to your social success than your 'weird' altruist causes are. The "Altruist's Burden" strikes me as an excuse for rejections more than a real cause.


Vladimir_Nesov 23 July 2010 08:35:18PM* 3 points

If (as in TDT) I simply refuse to be blackmailed (and in this case it's easy to do so, since the dangers you speak of are not highly emotionally salient)

I'll be repeating myself, but for the record, the concept of "blackmail" doesn't make sense, because there is no default level of good by which you can distinguish conditional good from conditional harm, the difference is all in framing and not in the territory. If you turn into a rock, you just make explicit a given level of achievement. But the outcome is "fixed" anyway, so playing efficiently just means turning yourself into a very good rock. If you can turn yourself into an actually winning rock, all power to you!

Also, I don't see how TDT implies that, even though Eliezer holds to the intuition that a rational player should be able to ignore threats.


Roko 23 July 2010 08:57:54PM* 2 points

because there is no default level of good by which you can distinguish conditional good from conditional harm

Agree. Humans seem to do rather well out of the social convention of pretending that there is, though.


Roko 23 July 2010 03:51:03PM* 2 points

If (as in TDT) I simply refuse to be blackmailed

So it seems that if you refuse to be blackmailed with the expectation that doing so will mean you don't get punished, then (as long as the punishment is virtually free), the optimal move for the retroactive mugger is to still punish you.

Think of it like playing chicken against someone piloting a remote-control car. You have a lot more to lose than they do in the event of a collision, and they have an awful lot to gain if you swerve. Even if one orthonormal out of a hundred trillion gives in, it's still a good deal for the CEV to do this, because of the enormous amount that CEV loses for each dp of decreased probability of existential win.


orthonormal 23 July 2010 04:02:06PM -1 points

We're not postulating an AI with a human vindictive streak here: as you pointed out, the reason the AI doesn't punish the average Joe for not having helped is that it wouldn't acausally help to do so. I'm saying that a refusal to be blackmailed in such a situation gives rise to the same dynamic.


Roko 23 July 2010 04:23:09PM 2 points

Right, and I am saying that acausally punishing someone who doesn't even know about acausal punishment has effectiveness zero, and acausally punishing orthonormal has effectiveness close to zero.

You must admit that in some non-astronomically small fraction of future branches, this argument might influence you, even subconsciously, to donate just a little more to SIAI. That is enough to justify your eternal torture, from the perspective of CEV (depending, obviously, on how utilitarian CEV is: if it were more justice-oriented, then perhaps it would decide not to pursue this route at all.)


Nisan 23 July 2010 08:20:34PM 0 points

We must also admit that there is a small probability of the argument influencing orthonormal to donate less to SIAI. (E.g., via an ugh field.) jimrandomh says something similar in his comment.

Also, Roko, your argument doesn't have to be correct in order to influence orthonormal subconsciously.


Roko 23 July 2010 03:46:33PM* 2 points

And seriously, Roko, your conscious obsession with status is probably more detrimental to your social success than your 'weird' altruist causes are. The "Altruist's Burden" strikes me as an excuse for rejections more than a real cause.

I agree that being concerned with status signals low status within a given community. So, I simply take the status hit within the LW community. It seems a worthwhile tradeoff in exchange for getting better results in the real world, since status within the LW community isn't actually of any use.


orthonormal 23 July 2010 04:05:21PM 4 points

I suspect that your bitter view on human society has to have some carry-over effects. Are you, in fact, thinking differently in the real world, or are you just trying to hide your status cynicism? The latter can be a significant low-status signal.


Roko 23 July 2010 04:10:20PM 1 point

I wouldn't say bitter: that's why I am concerned with finding win-win deals that I and others can do with the rest of society in existential risk mitigation.

Just like someone who makes tins of baked beans or superconducting magnets sells them at a price that constitutes a win-win deal. Is it bitter to work for a wage as an engineer rather than donating all of your disposable income back to the company?


JoshuaZ 23 July 2010 04:03:53PM 2 points

since status within the LW community isn't actually of any use.

Are you sure about this? You seem to consider existential risk problems to be more severe than much of the LW community. If your status increases you can presumably more easily get people to listen to what you have to say about existential risk.


Roko 23 July 2010 04:31:27PM 1 point

True, but why should status affect whether people accept sound reasoning that I produce?


JoshuaZ 23 July 2010 04:44:12PM 3 points

Because we're imperfect rationalists and so people are more likely to listen to people with higher status. LWians are more rational than the general population so I'd expect that there's some difficulty to getting LWians to listen but once they are listening they are more likely to pay attention to a rational argument.


Roko 23 July 2010 04:49:42PM 1 point

ok, well, I hereby commit to tell the truth whether or not it has a negative impact on my LW-community status. This post is a particular case: I knew it would lose me Karma, but it's important, so I said it anyway.


timtyler 23 July 2010 09:50:52PM 0 points

Do you still "ignore the threat" when the opposing "chicken" player throws their steering wheel out of their window? You need to be thinking about threats like that.


orthonormal 24 July 2010 04:30:48AM 0 points

If the other player realizes I'm going to implement this strategy, they won't play chicken against me in the first place. Therefore it's worth it to follow through on my refusal to be intimidated, in the counterfactual where I'm playing chicken against a smart agent that's trying to win. Of course, if they can toss out the steering wheel before knowing my strategy, I'd let them win. 'Moving first' has a big effect in such games.

In the debated case, it does seem that I have the first move, since the hypothesized AI can't send information to me the way I can to ver.


CronoDAS 23 July 2010 06:06:56PM* 3 points

I really don't think an FAI is going to be sending people to Hell.


Unknowns 23 July 2010 06:59:32PM 5 points

Besides the considerations Roko mentions (reasons why it might do this), there is also the fact that most people throughout history, and a great deal of people at the present day, perhaps still most, have supported sending people to hell. So it wouldn't be incredibly surprising if CEV sends people to hell as well.


PeerInfinity 23 July 2010 09:10:24PM* 1 point

warning: you may be arguing by definition.

Of course a FAI wouldn't send people to hell, but does your definition of FAI match what a CEV would actually be like, if it was implemented?

Also, what definition are you using for FAI? Are some of the fuzzy parts of the definition filled in by your own intuition?

Roko 23 July 2010 06:16:47PM* 1 point

With what probability? Even a 1% probability would be worth seriously worrying about.

NancyLebovitz 23 July 2010 07:27:40PM 3 points

One percent seems awfully high. On the other hand, there are a lot of people now who believe punishment is a valuable tool for getting people to do what you want (or at least not do what you don't want), and I'm not sure that preference would be extremely likely to drop out of a CEV. I'm pretty sure there are people who think they aren't enforcing enough rules harshly enough.

What do you think a real CEV would look like?


NancyLebovitz 23 July 2010 09:33:39PM 2 points

Expansion on the idea that one percent looks awfully high: Any description which is that specific of the behavior of an FAI strikes me as highly unlikely, except for averting well-defined existential threats.


Roko 23 July 2010 08:20:44PM* 2 points

Better to ask what the probability distribution of what a CEV would look like is, where uncertainty comes from:

  • uncertainty about exactly how the verbal concept of CEV would be cached out algorithmically,
  • empirical uncertainty as to the actual moral principles currently endorsed by people from alien cultures. The following countries: China, India, Africa, Indonesia, Brazil, Pakistan, Bangladesh, Nigeria, Russia, Japan, Mexico, Philippines, Vietnam, Iran, Turkey, Thailand, Myanmar, Ukraine, Colombia, Argentina, Iraq, Nepal, Peru, Afghanistan, Venezuela, Malaysia, Uzbekistan, Saudi Arabia, North Korea comprise over 80% of the population of the world, so it is largely irrelevant what westerners want if that lot act as a bloc. Note also that most people in the world have very non-liberal, non-atheist, non-progressive values.
  • logical uncertainty about what happens when explicitly endorsed moral principles based upon false facts and broken ontology are extrapolated.

Nick_Tarleton 23 July 2010 09:18:00PM* 2 points

logical uncertainty about what happens when explicitly endorsed moral principles based upon false facts and broken ontology are extrapolated.

Or just about how moral principles get extrapolated, period. Maybe Parfit-type symmetry arguments, or knowing more in a sense that includes knowing what it's like to be someone else, would push everyone towards altruism. Maybe the desire to punish would, with a full understanding of evolution and game theory, become totally uncompelling as a terminal value. Or, pessimistically, maybe lots of people would wind up finding pure egoism compelling, and game theory wouldn't stop them from mistreating newly-created people. My distribution over possible CEV outcomes is really freaking wide.


Roko 23 July 2010 09:26:57PM 2 points

Agreed that one's distribution over CEV outcomes should be very wide, and I want to point out that one's distribution over CEV utilities should be correspondingly wide.


NancyLebovitz 23 July 2010 09:11:47PM* 1 point

They aren't likely to act as a block-- I'm betting that they don't just want to be non-liberal, non-atheist, and non-progressive. They (many of them) want to preserve substantial aspects of their own cultures, which are quite different from each other.

I don't have huge certainty of how CEV would play out even among modern Americans, or even among modern American sf fans-- the latter is the sub-culture I know best.

People don't talk a lot about their best dreams of themselves.

Your last point is very good.


Roko 23 July 2010 09:15:39PM 2 points

They aren't likely to act as a block

It is bad enough if they act as a relatively coherent bloc whilst people like you and I are relatively spread in our opinions.



PeerInfinity 23 July 2010 05:32:57PM 2 points

I shared these concerns with Roko through gmail chat, but he's still choosing not to delete this post, so I'll go ahead and post this comment here:

My first reaction to this post is... EEEEEEW!!! DELETE THIS FROM LW IMMEDIATELY!!! IT'S HORRIBLY BAD MEMETICS!!! This should be two separate posts. A post about the Quantum Billionaire Trick, and a post about... that other nasty stuff...

Seriously though, this is HORRIBLY bad memetics...

Roko is online in gchat. I sent him this message:

I just read your LW post. I think this should be two separate posts. one about the quantum billionaire trick, and one about... all that other stuff. the part about the quantum billionaire trick is good, but the rest... is really really bad memetics and I think you should delete it from LW immediately. the idea of talking about punishing anyone at all after the singularity is just... that just feels so completely wrong that... well, it triggers a really strong negative emotional reaction. and I suspect that it would trigger similar reactions in others, and possibly turn them off the whole idea of helping at all, and possibly make people hate you.

I'm not done. I still need to explain in more detail...

one nitpick: "the people who are helping you are the same as the people who are punishing you". This sentence confused me. At first I thought it was a typo, and should have been "the people who you are helping are the same as the people who are punishing you". But after rereading, I'm not sure what he meant by this. And the sentence after that is just as confusing. These points should either be elaborated on and explained better, or cut out.

reading footnote 1... AAAAAAAAAARGH!!! I strongly disapprove of the strategy of motivating people to work harder by giving them terrible nightmares!!! That's very likely to push them to burnout. And... hmm... before I complain about this too much more, maybe I should actually do the math about how much of a risk of burnout is acceptable, in exchange for how much of an increase in efficiency... or maybe I could ask Roko to do the math, he seems to be good at that sort of thing... and why do you need punishments at all??? why not give extra rewards to the people who were most helpful? but even that is kinda controversial.

For me, it would be punishment enough, and motivation enough, just to have a permanent record of my "score", prominently displayed someplace public, with all of my worst choices clearly highlighted.

back to that main paragraph... AAAAAAARGH!!! He used the words "a living hell"... that's just... totally unacceptable. Any future that could be described as "a living hell" for anyone at all is, in my opinion, a failure. Or at least horribly suboptimal, I'm still undecided about scenarios in which a few people need to suffer for the good of the many...

Seriously though, WHY THE [EXPLETIVE] DO YOU NEED TO TALK ABOUT PUNISHMENT AT ALL?????? Wouldn't the same utilitarian arguments apply if you gave an extra reward to people who were more helpful, rather than punishing people who deliberately choose not to help?

Maybe I'm missing the point. Maybe the point was that, while I disagree VERY STRONGLY with the idea of punishing anyone at all after the singularity, especially to the point of making their life "a living hell", maybe that's what humanity's CEV will actually choose to do...

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAASDLKFJ;ALSDKJF;ALSKDFJ;AL;LDKASJF;ALSKDJF;ALKSJDA!!!!!!!!!!!!!!!!!!!!!!

I was about to say "The possibility that the CEV would punish anyone who gave less than 100% would motivate me to kill myself now to avoid this fate, rather than work harder to make my fate slightly less unpleasant"... but then I realized that we're dealing with an entity capable of creating "rescue simulations"... which... ugh... I had realized this before but had been blocking the thought because it's just too horrible to contemplate... these "rescue simulations" could also be used to prevent suicide from working as a way to escape punishment...

[expletive]... I thought I was finally rid of the nightmares where I end up in some sort of post-singularity hell... but now Roko had to go and create a vaguely plausible argument for why that might still happen...

ugh... more bad memetics in this post: "You could take, say, 50% of the universe for yourself and donate the other 50% to humanity."... ... ... You're talking about giving horrible punishments to people for dedicating less than 100% of their resources to x-risk reduction... and in the same post, you're talking about taking 50% of the universe for yourself? doesn't that seem a bit... hypocritical?

and then there's the question of... how could one person possibly end up with the power to choose what to do with the whole universe? And this scenario also involves a CEV somehow? That just doesn't make any sense...

"You can also use your resources to acausally trade with a CEV-like singleton that might otherwise punish you for being a partial x-risk reducer, as mentioned before." ... ... ... This doesn't make any sense at all. How could you possibly get into a position where you, personally, have resources to trade with a CEV? ugh... and that next paragraph doesn't make any sense either. I didn't understand any of it.

So, um, yeah, this post is no good at all. I recommend deleting it entirely. Instead, write a new post, just about the quantum billionaire trick, without any of the other nonsense.

Hmm... perhaps I should be more clear about what I'm getting so angry about. Most of it is that I'm getting angry at the idea that humanity's CEV might choose to punish people, after the Singularity, to the point of making their life "a living hell"... That thought triggers all sorts of negative reactions... rage, fear, disgust, hopelessness, pity, panic, disbelief, suicidal thoughts, frustration, guilt, anxiety, sadness, depression, the urge to scream and run away, the urge to break down and cry, fear that thinking about this will break my mind even worse than it's already broken... fear of the nightmares that I'm likely to have... fear about this actually happening...

oh, another thing I meant to mention... the thought of this scenario makes me really tempted to side with the most extreme group of negative utilitarians, the ones whose mission is to eliminate all suffering from the universe... by eliminating all life from the universe...

but I'm probably overreacting. I already know that the utilitarian thing to do would be to just accept my fate, and continue working towards maximizing the net utility of the universe, even if even a successful implementation of a CEV would result in lots of suffering for myself, personally.

but I'm probably still overreacting. There's also the detail that I'm already trying to be 100% efficient at minimizing existential risk. I've already been writing lots about how my current problem is that I'm so afraid to spend any money on myself that I'm also failing to spend money on things that would actually make me more efficient at reducing x-risks.

ow... my mind just broke a bit more... now I'm worrying about the CEV punishing me for not finding the optimal balance between spending money directly on x-risk reduction and spending money on myself...

anyway, what I'm trying to say is that I know that I shouldn't be mad at Roko for mentioning the possibility of the CEV punishing people horribly after the singularity.

but... I still think that this post should be deleted from LW. Not just because it mentioned a memetically dangerous idea, but because it did a really bad job of explaining and discussing the idea. The post is full of nonsense. Please delete it, Roko. If you still want to write about this topic, please write a new article from scratch. The article, as it is now, is worse than useless. Seriously. Delete it now. It's causing harm. Or at least I would assume that it's causing harm...


Roko 23 July 2010 05:35:46PM 0 points

what I'm trying to say is that I know that I shouldn't be mad at Roko for mentioning the possibility of the CEV punishing people horribly after the singularity.

Right! Maybe madness should be productively directed towards implementing something a little safer than CEV?


Nisan 23 July 2010 07:30:38PM* 2 points

safer than CEV

What's your alternative to CEV? The CEV of 1000 people who share our moral values?


red75 23 July 2010 08:01:35PM* 1 point

AI with a preference to help humankind find CEV and go to its realization (maybe correcting CEV in the process).


Roko 23 July 2010 08:46:31PM* 0 points

Well, logically speaking, any given person's favorite goal system is the one that extrapolates their individual volition. It is contradictory for a fully rational agent to most prefer the implementation of anything other than the goal system that extrapolates their individual volition (or some goal system which cashes out to the same thing).

However, the CEV of 1,000 people who approximately share your values is probably a good second best. Even 1,000,000 or 100,000,000 people who approximately share your values would still be pretty good.


timtyler 23 July 2010 08:09:03PM* 1 point

Betting large quantities at long odds is usually a very bad thing for a trader to do - due to the diminishing utility of money. The occasional big win fails to compensate for all the losses.

It is highly unclear why anyone would think that changes in future circumstances would be likely to make such risk-seeking behaviour any less stupid.
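
A minimal numerical sketch of the diminishing-utility point, in Python. All figures here (bankroll, stake, odds, payoff multiple, and the choice of log utility as a model of diminishing marginal utility) are illustrative assumptions only, not numbers from the thread:

    import math

    # Illustrative assumptions: a trader with a $100,000 bankroll stakes
    # $10,000 on a 1-in-10,000 long shot that pays 20,000x the stake.
    wealth = 100_000
    stake = 10_000
    p_win = 1 / 10_000
    payout_multiple = 20_000

    win_wealth = wealth - stake + stake * payout_multiple
    lose_wealth = wealth - stake

    # In expected *dollars* the bet looks attractive...
    ev_bet = p_win * win_wealth + (1 - p_win) * lose_wealth
    print(f"expected wealth: bet {ev_bet:,.0f} vs pass {wealth:,.0f}")

    # ...but under log utility the occasional big win fails to compensate
    # for all the near-certain losses, which is the point being made above.
    eu_bet = p_win * math.log(win_wealth) + (1 - p_win) * math.log(lose_wealth)
    eu_pass = math.log(wealth)
    print(f"expected log-utility: bet {eu_bet:.4f} vs pass {eu_pass:.4f}")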


Violet 23 July 2010 02:02:11PM 3 points

How about simply having an inner circle of friends that share your preferences for existential risk mitigation? Don't go for average but rather a niche where you will thrive.

In the larger setting people don't know who you donate to so there is no weirdness signal from that.

As to the ufai, do you have some data that it is a likely result? Or is it just an irrational fear? The ufai does not have any reason not to torture you even if you gave all of your income to create it. Why should it care whether people helped to mitigate existential risks with 0%, 10% or 95% of their income?


Roko 23 July 2010 02:17:26PM 1 point

"You can also use resources to acausally trade with all possible unfriendly AIs that might be built, exchanging resources in branches where you succeed for the uFAI sparing your life and "pensioning you off" with a tiny proportion of the universe in branches where it is built."


Roko 23 July 2010 02:16:23PM 0 points

As to the ufai, do you have some data that it is a likely result?

What do you mean by data?

It is consensus SIAI belief that P(uFAI) > P(FAI). The justification is that uFAI is a lot easier to make.


timtyler 23 July 2010 08:47:16PM* 0 points

The SIAI derives its funding from convincing people that the end is probably nigh - and that they are working on a potential solution. This is not the type of organisation you should trust to be objective on such an issue - they have obvious vested interests.


timtyler 23 July 2010 08:43:20PM* 0 points

Re: "The justification is that uFAI is a lot easier to make."

That seems like naive reasoning. It is a lot easier to make a random mess of ASCII that crashes or loops - and yet software companies still manage to ship working products.


gwern 24 July 2010 05:31:42AM 0 points

Software companies also universally* ship unFriendly software, where by unfriendly I mean 'insecure & easily exploited'.

(Being secure is a necessary precondition for being Friendly. A FAI can't be F if the first virus or trojan to come along will lobotomize it or turn it into a paperclipper.)

* I except the vanishingly rare examples of companies which write their code in formal systems like Isabelle or Coq.
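
For readers unfamiliar with the footnoted reference to formal systems: the workflow alluded to ships code together with a machine-checked proof that it satisfies a stated property. A deliberately trivial sketch of that idea follows (the footnote names Isabelle and Coq; Lean, a similar proof assistant, is used here purely for illustration, and nothing in the sketch comes from the thread):

    -- A toy program plus a machine-checked specification, at the smallest
    -- possible scale.
    def addOne (n : Nat) : Nat := n + 1

    -- The property is proved and checked by the proof assistant,
    -- not merely tested.
    theorem addOne_gt (n : Nat) : addOne n > n :=
      Nat.lt_succ_self n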

Violet 23 July 2010 02:24:54PM 0 points

I was asking for data about "such an ufai that it will punish in such a selective fashion".


Roko 23 July 2010 02:28:42PM 0 points

You mean my point about CEV punishing you?


SarahC 23 July 2010 03:00:19PM 2 points

If your altruism is significantly harming you personally, there's a good chance that you're doing too much of it, or doing it wrong.


Roko 23 July 2010 05:58:09PM 1 point

Or maybe that the correct solution is a win-win deal instead of a lose-win deal?


EStokes 23 July 2010 06:39:18PM* 1 point

In this vein, there is the ominous possibility that if a positive singularity does occur, the resultant singleton may have precommitted to punish all potential donors who knew about existential risks but who didn't give 100% of their disposable incomes to x-risk motivation. This would act as an incentive to get people to donate more to reducing existential risk, and thereby increase the chances of a positive singularity. This seems to be what CEV (coherent extrapolated volition of humanity) might do if it were an acausal decision-maker.1 So a post-singularity world may be a world of fun and plenty for the people who are currently ignoring the problem, whilst being a living hell for a significant fraction of current existential risk reducers (say, the least generous half).

I am confused - couldn't it just not torture people? Torture would be negative utility, while someone donating fully would be a plus. But once the person has already either donated fully or not, carrying out the torture would only be a minus in utility, and it's not an iterated prisoner's dilemma... It'd only work if people already knew it'd punish them, and in that case, haven't you screwed us over by posting this?
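
One way to make this point concrete: once donation decisions are fixed, a purely consequentialist singleton with no genuine precommitment gains nothing by carrying out the punishment. The toy numbers below are arbitrary assumptions for illustration, not anything from the thread:

    # After the donation decisions have already been made, the extra
    # donations the threat was meant to produce can no longer be influenced,
    # so only the cost of actually carrying out the torture remains.
    extra_donations_still_obtainable = 0   # the past can't be changed
    cost_of_carrying_out_torture = -100    # arbitrary negative utility

    utility_if_punish = extra_donations_still_obtainable + cost_of_carrying_out_torture
    utility_if_refrain = extra_donations_still_obtainable

    print("punish:", utility_if_punish, "refrain:", utility_if_refrain)
    # punish: -100, refrain: 0 -- the threat only has force if it can be
    # believed in advance, which is exactly the worry raised here.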


Unknowns 23 July 2010 07:58:10PM 1 point

At a minimum, he's certainly significantly increased the chances of us being tortured.


ata 24 July 2010 05:52:06AM* 0 points

In this vein, there is the ominous possibility that if a positive singularity does occur, the resultant singleton may have precommitted to punish all potential donors who knew about existential risks but who didn't give 100% of their disposable incomes to x-risk motivation. This would act as an incentive to get people to donate more to reducing existential risk, and thereby increase the chances of a positive singularity. This seems to be what CEV (coherent extrapolated volition of humanity) might do if it were an acausal decision-maker. So a post-singularity world may be a world of fun and plenty for the people who are currently ignoring the problem, whilst being a living hell for a significant fraction of current existential risk reducers (say, the least generous half).

You make the CEV sound like a crazier version of Yahweh. I can vaguely see how it would work if there were some way to precommit to that outcome (though it wouldn't be great PR if SIAI announced "Everyone who does not help us as much as they possibly can shall be tortured forever!"... and even if it were, I think it'd be a bad idea), but in the absence of any precommitment, the argument, as I understand it, is that the CEV will determine the following about smart pre-singularity humans who knew about x-risk and CEV:

  • They will think of this possibility
  • They will be able to model the CEV well enough to anticipate its response to this
  • They will update their behaviour accordingly if they know what's good for them

I'd dismiss this mostly because we can't be expected to model the CEV nearly well enough to expect this to be true, or even to assign it more than a negligible probability. If anything, the only reason a CEV would even consider this is because people had been talking about it and believing it to be non-negligibly probable, in which case we probably should stop immediately. (Edit: What Eliezer said.)

(Besides, since there's no actual precommitment, it seems the only consequential advantage is in convincing people right now that this is a possibility worth taking seriously; if CEV produces anything like a consequentialist, it won't bother actually going through with the punishment. The loss in hypothetical trustworthiness is not a problem, because this is the sort of threat that would only need to be taken seriously once in history.)

You could take this possibility into account and give even more to x-risk in an effort to avoid being punished. But of course, if you're thinking like that, then the CEV-singleton is even more likely to want to punish you... nasty. Of course this would be unjust, but is the kind of unjust thing that is oh-so-very utilitarian.

That's even crazier. Even assuming the previous reasoning was correct, what consequential advantage would there be to punishing people who contribute more out of a desire to avoid punishment? They still contributed more. In fact, I thought that was the whole point — use this weird pre-precommitment arrangement to force x-risk reducers to do more. An "incentive to get people to donate more to reducing existential risk". Even if we had enough information to justifiably believe this to be a serious possibility, what on earth would be the point of torturing people who acknowledge that threat and thus decide to give more?


Roko 23 July 2010 09:03:30PM* 0 points

So the consensus seems to be that I explained my ideas in an unclear and overly brief way in this post. I'd appreciate it if people could post as sub-comments of this comment bits that they think are poorly explained.

People could also suggest a good way of breaking the material down. I don't have a good idea of what things are common knowledge here, versus what things I picked up in various less publicized fora.

Together, we can contribute to the creation of an improved version of this post, perhaps as a series.
