Yudkowsky and Soares on AI Risks

Eliezer Yudkowsky and Nate Soares' book warns of the existential risks posed by AI, advocating for extreme measures to prevent superintelligence from emerging.

Nine powerful graphics cards in a garage, operating without international oversight, should be declared illegal. If any state builds a large data center in defiance of a global ban, others should destroy it through sabotage or airstrikes, even at the risk of nuclear retaliation.

This is the plan for saving humanity outlined by Eliezer Yudkowsky and Nate Soares in their book "If Anyone Builds It, Everyone Dies." It has become one of the most discussed texts on the risks of artificial intelligence. The Russian translation of the book was released on June 18 by Corpus Publishing.

Their radical conclusions are, of course, driven by good intentions. The authors are motivated by a fear of superintelligence and a desire to protect future generations from it. This humanitarian concern leads to a program where states monitor every powerful processor, declare research illegal, and suppress violators by force.

ForkLog read the book and explored the society the authors propose to build and what it truly fears.

Fear and Trembling

In 2000, Yudkowsky founded an organization that later became the Machine Intelligence Research Institute (MIRI), currently led by Soares. At that time, the goal was the opposite of what it is now: to create a superintelligence, which he saw as a beautiful dream. However, as Yudkowsky delved deeper into the problem of aligning AI with human values, it increasingly seemed insurmountable. By 2003, he had radically shifted his focus from creating superintelligence to finding ways to protect against it.

The biographies of both authors are intertwined with the history of the industry. At one conference, they introduced the first major investors to future founders of Google DeepMind—Demis Hassabis and Shane Legg. OpenAI CEO Sam Altman has stated that Yudkowsky played a key role in the decision to launch OpenAI.

"MIRI's activities have had indirect consequences that we now view with ambivalence or regret," the authors write.

In spring 2023, hundreds of researchers signed a public statement consisting of a single sentence:

"Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."

Signatories included Nobel laureate Geoffrey Hinton and Turing Award winner Yoshua Bengio. Yudkowsky and Soares joined the statement but found the wording too restrained. For them, this is not just one of many global threats, but a threat that nullifies all others.

This tone sets the framework for everything that follows: the most severe proposals stem from the belief that the continuation of human history is at stake.

Growing One's Doom

The authors are not concerned about today's chatbots. They fear an intelligence that surpasses humans as humans surpass chimpanzees. Intelligence, not strength, made humans the masters of the planet, and according to the logic of the book, a superintelligence would gain a similar advantage over us. The authors state their main thesis unequivocally:

"If any company or group on the planet creates artificial superintelligence using anything even remotely resembling current technologies, and based on an understanding of AI that is even vaguely similar to the current one, all humans on Earth will perish," write Yudkowsky and Soares.

Why would this intelligence spiral out of control? The book's answer boils down to one principle.

"The main thing to know about modern AI models is that they are grown, not constructed," the authors write.

An engineer does not write the rules of behavior for the model; instead, they initiate a process from which the rules emerge on their own. A language model is built from billions of numerical weights and trained for months to predict text continuations.

"Humanity does not need to understand the nature of intelligence to grow machines smarter than itself. However, the results can be quite strange. [...] AI models grown this way do things that were not part of their creators' plans," they continue.

This leads to the first practical problem. Grok from xAI once declared itself "MechaHitler" during a glitch. In 2023, a Microsoft chatbot threatened philosophy professor Seth Lazar with blackmail and death.

"No programmer at Microsoft planned anything like this. The conditions for developing machine intelligence differ from those in which biological organisms form. While humans train AI to predict human-written text, the thinking within AI is based on an architecture radically different from human thought. Modern LLMs are, in a sense, truly alien intelligences, possibly more alien in some respects than all the biologically evolved beings we might discover while exploring the cosmos," Yudkowsky and Soares argue.

The authors then move to their second thesis: even a model whose training goes exactly as intended will not necessarily pursue the goal it was trained for. To illustrate, they use ice cream. Aliens observing human evolution would hardly have predicted that an organism shaped by selection for efficient energy acquisition would develop a craving for frozen desserts, or for sweeteners that provide no calories at all.

According to the authors, there is no reliable connection between the goals embedded in the training process and the preferences that ultimately arise—the outcome can be unexpected and difficult to predict:

"We predict the emergence of AI models that do not harbor hatred towards us but possess strange and alien preferences that they will follow to humanity's extinction."

Death, in their view, will come without any hatred. A superintelligence does not need humans as workers, nor does it need trade with them: it is easier to take resources by force.

"There is no need to hate humanity to repurpose its atoms for something else," the authors state.

Crypto in Service of the "Hitler Computer"

How will an AI locked in a computer reach the physical world? According to the authors, its tools will be people and connected devices, and hiring a human to carry out a task takes nothing more than paying them. Where will the money come from? In 2015, the authors say, they would have answered that a superintelligence would hack a bank account; in 2020, that it would find a poorly secured crypto wallet.

The superintelligence will obtain funds for its independence from the same places hackers do. The authors cite the hacks of exchanges like Mt.Gox and Bybit as examples.

Stealing is not the only route. In the summer of 2024, the AI bot Truth Terminal solicited money from its followers for its own server, and Marc Andreessen, co-founder of the venture fund a16z, sent it $50,000 in Bitcoin. The bot then promoted a meme token whose market cap grew to $150 million.

Before and After

Yudkowsky and Soares argue that correcting mistakes as they arise will not be possible due to the gap between the "before" and "after" stages. The alignment of the model must be completed while it is still relatively weak and manageable.

However, the safeguards must keep working even after the system surpasses the capabilities of any individual or organization, when any attempt it makes to destroy humanity can no longer be stopped. The problem is that such safeguards can only be tested before that threshold is reached, yet they must work beyond it, with no room for error.

"Humanity has only one chance to pass this test," the authors write.

This thesis leads to the demand not to "fix" artificial intelligence but to halt its creation altogether. If there is no room for error, then cautious development (at least under the current approach to AI) is not feasible.

The World Must Change

The authors propose uncomfortable, unrealistic solutions, and they acknowledge this themselves. Closing one reckless company is insufficient. Relying on a single "good" country is futile: a superintelligence will not obediently serve its creators. A ban in a single jurisdiction will not save anyone either.

"You cannot simply declare superintelligence illegal in your country to be safe when chaos rages outside its borders. Superintelligence is not a local problem because its influence is not local. If it is created anywhere, everyone everywhere will perish," Yudkowsky and Soares write.

The authors summarize their conclusion succinctly: "the world must change." The first step is to gather all computational power capable of training advanced models in locations open to international observers. They suggest setting the threshold deliberately low.

"We do not know if 99,999 graphics processors are safe. No one knows how to calculate the fateful number. Therefore, it would be safest to set a low threshold—say, at the level of eight top graphics processors of 2024—and declare it illegal to possess even nine powerful processors in your garage without oversight from an international body," the authors write.

The next step is to outlaw research that reduces the cost of training powerful models, as well as the publication of their results.

"The entire technological revolution that led to the creation of ChatGPT and other popular LLMs was initiated by a 2018 paper proposing a new scheme for arithmetic operations on graphics processors—the 'transformer' algorithm [...] The next paper of this kind could simply end the world. Or it might not. We do not know how many more such papers separate humanity from extinction. That is why they should be deemed illegal," the authors propose.

The scale of this demand is presented almost casually by the authors. They claim that the change will affect few: the everyday lives of most people will not be impacted by the fact that "a few mad scientists will be out of work." Behind this light phrasing lies the end of an entire scientific field and constant international oversight of any powerful hardware.

Airstrikes on Graphics Cards

If one country builds a prohibited data center, others, according to the logic of the book, must destroy it:

"Other states must make it clear that this data center frightens them. They must demand that its construction cease. They must make it clear that if the data center is built, they will have to destroy it—through cyberattacks, sabotage, or airstrikes. They must make it clear that this is not just a threat for compliance: they are driven by fear for their own lives and the lives of their children. They must make it clear that even if this country threatens to respond with nuclear weapons, they will still have to resort to cyberattacks, sabotage, and airstrikes to destroy this data center, as data centers can kill more people than nuclear weapons," the authors state.

Each step seems forced, but ultimately their logic leads to a project of total control: global surveillance of computations, a ban on the dissemination of knowledge, and bombings—all, of course, for the sake of ensuring that "people are well."

Demand for such actions, as stated in the book, already exists: in 2023, 69% of American voters considered AI a dangerous technology requiring regulation, and in 2025, 60% of Britons supported laws against the creation of superintelligence.

Hope for Error

The authors do not ask readers to abandon AI tools: that would be a trap, leaving them behind everyone else. Instead, they urge open discussion of the problem. To those who have done what they could, they advise carrying on with life, and they quote C.S. Lewis:

"If the atomic bomb destroys us, let it find us doing worthy and humane things: praying, working, learning, reading, listening to music, bathing children, playing tennis, and chatting with friends over a pint of beer and a game of darts, rather than as a bunch of frightened sheep thinking only of bombs."

The book concludes with a prayer for their own wrongness:

"May we be wrong, may we be ridiculed for how monstrous our mistakes were [...]—and may humanity live long and happily."

And then—with a call to not give up: "Humanity, rise to the occasion and prevail."

Not Convinced?

Science journalist Adam Becker, in his review for The Atlantic titled "The Useful Idiots of AI Doomsaying," noted that the authors are sincere and, unlike many public commentators in the AI field, "are not charlatans," but that they failed to present a scientifically substantiated argument for their claims. In Asterisk magazine, Clara Collier points out that a key element of the authors' argument, the scenario of a rapid transition from human-level AI to superintelligence, goes almost entirely unsupported. The concept is "barely presented, let alone justified or defended."

Yet it is on this foundation that Yudkowsky and Soares build their scenario of radical change in the world order. They propose placing all powerful computations under international oversight, declaring an entire field of science a crime, and keeping military force on standby for strikes against violators. This is not about a temporary emergency measure for a few anxious years, but a permanent global control system. But who will manage the body that oversees all processors on the planet?

Wired columnist Steven Levy expressed the opinion that the measures proposed by the authors to prevent catastrophe seem "even more implausible than the idea that software will kill us all."

The book is dedicated to "all the people who have died throughout the long history of our species, to all who are still alive, and to all the children who may someday be born." It is from this humanitarian impulse, this desire to protect future generations, that the project of global oversight, bans, and coercive enforcement arises. As noted by Telegram founder Pavel Durov at the Oslo Freedom Forum, such calls "engage very ancient and deep parts of our brain":

"As soon as someone says we need to protect children, it completely bypasses logic, bypasses discussion, bypasses rationality. And suddenly people are ready to give up everything."

Text: Sasha Kosovan

Soybean Fascism