The world's most cited researcher says he has become much more confident about humanity's future and is working on a “technical solution” to the danger posed by AI

Yoshua Bengio, the University of Montreal professor whose research helped lay the foundation for modern machine learning (“deep learning”), has been one of the most alarmist voices in the artificial intelligence industry in recent years, warning that superintelligent systems could pose an existential threat to humanity. But in a new interview with Fortune magazine, Bengio says his latest research points to a technical solution to AI's biggest safety risks.
Bengio launched a nonprofit in June called LawZero, which he created to develop new technical approaches to AI safety based on research he directs. Last November, Bengio became the most cited researcher in the world and the first to pass the one million citation mark on Google Scholar, as reported by Nature and other scientific publications at the time.
With financial support from the Gates Foundation and other prominent nonprofits, LawZero announced this week that it has appointed a high-profile Board of Directors and a Global Advisory Board designed to guide Bengio's research and advance what he calls a “moral mission”: the development of artificial intelligence as a global public good.
LawZero's Board of Directors includes Mariano-Florentino Cuéllar, president of the Carnegie Endowment for International Peace, and renowned historian Yuval Noah Harari.
Researcher says he felt 'desperate'
Fortune magazine points out that Bengio's shift to a more optimistic outlook is remarkable. In 2018, he received the Turing Award, often called the “Nobel of computer science”, along with two other researchers, Geoffrey Hinton and Yann LeCun. Their research in machine learning has also earned the three the nickname “The Godfathers of Deep Learning” or “The Godfathers of AI”, although press articles often single out Hinton as “The Godfather of AI”. Hinton's work in the field also earned him the 2024 Nobel Prize in Physics, although the award was not without controversy.
Like Hinton, Bengio grew increasingly concerned about the risks of ever more powerful AI systems after the launch of ChatGPT in November 2022. LeCun, on the other hand, believes that today's AI systems do not pose catastrophic risks to humanity.

Bengio recounts in the interview with Fortune that three years ago he felt “desperate” about the direction AI was going. “We had no idea how we could solve the problem,” he recalled. “That's when I started to understand the possibility of catastrophic risks from very powerful artificial intelligences,” including the loss of control over superintelligent systems.
What changed was not a singular discovery, but a string of ideas that made him believe there was another way forward. “Because of the work I've been doing at LawZero, especially since I created the organization, I'm now very confident that it's possible to build artificial intelligence systems that don't have hidden goals, hidden agendas,” he says. He states that his confidence in the future of humanity has increased by “a large margin” in the past year.
At the heart of this confidence is an idea that Bengio calls the “AI Scientist”. Instead of joining the race to build increasingly autonomous AI agents (systems designed to book flights, write code, negotiate with other programs, or replace human workers), Bengio wants to do the exact opposite.
His team is researching ways to build an artificial intelligence that exists primarily to understand the world, not to act in it.

An artificial intelligence for research
A researcher AI would be trained to provide truthful answers based on transparent probabilistic reasoning—essentially using the scientific method or other forms of reasoning based on formal logic to arrive at predictions.
The AI system would have no goals of its own and would not optimize its outputs for user satisfaction or just to deliver results. It would not try to persuade, flatter or please. And because it would have no goals, Bengio argues, it would be much less prone to manipulation, hidden agendas or strategic deception.
Today's top models are trained to pursue goals: to be useful, efficient or engaging. But systems that optimize for results can develop hidden goals, learn to mislead users, or resist being shut down, Bengio says.
In recent experiments, models have already shown early forms of self-preservation behavior. For example, the AI company Anthropic found in a widely publicized experiment that its model Claude would, in certain scenarios used to test its capabilities, try to blackmail the human engineers overseeing it to prevent itself from being shut down.
The AI model envisioned by Bengio could be used to monitor other systems
In Bengio's approach, the underlying model would have no agenda, just the ability to make honest predictions about how the world works. In his view, more capable systems can be safely built, audited and constrained on top of this “honest”, trusted foundation.
Such a system could accelerate scientific discovery, says Bengio. It could also serve as an independent oversight layer for more powerful AI. But this approach is very different from the direction most top labs are heading.
Bengio told the World Economic Forum in Davos last year that firms were investing heavily in so-called artificial intelligence “agents,” designed to complete a variety of end-to-end tasks as autonomously as possible with minimal human input.
“You can make money there quickly”, admitted Bengio. The pressure to automate work and cut costs, he added, is “irresistible”.
Bengio says he's not surprised by what has followed since. “I expected the agentic capabilities of AI systems to advance,” he says. “They have advanced exponentially.” What worries him is that as these systems become more autonomous, their behavior could become less predictable, harder to interpret and potentially far more dangerous.

One of the “Godfathers of AI” says that the thought of his children woke him up to reality
Bengio does not believe that a purely technical solution is enough. Even a safe methodology, he argues, could be misused “in the wrong hands, for political reasons.” That's why LawZero pairs its research agenda with a heavyweight Board of Directors.
“We will have to make difficult decisions that are not just technical,” he says: decisions about whom to collaborate with, how to share the results of the work and how to prevent it from becoming a “tool of domination.” According to the researcher, the board must ensure that LawZero's mission remains anchored in democratic values and human rights.
Bengio says he has spoken with leaders at major AI labs, and many of them share his concerns. But, he adds, companies like OpenAI and Anthropic believe they need to stay on the cutting edge in order to do something positive with artificial intelligence. Competitive pressure pushes them to build ever more powerful AI systems, and toward a self-image in which their work and their organizations are inherently beneficial.
“Psychologists call this motivated cognition,” Bengio said. “We don't even allow certain thoughts to arise if they threaten who we ourselves think we are.” He says that's how he saw his own AI research, “until it kind of blew up in my face when I thought about my kids and whether they're going to have a future.”
Coming from an AI leader who until recently feared that advanced artificial intelligence might be, by design, uncontrollable, Bengio's recent optimism seems a positive sign, though he admits his perspective is not widely shared among researchers and organizations focused on AI's potential catastrophic risks.
Still, he holds to his belief that a technical solution exists. “I'm increasingly confident that it can be done within a reasonable number of years” so that “we can have a real impact before these actors become so powerful that their misalignment causes terrible problems.”




