A Potential Path to Safer AI Development

Imagine you're in a car with your loved ones, following an unfamiliar road up a spectacular mountain range. The problem? The road ahead is newly built, shrouded in fog, and lacking both signposts and guardrails. The farther you go, the clearer it becomes that you might be the first ones ever to drive this route. To either side, you catch glimpses of precipitous slopes.

This scenario echoes the current trajectory of AI development – an exhilarating but unnerving journey into the unknown where we could easily lose control. Since the 1980s, I've been actively imagining what this technology has in store for humanity's future, and I have contributed many of the advances that form the basis of the state-of-the-art AI applications we use today.

I've always seen AI as a tool for helping us find solutions to our most pressing problems, including climate change, chronic diseases, and pandemics. However, my perspective completely changed in January 2023, shortly after OpenAI released ChatGPT to the public. It wasn't the capabilities of this particular AI that worried me, but rather how far private labs had already progressed toward Artificial General Intelligence (AGI) and beyond.

Since then, even more progress has been made, as private companies race to significantly increase their models' capacity to take autonomous action. It is now a commonly stated goal among leading AI developers to build AI agents that can surpass and replace humans.

This trajectory raises serious concerns: as the capabilities and agency of AI systems increase, so does their potential threat to public safety, and it is unchecked AI agency that poses the greatest such threat.

So, my team and I are forging a new direction called "Scientist AI." It offers a practical and effective, but also more secure, alternative to the current uncontrolled, agency-driven trajectory. Scientist AI would be built on a model that aims to understand the world more holistically. This model might encompass, for instance, the laws of physics or what we know about human psychology.

It could then generate a set of conceivable hypotheses that may explain observed data and justify predictions or decisions. Its outputs would not be shaped to imitate or please humans, but would instead reflect an interpretable causal understanding of the situation at hand.

Basing Scientist AI on a model that is not trying to imitate what a human would do in a given context is an important ingredient in making the AI more trustworthy, honest, and transparent. It could be built as an extension of current state-of-the-art methods based on internal deliberation with chains of thought, turned into structured arguments.
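As a toy illustration of this hypothesis-weighing idea (my own sketch with made-up hypotheses and data, not code from the Scientist AI project), the Python snippet below scores each candidate explanation by how well it accounts for the observations and combines them into a single probability-weighted prediction.

```python
# Toy illustration only: three candidate hypotheses about a coin, each assigning
# a probability to "heads". We weigh them by how well they explain observed
# flips, then average their predictions (standard Bayesian model averaging).
hypotheses = {
    "fair coin": 0.5,
    "heads-biased coin": 0.8,
    "tails-biased coin": 0.2,
}
prior = {name: 1 / len(hypotheses) for name in hypotheses}  # no initial preference

observations = ["H", "H", "T", "H"]  # made-up data

def likelihood(p_heads: float, data: list[str]) -> float:
    """Probability of the observed flips under a given hypothesis."""
    prob = 1.0
    for flip in data:
        prob *= p_heads if flip == "H" else 1 - p_heads
    return prob

# Posterior over hypotheses: prior times likelihood, renormalized.
unnormalized = {name: prior[name] * likelihood(p, observations)
                for name, p in hypotheses.items()}
total = sum(unnormalized.values())
posterior = {name: weight / total for name, weight in unnormalized.items()}

# Prediction for the next flip: each hypothesis's prediction, weighted by how
# plausible that hypothesis is given the data.
p_next_heads = sum(posterior[name] * hypotheses[name] for name in hypotheses)
print(posterior)
print(f"P(next flip is heads) = {p_next_heads:.3f}")
```

The point of the pattern is that every prediction comes with an explicit, inspectable set of weighted explanations rather than a single opaque answer.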

Crucially, because completely minimizing the training objective would deliver the uniquely correct and consistent conditional probabilities, the more computing power you give Scientist AI to minimize that objective during training or at run-time, the safer and more accurate it becomes.
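To unpack that claim with a standard identity (the notation below is mine, not the project's): write p(y | x) for the true conditional distribution and q(y | x) for the model's; the expected log-loss then decomposes as follows.

```latex
% Expected log-loss (cross-entropy) of a predictive model q, with p the true
% data-generating distribution:
\mathbb{E}_{x,\,y \sim p}\bigl[-\log q(y \mid x)\bigr]
  \;=\; \mathbb{E}_{x}\Bigl[ H\bigl(p(\cdot \mid x)\bigr)
        \;+\; D_{\mathrm{KL}}\bigl(p(\cdot \mid x)\,\Vert\,q(\cdot \mid x)\bigr) \Bigr]
% The entropy term does not depend on q, and the KL term is nonnegative and
% zero only when q(. | x) = p(. | x). The loss is therefore uniquely minimized
% by the true conditional probabilities.
```

Under this reading, extra compute spent pushing the loss toward its minimum can only move the model's probabilities closer to the truth, rather than making it better at pursuing some goal of its own.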

We think Scientist AI could be used in three main ways:

1. **Guardrail against self-preservation**: By double-checking the actions of highly capable agentic AIs before they can perform them in the real world, Scientist AI would protect us from catastrophic results, blocking any action whose estimated risk exceeds a predetermined threshold (a minimal sketch of this check follows the list).
2. **Honest and justified explanatory hypotheses**: Scientist AI would ideally generate honest and justified explanatory hypotheses. As a result, it could serve as a more reliable and rational research tool to accelerate human progress in fields such as biology, materials science, and chemistry.
3. **Safe design of powerful AI models**: As a trustworthy research and programming tool, Scientist AI could help us design safe human-level intelligence, and even safe Artificial Super Intelligence (ASI). This may be the best way to guarantee that a rogue ASI is never unleashed on the world.
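To make the first use concrete, here is a minimal sketch of such a guardrail in Python. It assumes a hypothetical `estimate_harm_probability` query into Scientist AI; the function name, the stand-in return value, and the 0.01 threshold are illustrative choices of mine, not details from the proposal.

```python
RISK_THRESHOLD = 0.01  # illustrative policy choice, not a value from the proposal

def estimate_harm_probability(action: str, context: str) -> float:
    """Stand-in for a query to Scientist AI: the real system would return its
    estimated probability that carrying out `action` in `context` causes serious
    harm. A fixed toy value is returned here so the sketch runs end to end."""
    return 0.003  # made-up number for illustration only

def guardrail(action: str, context: str) -> bool:
    """Allow the agentic AI's proposed action only if its estimated risk is acceptable."""
    p_harm = estimate_harm_probability(action, context)
    if p_harm >= RISK_THRESHOLD:
        print(f"Blocked: estimated P(harm) = {p_harm:.4f} >= {RISK_THRESHOLD}")
        return False
    print(f"Allowed: estimated P(harm) = {p_harm:.4f} < {RISK_THRESHOLD}")
    return True

# Example: the agentic AI proposes an action; the guardrail vets it before execution.
guardrail("send the drafted email to the client", "customer-support workflow")
```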

This approach provides a new direction for researchers, developers, and policymakers: focusing on generalist AI systems that do not act like the agents industry is aiming for today, agents which already show many signs of deceptive behavior.

Of course, other scientific projects need to emerge to develop complementary technical safeguards. This is especially true in the current context, where most countries are more focused on accelerating the technology's capabilities than on regulating it meaningfully and creating societal guardrails.