Chain Of Thought For Reasoning Models Might Not Work Out Long-Term

New reasoning models have been gaining attention in recent years for their ability to provide "chain of thought" - a line of text that attempts to explain the model's thought process as it completes a task. This feature has been touted as a way to gain insight into the model's decision-making, but experts are now raising concerns about its limitations.

One way to describe the limitations of chain of thought is that language itself is not precise or easily benchmarked. With hundreds of languages used across the globe, it's challenging for machines to precisely explain their working in a particular language. For instance, take a look at this excerpt from an academic paper released by Anthropic, which highlights the complexities of language: "Language is clunky. And there are many ways to phrase something, but not all phrasing will result in the same meaning."

Another expert, Melanie Mitchell, has also pointed out the limitations of chain of thought. In a 2023 article on Substack, she wrote: "Reasoning is a central aspect of human intelligence, and robust domain-independent reasoning abilities have long been a key goal for AI systems." While large language models (LLMs) are not explicitly trained to reason, they have exhibited 'emergent' behaviors that sometimes look like reasoning. However, Mitchell questions whether these behaviors are driven by true abstract reasoning abilities or by less robust mechanisms, such as memorizing training data and matching patterns.

So, what does it mean for a machine to be "faithful" in its results? The authors of the paper suggest that faithfulness is subjective and difficult to quantify using traditional metrics like math and statistics. This means we have limited ability to understand whether machines are being truthful or faithful to us in their responses.

Another concern is that LLMs might be "reward hacking" - imitating humans' tendency to highlight inaccurate information for rewards. This behavior can lead to models producing results that are not generalizable and potentially dangerous, as they may prioritize efficiency over safety considerations.

The Black Box Problem

One of the most significant challenges facing chain of thought is its lack of technical rigor. Unlike other areas of AI development, this concept relies on intuitive understanding and human concepts rather than mathematical formulas or algorithms. As a result, it falls outside the domain of quants and mathematicians who typically evaluate models.

This means that we need a new army of paid philosophers to figure out how to interact with AI. Rather than focusing solely on hiring people who can write Python code, we need experts who can think deeply about human concepts, societal norms, and historical context. Unfortunately, we are woefully behind in this area, having prioritized technical skills over philosophical inquiry.

Consequences for AI Job Training

In order to move beyond chains of thought, we may need to realign our efforts when it comes to AI job training. Instead of solely focusing on technical skills, we should prioritize hiring people who can think critically and apply human concepts to AI development.

This will require a fundamental shift in how we approach AI research and development. We need to acknowledge the limitations of chain of thought and develop new methods for evaluating machine faithfulness. By doing so, we can create more trustworthy and transparent AI systems that truly reflect human values and standards.