On Closed-Door AI Safety Research
Epistemic status: Based on multiple accounts, I'm confident that frontier labs keep some safety research internal-only, but I'm much less confident about the reasons underlying this. Many benign explanations exist and may well suffice, but I wanted to explore other possible incentives and dynamics that may come into play at various levels.
I've tried to gather information from reliable sources to fill my knowledge and experience gaps, but the post remains speculative in places. There may be very little time in which to steer AGI development towards a better outcome for the world, and an increasing number of organisations (including frontier labs themselves) are investing in safety research to try to accomplish this.
However, without the right incentive structures and collaborative infrastructure in place, some of these organisations (especially frontier labs) may not publish their research consistently, leading to slower overall progress and increased risk. Beyond more benign reasons such as time costs and incremental improvements, I argue there may also exist incentives that result in safety hoarding: AI labs choosing not to publish important frontier safety research straight away for reasons related to commercial gain (e.g. PR, regulatory concerns, marketing).
Whatever the underlying reasons, keeping safety research internal likely results in duplicated effort, both across labs' safety teams and among external researchers, and introduces the risk of other labs pushing the capability frontier forward without the benefit of these proprietary safety techniques.
This points to a need for organisations like the Frontier Model Forum, whose goal is to facilitate research-sharing across both competitors and external research organisations, ensuring that teams pursuing vital safety work have access to as much information as possible and hopefully boosting the odds of novel research outputs and overall safer models.
Why might decision-makers at AI labs (as distinct from safety researchers themselves) choose to hoard safety research rather than publish it straight away?
NB: I'm leaving obviously dual-use safety research (e.g. successful jailbreaks or safeguard weaknesses) out of scope for this discussion, as I think there are reasonable infohazard concerns about publishing this type of research in the public domain.
What other factors might lead to labs not publishing safety research? When can we make a case for not publishing certain research? What problems may arise if labs hoard safety research, and what possible upsides might there be?
The Problem with Incentives and Collaboration
Disclaimer: This section contains some information I learned through a combination of searching online for details of labs' publication policies, and asking current and former employees directly. As a result of time and contact constraints, this isn't perfectly thorough or balanced, and some information is more adjacent to the topic than directly related.
As noted above, many organisations (including frontier labs) are investing in safety research, but without the right incentive structures and collaborative infrastructure in place, some may not publish that research consistently, leading to slower overall progress and increased risk.
Examples from Other Industries
Autonomous vehicle (AV) companies are reluctant to share crash data. Sandhaus et al. (April 2025) investigated why AV crash data was often kept within companies, and concluded that a previously unknown barrier to data sharing was the hoarding of AV safety knowledge as a competitive edge. A few of the proposed solutions in this study are also interesting with respect to the AI safety hoarding problem.
This constitutes an example of safety hoarding in the AV industry, and I suspect its effects were felt earlier in AV development than in general-purpose AI because (a) the danger of unsafe self-driving cars is less of a conceptual leap than AGI risk, so demand for safety is already high among the buying public; (b) the narrow domain invites tighter regulation faster than general-use AI does; and (c) AV companies are currently prevented from deploying "capabilities work" until they can demonstrate safety, which is not the case for LLMs.
The Need for Open Collaboration
While there are certain cases in which publishing restrictions might be the right move (dual-use research and infohazards), without regulatory insight into frontier labs' publishing policies and the state of their internal research, those outside the labs cannot know the full extent of the safety work being done behind closed doors.
This makes coordinating this vital work difficult, and without mechanisms in place to help, teams across organisations may waste time, money, and compute on work that is being duplicated in many places.
Facilitating Open Sharing
Finding ways to facilitate open sharing of safety work across labs, governmental institutions, academia, and external organisations seems like an important problem to address: doing so would lift the global floor of AI safety research and reduce the risks of this research being kept proprietary.
Thanks to Marcus Williams, Rauno Arike, Marius Hobbhahn, Simon Storf, James Peters-Gill, and Kei Nishimura-Gasparian for feedback on previous drafts of this post.