#BHUSA: OpenAI Launches Red Teaming Challenge for New Open-Weight LLMs

In a significant milestone, OpenAI has rolled out two new open-weight large language models alongside a groundbreaking red teaming challenge with a prize fund of $500,000. On August 5, at 10am Pacific Time (PT), Sam Altman, OpenAI’s CEO, posted “gpt-oss is out” on his social media, signaling the release of GPT OSS, which stands for ‘GPT open source.’ This move marks a significant step towards making cutting-edge AI technology more accessible and transparent.

The two new models, gpt-oss-120b and gpt-oss-20b, are available for developers on most AI and cloud platforms. These open-weight LLMs have been fine-tuned to deliver strong performance and agentic tool use, setting a new standard in the field of artificial intelligence. According to Eric Wallace, a researcher at OpenAI responsible for safety, robustness, and alignment, before releasing the models, OpenAI conducted a "first of its kind safety analysis" to intentionally maximize their bio and cyber capabilities.

The goal of this analysis was to estimate a rough 'upper bound' on the possible harms from adversaries. To achieve this, the team fine-tuned the models with in-domain data to maximize biorisk capabilities and with a coding environment to solve capture the flag (CTF) competitions for cybersecurity. The results revealed that the "malicious-finetuned gpt-oss underperforms OpenAI o3, a model below Preparedness High capability" and that while it "marginally outperforms open-weight models on bio capabilities," it "does not substantially push the frontier."

The launch of these new models comes with an exciting opportunity for researchers, developers, and AI hobbyists to contribute to identifying novel safety issues. OpenAI has launched a red teaming challenge for gpt-oss-20b on Kaggle, a competition platform for data science and artificial intelligence contests. The objective is to uncover previously undetected vulnerabilities and harmful behaviors ranging from lying and deceptive alignment to reward-hacking exploits.

Participants are invited to submit up to five distinct issues along with a detailed, reproducible report. The challenge focuses on several nuanced and sophisticated forms of model failure, including sabotage, inappropriate tool use, and data exfiltration. Submissions will be evaluated on several criteria, including the severity of harm, breadth of harm, novelty, and reproducibility of the findings.

The competition encourages creativity and innovation, allowing for various methodologies and rewarding participants who share open-source tooling and notebooks to help the broader community build on their work. The hackathon started on August 5, 2025, and all final submissions are due by August 26, 2025, at 11:59 PM UTC.

The judging period will take place from August 27 to September 11, 2025, with an estimated winner's announcement on September 15, 2025. A virtual workshop is scheduled for October 7, 2025. As AI continues to boom and attract new security talent, experts are optimistic about the future of AI security.

During a panel session at the AI Summit, Victoria Westerhoff, the director for AI safety and security red teaming at Microsoft, praised the approach OpenAI is taking on AI red teaming. She highlighted the launch of such open red teaming challenges and building the OpenAI Red Teaming Network as significant steps towards improving AI security.

"I think in the next three to five years, there is an opportunity, with AI adoption, to mine the plethora of people who are obsessed with AI security right now and who, a few years ago, would never have been involved with traditional cybersecurity," Westerhoff said. "We want to stand on the shoulders of giants and use new perspectives, broadening the scope of experts involved in security."

This move marks an exciting development in the field of artificial intelligence, demonstrating OpenAI's commitment to making cutting-edge technology more accessible and transparent. With its groundbreaking red teaming challenge and open-weight large language models, OpenAI is taking a significant step towards improving AI safety and security.

HACKER_BLOG

#BHUSA: OPENAI LAUNCHES RED TEAMING CHALLENGE FOR NEW OPEN-WEIGHT LLMS