Secrets Sprawl in Public Repos Is Worse Than Ever, Finds New Report

A recent report by security firm GitGuardian has revealed that the unintended exposure of credentials such as API keys and passwords, known as "secrets sprawl," has worsened significantly during 2024. The report, titled "The State of Secrets Sprawl 2025," found an alarming increase of 25% in secrets found in public GitHub code repositories compared to the previous year.

The researchers at GitGuardian scanned public GitHub activity and anonymized customer data to detect nearly 23.8 million new hardcoded secrets in public GitHub commits alone during 2024. This staggering number suggests an increasing trend in sensitive credentials being leaked from development environments, posing a significant threat to organizations and individuals alike.

The Rise of Generic Secrets

A big contributor to the increase in secrets sprawl is the rise of "generic" secrets, which, unlike API keys or OAuth tokens, lack a recognizable format. According to the report, these are often hardcoded passwords, database connection strings, custom authentication tokens, and encryption keys. They accounted for 58% of detected secrets in 2024, up from 49% in 2023.

The report highlights that automated scanning tools, including GitHub's own protection mechanisms, often miss generic secrets because these lack a recognizable format to match against. Tools like GitHub's Push Protection have nonetheless shown promise in detecting and preventing leaks of credentials that follow known patterns.
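To illustrate why format-based scanning struggles with generic secrets, here is a minimal Python sketch. It is not how GitGuardian or GitHub detect secrets. The only real-world detail is the widely documented "AKIA" prefix format of AWS access key IDs; the keyword list, the 8-character minimum, and the entropy threshold are assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Structured secret: AWS access key IDs follow a well-known prefix format.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

# Generic secret (hypothetical heuristic): a quoted value of 8+ characters
# assigned to a name containing a suspicious keyword.
GENERIC_ASSIGNMENT = re.compile(
    r"(?i)\b(\w*(?:password|passwd|secret|token)\w*)\s*[:=]\s*['\"]([^'\"]{8,})['\"]"
)


def shannon_entropy(value: str) -> float:
    """Return the Shannon entropy of the string, in bits per character."""
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan_line(line: str, min_entropy: float = 3.0) -> list[str]:
    """Return labels for suspected secrets found in one line of text."""
    findings = []
    if AWS_KEY_ID.search(line):
        findings.append("aws_access_key_id")
    match = GENERIC_ASSIGNMENT.search(line)
    # min_entropy is an assumed cutoff to skip placeholder-like values.
    if match and shannon_entropy(match.group(2)) >= min_entropy:
        findings.append("generic_secret")
    return findings


if __name__ == "__main__":
    for sample in (
        'aws_key = "AKIAIOSFODNN7EXAMPLE"',   # AWS's documented example key ID
        'db_password = "xK9#mQ2$vL7pR4"',     # no recognizable format
        'password = "aaaaaaaa"',              # low-entropy placeholder
    ):
        print(sample, "->", scan_line(sample))
```

A scanner that relies only on the first pattern would never flag the second sample. The keyword-and-entropy heuristic catches it, but heuristics of this kind trade missed secrets for false positives.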

Leaked Secrets Beyond GitHub

The report found secrets leaked not only in public GitHub repositories but also in collaboration and project management tools such as Slack, Jira, and Confluence. More incidents in these tools were considered critical than in GitHub, a difference often attributed to less security-aware employees and to these tools having fewer built-in safeguards.

Additionally, researchers scanned public Docker Hub images at scale and found over 100,000 secrets, including AWS and GCP keys, some of them belonging to Fortune 500 companies. The lack of a partner notification system for secret exposure on Docker Hub was found to contribute significantly to this issue.

The Longevity of Leaked Secrets

Another concerning finding in the report is that many exposed secrets stay active long after they have been published: 70% of secrets detected in 2022 were still active in 2024. This points to ineffective credential lifecycle management, particularly for non-human service accounts, as a major culprit.
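As a rough illustration of what lifecycle management means in practice, the following Python sketch flags credentials that have not been revoked and are past a rotation deadline. It is a hypothetical example, not something described in the report: the Credential record, the account names, and the 90-day maximum age are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Credential:
    name: str              # hypothetical identifier, e.g. a service account key
    created: date          # when the credential was issued
    revoked: bool = False  # True once the credential has been rotated out


def overdue_for_rotation(creds, today, max_age=timedelta(days=90)):
    """Return names of active credentials older than max_age (assumed policy)."""
    return [
        c.name
        for c in creds
        if not c.revoked and today - c.created > max_age
    ]


if __name__ == "__main__":
    inventory = [
        Credential("ci-deploy-key", date(2022, 6, 1)),
        Credential("old-api-token", date(2022, 3, 15), revoked=True),
        Credential("reporting-service", date(2024, 5, 1)),
    ]
    print(overdue_for_rotation(inventory, today=date(2024, 6, 1)))
```

A check like this only works if an organization keeps an inventory of its credentials in the first place, which is often the missing piece for service accounts.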

The Role of Machine Learning

GitGuardian used machine learning to improve secret hunting for its new report. Previously, the team erred on the side of caution with its secrets detection engine, rejecting any string where there was doubt about whether it was a secret. The new ML models made the researchers more confident in validating less-structured secrets.

GitHub and GitLab's Efforts

Both GitHub and GitLab have been actively working to address secrets sprawl. GitHub has introduced tools like Secret Protection to help detect and prevent secret leaks, and it uses AI in Copilot secret scanning to handle the nuanced and varied structures of generic passwords.

GitLab has also implemented its own solution, GitLab Secret Push Protection, to fulfill a similar objective. These efforts demonstrate the industry's growing recognition of the importance of addressing secrets sprawl and the need for robust security measures in software development.

Conclusion

The report "The State of Secrets Sprawl 2025" serves as a stark reminder of the gravity of secrets sprawl in public repositories. The increasing trend of sensitive credentials being leaked poses significant risks to organizations and individuals alike. As the industry continues to evolve, it is essential that we prioritize security measures to prevent such incidents.

The report is available for download now, offering valuable insights into the current state of secrets sprawl and potential solutions to mitigate its impact.