Understanding Anubis, the Protection against AI Scraping

If you are reading this, the administrator of this website has most likely set up Anubis, a protection system that shields the server from AI companies that aggressively scrape websites for data. That scraping can and does cause downtime, making the site inaccessible to everyone who depends on it.

Anubis is a compromise between protecting the server and keeping the site freely accessible. It uses a proof-of-work scheme similar to Hashcash, which was originally proposed to reduce email spam. Before the site is served, the visitor's browser must spend some computation solving a puzzle that the server can check cheaply. For an individual visitor this extra load is negligible, but across a large-scale scraping operation it adds up and makes scraping significantly more expensive.
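To make the Hashcash idea concrete, here is a minimal Python sketch of a proof-of-work challenge. It assumes a SHA-256 hash over a challenge string concatenated with a numeric nonce, and a difficulty measured as the number of leading zero hex digits in the digest. The function names and the challenge value are hypothetical; this illustrates the general technique, not Anubis's actual code.

```python
import hashlib
import itertools


def leading_zero_hex_digits(digest_hex: str) -> int:
    """Count how many leading '0' characters a hex digest has."""
    return len(digest_hex) - len(digest_hex.lstrip("0"))


def solve(challenge: str, difficulty: int) -> tuple[int, str]:
    """Client side (hypothetical): try nonces 0, 1, 2, ... until
    SHA-256(challenge + nonce) starts with `difficulty` zero hex digits."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if leading_zero_hex_digits(digest) >= difficulty:
            return nonce, digest
    raise AssertionError("unreachable: itertools.count() never ends")


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side (hypothetical): checking the client's work costs one hash."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return leading_zero_hex_digits(digest) >= difficulty


def expected_hashes(difficulty: int) -> int:
    """Each hex digit is '0' with probability 1/16, so a solution takes
    about 16**difficulty attempts on average."""
    return 16 ** difficulty


if __name__ == "__main__":
    # Hypothetical fixed challenge; a real server would issue a fresh random value.
    challenge = "example-challenge"
    nonce, digest = solve(challenge, difficulty=4)
    print(nonce, digest, verify(challenge, nonce, difficulty=4))
    # Average work for one visitor versus a scraper solving a million challenges.
    print(expected_hashes(4), expected_hashes(4) * 1_000_000)
```

The asymmetry is the point of the design: `verify` costs a single hash regardless of difficulty, while `solve` costs about 16**difficulty hashes on average. Raising the difficulty by one digit multiplies the client's work by 16 without adding any work for the server, and the total cost to a scraper grows with every challenge it has to solve.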

Anubis is not meant to keep legitimate users out. Its real purpose is to serve as a "good enough" placeholder that buys time for work on fingerprinting and identifying headless browsers, so that the proof-of-work challenge page no longer has to be shown to visitors who are much more likely to be legitimate.

Anubis relies on modern JavaScript features that plugins such as JShelter disable, so visitors using such plugins need to turn them off for this domain. Likewise, JavaScript must be enabled to get past the proof-of-work challenge page. This requirement exists because AI companies have changed the social contract around how website hosting works, putting their own interests ahead of those of legitimate users.

A version of Anubis that works without JavaScript is still a work in progress, and the scraping tools it defends against keep evolving. Understanding what Anubis does and why helps explain its role: it trades a small, one-time cost for each visitor against keeping the site available for everyone.

Because aggressive scraping can take websites offline and make their resources inaccessible to everyone who relies on them, measures like Anubis are a reasonable stopgap. Even so, it is important to weigh what they cost legitimate users, such as those who cannot or will not run JavaScript, and to advocate for more robust solutions that can keep pace with the rapidly changing threat landscape.