Protecting Websites from AI Scrapers
As you read this article, you're likely wondering why your favorite website is temporarily unavailable or slow to load. The truth is, many websites are being protected by a clever yet contentious system called Anubis.
Anubis is an anti-scraper technology designed to prevent AI companies from aggressively scraping websites for data and resources. While this may sound simple, understanding how it works requires some technical know-how.
The administrators of these affected websites have set up Anubis as a way to deter mass scrapers, who can overwhelm servers with their relentless requests. This compromise system uses a Proof-of-Work scheme similar to Hashcash, which was initially proposed to reduce email spam.
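To make the Proof-of-Work idea concrete, here is a minimal Hashcash-style sketch. It is not Anubis's actual implementation: the use of SHA-256, the "leading zero hex digits" difficulty rule, and the names `solve` and `verify` are all assumptions chosen for illustration.

```python
import hashlib


def _digest(challenge: str, nonce: int) -> str:
    # Hash the challenge string together with the candidate nonce.
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: try nonces until the hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while not _digest(challenge, nonce).startswith(prefix):
        nonce += 1
    return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the submitted nonce."""
    return _digest(challenge, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    # Hypothetical challenge value; a real server would issue a fresh one per visitor.
    nonce = solve("example-challenge", 3)
    print(nonce, verify("example-challenge", nonce, 3))
```

The asymmetry is the point of the scheme: the client has to search through many nonces, while the server checks the answer with one hash.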
The idea behind Anubis is that at the individual level, the additional load caused by this system is negligible. However, when applied en masse, it becomes a significant obstacle for AI companies to overcome. By making web scraping more expensive and time-consuming, Anubis aims to discourage these aggressive scrapers.
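The scaling argument can be put into rough numbers. The sketch below assumes a difficulty rule of `difficulty` leading zero hex digits (so each attempt succeeds with probability 1/16^difficulty) and, as a simplification, that a scraper must solve one challenge per request. Both the rule and the figures are illustrative assumptions, not measurements of Anubis.

```python
def expected_hashes(difficulty: int) -> int:
    # Each hex digit has 16 possible values, so on average 16**difficulty
    # attempts are needed before a hash with that many leading zeros appears.
    return 16 ** difficulty


def expected_seconds(difficulty: int, requests: int, hashes_per_second: float) -> float:
    # Total expected work across all requests, divided by the (assumed) hash rate.
    return expected_hashes(difficulty) * requests / hashes_per_second
```

Under these assumptions, a difficulty of 4 costs a single visitor about 65,536 hashes on average, which a modern device finishes almost instantly. A scraper issuing ten million requests would need roughly 6.6 × 10^11 hashes for the same pages.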
Anubis is admittedly an imperfect solution. Its real purpose is to serve as a "good enough" placeholder so that more development time can go into fingerprinting and identifying headless browsers (e.g., via how they render fonts). With that in place, the proof-of-work challenge page would no longer need to be shown to users who are much more likely to be legitimate.
One crucial aspect of Anubis is its reliance on modern JavaScript features. Plugins like JShelter, which disable these features for security reasons, can prevent users from accessing the website's content. Therefore, if you plan to visit a site protected by Anubis, please disable any such plugins or browser extensions.
Unfortunately, this means that users must enable JavaScript in their browsers to get past the challenge. This is necessary because AI companies have changed the social contract around how website hosting works, forcing site operators to adopt measures like Anubis to prevent abuse. A no-JS solution is still a work in progress, so for now it's important for users to be aware of these requirements.
In conclusion, Anubis may seem like an inconvenient necessity, but its implementation serves as a reminder that the online world is constantly evolving. As we navigate this complex digital landscape, it's crucial to stay informed about the measures being taken to protect our online resources and experiences.