Merging Branch 'en/replay-wo-the-repository': The Battle Against AI Scraping
If you're reading this, it's likely because the administrator of this website has taken steps to protect its server from the scourge of AI companies aggressively scraping websites. Such scraping can cause downtime, making resources inaccessible for everyone; these measures are designed to prevent that.
The protection system in place is called Anubis, which employs a Proof-of-Work scheme inspired by Hashcash. The system's goal is asymmetry: at an individual level the challenge is a small burden, but for mass scrapers issuing enormous numbers of requests the cumulative cost becomes prohibitively expensive.
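To make that asymmetry concrete, here is a minimal sketch of a Hashcash-style proof-of-work loop in TypeScript. The challenge string, the use of SHA-256, and the difficulty encoding (leading zero hex digits) are illustrative assumptions for this sketch, not Anubis's actual challenge format.

```typescript
import { createHash } from "node:crypto";

// Hashcash-style search: find a nonce such that SHA-256(challenge + nonce)
// begins with `difficulty` zero hex digits. (Illustrative only; the real
// Anubis challenge format may differ.)
function solve(challenge: string, difficulty: number): number {
  const target = "0".repeat(difficulty);
  for (let nonce = 0; ; nonce++) {
    const hash = createHash("sha256").update(challenge + nonce).digest("hex");
    if (hash.startsWith(target)) return nonce;
  }
}

// Checking a submitted nonce costs the server a single hash.
function verify(challenge: string, nonce: number, difficulty: number): boolean {
  const hash = createHash("sha256").update(challenge + nonce).digest("hex");
  return hash.startsWith("0".repeat(difficulty));
}

const nonce = solve("example-challenge", 4);
console.log(`nonce=${nonce} valid=${verify("example-challenge", nonce, 4)}`);
```

The asymmetry is the whole point: the client pays for the entire search (each extra hex digit of difficulty multiplies the expected work by sixteen), while the server verifies a solution with a single hash.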
Behind Anubis lies a deliberate hack: a "good enough" placeholder solution. Its aim is not to dissuade legitimate users but to buy time for researchers and developers working on fingerprinting and identifying headless browsers, the automated browser environments that scrapers use and that are hard to distinguish from legitimate visitors.
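That fingerprinting work is an open problem and well beyond this page, but for illustration, here is a sketch of two widely known browser-side signals. Neither check is part of Anubis, and both are trivially spoofable, which is precisely why headless detection remains hard.

```typescript
// Browser-side sketch of two well-known (and weak) headless signals.
// Real fingerprinting combines many such signals, e.g. font rendering,
// canvas output, and timing behavior, into an overall score.
function looksHeadless(): boolean {
  // WebDriver-driven browsers (including headless Chrome) are required
  // by spec to set navigator.webdriver to true.
  if (navigator.webdriver) return true;
  // Some headless configurations report an empty plugin list.
  if (navigator.plugins.length === 0) return true;
  return false;
}

console.log(looksHeadless() ? "likely automated" : "no obvious signals");
```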
However, Anubis comes with a caveat: it requires modern JavaScript features, and privacy plugins like JShelter disable exactly those features, preventing the challenge from completing. For now, users must enable JavaScript, and exempt this domain from such plugins, to pass the challenge.
The rise of AI-powered scraping has irrevocably altered the social contract around website hosting. A no-JS alternative is still a work in progress and needs significant development before it can be deployed effectively. Until then, Anubis stands as an innovative yet imperfect shield against the relentless onslaught of AI-driven scrapers.