The Battle Against AI Scraping
As you gaze upon this page, you may wonder what brought you here. You are seeing this because the administrator of this website has taken proactive measures to protect their server from a growing threat: aggressive AI companies scraping websites for valuable data. This endeavor is called Anubis, a compromise solution designed to discourage mass scrapers without completely crippling access to legitimate users.
Anubis is built upon a Proof-of-Work scheme reminiscent of Hashcash, a method once proposed for reducing email spam. On an individual scale, the added load is barely noticeable; however, when mass scrapers hammer a website, the cumulative cost adds up for the scraper and makes scraping at that scale far less viable.
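To make the cost asymmetry concrete, here is a minimal sketch of a Hashcash-style proof of work: the client searches for a nonce whose SHA-256 digest, combined with a server-issued challenge string, begins with a required number of zero hex digits. The names used here (`solve`, `challenge`, `difficulty`) are illustrative assumptions and are not Anubis's actual API.

```typescript
// Minimal Hashcash-style proof-of-work sketch (illustrative, not Anubis's code).
import { createHash } from "node:crypto";

// Find a nonce such that sha256(challenge + nonce) starts with `difficulty`
// zero hex digits. The server only needs one hash to verify the result.
function solve(challenge: string, difficulty: number): number {
  const prefix = "0".repeat(difficulty);
  for (let nonce = 0; ; nonce++) {
    const digest = createHash("sha256")
      .update(challenge + nonce)
      .digest("hex");
    if (digest.startsWith(prefix)) {
      return nonce;
    }
  }
}

// One visitor pays this cost once per challenge; a scraper fetching millions
// of pages pays it millions of times over.
const nonce = solve("example-challenge", 4);
console.log("found nonce:", nonce);
```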
The primary aim of Anubis is not to eliminate web scraping entirely but rather to serve as a placeholder solution that buys time for more critical work: fingerprinting and identifying headless browsers. Once that is in place, the proof-of-work challenge page no longer needs to be presented to users who are much more likely to be legitimate, and can be reserved for clients that look automated.
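As a purely illustrative example of the kind of signal such fingerprinting can use (not a description of what Anubis actually checks), browsers driven through WebDriver are required by the specification to expose a `navigator.webdriver` flag, which a page can inspect before deciding whether to show the challenge:

```typescript
// Illustrative sketch of one well-known headless-browser signal; this is an
// assumption for demonstration, not how Anubis fingerprints clients.
function looksAutomated(): boolean {
  // Honest WebDriver-driven automation exposes this standard flag, though
  // determined scrapers can patch it out.
  return typeof navigator !== "undefined" && navigator.webdriver === true;
}

if (looksAutomated()) {
  console.log("likely an automated browser: present the proof-of-work challenge");
} else {
  console.log("likely a regular browser: skip the challenge");
}
```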
However, Anubis comes with its own set of requirements. It relies on modern JavaScript features that plugins such as JShelter disable. To get past the Anubis hurdle, you will unfortunately need to disable JShelter or other similar tools for this domain.
This website is running Anubis version 1.20.0 and requires users to enable JavaScript to get past the challenge. The reason lies in a shift in the social contract around how web hosting works, driven by AI companies' interests. A no-JS solution is being worked on, but for now, enabling JavaScript remains mandatory.
As we navigate these evolving digital landscapes, it's crucial to acknowledge and address threats such as AI scraping, protecting both legitimate users and the websites they rely on.