**AI's Unpaid Debt: How LLM Scrapers Destroy the Social Contract of Open Source**

The open source movement has revolutionized the way we develop and share software, music, and art. By releasing their work under permissive licenses, creators have empowered others to build upon and modify their creations, fostering a culture of collaboration and innovation. However, a recent phenomenon threatens to undermine this social contract: LLM (Large Language Model) scrapers that hoover up vast amounts of open source data without permission or compensation.

These AI-powered content thieves are not just taking from individuals; they're also exploiting the very communities that have made open source possible. By stripping provenance from contributions and treating them as copyright-free training data, LLM vendors are effectively creating a new class of intellectual property, one that benefits big tech at the expense of creators.

So, what exactly is copyleft? In simple terms, it's a way to license a work under open source principles while ensuring that any modifications or derivatives remain freely available under the same terms. This approach preserves the openness and collaboration that define the open source ethos. By contrast, permissive licenses allow anyone to use, modify, and distribute a work with minimal conditions, typically requiring only that attribution and the license notice be preserved.
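The licensing distinction above is machine-readable in practice: many projects mark each source file with an SPDX-License-Identifier tag, exactly the provenance metadata that scraping discards. As a minimal sketch (the directory path and the short license lists are illustrative assumptions, not the full SPDX catalog), here is how one might tally copyleft versus permissive declarations in a source tree:

```python
import os
import re

# Match the conventional machine-readable license tag near the top of a file.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([\w.+-]+)")

# Small illustrative subsets -- the real SPDX license list is far longer.
COPYLEFT = {"GPL-2.0-only", "GPL-3.0-or-later", "LGPL-2.1-only", "AGPL-3.0-only"}
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-3-Clause", "ISC"}

def classify_tree(root):
    """Tally copyleft vs. permissive SPDX tags found in a source tree."""
    counts = {"copyleft": 0, "permissive": 0, "other": 0}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    head = f.read(2048)  # SPDX tags sit near the top by convention
            except OSError:
                continue  # unreadable file; skip it
            match = SPDX_RE.search(head)
            if not match:
                continue
            lic = match.group(1)
            if lic in COPYLEFT:
                counts["copyleft"] += 1
            elif lic in PERMISSIVE:
                counts["permissive"] += 1
            else:
                counts["other"] += 1
    return counts
```

A scraper could run exactly this kind of check before ingesting a repository; the point of contention in this article is that, today, most do not.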

The benefits of copyleft are numerous. It has enabled the development of powerful software like the Linux kernel, which powers everything from Android devices to supercomputers. Copyleft has also fueled innovation in browser engines (both WebKit and Blink descend from KDE's LGPL-licensed KHTML) and in desktop environments like GNOME and KDE. The open source stack has become so robust that Valve now promotes the Linux desktop as a viable gaming platform, letting players run Windows games without Microsoft's operating system.

However, LLM scrapers have disrupted this delicate balance. By incorporating copyleft data into their models, they're breaking the covenant between creators and users. The AI companies are not sharing alike; instead, they treat their models' outputs as if the underlying contributions were unattributed, public domain works. This practice strips creators of their intellectual property rights and undermines the very principles that made open source so successful.

But why should we care? For one, LLM piracy hurts communities that rely on voluntary contributions. When contributors realize their work is being taken without compensation or attribution, they naturally withdraw their efforts. This shift is already visible on Stack Overflow, where the volume of new posts has reportedly dropped by roughly half since ChatGPT's release.

Sean O'Brien, founder of the Yale Privacy Lab, highlights another alarming consequence: "Now those same corporations are using that wealth and compute to train opaque models on the very codebases that made their existence possible, and threatening the legal structures, such as reciprocal or copyleft licenses like GNU GPL, by labeling all the outputs of genAI chatbots public domain."

It's time for contributors to ask themselves: Does it make sense to keep handing our work to big tech LLMs without compensation? Is this a fair exchange of value? If you believe in the open source spirit, consider supporting creators like me who are fighting to preserve the social contract. Share this article on Mastodon or follow me there for more updates.

As Fobazi Ettarh's concept of "vocational awe" reminds us, many contributors share their work out of a sense of duty and love for the community. However, when AI companies exploit these efforts without permission or compensation, they're essentially taking advantage of this generosity. It's time to reclaim our intellectual property rights and redefine the terms of collaboration in the digital age.