Extracting Content from an LCP "Protected" ePub
Cory Doctorow once said, "Any time that someone puts a lock on something that belongs to you but won't give you the key, that lock's not there for you." But what happens when the lock does hand you the key? That is the case with LCP (Readium's Licensed Content Protection), a DRM scheme used by some eBook publishers. Because a reading app has to decrypt the book in order to display it, we can extract the unencrypted content from an LCP-protected ePub without attacking the encryption itself.
One popular app that reads LCP-protected books is Thorium, an Electron web app. It runs on your own computer, so nothing here touches anyone else's machine. Because the app is debuggable, we can inspect the content it has already decrypted without compromising any technical controls or revealing sensitive information.
Setting Up Remote Debugging in Thorium
To start, we need Thorium running with remote debugging enabled on port 9223 (Electron apps generally accept Chromium's --remote-debugging-port switch at launch). With the book we want to read open in Thorium, we go to http://localhost:9223/ in Chrome. This page lists all the Thorium windows, including the one for our book. We click the link for our book and then navigate through the book as usual.
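The page at http://localhost:9223/ is the standard Chromium DevTools discovery page, and the same window list is normally available as JSON at /json/list. As a minimal sketch, assuming Thorium is already listening on port 9223, the following Python lists the open windows. The function names page_targets and list_windows are hypothetical, chosen only for this illustration.

```python
import json
from urllib.request import urlopen

# Port 9223 comes from the text above; /json/list is the usual
# Chromium DevTools discovery endpoint (assumed to be exposed here).
DEBUG_LIST_URL = "http://localhost:9223/json/list"


def page_targets(raw_json: str) -> list[dict]:
    """Return the title and URL of every debuggable target of type 'page'."""
    targets = json.loads(raw_json)
    return [
        {"title": t.get("title", ""), "url": t.get("url", "")}
        for t in targets
        if t.get("type") == "page"
    ]


def list_windows(endpoint: str = DEBUG_LIST_URL) -> list[dict]:
    """Fetch the target list from a locally running debug endpoint."""
    with urlopen(endpoint, timeout=5) as response:
        return page_targets(response.read().decode("utf-8"))
```

Calling list_windows() while Thorium is running should return one entry per window. The entry for the book is the same one we would click on in Chrome.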
Next, we open up the debug window by clicking on the three dots in the top right corner of the Thorium app. In the debug window, we should see the text and images pop up. The "Content" tab will show us the decrypted files, including images that are full-resolution copies of their original ePub counterparts.
We can save these images by right-clicking on them in the developer tools and selecting "Save as" or "Save image as." We can also copy the filenames of the CSS, fonts, and text files and request them from Thorium in the console, which returns them already decrypted.
Extracting Metadata and HTML
The metadata files, such as the NCX and OPF, decrypt without issue. The CSS is also directly readable and can be printed to the console, although it is larger than the original because of the directives Thorium injects.
The HTML of the book is visible on the Content tab, but it includes additional CSS and JS that weren't present in the original ePub. Once we get to the body of the HTML, we see plain old ePub content. We can use fetch in the console to extract the XHTML files, just as with the images.
If you've unzipped the original ePub, you'll see the internal directory structure. Simply add the extracted files into this exact structure, zip them, and rename the .zip file to .epub. And that's it – we now have a DRM-free copy of our purchased book!
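One detail matters when zipping: the ePub container format expects the mimetype file to be the first entry in the archive and to be stored uncompressed, which many zip tools do not do by default. A minimal sketch in Python, assuming the extracted files already sit in the unzipped directory structure. The function name build_epub and both paths are hypothetical.

```python
import zipfile
from pathlib import Path


def build_epub(source_dir: str, output_path: str) -> None:
    """Zip an unpacked ePub directory into a .epub file.

    Write output_path outside source_dir so the archive does not
    try to include itself.
    """
    root = Path(source_dir)
    mimetype = root / "mimetype"
    if not mimetype.is_file():
        raise FileNotFoundError("expected a 'mimetype' file at the top level")

    with zipfile.ZipFile(output_path, "w") as epub:
        # The mimetype entry goes first and is stored without compression.
        epub.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        # Everything else keeps its relative path and is deflated.
        for path in sorted(root.rglob("*")):
            if path.is_file() and path != mimetype:
                epub.write(
                    path,
                    path.relative_to(root).as_posix(),
                    compress_type=zipfile.ZIP_DEFLATED,
                )
```

For example, build_epub("book_unzipped", "book.epub") writes the archive directly with the .epub extension, so no rename is needed afterwards.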
LCP 2.0 PDFs are Also Extractable
Similarly, LCP 2.0 PDFs can be extracted using Thorium with debug mode active. We simply open the purchased PDF in Thorium, navigate to the debugger, and find the URL for the decrypted PDF.
Copying this output and base64 decoding it will give us an unencumbered PDF. This means that if we ever buy an ePub with LCP Profile 2.0 encryption, we can manually extract what we need without reverse engineering the encryption scheme.
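The base64 decoding step can be done with any tool. As a rough sketch in Python, assuming the copied output has been pasted into a string (possibly with a data: URL prefix or stray line breaks), the helper below decodes it and checks that the result looks like a PDF. The names decode_pdf and save_pdf are hypothetical.

```python
import base64
from pathlib import Path


def decode_pdf(encoded: str) -> bytes:
    """Decode base64 text into PDF bytes, tolerating a data: URL prefix."""
    if encoded.startswith("data:") and "," in encoded:
        # Keep only the payload after "data:...;base64,".
        encoded = encoded.split(",", 1)[1]
    # Remove whitespace and line breaks introduced by copying.
    cleaned = "".join(encoded.split())
    data = base64.b64decode(cleaned, validate=True)
    # Sanity check only: PDF files normally begin with "%PDF-".
    if not data.startswith(b"%PDF-"):
        raise ValueError("decoded data does not start with a PDF header")
    return data


def save_pdf(encoded: str, output_path: str) -> int:
    """Decode and write the PDF; returns the number of bytes written."""
    return Path(output_path).write_bytes(decode_pdf(encoded))
```

The header check is there to catch a truncated or mis-copied paste early, before a PDF viewer fails on the file.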
Conclusion
While some may argue that accessing DRM-protected content is illegal or against terms of service, we believe that this post is a legitimate attempt to educate people about the deficiencies in Readium's DRM scheme. Using Thorium and its debug capabilities, we can extract the unencrypted content from an LCP-protected ePub.
We're not advocating for circumvention or reverse engineering; we are simply highlighting the fact that there are vulnerabilities in the system that can be exploited. We will continue to monitor any further correspondence related to this issue and look forward to publishing any additional updates or insights.