I guess there are only two options left:
1. Accept the fact that some dickheads will do whatever they want because that’s just the world we live in
2. Make everything private and only allow actual human beings access to our content
Seen a couple takes about the Hachette case along the lines of “the Internet Archive should’ve stuck to just archiving the Internet and not testing new theories of copyright” and uhhh... I’m not sure what it is you think the Internet Archive does, outside of testing new theories of copyright.
People have gotten so used to the existence of the Internet Archive’s web archive that they forget how revolutionary and subversive it is. The idea that the web archive is somehow safe while the book lending was not is completely flawed. With the books, they were just up against a more powerful group.
What was the alternative? That they only archive and distribute works that are copyrighted by people with sufficiently little power/wealth?
The Internet Archive lost its appeal in the Hachette case. What a devastating loss for all of us.
One advantage to working on freely-licensed projects for over a decade is that I was forced to grapple with this decision long before mass scraping for AI training became a concern.
In my personal view, option 1 is almost strictly better. Option 2 is never as simple as "only allow actual human beings access" because determining who's a human is hard. In practice, it means putting a barrier in front of the website that makes it harder for everyone to access it: gathering personal data, CAPTCHAs, paywalls, etc.
This is not to say a website owner shouldn't implement, say, DDoS protection (I do). It's simply to remind you that "only allow humans to access" is just not an achievable goal. Any attempt at limiting bot access will inevitably allow some bots through and prevent some humans from accessing the site, and it's about deciding where you want to set the cutoff. I fear that media outlets and other websites, in attempting to "protect" their material from AI scrapers, will go too far in the anti-human direction.
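To make that cutoff idea concrete, here’s a toy sketch in Python. It’s entirely hypothetical: the signals, weights, and thresholds are made up for illustration and aren’t anything I (or any real site) actually run. The point is just that every heuristic misfires for someone, so wherever you set the threshold, some bots slip through and some humans get locked out.

```python
# Hypothetical sketch: a heuristic "bot filter" with an adjustable cutoff.
# Every signal below misfires for someone -- curl-using humans, fast readers,
# people who can't or won't solve CAPTCHAs -- so tightening the cutoff blocks
# more bots *and* more real people.

from dataclasses import dataclass


@dataclass
class Request:
    user_agent: str
    requests_per_minute: int
    solved_captcha: bool


def suspicion_score(req: Request) -> float:
    """Crude heuristic: higher means 'more likely a bot'."""
    score = 0.0
    if "bot" in req.user_agent.lower() or req.user_agent == "":
        score += 0.5
    if req.requests_per_minute > 30:
        score += 0.3
    if not req.solved_captcha:
        score += 0.2
    return score


def allow(req: Request, cutoff: float) -> bool:
    # Lower cutoff = stricter: fewer bots get through, more humans get blocked.
    return suspicion_score(req) < cutoff


# A human on a bare-bones text browser that can't render a CAPTCHA:
human = Request(user_agent="", requests_per_minute=5, solved_captcha=False)
# A scraper that spoofs a browser user agent and throttles itself politely:
scraper = Request(user_agent="Mozilla/5.0", requests_per_minute=25, solved_captcha=True)
# A blatant, unsophisticated bot:
blatant = Request(user_agent="MyBot/1.0", requests_per_minute=120, solved_captcha=False)

for cutoff in (0.4, 0.8):
    print(
        f"cutoff={cutoff}:",
        "human:", allow(human, cutoff),
        "| spoofed scraper:", allow(scraper, cutoff),
        "| blatant bot:", allow(blatant, cutoff),
    )
```

However you set the cutoff in this sketch, the blatant bot gets blocked, the spoofed scraper still gets in, and the stricter setting rejects the human on the text browser. That’s the trade-off in miniature.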