Activity tagged "scraping"

Posted:

Fighting bots is fighting humans

One advantage to working on freely-licensed projects for over a decade is that I was forced to grapple with this decision far before mass scraping for AI training.

In my personal view, option 1 is almost strictly better. Option 2 is never as simple as "only allow actual human beings access" because determining who's a human is hard. In practice, it means putting a barrier in front of the website that makes it harder for everyone to access it: gathering personal data, CAPTCHAs, paywalls, etc.

This is not to say a website owner shouldn't implement, say, DDoS protection (I do). It's simply to remind you that "only allow humans to access" is just not an achievable goal. Any attempt at limiting bot access will inevitably allow some bots through and prevent some humans from accessing the site, and it's about deciding where you want to set the cutoff. I fear that media outlets and other websites, in attempting to "protect" their material from AI scrapers, will go too far in the anti-human direction.

I guess there are only two options left:
  1. Accept the fact that some dickheads will do whatever they want because that’s just the world we live in
  2. Make everything private and only allow actual human beings access to our content