Rants of a deranged squirrel.

OSVDB -How bad is the scraping problem?

[This was originally published on the OSVDB blog.]

Via Twitter, blogs, or talking with our people, you may have heard us mention the ‘scraping’ problem we have. In short, individuals and companies are using automated methods to harvest (or ‘scrape’) our data. They do it via a wide variety of methods but most boil down to a couple methods involving a stupid amount of requests made to our web server.

This is bad for everyone, including you. First, it grinds our poor server to a stand-still at times, even after several upgrades to larger hosting plans with more resources. Second, it violates our license as many of these people scraping our data are using it in a commercial capacity without returning anything to the project. Third, it forces us to remove functionality that you liked and may have been using in an acceptable manner. Over the years we’ve had to limit the API, restrict the information / tools you see unauthenticated (e.g. RSS feed, ‘browse’, ‘advanced search’), and implement additional protections to stop the scraping.

So just how bad is it? We enabled some CloudFlare protection mechanisms a few weeks back and then looked at the logs.

To put that in perspective, DatalossDB was hit only 218 times in the same time period by requests with no user agent. We want to be open and want to help everyone with security information. But we also need for them to play by the rules.

Exit mobile version