Before 2000, web scraping was a gray area in the US legal system, with no significant precedent around it. The first time a company was sued for web scraping-related activities was on December 10, 1999, in eBay v. Bidder’s Edge. Bidder’s Edge was an aggregator of auction listings: users could search through many auction websites, including eBay, without visiting each individual site. Bidder’s Edge accessed eBay approximately 100,000 times a day, which accounted for 1.53% of eBay’s total traffic.
When eBay took the case to court, it claimed that Bidder’s Edge had, without authorization, interfered with eBay’s possessory interest in its computer system and caused damage to eBay. In March 2001, the two companies settled their legal dispute: Bidder’s Edge paid eBay an undisclosed amount and agreed not to access eBay’s data.
Frequent legal issues in web scraping
You are probably familiar with the text above. How does it relate to web scraping? Strictly speaking, it doesn’t. What matters here is what you do with the data after you have scraped it. If it is copyright-protected, and in most cases it is, you can’t do whatever you want with it. For example, you cannot use it for commercial purposes or republish it on your own site. So the next time you scrape a website, make sure you are allowed to use the scraped information the way you intend to.
Violation of the Computer Fraud and Abuse Act (CFAA)
This law wasn’t created to prevent web scraping either. It is mainly aimed at hackers. In short, it is about obtaining data by gaining unauthorized access to a page. Since web scraping techniques only reach data that is publicly available, you might think this law has nothing to do with us. However, some scrapers that take advantage of people or make fun of them can still violate it. This is exactly how Jerk.com worked back in 2009: it took personal photos from Facebook and then asked for money to remove them. That is both unethical and unlawful.
Trespass to Chattel
This one is really easy to forget about, at least from a scraper’s perspective. You commit trespass to chattel if you directly harm the website’s server in any way, and as a web scraper that is easy to do. A typical mistake that almost every novice scraper makes is sending requests too often. At first, we don’t care about the number and frequency of HTTP requests; we only care about getting the data we need as soon as possible. That is not good practice! Requesting 1,000 pages per second can seriously degrade the site’s performance. You commit trespass to chattel if your scraper slows the server down or even brings it to a halt, or if it otherwise disrupts the normal operation of the website. Even worse, the website owner could conclude that you are intentionally harming the site by requesting pages at high frequency, because it may look like an attack. Be careful!
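One simple way to avoid hammering a server is to enforce a minimum pause between consecutive requests. The sketch below is only an illustration under my own assumptions: the names Throttle and fetch_all are made up for this example, and the two-second interval in the usage note is an arbitrary choice, not a legal threshold or a recommended value for any particular site.

```python
import time
import urllib.request


class Throttle:
    """Enforce a minimum delay between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds to leave between requests
        self._clock = clock               # injectable so the logic can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous request, if any

    def wait(self):
        """Block until min_interval seconds have passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(urls, throttle, fetch=None):
    """Fetch the URLs one at a time, pausing between requests."""
    if fetch is None:
        # Default fetcher: a plain standard-library GET with a timeout.
        def fetch(url):
            return urllib.request.urlopen(url, timeout=10).read()
    pages = []
    for url in urls:
        throttle.wait()  # never send the next request too early
        pages.append(fetch(url))
    return pages
```

With this in place, something like fetch_all(urls, Throttle(2.0)) would send at most one request every two seconds. How long the pause should be depends on the site you are scraping, so treat the number as a starting point rather than a rule.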
Finally, a few important disclaimers. I’m not a lawyer. I don’t even know any lawyers. This whole post is based on my web scraping experience and extensive research on the topic. I hope you get some value out of it.