This is something new. I’ve just started the ScrapingAuthority YouTube channel. On this channel you will find videos about web scraping, data processing, data mining, big data and some other stuff. I’m also going to share my progress on PriceMind. As always, I appreciate your comments, and I try to create content that’s valuable to you!
If your job involves writing spiders regularly, you need a way to write them quickly and effectively. In today’s video I’m going to show you the tips and tricks I use to write my Scrapy spiders.
Figuring out selectors is a necessary part of building any scraper. That’s why it’s important to develop a process that lets you come up with selectors as quickly as possible. My process is simple: I open the inspector in my browser and find the element I need. If you don’t know CSS yet, I highly suggest learning it, because it will help you figure out selectors. But if you’re looking for a simpler solution, you can just right-click the element in the inspector and copy its selector from your browser. That’s it. I show you how to do it in the video above.
Scrapy shell is the #1 productivity tool when you’re building a scraper. It helps you debug your spider on the spot when something unexpected happens. You can also use it to test your selectors without running the whole spider. It’s pretty cool. I tell you more about Scrapy shell in the video and in this post: How To Debug Your Spider
Scrapy caching is another excellent way to save time while developing your spider. A real-life spider can take several minutes to finish running, and I really don’t like waiting for that every time I want to check whether it works. Scrapy’s HTTP cache saves the responses your spider downloads to disk, so on the next run it reads them from the cache instead of fetching them again. This makes testing much quicker because Scrapy doesn’t have to request the actual pages.
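Scrapy’s cache is handled by its `HttpCacheMiddleware` and is off by default. A minimal way to switch it on during development is a few lines in the project’s `settings.py`. The values below are one reasonable development setup, not the only one; skipping 5xx responses is an assumption about what you’d want:

```python
# settings.py: enable Scrapy's built-in HTTP cache while developing.
HTTPCACHE_ENABLED = True

# 0 means cached responses never expire, so repeated runs never hit the site.
HTTPCACHE_EXPIRATION_SECS = 0

# Relative path, resolved inside the project's .scrapy directory.
HTTPCACHE_DIR = "httpcache"

# Assumption: don't cache server errors, so failed pages get re-fetched next run.
HTTPCACHE_IGNORE_HTTP_CODES = [500, 502, 503, 504]
```

When you want fresh pages again, delete the `.scrapy/httpcache` folder or set `HTTPCACHE_ENABLED = False`.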
Item Loaders, Processors
I always use item loaders and input/output processors whenever I can. In my experience, they make the code more readable and modular. In the video I show you the processors I use most. Read more about item loaders and processors here.
At the end of the video, I mention my GitHub repo, where you can find the templates I currently use to create my spiders. Read more about it here: Scrapy Spiders In Minutes