
Building a Web Scraping Based SaaS Business, part 3

Hey, I’m back with a new post documenting my business. Last time we talked about how I validated my idea before writing any code. In this one, I’m gonna go (sort of) deep on the technical side: which programming languages I use for what, which web framework I use and what kind of database I have.

Web scraping

As I mentioned in the first article of this series, this project is heavily based on web scraping. Scraping provides the data that makes the whole thing work, so choosing a scalable, powerful web scraping framework is crucial. I have a lot of experience with Scrapy, so choosing it was a no-brainer. What I love about Scrapy is that it lets me build my own abstractions on top of the framework. Right now I have about 5-6 Scrapy crawler templates that I can reuse on different websites; I just need to write the selectors. I’ve built a system that lets me write effective scrapers for almost any website in minutes(!). I will definitely talk about it in a later post.


At the moment, I host my spiders on Scrapy Cloud. It works amazingly well: no problems, no hassle, great support. And the best thing is that I can use it for free, which is pretty cool for such an excellent service. Chances are I will need to upgrade my plan in the future, but I will definitely keep using it instead of deploying my spiders on my own servers.


Database

When I was considering which database system to choose, at first I wanted to store my data in plain JSON. I quickly realized that it wasn’t gonna work: it doesn’t scale well, and a massive amount of JSON data is a nightmare to query and update. An SQL database is a much better fit for this project. I use MySQL as my database server because it’s the one I have the most experience with.
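Here is a rough sketch of how scraped prices could flow from Scrapy into an SQL table. Assumptions: the table and column names are hypothetical, and Python’s built-in `sqlite3` stands in for MySQL so the example runs without a server. With a MySQL driver such as PyMySQL or mysqlclient the SQL is nearly the same, but placeholders are `%s` instead of `?` and the auto-incrementing key is declared with `AUTO_INCREMENT`. Scrapy calls `open_spider`, `process_item` and `close_spider` on any pipeline class listed in the `ITEM_PIPELINES` setting.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per price observation.
SCHEMA = """
CREATE TABLE IF NOT EXISTS price_snapshot (
    id INTEGER PRIMARY KEY,
    website TEXT NOT NULL,
    product_title TEXT NOT NULL,
    price REAL NOT NULL,
    scraped_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_snapshot_product
    ON price_snapshot (website, product_title, scraped_at);
"""


class SqlPricePipeline:
    """Hypothetical Scrapy item pipeline storing each scraped price.

    sqlite3 is used here only as a stand-in for a MySQL connection.
    """

    def __init__(self, db_path=":memory:"):
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        self.conn = sqlite3.connect(self.db_path)
        self.conn.executescript(SCHEMA)

    def process_item(self, item, spider):
        # Store the spider name as the website and timestamp the observation.
        self.conn.execute(
            "INSERT INTO price_snapshot (website, product_title, price, scraped_at) "
            "VALUES (?, ?, ?, ?)",
            (spider.name, item["title"], float(item["price"]),
             datetime.now(timezone.utc).isoformat()),
        )
        self.conn.commit()
        return item  # pipelines must return the item for later pipelines

    def price_history(self, website, product_title):
        """Return prices for one product on one site, oldest first."""
        rows = self.conn.execute(
            "SELECT price FROM price_snapshot "
            "WHERE website = ? AND product_title = ? "
            "ORDER BY scraped_at, id",
            (website, product_title),
        ).fetchall()
        return [price for (price,) in rows]

    def close_spider(self, spider):
        self.conn.close()
```

A query like `price_history` is exactly the kind of indexed lookup that gets painful when the same data lives in a pile of JSON files.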


Backend

As for the backend language, I was torn between two options: Java and Python. Java was in the running because, unlike with Python, I already had some experience building web applications with Java and Spring. Had I chosen Java, the initial phase of development would have been quicker, since I wouldn’t have had to learn a new framework. Despite that, I chose Python because I knew I would need to do a lot of data mining and data transformation, and in Python the pandas library makes it easy to work with large datasets.
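As an example of the kind of transformation pandas makes short, the sketch below turns raw price snapshots into a product-by-website table of latest prices plus the cheapest offer per product. The column names (`website`, `product_title`, `price`, `scraped_at`) are hypothetical and just mirror a plausible scraped dataset; the pandas calls (`sort_values`, `groupby(...).last()`, `pivot`, `min(axis=1)`) are standard.

```python
import pandas as pd


def latest_price_matrix(snapshots: pd.DataFrame) -> pd.DataFrame:
    """Pivot raw snapshots into a product x website table of latest prices.

    Assumes hypothetical columns: website, product_title, price, scraped_at
    (scraped_at as ISO date strings, which sort chronologically).
    """
    latest = (
        snapshots.sort_values("scraped_at")
        .groupby(["product_title", "website"], as_index=False)
        .last()  # keep the most recent observation per product/site pair
    )
    matrix = latest.pivot(index="product_title", columns="website", values="price")
    # Cheapest current offer per product; NaN (site doesn't sell it) is skipped.
    matrix["min_price"] = matrix.min(axis=1)
    return matrix
```

For instance, if `shop_a` dropped the price of a widget from 10.0 to 9.0 while `shop_b` sells it at 9.5, the `Widget` row shows 9.0, 9.5 and a `min_price` of 9.0.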

Okay, so I chose Python because I want to work with the data, not just use it. Next, I realized that I didn’t need a full-fledged web framework; I just needed something easy to use, extensible and scalable. After some research, I decided that the Flask web framework was what I needed. It’s pretty easy to learn and use, and setup and initial development are quick. It’s also surprisingly extensible and scalable. For example, there’s Flask-Login, an extension that handles user session management. Great choice for me.


Frontend

Frontend is an interesting field for me because I have never done any frontend development. I’m not an HTML and CSS expert, I barely write JavaScript and I’m not good at design. So I knew this would be challenging and fun. Luckily, I don’t need anything fancy right now. I also learned a lot from Matt Kremer’s ebook, and I can build the frontend really quickly by applying the rule “Don’t reinvent the wheel”. I just need a simple, flat user interface so my clients can use the web app. I don’t even care if it’s gonna be ugly; that doesn’t matter at the moment. But, you know, I try my best.

So obviously I use HTML and CSS. Data visualization is also very important in pricing intelligence software, so I chose a charting library, Highcharts (I may switch to Google Charts later because it’s free). I also need to display some tables in my app, and I found an amazing GitHub repo that converts CSV into a fancy HTML table. Props to the creator.
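One way to connect the Flask backend to these frontend pieces is to expose two small endpoints: one returning JSON shaped like a Highcharts `series` array (a list of `{"name": ..., "data": [[x, y], ...]}` objects, with x in milliseconds since the epoch for a datetime axis), and one returning plain CSV for a client-side CSV-to-HTML-table renderer to load. This is a hypothetical sketch: the URLs, the `PRICE_HISTORY` data and the helper names are made up, and the JavaScript that passes the JSON to `Highcharts.chart` is not shown.

```python
import csv
import io
from datetime import datetime, timezone

from flask import Flask, Response, jsonify

app = Flask(__name__)

# Hypothetical price history per competitor; the real app would query MySQL.
PRICE_HISTORY = {
    "shop_a": [("2020-01-01", 10.0), ("2020-01-02", 9.0)],
    "shop_b": [("2020-01-01", 9.5), ("2020-01-02", 9.5)],
}


def to_epoch_ms(day):
    """Convert a YYYY-MM-DD string (treated as UTC) to epoch milliseconds."""
    dt = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


@app.route("/api/price-series")
def price_series():
    # One chart series per competitor, points as [timestamp_ms, price].
    series = [
        {"name": site, "data": [[to_epoch_ms(day), price] for day, price in rows]}
        for site, rows in PRICE_HISTORY.items()
    ]
    return jsonify(series)


@app.route("/api/prices.csv")
def prices_csv():
    # Flat CSV (header + one row per observation) for the HTML table.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["website", "date", "price"])
    for site, rows in PRICE_HISTORY.items():
        for day, price in rows:
            writer.writerow([site, day, price])
    return Response(buf.getvalue(), mimetype="text/csv")
```

Keeping the data endpoints separate from the HTML pages also makes a later switch from Highcharts to Google Charts mostly a frontend change.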

Overall project stack

Web scraping: Python, Scrapy (hosted on Scrapy Cloud)

Database: MySQL

Backend: Python, Flask (+ Flask-Login)

Frontend: HTML + CSS + JS, Highcharts (may use Google Charts later), csv-to-html-table

Finally, here are some pictures of what the web app looks like right now. I know it’s sort of minimalist, but I’m happy it works 😉