Crawling with Scrapy – Crawling Settings

Scrapy provides a convenient way to customize the crawling settings of your scraper, including the core mechanism, pipelines and spiders. When you create a new Scrapy project with the scrapy startproject command, you will find a settings.py file in it. Here you can customize your scraper's settings.

Scrapy Settings

Let’s examine the key settings which you may have to modify for each project.

USER_AGENT = 'myscraper (+http://www.yourdomain.com)'

The user agent should identify who you are. Most websites will not let you in without a proper user agent.


ROBOTSTXT_OBEY = True

By default this is set to True, so your scraper will follow the guidelines defined in the site's robots.txt. Every time you scrape a website, your scraper should operate ethically.

ITEM_PIPELINES = {
    'myscraper.pipelines.ExportPipeline': 100,
    #'myscraper.pipelines.SamplePipeline': 200,
}

Pipelines are meant to process items right after scraping. It's important to make it clear which pipelines you want to apply while scraping. In this example the second pipeline is commented out, so it is not activated and won't be invoked.
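To give an idea of what such a pipeline looks like, here is a minimal sketch; the method bodies are illustrative assumptions (a real ExportPipeline would write items somewhere), only the process_item hook is required:

```python
# Minimal sketch of an item pipeline like the ExportPipeline referenced above.
# The bodies are illustrative; only process_item is required by Scrapy.

class ExportPipeline:
    def open_spider(self, spider):
        # Called once when the spider starts; a real exporter
        # might open an output file here.
        self.items = []

    def process_item(self, item, spider):
        # Called once per scraped item, in priority order
        # (the pipeline with value 100 runs before one with 200).
        self.items.append(item)
        return item

# Tiny stand-in spider so the sketch runs outside a Scrapy project:
class _StubSpider:
    name = 'samplespider'

pipeline = ExportPipeline()
pipeline.open_spider(_StubSpider())
pipeline.process_item({'title': 'example'}, _StubSpider())
print(len(pipeline.items))  # 1
```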

SPIDER_MODULES = ['myscraper.spiders']

Here you have to declare where the spiders are inside your project.


CONCURRENT_REQUESTS = 16

The maximum number of requests Scrapy can make at the same time. This is 16 by default. Be careful when you raise it, to avoid damaging the website!


DOWNLOAD_DELAY = 0

The delay between requests, given in seconds. The default value is 0! You might want to increase this to be nicer to the website.

DEFAULT_REQUEST_HEADERS = {
   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
   'Accept-Language': 'en',
}

Some websites recognize Scrapy's default request headers, so it might be a good idea to customize them.


Scrapy's AutoThrottle extension adjusts the speed of crawling dynamically, based on the load of both the Scrapy server and the crawled website's server. In high-volume projects it's useful to enable.
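A typical AutoThrottle configuration in settings.py might look like this (the values below are examples, not recommendations):

```python
# AutoThrottle settings (example values)
AUTOTHROTTLE_ENABLED = True             # turn the extension on
AUTOTHROTTLE_START_DELAY = 5            # initial download delay in seconds
AUTOTHROTTLE_MAX_DELAY = 60             # upper bound for the delay
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0   # average parallel requests per remote server
AUTOTHROTTLE_DEBUG = False              # log every throttling decision when True
```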

This was a very brief guide to Scrapy settings. These are the settings I adjust most frequently, in almost every project. I suggest checking out the official documentation if you want to know more.

Modify Settings in Command Line

You can override any setting on the command line with -s (or --set):

scrapy crawl spidername -s DOWNLOAD_DELAY=3

Settings for Specific Spiders

You can define settings specifically for certain spiders:

class SampleSpider(scrapy.Spider):
    name = 'samplespider'

    custom_settings = {
        'DOWNLOAD_DELAY': '3',
    }

This overrides the DOWNLOAD_DELAY setting from settings.py, but only for this spider.

Accessing Settings Objects

In your spiders you have access to the settings through self.settings:

class SampleSpider(scrapy.Spider):
    name = 'samplespider'

    def parse(self, response):
        print("Settings: %s" % self.settings.attributes.keys())

If you want to access settings in your pipeline, you have to override the from_crawler method. The crawler has a settings attribute:

    @classmethod
    def from_crawler(cls, crawler):
        settings = crawler.settings
        return cls(settings.getint('DOWNLOAD_DELAY'))

You should use these settings objects according to their API.
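Putting the pieces together, a pipeline that reads a setting via from_crawler could be sketched as follows; the DelayAwarePipeline name and the stub classes are hypothetical, added only so the sketch runs outside a Scrapy project:

```python
# Hypothetical pipeline that receives a setting through from_crawler.

class DelayAwarePipeline:
    def __init__(self, delay):
        self.delay = delay

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this with the running Crawler; its settings
        # attribute exposes typed accessors such as getint/getfloat/getbool.
        return cls(crawler.settings.getint('DOWNLOAD_DELAY'))

    def process_item(self, item, spider):
        return item

# Minimal stand-ins mimicking the crawler/settings interface for the demo:
class _StubSettings(dict):
    def getint(self, name, default=0):
        return int(self.get(name, default))

class _StubCrawler:
    settings = _StubSettings({'DOWNLOAD_DELAY': 3})

pipeline = DelayAwarePipeline.from_crawler(_StubCrawler())
print(pipeline.delay)  # 3
```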
