How to Write the Best XPATH and CSS Selectors for Your Web Scraper

Selectors are one of the most important pieces of your scraper. Well-written selectors make your web scraper fast and efficient. When a website’s layout changes, your scraper’s selectors have to change as well, and in a well-structured scraping setup the selectors should be the only thing you need to update. In this post I will dig into CSS selectors and XPath and share some tips for writing effective, fast selectors for your web scraper.
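One way to get there is to keep every selector in a single table that the extraction code reads from. The sketch below assumes Python’s standard-library xml.etree.ElementTree, which understands only a subset of XPath and needs well-formed markup (real-world HTML usually calls for a more forgiving parser such as lxml or BeautifulSoup). The SELECTORS dict, the extract() helper, and the sample page are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical selector table: after a layout change, ideally only this dict is edited.
SELECTORS = {
    "title": ".//h1[@class='product-title']",
    "price": ".//span[@class='price']",
}


def extract(markup, selectors=SELECTORS):
    """Apply each selector to the page and return the first match's text (None if absent)."""
    root = ET.fromstring(markup)
    result = {}
    for field, path in selectors.items():
        element = root.find(path)
        result[field] = element.text if element is not None else None
    return result


page = """<html><body>
<h1 class="product-title">Blue Mug</h1>
<span class="price">12.00</span>
</body></html>"""

print(extract(page))  # {'title': 'Blue Mug', 'price': '12.00'}
```

If the site later moved the price into, say, a div with a different class, you would edit only the "price" entry; extract() itself stays the same.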

CSS Selectors and XPATH

CSS Selectors

CSS selectors are widely used by front-end developers to attach CSS styles to HTML elements. In a web scraper, you can use them to navigate the structure of an HTML document. If you are a beginner and already familiar with CSS, I suggest using CSS selectors rather than XPath, although in some cases you will have to use XPath, for example when you need to match an element by its text content.

XPATH

XPath is a query language, specified by the W3C, for navigating XML documents, and you can use it on HTML too once the page has been parsed into a tree. Almost every HTML parsing or web scraping library supports XPath. It is more powerful and flexible than CSS selectors: for example, it can step from an element up to its parent and filter elements by their text content.
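As a small illustration of those two abilities, here is a sketch using Python’s built-in xml.etree.ElementTree, which implements a limited subset of XPath: enough for child-text filters and the ".." parent step (the [.='text'] form needs Python 3.7 or later). The catalog document is made up for the example.

```python
import xml.etree.ElementTree as ET

catalog = ET.fromstring("""
<catalog>
  <product><name>Gadget</name><price>9.99</price></product>
  <product><name>Widget</name><price>4.50</price></product>
</catalog>
""".strip())

# Filter on text content: the price of the product whose <name> child is "Widget".
price = catalog.find(".//product[name='Widget']/price")
print(price.text)  # 4.50

# Walk back up with "..": start at the <name> element, then move to its parent <product>.
product = catalog.find(".//name[.='Gadget']/..")
print(product.find("price").text)  # 9.99
```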

CSS Selector Basics

#x

The element whose id is x.

.x

Elements that have the x class. A class attribute can hold several space-separated classes, and .x matches any element whose list includes x.

x y

y elements that are descendants of x, direct or not.

x, y

Elements that are x or y.

x + y

A y element that immediately follows an x element with the same parent.

x > y

y elements that are direct children of x.

x ~ y

All y elements that follow an x element with the same parent.

x[y]

x elements that have a y attribute.

x[y='z']

x elements whose y attribute is exactly “z”.

x:last-child

x elements that are the last child of their parent.

x:empty

x elements that have no children at all, not even text.
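To make the sibling and pseudo-class rules above concrete, the sketch below spells out what a few of them match, using plain Python over xml.etree.ElementTree. It is a hand-written illustration, not a CSS engine: the helper names are hypothetical, and in a real scraper a library such as BeautifulSoup (through its select() method) evaluates these selectors for you.

```python
import xml.etree.ElementTree as ET

# Toy markup; ElementTree needs well-formed XML, so every tag is closed.
HTML = """
<div id="main">
  <h2>Title</h2>
  <p class="intro lead">First</p>
  <p>Second</p>
  <span></span>
  <p>Third</p>
</div>
""".strip()

root = ET.fromstring(HTML)
# ElementTree elements do not know their parent, so build a child -> parent map once.
parents = {child: parent for parent in root.iter() for child in parent}


def following_siblings(el):
    """Element siblings that come after el, in document order."""
    if el not in parents:  # the root element has no siblings
        return []
    kids = list(parents[el])
    return kids[kids.index(el) + 1:]


def has_class(el, name):
    """.x : the class attribute is a space-separated list, so match one whole token."""
    return name in (el.get("class") or "").split()


def adjacent(x, y):
    """x + y : a y element that comes immediately after an x sibling."""
    found = []
    for el in root.iter(x):
        after = following_siblings(el)
        if after and after[0].tag == y:
            found.append(after[0])
    return found


def general(x, y):
    """x ~ y : every y element that comes anywhere after an x sibling."""
    found = []
    for el in root.iter(x):
        for sib in following_siblings(el):
            if sib.tag == y and sib not in found:
                found.append(sib)
    return found


def last_child(x):
    """x:last-child : x elements that are the last element child of their parent."""
    return [el for el in root.iter(x) if el in parents and list(parents[el])[-1] is el]


def empty(x):
    """x:empty : no child elements and no text (per Selectors Level 3, whitespace counts)."""
    return [el for el in root.iter(x) if len(el) == 0 and not el.text]


def texts(elements):
    return [el.text for el in elements]


print(texts(adjacent("h2", "p")))   # ['First']
print(texts(adjacent("p", "p")))    # ['Second']
print(texts(general("span", "p")))  # ['Third']
print(texts(last_child("p")))       # ['Third']
print(len(empty("span")))           # 1
print([el.text for el in root.iter() if has_class(el, "lead")])  # ['First']
```

Note how "p + p" matches only "Second": "Third" is preceded by the span, so it is reachable with "~" but not with "+".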

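XPATH Basics

/

At the start of a path, starts from the root node (an absolute path). Between steps, selects direct children.

//

At the start of a path, searches the whole document. Between steps, selects descendants at any depth, not just direct children.

//x[@id='y']

x elements whose id is y.

//x[@class='y']

x elements whose class attribute is exactly “y”. Unlike the CSS .y selector, this misses elements with several classes, such as class="y z". To match the y class on its own, use //x[contains(concat(' ', normalize-space(@class), ' '), ' y ')].

//x | //y

Elements that are x or y, searched across the whole document.

//x[@y='z']

x elements whose y attribute is “z”.

//x/y/z

z elements that are direct children of y, where y is a direct child of x.

//x/text()

The text nodes that are direct children of x. Text inside nested elements is not included; use //x//text() for that.

//x/y[N]

The Nth y child of each x element. XPath counts from 1, not 0.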
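Python’s standard xml.etree.ElementTree accepts a subset of this syntax, which is enough to try most of the expressions above on a toy page. This is a sketch under that assumption: ElementTree wants relative paths (.//x instead of //x) and has no | union or text() step, so those two are approximated in plain Python; a full XPath 1.0 engine such as lxml supports them directly. The markup is invented for the example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<html>
  <body>
    <ul id="menu">
      <li class="item">Home</li>
      <li class="item active">Blog</li>
      <li class="item">About</li>
    </ul>
    <a href="/contact">Contact</a>
  </body>
</html>
""".strip())

# //x[@id='y']: ElementTree wants a relative path, so "//" becomes ".//"
menu = root.find(".//ul[@id='menu']")

# //x[@class='y'] compares the whole attribute, so class="item active" is skipped
exact = root.findall(".//li[@class='item']")

# //x/y/z: each "/" step selects direct children only
items = root.findall(".//body/ul/li")

# //x/y[N]: positions start at 1, counted among the y siblings
second = root.find(".//ul/li[2]")

# //x[@y='z']
contact = root.find(".//a[@href='/contact']")

# //x | //y: no union operator here, so filter one document-order walk instead
either = [el for el in root.iter() if el.tag in ("ul", "a")]

# //x/text(): no text() step either; .text holds the text before x's first child
print(menu.get("id"), [li.text for li in exact], len(items), second.text, contact.text)
# menu ['Home', 'About'] 3 Blog Contact
print([el.tag for el in either])  # ['ul', 'a']
```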

4 Basic Tips to Write Effective Selectors

  • Be specific where necessary, but keep your selectors as short as you can.
  • Know the website’s HTML structure thoroughly. Take time to go over it.
  • Maintain your selectors. If the layout changes, you will probably need to update them.
  • Write selectors yourself. Try to avoid generator tools.
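The first and last tips go together: a short selector anchored on a specific, stable attribute tends to survive layout changes that break long, position-based paths, which is the kind of path some generator tools produce. A minimal sketch with xml.etree.ElementTree, assuming a hypothetical redesign that wraps the content in an extra main element; the still_matches() helper is illustrative only.

```python
import xml.etree.ElementTree as ET

OLD = "<html><body><div><span id='price'>12.00</span></div></body></html>"
# The same page after a hypothetical redesign adds a <main> wrapper.
NEW = "<html><body><main><div><span id='price'>12.00</span></div></main></body></html>"

# Long, position-based path: every step must stay exactly where it was.
LONG = "./body/div[1]/span[1]"
# Short path anchored on a specific attribute: indifferent to extra wrappers.
SHORT = ".//span[@id='price']"


def still_matches(markup, path):
    """True if the path finds at least one element in the markup."""
    return ET.fromstring(markup).find(path) is not None


for label, markup in (("old", OLD), ("new", NEW)):
    print(label, still_matches(markup, LONG), still_matches(markup, SHORT))
# old True True
# new False True
```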

XPATH and CSS Selector Generator Tools

Figuring out and testing your selectors can take a lot of time, especially in a large project. If you don’t mind messy CSS selectors or XPath, or simply don’t want to spend time writing your own, you can use one of the tools below to make the job easier. These tools generate CSS selectors and XPath expressions for you. Be aware that the generated selectors are not necessarily the most readable or the most efficient, and sometimes they don’t select what you need at all.
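Because generated selectors can be wrong, it helps to check them against a saved copy of the page before relying on them. A minimal sketch with xml.etree.ElementTree; check_selectors() and the sample selectors are hypothetical, and the check assumes each field should match exactly one element.

```python
import xml.etree.ElementTree as ET


def check_selectors(markup, selectors):
    """Report selectors that match nothing, or more than one element, on a sample page."""
    root = ET.fromstring(markup)
    problems = {}
    for field, path in selectors.items():
        count = len(root.findall(path))
        if count != 1:
            problems[field] = count
    return problems


sample = "<html><body><h1>Blue Mug</h1><p class='price'>12.00</p><p>In stock</p></body></html>"
generated = {
    "title": ".//h1",
    "price": ".//p",                     # too broad: also matches the stock note
    "rating": ".//div[@class='stars']",  # wrong: nothing on the page matches
}
print(check_selectors(sample, generated))  # {'price': 2, 'rating': 0}
```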

CSS Selector Tools

http://selectorgadget.com/

https://chrome.google.com/webstore/detail/css-selector-helper-for-c/gddgceinofapfodcekopkjjelkbjodin

http://getfirebug.com/

XPATH Tools

https://extendsclass.com/xpath-tester.html

http://www.altova.com/xmlspy/xpath-analyzer.html#xpath_analyzer20

http://xmltoolbox.appspot.com/xpath_generator.html

http://getfirebug.com/

If you liked this post, join the Scraping Authority Community on Facebook.