Setting up HtmlUnit
You can download the HtmlUnit library from the official HtmlUnit project site.
In HtmlUnit you will use the WebClient class to simulate a real browser.
You can instantiate a WebClient in any one of these ways:
WebClient webClient = new WebClient();
WebClient webClient = new WebClient(BrowserVersion.CHROME);
WebClient webClient = new WebClient(BrowserVersion.FIREFOX_45);
Try a few of them and use whichever works best for the site you are scraping.
Additionally, through getOptions() you can enable, disable, or configure things like CSS, the maximum timeout, and SSL handling.
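As a rough sketch of what that configuration can look like, here is a small class that turns off CSS and JavaScript, sets a timeout, and relaxes SSL checks. It assumes an HtmlUnit 2.x release (the com.gargoylesoftware.htmlunit package, which is where FIREFOX_45 lives); the class name WebClientSetup and the specific values are only illustrative, so adjust them for your target site.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;

// Hypothetical helper class, not part of HtmlUnit itself.
public class WebClientSetup {

    // Builds a WebClient tuned for plain HTML scraping.
    static WebClient newScrapingClient() {
        WebClient webClient = new WebClient(BrowserVersion.CHROME);

        // Skip stylesheet processing; it is rarely needed when only reading data.
        webClient.getOptions().setCssEnabled(false);

        // Skip script execution; turn this back on for pages that render with JavaScript.
        webClient.getOptions().setJavaScriptEnabled(false);

        // Connection timeout in milliseconds (0 would mean wait forever).
        webClient.getOptions().setTimeout(10_000);

        // Accept invalid or self-signed certificates. Only do this for hosts you trust.
        webClient.getOptions().setUseInsecureSSL(true);

        // Do not throw on 4xx/5xx responses, so the caller can inspect the page instead.
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);

        return webClient;
    }

    public static void main(String[] args) {
        // WebClient is AutoCloseable, so try-with-resources releases its resources.
        try (WebClient webClient = newScrapingClient()) {
            System.out.println("CSS enabled: " + webClient.getOptions().isCssEnabled());
        }
    }
}
```

Disabling CSS and JavaScript usually makes page loads faster, but a page that builds its content with scripts will need JavaScript left on.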
Let’s say you want to scrape some soccer data from THIS soccer stats page.
Here’s the full code:
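For example, this snippet scrapes each Premier League team from the page and stores them in a List. The XPath matches td cells rather than links, so the list is typed as HtmlElement instead of HtmlAnchor:
List<HtmlElement> teams = (List) page.getByXPath("//td[@class='team']");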
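If you want to see what that XPath expression selects without hitting a live site, the sketch below runs the same expression over a tiny, made-up table. Note that it uses the JDK's built-in javax.xml.xpath API instead of HtmlUnit, so it runs with no extra libraries. The markup, the team names, and the TeamXPathDemo class are all hypothetical stand-ins; the real page will be larger and messier.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TeamXPathDemo {

    // Made-up, well-formed markup standing in for the real stats page.
    static final String SAMPLE =
          "<table>"
        + "<tr><td class='team'><a href='/t/1'>Team A</a></td><td class='pts'>50</td></tr>"
        + "<tr><td class='team'><a href='/t/2'>Team B</a></td><td class='pts'>48</td></tr>"
        + "</table>";

    // Returns the text of every <td class="team"> cell in the given markup.
    static List<String> teamNames(String markup) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList cells = (NodeList) xpath.evaluate(
                    "//td[@class='team']",
                    new InputSource(new StringReader(markup)),
                    XPathConstants.NODESET);

            List<String> names = new ArrayList<>();
            for (int i = 0; i < cells.getLength(); i++) {
                // getTextContent() includes the text of nested elements such as <a>.
                names.add(cells.item(i).getTextContent().trim());
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(teamNames(SAMPLE)); // prints [Team A, Team B]
    }
}
```

Only the cells whose class attribute is exactly "team" are returned; the points cells in the same rows are skipped.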
Please tell me if it helped you!