Web scrapers come in many forms, from simple browser plugins to more robust software applications. Depending on the web scraper you’re using, you may or may not be able to scrape multiple pages of data in a single run.
Today, we will review how to use a free web scraper to scrape multiple pages of data, covering pages with two different kinds of pagination.
For this, we will use ParseHub, a free and powerful web scraper that can extract data from any website.
For this, we will use ParseHub, a free and powerful web scraper that can extract data from any website.
Web Scraping with ParseHub
If you have never used ParseHub before, do not fret. It is actually quite easy to use while still being incredibly powerful.
In basic terms, ParseHub works by loading the website you’d like to scrape and letting you click on the specific data you want to extract.
Taking it a step further, you can also instruct ParseHub to interact with or click on specific elements of a page in order to browse to other pages with more data. In other words, you can make ParseHub click through and navigate multiple pages automatically.
Read more: How to use ParseHub to scrape data from any website into an Excel spreadsheet
Scraping Multiple Pages on a Website
A website’s pagination (or the lack thereof) can take many different forms. Let’s break down how to deal with each of these scenarios while scraping data.
Clicking on the “Next Page” Button
This is probably the most common scenario you will find when scraping multiple pages of data. Here’s how to deal with it:
- In ParseHub, click on the PLUS(+) sign next to your page selection and choose the Select command.
- Using the select command, click on the “Next Page” link (usually at the bottom of the page you’re scraping). Rename your new selection to NextPage.
- Expand your NextPage selection by using the icon next to it and delete both Extract commands under it.
- Using the PLUS(+) sign next to your NextPage selection, choose the Click command.
- A pop-up will appear asking if this is a “next page” link. Click on “Yes” and enter the number of times you’d like to repeat the process of clicking on this button. (If you want to scrape 5 pages of data in total, you’d enter 4 repeats.)
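ParseHub handles this “follow the Next button” loop without code, but the underlying logic can be sketched in plain Python. Everything below is hypothetical for illustration: the page markup, the `/items?page=N` paths, and the `scrape` helper stand in for a live site and a real HTTP fetch.

```python
from html.parser import HTMLParser

# Hypothetical pages keyed by path; a real scraper would fetch these over HTTP.
PAGES = {
    "/items?page=1": '<ul><li>A</li><li>B</li></ul><a class="next" href="/items?page=2">Next</a>',
    "/items?page=2": '<ul><li>C</li></ul><a class="next" href="/items?page=3">Next</a>',
    "/items?page=3": '<ul><li>D</li></ul>',  # last page: no "Next" link
}

class PageParser(HTMLParser):
    """Collects <li> text as data items and the href of the 'next' link."""
    def __init__(self):
        super().__init__()
        self.items = []
        self.next_url = None
        self._in_li = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            self._in_li = True
        elif tag == "a" and attrs.get("class") == "next":
            self.next_url = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_li = False

    def handle_data(self, data):
        if self._in_li:
            self.items.append(data.strip())

def scrape(start_url, max_repeats=4):
    """Follow the 'Next Page' link up to max_repeats times (5 pages total),
    mirroring the repeat count entered in ParseHub's pop-up."""
    items, url = [], start_url
    for _ in range(max_repeats + 1):
        parser = PageParser()
        parser.feed(PAGES[url])
        items.extend(parser.items)
        if parser.next_url is None:
            break  # ran out of pages before hitting the repeat cap
        url = parser.next_url
    return items

print(scrape("/items?page=1"))  # ['A', 'B', 'C', 'D']
```

Note that, just like in ParseHub, the repeat count is an upper bound: the loop also stops early when a page has no “Next” link.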
No “Next Button”
Sometimes, there might be no “next page” link for pagination. In these cases, there might just be links to the specific page numbers.
Here’s how to navigate through these with ParseHub:
- In ParseHub, click on the PLUS (+) sign next to your page selection and click on the current page number (In this case, page 1). Rename your selection to CurrentPage.
- Click on the PLUS (+) sign next to the CurrentPage selection and add a Relative Select command.
- Using the Relative Select command, click on the current page number and then on the next page number. An arrow will appear to show the connection you’re creating. Rename this selection to NextPage.
- Now, use the PLUS (+) sign next to the NextPage selection to add a Click Command.
- A pop-up will appear asking if this is a “Next Page” link. Click on “Yes” and enter the number of times you’d like to repeat this process. (If you want to scrape 5 pages of data in total, you’d enter 4 repeats.)
- ParseHub will now load the next page of results. Scroll all the way down and check that the NextPage Relative Selection you created is now selecting Page 3 instead of Page 2 again. If it is, then click on Page 2 and then on Page 3 to train ParseHub accordingly.
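The Relative Select trick above amounts to “from the current page number, find the link whose text is the next number.” A minimal Python sketch of that rule, with a hypothetical pagination bar and `/p/N` hrefs standing in for a real site:

```python
import re

# Hypothetical numbered pagination bar; on a live site this markup varies.
PAGINATION = '<a href="/p/1">1</a> <a href="/p/2">2</a> <a href="/p/3">3</a>'

def next_page_url(html, current_page):
    """Return the href of the link whose text is current_page + 1, or None
    when we are already on the last numbered page."""
    target = str(current_page + 1)
    for href, text in re.findall(r'<a href="([^"]+)">([^<]+)</a>', html):
        if text.strip() == target:
            return href
    return None

print(next_page_url(PAGINATION, 1))  # /p/2
print(next_page_url(PAGINATION, 3))  # None
```

Returning `None` on the last page is what ends the crawl, the same way ParseHub stops when its relative selection no longer matches a new page number.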
Other Methods of Scraping Multiple Pages
You might also be interested in scraping multiple pages by searching through a list of keywords or by loading a predetermined list of URLs.
These are tasks that ParseHub can easily tackle as well. Check out the ParseHub Help Center for these guides.
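Both of these alternatives reduce to building a job list up front instead of discovering pages as you go. A short sketch, assuming a hypothetical `example.com` search endpoint and made-up keyword and URL lists:

```python
from urllib.parse import urlencode

# Hypothetical target site and inputs; substitute your own.
BASE = "https://example.com/search"
keywords = ["laptops", "monitors"]
urls = ["https://example.com/page/1", "https://example.com/page/2"]

# Keyword search: build one query URL per keyword.
search_urls = [f"{BASE}?{urlencode({'q': kw})}" for kw in keywords]

# Predetermined list: simply iterate and scrape each URL in turn.
jobs = search_urls + urls
for url in jobs:
    pass  # your fetch-and-extract step goes here

print(search_urls[0])  # https://example.com/search?q=laptops
```

Using `urlencode` rather than string concatenation keeps keywords with spaces or special characters valid in the query string.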
Closing Thoughts
You now know how to scrape multiple pages worth of data from any website.
However, we know that websites come in many different shapes and forms. The methods highlighted in this article might not work for your specific project.
If that’s the case, reach out to us at hello(at)parsehub.com and we’ll be happy to assist you with your project.
Happy Scraping!
FMiner is software for web scraping, web data extraction, screen scraping, web harvesting, web crawling, and web macro support for Windows and Mac OS X.
It is an easy-to-use web data extraction tool that combines best-in-class features with an intuitive visual project design tool to make your next data mining project a breeze.
Whether faced with routine web scraping tasks or highly complex data extraction projects requiring form inputs, proxy server lists, AJAX handling, and multi-layered, multi-table crawls, FMiner is the web scraping tool for you.
With FMiner, you can quickly master data mining techniques to harvest data from a variety of websites ranging from online product catalogs and real estate classifieds sites to popular search engines and yellow page directories.
Simply select your output file format and record your steps on FMiner as you walk through your data extraction steps on your target web site.
FMiner's powerful visual design tool captures every step and models a process map that interacts with the target site pages to capture the information you've identified.
Using preset selections for data type and your output file, the data elements you've selected are saved in your choice of Excel, CSV or SQL format and parsed to your specifications.
Equally important, if your project requires regular updates, FMiner's integrated scheduling module allows you to define periodic extraction schedules, at which point the project will auto-run new or incremental data extracts.
Easy to use, powerful web scraping tool
- Visual design tool
Design a data extraction project with the easy-to-use visual editor in less than ten minutes.
- No coding required
Use the simple point and click interface to record a scrape project much as you would click through the target site.
- Advanced features
Extract data from hard-to-crawl Web 2.0 dynamic websites that employ AJAX and JavaScript.
- Multiple Crawl Path Navigation Options
Drill through site pages using a combination of link structures, automated form input value entries, drop-down selections or url pattern matching.
- Keyword Input Lists
Upload input values to be used with the target website's web form to automatically query thousands of keywords and submit a form for each keyword.
- Nested Data Elements
Breeze through multilevel nested extractions. Crawl link structures to capture nested product catalogue, search results or directory content.
- Multi-Threaded Crawl
Expedite data extraction with FMiner's multi-browser crawling capability.
- Export Formats
Export harvested records in any number of formats including Excel, CSV, XML/HTML, JSON and popular databases (Oracle, MS SQL, MySQL).
- CAPTCHA Tests
Get around target website CAPTCHA protection using manual entry or third-party automated CAPTCHA-solving services.
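Of the features above, the multi-threaded crawl is the most mechanical: fetch several pages concurrently and keep the results in order. A minimal stdlib sketch, where the page set and the `extract` helper are hypothetical stand-ins for real HTTP fetches:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical page set; a real crawl would fetch each URL over HTTP.
PAGES = {f"/page/{i}": f"record-{i}" for i in range(1, 6)}

def extract(url):
    """Stand-in for one fetch-and-parse step."""
    return PAGES[url]

# Crawl several pages in parallel; pool.map preserves input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    records = list(pool.map(extract, sorted(PAGES)))

print(records)  # ['record-1', 'record-2', 'record-3', 'record-4', 'record-5']
```

Threads help here because crawling is I/O-bound: while one request waits on the network, others can proceed.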