Gyan Vardhan — December 15, 2020

This article was published as a part of the Data Science Blogathon.

“In this fast-moving world, don’t just work hard; work smart.
This line fits the world of technology perfectly, where you have to act differently to get better results faster.”

 

INTRODUCTION

Collecting data from the web is known as web scraping, web data extraction, or web harvesting. These days, everything and everyone needs fuel to run, and data is the most precious fuel for any organization. Finding the data is good; extracting it is even better; automating the extraction is perfect.

 

Get to know the Tool

What is Cypress?

Cypress is a next-generation front-end testing tool built for the modern web.

Cypress is a free, open-source, locally installed Test Runner, paired with a Dashboard Service for recording your tests. It aims to reduce the hurdles that engineers and developers face while testing web applications built with React and AngularJS.

It is most often compared to Selenium; however, Cypress is fundamentally and architecturally different.

 

Hands-On Session

Performed on Versions

  • Visual Studio Code – 1.52.0
  • Cypress – 6.0.1

Let’s perform web scraping using Cypress. First, look at the website holding the data you want to scrape and note the parent and child HTML tags that contain it.

 

Steps to follow to Web Scrape using Cypress

  • Select the Website and the Data
  • Create a JavaScript file
  • Set the URL
  • Inspect and get the proper HTML Tags
  • Include the HTML Tags in the code
  • Cross-check the Scraped Data

Step 1- Select the Website and the Data

I selected the website “https://www.bullion-rates.com/gold/INR/2007-1-history.htm” and want to scrape the gold rates along with their dates.

Figure: Data to be Scraped

 

Step 2- Create a JavaScript file

Create a JavaScript file and open it in Visual Studio Code, where we will write the web scraping code.

 

Step 3- Set the URL

The JavaScript code to pass the URL looks like this.

 

Figure: Passing the URL
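The original screenshot is not reproduced here, so the following is only a minimal sketch of how a Cypress spec can open the page with `cy.visit`. The constant name `TARGET_URL`, the test titles, and the suggested file location (`cypress/integration/gold_rates.spec.js`, the default spec folder in Cypress 6) are assumptions for illustration, not the author's actual code.

```javascript
// Assumption: saved as cypress/integration/gold_rates.spec.js (hypothetical file name).
const TARGET_URL = 'https://www.bullion-rates.com/gold/INR/2007-1-history.htm';

// `describe`, `it` and `cy` are globals supplied by the Cypress Test Runner.
// The guard only keeps the file loadable outside Cypress (for example in plain Node).
if (typeof cy !== 'undefined') {
  describe('Gold rate scraping', () => {
    it('opens the page that holds the data', () => {
      // cy.visit loads the URL in the browser that Cypress controls.
      cy.visit(TARGET_URL);
    });
  });
}
```

Running `npx cypress open` and selecting the spec should load the page in the Test Runner.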

 

Step 4- Inspect and get the proper HTML Tags

Once you know which HTML tags hold your data, it is quite easy to locate them. To see the HTML tags, right-click on the page and select the Inspect option.

Figure: Inspecting the HTML Tags

                                              

Proper HTML Tags:-

Notice that the table id is “dtDGrid” and the table body is “tbody”; under it, our data resides in the table row tags “tr” that carry the “DataRow” class.

Now, if you want to frame the selector, it would look like this:

Figure: Framing the Selector

                                                                            

In the selector, “#” denotes an id and “.” denotes a class.
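Since the selector screenshot is missing, here is one way the selector could be assembled from the pieces described above. The exact selector the author used is not known; the descendant form below and the constant names are assumptions.

```javascript
// Pieces taken from the inspected markup described in the article.
const TABLE_ID = 'dtDGrid';   // id, so it is prefixed with "#"
const ROW_CLASS = 'DataRow';  // class, so it is prefixed with "."

// Assumed shape: every <tr class="DataRow"> inside the <tbody> of the table with id "dtDGrid".
const ROW_SELECTOR = `#${TABLE_ID} tbody tr.${ROW_CLASS}`;

console.log(ROW_SELECTOR); // prints "#dtDGrid tbody tr.DataRow"
```

The same string can be pasted into the browser console as `document.querySelectorAll('#dtDGrid tbody tr.DataRow')` to check how many rows it matches before using it in Cypress.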

Figure: Child HTML Tag

If you look closely, all the data is present in the table data tags “td” under each “DataRow”. So now, I have to iterate through the “td” tags to get all the data within them.
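The iteration over “td” can be sketched with the standard DOM API. `readCells` is a hypothetical helper name, and the stand-in row object below exists only so the snippet runs outside a browser; its cell texts are placeholders, not real data from the site.

```javascript
// Hypothetical helper: return the trimmed text of every <td> inside one row element.
function readCells(rowElement) {
  return Array.from(rowElement.querySelectorAll('td'), (td) => td.textContent.trim());
}

// Stand-in for a real <tr> element, with placeholder cell texts.
const fakeRow = {
  querySelectorAll: () => [{ textContent: ' first cell ' }, { textContent: 'second cell' }],
};

console.log(readCells(fakeRow)); // prints [ 'first cell', 'second cell' ]
```

Inside a Cypress callback such as `.each(($row) => ...)`, the underlying DOM element is available as `$row[0]`, so `readCells($row[0])` would return the cell texts of that row.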

Step 5- Include the HTML Tags in the code

After including the HTML tags, our code looks like this:

Figure: HTML Tags Included in Code
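As the screenshot is unavailable, the sketch below shows one plausible shape for a spec that visits the page, selects the “DataRow” rows, and walks through their “td” cells. It uses only standard Cypress commands (`cy.visit`, `cy.get`, `cy.wrap`, `.find`, `.each`, `cy.log`); the selector, the names `PAGE_URL`, `DATA_ROWS`, and `registerScrapeTest`, and the test titles are assumptions rather than the author's code.

```javascript
const PAGE_URL = 'https://www.bullion-rates.com/gold/INR/2007-1-history.htm';
// Assumed selector built from the id "dtDGrid" and the class "DataRow".
const DATA_ROWS = '#dtDGrid tbody tr.DataRow';

// Hypothetical wrapper so the spec is only registered when Cypress is present.
function registerScrapeTest() {
  describe('Scrape gold rates', () => {
    it('reads every td inside each DataRow', () => {
      cy.visit(PAGE_URL);
      // Outer loop: one iteration per matching table row.
      cy.get(DATA_ROWS).each(($row) => {
        // Inner loop: one iteration per table data cell in that row.
        cy.wrap($row).find('td').each(($cell) => {
          cy.log($cell.text().trim());
        });
      });
    });
  });
}

if (typeof cy !== 'undefined') {
  registerScrapeTest();
}
```

Nesting `.each` keeps the row and cell levels explicit, which makes it easier to extend the inner selector later if more child tags need to be covered.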

 

Step 6- Cross-check the Scraped Data

The code to print the data looks like this:

Figure: Printing Data Code

Figure: Our Scraped Data
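One possible way to print and cross-check the scraped rows is sketched below: the rows are collected into an array, checked to be non-empty, logged in the Cypress command log, and written to a JSON file with `cy.writeFile`. The helper `formatRow`, the output path `gold_rates.json`, and the selector are assumptions made for illustration.

```javascript
// Hypothetical helper: join the cell texts of one row into a single printable line.
function formatRow(cells) {
  return cells.join(' | ');
}

if (typeof cy !== 'undefined') {
  it('prints and saves the scraped rows', () => {
    const scraped = [];
    cy.visit('https://www.bullion-rates.com/gold/INR/2007-1-history.htm');
    cy.get('#dtDGrid tbody tr.DataRow') // assumed selector
      .each(($row) => {
        // Collect the trimmed text of every <td> in this row.
        const cells = $row.find('td').toArray().map((td) => td.textContent.trim());
        scraped.push(cells);
      })
      .then(() => {
        // Cross-check: at least one row should have been scraped.
        expect(scraped.length).to.be.greaterThan(0);
        scraped.forEach((cells) => cy.log(formatRow(cells)));
        // Assumed output location, relative to the project root.
        cy.writeFile('gold_rates.json', scraped);
      });
  });
}
```

Comparing a few lines of the saved file against the table on the website is a simple way to confirm that the scrape matches the source.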

In this way, you can cover more child HTML tags to scrape data.

 

Conclusion:

I have tried to explain web scraping using Cypress in a very simple way. I hope this helps you.

Find full code on this link.

 

If you have any questions about the code or web scraping in general, reach out to me on LinkedIn.

We will meet again with something new.

Till then,

Happy Coding..!
