Web Scraping Using RPA Tool UiPath!

Gyan Vardhan 04 Feb, 2021 • 5 min read

This article was published as a part of the Data Science Blogathon.

The world is rapidly moving towards AI, so it’s better to go with the flow. That line captures how technology is being adopted in the real world to get results better & faster.

INTRODUCTION

Web scraping, also called web data extraction or web harvesting, is the collection of data from the web. These days everything & everyone needs data to run. Data is the most precious asset for running any organization, & the most challenging part is gathering quality data. Finding the data is good; extracting it is even better; doing it with automation is perfect.

What is UiPath?

UiPath is an RPA Tool. But wait – what is RPA?

What is RPA?

Quoting from UiPath’s site:

Robotic Process Automation is the technology that allows anyone today to configure computer software, or a “robot”, to emulate and integrate the actions of a human interacting within digital systems to execute a business process. RPA robots utilize the user interface to capture data and manipulate applications just like humans do. They interpret, trigger responses, and communicate with other systems to perform a vast variety of repetitive tasks.

Only substantially better: an RPA software robot never sleeps and makes zero mistakes.

Hands-On Session

Performed on Versions

UiPath – 20.4.3

Let’s perform web scraping using UiPath. First, open the website that holds the data you want to scrape and inspect its parent and child HTML tags so you understand how the data is structured.

 

Steps to Follow to Scrape a Website

  • Select the Website and the Data
  • Create a Project in your desired directory
  • Create a Flowchart file for Web scraping flow design
  • Design the Flow
  • Run the Automation flow
  • Open the Excel file & Cross-check the Scraped Data

 

Step 1- Select the Website and the Data

I selected the website “https://www.bullion-rates.com/gold/INR/2007-1-history.htm” and want to scrape the gold rates along with their dates.
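For readers who want to see what the scraper is targeting, here is a minimal Python sketch (not part of the UiPath flow) that pulls date and rate pairs out of an HTML table using only the standard library. The sample markup, its values, and the use of `dtDGrid` as a table id are simplified assumptions based on the activity name that appears later in this article; the live page’s markup may differ. `TableRows` is a hypothetical helper name.

```python
from html.parser import HTMLParser

# Simplified, made-up markup; the real page's structure and values may differ.
SAMPLE_HTML = """
<table id="dtDGrid">
  <tr><th>Date</th><th>Price</th></tr>
  <tr><td>01/01/2007</td><td>28,100.50</td></tr>
  <tr><td>02/01/2007</td><td>28,250.00</td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by its parent <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # cells of the <tr> currently being read
        self._in_cell = False  # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            if self._row:      # header rows hold only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


parser = TableRows()
parser.feed(SAMPLE_HTML)
print(parser.rows)
```

The parent and child relationship is what matters here: each row (`tr`) is a parent, and its cells (`td`) are the children that hold the date and the rate.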

 

Step 2- Create a Project in your desired directory

Provide the Name, path & short description of your project.

 

Step 3- Create a Flowchart file

Now create a flowchart file to design your Web scraping flow.

 

 

Step 4- Design the Flow

a) Choose the Open Browser activity from the Activities panel

b) Set the properties of Open Browser

i) Choose the browser type as Chrome

ii) Set the URL: insert the URL of the page to scrape, within quotes

iii) Set New Session to True

iv) Add a Delay activity with a duration of 6 seconds, in the format 00:00:06, so that the page loads fully. There are other ways to wait for a page, but for now I’m using the Delay activity.

 

v) Choose the Data Scraping option

a) The Select Element window appears

b) Click Next

c) The element selector highlighter now appears, so select the element. Once the elements are selected, you can see a preview of the data. If the data looks as expected, click Finish; otherwise re-select the data.

d) A pop-up now asks whether the data spans multiple pages. If you want to scrape multiple pages, select Yes & choose the element that takes you to the next page. In this case I want to scrape one page only, so I’m choosing No.

e) A Data Scraping activity appears in the flow design. Select the Extract Structured Data ‘TABLE dtDGrid’ activity & notice two things in its properties:

i) The maximum number of results defaults to 100; change it to match the number of records on the page.
ii) In the Output section, the DataTable variable is ExtractDataTable.

f) Now we have to write the scraped data to an Excel file, so we use the Write Range activity.

i) The first field is the path of the Excel file; provide it according to your file’s location.
ii) The second field is the sheet name & starting cell; provide the sheet name in quotes & remove the cell name, so that the activity creates the sheet & writes all the data.
iii) The last field is the variable name; in my case it is ExtractDataTable.
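The same extract-then-write flow can be sketched outside UiPath. The Python below is a rough analogue, not UiPath code: `extract_rows` and `write_range` are hypothetical helper names, the `PAGES` dictionary is an in-memory stand-in for real web pages with made-up values, and a CSV file stands in for the Excel sheet so that only the standard library is needed. It mirrors the maximum-results setting and the optional next-page choice from the wizard.

```python
import csv
import os
import tempfile

# In-memory stand-in for web pages: each page has rows and an optional next page.
PAGES = {
    "page1": {"rows": [["01/01/2007", "28100.50"], ["02/01/2007", "28250.00"]],
              "next": "page2"},
    "page2": {"rows": [["03/01/2007", "28310.75"]], "next": None},
}


def extract_rows(pages, start, max_results=100, follow_next=False):
    """Collect rows from `start`, optionally following next-page links,
    and stop once `max_results` rows have been gathered."""
    rows, current = [], start
    while current is not None and len(rows) < max_results:
        page = pages[current]
        rows.extend(page["rows"][: max_results - len(rows)])
        current = page["next"] if follow_next else None
    return rows


def write_range(path, rows, header=("Date", "Price")):
    """Write a header plus all rows to a CSV file (stand-in for the Excel sheet)."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


single_page = extract_rows(PAGES, "page1")                  # like choosing "No"
all_pages = extract_rows(PAGES, "page1", follow_next=True)  # like choosing "Yes"

out_path = os.path.join(tempfile.mkdtemp(), "gold_rates.csv")
write_range(out_path, all_pages)
print(len(single_page), len(all_pages), out_path)
```

Keeping extraction and writing as two separate functions follows the flow’s own shape, where the extracted table is passed from one activity to the next through a variable.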

 

Step 5- Run the Automation flow

Click the Run option or press Ctrl+F6 to run the automation flow.

Step 6- Open the Excel file & Cross-check the Scraped Data
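Cross-checking by eye works for a single page; for larger extracts, a small script can flag suspicious rows. Here is a minimal Python sketch, assuming each row is a date string plus a price string. The `%m/%d/%Y` date format, the sample values, and the `find_bad_rows` name are assumptions for illustration, so adjust them to whatever the sheet actually contains.

```python
from datetime import datetime


def find_bad_rows(rows, date_format="%m/%d/%Y"):
    """Return (row_index, reason) for rows whose date or price fails to parse."""
    problems = []
    for index, row in enumerate(rows):
        if len(row) != 2:
            problems.append((index, "expected 2 columns"))
            continue
        date_text, price_text = row
        try:
            datetime.strptime(date_text, date_format)
        except ValueError:
            problems.append((index, "bad date"))
            continue
        try:
            # Thousands separators are removed before converting to a number.
            price = float(price_text.replace(",", ""))
        except ValueError:
            problems.append((index, "bad price"))
            continue
        if price <= 0:
            problems.append((index, "non-positive price"))
    return problems


sample = [
    ["01/02/2007", "28,100.50"],  # valid
    ["13/45/2007", "28,250.00"],  # impossible date
    ["01/04/2007", "n/a"],        # price is not a number
]
print(find_bad_rows(sample))
```

An empty result means every row passed these basic checks; it does not prove the values match the website, so a spot check against the page is still worthwhile.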

 

Conclusion

I tried to explain web scraping with the RPA tool UiPath in a very simple way. I hope this helps you.

Find the full code on GitHub.

If you have any questions about the code or web scraping in general, reach out to me on LinkedIn.

We will meet again with something new.

Till then,

Happy Coding..!

 

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
