Data Analysis

Last Updated: Jan 10, 2022

Web Scraping

Web scraping is used to gather large amounts of information from websites. Typical applications include email gathering, price comparison, job listings, research and development, and dataset collection.

Web scraping is an automated method of extracting large amounts of data from websites. The data on most websites is unstructured; web scraping collects this data and gives it a structured form.

To find out whether a website allows web scraping, check the website’s “robots.txt” file.
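The robots.txt check can be sketched with Python's standard `urllib.robotparser` module. The rules and the `example.com` URLs below are hypothetical and are parsed in memory, so the sketch makes no network request; a real site's rules will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration only)
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Allow: /search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# can_fetch(user_agent, url) reports whether the rules permit fetching the URL
can_search = parser.can_fetch("*", "https://example.com/search?q=laptops")
can_checkout = parser.can_fetch("*", "https://example.com/checkout/cart")
print(can_search, can_checkout)
```

To check a live site instead, `set_url()` can point the parser at the site's robots.txt and `read()` downloads it before `can_fetch()` is called.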

 

Uses of web scraping:

  1. Gather datasets for ML
  2. Collect data from responsive web applications
  3. Remote execution of commands on the web
  4. Information retrieval and storage to databases
  5. Ethical hacking and security engineering



Let us see how to extract data from the Flipkart website using Python. We will use Selenium, BeautifulSoup, and Pandas.

 

# Install Chromium's driver and Selenium (notebook shell commands for Google Colab)
!apt update
!apt install chromium-chromedriver
!pip install selenium

from selenium import webdriver
from bs4 import BeautifulSoup as bs
import pandas as pd

# Run Chrome headless, since Colab has no display
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

# Start the browser, open the search results page, and grab the rendered HTML
driver = webdriver.Chrome(options=options)
driver.get("https://www.flipkart.com/search?q=best%20laptops%20under%2080000&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off")
content = driver.page_source
driver.quit()  # close the browser once the HTML has been captured

soup = bs(content, 'html.parser')  # name the parser explicitly
products = []           # product names
discounted_prices = []  # discounted prices
discounts = []          # discounts available

# Each product sits in a div of class _2kHMtA; inside it, the name, price,
# and discount are nested divs with their own class names
for a in soup.find_all('div', attrs={'class': '_2kHMtA'}):
    name = a.find('div', attrs={'class': '_4rR01T'})
    discounted_price = a.find('div', attrs={'class': '_30jeq3 _1_WHN1'})
    discount = a.find('div', attrs={'class': '_3Ay6Sb'})
    # find() returns None when a product lacks a field, so store '' in that case
    products.append(name.text if name else '')
    discounted_prices.append(discounted_price.text if discounted_price else '')
    discounts.append(discount.text if discount else '')

df = pd.DataFrame({'Product Name': products, 'Discounted_price': discounted_prices, 'Discounts': discounts})
df.to_csv('products.csv', index=False, encoding='utf-8')
df.head()




The code was run in Google Colab, which is why we had to install and configure the webdriver first. In a local editor, using the webdriver is a simpler procedure.

Using the above code, we have extracted data from the website. The data we want is nested inside tags, so we find the div tags with the respective class names, extract their text, and store it in a variable.
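The nested lookup can be tried without a browser on a small, made-up HTML snippet. The sketch below swaps BeautifulSoup for the standard library's `html.parser`, so it runs with no extra installs. The class names `card`, `name`, `price`, and `discount` and the sample products are hypothetical stand-ins for the real page's markup.

```python
from html.parser import HTMLParser

# Hypothetical markup: two product cards, the second one without a discount
SAMPLE_HTML = """
<div class="card">
  <div class="name">Laptop A</div>
  <div class="price">54,990</div>
  <div class="discount">15% off</div>
</div>
<div class="card">
  <div class="name">Laptop B</div>
  <div class="price">61,490</div>
</div>
"""

FIELDS = ("name", "price", "discount")


class CardParser(HTMLParser):
    """Collect one dict per card div, filled from its nested field divs."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per product card
        self._field = None  # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if "card" in classes:
            # A new card starts: every field defaults to an empty string
            self.rows.append({field: "" for field in FIELDS})
        elif self.rows:
            self._field = next((c for c in classes if c in FIELDS), None)

    def handle_data(self, data):
        if self._field is not None:
            self.rows[-1][self._field] += data.strip()

    def handle_endtag(self, tag):
        if tag == "div":
            self._field = None


def extract_cards(html):
    parser = CardParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = extract_cards(SAMPLE_HTML)
print(rows)
```

A card with a missing field keeps an empty string for it, so every row ends up with the same columns.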

We can store the extracted data in a CSV file using the following code:

df = pd.DataFrame({'Product Name':products,'Discounted_price':discounted_prices,'Discounts':discounts})
df.to_csv('products.csv', index=False, encoding='utf-8')

In the saved CSV file, we can see each product’s name, discounted price, and discount.
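To see what such a file contains, here is a minimal round trip using the standard `csv` module and an in-memory buffer in place of `products.csv`. The two rows are hypothetical sample values, not scraped results; the column names match the DataFrame above.

```python
import csv
import io

# Hypothetical rows standing in for the scraped lists
rows = [
    {"Product Name": "Laptop A", "Discounted_price": "54,990", "Discounts": "15% off"},
    {"Product Name": "Laptop B", "Discounted_price": "61,490", "Discounts": ""},
]


def to_csv_text(rows):
    """Write the rows as CSV text, with a header taken from the first row's keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_csv_text(text):
    """Read CSV text back into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = to_csv_text(rows)
loaded = read_csv_text(csv_text)
header = csv_text.splitlines()[0]
print(header)
print(loaded)
```

Prices that contain a comma are quoted by the writer, so they survive the round trip as a single column.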
