
How Can Web Scraping of Zomato Be Done with the BeautifulSoup Library in Python?


Introduction

Web scraping, also known as data scraping, is a form of data extraction used to gather information from websites. Web scraping software accesses these websites over HTTP or through a web browser. A user can perform web scraping manually, but the term generally refers to automated procedures carried out by bots or web crawlers. In this process, specific data is copied from websites and stored in a local database or spreadsheet for later retrieval.

Here, we will build a Zomato data scraper to gather information on the best restaurants in Bengaluru, India, by accessing and reading the site's HTML pages.

Scraping the Website Content


When you type a web address into the browser, an HTTP request is made to visit that webpage. If the request completes successfully, the browser displays the page; otherwise, it shows an error. The same kind of request is made to access a Zomato web page.

A couple of Python libraries make it easy to access a web page:

import requests
from bs4 import BeautifulSoup

Before using these libraries and their functions to access the web page, let us understand what each one does.

Making a Request


Requests is an HTTP library for Python, built for humans. It eliminates the need to manually add query strings to URLs or to form-encode POST data. Requests lets you send HTTP/1.1 requests from Python and attach content such as headers, multipart files, form data, and query parameters through simple calls; the response data can be retrieved just as easily.
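
As a minimal sketch of the API (the URL and parameters here are placeholders, not from the original post):

import requests

# Requests builds the query string from `params` and attaches headers for us
response = requests.get(
    "https://example.com/search",
    params={"q": "restaurants", "city": "bengaluru"},
    headers={"User-Agent": "Mozilla/5.0"},
)
print(response.status_code)  # 200 on success
print(response.text[:200])   # start of the response body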

BeautifulSoup (BS4)


BeautifulSoup4 is a Python package for extracting data from HTML and XML files. It integrates with your preferred parser to offer navigation, search, and modification of the parse tree, routinely saving programmers hours or even days of effort.
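
As a quick illustration (the HTML snippet below is invented for the example, reusing a class name that appears later in this post):

from bs4 import BeautifulSoup

snippet = '<div class="res_title">Truffles</div><div class="grey-text">Cafe</div>'
demo_soup = BeautifulSoup(snippet, "html.parser")

# find() returns the first tag matching the given name and attributes
title = demo_soup.find("div", attrs={"class": "res_title"})
print(title.text)  # Truffles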

Now that we know the tools, let us access the Zomato web page.
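
The original post does not show the request itself, so the sketch below fills that gap; the collections URL and the headers are assumptions, and Zomato may block automated requests or change its markup at any time:

import requests
from bs4 import BeautifulSoup

# Assumed URL for Zomato's Bengaluru collections page
url = "https://www.zomato.com/bangalore/collections"

# Many sites reject requests that lack a browser-like User-Agent
headers = {"User-Agent": "Mozilla/5.0"}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")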

The data for Zomato's best restaurants is now held in the soup variable. However, raw HTML is hardly readable for anyone but a programmer, so let us see how to make use of the scraped data.

Here, we are looking for each restaurant's name, address, and cuisine category. To extract these attributes, we first need to locate the HTML elements that contain them.

You can inspect the BeautifulSoup output mentioned above, or use your browser's developer tools (for example, Inspect in Chrome), to find which tag holds the collection of top restaurants and which nested tags carry the additional details.

# The outer div holds the whole collection; each inner div is one restaurant card
top_rest = soup.find_all("div", attrs={"class": "bb0 collections-grid col-l-16"})
list_tr = top_rest[0].find_all("div", attrs={"class": "col-s-8 col-l-1by3"})

The preceding code looks for div tags with class="col-s-8 col-l-1by3" and returns the list of restaurant cards. We then loop over the list items, one restaurant at a time, to extract the additional details.

list_rest = []
for tr in list_tr:
    dataframe = {}
    # Each detail sits in a div with a distinctive class; embedded newlines
    # are replaced with spaces so the text stores cleanly
    dataframe["rest_name"] = tr.find("div", attrs={"class": "res_title zblack bold nowrap"}).text.replace('\n', ' ')
    dataframe["rest_address"] = tr.find("div", attrs={"class": "nowrap grey-text fontsize5 ttupper"}).text.replace('\n', ' ')
    dataframe["cuisine_type"] = tr.find("div", attrs={"class": "nowrap grey-text"}).text.replace('\n', ' ')
    list_rest.append(dataframe)
list_rest

The tr variable in the preceding loop holds various details about each restaurant, such as its name, cuisine, address, prices, reviews, and menu. Each piece of information sits in its own tag, which can be identified by inspecting the HTML of each item.
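
To see which tag holds which detail, you can print the markup of a single card and read it directly:

# prettify() re-indents the HTML of the first restaurant card for inspection
print(list_tr[0].prettify())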

Before looking for tags in the HTML, we should take a look at how the restaurant's menu appears on the website.

[Image: the restaurant's menu as it appears on the Zomato website]

As the image above shows, the data we want to scrape is displayed in several formats. Returning to the HTML content, we find that this data is kept within div tags whose classes define the formatting and fonts used.

A dictionary is created for each restaurant to collect the necessary information. We go through the details one by one and save each under its own key, which later becomes a DataFrame column. Because the HTML text uses '\n' as a separator, and stray newlines should not end up in the DataFrame, we use a string function to replace each '\n' with a space.
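
As a tiny illustration of that cleanup (the sample string here is made up, not actual Zomato output):

# Hypothetical scraped text containing an embedded newline
raw = "CENTRAL BENGALURU\n"
clean = raw.replace('\n', ' ').strip()
print(clean)  # CENTRAL BENGALURU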

The scraping loop produces a list of dictionaries, one per restaurant, holding the name, address, and cuisine type.

Saving Data in a Readable Format

Suppose you need to deliver this data to someone who is not familiar with Python; a raw Python object will mean little to them. We therefore save the data in a readable format such as CSV.

import pandas

# Build a DataFrame from the list of dictionaries and write it out as CSV
df = pandas.DataFrame(list_rest)
df.to_csv("zomato_res.csv", index=False)

The code above generates the zomato_res.csv file.
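
To double-check that the file opens cleanly outside this script, you can read it back (a quick verification step, not part of the original post):

import pandas

# Reload the CSV to confirm it was written correctly
df_check = pandas.read_csv("zomato_res.csv")
print(df_check.head())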

Conclusion

In this blog, we learned to use Requests to access a web page from Python and BeautifulSoup4 to extract data from the fetched HTML. The data was then organized into a DataFrame and saved in CSV format.

Looking for a web scraping service to scrape Zomato data? Contact Web Screen Scraping now and request a quote!
