
How To Extract Alibaba Product Data Using Python And Beautiful Soup?



Here is how to extract Alibaba product data using Python and Beautiful Soup in a simple, straightforward way.

The aim of this blog is to keep things simple so that you get familiar with the approach and see practical results as quickly as possible.

First, you need Python 3. If you haven't installed it yet, please do so before you continue.

You can install Beautiful Soup with:

pip3 install beautifulsoup4

We also need the requests, soupsieve, and lxml libraries to fetch the data, parse the markup, and use CSS selectors.

pip3 install requests soupsieve lxml

Once they are installed, open your editor and type in:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

Now go to the Alibaba list page and look over the details we need to get.


Back to the code. Let's try to fetch that page while pretending to be a browser, like this:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

# Send a browser-like User-Agent so the request looks like it comes from a browser
headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.alibaba.com/catalog/power-tools_cid1417?spm=a2700.7699653.scGlobalHomeHeader.548.7bc23e5fdb6651'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')
# Print the parsed page so we can inspect the HTML
print(soup.prettify())

Save this as scrapeAlibaba.py

Then run it:

python3 scrapeAlibaba.py

You will see the entire HTML of the page.
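Before parsing, it can help to confirm that the request actually succeeded, since a blocked or failed request also returns a body. The following is a minimal sketch of that check, not part of the original script. The names fetch_soup and session are illustrative assumptions: session is anything that behaves like requests.Session. The sketch uses Python's built-in 'html.parser' so it runs without lxml; you can swap in 'lxml' as above.

```python
from bs4 import BeautifulSoup

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) "
                  "AppleWebKit/601.3.9 (KHTML, like Gecko) "
                  "Version/9.0.2 Safari/601.3.9"
}


def fetch_soup(url, session, timeout=10):
    """Fetch a page and return it parsed, or raise if the request failed.

    `session` is assumed to behave like requests.Session (hypothetical
    parameter name): its get() accepts `headers` and `timeout`.
    """
    response = session.get(url, headers=HEADERS, timeout=timeout)
    # With requests, this raises an HTTPError for 4xx and 5xx responses.
    response.raise_for_status()
    return BeautifulSoup(response.content, "html.parser")


# Usage (performs a real network call, so it is left commented out):
# import requests
# soup = fetch_soup("https://www.alibaba.com/catalog/power-tools_cid1417",
#                   requests.Session())
# print(soup.prettify())
```

The timeout keeps the script from hanging forever if the server never answers.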


Now, let's use CSS selectors to get the data we need. To find them, open the page in Chrome and open the Inspect tool.

We can see that each product's data sits in an element with the class ‘organic-gallery-offer-outter’. We can select these elements with the CSS selector ‘.organic-gallery-offer-outter’. Here is how the code looks:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.alibaba.com/catalog/power-tools_cid1417?spm=a2700.7699653.scGlobalHomeHeader.548.7bc23e5fdb6651'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

#print(soup.select('[data-lid]'))
# Each product sits in a container with this class
for item in soup.select('.organic-gallery-offer-outter'):
    try:
        print('----------------------------------------')
        print(item)
    except Exception as e:
        #raise e
        print('')

This will print the full content of every container that holds product information.

Now we can pick out the classes inside each container that hold the information we need.
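To see what the class selector does without hitting the network, here is a small sketch on a made-up HTML fragment. The fragment and its product names are invented for illustration; only the class name comes from the page above. It uses the built-in 'html.parser' so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; only the class name mirrors the real page.
SAMPLE_HTML = """
<div class="organic-gallery-offer-outter"><p>Drill</p></div>
<div class="organic-gallery-offer-outter extra"><p>Saw</p></div>
<div class="banner"><p>Ad</p></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A leading dot means "match by class"; elements carrying additional
# classes still match, while the banner div does not.
cards = soup.select(".organic-gallery-offer-outter")

print(len(cards))  # 2
for card in cards:
    print(card.get_text(strip=True))  # Drill, then Saw
```

Note that there is no space between the dot and the class name in the selector.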

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.alibaba.com/catalog/power-tools_cid1417?spm=a2700.7699653.scGlobalHomeHeader.548.7bc23e5fdb6651'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

#print(soup.select('[data-lid]'))
for item in soup.select('.organic-gallery-offer-outter'):
    try:
        print('----------------------------------------')
        print(item)
        # Title, price, minimum order, review score, supplier name, image URL
        print(item.select('.organic-gallery-title__content')[0].get_text().strip())
        print(item.select('.gallery-offer-price')[0].get_text().strip())
        print(item.select('.gallery-offer-minorder')[0].get_text().strip())
        print(item.select('.seb-supplier-review__score')[0].get_text().strip())
        print(item.select('[flasher-type=supplierName]')[0].get_text().strip())
        print(item.select('.seb-img-switcher__imgs img')[0]['src'])
    except Exception as e:
        # A missing field raises here; skip the rest of this item
        #raise e
        print('')

When you run it, it prints the title, price, minimum order, review score, supplier name, and image URL for each product.
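One thing to watch in the loop above: if a single field is missing, indexing with [0] raises and the except branch skips the remaining fields for that product. A possible alternative, sketched below on made-up sample markup, is to use select_one and fall back to None. The helper names text_or_none and parse_item and the sample values are assumptions for illustration; the selectors are the ones used above.

```python
from bs4 import BeautifulSoup

# Made-up sample card; it deliberately has no review score element.
SAMPLE_HTML = """
<div class="organic-gallery-offer-outter">
  <p class="organic-gallery-title__content"> Cordless Drill </p>
  <p class="gallery-offer-price">$10.00-$12.00</p>
  <p class="gallery-offer-minorder">100 Pieces</p>
  <span flasher-type="supplierName">Example Tools Co.</span>
  <div class="seb-img-switcher__imgs"><img src="//example.com/drill.jpg"></div>
</div>
"""

TEXT_FIELDS = {
    "title": ".organic-gallery-title__content",
    "price": ".gallery-offer-price",
    "min_order": ".gallery-offer-minorder",
    "score": ".seb-supplier-review__score",
    "supplier": '[flasher-type="supplierName"]',
}


def text_or_none(item, selector):
    """Return the stripped text of the first match, or None if absent."""
    node = item.select_one(selector)
    return node.get_text(strip=True) if node else None


def parse_item(item):
    """Turn one product container into a dict of fields."""
    record = {name: text_or_none(item, sel) for name, sel in TEXT_FIELDS.items()}
    img = item.select_one(".seb-img-switcher__imgs img")
    record["image"] = img.get("src") if img else None
    return record


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
records = [parse_item(i) for i in soup.select(".organic-gallery-offer-outter")]
print(records[0]["title"])  # Cordless Drill
print(records[0]["score"])  # None
```

Collecting dicts instead of printing also makes it easier to save the results later.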

If you want to use this in production and scale to millions of links, you will find that your IP gets blocked by Alibaba. In that situation, a rotating proxy service that changes your IP between requests is necessary. You can use a service like Proxies API to route your calls through millions of residential proxies.

If you need to scale up the crawling speed, or you don't want to set up your own infrastructure, you can use our cloud-based crawler to crawl millions of URLs at high speed.
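As a rough idea of what rotating proxies looks like in code, here is a minimal sketch. It is not how any particular proxy service works: the proxy addresses are placeholders, and fetch_with_rotation and session are illustrative names, with session assumed to behave like requests.Session (requests accepts a proxies dictionary on each call).

```python
from itertools import cycle

# Placeholder proxy endpoints; replace them with the ones your provider gives you.
PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
]


def fetch_with_rotation(url, session, proxy_urls, headers=None, max_attempts=3):
    """Try the request through successive proxies until one returns HTTP 200.

    `session` is assumed to behave like requests.Session: its get() accepts
    `headers`, `proxies`, and `timeout` keyword arguments.
    Returns the response, or None if every attempt failed.
    """
    pool = cycle(proxy_urls)
    for _ in range(max_attempts):
        proxy = next(pool)
        try:
            response = session.get(
                url,
                headers=headers,
                proxies={"http": proxy, "https": proxy},
                timeout=10,
            )
        except OSError:
            # requests' network errors (timeouts, proxy failures) derive
            # from OSError, so move on to the next proxy.
            continue
        if response.status_code == 200:
            return response
    return None


# Usage (performs real network calls, so it is left commented out):
# import requests
# response = fetch_with_rotation(url, requests.Session(), PROXIES, headers=headers)
```

A hosted rotating proxy service typically does this rotation for you behind a single endpoint.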

If you are looking for Alibaba Product Data Scraping Services, then you can contact Web Screen Scraping for all your queries.
