How To Crawl Coupon Sites With Python

In this post, I will show you how to use Python and lxml to crawl coupons and deals from coupon sites. The goal is to help you write your own crawlers in Python.

To demo this, I will crawl coupons from couponannie.com and couponmonk.us.

Example 1

Let us start with couponannie.com.
First, import the following two libraries.
import requests
import lxml.html
Most coupon sites have thousands of coupon pages, usually one per company or store. These pages are built from the same template, so a crawler written for one coupon page should work for all the other coupon pages on that site. This is the case for couponannie.com as well.
Let us pick the URL couponannie.com/stores/linkfool and extract the coupons and their related information.
url = 'https://www.couponannie.com/stores/linkfool'
We will use requests to fetch the content of the above page, as shown below.
obj = requests.get(url)
Let us convert the data into a form that lxml can understand.
root = lxml.html.fromstring(obj.text)
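The two steps above can be wrapped in small helpers. This is only a sketch: the function names fetch_root and parse_root and the 10-second timeout are my own choices for illustration, not something the sites require. It adds a timeout and a status check so that a failed request raises an error instead of an error page being parsed silently.

```python
import requests
import lxml.html


def parse_root(html_text):
    """Parse an HTML string into an lxml element tree."""
    return lxml.html.fromstring(html_text)


def fetch_root(url, timeout=10):
    """Download a page and return its parsed lxml root element."""
    # Without a timeout, requests can wait indefinitely on a stalled server.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return parse_root(response.text)
```

With these in place, root = fetch_root(url) gives the same kind of object as the two lines above.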
If you open the page URL in a browser, the coupons are presented as a list, as shown in the snapshot below.
In the Chrome browser, right-click on this list and select "Inspect" from the pop-up menu. A panel opens at the bottom or right side of the screen, as shown in the snapshot below.
In the developer console, you will see the ul element with id="rectStoreCard". Notice that all the coupons are li elements under this ul tag.
We can get hold of these li elements as shown below.
len(root.xpath('//ul[@id="rectStoreCard"]/li'))
19
As we see above, there are 19 list items (coupons) under the ul tag. For this example, let us grab the first one.
elem = root.xpath('//ul[@id="rectStoreCard"]/li')[0]
Now we have the first element, and we can extract its sub-elements.
Let us first get the coupon description. The description is inside a p tag within the div element with class="desc".
elem.xpath('.//div[@class="desc"]/p')[0].text_content().strip()
'Enjoy Up to 25% Off on this Flash Sale'
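The same XPath can be applied to every li element instead of only the first. Below is a minimal sketch. The function name coupon_descriptions is hypothetical, and the inline sample HTML is made up to mimic the structure described above, so it is not real site data. The guard against a missing p tag is an assumption that some list items may not be coupons.

```python
import lxml.html


def coupon_descriptions(root):
    """Return the description text of every coupon <li> under the store list."""
    descriptions = []
    for item in root.xpath('//ul[@id="rectStoreCard"]/li'):
        paragraphs = item.xpath('.//div[@class="desc"]/p')
        # Skip list items without a description instead of raising IndexError.
        if paragraphs:
            descriptions.append(paragraphs[0].text_content().strip())
    return descriptions


# Made-up sample that mimics the page structure (not real site data).
sample = """
<html><body>
<ul id="rectStoreCard">
  <li><div class="desc"><p> Enjoy Up to 25% Off on this Flash Sale </p></div></li>
  <li><div class="banner">No description here</div></li>
</ul>
</body></html>
"""
print(coupon_descriptions(lxml.html.fromstring(sample)))
```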
Now let us see how to extract the coupon code, which is trickier. To see the coupon code on the site, you first have to click the "Get Code" button; only then does the site show the code. This behaviour is implemented with JavaScript or jQuery, so simulating the click would require Selenium. However, for this site there is an easier way.
If you look carefully, each coupon item has a "see details" section with class="see-detail-con". We can get all the details of a coupon as shown below.
elem.xpath('.//div[@class="see-detail-con"]')[0].text_content().strip()
'Find Enjoy Up to 25% Off on this Flash Sale via coupon code “YHNWFL25”. 
Apply this promo code at checkout. Discount automatically applied in cart. Exclusions Apply.'
The details above also include the coupon code. Of course, we still need to parse the text to extract the code.
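One way to do that parsing is a regular expression. The sketch below assumes the details text always follows the wording seen above, that is, the words "coupon code" followed by the code in straight or curly double quotes. Other coupons on the site may be phrased differently, so treat the pattern as a starting point. The names CODE_PATTERN and extract_code are my own.

```python
import re

# Matches the text between straight or curly double quotes that follows
# the words "coupon code" (assumed wording, based on the sample above).
CODE_PATTERN = re.compile(r'coupon code\s*[“"]([^”"]+)[”"]', re.IGNORECASE)


def extract_code(details):
    """Return the coupon code quoted in the details text, or None if absent."""
    match = CODE_PATTERN.search(details)
    return match.group(1) if match else None


details = ('Find Enjoy Up to 25% Off on this Flash Sale via coupon code “YHNWFL25”. '
           'Apply this promo code at checkout.')
print(extract_code(details))  # prints YHNWFL25
```

Returning None when nothing matches lets the caller tell apart deals that have no code from deals whose text could not be parsed.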

Example 2

Let us do one more example. This time I will crawl a coupon page from the couponmonk.us site.
We will use the page https://www.couponmonk.us/coupons-for/quizlet.com/
url1 = 'https://www.couponmonk.us/coupons-for/quizlet.com/'
obj1 = requests.get(url1)
root1 = lxml.html.fromstring(obj1.text)
Let us find the HTML element that holds a coupon listing on the above page. This site does not use a ul or any other list element; instead, each coupon item is a div element with class="card flex-row flex-wrap".
len(root1.xpath('.//div[@class="card flex-row flex-wrap"]'))
6
As we see above, there are 6 such elements on this page at the time of writing. Let us grab the first element.
elem1 = root1.xpath('.//div[@class="card flex-row flex-wrap"]')[0]
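A note on the selector: @class="card flex-row flex-wrap" matches only when the class attribute is exactly that string, so it breaks if the site reorders or adds a class. A common XPath 1.0 idiom is to test for a single class token instead. The sketch below uses made-up HTML to show the difference. Treating "card" as the token that identifies a coupon is my assumption, and the name CARD_XPATH is hypothetical.

```python
import lxml.html

# Match any <div> whose class list contains the token "card", regardless of
# the order or number of other classes.
CARD_XPATH = './/div[contains(concat(" ", normalize-space(@class), " "), " card ")]'

# Made-up sample (not real site data).
sample = """
<html><body>
  <div class="card flex-row flex-wrap"><p>first</p></div>
  <div class="flex-wrap card"><p>second</p></div>
  <div class="discard"><p>not a coupon</p></div>
</body></html>
"""
root = lxml.html.fromstring(sample)
cards = root.xpath(CARD_XPATH)
exact = root.xpath('.//div[@class="card flex-row flex-wrap"]')
print(len(cards), len(exact))  # prints 2 1
```

Padding the attribute with spaces is what keeps "discard" from matching, which a plain contains(@class, "card") would not do.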
The text of the coupon is inside a p element of the above div tag. Let us look up the p tags using the code below.
elem1.xpath('.//p')
[<Element p at 0x7f516e2ebfb0>, <Element p at 0x7f516e2eb170>]
So there are two p tags. Let us check the content of each p tag.
elem1.xpath('.//p')[0].text_content().strip()
'Practice Questions And More, Get 20% Off With Code! - Try code MOMETRIX30'

elem1.xpath('.//p')[1].text_content().strip()
'Added on 2020-June-30'
The first p tag contains the coupon description, and the second p tag contains the date when the coupon was added.
Extracting the coupon code, however, involves clicking the "show coupon" button, which requires Selenium. The rest of the information can be extracted in the ways shown above.
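The two p tags can be turned into a structured record. This is a sketch under assumptions: the function name parse_card is hypothetical, the date is assumed to always look like "Added on 2020-June-30", and the code is picked up only when the description happens to end with "Try code ...", as it does in the sample above. For coupons where the code is hidden behind the button, the code field stays None.

```python
import re
from datetime import datetime

import lxml.html


def parse_card(card):
    """Turn one coupon card element into a dict of description, code and date."""
    paragraphs = [p.text_content().strip() for p in card.xpath('.//p')]
    description = paragraphs[0] if paragraphs else ''
    added_text = paragraphs[1] if len(paragraphs) > 1 else ''

    # Some descriptions mention the code inline as "Try code XYZ".
    code_match = re.search(r'Try code\s+(\S+)', description)

    # %B expects a full English month name under the default locale.
    added_on = None
    date_match = re.search(r'Added on\s+(\d{4}-[A-Za-z]+-\d{1,2})', added_text)
    if date_match:
        added_on = datetime.strptime(date_match.group(1), '%Y-%B-%d').date()

    return {
        'description': description,
        'code': code_match.group(1) if code_match else None,
        'added_on': added_on,
    }


# Sample rebuilt from the output shown above (the markup itself is assumed).
sample = """
<html><body><div class="card flex-row flex-wrap">
  <p>Practice Questions And More, Get 20% Off With Code! - Try code MOMETRIX30</p>
  <p>Added on 2020-June-30</p>
</div></body></html>
"""
card = lxml.html.fromstring(sample).xpath('.//div[@class="card flex-row flex-wrap"]')[0]
record = parse_card(card)
print(record['code'], record['added_on'])
```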

Wrap Up!

That wraps up this post. I hope it has given you enough starting material for writing scrapers with Python and lxml.
