Hacking Competitive Pricing Analysis with Scraping

Ever wanted to find out what price your competitors are charging for all of their products? Perhaps you want to monitor changes to their pricing so that you can adapt quickly or just monitor trends.

Well, I'm going to show you a hack that I've used in the past and continue to use (especially within e-commerce projects) to monitor all of my competitors' pricing, and it's automated.

Not only will you be able to avoid extremely expensive and largely inaccurate 'competitive intelligence' software, but you'll have complete control over what data you want to pull through, when you want it, and there's absolutely no limit to how often you do it. Sounds good, right?

What You'll Need

There are a few things that you're going to need to be able to do this - nothing big or expensive, and all the knowledge you need will be within this post:

  1. A subscription to URL Profiler ($15.95 per month).
  2. Microsoft Excel.
  3. Very basic knowledge of HTML/CSS (I'll walk you through what you need to know).

That's it.

Hacking Competitive Pricing Analysis

Before I get into the technical side of running the competitive pricing analysis, here's a quick outline of the process: gather your competitors' product page URLs, identify the page elements you want to scrape, write a CSS selector or XPath query for each one, scrape the data with URL Profiler, and organise the results. Don't worry if some of these steps don't quite make sense yet, because I'm going to run through each of them in detail.

You should be able to follow all of this, even if you've never done any coding before. If you have any questions, drop a comment and I'll do my best to answer them. Here's the process step by step:

Step 1: Gather Your Competitors' Product Pages

The first step in this process is to gather all of the product page URLs that you want to pull pricing information (and more) from. There are a number of ways to do this, and which one you use depends on what your competitor's website makes available.

There are three main ways to get these URLs:

  1. Pull them in from the sitemap of your competitor's website.
  2. Crawl the site with a tool like Screaming Frog SEO Spider or Deep Crawl.
  3. Scrape them from listing pages on their website (I'm not going to go into this because it's worth a whole post in itself).

The first method is by far the easiest. The only reason why you'd go with option two or three is if your competitor doesn't have a sitemap.

To find your competitor's sitemap, go to Google and type the following query, replacing COMPETITOR-DOMAIN with the domain name of your competitor's website (for example, amazon.com):

site:COMPETITOR-DOMAIN inurl:sitemap OR filetype:xml

Sometimes you'll get several results here for different sitemaps. This is because a lot of websites, especially large e-commerce sites, have multiple sitemaps (often grouped together under a sitemap index file). You'll just need to do a bit of manual digging to find the one that lists their product pages.

ASOS sitemap

You can also just add '/sitemap.xml' to the end of their domain name (for example, COMPETITOR-DOMAIN/sitemap.xml), and that will sometimes do the trick.
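Many sites also declare their sitemaps in their robots.txt file with a "Sitemap:" line, which is part of the sitemaps.org protocol. Here's a minimal Python sketch, assuming you've already downloaded the robots.txt text yourself (the example.com domain and file contents below are made up), that reads any declared sitemaps with the standard library's robot parser and adds the usual /sitemap.xml guess. Note that site_maps() needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def candidate_sitemaps(domain: str, robots_txt: str) -> list[str]:
    """Return sitemap URLs declared in robots.txt, plus the common /sitemap.xml guess."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines
    parser.parse(robots_txt.splitlines())
    # site_maps() returns the Sitemap: URLs, or None if there are none
    declared = parser.site_maps() or []
    guess = f"https://{domain}/sitemap.xml"
    # Declared sitemaps first, without listing the guess twice
    return declared + ([guess] if guess not in declared else [])


# Hypothetical robots.txt contents for a made-up competitor domain
robots = """User-agent: *
Disallow: /checkout/
Sitemap: https://www.example.com/sitemap-products.xml
"""
print(candidate_sitemaps("www.example.com", robots))
```

You'd then try each candidate URL in your browser, starting with the declared ones.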

Once you find their sitemap, copy the URL and paste it into this awesome URL extractor tool by Rob Hammond, which gives you a plain list of all the URLs in the sitemap without any of the other data. You can then copy and paste this list into a new Excel spreadsheet.

XML sitemap extractor
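If you'd rather not rely on a third-party tool, the extraction itself is simple: a sitemap is an XML file in which each URL sits inside a <loc> element in the sitemaps.org namespace. Here's a minimal Python sketch, assuming you've already downloaded the sitemap as text (the sample below is made up). It also works on a sitemap index, where the <loc> values are the URLs of the child sitemaps rather than pages.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value from a sitemap or sitemap index file."""
    root = ET.fromstring(sitemap_xml)
    # Regular sitemap: <urlset>/<url>/<loc> holds page URLs.
    # Sitemap index: <sitemapindex>/<sitemap>/<loc> holds child sitemap URLs.
    return [loc.text.strip() for loc in root.iterfind(".//sm:loc", SITEMAP_NS) if loc.text]


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/products/blue-widget</loc></url>
  <url><loc>https://www.example.com/about</loc></url>
</urlset>"""

for url in extract_urls(sample):
    print(url)
```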

If you can't find their sitemap then you'll want to crawl their website to pull through all of their URLs.

I'm not going to go into this process in detail because it's very straightforward once you download the software to do it. I'd recommend using Screaming Frog SEO Spider for this - there's a free version, too.

All you'll need to do is add your competitor's domain and it will pull in all the URLs from their website. I even put together a full tutorial on using Screaming Frog SEO Spider that you can check out.

Once you have the URLs...

Once you've pulled in all of the URLs from your competitor's website, you'll want to filter them down to just the product pages. Sometimes this is easy because they have something like /products/ in all of their product page URLs (a lot of Shopify stores have this).

If this is the case, just use a text filter in Excel to show only the URLs containing /products/.
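The same filter is a one-liner in Python if your URL list lives in a script rather than a spreadsheet. This sketch assumes the /products/ pattern (swap in whatever path your competitor actually uses), and the URLs are made up:

```python
def filter_product_urls(urls: list[str], marker: str = "/products/") -> list[str]:
    """Keep only URLs that contain the product marker (case-sensitive match)."""
    return [url for url in urls if marker in url]


all_urls = [
    "https://www.example.com/products/blue-widget",
    "https://www.example.com/collections/widgets",
    "https://www.example.com/products/red-widget",
    "https://www.example.com/pages/about",
]
print(filter_product_urls(all_urls))
```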

If there's no way to distinguish a product page from its URL, don't worry: you can just process all of the URLs that you have, and any that aren't product pages will simply return blank results.

Like this Hack?

Download my growth hacking ebook with 25 hacks to implement straight away.

Download for Free

Step 2: Identify the Elements to Scrape

The next step in the process is to identify the elements on the product pages that you want to extract (e.g. the price).

This is where a basic knowledge of HTML and CSS comes in really useful, but as I mentioned previously, I'll explain this in a way that even someone with no code knowledge will understand.

First of all, navigate to one of your competitor's product pages. To explain the process, I'm going to use a product page from Ebuyer.com as my example competitor:

Ebuyer.com product page

Here's the URL of the product page above so that you can follow along: http://www.ebuyer.com/712178-apple-macbook-pro-mjlt2b-a

For the competitive pricing analysis, I might want to pull in the following data points for each product:

  1. The product name.
  2. The price.
  3. The product category.
  4. Whether it's in stock.
  5. The product description.

It's worth noting that all of this information has to be present on your competitor's product page, in the page's HTML, for you to be able to extract it.

Let's start with the product name to show how you'd identify the HTML element that contains this information:

  1. Open the product page up in Google Chrome.
  2. Identify where on the page the product name is shown.
  3. Right-click on the product name.
  4. Select 'Inspect' from the menu.
  5. Look at the line of code highlighted in the Developer Tools box that's just popped up at the bottom of your browser.
  6. The HTML element that contains the product name is what we're looking for.

Here's a visual guide to this process:

finding a html element

In the example above, the HTML element is an h1 with the class 'product-title'.

Now that you have the HTML element, it's time to build the CSS selector or XPath to extract this data.

Step 3: Write the CSS Selector or XPath Query

This step is all about being able to communicate with software in order to tell it where to find the information you want within a webpage.

To do this, we use either a CSS selector or an XPath query. I'm not going to go into detail on what these are, because you don't really need to know that to follow along. However, if you want to do some more research on them, check out W3Schools.

This step builds directly on step 2, so go back to the product name element that you found using the six steps outlined above.

All you need to do is right-click on the line of code shown in the Developer Tools and then select Copy > Copy selector. Here's a visual walkthrough:

copying the CSS selector

This will copy the CSS selector to your clipboard. Open up a blank text editor and paste it in to keep track of it for now, making sure you write next to it which data it relates to (e.g. the product name).

In the example above, the CSS selector I'm given is:

#main-content > div > div:nth-child(1) > div.clearfix > div.product-main > div.product-header > div.product-info > h1

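This is what you'll add to URL Profiler in step 4 to tell it which element to extract.

One word of warning: a selector copied like this can be fragile. It describes the exact path to the element through the page's layout (the div:nth-child(1) part, for example, depends on the element's position among its siblings), so it can fail on product pages whose layout differs slightly, or as soon as your competitor tweaks their template. If you want something more reliable, you can write your own XPath query that targets the element by its own attributes instead.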

To do this, we just need to know a couple of things about the HTML element that's holding the data we want. The first is what kind of HTML element it is (for example, is it an h1, div, p, a, span, etc.?). You can find this out because it's the first word after the opening <.

In the case of the product name in my example, the full line of code is:

<h1 class="product-title" itemprop="name">Apple MacBook Pro</h1>

In this case, the HTML element will be h1.

The second piece of information that we need is some kind of unique attribute. Within this line of code there are two attributes, the 'class' and the 'itemprop'.

The unique identifier for the class attribute is 'product-title', whilst the unique identifier for the 'itemprop' attribute is 'name'.
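If you're copying lots of these lines out of Developer Tools, you can pull out the element and its attributes automatically. Here's a small Python sketch using the standard library's html.parser; the names FirstTagParser and describe_element are hypothetical helpers made up for this example.

```python
from html.parser import HTMLParser
from typing import Optional


class FirstTagParser(HTMLParser):
    """Records the name and attributes of the first opening tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.tag: Optional[str] = None
        self.attrs: dict[str, Optional[str]] = {}

    def handle_starttag(self, tag, attrs):
        # Only keep the first tag, i.e. the element that wraps the data
        if self.tag is None:
            self.tag = tag
            self.attrs = dict(attrs)


def describe_element(html_line: str) -> tuple[Optional[str], dict[str, Optional[str]]]:
    """Return (element, {attribute: unique identifier}) for one line of HTML."""
    parser = FirstTagParser()
    parser.feed(html_line)
    parser.close()
    return parser.tag, parser.attrs


element, attributes = describe_element(
    '<h1 class="product-title" itemprop="name">Apple MacBook Pro</h1>'
)
print(element)     # h1
print(attributes)  # {'class': 'product-title', 'itemprop': 'name'}
```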

This is all we need to identify this specific piece of data. Now we just need to turn this into an XPath query.

Here's the structure of an XPath query, with placeholder text in the places where we need to add our HTML element, attribute and unique identifier:

//element[@attribute="unique identifier"]

So using this syntax, the XPath for me to pull in the product name from the Ebuyer.com product page would be:

//h1[@class="product-title"]

Or if we use the itemprop as the attribute for identifying it (instead of the class):

//h1[@itemprop="name"]

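It's completely up to you which attribute you use, as long as its value only appears on the element you want. One thing to watch with classes: [@class="product-title"] only matches when that is the element's entire class value, so if an element has several classes (e.g. class="product-title large"), use contains() instead, as in //h1[contains(@class, "product-title")].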
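Because the pattern never changes, you can generate these queries rather than typing each one out. A tiny Python sketch; build_xpath is a hypothetical helper name, not part of any tool mentioned in this post, and it assumes the identifier contains no double quotes:

```python
def build_xpath(element: str, attribute: str, identifier: str) -> str:
    """Fill in the //element[@attribute="unique identifier"] pattern.

    Assumes the identifier contains no double quotes.
    """
    return f'//{element}[@{attribute}="{identifier}"]'


print(build_xpath("h1", "class", "product-title"))  # //h1[@class="product-title"]
print(build_xpath("h1", "itemprop", "name"))        # //h1[@itemprop="name"]
```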

You'll need to go through this process for each of the elements on the page that you want to extract. Just to give another example, here's the code for the product price on the Ebuyer.com product page:

<span itemprop="price">1888.97</span>

In this instance, the HTML element is 'span', the attribute is 'itemprop' and the unique identifier is 'price'. The XPath for this would be as follows:

//span[@itemprop="price"]
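Before loading your queries into any tool, it can help to check them against a snippet of the page. Python's built-in xml.etree.ElementTree understands a limited subset of XPath that covers this element[@attribute="value"] pattern, but it needs a leading . on the path and only parses well-formed markup; for real-world HTML you'd typically reach for a tolerant parser such as lxml instead. The snippet below is a simplified stand-in for the Ebuyer.com markup, not a copy of it:

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for part of a product page
snippet = """
<div class="product-info">
  <h1 class="product-title" itemprop="name">Apple MacBook Pro</h1>
  <p>Price: <span itemprop="price">1888.97</span></p>
</div>
"""

queries = {
    "Product name": '//h1[@class="product-title"]',
    "Price": '//span[@itemprop="price"]',
}

root = ET.fromstring(snippet)
for label, xpath in queries.items():
    # ElementTree rejects absolute paths on an element, so turn // into .//
    match = root.find("." + xpath)
    print(label, "->", match.text if match is not None else "(not found)")
```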

Hopefully you're starting to follow this now.

Step 4: Scrape the Data With URL Profiler

This is where it starts to get fun.

Open up URL Profiler and untick any boxes that may be preselected. To run a quick test first, add just one of your competitor's product page URLs to the box on the right. You can literally just copy the URL and paste it into the box.

add a URL to URL Profiler

Now you'll want to select the 'Custom Scraper' option.

Once you click this, a new box will open. This is where you're going to add in your CSS selectors or XPath queries. You can do up to 10 data points at a time; all you need to ensure is that you select the right data type for each one.

If you've gone down the route of writing your own XPath then you will select the XPath data type, as shown below:

Custom Scraper in URL Profiler

If you used CSS selectors instead, just be sure to select the data type as CSS.

Once you've added in the relevant CSS selector or XPath for each piece of data you want to extract, click 'Apply'. Now all you need to do is click the 'Run Profiler' button and URL Profiler will start doing its thing.

After a short while, you'll get a spreadsheet with a few bits of extra data about each URL, followed by all of your extracted values in the columns labelled 'Data 1', 'Data 2', 'Data 3' and so on.

The extracted data

The spreadsheet above shows the two pieces of data that I pulled for the Ebuyer.com product URL (the product name and product price).

All that's left for you to do is add all of the product page URLs into URL Profiler and run it exactly the same way. Instead of having the data for just one URL, you'll have it for all of them, and it usually only takes a few minutes to process, depending on how many URLs you have!

Here's what my spreadsheet looked like after processing a larger batch of URLs:

Competitive pricing analysis data

As you can see, I also pulled in data on whether the product was currently in stock and what category the product falls under within the website.

Now tell me this isn't pretty cool!
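URL Profiler does the fetching and parsing for you, but if it helps to see what the Custom Scraper is conceptually doing, here's a rough Python equivalent: run the same list of queries against every page and write one row per URL, with Data 1, Data 2, ... columns and blanks wherever nothing matched. It's a sketch under stated assumptions, not how URL Profiler works internally: the pages are assumed to be already downloaded, well-formed stand-ins (real pages would need an HTML parser such as lxml plus a polite fetching loop), and the URLs and values are made up.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Queries in the order they'd be entered into the scraper: Data 1, Data 2, ...
QUERIES = ['//h1[@itemprop="name"]', '//span[@itemprop="price"]']


def scrape_page(html: str) -> list[str]:
    """Apply each XPath query to one page, returning '' when nothing matches."""
    root = ET.fromstring(html)
    values = []
    for xpath in QUERIES:
        match = root.find("." + xpath)  # ElementTree needs a relative path
        values.append((match.text or "").strip() if match is not None else "")
    return values


def scrape_to_csv(pages: dict[str, str]) -> str:
    """Build a CSV with a URL column followed by Data 1..Data N."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["URL"] + [f"Data {i}" for i in range(1, len(QUERIES) + 1)])
    for url, html in pages.items():
        writer.writerow([url] + scrape_page(html))
    return out.getvalue()


pages = {
    "https://www.example.com/products/laptop": (
        '<div><h1 itemprop="name">Example Laptop</h1>'
        '<span itemprop="price">999.99</span></div>'
    ),
    # Not a product page, so both data columns come back blank
    "https://www.example.com/about": "<div><h1>About us</h1></div>",
}
print(scrape_to_csv(pages))
```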

Step 5: Organise the Data

The fifth and final step in this process is to organise all of the data that you've extracted.

You won't need all of the extra data that URL Profiler pulls in by default (e.g. TLD, HTTP Status, Encoding, etc.) so I'd just delete all of these, leaving only the product URL and the data that you've extracted.

Next, change the column titles (Data 1, Data 2, Data 3 ...) to something more descriptive; for example, 'Price'.

Finally, you'll want to label the spreadsheet with the name of your competitor and the date you extracted the data. You can then create a separate sheet that has all of your competitors' data housed in one place to do full comparative analysis.
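If you'd rather script this tidy-up than repeat it by hand each time, here's a minimal Python sketch using the csv module. It assumes the export has a URL column plus Data 1, Data 2, ... columns (check the exact headers in your own export), and the competitor name, date and new column names are placeholders:

```python
import csv
import io
from datetime import date

# Map the generic scraper columns to descriptive names (placeholder names)
RENAME = {"URL": "Product URL", "Data 1": "Product Name", "Data 2": "Price"}


def organise(export_csv: str, competitor: str, extracted_on: date) -> list[dict[str, str]]:
    """Keep only the columns in RENAME, rename them, and tag each row with competitor and date."""
    rows = []
    for raw in csv.DictReader(io.StringIO(export_csv)):
        # Columns not listed in RENAME (TLD, HTTP Status, ...) are dropped
        row = {new: raw.get(old, "") for old, new in RENAME.items()}
        row["Competitor"] = competitor
        row["Extracted"] = extracted_on.isoformat()
        rows.append(row)
    return rows


export = """URL,TLD,HTTP Status,Data 1,Data 2
https://www.example.com/products/laptop,com,200,Example Laptop,999.99
"""
for row in organise(export, "Example Store", date(2024, 1, 15)):
    print(row)
```

Because every row carries its competitor and extraction date, rows from several competitors can simply be concatenated into one list (or one sheet) for side-by-side comparison.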

To be honest, how you display all of this data is up to you, because what works best will vary by project.

As always, if you have any questions, feel free to leave them below and I'll do my best to answer them.
