• Got Sheet
  • Posts
  • Web Scraping with Google Sheets (post 3 of 3)

Web Scraping with Google Sheets (post 3 of 3)

Using IMPORTXML to Import Data to Your Google Sheet

This is post 3/3 in the Import Series.

Google Sheets has five built-in functions to import data from various sources into your sheet.

  1. IMPORTRANGE()

  2. IMPORTHTML()

  3. IMPORTDATA()

  4. IMPORTFEED()

  5. IMPORTXML(url, xpath_query)

Each of these pulls data from other sources. This could be another spreadsheet, an RSS feed, or a website. We’ll look at the final function in our list today.

IMPORTXML() is a powerful function that is capable of scraping darn near anything off of a web site and bringing it into your Google Sheet.

Video Walkthrough

IMPORTXML

This takes the URL for a website and something called an “xpath_query” to return various types of data from a website.

screenshot of importxml function

This is the most complicated of the two examples we’ve looked at, as well as the most powerful.

The xpath_query is like writing code. If we want to pull data from a website, we need to know some HTML in order to ask the query to pull over that specific data.

See the cheat sheet below for some examples.

screenshot of xpath cheat sheet

By inspecting a web site with developer tools, we can find information about it in HTML. For instance, on my freeCodeCamp author page here, all the post titles are H2 (Heading Level 2) text blocks.

screenshot of inspecting a webpage

By using the following function, I can quickly pull every title into my spreadsheet. I've copied the url into A11 so that the function is a bit more readable:

=IMPORTXML(A11,"//*[@class='post-card-title']")

This is a very basic example, but as you can see from the partial list above, your queries can get very specific and very complicated.

screenshot of imported data from importxml

Detective Work

Using IMPORTXML requires a bit of sleuth work, at least for me. Figuring out where the data lives on a webpage and then how it's referenced requires some tedious investigation.

And once you've found it on the page, you'll need to write the correct xpath_query to properly pull it into your spreadsheet.

I've got to say, though, it is by far the most powerful of the import functions, and is a whole lot of fun to use once you get the hang of it!

Properly nesting your query statements is important so that you can pull data that falls within certain classes and then within other HTML elements. Like in this example where I pull the href slug from the anchor element inside the 'post-card-title' class:

=IMPORTXML(A11,"//*[@class='post-card-title']//a/@href")

screenshot of another importxml function

I hope this has been a helpful look at how to import data into your Google Sheets. Be sure to check out Part 1 and Part 2 of the series where I cover the other four functions.

Thank you so much!

It means a lot that you’ve read this, and I hope it’s informed and/or entertained you for a few moments today!

Would love to say hi. Here are the best places to find me:

✉️ LinkedIn
📺️ YouTube

Ways I can help:

  • View my spreadsheet and creative products on my Gumroad store.

  • Need a powerful automation tool for Google Sheets? Try Coefficient to automatically import data and sync with your business systems.

  • Hire an expert to help complete your next Google Sheets, Apps Script or Google Workplace project.

Reply

or to participate.