• Got Sheet
  • Posts
  • Analyzing the Barkley Fall Classic 50K

Analyzing the Barkley Fall Classic 50K

A beginner friendly data analysis

Every September, just in time for my birthday, a race takes place in the woods of the Frozen Head State Park.

The Barkley Fall Classic is a 50K ultra that takes place on and off the trails within the park…and a little bit outside the park too.

It gives mortals like me a small taste of what the real Barkley Marathons (a 100+ mile, 80 hour, world-class nightmare) are all about.

A lot of people sign up for the Fall Classic, and a lot of us end up waiting many months before we get an invite to actually register.

I’ve had a 50K finish in 2021 as well as a marathon finish in 2022.

I just got in for this coming weekend’s race after being on the waitlist since the end of last September…

In the meantime, let’s crunch some data from the results on UltraSignup since the Fall Classic started back in 2014.

Resources

Here’s a video walkthrough of what we’ll touch on below:

Here’s the Google Sheet where you can follow along and/or make a copy to mess with yourself: https://docs.google.com/spreadsheets/d/1NmyFqEldiYcqaHr14RQqKbMMginHDLirvEdXtLDqhUE/edit#gid=1257498594

gif of barkley analysis charts

Warm Up Lap: Let’s Get Started

We need the data. I thought I’d be clever and use IMPORTHTML or IMPORTXML to grab all the results. However, this didn’t work because of the structure of UltraSignup’s site.

Side Note: Excel takes some points on this one because its Import Data from Web feature was able to import everything, and would have been the faster route to snag all the results.

At any rate, I’ve got them all now after copying and pasting into one big ol’ sheet.

Each race’s results are broken into two pages: the 50K results and the Marathon results. So I’ve included links to the page URLs I used to get all this.

In my sheet, I also made some additional columns for organization’s sake:

  1. Full Name: concatenated the First and Last name columns.

  2. Year: I added this since it wasn’t included

  3. Race: I added this to specify between 50K, Marathon, DNF 50K, DNF Marathon and DNS…(yes, there are some fun, detailed rules around these differences and even a 30 Miler and DNF 30 Miler in the first year of the race!)

screenshot of race types

And that gives us a total of 3012 rows of race data from 2014 to 2022’s race.

Screenshot of race data

The Aid Stations: What Questions to Answer?

This project was mainly about grabbing, cleaning and displaying some data about the race, but we do need to define what exactly we’d like to find out.

Here’s what I came up with:

  1. List of folks with the most finishes

  2. List of fastest & slowest finishes

  3. States represented with the most finishes

  4. Number of unique starters

  5. Ratio of finishes to starts

  6. List of all the winners

screenshot of all the Barkley Fall Classic 50k winners 2015 - 2022

The Climbs: Let’s Use SQL in Sheets

To sift through our data, now that it’s in a neat spreadsheet, we could use a variety of tools. We could throw down some pivot tables, but instead, I’d like to use the QUERY() function to practice a little SQL.

Query lets us use several of the prominent SQL commands right in our spreadsheet. Mostly, I’ve used the following to wrangle our data and get the answers we’re seeking:

  • SELECT

  • WHERE

  • ORDER BY

  • GROUP BY

  • COUNT

  • LIMIT

As an example, here’s the QUERY statement to retrieve all the winners (Male and Female) from each year:

=QUERY(race_data,"SELECT Col1, Col2, Col4, Col7, Col11, Col12, Col13, Col14 WHERE Col12 = 1 AND Col2='"&N2&"'")

The QUERY takes the race_data named range from our 🔥data sheet which simply includes all 3012 rows of data.

Then, we write the actual query command enclosed in quotation marks. SELECT tells it which columns we want to display.

You can see from our 🔥data sheet that I numbered the columns for easier reference.

screenshot of columns of runner data

Then we have two conditions defined by the WHERE keyword. Col12 is the DP (Division Place) column. This returns both the top male and top female. AND Col2 is the Race column which we match to the range of N2… which is where I’ve put a dropdown selector to toggle between the 50K or the marathon:

screenshot of dropdown selector

For a couple of the questions, I also used COUNT, GROUP BY, and ORDER BY.

For example, this is how I retrieved the top 10 States and the number of finishes represented by that state. Column 9 was the state column, and COUNT(Col9) counts up all those finishes.

=QUERY(race_data,"SELECT Col9,COUNT(Col9) WHERE Col2='"&N2&"' GROUP BY Col9 Order By COUNT(Col9) desc LIMIT 10", TRUE)

Then I GROUP BY state and ORDER BY the count of the state finishes. Typing desc makes it in descending order (largest to smallest), and LIMIT 10 simply returns only the first 10 results.

screenshot of data and chart

Decision Point: Let’s Make it Pretty

For each analysis, I inserted a chart by selecting Insert → Chart from the menu. Google Sheets gives you a fair amount of options for customizing, and you’ll typically have to adjust things a bit when playing with the chart editor options.

You can double click on each chart to see the options I’ve set in this spreadsheet.

screenshot of chart editor options

Thank you so much!

It means a lot that you’ve read this, and I hope it’s informed and/or entertained you for a few moments today!

Would love to say hi. Here are the best places to find me:

✉️ LinkedIn
📺️ YouTube

Ways I can help:

  • View my spreadsheet and creative products on my Gumroad store.

  • Need a powerful automation tool for Google Sheets? Try Coefficient to automatically import data and sync with your business systems.

  • Hire an expert to help complete your next Google Sheets, Apps Script or Google Workplace project

Reply

or to participate.