- Got Sheet
- Posts
- Analyzing the Barkley Fall Classic 50K
Analyzing the Barkley Fall Classic 50K
A beginner friendly data analysis
Every September, just in time for my birthday, a race takes place in the woods of the Frozen Head State Park.
The Barkley Fall Classic is a 50K ultra that takes place on and off the trails within the park…and a little bit outside the park too.
It gives mortals like me a small taste of what the real Barkley Marathons (a 100+ mile, 80 hour, world-class nightmare) are all about.
A lot of people sign up for the Fall Classic, and a lot of us end up waiting many months before we get an invite to actually register.
I’ve had a 50K finish in 2021 as well as a marathon finish in 2022.
I just got in for this coming weekend’s race after being on the waitlist since the end of last September…
In the meantime, let’s crunch some data from the results on UltraSignup since the Fall Classic started back in 2014.
Resources
Here’s a video walkthrough of what we’ll touch on below:
Here’s the Google Sheet where you can follow along and/or make a copy to mess with yourself: https://docs.google.com/spreadsheets/d/1NmyFqEldiYcqaHr14RQqKbMMginHDLirvEdXtLDqhUE/edit#gid=1257498594
gif of barkley analysis charts
Warm Up Lap: Let’s Get Started
We need the data. I thought I’d be clever and use IMPORTHTML
or IMPORTXML
to grab all the results. However, this didn’t work because of the structure of UltraSignup’s site.
Side Note: Excel takes some points on this one because its Import Data from Web feature was able to import everything, and would have been the faster route to snag all the results.
At any rate, I’ve got them all now after copying and pasting into one big ol’ sheet.
Each race’s results are broken into two pages: the 50K results and the Marathon results. So I’ve included links to the page URLs I used to get all this.
In my sheet, I also made some additional columns for organization’s sake:
Full Name: concatenated the First and Last name columns.
Year: I added this since it wasn’t included
Race: I added this to specify between 50K, Marathon, DNF 50K, DNF Marathon and DNS…(yes, there are some fun, detailed rules around these differences and even a 30 Miler and DNF 30 Miler in the first year of the race!)
screenshot of race types
And that gives us a total of 3012 rows of race data from 2014 to 2022’s race.
Screenshot of race data
The Aid Stations: What Questions to Answer?
This project was mainly about grabbing, cleaning and displaying some data about the race, but we do need to define what exactly we’d like to find out.
Here’s what I came up with:
List of folks with the most finishes
List of fastest & slowest finishes
States represented with the most finishes
Number of unique starters
Ratio of finishes to starts
List of all the winners
screenshot of all the Barkley Fall Classic 50k winners 2015 - 2022
The Climbs: Let’s Use SQL in Sheets
To sift through our data, now that it’s in a neat spreadsheet, we could use a variety of tools. We could throw down some pivot tables, but instead, I’d like to use the QUERY() function to practice a little SQL.
Query lets us use several of the prominent SQL commands right in our spreadsheet. Mostly, I’ve used the following to wrangle our data and get the answers we’re seeking:
SELECT
WHERE
ORDER BY
GROUP BY
COUNT
LIMIT
As an example, here’s the QUERY statement to retrieve all the winners (Male and Female) from each year:
=QUERY(race_data,"SELECT Col1, Col2, Col4, Col7, Col11, Col12, Col13, Col14 WHERE Col12 = 1 AND Col2='"&N2&"'")
The QUERY takes the race_data
named range from our 🔥data sheet which simply includes all 3012 rows of data.
Then, we write the actual query command enclosed in quotation marks. SELECT tells it which columns we want to display.
You can see from our 🔥data sheet that I numbered the columns for easier reference.
screenshot of columns of runner data
Then we have two conditions defined by the WHERE keyword. Col12 is the DP (Division Place) column. This returns both the top male and top female. AND Col2 is the Race column which we match to the range of N2… which is where I’ve put a dropdown selector to toggle between the 50K or the marathon:
screenshot of dropdown selector
For a couple of the questions, I also used COUNT, GROUP BY, and ORDER BY.
For example, this is how I retrieved the top 10 States and the number of finishes represented by that state. Column 9 was the state column, and COUNT(Col9) counts up all those finishes.
=QUERY(race_data,"SELECT Col9,COUNT(Col9) WHERE Col2='"&N2&"' GROUP BY Col9 Order By COUNT(Col9) desc LIMIT 10", TRUE)
Then I GROUP BY state and ORDER BY the count of the state finishes. Typing desc makes it in descending order (largest to smallest), and LIMIT 10 simply returns only the first 10 results.
screenshot of data and chart
Decision Point: Let’s Make it Pretty
For each analysis, I inserted a chart by selecting Insert → Chart from the menu. Google Sheets gives you a fair amount of options for customizing, and you’ll typically have to adjust things a bit when playing with the chart editor options.
You can double click on each chart to see the options I’ve set in this spreadsheet.
screenshot of chart editor options
Thank you so much!
It means a lot that you’ve read this, and I hope it’s informed and/or entertained you for a few moments today!
Would love to say hi. Here are the best places to find me:
Ways I can help:
View my spreadsheet and creative products on my Gumroad store.
Need a powerful automation tool for Google Sheets? Try Coefficient to automatically import data and sync with your business systems.
Hire an expert to help complete your next Google Sheets, Apps Script or Google Workplace project
Reply