A "parabolic" pacing strategy seems to be optimal for middle distance events (from Tucker et al. (2006) c/o Sweat Science)
So I got to thinking - is there an optimum way to run a 100 mile event?
Now, I think that this is a very difficult question to answer. Unlike track events, the terrain and conditions for a 100 mile event are so variable that judging pacing becomes a complex interplay between many different factors throughout the race: weather conditions, ground conditions, elevation change, fueling, hydration, navigation, gear choice - there is just much more to consider, and much more that can go wrong. Even ignoring variable effects like weather, the elevation profile and terrain are so different between any two races that coming up with a simple strategy that would apply to all races is likely impossible. The optimal pacing strategy for UTMB (31,496 ft of ascent) is unlikely to be the same as that for the Thames Path 100 (2,100 ft of ascent), for instance.
However, if we focus on a single route which is (largely) unchanging, we can get a good feeling for the optimal strategy for this particular race (if such a thing exists). Given that the Western States Endurance Run, one of the best known 100 mile events in the world, is less than a week away, I thought that this would be the perfect opportunity to have a play and see if I can work out the scientifically optimal strategy for winning this amazing event. This is real science, people, and when has science ever been wrong?
Now I don't claim that if you follow this plan you will win this year's Western States. You might trip over or something, and I'm not responsible for that. However, it would be interesting to know if these results are in any way applicable to racing. So if you do apply these results to your racing decisions, I would be really interested to hear how it went (I eagerly await Kilian Jornet's email, thanking me personally for helping him to beat Geoff Roes' 2010 record).
Anyway, the Western States Endurance Run was established in 1977, so there are 34 years' worth of data available should anybody wish to look at them. Being the incredibly cool guy that I am, I have taken it upon myself to mine these data in my spare time and have a play with them to see if I can spot any trends which might be applicable to racing. Sometimes I feel like I should have a long hard look at my life...
First a bit of boring stuff (okay, more boring than usual). This is (briefly) what I have done. If you're not interested, just skip forward to the results below, or even go straight to the conclusions which are a bit more concise.
Methods:
First of all, I downloaded the data from 1986 to 2011 from the Western States website. Why only from 1986? Because before that only the finishing times are available, whereas I am interested in the aid station splits to get a feeling for the runners' changing pace throughout the course. I then went through the tedious process of manually curating these data so that they were all in the same format. Next, I estimated any missing checkpoint time simply by taking the average of the preceding and following checkpoint times. This is obviously not ideal, but is better than having no data at these points. I then worked out the speed between each pair of checkpoints, giving an estimate of the runners' speeds at several set markers along the 100 mile route. In some years the race route was changed slightly, so I had to find a way to combine these data so that they could be analysed as a whole. I used a process known as loess (local regression) to fit a smooth line of best fit to these values, and used this model to get an estimate of each runner's speed at every mile. Obviously these values are purely estimates between the checkpoints and do not take into account the elevation changes in the intermediate sections, but they give a rough idea of how fast you should be covering the ground between aid stations. With every runner described at the same mile markers, we can now combine all of the data together into a single data set, even for years where the route was changed.
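For anybody who wants to try something similar, here is a minimal sketch in Python of the three steps described above (filling gaps, working out speeds, smoothing to one value per mile). It is not the code behind the figures in this post. The checkpoint miles and times are made up, placing each speed at the midpoint of its leg is my assumption, and the `loess` function is a bare-bones local linear fit with tricube weights rather than a full loess implementation.

```python
def fill_missing(hours):
    """Fill a missing (None) checkpoint time with the mean of its two neighbours."""
    filled = list(hours)
    for i in range(1, len(filled) - 1):
        before, after = filled[i - 1], filled[i + 1]
        if filled[i] is None and before is not None and after is not None:
            filled[i] = (before + after) / 2
    return filled


def segment_speeds(miles, hours):
    """Average speed (mph) on each leg, placed at the leg's midpoint (an assumption)."""
    points = []
    for k in range(1, len(miles)):
        distance = miles[k] - miles[k - 1]
        elapsed = hours[k] - hours[k - 1]
        points.append(((miles[k] + miles[k - 1]) / 2, distance / elapsed))
    return points


def loess(xs, ys, x0, span=0.75):
    """Local linear fit at x0, tricube-weighted over roughly the nearest span*n points."""
    k = max(2, round(span * len(xs)))
    # Bandwidth: just beyond the k-th nearest point, so that point keeps some weight
    h = 1.01 * sorted(abs(x - x0) for x in xs)[k - 1] + 1e-12
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


# Hypothetical runner: cumulative hours at made-up checkpoint miles, one split missing
miles = [0, 10, 20, 30, 40, 50]
hours = [0.0, 2.0, None, 7.0, 9.5, 12.5]

points = segment_speeds(miles, fill_missing(hours))
xs = [mile for mile, _ in points]
ys = [speed for _, speed in points]

# One smoothed speed estimate per mile, usable even if checkpoints move between years
per_mile = [loess(xs, ys, mile) for mile in range(5, 46)]
```

The `span` value controls how many neighbouring checkpoints influence each estimate. It is a tuning choice, and the value used here is only a placeholder.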
For the analysis, I took the top 50 runners from each year (chosen to represent a subgroup of runners likely to actually be racing for a position, rather than running to finish) and used a method known as k-means clustering (based on the Pearson correlation between profiles) to split the race profiles into 25 groups. Essentially, what this method does is to partition the runners such that the profiles within a group are more similar to one another than they are to the profiles in other groups. So each group has a general trend that represents a distinct pacing strategy. I then combined any similar looking groups together, to reduce the total number of profiles to a more manageable number, by looking at the correlation of the average profiles of the groups and merging any groups with a correlation greater than 0.85 (loosely speaking, 85 % similar to one another). This method of over-splitting then recombining tended to result in the most intuitive separation of classes when looking by eye (rather than, say, just using 9 groups with the k-means clustering to begin with). Then we can look to see if the finishing times of runners with a particular strategy are significantly faster than those of other runners. Simple!
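The over-split-then-merge idea can be sketched roughly as below. Again, this is an illustration rather than the code I actually ran. It relies on the fact that, once every profile is z-scored, squared Euclidean distance is proportional to one minus the Pearson correlation, so a plain k-means on z-scored profiles behaves like a correlation-based one. The function names, the toy profiles and the simple pairwise merging loop are all my own assumptions, and the sketch assumes that no profile is completely flat.

```python
import random
from statistics import mean, pstdev


def zscore(profile):
    """Centre and scale a profile so that only its shape matters (must not be flat)."""
    m, s = mean(profile), pstdev(profile)
    return [(v - m) / s for v in profile]


def pearson(a, b):
    """Pearson correlation between two equal-length profiles."""
    return sum(x * y for x, y in zip(zscore(a), zscore(b))) / len(a)


def kmeans(profiles, k, iters=50, seed=0):
    """Lloyd's algorithm on z-scored profiles; returns one cluster label per profile."""
    data = [zscore(p) for p in profiles]
    centres = random.Random(seed).sample(data, k)
    labels = [0] * len(data)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centres[c])))
            for p in data
        ]
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties out
                centres[c] = [mean(col) for col in zip(*members)]
    return labels


def merge_similar(profiles, labels, threshold=0.85):
    """Repeatedly merge clusters whose mean profiles correlate above the threshold."""
    groups = {}
    for p, lab in zip(profiles, labels):
        groups.setdefault(lab, []).append(p)
    merged = True
    while merged:
        merged = False
        keys = sorted(groups)
        means = {g: [mean(col) for col in zip(*groups[g])] for g in keys}
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                if pearson(means[a], means[b]) > threshold:
                    groups[a].extend(groups.pop(b))
                    merged = True
                    break
            if merged:
                break
    return list(groups.values())


# Toy speed profiles (mph): two runners who fade, two who speed up
profiles = [[6, 5, 4, 3], [12, 10, 8, 6.5], [3, 4, 5, 6], [4, 4.5, 5.5, 6]]
labels = kmeans(profiles, k=3)  # deliberately over-split
strategies = merge_similar(profiles, labels)
```

In the real analysis the over-split used 25 groups rather than 3, and the merged groups would then be ranked by the average finishing time of their members.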
Unfortunately, as these data were originally entered by hand, there are some mistakes in there. I have caught a few whilst curating the data, but there are likely many more errors that I have not been able to pick up. Also, I decided to remove the data from some of the races due to extreme conditions (based on the history of the race on the Western States website). These were 1991 (a very cold year), 1993 (a very hot year but with lots of snow in the mountains), 1995 (a snow year), 2006 (very, very hot) and 2008 (cancelled due to wild fires). Other than those, I have included all data from 1986 to the present day.
There are various things that I could do to improve these analyses, but this should at least give us a taste of things. Firstly it might be interesting to account for confounding variables such as temperature which are obviously highly correlated with the finishing times in a particular year. Also, I have been playing around with corrections for elevation changes between aid stations to take the changing terrain into account, and normalisation methods to make data comparable between individuals. But I decided to keep things simple as it remains easier to interpret the data and apply it to the real life situation of running this specific course. Also, using correlation to compare the profiles rather than something like the Euclidean distance means that the actual values themselves are not so important anyway. But hopefully you'll forgive me these small inadequacies!
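To illustrate that last point about correlation versus Euclidean distance, here is a tiny sketch with two made-up profiles that have exactly the same shape, one of them 2 mph slower throughout. The numbers are invented purely for illustration.

```python
from math import dist, sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length profiles."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


fast = [8.0, 7.0, 6.5, 6.0]      # hypothetical speeds (mph) at four markers
slow = [v - 2.0 for v in fast]   # identical shape, 2 mph slower at every marker

shape_similarity = pearson(fast, slow)  # sees the two profiles as the same strategy
euclidean_gap = dist(fast, slow)        # sees them as clearly different runners
```

The correlation treats the two runners as following the same strategy, while the Euclidean distance separates them simply because one is quicker, which is why clustering on correlation groups runners by the shape of their race rather than by how fast they are.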
Okay, that's that out of the way. Are you back with us?
Good.
Results:
There is a very good paper by Martin Hoffman et al. (2011) called "Factors related to successful completion of a 161-km ultramarathon" which takes a statistical look at the factors that can relate to the successful completion (or not) of a 100 mile run. This looks at runners from the 2009 Western States and Vermont 100 mile endurance runs. Here are just a few figures to show how participation in the Western States has changed over the years.
This first figure shows the number of participants taking part in the race each year from 1986, and of those how many actually finished the race (blue) vs. how many DNFd (red). Female participants are shown in the top panel and male participants are shown in the bottom panel. The number of participants has not really changed significantly over the years, with the major change seemingly that the proportion of females to males has increased over the years. This is good to see, but it does seem as if women are very under-represented in this race (probably due to a smaller number of female ultra runners in the world in general). The fact that the overall number of participants has not changed over the years is an unfortunate consequence of restrictions put in place by the US Forest Service on the number of runners allowed to enter, with an average of only 369 runners allowed each year over each 5 year period. This is why entry is so difficult, since there are typically many more runners entered into the race for every place available (for 2011, the chances of being selected were 1 in 10). I am a little surprised by the fact that there is no clear trend in DNFs over the years. I naively would have assumed that there would be fewer DNFs in later years compared to earlier years as technology made it easier for people to complete the race, but I hadn't taken into account how damn hardy ultra runners are (not everybody needs a Garmin to run after all). There is perhaps a slight decrease over time, but we can see a clear spike in 2006 which was an incredibly hot year (and the year of Brian Morrison's infamous disqualification mere yards from the finish).
Next up is a figure showing the distribution of finishing times of runners over the years (female finishers shown in red, males shown in blue). Again, we see no real trend to indicate that finishing times have changed significantly in the last 25 years. There is perhaps a slight increase over the past 10 years in male finishing times, although this could be my eyes playing tricks on me. But even looking at the winning times we do not see much of a trend - Geoff Roes' 2010 15:07:04 record is not that much faster than Tom Johnson's 1991 15:54:05 record. 1994 was a hell of a year for Ann Trason, who came in second overall in 17:37:54, a course record which still stands to this day.
Next up is a figure showing the age of runners who have taken part in the Western States Endurance Run over the years. I would say that there used to be a prevalence of runners aged 30-50 in the beginning, but these days there is more of an even spread, with young blood like Kilian taking part.
So now we move onto the real question that I was interested in. Can we find a pacing strategy for optimum performance at Western States? Let's have a look.
Male Finishers:
Hopefully this figure isn't too off-putting. What we can see here is the individual pacing profiles of the top 50 male finishers from all races arranged into 9 distinct groups. Hopefully you can see that these groups represent individuals who have adopted similar strategies. They have been arranged from top-left to bottom-right based on the average finishing time of all individuals within the group. The colour of the lines is meaningless - it just allows us to distinguish all of the individuals.
So what can we take from this? Well first of all let's take a look at the slowest pacing strategy (Strategy 9). To me, this represents a group of runners who went out too fast at the start of the race, and suffered for it later. So there's a good tip for anybody racing next week - don't go out too fast! But that's hardly new information...
With Strategy 7, we see that runners have had a relatively good start, but have gradually dropped their pace over the course of the run. I would say that this is what the profile of an average runner would look like over 100 miles, resulting in an obvious positive split (a slower second half), particularly if this was the first time running this distance and you didn't know what to expect. Strategy 5 is very similar, but with slightly less of a drop in the final sections of the race, and Strategy 6 is again similar but with runners throwing in a kick over the last 5 miles. This could represent runners keeping something back for the end. So this seems to suggest that saving some energy for the final sections will give you a better finishing time, whether this be by not slowing down or by kicking it up towards the end.
Strategies 3 and 4 seem a bit odd, and seem to represent runners who absolutely caned miles 20 to 40, and then ran a relatively steady race towards the finish. It looks as if many of these individuals are from the 2002 and 2004 races, where there was a diversion between Red Star Ridge and Duncan Canyon due to the Star Fire of 2001 (which hit the High Sierra for over three weeks), so this is likely just an artefact of my analyses. This explains why the average finishing time is lower for these individuals, as lots of people posted PBs in these years due to the course purportedly being "easier". So another tip is to run in the 2002 or 2004 editions of the race. Aren't you glad you're reading this?
Which brings us to the top two race profiles. I believe that Strategy 2 represents people who run to the course profile, running faster from miles 25 to 40 (downhill from Robinson Flat to Last Chance), then running slower uphill over Michigan Bluff before coming in for a relatively even finish after the river crossing (this is based on my interpretation of the course profile, not on actual experience, so I would be interested to hear from anybody who has run these sections whether this sounds plausible). Strategy 1 is similar, but with less severe changes in pacing through the hills. This less severe change, indicating more even pacing throughout the course, seems to improve finishing times. This may be because these runners are running to the hills without trashing themselves. To me, this suggests that running the course with an even pace but adjusting for the hills (that is, running with an even perceived intensity) is the most economical way to go.
Female Finishers:
Now let's look at the female finishers to see if we see the same general trends.
I think that we can probably ignore Strategy 1 and Strategy 7 and just concentrate on the others. The general trends seem to be very similar to those seen with the male finishers, so I won't go into quite so much detail here to avoid repeating myself too much. Again, we see the runners who go out too fast then drop off in pace in Strategy 5 showing a slower average finish. We see runners who slow down over the course of the run but kick in towards the end in Strategy 6. We see runners running to the course profile in Strategies 2 and 3, and again see that if the changes in speed from the terrain are kept to a minimum then the effect on the finishing time is reduced.
Conclusions:
So what (if anything) has this little exercise shown us? Have we managed to stumble upon the optimum way to run the Western States? Well no, not really. But what we have shown is evidence for the importance of adhering to a few key tips often given to people looking to run in 100 mile races. Perhaps when you race you use these tips already. Perhaps you don't. I'm not saying that you should, but perhaps the experience of others before you can help you to plan your optimum strategy. So here are the things that I think that we can learn from these data:
- Don't go out too fast. This is probably rule numero uno for ultra running. There's a long way to go, and however fresh you're feeling at the start of the race, you sure as hell won't be feeling that fresh later in the race. Keep some reserves for later in the race. There's a great cartoon from UltraRunningGuy (also seen in Bryon Powell's excellent book Relentless Forward Progress) about this. Take a look (unless you're Dave Mackey of course).
- Keeping your pace up later in the race is preferable to holding back for a final kick towards the end of the race. This is something that can only come with experience, as you need to run slow enough that you can keep the pace up in the latter stages of the course, but you should be running hard enough that by the end you have nothing left to give. Unfortunately, that's not a terribly useful piece of advice really, but the point is to not run hard and gradually drop off over time.
- You should try to run as evenly across the entire course as possible. This does not necessarily mean an even pace, but instead an even level of intensity. Through experience, you should learn what level of intensity you can keep up over the entire course, then run to that. If the terrain gets harder slow down, and if the terrain gets easier speed up, but make sure it feels the same throughout. I remember a good quote from someone (it may have been Ian Sharman): "it's not who runs the fastest, it's who slows down the least".
- Don't go too mad on the hills. To me it makes sense to take full advantage of the downhills when you can, but these data seem to suggest otherwise. You want to be keeping the changes in your speed to a minimum. I guess the problem with running the downhills hard is that there is the danger of trashing your legs which can impact the uphills later. On the other hand, if you actively fight against the downhills, you can trash your quads. Again, somewhere in there is a happy medium that allows your engine to just keep ticking over at maximum efficiency without the wheels coming off.
Whether or not there even is an optimum way to run 100 miles is a very difficult question. These results suggest that on average an even pace will be faster than going flat out and fading, but there are always exceptions. Stuart Mills, who favours the "run as fast as you can for as long as you can" approach, has a great blog post looking at the difference in pacing strategies for 4 world record 100 mile times from Cavin Woodward (track; 1975), Don Ritchie (track; 1977), Oleg Kharitonov (track; 2002) and Yiannis Kouros (road; 1984). Kharitonov ran an almost perfect even split, Ritchie and Kouros ran fairly even splits but faded slightly in the latter half, whilst Woodward favoured the balls out approach. All 3 strategies resulted in a world record. At the end of the day, we are all very different, and what works for one runner may not work for another. There may be evidence here that on average an even pace is the way to run a faster race, but as Stuart shows, different approaches can often work just as well.
Intuitively I would say that there is definitely another benefit to going out strong, and this is an approach that I often find myself taking. Starting strong allows you to break off from the pack and avoid any kind of congestion on the trail. Finishing strong is also a good way to ensure that you use up everything that you have by the end (and a good sprint finish feels great!). But I think that what these results show is that if you are going to do this it should be kept to a minimum - there is starting strong initially to break away, and then there is sprinting off ahead of everybody else for 20 miles and blowing up. So perhaps the parabolic approach does fit into optimising a 100 mile race, but the resolution of these data is too low to capture that.
It will be interesting to see if I can apply any of these approaches and tame my usual tendency to start off too fast in light of these results at the South Downs Way 100 in a couple of weeks. And if anybody happens to read this who runs (or has run) Western States, it would be great to get some feedback on what I have seen and whether or not it holds water.
Well good luck to everybody running Western States - I'll be following along to see how it all goes. But as far as that elusive course record goes? I'm afraid that's down to you guys!
Happy trails!
Edit: Added the post from Stuart Mills as a counter argument.
one of the best blog articles I've read. Cheers Sam. A lot of effort went into that
I also really enjoyed following along with your logic. I have to admit I didn't take the time to study your graphs and took your word on most counts as I continued on, patiently waiting for you to sum it all up. I was pleasantly surprised to see that you mentioned one of my comics to help illustrate a point. I have trouble with the going out too fast thing also (hence the comic). However, it is nice to see that going out a little fast is not so bad, especially for a 5km, which is what I'll be running this Saturday around the same time they should be starting the Western States 100.
Thanks for the insightful read,
EJ
Thanks! Yeah, sorry EJ, it is a little on the long side... Hopefully there was something interesting in there for people to take away from it! Good luck with the 5km! I ran a 5km race last night and pretty sure I ran a "parabolic" race. It worked out quite nicely!