Friday, March 29, 2013

You can't touch this, Part II

In my previous post, I reviewed the concept of Average and Normalized Power, more as an introduction to some further thoughts on the topic of NP Busters. I also said this would be a two-part discussion, with Part II covering NP Busters. Well, I am getting to that, but it will actually take three parts, so here the discussion of Normalized Power continues as another prequel to an NP Buster chat. I will at least introduce what is meant by an NP Buster.

Previously I demonstrated, using a proposed interval session as an example, how average power can be a misleading indicator of metabolic strain, especially when power output is highly variable, and that Normalized Power is a better means of measuring metabolic strain. Well, we don't need to make up theoretical examples; we can turn to real data.

Criteriums versus Time Trials

Let's consider the Normalized and Average power from hard rides of different types but of similar durations. An obvious example would be to compare a time trial with a criterium race.

A TT is typically ridden solo and involves sustaining a high power in a relatively steady-state manner, with perhaps some variability if the terrain is not flat or the course has some technical elements. A criterium, by contrast, involves substantially variable power output as one deals with or dishes out attacks and surges, brakes and/or coasts into turns and accelerates out of them, inevitably drives the pace in (or to establish) a break, and sits in the slipstream of others to recover. As rides, they are poles apart.

The following chart shows a comparison of power output over time for a time trial and a criterium race by the same rider, performed within about five weeks of each other and both on relatively flat courses. There are two plots for each race: the lines that jump up and down are the second-by-second power data, and the two straight horizontal lines are the average power for each race. The time trial (blue) is a little shorter in duration than the criterium (red).

The instantaneous power output is a little hard to follow since it jumps up and down so much, but even so, it's clear that the criterium power line (red) is far more variable than the time trial power line (blue). This is pretty typical. So while both races were hard efforts by the same rider over reasonably similar durations, there was a substantial 40-watt difference in average power.

On closer inspection, we can see a period in the crit from around the 33-minute mark where power dropped substantially. The rider had a puncture and "took a lap out" to replace a wheel and rejoin the race (annoyingly, as they had established a breakaway prior to that). We would expect this lower-power period to account for some of the lower overall average power; even so, the average power up to that point was 272 W, still 25 W less than the average power in the time trial.

But let's not forget that time spent not pedalling affects what you can do when you are pedalling, so that mini-break no doubt meant a little freshening up before rejoining the race, and an ability to go a little harder than would have been possible with no recovery.

A good way to gain some insight is to view the power trace after applying a filter to the data. One simple filter is a rolling 30-second average (i.e. each point on the chart represents the average power for the preceding 30 seconds). Here's the same plot showing the rolling 30-second average power:

The vertical scale now covers half the range it did before, which amplifies the visible variations. The 30-second rolling average makes it easy to spot differences in the power sustained during sections of a ride. In this example we can readily identify periods of sustained harder and easier effort during the criterium. The time trial also shows two brief drops in power output, which correspond to a steep descent on the course where speeds were too fast for continued pedalling.
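A rolling 30-second average like the one plotted can be sketched in a few lines of Python. This is a minimal sketch, assuming 1 Hz power samples with no gaps; the function name `rolling_average` is hypothetical, and it only starts producing values once a full 30-second window is available (head units and software may treat the first 29 seconds differently).

```python
from collections import deque


def rolling_average(power, window=30):
    """Trailing rolling average of 1 Hz power samples (watts).

    Each output value is the mean of the current sample and the preceding
    window - 1 samples. Output starts once a full window is available, so
    the result has len(power) - window + 1 values (none if the ride is
    shorter than the window).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque()
    running = 0.0
    out = []
    for p in power:
        buf.append(p)
        running += p
        if len(buf) > window:
            running -= buf.popleft()  # drop the sample that left the window
        if len(buf) == window:
            out.append(running / window)
    return out


# Usage: 30 s at 100 W followed by 30 s at 400 W
smoothed = rolling_average([100] * 30 + [400] * 30)
print(len(smoothed), smoothed[0], smoothed[-1])  # 31 100.0 400.0
```

The running sum keeps each step constant-time rather than re-adding 30 values for every second of a long ride.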

A 30-second rolling average power filter is of particular interest because metabolic responses to changes in effort really start to kick in around that time frame; many have what we call a "half-life" of around 30-60 seconds. Very brief forays (a handful of seconds) at higher power are not all that metabolically stressful, but sustain the higher power for longer (>20-30 seconds) and it gets ugly, fast. How fast depends on how hard you go.

Hence it's no coincidence that the algorithm used to calculate Normalized Power is based (partly) on a rolling 30-second average power filter. There are a couple more important elements to the NP formula (although it's not a very complicated formula), but it starts with this 30-second rolling average.
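For reference, the commonly published version of the NP calculation takes the 30-second rolling average, raises each value to the 4th power, averages those, then takes the 4th root. A minimal Python sketch, assuming clean 1 Hz data with no gaps (the function names are hypothetical, and real software may handle the start of the ride, gaps and time stamps differently):

```python
def normalized_power(power, window=30):
    """Normalized Power from 1 Hz power samples (watts).

    Commonly published steps:
      1. trailing 30-second rolling average of the power data
      2. raise each rolling value to the 4th power
      3. take the mean of those values
      4. take the 4th root of that mean
    """
    if len(power) < window:
        raise ValueError("need at least one full rolling window of data")
    rolling = []
    running = 0.0
    for i, p in enumerate(power):
        running += p
        if i >= window:
            running -= power[i - window]  # sample leaving the window
        if i >= window - 1:
            rolling.append(running / window)
    mean_fourth = sum(r ** 4 for r in rolling) / len(rolling)
    return mean_fourth ** 0.25


def average_power(power):
    return sum(power) / len(power)


# Usage: a steady hour versus an on/off hour with the same average power
steady = [250] * 3600
surging = ([400] * 60 + [100] * 60) * 30
print(average_power(steady), round(normalized_power(steady)))  # 250.0 250
print(average_power(surging), round(normalized_power(surging)))  # 250.0, then an NP well above 250
```

The 4th-power step is what makes sustained high-power sections count for much more than their share of the time.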

So what was the Normalized Power for these two races? Here they are, plotted on the chart as the two horizontal lines:

In effect, the Normalized Power from each race was the same (OK, one watt different). So even though the races were very different in style, they were both hard and produced a Normalized Power that was more representative of the metabolic strain experienced.

OK, so that's pretty nifty, and it's why Normalized Power is a good way to glean from races how your fitness is tracking, despite the lack of a formal testing protocol.

It should also come as no surprise that there is very little difference between the Average and Normalized Power for the time trial (297 W and 299 W respectively), since the effort was already relatively steady-state, and NP is about providing a steady-state power equivalent (hence the name "Normalized").

Because of the way it is calculated, Normalized Power will be equal to or greater than Average Power, and the gap between them depends on the amount of variability in the rolling 30-second power, and especially on the duration and number of forays at very high power levels.
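The 4th-power averaging step in the commonly published NP calculation is why NP sits at or above average power, and why high-power forays widen the gap. A small Python sketch with made-up rolling 30-second values (illustrative only, not data from these races), where each set has the same 250 W mean:

```python
def fourth_power_mean(values):
    """4th root of the mean of 4th powers (the averaging step used for NP)."""
    return (sum(v ** 4 for v in values) / len(values)) ** 0.25


def arithmetic_mean(values):
    return sum(values) / len(values)


# Hypothetical rolling 30-second values, all with a 250 W mean
steady = [250.0] * 6
variable = [150.0, 350.0] * 3      # moderate, evenly spread variability
one_foray = [200.0] * 5 + [500.0]  # mostly easier, one big effort

for name, vals in [("steady", steady), ("variable", variable), ("one_foray", one_foray)]:
    print(name, arithmetic_mean(vals), round(fourth_power_mean(vals)))
```

All three print a mean of 250.0, but the 4th-power means come out at 250, 297 and 329 W respectively: the single big foray pulls the result up more than evenly spread variability does.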

Using Normalized Power to estimate Functional Threshold Power

Since Normalized Power provides a steady-state power equivalent for longer (predominantly aerobic) durations, it follows that NP from hard rides/races of about an hour can be used as one means to estimate FTP.

The well-established rule of thumb is that, for durations of about an hour, Normalized Power will be no more than 5% higher than the maximal quasi-steady-state power a rider is truly capable of. Since maximal quasi-steady-state power for about an hour is the definition of Functional Threshold Power, we can simply state:

~1-hour NP <= 105% of FTP

or at least that it will be for the large majority of riders, a large majority of the time.

So if you notice from a hard ride/race of about an hour that NP is > 105% of FTP, then it's quite possible your FTP is higher than you think it is.
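Rearranging the rule of thumb gives a floor for FTP: if ~1-hour NP ≤ 1.05 × FTP, then FTP ≥ NP / 1.05. A minimal Python sketch (the function names are hypothetical, and the 1.05 factor is the rule of thumb above, not a hard physiological limit):

```python
RULE_OF_THUMB = 1.05  # ~1-hour NP <= 105% of FTP


def ftp_floor(one_hour_np):
    """Lowest FTP consistent with the rule of thumb for a ~1-hour NP (watts)."""
    return one_hour_np / RULE_OF_THUMB


def ftp_may_be_low(one_hour_np, current_ftp):
    """True if a ~1-hour NP exceeds 105% of the FTP currently in use."""
    return one_hour_np > RULE_OF_THUMB * current_ftp


# Usage with a hypothetical FTP setting of 280 W and a ~1-hour NP of 315 W
print(ftp_may_be_low(315, 280))  # True: 315 W is above 294 W
print(round(ftp_floor(315)))     # 300
```

A True result here is a prompt to re-check FTP, subject to the caveats that follow, rather than proof that FTP is wrong.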

Caveats and fruit salad

There are of course caveats to this rule of thumb. I'll go over these as they impact the definition of an NP Buster and can help explain what some perceive to be anomalies when interpreting their own NP numbers.

The duration caveat
Since we are primarily concerned with obtaining a measure of equivalent aerobic metabolic demand/strain, the duration of any comparison of highly variable versus steady-state efforts needs to be sufficiently long to reduce the confounding impacts of individual differences in anaerobic work capacity and neuromuscular power capabilities relative to a rider's aerobic capabilities.

For this reason, NP numbers from rides or parts of a ride shorter than 20 minutes are not suitable for such comparisons, nor as an indicator of a metabolic steady-state power equivalent. I generally take more notice of NP for durations of at least 30 minutes, but it depends on the rider's individual circumstances and capabilities. As the duration of a ride decreases (e.g. down towards 20 minutes), the difference between NP and a rider's actual maximal steady-state power can become somewhat wider.

The circumstantial caveat
There are circumstances where, no matter how one rode (steady-state or variable), power output would be somewhat different compared with another circumstance. One example is comparing riding on an indoor trainer with riding outdoors, as some people experience a sizeable difference in the power they can sustain indoors versus out.

Others might be comparing a long, steep hill climb with flat terrain; a road race bike with an aggressive time trial position that might compromise power output for some aerodynamic gain; a really hot day; riding at altitude; and so on. Another is the use of frequent out-of-the-saddle efforts engaging upper body musculature versus staying in the saddle.

So while Normalized Power enables a comparison of some apples with some oranges, we need to be thoughtful when using it to compare all types of fruit.

The power meter data accuracy caveat
Well, it should go without saying that power data needs to be accurate for the interpretation to make sense. While basic accuracy is a factor, data integrity can also be compromised even when the individual data points are accurate. This mostly concerns the way some power meter head units collect and store data, especially the sampling rate. If the fruit is bad, there's no point in trying to use it.

An example of this is/was Garmin's use of "smart recording", which in current firmware versions should be automatically disabled when using a power meter, but it makes sense to check that it really is disabled. This was also a factor for older power meters with memory space restrictions and options to "down-sample" data (e.g. older Powertap head units). You could get away with 2-second sampling (just), but any longer interval than that would compromise data integrity to the extent that the data might not be all that useful.
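To see how coarse recording can distort NP, here is one simple model of down-sampling (an assumption, not how any particular head unit actually stores data): keep only every n-th 1 Hz sample and hold that value until the next kept sample, then compute NP on the result. The function names and the surge pattern are hypothetical, and the NP step follows the commonly published 30-second rolling average / 4th-power method.

```python
def sample_and_hold(power, interval):
    """Keep every `interval`-th 1 Hz sample and repeat it until the next one."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return [power[i - i % interval] for i in range(len(power))]


def normalized_power(power, window=30):
    """Commonly published NP: 30 s rolling mean, 4th power, mean, 4th root."""
    if len(power) < window:
        raise ValueError("need at least one full rolling window of data")
    rolling = [sum(power[i - window + 1:i + 1]) / window
               for i in range(window - 1, len(power))]
    return (sum(r ** 4 for r in rolling) / len(rolling)) ** 0.25


# Hypothetical hour of short surges: 3 s at 600 W, then 9 s at 150 W, repeated
surges = ([600] * 3 + [150] * 9) * 300

for interval in (1, 2, 6, 12):
    held = sample_and_hold(surges, interval)
    print(interval, round(normalized_power(held)))
```

In this artificial example the longer intervals line up with the surge pattern, so the "recorded" data overstate NP; at 12 seconds every kept sample lands on a 600 W surge and NP comes out at 600 W. Real rides are not this regular, but the size and direction of the error become unpredictable once the recording interval approaches the length of the efforts.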

The software algorithm caveat
While the Normalized Power algorithm is pretty straightforward and in the public domain, not all software (be it commercial desktop software such as WKO+, home-designed spreadsheets or websites) produces the same results. There may be a number of reasons for that, e.g. use of an incorrect algorithm (I've seen many cases of people claiming an NP that was incorrectly calculated), or more subtle matters such as how gaps in power data or variable-duration time stamps are handled.
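One way software might handle variable-duration time stamps and gaps is to expand the records onto a regular 1-second grid before computing NP. The sketch below is illustrative only: it assumes whole-second time stamps, and the policy (hold each reading for up to `max_hold_s` seconds, then treat the rest of the gap as 0 W) and the parameter names are hypothetical choices. Different choices at this step are exactly the kind of thing that makes two programs report different NP from the same file.

```python
def to_one_hz(records, max_hold_s=2, gap_fill=0.0):
    """Expand (timestamp_s, watts) records onto a 1-second grid.

    Hypothetical policy: each reading is held for up to max_hold_s seconds;
    any remaining part of a longer gap is filled with gap_fill watts.
    Timestamps are assumed to be whole seconds in strictly increasing order.
    """
    if not records:
        return []
    out = []
    for (t, p), (t_next, _) in zip(records, records[1:]):
        span = t_next - t
        if span <= 0:
            raise ValueError("timestamps must be strictly increasing")
        for k in range(span):
            out.append(p if k < max_hold_s else gap_fill)
    out.append(records[-1][1])  # final reading covers its own second
    return out


# Usage: a 4-second gap between the readings at t=2 and t=6
rec = [(0, 200), (1, 210), (2, 220), (6, 300)]
print(to_one_hz(rec))                 # [200, 210, 220, 220, 0.0, 0.0, 300]
print(to_one_hz(rec, max_hold_s=10))  # [200, 210, 220, 220, 220, 220, 300]
```

Filling gaps with zeros suits a stop where the rider genuinely wasn't pedalling, while holding values suits a brief dropout; neither is right for every file.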

So when doing such analyses and/or comparisons, consider the software you are using as well, and validate that it is applying the algorithm correctly. Some food processors take the goodness out of the fruit.
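One way to validate an NP calculation is to feed it cases whose answers can be worked out by hand. A minimal Python sketch, assuming the commonly published method with a trailing 30-second window that only starts once 30 seconds of data exist (software that handles the first 30 seconds differently will give a slightly different answer on the short step case). `check_np` is a hypothetical helper, and `reference_np` is only a stand-in for whatever software you are testing.

```python
def reference_np(power, window=30):
    """Commonly published NP on 1 Hz data (stand-in for the software under test)."""
    if len(power) < window:
        raise ValueError("need at least one full rolling window of data")
    rolling = [sum(power[i - window + 1:i + 1]) / window
               for i in range(window - 1, len(power))]
    return (sum(r ** 4 for r in rolling) / len(rolling)) ** 0.25


def check_np(np_func, tol=0.01):
    """Run hand-checkable sanity cases against an NP function; return failures."""
    failures = []

    # 1. Constant power: NP must equal that power.
    if abs(np_func([250] * 3600) - 250) > tol:
        failures.append("constant 250 W should give NP of 250 W")

    # 2. 30 s at 300 W then 30 s at 0 W: rolling values are 300, 290, ..., 0,
    #    so NP = (sum((10*m)**4 for m in 0..30) / 31) ** 0.25, about 203 W.
    expected = (sum((10 * m) ** 4 for m in range(31)) / 31) ** 0.25
    if abs(np_func([300] * 30 + [0] * 30) - expected) > tol:
        failures.append(f"step case should give NP of about {expected:.1f} W")

    # 3. Variable power: NP should not fall below average power.
    surging = ([400] * 60 + [100] * 60) * 30
    if np_func(surging) < sum(surging) / len(surging):
        failures.append("NP fell below average power on a variable ride")

    return failures


print(check_np(reference_np))  # [] means every check passed
```

Note that a program that simply reports average power passes the constant-power check but fails the step case (150 W instead of about 203 W), which is why more than one test case is worth running.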

So what is an NP Buster?

An NP Buster is a ride that breaks the rule of thumb, or put this way:

~1-hour NP > 105% of FTP

provided that:

  1. the above caveats are taken into consideration (especially power data accuracy, correct calculation of NP, but also the circumstantial caveats), and
  2. FTP at around the time of the claimed buster ride has been well established using one or more of Andy Coggan's Sins 5, 6 and 7 referenced in this post on establishing Functional Threshold Power, i.e.:
    • using critical power testing and analysis
    • from the power that you can routinely generate during long intervals done in training
    • from the average power during a ~1-hour TT
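The definition above boils down to a few checks. A Python sketch: the `RideSummary` fields and the `is_np_buster` function are hypothetical, the two booleans stand for the judgement calls in conditions 1 and 2 (which no code can make for you), and treating "~1 hour" as 50-70 minutes is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RideSummary:
    duration_min: float         # ride (or segment) duration in minutes
    np_watts: float             # correctly calculated Normalized Power
    ftp_watts: float            # FTP at around the time of the ride
    caveats_checked: bool       # data accuracy, NP calculation, circumstances
    ftp_well_established: bool  # e.g. CP testing, long intervals, ~1-hour TT


def is_np_buster(ride, min_duration=50.0, max_duration=70.0):
    """True if a ~1-hour ride breaks the rule NP <= 105% of FTP.

    The 50-70 minute band used for "~1 hour" is an assumption.
    """
    about_an_hour = min_duration <= ride.duration_min <= max_duration
    return (about_an_hour
            and ride.caveats_checked
            and ride.ftp_well_established
            and ride.np_watts > 1.05 * ride.ftp_watts)


# Usage with hypothetical numbers: 62 minutes, NP 320 W, FTP 300 W
print(is_np_buster(RideSummary(62, 320, 300, True, True)))   # True
print(is_np_buster(RideSummary(62, 320, 300, False, True)))  # False: caveats not checked
```

Putting the caveat checks in the same expression as the power test mirrors the definition: a high NP on its own does not make a ride an NP Buster.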

Such NP Buster rides have occurred, and there are riders who can produce them. They are however rare, and I'll talk more about them in Part III.


Arthur Russell said...

I recently had an NP for a very hard 2-hour race that was the same as my FTP. I found this odd because I thought my FTP was about right. Is there any relationship between 2-hour NP and FTP? Nice blog BTW, really enjoying it.

Alex Simmons said...

Hi Arthur

Keep in mind that NP from a hard hour is typically no more than *105%* of FTP, so it follows that it's feasible for 2-hour NP to still be very close to FTP.

It would certainly be a hard ride but not impossible for some riders. Indeed I have 2-hour NP values at ~97-100% of FTP.