Monday 30 April 2012

Running loops: Did you say recursion?

A favorite blog of mine, Mike the Mad Biologist (he is indeed a trained biologist), discusses how a modeler and an actual outfielder would each approach catching a baseball. Here are two ways to approach this (complex) problem:

1. Bottom-up models
2. Top-down heuristics

The bottom-up model approach tries to account for every relevant variable (e.g. wind speed, gravity, ball shape, initial ball speed, flow dynamics affecting mid-air wobbling, et cetera), places all of these characteristics into a set of differential equations, then decides where an outfielder should stand in order to catch the ball (time limit: two seconds). This approach, though complex, is tantalizing because of that feeling of total control over the expected outcome (if it works, you'll get the right answer every time). In theory the 'model method' is purely deterministic, but in practice it's weak: the list of significant variables is long, the input equations are massive, and small input mistakes can (and do) lead to large errors. Problems arise easily: "Oh no, a gust of wind, my calculations are off!"
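To get a feel for how quickly small input mistakes grow, here is a minimal sketch using the textbook drag-free range formula R = v^2 sin(2θ) / g. It ignores wind, spin and air resistance entirely, and the launch speed and angle are made-up illustration values, not measurements.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def landing_distance(speed, angle_deg):
    """Range of a drag-free projectile launched and caught at the same height."""
    return speed ** 2 * math.sin(2.0 * math.radians(angle_deg)) / G


# Hypothetical fly ball: 30 m/s off the bat at 45 degrees.
true_landing = landing_distance(30.0, 45.0)

# The same ball, but with the launch speed misjudged by just 1%.
misjudged_landing = landing_distance(30.0 * 1.01, 45.0)

error_m = abs(misjudged_landing - true_landing)
print(f"true landing:  {true_landing:.1f} m")
print(f"model landing: {misjudged_landing:.1f} m")
print(f"landing error: {error_m:.2f} m")
```

Because range scales with the square of the speed, a 1% speed error becomes roughly a 2% error in distance, which for this ball is nearly two meters, and that is before any wind is added.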

Compare this scheme to the heuristic approach:
1) Fix your gaze on the ball.
2) Start running and adjust your running speed so your angle of gaze remains constant.
A talented MLB fielder has a greater than 99% success rate in catching an incoming ball. Even team averages for fielding can approach this value. For a model to be competitive, it must predict where the ball will land using initial conditions only (otherwise it's cheating), and to be acceptable its results must be accurate to within a few centimeters more than 99% of the time.

Chances are even the best outfielders haven't taken a single course in differential calculus, yet they outperform the most ingenious computer programs on earth. For complex tasks where not all variables are known, the heuristic approach wins the day. Simpler approaches can be more robust when data is missing. "There is no optimization in an uncertain world" (Gigerenzer).
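To make the comparison concrete, here is a toy sketch of both approaches chasing the same ball. Everything in it is an assumption for illustration: the ball has no drag, the "gust" is a constant horizontal acceleration the model never hears about, the fielder stands beyond the ball, starts the heuristic at the top of the ball's arc, sees perfectly, and has a top speed but no limit on acceleration. The function names and numbers are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2


def ball_position(t, vx0, vy0, gust):
    """Ball position at time t; `gust` is a constant horizontal acceleration."""
    x = vx0 * t + 0.5 * gust * t * t
    y = vy0 * t - 0.5 * G * t * t
    return x, y


def model_miss(vx0, vy0, gust):
    """Bottom-up: predict the landing spot once, from initial conditions only."""
    flight_time = 2.0 * vy0 / G
    predicted = vx0 * flight_time  # the model knows nothing about the gust
    actual, _ = ball_position(flight_time, vx0, vy0, gust)
    return abs(actual - predicted)


def heuristic_miss(vx0, vy0, gust, fielder_x, max_speed=8.0, steps=200):
    """Top-down: from the apex onward, run so the gaze angle stays constant."""
    flight_time = 2.0 * vy0 / G
    t_start = flight_time / 2.0  # fix the gaze when the ball is at its apex
    bx, by = ball_position(t_start, vx0, vy0, gust)
    tan_gaze = by / (fielder_x - bx)  # reference gaze angle, stored as a tangent
    max_step = max_speed * (flight_time - t_start) / steps
    for i in range(1, steps + 1):
        t = t_start + (flight_time - t_start) * i / steps
        bx, by = ball_position(t, vx0, vy0, gust)
        target = bx + by / tan_gaze  # the spot where the gaze angle is unchanged
        move = max(-max_step, min(max_step, target - fielder_x))
        fielder_x += move  # run toward that spot, limited by top speed
    return abs(fielder_x - bx)


# Hypothetical fly ball with 4 s of hang time and a gust the model never sees.
vx0, vy0, gust = 6.0, 19.62, 0.5
print(f"model miss:     {model_miss(vx0, vy0, gust):.2f} m")
print(f"heuristic miss: {heuristic_miss(vx0, vy0, gust, fielder_x=23.0):.2f} m")
```

With these numbers the one-shot prediction ends up about 4 m from the ball, while the runner who only holds the gaze angle arrives where the ball comes down without ever knowing the gust existed. The heuristic gets to watch the ball the whole way, which is exactly what an "initial conditions only" model is not allowed to do.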

What is the point of this story? It's a prelude to the same problems I see in other sports, like running. Consider the bottom-up modeling approach to training and nutrition. Here is a sample week of suggested training for an advanced 10k runner (as provided by Running Planet), typical of the strict structure found in many racing buildup plans:


Now here's a sample diet schedule I found online, also typical of other diet books I've seen (though with more cabbage soup than I could personally ever stomach):


Both show similar model-type approaches attempting to control for a complex problem with unknown variables (running conditions and food sources). In theory with enough variables we can 'solve' the trajectory problem of fitness using the bottom-up model technique. A proper exercise plan requires knowledge of both nutrition and exercise. For a truly quantified modeled solution, is there a plan that includes both?

Enter TrainingPeaks (developed in part by Matt Fitzgerald and advertised in his book Racing Weight), which attempts to quantify this complexity by including workout schedules and calories burned per workout. Here is a sample of their online training software, which tracks both your workouts and calorie consumption:


The cost of this service is $110 USD per year, so they want you to get your money's worth. Food consumed is described not only by caloric total, but broken down into carbs, fats, and protein (see pie chart). The workouts are then logged by intensity versus time (upper right graph), heart rate, workout type, climate of the day, elevation climbed, and personal comments.

These approaches are the products of continued refinement toward the ultimate goal of a bottom-up approach. But this system is incomplete; there's still more to fitness than just nutrition and exercise.

For a workout, whether quantified by time run or distance (mileage), it's important to include the type of terrain (grass, concrete, dirt, sand, mud, gravel, or a combination). We also need to know how elevation changes affect your speed. Hence your program has to know when mileage or pace is more important, whether the tempo/fartlek runs were done on changing surfaces, and whether heart rate is needed instead of GPS pacing. There's more.

Consider a single workout, say a 'simple' 10x400m repeat. What variables must be tracked? Off the top of my head we have

1. Running pace per 400m interval (1500m to marathon pace, or a combination thereof, e.g. ladders, or will the last 400 simulate a final-lap sprint?)
2. Rest period and type (e.g. 1-5 minute stationary rest versus jogging a 200-400m loop or more)
3. Training week phase (is this before/after another hard training session, easy run, rest day, or long run? Will running fast hurt a mileage buildup?)
4. Season phase (is this workout near final race, beginning of season, or between races?)
5. Damage control/improvisation (if the goal was to run the 400s in 65 seconds but the athlete can't, what new pace will be chosen? Or if they feel better than expected, should they run faster?)
6. Did the athlete do a long warmup or do a long/short cool down, and will they have more strength work immediately after this?
7. What is the athlete's mood? Was this a mentally straining day for the athlete?
8. Is the interval done as a group or solo?
9. Is/was the athlete sick with a cold/flu? Are there any past injuries to consider before running too hard?
10. Will this workout actually make the athlete race faster? Which race in particular, and when?

We have to know each athlete individually to account for these variables, and how each variable affects the others. Just like in baseball, the only relevant result is actually catching the ball (here, running your best race). To compete with heuristics, the model approach must answer each of these questions to a very satisfying degree. And it must do so again for the other weekly workouts, plus easy running days. Did you account for the miles you ran to catch a bus, walked to work (and at work too), or covered in a pickup game of soccer? Hundreds of interlocked variables.
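A rough way to size up "interlocked" is to count the pairwise interactions a model would have to describe. The sketch below does only that counting. It assumes every variable can influence every other one, which is a simplifying assumption, and the 300-variable figure is purely illustrative.

```python
from math import comb


def pairwise_interactions(n_variables):
    """Number of distinct variable pairs among n_variables."""
    return comb(n_variables, 2)


# The ten questions listed above for a single 10x400m session.
print(pairwise_interactions(10))

# A hypothetical full training-and-lifestyle model with 300 variables.
print(pairwise_interactions(300))
```

Ten variables already give 45 pairs to understand, and the count grows roughly with the square of the number of variables, so a few hundred variables means tens of thousands of pairs.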

Nutrition is also more complex than this carb/fat/protein pie chart. I've heard semi-intelligent people say "in the end a calorie is a calorie". Not true. Ever eat a spoonful of congealed butter, followed separately by a slice of toast? You'll feel sick swallowing a straight spoonful of butter. Go ahead and try it. (I did once; it's really gross.) Obviously it's not the same as buttering the bread and eating them together. Timing also matters: eating all your meals in one sitting is not the same as eating five smaller ones.

The combination of different foods is important, so that will have to be tracked and quantified in our model. Did you have a big meal before a race or a small one? Did you drink liquids with your meal or not? Did you eat the number of calories you supposedly burned but still feel hungry? Who's correct as you hesitate to eat more: your online calculator or your stomach? The software must also account for important micronutrients, like B vitamins, and minerals (salt, iron, selenium). You will have to be completely honest about every food you eat, for example weighing cheese and crackers at parties and knowing how many calories are in beer (a lot). You will also need your basal metabolic rate: how many calories did you burn on top of your workout calories? (It is slightly different for every person.)

Daunted yet? I hope so. Quantification of every variable using a bottom-up approach is clearly an insurmountable task. Not only are these issues interrelated, they also change with time. As with a gust of wind moving a baseball, you'll have to improvise if something comes up (injury, a new race date, or a workout missed due to unexpected life issues). A rigorous plan is not rigorous if you never follow it to a tee, and it is often damaging if you do. Every slight change is, in effect, cheating: you are using heuristics. As with the baseball trajectory model, updating the 'model' with new information every moment is effectively predicting nothing. The training log is a useful tool for motivation perhaps, but if nothing is explained, or if nothing can be predicted, then a fully quantified approach will never work.

That is my conclusion, but this is not the end, rather the beginning. What I implore is that people abandon attempts at model refinement. The big picture is, and will be, forever incomplete. Heuristics work on results-oriented schemes, and attempt to 'prune' your conscious self to only those variables that have a direct impact on other controllable variables: Simple feelings of hunger probably mean you should eat more, regardless of what any algorithm tells you. If your legs hurt, running a 'planned' workout is not necessarily your best option. Feeling good? You might want to run farther, but that depends. Feedback loops, and hence recursion, are the bread and butter of heuristics. The good news is almost everyone is already making these modifications, but unconsciously. Every time you skip a workout because you're ill (or whatever) you are playing the heuristic game. Every snack you take because you are hungry, ditto. All I am attempting to do here is flesh out more of these ideas in the open. Go ahead, shout from your rooftop "My body is smarter than any calculator and I will learn how to use it!"