Technical information, news, research, and opinion on avalanches, snow safety, and winter backcountry travel.

Wednesday, November 2, 2011

Knowing, Part II

I was just guessing at numbers and figures, pulling the puzzles apart—Coldplay

( AUTHOR'S NOTE: This is the nine-thousandth post that addresses the general question of Why Is It So Complicated? While teaching often involves simplification, it's important to remember that you can also use complexity to teach. Despite conventional wisdom, complexity is not always the enemy of simple, and simplicity does not always improve understanding. This post is not meant to be a primer on statistics; it is an attempt to use simple statistics to illustrate the complexity of avalanche problems. Finally, this post does not apply to professional avalanche forecasters because they possess a.) a giant mental database of distributional information, b.) detailed information about the current situation, c.) access to high-end computer models, d.) extensive knowledge about the interaction of terrain and weather in the forecast areas. )

The question: why should we avoid speculating about whether data from one location can be used to estimate values at another location, and instead simply acknowledge the uncertainty?

Do you really want to know? Here's an answer that goes a bit deeper than "because". I'd like to think that this answer goes all the way to the bottom of the rabbit hole, but unfortunately this particular rabbit hole is very deep.

Introduction
When we discuss avalanches, we often talk about observations. When we discuss observations, we are talking about data. Maybe it's wind speed and direction or precipitation intensity. Or maybe it's shear quality and cracking. Either way, we most often relate the data to a specific place, which is a process referred to as spatialisation. Places are described with a frame of reference such as aspect, elevation, or perhaps something as simple as a name.

Now, let's think of the observation as a single sample at a single point in space. Most of us are immediately curious to know whether or not we can use the data to estimate values for another location. With backcountry avalanche forecasting, the answer is usually no, and a fairly simple principle outlines the complexity:

variations = uncertainty

There are a variety of words that describe variations, including homogeneity, heterogeneity, variance, invariance, isotropy, and anisotropy, but we'll just stick with variations for the time being.

Examples
Don't worry: these figures, and the accompanying text, make my head hurt too.

Figure 1.1. Consider the terrain shown below. It's not really perfectly flat, and its surface is composed of different materials. However, there are some important ways in which variations in the terrain are low: the maximum difference in elevation is small. But even with such a simple shape, the interaction between terrain and weather still produces a chaotic arrangement of snow depths, drifts, crusts, and weak layers.

With that in mind, can we use a data sample from one location to estimate the value of the data at another location? Provided you can account for a significant degree of uncertainty, including the randomness inherent to the chaotic/complex system that produced the snowpack, then yes, you can use data from one location to estimate the values elsewhere.


Figure 1.2. The next example shows some mountainous terrain. The variations in the data are fairly obvious: the maximum difference between elevation values is much greater than in Figure 1.1. These variations create other variations such as orientation to the sky and orientation to wind. ( Which is similar to "propagation of uncertainty" in formal statistics. )

Start to imagine how these variations affect our ability to use a value taken at point A to estimate the values at points B and C. We have our frames of reference such as aspect, elevation, and temperature, but these are simply ways of sorting the data into buckets. A frame of reference can simplify how we perceive the data, but a frame of reference does not reduce the frequency of variations, nor their magnitude.
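If you'd like to see that claim in something more concrete than prose, here's a minimal sketch using made-up snow-depth numbers ( none of this is real field data, and the aspect offsets are invented ): sorting samples into aspect buckets tidies up the picture, but the spread inside each bucket barely shrinks.

```python
# A minimal sketch with synthetic snow-depth samples tagged by aspect, showing
# that sorting data into frame-of-reference "buckets" does not remove the
# variation hiding inside each bucket.
import numpy as np

rng = np.random.default_rng(seed=1)
aspects = np.array(["N", "E", "S", "W"])
labels = rng.choice(aspects, size=400)

# Hypothetical depths (cm): a small aspect effect plus large local scatter.
aspect_offset = {"N": 20.0, "E": 0.0, "S": -15.0, "W": 5.0}
depths = np.array([150.0 + aspect_offset[a] for a in labels])
depths += rng.normal(0.0, 40.0, size=labels.size)  # chaotic local variation

print(f"overall spread (std): {depths.std():.1f} cm")
for a in aspects:
    bucket = depths[labels == a]
    print(f"aspect {a}: mean {bucket.mean():6.1f} cm, std {bucket.std():5.1f} cm")
# The buckets simplify how we perceive the data, but the within-bucket spread
# stays close to the overall spread, so the variation hasn't gone anywhere.
```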


Figure 1.3. Here's a histogram of the elevation values in the second image ( blue ). You'll notice the magnitude and frequency of variation are substantial. Remember, the key equation is variations = uncertainty. If it's of interest to you, the standard deviation is ~210, which is quite a large value for this data set. The histogram for the first image ( red ) has been plotted on the same graph. There is much less variation.
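For the curious, this is roughly how a number like ~210 comes out of an elevation data set. The grids below are invented stand-ins ( I'm assuming metres ), not the actual data behind the figures, but the mechanics are the same: histogram the values and compute the standard deviation.

```python
# A minimal sketch with made-up elevation samples: one "mountainous" set with a
# large spread and one nearly flat set, summarized by histogram and standard
# deviation. The spreads are assumptions chosen to mirror the figures.
import numpy as np

rng = np.random.default_rng(seed=7)
mountain_elev = rng.normal(1800.0, 210.0, size=10_000)  # metres, assumed spread
flat_elev = rng.normal(1800.0, 15.0, size=10_000)       # nearly flat terrain

for name, elev in [("mountain", mountain_elev), ("flat", flat_elev)]:
    counts, edges = np.histogram(elev, bins=20)
    peak = counts.argmax()
    print(f"{name}: std = {elev.std():.0f} m, "
          f"most common band = {edges[peak]:.0f}-{edges[peak + 1]:.0f} m")
# The larger the standard deviation of the values, the harder it is to carry a
# single sample from one point to another: variations = uncertainty.
```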

Despite these variations, it's important to note that the data set itself is relatively stable. This simply means that, for our purposes, the values in the set aren't going to change very much in our lifetime. This is why experienced avalanche forecasters often say that terrain is the solution to a dirty snowpack. The stability of the data set reduces certain types of uncertainty.


NOTE: I'm going to make an important point about "making things simpler" and about the dangers of using speculation to estimate the values at two locations from a single piece of data.

Figure 1.4. So we've got a complex problem... we need to simplify things... right? The thing is, simplification often has incredibly serious side effects, some of which are outlined in this example. This image shows an accurate simplification of the terrain produced with a combination of mathematics and computer science called computational geometry.

Accuracy of the simplification aside, if we look at the image and consider the data at each point, it's immediately clear that a lot of data are missing. Missing data doesn't tend to make things easier, and in some cases it can be downright dangerous. Remember, simplification removes data, so while everything in the model is simpler, our picture is far less complete. Can we fill in the gaps?

There are hundreds of ways to accomplish this using everything from basic math to empirical statistical approaches. Do you favour linear interpolation? What about inverse distance weighting? Kriging? Unfortunately, even if the accuracy of the approach is reasonable, so much uncertainty remains that the simplification doesn't really help very much. Here's why:
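To make one of those approaches concrete, here's a minimal inverse distance weighting sketch with hypothetical snow-depth samples. It shows the mechanics of filling a gap from nearby points; it says nothing about whether the estimate is any good.

```python
# A minimal inverse-distance-weighting sketch (one of the interpolators named
# above), using hypothetical snow-depth observations.
import numpy as np

def idw(xy_known, values, xy_query, power=2.0):
    """Estimate a value at xy_query from known points via inverse distance weighting."""
    d = np.linalg.norm(xy_known - xy_query, axis=1)
    if np.any(d == 0):                      # query sits exactly on a sample
        return float(values[np.argmin(d)])
    w = 1.0 / d**power
    return float(np.sum(w * values) / np.sum(w))

# Hypothetical samples: (x, y) in metres and snow depth in cm.
xy = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [100.0, 100.0]])
depth = np.array([120.0, 95.0, 180.0, 60.0])

print(f"estimated depth at (50, 50): {idw(xy, depth, np.array([50.0, 50.0])):.0f} cm")
# The estimate looks smooth and plausible, but nothing in the arithmetic tells
# you how wrong it is at a point the storm treated differently.
```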

Very often, simplification helpfully reduces data while unhelpfully introducing novel variations that are difficult to measure ( which increases uncertainty ). Simplification can reduce complexity, but only when implemented with fanatical attention to detail. This requires understanding all the details and side effects of your simplification in a way that you can quantify to a high degree of accuracy.

Otherwise, you will certainly end up with less data, but new uncertainties will propagate through your "simplified" model. Think of it this way: before, you were uncertain ( but you knew why ) and now you're uncertain ( and you don't know why ). Imagine trying to use your brain to take the value of A and accurately determine the values of B and C. Does it still seem like a good idea?
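Here's a tiny illustration of that trade, using an invented line of snow-depth samples: thin the data ( the "simplification" ), rebuild the missing points by interpolation, and look at how far the rebuilt values sit from the originals. Those residuals are the new uncertainty, and nothing in the simplified model tells you where they live.

```python
# A minimal sketch: thin a dense set of made-up snow-depth samples, rebuild the
# discarded points by linear interpolation, and measure the resulting error.
import numpy as np

rng = np.random.default_rng(seed=3)
x = np.linspace(0.0, 1000.0, 201)                     # metres along a slope
depth = 150.0 + 40.0 * np.sin(x / 90.0) + rng.normal(0.0, 15.0, x.size)  # cm

keep = slice(None, None, 10)                          # keep every 10th sample
x_kept, depth_kept = x[keep], depth[keep]
rebuilt = np.interp(x, x_kept, depth_kept)            # interpolate back over the gaps

discarded = np.ones(x.size, dtype=bool)
discarded[keep] = False
residual = (depth - rebuilt)[discarded]               # error where data was thrown away

print(f"points kept: {x_kept.size} of {x.size}")
print(f"typical error at discarded points: {residual.std():.1f} cm")
print(f"worst error: {np.abs(residual).max():.1f} cm")
# Before thinning you were uncertain but knew why; after thinning the model is
# smaller, and the leftover error is baked in where you no longer look.
```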


Figure 1.5. This is a map of the drainage network for the terrain near Crystal Mountain Ski Resort. For the purpose of illustration, pretend for a moment that this is a map of wind directions that accurately depicts wind flow over rough terrain for a single second during a five hour storm. ( A sampling rate of 1:18000, which is laughably low. )

If you want to imagine what this would actually look like during our hypothetical storm, think about each arrow rotating and increasing/decreasing in size. However, in a rather beautiful paradox, even if you could somehow make the wind flow simulation accurate ( which you can't ), you'd still be wildly uncertain about actual snowfall amounts.

Of course, the next step involves adding clusters of snow crystals to the simulation, and suddenly it would be nice to have a supercomputer. This obscene complexity is simply business as usual for complex systems such as weather and its interaction with terrain. This image makes it pretty clear why local snowfall accumulations often vary by a factor of ten: ten times more accumulation at point A than at point B.


Figure 1.6. Don't worry, it gets worse! This is an overhead map of "Cement Basin" near Crystal Mountain Ski Resort, Washington State. This map shows ground cover such as trees in black, and open areas in white. Surface hoar forms best in areas with a clear view of the sky ( white ). How do ground cover variations affect your perception of where surface hoar forms?

What about variations in crystal size? Think about the gray areas where surface hoar crystals are small, but still connected to areas where the crystals are large. Do you think it's possible to figure out a safe route, or are the variations simply too complex? Are you sure of your ability to collect empirical estimates, or would you rather accept the uncertainty ( and deal with it )?


Variations Are a Fact of Life
Can we overcome the uncertainty inherent in data with large variations? Unfortunately, in the context of backcountry avalanche forecasting, the short answer is that we can't, and managing this inherent uncertainty is what professionals refer to as managing the risk. While this sounds vague, managing the risk includes reducing exposure to variation in order to reduce the amount of uncertainty.

And fortunately for us, it's utterly trivial to reduce exposure to variation.

Figure 1.8. This is a map of "Cement Basin" near Crystal Mountain Ski Resort, Washington State. This is a tiny drainage, but it still contains significant variation, and it's certainly a very easy place to get injured or killed in poor conditions. So, whether you're new to the backcountry, or an old dog in search of an easy day, always remember that you can reduce variations by choosing very small slopes. Just don't let the size of a slope lull you into a false sense of security.


Geostatistics
Empirical solutions to the problems discussed above belong to a domain called geostatistics ( which is one of my professional interests ). I could rattle on about this domain all day long if you wanted, but I still wouldn't be able to give you any clear answers, and I'm pretty sure you don't want me to rattle on all day.

The clear answer is that interpolation of spatial data derived from chaotic/complex systems is extremely tricky business. That's why this problem has been boxed up nicely inside the concept of spatial variability. Yes, there are things you can know: research has established that there is less variability in shear quality than in the number of taps applied during a snowpack test. You can also know the general character of new snow amounts or precipitation intensity.

But it's important not to confuse hard data with speculation. Very often, it's easy to take hard data and try to apply it elsewhere without accounting for the uncertainty that comes with natural variations. This is when science becomes speculation, and while speculation isn't inherently wrong or dangerous, there are definitely situations when it can lead you down the garden path to somewhere you don't belong.

And you can be certain of that.
