how to summarize a long tail?

Where to Live to Avoid a Natural Disaster
Weather disasters and quakes: who’s most at risk? The analysis below, by Sperling’s Best Places, a publisher of city rankings, is an attempt to assess a combination of those risks in 379 American metro areas … and take(s) into account the relative infrequency of quakes, compared with weather events and floods.

http://graphics8.nytimes.com/images/2011/05/01/weekinreview/01safe/01safe-custom1.gif

I don’t know exactly what metric they used, but it appears to be roughly the expected number of “disaster points” accumulated per unit of time, for lack of a better description. The problem with expected-value metrics that try to summarize very different distributions is that the variance matters! One catastrophic disaster isn’t the same as several smaller disasters costing the same total number of disaster points. Yet if humanity “maximized” survival using this map, it would move to the West Coast and then become extinct in the next big earthquake. Even imposing a concave (risk-averse) utility function on the distribution isn’t entirely satisfying. When it comes to decision making, tail risk is in a different bucket than central risk. Somehow an important aspect of the decision-making process isn’t captured by “soft” metrics.
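To make the variance point concrete, here is a minimal sketch in Python. The two risk profiles, their probabilities, their severities, and the survival threshold of 10 points are all made-up numbers for illustration, not anything taken from the Sperling’s analysis. Both profiles have the same expected annual loss, yet only one of them can ever produce a loss above the threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnualRisk:
    """Hypothetical one-event-per-year risk model (an assumption for illustration)."""
    probability: float  # chance that the disaster happens in a given year
    severity: float     # "disaster points" lost when it does happen

    def mean(self) -> float:
        # Expected annual loss: p * severity
        return self.probability * self.severity

    def variance(self) -> float:
        # Variance of a loss that is `severity` with probability p, else 0
        return self.probability * (1.0 - self.probability) * self.severity ** 2

    def prob_exceeds(self, threshold: float) -> float:
        # Chance that a single year's loss is larger than `threshold`
        return self.probability if self.severity > threshold else 0.0


# Made-up numbers: same expected loss, very different shapes.
frequent_small = AnnualRisk(probability=0.5, severity=2.0)
rare_catastrophic = AnnualRisk(probability=0.01, severity=100.0)

SURVIVAL_THRESHOLD = 10.0  # assumed largest loss one can survive in a year

for name, risk in [("frequent, small", frequent_small),
                   ("rare, catastrophic", rare_catastrophic)]:
    print(f"{name}: mean={risk.mean():.2f}, "
          f"variance={risk.variance():.2f}, "
          f"P(loss > threshold)={risk.prob_exceeds(SURVIVAL_THRESHOLD):.2f}")
```

A map colored by the mean alone would paint these two places the same shade, even though only the second one carries any chance of an unsurvivable year.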

But let’s go back a bit. Summarizing a distribution by a single number is usually justified by the notion of repeated experiments, so that the law of large numbers kicks in. On a single sample path, truly rare events do not repeat, so expected value and other metrics that depend on repetition should be discarded immediately. It isn’t clear what a reasonable replacement is, though. A simple survival threshold might seem more appropriate (say you could survive a two-sigma lifetime event and nothing larger), but people certainly don’t behave that way, because there would be no solution: there is always some catastrophic risk, say, of the universe ending. It may be necessary to treat three severity regimes separately: soft metrics for the “mundane” central risks that occur many times, thresholding for the “rare” but “statistically identifiable” tail risks, and disregard for the “never occurred and unknown” acts of God.
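One rough way to sketch the three regimes in Python is to sort risks by how many times they are expected to occur within a single lifetime. The 80-year horizon, the cutoff of 5 expected events, the regime labels, and the example risks are all assumptions chosen for illustration; the argument above does not fix any of them.

```python
from typing import Optional

LIFETIME_YEARS = 80   # assumed length of the single sample path
REPEAT_CUTOFF = 5.0   # assumed: "repeats enough" means >= 5 expected events


def prob_at_least_once(annual_prob: float, years: int = LIFETIME_YEARS) -> float:
    """Chance the event happens at least once, assuming independent years."""
    return 1.0 - (1.0 - annual_prob) ** years


def regime(annual_prob: Optional[float], years: int = LIFETIME_YEARS) -> str:
    """Pick a treatment for a risk; None means no estimate exists at all."""
    if annual_prob is None:
        return "disregard"      # never occurred and unknown
    expected_events = annual_prob * years
    if expected_events >= REPEAT_CUTOFF:
        return "soft metric"    # repeats often enough for averages to mean something
    return "threshold"          # rare but statistically identifiable


# Hypothetical risks with made-up annual probabilities.
examples = {
    "seasonal flooding": 0.5,
    "major earthquake": 0.002,
    "unknown act of God": None,
}

for name, p in examples.items():
    if p is None:
        print(f"{name}: regime={regime(p)}")
    else:
        print(f"{name}: regime={regime(p)}, "
              f"P(at least once in a lifetime)={prob_at_least_once(p):.3f}")
```

For the rare event, most lifetimes contain no occurrence at all, so the average realized along any one path says little about the expected value. That is the sense in which repetition-based metrics stop being useful there.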
