### Statistical Analysis of Fuel Economy Claims

I’m subtitling this post “don’t believe all the data you
read.”

There’s a lot of stock put in fuel economy tests as a way to verify
changes in drag. Someone makes an aerodynamic change to their car and runs an “A-B”
test — sorry, an “A-B” test: one run (sometimes more) in each of two opposing directions, measuring
fuel economy — to “prove” that drag has been reduced. These results are then
posted online uncritically; I’ve done it myself in the past.

There are a couple of things wrong with these tests.

Statistics help us evaluate the truth of real-world claims.
One of the foundations of this branch of mathematics is the “normal
distribution”: the discovery that natural systems follow the same pattern of
variability, with a certain percentage of measured results falling within a
certain deviation of the simple average. For example, over the years I’ve owned
my Prius, I have kept track of the fuel economy displayed on the factory gauge
and the calculated MPG from the pump and miles driven. Last year, I put the
difference between displayed and calculated economy by tank in a spreadsheet and plotted
it:

*[Figure: The Normal Distribution. The mean was 3.0 MPG, right at the top of the curve.]*

You can see that the data form a nice curve. Most of the
results are in the middle, close to the mean. As you get further from the mean,
above and below, there are fewer data. This is the normal distribution, and it
is characteristic of just about every kind of measurement of, well, *anything*. Yes, anything.
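As a sketch of what that clustering looks like numerically (using Python’s standard library; the 3.0 MPG mean and 1.0 MPG standard deviation are illustrative assumptions, not my actual tank data):

```python
# Sketch: how a normal distribution concentrates data near the mean.
# mu = 3.0 MPG and sigma = 1.0 MPG are illustrative assumptions,
# not my actual tank-by-tank numbers.
from statistics import NormalDist

mpg_diff = NormalDist(mu=3.0, sigma=1.0)

# Fraction of tanks expected within +/- 1 standard deviation of the mean:
within_1_sigma = mpg_diff.cdf(4.0) - mpg_diff.cdf(2.0)
print(f"within 1 sigma: {within_1_sigma:.3f}")  # about 0.683

# Fraction within +/- 2 standard deviations:
within_2_sigma = mpg_diff.cdf(5.0) - mpg_diff.cdf(1.0)
print(f"within 2 sigma: {within_2_sigma:.3f}")  # about 0.954
```

Roughly 68% of results land within one standard deviation of the mean and 95% within two — the familiar bell-curve shape.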

The first problem is that *one run* may or may not tell you *anything* about the actual average result, because it may be sitting out at one “tail” of the normal distribution, far away from the actual mean. To continue the example above, if I pick a tank at random, the difference between displayed and actual MPG might be 3.0 (exactly at the mean), but it might also be 5.2 or 0.7, and if that is my only datum I will grossly over- or underestimate the actual mean. Note that this is *unlikely*, but we have no way of knowing with only one datum!
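A quick simulation makes the point. The distribution parameters here are the same illustrative assumptions as above, not measured data:

```python
# Sketch: a single draw from a distribution can land far from the mean,
# even though the average of many draws is very close to it.
# Mean 3.0 and standard deviation 1.0 are illustrative assumptions.
import random

random.seed(42)  # fixed seed so the sketch is repeatable
single_runs = [random.gauss(3.0, 1.0) for _ in range(10_000)]

average = sum(single_runs) / len(single_runs)
worst_miss = max(abs(x - 3.0) for x in single_runs)

print(f"average of many runs: {average:.2f}")    # very close to 3.0
print(f"worst single-run miss: {worst_miss:.2f}")  # more than 2 MPG off
```

The average of many runs nails the true mean, but an unlucky single run can miss it by a wide margin — and with one run you cannot tell whether you were unlucky.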

The upshot is that *you need far more than one datum for each configuration*. Statisticians use n = 15 as a minimum number; fewer tests than that require much more robust, consistent results to assign the same *confidence* to a claim.
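One way to see why more runs help: the uncertainty in an average shrinks with the square root of the number of runs. A minimal sketch, again assuming a 1.0 MPG per-run standard deviation:

```python
# Sketch: the standard error of a mean is sigma / sqrt(n), so averaging
# more runs narrows the uncertainty. sigma = 1.0 MPG is an assumption.
import math

sigma = 1.0  # assumed per-run scatter, in MPG

for n in (1, 5, 15, 50):
    sem = sigma / math.sqrt(n)
    print(f"n = {n:2d}: standard error of the mean = {sem:.2f} MPG")
```

At n = 15 the uncertainty in the average is already down to about a quarter of a single run’s scatter.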

**Confidence**

Statistical tests are run at a chosen *confidence level* (strictly speaking, a significance level, α), which is the acceptable risk that the test will return an error, e.g. the test suggests that adding a wing to your car improved its MPG when in reality it did not (in statistics-speak, this is a case of rejecting the null hypothesis when it was, in fact, true). The significance level is conventionally set at 0.15 or less, and the standard is α = 0.05. At a level this small there is very little chance that the test will return an erroneous result, and consequently you can have confidence that your results actually reflect reality.
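To illustrate what α = 0.05 means, here is a small simulation (standard library only; every number is invented for illustration): two *identical* configurations are compared over and over, and roughly 5% of the comparisons still come out “significant” purely by chance.

```python
# Sketch: when the null hypothesis is true (no real difference), a test
# at alpha = 0.05 still flags about 5% of comparisons as "significant".
# Uses a simple two-sample z-test with a known sigma to stay within the
# standard library; all parameters are invented for illustration.
import math
import random
from statistics import NormalDist, mean

random.seed(7)
std_normal = NormalDist()
sigma, n, trials, alpha = 1.0, 15, 2000, 0.05

false_positives = 0
for _ in range(trials):
    a = [random.gauss(45.0, sigma) for _ in range(n)]  # "without wing"
    b = [random.gauss(45.0, sigma) for _ in range(n)]  # "with wing" (identical!)
    z = (mean(a) - mean(b)) / (sigma * math.sqrt(2.0 / n))
    p = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    if p < alpha:
        false_positives += 1

rate = false_positives / trials
print(f"false positive rate: {rate:.3f}")  # near 0.05
```

The 5% of false alarms is the risk you accept by choosing α = 0.05; lowering α trades fewer false alarms for a harder-to-meet standard of evidence.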

The second problem is that *you must test your results to ensure that you can make claims about them*. Without confidence testing, you are shooting in the dark. That initial “A” test may have been wrong because of a passing car, a gust of wind, a rising temperature, or whatever. Accepting that result uncritically means you will never know if your tests *actually show what was happening*.

**When Claims Go Wrong**

When I tested ducts on my own car, my *null* or default hypothesis (the hypothesis I would assume true for purposes of the test) was that the average MPG without ducts was the same as with ducts, and my alternate hypothesis (what I wanted to know was true or not) was that the average MPG with ducts was greater than without. In mathematical terms,

H₀: μ_without = μ_with

H₁: μ_without < μ_with

I ran a *2-sample non-pooled t-test*, a choice based on the number of data I had and the fact that I wanted to compare two sets of data with unknown standard deviations, at a significance level of α = 0.05. The test compares the means and spreads of the two samples to calculate the probability of seeing a difference at least this large if the null hypothesis were true; it spits out a probability, in this case p = 0.292. Since that is far higher than our α = 0.05, the null hypothesis cannot be rejected: there is *not significant evidence to conclude that the addition of the ducts improved MPG*. Extraordinary claims require extraordinary evidence, and this evidence did not meet a high enough standard to support the claims I had made. Additionally, test results from others online that I had taken as gospel, but which displayed similar spread in the data, were in all likelihood BS. People (myself included) are incredibly adept at seeing what they want to see in data; some objective test *of test results* is necessary before making claims about aerodynamic changes based on something as variable as fuel economy.
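This kind of test is easy to reproduce. Here is a minimal sketch using SciPy’s `ttest_ind` with `equal_var=False` (Welch’s, i.e. non-pooled, t-test); the MPG numbers are invented stand-ins, not my actual runs:

```python
# Sketch: a 2-sample non-pooled (Welch's) t-test on invented MPG data.
# These numbers are illustrative stand-ins, not my actual test runs.
from scipy.stats import ttest_ind

mpg_without_ducts = [44.1, 45.3, 43.8, 44.9, 45.0]
mpg_with_ducts = [44.5, 44.7, 45.2, 44.0, 45.1]

# H1: mean MPG without ducts is *less* than with ducts (one-sided test).
result = ttest_ind(mpg_without_ducts, mpg_with_ducts,
                   equal_var=False, alternative="less")
print(f"p = {result.pvalue:.3f}")

# With run-to-run spread this large relative to the tiny difference in
# means, p comes out well above 0.05: fail to reject the null hypothesis.
```

If p came out below your chosen α, you could claim the ducts helped; here, as with my real data, the spread swamps the difference and no such claim is justified.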
