Thursday 31 May 2012

A Note on the Standard Deviation


This issue comes up every year, so we may as well deal with it. In our lectures on calculating the Standard Deviation, we report that the denominator is n – 1. That is, after adding up the squared deviations of the observations from their mean, you divide by the number of observations minus 1 and then take the square root. Many students enrolled in first-year math and/or stats classes point to this denominator as an error because it conflicts with what they are taught in those courses, where the equation for the Standard Deviation often uses N as the denominator instead of n – 1.

In fact, both equations are correct. The Standard Deviation has different equations depending on whether it is computed for a POPULATION of observations or for a SAMPLE of observations drawn from a larger POPULATION. Imagine that we had the high school GPAs of every student beginning their studies at the U of M. We could calculate the Standard Deviation of that Population of scores by using N (the number of GPA scores we have) as the denominator. However, if we didn’t have the entire population of scores, we could obtain a smaller number of scores drawn from that population (say, 100 scores out of the much larger number of first-year students). In that case, we compute the Standard Deviation using n – 1 as the denominator.

The reason is that our goal is to estimate how much scores vary in the population, based on the information we have from a much smaller sample. Scores in a sample sit closer to their own sample mean than to the population mean, so dividing by n tends to underestimate how much scores vary in the population. Dividing by n – 1 corrects for this.
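For reference, the two equations can be written side by side. The notation here is our assumption of the usual textbook convention rather than something taken from the lectures: σ and μ stand for the Standard Deviation and mean of a Population of N scores, while s and x̄ stand for the Standard Deviation and mean of a Sample of n scores.

```latex
% Population of N scores: divide by N
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}

% Sample of n scores drawn from a larger population: divide by n - 1
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```

The only differences are the denominator and which mean the deviations are measured from.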
We won’t work through the proof here, so you’ll need to trust us on this one: using n – 1 in the denominator means that, on average, the Standard Deviation computed from a sample comes closer to the Standard Deviation of the Population than it would with n in the denominator, even though the denominator for computing the Standard Deviation of a population of scores is N. We report the equation for a Sample of scores, rather than a Population of scores, because psychological studies typically rely on observations obtained from a SAMPLE, and it is quite rare for such studies to involve scores from an entire POPULATION. It is usually too difficult to get observations from every single member of a POPULATION, and much more trouble than it’s worth. We hope that clarifies things.
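If you would rather check the claim than take it on trust, a small simulation can help. The following is only a sketch under stated assumptions: the population of GPAs is made up (5000 randomly generated scores), the sample size of 10 and the number of repetitions are arbitrary choices, and `std_dev` is a helper written for this illustration, not something from the lectures.

```python
import math
import random


def std_dev(scores, sample=True):
    """Standard Deviation of a list of scores.

    Divides by n - 1 when the scores are a sample, and by N when
    they are the whole population.
    """
    n = len(scores)
    mean = sum(scores) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in scores)
    denominator = n - 1 if sample else n
    return math.sqrt(sum_sq_dev / denominator)


random.seed(2012)  # fixed seed so the run is repeatable

# Hypothetical population: 5000 made-up GPAs, clipped to the 0.0-4.0 range.
population = [min(4.0, max(0.0, random.gauss(3.0, 0.5))) for _ in range(5000)]
pop_sd = std_dev(population, sample=False)

trials = 5000  # number of samples to draw (arbitrary)
n = 10         # scores per sample (arbitrary)
total_with_n = 0.0
total_with_n_minus_1 = 0.0
for _ in range(trials):
    drawn = random.sample(population, n)
    total_with_n += std_dev(drawn, sample=False)
    total_with_n_minus_1 += std_dev(drawn, sample=True)

avg_sd_n = total_with_n / trials
avg_sd_n_minus_1 = total_with_n_minus_1 / trials

print("Population SD:              ", round(pop_sd, 4))
print("Average sample SD, n:       ", round(avg_sd_n, 4))
print("Average sample SD, n - 1:   ", round(avg_sd_n_minus_1, 4))
```

Across many samples, the average computed with n – 1 should land closer to the Population value than the average computed with n, which runs low. With samples this small, the n – 1 version can still fall slightly short of the Population value; the gap shrinks as the sample gets larger.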
