Friday 30 November 2012

Statistics - The Normal Probability Distribution

This post introduces an application of standard deviation. Standard deviation gives us a unit for measuring distances between points in a data set, including how far each point lies from the mean.

Chebyshev's theorem states that for any set of observations, the minimum proportion of values that lie within k standard deviations of the mean is 1 - (1/k^2), as long as k is greater than 1. If k is 3, at least 89% of the observations lie within that region, and if k is 4, at least 94% do.
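To check the arithmetic, here is a small Python sketch of Chebyshev's bound. The function name `chebyshev_min_proportion` is made up for illustration; it simply evaluates 1 - 1/k^2.

```python
def chebyshev_min_proportion(k: float) -> float:
    """Minimum proportion of observations within k standard deviations of
    the mean, for ANY distribution (Chebyshev's theorem, k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound only applies for k > 1")
    return 1 - 1 / k**2


for k in (2, 3, 4):
    # Print the bound as a percentage for a few values of k
    print(f"k = {k}: at least {chebyshev_min_proportion(k):.1%} within {k} SDs")
```

Rounded to whole percentages, k = 3 and k = 4 give the 89% and 94% quoted above. Because the theorem holds for any distribution, these are minimums; for normally distributed data the actual proportions are higher.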

For a normal probability distribution we use a normal curve, or bell curve. It has a single peak in the centre of the distribution, and at this centre point the mean, median and mode are all equal. This lets us introduce a new concept: z-values. A z-value is the distance between a selected value (Xi) and the population mean, divided by the population standard deviation. One more note on the normal curve: it has an excess kurtosis of 0 (equivalently, a kurtosis of 3). A distribution with higher kurtosis has a higher, more pointed peak (and heavier tails); one with lower kurtosis has a flatter shape.
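To get a feel for what kurtosis measures, here is a rough Python sketch that computes excess kurtosis from the second and fourth central moments. It uses the simple population formula (m4 / m2^2 - 3), which is an assumption on my part; statistics packages often apply a sample-size correction, so their numbers can differ a little. The two small data sets are made up for illustration.

```python
def excess_kurtosis(data: list[float]) -> float:
    """Population excess kurtosis: m4 / m2**2 - 3, where m2 and m4 are the
    second and fourth central moments. A normal distribution gives 0."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2**2 - 3


flat = [1, 2, 3, 4, 5]          # evenly spread, no real peak
peaked = [0] * 8 + [-10, 10]    # tall centre with a couple of far-out values

print(excess_kurtosis(flat))    # negative: flatter than a normal curve
print(excess_kurtosis(peaked))  # positive: more peaked than a normal curve
```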

Back to z-scores. They link the theoretical normal distribution to the observed data: a z-score tells us how many standard deviations away from the mean an observation lies. Calculating z-scores essentially converts your data into a distribution with a mean of 0 and a standard deviation of 1; if the original data are normal, the result is the standard normal curve. The formula is as follows:

$$z = \frac{X_i - \mu}{\sigma}$$

where μ is the population mean and σ is the population standard deviation.
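The z-score formula, z = (Xi - mean) / standard deviation, translates directly into code. This is a minimal Python sketch; the helper `z_score` and the data set are illustrative assumptions, not from the post. Standardising every value shows that the resulting z-scores have a mean of 0 and a standard deviation of 1.

```python
from statistics import mean, pstdev


def z_score(x: float, mu: float, sigma: float) -> float:
    """How many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma


data = [2, 4, 4, 4, 5, 5, 7, 9]          # illustrative data, not from the post
mu, sigma = mean(data), pstdev(data)     # population mean and SD
z = [z_score(x, mu, sigma) for x in data]

print(z)                   # each value re-expressed in SD units
print(mean(z), pstdev(z))  # standardised data: mean 0, SD 1
```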


Once the score has been calculated, you refer to a z-score table. On this table, the z-score to one decimal place goes down the side and the second decimal goes along the top. So if the formula above gave you a z-score of 1.24, you'd look for 1.2 down the side and 0.04 along the top. The value where these meet is not the z-score itself: it is the area under the curve between the mean and that z-score, i.e. the proportion of observations lying between them. For z = 1.24 that value is 0.3925. What to do with it becomes clearer with an example and some context.
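If you don't have a table to hand, Python's standard library (`statistics.NormalDist`, Python 3.8+) can produce the same number. This sketch assumes the table used here is a "mean-to-z" table, which matches the 0.3925 value, so it subtracts 0.5 (the area left of the mean) from the cumulative area. The helper name `area_mean_to_z` is hypothetical.

```python
from statistics import NormalDist


def area_mean_to_z(z: float) -> float:
    """Area under the standard normal curve between the mean (z = 0) and z,
    i.e. what a mean-to-z table lists. Uses |z|, since the curve is symmetric."""
    return NormalDist().cdf(abs(z)) - 0.5


print(round(area_mean_to_z(1.24), 4))
```

A cumulative table would instead list 0.8925 for z = 1.24, so it's worth checking which kind of table you have.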

At a party, lemonade is distributed among the party-goers with a mean of 20cl and a standard deviation of 5cl. Assuming the amounts are normally distributed, what is the likelihood that a person selected at random will get between 17cl and 23cl of lemonade? Right, so we plug in the values to start: z = (17 - 20) / 5 = -0.6, and z = (23 - 20) / 5 = 0.6. From here we look for 0.6 down the side of the z-score table and 0.00 along the top, which gives 0.2257. That is the area between the mean and 23cl; because the curve is symmetric, the area between 17cl and the mean is the same, so we double it to get 0.4514. And that's the answer: 45.14% of people get between 17cl and 23cl of lemonade, so the likelihood of any given person getting an amount in that range is 45.14%.
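The same calculation can be sketched with `statistics.NormalDist`, again assuming the amounts of lemonade are normally distributed. Instead of doubling a table value, it subtracts the cumulative probabilities at 17cl and 23cl.

```python
from statistics import NormalDist

lemonade = NormalDist(mu=20, sigma=5)    # amounts in cl, assumed normal

# P(17 < X < 23) = F(23) - F(17), where F is the cumulative distribution function
p = lemonade.cdf(23) - lemonade.cdf(17)
print(f"{p:.4f}")
```

This prints 0.4515 rather than 0.4514 because the table value 0.2257 was rounded before being doubled; the two agree to three decimal places.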

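z-scores aren't only useful for finding the proportion included in a region; they can be used to find excluded regions too. That may sound quite complex, but it's easy to visualise: picture the bell curve with the tail at one end cut off, and the task is to find the value where that cut-off region begins.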


We need some context once again to work this out. Let's say that a teacher has said that to achieve an A* on a test, students must score in the top 10%. The mean score for the test was 75 and the standard deviation was 5. We can now work out what score is needed to get an A*. The whole region to the right of the mean makes up 50%, and we want to exclude the top 10%, so we need the z-score whose area from the mean is 40%, i.e. 0.4. The value 0.4 doesn't appear in the table (remember that this time we know the size of the region, so we search the values in the body of the table for the closest match and read off the corresponding z-score). The closest is 0.3997, which gives a z-score of 1.28. Now we need to refer back to the z-score formula. We know z, we know the standard deviation and we know the mean; we are trying to work out Xi. So we plug in the numbers we have and rearrange to find Xi: 1.28 = (Xi - 75) / 5, so Xi - 75 = 6.4, and Xi = 81.4. There we have it, the answer. If marks are whole numbers, a student would need 82 to be in the top 10% of the class and get an A* (81 falls just below the cut-off). Simples!
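Here is a Python sketch of the A* calculation, again assuming normally distributed scores. It reproduces the table-based cut-off and compares it with the exact z-score from `NormalDist().inv_cdf`, which finds the value with 90% of the area below it. The constant names are illustrative.

```python
import math
from statistics import NormalDist

MEAN, SD = 75, 5
TOP_FRACTION = 0.10      # top 10% of scores earn an A*

# Table approach from the post: z = 1.28, then rearrange z = (Xi - mean) / SD
cutoff_table = MEAN + 1.28 * SD

# Exact approach: the z-score with 1 - 0.10 = 90% of the area below it
z_exact = NormalDist().inv_cdf(1 - TOP_FRACTION)
cutoff_exact = MEAN + z_exact * SD

print(f"table cut-off: {cutoff_table:.1f}")
print(f"exact z: {z_exact:.4f}, exact cut-off: {cutoff_exact:.2f}")
print(f"whole-number mark needed: {math.ceil(cutoff_exact)}")
```

The exact z-score is about 1.2816, so the table's 1.28 is a very good approximation; either way the cut-off is a little above 81.4, which rounds up to 82 for whole-number marks.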

That's it from me. z-scores are a fairly complex topic, so feel free to ask any questions if I haven't been entirely clear in the explanation. Good luck!

Sam.

