
Sunday 12 March 2017

What is Variance?

The variance (σ²) is a measure of how far each value in the data set is from the mean. Here is how it is calculated:
  1. Subtract the mean from each value in the data. This gives you a measure of the distance of each value from the mean.
  2. Square each of these distances (so that they are all positive values), and add all of the squares together.
  3. Divide the sum of the squares by the number of values in the data set.

You are probably already familiar with the average (mean), the variance and the standard deviation, the three most basic measures in statistics. This post is more about how to teach variance. Let's say there are 8 test scores, the average is 46 and the variance is 16.

What if each test score is doubled? The average? Sure, still easy: it will be 92. How about the variance, or the standard deviation? I am not sure how many people can answer this question right away.
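One way to check the answer is to try it on concrete numbers. The sketch below uses eight made-up scores (four 42s and four 50s). They are an assumption, chosen only because they give a mean of 46 and a variance of 16 when you divide by the number of scores. It uses Python's standard statistics module.

```python
import statistics

# Hypothetical scores, chosen so that the mean is 46 and the
# population variance (dividing by the number of scores) is 16.
scores = [42, 42, 42, 42, 50, 50, 50, 50]
doubled = [2 * s for s in scores]

mean_before = statistics.mean(scores)
var_before = statistics.pvariance(scores)
sd_before = statistics.pstdev(scores)

mean_after = statistics.mean(doubled)
var_after = statistics.pvariance(doubled)
sd_after = statistics.pstdev(doubled)

print("before:", mean_before, var_before, sd_before)
print("after: ", mean_after, var_after, sd_after)
```

Doubling every score doubles each distance from the mean, so each squared distance is multiplied by four. The variance goes from 16 to 64, while the standard deviation only doubles, from 4 to 8.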

In order to write the equation that defines the variance, it is simplest to use the summation operator, "Σ". The summation operator is just a shorthand way to write, "Take the sum of a set of numbers."

Data     X1   X2   X3   X4   X5   X6   X7
Value     3    4    9   13   17   22   23

Think of the variable (X) as the measured quantity from your experiment and think of the subscript as indicating the trial number (1-7). To calculate the average, first we have to add up the values from each of the seven trials. Using the summation operator, we will write it like this:


$$\sum_{i=1}^{7} X_i$$

which is equivalent to:

X1 + X2 + X3 + X4 + X5 + X6 + X7

or:

3 + 4 + 9 + 13 + 17 + 22 + 23
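If it helps to see the summation operator as a procedure, here is a minimal Python sketch that adds the seven trial values one at a time. The variable names are only illustrative.

```python
values = [3, 4, 9, 13, 17, 22, 23]  # X1 ... X7 from the table above

# Spell out the summation as a loop: add each Xi in turn.
total = 0
for x in values:
    total += x

# The built-in sum() does the same thing in one call.
print(total, sum(values))  # prints: 91 91
```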


Defining Variance:

Now that you know how the summation operator works, you can understand the equation that defines the variance:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{\sum X^2}{N} - \mu^2$$

The variance (σ²) is defined as the sum of the squared distances of each term in the distribution from the mean (μ), divided by the number of terms in the distribution (N). Equivalently, you can take the sum of the squares of the terms in the distribution, divide by the number of terms (N), and then subtract the square of the mean (μ²).
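The definition can be written as a short Python function. This is only a sketch: the name population_variance is made up for illustration, and it divides by N, so it treats the values as the whole population.

```python
def population_variance(values):
    """Sum of squared distances from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n                      # the mean
    squared_distances = [(x - mu) ** 2 for x in values]
    return sum(squared_distances) / n         # divide by N

trial_values = [3, 4, 9, 13, 17, 22, 23]
print(round(population_variance(trial_values), 2))  # prints: 56.29
```

Python's standard library offers the same calculation as statistics.pvariance.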

How to do the calculation:

    1)      First, add your data points together:
    3 + 4 + 9 + 13 + 17 + 22 + 23 = 91
    Next, divide your answer by the number of data points: 91 ÷ 7 = 13.
    Sample mean, x̅ = 13.

*You can think of the mean as the "centre-point" of the data. If the data clusters around the mean, variance is low. If it is spread out far from the mean, variance is high.

   2)     Subtract the mean from each data point. Each answer tells you that number's deviation from the mean or, in plain language, how far away it is from the mean.

X1 - x̅ = 3 - 13 = -10
X2 - x̅ = 4 - 13 = -9
X3 - x̅ = 9 - 13 = -4
X4 - x̅ = 13 - 13 = 0
X5 - x̅ = 17 - 13 = 4
X6 - x̅ = 22 - 13 = 9
X7 - x̅ = 23 - 13 = 10

   3)   Right now, the deviations add up to exactly zero, because the negative and positive values cancel out. To solve this problem, find the square of each deviation. This makes every number non-negative, so the negative and positive values no longer cancel out.

(-10)² = 100
(-9)² = 81
(-4)² = 16
0² = 0
4² = 16
9² = 81
10² = 100

   4)    Find the sum of the squared values. Now calculate the entire numerator of the formula, ∑(X - x̅)². The upper-case sigma, “∑”, tells you to sum the value of the following term for each value of X. You have already calculated (X - x̅)² for each value of X in your sample, so all you need to do is add the results together.

   100 + 81 + 16 + 0 + 16 + 81 + 100 = 394.

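    5)      Divide by n - 1, where n is the number of data points. Because these seven values are treated as a sample from a larger population, dividing by “n - 1” instead of “n” gives a better estimate of the variance of that population. The result is the sample variance, written s² to distinguish it from the population variance σ².

There are seven data points in the sample, so n = 7. Sample variance: s² = 394 ÷ 6 ≈ 65.67. (Dividing by 7 instead would give 394 ÷ 7 ≈ 56.29.)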
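The five steps can also be followed in a few lines of Python. This is a sketch that mirrors the hand calculation, with a cross-check against statistics.variance from the standard library, which also divides by n - 1.

```python
import statistics

data = [3, 4, 9, 13, 17, 22, 23]

# Step 1: the mean.
n = len(data)
mean = sum(data) / n

# Step 2: each value's deviation from the mean.
deviations = [x - mean for x in data]

# Step 3: square each deviation.
squared = [d ** 2 for d in deviations]

# Step 4: sum of the squared deviations.
sum_of_squares = sum(squared)

# Step 5: divide by n - 1 to get the sample variance.
sample_variance = sum_of_squares / (n - 1)

print(mean, sum_of_squares, round(sample_variance, 2))  # prints: 13.0 394.0 65.67

# Cross-check against the standard library.
assert abs(sample_variance - statistics.variance(data)) < 1e-9
```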



As an example, consider the following two data sets:

Data set 1: 3, 4, 4, 5, 6, 8, 10.

Data set 2: 1, 2, 4, 5, 7, 9, 11.

What is the variance of each data set above?

Try following the steps above to find the variance of each data set, or construct a table to organise the calculation. You can use the same method for the results of your own experiments.
(Answer, dividing by n - 1: Data set 1: 6.24 and Data set 2: 13.29)

*Although the two data sets have similar means (about 5.71 and 5.57), the variance of the second data set, 13.29, is a little more than two times the variance of the first data set, 6.24.
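To check your answers, a short Python sketch like the one below prints the mean and the sample variance of each data set. It relies on statistics.variance, which divides by n - 1.

```python
import statistics

data_set_1 = [3, 4, 4, 5, 6, 8, 10]
data_set_2 = [1, 2, 4, 5, 7, 9, 11]

for name, data in [("Data set 1", data_set_1), ("Data set 2", data_set_2)]:
    mean = statistics.mean(data)
    s2 = statistics.variance(data)  # sample variance, divides by n - 1
    print(f"{name}: mean = {mean:.2f}, variance = {s2:.2f}")
```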

These steps might be easy for you to memorise, but not necessarily for your students. Any questions? Post them in the comments. We will be happy to help you solve any statistics problem.


By: Nur Fariza




