Variance – Measure of Data Dispersion

Calculate variance to understand how data points deviate from the mean. Includes formulas for population and sample vari...
📏

What is Variance?

Variance measures how much the values in a data set deviate from the mean. It is a key concept in probability and statistics that quantifies data dispersion. Variance calculates the average of the squared differences from the mean, providing a measure of the data's 'spread'. A small variance indicates that the data points tend to be very close to the mean (expected value), while a large variance indicates that the data points are spread out over a wider range of values.

SymbolTermDescription
σ²Population VarianceThe true variance of the entire population.
Sample VarianceAn estimate of the population variance, calculated from a sample of data.
μPopulation MeanThe average of all values in the entire population.
Sample MeanThe average of all values in a sample.
xᵢIndividual Data PointA single observation or measurement in the dataset.
NPopulation SizeThe total number of items in the population.
nSample SizeThe number of items in the sample.
n-1Degrees of FreedomUsed in sample variance calculation (Bessel's correction) to provide an unbiased estimate of the population variance.
🔑

Key Formulas

\[ \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 \]
Population Variance
\[ s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \]
Sample Variance (Unbiased Estimator)
\[ \sigma^2 = E[X^2] - (E[X])^2 \]
Computational Formula
📊

Visualizing Variance

30 50 20 20 40 15 15 σ² = Σ(xᵢ − x̄)² / n Each red line = deviation from mean (xᵢ − x̄) Variance = average of squared deviations
Variance σ²: each red line shows a point's deviation from the mean — variance averages the squared deviations

Variance does not represent a physical shape. Instead, it can be visualized on a number line or a histogram. A dataset with low variance will have its data points tightly clustered around the mean (μ or x̄). A dataset with high variance will have its points spread far out from the mean, indicating greater variability.

⚖️

Properties of Variance

Variance has several fundamental mathematical properties:

\[ \text{Var}(X) \geq 0 \]
Non-Negativity: Variance is always greater than or equal to zero. It is zero only if all data points are identical.
\[ \text{Var}(aX + b) = a^2 \text{Var}(X) \]
Linear Transformation: Adding a constant (b) does not change the variance. Multiplying by a constant (a) scales the variance by a².
\[ \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) \]
Sum of Independent Variables: If X and Y are independent random variables, the variance of their sum is the sum of their variances.
💡

Proof of the Computational Formula

The computational formula, Var(X) = E[X²] - (E[X])², is often easier to calculate than the definitional formula. Here is its derivation, starting from the definition of variance where μ = E[X].

\[ \text{Var}(X) = E[(X - \mu)^2] \]
1. Start with the definition of variance.

2. Expand the squared term inside the expectation.

\[ \text{Var}(X) = E[X^2 - 2X\mu + \mu^2] \]

3. Apply the linearity of expectation, which states that E[A + B] = E[A] + E[B].

\[ \text{Var}(X) = E[X^2] - E[2X\mu] + E[\mu^2] \]

4. Since μ and 2 are constants, they can be factored out of the expectation. The expected value of a constant (μ²) is the constant itself.

\[ \text{Var}(X) = E[X^2] - 2\mu E[X] + \mu^2 \]

5. Substitute E[X] with μ.

\[ \text{Var}(X) = E[X^2] - 2\mu(\mu) + \mu^2 = E[X^2] - 2\mu^2 + \mu^2 \]

6. Combine the terms to arrive at the final computational formula.

\[ \text{Var}(X) = E[X^2] - \mu^2 = E[X^2] - (E[X])^2 \]
Q.E.D.
🧮

Worked Example

Given the sample data set {2, 4, 4, 4, 5, 5, 7, 9}, calculate the sample variance (s²).
  1. First, calculate the sample mean (x̄): <br> x̄ = (2 + 4 + 4 + 4 + 5 + 5 + 7 + 9) / 8 = 40 / 8 = 5.
  2. Next, calculate the squared difference from the mean for each data point: <br> (2-5)² = (-3)² = 9 <br> (4-5)² = (-1)² = 1 <br> (4-5)² = (-1)² = 1 <br> (4-5)² = (-1)² = 1 <br> (5-5)² = (0)² = 0 <br> (5-5)² = (0)² = 0 <br> (7-5)² = (2)² = 4 <br> (9-5)² = (4)² = 16
  3. Sum the squared differences: <br> Σ(xᵢ - x̄)² = 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32.
  4. Finally, divide by the degrees of freedom (n-1), where n=8: <br> s² = 32 / (8 - 1) = 32 / 7 ≈ 4.571.
The sample variance (s²) is approximately 4.571.
🧮

Try It

🚀

Applications

Finance & Investment: Variance is a primary measure of risk. It quantifies the volatility of an asset's price. A high variance means the price can swing dramatically, representing higher risk and higher potential return. It is used in portfolio theory to diversify investments and manage risk.

Manufacturing & Quality Control: In industrial processes, variance is used to measure the consistency of a product. For example, a machine that fills bottles with 500ml of liquid must have very low variance. High variance would mean some bottles are overfilled and others underfilled, indicating a problem in the production line.

Scientific Research: In experiments, variance helps scientists determine if their results are statistically significant. Analysis of Variance (ANOVA) is a statistical method used to compare the means of two or more groups to see if there are any significant differences between them, by analyzing their respective variances.

🌍

Real-World Examples

A coffee shop wants to ensure its espresso shots are consistent. A barista pulls 5 shots and measures their volume in milliliters: {28, 31, 30, 29, 32}. Calculate the sample variance to measure the consistency of the espresso machine.
  1. Calculate the mean volume: x̄ = (28+31+30+29+32) / 5 = 150 / 5 = 30 ml.
  2. Calculate the sum of squared deviations: (28-30)² + (31-30)² + (30-30)² + (29-30)² + (32-30)² = (-2)² + 1² + 0² + (-1)² + 2² = 4 + 1 + 0 + 1 + 4 = 10.
  3. Divide by n-1: s² = 10 / (5 - 1) = 10 / 4 = 2.5 ml².
The sample variance of the espresso shots is 2.5 ml².
An urban planner is studying commute times for two different bus routes. Route A's times for 5 trips are {25, 27, 26, 25, 27} minutes. Route B's times are {15, 35, 20, 30, 25} minutes. Find the variance for each route to determine which is more predictable.
  1. <strong>Route A:</strong> Mean = 26 min. Sum of squared deviations = (25-26)² + (27-26)² + (26-26)² + (25-26)² + (27-26)² = 1+1+0+1+1 = 4. Variance s² = 4 / (5-1) = 1 min².
  2. <strong>Route B:</strong> Mean = 25 min. Sum of squared deviations = (15-25)² + (35-25)² + (20-25)² + (30-25)² + (25-25)² = 100+100+25+25+0 = 250. Variance s² = 250 / (5-1) = 62.5 min².
Route A has a much lower variance (1 min²) than Route B (62.5 min²), making it the more predictable and reliable route in terms of commute time.
🏙️

Variance in the Real World

μ Part Diameter (mm) Low variance = tight tolerance ✓
Manufacturing QC
Precision machining requires low variance in part dimensions. High variance means inconsistent products and costly rework or rejection.
Low var (bonds) High var (stocks) σ² measures investment risk
Investment Risk
Portfolio managers compare variance of returns — bonds have low variance (stable) while growth stocks have high variance (volatile but higher potential).
J F M A M J J A S O N D Monthly Temperatures High σ² = large seasonal swing
Climate Science
A city with high temperature variance has extreme seasonal swings — important for urban planning, agriculture, and energy demand forecasting.

Weather Forecasting
Meteorologists use variance to communicate the uncertainty in their forecasts. A forecast might predict an average temperature of 20°C, but a high variance indicates that the actual temperature could vary significantly, leading to a wider possible range (e.g., 15°C to 25°C).

Sports Analytics
The performance consistency of an athlete is often measured by variance. A basketball player might average 20 points per game, but a low variance in their scoring indicates they are a reliable and consistent performer, whereas a high variance suggests their performance is erratic and unpredictable.

Agriculture
Farmers analyze the variance in crop yield across different fields or with different fertilizers. A low variance in yield suggests that the growing conditions or treatments are uniformly effective, while a high variance might indicate soil quality issues or other problems in specific areas.

🗂️

Types and Classifications

The primary distinction in variance calculation is between a population and a sample.

FeaturePopulation Variance (σ²)Sample Variance (s²)
DataUses every member of a defined group.Uses a subset of the population.
Mean UsedPopulation Mean (μ)Sample Mean (x̄)
DenominatorN (total population size)n-1 (degrees of freedom)
PurposeDescribes the true spread of the entire population.Estimates the spread of the population from which the sample was drawn.

Pooled Variance (s²p): This is a weighted average of variances from two or more independent groups, used when it can be assumed the groups have equal variances. It provides a more precise estimate of the common variance and is used in statistical tests like the two-sample t-test.

Variance Decomposition (ANOVA): In Analysis of Variance, the total variance in a dataset is partitioned into different sources. For example, it might be split into 'between-group' variance (differences between the means of groups) and 'within-group' variance (variability inside each group).

⚠️

Common Mistakes

⚠️ Using the wrong denominator. For a sample, you must divide by n-1 (Bessel's correction) to get an unbiased estimate of the population variance. Dividing by n will consistently underestimate the true variance.
⚠️ Forgetting to square the deviations. The sum of the simple deviations from the mean, Σ(xᵢ - x̄), is always zero. Squaring each deviation ensures that all values are positive and that larger deviations have a greater impact on the final result.
💡 Misinterpreting the units. Variance is measured in the square of the original units (e.g., cm², kg², dollars²). To return to the original units for easier interpretation, you must take the square root of the variance to find the standard deviation.
🚀

Study Strategy

1 🧠 Grasp the Core Concept
  • Review the definition of variance as the average of the squared differences from the mean.
  • Articulate why variance is a measure of spread and how it relates to data consistency.
  • Distinguish between population variance (σ²) and sample variance (s²), focusing on the N vs. n-1 denominator.
  • Explain why deviations are squared: to prevent positive and negative deviations from canceling out and to amplify larger deviations.
2 ✍️ Commit Formulas to Memory
  • Write out the definitional formula for population variance: σ² = Σ(xᵢ - μ)² / N.
  • Write out the corresponding formula for sample variance: s² = Σ(xᵢ - x̄)² / (n-1).
  • Memorize the computational (shortcut) formula for variance, which is often faster for calculations.
  • Practice rearranging the formula to solve for other variables, if applicable in your course.
3 🧮 Solve Step-by-Step Problems
  • Follow the provided 'Worked Example' meticulously, recalculating each step yourself.
  • Solve problems with small datasets by hand, first calculating the mean, then squared deviations, then the average.
  • Use both the definitional and computational formulas on the same dataset to confirm you get the same answer.
  • Identify and correct any of the 'Common Mistakes' in your own practice calculations.
4 🌍 Interpret Real-World Scenarios
  • For each 'Real-World Example' on the page, explain what a high vs. low variance signifies in that context.
  • Find a simple dataset (e.g., grades for two different classes) and calculate the variance for each.
  • Compare the two variances you calculated and write a sentence interpreting which dataset has more variability.
  • Explain the relationship between variance and standard deviation, noting how they both measure spread but in different units.
By systematically moving from concept to calculation and then to context, you will build a robust and lasting understanding of variance.

Frequently Asked Questions

×

×