Median – Middle Value in a Dataset

Understand how to find the median, or middle value, of a sorted dataset. Includes examples and formulas.
🔑

Definition

The Median is the middle value of an ordered dataset. It divides the dataset into two halves: at least 50% of the values are less than or equal to it, and at least 50% are greater than or equal to it. Unlike the mean, it is largely unaffected by extreme values (outliers). This makes it a robust measure of central tendency and especially useful for skewed distributions.

Symbol | Description
--- | ---
\( \tilde{x} \) | Sample Median - The middle value of a sample dataset.
\( \eta \) | Population Median - The true middle value of an entire population.
\( n \) | The total number of values in the dataset.
\( x_{(i)} \) | Order Statistic - The i-th smallest value in an ordered dataset.
\( Q_2 \) | Second Quartile - Another name for the median, representing the 50th percentile.
\( P_{50} \) | 50th Percentile - The value below which 50% of the data falls.
🔢

Key Formulas

\[ \text{For an odd number of values (n): } \text{Median} = x_{\frac{n+1}{2}} \]
Median for Odd-Sized Dataset
\[ \text{For an even number of values (n): } \text{Median} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} \]
Median for Even-Sized Dataset
\[ \text{Where } x_1 \leq x_2 \leq x_3 \leq \ldots \leq x_n \text{ (ordered data)} \]
Data Prerequisite
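\[ \text{Median} = L + \frac{\frac{N}{2} - CF}{f} \times h \]
Median for Grouped Frequency Data
Here L is the lower boundary of the median class (the first class whose cumulative frequency reaches N/2), N is the total frequency, CF is the cumulative frequency of the classes before the median class, f is the frequency of the median class, and h is the class width.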
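The grouped-data formula can be sketched in Python as below. The function name `grouped_median`, the representation of classes as (lower, upper, frequency) tuples of equal width, and the frequency table are hypothetical, chosen only for illustration. The sketch takes the median class to be the first class whose cumulative frequency reaches N/2.

```python
from typing import List, Tuple

def grouped_median(classes: List[Tuple[float, float, int]]) -> float:
    """Estimate the median of grouped data as L + ((N/2 - CF) / f) * h.

    `classes` is a hypothetical list of (lower_boundary, upper_boundary, frequency)
    tuples, sorted by boundary and assumed to be contiguous.
    """
    n_total = sum(freq for _, _, freq in classes)  # N: total frequency
    if n_total == 0:
        raise ValueError("total frequency must be positive")
    half = n_total / 2
    cumulative = 0  # CF: cumulative frequency of the classes before the current one
    for lower, upper, freq in classes:
        if cumulative + freq >= half:  # first class whose cumulative frequency reaches N/2
            h = upper - lower          # class width
            return lower + (half - cumulative) / freq * h
        cumulative += freq
    raise RuntimeError("unreachable: the last class always reaches N/2")

# Hypothetical frequency table for classes 0-10, 10-20, 20-30, 30-40, 40-50
table = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 10), (40, 50, 5)]
print(round(grouped_median(table), 2))  # 25.83
```

Here N = 40, so N/2 = 20; the cumulative frequencies are 5, 13, 25, ..., so the median class is 20-30 with L = 20, CF = 13, f = 12 and h = 10, giving 20 + (7/12) × 10 ≈ 25.83.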
📊

Conceptual Diagram

[Figure: the sorted values 12, 18, 24, 31, 38, 45, 52, 60, 67, with an arrow marking the median, 38, the 5th of 9 values.]
Median: sort all values and pick the middle one. Unlike the mean, it is unaffected by extreme outliers.

Picture a dataset as a sequence of values arranged in ascending order. The median is the value that sits in the middle of this sequence. If the sequence has an odd number of values, the median is the single central value. If it has an even number, the median is the average of the two central values, splitting the dataset into two equal halves.

⚖️

Properties

Resistant to Outliers: The median depends only on the middle value(s) of the ordered data, so making the largest or smallest values more extreme does not change it. This makes it a robust measure of central tendency.

Unique Value: Under the convention used here (averaging the two middle values when n is even), every dataset has exactly one median.

50th Percentile: The median is equivalent to the 50th percentile (P₅₀) and the second quartile (Q₂).

Minimizes Absolute Deviations: The sum of the absolute differences between each data point and a value M, \( \sum_{i=1}^{n} |x_i - M| \), is smallest when M is the median. When n is even, every M between the two middle values gives the same minimum, and the median is one of them.
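The minimizing property can be checked numerically. This sketch uses the worked-example data and a hypothetical helper `sum_abs_dev`; it scans integer candidate values of M (enough here, because for integer data the sum is smallest at one of the data points) and compares the result with the median.

```python
import statistics

def sum_abs_dev(data, m):
    """Hypothetical helper: sum of |x_i - m| over the dataset."""
    return sum(abs(x - m) for x in data)

# Odd n: scan integer candidates from min(data) to max(data)
data = [3, 7, 9, 12, 15]
best = min(range(min(data), max(data) + 1), key=lambda m: sum_abs_dev(data, m))
print(best, statistics.median(data))                # 9 9
print(sum_abs_dev(data, 9), sum_abs_dev(data, 10))  # 17 18

# Even n: every M between the two middle values (10 and 14) gives the same sum
even = [4, 6, 10, 14, 18, 22]
print([sum_abs_dev(even, m) for m in (9, 10, 12, 14, 15)])  # [36, 34, 34, 34, 36]
```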

Distribution Shape | Typical Relationship
--- | ---
Symmetric (e.g., Normal) | Mean = Median (= Mode if unimodal)
Positively Skewed (Right Skewed) | Mode < Median < Mean
Negatively Skewed (Left Skewed) | Mean < Median < Mode
📝

Derivation

The median is found by a procedure rather than derived from a single algebraic formula. The goal is to find the value that splits the ordered data into two halves.

Step 1: Order the Data
Arrange all data points in ascending order, from smallest to largest.

\[ x_{(1)}, x_{(2)}, x_{(3)}, \ldots, x_{(n)} \]

Step 2: Find the Middle Position
Calculate the position of the median using the formula for the rank.

\[ \text{Position} = \frac{n+1}{2} \]

Step 3: Determine the Median Value
Case A: n is odd. The number of data points is odd (e.g., n=7). The position will be a whole number (e.g., (7+1)/2 = 4). The median is the value at this position.

\[ \text{Median} = x_{(\frac{n+1}{2})} \]

Case B: n is even. The number of data points is even (e.g., n=8). The position will be a decimal (e.g., (8+1)/2 = 4.5). This indicates the median lies between two values. The median is the average of the values at the positions immediately below and above the calculated rank (i.e., the values at positions n/2 and n/2 + 1).

\[ \text{Median} = \frac{x_{(\frac{n}{2})} + x_{(\frac{n}{2}+1)}}{2} \]
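The three steps translate directly into code. Below is a minimal Python sketch, assuming the input is a non-empty list of numbers; the function name `median` is our own, and for everyday use Python's standard library already provides `statistics.median`.

```python
def median(values):
    """Return the median using the sort / position / odd-even procedure."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    ordered = sorted(values)          # Step 1: order the data
    n = len(ordered)
    if n % 2 == 1:                    # Case A: n odd, so (n+1)/2 is a whole number
        position = (n + 1) // 2       # 1-based rank from Step 2
        return ordered[position - 1]  # convert to a 0-based index
    # Case B: n even, average the values at ranks n/2 and n/2 + 1
    lower, upper = ordered[n // 2 - 1], ordered[n // 2]
    return (lower + upper) / 2

print(median([9, 3, 15, 7, 12]))       # 9
print(median([10, 4, 18, 6, 22, 14]))  # 12.0
```

Working with 1-based ranks and converting to 0-based indices only at the lookup keeps the code aligned with the formulas above.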
🧮

Worked Examples

Find the median of the dataset: {9, 3, 15, 7, 12}.
  1. First, arrange the dataset in ascending order: {3, 7, 9, 12, 15}.
  2. Count the number of values, n. Here, n = 5 (an odd number).
  3. Find the position of the median: Position = (n + 1) / 2 = (5 + 1) / 2 = 3.
  4. The median is the value at the 3rd position in the ordered list.
The median is 9.
Find the median of the dataset: {10, 4, 18, 6, 22, 14}.
  1. First, arrange the dataset in ascending order: {4, 6, 10, 14, 18, 22}.
  2. Count the number of values, n. Here, n = 6 (an even number).
  3. The median is the average of the two middle values. The positions are n/2 = 6/2 = 3rd and (n/2)+1 = 4th.
  4. The values at the 3rd and 4th positions are 10 and 14.
  5. Calculate the average of these two values: (10 + 14) / 2 = 24 / 2 = 12.
The median is 12.
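Both worked examples can be cross-checked with Python's built-in `statistics` module, whose `median` function sorts the data and averages the two middle values when n is even.

```python
import statistics

example_1 = [9, 3, 15, 7, 12]       # n = 5 (odd)
example_2 = [10, 4, 18, 6, 22, 14]  # n = 6 (even)

print(statistics.median(example_1))  # 9
print(statistics.median(example_2))  # 12.0
```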

🚀

Applications

Economics and Real Estate: Median household income and median home prices are standard metrics. They provide a more accurate representation of the typical person's financial situation or the typical house value than the mean, which can be heavily skewed by a few extremely high values (e.g., billionaires or mansions).

Survey Research and Polling: When analyzing survey data on rating scales (e.g., 1 to 5), the median is often preferred. It represents the central response without being distorted by a few extreme opinions at either end of the scale.

Performance Metrics in Technology: In system monitoring, median response time (e.g., for a website to load) is a key performance indicator (KPI). It reflects the typical user experience and is not pulled upward by rare but extreme loading times, such as those caused by temporary glitches, that would skew the average.

Medical and Biological Studies: When studying survival times for patients or reaction times in experiments, distributions are often skewed. The median provides a better measure of the central tendency for these types of data.

🌍

Real-World Examples

A coffee shop tracks the number of customers each hour for one morning. The counts are: 15, 22, 18, 45, 21, 19, 25. What is the median number of customers per hour?
  1. Order the customer counts: {15, 18, 19, 21, 22, 25, 45}.
  2. There are n = 7 data points (an odd number).
  3. The median is the middle value, which is at position (7 + 1) / 2 = 4.
  4. The 4th value in the ordered list is 21.
The median number of customers per hour is 21.
The salaries for six new employees at a startup are: $65,000, $70,000, $72,000, $80,000, $85,000, and $150,000. Find the median salary to represent the typical starting pay.
  1. The salaries are already in order.
  2. There are n = 6 data points (an even number). The $150,000 salary is an outlier.
  3. The median is the average of the two middle values, at positions 6/2 = 3rd and (6/2)+1 = 4th.
  4. The 3rd and 4th salaries are $72,000 and $80,000.
  5. Average them: ($72,000 + $80,000) / 2 = $152,000 / 2 = $76,000.
The median starting salary is $76,000.
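The two examples can be reproduced with the `statistics` module. Printing the mean of the salaries as well (a comparison the examples above do not make) shows how the $150,000 outlier pulls the mean up while the median stays near the middle of the group.

```python
import statistics

customers = [15, 22, 18, 45, 21, 19, 25]
salaries = [65_000, 70_000, 72_000, 80_000, 85_000, 150_000]

print(statistics.median(customers))  # 21
print(statistics.median(salaries))   # 76000.0
print(statistics.mean(salaries))     # 87000
```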
🏙️

Real-World Scenarios

House Prices
[Figure: house prices of $180k, $220k, $250k, $280k and $920k, with median = $250k and mean = $370k.]
One luxury mansion skews the mean upward to $370k, but the median stays at $250k, the price a typical buyer actually pays.

Income Reports
[Figure: an income distribution with the median and mean marked; the median is the better summary for skewed data.]
Government income reports use the median because a small number of billionaires pulls the mean far above what most households actually earn.

Sports Analytics
[Figure: player salaries ($M); one star skews the mean to $6.9M.]
In a squad where one star earns $32M, the median salary ($3M) is a fairer measure of what a "typical" player earns than the mean.
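The house-price scenario also illustrates the median's resistance to outliers. In this sketch the prices are in thousands of dollars; the second list, in which the mansion's price is raised to a hypothetical $5,000k, is our own addition to show that the median does not move while the mean does.

```python
import statistics

prices = [180, 220, 250, 280, 920]         # $k, from the house-price scenario
more_extreme = [180, 220, 250, 280, 5000]  # hypothetical: mansion price raised

for data in (prices, more_extreme):
    print(statistics.median(data), statistics.mean(data))
# 250 370
# 250 1186
```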

City Planning and Housing Policy
Urban planners analyze median home prices to assess housing affordability. This measure gives a realistic view of the market for a typical family, helping to shape policies on zoning, development, and subsidies without being distorted by the sale of a few luxury penthouses.

E-commerce Website Analytics
An online retailer monitors the median time visitors spend on a product page. This helps them understand typical customer engagement. Unlike the average time, the median isn't skewed by a few users who leave the tab open for hours, providing a more reliable metric for A/B testing page designs.

Environmental Science
When measuring pollutant levels in a river, scientists often report the median concentration from multiple samples. This prevents a single, anomalous reading—perhaps caused by an instrument error or a sudden local discharge—from misrepresenting the river's overall water quality.

📉

Median in Different Distributions

Comparing the median with the mean and mode can indicate the skewness of a dataset's distribution, which helps in understanding the shape of the data. The orderings below are a rule of thumb: they hold for many common distributions but are not guaranteed for every dataset.

Distribution Type | Description | Relationship
--- | --- | ---
Symmetric | Data is evenly distributed around the center. The histogram is bell-shaped or similarly balanced. | \( \text{Mean} \approx \text{Median} \approx \text{Mode} \)
Positively Skewed (Right Tail) | A long tail of high values pulls the mean to the right. Most data is clustered on the left. | \( \text{Mode} < \text{Median} < \text{Mean} \)
Negatively Skewed (Left Tail) | A long tail of low values pulls the mean to the left. Most data is clustered on the right. | \( \text{Mean} < \text{Median} < \text{Mode} \)
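A small right-skewed sample makes the ordering concrete. The data below are hypothetical, chosen so that a few large values form a right tail.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, two large values form the tail
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # average of the 5th and 6th ordered values
mean = statistics.mean(data)      # pulled toward the tail

print(mode, median, mean)         # 2 3.0 4.6
```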
⚠️

Common Mistakes

⚠️ Forgetting to Order the Data: A frequent mistake is calculating the median from an unordered list. The data MUST be sorted in ascending or descending order before finding the middle value.
⚠️ Incorrectly Handling Even-Sized Datasets: When there is an even number of data points, students sometimes pick one of the two middle numbers instead of calculating their average. Remember to find the mean of the two central values.
💡 Confusing the Position with the Value: The formula (n+1)/2 gives you the *position* of the median in the ordered list, not the median itself. Once you find the position, you must go to the dataset to find the value at that location.
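The first two mistakes are easy to reproduce. This sketch (variable names are illustrative) takes the middle element of an unsorted list, then picks a single middle element of a sorted even-sized list without averaging, and compares both with the correct result from `statistics.median`.

```python
import statistics

unsorted = [9, 3, 15, 7, 12]
print(unsorted[len(unsorted) // 2])  # 15  (wrong: data not sorted)
print(statistics.median(unsorted))   # 9   (correct)

even = sorted([10, 4, 18, 6, 22, 14])  # [4, 6, 10, 14, 18, 22]
print(even[len(even) // 2])            # 14  (wrong: picks one middle value)
print(statistics.median(even))         # 12.0 (correct: average of 10 and 14)
```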
🚀

Study Strategy

1 📚 Grasp the Core Concept
  • Review the definition of the median as the middle value that separates a dataset into two equal halves.
  • Use the 'Conceptual Diagram' to visualize how the median physically sits at the 50% point of the data.
  • Compare the median with the mean, focusing on the 'Properties' section to understand why the median is resistant to outliers.
  • Read 'Median in Different Distributions' to see how its position changes in symmetrical versus skewed data.
2 🧠 Memorize the Calculation Rules
  • Commit to memory the rule for an odd number of observations: Median is the ((n+1)/2)th value.
  • Learn the rule for an even number of observations: Median is the average of the (n/2)th and ((n/2)+1)th values.
  • Write out and understand each variable in the formula for grouped data: L + [((N/2) - CF) / f] * h.
  • Create flashcards for each formula variation (odd, even, grouped) and practice reciting them daily.
3 ✏️ Practice with Guided Problems
  • Follow each 'Worked Example' step-by-step, ensuring you correctly sort the data before finding the middle value.
  • Cover the solutions and re-solve the problems yourself to test your recall and application of the formulas.
  • Analyze the 'Common Mistakes' section, focusing on errors like forgetting to sort the data or forgetting to average the two middle values for even datasets.
  • Complete practice problems for both ungrouped and grouped data to solidify your calculation skills.
4 🌍 Apply to Real-World Scenarios
  • Analyze the 'Real-World Examples', such as median income or house prices, and explain why the median is the preferred statistic.
  • Work through the 'Real-World Scenarios' section, explaining for each case why the median is preferred over the mean.
  • Find a simple, real dataset (e.g., local temperatures for a month) and calculate the median to interpret its meaning.
  • Consider the 'Related Formulas' section and explain how the median (the 2nd quartile) relates to the overall data spread.
By systematically building from concept to application, you can confidently master the median and its statistical power.

Frequently Asked Questions
