The median is the middle value of an ordered dataset. It divides the dataset into two equal halves, with 50% of values below and 50% above. Unlike the mean, it is largely unaffected by extreme values (outliers), making it a robust measure of central tendency, especially for skewed distributions.
| Symbol | Description |
|---|---|
| \[ \tilde{x} \] | Sample Median - The middle value of a sample dataset. |
| \[ \eta \] | Population Median - The true middle value of an entire population. |
| \[ n \] | The total number of values in the dataset. |
| \[ x_{(i)} \] | Order Statistic - The i-th smallest value in an ordered dataset. |
| \[ Q_2 \] | Second Quartile - Another name for the median, representing the 50th percentile. |
| \[ P_{50} \] | 50th Percentile - The value below which 50% of the data falls. |
Picture a dataset as a sequence of values arranged in ascending order. The median is the value that sits in the middle of this sequence. If the sequence has an odd number of values, the median is the single central value. If it has an even number, the median is the average of the two central values, so that half of the values lie on each side of it.
Resistant to Outliers: The median is largely unaffected by extremely large or small values in the dataset, making it a robust measure of central tendency.
Unique Value: For any given dataset, there is one and only one median.
50th Percentile: The median is equivalent to the 50th percentile (P₅₀) and the second quartile (Q₂).
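Minimizes Absolute Deviations: The median minimizes the sum of the absolute differences between itself and each data point. That is, \[ \sum_{i=1}^{n} |x_i - M| \] is at its minimum when M is the median. (When n is even, every value between the two central values attains this minimum; the median is conventionally taken as their average.)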
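The absolute-deviation property can be checked numerically. The following is a minimal sketch, not a proof: the sample values and the helper name `sum_abs_dev` are made up for illustration, and the search only covers a coarse grid of candidate values.

```python
from statistics import mean, median

def sum_abs_dev(data, m):
    """Sum of absolute differences between each data point and m."""
    return sum(abs(x - m) for x in data)

# Hypothetical sample with an odd number of values and one large value.
data = [2, 3, 5, 9, 40]
med = median(data)   # middle value of the sorted sample
avg = mean(data)     # pulled upward by the value 40

# Try every candidate on a grid from 0.0 to 40.0 in steps of 0.1
# and keep the one with the smallest sum of absolute deviations.
candidates = [c / 10 for c in range(0, 401)]
best = min(candidates, key=lambda m: sum_abs_dev(data, m))

print("median:", med, "sum of |x - median|:", sum_abs_dev(data, med))
print("mean:", avg, "sum of |x - mean|:", sum_abs_dev(data, avg))
print("best grid candidate:", best)
```

On this sample the grid search lands on the median, and the sum evaluated at the mean is larger than the sum evaluated at the median.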
| Distribution Shape | Relationship |
|---|---|
| Symmetric (e.g., Normal) | Mean = Median = Mode |
| Positively Skewed (Right Skewed) | Mode < Median < Mean |
| Negatively Skewed (Left Skewed) | Mean < Median < Mode |
The median is found by a procedure rather than derived from a single algebraic formula. The goal is to find the value that splits the ordered data into two halves.
Step 1: Order the Data
Arrange all data points in ascending order, from smallest to largest.
Step 2: Find the Middle Position
Calculate the position (rank) of the median in the ordered list: \[ \text{position} = \frac{n+1}{2} \]
Step 3: Determine the Median Value
Case A: n is odd. The number of data points is odd (e.g., n=7). The position will be a whole number (e.g., (7+1)/2 = 4). The median is the value at this position.
Case B: n is even. The number of data points is even (e.g., n=8). The position will be a decimal (e.g., (8+1)/2 = 4.5). This indicates the median lies between two values. The median is the average of the values at the positions immediately below and above the calculated rank (i.e., the values at positions n/2 and n/2 + 1).
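The three steps above can be sketched in Python as follows. The function name `median_by_rank` and the sample data are illustrative assumptions; in practice the standard library's `statistics.median` performs the same calculation.

```python
def median_by_rank(values):
    """Median via the ordered-position procedure (1-based ranks)."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")

    ordered = sorted(values)      # Step 1: order the data
    n = len(ordered)
    position = (n + 1) / 2        # Step 2: rank of the median (1-based)

    if n % 2 == 1:
        # Case A: position is a whole number; convert to a 0-based index.
        return ordered[int(position) - 1]

    # Case B: position ends in .5; average the values at
    # positions n/2 and n/2 + 1 (0-based indices n/2 - 1 and n/2).
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

# Hypothetical samples.
odd_sample = [12, 3, 7, 9, 5, 21, 15]       # n = 7, position 4
even_sample = [12, 3, 7, 9, 5, 21, 15, 8]   # n = 8, position 4.5

print(median_by_rank(odd_sample))    # 9
print(median_by_rank(even_sample))   # 8.5
```

Note that the rank from Step 2 is 1-based, while Python lists are 0-based, hence the `- 1` when indexing.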
Economics and Real Estate: Median household income and median home prices are standard metrics. They provide a more accurate representation of the typical person's financial situation or the typical house value than the mean, which can be heavily skewed by a few extremely high values (e.g., billionaires or mansions).
Survey Research and Polling: When analyzing survey data on rating scales (e.g., 1 to 5), the median is often preferred. It represents the central response without being distorted by a few extreme opinions at either end of the scale.
Performance Metrics in Technology: In system monitoring, median response time (e.g., for a website to load) is a key performance indicator (KPI). It reflects the typical user experience, ignoring rare but extreme loading times caused by temporary glitches that would skew the average.
Medical and Biological Studies: When studying survival times for patients or reaction times in experiments, distributions are often skewed. The median provides a better measure of the central tendency for these types of data.
City Planning and Housing Policy
Urban planners analyze median home prices to assess housing affordability. This measure gives a realistic view of the market for the typical family, helping to shape policies on zoning, development, and subsidies without being distorted by the sale of a few luxury penthouses.
E-commerce Website Analytics
An online retailer monitors the median time visitors spend on a product page. This helps them understand typical customer engagement. Unlike the average time, the median isn't skewed by a few users who leave the tab open for hours, providing a more reliable metric for A/B testing page designs.
Environmental Science
When measuring pollutant levels in a river, scientists often report the median concentration from multiple samples. This prevents a single, anomalous reading—perhaps caused by an instrument error or a sudden local discharge—from misrepresenting the river's overall water quality.
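The effect of a single anomalous reading, as in the river example, can be illustrated with a few lines of Python. The concentrations below are invented for illustration (assumed to be in mg/L) and do not come from any real measurement.

```python
from statistics import mean, median

# Hypothetical pollutant concentrations from five samples (mg/L).
readings = [4.1, 3.9, 4.3, 4.0, 4.2]

# The same samples plus one anomalous reading, e.g. an instrument error.
with_anomaly = readings + [40.0]

print("mean without anomaly:  ", round(mean(readings), 2))
print("median without anomaly:", round(median(readings), 2))
print("mean with anomaly:     ", round(mean(with_anomaly), 2))
print("median with anomaly:   ", round(median(with_anomaly), 2))
```

In this sketch the single extreme reading more than doubles the mean, while the median moves only slightly, from the third ordered value to the average of the third and fourth.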
The relationship between the median, mean, and mode can reveal the skewness of a dataset's distribution. This is crucial for understanding the underlying pattern of the data.
| Distribution Type | Description | Relationship |
|---|---|---|
| Symmetric | Data is evenly distributed around the center. The histogram is bell-shaped or similarly balanced. | \[ \text{Mean} \approx \text{Median} \approx \text{Mode} \] |
| Positively Skewed (Right Tail) | A long tail of high values pulls the mean to the right. Most data is clustered on the left. | \[ \text{Mode} < \text{Median} < \text{Mean} \] |
| Negatively Skewed (Left Tail) | A long tail of low values pulls the mean to the left. Most data is clustered on the right. | \[ \text{Mean} < \text{Median} < \text{Mode} \] |
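The relationships in the table can be turned into a rough check for the direction of skew. This is only a heuristic sketch: the function name `skew_hint` and both datasets are hypothetical, and comparing the mean with the median is a rule of thumb that can fail for some distributions (for example, multimodal ones).

```python
from statistics import mean, median, mode

def skew_hint(data):
    """Guess the skew direction by comparing the mean with the median."""
    m, med = mean(data), median(data)
    if m > med:
        return "right"      # long tail of high values pulls the mean up
    if m < med:
        return "left"       # long tail of low values pulls the mean down
    return "symmetric"

# Hypothetical datasets: the second mirrors the first (each value is 21 - x).
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 20]
left_skewed = [21 - x for x in right_skewed]

for name, data in [("right_skewed", right_skewed), ("left_skewed", left_skewed)]:
    print(name, "mode:", mode(data), "median:", median(data),
          "mean:", mean(data), "hint:", skew_hint(data))
```

For the first dataset the ordering is mode < median < mean, and for the mirrored dataset it is mean < median < mode, matching the table.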
Forgetting to Order the Data: The most common mistake is calculating the median from an unordered list. The data MUST be sorted in ascending or descending order before finding the middle value.
Incorrectly Handling Even-Sized Datasets: When there is an even number of data points, students sometimes pick one of the two middle numbers instead of calculating their average. Remember to find the mean of the two central values.
Confusing the Position with the Value: The formula (n+1)/2 gives you the *position* of the median in the ordered list, not the median itself. Once you find the position, you must go to the dataset to find the value at that location.
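The three mistakes above can be seen side by side on a small, made-up dataset. The variable names are purely illustrative.

```python
from statistics import median

data = [7, 1, 9, 3, 5, 11]   # hypothetical, unsorted, n = 6
n = len(data)

# Mistake 1: taking the middle element of the unsorted list.
unsorted_middle = data[n // 2]

# Mistake 2: sorting, but picking one of the two central values.
ordered = sorted(data)        # [1, 3, 5, 7, 9, 11]
picked_one = ordered[n // 2]

# Mistake 3: reporting the position as if it were the median.
position = (n + 1) / 2        # 3.5 is a rank, not a data value

# Correct: average the values at positions n/2 and n/2 + 1.
correct = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(unsorted_middle, picked_one, position, correct)   # 3 7 3.5 6.0
print(median(data))                                      # 6.0
```

Only the last calculation agrees with `statistics.median`, which sorts its input internally.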