CUME_DIST Window Function
The fraction of rows at or below the current row, in (0, 1]. Different formula from PERCENT_RANK — pick the one that matches your question. Standard tool for top-X-percent reports.
CUME_DIST() — short for "cumulative distribution" — returns the fraction of rows in the partition that are at or below the current row in the ordering. The result is a number greater than 0 and less than or equal to 1.
Concretely: if 3 out of 10 rows in your partition have a value at or below the current row, CUME_DIST returns 0.3. The bottom row gets 1/N; the top row always gets 1.
[PARTITION BY column_list]
ORDER BY column_list [ASC | DESC]
)
CREATE TABLE employees (
year INT NOT NULL,
name VARCHAR(20) NOT NULL,
amount NUMERIC NOT NULL
);
INSERT INTO employees VALUES
(2018, 'Jack', 15000),
(2018, 'Jane', 9000),
(2018, 'John', 8000),
(2018, 'Stephen', 7000),
(2018, 'Harold', 6000),
(2019, 'Yin', 20000),
(2019, 'Jack', 18000),
(2019, 'Jane', 16000),
(2019, 'John', 14000),
(2019, 'Harold', 12000);
SELECT name, year, amount,
CUME_DIST() OVER (ORDER BY amount) AS cd
FROM employees
WHERE year = 2018;
| name | amount | cd |
|---|---|---|
| Harold | 6000 | 0.20 |
| Stephen | 7000 | 0.40 |
| John | 8000 | 0.60 |
| Jane | 9000 | 0.80 |
| Jack | 15000 | 1.00 |
Five rows in 2018, so each row covers 1/5 = 0.20 of the cumulative distribution. The bottom row covers the bottom 20%; the top row covers everything (1.00).
SELECT year, name, amount,
ROUND(CUME_DIST() OVER (
PARTITION BY year
ORDER BY amount
)::numeric, 2) AS cd
FROM employees
ORDER BY year, amount;
Each year has its own 0..1 distribution.
WITH ranked AS (
SELECT year, name, amount,
CUME_DIST() OVER (PARTITION BY year ORDER BY amount DESC) AS cd
FROM employees
)
SELECT * FROM ranked WHERE cd <= 0.20;
The two distribution functions answer subtly different questions. Pick the one that matches your interpretation:
| For 5 rows ascending | PERCENT_RANK | CUME_DIST |
|---|---|---|
| Row 1 (smallest) | 0.00 | 0.20 |
| Row 2 | 0.25 | 0.40 |
| Row 3 (median) | 0.50 | 0.60 |
| Row 4 | 0.75 | 0.80 |
| Row 5 (largest) | 1.00 | 1.00 |
- PERCENT_RANK: "where am I in the ordering, on a 0..1 scale?" (always 0 for first row).
- CUME_DIST: "what fraction of rows are at or below me?" (always 1/N for first row).
Tied rows always get the same CUME_DIST value — the cumulative count includes all of them. So three rows tied at the top of the ordering would all return 1.0.
CUME_DIST()returns the fraction of rows at or below the current row.- Range: greater than 0, up to and including 1.
- Bottom row gets
1/N; top row gets 1. - Use it for "top X% / bottom X%" reports.
- Differs from
PERCENT_RANKat the boundaries — pick the one that matches the question.