Window Functions Distribution CUME_DIST · cumulative · percentile

CUME_DIST Window Function

The fraction of rows at or below the current row, in (0, 1]. Different formula from PERCENT_RANK — pick the one that matches your question. Standard tool for top-X-percent reports.

What CUME_DIST does

CUME_DIST() — short for "cumulative distribution" — returns the fraction of rows in the partition that are at or below the current row in the ordering. The result is a number greater than 0 and less than or equal to 1.

Concretely: if 3 out of 10 rows in your partition have a value at or below the current row, CUME_DIST returns 0.3. The bottom row gets 1/N; the top row always gets 1.

Syntax

CUME_DIST() OVER (
[PARTITION BY column_list]
ORDER BY column_list [ASC | DESC]
)

Demo dataset

SQL — Create & insert

CREATE TABLE employees (
    year    INT NOT NULL,
    name    VARCHAR(20) NOT NULL,
    amount  NUMERIC NOT NULL
);

INSERT INTO employees VALUES
(2018, 'Jack',    15000),
(2018, 'Jane',     9000),
(2018, 'John',     8000),
(2018, 'Stephen',  7000),
(2018, 'Harold',   6000),
(2019, 'Yin',     20000),
(2019, 'Jack',    18000),
(2019, 'Jane',    16000),
(2019, 'John',    14000),
(2019, 'Harold',  12000);

Example 1 — cumulative distribution within 2018

SQL

SELECT name, year, amount,
       CUME_DIST() OVER (ORDER BY amount) AS cd
FROM   employees
WHERE  year = 2018;

Result

name	amount	cd
Harold	6000	0.20
Stephen	7000	0.40
John	8000	0.60
Jane	9000	0.80
Jack	15000	1.00

Five rows in 2018, so each row covers 1/5 = 0.20 of the cumulative distribution. The bottom row covers the bottom 20%; the top row covers everything (1.00).

Example 2 — partitioned by year

SQL

SELECT year, name, amount,
       ROUND(CUME_DIST() OVER (
           PARTITION BY year
           ORDER BY amount
       )::numeric, 2) AS cd
FROM   employees
ORDER  BY year, amount;

Each year has its own 0..1 distribution.

Example 3 — find the top 20% of earners per year

SQL

WITH ranked AS (
    SELECT year, name, amount,
           CUME_DIST() OVER (PARTITION BY year ORDER BY amount DESC) AS cd
    FROM   employees
)
SELECT * FROM ranked WHERE cd <= 0.20;

CUME_DIST vs PERCENT_RANK

The two distribution functions answer subtly different questions. Pick the one that matches your interpretation:

For 5 rows ascending	PERCENT_RANK	CUME_DIST
Row 1 (smallest)	0.00	0.20
Row 2	0.25	0.40
Row 3 (median)	0.50	0.60
Row 4	0.75	0.80
Row 5 (largest)	1.00	1.00

PERCENT_RANK: "where am I in the ordering, on a 0..1 scale?" (always 0 for first row).
CUME_DIST: "what fraction of rows are at or below me?" (always 1/N for first row).

Tie behaviour

Tied rows always get the same CUME_DIST value — the cumulative count includes all of them. So three rows tied at the top of the ordering would all return 1.0.

Recap

CUME_DIST() returns the fraction of rows at or below the current row.
Range: greater than 0, up to and including 1.
Bottom row gets 1/N; top row gets 1.
Use it for "top X% / bottom X%" reports.
Differs from PERCENT_RANK at the boundaries — pick the one that matches the question.