Window Functions Distribution CUME_DIST · cumulative · percentile

CUME_DIST Window Function

The fraction of rows at or below the current row, in (0, 1]. Different formula from PERCENT_RANK — pick the one that matches your question. Standard tool for top-X-percent reports.

CUME_DIST() — short for "cumulative distribution" — returns the fraction of rows in the partition that are at or below the current row in the ordering. The result is a number greater than 0 and less than or equal to 1.

Concretely: if 3 out of 10 rows in your partition have a value at or below the current row, CUME_DIST returns 0.3. The bottom row gets 1/N; the top row always gets 1.

CUME_DIST() OVER (
    [PARTITION BY column_list]
    ORDER BY column_list [ASC | DESC]
)
SQL — Create & insert
CREATE TABLE employees (
    year    INT NOT NULL,
    name    VARCHAR(20) NOT NULL,
    amount  NUMERIC NOT NULL
);

INSERT INTO employees VALUES
(2018, 'Jack',    15000),
(2018, 'Jane',     9000),
(2018, 'John',     8000),
(2018, 'Stephen',  7000),
(2018, 'Harold',   6000),
(2019, 'Yin',     20000),
(2019, 'Jack',    18000),
(2019, 'Jane',    16000),
(2019, 'John',    14000),
(2019, 'Harold',  12000);
SQL
SELECT name, year, amount,
       CUME_DIST() OVER (ORDER BY amount) AS cd
FROM   employees
WHERE  year = 2018;
Result
nameamountcd
Harold60000.20
Stephen70000.40
John80000.60
Jane90000.80
Jack150001.00

Five rows in 2018, so each row covers 1/5 = 0.20 of the cumulative distribution. The bottom row covers the bottom 20%; the top row covers everything (1.00).

SQL
SELECT year, name, amount,
       ROUND(CUME_DIST() OVER (
           PARTITION BY year
           ORDER BY amount
       )::numeric, 2) AS cd
FROM   employees
ORDER  BY year, amount;

Each year has its own 0..1 distribution.

SQL
WITH ranked AS (
    SELECT year, name, amount,
           CUME_DIST() OVER (PARTITION BY year ORDER BY amount DESC) AS cd
    FROM   employees
)
SELECT * FROM ranked WHERE cd <= 0.20;

The two distribution functions answer subtly different questions. Pick the one that matches your interpretation:

For 5 rows ascendingPERCENT_RANKCUME_DIST
Row 1 (smallest)0.000.20
Row 20.250.40
Row 3 (median)0.500.60
Row 40.750.80
Row 5 (largest)1.001.00
  • PERCENT_RANK: "where am I in the ordering, on a 0..1 scale?" (always 0 for first row).
  • CUME_DIST: "what fraction of rows are at or below me?" (always 1/N for first row).

Tied rows always get the same CUME_DIST value — the cumulative count includes all of them. So three rows tied at the top of the ordering would all return 1.0.

  • CUME_DIST() returns the fraction of rows at or below the current row.
  • Range: greater than 0, up to and including 1.
  • Bottom row gets 1/N; top row gets 1.
  • Use it for "top X% / bottom X%" reports.
  • Differs from PERCENT_RANK at the boundaries — pick the one that matches the question.