Window Functions Distribution PERCENT_RANK · 0..1 · percentile

PERCENT_RANK Window Function

A row's relative position within its partition, scaled to [0, 1]. First row gets 0, last row gets 1, the rest are spaced by (rank-1)/(N-1). The cleanest way to filter top-x-percent of a group.

PERCENT_RANK() calculates a row's relative position within its partition as a number between 0 and 1. The first row always gets 0, the last row always gets 1, and everything in between is spread out proportionally.

The formula it uses is straightforward: (rank − 1) / (total_rows − 1). So if there are 5 rows, the percent ranks are 0.00, 0.25, 0.50, 0.75, 1.00.

PERCENT_RANK() OVER (
    [PARTITION BY column_list]
    ORDER BY column_list [ASC | DESC]
)

Like the ranking functions, PERCENT_RANK() takes no arguments. ORDER BY is required.

SQL — Create & insert
CREATE TABLE employees (
    year    INT NOT NULL,
    name    VARCHAR(20) NOT NULL,
    amount  NUMERIC NOT NULL
);

INSERT INTO employees VALUES
(2018, 'Jack',    15000),
(2018, 'Jane',     9000),
(2018, 'John',     8000),
(2018, 'Stephen',  7000),
(2018, 'Harold',   6000),
(2019, 'Yin',     20000),
(2019, 'Jack',    18000),
(2019, 'Jane',    16000),
(2019, 'John',    14000),
(2019, 'Harold',  12000);
SQL
SELECT name, amount,
       PERCENT_RANK() OVER (ORDER BY amount) AS pct_rank
FROM   employees
WHERE  year = 2018;
Result
nameamountpct_rank
Harold60000.00
Stephen70000.25
John80000.50
Jane90000.75
Jack150001.00

Five rows means four "gaps" between them, so each step is 0.25. Harold (the smallest) gets 0; Jack (the largest) gets 1.

SQL
SELECT year, name, amount,
       ROUND(PERCENT_RANK() OVER (
           PARTITION BY year
           ORDER BY amount
       )::numeric, 2) AS pct_rank
FROM   employees
ORDER  BY year, amount;

The percent rank resets in each year — every year has its own 0..1 distribution.

SQL
WITH ranked AS (
    SELECT year, name, amount,
           PERCENT_RANK() OVER (
               PARTITION BY year
               ORDER BY amount DESC    -- DESC so high amounts get low pct_rank
           ) AS pr
    FROM   employees
)
SELECT * FROM ranked WHERE pr <= 0.25;

Both produce numbers between 0 and 1, but they answer different questions:

FunctionFormulaRangeFirst rowLast row
PERCENT_RANK(rank−1)/(N−1)0 ≤ x ≤ 101
CUME_DIST(rows ≤ this) / N0 < x ≤ 11/N1

PERCENT_RANK is "where in the rank order am I, scaled to 0..1." CUME_DIST is "what fraction of rows are at or below me." They're close but not the same.

Ties get the same percent rank. Concretely, the percent rank is computed using the same rank-with-gaps that RANK() would assign — so two tied rows share the percent rank of whichever comes first.

  • PERCENT_RANK() returns each row's relative position as a value in [0, 1].
  • First row is always 0; last row is always 1.
  • Formula: (rank − 1) / (total_rows − 1).
  • Use it for percentile cutoffs (top 10% / bottom quartile / etc.).
  • Subtly different from CUME_DIST — pick the one that matches your definition.