Window Functions Distribution PERCENT_RANK · 0..1 · percentile

PERCENT_RANK Window Function

A row's relative position within its partition, scaled to [0, 1]. First row gets 0, last row gets 1, the rest are spaced by (rank-1)/(N-1). The cleanest way to filter top-x-percent of a group.

What PERCENT_RANK does

PERCENT_RANK() calculates a row's relative position within its partition as a number between 0 and 1. The first row always gets 0, the last row always gets 1, and everything in between is spread out proportionally.

The formula it uses is straightforward: (rank − 1) / (total_rows − 1). So if there are 5 rows, the percent ranks are 0.00, 0.25, 0.50, 0.75, 1.00.

Syntax

PERCENT_RANK() OVER (
[PARTITION BY column_list]
ORDER BY column_list [ASC | DESC]
)

Like the ranking functions, PERCENT_RANK() takes no arguments. ORDER BY is required.

Demo dataset

SQL — Create & insert

CREATE TABLE employees (
    year    INT NOT NULL,
    name    VARCHAR(20) NOT NULL,
    amount  NUMERIC NOT NULL
);

INSERT INTO employees VALUES
(2018, 'Jack',    15000),
(2018, 'Jane',     9000),
(2018, 'John',     8000),
(2018, 'Stephen',  7000),
(2018, 'Harold',   6000),
(2019, 'Yin',     20000),
(2019, 'Jack',    18000),
(2019, 'Jane',    16000),
(2019, 'John',    14000),
(2019, 'Harold',  12000);

Example 1 — percent rank for 2018 employees

SQL

SELECT name, amount,
       PERCENT_RANK() OVER (ORDER BY amount) AS pct_rank
FROM   employees
WHERE  year = 2018;

Result

name	amount	pct_rank
Harold	6000	0.00
Stephen	7000	0.25
John	8000	0.50
Jane	9000	0.75
Jack	15000	1.00

Five rows means four "gaps" between them, so each step is 0.25. Harold (the smallest) gets 0; Jack (the largest) gets 1.

Example 2 — partitioned by year

SQL

SELECT year, name, amount,
       ROUND(PERCENT_RANK() OVER (
           PARTITION BY year
           ORDER BY amount
       )::numeric, 2) AS pct_rank
FROM   employees
ORDER  BY year, amount;

The percent rank resets in each year — every year has its own 0..1 distribution.

Example 3 — find the top 25% of earners per year

SQL

WITH ranked AS (
    SELECT year, name, amount,
           PERCENT_RANK() OVER (
               PARTITION BY year
               ORDER BY amount DESC    -- DESC so high amounts get low pct_rank
           ) AS pr
    FROM   employees
)
SELECT * FROM ranked WHERE pr <= 0.25;

PERCENT_RANK vs CUME_DIST

Both produce numbers between 0 and 1, but they answer different questions:

Function	Formula	Range	First row	Last row
PERCENT_RANK	(rank−1)/(N−1)	0 ≤ x ≤ 1	0	1
CUME_DIST	(rows ≤ this) / N	0 < x ≤ 1	1/N	1

PERCENT_RANK is "where in the rank order am I, scaled to 0..1." CUME_DIST is "what fraction of rows are at or below me." They're close but not the same.

Tie behaviour

Ties get the same percent rank. Concretely, the percent rank is computed using the same rank-with-gaps that RANK() would assign — so two tied rows share the percent rank of whichever comes first.

Recap

PERCENT_RANK() returns each row's relative position as a value in [0, 1].
First row is always 0; last row is always 1.
Formula: (rank − 1) / (total_rows − 1).
Use it for percentile cutoffs (top 10% / bottom quartile / etc.).
Subtly different from CUME_DIST — pick the one that matches your definition.