PL/pgSQL Bulk Processing
Master PL/pgSQL bulk processing the PostgreSQL way — set-based SQL beats Oracle-style BULK COLLECT/FORALL translations. Covers single-statement UPDATEs, UPDATE...FROM unnest() for per-row distinct values (the FORALL equivalent), array_agg for collection, multi-row VALUES INSERTs, and cursor-based batching for very large result sets.
If you're coming from Oracle PL/SQL, you know BULK COLLECT INTO and FORALL — language constructs that let you fetch many rows into an array in one go, then perform a batch DML against that array. The point: avoid the per-row context-switching overhead of "loop, fetch one row, do something, loop again."
PostgreSQL doesn't have those exact statements. What it has is even better for most cases: an SQL engine that's already optimized to operate on whole result sets at once. The PL/pgSQL equivalent of bulk operations falls into three patterns:
| Pattern | Best for |
|---|---|
| Set-based SQL — single UPDATE/INSERT covering many rows | The vast majority of use cases. Almost always the fastest. |
| Arrays + SELECT array_agg(...) | When you need the data in PL/pgSQL memory for further processing |
| Cursor + batched FETCH | When working with very large result sets that won't fit in memory |
Don't translate BULK COLLECT code mechanically into PostgreSQL. The Oracle pattern of "fetch into array, loop the array, run individual updates" is usually the wrong PostgreSQL idiom; a single set-based SQL statement is faster and clearer. Reach for arrays only when the per-row processing genuinely can't be expressed in plain SQL.

First, a small table to work with:

```sql
DROP TABLE IF EXISTS employees;
CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    salary      DECIMAL(10, 2)
);

INSERT INTO employees (salary) VALUES
    (50000.00),
    (60000.00),
    (70000.00),
    (80000.00),
    (90000.00);

SELECT * FROM employees;
```
Goal: increase every employee's salary by 10%. The PostgreSQL way — a single SQL statement that operates on the whole table:
```sql
DROP FUNCTION IF EXISTS update_all_salaries();

CREATE OR REPLACE FUNCTION update_all_salaries()
RETURNS VOID AS $$
BEGIN
    UPDATE employees
    SET salary = salary * 1.1;
END;
$$ LANGUAGE plpgsql;

SELECT update_all_salaries();

SELECT * FROM employees;
```
One UPDATE statement updated all five rows. PostgreSQL's planner executes this as a single sequential scan, updating each qualifying row as it goes; about as efficient as it gets. This is the "bulk processing" pattern in PostgreSQL: let SQL do the work.
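If you miss Oracle's SQL%ROWCOUNT for confirming how many rows a bulk statement touched, PL/pgSQL's GET DIAGNOSTICS fills that role. A minimal sketch (re-running the raise purely to demonstrate):

```sql
DO $$
DECLARE
    n INTEGER;
BEGIN
    UPDATE employees
    SET salary = salary * 1.1;

    -- ROW_COUNT reports rows affected by the most recent SQL statement
    GET DIAGNOSTICS n = ROW_COUNT;
    RAISE NOTICE 'Updated % rows', n;
END
$$;
```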
When you genuinely need the data in PL/pgSQL memory — say, to compute something across rows that requires iteration — use array_agg to collect a column into an array in one SELECT:
```sql
DO $$
DECLARE
    emp_ids      INT[];
    emp_salaries DECIMAL[];
    total        DECIMAL := 0;
BEGIN
    -- Collect both columns into arrays in one round trip
    SELECT array_agg(employee_id ORDER BY employee_id),
           array_agg(salary ORDER BY employee_id)
    INTO emp_ids, emp_salaries
    FROM employees;

    -- Walk the arrays in PL/pgSQL
    FOR i IN 1..array_length(emp_ids, 1) LOOP
        RAISE NOTICE 'Employee %: %', emp_ids[i], emp_salaries[i];
        total := total + emp_salaries[i];
    END LOOP;

    RAISE NOTICE 'Total payroll: %', total;
END
$$;
```
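As a side note, when you only need one array and no index arithmetic, PL/pgSQL's FOREACH ... IN ARRAY loop walks the elements directly. A minimal sketch of the same payroll total:

```sql
DO $$
DECLARE
    sals  DECIMAL[];
    s     DECIMAL;
    total DECIMAL := 0;
BEGIN
    SELECT array_agg(salary) INTO sals FROM employees;

    -- FOREACH iterates elements directly; no subscript bookkeeping
    FOREACH s IN ARRAY sals LOOP
        total := total + s;
    END LOOP;

    RAISE NOTICE 'Total payroll: %', total;
END
$$;
```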
Note that `SELECT SUM(salary) FROM employees` would compute the total faster than fetching everything into arrays and looping. The array pattern earns its keep when per-row processing genuinely can't be expressed in SQL: string formatting, calling external functions, complex conditional logic that depends on previous-row state, and so on.

Sometimes the value to update each row to differs per row. If you have those values in PL/pgSQL, perhaps computed from external logic, you can update them all in one SQL statement using `UPDATE ... FROM unnest(...)`:
```sql
DO $$
DECLARE
    -- Imagine these came from some external computation
    target_ids     INT[]     := ARRAY[1, 3, 5];
    target_bonuses DECIMAL[] := ARRAY[1000.00, 2000.00, 3000.00];
BEGIN
    -- Apply all the per-id updates in a single SQL statement
    UPDATE employees AS e
    SET salary = e.salary + b.bonus
    FROM unnest(target_ids, target_bonuses) AS b(emp_id, bonus)
    WHERE e.employee_id = b.emp_id;

    RAISE NOTICE 'Bonuses applied';
END
$$;

SELECT * FROM employees;
```
Three different bonus amounts applied to three different employees in a single UPDATE statement. unnest turns the parallel arrays into a virtual table of (emp_id, bonus) rows; the UPDATE joins against it. This is the FORALL equivalent in PostgreSQL — and it's just plain SQL.
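A related shortcut, for when every targeted row gets the same change rather than a per-row value: pass the id array straight into the WHERE clause with = ANY, no unnest join needed. A sketch with a hypothetical id list:

```sql
DO $$
DECLARE
    target_ids INT[] := ARRAY[2, 4];  -- hypothetical id list
BEGIN
    -- Same flat adjustment for every id in the array
    UPDATE employees
    SET salary = salary + 500
    WHERE employee_id = ANY (target_ids);
END
$$;
```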
Inserting many rows? One INSERT with multiple VALUES tuples beats a loop of single inserts by a wide margin — fewer round trips, fewer parse/plan cycles, single transaction commit:
```sql
DO $$
BEGIN
    -- One INSERT, six rows
    INSERT INTO employees (salary) VALUES
        (45000), (52000), (61000), (47000), (58000), (63000);
END
$$;

SELECT count(*) FROM employees;
```
For really large bulk loads — say, thousands of rows — even a multi-row INSERT can be beaten by COPY, PostgreSQL's bulk load command. But for hundreds of rows, multi-row VALUES is plenty.
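For reference, a COPY sketch, assuming a hypothetical /tmp/salaries.csv with one salary per line. Server-side COPY reads from the database server's filesystem and requires the appropriate privileges; from psql, the client-side \copy variant takes the same options:

```sql
-- Hypothetical path on the database server; needs file-read privileges
COPY employees (salary)
FROM '/tmp/salaries.csv'
WITH (FORMAT csv);
```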
If a result set is too large to fit in memory, walk it with a cursor, processing rows in batches. Note that PL/pgSQL's FETCH ... INTO retrieves one row at a time (the multi-row FETCH FORWARD n form exists only for SQL-level cursors), so batching here means counting rows as you fetch:
```sql
DO $$
DECLARE
    big_cursor CURSOR FOR
        SELECT employee_id, salary FROM employees ORDER BY employee_id;
    rec       RECORD;
    batch_cnt INTEGER := 0;
BEGIN
    OPEN big_cursor;
    LOOP
        FETCH big_cursor INTO rec;
        EXIT WHEN NOT FOUND;
        batch_cnt := batch_cnt + 1;

        -- Process rec.employee_id, rec.salary
        -- Every 3 rows, do something batched (commit-style work, network call, etc.)
        IF batch_cnt % 3 = 0 THEN
            RAISE NOTICE 'Processed batch ending at id %', rec.employee_id;
        END IF;
    END LOOP;
    CLOSE big_cursor;

    RAISE NOTICE 'Total rows: %', batch_cnt;
END
$$;
```
Cursor-based batching is the right pattern when memory is tight or when each batch needs its own bookkeeping (logging, progress reporting, periodic commits in a long-running migration).
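For the periodic-commit case specifically, a procedure (PostgreSQL 11+) may COMMIT between batches, which keeps each transaction small; functions cannot. A sketch of a keyset-batched raise, assuming employee_id is the batching key and a hypothetical batch size:

```sql
CREATE OR REPLACE PROCEDURE raise_salaries_in_batches(batch_size INT DEFAULT 1000)
LANGUAGE plpgsql AS $$
DECLARE
    last_id INT := 0;
    max_id  INT;
BEGIN
    SELECT max(employee_id) INTO max_id FROM employees;

    WHILE last_id < coalesce(max_id, 0) LOOP
        -- Update one id range, then COMMIT to release locks
        -- and bound the transaction size
        UPDATE employees
        SET salary = salary * 1.01
        WHERE employee_id > last_id
          AND employee_id <= last_id + batch_size;

        last_id := last_id + batch_size;
        COMMIT;  -- legal in procedures called via CALL, not in functions
    END LOOP;
END;
$$;

CALL raise_salaries_in_batches(1000);
```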
| Want to… | Best pattern |
|---|---|
| Update many rows the same way | Single UPDATE statement |
| Update many rows, each with a different value | UPDATE ... FROM unnest(arr1, arr2) |
| Insert many rows of constant data | One INSERT with multiple VALUES |
| Insert from a query | INSERT INTO target SELECT ... |
| Load from a file (very large) | COPY |
| Aggregate over rows | SELECT SUM/COUNT/AVG ... INTO var FROM ... |
| Per-row processing that needs application logic | FOR rec IN SELECT ... LOOP |
| Process a result set too large to fit in memory | Cursor + batched FETCH |
- Set-based SQL beats array-based loops in nearly every case. A single UPDATE/INSERT/DELETE that the planner sees is faster than a Python-style "fetch into list, modify, write back."
- Reach for arrays only when application logic genuinely demands it — string formatting, calling stored procedures, complex per-row decisions involving previous-row state.
- Use `UPDATE ... FROM unnest(...)` for the case where each row needs a different update value; this is PostgreSQL's `FORALL`.
- Bulk INSERT with multi-row VALUES is dramatically faster than looped single inserts. For really large loads, use `COPY`.
- Don't translate Oracle `BULK COLLECT` patterns mechanically. Re-think the problem in set-based terms first.
- Use cursors for memory-bounded processing when result sets are larger than RAM. Otherwise, plain SQL or `FOR rec IN SELECT` is simpler.
- Trust the planner. PostgreSQL is very good at executing single statements over large datasets efficiently; manual batching at the PL/pgSQL level rarely outperforms it.
- PostgreSQL doesn't have Oracle's `BULK COLLECT`/`FORALL` statements. It doesn't need them: set-based SQL is the bulk-processing primitive.
- For most "bulk operations," a single `UPDATE`, `INSERT`, or `DELETE` statement is the right answer. The planner handles the rest.
- For per-row distinct values, use `UPDATE ... FROM unnest(arr1, arr2) AS t(...)`, the equivalent of `FORALL`.
- For collecting data into PL/pgSQL memory, use `SELECT array_agg(col) INTO arr FROM ...`, the equivalent of `BULK COLLECT`.
- Reach for arrays and loops only when application logic genuinely can't be expressed in SQL.
- For result sets too large to fit in memory, walk them with a cursor in batches.