Over Keyword - Window Functions

Window functions perform calculations across sets of table rows that are related to the current row. Unlike aggregate functions that return a single result for a group of rows, window functions return a value for every row while considering a window of rows defined by the OVER clause.

We'll cover high-level, introductory information about window functions, and then move on to composition.

We also have some common examples to get you started.

tip

Click Demo this query within our query examples to see them in our live demo.

Deep Dive: What is a Window Function?

A window function performs a calculation across a set of rows that are related to the current row. This set of related rows is called a "window", defined by an OVER clause that follows the window function.

In practical terms, window functions are used when you need to perform a calculation that depends on a group of rows, but you want to retain the individual rows in the result set. This is different from aggregate functions like a cumulative sum or avg, which perform calculations on a group of rows and return a single result.

The underlying mechanism of a window function involves three components:

Partitioning: The PARTITION BY clause divides the result set into partitions (groups of rows) upon which the window function is applied. If no partition is defined, the function treats all rows of the query result set as a single partition.
Ordering: The ORDER BY clause within the OVER clause determines the order of the rows in each partition.
Frame Specification: This defines the set of rows included in the window, relative to the current row. For example, ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW includes all rows from the start of the partition to the current row.

Use cases for window functions are vast.

They are often used in analytics for tasks such as:

Calculating running totals or averages
Finding the maximum or minimum value in a sequence or partition
Ranking items within a specific category or partition
Calculating moving averages or cumulative sums

Window functions are tough to grok.

An analogy before we get to building:

Imagine a group of cars in a race. Each car has a number, a name, and a finish time. If you wanted to know the average finish time, you could use an aggregate function like avg() to calculate it. But this would only give you a single result: the average time. You wouldn't know anything about individual cars' times.

For example, a window function allows you to calculate the average finish time for all the cars (the window), but display it on each car (row), so you can compare this average to each car's average speed to see if they were faster or slower than the global average.

So, in essence, window functions allow you to perform calculations that consider more than just the individual row or the entire table, but a 'window' of related rows. This 'window' could be all rows with the same value in a certain column, like all cars of the same engine size, or it could be a range of rows based on some order, like the three cars who finished before and after a certain car.

This makes window functions incredibly powerful for complex calculations and analyses.

Syntax

functionName OVER (
    [PARTITION BY columnName [, ...]]
    [ORDER BY columnName [ASC | DESC] [, ...]]
    [ROWS | RANGE BETWEEN frame_start AND frame_end]
    [EXCLUDE CURRENT ROW | EXCLUDE NO OTHERS]
)

Where:

functionName: The window function to apply (e.g., avg, sum, rank)
OVER: Specifies the window over which the function operates
- PARTITION BY: Divides the result set into partitions
- ORDER BY: Specifies the order of rows within each partition
- ROWS | RANGE BETWEEN: Defines the window frame relative to the current row
- EXCLUDE: Optionally excludes certain rows from the frame

Supported functions

avg() – Calculates the average within a window
count() – Counts rows or non-null values
dense_rank() – Assigns a rank to rows monotonically
first_not_null_value() – Retrieves the first not null value in a window
first_value() – Retrieves the first value in a window
lag() – Accesses data from previous rows
last_value() – Retrieves the last value in a window
lead() – Accesses data from subsequent rows
max() – Returns the maximum value within a window
min() – Returns the minimum value within a window
rank() – Assigns a rank to rows
row_number() – Assigns sequential numbers to rows
sum() – Calculates the sum within a window

Components of a window function

A window function calculates results across a set of rows related to the current row, called a window. This allows for complex calculations like moving averages, running totals, and rankings without collapsing rows.

Function Name: Specifies the calculation to perform (e.g., avg(price))
OVER Clause: Defines the window for the function
- PARTITION BY: Divides the result set into partitions
- ORDER BY: Orders rows within partitions
- Frame Specification: Defines the subset of rows using ROWS or RANGE
Exclusion Option: Excludes specific rows from the frame

Example

Moving average exampleDemo this query
SELECT
    symbol,
    price,
    timestamp,
    avg(price) OVER (
        PARTITION BY symbol
        ORDER BY timestamp
        ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
    ) AS moving_avg
FROM trades;

This calculates a moving average of price over the current and three preceding rows for each symbol. For other common window function examples, please check the Window functions reference.

Frame types and behavior

Window frames specify which rows are included in the calculation relative to the current row.

ROWS frame

Defines the frame based on a physical number of rows:

ROWS BETWEEN 2 PRECEDING AND CURRENT ROW

This includes the current row and two preceding rows.

RANGE frame

note

RANGE functions have a known issue. When using RANGE, all the rows with the same value will have the same output for the function. Read the open issue for more information.

Defines the frame based on the actual values in the ORDER BY column, rather than counting rows. Unlike ROWS, which counts a specific number of rows, RANGE considers the values in the ORDER BY column to determine the window.

Important requirements for RANGE:

Data must be ordered by the designated timestamp column
The window is calculated based on the values in that ORDER BY column

For example, with a current row at 09:04 and RANGE BETWEEN '1' MINUTE PRECEDING AND CURRENT ROW:

Only includes rows with timestamps between 09:03 and 09:04 (inclusive)
Earlier rows (e.g., 09:00, 09:02) are excluded as they fall outside the 1-minute range

The following time units can be used in RANGE window functions:

day
hour
minute
second
millisecond
microsecond

Plural forms of these time units are also accepted (e.g., 'minutes', 'hours').

Multiple time intervals exampleDemo this query
SELECT
    timestamp,
    bid_px_00,
    -- 5-minute average: includes rows from (current_timestamp - 5 minutes) to current_timestamp
    AVG(bid_px_00) OVER (
        ORDER BY timestamp
        RANGE BETWEEN '5' MINUTE PRECEDING AND CURRENT ROW
    ) AS avg_5min,
    -- 100ms count: includes rows from (current_timestamp - 100ms) to current_timestamp
    COUNT(*) OVER (
        ORDER BY timestamp
        RANGE BETWEEN '100' MILLISECOND PRECEDING AND CURRENT ROW
    ) AS updates_100ms,
    -- 2-second sum: includes rows from (current_timestamp - 2 seconds) to current_timestamp
    SUM(bid_sz_00) OVER (
        ORDER BY timestamp
        RANGE BETWEEN '2' SECOND PRECEDING AND CURRENT ROW
    ) AS volume_2sec
FROM AAPL_orderbook
WHERE bid_px_00 > 0
LIMIT 10;

This query demonstrates different time intervals in action, calculating:

5-minute moving average of best bid price
Update frequency in 100ms windows
2-second rolling volume

Note that each window calculation is based on the timestamp values, not the number of rows. This means the number of rows included can vary depending on how many records exist within each time interval.

Frame boundaries

Frame boundaries determine which rows are included in the window calculation:

UNBOUNDED PRECEDING: Starts at the first row of the partition
<value> PRECEDING: Starts or ends at a specified number of rows or interval before the current row
CURRENT ROW: Starts or ends at the current row

When the frame clause is not specified, the default frame is RANGE UNBOUNDED PRECEDING, which includes all rows from the start of the partition to the current row.

If ORDER BY is not present, the frame includes the entire partition, as all rows are considered equal.
If ORDER BY is present, the frame includes all rows from the start of the partition to the current row. Note that UNBOUNDED FOLLOWING is only allowed when the frame start is UNBOUNDED PRECEDING, which means the frame includes the entire partition.

Restrictions

Frame start can only be:
- UNBOUNDED PRECEDING
- <value> PRECEDING
- CURRENT ROW
Frame end can only be:
- CURRENT ROW
- <value> PRECEDING (unless start is UNBOUNDED PRECEDING)
RANGE frames must be ordered by a designated timestamp

Exclusion options

Modifies the window frame by excluding certain rows:

EXCLUDE NO OTHERS

Default behavior
Includes all rows in the frame

EXCLUDE CURRENT ROW

Excludes the current row from the frame
When frame ends at CURRENT ROW, end boundary automatically adjusts to 1 PRECEDING
This automatic adjustment ensures that the current row is effectively excluded from the calculation, as there cannot be a frame that ends after the current row when the current row is excluded.

Example query

To tie it together, consider the following example:

EXCLUSION exampleDemo this query
SELECT
    timestamp,
    price,
    SUM(price) OVER (
        ORDER BY timestamp
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        EXCLUDE CURRENT ROW
    ) AS cumulative_sum_excluding_current
FROM trades;

The query calculates a cumulative sum of the price column for each row in the trades table, excluding the current row from the calculation. By using EXCLUDE CURRENT ROW, the window frame adjusts to include all rows from the start up to one row before the current row. This demonstrates how the EXCLUDE CURRENT ROW option modifies the window frame to exclude the current row, affecting the result of the window function.

Notes and restrictions

ORDER BY behavior

ORDER BY in OVER clause determines the logical order for window functions
Independent of the query-level ORDER BY
Required for window-only functions
Required for RANGE frames

Frame specifications

ROWS frames:
- Based on physical row counts
- More efficient for large datasets
- Can be used with any ORDER BY column
RANGE frames:
- Defines the frame based on the actual values in the ORDER BY column, rather than counted row.
- Require ORDER BY on timestamp
- Support time-based intervals (e.g., '1h', '5m')

Exclusion behavior

Using EXCLUDE CURRENT ROW with frame end at CURRENT ROW:
- Automatically adjusts end boundary to 1 PRECEDING
- Ensures consistent results across queries

Performance considerations

ROWS frames typically perform better than RANGE frames for large datasets
Partitioning can improve performance by processing smaller chunks of data
Consider index usage when ordering by timestamp columns

Common pitfalls

Using window functions in WHERE clauses:

Not allowed!
-- Incorrect usage
SELECT
    symbol,
    price,
    timestamp
FROM trades
WHERE
    avg(price) OVER (ORDER BY timestamp) > 100;

Instead, build like so:

Correct usageDemo this query
with prices_and_avg AS (
SELECT
    symbol,
    price, avg(price) OVER (ORDER BY timestamp) as moving_avg_price,
    timestamp
FROM trades
WHERE timestamp in yesterday()
)
select * from prices_and_avg
WHERE
   moving_avg_price  > 100;

Missing ORDER BY in OVER clause

When no ORDER BY is specified, the average will be calculated for the whole partition. Given we don't have a PARTITION BY and we are using a global window, all the rows will show the same average. This is the average for the whole dataset.

Missing ORDER BY
-- Potential issue
SELECT
    symbol,
    price,
    sum(price) OVER () AS cumulative_sum
FROM trades;
WHERE timestamp in yesterday();

To compute the moving average, we need to specify an ORDER BY clause:

Safer usageDemo this query
SELECT
    symbol,
    price,
    sum(price) OVER (ORDER BY TIMESTAMP) AS cumulative_sum
FROM trades
WHERE timestamp in yesterday();

We may also have a case where all the rows for the same partition (symbol) will have the same average, if we include a PARTITION BY clause without an ORDER BY clause:

Partitioned usageDemo this query
-- Potential issue
SELECT
    symbol,
    price,
    sum(price) OVER (PARTITION BY symbol ) AS cumulative_sum
FROM trades
WHERE timestamp in yesterday();

For every row to show the moving average for each symbol, we need to specify both an ORDER BY and a PARTITION BY clause:

Partitioned and ordered usageDemo this query
SELECT
    symbol,
    price,
    sum(price) OVER (PARTITION BY symbol ORDER BY TIMESTAMP) AS cumulative_sum
FROM trades
WHERE timestamp in yesterday();

Deep Dive: What is a Window Function?​

Syntax​

Supported functions​

Components of a window function​

Example​

Frame types and behavior​

ROWS frame​

RANGE frame​

Frame boundaries​

Restrictions​

Exclusion options​

EXCLUDE NO OTHERS​

EXCLUDE CURRENT ROW​

Example query​

Notes and restrictions​

ORDER BY behavior​

Frame specifications​

Exclusion behavior​

Performance considerations​

Common pitfalls​

Using window functions in WHERE clauses:​

Missing ORDER BY in OVER clause​