Finance functions

This page describes functions specific to the financial services domain.

l2price

Trade price calculation.

l2price(target_quantity, quantity_1, price_1, quantity_2, price_2, ..., quantity_n, price_n)

Consider quantity_1, price_1, quantity_2, price_2, ..., quantity_n, price_n to be either side of an order book with n price levels. Then, the return value of the function is the average trade price of a market order executed with the size of target_quantity against the book.

Let's take the below order book as an example.

SizeBidAskSize
1014.1014.5014
1714.0014.6016
1913.9014.8023
2113.7015.1012
1813.40

A buy market order with the size of 50 would wipe out the first two price levels of the Ask side of the book, and would also trade on the third level.

The full price of the trade:

14$14.50+16$14.60+(501416)$14.80=$732.6014 \cdot \$14.50 + 16 \cdot \$14.60 + (50 - 14 - 16) \cdot \$14.80 = \$732.60

The average price of the instrument in this trade:

$732.60/50=$14.652\$732.60 / 50 = \$14.652

This average trade price is the output of the function when executed with the parameters taken from the above example:

select l2price(50, 14, 14.50, 16, 14.60, 23, 14.80, 12, 15.10);
l2price
14.652

Parameters

The function takes a target quantity, and a variable number of quantity/price pairs. Each represents a price level of the order book.

Each parameter is expected to be a double, or convertible to double (float, long, int, short, byte).

  • target_quantity: The size of a hypothetical market order to be filled.
  • quantity*: The number of instruments available at the corresponding price levels.
  • price*: Price levels of the order book.

Return value

The function returns with a double, representing the average trade price.

Returns null if the price is not calculable. For example, if the target quantity cannot be filled, or there is incomplete data in the set (nulls).

Examples

Test data:

CREATE TABLE order_book (
ts TIMESTAMP,
bidSize1 DOUBLE, bid1 DOUBLE, bidSize2 DOUBLE, bid2 DOUBLE, bidSize3 DOUBLE, bid3 DOUBLE,
askSize1 DOUBLE, ask1 DOUBLE, askSize2 DOUBLE, ask2 DOUBLE, askSize3 DOUBLE, ask3 DOUBLE
) TIMESTAMP(ts) PARTITION BY DAY;

INSERT INTO order_book VALUES
('2024-05-22T09:40:15.006000Z', 40, 14.10, 47, 14.00, 39, 13.90, 54, 14.50, 36, 14.60, 23, 14.80),
('2024-05-22T09:40:15.175000Z', 42, 14.00, 45, 13.90, 35, 13.80, 16, 14.30, 57, 14.50, 30, 14.60),
('2024-05-22T09:40:15.522000Z', 36, 14.10, 38, 14.00, 31, 13.90, 30, 14.40, 47, 14.50, 34, 14.60);

Trading price of instrument when buying 100:

SELECT ts, L2PRICE(100, askSize1, ask1, askSize2, ask2, askSize3, ask3) AS buy FROM order_book;
tsbuy
2024-05-22T09:40:15.006000Z14.565999999999
2024-05-22T09:40:15.175000Z14.495
2024-05-22T09:40:15.522000Z14.493

Trading price of instrument when selling 100:

SELECT ts, L2PRICE(100, bidSize1, bid1, bidSize2, bid2, bidSize3, bid3) AS sell FROM order_book;
tssell
2024-05-22T09:40:15.006000Z14.027
2024-05-22T09:40:15.175000Z13.929
2024-05-22T09:40:15.522000Z14.01

The spread for target quantity 100:

SELECT ts, L2PRICE(100, askSize1, ask1, askSize2, ask2, askSize3, ask3)
- L2PRICE(100, bidSize1, bid1, bidSize2, bid2, bidSize3, bid3) AS spread FROM order_book;
tsspread
2024-05-22T09:40:15.006000Z0.538999999999
2024-05-22T09:40:15.175000Z0.565999999999
2024-05-22T09:40:15.522000Z0.483

mid

mid(bid, ask) - calculates the midpoint of a bidding price and asking price.

Returns null if either argument is NaN or null.

Parameters

  • bid is any numeric bidding price value.
  • ask is any numeric asking price value.

Return value

Return value type is double.

Examples

SELECT mid(1.5760, 1.5763)
mid
1.57615

regr_intercept

regr_intercept(y, x) - Calculates the y-intercept of the linear regression line for the given numeric columns y (dependent variable) and x (independent variable).

  • The function requires at least two valid (y, x) pairs to compute the intercept.
    • If fewer than two pairs are available, the function returns null.
  • Supported data types for x and y include double, float, and integer types.
  • The regr_intercept function can be used with other statistical aggregation functions like regr_slope or corr.
  • The order of arguments in regr_intercept(y, x) matters.
    • Ensure that y is the dependent variable and x is the independent variable.

Calculation

The y-intercept b0b_0 of the regression line y=b0+b1xy = b_0 + b_1 x is calculated using the formula:

b0=yˉb1xˉb_0 = \bar{y} - b_1 \bar{x}

Where:

  • yˉ\bar{y} is the mean of y values
  • xˉ\bar{x} is the mean of x values
  • b1b_1 is the slope calculated by regr_slope(y, x)

Arguments

  • y: A numeric column representing the dependent variable.
  • x: A numeric column representing the independent variable.

Return value

Return value type is double.

The function returns the y-intercept of the regression line, indicating the predicted value of y when x is 0.

Examples

Calculate the regression intercept between two variables

Using the same measurements table:

xy
1.02.0
2.03.0
3.05.0
4.04.0
5.06.0

Calculate the y-intercept:

SELECT regr_intercept(y, x) AS y_intercept FROM measurements;

Result:

y_intercept
1.4

Or: When x is 0, y is predicted to be 1.4 units.

Calculate the regression intercept grouped by category

Using the same sales_data table:

categoryadvertising_spendsales
A100015000
A200022000
A300028000
B150018000
B250026000
B350031000
SELECT category, regr_intercept(sales, advertising_spend) AS base_sales
FROM sales_data
GROUP BY category;

Result:

categorybase_sales
A9500
B12000

Or:

  • In category A, with no advertising spend, the predicted base sales are 9,500 units
  • In category B, with no advertising spend, the predicted base sales are 12,000 units

Handling null values

The function ignores rows where either x or y is null:

SELECT regr_intercept(y, x) AS y_intercept
FROM (
SELECT 1 AS x, 2 AS y
UNION ALL SELECT 2, NULL
UNION ALL SELECT NULL, 4
UNION ALL SELECT 4, 5
);

Result:

y_intercept
1.4

Only the rows where both x and y are not null are considered in the calculation.

regr_slope

regr_slope(y, x) - Calculates the slope of the linear regression line for the given numeric columns y (dependent variable) and x (independent variable).

  • The function requires at least two valid (x, y) pairs to compute the slope.
    • If fewer than two pairs are available, the function returns null.
  • Supported data types for x and y include double, float, and integer types.
  • The regr_slope function can be used with other statistical aggregation functions like corr or covar_samp.
  • The order of arguments in regr_slope(y, x) matters.
    • Ensure that y is the dependent variable and x is the independent variable.

Calculation

The slope b1b_1 of the regression line y=b0+b1xy = b_0 + b_1 x is calculated using the formula:

b1=N(xy)xyN(x2)(x)2b_1 = \frac{N \sum (xy) - \sum x \sum y}{N \sum (x^2) - (\sum x)^2}

Where:

  • NN is the number of valid data points.
  • (xy)\sum (xy) is the sum of the products of xx and yy.
  • x\sum x and y\sum y is the sums of xx and yy values, respectively.
  • (x2)\sum (x^2) is the sum of the squares of xx values.

Arguments

  • y: A numeric column representing the dependent variable.
  • x: A numeric column representing the independent variable.

Return value

Return value type is double.

The function returns the slope of the regression line, indicating how much y changes for a unit change in x.

Examples

Calculate the regression slope between two variables

Suppose you have a table measurements with the following data:

xy
1.02.0
2.03.0
3.05.0
4.04.0
5.06.0

You can calculate the slope of the regression line between y and x using:

SELECT regr_slope(y, x) AS slope FROM measurements;

Result:

slope
0.8

Or: The slope of 0.8 indicates that for each unit increase in x, y increases by 0.8 units on average.

Calculate the regression slope grouped by a category

Consider a table sales_data:

categoryadvertising_spendsales
A100015000
A200022000
A300028000
B150018000
B250026000
B350031000

Calculate the regression slope of sales versus advertising_spend for each category:

SELECT category, regr_slope(sales, advertising_spend) AS slope FROM sales_data
GROUP BY category;

Result:

categoryslope
A8.5
B7.0

Or:

  • In category A, for every additional unit spent on advertising, sales increase by 8.5 units on average.
  • In category B, the increase is 7.0 units per advertising unit spent.

Handling null values

If your data contains null values, regr_slope() will ignore those rows:

SELECT regr_slope(y, x) AS slope FROM ( SELECT 1 AS x, 2 AS y UNION ALL SELECT
2, NULL UNION ALL SELECT NULL, 4 UNION ALL SELECT 4, 5 );

Result:

slope
0.8

Only the rows where both x and y are not null are considered in the calculation.

Calculating beta

Assuming you have a table stock_returns with daily returns for a specific stock and the market index:

datestock_returnmarket_return
2023-01-010.50.4
2023-01-02-0.2-0.1
2023-01-030.30.2
.........

Calculate the stock's beta coefficient:

SELECT regr_slope(stock_return, market_return) AS beta FROM stock_returns;
beta
1.2

Or: A beta of 1.2 suggests the stock is 20% more volatile than the market.

Remember: The order of arguments in regr_slope(y, x) matters.

Ensure that y is the dependent variable and x is the independent variable.

spread_bps

spread_bps(bid, ask) - calculates the quoted bid-ask spread, based on the highest bidding price, and the lowest asking price.

The result is provided in basis points, and the calculation is:

spread(bid,ask)mid(bid,ask)10000\frac {\text{spread}\left(\text{bid}, \text{ask}\right)} {\text{mid}\left(\text{bid}, \text{ask}\right)} \cdot 10\,000

Parameters

  • bid is any numeric bidding price value.
  • ask is any numeric asking price value.

Return value

Return value type is double.

Examples

SELECT spread_bps(1.5760, 1.5763)
spread_bps
1.903372140976

vwap

vwap(price, quantity) - Calculates the volume-weighted average price (VWAP) based on the given price and quantity columns. This is defined by the following formula:

vwap=sum(pricequantity)sum(quantity)\text{vwap} = \frac {\text{sum}\left(\text{price} \cdot \text{quantity}\right)} {\text{sum}\left(\text{quantity}\right)}

Parameters

  • price is any numeric price value.
  • quantity is any numeric quantity value.

Return value

Return value type is double.

Examples

SELECT vwap(x, x)
FROM (SELECT x FROM long_sequence(100));
vwap
67

wmid

wmid(bidSize, bidPrice, askPrice, askSize) - calculates the weighted mid-price for a sized bid/ask pair.

It is calculated with these formulae:

imbalance=bidSize(bidSize+askSize)\text{imbalance} = \frac { \text{bidSize} } { \left( \text{bidSize} + \text{askSize} \right) } wmid=askPriceimbalance+bidPrice(1imbalance)\text{wmid} = \text{askPrice} \cdot \text{imbalance} + \text{bidPrice} \cdot \left( 1 - \text{imbalance} \right)

Parameters

  • bidSize is any numeric value representing the size of the bid offer.
  • bidPrice is any numeric value representing the bidding price.
  • askPrice is any numeric value representing the asking price.
  • askSize is any numeric value representing the size of the ask offer.

Return value

Return value type is double.

Examples

SELECT wmid(100, 5, 6, 100)
wmid
5.5