Finance functions
This page describes functions specific to the financial services domain.
l2price
Trade price calculation.
l2price(target_quantity, quantity_1, price_1, quantity_2, price_2, ..., quantity_n, price_n)
Consider quantity_1
, price_1
, quantity_2
, price_2
, ..., quantity_n
,
price_n
to be either side of an order book with n
price levels. Then, the
return value of the function is the average trade price of a market order
executed with the size of target_quantity
against the book.
Let's take the below order book as an example.
Size | Bid | Ask | Size |
---|---|---|---|
10 | 14.10 | 14.50 | 14 |
17 | 14.00 | 14.60 | 16 |
19 | 13.90 | 14.80 | 23 |
21 | 13.70 | 15.10 | 12 |
18 | 13.40 |
A buy market order with the size of 50 would wipe out the first two price levels of the Ask side of the book, and would also trade on the third level.
The full price of the trade:
The average price of the instrument in this trade:
This average trade price is the output of the function when executed with the parameters taken from the above example:
select l2price(50, 14, 14.50, 16, 14.60, 23, 14.80, 12, 15.10);
l2price |
---|
14.652 |
Parameters
The function takes a target quantity
, and a variable number of
quantity
/price
pairs. Each represents a price level of the order book.
Each parameter is expected to be a double, or convertible to double (float, long, int, short, byte).
target_quantity
: The size of a hypothetical market order to be filled.quantity*
: The number of instruments available at the corresponding price levels.price*
: Price levels of the order book.
Return value
The function returns with a double
, representing the average trade price.
Returns null if the price is not calculable. For example, if the target quantity cannot be filled, or there is incomplete data in the set (nulls).
Examples
Test data:
CREATE TABLE order_book (
ts TIMESTAMP,
bidSize1 DOUBLE, bid1 DOUBLE, bidSize2 DOUBLE, bid2 DOUBLE, bidSize3 DOUBLE, bid3 DOUBLE,
askSize1 DOUBLE, ask1 DOUBLE, askSize2 DOUBLE, ask2 DOUBLE, askSize3 DOUBLE, ask3 DOUBLE
) TIMESTAMP(ts) PARTITION BY DAY;
INSERT INTO order_book VALUES
('2024-05-22T09:40:15.006000Z', 40, 14.10, 47, 14.00, 39, 13.90, 54, 14.50, 36, 14.60, 23, 14.80),
('2024-05-22T09:40:15.175000Z', 42, 14.00, 45, 13.90, 35, 13.80, 16, 14.30, 57, 14.50, 30, 14.60),
('2024-05-22T09:40:15.522000Z', 36, 14.10, 38, 14.00, 31, 13.90, 30, 14.40, 47, 14.50, 34, 14.60);
Trading price of instrument when buying 100:
SELECT ts, L2PRICE(100, askSize1, ask1, askSize2, ask2, askSize3, ask3) AS buy FROM order_book;
ts | buy |
---|---|
2024-05-22T09:40:15.006000Z | 14.565999999999 |
2024-05-22T09:40:15.175000Z | 14.495 |
2024-05-22T09:40:15.522000Z | 14.493 |
Trading price of instrument when selling 100:
SELECT ts, L2PRICE(100, bidSize1, bid1, bidSize2, bid2, bidSize3, bid3) AS sell FROM order_book;
ts | sell |
---|---|
2024-05-22T09:40:15.006000Z | 14.027 |
2024-05-22T09:40:15.175000Z | 13.929 |
2024-05-22T09:40:15.522000Z | 14.01 |
The spread for target quantity 100:
SELECT ts, L2PRICE(100, askSize1, ask1, askSize2, ask2, askSize3, ask3)
- L2PRICE(100, bidSize1, bid1, bidSize2, bid2, bidSize3, bid3) AS spread FROM order_book;
ts | spread |
---|---|
2024-05-22T09:40:15.006000Z | 0.538999999999 |
2024-05-22T09:40:15.175000Z | 0.565999999999 |
2024-05-22T09:40:15.522000Z | 0.483 |
mid
mid(bid, ask)
- calculates the midpoint of a bidding price and asking price.
Returns null if either argument is NaN or null.
Parameters
bid
is any numeric bidding price value.ask
is any numeric asking price value.
Return value
Return value type is double
.
Examples
SELECT mid(1.5760, 1.5763)
mid |
---|
1.57615 |
regr_intercept
regr_intercept(y, x)
- Calculates the y-intercept of the linear regression line for the given numeric columns y (dependent variable) and x (independent variable).
- The function requires at least two valid (y, x) pairs to compute the intercept.
- If fewer than two pairs are available, the function returns null.
- Supported data types for x and y include
double
,float
, andinteger
types. - The
regr_intercept
function can be used with other statistical aggregation functions likeregr_slope
orcorr
. - The order of arguments in
regr_intercept(y, x)
matters.- Ensure that y is the dependent variable and x is the independent variable.
Calculation
The y-intercept of the regression line is calculated using the formula:
Where:
- is the mean of y values
- is the mean of x values
- is the slope calculated by
regr_slope(y, x)
Arguments
- y: A numeric column representing the dependent variable.
- x: A numeric column representing the independent variable.
Return value
Return value type is double
.
The function returns the y-intercept of the regression line, indicating the predicted value of y when x is 0.
Examples
Calculate the regression intercept between two variables
Using the same measurements table:
x | y |
---|---|
1.0 | 2.0 |
2.0 | 3.0 |
3.0 | 5.0 |
4.0 | 4.0 |
5.0 | 6.0 |
Calculate the y-intercept:
SELECT regr_intercept(y, x) AS y_intercept FROM measurements;
Result:
y_intercept |
---|
1.4 |
Or: When x is 0, y is predicted to be 1.4 units.
Calculate the regression intercept grouped by category
Using the same sales_data table:
category | advertising_spend | sales |
---|---|---|
A | 1000 | 15000 |
A | 2000 | 22000 |
A | 3000 | 28000 |
B | 1500 | 18000 |
B | 2500 | 26000 |
B | 3500 | 31000 |
SELECT category, regr_intercept(sales, advertising_spend) AS base_sales
FROM sales_data
GROUP BY category;
Result:
category | base_sales |
---|---|
A | 9500 |
B | 12000 |
Or:
- In category A, with no advertising spend, the predicted base sales are 9,500 units
- In category B, with no advertising spend, the predicted base sales are 12,000 units
Handling null values
The function ignores rows where either x or y is null:
SELECT regr_intercept(y, x) AS y_intercept
FROM (
SELECT 1 AS x, 2 AS y
UNION ALL SELECT 2, NULL
UNION ALL SELECT NULL, 4
UNION ALL SELECT 4, 5
);
Result:
y_intercept |
---|
1.4 |
Only the rows where both x and y are not null are considered in the calculation.
regr_slope
regr_slope(y, x)
- Calculates the slope of the linear regression line for the
given numeric columns y (dependent variable) and x (independent variable).
- The function requires at least two valid (x, y) pairs to compute the slope.
- If fewer than two pairs are available, the function returns null.
- Supported data types for x and y include
double
,float
, andinteger
types. - The regr_slope function can be used with other statistical aggregation
functions like
corr
orcovar_samp
. - The order of arguments in
regr_slope(y, x)
matters.- Ensure that y is the dependent variable and x is the independent variable.
Calculation
The slope of the regression line is calculated using the formula:
Where:
- is the number of valid data points.
- is the sum of the products of and .
- and is the sums of and values, respectively.
- is the sum of the squares of values.
Arguments
- y: A numeric column representing the dependent variable.
- x: A numeric column representing the independent variable.
Return value
Return value type is double
.
The function returns the slope of the regression line, indicating how much y changes for a unit change in x.
Examples
Calculate the regression slope between two variables
Suppose you have a table measurements with the following data:
x | y |
---|---|
1.0 | 2.0 |
2.0 | 3.0 |
3.0 | 5.0 |
4.0 | 4.0 |
5.0 | 6.0 |
You can calculate the slope of the regression line between y and x using:
SELECT regr_slope(y, x) AS slope FROM measurements;
Result:
slope |
---|
0.8 |
Or: The slope of 0.8 indicates that for each unit increase in x, y increases by 0.8 units on average.
Calculate the regression slope grouped by a category
Consider a table sales_data:
category | advertising_spend | sales |
---|---|---|
A | 1000 | 15000 |
A | 2000 | 22000 |
A | 3000 | 28000 |
B | 1500 | 18000 |
B | 2500 | 26000 |
B | 3500 | 31000 |
Calculate the regression slope of sales
versus advertising_spend
for each
category:
SELECT category, regr_slope(sales, advertising_spend) AS slope FROM sales_data
GROUP BY category;
Result:
category | slope |
---|---|
A | 8.5 |
B | 7.0 |
Or:
- In category A, for every additional unit spent on advertising, sales increase by 8.5 units on average.
- In category B, the increase is 7.0 units per advertising unit spent.
Handling null values
If your data contains null values, regr_slope()
will ignore those rows:
SELECT regr_slope(y, x) AS slope FROM ( SELECT 1 AS x, 2 AS y UNION ALL SELECT
2, NULL UNION ALL SELECT NULL, 4 UNION ALL SELECT 4, 5 );
Result:
slope |
---|
0.8 |
Only the rows where both x and y are not null are considered in the calculation.
Calculating beta
Assuming you have a table stock_returns
with daily returns for a specific
stock and the market index:
date | stock_return | market_return |
---|---|---|
2023-01-01 | 0.5 | 0.4 |
2023-01-02 | -0.2 | -0.1 |
2023-01-03 | 0.3 | 0.2 |
... | ... | ... |
Calculate the stock's beta coefficient:
SELECT regr_slope(stock_return, market_return) AS beta FROM stock_returns;
beta |
---|
1.2 |
Or: A beta of 1.2 suggests the stock is 20% more volatile than the market.
Remember: The order of arguments in regr_slope(y, x)
matters.
Ensure that y is the dependent variable and x is the independent variable.
spread_bps
spread_bps(bid, ask)
- calculates the quoted bid-ask spread, based on the
highest bidding price, and the lowest asking price.
The result is provided in basis points, and the calculation is:
Parameters
bid
is any numeric bidding price value.ask
is any numeric asking price value.
Return value
Return value type is double
.
Examples
SELECT spread_bps(1.5760, 1.5763)
spread_bps |
---|
1.903372140976 |
vwap
vwap(price, quantity)
- Calculates the volume-weighted average price (VWAP)
based on the given price and quantity columns. This is defined by the following
formula:
Parameters
price
is any numeric price value.quantity
is any numeric quantity value.
Return value
Return value type is double
.
Examples
SELECT vwap(x, x)
FROM (SELECT x FROM long_sequence(100));
vwap |
---|
67 |
wmid
wmid(bidSize, bidPrice, askPrice, askSize)
- calculates the weighted mid-price
for a sized bid/ask pair.
It is calculated with these formulae:
Parameters
bidSize
is any numeric value representing the size of the bid offer.bidPrice
is any numeric value representing the bidding price.askPrice
is any numeric value representing the asking price.askSize
is any numeric value representing the size of the ask offer.
Return value
Return value type is double
.
Examples
SELECT wmid(100, 5, 6, 100)
wmid |
---|
5.5 |