In this post, I am going to demonstrate how to calculate a moving average in SQL.
You also need to know a bout SQL Window Functions. If you don't, you should read my article here
To illustrate how to calculate a moving average, I'll use the DVDrental database that can be downloaded here
We have a payment table storing all the payments made by our customers.
# \d+ payment
Table "public.payment"
Column | Type |
--------------+-----------------------------+
payment_id | integer |
customer_id | smallint |
staff_id | smallint |
rental_id | integer |
amount | numeric(5,2) |
payment_date | timestamp without time zone |
I want to calculate the moving average of the amount over a 5 days period.
First, I calculate the total amount per day in a Common Table Expression (CTE) using a WITH
statement.
WITH daily_amount AS (
SELECT
payment_date::date AS payment_date
, SUM(amount) AS amount
FROM
payment
GROUP BY
payment_date::date
)
Next, I calculate the moving average on the amount using a window function:
SELECT
payment_date
, amount
, AVG(amount) OVER(ROWS 5 PRECEDING) AS MA5
FROM
daily_amount
;
payment_date | amount | ma5
--------------+---------+-----------------------
2007-02-14 | 116.73 | 116.7300000000000000
2007-02-19 | 1290.90 | 703.8150000000000000
2007-02-20 | 1219.09 | 875.5733333333333333
2007-03-19 | 2617.69 | 1311.1025000000000000
2007-04-26 | 347.21 | 1118.3240000000000000
2007-04-08 | 2227.84 | 1303.2433333333333333
2007-02-15 | 1188.92 | 1481.9416666666666667
2007-04-28 | 2622.73 | 1703.9133333333333333
SQL returns an average for the first four rows. When we calculate a moving average, the first rows should be empty because we don't have enough values to calculate the average.
In my example, I need to return Null for the first four rows. To do so, I'll use a CASE WHEN ELSE
statement. This statement is the SQL equivalent of if then else
.
I also use the ROW_NUMBER()
window function to get the number of each row of the result set. Then if ROW_NUMBER()
is more than 4 then the query should return the average of the amount of the last 5 rows. If the ROW_NUMBER()
is less than 4, the query should return Null
.
Finaly, I round the average to two decimals using the ROUND()
function.
This is how I write the CASE
statement:
CASE
WHEN ROW_NUMBER() OVER() > 4 THEN ROUND(AVG(amount) OVER(ROWS 5 PRECEDING), 2)
ELSE Null
END AS MA5
The Final query looks like this:
WITH cte_daily_amount AS (
SELECT
payment_date::date AS payment_date
, SUM(amount) AS amount
FROM
payment
GROUP BY
payment_date::date
)
SELECT
payment_date
, amount
, CASE
WHEN ROW_NUMBER() OVER() > 4 THEN ROUND(AVG(amount) OVER(ROWS 5 PRECEDING), 2)
ELSE Null
END AS MA5
FROM
cte_daily_amount
;
payment_date | amount | ma5
---------------+---------+---------
2007-02-14 | 116.73 |
2007-02-19 | 1290.90 |
2007-02-20 | 1219.09 |
2007-03-19 | 2617.69 |
2007-04-26 | 347.21 | 1118.32
2007-04-08 | 2227.84 | 1303.24
2007-02-15 | 1188.92 | 1481.94
2007-04-28 | 2622.73 | 1703.91
[...]