In this post, I am going to demonstrate how to calculate a moving average in SQL.

You also need to know a bout SQL Window Functions. If you don't, you should read my article here

To illustrate how to calculate a moving average, I'll use the DVDrental database that can be downloaded here

We have a payment table storing all the payments made by our customers.

# \d+ payment
          Table "public.payment"
    Column    |            Type             |
--------------+-----------------------------+
 payment_id   | integer                     |
 customer_id  | smallint                    |
 staff_id     | smallint                    |
 rental_id    | integer                     |
 amount       | numeric(5,2)                |
 payment_date | timestamp without time zone |

I want to calculate the moving average of the amount over a 5 days period.

First, I calculate the total amount per day in a Common Table Expression (CTE) using a WITH statement.

WITH daily_amount AS (
    SELECT 
        payment_date::date AS payment_date
        , SUM(amount) AS amount
    FROM 
        payment
    GROUP BY
        payment_date::date
)

Next, I calculate the moving average on the amount using a window function:

SELECT 
    payment_date
    , amount
    , AVG(amount) OVER(ROWS 5 PRECEDING) AS MA5
FROM 
    daily_amount
;
 payment_date | amount  |          ma5          
--------------+---------+-----------------------
 2007-02-14   |  116.73 |  116.7300000000000000
 2007-02-19   | 1290.90 |  703.8150000000000000
 2007-02-20   | 1219.09 |  875.5733333333333333
 2007-03-19   | 2617.69 | 1311.1025000000000000
 2007-04-26   |  347.21 | 1118.3240000000000000
 2007-04-08   | 2227.84 | 1303.2433333333333333
 2007-02-15   | 1188.92 | 1481.9416666666666667
 2007-04-28   | 2622.73 | 1703.9133333333333333

SQL returns an average for the first four rows. When we calculate a moving average, the first rows should be empty because we don't have enough values to calculate the average. In my example, I need to return Null for the first four rows. To do so, I'll use a CASE WHEN ELSE statement. This statement is the SQL equivalent of if then else. I also use the ROW_NUMBER() window function to get the number of each row of the result set. Then if ROW_NUMBER() is more than 4 then the query should return the average of the amount of the last 5 rows. If the ROW_NUMBER() is less than 4, the query should return Null. Finaly, I round the average to two decimals using the ROUND() function. This is how I write the CASE statement:

CASE 
    WHEN ROW_NUMBER() OVER() > 4 THEN ROUND(AVG(amount) OVER(ROWS 5 PRECEDING), 2)
    ELSE Null
END AS MA5 

The Final query looks like this:

  WITH cte_daily_amount AS (
    SELECT 
        payment_date::date AS payment_date
        , SUM(amount) AS amount
    FROM 
        payment
    GROUP BY
        payment_date::date
)
 SELECT 
    payment_date
    , amount
    , CASE 
        WHEN ROW_NUMBER() OVER() > 4 THEN ROUND(AVG(amount) OVER(ROWS 5 PRECEDING), 2)
        ELSE Null
      END AS MA5 
 FROM 
    cte_daily_amount
 ;
  payment_date | amount  |   ma5   
---------------+---------+---------
 2007-02-14    |  116.73 |        
 2007-02-19    | 1290.90 |        
 2007-02-20    | 1219.09 |        
 2007-03-19    | 2617.69 |        
 2007-04-26    |  347.21 | 1118.32
 2007-04-08    | 2227.84 | 1303.24
 2007-02-15    | 1188.92 | 1481.94
 2007-04-28    | 2622.73 | 1703.91
[...]