Postgres Window Functions

So, you have this kind of data in the PostgresSQL database and you need to select last change that happened to email for the last 2 hours:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
users=# SELECT * from users;
 version | user_id |        email        |          last_update
---------+---------+---------------------+-------------------------------
       1 | mike    |                     | 2022-02-04 19:27:25.621633+00
       2 | mike    |                     | 2022-02-04 19:27:40.912754+00
       3 | tom     |                     | 2022-02-04 19:28:25.246051+00
       4 | david   |                     | 2022-02-04 19:28:30.097674+00
       5 | ann     |                     | 2022-02-04 19:28:35.561786+00
       6 | david   |                     | 2022-02-04 19:28:51.056749+00
       7 | tom     |                     | 2022-02-04 19:28:53.749841+00
       8 | ann     |                     | 2022-02-04 19:28:55.668205+00
       9 | mike    |                     | 2022-02-04 19:29:07.120897+00
      10 | mike    | mik@example.org     | 2022-02-04 19:30:51.29083+00
      11 | tom     | tom123@example.org  | 2022-02-04 19:31:30.433945+00
      12 | david   | david_s@example.org | 2022-02-04 19:32:45.260783+00
      13 | mike    | mike@example.org    | 2022-02-04 19:32:52.007771+00
      14 | tom     | tom2000@example.org | 2022-02-04 19:33:02.616984+00
      15 | ann     |                     | 2022-02-04 19:33:17.054711+00
      16 | tom     | tom2100@example.org | 2022-02-04 19:44:17.080066+00
      17 | tom     | tom2200@example.org | 2022-02-04 19:44:29.755892+00
      18 | tom     | tom2200@example.org | 2022-02-04 20:10:00.587856+00
      19 | tom     | tom2200@example.org | 2022-02-04 20:10:01.712654+00
(19 rows)

We will start with finding previous email. Postgresql window functions should work nicely here:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
users=# SELECT version, user_id, email, last_update,
users-# LEAD(email) OVER (ORDER BY last_update DESC) AS prev_email FROM users;
 version | user_id |        email        |          last_update          |     prev_email
---------+---------+---------------------+-------------------------------+---------------------
      19 | tom     | tom2200@example.org | 2022-02-04 20:10:01.712654+00 | tom2200@example.org
      18 | tom     | tom2200@example.org | 2022-02-04 20:10:00.587856+00 | tom2200@example.org
      17 | tom     | tom2200@example.org | 2022-02-04 19:44:29.755892+00 | tom2100@example.org
      16 | tom     | tom2100@example.org | 2022-02-04 19:44:17.080066+00 |
      15 | ann     |                     | 2022-02-04 19:33:17.054711+00 | tom2000@example.org
      14 | tom     | tom2000@example.org | 2022-02-04 19:33:02.616984+00 | mike@example.org
      13 | mike    | mike@example.org    | 2022-02-04 19:32:52.007771+00 | david_s@example.org
      12 | david   | david_s@example.org | 2022-02-04 19:32:45.260783+00 | tom123@example.org
      11 | tom     | tom123@example.org  | 2022-02-04 19:31:30.433945+00 | mik@example.org
      10 | mike    | mik@example.org     | 2022-02-04 19:30:51.29083+00  |
       9 | mike    |                     | 2022-02-04 19:29:07.120897+00 |
       8 | ann     |                     | 2022-02-04 19:28:55.668205+00 |
       7 | tom     |                     | 2022-02-04 19:28:53.749841+00 |
       6 | david   |                     | 2022-02-04 19:28:51.056749+00 |
       5 | ann     |                     | 2022-02-04 19:28:35.561786+00 |
       4 | david   |                     | 2022-02-04 19:28:30.097674+00 |
       3 | tom     |                     | 2022-02-04 19:28:25.246051+00 |
       2 | mike    |                     | 2022-02-04 19:27:40.912754+00 |
       1 | mike    |                     | 2022-02-04 19:27:25.621633+00 |
(19 rows)

Using LEAD() function we got prev_email column that contains email value of previous row. But it is far from acceptable result because multiple users are mixed up in the same table. You don’t want to compare apples to oranges, right?

Let’s add grouping by user_id using PARTITION BY option:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
users=# SELECT version, user_id, email, last_update,
users-# LEAD(email) OVER (PARTITION BY user_id ORDER BY last_update DESC) AS prev_email FROM users;
 version | user_id |        email        |          last_update          |     prev_email
---------+---------+---------------------+-------------------------------+---------------------
      15 | ann     |                     | 2022-02-04 19:33:17.054711+00 |
       8 | ann     |                     | 2022-02-04 19:28:55.668205+00 |
       5 | ann     |                     | 2022-02-04 19:28:35.561786+00 |
      12 | david   | david_s@example.org | 2022-02-04 19:32:45.260783+00 |
       6 | david   |                     | 2022-02-04 19:28:51.056749+00 |
       4 | david   |                     | 2022-02-04 19:28:30.097674+00 |
      13 | mike    | mike@example.org    | 2022-02-04 19:32:52.007771+00 | mik@example.org
      10 | mike    | mik@example.org     | 2022-02-04 19:30:51.29083+00  |
       9 | mike    |                     | 2022-02-04 19:29:07.120897+00 |
       2 | mike    |                     | 2022-02-04 19:27:40.912754+00 |
       1 | mike    |                     | 2022-02-04 19:27:25.621633+00 |
      19 | tom     | tom2200@example.org | 2022-02-04 20:10:01.712654+00 | tom2200@example.org
      18 | tom     | tom2200@example.org | 2022-02-04 20:10:00.587856+00 | tom2200@example.org
      17 | tom     | tom2200@example.org | 2022-02-04 19:44:29.755892+00 | tom2100@example.org
      16 | tom     | tom2100@example.org | 2022-02-04 19:44:17.080066+00 | tom2000@example.org
      14 | tom     | tom2000@example.org | 2022-02-04 19:33:02.616984+00 | tom123@example.org
      11 | tom     | tom123@example.org  | 2022-02-04 19:31:30.433945+00 |
       7 | tom     |                     | 2022-02-04 19:28:53.749841+00 |
       3 | tom     |                     | 2022-02-04 19:28:25.246051+00 |
(19 rows)

That’s better already, you can see user groups. But we need to return last changes per user somehow. We have another window function that can be of a help:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
users=# SELECT version, user_id, email, last_update,
users-# RANK() OVER (PARTITION BY user_id ORDER BY last_update DESC) AS rank_user_id,
users-# LEAD(email) OVER (PARTITION BY user_id ORDER BY last_update DESC) AS prev_email FROM users;
 version | user_id |        email        |          last_update          | rank_user_id |     prev_email      
---------+---------+---------------------+-------------------------------+--------------+---------------------
      15 | ann     |                     | 2022-02-04 19:33:17.054711+00 |            1 | 
       8 | ann     |                     | 2022-02-04 19:28:55.668205+00 |            2 | 
       5 | ann     |                     | 2022-02-04 19:28:35.561786+00 |            3 | 
      12 | david   | david_s@example.org | 2022-02-04 19:32:45.260783+00 |            1 | 
       6 | david   |                     | 2022-02-04 19:28:51.056749+00 |            2 | 
       4 | david   |                     | 2022-02-04 19:28:30.097674+00 |            3 | 
      13 | mike    | mike@example.org    | 2022-02-04 19:32:52.007771+00 |            1 | mik@example.org
      10 | mike    | mik@example.org     | 2022-02-04 19:30:51.29083+00  |            2 | 
       9 | mike    |                     | 2022-02-04 19:29:07.120897+00 |            3 | 
       2 | mike    |                     | 2022-02-04 19:27:40.912754+00 |            4 | 
       1 | mike    |                     | 2022-02-04 19:27:25.621633+00 |            5 | 
      19 | tom     | tom2200@example.org | 2022-02-04 20:10:01.712654+00 |            1 | tom2200@example.org
      18 | tom     | tom2200@example.org | 2022-02-04 20:10:00.587856+00 |            2 | tom2200@example.org
      17 | tom     | tom2200@example.org | 2022-02-04 19:44:29.755892+00 |            3 | tom2100@example.org
      16 | tom     | tom2100@example.org | 2022-02-04 19:44:17.080066+00 |            4 | tom2000@example.org
      14 | tom     | tom2000@example.org | 2022-02-04 19:33:02.616984+00 |            5 | tom123@example.org
      11 | tom     | tom123@example.org  | 2022-02-04 19:31:30.433945+00 |            6 | 
       7 | tom     |                     | 2022-02-04 19:28:53.749841+00 |            7 | 
       3 | tom     |                     | 2022-02-04 19:28:25.246051+00 |            8 | 
(19 rows)

RANK() function allows to filter out older rows that are not required and do this per user group. We will use rank_user_id column to select required changes. Subquery is used to filter the data we need by rank, changes in email and last_update timestamp:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
users=# SELECT version, user_id, email, prev_email, last_update
FROM (
SELECT version, user_id, email, last_update,
RANK() OVER (PARTITION BY user_id ORDER BY last_update DESC) AS rank_user_id,
LEAD(email) OVER (PARTITION BY user_id ORDER BY last_update DESC) AS prev_email FROM users
) dt
WHERE dt.rank_user_id < 2 AND email IS NOT NULL
AND email IS DISTINCT FROM prev_email
AND last_update::timestamp > (now() - interval '2 hour')
ORDER BY last_update DESC;
 version | user_id |        email        |   prev_email    |          last_update
---------+---------+---------------------+-----------------+-------------------------------
      13 | mike    | mike@example.org    | mik@example.org | 2022-02-04 19:32:52.007771+00
      12 | david   | david_s@example.org |                 | 2022-02-04 19:32:45.260783+00
(2 rows)

PostgreSQL has other window functions, more can be found here: https://www.postgresql.org/docs/current/tutorial-window.html

Licensed under CC BY-NC-SA 4.0
Built with Hugo
Theme Stack designed by Jimmy