SQL 定长窗口(Sliding Window)是一种使用窗口函数来处理一段固定范围内的数据。这种方式可以对一定范围内的数据进行聚合或分析,并且窗口会随着数据的行逐步滑动。
在 SQL 中,窗口函数常与 OVER()
子句一起使用,定义一个窗口的大小和范围。定长窗口可以根据行数或时间范围进行滑动,下面我将详细讲解定长窗口的语法及用法,并举例说明。
1. 基本语法
窗口函数的语法:
<window_function> OVER ([PARTITION BY partition_column][ORDER BY order_column][ROWS or RANGE BETWEEN <frame_start> AND <frame_end>]
)
<window_function>
:如SUM()
,AVG()
,COUNT()
,ROW_NUMBER()
,RANK()
等。PARTITION BY
:用于将数据按照某一列分组(类似GROUP BY
的作用)。ORDER BY
:用于定义窗口中的排序规则,窗口会基于这个顺序进行计算。ROWS or RANGE BETWEEN
:用于定义窗口的范围。
-
ROWS
:基于行数定义窗口长度。RANGE
:基于值(如时间、数值)定义窗口长度。BETWEEN <frame_start> AND <frame_end>
:指定窗口的起点和终点。
2. 定长窗口的类型
A. 基于行数的定长窗口
- 使用
ROWS BETWEEN
语法,窗口根据行数定义长度。例如,当前行和前 4 行构成一个 5 行的窗口。
示例:计算当前行与前 4 行的销售总和(包括当前行)。
SELECTorder_id,order_amount,SUM(order_amount) OVER (ORDER BY order_id ROWS BETWEEN 4 PRECEDING (AND CURRENT ROW)-- AND CURRENT ROW可以省略) AS rolling_sum
FROMOrders;
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
:表示当前行和前 4 行一起计算(共 5 行的数据)。
B. 基于时间范围的定长窗口
- 使用
RANGE BETWEEN
语法,窗口根据时间范围定义长度。例如,计算当前行及前 7 天的数据。
示例:计算过去 7 天的销售总和(包括当天)。
SELECTorder_date,order_amount,SUM(order_amount) OVER (ORDER BY order_date RANGE BETWEEN INTERVAL 7 DAY PRECEDING (AND CURRENT ROW)-- AND CURRENT ROW可以省略) AS rolling_sum
FROMOrders;
RANGE BETWEEN INTERVAL 7 DAY PRECEDING AND CURRENT ROW
:表示从当前行往前 7 天的范围,计算销售额总和。
3. 详细语法解析
A. PARTITION BY
PARTITION BY
用于将数据划分为多个窗口,每个分区独立计算窗口函数结果。类似于GROUP BY
,但它不会聚合数据,只是划分数据。
示例:根据 region
划分不同的分区,并计算每个分区内前 4 行的滚动平均值。
SELECTregion,order_id,order_amount,AVG(order_amount) OVER (PARTITION BY region ORDER BY order_id ROWS BETWEEN 4 PRECEDING AND CURRENT ROW -- AND CURRENT ROW可以省略) AS rolling_avg
FROMOrders;
PARTITION BY region
:将数据按照region
列进行分区,每个分区单独计算滚动平均值。
B. ORDER BY
ORDER BY
用于定义窗口的排序方式。窗口函数会按照指定列的顺序滑动窗口,逐行计算。
示例:按 order_date
排序,并计算当前行和前 4 行的销售总和。
SELECTorder_id,order_date,order_amount,SUM(order_amount) OVER (ORDER BY order_date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW -- AND CURRENT ROW可以省略) AS rolling_sum
FROMOrders;
C. ROWS BETWEEN 和 RANGE BETWEEN
ROWS BETWEEN
:基于行的偏移量定义窗口的范围。例如,ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
表示当前行及之前 4 行的数据构成窗口。RANGE BETWEEN
:基于值或时间间隔定义窗口的范围。例如,RANGE BETWEEN INTERVAL 7 DAY PRECEDING AND CURRENT ROW
表示过去 7 天(包括当天)构成窗口。
4. 举例说明
示例 1:基于行数的滚动总和(滑动窗口)
计算每个订单及其前 2 行的滚动总和。
SELECTorder_id,order_amount,SUM(order_amount) OVER (ORDER BY order_id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW -- AND CURRENT ROW可以省略) AS rolling_sum
FROMOrders;
结果:
order_id | order_amount | rolling_sum |
1 | 100 | 100 |
2 | 200 | 300 |
3 | 150 | 450 |
4 | 250 | 600 |
示例 2:基于时间的滚动平均值
计算每一天及其前 7 天的销售平均值。
SELECTorder_date,order_amount,AVG(order_amount) OVER (ORDER BY order_date RANGE BETWEEN INTERVAL 7 DAY PRECEDING AND CURRENT ROW -- AND CURRENT ROW可以省略) AS rolling_avg
FROMOrders;
结果:
order_date | order_amount | rolling_avg |
2023-01-01 | 100 | 100 |
2023-01-02 | 200 | 150 |
2023-01-03 | 150 | 150 |
2023-01-08 | 250 | 175 |
示例 3:分区内计算滚动总和
按 region
分区,并计算每个分区内订单的滚动总和。
SELECTregion,order_id,order_amount,SUM(order_amount) OVER (PARTITION BY region ORDER BY order_id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW -- AND CURRENT ROW可以省略) AS rolling_sum
FROMOrders;
结果:
region | order_id | order_amount | rolling_sum |
North | 1 | 100 | 100 |
North | 2 | 200 | 300 |
North | 3 | 150 | 450 |
South | 4 | 250 | 250 |
South | 5 | 300 | 550 |
结束点的几种方式:
1. 指定窗口结束点为当前行之后的第 N 行 (N FOLLOWING
)
例如:计算当前行和接下来 2 行的总和。
SELECTorder_id,order_amount,SUM(order_amount) OVER (ORDER BY order_id ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS rolling_sum
FROMOrders;
ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
:从当前行到接下来的第 2 行,共 3 行的数据。
2. 指定窗口结束点为无界后 (UNBOUNDED FOLLOWING
)
这个语法用于定义从当前行开始,一直扩展到表的最后一行。
SELECTorder_id,order_amount,SUM(order_amount) OVER (ORDER BY order_id ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS total_sum
FROMOrders;
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
:从当前行到表的最后一行。
3. 指定窗口结束点为无界之前 (UNBOUNDED PRECEDING
)
用于定义从第一行开始,一直到当前行。
SELECTorder_id,order_amount,SUM(order_amount) OVER (ORDER BY order_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sum
FROMOrders;
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
:从第一行到当前行。
4. 指定窗口结束点为 N 行之前 (N PRECEDING
)
例如:计算当前行之前的 5 行数据。
SELECTorder_id,order_amount,SUM(order_amount) OVER (ORDER BY order_id ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS prev_sum
FROMOrders;
ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING
:从当前行之前的第 5 行到之前的第 1 行(不包含当前行)。
总结:
- 定长窗口 可以基于行数或时间范围定义,适用于滚动总和、滚动平均等场景。
- 使用
ROWS BETWEEN
可以精确控制行数的范围,使用RANGE BETWEEN
可以基于数值或时间范围定义窗口。 PARTITION BY
和ORDER BY
是常见的窗口函数参数,用于分区和排序数据。
这些功能使得 SQL 能够轻松处理数据的复杂分析任务。