Using OLAP Analytic Features

Objectives

After completing this lesson, you will be able to:
  • Introduce OLAP analytic features.
  • Use SQL Group By features.
  • Use window framing in SQL.

OLAP Analytic Features

SQL is a set-oriented language. It is particularly effective when accessing a set of rows, as opposed to a single row at a time.

In most languages, complex data analysis requires that data be accessed in a loop, processing a row at a time. However, ANSI SQL-99 and 2003 include the following features that enable complex data analysis within a SELECT statement:

  • ANSI SQL-99: GROUP BY enhancements

  • ANSI SQL-2003: window framing

SAP HANA supports these SQL features.

GROUP BY Enhancements

Online Analytical Processing (OLAP) GROUP BY enhancements allow application developers to write complex SQL statements to generate valuable results:

  • To generate multiple groupings of data in a single SELECT statement, use GROUP BY GROUPING SETS.

  • To create a sparse multi-dimensional result set that contains multiple levels of aggregation, use GROUP BY ROLLUP.

  • To create a multi-dimensional cube as a result set, use GROUP BY CUBE.

These features are standard with ANSI SQL-99 and are supported by SAP HANA.

Group by Grouping Sets

The GROUPING SETS parameter of the GROUP BY clause generates multiple groupings of data in a single statement.

The following is an example of the syntax to use with the GROUPING SETS parameter:

Code Snippet
1234
SELECT BRAND, COLOR, AVG(HP) FROM TRAINING.CAR WHERE OWNER = 'H03' GROUP BY GROUPING SETS ((BRAND), (COLOR), (BRAND, COLOR));

This query returns the following result:

GROUP BY GROUPING SETS Result

 BRANDCOLORAVG(HP)
1Audi?116
2BMW?184
3Mercedes?170
4Renault?90
5Skoda?136
6VW?120
7?black142
8?blue150
9?red90
10?white170
11Audiblue116
12BMWblue184
13Mercedesblack170
14Mercedeswhite170
15Renaultred90
16Skodablack136
17VWblack120

Rows 1 to 6 of the output are grouped by BRAND.

Rows 7 to 10 of the output are grouped by COLOR.

Rows 11 to 17 of the output are grouped by BRAND and COLOR.

Group By Rollup

The ROLLUP parameter of the GROUP BY clause generates multiple levels of aggregation in a single statement.

The following is an example of the syntax to use with the ROLLUP parameter:

Code Snippet
1234
SELECT BRAND, COLOR, AVG(HP) FROM TRAINING.CAR WHERE OWNER = 'H03' GROUP BY ROLLUP (BRAND, COLOR);

This query returns the following result:

GROUP BY ROLLUP Result

 BRANDCOLORAVG(HP)
1Audiblue116
2BMWblue184
3Mercedesblack170
4Mercedeswhite170
5Renaultred90
6Skodablack136
7VWblack120
8Audi?116
9BMW?184
10Mercedes?170
11Renault?90
12Skoda?136
13VW?120
14??140.857142

Rows 1 to 7 of the output are grouped by BRAND and COLOR.

Rows 8 to 13 of the output are grouped by BRAND only, leaving out the additional level of detail that comes from breaking down by COLOR.

Row 14 of the output is the grand aggregate, leaving out all levels of detail.

Group By Cube

The CUBE parameter of the GROUP BY clause generates multiple levels of aggregation in every dimension.

The following is an example of the syntax to use with the CUBE parameter:

Code Snippet
1234
SELECT BRAND, COLOR, AVG(HP) FROM TRAINING.CAR WHERE OWNER = 'H03' GROUP BY CUBE (BRAND, COLOR);

This query returns the following result:

GROUP BY CUBE Result

 BRANDCOLORAVG(HP)
1Audiblue116
2BMWblue184
3Mercedesblack170
4Mercedeswhite170
5Renaultred90
6Skodablack136
7VWblack120
8?black142
9?blue150
10?red90
11?white170
12Audi?116
13BMW?184
14Mercedes?170
15Renault?90
16Skoda?136
17VW?120
18??140.857142

Rows 1 to 7 of the output are grouped by BRAND and COLOR.

Rows 8 to 11 of the output are grouped by COLOR only.

Rows 12 to 17 of the output are grouped by BRAND only.

Row 18 is the grand aggregate, without any grouping.

Queries that GROUP BY CUBE on three or more columns cause the volume of output to explode as every possible subset of the columns is separately calculated for grouped aggregates. For example, GROUP BY CUBE (A, B, C) would result in aggregates for:

  • Grouping on (A, B, C)
  • Grouping on (A, B)
  • Grouping on (A, C)
  • Grouping on (B, C)
  • Grouping on (A) only
  • Grouping on (B) only
  • Grouping on (C) only
  • Grand aggregate of all rows with no grouping

Window Partitioning

Window Framing

Window framing is an important SQL construct for OLAP operations. It enables users to perform the following tasks:

  • Divide query result sets into groups of rows called partitions.

  • Determine subsets of rows to aggregate with respect to the current row.

Window framing is standard with ANSI SQL:2003 and SAP HANA supports a variety of aggregate functions within window framing.

Window Frame Components

Window frames have the following three components:

  • Partitioning: the division of a user-specified result set (input rows) using the PARTITION BY clause.

  • Ordering: the arrangement of results (rows) within each partition using the WINDOW ORDER clause.

  • Framing: the definition of a moving frame of rows using the BETWEEN clause.

These components are defined within a SELECT statement using the OVER() clause.

Window Partitioning

Window partitioning enables the user to divide user-specified input rows using a PARTITION BY clause.

The partition is defined by one or more value expressions separated by commas. The expression list can consist of items such as columns, operations, and functions. The partition data is implicitly sorted and the default sort order is ascending.

Partitioning is optional. If a window partition clause is not specified, the input rows are treated as a single partition.

The following is an example of the syntax to use with the PARTITION BY clause:

Code Snippet
1
PARTITION BY <WINDOW PARTITION EXPRESSION LIST>

The following is an example of the syntax to use with the PARTITION BY clause within a SELECT statement:

Code Snippet
12
SELECT ... RANK() OVER (PARTITION BY customer_key ...

Window Ordering

Window ordering is the arrangement of results (rows) within each window partition using a WINDOW ORDER clause.

The expression list can consist of one or more expressions separated by commas. They consist of items such as columns, operations, and functions.

Ordering is optional. If a window ordering clause is not specified, the input rows can be processed in an arbitrary order.

The following is an example of the syntax to use with the WINDOW ORDER clause:

Code Snippet
1
<WINDOW ORDER CLAUSE> :: = <ORDER SPECIFICATION>

The following are examples of the syntax to use with the WINDOW ORDER clause within a SELECT statement:

Code Snippet
12345
SELECT ... RANK() OVER (ORDER BY sum(amount) [desc] ... SELECT ... OVER (ORDER BY customer_key ...

Window Ordering Example

The following is an example of the syntax to use with the WINDOW ORDER clause:

Code Snippet
12345
SELECT product_id, SUM(amount), RANK() OVER ( ORDER BY sum(amount) desc) AS RANK FROM TRAINING.SALES_DATA GROUP BY product_id ORDER BY RANK

This query returns the following result:

WINDOW ORDER Result

 PRODUCT_IDSUM(AMOUNT)RANK
12000212,813.81
2200087,8402
3200074,896.33
4200063,189.64
520004483.55
620003472.56

In this example, the data is sorted as follows:

  • Select rows from the SALES_DATA table.

  • Group rows by PRODUCT_ID.

  • Calculate the rank of each group based on the SUM(AMOUNT) aggregate.

  • Order the rows by SUM(AMOUNT) descending.

  • List the PRODUCT_ID, SUM(AMOUNT), and RANK().

Window Ordering and Partitioning Example

The following is an example of the syntax to use with window ordering and partitioning:

Code Snippet
12345
SELECT customer_id, product_id, SUM(amount), RANK() OVER (PARTITION BY customer_id ORDER BY SUM(amount) desc) AS RANK FROM TRAINING.SALES_DATA GROUP BY customer_id, product_id ORDER BY customer_id, rank

This query returns the following result:

Window Ordering and Partitioning Result

 CUSTOMER_IDPRODUCT_IDSUM(AMOUNT)RANK
11000200022,876.41
21000200069002
3100020007564.33
4100020003112.54
510002000467.55
61001200084,7921
71001200071,6722
81001200021,5983
91001200041504
10100120003505
113000200027,322.41
123000200062,289.62
133000200071,5963
143000200032794
153000200041715
163001200083,0481
173001200071,0642
183001200021,0173
19300120004954
20300120003315

In this example, the data is sorted as follows:

  • Select rows from the SALES_DATA table.

  • Partition data by CUSTOMER_ID.

  • When the CUSTOMER_ID value changes, reset the RANK calculation.

  • Within each partition, calculate the rank of each month based on the SUM(AMOUNT).

  • Within each partition, order the rows by SUM(AMOUNT), in descending order.

  • List the CUSTOMER_ID, revenue, and RANK.

  • Order the rows by CUSTOMER_ID and RANK.

Window Framing

Window framing defines the beginning and the ending of the window relative to the current row.

This OLAP function is computed with respect to the contents of a moving frame rather than the complete contents of the input data. Depending on its definition, a partition has a start row and an end row, and the window frame slides from the starting point to the end of the partition.

Window framing can be used for all windows functions except ranking aggregates.

Defining a Window Frame

There are two ways to define a window frame:

  • By rows: specify a number of rows before and/or after the current row.

  • By range: use data values before and/or after the current row.

A window frame uses the BETWEEN keyword to specify the beginning and the end of the frame.

The following is an example of the syntax to use when defining a window frame by rows:

Code Snippet
1
ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW

The following is an example of the syntax to use when defining a window frame by range:

Code Snippet
1
RANGE BETWEEN UNBOUNDED PRECEDING and CURRENT ROW

The row clause refers to the specific number of rows to be included in the window frame. The range clause refers the data values of the column in the ORDER BY clause of the window. In both cases, specific values or keywords may be used.

Range and Row Extension

Ranges and rows can extend between the following:

  • CURRENT ROW, meaning the current row in the query’s result set

  • UNBOUNDED PRECEDING, meaning all rows back to the beginning of the partition

  • UNBOUNDED FOLLOWING, meaning all rows forward to the end of the partition

  • N PRECEDING, where n is the number of rows or range of values before the current row

  • N FOLLOWING, where n is the number of rows or range of values after the current row

Default Window Frame

If the framing clause is omitted, the default window frame is as follows:

  • When the window specification includes an ORDER BY clause, the rows are between unbounded preceding and current row.

  • When the window specification does not include an ORDER BY clause, the rows are between unbounded preceding and unbounded following.

  • If the window frame extent specifies only one of these two values, the other value defaults to current row.

The following is an example of window framing:

Window Framing Example

Column ValueRowsRange
10Unbounded PrecedingUnbounded Preceding
221 Preceding8 Preceding
30CURRENT ROW 
351 Following5 Following
50Unbounded FollowingUnbounded Following

The table summarizes the result returned for each framing clause:

Window Framing Example Result

Framing ClauseResult
Rows betweenunbounded preceding and current row10, 22, and 30
1 preceding and current row22 and 30
1 preceding and 1 following22, 30, and 35
unbounded preceding and unbounded followingAll rows
1 preceding and 1 preceding22
Range betweenunbounded preceding and current row10, 22, and 30
1 preceding and current row30
10 preceding and current row22 and 30
10 preceding and 10 following22, 30, and 35
current row and current row30

Cumulative Total Example

The following is an example of the syntax to use to return a cumulative total:

Code Snippet
1234
SELECT product_id, amount, SUM(amount) OVER (ORDER BY product_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "CUMULATIVE TOTAL" FROM TRAINING.SALES_DATA ORDER BY product_id;

This query returns the following result:

Cumulative Total Example Result

 PRODUCT_IDAMOUNTCUMULATIVE TOTAL
1200021,0171,017
2200027,322.48,339.4
3200021,5989,937.4
4200022,876.412,813.8
5200033112,844.8
6200035012,894.8
720003112.513,007.3
82000327913,286.3
92000417113,769.8
102000415013,598.8
112000467.513,448.8
12200049513,381.3
13200062,289.616,059.4
142000690016,959.4
15200071,59618,555.4
16200071,67220,227.4
1720007564.320,791.7
18200071,06421,855.7
19200084,79226,647.7
20200083,04829,695.7

This example calculates the cumulative total of amounts for all products. For each row, the amount column is summed from the first row (unbounded preceding) to the current row. For example, the cumulative total for row 2 (8,339.4) is calculated by adding the amount for row 2 (1,017) to the amount for row 1 (7,322.4).

Cumulative Total and Partitioning Example

The following is an example of the syntax to use to return a cumulative total with partitioning:

Code Snippet
12345
SELECT customer_id, product_id, amount, SUM(amount) OVER (PARTITION BY customer_id ORDER BY product_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "CUMULATIVE TOTAL" FROM TRAINING.SALES_DATA ORDER BY customer_id, product_id;

This query returns the following result:

Cumulative Total With Partitioning Example Result

 CUSTOMER_IDPRODUCT_IDAMOUNTCUMULATIVE TOTAL
11000200022,876.42,876.4
2100020003112.52,988.9
310002000467.53,056.4
41000200069003,956.4
5100020007564.34,520.7
61001200021,5981,598
7100120003501,648
81001200041501,798
91001200071,6723,470
101001200084,7928,262
113000200027,322.47,322.4
123000200032797,601.4
133000200041717,772.4
143000200062,289.610,062
153000200071,59611,658
163001200021,0171,017
17300120003311,048
18300120004951,143
193001200071,0642,207
203001200083,0485,255

This example calculates the cumulative total for the first three months on a customer-by-customer basis. When the customer ID value increments, the cumulative total is reset.

Moving Average Example

The following is an example of the syntax to use to return a moving average:

Code Snippet
1234
SELECT order_id, amount, ROUND(AVG(amount) OVER (ORDER BY order_id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)) AS "MOVING AVG" FROM TRAINING.BII_ORDERS ORDER BY order_id;

This query returns the following result:

Moving Average Example Result

 ORDER_IDAMOUNTMOVING AVG
129,4012,4522,452
229,4023,372.72,912
329,4037,2664,364
429,4041,1353,925
529,4053272,909
629,4063,5391,667
729,4072,0781,981
829,4081,2852,301
929,4092,6682,010
1029,4103,9542,636
1129,4114,8803,834
1229,4122,6123,815
1329,4136,7124,735
1429,4147,0335,452
1529,4151,3445,030

This example calculates the moving average of the amounts for the previous two orders and the current order. This is a three-order moving average.

For rows 1 and 2, there are not yet three orders to average together. In row 2, the average of row 1 and 2 is calculated. It is only from row 3 on that the average over three rows is calculated.

Note

The values in the MOVING AVG column have been rounded up to avoid unsightly decimal places.

Difference Example

The following is an example of the syntax to use to return a difference:

Code Snippet
12345
SELECT stock, period, stock_price, stock_price – sum(stock_price) OVER (ORDER BY period ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)) AS "DIFFERENCE" FROM TRAINING.STOCK_PRICES WHERE stock = ‘SAP’ ORDER BY stock, period;

This query returns the following result:

Difference Example Result

 STOCKPERIODSTOCK_PRICEDIFFERENCE
1SAP1/201792.52?
2SAP2/201794.642.12
3SAP3/201798.173.53
4SAP4/2017100.182.01
5SAP5/2017108.608.42
6SAP6/2017104.67-3.93
7SAP7/2017107.362.69
8SAP8/2017105.34-2.02
9SAP9/2017109.654.31
10SAP10/2017114.915.26
11SAP11/2017111.17-3.74
12SAP12/2017112.581.41

This example displays the current stock price and the difference between the current price and the previous price.

The difference is calculated by subtracting the stock price of the previous row from the stock price of the current row. A positive value indicates an increase and a negative value indicates a decrease.

In row 1, as there is no previous row value to subtract, a null value (?) is displayed in the DIFFERENCE column.