Aggregating Data

Objectives

After completing this lesson, you will be able to:
  • List the most important aggregate functions supported by HANA and use them to determine aggregated values on table columns.
  • Calculate grouped aggregates.
  • Filter groups using the HAVING clause.

Calculations Across Multiple Rows: Aggregation

Function expressions allow values within columns to be calculated based on values within other columns on a row-by-row basis. But in many cases, you may want to calculate a single value across the values of multiple rows in one or more columns. You can use aggregate expressions for this purpose.

The following table lists a selection of commonly-used aggregate functions supported by SAP HANA. For other, less-commonly used aggregate functions, please refer to SAP HANA documentation.

Aggregate Expressions Supported by SAP HANA

Aggregate NameDescription
COUNTReturns the number of rows.
MIN, MAXReturns the minimum or maximum value of an input column with a numeric data type.
SUM, AVGReturns the sum or the arithmetic mean value of an input column with a numeric data type.

COUNT(*) Expression

You can calculate the number of rows in the result set using COUNT(*). Rows containing only NULL values are included.

What is the quantity of cars with the brand Audi?

Code Snippet
123
SELECT COUNT(*) FROM Car WHERE Brand = 'Audi';
COUNT(*)
4

COUNT(<column>) Expressions

You can calculate the number of values within a single column. NULL values are not included.

You can only use a single column as parameter of COUNT().

  • How many cars are registered to an owner?
  • This does not calculate the number of different owners!
Code Snippet
12
SELECT COUNT(Owner) FROM Car;
COUNT(OWNER)
18

COUNT(DISTINCT <column>) Expression

You can calculate the number of distinct, non-NULL values of a certain column. You can use only a single column as parameter for COUNT (DISTINCT). NULL values are not included.

How many different owners have a registered car?

Code Snippet
12
SELECT COUNT(DISTINCT Owner) FROM Car;
COUNT(DISTINCT OWNER)
8

Differences between the COUNT Expressions

SQL queries with tabular results, showing the differences between the COUNT expressions.

The DISTINCT modifier may also be used with the SUM or AVERAGE aggregates, but not with the MAX or MIN aggregates.

MIN and MAX Expressions

You can calculate the minimum or maximum value in a column. What is the horsepower range of the registered cars?

Code Snippet
12
SELECT MIN(HP), MAX(HP) FROM Car;
MIN(HP)MAX(HP)
75260

Combining Aggregate and Function Expressions

You can combine aggregate expressions and normal functions. In which year was the youngest owner born?

Code Snippet
12
SELECT MAX(YEAR(Birthday)) AS Year FROM Owner;
Code Snippet
12
SELECT YEAR(MAX(Birthday)) AS Year FROM Owner;
YEAR
1986

Note that in general, the order in which functions and aggregate expressions are nested matters. As an example, the following two SELECT statements lead to different results:

Code Snippet
12
SELECT ABS( MAX(0-HP) ) FROM Car; -- Result: 75 SELECT MAX( ABS(0-HP) ) FROM Car; -- Result: 260

Aggregate Expressions

The WHERE clause may be included with aggregate functions. Aggregates are evaluated after the WHERE condition.

What is the horsepower range of BMWs?

Code Snippet
123
SELECT MIN(HP), MAX(HP) FROM Car WHERE Brand = 'BMW';
MIN(HP)MAX(HP)
140184

SUM(<column>) and AVG(<column>) Expressions

You can add up the values in a column or determine their arithmetic mean. The WHERE clause is evaluated before aggregation.

Individual NULL values contained in the column do not result in NULL for the sum or average (as long as there is at least one non-NULL value).

What is the sum and the mean of horsepower for all Audis?

Code Snippet
123
SELECT SUM(HP), AVG(HP) FROM Car WHERE Brand = 'Audi';
SUM(HP)AVG(HP)
744186

The GROUP BY Clause

The examples so far have all calculated a single value across all rows of a table that satisfied the WHERE condition of the SELECT statement. Sometimes, you may need such aggregations, not for all rows in total, but to produce different values for different subsets. This is where grouping and the GROUP BY clause comes into play.

Note

In SAP HANA SQL when combining aggregated and non-aggregated columns in a SELECT, you must explicitly include each column that is not used in an aggregate expression in the GROUP BY clause.

Treatment of NULL Values

NULL values in the WHERE column are treated as normal values creating a single group.

How often does each unique overtime value occur?

Code Snippet
123
SELECT Overtime, COUNT(*) AS Frequency FROM Official GROUP BY Overtime;
OVERTIMEFREQUENCY
103
201
?3
181
221

Grouping and Sorting

You can combine GROUP BY with ORDER BY for sorting.

Code Snippet
1234
SELECT Brand, MAX(HP) FROM Car GROUP BY Brand ORDER BY 2 DESC, 1 ASC;
BRANDMAX(HP)
Audi260
BMW184
Mercedes170
VW160
Renault136
Skoda136
Opel120
Fiat75

Combining Functions and Grouping

You can use functions in the GROUP BY clause.

What is the number of owners per year of birth?

Code Snippet
1234
SELECT YEAR(Birthday), COUNT(*) FROM Owner GROUP BY YEAR(Birthday) ORDER BY 2 DESC, 1 ASC;
YEAR(BIRTHDAY)COUNT(*)
?3
19863
19341
19521
19571
19661

Grouping by Several Columns

You can use a combination of columns in the GROUP BY clause.

Code Snippet
123
SELECT Brand, Color, COUNT(*) FROM Car GROUP BY Brand, Color;
BRANDCOLORCOUNT(*)
Fiatred1
VWblack3
BMWblue1
Mercedeswhite2
Mercedesblack1
Audiyellow1
Audiblue1
Skodared2
BMWblack1
BMWgreen1
Renaultred2
Skodablack1
Opelgreen1
Audiorange1
Audigreen1

The HAVING Clause

When using grouping, you can discard some of the resulting groups in a way similar to how you exclude rows from a non-aggregated result set. The corresponding keyword is HAVING. The HAVINGclause behaves similar to the WHERE clause, but affects the resulting groups instead of the rows when calculating the aggregate values.

HAVING

Using the HAVING clause you can specify which conditions a group must meet to be included in the result set.

  • Note that the HAVING clause is evaluated after the aggregate groups are calculated.
  • Which combinations of brand and color occur at least twice?
Code Snippet
1234
SELECT Brand, Color, COUNT(*) FROM Car GROUP BY Brand, Color HAVING COUNT(*) >= 2;
BRANDCOLORCOUNT(*)
VWblack3
Mercedeswhite2
Skodared2
Renaultred2

SELECT statement

  • What is the number of cars per brand that are black or red?
  • Rename the Brand column to "Manufacturer".
  • Display only those manufacturers with at least two cars.
  • Sort the result set first – descending by number of cars.
  • Sort the result set second – ascending by manufacturer.
Code Snippet
123456
SELECT c.Brand AS "Manufacturer", COUNT(*) FROM Car c WHERE Color IN ('black', 'red') GROUP BY Brand HAVING COUNT(*) >= 2 ORDER BY 2 DESC, Brand ASC;
ManufacturerCOUNT(*)
Skoda3
VW3
Renault2

Note

The HAVING clause is one of the most frequently-misunderstood elements in the SQL language. It requires extensive practice in order to fully understand its behavior.