Herong's Tutorial Notes on SQL
Dr. Herong Yang, Version 3.02

Select Statements



GROUP BY Clause

"GROUP BY clause" modifies the base table by collecting the original rows into group rows. All original rows in a group have identical combined values in the specified group columns. In other words, each resulting row represents one group of original rows that share a unique combination of values in the group columns. The group columns are the only original columns that survive as ordinary columns; every other column can only be accessed through a group function (see Rule 1). Group rows can also be filtered out by an optional HAVING condition. "GROUP BY clause" syntax is:

GROUP BY group_columns [HAVING having_condition]

where "group_columns" is a list of columns in the original base table, and "having_condition" is a predicate that evaluates to true or false for each group row. Group rows for which it is not true are discarded.
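The grouping step can be modeled in a few lines of Python, which makes the "identical combined values" rule concrete: each group is keyed by the tuple of its group column values. This is a conceptual sketch only, not how any particular database engine implements GROUP BY. The function group_by, its having parameter (a Python predicate standing in for the HAVING condition), and the sample rows are all hypothetical.

```python
from collections import defaultdict


def group_by(rows, group_columns, having=None):
    """Hypothetical model of GROUP BY ... HAVING.

    Collects original rows into groups keyed by the combined values of the
    group columns, then drops groups that fail the optional having predicate.
    """
    groups = defaultdict(list)
    for row in rows:
        # The key is the combination of all group column values.
        key = tuple(row[col] for col in group_columns)
        groups[key].append(row)
    if having is not None:
        # The predicate sees a whole group, like a HAVING condition.
        groups = {key: members for key, members in groups.items() if having(members)}
    return dict(groups)


# Hypothetical sample rows.
rows = [
    {"Name": "Ann", "Department": "Sales", "Salary": 50000},
    {"Name": "Bob", "Department": "Sales", "Salary": 60000},
    {"Name": "Cy", "Department": "IT", "Salary": 70000},
]

# Roughly: GROUP BY Department HAVING COUNT(Name) > 1
groups = group_by(rows, ["Department"], having=lambda members: len(members) > 1)
for key, members in groups.items():
    print(key, len(members), max(r["Salary"] for r in members))
```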

Rule 1: Only two kinds of values can be used in the select expressions: 1. group columns; 2. group functions applied to any original columns. The group functions are:

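  • COUNT(column): The number of original rows in the group represented by this resulting row whose value in the specified column is not NULL. COUNT(*) counts all original rows in the group, so COUNT(column) returns the same number regardless of the specified column only when that column contains no NULL values.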
  • SUM(column): The sum of all values of the specified column in the group represented by this resulting row.
  • MIN(column): The minimum value of the specified column in the group represented by this resulting row.
  • MAX(column): The maximum value of the specified column in the group represented by this resulting row.
  • AVG(column): The average value of the specified column in the group represented by this resulting row.
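The NULL behavior of the group functions is easy to check with Python's built-in sqlite3 module and an in-memory SQLite database. The Employee table and its rows are hypothetical sample data. A single group is formed, and Bob's missing salary (NULL) is skipped by COUNT(Salary), SUM, MIN, MAX, and AVG, while COUNT(*) still counts his row. Note that SQLite's AVG() returns a floating-point value.

```python
import sqlite3

# In-memory database with a hypothetical Employee table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Department TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [
        ("Ann", "Sales", 50000),
        ("Bob", "Sales", None),  # salary not recorded: stored as NULL
        ("Cy", "Sales", 70000),
    ],
)

row = conn.execute(
    "SELECT COUNT(*), COUNT(Salary), SUM(Salary), MIN(Salary), MAX(Salary), AVG(Salary)"
    " FROM Employee GROUP BY Department"
).fetchone()
print(row)  # (3, 2, 120000, 50000, 70000, 60000.0)
```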

For example, the following statement produces a salary statistics report per department:

SELECT Department, COUNT(Name) AS NumberOfEmployees, 
 MIN(Salary) AS MinimumSalary, MAX(Salary) AS MaximumSalary, 
 AVG(Salary) AS AverageSalary
FROM Employee WHERE Status='Active' GROUP BY Department
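The department report can be tried against a small hypothetical Employee table, again using Python's sqlite3 module with an in-memory database. The sample rows are made up, and ORDER BY Department has been appended only to make the output order predictable, since SQL does not guarantee the order of group rows without it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Name TEXT, Department TEXT, Salary INTEGER, Status TEXT)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        ("Ann", "Sales", 50000, "Active"),
        ("Bob", "Sales", 60000, "Active"),
        ("Cy", "IT", 70000, "Active"),
        ("Di", "IT", 90000, "Retired"),  # excluded by the WHERE clause
    ],
)

report = conn.execute(
    """SELECT Department, COUNT(Name) AS NumberOfEmployees,
              MIN(Salary) AS MinimumSalary, MAX(Salary) AS MaximumSalary,
              AVG(Salary) AS AverageSalary
       FROM Employee WHERE Status='Active' GROUP BY Department
       ORDER BY Department"""
).fetchall()
for line in report:
    print(line)
# ('IT', 1, 70000, 70000, 70000.0)
# ('Sales', 2, 50000, 60000, 55000.0)
```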

Rule 2: If multiple group columns are used, rows are grouped based on the identical combined values of all the group columns, not on identical values of individual columns. For example, the following statement reports salary statistics per department and per sex:

SELECT Department, Sex, COUNT(Name) AS NumberOfEmployees, 
 MIN(Salary) AS MinimumSalary, MAX(Salary) AS MaximumSalary, 
 AVG(Salary) AS AverageSalary
FROM Employee WHERE Status='Active' GROUP BY Department, Sex

If there are 10 departments, you will get 20 group rows, assuming that every department has active employees of both sexes.
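A quick way to see Rule 2 in action is to count the group rows produced by two group columns. In this hypothetical sample (Python's sqlite3 module, in-memory database, WHERE clause omitted for brevity), there are 2 departments and both sexes occur in each, so 4 group rows come back, one per (Department, Sex) combination. ORDER BY is added only for a stable output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Department TEXT, Sex TEXT, Salary INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        ("Ann", "Sales", "Female", 50000),
        ("Bob", "Sales", "Male", 60000),
        ("Cy", "IT", "Male", 70000),
        ("Dan", "IT", "Male", 80000),
        ("Eve", "IT", "Female", 90000),
    ],
)

rows = conn.execute(
    "SELECT Department, Sex, COUNT(Name) FROM Employee"
    " GROUP BY Department, Sex ORDER BY Department, Sex"
).fetchall()
print(len(rows))  # 4: every (Department, Sex) combination occurs once
for row in rows:
    print(row)
```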

Rule 3: If a HAVING condition is specified, it is used to filter out the group rows that do not satisfy it. Since the HAVING condition is applied to the group rows, it can only use group columns and group functions. For example, the following statement reports salary statistics only for those departments that have more than 10 active employees:

SELECT Department, COUNT(Name) AS NumberOfEmployees, 
 MIN(Salary) AS MinimumSalary, MAX(Salary) AS MaximumSalary, 
 AVG(Salary) AS AverageSalary
FROM Employee WHERE Status='Active' 
GROUP BY Department HAVING COUNT(Name)>10
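The HAVING filter can be checked the same way. Because the hypothetical sample table is tiny, the threshold is lowered from 10 to 1. The example also shows the order of operations: WHERE removes the retired employee before grouping, so IT is left with a single active employee, and its group row is then dropped by HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Name TEXT, Department TEXT, Salary INTEGER, Status TEXT)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        ("Ann", "Sales", 50000, "Active"),
        ("Bob", "Sales", 60000, "Active"),
        ("Cy", "IT", 70000, "Active"),
        ("Di", "IT", 90000, "Retired"),  # removed by WHERE before grouping
    ],
)

# Threshold lowered from 10 to 1 to suit the tiny sample table.
big_departments = conn.execute(
    """SELECT Department, COUNT(Name) AS NumberOfEmployees,
              AVG(Salary) AS AverageSalary
       FROM Employee WHERE Status='Active'
       GROUP BY Department HAVING COUNT(Name) > 1"""
).fetchall()
print(big_departments)  # [('Sales', 2, 55000.0)]
```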

The following is a bad example: Sex is not a group column, so the condition "Sex='Male'" can only be used in the WHERE clause, not in the HAVING clause:

SELECT Department, COUNT(Name) AS NumberOfEmployees, 
 MIN(Salary) AS MinimumSalary, MAX(Salary) AS MaximumSalary, 
 AVG(Salary) AS AverageSalary
FROM Employee WHERE Status='Active' 
GROUP BY Department HAVING sex='Male'
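The bad example is fixed by moving the row-level condition into the WHERE clause, so that female employees are removed before grouping. Be aware that some database systems are more lenient than standard SQL and may accept the HAVING form anyway, evaluating Sex against an arbitrary row of each group, so the mistake is not always reported as an error. The table below is hypothetical sample data, run through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Name TEXT, Department TEXT, Sex TEXT, Salary INTEGER, Status TEXT)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        ("Ann", "Sales", "Female", 50000, "Active"),
        ("Bob", "Sales", "Male", 60000, "Active"),
        ("Cy", "IT", "Male", 70000, "Active"),
        ("Dan", "IT", "Male", 80000, "Active"),
    ],
)

# The condition on the non-group column Sex goes in WHERE, before grouping.
male_stats = conn.execute(
    """SELECT Department, COUNT(Name) AS NumberOfEmployees,
              MIN(Salary) AS MinimumSalary, MAX(Salary) AS MaximumSalary,
              AVG(Salary) AS AverageSalary
       FROM Employee WHERE Status='Active' AND Sex='Male'
       GROUP BY Department ORDER BY Department"""
).fetchall()
print(male_stats)
# [('IT', 2, 70000, 80000, 75000.0), ('Sales', 1, 60000, 60000, 60000.0)]
```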

ORDER BY Clause

"ORDER BY clause" modifies the base table by sorting its rows according to the specified order. "ORDER BY clause" syntax is:

ORDER BY order_exp [ASC|DESC], order_exp [ASC|DESC], ...

where "order_exp" specifies a single order expression, optionally followed by ASC (ascending, the default) or DESC (descending). If multiple order expressions are specified, the expression on the left has higher precedence than the one on its right. This means an expression on the right is only used to sort rows that have the same values for all of the expressions to its left.
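The precedence rule is the same as sorting by a tuple key in Python, where tuples are compared element by element from the left. The sketch below, using a hypothetical Employee table in an in-memory SQLite database, compares an ORDER BY with three expressions against the equivalent Python sort; negating Salary plays the role of DESC.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Department TEXT, Salary INTEGER)")
# Hypothetical sample rows: (Name, Department, Salary).
data = [
    ("Ann", "Sales", 50000),
    ("Bob", "IT", 70000),
    ("Cy", "Sales", 60000),
    ("Dan", "IT", 70000),
]
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", data)

# Department decides first; Salary DESC only breaks ties within a department;
# Name breaks any ties that remain.
ordered = conn.execute(
    "SELECT Name, Department, Salary FROM Employee"
    " ORDER BY Department, Salary DESC, Name"
).fetchall()

# The same precedence as a Python sort key: tuples compare left to right.
expected = sorted(data, key=lambda r: (r[1], -r[2], r[0]))
print(ordered)
# [('Bob', 'IT', 70000), ('Dan', 'IT', 70000), ('Cy', 'Sales', 60000), ('Ann', 'Sales', 50000)]
```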

If the ORDER BY clause is used with the GROUP BY clause, it must contain only group columns or group functions. For example, the following statement shows which department has the highest average age:

SELECT Department, COUNT(Name) AS NumberOfEmployees, 
 MIN(Age) AS MinimumAge, MAX(Age) AS MaximumAge, 
 AVG(Age) AS AverageAge
FROM Employee WHERE Status='Active' GROUP BY Department
ORDER BY AVG(Age) DESC
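Finally, the ranking query can be run against hypothetical sample data with Python's sqlite3 module. IT averages 47 and Sales averages 35, so IT comes first; HR produces no group row at all, because its only employee is filtered out by the WHERE clause before grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Department TEXT, Age INTEGER, Status TEXT)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        ("Ann", "Sales", 30, "Active"),
        ("Bob", "Sales", 40, "Active"),
        ("Cy", "IT", 50, "Active"),
        ("Dan", "IT", 44, "Active"),
        ("Eve", "HR", 61, "Retired"),  # filtered out by WHERE
    ],
)

ranking = conn.execute(
    """SELECT Department, COUNT(Name) AS NumberOfEmployees,
              MIN(Age) AS MinimumAge, MAX(Age) AS MaximumAge,
              AVG(Age) AS AverageAge
       FROM Employee WHERE Status='Active' GROUP BY Department
       ORDER BY AVG(Age) DESC"""
).fetchall()
print(ranking[0][0])  # IT: the department with the highest average age
```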


Dr. Herong Yang, updated in 2006