(A quick reminder: take a look at our introduction to SQL post to learn how to download the sample database we are using)
What is the SQL GROUP BY clause?
The SQL GROUP BY clause allows you to group your data in some fashion. We usually use at least one aggregate function when using the GROUP BY clause. For example, let’s suppose we want to find the total sales for each salesperson in the Sales.SalesOrderHeader table. To do this, we will group by the SalesPersonID and SUM the TotalDue column:
SELECT SalesPersonID, SUM(TotalDue)
GROUP BY SalesPersonID
SQL is grouping all of the rows returned into the different SalesPersonID values. Then, it sums up the TotalDue column values and outputs the sum associated with each SalesPersonID.
Let’s look at another example. Suppose we wanted to find how many products were associated with each product color (i.e. there are x number of red products, y number of blue products, etc.). To complete this request using the Production.Product table, we would use the query:
SELECT Color, COUNT(*)
GROUP BY Color
With that query, we are able to find how many rows are associated with each Color value. In other words, how many products are each color. Notice how in the SELECT clause we are using the column we wish to group by and then the aggregate function. No other columns or functions appear in the SELECT clause. If you tried to put another column in the previous query, say the Size column, SQL would produce an error.
When using the GROUP BY clause in a SQL SELECT statement, each column in the SELECT clause must either be included in the GROUP BY clause OR be an aggregate function. This is a common mistake with beginners. Execute the following query and see the error that it produces:
SELECT Color, Size, COUNT(*)
GROUP BY Color
The error message,
Msg 8120, Level 16, State 1, Line 1
Column ‘Production.Product.Size’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
is an error message that is very helpful. It reiterates the bolded comment above that all columns in the SELECT clause must appear in either the SQL GROUP BY clause or be an aggregate function. If you ever get this error, instantly take a look at your query to make sure the columns appear in the GROUP BY clause or are aggregate functions.
Using some of the examples in this lesson as your guide, try to complete the following practice problems!
- Find out how many rows in the Production.Product table are associated with each Size value. That is, find out how many products are each size.
- Find the average sales amount for each SalesPersonID in the Sales.SalesOrderHeader. Use the TotalDue column for the sales amount.
- Using the HumanResources.Employee table, find out how many people are male and how many are female using the Gender column.