(A quick reminder: take a look at our introduction to SQL post to learn how to download the sample database we are using)

What is the SQL GROUP BY clause?

The SQL GROUP BY clause allows you to group your data in some fashion. We usually use at least one aggregate function when using the GROUP BY clause. For example, let’s suppose we want to find the total sales for each salesperson in the Sales.SalesOrderHeader table. To do this, we will group by the SalesPersonID and SUM the TotalDue column:

SELECT SalesPersonID, SUM(TotalDue)
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID

SQL is grouping all of the rows returned into the different SalesPersonID values. Then, it sums up the TotalDue column values and outputs the sum associated with each SalesPersonID.

Let’s look at another example. Suppose we wanted to find how many products were associated with each product color (i.e. there are x number of red products, y number of blue products, etc.). To complete this request using the Production.Product table, we would use the query:

FROM Production.Product

With that query, we are able to find how many rows are associated with each Color value. In other words, how many products are each color. Notice how in the SELECT clause we are using the column we wish to group by and then the aggregate function. No other columns or functions appear in the SELECT clause. If you tried to put another column in the previous query, say the Size column, SQL would produce an error.

Online SQL training for beginners
Join over 4,000 students and check out our top-rated “Microsoft SQL for Beginners online course on Udemy. Click the image above for more information!

When using the GROUP BY clause in a SQL SELECT statement, each column in the SELECT clause must either be included in the GROUP BY clause OR be an aggregate function. This is a common mistake with beginners. Execute the following query and see the error that it produces:

SELECT Color, Size, COUNT(*)
FROM Production.Product

The error message,

Msg 8120, Level 16, State 1, Line 1
Column ‘Production.Product.Size’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

is an error message that is very helpful. It reiterates the bolded comment above that all columns in the SELECT clause must appear in either the SQL GROUP BY clause or be an aggregate function. If you ever get this error, instantly take a look at your query to make sure the columns appear in the GROUP BY clause or are aggregate functions.

Practice Problems

Using some of the examples in this lesson as your guide, try to complete the following practice problems!

  1. Find out how many rows in the Production.Product table are associated with each Size value. That is, find out how many products are each size.
  2. Find the average sales amount for each SalesPersonID in the Sales.SalesOrderHeader. Use the TotalDue column for the sales amount.
  3. Using the HumanResources.Employee table, find out how many people are male and how many are female using the Gender column.

2 replies
  1. gvg
    gvg says:

    Presumably correct version of your last example should be:

    SELECT Color, Size, COUNT(*)
    FROM Production.Product
    GROUP BY Color, Size

    But what if I do not want to group by Size, just need to group by Color?

  2. Brewster Knowlton
    Brewster Knowlton says:

    In that case, simply remove the Size column from both the SELECT clause and the GROUP BY clause.

Comments are closed.