Today’s “A:360” discusses why as many people as possible should learn SQL. SQL, Structured Query Language, is the primary language used to interact with a database and is a critical foundation for analytics knowledge. Between strong career earnings and self-sufficiency, SQL is a skill everyone should learn.

Watch and Listen

Click to Watch on YouTube.

Listen to the Podcast

Click to Listen on SoundCloud
Click to Listen to on iTunes

Read the Transcribed Audio

Hey everyone. Welcome to today’s A:360. My name is Brewster Knowlton, and today we’re going to be talking about why as many people as possible should be learning SQL.

SQL stands for Structured Query Language, and it’s the primary language responsible for managing data and data structures contained within relational database systems. So anytime you are trying to retrieve data from a database, add data to a database or make any modifications to the data structures, you’re probably going to be using some variant of the SQL language.

Let’s dig into the first reason why I believe you should learn SQL. First, you can earn some great money. The earnings potential for somebody with strong SQL skills is pretty significant. If you’re a financial analyst or anybody who is an Excel guru but lacks SQL knowledge, learning SQL might be the next step for you to really progress your career and identify an opportunity for a high earning job.

Along with the high earning potential, SQL is one of the most sought after skills by employers that are hiring. This makes sense given the emphasis on analytics and the growth of data. This would imply that employers are looking for more and more individuals who have the talent to retrieve data from a database using SQL and know what to do with it.

Along with the career-oriented reasons for learning SQL, another reason that you might want to learn SQL is because it enables you to get an answer to any question that you ask. If you’re an inquisitive person and want to know as much as possible about your business, SQL enables you to think of questions and then get an answer to those questions without having to go to a bunch of different people trying to find the answers.

If you’re somebody who works with Excel a lot, how many times have you worked with a really big spreadsheet, tried to create some formula or do some operation, and then Excel crashes? Now yes, it’s gotten better with newer versions, but it still happens. SQL eliminates that crashing. By writing a SQL query or building some type of SQL code or script and running it against a database, that system is designed to process millions and millions (and millions!) of rows very quickly. It’s designed to be efficient for a lot of the queries you’re going to write. So, if you’re somebody who has to deal with Excel crashing frequently because of the volume or types of operations you try to do with data, SQL is going to enable you overcome those obstacles and be more efficient.

The last reason we’re going to talk about as to why you should learn SQL is that it enables you to never have to ask yourself a question like, “How did I make that report again?”

SQL queries can be saved and easily rerun at any point in time. If there is something that has many steps to it or is a little bit more complicated, instead of having to redo all your work, you can save that query. When you need to run it again you just simply open it up and execute it.
While there are certainly some things you can do within Excel to be more efficient with your reports and automate certain things (through VBA, PowerPivot or something like that) SQL is going to make that process much, much easier. So again, SQL is going to enable you to be incredibly efficient and not waste any time by redoing manual processes that could simply be opened up from your saved query, executed, and, voila, you have your results.

Wrapping up, you should learn SQL because of these five points:

  • It offers a higher earning potential.
  • It’s one of the most in demand skills
  • It empowers you to answer your own questions of your data
  • You will rarely have to deal with Excel crashing if you write your queries in SQL against the database
  • You won’t have to redo nearly as much manual effort because you can save your queries, easily open, and then rerun them at any time.

If you are interested in learning SQL, we have a great course out there on the online learning platform, Udemy. The course is called Microsoft SQL for Beginners, and it’s a great first course to dip your toes into learning SQL and we try to break it down and make it very, very simple. I’ll include that link in the transcription and you can find it on our website in various different places. So, keep an eye out for that and if you’re interested in learning SQL, I highly recommend you take a look at that course.

That’s it for today! Thanks again for listening to today’s A:360.

Subscribe to have new content sent directly to your email!

[mc4wp_form]

Photo Credit

What is the SQL EXCEPT set operation?

The SQL EXCEPT set operation is another set operation like the UNION or INTERSECT operations. This operation, however, returns all rows from the first query except those rows that appears in the second query. Think of this as subtraction but for sets. Visually, the SQL EXCEPT set operation looks like:

SQL EXCEPT

Order matters with the EXCEPT set operation just like when subtracting numbers. Let’s look at an example. Suppose we have two queries: the ProductID and Name columns for all rows from Production.Product and another query that returns the ProductID and Name columns for all rows with a ProductID greater than 100. Applying the EXCEPT operation to the two queries, we get:

SELECT ProductID, Name
FROM Production.Product

EXCEPT

SELECT ProductID, Name
FROM Production.Product
WHERE ProductID > 100

The first query returns all rows from Production.Product. Then the EXCEPT operator tells SQL to exclude any row from the first query that appears in the second query; in this case, any row with a ProductID greater than 100. This is why the results are limited to just those few rows from the table with a ProductID less than or equal to 100.

Let’s look at another example. Simply switch the order of the queries from the previous example and look at the results:

SELECT ProductID, Name
FROM Production.Product
WHERE ProductID > 100

EXCEPT

SELECT ProductID, Name
FROM Production.Product

No rows are returned. This is the expected behavior for this query. Since the SQL EXCEPT operation excludes all rows present in the second query, an empty result field is the proper outcome. Since the first query only includes rows with a ProductID greater than 100, and the second query includes ALL rows from this table, then it follows that every row in the first query would appear in the second query. Since the EXCEPT operation excludes all rows that appear in the second query, we arrive at the result where we return no rows.

The SQL EXCEPT set operation can be a very useful operation. It can, at times, replace a lower performing query that may use the NOT IN OR NOT EXISTS operators in the WHERE clause.

Practice Problems

Using the examples in this lesson as a resource, try to complete the following practice problems.

  1. Use the EXCEPT set operation on the following two queries. The first query will be all columns in the HumanResources.Employee table for rows where the BusinessEntityID is less than 100. The second query will be all columns in the HumanResources.Employee table where the JobTitle equals “Chief Executive Officer”.
  2. Use the EXCEPT set operation on the following two queries. The first query will be all columns in the Production.Product table for rows with a ListPrice greater than $2,000. The second query will be all columns in the Production.Product table for rows with a Color value equal to “Red”.

What is the SQL UNION set operation?

The SQL UNION set operation combines the results of two separate queries into a single result that contains all rows from both queries. Visualized by a Venn diagram, the UNION set operation looks like:

UNION

Notice that every section of the circles (representing the results of separate queries) is shaded. This means that all rows of the each query are returned as part of the results. Let’s take a look at an example. Suppose we have two queries: one that returns the ProductID and Name column from Production.Product for all rows where ProductID is greater than 100 and another returning the same columns but for all rows where the ProductID is less than or equal to 100. Let’s apply the SQL UNION operation to them and see what the results look like:

SELECT ProductID, Name
FROM Production.Product
WHERE ProductID > 100

UNION

SELECT ProductID, Name
FROM Production.Product
WHERE ProductID <= 100

The two queries, which you can run separately to look at the results, combine together and all rows from each query are returned.

What happens if some rows appear in both queries of the SQL UNION set operation? Let’s take a look and see what happens. Modify the previous example so that the second query looks at all rows where the ProductID is less than or equal to 400. Now the full query becomes:

SELECT ProductID, Name
FROM Production.Product
WHERE ProductID > 100

UNION

SELECT ProductID, Name
FROM Production.Product
WHERE ProductID <= 400

If you run the queries separately, there are some rows that are in both queries. In fact, all rows with a ProductID between 101 and 400 are contained in both queries. However, when the SQL UNION operation is evaluated, only one row is presented in the results. No duplicate rows appear. This is because the UNION operation, by default, selects DISTINCT rows and outputs them to the results panel. There is a feature, the SQL UNION ALL set operation, that returns all rows regardless of if they are duplicates.

Consider our complete SQL for Beginners Course to learn more about UNION set operations.

What is the RIGHT OUTER JOIN

The RIGHT OUTER JOIN is nearly identical to the LEFT OUTER JOIN. The only difference between the two types of OUTER JOINs is the order in which SQL evaluates them. The LEFT OUTER JOIN returns all rows from the first table and the rows that match from second table. The RIGHT OUTER JOIN, on the other hand, returns all rows from the second table and then rows that match from the first table.

The Venn diagram below visualizes the RIGHT OUTER JOIN:

SQL RIGHT OUTER JOIN

Suppose that we wanted to find the FirstName and LastName of each employee listed in the HumanResources.Employee table. We also want to view their JobTitle. We could use the RIGHT OUTER JOIN to complete this request:

SELECT P.FirstName, P.LastName, E.JobTitle
FROM Person.Person P
RIGHT OUTER JOIN HumanResources.Employee E
ON E.BusinessEntityID = P.BusinessEntityID

The only rows that are returned are the 290 rows from HumanResources.Employee and then the FirstName and LastName columns from Person.Person if they exist for that BusinessEntityID.

Put simply, the RIGHT OUTER JOIN is nearly identical to the LEFT OUTER JOIN; the only difference is the order. The previous query is functionally identical to the query:

SELECT P.FirstName, P.LastName, E.JobTitle
FROM HumanResources.Employee E
LEFT OUTER JOIN Person.Person P
ON E.BusinessEntityID = P.BusinessEntityID

To reiterate, the ONLY difference is the order in which the tables appear in the SELECT statement.

To practice using the RIGHT OUTER JOIN, head back to our post on the LEFT OUTER JOIN and try each of those problems again. This time, however, use the RIGHT OUTER JOIN instead of the LEFT OUTER JOIN to complete the query.

What is a LEFT OUTER JOIN?

Like the INNER JOIN, a LEFT OUTER JOIN allows you to connect multiple tables in the same query. However, the LEFT OUTER JOIN returns all rows from the left table (first table listed) where the joining key matches a row in the second table listed. This is represented by the following Venn diagram:

LEFT OUTER JOIN

An example will quickly clarify the difference between a LEFT OUTER JOIN and an INNER JOIN. Suppose we want to return the FirstName and LastName column for all rows of the Person.Person table and the JobTitle column from HumanResources.Employee if the BusinessEntityID matches. Essentially, if the ID matches to an employee’s ID then return their job. To complete this request, we use the query:

SELECT P.FirstName, P.LastName, E.JobTitle
FROM Person.Person P
LEFT OUTER JOIN HumanResources.Employee E
ON E.BusinessEntityID = P.BusinessEntityID

Notice in the results that the vast majority of the rows contain a NULL in the JobTitle column. This is because most rows in the Person.Person table do not match up to a row in the HumanResources.Employee table. But, this is exactly what we wanted to happen and this is why we use the LEFT OUTER JOIN.

As another example, let’s join multiple tables together to find, for each row in Sales.SalesOrderHeader, what the sales person’s name was. This requires multiple joins. We want to return every row from Sales.SalesOrderHeader regardless of whether or not a salesperson was associated with that sale. Restrict the Sales.SalesOrderHeader table to only rows where the OrderDate was between January 1, 2006 and June 30, 2006. Also, return the SalesOrderNumber and TotalDue columns from Sales.SalesOrderHeader. To complete this request, we use the query:

SELECT SOH.SalesOrderNumber, SOH.TotalDue, P.FirstName AS [Salesperson First Name], P.LastName AS [Salesperson Last Name]
FROM Sales.SalesOrderHeader SOH
LEFT OUTER JOIN Sales.SalesPerson SP
ON SP.BusinessEntityID = SOH.SalesPersonID
LEFT OUTER JOIN Person.Person P
ON SP.BusinessEntityID = P.BusinessEntityID
WHERE SOH.OrderDate BETWEEN '1/1/2006' AND '6/30/2006'

Our results show plenty of NULL values in the “Salesperson First Name” and “Salesperson Last Name” columns. This is good and expected, since we wanted to return rows whether or not the sale was associated with a salesperson.

Practice Problems

Having learned what makes the LEFT OUTER JOIN different from the INNER JOIN that we learned about in the last section, try to complete the practice problems below!

  1. Find, for each row in Sales.SalesOrderHeader with an OrderDate between January 1, 2006 and June 30, 2006, the salesperson’s name and what their job title is. You will use the same query that was used in the last example of this lesson except you must complete one more join to the HumanResources.Employee table.
  2. Using a LEFT OUTER JOIN, find the territory name associated with each sale in Sales.SalesOrderHeader. Use the Sales.SalesTerritory table.
  3. Find each product’s subcategory regardless of whether or not the product has a product subcategory.

What is the SQL INNER JOIN?

A SQL INNER JOIN is a technique that allows you to retrieve data from multiple tables by connecting them with a joining key. An INNER JOIN can be thought of as the intersection of two tables based on some joining key. Typically, ID columns like SalesPersonID or BusinessEntityID in the AdventureWorks database, act as the joining keys. Joining keys are the columns that connect multiple tables together.

As mentioned earlier, INNER JOINs are like an intersection of tables. Rows are returned as long as there a match on the joining key between both tables. A Venn diagram is often used to visualize the INNER JOIN:

SQL INNER JOIN

A few examples will help clarify the concept of the SQL INNER JOIN. In the AdventureWorks database, the Production.Product table contains details about each product the company sells. There is a table called Production.ProductSubCategory that contains details about the SubCategory for each product. In both tables there exits an ID column called ProductSubCategoryID. This can be used as the joining key between the two tables so that we can return the product’s name and the product subcategory’s name in the same query. This is important and new because the columns appear in different tables.

SELECT Production.Product.Name, Production.ProductSubCategory.Name
FROM Production.Product
INNER JOIN Production.ProductSubCategory
ON Production.Product.ProductSubCategoryID = Production.ProductSubCategory.ProductSubCategoryID

We are telling SQL to link these two tables based on the ProductSubCategoryID column from each table. Do you remember column aliases? Well, we can use something called table aliases to reduce the amount of typing. A table alias is just like a column alias except that it assigns an alias to a table instead of a column. Unlike column aliases, a table alias can be referenced in the same query in all clauses.

To reduce the amount of code used in the prior SQL INNER JOIN example with table aliases and assign proper column aliases in the SELECT clause:

SELECT P.Name AS [Product Name], PS.Name AS [SubCategory Name]
FROM Production.Product P
INNER JOIN Production.ProductSubCategory PS
ON P.ProductSubCategoryID = PS.ProductSubCategoryID

We do not need to use the AS when assigning a table alias.

Let’s look at another example of the SQL INNER JOIN. Let’s join the HumanResources.Employee table with the Person.Person table using the BusinessEntityID joining key. We will then return the FirstName and LastName column from Person.Person and the JobTitle column from HumanResources.Employee. Since we are using an INNER JOIN, the only rows that will be returned are those rows where the BusinessEntityID value is in BOTH tables.

SELECT P.FirstName, P.LastName, E.JobTitle
FROM HumanResources.Employee E
INNER JOIN Person.Person P
ON P.BusinessEntityID = E.BusinessEntityID

And just like that, we are able to retrieve columns from two different tables in the same query!

Practice Problems

Using the examples above as a resource, try to complete the following practice problems.

  1. Join the Person.Person table to the HumanResources.Employee table. Return the FirstName and LastName columns from Person.Person and the JobTitle and NationalIDNUmber columns from HumanResources.Employee.
  2. Join the Production.ProductSubCategory table with the Production.ProductCategory using the ProductCategoryID column. Return the Name column from each table and assign proper aliases.
  3. Join the Production.Product, Production.ProductSubCategory, and Production.ProductCategory tables together. Return the Name column from all three tables and assign proper column aliases. (Hint: INNER JOIN from Production.Product to Production.ProductSubCategory and then INNER JOIN from Production.ProductSubCategory to Production.ProductCategory.)

What is the SQL HAVING clause?

The SQL HAVING clause is a clause that allows you to filter out groups based on some filtering criteria. The HAVING clause is a lot like the WHERE clause; the only difference is that the WHERE clause filters rows, while the HAVING clause filters groups.

For example, if we wanted to find only the Color values in the Production.Product table that had at least 30 products of that color, we would use the query:

SELECT Color, COUNT(*)
FROM Production.Product
GROUP BY Color
HAVING COUNT(*) >=30

We are telling SQL to first find out how many products are associated with each color. Then, we are only returning those colors that had a count of at least thirty rows (products).

Unlike the WHERE clause, that takes a column as an input, the SQL HAVING clause requires an aggregate function before the comparison operator. Looking at another example, let’s find out which JobTitle values in HumanResources.Employees belong to least ten employees:

SELECT JobTitle, COUNT(*)
FROM HumanResources.Employee
GROUP BY JobTitle
HAVING COUNT(*) >=10

We must always specify a GROUP BY clause when we use the HAVING clause. This makes sense if you consider what the SQL HAVING clause does. Since the HAVING clause filters groups, then it follows that we must first have groups specified.

Let’s look at one more example. Let’s find the SalesPersonID values that did at least $3 million in sales in the year 2006. Let’s also order our results by the total sales column in descending order.

SELECT SalesPersonID, SUM(TotalDue) AS [2006 Sales]
FROM Sales.SalesOrderHeader
WHERE OrderDate BETWEEN '1/1/2006' AND '12/31/2006'
GROUP BY SalesPersonID
HAVING SUM(TotalDue) >=3000000
ORDER BY [2006 Sales] DESC

We have used every single SQL clause in this one query. This is the “complete” SQL SELECT statement from:

SELECT [Column1], [Column2],…, [Column N]
FROM [Table Name]
WHERE [Column Name] [logical/comparison operator] [filter criteria]
GROUP BY [Grouping Column(s)]
HAVING [AGGREGATE FUNCTION] [logical/comparison operator] [group filtering criteria]
ORDER BY [Column Name, Alias, Ordinal]

These are but a few of the ways that you can use the SQL HAVING clause. Feel free to leave a comment below if you have any other questions! Again, remember we are using the AdventureWorks database for all of these examples. Refer back to our introduction to SQL post for instructions on how to get started.

Practice Problems

Complete the following practice problems based on the concepts learned in this lesson!

  1. Find all SalesPersonID values from Sales.SalesOrderHeader whose average sales price (TotalDue) exceeded $25,000.
  2. Find all SalesPersonID values from Sales.SalesOrderHeader whose average sales price (TotalDue) exceeded $30,000 for all sales with an OrderDate during the year 2006.
  3. Find all JobTitles, in the HumanResources.Employee, that have at least 25 employees.

(A quick reminder: take a look at our introduction to SQL post to learn how to download the sample database we are using)

What is the SQL GROUP BY clause?

The SQL GROUP BY clause allows you to group your data in some fashion. We usually use at least one aggregate function when using the GROUP BY clause. For example, let’s suppose we want to find the total sales for each salesperson in the Sales.SalesOrderHeader table. To do this, we will group by the SalesPersonID and SUM the TotalDue column:

SELECT SalesPersonID, SUM(TotalDue)
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID

SQL is grouping all of the rows returned into the different SalesPersonID values. Then, it sums up the TotalDue column values and outputs the sum associated with each SalesPersonID.

Let’s look at another example. Suppose we wanted to find how many products were associated with each product color (i.e. there are x number of red products, y number of blue products, etc.). To complete this request using the Production.Product table, we would use the query:

SELECT Color, COUNT(*)
FROM Production.Product
GROUP BY Color

With that query, we are able to find how many rows are associated with each Color value. In other words, how many products are each color. Notice how in the SELECT clause we are using the column we wish to group by and then the aggregate function. No other columns or functions appear in the SELECT clause. If you tried to put another column in the previous query, say the Size column, SQL would produce an error.

Online SQL training for beginners
Join over 4,000 students and check out our top-rated “Microsoft SQL for Beginners online course on Udemy. Click the image above for more information!

When using the GROUP BY clause in a SQL SELECT statement, each column in the SELECT clause must either be included in the GROUP BY clause OR be an aggregate function. This is a common mistake with beginners. Execute the following query and see the error that it produces:

SELECT Color, Size, COUNT(*)
FROM Production.Product
GROUP BY Color

The error message,

Msg 8120, Level 16, State 1, Line 1
Column ‘Production.Product.Size’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

is an error message that is very helpful. It reiterates the bolded comment above that all columns in the SELECT clause must appear in either the SQL GROUP BY clause or be an aggregate function. If you ever get this error, instantly take a look at your query to make sure the columns appear in the GROUP BY clause or are aggregate functions.

Practice Problems

Using some of the examples in this lesson as your guide, try to complete the following practice problems!

  1. Find out how many rows in the Production.Product table are associated with each Size value. That is, find out how many products are each size.
  2. Find the average sales amount for each SalesPersonID in the Sales.SalesOrderHeader. Use the TotalDue column for the sales amount.
  3. Using the HumanResources.Employee table, find out how many people are male and how many are female using the Gender column.

What are SQL Aggregate Functions?

Aggregate functions are functions that aggregate values and output a scalar (single-row) value as the result. Examples of aggregate functions include the COUNT, SUM, AVG, MIN, and MAX functions. For example, to find the total number of rows in the Person.Person table, we can use the query:

SELECT COUNT(*)
FROM Person.Person

In this example, the COUNT function takes the asterisk, *, as the input. This tells SQL to count all rows from the table. COUNT is the only function that can receive the asterisk as an input. Be careful that all values in the column you specify within the COUNT function are non-NULL. The COUNT function will count all non-NULL values of the specified column which may or may not be your desired result.

The SUM function is an aggregate function that takes a numeric column as its input and sums the values contained in the column. For example, to find the sum of the TotalDue column for all rows in Sales.SalesOrderHeader where the OrderDate was between January 1, 2006 and December 31, 2006:

SELECT SUM(TotalDue)
FROM Sales.SalesOrderHeader
WHERE OrderDate BETWEEN '1/1/2006' AND '12/31/2006'

The SUM function ignores NULL values when computing the aggregation. It also requires that the column data type be a numeric data type. Conceptually this makes sense because we can’t add up words.

The MAX function is an aggregate function that finds the largest value from a column. This works for numeric, string, and date values. To find the most recent (max) OrderDate from the Sales.SalesOrderHeader table:

SELECT MAX(OrderDate)
FROM Sales.SalesOrderHeader

Or, to find the the max LastName (the last name, alphabetically from A to Z) from Person.Person:

SELECT MAX(LastName)
FROM Person.Person

The AVG function is similar to the SUM function except that it averages all values in the column you specify instead of summing them up. Like the SUM function, the column’s data type must be some numeric data type for this function to execute error-free.

As an example, we can find the average sales price, using the TotalDue column, from all rows in the Sales.SalesOrderHeader table with the query:

SELECT AVG(TotalDue)
FROM Sales.SalesOrderHeader

The MSDN (Microsoft Developer Network) website contains much more detailed information about each aggregate function found here.

Practice Problems

Complete the practice problems below to test some of the concepts you learned in this lesson!

  1. Find the largest value of the TotalDue column from the Sales.SalesOrderHeader table.
  2. Find the average ListPrice for all rows in the Production.Product table.
  3. Find the total sales amount, using the TotalDue column, for all rows in the Sales.SalesOrderHeader table.

What are the different SQL Comparison Operators?

In previous lessons, we have discussed many of the different SQL comparison operators that are available. These operators give you plenty of options to filter data in the WHERE clause of your SELECT statement. Having discussed many of them, I want there to be one consolidated place for you to get information on each of the SQL comparison operators.


The table below contains the SQL comparison operator symbol and a description of what it does:

SQL Comparison Operator Description
= Equal to
> Greater than
< Less than
>= Greater than or equal to
<= Less than or equal to
<> Not equal to
!= Not equal to
!< Not less than
!> Not greater than

The table above lists all of the SQL comparison operators that you can use in your WHERE clause. You may notice that the IN, BETWEEN, and LIKE operators are not included in this table. Those are technically considered logical operators and have slightly different properties. Accordingly, this is why the IN, BETWEEN, and LIKE operators each have lessons dedicated to them.