We’ve published quite a few articles this year! With the number of articles published, we wanted to create a post where we summarized our top ten most viewed articles about data and analytics. So, without further ado, here they are:

1. 5 Steps to Developing Your Business Intelligence Strategy

From Brewster Knowlton of The Knowlton Group – five steps you can start completing today to begin developing your business intelligence strategy.

2. KPIs and The Sierpinski Triangle

Learn how the Sierpinski Triangle is the perfect visualization for how an organization’s KPIs and departmental metrics should be organized and defined.

3. Reporting – First Step for Business Intelligence

Business intelligence for community banks and credit unions doesn’t have to be hard. Start with basic reporting, keep things simple, and take small steps.

4. Data Inventory: What and Why

A data inventory defines all data sources available to your organization along with ownership details, descriptions, priorities and other relevant details.

5. When Life Gives You Data, Make Information

Data for the sake of data is meaningless. Real value comes from turning data into information. We explain a simple process to make get most out of your data

6. Analytics in the Credit Union and Banking Industry: A Competitive Necessity

Analytics in the bank and credit union industries will become a necessity for survival. Those who adopt early will see significant competitive advantages.

7. 5 Reasons to Invest in Data and Analytics in 2016

As we enter the new year, learn five critical reasons why your organization should invest in data and analytics in 2016 and become data-driven.

8. Common Business Intelligence Roles

Learn about some of the most common business intelligence roles that will need to be fulfilled as you deploy and expand your business intelligence solution.

9. The CLO and Lending Analytics

In this post, we highlight five ways that CLOs can use lending analytics to improve their operations and drive loan growth.

10. Analytics and “The Flywheel Effect”

Jim Collins’ “Flywheel Effect” perfectly applies to the world of analytics. Learn how starting your analytics program has a “flywheel effect” as well!


Not getting these posts sent directly to you? Sign up using the form below!

[contact-form-7 404 "Not Found"]

What is the SQL INTERSECT set operation

The SQL INTERSECT set operation returns the distinct rows from two or more queries that appear in each query’s results. Unlike the SQL UNION set operation that returns all rows from both queries, the SQL INTERSECT operation only returns the rows that appear in both queries. Visualized with a Venn diagram, the SQL INTERSECT operation looks like:

SQL INTERSECT, Set operations

For example, suppose we apply the INTERSECT set operation to two queries: the ProductID and Name column from Production.Product for all rows where the ProductID is less than 300 and another query for the same columns but for all rows where the ProductID is greater than or equal to 100:

SELECT ProductID, Name
FROM Production.Product
WHERE ProductID < 400

INTERSECT

SELECT ProductID, Name
FROM Production.Product
WHERE ProductID >= 100

The results only contain those rows that appeared in the results for each query. In this case, any row of Production.Product with a ProductID between 100 and 399 would appear in the SQL INTERSECT operation result set. If you look at the results, those are the exact rows that results from the complete query.

Let’s look at another example. Take two queries: one query for the BusinessEntityID, NationalIDNumber, and JobTitle columns from HumanResources.Employee where the BusinessEntityID is less than 50 and another query for the same columns but for rows where the JobTitle is either “Chief Executive Officer” or “Research and Development Manager”:

SELECT BusinessEntityID, NationalIDNumber, JobTitle
FROM HumanResources.Employee
WHERE BusinessEntityID < 50

INTERSECT

SELECT BusinessEntityID, NationalIDNumber, JobTitle
FROM HumanResources.Employee
WHERE JobTitle IN ('Chief Executive Officer', 'Research and Development Manager')

The results of this SQL INTERSECT operation only include the rows that appear appear in both queries. So, if there was a row with a BusinessEntityID greater than 50 and the JobTitle “Research and Development Manager” then that row would not appear in the results of the INTERSECT operation.

Practice Problems

Using the examples from this section as a resource, try to complete the following practice problems.

  1. Use the INTERSECT operation on the following two queries: the SalesOrderID and TotalDue columns from Sales.SalesOrderHeader for all rows where the SalesOrderID is less than 44,000 and another query for the same columns but for rows where the TotalDue is less than $6,000.
  2. Use the INTERSECT operation for two queries. The first query will be all columns from Production.Product for all rows with a ProductID less than 50. The second query will be all columns from Production.Product for all rows with a ProductID greater than 200. What do the results look like? Why?

What is a FULL OUTER JOIN?

The FULL OUTER JOIN is the last of the join methods. This join method returns rows from both the first table and the second table regardless of whether or not a match exists. Visualized using a Venn diagram, the FULL OUTER JOIN looks like this:

SQL FULL OUTER JOIN

As an example, we can return the Name column from Production.Product and the subcategory Name from Production.ProductSubCategory regardless of any matches from either table. So, if a ProductSubCategoryID exists in Production.ProductSubCategory but not in Production.Product it will be returned. Similarly, if a ProductSubCategoryID exists in Production.Product but not in Production.ProductSubCategory, that row will also be returned:

SELECT P.Name AS [Product Name], PS.Name AS [Product Subcategory Name]
FROM Person.Person P
FULL OUTER JOIN Production.ProductSubCategory PS
ON P.ProductSubCategoryID= PS.ProductSubCategoryID

I would argue that this type of join is the least commonly used. It does, however, have its uses in special circumstances. When we get to some of the functions involving the COALESCE function, some additional uses for the FULL OUTER JOIN might be highlighted.

How do I sort data with the SQL ORDER BY clause?

Sorting returned data is a very common task when completing SQL statements. Luckily, the SQL ORDER BY clause allows us to complete this activity pretty easily. For example, to return the the FirstName and LastName columns from Person.Person sorted by the FirstName column in alphabetical order:

SELECT FirstName, LastName
FROM Person.Person
ORDER BY FirstName

If you look at the results, the default sorting is ascending order; that is, A to Z. We did not specify the sorting order in our query. SQL, by default, assumes ascending order unless specified otherwise. We could have explicitly stated that we wanted to FirstName to be sorted in ascending order with the query:

SELECT FirstName, LastName
FROM Person.Person
ORDER BY FirstName ASC

Unlike with ascending order, we MUST explicitly tell SQL when we wish to sort in descending order. We can modify the previous query to sort the results by the FirstName column in descending order with the query:

SELECT FirstName, LastName
FROM Person.Person
ORDER BY FirstName DESC

The ORDER BY clause also has the unique property to be able to reference the column aliases created in the SELECT clause. Suppose we gave the FirstName and LastName columns the aliases “First Name” and “Last Name” respectively. We could then sort using not just the FirstName column but by specifying the “First Name” alias:

SELECT FirstName AS [First Name], LastName AS [Last Name]
FROM Person.Person
ORDER BY [First Name] ASC

The SQL ORDER BY clause is the only clause in a SELECT statement that can reference a column alias in the same query. This is due to the order in which the database engine evaluates the clauses in a query. This is explained in more detail in our Microsoft SQL for Beginners course on Udemy.


Practice Problems

Complete the practice problems below to reinforce some of the lessons learned in this section.

  1. Return the first 100 rows from the Person.Person table. Return only the FirstName and LastName columns. Order the results by the LastName column in descending order.
  2. Return the SalesOrderNumber and TotalDue columns from Sales.SalesOrderHeader. Order the results by the TotalDue column in ascending order.
  3. Return the SalesOrderNumber and TotalDue columns from the Sales.SalesOrderHeader table. Assign the aliases “Order Number” and “Sales Amount” to the columns respectively. Order the results by the “Sales Amount” alias in descending order.

What is the LIKE operator?

The LIKE operator is an operator that allows you to use SQL wildcard operators in the WHERE clause. SQL wildcard operators along with the LIKE operator allow you to search for patterns of strings. For example, to find all rows in the Person.Person table where the FirstName column starts with the letters “Mi”, we can use the query:

SELECT *
FROM Person.Person
WHERE FirstName LIKE 'Mi%'

The “%” SQL wildcard operator, in this example, tells SQL to find all values in the FirstName column that start with “Mi” and end with either no other characters or some other characters. The placement of the % operator changes how SQL interprets it. For example, to find all values in Person.Person where the FirstName column ends with “s”, we use the query:

SELECT *
FROM Person.Person
WHERE FirstName LIKE '%s'

Similarly, we can use multiple % SQL wildcard operators to find how many rows in the Person.Person table have an “s” in the FirstName column. Unlike the previous two examples, we will look for an “s” anywhere in the string:

SELECT *
FROM Person.Person
WHERE FirstName LIKE '%s%'

Below is a table listing the different SQL wildcard operators that can be used with the LIKE operator:

  • % (percent symbol) – searches for zero or more characters in place of the % symbol
  • _ (underscore) – searches for a single character in place of the underscore
  • [ ] (square brackets) – searches for a single character limited to a specific set of characters or numbers
  • [^] (square brackets with carat) – searches for a single character not included in the subset listed in the square brackets

Practice Problems

Using the examples from this section as a guide, try to complete the practice problems listed below!

  1. Find all rows in the Production.Product table where the Name column starts with “Bi”.
  2. Find all rows in the Person.Person table where the LastName column contains an “s”.
  3. Find all rows in the Person.Person table where the LastName ends with an “s” or the LastName starts with an “s”.

What is the BETWEEN operator?

The BETWEEN operator is a logical operator that allows you to search for data between two values. In previous lessons to find all rows in Production.Product where the ListPrice was between 100 and 200, we wrote the query:

SELECT *
FROM Production.Product
WHERE ListPrice >= 100 AND ListPrice <= 200

The BETWEEN operator, however, gives us a simpler solution to this problem:

SELECT *
FROM Production.Product
WHERE ListPrice BETWEEN 100 AND 200

Notice that this operator is an inclusive operation; that is, the end points are included in the results. If you look at the first query we wrote, the “greater than or equal to” and “less than or equal to” comparison operators were used to represent BETWEEN.

The BETWEEN operator works for text and dates – not only for numeric values. To find all rows in Sales.SalesOrderHeader where the OrderDate is between July 1, 2005 and August 15, 2005:

SELECT *
FROM Sales.SalesOrderHeader
WHERE OrderDate BETWEEN '7/1/2005' AND '8/15/2005'

Using this method means that we don’t have to repeat the column name. Using the BETWEEN operator also makes the code a little easier to read. It’s much easier to say “find all dates between July 1, 2005 and August 15, 2005” than “find all dates that are greater than or equal to July 1, 2005 and less than or equal to August 15, 2005…”. Anything that improves how the code reads – as long as it doesn’t negatively impact performance – is a positive in my opinion.

As mentioned before, we can use the BETWEEN operator for text values. If I wanted to find all rows in the Person.Person table with a LastName value between the names “Johnson” and “Smith”, we could use the query:

SELECT *
FROM Person.Person
WHERE LastName BETWEEN 'Johnson' AND 'Smith'


Practice Problems

Complete the following practice problems to gain some more experience with the BETWEEN operator in your WHERE clause:

  1. Find all rows in Sales.SalesOrderHeader with a TotalDue value between 300.00 AND 450.00
  2. Find all rows in the HumanResources.Employee tables with a BirthDate value between January 18, 1971 and September 6, 1974.
  3. Find all rows in the rows in the Person.Person table with a FirstName value between the names “Chris” and “Mark”.

A critical component of a successful business intelligence strategy involves the use and embrace of data visualization. As we shift our primary reporting tool from Excel to a true business intelligence solution, this will become less of a “want to have” and more of a “need to have”.

What is Data Visualization?

Data visualization is the general concept around creating charts, graphs, infographics and other visual representations of data. Data visualization examples can be as simple as creating a line chart in Excel or as complicated as a time-driven, interactive bubble chart.

simple data visualization

Data visualization software like Tableau, Qlikview and others are revolutionizing the way in which we visualize data. This has led to a variety of benefits that have all contributed to the growth and success of business intelligence. Below, we highlight a couple of the biggest benefits of data visualization.

Easily Understand Data

What is easier to comprehend? An Excel spreadsheet filled with rows and columns of data, or a simple chart?

See for yourself:

Excel Spreadsheet
Simple Data Visualization Excel

Clearly, the chart is much simpler to digest than the spreadsheet. Both show the same data; one relies on text, while the other relies on visuals.

Our brain processes visuals 60,000 times faster than text, so relying on visualization instead of text/spreadsheets means we can quickly grasp what our data is trying to tell us.

Does the busy executive want to try to dissect a complicated Excel spreadsheet, or would they rather view a chart that clearly describes the underlying patterns and trends?

The key takeaway here is that visualization increases clarity of the underlying data. Where spreadsheets force the viewer to determine the patterns and trends, proper data visualization highlights the trends and patterns for the viewer.

Comprehend Large and Complex Datasets

Breaking down and visualizing simple spreadsheets is one thing. Trying to comprehend very large and complex datasets is a completely different challenge. A challenge, however, that can be overcome by proper data visualization.

The U.S. Census Bureau has some great examples of visualizing large and complex data sets.

For example, the chart below looks at moving patterns from across the United Status for those individuals that moved to the Arizona:

Facts for Features: Super Bowl 2015[Source: U.S. Census Bureau]

Another example from the U.S. Census Bureau maps the county population with Irish roots as a percentage of each county’s population:

Irish Roots: U.S. population with Irish ancestry: 11.1%

The datasets that comprise the previous two visualizations are quite large – too large for the human brain to be able to identify the trends and patterns without clear and concise visualization.

When working with large datasets, the best (if not only) way to tell a story with the data is to use visuals that clearly show the user the underlying patterns and trends.

My Challenge to You

Don’t send another report with rows and columns of data. (You are welcome, executives.) Send charts, graphs, and other visualizations that tell the data’s story. The quality of reporting will increase drastically throughout your organization, and a data-driven, data-visualization culture will begin to emerge.

How do I use the SQL WHERE clause with dates?

Using the WHERE clause with dates is very similar to filtering text. Date values must be enclosed with the single apostrophe. To find all rows that have an OrderDate equal to July 15, 2005 in the Sales.SalesOrderHeader table, we use the query:

SELECT *
FROM Sales.SalesOrderHeader
WHERE OrderDate = '7/15/2005'

We can use different comparison operators with date values as well. To find all orders that took place after July 15, 2005 we can use the query:

SELECT *
FROM Sales.SalesOrderHeader
WHERE OrderDate > '7/15/2005'

While the previous examples have used the slash symbol to separate the month, day, and year, you can also use the hyphen to separate the values:

SELECT *
FROM Sales.SalesOrderHeader
WHERE OrderDate > '7-15-2005'

Just like with the other examples of the SQL WHERE clause, we can combine multiple filtering conditions using the “AND” or “OR” operator. To find all rows where the OrderDate was either July 15, 2005 or July 16, 2005 then we can execute the query:

SELECT *
FROM Sales.SalesOrderHeader
WHERE OrderDate = '7/15/2005' OR OrderDate = '7/16/2005'

There are many date functions that can be used in a WHERE clause that give you a little more flexibility with your queries. An upcoming post will discuss these various date functions. As we get to later lessons, we will apply some of those to give you dynamic options.


Practice Problems

Given the examples completed in this lesson, try and complete some of the SQL WHERE clause practice problems below!

  1. Find all orders that took place before August 1, 2006 using the OrderDate column and the Sales.SalesorderHeader table.
  2. Find all rows in the HumanResources.Employee table with a BirthDate after September 12, 1970
  3. Find all orders in the Sales.SalesOrderHeader table with an OrderDate value that was after January 3, 2006 and before July 8, 2006.

How do I use the SQL WHERE clause with text?

The SQL WHERE clause allows you to filter rows based on a criteria you define. This lesson will focus specifically on filtering rows based on text values. Were we to find all rows in the Person.Person table where the FirstName contains the word “Michael”, we would type:

SELECT *
FROM Person.Person
WHERE FirstName = 'Michael'

Notice the equals comparison operator after the WHERE and before the filtering text in the WHERE clause. Every WHERE clause must include one of several possible comparison operators. See our lesson on SQL Comparison Operators that will be coming up soon for the complete list of operators you can use.

We can use the operators “AND” and “OR” to add more conditions to our WHERE clause. To find all rows from the Person.Person table with a FirstName value equal to “Michael” and a LastName value equal to “Allen”, we can type:

SELECT *
FROM Person.Person
WHERE FirstName = 'Michael' AND LastName = 'Allen'

We could use the “OR” operator to find all rows of Person.Person with a FirstName equal to “Michael” or equal to “James”:

SELECT *
FROM Person.Person
WHERE FirstName = 'Michael' OR FirstName = 'James'

We can find all rows of Production.Product where the Name column starts with a letter less than “K” with the query:

SELECT *
FROM Production.Product
WHERE Name < 'K'


Practice Problems

There are several other ways to filter rows of data based on text values in the SQL WHERE clause. Later sections will discuss other comparison operators and ways to filter text data. Complete the practice problems below to gain a bit more experience with the techniques learned in this lesson:

  1. Find all rows in Person.Person where the FirstName value equals “Steve”.
  2. Find all rows in Production.Product where the Color column is “Red”.
  3. Find all rows in Production.Product where the Color column is either “Red” or “Black”.

What is SELECT DISTINCT and how do I use it?

The previous lessons have discussed the basics of the SQL SELECT statement. There is an option that can be added to the SELECT statement to return a distinct list. For example, if I wanted to find the distinct list of all values in the Color column of Production.Product, I would execute the query:

SELECT DISTINCT Color
FROM Production.Product

A quick reminder: all of our practice problems and examples use the AdventureWorks sample database provided by Microsoft. For some help getting this set up, take a look at our introduction to SQL post!

Instead of returning all 504 rows of the Production.Product table’s color column, this query returns only the ten unique color values. Quickly being able to find the distinct values of a certain column can be very helpful. If we wanted to find the distinct list of job titles for employees in the AdventureWorks company, we could use the query:

SELECT DISTINCT JobTitle
FROM HumanResources.Employee

It is possible to create a SELECT DISTINCT statement on more than one column. To find the list of distinct combinations of FirstName and LastName values from the Person.Person table, use the query:

SELECT DISTINCT FirstName, LastName
FROM Person.Person

Using the lessons learned in a previous section about column aliases, we can write a SELECT DISTINCT SQL statement and use column aliases at the same time:

SELECT DISTINCT FirstName AS “First Name”, LastName AS “Last Name”
FROM Person.Person

Practice Problems

Complete the following practice problems to gain more experience using SELECT DISTINCT in your SQL queries.

  1. Find the distinct list of values in the Size column from Production.Product.
  2. Find the distinct MaritalStatus column values in the HumanResources.Employee table.
  3. Find the distinct list of CardType values from the Sales.CreditCard table.