I hear the term “big data” being thrown around all the time now.
“We have big data.”
“We want big data.”
“We need to develop a big data solution.”
I believe that anything that gets people talking about data and its various applications is a positive. That being said, when most people say “big data”, they are actually referring to traditional business intelligence rather than big data in its true sense. This article is meant to clarify the definition of a term that is so often misused.
What is Big Data?
Below are a few definitions taken from various sources:
A term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. – SAS Insights
Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation. – Gartner
“Big data” refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. – McKinsey
SAS, McKinsey, and Gartner are well-respected thought leaders. But notice how each definition provided is slightly different. In fact, the only commonality amongst the three definitions is that large quantities of data must be involved. Consult business magazines, other thought leaders, or almost any other source and you will probably find many more variations of the definition.
So which definition is correct?
If there are many definitions, shouldn’t one of them be correct (or at least more correct than the others)? Not necessarily: each source has defined big data correctly, just through a slightly different lens. I tend to agree most with the SAS definition because it specifically calls out “unstructured data”.
Because I don’t want you to read this whole article and not get a concise definition laid out, here is my definition:
“Big data refers to datasets large in volume – incorporating both structured and unstructured data – that demand non-traditional database systems and business intelligence solutions to process, store, and analyze.” – Brewster Knowlton
I like this definition because it addresses the size of the data, the structured and unstructured components, and the fact that new and innovative solutions are required to handle the information we now gather. It is not particularly specific – there is no set number of gigabytes, terabytes, or petabytes of data that constitutes big data – yet it captures the essence of what I believe the term represents.
If not big data, then what?
If the above definition has made you realize that you are slightly misusing the term “big data”, then the next logical question is “then what am I talking about?”.
More than likely, you are referring to traditional “business intelligence”. Traditional business intelligence refers to database systems, applications, and practices that allow an organization to analyze its structured raw data. In this context, the data is typically structured (i.e., rows and columns) and often resides in a standard SQL database like Microsoft SQL Server, Oracle, or MySQL. The vast majority of organizations are developing and implementing a business intelligence practice, not a big data practice.
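To make “rows and columns” concrete, here is a minimal sketch of structured data sitting in a SQL database and being summarized with an ordinary query. It uses Python's built-in sqlite3 module purely as a stand-in for a server like SQL Server, Oracle, or MySQL. The table name, column names, sample deposits, and the member comment are all hypothetical and invented for illustration.

```python
import sqlite3

# Hypothetical structured data: every record has the same three columns.
DEPOSIT_ROWS = [
    ("2024-01-05", "Checking", 1200.00),
    ("2024-01-05", "Savings", 300.00),
    ("2024-01-06", "Checking", 450.50),
]

# Hypothetical unstructured data: free text with no fixed columns.
MEMBER_COMMENT = "Called about a late fee on my checking account; very helpful rep."


def total_by_product(rows):
    """Load rows into an in-memory SQL table and sum the amounts per product."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE deposits (day TEXT, product TEXT, amount REAL)")
        conn.executemany("INSERT INTO deposits VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT product, SUM(amount) FROM deposits "
            "GROUP BY product ORDER BY product"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(total_by_product(DEPOSIT_ROWS))
```

The deposits fit a fixed schema, so a single GROUP BY answers the question. The free-text comment has no columns to group or sum, which is the kind of data that traditional business intelligence tooling was not designed around.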
I would argue that big data is a specific subset of business intelligence, but business intelligence is the umbrella term that most often describes the data solutions implemented by most small to medium-sized organizations. When most people say “big data”, they really mean “business intelligence”, because their data does not tick all the boxes for what big data really represents.
Even though I am writing this article about the difference between big data and business intelligence, I am also tempted to say “who cares”. Regardless, I do believe there are distinct differences between the two that are worth discussing.
But hey – as long as data is being used and a conscious effort is being put forth to analyze the vast amount now collected by organizations, I don’t care whether you call it “big data”, “business intelligence” or “peanut butter and jelly”.