What the hell is big data?
This is a question I have been debating on a daily basis these past few weeks.
Do you write it as a single word – BigData – or two words – Big Data?
Is it a blue elephant logo with yellow text on it? Or is it the other way around?
Maybe Big Data is the 0’s and 1’s tunnel stock photo?
Is 100 terabytes enough to be termed Big Data or do you need to wait until you reach a petabyte to call it BigData?
What I do know is that Big Data is not a technology – it is just a concept. A mindset.
When you search for the term Big Data, what you find is the 3 V’s:
- Volume – large amounts of data (again – what does large mean?)
- Variety – many types and sources of data to deal with (sometimes known in advance, sometimes unknown in advance, or unstructured, or maybe with some semi-structure to them)
- Velocity – the speed at which you need to process it
All of the above are said to accelerate at a pace that our “current” technologies can’t accommodate. So what do we do? Well, we reduce our requirements in order to be able to handle it all. But at the same time we change the architecture and our mindset.
With Big Data you will tend to store more data than you used to – even data you are not sure you will have a use for – because you might need it later for some purpose. With Big Data you will strive for real time analytics – real time everything while you are at it (and then you will find it not to be as easy as it sounds). With Big Data you will become a data hog. With Big Data you will need new skillsets in your organization.
When you want to ride the hype of Big Data, please ask yourself some questions before you start:
- Is it really something you need?
- Have you chosen your database just because it is cool, new, NoSQL’ish in nature and is referred to as Big Data? Or did you have a solid reason for using it?
- How do you see yourself in the future? Is the data you hold an asset? Of what kind? Will you be analyzing it massively? Try and monetize it? Use it for other purposes?
I could have easily decided to place this blog on MongoDB or Hadoop instead of the crap of a MySQL database that it uses. Probably not easily, but it is possible. But to what end? Would that give me any gains over MySQL? Probably none. Want to use Big Data? Make sure you understand what it is and what technologies were selected for your Big Data project before you go off on that adventure.
While I think BigData is a term which has been diluted almost as much as Cloud, here is a definition of what I think it is:
Big Data is usually a new database technology, like NoSQL. It is usually also a distributed database, which depends on distributed storage to even function. It possibly uses MapReduce.
As someone somewhere else already mentioned:
“The main challenge that these databases solve is how to handle massive amount of data at a reasonable cost and without poor performance”
This is where we disagree.
Big Data for me isn’t a new database technology (BTW – NoSQL isn’t a database technology either – it is just a term that describes a wide and varied set of database types). It is a mindset: it says what you plan to do with the data you are collecting and how you are going to treat it.
It usually revolves around large sets of data, but not always. It usually does include unstructured data, but not always. It is always a compromise in terms of the capabilities you had up until today in the way you handled data – one you are willing to take in order to solve the issues that come from the 3 V’s.