What the hell is Big Data?
This is a question I have been debating on a daily basis for the past few weeks.
Do you write it as a single word – BigData – or as two words – Big Data?
Is it a blue elephant logo with yellow text on it? Or is it the other way around?
Maybe Big Data is the tunnel-of-0s-and-1s stock photo?
Is 100 terabytes enough to be termed Big Data, or do you need to wait until you reach a petabyte to call it BigData?
What I do know is that Big Data is not a technology – it is just a concept. A mindset. When you search for the term Big Data, what you find is the 3 V's:
- Volume – large amounts of data (again – what does large mean?)
- Variety – many types and sources of data to deal with (their structure is sometimes known in advance and sometimes not; the data may be unstructured or only semi-structured)
- Velocity – the speed at which you need to process the data
All of the above are said to be growing at a pace that our "current" technologies can't accommodate. So what do we do? Well, we relax our requirements so that we can handle it all. At the same time, we change our architecture and our mindset.
With Big Data you will tend to store more data than you used to – even data you are not sure you will ever use – because you might need it later for some purpose. With Big Data you will strive for real-time analytics – real-time everything while you are at it (and then you will find it is not as easy as it sounds). With Big Data you will become a data hog. With Big Data you will need new skillsets in your organization.
If you want to ride the Big Data hype, ask yourself a few questions before you start:
- Is it really something you need?
- Have you chosen your database just because it is cool, new, NoSQL-ish in nature and referred to as Big Data? Or did you have a solid reason for using it?
- How do you see yourself in the future? Is the data you hold an asset? Of what kind? Will you be analyzing it at massive scale? Trying to monetize it? Using it for some other purpose?
I could have easily decided to host this blog on MongoDB or Hadoop instead of the crappy MySQL database it uses. OK, probably not easily, but it is possible. But to what end? Would I gain anything over MySQL? Probably not.
Want to use Big Data? Make sure you understand what it is, and which technologies were selected for your Big Data project, before you set off on that adventure.