How Much Data?
According to IBM, 90% of the data created in the history of the world was created in the past 2 years. The article was looking at Social Media information, but the claim was generic. Talk about Information Overload. How do we keep up with this?
I worked with a very fast thinker once. Working with him was like trying to see ahead underwater while travelling in the wake of an outboard motor. The trick was to decide what to ignore so you could address just the important things. He used it as a tactic to get his own way during meetings. I was reminded of this while thinking about this topic. It seems the whole human race is about to face the same dilemma: how to sort the important information from the huge volume of total information being produced.
Not all information produced is of the same quality, usefulness or relevance. Assessing Information Relevance will become increasingly important. A post on Facebook letting us all know that someone's dog just farted is not as valuable for most of us as news that a new law puts a carbon tax on high carbon emitters.
The CERN Large Hadron Collider (LHC) is expected to produce data equal to 1% of the world's data production rate when it is running. This has required a new approach to data storage. For those who aren't familiar with it, the Large Hadron Collider is a higher energy version of the Australian Synchrotron, with specialised detectors that examine the fine details of how the matter of the universe is constructed. The intent is to look for evidence that the Higgs Boson exists as predicted by the Standard Model of particle physics.
I mention it here because they have to record the experimental data knowing that it may be some time before they can fully interpret it. They have planned for the Information Overload as well as the long term Information Storage.
In fact it is a great example of long term planning, with the original proposal in 1985, construction beginning in 1994 and completing in 2008. You can see the steps involved in the LHC Milestones.
So how do you store all that data?
If we stored it all on DVDs, it would produce a stack that goes to the Moon and back. That's too big to store as DVDs.
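As a rough sanity check on that claim, here is a back-of-envelope calculation. The input figures are assumptions, not from the original article: roughly 1.8 zettabytes of data created worldwide in 2011 (a commonly cited IDC estimate), 4.7 GB per single-layer DVD, and 1.2 mm per disc.

```python
# Back-of-envelope check of the "stack of DVDs to the Moon" claim.
# All input figures below are assumptions for illustration.

DATA_BYTES = 1.8e21        # ~1.8 zettabytes created in 2011 (assumed estimate)
DVD_BYTES = 4.7e9          # single-layer DVD capacity
DVD_THICKNESS_M = 1.2e-3   # standard disc thickness, 1.2 mm
MOON_KM = 384_400          # mean Earth-Moon distance

discs = DATA_BYTES / DVD_BYTES
stack_km = discs * DVD_THICKNESS_M / 1000

print(f"Discs needed: {discs:.2e}")
print(f"Stack height: {stack_km:,.0f} km (Moon is {MOON_KM:,} km away)")
```

With these assumed figures the stack comes out in the hundreds of thousands of kilometres, which is the same order of magnitude as the distance to the Moon, so the claim is at least plausible.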
The increase in data comes from 3 sources:
- new data sources such as ubiquitous sensors, LHC, business metrics, research…
- increased data creation from existing sources such as social media, blogs, web publishing…
- unprecedented processing power
So far the storage solution has been the growth of server farms, and while many higher density storage technologies are being investigated, most data is stored on conventional hard disks. Redundancy and data security are of course hot topics.
The other major issue is how we make sense of all this data. Traditional Data Integration tools are considered not ready for Big Data, and this is likely to get worse before it gets better. Information Processing is going to be one of the opportunity areas of the next decade.
According to CNN, Data Scientist will be one of the hot jobs in 2022.
Even in the much smaller world of Successful Endeavours, where we develop new products and have to do the Innovation, Research, Prototypes and Testing associated with them, managing all the data requires both discipline and planning.
Successful Endeavours specialise in Electronics Design and Embedded Software Development. Ray Keefe has developed market leading electronics products in Australia for nearly 30 years. This post is Copyright © 2012 Successful Endeavours Pty Ltd