'Big data' - the bigger picture
01 May 2012
According to IBM, we create 2.5 quintillion* bytes of data every day, and 90 percent of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records and cell phone GPS signals, to name a few.
These huge data sets are termed 'big data'.
The IT research group Gartner describes big data as "the volume, variety and velocity of structured and unstructured data pouring through networks into processors and storage devices, along with the conversion of such data into business advice for enterprises." Properly handled and analysed, big data might also yield big rewards for a canny user; management consultancy McKinsey suggests that a retailer making the most of big data could increase its operating margin by more than 60 percent. What, then, might it achieve for a national economy?
As the amount of data continues to grow exponentially - compounded by the Internet, social media, cloud computing, mobile devices and the like - it poses both challenge and opportunity: how to manage, analyse and make use of all this data as it is generated. A recent report from the Centre for Economics and Business Research, which investigated how UK organisations can unlock the economic value of big data through the adoption of big data analytics, suggests that it could add £216bn to the UK economy by 2017 and create 58,000 jobs.
Rupert Ogilvie, an optimisation consultant at Cambridge-based Intergence, believes visualisation can be a key tool in making full use of big data, helping users explore and communicate the data through graphic representations and, by doing so, profit from it.
As Gartner's definition suggests, big data is the convergence of the three Vs: Volume, Variety and Velocity. Standard data management techniques can handle volume - for example, enormous datasets can be exploited by well-configured relational databases - while variety and velocity can be handled by good process management and conventional business intelligence practices. But, warns Dr Ogilvie, big data management has to juggle the convergence of all three.
‘The Cloud’ is often talked about in the same breath as big data. But what is it about the cloud (public, private or a mixture of the two) that makes it so appealing to those looking to utilise their big data? Scalability is a big plus point for big data and the cloud; if the real-time feeds providing data suddenly rocket in volume due to an external event, the cloud can respond at speed, minimising the risk of data loss. Although all the data can, in theory at least, be stored in the cloud, the organisation using it can choose how much it needs to pull back for presentation and further analysis.
This flexibility in resource usage can be a problem for organisations, either when planning their upgrade path or when budgeting for their next cloud bill. Visualisation can help these organisations look at what was used and when, explains Dr Ogilvie, as well as tracking usage trends over time to future-proof their private clouds. Data visualisation is all about telling a story, and big data visualisation is no different, he asserts.
Often what is valuable in the data isn’t just the hard numbers, but the trends – how they change over time. Visualisation is an invaluable tool in identifying trends within massive data sets, spotting anomalies as well as outliers and providing a common framework in which to view the data from the many different data sources.
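Purely as an illustration of this kind of trend-and-outlier check (not Intergence's own tooling; the file name, column names and threshold below are assumptions), a rolling average with a deviation threshold is often enough to surface the points worth drilling into:

```python
import pandas as pd

# Hypothetical time-series feed with 'timestamp' and 'value' columns (names assumed)
readings = pd.read_csv("sensor_feed.csv", parse_dates=["timestamp"]).set_index("timestamp")
readings = readings.sort_index()

# A rolling one-day mean exposes the underlying trend
trend = readings["value"].rolling("1D").mean()

# Flag readings that sit more than three standard deviations away from that trend
deviation = (readings["value"] - trend).abs()
outliers = readings[deviation > 3 * readings["value"].std()]
print(outliers.head())
```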
According to Dr Ogilvie, visualisation allows the user to cut into, and move between, different granularities of data. From a high-level overview, the user can drill down to those nuggets, which might previously have been discarded, to search out answers and perform deeper analysis on the data. Creating subsets and groups can reduce the data density, allowing rapid summaries of different sections of the data and helping the user find the right level of information. Once the main data set has been acquired, it can be manipulated and drilled into to identify the underlying raw data and its sources.
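As a minimal sketch of this "summarise, then drill down" pattern (the file and column names are invented for the example, not taken from the article), a general-purpose tool such as pandas might express it like this:

```python
import pandas as pd

# Hypothetical transaction records with 'region', 'timestamp' and 'amount' columns
sales = pd.read_csv("transactions.csv", parse_dates=["timestamp"])

# High-level overview: one summary row per region per day, reducing data density
overview = (
    sales.groupby(["region", sales["timestamp"].dt.date])["amount"]
         .agg(["count", "sum", "mean"])
)
print(overview.head())

# Drill down: pull back only the raw rows behind one interesting summary cell
one_day = pd.Timestamp("2012-04-30").date()
detail = sales[(sales["region"] == "East") & (sales["timestamp"].dt.date == one_day)]
print(detail)
```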
Then you need to ask the question: do you need static and real-time views of your data? Visualisation can show the state of a network or process at a single point in time, as well as stream the data to you in real time. Using advanced visualisation techniques, it is possible to replay and rewind data to look for the root cause of problems and to see how trends shift over time.
If a process has multiple inputs from different data sources, it is possible to see quite quickly whether those inputs are being updated with sufficient regularity. When planning new processes and thresholds, the ability to pull up views showing the velocity of the required sources can provide valuable insight into the amount of work needed to scrub and clean the data.
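A minimal sketch of such a regularity check, assuming each feed writes timestamped rows to a common log (the file name, column names and one-hour threshold are all assumptions for illustration):

```python
import pandas as pd

# Hypothetical feed log: one row per update, with 'source' and 'received_at' columns
log = pd.read_csv("feed_log.csv", parse_dates=["received_at"])

# Gap between consecutive updates from each source
log = log.sort_values("received_at")
log["gap"] = log.groupby("source")["received_at"].diff()

# Sources whose worst-case gap exceeds the regularity the process needs (one hour, assumed)
worst_gap = log.groupby("source")["gap"].max()
print(worst_gap[worst_gap > pd.Timedelta("1h")])
```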
Finally, variety. A common view of the data removes the worries associated with differently structured and semi-structured raw data. This template gives an organisation confidence that, as new data sources become available, the data they provide will fit in seamlessly with minimal need for change.
As well as providing this common framework, visualisation allows an organisation to overlay and combine data from different sources in different views for different levels and departments of the organisation. A common visualisation tool for the whole organisation provides a solid collaboration and communication platform, helping to improve user workflows.
If you would like to meet members of Dr Ogilvie's company, Intergence, they are exhibiting at IDC's 'Evolution of the Datacentre 2012' conference (May 22) in London.
Les Hunt
Editor
*To the best of my knowledge, this is 10 raised to the power of 18. You may disagree; I rarely encounter such unimaginable numbers - and what's a couple of orders of magnitude either way between friends!
Reader comment:
From Mr Kristen Cadman:
Use of lots of big data may help some firms match their products to the needs of the market, which gives them a competitive edge over other firms. This cannot be applied to all firms to produce a spurious growth figure, because that would require the economy to grow rather than simply divert resources. In my experience, making firms more efficient is a detailed task using precise (rather than statistical) information, dealing with people and machines to help them work well.
IBM concentrated on its core capabilities and opted out of the software for its computers. Probably the worst business decision in history. Later it opted out of hardware manufacture. Is IBM now going to concentrate on its core skills by advising firms on how to eliminate themselves from the economy?