Sunday, 17 May 2015

What is Big Data?

In the process of discovering and determining business insights, large, complex sets of data are generated that must then be managed, analyzed and manipulated by skilled professionals. This large collection of data is collectively known as big data.

So, how big is big data?


Most professionals in the industry consider multiple terabytes or petabytes to be the current big data benchmark. Others, however, are hesitant to commit to a specific quantity, as the rapid pace of technological development may render today’s “big” tomorrow’s normal. Still others define big data relative to its context. In other words, big data is a subjective label attached to situations in which human and technical infrastructures are unable to keep pace with a company’s data needs.

The Three – and Sometimes Four – V’s of Big Data

Though the word big implies such, big data isn’t defined simply by volume; it’s also about complexity. Many small datasets that are considered big data do not consume much physical space but are particularly complex in nature. At the same time, large datasets that require significant physical space may not be complex enough to be considered big data.

In addition to volume, the big data label also includes data variety and velocity, making up the three V’s of big data – volume, variety and velocity. Variety refers to the different types of structured and unstructured data that organizations can collect, such as transaction-level data, video and audio, or text and log files. Velocity is an indication of how quickly the data can be made available for analysis.

In addition to the three V’s, some add a fourth to the big data definition. Veracity is an indication of data integrity: an organization’s ability to trust its data and use it confidently to make crucial decisions.
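The four V’s can be made concrete with a toy profile of a dataset. This is a hypothetical illustration only: the class, field names, and thresholds below are invented for this sketch, not an industry standard.

```python
from dataclasses import dataclass

@dataclass
class DatasetProfile:
    volume_tb: float         # volume: physical size in terabytes
    formats: set             # variety: distinct data types collected
    latency_seconds: float   # velocity: delay before data is analyzable
    trusted_fraction: float  # veracity: share of records passing validation

    def looks_like_big_data(self) -> bool:
        # A dataset can qualify through complexity (variety, velocity),
        # not just raw size, as the discussion above notes.
        return (self.volume_tb >= 10
                or len(self.formats) >= 3
                or self.latency_seconds <= 1)

# A small-but-complex dataset: little volume, high variety and velocity.
clickstream = DatasetProfile(volume_tb=0.5,
                             formats={"log", "video", "transaction"},
                             latency_seconds=0.2,
                             trusted_fraction=0.95)
print(clickstream.looks_like_big_data())  # prints True
```

The point of the sketch is that the check passes even though `volume_tb` is tiny, mirroring the claim that small datasets can still be big data.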

Understanding the Big Picture of Big Data

To gain a better perspective on how much data is being generated and managed by big data systems, consider the following noteworthy facts:
  • According to IBM, users create 2.5 quintillion bytes of data every day. In practical terms, this means that 90% of the data in the world today has been created in the last two years alone
  • Walmart handles more than 1 million customer transactions every hour, which are then imported into databases estimated to contain over 2.5 petabytes of information
  • According to FICO, its credit card fraud detection system helps protect over two billion accounts all over the globe
  • Facebook currently holds more than 45 billion photos in its user database, a number that is growing daily
  • The human genome can now be decoded in less than one week, a feat which originally took ten years to complete

Uses of Big Data

As stated earlier, organizations are increasingly turning to big data to discover new ways to improve decision-making, opportunities, and overall performance. For example, big data can be harnessed to address the challenges that arise when information is dispersed across several different systems that are not interconnected by a central system. By aggregating data across systems, big data can help improve decision-making capability. It also can augment data warehouse solutions by serving as a buffer to process new data for inclusion in the data warehouse or to remove infrequently accessed or aged data.
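The cross-system aggregation described above can be sketched in miniature: join records from two disconnected systems on a shared key to produce one unified view. The system names, fields, and values here are hypothetical, chosen only to make the idea concrete.

```python
# Records as they might sit in two systems that don't talk to each other:
# a CRM holding customer names, and a billing system holding spend totals.
crm_records = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
billing_records = [
    {"customer_id": 1, "total_spend": 120.0},
    {"customer_id": 2, "total_spend": 80.5},
]

# Index one source by the shared key, then merge it into the other,
# producing the single aggregated view a decision-maker would query.
spend_by_id = {r["customer_id"]: r["total_spend"] for r in billing_records}
unified = [
    {**crm, "total_spend": spend_by_id.get(crm["customer_id"], 0.0)}
    for crm in crm_records
]
print(unified[0])  # {'customer_id': 1, 'name': 'Ana', 'total_spend': 120.0}
```

At real scale this join would run in a distributed engine rather than in-memory dictionaries, but the shape of the operation is the same.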
Big data can lead to improvements in overall operations by giving organizations greater visibility into operational issues. Operational insights might depend on machine data, which can come from sources ranging from computers and sensors to meters and GPS devices. Big data also provides unprecedented insight into customers’ decision-making processes by allowing companies to track and analyze shopping patterns, recommendations, purchasing behavior and other drivers that are known to influence sales.
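The shopping-pattern analysis mentioned above reduces, in its simplest form, to tallying events. The purchase events below are invented for illustration; a real pipeline would stream millions of such records.

```python
from collections import Counter

# Hypothetical purchase events, as a company might pull from its
# transaction logs (fields are illustrative only).
events = [
    {"customer": "c1", "item": "laptop"},
    {"customer": "c1", "item": "mouse"},
    {"customer": "c2", "item": "laptop"},
    {"customer": "c3", "item": "laptop"},
]

# Tally which items drive sales, a toy version of the purchasing-behavior
# analysis described in the paragraph above.
popularity = Counter(e["item"] for e in events)
print(popularity.most_common(1))  # prints [('laptop', 3)]
```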
