Monday 8 June 2015

All about Big Data

Before we look at what big data is, consider what happens in an Internet minute.


What Happens in an Internet Minute?

Terabytes, Petabytes, Exabytes.  Who can keep track?  These strange terms have just begun to enter the business lexicon, but the hype surrounding them has reached a fever pitch.  We have undoubtedly entered the age of big data.

1.What is Big Data?

1.1.Big data is a collection of data from traditional and digital sources inside and outside your company that represents a source for ongoing discovery and analysis.
 
Big Data Explained

1.2.Big data is not just about the size of the data. Yes, it is data and it is big, but size is not the whole story. Big data is also about the pace at which the data is generated and the complexity of the attributes it carries. For example, the data could come in all sorts of types (tables, documents, chat logs, blog postings, videos) and formats (.html, .doc, .pdf, .txt, .ppt, etc.), from various sources, and in different languages and styles.

 

2.Where does it come from?

The data that is loosely classified as big data is mainly user-generated content created through various online platforms such as web media, social media, blogs, and video blogs, and through offline daily interactions such as smart cards, credit card purchases, points cards (membership or discount cards), interactive point of sale machines (iPOS), and bar code scanners. This data is left behind by various end-users such as businesses, educational institutions, healthcare and scientific institutions, and of course individuals like us.

3.Why use Big Data?

As Google says: “Know the user. Know the magic. Connect the two.” That is when you are able to do cool stuff that can impact people’s lives. When you combine online data with offline data, it paints the whole picture of the trends in your target industry. What do these trends and analyses tell us? We can benefit from them by spotting shifting and emerging trends in the industry, customer behaviour patterns, purchase habits, and declining sales and share values, and then adapting our business strategy accordingly. This helps us optimize customer interactions and put the right product in front of the right customer at the right time.
Marketers can use big data analysis for informed decision making such as customer profiling, behaviour analysis, RFM analysis (recency, frequency and monetary value of customers), and predicting trends, sales, and customer behaviour.
According to IBM, data analytics helps us “provide customized intelligence to gain faster insight” from the information we have. We can predict outcomes and financial trends by “turning information into insight and developing conclusive, fact based strategies to gain that competitive edge.”
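
The RFM analysis mentioned above can be illustrated with a minimal sketch. The purchase records, customer names, and the rfm helper below are made up for illustration; a real analysis would read from a sales database and usually bin the three values into scores.

```python
from datetime import date

# Hypothetical purchase records: (customer_id, purchase_date, amount)
purchases = [
    ("alice", date(2015, 5, 30), 40.0),
    ("alice", date(2015, 4, 12), 25.0),
    ("bob", date(2015, 1, 3), 300.0),
]


def rfm(purchases, today):
    """Return {customer: (recency_in_days, frequency, monetary_value)}."""
    summary = {}
    for customer, when, amount in purchases:
        # Track the latest purchase date, the purchase count, and total spend.
        last, count, total = summary.get(customer, (when, 0, 0.0))
        summary[customer] = (max(last, when), count + 1, total + amount)
    return {
        customer: ((today - last).days, count, total)
        for customer, (last, count, total) in summary.items()
    }


scores = rfm(purchases, today=date(2015, 6, 8))
for customer, (recency, frequency, monetary) in sorted(scores.items()):
    print(customer, recency, frequency, monetary)
```

Here alice bought recently and often but spent little, while bob made one large purchase months ago, so the two would likely be approached differently.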

4.Three V's of Big Data

Big Data relates to data creation, storage, retrieval and analysis that is remarkable in terms of volume, velocity, and variety:


Three V's Of Big Data
  • Volume. A typical PC might have had 10 gigabytes of storage in 2000. Today, Facebook ingests 500 terabytes of new data every day; a Boeing 737 will generate 240 terabytes of flight data during a single flight across the US; smart phones are proliferating, along with the data they create and consume; and sensors embedded into everyday objects will soon result in billions of new, constantly-updated data feeds containing environmental, location, and other information, including video.
  • Velocity. Clickstreams and ad impressions capture user behavior at millions of events per second; high-frequency stock trading algorithms reflect market changes within microseconds; machine-to-machine processes exchange data between billions of devices; infrastructure and sensors generate massive log data in real time; online gaming systems support millions of concurrent users, each producing multiple inputs per second.
  • Variety. Big Data isn't just numbers, dates, and strings. Big Data is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media.

Traditional database systems were designed to address smaller volumes of structured data, fewer updates or a predictable, consistent data structure. Traditional database systems are also designed to operate on a single server, making increased capacity expensive and finite. As applications have evolved to serve large volumes of users, and as application development practices have become agile, the traditional use of the relational database has become a liability for many companies rather than an enabling factor in their business.
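
To get a feel for the Volume figures quoted above, here is a small back-of-the-envelope calculation. It assumes decimal (SI) units, where 1 terabyte is 10^12 bytes; binary units would shift the numbers slightly.

```python
# Decimal (SI) units are an assumption made for this sketch.
GB = 10**9
TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60

daily_ingest_bytes = 500 * TB   # Facebook figure quoted in the Volume bullet
pc_disk_bytes = 10 * GB         # typical PC storage in 2000, also quoted there

# Average sustained ingest rate needed to absorb 500 TB in one day.
bytes_per_second = daily_ingest_bytes / SECONDS_PER_DAY
# How many year-2000 PCs that daily volume would fill.
pcs_filled_per_day = daily_ingest_bytes // pc_disk_bytes

print(f"{bytes_per_second / GB:.2f} GB per second")
print(f"{pcs_filled_per_day:,} year-2000 PCs filled per day")
```

That works out to roughly 5.79 GB arriving every second, or about 50,000 PCs from the year 2000 filled every day.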

5.Types of Data Big Data Handles.

1.Structured Data

Structured data first depends on creating a data model – a model of the types of business data that will be recorded and how they will be stored, processed and accessed. This includes defining what fields of data will be stored and how that data will be stored: data type (numeric, currency, alphabetic, name, date, address) and any restrictions on the data input (number of characters; restricted to certain terms such as Mr., Ms. or Dr.; M or F).
Structured data has the advantage of being easily entered, stored, queried and analyzed. At one time, because of the high cost and performance limitations of storage, memory and processing, relational databases and spreadsheets using structured data were the only way to effectively manage data. Anything that couldn't fit into a tightly organized structure would have to be stored on paper in a filing cabinet.
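
As a rough illustration of such a data model, the sketch below defines a record with fixed fields, data types, and input restrictions. The CustomerRecord class, its field names, and the 40-character limit are assumptions made up for this example, not part of any particular database.

```python
from dataclasses import dataclass
from datetime import date

ALLOWED_TITLES = {"Mr.", "Ms.", "Dr."}   # restricted to certain terms
ALLOWED_GENDERS = {"M", "F"}
MAX_NAME_LENGTH = 40                     # hypothetical character limit


@dataclass
class CustomerRecord:
    title: str
    name: str
    gender: str
    date_of_birth: date

    def __post_init__(self):
        # Enforce the restrictions of the data model at input time.
        if self.title not in ALLOWED_TITLES:
            raise ValueError(f"title must be one of {sorted(ALLOWED_TITLES)}")
        if not self.name or len(self.name) > MAX_NAME_LENGTH:
            raise ValueError(f"name must be 1 to {MAX_NAME_LENGTH} characters")
        if self.gender not in ALLOWED_GENDERS:
            raise ValueError("gender must be 'M' or 'F'")
        if not isinstance(self.date_of_birth, date):
            raise TypeError("date_of_birth must be a date")


record = CustomerRecord("Dr.", "Ada Lovelace", "F", date(1815, 12, 10))
print(record)

try:
    CustomerRecord("Sir", "Isaac Newton", "M", date(1643, 1, 4))
except ValueError as error:
    print("rejected:", error)
```

Because every record has the same fields and passes the same checks, it is easy to store, query and analyze, which is exactly the advantage described above.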

2.Unstructured and Semi-Structured Data

Unstructured data is all those things that can't be so readily classified and fit into a neat box: photos and graphic images, videos, streaming instrument data, webpages, PDF files, PowerPoint presentations, emails, blog entries, wikis and word processing documents.
Semi-structured data is a cross between the two. It is a type of structured data, but lacks the strict data model structure. With semi-structured data, tags or other types of markers are used to identify certain elements within the data, but the data doesn't have a rigid structure. For example, word processing software now can include metadata showing the author's name and the date created, with the bulk of the document just being unstructured text. Emails have the sender, recipient, date, time and other fixed fields added to the unstructured data of the email message content and any attachments. Photos or other graphics can be tagged with metadata such as the creator, date, location and keywords, making it possible to organize and locate graphics. XML and other markup languages are often used to manage semi-structured data.
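
The email example above can be sketched with Python's standard xml.etree.ElementTree module. The XML layout and tag names here are invented for illustration; real email uses its own header format rather than XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML representation of an email: tagged fields plus free text.
EMAIL_XML = """
<email>
  <sender>alice@example.com</sender>
  <recipient>bob@example.com</recipient>
  <date>2015-06-08</date>
  <body>Hi Bob, the quarterly numbers look good. Let's talk tomorrow.</body>
</email>
""".strip()


def parse_email(xml_text):
    """Split the document into tagged fields and the unstructured body."""
    root = ET.fromstring(xml_text)
    fields = {
        child.tag: (child.text or "").strip()
        for child in root
        if child.tag != "body"
    }
    body = (root.findtext("body") or "").strip()
    return fields, body


fields, body = parse_email(EMAIL_XML)
print(fields)   # the structured part: easy to query and sort
print(body)     # the unstructured part: free text
```

The tags make the sender, recipient and date easy to pick out, while the body remains plain unstructured text.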

3.Multi Structured Data

Multi-structured data refers to a variety of data formats and types and can be derived from interactions between people and machines, such as web applications or social networks. A great example is web log data, which includes a combination of text and visual images along with structured data like form or transactional information. As digital disruption transforms communication and interaction channels, and as marketers enhance the customer experience across devices, web properties, face-to-face interactions and social platforms, multi-structured data will continue to evolve.

Example Applications of the Types of Data Big Data Handles




1.       Web and social media

This includes clickstream and social media data such as Facebook, Twitter, LinkedIn, and blogs. Big data governance programs will increasingly be required to integrate this data with master data and with core business processes such as customer loyalty programs. The big data governance program needs to establish policies regarding the acceptable use of social media data especially as regulations and precedents are continually evolving. The program also needs to establish guidelines regarding the acceptable use of cookies, especially third-party cookies, to track users and to personalize their web interactions. Metadata is also critical to web and social media. For example, two sites may measure the term “unique visitors” differently for clickstream analytics. One site may measure unique visitors within a month while another one may measure unique visitors within a week.
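
The “unique visitors” point can be made concrete with a small sketch. The clickstream records and the unique_visitors helper below are hypothetical; the idea is only that the same raw data gives different totals depending on the period used to define "unique".

```python
from datetime import date

# Hypothetical clickstream: (visitor_id, visit_date)
visits = [
    ("v1", date(2015, 6, 1)),
    ("v1", date(2015, 6, 9)),
    ("v2", date(2015, 6, 2)),
    ("v2", date(2015, 6, 3)),
    ("v3", date(2015, 6, 16)),
]


def unique_visitors(visits, period_key):
    """Count each visitor once per period, then total across periods."""
    return len({(period_key(day), visitor) for visitor, day in visits})


# Site A counts a visitor once per calendar month.
monthly = unique_visitors(visits, lambda d: (d.year, d.month))
# Site B counts a visitor once per ISO week (year, week number).
weekly = unique_visitors(visits, lambda d: d.isocalendar()[:2])

print("monthly definition:", monthly)
print("weekly definition:", weekly)
```

Visitor v1 returns in a second week, so the weekly definition reports 4 unique visitors for June while the monthly definition reports 3. Without metadata recording which definition was used, the two sites' numbers cannot be compared.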

2.       Machine-to-machine data

Machine-to-machine (M2M) refers to technologies that allow both wireless and wired systems to communicate with other devices. M2M uses a device such as a sensor or meter to capture an event (such as speed, temperature, pressure, flow, or salinity) which is relayed through a wireless, wired, or hybrid network to an application that translates the captured event into meaningful information. M2M communications create the so-called “internet of things.” The big data governance program needs to establish a number of policies around M2M data. For example, the program needs to draw up guidelines around the acceptable use of geolocation and RFID data that can be used to build a profile of individuals and potentially violate their privacy. The program also needs to establish retention policies around the massive volumes of M2M data that can easily overwhelm IT budgets if not properly controlled. The big data governance program also needs to address any data quality concerns such as RFID read rates in environments with high moisture content and lots of congestion.
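
As a rough sketch of two ideas from this paragraph, the code below applies a retention policy to sensor readings and then translates the remaining raw readings into meaningful information. The sensor names, the 30-day retention period, and the 80 degree threshold are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)   # hypothetical retention period
MAX_TEMPERATURE_C = 80.0         # hypothetical alert threshold

# Hypothetical readings captured by temperature sensors.
readings = [
    {"sensor": "pump-1", "temperature_c": 71.5, "at": datetime(2015, 6, 8, 9, 0)},
    {"sensor": "pump-1", "temperature_c": 84.2, "at": datetime(2015, 6, 8, 9, 5)},
    {"sensor": "pump-2", "temperature_c": 65.0, "at": datetime(2015, 4, 1, 12, 0)},
]


def apply_retention(readings, now):
    """Keep only readings that are still inside the retention period."""
    cutoff = now - RETENTION
    return [r for r in readings if r["at"] >= cutoff]


def to_alerts(readings):
    """Translate raw readings into human-readable overheating alerts."""
    return [
        f"{r['sensor']} overheated: {r['temperature_c']} C at {r['at']:%H:%M}"
        for r in readings
        if r["temperature_c"] > MAX_TEMPERATURE_C
    ]


now = datetime(2015, 6, 8, 10, 0)
kept = apply_retention(readings, now)
alerts = to_alerts(kept)
print(len(kept), "readings kept")
print(alerts)
```

The April reading falls outside the retention window and is dropped, and only the 84.2 degree reading becomes an alert.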

3.       Big transaction data

This includes healthcare claims, telecommunications call detail records, and utility billing records. Big transaction data is increasingly available in semi-structured and unstructured formats. Information governance challenges such as metadata, data quality, privacy, and information lifecycle management also apply to this data.

4.       Biometrics

Biometric information includes fingerprints, retinal scans, facial recognition, and genetics. Advances in technology have vastly increased the available biometric data. Law enforcement, the legal system, and intelligence agencies have been using this information for a long time. However, biometric data is increasingly available in the commercial arena where it can be co-mingled with other types of data such as social media. For example, page 45 of the attached FTC report describes a scenario where retailers can combine facial recognition with social media to personalize messages to customers.

5.      Human generated data

Human beings generate vast quantities of data such as call center agents’ notes, voice recordings, email, paper documents, surveys, and electronic medical records. This data may contain sensitive information that needs to be masked. It may contain insights that can improve the quality of structured data sets and should be integrated with master data management (MDM). Finally, organizations need to establish policies regarding the retention period for this data to adhere to regulations and to manage storage costs.
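
Masking sensitive information in free text can be sketched with regular expressions. The two patterns below are deliberately simplistic assumptions for illustration; production masking needs far more careful rules and usually dedicated tooling.

```python
import re

# Simplistic patterns, assumed for this sketch only.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def mask_card(match):
    """Replace a card number with asterisks, keeping the last four digits."""
    digits = re.sub(r"\D", "", match.group())
    return "*" * 12 + digits[-4:]


def mask_sensitive(text):
    """Mask email addresses and card-like numbers in free text."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    return CARD_PATTERN.sub(mask_card, text)


note = "Customer jane.doe@example.com paid with card 4111 1111 1111 1111."
masked = mask_sensitive(note)
print(masked)
```

The call center note keeps its meaning for analysis, but the email address and most of the card number are no longer readable.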


2 comments:

  1. Had a clear idea about Bigdata. Keep posting more about it.

    1. Thanks for your comments. I will post detailed content about big data in upcoming posts. I plan to take the big data class from the very basics to the end, so that even non-IT readers can understand.

      Thanks for your support.
