[Figure: What Happens in an Internet Minute?]
Terabytes, Petabytes, Exabytes. Who can keep track? These strange
terms have just begun to enter the business lexicon, but the hype
surrounding them has reached a fever pitch. We have undoubtedly entered
the age of big data.
1. What is Big Data?
1.1. Big data is a collection of data from traditional and digital sources
inside and outside your company that represents a source for ongoing
discovery and analysis.
[Figure: Big Data Explained]
2. Where does it come from?
The data loosely classified as big data is mainly user-generated content
from online platforms such as web media, social media, blogs and video
blogs, and from offline daily interactions such as smart cards, credit card
purchases, points cards (membership or discount cards), interactive point of
sale machines (iPOS) and bar code scanners. This data is left behind by
various end-users such as businesses, educational institutions, healthcare
and scientific institutions, and, of course, individuals like us.
3. Why use Big Data?
As Google puts it: “Know the user. Know the magic. Connect the two.”
Making that connection is what lets you do things that genuinely impact
people’s lives. When you combine online data with offline data, it paints
a complete picture of the trends in your target industry. What do these
trends and analyses tell us? We can use them to spot shifting and emerging
industry trends, customer behaviour patterns, purchase habits, and declining
sales and share value, and then adapt our business strategy accordingly.
This helps us optimize customer interactions and put the right product in
front of the right customer at the right time.
Marketers can use big data analysis for informed decision making such
as customer profiling, behaviour analysis, RFM analysis (recency,
frequency and monetary value of customers), and predicting trends,
sales, and customer behaviour.
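RFM analysis reduces each customer to three numbers: days since the last purchase (recency), number of purchases (frequency), and total spend (monetary value). The sketch below computes them from a list of purchase records and then turns each dimension into a simple rank score, where a higher score is better. The purchase records are made up, and rank-based scoring is an assumption chosen for brevity; many teams bucket customers into quintiles instead.

```python
from datetime import date

# Hypothetical purchase records: (customer_id, purchase_date, amount)
PURCHASES = [
    ("alice", date(2024, 3, 1), 120.0),
    ("alice", date(2024, 3, 20), 80.0),
    ("bob", date(2024, 1, 5), 300.0),
    ("carol", date(2024, 3, 28), 15.0),
    ("carol", date(2024, 3, 29), 25.0),
    ("carol", date(2024, 3, 30), 10.0),
]


def rfm_table(purchases, as_of):
    """Return {customer: (recency_days, frequency, monetary)} as of a given date."""
    stats = {}
    for customer, day, amount in purchases:
        last, count, total = stats.get(customer, (day, 0, 0.0))
        stats[customer] = (max(last, day), count + 1, total + amount)
    return {
        customer: ((as_of - last).days, count, total)
        for customer, (last, count, total) in stats.items()
    }


def rank_scores(values, higher_is_better=True):
    """Score customers 1..n by rank, where n is the best; ties keep their input order."""
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    return {customer: rank for rank, customer in enumerate(ordered, start=1)}


def rfm_scores(table):
    """Combine per-dimension rank scores into an (R, F, M) tuple per customer."""
    recency = rank_scores({c: r for c, (r, _, _) in table.items()}, higher_is_better=False)
    frequency = rank_scores({c: f for c, (_, f, _) in table.items()})
    monetary = rank_scores({c: m for c, (_, _, m) in table.items()})
    return {c: (recency[c], frequency[c], monetary[c]) for c in table}
```

With these records and an as-of date of 1 April 2024, carol scores highest on recency and frequency but lowest on monetary value, while bob is the reverse: exactly the kind of profile difference a marketer would act on.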
According to IBM, data analytics helps us “provide customized
intelligence to gain faster insight” from the information we have. We
can predict outcomes and financial trends by “turning information into
insight and developing conclusive, fact based strategies to gain that
competitive edge.”
4. The Three V's of Big Data
Big Data relates to data creation, storage, retrieval and analysis that is remarkable in terms of volume, velocity, and variety:
[Figure: Three V's of Big Data]
- Volume. A typical PC might have had 10 gigabytes of storage in 2000. Today, Facebook ingests 500 terabytes of new data every day; a Boeing 737 will generate 240 terabytes of flight data during a single flight across the US; smartphones keep proliferating, along with the data they create and consume; and sensors embedded into everyday objects will soon result in billions of new, constantly updated data feeds containing environmental, location, and other information, including video.
- Velocity. Clickstreams and ad impressions capture user behavior at millions of events per second; high-frequency stock trading algorithms reflect market changes within microseconds; machine-to-machine processes exchange data between billions of devices; infrastructure and sensors generate massive log data in real time; and online gaming systems support millions of concurrent users, each producing multiple inputs per second.
- Variety. Big Data isn't just numbers,
dates, and strings. It also includes geospatial data, 3D data, audio and
video, and unstructured text, including log files and social media.
Traditional database systems were designed to address smaller volumes of structured data, fewer updates, and a predictable, consistent data structure. They were also designed to operate on a single server, which makes increasing capacity expensive and ultimately limited. As applications have evolved to serve large volumes of users, and as application development practices have become agile, the traditional use of the relational database has become a liability for many companies rather than an enabling factor in their business.
5. Types of Data Big Data Handles
1. Structured Data
Structured data first depends on creating
a data model – a model of the types of business data that
will be recorded and how they will be stored, processed and accessed. This
includes defining what fields of data will be stored and how that data will be
stored: data type (numeric, currency, alphabetic, name, date, address) and any
restrictions on the data input (number of characters; restricted to certain
terms such as Mr., Ms. or Dr.; M or F).
Structured data has the advantage of being easily entered, stored,
queried and analyzed. At one time, because of the high cost and performance
limitations of storage, memory and processing, relational databases and
spreadsheets using structured data were the only way to effectively manage
data. Anything that couldn't fit into a tightly organized structure would have
to be stored on paper in a filing cabinet.
2. Unstructured and Semi-Structured Data
Unstructured data is all those
things that can't be so readily classified and fit into a neat box: photos and
graphic images, videos, streaming instrument data, webpages, PDF files,
PowerPoint presentations, emails, blog entries, wikis and word processing
documents.
Semi-structured data is a cross between the two. It is a
type of structured data, but lacks the strict data model structure. With
semi-structured data, tags or other types of markers are used to identify
certain elements within the data, but the data doesn't have a rigid structure.
For example, word processing software now can include metadata showing the
author's name and the date created, with the bulk of the document just being
unstructured text. Emails have the sender, recipient, date, time and other
fixed fields added to the unstructured data of the email message content and
any attachments. Photos or other graphics can be tagged with metadata such as
the creator, date, location and keywords, making it possible to organize and
locate graphics. XML and other markup languages are often used to manage
semi-structured data.
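XML makes the semi-structured idea easy to see: a few tagged fields sit alongside free text, and a document is not forced to carry every tag. This sketch uses Python's standard `xml.etree.ElementTree` module; the `<document>` layout and its tag names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical semi-structured document: tagged metadata plus free text.
DOC = """<document>
  <author>Jane Doe</author>
  <created>2024-05-01</created>
  <keywords>
    <keyword>big data</keyword>
    <keyword>governance</keyword>
  </keywords>
  <body>Meeting notes: the team discussed retention rules for sensor data.</body>
</document>"""


def extract_metadata(xml_text):
    """Pull out the tagged fields; missing tags come back as None or an empty list."""
    root = ET.fromstring(xml_text)
    return {
        "author": root.findtext("author"),
        "created": root.findtext("created"),
        "keywords": [k.text for k in root.iter("keyword")],
        "body": (root.findtext("body") or "").strip(),
    }
```

A document that carries only a `<body>` still parses; its author simply comes back as `None`. That tolerance for missing fields is what separates semi-structured data from a rigid table schema.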
3. Multi-Structured Data
Multi-structured data refers to a variety of data formats
and types and can be derived from interactions between people and machines,
such as web applications or social networks. A great example is web log data,
which includes a combination of text and visual images along with structured
data like form or transactional information. As digital disruption transforms communication and
interaction channels—and as marketers enhance the customer experience across
devices, web properties, face-to-face interactions and social
platforms—multi-structured data will continue to evolve.
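Web log data shows multi-structured data in practice: each raw line of text carries structured fields that can be pulled out for analysis. The sketch below assumes the Apache Common Log Format (host, identity, user, timestamp, request, status code, response size); other log formats would need a different pattern, and the field names in the returned dictionary are illustrative.

```python
import re
from datetime import datetime

# Pattern for one line in the Common Log Format (an assumption about the input).
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line):
    """Turn a raw log line into a dict of typed fields, or None if it doesn't match."""
    match = CLF.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # "-" means no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record
```

Lines that don't match return `None` instead of raising, which suits log streams where malformed or foreign-format lines are common.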
Example Applications of the Types of Data Big Data Handles
1. Web and social media
This includes clickstream data and social media data from sources such as Facebook, Twitter, LinkedIn, and blogs. Big data governance programs will increasingly be required to integrate this data with master data and with core business processes such as customer loyalty programs. The big data governance program needs to establish policies regarding the acceptable use of social media data, especially as regulations and precedents are continually evolving. The program also needs to establish guidelines regarding the acceptable use of cookies, especially third-party cookies, to track users and to personalize their web interactions. Metadata is also critical to web and social media. For example, two sites may measure the term “unique visitors” differently for clickstream analytics: one site may count unique visitors within a month while another counts them within a week.
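The “unique visitors” discrepancy is easy to reproduce. The sketch below counts distinct visitor IDs per calendar month and per ISO week over the same hypothetical clickstream; because a visitor who returns in a later week is counted again, adding up the weekly figures gives a larger number than the monthly one. The visitor IDs and dates are made up.

```python
from collections import defaultdict
from datetime import date

# Hypothetical clickstream: (visitor_id, visit_date)
VISITS = [
    ("v1", date(2024, 3, 4)),
    ("v1", date(2024, 3, 12)),
    ("v2", date(2024, 3, 5)),
    ("v2", date(2024, 3, 6)),
    ("v3", date(2024, 3, 20)),
]


def by_month(day):
    """Bucket key: (year, month)."""
    return (day.year, day.month)


def by_week(day):
    """Bucket key: (ISO year, ISO week number)."""
    iso_year, iso_week, _ = day.isocalendar()
    return (iso_year, iso_week)


def unique_visitors(visits, period_key):
    """Count distinct visitor IDs in each period defined by period_key."""
    buckets = defaultdict(set)
    for visitor, day in visits:
        buckets[period_key(day)].add(visitor)
    return {period: len(ids) for period, ids in buckets.items()}
```

Here the monthly count for March 2024 is 3, while the weekly counts are 2, 1 and 1, which add up to 4 because visitor v1 appears in two different weeks. Recording which definition a metric uses is exactly the metadata problem described above.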
2. Machine-to-machine data
Machine-to-machine (M2M) refers to technologies that allow both wireless and wired systems to communicate with other devices. M2M uses a device such as a sensor or meter to capture an event (such as speed, temperature, pressure, flow, or salinity) which is relayed through a wireless, wired, or hybrid network to an application that translates the captured event into meaningful information. M2M communications create the so-called “internet of things.” The big data governance program needs to establish a number of policies around M2M data. For example, the program needs to draw up guidelines around the acceptable use of geolocation and RFID data that can be used to build a profile of individuals and potentially violate their privacy. The program also needs to establish retention policies around the massive volumes of M2M data that can easily overwhelm IT budgets if not properly controlled. The big data governance program also needs to address any data quality concerns such as RFID read rates in environments with high moisture content and lots of congestion.
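The path from a captured event to “meaningful information”, plus a retention policy, can be sketched in a few lines. Everything here is an assumption for illustration: the `SensorEvent` fields, the per-metric alert thresholds, and the 30-day retention window are hypothetical and not taken from any M2M standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SensorEvent:
    """Hypothetical raw event relayed from a device."""
    device_id: str
    metric: str        # e.g. "temperature", "pressure", "salinity"
    value: float
    timestamp: datetime


# Hypothetical alert thresholds per metric.
THRESHOLDS = {"temperature": 80.0, "pressure": 150.0}


def to_alerts(events):
    """Translate raw events into readable alerts when a threshold is exceeded."""
    alerts = []
    for event in events:
        limit = THRESHOLDS.get(event.metric)
        if limit is not None and event.value > limit:
            alerts.append(f"{event.device_id}: {event.metric} {event.value} exceeds {limit}")
    return alerts


def apply_retention(events, now, keep=timedelta(days=30)):
    """Keep only raw events inside the retention window (30 days is an assumed policy)."""
    return [event for event in events if now - event.timestamp <= keep]
```

In a real deployment the retention step is where storage costs are controlled, since raw M2M events usually outnumber the alerts derived from them by orders of magnitude.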
3. Big transaction data
This includes healthcare claims, telecommunications call detail records, and utility billing records. Big transaction data is increasingly available in semi-structured and unstructured formats. Information governance challenges such as metadata, data quality, privacy, and information lifecycle management also apply to this data.
4. Biometrics
Biometric information includes fingerprints, retinal scans, facial recognition, and genetics. Advances in technology have vastly increased the available biometric data. Law enforcement, the legal system, and intelligence agencies have been using this information for a long time. However, biometric data is increasingly available in the commercial arena where it can be co-mingled with other types of data such as social media. For example, page 45 of the attached FTC report describes a scenario where retailers can combine facial recognition with social media to personalize messages to customers.
5. Human-generated data
Human beings generate vast quantities of data such as call center agents’ notes, voice recordings, email, paper documents, surveys, and electronic medical records. This data may contain sensitive information that needs to be masked. It may also contain insights that can improve the quality of structured data sets, and it should be integrated with master data management (MDM). Finally, organizations need to establish policies regarding the retention period for this data to adhere to regulations and to manage storage costs.
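Masking sensitive values in free text is often done with pattern matching as a first pass. The sketch below replaces email addresses and North American style phone numbers with placeholder labels. The two regular expressions are simplified assumptions that will miss many real-world formats, so treat this as an illustration rather than a compliant masking tool.

```python
import re

# Simplified, assumed patterns; real masking needs patterns tuned to the data.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


def mask(text):
    """Replace each match of each pattern with a bracketed label such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

For example, an agent's note reading "Call 555-123-4567 or email jane.doe@example.com" becomes "Call [PHONE] or email [EMAIL]", while the rest of the note stays available for analysis.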