1. Structured Data
Structured data first depends on creating
a data model – a model of the types of business data that
will be recorded and how they will be stored, processed and accessed. This
includes defining what fields of data will be stored and how that data will be
stored: data type (numeric, currency, alphabetic, name, date, address) and any
restrictions on the data input (number of characters; restricted to certain
terms such as Mr., Ms. or Dr.; M or F).
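A data model of this kind can be sketched in a few lines of code. The example below is only an illustration: the record type `CustomerRecord`, its fields, and the 40-character name limit are assumptions made for the sketch, while the title and M/F restrictions come from the examples given above.

```python
from dataclasses import dataclass
from datetime import date

# Restrictions on data input, taken from the examples in the text.
ALLOWED_TITLES = {"Mr.", "Ms.", "Dr."}
ALLOWED_GENDERS = {"M", "F"}
MAX_NAME_LENGTH = 40  # assumed character limit, for illustration only


@dataclass
class CustomerRecord:
    """Hypothetical record: each field has a data type and input restrictions."""
    title: str
    name: str
    gender: str
    date_of_birth: date

    def __post_init__(self):
        # Reject values that do not fit the model before they are stored.
        if self.title not in ALLOWED_TITLES:
            raise ValueError("title must be one of Mr., Ms. or Dr.")
        if not self.name or len(self.name) > MAX_NAME_LENGTH:
            raise ValueError("name must be 1 to 40 characters long")
        if self.gender not in ALLOWED_GENDERS:
            raise ValueError("gender must be M or F")
        if not isinstance(self.date_of_birth, date):
            raise ValueError("date_of_birth must be a date")


def is_valid(**fields) -> bool:
    """Return True if the fields form a record that satisfies the model."""
    try:
        CustomerRecord(**fields)
        return True
    except (ValueError, TypeError):
        return False
```

A record with the title "Prof." would be rejected by this model, which is the point: anything that does not fit the structure is caught at entry time.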
Structured data has the advantage of being easily entered, stored,
queried and analyzed. At one time, because of the high cost and performance
limitations of storage, memory and processing, relational databases and
spreadsheets using structured data were the only way to effectively manage
data. Anything that couldn't fit into a tightly organized structure would have
to be stored on paper in a filing cabinet.
2. Unstructured and Semi-Structured Data
Unstructured data is all those
things that can't be so readily classified and fit into a neat box: photos and
graphic images, videos, streaming instrument data, webpages, PDF files,
PowerPoint presentations, emails, blog entries, wikis and word processing
documents.
Semi-structured data is a cross between the two. It is a
type of structured data, but lacks the strict data model structure. With
semi-structured data, tags or other types of markers are used to identify
certain elements within the data, but the data doesn't have a rigid structure.
For example, word processing software can now include metadata showing the
author's name and the date created, with the bulk of the document just being
unstructured text. Emails have the sender, recipient, date, time and other
fixed fields added to the unstructured data of the email message content and
any attachments. Photos or other graphics can be tagged with metadata such as
the creator, date, location and keywords, making it possible to organize and
locate graphics. XML and other markup languages are often used to manage
semi-structured data.
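As a rough illustration of how tags separate the identifiable elements from the unstructured remainder, the sketch below parses a small XML document with Python's standard library. The document layout, the tag names (`author`, `created`, `body`) and the helper `split_metadata_and_body` are all made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: tagged metadata plus a free-text body.
DOCUMENT = """<document>
  <author>J. Smith</author>
  <created>2015-03-01</created>
  <body>Quarterly sales rose in most regions, although the northern
  office reported delays that the team is still investigating.</body>
</document>"""


def split_metadata_and_body(xml_text):
    """Return (metadata dict, body text) from a tagged document."""
    root = ET.fromstring(xml_text)
    metadata = {}
    body = ""
    for child in root:
        text = (child.text or "").strip()
        if child.tag == "body":
            body = text            # unstructured part: kept as plain text
        else:
            metadata[child.tag] = text  # tagged part: can be queried by name
    return metadata, body


metadata, body = split_metadata_and_body(DOCUMENT)
```

The metadata can be searched or sorted like structured data, while the body remains free text with no imposed model.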
3. Multi-Structured Data
Multi-structured data refers to a variety of data formats
and types and can be derived from interactions between people and machines,
such as web applications or social networks. A great example is web log data,
which includes a combination of text and visual images along with structured
data like form or transactional information. As digital disruption transforms
communication and interaction channels, and as marketers enhance the customer
experience across devices, web properties, face-to-face interactions and social
platforms, multi-structured data will continue to evolve.
Example Applications: Types of Data That Big Data Handles
1. Web and social media
This includes clickstream and social media data from sources such as Facebook, Twitter,
LinkedIn, and blogs. Big data governance programs will increasingly be required
to integrate this data with master data and with core business processes such
as customer loyalty programs. The big data governance program needs to
establish policies regarding the acceptable use of social media data, especially
as regulations and precedents are continually evolving. The program also needs
to establish guidelines regarding the acceptable use of cookies, especially
third-party cookies, to track users and to personalize their web interactions.
Metadata is also critical to web and social media. For example, two sites may
measure the term “unique visitors” differently for clickstream analytics. One
site may measure unique visitors within a month while another one may measure
unique visitors within a week.
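The effect of the two definitions can be shown with a small worked example. The clickstream records and the helper names below are invented for illustration; the point is only that the same data yields different "unique visitor" counts depending on the window the metadata specifies.

```python
from datetime import date

# Hypothetical clickstream: (visitor id, day of visit).
CLICKS = [
    ("alice", date(2024, 3, 4)),
    ("alice", date(2024, 3, 12)),
    ("bob", date(2024, 3, 5)),
    ("bob", date(2024, 3, 6)),
    ("carol", date(2024, 3, 20)),
]


def by_month(day):
    """Window key for a site that counts unique visitors per month."""
    return (day.year, day.month)


def by_week(day):
    """Window key for a site that counts unique visitors per ISO week."""
    iso = day.isocalendar()
    return (iso[0], iso[1])


def unique_visitors(clicks, window_key):
    """Count each visitor once per window, then sum over all windows."""
    seen = set()
    for visitor, day in clicks:
        seen.add((window_key(day), visitor))
    return len(seen)


monthly_total = unique_visitors(CLICKS, by_month)  # 3
weekly_total = unique_visitors(CLICKS, by_week)    # 4
```

Here alice visits in two different weeks of the same month, so the weekly definition counts her twice and the monthly definition counts her once. Without metadata recording which definition a site uses, the two figures cannot be compared.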
2. Machine-to-machine data
Machine-to-machine (M2M) refers to technologies that allow both wireless and
wired systems to communicate with other devices. M2M uses a device such as a
sensor or meter to capture an event (such as speed, temperature, pressure,
flow, or salinity) which is relayed through a wireless, wired, or hybrid
network to an application that translates the captured event into meaningful
information. M2M communications create the so-called “internet of things.” The
big data governance program needs to establish a number of policies around M2M
data. For example, the program needs to draw up guidelines around the
acceptable use of geolocation and RFID data that can be used to build a profile
of individuals and potentially violate their privacy. The program also needs to
establish retention policies around the massive volumes of M2M data that can
easily overwhelm IT budgets if not properly controlled. The big data governance
program also needs to address any data quality concerns such as RFID read rates
in environments with high moisture content and lots of congestion.
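The M2M flow described above, and the retention policy it calls for, can be sketched as follows. This is a minimal illustration under stated assumptions: the `SensorEvent` type, the alert thresholds and the 30-day retention period are hypothetical values, not figures from any real governance program.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SensorEvent:
    """Hypothetical event captured by a sensor or meter."""
    device_id: str
    metric: str          # e.g. "temperature" or "pressure"
    value: float
    captured_at: datetime


# Assumed values, for illustration only.
THRESHOLDS = {"temperature": 80.0, "pressure": 300.0}
RETENTION = timedelta(days=30)


def interpret(event):
    """Translate a captured event into meaningful information."""
    limit = THRESHOLDS.get(event.metric)
    if limit is None:
        return f"{event.device_id}: no threshold defined for {event.metric}"
    status = "ALERT" if event.value > limit else "ok"
    return f"{event.device_id}: {event.metric}={event.value} ({status})"


def apply_retention(events, now):
    """Keep only events still inside the retention period."""
    cutoff = now - RETENTION
    return [e for e in events if e.captured_at >= cutoff]
```

For example, a temperature reading of 95.5 from device "pump-1" is reported as "pump-1: temperature=95.5 (ALERT)", and a reading captured 45 days ago is dropped by `apply_retention` under the assumed 30-day policy.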
3. Big transaction data
This includes healthcare claims, telecommunications call detail records, and
utility billing records. Big transaction data is increasingly available in
semi-structured and unstructured formats. Information governance challenges
such as metadata, data quality, privacy, and information lifecycle management
also apply to this data.
4. Biometrics
Biometric information includes fingerprints, retinal scans, facial recognition, and
genetics. Advances in technology have vastly increased the available biometric
data. Law enforcement, the legal system, and intelligence agencies have been
using this information for a long time. However, biometric data is increasingly
available in the commercial arena where it can be co-mingled with other types
of data such as social media. For example, page 45 of the attached FTC report
describes a scenario where retailers can combine facial recognition with social
media to personalize messages to customers.
5. Human-generated data
Human beings generate vast quantities of data such as call center agents’ notes,
voice recordings, email, paper documents, surveys, and electronic medical
records. This data may contain sensitive information that needs to be masked.
It may contain insights that can improve the quality of structured data sets
and should be integrated with master data management (MDM). Finally, organizations need to establish
policies regarding the retention period for this data to adhere to regulations
and to manage storage costs.
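Masking sensitive information in free text can be sketched with simple pattern matching. The example below is a deliberately simplified illustration: the two patterns (email addresses and one ten-digit phone number format) and the function name `mask` are assumptions, and a real masking policy would need to cover many more kinds of sensitive data.

```python
import re

# Simplified patterns, for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def mask(note):
    """Replace email addresses and phone numbers with placeholders."""
    note = EMAIL.sub("[EMAIL]", note)
    return PHONE.sub("[PHONE]", note)


masked = mask(
    "Customer jane.doe@example.com called from 555-123-4567 about a refund."
)
```

After masking, the call center note reads "Customer [EMAIL] called from [PHONE] about a refund.", so the text can still be analyzed for insights without exposing contact details.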