Making Sense of BIG DATA (Part 1)

One of the buzzwords heard most often in the technology industry these days is BIG DATA, along with the importance of having BIG DATA Solutions, particularly in large enterprises and Fortune 500 companies. To the layman, however, the term makes little sense, so in this series we will try to shed some light on the topic and the solutions involved.

What is BIG DATA?

BIG DATA is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools. As such, there are huge challenges related to its acquisition, storage, maintenance and analysis. According to IBM (http://www-01.ibm.com/software/data/bigdata), companies face four main challenges when it comes to BIG DATA:

  • Volume – Is the data growing faster than they can handle?
  • Velocity – Is the data arriving too fast for them to act on?
  • Variety – Do they understand all the data they have?
  • Veracity – Do they trust the information they have?

However, most people don’t realize just how much of a part BIG DATA plays in their everyday lives, and the impact its correct management and processing has on their daily activities. Let’s look at some examples.

Financial Transactions – Data resulting from banks, insurance companies and stock markets

Air Travel Data – Data resulting from airplanes, flight operations, airlines and airports. A single flight from London’s Heathrow Airport to John F. Kennedy Airport in New York generates about 650TB of data (IBM)

Population Data – Recording data from people, families, births and deaths

Telecommunication Data – Data from subscribers, calls, messaging, internet and other services usage

Internet Data – Facebook posts, tweets, video uploads, news, blogs, emails, cloud storage, etc.

Industrial Data – Data generated from industrial manufacturing (e.g. food industry, pharmaceuticals, garments, automobiles, etc.)

Sensor Data – Data received from sensors is usually continuous and massive in size. For example, a single oil well can have more than 20,000 individual sensors generating multiple terabytes per day

Astronomical Data – from telescopes, observatories, monitoring stations, space agencies, etc.

Weather Data – Data related to temperature, humidity, wind speed, wind direction, precipitation levels, etc.

Travel/Tourism Data – Aside from air travel, tourism itself generates tons of data from hotels, car rentals and other branches of the travel industry

Shopping Data – Data from department stores, retailers, e-commerce websites, social media platforms, etc. Wal-Mart, for example, logs one million transactions per hour at its retail locations

Healthcare Data – Medical records, medical imagery & medical equipment sensors also build up a vast quantity of data

Utilities Data – Billing data, sensor data from grids & electronic meters that record consumption

Planning BIG DATA Strategy

Considering how big a role BIG DATA plays in our everyday lives, it is vital for companies that handle it to have processes and technologies in place to store, manage and leverage this data. That is where Big Data Solutions or Business Intelligence Services come in: they help companies harvest, organize, store and analyze incoming data and extract actionable information that the organization can use to its advantage, such as:

  • Pinpointing new revenue-generating opportunities
  • Improving operational efficiencies & visibility across the organization
  • Optimizing the return on existing business and IT investments, such as data management, data mining, customer intelligence, customer relationship management & ERP technology
  • Achieving greater compliance with government and regulatory guidelines
  • Enabling faster decision making & problem-solving at strategic, operational and tactical levels

Business Intelligence (BI) services typically focus on four key areas: data mining, data warehousing, information analytics and reporting. Within these areas, Business Intelligence or Big Data Solutions offer a variety of sub-services such as Custom, Large & ERP Data Warehousing, ETL Services (Extraction, Transformation, Loading), Performance Management Solutions, Query & Analytics Services, Periodic & Ad Hoc Reporting, etc.

Creating such solutions, however, is not easy. It requires a well-planned strategy for developing the solution’s architecture, which in turn requires looking at the big picture of how the data needs to be managed.

To understand how a Big Data Solution’s architecture is conceived, we’ll compare it to a real-life example, in this case a library, as a library has many operations that resemble a BIG DATA Solution’s architecture.

Let’s say we were to build a huge library. What would be the requirements, what would be the areas we would need to consider?

To set up a library, we would need to:

– Acquire a suitable space, making sure there’s ample room to start the library and to expand, as the collection of books increases.

– Create a structure and a boundary to outline the limits of the library.

– Define the process of acquiring books.

– Categorize and store the books in a manner that facilitates quick retrieval.

– Define a process for library membership and a workflow for library members to access the books they desire.

– Have security protocols in place to prevent theft and to ensure the return of books at the correct time.

– Have protocols in place for disaster recovery.

– Define a process for checking and maintaining the relevancy and health of titles in the library.

The above list of activities bears a surprising resemblance to the activities involved in building a BIG DATA store, which can be summarized as follows:

Acquisition

The collection of data from a variety of different sources. The process of fetching data from these sources will also involve a data validation procedure, in order to ensure that the data being gathered matches the criteria required. A well-designed BIG DATA Solution will also keep a good deal of metadata about the acquisition process itself.
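To make the validation step more concrete, here is a minimal sketch in Python. The record fields (`id`, `amount`), the validation criteria and the source name `pos-feed` are assumptions made up for illustration; a real solution would define its own criteria for each source.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AcquisitionResult:
    """Records from one acquisition run, plus metadata about the run itself."""
    source: str
    acquired_at: str
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)


def is_valid(record: dict) -> bool:
    # Hypothetical criteria: a non-empty string id and a non-negative numeric amount.
    record_id = record.get("id")
    amount = record.get("amount")
    has_id = isinstance(record_id, str) and record_id != ""
    has_amount = (
        isinstance(amount, (int, float))
        and not isinstance(amount, bool)
        and amount >= 0
    )
    return has_id and has_amount


def acquire(source: str, records: list) -> AcquisitionResult:
    """Split incoming records into accepted and rejected, recording where and when."""
    result = AcquisitionResult(
        source=source,
        acquired_at=datetime.now(timezone.utc).isoformat(),
    )
    for record in records:
        target = result.accepted if is_valid(record) else result.rejected
        target.append(record)
    return result


batch = [
    {"id": "t-1", "amount": 25.0},
    {"id": "", "amount": 10.0},   # rejected: empty id
    {"id": "t-3", "amount": -5},  # rejected: negative amount
]
result = acquire("pos-feed", batch)
print(len(result.accepted), "accepted,", len(result.rejected), "rejected")
```

Keeping the rejected records, together with the source and timestamp, is one simple way of holding metadata about the acquisition process rather than silently discarding bad input.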

Marshaling

The storing, tagging and indexing of the acquired data in the data store, and the implementation of techniques to optimize the data storage and retrieval process. This process would also involve the setup of data clustering or data replication processes.
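Much like a library catalogue, tagging and indexing can be sketched as a small in-memory store that keeps a lookup from each tag to the records carrying it. The class name `TaggedStore` and the sample records below are hypothetical; real data stores use far more sophisticated index structures, and clustering and replication are not shown.

```python
from collections import defaultdict


class TaggedStore:
    """Toy data store: records are indexed by tag for quick retrieval."""

    def __init__(self):
        self._records = {}               # record id -> record
        self._by_tag = defaultdict(set)  # tag -> ids of records with that tag

    def put(self, record_id, record, tags):
        """Store a record and register it under each of its tags."""
        self._records[record_id] = record
        for tag in tags:
            self._by_tag[tag].add(record_id)

    def find(self, *tags):
        """Return the records that carry all of the given tags, ordered by id."""
        if not tags:
            return []
        id_sets = [self._by_tag.get(tag, set()) for tag in tags]
        matching = set.intersection(*id_sets)
        return [self._records[i] for i in sorted(matching)]


store = TaggedStore()
store.put(1, {"city": "London", "temp_c": 14}, ["weather", "uk"])
store.put(2, {"city": "New York", "temp_c": 21}, ["weather", "us"])
store.put(3, {"flight": "LHR-JFK"}, ["air-travel", "uk"])

print(store.find("weather", "uk"))
```

The index means a lookup only touches the records registered under the requested tags instead of scanning the whole store, which is the same reason a library shelves its books by category.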

Analysis

The examination of the stored data to identify trends and outliers, in order to uncover facts about the business (risks, operations, customers, etc.).
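As a small illustration of spotting outliers, the sketch below flags values that lie unusually far from the mean using a z-score. The hourly transaction counts and the threshold of 2.5 are assumptions chosen for the example; real analysis would pick a method and threshold suited to the data.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.5):
    """Return (index, value) pairs whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Hypothetical hourly transaction counts with one unusual spike.
hourly_counts = [100, 102, 98, 101, 99, 250, 100, 103, 97, 100]
outliers = find_outliers(hourly_counts)
print(outliers)
```

Here only the spike of 250 is flagged. Whether such a spike represents a risk, an error in the data or a sales opportunity is the kind of business fact that analysis is meant to uncover.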

Governance

The framework for keeping data secure, defining its access privileges, making sure the information is available and accessible and carrying out actions on the basis of facts that are uncovered.
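One piece of governance, access privileges, can be sketched as a simple role-based check with an audit trail. The roles, permissions and dataset name below are hypothetical and only illustrate the idea; they are not a complete governance framework.

```python
# Hypothetical mapping of roles to the actions they may perform.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "engineer": {"read", "write"},
    "admin": {"read", "write", "grant"},
}


def is_allowed(role, action):
    """Unknown roles get no permissions at all."""
    return action in ROLE_PERMISSIONS.get(role, set())


def read_dataset(role, dataset, audit_log):
    """Check the privilege, record the attempt, then return the data or refuse."""
    allowed = is_allowed(role, "read")
    audit_log.append((role, "read", dataset, allowed))
    if not allowed:
        raise PermissionError(f"role {role!r} may not read {dataset!r}")
    return f"contents of {dataset}"


audit_log = []
print(read_dataset("analyst", "sales-2013", audit_log))

try:
    read_dataset("guest", "sales-2013", audit_log)
except PermissionError as error:
    print("denied:", error)

print(audit_log)
```

Logging every attempt, including the refused ones, supports both the security and the compliance side of governance.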

Key Players in BIG DATA

BIG DATA Solutions are developed and managed by highly skilled staff, typically in the following two roles:

BIG DATA Solution Architect

These individuals are software architects who have vast experience in the design and development of data projects and solutions. They are responsible for evaluating and developing the Big Data Solution’s architecture, ensuring that enterprise data technology services are designed with an optimal balance of industry best practices, vendor recommendations, domain requirements, future orientation and tactical pragmatism. Based on these considerations, they recommend the appropriate Enterprise Development Services Solution Architecture that will support the organization’s corporate and business goals.

Data Scientist

In addition to solution architects, BIG DATA Solutions also require engineers to run and manage their day-to-day operations. These individuals, known as “Data Scientists”, are multi-faceted computer engineers who are also experts in a variety of other subjects such as statistics, analytics, mathematics, data engineering, pattern recognition, advanced computing, visualization, uncertainty modeling & data warehousing.

They help companies mine and analyze their data for actionable information, by employing their deep expertise in one or more of the above scientific disciplines.

However, data scientists and solution architects form only part of the equation. The other key component in a big data solution is the technology used: the platforms, tools and scripts that drive the entire process. The next part of this series will look at these technologies in more detail.

(to be continued … )
