The story behind Big Data

Thanoshan MV
4 min read · Mar 23, 2019

What is Big Data?

First of all, I would like to clarify that Big Data and Data Mining are two different things, although both involve using large data sets to collect or report data that serves businesses or other recipients.

So, what actually is Big Data?

Gartner’s definition of the term Big Data:

Data that contains greater variety, arriving in increasing volumes and with ever-higher velocity, is known as Big Data.

To put it simply, Big Data is a volume of data so large that traditional data processing software can’t handle it!

Big Data can be defined using the three Vs: Volume, Velocity, and Variety.

Volume

Big Data doesn’t sample; it observes and tracks what happens. It has to process high volumes of low-density data, for instance Twitter data feeds or clickstreams from a web page or a mobile application. For some organizations, this volume might be tens of terabytes of data.

Velocity

Velocity is the rate at which data is received and acted on. Big Data often arrives in real time and requires real-time evaluation and action.
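To make the idea of velocity a little more concrete, here is a minimal sketch of measuring an event rate over a sliding time window. The class name, the 10-second window and the sample timestamps are all assumptions made for illustration; a real streaming system would be far more involved.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen during the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def record(self, timestamp):
        """Register one event, then drop events that left the window."""
        self.timestamps.append(timestamp)
        self._evict(timestamp)

    def _evict(self, now):
        # Events are assumed to arrive in time order, so the oldest
        # timestamp is always at the left end of the deque.
        cutoff = now - self.window_seconds
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()

    def rate_per_second(self):
        return len(self.timestamps) / self.window_seconds


counter = SlidingWindowCounter(window_seconds=10)
for t in [0, 1, 2, 3, 11, 12]:  # hypothetical event times, in seconds
    counter.record(t)

print(len(counter.timestamps))    # events still inside the window
print(counter.rate_per_second())  # average events per second
```

The point of the deque is that old events are discarded as new ones arrive, so the counter never has to hold the whole stream in memory.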

Variety

Data comes in all types of formats. Traditional data types were structured and fit neatly in a relational database. With the rise of Big Data, data also comes in unstructured and semi-structured types such as text, images, audio, and video streams.
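The difference between these data types is easier to see in code. The sketch below is only an illustration: the sample inputs, the field names (user_id, country, tags) and the helper functions are all hypothetical. It reads a structured CSV source and a semi-structured JSON source into one common shape, and shows that unstructured text has no fields to read at all.

```python
import csv
import io
import json

# Hypothetical sample inputs, one per kind of data.
csv_source = "user_id,country\n1,LK\n2,US\n"
json_source = '[{"user_id": 3, "country": "IN", "tags": ["mobile"]}, {"user_id": 4}]'
text_source = "Loved the new update, works great on my phone!"


def from_csv(text):
    """Structured: every row has the same, known columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"user_id": int(row["user_id"]), "country": row["country"], "tags": []}
        for row in reader
    ]


def from_json(text):
    """Semi-structured: fields may be missing, so fall back to defaults."""
    return [
        {
            "user_id": item["user_id"],
            "country": item.get("country"),
            "tags": item.get("tags", []),
        }
        for item in json.loads(text)
    ]


def describe_text(text):
    """Unstructured: there are no fields, so we can only derive features."""
    return {"word_count": len(text.split())}


records = from_csv(csv_source) + from_json(json_source)
print(records)
print(describe_text(text_source))
```

Notice that the JSON reader needs defaults for missing fields, while the CSV reader does not. That extra care is typical of semi-structured data.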

The History of Big Data

The origins of large data sets go back to the 1960s and ’70s, when the world of data was just getting started.

The coining of the term Big Data is often credited to Roger Mougalas, around 2005. At about the same time, people began to realize how much data users generated through Facebook, YouTube, and other online services.

Hadoop (an open-source platform originally designed to store and process big data sets), NoSQL and, more recently, Spark have helped big data grow, because they make it easier to work with and cheaper to store.

With the advent of the Internet of Things (IoT), it is not only humans who generate huge amounts of data but also Internet-connected objects and devices, which gather data on customer usage patterns. The emergence of machine learning has produced still more data.

How does Big Data work?

Working with Big Data mainly involves three steps:

1. Integrate

Big Data is made up of data from many different sources. Traditional data integration methods such as ETL (extract, transform, load) are generally not up to the task of handling data sets at terabyte, or even petabyte, scale, so organizations are turning to new strategies and tools. During integration, we need to bring in the data, process it, and make sure it is formatted and available in a usable form.
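The integration step described above (bring in the data, process it, make it available in a usable form) can be sketched on a very small scale. Everything in this example is an assumption made for illustration: the two sources, their field names, the shared schema and the use of an in-memory SQLite table as the destination. It shows the shape of the steps only, not how integration is done at terabyte scale.

```python
import sqlite3

# Hypothetical raw records from two sources that name their fields differently.
web_clicks = [
    {"uid": "1", "page": "/home", "ts": "2019-03-23T10:00:00"},
    {"uid": "2", "page": "/pricing", "ts": "2019-03-23T10:01:30"},
]
app_events = [
    {"user": 2, "screen": "Home", "time": "2019-03-23T10:05:00"},
]


def transform_web(record):
    """Map a web click onto the shared (user_id, view, ts, source) schema."""
    view = record["page"].strip("/").lower()
    return (int(record["uid"]), view, record["ts"], "web")


def transform_app(record):
    """Map a mobile app event onto the same shared schema."""
    return (int(record["user"]), record["screen"].lower(), record["time"], "app")


def load(rows):
    """Store the unified rows where they can be queried."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (user_id INTEGER, view TEXT, ts TEXT, source TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


rows = [transform_web(r) for r in web_clicks] + [transform_app(r) for r in app_events]
conn = load(rows)
views = conn.execute(
    "SELECT view, COUNT(*) FROM events GROUP BY view ORDER BY view"
).fetchall()
print(views)
```

Once both sources share one schema, a single query can answer questions across them, which is the whole purpose of integrating the data first.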

2. Manage

Obviously, big data requires storage. You can choose your storage solution according to where your data currently resides. The storage solution can be in the cloud, on premises, or both.

3. Analyze

Investment in big data pays off when we analyze and act on our data. By exploring the data further we can make new discoveries, and by analyzing it we can build data models with Machine Learning and Artificial Intelligence.
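As a very small example of building a data model, the sketch below fits a straight line to a handful of data points using ordinary least squares and then uses it to make a prediction. The numbers and the variable names (visits, sales) are made up for illustration, and real projects would normally rely on a machine learning library rather than hand-written formulas.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: site visits (thousands) and sales (thousands).
visits = [1, 2, 3, 4, 5]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(visits, sales)
prediction = slope * 6 + intercept  # estimated sales for 6 thousand visits

print(round(slope, 2), round(intercept, 2), round(prediction, 2))
```

The model here is just two numbers, a slope and an intercept, but the workflow is the same one used at larger scale: learn from past data, then act on the prediction.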

Big Data storage

Big Data storage is an infrastructure designed specifically to store, manage and retrieve massive amounts of data.

Big Data storage should enable storing and sorting data so that applications and services can access and process it in real time or near real time.

Big Data storage infrastructure tends to demand high processing power, very large capacity and strong security.

A typical Big Data storage architecture is made up of scalable direct attached storage (DAS) pools, scale-out network attached storage (NAS), or an infrastructure based on an object storage format.

Moreover, we have some supporting tools that are designed to store and analyse Big Data, such as Apache Hadoop, Microsoft HDInsight, NoSQL, Hive, Sqoop and PolyBase.

Big Data Pros and Cons

I’ll list the advantages and disadvantages of Big Data below.

Advantages of Big Data

  1. Better decision-making
  2. Greater innovation
  3. Increased revenue
  4. Increased productivity
  5. Strong support for Machine Learning & AI

Disadvantages of Big Data

  1. Cyber Security risks
  2. Rapid change of technology
  3. Costs
  4. Difficulty in integrating legacy systems

Conclusion

With the emergence of Big Data, it is worth putting it to use in our organizations. Using it appropriately will bring better results. I suggest you read more articles on Big Data.

Thank you for reading!
