Big Data

Abhinav Shukla
Sep 17, 2020

Big Data is a term for data that is huge in volume and keeps growing with time. It consists of structured, unstructured and semi-structured data. This data can be tracked and mined for analysis or research purposes.

What is Big Data?

In simple terms, Big Data is a large amount of structured, unstructured and semi-structured data that can be used for analysis. It is commonly described by three characteristics:

  • Volume: The name Big Data itself suggests that it contains a large amount of data. The size of the data is very important in determining whether it counts as “Big Data” or not, so volume is an important characteristic.
  • Velocity: Velocity is the speed at which data is generated and processed. The faster the data can be captured and processed, the more of its potential can be realised. Because the flow of data is massive and continuous, velocity is one of the defining characteristics of Big Data.
  • Variety: Data comes in many forms: structured, unstructured, numeric and so on. Earlier, only spreadsheets and databases were considered data. Now PDFs, emails, audio and more are also considered for analysis.

Let us learn more about Big Data

Big Data has become really important for businesses that need to maintain their files and huge amounts of data. Companies have moved to Big Data technologies in order to maintain data for analysis or business development purposes.

Importance of Big Data:

Big Data is important not because of its volume but because of what you do with it: how you analyse it to benefit your business or organisation.

Big Data helps analyse:

  • Time
  • Cost
  • Product Development
  • Decision Making, etc

When teamed up with analytics, Big Data helps you determine the root causes of business failures and analyse sales trends based on customer buying history. It also helps detect fraudulent behaviour and reduce risks that might affect the organisation.

Big Data technology has given us multiple advantages, a few of which we will now discuss.

  • Big Data has enabled predictive analysis, which can save organisations from operational risks.
  • Predictive analysis has helped organisations grow their business by analysing customer needs.
  • Big Data has enabled many multimedia platforms, for example YouTube and Instagram, to share data.
  • Medical and healthcare sectors can keep patients under constant observation.
  • Big Data has changed the face of customer-based companies and the worldwide market.

Big Data Categories

  • Structured
  • Unstructured
  • Semi-structured

Structured Data: Data stored in a fixed format is called structured data. Because the format is defined in advance, for example as rows and columns in a database table, the data is easily accessible and can be used for analysis.

Unstructured Data: Any data without a predefined structure is known as unstructured data. It is usually very large in size and typically combines text, images, files and so on. It does not fit conventional database models.

Semi-structured Data: It has elements of both structured and unstructured data. The data is not organised in a fixed schema such as a relational table, but it carries associated information, such as tags or keys, that makes it accessible. JSON and XML documents are common examples.
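To make the idea of semi-structured data concrete, here is a minimal Python sketch. The records, field names and the `summarize` helper are hypothetical and invented purely for illustration. Each record is valid JSON, but the fields differ from one record to the next, so the code reads optional fields defensively.

```python
import json

# Hypothetical records: each is a JSON object, but the set of fields varies.
raw_records = [
    '{"user": "alice", "email": "alice@example.com", "tags": ["sports", "music"]}',
    '{"user": "bob", "location": {"city": "Pune", "country": "IN"}}',
    '{"user": "carol", "email": "carol@example.com", "age": 31}',
]


def summarize(record_text):
    """Parse one JSON record and pull out a few fields of interest."""
    record = json.loads(record_text)
    return {
        "user": record["user"],
        # .get() returns None for missing keys instead of raising an error.
        "email": record.get("email"),
        "city": record.get("location", {}).get("city"),
    }


summaries = [summarize(text) for text in raw_records]
for row in summaries:
    print(row)
```

The keys inside each record are the "associated information" that makes the data accessible even though there is no fixed table layout.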

Characteristics of Big Data


Volume

Volume refers to the enormous amounts of information generated every second from social media, cell phones, cars, credit cards, M2M sensors, images, video and more. To cope with this, data is stored across several locations in distributed systems and brought together by a software framework such as Hadoop.

Facebook alone is said to generate messages numbering in the billions, 4.5 billion “like” button clicks and over 350 million new posts each day. Such a huge amount of data can only be handled by Big Data technologies.


Variety

As discussed before, Big Data is generated in many varieties. Compared with traditional data such as phone numbers and addresses, the latest data comes in the form of photos, videos, audio and much more, which makes about 80% of the data completely unstructured.

Structured data is just the tip of the iceberg.


Veracity

Veracity means the degree of reliability that the data has to offer. Since a major part of the data is unstructured and much of it is irrelevant, Big Data systems need a way to filter or translate it, because reliable data is crucial in business development.


Value

Value is the major issue that we need to concentrate on. What matters is not just the amount of data that we store or process, but the amount of valuable, reliable and trustworthy data that is stored, processed and analysed to find insights.


Velocity

Last but not least, velocity plays a major role: there is no point in investing so much only to end up waiting for the data. A major aim of Big Data is therefore to provide data on demand and at a faster pace.

How are the Top MNCs Using Big Data Analytics to their Advantage?

1-Using Big Data Analytics to Boost Customer Acquisition and Retention

The use of big data enables organizations to observe various customer-related patterns and trends. Observing customer behaviour is essential for building loyalty. In theory, the more data a business gathers, the more patterns and trends it can identify.

In the modern business world and the current technology age, a business can easily collect all the customer data it needs. This data makes it much easier to understand the modern customer. Essentially, all that is required is a sound big data analytics strategy to make the most of the data available to you.

With a proper customer data analytics mechanism in place, a business will be able to derive the key behavioural insights that it needs to act on in order to retain its customer base.

2-Use of Big Data Analytics to Solve Advertisers’ Problems and Offer Marketing Insights

Big data analytics can help change all business operations. This includes the ability to match customer expectations, change a company’s product line and ensure that marketing campaigns are powerful.

Let’s face the naked truth here. Organizations have lost millions on ads that are not productive. Why does this happen? There is a high probability that they skipped the research phase.

3-Big Data Analytics for Risk Management

Unprecedented times and a highly risky business environment call for better risk management processes. Essentially, a risk management plan is a critical investment for any business, regardless of its sector.

Being able to foresee a potential risk and mitigate it before it occurs is necessary if the business is to stay profitable. Business experts will advise that enterprise risk management involves much more than ensuring your business has adequate insurance.

4-Big Data Analytics as a Driver of Innovations and Product Development

Another huge advantage of big data is its ability to help companies innovate and redevelop their products. Essentially, big data has become an avenue for creating additional revenue streams by enabling innovation and product improvement.

Organizations start by gathering as much data as possible before designing new product lines and redesigning existing products. Every design process has to start from establishing what exactly suits the customers.

There are various channels through which an organization can study customer needs. The business can then identify the best way to capitalize on that need, based on big data analytics.

5-Use of Big Data in Supply Chain Management

Big data offers supplier networks greater accuracy, clarity and insight. Through big data analytics, suppliers gain contextual intelligence across their supply chains. Essentially, big data analytics lets suppliers escape the constraints they faced earlier.

Previously, suppliers relied on conventional enterprise management systems and supply chain management systems. These legacy applications did not use big data analytics, so suppliers incurred huge losses and were prone to making errors.

With modern approaches based on big data, however, suppliers can leverage higher levels of contextual intelligence, which is essential for supply chain success.

Here are some areas where Big Data is used

  • Health Care
  • Detect Frauds
  • Social Media Analysis
  • Weather
  • Public sector.

Contribution of Big Data in Health Care

The contribution of Big Data to the healthcare domain has grown considerably. With medical advances came the need to store large amounts of patient data, and Big Data is used extensively to store patients’ health histories.

This data can be used to analyse a patient’s health condition and to prevent health failures in the future.

Detect Fraud

Fraud detection and prevention is one of the many uses of Big Data today. Credit card companies face a lot of fraud, and Big Data technologies are used to detect and prevent it.

Earlier, credit card companies would keep track of all transactions, and if a suspicious transaction was found they would call the buyer to confirm whether that transaction had really been made. Now buying patterns are observed and fraud-affected areas are analysed using Big Data analytics. This is very useful in detecting and preventing fraud.
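As a rough illustration of what observing buying patterns can mean, the Python sketch below flags transactions whose amount is far from a customer’s usual spending. It is a deliberately simple statistical check under stated assumptions, not the method any card company actually uses. The function name `flag_unusual`, the threshold of 3 standard deviations and the sample amounts are all hypothetical.

```python
from statistics import mean, stdev


def flag_unusual(history, new_amounts, threshold=3.0):
    """Return the new amounts lying more than `threshold` standard
    deviations away from the mean of the customer's past amounts."""
    mu = mean(history)
    sigma = stdev(history)
    flagged = []
    for amount in new_amounts:
        # Guard against a zero spread (all past amounts identical).
        z = abs(amount - mu) / sigma if sigma > 0 else 0.0
        if z > threshold:
            flagged.append(amount)
    return flagged


# Hypothetical past purchases for one customer, and three new ones.
history = [42.0, 38.5, 51.0, 45.2, 40.0, 47.8, 39.9, 44.1]
new_amounts = [43.0, 950.0, 49.5]

suspicious = flag_unusual(history, new_amounts)
print(suspicious)
```

A production system would look at many more signals, such as location, merchant and timing, and would run over far larger volumes of data. The principle of comparing new activity with an established pattern is the same.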

Social Media Analysis

One of the best use cases of big data is the data that keeps flowing through social media networks such as Facebook and Twitter. The data is collected and observed in the form of comments, images, social statuses and so on.

Companies use big data techniques to understand customers’ requirements and check what they say on social media. This helps companies analyse the feedback and come up with strategies that will benefit their growth.


Weather

Big Data technologies are used in weather forecasting. Large amounts of climate data are fed in and averaged to predict the weather. This can be useful for predicting natural calamities such as floods.
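The averaging step can be sketched in a few lines of Python. This is only a toy illustration: the readings are made up, the `monthly_averages` helper is hypothetical, and real forecasting relies on far more sophisticated models than a plain average. It shows how historical observations can be grouped and averaged to give a baseline expectation for each month.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical historical readings: (month, temperature in degrees Celsius).
observations = [
    ("Jan", 14.0), ("Jan", 16.0), ("Jan", 15.0),
    ("Jul", 31.0), ("Jul", 29.0), ("Jul", 33.0),
]


def monthly_averages(readings):
    """Group readings by month and return the average temperature per month."""
    by_month = defaultdict(list)
    for month, temperature in readings:
        by_month[month].append(temperature)
    return {month: mean(temps) for month, temps in by_month.items()}


baseline = monthly_averages(observations)
print(baseline)
```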

Examples of how some MNCs are handling Big Data Analytics

1. Amazon

The online retail giant has access to a massive amount of data on its customers: names, addresses, payments and search histories are all stored in its data bank. While this information is put to use in advertising algorithms, Amazon also uses it to improve customer relations, an area that many big data users overlook.

The next time you contact the Amazon help desk with a query, don’t be surprised when the employee on the other end already has most of the relevant information about you on hand. This allows for a faster, more efficient customer service experience that does not involve spelling out your name multiple times.

2. American Express

The American Express Company uses big data to analyse and predict consumer behaviour. By looking at historical transactions and incorporating more than 100 variables, the company employs sophisticated predictive models in place of traditional business intelligence based on hindsight.

This allows a more accurate forecast of potential churn and customer loyalty. American Express has claimed that, in its Australian market, it can predict 24% of the accounts that will close within four months.

3. BDO

National accounting and audit firm BDO puts big data analytics to use in identifying risk and fraud during audits. Where, previously, finding the source of a discrepancy would involve numerous interviews and long hours of manpower, consulting internal data first allows for a significantly narrowed field and a streamlined process.

In one case, BDO Consulting Director Kirstie Tiernan noted, they were able to cut a list of thousands of vendors down to a dozen and, from there, review the data individually for inconsistencies. A specific source was identified relatively quickly.

4. Netflix

The entertainment streaming service has a wealth of data and analytics, giving it insight into the viewing habits of many global customers. Netflix uses this data to commission original programming that appeals globally, as well as to acquire the rights to films and series boxsets that it knows will perform well with specific audiences.

For example, Adam Sandler had proved unpopular in the US and UK markets in recent years, yet Netflix green-lit four new films with the actor in 2015, armed with the knowledge that his previous work had been successful in Latin America.


5. Google

Did you know that Google processes about 3.5 billion search queries in a single day, and that each request is run against roughly 20 billion pages? Google derives its search results from its Knowledge Graph database, its indexed pages and the Google bots that crawl a plethora of web pages. User requests are processed in Google’s application servers, which look up results in GFS (Google File System) and record the search queries in a logs cluster for quality testing. Google uses Dremel, a query execution engine, to run near real-time, ad-hoc queries. MapReduce, which is batch-oriented, does not offer this advantage. Google also launched BigQuery, which runs aggregation queries over tables with billions of rows in a matter of seconds. Google is really advanced in its implementation of big data technologies.


6. Facebook

Did you know that Facebook users upload 500+ terabytes of data per day? To process such large volumes of data, Facebook uses Hive for parallel map-reduce operations and Hadoop for data storage. Its Hadoop cluster has been described as the largest in the world. Facebook also uses Cassandra, a fault-tolerant, distributed storage system designed to manage large amounts of structured data across many commodity servers. Scuba is used to carry out real-time, ad-hoc analysis on massive data sets. Hive is reportedly also used to store large data sets in an Oracle data warehouse. Prism is used to bring up and manage multiple namespaces instead of the single one managed by Hadoop. Facebook also uses many other big data technologies, such as Corona and Peregrine.
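For readers unfamiliar with the map-reduce model mentioned above, the following Python sketch shows its three conceptual phases (map, shuffle, reduce) on a word-count task. It runs in a single process and does not use Hadoop or Hive. The function names and the sample posts are hypothetical and only illustrate the idea. In a real cluster, each phase would be spread across many machines.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input standing in for a large collection of posts.
posts = ["big data is big", "data keeps growing"]

counts = reduce_phase(shuffle(map_phase(posts)))
print(counts)
```

Because the map and reduce steps each work on independent pieces of data, a framework can run them in parallel, which is what makes the model suitable for very large data sets.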


7. Oracle

There has been explosive growth in connected devices, to around 12.5 billion, not counting phones, tablets and PCs. This has driven research and development in the Internet of Things and increased storage requirements, which in turn require database management support. Oracle users can use Oracle Advanced Analytics, which requires the data to be loaded into an Oracle database. It provides functionality such as text mining, predictive analytics, statistical analysis and interactive graphics, among others. HDFS data can be loaded into an Oracle data warehouse using Oracle Loader for Hadoop, which links data and search query results from Hadoop to the Oracle data warehouse. Oracle Exadata Database Machine provides scalable, high-end performance for all database applications. Oracle is leveraging big data mainly to expand its business in database management systems.