big data
TRANSCRIPT
BIG DATAPalash Jain
CONTENTS
• Introduction• Characteristics• Architecture• Hadoop• Applications• Disadvantages
INTRODUCTION• Big data is a term for data sets that are so large or
complex that traditional data processing applications are inadequate. Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable elapsed time. Big data "size" is a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data.
Characteristics• Big data can be described by the following characteristics:Volume-The quantity of generated and stored data. The size
of the data determines the value and potential insight and whether it can actually be considered big data or not.
Variety-The type and nature of the data. This helps people who analyze it to effectively use the resulting insight.
Velocity-In this context, the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development.
Architecture• A distributed parallel architecture distributes data across
multiple servers; these parallel execution environments can dramatically improve data processing speeds. This type of architecture inserts data into a parallel DBMS, which implements the use of Map Reduce and Hadoop frameworks. This type of framework looks to make the processing power transparent to the end user by using a front-end application server.
Hadoop• Apache Hadoop is an open-source software
framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common and should be automatically handled by the framework. The core of Apache Hadoop consists of a storage part, known as Hadoop Distributed File System (HDFS), and a processing part called MapReduce
Hadoop Continued• Hadoop splits files into large blocks and distributes them
across nodes in a cluster. To process data, Hadoop transfers packaged code for nodes to process in parallel based on the data that needs to be processed. This approach takes advantage of data locality nodes manipulating the data they have access to allow the dataset to be processed faster and more efficiently than it would be in a more conventional supercomputer architecture that relies on a parallel file system where computation and data are distributed via high-speed networking
Applications• Big data has increased the demand of information
management specialists in that Software AG, Oracle Corporation, IBM, Microsoft, SAP,EMC, HP and Dell have spent more than $15 billion on software firms specializing in data management and analytics. In 2010, this industry was worth more than $100 billion and was growing at almost 10 percent a year: about twice as fast as the software business as a whole.
• Common Application of Big Data are as follows
Applications Continued • The use and adoption of big data within governmental
processes is beneficial and allows efficiencies in terms of cost, productivity, and innovation.
• Data on prescription drugs: by connecting origin, location and the time of each prescription, a research unit was able to exemplify the considerable delay between the release of any given drug, and a UK-wide adaptation of the National Institute for Health and Care Excellence guidelines. This suggests that new or most up-to-date drugs take some time to filter through to the general patient
Applications Continued
• Manufacturing: A conceptual framework of predictive manufacturing begins with data acquisition where different type of sensory data is available to acquire such as acoustics, vibration, pressure, current, voltage and controller data. Vast amount of sensory data in addition to historical data construct the big data in manufacturing. The generated big data acts as the input into predictive tools and preventive strategies such as Prognostics and Health Management (PHM).
Applications Continued • Healthcare: Big data analytics has helped healthcare
improve by providing personalized medicine and prescriptive analytics, clinical risk intervention and predictive analytics, waste and care variability reduction, automated external and internal reporting of patient data, standardized medical terms and patient registries and fragmented point solutions
• Education: A McKinsey Global Institute study found a shortage of 1.5 million highly trained data professionals and managers and a number of universities including university of Tennessee and UC Berkeley, have created masters program to meet these demand.
Applications Continued
• Sports: Big data can be used to improve training and understanding competitors. Besides, it is possible to predict winners in a match using big data analytics. Future performance of players could be predicted as well. Thus, players' value and salary is determined by data collected throughout the season. The movie Money Ball demonstrates how big data could be used to scout players and also identify undervalued players.
Disadvantages
• Privacy: advocates are concerned about the threat to privacy represented by increasing storage and integration of personally identifiable information; expert panels have released various policy recommendations to conform practice to expectations of privacy.
• Echo-Chamber Effect: Echo-chamber effects occur if the users design their big data model poorly. If a big data system does not consider this, it will damage the quality of the analysis. Hence they cannot extract the useful hidden knowledge.
Conclusion• The availability of Big Data, low-cost commodity
hardware, and new information management and analytic software have produced a unique moment in the history of data analysis. The convergence of these trends means that we have the capabilities required to analyze astonishing data sets quickly and cost-effectively for the first time in history. These capabilities are neither theoretical nor trivial. They represent a genuine leap forward and a clear opportunity to realize enormous gains in terms of efficiency, productivity, revenue, and profitability.