Converting Big Data Hype into Big Value With Analytics
Colin White, BI Research
October 2012
Sponsored by IBM


Copyright 2012 BI Research, All Rights Reserved.

BIG DATA VALUE COMES FROM ANALYTICS

There is a tremendous amount of interest in the topic of big data, but when evaluating use cases for big data, too much focus is placed on the types and volumes of data involved and on the underlying data management technologies. It is equally, if not more, important to consider the analytics derived from big data that can be used to improve business decision making and enhance business innovation. It is these analytics that provide the business value for big data.

The Role of Big Data in Analytics

The business community has used analytics for decades to track business operations and to help enhance business decision making. From a technology perspective, the names have changed and evolved (decision support, online analytical processing or OLAP, data mining, business intelligence, for example), but the objectives have always remained the same – to use analytics to more effectively run the business.

Big data is often considered to be about processing large volumes of multi-structured data. This is one aspect of big data, but its scope is much broader than this. Big data represents a new wave of analytics innovation – it symbolizes analytics solutions that could not previously be supported because of

• technology limitations, e.g., poor performance or inadequate analytic capabilities,

• the high hardware and software costs involved, or

• incomplete or limited data available for generating the required analytics.

Big data is not a single technology, but a set of overlapping technologies. There is no one single solution that can satisfy everybody’s needs. Instead, vendors offer a menu of different choices that enable customers to deploy analytic systems that are optimized to suit certain business needs and workloads. Optimization may involve improving performance, reducing costs, and/or enabling new types of data to be explored, captured and analyzed.

To overcome the bottlenecks and boundaries of the past, vendors tackling big data solutions offer advances in both data management and analytics. A list of some of the more significant advances is shown in Figure 1.

Need to consider both data management and analytics when reviewing use cases for big data

Big data represents more than just large volumes of multi-structured data

Big data is a set of overlapping technologies

Figure 1. Big data advances and examples of IBM’s offerings


The table in Figure 1 also shows examples of the products IBM (the sponsor of this paper) offers for big data.

Big Data: Advances in Data Management

In the data management area, three advances are important to note: analytic relational database platforms, non-relational systems, and stream processing systems.

Analytic relational database platforms are packaged hardware and software systems that have been optimized to improve the price/performance of analytic processing. They make possible analyses that were previously impractical, either for cost reasons or because the results could not be produced in a timely manner. These systems are also often enhanced to handle a wider variety of data types, for example, multi-structured data such as web logs or sensor data. This allows business users to blend and analyze different types of data from both internal and external data sources. The improved price/performance offered by these systems also enables organizations to keep more detailed data online for longer periods of time, which reduces the need to aggregate or subset the data – this in turn can help improve the accuracy of the analytical results.

Non-relational systems are not new, and there are many different types, ranging from high-performance file systems and document management systems to graph databases. One area of focus at the moment from an analytics perspective is the Hadoop distributed computing environment. The objective of Hadoop is somewhat similar to that of analytic relational database platforms – improved price/performance – but Hadoop is more likely to be used for handling large volumes of multi-structured data in batch, whereas analytic relational database platforms are geared toward a higher percentage of structured data and both pre-planned and ad hoc processing.

Stream processing systems analyze both structured and multi-structured data as it flows through and between different IT systems. The approach is distinctive because it can filter and analyze data in motion without first persisting it in a data warehouse. This is particularly useful for real-time decision making and in situations where it is not cost-effective, or not required, to persist large volumes of detailed data in a data warehouse.
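The idea of analyzing data in motion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular stream processing product works: a plain iterator stands in for the data flow, the `Event` record and the filter threshold are hypothetical, and only a bounded sliding window is ever held in memory, so nothing is persisted before it is analyzed.

```python
from __future__ import annotations

from collections import deque
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    """Hypothetical record flowing between IT systems."""
    source: str
    value: float


def windowed_averages(
    events: Iterable[Event], window_size: int, min_value: float
) -> Iterator[tuple[str, float]]:
    """Filter events as they arrive and yield a running average over the
    last `window_size` accepted values. Only the window is kept in memory;
    nothing is written to a data store."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    window: deque[float] = deque(maxlen=window_size)
    total = 0.0
    for event in events:
        if event.value < min_value:
            continue  # filter: drop readings that are not of interest
        if len(window) == window_size:
            total -= window[0]  # the oldest value is evicted by the append below
        window.append(event.value)
        total += event.value
        yield event.source, total / len(window)
```

For example, `list(windowed_averages([Event("s1", 1.0), Event("s1", -5.0), Event("s1", 3.0), Event("s1", 5.0)], window_size=2, min_value=0.0))` returns `[("s1", 1.0), ("s1", 2.0), ("s1", 4.0)]`: the negative reading is filtered out before it ever reaches the window.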

Big Data: Advances in Analytics

There have been many advances in the analytics area over recent years, many of them associated with big data. Three improvements that are worthy of note are: new and improved analytic techniques, enhanced data navigation and visualization, and automated decision management.

New and improved analytic techniques are used to uncover patterns in both existing and new types of data. These techniques fall into three main groups. The first group includes techniques that aid business users in analyzing new types of data, for example, text, images or video. The second group enhances techniques for predictive modeling and analysis. The third and last group includes prebuilt function libraries containing advanced statistical and analytical functions. These libraries allow business users to exploit the value of the advanced functions without having to know how to code them. The libraries may be supplied by vendors, made available under an open source license, or written by the customer. The functions often run in the underlying data management system, which enables them to take advantage of the parallel computing capabilities of that system to improve performance.

Analytic relational database platforms offer improved price/performance

Non-relational systems such as Hadoop are designed to process large volumes of multi-structured data in batch

Stream processing systems analyze data in motion

Enhanced analytic functions are easier to use and enable the analysis of new types of data

Enhanced data navigation and visualization helps business users uncover business insights while navigating through large volumes and varieties of data. Examples of capabilities include faceted navigation, tree maps, and text and spatial visualizations. Two key objectives are to provide a consumer-like user interaction and experience, and to support a wide skill base ranging from business managers to data scientists.
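Faceted navigation is the simplest of these capabilities to show in code. The Python sketch below assumes records are plain dictionaries with hypothetical field names such as `region` and `channel`; for each facet it counts the values found among the records that match the user's current selections, which is the information a faceted-navigation sidebar displays. Real tools would compute these counts inside the data platform rather than in application memory.

```python
from __future__ import annotations

from collections import Counter
from typing import Mapping, Sequence


def facet_counts(
    records: Sequence[Mapping[str, str]],
    facets: Sequence[str],
    selected: Mapping[str, str],
) -> dict[str, Counter[str]]:
    """For each facet, count the values carried by the records that match
    every current selection. A navigation sidebar shows these counts so the
    user can see how many results each further click would leave."""
    matching = [
        r for r in records if all(r.get(k) == v for k, v in selected.items())
    ]
    return {f: Counter(r[f] for r in matching if f in r) for f in facets}
```

With three records (East/web, East/store, West/web), selecting `{"channel": "web"}` leaves one result each for East and West under the `region` facet, so the user can see the effect of the next click before making it.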

Automated decision management uses analytics, predictive models and business rules to drive business operations and automate the decision-making process. Fraud detection is a good use case here. Based on prior fraudulent activity, predictive models and rules can be created that business processes use to check credit card transactions for potential fraud. If the results indicate that a transaction is potentially fraudulent, it can be rejected or referred for manual evaluation. The predictive models and rules can be refined over time as the system gains more knowledge about fraud by analyzing data from multiple customer channels.
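The fraud example combines two ingredients, a model score and business rules, into one of three outcomes. The Python sketch below shows that shape under stated assumptions: the `Transaction` fields, the thresholds and the "large foreign purchase" rule are hypothetical placeholders, and the model is any function returning a fraud probability between 0 and 1.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Decision(Enum):
    APPROVE = "approve"
    REFER = "refer"    # send to manual evaluation
    REJECT = "reject"


@dataclass(frozen=True)
class Transaction:
    """Hypothetical card transaction fields used by the rules below."""
    amount: float
    country: str
    home_country: str


# Any predictive model that returns a fraud probability in [0, 1].
FraudModel = Callable[[Transaction], float]


def decide(
    txn: Transaction,
    model: FraudModel,
    reject_at: float = 0.9,
    refer_at: float = 0.6,
    large_amount: float = 5_000.0,
) -> Decision:
    """Combine a model score with business rules. The thresholds are
    illustrative placeholders, not recommended values."""
    score = model(txn)
    if score >= reject_at:
        return Decision.REJECT
    # Rule: a moderately suspicious score, or a large purchase made outside
    # the cardholder's home country, is referred to a human reviewer.
    is_large_foreign = txn.amount >= large_amount and txn.country != txn.home_country
    if score >= refer_at or is_large_foreign:
        return Decision.REFER
    return Decision.APPROVE
```

Because the model is passed in as a function, a retrained model can replace the old one without changing the calling business process, which is one way the refinement over time described above could be accommodated.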

THE BENEFITS OF BIG DATA

The advances outlined above do not replace existing enterprise data warehousing (EDW), business intelligence (BI), or analytics approaches – they instead enhance and extend them. Figure 2 summarizes some of the key improvements big data helps bring to the traditional decision-making environment.

New data navigation and visualization tools make it easier to explore diverse types of data

Decision management helps automate and speed up business decision making

Figure 2. What does big data add to the traditional decision-making environment?

The top half of the table in Figure 2 lists extensions provided by the data management advances of big data. Value here comes from improved price/performance, the capability to make close to real-time decisions, and the ability to process a much richer set of data sources.

The additional data sources that can now be blended into the analytical environment can help answer questions that could not be answered before. There are now many more channels of customer information available for analysis, for example. Data from very different sources can also be related to one another: product sales could be affected by other factors such as the weather, a drought, or a natural disaster. These kinds of relationships can now be investigated thoroughly.
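Relating sales to weather comes down to joining two independently sourced series on a shared key and measuring how they move together. The Python sketch below assumes hypothetical date-keyed dictionaries for daily sales and daily temperature and uses Pearson correlation as a simple first measure; correlation alone does not show that the weather drives sales, so it is a starting point for investigation rather than a conclusion.

```python
from __future__ import annotations

from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


def blend_and_correlate(
    daily_sales: dict[str, float], daily_temperature: dict[str, float]
) -> float:
    """Join two independently sourced series on their shared dates (an inner
    join), then measure how strongly they move together."""
    shared_days = sorted(daily_sales.keys() & daily_temperature.keys())
    return pearson(
        [daily_sales[d] for d in shared_days],
        [daily_temperature[d] for d in shared_days],
    )
```

Days present in only one source are dropped by the inner join, so a gap in the weather feed narrows the comparison instead of introducing missing values.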

Another advantage of big data is that more detailed data can be kept online for longer, which helps improve the accuracy of the results. Analytics can now be generated on a full set of detailed data, rather than on aggregated data. Some retail companies, for example, are now keeping ten years of customer data online, whereas in the past they only kept two years’ worth of data before summarizing it.

The bottom half of Figure 2 lists the extensions provided by the analytics advances of big data. These new analytic extensions enable powerful investigative computing platforms to be deployed by data scientists to model and blend together data from a variety of different sources to look for ways of improving existing predictive models and analyses and to investigate potential new business opportunities. The results of this investigation work – updated models, new business rules, new analyses and/or new data – can then be promoted back into the production decision-making environment. In some cases, the results may lead to a new built-for-purpose line-of-business (LOB) system that is deployed on its own platform. Although these LOB systems are usually analytics-driven, they are, nevertheless, operational in nature because they drive day-to-day business operations. Many of these new LOB applications blur the line between what is operational and what is analytical processing. The management of online display advertising is an example of a hybrid LOB application.

New business models and rules may also be generated from an investigative computing platform and run as part of a business process with the assistance of a decision management capability. This helps speed up the decision-making process and, in some cases, fully automate business decisions. Examples of applications here include fraud detection, next best customer offer, product offers to customers to avoid churn, and so forth.

These new big data analytical capabilities not only extend the power of the analyses that can be performed, they also change the way analytics applications are built and deployed. Data scientists, for example, can now use an investigative computing platform to explore data, identify different patterns, and experiment with different algorithms without needing any of the information to be stored in the traditional decision-making environment.

A richer set of data to explore increases the number of questions that can be answered

Many new LOB applications involve a hybrid of operational and analytical processing

Investigative computing changes the way analytic applications are being developed

CHOOSING THE RIGHT SOLUTION

As discussed earlier, big data is not a single market or technology, and there is no single solution that can satisfy everybody’s needs. Instead, organizations will likely use multiple data management and analytic approaches. The challenge is deciding which to use when, and how to interconnect the various systems involved. The number of options can be daunting, and this is one of the reasons why IT tends to focus on technology differences between products rather than on matching technologies to different use cases.

Although big data technologies are still evolving, there are now a number of use cases and customer case studies that identify many of the benefits that can be obtained from big data. Figure 3 summarizes six common use cases together with application examples of each.

Real-Time Monitoring and Analytics

Analytics are increasingly being used to monitor business operations in real time and to take action if certain business events occur. Fraud detection is an example we have already discussed in this context. Models are built from known fraudulent situations, and rules are generated for use by decision management software to help business processes track and check for fraudulent situations. The business value of real-time analytics is the ability to detect fraud faster, which reduces the risk of financial impact to the organization.

Another approach to real-time monitoring and analytics is to use a stream processing system to analyze data as it flows through the business. This is particularly useful for analyzing events from sensors embedded in rivers, networks, smart grids, oil wells, airplanes, etc. The readings from these sensors can be monitored over time and when an out-of-line situation is detected (a possible equipment failure is predicted, for example), alerts are sent and appropriate actions taken. Stream processing systems can also be used for matching and correlating data from unrelated data streams, for example, weather data and sales data.
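Detecting an out-of-line sensor reading can be sketched as a rolling statistical check. The Python below is one simple technique among many, a z-score against the preceding readings, chosen here for illustration; the window length and the threshold of three standard deviations are hypothetical defaults, and a production stream processing system would express this with its own operators.

```python
from __future__ import annotations

from collections import deque
from math import sqrt
from typing import Iterable, Iterator


def detect_anomalies(
    readings: Iterable[float], window: int = 20, threshold: float = 3.0
) -> Iterator[tuple[int, float]]:
    """Yield (position, reading) for each reading that lies more than
    `threshold` standard deviations from the mean of the previous `window`
    readings. Readings are examined one at a time as they arrive."""
    if window < 2:
        raise ValueError("window must hold at least 2 readings")
    history: deque[float] = deque(maxlen=window)
    for i, x in enumerate(readings):
        if len(history) == window:
            mean = sum(history) / window
            std = sqrt(sum((h - mean) ** 2 for h in history) / window)
            # A perfectly flat history (std == 0) is skipped here; a real
            # system would likely add a minimum tolerance instead.
            if std > 0 and abs(x - mean) / std > threshold:
                yield i, x  # out-of-line reading: raise an alert here
        history.append(x)
```

A sensor alternating between 10.0 and 11.0 for twenty readings and then reporting 50.0 produces a single alert for the 50.0 reading; the alerting action itself (notification, work order) is left to the caller.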

Six common use cases for big data

Figure 3. Big data use cases and application examples

Real-time analytics enable business processes to be optimized in real time

Stream processing systems can be used to monitor sensor data in real time


Near-Real-Time Analytics

The objective of near-real-time analytics is basically the same as for real-time analytics: to speed up the decision-making process. The main difference is that the need for split-second information is not as high in a near-real-time environment – some latency is acceptable. Unlike fraud detection, for example, re-routing a package in the event of a delay does not have to be done in a split second.

Near-real-time decisions are possible using high-performance analyses performed against low-latency data in a data warehouse or high-speed analytic relational database. The data warehouse or data store is updated from source systems at intervals that match the data latency requirements of the analytical processing. The low-latency data and prepackaged analytical results can also be accessed directly by a business process using a web service call. Customer next best offer or option is an example of an application that fits into this category. The actual offer made to the customer may depend on the channel the request comes through (e-mail, call center, web chat, mobile device, etc.), the type of customer service call being handled, the value of the customer, the churn risk, or possibly even the service-center agent handling the call. Regardless, the ability to quickly make a valuable offer to a customer helps enhance customer satisfaction and improve customer retention.
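The next best offer logic can be sketched as a small function that a web service might call. In the Python below, the `Customer` fields stand in for prepackaged analytical results (a customer value and a churn-risk score) read from the low-latency store; the offer catalog, channel names and scoring rule are hypothetical, and only two of the factors listed above, channel and churn risk, are modeled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical precomputed results read from the low-latency store."""
    lifetime_value: float
    churn_risk: float  # model output in [0, 1]


@dataclass(frozen=True)
class Offer:
    name: str
    channels: frozenset[str]  # channels where the offer may be presented
    base_value: float         # illustrative expected value if accepted
    retention: bool           # True if the offer is designed to prevent churn


def offer_score(offer: Offer, customer: Customer) -> float:
    """Rank offers; retention offers gain weight as churn risk rises."""
    score = offer.base_value
    if offer.retention:
        # Value at risk: what the customer is worth times the chance of losing them.
        score += customer.churn_risk * customer.lifetime_value
    return score


def next_best_offer(
    customer: Customer, channel: str, offers: list[Offer]
) -> Offer | None:
    """Pick the highest-scoring offer that can be made on this channel."""
    eligible = [o for o in offers if channel in o.channels]
    if not eligible:
        return None
    return max(eligible, key=lambda o: offer_score(o, customer))
```

Given an upgrade offer and a loyalty discount available only in the call center, a low-risk caller is offered the upgrade while a high-risk caller is offered the discount, and a channel with no eligible offer returns `None`.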

Data Integration Hub

With growing data volumes and data sources, organizations have significant interest in managing all the detailed structured and multi-structured data used for decision making on a single data platform. This reduces data management, data transformation and data movement costs, and makes more data available for analysis online. In the past, the cost of maintaining large amounts of data online was often prohibitive, and as a result companies often had to aggregate data to reduce costs. With the advent of systems such as Hadoop, it is now possible to cost-effectively maintain large amounts of data online, and this is one of the fastest growing use cases for Hadoop. A Hadoop data integration hub in a retail organization could, for example, be used to collect and manage all sales- and customer-related detailed data (point-of-sale, web, supply chain) for downstream analysis. One large retail organization maintains ten years’ worth of sales data in Hadoop and the last two years’ sales data in its traditional data warehouse. Additional data is then brought into the data warehouse from Hadoop as required. A key requirement here is robust, high-performance connectors between Hadoop and other systems.
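The retailer's split (ten years of detail in Hadoop, the last two years in the warehouse) implies a simple routing decision for each query. The Python sketch below encodes that decision under stated assumptions: the store labels are hypothetical, the retention periods are taken from the example above as defaults, and the connector that would actually move older data into the warehouse is not shown.

```python
from __future__ import annotations

from datetime import date


def years_before(day: date, years: int) -> date:
    """The same calendar day `years` earlier; 29 February falls back to the 28th."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def choose_store(
    start: date, today: date, warehouse_years: int = 2, hub_years: int = 10
) -> str:
    """Return which store can answer a query that reaches back to `start`.
    Recent detail lives in both stores; older detail lives only in the hub."""
    if start < years_before(today, hub_years):
        raise ValueError("requested data is older than the hub retains")
    if start >= years_before(today, warehouse_years):
        return "data_warehouse"
    # Older data: read from the hub, or copy it into the warehouse as required.
    return "hadoop_hub"
```

Keeping the retention periods as parameters lets the same rule follow a change in policy, for example extending the warehouse window, without touching the queries that depend on it.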

Analytics Accelerator

An analytics accelerator is a separate analytic platform that is used to accelerate the performance of certain workloads. For example, the trading desk of a large financial organization offloads customer-critical analyses from a traditional data warehouse environment to a high-performance analytic relational database platform. The performance gain reduces the analyses from hours to minutes, which gives the financial company a significant customer advantage. Analytic accelerators are not new, but the data management and analytics advances of big data now provide a range of platforms that can be used to offload certain performance-critical workloads. The actual platform used will depend on the workload and the types of data being analyzed.

Near-real-time analytics can be generated from low-latency data stores

A data hub is one of the fastest growing use cases for Hadoop

Several big data platforms exist for use as analytics accelerators


New LOB Analytic Application

This use case offers potentially the largest long-term business value for big data, because the full power of the advances outlined in Figure 1 can be brought to bear on specific LOB problems and requirements. It enables organizations to build analytic solutions that were not previously possible and to expand the use of analytics to a broader set of business areas. As mentioned earlier, many of these solutions are new applications that are a hybrid combination of operational and analytical processing. Depending on the nature of the problem being addressed and the types of analytical processing and data involved, either a relational or non-relational system may be used to deploy the application. The display advertising industry is an example of how analytics are being used by new industries and business areas. There are organizations in this market sector that specialize in helping companies place advertisements on various web properties. These organizations calculate the fair market value of tens of thousands of ads per second, bid for appropriate ad space, place the ads, and measure the effectiveness of ad campaigns in terms of product sales and revenue. These are small organizations with limited IT resources that have to support the processing and analysis of huge amounts of data very rapidly. The hybrid operational/analytical applications involved would not have been possible to build prior to the innovations around big data outlined in this paper.
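The paper does not describe how these firms compute fair market value. One common simplification in performance advertising values a single impression as click probability times conversion probability times revenue per sale, then bids somewhat below that value. The Python sketch below follows that simplification; the `Impression` fields, the 20% margin and the per-impression (rather than per-thousand) pricing are hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Impression:
    """Hypothetical inputs for one ad slot; probabilities come from models."""
    p_click: float       # predicted probability the ad is clicked
    p_convert: float     # predicted probability a click leads to a sale
    floor_price: float   # minimum price per impression the publisher accepts


def impression_value(imp: Impression, revenue_per_sale: float) -> float:
    """Expected revenue from showing the ad once (a 'fair value' estimate)."""
    return imp.p_click * imp.p_convert * revenue_per_sale


def bid(imp: Impression, revenue_per_sale: float, margin: float = 0.2) -> float | None:
    """Bid below expected value to keep a margin; skip slots priced above it."""
    offer = impression_value(imp, revenue_per_sale) * (1.0 - margin)
    return offer if offer >= imp.floor_price else None


def return_on_ad_spend(revenue: float, spend: float) -> float:
    """Campaign effectiveness: revenue generated per unit of ad spend."""
    if spend <= 0:
        raise ValueError("spend must be positive")
    return revenue / spend
```

The same prediction that sets the bid (operational) is later checked against measured sales through `return_on_ad_spend` (analytical), which illustrates why applications like this sit on both sides of the operational/analytical line.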

Investigative Computing Platform

An investigative computing platform provides an analytic playground for data scientists to explore data and experiment with different analytic algorithms. The output may result in a new LOB analytic application, improved models and analytics or new types of data and analyses that can be migrated into a production decision-making environment. Companies are using these platforms to experiment with new types of data and new algorithms. Several retailers, for example, are experimenting with social computing data to determine how different types of customers use different social computing channels, the types of social data that are valuable for measuring customer reaction and satisfaction, and so forth.

CONCLUSION

Big data fuels a new wave of analytical innovation and value for organizations. True value is achieved from big data through the analytics and analytic solutions that can be created from a hybrid of new and existing data systems, and through the new types of business applications they can support. The vendors that win in this environment will be those that provide good integration between the systems involved, satisfy advanced analytics requirements, and continue to provide the best price/performance.

About BI Research

BI Research is a research and consulting company whose goal is to help organizations understand and exploit new developments in business intelligence, data integration, and data management.

New LOB applications are the biggest potential and fastest growing area for big data

An investigative computing platform is an analytic playground for data scientists

Big data fuels a new wave of analytic innovation