big data analysis guide

Upload: anonymous-dlef3gv

Post on 14-Apr-2018


  • 7/27/2019 Big Data Analysis Guide

    1/11

    Big Data Analysis


    Big Data Overview (1/2)

    Big data refers to datasets whose sizes are beyond

    the ability of typical database software tools to

    capture, store, manage and analyze.

    A primary goal for looking at big data is to discover

    repeatable business patterns.

    It has many additional uses, including real-time fraud

    detection, web display advertising and competitive

    analysis, call center optimization, social media and

    sentiment analysis, intelligent traffic management,

    and smart power grids.

    Big data analytics is often associated with cloud

    computing because the analysis of large data

    sets in real-time requires a framework

    like MapReduce to distribute the work among tens,

    hundreds or even thousands of computers.
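    The way a MapReduce-style framework splits work can be illustrated with a minimal, single-machine sketch. It is not the API of Hadoop or any other real framework; the function names and the sample log lines are assumptions made for illustration. On a cluster, the map calls would run on different machines and the shuffle would move data across the network.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one record."""
    return [(word.lower(), 1) for word in record.split()]

def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)

def map_reduce(records):
    # On a real cluster each record (or block of records) would be mapped
    # on a different machine; here the calls simply run one after another.
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

# Hypothetical sample input: three short log lines.
sample_logs = ["error disk full", "warning disk slow", "error network down"]
word_counts = map_reduce(sample_logs)
print(word_counts["error"], word_counts["disk"])  # prints: 2 2
```

    Because each map call depends only on its own record and each reduce call only on its own key, both steps can be spread over as many machines as are available.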

    As technology advances over time, the size of datasets that qualify as big data will also increase, and big data is expected to play a significant economic role to benefit not only private commerce but also national economies and their citizens.

    Big data involves more than simply the ability to

    handle large volumes of data. Instead, it represents a

    wide range of new analytical technologies and

    business possibilities.

    Big data is a general term used to describe the voluminous amount of unstructured and semi-structured data a company creates.

    It is the data that would take too much time and cost too much money to load into a relational database for analysis.

    Source: http://searchcloudcomputing.techtarget.com/definition/big-data-Big-Data, McKinsey Big Data Report, BI Research Using Big Data for Smarter Decision Making

    [Figure: Big Data Can Generate Significant Financial Value Across Sectors]


    Big Data Overview (2/2)

    Three Vs of Big Data

    The three Vs of big data (volume, variety and velocity) constitute a comprehensive definition.

    Each of the three Vs has its own ramifications for

    analytics.

    Data volume is the primary attribute of big data

    Big data can also be quantified by counting records, transactions,

    tables or files. Some organizations find it more useful to quantify big

    data in terms of time. For example, due to the seven-year statute of

    limitations in the U.S., many firms prefer to keep seven years of data

    available for risk, compliance and legal analysis.

    The scope of big data affects its quantification, too. For example, in

    many organizations, the data collected for general data warehousing

    differs from data collected specifically for analytics.

    Data variety comes from a greater variety of sources

    Big data comes from a variety of sources, including logs, click streams, social media, radio-frequency identification (RFID) data from supply chain applications, text data from call center applications, semi-structured data from various business-to-business processes, and geospatial data in logistics.

    The recent tapping of these sources for analytics means that so-called

    structured data is now joined by unstructured data (text and human

    language) and semi-structured data (XML, RSS feeds).

    Data feed velocity as a defining attribute of big data

    The collection of big data in real time isn't new; many firms have been

    collecting click stream data from the web for years, using streaming

    data to make purchase recommendations to web visitors.

    Even more challenging, the analytics that go with streaming data have

    to make sense of the data and possibly take action, all in real time.
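    One common building block for making sense of a stream as it arrives is a sliding-window counter, sketched below. The class name, the ten-second window and the click timestamps are all assumptions chosen for illustration, not details from the source report.

```python
from collections import deque

class SlidingWindowCounter:
    """Count events seen within the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def add(self, timestamp):
        """Record one event and return how many events are still in the window."""
        self.timestamps.append(timestamp)
        # Drop events that have aged out of the window.
        while self.timestamps and self.timestamps[0] <= timestamp - self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps)

# Hypothetical click timestamps (in seconds) for a single product page.
counter = SlidingWindowCounter(window_seconds=10)
counts = [counter.add(t) for t in [0, 2, 3, 11, 12, 25]]
print(counts)  # [1, 2, 3, 3, 3, 1]
```

    A recommendation or alerting rule could act whenever the returned count crosses a threshold, without ever storing the full stream.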

    Source: TDWI Research report on Big Data Analytics


    Big Data Future

    International Data Corporation (IDC) released a worldwide big data technology and services forecast report based on a survey in

    March 2012. As per the survey:

    The big data market is expected to grow from $3.2 billion in 2010 to $16.9 billion in 2015. This represents a compound annual

    growth rate (CAGR) of 40% or about seven times that of the overall information and communications technology (ICT) market.
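    The quoted growth rate can be checked against the two market sizes with the standard compound annual growth rate formula. The sketch below uses only the figures stated above; the function and variable names are illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.40 means 40%)."""
    return (end_value / start_value) ** (1 / years) - 1

# Figures quoted above from the IDC forecast, in billions of US dollars.
market_2010 = 3.2
market_2015 = 16.9
growth = cagr(market_2010, market_2015, years=5)
print(f"{growth:.1%}")  # about 39.5%, consistent with "nearly 40%"
```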

    The big data market is expanding rapidly and for technology buyers, opportunities exist to use big data technology to improve

    operational efficiency and to drive innovation.

    There are also big data opportunities for both large IT vendors and start ups. Major IT vendors are offering both database

    solutions and configurations supporting big data by evolving their own products as well as by acquisition. At the same time, more

    than half a billion dollars in venture capital has been invested in new big data technology.

    While the five-year CAGR for the worldwide market is expected to be nearly 40%, the growth of individual segments varies from 27.3% for servers and 34.2% for software to 61.4% for storage.

    The growth in appliances, cloud, and outsourcing deals for big data technology will likely mean that over time, end users will pay

    increasingly less attention to technology capabilities and will focus instead on the business value arguments. System

    performance, availability, security and manageability will all matter greatly; however, how they are achieved will be less of a point

    for differentiation among vendors.

    There is a shortage of trained big data technology experts, in addition to a shortage of analytics experts. This labor supply

    constraint will act as an inhibitor of adoption and use of big data technologies, and it will also encourage vendors to deliver big

    data technologies as cloud-based solutions.

    While software and services make up the bulk of the market opportunity, through 2015, infrastructure technology for big data

    deployments is expected to grow slightly faster at 44% CAGR. Storage, in particular, shows the strongest growth opportunity,

    growing at 61.4% CAGR through 2015.

    Source: http://www.idc.com/getdoc.jsp?containerId=prUS23355112

    IDC defines big data technologies as a new generation of technologies and architectures designed to extract value economically

    from very large volumes of a wide variety of data by enabling high-velocity capture, discovery and/or analysis.


    Big Data Risks/Challenges

    Big data is complex because of the variety of data it encompasses, from structured data, such as transactions one makes or measurements one calculates and stores, to unstructured data such as text conversations, multimedia presentations and video streams.

    Big data presents a number of challenges relating to its complexity:

    One challenge is how one can understand and use big data when it comes in an unstructured format, such as text or video.

    Another challenge is how one can capture the most important data as it happens and deliver that to the right people in real time.

    A third challenge is how one can store the data and analyze and understand it given its size and the computational capacity.

    Big data also poses security and privacy risks, because large amounts of data are stored in data warehouses and centralized in a single repository.

    Big data and extreme workloads require optimized hardware and software. The main challenges of big data and extreme

    workloads are data variety and volume, and analytical workload complexity and agility.

    Many organizations are struggling to deal with increasing data volumes, and big data simply makes the problem worse. To solve

    this problem, organizations need to reduce the amount of data being stored and exploit new storage technologies that improve

    performance and storage utilization.

    Big data's increasing economic importance also raises a number of legal issues, especially when coupled with the fact that data is

    fundamentally different from many other assets. For example, one piece of data can be copied perfectly and easily combined

    with other data. The same piece of data can be used simultaneously by more than one person.

    Sectors with a relative lack of competitive intensity and performance transparency, along with industries where profit pools are

    highly concentrated, are likely to be slow to fully leverage the benefits of big data.

    Source: BI Research Using Big Data for Smarter Decision Making, http://spotfireblog.tibco.com/?p=6793, https://www.privacyassociation.org/publications/2012_03_23_big_data_it_risks_and_privacy_meet_in_the_boardroom


    Big Data Importance

    Creating transparency

    Making big data more easily accessible to relevant stakeholders in a timely manner can create tremendous

    value. In the public sector, making relevant data more readily accessible across otherwise separated

    departments can sharply reduce search and processing time.

    Enabling experimentation to discover needs

    As more transactional data is created and stored in digital form, organizations can collect more accurate and detailed performance data on everything from product inventories to personnel sick days. They can use this data to analyze variability in performance, including variability generated by controlled experiments.

    Segmenting populations to customize actions

    Big data allows organizations to create highly specific segmentations and to tailor products and services precisely to meet the needs of each segment. This approach is well-known in marketing and risk management but can be revolutionary elsewhere.

    Replacing human decision making with automated algorithms

    Sophisticated analytics can substantially improve decision making, minimize risks and unearth valuable insights that would otherwise remain hidden. Such analytics have applications for organizations such as tax agencies, which can use automated risk engines to flag candidates for further examination.

    Innovating new business models, products and services

    Big data enables companies to create new products and services, enhance existing ones, and invent entirely

    new business models. Manufacturers are using data obtained from the use of actual products to improve the

    development of the next generation of products and to create innovative after-sales service offerings.

    Source: McKinsey Big Data Report


    Big Data Vendors

    2012 Big Data Pure-Play Vendors, Yearly Big Data Revenue (in $US Million)

    In the current market, big data pure-play vendors account for $300 million in big data-related revenue. Despite their relatively

    small percentage of current overall revenue (approximately 5%), big data pure-play vendors (such as Vertica, Splunk and

    Cloudera) are responsible for the vast majority of new innovations and modern approaches to data management and analytics

    that have emerged over the last several years and made big data the hottest sector in IT.

    Source: http://www.forbes.com/sites/siliconangle/2012/02/17/big-data-is-big-market-big-business/


    Big Data Trends

    The McKinsey Global Institute estimated that enterprises globally stored

    more than seven exabytes of new data on disk drives in 2010, while

    consumers stored more than six exabytes of new data on devices such as

    PCs and notebooks.

    Big data has now reached every sector in the global economy. In total,

    European organizations have about 70% of the storage capacity of the

    entire United States at almost 11 exabytes.

    The possibilities of big data continue to evolve rapidly, driven by innovation

    in the underlying technologies, platforms and analytic capabilities for

    handling data, as well as the evolution of behavior among its users as more

    and more individuals live digital lives.

    The use of big data is becoming a key way for leading companies to

    outperform their peers. McKinsey estimated that a retailer embracing big

    data has the potential to increase its operating margin by more than 60%.

    The increasing use of multimedia in sectors, including health care and

    consumer-facing industries, has contributed significantly to the growth of

    big data and will continue to do so.

    The surge in the use of social media is producing its own stream of new

    data. While social networks dominate the communications portfolios of

    younger users, older users are adopting them at an even more rapid pace.

    Source: McKinsey Big Data Report


    Big Data Examples

    Big data includes web logs, RFID, sensor networks, social networks, social data, Internet text and documents, Internet search

    indexing, call detail records, complex and/or interdisciplinary scientific research, military surveillance, medical records, photography

    archives, video archives, and large-scale e-commerce.

    Examples of Companies Using Big Data:

    IBM has formed a partnership with the Netherlands Institute for Radio Astronomy (ASTRON) for the DOME Project, which

    provided support in developing the tools needed to crunch the data for the ambitious international Square Kilometer Array (SKA)

    radio telescope.

    San Francisco-based SeeChange Company offered a better way of designing health insurance plans with what it calls value-based benefits. The company used a substantial amount of data gleaned from personal health records, claims databases, lab feeds and pharmacy data to identify patients with chronic illnesses who would benefit from a customized compliance program.

    Boston-based company Humedica combined its data analytics with a real-time clinical surveillance and decision support system.

    The company also sells its detailed clinical spending data to life sciences companies, with the idea that customers will use it to

    quantify patient populations, market share and market opportunities.

    Castlight Health aimed to push transparency in healthcare pricing by offering consumers a search engine to find prices of healthcare services. Castlight's technology allowed consumers to run side-by-side comparisons of out-of-pocket medical expenses. Armed with prices, consumers could then shop for bargains, limiting the growth of healthcare costs.

    Cleveland-based Explorys has started a Google-like service that helps clinicians analyze real-time information culled from troves of electronic medical records (EMRs), financial records and other data. The idea is that medical researchers can mine the vast amounts of data to learn how variations in treatment can affect outcomes, uncovering best practices to enhance patient care and lower costs.

    Apixio's technology brings together data from structured sources like EMRs with unstructured data, such as a physician's patient encounter notes. The company's software uses natural language processing technology to interpret clinicians' free-text searches and return the most relevant results.

    Source: http://www.forbes.com/sites/alexknapp/2012/04/09/ibm-is-using-big-data-to-crunch-the-big-bang/, http://www.medcitynews.com/2011/11/5-companies-using-big-data-to-solve-healthcare-problems/


    Role of Internal Audit in Managing Big Data: Case Study

    Check the extent of data assets and take a deep dive into what is available. Data that is redundant or unimportant can then be identified and reduced.

    To manage data holdings effectively, an organization must first be aware of the location, condition and value of its research assets.

    Conducting a data audit provides this information, raising awareness of collection strengths and identifying weaknesses in data

    policies and management procedures.

    The benefits of conducting an audit for managing big data effectively are:

    Monitor holdings and avoid big data leaks. Data hacking, social engineering and data leaks are all threats that plague a company; an audit can help a company identify areas where there is a possibility of leakage.

    Manage risks associated with big data loss and irretrievability. Data which is not structured and is lying untouched

    may never be retrieved; an audit can help identify such cases.

    Develop a big data strategy and implement robust big data policies. Big data requires robust management and proper structuring.

    Improve workflows and benefit from efficiency savings. Check where there are complex and time-consuming workflows and where there is scope for improving efficiency.

    Realize the value of big data through improved access and reuse. Check whether there are data holdings that have not been used in a while.
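    As one illustration of how a data audit might flag redundant holdings, the sketch below groups items by a hash of their content. It is an assumption about how such a check could be built, not a method taken from the cited briefing paper; the file names and contents are made up.

```python
import hashlib
from collections import defaultdict

def content_fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest so identical content maps to the same key."""
    return hashlib.sha256(data).hexdigest()

def find_redundant(holdings):
    """Group holding names by content and keep only groups with duplicates."""
    by_digest = defaultdict(list)
    for name, data in holdings.items():
        by_digest[content_fingerprint(data)].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]

# Hypothetical inventory: holding name -> raw content.
holdings = {
    "sales_2011.csv": b"id,amount\n1,100\n",
    "sales_2011_copy.csv": b"id,amount\n1,100\n",
    "claims_2011.csv": b"id,claim\n7,250\n",
}
redundant = find_redundant(holdings)
print(redundant)  # [['sales_2011.csv', 'sales_2011_copy.csv']]
```

    Exact hashing only catches byte-for-byte copies; near-duplicates would need a fuzzier comparison.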

    Source: http://www.data-audit.eu/docs/DAF_briefing_paper.pdf


    Managing Big Data Through Internal Audit

    Most companies collect large volumes of data, but they don't have comprehensive approaches for centralizing the information. Internal audit can help companies manage big data by streamlining and collating data effectively.

    The following are issues of complex big data that internal audit can help mitigate:

    Big Data Security

    Maintaining effective data security is increasingly recognized as a critical risk area for organizations. Loss of control over data security can have severe ramifications for an organization, including regulatory penalties, loss of reputation, and damage to business operations and profitability. Auditing can help organizations secure and control the data they collect.

    Big Data Accessibility

    Giving access to big data to the right person at the right time is another challenge organizations face. Segregation of duties (SoD) is an important aspect that internal audit can check.

    Big Data Quality

    The more data one accumulates, the harder it is to keep everything consistent and correct. Internal audit can check the quality of big data.

    Big Data Understanding

    Understanding and interpretation of big data remains one of the primary concerns for many organizations. Auditors can help simplify an organization's data.
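    A segregation-of-duties check of the kind mentioned above can be sketched as a comparison between each user's granted duties and a list of conflicting pairs. The duty names, the conflict rules and the users below are all hypothetical; a real audit would take them from the organization's own access policy.

```python
# Hypothetical rule set: pairs of duties that one person should not hold together.
CONFLICTING_DUTIES = [
    ("create_vendor", "approve_payment"),
    ("load_data", "approve_data_quality"),
]

def sod_violations(user_duties, conflicts=CONFLICTING_DUTIES):
    """Return (user, duty_a, duty_b) for every conflicting pair a user holds."""
    violations = []
    for user, duties in sorted(user_duties.items()):
        for duty_a, duty_b in conflicts:
            if duty_a in duties and duty_b in duties:
                violations.append((user, duty_a, duty_b))
    return violations

# Hypothetical access list: user -> set of duties granted.
access = {
    "alice": {"create_vendor", "approve_payment"},
    "bob": {"load_data"},
}
violations = sod_violations(access)
print(violations)  # [('alice', 'create_vendor', 'approve_payment')]
```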

    Source: http://www.acl.com/pdfs/wp_AA_Best_Practices.pdf, http://smartdatacollective.com/brett-stupakevich/48184/4-biggest-problems-big-data
