Data Blending For Dummies


    These materials are © 2015 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.


    Data Blending

    Alteryx Special Edition

    by Michael Wessler, OCP & CISSP


    Data Blending For Dummies, Alteryx Special Edition

    Published by
    John Wiley & Sons, Inc.
    111 River St.
    Hoboken, NJ 07030-5774
    www.wiley.com

    Copyright © 2015 by John Wiley & Sons, Inc., Hoboken, New Jersey

    No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

    Trademarks: Wiley, the Wiley logo, For Dummies, the Dummies Man logo, A Reference for the Rest of Us!, The Dummies Way, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries, and may not be used without written permission. Alteryx is a registered trademark of Alteryx, Inc. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc., is not associated with any product or vendor mentioned in this book.

    LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER AND AUTHOR ARE NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.

    For general information on our other products and services, or how to create a custom For Dummies book for your business or organization, please contact our Business Development Department in the U.S. at 877-409-4177, contact [email protected], or visit www.wiley.com/go/custompub. For information about licensing the For Dummies brand for products or services, contact BrandedRights&[email protected].

    ISBN 978-1-119-03766-8 (pbk); ISBN 978-1-119-03764-4 (ebk)

    Manufactured in the United States of America

    10 9 8 7 6 5 4 3 2 1

    Publisher's Acknowledgments

    Some of the people who helped bring this book to market include the following:

    Senior Project Editor: Zoë Wykes

    Acquisitions Editor: Amy Fandrei

    Editorial Manager: Rev Mengle

    Business Development Representative: Kimberley Schumacker

    Project Coordinator: Melissa Cossell

    Alteryx Contributors: Matt Madden, Bob Laurent, Paul Ross


    Table of Contents

    Introduction
        About This Book
        Foolish Assumptions
        Icons Used in This Book
        Beyond the Book
        Where to Go from Here

    Chapter 1: The Rise of the Data Analyst
        Introducing Today's Data Analyst
        Supporting the Business Decision Maker
        Data Analysis in the Data Blending World

    Chapter 2: Understanding Data Blending
        Exploring the Wide World of Data
        Accessing Multiple Data Sources
        Accessing the Relevant Data
        Defining Data Blending
        Differentiating Data Integration from Data Blending

    Chapter 3: The Building Blocks of Effective Data Blending
        Identifying the Components of Data Analysis
        Moving through Data Blending
            Data preparation and cleansing
            Joining data
            Automation and repeatability
        Empowering the Analyst and Decision Makers

    Chapter 4: Using Data Blending in the Real World
        Maximizing Business Agility with Better Tools
            Intuitive workflows
            Deeper business insight
            Faster access to information
        Implementing Data Blending in Business
        Putting the Right Tools in the Hands of the Data Analyst
        Using Alteryx Designer

    Chapter 5: Ten Things to Consider with Data Blending
        Integrate Predictive Analytics into Your Business
        Gain Insights Rapidly
        Augment the Tools You Now Use
        Work behind the Scenes
        Consider Spatial Analysis
        Leverage Analytic Apps
        Incorporate Visualization
        Improve Business Analyst and IT Relations
        Drive Down IT Support Costs
        Feed the Downward Stream


    Introduction

    Data analysts support their organizations' decision makers by providing the timely information and answers to key business questions needed to drive the business forward. Decisions are only as good as the information they are based on, so data analysts strive to use the best and most complete information possible. Unfortunately, as the amount of data in the world grows exponentially, the time available to identify and combine all the data sources where insight may exist is shrinking. Being successful in business depends on being agile and knowing more than your competitor does; thus, the speed of business is accelerating. Under this pressure and with more work and less time to do it in, the data analyst is indeed in a challenging position.

    Fortunately, the data analyst can become faster at delivering better information and results, all with less effort and complexity. Giving the analyst tools to blend data from multiple data sources ensures that all the relevant data, no matter where it exists, is available to the analyst. Data blending reduces the time to prepare and join data and empowers data analysts to be self-reliant in their job.

    About This Book

    Data blending is important because it allows data analysts to access data from all the relevant data sources: Big Data, the cloud, social media, third-party data providers, in-house databases, department data stores, and more. Historically, the challenge of data analysts has been accessing this data and then cleansing and preparing the data for analysis. These stages of access, cleansing, and preparing data are complex and time intensive. Easy-to-use software tools that reduce the burden of this data preparation and turn data blending into an asset greatly empower the data analyst to become more effective and open new opportunities to the business.


    The focus of this book is how data blending is used and what it can provide the data analyst working to support business decision makers. I identify what features to look for in data blending tools and how to successfully deploy these tools and data blending within your business.

    Foolish Assumptions

    Assumptions often cause problems, but still they have their place, and if they weren't useful, people wouldn't make them. I sometimes have to make assumptions too, and writing this book is a prime example. Mainly, I assume that you know something about data, perhaps have heard of Big Data and the cloud, and also know something about data analysis. Hopefully, you also know what data analysts are, the tools they sometimes use, and the important work they do. As such, this book is written primarily for data analysts and the business folks they support.

    Icons Used in This Book

    Throughout this book, you occasionally see special icons that call attention to important information. You won't find cute little emoticons, but you'll definitely want to take note. Here's what you can expect.

    This icon points out things you'll be glad I mentioned later on. This is the stuff you want to remember when you start using the material on your own.

    I try to keep the techie stuff to a minimum, but I am a techie person at heart and old habits die hard! These are technical tidbits that aren't essential, but they are nice to know.

    This icon points out pieces of sage wisdom that I wish someone had told me when I was learning this subject.


    Beyond the Book

    Alteryx Analytics provides data analysts with an intuitive workflow for data blending and advanced analytics that leads to deeper business insights in hours rather than the days or weeks required of most analytics solutions. To learn more about Alteryx Analytics, visit www.alteryx.com.

    Where to Go from Here

    Blending data doesn't have anything to do with a kitchen blender, although I do use a cooking analogy later in this book. But, just like cooking, having all the right ingredients at the start is very important; no one likes to begin a project only to realize they forgot something. Furthermore, just having all the ingredients isn't enough; if you want to be successful, you need to have a good recipe to follow as well.

    In this book, I give you everything you need to become good at data blending. Or, at least I show you how to start a data blending project with the confidence that you'll be successful at the end.

    If you're trying to learn about data blending and the role of the data analyst for the first time, the task may seem daunting, but with Data Blending For Dummies, Alteryx Special Edition, you have help to guide you on this exciting adventure.

    If you don't know much about data analysis or what a data analyst does, Chapter 1 would be a good place to start. If you're ready to jump into data blending, Chapter 2 is for you. However, if you see a particular topic that interests you, feel free to jump ahead to that chapter. Each chapter is written to stand on its own, so feel free to start reading anywhere or to skip around. Read this book in any order that suits you (although I wouldn't recommend upside down or backward). I promise that you'll put the book down thinking, "Wow, I didn't know this stuff could be so easy!"


    Chapter 1

    The Rise of the Data Analyst

    In This Chapter

    Identifying the requirements of a data analyst

    Providing value to today's business decision makers

    Building the case for data blending

    The role of the data analyst has expanded as organizations have gone from the primitive days of data processing on mainframes into the fast-paced, cloud-driven world of today. Businesses have always had people filling the role of data analysts, but that role has grown with technology's increased capabilities. This growth is good because it has enhanced the capabilities of data analysts and made their contributions to a company's success even more important.

    Today, the data analyst is no longer shackled to the IT department. Modern analytics tools that can be used without writing code or sitting through tons of training classes allow data analysts to access more data sources, join data, perform complex calculations, and quickly answer the tough business questions that are asked. These capabilities firmly put the data analyst in the critical path traveled by business decision makers who determine a company's direction.

    This chapter looks at how the role of the data analyst has evolved and what factors have influenced that growth. It also sets the stage for continued evolution with data blending.


    Introducing Today's Data Analyst

    The data analyst's tools are always evolving, and the data analyst's responsibilities are growing. Data processing and analytics capabilities are ever increasing, so what was impossible only a few years ago is now commonplace. To learn more about how the data analyst has evolved, see the "Evolution of the data analyst" sidebar.

    Today's data analysts are a force to be reckoned with; they know their companies' business rules, understand how to use technology, know what data they need, and are not afraid to venture out on their own or collaborate with dedicated IT resources to get the results required by the companies they work for. Data analysts typically support their organizations from within the business side (not the IT side) of their companies, such as:

    Sales

    Marketing

    Operations

    Product and service design

    Finance and accounting

    Successful data analysts provide the best information for their companies by leveraging these skills:

    Working from within a business perspective

    Knowing business requirements better than anyone else

    Using a logical and analytical mindset to solve problems

    Accessing and analyzing data to provide a result or answer that satisfies an outstanding business question or solves a business problem

    The value that a data analyst brings to a company is that of answering critical business questions based on calculations, analytics, and models performed on very specific datasets. Technology has evolved to give the data analyst greater, but not yet complete, ability to provide answers to straightforward questions in a timely manner, as I discuss later.

    In previous decades, IT was more static and defined, which doesn't mean that IT was easier than it is today. In many ways, IT was more difficult in the past. Limitations in processing power coupled with terse command-line interfaces dictated that IT was not for the faint of heart. As hardware and software have improved to make some aspects of computing easier, more computing options and industry turbulence have added complexity in new areas. Basically, for those in IT, one set of challenges has been replaced with a new set.

    Supporting the Business Decision Maker

    Questions from an organization's decision makers are what drive data analyst requirements. The business decision makers are the executives and senior managers who determine what products and services to develop and sell, which markets to enter or avoid, and which business opportunities or challenges exist that require their action.

    Sidebar: Evolution of the data analyst

    In the early days of data processing, the computing environment was conceptually simple in terms of what options were available: limited sources of structured data in a static environment with very few tools to process that data. This proved to be both a blessing and a curse:

    Fewer tools and sources of data allowed experts to dig deep into what features were available at the time. The relatively few people with the education and access to these tools became experts in their use.

    The limitation of tools and processing power hindered the ability to get information out of the data that was available. Additionally, only a few qualified people were able to perform the complex work.

    As data grew in volume, variety, and velocity, the only way to get access to the data organizations needed to make decisions was to hire and train expensive data scientists. These individuals were skilled in data access languages such as SQL and Python, and the complex code they wrote served as the barrier to ensure they remained at the center of data analysis.

    As new workflow-based analytics tools became available that could deliver the same insights faster and easier, data analysts in the lines of business had an alternative to waiting on IT and their data scientists. Data analysts themselves became empowered by these new tools and today can find the information they need without heavy IT department involvement.

    To be effective, business decision makers ask their data analysts tough questions. For example, they may ask, "Should I increase marketing in the northern Indianapolis suburbs where the average new sports car customer is in the age range of 30 to 40 and has been employed for more than one year?" This is a difficult question to answer without the right tools and data, but it is a question that could identify a new business opportunity and is therefore sent to data analysts to answer.
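
    To make a question like this concrete, here is a minimal Python sketch of how an analyst might express it as a filter over customer records. The field names (region, age, years_employed, bought_sports_car) and every sample row are hypothetical, invented purely for illustration; they don't come from a real dataset or from any Alteryx tool.

```python
# Hypothetical customer records; every field name and value is made up.
customers = [
    {"region": "north_indianapolis", "age": 34, "years_employed": 3.0, "bought_sports_car": True},
    {"region": "north_indianapolis", "age": 36, "years_employed": 2.0, "bought_sports_car": False},
    {"region": "north_indianapolis", "age": 52, "years_employed": 10.0, "bought_sports_car": True},
    {"region": "north_indianapolis", "age": 31, "years_employed": 0.5, "bought_sports_car": False},
    {"region": "south_indianapolis", "age": 38, "years_employed": 6.0, "bought_sports_car": True},
]


def in_target_segment(customer):
    """True when a customer matches the profile in the decision maker's question."""
    return (
        customer["region"] == "north_indianapolis"
        and 30 <= customer["age"] <= 40
        and customer["years_employed"] > 1
    )


# Narrow the records to the segment, then see how many of them bought.
segment = [c for c in customers if in_target_segment(c)]
buyers = [c for c in segment if c["bought_sports_car"]]
share = len(buyers) / len(segment) if segment else 0.0

print(f"{len(segment)} customers in segment, {share:.0%} bought a sports car")
```

    The filter itself is easy; the hard part, as the rest of this book argues, is getting region, age, employment, and purchase data into one place to begin with.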

    The needs of business decision makers center on:

    Getting the right data and answers to make key decisions in a timely manner

    Gaining deeper business insight to really understand the changing market environment

    Utilizing reports and analytics that present data in a business context rather than from an IT perspective

    Having the ability to share their insights with other business colleagues and run their queries again with different input values

    The demands of business decision makers are high because fast-moving markets, fleeting opportunities, tight profit margins, and fierce competition don't allow for delays or incorrect results. Data analysts must continue to evolve their tools and processes to meet these increasing demands.

    Data Analysis in the Data Blending World

    To understand why data blending (discussed in detail later in the book) is necessary, understanding the current data analysis environment is important. Technology has evolved to make the job of the data analyst better, but its full potential is not yet realized. Using traditional approaches such as spreadsheets, data analysts can do the following:


    Find answers to straightforward questions

    Work with simple algorithms and formulas

    Access readily available, prepared data from a few sources

    The evolution of hardware and software has enabled data analysts to perform simple to low-complexity work. However, when the business questions become more difficult, requiring more complex computing algorithms or large amounts of data from different sources, data analysts often seek the help of IT experts.

    In cases where the requirements exceed what data analysts can provide by themselves, a company's IT experts partner with data analysts to:

    Identify, cleanse, and prepare relevant data from a vast sea of frequently irrelevant data

    Join data from multiple input sources and channels

    Develop complex algorithms and programs to process data based on business criteria

    Perform the computational analysis

    Format and display the output results in a usable format

    Package and automate the preceding process so the same question can be asked again as needed
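
    The steps above can be sketched as a small, repeatable pipeline. This is only an illustration in plain Python, not a description of how any IT department or Alteryx implements the process; the two inline data sources, the column names, and the function names are all hypothetical.

```python
import csv
import io

# Two hypothetical input sources, inlined so the sketch is self-contained.
SALES_CSV = """customer_id,amount
101, 250.00
102,
103,75.50
101,100.00
"""
REGIONS_CSV = """customer_id,region
101,North
103,South
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    """Drop rows with a missing amount and convert amounts to numbers."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if amount:
            cleaned.append({"customer_id": row["customer_id"].strip(), "amount": float(amount)})
    return cleaned


def join(sales, regions):
    """Attach a region to each sale by matching on customer_id (inner join)."""
    region_by_id = {r["customer_id"].strip(): r["region"].strip() for r in regions}
    return [
        {**sale, "region": region_by_id[sale["customer_id"]]}
        for sale in sales
        if sale["customer_id"] in region_by_id
    ]


def total_by_region(rows):
    """The analysis step: sum sales per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


def run_pipeline(sales_text, regions_text):
    """Package the steps so the same question can be asked again on new data."""
    sales = cleanse(read_rows(sales_text))
    blended = join(sales, read_rows(regions_text))
    return total_by_region(blended)


if __name__ == "__main__":
    # Format and display the results in a usable form.
    for region, total in sorted(run_pipeline(SALES_CSV, REGIONS_CSV).items()):
        print(f"{region}: {total:.2f}")
```

    Wrapping the steps in run_pipeline is what makes the work repeatable: next month's question needs new input text, not new code.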

    Working together, data analysts and IT do accomplish more, but this relationship does have some downsides:

    Reliance on IT for support introduces long delays for the data analyst because IT staff is seldom able to assist immediately.

    Maintaining the requisite large, highly skilled IT staff is expensive.

    Many analytic tools used by IT are expensive and complex, requiring customization and training before their use.

    IT staff knows technology, but their knowledge of business rules is relatively weak. This necessitates long and complex Scope of Work (SOW) documents, and if input from data analysts is lacking, mistakes can be made.


    Although collaboration with IT using complex tools can provide results (if there's enough time), the process of data analysis is itself a complex operation with multiple challenges, as you can see in Figure 1-1.

    Figure 1-1: Traditional data analysis process and challenges.

    Each process takes time and is complex, requiring multiple advanced tools. Each step is also expensive and reliant on IT or others within the organization. Unfortunately for the data analyst, the business environment is continuing to evolve with increased demands, such as:

    Greater focus on using complex analytical analysis

    Expansion of data in terms of size, complexity, and number of input sources

    Increased requirements for ad-hoc analysis and reporting

    Expectations of faster response times

    Business requirements have increased the demands of the data analyst to the point where the current slow, methodical relationship with IT is no longer sufficient. What's needed are more tools that are data analyst friendly and perform the complex steps without the dependence on IT. Fortunately, tools that combine data blending with powerful, easy-to-use workflows, such as those provided by Alteryx, meet the needs to fully empower the data analyst.


    Chapter 2

    Understanding Data Blending

    In This Chapter

    Discussing the forms of data in the real world and how it naturally exists

    Understanding the challenges of accessing data

    Retrieving all necessary data

    Explaining data blending in support of business analytics

    Knowing the difference between data blending and data integration

    Making the right business decisions is crucial to success for any profitable company. The best decisions are arrived at by identifying all the right data, applying the correct analytical methodology to that data, and correctly interpreting the results to form the decision. In business, this is a collaborative effort of data analysts and business decision makers working together.

    Basing your decision on the right data is critical. Without the right data, you may have incomplete results on which to make your decisions. In a worst-case scenario where you have the wrong data, you get incorrect results that send you down the path of a disastrous decision. Knowing where the right data exists and how to bring all that data from different sources into your decision-making process is at the core of good decision making in business.

    This chapter looks at what data blending is and how it is used in decision making.


    Exploring the Wide World of Data

    In the real world, data comes in all different shapes, sizes, and formats. Increasingly, data comes from everywhere, ranging from corporate databases that are decades old to information generated on smartphones during the time it takes to read this sentence.

    Data exists in several different formats:

    Structured: Traditional data stored in a neat record format with well-defined data types such as fixed field numeric and alphanumeric characters. Structured data is the basis for most existing and legacy databases and is relatively easy to store and manage.

    Semi-structured: Unformatted or loosely formatted numbers or characters inside a field but with little or no structure within the field. A social media post such as a Twitter tweet is an example of semi-structured data. Semi-structured data is more complex to store and process than structured data.

    Unstructured: Data that is not text based, such as pictures, images, or sound files generated by devices or posted on social media. Unstructured data is a challenge to manage because it is large in size, difficult to catalog and index, and problematic to store in databases.
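
    As a rough illustration of these three formats, the Python sketch below holds one made-up example of each. The record layout, the post text, and the byte string are hypothetical stand-ins, and the hashtag pattern is a deliberate simplification of what real social media parsing involves.

```python
import json
import re

# Structured: fixed fields with well-defined types, like a database row.
structured_row = {"account_id": 1001, "state": "IN", "balance": 2500.75}

# Semi-structured: a field holding free text with little internal structure,
# such as a social media post wrapped in JSON.
post = json.loads('{"user": "example_user", "text": "Loving my new car! #sportscar #indy"}')
hashtags = re.findall(r"#(\w+)", post["text"])

# Unstructured: not text based. To a program it is just bytes until a
# specialized tool (image or audio processing) interprets it.
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47])  # first four bytes of the PNG file signature

print(structured_row["balance"] * 2)  # typed value, ready for arithmetic
print(hashtags)                       # structure has to be extracted from the text
print(len(image_bytes))               # only the size is known without further processing
```

    The amount of work needed before analysis can start grows from top to bottom: the structured row is usable immediately, the post needs parsing, and the image needs a dedicated tool.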

    In addition to characterizing data based on its format, a useful way to view data is based on its nature. Data naturally falls into these three categories:

    Traditional: Data that lives in existing or legacy databases and is often in a well-structured format. Rows in an Excel spreadsheet, records in an accounting database table, or account information in an insurance mainframe database are examples of traditional data. More modern examples include information in data warehouses and cloud applications.

    Enrichment: Data that is industry specific or special purpose and used to supplement (or enrich) existing data. For example, spatial grid coordinates identifying where customers like to shop would enrich sales information, or demographic information about a customer's background could help a retailer looking at traditional sales data.

    Emerging: Data that is often related to Big Data (see the sidebar, "Why Big Data is so big") or that comes from other sources such as social media or marketing automation. This is newer, more valuable, and often the most difficult data to identify and leverage. This data helps sales and marketing staff follow their brand and identify useful patterns.
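
    To show how enrichment data supplements traditional data, the following sketch joins hypothetical sales records with hypothetical demographic records on a shared ZIP code. All store names, ZIP codes, and figures are invented for illustration, and the enrich function is an assumption of this sketch rather than part of any particular tool.

```python
# Traditional data: sales records, as they might sit in an in-house table.
sales = [
    {"store": "A", "zip": "46032", "revenue": 12000},
    {"store": "B", "zip": "46204", "revenue": 8000},
    {"store": "C", "zip": "46999", "revenue": 5000},  # no demographics available
]

# Enrichment data: demographics keyed by ZIP code, e.g. from a third-party provider.
demographics = {
    "46032": {"median_age": 39, "households": 4000},
    "46204": {"median_age": 31, "households": 2000},
}


def enrich(sales_rows, demo_by_zip):
    """Left join: keep every sale, adding demographic fields where they exist."""
    enriched = []
    for row in sales_rows:
        extra = demo_by_zip.get(row["zip"], {})
        enriched.append({**row, **extra})
    return enriched


for row in enrich(sales, demographics):
    households = row.get("households")
    per_household = row["revenue"] / households if households else None
    print(row["store"], per_household)
```

    A left join is used here so that a store without matching demographics (store C) still appears in the output instead of silently dropping out of the analysis.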

    Sidebar: Why Big Data is so big

    Big Data is, well, excuse the pun, a big thing in organizations today. Actually, Big Data isn't a thing at all but is rather a classification of data that is very large, generated very fast, and comes from a wide variety of sources, formats, and structures.

    The IT industry research group Gartner defines Big Data as:

        Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization.

    Data categories such as enrichment and emerging data are most likely to include sources of Big Data, but even some traditional data components, particularly when dealing with cloud technologies, could be Big Data related. Unstructured and semi-structured formats are most common within Big Data, but structured data can also be included in some cases.

    Analysts will have to work with new types and sources of data, with Big Data being one such example. Here are a few common examples of Big Data:

    Social media feeds such as Facebook and Twitter where a relentless stream of semi- and unstructured data include text, sound, and images

    Point-of-sale terminal information related to credit card purchases, customer identity and demographic information, and inventory control data

    Online sales and shopping information as well as consumer tracking, demographics, and website metrics

    Scientific, medical, and device (Internet of Things) data showing images, diagnostics, and telemetry of nearly any device imaginable

    Big Data is important to business and data analysts because it represents much of the best, new data available for better decision making. Any tool that purports to support data blending needs to access Big Data quickly and intuitively.

    Based on the business question being posed, various categories of data (traditional, enrichment, and emerging) are funneled into the analysis process. The output is a better, deeper understanding of the market environment to identify and capitalize on business opportunities. Understanding the data types and formats is important for the analyst. Knowing the characteristics of the data allows the analyst to better pick and choose which sources have the right data to be used to build the dataset that will feed the analysis engine during processing.

    Accessing Multiple Data Sources

    Bringing data together from different sources is how data adds value to the decision-making process. In the real world, the data you need usually won't sit neatly in a predefined database inside a fully prepared data table just waiting for you to access it. In most cases, data must be obtained from different sources to add the depth and wide scope necessary for the best possible analysis and decision making.

    The simple fact is that the best decisions will be made only when all the relevant data is available for analysis. The key data almost always comes from multiple data sources and often comes in different formats.

    Data analysts face the following challenges when attempting to access data from different data sources:

    Data outside users' workstations is typically owned and managed by the IT department. Requests for access must involve the IT staff.

    Access to the right data is a multistep process, often with multiple parties involved and lengthy documentation. This equates to long turnaround times for access.

    Data analysts have limited tools to access more complex data sources. While access to lower-end databases is possible, connectivity to more powerful databases and useful access to that data via SQL is often both manual and complex.

    IT staff can assist (once they have time), but IT staff doesn't know the core business processes and requirements as well as the data analyst. If data access is left solely to the IT department, the results could be slow and incorrect.

    Access to enrichment and emerging data can represent a paradigm shift for IT, especially if the staff is experienced only in traditional database technologies. The associated learning curve can extend the turnaround time for data access.

    Once data is obtained, differing data formats, characteristics, and amounts of irrelevant data greatly add to the time and complexity required to cleanse and prepare the data for analytical processing. This data cleanup and preparation is one of the most time-consuming and complex steps performed by data analysts.

    The rewards of accessing multiple data sources far outweigh these challenges. Solutions that improve the situation for data analysts include the following:

    Tools that free the data analyst from needing IT support at the technical level. IT involvement should be limited to granting security access to the data.

    Tools that have predefined access to multiple data sources, including traditional, enrichment, and emerging sources. Analysts should spend their time analyzing data, not struggling to access data.

    Access to the new paradigm of data such as Big Data, cloud data, and third-party data brokers. Access should be streamlined, intuitive, and easily implemented without IT involvement.

    Data cleansing and preparation tools that are powerful yet intuitive and easily usable by the data analyst. Because these steps have historically been complex and time consuming, special emphasis should be placed on tools that reduce the complexity while speeding up the process for the analyst.

    The more that data analysts are empowered and become self-reliant to find, access, prepare, and ultimately analyze and process the relevant data, the more effective the data analysts will be in their role. Tools that allow analysts to perform these tasks easily will encourage the analysts to seek out new, relevant data because the process isn't arduous, further improving the outcome for the business.

    Getting a second opinion is always a wise practice with medical decisions, so why not get your data from multiple sources when making key business decisions? Many intricate relationships within data can be identified only after datasets from different sources are combined and analyzed. Only through data blending can this kind of analysis occur.

    Accessing the Relevant Data

    The nature of Big Data and IT is more data, bigger data, and faster incoming data. Furthermore, once data is created, it is rarely deleted. For a variety of reasons, companies tend to store data for years (or even decades) even if it probably won't be used. These factors create myriad sources that could potentially hold the data required by the analyst.

    Despite the wide range of potential data sources, analysts likely know where the relevant data exists. Odds are, analysts know the business processes very well and understand where data enters the business, where and how (at a nontechnical level) data is processed, and where that data goes once processing is complete.

    At a minimum, analysts know what traditional data is needed for business processing to occur. If the analysts are business savvy and experienced, they have also used enrichment data to add value to business processes and analytics. In the best case, analysts are also now using emerging data to gain greater awareness of product performance and customer perception to support more intelligent business planning.

    The great struggle is, how do analysts access all the relevant data to build a usable dataset to support the necessary analytical processing? Analysts know what data to use, but not necessarily how to get to it. Fortunately, tools such as Alteryx have the capability to access these relevant traditional, enrichment, and emerging data sources:

    Oracle, IBM, and Teradata databases

    Microsoft Access, SQL Server, SharePoint, and Excel products

    Amazon Web Services, Salesforce, SAS, and Google Analytics software and services

    MongoDB and Hadoop products

    Salesforce.com and Marketo cloud-based services

    Twitter, Facebook, Foursquare, and other social media services

    That's MY data!

    Organizations can become very territorial and protective of their data, often for good reason. The most sensitive details of a company, its customers and employees, and its sales information are within its data, typically in databases managed by the IT department. Legal, professional, and ethical requirements to safeguard specific data elements are very strong and must be adhered to. Understandably, access to data by anyone, especially those outside the immediate group of data owners and users, is scrutinized and often challenging to obtain. No company wants to make the news by having its customers' data stolen, as has recently happened with several big-name companies. The damage to those companies' reputations and their legal liabilities are staggering.

    Unfortunately, in some companies (and even worse in many government agencies), access to the data, even for reasons that benefit the business, can be difficult to obtain. Database administrators (DBAs) and data owners justifiably take data security very seriously and sometimes go overboard protecting their data. I've witnessed large IT projects fail mainly due to data owners blocking access to key data from outside systems, which is truly unfortunate.

    As a data analyst, what are you to do when faced with such adversity? First, resist any temptation to use a backdoor method to access data you're not authorized for; those in IT call that data theft and hacking, and it will get you fired. Worse yet, if the data is sensitive, you can face legal ramifications. The proper response is to follow the procedures to request data access and limit it to the least privilege and amount of data you require, which will greatly appease the security team. Inform the data owners why you are requesting access and what initiative it supports. If you still face opposition, let your management chain know and unleash your project sponsor or key business decision maker on the opposing data owner, which often yields positive results.

    The intuitive, easy-to-use interface of Alteryx allows direct access to traditional, highly structured databases and data sources within a company, ranging from larger databases to smaller micro applications hosted on power users' machines. Of even greater value is the data analyst's ability to use Alteryx to access all relevant data, from Big Data, enrichment data, and emerging data, without dependency on IT, lengthy delays, or unnecessary complexity.

    If the previous list of data sources seems bewildering, and you perhaps had to look up a few of the names, don't worry. The explosion of Big Data, cloud computing, and social media has brought many new players to the IT field and has caused long-standing pillars of the technology world, such as IBM, Oracle, and Microsoft, to evolve to meet business opportunities. To learn more about Alteryx and its technologies, check out www.alteryx.com/try.

    Defining Data Blending

    Once the right data sources are identified, and the access to those data sources is established, the next step is merging, sorting, joining, and otherwise combining all the useful data into a functional dataset while discarding the vast, loud noise of unnecessary data. This process is referred to as data blending.

    Data blending is defined by Alteryx as "the process used by data analysts to combine data from multiple sources to reveal deeper intelligence that drives better decision making." What does that statement mean? Here's a breakdown:

    Data blending is a process (not just a one-time event), and the process can be repeated as necessary to add or remove data sources.

    Data analysts in line-of-business departments are empowered by platforms such as Alteryx to perform data blending and don't have to rely on their IT departments.

    Data comes from multiple sources, meaning one or many sources, but usually many. The sources can be anywhere and can include traditional, enrichment, or emerging data types.

    The intent is to reveal deeper intelligence, which means that greater knowledge is obtained from multiple sources than could be obtained from one source.

    The ultimate purpose is to drive better decision making by senior leaders whose job is to guide the business toward promising opportunities to maximize organizational benefit.

    Data blending is essentially using data from multiple sources so that you have the best mix of relevant data for your analytical toolsets. In Figure 2-1, you see how the analyst combines data into a unified, analytical decision-making process.

    Figure 2-1: Multiple data approaches of data blending.

    As Figure 2-1 shows, data blending allows different data sources, which by themselves can be difficult to use, to be combined into a single view that easily generates deep insights into new business opportunities.

    Data blending is a complex technological process, but the data analyst should be shielded from the details by the toolset.

    Differentiating Data Integration from Data Blending

    Data integration is a common term in IT, so a great question would be, "How is data integration different from data blending?" The answer lies in the primary use of that data and how long the data is stored.

    Data integration is a process in which multiple data sources are combined (or integrated) to create a single unified version of the data in databases, data warehouses, or data marts. These data stores are permanent entities with cleansed, normalized data and are managed by IT database administrators (DBAs). Access to these new databases is controlled by DBAs and business intelligence (BI) experts. Over time, these databases grow, are used to support applications, and themselves become data sources for other systems and workflows.

    Data blending is a process conducted by a business or data analyst (not a DBA or BI expert) to build a dataset for use in analytic processing to answer a specific business question. The data for the dataset is created from one or more data sources. The blending occurs as the dataset is built from multiple data sources to capture only the relevant data. Analytical processing occurs on that purpose-built dataset to derive an answer for the question being posed.

    The key differentiators are as follows:

    Integration results in a permanent database with the intent of storing a single copy of data and is managed by DBAs and BI experts.

    Blending results in a dataset with the purpose of supporting analysis for a specific business question and is created by business and data analysts.

    Both data integration and data blending have a valid place in business. However, given the increasing need to deliver better information and results faster, with less effort and complexity, more and more organizations are turning to data analysts who use Alteryx to deliver these deeper business insights in hours, not the weeks that other analytics processes often require.

    Chapter 3

    The Building Blocks of Effective Data Blending

    In This Chapter

    Looking at the workings of data analysis

    Detailing the phases in the data blending process

    Enhancing the capabilities of data analysts and business decision makers

    Once data analysts have identified and accessed the desired data sources, their real work begins. The art and science of transforming myriad data into a usable, very specific dataset for use in analytical processing is next. This data blending process is where the data analysts really shine and show their value. However, this process has historically been a difficult and tiresome endeavor.

    Fortunately, now that the data blending process is understood, new toolsets exist that greatly simplify and accelerate this difficult process. Tools that support data blending are empowering both the analyst and business decision maker, resulting in increased business agility.

    This chapter looks at what the most critical steps are in data blending and how data blending tools help the analyst and ultimately the business.

    Identifying the Components of Data Analysis

    Analysts follow a structured methodology when performing their analysis. Using modern toolsets, the process of data blending is composed of three fundamental phases:

    Access data: Identify and obtain access to the data.

    Prepare and cleanse data: Reformat data into a usable format and correct or remove invalid data.

    Join data: Combine data for further analysis.

    After performing these three phases of data blending, analysts often continue their analysis using an additional two phases for advanced analytics:

    Apply advanced analytics: Execute predictive and spatial analytic processes against the dataset to obtain results.

    Output results: Deliver insight in a readable, often graphical report or output data to feed another process.

    Figure 3-1 shows the components and phases of data blending and advanced analytics. You can see that the data analyst performs a process-oriented methodology to connect to data sources, cleanse the data, blend the data into a usable dataset, apply analytics against the dataset, and finally prepare the results for interpretation and presentation.

    Figure 3-1: The five phases of data blending and advanced analytics.
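
    To make the five phases concrete, here is a minimal sketch in plain Python. It is an illustration only, not how Alteryx builds its workflows: the sample records, the field names, and every function name are hypothetical, and a simple average stands in for the advanced analytics phase.

```python
import csv
import io
from statistics import mean

# Hypothetical inline sources standing in for two separate systems.
SALES_CSV = """customer_id,amount
1, 120.50
2,
3,75.00
1,30.00
"""
CUSTOMERS = [
    {"customer_id": "1", "region": "East"},
    {"customer_id": "3", "region": "West"},
]


def access(text):
    """Phase 1: read raw rows from a source."""
    return list(csv.DictReader(io.StringIO(text)))


def prepare(rows):
    """Phase 2: drop rows with no amount and convert text to numbers."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if amount:
            cleaned.append({"customer_id": row["customer_id"].strip(),
                            "amount": float(amount)})
    return cleaned


def join(sales, customers):
    """Phase 3: attach the region to each sale by customer_id."""
    region_by_id = {c["customer_id"]: c["region"] for c in customers}
    return [dict(s, region=region_by_id[s["customer_id"]])
            for s in sales if s["customer_id"] in region_by_id]


def analyze(rows):
    """Phase 4: a simple average per region, standing in for analytics."""
    by_region = {}
    for row in rows:
        by_region.setdefault(row["region"], []).append(row["amount"])
    return {region: mean(amounts) for region, amounts in by_region.items()}


def output(result):
    """Phase 5: render the result as readable report lines."""
    return [f"{region}: {avg:.2f}" for region, avg in sorted(result.items())]


report = output(analyze(join(prepare(access(SALES_CSV)), CUSTOMERS)))
print("\n".join(report))
```

    Each phase is a separate function so that a step can be swapped or rerun without touching the others, which mirrors the process-oriented methodology described above.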

    Moving through Data Blending

    In cooking, often the most difficult and time-consuming part is the preparation. Ingredients are gathered, cleaned, and sliced and diced while pots and pans are assembled and ovens and stovetops heat up. Experienced chefs know that the actual cooking is a relatively quick component in comparison to the total preparation process. Good chefs also know that if key parts of the preparation are missed (such as forgetting to add a main ingredient), the overall dish will suffer.

    The same concept of preparation holds true for the data analyst; the cruxes of data blending are the data preparation/cleansing and the data joining; these are also the areas where analysts have typically spent most of their time.

    Fortunately, just as in cooking where appliances such as electric blenders speed up the food preparation process, data analysts have automated tools such as Alteryx that greatly accelerate and improve the data processes. The next section discusses how Alteryx makes the preparation steps for the analyst much easier and faster.

    Data preparation and cleansing

    Data analysts are estimated to spend 60 to 80 percent of their time in the preparation and cleansing phase, and when you consider all the detailed work required, you can see why preparation and cleansing is time intensive.

    During this phase, the analyst dives deep into the large amount of data provided by the data sources, determines exactly what data is and isn't required, and reviews the data for quality and completeness. The analyst commonly restructures and reformats the incoming data. Almost always, a large amount of extraneous data must be filtered out so that only relevant data remains; this is a particularly important part. Additionally, the analyst must determine how much of the available data to use; in some cases, the full amount of data isn't required and only a statistical sample is needed. These key stages are noted here:

    Restructure, reformat, and recondition data

    Identify and correct missing or incorrect data

    Remove or cull out vast amounts of extraneous data

    Determine full or partial dataset size

    Alteryx tools assist the data analyst with these stages by:

    Providing drag-and-drop tools in the form of icons to manipulate and alter the data formats

    Using transformations to summarize and aggregate the data

    Addressing the issues with missing or incomplete data

    Automatically setting and assigning field types on data

    Creating and applying formulas to support if/then/else conditions against the data

    Filtering records that are specific to the data analysis

    Adding spatial attributes to the analysis as needed

    The data cleansing and preparation phase is critical because the analyst must determine what data to use and then clean up the data so that only quality, relevant data is utilized.
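
    The cleansing stages described in this section (reformat, correct missing or invalid values, filter out extraneous records, and optionally sample) might look like the following sketch in plain Python. The records and function names are hypothetical, and filling a missing value with the mean of the known values is just one possible choice made here for illustration; an analyst may prefer to drop or flag such rows instead.

```python
import random

# Hypothetical raw records with inconsistent formatting and gaps.
RAW = [
    {"name": "  alice SMITH ", "state": "ny", "spend": "1,200.50"},
    {"name": "Bob Jones", "state": "NY", "spend": ""},
    {"name": "Carol White", "state": "CA", "spend": "310"},
    {"name": "Dan Brown", "state": "ny", "spend": "n/a"},
    {"name": "Eve Black", "state": "NY", "spend": "95.25"},
]


def parse_spend(text):
    """Return spend as a float, or None when the value is missing or invalid."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None


def cleanse(rows, keep_state):
    """Reformat fields, keep one state, and fill missing spend with the mean."""
    tidy = [{"name": " ".join(r["name"].split()).title(),
             "state": r["state"].strip().upper(),
             "spend": parse_spend(r["spend"])} for r in rows]
    # Filter out records that are not relevant to this analysis.
    relevant = [r for r in tidy if r["state"] == keep_state]
    # Assumption: impute missing spend with the mean of the known values.
    known = [r["spend"] for r in relevant if r["spend"] is not None]
    fill = sum(known) / len(known) if known else 0.0
    for r in relevant:
        if r["spend"] is None:
            r["spend"] = fill
    return relevant


cleaned = cleanse(RAW, keep_state="NY")

# When the full dataset is not needed, draw a reproducible random sample.
sample = random.Random(0).sample(cleaned, k=2)
```

    Seeding the random generator keeps the sample repeatable, which matters when the same analysis must be rerun later.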

    Joining data

    Joining data is the phase where the prepared data from various data sources is combined to build the working dataset that will be the subject of the advanced analytics phase.

    At this point, the data is already prepared so that it is free from errors, is in usable formats, and is relevant for the specific analysis being performed; but the data is still in separate buckets from each data source. The analyst must join that data together to create a dataset so that analysis can begin.

    Historically, the data scientist or data analyst used the following processes to join data into a workable dataset:

    Developed manual processes, formulas, and lookups to join spreadsheets

    Wrote SQL queries to link and join data based on specific conditions and characteristics

    Leveraged complex, purpose-built data integration tools

    Implemented simple, custom scripts and processes for small and well-defined data sources

    Pulled in special, outside tools and processes that included spatial context

    These processes were functional, but they were seldom easy and took a long time to use. Fortunately, Alteryx improved the joining phase with tools to allow the analyst to:

    Join data of any data type

    Join data at the row level

    Join data based on multiple fields

    Implement fuzzy logic matching on nonidentical data

    Incorporate spatial context without specialized tools
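
    As an illustration of joining on multiple fields, the sketch below matches rows from two prepared datasets on both a store and a date. It is written in plain Python with hypothetical data and a hypothetical helper name (join_on); it is not the Alteryx Join tool, only the underlying idea.

```python
# Hypothetical prepared datasets from two separate sources.
ORDERS = [
    {"store": "S1", "date": "2015-01-05", "units": 12},
    {"store": "S1", "date": "2015-01-06", "units": 7},
    {"store": "S2", "date": "2015-01-05", "units": 3},
]
WEATHER = [
    {"store": "S1", "date": "2015-01-05", "temp_f": 28},
    {"store": "S2", "date": "2015-01-05", "temp_f": 41},
]


def join_on(left, right, keys):
    """Inner-join two lists of dicts on a tuple of key fields.

    Returns the joined rows plus the left rows that found no match, so
    unmatched data can be reviewed rather than silently dropped.
    """
    # Index the right side by its composite key for fast lookups.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    joined, unmatched = [], []
    for row in left:
        matches = index.get(tuple(row[k] for k in keys), [])
        if not matches:
            unmatched.append(row)
        for match in matches:
            joined.append({**row, **match})
    return joined, unmatched


joined, unmatched = join_on(ORDERS, WEATHER, keys=("store", "date"))
```

    Returning the unmatched rows alongside the joined ones is a design choice: it lets the analyst see which records fell out of the dataset and decide whether that loss is acceptable.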

    Fuzzy logic matching joins two datasets on values that do not match exactly. Rather than requiring an exact match, fuzzy logic matches are based on approximate (or fuzzy) values. An example of fuzzy logic matching is creating a customer dataset from lists of sales information and addresses where the names on the lists may not be exact. Fuzzy logic matching identifies "close enough" values, so that a listing of John, Jon, Jonny, or Johnathan at one address is likely the same person and any of those names would be processed as a match.
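
    The name example can be approximated with Python's standard difflib module, as sketched below. This is one simple similarity measure chosen for illustration and is not necessarily the technique Alteryx uses. The mailing list, the fuzzy_match helper, the 0.6 threshold, and the requirement that addresses match exactly are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical mailing list to be matched against a sales record.
MAILING_LIST = [
    {"name": "Jon", "address": "12 Oak St"},
    {"name": "Jonny", "address": "12 Oak St"},
    {"name": "Johnathan", "address": "12 Oak St"},
    {"name": "Mary", "address": "12 Oak St"},
    {"name": "Jon", "address": "99 Elm Ave"},
]


def similarity(a, b):
    """Return a score from 0.0 (nothing in common) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_match(name, address, candidates, threshold=0.6):
    """Return candidates at the same address whose name is close enough."""
    return [c for c in candidates
            if c["address"] == address
            and similarity(name, c["name"]) >= threshold]


matches = fuzzy_match("John", "12 Oak St", MAILING_LIST)
print([m["name"] for m in matches])
```

    The threshold controls the trade-off: a lower value catches more spelling variants but also admits more false matches (a "Joan" at the same address would also pass here), so fuzzy results are usually worth reviewing before they are trusted.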

    Alteryx allows the data analyst to join data easily and intuitively as part of an overall workflow for data blending and advanced analytics (see Figure 3-2). Its graphical interface contains powerful tools that hide the complexity from the analyst. You can see in Figure 3-3 how Alteryx supports data blending via an intuitive workflow process.

    Automation and repeatability

    Once the data analyst has built the workflow, the same analysis often needs to be performed daily, weekly, monthly, quarterly, yearly, or on an ad hoc basis. Having the capability to rerun the same analysis without repeating the entire process is therefore highly desirable for the organization.

    Figure 3-2: Access, prepare, and join data in an Alteryx workflow.

    Figure 3-3: Alteryx workflow allows data blending steps to be automated.

    Alternatively, the same basic analysis may need to be repeated but with some minor tweaks to account for new business conditions and what-if analysis. For example, the analyst may need to apply a set of predictive calculations for each city in which the company is considering opening a new franchise.
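
    The franchise example amounts to running one calculation over many inputs with adjustable assumptions. A minimal Python sketch follows; the city figures, the toy revenue formula, and the parameter names (capture_rate, spend_share) are invented purely for illustration.

```python
# Hypothetical blended dataset: one row per candidate city.
CITIES = [
    {"city": "Austin", "population": 950_000, "median_income": 71_000},
    {"city": "Boise", "population": 235_000, "median_income": 60_000},
    {"city": "Tulsa", "population": 410_000, "median_income": 52_000},
]


def projected_revenue(row, capture_rate=0.002, spend_share=0.01):
    """Toy estimate: customers captured times a share of their income."""
    customers = row["population"] * capture_rate
    return customers * row["median_income"] * spend_share


def run_workflow(rows, **assumptions):
    """Apply the same calculation to every city and rank the results."""
    scored = [(row["city"], round(projected_revenue(row, **assumptions)))
              for row in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Same workflow, two sets of assumptions (a simple what-if analysis).
baseline = run_workflow(CITIES)
what_if = run_workflow(CITIES, capture_rate=0.003)
```

    Because the assumptions are parameters rather than values buried in the calculation, the analysis can be repeated for a new scenario without rebuilding it.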

    Alteryx supports automation and repeatability within its workflow processes via:

    Creating macros from analytic steps that are often repeated so they can be added to future workflows

    Allowing choices to be made by the analyst during an analytic process to support customization

    Scheduling existing processes to be run automatically with new data and data sources

    Saving processes into applications (analytic apps) that can be shared within the company or community for reuse via Alteryx Server or the Alteryx Analytics Gallery

    Figure 3-4 shows how the Alteryx automated workflow process supports the creation and reuse of analytic apps, graphical visualization tools to assist decision makers, and custom reports that can be repeated on a regular basis.

    Figure 3-4: Workflow automation and resulting outputs.

    Empowering the Analyst and Decision Makers

    The greatest benefit of data blending is the capability to put accurate, actionable data in the hands of the analysts and decision makers in a timely manner. Diving deeper into the benefits, you can see that data blending:

    Puts the most critical data in front of analysts and decision makers at the right times

    Simplifies the tools used by analysts so that highly trained and expensive technical people are not needed every step of the way

    Integrates all data types from across all different sources in a rapid, straightforward manner

    Opens up the access to large amounts of data with minimal complexity to promote a more complete level of analysis with better data and results

    Alteryx empowers line-of-business analysts with an intuitive workflow that can easily access, prepare, cleanse, and join all sources of data and deliver critical insight in the fastest time possible.

    Enhancing decision making with predictive analytics

    Predictive analytics is the process of using complex computational algorithms on a dataset to develop a model of likely behavior in the future. Predictive models and statistics provide a mathematical likelihood of an outcome based on the data in response to a business question.

    The business value of predictive analytics is nearly unlimited. Ask a business leader if a crystal ball that could tell the future would be useful, and the answer would be "Yes!" The same concept applies to predictive analytics. Possible business scenarios include the following:

    What is the likelihood of insurance fraud in this group of insurance claims?

    What percentage of customers turns to a competitor if you increase the cost of a new product by 5 percent in a specific geographic location?

    What is the probability that single males between the ages of 25 and 30 in a specific suburban area will purchase a new sports car in the next year if the car costs less than $35,000?

    Predictive analytics gives forward-thinking business leaders powerful insight into how to best plan for the future and take advantage of opportunities. Savvy data analysts will blend data that supports predictive analytics and use predictive macros against the dataset to identify these opportunities for the business.
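
    As a deliberately simple stand-in for a predictive model, the sketch below fits a straight line to past data and uses it to estimate the answer to the price-increase question. The history values are invented, the function name fit_line is hypothetical, and real predictive tools typically use richer models and validate them before anyone acts on the result.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical history: past price increases (%) and customers lost (%).
PRICE_INCREASE = [1.0, 2.0, 3.0, 4.0, 6.0]
CUSTOMERS_LOST = [0.5, 1.1, 1.4, 2.1, 2.9]

slope, intercept = fit_line(PRICE_INCREASE, CUSTOMERS_LOST)

# Estimated share of customers lost after a 5 percent price increase.
predicted_loss_at_5 = slope * 5.0 + intercept
print(f"Estimated customers lost: {predicted_loss_at_5:.1f}%")
```

    The estimate is only as good as the blended data behind it, which is why the preparation and joining phases come first.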

    Chapter 4

    Using Data Blending in the Real World

    In This Chapter

    Understanding how data blending tools help the analyst

    Knowing in what business situations data blending makes sense

    Observing data blending tools in action

    Examining Alteryx's real-world data blending tool

    By understanding what data blending is and how it works, you can identify its use in real business situations. You know that data blending makes the analyst's job easier, but how does that translate into business value in the real world? To understand this, you need to see how data blending is deployed within businesses.

    Examining the situations of where and how data blending is deployed in business helps to highlight its value. Identifying the characteristics of successful data blending tools allows you to accurately realize their benefit and select the right tool when the time comes to equip your analysts.

    This chapter looks at how data blending is implemented in the real world and how it will enable your analysts to have a greater business impact.

    Maximizing Business Agility with Better Tools

    What is the purpose of better, faster tools in the workplace? Sure, they make people's jobs easier, but what is the underlying business reason for doing so? The answer is that by enabling employees to be more effective, your business becomes faster, smarter, and better able to take advantage of changing market conditions and new opportunities.

    Success is closely tied to agility and the ability to read changing conditions faster than your competitors. Tools that your business uses should support the tenets of increased agility and greater market intelligence. By the same token, tools and processes that are slow, antiquated, or otherwise don't support business agility or market intelligence should be replaced.

    In the realm of the data analyst, what are the key characteristics of tools that support business agility and market intelligence? The tools of the data analyst should support intuitive workflows, deeper business insight, and faster access to information.

    Intuitive workflows

    Data analysts are both creative and logical people; their tools should work in harmony with their thought processes. Using intuitive workflows that flow with the natural processes of the data analyst ensures that analysts' work and creativity are not disrupted. Essentially, you want tools that work with your analysts, not against them.

    Workflows provide the benefit of documenting what is occurring during the data preparation; this functionality makes the job of the data analyst easier and more efficient. Unlike cumbersome spreadsheets that often lead to confusion and don't support retracing the steps through a process, analytic tools such as Alteryx give analysts the ability to easily understand processes using graphical workflows (see Figure 4-1).
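
    The idea that a workflow documents its own steps can be sketched in plain Python: each step is a named function, and the runner records how many rows went in and came out. The step names and data are hypothetical, and this text log is only a rough analogue of a graphical workflow.

```python
def drop_blank(rows):
    """Remove empty or whitespace-only values."""
    return [r for r in rows if r.strip()]


def normalize(rows):
    """Trim whitespace and lowercase each value."""
    return [r.strip().lower() for r in rows]


def dedupe(rows):
    """Remove duplicates while keeping the original order."""
    return list(dict.fromkeys(rows))


def run(steps, rows):
    """Run each step in order and log its name with row counts in and out."""
    log = []
    for step in steps:
        before = len(rows)
        rows = step(rows)
        log.append((step.__name__, before, len(rows)))
    return rows, log


result, log = run([drop_blank, normalize, dedupe],
                  [" East", "east ", "", "West"])
for name, rows_in, rows_out in log:
    print(f"{name}: {rows_in} -> {rows_out}")
```

    With a log like this, anyone can retrace which step removed which records, something that is hard to do once values have been overwritten in a spreadsheet.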

    Figure 4-1: Alteryx's intuitive workflow for data blending and advanced analytics.

    Deeper business insight

    Answers to difficult business questions are out there, buried somewhere in the data; the challenge lies in whether the analyst can dig deep enough into the data to find those answers. Tools that bring the power of advanced analytical functions to the analyst and that can interrogate data from multiple sources are likely to find the answers sought by the analyst. Simple tools with limited capability will not suffice; tools that perform complex tasks in a simplified manner are what you need.

    Faster access to information

    In the military intelligence community, actionable data has a short shelf life before it becomes irrelevant; the same concept applies in business. Tools that allow the data analyst to identify market trends and new business opportunities before the competition identifies them have real value for the business.

    On the other hand, if a tool is so complex, large, or cumbersome that any answers it provides are yesterday's news, that tool (and its information) has little value.

    The secrets to business success really aren't secrets, but companies often lose sight of what's necessary to be successful. The same pitfall holds true for data analysts and their tools; analysts' tools must support business agility and market intelligence in a timely manner with the depth necessary to have value.

    Implementing Data Blending in Business

    Nearly all business departments can benefit from some degree of data blending, but some areas will see greater benefits than others. Generally, the best environments for data blending are situations that require searching through large amounts of data from different data sources to identify trends, patterns, and relationships.

    Here are some areas where data blending is commonly used:

    Sales and Marketing

    Finance Operations

    Site and Merchandising Operations

    The key to knowing where to deploy data blending is in understanding where competitive advantage can be gained by accessing large amounts of data from different data sources. If, for their analysis, analysts need to access and incorporate a mix of data from traditional sources within the company, cloud and social media sources, Big Data, and external third-party data sources, odds are that data blending is the right solution.

    Putting the Right Tools in the Hands of the Data Analyst

    When selecting tools for the data analyst, you should develop a set of key criteria. Desired capabilities and features based on business requirements create the evaluation yardstick against which tools will be judged. You want to establish a set of evaluation criteria to:

    Ensure that all the essential capabilities are present in any tool selected and that no required capabilities are omitted

    Separate nice-to-have features from core requirements to reduce the risk of over-purchasing unnecessary components

    Reduce the likelihood of being swayed solely on a sales presentation or marketing materials

    Build a supportable justification for selecting a specific product that can withstand the scrutiny of management and contracting oversight

    As you determine your set of core requirements that must be supported by a data analysis tool, make sure that the tool includes these capabilities:

    Easy-to-use User Interface (UI), intuitive design, and robust support and help documentation. The tool must be readily usable by the analysts; otherwise, it will be discarded or not fully implemented.

    Able to access all the relevant data regardless of its location or type: databases, local files, third-party data sources, Big Data, social media, and legacy systems.

    Data preparation, data cleansing, and data joining functions within the tool to build usable datasets with accurate, appropriately sized data; 60 to 80 percent of analysts' time is spent on the data preparation and joining steps, so tools to reduce that workload are important.

    Predefined tools for predictive analytics functions and spatial capabilities. Data analysts do not have the time and training to write code to create these complex tools from scratch, so leveraging prebuilt functions is a must. The vendor should provide regular updates to these tools and functions to increase capabilities.

    Workflow processing and drag-and-drop capabilities to perform the tasks within the tool intuitively. Data analysts work in a creative and logical manner, so workflows that support those thought processes will increase the analysts' productivity.

    Selecting the toolset to be used by data analysts is an important task, but it doesn't have to be overly difficult. Identify and document the core business-specific features required by the analysts and use the preceding list of capabilities to guide your product evaluation and selection process.


    If the opportunity exists, be sure to take advantage of the software trial periods that many companies, including Alteryx, offer prospective customers.

    Using Alteryx Designer

    Alteryx Designer is the Alteryx flagship tool for data blending and advanced analytics. The Alteryx Designer tool is built for the needs of today's data analyst. The key capabilities of Alteryx Designer are categorized as:

    Data blending: Accessing and blending data from all the relevant sources. Easy-to-use workflows assist the analyst in data preparation and cleansing to reduce the time spent in preparation phases.

    Predictive analytics: Leveraging more than 40 prebuilt analytical tools coupled with the capability to create custom predictive tools based on open-source R, a powerful predictive analytics language widely used throughout the world. Analysts are able to perform predictive analytics without the requirement to be a highly trained data scientist or programmer.

    Spatial analytics: Blending location data with advanced spatial analytics gives the analyst insight into location-specific functions to better understand customers and answer business questions. Analysts use intuitive workflows to uncover geographic relationships and information that otherwise would be lost in a sea of data.

    Sharing insights: Displaying the analytical data results in an understandable, insightful format so that business decision makers see relationships and discover new opportunities. Analysts can easily and quickly send results into friendly output formats and tools such as automated custom reports, graphical displays, Microsoft Excel, XML, PDFs, Tableau and QlikView visualization files, and more.

    The features supported by Alteryx Designer are rich and engineered to support the analyst every step of the way during the data analysis process.


    Alteryx Designer provides easy-to-use workflows that enable the analyst to be self-sufficient and to accomplish more work in less time. Figure 4-2 shows you the Alteryx Designer capability to use a workflow to blend data.

    What you see in Figure 4-2 is the Alteryx Designer workflow to blend data from Salesforce and Marketo into a dataset. In this case, the analyst is gathering data from a cloud CRM provider (Salesforce) and a marketing automation system (Marketo) in support of a sales and marketing campaign.

    Figure 4-2: Easily blend Marketo data and Salesforce.com data using an Alteryx Designer workflow.
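    For readers who like to see the idea spelled out, here is a minimal sketch in plain Python of what a blend like this does conceptually: prepare a shared key, then join the two sources on it. This is not how Alteryx Designer is implemented, and the records, field names, and the `blend` and `normalize_email` helpers are all hypothetical, made up for illustration.

```python
# Hypothetical records standing in for CRM and marketing-automation exports.
crm_accounts = [
    {"email": "Ana@Example.com ", "account": "Acme", "owner": "Lee"},
    {"email": "bo@example.com", "account": "Globex", "owner": "Kim"},
]
campaign_leads = [
    {"email": "ana@example.com", "campaign": "Spring Promo", "score": 82},
    {"email": "cy@example.com", "campaign": "Spring Promo", "score": 40},
]


def normalize_email(value):
    """Prepare the join key: trim whitespace and lowercase."""
    return value.strip().lower()


def blend(left_rows, right_rows, key):
    """Inner-join two lists of dicts on a normalized key column."""
    right_by_key = {normalize_email(row[key]): row for row in right_rows}
    blended = []
    for row in left_rows:
        match = right_by_key.get(normalize_email(row[key]))
        if match is not None:
            merged = {**row, **match}
            merged[key] = normalize_email(row[key])
            blended.append(merged)
    return blended


blended_rows = blend(crm_accounts, campaign_leads, "email")
```

    Only records that appear in both sources survive this inner join. A real blend might instead keep unmatched rows from one side, depending on the business question.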

    Once the analytical processing is completed, many different output presentation formats are available to the data analyst. As shown in Figure 4-3, Alteryx can also create the native file formats used by Tableau and QlikView, which are especially effective with senior business decision makers to perform data discovery.

    In Figure 4-3, you see output in the form of a Tableau workbook displaying sales results by categories for stores in a city. The visualization aspect allows a clear understanding of sales data per store based on customer segment. Additionally, a spatial aspect is present with data relating to the customers' distance from their stores, zip codes, and territory assignment data.


    Figure 4-3: Output directly to a visualization tool. Alteryx supports direct output to Tableau and QlikView.

    Alteryx Designer empowers the data analyst to be self-reliant to easily and rapidly perform data analysis against all the relevant data. Using intuitive workflows, prebuilt macros, and analytical tools, the analyst answers meaningful business questions and puts the results into many formats, including easy-to-understand visualization tools.

    More precise than Excel

    Microsoft Excel is a great tool, but the wise data analyst knows when to put Excel away and use a tool more suited to the task at hand. Platforms such as Alteryx match the simplicity of Excel but exceed its capabilities via:

    Drag-and-drop interface to easily and quickly create analytic workflows

    Rapid access to data from all data sources, including complex data sources from the cloud, Big Data, and social media

    Graphical tools that cleanse, prepare, and blend data from multiple sources without requiring IT involvement

    Leveraged automation, prebuilt macros, and spatial capabilities

    More than 200,000 data analysts currently use Alteryx to outperform the basic data blending capabilities of Excel and use the same intuitive workflow for subsequent data analysis and reporting.


    Case study: Data blending in business

    DatabaseUSA is a leading provider of sales leads, business and personal mailing lists, email lists, and other related products. DatabaseUSA customers are the sales and marketing departments of large companies that need accurate, reliable sales and marketing data. In the customer data business, data accuracy and reliability is everything, and DatabaseUSA has built its reputation on data quality and depth.

    The challenge faced by DatabaseUSA was building the high-quality, deep data their customers needed when faced with so much extraneous data. Somehow, DatabaseUSA had to sort through the data of 238 million people, 14 million businesses, and 164,000 households to build quality datasets for its customers. Reducing the time, effort, and cost to build the datasets were additional requirements.

    Existing DatabaseUSA data blending methods were centered on custom scripting languages on their internal hosted databases. Using highly skilled IT data specialists, DatabaseUSA was able to blend data, but the process was slow, complex, and intensive.

    With millions of rows of raw data being added monthly, the IT staff was struggling to keep up with the increased workload of bringing in so much data, cleansing and checking it for quality, and blending it into usable datasets. DatabaseUSA opted to keep its internally hosted databases (housing its data) but set out to find a tool to better perform the data blending and ETL functions it required.

    Already an Alteryx user, DatabaseUSA began to leverage the capabilities within Alteryx to:

    Input and update data while removing duplicate data

    Blend in data from all its data sources

    Clean and prepare data and test the data for accuracy

    Standardize data format, values, and fields

    In particular, DatabaseUSA liked how Alteryx could standardize different data types and remove duplicate data prior to loading into the company's internal databases. The data integration capabilities of Alteryx ensured that only correct data in standard formats was being loaded. Additionally, because of data privacy and security concerns, Alteryx was deployed to the DatabaseUSA secure hosted server environment.

    Alteryx helped DatabaseUSA achieve its requirement of loading clean, accurate, and standardized data into its databases on a regular basis amid many possible large external data sources. Manual, low-level data programming was replaced by Alteryx fuzzy logic matching algorithms, resulting in less workload but faster, more repeatable processes; database builds are now completed in a week instead of a month. Finally, reduced dependency on expensive custom IT support is driving down support costs while lowering the risk of IT dependency.

    DatabaseUSA had a data blending problem with too much varied data and not enough time and resources to process it. Alteryx was able to blend that data using Designer features to solve DatabaseUSA's problem while reducing costs and improving data quality and processing speed.
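    To make the standardize-then-deduplicate idea from this case study concrete, here is a small sketch using Python's standard `difflib` module. It is an illustration only: the source does not describe how Alteryx's fuzzy matching works, so the similarity measure, the 0.85 threshold, the sample names, and the helper functions are all assumptions.

```python
from difflib import SequenceMatcher


def standardize(name):
    """Standardize a business name: lowercase, drop punctuation, collapse spaces."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def is_fuzzy_match(a, b, threshold=0.85):
    """Treat two standardized names as duplicates above a similarity threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def remove_duplicates(names, threshold=0.85):
    """Keep the first occurrence of each name; skip later near-duplicates."""
    kept = []
    for name in names:
        candidate = standardize(name)
        if not any(is_fuzzy_match(candidate, existing, threshold) for existing in kept):
            kept.append(candidate)
    return kept


# Hypothetical raw names with inconsistent case, spacing, and a typo.
raw_names = ["Acme Corp.", "ACME  Corp", "Acme Corpp", "Globex Inc."]
clean_names = remove_duplicates(raw_names)
```

    Comparing every new name against every kept name is fine for a handful of records but would be far too slow for millions of rows; the threshold would also need tuning against real data to balance missed duplicates against false merges.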


    Chapter 5

    Ten Things to Consider with Data Blending

    In This Chapter

    Combining analytics with data blending for better decision making

    Unlocking the power of data faster using macros and prebuilt analytic apps

    Deploying spatial and analytical applications to gain real business value

    Understanding the fundamentals of both a technology and a toolset is key to sustained success. But, once you have a working knowledge of the fundamentals, understanding the tips, tricks, and special features of the toolset acts as a success accelerator to allow you to further maximize benefits and accomplish your goals.

    This chapter takes a look at some specific ways to optimize data blending, combined with additional analytical tools and techniques. The chapter also extends the discussion of Alteryx Designer (see Chapter 4) to examine how Alteryx helps data analysts and what it takes to get started using it.

    Integrate Predictive Analytics into Your Business

    Predictive analytics enables companies to apply statistical analysis functions against datasets to develop probabilities of a future event or change. In effect, predictive analytics allows companies to forecast the future. Used properly, predictive analytics identifies new opportunities for companies and helps reduce existing expenses.


    In addition to the robust tools for data blending, Alteryx contains prebuilt predictive analytics macros and functions. Building the right datasets with relevant data blended in benefits predictive analytics by generating more accurate predictive models. Data blending also reduces the time spent in the data preparation stage in the predictive analytics cycle.
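    As a rough illustration of what "forecasting from a dataset" means at its simplest, the sketch below fits a straight-line trend by ordinary least squares and projects it one period ahead. It is not one of Alteryx's prebuilt tools (which, as noted earlier, are based on R); the sales figures and the `fit_line` helper are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly sales figures (month number, units sold).
months = [1, 2, 3, 4, 5]
units = [100, 110, 120, 130, 140]

slope, intercept = fit_line(months, units)
forecast_month_6 = slope * 6 + intercept  # projected units for month 6
```

    Real predictive models use many more variables and report uncertainty alongside the estimate, which is where well-blended datasets earn their keep.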

    Data analysts should be encouraged to apply their creativity and experiment with the predictive analytical functions in their daily jobs to help unleash this powerful capability for the business. Beyond the raw technical capabilities of a toolset, analysts must adapt their mindset to look to predictive analytics as part of their job.

    Gain Insights Rapidly

    A challenge with many technologies is the lengthy time it takes to start using a new product or technology once it is brought into the IT shop. Depending on the complexity of the technology and the existing knowledge and workload of the IT staff, it can take weeks or months before a new technology is in use. Furthermore, before a technology really starts providing true business value and making a positive return on investment (ROI), it can take many months or even years.

    Many Alteryx clients are pleasantly surprised to find out just how fast they can start using Alteryx Designer to gain insights into their business. Easy-to-understand and easy-to-use graphical workflows and prebuilt macros allow clients to perform data blending and advanced analytics to gain business insight more quickly. In many cases, Alteryx customers report that they have found positive business insights within the first week after installation. Having a positive initial experience with a product greatly enhances the likelihood that a company and its users will embrace it.

    Augment the Tools You Now Use

    "Why reinvent the wheel?" is a cliché that has merit, especially in IT and technology-related fields. For decades, programmers have touted code reuse as a benefit of many programming languages and code repositories, and for good reason. Incorporating into your code a tested and proven program function that someone else has written isn't lazy; it's a wise use of time, on your part and your company's.

    The same concept of code reuse applies to data analysts. When it makes sense, continue to use tools such as Microsoft Excel but use Alteryx to augment the processes that require more powerful tools. Alteryx is very good at all three phases of data blending (data access, data preparation, and data joining) as well as advanced analytics, areas in which Excel is not as strong. So, using a tool such as Alteryx to augment Excel makes good sense.

    Work behind the Scenes

    Alteryx tools are designed with prebuilt macros and functions to make the analyst's job easier and more productive, but an entry point for developers to write more complex code exists. Developers can leverage the Alteryx Application Programming Interfaces (APIs) and Alteryx Software Development Kit (SDK) to write their own custom tools within Alteryx workflows.

    Developers can write their own code and leverage .NET and C++ APIs to make calls to the Alteryx Engine. Using the SDK, IT developers are empowered to write their own custom code in C and C++ to further enhance Alteryx to support their company-specific requirements. Writing custom code to assist with data blending further enhances Alteryx's support of the analyst during the time-consuming data access, preparation, and cleansing stages. For example, you could write a separate application to leverage and incorporate the reports generated from Alteryx workflows.

    Consider Spatial Analysis

    Spatial analysis is the value-added element to business equations that can make a big difference. Being able to target a business question to a specific geographic location instead of across a broad category allows business decision makers to make key decisions to open new stores or launch campaigns in areas with favorable conditions. Spatial analysis makes information actionable, providing information to decision makers that they can act upon. Data analysts have long wanted spatial analysis capabilities, but they did not want to code custom spatial features themselves or to enlist IT to do the programming. In other cases, data analysts simply did not have the spatial data available even if they did have access to spatially aware analytics.

    Alteryx Designer brings spatial analysis to the data analyst in two ways. First, Alteryx enables easy access to external data sources such as Big Data, social media, and mobile devices that provide spatial data. Using data blending features in Alteryx Designer, the analyst blends the spatial elements needed into the larger dataset used for analysis. Second, Alteryx provides prebuilt macros that support spatial analysis so that the data analyst doesn't need to handcraft complex spatial functions.
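    To give a feel for the kind of "complex spatial function" an analyst would otherwise handcraft, here is a short sketch of one common building block: the haversine formula for great-circle distance, used to find a customer's nearest store. The coordinates, store names, and helper functions are hypothetical, and this is not a description of how Alteryx's spatial macros work.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_store(customer, stores):
    """Return (store_name, distance_km) for the store closest to the customer."""
    return min(
        ((name, haversine_km(customer[0], customer[1], lat, lon)) for name, (lat, lon) in stores.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical coordinates for illustration only.
stores = {"Downtown": (40.7128, -74.0060), "Uptown": (40.8116, -73.9465)}
customer = (40.7306, -73.9866)
store_name, distance_km = nearest_store(customer, stores)
```

    Straight-line distance ignores roads and travel time, so treat it as a first approximation when reasoning about which store a customer is likely to visit.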

    Leverage Analytic Apps

    One of the great benefits of the cloud is the capability to access the right application to perform the job you need, when you need it, and to do so quickly and securely. Using the Alteryx Analytics Gallery of apps, your data analyst has access to the Alteryx cloud of prebuilt analytical applications (apps). The analyst browses the Gallery in the cloud to find an app that does the job the analyst needs, and then the analyst simply downloads and executes that app. Because the app is downloaded to the analyst's workstation, no data is passed into the cloud, so the analyst knows that the process is secure. Analysts have the capability to customize or tweak the app as needed for the specific task at hand. Plus, analysts can even write their own apps to upload (publish) to the Alteryx Analytics Gallery to be securely shared within their company or industry.

    You may also choose to deploy Alteryx Server as an effective way to share analytic apps with your employees. In addition to offering the data processing power of a server-based analytic architecture, Alteryx Server gives everyone in your organization the power and flexibility to run their own analytic apps and customize the output to their specific needs while keeping all components of the analytic process behind your corporate firewall.

    Leveraging analytic apps is another smart way to avoid reinventing the wheel. Analysts can build on the efforts of others in the form of analytic apps so they can better focus on their data analysis duties.


    Incorporate Visualization

    Many analytical tools perform number crunching well, but the results are output as raw data or in a format that is difficult for nontechnical people to understand. Business decision makers are often visual people, so the capability to display output in a graphical format is important. In the past, data analysts have imported that data into presentation tools such as Microsoft Excel to render more user-friendly graphics. This process works, but the time has come to evolve past this tedious process.

    Alteryx Designer supports generating output into a wide variety of formats, including Excel, Extensible Markup Language (XML), and Portable Document Format (PDF) files. Furthermore, the data analyst can generate output to Tableau and QlikView visualization products.

    Visualization products allow data to be more easily understood and relationships to be identified and explored, thus allowing business decision makers to more quickly identify new opportunities. However, visualization tools are not as strong with advanced data blending