© 2012 IBM Corporation
Business Analytics
IBM Predictive Analytics for Defence
Michael Scruggs – North America Predictive Analytics Leader IBM Business Analytics
§ A leading provider of predictive analytic software, services and solutions
– Software – data collection, text and data mining, advanced statistical analysis and deployment technologies
– Services – implementation, training, consulting, and customization
– Solutions – combine software and services to deliver high-value line-of-business solutions; used for optimizing marketing campaigns, call center effectiveness, identification of fraudulent activity and more
§ Over 40 years of experience and a broad customer base – 250,000 customers: 100 countries, 50 states, 100% of top universities – Widely used throughout US DoD and US Intelligence
§ Non-proprietary approach – Non-intrusive integration (Service-Oriented Architecture) – Database agnostic – Leverages existing operational software, IT investments, and custom analytic assets
SPSS Predictive Analytics
[Diagram: Capture, Predict, Act. Labels: Data Collection/Access; Statistics; Text Mining; Data Mining; Deployment Technologies; Platform; Build Content]
Data Collection/Access delivers an accurate view of history, behaviors, attitudes, and opinions
Predictive capabilities bring repeatability to ongoing decision making, and drive confidence in your results and decisions
Unique deployment technologies and methodologies maximize the impact of analytics in your operation
Enabling Predictive Analytics Across the Enterprise
SPSS Support of Various Mission Areas
§ Security
• Insider Threat Detection
• Cyber Threat Detection – anomaly detection, attribution, malicious detection
• IED Risk Assessment
• Fraud Detection – vendor pay, credit card, reimbursement
• Law Enforcement – criminal investigation support, crime hot-spot anticipation, violent crime risk assessment
• Identification of Suspicious Inbound Cargo
• Early Disease/Bio-terror Outbreak Identification
§ Human Capital
• Retention – predict personnel likely to attrite, identify incentive most likely to lead to retention, measure satisfaction
• Planning – forecast skill-set requirements, forecast n-strength requirements
• Training – pre and post testing and assessment
• Recruiting – target market, identify recruits most likely to attrite vs. serve long careers
§ Logistics
• Optimize Inventory Routing and Supply
• Readiness – detect operational shortages and patterns/trends, forecast mission success outcomes
• Maintenance – predict part failures, total lifecycle cost forecasting
A Sample of Customers…
National Geospatial Intelligence Agency
Department of Energy
Defense Intelligence Agency
National Security Agency
US Intelligence Agency
Customs and Border Protection
Federal Bureau of Investigation
Naval Special Warfare Command
Office of the Director of National Intelligence
US CERT
Sample IBM Business Analytics DoD/COCOM Customers
Navy § inFADS § SPAWAR BUMED § NAVFAC HQ FM § NAVFAC HQ MILCON § NETC § NAVAIR § OCHR § CNIC § USFF (NRRE, DRRS-N) § NAVFAC – NITC § NAVSISA - OCHR § SPAWAR SSC SD
Army § NGB (Enterprise) § USARC (Enterprise) § HR Command (Enterprise) § TRADOC – SIDPERS-3 § DIMHRS § AMC - ARDEC § FORSCOM § IMCOM § Aberdeen Testing Center § Corps of Engineers
Air Force § Materiel Command (Wright Patterson AFB) § Secretary of the Air Force Installations and Logistics (Pentagon) § Education and Training Command (Randolph AFB) § Safety Center (Kirtland AFB) § Office of Special Investigations (Andrews AFB) § Communications Agency (Scott AFB) § Cost Analysis Agency (Hill AFB)
USMC § Manpower and Reserve Affairs § Programs and Resources § Logistics Command § Systems Command § Marine Corps Community Services
Other § DCMA § MDA § SOCOM § DISA § DFAS § DSCA § DoD - DRRS
COCOM § USTRANSCOM – SDDC (CAB, iSDDC) – AMC – UST J8 § SOCOM (SORBIS) § Joint Chiefs of Staff
Challenge – Identify and reduce security threats from employees
• Malicious insiders can have a devastating impact, including violation of confidentiality, undermining of intelligence integrity, adverse influence on US policy, the revelation of sources and methods, and the death and compromise of field agents.
• Challenged by having numerous employees with authorized access to very sensitive data
• Need to identify the source of information leaks to the press or groups outside of the organization
• Need to identify anomalous or potentially malicious behavior to avoid significant loss of information or threats to security
Solution – SPSS Modeler and Text Analytics
• SPSS Modeler used for data preparation and data modeling, including:
• aggregation of data and comparison by date, location, and peer group
• use of segmentation algorithms, temporal analysis functions, and association (market basket) algorithms
• SPSS Text Analytics for determining subject matter and content of chat logs, e-mails, and opened, printed, or saved documents.
Customer Story | U.S. Intelligence Agencies
Intelligence – Insider Threat
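The "comparison by peer group" step on this slide can be illustrated with a small sketch. It is not SPSS Modeler code and not the agencies' method; the record layout `(user, peer_group, count)`, the function name `peer_group_outliers`, and the z-score threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

def peer_group_outliers(records, threshold=2.0):
    """Flag users whose activity count is far above their peer-group mean.

    records: iterable of (user, peer_group, count) tuples (hypothetical schema).
    Returns a list of (user, z_score) for users above the threshold.
    """
    by_group = defaultdict(list)
    for user, group, count in records:
        by_group[group].append((user, count))

    flagged = []
    for group, members in by_group.items():
        counts = [c for _, c in members]
        mu, sigma = mean(counts), pstdev(counts)
        if sigma == 0:
            continue  # everyone in the group behaves identically
        for user, count in members:
            z = (count - mu) / sigma
            if z > threshold:
                flagged.append((user, round(z, 2)))
    return flagged

# Made-up document-access counts for two peer groups.
records = [
    ("a", "analyst", 10), ("b", "analyst", 12), ("c", "analyst", 11),
    ("d", "analyst", 9), ("e", "analyst", 13), ("f", "analyst", 95),
    ("g", "admin", 200), ("h", "admin", 210), ("i", "admin", 190),
]
flagged = peer_group_outliers(records)
```

Comparing within peer groups is the point of the design: the "admin" users have much higher raw counts than user "f", yet only "f" stands out relative to colleagues in the same role.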
Benefits
§ Learn and model normal access and behavior patterns by mining network connection data
§ Enable detection of anomalous, likely-threatening patterns
§ Let data derive activity, versus historical alarms and hard-coded business rules
§ Continuous, unattended monitoring of network traffic, and resultant refresh of the anomaly detection models is required
§ Trend activity over time and forecast expected behavior
§ Associate disparate activity as part of larger intrusion
§ Network defenses can react to evolving threats.
Customer Story | JTF-GNO and other U.S. Intelligence Agencies
Cyber Security
Background/Need
• Required to enlist over 100,000 men and women each year
• Need to prioritize over 600,000 leads to better focus recruiters' time on the candidates most likely to succeed
• Need to minimize attrition of recruits/personnel at all stages of their career lifecycle
Solution
• Incorporate predictive analytic techniques and technology to support:
• The creation and deployment of lead prioritization models that rank-order leads for recruiters by likelihood of signing a contract
• The creation and deployment of attrition models that score each individual’s likelihood of attrition, at what stage, and why
Results/Benefits
• Recruiters can now target 20% of their qualified lead applicants to get 75% of their monthly contracts
• Recruiters now spend five times less effort to achieve applicant quotas, ultimately resulting in higher levels of quota achievement and better-qualified applicants
• Retention Officers can now prioritize their efforts and know, at an early stage, which individuals are likely to attrite and why
• For individuals the organization wishes to retain, an offer can be made with the best “predicted” incentive option to retain each individual
• By reducing attrition, if only 500 fewer recruits are needed per year, it saves over $9 million annually ($18K per enlistee) in recruiting and training costs
Customer Story | U.S. Army Recruiting Command
Personnel and Human Resources
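A claim such as "20% of leads yield 75% of contracts" is a cumulative-gains figure, and the savings line is simple arithmetic. The sketch below shows how both could be computed. The lead scores and outcomes are invented toy data, and `cumulative_gain` is a hypothetical helper, not part of any SPSS product.

```python
def cumulative_gain(scored_leads, top_fraction):
    """Share of all contracts captured by the top-scored fraction of leads.

    scored_leads: list of (score, signed_contract) pairs, where
    signed_contract is 1 or 0 (hypothetical data layout).
    """
    ranked = sorted(scored_leads, key=lambda pair: pair[0], reverse=True)
    cutoff = int(len(ranked) * top_fraction)
    captured = sum(outcome for _, outcome in ranked[:cutoff])
    total = sum(outcome for _, outcome in ranked)
    return captured / total if total else 0.0

# Toy data: ten leads with model scores and whether each signed a contract.
leads = [(0.95, 1), (0.90, 1), (0.80, 1), (0.70, 0), (0.60, 0),
         (0.50, 1), (0.40, 0), (0.30, 0), (0.20, 0), (0.10, 0)]
gain_top_20 = cumulative_gain(leads, 0.20)

# Savings arithmetic from the slide: 500 fewer recruits at $18K each.
annual_savings = 500 * 18_000
```

In this toy data the top 20% of leads capture half of the contracts, and the savings product comes to $9,000,000, in line with the slide's figure.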
Challenge – Improve logistics readiness
§ Reduce ownership cost over the life of a system
§ Reduce unscheduled downtime
§ Determine which parts and operating conditions are associated with higher frequencies of system failure
§ Identify the root cause of part failures from the free-form text entered in maintenance and repair logs
Solution – SPSS Modeler and Text Analytics
§ Use data preparation functions to create low, medium, and high cost bins for past system failures.
§ Use association and sequence detection algorithms to identify the parts or events that tend to lead to “high cost” repairs.
Customer Story | U.S. Navy – Naval Surface Warfare Center
Logistics and Readiness
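The two solution steps, binning past failures by cost and then asking which parts tend to co-occur with "high cost" repairs, can be sketched as follows. This is a simplified stand-in for Modeler's binning and association nodes: the rank-tercile rule, the `(part, repair_cost)` layout, and both function names are assumptions.

```python
from collections import Counter

def cost_bins(costs):
    """Assign each cost to 'low', 'medium' or 'high' by rank terciles."""
    order = sorted(range(len(costs)), key=lambda i: costs[i])
    labels = [None] * len(costs)
    for rank, idx in enumerate(order):
        labels[idx] = ("low", "medium", "high")[min(2, rank * 3 // len(costs))]
    return labels

def high_cost_rate_by_part(failures):
    """For each part, the fraction of its failures that fall in the 'high' bin.

    failures: list of (part, repair_cost) tuples (hypothetical layout).
    """
    labels = cost_bins([cost for _, cost in failures])
    seen, high = Counter(), Counter()
    for (part, _), label in zip(failures, labels):
        seen[part] += 1
        high[part] += label == "high"
    return {part: high[part] / seen[part] for part in seen}

# Made-up repair log entries.
failures = [("pump", 900), ("pump", 850), ("valve", 120),
            ("valve", 100), ("seal", 300), ("seal", 880)]
rates = high_cost_rate_by_part(failures)
```

The per-part rate is the confidence of the one-antecedent rule "part failed, therefore repair was high cost"; a full association or sequence algorithm would also search combinations and orderings of parts and events.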
Smarter Analytics
A global aerospace manufacturer uses analytics and modeling to predict and avoid program delays and cost overruns
§ 50% increase in ability to identify and predict overall schedule risk
§ 10x better at predicting slippages of more than 100 days
§ ~USD 6 million saved by avoiding missing a delivery deadline by even one month
Business Challenge: This global aerospace manufacturer needed a more analytical, data-driven approach to program management to predict and prevent program risk, particularly project delays and cost overruns, and to better understand the factors that lead to risk.
The Smarter Solution: The aerospace manufacturer now uses predictive analytics on structured and unstructured program data to identify and predict where and when key causal factors may trigger program risk. Early warnings, critical-path modeling and advanced visualization allow managers to evaluate and implement options to prevent risk.
“This solution helps minimize the human biases and shortcomings that enter into manual program management functions by maximizing the fact-based, data-intensive approach of predictive analytics.” – President, military aircraft division
Solution Components
§ IBM® Cognos® Insight
§ IBM SPSS® Modeler
§ IBM Global Business Services® – Business Consulting Services
§ IBM Research
Appendix – Extra / Supporting slides
The Science
Unsupervised
§ Clustering – Detecting unusual provider billing activity
§ Association Detection – Discovering unusual groupings of procedure codes billed to a recipient ID
Supervised
§ Sequence Detection – Detecting unusual sequences of procedure codes billed to a recipient ID
§ Rule Induction – Building ‘classifiers’ trained to score provider activity based on similarity to known fraud patterns
Predictive Analytics in preventing Fraud, Waste, and Abuse
[Diagram labels: Payment/Investigation; Select by provider type, date, location, etc.; Unsupervised/‘Discovery’ techniques; Investigation will ‘flag’ fraudulent activity; Supervised/‘Rule Induction’ techniques]
IBM SPSS Modeler Counter-Intelligence Example
Data Preparation and Transformation
Blog data is added as a data source and passed to the Text Analytics algorithm.
Three-Tiered Approach to Text Extraction
Using a combination of Natural Language Processing algorithms, Non-Linguistic Entity Extraction techniques, and Custom Dictionaries, names of people (among many other entities) are extracted.
Merging Structured and Unstructured Data
Watchlist data is merged with the names of people extracted from the counter-intelligence blogs. If a name mentioned in a blog matches a name on the watchlist, the data from the blog containing the matched name is passed forward for further analysis: specifically, a Text Link Analysis algorithm that defines entity relationships at the sentence level.
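The merge described here is essentially a join between extracted names and watchlist names. The sketch below shows one simple way to do such a match; the normalization rule (lowercase, strip accents and punctuation, sort tokens) and the names `normalize_name` and `match_watchlist` are assumptions for illustration, not how SPSS Modeler implements its merge, and the names in the data are fictional.

```python
import unicodedata

def normalize_name(name):
    """Lowercase, strip accents and punctuation, and sort name tokens so
    'Doe, John' and 'john doe' compare equal."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    tokens = "".join(ch if ch.isalnum() else " " for ch in text.lower()).split()
    return " ".join(sorted(tokens))

def match_watchlist(extracted, watchlist):
    """Return blog records whose extracted name appears on the watchlist.

    extracted: list of (blog_id, name) pairs; watchlist: list of names.
    """
    wanted = {normalize_name(n) for n in watchlist}
    return [(blog_id, name) for blog_id, name in extracted
            if normalize_name(name) in wanted]

extracted = [(1, "John Doe"), (2, "Jane Roe"), (3, "doe, john")]
matches = match_watchlist(extracted, ["Doe, John"])
```

Exact matching after normalization is deliberately strict; real name matching usually also needs transliteration variants and fuzzy comparison, which this sketch leaves out.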
Text Link Analysis
Synonymous terms are automatically grouped as “concepts”. Similar concepts are grouped as “types”. Text Link Analysis directly links concepts together, such as people, actions, locations, and dates.
Text Link Analysis describes relationships between concepts/entities at the sentence level, as well as any opinions, sentiment, or qualifiers associated with these concepts/entities.
Association Algorithm
The results derived from Text Analytics and the Watchlist data are subsequently passed to an Apriori Association algorithm to help “Predict” relationships that are likely to exist, but are not present in the data.
In the highlighted example: if an individual’s current location is “afghanistan” and they are associated with “Zargar, Mushtaq Ahmad” and they were born in “afghanistan”, then they are likely to be associated with the group “abu sayaff”.
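An association rule of this kind is usually judged by its support and confidence. The sketch below only evaluates a single candidate rule over toy records; it does not perform Apriori's frequent-itemset search. The item strings are neutral placeholders and `rule_metrics` is a hypothetical helper.

```python
def rule_metrics(records, antecedent, consequent):
    """Support and confidence of the rule 'antecedent -> consequent'.

    records: list of sets of items; antecedent, consequent: sets of items.
    support    = share of all records containing antecedent and consequent
    confidence = share of antecedent-matching records that also contain
                 the consequent
    """
    with_antecedent = [r for r in records if antecedent <= r]
    with_both = [r for r in with_antecedent if consequent <= r]
    support = len(with_both) / len(records) if records else 0.0
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence

# Placeholder records: each set holds attribute=value items for one person.
records = [
    {"location=X", "associate=A", "group=G1"},
    {"location=X", "associate=A", "group=G1"},
    {"location=X", "associate=A", "group=G2"},
    {"location=Y", "associate=B", "group=G2"},
]
support, confidence = rule_metrics(
    records, {"location=X", "associate=A"}, {"group=G1"})
```

Here the rule holds for two of the three records that match the antecedent, so a new record with the same location and associate but no known group would be scored as likely to belong to G1 with confidence 2/3.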
Operationalize and Deploy
The complete analytic process can be run manually, or operationalized to run in batch mode, on a scheduled or event basis, or fully integrated as a service into a larger production-level system (SOA).
New data can be analyzed through the Text Extraction and Text Link Analysis algorithms, and automatically scored through the Association Model to provide case-by-case “Predictions” for each record.
4 Key Categories
Data Collection – Delivers accurate view of customer attitudes & opinions
• IBM SPSS Data Collection
Statistics – Drives confidence in your results & decisions
• IBM SPSS Statistics
Modeling – Brings repeatability to ongoing decision making
• IBM SPSS Modeler
• IBM SPSS Text Analytics
Deployment – Maximizes the impact of analytics in your operation
• IBM SPSS Collaboration & Deployment Services
• IBM SPSS Decision Management
IBM SPSS Predictive Analytics Software
Predict: IBM SPSS Statistics
§ Advanced statistics and data management for research
§ Collection, preparation, analysis, interpretation, explanation and presentation of data
§ Provides insight into a sample of data and tools for prediction and forecasting based on the data
§ User driven analysis – Descriptive – Inferential
Drives confidence in your results and decisions
Predict: IBM SPSS Modeler
§ Workbench with data preparation functions to build analytic streams or jobs and a run time environment for job execution
§ Set of data mining algorithms that provide insight and prediction
§ Enables the discovery of key insights, patterns and trends in the data
§ Highly intuitive interface for SMEs and comprehensive enough for the power user or statistician
Brings repeatability to ongoing decision making
Predict: IBM SPSS Modeler: A Few Highlights
§ Auto Modeling
§ Auto Data Prep
§ Integration with Statistics
§ Read from and write to XML
§ Additional in-database mining support
Predict: IBM SPSS Text Analytics
§ Uses natural language processing, heuristic rules, and statistical techniques to reveal conceptual meaning in text
§ Extracts concepts from text and categorizes them
§ Makes unstructured qualitative data more quantifiable, enabling the discovery of key insights from sources such as survey responses, documents, emails, call center notes, web pages, blogs, forums and more
Brings repeatability to ongoing decision making
Discover critical information with Text Mining
Act: Collaboration & Deployment Services
§ Analytic content management repository
– Version control
– Powerful search
• Analytic awareness
– Security and auditing
§ Process management
– Multi-step jobs
– Conditional job flow
– Scheduling
– Automated model evaluation
• Champion-challenger
– Open integration
• SPSS tools and non-SPSS tools
§ Integration & delivery interfaces
– Reporting
– Automatic delivery of analytical output
– Multiple IT infrastructure integration options
• Web services, authentication, and database interfaces
Predict & Act: Decision Management
§ Workflow oriented approach that allows business analysts to optimize operational decisions
– Decision centric user interface vs. analytics centric
– Combine business logic / rules with predictive models
– Completely customizable
– Quick start with sample applications for target business problems
– True web based architecture
– Fully integrated with the SPSS product portfolio
Predict & Act: Decision Management Scenarios
Advanced Analytics Overview – Three Primary Phases
1. Data Access and Normalization
2. Apply Advanced Analytic Algorithmic Techniques
3. Automation and Deployment into Production
© 2007 SPSS Inc.
Did you know…
…IBM SPSS Advanced Analytic techniques enable easy access to ANY format of data, without writing SQL or other query scripts
…IBM SPSS Advanced Analytic techniques provide visual menus and easy access to virtually any data manipulation function
This means…
…time spent preparing data for modeling and analysis (80% of any project) is greatly reduced
…SMEs can perform advanced analytics without DBA knowledge
Customers have experienced a 70-80% reduction of alarm data
Cyber Threat Data Challenges
§ Volume Too Large – Streaming – Billions of daily records
§ Meaningless in Raw State – IP address vs. classified octets – recode timestamp into larger bin
§ Unstructured Often Ignored – Open source, RSS feeds, email content
§ Difficult to Access Cyber Specific Data – PCAP, HTTP/TCP headers, hash files, click-stream, IPS/IDS logs
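The "meaningless in raw state" point (splitting an IP address into octets, recoding a timestamp into a larger bin) can be made concrete with a short sketch using only the Python standard library. The function names, the choice of a one-hour bin, and the private/public flag are assumptions for illustration.

```python
import ipaddress
from datetime import datetime

def ip_features(address):
    """Split an IPv4 address into octets and a coarse private/public flag."""
    ip = ipaddress.IPv4Address(address)
    octets = [int(part) for part in str(ip).split(".")]
    return {"octets": octets, "is_private": ip.is_private}

def hour_bin(timestamp):
    """Recode an ISO-8601 timestamp into the hour that contains it."""
    moment = datetime.fromisoformat(timestamp)
    return moment.replace(minute=0, second=0, microsecond=0).isoformat()

features = ip_features("192.168.10.7")
bucket = hour_bin("2012-03-14T09:26:53")
```

Derived fields like these give clustering or classification algorithms something to group on: two connections from the same subnet in the same hour become comparable, where the raw address and timestamp strings are nearly unique per record.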
§ Classification Algorithms – “Prediction”
– Decision Trees, Rule Induction, Forecasting, etc.
– “Predict” the outcome based on various variable inputs
– “Discover” behaviors, characteristics from data that result in an outcome
§ Segmentation Algorithms – “Clustering”
– Anomaly Detection, K-Means, Kohonen, etc.
– An exploratory approach to group cases together (or apart) based upon characteristics or behavior
§ Association Algorithms – “Link-Analysis”
– Discover variables that occur together or are “likely” to occur together
– Usually include multiple antecedents and multiple consequents
– May also include SEQUENCING of associations
Primary Classes of Advanced Analytic Algorithms
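As an illustration of the segmentation class named on this slide, here is a minimal k-means sketch on one-dimensional data. It is a textbook version written for clarity, with caller-supplied starting centers so the result is deterministic; it is not the implementation used by SPSS Modeler, and the data values are invented.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on one-dimensional data with caller-supplied
    starting centers. Returns (final_centers, cluster_index_per_value)."""
    centers = list(centers)
    assignment = []
    for _ in range(iterations):
        # Assignment step: nearest center for each value.
        assignment = [min(range(len(centers)), key=lambda k: abs(v - centers[k]))
                      for v in values]
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [v for v, a in zip(values, assignment) if a == k]
            if members:
                centers[k] = sum(members) / len(members)
    return centers, assignment

# Toy data with two obvious groups.
values = [1.0, 2.0, 3.0, 20.0, 21.0, 22.0]
centers, assignment = kmeans_1d(values, centers=[0.0, 10.0])
```

A case that sits far from every final center is a natural anomaly candidate, which is one way clustering and anomaly detection end up in the same algorithm class.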
§ Traditional Statistical Approaches
– Descriptive, Inferential, Regression, etc.
§ Text Mining Algorithms
– 3 Tier Approach: Natural Language Processing, Non-Linguistic Entity Extraction, and Custom Dictionaries
– ANY Unstructured Source: blogs, XML traffic, email, etc.
– Includes Sentiment, Symbolics, and Text Link Analysis
§ 3rd Party Techniques
– R, Python, C, SAS, etc.
§ Automated Techniques
– Model Selection, ensembles, split models, etc. for model building
– Self-learning and Champion/Challenger functions
Primary Classes of Advanced Analytic Algorithms (continued)
Automation and Deployment into Production
§ Batch
§ Scheduled Jobs (time or event based)
§ Export PMML (XML) or .exe
§ Deploy as a service feeding reports
§ Invoke self-learning & “champion/challenger” modeling
into streaming data
into sensors
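The “champion/challenger” idea mentioned on this slide can be sketched in a few lines: score the deployed model and a candidate on the same holdout data and promote the candidate only if it does better. The accuracy metric, the `margin` parameter, and the toy threshold rules standing in for real models are all assumptions; SPSS Collaboration & Deployment Services has its own evaluation workflow that this does not reproduce.

```python
def accuracy(model, holdout):
    """Fraction of holdout rows (features, label) the model predicts correctly."""
    hits = sum(1 for features, label in holdout if model(features) == label)
    return hits / len(holdout)

def pick_champion(champion, challenger, holdout, margin=0.0):
    """Keep the champion unless the challenger beats it by more than margin."""
    champ_acc = accuracy(champion, holdout)
    chall_acc = accuracy(challenger, holdout)
    if chall_acc > champ_acc + margin:
        return "challenger", chall_acc
    return "champion", champ_acc

# Two toy threshold rules standing in for real models.
def champion(x):
    return int(x > 5)

def challenger(x):
    return int(x > 3)

holdout = [(1, 0), (2, 0), (4, 1), (6, 1), (7, 1)]
winner, winner_accuracy = pick_champion(champion, challenger, holdout)
```

A non-zero `margin` guards against swapping models over differences that are within noise, which matters when the comparison runs unattended on a schedule.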
IBM SPSS Advanced Analytics’ Immediate Value-Add
Empower the Cyber Analyst to Perform Advanced Analytics… while rapidly creating an Advanced Analytic Application that easily Embeds into an Operational Process and Architecture
§ Raw Data Challenges
– Simultaneous access to ANY data source/format/language…Oracle, MarkLogic, DB2, SQL, etc.
– Multiple approaches for DATA NORMALIZATION and MANIPULATION
– Scale to handle MASSIVE amounts of data
§ Actionable Information Generation Challenges
– Broad range of algorithmic OPTIONS…Segmentation, Association, Classification, Traditional Statistics
– Incorporate CUSTOM or 3RD PARTY analytic techniques…including R, Python, SAS, etc.
– FLEXIBILITY to tweak analytics to each unique situation…vs. hard-coded, custom built for one, unique situation
– Invoke SELF-LEARNING, AUTOMATED analytic processes
§ Collaboration, Deployment, and Integration Challenges
– SHARE best practices, functions, or analytic processes across analysts…SPSS and 3rd Party or Custom
– OPERATIONALIZE and AUTOMATE analytic processes based on NON-PROPRIETARY standards...XML, SQL, etc.
– Deploy actionable information as a SERVICE to support each unique request…Service-Oriented Architecture
– Integrate Scoring and Model Building directly into Real-time Processes…IBM STREAMS
§ Implementation and Learning Challenges
– SHORT IMPLEMENTATION: days/weeks to implement vs. months/years
– SHORT LEARNING CURVE: custom & standard training; onsite or offsite
– KNOWLEDGE TRANSFER through SPSS or Partner consultative services
– Access to over 44 YEARS of SPSS’ expertise… and now IBM Research