big data to big understanding
TRANSCRIPT
From BIG Data to Understanding
Cochrane Associates
cochrane.org.uk ca-global.org
Peter Cochrane
WHY All the Excitement?
A new tool addresses some of our biggest challenges:
-A fully networked, connected and increasingly open world
-Disparity of disconnected/siloed disciplines and industries
-Rapid rise of complexity far exceeding human abilities
-Gross failure of old thinking, models and methods
-Rise of non-linearity and emergent behaviour
-Displacement of technologies and people
-Growth of interdisciplinary relationships
-Acceleration of technology and change
-Freedom of data and information
-Globalisation of everything
- +++++
WHY The BIG Deal?
These problem sets are way beyond:
-The desktop
-Past thinking
-Spreadsheets
-Old databases
-Simple analysis
-Basic mathematics
-Simple programming
-Relational databases
-All our past experiences
- +++++
WHY Is It All So Important?
We are in a new era of:
-Novel causalities
-Correlation discovery
-More powerful computer models
-Unusual and unexpected solutions
-Extremely rare event identification
-Unexpected behaviours and outcomes
-Original classes of relationship discovery
-Degree of freedom reduction from six to two
-Rare and improbable association identification
-Previously unseen classes of objects/behaviours
-Essential element creation for sustainable futures
- +++++
FREEDOM What Degree of Separation?
                 Analogue World         Digital World
                 Disconnected,          Connected,
                 Insulated Society      Networked
People           <6                     <3
Organisations    <5                     <2
Machines         ∞                      <10
Things           ∞                      <30
The smaller the separation, the bigger the networking and the data generated
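The idea of a degree of separation can be made concrete as the fewest hops between two nodes in a network. The sketch below is only an illustration, not something taken from the slides: it assumes the network is held as a plain dictionary of direct contacts, and the function name `degrees_of_separation` and the toy `contacts` data are hypothetical.

```python
from collections import deque


def degrees_of_separation(graph, start, goal):
    """Return the fewest hops from start to goal, or None if unconnected.

    Uses breadth-first search, so the first time the goal is reached
    the hop count is guaranteed to be the minimum.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return hops + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None  # no path: an "infinite" separation


# Hypothetical toy network: each key lists its direct contacts.
contacts = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat"],
    "eve": [],  # isolated node, like a disconnected machine or thing
}

print(degrees_of_separation(contacts, "ann", "dan"))
print(degrees_of_separation(contacts, "ann", "eve"))
```

An unreachable node returns `None`, which plays the role of the ∞ entries: with no connection at all there is no finite separation to measure.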
BIG Global Dynamic Distributed
We have to address new sets of issues and dimensions created by a fast-moving digital world. These have remained largely unseen and untapped, and are of a scale and complexity never seen before
SCALE! Beyond Human Capacity
A small sample of our formidable challenges:
-Sustainability
-Genome Decode
-Protein structure/folding
-Social network associations
-Genome-protein communication
-Disease causality and propagation
-Global/National medical records analysis
-Seismic analysis for raw material location
-Money laundering and tax avoidance tracing
-Terrorist and criminal activity characterisation
-Astronomical data analysis and ‘body’ classification
- +++++
SIZE How Much Data?
[Chart: data creation in EBytes/day (log scale, 0.001 to 1000) against year (2000 to 2030), showing the spread of data creation estimates]
More data created in 2002 than in all of time up to that point!
Beyond Moore’s Law: exponential growth that is accelerating
WHERE Does It All Come From?
Things, People, Machines, Education, Health Care, Institutions, Government, Communication, Transportation, Networking, Commerce, Business, Security, Policing, Science, Media, Competition, Exploration, Research, Markets, Green, Social, Open, Apps +++++
ANALYTICS Structured Un-Structured Semi-Structured
Applicability:
-Retail
-Science
-Banking
-Security
-Defence
-Medicine
-Wholesale
-Commerce
-Production
-Technology
-Manufacturing
-Government
-Exploration
-Institutions
-Resourcing
-Innovation
-Education
-Creativity
-Logistics
-Energy
- +++++
ANALYTICS Lots of Detail v Relationships
Data Mining: Micro-View, Data & Detail (Limited, Contained, Constrained)
Big Data: Macro-View, Relationships (Expanding, Tending to the Infinite)
HUH? Knowns, Unknowns, Unknown Unknowns
The many problems:
-Certain and well defined challenges
-Suspected or manifest in some way, but ill defined
-To be discovered, become apparent, present problems
-Primary limitations are our ability to detect and characterise
-Secondary limitations include our inability to recognise significance
-Causality, probability and statistics conspire to conceal, confuse and trick us!
- +++++
TRUTH Is It Static Veracious?
The Earth is:
-Flat
-Static
-Spherical
-An oblate spheroid
-The centre of the universe
-The axis of the sun and stars
Plants:
-Grow out of the soil
-Have no sensory facility
-Cannot grow without light
The ability to observe, measure and model with increasing accuracy creates dynamic and more relevant truths in line with our growing knowledge & reality
In general ‘truth’ is dynamic and not a fixed entity - it mutates as we gather more information and create deeper understandings
HUMAN Limited Reasoning and Analysis!
Big Data scale and complexity:
-Render Big Data beyond human abilities alone
-See structured and relational databases falling far short
-Make crude correlation and association analysis inadequate
-Includes many disparate/hidden relationships that are confounding
-Introduces multi-dimensional visualisation/conceptualisation difficulties
-Extends analysis beyond ‘Order 5’ mathematical models/general methods
- +++++
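One way to see why crude correlation analysis falls short is a small worked case. The sketch below is an illustration added here, not part of the original slides: it computes a plain Pearson correlation coefficient (a swapped-in, textbook measure; the helper name `pearson` and the sample data are hypothetical) for a linear relationship and for a perfectly deterministic but non-linear one.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


xs = [-3, -2, -1, 0, 1, 2, 3]

# Linear relationship: the coefficient picks it up fully.
linear = [2 * x + 1 for x in xs]
r_linear = pearson(xs, linear)

# Non-linear relationship: y is completely determined by x,
# yet the symmetric shape cancels out in the covariance sum.
squared = [x * x for x in xs]
r_squared = pearson(xs, squared)

print(round(r_linear, 6), round(r_squared, 6))
```

Here the squared series depends entirely on `xs`, but the coefficient comes out at zero, so a correlation-only scan would report no relationship at all. That is the kind of hidden, non-linear link the list above refers to.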
THEORY We Do Not Have One!
Big Data really needs a Big Theory:
-Complexity confounds us
-There are no generalised solutions
-There is no suitable math framework
-To some degree we are working partially blind
-We can only use what we have already established
-Computer modelling/simulation/analogues can be used
-Hypothesis testing and experimental trials are often vital
- +++++
We Have ‘NO’ General Purpose Tools/Methodologies
SYMBIOSIS A New Partner for Mankind
Joining forces with machines appears to be the only viable future
Vital Symbiosis & Augmentation
[Diagram: MAN + AI, each rated from Low to High across Analysis, Modelling, Processing, Mathematics, Computation, History, Intuition, Creativity, Experience and Dimensionality]
PICTURES Vary with people, things +++:
-Social
-Intent
-Interest
-Mobility
-Browsing
-Expertise
-Ownership
-Connectivity
-Consumption
-Communication
- +++++
In Form & Scale
PICTURES We Might Conjure!
This will vary with people, things, organisation:
-Social
-Intent
-Interest
-Mobility
-Browsing
-Expertise
-Ownership
-Connectivity
-Consumption
-Communication
- +++++
TOOLS Generally beyond the abilities of most companies:
-Graph theory
-Hash filtering
-Causality testing
-Weighted mapping
-Trajectory projection
-N Dimensional sifting
- +++++
Key to Success Specialised
THE # Key to Analysis by Transformation
Very weak to strong relationship identification:
-Generally applicable
-Reveals subtle relationships
-Effective for small and big data
-Weeds out/reveals concealed links
- +++++
MAPPING Path Tracing Relationships
N-Dimensional relationship characterisation:
-Easily understood
-Generally applicable
-Fits human perception
-Reveals subtle relationships
-Effective for small and big data
-Weeds out/reveals concealed links
- +++++
VISUALISATION Exploit graphics & display technology
-Big
-Clear
-Static
-Colour
-Explicit
-Intuitive
-Animated
-Interactive
- +++++
In Every Dimension
ANIMATIONS Add New Dimensions Greater Understanding
Search: Hans Rosling @ TED
Gapminder
GENESIS The Journey Has Just Begun
Big Data and Complexity need generalised theories, just as physics needed thermodynamics and quantum mechanics
UNDERSTANDING Never Been so Difficult
Our single biggest challenge as a species:
-Our past was built on it
-Our future depends upon it
-We are not evolving to be any smarter
-Ultimately we are limited by our tools
-Technology is our only survival route
END GAME Wisdoms Knowledge Uncertainty
Our big conceptual challenge: moving on from a history of ‘static truths’ and mostly clear and certain answers to a world dominated by the probabilistic and uncertain, where ‘the truth’ has to be updated and rewritten
Our primary tool here: increasingly powerful instruments of observation and measurement, complemented by computer deep modelling and simulation