apache hadoop bigdata-in-banking

30
© tresata all rights reserved Apache Hadoop and the Big Data Opportunity in Banking © Hortonworks Inc. 2011 A webcast from Hortonworks & Tresata

Upload: mhepburn

Post on 22-Nov-2014

5.692 views

Category:

Technology


0 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Apache hadoop bigdata-in-banking

©  tresata      all  rights  reserved  

Apache  Hadoop  and  the    Big  Data  Opportunity  in  Banking  

©  Hortonworks  Inc.  2011  

A  webcast  from  Hortonworks  &  Tresata  

Page 2: Apache hadoop bigdata-in-banking

Presenters  

•  Arun  C.  Murthy  − Co-­‐founder  of  Hortonworks  − Lead  of  NextGen  MapReduce  in  Apache  Hadoop  − Long-­‐Jme  contributor  and  commiKer  to  Apache  Hadoop  

•  Abhishek  (Abhi)  Mehta  − Co-­‐founder  of  Tresata  − Creator  of  first  Hadoop-­‐powered  big  data  &  analyJcs  plaOorm  for  financial  industry  data  

3  

Page 3: Apache hadoop bigdata-in-banking

Agenda  

•  What is Apache Hadoop?

•  Creating Value from Big Data

•  Tresata and Hadoop for Banking

•  Future of Hadoop

4  

Page 4: Apache hadoop bigdata-in-banking

Key  A9ributes  •  Reliable  and  redundant  –  Doesn’t  slow  down  or  loose  data  even  as  hardware  fails  •  Very  powerful  –  Harnesses  huge  clusters,  supports  best  of  breed  analyJcs  •  Scalable  –  scales  linearly  to  handle  “big  data”  volumes  •  Cost-­‐effecDve  –  runs  on  commodity  machines  &  network  •  Simple  and  flexible  APIs  –  enabling  a  large  ecosystem  of  soluJon  providers  

What  is  Apache  Hadoop?  

A  set  of  open  source  projects  owned  by  the  Apache  FoundaJon  that  transforms  commodity  computers  and  network  into  a  distributed  service    

•  HDFS  –  Stores  petabytes  of  data  reliably  •  MapReduce  –  Allows  huge  distributed  

computaJons    

5  

Page 5: Apache hadoop bigdata-in-banking

Core Apache Hadoop Projects

Programming Languages

Computation

Object Storage

Zookeepe

r    (Coo

rdinaJ

on)  

Core Apache Hadoop Related Apache Projects

HDFS    (Hadoop  Distributed  File  System)  

MapReduce  (Distributed  Programing  Framework)  

Hive  (SQL)  

Pig  (Data  Flow)  

HBase  (Columnar  Storage)  

 

HCatalog  (Meta  Data)  

HMS  

(Managem

ent)  

Table Storage

6  

Page 6: Apache hadoop bigdata-in-banking

Apache  Hadoop:  Why  is  it  TransformaDonal?  

Data  Deluge  (growth  faster  than  Moore’s  law)  •  Economist:  Only  5%  of  generated  data  is  structured  

•  Gartner:    Data  growth  is  the  biggest  data  center  hardware  infrastructure  challenge  for  large  enterprises  

•  Forrester:  Four  Vs  -­‐  Volume,  Velocity,  Variety,  Variability  

•  Hundreds  of  exabytes  of  data  per  year!  

 

7  

Source:  IDC  Digital  Universe  Study,  May  2011  

Page 7: Apache hadoop bigdata-in-banking

Apache  Hadoop:  Why  is  it  TransformaDonal?  

New  way  of  thinking  about  your  data:    

•  Current    − What  data  do  I  keep?  − What  reports  do  I  run?  − Sample  –  Store  –  Extrapolate  à  What-­‐If  scenarios  à  Variable  insights    

•  New  dawn  

− Store  everything  (viable  and  economical)  –  Process  whenever!  − Test  every  what-­‐if  situaJon,  prove  every  hypothesis…  do  it  in  a  Jmely  manner!  − No  more  down-­‐sampled  data  

8  

Page 8: Apache hadoop bigdata-in-banking

5  Ways  to  Create  Value  from  Big  Data  

9  

Source:    McKinsey  &  Company  report.  Big  data:  The  next  fronJer  for  innovaJon,  compeJJon,  and  producJvity.  May  2011.  

Page 9: Apache hadoop bigdata-in-banking

Crossing  the  Chasm  

DisrupJon:  Data  

10  

Geoffrey  A.  Moore*  

Page 10: Apache hadoop bigdata-in-banking

Typical  ApplicaDons  &  Early  Adopters  

11  

advertising optimization mail anti-spam

video & audio processing ad selection

web search

user interest prediction

customer trend analysis

analyzing web logs

content optimization

data analytics

machine learning

data mining

text mining

social media

Page 11: Apache hadoop bigdata-in-banking

Big  Data  Has  Reached  Every  Market  Sector  

Source:    McKinsey  &  Company  report.  Big  data:  The  next  fronJer  for  innovaJon,  compeJJon,  and  producJvity.  May  2011.  

Digital  data  is  personal,  everywhere,  increasingly  accessible,  and  will  con:nue  to  grow  exponen:ally  

Page 12: Apache hadoop bigdata-in-banking

Big  Data  Value  CreaDon  OpportuniDes  Financial  Services  •   Detect  fraud  •   Model  and  manage  risk  •   Improve  debt  recovery  rates  •   Personalize  banking/insurance  products  

Healthcare  •   OpJmal  treatment  pathways  •   Remote  paJent  monitoring  •   PredicJve  modeling  for  new  drugs  •   Personalized  medicine  

Retail  •   In-­‐store  behavior  analysis  •   Cross  selling  •   OpJmize  pricing,  placement,  design  •   OpJmize  inventory  and  distribuJon  

Web  /  Social  /  Mobile  •   LocaJon-­‐based  markeJng  •   Social  segmentaJon  •   SenJment  analysis  •   Price  comparison  services  

Manufacturing  •   Design  to  value  •   Crowd-­‐sourcing  •   “Digital  factory”  for  lean  manufacturing  •   Improve  service  via  product  sensor  data  

Government  •   Reduce  fraud  •   Segment  populaJons,  customize  acJon  •   Support  open  data  iniJaJves  •   Automate  decision  making  

13  

Page 13: Apache hadoop bigdata-in-banking

Why  can  you  bank  on  Hortonworks?  

•  Architects  of  Hadoop  since  big  bang,  circa  2006  •  Real  world  experience  supporJng  the  largest  Hadoop  install  

in  the  world:  − 50,000  node  footprint    − Over  200PB  of  data  − 24x7  service    − Billions  of  ad  dollars  

•  We’ve  taken  the  3am  calls  to  fix  stuff  when  it  breaks!  

14  

Page 14: Apache hadoop bigdata-in-banking

15  

Tresata  =  Hadoop  for  Banking  

Page 15: Apache hadoop bigdata-in-banking

What  We  Believe  

16  

1.   Data  will  reboot  the  financial  service  industry  

2.  Data  growth  is  viral…exisJng  tools  can’t  keep  up  

3.  The  economics  to  store,  process,  analyze  and  visualize  all  of  your  data  makes  it  a  ‘no-­‐brainer’  to  do  so  

4.   Big  Data  capabiliDes  are  needed  to  address  business  problems  

Page 16: Apache hadoop bigdata-in-banking

and  What  McKinsey  Said  

17  

most players are smaller, local businesses with a limited ability to gather and analyze information.

A !nal note: this analysis is a snap- shot in time for one large country. As companies and organizations sharpen their data skills, even low-ranking sectors (by our gauges of value potential and data capture), such as construction and education, could see their fortunes change.

Q4 2011Big data sidebar on sector productivityExhibit 1 of 1

Big

dat

a: e

ase-

of-c

aptu

re in

dex1

Big data: value potential index1

The ease of capturing big data’s value, and the magnitude of its potential, vary across sectors.

Utilities

Manufacturing

Health care providersNatural resources

Example: US economy

Professional services

Construction

Administrative services

Management of companies

Computers and other electronic products

Transportation and warehousingReal estate

Wholesale trade

Finance and insurance

Information

Other services

Retail trade

Accommodation and food

Educational services

Arts and entertainment

Government

High

Low High

! "#$%&'()*+'&%',-+*.)(*#/%#0%1'($*.23%2''%)--'/&*,%*/%4.5*/2'6%7+#8)+%9/2(*(:('%0:++%$'-#$(%!"#$%&'&($)*+$,+-'$./0,'"+/$.0/$",,01&'"0,2$3045+'"'"0,2$&,%$5/0%63'"1"'73%);)*+)8+'%0$''%#0%.<)$='%#/+*/'%)(%1.>*/2'6?.#1@1=*?

A#:$.'B%CA%D:$'):%#0%E)8#$%A()(*2(*.2F%4.5*/2'6%7+#8)+%9/2(*(:('%)/)+62*2%%

Size of bubble indicates relative contribution to GDP

6

1 The big data value potential index takes into account a sector’s competitive conditions, such as market turbulence and performance variability; structural factors, such as transaction intensity and the number of potential customers and business partners; and the quantity of data available. The ease-of-capture index takes stock of the number of employees with deep analytical talent in an industry, baseline investments in IT, the accessibility of data sources, and the degree to which managers make data-driven decisions.

Source:    mcksinsey  global  insJtute  october  2011    

Page 17: Apache hadoop bigdata-in-banking

The  Opportunity  

18  

1.   1-­‐5%  of  data  in  a  financial  insJtuJon  is  analyzed  

2.   Not  all  data  is  stored  

3.   Top-­‐down  macros  cannot  be  implemented  or  acted  on  

4.   New  approaches  to  data  &  analyDcs  are  essenJal  

Source:    tresata  research  

Page 18: Apache hadoop bigdata-in-banking

The  Business  Problems  

19  

1.   Storage  &  retrieval  –  archive  data  on  disk  

2.   Consumer  behavior  analysis  –  as  it  happens  

3.   Modeling  &  AnalyDcs  –  model  enJre  data  sets…  

4.   Single  View  Of  Customer  –  structured  &  unstructured  data    

Page 19: Apache hadoop bigdata-in-banking

Hadoop  in  Banking  Today  

1.   Early  adopter,  like  any  other  technology  trend  a.  Interest  is  global,  not  just  limited  to  the  US  b.   Approved  for  use  by  most  technology  teams  &  architects    c.   Proof  of  concepts  ongoing  at  most  financial  services  insJtuJons  

2.   Broad  agreement  that:  a.   Hadoop  will  be  the  Big  Data  operaDng  system  b.  A  hadoop  powered  Data  processing  &  analyDcs  plaaorm  will  spur  

rapid  adopJon  c.  Need  to  apply  to  business  problems  with  revenue/profit  impact  

20  

Page 20: Apache hadoop bigdata-in-banking

Elephants  in  the  Room  

1.   Resources  a.   Talent  –  how  many  MapReduce  developers  can  we  get  b.   ApplicaDons  –  data,  analyJcs  and  business  problems  are  the  same,  

why  should  each  insJtuJon  build  their  own  hadoop  applicaJons  to  run  the  same  processes  

c.   EssenDals  –  how  to  manage  security,  provisioning,  &  performance    

2.   ImplementaDon  a.   IntegraDon  –  fit  with  exisJng  technology  infrastructures  &  business  

processes  b.   Support  –  experience  with  managing  thousands  of  nodes/  servers  c.   Open  Source  –  ‘out  of  the  box’  applicability  

 21  

Page 21: Apache hadoop bigdata-in-banking

How  Tresata  Changes  the  Game…  

22  

1.   Our  ApplicaDon  a.   FS  for  Hadoop  –  fully  built  on  hadoop.    Store,  process,  analyze  and  

visualize    leveraging  the  full  power  of  hadoop  b.   Processing  Pipeline  –  automated  ingesJon,  cleaning,  de-­‐duping,  

matching  engine  built  for  financial  data  c.   AnalyDcs  –  massively  parallel  scoring  and  algorithm  containers  

codified  by  business  problem  

2.   Tame  the  elephants  (making  it  work  at  scale)  

 

Page 22: Apache hadoop bigdata-in-banking

Tresata  Case  Study  

23  

A.  Client  Business  Problem  i.  Problem  –  Process  data  and  score  for  >30  MM  client  applicaJons  ii.  Data  Sources  –  23  separate  data  sources,  mulDple  Dme  series  iii.  Raw  Variables  –  100  variables  per  client  per  data  source  iv.  Current  state  -­‐  expensive  legacy  plaOorm,  algorithms  developed  on  

sub-­‐samples,  unable  to  scale  algorithms  to  full  data  set  

B.   Tresata  SoluDon  i.  Data  Engine  –  automated  data  import,  cleaning,  matching,  scoring  ii.  Compute  Engine  –  algorithms  process  &  score  >30MM  in  minutes  iii.  IntegraJon  –  work  with  exisJng  tools  and  processes  iv.  Scalable  deployment  –  Big  Data  as  a  Service  delivery  

 

Page 23: Apache hadoop bigdata-in-banking

Tresata  Big  Data  ApplicaDon  

24  

Page 24: Apache hadoop bigdata-in-banking

Tresata  AnalyDcs  

25  

Page 25: Apache hadoop bigdata-in-banking

Tresata  +  Hortonworks  

26  

1.   Commitment  to  train  a.  Series  of  webinars  b.  Hadoop  training  tailored  for  financial  services  c.  Dedicated  training  programs  (tailored  for  client  needs)  

2.   Commitment  to  support  a.  ProducJon  scale  support  model  (Tresata  cerJfied  for  Financial  Svcs)  b.  Meet  and/or  exceed  industry  standards  on  distro    

Page 26: Apache hadoop bigdata-in-banking

©  tresata      all  rights  reserved  

Where  Hadoop  is  Going  

©  Hortonworks  Inc.  2011  

Page 27: Apache hadoop bigdata-in-banking

The  Future  

•  Hadoop  is  going  mainstream  − Amazon,  Microsou,  Oracle,  EMC,  NetApp,  etc.  

•  Apache  Hadoop  is  covering  new  ground  (Hadoop  .Next)  − Much  of  the  development  led  by  Hortonworks  − More  scale,  more  performance  − Other  paradigms  than  MapReduce  for  data  processing  − Enhanced  operability  and  management  (Ambari)  − Metadata  management  (Hcatalog)  

28  

Page 28: Apache hadoop bigdata-in-banking

Technology  Roadmap  

 

Hadoop.Now  –  Making  Apache  Hadoop  Accessible  •  Release  the  most  widely  deployed  version  of  Hadoop  ever  (0.20.205)  •  Release  directly  usable  code  via  Apache  (RPMs,  .debs…)  •  Frequent  sustaining  releases  off  of  the  stable  branches  

 

Q4  2011    

 

Hadoop.Next  –  Next  GeneraDon  Apache  Hadoop  •  Address  key  product  gaps  (HBase  support,  HA,  Management…)  •  Enable  community  &  partner  innovaJon  via  modular  architecture  &  

open  APIs  •  Work  with  community  to  define  integrated  stack  

 

2012  (Alphas  starJng    late  2011)  

29  

Page 29: Apache hadoop bigdata-in-banking

Next  Steps  

•  Engage  with  Hortonworks  &  Tresata  − www.hortonworks.com  − www.tresata.com  

•  AddiJonal  Webcast  Series  − Reference  Architecture  for  Hadoop  in  Financial  Services  -­‐  Nov  15  − Hadoop  in  Financial  Services  DEEP  DIVE  -­‐  Dec  6  − www.hortonworks.com/webcasts  

•  Hortonworks  party  @  Hadoop  World  − Tuesday  Nov  8  at  7pm  − Inc  Lounge  at  the  Time  Hotel  − RSVP:  hortonworks.eventbrite.com  

30  

Page 30: Apache hadoop bigdata-in-banking

©  tresata      all  rights  reserved  

QuesDons?    

©  Hortonworks  Inc.  2011