EVALUATING AND DEPLOYING SQL-ON-HADOOP TOOLS

Bay Area Big Data Meetup, March 24, 2016
Shreyas Subramanya (@shreyas_subra)
files.meetup.com/5717572/Meetup-BlueData-3.24.16.pdf

Big Data / Hadoop Ecosystem

Hadoop / Spark distribution vendors:
• Hortonworks
• Cloudera
• IBM
• MapR
• Pivotal
• Databricks (Spark)

Other parts of the ecosystem:
• Other Apache projects
• SQL solutions
• Third-party apps

Evaluating SQL-on-Hadoop Tools

1. Differentiators
2. Relevant features
3. Performance
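For the performance criterion, a comparison is only meaningful if every tool runs the same queries, on the same data, under the same conditions. The following is a minimal, tool-agnostic timing sketch in Python. The `run` argument is a hypothetical zero-argument callable that submits one query to the engine under test and waits for the complete result; the warm-up and repeat counts are arbitrary defaults, not recommendations from the talk.

```python
import statistics
import time
from typing import Callable, Dict


def time_query(run: Callable[[], object], warmup: int = 1, repeats: int = 5) -> Dict[str, float]:
    """Time a zero-argument callable that executes one query; results are in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    # Untimed warm-up runs keep first-run effects (cold caches, JIT) out of the samples.
    for _ in range(warmup):
        run()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

Reporting the median of several runs, rather than a single measurement, dampens run-to-run noise when comparing tools.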

Trying Out Different Tools

Typical approaches and options:
1. Download virtual machines
2. Manually install on physical machines (or on EC2)
3. Use cloud-based trial versions from vendors

Each Option Has its Own Challenges

 

Typical approaches and options:
• Download virtual machines
• Manually install on physical machines (or on EC2)
• Use cloud-based trial versions from vendors

Challenges and pain points:
• You need a fairly beefy laptop to try multiple products
• VM images can be huge
• Manual installation can be tricky because of dependency management and a lack of hardware
• Reusability and portability (moving to production)
• Security, cost, and scaling

A DevOps Model for Agility & Scale

Single Container → Multi-Node Cluster → Integrated Environment → Performance & Scale

Docker Containers

• Each Docker image packages a complete runtime environment, including your software, libraries, and other tools
• Docker containers run as separate processes in user space on the host, sharing the host's kernel
• Eliminates environment inconsistencies
• Easy distribution; shortens the time from development to deployment

[Figure: Virtual machines vs. containers]
Source: https://www.docker.com/what-docker

Single Docker Container

• Install Docker (Docker Toolbox on a Mac; Docker Engine on Linux)
• Download a Docker image from Docker Hub
• Or build your own Docker image from a short Dockerfile
• Run it!
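The pull-and-run steps above can be scripted. Below is a minimal Python sketch, assuming the Docker CLI is installed and on `PATH`. The image name `apache/drill` is an assumption (substitute whichever SQL-on-Hadoop image you pulled or built), and the container name `drill-embedded` is hypothetical; 8047 is the port on which Drill's web UI listens by default. By default the script only prints the commands; pass `dry_run=False` to execute them.

```python
import shlex
import subprocess
from typing import List, Optional

# Assumption: an Apache Drill image on Docker Hub; substitute any image you pulled or built.
DEFAULT_IMAGE = "apache/drill"
DRILL_WEB_PORT = 8047  # Drill's default web UI / REST port


def docker_run_args(
    image: str = DEFAULT_IMAGE,
    name: str = "drill-embedded",  # hypothetical container name
    host_port: int = DRILL_WEB_PORT,
    container_port: int = DRILL_WEB_PORT,
    command: Optional[List[str]] = None,
) -> List[str]:
    """Build a `docker run` argument list for an interactive, throwaway container."""
    args = [
        "docker", "run",
        "-it",   # keep STDIN open and allocate a TTY (for an interactive shell)
        "--rm",  # delete the container when it exits
        "--name", name,
        "-p", f"{host_port}:{container_port}",  # publish the container port on the host
        image,
    ]
    if command:
        args.extend(command)  # optional command that overrides the image's default
    return args


def run(args: List[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(args))
    if not dry_run:
        subprocess.run(args, check=True)


if __name__ == "__main__":
    run(["docker", "pull", DEFAULT_IMAGE])
    run(docker_run_args())
```

Keeping the command as an argument list rather than a single shell string avoids quoting problems. To try a different tool, change `image` and the port mapping.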

Example: Drill Embedded (Demo)
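Once an embedded Drill container is running with port 8047 published, you can also send queries from the host through Drill's REST API: a POST to `/query.json` whose JSON body carries `queryType` and `query`. Below is a minimal sketch that uses only the Python standard library. The host, port, and timeout are assumptions, and the sketch assumes the response lists result rows under a `rows` key.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumption: the Drill container publishes its web/REST port 8047 on localhost.
DRILL_QUERY_URL = "http://localhost:8047/query.json"


def build_query_request(sql: str, url: str = DRILL_QUERY_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one SQL statement."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql: str, url: str = DRILL_QUERY_URL, timeout: float = 30.0) -> List[Dict[str, Any]]:
    """Send the query and return the result rows (assumed to be under "rows")."""
    request = build_query_request(sql, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload.get("rows", [])
```

With the container up, you can pass run_query a statement against the employee.json sample file that Drill bundles on its classpath, for example SELECT full_name FROM cp.`employee.json` LIMIT 3.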

Multi-node, multi-user systems

• Clustering and orchestration
• Resource and container management
• Networking
• Storage
• Application management (versioning, upgrades)
• Templates (infrastructure as code)

Clustering frameworks

• Puppet, Chef, Ansible (orchestration)
• Docker Swarm
• Kubernetes
• Mesos
• Amazon ECS (CloudFormation)
• BlueData

Simplifying Big Data Deployment

BlueData EPIC Software Platform:
• IOBoost™: extreme performance and scalability
• ElasticPlane™: self-service, multi-tenant clusters
• DataTap™: in-place access to enterprise data stores

[Figure: EPIC architecture. Labels: business units (Marketing, R&D, Sales, Manufacturing, Support); BI/Analytics tools; data stores (NFS, Gluster, Object Store, Remote HDFS, Ceph, Local HDFS)]

Deploying SQL-on-Hadoop Tools

• Amazon-like environment, on-premises
• Many big data applications available out of the box
• Bring your own apps

Single Physical Node (Demo)

Running Different SQL Tools (Demo)

• Spark SQL
• Drill
• Impala

Automated Orchestration in BlueData

• Docker image
• Deployment specification (metadata)
• Glue scripts
• Registration
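The following hypothetical Python sketch makes the deployment-specification ingredient concrete: it shows what such metadata might hold and how it could be checked before registration. The field names (`app_id`, `roles`, `services`, `setup_script`, `ports`) and the validation rules are illustrative assumptions, not BlueData's actual metadata format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List

# Hypothetical schema for illustration only; not BlueData's real metadata format.


@dataclass
class NodeRole:
    name: str            # e.g. "controller" or "worker" (hypothetical role names)
    count: int           # number of containers to start for this role
    services: List[str]  # services the role's glue script is expected to start
    setup_script: str    # hypothetical path to the per-role glue script


@dataclass
class AppSpec:
    app_id: str
    image: str           # Docker image the cluster's containers are started from
    version: str
    roles: List[NodeRole] = field(default_factory=list)
    ports: Dict[str, int] = field(default_factory=dict)  # service name -> port

    def validate(self) -> None:
        """Reject specs that could not be turned into a working cluster."""
        if not self.roles:
            raise ValueError("spec needs at least one role")
        names = [r.name for r in self.roles]
        if len(names) != len(set(names)):
            raise ValueError("role names must be unique")
        for r in self.roles:
            if r.count < 1:
                raise ValueError(f"role {r.name!r} must have count >= 1")
            unknown = [s for s in r.services if s not in self.ports]
            if unknown:
                raise ValueError(f"role {r.name!r} uses services without ports: {unknown}")

    def to_json(self) -> str:
        """Validate, then serialize (e.g. as the payload submitted at registration)."""
        self.validate()
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Validating before serializing means that a malformed spec fails at registration time rather than when the cluster is spun up.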

Authoring in BlueData

Key Takeaways and Next Steps

• Evaluating SQL-on-Hadoop tools can be challenging
• Docker containers can help simplify deployment
• BlueData enables a DevOps model for Big Data apps

✓ Spin up instant clusters using Docker images
✓ Evaluate multiple Big Data tools and frameworks
✓ Multi-tenant deployment, from dev/test to prod
✓ Enterprise-grade security, scalability, and performance

THANK YOU

www.bluedata.com
Try BlueData EPIC Lite for free: bluedata.com/free