Strata + Hadoop World NY 2016 - Avinash Ramineni
Strata + Hadoop World | New York City | September 29th, 2016
Choice Hotels’ journey to better understand its customers
through self-service analytics
Narasimhan Sampath & Avinash Ramineni
Agenda
• Who is Choice Hotels
• Platform Architecture
• Implementation
• Value Add
Who is Choice Hotels?
Region                       Hotels open   Hotels under development   Rooms open & under dev.
United States & Caribbean    5,276         606                        446,813
Canada                       323           45                         30,135
South America                64            7                          9,737
Asia Pacific                 315           25                         23,289
Europe                       402           31                         50,388
Mexico                       28            4                          3,219
Central America              14            0                          1,468
Middle East                  1             2                          564
Project Goals
• Business Drivers
  − Self Service Reporting and Analytics
  − Requirements for near real-time analytics
  − Simplify Governance, Compliance and Auditing
  − Better support for new applications
• Technical Drivers
  − Unable to handle volume, velocity, and veracity
  − Retire Legacy Systems
  − Difficult to find skill set (Informix 4GL)
  − Simplify Technology Stack
Key Design Tenets
• Separation of Compute and Storage
  • Independently scale compute and storage
  • Data Democratization and Governance
  • Bring your own Compute (BYOC)
• Lift and Shift between cloud provider(s) and On-premise
• HA / DR
• Open Source Stack
Separation of Compute and Storage
• Scale storage and compute independently (up or down)
• Shifts bottleneck from Disk IO to Network
• Centralized Data Storage
  • Write once & read everywhere
  • Data Democratization
• Easier Hardware upgrade paths
• Flexible Architecture
[Diagram labels: Storage, Servers]
BYOC (Bring Your Own Cluster)
• Eliminates the need for very large clusters
• Easier to administer and maintain
• Reduces multi-tenancy issues
• Clusters can be upgraded independently
• Enables on-demand computing
• Lower costs
[Diagram labels: Marketing Cluster, Personalization Cluster, Main Cluster, Centralized Storage]
Platform Architecture – Data Ingestion Layer
• DB Ingestor
• Stream Ingestor
  − Kafka and Spark Streaming
• File Ingestor
• FTP / SFTP / Logs
• Ingestion using Service API
Platform Architecture – Data Processing Layer
• Storage layer carved into logical buckets
  • Landing, Raw, Derived and Delivery
  • Schema stored with data (no guesswork)
• Platform Jobs for
  • Converting text to Parquet
  • Saving streaming data as Parquet
  • Derivatives
  • Compaction
  • Standardization
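The four logical buckets can be pictured as prefixes under one shared storage root. The sketch below is only an illustration of that idea: the path layout, the `Dataset` fields, the example bucket name, and the assumption that data moves through the zones in the order listed on the slide are all hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from enum import Enum


class Zone(Enum):
    """The four logical buckets named on the slide, in the order listed."""
    LANDING = "landing"
    RAW = "raw"
    DERIVED = "derived"
    DELIVERY = "delivery"


@dataclass(frozen=True)
class Dataset:
    source: str  # hypothetical source system name, e.g. "reservations"
    name: str    # hypothetical dataset name, e.g. "bookings"


def zone_path(root: str, zone: Zone, dataset: Dataset) -> str:
    """Build the storage prefix for a dataset in a given zone (assumed layout)."""
    return f"{root.rstrip('/')}/{zone.value}/{dataset.source}/{dataset.name}"


def next_zone(zone: Zone) -> Zone:
    """Return the following zone, assuming a Landing -> Raw -> Derived -> Delivery flow."""
    order = list(Zone)
    idx = order.index(zone)
    if idx == len(order) - 1:
        raise ValueError("delivery is the final zone")
    return order[idx + 1]
```

For example, `zone_path("s3a://example-bucket", Zone.RAW, Dataset("reservations", "bookings"))` yields a prefix under `raw/`, where a text-to-Parquet platform job could write its output.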
Platform Architecture – Data Delivery Layer
• Data Delivery
  • SQL - Spark Thrift Server / Impala
  • Tableau, SQL IDE, Applications
  • SparkR
• Self Service
  • Derivatives
  • Represented Via SQL on Delivery Layer
  • Stored in Derived Storage Layer
  • Metadata driven
• Derived Layer Generators
  • Long running Spark Job
  • Derivative Refresh
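One way to read "metadata driven" derivatives is that each derivative is a small record holding its SQL and a refresh interval, and a generator decides which ones are due. The sketch below assumes exactly that; the `Derivative` fields, the `derived` schema name, and the use of `INSERT OVERWRITE TABLE` are illustrative choices, not details confirmed by the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Derivative:
    """Hypothetical metadata record describing one self-service derivative."""
    name: str                      # target table name in the derived layer
    select_sql: str                # user-supplied SELECT over delivery-layer tables
    refresh_every: timedelta       # how often the derivative should be rebuilt
    last_refreshed: Optional[datetime] = None


def render_refresh_sql(d: Derivative, schema: str = "derived") -> str:
    """Render the statement a generator job could submit to rebuild the derivative."""
    return f"INSERT OVERWRITE TABLE {schema}.{d.name} {d.select_sql}"


def due_for_refresh(derivatives: List[Derivative], now: datetime) -> List[Derivative]:
    """Pick derivatives never built yet, or whose refresh interval has elapsed."""
    return [
        d for d in derivatives
        if d.last_refreshed is None or now - d.last_refreshed >= d.refresh_every
    ]
```

A long-running job could call `due_for_refresh` on a timer and submit the rendered SQL for each result, which keeps the refresh logic independent of any single derivative.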
Implementation
• CDH Cloud readiness
  • Cloudera Director Limitations
  • Multi-Availability zone, regions
• Spark Thrift Server
  • Support
  • Performance Tuning
  • Concurrency, partition strategy
  • Cache Tables
• Security
  • Sentry Integration
  • Kerberos Ticket Renewal
  • Navigator Integration
Implementation
• Rapidly Changing Technology
  • Feature addition
  • Documentation
  • Bugs
  • Jar hell
• Compression Codec for Parquet
• S3 Eventual Consistency
• Small files
  • Performance Issues
  • Compaction
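Compaction for the small-files problem usually starts with deciding which files to merge. The sketch below shows one simple greedy plan, assuming a 128 MB target file size and a plain name-to-size listing; both are illustrative assumptions, and the talk does not say how the platform's compaction job actually groups files.

```python
from typing import Dict, List

DEFAULT_TARGET_BYTES = 128 * 1024 * 1024  # assumed target size, not from the talk


def plan_compaction(
    file_sizes: Dict[str, int],
    target_bytes: int = DEFAULT_TARGET_BYTES,
) -> List[List[str]]:
    """Group files smaller than the target into batches of at most target_bytes.

    Files at or above the target are left alone. Batches holding a single
    file are dropped, since rewriting one file gains nothing.
    """
    small = sorted(
        (name, size) for name, size in file_sizes.items() if size < target_bytes
    )
    groups: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in small:
        # Close the current batch if adding this file would exceed the target.
        if current and current_bytes + size > target_bytes:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        groups.append(current)
    return [g for g in groups if len(g) > 1]
```

Each returned group could then be read and rewritten as one larger Parquet file, after which the originals are removed.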
Implementation
• Partition Strategy
  • Parquet Files
  • Balancing parallelism and throughput
  • Table Partitions
• Cluster sizing, optimization and tuning
• Integrating with Corporate infrastructure
  • Deployment practices
  • Monitoring and Alerting
  • Information Security Policies
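The partition-strategy trade-off above can be made concrete with a little arithmetic: more files per partition give more parallel tasks, while fewer, larger files give better scan throughput. The sketch below assumes date-based, Hive-style `key=value` partition directories and a 256 MB target file size; both are hypothetical choices for illustration, not values from the talk.

```python
import math
from datetime import date

DEFAULT_TARGET_FILE_BYTES = 256 * 1024 * 1024  # assumed target, not from the talk


def partition_path(table_root: str, day: date) -> str:
    """Build a Hive-style partition directory for one day of data."""
    return (
        f"{table_root.rstrip('/')}"
        f"/year={day.year}/month={day.month:02d}/day={day.day:02d}"
    )


def output_file_count(
    partition_bytes: int,
    target_file_bytes: int = DEFAULT_TARGET_FILE_BYTES,
    min_files: int = 1,
) -> int:
    """Estimate how many files to write for a partition of the given size.

    Raising min_files favors parallelism; raising target_file_bytes
    favors throughput by producing fewer, larger files.
    """
    return max(min_files, math.ceil(partition_bytes / target_file_bytes))
```

Under these assumptions a 1 GB daily partition would be written as four files, while a 10 MB partition stays a single file rather than being split into many small ones.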
Value Add
Enabling predictive analytics and real-time decisions
Integrated Scorecards – Daily / Weekly / Monthly Insights Near Real Time / Hourly / Daily Insights
Multivariate Testing, APT (Test vs. Control Analysis), and Text Analytics
Testing for Both Hotel and Customer / Research For Guest Insights
Personalized Display Ad Serving Real-time Actions (Machine Learning) Across Guest Touch Points
Hotel Lifecycle Data Real-time Alerts for Hotel Related Actions
Clairvoyant
clairvoyantsoft.com
Background
• One of the fastest growing big data companies
• Extensive experience in providing strategic and architectural consulting on Big Data platforms and implementations
• Global delivery experience across multiple locations in US, Asia and Latin America
• 100+ big data experts worldwide - US, Latin America and Asia
Awards & Recognition
Questions
Principal @ Clairvoyant
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/avinashramineni