Shaping the Future of Travel with MongoDB
© 2014 Amadeus IT Group SA
Real-Time Analytics with MongoDB
Attila Tozser, Amadeus Data Processing GmbH, 12.11.2014.
Munich MongoDB Days
Agenda
1. Amadeus IT Group
2. DevOps considerations
3. Real-time analytics
4. Deployment customization
1. Amadeus IT Group and Amadeus Global Operations
Amadeus, the leading technology provider for the travel industry
_ Transaction-based business model: volume driven, highly resilient and profitable
_ Key global player in the c. €60bn growing travel and technology market
_ Two highly synergistic and profitable businesses
• Distribution: provision of indirect distribution services
• IT Solutions: provision of IT solutions to travel providers
_ Loyal customer base: long-term contracts and over 90% recurring revenues
_ Strong cash flow generation profile
_ Strong barriers to entry
_ Travel providers: airlines, hotels, railway operators, car rental, tour operators, cruise and ferries, insurance companies
_ Travel agencies: online and offline travel agencies
_ Travel buyers: consumers/general public, corporate travel departments
1. Global leader in Distribution, having steadily gained market share with travel agencies …
_ Leading GDS globally
_ Well positioned in fast growing emerging markets
[Figure: Estimated air market share gain (2000-2013) for Amadeus, Travelport and Sabre. Bar values: 26%, 40%, 40%, 26%, 30%, 29%. Changes: -1 pp, -14 pp, +14 pp. Vertical axis from 22% to 44%.]
Source: Numbers of travel agency air bookings according to Company estimates. Excludes air bookings made through in-house or single-country operators, primarily in China, Japan, South Korea and Russia. Where competitors have merged in the past, combined totals are shown pre-merger. A 4th competitor with a market share of c. 5% is not shown.
Operations excellence
_ 1.9+ billion transactions per day (peak)
_ 30+ petabytes* storage
_ 500+ application software loads per month
_ 6,000+ IT infrastructure changes per month
_ <0.5 sec average systems response time
_ 3.9+ million net bookings per day (peak)
_ 31+ billion SQL executions per day
_ 30,000+ transactions per second (peak)
_ 12,000+ physical IT infrastructure devices
* Kilo -> Mega -> Giga -> Tera -> Peta = 10^15 = 1,000,000,000,000,000
Data Centre infrastructure
_ Exterior walls and roof are 1 m thick steel-reinforced concrete with no windows
_ External doors are 50 cm thick solid steel
_ On-site security staffed 24/7, with video cameras inside and outside
_ External power supply from two different substations
_ On-site well for cooling water supply
_ Separation by function
_ Underground diesel supply for days of operation
High Availability Systems
_ Highly resilient architecture
_ Redundancy in all critical areas: minimum (n+1) concept
_ No single point of failure
Global operations
_ 810+ employees working in Global Operations
_ 580+ employees in the data center in Erding
_ 50+ nationalities
2. DevOps considerations
Operational Requirements
Operational Ready Service
_ Security:
• Centralized access control
• Encryption
• Auditing
_ Operability:
• Monitoring and alerting, backup, restore, refresh
• Automated interaction possibilities
• Easy integration with the tooling already in place
• Online changes without downtime
• No SPOF
_ Deployability:
• Easy planning
• Automated fast deployment
_ Capacity management:
• Easy scaling
• Optimal and predictable performance
MongoDB Security
_ Access control
• Centralized authentication
• Authorization is MongoDB internal
_ Encryption
• Data encryption at rest only with 3rd-party tools
• Channel encryption out of the box (with Enterprise)
_ Audit
• Built-in auditing meets the compliance needs
MongoDB Operability
_ MMS backup API:
• Snapshots
• Backups
• Restores
_ Operational tooling integration with mcollective:
• Stop
• Start
• Stepdown
• Compact
_ Monitoring integration:
• MMS API
_ Service continuity:
• Rolling upgrades
• Rolling HW/SW maintenance
_ No SPOF:
• MMS
• Automation tools
• Operational tools
• HW, SW
Capacity planning
_ Deployability:
1. Translate the requirements to actual HW sizing
2. Puppet does the magic (from the point we got the HW)
3. Automatic connection to operational tools
_ Capacity management:
• Initial sizing
• Deployment through automation
• KPIs:
  • OS metrics: IO, NW, CPU
  • MongoDB metrics
• Scaling:
  • Linear behaviour
  • Predictable HW requirements
[Diagram: capacity planning cycle with the stages Plan, Provision, Measure, Scale]
1. Detailed analysis / planning
2. Automated deployment
3. Operational ownership handover
Plan
_ Deployment flow considerations:
1. Configuring hundreds of processes without automation is not economical
   1. It is easy to start as a standalone for experimenting
   2. It is hard to maintain big setups in a reasonable manner
2. The available automation tools (the Puppet Forge module, for example) do not support the most valuable features:
   1. SSL setup
   2. Kerberos setup
   3. MMS Backup setup
   4. MMS Monitoring setup
3. Connecting other enterprise systems without APIs is hard -> use MMS
3. Real-time analytics with MongoDB
Real-Time Analytics
Behaviour:
_ Ad hoc queries
_ Few and heavy queries (CPU intensive)
_ Low or limited response time
_ Complex computation
Implications:
_ No indexing
_ Scanning the data
_ Efficient scanning and concurrency handling capabilities needed
Does not sound like a perfect fit for the MongoDB recommendations
Recommendations:
1. Use indexes: we do not know what to index (ad hoc queries).
2. Use simple documents: we need complex documents to store everything in one collection, which supports the analytical approach (aggregation works on one collection).
3. Run 1 mongod per server: there is no concurrency for a few big queries; they starve while the HW is idling.
Advantages of MongoDB
_ Fits the purpose
_ Ease of development
• Dynamic schema
• Rich document structure
• Rich query language
_ Ease of operations
• No need for special HW
  • No shared storage
  • Runs on standard x86
• Good security features
  • SSL channel encryption
  • Kerberos, LDAP auth
• Configurable clustering features
  • Replication
  • Sharding
  • Linear scalability
_ Aggregation framework
• Provides the analytical functionality
• Improvements in 2.6:
  • Cursor
  • $out
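As an illustration of the aggregation framework items above, here is a minimal sketch of a pipeline that scans one collection, groups the documents and stores the result with `$out`. The field and collection names (`price`, `origin`, `bookings_by_origin`) are invented for this example and do not come from the talk; `run_pipeline` assumes a PyMongo collection object and a running MongoDB.

```python
# Hypothetical example: all field and collection names are assumptions.
def build_pipeline(min_price, out_collection):
    """Return an aggregation pipeline: filter, group, then store with $out."""
    return [
        {"$match": {"price": {"$gte": min_price}}},
        {"$group": {
            "_id": "$origin",
            "bookings": {"$sum": 1},
            "avg_price": {"$avg": "$price"},
        }},
        {"$out": out_collection},
    ]


def run_pipeline(collection, pipeline):
    """Run the pipeline on a PyMongo collection (needs a running MongoDB)."""
    # allowDiskUse lets large stages spill to disk (available since 2.6).
    return collection.aggregate(pipeline, allowDiskUse=True)


pipeline = build_pipeline(100, "bookings_by_origin")
```

Because there is no index to rely on for ad hoc queries, the `$match` stage in such a pipeline ends up as a full scan, which is the case the rest of the talk optimizes.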
Linear scalability
_ 100% in memory (6 physical nodes)
• Dark blue: with soft page faults
[Figure: Full scan. Time (s), 0 to 12, versus data size (GB), 0 to 1200.]
[Figure: Full scan with aggregation, computation-heavy workload. Time (s), 0 to 450, versus data size (GB), 0 to 1200.]
4. MongoDB Deployment customization
Microsharding
1 query = 1 thread on 1 mongod, but 1 query = 24 threads on 24 mongod
_ MongoDB query routing
• Each query is represented by one thread on the database side
_ Modern HW architecture
• The clock speed does not increase that fast
• Many cores: 8-16
• Many sockets: 2-4
_ What we need is efficiency:
• We have 2 concurrent queries
• And 4 sockets with 12 cores each
• We have two choices:
  • Run one mongod and use only 2 of the 48 cores
  • Run 24 mongod and use all cores of all CPUs
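The arithmetic behind the two choices can be sketched as follows. This is a simplified model, assuming one thread per query per mongod, an even split of the data across the mongod processes, and no overhead for merging the partial results; the function names are made up for this example.

```python
def cores_in_use(concurrent_queries, mongod_per_server, total_cores):
    """Cores kept busy when each query runs one thread per mongod."""
    return min(concurrent_queries * mongod_per_server, total_cores)


def ideal_scan_time(single_thread_seconds, mongod_per_server):
    """Idealized scan time with an even data split and linear scaling."""
    return single_thread_seconds / mongod_per_server


total_cores = 4 * 12                              # 4 sockets with 12 cores each
one_mongod = cores_in_use(2, 1, total_cores)      # 2 of 48 cores busy
microsharded = cores_in_use(2, 24, total_cores)   # all 48 cores busy
print(one_mongod, microsharded)
```

With 2 concurrent queries, 24 mongod processes give 2 x 24 = 48 threads, which is exactly one thread per core on this hardware.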
Results:
_ Impact of microsharding (6 physical nodes, 1 TB data, 100% in memory)
[Figure: Full scan. Time, 0 to 400, versus number of shards, 0 to 60.]
[Figure: Full scan with aggregation. Time, 0 to 1800, versus number of shards, 0 to 60.]
Alignment: primaries (with two replicas)
_ Span the shards across the available nodes in a way that distributes the primaries and secondaries equally (use priorities to ensure the distribution). Put primaries on each node.
[Diagram: Shard1 to Shard5 spread across Node1 to Node5]
Alignment: secondaries
_ Primaries collocated on a given node should replicate to different other nodes to support smooth failover (use priorities to ensure the distribution).
[Diagram: Shard1 to Shard5 spread across Node1 to Node5]
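One possible way to compute such a layout is sketched below. It is not the placement tool used in the talk: the round-robin rule, the offset scheme and the priority values (2 for the preferred primary, 1 for the secondaries) are assumptions chosen for illustration, with three members per shard.

```python
from collections import Counter


def place_shards(num_shards, num_nodes):
    """Assign one primary and two secondaries per shard to distinct nodes."""
    if num_nodes < 3:
        raise ValueError("need at least 3 nodes for 3 replica set members")
    layout = {}
    for shard in range(num_shards):
        primary = shard % num_nodes   # round-robin primaries
        k = shard // num_nodes        # index among primaries sharing a node
        # Offsets are in 1..num_nodes-1, so a secondary never lands on the
        # primary's node; consecutive k values rotate to other nodes.
        first = 1 + (2 * k) % (num_nodes - 1)
        second = 1 + (2 * k + 1) % (num_nodes - 1)
        layout[f"shard{shard + 1}"] = [
            {"node": primary, "priority": 2},  # preferred primary
            {"node": (primary + first) % num_nodes, "priority": 1},
            {"node": (primary + second) % num_nodes, "priority": 1},
        ]
    return layout


layout = place_shards(num_shards=10, num_nodes=5)
primaries_per_node = Counter(members[0]["node"] for members in layout.values())
```

With 10 shards on 5 nodes, every node hosts two primaries, and the two primaries that share a node replicate to disjoint pairs of other nodes, so a node failure spreads the failover load instead of moving it to a single neighbour.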
NUMA settings
NUMA: Non-uniform memory access
_ numactl --hardware
_ Only a portion of the memory is local to a given core
_ Remote access is slower
• 20-30% higher latency -> lower throughput
_ The recommendation is to interleave
• Tailored for 1 mongod
• Provides a consistent speed of memory access
• This speed is not optimal
_ Pin the processes to given NUMA nodes (CPU, memory)
• Restrict the CPU access of secondary processes
• Tip: the NUMA settings are handled by the init scripts that come with the default packages, so they are easy to change there
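The pinning idea can be sketched by generating one `numactl` command line per mongod process. The talk does not show the exact commands, so this is only an assumed form: the config file paths are hypothetical, and the round-robin assignment of processes to NUMA nodes is a choice made for this example.

```python
def numa_pinned_commands(config_paths, numa_nodes):
    """Build one numactl-wrapped mongod command per config file."""
    commands = []
    for index, config in enumerate(config_paths):
        node = index % numa_nodes  # spread the processes over the NUMA nodes
        commands.append([
            "numactl",
            f"--cpunodebind={node}",  # run only on this node's CPUs
            f"--membind={node}",      # allocate only from this node's memory
            "mongod", "--config", config,
        ])
    return commands


# Hypothetical config file names for four mongod processes on two NUMA nodes.
configs = [f"/etc/mongod-shard{i}.conf" for i in range(1, 5)]
for command in numa_pinned_commands(configs, numa_nodes=2):
    print(" ".join(command))
```

For a single mongod per server, the generally recommended form is `numactl --interleave=all` instead, which trades peak speed for a consistent memory access time.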
Kernel tuning
The two types of processes: CPU-bound, IO-bound
_ Even if there is IO contention from time to time, mongod is a CPU-bound type of process
• Lots of small IO operations
• No data locality = no streaming IO
• Latency matters in the end
• Mostly random access
_ Known optimizations:
• Small readahead
• THP should be off
_ By default, Linux systems are prepared for the IO-bound case
• Configure the task scheduler: set the values 8-10 times bigger than the default
  • kernel.sched_min_granularity_ns
  • kernel.sched_migration_cost
• Use /proc/sched_debug to find the optimal settings
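A small helper can turn the "8-10 times bigger than the default" rule into sysctl lines. This is a sketch: the current values below are placeholders, not measured defaults, and the parameter names may differ between kernel versions, so both should be checked on the target system.

```python
def scaled_scheduler_settings(current_values, factor=10):
    """Return sysctl lines with each scheduler value multiplied by factor."""
    return [
        f"{name} = {value * factor}"
        for name, value in sorted(current_values.items())
    ]


# Placeholder values only: read the real defaults from the target system
# (for example with `sysctl -n <name>`) before scaling them.
current = {
    "kernel.sched_min_granularity_ns": 1_000_000,
    "kernel.sched_migration_cost": 500_000,
}
for line in scaled_scheduler_settings(current, factor=10):
    print(line)
```

The generated lines use the `name = value` form accepted in `/etc/sysctl.conf`.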
Cgroups
Control groups help when data does not fit into memory
_ MongoDB uses mmap to cache data in memory
• There is no good way to influence the caching
• The LRU eviction works as a FIFO queue in this case
_ Example:
• 1. We have 200 GB data and 100 GB memory
• Or
• 2. 200 GB data and 1 GB memory
• The scanning speed is the same
_ With cgroups the first case could be 40-50% faster.
[Diagram: 50% memory, 2 subsequent queries, 100% paged in and out. Three stages with Query 1 at 100% progress and Query 2 at 0%, 20% and 70% progress, each showing the in-cache region and the paging gap.]
Cgroups
Control groups help when data does not fit into memory
_ Using many shards instead of one divides the work into smaller chunks
_ Define a high-memory and a low-memory cgroup and assign the shards to them
• High-memory cgroup: all served from memory
• Low-memory cgroup: all served from disk
_ 40% served from memory, 60% from disk
_ The analogy can be applied to many tiers
• Memory -> SSD -> spinning disk
[Diagram: Query 1 at 100% progress on five shards, two of them in cache]
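The effect described on the two cgroups slides can be reproduced with a toy LRU simulation. This is a simplified model and not a cgroup configuration: one "page" stands for 1 GB, the scans are strictly sequential, and the 80/20 memory split between the two groups is an assumption chosen to match the 40% figure.

```python
from collections import OrderedDict


def lru_hits(accesses, capacity):
    """Count cache hits for an LRU cache holding `capacity` pages."""
    cache = OrderedDict()
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        elif capacity > 0:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[page] = True
    return hits


data_pages, memory_pages = 200, 100   # 200 GB data, 100 GB memory
one_scan = list(range(data_pages))
two_scans = one_scan * 2              # two subsequent full scans

# One shared cache: the second scan never finds its page in memory.
shared_hits = lru_hits(two_scans, memory_pages)

# Two groups: 80 pages get 80 pages of memory, the other 120 share 20.
high = [p for p in two_scans if p < 80]
low = [p for p in two_scans if p >= 80]
split_hits = lru_hits(high, 80) + lru_hits(low, 20)

print(shared_hits, split_hits)
```

In the shared case every page is evicted just before it is needed again, so 100 GB of memory helps no more than 1 GB. With the split, the second scan finds 80 of its 200 pages in memory, which is the 40% served from memory.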
ZRAM, ZSWAP: compressed in-memory swap
_ MongoDB stores the data in BSON structure
• The BSON format is well compressible
• MongoDB does not (currently) support compression at the storage level
• The data is directly mapped to memory (mmap)
_ Let us take the previous example: 200 GB data, 100 GB memory
• Assuming that the data can be compressed to 33%, we will serve everything from memory
_ Needs fine-tuning of swappiness (can go along with cgroups-based tiering)
• High-memory cgroup: all served from memory
• Low-memory cgroup: all served from memory/ZRAM
[Diagram: Query 1 at 100% progress on five shards, two in memory and three in ZRAM]
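The sizing for this example can be worked out with a few lines of arithmetic. The sketch assumes a uniform compression ratio of 33% and ignores the memory needed by the OS and by the compression metadata; the function name is made up for this example.

```python
def max_uncompressed_gb(data_gb, memory_gb, compression_ratio):
    """Largest part of the data that can stay uncompressed in RAM while the
    rest sits compressed in RAM (compressed size = ratio * original size)."""
    if data_gb * compression_ratio > memory_gb:
        raise ValueError("data does not fit in memory even fully compressed")
    if data_gb <= memory_gb:
        return data_gb
    # Solve x + (data_gb - x) * ratio = memory_gb for x.
    return (memory_gb - data_gb * compression_ratio) / (1 - compression_ratio)


fully_compressed_gb = 200 * 0.33               # 66 GB, fits into 100 GB
hot_gb = max_uncompressed_gb(200, 100, 0.33)   # about 50.7 GB uncompressed
print(round(fully_compressed_gb, 1), round(hot_gb, 1))
```

Under these assumptions roughly a quarter of the 200 GB can stay uncompressed in the high-memory cgroup, and the remaining data still fits into the rest of the memory once compressed.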
The resolution is a highly customized environment
_ "Microsharding" (having two threads instead of one halves the response time)
_ Let us have as many shards as cores (as many threads per query)
What else brings the most value? (these can bring 50% more)
_ NUMA
_ Kernel tuning
_ Alignment
_ Cgroups
Conclusions
MongoDB is a good fit for real-time analytics
_ MongoDB is compliant with the enterprise requirements
_ MongoDB has sufficient querying capabilities to run analytical workloads
_ Some customization is required compared to the general recommendations
• Cluster alignment
• OS configuration
_ The HW utilization can be efficient on any type of HW
_ The support is aware of the possible optimization steps
_ Upcoming features can bring a big boost for this type of usage:
• Pluggable storage
• Index intersection for many indices
Thank you
You can follow us on:
AmadeusITGroup
amadeus.com/blog
amadeus.com