Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013
DESCRIPTION
MACPAC is a federal legislative branch agency tasked with reviewing state and federal Medicaid and Children's Health Insurance Program (CHIP) access and payment policies and making recommendations to Congress. By March 15 and again by June 15 each year, the agency produces a comprehensive report for Congress that compiles results from Medicaid and CHIP data sources for the 50 states and territories. The CIO of MACPAC wanted a secure, cost-effective, high-performance platform that met the agency's need to crunch this large volume of health data. In this session, learn how MACPAC and 8KMiles helped set up the agency’s Big Data/HPC analytics platform on AWS using SAS analytics software.
TRANSCRIPT
![Page 1: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/1.jpg)
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Empowering Congress with Data-Driven Analytics
Mathew Chase and Sri Vasireddy
November 13, 2013
![Page 2: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/2.jpg)
• A small federal legislative branch agency
• Newly established in late 2010
• Going beyond the “Cloud First” goal to “Cloud Only”
![Page 3: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/3.jpg)
Hello
• Mathew Chase
• Federal CIO
• Over 20 years' experience in the public and private sectors leading technology operations
![Page 4: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/4.jpg)
Who are you?
• Government
• Health care industry
• Cloud newbies
• AWS ninjas
• Whoops… wrong session
![Page 5: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/5.jpg)
Question?
How many of you are using AWS as your primary computing datacenter?
![Page 6: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/6.jpg)
MACPAC’s AWS Datacenter
• AWS to replace an onsite or hosted datacenter
• Single primary region with cold recovery on the other coast
• Multiple AZs for redundancy
• Separate VPCs for security “air gaps”
![Page 7: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/7.jpg)
MACPAC: the “perfect” cloud customer
• Predictable work cycles
• Two intense work periods (annual)
• Growing with an undefined future
• Potential need for more computing resources
• Very cost conscious
• No legacy infrastructure
![Page 8: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/8.jpg)
What we achieved in the cloud
• > 40% reduction in capital expenses
  – With additional savings in rent, utilities, and labor
• Cost spread over typical equipment lifespan
• On-demand storage and archiving
• Zero over-provisioning
• Ability to expand and contract resources at will
![Page 9: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/9.jpg)
Core focus
Recommendations to Congress on Medicaid and the Children’s Health Insurance Program
![Page 10: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/10.jpg)
Reports to the Congress
Reports due by:
• March 15th
• June 15th
www.MACPAC.gov/reports
![Page 11: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/11.jpg)
Research backed by analytics
• Analyze Medicaid program data
• Find intersections with Medicare
• Evaluate Medicaid survey information
![Page 12: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/12.jpg)
Tools
• SAS Office Analytics enterprise platform
• Red Hat Enterprise Linux x64
• Amazon EC2
![Page 13: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/13.jpg)
Concerns
1. Security
2. Performance
![Page 14: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/14.jpg)
Security
![Page 15: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/15.jpg)
Security Requirements
• Multi-user controlled environment
• Isolated environment with strong controls
• No sensitive or personal data sitting at the periphery
• Data encrypted at rest and in transit
![Page 16: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/16.jpg)
Access Protection Challenge
• Twenty Instances
• Twenty Ports for AD
• 20 x 20 = 400 Rules
![Page 17: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/17.jpg)
Access Control Using Security Groups
• Infra Security Group: client instances
• DNS Security Group: DNS-1, DNS-2
  o Accept DNS queries from ‘Infra’ group
  o Accept DNS queries from AD group
• AD Security Group: AD-1, AD-2
  o Accept AD related requests from ‘Infra’ group
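The rule-count problem and the group-based fix from the two slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the instance names and port numbers are placeholders (the slides do not list the actual AD ports), and the `ALLOWED` table is a reading of the diagram labels, not MACPAC's real configuration.

```python
from itertools import product

# Hypothetical inventory mirroring the slide: 20 client instances, 20 AD ports.
instances = [f"client-{i}" for i in range(1, 21)]
ad_ports = list(range(1, 21))  # placeholder numbers, not the real AD ports

# Approach 1: one ingress rule per (source instance, port) pair.
per_instance_rules = list(product(instances, ad_ports))

# Approach 2: one ingress rule per port, with the whole 'Infra' group as source.
per_group_rules = [("Infra", port) for port in ad_ports]

# Group-to-group policy as read from the diagram labels (an assumption).
ALLOWED = {
    ("Infra", "DNS"): "dns",
    ("AD", "DNS"): "dns",
    ("Infra", "AD"): "ad",
}


def is_allowed(source_group: str, target_group: str, service: str) -> bool:
    """Return True if the policy lets source_group use service on target_group."""
    return ALLOWED.get((source_group, target_group)) == service


print(len(per_instance_rules))  # 400
print(len(per_group_rules))     # 20
```

Referencing a source group instead of individual instances keeps the rule count tied to the number of ports, so adding a client instance to the 'Infra' group needs no new rules.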
![Page 18: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/18.jpg)
Encrypted Data flow
![Page 19: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/19.jpg)
Cloud Security Design
![Page 20: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/20.jpg)
Performance
![Page 21: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/21.jpg)
SAS Requirements
• Very I/O intensive
• Sequential reads and writes
  o 35-70 MB/sec per core of I/O desired
  o GOAL: 4-core system = ~200 MB/sec I/O
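As a quick check of the sizing arithmetic on this slide, the per-core range can be scaled to the core count. The function name and defaults below are illustrative only; the 35 and 70 MB/sec figures and the ~200 MB/sec goal come from the slide.

```python
def io_target_mb_per_s(cores: int, per_core_low: float = 35.0,
                       per_core_high: float = 70.0) -> tuple[float, float]:
    """Return the (low, high) sequential I/O range in MB/sec for a core count."""
    return cores * per_core_low, cores * per_core_high


low, high = io_target_mb_per_s(4)
goal = 200.0  # the slide's stated goal for a 4-core system

print(low, high)            # 140.0 280.0
print(low <= goal <= high)  # True
```

The ~200 MB/sec goal sits roughly in the middle of the 140-280 MB/sec range that the per-core guidance implies for four cores.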
![Page 22: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/22.jpg)
Base AWS Structure
• M3 extra large running RHEL x64 for cluster
  o 1 TB EBS RAID 10 for primary data (4 × 500 GB drives)
  o 1 TB EBS RAID 0 for temp work space (4 × 256 GB drives)
  o 1 TB EBS LUKS-encrypted RAID 0 for ETL (4 × 256 GB drives)
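The "1 TB" figures follow from standard RAID capacity rules: RAID 0 stripes across all drives, while RAID 10 mirrors pairs and so keeps half the raw capacity. A minimal sketch, with a hypothetical helper name:

```python
def usable_gb(raid_level: int, drives: int, drive_gb: int) -> int:
    """Return usable capacity in GB for the RAID levels used on the slide."""
    if raid_level == 0:
        # Striping only: every drive contributes its full capacity.
        return drives * drive_gb
    if raid_level == 10:
        # Striped mirrors: half of the raw capacity holds the mirror copies.
        if drives % 2:
            raise ValueError("RAID 10 needs an even number of drives")
        return drives * drive_gb // 2
    raise ValueError(f"unsupported RAID level: {raid_level}")


print(usable_gb(10, 4, 500))  # 1000 (primary data)
print(usable_gb(0, 4, 256))   # 1024 (temp work space and ETL)
```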
![Page 23: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/23.jpg)
Can AWS yield the necessary performance?
![Page 24: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/24.jpg)
In the immortal words of Spinal Tap:
“These go to eleven!”
![Page 25: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/25.jpg)
Turning up the AWS dial
![Page 26: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/26.jpg)
Volume @ 3
Specifications
• M3 extra large
• 4 × 256 GB EBS disks
• RAID 0 stripe
![Page 27: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/27.jpg)
fio Sequential Read @ 3
[ec2-user]# fio sastest.fio
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
fio-2.1.2
Starting 1 process
Jobs: 1 (f=1)
job1: (groupid=0, jobs=1): err= 0: pid=31661: Sun Oct 27 23:07:18 2013
read : io=102400KB, bw=77167KB/s, iops=19291, runt= 1327msec
clat (usec): min=3, max=25911, avg=44.70, stdev=572.02
lat (usec): min=5, max=25913, avg=46.86, stdev=572.02
Run status group 0 (all jobs):
READ: io=102400KB, aggrb=77166KB/s, minb=77166KB/s, maxb=77166KB/s, mint=1327msec, maxt=1327msec
77,166 KB/s
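The headline figure can be cross-checked from the other numbers in the same output: bandwidth is the data read divided by the runtime, and IOPS is the number of 4 KB blocks divided by the runtime. The sketch below uses integer division to mimic the truncated aggregate value; the helper name is hypothetical.

```python
def derive_rates(io_kb: int, runtime_ms: int, block_kb: int = 4) -> tuple[int, int]:
    """Return (bandwidth in KB/s, IOPS) from total KB read and runtime in ms."""
    bandwidth_kb_per_s = io_kb * 1000 // runtime_ms
    iops = (io_kb // block_kb) * 1000 // runtime_ms
    return bandwidth_kb_per_s, iops


# io=102400KB and runt=1327msec are taken from the fio output above.
print(derive_rates(102400, 1327))  # (77166, 19291)
```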
![Page 28: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/28.jpg)
Volume @ 10
Specifications
• M3 extra large
• 4 × 256 GB EBS disks
• 4000 IOPS per drive
• RAID 0 stripe
![Page 29: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/29.jpg)
fio Sequential Read @ 10
[ec2-user]$ fio sastest.fio
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
fio-2.1.2
Starting 1 process
job1: (groupid=0, jobs=1): err= 0: pid=2731: Tue Nov 5 22:55:33 2013
read : io=102400KB, bw=191402KB/s, iops=47850, runt= 535msec
clat (usec): min=3, max=51820, avg=13.29, stdev=337.22
lat (usec): min=4, max=51821, avg=15.52, stdev=337.21
Run status group 0 (all jobs):
READ: io=102400KB, aggrb=191401KB/s, minb=191401KB/s, maxb=191401KB/s, mint=535msec, maxt=535msec
191,401 KB/s
![Page 30: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/30.jpg)
“If we need that extra push over the cliff. You know what we do?”
“11! Exactly.” — Nigel
![Page 31: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/31.jpg)
fio Sequential Read @ 11
[ec2-user]$ fio sastest.fio
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
fio-2.1.2
Starting 1 process
job1: (groupid=0, jobs=1): err= 0: pid=3133: Tue Nov 5 23:13:13 2013
read : io=102400KB, bw=432068KB/s, iops=108016, runt= 237msec
clat (usec): min=0, max=1594, avg= 8.26, stdev=42.59
lat (usec): min=0, max=1594, avg= 8.38, stdev=42.59
Run status group 0 (all jobs):
READ: io=102400KB, aggrb=432067KB/s, minb=432067KB/s, maxb=432067KB/s, mint=237msec, maxt=237msec
432,067 KB/s
![Page 32: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/32.jpg)
Volume @ 11
Specifications
• 4 × 256 GB EBS disks
• 4000 IOPS per drive
• RAID 0 stripe
• cg1.4xlarge (10 Gb I/O channel)
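To compare the three dial settings side by side, the aggregate read lines from the fio runs can be parsed and expressed relative to the first run and to the ~200 MB/sec goal. This is a sketch, not part of the original deck: the regular expression only targets the `aggrb=` field shown in these slides, and treating fio's KB as 1024 bytes is an assumption.

```python
import re

GOAL_MB_PER_S = 200  # goal from the SAS Requirements slide

# Aggregate read bandwidth fields copied from the three fio runs in the slides.
RUNS = {
    "Volume @ 3": "READ: io=102400KB, aggrb=77166KB/s",
    "Volume @ 10": "READ: io=102400KB, aggrb=191401KB/s",
    "Volume @ 11": "READ: io=102400KB, aggrb=432067KB/s",
}

AGGRB = re.compile(r"aggrb=(\d+)KB/s")


def aggregate_kb_per_s(line: str) -> int:
    """Extract the aggregate bandwidth in KB/s from a fio summary line."""
    match = AGGRB.search(line)
    if match is None:
        raise ValueError(f"no aggrb field in: {line!r}")
    return int(match.group(1))


def summarize(runs: dict[str, str]) -> list[tuple[str, float, float, float]]:
    """Return (label, MB/s, speedup over first run, fraction of goal) per run."""
    rates = {label: aggregate_kb_per_s(line) for label, line in runs.items()}
    baseline = next(iter(rates.values()))
    rows = []
    for label, kb_per_s in rates.items():
        mb_per_s = kb_per_s / 1024  # assumes KB means 1024 bytes here
        rows.append((label, mb_per_s, kb_per_s / baseline,
                     mb_per_s / GOAL_MB_PER_S))
    return rows


for label, mb_per_s, speedup, goal_fraction in summarize(RUNS):
    print(f"{label}: {mb_per_s:.1f} MB/s, {speedup:.1f}x baseline, "
          f"{goal_fraction:.0%} of goal")
```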
![Page 34: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/34.jpg)
I am pretty sure I can make the dial go higher
• RAM disks
• Block sizes
• Larger stripes
• Application tuning
• Etc…
![Page 35: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/35.jpg)
WARNING!
• Be sure to touch all sectors of a new disk, per AWS guidance, prior to testing and production
Command for Unix environments:
$ dd if=/dev/md0 of=/dev/null
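For readers who want to see what the `dd` command is doing, here is a rough Python equivalent: read the target sequentially from start to end and discard the data. It is a sketch for illustration, demonstrated on a temporary file; on a real host the path would be the block device from the slide (`/dev/md0`), which normally requires root access. The function name and the 1 MiB block size are arbitrary choices.

```python
import os
import tempfile


def touch_all_blocks(path: str, block_size: int = 1024 * 1024) -> int:
    """Read path sequentially to the end, discarding data; return bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as device:
        while True:
            chunk = device.read(block_size)
            if not chunk:  # an empty read means end of file or device
                break
            total += len(chunk)
    return total


# Demonstration on a small temporary file instead of a real block device.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (3 * 1024 * 1024 + 123))
try:
    bytes_read = touch_all_blocks(tmp.name)
finally:
    os.unlink(tmp.name)

print(bytes_read)  # 3145851
```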
![Page 36: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/36.jpg)
You are not alone…
• Guidance from software vendors
• AWS professional services
• Use an iterative process (fail quickly)
• Third-party partners (8kMiles)
So get going!
![Page 37: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/37.jpg)
What did we learn?
• Make a decision
• Start at zero…
• Spend time really thinking about security
• And then crank it up where you need it
“Try again. Fail again. Fail better.”
Samuel Beckett, Worstward Ho (1983)
![Page 38: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/38.jpg)
References
• Amazon EBS Volume Performance
  – http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSPerformance.html
• AWS Microsoft Platform Security
  – http://media.amazonwebservices.com/AWS_Microsoft_Platform_Security.pdf
• Benchmarking SAS I/O: Verifying I/O Performance Using fio
  – http://support.sas.com/resources/papers/proceedings13/479-2013.pdf
• This Is Spinal Tap (movie, 1984, directed by Rob Reiner)
![Page 39: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013](https://reader034.vdocuments.us/reader034/viewer/2022051209/547e9445b4af9fd3158b5714/html5/thumbnails/39.jpg)
Special Thanks to: 8kMiles, AWS, and SAS
And thank you for your time today.