DESCRIPTION
Matt Wood of AWS, "Cloud Research", Europe, April 2011, at the Eagle Genomics Symposium
TRANSCRIPT
Cloud Research
Matt Wood
Technology Evangelist
Hello.
Thank you.
The Cloud by Example
Infrastructure services
?
On demand
Pay as you go
Pay for what you use
Elastic capacity
Chart (capacity vs. time): estimated demand
Chart (capacity vs. time): estimated demand, investment
Chart (capacity vs. time): infrastructure, real demand
Chart (capacity vs. time): infrastructure, real demand, elastic capacity
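The gap between fixed infrastructure and elastic capacity in the charts above can be illustrated with a small sketch. The demand series and the fixed capacity figure are invented for illustration only; none of these numbers come from the talk.

```python
# Hypothetical demand per time step (arbitrary units); not taken from the slides.
DEMAND = [20, 35, 80, 140, 60, 30, 25, 110]
FIXED_CAPACITY = 100  # assumed up-front infrastructure investment


def fixed_provisioning(demand, capacity):
    """Return (idle, unmet) unit totals when capacity never changes."""
    idle = sum(max(capacity - d, 0) for d in demand)
    unmet = sum(max(d - capacity, 0) for d in demand)
    return idle, unmet


def units_paid_fixed(demand, capacity):
    """Fixed infrastructure is paid for at every step, used or not."""
    return capacity * len(demand)


def units_paid_elastic(demand):
    """Elastic capacity tracks demand, so only consumed units are paid for."""
    return sum(demand)


idle, unmet = fixed_provisioning(DEMAND, FIXED_CAPACITY)
print(f"fixed:   paid {units_paid_fixed(DEMAND, FIXED_CAPACITY)}, idle {idle}, unmet {unmet}")
print(f"elastic: paid {units_paid_elastic(DEMAND)}, idle 0, unmet 0")
```

With this made-up series the fixed setup both wastes capacity in quiet periods and falls short at the peaks, which is the point the two "real demand" charts make.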
Agility
Faster to prototype
Faster to production
Undifferentiated heavy lifting
Tools for accelerating research
Chart: y-axis 0 to 300, x-axis Q4 2006 to Q4 2010
The Cloud by Example
Data management
Biomarker Warehouse: pre-clinical, clinical, 3rd party data and publications
Estimated cost: 10 TB warehouse over 3 years
Data processing
http://web.mit.edu/stardev/cluster/
http://cyclecomputing.com
http://wiki.github.com/documentcloud/cloud-crowd
sudo gem install cloud-crowd
Input S3 bucket
Output S3 bucket
Amazon S3
Hadoop
Amazon EC2 Instances
Input dataset
Output results
Deploy Application
Web Console, Command line tools
End
Notify
Get Results
Input Data
Amazon Elastic MapReduce
Elastic MapReduce
Preprocessed reads
Map: Bowtie
Sort: Bin and partition
Reduce: SoapSNP
Crossbow: Rapid whole genome SNP analysis
Langmead B, Schatz MC, Lin J, Pop M, Salzberg SL. Genome Biol 2009, 10(11): R134.
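The map, sort and reduce stages named on the Crossbow slide can be sketched on toy data. This is not Crossbow itself: `map_align` is a naive ungapped matcher standing in for Bowtie, `reduce_call_snps` is a majority vote standing in for SoapSNP, and the reference, reads, bin size and thresholds are all hypothetical.

```python
from collections import Counter, defaultdict

REFERENCE = "ACGTACGTTAGCCGATAGGCTTAACG"  # toy reference (hypothetical)
BIN_SIZE = 10                             # genomic bin width (assumed)


def map_align(read, max_mismatches=1):
    """Map step stand-in: best ungapped placement of a read, or None."""
    best = None
    for pos in range(len(REFERENCE) - len(read) + 1):
        window = REFERENCE[pos:pos + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return None if best is None else (best[0], read)


def sort_into_bins(alignments):
    """Sort step stand-in: bin every aligned base by reference position."""
    bins = defaultdict(list)
    for pos, read in alignments:
        for offset, base in enumerate(read):
            bins[(pos + offset) // BIN_SIZE].append((pos + offset, base))
    return {b: sorted(records) for b, records in bins.items()}


def reduce_call_snps(records, min_depth=2):
    """Reduce step stand-in: report majority bases that differ from the reference."""
    pileup = defaultdict(Counter)
    for pos, base in records:
        pileup[pos][base] += 1
    snps = []
    for pos in sorted(pileup):
        base, count = pileup[pos].most_common(1)[0]
        if count >= min_depth and base != REFERENCE[pos]:
            snps.append((pos, REFERENCE[pos], base))
    return snps


def toy_pipeline(reads):
    """Chain the three stages; each bin could be reduced on a separate node."""
    alignments = [hit for hit in map(map_align, reads) if hit is not None]
    snps = []
    for _, records in sorted(sort_into_bins(alignments).items()):
        snps.extend(reduce_call_snps(records))
    return snps


# Three overlapping reads that all carry an A where the reference has C at position 12.
print(toy_pipeline(["GTTAGCAGAT", "TAGCAGATAG", "GCAGATAGGC"]))
```

Because each bin is reduced independently, the reduce stage parallelises across machines, which is what makes the workload a fit for Hadoop.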
CloudBurst
Catalog k-mers
Collect seeds
End-to-end alignment
http://cloudburst-bio.sourceforge.net; Bioinformatics 2009 25: 1363-1369
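The three CloudBurst stages listed above follow a seed-and-extend pattern, which a toy version can make concrete. This is a simplified, ungapped sketch and not CloudBurst's implementation; the seed length, sequences and function names are assumptions chosen for illustration.

```python
from collections import defaultdict

K = 4  # seed length, chosen arbitrarily for this toy example


def catalog_kmers(name, seq, k=K):
    """Emit (k-mer, (source name, offset)) pairs for one sequence."""
    return [(seq[i:i + k], (name, i)) for i in range(len(seq) - k + 1)]


def collect_seeds(ref_kmers, read_kmers):
    """Pair reference and read occurrences of the same k-mer."""
    index = defaultdict(list)
    for kmer, (_, ref_offset) in ref_kmers:
        index[kmer].append(ref_offset)
    seeds = set()
    for kmer, (read_name, read_offset) in read_kmers:
        for ref_offset in index.get(kmer, ()):
            # A shared k-mer implies where the whole read would start.
            seeds.add((read_name, ref_offset - read_offset))
    return seeds


def end_to_end_align(reference, reads, seeds, max_mismatches=1):
    """Extend each seed to the full read length, counting mismatches (no gaps)."""
    hits = []
    for read_name, start in sorted(seeds):
        read = reads[read_name]
        if start < 0 or start + len(read) > len(reference):
            continue
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((read_name, start, mismatches))
    return hits


def toy_seed_and_extend(reference, reads):
    """Run the three stages in order on an in-memory toy data set."""
    ref_kmers = catalog_kmers("ref", reference)
    read_kmers = [pair for name, seq in reads.items()
                  for pair in catalog_kmers(name, seq)]
    seeds = collect_seeds(ref_kmers, read_kmers)
    return end_to_end_align(reference, reads, seeds)


# r1 matches exactly; r2 carries one mismatch but still shares the seed "ACGA".
print(toy_seed_and_extend("GATTACAGGCTTACGA", {"r1": "TACAGGCT", "r2": "GCTAACGA"}))
```

Grouping by k-mer is the step that maps naturally onto a MapReduce shuffle, since all occurrences of one k-mer end up at the same worker.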
ASSEMBLING GENOMES
140 million 454 reads
Image: Matt Wood
Map 100 million, 100 base paired-end reads
Quad core with 5 GB of RAM would take 16 days
30 high-memory instances; 32 hours; $195
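The figures on this slide imply a few derived numbers, worked out below. Only the four input values come from the slide; the per-hour rate is the average implied by those totals, not a quoted AWS price.

```python
# Inputs as stated on the slide.
INSTANCES = 30
CLOUD_HOURS = 32
TOTAL_COST_USD = 195.0
SINGLE_MACHINE_DAYS = 16

# Derived values (simple arithmetic on the inputs above).
instance_hours = INSTANCES * CLOUD_HOURS                   # 30 * 32
cost_per_instance_hour = TOTAL_COST_USD / instance_hours   # implied average rate
single_machine_hours = SINGLE_MACHINE_DAYS * 24
speedup = single_machine_hours / CLOUD_HOURS               # wall-clock speed-up

print(f"instance-hours used:     {instance_hours}")
print(f"implied $/instance-hour: {cost_per_instance_hour:.3f}")
print(f"wall-clock speed-up:     {speedup:.0f}x")
```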
BLAT @ U. PENN
HEAVY-ION COLLISIONS @ RHIC
Problem: Quark physics conference imminent but no compute resources handy
Solution: NIMBUS context broker allowed researchers to provision 300 nodes and get the simulations done
Collaboration
http://aws.amazon.com/publicdatasets/
Applications and platforms
Security
Shared responsibility
Requirement based access
Certification
ISO 27001+
SAS 70 Type II
PCI DSS Level 1
Security organisation
Employee lifecycle
Logical security
Secure data handling
Physical security
Environmental safeguards
Change management
Incident handling
Data integrity
Availability and redundancy
Control objectives
Data access control
Identity and access
Independent buildings
Separate flood zones
Geographically separated
Redundant power
Redundant connectivity
Highly monitored
Default deny firewall
Security groups
DDoS
Man in the Middle
IP spoofing
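The default-deny idea behind security groups can be modelled in a few lines. This is a toy model of the concept, not the EC2 API: the `AllowRule` type, the `is_allowed` helper and the example addresses (taken from documentation-reserved ranges) are all hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class AllowRule:
    """One explicit permission; there is deliberately no 'deny' rule type."""
    protocol: str
    port: int
    source_cidr: str


def is_allowed(rules, protocol, port, source_ip):
    """Default deny: traffic passes only if some allow rule matches it."""
    addr = ip_address(source_ip)
    return any(
        rule.protocol == protocol
        and rule.port == port
        and addr in ip_network(rule.source_cidr)
        for rule in rules
    )


# Hypothetical group: HTTPS from anywhere, SSH only from one office range.
web_group = [
    AllowRule("tcp", 443, "0.0.0.0/0"),
    AllowRule("tcp", 22, "203.0.113.0/24"),
]

print(is_allowed(web_group, "tcp", 443, "198.51.100.7"))   # HTTPS from anywhere
print(is_allowed(web_group, "tcp", 22, "198.51.100.7"))    # SSH from outside the office range
print(is_allowed(web_group, "tcp", 3306, "203.0.113.10"))  # port never opened
```

An empty rule list rejects everything, so access has to be opened deliberately rather than closed after the fact.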
Resource isolation
Virtual Private Cloud
Amazon Web Services infrastructure
Secure VPN connection over the internet
VPN Gateway Router
Customer’s isolated AWS resources
Subnet 1
Subnet 2
Subnet 3
Subnet 4
Customer’s network
Dedicated instances
Virtual Private Cloud
aws.amazon.com/security
Data stays local
aws.amazon.com
Thank you!