PARALLEL PROCESSING OF LARGE SCALE GENOMIC DATA
Candidacy Examination
08/26/2014
Mucahid Kutlu
Motivation
• The sequencing costs are decreasing
• Big data problem
*Adapted from genome.gov/sequencingcosts
*Adapted from https://www.nlm.nih.gov/about/2015CJ.html
Parallel processing is inevitable!
Typical Analysis on Genomic Data
• Single Nucleotide Polymorphism (SNP) calling
Alignment File-1
Sequences 1 2 3 4 5 6 7 8
Read-1 A G C G
Read-2 G C G G
Read-3 G C G T A
Read-4 C G T T C C
Reference A G C G T A C C
Alignment File-2
Sequences 1 2 3 4 5 6 7 8
Read-1 A G A G
Read-2 A G A G T
Read-3 G A G T
Read-4 G T T C C
*Adapted from Wikipedia
A single SNP may cause Mendelian disease!
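A minimal sketch of the idea behind SNP calling, using the reads of Alignment File-2 from this slide. The start position of each read is an assumption (the column alignment did not survive in the transcript), and the depth and fraction thresholds are illustrative values, not the rule used by VarScan or GATK.

```python
from collections import Counter

def call_snps(reference, reads, min_depth=2, min_fraction=0.8):
    """Report positions where the aligned reads consistently disagree with the reference.

    reads is a list of (start, bases) pairs with 1-based start positions.
    """
    pileup = [Counter() for _ in reference]
    for start, bases in reads:
        for offset, base in enumerate(bases):
            pileup[start - 1 + offset][base] += 1

    snps = []
    for index, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads cover this position to decide
        base, count = counts.most_common(1)[0]
        if base != reference[index] and count / depth >= min_fraction:
            snps.append((index + 1, reference[index], base))
    return snps

REFERENCE = "AGCGTACC"
# Read start positions below are assumed for illustration.
FILE_1_READS = [(1, "AGCG"), (2, "GCGG"), (2, "GCGTA"), (3, "CGTTCC")]
FILE_2_READS = [(1, "AGAG"), (1, "AGAGT"), (2, "GAGT"), (4, "GTTCC")]

print(call_snps(REFERENCE, FILE_2_READS))  # position 3: C in the reference, A in the reads
```

Under these assumptions only position 3 of File-2 is reported: the isolated mismatches in File-1 and at position 6 of File-2 are not supported by enough reads.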
Existing Solutions for Implementation
• Serial tools
  – SamTools, VCFTools, BedTools: file merging, sorting, etc.
  – VarScan: SNP calling
• Parallel implementations
  – Turboblast: searching local alignments
  – SEAL: read mapping and duplicate removal
  – Biodoop: statistical analysis
• Middleware systems
  – Hadoop
    • Not designed for the specific needs of genetic data
    • Limited programmability
  – Genome Analysis Toolkit (GATK)
    • Designed for genetic data processing
    • Provides special data traversal patterns
    • Limited parallelization for some of its tools
Main Goal of My Thesis
• We want to develop middleware systems that
  – Are specific to parallel genetic data processing
  – Allow parallelization of a variety of genomic analysis algorithms
  – Work with the different popular genetic data formats
  – Ease programming, since most developers are biologists, not computer scientists
Papers During My PhD Study
• Mucahid Kutlu, Gagan Agrawal, "Cluster-based SNP Calling on Large-Scale Genome Sequencing Data," the 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2014) (Accepted, 19.1% acceptance rate)
• Mucahid Kutlu, Gagan Agrawal, "PAGE: A Framework for Easy PArallelization of GEnomic Applications," the 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2014) (Accepted, 21.1% acceptance rate)
• Mucahid Kutlu, Gagan Agrawal and Oguz Kurt, "Fault tolerant parallel data-intensive algorithms," High Performance Computing (HiPC), 2012 (25.1% acceptance rate)
• Mucahid Kutlu, Gagan Agrawal and Oguz Kurt, "Fault tolerant parallel data-intensive algorithms," High-Performance Parallel and Distributed Computing (HPDC), 2012 (poster paper)
• "RE-PAGE: Domain-Specific REplication and PArallel Processing of GEnomic Applications" (to be submitted)
Outline
• Motivation & Background
• Current Work
  – PAGE: A Framework for Easy PArallelization of GEnomic Applications
  – RE-PAGE: Domain-Specific REplication and PArallel Processing of GEnomic Applications
• Future Work
Our Work
• PAGE: A Map-Reduce-like middleware for easy parallelization of genomic applications
• Mappers and reducers are executable programs
  – Allows us to exploit existing applications
  – No restriction on programming language
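Intra-dependent Processing
[Figure: each input file (File-1 ... File-m) is split into Region-1 ... Region-n; one map task per region produces the intermediate outputs O-11 ... O-1n for File-1 through O-m1 ... O-mn for File-m, and a reduce task per file combines them into Output-1 ... Output-m]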
• Each file is processed independently
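Inter-dependent Processing
• Each map task processes a particular region of ALL files
[Figure: the input files are divided into Region-1 ... Region-k ... Region-n; the map task of each region produces O1 ... Ok ... On, and a single reduce task combines them into Output]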
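The difference between the two execution models can be sketched by listing the map tasks each one creates. The function names and the task representation are hypothetical and only illustrate the slides; they are not PAGE's interface.

```python
def intra_dependent_tasks(files, regions):
    """One map task per (file, region); each file gets its own reduce step."""
    return {name: [(name, region) for region in regions] for name in files}

def inter_dependent_tasks(files, regions):
    """One map task per region, reading that region from all files; one reduce step."""
    return [(tuple(files), region) for region in regions]

files = ["File-1", "File-2"]
regions = ["Region-1", "Region-2", "Region-3"]

per_file = intra_dependent_tasks(files, regions)    # 2 groups of 3 map tasks
shared = inter_dependent_tasks(files, regions)      # 3 map tasks over both files
```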
Data Partitioning
• Data is NOT packaged into equal-size data blocks as in Hadoop
  – Each application has a different way of reading the data
  – Equal-size data block packaging ignores nucleotide base location information
• The genome is divided into regions, and each map task is assigned a region
  – Takes location information into account
  – The map task is responsible for accessing its particular region of the input files
    • This is a common feature of many genomic tools (GATK, SamTools)
Genome Partition
• PAGE provides two data partitioning methods
  – By-locus partitioning: Chromosomes are divided into regions
  – By-chromosome partitioning: Chromosomes preserve their unity
[Figure: chromosomes Chr-1 ... Chr-6 drawn once for each partitioning method]
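A minimal sketch of the two partitioning methods. The chromosome names, lengths and region size are made-up values; the `chrom:start-end` strings follow the region syntax accepted by samtools, and the function names are hypothetical.

```python
def by_locus(chrom_lengths, region_size):
    """Split every chromosome into consecutive regions of at most region_size bases."""
    regions = []
    for chrom, length in chrom_lengths.items():
        for start in range(1, length + 1, region_size):
            end = min(start + region_size - 1, length)
            regions.append(f"{chrom}:{start}-{end}")
    return regions

def by_chromosome(chrom_lengths):
    """Keep each chromosome whole: one region per chromosome."""
    return list(chrom_lengths)

lengths = {"chr1": 250, "chr2": 100}   # toy lengths, for illustration only
print(by_locus(lengths, 100))          # chr1 is cut into three regions, chr2 into one
print(by_chromosome(lengths))          # ['chr1', 'chr2']
```

By-locus partitioning yields more, smaller map tasks, which gives a scheduler more room to balance load; by-chromosome partitioning suits tools that need a whole chromosome at once.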
Challenges
• Load imbalance due to the nature of genomic data
  – It is not just an array of A, G, C and T characters
• High overhead of tasks
• I/O contention
[Figure: Coverage Variance]
Task Scheduling
PAGE provides two types of scheduling schemes.
Static
• Each processor is responsible for regions of equal length.
• All map tasks must finish before the reduce tasks are executed.
Dynamic
• Map and reduce tasks are assigned by a master process.
• Reduce tasks can start once enough intermediate results are available.
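Why the dynamic scheme helps under uneven coverage can be seen in a toy simulation of the map phase. The cost values are invented, reduce tasks are ignored, and the functions are an illustrative model rather than PAGE's scheduler.

```python
import heapq

def static_makespan(costs, workers):
    """Each worker gets a contiguous block with the same number of regions."""
    if not costs:
        return 0
    per_worker = -(-len(costs) // workers)  # ceiling division
    loads = [sum(costs[i:i + per_worker]) for i in range(0, len(costs), per_worker)]
    return max(loads)

def dynamic_makespan(costs, workers):
    """A master hands the next region to whichever worker becomes idle first."""
    finish_times = [0] * workers
    heapq.heapify(finish_times)
    for cost in costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)

# Equal-length regions whose processing cost differs (e.g. because coverage varies).
region_costs = [8, 8, 1, 1, 1, 1, 1, 1]
print(static_makespan(region_costs, 4))   # 16: one worker receives both expensive regions
print(dynamic_makespan(region_costs, 4))  # 8: the expensive regions end up on different workers
```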
Sample Application Development with PAGE
• Serial execution command of the VarScan software
  – samtools mpileup -b file_list -f reference | java -jar VarScan.jar mpileup2snp
• To parallelize VarScan with PAGE, the user needs to define:
  – Genome Partition: By-Locus
  – Scheduling Scheme: Dynamic (or Static)
  – Execution Model: Inter-dependent
  – Map command: samtools mpileup -b file_list -r regionloc -f reference | java -jar VarScan.jar mpileup2snp > outputloc
  – Reduction: cat (bash shell command)
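A sketch of how the map command above could be expanded into one command per region, with `regionloc` and `outputloc` treated as placeholders. The output file naming and the `build_commands` helper are assumptions made for illustration; the slides do not show how PAGE performs this substitution internally.

```python
MAP_TEMPLATE = ("samtools mpileup -b file_list -r {regionloc} -f reference"
                " | java -jar VarScan.jar mpileup2snp > {outputloc}")

def build_commands(regions, output_dir="out"):
    """Return the per-region map commands and the final cat-based reduce command."""
    map_commands, outputs = [], []
    for index, region in enumerate(regions):
        output = f"{output_dir}/part-{index:04d}.snp"   # hypothetical naming scheme
        outputs.append(output)
        map_commands.append(MAP_TEMPLATE.format(regionloc=region, outputloc=output))
    reduce_command = "cat " + " ".join(outputs) + f" > {output_dir}/result.snp"
    return map_commands, reduce_command

maps, reduce_cmd = build_commands(["chr1:1-100", "chr1:101-200"])
for command in maps:
    print(command)
print(reduce_cmd)
```

The commands are only built as strings here; running them would be the job of the map and reduce workers.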
Experiments
• Experimental setup
  – In our cluster
    • Each node has 12 GB memory
    • 8 cores (2.53 GHz)
  – We obtained the data from the 1000 Genomes Project
  – We evaluated PAGE with 4 applications
    • VarScan: SNP detection
    • Realigner Target Creator: Detects insertions/deletions in alignment files
    • Indel Realigner: Applies local realignment to improve the quality of alignment files
    • Unified Genotyper: SNP detection
Comparison with GATK
- Unified Genotyper tool of GATK
[Figure: two panels. Scalability (data size: 34 GB), annotated 10.9x; Data Size Impact (# of cores: 128), annotated 12.8x]
Comparison with Hadoop Streaming
- VarScan Application
[Figure: two panels. Scalability (data size: 52 GB), annotated 6.9x; Data Size Impact (# of cores: 128), annotated 12.7x]
RE-PAGE: Domain-Specific REplication and PArallel Processing of GEnomic Applications
• In this study, we improve our middleware PAGE in several aspects
• Main goal: Less I/O contention
• Main approach:
  – Utilizing distributed disks
  – Intelligent replication technique
  – Scheduling scheme that minimizes network traffic
Execution Model
Allowing Remote Processing or Not?
Advantages
• Better workload balance
Disadvantages
• As the number of nodes increases, network traffic will increase
• Data transfer will matter more as computation becomes more data intensive
• Data transferring can be problematic for large-scale data
Proposed Scheduling Schemes
• General idea: Replicate data and prohibit remote processing
  – Replication will increase the number of local tasks for nodes and help decrease workload imbalance
• Data chunks can have varying sizes and varying replication factors
• Master & worker approach
• We propose 3 scheduling schemes
  – Factoring
  – Help the busiest node (HBN)
  – Effective memory management (EMM)
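The general idea (local processing only, with replication supplying extra local tasks) can be sketched with a toy master and worker simulation. The rule used to pick a chunk (first local chunk by name) is a placeholder and is not Factoring, HBN or EMM, whose details are not given on the slide; chunk names, costs and placements are invented.

```python
import heapq

def local_only_makespan(chunk_costs, replicas, nodes):
    """Simulate a master that gives an idle node a pending chunk stored on that node."""
    pending = set(chunk_costs)
    idle_at = [(0, node) for node in nodes]   # (time the node becomes idle, node name)
    heapq.heapify(idle_at)
    makespan = 0
    while idle_at and pending:
        now, node = heapq.heappop(idle_at)
        local = sorted(chunk for chunk in pending if node in replicas[chunk])
        if not local:
            continue  # no local work left: the node stops, remote processing is prohibited
        chunk = local[0]
        pending.remove(chunk)
        finish = now + chunk_costs[chunk]
        makespan = max(makespan, finish)
        heapq.heappush(idle_at, (finish, node))
    return makespan

costs = {"c1": 4, "c2": 4, "c3": 4, "c4": 4}
no_replication = {"c1": {"A"}, "c2": {"A"}, "c3": {"A"}, "c4": {"B"}}
with_replication = {"c1": {"A"}, "c2": {"A", "B"}, "c3": {"A", "B"}, "c4": {"B"}}

print(local_only_makespan(costs, no_replication, ["A", "B"]))    # 12: node A holds three chunks
print(local_only_makespan(costs, with_replication, ["A", "B"]))  # 8: node B can take replicated chunks
```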
Proposed Replication Method
• Replicating all chunks into all nodes is not feasible.
• Depending on the analysis we want to perform, some genomic regions can be more important than others for the target analysis.
• General Idea: Replicate important regions more than others.
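One possible way to turn "replicate important regions more" into numbers is sketched below. The importance scores, the linear mapping and the copy limits are all assumptions made for illustration; the slide does not state how RE-PAGE derives replication factors.

```python
def replication_factors(importance, min_copies=1, max_copies=3):
    """Map each region's importance score linearly onto a number of replicas."""
    top = max(importance.values())
    if top <= 0:
        return {region: min_copies for region in importance}
    return {
        region: min_copies + round((max_copies - min_copies) * score / top)
        for region, score in importance.items()
    }

# Hypothetical importance scores for three regions.
scores = {"region-1": 1.0, "region-2": 0.5, "region-3": 0.0}
print(replication_factors(scores))  # {'region-1': 3, 'region-2': 2, 'region-3': 1}
```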
Replication & Distribution
Scheduling Scheme Evaluation
• Works on real data
• 32 nodes (256 cores)
• 20 BAM files (21 GB)
• All 3 scheduling schemes are better than random scheduling
• Factoring performs best in all experiments
Work Stealing vs. Our Approach
• Synthetic application
• Fixed data chunk size, varying execution time
• Performance comparison is shown as the ratio Work Stealing / Our approach
• As processing becomes more data intensive, our approach gives better results!
Data Size Impact
• Unified Genotyper
• 32 nodes (256 cores)
• As data size increases, WS-3 becomes better than WS-1
• As data size increases, RE-PAGE becomes better than WS-3
[Figure: data size impact chart, annotated +3%, +7%, +4%, -1%]
Scalability Evaluation
[Figure: two panels, Coverage Analyzer and Unified Genotyper, annotated 4.2x, 7.1x, 2.2x, 9.9x]
Future Work
• An API to Develop Parallel Genomic Applications for Memory-Constrained Architectures
• Processing Compressed Genomic Data
API for Memory-Constrained Architectures
• We have employed CPUs so far
• Co-processors can also be useful for genomic applications
• The trend in computing technologies
  – More cores, smaller memory
  – Intel Many Integrated Core (MIC) architecture
Proposed Work
• An API that helps users implement parallel genomic applications on memory-constrained architectures
• In this work, executables are not used; the developer needs to write map-reduce functions in the C programming language
• The middleware helps the developer in 3 ways
  – Data reading from BAM and FASTA files
  – Memory utilization
  – Parallel execution and task scheduling
Execution Flow
[Figure: execution flow. In each of two parallel pipelines, Input Data is compressed into Compressed Data and a Map task turns it into an Intermediate Result; a Reduce step combines the intermediate results into the Result]
Data Reading
• The middleware reads the data from files and generates genome matrices, which are the compressed inputs of map tasks.
• The genome matrix can be of two types
  – Sequence Based: Each row keeps a sequence
  – Location Based: Keeps the data in mpileup format. Each row of the matrix keeps information for a different location
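The relation between the two matrix types can be sketched as a transposition from rows of reads to rows of locations. This is a simplified illustration: real mpileup output also encodes matches, strands and qualities, the proposed API is in C rather than Python, and the `location_based` helper is hypothetical.

```python
def location_based(sequence_rows):
    """Turn sequence-based rows (start, bases) into one row per covered location."""
    columns = {}
    for start, bases in sequence_rows:
        for offset, base in enumerate(bases):
            columns.setdefault(start + offset, []).append(base)
    return [(position, "".join(columns[position])) for position in sorted(columns)]

sequence_based = [(1, "AGAG"), (2, "GAGT")]   # each row keeps one read
print(location_based(sequence_based))
# [(1, 'A'), (2, 'GG'), (3, 'AA'), (4, 'GG'), (5, 'T')]
```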
Genome Matrices
Sequence Based Location Based
Optimization of Memory Utilization
• In order to decrease memory usage, we apply two techniques:
  – Selective Loading
  – Transparent Compression
Selective Loading
• Each read in SAM/BAM files consists of 11 mandatory sections and 1 optional section
  – Sequence ID, location, base sequence, strand and others
• For many applications, we do not need all of them.
  – For counting bases, sequence IDs can be ignored
• We load only the parts we need
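A minimal sketch of selective loading on a SAM text record. The eleven field names are those of the SAM specification; the `load_selected` helper and the example record are made up for illustration, and the proposed middleware would do this in C while reading BAM.

```python
SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")

def load_selected(sam_line, wanted):
    """Keep only the requested mandatory fields of one tab-separated SAM record."""
    columns = sam_line.rstrip("\n").split("\t")
    return {name: columns[SAM_FIELDS.index(name)] for name in wanted}

record = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tAGCG\tIIII\tNM:i:0"
# For counting bases, the read name and qualities are not needed.
print(load_selected(record, ("RNAME", "POS", "SEQ")))
# {'RNAME': 'chr1', 'POS': '100', 'SEQ': 'AGCG'}
```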
Transparent Compression
• Main Idea: The genome matrices keep the data in compressed format, but the developer can access the data through our API as if it were uncompressed.
• Compression Technique: Will be investigated
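Since the compression technique is still open, the sketch below uses plain 2-bit packing of A, C, G and T purely as a stand-in to show what "transparent" means: the caller indexes the object like an ordinary sequence while the bases stay packed. The `PackedBases` class is hypothetical and does not handle other symbols such as N.

```python
class PackedBases:
    """Stores four bases per byte but is read like an uncompressed sequence."""

    _CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
    _BASES = "ACGT"

    def __init__(self, sequence):
        self._length = len(sequence)
        self._data = bytearray((self._length + 3) // 4)
        for i, base in enumerate(sequence):
            self._data[i // 4] |= self._CODES[base] << (2 * (i % 4))

    def __len__(self):
        return self._length

    def __getitem__(self, i):
        if not 0 <= i < self._length:
            raise IndexError(i)
        # Decode a single base on access; nothing else is unpacked.
        return self._BASES[(self._data[i // 4] >> (2 * (i % 4))) & 3]

packed = PackedBases("AGCGTACC")
print(packed[2], len(packed))   # C 8
```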
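Sample Map Task
void *map_coveragedepth(location_based_genome_matrix gm)
{
    int i, j, position;
    char *chromosome;             /* type assumed; not declared on the slide */
    char *sequence;
    reduce_object *total = NULL;  /* how the reduce object is allocated is not shown on the slide */

    /* selected_parts is assumed to be provided by the middleware */
    for (i = 0; i < gm.number_of_results; i++) {
        /* one row of the location-based matrix per genomic location */
        position = getPosition_from_lbgm(gm.code[i], selected_parts);
        chromosome = get_chromosome_from_lbgm(gm.code[i], selected_parts);
        for (j = 0; j < gm.num_samples; j++) {
            /* bases of sample j that cover this location */
            sequence = get_base_sequence_for_sample_n(gm.code[i], selected_parts, gm.num_samples, j);
            count_num_bases(sequence);
            add_results_to_reduce_object(total, position, chromosome, sequence);
        }
    }
    return (void *)total;
}
Callouts on the slide: Input genome matrix, Reduce object, Methods we provide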
Open Questions
• How to schedule map and reduce tasks?
• How to keep the intermediate results in memory?
  – The location-based genome matrix structure is useful for decreasing the intermediate results.
    • No need for iterative computation in many applications (e.g. SNP calling)
    • Reduction is just concatenation of the intermediate results, so they can be written to disk as they are produced.
A middleware for processing compressed genomic data
• Compression is useful for archiving; however, it decreases processing performance
• There is an enormous number of compression methods for genomic data
  – No need for another compression method
• Our goal: A middleware that helps users process compressed data without fully decompressing it.
Execution Model
THANKS!