ramil mauleon: galaxy: bioinformatics for rice scientists
DESCRIPTION
Ramil Mauleon's talk at ICG8 in Shenzhen on "Galaxy: bioinformatics for rice scientists". November 1st, 2013
TRANSCRIPT
![Page 1: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/1.jpg)
IRRI Genotyping Service Laboratory Galaxy: bioinformatics for rice scientists
Ramil P. Mauleon
Scientist – Bioinformatics Specialist
TT Chang Genetic Resources Center
International Rice Research Institute
ICG-8, Shenzhen, China
![Page 2: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/2.jpg)
Presented on behalf of my co-authors from IRRI
Lead Scientists
• Kenneth L. McNally – Genebank resequencing
• Nickolai Alexandrov – rice informatics consortium
• Michael Thomson – Genotyping Service Laboratory
• Hei Leung – Program Leader
Laboratory, software team
• Venice Margaret Juanillas
• Christine Jade Dilla-Ermita
![Page 3: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/3.jpg)
Outline
• Introduction to IRRI & its research agenda
• Bioinformatics support to molecular rice breeding at IRRI: IRRI GSL Galaxy
• Bioinformatics support to efforts for harnessing Rice Genetic Diversity
• Future activity: International Rice Informatics Consortium
![Page 4: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/4.jpg)
INTERNATIONAL RICE RESEARCH INSTITUTE Los Baños, Philippines
Mission:
Reduce poverty and hunger,
Improve the health of rice farmers and consumers,
Ensure environmental sustainability
All done through research, partnerships
Home of the Rice Green Revolution
Established 1960 www.irri.org
Aims to help rice farmers improve the yield and quality of their rice by developing:
• New rice varieties
• Rice crop management techniques
![Page 5: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/5.jpg)
A single strategic work plan for global rice research…
Global Rice Science Partnership: GRiSP
o Core: 3 international research centers
o Numerous research partners
o NEED TO SHARE RESEARCH SOLUTIONS
IRRI Many more…
![Page 6: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/6.jpg)
First GRiSP Research Theme with heavy bioinformatics …
Accelerating the development, delivery, and adoption of improved rice varieties
• 2.1. Breeding informatics, high-throughput marker applications, and multi-environment testing
![Page 7: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/7.jpg)
Allele mining for crop improvement
Diverse rice germplasm
“allele pool”
Beneficial alleles
Fine-mapping, candidate gene analysis, cloning
Association mapping Gene and QTL mapping
Flanking and gene-based markers for molecular breeding
Genes controlling traits of interest
![Page 8: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/8.jpg)
PXO71 PXO99 PXO363 PXO341
Xa genes for bacterial leaf blight
Major QTLs/genes for breeding
• QTLs and major genes for stress tolerance and disease resistance are known
• Flanking SSRs and gene-based STS markers have been used to transfer these major QTLs
• Move to SNP markers for Marker Assisted Backcrossing (MABC), Marker Assisted Selection (MAS), Genomic Selection (GS)
![Page 9: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/9.jpg)
Challenges for IRRI scientists/breeders
• Not familiar with SNP-based genotyping
o How do I score the alleles? (no gel image!!!)
o Data does not fit my spreadsheet (run out of columns, rows)…
o Cannot even view the data file using “ordinary” apps
o Computer runs out of memory when I load the dataset…
o Trusted analysis software crashes inexplicably…
• We need to
o Enable field/bench researchers to do bioinformatics
o Share solutions openly across GRiSP partners, with rice research community as a whole
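The memory and spreadsheet limits listed above usually come from loading a whole genotype table at once. A minimal sketch of the alternative, reading the table one row at a time, is shown here. The tab-separated sample-by-marker layout, the missing-call codes, and the function name are assumptions made for illustration; they are not taken from the GSL pipeline.

```python
import csv
import io

# Assumed codes for a missing genotype call (hypothetical, adjust per platform).
MISSING = {"", "NA", "--", "NN"}

def missing_calls_per_marker(handle):
    """Stream a sample-by-marker table and count missing calls per marker.

    Only one row is held in memory at a time, so the table can be far
    larger than what a spreadsheet or an in-memory load can handle.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    markers = header[1:]  # first column is assumed to hold the sample name
    missing = {marker: 0 for marker in markers}
    n_samples = 0
    for row in reader:
        n_samples += 1
        for marker, call in zip(markers, row[1:]):
            if call.strip().upper() in MISSING:
                missing[marker] += 1
    return n_samples, missing

# Tiny in-memory stand-in for a large genotype file.
demo = io.StringIO(
    "sample\tsnp_1\tsnp_2\tsnp_3\n"
    "line_A\tAA\tNA\tGG\n"
    "line_B\tAG\tCC\t--\n"
    "line_C\tGG\tNA\tGT\n"
)
n_samples, missing = missing_calls_per_marker(demo)
print(n_samples, missing)
```

With a real file, the same function can be called on `open(path, newline="")` instead of the `io.StringIO` stand-in.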
![Page 10: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/10.jpg)
Galaxy has features that fit our needs
Open, web-based platform for accessible, reproducible, and transparent computational biomedical research.
• Accessible: Users w/o programming experience can easily specify parameters and run tools and workflows.
• Reproducible: Galaxy captures info so that any user can repeat and understand a complete computational analysis.
• Transparent: Users share and publish analyses via the web and create interactive, web-based documents that describe a complete analysis.
![Page 11: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/11.jpg)
Illumina BeadXpress Genotyping
Fluidigm EP1 Genotyping
GenomeStudio with Alchemy Plugin
Software tools on IRRI GSL-Galaxy
Infinium Custom 6k chip
Integration of Galaxy to Genotyping Service Lab workflow
Data preparation
Genetic / association analysis
Bioinformatic analysis
![Page 12: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/12.jpg)
Standard Galaxy release
![Page 13: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/13.jpg)
IRRI GALAXY (current)
• Deployed in the cloud (Amazon Web Services Large instance in Asia-Pacific region)
• Streamlined to contain rice-specific tools and genotyping data
• NO NGS assembly tools included
![Page 14: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/14.jpg)
Rice genome browser installed as data source for curated SNP, genome information
Comprehensive information on SNPs used in GSL
![Page 15: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/15.jpg)
Data manipulation tools in GSL Galaxy
•Format conversion for most commonly used genetic analysis, diversity study software tools
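As an illustration of what such a format conversion can involve, the sketch below recodes two-letter nucleotide calls for one biallelic SNP into minor-allele counts (0, 1, 2), a numeric coding that many analysis packages accept. The input coding, the `NA` missing code, and the function name are assumptions for this example and do not describe the actual GSL Galaxy converters.

```python
from collections import Counter

def to_minor_allele_counts(calls, missing="NA"):
    """Recode two-letter calls for one biallelic SNP as minor-allele counts.

    Missing calls become None. A monomorphic marker is coded as all zeros.
    """
    allele_counts = Counter()
    for call in calls:
        if call != missing:
            allele_counts.update(call)  # count each allele letter in the call
    if len(allele_counts) > 2:
        raise ValueError("expected a biallelic SNP")
    if len(allele_counts) < 2:
        return [None if call == missing else 0 for call in calls]
    # Minor allele = least frequent allele; ties are broken alphabetically.
    minor = min(allele_counts, key=lambda a: (allele_counts[a], a))
    return [None if call == missing else call.count(minor) for call in calls]

calls = ["AA", "AG", "GG", "NA", "AA"]
coded = to_minor_allele_counts(calls)
print(coded)  # [0, 1, 2, None, 0]
```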
![Page 16: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/16.jpg)
Workflows for rice data analysis already available
In place for Illumina BeadXpress and Infinium platforms; being tested on the Fluidigm system…
![Page 17: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/17.jpg)
Software Tools for SNP analysis
• SNP calling: Alchemy (Wright et al 2010)
• SNP data exploration, visualization: Flapjack, TASSEL
• Genetic linkage mapping: Mapmanager QTX, R/QTL
• QTL analysis: R/QTL, Qgene, MPMap (for multi-parent crosses)
• GWA analysis: TASSEL
• Population structure / diversity analysis : Powermarker, Structure
![Page 18: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/18.jpg)
Flapjack: visualize, manipulate SNP data
http://bioinf.scri.ac.uk/flapjack/index.shtml
• View genotypes graphically, with color code (nucleotide, compared to selected line …)
• Select/deselect lines, markers from dataset
• Filter lines by markers
• Basic statistics on dataset
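Flapjack does this interactively. For readers who want to see the underlying operations, here is a rough sketch of filtering lines by a marker and computing one basic statistic. It is not Flapjack code; the nested-dictionary layout and both function names are hypothetical.

```python
# Hypothetical genotype data: line name -> {marker name -> two-letter call}.
genotypes = {
    "line_A": {"snp_1": "AA", "snp_2": "CC"},
    "line_B": {"snp_1": "AG", "snp_2": "CT"},
    "line_C": {"snp_1": "GG", "snp_2": "NA"},
}

def lines_with_call(genotypes, marker, call):
    """Return the names of lines that carry a given call at a marker."""
    return sorted(
        line for line, calls in genotypes.items() if calls.get(marker) == call
    )

def heterozygosity(genotypes, marker, missing="NA"):
    """Fraction of non-missing calls at a marker that are heterozygous."""
    calls = [c[marker] for c in genotypes.values() if c[marker] != missing]
    if not calls:
        return 0.0
    return sum(c[0] != c[1] for c in calls) / len(calls)

selected = lines_with_call(genotypes, "snp_1", "GG")
het_snp_2 = heterozygosity(genotypes, "snp_2")
print(selected, het_snp_2)
```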
![Page 19: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/19.jpg)
http://statgen.ncsu.edu/powermarker/
![Page 20: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/20.jpg)
TASSEL (Buckler Laboratory, Cornell University): a software package to evaluate trait associations, evolutionary patterns, and linkage disequilibrium. Three areas of strength:
1. Integrates with various diversity databases (Panzea, Gramene, Sorghum, and GRIN)
2. Provides new and powerful statistical approaches to association mapping, e.g. General Linear Model (GLM) and Mixed Linear Model (MLM).
3. Handles a wide range of indels (insertions & deletions), which are the most common type of polymorphism in maize.
www.maizegenetics.net/tassel
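To give a feel for what a GLM association test does, the sketch below fits the simplest possible case: an ordinary least-squares regression of a phenotype on the allele dosage (0, 1, 2) at a single marker. This is a teaching example under stated assumptions, not TASSEL's implementation. It ignores covariates and population structure, the mixed model is not attempted, and the data values are invented.

```python
import math

def single_marker_regression(dosages, phenotypes):
    """Least-squares fit of phenotype = intercept + slope * dosage.

    Returns the slope (allele effect), the intercept, and the t-statistic
    of the slope. Needs more than two samples, a polymorphic marker, and
    a fit that is not perfect (non-zero residuals).
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(dosages, phenotypes)
    )
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, slope / std_err

# Invented example: six lines, dosage of one allele, and a measured trait.
dosages = [0, 0, 1, 1, 2, 2]
phenotypes = [4.0, 4.4, 5.1, 5.3, 6.0, 6.4]
slope, intercept, t_stat = single_marker_regression(dosages, phenotypes)
print(round(slope, 3), round(intercept, 3), round(t_stat, 2))
```

A genome-wide scan repeats this kind of test for every marker, which is why correction for multiple testing and for population structure matters in practice.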
![Page 21: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/21.jpg)
TASSEL stand-alone has lots of analysis tools…
![Page 22: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/22.jpg)
TASSEL analysis tools are being incorporated into Galaxy …
TASSEL pipeline mode for GWAS, population analysis
![Page 23: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/23.jpg)
IRRI Galaxy Toolshed (“APPS STORE”) is under development
![Page 24: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/24.jpg)
Genotyping data management
IRRI GSL manages data of customers …
• Customer declares data as private – retained in GSL Galaxy account of customer
• Customer declares data as public – loaded into Genotyping Data Management System; shared with research community
![Page 25: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/25.jpg)
![Page 26: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/26.jpg)
![Page 27: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/27.jpg)
![Page 28: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/28.jpg)
![Page 29: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/29.jpg)
![Page 30: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/30.jpg)
![Page 31: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/31.jpg)
Genotype data matrix …
![Page 32: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/32.jpg)
![Page 33: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/33.jpg)
![Page 34: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/34.jpg)
Second GRiSP Research Theme with heavy bioinformatics …
Harnessing genetic diversity to chart new productivity, quality, and health horizons
1.2. Characterizing genetic diversity and creating novel gene pools (SNP genotypes, whole genome sequencing, phenotypes)
1.3. Genes and allelic diversity conferring stress tolerance and enhanced nutrition (candidate genes)
![Page 35: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/35.jpg)
IRGC – the International Rice Genebank Collection @ IRRI
World’s largest collection of rice germplasm held in trust for the world community and source countries (www.irri.org/GRC)
• Over 117,000 accessions from 117 countries
• Two cultivated species: Oryza sativa, Oryza glaberrima
• 22 wild species
• Relatively few accessions have donated alleles to current, high-yielding varieties
![Page 36: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/36.jpg)
Rice exhibits deep population structure.
Phylogenetic tree for 200K SNPs on 3,000 lines McNally et al., 2013 unpublished
Unpublished data removed
![Page 37: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/37.jpg)
Kenneth McNally, Nickolai Alexandrov, Ramil Mauleon, Chengzhi Liang, Ruaraidh Sackville Hamilton,
Zhikang Li, Ren Wang, Hongliang Chen, Gengyun Zhang, Hongsheng Liang,
Hei Leung, Achim Dobermann, Robert Zeigler
The Rice 3,000 Genomes Project: Sequencing for Crop Improvement
CAAS
+ Many Analysis Partners . . .
![Page 38: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/38.jpg)
Bioinformatics challenges of the project…
• Primary data analysis: SNP calls, reference genome refinement, phylogenetic analysis, genotype-phenotype association, etc…
• Efficient database system that allows the integration of the genebank information with phenotypic, breeding, genomic, and IPR data
• Development of toolkits/workbenches for use by research scientists and rice breeders
• Make these databases, tools, & analysis results publicly accessible (& constantly updated)
![Page 39: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/39.jpg)
More analysis …
• Can we assemble new references?
• Find important SNPs (merging with current GWAS/QTL results)
o in CDSs
o in promoters and other regulatory motifs
• Reconstruct large deletions/insertions/inversions in genome
• Find correlated SNPs
• Focus on known genes associated with traits
• Find conserved genome regions selected by breeders
• Need for speed
• Need for collaborations
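One common way to quantify "correlated SNPs" is the squared Pearson correlation (r²) between the allele dosages of two markers across the same set of lines. The sketch below assumes dosages are already coded 0, 1, 2 and contain no missing values. The marker names and data are invented, and this is not presented as the project's chosen method.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length dosage vectors.

    Returns 0.0 when either marker is monomorphic (no variance).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

# Invented dosages for three markers scored on the same five lines.
snp_a = [0, 0, 1, 2, 2]
snp_b = [0, 0, 1, 2, 2]  # identical pattern to snp_a
snp_c = [2, 0, 1, 0, 2]  # unrelated pattern

print(r_squared(snp_a, snp_b), r_squared(snp_a, snp_c))
```

Computing this for every pair of markers grows quadratically with marker count, which is one reason the slide stresses the need for speed.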
![Page 40: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/40.jpg)
IRIC: International Rice Informatics Consortium
NIAS Cornell TGAC
MIPS Cirad IRD
CAS CAAS KZI
Academia Sinica MPI Wageningen UR
EMBRAPA AGI Plant Onto
CSHL Gramene Uni Queensland
PAG 2013 : first introduction of the initiative to the scientific community
IRIC Portal is a central point of IRIC
![Page 41: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/41.jpg)
Initial Contact Organizations
• GRiSP Centers (4): IRRI; CIAT; IRD; Cirad
• Universities (14): Arizona Genomics Institute; Cornell University; Federal University of Pelotas, Brazil; Huazhong AU, China; Kasetsart University, Thailand; Kyung Hee University, Korea; Louisiana State University; Michigan State University; Oregon State University; Perpignan University, France; UC-Riverside; University of Delaware; University of Queensland, Australia; Wageningen UR, Netherlands
• Institutions (16): Academia Sinica, Taiwan; CAAS, Beijing & Shenzhen; CAS, Beijing; Cold Spring Harbor Laboratory; EMBL-EBI, U.K.; EMBRAPA, Brazil; ICAR, India; INRA, France; Kunming Zoo Institute, China; MIPS, Germany; MPI-Tuebingen, Germany; NCGR-CAS, SIBS, Shanghai; NIAS, Japan; The Genome Analysis Centre, U.K.; USDA-Research; NCGR, Santa Fe, NM
• Breeding Companies (7): Bayer CropSciences; Biogemma; Mahyco; Mars Food Global; Pioneer; RiceTec; Syngenta
• Foundations (5): Gates Foundation; NSF, U.S.A.; Sloan Foundation; USAID, U.S.A.; USDA-CREES
• Others (5): GigaScience Journal; GigaDB.org; iPlant Collaborative, U.S.A.; Gramene; Plant & Trait Ontology
• Tech Companies (3): BGI-Shenzhen; Affymetrix; Pacific Biosciences
![Page 42: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/42.jpg)
Existing rice portals
• MSU Rice Genome Annotation Project: http://rice.plantbiology.msu.edu/
• RAP-DB: http://rapdb.dna.affrc.go.jp/
• BGI-RIS Rice Information System: http://rice.genomics.org.cn/rice/index2.jsp
• PlantGDB: http://www.plantgdb.org/OsGDB/
• Gramene: http://www.gramene.org/
![Page 43: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/43.jpg)
IRIC portal content
• Sequences and analysis of 3,000 genomes*
o SNPs
o assemblies
o phylogenetic trees
o genes associated with traits
o regulatory motifs
o most significant variations
• Other available rice genome sequences (~2,000 rice entries in SRA)
• Sequences of rice microorganisms
• Sequences of other grasses (e.g. for C4 project)
• Genotyping results from GBS, 44K and 700K Affy chips
• Phenotypic data
• Gene expression data
• Gene functions and networks
• Analysis tools
• Linked to rice seeds database
• Linked to other IRRI databases and portals
*Total amount of genotyping data: ~3,000 × 20 million = 60 billion
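The footnote's arithmetic can be checked, and turned into a rough storage estimate, with a few lines of code. Reading the 20 million as SNP positions per genome is an assumption, as are the two storage encodings compared here (one byte per call as plain text versus two bits per call in a packed binary format). Neither encoding is stated on the slide.

```python
n_genomes = 3_000        # from the slide
n_snps = 20_000_000      # from the slide; assumed to be SNP positions per genome

n_calls = n_genomes * n_snps      # total genotype calls
bytes_as_text = n_calls           # assumption: 1 byte per call
bytes_packed = n_calls * 2 // 8   # assumption: 2 bits per call

print(f"{n_calls:,} genotype calls")
print(f"{bytes_as_text / 1e9:.0f} GB at 1 byte per call")
print(f"{bytes_packed / 1e9:.0f} GB at 2 bits per call")
```

Even the packed estimate is well beyond spreadsheet scale, which fits the case for a database-backed portal.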
![Page 44: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/44.jpg)
IRIC portal development team @ IRRI: Rolando Jay Santos, Victor Jun Ulat, Frances Nikki Borja, Venice Margarette Juanillas, Jeffrey Detras, Roven Rommel Fuentes, Ramil Mauleon, Kenneth McNally, Nickolai Alexandrov
Management team: Hei Leung, Ruaraidh Sackville Hamilton, Kenneth McNally, Ramil Mauleon, Nickolai Alexandrov
Visionary input: Achim Dobermann
Technological advice: Marco van den Berg
![Page 45: Ramil Mauleon: Galaxy: bioinformatics for rice scientists](https://reader033.vdocuments.us/reader033/viewer/2022051400/54c6d07e4a79593f1e8b4597/html5/thumbnails/45.jpg)
We need you!!!
• Now Hiring: Two (2) post-doctoral positions for computational biology / bioinformatics based at IRRI
• http://irri.org