Bioinformatics and Biostatistics in Limagrain / Biogemma
JOBIM Conference, July 2015
An international agricultural cooperative group
A portfolio of strong brands
4th largest seed company worldwide
Sales of nearly 2 billion euros
Subsidiaries in 42 countries
Nearly 2,000 farmer members
Nearly 9,000 employees
13.5% of turnover re-invested in research
A group that specializes in seeds and cereal products
Field Seeds
Vegetable Seeds
Cereal Products
Limagrain Coop
Field Seeds
Bakery Products
Cereal Ingredients
Garden Products
Vegetable Seeds
A European group open to the world
64% of sales
64% of workforce
23% of sales
6% of sales
16% of workforce
7% of sales
12% of workforce
8% of workforce
Nearly 9,000 employees
66 nationalities
69% of sales achieved outside France
Subsidiaries in 42 countries
Europe
Asia & Pacific
Africa & Middle East
Americas
An innovative group
13.5% of turnover invested in research
200 M€ invested in research (270 M€ with collaborations)
Share of turnover invested in research:
- Average industry: 2.25%*
- Automobile industry: 5.4%*
- Pharmaceutical industry: 10.2%*
- Limagrain: 13.5%
* Source: Leem, April 2013
BIOGEMMA, a research partnership
Biotechnologies
Field Seeds
[Chart: partner shares of 55%, 16%, 10%, 9.5% and 9.5%]
Biogemma
Identification of genes associated with agronomic traits
Development of GM varieties in cereals
Development of tools and knowledge
BIOINFORMATICS
Bioinformatics for breeding
Analyze NGS-based data
Develop databases and tools to store and analyse biological data
Bioinformatics: db, tools
Biostatistics: tools, discover associations
Bioanalysis: explain associations
Molecular Breeding
Omics analysis
Phenotype
Environment
Regulation of expression:
- Chromatin silencing
- Regulation of transcription
- miRNA, siRNA
- RNA stability
- Regulation of translation
- Protein modification, interaction, turnover
Biological material:
- DNA: genes, genomes
- RNA: mRNA, rRNA (transcriptome)
- Protein: enzyme (proteome)
- Trait: phenome
- Metabolome
Transcription, translation, expression
What we measure:
- Markers
- mRNA: transcription levels, DGE
- Protein: quantity, activity levels
- Trait: phenome
How we measure:
- Genotyping, sequencing
- RNA-Seq, microarrays
- HPLC, crystallography
- IA, NIR, HPLC, eyeball
LD mapping, GWAS, GS
A great deal of complex information to correlate
Environment
Genotype Phenotype
Data processing tools are becoming more and more sophisticated
Data production & acquisition
Data analysis & processing
Results interpretation & decision support
field trials
genotyping
sequencing
genomics
LIMS, databases
data retrieval
quality control
statistical analyses
building predictive models
evaluation of individuals
predicting cross value
Data Life Cycle
Data production & acquisition
Sequencing
NGS based: whole genome, targeted sequencing, transcriptome
Deliverables: SNP, structural variations, gene expression level, genomes
Genotyping
High density chips
- 10^3 to 10^5 SNPs
- 10^5 samples
Automate calling / quality control
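Automated quality control of chip genotypes can be illustrated with two common per-marker filters: call rate and minor allele frequency. The sketch below is a minimal illustration only; the thresholds (0.95 and 0.05), the 0/1/2 dosage coding, and the marker names are assumptions, not values from the presentation.

```python
from typing import Dict, List, Optional, Sequence

# Assumed coding: 0, 1 or 2 copies of the alternate allele; None = no call.
Genotype = Optional[int]


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of samples with a non-missing genotype call."""
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """Minor allele frequency computed over called samples only."""
    called = [c for c in calls if c is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_qc(calls: Sequence[Genotype],
              min_call_rate: float = 0.95,   # hypothetical threshold
              min_maf: float = 0.05) -> bool:  # hypothetical threshold
    """Keep a marker only if it is well called and polymorphic."""
    return (call_rate(calls) >= min_call_rate
            and minor_allele_frequency(calls) >= min_maf)


# Hypothetical toy data: three markers scored on ten samples.
markers: Dict[str, List[Genotype]] = {
    "snp_a": [0, 1, 2, 1, 0, 2, 1, 1, 0, 2],
    "snp_b": [0, 0, None, None, 1, 0, None, 0, 0, 0],  # too many missing calls
    "snp_c": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],           # monomorphic
}

kept = [name for name, calls in markers.items() if passes_qc(calls)]
print(kept)
```

At the scale quoted on the slide, the same filters would normally be applied with vectorised array operations rather than Python lists.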
[Figure: samples Steem_Z30, Steem_Z32 and Steem_Z65, two replicates each]
Data production & acquisition
Phenotypic data
Automate data collection
Sensors, images, NIR spectrometry…
Adjustments/corrections by geostatistical methods
Extraction of relevant information
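The idea of correcting plot measurements for local field effects can be shown with a deliberately simple stand-in: a moving-mean neighbour adjustment along one row of plots. This is not a geostatistical model such as kriging, and the window size and yield values are hypothetical; it only illustrates the principle of removing a spatial trend before comparing genotypes.

```python
from statistics import mean
from typing import List


def neighbour_adjust(values: List[float], window: int = 2) -> List[float]:
    """Subtract a local trend from each plot value.

    The local trend of plot i is the mean of up to `window` neighbours on
    each side (plot i itself excluded), expressed as a deviation from the
    overall mean of the row. Requires at least two plots.
    """
    overall = mean(values)
    adjusted = []
    for i, value in enumerate(values):
        lo = max(0, i - window)
        hi = min(len(values), i + window + 1)
        neighbours = values[lo:i] + values[i + 1:hi]
        local_trend = mean(neighbours) - overall
        adjusted.append(value - local_trend)
    return adjusted


# Hypothetical yields along a row with a steady fertility gradient.
raw = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
adjusted = neighbour_adjust(raw)
print([round(v, 2) for v in adjusted])
```

The adjustment shrinks the spread caused by the gradient while leaving a uniform field unchanged.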
Data production & acquisition
Environmental data
Local / internal:
- Sensors, airborne imagery, …
Global / external:
- Databases, internet, satellite images, …
Precise description of the growing conditions
[Figure: air temperature, relative humidity and dew point]
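The three environmental variables shown are linked: dew point can be derived from air temperature and relative humidity. The sketch below uses the Magnus approximation with one commonly used coefficient set; whether the presenters computed dew point this way or measured it directly is not stated, so the function name and the choice of formula are assumptions.

```python
import math

# Magnus coefficients for saturation vapour pressure over liquid water
# (one commonly used set; an assumption, not taken from the presentation).
A = 17.62
B = 243.12  # degrees Celsius


def dew_point_celsius(air_temp_c: float, relative_humidity_pct: float) -> float:
    """Approximate dew point (deg C) from air temperature and relative humidity."""
    if not 0.0 < relative_humidity_pct <= 100.0:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = (math.log(relative_humidity_pct / 100.0)
             + A * air_temp_c / (B + air_temp_c))
    return B * gamma / (A - gamma)


# At saturation the dew point equals the air temperature.
print(round(dew_point_celsius(20.0, 100.0), 2))
# Drier air has a dew point below the air temperature.
print(round(dew_point_celsius(25.0, 50.0), 2))
```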
Modelling
Molecular data
Cost
Availability
Predict: genotype → phenotype
QTL/GWAS – identify genomic regions involved
genomic selection – "black box" approach
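The "identify genomic regions" side of this contrast can be sketched as a single-marker scan: each marker is tested on its own by regressing the phenotype on allele dosage, and markers are ranked by the variance they explain. This is a simplified illustration with hypothetical data and names; it ignores population structure and kinship, which a real GWAS would model.

```python
from statistics import mean
from typing import Dict, List, Tuple


def marker_regression(dosages: List[float],
                      phenotypes: List[float]) -> Tuple[float, float]:
    """Return (slope, r_squared) of phenotype regressed on allele dosage."""
    mx, my = mean(dosages), mean(phenotypes)
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in phenotypes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotypes))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0  # monomorphic marker or constant trait: no signal
    return sxy / sxx, sxy ** 2 / (sxx * syy)


def scan(genotypes: Dict[str, List[float]],
         phenotypes: List[float]) -> List[Tuple[str, float, float]]:
    """Test every marker separately and sort by decreasing r_squared."""
    results = [(name, *marker_regression(d, phenotypes))
               for name, d in genotypes.items()]
    return sorted(results, key=lambda r: r[2], reverse=True)


# Hypothetical data: six individuals, two markers coded as 0/1/2 dosages.
phenotypes = [10.0, 12.1, 13.9, 10.2, 12.0, 14.1]
genotypes = {
    "m1": [0, 1, 2, 0, 1, 2],  # tracks the phenotype closely
    "m2": [0, 2, 1, 1, 0, 2],  # weakly related
}

ranking = scan(genotypes, phenotypes)
for name, slope, r2 in ranking:
    print(name, round(slope, 2), round(r2, 3))
```

Genomic selection, by contrast, fits all markers jointly and uses the fitted model only for prediction, without interpreting individual marker effects.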
Modelling
Statistical methods
Linear mixed models
Bayesian approaches
More and more complex models
GxE
Epistasis
computationally intensive methods
(from Van Eeuwijk et al., 2010)
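One of the simplest links between linear mixed models and genomic prediction is ridge regression on markers: when all marker effects are treated as random with a common variance, their BLUPs solve (X'X + λI)β = X'y with λ equal to the ratio of residual to marker-effect variance. The sketch below solves that system in plain Python on hypothetical toy data; λ is supplied by hand here rather than estimated, and GxE or epistasis terms are not included.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def ridge_marker_effects(x: Matrix, y: List[float], lam: float) -> List[float]:
    """Solve (X'X + lam * I) beta = X'y on centred markers and phenotypes."""
    n, p = len(x), len(x[0])
    col_means = [sum(row[j] for row in x) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[row[j] - col_means[j] for j in range(p)] for row in x]
    yc = [v - y_mean for v in y]
    xtx = [[sum(xc[i][j] * xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return solve(xtx, xty)


# Hypothetical data: six individuals, two markers coded as 0/1/2 dosages.
X = [[0, 2], [1, 1], [2, 0], [1, 2], [2, 1], [0, 0]]
y = [3.0, 4.1, 5.2, 4.0, 5.1, 3.1]

weak = ridge_marker_effects(X, y, lam=1e-9)    # close to ordinary least squares
strong = ridge_marker_effects(X, y, lam=10.0)  # effects shrunk towards zero
print([round(b, 3) for b in weak])
print([round(b, 3) for b in strong])
```

With many more markers than individuals, as in genomic selection, the penalty is what makes the system solvable at all; dedicated mixed-model software would also estimate the variance components.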
Data management
Integrative viewer for genomic data
Databases
BIG DATA: large volume of structured and unstructured data
Infrastructure
Local
on-the-premises computing
"data-centric computing"
Central
enterprise resources
Security
NGS data analysis on BIOGEMMA HPC (912 cores)
Elastic (cloud)
flexibility
low cost / hour CPU
Take Home Messages
Bioinformatics: a major activity supporting a large range of applications in Limagrain
Genomics
Phenomics
Enviromics
Biostatistics, Modelling and Prediction
Big Data (HPC, data management)
Both R&D and Applied
In a highly competitive and challenging research area
More information…
Thank you