Bioinformatics ApS
• Founded in February 2002 with investment from Biovision
• Founders: Staff working at the Bioinformatics Research Center, Aarhus University
• Employees: 2 software developers
Founders
• Leif Schauser, Ph.D. (CEO): Associate Professor, Molecular Biology, Bioinformatics Research Institute
• Jotun Hein, Ph.D. (Member of BoD): Professor in Bioinformatics at Oxford University
• Mikkel Schierup, Ph.D. (CSO): Associate Professor, Biology, Bioinformatics Research Institute
• Christian Storm, Ph.D. (CTO): Associate Professor, Computer Science, Bioinformatics Research Institute
Strategy
• Develop leading bioinformatics tools designed for association studies
• Brand the software
• Collaborations with pharmacogenomics industry
– Disease genes, adverse drug reactions
• Extend product suite to include other bioinformatics solutions (databases, comparative genomics)
Drug target hunting
Microarray analysis
• 100s of targets, not necessarily relevant
• Problems: tissues, controls, limited by array
Disease gene mapping
• Mb of genomic sequence, power dependent, requires high resolution
• Problems: sampling, modeling, penetrance
GeneRecon
• Association mapping using all markers at the same time and all other available information
• Fully probabilistic approach
Input: SNPs or microsatellites; disease and control group
Output: Localisation of the disease gene
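For contrast with the multi-marker approach described on this slide, the sketch below shows a conventional single-marker case/control test. It is not GeneRecon's method. The function name and the allele counts are hypothetical and chosen only for illustration.

```python
def allele_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square statistic (1 d.f.) for a 2x2 table of allele counts.

    case_a / case_b: counts of the two alleles among disease chromosomes.
    control_a / control_b: counts of the two alleles among control chromosomes.
    """
    n = case_a + case_b + control_a + control_b
    numerator = n * (case_a * control_b - case_b * control_a) ** 2
    denominator = (
        (case_a + case_b) * (control_a + control_b)
        * (case_a + control_a) * (case_b + control_b)
    )
    return numerator / denominator


# Hypothetical counts: 100 case chromosomes and 100 control chromosomes.
chi2 = allele_chi_square(60, 40, 40, 60)
print(chi2)
```

A test like this looks at one marker at a time, whereas the slide describes using all markers at the same time.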
GeneRecon - implementation
• C++ program (>10.000 lines of code)
• Multiplatform (Unix, Windows, MacOSX)
• Amenable to parallelization
• Bayesian MCMC approach
GeneRecon - implementation
• 10 million recalculations/hour
• Different models of disease transmission
• Diploid data with unknown phase
• Thorough tests for calculations
GeneRecon – output
• Disease gene location (full distribution)
• Disease-causing haplotypes
• Estimation of phenocopies
• Penetrance
• Date the origin of disease
GeneRecon - collaborations
• Scandinavian medium-sized biotechnology companies
– Proprietary dataset under analysis
• Danish University Hospital
– Schizophrenia data are currently being collected
Disease mapping
Pedigree Analysis:
• Pedigree known
• Few meioses (max 100s)
• Resolution: cMorgans (Mbases)
Association Mapping:
• Pedigree sampled
• Many meioses (>10^4)
• Resolution: 10^-4 Morgans (Kbases)
[Figure: pedigree and population genealogy diagrams; labels: Time, Many Generations, r, M, D]
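The two resolutions on this slide can be put on a common physical scale with a short calculation. The sketch assumes the common rule of thumb that 1 cM corresponds to roughly 1 Mb in the human genome. That constant is an assumption (the true rate varies along chromosomes), and the names are illustrative.

```python
# Assumption: 1 centiMorgan ~ 1 megabase, i.e. 1 Morgan ~ 100 Mb (genome-wide average).
BASES_PER_MORGAN = 100 * 1_000_000


def morgans_to_bases(morgans):
    """Convert a genetic distance in Morgans to an approximate physical distance in bases."""
    return morgans * BASES_PER_MORGAN


pedigree_resolution = morgans_to_bases(0.01)     # 1 cM, typical of pedigree analysis
association_resolution = morgans_to_bases(1e-4)  # 10^-4 Morgans, association mapping

print(f"Pedigree analysis:   ~{pedigree_resolution / 1e6:.0f} Mb")
print(f"Association mapping: ~{association_resolution / 1e3:.0f} kb")
```

Under this assumption, association mapping narrows the candidate region by about two orders of magnitude compared with pedigree analysis.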
Linkage Disequilibrium (LD)
Haplotypes
Haplotypes: A G C
            T C A
SNPs:       {A,T} {C,G} {A,C}
2^(m-1) possible haplotype pairs for m heterozygous SNPs
The Human Genome http://www.sanger.ac.uk/HGP/
[Figure: human karyotype showing chromosomes 1-22, X and Y]
3 billion base pairs per haploid genome
30.000-40.000 genes
SNP facts
http://www.ncbi.nlm.nih.gov/SNP/
• For 2 complete haplotype genomes, there are about 3 million SNP differences (>1 SNP / kb).
• Currently 3 million SNPs in the database
RefSNPs      With frequency   With genotype
3.079.086    196.054          32.101
Large-scale survey of LD
Reich et al. (2001)
Recent LD studies
• LD extends over considerable distance in most populations
• African populations show less LD than European populations
• Small, isolated populations (e.g. Saami, Evenki) show increased LD
• Founder populations (e.g. Finland, Sardinia) do not always show increased LD
• Evidence for heterogeneity in LD along chromosomes
– Haplotype blocks
– Recombination hotspots
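As a reminder of what these studies measure, the sketch below computes two standard pairwise LD statistics, D and r^2, from haplotype counts at two biallelic markers. The function name, the allele labels (A/a, B/b) and the counts are hypothetical and are not taken from the studies cited.

```python
def ld_statistics(hap_counts):
    """Return (D, r^2) for two biallelic loci.

    hap_counts maps the four haplotypes 'AB', 'Ab', 'aB', 'ab' to observed counts.
    D   = p_AB - p_A * p_B
    r^2 = D^2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    """
    n = sum(hap_counts.values())
    p_ab = hap_counts["AB"] / n
    p_a = (hap_counts["AB"] + hap_counts["Ab"]) / n
    p_b = (hap_counts["AB"] + hap_counts["aB"]) / n
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


# Hypothetical sample of 100 phased chromosomes.
d, r2 = ld_statistics({"AB": 40, "Ab": 10, "aB": 10, "ab": 40})
print(f"D = {d:.2f}, r^2 = {r2:.2f}")
```

Both statistics are zero when the two markers are independent. They require both loci to be polymorphic in the sample.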
Genetic Basis for Disease
Monogenic
Cystic Fibrosis
Huntington’s Disease
Sickle Cell Anemia
Polygenic
Alzheimer’s disease
Schizophrenia
Hereditary Heart Disease
Asthma
Cystic fibrosis: a case study
[Figure panels: Traditional analysis; Bayesian MCMC sampling]
The market
• All major pharmaceutical and many biotech companies conduct genetic studies
– Disease association (drug target identification)
– Adverse drug response (pharmacogenomics)
– Tailored drug administration
• Outsourcing of non-core activities
Timeline for drug discovery
Stage          Duration   # Targets   Population study
Discovery      5 yrs      5000        Population study I
Pre-Clinical   1 yr       50
Clinical       6 yrs      5           Population study II
Review         2 yrs      1
Marketed
Cambridge Healthtech Institute: SNP-research market could reach $1.2 billion by 2005
• Annual expenditures on SNP research:
– $158 million in 2001
– $1.2 billion in 2005 (estimated): more than 7-fold growth
• Increasing interest in pharmacogenomics, or tailoring treatment to patients based on their genomic profiles, by pharmaceutical, biotechnology, and genomic tools companies.
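The growth figure on this slide can be checked with a few lines of arithmetic. The sketch uses only the two expenditure figures quoted on the slide. The function names and the assumption of constant year-on-year growth are illustrative.

```python
def fold_growth(start, end):
    """Ratio of the final value to the initial value."""
    return end / start


def annual_growth_rate(start, end, years):
    """Constant yearly growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


spend_2001 = 158e6  # USD, from the slide
spend_2005 = 1.2e9  # USD, estimate quoted on the slide

fold = fold_growth(spend_2001, spend_2005)
rate = annual_growth_rate(spend_2001, spend_2005, 2005 - 2001)
print(f"Fold growth: {fold:.1f}")
print(f"Implied annual growth: {rate:.0%}")
```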
Factors influencing SNP research
[Figure: chart for 2000-2003, vertical axis 0-100; series: Price / SNP (cents), Identified SNPs (x100.000), Investigations (x10), Average size of investigation (x20)]
Example
• DeCode typed 10.000 markers in all Icelanders (250.000)
Needs of the market
• Detailed understanding of population biology
• Extract signals from noisy data (power)
• Efficient algorithms that provide quick and precise answers
Comparative Genomics
• Gene finding (correct annotation is crucial)
• Identifying important residues in drug targets (HIV, proteins etc.)
• Identifying regulatory sequences, networks
Future
• Disease gene finding: GeneRecon
• Database solutions
• Comparative Genomics
Haplotypes
Experimental methods of determining Haplotypes:
• Egg & Sperm Sequencing
• Cell Lines with Lost Chromosomes
• Sequencing Clones Spanning SNPs
These methods are very expensive, so computational reconstruction of haplotypes from SNPs is preferable.
Haplotypes: A G C
            T C A
SNPs:       {A,T} {C,G} {A,C}
2^(m-1) possible haplotype pairs for m heterozygous SNPs
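To make the 2^(m-1) count concrete, the sketch below enumerates every haplotype pair that is consistent with a set of unphased SNP genotypes. It only lists the candidates and does not choose among them, which is the hard part of reconstruction. The function name and input representation are hypothetical.

```python
from itertools import product


def compatible_haplotype_pairs(genotypes):
    """Enumerate unordered haplotype pairs consistent with unphased genotypes.

    genotypes: list of sets, each holding one allele (homozygous site)
    or two alleles (heterozygous site).
    """
    sites = [sorted(g) for g in genotypes]
    heterozygous = [i for i, alleles in enumerate(sites) if len(alleles) == 2]
    # Fixing the phase of the first heterozygous site avoids listing each pair twice.
    free_sites = heterozygous[1:]
    pairs = []
    for flips in product([False, True], repeat=len(free_sites)):
        flipped = dict(zip(free_sites, flips))
        first, second = [], []
        for i, alleles in enumerate(sites):
            a, b = alleles[0], alleles[-1]  # identical when the site is homozygous
            if flipped.get(i, False):
                a, b = b, a
            first.append(a)
            second.append(b)
        pairs.append(("".join(first), "".join(second)))
    return pairs


# The three SNPs from the slide: m = 3 heterozygous sites, so 2^(3-1) = 4 pairs.
for pair in compatible_haplotype_pairs([{"A", "T"}, {"C", "G"}, {"A", "C"}]):
    print(pair)
```

The pair shown on the slide, AGC and TCA, is one of the four candidates. The number of candidates doubles with each additional heterozygous site.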
Parameters
Bayesian analysis, i.e. all parameters are assigned prior distributions.
Markov Chain Monte Carlo allows the calculation of posterior (post-data) distributions of parameters and quantities of interest.
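The idea can be illustrated with a toy problem. This is a minimal sketch and not GeneRecon's sampler: a random-walk Metropolis algorithm that samples the posterior of an allele frequency p, given that k of n sampled chromosomes carry the allele, under a uniform prior. All names, the data and the tuning values are assumptions made for the example.

```python
import math
import random


def log_posterior(p, k, n):
    """Unnormalised log posterior: uniform prior on (0, 1) times binomial likelihood."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def metropolis(k, n, steps=20000, step_size=0.1, seed=1):
    """Random-walk Metropolis sampler; returns samples after discarding burn-in."""
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p, k, n)
    samples = []
    for _ in range(steps):
        proposal = p + rng.gauss(0.0, step_size)
        candidate = log_posterior(proposal, k, n)
        # Accept with probability min(1, posterior ratio); proposals outside (0, 1) are rejected.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        samples.append(p)
    return samples[steps // 4:]  # drop the first quarter as burn-in


samples = metropolis(k=30, n=100)
posterior_mean = sum(samples) / len(samples)
print(f"Posterior mean of p: {posterior_mean:.3f}")
```

For this toy model the exact posterior is Beta(k+1, n-k+1) with mean (k+1)/(n+2), so the sampler can be checked against a known answer. In realistic mapping models no closed form exists, which is why sampling is used.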
The Shattered Coalescent (Morris, Whittaker & Balding, 2002)
Advantages: Allows for multiple origins of the disease mutant and for sporadic occurrences of the disease without the mutation (phenocopies)