Bioinformatics and the Language of DNA Aydin Tozeren aydin.tozeren@drexel.edu Center for Integrated Bioinformatics Drexel University, Philadelphia, PA, USA www.gpba-bio.com


Post on 01-Jan-2016


Page 1: Bioinformatics and the Language of DNA Aydin Tozeren aydin.tozeren@drexel.edu Center for Integrated Bioinformatics Drexel University, Philadelphia, PA,

Bioinformatics and the Language of DNA

Aydin Tozeren
aydin.tozeren@drexel.edu

Center for Integrated Bioinformatics
Drexel University, Philadelphia, PA, USA
www.gpba-bio.com

Page 2

Yeni Hayat/ New Life by Orhan Pamuk

• Bir kitap okudum butun hayatim degisti

• I read a book and my whole life changed

Page 3

Living systems have the same building blocks: C, N, H, O, P, S, and minerals.

Five different macromolecules: DNA, RNA, Protein, Carbohydrates, Fat (lipid)

Information Flow: DNA to RNA (template) to Protein, back to DNA.

DNA has four basic building blocks, arranged in a sequence.

Proteins have 20 building blocks, arranged in various orders in a linear chain.
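The first step of this information flow can be sketched in a few lines of Python. This is only an illustration: the function names and the six-letter example strand are made up here, and the sketch assumes the input is the coding strand written 5' to 3'.

```python
# Map each DNA letter to its base-pairing partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def transcribe(coding_strand):
    """mRNA carries the coding strand's letters with U in place of T."""
    return coding_strand.upper().replace("T", "U")

# Hypothetical six-letter coding strand, invented for this example.
strand = "ATGGCC"
print(reverse_complement(strand))  # template strand
print(transcribe(strand))          # messenger RNA
```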

Page 4

Proteins: Molecular Machines

• There are about 40,000 types of proteins in a cell.

• Proteins change configuration upon activation, resulting in movement and motion. They are responsible for heartbeat, neuron firing, muscular movement, etc.

Page 5

DNA can be viewed as a long string of four letters in various combinations: ACGTTACCGCGCTCA.....

Billions of letters with no comma or period, just arranged serially.
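Because DNA can be handled as a plain string, simple statistics are easy to compute. The following is a minimal Python sketch using the short example string from this slide; the function name `base_composition` is an illustrative choice, not something from the talk.

```python
from collections import Counter

def base_composition(dna):
    """Return the fraction of each letter (A, C, G, T) in a DNA string."""
    dna = dna.upper()
    counts = Counter(dna)
    total = len(dna)
    return {base: counts.get(base, 0) / total for base in "ACGT"}

# Example string taken from the slide text.
example = "ACGTTACCGCGCTCA"
composition = base_composition(example)
# GC content: share of letters that are G or C.
gc_content = composition["G"] + composition["C"]
print(composition)
print(gc_content)
```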

Genome: Collection of DNA in the nucleus of a cell.

Next-generation sequencers can sequence a human genome in 6 weeks.

Page 6

BACTERIA have a single (circular) DNA molecule organized into OPERONS.

EUKARYOTES (plants, yeast, mammals) have DNA organized into chapters (single linear DNA molecules or chromosomes).

DNA: Book of Life

Each and every cell in the body has the same book of life

Page 7

DNA is the hard drive and the information storage unit of the living. Cells from different tissue types may use (read) different sections (pages) of the DNA (book of life).

DNA varies only slightly between individuals in a species.

The sequence of letters along DNA is similar among species such as dogs, humans, monkeys, and even mice.

Page 8

A gene is a segment of DNA that provides a recipe for a protein. Typically it is 300 letters (nucleotides) long but can be much longer. A group of three letters along a gene (a CODON) represents an amino acid, one of the 20 building blocks of proteins.
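Reading a gene three letters at a time can be sketched as follows. This is a minimal Python illustration that assumes the standard genetic code and a sequence already in the correct reading frame; the 15-letter fragment is made up for the example.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order
# for each of the three codon positions ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a coding DNA string codon by codon, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical gene fragment, invented for this example.
print(translate("ATGTCTGATGCTTAA"))  # prints MSDA
```

Each amino acid is written with its one-letter code, the same alphabet used in protein sequence records.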

Page 9

gatcaggtcc ttatgatgac agattggggc ccactttgtt gtgctttttc ttattggttg ctgtcattat caactttata ttaagattga agtacaatga cgctaacact aagttatgaa attgtaattc caatatcgta agcgtgggtt acgcacaaac tgtattttca agatgctcac aaataattta gtttcatata tacgcatata tagaaagtat ccatctatag gtaatcatga acaataaaaa tattcacgtt tcaggagcta ttgtttgtac tcattacgtt tttggatatc aagttgaaaa tcagcccctt tcactagata tcaagcgcta taaaaaaatt ttaatttcga tgaggcatct ttcttttctc ttgtggctat gtaagcctaa gaagccgttt acacatcaat gataaataag tatacaaaaa gggttccatt ttttttttgg ccgctaccgg actagcaagg gcctaatggt acgctgagcg tagtacaacc aagcgcttgt  

Page 10

/translation="MSDAVTIRTRKVISNPLLARKQFVVDVLHPNRANVSKDELREKLAEVYKAEKDAVSVFGFRTQFGGGKSVGFGLVYNSVAEAKKFEPTYRLVRYGLAEKVEKASRQQRKQKKNRDKKIFGTGKRLAKKVARRNAD"

Page 11

EUKARYOTE GENE SEQUENCES ARE HIGHLY CONSERVED BUT VIRAL SEQUENCES VARY WITH TIME

Sequence measurement is fast and accurate with next-generation sequencing.

Page 12

Will Dampier: SNP Islands on HIV-1 Genome

Mutation density varies from one HIV-1 protein to another. The lower the mutation density, the more important the protein is for viral survival.
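One simple way to quantify mutation density is to count, column by column in a multiple sequence alignment, how many sequences differ from the most common letter, and then average over each region. The Python sketch below is an assumption about how such a measure could be computed, not the method used in the study; the toy alignment and the region boundaries are invented.

```python
def mutation_density(aligned):
    """Fraction of sequences that differ from the column consensus, per position."""
    density = []
    for column in zip(*aligned):
        # Most common letter in this alignment column.
        consensus = max(set(column), key=column.count)
        mismatches = sum(1 for letter in column if letter != consensus)
        density.append(mismatches / len(column))
    return density

def region_density(density, start, end):
    """Mean mutation density over the half-open interval [start, end)."""
    return sum(density[start:end]) / (end - start)

# Toy alignment, invented for illustration (not real HIV-1 data).
alignment = [
    "ATGGCGTTAC",
    "ATGGCGTAAC",
    "ATGGCGCTAC",
    "ATGGCGTTGC",
]
density = mutation_density(alignment)
# Hypothetical region boundaries standing in for two proteins.
conserved = region_density(density, 0, 6)
variable = region_density(density, 6, 10)
print(conserved, variable)
```

In this toy case the first region has no variation at all, while the second region has scattered substitutions and therefore a higher density.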

Page 13

Will Dampier: Homology Islands Along HIV-1 Genome

Page 14

Will Dampier: Invariant Sequences on HIV-1

• Original Sequence

• AAGGAGAGAGATGGGTGCGAGAGCGTC AATAAAATAGTAAGAATGTA TAGAAGAAATGATGACAGCATG CAGGCTAATTTTTTAGGGAA ACAGGAGCAGATGATACAGT ATGGAAACCAAAAATGATAGG ATTGGAGGAAATGAACAAGT TTAGCAGGAAGATGGCCAGT ATTCCCTACAATCCCCAAAG CACAATTTTAAAAGAAAAGGGGGGATTGGGGGG TACAGTGCAGGGGAAAGAATA AAAATTCAAAATTTTCGGGT TTCAAAATTTTCGGGTTTATT AATTTTCGGGTTTATTACAG CTCTGGAAAGGTGAAGGGGCAGTAGTAAT

Page 15

Eukaryotic linear binding motifs (ELMs): ISNP, LLARKQ, FVVDV

ELMs are sequence segments of a protein, 4 to 7 amino acids long. They have been shown to be involved in binding interactions with other proteins in the same species.

There are about 130 such motifs conserved across eukaryotes.
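The three example motifs all occur near the start of the protein translation shown on Page 10. A minimal Python sketch of locating them is given below. It uses exact string matching as a simplification (motif databases typically describe ELMs as patterns rather than fixed strings), and the function name `find_motifs` is illustrative.

```python
def find_motifs(protein, motifs):
    """Return {motif: [0-based start positions]} for exact matches, overlaps included."""
    hits = {}
    for motif in motifs:
        positions = []
        start = protein.find(motif)
        while start != -1:
            positions.append(start)
            start = protein.find(motif, start + 1)
        hits[motif] = positions
    return hits

# First 30 residues of the translation shown on Page 10.
protein_start = "MSDAVTIRTRKVISNPLLARKQFVVDVLHP"
hits = find_motifs(protein_start, ["ISNP", "LLARKQ", "FVVDV"])
print(hits)  # {'ISNP': [12], 'LLARKQ': [16], 'FVVDV': [22]}
```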

Page 16

Yichuan Liu: Domain/Motif Stability Across Eukaryotes

Page 17

Perry Evans: Conserved ELM locations on HIV-1 ENV

Page 18

Will Dampier: Patient Progression

Page 19

Will Dampier: Predictive Ability

Page 20

Will Dampier: ELM Locations

 

Page 21

HIV-1 Sequence Motifs

Conserved regions can be targeted with microRNA to prevent HIV-1 from multiplying.

The presence of linear binding motifs at certain locations in the alignment is correlated with the severity of the disease.

Given the sequence of a virus, we can predict motifs on the sequence relevant to clinical outcome.

Page 22

aydin.tozeren@drexel.edu

www.gpba-bio.com

• VIRUS/HOST CELL

• CELL-CELL

• CELL EXTRACELLULAR MATRIX

Page 23

Kar / Snow, by Orhan Pamuk

• Kadife, Lacivert ve Tiyatro trubu Karsta (Kadife, Lacivert, and the theatre troupe in Kars)

• People in Kars become obsessed with a visiting theatre group, anticipating and watching a new show every night. It is the talk of the town.

Page 24

Virus proteins

• Virus proteins hijack binding motifs found in the host cells.

• One mode of protein interaction is due to binding of an ELM on one protein to a domain on the other.

Page 25

Evans: Conserved ELM locations on HIV-1 NEF

Page 26

HIV Host Protein Interactions

Page 27

Page 28

Evans: Host-pathogen KEGG proj.

 

Page 29

Given Viral Sequence

• We can determine with good accuracy the host proteins targeted by the virus.

• One can then search for optimized therapies for the virus.

Page 30

Yichuan Liu: Predicting Protein Interactions

Page 31

Yichuan Liu: Heat Graph of PPI separated by BP GO terms in 5 different Cell Compartments

Page 32

Machine learning

What are the sequence motifs that are enriched in proteins known to interact with other proteins?

Answer: ELMs and Counter Domains explain only 20% of known protein interactions. Therefore the language and grammar of crosstalk between two proteins are yet to be discovered.

Page 33

Microarray Chips

A slide with thousands of dots, each dot sticky for the product of one gene. The higher the number of copies of the gene product, the shinier the dot. The question is: which subset of genes should we use to differentiate between disease subtypes?
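One common starting point for choosing such a subset is to rank genes by how strongly their expression separates two groups of samples. The Python sketch below scores each gene by the absolute difference of group means divided by the pooled standard deviation. It is an illustrative assumption, not the selection method used in the talk, and the gene names and values are invented.

```python
from statistics import mean, stdev

def rank_genes(expression, labels, top_n):
    """Rank genes by absolute standardized mean difference between two classes.

    expression: {gene_name: [value per sample]}
    labels: list of 0/1 class labels, one per sample
    """
    scores = {}
    for gene, values in expression.items():
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        n0, n1 = len(group0), len(group1)
        # Pooled variance of the two groups.
        pooled_var = (
            (n0 - 1) * stdev(group0) ** 2 + (n1 - 1) * stdev(group1) ** 2
        ) / (n0 + n1 - 2)
        spread = pooled_var ** 0.5
        diff = abs(mean(group1) - mean(group0))
        scores[gene] = diff / spread if spread > 0 else 0.0
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy expression values for three hypothetical genes across six samples.
expression = {
    "geneA": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "geneC": [3.0, 1.0, 2.0, 2.5, 3.5, 1.5],
}
labels = [0, 0, 0, 1, 1, 1]  # two hypothetical disease subtypes
selected = rank_genes(expression, labels, top_n=2)
print(selected)
```

With real microarray data, thousands of genes are scored at once, so a correction for multiple testing would normally accompany any such ranking.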

Page 34

Noor: Examples of Gene Enrichment

Page 35

Each microarray experiment provides thousands of values for the activities of genes in a specific tissue at a given time.

Page 36

Adam Ertel: Switch-like gene expression

Page 37

Adam Ertel: Switch-like genes involved in cell communication pathways

Page 38

Adam Ertel: Clustering tissue by switch high/low state

Page 39

1265 Bimodal Genes

ECM-MEM Bimodal Genes

Michael Gormley: Model-based clustering of tissue type

K-Means, Hierarchical, Model-Based

Page 40

Michael Gormley: Bimodal genes expressed in the “on” mode in specific tissues

Page 41

Michael Gormley: Bimodal genes expressed in the “on” mode in HIV infection

Page 42

Michael Gormley: Model-based clustering of infectious disease

Page 43

Parameters

• Effect size (μ1-μ2/σ2)

• Regression coefficients (β)

• Number of samples (n)

• Number of genes (p)

• Number of significant genes (M)

• Number of selected features (N)

Michael Gormley: Simulation of supervised classification with bimodal genes

Page 44

Mahdi Sarmady: Sample Model

1. Species and interaction initial value: inactive

2. Effect of TLR1/TLR2 activation by binding of a bacterial lipoprotein

Page 45

Noor: Model and Data Description

• Main Model Description:

o Comprises 78 flux equations used to simulate the kinetics of 98 variables found in blood, cytoplasm, mitochondrial intermembrane space, and matrix

o Simulations take into consideration significant binding of reactants to the cations H+, K+, and Mg2+, and are therefore pH sensitive

• Data Description:

o Microarray data were collected for each of the 4 tissues, and significantly altered genes representing enzymes from the aforementioned metabolic pathways were identified

o Fold change values from the SAM test were used to adjust enzymatic Vmax values, assuming Vmax is directly proportional to enzyme concentration: Vmax = k[E]
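The Vmax adjustment can be illustrated with a small Python sketch. If Vmax = k[E], then a fold change f in enzyme level scales Vmax by the same factor f. The single-substrate Michaelis-Menten rate law used here is a simplified stand-in for the model's flux equations, and every number is hypothetical.

```python
def adjusted_vmax(baseline_vmax, fold_change):
    """Scale Vmax by the expression fold change, assuming Vmax = k[E]."""
    return baseline_vmax * fold_change

def michaelis_menten_flux(vmax, km, substrate):
    """Single-substrate Michaelis-Menten rate: v = Vmax * S / (Km + S)."""
    return vmax * substrate / (km + substrate)

# Hypothetical values, chosen only to make the arithmetic easy to follow.
baseline_vmax = 10.0  # arbitrary units
fold_change = 1.5     # enzyme gene up-regulated 1.5-fold
km = 2.0
substrate = 2.0

baseline_flux = michaelis_menten_flux(baseline_vmax, km, substrate)
new_vmax = adjusted_vmax(baseline_vmax, fold_change)
new_flux = michaelis_menten_flux(new_vmax, km, substrate)
print(baseline_flux, new_flux)  # 5.0 7.5
```

Because the rate law is linear in Vmax, the flux at a fixed substrate concentration scales by the same fold change as the enzyme level.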

Page 46

Glycolysis: Glucose

• Hexokinase: A = Mg2+-bound ATP, B = GLC_c, P = G6P_c, Q = Mg2+-bound ADP

• GLUT4: A = GLC_b, P = GLC_c

Noor: Examples of ODEs Used

Page 47

TCA: Fumarate

• Succinate Dehydrogenase: A = SUC_x, B = COQ_x, P = QH2_x, Q = FUM_x

Noor: Examples of ODEs Used

Page 48

Mitochondria Energy Flux

Page 49

Pathway for Diabetes

Page 50

Genomics Summary

• Given: Sequence Data

• Microarray Data

• A priori information (curated literature, pathways)

• Show: Host proteins targeted by virus

• Discover new protein-protein interactions

• Investigate side effects and treatment potential of drugs

Page 51

Sevgi Soysal

• Kadinin Adi Yok!

• A Woman Has No Name!

• More and more women’s names are embedded onto the marble stones of science.