Plan for day 1 1. The course 1. Registration 2. Layout 3. Expectations 4. Evaluation and exam 2. What is bioinformatics? 3. Setup and connect computers LUNCH 1. 13:00. Setup and connect computers 2. Software overview 3. CLC Combined Workbench (presentation, installation, demo) 4. Install and play with general computer tools

Post on 15-Jan-2016


Page 1: Plan for day 1 1.The course 1.Registration 2.Layout 3.Expectations 4.Evaluation and exam 2.What is bioinformatics? 3.Setup and connect computers LUNCH

Plan for day 1

1. The course
   1. Registration
   2. Layout
   3. Expectations
   4. Evaluation and exam
2. What is bioinformatics?
3. Setup and connect computers

LUNCH

1. 13:00. Setup and connect computers
2. Software overview
3. CLC Combined Workbench (presentation, installation, demo)
4. Install and play with general computer tools


What is bioinformatics?

Anders Krogh & Morten Lindow
The Bioinformatics Centre
Dept of Biology
University of Copenhagen


A big change in biology has taken place

Before: Measure the expression of a single gene in a single sample

Now: Measure the expression of all genes in many samples


Mutations

Before: you mapped mutations in bacteria (lots of work, I think)

Now: you sequence the whole genome with “next-generation sequencing”


Protein interactions

Before: Find interaction partners for one protein

After: Find interaction partners for all proteins


Biology has become an information science


Genome sequencing is just the beginning

GGCAAACCCTGTGATTCAGTTTGTCTGTGATTTGCTTAACCGGGATATTTCTTCTCGACCTTTATCTGATGCTGATCGTG TTAAGATAAAAAAGGCTCTTAGAGGTGTCAAAGTTGAAGTGACTCATCGAGGAAACATGCGCCGGAAGTACCGCATTTCC GGTTTGACTGCTGTGGCCACTCGGGAATTGACATTCCCAGTAGATGAAAGAAATACTCAGAAATCTGTTGTAGAATACTT CCACGAAACATATGGTTTTCGCATTCAGCACACTCAACTACCATGCTTGCAAGTTGGGAATTCTAATAGGCCTAATTACT TACCAATGGAGGTATGCAAGATTGTTGAAGGCCAGCGGTATTCCAAAAGATTGAATGAGAGACAGATCACTGCTTTGCTG AAGGTTACCTGTCAGCGCCCGATAGATCGAGAAAAAGATATCTTACAGACGGTGCAACTCAATGATTATGCTAAAGATAA TTATGCTCAAGAGTTTGGCATCAAAATAAGTACTTCTCTGGCTTCTGTTGAGGCTCGTATACTGCCTCCTCCATGGCTTA AGTACCACGAGTCTGGAAGGGAAGGGACTTGTCTGCCACAAGTTGGTCAATGGAACATGATGAATAAGAAAATGATCAAT GGTGGAACGGTGAATAATTGGATCTGCATCAACTTTTCTAGGCAAGTGCAGGACAATCTAGCGCGTACATTTTGTCAGGA ACTTGCTCAAATGTGTTACGTATCTGGCATGGCATTTAATCCGGAACCAGTCCTCCCACCAGTCAGTGCTCGCCCTGAGC AAGTAGAGAAGGTCTTGAAGACTAGATATCATGATGCCACATCAAAACTCTCCCAAGGAAAAGAAATTGATCTGCTTATT GTCATTCTGCCCGATAATAATGGATCATTATACGGTGATTTGAAACGCATATGTGAGACTGAACTCGGCATAGTCTCTCA ATGTTGCCTGACAAAGCATGTCTTTAAGATGAGCAAACAATACATGGCTAATGTTGCGCTGAAGATTAATGTGAAGGTTG GAGGAAGAAACACAGTGCTTGTTGATGCTCTATCTAGGCGGATTCCTCTAGTCAGTGATCGACCCACCATTATATTTGGT GCTGATGTTACCCACCCTCACCCTGGAGAGGATTCAAGCCCATCTATTGCTGCTGTTGTGGCATCTCAGGATTGGCCTGA AATCACTAAATATGCTGGATTAGTTTGCGCTCAAGCGCATAGGCAGGAGCTCATTCAGGATCTGTTCAAAGAGTGGAAGG ATCCTCAGAAAGGTGTGGTGACTGGTGGCATGATAAAGGAGTTGCTCATAGCCTTCCGTAGATCAACTGGGCATAAACCA CTAAGGATCATCTTCTACAGGGATGGAGTCAGTGAGGGACAATTTTACCAAGTTTTGCTCTATGAACTTGATGCCATCCG CAAGGCCTGTGCTTCGCTGGAAGCAGGTTATCAACCACCAGTGACATTTGTGGTGGTGCAGAAGCGTCATCACACGAGGC TGTTTGCTCAGAACCACAATGATCGCCATTCGGTGGACAGAAGTGGGAATATTTTACCTGGCACTGTTGTGGACTCTAAA ATCTGCCACCCTACAGAGTTTGACTTTTACCTCTGTAGTCATGCTGGTATTCAGGGCACTTCTCGACCTGCTCATTACCA CGTTCTTTGGGATGAGAACAACTTTACTGCAGATGGACTTCAATCTCTGACCAATAACTTATGTTACACGTATGCAAGAT GCACACGCTCAGTTTCAATTGTTCCCCCTGCATATTATGCACATCTAGCAGCTTTTAGGGCTCGATTCTACATGGAGCCA GAGACATCAGACAGTGGCTCAATGGCTAGTGGGAGCATGGCACGTGGAGGTGGAATGGCTGGTAGAAGCACACGCGGGCC 
TAATGTCAATGCTGCAGTGAGGCCACTCCCAGCTCTGAAAGAGAATGTGAAGCGTGTCATGTTCTACTGCTGAGTTGATT CACCCTCTATCTATCTTTATGACCTAAATTAATGAAGAATATCATGTATGCTTTCTAAGACTTATCGTGTGTTTGGATAT TTCATCACTCTTTCTCTATGAGTATGAGATGCTTTATGACTCTTGTTTGACAACTACTAAACTTTATTATTCAAAACAGA CTTTGATCCTTTCAAAAAAAAAAAAAAAAAAAA TAGAGAGAGAGAGAAAGATATAGAGAGAACACAGAGAGGCGAGAGCGACGTAGGGTTGGTGTTTCGTACGGATTTTCTCG GTCAATCCTAGTTTCTCCGGCGAGAGATTGCTTTTCAGGAATCATCATGGTGAGAAAGAGAAGAACGGATGCTCCATCTG AAGGAGGTGAAGGCTCTGGGTCTCGTGAAGCTGGTCCAGTCTCAGGTGGTGGACGTGGTTCACAGCGAGGTGGTTTCCAG CAGGGAGGAGGACAACACCAAGGTGGAAGGGGTTATACTCCTCAACCTCAACAGGGAGGTCGTGGTGGTCGTGGATATGG GCAACCACCACAACAGCAACAACAGTATGGAGGACCACAAGAGTACCAAGGAAGAGGAAGAGGAGGACCTCCTCATCAAG GAGGTCGAGGAGGGTATGGCGGTGGCCGTGGAGGTGGACCTTCTTCTGGACCACCGCAGAGACAATCAGTTCCCGAGCTG CATCAAGCTACCTCACCTACTTATCAAGCGGTGTCTTCTCAGCCTACACTGTCTGAGGTGAGTCCTACCCAGGTACCAGA ACCTACTGTTCTGGCTCAGCAATTTGAACAACTCTCTGTTGAACAAGGAGCTCCCAGTCAGGCAATCCAGCCTATACCGT CTTCTAGCAAGGCTTTCAAGTTTCCAATGAGGCCTGGTAAAGGACAGAGTGGAAAGCGTTGCATTGTGAAGGCTAACCAT TTCTTTGCTGAACTGCCTGATAAGGATTTGCACCATTATGATGTTACCATTACTCCGGAAGTTACATCAAGGGGTGTCAA TCGTGCTGTGATGAAACAACTTGTTGATAATTATCGTGATTCTCACCTTGGAAGTCGTCTTCCAGCGTATGATGGTCGAA AAAGTCTTTACACTGCTGGTCCACTTCCCTTTAACTCCAAGGAGTTCAGAATCAATCTTCTTGACGAAGAAGTAGGGGCT GGAGGTCAAAGACGAGAAAGGGAATTTAAAGTTGTGATCAAGCTAGTTGCACGTGCTGATCTGCATCACCTAGGAATGTT TTTGGAGGGGAAACAATCAGATGCCCCACAGGAAGCTCTGCAGGTTCTTGACATTGTTCTTCGTGAGCTGCCGACCTCTA GGTATATTCCGGTGGGCCGGTCCTTTTATTCCCCTGATATAGGAAAAAAACAATCATTGGGGGATGGCTTGGAGAGCTGG CGTGGATTCTACCAAAGCATTCGTCCTACACAGATGGGCTTATCACTCAATATTGATATGTCATCGACAGCCTTCATAGA GGCAAACCCTGTGATTCAGTTTGTCTGTGATTTGCTTAACCGGGATATTTCTTCTCGACCTTTATCTGATGCTGATCGTG TTAAGATAAAAAAGGCTCTTAGAGGTGTCAAAGTTGAAGTGACTCATCGAGGAAACATGCGCCGGAAGTACCGCATTTCC GGTTTGACTGCTGTGGCCACTCGGGAATTGACATTCCCAGTAGATGAAAGAAATACTCAGAAATCTGTTGTAGAATACTT CCACGAAACATATGGTTTTCGCATTCAGCACACTCAACTACCATGCTTGCAAGTTGGGAATTCTAATAGGCCTAATTACT TACCAATGGAGGTATGCAAGATTGTTGAAGGCCAGCGGTATTCCAAAAGATTGAATGAGAGACAGATCACTGCTTTGCTG 
AAGGTTACCTGTCAGCGCCCGATAGATCGAGAAAAAGATATCTTACAGACGGTGCAACTCAATGATTATGCTAAAGATAA TTATGCTCAAGAGTTTGGCATCAAAATAAGTACTTCTCTGGCTTCTGTTGAGGCTCGTATACTGCCTCCTCCATGGCTTA AGTACCACGAGTCTGGAAGGGAAGGGACTTGTCTGCCACAAGTTGGTCAATGGAACATGATGAATAAGAAAATGATCAAT GGTGGAACGGTGAATAATTGGATCTGCATCAACTTTTCTAGGCAAGTGCAGGACAATCTAGCGCGTACATTTTGTCAGGA ACTTGCTCAAATGTGTTACGTATCTGGCATGGCATTTAATCCGGAACCAGTCCTCCCACCAGTCAGTGCTCGCCCTGAGC AAGTAGAGAAGGTCTTGAAGACTAGATATCATGATGCCACATCAAAACTCTCCCAAGGAAAAGAAATTGATCTGCTTATT GTCATTCTGCCCGATAATAATGGATCATTATACGGTGATTTGAAACGCATATGTGAGACTGAACTCGGCATAGTCTCTCA ATGTTGCCTGACAAAGCATGTCTTTAAGATGAGCAAACAATACATGGCTAATGTTGCGCTGAAGATTAATGTGAAGGTTG GAGGAAGAAACACAGTGCTTGTTGATGCTCTATCTAGGCGGATTCCTCTAGTCAGTGATCGACCCACCATTATATTTGGT GCTGATGTTACCCACCCTCACCCTGGAGAGGATTCAAGCCCATCTATTGCTGCTGTTGTGGCATCTCAGGATTGGCCTGA AATCACTAAATATGCTGGATTAGTTTGCGCTCAAGCGCATAGGCAGGAGCTCATTCAGGATCTGTTCAAAGAGTGGAAGG ATCCTCAGAAAGGTGTGGTGACTGGTGGCATGATAAAGGAGTTGCTCATAGCCTTCCGTAGATCAACTGGGCATAAACCA CTAAGGATCATCTTCTACAGGGATGGAGTCAGTGAGGGACAATTTTACCAAGTTTTGCTCTATGAACTTGATGCCATCCG CAAGGCCTGTGCTTCGCTGGAAGCAGGTTATCAACCACCAGTGACATTTGTGGTGGTGCAGAAGCGTCATCACACGAGGC TGTTTGCTCAGAACCACAATGATCGCCATTCGGTGGACAGAAGTGGGAATATTTTACCTGGCACTGTTGTGGACTCTAAA ATCTGCCACCCTACAGAGTTTGACTTTTACCTCTGTAGTCATGCTGGTATTCAGGGCACTTCTCGACCTGCTCATTACCA CGTTCTTTGGGATGAGAACAACTTTACTGCAGATGGACTTCAATCTCTGACCAATAACTTATGTTACACGTATGCAAGAT GCACACGCTCAGTTTCAATTGTTCCCCCTGCATATTATGCACATCTAGCAGCTTTTAGGGCTCGATTCTACATGGAGCCA GAGACATCAGACAGTGGCTCAATGGCTAGTGGGAGCATGGCACGTGGAGGTGGAATGGCTGGTAGAAGCACACGCGGGCC TAATGTCAATGCTGCAGTGAGGCCACTCCCAGCTCTGAAAGAGAATGTGAAGCGTGTCATGTTCTACTGCTGAGTTGATT CACCCTCTATCTATCTTTATGACCTAAATTAATGAAGAATATCATGTATGCTTTCTAAGACTTATCGTGTGTTTGGATAT TTCATCACTCTTTCTCTATGAGTATGAGATGCTTTATGACTCTTGTTTGACAACTACTAAACTTTATTATTCAAAACAGA

Where are the genes?
How are they regulated?
What do they do?
How do they interact?
How did they evolve?
What about the rest of the genome?


Experimental labs need informatics

In many labs bioinformatics is the bottleneck.

Example:
• You want to study differences in miRNA expression in cancer vs. normal tissue.
• Short RNAs are extracted and mailed to a company for sequencing.
• You get a hard disk in return, full of short sequences.
• The rest is bioinformatics.


Definition

The book: “bioinformatics involves the technology that uses computers for analysis, storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.”


Definition

Wikipedia: “… The terms bioinformatics and computational biology are often used interchangeably.

However bioinformatics more properly refers to the creation and advancement of algorithms, computational and statistical techniques, and theory to solve formal and practical problems posed by or inspired from the management and analysis of biological data.

Computational biology, on the other hand, refers to hypothesis-driven investigation of a specific biological problem using computers, carried out with experimental and simulated data, with the primary goal of discovery and the advancement of biological knowledge.”


Bioinformatics?

Search for homologs to a protein sequence

Retrieve information about a genome segment

Predict the structure of an RNA molecule

Build a phylogenetic tree connecting a set of proteins

Find differentially expressed genes using microarrays

Make a model of protein-protein interactions

Analyze experimental data in a spreadsheet

Make an equation describing a neuronal action potential

Construct cardiac blood-flow model

Differential equations to describe prey/predator dynamics


Some challenges in bioinformatics

How to fully decipher the digital content of the genome

How to analyze expression data

How to extract regulatory networks from the above

How to integrate multiple high-throughput data types

How to visualize and explore large scale multi-dimensional data

How to predict protein structure and function ab initio

How to identify signatures for cellular states (healthy vs diseased)

How to build hierarchical models across multiple scales of time and space

How to reduce complex multi-dimensional models to underlying principles

Inspired by Leroy Hood


Example

In which you will learn a bit about:

accessing and searching for information in bio-databases

what microRNAs are

Prediction of RNA structure


Imagine

You are studying the oncogene c-Myc (a transcription factor)
You have isolated a complex containing the mRNA for c-Myc

In this complex you find a small RNA
You get excited!
You manage to clone and sequence it

• caaagugcuuacagugcagguagu

Now what?


Finding it in the genome

Is this a known molecule?
Since the human genome has been fully sequenced:

• We must be able to find out where it is encoded
• Go to a genome browser

Wow! It is a microRNA
Sidestep: What are microRNAs?
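
What "finding it in the genome" means can be shown with a toy sketch: convert the sequenced small RNA to DNA letters and look for an exact match on both strands. The function names and the tiny made-up "genome" string are assumptions for illustration only; a real search runs against the full human genome through a genome browser (the deck's closing summary mentions BLAT at UCSC).

```python
# Toy sketch: locate a small RNA in a DNA string on either strand.
# All names here are hypothetical; the "genome" is a made-up string, not real data.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def rna_to_dna(rna: str) -> str:
    """Convert an RNA read (any case, with U) to upper-case DNA letters."""
    return rna.upper().replace("U", "T")

def reverse_complement(dna: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]

def find_exact_hits(query_dna: str, genome: str):
    """Return (position, strand) for every exact match on either strand."""
    hits = []
    for strand, pattern in (("+", query_dna), ("-", reverse_complement(query_dna))):
        start = genome.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(pattern, start + 1)
    return sorted(hits)

small_rna = "caaagugcuuacagugcagguagu"   # the sequence from the slide
query = rna_to_dna(small_rna)            # CAAAGTGCTTACAGTGCAGGTAGT

# Made-up genome with one copy on each strand, for demonstration only.
toy_genome = "GGGG" + query + "TTTT" + reverse_complement(query) + "CC"
hits = find_exact_hits(query, toy_genome)
print(hits)                              # [(4, '+'), (32, '-')]
```

Searching both strands matters because the sequencing read does not tell you which strand of the genome encodes the RNA.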


What are miRNAs?


The RNA revolution

Biology’s Big Bang

• 10 years ago: RNAs were considered uninteresting messengers for the proteins

• The non-coding part of the genome (98%) was considered junk

The Economist, June 2007


Beware of the RNA!

• It is your RNA that separates you from a worm – not your proteins!

• It is the RNA that regulates your genes – as much as proteins!

• New types of RNA are discovered every month
• Most of a genome is transcribed
• 98% of the genome is probably important (my guess)


The RNA operating system

Genome

Transcriptome

Proteome

Regulation by proteins

siRNA/miRNA

Massive regulation by RNA

Imprinting – methylation

Splicing

Ribozymes


MicroRNA

• Small (20-22 nt) RNAs
• Pre-miRNA forms hairpin structure
• Involved in post-transcriptional regulation and gene silencing (methylation)
• Important in development, brain, cancer, etc.
• Evolutionarily conserved (?)


miRNA logic

[Figure: miRNA biogenesis. miRNA gene (DNA sequence strip) → pri-miRNA → Drosha → pre-miRNA → export → Dicer → a microRNA → inhibits mRNA translation]


Animal & Plant miRNA


Some miRNAs occur in clusters


miRNA targets

• Very few experimentally validated targets, mostly in fly and worm

• We have to rely on bioinformatic target predictions. Probably very noisy.
  – 10% of all genes regulated by miRNAs (Enright et al., 2003)
  – 30% of all genes regulated by miRNAs (Lewis et al., 2004)
  – ~all genes regulated to some degree (others)


Three main types of target sites

(a) Canonical sites: good or perfect complementarity. Characteristic bulge in the middle.

(b) Dominant seed sites: perfect seed (bases 2-8) match, but poor 3’ end complementarity.

(c) Compensatory sites: mismatch or wobble in seed region. Compensate at the 3’ end.

From Mazière, P. and Enright, A., Drug Discovery Today, 2007


Principal criteria to predict miRNA targets

• Seed complementarity: the seed region (bases 2-8) of the miRNA is complementary to a site in the 3’ UTR of the target.

• Conservation: target sites are conserved in other genomes. (May miss targets of recently evolved miRNAs.)

• Target multiplicity: multiple binding sites for the miRNA
• Thermodynamics of the RNA-RNA duplex
• Target structure: lack of strong secondary structure at the miRNA-target binding site may be an important feature
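
The first and third criteria (seed complementarity and target multiplicity) are easy to sketch in code. The snippet below derives the mRNA motif that pairs perfectly with miRNA bases 2-8 and lists every position where it occurs in a 3’ UTR. The function names and the toy UTR are assumptions for illustration; real predictors also weigh conservation, duplex thermodynamics and target structure, which this sketch ignores.

```python
# Minimal sketch of seed-site scanning (hypothetical helper names, toy UTR).
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def seed_site(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return the mRNA motif (5'->3') that pairs perfectly with
    miRNA bases start..end (1-based, inclusive)."""
    seed = mirna.upper().replace("T", "U")[start - 1:end]
    # The target strand is antiparallel, so complement and reverse the seed.
    return seed.translate(RNA_COMPLEMENT)[::-1]

def find_seed_sites(mirna: str, utr: str):
    """Return 0-based start positions of perfect seed matches in the UTR."""
    motif = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    width = len(motif)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == motif]

mirna = "caaagugcuuacagugcagguagu"        # the small RNA from the earlier slide
motif = seed_site(mirna)                  # GCACUUU

# Made-up 3' UTR containing two seed sites, for demonstration only.
toy_utr = "AA" + "GCACUUU" + "AGG" + "GCACUUU" + "CC"
sites = find_seed_sites(mirna, toy_utr)
print(motif, sites)                       # GCACUUU [2, 12]
```

The number of positions returned is a crude stand-in for target multiplicity: more sites in one UTR would count as stronger evidence under that criterion.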


Overlap between methods

Hammell et al., Nature Methods 5:813-819


Finding it in the genome

Is this a known molecule?
Since the human genome has been fully sequenced:

• We must be able to find out where it is encoded
• Go to a genome browser

Wow! It is a microRNA
Sidestep: What are microRNAs?

Let’s assume this was NOT known already


RNA folding

Can it fold as a hairpin?
• Get the sequence with flanks (from genome browser)
• Fold it at Vienna RNA

[Figure labels: RNA, ?]

More details: RNA-lecture
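
To give a feel for what a folding program computes, here is a deliberately simplified sketch. It is not the method used by the Vienna RNA tools, which minimise a thermodynamic energy model; instead it uses Nussinov-style dynamic programming to maximise the number of nested base pairs (A-U, G-C, G-U). The function names and the minimum loop size of 3 are assumptions for illustration.

```python
# Simplified stand-in for RNA folding: maximise the number of nested base pairs.
# This is a Nussinov-style sketch, not an energy-based folding algorithm.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def can_pair(a: str, b: str) -> bool:
    """True for Watson-Crick and G-U wobble pairs."""
    return (a, b) in PAIRS

def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested pairs, with at least min_loop unpaired
    bases between the two partners of any pair."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                    # case 1: base j stays unpaired
            for k in range(i, j - min_loop):          # case 2: base j pairs with base k
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0

# A toy hairpin: three G-C pairs closing a four-base loop.
print(max_base_pairs("GGGAAAUCCC"))   # 3
```

A sequence that can form a long run of nested pairs around a single loop is a hairpin candidate; a real assessment of a miRNA precursor should use an energy-based tool as the slide suggests.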


Summary so far

[Figure: DNA sequence strip labeled “miRNA gene”, with the labels RNA, protein and “A microRNA”. Steps covered so far: Identify transcript; Prediction of precursor structure.]


What controls the controller?

Find the transcription start site
• Use and integrate existing data
  • Genome browser: known transcripts, genome annotation (from cDNA data)
  • Auxiliary information (not yet in genome browser)
    • Known 5’ ends (from RIKEN CAGE tags)
    • Known RNA polymerase II binding sites (from ChIP)
• Use or construct predictive models
  • Machine learning / inference (HMM, neural nets, SVM, GLM)
  • You need a bioinformatician for this!!

[Figure: DNA sequence strip with miR-17 marked and question marks at the unknown ends of its transcript: ?-- miR-17 --?]


Summary so far

[Figure: DNA sequence strip labeled “miRNA gene”, with the labels RNA, protein and “A microRNA”. Steps covered so far: Identify transcript; Prediction of precursor structure; Prediction of binding sites.]


Prediction of transcription factor binding sites

CCCTACAGAGTTTGACTTTTACCTCTGTAGTCATGCTGGTATTCAGGGCACTTCTCGACCTGCTCATTACCACGTTCTTTGGGATGAGAACAACTTTACTGCAGATGGACTTCAATCTCTGACCAATAACTTATGTTACACGTATGCAAGAT

Do certain combinations of TFs occur together?

In certain groups of genes?
Is this significant?

What does it mean biologically?

In another lecture: Motif Search

UCSC transcription factor track


Prediction of microRNA targets

[Figure: DNA sequence strip labeled “Transcriptional unit”, with the labels RNA and “?”]


Prediction of microRNA targets ?

RNAs interact by forming base pairs (A-U, C-G, G-U)
Align microRNA and target (more details in Alignment-lecture)
Build in biology:

• Some parts of the miRNA are more important than others
• Binding sites conserved in evolution tend to be more functional

MiRanda predictions
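
The idea of "aligning microRNA and target with some biology built in" can be sketched as a scoring function. This is not miRanda's algorithm or scoring scheme; it is an ungapped toy in which Watson-Crick pairs score higher than G-U wobble pairs, mismatches are penalised, and positions 2-8 of the miRNA count double. All score values, names and the short test sequences are assumptions chosen for illustration.

```python
# Toy duplex scoring (not miRanda): ungapped pairing with a heavier-weighted seed.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}

def pair_score(m: str, t: str) -> int:
    """Assumed scores: Watson-Crick 2, G-U wobble 1, mismatch -1."""
    if (m, t) in WATSON_CRICK:
        return 2
    if (m, t) in WOBBLE:
        return 1
    return -1

def duplex_score(mirna: str, site: str, seed_weight: int = 2) -> int:
    """Score a miRNA (5'->3') against a target site (5'->3').
    The site is reversed so the two strands line up antiparallel."""
    mirna = mirna.upper().replace("T", "U")
    site = site.upper().replace("T", "U")[::-1]
    total = 0
    for pos, (m, t) in enumerate(zip(mirna, site), start=1):
        weight = seed_weight if 2 <= pos <= 8 else 1   # seed positions count more
        total += weight * pair_score(m, t)
    return total

toy_mirna = "UGAGGUAG"                       # short made-up example, 8 nt
perfect = duplex_score(toy_mirna, "CUACCUCA")   # every position Watson-Crick
wobble = duplex_score(toy_mirna, "CUACCUUA")    # G-U wobble at miRNA position 2
mismatch = duplex_score(toy_mirna, "CUACCUAA")  # G-A mismatch at miRNA position 2
print(perfect, wobble, mismatch)             # 30 28 24
```

Because the seed is weighted, a single wobble or mismatch there costs twice what it would at the 3’ end, which is one simple way to encode "some parts of the miRNA are more important than others".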


Regulatory systems

A microRNA

Regulates other RNAs (prevents them from being translated to proteins)

[Figure: DNA sequence strip labeled “Transcriptional unit”, with the label RNA]

Maybe feedback regulation?


A feedback loop? miR-155 and Bach2

[Figure: DNA sequence strip labeled “BIC - mir-155”, with Bach2-binding sites (repressor), Bach2 proteins and miR-155]

“Lack of BIC and microRNA miR-155 expression in primary cases of Burkitt lymphoma.” Genes Chromosomes Cancer. 2006 Feb;45(2):147-53

“These results indicate that BACH2 plays important roles in regulation of B cell development.” Oncogene. 2000 Aug 3;19(33):3739-49


Bioinformatics is like LEGO®

Build using different bricks to get biological knowledge
• Databases of experimental data (sequence, genome annotation, molecule interactions, etc.)
• Scan for transcription factor binding sites
• RNA folding and classification
• miRNA target prediction

Or design your own LEGO bricks!
• Enter the master’s program


Masters of Bioinformatics


What you have seen

Database
• UCSC human genome browser
• Using known information to find likely transcription start site
• The horror of ids/names

Alignment and sequence search
• Sequence search with BLAT against human genome
• miRanda - Alignment to find miRNA targets

RNA
• RNA folding of miRNA-precursor

Promoter analysis
• Predicted transcription factor binding sites


Plan for day 1

1. The course
   1. Registration
   2. Layout
   3. Expectations
   4. Evaluation and exam
2. What is bioinformatics?
3. Setup and connect computers

LUNCH

1. 13:00. Setup and connect computers
2. Software overview
3. CLC Combined Workbench (presentation, installation, demo)
4. Install and play with general computer tools