next generation sequencing & transcriptome analysis
DESCRIPTION
How to use next generation sequencing in transcriptomics and how to analyse those data.
TRANSCRIPT
NEXT GENERATION SEQUENCING
AND HOW TO USE THE DATA GENERATED
FOR TRANSCRIPTOMICS
METHODS
454 SEQUENCING
SOLEXA / ILLUMINA
SOLID
454 SEQUENCING
SEQUENCING BY SYNTHESIS
PYROSEQUENCING
> 400 BASEPAIRS IN A SINGLE READ
454 SEQUENCING
REPEATS OF SINGLE NUCLEOTIDES ARE DETECTED BY SIGNAL STRENGTH
WORKS FOR UP TO 8 CONSECUTIVE BASES
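The homopolymer rule above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the flow order `TACG`, the normalised signal values and the function name `call_bases` are hypothetical and are not taken from the 454 software. Each flow signal is rounded to the nearest integer to estimate the homopolymer length, and lengths above 8 are flagged as unreliable, following the limit stated on the slide.

```python
def call_bases(signals, flow_order="TACG", max_reliable=8):
    """Turn normalised flow signals into a base sequence.

    signals: one non-negative light intensity per nucleotide flow, scaled so
             that a single incorporated base gives a value close to 1.0.
    Returns the called sequence and the indices of flows whose estimated
    homopolymer length exceeds max_reliable.
    """
    sequence = []
    unreliable = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]  # nucleotide offered in this flow
        n = int(signal + 0.5)                   # round to the nearest base count
        if n > max_reliable:
            unreliable.append(i)
        sequence.append(base * n)
    return "".join(sequence), unreliable


# Hypothetical signals for eight flows (T, A, C, G, T, A, C, G).
seq, flagged = call_bases([1.02, 0.05, 2.10, 0.97, 0.00, 3.04, 0.10, 1.00])
print(seq, flagged)
```

The weakness of this scheme is visible in the rounding step: the longer the homopolymer, the smaller the relative difference between neighbouring signal levels, so miscounts become more likely.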
SOLEXA / ILLUMINA
AGAIN: SEQUENCING BY SYNTHESIS
ANOTHER DETECTION-APPROACH
UP TO 100 BASEPAIRS IN A SINGLE READ
SOLEXA / ILLUMINA
[FIGURE: SEQUENCE OF FOUR SLIDES SHOWING LABELLED NUCLEOTIDES (A, C, G, T); IMAGES NOT PRESERVED]
ADVANTAGES OF NGS
CAN RUN IN PARALLEL
PREPARATION CAN BE AUTOMATED
MUCH CHEAPER WHEN COMPARED TO TRADITIONAL SEQUENCING
TRANSCRIPTOME ANALYSIS
REVEALS EXPRESSION CHANGES ACROSS:
DIFFERENT CELL TYPES
DIFFERENT ENVIRONMENTAL CONDITIONS
DISEASES
DIFFERENT DEVELOPMENTAL STAGES
TRANSCRIPTOME ANALYSIS
CAN BE USED TO IDENTIFY NEW GENES
CAN BE APPLIED TO NON-MODEL ORGANISMS
HOW TO ANALYSE TRANSCRIPTOMES
TRADITIONALLY: EXPRESSED SEQUENCE TAGS (ESTS)
USING NGS: RNA-SEQ
FIRST STEP: GET THE DATA
ESTS
DONE USING SHOTGUN-SEQUENCING
TAKES CLONES OF EXPRESSED MRNA
CHEAP TO PRODUCE
RNA-SEQ
SAME PRINCIPLE:
GET AVAILABLE MRNA
THEN SEQUENCING IN PARALLEL VIA NGS
RNA-SEQ == EST + NGS
HOW TO ANALYSE TRANSCRIPTOMES
ASSEMBLY OF READS
DETECTION OF SNPS
GENE ANNOTATION
DETECTION OF OPEN READING FRAMES
DETECTION OF HOMOLOGOUS GENES
ASSEMBLY
AVAILABLE TOOLS:
CAP3
MIRA
...
CAP3
SMITH-WATERMAN TO CLIP BAD ENDINGS
GLOBAL ALIGNMENT TO FIND FALSE OVERLAPS
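To illustrate why a local alignment is useful for clipping, here is a minimal Smith-Waterman sketch in Python. It is not CAP3's implementation: the scoring values (match +2, mismatch -1, gap -2) and the function name `smith_waterman` are assumptions chosen for the example. The function returns the best local score together with the aligned interval in each sequence; everything outside that interval is a candidate for clipping.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, (start_a, end_a), (start_b, end_b)) using 0-based,
    end-exclusive coordinates.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; scores never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score reaches zero.
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, best_pos[0]), (j, best_pos[1])


# The first read has poor ends (TTTT / AA) around a region matching the second.
print(smith_waterman("TTTTACGTACGAA", "ACGTACG"))
```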
MIRA
COMBINES ASSEMBLY & SNP-DETECTION
USES:
TRACE FILES
TEMPLATE INSERT INFORMATION
REDUNDANCY
MIRA
FAST READ COMPARISON TO DETECT POTENTIAL OVERLAPS
CONFIRMS OVERLAPS USING SMITH-WATERMAN AND CREATES ALIGNMENTS
ASSEMBLES READ-PAIRS BY FINDING BEST PATH
CHECKS ASSEMBLIES FOR ERRORS AND BEGINS AGAIN
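The first workflow step, a fast comparison that only proposes potential overlaps, can be sketched with a shared k-mer index. This is an assumption about how such a filter might look, not a description of MIRA's internals; the function name `candidate_overlaps`, the k-mer length and the example reads are hypothetical. Pairs reported here would still have to be confirmed by a proper alignment such as Smith-Waterman.

```python
from collections import defaultdict
from itertools import combinations


def candidate_overlaps(reads, k=8, min_shared=1):
    """Return {(name1, name2): shared_kmer_count} for read pairs that share
    at least min_shared distinct k-mers."""
    # Index: k-mer -> names of reads containing it.
    index = defaultdict(set)
    for name, seq in reads.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)

    # Count distinct shared k-mers per read pair.
    shared = defaultdict(int)
    for names in index.values():
        for pair in combinations(sorted(names), 2):
            shared[pair] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}


reads = {
    "r1": "ACGTACGTTGCA",
    "r2": "CGTTGCAGGATC",  # starts with the last 7 bases of r1
    "r3": "TTTTTTTTTTTT",  # unrelated read
}
print(candidate_overlaps(reads, k=6))
```

The point of the filter is cost: building the index is linear in the total read length, so the expensive alignment is only run on pairs that already share some sequence.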
MIRA: THE WORKFLOW
MIRA
RESULTS:
CONSENSUS CONTIGS MADE OF READS THAT OVERLAP
SNPS THAT ARE CALLED DURING ASSEMBLY PROCESS
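How a consensus and SNP candidates fall out of overlapping reads can be shown with a simple column-wise vote. This is a deliberately naive sketch, not MIRA's algorithm: it assumes the reads are already aligned and padded with `-` to equal length (for gaps and for uncovered positions), and the threshold `min_minor` and the function name `consensus_and_snps` are hypothetical.

```python
from collections import Counter


def consensus_and_snps(aligned_reads, min_minor=2):
    """Majority-vote consensus plus positions where a second base is
    supported by at least min_minor reads.

    Returns (consensus, [(position, major_base, minor_base), ...]).
    """
    consensus, snps = [], []
    for pos, column in enumerate(zip(*aligned_reads)):
        counts = Counter(base for base in column if base != "-")
        if not counts:
            consensus.append("N")  # no read covers this column
            continue
        ranked = counts.most_common()
        consensus.append(ranked[0][0])
        # A well-supported second base marks a SNP candidate.
        if len(ranked) > 1 and ranked[1][1] >= min_minor:
            snps.append((pos, ranked[0][0], ranked[1][0]))
    return "".join(consensus), snps


reads = [
    "ACGTAC--",
    "ACGTACGT",
    "--GAACGT",
    "ACGAACGT",
    "--GTACGT",
]
print(consensus_and_snps(reads))
```

Requiring at least two supporting reads for the minor base is the simplest way to separate a real polymorphism from a single sequencing error; real tools also weigh base qualities.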
SNP DETECTION
TOOLS:
MIRA
QUALITYSNP
AND SOME MORE
QUALITYSNP
USES CAP3-FILES
INPUT: CLUSTERS OF POTENTIAL HAPLOTYPES
CALCULATES SIMILARITY BETWEEN SEQUENCES TO CONSTRUCT HAPLOTYPES AND REMOVES PARALOGS
QUALITYSNP
REMOVES HAPLOTYPES THAT CONSIST OF ONLY ONE SEQUENCE
DETECTS SYNONYMOUS AND NON-SYNONYMOUS SNPS
PROVIDES A WEB-FRONTEND CALLED HAPLOSNPER
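The distinction between synonymous and non-synonymous SNPs comes down to whether the affected codon still encodes the same amino acid. The sketch below shows that check with the standard genetic code. It is not QualitySNP's code; it assumes the coding sequence is known and starts in frame, and the function name `classify_snp` and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(cds, pos, alt):
    """Classify a single-base substitution in an in-frame coding sequence.

    cds: coding sequence starting at codon position 1
    pos: 0-based position of the SNP within cds
    alt: alternative base
    Returns (label, reference_amino_acid, alternative_amino_acid).
    """
    start = pos - pos % 3                 # first base of the affected codon
    ref_codon = cds[start:start + 3]
    offset = pos - start
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    label = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return label, ref_aa, alt_aa


cds = "ATGGCTGAA"                   # Met-Ala-Glu
print(classify_snp(cds, 5, "C"))    # GCT -> GCC
print(classify_snp(cds, 6, "A"))    # GAA -> AAA
```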
HOMOLOGY DETECTION
FINDS GENES THAT SHARE A COMMON ANCESTOR
USUALLY ONE SEARCHES AGAINST A DATABASE
HOMOLOGY DETECTION
DIFFERENT KINDS OF SEARCHES:
PROTEIN AGAINST PROTEIN
NUCLEOTIDE AGAINST NUCLEOTIDE
PROTEIN AGAINST NUCLEOTIDE
NUCLEOTIDE AGAINST PROTEIN
HOMOLOGY DETECTION
TOOLS:
BLAST
FASTX / FASTY
HMMER
PATTERNHUNTER
BLAST
AVAILABLE FOR ALL TYPES OF COMPARISONS
ONE OF THE OLDEST ALGORITHMS
WIDELY USED
FAVOURS SPEED OVER SENSITIVITY
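The four kinds of comparison listed earlier map onto four programs of the NCBI BLAST suite. The small lookup below makes that mapping explicit. The program names are the real ones; the helper `pick_blast_program` and the type labels are hypothetical and only serve the illustration.

```python
# (query type, database type) -> BLAST program
BLAST_PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "nucleotide"): "tblastn",  # database is translated in six frames
    ("nucleotide", "protein"): "blastx",   # query is translated in six frames
}


def pick_blast_program(query_type, database_type):
    """Return the BLAST program for a given query/database combination."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {query_type} vs {database_type}"
        ) from None


# EST or RNA-seq contigs searched against a protein database:
print(pick_blast_program("nucleotide", "protein"))
```

For transcriptome work the nucleotide-against-protein case is the usual one, since assembled contigs are nucleotide sequences while the best-annotated databases contain proteins.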
FASTX / FASTY
PART OF THE FASTA PACKAGE
COMPARE NUCLEOTIDES AGAINST PROTEINS
DETERMINES A HYPOTHESIZED CODING REGION (HCR)
FASTX IS FASTER, FASTY IS MORE ACCURATE
HMMER
PROTEIN-QUERIES AGAINST PROTEIN-DATABASE
USES HIDDEN MARKOV MODELS
MAPS SMITH-WATERMAN PARAMETERS ONTO A PROBABILISTIC MODEL
IMPROVES ACCURACY
PATTERNHUNTER
NUCLEOTIDE-QUERIES AGAINST OTHER NUCLEOTIDE-SEQUENCES
USES NON-CONSECUTIVE SEEDS FOR INCREASED SENSITIVITY
COMPARES HUMAN GENOME TO MOUSE GENOME IN 20 CPU-DAYS
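The idea behind non-consecutive (spaced) seeds can be demonstrated on a toy example. In a seed pattern, `1` marks positions that must match and `0` marks positions that are ignored. The sketch below is not PatternHunter's implementation; the pattern `1101`, the sequences and the function name `seed_hits` are assumptions made for illustration.

```python
def seed_hits(query, target, seed="1101"):
    """Return (query_pos, target_pos) pairs at which every '1' position of
    the seed matches between query and target."""
    care = [i for i, c in enumerate(seed) if c == "1"]  # positions that must match
    span = len(seed)

    # Index the target by the bases found at the 'care' positions.
    index = {}
    for t in range(len(target) - span + 1):
        key = "".join(target[t + i] for i in care)
        index.setdefault(key, []).append(t)

    hits = []
    for q in range(len(query) - span + 1):
        key = "".join(query[q + i] for i in care)
        hits.extend((q, t) for t in index.get(key, []))
    return hits


query, target = "ACGTA", "ACTTA"            # one mismatch at position 2
print(seed_hits(query, target, seed="111"))   # consecutive seed of weight 3
print(seed_hits(query, target, seed="1101"))  # spaced seed of the same weight
```

Both seeds require three matching bases, but only the spaced one tolerates the mismatch in the middle, which is where the gain in sensitivity comes from.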
ORF DETECTION
READING FRAMES CAN BE DETECTED IN EST-DATA
ALLOWS SCREENING FOR PREVIOUSLY UNKNOWN GENES
YIELDS A PUTATIVE PROTEIN SEQUENCE
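The simplest form of reading-frame detection is a scan of all six frames for stretches that run from a start codon to a stop codon. The sketch below shows that baseline. It is neither ESTScan nor ORFPredictor: it assumes error-free sequence, so a single frameshift would break the ORF, and the function name `find_orfs` and the minimum length are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Scan all six reading frames for ATG...stop open reading frames.

    Returns (strand, frame, start, end, orf_sequence) tuples; coordinates are
    0-based, end-exclusive and refer to the strand that was scanned.
    """
    orfs = []
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    for strand, s in (("+", seq), ("-", reverse_complement)):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # open a new ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None                   # close the ORF
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))
```

EST reads often contain frameshift errors, which is why dedicated tools use error-tolerant models instead of a plain scan like this one.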
ORF DETECTION
TOOLS:
ESTSCAN
ORFPREDICTOR
...
ESTSCAN
USES HIDDEN MARKOV MODELS
ROBUST FOR FRAMESHIFT ERRORS
SENSITIVE (5 % FALSE NEGATIVES, 18 % FALSE POSITIVES)
ORFPREDICTOR
WEB-BASED
USES BLASTX AS GUIDELINE IF POSSIBLE
USES A DEFINED RULESET FOR DEFINING ORFS
GENE ANNOTATION
BLAST2GO VIA GENE ONTOLOGY
FINDS HOMOLOGOUS GENES TO ANNOTATE THE FUNCTIONS OF A GENE OF INTEREST
GENE ONTOLOGY
3 ONTOLOGIES:
MOLECULAR FUNCTION
CELLULAR COMPONENT
BIOLOGICAL PROCESS
CONCLUSIONS
NGS PROVIDES A FAST AND CHEAP WAY TO GENERATE DATA
TONS OF TOOLS EXIST TO ANALYSE TRANSCRIPTOME DATA
ALL TOOLS HAVE THEIR OWN PROS & CONS
MOST OF THOSE TOOLS ARE UNSUITABLE FOR A "NORMAL USER"