Tips for effective use of BLAST and other NCBI tools

Post on 17-Feb-2017

Matthew McNeill, PhD

Tips for effective use of BLAST and other NCBI tools

8/23/2016

Introduction: What is NCBI?

National Center for Biotechnology Information

http://www.ncbi.nlm.nih.gov/

Introduction: NCBI’s available tools

http://www.ncbi.nlm.nih.gov/home/analyze.shtml

User story: Previously published paper

• A lncRNA regulates a network of genes involved in cancer processes

Sanchez et al., Nature Communications 5, Article number: 5812

User story: We want to follow up on this work

Question: You have a collection of cancer cell lines. Does this lncRNA regulate the same network?

Selected tools:

• CRISPR – knock out the lncRNA

• qPCR – analyze RNA expression of the network

Common theme when using genetic/genomic tools: Was my assay specific?

User story: Getting your gene sequences

• Identify your genes

• Downloading sequences

User story: Gene list

Sanchez et al., Nature Communications 5, Article number: 5812

lncRNA: PR-lncRNA-1

Downstream genes: TP53I3, TGFB2, SERPINB6, POLA1, PDK1, LPP, DPP4, TNFRSF10D, NCAPD3, BCKDHB, TRIO

User story: Identify your gene list
Translating IDs

• Many options to consider

– Genome build (GRCh37/hg19, GRCh38, GRCh38.p2)

– Gene Symbol/Gene name

– RefSeq Accession number.version

• Note:

– NCBI is phasing out GI numbers

– Read more here: https://www.ncbi.nlm.nih.gov/news/03-02-2016-phase-out-of-GI-numbers/

User story: Identify your gene list
Translating IDs: annotations

http://www.ncbi.nlm.nih.gov/

• Gene symbol, gene name, and gene alias: http://www.ncbi.nlm.nih.gov/gene/?term=TP53I3

• RefSeq mRNA accession: NM_001206802 (http://www.ncbi.nlm.nih.gov/gene/9540)

• RefSeq mRNA accession.version: NM_001206802.2
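
The accession and accession.version forms above differ only by the trailing ".2". A minimal Python sketch of separating the two parts is given here; the function name `parse_refseq_id` and the regular expression are illustrative assumptions, not anything NCBI provides.

```python
import re
from typing import NamedTuple, Optional


class RefSeqId(NamedTuple):
    accession: str           # e.g. "NM_001206802"
    version: Optional[int]   # e.g. 2, or None when no version is given


# Two-letter prefix, underscore, digits, then an optional ".version" suffix.
# This pattern is an assumption that covers identifiers like the one on the slide.
_REFSEQ_PATTERN = re.compile(r"^([A-Z]{2}_\d+)(?:\.(\d+))?$")


def parse_refseq_id(text: str) -> RefSeqId:
    """Split a RefSeq-style identifier into accession and version."""
    match = _REFSEQ_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a RefSeq-style identifier: {text!r}")
    accession, version = match.groups()
    return RefSeqId(accession, int(version) if version is not None else None)


with_version = parse_refseq_id("NM_001206802.2")
without_version = parse_refseq_id("NM_001206802")
```

Recording the version alongside the accession makes it possible to tell later whether a sequence record was updated after primers or guides were designed against it.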

User story: Identify your gene list
Translating IDs: annotations

Gene symbol → RefSeq mRNA accession, using bioDBnet db2db: https://biodbnet-abcc.ncifcrf.gov/db/db2db.php

Input gene symbols: TP53I3, TGFB2, SERPINB6, POLA1, PDK1, LPP, DPP4, TNFRSF10D, NCAPD3, BCKDHB, TRIO

RefSeq accession prefixes in the output: NM (curated mRNA), XM (predicted model mRNA), NR (non-coding RNA)
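
A translated ID list usually mixes these prefixes, so it can help to sort the accessions before choosing which transcripts to download. The following Python sketch groups accessions by prefix; the helper name `group_by_prefix` is hypothetical, and the XM and NR accessions in the example are placeholders, not real records.

```python
from collections import defaultdict

# Meaning of the three prefixes shown on the slide.
PREFIX_MEANING = {
    "NM": "curated mRNA",
    "XM": "predicted (model) mRNA",
    "NR": "non-coding RNA",
}


def group_by_prefix(accessions):
    """Group RefSeq-style accessions by their two-letter prefix."""
    groups = defaultdict(list)
    for accession in accessions:
        prefix = accession.split("_", 1)[0].upper()
        key = prefix if prefix in PREFIX_MEANING else "other"
        groups[key].append(accession)
    return dict(groups)


example_ids = [
    "NM_001206802.2",   # from the slides (TP53I3, transcript variant 3)
    "XM_000000001.1",   # placeholder, not a real record
    "NR_000000001.1",   # placeholder, not a real record
]
grouped = group_by_prefix(example_ids)
```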

User story: Getting your gene sequences
Important background

• Identify your genes

• Downloading sequences

User story: Identify your gene list
Downloading FASTA sequences

http://www.ncbi.nlm.nih.gov/gene/9540

User story: Identify your gene list
Batch Entrez

http://www.ncbi.nlm.nih.gov/sites/batchentrez
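
Batch Entrez takes an uploaded text file of identifiers. The sketch below prepares such a file body, assuming one accession per line, and also builds a request URL for NCBI's E-utilities `efetch` endpoint as a scripted alternative that the slides do not cover. The function names are illustrative, and nothing here contacts the network.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batch_entrez_text(accessions):
    """Return the body of an upload file: one accession per line."""
    return "\n".join(accessions) + "\n"


def efetch_fasta_url(accessions):
    """Build an efetch URL requesting the records as plain-text FASTA."""
    params = {
        "db": "nuccore",               # nucleotide records
        "id": ",".join(accessions),    # comma-separated identifier list
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


upload_body = batch_entrez_text(["NM_001206802.2"])
url = efetch_fasta_url(["NM_001206802.2"])
```

The resulting URL can be opened in a browser or passed to any HTTP client. NCBI's usage guidelines for E-utilities apply to scripted requests.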

User story: Identify your gene list
Log file page

User story: Identify your gene list
Downloading output

User story: Identify your gene list
FASTA file format

>gi|332205880|ref|NM_001206802.2| Homo sapiens tumor protein p53 inducible protein 3 (TP53I3), transcript variant 3, mRNA
ACAATATGTTAGCCGTGCACTTTGACAAGCCGGGAGGACCGGAAAACCTCTACGTGAAGGAGGTGGCCAA
GCCGAGCCCGGGGGAGGGTGAAGTCCTCCTGAAGGTGGCGGCCAGCGCCCTGAACCGGGCGGACTTAATG
CAGAGACAAGGCCAGTATGACCCACCTCCAGGAGCCAGCAACATTTTGGGACTTGAGGCATCTGGACATG
TGGCAGAGCTGGGGCCTGGCTGCCAGGGACACTGGAAGATCGGGGACACAGCCATGGCTCTGCTCCCCGG
TGGGGGCCAGGCTCAGTACGTCACTGTCCCCGAAGGGCTCCTCATGCCTATCCCAGAGGGATTGACCCTG
ACCCAGGCTGCAGCCATCCCAGAGGCCTGGCTCACCGCCTTCCAGCTGTTACATCTTGTGGGAAATGTTC
AGGCTGGAGACTATGTGCTAATCCATGCAGGACTGAGTGGTGTGGGCACAGCTGCTATCCAACTCACCCG
GATGGCTGGAGCTATTCCTCTGGTCACAGCTGGCTCCCAGAAGAAGCTTCAAATGGCAGAAAAGCTTGGA
GCAGCTGCTGGATTCAATTACAAAAAAGAGGATTTCTCTGAAGCAACGCTGAAATTCACCAAAGTACAAG
CAAATGCTGGTGAATGCTTTCACGGAGCAAATTCTGCCTCACTTCTCCACGGAGGGCCCCCAACGTCTGC
TGCCGGTTCTGGACAGAATCTACCCAGTGACCGAAATCCAGGAGGCCCATAAGTACATGGAGGCCAACAA
GAACATAGGCAAGATCGTCCTGGAACTGCCCCAGTGAAGGAGGATGGGGCAGGACAGGACGCGGCCACCC
CAGGCCTTTCCAGAGCAAACCTGGAGAAGATTCACAATAGACAGGCCAAGAAACCCGGTGCTTCCTCCAG
AGCCGTTTAAAGCTGATATGAGGAAATAAAGAGTGAACTGGAAAAAAAAAA

http://www.ncbi.nlm.nih.gov/nuccore/332205880?report=fasta

The sequence is written 5' to 3'.
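
A FASTA record is a ">" header line followed by the sequence, which is wrapped across several lines. The Python sketch below reads that layout, assuming the legacy "gi|...|ref|...|" header style shown on the slide. The function names are illustrative, and the example record is truncated to its first two sequence lines.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())  # join wrapped sequence lines
    if header is not None:
        yield header, "".join(chunks)


def accession_from_header(header):
    """Pull accession.version out of a 'gi|...|ref|ACC.VER|' style header."""
    fields = header.split("|")
    if "ref" in fields:
        return fields[fields.index("ref") + 1]
    return header.split()[0]  # headers without GI numbers start with the accession


# First two sequence lines of the record from the slide (truncated for brevity).
EXAMPLE = """>gi|332205880|ref|NM_001206802.2| Homo sapiens tumor protein p53 inducible protein 3 (TP53I3), transcript variant 3, mRNA
ACAATATGTTAGCCGTGCACTTTGACAAGCCGGGAGGACCGGAAAACCTCTACGTGAAGGAGGTGGCCAA
GCCGAGCCCGGGGGAGGGTGAAGTCCTCCTGAAGGTGGCGGCCAGCGCCCTGAACCGGGCGGACTTAATG
"""

records = list(parse_fasta(EXAMPLE))
header, sequence = records[0]
accession = accession_from_header(header)
```

Because GI numbers are being phased out, the fallback branch handles headers that begin directly with the accession.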

User story: Getting your sequences
Learned so far

• There are many identifiers that can be used for a gene, and those identifiers are often updated. NCBI tracks update information.

• NCBI provides the sequence of genetic/genomic elements for easy download, individually or as batches.

User story: Checking for off-target CRISPR events
CRISPR: general overview

https://www.idtdna.com/pages/products/genome-editing/crispr-cas9

User story: Checking for off-target CRISPR events
Using BLAST

• BLAST = Basic Local Alignment Search Tool

https://blast.ncbi.nlm.nih.gov/Blast.cgi

User story: Checking for off-target CRISPR events
Using BLASTN: optional parameters

• Example guide RNA (crRNA) targeting PR-lncRNA-1: TTCCAAGTGGCTAAAACTAC(AGG)
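
The example guide is written as a 20-nt target sequence followed by a 3-nt PAM in parentheses. The Python sketch below splits that notation and checks the PAM, assuming the NGG PAM of S. pyogenes Cas9 (the slides do not name the enzyme). The helper names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def split_guide(annotated):
    """Split 'PROTOSPACER(PAM)' notation into (protospacer, pam)."""
    protospacer, _, rest = annotated.partition("(")
    return protospacer.upper(), rest.rstrip(")").upper()


def is_ngg_pam(pam):
    """True for a 3-nt PAM of the form NGG (assumed S. pyogenes Cas9)."""
    return len(pam) == 3 and pam[1:] == "GG"


protospacer, pam = split_guide("TTCCAAGTGGCTAAAACTAC(AGG)")
blast_query = protospacer + pam              # target site including the PAM
opposite_strand = reverse_complement(blast_query)
```

Searching with the protospacer plus PAM, rather than the protospacer alone, is one way to see whether a partial hit still carries a usable PAM. Which query to submit is a judgment call that depends on the experiment.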

User story: Checking for off-target CRISPR events
Using BLASTN: output

• Perfect Match

• Off-Target Match

User story: Checking for off-target CRISPR events
Learned so far

• BLAST is a powerful tool to look for likely off-target CRISPR activity

• Correctly parsing your BLAST output improves off-target characterization
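
Parsing the output can also be scripted. The sketch below assumes the hits were saved in the 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score), which is an assumption because the slides use the web results page. The subject names and scores in the example rows are made up for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str
    identity: float
    length: int
    mismatches: int
    gap_opens: int
    q_start: int
    q_end: int


def parse_tabular(text):
    """Parse tab-separated BLAST hits, skipping blank and comment lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[1], float(f[2]), int(f[3]), int(f[4]),
                        int(f[5]), int(f[6]), int(f[7])))
    return hits


def is_perfect(hit, query_length):
    """True when the hit spans the whole query with no mismatches or gaps."""
    return (hit.q_start == 1 and hit.q_end == query_length
            and hit.mismatches == 0 and hit.gap_opens == 0)


# Made-up rows for a 23-nt query (20-nt target plus 3-nt PAM).
ROWS = [
    ["guide1", "hypothetical_chrA", "100.000", "23", "0", "0",
     "1", "23", "5001", "5023", "0.005", "46.1"],
    ["guide1", "hypothetical_chrB", "100.000", "16", "0", "0",
     "6", "21", "880", "895", "12", "32.2"],
]
EXAMPLE = "\n".join("\t".join(row) for row in ROWS)

hits = parse_tabular(EXAMPLE)
perfect = [h.subject for h in hits if is_perfect(h, 23)]
partial = [h.subject for h in hits if not is_perfect(h, 23)]
```

Note that the second row reports 100% identity but covers only 16 of 23 query positions, so percent identity alone does not distinguish a perfect match from a partial one.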

User story: Checking for off-target qPCR primers
PCR: general overview

Typical diagram: first cycle, second cycle

User story: Checking for off-target qPCR primers
Primer BLAST: overview

https://blast.ncbi.nlm.nih.gov/Blast.cgi

https://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi?LINK_LOC=BlastHome

User story: Checking for off-target qPCR primers
Primer BLAST: optional parameters

User story: Checking for off-target qPCR primers
Primer BLAST: output

User story: Analyze expression of your gene network

• Design qPCR primers

• Check primers for specificity, as with the lncRNA guide RNA

• Order primers!

User story: Checking for off-target qPCR primers
Learned so far

• PCR primers are consumed when they amplify a target.

• Off-target amplification will decrease the efficiency of on-target characterization for both SYBR and probe-based assays.

• Primer BLAST is a powerful tool to identify off-target regions that may be amplified.
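
The idea of an amplifiable off-target region can be illustrated with a toy in-silico PCR. The sketch below is a deliberate simplification that requires exact primer matches on one template, whereas Primer BLAST also considers mismatched binding sites across a whole database. The template, primers, and function names are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_all(haystack, needle):
    """Return every start index of needle in haystack (overlaps allowed)."""
    positions, start = [], haystack.find(needle)
    while start != -1:
        positions.append(start)
        start = haystack.find(needle, start + 1)
    return positions


def predicted_amplicons(template, forward, reverse, max_size=1000):
    """Sizes of products whose primers match the template exactly."""
    template = template.upper()
    reverse_site = reverse_complement(reverse)  # as it appears on the top strand
    sizes = []
    for f in find_all(template, forward.upper()):
        for r in find_all(template, reverse_site):
            size = r + len(reverse_site) - f
            if len(forward) + len(reverse) <= size <= max_size:
                sizes.append(size)
    return sorted(sizes)


# Made-up sequences: one intended site, then a second reverse-primer site.
FORWARD = "ACGTACGTAC"
REVERSE = reverse_complement("GGCCTTAAGC")
ON_TARGET = "AAAA" + FORWARD + "T" * 20 + "GGCCTTAAGC" + "AAAA"
WITH_EXTRA_SITE = ON_TARGET + "CCCCC" + "GGCCTTAAGC"

single = predicted_amplicons(ON_TARGET, FORWARD, REVERSE)
double = predicted_amplicons(WITH_EXTRA_SITE, FORWARD, REVERSE)
```

A second product in the output means the primer pair would be shared between two amplicons, which is the situation the bullets above warn about.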

Summary: Covered tools

• Gene lookup: Gene database

• Gene symbol translation: bioDBnet

• FASTA sequence download: Gene database, Batch Entrez

• Single sequence uniqueness: BLASTN

• Primer uniqueness: Primer BLAST

Conclusions

• NCBI provides a powerful suite of tools

• Checking for off-target hybridization, annealing, and amplification is important for genetic and genomic studies

• Proper use of the settings for each informatics tool improves results

• For questions about anything we discussed, email: custcare@idtdna.com

Thanks

Todd Adamson, Nicola Brookman-Amissah, Sean McCall, Hans Packer, Maureen Young, Nick Downey, Elisabeth Wagner, Aurita Menezes, Yu Wang

Available products

Alt-R™ CRISPR-Cas9 System

• Cas9 protein, custom guide RNAs, and controls for genome editing

• https://www.idtdna.com/pages/products/genome-editing/crispr-cas9

PrimeTime® qPCR Assays

• Predesigned primers, probes, multiple formats

• https://www.idtdna.com/pages/products/gene-expression/primetime-qpcr-assays-and-primers
