HANDS ON EXERCISE – GBS WORKSHOP

ROB ELSHIRE, JEFF GLAUBITZ AND KATIE HYMA

WARNING: SMART CHOICES V. DEFAULTS V. TUTORIAL SETTINGS

The default settings for the steps in the GBS pipeline, as well as the settings used in this tutorial, are not appropriate for your real-life data. This tutorial is intended to familiarize you with the mechanics of running the pipeline. It is not a guide to running the pipeline on your own data. To analyze your data, you will need to be familiar with the genetics of your system and samples as well as the settings for each step. Tuning the pipeline is an iterative process, best done with a data set for which you have testable expectations of the outcome (e.g. a mapping population). The pipeline will give you SNP markers at the end, but it is up to you to make sure that they are good markers.

UNIX SHELL COMMANDS

ls – list directory contents
cd – change directory
less (filename) – display contents of a text file
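
As a quick warm-up, the three commands can be tried in a throwaway directory. This is only a minimal sketch: the temporary directory and the file name notes.txt are made up for illustration and are not part of the workshop data.

```shell
# Work in a temporary directory so no workshop files are touched.
scratch=$(mktemp -d)

# cd: change into the new directory.
cd "$scratch"

# Create a small text file (notes.txt is a made-up name).
printf 'first line\nsecond line\n' > notes.txt

# ls: list the directory contents; notes.txt is the only entry.
ls

# less: page through the file; press q to quit.
# Left commented out here because it waits for keyboard input.
# less notes.txt
```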

TASSEL PIPELINE AND XML FILES

XML files are structured text files. In this case they are used to store configuration information for running TASSEL plugins.

You can open the XML file with less or a text editor.

You can regenerate the command line like this:

./run_pipeline.pl -translateXML config.xml
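
To see how the tags in a config file line up with command-line flags, the sketch below prints each tag and its value as "-flag value". It only approximates what -translateXML reports. The function name xml_to_flags and the temporary example file are hypothetical, and the sketch assumes every option is written as a simple opening tag, value, closing tag on a single line, as in the workshop files.

```shell
# Hypothetical helper (not part of TASSEL): print every <flag>value</flag>
# pair found in a config file as "-flag value", one pair per line.
xml_to_flags() {
  grep -o '<[A-Za-z]*>[^<]*</[A-Za-z]*>' "$1" |
    sed 's|^<\([A-Za-z]*\)>\(.*\)</[A-Za-z]*>$|-\1 \2|'
}

# Example input: the contents of FastqToTagCounts.xml from this exercise,
# written to a temporary file.
example=$(mktemp)
cat > "$example" <<'EOF'
<?xml version="1.0" encoding="UTF-8" standalone="no"?><TasselPipeline>
<fork1><FastqToTagCountPlugin><i>../../01_RawSequence</i><o>./</o><k>../../50_KeyFiles/Pipeline_Testing_key.txt</k><e>ApeKI</e><s>3000000</s><c>1</c></FastqToTagCountPlugin></fork1><runfork1/>
</TasselPipeline>
EOF

xml_to_flags "$example"
```

For this input the function prints six lines, one per option, from "-i ../../01_RawSequence" through "-c 1". Structural tags such as the fork and plugin names are skipped because they do not wrap a plain value.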

[Figure: Discovery pipeline diagram. Stages shown: Sequence, Tag Counts, TOPM, Tags by Taxa, SNP Caller, Genotypes]

COUNT GBS TAGS

cd /mnt/workshop/data/02_TagCounts/01_IndividualTagCounts/

run_pipeline.pl -Xms512m -Xmx1g -configFile ./FastqToTagCounts.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<TasselPipeline>
    <fork1>
        <FastqToTagCountPlugin>
            <i>../../01_RawSequence</i>
            <o>./</o>
            <k>../../50_KeyFiles/Pipeline_Testing_key.txt</k>
            <e>ApeKI</e>
            <s>3000000</s>
            <c>1</c>
        </FastqToTagCountPlugin>
    </fork1>
    <runfork1/>
</TasselPipeline>

MERGE TAG COUNT FILES

cd /mnt/workshop/data/02_TagCounts/02_MergedTagCounts

run_pipeline.pl -Xms512m -Xmx1g -configFile ./MergeTagCounts.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<TasselPipeline>
    <fork1>
        <MergeMultipleTagCountPlugin>
            <i>../01_IndividualTagCounts</i>
            <o>./GBS_Workshop_Maize.cnt</o>
            <c>5</c>
        </MergeMultipleTagCountPlugin>
    </fork1>
    <runfork1/>
</TasselPipeline>

CONVERT TO FASTQ

cd /mnt/workshop/data/02_TagCounts/03_TagCountToFastq

run_pipeline.pl -Xms512m -Xmx1g -configFile TagCountToFastq.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<TasselPipeline>
    <fork1>
        <TagCountToFastqPlugin>
            <i>../02_MergedTagCounts/GBS_Workshop_Maize.cnt</i>
            <o>./GBS_Workshop_Maize.fq.gz</o>
            <c>5</c>
        </TagCountToFastqPlugin>
    </fork1>
    <runfork1/>
</TasselPipeline>

ALIGN GBS TAGS TO REFERENCE GENOME

cd /mnt/workshop/data/03_SAM/

bowtie2 -M 4 -p 15 --very-sensitive-local \
    -x ../53_AlignerIdices/GBS_Workshop_Maize \
    -U ../02_TagCounts/03_TagCountToFastq/GBS_Workshop_Maize.fq \
    -S GBS_Workshop_Maize.sam
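
After alignment it is worth checking how many tags actually mapped. The sketch below counts mapped and unmapped records in a SAM file using the FLAG column, where bit 0x4 marks an unmapped read. The function name sam_mapping_summary and the three-record demo file are made up for illustration. On the workshop machine the function would be pointed at GBS_Workshop_Maize.sam instead.

```shell
# Hypothetical helper: count mapped and unmapped records in a SAM file.
sam_mapping_summary() {
  awk -F'\t' '
    /^@/ { next }                               # skip header lines
    int($2 / 4) % 2 == 1 { unmapped++; next }   # FLAG bit 0x4 set: unmapped
    { mapped++ }
    END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }
  ' "$1"
}

# Tiny made-up SAM file: one header line, two mapped tags, one unmapped tag.
demo_sam=$(mktemp)
printf '@HD\tVN:1.0\n' > "$demo_sam"
printf 'tag1\t0\t9\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$demo_sam"
printf 'tag2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n' >> "$demo_sam"
printf 'tag3\t16\t10\t250\t42\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$demo_sam"

sam_mapping_summary "$demo_sam"
```

For the demo file this prints "mapped=2 unmapped=1". A low mapped count on real data is a cue to revisit the alignment settings before moving on to the TOPM step.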

CONVERT SAM TO TAGS ON PHYSICAL MAP (TOPM)

cd /mnt/workshop/data/04_TOPM/

run_pipeline.pl -Xms512m -Xmx1g -configFile SAMConverter.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<TasselPipeline>
    <fork1>
        <SAMConverterPlugin>
            <i>../03_SAM/GBS_Workshop_Maize.sam</i>
            <o>./GBS_Workshop_Maize.topm</o>
        </SAMConverterPlugin>
    </fork1>
    <runfork1/>
</TasselPipeline>

MATCH TAGS TO SAMPLES (TAXA)

cd /mnt/workshop/data/05_TBT/01_IndividualTBT

run_pipeline.pl -Xms512m -Xmx1g -configFile SeqToTBTHDF5Plugin.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<TasselPipeline>
    <fork1>
        <SeqToTBTHDF5Plugin>
            <i>../../01_RawSequence</i>
            <k>../../50_KeyFiles/Pipeline_Testing_key.txt</k>
            <e>ApeKI</e>
            <o>./GBS_Workshop_Maize.h5</o>
            <s>100000000</s>
            <L>GBS_Workshop_MaizeTBTHDF5_Log.txt</L>
            <t>../../02_TagCounts/02_MergedTagCounts/GBS_Workshop_Maize.cnt</t>
        </SeqToTBTHDF5Plugin>
    </fork1>
    <runfork1/>
</TasselPipeline>

PIVOT THE TBT TO PREPARE FOR SNP CALLING

cd /mnt/workshop/data/05_TBT/04_PivotMergedTaxaTBT

run_pipeline.pl -Xms512m -Xmx1g -configFile ./PivotTaxaTBTHDF5.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<TasselPipeline>
    <fork1>
        <ModifyTBTHDF5Plugin>
            <o>../01_IndividualTBT/GBS_Workshop_Maize.h5</o>
            <p>./PivotTaxaTBTHDF5.h5</p>
            <c></c>
        </ModifyTBTHDF5Plugin>
    </fork1>
    <runfork1/>
</TasselPipeline>

CALL SNPS

cd /mnt/workshop/data/06_HapMap/

run_pipeline.pl -Xms512m -Xmx1g -configFile ./SNP_Caller.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<TasselPipeline>
    <fork1>
        <TagsToSNPByAlignmentPlugin>
            <i>../05_TBT/04_PivotMergedTaxaTBT/PivotTaxaTBTHDF5.h5</i>
            <o>./GBS_Workshop_Maize_chr+.hmp.txt</o>
            <m>../04_TOPM/GBS_Workshop_Maize.topm</m>
            <mnF>0.8</mnF>
            <mnMAF>0.02</mnMAF>
            <mnMAC>100000</mnMAC>
            <sC>9</sC>
            <eC>10</eC>
        </TagsToSNPByAlignmentPlugin>
    </fork1>
    <runfork1/>
</TasselPipeline>
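
With sC set to 9 and eC set to 10, the SNP caller is expected to write one HapMap file per chromosome, with the + in the output name replaced by the chromosome number. A quick sanity check is to count the SNP rows in each file. The sketch below assumes the HapMap text layout of one header row followed by one row per SNP. The function name count_snps and the demo files are made up for illustration, and the demo rows are truncated to four columns.

```shell
# Hypothetical helper: print each HapMap file name and its number of SNP rows,
# assuming one header row followed by one row per SNP.
count_snps() {
  for f in "$@"; do
    lines=$(wc -l < "$f")
    printf '%s\t%d\n' "$(basename "$f")" "$((lines - 1))"
  done
}

# Made-up demo files standing in for the chromosome 9 and 10 outputs.
demo_dir=$(mktemp -d)
printf 'rs#\talleles\tchrom\tpos\nS9_100\tA/G\t9\t100\nS9_250\tC/T\t9\t250\n' \
  > "$demo_dir/demo_chr9.hmp.txt"
printf 'rs#\talleles\tchrom\tpos\nS10_75\tA/C\t10\t75\n' \
  > "$demo_dir/demo_chr10.hmp.txt"

count_snps "$demo_dir/demo_chr9.hmp.txt" "$demo_dir/demo_chr10.hmp.txt"
```

On the workshop machine the same function could be run as count_snps ./GBS_Workshop_Maize_chr*.hmp.txt from the 06_HapMap directory. A count of zero for a chromosome suggests the filtering settings are too strict for the data.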
