Sequence Matrix: Gene concatenation made easy
Creating large datasets by concatenating genes can be challenging. This tool aims to make that process much easier. For more information, see http://code.google.com/p/sequencematrix/ or http://www3.interscience.wiley.com/journal/123577052/abstract
Sequence Matrix
Gene concatenation made easy
Gaurav Vaidya1, David Lohman2, Rudolf Meier2
1: NeatCo Asia, Singapore.
2: Department of Biological Sciences, National University of Singapore, Singapore.
Our goals
✤ Many powerful tools exist for concatenating sequences.
✤ Adding new sequences to an existing dataset is tedious and time consuming.
✤ Our initial goal: simple, user-friendly program for concatenating sequences.
✤ We also added a few tools to help you look for lab contamination in your dataset.
Sequence Matrix
✤ Written in Java.
✤ Uses graphical user interface libraries.
✤ Works on different operating systems.
✤ Easy to install: download and run the batch file.
Importing sequences
✤ You can use the sequence names as entered in the input file.
✤ Or you can ask Sequence Matrix to try to identify the species names.
Importing sequences
✤ Sequences mode:
    ✤ gi|237510679|gb|AY556753.2|Daubentonia madagascariensis voucher WE94001 5.8S ribosomal RNA gene, partial sequence; internal transcribed spacer 2, complete sequence; and 28S ribosomal RNA gene, partial sequence
    ✤ gi|237510678|gb|AY556735.2|Macaca sylvanus voucher OK96022 5.8S ribosomal RNA gene, partial sequence; internal transcribed spacer 2, complete sequence; and 28S ribosomal RNA gene, partial sequence
✤ Species name mode:
    ✤ Daubentonia madagascariensis
    ✤ Macaca sylvanus
Importing sequences
✤ A common source of error is forgetting to recode leading and trailing gaps as missing information.
✤ Sequence Matrix can automatically replace such gaps with question marks.
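As a rough illustration of this recoding, the sketch below replaces only the gaps at the two ends of an aligned sequence and leaves internal gaps alone. The function name and the default symbols are assumptions made for the example; it is not taken from Sequence Matrix's source.

```python
def recode_terminal_gaps(sequence, gap="-", missing="?"):
    """Replace leading and trailing gap characters with a missing-data symbol."""
    without_leading = sequence.lstrip(gap)
    leading = len(sequence) - len(without_leading)
    core = without_leading.rstrip(gap)
    trailing = len(without_leading) - len(core)
    # Internal gaps inside `core` are real alignment gaps and stay as they are.
    return missing * leading + core + missing * trailing


print(recode_terminal_gaps("--AC-GT---"))
```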
Importing sequences: Naming
✤ Sequences from one dataset are matched up to another dataset by sequence name.
✤ Errors in sequence naming need to be fixed.
✤ We recommend naming your files by gene name: ‘coi’, ‘cytb’, ‘28S’ and so on.
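Matching by sequence name is the core of concatenation, and a minimal sketch makes the consequence of naming errors visible: a misspelt name simply becomes a separate row. The sketch assumes each gene is already aligned (all of its sequences have the same length), and the data layout and the function name `concatenate` are hypothetical, not Sequence Matrix's internals.

```python
def concatenate(genes):
    """Concatenate per-gene alignments into one matrix, matching taxa by name.

    genes: dict mapping gene name -> dict mapping taxon name -> aligned sequence.
    Returns (matrix, charsets), where charsets holds 1-based inclusive ranges.
    """
    taxa = sorted({taxon for seqs in genes.values() for taxon in seqs})
    matrix = {taxon: "" for taxon in taxa}
    charsets = {}
    position = 1
    for gene, seqs in genes.items():
        # Assumption: every sequence of a gene has the same aligned length.
        length = len(next(iter(seqs.values())))
        charsets[gene] = (position, position + length - 1)
        for taxon in taxa:
            # Taxa without this gene are padded with missing data.
            matrix[taxon] += seqs.get(taxon, "?" * length)
        position += length
    return matrix, charsets


genes = {
    "coi": {"Macaca sylvanus": "ACGT", "Daubentonia madagascariensis": "ACGA"},
    "28S": {"Macaca sylvanus": "GG"},
}
matrix, charsets = concatenate(genes)
print(matrix)
print(charsets)
```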
Export: Taxonsets
✤ By default, we generate taxonsets on the basis of:
    ✤ Combined length.
    ✤ Number of character sets.
    ✤ Information for a particular gene.
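The three criteria above can be sketched as follows. The thresholds, the set names, and the decision to count only characters other than `-` and `?` toward combined length are all assumptions for illustration; the slides do not state the cut-offs Sequence Matrix uses.

```python
def count_bases(sequence):
    """Count characters that are neither gaps nor missing data."""
    return sum(1 for ch in sequence if ch not in "-?")


def build_taxonsets(genes, min_charsets, min_length):
    """Group taxa by gene presence, number of charsets, and combined length.

    genes: dict mapping gene name -> dict mapping taxon name -> sequence.
    """
    taxa = sorted({taxon for seqs in genes.values() for taxon in seqs})
    taxonsets = {}
    # One taxonset per gene: the taxa that have information for that gene.
    for gene, seqs in genes.items():
        taxonsets[f"has_{gene}"] = sorted(seqs)
    # Taxa represented in at least `min_charsets` character sets.
    taxonsets["enough_charsets"] = [
        t for t in taxa
        if sum(t in seqs for seqs in genes.values()) >= min_charsets
    ]
    # Taxa whose combined sequence length reaches `min_length`.
    taxonsets["enough_length"] = [
        t for t in taxa
        if sum(count_bases(seqs.get(t, "")) for seqs in genes.values()) >= min_length
    ]
    return taxonsets


example = {"coi": {"A": "ACGT", "B": "ACGA"}, "28S": {"A": "GG"}}
print(build_taxonsets(example, min_charsets=2, min_length=5))
```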
Gene trees
✤ Two ways to do them:
    ✤ Use the taxonset of taxa having information for a particular gene to exclude other taxa.
    ✤ Export the entire dataset with one file per column.
Export features
✤ You can also export the Sequence Matrix table as an Excel-readable text file.
✤ Supervisory mode.
✤ Keep track of a project as it grows.
Character sets
✤ We can read character sets defined in Nexus CHARSET and TNT xgroup commands.
✤ These can be “split” into individual columns, or imported as a single column representing the entire file.
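To make the "split" option concrete, here is a minimal sketch that reads simple Nexus `CHARSET` definitions and cuts a concatenated matrix into one column per character set. It assumes only contiguous `start-end` ranges; real Nexus files may also contain multiple ranges or step syntax, which this sketch ignores, and TNT `xgroup` commands are not handled. The function names are hypothetical.

```python
import re

# Matches lines such as "CHARSET coi = 1-658;" (case-insensitive).
CHARSET_RE = re.compile(r"charset\s+(\w+)\s*=\s*(\d+)\s*-\s*(\d+)\s*;", re.IGNORECASE)


def parse_charsets(nexus_text):
    """Return {charset name: (start, end)} with 1-based inclusive positions."""
    return {
        name: (int(start), int(end))
        for name, start, end in CHARSET_RE.findall(nexus_text)
    }


def split_by_charsets(matrix, charsets):
    """Split each taxon's sequence into one column per character set."""
    return {
        name: {taxon: seq[start - 1:end] for taxon, seq in matrix.items()}
        for name, (start, end) in charsets.items()
    }


sets_block = """
BEGIN SETS;
    CHARSET coi = 1-4;
    CHARSET 28S = 5-6;
END;
"""
charsets = parse_charsets(sets_block)
print(split_by_charsets({"Macaca sylvanus": "ACGTGG"}, charsets))
```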
Excision
✤ Individual sequences can be excised from the dataset.
✤ Excised sequences will not be exported.
✤ Sequence Matrix will warn you when an export leaves out excised sequences.
Contamination
✤ You thought you were sequencing Gorilla gorilla, but you were really sequencing Homo sapiens.
✤ We have two tools you can use:
    ✤ One for when Homo sapiens is in your dataset.
    ✤ One for when Homo sapiens is not in your dataset (experimental!).
H. sapiens in dataset
✤ Looks for pairs of sequences whose pairwise distance is very low.
✤ Expected difference depends on gene:
    ✤ 28S doesn’t change very much, but
    ✤ COI changes very quickly.
✤ Some interpretation is required.
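The search for very similar pairs can be sketched as below. The slides do not say which distance measure Sequence Matrix uses, so the uncorrected p-distance here (the proportion of differing sites, skipping gaps and missing data) is an assumption, as are the function names and the threshold. Because the expected difference depends on the gene, the threshold would need to be chosen per gene.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected p-distance between two aligned sequences.

    Sites where either sequence has a gap or missing data are skipped.
    Returns None when no site can be compared.
    """
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-?" or y in "-?":
            continue
        compared += 1
        differences += x != y
    return differences / compared if compared else None


def suspicious_pairs(sequences, threshold):
    """List (name1, name2, distance) for pairs at or below the threshold."""
    hits = []
    for (n1, s1), (n2, s2) in combinations(sorted(sequences.items()), 2):
        distance = p_distance(s1, s2)
        if distance is not None and distance <= threshold:
            hits.append((n1, n2, distance))
    return hits


coi = {
    "Gorilla gorilla": "ACGTACGT",
    "Homo sapiens": "ACGTACGT",
    "Macaca sylvanus": "ACGTTTGA",
}
print(suspicious_pairs(coi, threshold=0.01))
```

A hit is only a prompt to look closer: identical sequences are plausible for a slowly evolving gene and suspicious for a fast one.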
H. sapiens not present
✤ Use “Pairwise Distance Mode” to look for unusual pairwise distances.
✤ Ignore one charset, then sort taxa based on their pairwise distance to a “reference taxon”.
✤ Colour sequences by their individual pairwise distances to the reference taxon.
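The sorting step can be sketched roughly as follows: one charset is left out, and the remaining genes are used to rank every taxon by its distance to the reference taxon. The use of an uncorrected p-distance, the data layout, and the name `rank_by_reference` are assumptions for illustration; the colouring itself is a display feature and is not reproduced here.

```python
def p_distance(a, b):
    """Uncorrected p-distance, skipping gaps and missing data; None if empty."""
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-?" or y in "-?":
            continue
        compared += 1
        differences += x != y
    return differences / compared if compared else None


def rank_by_reference(genes, reference, ignored_gene):
    """Rank taxa by distance to `reference` over all genes except `ignored_gene`.

    genes: dict mapping gene name -> dict mapping taxon name -> aligned sequence.
    Returns a list of (taxon, distance), closest first; taxa with no
    comparable sites come last with a distance of None.
    """
    taxa = {t for seqs in genes.values() for t in seqs} - {reference}
    ranked = []
    for taxon in taxa:
        ref_seq = other_seq = ""
        for gene, seqs in genes.items():
            # Only genes present in both the taxon and the reference count.
            if gene == ignored_gene or taxon not in seqs or reference not in seqs:
                continue
            ref_seq += seqs[reference]
            other_seq += seqs[taxon]
        ranked.append((taxon, p_distance(ref_seq, other_seq)))
    return sorted(ranked, key=lambda pair: (pair[1] is None, pair[1] or 0.0, pair[0]))


genes = {
    "coi": {"ref": "AAAA", "x": "TTTT", "y": "AAAA"},
    "cytb": {"ref": "CCCC", "x": "CCCC", "y": "CCGG"},
}
print(rank_by_reference(genes, reference="ref", ignored_gene="coi"))
```

In this toy data, taxon `x` is closest to the reference on the other genes but far from it on `coi`, while `y` shows the reverse; a mismatch of that kind is what the colouring is meant to expose.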
H. sapiens not present
✤ Colour pairwise distances on the gene in question by their pairwise distance to the reference taxon.
✤ Look for colour variation which is unusual or out of place.
✤ We would expect sequences from different species to be correlated together.
Pairwise distance mode
✤ You need to vary:
    ✤ The gene you are studying.
    ✤ The reference taxon being compared against.
✤ Possibly helpful as an alert mechanism.
Summary
✤ Sequence Matrix allows you to assemble and examine multigene, multitaxon datasets.
✤ Taxonsets allow you to analyse subsets of your data in downstream programs.
✤ Excising sequences gives you greater control over which sequences to analyse.
✤ You can look for contamination in two ways:
    ✤ Looking for very low pairwise distances across your entire dataset.
    ✤ Looking for unusual pairwise distances in Pairwise Distance Mode.
Acknowledgements
✤ Rudolf Meier
✤ Zhang Guanyang
✤ Farhan Ali
✤ David Lohman
✤ Everybody at the NUS DBS Evolutionary Biology lab.
Question time!