geneious r7: a bioinformatics platform for biologistsgeneious r7 was created using the...

1
Geneious R7: A Bioinformatics Platform For Biologists Christian Olsen, Kashef Qaadri Biomatters, Inc. 60 Park Place, Suite 2100 Newark, NJ 07102 Contact: [email protected] One of the challenges with bioinformatics is the abundance of applications, websites, and algorithms available to the research community. This abundance of options tends to generate confusion for researchers as they seek to efficiently qualify best software tools for their needs. Bioinformatics software is often designed to solve a specific computational problem without much regard toward integrating with other software. Two different approaches have been used in the design of bioinformatics software. The first approach is known as the ‘black-box’ approach, where the software and underlying algorithms are proprietary and not available for inspection, adjustment, or extension. In some cases, the file formats generated by these applications fall into the ‘black-box’ category as they are proprietary and encrypted. The second approach is known as the ‘glass-box’ approach, where the algorithms implemented in the software are made available and the software has the flexibility to be extended by the public. The files generated by glass-box’ applications are also open and require minimal effort for parsing and converting. The ‘glass-box’ approach tends to be driven by the research community and is frequently found in academic and collaborative settings. Geneious R7 was created using the ‘glass-box’ approach in order to reduce the bioinformatics learning curve and allow more biologists to manage, analyze, and visualize their sequence data. Geneious R7 is a cross-platform Java application, which runs on any computer with a Java Virtual Machine installed on it (PC, MacOSX, Linux). Geneious R7 uses publically available, peer-reviewed algorithms, which afford researchers a measure of confidence, with the ability to inspect how the algorithms were used in the peer-reviewed literature. Four bioinformatic domains and multiple web services have been incorporated into one user interface for genomic sequence analysis: molecular cloning, NGS, phylogenetics, and sequence assembly/alignment. In addition to offering a robust feature-set, Geneious R7 empowers the researcher with intuitive data management features like drag-and-drop file interface, local file directory organization, and the ability to connect to relational databases [Fig. 1]. Biomatters’ Geneious R7 is a bioinformatics software platform that allows researchers the command of industry- leading algorithms and tools for their genomic and protein sequence analyses. Researchers at all levels can easily manage, analyze, and share their sequence data via a single intuitive software application. R7 provides tools for next-generation sequence analysis, sequence alignment, molecular cloning, chromatogram assembly, and phylogenetics. New features for this major version release include tools for Gibson assembly, TOPO cloning, RAxML, FastTree, Garli, LastZ, Bowtie2, Rich Copy/Paste, as well as a host of new plug-ins. R7 affords real-time dynamic interaction with sequence data and empowers biologists to produce stunning publication quality images to increase the impact of their research. By utilizing Geneious R7, biologists can easily improve their bioinformatic workflow efficiencies to free up more time for their research. In addition to multi-site Gateway cloning Geneious R7 has added several new cloning tools [Fig. 3] for Gibson assembly, TOPO cloning, TALEN assembly, Golden Gate cloning and one-step Gateway cloning. The interactive document viewer enables the user to quickly create and edit constructs, restriction enzyme lists, annotate restriction enzymes directly onto linear & circular DNA and then perform in silico digests into fragments. The ability to track the cloning process through parent-descendent relationships allows the user to fork and edit a cloning process for protocol development. Geneious R7 includes Primer3 [Ref.1,2], which allows the user to design degenerate primers, designate mismatches, and perform primer extensions among a number of useful features. Additionally, Oligos are easily imported from legacy primer databases and a built-in primer database and browser afford easy primer management and testing. 1. Untergrasser A, et. Al. 2012. Primer3 - new capabilities and interfaces. Nucleic Acids Research 40(15):e115. 2. Koressaar T, Remm M. 2007. Enhancements and modifications of primer design program Primer3 Bioinformatics 23(10):1289-91. 3. Zerbino, D. R., Birney, E. 2008. "Velvet: Algorithms for de novo short read assembly using de Bruijn graphs". Genome Research 18 (5): 821–829. 4. Zerbino, D. R., McEwen, G. K., Margulies, E. H., Birney, E. 2009. "Pebble and Rock Band: Heuristic Resolution of Repeats and Scaffolding in the Velvet Short-Read de Novo Assembler". PLoS ONE 4 (12):e8407. 5. Langmead B, Salzberg S. 2012. Fast gapped-read alignment with Bowtie 2. Nature Methods. 9:357-359. 6. Huang, X. and Madan, A. 1999. CAP3: A DNA sequence assembly program. Genome Res. 9, 868-877. 7. Zwickl, D. J., 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. Ph.D. dissertation, The University of Texas at Austin. 8. Stamatakis, A. 2006.: “RAxML-VI-HPC: Maximum Likelihood-based Phylogenetic Analyses with Thousands of Taxa and Mixed Models”, Bioinformatics 22(21):2688–2690. 9. Price MN, Dehal PS, Arkin AP (2010) FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. PLoS ONE 5(3): e9490. Introduction Background Molecular Cloning References Geneious R7 is a stable Java framework, which combines useful tools and functions into a single intuitive graphical user interface. By design the user is faced with a single learning curve to help navigate the diversity of leading applications and algorithms, which seamlessly communicate behind the scenes. The framework is flexible and able to adapt to the changing demands of research via a public Java API. The extensible plug-in architecture allows any developer to leverage what we have created to their own purpose [Fig. 2]. Much of the current functionality of Geneious stems from community driven feature requests and plugin creation (eg. Workflow builder [Fig. 2], PlasmoDB, GenBank submission, Light-LIMS interface, Mauve whole genome aligner). The selection of the feature set is fundamentally guided through the scientific community by way of collaborations and consultations with biologists and bioinformaticians. Figure 1: Geneious R7 interface. Top – ‘Menu Bar’ – configurable for frequently used analyses; Top Left - local directory; Bottom Left - ‘Live’ database services; Top Center – folder contents includes file metadata; Bottom Center - real-time interactive ‘Document Viewer’; Far Right – ‘Options Panel’ used to configure display in the document viewer. Everything displayed in the viewer may be exported in a variety of image file formats. Features Next generation sequence analysis Phylogenetics Geneious R7 is able to process sequence data from all sequencing platforms (eg. Illumina, Sanger, PacBio) and has been performance tuned for larger datasets. Pre-processing tools like trimming, pairing, and barcode sorting allow biologists the ability to groom and visualize [Fig. 5] their sequence data without having to learn command-line tools and scripts. R7 reference based & de novo assembly algorithms include assemblers like Velvet [Ref.3], Bowtie2 [Ref.4], CAP3 [Ref.5], and the Geneious assembler. Once an assembly has been performed rich-copy paste annotation and plug-ins like LastZ, a BLAST-like alignment tool for the pairwise alignment of chromosome-sized nucleotide sequences, facilitate rapid sequence characterization and publication. Geneious R7 has expanded its phylogenetic tree tool suite for maximum-likelihood tree methods. The Genetic Algorithm for Rapid Likelihood Inference (GARLI) [Ref. 7] is used to infer phylogenetic trees using the maximum- likelihood criterion and rapidly searches the space of evolutionary trees and model parameters to find the solution maximizing the likelihood score. Garli implements nucleotide, amino acid and codon-based models of sequence evolution. Randomized A(x)ccelerated Maximum Likelihood (RAxML) [Ref.8] and FastTree [Ref. 9] have been incorporated for larger multiple sequence alignments numbering in the millions. FastTree infers approximately- maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. FastTree can handle alignments of up to a million sequences in a reasonable amount of time and memory. These new tree building methods [Fig. 6] coupled with current methods like UPGMA & NJ, PhyML, MrBayes, and PAUP* allow Geneious to be useful for the evolutionary and microbiome research communities. Figure 3: Interactive visualization of a vector; ‘Primer Design’ dialogue window; Simple primer editing. Figure 5: Next generation sequencing quality trimming; read mapping; assembly report; coverage graph. Figure 2: Geneious R7 workflow builder. Figure 6: Visualized circular phylogenetic tree and associated multiple sequence alignment.

Upload: others

Post on 15-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Geneious R7: A Bioinformatics Platform For BiologistsGeneious R7 was created using the ‘glass-box’ approach in order to reduce the bioinformatics learning curve and allow more

Geneious R7: A Bioinformatics Platform For Biologists Christian Olsen, Kashef Qaadri

Biomatters, Inc. 60 Park Place, Suite 2100 Newark, NJ 07102Contact: [email protected]

One of the challenges with bioinformatics is the abundance of applications, websites, and algorithms available to the research community. This abundance of options tends to generate confusion for researchers as they seek to efficiently qualify best software tools for their needs. Bioinformatics software is often designed to solve a specific computational problem without much regard toward integrating with other software. Two different approaches have been used in the design of bioinformatics software. The first approach is known as the ‘black-box’ approach, where the software and underlying algorithms are proprietary and not available for inspection, adjustment, or extension. In some cases, the file formats generated by these applications fall into the ‘black-box’ category as they are proprietary and encrypted. The second approach is known as the ‘glass-box’ approach, where the algorithms implemented in the software are made available and the software has the flexibility to be extended by the public. The files generated by ‘glass-box’ applications are also open and require minimal effort for parsing and converting. The ‘glass-box’ approach tends to be driven by the research community and is frequently found in academic and collaborative settings.

Geneious R7 was created using the ‘glass-box’ approach in order to reduce the bioinformatics learning curve and allow more biologists to manage, analyze, and visualize their sequence data. Geneious R7 is a cross-platform Java application, which runs on any computer with a Java Virtual Machine installed on it (PC, MacOSX, Linux). Geneious R7 uses publically available, peer-reviewed algorithms, which afford researchers a measure of confidence, with the ability to inspect how the algorithms were used in the peer-reviewed literature. Four bioinformatic domains and multiple web services have been incorporated into one user interface for genomic sequence analysis: molecular cloning, NGS, phylogenetics, and sequence assembly/alignment. In addition to offering a robust feature-set, Geneious R7 empowers the researcher with intuitive data management features like drag-and-drop file interface, local file directory organization, and the ability to connect to relational databases [Fig. 1].

Biomatters’ Geneious R7 is a bioinformatics software platform that allows researchers the command of industry-leading algorithms and tools for their genomic and protein sequence analyses. Researchers at all levels can easily manage, analyze, and share their sequence data via a single intuitive software application. R7 provides tools for next-generation sequence analysis, sequence alignment, molecular cloning, chromatogram assembly, and phylogenetics. New features for this major version release include tools for Gibson assembly, TOPO cloning, RAxML, FastTree, Garli, LastZ, Bowtie2, Rich Copy/Paste, as well as a host of new plug-ins. R7 affords real-time dynamic interaction with sequence data and empowers biologists to produce stunning publication quality images to increase the impact of their research. By utilizing Geneious R7, biologists can easily improve their bioinformatic workflow efficiencies to free up more time for their research.

In addition to multi-site Gateway cloning Geneious R7 has added several new cloning tools [Fig. 3] for Gibson assembly, TOPO cloning, TALEN assembly, Golden Gate cloning and one-step Gateway cloning. The interactive document viewer enables the user to quickly create and edit constructs, restriction enzyme lists, annotate restriction enzymes directly onto linear & circular DNA and then perform in silico digests into fragments. The ability to track the cloning process through parent-descendent relationships allows the user to fork and edit a cloning process for protocol development. Geneious R7 includes Primer3 [Ref.1,2], which allows the user to design degenerate primers, designate mismatches, and perform primer extensions among a number of useful features. Additionally, Oligos are easily imported from legacy primer databases and a built-in primer database and browser afford easy primer management and testing.

1. Untergrasser A, et. Al. 2012. Primer3 - new capabilities and interfaces. Nucleic Acids Research 40(15):e115. 2. Koressaar T, Remm M. 2007. Enhancements and modifications of primer design program Primer3 Bioinformatics 23(10):1289-91.  3. Zerbino, D. R., Birney, E. 2008. "Velvet: Algorithms for de novo short read assembly using de Bruijn graphs". Genome Research 18 (5): 821–829.  4. Zerbino, D. R., McEwen, G. K., Margulies, E. H., Birney, E. 2009. "Pebble and Rock Band: Heuristic Resolution of Repeats and Scaffolding in the

Velvet Short-Read de Novo Assembler". PLoS ONE 4 (12):e8407.  5. Langmead B, Salzberg S. 2012. Fast gapped-read alignment with Bowtie 2. Nature Methods. 9:357-359.  6. Huang, X. and Madan, A. 1999. CAP3: A DNA sequence assembly program. Genome Res. 9, 868-877.  7. Zwickl, D. J., 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood

criterion. Ph.D. dissertation, The University of Texas at Austin.  8. Stamatakis, A. 2006.: “RAxML-VI-HPC: Maximum Likelihood-based Phylogenetic Analyses with Thousands of Taxa and Mixed Models”, Bioinformatics

22(21):2688–2690.  9. Price MN, Dehal PS, Arkin AP (2010) FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. PLoS ONE 5(3): e9490.

!

Introduction

Background

Molecular Cloning

References

Geneious R7 is a stable Java framework, which combines useful tools and functions into a single intuitive graphical user interface. By design the user is faced with a single learning curve to help navigate the diversity of leading applications and algorithms, which seamlessly communicate behind the scenes. The framework is flexible and able to adapt to the changing demands of research via a public Java API. The extensible plug-in architecture allows any developer to leverage what we have created to their own purpose [Fig. 2]. Much of the current functionality of Geneious stems from community driven feature requests and plugin creation (eg. Workflow builder [Fig. 2], PlasmoDB, GenBank submission, Light-LIMS interface, Mauve whole genome aligner). The selection of the feature set is fundamentally guided through the scientific community by way of collaborations and consultations with biologists and bioinformaticians.

Figure 1: Geneious R7 interface. Top – ‘Menu Bar’ – configurable for frequently used analyses; Top Left - local directory; Bottom Left - ‘Live’ database services; Top Center – folder contents includes file metadata; Bottom Center - real-time interactive ‘Document Viewer’; Far Right – ‘Options Panel’ used to configure display in the document viewer. Everything displayed in the viewer may be exported in a variety of image file formats.

Features

Next generation sequence analysis

Phylogenetics

Geneious R7 is able to process sequence data from all sequencing platforms (eg. Illumina, Sanger, PacBio) and has been performance tuned for larger datasets. Pre-processing tools like trimming, pairing, and barcode sorting allow biologists the ability to groom and visualize [Fig. 5] their sequence data without having to learn command-line tools and scripts. R7 reference based & de novo assembly algorithms include assemblers like Velvet [Ref.3], Bowtie2 [Ref.4], CAP3 [Ref.5], and the Geneious assembler. Once an assembly has been performed rich-copy paste annotation and plug-ins like LastZ, a BLAST-like alignment tool for the pairwise alignment of chromosome-sized nucleotide sequences, facilitate rapid sequence characterization and publication.

Geneious R7 has expanded its phylogenetic tree tool suite for maximum-likelihood tree methods. The Genetic Algorithm for Rapid Likelihood Inference (GARLI) [Ref. 7] is used to infer phylogenetic trees using the maximum-likelihood criterion and rapidly searches the space of evolutionary trees and model parameters to find the solution maximizing the likelihood score. Garli implements nucleotide, amino acid and codon-based models of sequence evolution. Randomized A(x)ccelerated Maximum Likelihood (RAxML) [Ref.8] and FastTree [Ref. 9] have been incorporated for larger multiple sequence alignments numbering in the millions. FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. FastTree can handle alignments of up to a million sequences in a reasonable amount of time and memory. These new tree building methods [Fig. 6] coupled with current methods like UPGMA & NJ, PhyML, MrBayes, and PAUP* allow Geneious to be useful for the evolutionary and microbiome research communities.

Figure 3: Interactive visualization of a vector; ‘Primer Design’ dialogue window; Simple primer editing.

Figure 5: Next generation sequencing quality trimming; read mapping; assembly report; coverage graph.

Figure 2: Geneious R7 workflow builder.

F i g u r e 6 : V i s u a l i z e d c i r c u l a r phy logene t i c t r e e a n d a s s o c i a t e d m u l t i p l e s e q u e n c e alignment.