Maik Friedel, Thomas Wilhelm, Jürgen Sühnel
FLI-Jena, Germany

Introduction: During the last 10 years, a large number of complete genomes have been sequenced. With these data at hand, the basic aim is now to convert this information into biological knowledge. This requires the identification of biologically meaningful motifs in genomic data. Computational motif discovery has been used with some success in simple organisms such as yeast. For higher organisms with more complex genomes, more sensitive methods are required. There is also a growing awareness that motif combinations, usually called modules, rather than single motifs may be relevant to biological function.

We describe here a new type of GenomeBrowser that offers user-friendly genome analysis tools for the statistical analysis of single and multiple sequences as well as for the visual exploration of single sequences. A distinctive feature is that the sequence can be represented not only in the standard way, by the bases A, T, G and C, but also in reduced form by its purine/pyrimidine and AT/GC characteristics, and finally in terms of a large number of dinucleotide parameters, which can, for example, encode geometrical information on DNA structure. All of these coding schemes can be converted into a signal representation that allows very effective visual motif discovery. Analyses can be performed for the + and − strands as well as for the double strand. Combining these sequence- and signal-based representations offers a new approach for detecting previously unknown regulatory elements. These functionalities make the GenomeBrowser a unique tool for the identification and analysis of functional motifs in genomes.

Conclusion: The GenomeBrowser is a powerful new tool for motif discovery in genomes. In addition to the standard sequence representation, the DNA is also analysed in terms of biochemical and biophysical dinucleotide properties.
This makes it possible to identify and visualize a broad range of both known and unknown genome patterns. This new way of looking at the genome can lead to a better understanding of its organisation and function.

Implemented tools:
1. Repeat finder: tool to search for any type of simple repeat in the sequence or signal representation.
2. Motif finder: tool for searching DNA motifs in the sequence or signal representation.
3. Average statistic: tool for calculating the average of any type of DNA feature over selected DNA fragments.
4. Show underlying DNA sequence: feature that displays the DNA sequence underlying a selected part of the signal representation.
5. Property editor: tool for searching, filtering and selecting all types of features annotated in the GenBank file.

Parameters: To visualize biochemical and biophysical properties of a DNA strand, we have included about 40 different dinucleotide properties. All parameters are available for the complete set of 16 dinucleotides. As an example, the table gives the free energy change (B-DNA) in kcal/mol for all 16 dinucleotides (M. Aida, J. Theor. Biol. 130 (1988)); rows give the first base and columns the second base of the dinucleotide.

     A     C     G     T
A   1.20  1.50  1.50  0.90
C   1.70  2.10  2.80  1.50
G   1.50  2.30  2.10  1.50
T   0.90  1.50  1.70  1.20

Main window: The main window of the GenomeBrowser consists of three panels. The first (1) is the control panel, which allows uploading and manipulating sequence and coding-parameter information. The main field (2) shows the signal curve, and the third panel (3) shows the position information for the currently displayed sequence range. The DNA sequence, in FASTA or GenBank format, is converted into a signal representation by applying dinucleotide parameters and is smoothed using a sliding-window technique. All sequence features annotated in the GenBank file can be selected and shown in different colors.

First applications:
1. Visualization of evolutionary events: The GenomeBrowser can be used to distinguish between three types of rRNA gene clusters in chloroplast genomes.
The patterns are best seen when the free energy change measure is applied to the DNA double strand. The three cluster types and their occurrence among the 88 genomes analysed are:
1.) Inverted repeats (25 kb): 79 of 88 genomes
2.) Inverted repeat-lacking clade: 7 of 88 genomes
3.) Three direct repeats: 2 of 88 genomes (subclass: Euglenozoa)

2. Visualization of gene and exon/intron organization: With the help of the GenomeBrowser it can be shown that genes tend to be purine-rich. In both pictures, the positive strand is encoded by its pyrimidine content; all genes of the + strand (left) and all genes of the − strand (right) are shown in red. The exon (red) and intron (green) structure of a given gene can be seen using a GC content representation: exons tend to have a higher GC content than introns.

3. Repeats that cannot be found by standard repeat search methods: We have shown this by hiding DNA sequence repeats in an artificial sequence with only 50% alignment identity. The new sequence contains the same repeats, but they are visible only in the signal representation. Panels: 1.) original sequence repeats; 2.) the same repeats hidden in an artificial sequence with only 50% sequence identity.
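The reduced purine/pyrimidine and AT/GC representations from the introduction, and the pyrimidine-content and GC-content views used in the first applications, can be illustrated with a minimal Python sketch. It assumes the standard IUPAC two-letter groupings (R/Y and W/S) and a plain sliding-window base fraction as the signal; the functions `reduce_sequence`, `reverse_complement` and `class_fraction` are hypothetical names for illustration, not part of the GenomeBrowser.

```python
# Reduced two-letter alphabets using the standard IUPAC groupings:
# purine R = {A, G} / pyrimidine Y = {C, T}; weak W = {A, T} / strong S = {C, G}.
PURINE_PYRIMIDINE = {"A": "R", "G": "R", "C": "Y", "T": "Y"}
AT_GC = {"A": "W", "T": "W", "C": "S", "G": "S"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the minus strand, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def reduce_sequence(seq: str, alphabet: dict[str, str]) -> str:
    """Rewrite an A/C/G/T sequence in a reduced two-letter alphabet."""
    return "".join(alphabet[base] for base in seq.upper())


def class_fraction(seq: str, members: set[str], window: int) -> list[float]:
    """Fraction of bases in `members` (e.g. {"C", "T"} for pyrimidines,
    {"G", "C"} for GC content) within each full sliding window."""
    if window < 1:
        raise ValueError("window must be positive")
    hits = [1 if base in members else 0 for base in seq.upper()]
    return [sum(hits[i:i + window]) / window for i in range(len(hits) - window + 1)]


if __name__ == "__main__":
    plus = "AAGCTTGGCGCAT"
    minus = reverse_complement(plus)
    print(reduce_sequence(plus, PURINE_PYRIMIDINE))
    print(reduce_sequence(plus, AT_GC))
    print(class_fraction(plus, {"G", "C"}, 5))
    # Pyrimidine content of the plus strand equals the purine content of the
    # minus strand over the same stretch (minus-strand windows run in reverse).
    print(class_fraction(plus, {"C", "T"}, 5))
    print(class_fraction(minus, {"A", "G"}, 5)[::-1])
```

The last two printed lists are identical: because complementary bases swap classes, reading pyrimidine content on the + strand is equivalent to reading purine content on the − strand, which is why purine-rich genes on the − strand stand out when the + strand is encoded by its pyrimidine content.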
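The conversion of a sequence into a dinucleotide signal described under Main window, and the Average statistic tool, can be sketched in the same way. This sketch assumes one parameter value per overlapping dinucleotide step, an unweighted moving average as the smoothing window (the poster does not specify the window function or width), and fragments given as Python-style (start, end) slices; `dinucleotide_signal`, `smooth` and `average_property` are hypothetical names. The values are the free energy changes from the parameter table above; any of the other dinucleotide properties could be substituted as a 16-entry dictionary.

```python
from statistics import mean

# Free energy change (B-DNA) in kcal/mol for the 16 dinucleotides, copied from
# the parameter table (M. Aida, J. Theor. Biol. 130 (1988)).
FREE_ENERGY = {
    "AA": 1.20, "AC": 1.50, "AG": 1.50, "AT": 0.90,
    "CA": 1.70, "CC": 2.10, "CG": 2.80, "CT": 1.50,
    "GA": 1.50, "GC": 2.30, "GG": 2.10, "GT": 1.50,
    "TA": 0.90, "TC": 1.50, "TG": 1.70, "TT": 1.20,
}


def dinucleotide_signal(seq: str, table: dict[str, float] = FREE_ENERGY) -> list[float]:
    """One parameter value per overlapping dinucleotide step (n bases -> n-1 values)."""
    seq = seq.upper()
    return [table[seq[i:i + 2]] for i in range(len(seq) - 1)]


def smooth(signal: list[float], window: int) -> list[float]:
    """Unweighted moving average over a sliding window (full windows only)."""
    if not 1 <= window <= len(signal):
        raise ValueError("window must be between 1 and len(signal)")
    return [sum(signal[i:i + window]) / window for i in range(len(signal) - window + 1)]


def average_property(seq: str, fragments: list[tuple[int, int]],
                     table: dict[str, float] = FREE_ENERGY) -> float:
    """Mean parameter value over selected fragments, given as (start, end) slices.
    Dinucleotide steps that span two fragments are not counted."""
    values: list[float] = []
    for start, end in fragments:
        values.extend(dinucleotide_signal(seq[start:end], table))
    return mean(values)


if __name__ == "__main__":
    seq = "ATGCGCGCATATTTAAACGCGGCC"
    raw = dinucleotide_signal(seq)
    print(smooth(raw, 5))
    print(average_property(seq, [(0, 10), (14, 24)]))
```

Because the free-energy values of a dinucleotide and its reverse complement coincide in this table (for example AC and GT, CA and TG), reading the − strand yields the same values in reverse order; for other parameters this need not hold.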
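The poster does not state how the hidden repeats were generated. One mechanism that would produce this effect, assumed here purely for illustration, is that several dinucleotides share a parameter value: in the free-energy table, AC, AG, CT, GA, GT and TC all have 1.50 kcal/mol, so quite different sequences can yield an identical signal. The hypothetical function `most_divergent_synonym` below uses dynamic programming over the last base to build a sequence with exactly the same signal as its input but as few identical positions as possible. Identity here is ungapped per-position identity, not alignment identity, and the sketch does not reproduce the authors' 50% construction.

```python
# Free energy change (B-DNA) in kcal/mol, copied from the parameter table.
FREE_ENERGY = {
    "AA": 1.20, "AC": 1.50, "AG": 1.50, "AT": 0.90,
    "CA": 1.70, "CC": 2.10, "CG": 2.80, "CT": 1.50,
    "GA": 1.50, "GC": 2.30, "GG": 2.10, "GT": 1.50,
    "TA": 0.90, "TC": 1.50, "TG": 1.70, "TT": 1.20,
}


def dinucleotide_signal(seq: str) -> list[float]:
    """One free-energy value per overlapping dinucleotide step."""
    seq = seq.upper()
    return [FREE_ENERGY[seq[i:i + 2]] for i in range(len(seq) - 1)]


def identity(a: str, b: str) -> float:
    """Fraction of identical positions in two equal-length, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def most_divergent_synonym(seq: str) -> str:
    """Return a sequence with exactly the same dinucleotide signal as `seq`
    but as few identical positions as possible.

    Dynamic programming over the last base: cost[b] is the smallest number of
    positions matching `seq` among valid prefixes ending in base b.
    """
    seq = seq.upper()
    if not seq:
        return ""
    cost = {b: int(b == seq[0]) for b in "ACGT"}
    back: list[dict[str, str]] = []  # back[i - 1][b] = best predecessor of b at position i
    for i in range(1, len(seq)):
        target = FREE_ENERGY[seq[i - 1:i + 1]]
        new_cost: dict[str, int] = {}
        pointer: dict[str, str] = {}
        for b in "ACGT":
            for a, c in cost.items():
                if FREE_ENERGY[a + b] != target:
                    continue  # this step would change the signal
                candidate = c + int(b == seq[i])
                if b not in new_cost or candidate < new_cost[b]:
                    new_cost[b], pointer[b] = candidate, a
        cost = new_cost  # never empty: the original sequence is always a valid path
        back.append(pointer)
    # Trace the optimal path back from the cheapest final base.
    base = min(cost, key=cost.get)
    path = [base]
    for pointer in reversed(back):
        base = pointer[base]
        path.append(base)
    return "".join(reversed(path))


if __name__ == "__main__":
    original = "GACTGAGTCAGACTTCAGTC"
    hidden = most_divergent_synonym(original)
    print(hidden, identity(original, hidden))
    print(dinucleotide_signal(hidden) == dinucleotide_signal(original))
```

For example, GAC and AGA share no base at any position, yet both give the signal 1.50, 1.50. Dinucleotides with a unique value (CG, GC) force the same bases in the output, so the identity that can be reached depends on the input.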