uniview
![Page 1: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/1.jpg)
Web-based application to survey properties of homologous proteins.
Candidate:
Diego Poggioli
Supervisor:
Prof. Rita Casadio
Co-supervisor:
Dr. Brigitte Boeckmann
![Page 2: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/2.jpg)
• Bio-problem: visualizing and interacting with biological data, and performing a comparative protein analysis
• Info-solution: web application (CGI)
The portal gives access to four web pages: 1) function-related annotation derived from UniProtKB/Swiss-Prot; 2) features of the protein group; 3) conservation score; 4) tree.
![Page 3: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/3.jpg)
Members of a protein family normally perform a general biochemical function in common, but one or more subgroups may evolve a slightly different function, such as different
substrate specificity.
![Page 4: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/4.jpg)
By comparing groups and subgroups of proteins it is possible to identify or estimate:
• similarities and differences between the protein sequences, as well as in the information available for the given protein group;
• the ranges within which functional information on proteins can be transferred from experimentally characterized proteins to their homologs from poorly studied organisms;
• errors in the annotations of proteins.
![Page 5: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/5.jpg)
Visualization of and interaction with biological data
![Page 6: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/6.jpg)
HTML, JavaScript, PHP, Perl, Python, Ajax, ASP, Ruby…
CGI, PHP
• System and browser independent
• Dynamic pages
• Available from any PC
![Page 7: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/7.jpg)
P02701
P56732
P56734
O13153
P56733
P56735
P56736
AVID_CHICK
AVR2_CHICK
AVR4_CHICK
AVR1_CHICK
AVR3_CHICK
AVR6_CHICK
AVR7_CHICK
ID AVID_CHICK Reviewed; 152 AA.
AC P02701; Q91958; Q98SH4;
DT 21-JUL-1986, integrated into
DT 11-SEP-2007, sequence version 3.
DT 10-JUN-2008, entry version 87.
DE Avidin precursor.
GN Name=AVD;
OS Gallus gallus (Chicken).
OC Eukaryota; Metazoa; Chordata
OC Archosauria; Dinosauria
OC Neognathae; Galliformes
OX NCBI_TaxID=9031;
RN [1]
RP NUCLEOTIDE SEQUENCE [MRNA].
RX MEDLINE=87203384; PubMed
RA Gope M.L., Keinaenen R.A.,
RA Zarucki-Schulz T., O'Malley B.W.,
RT "Molecular cloning of the chicken
RL Nucleic Acids Res. 15:3595
RN [2]
RP NUCLEOTIDE SEQUENCE [MRNA].
RX MEDLINE=90355928; PubMed
RA Chandra G., Gray J.G.;
RT "Cloning and expression of
RL Methods Enzymol. 184:70
…
Form filling and data type
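The input form has to recognize which data type the user pasted. A minimal sketch of that check, assuming the input is a list of UniProtKB accession numbers or entry names like those shown on this slide; the function name and the entry-name width limits are illustrative assumptions, while the accession pattern follows the one published in the UniProt documentation:

```python
import re

# Accession number pattern as published in the UniProt documentation.
ACCESSION_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Entry names look like AVID_CHICK: mnemonic, underscore, species code.
# The width limits used here are an assumption for illustration.
ENTRY_NAME_RE = re.compile(r"^[A-Z0-9]{1,10}_[A-Z0-9]{1,5}$")


def classify_identifier(token: str) -> str:
    """Return 'accession', 'entry_name' or 'unknown' for one form token."""
    token = token.strip().upper()
    if ACCESSION_RE.match(token):
        return "accession"
    if ENTRY_NAME_RE.match(token):
        return "entry_name"
    return "unknown"
```

Tokens classified as "unknown" would be reported back to the user instead of being sent to the retrieval step.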
![Page 8: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/8.jpg)
![Page 9: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/9.jpg)
BioView
• Overview of biological information
• Taxonomic descriptive statistics
A compact summary view of the biological information of a protein group is important, especially for a large dataset. It makes it possible to observe, compare and count all common and dissimilar characteristics, and to analyze in detail the entries that share the same feature.
- gene name, functional (catalytic activity, enzyme regulation, pathway…) and general descriptive information;
- organism classification (OC) and organism species (OS);
- non-experimental qualifiers (by similarity, putative or probable).
![Page 10: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/10.jpg)
ID, AC, DE, CC:'FUNCTION', 'PATHWAY', 'CATALYTIC
ACTIVITY', 'ENZYME REGULATION', 'SUBUNIT',
'SIMILARITY', 'COFACTOR', 'DEVELOPMENTAL STAGE',
'INDUCTION', 'PTM', 'SUBCELLULAR LOCALIZATION',
'TISSUE SPECIFICITY'
OS, OC
Eukaryota -
Viridiplantae Eukaryota
Streptophyta Viridiplantae
Embryophyta Streptophyta
Tracheophyta Embryophyta
... ...
Pipeline BioView page
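The child/parent table on this slide can be derived from the OC lines of each entry. A minimal sketch, assuming OC lines use the usual "taxon; taxon; ... taxon." layout; the function names are hypothetical and not taken from the UniView code:

```python
def lineage_from_oc(oc_lines):
    """Join the OC lines of one entry into an ordered list of taxa."""
    text = " ".join(
        line[2:].strip() for line in oc_lines if line.startswith("OC")
    )
    return [t.strip() for t in text.rstrip(".").split(";") if t.strip()]


def parent_map(lineages):
    """Map every taxon to its parent; the root gets '-' as on the slide."""
    parents = {}
    for lineage in lineages:
        previous = "-"
        for taxon in lineage:
            # Keep the first parent seen for a taxon.
            parents.setdefault(taxon, previous)
            previous = taxon
    return parents
```

Merging the lineages of all entries in the group gives one hierarchy that can be rendered as an expandable tree.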
![Page 11: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/11.jpg)
Number of entries
Non-redundant annotation
Number of entries with non-experimental qualifier
Number of entries with annotated experimental qualifier
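The counts labelled on this slide can be sketched as follows, assuming each entry contributes one text per CC topic and that non-experimental qualifiers appear in parentheses inside that text. The qualifier list, the normalization rule and the function name are assumptions for illustration:

```python
import re
from collections import defaultdict

# Assumed spelling of the non-experimental qualifiers inside CC text.
QUALIFIER_RE = re.compile(
    r"\((?:by similarity|probable|potential|putative)\)", re.IGNORECASE
)


def summarize_topic(annotations):
    """Group entries by non-redundant annotation text for one CC topic.

    annotations maps entry name -> annotation text.
    """
    summary = defaultdict(lambda: {"entries": [], "non_experimental": 0})
    for entry_name, text in annotations.items():
        flagged = bool(QUALIFIER_RE.search(text))
        # Drop the qualifier, collapse whitespace, drop the final period.
        key = " ".join(QUALIFIER_RE.sub("", text).split()).rstrip(" .")
        summary[key]["entries"].append(entry_name)
        summary[key]["non_experimental"] += int(flagged)
    return dict(summary)
```

For each non-redundant annotation, the length of "entries" is the number of entries, and the difference to "non_experimental" gives the entries without such a qualifier.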
![Page 12: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/12.jpg)
Expand the whole hierarchy
On mouse-click the relevant entry names are listed
![Page 13: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/13.jpg)
![Page 14: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/14.jpg)
FeatureView
• Interactive interface for visualizing function-related features on the protein sequence and 3D structure
• This page should allow the user to perform a combined sequence-structure analysis on a broad set of data, showing as much of the available information as possible in a clear and intuitive way.
![Page 15: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/15.jpg)
Function-related features derived from the FT lines of UniProtKB:
active sites, binding sites, domains, transmembrane regions, DNA-binding domains…
are mapped onto the alignment and highlighted to give a clear and compact presentation of the relevant information. The features are mapped onto the structure in the same way, making it possible to identify conserved regions and sites.
Sequence → FT → Structure
![Page 16: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/16.jpg)
FeatureView
• Choose the best structure
• Alignment
• Mapping the feature on the alignment and on the structure
![Page 17: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/17.jpg)
F.P.A. David and Y.L. Yip. SSMap*: a new UniProt-PDB mapping resource for the curation of structural-related
information in the UniProt/Swiss-Prot Knowledgebase. Submitted
...
'91 ' => '91',
'25 ' => '25',
'92 ' => '92',
'81 ' => '82',
'71 ' => '71',
'21 ' => '23',
'-' => 'x',
'61 ' => '61',
'37 ' => '37',
'68 ' => '68',
'50 ' => '50',
'18 ' => '15',
...
Choose the best structure
*
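A residue mapping like the excerpt above can be used to carry feature positions from the sequence to a structure. The sketch below assumes that keys are sequence positions, that values are structure residue numbers, and that 'x' marks an unmapped residue. It also assumes one possible selection rule, picking the structure that covers the most feature positions; the slides do not state the actual criterion, and all names are hypothetical:

```python
def clean_mapping(raw):
    """Strip the padding seen in the excerpt, e.g. '81 ' -> '81'."""
    return {key.strip(): value.strip() for key, value in raw.items()}


def translate_positions(positions, mapping):
    """Return {sequence position: structure residue number}.

    Positions without a mapped residue are left out.
    """
    translated = {}
    for pos in positions:
        target = mapping.get(str(pos))
        if target is not None and target != "x":
            translated[pos] = int(target)
    return translated


def pick_structure(positions, candidates):
    """Assumed rule: prefer the structure covering most feature positions."""
    return max(
        candidates,
        key=lambda name: len(translate_positions(positions, candidates[name])),
    )
```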
![Page 18: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/18.jpg)
Jmol: an open-source Java viewer for chemical structures in 3D. http://www.jmol.org/
![Page 19: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/19.jpg)
![Page 20: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/20.jpg)
![Page 21: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/21.jpg)
![Page 22: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/22.jpg)
FeatureView
• Choose the best structure
• Alignment
• Mapping the feature on the alignment and on the structure
![Page 23: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/23.jpg)
Edgar, Robert C. (2004), MUSCLE: multiple sequence alignment with high accuracy and
high throughput, Nucleic Acids Research 32(5), 1792-97.
Input file
Alignment
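The alignment step can be sketched as a call to the MUSCLE command line followed by reading the aligned FASTA output. This assumes MUSCLE 3.x, whose options are `-in` and `-out`; the helper names and file names are illustrative:

```python
def muscle_command(fasta_in, fasta_out, executable="muscle"):
    """Build the MUSCLE 3.x command line for one input file."""
    return [executable, "-in", fasta_in, "-out", fasta_out]


def read_fasta(text):
    """Parse FASTA text into {first word of header: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def is_alignment(records):
    """True when all sequences have the same (gapped) length."""
    return len({len(seq) for seq in records.values()}) == 1
```

The command would typically be run with `subprocess.run(muscle_command("group.fasta", "group.aln"), check=True)` before reading the output file.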
![Page 24: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/24.jpg)
FeatureView
• Choose the best structure
• Alignment
• Mapping the feature on the alignment and on the structure
![Page 25: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/25.jpg)
I group: ('CA_BIND', 'NP_BIND', 'MOTIF', 'ACT_SITE', 'METAL',
'BINDING', 'SITE', 'NON_STD', 'MOD_RES', 'LIPID', 'CARBOHYD',
'DISULFID', 'CROSSLINK');
II group: ('PEPTIDE', 'TOPO_DOM', 'TRANSMEM', 'DOMAIN',
'REPEAT', 'ZN_FING', 'DNA_BIND', 'REGION', 'COILED');
Input file
Alignment
FT (Feature Table) lines
![Page 26: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/26.jpg)
I group: ('CA_BIND', 'NP_BIND', 'MOTIF', 'ACT_SITE', 'METAL', 'BINDING', 'SITE', 'NON_STD', 'MOD_RES', 'LIPID', 'CARBOHYD', 'DISULFID', 'CROSSLINK'): distinct font color, with a toolbox containing the description of the feature (entry name, feature key, sequence position, description).
II group: ('PEPTIDE', 'TOPO_DOM', 'TRANSMEM', 'DOMAIN', 'REPEAT', 'ZN_FING', 'DNA_BIND', 'REGION', 'COILED'): different background colour, with a toolbox whose content is as described above.
- Overlapping features in the first group → represented in the toolbox.
- Overlapping features in the second group → different background color.
FT (Feature Table) lines
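Mapping an FT line onto the alignment means converting ungapped sequence positions into alignment columns and choosing a display style by feature group. A minimal sketch, assuming '-' is the gap symbol and that group I is drawn with a font color and group II with a background color; the function names and style labels are hypothetical:

```python
GROUP_I = {
    "CA_BIND", "NP_BIND", "MOTIF", "ACT_SITE", "METAL", "BINDING", "SITE",
    "NON_STD", "MOD_RES", "LIPID", "CARBOHYD", "DISULFID", "CROSSLINK",
}
GROUP_II = {
    "PEPTIDE", "TOPO_DOM", "TRANSMEM", "DOMAIN", "REPEAT", "ZN_FING",
    "DNA_BIND", "REGION", "COILED",
}


def column_of(aligned_seq, position):
    """0-based alignment column of a 1-based ungapped sequence position."""
    seen = 0
    for column, symbol in enumerate(aligned_seq):
        if symbol != "-":
            seen += 1
            if seen == position:
                return column
    raise ValueError(f"position {position} is beyond the sequence end")


def style_for(feature_key):
    """Display style by feature group (labels are illustrative)."""
    if feature_key in GROUP_I:
        return "font-color"
    if feature_key in GROUP_II:
        return "background-color"
    return "none"


def map_feature(aligned_seq, feature_key, start, end):
    """Return (style, first column, last column) for one FT line."""
    return (
        style_for(feature_key),
        column_of(aligned_seq, start),
        column_of(aligned_seq, end),
    )
```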
![Page 27: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/27.jpg)
ATOM 1817 N MET B 3 -31.380 87.126 39.296 1.0 100.00
ATOM 1818 CA MET B 3 -30.684 88.400 39.176 1.0 100.00
ATOM 1819 C MET B 3 -30.858 88.967 37.771 1.0 100.00
ATOM 1820 O MET B 3 -30.195 88.514 36.832 1.0 100.00
ATOM 1821 CB MET B 3 -29.190 88.285 39.498 1.0 100.00
ATOM 1822 CG MET B 3 -28.465 89.628 39.501 1.0 100.00
ATOM 1823 SD MET B 3 -26.671 89.415 39.661 1.0 100.00
ATOM 1824 CE MET B 3 -26.312 90.705 40.863 1.0 100.00
ATOM 1825 N GLU B 4 -31.750 89.938 37.638 1.0 50.00
ATOM 1826 CA GLU B 4 -31.927 90.498 36.300 1.0 50.00
… … … … … … … … … … …
50.00
100.00
00.00
Alignment position
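The ATOM records above carry a per-residue value in the last column, which a viewer such as Jmol can use for coloring. A minimal sketch of writing such values, assuming standard fixed-width PDB records where the temperature factor occupies columns 61-66, the chain identifier column 22 and the residue number columns 23-26; the function names and the `scores` dictionary are hypothetical:

```python
def set_bfactor(line, value):
    """Overwrite the temperature-factor field (columns 61-66)."""
    if not line.startswith(("ATOM", "HETATM")):
        return line
    body = line.rstrip("\n").ljust(66)
    return body[:60] + f"{value:6.2f}" + body[66:]


def paint_scores(pdb_lines, scores, default=0.0):
    """Write one value per residue into all of its atom records.

    scores maps (chain id, residue number) -> value, e.g. 0.0 to 100.0.
    """
    painted = []
    for line in pdb_lines:
        line = line.rstrip("\n")
        if line.startswith(("ATOM", "HETATM")):
            key = (line[21], int(line[22:26]))
            line = set_bfactor(line, scores.get(key, default))
        painted.append(line)
    return painted
```

Residues missing from `scores` receive the default value, so unscored regions remain distinguishable in the viewer.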
![Page 28: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/28.jpg)
On mouse-click run blastp on UniProt web page
![Page 29: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/29.jpg)
On mouse-click start Jalview applet
![Page 30: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/30.jpg)
Conservation
• Interactive interface for visualizing the structural conservation of protein groups on the protein sequence and 3D structure
• Highlight positions and regions conserved in the group of proteins
• Conservation scores are mapped onto the multiple sequence alignment (MSA) and onto the 3D structure
![Page 31: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/31.jpg)
Input file
Scoring residue conservation
![Page 32: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/32.jpg)
0.000 # ---S--------
0.000 # ---T--------
0.000 # ---S--------
0.000 # ---T--------
0.000 # ---S--------
0.024 # ---TM-M-----
0.320 # MMMSV-VVMM--
0.278 # VVVDHMHHGGG-
0.500 # LLLYLLWWLLL-
0.603 # SSSSTTTSSSS-
0.391 # PAAAPAAEDDD-
0.424 # AAAAEEEVGGQT
0.809 # DDDDEEEEEEEE
Scoring methods

| Method name | Type of score | Description |
| --- | --- | --- |
| basicmdm | Sum-of-Pairs (SP), matrix score | Simplest SP score possible |
| entropynorm7 | Entropic | Normalized Shannon entropy with 7 symbol types |
| entropynorm21 | Entropic | Normalized Shannon entropy with 21 symbol types |
| trident | Entropic, matrix score, sequence weighted | Mixed model score |
| valdar01 | SP, matrix score, sequence weighted | Score used in Valdar & Thornton 2001 |
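To illustrate the entropic scores in the table, the sketch below computes a normalized Shannon entropy per alignment column and turns it into a conservation value between 0 and 1. It assumes 21 symbol types (20 amino acids plus the gap) and normalization by the logarithm of the symbol count. It is not the scoring program used for the slides and is not expected to reproduce the values listed there:

```python
import math
from collections import Counter


def alignment_columns(sequences):
    """Transpose aligned sequences into a list of column strings."""
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return ["".join(column) for column in zip(*sequences)]


def entropy_conservation(column, n_symbols=21):
    """Return 1 - H/log(n_symbols); 1.0 means fully conserved."""
    if not column:
        raise ValueError("empty column")
    counts = Counter(column.upper())
    total = len(column)
    entropy = -sum(
        (n / total) * math.log(n / total) for n in counts.values()
    )
    return 1.0 - entropy / math.log(n_symbols)
```

Applying `entropy_conservation` to every column returned by `alignment_columns` yields one score per alignment position, in the same layout as the score listing on the previous slide.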
![Page 33: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/33.jpg)
![Page 34: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/34.jpg)
![Page 35: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/35.jpg)
• Develop a method to compare two or more protein subgroups
• Profile
At the moment it is an integrated framework for developing the visualization of information such as annotation, and of sites that differ in conservation between protein subgroups.
Input file
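One simple way to flag sites that differ in conservation between two subgroups is to compare per-column consensus residues. The sketch below is only an illustration of that idea under assumed rules (gap symbol '-', a consensus fraction threshold of 0.8); it is not the method under development in UniView:

```python
from collections import Counter


def consensus(column):
    """Most frequent non-gap residue and its fraction of the column."""
    residues = [symbol for symbol in column if symbol != "-"]
    if not residues:
        return None, 0.0
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(column)


def differing_sites(group_a, group_b, min_fraction=0.8):
    """Columns conserved within each subgroup but with different residues.

    group_a and group_b are lists of rows taken from the same alignment.
    Returns (0-based column, residue in A, residue in B) tuples.
    """
    sites = []
    columns = zip(zip(*group_a), zip(*group_b))
    for index, (col_a, col_b) in enumerate(columns):
        res_a, frac_a = consensus(col_a)
        res_b, frac_b = consensus(col_b)
        if (res_a and res_b and res_a != res_b
                and min(frac_a, frac_b) >= min_fraction):
            sites.append((index, res_a, res_b))
    return sites
```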
![Page 36: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/36.jpg)
Tree
The phylogenetic tree of the protein group is shown on this page.
![Page 37: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/37.jpg)
Software for phylogenetic tree visualization and manipulation
http://bioinfo.unice.fr/biodiv/Tree_editors.html
- Treedyn: works on a local machine but not server-side (graphical applet needed)
- Phylodendron: trouble with the CGI script
- phyfi: private program that cannot be installed on one's own server; possibly usable via URL request
- nexplorer: NEXUS format needed, and it cannot be installed on one's own server
- dnd2svg.pl: strict sequence number; output only in SVG format
- TreeFam: private program only
→ ATV 1.92
![Page 38: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/38.jpg)
http://www.phylosoft.org/atv/
Zmasek C.M. and Eddy S.R. (2001) ATV: display
and manipulation of annotated phylogenetic trees.
Bioinformatics, 17, 383-384.
Gascuel O.1997. BIONJ: an improved version of the NJ algorithm based on a
simple model of sequence data. Molecular Biology and Evolution, 14:685-695.
Input file
Tree in Newick format
((((ACADM_HUMAN:0.000925,ACADM_PANTR:0.003941):0.014922,ACADM_MACFA:0.021579):0.041621,((ACADM_MOUSE:0.015113,ACADM_RAT:0.029420):0.051559,(ACADM_DROME:0.187088,((ACAD8_MOUSE:0.049728,ACAD8_HUMAN:0.052753):0.013706,ACAD8_BOVIN:0.104627):1.146493):0.149078):0.010918):0.015504,ACADM_PIG:0.057735,ACADM_BOVIN:0.023577);
http://www.jalview.org/
Clamp, M., Cuff, J., Searle, S. M. and
Barton, G. J. (2004). The Jalview Java
Alignment Editor. Bioinformatics, 20, 426-7
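Before a Newick string like the one on this slide is handed to a tree viewer, it can be sanity-checked and its leaves listed. A minimal sketch using only the standard library, assuming unlabeled internal nodes and plain decimal branch lengths as in the example; the function names are hypothetical:

```python
import re

# A leaf is a label that directly follows '(' or ',' and has a length.
LEAF_RE = re.compile(r"(?<=[(,])\s*([^(),:;\s]+):([0-9.eE+-]+)")


def is_balanced(newick):
    """Check that parentheses are balanced and the tree ends with ';'."""
    depth = 0
    for symbol in newick:
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0 and newick.rstrip().endswith(";")


def leaves(newick):
    """Return (leaf name, branch length) pairs in order of appearance."""
    return [(name, float(length)) for name, length in LEAF_RE.findall(newick)]
```

The leaf names returned here are UniProtKB entry names, so they can be matched against the entries of the submitted protein group.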
![Page 39: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/39.jpg)
![Page 40: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/40.jpg)
Future plans
• Normalize HTML pages according to the W3C standard
• Improve the use of CSS
• Test the application on different web browsers
• Write the application in a server-side language
• Integrate the application with other databases
• Ensure multiple access to the application and an analysis history
• Develop a view of the phylogenetic tree to show and interact with additional information
• Hierarchical phylogeny-based classification in UniProtKB
![Page 41: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/41.jpg)
Following the hierarchical
phylogeny-based classification in
UniProtKB
![Page 42: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/42.jpg)
![Page 43: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/43.jpg)
Acknowledgements
• Brigitte Boeckmann & Rita Casadio
• Swiss-Prot lab, Biocomputing group
• Fabrice David & Marco Vassura
• All my friends and Fra
• Dolores and Davide
And now?
![Page 44: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/44.jpg)
- identify similarities and differences between the protein sequences, as well as in the information available for the given protein group;
- estimate the ranges within which functional information on proteins can be transferred from experimentally characterized proteins to their homologs from poorly studied organisms;
- identify errors in the annotations of proteins;
Practical examples
![Page 45: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/45.jpg)
A compact summary view of the biological information of a protein group is important, especially for a large dataset. It makes it possible to observe, compare and count all common and dissimilar characteristics, and to analyze in detail the entries that share the same feature.
Acetylglutamate kinase family
![Page 46: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/46.jpg)
Acyl-CoA dehydrogenase family
![Page 47: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/47.jpg)
![Page 48: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/48.jpg)
![Page 49: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/49.jpg)
![Page 50: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/50.jpg)
![Page 51: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/51.jpg)
gatB/gatE family
![Page 52: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/52.jpg)
IPP transferase family
![Page 53: UniView](https://reader033.vdocuments.us/reader033/viewer/2022052823/554f5273b4c905524c8b4f56/html5/thumbnails/53.jpg)