A. Prlić - BioJava Update
Presentation by Andreas Prlić at BOSC 2012: "BioJava Update"
![Page 1: A Prlic - BioJava update](https://reader036.vdocuments.us/reader036/viewer/2022081403/554e7ed1b4c90545698b51e1/html5/thumbnails/1.jpg)
How to use BioJava to calculate one billion protein structure alignments at the RCSB PDB website
Andreas Prlić
![Page 2: A Prlic - BioJava update](https://reader036.vdocuments.us/reader036/viewer/2022081403/554e7ed1b4c90545698b51e1/html5/thumbnails/2.jpg)
My Two Hats
RCSB PDB
BioJava
![Page 3: A Prlic - BioJava update](https://reader036.vdocuments.us/reader036/viewer/2022081403/554e7ed1b4c90545698b51e1/html5/thumbnails/3.jpg)
www.pdb.org
Overview
[Figure: number of released entries by year]
![Page 4: A Prlic - BioJava update](https://reader036.vdocuments.us/reader036/viewer/2022081403/554e7ed1b4c90545698b51e1/html5/thumbnails/4.jpg)
Some of the things you can do at the RCSB PDB site
• Advanced queries
• Custom reports
• Visualization
• Education section
• Comparisons across PDB, based on sequence and 3D structure similarities
Jmol
LigandExplorer
Custom report
![Page 5: A Prlic - BioJava update](https://reader036.vdocuments.us/reader036/viewer/2022081403/554e7ed1b4c90545698b51e1/html5/thumbnails/5.jpg)
www.pdb.org
Systematic Structural Alignment
Objective: Find novel relationships
Example: Green Fluorescent Protein (GFP)
• Nidogen-1: similar 11-stranded beta-barrel and internal helices
• 3 Å RMSD, only 9% sequence identity
• Nidogen-1: component of basement membrane, no chromophore
• GFP and NID-1 may share a common ancestor
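The two numbers quoted for the GFP / Nidogen-1 pair (RMSD and sequence identity) can be illustrated with a small sketch. It assumes the two structures are already superposed and that equivalent C-alpha positions are supplied as matching rows; the class and method names are hypothetical and are not BioJava API. Identity is computed here over alignment columns where neither sequence has a gap, which is only one of several conventions in use.

```java
class AlignmentMetrics {

    /**
     * Root-mean-square deviation between two equally sized coordinate sets.
     * Assumes the sets are already superposed; units follow the input (Å for PDB files).
     */
    static double rmsd(double[][] a, double[][] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("need two non-empty coordinate sets of equal size");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            for (int k = 0; k < 3; k++) {
                double d = a[i][k] - b[i][k];
                sum += d * d;
            }
        }
        return Math.sqrt(sum / a.length);
    }

    /**
     * Fraction of identical residues over aligned (non-gap) columns.
     * Both strings are rows of the same alignment, with '-' marking gaps.
     */
    static double identity(String s1, String s2) {
        if (s1.length() != s2.length()) {
            throw new IllegalArgumentException("aligned rows must have equal length");
        }
        int aligned = 0;
        int same = 0;
        for (int i = 0; i < s1.length(); i++) {
            char c1 = s1.charAt(i);
            char c2 = s2.charAt(i);
            if (c1 == '-' || c2 == '-') {
                continue; // skip gap columns
            }
            aligned++;
            if (c1 == c2) {
                same++;
            }
        }
        return aligned == 0 ? 0.0 : (double) same / aligned;
    }

    public static void main(String[] args) {
        // Toy data, not taken from any PDB entry.
        double[][] a = {{0, 0, 0}, {1, 0, 0}};
        double[][] b = {{0, 0, 3}, {1, 0, 3}};
        System.out.println("RMSD: " + rmsd(a, b));
        System.out.println("Identity: " + identity("ACDE", "ACDF"));
    }
}
```

A structure-alignment algorithm such as FATCAT determines both the residue equivalences and the superposition; this sketch only shows how the two summary numbers follow once those are known.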
![Page 6: A Prlic - BioJava update](https://reader036.vdocuments.us/reader036/viewer/2022081403/554e7ed1b4c90545698b51e1/html5/thumbnails/6.jpg)
Open Science Grid
Based on the FATCAT (rigid) algorithm: Yuzhen Ye & Adam Godzik. Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics 2003, vol. 19, suppl. 2, ii246-ii255.
Systematic comparisons of representative chains from 40% sequence identity clusters
22000 sequence clusters
33000 representative domains
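To get a feel for the scale of an all-against-all comparison, the number of pairs among n representatives can be counted directly. The sketch below assumes each pair is aligned once (unordered pairs); the slides do not state whether pairs are counted once or in both directions, so the printed values are illustrative rather than the project's own totals. The class name is hypothetical.

```java
class PairCount {

    /** Number of unordered pairs (i < j) among n items: n * (n - 1) / 2. */
    static long unorderedPairs(long n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        // Input sizes taken from the slide; the pairing scheme is an assumption.
        System.out.println("22000 clusters: " + unorderedPairs(22_000) + " pairs");
        System.out.println("33000 domains:  " + unorderedPairs(33_000) + " pairs");
    }
}
```

Because the count grows quadratically with n, a modest increase in the number of representatives adds a large number of new alignments.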
![Page 7: A Prlic - BioJava update](https://reader036.vdocuments.us/reader036/viewer/2022081403/554e7ed1b4c90545698b51e1/html5/thumbnails/7.jpg)
PDB
Custom Job Management
Java clients can run anywhere
Open Science Grid
Sends out instructions to clients
Writes results to disk
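The arrangement on this slide, a central job manager that hands out work to many independent clients and collects their results, can be sketched as follows. This is a minimal in-process illustration under stated assumptions, not the RCSB PDB implementation: threads stand in for remote Java clients, an in-memory list stands in for the results written to disk, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AlignmentJobSketch {

    /** One unit of work: compare chain a with chain b (identifiers are placeholders). */
    static final class Job {
        final String a;
        final String b;
        Job(String a, String b) { this.a = a; this.b = b; }
    }

    /** Coordinator: holds pending jobs and collects results from clients. */
    static final class Coordinator {
        final Queue<Job> pending = new ConcurrentLinkedQueue<>();
        final List<String> results = Collections.synchronizedList(new ArrayList<>());

        Coordinator(List<String> ids) {
            // Enqueue every unordered pair once.
            for (int i = 0; i < ids.size(); i++) {
                for (int j = i + 1; j < ids.size(); j++) {
                    pending.add(new Job(ids.get(i), ids.get(j)));
                }
            }
        }

        Job nextJob() { return pending.poll(); }          // null when no work is left
        void submit(String line) { results.add(line); }   // stand-in for writing to disk
    }

    /** Runs all pairwise jobs with the given number of client threads; returns sorted result lines. */
    static List<String> run(List<String> ids, int clients) {
        Coordinator coordinator = new Coordinator(ids);
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        for (int k = 0; k < clients; k++) {
            pool.submit(() -> {
                Job job;
                while ((job = coordinator.nextJob()) != null) {
                    // A real client would run a structure alignment here.
                    coordinator.submit(job.a + "\t" + job.b);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("clients did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        List<String> out = new ArrayList<>(coordinator.results);
        Collections.sort(out); // completion order varies between runs
        return out;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("chainA", "chainB", "chainC", "chainD");
        run(ids, 3).forEach(System.out::println);
    }
}
```

Clients pull work rather than having it pushed to them, so faster clients simply take more jobs and a client that disappears leaves the remaining jobs available to the others.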
![Page 8: A Prlic - BioJava update](https://reader036.vdocuments.us/reader036/viewer/2022081403/554e7ed1b4c90545698b51e1/html5/thumbnails/8.jpg)
Initial calculation of frozen snapshot of PDB
~170k CPU hours on OSG
Incremental weekly updates (~1-2 million alignments)
<1000 CPU hours
Code: www.biojava.org
1 billion alignments available freely at
www.rcsb.org
![Page 9: A Prlic - BioJava update](https://reader036.vdocuments.us/reader036/viewer/2022081403/554e7ed1b4c90545698b51e1/html5/thumbnails/9.jpg)
BioJava
• Major rewrite - BioJava 3
![Page 10: A Prlic - BioJava update](https://reader036.vdocuments.us/reader036/viewer/2022081403/554e7ed1b4c90545698b51e1/html5/thumbnails/10.jpg)
BioJava 1 BioJava 3
core data model
symbols/alphabets, counts, distributions
Genome/sequencing
Mult. seq. align
Structure alignment
Modfinder
AA Properties
Protein Disorder
Hmmer3 WS
NCBI WS
Parsers: Genbank/Embl/Blast
![Page 11: A Prlic - BioJava update](https://reader036.vdocuments.us/reader036/viewer/2022081403/554e7ed1b4c90545698b51e1/html5/thumbnails/11.jpg)
Acknowledgments
• Spencer Bliven
• Peter Rose
• Phil Bourne
• all contributors
• A. Yates, J. Jacobsen, P. Troshin, M. Chapman, J. Gao, C.H. Koh, S. Foisy, R. Holland, G. Rimsa, M. Heuer, H. Brandstaetter-Mueller, S. Willis
RCSB PDB BioJava
Funding: RCSB PDB, Google Summer of Code, Open Science Grid