![Page 1: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/1.jpg)
Computational Biology
Dr. Jens Allmer
Lecture Slides Week 5
![Page 2: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/2.jpg)
MakeDB
• Example
– makeblastdb -in seq.fasta -dbtype prot -out seqBl -title seqBlastDB
• More information?
– Go to the doc folder of BLAST
– Documentation is there
– http://www.ncbi.nlm.nih.gov/books/NBK1763/
![Page 3: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/3.jpg)
BLAST
• Now that we have an indexed database, try to run BLAST
• Read the documentation and try to solve the simplest case
– You will need the indexed database and a FASTA file as query
– You could create queries from the database and slightly change them
• Good luck
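One way to approach the exercise, as a rough sketch: take sequences from the database FASTA, change a few residues, and use the result as the query. The two proteins, the file names `query.fasta` and `hits.tsv`, and the helper names below are made up for illustration; `blastp` with `-query`, `-db`, `-out` and `-outfmt` is assumed to be the BLAST+ program on your PATH.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def mutate(seq, n_changes, rng):
    """Replace n_changes distinct positions with a different residue."""
    residues = list(seq)
    for pos in rng.sample(range(len(residues)), n_changes):
        choices = [aa for aa in AMINO_ACIDS if aa != residues[pos]]
        residues[pos] = rng.choice(choices)
    return "".join(residues)

def blastp_command(query, db, out):
    """Argument list for a basic protein search with tabular output."""
    return ["blastp", "-query", query, "-db", db, "-out", out, "-outfmt", "6"]

# Made-up two-protein database, only for illustration.
database = ">prot1\nMKTAYIAKQRQISFVKSHFSRQ\n>prot2\nMSDNELLKQAEELLRK\n"
rng = random.Random(0)
queries = [(h + "_mut", mutate(s, 2, rng)) for h, s in read_fasta(database)]
query_fasta = "".join(f">{h}\n{s}\n" for h, s in queries)
print(query_fasta)

# seqBl is the database name created with makeblastdb on the MakeDB slide.
print(" ".join(blastp_command("query.fasta", "seqBl", "hits.tsv")))
# To run the search: import subprocess; subprocess.run(command, check=True)
```

Each mutated query should still hit its original sequence, just with a slightly lower identity.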
![Page 4: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/4.jpg)
OMSSA
• Unzip folder and check
– Alternatively, download from NCBI
• MS/MS mgf file
• Database file as FASTA
• makeblastdb.exe
• omssacl.exe
• usermods.xml
![Page 5: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/5.jpg)
OMSSA
Before running OMSSA, the database file must be converted to a BLAST-like format.
So let’s run makeblastdb.exe to create a hash-indexed database.
![Page 6: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/6.jpg)
OMSSA
Here, 2 different settings are used.
The first one uses a product ion tolerance of 0.05.
The second one uses the default product ion tolerance.
For variable modifications (-mv), check usermods.xml.
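The two runs could be scripted roughly as below. This is only a sketch: the file names and the modification id are hypothetical, and the options (`-fm` for the mgf file, `-d` for the database, `-oc` for CSV output, `-to` for the product ion tolerance, `-mv` for variable modifications) are how I understand omssacl's command line, so check them against the program's own usage output before relying on them.

```python
def omssa_command(mgf, db, out_csv, product_tol=None, variable_mods=()):
    """Build an omssacl argument list; leave product_tol as None for the default."""
    cmd = ["omssacl", "-fm", mgf, "-d", db, "-oc", out_csv]
    if product_tol is not None:
        cmd += ["-to", str(product_tol)]
    if variable_mods:
        # -mv takes a comma-separated list of modification ids
        cmd += ["-mv", ",".join(str(m) for m in variable_mods)]
    return cmd

# Hypothetical file names and modification id, for illustration only.
tight = omssa_command("spectra.mgf", "database.fasta", "tight.csv",
                      product_tol=0.05, variable_mods=[1])
default = omssa_command("spectra.mgf", "database.fasta", "default.csv",
                        variable_mods=[1])

for cmd in (tight, default):
    print(" ".join(cmd))
# To run: import subprocess; subprocess.run(cmd, check=True)
```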
![Page 7: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/7.jpg)
X!Tandem
• Unzip folder and check
• Mgf-formatted spectra (file)
• Database file (FASTA)
• tandem-win32-10-12-01-1 folder
• Used .xml configuration files (default_input.xml, input.xml and taxonomy.xml)
• To get the same output as given in the zip folder:
– Replace the configuration files in the «tandem-win\bin» folder with the ones in the «used» folder.
– Also copy the database file to the «fasta» folder and the .mgf file to «bin» in «tandem-win».
![Page 8: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/8.jpg)
X!Tandem Console Application
![Page 9: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/9.jpg)
MBG404 Overview
Data
Generation
Processing
Storage
Mining
Pipelining
![Page 10: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/10.jpg)
X!Tandem Default Input
Parameters for the search, such as mass tolerances, enzyme type, and number of charges, can be changed in default_input.xml.
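As a small sketch of what changing such a parameter looks like: X!Tandem parameter files are `<bioml>` documents in which each setting is a `<note type="input" label="...">` element. The fragment and values below are illustrative, not the contents of the course's default_input.xml, and the helper names are my own.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment; a real default_input.xml has many more notes.
DEFAULT_INPUT = """<bioml>
  <note type="input" label="spectrum, fragment monoisotopic mass error">0.4</note>
  <note type="input" label="spectrum, fragment monoisotopic mass error units">Daltons</note>
  <note type="input" label="protein, cleavage site">[RK]|{P}</note>
  <note type="input" label="spectrum, maximum parent charge">4</note>
</bioml>"""

def get_parameters(root):
    """Return all input parameters as a {label: value} dictionary."""
    return {n.get("label"): n.text
            for n in root.iter("note") if n.get("type") == "input"}

def set_parameter(root, label, value):
    """Overwrite the value of the note with the given label; True if found."""
    for note in root.iter("note"):
        if note.get("type") == "input" and note.get("label") == label:
            note.text = value
            return True
    return False

root = ET.fromstring(DEFAULT_INPUT)
set_parameter(root, "spectrum, fragment monoisotopic mass error", "0.05")
set_parameter(root, "spectrum, maximum parent charge", "3")
for label, value in get_parameters(root).items():
    print(f"{label} = {value}")
```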
![Page 11: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/11.jpg)
X!Tandem Input.xml
In the input.xml file, you should specify the paths of:
• taxonomy.xml
• default_input.xml
• Spectra filename
• Output filename
NOTE: Here input.xml and all files above are in the same folder (directory).
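A minimal input.xml could be generated like this. It is a sketch under assumptions: the spectra and output file names and the taxon name `mydb` are hypothetical, and the `protein, taxon` note (which selects an entry from taxonomy.xml) is not mentioned on the slide but is, as far as I know, needed by X!Tandem.

```python
import xml.etree.ElementTree as ET

def build_input_xml(default_params, taxonomy, taxon, spectra, output):
    """Return the text of an X!Tandem input.xml pointing at the given files."""
    root = ET.Element("bioml")
    entries = [
        ("list path, default parameters", default_params),
        ("list path, taxonomy information", taxonomy),
        ("protein, taxon", taxon),      # assumption: must match taxonomy.xml
        ("spectrum, path", spectra),
        ("output, path", output),
    ]
    for label, value in entries:
        note = ET.SubElement(root, "note", {"type": "input", "label": label})
        note.text = value
    return ET.tostring(root, encoding="unicode")

# All files sit in the same folder as input.xml, so bare names are enough.
input_xml = build_input_xml("default_input.xml", "taxonomy.xml",
                            "mydb", "spectra.mgf", "output.xml")
print(input_xml)
# The search is then started from the bin folder with: tandem.exe input.xml
```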
![Page 12: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/12.jpg)
X!Tandem Taxonomy
In the taxonomy file, you should specify the «database file path». In this example, the database file is in the «fasta» folder inside the «Xtandem\tandem-win32-10-12-01-1» folder.
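The taxonomy file maps a taxon name to one or more FASTA files. A hedged sketch of generating one follows; the taxon name `mydb` and the relative path `../fasta/database.fasta` are placeholders, not the names used in the course material.

```python
import xml.etree.ElementTree as ET

def build_taxonomy_xml(taxon, fasta_path):
    """Return the text of a taxonomy.xml with a single taxon and FASTA file."""
    root = ET.Element("bioml", {"label": "x! taxon-to-file matching list"})
    taxon_el = ET.SubElement(root, "taxon", {"label": taxon})
    # format="peptide" marks a protein sequence file
    ET.SubElement(taxon_el, "file", {"format": "peptide", "URL": fasta_path})
    return ET.tostring(root, encoding="unicode")

# Placeholder names: the path is relative to the bin folder where tandem runs.
taxonomy_xml = build_taxonomy_xml("mydb", "../fasta/database.fasta")
print(taxonomy_xml)
```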
![Page 13: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/13.jpg)
X!Tandem Output
![Page 14: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/14.jpg)
Console Applications
Why
![Page 15: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/15.jpg)
HTML
• What you need to know about hyper text markup language
• How to get to it
– Right-click the document in your browser
– Make sure you do not click on an image, link, or some other non-HTML element
– Choose View Source or View Page Source.
• What’s in the source
• Sometimes things are not visible/ accessible on the web page but can be retrieved from the source
![Page 16: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/16.jpg)
HTML Structure
<HTML>
<HEAD>
<TITLE>Page title seen in the title bar</TITLE>
<!-- Some other links and scripts can be here -->
</HEAD>
<BODY>
Text and other visible elements go here
</BODY>
</HTML>
![Page 17: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/17.jpg)
HTML Input
<FORM action="destination" method="POST/GET">
<INPUT type="TYPES" name="" id="" value="" />
<TEXTAREA name="" id="">value</TEXTAREA>
<SELECT name="" id="">
<OPTION value="">display</OPTION>
</SELECT>
</FORM>
TYPES: { text, password, checkbox, radio, submit, reset, file, hidden, image, button}
![Page 18: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/18.jpg)
Why?
• Why do you need this information?
• Some information may be inaccessible on the website
• In the HTML code it will be accessible
• Sometimes you may be interested in all settings for the programs that you used online
• Often these settings are in hidden input fields (you need to check the source then)
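Instead of reading the source by eye, the hidden fields can be collected with a few lines of code. This sketch uses Python's standard `html.parser`; the form and its field names are invented for illustration and are not taken from any real website.

```python
from html.parser import HTMLParser

class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag != "input":
            return
        attributes = dict(attrs)
        if (attributes.get("type") or "").lower() == "hidden":
            self.hidden[attributes.get("name")] = attributes.get("value")

# Made-up page source, as it might appear under "View Page Source".
page = """<HTML><BODY>
<FORM action="search.cgi" method="POST">
  <INPUT type="text" name="query" value="" />
  <INPUT type="hidden" name="word_size" value="3" />
  <INPUT type="hidden" name="matrix" value="BLOSUM62" />
  <INPUT type="submit" value="Search" />
</FORM>
</BODY></HTML>"""

collector = HiddenInputCollector()
collector.feed(page)
for name, value in collector.hidden.items():
    print(name, "=", value)
```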
![Page 19: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/19.jpg)
NCBI Blast
• Contains many hidden variables; here are some:
![Page 20: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/20.jpg)
Theory I
![Page 21: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/21.jpg)
MBG404 Overview
Data
Generation
Processing
Storage
Mining
Pipelining
![Page 22: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/22.jpg)
Database Management Systems
![Page 23: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/23.jpg)
Database Management Systems
| Company/Organization | Database Size (GB) | DBMS | System Arch. | DBMS Vendor | System Vendor | Storage Vendor |
|---|---|---|---|---|---|---|
| France Telecom | 29,232 | Oracle | SMP | Oracle | HP | HP |
| AT&T | 26,269 | Daytona | SMP | AT&T | Sun | Sun |
| SBC | 24,805 | Teradata | MPP | Teradata | NCR | LSI |
| Anonymous | 16,191 | DB2 for Unix | MPP/Cluster | IBM | IBM | IBM |
| Amazon.com | 13,001 | Oracle | SMP | Oracle | HP | HP |
| Kmart | 12,592 | Teradata | MPP | Teradata | NCR | LSI |
| Claria Corporation | 12,100 | Oracle | SMP | Oracle | Sun | Hitachi |
| Health Insurance Review Agency | 11,942 | Sybase IQ | Cluster | Sybase | HP | Hitachi |
| FedEx Services | 9,981 | Teradata | MPP | Teradata | NCR | EMC |
| Vodafone D2 GmbH | 9,108 | Teradata | MPP | Teradata | NCR | LSI |
![Page 24: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/24.jpg)
Database Management Systems
[Diagram: Users, View 1 / View 2 / View 3, Conceptual Schema, Physical Schema, DB]
![Page 25: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/25.jpg)
Database Management Systems
![Page 26: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/26.jpg)
A Relation is a Table
Attributes (column headers)
Tuples (rows)
Instance: contains data
Domain: all possible values

Beers
| name | manf |
|---|---|
| Winterbrew | Pete’s |
| Bud Lite | Anheuser-Busch |
![Page 27: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/27.jpg)
Schemas
• Relation schema = relation name and attribute list.
– Optionally: types of attributes.
– Example: Beers(name, manf) or Beers(name: string, manf: string)
• Database = collection of relations.
• Database schema = set of all relation schemas in the database.
• Instance of a relation = a table in a database with data
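The difference between schema and instance can be tried out in a few lines. The sketch below uses Python's built-in sqlite3 rather than the MS Access used later in the practice part, purely because it needs no installation; the Beers relation and its two tuples are the ones from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Relation schema: relation name plus typed attribute list.
conn.execute("CREATE TABLE Beers (name TEXT, manf TEXT)")

# Instance: the tuples currently stored in the relation.
conn.executemany(
    "INSERT INTO Beers VALUES (?, ?)",
    [("Winterbrew", "Pete's"), ("Bud Lite", "Anheuser-Busch")],
)

# Column 1 of PRAGMA table_info is the attribute name.
schema = [row[1] for row in conn.execute("PRAGMA table_info(Beers)")]
instance = conn.execute("SELECT name, manf FROM Beers ORDER BY name").fetchall()

print("schema:  ", schema)
print("instance:", instance)
```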
![Page 28: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/28.jpg)
Anomalies
• Goal of relational schema design is to avoid anomalies and redundancy.
– Update anomaly: one occurrence of a fact is changed, but not all occurrences.
– Deletion anomaly: valid fact is lost when a tuple is deleted.
![Page 29: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/29.jpg)
Example of Bad Design
Drinkers(name, addr, beersLiked, manf, favBeer)
| name | addr | beersLiked | manf | favBeer |
|---|---|---|---|---|
| Janeway | Voyager | Bud | A.B. | WickedAle |
| Janeway | ??? | WickedAle | Pete’s | ??? |
| Spock | Enterprise | Bud | ??? | Bud |
Data is redundant, because each of the ???’s can be easily figured out.
![Page 30: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/30.jpg)
This Bad Design Also Exhibits Anomalies
| name | addr | beersLiked | manf | favBeer |
|---|---|---|---|---|
| Janeway | Voyager | Bud | A.B. | WickedAle |
| Janeway | Voyager | WickedAle | Pete’s | WickedAle |
| Spock | Enterprise | Bud | A.B. | Bud |

• Update anomaly: if Janeway is transferred to Intrepid, will we remember to change each of her tuples?
• Deletion anomaly: If nobody likes Bud, we lose track of the fact that Anheuser-Busch manufactures Bud.
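Both anomalies can be reproduced on the Drinkers table from the slide. The sketch uses Python's built-in sqlite3 as a stand-in for any relational DBMS; the SQL statements are my own illustration of the two scenarios.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Drinkers (name TEXT, addr TEXT, beersLiked TEXT, "
    "manf TEXT, favBeer TEXT)"
)
conn.executemany(
    "INSERT INTO Drinkers VALUES (?, ?, ?, ?, ?)",
    [
        ("Janeway", "Voyager", "Bud", "A.B.", "WickedAle"),
        ("Janeway", "Voyager", "WickedAle", "Pete's", "WickedAle"),
        ("Spock", "Enterprise", "Bud", "A.B.", "Bud"),
    ],
)

# Update anomaly: the transfer is recorded in only one of Janeway's tuples.
conn.execute(
    "UPDATE Drinkers SET addr = 'Intrepid' "
    "WHERE name = 'Janeway' AND beersLiked = 'Bud'"
)
janeway_addrs = {
    row[0] for row in conn.execute("SELECT addr FROM Drinkers WHERE name = 'Janeway'")
}
print("Janeway now lives at:", sorted(janeway_addrs))  # two conflicting addresses

# Deletion anomaly: nobody likes Bud any more, so its manufacturer disappears too.
conn.execute("DELETE FROM Drinkers WHERE beersLiked = 'Bud'")
bud_manf = conn.execute(
    "SELECT manf FROM Drinkers WHERE beersLiked = 'Bud'"
).fetchall()
print("Manufacturer of Bud:", bud_manf)  # nothing left to look up
```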
![Page 31: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/31.jpg)
1st Normal Form
All attributes need to be atomic
![Page 32: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/32.jpg)
2nd Normal Form
Must be in 1st NF
A key must uniquely identify each tuple
![Page 33: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/33.jpg)
3rd Normal Form
Must be in 2nd NF
Attributes not part of a key must directly depend on one of the keys
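Applied to the Drinkers example from the bad-design slides, normalization leads to a split into several relations. The decomposition below (Beers, Drinkers, Likes) is one common way to do it and is my own sketch, not a schema given in the lecture; it uses Python's built-in sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Beers    (name TEXT PRIMARY KEY, manf TEXT);
CREATE TABLE Drinkers (name TEXT PRIMARY KEY, addr TEXT,
                       favBeer TEXT REFERENCES Beers(name));
CREATE TABLE Likes    (drinker TEXT REFERENCES Drinkers(name),
                       beer TEXT REFERENCES Beers(name),
                       PRIMARY KEY (drinker, beer));
""")
conn.executemany("INSERT INTO Beers VALUES (?, ?)",
                 [("Bud", "A.B."), ("WickedAle", "Pete's")])
conn.executemany("INSERT INTO Drinkers VALUES (?, ?, ?)",
                 [("Janeway", "Voyager", "WickedAle"),
                  ("Spock", "Enterprise", "Bud")])
conn.executemany("INSERT INTO Likes VALUES (?, ?)",
                 [("Janeway", "Bud"), ("Janeway", "WickedAle"), ("Spock", "Bud")])

# Each fact is stored once, so the transfer touches exactly one tuple.
conn.execute("UPDATE Drinkers SET addr = 'Intrepid' WHERE name = 'Janeway'")
addrs = conn.execute("SELECT addr FROM Drinkers WHERE name = 'Janeway'").fetchall()

# Nobody likes Bud any more, yet its manufacturer is still on record.
conn.execute("DELETE FROM Likes WHERE beer = 'Bud'")
bud_manf = conn.execute("SELECT manf FROM Beers WHERE name = 'Bud'").fetchall()

print(addrs, bud_manf)
```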
![Page 34: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/34.jpg)
One-One Relationships
• In a one-one relationship, each entity of either entity set is related to at most one entity of the other set.
• Example: Relationship Best-seller between entity sets Manfs (manufacturer) and Beers.
– A beer cannot be made by more than one manufacturer, and no manufacturer can have more than one best-seller (assume no ties).
![Page 35: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/35.jpg)
Many-One Relationships
• Some binary relationships are many-one from one entity set to another.
• Each entity of the first set is connected to at most one entity of the second set.
• But an entity of the second set can be connected to zero, one, or many entities of the first set.
![Page 36: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/36.jpg)
Many-Many Relationships
• Focus: binary relationships, such as Sells between Bars and Beers.
• In a many-many relationship, an entity of either set can be connected to many entities of the other set.
– E.g., a bar sells many beers; a beer is sold by many bars.
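In a relational database, a many-many relationship such as Sells is usually stored in a table of its own that holds one tuple per connected pair. A sketch with Python's built-in sqlite3 follows; the bar names and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bars  (name TEXT PRIMARY KEY);
CREATE TABLE Beers (name TEXT PRIMARY KEY);
-- One tuple per (bar, beer) pair; the pair itself is the key.
CREATE TABLE Sells (bar  TEXT REFERENCES Bars(name),
                    beer TEXT REFERENCES Beers(name),
                    PRIMARY KEY (bar, beer));
""")
conn.executemany("INSERT INTO Bars VALUES (?)", [("Joe's",), ("Sue's",)])
conn.executemany("INSERT INTO Beers VALUES (?)", [("Bud",), ("WickedAle",)])
conn.executemany("INSERT INTO Sells VALUES (?, ?)",
                 [("Joe's", "Bud"), ("Joe's", "WickedAle"), ("Sue's", "Bud")])

# A bar sells many beers ...
beers_at_joes = [r[0] for r in conn.execute(
    "SELECT beer FROM Sells WHERE bar = ? ORDER BY beer", ("Joe's",))]
# ... and a beer is sold by many bars.
bars_with_bud = [r[0] for r in conn.execute(
    "SELECT bar FROM Sells WHERE beer = ? ORDER BY bar", ("Bud",))]

print(beers_at_joes, bars_with_bud)
```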
![Page 37: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/37.jpg)
End Theory I
• 5 min mindmapping
• 10 min break
![Page 38: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/38.jpg)
Practice I
![Page 39: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/39.jpg)
MS Access
• Create new Tables:
– Plant
– Features
– FeatureTypes
![Page 40: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/40.jpg)
Create a Table
![Page 41: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/41.jpg)
Create a Table
![Page 42: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/42.jpg)
Edit a Table
![Page 43: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/43.jpg)
Create the Three Tables
• Plant
• Features
• FeatureTypes
![Page 44: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/44.jpg)
Add Attributes
• Plant
– ID
– Gender
– Species
– Strain
– Clone
![Page 45: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/45.jpg)
Add Attributes
• Features
– ID
– FeatureType
– Value
![Page 46: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/46.jpg)
Add Attributes
• FeatureTypes
– ID
– Type
– Unit
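For comparison with the Access table designer, the three tables can also be written down as SQL. This is a sketch with Python's built-in sqlite3: the column types, and the idea that Features.FeatureType points at the ID of the FeatureTypes table (the one holding Type and Unit), are assumptions of mine; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Plant (ID INTEGER PRIMARY KEY, Gender TEXT, Species TEXT,
                    Strain TEXT, Clone TEXT);
CREATE TABLE FeatureTypes (ID INTEGER PRIMARY KEY, Type TEXT, Unit TEXT);
-- Assumption: FeatureType refers to FeatureTypes.ID
CREATE TABLE Features (ID INTEGER PRIMARY KEY,
                       FeatureType INTEGER REFERENCES FeatureTypes(ID),
                       Value TEXT);
""")

# Invented sample rows, only to show how the tables connect.
conn.execute("INSERT INTO FeatureTypes VALUES (1, 'height', 'cm')")
conn.execute("INSERT INTO Features VALUES (1, 1, '42')")

joined = conn.execute("""
    SELECT f.ID, t.Type, f.Value, t.Unit
    FROM Features AS f JOIN FeatureTypes AS t ON f.FeatureType = t.ID
""").fetchall()
print(joined)
```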
![Page 47: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/47.jpg)
Table Space
![Page 48: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/48.jpg)
Notice
![Page 49: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/49.jpg)
More Editing
![Page 50: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/50.jpg)
More Editing
![Page 51: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/51.jpg)
Notice
![Page 52: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/52.jpg)
Fill with Data
• Import the data in the plants.csv file
![Page 53: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/53.jpg)
Select Appropriate table
![Page 54: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/54.jpg)
Some adjustments are needed here
![Page 55: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/55.jpg)
Need to name the columns appropriately
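The same import step, done in code, makes the column naming explicit. This sketch assumes that plants.csv has no header row and lists the Plant attributes in the order ID, Gender, Species, Strain, Clone; the two rows are invented stand-ins for the real file, and sqlite3 is used in place of Access.

```python
import csv
import io
import sqlite3

PLANT_COLUMNS = ["ID", "Gender", "Species", "Strain", "Clone"]

# Invented stand-in for plants.csv (assumed: no header row).
sample_csv = "1,female,Salix alba,A,1\n2,male,Salix alba,A,2\n"

rows = []
for record in csv.reader(io.StringIO(sample_csv)):
    row = dict(zip(PLANT_COLUMNS, record))  # name the columns
    row["ID"] = int(row["ID"])              # adjust the type of the key
    rows.append(row)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Plant (ID INTEGER PRIMARY KEY, Gender TEXT, "
    "Species TEXT, Strain TEXT, Clone TEXT)"
)
conn.executemany(
    "INSERT INTO Plant VALUES (:ID, :Gender, :Species, :Strain, :Clone)", rows
)

imported = conn.execute("SELECT ID, Gender, Species FROM Plant ORDER BY ID").fetchall()
print(imported)
```

To read the real file, replace `io.StringIO(sample_csv)` with `open("plants.csv", newline="")`.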
![Page 56: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/56.jpg)
Insert Data
• Import Feature table
• Import features txt file
![Page 57: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/57.jpg)
Real Data
• Download GO Terms:
– http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz
• Change the file extension to .xml so that Access can import it
• Import the file into Access
– May take a short while
– Errors will occur (we ignore them for now)
• Have a look at the tables
• Analyze the relationships (were they imported?)
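Because the download is gzip-compressed, it has to be unpacked before the renamed file is readable as XML. A small sketch of doing both steps in one go; the helper name `unpack_as_xml` is my own, and the demo uses a tiny stand-in file rather than the real download.

```python
import gzip
import shutil
import tempfile
from pathlib import Path

def unpack_as_xml(gz_path):
    """Decompress name.obo-xml.gz next to itself as name.xml and return the path."""
    gz_path = Path(gz_path)
    # name.obo-xml.gz -> name.obo-xml -> name.xml
    xml_path = gz_path.with_suffix("").with_suffix(".xml")
    with gzip.open(gz_path, "rb") as src, open(xml_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return xml_path

# Demo on a tiny stand-in for the downloaded file.
workdir = Path(tempfile.mkdtemp())
archive = workdir / "go_daily-termdb.obo-xml.gz"
with gzip.open(archive, "wb") as handle:
    handle.write(b"<obo><term><name>example</name></term></obo>")

xml_file = unpack_as_xml(archive)
print(xml_file.name)
print(xml_file.read_text())
```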
![Page 58: Computational Biology Dr. Jens Allmer Lecture Slides Week 5](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649e7b5503460f94b7c8c6/html5/thumbnails/58.jpg)
End