Chani & Malki present:
Project adviser: Dr. Ron Wides
The OdzFinder
WANTED
Name: Odz
a.k.a: Ten-m
Family: pair-rule gene
Length: 10,000 bp
Getting to Know Odz…
Discovered in D. melanogaster in 1994
Odz protein is expressed in neurons, developing brain and hindgut
Odz protein is expressed during segmentation.
Belongs to the pair-rule gene family
Plays a crucial role in the CNS during fetal development
The Odz Family
Odz gene orthologs have been found in 3 phyla:
Vertebrates: Ten-m1, Ten-m2, Ten-m3, Ten-m4
Arthropods: Ten-a, Ten-m
Nematodes: Ten-m
The Odz Protein
2731 amino acids
The only pair-rule gene that does not encode a transcription factor!
Contains 3 domains:
I. extracellular EGF-like repeats
II. tyrosine kinase phosphorylation sites
III. hydrophobic sequences, probably a transmembrane sequence
[Diagram of the ODZ protein: EGF-like domain, intracellular kinase substrate domain]
EGF-like Repeats
x(4)-C-x(0,48)-C-x(3,12)-C-x(1,70)-C-x(1,6)-C-x(2)-G-a-x(0,21)-G-x(2)-C-x
EGF-like domain:
30-40 amino acid residues
Significant homology to epidermal growth factor (EGF)
Has been found in single or multiple copies in a number of other proteins
Generally found in the extracellular domain of membrane proteins or secreted proteins
Involved in receptor-ligand interactions
Includes 6 conserved cysteine residues involved in disulfide bonds
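The consensus pattern on this slide can be translated into a regular expression for scanning protein sequences. The sketch below is a Python illustration, not the project's Perl code. It assumes `x` means any residue, `x(m,n)` a run of m to n residues, and `a` an aromatic residue (F, W or Y). The names `EGF_LIKE_RE` and `find_egf_like` are hypothetical.

```python
import re

# Regex translation of the slide's EGF-like pattern.
# Assumption: 'a' in the pattern stands for an aromatic residue (F, W or Y).
EGF_LIKE_RE = re.compile(
    r".{4}C.{0,48}C.{3,12}C.{1,70}C.{1,6}C.{2}G[FWY].{0,21}G.{2}C."
)


def find_egf_like(protein):
    """Return (start, end) spans of non-overlapping pattern matches.

    Spans are 0-based and end-exclusive, as returned by re.Match.span().
    """
    return [m.span() for m in EGF_LIKE_RE.finditer(protein.upper())]
```

Because `finditer` reports non-overlapping matches only, tightly packed repeats may be under-counted; this is good enough for flagging a region as EGF-like, not for counting repeats exactly.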
The lab’s goals:
Genomics:
To find a broad family of Odz genes
Phylogenetic trees to discover segmentation mechanism
Massive alignment to find conserved regions
Biological in-vivo experiments to change regions
Proteomics:
The protein’s role
How the protein functions
The protein’s interactions with other proteins (e.g. Notch)
Finding Odz Genes
Sources feeding the Odz Database:
Databases: BLASTing existing databases
EST Libraries: BLASTing new EST libraries
Sequences discovered in the lab: extracting DNA from various innocent creatures
Odz Database
The collected data was organized by Michal Markovitz in a relational database.
The database consists of 10 different tables.
2 problems remained:
1. BLAST results include many non-Odz hits:
• prokaryotic hits
• non-metazoan hits
• EGF region hits
• low similarity
We need a program to automatically extract Odz hits from NCBI Blast results!!!
[Bar chart of BLAST hit categories: low score, prokaryotic, non-metazoan, Odz, EGF-like; vertical axis 0 to 80]
2. Every day…
• New sequences are added to the existing databases
• New EST libraries are released
The OdzFinder
A Perl program that will automatically extract Odz hits from NCBI BLAST results.
S.O.F.T - Screen Odz Flow Template
Input: Blast Report, Tax Report
Prokaryote? no -> Metazoan? yes -> EGF?
No EGF -> Evalue<y? yes -> Score>x? yes -> Odz
All EGF -> Look up table -> Score>x? yes -> Evalue<y? yes -> Odz
Mixed EGF -> Combination
Odz -> Update Database
Input: Blast Report
BLASTs are performed on the Odz orthologs.
The results are sent to the OdzFinder program to be filtered.
The program extracts relevant information from each hit:
>gi|163076235|gb|AC765764.7 Apis mellifera BAC clone RP11-18D7 , complete sequence
Length = 184032
Score = 153 bits (328), Expect = 3e-36
Identities = 59/59 (100%), Positives = 59/59 (100%)
Frame = +3 / +3
Query: 3 IQHKTFKFHGNYIKQRFHPRIYK*RYKYQRFHPRIYK*NLNLYRVCCSHIILECLQTAH 179
IQHKTFKFHGNYIKQRFHPRIYK*RYKYQRFHPRIYK*NLNLYRVCCSHIILECLQTAH
Subjct: 3 IQHKTFKFHGNYIKQRFHPRIYK*RYKYQRFHPRIYK*NLNLYRVCCSHIILECLQTAH 179
Taxonomy Report
Eukaryota .................................. 2502 hits 41 orgs [root; cellular organisms]
. Bilateria ................................ 2421 hits 33 orgs [Fungi/Metazoa group; Metazoa; Eumetazoa]
. . Coelomata .............................. 2396 hits 31 orgs
. . . Deuterostomia ........................ 2322 hits 23 orgs
. . . . Chordata ........................... 2296 hits 22 orgs
. . . . . Euteleostomi ..................... 2236 hits 21 orgs [Craniata; Vertebrata; Gnathostomata; Teleostomi]
. . . . . . Tetrapoda ...................... 2022 hits 14 orgs [Sarcopterygii]
. . . . . . . Amniota ...................... 1908 hits 12 orgs
. . . . . . . . Eutheria ................... 1634 hits 10 orgs [Mammalia; Theria]
Search for eukaryotic and metazoan results.
Build prokaryotic database for possible future use.
Evolutionary distance becomes relevant when dealing with EGF-like repeats.
The program will receive the BLAST hit’s Taxonomy Report and manipulate it into a manageable hash table.
A default Taxonomy Report will be available when BLASTing against ESTs.
Input: Blast Report, Tax Report
root; cellular organisms; Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; Bilateria; Coelomata; Protostomia; Panarthropoda; Arthropoda; Mandibulata; Pancrustacea; Hexapoda; Insecta; Dicondylia; Pterygota; Neoptera; Endopterygota; Hymenoptera; Apocrita; Aculeata; Apoidea; Apidae; Apinae; Apini; Apis
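A lineage like the one above can be turned into a hash table and used for the prokaryote and metazoan checks. The Python sketch below is only an illustration of that idea. The function names, the returned labels, and the use of the taxon names `Bacteria` and `Archaea` to recognize prokaryotes are assumptions.

```python
# Assumption: prokaryotic lineages contain one of these top-level taxon names.
PROKARYOTE_TAXA = {"Bacteria", "Archaea"}


def parse_lineage(text):
    """Split a ';'-separated lineage into an ordered list of taxon names."""
    flat = text.replace("\n", " ")
    return [name.strip() for name in flat.split(";") if name.strip()]


def lineage_table(taxa):
    """Hash table mapping each taxon name to its depth below the root."""
    return {name: depth for depth, name in enumerate(taxa)}


def classify_lineage(text):
    """Return 'prokaryote', 'metazoan' or 'non-metazoan' for one lineage."""
    table = lineage_table(parse_lineage(text))
    if PROKARYOTE_TAXA & table.keys():
        return "prokaryote"
    # Exact name match, so "Fungi/Metazoa group" alone does not count.
    if "Metazoa" in table:
        return "metazoan"
    return "non-metazoan"
```

Matching whole taxon names rather than substrings matters here, because a fungal lineage also passes through "Fungi/Metazoa group".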
Tenascin-m (Odz) includes 8 EGF-like repeats
The conserved EGF region gave problematic results.
Many hits appear only due to their similarity to the EGF region.
[Figure: query and subject aligned within the EGF region, giving a high score. EGF?]
There are three possible positions of the hit relative to the query’s EGF-like region:
I. The hit is completely outside the query’s EGF-region
II. The hit is completely inside the query’s EGF-region
III. The hit is partially in the query’s EGF-region
[Diagrams: query bar with the EGF-like region marked at positions 525 and 804, and the hit drawn outside, inside, or overlapping it; one diagram also carries the label 2750]
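The three positions can be told apart by comparing the hit's coordinates on the query with the boundaries of the EGF-like region. This is a Python sketch under the assumption that the region spans query positions 525 to 804, as the diagram labels suggest. The function name `egf_position` is hypothetical, and the returned labels reuse the flow chart's "No EGF", "All EGF" and "Mixed EGF".

```python
# Assumption: EGF-like region boundaries on the query, read off the diagrams.
EGF_START, EGF_END = 525, 804


def egf_position(hit_start, hit_end, egf_start=EGF_START, egf_end=EGF_END):
    """Classify a hit's query interval relative to the EGF-like region.

    Coordinates are inclusive positions on the query.
    """
    if hit_start > hit_end:
        # Tolerate reversed coordinates (e.g. hits on the minus strand).
        hit_start, hit_end = hit_end, hit_start
    if hit_end < egf_start or hit_start > egf_end:
        return "No EGF"      # completely outside
    if hit_start >= egf_start and hit_end <= egf_end:
        return "All EGF"     # completely inside
    return "Mixed EGF"       # partial overlap
```

For example, a hit covering query positions 600 to 700 is "All EGF", while one covering 500 to 600 is "Mixed EGF".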
Get a better picture…
Score & e-value are examined
Set low thresholds to ensure that very small hits are not missed; sometimes they are translocations
Position I:
The hit is completely outside the query’s EGF-like region
No EGF -> Evalue<y? yes -> Score>x? yes -> Odz
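The score and e-value test for a hit outside the EGF-like region comes down to two comparisons. In this Python sketch `min_score` and `max_evalue` stand for the slide's x and y. The slides do not give their values, so the numbers in the usage note are placeholders.

```python
def passes_thresholds(score_bits, evalue, min_score, max_evalue):
    """True when the hit scores above x and its E-value is below y."""
    return score_bits > min_score and evalue < max_evalue
```

With placeholder thresholds of 40 bits and 1e-5, the Apis mellifera hit (153 bits, 3e-36) would be accepted: `passes_thresholds(153, 3e-36, 40, 1e-5)` is true.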
Position II:
The hit is completely inside the query’s EGF-like region
In order to prevent acceptance of non-Odz hits with high scores due to their EGF region, a look up table was established:
Evolutionarily close query & subject: high identity % demanded
Evolutionarily distant query & subject: low identity % demanded
Look up table example:
Query           Hit                        Odz Ortholog   Odz Paralog
Mus musculus    Homo sapiens               95%            70%
Mus musculus    Drosophila melanogaster    75%            55%
All EGF -> Look up table ? -> Score>x? yes -> Evalue<y? yes -> Odz
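The look up table can be pictured as a hash keyed by the query and hit species. The Python sketch below holds only the two example rows from the slide. How the two columns are applied is an assumption: a hit at or above the ortholog percentage is called an ortholog, one at or above the paralog percentage a paralog, and anything lower, or any species pair missing from the table, is rejected. `LOOKUP` and `classify_all_egf_hit` are hypothetical names.

```python
# Demanded identity percentages, taken from the slide's two example rows.
LOOKUP = {
    ("Mus musculus", "Homo sapiens"): {"ortholog": 95.0, "paralog": 70.0},
    ("Mus musculus", "Drosophila melanogaster"): {"ortholog": 75.0, "paralog": 55.0},
}


def classify_all_egf_hit(query_species, hit_species, identity_pct):
    """Return 'ortholog', 'paralog' or None for a hit inside the EGF region."""
    row = LOOKUP.get((query_species, hit_species))
    if row is None:
        return None  # assumption: unknown species pairs are not accepted
    if identity_pct >= row["ortholog"]:
        return "ortholog"
    if identity_pct >= row["paralog"]:
        return "paralog"
    return None
```

A mouse query against a human hit at 80% identity would be called a paralog here, while the same 80% against a Drosophila hit clears the lower ortholog bar.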
Position III :
The hit is partially inside the query’s EGF-like region
2 Possibilities:
A. False call ! An EGF hit with insignificant similarity outside of EGF-domains.
B. The Real Thing ! EGF with adjacent regions of significant similarity.
Mixed EGF -> Is it more like A or like B?
A -> Treat like II
B -> Treat like I
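The slides do not say how "more like A or like B" is decided. One possible rule, shown here purely as an assumption, is to measure how much of the hit lies outside the EGF-like region: a short overhang is handled like a hit fully inside the region (case A), a long one like a hit fully outside it (case B). The function name `resolve_mixed`, the 525 to 804 boundaries and the `min_flank` cutoff are all hypothetical.

```python
def resolve_mixed(hit_start, hit_end, egf_start=525, egf_end=804, min_flank=50):
    """Decide how to treat a hit that partially overlaps the EGF-like region.

    Returns "All EGF" (case A, handle like a hit inside the region) or
    "No EGF" (case B, handle like a hit outside the region).
    Assumption: overhang length stands in for "significant similarity".
    """
    # Number of aligned query positions on each side of the EGF region.
    left_flank = max(0, egf_start - hit_start)
    right_flank = max(0, hit_end - egf_end)
    if left_flank + right_flank >= min_flank:
        return "No EGF"
    return "All EGF"
```

A stricter variant could score the flanking part of the alignment separately instead of just counting positions.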
Update Database: DBI
A database interface module for Perl
Enables Perl applications to access multiple database types
Provides a consistent database interface independent of the actual database being used
Data flow through DBI: Perl Script -> DBI -> DBD::mysql -> MySQL RDBMS
gi          score   species
49256537    140     Xenopus
48096180    637     Apis mellifera
45382362    619     Gallus gallus
42658224    125     Homo sapiens
34932761    384     Rattus norvegicus
38087011    463     Mus musculus
45446084    419     Drosophila melanogaster
32565715    1604    Caenorhabditis elegans
41469033    760     Gasterosteus aculeatus
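Storing accepted hits with their gi, score and species could look like the sketch below. It uses Python's built-in `sqlite3` module as a stand-in for the Perl DBI and MySQL setup described in the slides. The table name `odz_hits`, its columns, and the rule that a repeated gi overwrites the earlier row are assumptions.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the hits database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS odz_hits ("
        " gi TEXT PRIMARY KEY,"
        " score REAL NOT NULL,"
        " species TEXT NOT NULL)"
    )
    return conn


def store_hits(conn, hits):
    """Insert or update hits keyed by gi; return the number of rows stored."""
    rows = [(h["gi"], h["score"], h["species"]) for h in hits]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO odz_hits (gi, score, species) VALUES (?, ?, ?)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM odz_hits").fetchone()[0]
```

Keying on the gi means that re-running the filter on fresh BLAST results updates existing rows instead of duplicating them, which suits a database that is refreshed as new sequences appear.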
Results!
[Two charts of hit categories: EGF, Odz, not Metazoa, Prokaryotic]
Special thanks to our project adviser
Dr. Ron Wides
For his guidance, patience & Krispy Kreme donuts