TRANSCRIPT
Arne Elofsson ([email protected])
EMBRACE: workshop on protein bioinformatics
Welcome EMBRACE
The new type of bioinformatics Web-services
Membrane protein bioinformatics Databases Structure prediction.
Topology 2.5D 3D
EMBRACE
EMBRACE is an EC-funded Network of Excellence with 18 partners, developing an integrated set of services for the major bioinformatics data resources and analysis tools.
The EMBRACE name was selected after two previous names were rejected. It stands for "European Model for Bioinformatics Research And Community Education" and has no connection with EMBL.
EMBRACE
Network of Excellence - 18 partners with data resources, analysis tools, expertise in grid technology and experimental biologists.
Graham Cameron, Peter Rice, Alan Bleasby - EBI, Cambridge, GB
Toby Gibson - EMBL, Heidelberg, DE
Andreas Gisel - Institute of Biomedical Technologies, Section Bari, CNR, IT
Teresa Attwood - University of Manchester, GB
Marco Pagni - Swiss Institute of Bioinformatics, CH
Erik Bongcam-Rudloff - LCB/BMC, Uppsala, SE
Vincent Breton - CNRS, Clermont Ferrand, FR
Søren Brunak - CBS, Lyngby, DK
José-María Carazo - CNB, Madrid, ES
Arne Elofsson - DBB, Stockholm, SE
Daniel Kahn - INRA/CNRS, Toulouse, FR
Ralf Herwig - MPI für Molekulare Genetik, Berlin, DE
Eija Korpelainen - CSC, Espoo, FI
Christine Orengo - University College London, GB
Yitzhak Pilpel - Weizmann Institute of Science, IL
Gert Vriend - CMBI, Nijmegen, NL
Alfonso Valencia - INTA-CAB, Madrid, ES
Christian Bryne - University of Bergen, NO
EMBRACE Overview
Nowadays biology often involves complex queries to many databases.
This kind of programming is hard to do.
EMBRACE aims to make it easier, and within the reach of experimental biologists.
To do this, we need an interoperable set of services and clients that can both find and make use of them.
EMBRACE aims to enable ...
•a scientist to invoke the latest and best version of a given program without any concern for its physical location
•the program to find the most up-to-date data without help from the user
•workflows to automatically take advantage of whatever compute power is available
•workflows to deliver results in a way which any user can understand
•the scientist to follow connections to other relevant data and tools using all the straightforward idioms of web browsing and hyperlinks.
[Figure labels: Application user interface; Application interface]
EMBRACE: Interconnectivity
EMBRACE: Approaches
•Defining an application interface
•Design from the view of the user/application
•Browser example
  •User provides a query and a data type
  •Generate a list of results by data resource
  •Expand and browse the list, following links
  •Select some or all as input to analysis tools
  •Requires human-readable definitions
•Automation
  •A similar example, but with a program selecting and launching the analysis
  •Requires machine-readable definitions
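The automation case above depends on services describing themselves in a form a program can search. The sketch below illustrates that idea only; it is not EMBRACE code. The `ServiceDescription` schema, the `Registry` class, and the service and data type names are all hypothetical, chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescription:
    """Machine-readable description of one service (hypothetical schema)."""
    name: str
    input_type: str   # data type the service accepts
    output_type: str  # data type the service returns


class Registry:
    """In-memory stand-in for a catalogue of services."""

    def __init__(self):
        self._services = []

    def register(self, service):
        self._services.append(service)

    def find_by_input(self, data_type):
        """Return every registered service that accepts `data_type`."""
        return [s for s in self._services if s.input_type == data_type]

    def plan(self, start_type, goal_type):
        """Breadth-first search for a chain of services from start to goal type."""
        queue = [(start_type, [])]
        seen = {start_type}
        while queue:
            current, path = queue.pop(0)
            if current == goal_type:
                return [s.name for s in path]
            for s in self.find_by_input(current):
                if s.output_type not in seen:
                    seen.add(s.output_type)
                    queue.append((s.output_type, path + [s]))
        return None  # no chain of services connects the two types


# Hypothetical services, named only for illustration.
registry = Registry()
registry.register(ServiceDescription("homology_search", "protein_sequence", "alignment"))
registry.register(ServiceDescription("topology_prediction", "protein_sequence", "topology"))
registry.register(ServiceDescription("phylogeny", "alignment", "tree"))
```

`find_by_input` corresponds to the browser case, where a person picks from the list; `plan` corresponds to the automated case, where a program chains services by matching declared data types.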
EMBRACE Data Content
DNA sequence information
Protein sequence information
Genome annotation
Macromolecular Structure Data
Expression information
Literature
Orthologs
Untranslated regions
Protein Families
Alignments
Protein/protein-associations
Structural domains
Gene3D
ORFandDB
SNPs in regulatory regions
3D Electron Microscopy data
EMBRACE Analysis Tools
EMBOSS
DNA sequence analysis
Protein sequence analysis
Pattern matching
Genome annotation
Expert systems
Hidden Markov Models
Homology searches
Phylogenetic analysis
Protein structure analysis
Protein structure comparison
Protein domain mapping
Microarrays and gene expression
Bioinformatics workflows
Bioinformatics tool environments
Protein structure prediction
Electron microscopy
Electron microscope tomography
Systems biology modelling
Text mining
EMBRACE: Technology Choice
•Promised deliverable is a survey of webservice and grid technologies
•Will be made publicly available
•To cover:
  •European Grids and Bioinformatics (EGEE etc.)
  •Webservice standards
  •Grid service standards
  •Current standards
  •Emerging standards
  •Recommendations on technology adoption
  •Recommendations on further technology watch
•Technology test cases
  •Designed to demonstrate technology
  •Designed to show improvements in technology
  •Designed to highlight problems
GUI Interfaces: Taverna
Acknowledgements
(HGMP/RFCGR): Gary Williams, Tim Carver, Hugh Morgan, Claude Beesley, Damian Counsell, Val Curwen, Mark Faller, Sinead O’Leary, Thon deBoer, Martin Bishop
LION: (Thomas Laurent), (Bijay Jassal), Thure Etzold
Sanger: (Ian Longden), (Richard Bruskiewich), Simon Kelley, (Ewan Birney)
EBI: Peter Rice, Alan Bleasby, Jon Ison, Lisa Mullan, (Martin Senger), Tom Oinn, Rodrigo Lopez, Mahmut Uludag, Shaun McGlinchey
EMBnet: UK, Norway, Italy, Germany, Belgium, Argentina, China, Turkey, Israel, Canada, Manchester
Others: Don Gilbert, Will Gilbert, Rodger Staden, Bill Pearson, Catherine Letondal, Luke McCarthy, Susan Jean Johns, David Bauer, Andrew Lyall, Henrikki Almusa, Melody Clark, ....
Membrane protein Bioinformatics
Why membrane proteins
Why structure
What type of structure
Brief history
Current status
Why membrane proteins
Membrane proteins
  Two types
  Alpha helical membrane proteins
    20% of the proteome
    1% of PDB structures
    Important drug targets (50%)
Why Structure
  Next challenge for structural bioinformatics?
  Easier than 3D predictions of globular proteins?
The traditional view of TM protein structure
Membrane proteins are “simple”
  Straight alpha-helices of length 21
  Two-dimensional organization
Secondary structure (i.e. topology) prediction is an important step towards 3D prediction.
Current status of topology predictions.
How many structures are available, and how many will be available soon?
The number of membrane protein structures is growing exponentially
White SH, Prot Sci 13, 2004
History of membrane protein structure predictions
Secondary structure predictions, i.e. topology
<1990 Hydrophobicity plots
1992 Positive inside rule (Toppred)
1994 Memsat
1998 HMMs (TMHMM and HMMTOP)
2004 MSA+HMMs (prodiv-TMHMM)
2007 Combination of ANNs and models
Today: 80% accurate topology predictions
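The two earliest ideas in this history, hydrophobicity plots and the positive inside rule, are simple enough to sketch in a few lines. The code below is a minimal illustration and not the algorithm of Toppred or any other named program. The slides do not say which hydropathy scale was used; the Kyte-Doolittle scale, the window of 19 residues, the threshold of 1.6, and all function names are assumptions made for the example.

```python
# Kyte-Doolittle hydropathy scale (one common choice; the slides do not name a scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def candidate_helices(seq, window=19, threshold=1.6):
    """Merge overlapping windows above the threshold into (start, end) segments."""
    segments = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score < threshold:
            continue
        start, end = i, i + window  # end is exclusive
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


def orient_by_positive_inside(seq, helices):
    """Choose the orientation that places the K/R-rich loops inside."""
    bounds = [0] + [x for seg in helices for x in seg] + [len(seq)]
    # Loops alternate sides of the membrane; even-numbered loops share
    # the side of the N-terminus.
    loops = [seq[bounds[i]:bounds[i + 1]] for i in range(0, len(bounds), 2)]
    charge = [loop.count("K") + loop.count("R") for loop in loops]
    even, odd = sum(charge[0::2]), sum(charge[1::2])
    return "N-in" if even >= odd else "N-out"
```

Because overlapping windows are merged, the reported segments extend somewhat beyond the hydrophobic core. Real predictors refine the helix ends and use more careful scoring, so this should be read as an illustration of the principle only.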
3D predictions
Many studies on two systems
  Bacteriorhodopsin
    Inside out protein
    Simple packing
  Glycophorin A
    Helix-helix packing
    GxxxG motif
Recently prediction efforts on other proteins
  TASSER
  FILM
  Rosetta
Recent membrane proteins have high structural complexity
[Timeline figure. Labels: First structure of a membrane protein; TOPPRED; HMM based methods; Aquaporin structure; Glutamate transporter structure; ZPRED; Lasso; TOPMOD. Years marked: 1975, 1992, 1997, 1998, 2004, 2006]
Interface helices (Granseth, JMB 2005)
  78 interface helices
  ~50% of chains contain an interface helix
  Average length is 9 aa
  Longest is 19 aa
  Most frequent in photosynthetic reaction center
Reentrant regions (Viklund, JMB 2006)
  36 reentrant helices
  20 in new classification
  24% contain reentrant regions
  72% on the outside
  Length 3-32 residues
  Loops 11-117 residues
Results – predicting reentrant regions in complete genomes
Genome               Reentrant fraction + Proteins (digits fused in source)   Reentrants out   Reentrants in
Observed in dataset  0.24079                                                  0.72             0.28
E. coli              0.167773                                                 0.48             0.52
S. cerevisiae        0.10757                                                  0.60             0.40
H. sapiens           0.154181                                                 0.46             0.54

Class                  Fraction
Channels               0.31
Active transporters    0.22
Electron transporters  0.11
Signal receptors       0.07
The not so simple TM proteins
Membrane protein structures are complex
  TM helices end at different locations
  Different angles
  Neighbouring helices often interact
  Interface helices
  Reentrant regions
No sheets close to the membrane
More complex structures need new prediction methods
[Figure: membrane topology diagram. Labels: Nout, Cin, N, C, cytoplasm, periplasm, membrane]
The Z-coordinate
Our Z-coordinate is the distance between a residue and the membrane center
[Figure: Z axis through the membrane, marked at -15, 0 and 15, with periplasm and cytoplasm on opposite sides]
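Given a structure with a known membrane position, the Z-coordinate defined above can be computed as a projection onto the membrane normal. The sketch below assumes the membrane is a flat slab described by a centre point and a normal vector. The function names are hypothetical, the default `half_thickness` of 15 is taken from the axis marks in the figure, and the unit (presumably Å) and the side that counts as positive are not stated in the slides.

```python
import math


def z_coordinate(atom, center, normal):
    """Signed distance from the membrane mid-plane along the membrane normal."""
    length = math.sqrt(sum(n * n for n in normal))
    if length == 0:
        raise ValueError("membrane normal must be non-zero")
    # Project the vector from the membrane centre to the atom onto the unit normal.
    return sum((a - c) * n for a, c, n in zip(atom, center, normal)) / length


def in_membrane(z, half_thickness=15.0):
    """True when the residue lies within the assumed slab around Z = 0."""
    return abs(z) <= half_thickness
```

Taking `abs(z)` gives the unsigned distance to the membrane centre, which is the quantity in the definition above; the sign only records which side of the mid-plane the residue is on.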
Final comments
We will all go together to the boat after the last lecture today.
Please fill out the form at the end of the program (or in front). EU wants it.
Schedule
Wed Aug 22
• 10:00 Registration
• 11:00 Lunch at Restaurang Lantis (see map below)
• 12:15 Arne Elofsson, SU "Welcome and Embrace"
• 12:45 Per Larsson, SU "Using Taverna to Access SOAP based Web Services"
• 13:15 [...], "Genome-scale annotation of membrane proteins and more.."
• 14:30 David Jones, UK "Memsat-3"
• 14:45 Coffee
• 15:15 [...], "Z-coordinates of alpha-helical membrane proteins: prediction and applications"
• 16:00 Håkan Viklund, SU "Prediction of Reentrant regions and topology in TM proteins"
• 18:00-22:00 Dinner and Boat Cruise with [...] / [...]. Departure from Nybrokajen 6 (see No 4 at [...])
Thu Aug 23
• 09:30 Andreas Bernsel, SU "An experimental TM-scale"
• 10:15 Gunnar von Heijne, SU "Large Scale topology mapping and evolution"
• 11:00 Coffee
• 11:30 Anna Johansson, SU "Simulations of membranes and membrane proteins"
• 12:15 "Structural genomics of membrane proteins: fold space and target selection"
• 13:00 Lunch at Restaurang Lantis (see map below)
• 14:30 [...], "The TM project"
• 15:15 Patrick Barth, University of Washington "Toward high-resolution prediction and design of membrane protein structures using ROSETTA"
• 16:00 Coffee
• 16:30 Björn Wallner, University of Washington "Improved coarse grained modelling in ROS..."
• 17:15 Arne Elofsson, "Evolution of membrane protein structure"
• 17:45 Per Larsson, SU "Demo: Using webservices for membrane protein predictions"
• 19:30 Dinner for speakers