Nature Reviews Drug Discovery | Volume 7 | October 2008 | 789
News & analysis

Wellcome boost for open-access chemistry
Philanthropic acquisition gives the academic chemogenomics community valuable access to well-curated proprietary data.

Sarah Houlton, London, UK

The Wellcome Trust has recently awarded a 5-year, UK£4.7 million grant to transfer well-structured chemogenomics data from the publicly listed company Galapagos to the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI). The data will be incorporated into the Institute's collection of open-access data resources for biomedical research, and maintained by a team that is now being recruited.

EMBL-EBI, based in Hinxton, Cambridge, UK, had already identified the strategic need for a chemogenomics data resource to help translate insights from the Human Genome Project into medical advances. Janet Thornton, Director of EMBL-EBI, says: "Chemogenomics data are an essential component in future drug discovery efforts, but the value of this is only practically realized when such data are effectively integrated against genome databases and functional-genomics data."

Public databases of chemogenomics data have been established in recent years, the largest of which is PubChem (http://pubchem.ncbi.nlm.nih.gov), hosted by the US National Institutes of Health. However, lack of curation of publicly deposited data is a significant limitation to its utility (Nature Rev. Drug Discov. 7, 632–633; 2008). Also, the nature of the data, particularly data that could be valuable for drug discovery efforts, is not yet comparable with that available in typical pharmaceutical company databases (Nature Rev. Drug Discov. 5, 707–708; 2006).

The Wellcome Trust's acquisition of Galapagos's data is set to change this. "I think there will be a number of immediate wins for the biological community, such as gaining rapid access to defined lists of compounds that are likely to modulate specific genes, gene families or pathways," says John Overington, Senior Director of Discovery Informatics at Galapagos, who was closely involved in negotiating the transfer.

The Galapagos databases started life at Inpharmatica, an informatics spin-out company of University College London, UK, which was acquired by Galapagos in 2006. Inpharmatica began in-house drug discovery projects in 2000 to exploit its technology platform by identifying good targets. "We had a good idea of what made a druggable gene, and started to link this to chemical databases," says Overington. "Nothing was already available that allowed us to make this leap between a protein sequence and a small molecule, peptide or protein therapeutic, which had some defined effect on a target."

So they developed a large-scale structure–activity relationship (SAR) database, StARlite (SARs in the literature), which links targets, functional assay results, and absorption, distribution, metabolism, elimination and toxicity (ADMET) properties to compound structure and to target sequence and structure. It currently contains 450,000 distinct compounds (35% of which meet all of the rule-of-five criteria for oral drug bioavailability), 2 million bioactivities and nearly 4,000 molecular targets.

"A key feature of our database is that we record the biological effects of making changes to a molecule's structure. If it is made hydrophobic or bigger, does it affect, say, cell penetration or activity?" Overington explains. "These differences are often key to converting an in vitro tool compound into a useful drug."

Another major informatics challenge is the difficulty of searching across public and proprietary databases because of their different data structures. An integration layer, SARfari, was therefore developed by Inpharmatica to enable companies to incorporate their in-house data into the database. "First we developed a G-protein-coupled receptor platform and then a kinase system," says Overington. Future plans include another SARfari for antibacterials, to support integrated chemistry-led and biology-led target selection.

In addition, a related database called CandiStore contains the structures, targets and latest development stage of clinical development candidates, including both small molecules and other classes of therapeutics. The aim is to track drug failures and understand why they failed. "It's a work in progress; some areas are well-populated and others will be built using the grant," says Overington.
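
For readers less familiar with the rule of five cited in the StARlite statistics: Lipinski's criteria flag a compound whose molecular weight exceeds 500 Da, whose calculated logP exceeds 5, or which has more than 5 hydrogen-bond donors or more than 10 hydrogen-bond acceptors. The Python sketch below shows one way a share like the 35% figure could be tallied, assuming the descriptors have already been computed by a cheminformatics toolkit. The Compound record, the function names and both descriptor sets are hypothetical illustrations, not StARlite's actual schema or data. Because the rule is commonly applied as allowing at most one violation, the sketch counts violations rather than returning a single yes/no, so either the strict reading used here (no violations) or the looser one can be applied.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    """Hypothetical record of precomputed descriptors for one compound."""
    name: str
    mol_weight: float  # molecular weight in daltons
    clogp: float       # calculated octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donors (OH and NH groups)
    h_acceptors: int   # hydrogen-bond acceptors (N and O atoms)


def rule_of_five_violations(c: Compound) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    return sum([
        c.mol_weight > 500,
        c.clogp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def meets_all_criteria(c: Compound) -> bool:
    """True when no threshold is exceeded (the strict reading)."""
    return rule_of_five_violations(c) == 0


def fraction_passing(compounds: list[Compound]) -> float:
    """Share of compounds meeting all criteria; 0.0 for an empty list."""
    if not compounds:
        return 0.0
    return sum(meets_all_criteria(c) for c in compounds) / len(compounds)


# Hypothetical descriptor values, for illustration only.
library = [
    Compound("hypothetical-1", 320.4, 2.1, 2, 5),
    Compound("hypothetical-2", 612.7, 6.3, 4, 11),
]
print(fraction_passing(library))  # 0.5
```

Here the second hypothetical compound exceeds three thresholds (molecular weight, logP and acceptors), so only half of the toy library passes the strict test.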

In the long term, Overington explains, they may introduce a deposition mechanism for new data, but they do not want to replicate PubChem, as one of the strengths of StARlite is the curated, consistent nature of its data. "This is always going to be a drawback with large repository-like databases such as PubChem," explains PubChem's Director, Steve Bryant. "Our informatics challenge is how to compare chemicals effectively when different chemists draw them differently," he says. "There are also biological challenges: how to describe what was done. We've asked depositors to provide a bottom-line summary: what are the true positives, the most active chemicals in their primary screens, and so on. These are very informative to non-experts when we have them."

"This consistency of data is the big difference between the new EMBL-EBI database and the publicly accessible databases," says Andrew Hopkins, professor of medicinal informatics at the University of Dundee, UK. "It's been normalized for searching and designed to be mined," he says. "Having to download everything and reformat it before it can be searched effectively puts people off, and there's no guarantee that the search would succeed."

Paul Clemons, Director of Computational Chemical Biology Research at the Broad Institute, which hosts ChemBank (http://chembank.broad.harvard.edu), another large curated and freely available database of small molecules and biological screens, thinks that public access to well-curated chemogenomics databases is crucial. "They let creative academics have access to the sort of data that each pharma company has separately had for many years," he says. "We occasionally hear criticism from industry colleagues that some of the data-mining activities we're doing have been done before. But we're only now getting access to these data sets, so I'm sure academics have sometimes redeveloped analysis methods that were developed secretly in pharma before."

As well as the new opportunities for academic groups to develop cheminformatics approaches, there is growing excitement at the prospect of free access to the EMBL-EBI data for public drug discovery projects in areas such as neglected diseases. "I want to be one of the first adopters; there's a wealth of ideas we want to try," says Hopkins. "A good example is that we can now improve upon the original druggability analysis of neglected disease pathogens [see http://tdrtargets.org] by linking predicted druggability to sets of chemical tools for screening. In the previous iteration, the chemistries couldn't be disclosed." Another example, he continues, is that "we can use this large public data set to build large-scale virtual assay banks using machine learning processes that learn from the underlying data to predict new biological activities of compounds."

Clemons also highlights the value of chemical biology in general. "I think that the biggest benefit in the end will be the ability to make new and more specific tool compounds for cell biological research that leverage the information about what's been made before, and what it did when it was exposed to biological assays."

It is anticipated that scientists will be able to start using the database in early 2009. The funding will first ensure that the data are available as a complete downloadable database for local installation. User-friendly web-based front-ends and programmable web services are expected to follow.

After 5 years, EMBL-EBI will need to find further funding for maintenance and curation of the database. "The future depends on how it develops," says Thornton. "In the longer term, we are hoping it will be part of Elixir, the large infrastructure project for biological data, but that is a long way off. EMBL-EBI was very keen to acquire these data and there is a lot of scope in the future for developing resources around them."