Advanced NCBI. The Entrez API
https://github.com/lindenb/courses
Pierre Lindenbaum @yokofakun
[email protected] http://plindenbaum.blogspot.com
Institut du Thorax. Nantes. France
September 27, 2016
NCBI? What about EBI, Ensembl, ...
What will be covered today?
File formats...
EInfo, GQuery, ESearch, ESummary, EFetch...
Processing XML answers with XSLT: HTML, SVG, R...
Generating a Java parser for dbSNP.
NCBI EBot
Using standalone BLAST
CURL
    curl "http://en.wikipedia.org/wiki/Main_page"
    wget -O - "http://en.wikipedia.org/wiki/Main_page"
XML
XSLT
XSLTPROC
    xsltproc stylesheet.xsl file.xml > result.xml
JSON
Formats
Formats: GenBank
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=25&rettype=gb
    LOCUS       X53813                   422 bp    DNA     linear   MAM 22-JUN-1992
    DEFINITION  Blue Whale heavy satellite DNA.
    ACCESSION   X53813 X17460
    VERSION     X53813.1  GI:25
    KEYWORDS    satellite DNA.
    SOURCE      Balaenoptera musculus (Blue whale)
      ORGANISM  Balaenoptera musculus
                Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
                Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea;
                Mysticeti; Balaenopteridae; Balaenoptera.
    REFERENCE   1  (bases 1 to 422)
      AUTHORS   Arnason,U. and Widegren,B.
      TITLE     Composition and chromosomal localization of cetacean highly
                repetitive DNA with special reference to the blue whale,
                Balaenoptera musculus
      JOURNAL   Chromosoma 98 (5), 323-329 (1989)
       PUBMED   2612291
    COMMENT     See also <X52700-2> for 1,760 bp common cetacean component clones
                and <X52703-6>,<X53811-4> for the 422 bp heavy satellite clones.
    FEATURES             Location/Qualifiers
         source          1..422
                         /organism="Balaenoptera musculus"
                         /mol_type="genomic DNA"
                         /db_xref="taxon:9771"
                         /clone="7"
         misc_feature    1..422
                         /note="heavy satellite DNA"
    ORIGIN
            1 tagttattca acctatccca ctctctagat accccttagc acgtaaagga atattatttg
           61 ggggtccagc catggagaat agtttagaca ctaggatgag ataaggaaca cacccattct
          121 aaagaaatca cattaggatt ctctttttaa gctgttcctt aaaacactag agtcttagaa
          181 atctattgga ggcagaagca gtcaagggta gcctagggtt agggttaggc ttagggttag
          241 ggttagggta cggcttaggg tactgtttcg gggaggggtt caggtacggc gtagggtatg
          301 ggttagggtt agggttaggg ttagtgttag ggttagggct cggtttaggg tacgggttag
          361 gattagggta cgtgttaggg ttagggtagg gcttagggtt agggtacgtg ttagggttag
          421 gg
    //
Formats: ASN.1
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=25
    Seq-entry ::= seq {
      id {
        embl {
          accession "X53813",
          version 1 },
        gi 25 },
      descr {
        title "Blue Whale heavy satellite DNA",
        source {
          org {
            taxname "Balaenoptera musculus",
            common "Blue whale",
            db {
              {
                db "taxon",
                tag
                  id 9771 } },
            orgname {
              name
                binomial {
                  genus "Balaenoptera",
                  species "musculus" },
              lineage "Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
     Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea;
     Mysticeti; Balaenopteridae; Balaenoptera",
              gcode 1,
              mgcode 2,
              div "MAM" } },
          subtype {
            {
Formats: ASN.1 (schema)
http://www.ncbi.nlm.nih.gov/data_specs/asn/insdseq.asn
    INSDSeq ::= SEQUENCE {
        locus VisibleString ,
        length INTEGER ,
        strandedness VisibleString OPTIONAL ,
        moltype VisibleString ,
        topology VisibleString OPTIONAL ,
        division VisibleString ,
        update-date VisibleString ,
        create-date VisibleString OPTIONAL ,
        update-release VisibleString OPTIONAL ,
        create-release VisibleString OPTIONAL ,
        definition VisibleString ,
        primary-accession VisibleString OPTIONAL ,
        entry-version VisibleString OPTIONAL ,
        accession-version VisibleString OPTIONAL ,
        other-seqids SEQUENCE OF INSDSeqid OPTIONAL ,
        secondary-accessions SEQUENCE OF INSDSecondary-accn OPTIONAL,
        project VisibleString OPTIONAL ,
        keywords SEQUENCE OF INSDKeyword OPTIONAL ,
        segment VisibleString OPTIONAL ,
        source VisibleString OPTIONAL ,
        organism VisibleString OPTIONAL ,
        taxonomy VisibleString OPTIONAL ,
        references SEQUENCE OF INSDReference OPTIONAL ,
        comment VisibleString OPTIONAL ,
        comment-set SEQUENCE OF INSDComment OPTIONAL ,
        struc-comments SEQUENCE OF INSDStrucComment OPTIONAL ,
        primary VisibleString OPTIONAL ,
        source-db VisibleString OPTIONAL ,
        database-reference VisibleString OPTIONAL ,
        feature-table SEQUENCE OF INSDFeature OPTIONAL ,
        feature-set SEQUENCE OF INSDFeatureSet OPTIONAL ,
        sequence VisibleString OPTIONAL , -- Optional for contig, wgs, etc.
        contig VisibleString OPTIONAL ,
        alt-seq SEQUENCE OF INSDAltSeqData OPTIONAL
    }
Formats: ASN.1 (tools)
DATATOOL
Generate C++ data storage classes based on ASN.1 serialization streams.
Convert data between ASN.1, XML and JSON formats.
Formats: XML
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=25&retmode=xml
    <?xml version="1.0"?>
    <!DOCTYPE GBSet PUBLIC "-//NCBI//NCBI GBSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.dtd">
    <GBSet>
      <GBSeq>
        <GBSeq_locus>X53813</GBSeq_locus>
        <GBSeq_length>422</GBSeq_length>
        <GBSeq_strandedness>double</GBSeq_strandedness>
        <GBSeq_moltype>DNA</GBSeq_moltype>
        <GBSeq_topology>linear</GBSeq_topology>
        <GBSeq_division>MAM</GBSeq_division>
        <GBSeq_update-date>22-JUN-1992</GBSeq_update-date>
        <GBSeq_create-date>13-JUL-1990</GBSeq_create-date>
        <GBSeq_definition>Blue Whale heavy satellite DNA</GBSeq_definition>
        <GBSeq_primary-accession>X53813</GBSeq_primary-accession>
        <GBSeq_accession-version>X53813.1</GBSeq_accession-version>
        <GBSeq_other-seqids>
          <GBSeqid>emb|X53813.1|</GBSeqid>
          <GBSeqid>gi|25</GBSeqid>
        </GBSeq_other-seqids>
        <GBSeq_secondary-accessions>
          <GBSecondary-accn>X17460</GBSecondary-accn>
        </GBSeq_secondary-accessions>
        <GBSeq_keywords>
          <GBKeyword>satellite DNA</GBKeyword>
        </GBSeq_keywords>
        <GBSeq_source>Balaenoptera musculus (Blue whale)</GBSeq_source>
        <GBSeq_organism>Balaenoptera musculus</GBSeq_organism>
        <GBSeq_taxonomy>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea; Mysticeti; Balaenopteridae; Balaenoptera</GBSeq_taxonomy>
        <GBSeq_references>
Formats: XML (DTD)
http://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.mod.dtd
    <!ELEMENT GBSeq (
        GBSeq_locus,
        GBSeq_length,
        GBSeq_strandedness?,
        GBSeq_moltype,
        GBSeq_topology?,
        GBSeq_division,
        GBSeq_update-date,
        GBSeq_create-date?,
        GBSeq_update-release?,
        GBSeq_create-release?,
        GBSeq_definition,
        GBSeq_primary-accession?,
        GBSeq_entry-version?,
        GBSeq_accession-version?,
        GBSeq_other-seqids?,
        GBSeq_secondary-accessions?,
        GBSeq_project?,
        GBSeq_keywords?,
        GBSeq_segment?,
        GBSeq_source?,
        GBSeq_organism?,
        GBSeq_taxonomy?,
        GBSeq_references?,
        GBSeq_comment?,
        GBSeq_comment-set?,
        GBSeq_struc-comments?,
        (...)
E-Utilities
GI
http://www.ncbi.nlm.nih.gov/news/03-02-2016-phase-out-of-GI-numbers/ : "NCBI is phasing out sequence GIs - use Accession.Version instead!"
E-Utilities
A set of seven server-side programs that provide a stable interface to the search, retrieval, and linking functions of the Entrez system, using a fixed URL syntax.
The output provided by the E-Utilities is in XML format, sometimes JSON, (...)
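The fixed URL syntax can be illustrated with a short Python sketch. The helper name `eutils_url` is hypothetical and is not part of any NCBI library; only the base path and the parameter names (`db`, `id`, `retmode`) come from the URLs shown in these slides.

```python
from urllib.parse import urlencode

# Base path shared by the E-Utilities URLs shown in these slides.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_url(program, **params):
    """Build an E-Utilities URL (hypothetical helper, for illustration only)."""
    url = f"{EUTILS_BASE}/{program}.fcgi"
    if params:
        # Parameters are appended as a standard query string.
        url += "?" + urlencode(params)
    return url


print(eutils_url("einfo"))
print(eutils_url("efetch", db="nucleotide", id=25, retmode="xml"))
```

Each program (einfo, esearch, esummary, efetch...) differs only by its name and its query parameters, which is why a single helper is enough here.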
Entrez Direct
http://www.ncbi.nlm.nih.gov/books/NBK179288/
"Entrez Direct (EDirect) is an advanced method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window.
Functions take search terms from command-line arguments.
Individual operations are combined to build multi-step queries.
Record retrieval and formatting normally complete the process."
EInfo
EInfo
Provides a list of the names of all valid Entrez databases.
Provides statistics for a single database, including lists of indexing fields and available link names.
EInfo
Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi
EInfo: XML output
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi
    <eInfoResult>
      <DbList>
        <DbName>pubmed</DbName>
        <DbName>protein</DbName>
        <DbName>nuccore</DbName>
        <DbName>nucleotide</DbName>
        <DbName>nucgss</DbName>
        <DbName>nucest</DbName>
        <DbName>structure</DbName>
        <DbName>genome</DbName>
        <DbName>assembly</DbName>
        <DbName>gcassembly</DbName>
        <DbName>genomeprj</DbName>
        <DbName>bioproject</DbName>
        <DbName>biosample</DbName>
        <DbName>biosystems</DbName>
        <DbName>blastdbinfo</DbName>
        <DbName>books</DbName>
        <DbName>cdd</DbName>
        <DbName>clinvar</DbName>
        (...)
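A minimal Python sketch of how the database names could be read from such an answer. The embedded XML is a shortened copy of the EInfo output above, and the function name `list_databases` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the EInfo answer (only three databases kept).
EINFO_XML = """<eInfoResult><DbList>
<DbName>pubmed</DbName><DbName>protein</DbName><DbName>nuccore</DbName>
</DbList></eInfoResult>"""


def list_databases(xml_text):
    """Return the text of every DbName element, in document order."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.iter("DbName")]


print(list_databases(EINFO_XML))
```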
EInfo: JSON output
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?retmode=json
    {
      "header": {
        "type": "einfo",
        "version": "0.3"
      },
      "einforesult": {
        "dblist": [
          "pubmed",
          "protein",
          "nuccore",
          (...)
          "unigene",
          "gencoll",
          "gtr"
        ]
      }
    }
EInfo
Return statistics for a given Entrez database:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=DbName
EInfo: Statistics for PubMed
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed
    <?xml version="1.0"?>
    <eInfoResult>
      <DbInfo>
        <DbName>pubmed</DbName>
        <MenuName>PubMed</MenuName>
        <Description>PubMed bibliographic record</Description>
        <DbBuild>Build130805-2117m.4</DbBuild>
        <Count>22974581</Count>
        <LastUpdate>2013/08/06 08:33</LastUpdate>
        <FieldList>
          (...)
          <Field>
            <Name>UID</Name>
            <FullName>UID</FullName>
            <Description>Unique number assigned to publication</Description>
            <TermCount>0</TermCount>
            <IsDate>N</IsDate>
            <IsNumerical>Y</IsNumerical>
            <SingleToken>Y</SingleToken>
            <Hierarchy>N</Hierarchy>
            <IsHidden>Y</IsHidden>
          </Field>
          <Field>
          (...)
EInfo: Statistics for PubMed (JSON)
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed&retmode=json
    {
      "header": {
        "type": "einfo",
        "version": "0.3"
      },
      "einforesult": {
        "dbinfo": {
          "dbname": "pubmed",
          "menuname": "PubMed",
          "description": "PubMed bibliographic record",
          "dbbuild": "Build160921-2207m.6",
          "count": "26470199",
          "lastupdate": "2016/09/22 16:32",
          "fieldlist": [
            {
              "name": "ALL",
              "fullname": "All Fields",
              "description": "All terms from all searchable fields",
              "termcount": "179424126",
              "isdate": "N",
              "isnumerical": "N",
              "singletoken": "N",
              "hierarchy": "N",
              "ishidden": "N"
            },
            {
              "name": "UID",
              "fullname": "UID",
              "description": "Unique number assigned to publication",
              (...)
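A possible way to read these statistics from the JSON answer, sketched in Python. The sample is a reduced copy of the output above and follows the layout shown on the slide (`dbinfo` as a single object); the function name `summarize_dbinfo` is hypothetical.

```python
import json

# Reduced copy of the JSON answer shown above (two fields only).
EINFO_JSON = """{"header": {"type": "einfo", "version": "0.3"},
 "einforesult": {"dbinfo": {"dbname": "pubmed", "count": "26470199",
   "fieldlist": [{"name": "ALL", "fullname": "All Fields"},
                 {"name": "UID", "fullname": "UID"}]}}}"""


def summarize_dbinfo(json_text):
    """Return (database name, record count, list of field names)."""
    info = json.loads(json_text)["einforesult"]["dbinfo"]
    fields = [field["name"] for field in info["fieldlist"]]
    # Counts are delivered as strings in this answer, hence the int() call.
    return info["dbname"], int(info["count"]), fields


print(summarize_dbinfo(EINFO_JSON))
```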
EInfo: With entrez-direct
    $ einfo -dbs
    $ einfo -db pubmed
GQuery
GQuery
Provides the number of records retrieved in all Entrez databases by a single text query.
GQuery: Example
    $ curl "https://eutils.ncbi.nlm.nih.gov/gquery?term=tyrannosaurus%20rex&retmode=xml"
    <Result>
      <Term>tyrannosaurus rex</Term>
      <eGQueryResult>
        <ResultItem><DbName>pubmed</DbName><MenuName/><Count>41</Count><Status>Ok</Status></ResultItem>
        <ResultItem><DbName>pmc</DbName><MenuName/><Count>160</Count><Status>Ok</Status></ResultItem>
        <ResultItem><DbName>mesh</DbName><MenuName/><Count>15</Count><Status>Ok</Status></ResultItem>
        <ResultItem><DbName>books</DbName><MenuName/><Count>179</Count><Status>Ok</Status></ResultItem>
        <ResultItem><DbName>pubmedhealth</DbName><MenuName/><Count>21</Count><Status>Ok</Status></ResultItem>
        <ResultItem><DbName>omim</DbName><MenuName/><Count>10</Count><Status>Ok</Status></ResultItem>
        <ResultItem><DbName>omia</DbName><MenuName/><Count>0</Count><Status>Term or Database is not found</Status></ResultItem>
        <ResultItem><DbName>ncbisearch</DbName><MenuName/><Count>1</Count><Status>Ok</Status></ResultItem>
        <ResultItem><DbName>nuccore</DbName><MenuName/><Count>0</Count><Status>Term or Database is not found</Status></ResultItem>
        (...)
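The per-database counts in this answer can be collected with a few lines of Python. This is only a sketch: the XML is a shortened copy of the GQuery output above, and `counts_by_database` is a name chosen for the example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the GQuery answer shown above.
GQUERY_XML = """<Result><Term>tyrannosaurus rex</Term><eGQueryResult>
<ResultItem><DbName>pubmed</DbName><MenuName/><Count>41</Count><Status>Ok</Status></ResultItem>
<ResultItem><DbName>pmc</DbName><MenuName/><Count>160</Count><Status>Ok</Status></ResultItem>
<ResultItem><DbName>omia</DbName><MenuName/><Count>0</Count><Status>Term or Database is not found</Status></ResultItem>
</eGQueryResult></Result>"""


def counts_by_database(xml_text):
    """Map each database whose Status is 'Ok' to its integer Count."""
    root = ET.fromstring(xml_text)
    counts = {}
    for item in root.iter("ResultItem"):
        if item.findtext("Status") == "Ok":
            counts[item.findtext("DbName")] = int(item.findtext("Count"))
    return counts


print(counts_by_database(GQUERY_XML))
```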
GQuery: Transforming to HTML using XSLT
The XSLT stylesheet: https://raw.githubusercontent.com/lindenb/courses/master/about.ncbi/gquery2html.xsl
    <?xml version='1.0' encoding="UTF-8" ?>
    <xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
    <xsl:output method="html"/>

    <xsl:template match="/"><html><body>
    <xsl:apply-templates select="Result"/>
    </body></html></xsl:template>

    <xsl:template match="Result">
    <table><caption><xsl:value-of select="Term"/></caption>
    <tr><th>Database</th><th>Count</th><th>Status</th></tr>
    <xsl:apply-templates select="eGQueryResult/ResultItem"/>
    </table>
    </xsl:template>

    <xsl:template match="ResultItem">
    <tr>
    <td><a>
    <xsl:attribute name="href">http://www.ncbi.nlm.nih.gov/<xsl:value-of select="DbName"/>?cmd=search&amp;term=<xsl:value-of select="translate(/Result/Term,' ','+')"/></xsl:attribute>
    <xsl:value-of select="DbName"/></a></td>
    <td><xsl:value-of select="Count"/></td>
    <td><xsl:value-of select="Status"/></td>
    </tr>
    </xsl:template>

    </xsl:stylesheet>
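The href attribute built by the stylesheet relies on the XPath function translate() to replace each space of the search term with '+'. A rough Python equivalent is given below purely as an illustration; it is plain string handling, not XSLT, and the function name `database_link` is hypothetical.

```python
def database_link(dbname, term):
    """Build the same link as the stylesheet's href attribute."""
    # str.replace plays the role of translate(/Result/Term, ' ', '+').
    return ("http://www.ncbi.nlm.nih.gov/" + dbname
            + "?cmd=search&term=" + term.replace(" ", "+"))


print(database_link("pubmed", "tyrannosaurus rex"))
```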
GQuery: Transforming to HTML
    $ curl "https://eutils.ncbi.nlm.nih.gov/gquery?term=tyrannosaurus%20rex&retmode=xml" |\
      xsltproc gquery2html.xsl -
    <html><body>
    <table>
    <caption>tyrannosaurus rex</caption>
    <tr>
    <th>Database</th><th>Count</th><th>Status</th>
    </tr>
    <tr>
    <td><a href="https://www.ncbi.nlm.nih.gov/pubmed?cmd=search&amp;term=tyrannosaurus+rex">pubmed</a></td>
    <td>41</td><td>Ok</td>
    </tr>
    <tr>
    <td><a href="https://www.ncbi.nlm.nih.gov/pmc?cmd=search&amp;term=tyrannosaurus+rex">pmc</a></td>
    <td>160</td><td>Ok</td>
    </tr>
    <tr>
    <td><a href="https://www.ncbi.nlm.nih.gov/mesh?cmd=search&amp;term=tyrannosaurus+rex">mesh</a></td>
    <td>15</td>
ESearch
ESearch
Provides a list of UIDs matching a text query
Posts the results of a search on the History server
Downloads all UIDs from a dataset stored on the History server
Combines or limits UID datasets stored on the History server
Sorts sets of UIDs
ESearch: Syntax
Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi
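Search terms must be URL-encoded before being appended to the base URL. The sketch below shows one way to do it in Python; the helper name `esearch_url` is an assumption, and the sample term is the Mammuthus primigenius query used in the slides that follow.

```python
from urllib.parse import urlencode, quote

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term, **extra):
    """Build an ESearch URL (hypothetical helper, for illustration only)."""
    params = {"db": db, "term": term, **extra}
    # quote_via=quote encodes spaces as %20; the default (quote_plus) would use '+'.
    return ESEARCH + "?" + urlencode(params, quote_via=quote)


print(esearch_url("nucleotide", '"Mammuthus primigenius"[ORGN]'))
print(esearch_url("nucleotide", '"Mammuthus primigenius"[ORGN]', retmax=2))
```

The double quotes become %22 and the square brackets %5B and %5D, as in the curl commands of the next slides.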
ESearch: Searching for 'Mammuthus primigenius'
    curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D" |\
      xmllint --format -
    <eSearchResult>
      <Count>684</Count>
      <RetMax>20</RetMax>
      <RetStart>0</RetStart>
      <IdList>
        <Id>507866428</Id>
        <Id>124056416</Id>
        <Id>383843869</Id>
        <Id>383843867</Id>
        <Id>383843865</Id>
        <Id>383843863</Id>
        <Id>383843861</Id>
        <Id>383843859</Id>
        <Id>383843857</Id>
        <Id>383843855</Id>
        <Id>383843853</Id>
        <Id>383843851</Id>
        <Id>383843849</Id>
        <Id>383843847</Id>
        <Id>383843845</Id>
        <Id>157367690</Id>
        <Id>157367676</Id>
        <Id>157367662</Id>
        <Id>157367648</Id>
        <Id>157367634</Id>
      </IdList>
      <TranslationSet>
        <Translation>
          <From>"Mammuthus primigenius"[ORGN]</From>
          <To>"Mammuthus primigenius"[Organism]</To>
        </Translation>
      </TranslationSet>
      <TranslationStack>
        <TermSet>
          <Term>"Mammuthus primigenius"[Organism]</Term>
          <Field>Organism</Field>
          <Count>684</Count>
          <Explode>Y</Explode>
        </TermSet>
        <OP>GROUP</OP>
      </TranslationStack>
      <QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
    </eSearchResult>
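A minimal Python sketch for extracting the total count and the UIDs from an ESearch answer. The XML is a shortened copy of the output above (two Ids kept), and `parse_esearch` is a name chosen for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the ESearch answer shown above.
ESEARCH_XML = """<eSearchResult><Count>684</Count><RetMax>2</RetMax><RetStart>0</RetStart>
<IdList><Id>507866428</Id><Id>124056416</Id></IdList>
<TranslationStack><TermSet><Count>684</Count></TermSet></TranslationStack>
</eSearchResult>"""


def parse_esearch(xml_text):
    """Return (total number of hits, list of UIDs in this page)."""
    root = ET.fromstring(xml_text)
    # findtext("Count") only looks at direct children of the root,
    # so the Count nested in TranslationStack is not picked up.
    total = int(root.findtext("Count"))
    ids = [node.text for node in root.findall("IdList/Id")]
    return total, ids


print(parse_esearch(ESEARCH_XML))
```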
ESearch: Searching for 'Mammuthus primigenius' (JSON)
    curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmode=json"
    {
      "header": {
        "type": "esearch",
        "version": "0.3"
      },
      "esearchresult": {
        "count": "811",
        "retmax": "20",
        "retstart": "0",
        "idlist": [
          "1059791223", "198241525", "198241523", "198241521", "198241519",
          "198241517", "198241515", "198241513", "198241511", "198241509",
          "198241507", "198241505", "198241503", "198241501", "198241499",
          "198241497", "198241495", "198241493", "198241491", "198241489"
        ],
        "translationset": [
          {
            "from": "\"Mammuthus primigenius\"[ORGN]",
            "to": "\"Mammuthus primigenius\"[Organism]"
          }
        ],
        "translationstack": [
          {
            "term": "\"Mammuthus primigenius\"[Organism]",
            "field": "Organism",
            "count": "811",
            "explode": "Y"
          },
          "GROUP"
        ],
        "querytranslation": "\"Mammuthus primigenius\"[Organism]"
      }
    }
ESearch: the retmax parameter
    curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmax=2" |\
      xmllint --format -
    <eSearchResult>
      <Count>684</Count>
      <RetMax>2</RetMax>
      <RetStart>0</RetStart>
      <IdList>
        <Id>507866428</Id>
        <Id>124056416</Id>
      </IdList>
      <TranslationSet>
        <Translation>
          <From>"Mammuthus primigenius"[ORGN]</From>
          <To>"Mammuthus primigenius"[Organism]</To>
        </Translation>
      </TranslationSet>
      <TranslationStack>
        <TermSet>
          <Term>"Mammuthus primigenius"[Organism]</Term>
          <Field>Organism</Field>
          <Count>684</Count>
          <Explode>Y</Explode>
        </TermSet>
        <OP>GROUP</OP>
      </TranslationStack>
      <QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
    </eSearchResult>
ESearch: the retstart parameter
    curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmax=3&retstart=100" |\
      xmllint --format -
    <eSearchResult>
      <Count>684</Count>
      <RetMax>3</RetMax>
      <RetStart>100</RetStart>
      <IdList>
        <Id>300810656</Id>
        <Id>300810655</Id>
        <Id>300810654</Id>
      </IdList>
      <TranslationSet>
        <Translation>
          <From>"Mammuthus primigenius"[ORGN]</From>
          <To>"Mammuthus primigenius"[Organism]</To>
        </Translation>
      </TranslationSet>
      <TranslationStack>
        <TermSet>
          <Term>"Mammuthus primigenius"[Organism]</Term>
          <Field>Organism</Field>
          <Count>684</Count>
          <Explode>Y</Explode>
        </TermSet>
        <OP>GROUP</OP>
      </TranslationStack>
      <QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
    </eSearchResult>
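Together, retmax and retstart allow paging through a full result set. The Python sketch below only computes the parameter values for each request; the function names are hypothetical and the page size of 200 is an arbitrary choice for the example.

```python
def page_offsets(count, retmax):
    """Return the retstart values needed to page through `count` UIDs."""
    return list(range(0, count, retmax))


def paged_params(count, retmax):
    """Return one dict of ESearch paging parameters per request."""
    return [{"retstart": start, "retmax": retmax}
            for start in page_offsets(count, retmax)]


# 684 is the Count reported for the Mammuthus primigenius query above.
for params in paged_params(684, 200):
    print(params)
```

With 684 hits and 200 UIDs per page, four requests are needed; the last one returns the remaining 84 UIDs.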
ESearch: rettype=count
    curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&rettype=count" |\
      xmllint --format -
    <eSearchResult>
      <Count>684</Count>
    </eSearchResult>
ESearch: sort=Date Released
    curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&sort=Date+Released" |\
      xmllint --format -
    <eSearchResult>
      <Count>811</Count>
      <RetMax>20</RetMax>
      <RetStart>0</RetStart>
      <IdList>
        <Id>1033204644</Id>
        <Id>1033204658</Id>
        <Id>1033204672</Id>
        <Id>1033204686</Id>
        <Id>1033204729</Id>
        <Id>1033204771</Id>
        <Id>1033204785</Id>
        <Id>1033204799</Id>
        <Id>1033204813</Id>
        <Id>1033204827</Id>
        <Id>1033204871</Id>
        <Id>1033205124</Id>
        <Id>1033205194</Id>
        <Id>1033205208</Id>
        <Id>1033205222</Id>
        <Id>1033205236</Id>
        <Id>1033205264</Id>
        <Id>1033205390</Id>
        (...)
ESummary
ESummary: Syntax
Returns document summaries (DocSums) for a list of input UIDs
Returns DocSums for a set of UIDs stored on the Entrez History server
ESummary: Syntax
Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=(DB)&id=(UID)
ESummary: Retrieve nucleotide gi=507866428
    $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=507866428"
    <eSummaryResult>
      <DocSum>
        <Id>507866428</Id>
        <Item Name="Caption" Type="String">KC524742</Item>
        <Item Name="Title" Type="String">Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds</Item>
        <Item Name="Extra" Type="String">gi|507866428|gb|KC524742.1|[507866428]</Item>
        <Item Name="Gi" Type="Integer">507866428</Item>
        <Item Name="CreateDate" Type="String">2013/06/15</Item>
        <Item Name="UpdateDate" Type="String">2013/06/21</Item>
        <Item Name="Flags" Type="Integer">0</Item>
        <Item Name="TaxId" Type="Integer">37349</Item>
        <Item Name="Length" Type="Integer">9042</Item>
        <Item Name="Status" Type="String">live</Item>
        <Item Name="ReplacedBy" Type="String"></Item>
        <Item Name="Comment" Type="String"><![CDATA[ ]]></Item>
      </DocSum>
    </eSummaryResult>
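Each DocSum is a flat list of Item elements carrying a Name and a Type attribute, which maps naturally to a dictionary. A possible Python sketch follows; the XML is a reduced copy of the answer above, and `docsum_to_dict` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Reduced copy of the ESummary answer shown above.
ESUMMARY_XML = """<eSummaryResult><DocSum><Id>507866428</Id>
<Item Name="Caption" Type="String">KC524742</Item>
<Item Name="TaxId" Type="Integer">37349</Item>
<Item Name="Length" Type="Integer">9042</Item>
<Item Name="ReplacedBy" Type="String"></Item>
</DocSum></eSummaryResult>"""


def docsum_to_dict(docsum):
    """Convert one DocSum element into a dict keyed by the Item names."""
    record = {"Id": docsum.findtext("Id")}
    for item in docsum.findall("Item"):
        text = item.text or ""
        # Items typed as Integer are converted; everything else stays a string.
        if item.get("Type") == "Integer" and text:
            record[item.get("Name")] = int(text)
        else:
            record[item.get("Name")] = text
    return record


root = ET.fromstring(ESUMMARY_XML)
records = [docsum_to_dict(docsum) for docsum in root.findall("DocSum")]
print(records)
```

Items of Type "List" (as in the PubMed answer further down) contain nested Item elements and would need extra handling that this sketch does not provide.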
ESummary: Retrieve nucleotide gi=507866428 in JSON
    $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=507866428&retmode=json"
    {
      "header": {
        "type": "esummary",
        "version": "0.3"
      },
      "result": {
        "uids": [
          "507866428"
        ],
        "507866428": {
          "uid": "507866428",
          "caption": "KC524742",
          "title": "Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds",
          "extra": "gi|507866428|gb|KC524742.1|",
          "gi": 507866428,
          "createdate": "2013/06/15",
          "updatedate": "2013/06/21",
          "flags": "",
          "taxid": 37349,
          "slen": 9042,
          "biomol": "genomic",
          "moltype": "dna",
          "topology": "linear",
          "sourcedb": "insd",
          "segsetsize": "",
          "projectid": "0",
          (...)
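In the JSON answer the summaries are keyed by UID, with the list of UIDs given under "uids". A short Python sketch of how they could be walked through; the sample is a reduced copy of the output above and the function name `summaries` is an assumption.

```python
import json

# Reduced copy of the JSON answer shown above.
ESUMMARY_JSON = """{"header": {"type": "esummary", "version": "0.3"},
 "result": {"uids": ["507866428"],
  "507866428": {"uid": "507866428", "caption": "KC524742",
                "taxid": 37349, "slen": 9042}}}"""


def summaries(json_text):
    """Return the summary objects in the order given by the 'uids' list."""
    result = json.loads(json_text)["result"]
    return [result[uid] for uid in result["uids"]]


for summary in summaries(ESUMMARY_JSON):
    print(summary["caption"], summary["slen"])
```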
ESummary: Retrieve snp rs25
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=snp&id=25"
<eSummaryResult>
<DocSum>
<Id>25</Id>
<Item Name="SNP_ID" Type="Integer">25</Item>
<Item Name="Organism" Type="String"></Item>
<Item Name="ALLELE_ORIGIN" Type="String"></Item>
<Item Name="GLOBAL_MAF" Type="String">0.4913</Item>
<Item Name="GLOBAL_POPULATION" Type="String"></Item>
<Item Name="GLOBAL_SAMPLESIZE" Type="Integer">0</Item>
<Item Name="SUSPECTED" Type="String"></Item>
<Item Name="CLINICAL_SIGNIFICANCE" Type="String"></Item>
<Item Name="GENE" Type="String">THSD7A</Item>
<Item Name="LOCUS_ID" Type="Integer">221981</Item>
<Item Name="ACC" Type="String">NM_015204.2,NT_007819.17</Item>
<Item Name="CHR" Type="String">7</Item>
<Item Name="WEIGHT" Type="Integer">1</Item>
<Item Name="HANDLE" Type="String">1000GENOMES,BGI,BL,BUSHMAN,COMPLETE_GENOMICS,CSHL-HAPMAP,GMI,ILLUMINA-UK,KWOK,PERLEGEN,SSMP,TISHKOFF</Item>
<Item Name="FXN_CLASS" Type="String">intron-variant</Item>
<Item Name="VALIDATED" Type="String">by-1000G,by-cluster,by-frequency,by-hapmap</Item>
<Item Name="GTYPE" Type="String">true</Item>
<Item Name="NONREF" Type="String">false</Item>
<Item Name="DOCSUM" Type="String">HGVS=NC_000007.13:g.11584142T&gt;C,NG_027670.1:g.292683A&gt;G,NM_015204.2:c.1454-1398A&gt;G,NT_007819.17:g.11574142T&gt;C|SEQ=TCTGTGAGCTTCTGCATGCAATCCT[A/G]TGCAATTGGAATTTGATAGTCCTTT|GENE=THSD7A:221981</Item>
<Item Name="HET" Type="Integer">50</Item>
<Item Name="SRATE" Type="Integer">0</Item>
<Item Name="TAX_ID" Type="Integer">9606</Item>
<Item Name="CHRRPT" Type="String">25|2|0|1|1|1|7|NT_007819.17|11574141|11584142|THSD7A|0.499848|0.00872267||51|1|1|36|138|0|||T:2178:0.4913</Item>
<Item Name="ORIG_BUILD" Type="Integer">36</Item>
<Item Name="UPD_BUILD" Type="Integer">138</Item>
<Item Name="CREATEDATE" Type="String">2000-09-19 17:02</Item>
<Item Name="UPDATEDATE" Type="String">2013-06-21 14:17</Item>
<Item Name="POP_CLASS" Type="String"></Item>
<Item Name="METHOD_CLASS" Type="String">computed,hybridize,sequence,unknown</Item>
<Item Name="SNP3D" Type="String"></Item>
<Item Name="LINKOUT" Type="String">ILLUMINA-UK|http://www.illumina.com/HumanGenomeNA18507 000019106 NCBI36.1 chr7 11550667</Item>
<Item Name="SS" Type="Integer">654151077</Item>
<Item Name="LOCSNPID" Type="String">7 11584142</Item>
<Item Name="ALLELE" Type="String">R</Item>
<Item Name="SNP_CLASS" Type="String">snp</Item>
<Item Name="CHRPOS" Type="String">7:11584142</Item>
<Item Name="CONTIGPOS" Type="String">NT_007819.17:11574142</Item>
<Item Name="TEXT" Type="String"></Item>
<Item Name="LOOKUP" Type="String">325952</Item>
</DocSum>
</eSummaryResult>
ESummary: Retrieve pubmed pmid=7939126
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=7939126"
<eSummaryResult>
<DocSum>
<Id>7939126</Id>
<Item Name="PubDate" Type="Date">1994 Apr</Item>
<Item Name="EPubDate" Type="Date"></Item>
<Item Name="Source" Type="String">Sleep</Item>
<Item Name="AuthorList" Type="List">
  <Item Name="Author" Type="String">Broughton R</Item>
  <Item Name="Author" Type="String">Billings R</Item>
  <Item Name="Author" Type="String">Cartwright R</Item>
  <Item Name="Author" Type="String">Doucette D</Item>
  <Item Name="Author" Type="String">Edmeads J</Item>
  <Item Name="Author" Type="String">Edwardh M</Item>
  <Item Name="Author" Type="String">Ervin F</Item>
  <Item Name="Author" Type="String">Orchard B</Item>
  <Item Name="Author" Type="String">Hill R</Item>
  <Item Name="Author" Type="String">Turrell G</Item>
</Item>
<Item Name="LastAuthor" Type="String">Turrell G</Item>
<Item Name="Title" Type="String">Homicidal somnambulism: a case report.</Item>
<Item Name="Volume" Type="String">17</Item>
<Item Name="Issue" Type="String">3</Item>
<Item Name="Pages" Type="String">253-64</Item>
<Item Name="LangList" Type="List">
  <Item Name="Lang" Type="String">English</Item>
</Item>
<Item Name="NlmUniqueID" Type="String">7809084</Item>
<Item Name="ISSN" Type="String">0161-8105</Item>
<Item Name="ESSN" Type="String">1550-9109</Item>
<Item Name="PubTypeList" Type="List">
  <Item Name="PubType" Type="String">Journal Article</Item>
</Item>
<Item Name="RecordStatus" Type="String">PubMed - indexed for MEDLINE</Item>
<Item Name="PubStatus" Type="String">ppublish</Item>
<Item Name="ArticleIds" Type="List">
  <Item Name="pubmed" Type="String">7939126</Item>
  <Item Name="eid" Type="String">7939126</Item>
  <Item Name="rid" Type="String">7939126</Item>
</Item>
<Item Name="History" Type="List">
  <Item Name="pubmed" Type="Date">1994/04/01 00:00</Item>
  <Item Name="medline" Type="Date">1994/04/01 00:01</Item>
  <Item Name="entrez" Type="Date">1994/04/01 00:00</Item>
</Item>
<Item Name="References" Type="List"></Item>
<Item Name="HasAbstract" Type="Integer">1</Item>
<Item Name="PmcRefCount" Type="Integer">4</Item>
<Item Name="FullJournalName" Type="String">Sleep</Item>
<Item Name="ELocationID" Type="String"></Item>
<Item Name="SO" Type="String">1994 Apr;17(3):253-64</Item>
</DocSum>
</eSummaryResult>
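The DocSum format is a flat list of named Item elements, so it maps naturally onto a dictionary. The sketch below is only an illustration, not part of the original slides: it parses an abbreviated copy of the PubMed answer above with Python's standard library. The function names `item_value` and `docsum_to_dict` are made up for this example, and nested List items are flattened to plain lists (their child names are dropped).

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the PubMed DocSum shown above (only a few Items kept).
SAMPLE = """<eSummaryResult><DocSum><Id>7939126</Id>
<Item Name="PubDate" Type="Date">1994 Apr</Item>
<Item Name="Source" Type="String">Sleep</Item>
<Item Name="AuthorList" Type="List">
  <Item Name="Author" Type="String">Broughton R</Item>
  <Item Name="Author" Type="String">Billings R</Item>
</Item>
<Item Name="Title" Type="String">Homicidal somnambulism: a case report.</Item>
</DocSum></eSummaryResult>"""


def item_value(item):
    """Convert one <Item>: Type="List" becomes a list, anything else a string."""
    if item.get("Type") == "List":
        return [item_value(child) for child in item.findall("Item")]
    return (item.text or "").strip()


def docsum_to_dict(xml_text):
    """Return {uid: {item name: value}} for every <DocSum> in the answer."""
    root = ET.fromstring(xml_text)
    result = {}
    for docsum in root.findall("DocSum"):
        uid = docsum.findtext("Id")
        result[uid] = {
            item.get("Name"): item_value(item) for item in docsum.findall("Item")
        }
    return result


summary = docsum_to_dict(SAMPLE)
print(summary["7939126"]["Title"])
print(summary["7939126"]["AuthorList"])
```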
EFetch
EFetch: Syntax
Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=(db)&id=(ID)
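As a small illustration of this syntax, the sketch below assembles an EFetch URL from its parameters. The helper name `efetch_url` is hypothetical; `rettype` and `retmode` are the optional parameters used in the EFetch requests of the following slides. Nothing is downloaded here, the function only builds the string.

```python
from urllib.parse import urlencode

EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db, uid, rettype=None, retmode=None):
    """Build an EFetch URL; rettype and retmode are added only when given."""
    params = {"db": db, "id": uid}
    if rettype:
        params["rettype"] = rettype
    if retmode:
        params["retmode"] = retmode
    return EFETCH_BASE + "?" + urlencode(params)


fasta_url = efetch_url("nucleotide", 507866428, rettype="fasta")
tinyseq_url = efetch_url("nucleotide", 507866428, rettype="fasta", retmode="xml")
print(fasta_url)
print(tinyseq_url)
```

The resulting string can be passed to curl or wget exactly like the hand-written URLs.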
EFetch: Retrieve nucleotide gi=507866428 as ASN.1
Default: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428
Seq-entry ::= set {
  class nuc-prot ,
  descr {
    source {
      genome genomic ,
      org {
        taxname "Mammuthus primigenius" ,
        common "woolly mammoth" ,
        db {
          {
            db "taxon" ,
            tag
              id 37349 } } ,
        orgname {
          name binomial {
            genus "Mammuthus" ,
            species "primigenius" } ,
          mod {
            {
EFetch: Retrieve nucleotide gi=507866428 as Fasta
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&rettype=fasta
>gi|507866428|gb|KC524742.1| Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds
GCACTTGCTTTTTTTGTCTTCTTCAGACCACGACATGGGACTCAGCGACGGGGAATGGGAGTTGGTGTTGAAAACCTGGGGGAAAGTGGAGGCTGACATCCCGGGCCATGGGCTGGAAGTCTTCGTCAGGTAAAGGAAGAAATCCTGTGGCCCCCATCACCCACCCCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
EFetch: Retrieve nucleotide gi=507866428 as TinySeq
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&rettype=fasta&retmode=xml
<?xml version="1.0"?>
<!DOCTYPE TSeqSet PUBLIC "-//NCBI//NCBI TSeq/EN"
<TSeqSet>
<TSeq>
<TSeq_seqtype value="nucleotide"/>
<TSeq_gi>507866428</TSeq_gi>
<TSeq_accver>KC524742.1</TSeq_accver>
<TSeq_taxid>37349</TSeq_taxid>
<TSeq_orgname>Mammuthus primigenius</TSeq_orgnam
<TSeq_defline>Mammuthus primigenius isolate CME2
<TSeq_length>9042</TSeq_length>
<TSeq_sequence>GCACTTGCTTTTTTTGTCTTCTTCAGACCACGA
</TSeq>
</TSeqSet>
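TinySeq is the easiest of these formats to consume from a program because every field has its own element. The following sketch is an assumption-laden illustration rather than the author's code: it parses a well-formed, shortened version of the answer above (the slide itself is cut off at the right margin, so the sequence here is only the visible prefix). The name `parse_tinyseq` is invented for the example.

```python
import xml.etree.ElementTree as ET

# Well-formed, shortened sample; the sequence is truncated like on the slide.
SAMPLE = """<TSeqSet><TSeq>
<TSeq_seqtype value="nucleotide"/>
<TSeq_gi>507866428</TSeq_gi>
<TSeq_accver>KC524742.1</TSeq_accver>
<TSeq_taxid>37349</TSeq_taxid>
<TSeq_orgname>Mammuthus primigenius</TSeq_orgname>
<TSeq_length>9042</TSeq_length>
<TSeq_sequence>GCACTTGCTTTTTTTGTCTTCTTCAGACCACGA</TSeq_sequence>
</TSeq></TSeqSet>"""


def parse_tinyseq(xml_text):
    """Return one dictionary per <TSeq> record."""
    root = ET.fromstring(xml_text)
    records = []
    for tseq in root.findall("TSeq"):
        records.append({
            "gi": tseq.findtext("TSeq_gi"),
            "accver": tseq.findtext("TSeq_accver"),
            "taxid": tseq.findtext("TSeq_taxid"),
            "orgname": tseq.findtext("TSeq_orgname"),
            "length": int(tseq.findtext("TSeq_length")),
            "sequence": tseq.findtext("TSeq_sequence"),
        })
    return records


records = parse_tinyseq(SAMPLE)
print(records[0]["accver"], records[0]["orgname"], records[0]["length"])
```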
EFetch: Retrieve nucleotide gi=507866428 as Genbank-xml
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&retmode=xml
<GBSeq>
<GBSeq_locus>KC524742</GBSeq_locus>
<GBSeq_length>9042</GBSeq_length>
<GBSeq_strandedness>double</GBSeq_strandedness>
<GBSeq_moltype>DNA</GBSeq_moltype>
<GBSeq_topology>linear</GBSeq_topology>
<GBSeq_division>MAM</GBSeq_division>
<GBSeq_update-date>21-JUN-2013</GBSeq_update-date>
<GBSeq_create-date>15-JUN-2013</GBSeq_create-date>
<GBSeq_definition>Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds</GBSeq_definition>
<GBSeq_primary-accession>KC524742</GBSeq_primary-accession>
<GBSeq_accession-version>KC524742.1</GBSeq_accession-version>
<GBSeq_other-seqids>
  <GBSeqid>gb|KC524742.1|</GBSeqid>
  <GBSeqid>gi|507866428</GBSeqid>
</GBSeq_other-seqids>
<GBSeq_source>Mammuthus primigenius (woolly mammoth)</GBSeq_source>
<GBSeq_organism>Mammuthus primigenius</GBSeq_organism>
(...)
EFetch: Retrieve nucleotide gi=507866428 as Genbank
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&rettype=gb
LOCUS       KC524742                9042 bp    DNA     linear   MAM 21-JUN-2013
DEFINITION  Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene,
            partial cds.
ACCESSION   KC524742
VERSION     KC524742.1  GI:507866428
KEYWORDS    .
SOURCE      Mammuthus primigenius (woolly mammoth)
  ORGANISM  Mammuthus primigenius
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Afrotheria; Proboscidea; Elephantidae;
            Mammuthus.
REFERENCE   1  (bases 1 to 9042)
  AUTHORS   Mirceta,S., Signore,A.V., Burns,J.M., Cossins,A.R., Campbell,K.L.
            and Berenbrink,M.
  TITLE     Evolution of mammalian diving capacity traced by myoglobin net
            surface charge
  JOURNAL   Science 340 (6138), 1234192 (2013)
   PUBMED   23766330
REFERENCE   2  (bases 1 to 9042)
  AUTHORS   Signore,A.V., Campbell,K.L. and Poinar,H.N.
  TITLE     Direct Submission
  JOURNAL   Submitted (09-JAN-2013) Biological Sciences, University of
            Manitoba, 50 Sifton Road, Winnipeg, Manitoba R3T2N2, Canada
COMMENT     ##Assembly-Data-START##
            Sequencing Technology :: Sanger dideoxy sequencing
            ##Assembly-Data-END##
FEATURES             Location/Qualifiers
     source          1..9042
                     /organism="Mammuthus primigenius"
                     /mol_type="genomic DNA"
                     /isolate="CME2005/915"
                     /db_xref="taxon:37349"
                     /tissue_type="bone"
     gene            <35..>9042
                     /gene="Mb"
     mRNA            join(<35..129,5627..5849,8979..>9042)
                     /gene="Mb"
                     /product="myoglobin"
     CDS             join(35..129,5627..5849,8979..>9042)
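The feature locations in the flat file (for example the `join(...)` of the CDS) encode exon intervals, with `<` and `>` marking partial ends. As a rough sketch, assuming only simple ranges and `join`, the hypothetical helper `parse_location` below turns such a string into intervals; it deliberately ignores `complement(...)`, remote accessions and the other location operators.

```python
import re

# One range: optional '<', start, '..', optional '>', end.
RANGE = re.compile(r"(<?)(\d+)\.\.(>?)(\d+)")


def parse_location(location):
    """Return [(start, end, start_is_partial, end_is_partial), ...]."""
    return [
        (int(start), int(end), lt == "<", gt == ">")
        for lt, start, gt, end in RANGE.findall(location)
    ]


def spliced_length(location):
    """Total number of bases covered by the ranges (1-based, inclusive)."""
    return sum(end - start + 1 for start, end, _, _ in parse_location(location))


cds = parse_location("join(35..129,5627..5849,8979..>9042)")
print(cds)
print(spliced_length("join(35..129,5627..5849,8979..>9042)"))
```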
EFetch: Efetch works with the ACCESSION NUMBERS
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=KC524742&rettype=gb
LOCUS       KC524742                9042 bp    DNA     linear   MAM 21-JUN-2013
DEFINITION  Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene,
            partial cds.
ACCESSION   KC524742
VERSION     KC524742.1  GI:507866428
KEYWORDS    .
SOURCE      Mammuthus primigenius (woolly mammoth)
  ORGANISM  Mammuthus primigenius
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Afrotheria; Proboscidea; Elephantidae;
            Mammuthus.
REFERENCE   1  (bases 1 to 9042)
  AUTHORS   Mirceta,S., Signore,A.V., Burns,J.M., Cossins,A.R., Campbell,K.L.
            and Berenbrink,M.
  TITLE     Evolution of mammalian diving capacity traced by myoglobin net
            surface charge
  JOURNAL   Science 340 (6138), 1234192 (2013)
   PUBMED   23766330
REFERENCE   2  (bases 1 to 9042)
  AUTHORS   Signore,A.V., Campbell,K.L. and Poinar,H.N.
  TITLE     Direct Submission
  JOURNAL   Submitted (09-JAN-2013) Biological Sciences, University of
            Manitoba, 50 Sifton Road, Winnipeg, Manitoba R3T2N2, Canada
COMMENT     ##Assembly-Data-START##
            Sequencing Technology :: Sanger dideoxy sequencing
            ##Assembly-Data-END##
FEATURES             Location/Qualifiers
     source          1..9042
                     /organism="Mammuthus primigenius"
                     /mol_type="genomic DNA"
                     /isolate="CME2005/915"
                     /db_xref="taxon:37349"
                     /tissue_type="bone"
     gene            <35..>9042
                     /gene="Mb"
     mRNA            join(<35..129,5627..5849,8979..>9042)
                     /gene="Mb"
                     /product="myoglobin"
     CDS             join(35..129,5627..5849,8979..>9042)
EFetch: Using the WebEnv parameter.
Web environment string returned from a previous ESearch, EPost or ELink call. When provided, ESearch will post the results of the search operation to this pre-existing WebEnv.
EFetch: Using the WebEnv parameter.
Searching extinct species in the NCBI taxonomy ('extinct[PROP]')
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?usehistory=y&db=taxonomy&term=extinct%5BPROP%5D"
<eSearchResult>
<Count>145</Count>
<RetMax>20</RetMax>
<RetStart>0</RetStart>
<QueryKey>1</QueryKey>
<WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
<IdList>
  <Id>1225531</Id>
  <Id>1225530</Id>
  <Id>1211276</Id>
  <Id>1211275</Id>
  <Id>1027716</Id>
  <Id>948961</Id>
  <Id>943952</Id>
  <Id>867394</Id>
  <Id>867393</Id>
  <Id>748142</Id>
  <Id>748141</Id>
  <Id>741158</Id>
  <Id>703576</Id>
  <Id>703571</Id>
  <Id>703559</Id>
  <Id>693865</Id>
  <Id>686441</Id>
  <Id>665113</Id>
  <Id>659069</Id>
  <Id>656807</Id>
</IdList>
<TranslationSet/>
<TranslationStack>
  <TermSet>
    <Term>extinct[PROP]</Term>
    <Field>PROP</Field>
    <Count>145</Count>
    <Explode>N</Explode>
  </TermSet>
  <OP>GROUP</OP>
</TranslationStack>
<QueryTranslation>extinct[PROP]</QueryTranslation>
</eSearchResult>
EFetch: Using the WebEnv parameter.
Fetch the extinct species in the NCBI taxonomy ('extinct[PROP]') using the WebEnv parameter.
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&query_key=1&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&retmode=xml"
<TaxaSet>
<Taxon>
  <TaxId>1225531</TaxId>
  <ScientificName>Equus ovodovi</ScientificName>
  <OtherNames>
    <Synonym>Equus (Sussemionus) ovodovi</Synonym>
    <Name>
      <ClassCDE>authority</ClassCDE>
      <DispName>Equus ovodovi Eisenmann &amp; Sergej, 2011</DispName>
    </Name>
  </OtherNames>
  <ParentTaxId>1225530</ParentTaxId>
  <Rank>species</Rank>
  <Division>Mammals</Division>
  <GeneticCode>
    <GCId>1</GCId>
    <GCName>Standard</GCName>
  </GeneticCode>
  <MitoGeneticCode>
(....)
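The two requests above can be chained in a script: read QueryKey and WebEnv from the ESearch answer, then reuse them in the EFetch URL. The sketch below shows the idea on a shortened copy of the ESearch answer; the helper names `history_from_esearch` and `efetch_history_url` are hypothetical, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Shortened copy of the ESearch answer shown earlier (usehistory=y).
SAMPLE = """<eSearchResult><Count>145</Count><RetMax>20</RetMax><RetStart>0</RetStart>
<QueryKey>1</QueryKey>
<WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
<IdList><Id>1225531</Id><Id>1225530</Id></IdList></eSearchResult>"""


def history_from_esearch(xml_text):
    """Return (query_key, webenv) read from an eSearchResult document."""
    root = ET.fromstring(xml_text)
    return root.findtext("QueryKey"), root.findtext("WebEnv")


def efetch_history_url(db, query_key, webenv, retmode="xml"):
    """Build an EFetch URL that reads its UIDs from the History server."""
    params = {"db": db, "query_key": query_key, "WebEnv": webenv, "retmode": retmode}
    return EFETCH_BASE + "?" + urlencode(params)


query_key, webenv = history_from_esearch(SAMPLE)
history_url = efetch_history_url("taxonomy", query_key, webenv)
print(history_url)
```

With the history, all 145 hits stay on the server side; only the key and the WebEnv string travel between the two calls.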
EPOST
EPost
Uploads a list of UIDs to the Entrez History server
Appends a list of UIDs to an existing set of UID lists attached to a Web Environment
EPost: Post ids to epost
Get the list of taxonomy ids of the extinct species:
wget -O - 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=extinct[PROP]&retmax=1000' |\
  xmllint --format - |\
  grep '<Id>' |\
  cut -d '<' -f 2 |\
  cut -d '>' -f 2 |\
  tr "\n" ","
output:
1860150,1860149,1849957,1825730,1825729,1636722,1607772,1607771,1607767,1607757,1607756,1597978,1582057,1566623,1563127,1563126,1563125,1563124,1563123,1563122,1563121,1563120,1560315,1560314,1543223,1542494,1542469,1530197,1524889,1523245,1513476,1513474,1503129,1453604,1425170,1415635,1295174,1225531,1225530,1211276,1211275,1027716,948961,943952,867394,867393,748142,748141,741158,703576,703571,703559,693865,686441,665113,659069,656807,647691,647690,643746,643745,643744,643742,577682,572106,572105,572104,572099,572098,570943,570942,570941,551196,544298,523825,523824,523822,523821,523820,518692,518691,518689,475185,436495,436494,436493,436488,402889,399386,399178,386524,379504,363580,363579,363578,363571,339614,339612,339609,330944,330640,330639,330638,330637,330636,328612,314500,307641,304335,272462,268291,251263,251094,251093,239970,239969,237965,230980,230979,227166,227165,223567,222863,222862,216182,216181,201717,201716,192211,188536,187135,187134,187133,187132,187131,187118,184920,180214,180178,180177,180176,180175,180174,173935,166505,148923,147494,147466,147464,136416,136415,126594,126429,115942,107030,103864,94623,92649,92648,89252,89250,63631,63221,54568,54500,54497,54366,54365,48784,46906,39097,39053,39051,37349,37348,37185,27445,27444,20678,13266,13140,9619,9275,9274,9273,8818,8817,8815,8813,8812,8811,8810,8367,3409
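The grep/cut/tr pipeline relies on xmllint putting each Id on its own line. A possible alternative, sketched here in Python on a three-id sample, is to read the Id elements with an XML parser. The function name `id_list` is made up for the example. One small difference: `tr "\n" ","` leaves a trailing comma, `",".join(...)` does not.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the ESearch answer (three ids instead of the full list).
SAMPLE = """<eSearchResult><Count>3</Count>
<IdList><Id>1860150</Id><Id>1860149</Id><Id>1849957</Id></IdList>
</eSearchResult>"""


def id_list(xml_text):
    """Return the UIDs found under <IdList>, in document order."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("./IdList/Id")]


ids = id_list(SAMPLE)
joined = ",".join(ids)
print(joined)
```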
EPost: Post ids to epost
wget -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=taxonomy&WebEnv=NCID_1_15435144_130.14.22.215_9001_1474637318_669113391_0MetA0_S_MegaStore_F_1&id=1860150,1860149,1849957,1825730,1825729,1636722,1607772..."
Output:
<?xml version="1.0"?>
<!DOCTYPE ePostResult PUBLIC "-//NLM//DTD ePostResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ePost_020511.dtd">
<ePostResult>
<QueryKey>1</QueryKey>
<WebEnv>NCID_1_15467192_130.14.22.215_9001_1474637456_570452194_0MetA0_S_MegaStore_F_1</WebEnv>
</ePostResult>
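A long id list makes for a very long URL, so EPost requests are often sent as an HTTP POST with the parameters in the request body. The sketch below, which is an illustration under that assumption and not the command used on the slide, builds such a body and reads QueryKey and WebEnv from the answer. `epost_form` and `parse_epost` are hypothetical names, and the sample answer omits the XML declaration and DOCTYPE for brevity.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, parse_qs

EPOST_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi"

# Copy of the ePostResult shown above, without the declaration and DOCTYPE.
SAMPLE = """<ePostResult><QueryKey>1</QueryKey>
<WebEnv>NCID_1_15467192_130.14.22.215_9001_1474637456_570452194_0MetA0_S_MegaStore_F_1</WebEnv>
</ePostResult>"""


def epost_form(db, uids, webenv=None):
    """Return the url-encoded POST body; WebEnv is added only when given."""
    params = {"db": db, "id": ",".join(str(uid) for uid in uids)}
    if webenv:
        params["WebEnv"] = webenv
    return urlencode(params).encode("ascii")


def parse_epost(xml_text):
    """Return (query_key, webenv) from an ePostResult document."""
    root = ET.fromstring(xml_text)
    return root.findtext("QueryKey"), root.findtext("WebEnv")


body = epost_form("taxonomy", [1860150, 1860149, 1849957])
posted_key, posted_env = parse_epost(SAMPLE)
print(body)
print(posted_key, posted_env)
```

The body could then be sent with `urllib.request.urlopen(EPOST_URL, data=body)`, which issues a POST when `data` is given.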
EPost: Searching in the WebEnv
Search Homo Sapiens in WebEnv?
curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=Homo%20Sapiens&usehistory=y&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&query_key=1"
<eSearchResult>
<Count>0</Count>
<RetMax>0</RetMax>
<RetStart>0</RetStart>
<QueryKey>8</QueryKey>
<WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
<IdList/>
<TranslationSet/>
<TranslationStack>
  <OP>GROUP</OP>
  <TermSet>
    <Term>homo sapiens[All Names]</Term>
    <Field>All Names</Field>
    <Count>0</Count>
    <Explode>N</Explode>
  </TermSet>
  <OP>AND</OP>
</TranslationStack>
<QueryTranslation>(#2) AND homo sapiens[All Names]</QueryTranslation>
</eSearchResult>
EPost: Searching in the WebEnv
Search Tyrannosaurus in WebEnv?
$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=Tyrannosaurus&usehistory=y&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&query_key=1"
<eSearchResult>
<Count>1</Count>
<RetMax>1</RetMax>
<RetStart>0</RetStart>
<QueryKey>9</QueryKey>
<WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
<IdList>
  <Id>436494</Id>
</IdList>
<TranslationSet/>
<TranslationStack>
  <OP>GROUP</OP>
  <TermSet>
    <Term>Tyrannosaurus[All Names]</Term>
    <Field>All Names</Field>
    <Count>1</Count>
    <Explode>N</Explode>
  </TermSet>
  <OP>AND</OP>
</TranslationStack>
<QueryTranslation>(#2) AND Tyrannosaurus[All Names]</QueryTranslation>
</eSearchResult>
EDirect: combining tools
Piping Edirect
esearch -db taxonomy -query "Tyrannosaurus" | \
  efetch -format xml
Piping Edirect
esearch -db pubmed -query "Tyrannosaurus" | \
  efilter -mindate 2005 | \
  efetch -format docsum | \
  xtract -pattern DocumentSummary \
    -element MedlineCitation/PMID \
    -element Id SortFirstAuthor
Elink
Elink
Returns UIDs linked to an input set of UIDs in either the same or a different Entrez database
Returns UIDs linked to other UIDs in the same Entrez database that match an Entrez query
Checks for the existence of Entrez links for a set of UIDs within the same database
Lists the available links for a UID
Lists LinkOut URLs and attributes for a set of UIDs
Lists hyperlinks to primary LinkOut providers for a set of UIDs
Creates hyperlinks to the primary LinkOut provider for a single UID
Elink
Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi
ELink: Searching the pubmed records associated with sequence gi:507866428
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nucleotide&db=pubmed&id=507866428&cmd=neighbor_score
<eLinkResult>
<LinkSet>
  <DbFrom>nuccore</DbFrom>
  <IdList>
    <Id>507866428</Id>
  </IdList>
  <LinkSetDb>
    <DbTo>pubmed</DbTo>
    <LinkName>nuccore_pubmed</LinkName>
    <Link>
      <Id>23766330</Id>
      <Score>0</Score>
    </Link>
  </LinkSetDb>
</LinkSet>
</eLinkResult>
$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=23766330&rettype=medline&retmode=text"
PMID- 23766330
TI  - Evolution of mammalian diving capacity traced by myoglobin net surface
      charge.
PG  - 1234192
LID - 10.1126/science.1234192 [doi]
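To go from the ELink answer to the follow-up EFetch automatically, the linked ids have to be read from the LinkSetDb block. A minimal sketch, using a copy of the answer above and the invented helper name `linked_ids`:

```python
import xml.etree.ElementTree as ET

# Copy of the ELink answer shown above.
SAMPLE = """<eLinkResult><LinkSet>
<DbFrom>nuccore</DbFrom>
<IdList><Id>507866428</Id></IdList>
<LinkSetDb>
  <DbTo>pubmed</DbTo>
  <LinkName>nuccore_pubmed</LinkName>
  <Link><Id>23766330</Id><Score>0</Score></Link>
</LinkSetDb>
</LinkSet></eLinkResult>"""


def linked_ids(xml_text, db_to):
    """Return [(uid, score), ...] for every link pointing to database db_to."""
    root = ET.fromstring(xml_text)
    links = []
    for link_set_db in root.iter("LinkSetDb"):
        if link_set_db.findtext("DbTo") != db_to:
            continue
        for link in link_set_db.findall("Link"):
            # <Score> is only present with cmd=neighbor_score; default to 0.
            score = int(link.findtext("Score") or 0)
            links.append((link.findtext("Id"), score))
    return links


pubmed_links = linked_ids(SAMPLE, "pubmed")
print(pubmed_links)
```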
Transformations
Efetch: Transforming to SVG
Using the stylesheet https://github.com/lindenb/xslt-sandbox/blob/master/stylesheets/bio/ncbi/gb2svg.xsl
xsltproc <(curl "https://raw.github.com/lindenb/xslt-sandbox/master/stylesheets/bio/ncbi/gb2svg.xsl") \
  "https://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=14971102&retmode=xml&rettype=gbc"
Efetch: Transforming to SVG
<?xml version="1.0" encoding="UTF-8"?>
<svg:svg xmlns:svg="http://www.w3.org/2000/svg" height="121" width="920" style="stroke-width:1px;">
<svg:title>Human rotavirus segment 7 NSP3 gene, complete cds</svg:title>
<svg:defs>
<svg:linearGradient x1="0%" y1="0%" x2="0%" y2="100%" id="grad">
<svg:stop offset="5%" stop-color="black"/>
<svg:stop offset="50%" stop-color="whitesmoke"/>
<svg:stop offset="95%" stop-color="black"/>
</svg:linearGradient>
<svg:linearGradient x1="0%" y1="0%" x2="0%" y2="100%" id="vertical_body_gradient">
<svg:stop offset="5%" stop-color="white"/>
<svg:stop offset="95%" stop-color="lightgray"/>
</svg:linearGradient>
</svg:defs>
<svg:style type="text/css"/>
<svg:g>
<svg:g transform="translate(0,0)">
<svg:rect x="0" y="0" width="920" height="120" fill="url(#vertical_body_gradient)" stroke="black"/>
<svg:text style="color:red;font-size:35px;" x="10" y="35">Human rotavirus segment 7 NSP3 gene, complete cds</svg:text>
<svg:g>
<svg:rect x="10" y="40" width="900" height="18" style="fill:url(#grad);stroke:black;" title="1..1074"/>
<svg:text y="54" x="460" text-anchor="middle"><svg:tspan style="font-weight:bold;">source</svg:tspan><svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">organism</svg:tspan>:Human rotavirus A <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">mol_type</svg:tspan>:genomic RNA <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">strain</svg:tspan>:M <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">segment</svg:tspan>:7 <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">clone</svg:tspan>:M0</svg:text>
</svg:g>
<svg:g>
<svg:rect x="10" y="60" width="27.6794035414725" height="18" style="fill:url(#grad);stroke:black;" title="1..34"/>
<svg:text y="74" x="39.6794035414725" text-anchor="start">
<svg:tspan style="font-weight:bold;">5'UTR</svg:tspan>
</svg:text>
</svg:g>
<svg:g>
<svg:rect x="38.5181733457595" y="80" width="781.733457595526" height="18" style="fill:url(#grad);stroke:black;" title="35..967"/>
<svg:text y="94" x="429.384902143523" text-anchor="middle"><svg:tspan style="font-weight:bold;">CDS</svg:tspan><svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">codon_start</svg:tspan>:1 <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">product</svg:tspan>:NSP3 <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">protein_id</svg:tspan>:AAK74116.1</svg:text>
</svg:g>
<svg:g>
<svg:rect x="821.090400745573" y="100" width="88.909599254427" height="18" style="fill:url(#grad);stroke:black;" title="968..1074"/>
<svg:text y="114" x="819.090400745573" text-anchor="end">
<svg:tspan style="font-weight:bold;">3'UTR</svg:tspan>
</svg:text>
</svg:g>
<svg:rect x="0" y="0" width="920" height="120" fill="none" stroke="black"/>
</svg:g>
</svg:g>
</svg:svg>
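The rect coordinates in this output are consistent with a simple linear mapping of sequence positions onto a 900-pixel track starting at x=10. The sketch below reproduces the numbers; the formula is inferred from the output alone (it has not been checked against gb2svg.xsl), and `feature_rect` with its default margins is a hypothetical helper.

```python
def feature_rect(start, end, seq_length, left=10.0, track_width=900.0):
    """Map a 1-based feature interval to (x, width) in pixels.

    Assumption inferred from the SVG above: position 1 sits at `left`
    and position `seq_length` at `left + track_width`.
    """
    scale = track_width / (seq_length - 1)
    x = left + (start - 1) * scale
    width = (end - start) * scale
    return x, width


# Features of the 1074 bp record drawn above.
for name, start, end in [("source", 1, 1074), ("5'UTR", 1, 34),
                         ("CDS", 35, 967), ("3'UTR", 968, 1074)]:
    x, width = feature_rect(start, end, 1074)
    print(name, round(x, 4), round(width, 4))
```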
Efetch: Transforming to SVG
Efetch: Transforming to R
$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Tyrannosaurus&usehistory=true" | xmllint --format -
$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.215_9001_1375957034_1619786167&query_key=1&retmode=xml"
Efetch: Transforming to R
<?xml version='1.0' encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
<xsl:output method="text"/>


<xsl:template match="/">
date2count &lt;- list()
<xsl:apply-templates select="/PubmedArticleSet/PubmedArticle[MedlineCitation/DateCreated/Year]"/>
df &lt;- data.frame(
  Year=as.integer(names(date2count)),
  Count=unlist(date2count)
  )
png('jeterpubmed.png')
plot(df)
title('pubmed: count(articles)=f(year)')
dev.off()
</xsl:template>

<xsl:template match="PubmedArticle">
<xsl:variable name="year" select="MedlineCitation/DateCreated/Year"/>
date2count[["<xsl:value-of select="$year"/>"]] &lt;- ifelse(is.null(date2count[["<xsl:value-of select="$year"/>"]]),1,1+date2count[["<xsl:value-of select="$year"/>"]])
</xsl:template>

</xsl:stylesheet>
Efetch: Transforming to R
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.21_59001_1375957034_1619786167&query_key=1&retmode=xml" |\
  xsltproc pubmed2rstats.xsl -
date2count <- list()
date2count[["2013"]] <- ifelse(is.null(date2count[["2013"]]),1,1+date2count[["2013"]])
date2count[["2012"]] <- ifelse(is.null(date2count[["2012"]]),1,1+date2count[["2012"]])
date2count[["2012"]] <- ifelse(is.null(date2count[["2012"]]),1,1+date2count[["2012"]])
date2count[["2011"]] <- ifelse(is.null(date2count[["2011"]]),1,1+date2count[["2011"]])
date2count[["2011"]] <- ifelse(is.null(date2count[["2011"]]),1,1+date2count[["2011"]])
(..)
df <- data.frame(
  Year=as.integer(names(date2count)),
  Count=unlist(date2count)
  )
png('jeterpubmed.png')
plot(df)
title('pubmed: count(articles)=f(year)')
dev.off()
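The same per-year tally can be sketched directly in Python, without generating R code. This is only an illustration: it assumes the efetch XML keeps the PubmedArticleSet/PubmedArticle/MedlineCitation/DateCreated/Year layout that the stylesheet relies on, and the function name `count_by_year` and the inline sample document are hypothetical, not part of the course material.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_by_year(xml_text):
    """Count PubmedArticle records per MedlineCitation/DateCreated/Year."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for article in root.iter("PubmedArticle"):
        year = article.findtext("MedlineCitation/DateCreated/Year")
        if year is not None:  # skip records without a creation year
            counts[int(year)] += 1
    return dict(sorted(counts.items()))

# Hand-written stand-in for a real efetch response (assumption, not NCBI data).
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><DateCreated><Year>2013</Year></DateCreated></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><DateCreated><Year>2012</Year></DateCreated></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><DateCreated><Year>2012</Year></DateCreated></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation/></PubmedArticle>
</PubmedArticleSet>"""

if __name__ == "__main__":
    for year, count in count_by_year(SAMPLE).items():
        print(year, count)
```

The resulting dictionary plays the role of the `date2count` list in the generated R script.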
Efetch: Transforming to R
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.21_59001_1375957034_1619786167&query_key=1&retmode=xml" |\
  xsltproc pubmed2rstats.xsl - |\
  R --no-save
Generating a JAVA parser
Using the XML schema: XML Schema for dbSNP
ftp://ftp.ncbi.nlm.nih.gov/snp/specs/docsum_3.4.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum" targetNamespace="http://www.ncbi.nlm.nih.gov/SNP/docsum" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xsd:element name="ExchangeSet">
    <xsd:annotation>
      <xsd:documentation>Set of dbSNP refSNP docsums, version 3.4</xsd:documentation>
    </xsd:annotation>
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="SourceDatabase" minOccurs="0">
          <xsd:complexType>
            <xsd:attribute name="taxId" type="xsd:int" use="required">
              <xsd:annotation>
                <xsd:documentation>NCBI taxonomy ID for variation</xsd:documentation>
              </xsd:annotation>
            </xsd:attribute>
            <xsd:attribute name="organism" type="xsd:string" use="required">
              <xsd:annotation>
                <xsd:documentation>common name for species used as part of database name.</xsd:documentation>
              </xsd:annotation>
            </xsd:attribute>
            <xsd:attribute name="dbSnpOrgAbbr" type="xsd:string">
              <xsd:annotation>
                <xsd:documentation>organism abbreviation used in dbSNP.</xsd:documentation>
              </xsd:annotation>
            </xsd:attribute>
            <xsd:attribute name="gpipeOrgAbbr" type="xsd:string">
              <xsd:annotation>
                <xsd:documentation>organism abbreviation used within NCBI genome pipeline data dumps.</xsd:documentation>
(...)
Using the XML schema: Compiling the XML Schema for dbSNP with XJC
$ xjc -d . "ftp://ftp.ncbi.nlm.nih.gov/snp/specs/docsum_3.4.xsd"
parsing a schema...
compiling a schema...
https/www_ncbi_nlm_nih_gov/snp/docsum/Assay.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Assembly.java
https/www_ncbi_nlm_nih_gov/snp/docsum/BaseURL.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Component.java
https/www_ncbi_nlm_nih_gov/snp/docsum/ExchangeSet.java
https/www_ncbi_nlm_nih_gov/snp/docsum/FxnSet.java
https/www_ncbi_nlm_nih_gov/snp/docsum/MapLoc.java
https/www_ncbi_nlm_nih_gov/snp/docsum/ObjectFactory.java
https/www_ncbi_nlm_nih_gov/snp/docsum/PrimarySequence.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Rs.java
https/www_ncbi_nlm_nih_gov/snp/docsum/RsLinkout.java
https/www_ncbi_nlm_nih_gov/snp/docsum/RsStruct.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Ss.java
https/www_ncbi_nlm_nih_gov/snp/docsum/package-info.java
Using the XML schema: Compiling the XML Schema for dbSNP with XJC
Search the non-genomic rs# in dbSNP.
import https.www_ncbi_nlm_nih_gov.snp.docsum.*;
import javax.xml.bind.*;
import javax.xml.stream.*;
import javax.xml.stream.events.*;
class ParseDbSnp
{
    public static void main(String[] args) throws Exception
    {
        // JAXB context for the package generated by xjc
        JAXBContext jaxbCtxt = JAXBContext.newInstance("https.www_ncbi_nlm_nih_gov.snp.docsum");
        Unmarshaller unmarshaller = jaxbCtxt.createUnmarshaller();
        // stream the XML from stdin instead of loading the whole file
        XMLInputFactory ifactory = XMLInputFactory.newInstance();
        XMLEventReader r = ifactory.createXMLEventReader(System.in);
        while (r.hasNext())
        {
            XMLEvent evt = r.peek();
            // skip every event that is not the start of an <Rs> element
            if (!(evt.isStartElement() && evt.asStartElement().getName().getLocalPart().equals("Rs")))
            {
                evt = r.nextEvent();
                continue;
            }
            // unmarshal one <Rs> record; this consumes its events from the reader
            Rs rs = unmarshaller.unmarshal(r, Rs.class).getValue();
            if ("genomic".equals(rs.getMolType())) continue;
            System.out.println("rs" + rs.getRsId() + " " + rs.getMolType());
        }
        r.close();
    }
}
Using the XML schema: Compiling the XML Schema for dbSNP with XJC
compile...
$ javac ParseDbSnp.java https/www_ncbi_nlm_nih_gov/snp/docsum/*.java
and run...
$ curl -s "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML/ds_ch1.xml.gz" |\
  gunzip -c |\
  java ParseDbSnp
rs701 cDNA
rs860 cDNA
rs861 cDNA
rs862 cDNA
rs863 cDNA
rs864 cDNA
rs865 cDNA
rs866 cDNA
rs877 cDNA
rs878 cDNA
rs879 cDNA
rs880 cDNA
rs882 cDNA
rs883 cDNA
rs884 cDNA
rs885 cDNA
rs886 cDNA
rs913 cDNA
rs945 cDNA
rs946 cDNA
(...)
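For comparison, the same streaming filter can be sketched in Python without generating classes. This is a rough equivalent, not the course's method: it assumes each Rs element carries `rsId` and `molType` attributes (inferred from the `getRsId()` and `getMolType()` getters above), and the function name and the hand-written sample document are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def non_genomic_rs(stream):
    """Yield (rsId, molType) for each Rs element whose molType is not 'genomic'."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        local_name = elem.tag.rsplit("}", 1)[-1]  # drop any '{namespace}' prefix
        if local_name != "Rs":
            continue
        mol_type = elem.get("molType")
        if mol_type != "genomic":
            yield elem.get("rsId"), mol_type
        elem.clear()  # drop the children of this record once it has been handled

# Hand-written sample, not real dbSNP data.
SAMPLE = b"""<ExchangeSet xmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum">
  <Rs rsId="701" molType="cDNA"></Rs>
  <Rs rsId="702" molType="genomic"></Rs>
  <Rs rsId="860" molType="cDNA"></Rs>
</ExchangeSet>"""

if __name__ == "__main__":
    for rs_id, mol_type in non_genomic_rs(io.BytesIO(SAMPLE)):
        print("rs" + rs_id + " " + mol_type)
```

To read the real file, `sys.stdin.buffer` could be passed instead of the `io.BytesIO` sample, at the end of the same `curl | gunzip -c` pipeline.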
NCBI EBot
NCBI EBot: URL
https://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi
NCBI EBot: Sample output
#!/usr/bin/perl
(...)
# PUBLIC DOMAIN NOTICE
# National Center for Biotechnology Information
use LWP::Simple;
use LWP::UserAgent;
use Net::FTP;

my $delay = 0;
my $maxdelay = 3;
my $base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/";

$params{email} = "nobody@nowhere.com";
$params{db} = "nuccore";
$params{tool} = "ebot";
$params{term} = "Mammuthus+primigenius[ORGN]";
%params = esearch(%params);

$params{retmode} = "xml";
$params{outfile} = "result.xml";
$params{rettype} = "native";
efetch_batch(%params);
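The two-step pattern in the generated script (an ESearch that stores its hits on the history server, then EFetch in batches) can be sketched as plain URL construction. This is not the EBot code itself: the function names and the batch size of 500 are illustrative assumptions, and the `WebEnv`, `query_key` and hit count would normally be read from the ESearch reply rather than hard-coded.

```python
from urllib.parse import urlencode

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def esearch_url(db, term, email, tool="ebot"):
    """URL for an ESearch call that keeps its result set on the history server."""
    params = {"db": db, "term": term, "usehistory": "y",
              "email": email, "tool": tool}
    return BASE + "esearch.fcgi?" + urlencode(params)

def efetch_batch_urls(db, webenv, query_key, count,
                      batch_size=500, rettype="native", retmode="xml"):
    """One EFetch URL per batch of `batch_size` records out of `count` hits."""
    urls = []
    for retstart in range(0, count, batch_size):
        params = {"db": db, "WebEnv": webenv, "query_key": query_key,
                  "retstart": retstart, "retmax": batch_size,
                  "rettype": rettype, "retmode": retmode}
        urls.append(BASE + "efetch.fcgi?" + urlencode(params))
    return urls

if __name__ == "__main__":
    print(esearch_url("nuccore", "Mammuthus primigenius[ORGN]", "nobody@nowhere.com"))
    # Placeholder history values; a real run takes them from the ESearch XML.
    for url in efetch_batch_urls("nuccore", "WEBENV_PLACEHOLDER", 1, 1200):
        print(url)
```

Each URL can then be fetched with curl, one batch at a time, with a pause between requests.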
BLAST
Standalone Blast: Downloading
Standalone tools are available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
# add BLAST to your path
export PATH=${PATH}:/path/to/ncbi-blast-2.2.28+/bin
Standalone Blast: Download a sample
Apis mellifera proteins
curl -o protein.fa.gz \
  "ftp://ftp.ncbi.nih.gov/genomes/Apis_mellifera/protein/protein.fa.gz"
gunzip protein.fa.gz
Standalone Blast: Create a Blast database with makeblastdb
Getting help...
$ makeblastdb -help
(...)
 -dbtype <String, `nucl', `prot'>
   Molecule type of target db
 -in <File_In>
   Input file/database name
   Default = `-'
 -input_type <String, `asn1_bin', `asn1_txt', `blastdb', `fasta'>
   Type of the data specified in input_file
   Default = `fasta'
(..)
Standalone Blast: Create a Blast database with makeblastdb
Create the BLAST database:
$ makeblastdb -in protein.fa -dbtype prot

Building a new DB, current time: 09/02/2013 18:29:38
New DB name: protein.fa
New DB title: protein.fa
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 10570 sequences in 1.84458 seconds.
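One way to cross-check the "added ... sequences" figure is to count the header lines of the FASTA file. The helper below is a hypothetical sketch, not part of BLAST: it simply counts lines starting with `>`, and the inline sample stands in for protein.fa.

```python
import io

def count_fasta_records(handle):
    """Count FASTA records, i.e. lines that start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))

# Hand-written two-record sample standing in for protein.fa.
SAMPLE = """>seq1 hypothetical protein
MKT
AAL
>seq2 another protein
MPE
"""

if __name__ == "__main__":
    print(count_fasta_records(io.StringIO(SAMPLE)))
```

On the real file, the handle would be `open("protein.fa")`, and the count should match the number reported by makeblastdb.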
Standalone Blast: Query a Blast database with blastp
Get help:
$ blastp -help
(...)
 -query <File_In>
   Input file name
   Default = `-'
 -db <String>
   BLAST database name
(...)
Standalone Blast: Blast human EIF4G1 gi:187956781
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id=187956781" |\
  blastp -db protein.fa
Query= gi|187956781|gb|AAI40897.1| EIF4G1 protein [Homo sapiens]
(...)
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

gi|328782175|ref|XP_394628.4| PREDICTED: eukaryotic translation...    189    4e-49
gi|328779480|ref|XP_003249661.1| PREDICTED: hypothetical protei...   38.1    0.017
gi|110762568|ref|XP_001121713.1| PREDICTED: hypothetical protei...   38.1    0.018
(...)
> gi|328782175|ref|XP_394628.4| PREDICTED: eukaryotic translation
initiation factor 4 gamma 2-like [Apis mellifera]
Length=899

 Score = 189 bits (479), Expect = 4e-49, Method: Compositional matrix adjust.
 Identities = 115/319 (36%), Positives = 175/319 (55%), Gaps = 39/319 (12%)

Query  717  KEPRKIIATVLMTEDIKLNKAEKAWKPSS--KRTAADKDRGEEDADGSKTQDLFRRVRSI  774
            ++P +    +++ +DI+    E+ W P S  +R A   +        S+   +FR+VR I
Sbjct  22   RKPSETTVGLVIKDDIRSLSTEQRWIPPSTLRRDALTPE--------SRNNFIFRKVRGI  73

Query  775  LNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCL-----  829
            LNKLTP+ F +L   +  + ++++  LKGVI LIFEKA+ EP +S  YA +C+ L
Sbjct  74   LNKLTPEKFAKLSNDLLNVELNSDVILKGVIFLIFEKALDEPKYSSMYAQLCKRLSDEAA  133

Query  830  -MALKVPTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLK  888
                K    E       F  LLL++C+ EFE      E FE +    DE    EE
Sbjct  134  NFEPKKALIESQKGQSTFTFLLLSKCRDEFENRSKASEAFENQ----DELGPEEE-----  184
Standalone Blast: Blast human EIF4G1 gi:187956781, output XML
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id=187956781" |\
  blastp -db protein.fa -outfmt 5
(...)
<Hit_hsps>
  <Hsp>
    <Hsp_num>1</Hsp_num>
    <Hsp_bit-score>189.119</Hsp_bit-score>
    <Hsp_score>479</Hsp_score>
    <Hsp_evalue>3.78314e-49</Hsp_evalue>
    <Hsp_query-from>717</Hsp_query-from>
    <Hsp_query-to>1017</Hsp_query-to>
    <Hsp_hit-from>22</Hsp_hit-from>
    <Hsp_hit-to>319</Hsp_hit-to>
    <Hsp_query-frame>0</Hsp_query-frame>
    <Hsp_hit-frame>0</Hsp_hit-frame>
    <Hsp_identity>115</Hsp_identity>
    <Hsp_positive>175</Hsp_positive>
    <Hsp_gaps>39</Hsp_gaps>
    <Hsp_align-len>319</Hsp_align-len>
    <Hsp_qseq>KEPRKIIATVLMTEDIKLNKAEKAWKPSS--KRTAADKDRGEEDADGSKTQDLFRRVRSILNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCL------MALKVPTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLKEELEEARDIARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLL--------KNHDEESLECLCRLLTTIGKDLDFEKAKPRMDQYFNQMEKIIKEKKTSSRIRFMLQDVLDLRGSNWVPRRG--DQGPKTIDQIHKEAE</Hsp_qseq>
    <Hsp_hseq>RKPSETTVGLVIKDDIRSLSTEQRWIPPSTLRRDALTPE--------SRNNFIFRKVRGILNKLTPEKFAKLSNDLLNVELNSDVILKGVIFLIFEKALDEPKYSSMYAQLCKRLSDEAANFEPKKALIESQKGQSTFTFLLLSKCRDEFENRSKASEAFENQ----DELGPEEE---------ERRQVAKRKMLGNIKFIGELGKLGIVSETILHRCILQLLEKKRRRRSRGDTAEDIECLCQIMRTCGRILDSDKGRGLMDQYFKRMNSLAESRDLPLRIKFMLRDVIELRRDGWVPRKATSTEGPMPINQIRNDNE</Hsp_hseq>
    <Hsp_midline>++P +    +++ +DI+    E+ W P S  +R A   +        S+   +FR+VR ILNKLTP+ F +L   +  + ++++  LKGVI LIFEKA+ EP +S  YA +C+ L         K    E       F  LLL++C+ EFE      E FE +    DE    EE         E R +A+R+ LGNIKFIGEL KL +++E I+H C+++LL        +    E +ECLC+++ T G+ LD +K +  MDQYF +M  + + +    RI+FML+DV++LR   WVPR+    +GP  I+QI  + E</Hsp_midline>
  </Hsp>
(...)
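The XML report is easier to post-process than the text alignment. The sketch below is a minimal illustration, not an official parser: it pulls a few fields out of each Hsp element and recomputes the identity percentage, with the function name `hsp_summary` and the trimmed sample (values copied from the slide above) chosen for this example only.

```python
import xml.etree.ElementTree as ET

def hsp_summary(xml_text):
    """Return one dict per Hsp element with score, e-value and identity percentage."""
    root = ET.fromstring(xml_text)
    rows = []
    for hsp in root.iter("Hsp"):
        identity = int(hsp.findtext("Hsp_identity"))
        align_len = int(hsp.findtext("Hsp_align-len"))
        rows.append({
            "bit_score": float(hsp.findtext("Hsp_bit-score")),
            "evalue": float(hsp.findtext("Hsp_evalue")),
            "identity": identity,
            "align_len": align_len,
            "pct_identity": round(100.0 * identity / align_len),
        })
    return rows

# Trimmed fragment wrapped in its Hit_hsps parent; sequences omitted.
SAMPLE = """<Hit_hsps>
  <Hsp>
    <Hsp_num>1</Hsp_num>
    <Hsp_bit-score>189.119</Hsp_bit-score>
    <Hsp_score>479</Hsp_score>
    <Hsp_evalue>3.78314e-49</Hsp_evalue>
    <Hsp_identity>115</Hsp_identity>
    <Hsp_positive>175</Hsp_positive>
    <Hsp_gaps>39</Hsp_gaps>
    <Hsp_align-len>319</Hsp_align-len>
  </Hsp>
</Hit_hsps>"""

if __name__ == "__main__":
    for row in hsp_summary(SAMPLE):
        print(row)
```

The recomputed percentage agrees with the `Identities = 115/319 (36%)` line of the text report.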
NCBI URL-API Blast
NCBI URL-API Blast
https://www.ncbi.nlm.nih.gov/blast/Doc/urlapi.html
$ curl "https://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put&QUERY=PAERLMERKADIE&DATABASE=nr&PROGRAM=blastp&FILTER=L&HITLIST_SIZE=500"
(...)
<!--QBlastInfoBegin
    RID = 1NRYGX9K014
    RTOE = 29
QBlastInfoEnd
-->
(...)
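The reply to CMD=Put carries a request ID (RID) and an estimated time in seconds (RTOE) inside an HTML comment. A minimal sketch of extracting both and building the follow-up CMD=Get URL is shown here; the function names are hypothetical, and the sample page is the fragment from the slide above rather than a live response.

```python
import re
from urllib.parse import urlencode

BLAST_URL = "https://www.ncbi.nlm.nih.gov/blast/Blast.cgi"

def parse_qblast_info(page):
    """Extract (RID, RTOE) from the QBlastInfo comment of a CMD=Put reply."""
    block = re.search(r"QBlastInfoBegin(.*?)QBlastInfoEnd", page, re.DOTALL)
    if block is None:
        raise ValueError("no QBlastInfo block found")
    # Each line inside the block looks like 'KEY = value'.
    fields = dict(re.findall(r"^[ \t]*(\w+)[ \t]*=[ \t]*(\S+)",
                             block.group(1), re.MULTILINE))
    return fields["RID"], int(fields["RTOE"])

def result_url(rid, format_type="XML"):
    """URL that retrieves the results of a previously submitted search."""
    return BLAST_URL + "?" + urlencode(
        {"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type})

SAMPLE = """<!--QBlastInfoBegin
    RID = 1NRYGX9K014
    RTOE = 29
QBlastInfoEnd
-->"""

if __name__ == "__main__":
    rid, rtoe = parse_qblast_info(SAMPLE)
    print("wait about", rtoe, "seconds, then fetch", result_url(rid))
```

A client would sleep for roughly RTOE seconds before requesting the result URL, and retry while the search is still running.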
The End