python xml web - rostlaba glimpse of namespaces allows to prevent tag name collisions between...

25
BioinfRes SoSe 17 Bioinforma)cs Resources XML / Web Access Lecture & Exercises Prof. B. Rost, Dr. L. Richter, J. Reeb Ins)tut für Informa)k I12

Upload: others

Post on 27-Jun-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

Bioinforma)csResourcesXML/WebAccess

Lecture&ExercisesProf.B.Rost,Dr.L.Richter,J.Reeb

Ins)tutfürInforma)kI12

Page 2: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

XMLInfusion(in10sec)●  compila)onfromhKp://www.w3schools.com/xml/default.asp

●  XMLisasoPware-andhardware-independenttooltostoreandtotransportdata

●  XMLstandsforeXtensibleMarkupLanguage

●  designedtostoreandtransportdata●  designedtobeself-descrip)ve

●  W3Crecommenda)on

●  itdoesNOTDOanything

Page 3: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

AboutTags

●  XMLtagsarenotpredefinedlikeHTMLtags●  everybodycan/hastoinventhisowntags

●  newtagscanbeaddedany)me

●  theauthorhastodefinecontentandstructureofthedocument

●  everythingisplaintext

Page 4: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

DocumentStructure<?xml version="1.0" encoding="UTF-8"?>!<bookstore>!!  <book category="cooking">!    <title lang="en">Everyday Italian</title>!    <author>Giada De Laurentiis</author>!    <year>2005</year>!    <price>30.00</price>!  </book>!! <book category="children">!    <title lang="en">Harry Potter</title>!    <author>J K. Rowling</author>!    <year>2005</year>!    <price>29.99</price>!  </book>!!....!</bookstore>!!takenfromhKp://www.w3schools.com/xml/xml_usedfor.asp

Page 5: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

SyntaxRules●  elementsaredefinedusingtags:<tagName> ... </tagName>or<tagName/>!

●  elementscanbenested(containotherelements-parentandchildnodes,siblingnodes)

●  elementscanhavetextcontent

●  eachdocumentmustcontainONErootelementthatistheparentofallotherelements

Page 6: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

SyntaxRefined

●  prologline<?xml ...>isop)onal●  tagsmustbe(self-)closed

●  tagarecasesensi)ve

●  tagsmustbeproperlynested:<a><b>....</a></b> Wrong!<a><b>....</b></a>! Right!

Page 7: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

SyntaxRefined●  tagsmayhaveaKributes●  aKributevaluesmustalwaysbequoted

●  somespecialcharacterscannotbeuseddirectly

●  ->codedbyen)tyreferences:&lt; < lessthan&gt; > greaterthan&amp; & ampersand&apos; ‘ apostrophe&quot; “ quota)onmark

●  comments:<!-- .... -->!

Page 8: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

TagNames●  casesensi)ve●  muststartwithaleKerorunderscore

●  mustnotstartwiththeleKersxmlinanycase

●  cancontain:leKers,digits,hyphens,underscoresandperiods

●  cannotcontainspaces

●  applycommonsenseandaconsistentstyle●  avoid:minus(-),period(.),colon(:),non-englishcharactersforcompa)bilityreasons

Page 9: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

XMLElement

●  everythingbetweenthestartandtheendtag●  tagsareincluded

●  cancontain:-  text-  aKributes-  otherelements-  amixofall

●  areextensible

Page 10: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

XMLAKributes

●  valuesmustbequoted:singleordoublequotes●  theunusedquo)ngcharactercanbeusedinsidethevalue

●  decisionforaKributeorelementundecided,but:-  aKributescannotcontainmul)plevalues-  aKributescannotcontaintreestructures-  aKributesarenoteasilyexpandable

●  usefultostoremetadata,likeelementid,etc.

Page 11: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

AGlimpseofNamespaces

●  allowstopreventtagnamecollisionsbetweendifferentauthors/applica)ons/domains

●  implementedbytheintroduc)onofprefixes●  definedasanaKribute:xmlns:prefix=“URI”!

●  usage:<prefix:tagName>!●  theURIisonlyneededtobeunique

●  usedtointegrateotherspecifica)ons,e.g.XSLT

Page 12: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

LevelsofCorrectness●  wellformed:adocumentobeythesyntaxrules:-  rootelement-  closingtag-  casesensi)ve-  properlynested-  aKributevaluesquoted

●  validdocuments:inaddi)ontobeingvalidthealsoconformtoadocumenttypedefini)on(formatspecifica)on)

Page 13: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

DocumentTypeDefini)ons

●  twowaystospecifyadocumentstructure:●  DTD:DocumentTypDefini)on

●  XMLSchema:XMLbasedalterna)vetoDTD

Page 14: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

Example

<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE note SYSTEM "Note.dtd”> <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend! &copyright; </body> </note>!

Page 15: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

Example

<!DOCTYPE note [ <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)> <!ENTITY copyright “Copyright by ..”> ]>!

Page 16: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

XMLDTD

●  referencedfromadocumentwith:<!DOCTYPE note SYSTEM "Note.dtd">!

●  !DOCTYPEdefinestherootelement●  !ELEMENTdefinesthestructureoftheelements

●  #PCDTAmeansparse-abletextdata●  !ENTITYdefinesspecialcharactersorstrings

Page 17: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

XMLSchema●  alterna)vetoDTD<xs:element name="note”> <xs:complexType> <xs:sequence> <xs:element name="to" type="xs:string"/> <xs:element name="from" type="xs:string"/> <xs:element name="heading" type="xs:string"/> <xs:element name="body" type="xs:string"/> </xs:sequence> </xs:complexType> </xs:element>!

●  supportofdatatypesandnamespaces

●  wriKeninXMLandextensible!

Page 18: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

XMLwithPython●  basic:xml.etree.ElementTreeimport xml.etree.ElementTree as ET!

●  ifyouparseXMLfromuntrustedsourcesusesafeguardedlibrarieslikedefusedxml

●  readxmlinputfromfileintoatreeelement:parse(“fileName”)!

●  readxmlinputfromstringintoatreeelement:fromString(data_as_string)!

●  retrieverootelementfromtreeelement:getroot()

Page 19: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

XMLwithPython●  elementtype:tag(string)●  elementaKributes:attrib(dic)onary).Preferaccessviae.get(key, default=None), e.items(), e.keys(), e.set(key, value)!

●  element(text)content:text(string)

●  childelements:simplyiterate

●  ifyouknowthestructureyoucanaccesschildrenviaindexnota)on:root[0][1]!

●  selectchildnodesofacertaintype:root.iter(‘typeName’)!

!

Page 20: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

XMLwithPython

●  inser)onofnewelements:-  createnewelementwith:Element(‘typeName’)!-  addcontentviatextandaKributesviaattrib!-  addexis)ngsub-elementswithappend()!-  appendnewelementtotheparentelement

●  simpleoutput:treeElement.write(‘fileName’)!

●  forvalida)onandfullXPathsupportuse:lxml!

Page 21: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

WebAccess

●  URI(UniformResourceIden)fier):isastring●  URLisaspecifictypeofURIincludingtheloca)onofaresource

●  scheme://lo.ca.ti.on/pa/th?qu_er#fragment!

●  scheme:protocolinuse

●  loca)on:thehos)ngserver

●  path:pathtotheresource●  query+fragment:op)onalspecifica)ons

Page 22: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

WebAccess

●  URLmanipula)on:–  urljoin(base_url_string,relative_url_string)

from urllib import parse as urlparse urlparse.urljoin('http://somehost.com/some/path/here','../other/path') # Result is: 'http://somehost.com/some/other/path’!

-  urlsplit(url_string,default_scheme='', allow_fragments=True) urlparse.urlsplit('http://www.python.org:80/faq.cgi?src=fie') # Result is: ('http','www.python.org:80','/faq.cgi','src=fie','') !

Page 23: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

WebAccess●  easytousethirdpartypackagerequests hKp://docs.python-requests.org/en/master/api/#requests.request

●  supportPythonv2andv3

●  threemainclasses:Request,Response,Session

●  Request:modelsaHTTPrequestsendtotheserver

●  Response:modelstheHTTPresponse●  Session:ifyouneedcon)nuity(ignorebynow)

Page 24: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

Requests●  requests.request(method,url,**kwargs)-  mandatory:method(delete,get,head,op)ons,patch,post,put)

-  mandatory:URL-  op)onalparameters:toomanytolisthere

●  conveniencefun)ons:-  requests.get/post/...(URL,[data=None],**kwargs)

●  import requests req = requests.request('GET', 'http://www.example.com/')!

Page 25: python xml web - RostlabA Glimpse of Namespaces allows to prevent tag name collisions between different authors/applicaons/domains implemented by the introducon of prefixes defined

BioinfRes SoSe 17

Response

●  usefulaKributes:status_codetellsusaboutsuccessorfailure

●  r.content:holdstheresponse’scontent●  r.iter_contentandr.iter_linesallowchunkwisereadingoftheresponsedata,especiallyinastreamcontextorwhenhandlinghugeresponses