

Web Service and Open Data

By Lélia Blin - ProgRes

Thanks to Quentin Bramas

What is a Web Service?

A Web Service is a method of communication between two electronic devices over the Web.

HTTP is the typical protocol used by Web Services to communicate.

[Diagram: a Device sends a Request to a Server over HTTP; the Server returns a Response.]

Application Programming Interface

An Interface used by Programs to interact with an Application

An API exposes a service. Developers write a program which consumes the service.

Examples

Example: Twitter API

Example: Geocoding APIs

• Open Street Map API

• Google Map API

• Adresse Data Gouv

• …


What is the format of the Request?


What is the format of the Response?


REpresentational State Transfer

REST Web API

A REST Web API is a web service that uses the simpler REpresentational State Transfer (REST) style of communication.

A request is just an HTTP method applied to a URL. The response is typically JSON or XML.
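To make "an HTTP method over a URL" concrete, here is a minimal sketch that builds a request without sending it, using only Python's standard library. The helper name `build_request` is an assumption made for illustration; the URL is the address API used elsewhere in these slides.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_request(method, base_url, params=None):
    """Build (but do not send) an HTTP request: a method applied to a URL."""
    url = base_url
    if params:
        # Encode the query parameters into the URL's query string
        url = base_url + "?" + urlencode(params)
    return Request(url, method=method)

req = build_request("GET", "https://api-adresse.data.gouv.fr/search/", {"q": "8 bd du port"})
print(req.get_method())  # the HTTP method
print(req.full_url)      # the URL, with the query string encoded
```

Nothing goes over the network here; the point is only that a request is fully described by its method and its URL (plus an optional body and headers).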

REpresentational State Transfer (REST)

REST was defined by Roy Fielding in 2000 in his PhD dissertation, "Architectural Styles and the Design of Network-based Software Architectures".

REST: Architectural properties

• Simplicity of a uniform interface;

• modifiability of components to meet changing needs (even while the application is running);

• visibility of communication between components by service agents;

• portability of components by moving program code with the data;

• reliability in the resistance to failure at the system level in the presence of failures within components, connectors, or data.

REST: Architectural constraints

• Client–server architecture

• Statelessness

• Cacheability

• Layered system

• Code on demand (optional)

• Uniform interface

REST: Uniform interface

• Resource identification in requests

• Resource manipulation through representations

• Self-descriptive messages

• Hypermedia as the engine of application state

HTTP Request

• GET: read a resource

• POST: create a resource

• PUT: replace (or create) a resource

• DELETE: delete a resource
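As a rough illustration of how these four methods act on resources identified by URLs, here is a toy in-memory "server". Everything in it (the `store` dictionary, the `handle` function, the `/members` URL) is hypothetical and only mimics the usual REST conventions; it is not how any particular framework works.

```python
import itertools

store = {}                 # resource URL -> current representation
_ids = itertools.count(1)  # identifiers handed out by POST

def handle(method, url, body=None):
    """Apply an HTTP-like method to a resource; return (status code, payload)."""
    if method == "GET":
        return (200, store[url]) if url in store else (404, None)
    if method == "POST":
        # Create a new resource inside the collection at `url`
        new_url = f"{url}/{next(_ids)}"
        store[new_url] = body
        return (201, new_url)
    if method == "PUT":
        # Replace the resource, or create it if it does not exist yet
        status = 200 if url in store else 201
        store[url] = body
        return (status, body)
    if method == "DELETE":
        if url in store:
            del store[url]
            return (204, None)
        return (404, None)
    return (405, None)     # method not allowed

status, member_url = handle("POST", "/members", {"name": "Marie"})
print(status, member_url)
print(handle("GET", member_url))
print(handle("PUT", member_url, {"name": "Marie", "city": "Paris"}))
print(handle("DELETE", member_url))
print(handle("GET", member_url))
```

Each call carries everything the handler needs (method, URL, body), which is the statelessness constraint in miniature.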

REST Web API - examples

• https://www.lip6.fr/recherche/team_membres.php?acronyme=NPA

• https://fr.wikipedia.org/wiki/Representational_state_transfer

• https://api-adresse.data.gouv.fr/search/?q=8+bd+du+port

Request Python

import requests

• requests.get("http://linuxfr.org/")

• requests.put("http://linuxfr.org/")

• requests.delete("http://linuxfr.org/")

• requests.patch("http://linuxfr.org/")

• requests.post("http://linuxfr.org/")

• requests.head("http://linuxfr.org/")

• requests.options("http://linuxfr.org/")

Request Python: Example

>>> import requests
>>> r = requests.get("https://api-adresse.data.gouv.fr/search/?q=8+bd+du+port")
>>> print(r.text)

{"licence": "ODbL 1.0", "limit": 5, "features": [{"geometry": {"coordinates": [2.290084, 49.897443], "type": "Point"}, "type": "Feature", "properties": {"context": "80, Somme, Hauts-de-France (Picardie)", "y": 6977867.2, "citycode": "80021", "postcode": "80000", "score": 0.4626765550239234, "street": "Boulevard du Port", "city": "Amiens", "type": "housenumber", "x": 648952.6, "housenumber": "8", "importance": 0.3526, "name": "8 Boulevard du Port", "id": "ADRNIVX_0000000260875032", "label": "8 Boulevard du Port 80000 Amiens"}}, {"geometry": {"coordinates": [2.062794, 49.0317], "type": "Point"}, "type": "Feature", "properties": {"context": "95, Val-d'Oise, \u00cele-de-France", "y": 6881718.8, "citycode": "95127", "postcode": "95000", "score": 0.4494947368421052, "street": "Boulevard du Port", "city": "Cergy", "type": "housenumber", "x": 631466.4, "housenumber": "8", "importance": 0.2076, "name": "8 Boulevard du Port", "id": "ADRNIVX_0000002010754592", "label": "8 Boulevard du Port 95000 Cergy"}}, {"geometry": {"coordinates": [3.605884, 43.425225], "type": "Point"}, "type": "Feature", "properties": {"context": "34, H\u00e9rault, Occitanie (Languedoc-Roussillon)", "y": 6258645.4, "citycode": "34157", "postcode": "34140", "score": 0.44880382775119615, "street": "Boulevard du Port", "city": "M\u00e8ze", "type": "housenumber", "x": 749085.3, "housenumber": "8", "importance": 0.2, "name": "8 Boulevard du Port", "id": "ADRNIVX_0000000284423783", "label": "8 Boulevard du Port 34140 M\u00e8ze"}}, {"geometry": {"coordinates": [-2.34098, 47.258819], "type": "Point"}, "type": "Feature", "properties": {"context": "44, Loire-Atlantique, Pays-de-la-Loire", "y": 6697933.5, "citycode": "44132", "postcode": "44380", "score": 0.43526746411483247, "street": "Boulevard du Port", "city": "Pornichet", "type": "housenumber", "x": 296410.1, "housenumber": "8", "importance": 0.0511, "name": "8 Boulevard du Port", "id": "ADRNIVX_0000000280022748", "label": "8 Boulevard du Port 44380 
Pornichet"}}, {"geometry": {"coordinates": [3.036731, 42.79091], "type": "Point"}, "type": "Feature", "properties": {"context": "66, Pyr\u00e9n\u00e9es-Orientales, Occitanie (Languedoc-Roussillon)", "y": 6187933.1, "citycode": "66017", "postcode": "66420", "score": 0.43476746411483247, "street": "Boulevard du Port", "city": "Le Barcar\u00e8s", "type": "housenumber", "x": 703008.6, "housenumber": "8", "importance": 0.0456, "name": "8 Boulevard du Port", "id": "ADRNIVX_0000000263992135", "label": "8 Boulevard du Port 66420 Le Barcar\u00e8s"}}], "version": "draft", "attribution": "BAN", "type": "FeatureCollection", "query": "8 bd du port"}
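The response above is GeoJSON, so it can be turned into Python objects rather than read as raw text. The sketch below works on a trimmed copy of that response (first feature only, fewer fields) so that it runs offline; the function name `best_match` is an assumption made for illustration.

```python
import json

# Trimmed copy of the response shown above (first feature only, fewer fields).
sample = '''{"type": "FeatureCollection", "query": "8 bd du port", "features": [
  {"type": "Feature",
   "geometry": {"type": "Point", "coordinates": [2.290084, 49.897443]},
   "properties": {"label": "8 Boulevard du Port 80000 Amiens",
                  "score": 0.4626765550239234, "city": "Amiens"}}
]}'''

def best_match(geojson_text):
    """Return (label, latitude, longitude) of the highest-scoring feature."""
    data = json.loads(geojson_text)
    best = max(data["features"], key=lambda f: f["properties"]["score"])
    lon, lat = best["geometry"]["coordinates"]  # GeoJSON order is longitude, latitude
    return best["properties"]["label"], lat, lon

print(best_match(sample))
```

With a live response, `best_match(r.text)` would do the same job, and `r.json()` is the requests shortcut for `json.loads(r.text)`.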

Request Python: Example

>>> import requests
>>> r = requests.get("https://www.lip6.fr/recherche/team_membres.php?acronyme=NPA")
>>> print(r.text)

Response

Content-Type:

• text/plain

• text/html

• text/xml or application/xml

• application/json

• image/png

• ...
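Since the Content-Type tells the client how to interpret the body, a program can choose its parser from that header. The following is a minimal sketch under that assumption; `parse_body` is a hypothetical helper, not part of requests.

```python
import json
import xml.etree.ElementTree as ET

def parse_body(content_type, body):
    """Pick a parser from the Content-Type header value (hypothetical helper)."""
    # Drop parameters such as "; charset=utf-8" and normalise case
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return json.loads(body)
    if media_type in ("text/xml", "application/xml"):
        return ET.fromstring(body)
    # text/plain, text/html and anything else: return the text unchanged
    return body

print(parse_body("application/json; charset=utf-8", '{"city": "Paris"}'))
print(parse_body("text/xml", "<city>Paris</city>").text)
print(parse_body("text/plain", "Paris"))
```

With requests, the header value is available as `r.headers["Content-Type"]`, so a call could look like `parse_body(r.headers["Content-Type"], r.text)`.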

Parsing

Regular Expressions


Regular expressions are a powerful language for matching text patterns. The Python "re" module provides regular expression support.

>>> import re
>>> re.findall("([0-9]+)", "Bonjour 111 Aurevoir 222")
['111', '222']

Regular Expressions

• a, X, 9, < -- ordinary characters just match themselves exactly. The meta-characters which do not match themselves because they have special meanings are: . ^ $ * + ? { [ ] \ | ( )

• . (a period) -- matches any single character except newline '\n'

• \w -- (lowercase w) matches a "word" character: a letter or digit or underbar [a-zA-Z0-9_]. Note that although "word" is the mnemonic for this, it only matches a single word char, not a whole word. \W (upper case W) matches any non-word character.

• \b -- boundary between word and non-word

• \s -- (lowercase s) matches a single whitespace character -- space, newline, return, tab, form [ \n\r\t\f]. \S (upper case S) matches any non-whitespace character.

• \t, \n, \r -- tab, newline, return

• \d -- decimal digit [0-9] (some older regex utilities do not support \d, but they all support \w and \s)

• ^ = start, $ = end -- match the start or end of the string

• \ -- inhibit the "specialness" of a character. So, for example, use \. to match a period or \\ to match a backslash. If you are unsure if a character has special meaning, such as '@', you can put a backslash in front of it, \@, to make sure it is treated just as a character.
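The rules above can be tried out directly with the `re` module. This is a small sketch on a made-up sentence (an extension of the "Bonjour 111 Aurevoir 222" string used earlier); raw strings (`r"..."`) are used so that Python itself does not interpret the backslashes.

```python
import re

text = "Bonjour 111 Aurevoir 222. Fin"

digits = re.findall(r"\d+", text)        # runs of decimal digits
words = re.findall(r"\w+", text)         # runs of word characters
a_words = re.findall(r"\bA\w*", text)    # words starting with 'A' (\b = word boundary)
periods = re.findall(r"\.", text)        # an escaped period matches a literal '.'
pieces = re.split(r"\s+", text)          # split on whitespace
starts = re.search(r"^Bonjour", text) is not None  # ^ anchors at the start
ends = re.search(r"Fin$", text) is not None        # $ anchors at the end

print(digits)
print(words)
print(a_words)
print(periods)
print(pieces)
print(starts, ends)
```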

Regular Expressions

Extract Email Information:

lelia.blin@lip6.fr

([^@]+)@([^@]+)

• [^@] : a character that is not @

• @ : the at symbol

• + : at least one of this character

>>> m = re.match('([^@]+)@([^@]+)', 'lelia.blin@lip6.fr')
>>> m.group(1)
'lelia.blin'
>>> m.group(2)
'lip6.fr'

Regular Expressions

>>> import requests
>>> r = requests.get("https://www.lip6.fr/recherche/team_membres.php?acronyme=NPA")
>>> import re
>>> re.findall('(01([ ][0-9]{2}){4})', r.text)

[('01 44 27 89 90', ' 90'), ('01 44 27 87 89', ' 89'), ('01 44 27 87 98', ' 98'), ('01 44 27 87 72', ' 72'), ('01 44 27 30 58', ' 58'), ('01 44 27 88 57', ' 57'), ('01 44 27 87 88', ' 88'), ('01 44 27 71 06', ' 06'), ('01 44 27 88 59', ' 59'), ('01 44 27 88 80', ' 80'), ('01 44 27 88 81', ' 81'), ('01 44 27 88 42', ' 42'), ('01 44 27 87 64', ' 64'), ('01 44 27 73 83', ' 83'), ('01 44 27 71 27', ' 27'), ('01 44 27 87 62', ' 62'), ('01 44 27 71 26', ' 26'), ('01 44 27 88 38', ' 38'), ('01 44 27 88 39', ' 39'), ('01 44 27 88 39', ' 39'), ('01 44 27 87 75', ' 75'), ('01 44 27 71 28', ' 28'), ('01 44 27 72 77', ' 77'), ('01 44 27 88 39', ' 39'), ('01 44 27 71 28', ' 28'), ('01 44 27 88 39', ' 39'), ('01 44 27 71 28', ' 28'), ('01 44 27 71 28', ' 28'), ('01 44 27 71 28', ' 28'), ('01 44 27 91 21', ' 21'), ('01 44 27 43 75', ' 75'), ('01 44 27 71 03', ' 03'), ('01 44 27 71 16', ' 16'), ('01 44 27 71 34', ' 34'), ('01 44 27 91 21', ' 21'), ('01 44 27 88 39', ' 39'), ('01 44 27 72 77', ' 77'), ('01 44 27 71 28', ' 28'), ('01 44 27 88 42', ' 42'), ('01 44 27 72 77', ' 77'), ('01 44 27 72 77', ' 77'), ('01 44 27 72 77', ' 77'), ('01 44 27 72 77', ' 77'), ('01 44 27 88 42', ' 42'), ('01 44 27 72 77', ' 77'), ('01 44 27 72 77', ' 77'), ('01 44 27 88 42', ' 42'), ('01 44 27 60 06', ' 06'), ('01 44 27 72 77', ' 77’)]
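Each result above is a tuple because the pattern contains two capturing groups: `findall` returns one entry per group, and the inner group only keeps its last repetition. A possible way to get plain strings, sketched here on made-up placeholder numbers rather than the real page, is to make the inner group non-capturing with `(?:...)`.

```python
import re

# Made-up sample text standing in for r.text; the numbers are placeholders.
page = "Office: 01 23 45 67 89, fax: 01 98 76 54 32, office again: 01 23 45 67 89"

# Two capturing groups: one tuple per match, as in the output above
with_groups = re.findall(r"(01([ ][0-9]{2}){4})", page)

# Non-capturing inner group and no outer group: one string per match
phones = re.findall(r"01(?: [0-9]{2}){4}", page)

# Remove duplicates while keeping a stable (sorted) order
unique_phones = sorted(set(phones))

print(with_groups)
print(phones)
print(unique_phones)
```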

BeautifulSoup (HTML parser)

import requests
from bs4 import BeautifulSoup

r = requests.get("https://dblp.uni-trier.de/pers/hy/b/Blin:L=eacute=lia")
soup = BeautifulSoup(r.content, "html.parser")

print(soup.title)

<title>dblp: Lélia Blin</title>

Find the information

• Look at the source of the HTML page

• Examples: Title of the article

>>> r = requests.get("https://dblp.org/pers/hd/b/Bramas:Quentin")
>>> soup = BeautifulSoup(r.content, "html.parser")
>>> print(soup.title)
<title>dblp: Quentin Bramas</title>
>>> spans = soup.find_all('span', attrs={'class': 'title'})
>>> for span in spans:
...     print(span.string)
...
The complexity of data aggregation in static and dynamic wireless sensor networks.
The Random Bit Complexity of Mobile Robots Scattering.
Killing Nodes as a Countermeasure to Virus Expansion.
Energy-Centric Wireless Sensor Networks. (Réseaux de capteurs sans fil efficaces en énergie).
Distributed Online Data Aggregation in Dynamic Graphs.
Benchmarking Energy-Centric Broadcast Protocols in Wireless Sensor Networks.
Brief Announcement: Probabilistic Asynchronous Arbitrary Pattern Formation.
Packet Efficient Implementation of the Omega Failure Detector.
Probabilistic Asynchronous Arbitrary Pattern Formation (Short Paper).
Distributed Online Data Aggregation in Dynamic Graphs.
The Random Bit Complexity of Mobile Robots Scattering.
WiSeBat: accurate energy benchmarking of wireless sensor networks.
Wait-Free Gathering Without Chirality.
The Complexity of Data Aggregation in Static and Dynamic Wireless Sensor Networks.
Packet Efficient Implementation of the Omega Failure Detector.
Asynchronous Pattern Formation without Chirality.
The Random Bit Complexity of Mobile Robots Scattering.

Find information

• Find the co-authors

>>> divs = soup.find_all('div', attrs={'class': 'person'})
>>> for div in divs:
...     print(div.string)
...
None
Xavier Défago
Wilfried Dron
Mariem Ben Fadhl
Dianne Foreback
Patrick Garda
Khalil Hachicha
Toshimitsu Masuzawa
Mikhail Nesterenko
Thanh Dang Nguyen
Sébastien Tixeuil

JSON and XML

JSON Example

{
    "fruits": [
        { "kiwis": 3, "mangues": 4, "pommes": null },
        { "panier": true }
    ],
    "legumes": { "patates": "amandine", "poireaux": false },
    "viandes": ["poisson", "poulet", "boeuf"]
}

Convert JSON to Python Object (Dict)

Use the json package:

import json

json_data = '{"name": "Marie", "city": "Paris"}'
python_obj = json.loads(json_data)
print(python_obj["name"])
print(python_obj["city"])

Result

> python3 01_Json.py
Marie
Paris
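`json.loads` maps each JSON type to a Python type: object to `dict`, array to `list`, string to `str`, number to `int` or `float`, `true`/`false` to `True`/`False`, and `null` to `None`. The sketch below checks this on the fruits example from the earlier slide.

```python
import json

doc = json.loads('''
{ "fruits": [ { "kiwis": 3, "mangues": 4, "pommes": null }, { "panier": true } ],
  "legumes": { "patates": "amandine", "poireaux": false },
  "viandes": ["poisson", "poulet", "boeuf"] }
''')

first_basket = doc["fruits"][0]
total = first_basket["kiwis"] + first_basket["mangues"]  # JSON numbers become ints here

print(type(doc).__name__)             # JSON object -> dict
print(type(doc["viandes"]).__name__)  # JSON array  -> list
print(first_basket["pommes"])         # null  -> None
print(doc["fruits"][1]["panier"])     # true  -> True
print(doc["legumes"]["poireaux"])     # false -> False
print(total)
```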

Convert JSON to Python Object (List)

import json

json_data = '{"persons": [{"name": "Marie", "city": "Paris"}, {"name": "Pierre", "city": "Lyon"} ] }'
python_obj = json.loads(json_data)
print(json.dumps(python_obj, sort_keys=True, indent=4))

Result

> python3 02_Json.py
{
    "persons": [
        {
            "city": "Paris",
            "name": "Marie"
        },
        {
            "city": "Lyon",
            "name": "Pierre"
        }
    ]
}

Convert JSON to Python Object

import json

json_input = '{"persons": [{"name": "Marie", "city": "Paris"}, {"name": "Pierre", "city": "Lyon"} ] }'

try:
    decoded = json.loads(json_input)
    # Access data
    for x in decoded['persons']:
        print(x['name'])
except (ValueError, KeyError, TypeError):
    print("JSON format error")

Result

> python3 03_Json.py
Marie
Pierre

Use JSON file

import json

# Open the file in a with-block so that it is closed automatically
with open('lang.json') as f:
    data = json.load(f)

try:
    # Access data
    for x in data['results']:
        print(x['formatted_address'])
except (ValueError, KeyError, TypeError):
    print("JSON format error")

Result

> python3 04_Json.py
25 Rue Cité Lang, 68560 Hirsingue, France
25 Rue Raphaël Lang, 54500 Vandœuvre-lès-Nancy, France

XML Example

<?xml version="1.0" encoding="UTF-8"?>
<note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
</note>
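As a quick sketch of how such a document can be read from Python, the note can be parsed from a string with `xml.etree.ElementTree` and its children collected into a dictionary. The XML declaration line is left out of the string here to keep the example short; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

note_xml = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(note_xml)

# Each child element has a tag (its name) and a text (its content)
note = {child.tag: child.text for child in root}

print(root.tag)
print(note["to"], note["from"])
print(note["body"])
```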

XML Parsing

With xml.etree.ElementTree

import xml.etree.ElementTree as ET
tree = ET.parse('countryXML.xml')

xml.etree.ElementTree loads the whole file; you can then navigate the tree structure.

File XML

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

$ python3
Python 2.7.10 (default, Feb 7 2017, 00:08:15)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('countryXML.xml')
>>> root = tree.getroot()
>>> root.tag
'data'
>>> root.attrib
{}
>>> for child in root:
...     print(child.tag, child.attrib)
...
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}
>>> root[0][1].text
'2008'
>>> for n in root.iter('neighbor'):
...     print(n.attrib)
...
{'direction': 'E', 'name': 'Austria'}
{'direction': 'W', 'name': 'Switzerland'}
{'direction': 'N', 'name': 'Malaysia'}
{'direction': 'W', 'name': 'Costa Rica'}
{'direction': 'E', 'name': 'Colombia'}

XML example

File 05_XML.py

import xml.etree.ElementTree as ET

tree = ET.parse('countryXML.xml')
root = tree.getroot()

# Or, shorter, from a string: root = ET.fromstring(country_data_as_string)

print("---------------country")
for child in root:
    print(child.tag, child.attrib)

print("---------------Rank:")
for rank in root.iter('rank'):
    print(rank.text)

print("---------------neighbors")
for neighbor in root.iter('neighbor'):
    print(neighbor.attrib)

print("---------------neighbors name")
for neighbor in root.iter('neighbor'):
    print(neighbor.get('name'))

print("---------------country and neighbors")
for child in root:
    print("the neighbors of", child.get('name'), ":")
    # Iterate over this country's own neighbors (child, not root)
    for neighbor in child.iter('neighbor'):
        print(neighbor.get('name'))
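To group neighbors by country, the inner lookup has to start from each country element rather than from the root. Here is a self-contained sketch of that idea, parsing a shortened copy of the country data from a string (two countries, fewer fields) and using `findall`, which only looks at direct children.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the country file shown earlier
country_data_as_string = """<data>
  <country name="Liechtenstein">
    <rank>1</rank>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore">
    <rank>4</rank>
    <neighbor name="Malaysia" direction="N"/>
  </country>
</data>"""

root = ET.fromstring(country_data_as_string)

# country name -> names of that country's own neighbors
neighbors = {
    country.get("name"): [n.get("name") for n in country.findall("neighbor")]
    for country in root.findall("country")
}

# country name -> rank, converted from text to int
ranks = {c.get("name"): int(c.find("rank").text) for c in root.findall("country")}

print(neighbors)
print(ranks)
```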