Web Service and Open Data By Lélia Blins - ProgRes 2018 [email protected] Thanks to Quentin Bramas


What is a Web Service?

A Web Service is a method of communication between two electronic devices over the Web.

HTTP is the protocol typically used by Web Services to communicate.

[Diagram: a device sends an HTTP request to a server, which sends back a response.]

API: Application Programming Interface

An interface used by programs to interact with an application.

[Diagram: an API exposes a service; developers write a program which consumes the service.]

Examples

[Screenshots: the Twitter API; geocoding APIs.]

Geocoding APIs

• Open Street Map API

• Google Map API

• Adresse Data Gouv

• …

What is the format of the response?

The same Google geocoding request can ask for XML or JSON; only one path segment changes:

https://maps.googleapis.com/maps/api/geocode/xml?address=25%20rue%20lang%20france

https://maps.googleapis.com/maps/api/geocode/json?address=25%20rue%20lang%20france
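Building such URLs by hand is error-prone because the address must be percent-encoded. A minimal sketch with the standard library, assuming the base URL from the examples above; `geocode_url` is a hypothetical helper, and current versions of the Google API also require an API `key` parameter, which this sketch leaves out.

```python
from urllib.parse import quote, urlencode

# Base URL taken from the examples above; only the last path segment changes.
GEOCODE_BASE = "https://maps.googleapis.com/maps/api/geocode/"


def geocode_url(address, output_format="json"):
    """Hypothetical helper: build a geocoding URL asking for JSON or XML."""
    if output_format not in ("json", "xml"):
        raise ValueError("output_format must be 'json' or 'xml'")
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, would use +)
    query = urlencode({"address": address}, quote_via=quote)
    return GEOCODE_BASE + output_format + "?" + query


print(geocode_url("25 rue lang france"))
print(geocode_url("25 rue lang france", "xml"))
```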

Web API: example

REST Web API (REpresentational State Transfer)

A REST Web API is a web service using simpler REpresentational State Transfer (REST) based communication.

A request is just an HTTP method applied to a URI. The response is typically JSON or XML.

HTTP request methods:

• GET

• POST

• PUT

• DELETE
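To see that a REST request is nothing more than an HTTP method applied to a URI, the standard library is enough to run a throwaway server and call it with each method. This is a minimal sketch: `EchoHandler`, `call` and the `/calendars/42` path are hypothetical, and the toy server answers every method the same way.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class EchoHandler(BaseHTTPRequestHandler):
    """Toy endpoint: answers with the method and path it received, as JSON."""

    def _reply(self):
        body = json.dumps({"method": self.command, "path": self.path}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    # One handler for every method; a real API would do something different for each.
    do_GET = do_POST = do_PUT = do_DELETE = _reply

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


opener = build_opener(ProxyHandler({}))  # bypass any proxy set in the environment


def call(method, url):
    """Send one request with the given HTTP method and decode the JSON reply."""
    with opener.open(Request(url, method=method)) as response:
        return json.loads(response.read().decode("utf-8"))


server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

results = [call(m, base + "/calendars/42") for m in ("GET", "POST", "PUT", "DELETE")]
for r in results:
    print(r)

server.shutdown()
server.server_close()
```

The URI stays the same in all four calls; only the method tells the server what to do with the resource.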

Request with Python

>>> import requests
>>> r = requests.get("http://linuxfr.org/")
>>> print(r.text)
<!DOCTYPE html>
<html lang="fr">
<head>
<meta charset="utf-8">
<title>Accueil - LinuxFr.org</title>
<style type="text/css">header#branding h1 { background-image: url('/images/logos/linuxfr2_mountain.png') }</style>

The other HTTP methods work the same way:

r = requests.put("http://linuxfr.org/")
r = requests.delete("http://linuxfr.org/")
r = requests.patch("http://linuxfr.org/")
r = requests.post("http://linuxfr.org/")
r = requests.head("http://linuxfr.org/")
r = requests.options("http://linuxfr.org/")

Request with Python: send data

data = {"first_name": "Richard", "second_name": "Stallman"}
r = requests.post("http://linuxfr.org", data=data)
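When `data` is a dict, requests sends the pairs as an HTML form body (`Content-Type: application/x-www-form-urlencoded`). A quick way to see what that body looks like, using the standard library rather than requests itself:

```python
from urllib.parse import urlencode

data = {"first_name": "Richard", "second_name": "Stallman"}

# Form encoding: key=value pairs joined with '&'
body = urlencode(data)
print(body)  # first_name=Richard&second_name=Stallman

# Special characters are escaped; in form bodies a space becomes '+'
spaced = urlencode({"name": "Richard Stallman"})
print(spaced)  # name=Richard+Stallman
```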

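Send a picture:

with open("photo.png", "rb") as f:
    r = requests.post("http://linuxfr.org", files={"file": f})

Read the response:

r.status_code  # the HTTP status code (int)
r.text         # the body decoded as text (str)
r.content      # the raw body (bytes)
r.json()       # the body parsed as JSON (dict or list); a method, not an attribute
r.headers      # the response headers (a case-insensitive dict)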

Resources

URI/Resource based:

• Example: Facebook Graph API
    GET /{photo-id} to retrieve the info of a photo
    GET /{photo-id}/likes to retrieve the people who like it
    POST /{photo-id} to update the photo
    DELETE /{photo-id} to delete the photo

• Example: Google Calendar API
    GET /calendars/{calendarId} to retrieve the info of a calendar
    PUT /calendars/{calendarId} to update a calendar
    DELETE /calendars/{calendarId} to delete a calendar
    POST /calendars to create a calendar
    GET /calendars/{calendarId}/events/{eventId} to retrieve one event of a calendar
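On the server side, a resource-based API boils down to a table mapping (method, URI template) pairs to actions. A minimal sketch of such a dispatcher, reusing the Google Calendar templates above; the table, the action names and `dispatch` are hypothetical and say nothing about how Google actually implements its API.

```python
import re

# Hypothetical routing table: (HTTP method, URI template, action name).
ROUTES = [
    ("GET", "/calendars/{calendarId}", "retrieve calendar"),
    ("PUT", "/calendars/{calendarId}", "update calendar"),
    ("DELETE", "/calendars/{calendarId}", "delete calendar"),
    ("POST", "/calendars", "create calendar"),
    ("GET", "/calendars/{calendarId}/events/{eventId}", "retrieve event"),
]


def template_to_regex(template):
    """Turn '/calendars/{calendarId}' into a regex with one named group per placeholder."""
    # With a capture group, re.split alternates literal text and placeholder names
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        pattern += re.escape(part) if i % 2 == 0 else "(?P<%s>[^/]+)" % part
    return re.compile(pattern)


COMPILED = [(method, template_to_regex(t), action) for method, t, action in ROUTES]


def dispatch(method, path):
    """Return (action, path parameters) for the first matching route, or None."""
    for route_method, regex, action in COMPILED:
        m = regex.fullmatch(path)
        if route_method == method and m:
            return action, m.groupdict()
    return None


print(dispatch("DELETE", "/calendars/work"))
print(dispatch("GET", "/calendars/work/events/7"))
```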

Response

HTTP response status codes:

• 200: OK

• 3xx: redirection

• 404: not found (4xx: something went wrong with what you tried to access)

• 5xx: server error
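The first digit of the status code is what a client usually checks first. A small sketch using the standard `http.HTTPStatus` enum for the reason phrases; `status_class` and its labels are illustrative.

```python
from http import HTTPStatus

# Meaning of the first digit of a status code
STATUS_CLASSES = {1: "informational", 2: "success", 3: "redirection",
                  4: "client error", 5: "server error"}


def status_class(code):
    """Classify an HTTP status code by its first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


for code in (200, 301, 404, 503):
    # HTTPStatus knows the standard reason phrase of each registered code
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```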

API responses:

• Flickr:

{ "stat": "fail", "code": 1, "message": "User not found" }
{ "galleries": { ... }, "stat": "ok" }

• Google Calendar:

{ "error": { "code": 403, "message": "User Rate Limit Exceeded" } }
{ "kind": "calendar#events", "summary": ..., "description": ...
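Besides the HTTP status code, these APIs describe errors inside the JSON body itself, so a client may want to check both. A sketch handling the two shapes shown above (Flickr's `"stat"` field, Google's top-level `"error"` object); `APIError` and `check_api_response` are hypothetical names.

```python
import json


class APIError(Exception):
    """Hypothetical exception for errors reported inside the response body."""


def check_api_response(text):
    """Parse a JSON body and raise APIError if it reports a failure."""
    body = json.loads(text)
    if body.get("stat") == "fail":  # Flickr style
        raise APIError("%s (code %s)" % (body.get("message"), body.get("code")))
    if "error" in body:  # Google style
        error = body["error"]
        raise APIError("%s (code %s)" % (error.get("message"), error.get("code")))
    return body


errors = []
for text in ('{"stat": "fail", "code": 1, "message": "User not found"}',
             '{"error": {"code": 403, "message": "User Rate Limit Exceeded"}}'):
    try:
        check_api_response(text)
    except APIError as exc:
        errors.append(str(exc))
print(errors)
```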

Response Content-Type:

• text/plain

• text/html

• text/xml or application/xml

• application/json

• image/png

• ...
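A client can use the Content-Type header to pick the right parser. A minimal sketch covering the text types listed above; `parse_body` is hypothetical, and binary types such as image/png are rejected rather than handled.

```python
import json
import xml.etree.ElementTree as ET


def parse_body(content_type, body):
    """Decode a response body (str) according to its Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and ignore case
    mime = content_type.split(";")[0].strip().lower()
    if mime == "application/json":
        return json.loads(body)
    if mime in ("text/xml", "application/xml"):
        return ET.fromstring(body)  # root Element of the document
    if mime.startswith("text/"):
        return body  # text/plain, text/html: keep the text as-is
    raise ValueError("unsupported Content-Type: " + mime)


print(parse_body("application/json; charset=UTF-8", '{"status": "OK"}'))
print(parse_body("text/xml", "<status>OK</status>").text)
```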

Python: JSON and XML Parsing

Use the json package:

>>> import json
>>> obj = json.loads('{"attr1": "v1", "attr2": 42}')
>>> obj['attr1']
'v1'
>>> obj['attr2']
42
>>> obj = {'id': 1, 'data': [1, 2, 3, 4]}
>>> json.dumps(obj)  # returns a string
'{"id": 1, "data": [1, 2, 3, 4]}'

JSON Parsing

Convert JSON to a Python object (dict)

Use the json package:

import json

json_data = '{"name": "Marie", "city": "Paris"}'
python_obj = json.loads(json_data)
print(python_obj["name"])
print(python_obj["city"])

Result:
> python3 01_Json.py
Marie
Paris

Convert JSON to a Python object (list)

import json

json_data = '{"persons": [{"name": "Marie", "city": "Paris"}, {"name": "Pierre", "city": "Lyon"}]}'
python_obj = json.loads(json_data)
print(json.dumps(python_obj, sort_keys=True, indent=4))

Result:
> python3 02_Json.py
{
    "persons": [
        {
            "city": "Paris",
            "name": "Marie"
        },
        {
            "city": "Lyon",
            "name": "Pierre"
        }
    ]
}

Convert JSON to a Python object, with error handling

import json

json_input = '{"persons": [{"name": "Marie", "city": "Paris"}, {"name": "Pierre", "city": "Lyon"}]}'
try:
    decoded = json.loads(json_input)
    # Access data
    for x in decoded['persons']:
        print(x['name'])
except (ValueError, KeyError, TypeError):
    print("JSON format error")

Result:
> python3 03_Json.py
Marie
Pierre

Use a JSON file

import json

try:
    # Load inside the try block so that a malformed file is caught too
    with open('lang.json', encoding='utf-8') as f:
        data = json.load(f)
    # Access data
    for x in data['results']:
        print(x['formatted_address'])
except (ValueError, KeyError, TypeError):
    print("JSON format error")

Result:
> python3 04_Json.py
25 Rue Cité Lang, 68560 Hirsingue, France
25 Rue Raphaël Lang, 54500 Vandœuvre-lès-Nancy, France
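Geocoding results are nested, so reaching the coordinates means walking several levels of dicts. A sketch assuming the field names of the Google response (`results`, `formatted_address`, `geometry`, `location`, `lat`, `lng`, the same names that appear in the SAX element count for lang.xml); the sample input is made up and its coordinates are placeholders, and `addresses_with_coordinates` is a hypothetical helper.

```python
import json

# Illustrative input shaped like a geocoding response; the coordinates are placeholders.
sample = """
{"status": "OK",
 "results": [
   {"formatted_address": "25 Rue Cité Lang, 68560 Hirsingue, France",
    "geometry": {"location": {"lat": 0.0, "lng": 0.0}}}
 ]}
"""


def addresses_with_coordinates(json_text):
    """Yield (address, lat, lng) for each result; skip results missing a field."""
    data = json.loads(json_text)
    for result in data.get("results", []):
        try:
            location = result["geometry"]["location"]
            yield result["formatted_address"], location["lat"], location["lng"]
        except KeyError:
            continue


for address, lat, lng in addresses_with_coordinates(sample):
    print(address, lat, lng)
```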

XML Parsing

With xml.etree.ElementTree, xml.sax, or html.parser.

import xml.etree.ElementTree as ET
tree = ET.parse('countryXML.xml')

xml.etree.ElementTree loads the whole file into memory; you can then navigate the tree structure.

Interactive session (Python 2; in Python 3, write print(child.tag, child.attrib) and print(n.attrib)):

$ python
Python 2.7.10 (default, Feb 7 2017, 00:08:15)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('countryXML.xml')
>>> root = tree.getroot()
>>> root.tag
'data'
>>> root.attrib
{}
>>> for child in root:
...     print child.tag, child.attrib
...
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}
>>> root[0][1].text
'2008'
>>> for n in root.iter('neighbor'):
...     print n.attrib
...
{'direction': 'E', 'name': 'Austria'}
{'direction': 'W', 'name': 'Switzerland'}
{'direction': 'N', 'name': 'Malaysia'}
{'direction': 'W', 'name': 'Costa Rica'}
{'direction': 'E', 'name': 'Colombia'}
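Besides iterating, ElementTree supports a limited XPath syntax through `find`, `findall` and `findtext`. A sketch on a small inline document modelled on the session above; its content is illustrative and is not the actual countryXML.xml file.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with the same layout as countryXML.xml (values illustrative).
doc = """
<data>
  <country name="Liechtenstein">
    <rank>1</rank>
    <year>2008</year>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Panama">
    <rank>68</rank>
    <year>2011</year>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
</data>
"""

root = ET.fromstring(doc)  # parse from a string instead of a file

# findall() returns matching children; findtext() the text of the first match
for country in root.findall("country"):
    print(country.get("name"), country.findtext("year"))

# XPath predicate: neighbor elements anywhere below root whose direction is "E"
east = [n.get("name") for n in root.findall(".//neighbor[@direction='E']")]
print(east)
```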

Simple API for XML: SAX

SAX reads the file as a stream and calls the handler's methods (here startElement) for each element, instead of building the whole tree.

import xml.sax

class MyHandler(xml.sax.ContentHandler):

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.element_name2count = {}

    def startElement(self, name, attrs):
        # Called once per opening tag: count it
        self.element_name2count[name] = self.element_name2count.get(name, 0) + 1

filename = "lang.xml"
handler = MyHandler()
xml.sax.parse(filename, handler)
# Sort elements according to their count
to_sort = [(count, name) for name, count in handler.element_name2count.items()]
to_sort.sort(reverse=True)
for count, name in to_sort:
    print("%s: %d" % (name, count))

Result:
type: 24
short_name: 14
long_name: 14
address_component: 14
lng: 6
lat: 6
viewport: 2
southwest: 2
result: 2
place_id: 2
partial_match: 2
northeast: 2
location_type: 2
location: 2
geometry: 2
formatted_address: 2
status: 1
GeocodeResponse: 1

Beautiful Soup (HTML parser)

import requests
from bs4 import BeautifulSoup

r = requests.get("https://fr.wikipedia.org/wiki/Beautiful_Soup")
soup = BeautifulSoup(r.content, "html.parser")

# print(soup)
print(soup.title)

> python3 06_BS.py
<title>Python — Wikipédia</title>
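For simple extractions, the standard library's html.parser (listed earlier among the parsing options) works without installing Beautiful Soup. A minimal sketch that collects the text of the first `<title>`; `TitleParser` is a hypothetical class name, and the input is the page head from the LinuxFr example.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self.in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign
        if self.in_title:
            self.title += data


parser = TitleParser()
parser.feed("<html><head><title>Accueil - LinuxFr.org</title></head><body></body></html>")
print(parser.title)  # Accueil - LinuxFr.org
```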

Regular Expressions

Regular expressions are a powerful language for matching text patterns. The Python "re" module provides regular expression support.

>>> import re
>>> re.findall("([0-9]+)", "Bonjour 111 Aurevoir 222")
['111', '222']

Regular Expressions

• a, X, 9, < -- ordinary characters just match themselves exactly. The meta-characters which do not match themselves because they have special meanings are: . ^ $ * + ? { [ ] \ | ( ) (details below)

• . (a period) -- matches any single character except newline '\n'

• \w -- (lowercase w) matches a "word" character: a letter or digit or underscore [a-zA-Z0-9_] (with str patterns, Python 3 also accepts Unicode letters and digits). Note that although "word" is the mnemonic for this, it only matches a single word character, not a whole word. \W (uppercase W) matches any non-word character.

• \b -- boundary between a word character and a non-word character

• \s -- (lowercase s) matches a single whitespace character: space, newline, return, tab, form feed [ \n\r\t\f]. \S (uppercase S) matches any non-whitespace character.

• \t, \n, \r -- tab, newline, return

• \d -- decimal digit [0-9] (some older regex utilities do not support \d, but they all support \w and \s)

• ^ = start, $ = end -- match the start or end of the string

• \ -- inhibit the "specialness" of a character. So, for example, use \. to match a period or \\ to match a backslash. If you are unsure whether a character has a special meaning, such as '@', you can put a backslash in front of it, \@, to make sure it is treated just as a character.
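A short sketch exercising several of these meta-characters on a made-up string, so each rule can be checked against real output:

```python
import re

text = "Price: 12.50 EUR, ref A-42\tstock 7"

digits = re.findall(r"\d+", text)             # runs of decimal digits
decimals = re.findall(r"\d+\.\d+", text)      # \. is a literal period
one_char_words = re.findall(r"\b\w\b", text)  # \b on both sides: a one-character word
tokens = re.split(r"\s+", text)               # split on spaces and tabs alike
dotted = re.findall(r"r.f", "ref rif r\nf")   # . matches anything except newline
starts = (bool(re.search(r"^Price", text)),   # ^ anchors at the start of the string
          bool(re.search(r"^stock", text)))

print(digits, decimals, one_char_words, tokens, dotted, starts, sep="\n")
```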

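Regular Expressions: Extract Email Information

Example address: lelia.blin@lip6.fr

Pattern: ([^@]+)@([^@]+)

• [ ] -- a set of characters

• [^@] -- a character that is not @ (the at symbol)

• + -- at least one of this character

• ( ) -- a group, returned by m.group(1), m.group(2)

>>> import re
>>> m = re.match('([^@]+)@([^@]+)', 'lelia.blin@lip6.fr')
>>> m.group(1)
'lelia.blin'
>>> m.group(2)
'lip6.fr'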
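One caveat: [^@]+ also accepts spaces, so on a longer text it swallows neighbouring words. A sketch that also excludes whitespace and uses named groups; the sample addresses are made up, and this is still not a complete email validator.

```python
import re

text = "Contact: alice@example.org or bob@example.com"

# The pattern from the slide: [^@]+ also accepts spaces, so it swallows nearby words
naive = re.match(r"([^@]+)@([^@]+)", text)
print(naive.groups())  # ('Contact: alice', 'example.org or bob')

# Excluding whitespace keeps each match inside one address;
# named groups are easier to read than group(1), group(2)
EMAIL = re.compile(r"(?P<user>[^@\s]+)@(?P<domain>[^@\s]+)")
m = EMAIL.match("alice@example.org")
print(m.group("user"), m.group("domain"))  # alice example.org
print(EMAIL.findall(text))  # [('alice', 'example.org'), ('bob', 'example.com')]
```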

Create an API with Python

Available libraries/frameworks in Python:

• Django: powerful web framework with a lot of modules. Great to build a complete website.

• Flask: small framework to build simple websites.

• Bottle: similar to Flask, but even simpler. Perfect to build an API.

The Bottle Framework (single-file module, no dependencies)

• Routing: requests are mapped to function calls, with support for clean and dynamic URLs.

• Templates: fast and pythonic built-in template engine.

• Utilities: convenient access to form data, file uploads, cookies, headers and other HTTP-related metadata.

• Server: built-in HTTP development server and support for other WSGI-capable HTTP servers. (WSGI, the Web Server Gateway Interface, is the Python specification for the interface between web servers and web applications.)
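Under the hood, a WSGI application is just a callable taking `environ` and `start_response`. A minimal sketch of a bare WSGI "Hello world" with the standard library, called directly with a fake request instead of starting a server; `hello_app` is a hypothetical name, and this is not how Bottle is written internally.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A bare WSGI application: the same interface Bottle apps expose to servers."""
    path = environ.get("PATH_INFO", "/")
    if path == "/hello":
        status, body = "200 OK", b"Hello world"
    else:
        status, body = "404 Not Found", b"Not found"
    start_response(status, [("Content-Type", "text/plain; charset=utf-8"),
                            ("Content-Length", str(len(body)))])
    return [body]


# Call the app directly with a fake request instead of starting a server.
environ = {"PATH_INFO": "/hello"}
setup_testing_defaults(environ)  # fills in the other required WSGI keys
captured = {}


def start_response(status, headers, exc_info=None):
    captured["status"] = status


body = b"".join(hello_app(environ, start_response))
print(captured["status"], body)
```

To serve it for real, `wsgiref.simple_server.make_server("localhost", 8080, hello_app).serve_forever()` would do.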

Hello world example:

from bottle import route, run

@route('/hello')
def hello():
    return 'Hello world'

run(host='localhost', port=8080)

> python3 07_Hello.py
Bottle v0.12.13 server starting up (using WSGIRefServer())...
Listening on http://localhost:8080/
Hit Ctrl-C to quit.

URL: http://localhost:8080/hello

Id in URL

from bottle import route, run, template

@route('/hello/<name>')
def hello(name):
    return 'Hello ' + name

run(host='localhost', port=8080)

File: 08_HelloName.py
URL: http://localhost:8080/hello/Marie

Id in URL

from bottle import route, run, template

@route('/hello/<name>')
def hello(name):
    return 'Hello ' + name  # http://localhost:8080/hello/Marie

@route('/bonjour/<name>')
def bonjour(name):
    return 'Bonjour ' + name  # http://localhost:8080/bonjour/Marie

@route('/buenas/<name>')
def buena(name):
    return 'Buenos días ' + name  # http://localhost:8080/buenas/Marie

run(host='localhost', port=8080)

File: 09_HelloPL.py

Id in URL (query string)

from bottle import Bottle, run, view, request

app = Bottle()

@app.route('/jemesure')
def jemesure():
    return "Je mesure " + request.params.taille + " cm"

run(app, host='localhost', port=8080)  # , reloader=True)

File: 10_Taille.py
URL: http://localhost:8080/jemesure?taille=133
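The value of `taille` comes from the query string, the part of the URL after `?`. A standard-library sketch of that parsing step, to show what a framework does before the handler runs; it is not Bottle's own code.

```python
from urllib.parse import parse_qs, urlsplit

url = "http://localhost:8080/jemesure?taille=133"
parts = urlsplit(url)

# parse_qs maps each key to a list, since a key may appear several times
params = parse_qs(parts.query)
taille = params.get("taille", [""])[0]  # first value, or "" if the key is absent

print(parts.path, params)
print("Je mesure " + taille + " cm")
```

Values always arrive as strings ("133", not 133), so a handler doing arithmetic has to convert them, for example with `int(taille)`.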

Static content

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from bottle import Bottle, run, static_file

app = Bottle()

@app.route('/static/<filename:path>')
def server_static(filename):
    return static_file(filename, root='.')

run(app, host='localhost', port=8080, reloader=True)

File: 11_Img.py
URL: http://localhost:8080/static/cube.png

Open Data

Publicly available APIs / datasets about:

• Education

• Public transport

• Economy

• Sport results