www.statistik.at Wir bewegen Informationen
Automatic price collection on the internet (web scraping)
Ingolf Boettcher
Ottawa Group 2015, Tokyo, 20 May 2015
Topic 1: Alternate data sources and index number formulas
Session 2: Online Prices and Web Scraping
www.statistik.at Slide 2 | 26.05.2015
Web scraping
There is a huge amount of data on the internet:
<HTML><HEAD><TITLE> DATA </TITLE></HEAD></HTML>
How can we best collect/scrape/harvest data from there for statistical purposes?
Web scraping
Internet data collection. Minimum goal for (price) statistics:
Turn website content into a spreadsheet
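As a rough illustration of what "website content into a spreadsheet" can mean in code, the following minimal sketch parses an HTML table and writes it out as CSV using only the Python standard library. The product listing in `SAMPLE_HTML` and the names `TableParser` and `html_table_to_csv` are assumptions made for this example; they are not taken from the presentation, and a real shop page would first have to be downloaded and would usually need site-specific handling.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical product listing; a real page would be downloaded first.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Coffee 500 g</td><td>4.99</td></tr>
  <tr><td>Milk 1 l</td><td>1.09</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def html_table_to_csv(html):
    """Return the table cells found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


print(html_table_to_csv(SAMPLE_HTML))
```

The resulting CSV text can be opened directly in a spreadsheet program, which is the minimum goal stated on the slide.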
Web scraping
Internet data collection
Options:
1. Manual price collection
2. Develop an API / web scraper
2.1 by writing custom computer code
2.2 by using point-and-click web tools
Web scraping
Reasons for not writing your own web scraper
IT developer needed, therefore:
• Expensive
• Inflexible
• Even maintenance cannot be handled by CPI staff
Web scraping
Reasons to use point-and-click web tools for web scraping:
No IT developer needed, therefore:
• Cheap
• Flexible
• No programming skills required
Web scraping
What point-and-click web scraping with import.io looks like:
• Web platform that allows users to structure and extract data from websites
Web scraping
Point-and-click web scraping on a web-based platform offers solutions to:
• extract data by point-and-click
• record actions on a website
• crawl all the data of a webpage
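To give a sense of what the crawling step involves under the hood, here is a minimal sketch that collects the links found on one page so that they could be visited next. It is not a description of how import.io works internally, which the slides do not cover. The page content, the `https://shop.example/` address, and the names `LinkCollector` and `collect_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical page fragment and address, used only for illustration.
BASE_URL = "https://shop.example/catalog/"
PAGE = (
    '<a href="/coffee?page=2">next</a> '
    '<a href="milk.html">Milk</a> '
    '<a href="/coffee?page=2">2</a>'
)


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> once, in page order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page address.
            absolute = urljoin(self.base_url, href)
            if absolute not in self.links:
                self.links.append(absolute)


def collect_links(html, base_url):
    """Return the de-duplicated absolute links found in `html`."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


for link in collect_links(PAGE, BASE_URL):
    print(link)
```

A crawler would repeat this for every collected link, keeping track of pages already visited so that it does not loop.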
More issues to be considered:
• Legality of crawling websites
• Internal IT security
• Training of staff
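One technical input to the legality question is a site's robots.txt file, which states which paths the operator asks automated clients to avoid. The sketch below checks URLs against such rules with Python's standard `urllib.robotparser`. The rules, the `https://shop.example/` address, and the `price-stats-bot` user agent are hypothetical, and passing this check is not a legal assessment: terms of use and national law still have to be reviewed separately.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one is served at <site>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
"""

USER_AGENT = "price-stats-bot"  # assumed name for the collecting program


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser, url):
    """Return True if the rules do not ask this user agent to avoid `url`."""
    return parser.can_fetch(USER_AGENT, url)


rules = build_parser(ROBOTS_TXT)
for url in ("https://shop.example/coffee", "https://shop.example/checkout/cart"):
    print(url, may_fetch(rules, url))
```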
Contact: Ingolf Boettcher
Guglgasse 13, 1110 Wien
Tel: +43 (1) 71128-7917
Fax: +43 (1) [email protected]
Automatic price collection on the internet (web scraping)
Web scraping with import.io