SEO scraping with Excel (Google Suggest and more)
![Page 1: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/1.jpg)
#SMConnect @Zen2SEO
Search Marketing Connect - 20 and 21 November 2015
SEO Scraping with Excel: from an “infinite” Google Suggest to SERP extractions for several
goals, at no cost and with no programming skills needed
![Page 2: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/2.jpg)
salsa dancing + travel + crime novels + lots of fun
=
Giuseppe Pastore (unconventional SEO manager)
Say hello!
@Zen2SEO
![Page 3: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/3.jpg)
Web Scraping - What
Web scraping = extracting information from websites by simulating human browsing with software
![Page 4: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/4.jpg)
Web Scraping - Why
Price comparison, contact scraping, weather data monitoring, website change detection, research, web mashups and web data integration.
![Page 5: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/5.jpg)
Web Scraping - How
Lots of techniques... that need coding.
I can’t code, but I like Excel.
![Page 6: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/6.jpg)
Excel
SEO Tools for Excel
RegEx
XPath
![Page 7: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/7.jpg)
http://seotoolsforexcel.com
![Page 8: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/8.jpg)
Regular expression (regex or regexp) = a sequence of characters that defines a search pattern, mainly for use in pattern matching with strings
http://goo.gl/pqtNE0
![Page 9: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/9.jpg)
XPath = a query language for selecting nodes from an XML document
//*[@id="rso"]/div/div/h3/a
![Page 10: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/10.jpg)
SCRAPING (EVERY!!!) SUGGEST
![Page 11: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/11.jpg)
Google Suggest API to be discontinued
http://googlewebmastercentral.blogspot.it/2015/07/update-on-autocomplete-api.html
![Page 12: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/12.jpg)
UberSuggest (takes data from Bing)
http://ubersuggest.org
![Page 13: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/13.jpg)
Keyword Tool
http://keywordtool.io
![Page 14: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/14.jpg)
Target #1 – Google Suggest
![Page 15: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/15.jpg)
http://suggestqueries.google.com/complete/search?output=toolbar&hl=it&q=milan
Step 1
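For readers who want to reproduce this request outside Excel, here is a minimal Python sketch (Python is not used in the slides). It only builds the URL shown above and sends nothing; the function name `suggest_url` is hypothetical.

```python
from urllib.parse import urlencode

BASE = "http://suggestqueries.google.com/complete/search"

def suggest_url(keyword, language="it"):
    """Build the suggest URL from the slide for a keyword and interface language."""
    # urlencode escapes spaces and special characters in the keyword.
    params = {"output": "toolbar", "hl": language, "q": keyword}
    return BASE + "?" + urlencode(params)

print(suggest_url("milan"))
```

Escaping matters as soon as the keyword contains a space or an accented letter, which plain string concatenation does not handle.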
![Page 16: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/16.jpg)
Step 2
=DownloadString("http://suggestqueries.google.com/complete/search?output=toolbar&hl=it&q="&A2)
<?xml version="1.0"?><toplevel><CompleteSuggestion><suggestion data="milan"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan news"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano finanza"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano meteo"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano marittima"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milanotoday"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano expo"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano malpensa"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milanuncios"/></CompleteSuggestion></toplevel>
Downloading the entire page code
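The response is plain XML, so a parser can also read it directly instead of regular expressions. This is a sketch in Python, which is not the method used in the slides; `parse_suggestions` is a hypothetical name and the sample is a shortened copy of the response above.

```python
import xml.etree.ElementTree as ET

def parse_suggestions(xml_text):
    """Return the value of every <suggestion data="..."/> node, in document order."""
    root = ET.fromstring(xml_text)
    return [node.get("data") for node in root.iter("suggestion")]

# Shortened copy of the response shown on the slide.
sample = (
    '<?xml version="1.0"?><toplevel>'
    '<CompleteSuggestion><suggestion data="milan"/></CompleteSuggestion>'
    '<CompleteSuggestion><suggestion data="milan news"/></CompleteSuggestion>'
    '<CompleteSuggestion><suggestion data="milano finanza"/></CompleteSuggestion>'
    '</toplevel>'
)

suggestions = parse_suggestions(sample)
print(suggestions)
```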
![Page 17: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/17.jpg)
Step 3
=RegexpReplace(DownloadString("http://suggestqueries.google.com/complete/search?output=toolbar&hl="&B2&"&q="&A2);"((.*)toplevel>)?<CompleteSuggestion><suggestion(\s)data=";"")
Before: <?xml version="1.0"?><toplevel><CompleteSuggestion><suggestion data="milan"/></CompleteSuggestion>...</toplevel>
After: "milan"/></CompleteSuggestion>....</toplevel>
Removing the opening tags of each node
![Page 18: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/18.jpg)
Step 4
=RegexpReplace(A11;"/></CompleteSuggestion>(</toplevel>)?";",")
Before: "milan"/></CompleteSuggestion> "milano news"/></CompleteSuggestion>...</toplevel>
After: "milan", "milan news","milano finanza", "milan","milano","milano meteo","milano marittima","milano expo","milano malpensa","milanotoday","milan store",
Removing the closing tags of each node and replacing them with commas
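Steps 3 and 4 can be checked outside Excel with the same two patterns. The sketch below uses Python's `re` module as a stand-in for RegexpReplace (an assumption: the two regex engines agree on these simple patterns); `to_comma_list` is a hypothetical name and the sample is a shortened response.

```python
import re

# The two patterns from steps 3 and 4, copied from the formulas.
OPENING = r'((.*)toplevel>)?<CompleteSuggestion><suggestion(\s)data='
CLOSING = r'/></CompleteSuggestion>(</toplevel>)?'

def to_comma_list(xml_text):
    without_opening = re.sub(OPENING, "", xml_text)  # step 3: drop opening tags
    return re.sub(CLOSING, ",", without_opening)     # step 4: closing tags become commas

sample = (
    '<?xml version="1.0"?><toplevel>'
    '<CompleteSuggestion><suggestion data="milan"/></CompleteSuggestion>'
    '<CompleteSuggestion><suggestion data="milan news"/></CompleteSuggestion>'
    '</toplevel>'
)

joined = to_comma_list(sample)
print(joined)
```

The result is a single string of quoted suggestions, each followed by a comma, which is what the next steps take apart.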
![Page 19: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/19.jpg)
Step 5
=SINISTRA(A14;TROVA(",";A14;1))
Before: "milan", "milan news","milano finanza", "milan","milano","milano meteo","milano marittima","milano expo","milano malpensa","milanotoday","milan store",
After: "milan",
Finding the first comma and isolating everything to its left, comma included (SINISTRA and TROVA are the Italian-locale names of LEFT and FIND)
![Page 20: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/20.jpg)
Step 6
=RegexpReplace(SINISTRA(A17;TROVA(",";A17;1));""",?";"")
Before: "milan",
After: milan
Removing the quotes and the trailing comma: the first result is now isolated
![Page 21: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/21.jpg)
Step 7
=DESTRA(A14;LUNGHEZZA(A14)-TROVA(",";A14;1))
Before (143 characters): "milan","milan news","milano finanza", "milan","milano","milano meteo","milano marittima","milano expo","milano malpensa","milanotoday","milan store",
After (135 characters): "milan news","milano finanza","milan","milano","milano meteo","milano marittima","milano expo","milano malpensa","milanotoday","milan store",
From the string of 10 results I isolate the part to the right of the first term: 143 characters minus the 8 of the first term and its comma leaves 135 (DESTRA and LUNGHEZZA are the Italian-locale names of RIGHT and LEN)
![Page 22: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/22.jpg)
"milan news","milano finanza","milan","milano","milano meteo","milano marittima","milano expo","milano malpensa","milanotoday","milan store",
Iterating steps 5-6-7
milan
milan news
milano
milano finanza
milano meteo
milano marittima
milano expo
milano malpensa
milanotoday
milan store
=SINISTRA(A14;TROVA(",";A14;1))
=RegexpReplace(SINISTRA(A17;TROVA(",";A17;1));""",?";"")
=DESTRA(A14;LUNGHEZZA(A14)-TROVA(",";A14;1))
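The iteration of steps 5, 6 and 7 amounts to a loop that keeps peeling off the first term. Here is a minimal Python sketch of that loop (not part of the slides; `split_suggestions` is a hypothetical name). Like the Excel formulas, it assumes no suggestion contains a comma.

```python
import re

def split_suggestions(joined):
    """Peel quoted terms off a string such as '"milan","milan news",'."""
    results = []
    rest = joined
    while "," in rest:
        cut = rest.index(",") + 1                   # like TROVA(",";...): position of the first comma
        first = rest[:cut]                          # step 5: left part, comma included
        results.append(re.sub(r'",?', "", first))   # step 6: strip quotes and the comma
        rest = rest[cut:]                           # step 7: keep what is to the right
    return results

terms = split_suggestions('"milan","milan news","milano finanza",')
print(terms)
```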
![Page 23: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/23.jpg)
Iterating 5-6-7
=RegexpReplace(RegexpReplace(DownloadString("http://suggestqueries.google.com/complete/search?output=toolbar&tbm=&hl="&B2&"&lang_"&B2&"&q="&A2);"((.*)toplevel>)?<(/?Complete)?suggestion((\s)data=)?>?(</toplevel>)?";"");"/>";",")
![Page 24: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/24.jpg)
Target #2 – Bing Suggest
![Page 25: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/25.jpg)
http://api.bing.com/osjson.aspx?query=milan
Step 1
12 results
Based on IP
["milan",["milan news","milano finanza","milan","milano today","milano","milan live","milanotoday","milannews.it","milannews","milanofinanza.it","milano meteo","milan calciomercato"]]
https://hide.me/en/proxy
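This response is JSON rather than XML: a two-element array holding the query and the list of suggestions. Outside Excel it can be read with any JSON parser; the Python sketch below (not part of the slides) uses a shortened copy of the response above.

```python
import json

# Shortened copy of the response shown on the slide.
raw = '["milan",["milan news","milano finanza","milan","milano today"]]'

# The array unpacks into the original query and its suggestions.
query, suggestions = json.loads(raw)
print(query, suggestions)
```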
![Page 26: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/26.jpg)
Target #3 – Amazon Suggest
![Page 27: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/27.jpg)
http://completion.amazon.com/search/complete?method=completion&q=%q&search-alias=aps&mkt=1
http://completion.amazon.co.uk/search/complete?method=completion&q=%q&search-alias=aps&mkt=4
http://completion.amazon.co.jp/search/complete?method=completion&q=%q&search-alias=aps&mkt=6
Aps = All Product Selection (?)
Step 1
["milano",["milani","milano cookies","milano bride","milano knife","kiko milano","milano moda","milano lego","giorgio milano","milano poker chips","milanos"],[{"sc":"1","nodes":[{"name":"Beauty","alias":"beauty"},{"name":"Health & Personal Care","alias":"hpc"}]},{},{},{},{},{},{},{},{},{}],[]]
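The Amazon response carries more than the suggestions: the third array holds one object per suggestion, and some of them list departments. The Python sketch below (not part of the slides) reads a shortened copy of the response; pairing each object with the suggestion in the same position is an assumption drawn from the matching array lengths.

```python
import json

# Shortened copy of the response shown on the slide.
raw = (
    '["milano",["milani","milano cookies"],'
    '[{"sc":"1","nodes":[{"name":"Beauty","alias":"beauty"},'
    '{"name":"Health & Personal Care","alias":"hpc"}]},{}],[]]'
)

data = json.loads(raw)
query, suggestions, extras = data[0], data[1], data[2]

# Assumption: extras[i] describes suggestions[i]; empty objects carry no departments.
departments = {
    term: [node["name"] for node in extra.get("nodes", [])]
    for term, extra in zip(suggestions, extras)
}
print(departments)
```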
![Page 28: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/28.jpg)
Target #4 – Google Image Suggest
http://suggestqueries.google.com/complete/search?json&client=toolbar&ds=i&q=%q
<?xml version="1.0"?><toplevel><CompleteSuggestion><suggestion data="milano"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan napoli"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano expo"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano metro"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano skyline"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano marittima"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano metropolitana"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano navigli"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan news"/></CompleteSuggestion></toplevel>
![Page 29: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/29.jpg)
Target #5 – YouTube Suggest
http://suggestqueries.google.com/complete/search?json&client=toolbar&ds=yt&q=%q
<?xml version="1.0"?><toplevel><CompleteSuggestion><suggestion data="milano bangkok"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan napoli 0 4"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan napoli"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan palermo"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan palermo 3 2"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan napoli 0 4 auriemma"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan napoli 0 4 crudeli"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan udinese 3 2"/></CompleteSuggestion></toplevel>
![Page 30: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/30.jpg)
Target #6 – Wikipedia Suggest
http://it.wikipedia.org/w/api.php?action=opensearch&search=%q
["milano",["Milano","Milano-Sanremo","Milano 2","Milano-Torino","Milano-Sanremo 2012","Milano-Sanremo 2014","Milano-Sanremo 2013","Milano-Sanremo 2015","Milano-Sanremo 2011","Milano-Sanremo 2010"],["Milano ( pronuncia /mi\u02c8lano/, in lombardo Milan, pronunciato /mi\u02c8l\u00e3\u02d0/ nel dialetto locale) \u00e8 una citt\u00e0 italiana di 1 342 806 abitanti, capoluogo dell'omonima citt\u00e0 metropolitana e della regione Lombardia, secondo comune italiano per numero di abitanti, tredicesimo comune dell'Unione europea e diciannovesimo del continente e, con l'agglomerato urbano, terza area metropolitana pi\u00f9 popolata d'Europa dietro Londra e Parigi.","La Milano-Sanremo \u00e8 una corsa in linea maschile di ciclismo su strada professionistico, una delle pi\u00f9 importanti corse ciclistiche del relativo circuito internazionale e prima grande classica nel calendario ciclistico stagionale.","Milano 2 (o anche Milano Due, abbreviato MI2 e M2) \u00e8 un quartiere residenziale sito nel territorio del comune italiano di Segrate, nella citt\u00e0 metropolitana di Milano.","La Milano-Torino \u00e8 una corsa in linea maschile di ciclismo su strada, che si svolge tra Milano e Torino, in Italia, ogni anno nel mese di ottobre, ed \u00e8 una delle classiche d'autunno.","La Milano-Sanremo 2012, centotreesima edizione della corsa, si \u00e8 disputata il 17 marzo 2012, per un percorso totale di 298 km.","La Milano-Sanremo 2014, centocinquesima edizione della corsa, valida come quarta prova del circuito UCI World Tour 2014, si svolse il 23 marzo 2014 su un percorso di 294km, con partenza da Milano ed arrivo a Sanremo.","La Milano-Sanremo 2013, centoquattresima edizione della corsa, si \u00e8 disputata il 17 marzo 2013 su un percorso accorciato per motivi meteorologici da 298 km a 255 km.","La Milano-Sanremo 2015, centoseiesima edizione della corsa, valida come quarta prova del circuito UCI World Tour 2015, si \u00e8 svolta il 22 marzo 2015 su un percorso di 293 km, con partenza 
da Milano ed arrivo a Sanremo.","La Milano-Sanremo 2011, centoduesima edizione della corsa, si \u00e8 disputata il 19 marzo 2011, per un percorso totale di 298 km.","La Milano-Sanremo 2010, centunesima edizione della corsa, si \u00e8 disputata il 20 marzo 2010 e ha affrontato un percorso totale di 298 km."],["https://it.wikipedia.org/wiki/Milano","https://it.wikipedia.org/wiki/Milano-Sanremo","https://it.wikipedia.org/wiki/Milano_2","https://it.wikipedia.org/wiki/Milano-Torino","https://it.wikipedia.org/wiki/Milano-Sanremo_2012","https://it.wikipedia.org/wiki/Milano-Sanremo_2014","https://it.wikipedia.org/wiki/Milano-Sanremo_2013","https://it.wikipedia.org/wiki/Milano-Sanremo_2015","https://it.wikipedia.org/wiki/Milano-Sanremo_2011","https://it.wikipedia.org/wiki/Milano-Sanremo_2010"]]
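The Wikipedia response is a four-element JSON array: the query, the titles, one description per title and one URL per title. A minimal Python sketch (not part of the slides) pairs titles with URLs; the sample is shortened and its descriptions are replaced by placeholders.

```python
import json

# Shortened sample; the two descriptions are placeholders, not real data.
raw = (
    '["milano",'
    '["Milano","Milano-Sanremo"],'
    '["(description 1)","(description 2)"],'
    '["https://it.wikipedia.org/wiki/Milano",'
    '"https://it.wikipedia.org/wiki/Milano-Sanremo"]]'
)

query, titles, descriptions, urls = json.loads(raw)
title_to_url = dict(zip(titles, urls))  # same position = same article
print(title_to_url)
```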
![Page 31: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/31.jpg)
SCRAPING (GOOGLE) SERPs
![Page 32: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/32.jpg)
Target #2 – Google SERP
![Page 33: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/33.jpg)
XPath identification
Step 1
//h3[@class='r']/a
![Page 34: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/34.jpg)
Extracting the href attribute
Step 2
=XPathOnUrl("https://www.google.it/search?q=%q&hl=it&tbs=lr:lang_1it,qdr:a&prmd=ivns&num=10&source=lnt";"(//h3[@class='r']/a)[1]";"href")
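To see what the XPath selects, here is a small Python sketch (not part of the slides) that runs it on a hypothetical, simplified stand-in for a results page. Real result pages are not well-formed XML and their markup changes over time, so this only illustrates the selection logic.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for a results page.
page = ET.fromstring(
    "<html><body>"
    "<h3 class='r'><a href='http://example.com/first'>First</a></h3>"
    "<h3 class='r'><a href='http://example.com/second'>Second</a></h3>"
    "<h3 class='other'><a href='http://example.com/ad'>Ad</a></h3>"
    "</body></html>"
)

# Same selection as //h3[@class='r']/a, written in ElementTree's XPath subset.
links = page.findall(".//h3[@class='r']/a")
hrefs = [a.get("href") for a in links]

# XPath positions start at 1, Python lists at 0: result [1] is hrefs[0].
first_href = hrefs[0]
print(first_href)
```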
![Page 35: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/35.jpg)
Target #3 – Google Cache
![Page 36: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/36.jpg)
http://webcache.googleusercontent.com/search?hl=it&q=cache:http://www.miosito.it
Step 1
![Page 37: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/37.jpg)
=RegexpFindOnUrl("http://webcache.googleusercontent.com/search?hl=it&q=cache%3Ahttp://www.giuseppepastore.com";"cache di Google di(.*)</a>\.(\s)")
Step 2
cache di Google di <a href="http://www.giuseppepastore.com" dir="ltr">http://www.giuseppepastore.com</a>.
=RegexpFindOnUrl("http://webcache.googleusercontent.com/search?hl=it&q=cache%3Ahttp://www.giuseppepastore.com";" visualizzata il(.*)GMT ")
visualizzata il 16 nov 2015 14:29:53 GMT
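The two patterns can be tried on the matched text itself. In this Python sketch (not part of the slides) the snippet is a hypothetical fragment assembled from the two outputs shown above, not a real cache page, and the `href` pattern is an extra assumption added to pull out the bare URL.

```python
import re

# Hypothetical fragment built from the two outputs shown on the slide.
snippet = (
    'cache di Google di <a href="http://www.giuseppepastore.com" dir="ltr">'
    'http://www.giuseppepastore.com</a>. '
    'visualizzata il 16 nov 2015 14:29:53 GMT '
)

# Pattern from the first formula: everything between the label and the closing </a>.
cached = re.search(r'cache di Google di(.*)</a>\.(\s)', snippet).group(1)
cached_url = re.search(r'href="([^"]+)"', cached).group(1)

# Pattern from the second formula: the snapshot timestamp before "GMT".
snapshot = re.search(r' visualizzata il(.*)GMT ', snippet).group(1).strip()

print(cached_url, snapshot)
```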
![Page 38: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/38.jpg)
Conclusions
Google Suggest
Bing Suggest
Google Image Suggest
YouTube Suggest
Amazon Suggest
Wikipedia Suggest
(What-Ever-You-Want Suggest, as long as you can query a URL)
![Page 39: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/39.jpg)
Conclusions
Google SERPs
Google Cache
(What-Ever-You-Want from any web page)
![Page 40: SEO scraping with Excel (Google suggest and more)](https://reader034.vdocuments.us/reader034/viewer/2022042722/58aaf1281a28abc73a8b63cb/html5/thumbnails/40.jpg)
Thank you!
Giuseppe Pastore
@Zen2SEO