Growth Hackers Toronto: Web Scraping & Lead Generation
Growth Hacking Toronto
Connect with us:
@GrowthCollectiv @Dean_h_wang @FundThrough @TopHat @EventMobi @GuruLink
N: EventMobi-Guest
P: emguests
#GrowthHackingTO
Web Scraping and Lead Generation
By: Dean Wang
November 17, 2015
About Me
Dean Wang
Data Intelligence Manager at Influitive
@dean_h_wang
linkedin.com/in/deanwang1
Outline
• Data Nightmares
• What is web scraping?
• Learning web scraping
• Demo setup
• Legal stuff and prevention
• More to explore
• Demo results
Data Nightmares
Is your database a strength or a weakness?
Dude, where are my prospects?
Distinguishing Apples and Oranges
Hi [firstname],
Lead Generation
Data Refresh/Appending
Incomplete entries
• Contact information
• Segmentation
Outdated entries
• Person has left the company
• Company has grown or shrunk
• Company was acquired
Innovative Marketing Initiatives
Personalize communications with targeted prospects
Target prospects with strong signals
More possibilities for ongoing campaigns
Traditional Solutions
Manual data entry
• Time and resource intensive
List buying
• Quality problems
New Solutions
Web Scraping (the subject of this talk!)
Accessing a website’s API
• Not always available
What is Web Scraping?
The World Wide Web
• Full of information, directories
• Some webpages are semi-structured
• Web scraping can take advantage of this structure
“Web Scraping” as an Intern
What an intern or scraper would do:
• Find a list of useful pages
• Click through each page
• Note down the useful information
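The three steps above can be sketched in Python roughly as follows. This is an illustration only, not the demo from the talk: the `PAGES` dictionary, its URLs, and the `fetch_page` helper are hypothetical stand-ins so the example runs without network access. A real scraper would download each page instead, for instance with `urllib.request.urlopen`.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical stand-in for a small directory site (not from the talk).
PAGES = {
    "https://example.com/companies/acme":
        "<html><body><h3>Acme Corp</h3><p>Toronto</p></body></html>",
    "https://example.com/companies/globex":
        "<html><body><h3>Globex</h3><p>Ottawa</p></body></html>",
}


def fetch_page(url):
    """Return the HTML for a URL. Assumption: reads from PAGES, not the network."""
    return PAGES[url]


class HeadingAndCity(HTMLParser):
    """Note down the text of the first <h3> and the first <p> on a page."""

    def __init__(self):
        super().__init__()
        self.current = None  # wanted tag we are currently inside, if any
        self.found = {}      # tag name -> collected text

    def handle_starttag(self, tag, attrs):
        if tag in ("h3", "p") and tag not in self.found:
            self.current = tag

    def handle_data(self, data):
        if self.current:
            self.found[self.current] = self.found.get(self.current, "") + data

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None


def scrape(urls):
    rows = []
    for url in urls:                  # step 1: the list of useful pages
        parser = HeadingAndCity()
        parser.feed(fetch_page(url))  # step 2: "click through" each page
        parser.close()
        rows.append({                 # step 3: note down the useful information
            "url": url,
            "company": parser.found.get("h3", "").strip(),
            "city": parser.found.get("p", "").strip(),
        })
    return rows


def to_csv(rows):
    """Write the notes as CSV text, ready to import into a spreadsheet or CRM."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["url", "company", "city"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


rows = scrape(list(PAGES))
print(to_csv(rows))
```

The parser here assumes every page puts the company name in an `<h3>` and the city in a `<p>`, which is exactly the kind of regular structure a scraper depends on.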
Learning Web Scraping
The Structure of Webpages
HTML tags “mark up” pages
<tag>
<div>
<span>
<a href="www.influitive.com">
With a few exceptions, each opening tag has a closing tag
<div></div>
<span></span>
<h3></h3>
<a href="www.influitive.com"></a>
Tags can contain text or even other tags within them
<div>This text is in the div element</div>
<nested>
  <tag></tag>
</nested>
<a href="www.influitive.com">Influitive</a>
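To make the idea of tags containing text and other tags concrete, here is a minimal sketch that reads a snippet built from the slide's own examples and pulls out each link's `href` and text. It uses Python's standard-library `html.parser`; the `LinkCollector` class name is made up for this illustration and is not from the talk.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> we are inside, or None
        self._text = []

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag; attrs is a list of (name, value) pairs.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Called for the text between tags.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Called for every closing tag.
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


html = (
    "<div>This text is in the div element "
    '<a href="www.influitive.com">Influitive</a></div>'
)
collector = LinkCollector()
collector.feed(html)
collector.close()
print(collector.links)
```

The parser reports opening tags, text, and closing tags as separate events, which mirrors the open/close structure described on the slides.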
CSS Selectors
CSS selectors let you specify exactly which tags you want to select
An advanced technique that gives you more control over scraping
e.g. div, div a, h3.class
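The three selector shapes mentioned here (`div`, `div a`, `h3.class`) can be illustrated with a toy matcher built on the standard library. This is a deliberately limited sketch: it handles only a tag name, a tag with one class, and the descendant combinator (a space), and it assumes well-formed HTML. The `select` function, the `SimpleSelector` class, and the sample markup are invented for illustration. Libraries such as Beautiful Soup support CSS selectors directly and would be the practical choice.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # tags with no closing tag


def parse_step(step):
    """Split 'h3.name' into ('h3', 'name'); a bare 'div' gives ('div', None)."""
    tag, _, cls = step.partition(".")
    return tag, (cls or None)


class SimpleSelector(HTMLParser):
    def __init__(self, selector):
        super().__init__()
        self.steps = [parse_step(s) for s in selector.split()]
        self.stack = []    # open elements, outermost first: (tag, classes)
        self.active = []   # (depth, text parts) for matched elements still open
        self.results = []  # text of each matched element, in closing order

    @staticmethod
    def _step_matches(step, element):
        tag, cls = step
        el_tag, el_classes = element
        return tag == el_tag and (cls is None or cls in el_classes)

    def _stack_matches(self):
        # The last step must match the element that was just opened ...
        if not self._step_matches(self.steps[-1], self.stack[-1]):
            return False
        # ... and the earlier steps must match its ancestors, in order.
        remaining = self.steps[:-1]
        i = 0
        for ancestor in self.stack[:-1]:
            if i < len(remaining) and self._step_matches(remaining[i], ancestor):
                i += 1
        return i == len(remaining)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append((tag, classes))
        if self._stack_matches():
            self.active.append((len(self.stack), []))

    def handle_data(self, data):
        for _, parts in self.active:
            parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack or self.stack[-1][0] != tag:
            return
        depth = len(self.stack)
        self.stack.pop()
        if self.active and self.active[-1][0] == depth:
            _, parts = self.active.pop()
            self.results.append("".join(parts).strip())


def select(html, selector):
    parser = SimpleSelector(selector)
    parser.feed(html)
    parser.close()
    return parser.results


SAMPLE = (
    '<div class="card"><h3 class="name">Acme Corp</h3>'
    '<a href="/acme">Profile</a></div>\n'
    '<p><a href="/about">About</a></p>'
)
print(select(SAMPLE, "div a"))    # links inside a div only
print(select(SAMPLE, "h3.name"))  # h3 elements with class "name"
print(select(SAMPLE, "a"))        # every link
```

The narrower the selector, the less unrelated markup gets picked up: `a` matches both links in the sample, while `div a` matches only the one inside the card.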
Structure and Web Scraping
Web scraping takes advantage of HTML tags
The tags must be reasonably consistent from page to page for web scraping to work
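One way to see why regularity matters is to check each page for the tags the scraper expects before extracting anything. The sketch below is a hypothetical example: the page contents, the `missing_tags` helper, and the choice of `h3` and `a` as required tags are all assumptions made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times each opening tag appears on a page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def missing_tags(html, required=("h3", "a")):
    """Return the required tags that never appear in the page."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return [tag for tag in required if counter.counts[tag] == 0]


# Hypothetical pages: the third one uses a different layout.
pages = {
    "page-1": '<div><h3>Acme Corp</h3><a href="/acme">Profile</a></div>',
    "page-2": '<div><h3>Globex</h3><a href="/globex">Profile</a></div>',
    "page-3": '<section><h2>Initech</h2><a href="/initech">Profile</a></section>',
}

irregular = {}
for name, html in pages.items():
    missing = missing_tags(html)
    if missing:
        irregular[name] = missing

print(irregular)
```

Pages flagged this way would need their own extraction rules or manual review, since a scraper written for the common layout would silently return nothing for them.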
Demo
Fingers crossed!
Legal Stuff and Prevention
Legal Stuff
Is it legal?
• A site’s terms of use often forbid it
• Information may be otherwise publicly available
• Case of QVC v. Resultly
• A gray area in general
Preventing Web Scraping
• robots.txt
• Restrictions on number of page views allowed
• Blacklisting IP addresses
• CAPTCHAs
• Slight variations in website design
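From the scraper's side, the first item in this list can be checked programmatically. The sketch below uses Python's standard-library `urllib.robotparser` against an in-memory robots.txt. The rules, the URLs, and the `demo-bot` user agent are made up for illustration; a real scraper would load the site's actual file with `set_url()` and `read()`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (not taken from any real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def allowed(url, user_agent="demo-bot"):
    """True if the robots.txt rules permit this user agent to fetch the URL."""
    return rules.can_fetch(user_agent, url)


candidate_urls = [
    "https://example.com/companies",
    "https://example.com/private/list",
]
polite_urls = [url for url in candidate_urls if allowed(url)]

# Seconds the site asks crawlers to wait between requests (0 if unspecified).
delay = rules.crawl_delay("demo-bot") or 0

print(polite_urls, delay)
```

Waiting `delay` seconds between requests (for example with `time.sleep`) also helps stay under page-view limits and makes IP blacklisting less likely, though robots.txt compliance does not by itself settle the legal questions raised earlier.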
More to Explore
Links and Tools
Some Useful Links
http://www.w3schools.com/html/
http://scraping.pro/
https://blog.hartleybrody.com/web-scraping/
Different Web Scraping Tools
Chrome Extensions
• Data Miner/Scraper
• Webscraper

• ScraperWiki
• Mozenda

Knowledge of Python required
• Scrapy
• Selenium + Beautiful Soup
Demo Results
Fingers crossed!
Questions?
Thank You