How to clean up travel website traffic from bots and spammers

Cleaning up website traffic from bots & spammers TLearn webinar 24 February 2015

Upload: tnooz

Post on 14-Jul-2015


TRANSCRIPT

Cleaning up website traffic from bots & spammers

TLearn webinar

24 February 2015

Your hosts

Kevin May

Editor & Moderator

Tnooz

Nick Vivion

Reporter & Global Events Lead

Tnooz

Speakers

Rami Essaid

Distil Networks

CEO & Co-founder

Rob Gennaro

Red Label Vacations

Digital Marketing Officer

Poll no. 1

Has your website been a victim of web scraping and/or bad bot attacks in the last 12 months?

Poll no. 2

Do you currently have a bot defense solution in place?

Agenda

The growing bot problem

Web scraping bots and your data

Is web scraping legal?

Red Label Vacations journey to clean traffic

Selection criteria for a bot detection solution

Distil Networks overview

How Big is the Problem?

Up to 38% of traffic on travel websites comes from bad bots

4.2 million IP addresses impacted by “Pushdo” botnet alone

15% bot traffic can equate to hitting each of your pricing pages 30 times per month

Why the Massive Increase in Bot Traffic?

Online data has increased in value

Pricing, incentive packages, flight routes, reviews, images, star ratings, availability, hotel names/descriptions, and editorial are changing daily

Anyone can get in the game

Cheap or free virtual servers, bandwidth, easy-to-use tools, and scrapers for hire

Bots no longer tied to IP addresses

Bots cycle through random IP addresses

Bots hide behind anonymous proxies

Consumer IPs now infected with bot traffic too

What Is Web Scraping?

Web Scraping

Also known as screen scraping, web scraping is the act of copying large amounts of data from a website, either manually or with an automated program (bot)

Legitimate Scraping

Scraping can sometimes be benevolent and totally acceptable, for example when search engine bots index your website

Malicious Scraping

A systematic theft of intellectual property accessible on a website, including pricing, content, images, and proprietary data

What Are Scrapers Doing with Your Travel Site?

Posting your content on competitor sites

Scrapers steal your traffic and advertising dollars. Duplicate content and high bounce rates diminish your SEO

Undermining your prices

Bots monitor your prices, ensuring competitors can undercut with lower price listings

Executing searches on your site

The resulting API calls to third parties can cost you

Bots Impact Your Website and Bottom Line

○ Cause brownouts

○ Undermine pricing strategies

○ Damage the human visitor’s experience

○ Negatively impact revenues

○ Waste advertising spend

○ Lower site quality score and hurt SEO

○ Hijack accounts

Bots Amplify Website Security Breaches

Is Web Scraping Legal?

Is the Legal Route Effective?

Hard to prosecute scrapers

No easy way to detect or identify stolen data in derivatives

Legal route is too expensive

Travel website’s legal bill for one “scraper” > $10M

Copyrights and terms of use don’t have teeth

Easy for thieves to assert plausible deniability

Big Brands Already at Risk

About Red Label Vacations

○ Largest independent travel company in Canada

○ 19 brands

○ Deals on vacations, flights, cruises, hotels and car rentals

Canada’s premier online travel agency service offering cheap flights, airline tickets, last minute vacation packages and discounted cruises

Red Label Vacations Complex IT Infrastructure

Complex IT Infrastructure

○ Total of 19 different web properties

○ 5 servers for RedTag.ca

○ 5 servers for ITA (for flight technology)

○ API calls into ITA, Sabre, Softvoyage, Hotels.com, Cartrawler

○ Mixed web infrastructure environments (outsourced hosting, owned data centers)

○ Mix of web application stacks (e.g., .NET, PHP)

○ Akamai CDN

Red Label Vacations Bot Challenges

Bot Challenges

○ Homegrown IP blocking system wasn’t working

○ Bots came in through proxies; IP addresses were spoofed

○ Bots caused brownouts

○ Brownouts caused immediate loss of revenue ($1000s)

○ Bots can hurt Google quality score and SEO

○ Akamai CDN was difficult to manage

Red Label Vacations Selection Criteria

Bot Detection and Mitigation Solution Requirements

○ Block web scrapers without impacting human visitors

○ Accurately identify good bots vs. bad bots

○ Increase website availability and speed

○ Detect automated browsing tools

○ Simple setup

○ Little or no maintenance, “self-optimizing” solution

○ Reduce costs and complexity of Akamai

Red Label Vacations Results with Distil

Improved Website Performance with Distil

○ Uptime went from 99.6% to 99.9%

○ Faster load times; no errors

○ User time on site increased; bounce rate decreased

○ Detailed reporting distinguishes human visitors from malicious bots
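To put the uptime figures in concrete terms, the percentages can be converted into hours of downtime. The short Python sketch below assumes a 30-day month (an assumption, not a figure from the webinar); under that assumption, 99.6% uptime is roughly 2.9 hours of downtime per month and 99.9% is roughly 0.7 hours.

```python
# Assumption: a 30-day month; the slides do not state the measurement period.
HOURS_PER_MONTH = 30 * 24


def downtime_hours(uptime_percent: float, period_hours: float = HOURS_PER_MONTH) -> float:
    """Return the hours of downtime implied by an uptime percentage."""
    return period_hours * (100.0 - uptime_percent) / 100.0


before = downtime_hours(99.6)  # uptime reported before the change
after = downtime_hours(99.9)   # uptime reported after the change
print(f"before: {before:.2f} h/month, after: {after:.2f} h/month")
```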

Red Label Vacations Results with Distil

Monthly Cost Savings with Distil

○ 65% less expensive than Akamai

○ Reduced costs for third party API calls

○ Cost savings due to improved uptime

○ Eliminated tax on internal teams

Red Label Vacations Traffic Overview

Turing Tested, No False Positives

Visibility into Bot-Laden Advertising Networks

Selection Criteria: Purpose-Built Solution

Bot Detection is a New Category, NOT a Feature

○ NOT a Content Delivery Network (CDN)

○ NOT a Distributed Denial of Service (DDoS) protection solution

○ NOT a Web Application Firewall (WAF)

○ NOT a simple IP list or set of scripts

A purpose-built bot detection solution is always updating and evolving

Selection Criteria: Complete Protection

Internal Teams Catch 20%

IP block

User agent testing

IP analysis

User agent testing

JavaScript test

Cookie

Selenium test

Browser rate limiting

Automated browser

PhantomJS

Machine learning

IP cycling

A Purpose-Built Solution Should Catch 99.9%

Selection Criteria: No Impact on Human Visitors

IP-Based/WAF vs. Purpose-Built

Selection Criteria: Accuracy

Inline Fingerprinting

Fingerprints stick to the bot even if it attempts to reconnect from random IP addresses or hide behind an anonymous proxy
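The idea of a fingerprint that does not depend on the IP address can be illustrated with a minimal Python sketch. This is not Distil's method, which the slides do not describe; it simply hashes a few request headers (the header list and the `record` helper are assumptions made for illustration) so that the same client keeps the same identifier when its IP changes. A production system would likely combine many more signals.

```python
import hashlib
from collections import defaultdict

# Assumption: these headers are a stand-in for whatever signals a real product collects.
FINGERPRINT_HEADERS = ("user-agent", "accept", "accept-language", "accept-encoding")

# Maps each fingerprint to the set of IP addresses it has been seen from.
ips_by_fingerprint = defaultdict(set)


def fingerprint(headers: dict[str, str]) -> str:
    """Derive a short, IP-independent identifier from selected request headers."""
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    parts = [f"{name}={normalized.get(name, '')}" for name in FINGERPRINT_HEADERS]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()[:16]


def record(ip: str, headers: dict[str, str]) -> str:
    """Record a request (hypothetical helper) and return the client's fingerprint."""
    fp = fingerprint(headers)
    ips_by_fingerprint[fp].add(ip)
    return fp


scraper_headers = {"User-Agent": "ExampleScraper/1.0", "Accept": "*/*"}
first = record("192.0.2.10", scraper_headers)
second = record("198.51.100.7", scraper_headers)  # same client, new IP address
```

Here `first` and `second` are equal, and `ips_by_fingerprint[first]` holds both addresses, which is the signal an IP-only block list would miss.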

Known Violators Database

Real-time updates from a Known Violators Database, which is based on the collective intelligence of all protected sites

Behavioral Modeling and Machine Learning

Machine-learning algorithms pinpoint behavioral anomalies specific to your site’s unique traffic patterns
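As a rough illustration of spotting behavior that is anomalous relative to a site's own traffic, the Python sketch below flags sessions whose page count is far above the site's median. It is a simple statistical outlier test (a modified z-score based on the median absolute deviation), not a machine-learning model and not the vendor's algorithm; the function name, the session data, and the 3.5 cutoff are assumptions chosen for the example.

```python
import statistics


def flag_outlier_sessions(pages_per_session: dict[str, int], cutoff: float = 3.5) -> list[str]:
    """Return session IDs whose page count is unusually high for this site."""
    counts = list(pages_per_session.values())
    if len(counts) < 3:
        return []  # too little data to form a baseline
    median = statistics.median(counts)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = statistics.median([abs(count - median) for count in counts])
    if mad == 0:
        return []  # all sessions look alike; nothing to compare against
    return [
        session_id
        for session_id, count in pages_per_session.items()
        if 0.6745 * (count - median) / mad > cutoff
    ]


# Hypothetical traffic: four ordinary sessions and one that requests 400 pages.
sessions = {"a": 4, "b": 5, "c": 6, "d": 5, "e": 400}
suspects = flag_outlier_sessions(sessions)
```

Because the baseline is computed from the site's own sessions, the same page count could be normal on one site and suspicious on another.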

Selection Criteria: Accuracy

Browser Automation Tool Detection

JavaScript Validation on the connection stream identifies browser automation tools

Advanced Rate Limiting

Set rate limits such as pages per minute, pages per session, and session length
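The three limits named above (pages per minute, pages per session, and session length) could be enforced along the lines of the following Python sketch. The class names, default values, and the choice to pass the current time in explicitly are all assumptions made for illustration; they are not taken from any product.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SessionLimits:
    pages_per_minute: int = 30          # assumed default
    pages_per_session: int = 200        # assumed default
    max_session_seconds: float = 3600.0  # assumed default


@dataclass
class SessionState:
    started_at: float
    total_pages: int = 0
    recent: deque = field(default_factory=deque)  # timestamps within the last minute


class RateLimiter:
    def __init__(self, limits: SessionLimits):
        self.limits = limits
        self.sessions: dict[str, SessionState] = {}

    def allow(self, session_id: str, now: float) -> bool:
        """Return True if this page request stays within all three limits."""
        state = self.sessions.setdefault(session_id, SessionState(started_at=now))
        # Discard page views that fell out of the 60-second sliding window.
        while state.recent and now - state.recent[0] >= 60.0:
            state.recent.popleft()
        if now - state.started_at > self.limits.max_session_seconds:
            return False  # session has run too long
        if state.total_pages >= self.limits.pages_per_session:
            return False  # too many pages in this session
        if len(state.recent) >= self.limits.pages_per_minute:
            return False  # too many pages in the last minute
        state.recent.append(now)
        state.total_pages += 1
        return True


limiter = RateLimiter(SessionLimits(pages_per_minute=3, pages_per_session=5, max_session_seconds=600))
results = [limiter.allow("s1", t) for t in (0, 1, 2, 3, 61, 62, 130)]
```

With these small limits, the fourth request is refused by the per-minute limit and the last one by the per-session limit. Rejected requests are not counted, which is a design choice; a stricter variant could count them too.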

“Good Bot” Authentication

Validate that good bot requests (Google, Bing, etc.) map to the correct user agent and IP range
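A check that a request claiming to be a search engine bot actually comes from that engine's IP range might look like the Python sketch below. The ranges shown are reserved documentation addresses used as placeholders, not the real ranges of any search engine; an actual deployment would need to load the ranges each search engine publishes. The function name and return labels are also assumptions.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks), NOT real search engine ranges.
GOOD_BOT_RANGES = {
    "googlebot": [ipaddress.ip_network("192.0.2.0/24")],
    "bingbot": [ipaddress.ip_network("198.51.100.0/24")],
}


def classify_bot_claim(user_agent: str, ip: str) -> str:
    """Classify a request as 'verified', 'impostor', or 'unclaimed'."""
    ua = user_agent.lower()
    address = ipaddress.ip_address(ip)
    for token, networks in GOOD_BOT_RANGES.items():
        if token in ua:
            # The user agent claims to be this bot; the IP must fall in its ranges.
            if any(address in network for network in networks):
                return "verified"
            return "impostor"
    return "unclaimed"  # the request does not claim to be a known good bot


claimed_ua = "Mozilla/5.0 (compatible; Googlebot/2.1)"
genuine = classify_bot_claim(claimed_ua, "192.0.2.10")
spoofed = classify_bot_claim(claimed_ua, "203.0.113.5")
```

A request labeled "impostor" is a strong bad-bot signal, since the client has deliberately copied a good bot's user agent.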

How Travel Companies Benefit from Distil

Increase insight & control over human, good bot & bad bot traffic

Block 99.9% of malicious bots without impacting legitimate users

Slash the high tax bots place on internal teams & web infrastructure

Protect data from web scrapers, unauthorized aggregators & hackers

www.distilnetworks.com/trial/

Promo Code: TLearn15

Offer Ends March 10th

One Month of Free Service + Traffic Analysis

Thank you!

Send your questions and comments to

[email protected]

Replay and presentation of webinar will be available on

www.tnooz.com