Turning the Tide: Curbing Deceptive Yelp Behaviors
June 2018 @ UCAS
Dongwon Lee
Penn State / IST
SIAM SDM 2014
Review-Centric Social Networks
[Figure: users, reviews, and social networks]
Malicious Behaviors: Fraudulent Reviews
- Social networks: ideal targets for malicious behaviors
- Up to 25% of Yelp reviews are fraudulent [1]
- Yelp:
  - An extra half-star rating causes a restaurant to sell out 19% more often [2]
  - A one-star increase leads to a 5–9% increase in revenue [3]
[1] Yelp admits a quarter of submitted reviews could be fake. BBC, www.bbc.co.uk/news/technology-24299742
[2] Michael Anderson and Jeremy Magruder. Learning from the crowd: Regression discontinuity estimates of the effects of an online review database. Economic Journal, 122(563):957–989, 2012.
[3] Michael Luca. Reviews, Reputation, and Revenue: The Case of Yelp.com. Available at hbswk.hbs.edu/item/6833.html.
Malicious Behaviors: Review Campaigns
- Review campaign: post multiple fraudulent reviews
- Example: Search Engine Optimization (SEO) companies [1]
  - Use IP spoofing techniques
  - Set up fake online profiles
  - Target Yelp, Google Local, CitySearch
  - Investigated by law enforcement
- Deceptive venue: uses review campaigns to alter its rating
[1] A.G. Schneiderman Announces Agreement With 19 Companies To Stop Writing Fake Online Reviews And Pay More Than $350,000 In Fines. Available at: http://www.ag.ny.gov/press-release/ag-schneiderman-announces-agreement-19-companies-stop-writing-fake-online-reviews-and
Feasibility Study
- Created fake venues
- Posted review jobs on Amazon Mechanical Turk
- Received more than 90 (fake) reviews
… for a fistful of dollars
Problem Statement
- Detect malicious behaviors in review-centric social networks
  - Fraudulent reviews
  - Deceptive venues
  - Impactful review campaigns
Adversary Model
- Needs to adjust the rating of a target venue
- Has a finite budget
- Controls a finite set of (IP address, Yelp Sybil account) pairs
- Has access to a market of review writers
- Social network provider does not collude with attackers
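Marco System Overview
[Figure: Marco architecture. The input data (7,435 venues, 195,417 users, 270,121 reviews) covers users, venues, friend relations, spatial vicinity, and reviews over time. The RSD module works on the venue timeline and the ARD module on review ratings. The FRI module relies on a review classifier, trained on labeled fraudulent and genuine reviews, with features such as (1) friend and review count, (2) venue "expertise", (3) venue activities, (4) …. The resulting features feed a venue classifier trained on labeled deceptive and legitimate venues.]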
Successful Review Campaign
- Increases (decreases) the rating of the target venue by at least half a star
- Claim: The minimum number of reviews the adversary needs to post in order to fraudulently increase the rating of a venue by half a star is n/7
  - n: the number of genuine reviews a venue has at the completion of the campaign
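The n/7 figure can be motivated with a short calculation. The sketch below rests on assumptions that are not stated on the slide: ratings run from 1 to 5 stars, the adversary posts only 5-star reviews, and the venue's n genuine reviews average `avg` stars. Solving (n·avg + 5q)/(n + q) ≥ avg + 0.5 for q gives q ≥ n/(9 − 2·avg), which is smallest, n/7, when avg = 1. The function name `min_fraudulent_reviews` is hypothetical.

```python
import math
from fractions import Fraction


def min_fraudulent_reviews(n: int, avg) -> int:
    """Fewest 5-star reviews needed to raise an average of `avg` by half a star.

    Assumes a 1-5 star scale and that every fraudulent review is 5 stars;
    both are illustrative assumptions, not taken from the slides.
    """
    avg = Fraction(avg)
    if not 1 <= avg < Fraction(9, 2):
        raise ValueError("avg must be at least 1 and below 4.5 stars")
    # From (n*avg + 5q) / (n + q) >= avg + 1/2  =>  q >= n / (9 - 2*avg)
    return math.ceil(Fraction(n) / (9 - 2 * avg))


# Worst case for the defender: a 1-star venue needs only n/7 reviews.
print(min_fraudulent_reviews(70, 1))  # 70 / 7
print(min_fraudulent_reviews(70, 3))  # 70 / 3, rounded up
```

Higher starting averages need more fraudulent reviews, so n/7 acts as a lower bound across all starting ratings under these assumptions.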
Review Spikes
Theorem: If n > 49, a successful review campaign will exceed, during the attack interval, the maximum number of reviews of a uniform review distribution.
Review Spike Detection (RSD)
- Identify venues that receive a higher number of positive (negative) reviews than normal
- Use the measures of dispersion of Box-and-Whisker plots to detect outliers
- Two features
  - Number of spikes detected for a venue
  - Normalized amplitude of the highest spike
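The box-and-whisker idea can be sketched in a few lines. This is a minimal illustration, not Marco's implementation: it assumes reviews are bucketed into per-day counts, treats any count above the upper whisker Q3 + 1.5·IQR as a spike, and normalizes the highest spike by the mean daily count. The function name `spike_features`, the bucketing, and the normalization are all assumptions made for this example.

```python
import statistics


def spike_features(daily_counts: list[int]) -> tuple[int, float]:
    """Return (number of spikes, normalized amplitude of the highest spike).

    A spike is a count above the upper whisker Q3 + 1.5 * IQR.
    Normalizing by the mean count is an assumption of this sketch.
    Needs at least two data points for the quartile computation.
    """
    q1, _, q3 = statistics.quantiles(daily_counts, n=4)
    upper_whisker = q3 + 1.5 * (q3 - q1)
    spikes = [c for c in daily_counts if c > upper_whisker]
    if not spikes:
        return 0, 0.0
    return len(spikes), max(spikes) / statistics.fmean(daily_counts)


# Hypothetical venue timeline: a quiet week followed by a burst of 20 reviews.
counts = [1, 2, 1, 2, 1, 2, 1, 20]
num_spikes, amplitude = spike_features(counts)
print(num_spikes, round(amplitude, 2))
```

Running the same function separately on positive and negative review counts would yield spike features for both directions of a campaign.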
Aggregate Rating Disparity (ARD) Module
- ARD module: Measure the review divergence
  - N: total number of reviews of venue V
ARD(V) = \frac{1}{N} \sum_{i=1}^{N} \left| \mathrm{ReviewRating}_i - \mathrm{AvgRating}(V) \right|
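One plausible reading of the rating-divergence measure is the mean absolute deviation of each review's rating from the venue's average rating. The slide does not make clear whether the average is taken over all reviews or only over those posted before each review, so the sketch below assumes the overall average. The function name `aggregate_rating_disparity` is hypothetical.

```python
def aggregate_rating_disparity(ratings: list[float]) -> float:
    """Mean absolute difference between each rating and the venue's average.

    Uses the overall average of the venue's N ratings, which is an
    assumption of this sketch rather than something the slides confirm.
    """
    if not ratings:
        return 0.0
    avg = sum(ratings) / len(ratings)
    return sum(abs(r - avg) for r in ratings) / len(ratings)


# A venue with uniform ratings has no disparity; a polarized one has a lot.
print(aggregate_rating_disparity([5, 5, 5, 5]))
print(aggregate_rating_disparity([1, 1, 5, 5]))
```

A venue whose genuine low ratings are countered by a burst of fraudulent 5-star reviews would tend to score high on this measure.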
Fraudulent Review Impact (FRI) Module
- Venues with few genuine reviews
- Vulnerable to review campaigns
- Long-term campaigns can re-define the "normal" review posting behavior
- FRI module: detect fraudulent reviews that significantly impact the aggregate rating of venues
FRI Module (Cont'd)
Features to classify a review (fraudulent vs. genuine):
- Review writer
  - Number of friends
  - Number of reviews written
  - Expertise of user around venue
  - Number of check-ins at venue
  - Number of photos at venue
  - Age of user's account when review was posted
  - Feedback count of review
- FRI feature:
  - Percentage of reviews classified as fraudulent
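The feature list above maps naturally onto a per-review record plus one venue-level aggregate. The sketch below is illustrative only: the class `ReviewFeatures`, its field names, and the units (for example, account age in days, expertise as a count) are assumptions, and the classifier itself is left out. `fri_feature` computes the percentage of a venue's reviews that some classifier has labeled fraudulent.

```python
from dataclasses import astuple, dataclass


@dataclass
class ReviewFeatures:
    # Field names and units are hypothetical; the slides list only the concepts.
    friends: int
    reviews_written: int
    expertise_near_venue: int
    checkins_at_venue: int
    photos_at_venue: int
    account_age_days: int
    feedback_count: int


def fri_feature(is_fraudulent: list[bool]) -> float:
    """Percentage of a venue's reviews that were classified as fraudulent."""
    if not is_fraudulent:
        return 0.0
    return 100.0 * sum(is_fraudulent) / len(is_fraudulent)


# A made-up review by a brand-new account with no friends or check-ins.
vector = astuple(ReviewFeatures(0, 1, 0, 0, 0, 3, 0))
print(vector)
print(fri_feature([True, False, False, True]))
```

The tuple form is what a standard classifier would consume; the per-venue percentage then becomes a single input to venue-level classification.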
Review Data
- Gold standard fraudulent reviews
  - Spelp (spam Yelp) sites
  - Suspicious user accounts
  - Generic review text
- Gold standard genuine reviews
  - Written by active, popular users
  - No short, generic reviews
- 200 fraudulent and 202 genuine reviews
Review Classification
Overall accuracy: Random Forest (RF) 94%, Bagging 93.5%, Decision Tree (DT) 93%
Venue Classification Features
Features to classify venues:
- Number of review spikes for venue
- Amplitude of the highest spike
- Aggregate rating disparity
- Fraudulent review impact of venue
- Count of reviews classified fraudulent
- Rating of the venue
- Number of reviews (with check-ins & photos)
- Age of venue
Venue Data
- Deceptive venue: fraudulent reviews impact its rating
- Ground truth: Yelp's "Consumer Alert" feature
Venue Data (cont'd)
- Gold standard legitimate venues
  - Well known, consistent quality
  - At most 10% of reviews are filtered by Yelp
- 90 deceptive and 100 legitimate venues
Venue Classification
RF and DT are tied for best accuracy, 95.8%.
Comparison with state-of-the-art
Compare Marco with the three deceptive venue detection strategies of Feng et al. [1]: avg∆, distΦ and peak↑
Strategy    Accuracy (%)
Marco/RF    95.8
avg∆        66.3
distΦ       72.1
peak↑       58.9
Marco's Overhead
[Figures: per-module overhead; zoom-in of FRI module overhead]
Marco in the Wild: Yelp Data
- YCrawl: developed crawler to fetch raw HTML pages of Yelp venue and user accounts
- Collected:
  - 7,435 venues from San Francisco, New York City and Miami
  - Car shops, spas, moving companies
  - 270,121 reviews
  - 195,417 reviewers
Experimental Results on Live Data
City                 Car Shop    Mover        Spa
Miami, FL            1000 (6)    348 (8)      1000 (21)
San Francisco, CA    612 (59)    475 (45)     1000 (42)
NYC, NY              1000 (8)    1000 (27)    1000 (28)
Deceptive venues detected by Marco (in parentheses) out of the venues collected from Yelp
San Francisco: Marco flags almost 10% of car repair and moving companies as suspicious
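The "almost 10%" figure follows directly from the table. The short check below reads each cell as "venues collected (venues detected as deceptive)", which is how the table caption describes it; the variable names are made up for this example.

```python
# (collected, detected as deceptive) per city and category, copied from the table.
results = {
    ("Miami, FL", "Car Shop"): (1000, 6),
    ("Miami, FL", "Mover"): (348, 8),
    ("Miami, FL", "Spa"): (1000, 21),
    ("San Francisco, CA", "Car Shop"): (612, 59),
    ("San Francisco, CA", "Mover"): (475, 45),
    ("San Francisco, CA", "Spa"): (1000, 42),
    ("NYC, NY", "Car Shop"): (1000, 8),
    ("NYC, NY", "Mover"): (1000, 27),
    ("NYC, NY", "Spa"): (1000, 28),
}

# Percentage of collected venues that were flagged, per cell.
flagged_pct = {
    key: 100.0 * detected / collected
    for key, (collected, detected) in results.items()
}
total_collected = sum(collected for collected, _ in results.values())

for key, pct in flagged_pct.items():
    print(key, f"{pct:.1f}%")
print("total venues:", total_collected)
```

The San Francisco car shop and mover cells come out just under 10%, and the collected counts add up to the 7,435 venues reported for the crawl.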
Conclusions
- Lower bound on the number of reviews required to launch a successful review campaign
- Marco: automatic detection of fraudulent reviews, deceptive venues and impactful review campaigns
- Novel dataset of reviews and venues
- Marco is effective and fast