Auditing Algorithms : Towards Transparency in the Age of Big Data Christo Wilson Assistant Professor @ Northeastern University [email protected]

Post on 25-Dec-2021

TRANSCRIPT

Page 1: Auditing Algorithms : Towards Transparency in the Age of

Auditing Algorithms: Towards Transparency in the Age of Big Data

Christo Wilson
Assistant Professor @ Northeastern University
[email protected]

Page 2: Auditing Algorithms : Towards Transparency in the Age of

Personalization on the Web

Santa Barbara, California
Amherst, Massachusetts

Page 3: Auditing Algorithms : Towards Transparency in the Age of

Personalization is Ubiquitous

Search Results

Goods and Services

Music, Movies, Media

Social Media

Page 4: Auditing Algorithms : Towards Transparency in the Age of

Dangers of Personalization?

Page 5: Auditing Algorithms : Towards Transparency in the Age of

Racial Discrimination

Search: Chris Wilson
Looking for Chris Wilson? [Ad]
Find People Near You! www.yellowpages.com

Search: Trevon Jones
Trevon Jones, Arrested? [Ad]
Search Criminal Records, Sex Offender Registry, and More.
www.instantcheckmate.com

Racial bias in Google’s AdSense system uncovered by Latanya Sweeney in 2013

Example of unintended consequences of big data
• People exhibit racial bias in their search and click patterns
• The ad-placement algorithm observed and learned these behaviors

Page 6: Auditing Algorithms : Towards Transparency in the Age of

Price Discrimination

Showing users different prices
In econ: differential pricing

Example: Amazon in 2001
DVDs were sold for $3-4 more to some users

Surprisingly, not illegal in the US
Anti-Discrimination Act does not protect consumers

Article 20(2) of the Services Directive protects EU residents
But companies seem to be flouting the regulation :(

Headline: Websites Vary Prices, Deals Based on Users’ Information

Page 7: Auditing Algorithms : Towards Transparency in the Age of

Price Steering

Altering the order or composition of products
E.g. high priced items rank higher for some people

Example: Orbitz in 2012
Users received hotels in a different order when searching
Normal users: cheap hotels first; Mac users: expensive hotels first

Headline: On Orbitz, Mac Users Steered to Pricier Hotels

Page 8: Auditing Algorithms : Towards Transparency in the Age of

Auditing Algorithms

Governments and regulators are concerned about big data and algorithms
White House reports:
• Big Data: Seizing Opportunities, Preserving Values
• Big Data and Differential Pricing

FTC’s new Office of Technology Research and Investigation
Tasked with monitoring the applications of big data and algorithms

How do we measure and understand algorithms?
• Algorithms may be trade secrets, constantly changing
• Access to source code is not enough, data is equally important

Emerging scientific area: Auditing Algorithms

Page 9: Auditing Algorithms : Towards Transparency in the Age of

Goals of Our Work

1. Understanding how companies collect and share data about users
• Online and offline retailers
• Advertisers and marketers
• Data brokers like Acxiom, Datalogix, Equifax, Experian, etc…

2. Reverse-engineering online algorithms to assess their impact
• Search engines
• Online advertisements
• E-commerce
• Social networks
• etc…

Page 10: Auditing Algorithms : Towards Transparency in the Age of

Measuring Personalization
Case Study: Google Search
Case Study: E-commerce

Page 11: Auditing Algorithms : Towards Transparency in the Age of

Measuring Personalization
Case Study: E-commerce

Page 12: Auditing Algorithms : Towards Transparency in the Age of

Are All Differences Personalization?

[Figure: two result lists for the same search are compared; one shows Products 1, 2, 4, 3 and the other shows Products 2, 1, 3, 4. Personalization?]

Not necessarily! It could be:
• Updates to inventory/prices
• Tax/Shipping differences
• Distributed infrastructure
• Load-balancing

How can we reliably identify and quantify personalization?

Page 13: Auditing Algorithms : Towards Transparency in the Age of

Controlling for Noise

[Figure: clients at 129.10.115.14, 129.10.115.15, and 129.10.115.16 query the server at 74.125.225.67 and the returned product lists are compared]

• Queries run at the same time
• Same Amazon IP address
• IP addresses in the same /24

Differences that remain between identical queries are noise

Difference - Noise = Personalization
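The "Difference - Noise = Personalization" rule can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the choice to count differing results with a set symmetric difference, and the sample product IDs are all assumptions made for the example.

```python
def differing_results(page_a, page_b):
    """Count results that appear on one page but not on the other."""
    return len(set(page_a) ^ set(page_b))


def personalization(treatment, control_1, control_2):
    """Difference minus noise, floored at zero.

    noise:      disagreement between two identical control queries
    difference: disagreement between the treatment and a control
    """
    noise = differing_results(control_1, control_2)
    difference = differing_results(treatment, control_1)
    return max(0, difference - noise)


# Hypothetical result pages (product IDs are made up for illustration).
control_1 = ["p1", "p2", "p3", "p4"]
control_2 = ["p1", "p2", "p3", "p5"]   # differs from control_1 only by noise
treatment = ["p1", "p2", "p6", "p7"]   # e.g. a logged-in account

score = personalization(treatment, control_1, control_2)
```

Running the two control queries at the same time, against the same server IP, and from the same /24 is what keeps the noise term small enough for the subtraction to be meaningful.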

Page 14: Auditing Algorithms : Towards Transparency in the Age of

Dual Methodology

Questions we want to answer:
1. To what extent is content personalized?
2. What user features drive personalization?

REAL USER ACCOUNTS
• Leverage real user accounts with lots of history
• Measure personalization in real life

SYNTHETIC USER ACCOUNTS
• Create accounts that each vary by one feature
• Measure the impact of specific features

Page 15: Auditing Algorithms : Towards Transparency in the Age of

Real User Experiment

Task on Amazon Mechanical Turk (AMT)
• 1000s of participants
• Each executed hundreds of search queries

Every query paired with two control queries
• Run from empty accounts, i.e. no history
• Baseline results for comparison

[Figure: an HTTP Proxy sits between the user and the site; labels: User Query, Control Query]

Page 16: Auditing Algorithms : Towards Transparency in the Age of

Measuring Personalization
Case Study: Google Search
Case Study: E-commerce

Page 17: Auditing Algorithms : Towards Transparency in the Age of

Results from Real Users

[Figure: Results Changed (%), 0 to 50, by Search Result Rank, 1 to 10; series: Control/Control and Real User/Control]

Difference between results is personalization

Top ranks are less personalized

Lower ranks are more personalized

• On average, real users have a 12% higher chance of differing than the controls
• Most changes are due to location
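A per-rank comparison like the one in the chart could be computed roughly as follows. This is a sketch under assumptions: the function name, the pair-of-lists input format, and the toy data are hypothetical, and only three ranks are used to keep the example small.

```python
def percent_changed_by_rank(pairs, ranks=10):
    """For each rank, the % of (list_a, list_b) pairs whose result at that rank differs."""
    if not pairs:
        return [0.0] * ranks
    changed = [0] * ranks
    for list_a, list_b in pairs:
        for r in range(ranks):
            a = list_a[r] if r < len(list_a) else None
            b = list_b[r] if r < len(list_b) else None
            if a != b:
                changed[r] += 1
    return [100.0 * c / len(pairs) for c in changed]


# Toy data: each pair holds the result lists returned for the same query.
control_pairs = [
    (["a", "b", "c"], ["a", "b", "c"]),
    (["a", "b", "c"], ["a", "b", "d"]),
]
user_pairs = [
    (["a", "b", "c"], ["a", "c", "b"]),
    (["a", "b", "c"], ["a", "b", "d"]),
]

noise = percent_changed_by_rank(control_pairs, ranks=3)      # Control/Control
difference = percent_changed_by_rank(user_pairs, ranks=3)    # Real User/Control
personalized = [d - n for d, n in zip(difference, noise)]
```

The gap between the two series at each rank is the quantity the slide labels as personalization.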

Page 18: Auditing Algorithms : Towards Transparency in the Age of

What Causes Personalization?

AMT results reveal extensive personalization
Next question: what user features drive this?

Static Features
• Gender
• Age
• Browser
• Operating System
• Location (IP Address)
• Logged In/Out

Historical Features
• Logged In/Out
• History of Searches
• History of Search Result Clicks
• Browsing History

Methodology: use synthetic (fake) accounts

Page 19: Auditing Algorithms : Towards Transparency in the Age of

Logged In/Out to Google

[Figure, two panels: Average Jaccard Index (0.5 to 1) by Day (1 to 7), and Average Edit Distance (0 to 5) by Day (1 to 7); series: No Cookies / No Cookies, Logged In / No Cookies, Logged Out / No Cookies]

Same results… But in a different order
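The two metrics on this slide capture different things: the Jaccard index ignores order, while edit distance is sensitive to it. The sketch below shows one way to compute both for ranked result lists. The slide does not say which edit-distance variant was used, so plain Levenshtein distance is an assumption here, as are the function names and sample lists.

```python
def jaccard_index(list_a, list_b):
    """Set overlap of two result lists: 1.0 means the same results, in any order."""
    set_a, set_b = set(list_a), set(list_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def edit_distance(list_a, list_b):
    """Levenshtein distance over list items (insert, delete, substitute)."""
    prev = list(range(len(list_b) + 1))
    for i, item_a in enumerate(list_a, start=1):
        curr = [i]
        for j, item_b in enumerate(list_b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # delete item_a
                            curr[j - 1] + 1,      # insert item_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Same results in a different order: full overlap, non-zero edit distance.
baseline = ["r1", "r2", "r3", "r4"]
reordered = ["r1", "r3", "r2", "r4"]
overlap = jaccard_index(baseline, reordered)
moves = edit_distance(baseline, reordered)
```

A Jaccard index near 1 combined with a non-zero edit distance is the "same results, different order" pattern described above.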

Page 20: Auditing Algorithms : Towards Transparency in the Age of

IP Address Geolocation

[Figure, two panels: Jaccard Index (0.5 to 1) by Day (1 to 7), and Average Edit Distance (0 to 5) by Day (1 to 7); series: MA / MA, CA / MA, UT / MA, IL / MA, NC / MA]

On average, 1 different result

…Plus 1 pair of reordered results

Page 21: Auditing Algorithms : Towards Transparency in the Age of

What About Search History?

Search for ‘healthcare’
Search for ‘obama,’ then ‘healthcare’

Subsequent queries may “carry-over”

Page 22: Auditing Algorithms : Towards Transparency in the Age of

Impact of Search History

[Figure: Average Jaccard Index (0 to 1) by Time Between Queries (Minutes, 0 to 20); overlap in results, searching for ‘healthcare’ and ‘obama’ + ‘healthcare’]

10 minute cutoff

Page 23: Auditing Algorithms : Towards Transparency in the Age of

Measuring Personalization
Case Study: Google Search
Case Study: E-commerce

Page 24: Auditing Algorithms : Towards Transparency in the Age of

Measuring Personalization
Case Study: E-commerce

Page 25: Auditing Algorithms : Towards Transparency in the Age of

Targeted Retailers

10 general retailers
Best Buy, CDW, Home Depot, JCPenney, Macy’s, NewEgg, Office Depot, Sears, Staples, Walmart

Focus on products returned by searches, 20 search terms/site

6 travel sites (hotels & car rental)
CheapTickets, Expedia, Hotels.com, Priceline, Orbitz, Travelocity

Page 26: Auditing Algorithms : Towards Transparency in the Age of

Do Users See the Same Prices for the Same Products?

Many sites show inconsistencies for real users
Up to 3.6% of all products

[Figure: % of Products with Inconsistent Prices, grouped by Retailers, Hotels, Rental Cars]
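The "% of products with inconsistent prices" measure can be sketched as below. This is an illustrative simplification: the function name, the product names, and the use of integer cents are assumptions, and the sketch compares a user against a single control without subtracting control/control noise.

```python
def percent_inconsistent(user_prices, control_prices):
    """% of products seen by both parties whose prices differ.

    Both arguments map product ID -> price in integer cents.
    """
    shared = user_prices.keys() & control_prices.keys()
    if not shared:
        return 0.0
    mismatched = sum(1 for p in shared if user_prices[p] != control_prices[p])
    return 100.0 * mismatched / len(shared)


# Hypothetical search results for the same query at the same time.
user = {"drill": 9900, "saw": 4500, "lamp": 2000, "hose": 1500}
control = {"drill": 9900, "saw": 4999, "lamp": 2000, "hose": 1500, "rake": 1200}

rate = percent_inconsistent(user, control)
```

Only products that appear for both the user and the control are counted, since a product shown to one side alone is a difference in composition rather than in price.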

Page 27: Auditing Algorithms : Towards Transparency in the Age of

How Much Money Are We Talking About?

[Figure: Difference in $ (0 to 1000) for Retailers, Hotels, Rental Cars; markers: 5th, 25th, mean, 75th, 95th]

Inconsistencies can be $100s! (per day/night for hotels/cars)
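A summary like the one in the chart (5th, 25th, mean, 75th, 95th) could be produced as follows. The nearest-rank percentile definition is an assumption, since the slides do not say which definition was used, and the sample price differences are invented for illustration.

```python
def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def summarize(differences):
    """Return the five markers shown in the chart for a list of $ differences."""
    return {
        "5th": nearest_rank_percentile(differences, 5),
        "25th": nearest_rank_percentile(differences, 25),
        "mean": sum(differences) / len(differences),
        "75th": nearest_rank_percentile(differences, 75),
        "95th": nearest_rank_percentile(differences, 95),
    }


# Hypothetical absolute price differences in dollars (20 observations).
diffs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 500]
summary = summarize(diffs)
```

Reporting percentiles alongside the mean matters here because a few very large differences, like the 500 in the sample, pull the mean well above the typical case.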

Page 28: Auditing Algorithms : Towards Transparency in the Age of

What Features Trigger Personalization?

Methodology: use synthetic (fake) accounts
• Give them different features, look for personalization
• Each day for 1 month, run standard set of searches

Category     Feature    Tested Features
Account      Cookie     No Account, Logged In, No Cookies
User-Agent   OS         Win XP, Win 7, OS X, Linux
             Browser    Chrome 33, Android Chrome 34, IE 8, Firefox 25, Safari 7, iOS Safari 6
History      Click      Big Spender, Low Spender
             Purchase   Big Spender, Low Spender
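Creating synthetic accounts that each vary by one feature can be sketched as generating variants of a baseline profile. The tested values below come from the table; the dictionary keys, the function name, and the choice of baseline profile are assumptions made for illustration.

```python
TESTED_FEATURES = {
    "cookie": ["No Account", "Logged In", "No Cookies"],
    "os": ["Win XP", "Win 7", "OS X", "Linux"],
    "browser": ["Chrome 33", "Android Chrome 34", "IE 8",
                "Firefox 25", "Safari 7", "iOS Safari 6"],
    "click_history": ["Big Spender", "Low Spender"],
    "purchase_history": ["Big Spender", "Low Spender"],
}

# Assumed baseline; the slides do not state which profile served as the control.
BASELINE = {
    "cookie": "No Account",
    "os": "Win 7",
    "browser": "Chrome 33",
    "click_history": None,      # no click history
    "purchase_history": None,   # no purchase history
}


def one_feature_variants(baseline, tested):
    """Build profiles that differ from the baseline in exactly one feature."""
    variants = []
    for feature, values in tested.items():
        for value in values:
            if value != baseline[feature]:
                variants.append({**baseline, feature: value})
    return variants


accounts = one_feature_variants(BASELINE, TESTED_FEATURES)
```

Because each variant differs from the baseline in a single feature, any personalization beyond noise can be attributed to that feature.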

Page 29: Auditing Algorithms : Towards Transparency in the Age of

Home Depot

Smartphone users see totally different products than desktop users

7% of products have different prices on Android

…but the prices only go up by $0.50 on average

Page 30: Auditing Algorithms : Towards Transparency in the Age of

Travel Sites

Cheaptickets and Orbitz offer lower prices on hotels for users who log in to the sites
1 hotel per page, $12 off per night on average

Travelocity offers discounts on hotels for users on mobile devices
1 hotel per page, $15 off per night on average

Priceline changes the order of search results based on click and purchase history

Example of price steering
• 2 accounts click/reserve high price hotels
• 2 accounts click/reserve low price hotels
• 2 accounts do nothing

Page 31: Auditing Algorithms : Towards Transparency in the Age of

Cheaptickets/Orbitz

Page 32: Auditing Algorithms : Towards Transparency in the Age of

Cheaptickets/Orbitz

Cheaptickets and Orbitz offer lower prices on hotels for users who log in to the sites

About 1 hotel per page has a lower price

Prices drop by around $12 per night

[Figure: Avg. Price Difference ($)]

Page 33: Auditing Algorithms : Towards Transparency in the Age of

Travelocity

Travelocity offers discounts on hotels for users on mobile devices

iOS users see different hotels

About 1 hotel per page has a lower price

Price drops by around $15/night

Page 34: Auditing Algorithms : Towards Transparency in the Age of

Priceline

Priceline changes the order of search results based on click and purchase history

• 2 accounts click/reserve high price hotels
• 2 accounts click/reserve low price hotels
• 2 accounts do nothing

Page 35: Auditing Algorithms : Towards Transparency in the Age of

Hotels.com/Expedia

Hotels.com and Expedia are conducting large-scale A/B tests on their users
• When you visit the site, you are randomly placed in a “bucket”
• 2 out of 3 buckets see high-price hotels at the top of search results
• The remaining bucket sees low-price hotels at the top of the page

Exemplifies price steering
The only way to see the hidden hotel results is to clear your cookies and reload the site
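The bucket behavior described above can be simulated with a toy model. How the sites actually assign buckets is not known from the slides, so everything here is hypothetical: hashing a cookie ID, the mapping of buckets 0 and 1 to "high price first", and the sample hotels are illustrative assumptions only.

```python
import hashlib


def bucket_for(cookie_id, buckets=3):
    """Map a cookie ID to a stable bucket (assumed mechanism, for illustration)."""
    digest = hashlib.sha256(cookie_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def order_hotels(hotels, bucket):
    """Assumed policy: buckets 0 and 1 rank expensive hotels first, bucket 2 cheap first."""
    high_price_first = bucket in (0, 1)
    return sorted(hotels, key=lambda h: h["price"], reverse=high_price_first)


hotels = [
    {"name": "Budget Inn", "price": 79},
    {"name": "Midtown Suites", "price": 149},
    {"name": "Grand Plaza", "price": 289},
]

# The same cookie always lands in the same bucket; a new cookie may not.
first_visit = order_hotels(hotels, bucket_for("cookie-abc"))
repeat_visit = order_hotels(hotels, bucket_for("cookie-abc"))
```

In this model, a returning visitor keeps seeing the same ordering until the cookie is cleared, which is consistent with the observation that clearing cookies and reloading is the only way to see the other results.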

Page 36: Auditing Algorithms : Towards Transparency in the Age of

Conclusions and Future Work

Page 37: Auditing Algorithms : Towards Transparency in the Age of

The Era of Big Data

Algorithms driven by big data shape your world
• Search results you are given
• Prices and products you are shown
• Movie, music, and book recommendations
• The directions you use to drive

In many cases, these systems are wonderful

In other cases, they may be detrimental
• Unintended consequences
• Intentional manipulation

• Eligibility for social services
• Access to credit and banking
• Allocation of police forces

Page 38: Auditing Algorithms : Towards Transparency in the Age of

Our Goal: Transparency

Personalization is problematic when it is not transparent
• How is data being collected and shared?
• How is data being used to alter content?

Use algorithm audits to investigate deployed systems, assess their impact

Our goal is to increase transparency
• Building tools to help users and regulators
• Reverse-engineering systems to understand how they work
• Raising public awareness of these issues

Page 39: Auditing Algorithms : Towards Transparency in the Age of

Peeking Beneath the Hood of Uber

Page 40: Auditing Algorithms : Towards Transparency in the Age of

Borders on Google Maps

Page 41: Auditing Algorithms : Towards Transparency in the Age of

Discrimination in the Gig-economy

Page 42: Auditing Algorithms : Towards Transparency in the Age of

All of our code, data, and papers are available at:

http://personalization.ccs.neu.edu