EPL682 Advanced Security Topics - UCY · 2019. 3. 28.

Antreas Dionysiou
ID: 945278

Summary for papers:

1) Motoyama, M., Levchenko, K., Kanich, C., McCoy, D., Voelker, G. M., & Savage, S. (2010, August). Re: CAPTCHAs - Understanding CAPTCHA-Solving Services in an Economic Context. In USENIX Security Symposium (Vol. 10, p. 3).

2) Sivakorn, S., Polakis, I., & Keromytis, A. D. (2016, March). I am robot: (deep) learning to break semantic image captchas. In Security and Privacy (EuroS&P), 2016 IEEE European Symposium on (pp. 388-403). IEEE.


Understanding what a Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is:

Reverse Turing tests, or “Completely Automated Public Turing tests to tell Computers and Humans Apart” (CAPTCHAs), were proposed in 2003 by von Ahn et al. in order to tell whether a user is a human or not. For a CAPTCHA to be effective, it must resist automated CAPTCHA-solving software, yet humans must be able to solve it painlessly and with high probability. Text-based CAPTCHAs are the most widely used scheme; they combine distorted characters and obfuscation techniques that humans can easily recognize but that may be difficult for automated scripts. Nowadays, CAPTCHAs have become a widely deployed defense mechanism that protects web services from being exploited at scale by automated bots (e.g. account registration, comment posting, etc.).

Paper: “Re: CAPTCHAs - Understanding CAPTCHA-Solving Services in an Economic Context.”

The authors start by giving a brief explanation of CAPTCHAs. Then, they explain that a CAPTCHA-solving ecosystem has emerged, offering two ways to bypass these protections: (a) automated CAPTCHA solvers (software), and (b) real-time human labor. For these reasons, CAPTCHAs can be evaluated in economic terms (i.e. the market price of a CAPTCHA-solving solution vs. the monetary value of the asset protected by the CAPTCHA). Questions of Internet security often create economic opportunities for exploitation (e.g. Internet advertising revenue, where users effectively “pay” for free services indirectly through their exposure to ad content).

Unlike similar security problems, which usually favor the attacker, the underlying cost structure of CAPTCHAs benefits the defender (due to the trade-off between CAPTCHA-solving cost and asset revenue). Cheap Internet access, together with the commodity nature of today’s CAPTCHA schemes, has globalized the market and dramatically decreased the cost (e.g. by recruiting workers from the lowest-cost labor markets). Nowadays, there are plenty of CAPTCHA-solving services with very low prices (e.g. $1 per 1,000 CAPTCHAs).
CAPTCHAs’ security, traditionally viewed as a technological impediment to an attacker, should more precisely be regarded as an economic one. The authors mention that although many CAPTCHA-solving services exist, which tells us something about the value of the protected assets, the overall shape of the market is poorly understood, and thus we cannot reason about the security value of CAPTCHAs. In this paper, the underlying market shape is examined in order to argue about that security value. The authors document the evolution of automated solving tools and how they have been eclipsed by the emergence of the human-based solving market. To show the latter, they engage the retail CAPTCHA-solving market as both a client and a worker. The authors also interviewed the owner of a CAPTCHA-solving service (Mr. E) to provide validation and insights. Their research tried to answer key questions such as: (a) which CAPTCHAs are most heavily targeted, (b) the rough solving capacity of the market leaders, (c) quality of service, (d) pricing of services, (e) demographics of the workforce, and (f) the adaptability of services to changes in CAPTCHA schemes. Overall, this research provides a way of reasoning about the net value of CAPTCHAs under existing threats.

In the 2nd section, the authors give a brief background on CAPTCHAs, mentioning that the most commonly found CAPTCHAs are text-based. CAPTCHA design reflects a trade-off between protection and usability. The authors explain why automated solving has been relegated to a niche status for economic reasons. Furthermore, they clearly state that human-based labor has been commoditized by a broad range of CAPTCHA-solving providers, and that they are the first to identify the growth of this activity.
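The economic framing above (the price of a solution versus the value of the protected asset) can be made concrete with a small sketch. It is only an illustration, not a model taken from the paper: the function names are hypothetical, and it assumes that every submitted CAPTCHA is billed whether or not the returned answer is correct. The $1 per 1,000 price and the 90% accuracy are figures quoted in this summary; the asset values are made up.

```python
def cost_per_correct_solution(price_per_1000: float, accuracy: float) -> float:
    """Expected spend (in dollars) to obtain one correct CAPTCHA solution.

    Assumption: every submission is billed, so wrong answers still cost money.
    """
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    return (price_per_1000 / 1000.0) / accuracy


def solving_is_worthwhile(value_per_success: float,
                          price_per_1000: float,
                          accuracy: float) -> bool:
    """True if one solved CAPTCHA is worth more to the attacker than it costs."""
    return value_per_success > cost_per_correct_solution(price_per_1000, accuracy)


if __name__ == "__main__":
    # $1 per 1,000 CAPTCHAs at 90% accuracy (figures quoted in the summary).
    print(cost_per_correct_solution(1.0, 0.90))
    # Hypothetical asset values per successful action, for illustration only.
    print(solving_is_worthwhile(0.01, 1.0, 0.90))
    print(solving_is_worthwhile(0.0005, 1.0, 0.90))
```

Under these assumptions, a CAPTCHA stops an attacker only when the value of the protected action falls below the effective cost per correct solution, which is the sense in which the defense is economic rather than purely technological.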


In the 3rd section, the authors discuss automated CAPTCHA solvers, examining two tools that were state of the art at the time, namely Xrumer and reCaptchaOCR. They underline the existence of an arms race between CAPTCHA-solving services and the web services that deploy CAPTCHAs as a security policy: solvers improve, and CAPTCHA designs are improved in response. As they mention in their paper, automated solvers have clear advantages, including near-zero marginal cost and near-infinite capacity, and they usually combine segmentation and recognition algorithms to extract the characters from a distorted CAPTCHA image. They argue, though, that developing such automated solvers is extremely complex and that such systems often fail to replicate human accuracy. Moving forward, the authors examine Xrumer, evaluating its cost, its performance, the CAPTCHA schemes it was able to break, and its evolution after CAPTCHA designs were improved. After those improvements, Xrumer could not solve any CAPTCHA scheme (except SMF), and as a result the August 2009 version added integration with human-based CAPTCHA-solving services. Xrumer thus worked as a hybrid model, deploying automated solvers where possible and human-based solvers otherwise. Xrumer’s policy was to include only highly efficient and accurate automated solvers for “weaker” CAPTCHAs, whereas reCaptchaOCR focused solely on popular CAPTCHA schemes (e.g. Microsoft, Yahoo, Google, etc.). After testing reCaptchaOCR on 100 randomly selected CAPTCHAs from 2008 and 2009, they observed 30% and 18% accuracy respectively, far lower than average human accuracy (75-90% in their experiments).
They continue by saying that a purely technical perspective on CAPTCHAs does not capture the business realities of the CAPTCHA-solving ecosystem. As they say, the economics of automated solvers depend on these factors: (a) the cost of developing new solvers, (b) the accuracy of the solvers, and (c) the responsiveness of the sites whose CAPTCHAs are being attacked. Moreover, low success rates limit the utility of an automated solver. For an automated solver to be profitable, it must cost less than the value it delivers over its lifetime, and also less than alternatives such as human-based solving services. The authors close their evaluation of automated solvers by noting that, for all these reasons, human-based solving dominates the commercial market for the service.

In the 4th section, the authors review the evolution of the human-based labor market, its basic economics, and the underlying ethical issues. They examine two outsourcing models, namely opportunistic solving and paid solving. The first model relies on convincing an individual to solve a CAPTCHA as part of some other, unrelated task; the authors note that this model does not play a major role in the market. Their focus is on the paid solving model, which they believe represents the core of the CAPTCHA-solving ecosystem. This model rests on the premise that there are workers who are willing to solve CAPTCHAs for less money than the solutions are worth to the client. They mention that many retail CAPTCHA-solving services basically aggregate: (a) the demand for CAPTCHA-solving services, via websites and open APIs, and (b) the supply of CAPTCHA-solving human labor, by recruiting individuals (through the web) who solve CAPTCHAs in exchange for money. Next, the authors examine the economics of the CAPTCHA-solving market.
They say that the market for CAPTCHA-solving services has expanded, while the wages of workers solving CAPTCHAs have been declining for these reasons: (a) CAPTCHA solving is an unskilled job, (b) it can easily be sourced via the Internet from the lowest-cost labor markets, and (c) there is increased competition on the retail side. Many of the retailers have tied their services to third-party products to protect CAPTCHA-solving prices. As the authors explain, it is challenging to measure profitability directly; when asked about his business (in the middle of the price spectrum), Mr. E said that 50% of revenue is profit, roughly 10% goes to servers and bandwidth, and the remainder is split between solving labor and incentives for partners. The remainder of the paper focuses on active measurement of such services, both by paying for solutions and by participating as a worker.

They mention that in the U.S. there are several bodies of law that may impinge on CAPTCHA solving. Firstly, evading CAPTCHAs (or other security mechanisms) exceeds the authorization granted by the site owner, in potential violation of the Computer Fraud and Abuse Act. Furthermore, while CAPTCHA solvers provide real use outside the circumvention of copyright controls, it is not clear that such a defense is sufficient to protect infringers. Ethical concerns also exist, since one can do harm without such actions being illegal. The authors compare the consequences of their intervention to an alternative world in which they took no action, and then evaluate the outcome for its cost-benefit trade-off. On the purchasing side, there is only a minor indirect impact, which was outweighed by the benefits that come from better understanding the nature of the threat. On the solving side, the ethical questions are bigger, since the solutions to CAPTCHAs will be used to circumvent the sites they are associated with. They chose to let workers solve the CAPTCHAs as they would have solved them anyway, to prevent their activities from impacting the gross outcome. Finally, after consulting their human subjects liaison on this work, they were told that their study did not require any approval.

Figure 1: CAPTCHA-solving market workflow. 1) The GYC automator tries to register a Gmail account and is challenged with a CAPTCHA, 2) GYC uses the DeCaptcher plug-in to solve the CAPTCHA, 3) DeCaptcher queues the CAPTCHA for a worker on the affiliated PixProfit back-end, 4) PixProfit selects a human worker to solve the CAPTCHA, 5) the worker enters a solution into PixProfit, which 6) returns it to the plug-in, and finally 7) GYC enters the solution for the CAPTCHA into Gmail to register the account.

In the 5th section, the authors present their analysis of specific CAPTCHA-solving services (services that were well advertised at the time), evaluating several aspects such as: (a) customer interface, (b) solution accuracy, (c) response time, (d) availability, and (e) capacity. Each service requires a customer account, as well as prepayment in units defined by its price schedule (usually 1,000 CAPTCHAs is the smallest package). Most services provide API packages for interacting with them (uploading CAPTCHAs and receiving results), using one of two methods. In the first method, the client performs an HTTP POST request that uploads the image to the service, waits for the CAPTCHA to be solved, and receives the answer in the HTTP response. In the second method, the client performs an HTTP POST request to upload the image, receives an image ID in response, and subsequently polls the site for the CAPTCHA solution using that image ID. Regarding pricing, the authors mention that many of the services offer bidding systems in which a customer offers a higher payment to gain priority access to solvers when load is high; they believe that this overage is pure profit for the service provider, as they have not seen price fluctuations on the worker side.

The authors evaluated 8 CAPTCHA-solving services as customers for 5 months, using CAPTCHAs collected from popular websites (7,500 per site, except 12,000 for Yahoo); they also describe the steps taken to select the 25 most popular sites. In order to assess the accuracy of each service, they needed to determine the correct solution for each CAPTCHA in their corpus. To do that, they used the services themselves, taking the most frequent solution returned as the correct one. To assess the quality of each service (accuracy, response time, and service availability), they submitted a single CAPTCHA every 5 minutes to all services simultaneously, recording the submission and response times. The top half of Figure 2 shows service accuracy (in terms of error rate) for each CAPTCHA type. The area of each circle is proportional to a service’s mean error rate on a particular CAPTCHA type. A CAPTCHA solution is useful only if it is correct. They state that accuracy clearly depends on the type of CAPTCHA and is generally consistent across the services (e.g. all services have poor accuracy on Youku and good accuracy on PayPal). They note that worker familiarity with a CAPTCHA type affects solution accuracy (as well as response time).
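The ground-truth step described above (taking the most frequent answer across services as the correct one, then scoring each service against it) can be sketched as follows. This is a minimal illustration rather than the authors’ code: the function names are hypothetical, and the case-insensitive, whitespace-trimmed comparison and the tie-breaking behavior (the first-seen answer wins a tie) are assumptions of this sketch.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def _normalize(answer: Optional[str]) -> str:
    # Assumption: answers are compared case-insensitively, ignoring outer spaces.
    return (answer or "").strip().lower()


def consensus_label(answers: Iterable[Optional[str]]) -> Optional[str]:
    """Return the most frequent non-empty answer, or None if there is none."""
    counts = Counter(_normalize(a) for a in answers if _normalize(a))
    if not counts:
        return None
    # most_common keeps first-seen order among equal counts, so ties go to
    # the answer that appeared first in the input.
    return counts.most_common(1)[0][0]


def error_rate(service_answers: Dict[str, Optional[str]],
               truth: Dict[str, Optional[str]]) -> Optional[float]:
    """Fraction of a service's answers that disagree with the consensus label.

    CAPTCHAs without a consensus label are skipped.
    """
    scored = [cid for cid in service_answers if truth.get(cid) is not None]
    if not scored:
        return None
    wrong = sum(1 for cid in scored
                if _normalize(service_answers[cid]) != truth[cid])
    return wrong / len(scored)
```

A design consequence worth noting: because the label comes from the services themselves, a CAPTCHA that most services get wrong in the same way would be mislabeled, so error rates computed this way are relative to the consensus rather than to an independent ground truth.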

Figure 2: Error rate and median response time for each combination of service and CAPTCHA type. In the upper table, the area of each circle is proportional to the error rate (among solved CAPTCHAs). In the lower table, circle area is proportional to the response time minus ten seconds (for increased contrast); negative values are denoted by unshaded circles. Numeric values corresponding to the values in the leftmost and rightmost columns are shown on the side. Thus, the error rate of BypassCaptcha on Youku CAPTCHAs is 66%, and for BeatCaptchas on PayPal 4%. The median response time of CaptchaGateway on Youku is 21 seconds, and 8 seconds for Antigate on PayPal.

Figure 3: Median error rate and response time (in seconds) for all CAPTCHAs. CAPTCHAs are ranked top-to-bottom in order of increasing error rate.

Figure 4: Median error rate and response time (in seconds) for all services. Services are ranked top-to-bottom in order of increasing error rate.


In addition to accuracy, customers want services that solve CAPTCHAs quickly. Figure 3 shows the median error rate and response time for all CAPTCHA schemes. Figure 4 shows median response times for each service, with a median response time of 14 seconds across all services. The authors note that services differ considerably in the relative response times they provide to their customers. Antigate and ImageToText provided the fastest service, with median response times of 9.6 seconds and 9.4 seconds respectively, and with 90% of CAPTCHAs solved in under 25 seconds. They found that accuracy varied with the type of CAPTCHA, and also saw some variation in response time among different CAPTCHA types. However, as they clearly state, the variation in response times among the services dominates the variation due to CAPTCHA type. Moreover, they mention that the value of a particular solver to a customer depends on the combination of these factors: (a) accuracy, (b) response time, and (c) price.

Another point of differentiation is solver capacity, namely how many CAPTCHAs a service can solve in a given unit of time. They measured the number and rate of solutions returned in response to a given offered load, raising the load in increments until the service appeared overloaded (5 services tested). Both DeCaptcher and CaptchaBot were able to sustain a rate of about 14–15 CAPTCHAs per second, with BeatCaptchas and BypassCaptcha sustaining solve rates of eight and four CAPTCHAs per second, respectively. Customers can poll the transient load on the services and offer payment above the market rate in exchange for higher-priority access when load is high. The authors suggest that higher bids may be necessary to achieve a desired level of service at times of high load.

In the 6th section, the authors say that human CAPTCHA-solving services are effectively aggregators: on the one hand, they aggregate demand by providing a single point for purchasing solving services, and on the other hand, they aggregate the labor supply by providing a single point through which workers can depend on being offered consistent CAPTCHA-solving work for hire. They mention that a worker account must be created, and they explain what the worker interface looks like.
Next, they summarize worker wages, focusing on two services, namely Kolotibablo and PixProfit. Kolotibablo pays workers at a variable rate (from $0.50/1,000 up to over $0.75/1,000 CAPTCHAs) depending on how many CAPTCHAs they have solved, whereas PixProfit offers a somewhat higher rate of $1/1,000. They say that a minimum amount of money must be earned before payout, and that most services provide payment via an online e-currency system. In general, these earnings are roughly consistent with wages paid to low-income textile workers in Asia.

The authors crafted CAPTCHAs, written in a range of languages, whose solutions would reveal information about the geographic demographics of CAPTCHA solvers, expecting high accuracy from services employing workers familiar with those languages. Table 2 lists the languages they used in this experiment, along with an example three-digit CAPTCHA in each language corresponding to the solution “123”. Table 2 also shows the accuracy of the services when presented with these CAPTCHAs. First, as the authors mention, although Roman alphanumerics in typical CAPTCHAs are globally comprehensible, English words for numerals represent a noticeable semantic gap for presumably non-English speakers: very high accuracies on normal CAPTCHAs drop to 38–62% for the challenge presented in English. Second, workers at a number of the services exhibit strong affinities to particular languages. Five of the services have accuracies for Chinese (Traditional and Simplified) either substantially higher than or nearly as high as for English. These services evidently include a sizeable workforce fluent in Chinese, likely in mainland China, where low-cost labor is available. In addition, Antigate has appreciable accuracies for Russian and Hindi, presumably drawing on workforces in Russia and India. The same holds for CaptchaBypass and Russian; BeatCaptchas and Tamil, Portuguese, and Spanish; and DeCaptcher and Tamil. As Mr. E told them in his interview, his company hires staff from labor markets in China, India, Bangladesh, and Vietnam. Moving forward, the authors state that ImageToText’s results are impressive, as it has appreciable accuracy across a remarkable range of languages. ImageToText is the most expensive service by a wide margin, but it clearly has a dynamic and adaptive labor pool. The authors also used workers’ time zones to reveal demographic information about them. The results showed that 77.9% of them came from UTC+8, further reinforcing the estimate of a large labor pool in China; the two other top time zones were UTC+5.5 (India) with 5.7% and UTC+2 (Eastern Europe) with 3.0%.

As a final assessment, the authors wanted to examine how both CAPTCHA-solving services (e.g. Kolotibablo and PixProfit) and solvers adapt to changes in state-of-the-art CAPTCHA generation. They focused on the Asirra CAPTCHA, which is based on identifying pictures of cats and dogs among a set of 12 images. ImageToText displayed remarkable adaptability to the Asirra CAPTCHA, successfully solving it 39.9% of the time on average.
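To put the 39.9% figure in perspective, the short calculation below compares it with blind guessing on a 12-image cat/dog challenge. It is a back-of-the-envelope sketch under stated assumptions, not an analysis from the paper: it assumes that all 12 images must be labeled correctly, that a guesser picks cat or dog with equal probability, and that per-image errors are independent. The function names are hypothetical.

```python
def random_guess_success(n_images: int = 12) -> float:
    """Chance of labeling every image correctly by coin flips (two classes)."""
    return 0.5 ** n_images


def implied_per_image_accuracy(challenge_success: float,
                               n_images: int = 12) -> float:
    """Per-image accuracy p such that p ** n_images == challenge_success.

    Assumption: errors on the individual images are independent.
    """
    if not 0.0 < challenge_success <= 1.0:
        raise ValueError("challenge_success must be in (0, 1]")
    return challenge_success ** (1.0 / n_images)


if __name__ == "__main__":
    print(random_guess_success())              # 1 / 4096
    print(implied_per_image_accuracy(0.399))   # roughly 0.93 per image
```

Under these assumptions, a 39.9% challenge success rate corresponds to workers labeling each individual image correctly about 93% of the time, whereas blind guessing would succeed on about one challenge in 4,096.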

Figure 5: ImageToText error rate for the custom Asirra CAPTCHA over time.

Figure 5 shows the declining error rate for ImageToText; as time progresses, the workers become increasingly adept at solving the Asirra CAPTCHA. The next closest service was BeatCaptchas, which succeeded 20.4% of the time. The remaining services, excluding DeCaptcher, had success rates below 7%. After Microsoft deployed the Asirra CAPTCHA, DeCaptcher quickly incorporated it into its service at $4 per 1,000 Asirra CAPTCHAs. DeCaptcher successfully solved 696 (46.5%) requests with a median response time of 39 seconds, about 2.3 times its median of 17 seconds for regular CAPTCHAs. From what they found, DeCaptcher did not pay PixProfit workers double the amount for solving them, consequently increasing its profit margin on these new CAPTCHAs.

Finally, the authors wanted to know which sites’ CAPTCHAs are targeted the most. They found that on PixProfit, the top two CAPTCHA types represent 81% of the volume, with the top five accounting for 91%. Kolotibablo was not quite as concentrated, but the top five still account for 76% of its volume. They mention that although Microsoft is by far the most common target for both, PixProfit caters to CAPTCHAs from large global services, whereas Russian sites otherwise dominate Kolotibablo.

In their paper’s conclusion, the authors say that the low-impact quality of CAPTCHAs makes them attractive to site operators but, at the same time, easy to outsource to the global unskilled labor market. They say that the CAPTCHA-solving business is a well-developed, highly competitive industry with the capacity to solve on the order of a million CAPTCHAs per day, noting that wholesale and retail prices will continue to decline. Answering whether CAPTCHAs actually work, they say that CAPTCHAs do indeed “tell computers and humans apart”; they do not prevent large-scale automated site access, but they effectively limit it, as a “CAPTCHA reduces an attacker’s expected profit by the cost of solving the CAPTCHA”. They mention that the CAPTCHAs of higher-value sites place a utilization constraint on otherwise “free” resources, below which it makes no sense to target them. The authors argue that the profitability of any particular scam is a function of three factors: (a) the cost of CAPTCHA solving, (b) the effectiveness of any secondary defenses, and (c) the efficiency of the attacker’s business model. As the cost of CAPTCHA solving decreases, a site operator must employ secondary defenses (although they are more expensive in both infrastructure and customer impact) more aggressively to maintain a given level of fraud. Finally, they underline that CAPTCHAs should be regarded as an economic impediment (not only a technological one) to attackers, and as a low-impact mechanism that adds friction to the attacker’s business model and thus minimizes the cost and legitimate-user impact of heavier-weight secondary defenses.

Paper: “I Am Robot: (Deep) Learning to Break Semantic Image CAPTCHAs.”

In this paper, the authors: (a) conduct a study of the latest version of Google’s reCaptcha, (b) explore how the risk analysis process is influenced by each aspect of the request, (c) identify flaws that allow adversaries to influence the risk analysis, bypass restrictions, and deploy large-scale attacks, (d) propose an effective and low-cost deep-learning-based attack for the semantic annotation of images, and (e) propose a series of safeguards and modifications for impacting the scalability and accuracy of their attacks. The goal of Google’s latest version of reCaptcha is to: (a) minimize the effort for legitimate users, while (b) requiring tasks that are more challenging for computers than text recognition. The authors underline that reCaptcha is driven by an “advanced risk analysis system” that evaluates requests and based on
this analysis selects the difficulty of the captcha that will be returned.

In the introduction, the authors underline that CAPTCHAs are a valuable defense mechanism against fraudsters, but that they are considered a nuisance by humans and can deter them from visiting a website. As a result, they state that CAPTCHAs should be as effortless as possible for humans while remaining robust against automated solvers. Furthermore, they mention that automated solvers have been considered less lucrative than human solvers, due to the quick response times exhibited by captcha services in tweaking their designs. Moreover, the authors say that the future of text-based captchas seems uncertain, due to the evolution of automated solvers. The main goal of Google’s “No CAPTCHA reCaptcha” was to verify users without requiring them to solve any kind of challenge. The tool makes use of an advanced risk analysis system that analyzes users’ requests for a CAPTCHA to conclude whether or not a request originates from an honest user. They found that, apart from the risk analysis system, the reCaptcha widget also performs a series of browser checks (to detect automation frameworks or unusual browser behavior). The authors managed to deploy an automation tool without being detected by the reCaptcha widget, and also identified design flaws that allow attackers to “influence” the risk analysis process and receive the easy checkbox challenge, specifying that a Google tracking cookie that is 9 days old is sufficient. Next, they propose a machine-learning-based attack on image-based CAPTCHAs that extracts semantic information from images. As they say, their system is highly effective and efficient, as it achieves 70.78% accuracy against the image-based reCaptcha challenges in 19 seconds per challenge. They also test their tool on Facebook’s image CAPTCHA to show the general applicability of their attack. The authors also evaluate their tool in terms of cost-effectiveness; more specifically, in offline mode it solves 41.57% of the CAPTCHAs while requiring only 20.9 seconds per challenge, at practically no cost. Overall, they found that reCaptcha’s risk analysis and widget (despite their flaws) can improve the security of CAPTCHA implementations. Furthermore, they suggest that evolving to more advanced tasks, such as extracting semantic information from images, is a promising direction for the design of robust and usable captchas, underlining that the future of CAPTCHAs depends on the exploration of fundamentally different approaches to their design. After summing up their contributions, they move on to analyzing reCaptcha.

In the 2nd section, the authors say that reCaptcha is the most widely used captcha service, designed to be user-friendly and secure, leveraging information about users’ activities through cookies and employing a plethora of checks. Next, the authors briefly explain how the reCaptcha widget works and describe its workflow. The main point is that when users click on the checkbox, a request is sent containing all the information collected about the user. Then, the request is analyzed by the advanced risk analysis system, which decides the type of CAPTCHA challenge to be presented to the user. If the user requests multiple challenges or provides several wrong answers, the system will return increasingly harder challenges. The versions of reCaptcha that they came across during their experiments are: (a) No CAPTCHA reCaptcha - Figure 1, (b) Image reCaptcha - Figure 2, (c) Distorted one-word - Table 1a, (d) Scanned words - Table 1c, (e) Distorted two-words - Table 1d, and (f) Fallback CAPTCHA - Table 1e.

As stated by the authors, the image CAPTCHA is now the default type returned, although the difficult text CAPTCHAs (Table 1d, e) are still in use, targeting suspicious human users (e.g. workers for CAPTCHA-solving services) rather than bots. Fraudsters may attack CAPTCHAs by employing either automated solvers or human-based labor. The authors’ goal is to bypass safeguards and influence the advanced risk analysis into returning checkbox CAPTCHAs (which are easy to solve), as well as to develop an automated attack that solves semantic image CAPTCHAs. They underline that their findings could also be exploited by solving services, as they demonstrate the feasibility of accurate, large-scale, low-cost attacks.

In the 3rd section, they present an overview of their system designed to solve reCaptcha challenges. Their system is built on Selenium and Mozilla Firefox (v. 36), leveraging WebDriver. It is based on two components: the first is responsible for creating tracking cookies that influence the risk analysis process, and the second processes the challenges, following different techniques based on the type of challenge. The cookie-creating component creates cookies that are subsequently “trained” to appear as originating from legitimate users. Next, they describe the component used for collecting reCaptcha challenges. Moving forward, the authors explore whether different computer vision algorithms and image annotation services can be used for solving image CAPTCHAs. They specifically deploy: (a) Google’s reverse image search (GRIS), for conducting a search based on an image, (b) different image annotation services, for assigning tags (keywords) or free-form descriptions to images, (c) a machine-learning-based classifier that can guess the content of an image based on a subset of the tags, and (d) a History module, which is a manually created labeled dataset of images and their tags from challenges they have collected, also annotated with the hint given for each challenge. Next, they explain that each module assigns the candidate images to one of 3 sets, (a) select, (b) discard, or (c) undecided, and they describe the process.

In the 4th section, the authors evaluate how their system influences the advanced risk analysis system based on their CAPTCHA requests, and they also evaluate the accuracy of their attack against the image-based version of reCaptcha. They found that Google’s advanced risk analysis can be neutralized by appending a 9-day-old cookie (with or without web surfing) to the request. Next, they investigated whether being logged in to a Google account influences the risk analysis, with and without conducting a phone verification.
In both cases they were presented with a checkbox captcha after 60 days had passed. They concluded that it is actually better for an adversary not to use any account at all. Next, they explored the impact of the user’s geolocation and found that there is no restriction based on the country in which a cookie is created, a fact that makes things easier for fraudsters. Next, they explore how aspects of an automated browser environment affect the outcome of the risk analysis. They found that: (a) the webdriver variable, which is required to be True so that websites can detect automation, does not have an effect, (b) if the user-agent contains an outdated version of the browser, or if the browser and engine versions are up to date but do not correspond to the actual environment of the experiment, the widget automatically considers the environment suspicious and presents the user with a fallback CAPTCHA before the checkbox is clicked, (c) fallback captchas are also returned when the user-agent does not contain the complete information or is misformatted, (d) the widget does not detect the underlying operating system, (e) even if the user-agent used during a cookie’s creation is different from the one used when requesting a captcha with that cookie, the outcome is not affected, (f) screen resolution and mouse behavior do not affect the outcome of the risk analysis, (g) cookies are not assigned a reputation score, and (h) there is no mechanism for prohibiting the creation of a large number of cookies from a single IP address. Next, they present a workaround to overcome the fact that reCaptcha was redesigned so that the token is tied to the website where the challenge was presented. After that, they deployed their system to determine how many checkbox CAPTCHAs they could solve in a single day, and found that they could solve between 52,000 and 55,000 per day on weekdays and 59,000 per day on weekends. As an overall evaluation, they underline that the current version of reCaptcha suffers from significant flaws and omissions, mentioning that the plethora of checks that are performed, combined with those that are feasible, can be used to introduce more safeguards and improve the robustness of any CAPTCHA system.
In addition, they explore whether reCaptcha has any flexibility when deciding whether a given solution is correct, finding that in most cases (74%) the number of correct candidate images is 2; the rest contain 3, and they also found two challenges with 4. Based on these results, they set their CAPTCHA-breaking system to select 3 images for the solution. Next, they tried to quantify how often challenges are repeated, finding 6 pairs of completely identical challenges among 700 CAPTCHAs, and thus they concluded that challenges are not created “on the fly” but selected from a relatively small pool of challenges. They also noticed that images were repeated across challenges, finding 1,368 redundant images that belonged to 358 sets of identical images. Next, they evaluated the effectiveness of each module used, reporting the success rate achieved per module. They then evaluated their attack’s accuracy against 2,235 CAPTCHAs and obtained 70.78% accuracy, mentioning that the most time-consuming phase is GRIS. They state that their attack is very efficient, with an average duration of 19.2 seconds per challenge. Only a very limited variety of image categories was detected, so an adversary can tailor an attack by training the image annotation system on these specific types of images. After evaluating their attack in offline mode, the authors concluded that adversaries can deploy accurate and efficient attacks against the image reCaptcha without relying on external services. Next, the authors evaluated the economic viability of their CAPTCHA-breaking system and concluded that it is comparable to a professional solving service in both accuracy and attack duration, with the added benefit of not incurring any cost for the attacker. They also estimated that their token harvesting attack could accrue $104 - $110 daily per host (an amount that could be increased by using proxy services and running multiple attacks in parallel). Next, they discuss countermeasures for defending against their attacks, and their potential impact on the usability of the service.
These proposals relate to: (a) token auctioning, (b) influencing the advanced risk analysis system (account, cookie reputation, browser checks), and (c) image CAPTCHA attacks, where they propose modifications and safeguards focused on reducing the accuracy of automated attacks (larger range of images, remove presented challenges from the pool, remove the hint, select specific content, content homogeneity, advanced semantic relations, introduce noise, use adversarial images). The authors say that they informed Google and Facebook about the flaws in their systems and gave them a report containing suggestions for improvement.

In the 8th section, the authors mention that their system can be explored further to improve the accuracy of the attack (i.e. accounting for certain characteristics of the image, introducing confidence scores). Next, they say that due to the evolution of computer vision and machine learning algorithms, a reassessment of reverse Turing tests (CAPTCHAs) and their design is considered critical. In the 10th section, the authors briefly explain the related work, and finally they present their conclusions, demonstrating the feasibility of large-scale CAPTCHA-solving attacks, as well as the necessity of exploring new directions for the design of CAPTCHAs, as existing schemes rely on tasks that are within the capabilities of automated cognizance. Finally, the authors underline that the advanced risk analysis and widget introduced by reCaptcha possess valuable functionality that can be incorporated into future CAPTCHA schemes for mitigating attacks.
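As a side note on the token-harvesting estimate quoted earlier ($104 - $110 daily per host), the arithmetic can be reproduced from the weekday throughput of 52,000 to 55,000 checkbox CAPTCHAs per day. The sketch below is an illustration only: the selling price of $2 per 1,000 solved CAPTCHAs is an assumption inferred from those two figures, not a number stated in this summary, and the function name is hypothetical.

```python
def daily_revenue(tokens_per_day: int, price_per_1000: float = 2.0) -> float:
    """Revenue (in dollars) from selling one day's harvested tokens.

    Assumption: tokens sell at price_per_1000 dollars per 1,000; the default
    of $2 is inferred from the quoted figures, not taken from the summary.
    """
    if tokens_per_day < 0:
        raise ValueError("tokens_per_day must be non-negative")
    return tokens_per_day * price_per_1000 / 1000.0


if __name__ == "__main__":
    for tokens in (52_000, 55_000):  # weekday throughput quoted above
        print(tokens, daily_revenue(tokens))
```

With that assumed price, the weekday throughput maps onto the quoted $104 - $110 range; the higher weekend throughput of 59,000 would yield somewhat more under the same assumption.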