PDF Mirage: Content Masking Attack Against Information-Based Online Services
Ian Markwood*, Dakun Shen*, Yao Liu, and Zhuo Lu
University of South Florida
*Co-first authors
Presented by Ian Markwood

Outline
• Motivation
• Background Information
• Content Masking Attack
  – Against Conference Reviewer Assignment Systems
  – Against Plagiarism Detection
  – Against Document Indexing
• Content Masking Defense
• Conclusion

Motivation
• The Adobe Portable Document Format (PDF) is the standard for consistent cross-computer document rendering
• PDF documents cannot be edited with commonly accessible tools (MS Word, Adobe Reader, etc.)
• This confers a sense of integrity to the document for the end user

Motivation
• There is a disconnect between the content of a PDF and what is actually displayed
• A computer and a human see two different things

Motivation
• Within this disconnect we can perform a content masking attack which compromises the content integrity of PDF files
• Three information-based online systems rely on the integrity of PDF documents:
  – Automatic reviewer assignment systems for academic papers
  – Plagiarism detection systems
  – Search engines

Background Information
• What do these services have in common?
  – They support PDF submission
  – They scrape the text out of submitted PDF files to perform their function, rather than using Optical Character Recognition (OCR)
  – Text scraping copies the plain text out of all strings within the PDF file
  – Ignores the font associated with the text
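
The gap between scraped text and displayed text can be illustrated with a toy model. The sketch below is not a PDF parser: it assumes a hypothetical representation in which a document is a list of (string, font) runs and a font is a plain dictionary from stored character to displayed glyph. All names are illustrative.

```python
# Toy model (hypothetical): a font is a dict from stored character code to
# the glyph a reader sees. Real PDF fonts are far more involved.
LETTERS = "abcdefghijklmnopqrstuvwxyz "
REGULAR = {c: c for c in LETTERS}
SWAPPED = dict(REGULAR, a="o", o="a")  # draws "a" as "o" and "o" as "a"


def scrape(runs):
    """Text scraping: concatenate the stored strings and ignore the fonts."""
    return "".join(text for text, _font in runs)


def render(runs):
    """Human view: draw every stored character with the font of its run."""
    return "".join(font.get(ch, ch) for text, font in runs for ch in text)


runs = [("cat ", REGULAR), ("cat", SWAPPED)]
print(scrape(runs))  # cat cat
print(render(runs))  # cat cot
```

The scraper and the reader disagree on the second word even though both started from the same stored strings.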

Background Information
• Automatic conference reviewer assignment systems
  – Use topic matching to assign reviewers to submitted papers
  – Compare frequent words appearing in reviewers’ published papers to frequent words appearing in submitted papers
  – INFOCOM uses Latent Semantic Indexing (LSI)
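
As a rough illustration of topic matching by frequent words, the sketch below compares keyword counts with cosine similarity. It is a deliberately simplified stand-in and does not implement LSI; the stopword list, function names, and sample reviewer texts are assumptions made for the example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the talk).
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for", "we", "on"}


def keyword_counts(text):
    """Count the non-stopword words of a text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_reviewer(paper, reviewers):
    """Pick the reviewer whose corpus is most similar to the paper."""
    counts = keyword_counts(paper)
    return max(reviewers, key=lambda r: cosine(counts, keyword_counts(reviewers[r])))


reviewers = {
    "alice": "wireless jamming wireless channel",
    "bob": "malware sandbox malware analysis",
}
print(best_reviewer("jamming attacks on the wireless channel", reviewers))
```

Because the score depends only on scraped word counts, changing the stored words changes the match regardless of what the page displays.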

Background Information
• Plagiarism detection systems
  – Measure similarity between strings within the subject document and all other documents submitted thus far
• Document indexing
  – Search engines return documents based on the similarity of their content to the search string

Content Masking Attack
[Figure labels: plaintext, cipher, ciphertext]

Content Masking Attack
• “Masking font” – a custom font with some rearrangement of the character/glyph relationship
• Open source tools such as FontForge allow copy/paste of character glyphs within fonts
• Custom fonts may be imported into LaTeX

Content Masking Attack Against Automatic Conference Reviewer Assignment Systems
• An author can target a specific reviewer by replacing enough keywords in the paper with keywords from the reviewer’s papers
• Keywords – uncommon words that appear most frequently

Content Masking Attack Against Automatic Conference Reviewer Assignment Systems
• Algorithm:
  – Order keywords in subject paper and target reviewer’s corpus by descending frequency
  – Construct a “word mapping” between these two lists
  – Create a “character mapping” between the letters of each pair of words
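
The three algorithm steps can be sketched as follows. This is a minimal illustration under stated assumptions: keywords are paired rank by rank, only equal-length word pairs are handled, and the function names and sample words are hypothetical rather than taken from the authors' implementation.

```python
from collections import Counter


def top_keywords(words, stopwords, n):
    """Step 1: keywords ordered by descending frequency."""
    counts = Counter(w for w in words if w not in stopwords)
    return [w for w, _count in counts.most_common(n)]


def word_mapping(paper_keywords, reviewer_keywords):
    """Step 2: pair the i-th paper keyword with the i-th reviewer keyword."""
    return list(zip(paper_keywords, reviewer_keywords))


def character_mapping(shown, stored):
    """Step 3: for one word pair, list (stored letter, displayed glyph) pairs.

    `stored` is the reviewer keyword written into the file; `shown` is the
    paper's own keyword that the masking font should display.
    """
    if len(shown) != len(stored):
        raise ValueError("this sketch only handles equal-length word pairs")
    return list(zip(stored, shown))


paper_words = "mask mask mask font font pdf".split()
paper_kw = top_keywords(paper_words, set(), 2)
pairs = word_mapping(paper_kw, ["jams", "wifi"])
print(pairs)
print(character_mapping("mask", "jams"))
```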

Content Masking Attack Against Automatic Conference Reviewer Assignment Systems
• Challenges:
  – One-to-Many Character Mapping
  – Word Length Disparity
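
One way to picture the one-to-many problem: a single font can draw a given stored character as only one glyph, so conflicting pairs must go into different fonts. The greedy allocation below is an assumption made for illustration, not necessarily the authors' method, and it does not address word length disparity.

```python
def assign_fonts(char_pairs):
    """Greedily place (stored, glyph) pairs into masking fonts.

    Each font is a dict from stored character to glyph. A pair goes into the
    first font that does not already map the stored character to a different
    glyph; otherwise a new font is opened.
    """
    fonts = []      # list of dicts: stored character -> glyph
    placement = []  # index of the font chosen for each input pair
    for stored, glyph in char_pairs:
        for index, font in enumerate(fonts):
            if font.get(stored, glyph) == glyph:
                font[stored] = glyph
                placement.append(index)
                break
        else:
            fonts.append({stored: glyph})
            placement.append(len(fonts) - 1)
    return fonts, placement


fonts, placement = assign_fonts([("a", "x"), ("a", "y"), ("b", "z"), ("a", "x")])
print(fonts)      # [{'a': 'x', 'b': 'z'}, {'a': 'y'}]
print(placement)  # [0, 1, 0, 0]
```

Here the stored letter "a" must appear as both "x" and "y", so two fonts are needed.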

Content Masking Attack Against Automatic Conference Reviewer Assignment Systems
• Experiment:
  – We have reproduced the INFOCOM automatic reviewer assignment system
  – This includes 114 TPC members from a well-known security conference and 2094 of their recently published papers for training
  – 100 additional papers used as testing data

Content Masking Attack Against Automatic Conference Reviewer Assignment Systems
• Experiment:
  – Matching a paper to one reviewer
[Figure: Similarity scores relative to amount of words masked. Blue stars show the desired matching.]

Content Masking Attack Against Automatic Conference Reviewer Assignment Systems
• Experiment:
  – Matching a paper to one reviewer
[Figure: Word masking requirements for all 100 testing papers]

Content Masking Attack Against Automatic Conference Reviewer Assignment Systems
• Experiment:
  – Matching a paper to one reviewer
[Figure: Masking font requirements for all 100 testing papers]

Content Masking Attack Against Automatic Conference Reviewer Assignment Systems
• Experiment:
  – Matching a paper to multiple reviewers
[Figure: Similarity scores relative to amount of words masked, between a paper and three reviewers. Blue stars, black circles, and green triangles show the desired matchings.]

Content Masking Attack Against Plagiarism Detection
• A cheating student can evade a plagiarism detector by replacing the underlying text with gibberish
• Use a “scrambling font” to render the gibberish as legible (plagiarized) text
• Results in zero similarity with existing work

Content Masking Attack Against Plagiarism Detection
• Zero similarity is unrealistic due to common phrases in language
• We evaluate three methods to target a specific similarity score
• Each method chooses what text to scramble and what text to leave unaltered

Content Masking Attack Against Plagiarism Detection
• By letter
  – Use scrambling font which scrambles all characters
  – Remove characters from being scrambled by order of their frequency of appearance in the language
  – Continue removing characters until a target similarity score is reached
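
The by-letter method can be sketched as a loop that exempts one more letter from scrambling at each step. Several parts are assumptions for illustration: the scrambling font is modeled as a one-position alphabet shift applied to the stored text, the letter ordering is an approximate English frequency ranking, and `word_overlap` is a stand-in for the detector's similarity score, which in practice is a black box.

```python
import string

# Approximate English letter ranking, most frequent first (an assumption).
ENGLISH_BY_FREQUENCY = "etaoinshrdlcumwfgypbvkjxqz"


def scramble_text(text, scrambled_letters):
    """Stored text: shift every letter that is still scrambled by one place."""
    out = []
    for ch in text:
        if ch in scrambled_letters:
            out.append(chr((ord(ch) - 97 + 1) % 26 + 97))
        else:
            out.append(ch)
    return "".join(out)


def word_overlap(original, stored):
    """Stand-in similarity: fraction of words left unchanged."""
    a, b = original.split(), stored.split()
    return sum(x == y for x, y in zip(a, b)) / len(a)


def by_letter(text, target, similarity=word_overlap):
    """Exempt letters from scrambling until the similarity reaches `target`."""
    scrambled = set(string.ascii_lowercase)
    stored = scramble_text(text, scrambled)
    for letter in ENGLISH_BY_FREQUENCY:
        if similarity(text, stored) >= target:
            break
        scrambled.discard(letter)
        stored = scramble_text(text, scrambled)
    return stored, scrambled


stored, scrambled = by_letter("the cat sat on the mat", 0.3)
print(stored)  # tie dat sat on tie nat
```

The sketch expects lowercase ASCII text; anything else passes through unscrambled.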

Content Masking Attack Against Plagiarism Detection
• By word, in frequency of appearance
  – Use scrambling font which scrambles all characters
  – Order distinct words by frequency of appearance
  – Apply scrambling font to all words
  – Remove scrambling font from distinct words until a target similarity score is reached
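
A minimal sketch of the by-word, frequency-ordered method follows. It assumes the most frequent distinct words are unscrambled first (the slide does not state the direction), models scrambling as a simple string transformation, and uses a stand-in similarity function in place of the real detector.

```python
from collections import Counter


def scramble_word(word):
    """Stand-in for gibberish stored under the scrambling font."""
    return word[::-1] + "#"


def position_overlap(original, stored):
    """Stand-in similarity: fraction of word positions left unchanged."""
    return sum(a == b for a, b in zip(original, stored)) / len(original)


def by_word_frequency(words, target, similarity=position_overlap):
    """Unscramble distinct words, most frequent first, until target is met."""
    order = [w for w, _count in Counter(words).most_common()]
    clear = set()

    def stored():
        return [w if w in clear else scramble_word(w) for w in words]

    for word in order:
        if similarity(words, stored()) >= target:
            break
        clear.add(word)
    return stored(), clear


words = "the cat and the dog and the bird".split()
stored_words, clear_words = by_word_frequency(words, 0.5)
print(sorted(clear_words))  # ['and', 'the']
```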

Content Masking Attack Against Plagiarism Detection
• By word, at random
  – Use scrambling font which scrambles all characters
  – Iterate over document, applying scrambling font at random according to chosen probability
  – Modify probability until a target similarity score is reached
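
The random method amounts to tuning a single probability. The sketch below lowers the probability in fixed steps until a stand-in similarity score reaches the target; the step size, the fixed random seed, and the similarity function are all assumptions for illustration. Reusing the same seed at each step means that lowering the probability can only unmask more words.

```python
import random


def position_overlap(original, stored):
    """Stand-in similarity: fraction of word positions left unchanged."""
    return sum(a == b for a, b in zip(original, stored)) / len(original)


def mask_at_random(words, probability, seed=0):
    """Scramble each word independently with the given probability."""
    rng = random.Random(seed)
    return [w[::-1] + "#" if rng.random() < probability else w for w in words]


def tune_probability(words, target, similarity=position_overlap, step=0.05, seed=0):
    """Lower the masking probability until the similarity reaches `target`."""
    probability = 1.0
    while probability > 0:
        stored = mask_at_random(words, probability, seed)
        if similarity(words, stored) >= target:
            return probability, stored
        probability = round(probability - step, 10)
    return 0.0, list(words)


words = [f"w{i}" for i in range(100)]
probability, stored = tune_probability(words, 0.10)
print(probability, position_overlap(words, stored))
```

This sketch only enforces a lower bound on similarity; aiming for a band would need an upper check as well.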

Content Masking Attack Against Plagiarism Detection
• Experiment:
  – Apply scrambling fonts to 10 published papers and target a 5-15% similarity score as measured by Turnitin

Content Masking Attack Against Document Indexing
• An attacker can place spam or illicit content in PDF documents indexed by search engines
• These PDFs can show ads instead of legitimate content that users search for

Content Masking Attack Against Document Indexing
• This can be considered a special case of the reviewer assignment system subversion method
• Instead of masking particular words, we are masking the entire document
• Not constrained by spaces, however

Content Masking Attack Against Document Indexing
• The larger number of masked characters requires more masking fonts
• Instead of generating fonts ad hoc, we make one font for each glyph
• ~84 fonts
• Allows for easy automated generation of masked documents
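
The one-font-per-glyph idea can be sketched with a toy model in which each font draws every character code as the same single glyph. Choosing the font then chooses what is displayed, independent of what is stored. The representation, the function names, the equal-length restriction, and the sample strings are assumptions for illustration.

```python
def single_glyph_font(glyph):
    """Hypothetical font model: every character code is drawn as `glyph`."""
    return lambda code: glyph


def mask_document(stored, shown):
    """Pair each stored character with the font that draws the shown glyph."""
    if len(stored) != len(shown):
        raise ValueError("this sketch assumes equal-length texts")
    return list(zip(stored, shown))  # (character code, font id = its glyph)


def scrape(runs):
    """What an indexer extracts: the stored character codes."""
    return "".join(code for code, _font_id in runs)


def render(runs):
    """What a reader sees: each code drawn with its single-glyph font."""
    return "".join(single_glyph_font(font_id)(code) for code, font_id in runs)


runs = mask_document("content masking", "advertisements!")
fonts_needed = {font_id for _code, font_id in runs}
print(scrape(runs))  # content masking
print(render(runs))  # advertisements!
print(len(fonts_needed))
```

Since the set of fonts depends only on the displayed alphabet, the same fixed font set can be reused for any document, which is what makes automated generation simple.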

Content Masking Attack Against Document Indexing
• Experiment
  – Used 5 well-known published papers
  – Masked each as gibberish

Content Masking Attack Against Document Indexing
• Experiment
  – Submitted them to leading search engines for indexing (Google, Bing, Yahoo!, DuckDuckGo)
  – Results were the same for all test documents

Content Masking Attack Against Document Indexing
• Experiment

| Search Engine | Indexed Papers | Attack Successful | Evades Spam Detection | Not Later Removed |
|---|---|---|---|---|
| Google | ✔ | ✘ | ✘ | ✘ |
| Bing | ✔ | ✔ | ✔ | ✔ |
| Yahoo! | ✔ | ✔ | ✘ → ✔ | ✔ |
| DuckDuckGo | ✔ | ✔ | ✔ | ✔ |

Content Masking Attack Against Document Indexing
• Experiment

Content Masking Defense
• One feasible defense: perform Optical Character Recognition (OCR) on the document to check the integrity of each character.
• Problem:
  – High computational overhead
  – High false positive rate
[Figure label: 50,000-75,000 characters]

Content Masking Defense – Our proposal
• Render each character in the fonts embedded in the subject PDF file and perform OCR on those character codes rather than the rendered PDF file itself.
• Save processing time
[Figure labels: 100-2000 characters; 50,000-75,000 characters]

Challenges and Technical Details
• Challenge 1: Whole font file is embedded
  – Contains $2^{16} = 65{,}536$ characters maximum
  – Causes high computational overhead
• Solution: Scan the document to extract the characters used, and perform OCR on the series of characters used in each font.
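
The font verification idea, restricted to the characters a document actually uses, can be sketched as below. The model is a toy: a document is a list of (string, font name) runs, a font is a dictionary from character code to glyph, and the `ocr` argument is a stub standing in for a real OCR engine. None of these names come from the authors' implementation.

```python
def used_codes(runs):
    """Collect, per font name, the character codes the document uses."""
    used = {}
    for text, font_name in runs:
        used.setdefault(font_name, set()).update(text)
    return used


def verify_fonts(runs, fonts, ocr):
    """Return (font, code, seen) triples where a glyph does not read as its code."""
    mismatches = []
    for font_name, codes in used_codes(runs).items():
        for code in sorted(codes):
            seen = ocr(fonts[font_name][code])
            if seen != code:
                mismatches.append((font_name, code, seen))
    return mismatches


# Toy data: glyphs are represented by the letter they look like,
# so the OCR stub is simply the identity function.
regular = {c: c for c in "abcdefghijklmnopqrstuvwxyz "}
mask = {"c": "c", "a": "o", "t": "t"}
fonts = {"regular": regular, "mask": mask}
runs = [("a cat ", "regular"), ("cat", "mask")]

print(verify_fonts(runs, fonts, ocr=lambda glyph: glyph))  # [('mask', 'a', 'o')]
checks = sum(len(codes) for codes in used_codes(runs).values())
print(checks)  # 7
```

The number of OCR checks grows with the distinct (font, character) pairs rather than with document length, which is the source of the time savings.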

Challenges and Technical Details
• Challenge 2: Special characters
[Figure: the character þ (Unicode: 0xfe) is read by OCR as p (Unicode: 0x70); the Unicode mismatch causes a false alarm]

Challenges and Technical Details
• Solution: Font Training
  1. Perform OCR on the font and list all similar characters.
  2. If the detected glyph is in the similar character list, replace the character’s Unicode value with that of the normal letter it looks like.

Font Training
[Figure: þ (Unicode: 0xfe) is in the list, so its Unicode is changed to 0x70]

Whitelist

| Special character | Unicode | Normal letter | Unicode |
|---|---|---|---|
| ã | 0xe3 | a | 0x61 |
| ɧ | 0x267 | h | 0x68 |
| Ѡ | 0x460 | W | 0x57 |
| … | … | … | … |
| þ | 0xfe | p | 0x70 |
| … | … | … | … |
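
The whitelist lookup can be sketched as a small normalization step applied before comparing a stored code point with the OCR result. The dictionary holds only the four entries visible on the slide; the function names are hypothetical.

```python
# Look-alike code points mapped to the normal letter they resemble
# (only the entries shown on the slide).
WHITELIST = {
    0xE3: 0x61,   # ã -> a
    0x267: 0x68,  # ɧ -> h
    0x460: 0x57,  # Ѡ -> W
    0xFE: 0x70,   # þ -> p
}


def normalize(code_point):
    """Replace a whitelisted look-alike with its normal letter."""
    return WHITELIST.get(code_point, code_point)


def is_mismatch(stored_code_point, ocr_code_point):
    """Flag a character only if it still differs after normalization."""
    return normalize(stored_code_point) != normalize(ocr_code_point)


print(is_mismatch(0xFE, 0x70))  # False: þ read as p is no longer a false alarm
print(is_mismatch(0x61, 0x6F))  # True: a stored "a" drawn as "o" is flagged
```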

Font Verification Performance
• Experiment 1
  – To analyze the accuracy of our Font Verification method and the Whole Document OCR method
  – Generated 10 PDF files with masked characters varying from 5-20% in frequency of appearance

Performance – Experiment 1

Font Verification Performance
• Experiment 2
  – To analyze the effects of document length on the detection rate for each method.
  – Generated 10 PDF files ranging from 1-10 pages in length and having an even 30% distribution of masked characters

Performance – Experiment 2

Font Verification Performance
• Experiment 3
  – To analyze the effect of document length on the detection time for each method
  – Generated 20 PDF files ranging from 1-20 pages in length and having a 30% distribution of masked characters

Performance – Experiment 3

Conclusion
• We describe a new content masking attack against the Adobe PDF standard
• We create and evaluate algorithms for effectively performing attacks against:
  – Automatic reviewer assignment systems
  – Plagiarism detection
  – Document indexing
• We create and evaluate a font verification algorithm that is more accurate and lightweight than OCR

Thank you!
• Questions?
PDF file image from http://iconbug.com/detail/icon/5940/file-format-pdf/
TrueType font file image from https://typography.guru/journal/opentype-myths-explained-r24/