copy-move forgery detection based on automatic threshold ... and... ·...

23
DOI: 10.4018/IJSKD.2020010101 International Journal of Sociotechnology and Knowledge Development Volume 12 • Issue 1 • January-March 2020 Copyright©2020,IGIGlobal.CopyingordistributinginprintorelectronicformswithoutwrittenpermissionofIGIGlobalisprohibited. 1 Copy-Move Forgery Detection Based on Automatic Threshold Estimation Aya Hegazi, Faculty of Computers and Informatics, Benha University, Benha, Egypt Ahmed Taha, Faculty of Computers and Informatics, Benha University, Benha, Egypt Mazen Mohamed Selim, Faculty of Computers and Informatics, Benha University, Benha, Egypt ABSTRACT Recently,usersandnewsfollowersacrosswebsitesfacemanyfabricatedimages.Moreover,itgoes farbeyondthattothepointofdefamingorimprisoningaperson.Hence,imageauthenticationhas becomeasignificantissue.Oneofthemostcommontamperingtechniquesiscopy-move.Keypoint- basedmethodsareconsideredasaneffectivemethodfordetectingcopy-moveforgeries.Insuch methods,thefeatureextractionprocessisfollowedbyapplyingaclusteringtechniquetogroupspatially closekeypoints.Mostclusteringtechniqueshighlydependontheexistenceofaspecificthreshold toterminatetheclustering.Determinationofthemostsuitablethresholdrequiresahugeamountof experiments.Inthisarticle,acopy-moveforgerydetectionmethodisproposed.Theproposedmethod isbasedonautomaticestimationoftheclusteringthreshold.Thecutoffthresholdofhierarchical clusteringisestimatedautomaticallybasedonclusteringevaluationmeasures.Experimentalresults testedonvariousdatasetsshowthattheproposedmethodoutperformsotherrelevantstate-of-the-art methods. KEywoRDS Clustering Evaluation Measures, Copy-Move Detection, Image Forensics, Keypoint-Based Methods, Multiple- Copied Matching 1. INTRoDUCTIoN Digitalimagesareeverywhere,andtheyhavethepowertodoinfinitelymorethanadocument.Inthe latterhalfofthelasttwodecades,theinternet,mobiletechnology,andobsessionofsocialmediahave highlyaffectedandchangedpeople’slives(Katta&Patro,2017;Mahajanetal.,2018;Muliawatet al.,2019).Recently,thereisarapidincreaseinimagesshowinginthemediaasinsocialmediaand televisionthatdon’tseemtobeallastheyappear.Authenticityofdigitalimagesisacriticalissue. Daybyday,itbecomeseasyforanyonetomanipulateimagesevenwithoutleavinganyvisibleclues. WideavailabilityofpowerfulimageprocessingsoftwarelikePhotoshopandGimpmakesitmore challengingfordigitalimageauthentication. Digitalimageforensicsisthescienceofdetectingtamperedregionsinimages.Identifyingthe authenticityofdigitalimagesisveryimportantindigitalforensics.Thepurposeofdigitalimage manipulationistoconcealorhideinformationforseveralintentionsthereforechangetheirmeaning. Manyareashavebeenaffectedbydigitalforensics.Theimpactofimagemanipulationinmedia, journalism,digitalcinema,newsandinpoliticstomisleadthepublicopinion.Itcouldalsobeusedin lawformiscarryingjustice.Manipulatedimagesalsohavebeenfoundinacademicpapers.Inasurvey byTijdink(Tijdinketal.,2014),inthepastthreeyears,15%ofoffendersareinvolvedinscientific

Upload: others

Post on 11-Mar-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

DOI: 10.4018/IJSKD.2020010101

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

Copyright©2020,IGIGlobal.CopyingordistributinginprintorelectronicformswithoutwrittenpermissionofIGIGlobalisprohibited.

1

Copy-Move Forgery Detection Based on Automatic Threshold EstimationAya Hegazi, Faculty of Computers and Informatics, Benha University, Benha, Egypt

Ahmed Taha, Faculty of Computers and Informatics, Benha University, Benha, Egypt

Mazen Mohamed Selim, Faculty of Computers and Informatics, Benha University, Benha, Egypt

ABSTRACT

Recently,usersandnewsfollowersacrosswebsitesfacemanyfabricatedimages.Moreover,itgoesfarbeyondthattothepointofdefamingorimprisoningaperson.Hence,imageauthenticationhasbecomeasignificantissue.Oneofthemostcommontamperingtechniquesiscopy-move.Keypoint-basedmethodsareconsideredasaneffectivemethodfordetectingcopy-moveforgeries. Insuchmethods,thefeatureextractionprocessisfollowedbyapplyingaclusteringtechniquetogroupspatiallyclosekeypoints.Mostclusteringtechniqueshighlydependontheexistenceofaspecificthresholdtoterminatetheclustering.Determinationofthemostsuitablethresholdrequiresahugeamountofexperiments.Inthisarticle,acopy-moveforgerydetectionmethodisproposed.Theproposedmethodisbasedonautomaticestimationoftheclusteringthreshold.Thecutoffthresholdofhierarchicalclusteringisestimatedautomaticallybasedonclusteringevaluationmeasures.Experimentalresultstestedonvariousdatasetsshowthattheproposedmethodoutperformsotherrelevantstate-of-the-artmethods.

KEywoRDSClustering Evaluation Measures, Copy-Move Detection, Image Forensics, Keypoint-Based Methods, Multiple-Copied Matching

1. INTRoDUCTIoN

Digitalimagesareeverywhere,andtheyhavethepowertodoinfinitelymorethanadocument.Inthelatterhalfofthelasttwodecades,theinternet,mobiletechnology,andobsessionofsocialmediahavehighlyaffectedandchangedpeople’slives(Katta&Patro,2017;Mahajanetal.,2018;Muliawatetal.,2019).Recently,thereisarapidincreaseinimagesshowinginthemediaasinsocialmediaandtelevisionthatdon’tseemtobeallastheyappear.Authenticityofdigitalimagesisacriticalissue.Daybyday,itbecomeseasyforanyonetomanipulateimagesevenwithoutleavinganyvisibleclues.WideavailabilityofpowerfulimageprocessingsoftwarelikePhotoshopandGimpmakesitmorechallengingfordigitalimageauthentication.

Digitalimageforensicsisthescienceofdetectingtamperedregionsinimages.Identifyingtheauthenticityofdigitalimagesisveryimportantindigitalforensics.Thepurposeofdigitalimagemanipulationistoconcealorhideinformationforseveralintentionsthereforechangetheirmeaning.Manyareashavebeenaffectedbydigitalforensics.Theimpactofimagemanipulationinmedia,journalism,digitalcinema,newsandinpoliticstomisleadthepublicopinion.Itcouldalsobeusedinlawformiscarryingjustice.Manipulatedimagesalsohavebeenfoundinacademicpapers.InasurveybyTijdink(Tijdinketal.,2014),inthepastthreeyears,15%ofoffendersareinvolvedinscientific

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

2

misconductsuchasfabricating,refutationormanipulatingdata.Astudyby(Farid,2006)reportedthatintheJournalofCellBiologyabout20%ofadmittedmanuscriptshaveatleastonefigurethatmustberestoredduetounsuitableimagemanipulation,andabout1%aredeceitfulfigures.Theseconsequencesmakeimageauthenticitieslesstrustful.

Rapidgrowthofforgedimagesandtheirinfluenceinmanyareashaveledtothedevelopmentoftamperingdetectiontechniques.Tamperingdetectiontechniquesfallundertwocategories:activeauthenticationandpassiveauthenticationmethods(Al-Qershi&Khoo,2013)asshowninFigure1.Activeauthenticationmethodsrequiresomepreprocessingondigitalimageslikewatermarking,orsignatures.Digitalwatermarkingconcealsawatermarkintotheimageatthecapturingendandextractsitattheauthenticationendtoexaminewhethertheimagehasbeentamperedwith(Al-Qershi&Khoo,2013).Insertingthewatermarkeitheratthecapturingtimeoftheimageusingaspeciallyequippedcameraorlaterbyanauthorizedpersonisthemaindrawbackofwatermarking(Qureshi&Deriche,2015).Moreover,mostofthecamerastodayarenotequippedwithawatermarkembeddingtechnique.Inaddition,thesubsequentprocessingoftheoriginalimagecoulddegradetheimagevisualquality.Moreover,digitalsignatureissimilartodigitalwatermarking.Attheimage-capturingend,uniquefeaturesareextractedfromtheimageasasignature.Attheauthenticationanddetectionend,thesignatureisregeneratedusingthesamemethodandtheauthenticityoftheimagecanbeidentifiedandverifiedthroughcomparison.Digitalsignatureshavethesamedrawbacksofdigitalwatermarking.

Ontheotherhand,Passive(blind)authenticationmethodsauthenticateimageswhilenotrequiringany previous information of it. It relies on the traces left on the image during manipulation byvariousprocessingoperations.Therefore,passiveauthenticationmethodsareconsideredasthemostcommon(Linetal.,2018).Passivedetectiontechniquescanbeclassifiedtoforgery-typedependentorforgery-typeindependent.Forgery-typeindependenttechniquesdetectforgeriesregardlessofthetypeoftheforgery.Todetectgeneraltampering,theindependenttechniquesexploitthreediversetypesofartifacts:tracesofre-sampling,compressionandinconsistencies(Redietal.,2011).Theforgery-typedependenttechniquesareusedforcertaintypesofforgeries.Copy-moveandsplicingareexamplesofforgery-typedependent(seeFigure1).Suchtechniquesdependoncopyingandpastingimageregionseitherfromthesameimage(copy-move),orfromdifferentimages(splicing).Imagesplicingiscreatedfromatleasttwodifferentimages(Sharma&Ghanekar,2019;Walia&Kumar,2018).AnExampleofimagesplicingisshowninFigure2(d).

Copy-moveorcloningisatechniqueofcopyingaregionandpastingitinthesameimage.Itcontainsatleasttworegionsalike(seeFigure2(b)).Sincetheduplicatedregionsarefromthesameimage,theyinheritthesamebasicimagepropertiessuchascolorpalette,illuminationconditionsandnoise.Copy-moveforgeryisthemostcommontypeusedforimagemanipulationduetoitssimplicityandeffectiveness(Al-Qershi&Khoo,2013;Bakiahetal.,2016).Althoughthistechniqueiseasytoimplement,itishardtodetect.Ofteninpractice,forgeryisnotjustlimitedtocopyingandpastingtheregions,someprocessingoperationsareappliedtotheseregions.Theseoperationscanbeclassifiedtointermediateoperations(geometrictransformations)andpost-processingoperations.Intermediateoperationsareusedtoprovideaspatialsynchronizationandhomogeneitybetweenthecopiedregionanditsneighbors(Al-Qershi&Khoo,2013;Bakiahetal.,2016).Examplesofintermediateoperationsarerotationandscaling.Post-processingoperationsareusedtoremovetracesleftfromforgeryandtomakeitunnoticeable.Additivenoise,JPEGcompressionandblurringareexamplesofpost-processingoperation (Liu et al., 2010). Since all those operations make detecting copy-move forgery morechallenging,numerousmethodshavebeenproposedforCopy-MoveForgeryDetection(CMFD).Mostofthemcanbeclassifiedeitherintoblock-basedmethodsorkeypoint-basedmethods(Bakiahetal.,2016).Inblock-basedmethods,theimageisdividedintooverlappingornon-overlappingblocksoffixedsize.Ontheotherhand,keypoint-basedmethodscalculatelocalinterestpoints(keypoints)fromthewholeimagewithoutanysubdivisions.

Generally,CMFDtechniques followacommonpipelineasshown inFigure3 (Christleinetal.,2012).Bothblock-basedandkeypoint-basedmethodsfollowthesamepipelinestepsexceptfor

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

3

featureextraction.ThefirststageofCMFDpipelineispreprocessing.Itisanoptionalstageanditdependsonthetechniqueused.Itaimstoimprovetheimagedataeitherbyenhancingtheimagefeaturesorbyremovingsomeunwanteddistortion.ConvertingRGBimagetograyscaleisoneofthemostusedmethodsforpre-processing(Bakiahetal.,2016).Afterthisconversion,featuresareextractedeitherfromthedividedblocksinblock-basedmethodsorforthekeypointsthentheyarestoredinafeaturevector.Inthematchingstage,eachfeaturevectoriscomparedwitheachothertofindsimilaritieswithinthesameimage.Inthisstageoncethematchedblocksaredetected,copy-moveforgerymanipulationsaredetermined.Thematchingtechniqueuseddependsontheextractedfeaturesofblock-basedorkeypoint-basedmethods.Inthefilteringprocess,outliersandfalsematchesareremoved.Finally,thecopy-moveforgerydetectionresultcanbevisualizedtolocalizethetamperedregionsintheforgedimage.Visualizationcanbefurtherrefinedbymorphologicaloperationsuchasfillingtheholes.Thereisalwaysamotivationforpresentingacopy-movedetectiontechniquewithefficientcomplexityandrobustnessagainstimageprocessingoperations.

The rest of the paper is structured as follows. Section 2 introduces related work. Section 3explainsourproposedmethod.Theexperimentalresultsaregiveninsection4.Finally,section5concludesthepaper.

2. RELATED woRK

Intheliterature,severaltechniquesinbothblock-basedandkeypoint-basedhavebeenintroducedforcopy-moveforgerydetection.Amongtheblock-basedmethods,DiscreteCosineTransform(DCT)isconsideredasthemostwidelyused.(Fridrichetal.,2003)suggestthefirstmethodfordetectingcopy-moveforgery;theyused256coefficientsofDCTasfeatures.FurtherimprovementsbasedonDCThavebeenintroducedin(Caoetal.,2011b,2011a;Huetal.,2011;Junhong,2010).Although

Figure 1. Existing image forgery detection techniques

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

4

block-basedmethodsare robust tonoiseandJPEGcompression, theyare sensitive togeometrictransformationslikerotationandscalingandoftenresultinsignificantfalsepositives.Moreover,theyhavelargefeaturevectorsize.Itresultsinahighcomputationalcomplexity(Christleinetal.,2012).Incontrast,keypoint-basedmethodsoutperformblock-basedmethods.Theymatchfeaturesofthe

Figure 2. Examples of blind forgery: (a) and (c) Original images; (b) Copy-move forged image; (d) Spliced forged image

Figure 3. Common processing pipeline for copy-move forgery detection

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

5

imageinsteadofblocks.Hence,itresultsinlesscomputationalcomplexityandminimummemoryconsumption(Dadaetal.,2016).

Ontheotherhand,keypoint-basedmethodsexhibitthemostaccurateandstableresultsinthepresenceofgeometricaltransformations(e.g.scaling,rotation,andaffinetransformation).Scale-InvariantFeatureTransform(SIFT)andSpeededUpRobustFeatures(SURF)aremostwidelyusedkeypoint-basedmethods. (Amerinietal.,2011) introduceaSIFT-basedmethodinwhichfeaturevectorsareextractedbySIFT.Then,theyarematchedusinggeneralized2-nearestneighbor(g2NN)algorithmfollowedbyagglomerativehierarchicalclustering.Althoughthismethodcandealwithmultiplecopy-moveforgery,itfailstolocalizecopymoveregionsaccurately.Inaddition,itisunabletoseparateduplicatedregionsthatarenearoneanother.Moreover,sometimesclusteringcouldbeunacceptablebecauseitneedstosetanempiricalthresholdtostoptheprocess.

ShivakumarandBaboo(B.L.Shivakumar&Baboo,2011)usedHarrisdetectoraskeypointsdetectorwhileSIFTisusedasadescriptor.Thek-dimensionaltree(kd-tree)algorithmisutilizedformatchingkeypointsandfordetectingduplicatedregions.PanandLyu(Pan&Lyu,2010)presentanotherSIFT-baseddetectionalgorithm.ThedetectedSIFTkeypointsarematchedusingthebest-bin-firstalgorithmfollowedbyRandomSampleConsensus(RANSAC)algorithmforgeometrictamperingestimation.However,quantitativeresultsonarealisticdatasetarenotofferedandcannotaccuratelydetecttamperedregionsbecauseofthelackofcorrectmatchedpoints.Thismethodhasadeficientperformancewhendetectingsmallduplicateregions.In(Lietal.,2015),theypresentamethodbasedonSIFTandsegmentation.TheysuggesttheuseofEMalgorithmfortransformestimationrefinementtoreducefalsematches.However,theirmethodsuffersfromhighcomputationalcomplexity.Theirworkistestedusingtwodatasetsandcannotdetectplaincopy-moveforgeryaccurately.Moreover,theyidentifiedsomeunforgedimagesasforged.Yangetal.(Yangetal.,2017)presentamethodbasedonhybridfeatures.TheyusedinterestpointdetectorcalledKAZEcombinedwithSIFTforfeatureextraction.KAZEisaJapanesewordthatmeanswind.

Althoughkeypoint-basedmethods recordagoodperformance, they still suffer fromseveralissuesthatcriticallyaffectthetamperingdetection.Mostpriormethodsempiricallyselectthresholdsanddonotconsideritsrelationshipwiththeimageandforgedregionsize.Furthermore,theycannotdetectenoughkeypointsinflatareas,whichresultinmanyfalsenegativesandinaccuratelocalization.Generally,aneffectiveandefficientcopy-moveforgerydetectiontechniqueshouldberobusttoimageprocessingoperation(i.e.intermediateandpost-processingoperations).Moreover,itshouldaccuratelylocalizetamperedregionsandattainlesscomputationalcomplexity.

3. PRoPoSED METHoD

Theproposedmethoddetectscopy-moveforgerybasedonSIFTfeaturesandagglomerativehierarchicalclustering.Theestimationofthresholdthatcaneffectivelyandefficientlyhandlethemostcommonissuesrelatedtokeypoint-basedmethodsisdoneautomatically.Inourexperiment,CLAHEbasedonRayleighdistributionfunctionisapplied.Cliplimitandtilesizearesetto0.01and4×4respectively.AsshowninFigure4,thewholedetectionprocessoftheproposedmethod.Inourdetectionscheme,theimageisfirstpreprocessed.Then,featuresareextractedusingSIFT.DistinctiveSIFTkeypointsarethenmatchedbetweeneachotherusingfastapproximatenearestneighbormethod.Afterthat,hierarchicalclusteringisappliedonthematchedpoints.Finally,theimageisfiltered,andtamperedregionsarelocalized.Eachstepofthedetectionmethodwillbedescribedindetailinthefollowingsubsections: section3.1describes the imagepreprocessing, section3.2 introducesSIFT featuresextraction,section3.2illustrateskeypointmatching,section3.3presentstheclusteringandforgerydetection,section3.4showstheprocessofestimatingaffinetransformationandfinallysection3.5illustrateshowtamperedregionsarelocalized.

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

6

3.1. Image Preprocessing by CLAHEAsmentionedearlier,SIFTfeatureextractionalgorithmcannotdetectenoughkeypointsinflatareas.Tolimitthisissue,inourwork,Contrast-LimitedAdaptiveHistogramEqualization(CLAHE)(Maetal.,2017)isusedtoenhancethecontrastoftheimage.CLAHEislocallyadaptivecontrastenhancementmethod.IncontrasttoglobalHE,CLAHEworkslocallyinsmallareascalledtilesasopposedtotheentireimage.Eachtile’scontrastisimproved,sothattheprocessedhistogramregionapproximatelymatchesthehistogramspecifiedbyadistributionfunction(e.g.uniform,Gaussian,orRayleigh).BeforecomputingtheCumulativeDistributionFunction(CDF)inCLAHE,thehistogramisclippedataspecificvaluewhichlimitsthenoiseamplification.CLAHEdependsontwomainparameters:Block(tile)SizeandClipLimit.Thequalityoftheimprovedimageiscontrolledmainlybytheseparameters.Rayleighdistributionisoneofthecommonlyusedhistogramclips,whichproduceabell-shapedhistogram(Maetal.,2017).Rayleighdistributionfunctionisgivenby:

y i yp imin( ) = + ( ) −− ( )

2 1

1

12α ln (1)

whereymin

isthelowerboundofthepixelvalue,α isascalingparameterofRayleigh,and p i( ) iscumulativeprobabilitywhichisprovidedtocreatetransferfunction.Ahighervalueofα willresultinmoresignificantcontrastenhancementintheimage,increasingsaturationvalueandamplificationofnoiselevels.

Figure 4. Framework of the proposed copy-move forgery detection method

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

7

UsingCLAHEoffersmanyadvantages;easytoimplement;simplecalculation,anditprovidesgoodresultsinimage’slocalareas.CLAHEcanlimitbrightnesssaturationthatusuallyresultsfromHEandresultsinlessnoise(Kumar&Sharma,2008).

ItworthnoticingthatapplyingCLAHEgenerallyonallimageswillnotalwaysresultinsatisfyingresults.Some imageregionsmaybecomeoversaturatedordarkerwhichwillcriticallyaffect thetampering detection results. Thus, CLAHE will only be applied on low contrast images. In ourexperiment,CLAHEbasedonRayleighdistributionfunctionisapplied.Cliplimitandtilesizearesetto0.01and4×4respectively.

3.2. SIFT Features Extraction and DescriptionTheproposedmethodisbasedonaneffectivekeypointdetectoranddescriptorcalledScaleInvariantFeatureTransform(SIFT)(Lowe,2004).SIFTalgorithmextractsdistinctivefeatures(orlocalfeatures)indigitalimagesthatareinvarianttoimagescalingandrotationandproviderobustnesstochangesinillumination,distortion,noiseaddition,and3Dviewpoint(Lowe,2004).ItisappliedtotheinputimagetoextractSIFTdistinctivekeypointswhicharerepresentedwith128-dimensionalfeaturevector.Asdescribedin(Lowe,2004),thefollowingarethemajorstagesofSIFTalgorithmexplainedbriefly:

• Scale-space peak selection:Thegoalofthisstageistoidentifylocationsandscalesthatcanbefrequentlyassignedunderdivergentviewsofthesameobject.ThatisdonebydetectingextremausingadifferenceofGaussianfunctionatdifferentscalesoftheimage;

• Keypoint localization:Someofkeypointcandidatesresultfromthepreviousstageareunstable(i.e.theyliealonganedge,ortheyhavealowcontrast).Forthatadetailedmodelisfittothenearbydataforaccuratelocation,scale,andcontrast.So,unstablekeypointsarerejectedandhenceincreasetheefficiencyandtherobustnessofthealgorithm.Attheendofthisstageobtainedkeypointsarestableandscaleinvariance.Formoredetails,see(Lowe,2004);

• Orientation assignment:Basedonlocalimageproperties,oneormoreorientationsareassignedtoeachkeypoint.Aroundeachkeypointgradientdirectionandmagnitudearecalculated.Thenthemostprominentgradientorientation(s)areidentifiedandassignedtothatregion.Now,keypointsthatarerotationinvarianceareobtained;

• Keypoint descriptor: After assigning location, scale and orientation for each keypoint, adescriptor is computed for the local image region based on a window around the detectedkeypoint.Therefore,theoutputofthisstepisauniqueSIFTkeypointsthatarerepresentedwith128-dimensionaldescriptorvectors.Keypointsdescriptorishighlydistinctiveanditisinvarianttoscaling,rotation,illuminationchangeand3Dviewpoint.

3.3. Keypoint MatchingInacopy-moveforgery,keypointsextractedfromtheoriginalandduplicatedregionshavethesamedescriptorvectors.Therefore,matchingbetweenthemisappliedtoauthenticatecopy-moveforgeriesintheimage.Usuallymatchingbetweendetectedkeypointsisdoneusingg2NNasin(Amerinietal.,2013;Dadaetal.,2016;Lietal.,2015;Mohamdian&Pouyan,2013).Anyway,itisknowntosufferfromhighcomplexitywhenidentifyingthesimilarityfrommanyhighdimensionalvectors.Inaddition,itgiveslessaccurateresultsespeciallyinhighdimensionalspace.Moreover,lexicographicsortingyieldshigherfalsenegativerate(Christleinetal.,2010).Toaddresstheseissues,FastApproximateNearestNeighbor(FANN)methodintroducedbyMujaetal.(Muja&Lowe,2009)isused.ItisbasedonBest-Bin-First(BBF)searchwhichisavariantofaKD-treethatisusedforfindingapproximatenearestneighborswithhighestprobabilityandlesstime.Intheworkintroducedin(Christleinetal.,2010),ithasbeenshownthattheuseofKD-treematchinggenerallygivesabetterresultscomparedtolexicographicalsortingespeciallyinvery-high-dimensionalspace.TheideaofBBFistosearchinbinsofkd-treeinorderofdistancefromthequeryusingapriorityqueue.Thedistancetoabin

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

8

(node)istheminimumdistancebetweenthequeryandanyotherpointonthebinboundary.Ateachinternalnodevisitedstorethepositionandthedistanceinthequeueinsteadofbacktracking,poptheclosestdistancefromthequeueandcontinuefromit(Lowe,2004).

Given a test image, a set of keypoints F f fn

= … 1, , and its corresponding descriptor

D D Dn

= … 1, , areextracted.Matchingoperation isperformedbetweenfeaturedescriptors to

identifythesimilaritybetweenthem.FeaturesdescriptorisusedtobuildtheKD-tree.AfterthatBBFsearchisappliedonthekd-treetofindtheNnearestneighborsofeachkeypoint f

ifromallother

(n-1)keypointsoftheimage.NearestneighboriscomputedusingEuclideandistance.LetsortedEuclideandistancewhichisknownassimilarityvectordenotedbyd d d d

n= … − , , ,

1 2 1.Assuggested

by(Lowe,2004),theratiobetweenthedistanceofclosestneighbortothedistanceofthesecondclosestneighboriscalculatedandthentheresultiscomparedtoapredefinedthresholdT(usuallyrangefrom0.3to0.5)toreducefalsematches.Therefore,thekeypointismatchedonlyifthefollowingconstraintissatisfied:

dd

T where T12

0 1< ( ) , , (2)

Sincecopy-moveforgeriesmayhavesameimageareathatisclonedmultipletimes,itisnecessarytohandlethiscase.FANNalgorithmcanmanagemultiplecopy-moveforgery.Attheendofthisstage,allthematchedkeypointsarekeptandisolated,andonesarediscarded.

3.4. Clustering and Forgery DetectionInorder to identifypossible forgedregionsandgroupspatiallyclosedkeypoints,AgglomerativeHierarchicalClustering(AHC)(Hastieetal.,2003)isappliedonspatiallocationsofmatchedfeaturepairs.AHCisabottom-upclusteringmethod.Hierarchicalclusteringcreatesahierarchyofclusterswhereclustershavesub-clusters,whichinturnhavesub-clusters,etc.Itstartswitheachkeypointinitsownsingletoncluster.Pair-wisedistancesbetweenclustersarethenevaluated.Itagglomerates(merges)theclosestpairofclusterswithshortestdistance.Thisprocessisrepeateduntilallclustershavebeenmergedintoasingleclusterthatcontainsallkeypoints.AHCcanbevisualizedusingatree-likediagramcalledadendrogram.Itshowstheprogressivegroupingofthekeypoints.Itisthenpossibletodeterminewhatsuitablenumberofclustersis.Generally,mergingofthekeypointsisdeterminedbytwocriteria:thelinkagemethodandthecutoffthresholdusedtostopclustering.Cutoffthresholdplaysakeyroleinforgerydetectionanditcriticallyaffectstheresults.Often,theproblemofcuttingoffadendrogramhasbeenadifficultissueinhierarchicalclusteringresearches(Abeetal.,2017).Usually,settingcutoffthresholdrequiresmanyexperimentsandoptimizationasin(Amerinietal.,2011).Moreover,determiningcutoffthresholdissotrickybecauseofdifferentsizeofimagesanddifferentsizeofforgedregions.Tohandlethisissue,theproposedmethodusesdynamiccutoff,whichautomaticallyterminatestheclusteringprocessonceoptimalnumberofclustersareobtained.Itestimatesoptimalnumberofclustersbasedoninternalclustervaliditymeasures.Generally,clustervalidationisusedtoverifythatanyfoundclusterisreallyindataratherthanbeingproducedfromalgorithmsartifacts(Everittetal.,2011).Clustervalidationcanbedividedintotwomainmethods:Internal and external (Halkidi et al., 2002). Internal methods measure cluster quality based oninter-clusterseparationandintra-clustercompactness(cohesion).Inourwork,twocommonlyusedinternalmethodsforAHCareintroduced:gapstatistics(Tibshirani,2001)andsilhouettewidth(Gan,2007).Theproposedmethodiscomparedusinggapstatisticandsilhouettewidth(seesections4.3and4.4foradetaileddescriptionofsuchexperiment).Mathematicaldetailsforbothgapstatisticandsilhouettecoefficientaregiveninsection3.4.1and3.4.2respectively.Forthelinkagemethod,the“Ward”linkageisusedintheproposedmethodasitgivesthebestresults(Amerinietal.,2011).

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

9

Ward’smethodsupposesthataclusterisrepresentedbyitscentroid.TheincreaseofSumofSquaresError(SSE)thatresultsfrommergingtwoclustersisusedbyWardmethodtomeasuretheproximitybetweentwoclusters.SumofsquaresinAHCstartsatzero(everypointisinitsownsingletoncluster)andgrowsasmergingtheclusters.Ward’smethodattemptstominimizeSSEdistancesofpointsfromtheirclustercentroidstokeepthisgrowthassmallaspossible.

Giventwoclusters,AandB,thedistancebetweenthemgivenby(Amerinietal.,2011):

∆ ( ) = ( )− ( )+ ( )

dist

A B SSE AB SSE A SSE B, (3)

where:

SSE A x xi

n

Ai A

A

( ) = −=∑

1

2 (4)

where∆ iscalledthemergingcostofcombiningtheclustersAandB,nA

isnumberofpointsinclusterA,x

AiindicatestheithpointinclusterA,andx

Aisthecentroid.

3.4.1. Gap StatisticThegapstatistic(Tibshiranietal.,2001)isusedtoestimatetheoptimalnumberofclusters.Theideaisthatfordifferentvaluesofkclusters,thechangesofthetotalwithinintra-clusterarecomparedwithitsexpectationunderanullhypothesis(i.e.adistributionwithnoclustering).Valueofkthatmaximizesthegapstatisticrepresentstheestimationoftheoptimalnumberofclusters.Thismeansthatthestructureofclusteringismuchfarfromtherandomuniformdistributionofpoints.Therearetwochoicesofreferencedistributioneitherbasedonuniformdistributionorbasedonprincipalcomponentanalysis(moredetailsin(Tibshiranietal.,2001).Forsimplicity,intheproposedmethod,uniformreferencedistributionisused.Thefollowingarehowgapstatisticsworkbriefly:

1. ClusterthedataintokclusterswithC C C Cr k= … 1 2

, , ., denotingtheindicesofdatainclusterr;

Computethepairwisedistanceforallpointsinclusterrsuchthat:

D dr

i i Cii

r

=′∈∑,

’ (5)

Computethetotalwithinintra-clustersuchthat:

wnD

kr

k

rr

==∑

1

1

2 (6)

4. Breferencedatasetsaregeneratedbasedonareferencedistributionmethod.Theneveryofthosereferencedatasetsisclusteredintokclusters;

5. Thecorresponding totalwithin intra-clustervariationW b B k Kkb* , , , ., , , , ,= … = …1 2 1 2 is

computed;Theestimatedgapstatisticiscomputed:

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

10

Gap kB

W wb

B

kb k( ) = ( )− ( )=∑1

1

*log log (7)

Computethestandarddeviationsdk

anddefinethestandarderrorsk

:

sdB

W lk

bkb

=

( )−

∑1 2

1

2

*log (8)

s sdBk k

= +

11 (9)

where:

lB

Wb

kb=

( )∑1

log * (10)

8. Finally,thesmallestvalueofkisselectedastheoptimaltheoptimalnumberofclusters k inwhichthegapstatisticiswithinonestandarddeviationofthegapatk+1:

k smallest k such that Gap k Gap k sk

= ( ) ≥ +( )− + , 11

3.4.2. Silhouette WidthTheSilhouettewidth(Gan,2007)canbeusedtorepresentthecompactnessandseparationofclusters.Itmeasurestheclosenessbetweeneverypointinoneclustertothepointswithintheneighboringclusters.Fordifferentvaluesofk,theaverageSilhouetteofobservationsiscomputed.Suchthat,theonethatmaximizestheaverageSilhouetteoverallpotentialkvaluesrepresenttheoptimalnumberofclustersk.Supposepointx

ibelongstoclusterk.First,lettheaveragedistancebetweenthispoint

andallothersinthesameclusterrepresentedbyai.Second,lettheaveragedistancebetweenthis

pointandthoseincluster l k≠ representedby dli

.Lettheaveragedistancebetween xiandthe

nearestclusterofwhichitisnotamemberisdenotedby,b min di l l= .Therefore,theSilhouettewidth

isgivenby:

s xb ak

a bii i

i i

( ) = −

max , (11)

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

11

Largervalueofs xi( ) meansthatx

iliesbetterinitscluster.Entireclusteringstructureismeasured

by the average Silhouette of all data points (also called Silhouette Width Criterion SWC). BestclusteringstructurecorrespondstomaximizedSWC.Itisgivenby:

SWCN

s xi

N

i= ( )

=∑

1

1

(12)

3.5. Affine Transform EstimationPreliminarily,weknowthatimageistamperedandwherethesourceregionandtarget(copy-moved)region. Since copy-move regions undergo geometric distortions (such as scaling, rotation), therelationship between tampered regions is estimated using affine transformation specified by atransformationmatrixH.Giventhecoordinateoftwocorrespondingmatchedpointsfromaregion

anditsduplicateasX x yT

= ( ), and X x y T= ( , ) ,respectively.Thegeometricrelationshipbetweenthesetworegionsisexpressedas:

X HX= (13)

Itcanbeexpressedinmatrixformas:

x

y

a a t

a a tx

y

1 0 0

11 12

21 22

=

1 1 1

=

x

y H

x

y

(14)

wheretheparametersa11

,a12

,a21

,a22

,areassociatedwithscalingandrotation,while tx

, ty

areassociated with translation. Since affine transformation has six degrees of freedom (six matrixparameters),thereisaneedforatleastthreematchedpairsnottobecollineartoobtaintheaffinetransformation.Givenasetofcorrespondences X X

n1, ,…( ) and X X

n1, .,…( ) ,thetransformation

matrixHcanbecomputedbymeansofminimizingthefollowingtotalerrorfunction:

i

n

i iX HX

=∑ −

1

2 (15)

Mismatchedpointsoroutlierscansolelyaffect theestimatedhomographyH.Inthatcase,awidelyusedmethodforrobustestimationcalledRandomSampleConsensus(RANSAC)(Fischler&Bolles,1981)isemployedtoperformthepreviousestimationmoreaccurately.ThismethodisalsoadoptedbysomeotherCMFDtechniques(Amerinietal.,2011;Dadaetal.,2016).RANSACrandomlyselectsthreepairsofpointsormorefromthematchedpointsandestimatesthehomographymatrixHbyminimizingthetotalerrorfunctiongiveninequation(15).Then,accordingtoH,alltheremainingpointsaretransformed.Afterthat,allpairsofmatchedpointsareclassifiedintoinliersoroutliersdependingontheestimatedmatrixH.Apairofmatchedpointsconsideredasinlierif X HX− ≤ β ,otherwise, it is outlier. After N

i times of iteration, RANSAC returns the estimated transform

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

12

parametersthatresultinthebiggestvarietyofinliers.Inourexperiment,β issetto3andNiisset

to1000.Finally,asetofaffinetransformations H Hn1

, ,… areacquired.EstimationofaffinetransformationisanimportantstageofCMFDscheme.Falselydetected

regions thatdonothaveasetofpointswithuniformtransformrelationship,willbe removed inthisstage.Moreover,itenhancesthedetectionofcopy-moveforgerybyprovidingthemoredetailsabouttamperedimage.Inliterature,mostofrecentCMFDmethodschoosetoestimategeometrictransformations(Amerinietal.,2011;Shivakumar&Baboo,2011;Dadaetal.,2016;Hashmietal.,2014;Lietal.,2015).

3.6. Tampered Regions LocalizationOnceanimageisconsideredastampered,duplicatedregionscanbeaccuratelylocalized.WiththeestimatedaffinetransformationH,identicalregionsarefoundbycomparingeachpixelintheimagewithitstransformationalcounterpart.Intheproposedmethod,alocalizationmethodisusedasin(Yangetal.,2017).Localizingtamperedregionsisillustratedinthefollowingsteps:

1. AllpointsintheimagearetransformedforwardwiththematrixH andbackwardwiththeinversematrixH −1 letoriginalregionRo anditsrelatedtamperedregionRF thentherelationbetweenthesetworegionsintermsoftransformationmatrixis:

R HR R H RF o o F= = −, 1

Tolocalizethetamperedregions,thesimilaritybetweentheoriginalimageIandtransformed(warped)imageTismeasuredusingZeroMeanNormalizedCross-Correlation(ZNCC).Letpixelintensitiesatlocationxdenotedas I x( ) andT x( ) ,then:

ZNCC xI v I T v T

I v I T v T

Zv x

v x

( ) =( )−( ) ( )−( )

( )−( ) ( )−( )∈ ( )

∈ ( )

Ω

Ω

2 2, NNCC ∈ ( )0 1, (16)

whereΩ x( ) a7×7pixelsneighboringareacenteredatlocation x . I v( ) andT v( ) arethepixelsintensitiesatlocation v . I andT aretheaveragepixelintensitiesofIandTcomputedatΩ x( ) .LargervalueofZNCCindicateshighsimilarity.

2. Correlationmapisnowobtained,toreducenoise,Gaussianfilterofsize7pixelsisapplied.Then,itisconvertedtobinaryimagewithathresholdvalue(th=0.55inourexperiments);

3. Finally,morphologicaloperationisappliedtofilltheholesinthebinaryimage.

4. EXPERIMENTAL RESULTS

Inthissection,theperformanceoftheproposeddetectionmethodisevaluated.Section4.1describesthetestimagedatasetusedinourexperiments.Errormeasuresareintroducedinsection4.2.ResultsonMICC-F220andbenchmarkdatasetsareillustratedinsection4.3and4.4,respectively.Acomparisonoftheproposedmethodandotherrelatedmethodsisgiveninsection4.5.Insection4.6,therobustness

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

13

oftheproposedmethodtoprocessingoperationsisshown.Finally,section4.7showstheabilityofproposedmethod todetectmultiple copy-move forgeries.Allmeasurements areperformedonadesktopcomputerwithIntelCorei51.7GHzCPUand4GBRAMmemory,runningMatlabR2016b.

4.1. Test Image DatasetInourexperiments,twopubliclyavailabledatasetsareused.BotharethemostwidelyuseddatasetsbyCMFDmethods(Bakiahetal.,2016).ThefirstdatasetisMICC-F220introducedbyAmerini(Amerinietal.,2011).Ittotallyconsistsof220images:110areoriginalimagesand110aretampered.Theimageresolutionrangesfrom722×480to800×600pixels.About1.2%oftheentireimageiscoveredbyaforgedregion.Theprocessingoperationinthisdatasetonlylimitedtotranslation,scalingandrotation.Thegroundtruthofthisdatasetisnotgiven.Theseconddatasetusedbenchmarkdatasetcreatedby(Christleinetal.,2012).Thisdatasetisconsistingof48basicimagesandtheirtransformedimages,suchasrotation,scaling,JPEGcompressionandadditivenoise.ParametersappliedoneachattackisgiveninTable1.Imagesofthisdatasetarequitelarge,itsaveragesizeisabout3000×2300pixels.About10%oftheentireimageiscoveredbyaforgedregion.Inthisdataset,theduplicatedregionsaremeaningful,whicharecategorizedaseither:rough(e.g.,rocks),smooth(e.g.,sky),orstructured(Christleinetal.,2012).Thisdatasetisprovidedwithagroundtruth.Table2illustratesthedetailsabouteachdataset.

Inourexperiments,totally1756imageshavebeentested.Thereare220imagesfromMICC-F220dataset.FromBenchmarkingdataset: (1) 48original images, (2)Plain copy-move: 48 tamperedimageswithoutanyprocessingoperationsapplied.(3)Rotation:theduplicatedregionsarerotatedbyanglevaryingfrom2°to10°withsteplength2°.Thismeansthattherearetotally48×5=240images.(4)Scaling: theduplicatedregionsarerescaledwithratiobetween91%and109%of its

Table 1. Setting of attacks parameters

Attacks Parameters

Rotation Angle(2°:2°:10°)

Scaling Ratio(0.91:0.02:1.09)

JPEGCompression QualityFactor(20:10:100)

Noiseaddition Deviation(0.02:0.02:0.1)

Table 2. Details about image datasets used

Dataset Image Size Total Images Processing Operations Ground Truth

MICC-F220 722×480to800×600

Total:220 -Translation-Rotation-Scaling(symmetric,asymmetric)-Combinedtransformation

No

Original110

Tampered110

BenchmarkCMFD 420×300to3888×2592

Original48 -Rotation-Scaling-JPEGCompression-Noise-Combinedtransformation

Yes

Tampered-48(plainCMF)-240(Rotated)-480(Scaled)-423(JPEG)-240(Noise)

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

14

originalsize,withsteplength2%.Thatresultsin48×10=480images.(5)JPEGCompression:Theforgedimagesarecompressedwithqualityfactorsvaryingfrom20to100withsteplength20.Inthiscase,thereare84×9=432images.(6)Noiseaddition:noiseisaddedtoduplicatedregionswithstandarddeviationvaryingfrom0.02to0.1withsteplength0.02.Therefore,totallythereare240images.(7)MultipleCopies:foreachofthe48images,ablocksizeof64×64pixelsareselectedandrandomlycopiedfivetimes.

4.2. Error MeasuresTotesttheperformanceofourCMFDmethod,wefollowtheapproachpresentedby(Christleinetal.,2012).PerformanceofCMFDschemeisevaluatedattwolevels:(1)Imagelevel:theabilitytodetermineiftheimagehasbeentamperedornot.(2)Pixellevel:theabilitytolocalizetamperedregionscorrectly.CommonlyusedevaluationmetricsinCMFDtocalculatetheaccuracyareTruePositiveRatio(TPR)andFalsePositiveRatio(FPR)seeTable3.ACMFDtechniqueisefficientifitmaintainsahighTPRwhiletheFPRattheminimumlevel.ThecalculationforTPR(recall),FPRandprecisiongivenas:

Precision =+TP

TP FP (17)

Recall =+TP

TP FN (18)

FPR =+FP

FP TN (19)

Intheproposedmethod,ifatleastoneaffinetransformationisestimatedbetweenforgedimageregions,animageistakenintoaccountasforged.Inaddition,F1scoreisusedasanevaluationmetric(Christleinetal.,2012),whichcombinesprecisionandrecallintoasinglevalue:

F12= ⋅

⋅+

precision recall

precision recall

(20)

Table 3. Evaluation measures description

Measures Description

TP(Truepositive) Correctlydetectedforgedimages

TN(TrueNegative) Correctlydetectedoriginalimages

FP(Falsepositive) Originalimagesfalselydetectedasforged

FN(FalseNegative) Falselymissedforgedimages

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

15

Atpixellevel,thesameevaluationmetricsareusedwhereTPrepresentsnumberofpixelsthatarecorrectlydetectedasforged,FPnumberofpixelsthatarefalselydetectedasforgedandFNshowsfalselymissedforgedpixels.

4.3. Results on MICC-F220 DatasetItworthtonoticethat imagesizeandforgedregionsizeplayanimportantrolewhenevaluatingforgerydetectionmethod.Inthisdataset,imagesizesarenotsolarge.Asmentionedearlier,imagessizeisvaryingfrom722×480to800×600.Theforgedregionsaresmallcomparedtothewholeimage.Itwasfoundthatthesmallerthesizeoftheimageandtheforgedregionis,thelesserthenumberoffeaturesandmatchedpointsarefound.Inourmatchingmethod,thethresholdvalueissetto0.5thatyieldsthebestresults.Lowerthresholdvaluewillresultinsmallmatchedpointswhilelargervalueswillresultinhighfalsematches.Fortheclustervaliditymeasures,bothgapstatisticsandSilhouettewidthareapplied,andtheresultsarecompared.Thekvalue(clusterslist)isvaryingfrom1to7.LargerangeofkwillresultinlowTPRandhighFPRbecauseclustersizesdependhighlyonnumberofmatchedpoints.SilhouettewidthproducesthebestresultsintermsofTPRandFPRasillustratedinTable4.

4.4. Results on Benchmark DatasetAs previously mentioned, the image size of this dataset is relatively large. Its average size is3000×2300.Largeimagesaremorechallenging,sinceanoverallhighernumberoffeaturevectorsexists,andthusthereisahigherprobabilityoffalsepositives.Inthiscase,thematchingthresholdissetto0.4.Toavoidhighfalsepositives,thevalueofkissetfrom1to20.Theabilityoftheproposedmethodisexaminedindifferentcases:plaincopy-moveforgery,robustnesstoprocessingoperations:rotation, scaling, JPEGcompressionandnoise additionand inmultiplepasted regions.WealsocompareresultsusinggapstatisticsandSilhouettewidth.Inthisdataset,gapstatisticsrepresentthebestresults.Figure5showssomeexamplesoftamperedimagesdetectedbytheproposedmethod.ThelocalizationdetectionresultsonthetestimageswithCMFregionsfusedinthebackgroundareillustratedinFigure6.TheproposedmethoddetectsmostCMFregions.

4.5. Comparisons with other Relevant MethodsToverify theperformanceof theproposedmethod,our results is comparedwithvarious recentkeypoint-basedmethods.AllcomparisonsaremadeusingMICC-F220datasetandthebenchmarkdataset.Themethodsusedforcomparisoninclude:panandLyu(Pan&Lyu,2010),Amerirniet.al(Amerinietal.,2011),ShivakumarandBaboo(B.L.Shivakumar&Baboo,2011),Liet.al(Lietal.,2015),Hashmiet.al(Hashmietal.,2014),andYangetal.(Yangetal.,2017).UsingMICC-F220dataset,theproposedmethodiscomparedwithmethodsin(Amerinietal.,2011)and(Hashmietal.,2014).Theproposedmethodandthemethodpresentedin(Amerinietal.,2011)achievebestTPR,buttheproposedmethodachieveslessFPRasshowninTable5.Theprocessingtimeoftheproposedmethod(perimage)onaverageabout0.43seconds,whereastheothertwomethodstakeabout4.94and2secondsrespectively.Usingbenchmarkingdataset,theperformanceincaseofplaincopy-moveforgeryisevaluated.Insuchcase,48originalimagesand48forgedimagesareusedwithoutapplyinganyprocessingoperations.Inthiscase,theCMFDmethodsmustdifferentiatewhethertheimagehas

Table 4. TPR and FPR values on MICC-F220 with respect to cluster evaluation method

Method TPR FPR

Gapstatistics 95.45% 8.18%

Silhouettewidth 100% 6.36%

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

16

beentamperedornot.Detectionresultsofplaincopy-moveforgeryareshowninTable6.Notethattheproposedmethodandthemethodpresentedin(Lietal.,2015)obtainthebestresultsintermsofrecallrate,100%and97.72%,respectively.Forprecisionrate,theproposedmethodobtainsthebestresultreachingto90.9%,followedbythemethodpresentedin(Hashmietal.,2014)of88.89%.Inaddition to theprecisionand recall, theproposedmethodachieves thebest F

1 scoreof95.23%

comparedtoothermethods.Inconclusion,theproposedmethodobtainsthebestresultsintermsofrecall,precisionandF

1corecomparedtoothermethods.

4.6. Robustness to Processing operationsTheproposedmethodhasadditionallybeentestedintermsofdetectionperformancefromarobustnesspointofview;theimpactofrotation,scaling,JPEGcompressionandnoiseadditionusingbenchmarkdataset(Christleinetal.,2012):

1. Robustness to gaussian noise:Imageintensitiesbetween0and1arenormalizedandzero-meanGaussiannoisewithstandarddeviationsof0.02,0.04,0.06,0.08and0.10isadded.Itcanbenoticedthatasstandarddeviationincreasedthefalsenegativesalsoincreasedandtruepositivesdecreased.Inaddition,astandarddeviationof0.10leadstoobviouslyvisibleartifacts(Christleinetal.,2012).Generally,theproposedmethodmaintainshighTPRasillustratedinTable7;

2. Robustness to JPEG compression artifacts:Thequalityfactorsvariedbetween100and20withsteplength10.ThesameJPEGcompressionappliedto48forgeriesperqualitylevel.Thevisualqualityoftheimageishighlyaffectedwhenthequalityfactorisverylow.Forreal-worldforgeries,qualitylevelsdownatleast70areconsideredasacceptableassumptions(Christleinet al., 2012).TPR tends to slightly reducewhen imagequalitydecreases.ProposedmethodmaintainshighTPRasillustratedinTable7;

Figure 5. Examples of tampered images detected by proposed method

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

17

Figure 6. Detection results on the images with copy-move forgery regions combined in the background. The first column represents the test images from benchmark dataset. The second column represents the ground truth of the copy-move forgery regions in these images. The third column shows the localization detection results of the proposed method.

Table 5. Detection results on MICC-F220 dataset and processing time (average, per image)

Method TPR (%) FPR (%) Time (s)

Amerirniet.al(2011) 100 8 4.94

Hashmiet.al(2014) 80 10 2

Proposed 100 6.36 0.43

Table 6. Detection results of the plain copy-move forgery on benchmark dataset

Methods Precision (%) Recall (%) F1

PanandLyu(2010) 80.49 68.75 74.15

Amerirniet.al(2011) 88.4 79.2 79.2

ShivakumarandBaboo(2011)

77.27 70.83 73.90

Liet.al(2014) 70.17 83.33 76.19

Hashmiet.al(2014) 88.89 80 84.21

Yanget.al(2017) 78.33 97.92 87.04

Proposed 90.90 100 95.23

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

18

3. Robustness to rotation and scaling:FlexibilityofCMFDalgorithmstoaffinetransformations,likescalingandrotationisveryimportant.Theproposedmethodshowsstronginvariancetoscalingandrotation.Performanceoftheproposedremainsrelativelystableacrossthewholescalingparametersanddifferentrotationangles.Proposedmethodresultsin100%TPRforbothscalingandrotationseeTable7.

4.7. Detection of Multiple CopiesIntheproposedmethod,weaddressthedetectionofmultiplecopiesofthesameregion.Asmorecombinationsofmatchedregions,thechanceoffalsematchincreases.Table8illustratesthedetectionresultsofourproposedmethodofmultiplecopiesintermsofTPRusingdifferentthresholdvalues.Generally,theperformancewilldecrease,mainlysincetherandomchoiceofsmallblockstypicallyyieldsregionswithonlyafewmatchedkeypoints(Christleinetal.,2012).

5. CoNCLUSIoN

Detectingcopy-moveforgeryisachallengingtask.DependenceofempiricalthresholdsinexistingCMFD techniques is a critical issue. This paper proposes a CMFD method that automaticallydeterminestheoptimalnumberofclustersusinginternalclustervaliditymeasures.ThemethodusesGapStatisticandSilhouettewidthforautomaticcutoffthresholdinAHC.TheproposedmethodhasbeenevaluatedusingtwopubliclyavailabledatasetsMICC-F220andBenchmarkdataset.Experimentalresultsexhibitthattheproposedmethodyieldshigherprecisionandrecall,andlowerfalsenegativevaluescomparedtoalternativesimilarworkswithintheliterature.Italsoshowsrobustnessagainstdifferentattackssuchasscaling,rotation,JPEGcompression,andadditivenoise.Futureworkcanbeextendedtoworkonvideosandhandleothertypesofimageforgerylikeimagesplicing.

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

19

Table 7. Robustness to processing operations

Attacks Total Number of Images Parameters TP Recall (%)

Rotation(angle)

2402°

48 100

4° 48

6° 48

8° 48

10° 48

Scaling(ratio)

480 0.91 48 100

0.93 48

0.95 48

0.97 48

0.99 48

1.01 48

1.03 48

1.05 48

1.07 48

1.09 48

JPEG Compression(qualityfactor)

432 20 45 98.61

30 47

40 46

50 48

60 48

70 48

80 48

90 48

100 48

Noise(standarddeviation)

240 0.02 47 96.67

0.04 47

0.06 46

0.08 46

0.10 46

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

20

Table 8. Results for multiple copy-move forgery

Threshold TPR (%)

0.4 93.75

0.5 95.83

0.6 97.92

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

21

REFERENCES

Abe,R.,Miyamoto,S.,&Hamasuna,Y.(2017).HierarchicalClusteringAlgorithmswithAutomaticEstimationofTheNumberofClusters.Proceedings of the17th World Congress of International Fuzzy Systems Association and 9th International Conference on Soft Computing and Intelligent Systems (IFSA-SCIS)(pp.1–5).AcademicPress.doi:10.1109/IFSA-SCIS.2017.8023241

Al-Qershi,O.M.,&Khoo,B.E.(2013).PassiveDetectionofCopy-MoveForgeryinDigitalImages:State-Of-The-Art.Forensic Science International,231(1–3),284–295.doi:10.1016/j.forsciint.2013.05.027PMID:23890651

Amerini,I.,Ballan,L.,Caldelli,R.,DelBimbo,A.,DelTongo,L.,&Serra,G.(2013).Copy-MoveForgeryDetection and Localization by Means of Robust Clustering With J-Linkage. Signal Processing Image Communication,28(6),659–669.doi:10.1016/j.image.2013.03.006

Amerini,I.,Ballan,L.,Caldelli,R.,DelBimbo,A.,&Serra,G.(2011).ASIFT-BasedForensicMethodforCopy-MoveAttackDetectionandTransformationRecovery.IEEE Transactions on Information Forensics and Security,6(3Part2),1099–1110.doi:10.1109/TIFS.2011.2129512

Bakiah,N.,Warif,A.,Wahid,A.,Wahab,A.,Yamani,M.,Idris,I.,&Shamshirband,S.et al.(2016).Copy-moveforgerydetection:Survey,challengesandfuturedirections.Journal of Network and Computer Applications,75,259–278.doi:10.1016/j.jnca.2016.09.008

Cao,Y.,Gao,T.,Fan,L.,&Yang,Q.(2011a).ARobustDetectionAlgorithmforCopy-MoveForgeryinDigitalImages.Forensic Science International,214(1–3),33–43.PMID:21813252

Cao,Y.,Gao,T.,Fan,L.,&Yang,Q.(2011b).ARobustDetectionAlgorithmforRegionDuplicationinDigitalImages.International Journal of Digital Content Technology and Its Applications,5(6),95–103.doi:10.4156/jdcta.vol5.issue6.12

Christlein,V.,Riess,C.,&Angelopoulou,E.(2010).AStudyonFeaturesfortheDetectionofCopy-MoveForgeries.Sicherheit,105–116.

Christlein,V.,Riess,C.,Jordan,J.,Riess,C.,&Angelopoulou,E.(2012).AnEvaluationofPopularCopy-MoveForgeryDetectionApproaches.IEEE Transactions on Information Forensics and Security,7(6),1841–1854.doi:10.1109/TIFS.2012.2218597

Dada,A.,Dharaskar,R.V.,&Thakare,V.M.(2016).ASurveyonKeypointBasedCopy-PasteForgeryDetectionTechniques.Procedia Computer Science,78,61–67.doi:10.1016/j.procs.2016.02.011

Everitt, B. S., Landau, S., Leese, M., & Stahl, D. (2011). Cluster Analysis (5th ed.). London: Wiley.doi:10.1002/9780470977811

Farid,H.(2006).ExposingDigitalForgeriesinScientificImages.Proceedings of the8th Workshop on Multimedia and Security(pp.29-36).ACM.

Fischler, M., & Bolles, R. C. (1981). Random Sample Consensus: A paradigm for Model Fitting withApplicationstoImageAnalysisandAutomatedCartography.Communications of the ACM,24(6),381–395.doi:10.1145/358669.358692

Fridrich,J.,Soukal,D.,&Lukáš,J.(2003).DetectionofCopy-MoveForgeryinDigitalImages.Proceedings of theDigital Forensic Research Workshop.AcademicPress.

Gan,G.(2007).Data Clustering Theory, Algorithms, and Applications(Vol.20).Alexandria,Virginia:Siam.doi:10.1137/1.9780898718348

Halkidi,M.,Batistakis,Y.,&Vazirgiannis,M.(2002).ClusteringValidityCheckingMethods:PartII.ACM Special Interest Group on Management of Data,31(3),40–45.

Hashmi,M.F.,Anand,V.,&Keskar,A.G.(2014).Copy-moveImageForgeryDetectionUsinganEfficientandRobustMethodCombiningUn-decimatedWaveletTransformandScaleInvariantFeatureTransform.AASRI Procedia,9,84–91.doi:10.1016/j.aasri.2014.09.015

Hastie,T.,Tibshirani,R.,&Friedman,J.(2003).TheElementsofStatisticalLearningData(2nded.).Springer.

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

22

Hu,J.,Zhang,H.,Gao,Q.,&Huang,H.(2011).AnImprovedLexicographicalSortAlgorithmofCopy-MoveForgeryDetection.Proceedings of the2nd International Conference on Networking and Distributed Computing(pp.23–27).AcademicPress.doi:10.1109/ICNDC.2011.12

Junhong,Z.(2010).DetectionofCopy-MoveForgeryBasedonOneImprovedLLEmethod.Proceedings of the 2nd International Conference on Advanced Computer Control (Vol. 4, pp. 547–550). Academic Press.doi:10.1109/ICACC.2010.5486861

Katta,R.M.R.,&Patro,C.S.(2017).InfluenceofWebAttributesonConsumerPurchaseIntentions.International Journal of Sociotechnology and Knowledge Development,9(2),1–16.doi:10.4018/IJSKD.2017040101

Kumar,R.,&Sharma,H. (2008).ComparativeStudyofCLAHE,DSIHE&DHESchemes. International Journal of Research in Management,Science & Technology,1(1),1–4.

Li,J.,Li,X.,Yang,B.,&Sun,X.(2015).Segmentation-BasedImageCopy-MoveForgeryDetectionScheme.IEEE Transactions on Information Forensics and Security,10(3),507–518.doi:10.1109/TIFS.2014.2381872

Lin,X.,Li,J.H.,Wang,S.L.,Liew,A.W.C.,Cheng,F.,&Huang,X.S.(2018).RecentAdvancesinPassiveDigitalImageSecurityForensics:ABriefReview.Engineering,4(1),29–39.doi:10.1016/j.eng.2018.02.008

Liu,G.,Wang,J.,Lian,S.,&Wang,Z.(2010).Apassiveimageauthenticationschemefordetectingregion-duplication forgery with rotation. Journal of Network and Computer Applications, 34(5), 1557–1565.doi:10.1016/j.jnca.2010.09.001

Lowe, D. G. (2004). Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision,60(2),91–110.doi:10.1023/B:VISI.0000029664.99615.94

Ma,J.,Fan,X.,Yang,S.X.,Zhang,X.,&Zhu,X.(2017).ContrastLimitedAdaptiveHistogramEqualizationBasedFusionforUnderwaterImageEnhancement.

Mahajan,P.,Delhi,N.,&Rana,A.(2018).SentimentClassification-HowtoQuantifyPublicEmotionsUsingTwitter.International Journal of Sociotechnology and Knowledge Development,10(1),57–71.doi:10.4018/IJSKD.2018010104

Mohamdian,Z.,&Pouyan,A.A.(2013).DetectionofDuplicationForgeryinDigitalImagesinUniformandNon-UniformRegions.Proceedings of the15th International Conference on Computer Modelling and Simulation(Vol.1,pp.455–460).AcademicPress.doi:10.1109/UKSim.2013.94

Muja,M.,&Lowe,D.G.(2009).FastApproximateNearestNeighborswithAutomaticAlgorithmConfiguration.Proceedings of theVISAPP International Conference on Computer Vision Theory and Applications(Vol.1,pp.331–340).AcademicPress.

Muliawaty,L.,Alamsyah,K.,Salamah,U.,&Maylawati,D.S.(2019).TheConceptofBigDatainBureaucraticServiceUsingSentimentAnalysis. International Journal of Sociotechnology and Knowledge Development,11(3),13.doi:10.4018/IJSKD.2019070101

Pan,X.,&Lyu,S.(2010).RegionDuplicationDetectionUsingImageFeatureMatching.IEEE Transactions on Information Forensics and Security,5(4),857–867.doi:10.1109/TIFS.2010.2078506

Qureshi,M.A.,&Deriche,M.(2015).ABibliographyofPixel-BasedBlindImageForgeryDetectionTechniques.Signal Processing: Image Communication, 39(PartA),46-74.

Redi,J.A.,Taktak,W.,&Dugelay,J.(2011).DigitalImageForensics:ABookletforBeginners.Multimedia Tools and Applications,51(1),133–162.doi:10.1007/s11042-010-0620-1

Sharma,S.,&Ghanekar,U.(2019).SplicedImageClassificationandTamperedRegionLocalizationUsingLocal Directional Pattern. International Journal of Image, Graphics and Signal Processing, 11(3), 35–42.doi:10.5815/ijigsp.2019.03.05

Shivakumar,B.L.,&Baboo,S.(2011).AutomatedForensicMethodforCopy-MoveForgeryDetectionbasedonHarrisInterestPointsandSIFTDescriptors.International Journal of Computers and Applications,27(3),9–17.doi:10.5120/3283-4472

International Journal of Sociotechnology and Knowledge DevelopmentVolume 12 • Issue 1 • January-March 2020

23

Aya Hegazi is a teaching assistant at faculty of computers and informatics, Benha University.

Ahmed Taha received his M.Sc. degree and his Ph.D. degree in computer science, at Ain Shams University, Egypt, in February 2009 and July 2015 respectively. He currently works as assistant professor at computer science department, Benha University, Egypt. His research interests are: computer vision and image processing (human behavior analysis - video surveillance systems), digital forensics (image forgery detection – document forgery detection), security (encryption – steganography – cloud computing), content-based retrieval (Arabic text retrieval - video scenes classification-video scenes retrieval – trademark image retrieval - closed-caption technology).

Mazen M. Selim received the BSc in Electrical Engineering in 1982, the MSc in 1987 and PhD in 1993 from Zagazig University (Benha Branch) in electrical and communication engineering. He is now a Professor at the faculty of computers and informatics, Benha University. His areas of interest are image processing, biometrics, sign language, content-based image retrieval (CBIR), face recognition, and watermarking.

Tibshirani,R.,Walther,G.,&Hastie,T. (2001).Estimating theNumberofClusters inADataSetVia theGap Statistic. Journal of the Royal Statistical Society. Series A, (Statistics in Society), 63(2), 411–423.doi:10.1111/1467-9868.00293

Tijdink,J.K.,Verbeke,R.,&Smulders,Y.M.(2014).PublicationPressureandScientificMisconductinMedicalScientists.Journal of Empirical Research on Human Research Ethics,9(5),64–71.doi:10.1177/1556264614552421PMID:25747691

Walia,S.,&Kumar,K. (2018).AnEagle-EyeViewofRecentDigital ImageForgeryDetectionMethods.Proceedings of the International Conference on Next Generation Computing Technologies (pp. 469–487).AcademicPress.doi:10.1007/978-981-10-8660-1_36

Yang,F.,Li,J.,Lu,W.,&Weng,J.(2017).Copy-MoveForgeryDetectionBasedonHybridFeatures.Engineering Applications of Artificial Intelligence,59,73–83.doi:10.1016/j.engappai.2016.12.022