MediaEval 2015 - The CERTH-UNITN Participation @ Verifying Multimedia Use 2015

The CERTH-UNITN Participation @ Verifying Multimedia Use 2015

Christina Boididou1, Symeon Papadopoulos1, Duc-Tien Dang-Nguyen2, Giulia Boato2, and Yiannis Kompatsiaris1

MediaEval 2015 Workshop, Sept 14-15, 2015, Wurzen, Germany

This task is supported by the REVEAL EC FP7 Project.

1 Information Technologies Institute (ITI), CERTH, Greece
2 University of Trento, Italy

Overview


Approach
• Use of tweet-based, user-based and forensics features
• Supervised learning (SL) scheme
• Semi-supervised learning scheme, called agreement-based retraining (SSL-AR)

Aim
• Predict whether a tweet that shares multimedia content is fake or real

Features

Features used in the experiments


Feature set   Description
TB-base       Baseline tweet-based
TB-ext        Extended tweet-based
UB-base       Baseline user-based
UB-ext        Extended user-based
FOR           Forensics

Types
• Tweet-based: information coming from the tweet and its metadata
• User-based: information and metadata about the user posting (or retweeting) the tweet
• Multimedia forensics: features based on the image that accompanies the tweet

Sets
• Baseline (base) set: features shared by the task
• Extended (ext) set: newly extracted features
• Forensics (FOR) set: features distributed by the task plus some additional ones

Additional Features


Tweet-based                  User-based                     Forensics
Contains the word "please"   Account age                    AJPG-BAG combined
Has external link            Number of media content        NAJPG-BAG combined
Number of slang words        Shares location
Number of nouns              Shares location that exists1
Readability2

For the links:
• Web Of Trust (WOT) score
• In-degree centrality3
• Harmonic centrality3
• Alexa rankings

1 Geonames dataset (http://download.geonames.org/export/)
2 Flesch Reading Ease method, which computes the complexity of a piece of text as a score in the interval [0, 100]
3 Common Crawl WWW Ranking (http://wwwranking.webdatacommons.org/more.html)
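The readability feature can be illustrated with a minimal sketch of the Flesch Reading Ease formula, 206.835 - 1.015 (words / sentences) - 84.6 (syllables / words). The syllable counter below is a rough vowel-group heuristic and the clamping to [0, 100] is an assumption made to match the interval quoted above; neither is taken from the authors' implementation.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels (heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    score = (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )
    # Assumption: clamp to [0, 100] to match the interval quoted on the slide;
    # the raw formula can fall outside this range.
    return min(100.0, max(0.0, score))
```

Higher scores mean easier text, so a short tweet made of one-syllable words lands at the top of the scale while long, polysyllabic sentences land near the bottom.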


Additional Forensics Features


[Figure: construction of the AJPG-BAG combined features. Diagram labels: AJPG map, thresholding, binary map, 'Object' mask, BAG, 'Object' features, 'Background' features, AJPG-BAG combined.]

• NAJPG-BAG was combined in the same way from NAJPG and BAG features.
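One possible reading of the diagram labels is sketched below: a probability map (AJPG or NAJPG) is thresholded into a binary 'object' mask, and a BAG map is then summarised separately over the object and background regions. The function name, the 0.5 threshold and the use of the mean as the summary statistic are all hypothetical; the slides do not state which threshold or statistics are used.

```python
from statistics import mean


def combine_map_with_bag(prob_map, bag_map, threshold=0.5):
    """Threshold prob_map into an 'object' mask and summarise bag_map per region.

    prob_map and bag_map are equally sized 2-D lists of floats.
    """
    obj, bkg = [], []
    for prob_row, bag_row in zip(prob_map, bag_map):
        for p, b in zip(prob_row, bag_row):
            # Pixels at or above the threshold count as 'object', the rest as 'background'.
            (obj if p >= threshold else bkg).append(b)
    # Assumption: the mean is a placeholder statistic chosen for illustration only.
    return {
        "object_mean": mean(obj) if obj else 0.0,
        "background_mean": mean(bkg) if bkg else 0.0,
    }
```

Under this reading, the same function would be called once with the AJPG map and once with the NAJPG map to obtain the two combined feature groups.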

Agreement-based retraining method


• Make the initial model adaptable
• Predict more accurately the samples on which the classifiers disagree
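The idea can be sketched as follows, under stated assumptions: two classifiers (CL1 and CL2) are trained on different feature groups, the test samples on which they agree are added to the training set with their predicted labels, and the retrained model re-predicts the disagreed samples. The slides do not spell out these steps, so the retraining rule, the choice to retrain CL1, and all function names are assumptions. A toy nearest-centroid learner stands in for the Random Forest so that the sketch runs with the standard library only.

```python
def fit_centroids(X, y):
    """Toy nearest-centroid learner (stand-in for the Random Forest used in the runs)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(model, X):
    """Assign each row the label of the closest centroid (squared Euclidean distance)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(model, key=lambda lab: dist(model[lab], row)) for row in X]


def agreement_retraining(X1_tr, X2_tr, y_tr, X1_te, X2_te):
    """X1_* and X2_* are two feature views of the same samples (for CL1 and CL2)."""
    cl1 = fit_centroids(X1_tr, y_tr)
    cl2 = fit_centroids(X2_tr, y_tr)
    p1, p2 = predict(cl1, X1_te), predict(cl2, X2_te)
    agreed = [i for i in range(len(p1)) if p1[i] == p2[i]]
    disagreed = [i for i in range(len(p1)) if p1[i] != p2[i]]
    # Assumption: agreed test samples join the training set with their predicted labels.
    X_new = list(X1_tr) + [X1_te[i] for i in agreed]
    y_new = list(y_tr) + [p1[i] for i in agreed]
    # Assumption: CL1 is the model that gets retrained.
    retrained = fit_centroids(X_new, y_new)
    final = list(p1)
    if disagreed:
        new_labels = predict(retrained, [X1_te[i] for i in disagreed])
        for i, lab in zip(disagreed, new_labels):
            final[i] = lab
    return final, agreed, disagreed
```

Because the retrained model has seen agreed samples from the test distribution, it can shift its decision boundary toward the new data before labelling the disagreed samples.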

Bagging


Training set

• N = 9
• Equal number of samples from each class
• Average result of numerous predictors
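The bagging bullets can be sketched as below. Reading N = 9 as the number of bags (and hence predictors) is an assumption, as is drawing each bag without replacement at the size of the smallest class. The `fit` argument is a hypothetical callable that takes a bag and returns a scoring function; any learner could be plugged in.

```python
import random


def balanced_bags(X, y, n_bags=9, seed=0):
    """Yield n_bags training subsets with the same number of samples per class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    # Assumption: every class is sampled down to the size of the smallest class.
    size = min(len(rows) for rows in by_class.values())
    for _ in range(n_bags):
        bag_X, bag_y = [], []
        for label, rows in by_class.items():
            for row in rng.sample(rows, size):
                bag_X.append(row)
                bag_y.append(label)
        yield bag_X, bag_y


def train_bagged(X, y, fit, n_bags=9, seed=0):
    """Train one predictor per bag; fit(bag_X, bag_y) must return a scoring function."""
    return [fit(bx, by) for bx, by in balanced_bags(X, y, n_bags, seed)]


def bagged_score(models, row):
    """Average the scores of all predictors for one sample."""
    return sum(m(row) for m in models) / len(models)
```

Averaging over several balanced bags reduces the variance of a single predictor and keeps the majority class from dominating training.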

Submitted Runs

Run     Learning   Features
RUN-1   SL         TB-base
RUN-2   SL         TB-base + FOR
RUN-3   SSL-AR     (TB-base + FOR) + UB-base
RUN-4   SL         TB-ext + UB-ext + FOR
RUN-5   SSL-AR     (TB-ext + FOR) + UB-ext


• RUN-1, RUN-2 and RUN-4: plain classification model
• RUN-3 and RUN-5: agreement-based retraining technique
• Random Forest classifier used for all models

SL: Supervised Learning
SSL-AR: Semi-Supervised Learning, Agreement-based Retraining
CL1, CL2: the two classifiers used in the SSL-AR runs, one per feature group

Results

Run     Features                     Recall   Precision   F-score
RUN-1   TB-base                      0.794    0.733       0.762
RUN-2   TB-base + FOR                0.749    0.994       0.854
RUN-3   (TB-base + FOR) + UB-base    0.922    0.736       0.819
RUN-4   TB-ext + UB-ext + FOR        0.798    0.860       0.828
RUN-5   (TB-ext + FOR) + UB-ext      0.969    0.861       0.911

A. RUN-5 achieved the best score (F-score 0.911)
B. The SSL-AR technique improves performance (RUN-5 vs RUN-4: F-score 0.911 vs 0.828)
C. RUN-2 is better than RUN-1 (0.854 vs 0.762) -> contribution of the FOR features
D. Comparison of RUN-3 and RUN-5 (0.819 vs 0.911) -> contribution of the ext features
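As a sanity check on the results table, the F-score column can be recomputed from the recall and precision columns, assuming it is the standard F1 (the harmonic mean of precision and recall). The recomputed values agree with the reported ones to within the rounding of the three-decimal inputs.

```python
def f_score(recall: float, precision: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision, reported F-score) as listed in the results table
results = {
    "RUN-1": (0.794, 0.733, 0.762),
    "RUN-2": (0.749, 0.994, 0.854),
    "RUN-3": (0.922, 0.736, 0.819),
    "RUN-4": (0.798, 0.860, 0.828),
    "RUN-5": (0.969, 0.861, 0.911),
}

for run, (r, p, reported) in results.items():
    print(f"{run}: recomputed {f_score(r, p):.3f}, reported {reported:.3f}")
```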

Examples

Fake example classified as real


Fake example classified as fake

Conclusions / Future Work

Features

• ext features perform better than base ones

• FOR features improve performance

Agreement-based retraining technique

• improves accuracy

• adapts to the new data

• requires a sufficient number of test samples before it can be applied

Future Ideas

• Experiment with other sets of features

• Perform feature selection

• Adapt the method to be applied with fewer samples


Questions


Thank you for your attention!
