thesis proposals ideas for research from social bots to beer · personalized proﬁle preferences...

Florian Daniel [email protected]

Thesis proposals Ideas for research from social bots to beer

February 25, 2019

Detection of harmful social bots

Identification of harmful communication patterns

Conversational screen readers

Domain-specific content extraction

(Social) Bot = algorithmically driven entity that behaves like a human in online communications

Talk with user

Redirect user

Write post

Comment post

Forward post

Like message

Follow user

Create user

Chat

Post

Endorse

Action

Participate

Abuse

Not regulated

Regulatedby law

Denigrate10

Be grossly offensive

3

Be indecent or obscene

4

Be threatening2

Disclose sensitive facts

1

Make false allegations

6

Deceive

Invade space

Spread misinformation

Clone profile

Mimic interest

AASlangBoostJuice

Geico

SethRich

Puma

Oreo

MSTay

DeathThreat

eCommerce, CustomerSvc

WiseShibe

DatingIvana

Trump, ColludingBots

PolarBotInstagres

InstaClone, JasonSlotkin6

SMSsex

Spam

Spam

Examples

F. Daniel, C. Cappiello, B. Benatallah. Bots Acting Like Humans: Understanding and Preventing Harm. IEEE Internet Computing, 2019, accepted for publication. https://ieeexplore.ieee.org/document/8611348

Empirical study shows: bots may cause harm to humans

What bots doWhen abuse happens

What else can they do?

What else can go wrong?

Account

Tweets

Detection of harmful social bots

Harm1 HumanHarm2 HarmN

How do we identify and classify bots according to the harm they may cause?

What has been done so far?

Not Safe For Work

Profile picture stolen from existing people (usually women)

Personalized profile preferences aimed at emulating actual users

Catchy profile description paired with unsafe URL

Tweets aim to redirect users to untrusted URLs

Lorenzo Cannone, Matteo Di Pierro. Detection and Classification of Harmful Bots in Human-Bot Interactions on Twitter. MSc Thesis, Politecnico di Milano, December 2018.

News-Spreader

High number of interactionsPropaganda hashtags and messages in description

Lots of (political) news retweeting

Propagandistic profile picture and tile


Spam-Bot

High number of tweets with high frequency

Corporate link in description paired with messages of job offers (product sales)

Repetitive tweets aimed at spamming URLs

Profile picture with corporate logo

Profile tile with catchy messages


Fake-Follower

Low number of content posting (usually null) Following as main interaction

Poor description (usually empty)

Optional (re)tweeting activity

Optional profile picture (usually missing)

Poor profile customization


Thesis ingredients

Creation of datasets

Feature selection and engineering

Multi-class ensemble classifier

Figure 7.3: Client - Server architecture

Inception neural network. Users with less than 10 tweets, or less than 10

tweets with embedded images, require a shorter timespan to be classified,

which is, in average up to 5 seconds. Figure 7.4 shows the repartition of

computational time, along with the processes involved. It is based on the

timespan needed to classify users with at least 100 tweets and with at least

10 tweets with images.

7.2.1 Engine

The engine of the web application is a Python 3 script. It performs all the

steps described in the pipeline execution section 6.2. The models, that have

been previously fitted with data, serialized and stored, are now loaded by

the Python script. They have to perform a single prediction at a time. In

addition to the models we built for the classification, the pre-trained convo-

lutional neural network for NSFW recognition has been introduced to the

pipeline, in order to infer on the media contents posted by the examined

user. The first step consists in calling the Twitter APIs to retrieve user’s

data and its most recent tweets, up to 100. The script, then, handles the

109

Web appCan we add more types of harm?How to train classes as users use BotBuster?

Which potential harms can we identifyinside the code of the bots?

Bot code repositories

GitHub

Identification of harmful communication patterns

Harmful code patterns

Next: patterns search engine + web site for users

12 A. Millimaggi and F. Daniel

Follow

Like

Tweet

Mention

Retweet

Talk to

Pause

Store

Indiscriminate follow

Whitelist-based follow

Blacklist-based follow

Phantom follow

Indiscriminate like

Whitelist-based like

Blacklist-based like

Mass like

Fixed-content tweet

AI-generated tweet

Trusted source tweet

Indiscriminate mention

Opt-in mention

Targeted mention

Whitelist-based mention

Blacklist-based mention

Indiscriminate retweet

Whitelist-based retweet

Blacklist-based retweet

Mass retweet

Indiscriminate talk

Fixed-content talk

AI-generated talk

Talk with opt-in

Targeted talk

Mimic human

Satisfy API contraints

Store persistently

Action Pattern Inva

de s

pace

Disc

lose

sens

itive

fact

s

Denig

rate

Be g

ross

ly of

fens

ive

Be in

dece

nt o

r obs

cene

Be th

reat

ening

Mak

e fa

lse a

llega

tions

Dece

iveSp

am

Spre

ad m

isinf

orm

ation

Mim

ic int

eres

tCl

one

profi

le

Enables Prevents Vulnerable to content abuse Vulnerable to trust abuse

Abuse

Fig. 3: Potential e↵ects of actions and patterns on the users in online communi-cations: patterns either enable, prevent or are vulnerable to abuses. For example,following an account with a denigrating or o↵ending username may perpetuateand endorse the denigration or o↵ense.

Andrea Millimaggi and Florian Daniel. On Social Bots Behaving Badly: Empirical Study of Code Patterns on GitHub. Submitted for publication to ICWE 2019, under review.

Python

Twitter

What else can we learn from the code of bots? Is it possible to trace back from messages to code?

Conversational screen readers

Amazon Alexa

Google Assistant

Can we extract conversational knowledge from webpages? What about “talking” to websites?

Domain-specific content extraction

How effectively can we extract recipes from free text? And how do we do it?

Typical data science steps

Domain understanding

Data collection

Manual inspection of data

Hypotheses formulation

Feature engineering, data labeling

Algorithm engineering (from AI/machine learning to statistics)

Validation and hypotheses verificationOnline tool

http://www.floriandaniel.it

Soft skills

Proposals

Template

Florian Daniel

thesis proposals ideas for research from social bots to beer · personalized proﬁle preferences...

Documents