Twitchcrement: Making Spam fun for everyone (Twitchcrement Demonstration), Keith Rose, Data Engineering, 17 Feb 2017

Upload: keith-rose

Post on 11-Apr-2017


DQM Update

Twitchcrement Demonstration

Twitchcrement: Making Spam fun for everyone

Keith Rose

Data Engineering

17 Feb 2017

Twitch.tv

Online streaming service focused on video games

81st in worldwide web traffic

4th in peak North American traffic

Driven by subscriptions and advertisements

Strength: User Interaction

Consequence: SPAM!

Characterize spam/spammers

Does spam show up in multiple channels? (Related networks)

Is it the same user spamming over and over again? (Probably a bot)

Are otherwise inactive users spamming? (Might be more responsive to user appeals)
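
The three questions above can be expressed as simple per-user statistics. The following is a minimal Python sketch, assuming spam events arrive as (user, channel, message) tuples and that a per-user count of all messages is available. The names `SpammerProfile` and `characterize_spammers` and both thresholds are hypothetical and do not come from the talk.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SpammerProfile:
    user: str
    channels: int             # distinct channels the user spammed in
    spam_messages: int        # total spam messages sent by the user
    multi_channel: bool       # spam shows up in multiple channels
    repeat_offender: bool     # same user spamming over and over (possibly a bot)
    otherwise_inactive: bool  # user sends little besides spam


def characterize_spammers(spam_events, total_messages,
                          repeat_threshold=5, inactive_ratio=0.8):
    """Summarize spam behaviour per user.

    spam_events: iterable of (user, channel, message) already flagged as spam.
    total_messages: dict mapping user -> count of all messages sent.
    Both thresholds are illustrative assumptions.
    """
    channels = defaultdict(set)
    counts = defaultdict(int)
    for user, channel, _message in spam_events:
        channels[user].add(channel)
        counts[user] += 1

    profiles = {}
    for user, n_spam in counts.items():
        # Guard against a total that is missing or smaller than the spam count.
        total = max(total_messages.get(user, n_spam), n_spam)
        profiles[user] = SpammerProfile(
            user=user,
            channels=len(channels[user]),
            spam_messages=n_spam,
            multi_channel=len(channels[user]) > 1,
            repeat_offender=n_spam >= repeat_threshold,
            otherwise_inactive=(n_spam / total) >= inactive_ratio,
        )
    return profiles
```

A user who spams in two channels and sends nothing else would come back with all three flags set, while a regular chatter caught once would have none.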


Data Flow

Scrape top 500 channels

Chatmessage (user, channel, message)

Count chat messages in 10s span, aggregate with spam messages, count unique users

Spammessage (channel, message, spam count, user list)

Stream to consumers (diagram labels: chatmessage, spammessage)
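
The counting step in this flow can be illustrated without any streaming framework. The sketch below is plain Python, not the Flink job from the talk. It assumes tumbling (non-overlapping) 10-second windows and a hypothetical `SPAM_THRESHOLD`; the slides give only the window length and the two record shapes.

```python
from collections import defaultdict
from typing import NamedTuple


class ChatMessage(NamedTuple):
    user: str
    channel: str
    message: str


class SpamMessage(NamedTuple):
    channel: str
    message: str
    spam_count: int
    user_list: tuple


WINDOW_SECONDS = 10  # window length stated on the slide
SPAM_THRESHOLD = 3   # hypothetical: repeats per window that count as spam


def aggregate_windows(timestamped_chats, threshold=SPAM_THRESHOLD):
    """Group (timestamp, ChatMessage) pairs into 10 s tumbling windows and
    emit one SpamMessage per (channel, message) seen at least `threshold`
    times within a window."""
    windows = defaultdict(lambda: defaultdict(list))
    for ts, chat in timestamped_chats:
        window_start = int(ts // WINDOW_SECONDS) * WINDOW_SECONDS
        windows[window_start][(chat.channel, chat.message)].append(chat.user)

    results = []
    for window_start in sorted(windows):
        for (channel, message), users in windows[window_start].items():
            if len(users) >= threshold:
                # spam_count counts messages; user_list holds unique users.
                results.append(SpamMessage(channel, message, len(users),
                                           tuple(sorted(set(users)))))
    return results
```

Keeping both the message count and the unique user list matters for the earlier question of whether one user or many are behind a burst.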


Design challenges

Design an algorithm that works as a dynamic trigger

Don't want to store a list of spam terms in memory

Would like to avoid continual database reads, even to a lightweight system (e.g. Redis)

Solution: Flink and stateful streaming

Flink maintains state for its messages and updates it as new messages come in on a topic

Design challenges

Solution: Propagate spam messages as a stream parallel to chat messages, loop them around, and perform a union

Spam messages keyed similarly to chat messages

Comparisons only occur when and where relevant

Message counts maintain state (SPAM/NOT SPAM)

Incoming spam topics maintain state (NEW/ACTIVE)
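
The loop-and-union idea can be sketched in a single process. The following Python is only a stand-in for the Flink job: a dictionary plays the role of keyed state, a deque plays the role of the unioned stream, and the threshold and all names (`KeyedSpamState`, `run`) are hypothetical. The exact state transitions are an assumption based on the labels on the slide.

```python
from collections import deque

SPAM_THRESHOLD = 3  # hypothetical trigger level, not from the slides


class KeyedSpamState:
    """Per-key state, standing in for Flink keyed state in this sketch."""

    def __init__(self):
        self.count = 0
        self.users = set()
        self.count_state = "NOT SPAM"  # SPAM / NOT SPAM
        self.topic_state = None        # None until a spam record arrives, then NEW / ACTIVE


def run(chat_stream, threshold=SPAM_THRESHOLD):
    """Process chat records, feeding detected spam back in as a second stream.

    chat_stream: iterable of (user, channel, message).
    Returns a list of (channel, message, spam_count, user_list) outputs.
    """
    state = {}  # key (channel, message) -> KeyedSpamState
    unioned = deque(("chat", rec) for rec in chat_stream)
    emitted = []
    while unioned:
        kind, rec = unioned.popleft()
        if kind == "chat":
            user, channel, message = rec
            key = (channel, message)
            s = state.setdefault(key, KeyedSpamState())
            s.count += 1
            s.users.add(user)
            if s.count_state == "NOT SPAM" and s.count >= threshold:
                s.count_state = "SPAM"
                # Loop around: the spam record joins the same unioned stream.
                unioned.appendleft(("spam", key))
            elif s.topic_state is not None:
                # A known spam topic is still receiving messages.
                s.topic_state = "ACTIVE"
                emitted.append((channel, message, s.count, sorted(s.users)))
        else:
            # Spam record, keyed the same way as the chat records.
            s = state[rec]
            s.topic_state = "NEW"
            emitted.append((rec[0], rec[1], s.count, sorted(s.users)))
    return emitted
```

Because both record types share a key, a chat message is only ever compared against the state for its own (channel, message) pair, so no global list of spam terms is consulted.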


Demonstration link

http://twitchcrement.us:8000/

Django application takes a channel as an argument and streams spam messages and unique participants live

About Me

Keith Rose

Ph.D. Experimental Physics (2012)

Built data pipeline for part of a major accelerator experiment

Extensive C++/Python experience

Gamer for 30+ years

Swiss National Dance Dance Revolution champion