beating the bookmakers: leveraging statistics and twitter microposts for predicting soccer results
DESCRIPTION
Poster presented at the Large-Scale Sports Analytics Workshop at KDD 2014 (conference on Knowledge Discovery and Datamining) Paper can be found at : http://www.large-scale-sports-analytics.org/Large-Scale-Sports-Analytics/Submissions.html or Research Gate. ---------------------------- ABSTRACT: In this paper, we investigate the feasibility of using collec-tive knowledge for predicting the winner of a soccer game. Specifically, we developed different methods that extract and aggregate the information contained in over 50 million Twitter microposts to predict the outcome of soccer games, considering methods that use the Twitter volume, the sen-timent towards teams and the score predictions made by Twitter users. Apart from collective knowledge-based pre-diction methods, we also implemented traditional statistical methods. Our results show that the combination of different types of methods using both statistical knowledge and large sources of collective knowledge can beat both expert and bookmaker predictions. Indeed, we were for instance able to realize a monetary profit of almost 30% when betting on soccer games of the second half of the English Premier League 2013-2014.TRANSCRIPT
http://multimedialab.elis.ugent.beGhent University – iMinds, ELIS Department/Multimedia Lab
Gaston Crommenlaan 8 bus 201B-9050 Ledeberg – Ghent, Belgium
Fréderic Godin, Jasper Zuallaert, Baptist Vandersmissen, Wesley De Neve and Rik Van de WalleWorkshop on Large-scale Sports Analytics, KDD 2014
Beating the Bookmakers: Leveraging Statistics and Twitter Microposts for Predicting Soccer Results
24/8/2014, New York, USA
Research Question
Approach
General evaluation of the predictions for 100 soccer gamesMethod Match day 20-24 Match day 29-34 OverallHome Team Wins 48% 54% 51%A BBC Soccer Expert 62% 58% 60%The Bookmakers 66% 68% 67%Twitter Volume Model 48% 52% 50%Sentiment Model 48% 56% 52%User Prediction Model 58% 68% 63%Statistical Model 58% 70% 64%Majority Voting 64% 64% 64%Late Fusion 62% 70% 66%Early Fusion 66% 70% 68%
1. Harvest input data 2. Construct feature vectors
3. Train individual prediction models
Statistical Analysis
Twitter Volume
Sentiment Analysis
User Prediction Analysis
Statistical model
User Prediction Model
Twitter Volume Model
Sentiment Model
Late Fusion
Majority Voting
Early Fusion
Evaluation
Can we use the wisdom of the crowd to predict the outcome of a soccer game correctly?
Monetary Profit
Conclusion
Method Money earnedThe Bookmakers €18.55Statistical Model €25.82Early Fusion €29.70
How much would we earn if we bet €1 on every game? (100 games)
30% profit!
By using the wisdom of the crowd we could beat the bookmakers in predicting the result of a soccer game.
4. Train combined prediction models
@frederic_godin, @jasperzuallaert, @BaptistV, @wmdeneve and @rvdwalle