TweetAlert - semantic analysis in social networks for citizen opinion mining
DESCRIPTION
Description of a configurable, real-time system for automatically recording, analyzing and visualizing information from user interactions on Twitter. The system is designed to give public bodies (government agencies) a powerful tool to rapidly and easily understand citizen behavior trends and citizens' opinions about city services, events, etc.; it may also be used as a primary alert system to improve the efficiency of emergency systems. The citizen is observed here as a proactive city sensor capable of generating huge amounts of very rich, high-level and valuable data through social media platforms, which, once properly processed, summarized and annotated, allow city officers to better understand citizen needs. The architecture and component blocks are described, and some key details of the design, implementation and application scenarios are discussed. Textalytics APIs are used for the semantic analysis of relevant tweets. Presentation by DAEDALUS, UPM and UC3M at PEGOV 2014, 2nd International Workshop on Personalization in eGovernment Services and Applications, Aalborg, Denmark, in conjunction with the 22nd Conference on User Modeling, Adaptation and Personalization - UMAP 2014.
TRANSCRIPT
TweetAlert: Semantic Analytics in Social Networks
for Citizen Opinion Mining in the City of the Future
Julio Villena-Román1,2, Adrián Luna-Cobos1,3, José Carlos González-Cristóbal3,1
1 DAEDALUS - Data, Decisions and Language, S.A.
2 Universidad Carlos III de Madrid
3 Universidad Politécnica de Madrid
PeGOV 2014 – 2nd Workshop on Personalization in eGovernment Services and Applications 11 July 2014, Aalborg, Denmark
Agenda
- Framework
- Citizen Sensor
- System
- Business cases
- Future work
Framework
- Ciudad 2020 aims to achieve significant improvements in the areas of energy efficiency, Internet of the Future, Internet of Things, human behaviour, environmental sustainability, and mobility and transport, in order to design the City of the Future: sustainable, efficient, smart.
- Spanish R&D project, INNPRONTA Programme, Center for Industrial Technological Development (CDTI), Ministry of Economy and Competitiveness
  - 2011-2014
  - 16.3 M€ budget
  - 5 multinational corporations, 4 SMEs, 8 PRIs
- Daedalus focuses on the automatic extraction of meaning from all types of multimedia content, using NLP technologies and data/text analytics to help our customers solve any challenge in these areas.
Citizen Sensor
Citizen 2020 = another city sensor
- mobility
- opinions in social media
- relationship with public administration
- collaborative sensing
- professional activities
- relationship with other people
- surveys
- leisure and free time
Citizen Sensor
- Innovative way to capture very descriptive, high-level, heterogeneous information, bringing high added value, especially when considering aggregations
- More complex and richer information than other sensors
  - “smells awful”, “there is a fire”, “I’m going to the sales”…
- Individual actions may show citizen trends
  - validate a bus ticket → route density
- Opinion/sentiments of the citizen about the city
  - follow social networks to assess the impact of new policies
- Collaborative sensing
  - using smartphones to get data (pollution, energy consumption) at low cost and with new possibilities
Our approach
What: Build a system able to capture, store and analyze user messages
Where: On Twitter
For whom: City administrators
What for: To help them rapidly and easily understand citizen behaviour trends and know citizens' opinions about city services, events, etc.
Why: To enable them to better understand citizen needs and generate hypotheses about urban behaviour models, in order to improve municipal management policies and bring them closer to the actual reality of the citizens
How: Using NLP technologies
When: In real time
Architecture
Information Repository
- Stores the high volume of data and provides advanced search functionality to exploit the information
- Based on Elasticsearch
  - open source, distributed, real-time search and analytics engine
  - complex search capabilities
  - scalable high-performance solution
http://www.elasticsearch.org
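As an illustration of the kind of search the repository supports, the sketch below builds an Elasticsearch query body as a plain Python dictionary. The field names `text` and `created_at` are hypothetical, and the bool-query syntax shown is that of recent Elasticsearch releases; the slides do not describe the actual index mapping or the queries used by TweetAlert.

```python
import json


def build_tweet_query(keywords, since, size=50):
    """Build a search body: full-text match on the tweet text,
    restricted to tweets created at or after `since`, newest first.

    `text` and `created_at` are assumed field names, for illustration only.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"text": keywords}}],
                "filter": [{"range": {"created_at": {"gte": since}}}],
            }
        },
        "sort": [{"created_at": {"order": "desc"}}],
    }


# "now-1h" is Elasticsearch date math for "one hour ago".
body = build_tweet_query("atasco gran vía", "now-1h")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The resulting dictionary could be sent as the JSON body of a search request against whatever index holds the annotated tweets.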
Gatherer
- Set of concurrent processes that query the Twitter APIs to collect tweets
- Search or Streaming API
- Filter by a list of user identifiers, a list of keywords to track (terms, hashtags) and/or a set of geographical bounding boxes
- Returns tweet text, author, location, embedded media
https://dev.twitter.com/docs/api/1.1
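The three filter criteria listed above can be sketched as a local predicate over already-collected tweets. This is only an illustration: the `Tweet` class, the function names and the Madrid bounding box are hypothetical, keyword matching is simplified to a case-insensitive substring test, and the criteria are combined with OR, as the "and/or" above suggests.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Bounding box as (west_lon, south_lat, east_lon, north_lat).
BoundingBox = Tuple[float, float, float, float]


@dataclass
class Tweet:
    """Minimal, hypothetical tweet record for this sketch."""
    text: str
    user_id: int
    lon: Optional[float] = None
    lat: Optional[float] = None


def in_box(tweet: Tweet, box: BoundingBox) -> bool:
    """True if the tweet is geotagged and falls inside the box."""
    if tweet.lon is None or tweet.lat is None:
        return False
    west, south, east, north = box
    return west <= tweet.lon <= east and south <= tweet.lat <= north


def matches(tweet: Tweet, user_ids=(), keywords=(), boxes=()) -> bool:
    """True if the tweet satisfies at least one configured criterion."""
    text = tweet.text.lower()
    return (
        tweet.user_id in user_ids
        or any(k.lower() in text for k in keywords)
        or any(in_box(tweet, b) for b in boxes)
    )


# Rough, illustrative bounding box around Madrid (not an official boundary).
MADRID = (-3.89, 40.31, -3.52, 40.64)

geo_tweet = Tweet("Atasco en la Gran Vía", user_id=1, lon=-3.70, lat=40.42)
plain_tweet = Tweet("hello", user_id=2)

print(matches(geo_tweet, boxes=[MADRID]))        # geotagged inside the box
print(matches(plain_tweet, keywords=["atasco"]))  # no criterion satisfied
```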
Inquirer
- Set of concurrent processes that annotate tweets using our Textalytics Core APIs
- Entities
- Concepts
- Hashtags
- Thematic area of the message (transport, economy, daily life…)
- Citizen Sensor model
  - Alert situations (road accidents, fires, street violence…)
  - Specific location of the user (building, means of transport...)
  - Events to which the text refers (cultural events, sports...)
- Sentiment polarity: P+, P, NEU, N, N+, NONE
- Irony and subjectivity
- User demographics: gender, age, type of tweet author
Topic Extraction API
Sentiment Analysis API
Text Classification API
User Demographics API
http://textalytics.com
Entities, concepts, hashtags
Advanced NLP to obtain POS tags, the syntactic tree and a semantic analysis of the text, which are then used to identify different types of significant elements
Text classification
State-of-the-art hybrid text classification model that combines statistical classification with rule-based filtering
Social Media Citizen Sensor
Topics
Alerts
Locations, events
Sentiment analysis
State-of-the-art lexicon-based model for sentiment analysis, using POS tags and the syntactic tree to detect negation and control the scope of modifiers + subjectivity classification + irony detection
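To make the lexicon-based idea concrete, here is a deliberately simplified sketch. It is not the Textalytics model: the lexicon, the intensifier weights and the label thresholds are invented for illustration, and negation scope is approximated with a fixed token window rather than the syntactic tree. The output labels reuse the polarity scale of the system (P+, P, NEU, N, N+, NONE).

```python
import re

# Toy resources: hypothetical values, not taken from any real lexicon.
LEXICON = {"good": 0.5, "great": 1.0, "bad": -0.5, "awful": -1.0}
NEGATORS = {"not", "no", "never"}
INTENSIFIERS = {"very": 1.5, "really": 1.5}
NEGATION_WINDOW = 3  # number of following tokens whose polarity is flipped


def to_label(mean):
    """Map a mean score to a polarity label (thresholds are assumptions)."""
    if mean >= 0.75:
        return "P+"
    if mean >= 0.25:
        return "P"
    if mean > -0.25:
        return "NEU"
    if mean > -0.75:
        return "N"
    return "N+"


def polarity(text):
    """Average the scores of lexicon words, applying negation and intensifiers."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores, negate_left, boost = [], 0, 1.0
    for tok in tokens:
        if tok in NEGATORS:
            negate_left = NEGATION_WINDOW
            continue
        if tok in INTENSIFIERS:
            boost = INTENSIFIERS[tok]  # applies to the next token only
        else:
            if tok in LEXICON:
                score = LEXICON[tok] * boost
                scores.append(-score if negate_left > 0 else score)
            boost = 1.0
        negate_left = max(0, negate_left - 1)
    if not scores:
        return "NONE"  # no polar word found
    return to_label(sum(scores) / len(scores))


for sample in ["smells awful", "not bad at all", "very good", "there is a fire"]:
    print(sample, "->", polarity(sample))
```

A window-based scope is a common shortcut; using the parse tree, as the slide describes, avoids flipping words that merely happen to follow a negator.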
User Demographics
Text classification based on an n-gram model to guess user type, gender and age from the user's login, name and profile description
Example
{
  "text": "el viento ha roto una rama y hay un atascazo increible en toda la gran vía...",
  "tag_list": [
    {"type": "sensor", "value": "011002 Ubicación - Exteriores - Vías públicas"},
    {"type": "sensor", "value": "070700 Alertas meteorológicas - Viento"},
    {"type": "sensor", "value": "080100 Incidencia - Congestión de tráfico"},
    {"type": "topic", "value": "06 medio ambiente, meteorología y energía"},
    {"type": "entity", "value": "Gran Vía"},
    {"type": "concept", "value": "viento"},
    {"type": "sentiment", "value": "N"},
    {"type": "subjectivity", "value": "OBJ"},
    {"type": "irony", "value": "NONIRONIC"},
    {"type": "user_type", "value": "PERSON"},
    {"type": "user_gender", "value": "FEMALE"},
    {"type": "user_age", "value": "25-35"}
  ]
}
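A consumer of this output might group the annotations by type before indexing or display. The sketch below does that for the example record; the helper names are hypothetical, and splitting each `sensor` value into a numeric code and a label is an assumption based only on the format visible in this one example.

```python
import json
from collections import defaultdict

RAW = """
{ "text": "el viento ha roto una rama y hay un atascazo increible en toda la gran vía...",
  "tag_list": [
    {"type": "sensor", "value": "011002 Ubicación - Exteriores - Vías públicas"},
    {"type": "sensor", "value": "070700 Alertas meteorológicas - Viento"},
    {"type": "sensor", "value": "080100 Incidencia - Congestión de tráfico"},
    {"type": "topic", "value": "06 medio ambiente, meteorología y energía"},
    {"type": "entity", "value": "Gran Vía"},
    {"type": "concept", "value": "viento"},
    {"type": "sentiment", "value": "N"},
    {"type": "subjectivity", "value": "OBJ"},
    {"type": "irony", "value": "NONIRONIC"},
    {"type": "user_type", "value": "PERSON"},
    {"type": "user_gender", "value": "FEMALE"},
    {"type": "user_age", "value": "25-35"}
  ]
}
"""


def tags_by_type(tweet):
    """Group tag values by their `type`, preserving the original order."""
    grouped = defaultdict(list)
    for tag in tweet["tag_list"]:
        grouped[tag["type"]].append(tag["value"])
    return dict(grouped)


def split_code(value):
    """Split '070700 Alertas ...' into ('070700', 'Alertas ...')."""
    code, _, label = value.partition(" ")
    return code, label


tweet = json.loads(RAW)
tags = tags_by_type(tweet)
sensor_codes = [split_code(v)[0] for v in tags["sensor"]]

print(tags["sentiment"], tags["entity"])
print(sensor_codes)
```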
Geolocation
Visualization
http://www.highcharts.com http://openlayers.org http://d3js.org
Ongoing business cases
- City console for a local administration to analyze in real time the behaviour and topics of interest of the citizens, with two components:
  - a private console, internal to the city services, for analytics
  - a public dashboard to engage citizens with their city, displaying attractive, summarized, non-confidential information at selected public locations (town hall, libraries, museums) or on an LED video wall in a busy downtown square
- Social alert detection system
  - For 112 emergency services, providing early detection of security-related issues
For short/mid term future
- Trending topics geolocation clustering
  - Analysis at neighbourhood level
health
traffic jam
air pollution
jellyfish
pollen
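One simple way to approximate geolocated trending topics is to bucket geotagged tweets into a regular grid and count topics per cell. The sketch below shows that idea only; it is grid binning rather than a true clustering algorithm, the slides do not say which method is planned, and the function names, cell size and sample coordinates are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def grid_cell(lat, lon, cell_deg=0.01):
    """Map a coordinate to an integer grid cell of `cell_deg` degrees per side."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def top_topics_per_cell(tweets, cell_deg=0.01, top_n=1):
    """Count topics per grid cell and keep the `top_n` most frequent ones.

    `tweets` is an iterable of (lat, lon, topic) tuples.
    """
    cells = defaultdict(Counter)
    for lat, lon, topic in tweets:
        cells[grid_cell(lat, lon, cell_deg)][topic] += 1
    return {cell: counts.most_common(top_n) for cell, counts in cells.items()}


# Invented sample points; topics reuse the labels shown on the slide.
sample = [
    (40.4203, -3.7051, "traffic jam"),
    (40.4207, -3.7058, "traffic jam"),
    (40.4209, -3.7055, "air pollution"),
    (40.4531, -3.6883, "pollen"),
]

trending = top_topics_per_cell(sample)
for cell, topics in trending.items():
    print(cell, topics)
```

A cell of 0.01 degrees is roughly one kilometre of latitude, which is in the range of a neighbourhood-level analysis; a density-based clustering method would adapt better to irregular neighbourhood shapes.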
For short/mid term future
- Analysis of city pace of life
For short/mid term future
- Mobility analysis
  - How, when, why people move through the city
  - Route identification (home → work → free time → home)
  - Route changes (due to weather)
For short/mid term future
- City reputation and brand personality
- Automated satisfaction surveys
This work has been supported by several Spanish R&D projects: Ciudad2020: Hacia un nuevo modelo de ciudad inteligente sostenible (INNPRONTA IPT-20111006), MA2VICMR: Improving the access, analysis and visibility of the multilingual and multimedia information in web for the Region of Madrid (S2009/TIC-1542) and MULTIMEDICA: Multilingual Information Extraction in Health domain and application to scientific and informative documents (TIN2010-20644-C03-01). The authors would like to thank all partners for their knowledge and support.