  • Statistical Analysis of users who chatting about beer on Twitter¹

    Análise de Usuários que Conversam sobre Cerveja no Twitter

    Rodrigo Otávio de Araújo Ribeiro / Tarsila Gomes Bello Tavares / Daniel de Oliveira Cohen

    PMKT - Revista Brasileira de Pesquisas de Marketing, Opinião e Mídia (ISSN 1983-9456 Impressa e ISSN 2317-0123 On-line), São Paulo, Brasil, V. 14, pp. 175-195, Abril, 2014 - www.revistapmkt.com.br

    Submission: Mar./28/2014 - Approval: Apr./14/2014

    Rodrigo Otávio de Araújo Ribeiro

    Doctor and Master in Production Engineering from Universidade Federal Fluminense - UFF. Bachelor's degree in Statistics from Escola Nacional de Ciências Estatísticas - ENCE/IBGE. He has extensive experience in statistical modeling of large databases and is currently Director of Marketing Intelligence at IBOPE DTM.

    Professional Address: IBOPE DTM - Rua Voluntários da Pátria - nº 89 - sala 803 - 22270-000 - Botafogo - Rio de Janeiro/RJ - Brasil.

    Tarsila Gomes Bello Tavares

    Bachelor's degree in Statistics from Escola Nacional de Ciências Estatísticas - ENCE/IBGE. She is currently Coordinator of Marketing Intelligence at IBOPE DTM.

    Daniel de Oliveira Cohen

    Bachelor's degree in Statistics from the State University of Campinas - UNICAMP. He performs statistical analyses such as regression, segmentation and social network analysis on data collected through quantitative surveys. He is currently a Statistician at IBOPE Inteligência.

    ¹ This was one of the papers presented at ABEP's 6th Brazilian Market, Opinion and Media Research Congress (held on March 24 and 25, 2014), winner of the "Alfredo Carmo" Prize, turned into an article by its author(s), submitted to PMKT, and approved for publication.


    ABSTRACT

    The identification of influential users in social media has generated great interest among companies in recent years. This work evaluates this influence by using graphs to understand the relational structure between users, established through their conversations on Twitter. Exploratory data analysis and text mining techniques were used to draw further conclusions about the subject. The "conversation environment" chosen was Brazilian beer, and the search terms were the major active brands in the domestic market. The evaluation considered a sample of 25 days between December 2013 and January 2014.

    KEYWORDS:

    Beer, Twitter, Social network analysis.

    RESUMO

    A identificação de usuários influentes nas mídias sociais é um assunto que tem gerado grande interesse por parte das empresas nos últimos anos. Este artigo visa avaliar esta influência por meio da utilização de grafos para entendimento da estrutura relacional existente entre os usuários, estabelecida por suas conversas no Twitter. A análise exploratória de dados e as técnicas de Mineração de Textos foram utilizadas para conclusões complementares acerca do assunto. O "ambiente de conversas" escolhido para avaliação foi o das cervejas brasileiras, sendo as buscas realizadas por palavras relacionadas às principais marcas atuantes no mercado nacional. A avaliação foi realizada considerando uma amostra de 25 dias entre os meses de dezembro de 2013 e janeiro de 2014.

    PALAVRAS-CHAVE: Cerveja, Twitter, Análise de Redes Sociais.


    1. INTRODUCTION

    This article aims to identify the most influential Twitter users who posted messages about beer. A sample of 25 days between December 2013 and January 2014 was used, considering only posts made in Portuguese, in Brazil.

    The content of the conversations was also assessed by applying text mining algorithms. In addition, a descriptive analysis was performed of the general behavior of Twitter users who talk about the subject, aimed at understanding the use and impact of different brands and the users' profile.

    The largest number of posts on the subject took place in the afternoon and evening. The distribution of the number of messages posted per user is strongly asymmetric: the majority posted only a single message during the period. By observing the peaks in the time series of the total number of messages posted, it was possible to evaluate the effect of holidays: the behavior of users during the New Year was very close to what was observed at Christmas.

    In the semantic evaluation of posts about beer, many topics (themes) within the main subject were identified. This kind of information can assist companies in targeting their strategies and in the ongoing monitoring of consumer behavior. It was noticed that when users post messages about beer, many mention where, with whom, or even when they will consume it. They often mention their preferred brands as well.

    The analysis of the influence of users in social networks supports the creation of various marketing strategies. The most influential users on a particular subject can be contacted by companies to publicize their brands, acting as links between companies and other end users.

    In this work, influence was measured by the number of connections that the user had during the study period. On Twitter, users can direct their messages to each other and pass on information posted by any of their connections (retweets).

    One way to assess the degree of influence of users is to count the number of connections that pass on their messages, or the number of connections to which they direct their posts. This paper evaluates both points of view.

    This study is structured as follows. After this introduction, the second part presents the main features of the techniques used in the analysis. The third gives a brief explanation of Twitter and its strong growth in Brazil. The fourth contextualizes the domestic beer market, its evolution, trends and key brands. The fifth details the analytical methodology applied and clarifies the questions answered by the study. The sixth describes the data used. The seventh shows the results of the data analysis. The eighth presents the main conclusions and, finally, the limitations and suggestions for new research.


    2. THEORETICAL BACKGROUND

    2.1 SOCIAL NETWORK ANALYSIS

    A social network is determined by a set of actors (or nodes) and pre-established relationships between them (WASSERMAN; FAUST, 1994). Actors can take many forms and represent different groups of individuals, such as users, companies and entities. Because of its great flexibility, Social Network Analysis (SNA) can be applied in almost any context.

    Generally, social networks are visually represented by "graphs". In these graphs, the actors or nodes are represented by dots, and the relationship between a pair of nodes is represented by an edge or connection.

    The connections can be directed when it is important to highlight which actor was the source of the relationship (WASSERMAN; FAUST, 1994). According to the authors, in addition to being visually displayed, a social network can be described by an n x n matrix, where n is the total number of nodes in the network.

    The existence of a relationship between the pair of nodes u and v, for example, is indicated by the value 1 in the corresponding cell of the matrix. The matrix is read as follows: the rows represent the nodes where the relationship starts (actors of origin) and the columns the nodes where it ends (actors of destination). Thus, an undirected social network always gives a symmetric matrix.
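
    The matrix reading described above can be sketched in a few lines of Python. This is only an illustration: the node names and edges are hypothetical, and the helper names `adjacency_matrix` and `is_symmetric` are not from the paper.

```python
def adjacency_matrix(nodes, edges, directed=True):
    """Build an n x n 0/1 matrix: rows are actors of origin, columns actors of destination."""
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for origin, destination in edges:
        matrix[index[origin]][index[destination]] = 1
        if not directed:
            # An undirected relationship is recorded in both directions.
            matrix[index[destination]][index[origin]] = 1
    return matrix


def is_symmetric(matrix):
    """Check whether the matrix equals its transpose."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


# Hypothetical three-user network: u -> v and v -> w.
nodes = ["u", "v", "w"]
edges = [("u", "v"), ("v", "w")]

directed = adjacency_matrix(nodes, edges, directed=True)
undirected = adjacency_matrix(nodes, edges, directed=False)
```

    With these inputs, only the undirected version passes `is_symmetric`, which is the property stated in the text.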

    To help understand the relationships between the actors, there are metrics that consider either the network as a whole or each node specifically. Among them are:

    Degree: number of edges connected to each node.

    PageRank: spectral measure of popularity defined for directed graphs with non-negative connection weights (PAGE et al., 1998), which can be given by:

    $$G = (1 - \alpha)\,\bar{D}^{-1} A + \frac{\alpha}{n}\, J_{n \times n}$$

    Where:

    n = total number of nodes in the network (users).

    $A \in \{-1, 0, +1\}^{n \times n}$ is the adjacency matrix, with values $A_{uv} = +1$ when user u marked user v as a friend and $A_{uv} = -1$ when user u marked user v as a foe. A is sparse, square and asymmetric.

    $\bar{D}$ = absolute diagonal matrix defined by $\bar{D}_{uu} = \sum_{v} |A_{uv}|$.

    $J_{n \times n}$ = matrix full of ones of the specified size, and $0 < \alpha < 1$ is the teleportation parameter.

    The matrix G is left-stochastic: each row sums to one (KUNEGIS; LOMMATZSCH; BAUCKHAGE, 2009).
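
    To make the two metrics concrete, the sketch below computes degrees and a PageRank vector by power iteration. It is a minimal illustration under stated assumptions, not the routine Gephi runs: it assumes a non-negative adjacency matrix (no "foe" entries), treats `alpha` as the teleportation parameter, and spreads the rank of nodes without outgoing edges evenly over all nodes. The three-user example is hypothetical.

```python
def degrees(adjacency):
    """Return (in_degree, out_degree) lists for a 0/1 adjacency matrix."""
    n = len(adjacency)
    out_deg = [sum(row) for row in adjacency]
    in_deg = [sum(adjacency[u][v] for u in range(n)) for v in range(n)]
    return in_deg, out_deg


def pagerank(adjacency, alpha=0.15, iterations=100):
    """Power iteration on a row-normalised matrix with teleportation alpha."""
    n = len(adjacency)
    out_deg = [sum(row) for row in adjacency]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Teleportation term: every node receives alpha / n.
        new_rank = [alpha / n] * n
        for u in range(n):
            if out_deg[u] == 0:
                # Assumption: a node with no outgoing edge shares its rank evenly.
                share = (1 - alpha) * rank[u] / n
                for v in range(n):
                    new_rank[v] += share
            else:
                for v in range(n):
                    if adjacency[u][v]:
                        new_rank[v] += (1 - alpha) * rank[u] * adjacency[u][v] / out_deg[u]
        rank = new_rank
    return rank


# Hypothetical network: users b and c both point to user a (e.g. by retweeting a).
adjacency = [
    [0, 0, 0],  # a
    [1, 0, 0],  # b -> a
    [1, 0, 0],  # c -> a
]
in_deg, out_deg = degrees(adjacency)
rank = pagerank(adjacency)
```

    In this toy network user a receives both connections, so it ends up with the highest in-degree and the highest PageRank.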

    The software used in this study was Gephi, a free tool that allows different forms of editing and customization of the final results. It can be used both to create graphs and to calculate network metrics.


    2.2 TEXT MINING

    Text Mining is the process of extracting useful information or knowledge from unstructured text documents (BARION; LAGO, 2008). In the context of this study, the technique is applied to identify patterns in the comments and opinions expressed by Twitter users about the Brazilian beer market.

    Information Retrieval or Information Extraction techniques are applied to a set of texts with the aim of giving it structure. Data mining techniques are then applied to these structured data to obtain relevant information, as shown in Figure 1.

    FIGURE 1 - Text Mining Process

    Source: BARION, E. C. N.; LAGO, D. Mineração de textos. Revista de Ciências Exatas e Tecnologia, 2008.

    The first step of Text Mining is the indexing process, which builds an index structure from the words of the text and makes it possible to search for documents by any of the terms they contain (SALTON; MCGILL, 1983). The steps of a Text Mining analysis include (BARION; LAGO, 2008):

    Lexical Analysis: converts a string into a sequence of words that are candidates for index terms.

    Removal of Stop-words: removes a set of words that appear frequently in texts but have no semantic value, such as prepositions, articles and conjunctions. This phase is extremely important, because it reduces the base to be indexed and facilitates mining.

    Stemming: removes all variations of words, leaving only the root of each; for example, the word "dreaming" is reduced to the root "dream".

    Selection of index terms: determines which words or stems will be used as index terms. These words are selected according to the weight assigned to them.

    Bag of Words - BOW: a matrix in which each different term in the collection of documents is indexed. From this indexing, each document can be represented by a 1 x n vector, where n is the total number of terms; each entry of this vector is the number of times the corresponding term appears in the document (SIVIC, 2009).

    Determination of weights: the BOW matrix is filled using metrics that weigh the frequency of occurrence of terms in each document and in the total collection (the set of all documents). The metric most commonly used for this purpose is tf-idf (term frequency - inverse document frequency).

    Correlation (similarity) between terms: based on the BOW matrix, one can calculate the Pearson correlation between different words, in order to measure how they are related, by the formula (HUANG, 2008):

    $$SIM_P(\vec{t_a}, \vec{t_b}) = \frac{m \sum_{t=1}^{m} w_{t,a}\, w_{t,b} - \sum_{t=1}^{m} w_{t,a} \sum_{t=1}^{m} w_{t,b}}{\sqrt{\left[ m \sum_{t=1}^{m} w_{t,a}^{2} - \left( \sum_{t=1}^{m} w_{t,a} \right)^{2} \right] \left[ m \sum_{t=1}^{m} w_{t,b}^{2} - \left( \sum_{t=1}^{m} w_{t,b} \right)^{2} \right]}}$$

    Where:

    $\vec{t_a}$ = vector created by the BOW

    m = total number of distinct terms in the entire collection of documents

    $w_{t,a}$ = weight (tf-idf) of term t in the document a.
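
    As an illustration of this correlation, the sketch below applies the computational form of Pearson's coefficient to two weight vectors of equal length m. The function name `pearson_similarity` and the sample vectors are hypothetical, and returning 0.0 for a constant vector is a convention chosen here, since the coefficient is undefined in that case.

```python
import math


def pearson_similarity(w_a, w_b):
    """Pearson correlation between two equally long weight vectors."""
    m = len(w_a)
    sum_a, sum_b = sum(w_a), sum(w_b)
    numerator = m * sum(x * y for x, y in zip(w_a, w_b)) - sum_a * sum_b
    spread_a = m * sum(x * x for x in w_a) - sum_a ** 2
    spread_b = m * sum(y * y for y in w_b) - sum_b ** 2
    denominator = math.sqrt(spread_a * spread_b)
    if denominator == 0:
        # A constant vector has no variance; 0.0 is returned by convention.
        return 0.0
    return numerator / denominator


# Hypothetical weight vectors.
same_direction = pearson_similarity([1, 2, 3], [2, 4, 6])
opposite_direction = pearson_similarity([1, 2, 3], [3, 2, 1])
constant_vector = pearson_similarity([1, 1, 1], [1, 2, 3])
```

    Vectors that rise together give a value near +1, vectors that move in opposite directions give a value near -1.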

    3. TWITTER

    Twitter was founded in 2006 by partners Jack Dorsey, Evan Williams, Biz Stone and Noah Glass, in San Francisco, USA. The service is a social network that allows users to post and read tweets, which are messages of up to 140 characters. It can be accessed directly from any internet browser or through mobile applications. In some countries, posts can also be made by SMS. The idea quickly spread and gained popularity throughout the world: in 2012, there were more than 500 million registered users, who posted 340 million tweets per day (LUNDEN, 2012). According to a site that reports web page traffic, Twitter was one of the ten most accessed pages in the world that year.

    Once registered, the user chooses an address on the site that is not already in use. From then on, the user is known to other users by that address preceded by the @ symbol.

    With the address set and the account registered, the user can "follow" or "be followed" by other users. This means that when a user posts something, the message appears directly for the users who follow that account. By default, tweets are publicly visible; however, users can restrict the viewing of their messages to their followers only. Another possibility is to repost a message that has already been posted by someone else, a practice known as retweeting and identified by the abbreviation RT. In this case, the goal is to spread the message (STRACHAN, 2009).

    When a post is about a specific topic, users can add hashtags to their messages: phrases or words that begin with the # symbol (STRACHAN, 2009). Likewise, it is possible to display only messages on that specific topic.

    When a word, phrase or expression is mentioned simultaneously by a large number of different users, it can be considered a trending topic (CHOWDHURY, 2009). Trending generally occurs when a group of users with a common interest joins efforts toward some goal, or when large and popular events are happening.

    4. BEER MARKET IN BRAZIL

    Currently, Brazil has a highly competitive beer market in which AmBev, Brasil Kirin and Grupo Petrópolis stand out. With a turnover of R$ 63 billion in 2012, the country is the third largest brewer and 26th in the international consumer ranking (ECONOMIC VALUE, 2013).

    The Brazilian beer market is concentrated in AmBev, Kirin Group and Grupo Petrópolis, which together hold 90% of the market. Another important indicator is per capita consumption in liters per year. In 2012, consumption reached 66.7 liters per capita (Chart 1).


    CHART 1

    Brazilian consumption of beer (liters per capita)

    Since 2008, beer consumption in Brazil has increased significantly (Chart 2).

    CHART 2

    Market share of the Brazilian beer market

    Due to the relevance of the beer market to the Brazilian economy and its continued growth, we decided to perform this study, monitoring the following brands: Antarctica, Baden Baden, Bohemia, Brahma, Budweiser, Eisenbahn, Itaipava, Nova Schin, Serramalte, Stella Artois and Skol, besides the word "cerveja" (beer) and two of its regional variations: "breja" and "cerva".

    5. ANALYTICAL METHODOLOGY

    The analytical methodology consists of three steps: the first is the analysis of the general behavior and profile of users who use Twitter to post about beer; the second is the semantic analysis, based on text mining techniques and multivariate statistics, to identify the most relevant topics of discussion within the beer environment; and the third is the evaluation of the influence of users.


    5.1 GENERAL BEHAVIOR AND PROFILE OF USERS WHEN THE SUBJECT IS BEER

    In the first analytical step, we assessed the main aggregated metrics of this work, grouped by time interval. The most important were the following:

    Number of posts: measures the total number of posts made per time interval.

    Number of distinct users: measures the total number of distinct users who posted per time interval.

    Average posts per user: calculated by dividing the number of posts by the number of distinct users.

    Percentage of posts: proportion of posts classified in each of the existing categories.

    The analysis of the total number of posts makes it possible to evaluate the total intensity of the impacts that occurred during the observed period. The average number of posts per user indicates, in general terms, how intensely the subject is being discussed among users: the closer the average is to 1, the lower the intensity. The percentage of posts measures the weight of each category of a given categorical variable in the total number of posts considered.
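
    The first three metrics can be sketched as a simple aggregation. The example below is only an illustration: it assumes each post is available as a (user, interval) pair, where the interval could be an hour of the day or a date, and the sample data and the name `metrics_by_interval` are hypothetical.

```python
from collections import defaultdict


def metrics_by_interval(posts):
    """Aggregate (user, interval) pairs into the per-interval metrics."""
    post_counts = defaultdict(int)
    users = defaultdict(set)
    for user, interval in posts:
        post_counts[interval] += 1
        users[interval].add(user)
    return {
        interval: {
            "posts": post_counts[interval],
            "distinct_users": len(users[interval]),
            # An average of exactly 1 means every user posted once in the interval.
            "avg_posts_per_user": post_counts[interval] / len(users[interval]),
        }
        for interval in sorted(post_counts)
    }


# Hypothetical posts: (user, hour of the day).
sample_posts = [("ana", 9), ("ana", 9), ("bia", 9), ("caio", 15)]
metrics = metrics_by_interval(sample_posts)
```

    In this sample, the 9 o'clock interval has three posts from two users, so its average is 1.5, while the 15 o'clock interval has an average of 1.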

    The evaluation of these metrics aims at understanding the general behavior of Twitter users with respect to beer. Peaks were identified by visualizing the time series of the number of posts. The same procedure was used to evaluate the time-of-day curve.

    Analyzing the average number of posts per user makes it possible to evaluate changes in the behavior of individual users. There are often large variations in this metric across time intervals, due to specific users who tend to post more about specific topics or events.

    Twitter allows the use of specific metrics that denote the different types of behavior of its users; among them is penetration (the proportion of posts with a certain characteristic). These characteristics were the following:

    RTs: tweets which passed on a message that had already been posted by another user.

    @: messages directed to another person.

    Http: tweets containing links to websites.

    Hashtag (#): group discussion on a specific topic.

    Other: tweets which do not contain any of the aforementioned characteristics.
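
    One possible way to detect these characteristics is with regular expressions over the tweet text. The sketch below is an assumption about how such a classification could be done, not the program used in the study. In particular, it assumes that a retweet starts with "RT" and that a mention inside a retweet is not counted as a directed message. The sample texts and user names are hypothetical.

```python
import re


def post_types(text):
    """Return the set of characteristics found in one tweet text."""
    types = set()
    if re.match(r"\s*RT\b", text):
        types.add("RT")
    elif re.search(r"@\w+", text):
        # Assumption: mentions only count as directed messages outside retweets.
        types.add("@")
    if re.search(r"https?://", text, flags=re.IGNORECASE):
        types.add("HTTP")
    if re.search(r"#\w+", text):
        types.add("HASHTAG")
    return types or {"OTHER"}


def penetration(texts):
    """Proportion of posts showing each characteristic."""
    totals = {}
    for text in texts:
        for post_type in post_types(text):
            totals[post_type] = totals.get(post_type, 0) + 1
    return {post_type: count / len(texts) for post_type, count in totals.items()}


# Hypothetical tweets.
sample_texts = [
    "RT @user_a: cerveja gelada",
    "@user_b vamos tomar uma breja? #sextou",
    "promo de cerva http://example.com",
    "cerveja gelada",
]
shares = penetration(sample_texts)
```

    Because a tweet can carry more than one characteristic, the proportions do not have to add up to 100%.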

    5.2 INFLUENCE ANALYSIS

    The analysis of influence is based on a network of conversations in which two distinct cases of influence were observed. The first case considers retweets; the second includes tweets sent directly to other users.

    In the first case, the influence of a user is measured by how many other users retweeted that user's posts. In the second, influence is measured by the number of directed conversations between users.


    In this article, we considered both cases and all types of connections between users. However, in practical terms, the effect of retweets always has more impact, because they occur more frequently.
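
    The two cases of influence can be illustrated by extracting edges from tweet texts. The sketch below is a simplified assumption of how a conversation network could be assembled, not the study's own extraction program: it assumes tweets arrive as (author, text) pairs, that retweets follow the "RT @user" pattern, and that user names are compared in lower case. All names and texts are hypothetical.

```python
import re
from collections import defaultdict

RETWEET = re.compile(r"^\s*RT @(\w+)")
MENTION = re.compile(r"@(\w+)")


def conversation_edges(tweets):
    """Split tweets into retweet edges and directed-message edges."""
    retweet_edges, mention_edges = set(), set()
    for author, text in tweets:
        match = RETWEET.match(text)
        if match:
            # Edge from the user who retweets to the original author.
            retweet_edges.add((author.lower(), match.group(1).lower()))
        else:
            for target in MENTION.findall(text):
                mention_edges.add((author.lower(), target.lower()))
    return retweet_edges, mention_edges


def count_by(edges, position):
    """Count distinct edges per user at the given position (0 = origin, 1 = destination)."""
    counts = defaultdict(int)
    for edge in edges:
        counts[edge[position]] += 1
    return dict(counts)


# Hypothetical conversation.
tweets = [
    ("bia", "RT @ana: cerveja no churrasco"),
    ("caio", "RT @ana: cerveja no churrasco"),
    ("bia", "RT @ana: outra cerveja"),
    ("ana", "@bia @caio bora tomar uma breja"),
]
retweet_edges, mention_edges = conversation_edges(tweets)
retweeted_by = count_by(retweet_edges, 1)   # users who retweeted each author
directed_to = count_by(mention_edges, 0)    # users each author wrote to
```

    Using sets keeps each pair of users as a single connection, so repeated retweets by the same user are counted once.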

    5.3 SEMANTIC ANALYSIS

    The correlation analysis between topics followed this process: first, the lexical analysis was performed. In a second step, stop-words (words without semantic value) were removed, and the stemming algorithm (extraction of word stems) was then run. After these steps, the BOW matrix was calculated. In this matrix, each column corresponds to a term and each row to a document (tweet).

    The weighting measure used was tf-idf (term frequency - inverse document frequency). Based on this matrix, it was possible to obtain the terms most associated with a particular word. This similarity was assessed by Pearson correlation.
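
    The pipeline described above can be sketched end to end on a toy collection. Everything in this example is an assumption made for illustration: the stop-word list is tiny, the `stem` function is a toy rule that only strips a final "s" (a real analysis would use a proper Portuguese stemmer), and the weight is the raw term count multiplied by log(N / df), one common tf-idf variant, since the paper does not state which variant was used.

```python
import math
import re

# Tiny illustrative stop-word list (not the one used in the study).
STOP_WORDS = {"a", "o", "os", "as", "de", "com", "e", "no", "na", "um", "uma"}


def tokenize(text):
    """Lexical analysis: lower-case the text and split it into words."""
    return re.findall(r"[a-záàâãéêíóôõúç]+", text.lower())


def stem(word):
    """Toy stemmer: strips a final 's' from longer words."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def preprocess(text):
    return [stem(word) for word in tokenize(text) if word not in STOP_WORDS]


def tfidf_matrix(documents):
    """Return (vocabulary, matrix) with one row per document and one column per term."""
    tokenized = [preprocess(doc) for doc in documents]
    vocabulary = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)
    doc_freq = {term: sum(1 for doc in tokenized if term in doc) for term in vocabulary}
    matrix = [
        [doc.count(term) * math.log(n_docs / doc_freq[term]) for term in vocabulary]
        for doc in tokenized
    ]
    return vocabulary, matrix


# Hypothetical tweets.
documents = [
    "Cerveja gelada com os amigos",
    "Cerveja no churrasco",
    "Churrasco e cervejas geladas",
]
vocabulary, matrix = tfidf_matrix(documents)
```

    In this toy collection the stem "cerveja" appears in every document, so its weight is zero everywhere, which shows how tf-idf discounts terms that do not help distinguish documents.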

    The classification of posts by theme was generated through a heuristic based on the selection of keywords defined by experts. The process for choosing the words is:

    Step 1: define the keywords that characterize a given theme.

    Step 2: develop an algorithm to count the keywords defined in step 1.

    Step 3: repeat steps 1 and 2 until the proportion of posts classified into some theme can be considered satisfactory.

    Generally, the minimum proportion of posts classified into themes for obtaining consistent results is 50 percent.
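
    The three steps above can be sketched as a keyword matcher plus a coverage check. This is a minimal illustration under assumptions: the themes, keywords and posts are hypothetical, matching is done on whole lower-cased words, and the 0.5 threshold is the 50 percent minimum mentioned in the text.

```python
def classify(post, themes):
    """Return the themes whose keywords appear in the post (steps 1 and 2)."""
    words = set(post.lower().split())
    return [theme for theme, keywords in themes.items() if words & keywords]


def coverage(posts, themes):
    """Proportion of posts classified into at least one theme (checked in step 3)."""
    classified = sum(1 for post in posts if classify(post, themes))
    return classified / len(posts)


# Hypothetical posts and two rounds of keyword definitions.
posts = [
    "cerveja no bar",
    "breja com amigos",
    "cerva na praia",
    "quero cerveja",
]
themes_round_1 = {"place": {"bar"}}
themes_round_2 = {"place": {"bar", "praia"}, "company": {"amigos"}}

coverage_round_1 = coverage(posts, themes_round_1)
coverage_round_2 = coverage(posts, themes_round_2)
satisfactory = coverage_round_2 >= 0.5
```

    The first round classifies only a quarter of the sample, so the keyword lists are extended and the count repeated, which mirrors the loop in step 3.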

    6. AVAILABLE INFORMATION

    The data were extracted through a program developed by IBOPE DTM that connects directly to the Twitter API.

    Based on the distribution of market share in the Brazilian beer market, it was decided to study only the brands of the most significant companies in the segment: AmBev, Kirin Group and Grupo Petrópolis. Therefore, we monitored the following brands: Antarctica, Baden Baden, Bohemia, Brahma, Budweiser, Eisenbahn, Itaipava, Nova Schin, Serramalte, Stella Artois and Skol, besides the word "cerveja" (beer) and two of its regional variations: "cerva" and "breja". The data refer to all messages posted during the study period containing the specified words.

    After 25 days of monitoring, 438,507 tweets (posts) related to beer were obtained. However, since the study focused on Brazil, only posts in Portuguese were considered, and the work started with 291,043 posts (66.4%).

    The monitoring period, from December 10, 2013 to January 3, 2014, was chosen based on the assumption that the end-of-year holidays, Christmas and New Year, influence the number of posts on Twitter about beer.


    7. DATA ANALYSIS

    The analysis followed the same structure as the methodology presented. First, the general distribution of the posts was evaluated.

    7.1 GENERAL BEHAVIOR AND PROFILE OF USERS WHEN THE SUBJECT IS BEER

    The holidays caused considerable variation in the daily number of tweets posted. Chart 3 shows that the days with peaks in the number of posts were December 24, 25 and 31, that is, Christmas Eve, Christmas and New Year's Eve, when there was an increase of over 4,000 posts relative to the average for the whole period.

    CHART 3

    Distribution of posts about beer in Twitter

    As for the timing of posts (Chart 4), there was a sharp increase from 10 o'clock in the morning, which stabilized between 15:00 and 21:00.

    CHART 4

    Number of posts per hour


    Analyzing the average number of posts per user, a peak in the average can be seen at 9 a.m. (Chart 5). However, this peak was not large enough to consider the behavior very different from that at other hours of the day.

    CHART 5

    Average posts per user by Total

    However, when the Christmas and New Year holidays were examined in detail, it was seen that at Christmas the highest average number of posts per user occurred between 8 and 9 o'clock, while at New Year it occurred from 23:00 onward, as shown in Chart 6.

    CHART 6

    Average posts per user by Christmas and New Year

    Chart 7 shows that 85.5% of the posts about beer do not mention a specific brand. Among the 14.5% of posts that do mention a brand, Skol has the highest share on Twitter with 4.3% of all posts, followed by Brahma and Itaipava with 3.5% and 2.2% of the posts, respectively.
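
    As a quick check of these figures, and assuming that the posts without a brand mention are those captured only by the three generic search words, whose shares in Table 1 are 74.0%, 6.7% and 4.8%, the split works out as:

    $$74.0\% + 6.7\% + 4.8\% = 85.5\%, \qquad 100\% - 85.5\% = 14.5\%$$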


    CHART 7

    Percentage of posts of search words

    As for the individual metrics (Table 1), the only brand with a significant share of posts with a hashtag (#) was Eisenbahn, with 17.9% of its posts. The brands with the highest shares of posts containing links to sites (http) were Baden Baden with 41.4%, Eisenbahn with 36.5%, Antarctica with 32.4% and Stella Artois with 29.4% of the posts. For directed messages (@), the Serramalte brand stood out with 32.5%, followed by Nova Schin with 23.4% of the posts. Finally, the share of retweets (RT) was highest for Budweiser, with 26.9%, and Antarctica, with 21.1% of the posts.

    TABLE 1

    Search Words Metrics

    | GROUP | SEARCH WORD | POSTS | % | RT | @ | HTTP | HASHTAG | OTHERS |
    |---|---|---|---|---|---|---|---|---|
    | | CERVEJA (beer) | 215,229 | 74.0% | 21.1% | 13.2% | 8.0% | 3.8% | 56.3% |
    | | BREJA | 19,537 | 6.7% | 11.0% | 16.9% | 6.1% | 3.5% | 64.4% |
    | | CERVA | 14,112 | 4.8% | 11.2% | 15.8% | 5.7% | 3.8% | 65.7% |
    | AMBEV | ANTARCTICA | 5,781 | 2.0% | 21.1% | 11.3% | 32.4% | 4.5% | 33.7% |
    | AMBEV | BOHEMIA | 1,943 | 0.7% | 6.2% | 13.2% | 17.4% | 9.0% | 59.1% |
    | AMBEV | BRAHMA | 10,234 | 3.5% | 15.7% | 13.9% | 16.1% | 6.7% | 52.0% |
    | AMBEV | BUDWEISER | 3,232 | 1.1% | 26.9% | 8.7% | 12.1% | 9.5% | 50.5% |
    | AMBEV | SERRAMALTE | 114 | 0.0% | 4.4% | 32.5% | 13.2% | 12.3% | 45.6% |
    | AMBEV | SKOL | 12,632 | 4.3% | 15.4% | 13.5% | 12.9% | 9.0% | 55.7% |
    | AMBEV | STELLA ARTOIS | 574 | 0.2% | 9.2% | 7.0% | 29.4% | 6.3% | 52.4% |
    | KIRIN | BADEN BADEN | 331 | 0.1% | 3.0% | 16.6% | 41.4% | 6.9% | 38.1% |
    | KIRIN | EISENBAHN | 263 | 0.1% | 6.1% | 11.8% | 36.5% | 17.9% | 44.9% |
    | KIRIN | NOVA SCHIN | 538 | 0.2% | 15.2% | 23.4% | 9.3% | 3.2% | 50.2% |
    | PETRÓPOLIS | ITAIPAVA | 6,523 | 2.2% | 10.2% | 14.2% | 19.5% | 3.7% | 54.9% |
    | | TOTAL | 291,043 | 100.0% | 19.1% | 13.6% | 9.2% | 4.2% | 56.5% |

    The columns RT, @, HTTP, HASHTAG and OTHERS show the % penetration by post type.

    Focusing on the users who commented about beer, a single user was responsible for 1,461 posts about beer (Table 2). However, this user has only 153 followers; in other words, only these 153 followers directly saw the information posted.


    TABLE 2

    Top ten users by number of posts

    RANK  USER (TWITTER)    POSTS  TWITTER FOLLOWERS
    1     BEEINNDEX         1,461                153
    2     SKOL_               443                107
    3     DJ_RICARDOO         348                512
    4     CERVEJA_DUFF        208                155
    5     RENATORDM           188                514
    6     ITAIPAVA_           185                415
    7     PREDRERO            162             28,107
    8     MARCIO_SKOL         157                171
    9     SERRALHERO          107              2,181
    10    GORONAH             105                769
    TOTAL                   3,364

    Following this line of reasoning, the singer Claudia Leitte sent only one post about beer, but this
    information was seen by her 7,869,106 followers (Table 3).

    TABLE 3

    Top ten users with the highest number of followers on Twitter

    RANK  USER (TWITTER)    POSTS  TWITTER FOLLOWERS
    1     CLAUDIALEITTE         1          7,869,106
    2     DANILOGENTILI         1          5,324,329
    3     SPIDERANDERSON        1          4,226,383
    4     CLARORONALDO          1          3,625,623
    5     PRETAGIL              2          3,450,693
    6     PORTALR7              5          2,835,528
    7     VEJA                  2          2,825,215
    8     BGAGLIASSO            1          2,735,376
    9     G1                   11          2,220,615
    10    SIGNOSFODAS           1          1,432,674
    TOTAL                      26

    To analyze user influence, the 20 users with the highest PageRank were ranked (Table 4). The user
    "frasesdebebada", with a PageRank of 0.0070 and 365 connections, had the greatest influence on
    the network. Table 4 also contains users who are among those with the largest numbers of
    followers who talked about beer on Twitter (Table 3): SpiderAnderson, SignosFodas and G1.
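A PageRank ranking of the kind reported in Table 4 can be sketched with a plain power iteration. This is an illustrative sketch, not the authors' implementation: the direction of the edges (from the user who retweets or mentions to the user being retweeted or mentioned), the damping factor of 0.85, the function name `pagerank` and the sample users are all assumptions.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over a list of (source, target) edges.

    Assumed convention: an edge points from the user who retweets or
    mentions to the user being retweeted or mentioned.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by users with no outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Invented toy network: three users retweet "hub", and "hub" retweets "user_a".
sample_edges = [
    ("user_a", "hub"),
    ("user_b", "hub"),
    ("user_c", "hub"),
    ("hub", "user_a"),
]
ranks = pagerank(sample_edges)
for user, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(user, round(score, 4))
```

In this toy network "hub" ranks first because it is retweeted by every other user, and "user_a" ranks second because it is retweeted by the most influential user. This mirrors the point made in the text: influence depends on who is connected to a user, not only on the number of followers.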


    TABLE 4

    Ranking of the 20 users with the highest PageRank

    RANK  USER              COMPANY?  DEGREE  PAGERANK
    1     FRASESDEBEBADA    NO           365    0.0070
    2     IRMA_ZULEIDE      NO            51    0.0033
    3     SPIDERANDERSON    NO            40    0.0029
    4     ASTROSLUMINOSOS   YES           73    0.0024
    5     SIGNOSFODAS       YES           48    0.0021
    6     FACTBR            YES          160    0.0020
    7     SOUVODKA          NO            60    0.0018
    8     SENTOAVARAEMVCS   NO            32    0.0017
    9     EDUTESTOSTERONA   NO            98    0.0016
    10    EVERTOUS          NO           108    0.0016
    11    PIADAMALIGNA      NO            19    0.0015
    12    G1                YES           89    0.0014
    13    RELAXEI           NO            96    0.0013
    14    MATEUSALIANO      NO            93    0.0012
    15    LUCASPFVR         NO            49    0.0011
    16    FELIXPASSIVA      NO            22    0.0010
    17    B1TCH_MALVADA     NO            15    0.0010
    18    EUZOERO           NO            24    0.0010
    19    PREDRERO          YES           25    0.0009
    20    UMVINGADOR        NO            12    0.0009

    Among the influential users are Anderson Silva, a famous MMA fighter with more than 4 million
    followers, and the site G1 (from the Globo organizations), which has only about 2 million
    followers. Claudia Leitte is the person with the most followers among those who chatted about
    beer, but her posts were retweeted by people who do not usually chat about beer, and for this
    reason her position in the ranking of influencers was not higher.

    Anderson Silva posted a message thanking his sponsors, among them a famous American beer brand,
    before his fateful fight: "... equipando já pra sair... Aproveito para agradecer a todos os meus
    parceiros: Budweiser, Burger King..." ("... gearing up to head out... I take the opportunity to
    thank all my partners: Budweiser, Burger King..."). The ability to determine the real influence of
    prominent users reinforces the importance of this type of analysis.

    Users who represent companies are also present among the influential: even when a tweet is not
    directed at a particular person, its information resonates with various groups within the network.

    Figure 2 shows the full network of users who talk about beer on Twitter. Figures 3, 4 and 5 show
    the networks of the users "frasesdebebada", "Irma_Zuleide" and "SpiderAnderson", respectively.
    The user "frasesdebebada", the most influential in the network in terms of number of connections,
    achieved the widest spread of messages, as shown by the intensity of the red color in Figure 3.


    FIGURE 2

    Full network

    FIGURE 3

    Network of user frasesdebebada


    FIGURE 4

    Network of user Irma_Zuleide

    FIGURE 5

    Network of user SpiderAnderson


    In the semantic analysis, no single word was strongly associated with more than one brand.
    Therefore, to facilitate visualization, only the ten words most associated with each brand were
    selected. The brands were chosen according to their volume of posts.

    Skol accounted for 4.3% of the posts related to beer, Brahma for 3.5%, Itaipava for 2.2% and
    Antarctica for 2.0%. The word most often associated with Skol was "redondo", with a Pearson
    correlation of 0.21, followed by the words "beats" and "vire" with a correlation of 0.16 (Chart 8).
    These words are related to the brand's marketing campaign.

    A differential of Brahma over the other brands was that the brand's poster girl, Claudia Leitte,
    appeared in 6th position among the most associated words, with a correlation of 0.14 (Chart 8).

    The Antarctica brand has a higher correlation with words related to a soft drink (guaraná) than
    with beer specifically (Chart 8). This happens because both products carry the same brand name.
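The word-brand associations in Chart 8 are Pearson correlations. One plausible way to obtain such a value, sketched below, is to correlate two 0/1 indicators across posts: whether the post mentions the brand and whether it contains the word. The binary encoding, the whitespace tokenization, the function names and the sample posts are assumptions; the article may have used a different weighting of terms.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # convention chosen here for a constant indicator
    return cov / sqrt(var_x * var_y)


def word_brand_correlation(posts, brand, word):
    """Correlate 'post mentions brand' with 'post contains word' (0/1 indicators)."""
    tokens = [set(post.lower().split()) for post in posts]
    has_brand = [1 if brand in t else 0 for t in tokens]
    has_word = [1 if word in t else 0 for t in tokens]
    return pearson(has_brand, has_word)


# Invented sample posts, not taken from the study's data.
sample_posts = [
    "skol redondo",
    "skol redondo gelada",
    "brahma gelada",
    "cerveja no bar",
]
print(word_brand_correlation(sample_posts, "skol", "redondo"))
print(word_brand_correlation(sample_posts, "skol", "gelada"))
```

With indicators of this kind, a word that appears in a small fraction of a brand's posts yields a modest coefficient even when it is the brand's most characteristic word, which is consistent with the values around 0.2 reported above.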

    CHART 8

    Top 10 words with highest correlation with brands

    A group of semantics experts selected the keywords and grouped them into the themes considered
    most important when it comes to beer. A total of 39.2% of the posts could not be classified; these
    posts generally mention beer but have no relevant content. Chart 9 shows the distribution of the
    roughly 60% of posts that were classified. Among them, there was a concentration of posts about
    the PLACE where the drink was consumed (19.8%), WITH WHOM the person was drinking (13.8%) and
    the BRANDS specifically (13.0%).
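The keyword-based theme classification described above can be illustrated as follows. The keyword lists are invented for the example (the experts' actual lists are not given in the article), and allowing a post to belong to more than one theme is also an assumption.

```python
import re
from collections import Counter

# Hypothetical keyword lists; the experts' actual lists are not published.
THEME_KEYWORDS = {
    "PLACE": {"bar", "boteco", "praia", "casa"},
    "WITH WHOM": {"amigos", "galera", "namorada", "pai"},
    "BRANDS": {"skol", "brahma", "itaipava", "antarctica"},
}


def classify(post):
    """Return every theme whose keyword list intersects the post's words."""
    tokens = set(re.findall(r"\w+", post.lower()))
    themes = [theme for theme, words in THEME_KEYWORDS.items() if tokens & words]
    return themes or ["UNCLASSIFIED"]


def theme_share(posts):
    """Percentage of posts assigned to each theme."""
    counts = Counter(theme for post in posts for theme in classify(post))
    return {theme: 100.0 * n / len(posts) for theme, n in counts.items()}


# Invented sample posts, not taken from the study's data.
sample_posts = [
    "tomando uma skol no bar",
    "cerveja com os amigos",
    "que calor",
]
for post in sample_posts:
    print(post, "->", classify(post))
print(theme_share(sample_posts))
```

Posts that match no keyword fall into the unclassified group, which corresponds to the 39.2% of posts reported without classification.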


    CHART 9

    Proportion of posts by theme

    When analyzing the most discussed themes for each brand studied (Chart 10), it was seen that,
    among the beers produced by AmBev, the Stella Artois brand had 35% of its posts on the theme
    COMMEMORATIVE DATES, unlike the other brands of the same company, whose posts concentrated on
    the theme PLACE. The beers of the Kirin Group were spread over three themes: Baden Baden with
    44% of its posts in COMMEMORATIVE DATES, Eisenbahn with 32% in PLACE and Nova Schin with 23%
    in WITH WHOM. The beer Itaipava, from Grupo Petrópolis, had 31% of its posts in PLACE against
    25% in WHEN.

    CHART 10

    Percentage of posts by theme by beer brands


    7. CONCLUSION

    This article showed that holidays have a great influence on the number of posts related to beer,
    with increases of more than 35% over the average number of daily posts.

    Over the course of the day, the number of posts generally increases in the afternoon and evening.
    The period with the most intense posting was between 11 p.m. and 2 a.m.

    The social network analysis efficiently identified influential users by the quantity and quality of
    their connections during the period. Several influencers were identified; standing out among them
    were Anderson Silva, who sent a tweet thanking his sponsors before the fight, and G1, a
    communications company.

    The semantic analysis of posts, used to identify themes related to beer, showed a concentration
    of posts about the PLACE where the drink was consumed, WITH WHOM it was consumed and WHICH
    brands were consumed.

    In the Kirin Group, each brand had a higher incidence in a different theme: Baden Baden had a
    large number of posts associated with COMMEMORATIVE DATES, Eisenbahn with PLACE and Nova
    Schin with WITH WHOM. Itaipava, from Grupo Petrópolis, had a higher incidence of posts with the
    theme PLACE.

    8. LIMITATIONS AND FUTURE WORK

    There was no sudden break in the time series of the total number of posts. This suggests that
    there was no disconnection from the Twitter API, so the information used in this study can be
    considered consistent and reliable.

    In future studies, it would be useful to perform the analysis over a longer historical period in
    order to understand whether the theme shows seasonal behavior.

    Another hypothesis under study is the difference between the hours of consumption and the hours
    of posting.

    9. REFERENCES

    ALEXA. Disponível em: . Acessado em: 6 jan. 2014.

    BAEZA-YATES, R.; RIBEIRO NETO, B. Modern information retrieval. Addison-Wesley, 1999.

    BARION, E. C. N.; LAGO, D. Mineração de textos. Revista de Ciências Exatas e Tecnologia, 2008.

    BAVELAS, Alex. A mathematical model for group structure. Applied Anthropology 7, 1948.

    CERVBRASIL. A Cerveja Contribuição econômica, s. d. Disponível em: . Acessado em: 6 jan. 2014.


    CERVEJAS DO MUNDO. História da cerveja, 2009. Disponível em: . Acessado em: 6 jan. 2014.

    CHOWDHURY, A. Top Twitter Trends of 2009. Twitter Blog, 15 dez. 2009. Disponível em: . Acessado em: 3 fev. 2014.

    CORRÊA, A. C. G. Recuperação de documentos baseada em Informação Semântica no Ambiente AMMO. UFSCAR, 2003.

    COUTINHO, C. A. T.; QUINTELLA, C. A. S.; PANZANI, M. M. História da Cerveja no Brasil. Portal São Francisco, s. d. Disponível em: . Acessado em: 6 jan. 2014.

    HUANG, A. Similarity Measures for Text Document Clustering. Department of Computer Science, The University of Waikato, 2008.

    KUNEGIS, J.; LOMMATZSCH, A.; BAUCKHAGE, C. The Slashdot zoo: mining a social network with negative edges. Track: Social Networks and Web 2.0 / Session: Interactions in Social Communities, 2009.

    LIU, Bing. Web Data Mining: exploring hyperlinks, contents, and usage data. Springer, 2011.

    LUNDEN, I. Analyst: Twitter Passed 500M Users In June 2012, 140M Of Them In US; Jakarta Biggest Tweeting City. TechCrunch, 30 jul. 2012. Disponível em: . Acessado em: 3 fev. 2014.

    MANNING, C. D.; RAGHAVAN, P.; SCHUTZE, H. Scoring, term weighting, and the vector space model: introduction to information retrieval. Stanford, 2008.

    MELO, I. D. et al. Análise de Redes Sociais. Universidade Federal da Paraíba, 2013.

    MOURA, M. F. Proposta de utilização de mineração de textos para seleção, classificação e qualificação de documentos. Campinas: Embrapa Informática Agropecuária, 2004.

    NÚCLEO EDUCACIONAL DE BROGLIE. Produção e consumo de cerveja no Brasil e no mundo, 2013. Disponível em: . Acessado em: 6 jan. 2014.

    PAGE, L. et al. The PageRank citation ranking: bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.

    QUEIROZ, D. F. Análise estrutural do setor cervejeiro. FAEC Departamento de Economia, 2010. Disponível em: . Acessado em: 6 jan. 2014.

    SALTON, G.; MCGILL, M. J. Introduction to modern information retrieval. Computer Science Series, USA: McGraw-Hill, 1983.


    SILVA, Anderson. (SpiderAnderson) tweets. Disponível em: . Acessado em: 15 abr. 2014.

    SANTOS, M. A. M. R. Extraindo regras de associação a partir de textos. PUC, 2002.

    SINDICATO NACIONAL DA INDÚSTRIA DA CERVEJA SINDICERV. Mercado, s. d. Disponível em: . Acessado em: 6 jan. 2014.

    SIVIC, J. Efficient visual search of videos cast as text retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, v. 31, n. 4, IEEE, 2009.

    STRACHAN, D. Twitter: how to set up your account. Telegraph, 19 fev. 2009. Disponível em: . Acessado em: 3 fev. 2014.

    TWITTER. Finding your Twitter short or long code. Disponível em: . Acessado em: 3 fev. 2014.

    VALOR ECONÔMICO. Ritmo de produção de cerveja cai em 2013. 2013. Disponível em: . Acessado em: 6 jan. 2014.

    WASSERMAN, Stanley; FAUST, Katherine. Social network analysis: methods and applications. Cambridge: Cambridge University Press, 1994.