[Big Data] Simple Exercise of Consumer Preference Analysis Based on Tweets for SAB Miller LATAM Brands
By Gustavo Pabón – May 2014
Our group mission:
To own and nurture local and
international brands that are the
first choice of the consumer
How can consumer preferences be measured? Twitter may help.
This presentation shows the results of a simple exercise in consumer preference analysis based on tweets posted from April 2, 2014 to April 26, 2014.
On a scale from 1 to 5*, the weighted average consumer preference for SAB Miller LATAM brands was 4.76.
* The scale is explained on the next slide.
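The slides do not say what the 4.76 average is weighted by. A minimal sketch, assuming each group's mean rating (for example, per country) is weighted by its number of tweets, which is the same as averaging every individual tweet's 1/3/5 rating. The group names and figures are hypothetical, not the exercise's data:

```python
# Hypothetical per-group data: name -> (number of tweets, mean rating on the 1-5 scale).
# These figures are invented for illustration and are NOT the exercise's results.
groups = {
    "country_a": (100, 4.8),
    "country_b": (50, 4.2),
    "country_c": (10, 3.0),
}


def weighted_average(groups):
    """Mean rating where each group's mean counts in proportion to its tweet count."""
    total_tweets = sum(count for count, _ in groups.values())
    weighted_sum = sum(count * rate for count, rate in groups.values())
    return weighted_sum / total_tweets


print(round(weighted_average(groups), 2))  # 4.5
```

Weighting by tweet count keeps a group with only a handful of tweets from pulling the overall figure as much as a group with hundreds.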
© SABMiller plc 2012
Internal Use / Confidential / Secret
Exercise Summary
Tweet sample: streaming date range and filter
Tweets were streamed from 04/02/2014 to 04/26/2014, with a gap from 04/12/2014 to 04/15/2014 caused by a technical issue in the streaming program. The stream was filtered using the keywords listed on the next slide.
Tweet sample size
The raw data contained 33,853 tweets.
The first data selection step filtered out tweets not related to SAB Miller LATAM / Global brands using a bag-of-words technique, reducing the sample to 20,044 tweets.
The second data selection step filtered out tweets not related to consumer preferences using crowdsourcing (Amazon Mechanical Turk), reducing the sample to 3,669 tweets.
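The deck does not show how the bag-of-words filter in the first selection step was built. A minimal sketch of one plausible reading: each tweet is reduced to a set of normalized tokens and kept if any token is a brand term. The vocabulary below is a small hypothetical subset of the keyword slides, and the function names are illustrative:

```python
import re
import unicodedata

# Hypothetical subset of the brand vocabulary; the actual filter used the full
# keyword lists on the streaming-filter slides.
BRAND_TERMS = {"aguila", "clubcolombia", "pilsener", "cusquena", "barena", "millerlite"}


def normalize(text):
    """Lower-case and strip accents so 'Cusqueña' and 'cusquena' match."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def bag_of_words(text):
    """Return the set of alphanumeric tokens; '@', '#', '_' and punctuation split tokens."""
    return set(re.findall(r"[^\W_]+", normalize(text)))


def mentions_brand(text, vocabulary=BRAND_TERMS):
    """Keep a tweet if its bag of words shares at least one term with the vocabulary."""
    return not bag_of_words(text).isdisjoint(vocabulary)


tweets = [
    "Nada como una #ClubColombia bien fría",
    "@Barena_Peru ¡qué buena!",
    "Una Cusqueña para el almuerzo",
    "Mañana llueve en Lima",
]
related = [t for t in tweets if mentions_brand(t)]
print(len(related))  # 3
```

Stripping accents lets one vocabulary entry cover spelling variants such as #cusqueñaperu and #cusquenaperu, which the keyword lists otherwise have to enumerate twice.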
Consumer preference scale from 1 to 5
Using crowdsourcing, each tweet was classified by 10 different people into one of three categories: 1 (negative preference), 3 (neutral) or 5 (positive preference). If more than 6 people agreed on a category, the tweet was assigned that category; otherwise, it was classified as neutral.
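The aggregation rule above can be sketched as follows, assuming each tweet comes with a list of 10 votes coded 1, 3 or 5; the function name is hypothetical:

```python
from collections import Counter

NEGATIVE, NEUTRAL, POSITIVE = 1, 3, 5


def consensus_label(votes, min_agreement=7):
    """Return the label given by at least `min_agreement` raters, otherwise NEUTRAL.

    "More than 6 of the 10 raters agree" is read here as 7 or more identical votes.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agreement else NEUTRAL


print(consensus_label([POSITIVE] * 8 + [NEUTRAL] * 2))   # 5
print(consensus_label([POSITIVE] * 6 + [NEGATIVE] * 4))  # 3
print(consensus_label([NEGATIVE] * 7 + [POSITIVE] * 3))  # 1
```

With 10 votes and a threshold of 7, at most one label can qualify, so checking only the most common label is enough.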
Why crowdsourcing?
It is very difficult for an automatic sentiment analysis program to handle tweets: they are often poorly written and full of slang and sarcasm. In addition, Spanish internet language has not been studied as extensively as English. Human raters typically agree 79% of the time*, while a program is at most 70% accurate. The first run of an automatic sentiment analysis program was able to classify just 151 tweets.
* Taken from Ogneva, M. "How Companies Can Use Sentiment Analysis to Improve Their Business". Retrieved 2012-12-13.
Keywords used for streaming filter (1 of 2)
Global brands
@Grolsch, #Grolsch, @Miller_Global, @MillerCoors, #MillerGenuineDraft, @Birra_Peroni, @peroniclub, #PeroniNastroAzzurro, #peroni, @Pilsner_Urquell, #PilsnerUrquell, @MillerLite, #millerlite, @MGD_Argentina, @MillerLiteAR, @MillerLiteCol, @millerlitehn, @MillerPanama, @MillerLitepa, @Miller_SLV.
Argentina’s brands
@CervezaIsenbeck, #isenbeck, @Warsteiner, @WarsteinerAR, #Warsteiner.
Colombia’s brands
@CervezaAguila, #AguilaLight, #aguila, #CervezaAguila, @clubcolombia, #ClubColombia, #clubcolombiadorada, #clubcolombiaroja, #clubcolombianegra, @cervezacostena, #CervezaCosteña, #CervezaCostena, @PilsenCerveza, #Pilsen, @CervezaPoker, #CervezaPoker, @pokerligera, #pokerligera, #colaypola, @ReddsColombia, #redds.
Ecuador’s brands
@ClubPremiumEc, #ClubPremium, #ClubPremiumRoja, #ClubPremiumNegra, @cervezaconquer, @PilsenerEcuador, @Miller_Ecuador
Keywords used for streaming filter (2 of 2)
El Salvador’s brands
@BarenaHN, #Barena, @Barena_Peru, @PilsenerSV, @PilsenerLiteSV, #PilsenerLite, #Pilsener, @RegiaSV, #regiaextra, @SupremaSV, #cervezasuprema, @GoldenSV, #cervezagolden
Honduras’ brands
@ImperialHN, #CervezaImperial, #imperialhn, @PortRoyalhn_com, @SalvaVidaHn, #salvavida, #salvavidahn, #cervezasalvavida, @BarenaHN, #Barena
Panamá’s brands
@cervezaaltlas, #cervezaatlas, @Cerveza_BALBOA, #cervezabalboa
Peru’s brands
@cerarequipena, #cervezaarequipena, #cervezaarequipeña, #arequipeña, #arequipena, #Barena, @Barena_Peru, @CristalPeru, #cervezacristal, @cusquenaperu, #cusqueñaperu, #cusquenaperu, #cervezacusqueña, #cervezacusquena, #cusqueñamalta, @Pilsen_Callao, #PilsenCallao, @Pilsen_Trujillo, #PilsenTrujillo, #CervezaSanJuan, @Backus_Ice, #BackusIce
Consumer preference rating consolidated by country
[Chart: number of tweets and average consumer preference rate by country]
Using SSD (sum of squared distances) over “Number of tweets” and “rate”, El Salvador had the highest rate: 4.83.
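The slides do not define exactly how the SSD ranking was computed. A minimal sketch of one possible reading, offered as an assumption: scale tweet count and average rate to [0, 1], then rank each group by its sum of squared distances to the ideal point (most tweets, highest rate). This keeps a group with a perfect rate but very few tweets from ranking first. All names and data are hypothetical:

```python
def min_max(values):
    """Scale values to [0, 1]; a constant column maps to 1.0."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 1.0 for v in values]


def rank_by_ssd(groups):
    """Rank groups by sum of squared distances to the ideal point (1, 1).

    groups: name -> (number of tweets, average rate). Smaller distance ranks first.
    """
    names = list(groups)
    counts = min_max([groups[name][0] for name in names])
    rates = min_max([groups[name][1] for name in names])
    ssd = {
        name: (1 - c) ** 2 + (1 - r) ** 2
        for name, c, r in zip(names, counts, rates)
    }
    return sorted(names, key=ssd.get)


# Hypothetical data: "b" has the top rate but very few tweets.
groups = {"a": (400, 4.7), "b": (5, 5.0), "c": (300, 4.2)}
print(rank_by_ssd(groups))  # ['a', 'b', 'c']
```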
Using SSD, Argentina had the lowest rate: 4.32.
Consumer preference rating consolidated by brand
Using SSD, Pilsener (El Salvador) had the highest rate: 4.85.
Using SSD, Barena had the lowest rate: 3.94.
Consumer preference rating consolidated by date and day of week
Consumer preference rating consolidated by date
Gap (04/12/2014 to 04/15/2014) due to a technical issue in the streaming program.
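Gaps like this one can be detected by listing the calendar days that have no captured tweets. A minimal sketch, assuming tweet dates are available as `datetime.date` values and that the gap covers April 12 to 15 inclusive; the function name and sample data are hypothetical:

```python
from datetime import date, timedelta


def missing_days(observed, start, end):
    """Return the calendar days in [start, end] on which no tweets were captured."""
    seen = set(observed)
    day, gaps = start, []
    while day <= end:
        if day not in seen:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


# Hypothetical capture dates: every day from April 2 to 26, 2014, except April 12-15.
observed = [date(2014, 4, d) for d in range(2, 27) if not 12 <= d <= 15]
gaps = missing_days(observed, date(2014, 4, 2), date(2014, 4, 26))
print([d.isoformat() for d in gaps])
# ['2014-04-12', '2014-04-13', '2014-04-14', '2014-04-15']
```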
Consumer preference rating consolidated by day of week
From Wednesday to Friday, both the number of tweets and the rate increase.
From Friday to Sunday, both the number of tweets and the rate decrease.
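The day-of-week view pairs a tweet count with a mean rate for each weekday. A minimal sketch, assuming each classified tweet is available as a (timestamp, rating) pair already converted to local time; the function name and data are hypothetical. It uses `datetime.weekday()` instead of `strftime("%A")` so the day names do not depend on the system locale:

```python
from collections import defaultdict
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def by_weekday(rated_tweets):
    """Group (timestamp, rating) pairs by weekday -> (number of tweets, mean rating)."""
    buckets = defaultdict(list)
    for timestamp, rating in rated_tweets:
        buckets[WEEKDAYS[timestamp.weekday()]].append(rating)
    return {day: (len(r), sum(r) / len(r)) for day, r in buckets.items()}


# Hypothetical rated tweets (2014-04-02 was a Wednesday).
sample = [
    (datetime(2014, 4, 2, 20, 15), 5),
    (datetime(2014, 4, 4, 21, 0), 5),
    (datetime(2014, 4, 4, 22, 30), 3),
    (datetime(2014, 4, 6, 15, 45), 1),
]
print(by_weekday(sample))
# {'Wednesday': (1, 5.0), 'Friday': (2, 4.0), 'Sunday': (1, 1.0)}
```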
Consumer preference rating consolidated by hour
Most tweets were posted between 10 am and 1 am, but on average the rates do not change much from hour to hour.
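Because the 10 am to 1 am window crosses midnight, a simple `start <= hour < end` test would miss it. A minimal sketch, assuming posting hours are already in each country's local time (the streamed timestamps would otherwise share a single time zone) and reading the window as 10:00 through 00:59; the function names and data are hypothetical:

```python
def in_window(hour, start=10, end=1):
    """True if `hour` (0-23) lies in [start, end); the window may wrap past midnight."""
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end


def share_in_window(hours, start=10, end=1):
    """Fraction of tweets whose posting hour falls inside the window."""
    return sum(in_window(h, start, end) for h in hours) / len(hours)


# Hypothetical posting hours (local time) for seven tweets.
hours = [9, 10, 15, 23, 0, 1, 3]
print(round(share_in_window(hours), 2))  # 0.57
```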
Word picture of the most common words in positive-preference tweets.
Word picture of the most common words in negative-preference tweets.
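A word picture (word cloud) sizes each word by how often it appears, so the underlying step is a word-frequency count per preference class. A minimal sketch of that count, assuming a small hypothetical Spanish stop-word list and invented example tweets; drawing the picture itself is left to whatever word-cloud tool is used:

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny Spanish stop-word list.
STOP_WORDS = {"de", "la", "el", "los", "las", "que", "y", "una", "un", "con", "en", "para", "por"}


def top_words(tweets, n=3):
    """Count letter-only words (lower-cased, stop words and very short words dropped)."""
    counts = Counter()
    for text in tweets:
        for word in re.findall(r"[^\W\d_]+", text.lower()):
            if word not in STOP_WORDS and len(word) > 2:
                counts[word] += 1
    return counts.most_common(n)


positive = [
    "Qué rica la cerveza bien fría",
    "Una cerveza fría con amigos",
    "Cerveza y fútbol con amigos",
]
print(top_words(positive))
# [('cerveza', 3), ('fría', 2), ('amigos', 2)]
```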
Conclusion
I conclude from this simple exercise that sentiment and opinion analysis of tweets related to SAB Miller LATAM brands can be an effective alternative tool for measuring consumer preferences.