Using Sociolinguistics to Enhance Customer Segmentation, Geomarketing & Diversity Analytics

Elian CARSENAT, NamSor 2016-01-28 “Using Sociolinguistics to Enhance Customer Segmentation, Geomarketing & Diversity Analytics”

Upload: digaai

Post on 19-Jan-2017

TRANSCRIPT

Page 1: Using Sociolinguistics to Enhance Customer Segmentation, Geomarketing & Diversity Analytics

Elian CARSENAT, NamSor 2016-01-28


Page 2

Founder Bio

Elian CARSENAT, a computer scientist trained at ENSIIE/INRIA, started his career at JP Morgan in Paris in 1997. He later worked as a consultant and managed business & IT projects in London, Paris, Moscow and Shanghai.

In 2012, Elian created NamSor, sociolinguistics software that mines 'Big Data' to better understand international flows of money, ideas and people. NamSor helps answer the perennial question all countries ask about their diasporas: who are they, where are they and what are they doing?

NamSor has been used to attract Foreign Direct Investment (FDI), to build up international collaboration within scientific communities, and to attract and facilitate diaspora investment in start-ups, among other use cases.

http://fr.linkedin.com/in/eliancarsenat/en

Page 3

NamSor sorts Names

Names are meaningful: we use sociolinguistics to extract their semantics and deliver actionable intelligence.

Names reflect cultural Identity

NamSor data mining software recognizes the linguistic or cultural origin of names in any alphabet / language, with fine grain and high accuracy.

Page 4

Gender Gap in Financing

Page 5

Gender Gap in Science

Page 6

Diasporas in Science (in collaboration with French INSERM)

Thomson Reuters Web of Science (6 countries, 250k scientists, 50k papers)

“Analysts uncovered amazing patterns in the way scientists’ names correlate with whom they publish, and who they cite in their papers - not just in case of a particular country, but globally. Tania Vichnevskaia of the French National Institute for Health (INSERM) presented the paper ‘Applying onomastics to scientometrics’ at IREG International symposium 2015 organised by University of Maribor and Shanghai Jiao Tong University. The paper was prepared jointly with NamSor, a private start-up company specialized in mapping international Diasporas.”

Source: WoS; Data Mining: INSERM with NamSor

Page 7

Scholar names in some Canadian Universities

Chinese, Indian, Iranian, Moroccan, Italian names

Canadian Science Policy Conference - CSPC2015

Page 8

USE CASE – BOSTON CITY GEODEMOGRAPHICS

Page 9

US Census vs NamSor geo-demographics

In July 2015, the US Government announced new rules that will require all cities and towns receiving federal housing funds to assess patterns of segregation.

The NY Times has published interactive maps of Boston geo-demographics, which we can compare with the information inferred by NamSor.

Page 11

Using Voters List

US Census: 1 pixel = 40 inhabitants

Voters List: 1 pixel = 1 voter

Source: Boston Voters List

Visualization: ESRI

Data Mining: NamSor+RapidMiner

Page 12

Breaking down ‘White’ and ‘Asian’ into Portuguese, Spanish, Italian, Indian, Pakistani, Chinese, ...

Source: Boston Voters List

Visualization: ESRI

Data Mining: NamSor+RapidMiner

Page 13

Who LIVES in New York?

Page 14

Who OWNS in Brooklyn, NY? Inferring origin in NYC ACRIS (Real Estate OpenData)

> Brooklyn zip codes

> NamSor origins

Page 15

Who OWNS in Brooklyn, NY? Inferring origin in NYC ACRIS (Real Estate OpenData)

Interesting ‘Little’ spots

ZIP 11209 : Irish

ZIP 11219 : Jewish

ZIP 11233 : African American

ZIP 11228 : Italian

ZIP 11208 : Hispanic

ZIP 11214 : Chinese

ZIP 11235 : Ukrainian/Russian

ZIP 11416 : Indian

ZIP 11222 : Polish
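One way such 'Little' spots could be surfaced is a location quotient: the share of an origin within a ZIP code divided by its share across all ZIP codes. The sketch below is an assumption about the approach, not the method used for the slide. The function names, the `zip_A`/`zip_B` labels and the toy records are all hypothetical, and no real ACRIS data is used.

```python
from collections import Counter

def location_quotients(records):
    """records: iterable of (zip_code, origin) pairs, one per owner name.

    Returns {(zip_code, origin): quotient}, where a quotient above 1 means
    the origin is over-represented in that ZIP relative to the whole set.
    """
    by_zip, by_origin, by_pair = Counter(), Counter(), Counter()
    for zip_code, origin in records:
        by_zip[zip_code] += 1
        by_origin[origin] += 1
        by_pair[(zip_code, origin)] += 1
    total = sum(by_zip.values())
    quotients = {}
    for (zip_code, origin), n in by_pair.items():
        local_share = n / by_zip[zip_code]        # share inside the ZIP
        overall_share = by_origin[origin] / total  # share across all ZIPs
        quotients[(zip_code, origin)] = local_share / overall_share
    return quotients

def top_origin_per_zip(records):
    """Pick, for each ZIP, the origin with the highest location quotient."""
    best = {}
    for (zip_code, origin), q in location_quotients(records).items():
        if zip_code not in best or q > best[zip_code][1]:
            best[zip_code] = (origin, q)
    return best

# Toy, made-up records (hypothetical labels, not real ZIP codes or owners).
toy_records = (
    [("zip_A", "Irish")] * 6 + [("zip_A", "Italian")] * 4
    + [("zip_B", "Irish")] * 2 + [("zip_B", "Italian")] * 8
)
best = top_origin_per_zip(toy_records)
for zip_code, (origin, q) in sorted(best.items()):
    print(zip_code, origin, round(q, 2))
```

A quotient is used rather than a raw count so that a large community that is spread evenly across the borough does not dominate every ZIP code.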

Page 16

USE CASE – ELECTIONS

Page 17

A Decision Tree from FLORIDA Voters List (open data)

//TODO : based on FLORIDA

Page 18

Segmenting ‘Asian’ voters would improve the model

Using NamSor Origin to infer: Indian, Vietnamese, Korean, Chinese, ...

Tree

ethno = (Chin: DEM {DEM=3311, REP=2636, IDP=48, INT=199, LPF=9, GRE=5, CPF=2, REF=2, AIP=0, PSL=0}

ethno = (Indi: DEM {DEM=12509, REP=4565, IDP=95, INT=432, LPF=32, GRE=10, CPF=0, REF=1, AIP=3, PSL=1}

ethno = (Indo: DEM {DEM=984, REP=718, IDP=9, INT=43, LPF=4, GRE=1, CPF=1, REF=0, AIP=0, PSL=0}

ethno = (Japa: DEM {DEM=488, REP=403, IDP=9, INT=34, LPF=2, GRE=1, CPF=1, REF=0, AIP=0, PSL=0}

ethno = (Kore: REP {DEM=1148, REP=1174, IDP=11, INT=75, LPF=3, GRE=0, CPF=0, REF=0, AIP=0, PSL=0}

ethno = (Mong: DEM {DEM=24, REP=22, IDP=0, INT=0, LPF=0, GRE=1, CPF=0, REF=0, AIP=0, PSL=0}

ethno = (Paki: DEM {DEM=4411, REP=843, IDP=25, INT=110, LPF=9, GRE=6, CPF=0, REF=0, AIP=0, PSL=0}

ethno = (Viet: REP {DEM=3798, REP=5780, IDP=65, INT=272, LPF=10, GRE=5, CPF=3, REF=3, AIP=2, PSL=0}

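Pakistani and Vietnamese voters did not lean the same way: the Paki leaf is majority DEM (4411 vs 843 REP), while the Viet leaf is majority REP (5780 vs 3798 DEM).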
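The leaf counts above can be checked with a few lines of Python. This is a minimal sketch that parses the leaf lines as printed on the slide (truncated labels such as `(Chin` are kept as they appear) and compares each leaf's label with the largest party count. The regular expression and helper names are my own and are not part of NamSor or RapidMiner.

```python
import re

# Leaf lines copied from the slide, truncated labels included.
LEAVES = """\
ethno = (Chin: DEM {DEM=3311, REP=2636, IDP=48, INT=199, LPF=9, GRE=5, CPF=2, REF=2, AIP=0, PSL=0}
ethno = (Indi: DEM {DEM=12509, REP=4565, IDP=95, INT=432, LPF=32, GRE=10, CPF=0, REF=1, AIP=3, PSL=1}
ethno = (Indo: DEM {DEM=984, REP=718, IDP=9, INT=43, LPF=4, GRE=1, CPF=1, REF=0, AIP=0, PSL=0}
ethno = (Japa: DEM {DEM=488, REP=403, IDP=9, INT=34, LPF=2, GRE=1, CPF=1, REF=0, AIP=0, PSL=0}
ethno = (Kore: REP {DEM=1148, REP=1174, IDP=11, INT=75, LPF=3, GRE=0, CPF=0, REF=0, AIP=0, PSL=0}
ethno = (Mong: DEM {DEM=24, REP=22, IDP=0, INT=0, LPF=0, GRE=1, CPF=0, REF=0, AIP=0, PSL=0}
ethno = (Paki: DEM {DEM=4411, REP=843, IDP=25, INT=110, LPF=9, GRE=6, CPF=0, REF=0, AIP=0, PSL=0}
ethno = (Viet: REP {DEM=3798, REP=5780, IDP=65, INT=272, LPF=10, GRE=5, CPF=3, REF=3, AIP=2, PSL=0}
"""

# Captures: truncated origin label, leaf label, comma-separated party counts.
LEAF_RE = re.compile(r"ethno = \((\w+): (\w+) \{([^}]*)\}")

def parse_leaves(text):
    """Return {origin_label: (leaf_label, {party: count})}."""
    leaves = {}
    for label, predicted, body in LEAF_RE.findall(text):
        counts = {}
        for item in body.split(", "):
            party, value = item.split("=")
            counts[party] = int(value)
        leaves[label] = (predicted, counts)
    return leaves

def dem_share(counts):
    """DEM count as a share of DEM + REP only (other parties ignored)."""
    return counts["DEM"] / (counts["DEM"] + counts["REP"])

leaves = parse_leaves(LEAVES)
for label, (predicted, counts) in leaves.items():
    majority = max(counts, key=counts.get)
    print(f"{label}: leaf={predicted} majority={majority} "
          f"DEM share of DEM+REP={dem_share(counts):.1%}")
```

Restricting the share to DEM and REP is a simplification for readability; the smaller parties remain in the parsed counts if a fuller breakdown is needed.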

Page 19

USE CASE – TRAVEL INTELLIGENCE

Page 20

“Incredible India” – 1.2 BN People

Indian onomastics by State/Union Territory

Names in LATIN, BENGALI, DEVANAGARI, GUJARATI, GURMUKHI, KANNADA, MALAYALAM, ORIYA, TAMIL, TELUGU, ARABIC

Page 21

ASSAM: Karbi Anglong, within-district inter-caste marriages?

Columns: clusterId and clusterParentId are output; the name and parent columns are input.

clusterId | clusterParentId | Firstname | LastName | parent is | FirstParent | LastParent
L25354:253 | L64958:2797 | A¡à[\¹ | ¹}[ššã | husband | ¤àl¡ü[W¡³ | [W¡}>à¹
L47490:1593 | L64958:2797 | ¤àK[¹ | [W¡}>๠| father | ¤àl¡ü[W¡³ | [W¡}>à¹
L28582:1209 | L47490:1593 | [³>à | Òü}[t¡šã | husband | ¤àK[¹ | [W¡}>à¹
L23643:669 | L35593:510 | ™åKƒ}à | [W¡}>๚ã | father | ¤ài¡[W¡³ | [W¡}>à¹
L23643:669 | L35593:510 | ³à>àÒü | [W¡}>๚ã | father | ¤ài¡[W¡³ | [W¡}>à¹
L47490:1593 | L35593:510 | W¡àì=¢ | [W¡}>๠| father | Wå¡ì¤ | [W¡}>à¹
L23643:669 | L35593:510 | A¡àì¹ | t¡àì¹ïšã | husband | Wå¡ì¤ | [W¡}>à¹
L35593:510 | L47490:1593 | [ƒ[ºš | [W¡}>๠| father | W¡àì¤ | [W¡}>à¹
L23643:669 | L47490:1593 | [¹>à | [W¡}>๚ã | father | W¡àì¤ | [W¡}>à¹

parent is husband

Count of serial; Column Labels

Row Labels L47490:1593 L116370:3612 L54332:2031 L184096:2297 L35593:510 L168871:1819 L135664:4438 L51271:837

L23643:669 6931 84 5099 15 2069 28 791 1924

L151415:3559 18 212 11 6446 19 1217 55 6

L28582:1209 5132 68 3565 10 1494 17 592 1323

L116370:3612 66 10283 38 72 40 321 137 29

L9839:442 2491 60 1851 9 774 11 321 660

L168871:1819 7 263 6 361 8 2730 24 4

L23642:141 1198 8 822 2 375 4 156 332

L25354:253 1181 12 932 375 7 100 323

L135664:4438 20 154 5 22 19 44 2212 3

L87032:1210 11 315 13 51 14 141 37 9

L90333:3644 3 204 2 31 190 5

L184096:2297 13 1735 3 84 11 1

L87031:697 4 136 4 12 3 137 4 5

L14495:131 614 10 432 167 4 68 163

L63724:1422 17 83 10 34 34 28 96 6

L98994:891 31 161 46 21 19 59 21 5

ASSAM: Karbi Anglong district

names clustered: L116370:3612, L23643:669, L151415:3559, L47490:1593, L28582:1209, L54332:2031, L184096:2297, L168871:1819, L9839:442, L135664:4438, L87032:1210, L90333:3644, L35593:510, L51271:837, L63724:1422, L154797:1168, L64959:1796, L23642:141, L87031:697, L6536:295, L98994:891, L25354:253, L64958:2797, L30570:2614, L90334:1189, L95839:287, L100510:366, L121390:783, Other

Source: Voters List; Data Mining: NamSor
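To read the 'parent is husband' pivot table, one option is to compute, for a row cluster that also appears as a column, the share of its pairs that fall in the same cluster. The sketch below uses only the three rows of the table that are complete and whose cluster ID also appears as a column. Totals cover the eight columns shown on the slide, so the shares are an approximation if the original pivot had more columns. The helper name is hypothetical.

```python
# Column cluster IDs, in the order shown in the pivot table header.
COLUMNS = [
    "L47490:1593", "L116370:3612", "L54332:2031", "L184096:2297",
    "L35593:510", "L168871:1819", "L135664:4438", "L51271:837",
]

# Complete rows copied from the slide (eight counts each).
ROWS = {
    "L116370:3612": [66, 10283, 38, 72, 40, 321, 137, 29],
    "L168871:1819": [7, 263, 6, 361, 8, 2730, 24, 4],
    "L135664:4438": [20, 154, 5, 22, 19, 44, 2212, 3],
}

def same_cluster_share(row_id, counts, columns=COLUMNS):
    """Share of a row's pairs whose column cluster equals the row cluster."""
    if len(counts) != len(columns):
        raise ValueError("row does not have one count per column")
    same = counts[columns.index(row_id)]
    return same / sum(counts)

shares = {row_id: same_cluster_share(row_id, counts)
          for row_id, counts in ROWS.items()}
for row_id, share in shares.items():
    print(f"{row_id}: {share:.1%} of pairs stay within the cluster")
```

Rows with blank cells are left out on purpose: the flattened transcript does not show which column a missing value belongs to.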

Page 22

Applications to an Airline’s customer intelligence

A global airline: ‘For 93% of our customers, when NamSor recognizes an Indian name, the client has travelled to India in the past.’

Finer-grained segmentation using names brings insights into diaspora travel patterns (visiting family and friends in the home country) and into these travellers' specific needs.
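The airline's 93% figure reads as a precision: among customers whose name is recognized as Indian, the share who have travelled to India. A minimal sketch of that calculation follows. The function name and the toy flags are hypothetical and are not the airline's data.

```python
def precision(flagged, travelled):
    """Share of flagged customers who also have the outcome.

    flagged:   booleans, True when the name was recognized as Indian.
    travelled: booleans, True when the customer has travelled to India.
    """
    if len(flagged) != len(travelled):
        raise ValueError("both lists must describe the same customers")
    n_flagged = sum(1 for f in flagged if f)
    if n_flagged == 0:
        raise ValueError("no customer was flagged")
    hits = sum(1 for f, t in zip(flagged, travelled) if f and t)
    return hits / n_flagged

# Toy example with made-up values: 4 flagged customers, 3 of them travelled.
toy_flagged = [True, True, True, True, False, False]
toy_travelled = [True, True, True, False, True, False]
toy_precision = precision(toy_flagged, toy_travelled)
print(f"{toy_precision:.0%}")
```

Precision says nothing about customers who travelled to India without being flagged; measuring that would need recall as well.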

Page 23

Using NamSor API

(1) Get an API Key

(2) Get NamSor RapidMiner Extension

Page 24

Thank you!

Elian CARSENAT

[email protected]

Phone: +33 6 52 77 99 07

http://www.namsor.com/

July 2013, Embassy of Lithuania in Paris