
Dealing with Complexity in Society: From Plurality of Data to Synthetic Indicators
September 17th and 18th, 2015

Ji-Ping Lin (corresponding author)
Research Center for Humanities and Social Sciences, Academia Sinica
128, Sec. 2, Academia Rd., Nankang, 115 Taipei, Taiwan
E-mail: [email protected]

Construction of Micro Genealogy as “Social DNA Sequencing” for the Study of Social Assimilation and Integration: An Approach Using High Performance Computing (HPC) Applied to Cumulated Micro Data Sets of Taiwan Indigenous Peoples


Opening Session

OUTLINE

1. Background

2. Objectives

3. Methods

3.1 Conceptualization of “Social DNA”

3.2 Definition of indicators

3.3 Data

3.4 Computing methodology

4. Results

5. Conclusion and Discussion

Ji-Ping Lin, Dealing with Complexity in Society

1. Background

Taiwan Indigenous Peoples (TIPs) are a branch of the Polynesian-Malaysian (or Austronesian) ethnic groups in genetic and linguistic terms. Their ancestors had been living in Taiwan for 8,000 years before the influx of Chinese immigrants in the 17th century.

Fig. 1: Geographic Distribution of the Austronesians
Source: http://www.taiwandna.com/AborigineAustronesia.jpg


Various aspects of TIPs, such as their linguistic systems and cultural infrastructure, do not support “traditional wisdoms”, e.g.:

1) Law of Geographic Proximity
2) Zipf’s Power Law

For example, the Formosan languages are a branch of the Austronesian linguistic system and are unrelated to the Tibetan-Han (Sino-Tibetan) linguistic system.

[Figure panels: Tibetan-Han languages; Austronesian languages]
Source: http://historum.com/asian-history/77013-sino-tibetan-languages.html
Source: https://en.wikipedia.org/wiki/Austronesian_languages


A Look at TIPs (Taiwan Indigenous Peoples)

[Photos: Sakizaya, Rukai, Seediq, Amis, Tsou, Kavalan, Bunun, Paiwan]
Source: http://thetaiwanphotographer.com/


[Photos: Dao (Yami), Saisiyat, Truku, Puyuma, Thao]
Source: http://thetaiwanphotographer.com/


[Map: TIPs population spatial distribution]


Based on the author’s previous co-authored studies on the internal migration of TIPs, TIPs are characterized by four features in terms of population distribution and migration:

1. a geographically segregated population distribution;
2. high mobility, with mostly rural-to-urban migration;
3. the periphery of metropolitan areas serving as the main destination choice for TIPs rural-to-urban migrants;
4. a weak ability of TIPs migrants to make onward migration: once repeat migration occurs, most choose return migration (see Map 1).

Source: 2000 Taiwan Population Census

2. Objectives

To propose formal definitions of, and to compute, two group-level synthetic indicators that quantify the level of social integration and social identity. The indicators are based on (1) an individual inter- and intra-ethnic marriage indicator, derived from each person’s marriage match by ethnicity, and (2) individual patriarchy and matriarchy indicators, derived from the constructed micro genealogy.

3. Methods

3.1 Conceptualization of “Social DNA”

Constructing a genealogy as “Social DNA Sequencing”

[Figure: example of a piece of “Social DNA”]

3.2 Definition of indicators

Intra-ethnic Marriage Pattern as Indicator of Integration

Definition of IEMI & EMSI:

1. Individual level: for any given pair of spouses, the inter-/intra-ethnic marriage indicator IEMI = 1 if they share the same ethnicity (an intra-ethnic marriage); otherwise IEMI = 0.
2. Group level: for a given ethnicity, the ethnic marriage similarity indicator (EMSI) is defined as the mean of all IEMIs over all spouses in the given ethnicity.
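The two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout `(person_id, own_ethnicity, spouse_ethnicity)`, the sample values, and the function names `iemi` and `emsi` are hypothetical and do not come from the study's data or its Object Pascal source code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (person_id, own_ethnicity, spouse_ethnicity).
couples = [
    ("a1", "Amis", "Amis"),
    ("a2", "Amis", "Han"),
    ("b1", "Bunun", "Bunun"),
    ("b2", "Bunun", "Bunun"),
]

def iemi(own_ethnicity, spouse_ethnicity):
    """Individual level: 1 if both spouses share the same ethnicity, else 0."""
    return 1 if own_ethnicity == spouse_ethnicity else 0

def emsi(records):
    """Group level: mean IEMI over all married persons of each ethnicity."""
    by_group = defaultdict(list)
    for _person_id, own, spouse in records:
        by_group[own].append(iemi(own, spouse))
    return {group: mean(values) for group, values in by_group.items()}

emsi_by_group = emsi(couples)
print(emsi_by_group)
```

With these made-up records, one of the two Amis marriages is intra-ethnic and both Bunun marriages are, so the group values differ even though the individual rule is the same.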


Identity of Matriarchy & Patriarchy

1. Definition of MI (Matriarchy Indicator): a child’s MI = 1 if the child’s registered ethnicity is the same as the mother’s ethnicity; otherwise MI = 0.
2. Definition of PI (Patriarchy Indicator): a child’s PI = 1 if the child’s registered ethnicity is the same as the father’s ethnicity; otherwise PI = 0.
3. Definition of group MI & PI: for a given ethnic group, the ethnic MI and the ethnic PI are defined as the means of all individual MIs and PIs in that group, respectively.
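A minimal sketch of the MI and PI definitions, again in Python. The dictionary fields, the sample children, and the choice to group children by their own registered ethnicity are assumptions made for illustration; the slides do not specify the record layout.

```python
from statistics import mean

# Hypothetical child records: registered ethnicity plus each parent's ethnicity.
children = [
    {"ethnicity": "Amis", "mother": "Amis", "father": "Han"},
    {"ethnicity": "Amis", "mother": "Amis", "father": "Amis"},
    {"ethnicity": "Amis", "mother": "Paiwan", "father": "Amis"},
    {"ethnicity": "Tsou", "mother": "Han", "father": "Tsou"},
]

def mi(child):
    """Matriarchy Indicator: 1 if registered ethnicity equals the mother's."""
    return 1 if child["ethnicity"] == child["mother"] else 0

def pi(child):
    """Patriarchy Indicator: 1 if registered ethnicity equals the father's."""
    return 1 if child["ethnicity"] == child["father"] else 0

def group_indicators(records):
    """Group MI and PI: means of the individual indicators per ethnic group."""
    groups = {}
    for child in records:
        groups.setdefault(child["ethnicity"], []).append((mi(child), pi(child)))
    return {
        group: {
            "MI": mean(m for m, _ in pairs),
            "PI": mean(p for _, p in pairs),
        }
        for group, pairs in groups.items()
    }

indicators = group_indicators(children)
print(indicators)
```

Because a child whose parents share one ethnicity scores 1 on both indicators, a group's MI and PI need not sum to 1.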


3.3 Data

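Household registration data

Household ID, time of data creation, PIN, name, spouse name, parents’ names, education, age, marital status, address, birth place, mobility…

[Figure: Indigenous Peoples basic living development database: integration and dynamic structure of population and administrative data. At each time point t1, t2, t3, the indigenous household registration data (populations P_t1, P_t2, P_t3) are linked to administrative data on education, labor and employment, income, housing, health insurance, and medical care. Between time points, people leave through death or international out-migration and enter through international in-migration.]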


3.4 Computing methodology

Genealogy: Construction of Micro Kinship & Friendship Network

Recursive build-up process (see source code)

[Figure: recursively expanding network. Each person is linked to spouse, father, mother, and friendship ties; each linked person is in turn expanded to his or her own spouse, father, and mother, including the spouse’s father and the spouse’s mother.]
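The recursive build-up of kinship links can be illustrated with a short Python sketch. It is not the author's Object Pascal source code: the `registry` layout, the person IDs, and the function `build_genealogy` are hypothetical, and friendship ties are left out to keep the example small.

```python
# Hypothetical registry: person ID -> IDs of spouse, father, mother (None if unknown).
registry = {
    "p1": {"spouse": "p2", "father": "p3", "mother": "p4"},
    "p2": {"spouse": "p1", "father": "p5", "mother": None},
    "p3": {"spouse": "p4", "father": None, "mother": None},
    "p4": {"spouse": "p3", "father": None, "mother": None},
    "p5": {"spouse": None, "father": None, "mother": None},
}

def build_genealogy(person_id, registry, seen=None):
    """Recursively collect (person, relation, relative) links reachable from person_id."""
    if seen is None:
        seen = set()
    # Stop at people already expanded (avoids spouse <-> spouse loops) or not on file.
    if person_id in seen or person_id not in registry:
        return []
    seen.add(person_id)
    links = []
    for relation in ("spouse", "father", "mother"):
        relative = registry[person_id][relation]
        if relative is not None:
            links.append((person_id, relation, relative))
            links.extend(build_genealogy(relative, registry, seen))
    return links

links = build_genealogy("p1", registry)
for link in links:
    print(link)
```

The `seen` set is what keeps the recursion finite, since spouses point at each other. For data on the scale described in the slides, an explicit stack or queue would avoid Python's recursion limit.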


Record matching process

1. Load the pooled data bank into memory (n = 6.2 million).
2. Sort the pooled data bank by gender, family name, given name, and ethnicity, and construct an index file.
3. Load the master data into memory (n = 530 thousand).
4. Retrieve given and family names from the master data to quickly match micro genealogy information via the index file (n = 530 thousand).

[Figure: flow diagram with stages labeled (1) to (7)]
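The sort-then-index idea in the steps above can be sketched as a sorted in-memory key list searched by bisection. This is a hedged illustration in Python, not the study's implementation: the tuple layout, the sample names and IDs, and the function `match` are hypothetical, and the slides do not say which search method the index file uses.

```python
from bisect import bisect_left

# Hypothetical pooled records: (gender, family_name, given_name, ethnicity, person_id).
pooled = [
    ("F", "Lin", "Mei", "Amis", "id-003"),
    ("M", "Chen", "Wei", "Bunun", "id-001"),
    ("F", "Lin", "Mei", "Amis", "id-007"),
    ("M", "Kao", "Jun", "Tsou", "id-002"),
]

# Step 2: sort by the matching key and keep the sorted keys as an in-memory index.
pooled.sort(key=lambda rec: rec[:4])
index_keys = [rec[:4] for rec in pooled]

def match(key):
    """Return the person IDs of all pooled records whose key equals `key`."""
    pos = bisect_left(index_keys, key)  # first position where key could occur
    ids = []
    while pos < len(pooled) and index_keys[pos] == key:
        ids.append(pooled[pos][4])
        pos += 1
    return ids

# Steps 3-4: look up each master record through the index.
master = [("F", "Lin", "Mei", "Amis"), ("M", "Wu", "An", "Atayal")]
matches = {key: match(key) for key in master}
print(matches)
```

Each lookup costs a logarithmic search plus the number of hits, which is the reason to sort once up front. A name-based key can return several candidates, as the two "Lin Mei" records show, so a real matching step would need further fields to disambiguate.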


Manipulation of digital hardware: in-memory computing is used for the genealogy computation, with three hardware components overclocked: (1) CPUs, (2) I/O bus, and (3) DRAM.

[Figure panels: DRAM overclocking; I/O bus overclocking; CPU overclocking]


Why in-memory computing? To achieve the high performance computing needed to decode the complexity of intertwined micro social networks.


Digital hardware infrastructure for the study: Supermicro A7X9-7f motherboard + dual Intel Xeon E5-2680v2 + 256GB ECC DDR3 1600 + 80GB RAM disk + RAID0 of 2*1TB SATA3 Micron Crucial MX200 SSD + nVidia GTX Titan…


OS & programming language: Windows 8 Enterprise (x64) + 64-bit Object Pascal, coded in RAD Studio Delphi

[Image: Philae on comet 67P]

4. Results

4.1 Intra-ethnic Marriage Pattern as Indicator of Integration


Ethnic Marriage Similarity Indicator (EMSI) by ethnicity

Ethnicity             Mean   Var
All TIPs              0.15   0.13
Amis                  0.24   0.18
Atayal                0.11   0.10
Paiwan                0.12   0.11
Bunun                 0.16   0.13
Rukai                 0.07   0.06
Puyuma                0.02   0.02
Tsou                  0.09   0.08
Saysiyat              0.10   0.09
Tao                   0.04   0.04
Thao                  0.03   0.03
Kavalan               0.00   0.00
Taroko                0.06   0.05
Sakizaya              0.00   0.00
Sediq                 0.01   0.01
Undocumented Indi.    0.04   0.04

The ethnic marriage similarity indicator (EMSI) is defined as the mean of all IEMIs over all spouses in the given ethnicity; integration declines as EMSI increases.

In terms of the extent of integration of TIPs with the Taiwan population system, ethnic population size is negatively associated with integration.
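Because each IEMI is a 0/1 value, the variance of a group is determined by its mean p: the population variance of a binary indicator is p(1 - p). The Python check below applies that identity to the reported EMSI figures. The numbers are copied from the table above; the names `emsi_table` and `bernoulli_variance` are illustrative, and small gaps are expected because the slide reports values rounded to two decimals.

```python
# (Mean, Var) pairs copied from the EMSI table in this section.
emsi_table = {
    "All TIPs": (0.15, 0.13),
    "Amis": (0.24, 0.18),
    "Atayal": (0.11, 0.10),
    "Paiwan": (0.12, 0.11),
    "Bunun": (0.16, 0.13),
    "Rukai": (0.07, 0.06),
    "Puyuma": (0.02, 0.02),
    "Tsou": (0.09, 0.08),
    "Saysiyat": (0.10, 0.09),
    "Tao": (0.04, 0.04),
    "Thao": (0.03, 0.03),
    "Kavalan": (0.00, 0.00),
    "Taroko": (0.06, 0.05),
    "Sakizaya": (0.00, 0.00),
    "Sediq": (0.01, 0.01),
    "Undocumented Indi.": (0.04, 0.04),
}

def bernoulli_variance(p):
    """Population variance of a 0/1 indicator whose mean is p."""
    return p * (1.0 - p)

# Largest absolute difference between p(1 - p) and the reported variance.
max_gap = max(abs(bernoulli_variance(m) - v) for m, v in emsi_table.values())

for group, (m, v) in emsi_table.items():
    print(f"{group:20s} mean={m:.2f} reported var={v:.2f} p(1-p)={bernoulli_variance(m):.4f}")
print(f"largest gap: {max_gap:.4f}")
```

The reported variances agree with p(1 - p) to within rounding, so the Var column carries no information beyond the Mean column.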


4.2 Identity of Matriarchy & Patriarchy

Two panels by ethnicity, titled Ethnic Patriarchy Indicator and Ethnic Matriarchy Indicator

              Left panel      Right panel
Ethnicity     Mean   Var      Mean   Var
All TIPs      0.59   0.24     0.55   0.25
Amis          0.61   0.24     0.57   0.25
Atayal        0.56   0.25     0.53   0.25
Paiwan        0.56   0.25     0.53   0.25
Bunun         0.65   0.23     0.56   0.25
Rukai         0.59   0.24     0.55   0.25
Puyuma        0.45   0.25     0.57   0.24
Tsou          0.64   0.23     0.48   0.25
Saysiyat      0.54   0.25     0.62   0.24
Tao           0.41   0.24     0.66   0.22
Thao          0.38   0.24     0.62   0.24
Kavalan       0.53   0.26     0.47   0.26
Taroko        0.53   0.25     0.51   0.25
Sakizaya      1.00   0.00     0.00   0.00
Sediq         0.71   0.21     0.31   0.22

In terms of ethnic identity, TIPs’ matriarchy identity tends to slightly outweigh patriarchy identity.

This finding fits general wisdom and TIPs’ cultural tradition.

5. Conclusion and Discussion

1. With the gradual availability of massive micro data and the decline of digital hardware costs, computation for social complexity, such as the construction of micro genealogy, becomes feasible.
2. But computing issues remain challenging, and the total cost of computing is still high in terms of time.
3. The emerging data science, which integrates multi-disciplinary skills and knowledge (“hacking skills”, “advanced math/stat”, and “domain knowledge”), is crucial to overcoming such constraints.