cctst - Diversity and Team Science

Diversity and Team Science Scott E Page Santa Fe Institute University of Michigan

  • Diversity and Team Science

    Scott E Page Santa Fe Institute

    University of Michigan

  • Outline

  • A Great Big Complex World

    Diversity and Prediction Diversity Prediction Theorem

    Model Diversity Theorem

    Categorical Diversity Theorem

    Big Science

  • A Great Big (Complex) World

  • http://library.sc.edu

  • http://crissp.eu

  • Authors Per Paper: Computer Science

  • Identity and Cognitive Diversity

  • Authors Per Paper: Computer Science

    Courtesy: Jacob Foster

  • “Few if any funding programs support research on the effectiveness of science teams and larger groups.”

  • Conclusions 1&2

    #1: Team composition matters.

    Diversity is critical

    #2: Team composition is systematic.

  • Prediction

  • “All Models Are Wrong”

    - George Box

  • “Hence, our truth is the intersection of independent lies.”

    - Richard Levins

  • 1197

  • 1197

    Average = 1,197

  • Diversity Prediction Theorem

  • Diversity Prediction Theorem

    Crowd Error = Average Error - Diversity

  • Crowd Error = Average Error - Diversity

  • Crowd Error = Average Error – Diversity

    0.6 = 2,956.0 – 2,955.4
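
The identity on this slide can be checked numerically. The following is a minimal sketch, assuming squared error as the error measure and the simple mean as the crowd's prediction; the guesses and the true value are hypothetical and are not the numbers from the slides.

```python
def diversity_prediction(predictions, truth):
    """Return (crowd_error, average_error, diversity) for a set of predictions."""
    n = len(predictions)
    crowd = sum(predictions) / n  # the crowd's prediction is the mean guess
    crowd_error = (crowd - truth) ** 2
    # Average of each individual's squared error.
    average_error = sum((p - truth) ** 2 for p in predictions) / n
    # Diversity: average squared distance of each guess from the crowd's guess.
    diversity = sum((p - crowd) ** 2 for p in predictions) / n
    return crowd_error, average_error, diversity


# Hypothetical guesses and true value, for illustration only.
guesses = [10, 20, 45]
truth = 30

crowd_error, average_error, diversity = diversity_prediction(guesses, truth)
print(crowd_error, average_error, diversity)
print(abs(crowd_error - (average_error - diversity)) < 1e-9)
```

Because the identity is algebraic, it holds for any list of guesses, which is why the last line prints `True` regardless of the numbers chosen.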

  • Actual Number 103

    Class Prediction 70.04

    Diversity Prediction Theorem Crowd = Average – Diversity

    1086 = 3456 – 2370

    Number Finnish Athletes

  • Actual Number 103

    Class Prediction 65.6

    Diversity Prediction Theorem Crowd = Average – Diversity

    1401 = 7039 – 5638

    Number Finnish Athletes

  • Actual Number 319

    Class Prediction 318.58

    Diversity Prediction Theorem Crowd = Average – Diversity

    0.1 = 202,323.6 – 202,323.5

    Latvian Cars Per 1000

  • Actual Number 319

    Class Prediction 369

    Diversity Prediction Theorem Crowd = Average – Diversity

    2542 = 89,743 – 87,201

    Latvian Cars Per 1000

  • Christina Romer

  • Model Diversity Theorem

  • N Models

    Distribution across models:

    $(P_1, P_2, P_3, \dots, P_n)$

    $P_j$ = probability someone uses model $j$

    F = probability two individuals have the same model

  • Probability two people use the same model.

    $\mathrm{Match}(P) = P_1^2 + P_2^2 + \dots + P_n^2$

  • Diversity Index: Δ = 1/Match(P)

    Δ = Effective number of parties (political science)

    firms (economics)

    species (ecology)

  • Diversity Index

    $\Delta = (p_1^2 + p_2^2 + p_3^2)^{-1}$

    $(1/3)^2 + (1/3)^2 + (1/3)^2 = 3(1/9) = 1/3$

    $\Delta = 3$

    $(1/2)^2 + (1/4)^2 + (1/4)^2 = 6/16$

    $\Delta = 16/6 \approx 2.67$
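
The two worked examples on this slide can be reproduced with a few lines of code. This is a minimal sketch; the function names `match_probability` and `diversity_index` are illustrative and do not come from the slides.

```python
def match_probability(shares):
    """Probability that two randomly drawn people use the same model."""
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(p * p for p in shares)


def diversity_index(shares):
    """Diversity index: the reciprocal of the match probability."""
    return 1.0 / match_probability(shares)


# The two distributions worked through on the slide.
print(round(diversity_index([1/3, 1/3, 1/3]), 2))
print(round(diversity_index([1/2, 1/4, 1/4]), 2))
```

An even split across three models gives an index of 3, while the uneven split gives roughly 2.67, so the index behaves like an "effective number" of models.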

  • Claim: N independent predictive models with a diversity index of Δ and an average variance of V have an expected squared error equal to:

    V/Δ

    Economo, Hong, and Page (2015)

    Model Diversity Theorem
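
One way to build intuition for the V/Δ claim is a small simulation. The sketch below rests on assumptions that the slide does not spell out: every model is unbiased with the same variance V, model errors are independent and normally distributed, and the crowd's prediction weights model j by its share P_j. Under those assumptions the crowd's expected squared error is V times the sum of squared shares, which equals V/Δ.

```python
import random


def simulate_crowd_error(shares, variance, trials=100_000, seed=0):
    """Monte Carlo estimate of the crowd's expected squared error."""
    rng = random.Random(seed)
    sd = variance ** 0.5
    total = 0.0
    for _ in range(trials):
        # Each model's error is an independent draw with mean 0 and variance V;
        # the crowd's error is the share-weighted sum of the model errors.
        crowd_error = sum(p * rng.gauss(0.0, sd) for p in shares)
        total += crowd_error ** 2
    return total / trials


# Hypothetical inputs: the (1/2, 1/4, 1/4) distribution and V = 4.
shares = [0.5, 0.25, 0.25]
variance = 4.0

delta = 1.0 / sum(p * p for p in shares)  # diversity index
predicted = variance / delta              # V / Delta
simulated = simulate_crowd_error(shares, variance)
print(predicted, round(simulated, 2))
```

The simulated value should land close to the predicted one; the gap shrinks as `trials` grows.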

  • Size DIVERSITY (Δ) Matters

  • Gonçalo Abecasis

  • Categorical Diversity Theorem

  • Partition the set of possible instances into categories

    Make a prediction for each category

  • Computer Science

    PAC Learning: Valiant

    Robust Classification: Provost and Fawcett

    Ensemble Learning: Dasarathy and Sheela

  • Categorical Predictive Models

    Partition the set of possible instances into categories

    Make a prediction for each category

  • 300 200 100 200

    A B C D

    Chloride in Water: mg/L

  • 300 200 100 200

    A B C D

    Chloride in Water: mg/L

    240 170

  • Variation in Data: 20,000

    $20{,}000 = (200-200)^2 + (300-200)^2 + (100-200)^2 + (200-200)^2$

  • Variation in Data: 20,000

    $20{,}000 = (200-200)^2 + (300-200)^2 + (100-200)^2 + (200-200)^2$

    Residual Variation: 11,000

    $11{,}000 = (200-240)^2 + (300-240)^2 + (100-170)^2 + (200-170)^2$

  • Variation in Data: 20,000

    $20{,}000 = (200-200)^2 + (300-200)^2 + (100-200)^2 + (200-200)^2$

    Residual Variation: 11,000

    $11{,}000 = (200-240)^2 + (300-240)^2 + (100-170)^2 + (200-170)^2$

    $R^2$: 0.45
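
The arithmetic behind the R² figure can be reproduced directly. In this minimal sketch the grouping of the four readings into two categories (200 and 300 predicted as 240; 100 and 200 predicted as 170) is inferred from the residual terms on the slide, and the variable names are illustrative.

```python
# Chloride readings (mg/L) paired with the category prediction applied to each.
observations = [
    (200, 240),
    (300, 240),
    (100, 170),
    (200, 170),
]

values = [value for value, _ in observations]
mean = sum(values) / len(values)  # overall mean of the data

# Total variation: squared distance of each reading from the overall mean.
total_variation = sum((value - mean) ** 2 for value in values)

# Residual variation: squared distance of each reading from its prediction.
residual_variation = sum((value - pred) ** 2 for value, pred in observations)

r_squared = 1 - residual_variation / total_variation
print(total_variation, residual_variation, round(r_squared, 2))
```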

  • Total Variation (20,000)

  • 300 200 100 200

    A B C D

    Chloride in Water: mg/L

  • Categorization Loss: 10,000

    Mean of Category 1: 250

    Categorization Loss 1: 5,000

    $5{,}000 = (200-250)^2 + (300-250)^2$

    Mean of Category 2: 150

    Categorization Loss 2: 5,000

    $5{,}000 = (200-150)^2 + (100-150)^2$

  • Categorization Error

    Possible Variation Explained 10,000

    Categorization Loss

    10,000

  • 300 200 100 200

    A B C D

    Chloride in Water: mg/L

    240 170

  • Prediction Error: 1,000 = 200 + 800

    Prediction Category 1: 240

    Actual Value: 250

    Prediction Error: $200 = (250-240)^2 + (250-240)^2$

    Prediction Category 2: 170

    Actual Value: 150

    Prediction Error: $800 = (170-150)^2 + (170-150)^2$

  • Categorization Error

    Variation Explained 9,000

    Categorization Loss 10,000

    Prediction Error 1,000

  • Categorical Diversity Theorem

    Variation = Explained + Category Loss + Predictive Error
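
The decomposition can be checked against the chloride example. This is a minimal sketch; the category layout and the names `categories`, `category_loss`, and `prediction_error` are illustrative, and "explained" is taken here to mean total variation minus residual variation.

```python
# Each category: (readings in mg/L, prediction used for that category).
categories = [
    ([200, 300], 240),
    ([200, 100], 170),
]

all_values = [v for readings, _ in categories for v in readings]
overall_mean = sum(all_values) / len(all_values)
total = sum((v - overall_mean) ** 2 for v in all_values)

category_loss = 0.0
prediction_error = 0.0
residual = 0.0
for readings, prediction in categories:
    category_mean = sum(readings) / len(readings)
    # Loss from lumping different readings into one category.
    category_loss += sum((v - category_mean) ** 2 for v in readings)
    # Error from predicting something other than the category mean.
    prediction_error += len(readings) * (prediction - category_mean) ** 2
    # What the model actually leaves unexplained.
    residual += sum((v - prediction) ** 2 for v in readings)

explained = total - residual
print(total, explained, category_loss, prediction_error)
print(total == explained + category_loss + prediction_error)
```

The residual variation splits exactly into categorization loss plus prediction error, which is what makes the three terms add up to the total.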

  • Value of Distinct Categories

    Distinct categories result in diverse categorization losses and, by the Diversity Prediction Theorem, lower error.

  • Six years of data

    Half million users

    17,700 movies

    Data divided into (training, testing)

    Testing data divided into (probe, quiz, test)

  • Singular Value Decomposition

    Each movie represented by a vector: $(p_1, p_2, p_3, p_4, \dots, p_n)$

    Each person represented by a vector: $(q_1, q_2, q_3, q_4, \dots, q_n)$
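
The slide does not say how the two vectors are combined. In latent-factor models of this kind a predicted rating is commonly taken to be the inner product of the movie vector and the person vector, and the sketch below assumes that. The vectors and the function name `predict_rating` are hypothetical.

```python
def predict_rating(movie_vector, person_vector):
    """Predicted rating as the inner product of two latent-factor vectors."""
    if len(movie_vector) != len(person_vector):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(p * q for p, q in zip(movie_vector, person_vector))


# Hypothetical 3-dimensional factors (the slides mention 50 dimensions).
movie = [0.9, 0.1, 0.4]   # how strongly the movie loads on each dimension
person = [4.0, 1.0, 2.0]  # how much this person values each dimension

print(round(predict_rating(movie, person), 2))
```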

  • Christina and David Rom

    Robert Bell

  • BellKor

    50 dimensions

    107 models

    Best Model: 6.8%

  • BellKor

    50 dimensions

    107 models

    Best Model: 6.8%

    Combination of Models: 8.4%

  • BellKor’s Pragmatic Chaos

    Best Model: 8.4%

    Ensemble: 10.1%

  • Enter “The Ensemble”

    23 Teams

    30 Countries

  • And the Winner is…

    Ensemble 10.06%

    BellKor 10.06%

  • But, the Real Winner is…

    Ensemble 10.06%

    BellKor 10.06%

    50-50 Combination 10.19%

  • Combine accurate and diverse models to make good predictions.
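
A 50-50 combination can beat both of its parts when the two models err in different directions. The sketch below illustrates this with root-mean-square error on made-up ratings; the data and model names are hypothetical and have no connection to the actual contest entries.

```python
import math


def rmse(predictions, actual):
    """Root-mean-square error of a list of predictions."""
    squared = [(p - a) ** 2 for p, a in zip(predictions, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical true ratings and two models whose errors tend to offset.
actual = [4.0, 3.0, 5.0, 2.0, 1.0]
model_a = [4.6, 2.7, 5.2, 2.5, 0.8]
model_b = [3.6, 3.4, 4.7, 1.8, 1.5]

# 50-50 combination: average the two predictions for each item.
blend = [(a + b) / 2 for a, b in zip(model_a, model_b)]

print(round(rmse(model_a, actual), 3))
print(round(rmse(model_b, actual), 3))
print(round(rmse(blend, actual), 3))
```

The blend is guaranteed to have a mean squared error no worse than the average of the two models' mean squared errors; it beats the better model only when the models are diverse enough, as they are in this constructed example.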

  • Big Science

  • Leaving Our Silos

  • Medicine Sociology

    Chemistry Economics

  • Freeman and Huang

                      Citations   Impact Factor
    # Addresses           +             +
    # References          +             +
    # Past papers         +             +
    Homophily             -             -

  • Papers > 100 Cites

    [Figure: Team vs. Solo, shown for Science & Engineering and for Social Science; percentage axis from 0.00% to 0.14%]

  • Jones B, Wuchty S, Uzzi B (2008) Multi-University Research Teams: Shifting Impact, Geography, and Stratification in Science. Science 322: 1259

    Inter-University Collaboration Increases Impact

  • Cummings, J. N., Kiesler, S., Zadeh, R., & Balakrishnan, A. (2013). Group heterogeneity increases the risks of large group size: A longitudinal study of productivity in research groups. Psychological Science, 24(6), 880-890.

  • Best papers have low proximity

    Best patents have low proximity

  • Atypical Connections

    Variable            Odds Ratio
    Years since PhD     1.14
    Prior Cites         2.25
    Author Count        0.8
    Depth (HHI)         3.29
    Atypical Connect    15.17

    Melissa A. Schilling and Elad Green, “Recombinant search and breakthrough idea generation: An analysis of high impact papers in the social sciences,” Research Policy, 2011, vol. 40, issue 10, pages 1321-1331.

    Melissa Schilling

  • Q?
