"melting pot" of the sciences in interdisciplinary research

Stirring the melting pot of the sciences: Leading the way to interdisciplinary research Mixing Social Science into Computer Science, Bioinformatics and more. Natalie Jane de Vries

Upload: natalie-de-vries

Post on 31-Jul-2015


TRANSCRIPT

Page 1: "Melting Pot" of the Sciences in interdisciplinary research

Stirring the melting pot of the sciences: Leading the way to interdisciplinary research

Mixing Social Science into Computer Science, Bioinformatics and more.

Natalie Jane de Vries

Page 2: "Melting Pot" of the Sciences in interdisciplinary research

Introduction - The University of Newcastle and CIBM

• The Newcastle region is the second most populated area in the Australian state of New South Wales (approx. 510,000)

• Situated 162 km (2 hours) north of Sydney in the Hunter Region

• University of Newcastle established: 1965

• Directors of CIBM: Prof. Pablo Moscato and Co-director Prof. Rodney Scott

Page 3: "Melting Pot" of the Sciences in interdisciplinary research

The Centre for Bioinformatics, Biomarker Discovery and Information-based Medicine – Background

• One of only 10 Priority Research Centres of The University of Newcastle.

• Origin: The Newcastle Bioinformatics Initiative (2002-2006) established by the work of Moscato and Berretta in Computer Science

Bioinformatics: The application of Computer Science and Information Technology to Biology/Life Sciences

Information-based Medicine is a shift toward a future of medicine that can become more personalized, more predictive, and ultimately more preventative

Page 4: "Melting Pot" of the Sciences in interdisciplinary research

“Melting pot” of the Sciences?

• Big Data

• Data Analytics

• Consumer Insights

• Consumer Analytics

• ‘Internet of things’

• Social Media Analysis

• Clustering/subtyping/segmenting

• Ordering

• Ranking

• Optimization

• Community Detection

• Graph analysis

• Similarity Measures

• Classification

• Characterisation

• Predictive Analytics

• Etc.

Page 5: "Melting Pot" of the Sciences in interdisciplinary research

Page 6: "Melting Pot" of the Sciences in interdisciplinary research

Agenda: What will I talk about today?

• Part 1) General introduction to the mixing of Computer Science, Social Science, Marketing and Consumer Behaviour at our Centre

• Part 2) Clustering and Segmentation
– From Breast Cancer Subtypes to Consumer Behaviours to Social Media Metrics data and more…

• Part 3) Reverse Engineering Consumer Behaviour Modelling Constructs from Data
– We introduce the idea of functional constructs to model online customer engagement behaviours through symbolic regression

• Part 4) Future Research Directions
– Future directions, aims, conclusions and time for questions

Page 7: "Melting Pot" of the Sciences in interdisciplinary research

Part 1: Computer Science and Consumer Behaviour Research

• Increase in amount and size of consumer-related data

• Online technologies generate large datasets

• Increase in online behaviours towards brands

• Increasing importance of social media in marketing strategies

• Need for greater understanding of consumers through e.g. clustering consumers (or objects in general) into similar groups

Page 8: "Melting Pot" of the Sciences in interdisciplinary research

Part 2: Clustering and Segmentation

Complete graph → Minimum Spanning Tree → Select and remove edges that are not k-Nearest Neighbors → Final forest (a forest is a set of trees) = clusters

Previous (large scale) applications of the MST-kNN method:

• U.S. Stock market time series data (Inostroza-Ponta, Berretta, & Moscato, 2011)

• Yeast gene expression data (Inostroza-Ponta, Mendes, Berretta, & Moscato, 2007)

• Alzheimer’s disease data - in the order of 1 million data elements (Arefin, Mathieson, Johnstone, Berretta, & Moscato, 2012)

• Prostate cancer data (Capp et al., 2009)

• Social Media (Facebook) Metrics Data (Lucas et al. 2014)

These examples show that the methodology proposed here scales to larger datasets

Novel methodology of clustering by CIBM’s researchers: MST-kNN
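
The steps named on this slide (complete graph, minimum spanning tree, removal of edges that are not k-nearest neighbours, final forest) can be sketched in a few lines of Python. This is a minimal single-pass illustration, not the CIBM implementation: the function name `mst_knn_clusters`, the Euclidean distance, the fixed `k`, and the rule that an edge survives when either endpoint lists the other among its k nearest neighbours are all assumptions made for the example.

```python
import math

def mst_knn_clusters(points, k):
    """Single-pass MST-kNN sketch: keep only MST edges that are also kNN edges."""
    n = len(points)
    # Complete graph: pairwise Euclidean distances (assumed metric).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Minimum spanning tree of the complete graph (Prim's algorithm).
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    mst_edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            mst_edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u

    # k nearest neighbours of every point, excluding the point itself.
    knn = [
        set(sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k])
        for i in range(n)
    ]

    # Assumed rule: an MST edge survives if either endpoint has the other
    # among its k nearest neighbours.
    kept = [(u, v) for u, v in mst_edges if v in knn[u] or u in knn[v]]

    # The surviving forest's connected components are the clusters (union-find).
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for u, v in kept:
        root[find(u)] = find(v)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Hypothetical toy data: two tight groups that are far apart.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = mst_knn_clusters(points, k=2)
print(clusters)
```

In the toy data the only MST edge that bridges the two groups is not a 2-nearest-neighbour edge, so it is removed and each group becomes its own tree.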

Page 9: "Melting Pot" of the Sciences in interdisciplinary research

Biomarker Discovery and Clustering in Breast Cancer

• Incidence – In 2014, it is estimated that 15,270 women will be diagnosed with breast cancer in Australia.

Molecular Subtypes

• Luminal A

• Luminal B

• HER2-enriched

• Normal-like

• Basal-like

Page 10: "Melting Pot" of the Sciences in interdisciplinary research

Treatment

Not all patients need the same treatment or respond to the same treatment

• Surgery

• Radiotherapy

• Hormonal therapy

• Chemotherapy

Page 11: "Melting Pot" of the Sciences in interdisciplinary research

Figure. MST-kNN clustering of the METABRIC data set with PAM50 labels. Legend: Luminal A, Luminal B, Her2, Normal-like, Basal, Controls.

Page 12: "Melting Pot" of the Sciences in interdisciplinary research

The MST-kNN Clustering Method in Consumer Behaviour Research

Page 13: "Melting Pot" of the Sciences in interdisciplinary research

Customer Engagement Behaviours: behavioural manifestations of Customer Engagement (CE) toward a firm after and beyond purchase (van Doorn et al. 2010)

Online Customer Engagement Survey/Questionnaire Tool

Page 14: "Melting Pot" of the Sciences in interdisciplinary research

Methodological Outline

Respondents’ chosen brand categories

Category No.   Explanation                                               Percentage of sample
1              Fashion Brands                                            31.54%
2              Community, Charities, Personality and Sports Fan Pages    23.99%
3              Other Services                                            19.68%
4              Other Consumer Goods                                      8.09%
5              Hospitality (Restaurants, Cafes, Bars)                    7.28%
6              Consumer Electronics                                      7.01%
7              Automotive                                                2.43%

Page 15: "Melting Pot" of the Sciences in interdisciplinary research

Methodology: Difference Meta-features

The difference in values between two measured features may be able to distinguish between two given categories, even when neither feature can do so alone (De Paula et al. 2011)

Previous successful application of difference meta-features in Alzheimer’s Disease biomarker detection (De Paula et al. 2011) and (Arefin et al. 2012), both in PLoS ONE.
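
The idea of pair-wise difference meta-features can be illustrated with a short sketch. The helper name `difference_meta_features` and the toy values are hypothetical and are not taken from the study's data; the example only shows how a difference can separate two classes whose individual features overlap.

```python
from itertools import combinations

def difference_meta_features(rows, names):
    """Return {"fi-fj": [values]} for every pair of feature columns."""
    meta = {}
    for i, j in combinations(range(len(names)), 2):
        meta[f"{names[i]}-{names[j]}"] = [row[i] - row[j] for row in rows]
    return meta

# Hypothetical toy data: each row is (f1, f2) for one sample.
# f1 and f2 span similar ranges in both classes, so neither separates them alone.
class_a = [(2, 1), (6, 5), (10, 9)]
class_b = [(1, 3), (5, 7), (9, 11)]

meta_a = difference_meta_features(class_a, ["f1", "f2"])["f1-f2"]
meta_b = difference_meta_features(class_b, ["f1", "f2"])["f1-f2"]
print(meta_a, meta_b)
```

Here every Class A sample has a positive difference and every Class B sample a negative one, which is the situation the slide describes.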

Workflow diagram (boxes): Data collection and pre-processing; Meta-features: Pair-wise differences; Meta-features: Pair-wise products; Intra- and inter-construct relationships; Distance Computation; Data preparation

Figure. Two panels plotting f1, f2 and the meta-feature (Meta-f) for samples in Class A and Class B.

Page 16: "Melting Pot" of the Sciences in interdisciplinary research
Page 17: "Melting Pot" of the Sciences in interdisciplinary research

Results: Clustering Highlights

Heterogeneous cluster?

More homogeneous cluster?

Page 18: "Melting Pot" of the Sciences in interdisciplinary research

Results: Clustering and Significance Values

Columns: Data; Rows selected; Distance metric; MST-kNN merged with the kNN cliques of size; p-value (Wilcoxon’s Test); p-value (Kruskal-Wallis)

Data                      Rows selected         Distance metric   Clique size   Wilcoxon’s Test   Kruskal-Wallis
Original                  All                   Robust            5NN           0.021187          0.042364
                                                Spearman          6NN           0.025987          0.051962
                                                Robust            6NN           0.028565          0.057117
                                                Pearson           3NN           0.030232          0.060451
                                                Spearman          3NN           0.040661          0.081306
                                                Euclidean         6NN           0.041232          0.082448
Difference Metafeatures   ‘Intra’ constructs    Robust            3NN           0.016551          0.033095
                                                Robust            6NN           0.017177          0.03434
                                                Pearson           3NN           0.018628          0.0372481
                                                Pearson           6NN           0.019066          0.038124
                                                Pearson           5NN           0.019656          0.039303
                          All                   Pearson           3NN           0.020594          0.041180
Product Metafeatures      ‘Inter’ Constructs    Spearman          3NN           0.016949          0.033891
                                                Pearson           4NN           0.01757           0.035132
                          All                   Pearson           4NN           0.017721          0.035433
                          ‘Inter’ Constructs    Pearson           6NN           0.01781           0.035611
                                                Pearson           3NN           0.017816          0.035624
                          ‘Inter’ Constructs    Robust            4NN           0.017998          0.035988

Page 19: "Melting Pot" of the Sciences in interdisciplinary research

Future Research Directions in this study

• Various domains and contexts to apply the novel process outlined in this study

• Combine a study using survey data as well as ‘live’ behaviour data from social networking sites (real-time interactions)

• Further exploration of meta-features in both survey data and ‘real’ online behaviour clustering studies; ‘differences’ meta-features in this study yielded better results

• This study guides the development of future feature selection models to identify groups of consumers according to higher-order characteristics.

Page 20: "Melting Pot" of the Sciences in interdisciplinary research

The MST-kNN Method in Social Media Metrics Data

Engagement in Motion: Exploring Short Term Dynamics in Page-level Social Media Metrics

Benjamin Lucas (1,2), Ahmed Shamsul Arefin (1,3), Natalie de Vries (1,3), Regina Berretta (1,3), Jamie Carlson (1,2), Pablo Moscato (1,3)

1 The University of Newcastle, Australia
2 Newcastle Business School, Faculty of Business and Law
3 The Priority Research Centre for Bioinformatics, Biomarker Discovery and Information-Based Medicine

Page 21: "Melting Pot" of the Sciences in interdisciplinary research

Page 22: "Melting Pot" of the Sciences in interdisciplinary research
Page 23: "Melting Pot" of the Sciences in interdisciplinary research
Page 24: "Melting Pot" of the Sciences in interdisciplinary research

Part 3: Reverse Engineering Consumer Behaviour Modelling Constructs from Data

Consumer Behaviour Modelling is usually done by testing hypotheses that are generated from theory

For example:

Source: de Vries & Carlson 2014 – Journal of Brand Management

Items (questions) make up one theoretical construct in Structural Equation Modelling (Hair et al. 2014). For example:

Page 25: "Melting Pot" of the Sciences in interdisciplinary research

Page 26: "Melting Pot" of the Sciences in interdisciplinary research

Page 27: "Melting Pot" of the Sciences in interdisciplinary research

Symbolic Regression Analysis

Page 28: "Melting Pot" of the Sciences in interdisciplinary research

Symbolic Regression Analysis

Page 29: "Melting Pot" of the Sciences in interdisciplinary research

Figure 2. The Figure shows the items ‘used’ by Eureqa through symbolic regression setting each of the five ENG items as dependent variables (obtained using the whole data set).

de Vries NJ, Carlson J, Moscato P (2014) A Data-Driven Approach to Reverse Engineering Customer Engagement Models: Towards Functional Constructs. PLoS ONE 9(7): e102768. doi:10.1371/journal.pone.0102768
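
Symbolic regression searches over formulas rather than fitting coefficients of one fixed formula. The sketch below is a deliberately tiny brute-force stand-in, assuming a search space of expressions of the form `item op item`; it is not how Eureqa works internally, and the item names `q1` to `q3`, the function `best_expression` and the toy responses are hypothetical.

```python
import itertools
import operator

# Candidate binary operators for the toy search space.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def best_expression(rows, target, names):
    """Try every 'item op item' formula and return (squared error, formula)."""
    best = None
    for (i, a), (j, b) in itertools.permutations(enumerate(names), 2):
        for symbol, fn in OPS.items():
            error = sum((fn(row[i], row[j]) - t) ** 2 for row, t in zip(rows, target))
            # Strict comparison keeps the first formula found among ties.
            if best is None or error < best[0]:
                best = (error, f"{a} {symbol} {b}")
    return best

# Hypothetical survey responses: each row holds answers to items q1, q2, q3.
rows = [(1, 2, 3), (2, 1, 4), (3, 5, 2), (4, 3, 1)]
# Dependent item constructed for the demo as q1 * q3.
target = [3, 8, 6, 4]

result = best_expression(rows, target, ["q1", "q2", "q3"])
print(result)
```

Running such a search once per dependent item and recording which items appear in the winning formula gives the kind of "items used" summary the figure caption describes, although a real tool explores a far larger expression space with a heuristic search.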

Page 30: "Melting Pot" of the Sciences in interdisciplinary research

Figure 3. Data Set A – Network found as a result of the application of the model finding optimization software on each variable as a target.

de Vries NJ, Carlson J, Moscato P (2014) A Data-Driven Approach to Reverse Engineering Customer Engagement Models: Towards Functional Constructs. PLoS ONE 9(7): e102768. doi:10.1371/journal.pone.0102768

Page 31: "Melting Pot" of the Sciences in interdisciplinary research

Inter-rater Agreement

de Vries NJ, Carlson J, Moscato P (2014) A Data-Driven Approach to Reverse Engineering Customer Engagement Models: Towards Functional Constructs. PLoS ONE 9(7): e102768. doi:10.1371/journal.pone.0102768

Page 32: "Melting Pot" of the Sciences in interdisciplinary research

Our Future research directions

• Work on scalability of methodologies

• Improve optimisation algorithms (minimum distance, maximum objectives, etc.)

• Meta-heuristics (Memetic Algorithms) for applications in the social sciences

• Network alignment (complex network analysis) of consumer behaviour networks for uncovering structure in datasets

• Proposal for an edited book on large-scale “Business and Consumer Analytics” (Springer)

• Smart Cities Network (sensor data, optimisation of cities and their networks)

• Digital Economy technologies

Page 33: "Melting Pot" of the Sciences in interdisciplinary research

UoN and UKM

Things to remember:

• UoN is always open for research collaborations (depending on funds; we operate on a project basis)

• At CIBM we have supercomputing capacity available for large-scale projects

• In our team we have particularly strong expertise in operations research and management science

• CIBM is open to diversifying into new areas (e.g. computational social science, as demonstrated today)

• As Prof. Moscato says: “Do not hesitate to throw an ‘odd-ball’. Either we could be interested, or we could put you in touch with other collaborators and colleagues”.

Page 34: "Melting Pot" of the Sciences in interdisciplinary research

Terima Kasih

Questions?

Page 35: "Melting Pot" of the Sciences in interdisciplinary research

References

• Arefin AS, Mathieson L, Johnstone D, Berretta R, Moscato P (2012) Unveiling Clusters of RNA Transcript Pairs Associated with Markers of Alzheimer’s Disease Progression, PLOS ONE, DOI: 10.1371/journal.pone.0045535

• Capp et al. (2009) Is there more than one proctitis syndrome? A revisitation using data from the TROG 96.01 trial, Radiotherapy and Oncology, 90(3), 400-407

• Hair, J. F., Hult, G. T. M., Ringle, C. M. and Sarstedt, M. (2014) A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM), Los Angeles: Sage Publications Inc.

• Inostroza-Ponta M, Mendes A, Berretta R, Moscato P (2007) An Integrated QAP-Based Approach to Visualize Patterns of Gene Expression Similarity, Progress in Artificial Life, Lecture Notes in Computer Science, 4828, pp 156-167

• Inostroza-Ponta M, Berretta R, Moscato P (2011) QAPgrid: A Two Level QAP-Based Approach for Large-Scale Data Analysis and Visualization, PLOS ONE, DOI: 10.1371/journal.pone.0014468

• Lucas B, Arefin AS, de Vries NJ, Berretta R, Carlson J, Moscato P (2014) Engagement in Motion: Exploring Short Term Dynamics in Page-Level Social Media Metrics, IEEE Conference on Social Computing and Big Data and Cloud Computing (Sydney)

• de Vries NJ, Carlson J (2014) Examining the drivers and brand performance implications of customer engagement with brands in the social media environment, Journal of Brand Management, 21, 495-515

• de Vries NJ, Carlson J, Moscato P (2014) A Data-Driven Approach to Reverse Engineering Customer Engagement Models: Towards Functional Constructs, PLOS ONE, DOI: 10.1371/journal.pone.0102768

• de Vries NJ, Arefin AS, Moscato P (2014) Gauging Heterogeneity in Online Consumer Behaviour Data: A Proximity Graph Approach, IEEE Conference on Social Computing and Big Data and Cloud Computing (Sydney)

• Marsden J, Budden D, Craig H, Moscato P (2013) Language Individuation and Marker Words: Shakespeare and His Maxwell's Demon, PLOS ONE, DOI: 10.1371/journal.pone.0066813

• Naeni LM, de Vries NJ, Reis R, Arefin AS, Berretta R, Moscato P (2014) Identifying Communities of Trust and Confidence in the Charity and Not-for-Profit Sector: A Memetic Algorithm Approach, IEEE Conference on Social Computing and Big Data and Cloud Computing (Sydney)

• van Doorn, J., Lemon, K. N., Mittal, V., Nass, S., Pick, D., Pirner, P. and Verhoef, P. C. (2010). Customer Engagement Behavior: Theoretical Foundations and Research Directions. Journal of Service Research, 13(3): 253-266.

Page 36: "Melting Pot" of the Sciences in interdisciplinary research

APPENDIX (Extra Slides)

Page 37: "Melting Pot" of the Sciences in interdisciplinary research

New Publication

Published 7th April 2015 in PLOS ONE

N J de Vries, R Reis, P Moscato

Clustering of consumers based on trust and donating behaviours in the not-for-profit sector

Including symbolic regression predictive modeling for consumer involvement with charities

Page 38: "Melting Pot" of the Sciences in interdisciplinary research

Page 39: "Melting Pot" of the Sciences in interdisciplinary research

Resulting Segments of the Australian Market

1. Non-institutionalist charity supporters

2. Resource allocation critics

3. Information-seeking financial sceptics

4. Non-questioning charity supporters

5. Non-trusting sceptics

6. Charity management believers

7. Institutionalist charity believers

http://journals.plos.org/plosone/article?id=10.1371%2Fjournal.pone.0122133

Page 40: "Melting Pot" of the Sciences in interdisciplinary research

IEEE Conference paper

Methodology: Product Meta-features

The product of the values of two measured features may be able to distinguish between two given categories, even when neither feature can do so alone.

This study is the first to trial the application of this idea.

Left, the values of f1 (blue) and f2 (red) do not distinguish the classes well but their product (meta-feature in green) does.
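
A small sketch of the product meta-feature idea follows. The helper names `product_meta_features` and `separates`, the non-overlapping-range check, and the toy values are assumptions chosen for illustration and do not come from the paper.

```python
from itertools import combinations

def product_meta_features(rows, names):
    """Return {"fi*fj": [values]} for every pair of feature columns."""
    return {
        f"{a}*{b}": [row[i] * row[j] for row in rows]
        for (i, a), (j, b) in combinations(enumerate(names), 2)
    }

def separates(values_a, values_b):
    """True when the value ranges of the two classes do not overlap."""
    return max(values_a) < min(values_b) or max(values_b) < min(values_a)

# Hypothetical toy data: each row is (f1, f2) for one sample.
class_a = [(1, 12), (2, 6), (3, 4)]
class_b = [(1, 4), (2, 2), (4, 1)]

f1_ok = separates([r[0] for r in class_a], [r[0] for r in class_b])
f2_ok = separates([r[1] for r in class_a], [r[1] for r in class_b])
product_ok = separates(
    product_meta_features(class_a, ["f1", "f2"])["f1*f2"],
    product_meta_features(class_b, ["f1", "f2"])["f1*f2"],
)
print(f1_ok, f2_ok, product_ok)
```

In this toy case neither f1 nor f2 passes the range check on its own, while their product does.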

Workflow diagram (boxes): Data collection and pre-processing; Meta-features: Pair-wise differences; Meta-features: Pair-wise products; Intra- and inter-construct relationships; Distance Computation; Data preparation

Figure. Two panels plotting f1, f2 and the meta-feature (Meta-f) for samples in Class A and Class B.

Page 41: "Melting Pot" of the Sciences in interdisciplinary research

My publications

• A Data-Driven Approach to Reverse Engineering Customer Engagement Models: Towards Functional Constructs (de Vries, Carlson and Moscato) http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0102768

• Examining the drivers and brand performance implications of customer engagement with brands in the social media environment (de Vries and Carlson): http://www.palgrave-journals.com/bm/journal/v21/n6/abs/bm201418a.html

• Gauging Heterogeneity in Online Consumer Behaviour Data: A Proximity Graph Approach (de Vries, Arefin and Moscato) http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7034833

• Engagement in Motion: Exploring Short Term Dynamics in Page-Level Social Media Metrics (Lucas et al) http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7034813&tag=1

• Identifying Communities of Trust and Confidence in the Charity and Not-for-Profit Sector: A Memetic Algorithm Approach (Moslemi et al) http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7034835&refinements%3D4251871666%26filter%3DAND%28p_IS_Number%3A7034739%29

Page 42: "Melting Pot" of the Sciences in interdisciplinary research

Other Sources

First uses of ‘meta-features’:

• Differences in Abundances of Cell-Signalling Proteins in Blood Reveal Novel Biomarkers for Early Detection Of Clinical Alzheimer's Disease (De Paula et al) http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0017481

• Unveiling Clusters of RNA Transcript Pairs Associated with Markers of Alzheimer’s Disease Progression (Arefin et al) http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0045535

MST-kNN papers:

• An Integrated QAP-Based Approach to Visualize Patterns of Gene Expression Similarity (Inostroza Ponta et al) http://link.springer.com/chapter/10.1007/978-3-540-76931-6_14

• kNN-MST-Agglomerative: A fast and scalable graph-based data clustering approach on GPU (Arefin et al) http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6295143