ESPON 2013 DATABASE Malmö Seminar, 2-3 December 2009 Thematic structuring of the ESPON 2013 DB Geoffrey Caruso and Nuno Madeira



ESPON 2013 DATABASE

Malmö Seminar, 2-3 December 2009

Thematic structuring of the ESPON 2013 DB

Geoffrey Caruso and Nuno Madeira


Outline

• Towards an ESPON thesaurus?

• Text mining methods for organising knowledge

• Techniques to increase visual perception: first results

• Short-term solution


Towards an ESPON thesaurus?

• The draft technical report describes some of the main features of thesaurus construction

• Presents some examples developed by international organisations (ILO, UNESCO, FAO, EUROSTAT, …)

• Stresses the importance of harmonising vocabulary

• Explores the usefulness of text mining methods to further support the thematic structuring
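The slides do not say how an ESPON thesaurus would be stored. As a minimal sketch, assuming the usual thesaurus relationships (preferred and non-preferred terms, broader/narrower terms, related terms, as in ISO 25964 or SKOS), a structure supporting vocabulary harmonisation could look like the following. The class and method names are hypothetical; the sample terms come from the draft ESPON 2013 hierarchy, while the synonym "Road accessibility" and the related link are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """A thesaurus concept: one preferred term plus its relationships."""
    preferred: str
    non_preferred: set[str] = field(default_factory=set)  # synonyms ("use for" terms)
    broader: set[str] = field(default_factory=set)        # hierarchical: parent concepts
    narrower: set[str] = field(default_factory=set)       # hierarchical: child concepts
    related: set[str] = field(default_factory=set)        # associative links


class Thesaurus:
    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}
        self._lookup: dict[str, str] = {}  # lower-cased label -> preferred term

    def add(self, preferred: str, synonyms: tuple[str, ...] = ()) -> None:
        concept = self.concepts.setdefault(preferred, Concept(preferred))
        concept.non_preferred.update(synonyms)
        for label in (preferred, *synonyms):
            self._lookup[label.lower()] = preferred

    def add_broader(self, narrow: str, broad: str) -> None:
        # Store hierarchical links in both directions so either side can be queried.
        self.concepts[narrow].broader.add(broad)
        self.concepts[broad].narrower.add(narrow)

    def add_related(self, a: str, b: str) -> None:
        # Associative links are symmetric.
        self.concepts[a].related.add(b)
        self.concepts[b].related.add(a)

    def normalise(self, label: str) -> str | None:
        """Harmonise a free-text label to its preferred term (None if unknown)."""
        return self._lookup.get(label.lower())


# Illustrative entries: the synonym and the related link are hypothetical.
t = Thesaurus()
t.add("Transport")
t.add("Potential accessibility by road", synonyms=("Road accessibility",))
t.add("Landscape fragmentation")
t.add_broader("Potential accessibility by road", "Transport")
t.add_related("Potential accessibility by road", "Landscape fragmentation")
print(t.normalise("road accessibility"))  # Potential accessibility by road
```

Keeping a single lookup from every label to its preferred term is what makes harmonisation cheap: indicators delivered under different names collapse onto one concept before any hierarchy is applied.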


Text mining methods for organising knowledge

• Textual data is usually treated as unstructured information that must be carefully prepared before any analytical method can be applied

• Text mining methods transform text into standard numerical forms (e.g. term-document matrices)

• For this purpose we have collected approximately 200 reports, studies, and policy notes addressing ESPON evidence and results.

• The dependency and ambiguity of textual data made data preparation our primary focus
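The slides do not detail the preparation pipeline. A minimal sketch of the text-to-numbers step, assuming a plain bag-of-words model: tokenise each report, drop a small hypothetical stop-word list and very short tokens, then count terms per document to build a term-document matrix. Function names, the stop-word list and the two sample sentences are illustrative only; real preparation would also need to handle stemming, multi-word terms and the ambiguity noted above.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real corpus needs a fuller one.
STOP_WORDS = {"the", "of", "and", "in", "to", "a", "for", "on", "is", "are"}


def tokenise(text: str) -> list[str]:
    """Lower-case, keep runs of letters (Unicode-aware), drop stop words and tokens under 3 chars."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS and len(tok) > 2]


def term_document_matrix(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (vocabulary, matrix) where matrix[i][j] counts vocabulary[i] in docs[j]."""
    counts = [Counter(tokenise(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix


docs = [
    "Accessibility of regions by road and rail",
    "Road accessibility and population change in rural regions",
]
vocabulary, matrix = term_document_matrix(docs)
for term, row in zip(vocabulary, matrix):
    print(f"{term:15} {row}")
```

Each row is a term and each column a report, so rows can be compared directly (e.g. for co-occurrence or clustering) once the corpus of around 200 documents is loaded.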


Techniques to increase visual perception

• Explore visualisation tools, such as keyword maps based on co-occurrence data, to communicate outputs more effectively

• First results reveal highly complex structures, although some meaningful patterns can be discerned

• However, these results raise questions about the completeness of our corpus, especially with regard to cluster stability

• For instance, how many reports and studies are sufficient to guarantee consistent results?
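Keyword maps of this kind are typically built from pairwise co-occurrence counts. The sketch below assumes each report has already been reduced to a set of keywords and counts, for every keyword pair, the number of reports in which both appear. To give the sample-size question a concrete form, it also repeatedly subsamples the corpus and measures how well the most frequent pairs agree with those of the full corpus (mean Jaccard overlap). This is a simplified proxy for cluster stability, not the method used in the study; the function names, `k`, the number of trials and the toy corpus are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def cooccurrence(doc_keywords: list[set[str]]) -> Counter:
    """Count, for each unordered keyword pair, the documents in which both occur."""
    pairs: Counter = Counter()
    for keywords in doc_keywords:
        for a, b in combinations(sorted(keywords), 2):
            pairs[(a, b)] += 1
    return pairs


def top_pairs(doc_keywords: list[set[str]], k: int) -> set[tuple[str, str]]:
    """The k most frequent pairs (ties broken by first occurrence)."""
    return {pair for pair, _ in cooccurrence(doc_keywords).most_common(k)}


def subsample_stability(doc_keywords: list[set[str]], size: int,
                        k: int = 10, trials: int = 50, seed: int = 0) -> float:
    """Mean Jaccard overlap between the top-k pairs of random subsamples and of the full corpus."""
    rng = random.Random(seed)
    reference = top_pairs(doc_keywords, k)
    scores = []
    for _ in range(trials):
        found = top_pairs(rng.sample(doc_keywords, size), k)
        union = reference | found
        scores.append(len(reference & found) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


# Toy corpus: each report reduced to a set of keywords (invented for illustration).
reports = [
    {"road", "accessibility"},
    {"road", "accessibility", "population"},
    {"population", "ageing"},
]
print(cooccurrence(reports).most_common(1))  # [(('accessibility', 'road'), 2)]
print(subsample_stability(reports, size=2, k=2))
```

Plotting the stability score against subsample size gives one rough indication of whether adding more reports still changes the picture, or whether the strongest associations have already settled.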


Short-term solution

• A first hierarchical structure, derived not from text mining methods but from adapting the previous ESPON DB to the indicators delivered so far

• Investigate the degree of resemblance between some important database classifications (EUROSTAT, OECD, EEA, UNEP, WPI) and the ESPON 2006 DB

• Identify patterns that could contribute to the harmonisation of categories or themes

• Employ matrix visualisation techniques for cluster analysis

• Knowledge acquired from text mining methods will form the basis for improving both hierarchical and associative relationships
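The slides do not specify how resemblance between classifications is measured. One simple option, assumed here, is to treat each classification as a set of harmonised theme labels, compute pairwise Jaccard similarity, and reorder the resulting matrix so that similar classifications sit next to each other, which is the basic idea behind matrix visualisation for cluster analysis. The ordering is a naive greedy nearest-neighbour heuristic standing in for proper seriation methods; the classification names and theme sets are invented and are not the actual EUROSTAT, OECD, EEA, UNEP or WPI categories.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of theme labels two classifications have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_matrix(classes: dict[str, set[str]]) -> tuple[list[str], list[list[float]]]:
    """Pairwise Jaccard similarity between all classifications."""
    names = list(classes)
    return names, [[jaccard(classes[r], classes[c]) for c in names] for r in names]


def greedy_order(names: list[str], sim: list[list[float]]) -> list[str]:
    """Start from the first item, then repeatedly append the most similar unvisited item."""
    if not names:
        return []
    order, remaining = [0], set(range(1, len(names)))
    while remaining:
        last = order[-1]
        # Ties go to the lower index so the result is deterministic.
        nxt = max(remaining, key=lambda j: (sim[last][j], -j))
        order.append(nxt)
        remaining.remove(nxt)
    return [names[i] for i in order]


# Hypothetical classifications with already-harmonised theme labels.
classes = {
    "ESPON 2006 DB": {"population", "transport", "environment"},
    "Classification A": {"energy", "agriculture"},
    "Classification B": {"population", "transport", "energy"},
}
names, sim = similarity_matrix(classes)
for name, row in zip(names, sim):
    print(f"{name:18}", " ".join(f"{v:.2f}" for v in row))
print(greedy_order(names, sim))  # ['ESPON 2006 DB', 'Classification B', 'Classification A']
```

Jaccard similarity only works once labels are harmonised; comparing raw category names from different organisations would understate their resemblance, which is why vocabulary harmonisation comes first.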

ESPON 2013 Database

• Population
    • Natural population change
    • Life expectancy at birth
• Transport
    • Potential accessibility by air
    • Potential accessibility by road
• Environment
    • Landscape fragmentation
    • Environmental quality
• Agriculture
    • ?
    • ?
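For prototyping, the draft hierarchy above could be held as a simple mapping from theme to indicators. This sketch assumes a two-level structure exactly as drawn, with `None` standing for the unfilled "?" slots under Agriculture; the helper names `paths` and `open_slots` are hypothetical. Listing the open slots gives a quick check of where the structure is still incomplete.

```python
from typing import Optional

# Draft two-level hierarchy from the slide; None marks the "?" placeholders.
HIERARCHY: dict[str, list[Optional[str]]] = {
    "Population": ["Natural population change", "Life expectancy at birth"],
    "Transport": ["Potential accessibility by air", "Potential accessibility by road"],
    "Environment": ["Landscape fragmentation", "Environmental quality"],
    "Agriculture": [None, None],
}


def paths(root: str, tree: dict[str, list[Optional[str]]]) -> list[str]:
    """Flatten the tree into 'root > theme > indicator' strings, skipping placeholders."""
    return [
        f"{root} > {theme} > {indicator}"
        for theme, indicators in tree.items()
        for indicator in indicators
        if indicator is not None
    ]


def open_slots(tree: dict[str, list[Optional[str]]]) -> dict[str, int]:
    """Number of unfilled indicator slots per theme (themes with none are omitted)."""
    return {theme: inds.count(None) for theme, inds in tree.items() if None in inds}


for p in paths("ESPON 2013 Database", HIERARCHY):
    print(p)
print(open_slots(HIERARCHY))  # {'Agriculture': 2}
```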


Thank you for your attention!