
Study in Spatial Distribution Analysis of Science Research Activities Based on Toponym Resolution in Text

Jianxia Ma1, Guodong Cheng2, Shaoxiong Liu1, Hanqing Ma1, Jinhui Ma3, Na Li1

1. The Lanzhou Branch of the National Science Library, Chinese Academy of Sciences, Lanzhou 730000, China;

2. Cold and Arid Regions Environmental and Engineering Research Institute, Chinese Academy of Sciences, Lanzhou 730000, China;

3. College of Earth and Environmental Science, Lanzhou University, Lanzhou 730000, China

COLLNET 2012, Seoul, Korea

Outline

Background
Introduction to Related Study
Framework of the Analysis Tool
Spatial Analysis of Research Activity in Sporopollen in China
Conclusion

Background

Recently, many scholars and applications have begun to present analyses of scientific papers visually, in combination with GIS.

Most of these studies rely on author addresses as supplied by the authors themselves.

There are few reports on analysing the distribution of research areas through text mining of the papers themselves, especially papers written in Chinese.

Katy Börner, Shashikant Penumarthy, Mark Meiss, et al. Mapping the Diffusion of Information Among Major U.S. Research Institutions. Scientometrics, 2006, 68(3): 415-426

Xuemei Wang, Mingguo Ma. Spatial information mining and visualization for Qinghai-Tibet Plateau’s literature based on GIS [A]. In: Yaolin Liu, Xinming Tang. International Symposium on Spatial Analysis, Spatial-Temporal Data Mining [C]. Wuhan, Proc. of SPIE, 2009, 1-8

Lutz Bornmann, Ludo Waltman. The detection of “hot regions” in the geography of science – A visualization approach by using density maps. arXiv:1102.3862v2

Lutz Bornmann, Loet Leydesdorff, Christiane Walch-Solimena, Christoph Ettl. Mapping excellence in the geography of science: An approach based on Scopus data

Background

In earth science and in resource and environment related fields, research is closely tied to specific locations.

Reading articles one by one and annotating the research area by hand is an inefficient way to understand how research areas are distributed. It also makes it hard to see where the research blanks and hot spots are.

Background

Through automatic recognition and annotation of the geographical names referred to in research papers, we can analyze the spatial distribution of research activities in a research field and identify its hot areas and blank areas.

This will help decision makers and researchers adjust research strategy and optimize the allocation of research resources. It also adds a new spatial dimension to traditional information analysis.

Background

Possibility: Can we mine hidden geographical knowledge from large-scale collections of research papers to support spatial analysis of research activity?

How: How can we analyze geographical features in massive textual collections and mine the hidden knowledge efficiently?

Key: Toponym resolution in research articles, which comprises two tasks, namely geo-parsing and geo-coding.

Introduction to Related Study

Geo-parsing

Geo-parsing consists of detecting and extracting the geographic names referred to in the unstructured text of an article or a Web page, using Named Entity Recognition (NER) techniques.

Gazetteer-based extraction: simple and allows efficient implementations, at the cost of precision in toponym extraction. Building a gazetteer with full coverage is a tedious job.

Natural language processing, generally based on statistical models: Hidden Markov Models (HMMs), Maximum Entropy Models (MEMs), Maximum Entropy Markov Models (MEMMs), Conditional Random Fields (CRFs) and Support Vector Machines (SVMs) have been discussed in many documents for the extraction of geographic names. These models require a lot of training data and are corpus dependent.
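To make the gazetteer-based option concrete, here is a minimal sketch of dictionary lookup with longest-match preference. The tiny gazetteer, the sample sentence and the function names are illustrative assumptions, not part of the tool described in these slides.

```python
import re

# Hypothetical mini-gazetteer; a real one would hold many thousands of names.
GAZETTEER = ["Lanzhou", "Junggar Basin", "Yangtze River", "Qinling", "Beijing"]

def build_pattern(names):
    # Put longer names first so a multi-word entry wins over a shorter overlap.
    ordered = sorted(names, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b")

def extract_toponyms(text, pattern):
    """Return (name, start_offset) pairs for every gazetteer hit in text."""
    return [(m.group(0), m.start()) for m in pattern.finditer(text)]

PATTERN = build_pattern(GAZETTEER)
sample = "Pollen samples from the Junggar Basin and Qinling were analysed in Lanzhou."
hits = extract_toponyms(sample, PATTERN)
```

A plain lookup like this cannot tell a place name from an identical string used in another sense, which is one source of the precision loss noted above.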

Geo-coding

Geo-coding is the key step in linking textual information to maps. A gazetteer, or geographical knowledge base, is the key component.

A well-designed digital gazetteer can support geo-entity identification, toponym disambiguation and geo-coding.

Well-known digital gazetteers include the ADL Gazetteer, Getty TGN and GeoNames.

Some digital map services, including Google Maps, Microsoft Bing Maps, Yahoo PlaceFinder and Baidu Map, provide APIs for geo-coding.
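As a rough illustration of gazetteer-backed geo-coding, the sketch below resolves a name to coordinates through an in-memory table and reports the share of names resolved. The table contents (approximate coordinates), the alias entry and the function names are assumptions made for the example; it does not call any of the map services listed above.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical gazetteer: formal name -> (latitude, longitude), approximate values.
GAZETTEER = {
    "Beijing": (39.90, 116.40),
    "Shanghai": (31.23, 121.47),
    "Lanzhou": (36.06, 103.83),
}
# Hypothetical alias table mapping a variant name to its formal name.
ALIASES = {"Peking": "Beijing"}

def geocode(name: str) -> Optional[Tuple[float, float]]:
    """Return coordinates for a toponym, or None when the gazetteer lacks it."""
    formal = ALIASES.get(name, name)
    return GAZETTEER.get(formal)

def geocoding_rate(names: Iterable[str]) -> float:
    """Fraction of the given names that resolve to coordinates."""
    names = list(names)
    if not names:
        return 0.0
    resolved = sum(1 for n in names if geocode(n) is not None)
    return resolved / len(names)
```

Names missing from the table return None rather than a guess, so the unresolved share stays visible in the rate.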

Chinese Toponym Extraction

Unlike English, Chinese text has no blank spaces to mark word boundaries.

Previous research focused on syntax rules and word segmentation. Statistical models have been used to identify unknown geographical names in Chinese text.

That research was carried out mainly on web pages and news; little of it dealt with research papers.
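Because there are no spaces to split on, dictionary-based approaches must segment the text first. The sketch below uses forward maximum matching, a common baseline that is swapped in here for illustration and is not necessarily the method used in the work cited above. The lexicon and sentence are hypothetical.

```python
def forward_max_match(text, lexicon, max_len=5):
    """Segment text greedily, taking the longest lexicon word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens

# Hypothetical lexicon: Junggar Basin, Qinling, Beijing, plus two ordinary words.
TOPONYMS = {"准噶尔盆地", "秦岭", "北京"}
LEXICON = TOPONYMS | {"孢粉", "研究"}

sentence = "准噶尔盆地孢粉研究"  # "sporopollen research in the Junggar Basin"
tokens = forward_max_match(sentence, LEXICON)
found = [t for t in tokens if t in TOPONYMS]
```

Names absent from the lexicon fall apart into single characters, which is why statistical models are used for unknown geographical names.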

Framework of The Analysis Tool

Documentary Database Preparation

Geo-parsing in Text
Geo-extraction from authors’ affiliation and address fields
Geo-recognition from unstructured text
CRF++ Based Toponym Identification
Geographical Knowledge Base with Semantic Relationship Supporting Toponym Disambiguation
GeoFocus

Geo-Coding

Spatial Analysis of Research Activity Based on Toponym Resolution from Documents with ArcGIS
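For the CRF++ based identification step, training data has to be in CRF++'s column format: one token per line, whitespace-separated columns with the label last, and a blank line between sentences. The sketch below converts annotated sentences to that format. The character-level tokens, the B-LOC/I-LOC/O tag set and the function names are assumptions, since the slides do not describe the actual feature template or labels.

```python
def to_crfpp_rows(sentence, spans):
    """Build 'token<TAB>tag' rows; spans are (start, end) offsets, end exclusive."""
    tags = ["O"] * len(sentence)
    for start, end in spans:
        tags[start] = "B-LOC"          # first character of a toponym
        for k in range(start + 1, end):
            tags[k] = "I-LOC"          # remaining characters of the toponym
    return [f"{ch}\t{tag}" for ch, tag in zip(sentence, tags)]

def to_crfpp_file(examples):
    """Join sentences with a blank line, as CRF++ expects between sequences."""
    blocks = ["\n".join(to_crfpp_rows(s, sp)) for s, sp in examples]
    return "\n\n".join(blocks) + "\n"

# Hypothetical annotated examples: (sentence, toponym spans).
examples = [("秦岭孢粉", [(0, 2)]), ("北京", [(0, 2)])]
training_text = to_crfpp_file(examples)
```

Whitespace characters inside a sentence would break the column layout, so they should be removed or replaced before conversion.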

Geographical Knowledge Base with Semantic Relationship Supporting Toponym Disambiguation


Geographical KB

Abbreviation / alias to formal toponym transformation

Toponym to footprint / coordinate mapping

Combined with toponym rules to support toponym annotation.

Combines the administrative relationship, spatial relationship and feature type of each geo-entity to support disambiguation.
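One way the administrative relationship can support disambiguation is to prefer the candidate whose parent region is also mentioned in the same document. The sketch below illustrates that single heuristic. The record layout, the two-entry knowledge base and the fallback rule are assumptions for the example, not the structure of the knowledge base described here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    parent: str        # administrative parent of the geo-entity
    feature_type: str  # e.g. "district" or "city"

# Hypothetical knowledge base: one ambiguous name with two candidate entities.
KB = {
    "Chaoyang": [
        Candidate("Chaoyang", "Beijing", "district"),
        Candidate("Chaoyang", "Liaoning", "city"),
    ],
}

def disambiguate(name, context_toponyms, kb=KB):
    """Pick the candidate whose parent appears among the other toponyms found."""
    candidates = kb.get(name, [])
    if not candidates:
        return None
    context = set(context_toponyms)
    for cand in candidates:
        if cand.parent in context:
            return cand
    # Assumed fallback: keep the first listed candidate when context gives no hint.
    return candidates[0]
```

Spatial proximity and feature type could be added as further scoring signals in the same way.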

Geo-coding

Spatial Analysis of Research Activity in sporopollen in China

Authors’ distribution

Source: CNKI, 1490 papers (2000-2010).

1402 items have clear author affiliations and addresses.

97.08% of the author affiliations and addresses were identified.

In combination with Google Earth and Google Maps, the geo-coding rate reached 96.9%.

As the figure shows, most palynology authors come from Beijing, Jiangsu and Shanghai, followed by Gansu and Shanxi. Few authors are from Xizang and Ningxia.

Distribution of the research area in sporopollen in China

According to manual annotation of the abstracts, 1112 papers refer to geographical names.

Distribution of research area in sporopollen

The hottest research areas for sporopollen in China are the estuary of the Yangtze River, the Shandong inland area, Beijing, the Qinling mountain area and the Junggar Basin.

Sampling points are sparse south of the Changjiang (Yangtze River), in the mountainous border area of Heilongjiang, Jilin and Inner Mongolia, in the northwest desert, and in the southwest tropical regions.

These places deserve more attention in the future: in addition to research significance, the geographic representativeness of a site and the filling of blank research areas are also worth considering.
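The hot and blank areas above were derived with ArcGIS. As a simplified stand-in, the sketch below bins geocoded points into a latitude/longitude grid and lists dense and empty cells. The cell size, the threshold, the sample points and the function names are all assumptions for illustration.

```python
import math
from collections import Counter

def grid_counts(points, cell_deg=1.0):
    """Count (lat, lon) points per grid cell of cell_deg degrees."""
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts

def hot_cells(counts, min_count):
    """Cells holding at least min_count points, in sorted order."""
    return sorted(cell for cell, n in counts.items() if n >= min_count)

def blank_cells(counts, lat_cells, lon_cells):
    """Cells of the study grid that hold no points at all."""
    return [(i, j) for i in lat_cells for j in lon_cells if counts[(i, j)] == 0]

# Hypothetical geocoded study sites, not data from the experiment.
points = [(39.9, 116.4), (39.5, 116.7), (31.2, 121.5), (39.1, 116.2)]
counts = grid_counts(points)
```

A density surface in a GIS gives a smoother picture, but the counting logic behind hot and blank areas is the same.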

Conclusion

The experiment shows that it is possible to analyze distribution of research activities based on automatic identification and annotation of the geo-entity in large-scale textual collections.

The method can help science decision makers allocate research resources.

Further research

Further research and experimentation are needed, and are in fact ongoing, to improve the geo-parsing and geo-coding rates.

We need to train on a much larger corpus and to adjust the feature template to get better efficiency.

We also need to take other heuristics into consideration to improve toponym resolution.

A systematic evaluation of the method should be carried out as well.

Thanks for your attention!
