Extraction and Visualization of Geographical Names in Text

Post on 11-Jan-2016

ZHANG Xueying zhangsnowy@163.com

Key Laboratory of Virtual Geographical Environment, Ministry of Education Nanjing Normal University

Nov. 18, 2009

Content

1. Background

2. Extraction of geographical names

3. Applications

Resolution of Geographical names

Generation of geographical names

GIS

Geography

spatial model of the earth

Information and Library Sciences

Computer Science

Natural Language Processing

Computational linguistics

Human Computer Interaction

Cognitive Psychology

Medicine

Political and social sciences

Geophysics

Biology(botany/zoology/ecology)

Archeology

……

1.1 Disciplines concerned with geographic space

Location designator

1.2 What is a geographical name?

Geographical named entity: named entities with nouns or location expressions

Place name: the name by which a geographical place is known.

Location

Toponym: a named point of reference in both the physical and cultural landscape on the Earth's surface.

Geographical name: essentially a label that distinguishes one part of the earth's surface from another.

1.3 Main tasks

Recognition: identifies geographical names in a text span and then classifies them into predefined geographical feature categories.

Resolution: looks up candidate referents for each recognized geographical name and uses algorithms to pick the correct referent.

1.4 Basic processing architecture

Applications

Representation

Extraction

Formalization

Dataset

Natural language processing and Machine learning

Geospatial Information

Geographical Information System

Natural language text

1.5 Statistical models-ME

Maximum Entropy (ME): applied to natural language processing since 1996

√ no assumption of a normal distribution

√ no limits of context characteristics

√ learning cost of its parameters

√ Considering single situations
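For reference, a maximum entropy classifier is usually written in the conditional form below. This is the standard textbook formulation rather than something stated on the slides; the symbols $f_i$ (feature functions over a context and a label), $\lambda_i$ (learned weights), $x$ (context) and $y$ (label) are notation assumed here.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{i} \lambda_i f_i(x, y) \Big),
\qquad
Z(x) = \sum_{y'} \exp\Big( \sum_{i} \lambda_i f_i(x, y') \Big)
```

Because the feature functions $f_i$ may inspect any part of the context, the model places no restriction on which context characteristics are used, which is the second point in the list above.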

1.5 Statistical Models-HMM

Hidden Markov Model

Markov property

Markov chain model: For observable state sequences (state is known from data).

Hidden Markov Model: For non-observable states
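To make the distinction concrete, the sketch below decodes the most probable hidden label sequence with the Viterbi algorithm. It is a minimal illustration, not the system described in this talk: the two states (`O`, `LOC`), the three-word vocabulary and every probability are made-up toy values.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""

    def log(p):
        # Work in log space; zero probability maps to -infinity.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s] = (log-probability of the best path ending in s at time t,
    #               previous state on that path)
    first = observations[0]
    best = [{s: (log(start_p[s]) + log(emit_p[s].get(first, 0.0)), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            score, prev = max(
                (best[-1][p][0] + log(trans_p[p][s]), p) for p in states
            )
            column[s] = (score + log(emit_p[s].get(obs, 0.0)), prev)
        best.append(column)

    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Toy model (all numbers are illustrative assumptions).
STATES = ["O", "LOC"]
START_P = {"O": 0.8, "LOC": 0.2}
TRANS_P = {"O": {"O": 0.7, "LOC": 0.3}, "LOC": {"O": 0.4, "LOC": 0.6}}
EMIT_P = {
    "O": {"in": 0.6, "Harbin": 0.05, "city": 0.35},
    "LOC": {"in": 0.05, "Harbin": 0.6, "city": 0.35},
}

labels = viterbi(["in", "Harbin", "city"], STATES, START_P, TRANS_P, EMIT_P)
print(labels)  # ['O', 'LOC', 'LOC']
```

The words are the observable sequence; the labels are the non-observable states that the model has to infer.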

HMM in Computational Linguistics

Speech recognition

Part-of-speech tagging

Handwriting recognition

Machine translation

1.6 Statistical Models-CRF

Conditional Random Field

Much like a Markov random field

An HMM: a CRF with very specific feature functions

A CRF: a generalization of an HMM
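The relationship between the two models can be written out as follows. This is the standard linear-chain CRF formulation, given here as background rather than taken from the slides; the symbols $f_k$, $\lambda_k$, $a_{ij}$ (HMM transition probabilities) and $b_j(o)$ (HMM emission probabilities) are notation assumed here.

```latex
% Linear-chain CRF over a label sequence y and an observation sequence x
p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big)

% An HMM corresponds to restricting the features to two indicator families
% with fixed weights:
f_{ij}(y_{t-1}, y_t, \mathbf{x}, t) = \mathbf{1}[y_{t-1} = i]\,\mathbf{1}[y_t = j],
  \qquad \lambda_{ij} = \log a_{ij}

f_{jo}(y_{t-1}, y_t, \mathbf{x}, t) = \mathbf{1}[y_t = j]\,\mathbf{1}[x_t = o],
  \qquad \lambda_{jo} = \log b_j(o)
```

A general CRF drops both restrictions: the features may look at any part of $\mathbf{x}$, and the weights need not be log-probabilities.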

2.1 Diagram of CRF based recognition

label granularity

Feature template

CRF training

CRF test

CCRF test

Dataset

CCRF training

Simple geographical names

linguistic characteristics

Combined geographical names

2.2 Linguistic characteristics

language, history and culture

special characters

Combined named units

spatial relations

2.3 Label granularity

Granularity: 1-gram, 2-gram, …, word, phrase, sentence, paragraph, discourse

1-gram: sparse data

Word segmentation
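The difference between 1-gram and word granularity can be illustrated by labelling the same fragment at both levels. The sketch below is an assumption-laden illustration: the BIO labelling scheme, the `GN` tag and the hand-made word segmentation are not taken from the slides.

```python
def bio_labels(units, spans, tag="GN"):
    """Label a sequence of units (characters or words) with BIO tags.

    spans is a list of (start, end) index pairs, end exclusive, marking
    the units that belong to one geographical name.
    """
    labels = ["O"] * len(units)
    for start, end in spans:
        labels[start] = f"B-{tag}"
        for i in range(start + 1, end):
            labels[i] = f"I-{tag}"
    return labels


# 1-gram (character) granularity: no segmentation needed, longer sequences.
chars = list("位于黑龙江省")
char_labels = bio_labels(chars, [(2, 6)])

# Word granularity: requires a word segmenter first (segmentation assumed here).
words = ["位于", "黑龙江省"]
word_labels = bio_labels(words, [(1, 2)])

print(list(zip(chars, char_labels)))
print(list(zip(words, word_labels)))
```

At word granularity an error made by the segmenter propagates into the labels, whereas at 1-gram granularity the model sees more, less informative units.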

2.4 CCRF (cascaded CRF)

The upper recognition model

CT1, CT2, ……, CTi, ……, CTn

The lower recognition model

ST1, ST2, ……, STi, ……, STn

W1, W2, ……, Wi, ……, Wn
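The cascade can be sketched as two taggers applied in sequence, where the upper model receives the words together with the lower model's labels. In the sketch below both models are rule-based stand-ins, not trained CRFs: the lookup set in `lower_model` and the "merge adjacent simple names" rule in `upper_model` are hypothetical and only mimic the behaviour shown in the example of section 2.6.

```python
def lower_model(words):
    """Stand-in for the lower CRF: tag simple geographical names (SGN)."""
    known_simple_names = {"黑龙江省", "哈尔滨市", "哈尔滨市儿童公园"}  # toy lookup
    return ["SGN" if w in known_simple_names else "O" for w in words]


def upper_model(words, simple_tags):
    """Stand-in for the upper CRF: relabel runs of two or more adjacent
    simple names as one combined geographical name (CGN)."""
    combined = list(simple_tags)
    i = 0
    while i < len(simple_tags):
        j = i
        while j < len(simple_tags) and simple_tags[j] == "SGN":
            j += 1
        if j - i >= 2:
            combined[i:j] = ["CGN"] * (j - i)
        i = max(j, i + 1)
    return combined


def cascade(words):
    """Run the lower model, then feed its output to the upper model."""
    simple_tags = lower_model(words)
    return simple_tags, upper_model(words, simple_tags)


# Hand-segmented fragment of the example sentence (segmentation assumed).
words = ["位于", "黑龙江省", "哈尔滨市", "的", "哈尔滨市儿童公园", "为"]
st, ct = cascade(words)
print(st)  # ['O', 'SGN', 'SGN', 'O', 'SGN', 'O']
print(ct)  # ['O', 'CGN', 'CGN', 'O', 'SGN', 'O']
```

In a trained cascade the lower model's labels would typically enter the upper model as additional observation features rather than through a fixed rule.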

2.5 Feature template

Context: observable windows

$(w_{-n}, w_{-(n-1)}, \ldots, w_0, \ldots, w_{n-1}, w_n)$

n: the window size, which affects training time and test performance

Feature type              Relative position
Front neighbor feature    W-n, W-(n-1), ..., W-1
Back neighbor feature     W1, ..., Wn
Current feature           W0
Front combined feature    W-1 W0
Back combined feature     W0 W1
Transition state          Label of the first front neighbor feature
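The template above can be sketched as a function that builds the features for one position. This is an illustrative reading of the table, not the author's implementation: the feature names, the `<PAD>` and `<START>` placeholders and the `|` separator are assumptions.

```python
def window_features(words, i, n=2, prev_label=None):
    """Build the template features for position i with window size n."""

    def at(offset):
        j = i + offset
        return words[j] if 0 <= j < len(words) else "<PAD>"

    features = {}
    # Front neighbors W-n..W-1, current W0, back neighbors W1..Wn.
    for offset in range(-n, n + 1):
        features[f"W{offset}"] = at(offset)
    # Combined features.
    features["W-1W0"] = at(-1) + "|" + at(0)
    features["W0W1"] = at(0) + "|" + at(1)
    # Transition state: the label assigned to the first front neighbor.
    features["prev_label"] = prev_label if prev_label is not None else "<START>"
    return features


words = ["位于", "黑龙江省", "哈尔滨市"]
feats = window_features(words, 1, n=1, prev_label="O")
print(feats)
```

A larger `n` yields more features per position, which is why the window size trades training time against test performance.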

2.6 An example

Input sentence:

位于黑龙江省哈尔滨市的哈尔滨市儿童公园为孩子们准备了特殊的贺岁礼物。

Harbin Children Park in the Harbin city of Heilongjiang Province prepared special new year gifts for children.

Simple geographical names (SGN):

Harbin Children Park/SGN in the Harbin city/SGN of Heilongjiang Province/SGN prepared special new year gifts for children.

Combined geographical names (CGN):

Harbin Children Park/SGN in the Harbin city of Heilongjiang Province/CGN prepared special new year gifts for children.

2.7 Experimental performance

Train       Test      Precision (%)   Recall (%)   F-1 (%)   Number of recognized geographical names
PER (1-5)   PER (1)   94.01           94.91        94.46     26185
PER (1-5)   PER (6)   94.30           94.35        94.33     30126
PER (1-5)   MSRA      73.40           73.10        73.25     2674
MSRA        MSRA      93.23           87.78        90.43     3211
MSRA        PER (1)   73.61           67.84        70.61     18718
MSRA        PER (6)   71.90           69.68        70.77     22249
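The F-1 column is the harmonic mean of precision and recall. The sketch below recomputes it from the reported precision and recall; the function names are illustrative, and because the published precision and recall are already rounded, the recomputed values can differ from the table in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall and F-1 from raw counts (as fractions)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall, f1_score(precision, recall)


# (precision, recall, reported F-1) rows copied from the table above.
ROWS = [
    (94.01, 94.91, 94.46),
    (94.30, 94.35, 94.33),
    (73.40, 73.10, 73.25),
    (93.23, 87.78, 90.43),
    (73.61, 67.84, 70.61),
    (71.90, 69.68, 70.77),
]

for p, r, reported in ROWS:
    print(f"P={p:.2f} R={r:.2f} recomputed F-1={f1_score(p, r):.2f} reported={reported:.2f}")
```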

2.8 Resolution approach

Matching

Gazetteer

Reference disambiguation

Candidate referents

Cognitive salience model

intended referents
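The matching step can be sketched as a lookup that returns every gazetteer entry sharing the recognized name. The gazetteer below is a three-entry toy with approximate coordinates, and the field names (`name`, `admin`, `lat`, `lon`) are assumptions made for illustration, not the schema of any real gazetteer.

```python
from collections import defaultdict

# Toy gazetteer; coordinates are approximate and for illustration only.
GAZETTEER = [
    {"name": "Paris", "admin": "France", "lat": 48.86, "lon": 2.35},
    {"name": "Paris", "admin": "Texas, USA", "lat": 33.66, "lon": -95.56},
    {"name": "Harbin", "admin": "Heilongjiang, China", "lat": 45.80, "lon": 126.53},
]


def build_index(entries):
    """Index gazetteer entries by lower-cased name."""
    index = defaultdict(list)
    for entry in entries:
        index[entry["name"].lower()].append(entry)
    return index


def candidate_referents(name, index):
    """Return all gazetteer entries whose name matches the recognized name."""
    return index.get(name.strip().lower(), [])


INDEX = build_index(GAZETTEER)
for candidate in candidate_referents("Paris", INDEX):
    print(candidate["name"], "/", candidate["admin"])
```

A name with more than one candidate is ambiguous and has to go through reference disambiguation; a name with no candidate illustrates the limits of gazetteers.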

2.9 Cognitive salience model

High degree of spatial correlation in geographic references that are in textual proximity.
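One simple way to exploit this spatial correlation is to prefer the candidate that lies closest to the other places mentioned nearby in the text. The sketch below is a distance-based stand-in, not the cognitive salience model of this talk; the candidate list, the context location and the approximate coordinates are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def pick_referent(candidates, context_points):
    """Pick the candidate with the smallest mean distance to the
    already-resolved places mentioned in the surrounding text."""

    def mean_distance(candidate):
        point = (candidate["lat"], candidate["lon"])
        return sum(haversine_km(point, c) for c in context_points) / len(context_points)

    return min(candidates, key=mean_distance)


# Two candidate referents for "Paris" (approximate coordinates).
candidates = [
    {"label": "Paris, France", "lat": 48.86, "lon": 2.35},
    {"label": "Paris, Texas, USA", "lat": 33.66, "lon": -95.56},
]
# Context: the same passage also mentions Dallas (approximate coordinates).
context = [(32.78, -96.80)]

chosen = pick_referent(candidates, context)
print(chosen["label"])  # Paris, Texas, USA
```

The sketch assumes at least one context place has already been resolved; a fuller model would also need a fallback for isolated names.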

2.10 Problems

Ancient geographical names

Spatio-temporal changes

Limits of statistical models

Limits of gazetteers

……

GeoChunk: an annotation system

TextMAP: an integrated system for text and map

CGeoCoder: an address geocoding system

SRAnnotation
