Research Article

Fracture Mechanics Method for Word Embedding Generation of Neural Probabilistic Linguistic Model

Size Bi, Xiao Liang, and Ting-lei Huang

Institute of Electronics, Chinese Academy of Sciences, Beijing, China

Correspondence should be addressed to Ting-lei Huang; tlhuang@mail.ie.ac.cn

Received 3 January 2016; Revised 16 July 2016; Accepted 26 July 2016

Academic Editor: Trong H. Duong

Copyright © 2016 Size Bi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Computational Intelligence and Neuroscience, Volume 2016, Article ID 3506261, 11 pages; http://dx.doi.org/10.1155/2016/3506261

Word embedding, a lexical vector representation generated via a neural linguistic model (NLM), has been empirically shown to improve the performance of traditional language models. However, the high dimensionality inherent in NLMs leads to problems with hyperparameter selection and long training times. Here, we propose a force-directed method that alleviates these problems and simplifies the generation of word embeddings. In this framework, each word is treated as a point in the real world, so it can approximately simulate physical movement under certain mechanics. To simulate how meaning varies in phrases, we use fracture mechanics to model the formation and breakdown of the meaning carried by a 2-gram word group. Experiments on the natural language tasks of part-of-speech tagging, named entity recognition, and semantic role labeling show that the 2-dimensional word embedding rivals the word embeddings generated by classic NLMs in terms of accuracy, recall, and text visualization.

1. Introduction

Word embedding is a numerical word representation that uses a continuous vector space to represent a group of words [1]. In the word vector space, each word corresponds to a unique point. Intuitively, points with similar meanings should be placed close together, while those distant in meaning should be placed far apart. Based on this space, the degree of relation between words can be estimated by computing the distance between vector points, for example, the Euclidean or Mahalanobis distance. Such semantic numeralization enables textual and symbolic data to be processed by traditional neural models. For this reason, more natural language processing (NLP) systems are improved with these neural network technologies [2], enhancing the performance of common natural language tasks [3], such as POS (part-of-speech tagging), chunking, NER (named entity recognition) [4], and SRL (semantic role labeling). However, the training of such word embeddings is time consuming and requires sophisticated tuning of model parameters [5], because these models have a huge number of parameters to train, which are high dimensional and sparse. Moreover, text visualization of such high-dimensional data suffers from information distortion during dimensionality-reduction preprocessing, which confuses the understanding of data relations [6].

Word vectorization has always been an active topic in text mining and NLP. The generating approaches can be roughly divided into two categories: manual coding and autocoding. Manual coding assigns codes to words according to the knowledge of domain experts; it is heavy work, because the value of each word must be considered individually. For example, WordNet is a huge project whose aim is to build a word graph database by language experts, where word meanings are associated and presented via a tree-based structural graph [7, 8]. Such a representation can only associate a limited range of words for each word, which is insufficient to build global relations over all words. On the other hand, autocoding codes words with neural network models [9]. Every word is initialized with a random vector, which then varies with parameter tuning according to the contexts around the word. Generally, such methods are performed by training an NLM, where the word embedding is part of the result once the objective function has converged [1]. However, the NLM-based approaches, such as feedforward neural networks [1],



recurrent neural networks (RNN) [8], and restricted Boltzmann machines (RBM) [10], have also suffered from high learning complexity [11], sophisticated preferences [12], and the curse of dimensionality [13].

In this article, we present a novel method for word embedding learning that reduces the high dimensionality inherent in traditional NLMs. We assume that the generation of word embeddings can be viewed as particle movement on a plane: particles that are close represent words with similar meanings, whereas particles that are distant represent words that are far apart in meaning. To simulate text semantics as faithfully as possible, a fracture mechanics model is presented to control the generating process of word embeddings. We aim to provide an effective, intuitive approach for learning a 2-dimensional word vector that is applicable to general natural language tasks. In particular, we ignore homonymy and polysemy to keep the word representation consistent; in this setting, each word corresponds to a single vector. The generating model based on neural networks is replaced by a particle system, which simulates the process by which word semantics become correlated.

Our specific contributions are as follows:

(i) We propose a force-directed model based on fracture mechanics to generate word embeddings. A linear elastic fracture model is introduced to control how word semantics vary.

(ii) We use a language model based on the feedforward NLM to evaluate the word embedding on the tasks of POS, NER, and SRL, where the word embedding is the input of the language model.

(iii) The coordinates of the 2-dimensional word embedding can be used directly for word visualization, which facilitates observing the degree of relation among words intuitively.

The next section describes related work on numerical word representation. Section 3 introduces our methodology. Sections 4 and 5 give the results and discussion. Section 6 concludes and outlines possible future work.

2. Background

2.1. Text Representation by Word Embedding. Choosing an appropriate data representation can facilitate the performance of a machine learning algorithm. The related methods have developed to the level of automatic feature selection according to the requirements of applications [14]. As a branch of machine learning, representation learning [15] has gradually become active in several well-known communities, especially for investigating knowledge extraction from raw datasets. Text representation in natural language tasks can be divided into the three levels of corpus [16, 17], paragraph [18, 19], and word [20-22]. This paper focuses on representation learning for words. Feature words and context have been considered the foundation for text representation learning and for constructing an NLP system [23-25]. We follow this direction, aiming at mapping word text to a vector space. Compared with the paragraph and corpus levels of representation, the word is the finest granularity of semantics and is the most suitable to analyze with a vector distance.

The idea of using a vector space to map word meaning was initially proposed in [26]. The earlier representation learning failed to consider semantic measures of the differences between words and emphasized how to quantize feature words. For example, in [27], to represent the word "dove", the first byte of the corresponding representation denotes a property of its shape, which is set to "1" if the dove satisfies certain conditions. That line of representation learning did not focus on context features but did present approaches for measuring differences in word semantics. The self-organizing map (SOM) [27] is employed to compute word vector distance, using length to represent the neighboring degree of word meanings. Based on SOM, [28] maps high-frequency cowords to a 90-dimensional vector space. The investigations [29, 30] of SOM-based word vectors apply them in the fields of NLP and data mining.

2.2. Generation of Word Embedding. With the development of neural networks for probabilistic language modeling, word representation has started to integrate measures of text semantics. Reference [31] proposed using a neural network to build a language model. The underlying assumption is that the numerical representations of words with similar meanings should be placed close together, whereas those of words with distant meanings should be placed far apart. Meanwhile, a probability function is introduced to add a probabilistic output to the NLM, which can give a statistical estimate for a given n-gram word combination. Among the proposed models, a convolutional neural network [3], tuned somewhat by hand after learning, obtains an accuracy of 97.20% in POS, 93.65% in chunking, 88.67% in NER, and 74.15% in SRL. A comparison between conditional restricted Boltzmann machine models and support vector machines [32] is performed for a music annotator system based on the context around lyrics, which can use co-occurrence and sequential features to improve labeling accuracy. The SENNA software [3, 33] performs word sense disambiguation with an accuracy of 72.35%. Word representation is denoted by multiple vectors [4] to express polysemy; tested on NER, it rivals the traditional linguistic models. In addition, word representation learning has been trained based on n-gram models, hidden Markov models, and a partial-lattice Markov random field model [34].

For representation learning by NLM, [1] proposes a feedforward neural network to train word embeddings, which are regarded as internal parameters to be tuned following the objective function. Reference [35] presents a text sequence learning model, called RNN, which is capable of capturing local context features in a sequence of interest. Following this route, more machine learning algorithms have been introduced to address the weaknesses of natural language tasks. Reference [10] uses the restricted Boltzmann machine to improve the performance of the sequential probability function. Reference [36] creates a hierarchical learning that represents the semantic relations among words as a tree structure. The deep learning mechanism has also been tried to build an NLM [37].


Table 1: Properties of a word-particle.

Name            Notes
ID              Identifier
Pos             Coordinates (the word embedding)
Word            The corresponding word
Mass            Synthetic aliveness
TmpMass         Aliveness of all backward related particles
history_Mass    Historical aliveness
Current_Mass    Current aliveness
base_Mass       Basic aliveness
Chain           Backward related index
Max_flaw        Backward related degree
Flaw            Current flaw length
Radius          Repulsion radius
Pull_Radius     Tension-start radius
Velocity        Current velocity of the particle


2.3. Force-Simulated Methods. Force-directed algorithms are mainly applied in data visualization. Reference [38] compares several force-directed algorithms. Reference [39] uses these methods for analyzing a complex social network, adding gravity to draw the social network graph. In cross-domain applications, wireless sensor networks use them to build layouts [40], and [41] performs electrical circuit layout automatically based on force rules.

The reason we use a force-simulated approach to improve the generation of word embeddings is that the semantic relations between words are relatively stable within a certain number of documents that depict similar topics. This is somewhat like the convergence to a stable energy state in a force-directed algorithm. This inspired us to use the idea of force-related methods to address the problems of NLMs.

3. Methodology

3.1. Word-Particle Model. To map words into a particle system, we must define a mapping rule that specifies which attribute of a particle corresponds to which feature of word semantics. The linking table is shown in Table 1. The table consists of two parts: the left part designates the name of each particle property, and the right part explains the corresponding semantic feature.

We explain each of them below:

(i) ID: each word has a unique identifier relating it to the corresponding particle.

(ii) Pos: this is exactly the attribute that we want to train; it is also called the word embedding and actually denotes the coordinates of a particle in the plane.

Figure 1: An example of the word-particle model.

(iii) Mass: we use the concept of particle mass to denote the occurrence frequency of a word.

(iv) TmpMass: we use the temporary mass of a particle to represent the coword frequency.

(v) Current_Mass & history_Mass & base_Mass: they represent the current mass, historical mass, and basic mass of a particle, respectively. A function combining them, used to control the occurrence frequency of cowords within a sampling period, is described as

\[ \text{Mass} = \text{base\_Mass} + \frac{\text{history\_Mass}}{2} + \frac{\text{Current\_Mass}}{2}. \tag{1} \]

To control the intensity of relevance between words, we use the concept of an edge in physics to describe it.

(vi) Chain: an array recording the IDs of the backward related words.

(vii) Max_flaw: the maximum associating strength for a group of cowords.

(viii) Flaw: the current associating strength for a group of cowords.

(ix) Radius: the repulsion radius that keeps a minimum distance from other word-particles.

(x) Pull_Radius: the pulling radius; if other word-particles break in, they will be pushed away, keeping this radius distance from the intruded word-particle.

(xi) Velocity: it defines a semantic shifting trend that can strengthen the relation between two force-affected words.

In Figure 1, the word-particles world and boy are backward related to hello, and the word-particle bye is an isolated word-particle. The intensity of the relation between the word-particles hello and world is stronger than that between hello and boy. The directed line denotes the direction of word order, where the pointed-to word-particles can drive their linked word-particles. For example, the coword hello world appears more frequently than the coword hello boy.
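To make the mapping in Table 1 concrete, here is a minimal sketch of a word-particle record; it is our illustration rather than code from the paper, with field names mirroring Table 1 and a mass() method implementing (1).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class WordParticle:
    id: int                                   # unique identifier of the word
    word: str                                 # the corresponding word text
    pos: Tuple[float, float] = (0.0, 0.0)     # 2-D coordinates = the word embedding
    base_mass: float = 1.0                    # basic aliveness
    history_mass: float = 0.0                 # historical aliveness
    current_mass: float = 0.0                 # aliveness in the current sampling period
    tmp_mass: float = 0.0                     # aliveness of all backward related particles
    chain: List[int] = field(default_factory=list)         # IDs of backward related particles
    max_flaw: float = 0.2                     # maximum associating strength (rope width W)
    flaw: Dict[int, float] = field(default_factory=dict)   # current flaw length 2a per relation
    radius: float = 1.0                       # repulsion radius
    pull_radius: float = 20.0                 # tension-start radius
    velocity: Tuple[float, float] = (0.0, 0.0)              # current velocity

    def mass(self) -> float:
        """Synthetic aliveness, Equation (1)."""
        return self.base_mass + self.history_mass / 2 + self.current_mass / 2
```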


Figure 2: The semantic relating rule (states 1-4 for the word-particles Bomb and Tank).

3.2. Semantic Relation Building. Based on the word-particle model, we define the semantic relating rule to control the motion of particles within a given text context. The training documents play the role of a driving-force source, giving words that appear in similar contexts more opportunities to come together. The procedure is as follows.

Step 1. The word embedding is trained document by document. Each document is sampled sentence by sentence via a 2-gram window. In the 2-gram window, the first word is taken as the target object and the second word as the associated object, meaning that the associated word-particle will be forced to move towards the target word-particle. The associated word-particle is given an impulse, which drives it with a certain velocity. The progress is illustrated as state 1 in Figure 2: the word-particle bomb is associated with tank, moving with velocity v. A sampling sketch is given below.
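As a hedged illustration of the sampling in Step 1, the sketch below slides a 2-gram window over a sentence and emits (target, associated) pairs; the whitespace tokenizer is an assumption.

```python
def sample_2grams(sentence: str):
    """Yield (target, associated) word pairs from one sentence (Step 1)."""
    tokens = sentence.split()               # simplistic whitespace tokenization
    for i in range(len(tokens) - 1):
        yield tokens[i], tokens[i + 1]      # first word = target, second = associated

# Each pair gives the associated word-particle an impulse toward the target.
for target, associated in sample_2grams("the bomb hit the tank"):
    print(associated, "->", target)
```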

Step 2. Given an impulse, the word-particle is initialized with a velocity. It is then slowed down by the friction force until its velocity reduces to zero, provided it does not enter the repulsion radius of its objective word-particle. When the word-particle moves into the repulsion radius of the objective word-particle, it is stopped at the edge, keeping a distance of the repulsion radius from the objective word-particle. This is shown as state 2: a velocity affects the word-particle tank, and the word-particle bomb is affected continuously by the friction force f.

Step 3. During a certain period of document learning, some word-particles will set up relations with other word-particles. We establish a chain-reacting rule to simulate the context feature. The rule specifies that the impulse is transmitted particle by particle, and the initial energy degrades at each reaction. This passive action simulates the phenomenon that a topic word in a document carries more semantics and can serve as an index for document retrieval. The progress is controlled by (2). Let $m_0$ denote the property Mass of the impacted word-particle and $m_i$ denote this property of the other word-particles. The relation-building condition is

\[ i \in \text{Chain}, \qquad d_i > \text{Pull\_Radius}, \tag{2} \]

where $i$ denotes the ID of the $i$th word-particle related to the object word-particle, and $d_i$ denotes the distance between the $i$th word-particle and the object word-particle. The velocity $v_{t-1}$ is updated to $v_t$ via (3) if the word-particle satisfies the condition. This procedure repeats iteratively until the velocity falls to zero. For example, in state 3, the word-particle bomb has two backward associated particles, fire and plane. Its initial velocity will be decomposed with plane according to (3) if it is given an impulse towards tank. But the word-particle fire fails to move, because it is outside the Pull_Radius distance of bomb. The decomposition is delivered repeatedly as long as the velocity satisfies the condition and is greater than zero:

\[ m_0 v_{t-1} = \Bigl( \sum_{i \in \text{Chain},\, d_i > \text{Pull\_Radius}}^{k} m_i + m_0 \Bigr) v_t. \tag{3} \]

Step 4. We add a repulsion radius to keep every word-particle unique, because each word embedding is unique. When a moving word-particle intrudes into the repulsion radius of other particles, it stops and stays at the edge of the affected word-particles, keeping a distance of the repulsion radius. The progress is shown as state 4. Generally, the word relation network is expected to grow stably; we present an inspection criterion to check the convergence, which is as follows:

\[ v_t = \frac{m_0 v_{t-1}}{\lim_{k \to \infty} \sum_{i \in \text{Chain},\, d_i > \text{Pull\_Radius}}^{k} m_i + m_0} \longrightarrow 0. \tag{4} \]

In (4), the initial velocity tends to zero as the number of associated word-particles increases, that is, $v_t \to 0$. When the movement of an activated particle becomes tiny, such convergence means the property Pos has reached relatively fixed coordinates. This indicates that the word is already situated in a relatively stable semantic network. A sketch of this update and convergence test is given below.
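A minimal sketch of the velocity sharing in Step 3 and the convergence test in Step 4, assuming scalar speeds for brevity; related_masses stands for the masses of the backward related particles that satisfy the Pull_Radius condition in (2), and the threshold eps is an assumption.

```python
def shared_velocity(m0, v_prev, related_masses):
    """Equation (3): the impacted particle's velocity v_{t-1} is shared with
    its backward related particles, so the resulting velocity v_t shrinks as
    more related mass accumulates."""
    return m0 * v_prev / (m0 + sum(related_masses))

def has_converged(m0, v_prev, related_masses, eps=1e-3):
    """Equation (4): as the number of related particles grows, v_t tends to 0,
    which we read as the particle having settled into a stable position."""
    return shared_velocity(m0, v_prev, related_masses) < eps

# Example: a particle of mass 2.0 moving at speed 2, with three related particles.
print(shared_velocity(2.0, 2.0, [1.0, 3.0, 5.0]))   # 0.3636...
```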

3.3. Semantic Relation Separating. To control the growth of word associations, words that have low frequency in a 2-gram context should be filtered out, whereas words with a high frequency of relations should be retained. We propose to use linear elastic fracture mechanics to control such filtering. A rope model is presented to represent the coword relation, which can be treated as a flaw in a type of material. The strengthening or weakening of a relation


Figure 3: The flaw model: a relation of width W between the word-particles Bomb and Tank contains a flaw of size 2a under the pull force s.

between words is controlled via the corresponding flaw size. An illustration is shown in Figure 3.

More formally, let $W$ denote the width of the rope; its value is obtained by counting a 2-gram sampling, and its maximum value corresponds to the property Max_flaw. Let $2a$ denote the size of a flaw, corresponding to the property Flaw. Let $s$ be the pull force, calculated as $s = \text{Mass} \times \text{Velocity}$. We use the stress-intensity factor $K$ to control the size of a flaw. $K$ is obtained as follows:

\[ K = m_{\text{relation}} \, v \, \sqrt{\pi a} \, \frac{2a}{W}. \tag{5} \]

In (5), the variable $m_{\text{relation}}$ corresponds to the property describing the synthetic occurrence frequency of a word-particle, and $v$ corresponds to the velocity of an activated word-particle. The value of $K$ is proportional to the size of $2a$, which refers to the concept of a flaw. Moreover, the flaw extension speed is denoted by $da/dN$ as

\[ \lg \frac{da}{dN} = \lg C + n \lg \Delta K. \tag{6} \]

In (6), $\lg C$ denotes a compensation constant and $n$ is a scale factor; $da/dN$ is proportional to $K$. The condition is that $K$ will decrease if $W$ goes beyond $2a$. When the size of the flaw reaches $W$, a separation happens in the semantic relation. This means the associated word-particles are no longer affected by the initial impulses generated from their objective word-particles. A sketch of this separating rule is given below.
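The following sketch shows how (5) and (6) could drive relation separation under the reading above, namely that the relation breaks once the flaw size 2a reaches the rope width W; the one-cycle growth step and the function names are assumptions, not code from the paper.

```python
import math

def stress_intensity(m_relation: float, v: float, a: float, W: float) -> float:
    """Equation (5): K = m_relation * v * sqrt(pi * a) * (2a / W)."""
    return m_relation * v * math.sqrt(math.pi * a) * (2 * a / W)

def grow_flaw(a: float, delta_K: float, lgC: float = 1.0, n: float = 1.0) -> float:
    """Equation (6), Paris-law style: lg(da/dN) = lgC + n * lg(delta_K);
    apply one loading cycle N -> N + 1 (assumed discrete step)."""
    da_dN = 10 ** (lgC + n * math.log10(max(delta_K, 1e-12)))
    return a + da_dN

def is_separated(a: float, W: float) -> bool:
    """The semantic relation breaks down once the flaw spans the rope width."""
    return 2 * a >= W
```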

4. Simulations

In this section, we compare the proposed word embedding with three classic NLM-based word embeddings: Huang 2012 [42], C&W [3], and M&S-Turian [20]. Huang 2012 is a word embedding that uses multiple embeddings per word, C&W is the embedding trained by a feedforward neural network, and M&S-Turian is the embedding obtained by an improved RBM training model. The two datasets Reuters 21578 and RCV1 are used for training and evaluation. For Reuters 21578, we extracted 21,578 documents from the raw XML format and discarded the original class labels and titles, using only the description section. The RCV1 subset contains 5,000 documents written by 50 authors. A random sample of 70 percent of these documents was used to train the word embedding, and the remainder was used to evaluate the NLP tasks of POS, NER, and SRL against the other three types of embedding. All words keep their original forms, while numbers, symbols, and stop words are kept and trained together. Words that are not included in the training corpus are discarded. We regard these tasks as classification problems and apply a unified feedforward neural linguistic model to perform them. The compared word embeddings are ready-made, but the parameters of the neural networks have to be trained with the corresponding embedding. We use the benchmark provided by [3]; the results on the NLP tasks are compared based on this benchmark.

4.1. Evaluation. The benchmark is measured in terms of precision, recall, and $F_1$ [3, 20, 42]. Assume $N$ words are waiting to be labeled; there exist $N_1$ ($N_1 \le N$) words that are labeled correctly and $N_2$ ($N_2 \le N$) words that are labeled wrongly. Precision is used to evaluate the accuracy of labeling on POS, NER, and SRL and is obtained as

\[ p = \frac{N_1}{N_1 + N_2}. \tag{7} \]

Recall is used to evaluate the coverage of labeling on POS, NER, and SRL and is calculated as

\[ r = \frac{N_1}{N}. \tag{8} \]

The $F_1$ score is a combined evaluation of precision and recall, given as

\[ F_1 = \frac{2pr}{p + r}. \tag{9} \]
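For completeness, the three scores in (7)-(9) reduce to a few lines; here N1 is the number of correctly labeled words, N2 the number of incorrectly labeled words, and N the total number of words to label.

```python
def precision(n1: int, n2: int) -> float:
    """Equation (7): fraction of emitted labels that are correct."""
    return n1 / (n1 + n2)

def recall(n1: int, n_total: int) -> float:
    """Equation (8): fraction of all words that are labeled correctly."""
    return n1 / n_total

def f1(p: float, r: float) -> float:
    """Equation (9): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Example with the force-directed POS result on Reuters 21578 (Table 2):
print(round(f1(0.925, 0.895), 3))   # 0.91
```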

4.2. Training. The parameters of the physical system are set as follows: the coefficient of friction is set to 0.1, the coefficient of gravity is set to 9.8, and the initial velocity is set to 2. The parameters that control semantic separating are set such that Max_flaw is 0.2, the initial value of Flaw is 0, Radius is 1, and Pull_Radius is 20. For controlling the flaw extension speed, $\lg C$ is set to 1 and $n$ is set to 1 (these settings are collected in the configuration sketch below). We demonstrate the training result in terms of the generating procedure of the word graph and the average speed of word-particles. A word graph gives an intuitive visualization for observing a group of word relations. We first test a small dataset to simulate the word embedding generation; the result is shown in Figure 4, which contains the names of 19 countries.
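The physical and fracture parameters reported above can be collected into a small configuration sketch (the dictionary and its key names are our own illustration, not code from the paper):

```python
TRAINING_CONFIG = {
    "friction_coefficient": 0.1,   # slows a moving word-particle down
    "gravity_coefficient": 9.8,    # used by the friction force
    "initial_velocity": 2.0,       # impulse given per 2-gram observation
    "max_flaw": 0.2,               # Max_flaw: rope width W at which a relation can break
    "initial_flaw": 0.0,           # starting flaw size 2a
    "radius": 1.0,                 # repulsion radius
    "pull_radius": 20.0,           # tension-start radius
    "lgC": 1.0,                    # compensation constant in Equation (6)
    "n": 1.0,                      # scale factor in Equation (6)
}
```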

Figure 4: Force-directed embedding for country names; panels (a)-(f) show successive stages of the training.

In Figure 4(a), all the words appear in the plain physical system and obtain certain positions, because the training documents have exerted forces that direct the word arrangement to follow the context of the original text. At this stage, however, some word-particles still move with a certain speed. The frequent word-particles, such as China, USA, Germany, and France, have relatively high speeds; they move, pulling their backward related word-particles (Figures 4(b)-4(d)). For example, China has four backward related word-particles: Pakistan, India, USA, and UK. Germany has two

backward related word-particles: France and Finland. The other, isolated word-particles settle in relatively stable positions with few movements. We can see that some word-particles overlay each other (Figure 4(d)); for example, India and Pakistan are too close to China, and Canada overlays USA. The repulsing force starts to function at this time: the too-close word-particles push each other until they reach a balanced distance (Figure 4(e)). When the input documents are all about similar topics, the positions of the word-particles do not vary too much, showing a relatively stable topological graph (Figure 4(f)).

The training result for the dataset Reuters 21578 is shown in Figure 5. Each word-particle is colored with a green block. The intensity of the relation between word-particles is represented by a blue line, where thicker lines mean a higher frequency of the coword relation. The position of each word-particle is a 2-dimensional coordinate, which is exactly the training result, the word embedding. The result shows that the numbers of word relations and new word-particles grow with the training, which iterates from 1000 to 10000 documents. The particle system expands its region outward gradually, and the particle distribution takes the shape of an ellipse to accommodate more new word-particles.

The training result for RCV1 is shown in Figure 6. Such a distance-based semantic measurement can be interpreted from several viewpoints. For example, the country and geographical word-particles German, Russian, US, and Beijing are clustered together. The geo-related word-particles Niagara, WTO, and US-based are pushed close to these words. Such a word-particle graph presents an intuitive result for evaluating the training of the word embedding; whether during or after training, we can intervene on the position of a word-particle to improve the final result in a data-visualization-based way.

On the other hand, we use (3) to estimate the average velocity of word-particles, to evaluate the training process from a statistical viewpoint. When the velocity decreases below 1, convergence is assumed to be happening.


Figure 5: Force-directed embedding for Reuters 21578 (snapshots after 1000, 5000, and 10000 documents).

This assumption roughly coincides with reality. We run the experiment 50 times for each of the two datasets. The result is shown as the boxplots in Figure 7. From the downward trends of the average velocity, the assumption that a word-semantic network stabilizes after a certain number of similar documents roughly coincides with the results on the two datasets. The convergence of training on both datasets appears at the stage around the 20000th document.

In existing word embedding learning, there is no specific convergence criterion for terminating the training, because the objective functions of these neural-based models are nonconvex, so an optimum value may not exist theoretically. Empirically, these word embedding learning methods require repeating the documents 20 or 30 times [3]. This brings a serious problem: time consumption is proportional to the number of documents, and such a procedure usually requires a long training time [3, 30]. In our proposed model, by contrast, we demonstrate that setting a semantic convergence condition is more convenient than in those neural-based approaches. The convergence criterion provides a more explicit direction for word embedding learning. Meanwhile, the result demonstrates that learning an acceptable word embedding requires a certain number of documents; small or medium scale datasets may not be appropriate.

5. Reuters 21578 and RCV1

To test usability, we compare our word embedding with the other three word embeddings on the NLP tasks, using the same type of neural linguistic model. The tests are performed on POS, NER, and SRL. The results are listed in Tables 2 and 3. On Reuters 21578, the labeling system using our word embedding obtains 91.0 on POS, 83.4 on NER, and 67.3 on SRL for $F_1$. This $F_1$ score takes third place in POS and second place in both NER and SRL.


Figure 6: Force-directed embedding for RCV1.

On RCV1, it achieves 89.1 on POS, 82.1 on NER, and 65.9 on SRL for $F_1$. These $F_1$ scores take second place in POS, third place in NER, and second place in SRL.

The performance of the proposed word embedding is close to the best results in [3], but its dimensionality is two, which is far less than that of the 50- or 100-dimensional word embeddings [3, 43]. This brings the benefit of reducing the number of neural cells needed to perform NLP tasks with this type of linguistic model. To implement these NLP tasks, we construct a 3-layer feedforward neural network with a 5-cell input window, a 100-cell middle layer, and a 25-cell output layer. To utilize the compared word embeddings, the number of input values is set to 500, because all of them are 100-dimensional embeddings, whereas our corresponding input layer requires only a 10-dimensional vector. The structure of the model is thus simplified.
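A hedged sketch of the labeling network described above: five context words are concatenated into one input vector (5 x 2 = 10 values with the proposed embedding versus 5 x 100 = 500 with the compared ones), followed by a 100-unit hidden layer and a 25-unit output layer. The layer sizes come from the text; the tanh activation, random initialization, and NumPy-only forward pass are assumptions for illustration.

```python
import numpy as np

def forward(window_embeddings, W1, b1, W2, b2):
    """One forward pass of the 3-layer tagger.
    window_embeddings: list of 5 word vectors (2-D here, 100-D for the baselines)."""
    x = np.concatenate(window_embeddings)        # 5 * dim input values
    h = np.tanh(W1 @ x + b1)                     # 100 hidden units (activation assumed)
    scores = W2 @ h + b2                         # 25 output units (tag scores)
    return scores

# Shapes for the proposed 2-D embedding: W1 is (100, 10), W2 is (25, 100).
dim = 2
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(100, 5 * dim)), np.zeros(100)
W2, b2 = rng.normal(size=(25, 100)), np.zeros(25)
window = [rng.normal(size=dim) for _ in range(5)]
print(forward(window, W1, b1, W2, b2).shape)     # (25,)
```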


Figure 7: The y-axis represents the average velocity of word-particles and the x-axis the number of documents, for (a) Reuters 21578 and (b) RCV1. Both show downward trends.

Table 2: Comparison of POS/NER/SRL on Reuters 21578 (each cell lists POS/NER/SRL).

                 Precision         Recall            F1
Force-directed   92.5/84.1/70.2    89.5/82.7/64.6    91.0/83.4/67.3
Huang 2012       94.2/86.2/74.8    93.8/86.8/74.1    94.0/86.5/74.4
C&W              93.6/82.4/67.8    92.8/81.5/65.8    93.2/81.9/66.8
M&S-Turian       91.4/82.1/63.6    86.2/75.2/62.5    88.7/78.5/63.0

Table 3: Comparison of POS/NER/SRL on RCV1 (each cell lists POS/NER/SRL).

                 Precision         Recall            F1
Force-directed   90.1/82.7/66.3    88.2/81.5/65.5    89.1/82.1/65.9
Huang 2012       88.4/83.2/71.3    90.7/84.6/70.3    89.5/83.9/70.8
C&W              88.5/83.5/64.6    85.2/82.6/63.1    86.8/83.0/63.8
M&S-Turian       90.8/80.5/63.6    90.3/73.9/65.7    90.5/77.1/64.6

This simplification reduces the complexity of the neural network, which improves the performance of such models, for example, by reducing training time and increasing labeling speed. The result also demonstrates that learning a group of word embeddings need not be high dimensional or depend on neural network based approaches. It means that word representation learning and the construction of the task system can be decomposed into two individual parts; the two-step framework can achieve the same goal as the all-in-one models [3].

6. Conclusions and Future Work

In this paper, we propose a force-directed method that uses a fracture mechanics model to learn word embeddings. The result demonstrates that the physical simulation approach is feasible. It improves on the procedure of the traditional NLM-based approaches in terms of parameter training and task performance (POS, NER, and SRL). The next steps are as follows: the model will be improved to suit streaming data, using a one-step solution for predicting the coordinates of a word-particle, which will improve the performance of our system; and the properties of word-particles will be packaged in the GEXF file format (Graph Exchange XML Format), which provides the capability of data sharing across multiple data visualization tools, for example, Gephi.

Competing Interests

The authors declare that there is no conflict of interest regarding the publication of this manuscript.

References

[1] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, "A neural probabilistic language model," Journal of Machine Learning Research, vol. 3, pp. 1137-1155, 2003.
[2] S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning, "A large annotated corpus for learning natural language inference," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '15), 2015.
[3] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, "Natural language processing (almost) from scratch," Journal of Machine Learning Research, vol. 12, pp. 2493-2537, 2011.
[4] J. Turian, L. Ratinov, Y. Bengio, and D. Roth, "A preliminary evaluation of word representations for named-entity recognition," in Proceedings of the NIPS Workshop on Grammar Induction, Representation of Language and Language Learning, 2009.
[5] S. R. Bowman, C. Potts, and C. D. Manning, "Learning distributed word representations for natural logic reasoning," in Proceedings of the AAAI Spring Symposium on Knowledge Representation and Reasoning, Stanford, Calif, USA, March 2015.
[6] B. Fortuna, M. Grobelnik, and D. Mladenic, "Visualization of text document corpus," Informatica, vol. 29, no. 4, pp. 497-502, 2005.
[7] F. Morin and Y. Bengio, "Hierarchical probabilistic neural network language model," in Proceedings of the International Workshop on Artificial Intelligence and Statistics (AISTATS '05), vol. 5, pp. 246-252, Barbados, Caribbean, 2005.
[8] R. Navigli and S. P. Ponzetto, "BabelNet: building a very large multilingual semantic network," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 216-225, 2010.
[9] P. Wang, Y. Qian, F. K. Soong, L. He, and H. Zhao, "Learning distributed word representations for bidirectional LSTM recurrent neural network," in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL '16), San Diego, Calif, USA, June 2016.
[10] A. Mnih and G. Hinton, "Three new graphical models for statistical language modelling," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 641-648, June 2007.
[11] Z. Li, H. Zhao, C. Pang, L. Wang, and H. Wang, A Constituent Syntactic Parse Tree Based Discourse Parser, CoNLL-2016 Shared Task, Berlin, Germany, 2016.
[12] Z. Zhang, H. Zhao, and L. Qin, "Probabilistic graph-based dependency parsing with convolutional neural network," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL '16), pp. 1382-1392, Association for Computational Linguistics, Berlin, Germany, August 2016.
[13] R. Wang, M. Utiyama, I. Goto, E. Sumita, H. Zhao, and B.-L. Lu, "Converting continuous-space language models into N-gram language models with efficient bilingual pruning for statistical machine translation," ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 15, no. 3, pp. 1-26, 2016.
[14] G. Mesnil, A. Bordes, J. Weston, G. Chechik, and Y. Bengio, "Learning semantic representations of objects and their parts," Machine Learning, vol. 94, no. 2, pp. 281-301, 2014.
[15] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, 2013.
[16] G. Salton, A. Wong, and C. S. Yang, "A vector space model for automatic indexing," Communications of the ACM, vol. 18, no. 11, pp. 613-620, 1975.
[17] E. D'hondt, S. Verberne, C. Koster, and L. Boves, "Text representations for patent classification," Computational Linguistics, vol. 39, no. 3, pp. 755-775, 2013.
[18] C. Blake and W. Pratt, "Better rules, fewer features: a semantic approach to selecting features from text," in Proceedings of the IEEE International Conference on Data Mining (ICDM '01), pp. 59-66, IEEE, San Jose, Calif, USA, 2001.
[19] M. Mitra, C. Buckley, A. Singhal, and C. Cardie, "An analysis of statistical and syntactic phrases," in Proceedings of the 5th International Conference on Computer-Assisted Information Searching on Internet (RIAO '97), pp. 200-214, Montreal, Canada, 1997.
[20] J. Turian, L. Ratinov, and Y. Bengio, "Word representations: a simple and general method for semi-supervised learning," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 384-394, July 2010.
[21] M. Sahlgren, "Vector-based semantic analysis: representing word meanings based on random labels," in Proceedings of the Semantic Knowledge Acquisition and Categorisation Workshop at the European Summer School in Logic, Language and Information (ESSLLI XIII '01), Helsinki, Finland, August 2001.
[22] M. Sahlgren, The Word-Space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations between Words in High-Dimensional Vector Spaces, Stockholm University, Stockholm, Sweden, 2006.
[23] P. Cimiano, A. Hotho, and S. Staab, "Learning concept hierarchies from text corpora using formal concept analysis," Journal of Artificial Intelligence Research, vol. 24, no. 1, pp. 305-339, 2005.
[24] A. Kehagias, V. Petridis, V. G. Kaburlasos, and P. Fragkou, "A comparison of word- and sense-based text categorization using several classification algorithms," Journal of Intelligent Information Systems, vol. 21, no. 3, pp. 227-247, 2003.
[25] M. Rajman and R. Besancon, "Stochastic distributional models for textual information retrieval," in Proceedings of the 9th Conference on Applied Stochastic Models and Data Analysis (ASMDA '99), pp. 80-85, 1999.
[26] G. E. Hinton, "Learning distributed representations of concepts," in Proceedings of the 8th Annual Conference of the Cognitive Science Society, pp. 1-12, 1986.
[27] H. Ritter and T. Kohonen, "Self-organizing semantic maps," Biological Cybernetics, vol. 61, no. 4, pp. 241-254, 1989.
[28] T. Honkela, V. Pulkki, and T. Kohonen, "Contextual relations of words in Grimm tales, analyzed by self-organizing map," in Proceedings of Hybrid Neural Systems, 1995.
[29] T. Honkela, "Self-organizing maps of words for natural language processing applications," in Proceedings of the International ICSC Symposium on Soft Computing, 1997.
[30] T. Mikolov, S. W. Yih, and G. Zweig, "Linguistic regularities in continuous space word representations," in Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT '13), 2013.
[31] W. Xu and A. Rudnicky, "Can artificial neural networks learn language models?" in Proceedings of the International Conference on Statistical Language Processing, pp. 1-13, 2000.
[32] M. I. Mandel, R. Pascanu, D. Eck, et al., "Contextual tag inference," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7S, no. 1, 2011.
[33] A. Bordes, X. Glorot, J. Weston, and Y. Bengio, "Joint learning of words and meaning representations for open-text semantic parsing," Journal of Machine Learning Research, vol. 22, pp. 127-135, 2012.
[34] F. Huang, A. Ahuja, D. Downey, Y. Yang, Y. Guo, and A. Yates, "Learning representations for weakly supervised natural language processing tasks," Computational Linguistics, vol. 40, no. 1, pp. 85-120, 2014.
[35] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179-211, 1990.
[36] A. Mnih and G. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS '08), pp. 1081-1088, December 2008.
[37] G. Mesnil, Y. Dauphin, X. Glorot, et al., "Unsupervised and transfer learning challenge: a deep learning approach," Journal of Machine Learning Research, vol. 27, no. 1, pp. 97-110, 2012.
[38] S. G. Kobourov, "Spring embedders and force directed graph drawing algorithms," in Proceedings of the ACM Symposium on Computational Geometry, Chapel Hill, NC, USA, June 2012.
[39] M. J. Bannister, D. Eppstein, M. T. Goodrich, and L. Trott, "Force-directed graph drawing using social gravity and scaling," in Graph Drawing, W. Didimo and M. Patrignani, Eds., vol. 7704 of Lecture Notes in Computer Science, pp. 414-425, 2013.
[40] A. Efrat, D. Forrester, A. Iyer, S. G. Kobourov, C. Erten, and O. Kilic, "Force-directed approaches to sensor localization," ACM Transactions on Sensor Networks, vol. 7, no. 3, article 27, 2010.
[41] T. Chan, J. Cong, and K. Sze, "Multilevel generalized force-directed method for circuit placement," in Proceedings of the International Symposium on Physical Design (ISPD '05), pp. 185-192, Santa Rosa, Calif, USA, April 2005.
[42] E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng, "Improving word representations via global context and multiple word prototypes," in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL '12), pp. 873-882, July 2012.
[43] T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur, "Recurrent neural network based language model," in Proceedings of INTERSPEECH, 2010.


Page 2: Research Article Fracture Mechanics Method for Word

2 Computational Intelligence and Neuroscience

recurrent neural network (RNN) [8] and restricted Boltz-mann machine (RBM) [10] have also suffered from the highlearning complexity [11] sophisticated preferences [12] andthe curse of dimensionality [13]

In this article we present a novel method for wordembedding learning that can reduce the high dimensionalitywhich is inherent in the traditional NLMWe assume that thegeneration of word embedding can be viewed as a particlemovement on a plane Particles that are close represent thecorresponding words have the similar meaning whereas theparticles that are distant represent the corresponding wordsare far away inmeaning For simulating text semantics as cor-rectly as possible a fracturemechanicsmodel is presented forcontrolling the generating process of word embedding Weaim to provide an effective intuitive approach for learning a2-dimensional word vector which is applicable to the generalnatural language tasks In particular we omit the homonymyand polysemy for keeping the consistency of word represen-tation With this context each word corresponds to a singlevectorThe generatingmodel that is based onneural networksis substituted by a particle system which can simulate thecorrelating process of semanticity between words

Our specific contributions are as follows

(i) We propose a force-directed model that is based onfracture mechanics to generate word embedding Alinear elastic fracture model is introduced to controlthe varying progress of word semantic

(ii) We use a language model that is based on the feedfor-ward NLM to experiment with the word embeddingon the task of POS andNER and SRL where the wordembedding is the input of the language model

(iii) The coordinates of the 2-dimensional word embed-ding can be used for word visualization that facilitatesobserving the degree of relation among words intu-itively

The next section describes the related work regarding wordnumerical representation Section 3 introduces our method-ology Sections 4 and 5 give the result and discussionSection 6 gives a conclusion and some possible works infuture

2 Background

21 Text Representation by Word Embedding Choosing anappropriate data representation can facilitate the performingof a machine learning algorithm The related methods havedeveloped at the level of automatic features selection accord-ing to the requirement of applications [14] As a branch ofmachine learning representation learning [15] has graduallybecome active in some famous communities especially toinvestigate knowledge extraction from some raw datasetsText representation in natural linguistic tasks can be dividedinto the three levels of corpus [16 17] paragraph [18 19]and word [20ndash22] This paper focuses on the representationlearning for words Feature words and context have beenconsidered as the foundation for text representation learningand the constructing of a NLP system [23ndash25] We follow

this direction aiming at mapping word text to a vector spaceCompared with the representing level of paragraph andcorpus word is the fewer granularities of semantics that ismore suitable to be analyzed by a vector distance

The idea that uses a vector space to map word meaningis proposed from [26] initially The earlier representationlearnings fail to consider the semantic measurement for thedifferences between words and emphasize how to quantizefeature words For example in [27] to represent the wordldquodoverdquo the first byte of the corresponding representationdenotes the property of its shape that is set to ldquo1rdquo if the dovesatisfies some conditionsThe representation learning did notfocus on the context feature but presents some approachesfor measuring the differences regarding word semantic Theself-organizing map (SOM) [27] is employed to computeword vector distance which uses the length to represent theneighboring degree of word meanings Based on SOM [28]high frequency cowords are mapped to a 90-dimensionalvector space The investigation [29 30] for SOM-based wordvector applies it in the fields of NLP and data mining

22 Generation of Word Embedding Word representationhas started to integrate some measuring approaches fortext semantic with the developing of neural networks inprobabilistic linguistic model Reference [31] proposed touse neural network to build a language model There exitsan assumption that the numerical result of those similarmeaning words should be put closely whereas the resultregarding those distant meaning words should be put faraway Meanwhile the probability function is introduced toadd the probability output for NLM which can give a statis-tics result for estimating a given 119899-gram words combinationIn the proposedmodels a convoluted neural network [3] thatis tuned by hand somewhat after learning obtains an accuracyof 9720 in POS 9365 in chunking 8867 in NER and7415 in SRL A comparison between conditional restrictedBoltzmann machine models and support vector machines[32] is performed for a music annotator system which isbased on the context around lyric that can use cooccurrenceand sequential features to improve the accuracy of labelingThe SENNA software [3 33] performs a word sense disam-biguation with an accuracy of 7235Word representation isdenoted by multiple vectors [4] to express the polysemy thatis tested onNERwhich shows that it rivals the traditional lin-guistic models In addition word representation learning hasbeen trained based on the 119899-gram models hidden Markovmodels and partial lattice Markov random field model [34]

For representation learning by NLM [1] proposes a feed-forward neural network to train the word embedding whichis regarded as the internal parameters requiring the tuningfollowing the object function Reference [35] presents a textsequence learningmodel that is called RNN which is capableof capturing local context feature in sequence of interestFollowing this route more machine learning algorithms areintroduced to improve the weaknesses of natural linguistictasks Reference [10] uses the restricted Boltzmann machineto improve the performance of the sequential probabilityfunction Reference [36] creates a hierarchical learning that

Computational Intelligence and Neuroscience 3

Table 1 Properties of word-particle

Name NotesID IdentifierPos Coordinates (word embedding)Word Corresponding wordMass Synthetic alivenessTmpMass All backward related particle alivenesshistory Mass Historical alivenessCurrent Mass Current alivenessBase Mass IdentifierChain Backward related indexMax flaw Backward related degreeFlaw Current flaw lengthRadius Repulsion radiusPull Radius Tension-start radiusVelocity Current velocity of particle

represents the semantic relation among words as a tree struc-ture The deep learning mechanism is tried to build a NLM[37]

23 Force Simulated Method The force-directed algorithmsare mainly applied in data visualization Reference [38] com-pares several force-directed algorithms Reference [39] usesthesemethods for analyzing a complex social network whichadds a gravity to draw the graph of social network In somecross-domain applications wireless sensors network uses it tobuild layouts [40] and [41] performs electrical circuit layoutsautomatically based on the force rules

The reason why we use the force simulated approachto improve the generation of word embedding is that therelation between words semantic is relatively stable within acertain numbers of documents that depict the similar topicThis is somewhat like the convergence of the stable energystatus in the force-directed algorithm We are inspired it touse the idea of the force-related methods to improve theproblems of NLM

3 Methodology

31 Word-Particle Model To map word into a particlessystem we must define a mapping rule that specifies whichattribute of particle corresponds to which feature of wordsemantics The linking table is shown in Table 1 The tableconsists of two parts where the left part designates the namesof each property for particles and the right part gives theexplanation for the corresponding semantic feature

We explain the properties one by one:

(i) ID: each word has a unique identifier that links it to the corresponding particle.

(ii) Pos: the attribute we want to train, also called the word embedding; it denotes the coordinate of a particle in the plane.

Figure 1: An example of the word-particle model.

(iii) Mass: the particle mass denotes the occurrence frequency of a word.

(iv) TmpMass: the temporary mass of a particle represents the coword frequency.

(v) Current_Mass & history_Mass & base_Mass: they represent the current mass, historical mass, and basic mass of a particle, respectively. A function combining them, used to control the occurrence frequency of cowords within a sampling period, is

\[ \text{Mass} = \text{base\_Mass} + \frac{\text{history\_Mass}}{2} + \frac{\text{Current\_Mass}}{2}. \tag{1} \]

To control the intensity of relevance between words, we borrow the physical concept of an edge to describe it.

(vi) Chain: an array recording the IDs of the backward-related words.

(vii) Max_flaw: the maximum associating strength for a group of cowords.

(viii) Flaw: the current associating strength (flaw length) for a group of cowords.

(ix) Radius: the repulsion radius that keeps a minimum distance from other word-particles.

(x) Pull_Radius: the pulling radius; if other word-particles break in, they are pushed away so as to keep this radius distance from the intruded word-particle.

(xi) Velocity: it defines a semantic shifting trend that can strengthen the relation between two force-affected words.

In Figure 1, the word-particles world and boy are backward related to hello, and the word-particle bye is an isolated word-particle. The intensity of the relation between hello and world is stronger than that between hello and boy. The directed line denotes the direction of word order, where the pointed-to word-particles can drive their linked word-particles. For example, the coword hello world appears more frequently than the coword hello boy.
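To make Table 1 concrete, the sketch below shows one possible in-memory representation of a word-particle. The field names follow the table and the numeric defaults mirror the parameter settings reported later in Section 4.2; the class itself, its helper method, and the Python types are our own illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class WordParticle:
    """Illustrative container for the word-particle properties of Table 1."""
    id: int                                      # unique identifier of the word
    word: str                                    # corresponding word
    pos: Tuple[float, float] = (0.0, 0.0)        # 2-D coordinates = the word embedding
    base_mass: float = 1.0                       # basic aliveness
    history_mass: float = 0.0                    # historical aliveness
    current_mass: float = 0.0                    # current aliveness
    tmp_mass: float = 0.0                        # aliveness of all backward-related particles
    chain: List[int] = field(default_factory=list)  # IDs of backward-related particles
    max_flaw: float = 0.2                        # Max_flaw (maximum associating strength)
    flaw: float = 0.0                            # current flaw length
    radius: float = 1.0                          # repulsion radius
    pull_radius: float = 20.0                    # tension-start radius
    velocity: Tuple[float, float] = (0.0, 0.0)   # current velocity

    def mass(self) -> float:
        # Synthetic aliveness, following (1):
        # Mass = base_Mass + history_Mass/2 + Current_Mass/2
        return self.base_mass + self.history_mass / 2 + self.current_mass / 2
```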


Figure 2: The semantic relating rule (states 1-4).

3.2. Semantic Relation Building. Based on the word-particle model, we define the semantic relating rule to control the motion of particles within a given text context. The training documents play the role of a driving-force source, giving words that appear in similar contexts more opportunities to come together. The procedure is as follows.

Step 1. The word embedding is trained document by document. Each document is sampled sentence by sentence through a 2-gram window. In the 2-gram window, the first word is taken as the target object and the second word as the associated object. This means the associated word-particle will be forced to move towards the target word-particle: the associated word-particle is given an impulse, which drives it with a certain velocity. The process is illustrated as state 1 in Figure 2, where the word-particle bomb is associated with tank and moves with velocity v.

Step 2. Given an impulse, the word-particle is initialized with a velocity. Meanwhile, it is slowed down by friction until its velocity reduces to zero, provided it does not reach the repulsion radius of its objective word-particle. If the word-particle moves into the repulsion radius of the objective word-particle, it stops at the edge, keeping a distance of the repulsion radius from the objective word-particle. This is shown as state 2: a velocity affects the word-particle tank, and the word-particle bomb is continuously affected by the friction force f.

Step 3. Over a certain period of document learning, some word-particles build relations with other word-particles. We establish a chain-reaction rule to simulate the context feature. The rule specifies that the impulse is transmitted particle by particle, and the initial energy degrades at each reaction. This passive action simulates the phenomenon that a topic word in a document carries more semantics and can serve as an index for document retrieval. The process is controlled by (2). Let m_0 denote the property Mass of the impacted word-particle and m_i denote the same property of the other word-particles. The relation-building condition is

\[ i \in \text{Chain}, \qquad d_i > \text{Pull\_Radius}, \tag{2} \]

where i denotes the ID of the i-th word-particle related to the object word-particle and d_i denotes the distance between the i-th word-particle and the object word-particle. The velocity v_{t-1} is updated to v_t via (3) if the word-particle satisfies the condition. This procedure repeats iteratively until the velocity falls to zero. For example, in state 3 the word-particle bomb has two backward-associated particles, fire and plane. If bomb is given an impulse towards tank, its initial velocity is decomposed with plane according to (3); the word-particle fire, however, fails to move because it is outside the Pull_Radius distance of bomb. The decomposition is delivered repeatedly as long as the velocity satisfies the condition and is greater than zero:

\[ m_0 v_{t-1} = \Bigl( \sum_{i \in \text{Chain},\, d_i > \text{Pull\_Radius}}^{k} m_i + m_0 \Bigr) v_t. \tag{3} \]

Step 4. We add a repulsion radius to preserve the uniqueness of every word-particle, because each word embedding must be unique. When a moving word-particle intrudes into the repulsion radius of other particles, it stops and stays at the edge of the affected word-particles, keeping a distance of the repulsion radius. The process is shown as state 4. Since the word-relation network is expected to grow stably, we present an inspection criterion to check convergence:

\[ v_t = \frac{m_0 v_{t-1}}{\lim_{k \to \infty} \sum_{i \in \text{Chain},\, d_i > \text{Pull\_Radius}}^{k} m_i + m_0} \longrightarrow 0. \tag{4} \]

In (4), the initial velocity tends to zero as the number of associated word-particles increases, that is, v_t → 0. When the movement of an activated particle becomes tiny, this convergence means that the property Pos has reached relatively fixed coordinates, indicating that the word is already situated in a relatively stable semantic network.
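To make Steps 1-3 concrete, the sketch below pairs words with a 2-gram window and applies the mass-weighted velocity decomposition of (3). It assumes the WordParticle container sketched in Section 3.1 and a naive whitespace tokenizer. Note that (2) and (3) print the distance condition as d_i > Pull_Radius, while the worked example in Step 3 implies sharing among chain members inside the pull radius; this illustrative code follows the example.

```python
import math
from typing import Dict, Iterator, Tuple

def sample_2grams(sentence: str) -> Iterator[Tuple[str, str]]:
    """Step 1: yield (target, associated) pairs from a naive whitespace tokenization."""
    tokens = sentence.lower().split()
    for target, associated in zip(tokens, tokens[1:]):
        yield target, associated          # the associated word is pushed toward the target

def distance(a, b) -> float:
    """Euclidean distance between two word-particles' Pos coordinates."""
    return math.hypot(a.pos[0] - b.pos[0], a.pos[1] - b.pos[1])

def decompose_velocity(impacted, particles: Dict[int, object],
                       v_prev: Tuple[float, float]) -> Tuple[float, float]:
    """Step 3, following (3): m0 * v_{t-1} = (sum of chain masses + m0) * v_t.

    Only backward-related particles that lie within the impacted particle's
    pull radius take part (as in the state-3 example with plane and fire).
    """
    m0 = impacted.mass()
    sharing = [particles[i] for i in impacted.chain
               if distance(impacted, particles[i]) <= impacted.pull_radius]
    total = m0 + sum(p.mass() for p in sharing)
    # The more related mass joins the chain, the smaller v_t becomes,
    # which is the convergence behaviour described by (4).
    return (m0 * v_prev[0] / total, m0 * v_prev[1] / total)
```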

3.3. Semantic Relation Separating. To control the growth of word associations, words that have low frequency in the 2-gram context should be filtered out, whereas words with a high frequency of relations should be retained. We propose to use linear elastic fracture mechanics to control such filtering. A rope model is presented to represent the coword relation, which can be regarded as a flaw in a type of material. The strengthening or weakening of a relation between words is controlled via the corresponding flaw size. An illustration is shown in Figure 3.


Figure 3: The flaw model.


More formally, let W denote the width of the rope; its value is obtained by counting the 2-gram samples, and its maximum corresponds to the property Max_flaw. Let 2a denote the size of a flaw, corresponding to the property Flaw, and let s be the pull force, calculated as s = Mass × Velocity. We use the stress-intensity factor K to control the size of a flaw, obtained as

\[ K = m_{\text{relation}}\, V \sqrt{\pi a}\, \frac{2a}{W}. \tag{5} \]

In (5), the variable m_relation corresponds to the property describing the synthetic occurrence frequency of a word-particle, and V corresponds to the velocity of an activated word-particle. The value of K is proportional to the flaw size 2a. Moreover, the flaw extension speed is denoted by da/dN as

\[ \lg \frac{da}{dN} = \lg C + n \lg \Delta K. \tag{6} \]

In (6), lg C denotes a compensation constant and n is a scale factor; da/dN is proportional to K. Note that K decreases if W goes beyond 2a. When the flaw size reaches W, a separation happens in the semantic relation, which means the associated word-particles are no longer affected by the initial impulses generated from their objective word-particles.
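A minimal sketch of this separation rule follows, under stated assumptions: (5) is read with geometry factor 2a/W, (6) is inverted from its log form, and the default C = 10 corresponds to lg C = 1 as set in Section 4.2. The function names are ours.

```python
import math

def stress_intensity(m_relation: float, velocity: float, a: float, W: float) -> float:
    """Stress-intensity factor of (5): K = m_relation * V * sqrt(pi * a) * (2a / W)."""
    return m_relation * velocity * math.sqrt(math.pi * a) * (2.0 * a / W)

def grow_flaw(a: float, delta_K: float, C: float = 10.0, n: float = 1.0) -> float:
    """One growth step from (6): lg(da/dN) = lg C + n lg dK, i.e. da/dN = C * dK**n."""
    return a + C * (delta_K ** n)

def relation_separated(a: float, W: float) -> bool:
    """The coword relation breaks once the flaw size 2a reaches the rope width W."""
    return 2.0 * a >= W
```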

4. Simulations

In this section, we compare the proposed word embedding with three classic NLM-based word embeddings: Huang 2012 [42], C&W [3], and M&S-Turian [20]. Huang 2012 is the word embedding that uses multiple embeddings per word, C&W is the embedding trained by a feedforward neural network, and M&S-Turian is the embedding obtained by an improved RBM training model. Two datasets, Reuters 21578 and RCV1, are used for training and evaluation. For Reuters 21578, we extracted 21578 documents from the raw XML format and discarded the original class labels and titles, using only the description section. RCV1 contains 5000 documents written by 50 authors. A random sample of 70 percent of these documents was used to train the word embedding, and the remainder was used to evaluate the NLP tasks of POS, NER, and SRL against the other three types of embedding. All words keep their original forms, and numbers, symbols, and stop words are kept and trained together. Words that are not included in the training corpus are discarded. We regard these tasks as a classification problem and apply a unified feedforward neural linguistic model to perform them. The compared word embeddings are ready-made, but the parameters of the neural networks must be trained with the corresponding embedding. We use the benchmark provided by [3]; the results on the NLP tasks are compared on this benchmark.

4.1. Evaluation. The benchmark is measured in terms of precision, recall, and F1 [3, 20, 42]. Assume N words are awaiting labels, of which N1 (N1 ≤ N) words are labeled correctly and N2 (N2 ≤ N) words are labeled wrongly. Precision is used to evaluate the accuracy of labeling on POS, NER, and SRL and is obtained as

\[ p = \frac{N_1}{N_1 + N_2}. \tag{7} \]

Recall is used to evaluate the coverage of labeling on POS, NER, and SRL and is calculated as

\[ r = \frac{N_1}{N}. \tag{8} \]

F1 combines precision and recall as follows:

\[ F_1 = \frac{2pr}{p + r}. \tag{9} \]
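These three measures reduce to a few lines of code; in the sketch below, n1 is the number of correctly labeled words, n2 the number of wrongly labeled words, and n the total number of words awaiting labels. The example values are arbitrary.

```python
def precision(n1: int, n2: int) -> float:
    """(7): fraction of produced labels that are correct."""
    return n1 / (n1 + n2)

def recall(n1: int, n: int) -> float:
    """(8): fraction of all words that received a correct label."""
    return n1 / n

def f1(p: float, r: float) -> float:
    """(9): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Example: 910 correct and 50 wrong labels out of 1000 words (40 left unlabeled).
p, r = precision(910, 50), recall(910, 1000)
print(round(f1(p, r), 3))
```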

4.2. Training. The parameters of the physical system are set as follows. The coefficient of friction is set to 0.1, the coefficient of gravity to 9.8, and the initial velocity to 2. The parameters that control semantic separating are set such that Max_flaw is 0.2, the initial value of Flaw is 0, Radius is 1, and Pull_Radius is 20. To control the flaw extension speed, lg C is set to 1 and n is set to 1. We demonstrate the training result in terms of the word-graph generation procedure and the average speed of the word-particles. A word graph gives an intuitive visualization for observing a group of word relations. We test a small dataset to simulate word embedding generation; the result is shown in Figure 4, which contains the names of 19 countries.
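For reference, the parameter settings just listed can be gathered in one place; the dataclass itself is merely a convenient illustration, not part of the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SimulationConfig:
    friction: float = 0.1        # coefficient of friction
    gravity: float = 9.8         # coefficient of gravity
    initial_velocity: float = 2.0
    max_flaw: float = 0.2        # Max_flaw
    initial_flaw: float = 0.0    # Flaw
    radius: float = 1.0          # repulsion radius
    pull_radius: float = 20.0    # Pull_Radius
    lg_C: float = 1.0            # compensation constant of (6), in log10 form
    n: float = 1.0               # scale factor of (6)

CONFIG = SimulationConfig()
```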

In Figure 4(a), all the words appear in the plain physical system and obtain initial positions, because the training documents have exerted forces that arrange the words following the context of the original text. At this stage, some word-particles still move with a certain speed. Frequent word-particles such as China, USA, Germany, and France have relatively high speeds; they move while pulling their backward-related word-particles (Figures 4(b)-4(d)). For example, China has four backward-related word-particles (Pakistan, India, USA, and UK), and Germany has two backward-related word-particles (France and Finland).

Figure 4: Force-directed embedding for country names (panels (a)-(f)).

The other, isolated word-particles settle in relatively stable positions with few movements. We can see that some word-particles overlay each other (Figure 4(d)); for example, India and Pakistan are too close to China, and Canada overlays USA. The repulsing force starts to act at this point: word-particles that are too close push each other away until they reach a balanced distance (Figure 4(e)). When the input documents are all about similar topics, the positions of the word-particles do not vary much, showing a relatively stable topological graph (Figure 4(f)).

The training result on the Reuters 21578 dataset is shown in Figure 5. Each word-particle is colored as a green block, and the intensity of the relation between word-particles is represented by a blue line, where thicker lines indicate a higher frequency of the coword relation. The position of each word-particle is a 2-dimensional coordinate, which is exactly the trained word embedding. The result shows that the numbers of word relations and new word-particles grow as training iterates from 1000 to 10000 documents. The particle system gradually expands its region outward, and the particle distribution takes the shape of an ellipse to accommodate new word-particles.

The training result on RCV1 is shown in Figure 6. Such distance-based semantic measurement can be interpreted from several viewpoints. For example, the country and geographical word-particles German, Russian, US, and Beijing are clustered together, and the geo-related word-particles Niagara, WTO, and US-based are pushed close to these words. Such a word-particle graph presents an intuitive result for evaluating the training of the word embedding; whether during or after training, we can intervene in the position of a word-particle to improve the final result in a data-visualization-based way.

On the other hand, we use (3) to estimate the average velocity of the word-particles, evaluating the training process from a statistical viewpoint. When the average velocity decreases below 1, convergence is assumed to be happening.
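A small sketch of this stopping rule, assuming the WordParticle container from Section 3.1; the threshold of 1 comes from the text.

```python
import math
from typing import Iterable

def average_speed(particles: Iterable) -> float:
    """Mean speed of all word-particles, the statistic tracked in Figure 7."""
    particles = list(particles)
    return sum(math.hypot(*p.velocity) for p in particles) / len(particles)

def training_converged(particles: Iterable, threshold: float = 1.0) -> bool:
    """Training is assumed converged once the average speed drops below the threshold."""
    return average_speed(particles) < threshold
```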

Figure 5: Force-directed embedding for Reuters 21578 (snapshots after 1000, 5000, and 10000 documents).

This assumption coincides roughly with reality. We ran the experiment 50 times for each of the two datasets; the results are shown as boxplots in Figure 7. From the downward trends of the average velocity, the assumption that a word-semantic network stabilizes within a certain number of similar documents roughly coincides with the results on both datasets. The convergence of training on both datasets appears around the 20000th document.

In NLM-based word embedding learning, there is no specific convergence criterion for terminating training, because the objective functions of these neural-based models are nonconvex, so an optimum may not exist theoretically. Empirically, these word embedding learning methods require repeating the documents 20 or 30 times [3]. This brings a serious problem: time consumption is proportional to the number of documents, and such a procedure usually requires a long training time [3, 30]. In our proposed model, by contrast, we demonstrate that a semantic convergence condition is more convenient to select than in those neural-based approaches. The convergence criterion provides a more explicit direction for word embedding learning. Meanwhile, the result demonstrates that learning an acceptable word embedding requires a certain number of documents; small or medium scale datasets may not be appropriate.

5. Reuters 21578 and RCV1

To test usability, we compare our word embedding with the other three word embeddings on the NLP tasks, using the same type of neural linguistic model. The tests cover POS, NER, and SRL, and the results are listed in Tables 2 and 3. On Reuters 21578, the labeling system using our word embedding obtains an F1 of 91.0% on POS, 83.4% on NER, and 67.3% on SRL. This F1 score ranks third on POS and second on both NER and SRL.

Figure 6: Force-directed embedding for RCV1.

On RCV1, it achieves an F1 of 89.1% on POS, 82.1% on NER, and 65.9% on SRL, ranking second on POS, third on NER, and second on SRL.

The performance of the proposed word embedding is close to the best results in [3], but its dimensionality is two, far less than the 50- or 100-dimensional word embeddings [3, 43]. This brings the benefit of reducing the number of neural cells needed to perform NLP tasks with this type of linguistic model. To implement these NLP tasks, we construct a 3-layer feedforward neural network with a 5-cell input layer, a 100-cell middle layer, and a 25-cell output layer. To utilize the compared word embeddings, the size of the input vector is set to 500, because all of them are 100-dimensional embeddings, whereas our corresponding input layer requires only a 10-dimensional vector. The structure of the model is thus simplified, which reduces the complexity of the neural network.
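The tagging network just described can be sketched as follows, assuming the 5-cell input layer means a 5-word context window whose embeddings are concatenated (so the input size is 10 for the proposed 2-D embedding and 500 for the 100-dimensional baselines); the initialization, activation, and function names are our own choices.

```python
import numpy as np

def build_tagger(embedding_dim: int, window: int = 5,
                 hidden: int = 100, labels: int = 25, seed: int = 0):
    """3-layer feedforward tagger: (window * embedding_dim) -> hidden -> labels."""
    rng = np.random.default_rng(seed)
    d_in = window * embedding_dim            # 10 for 2-D embeddings, 500 for 100-D ones
    W1 = rng.normal(scale=0.1, size=(d_in, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.1, size=(hidden, labels))
    b2 = np.zeros(labels)

    def forward(window_embeddings: np.ndarray) -> np.ndarray:
        """window_embeddings: (window, embedding_dim) -> label scores of shape (labels,)."""
        x = window_embeddings.reshape(-1)    # concatenate the window into one input vector
        h = np.tanh(x @ W1 + b1)             # 100-cell middle layer
        return h @ W2 + b2                   # 25-cell output layer

    return forward

# The proposed 2-D embedding needs only a 10-dimensional input vector:
tagger_2d = build_tagger(embedding_dim=2)
scores = tagger_2d(np.zeros((5, 2)))
print(scores.shape)   # (25,)
```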

Figure 7: Average word-particle velocity (y-axis) versus the number of documents (x-axis) for (a) Reuters 21578 and (b) RCV1; both show downward trends.

Table 2: Comparison of POS/NER/SRL on Reuters 21578 (each cell gives POS/NER/SRL in %).

Method            Precision          Recall             F1
Force-directed    92.5/84.1/70.2     89.5/82.7/64.6     91.0/83.4/67.3
Huang 2012        94.2/86.2/74.8     93.8/86.8/74.1     94.0/86.5/74.4
C&W               93.6/82.4/67.8     92.8/81.5/65.8     93.2/81.9/66.8
M&S-Turian        91.4/82.1/63.6     86.2/75.2/62.5     88.7/78.5/63.0

Table 3: Comparison of POS/NER/SRL on RCV1 (each cell gives POS/NER/SRL in %).

Method            Precision          Recall             F1
Force-directed    90.1/82.7/66.3     88.2/81.5/65.5     89.1/82.1/65.9
Huang 2012        88.4/83.2/71.3     90.7/84.6/70.3     89.5/83.9/70.8
C&W               88.5/83.5/64.6     85.2/82.6/63.1     86.8/83.0/63.8
M&S-Turian        90.8/80.5/63.6     90.3/73.9/65.7     90.5/77.1/64.6

This advantage improves the performance of such models, for example by reducing training time and increasing labeling speed. The result also demonstrates that learning a group of word embeddings need not be high-dimensional nor depend on neural network based approaches. It means that word representation learning and the construction of the task system can be decomposed into two individual parts. The two-step framework can achieve the same goal as the all-in-one models [3].

6. Conclusions and Future Work

In this paper, we propose a force-directed method that uses a fracture mechanics model to learn word embeddings. The results demonstrate that the physical simulation approach is feasible and that it improves on the procedure of traditional NLM-based approaches in terms of parameter training and task performance (POS, NER, and SRL). Future work is as follows: the model will be improved to suit streaming data, using a one-step solution for predicting the coordinates of a word-particle, which will improve the performance of our system; and the properties of the word-particle will be packaged in the GEXF file format (Graph Exchange XML Format), which enables data sharing across multiple data visualization tools, for example, Gephi.

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this manuscript

References

[1] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, "A neural probabilistic language model," Journal of Machine Learning Research, vol. 3, pp. 1137-1155, 2003.

[2] S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning, "A large annotated corpus for learning natural language inference," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '15), 2015.

[3] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, "Natural language processing (almost) from scratch," Journal of Machine Learning Research, vol. 3, no. 12, pp. 2493-2537, 2011.

[4] J. Turian, L. Ratinov, Y. Bengio, and D. Roth, "A preliminary evaluation of word representations for named-entity recognition," in Proceedings of the NIPS Workshop on Grammar Induction, Representation of Language and Language Learning, 2009.

[5] S. R. Bowman, C. Potts, and C. D. Manning, "Learning distributed word representations for natural logic reasoning," in Proceedings of the AAAI Spring Symposium on Knowledge Representation and Reasoning, Stanford, Calif, USA, March 2015.

[6] B. Fortuna, M. Grobelnik, and D. Mladenic, "Visualization of text document corpus," Informatica, vol. 29, no. 4, pp. 497-502, 2005.

[7] F. Morin and Y. Bengio, "Hierarchical probabilistic neural network language model," in Proceedings of the International Workshop on Artificial Intelligence and Statistics (AISTATS '05), vol. 5, pp. 246-252, Barbados, Caribbean, 2005.

[8] R. Navigli and S. P. Ponzetto, "BabelNet: building a very large multilingual semantic network," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 216-225, 2010.

[9] P. Wang, Y. Qian, F. K. Soong, L. He, and H. Zhao, "Learning distributed word representations for bidirectional LSTM recurrent neural network," in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL '16), San Diego, Calif, USA, June 2016.

[10] A. Mnih and G. Hinton, "Three new graphical models for statistical language modelling," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 641-648, June 2007.

[11] Z. Li, H. Zhao, C. Pang, L. Wang, and H. Wang, A Constituent Syntactic Parse Tree Based Discourse Parser, CoNLL-2016 Shared Task, Berlin, Germany, 2016.

[12] Z. Zhang, H. Zhao, and L. Qin, "Probabilistic graph-based dependency parsing with convolutional neural network," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL '16), pp. 1382-1392, Association for Computational Linguistics, Berlin, Germany, August 2016.

[13] R. Wang, M. Utiyama, I. Goto, E. Sumita, H. Zhao, and B.-L. Lu, "Converting continuous-space language models into N-gram language models with efficient bilingual pruning for statistical machine translation," ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 15, no. 3, pp. 1-26, 2016.

[14] G. Mesnil, A. Bordes, J. Weston, G. Chechik, and Y. Bengio, "Learning semantic representations of objects and their parts," Machine Learning, vol. 94, no. 2, pp. 281-301, 2014.

[15] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, 2013.

[16] G. Salton, A. Wong, and C. S. Yang, "Vector space model for automatic indexing," Communications of the ACM, vol. 18, no. 11, pp. 613-620, 1975.

[17] E. D'hondt, S. Verberne, C. Koster, and L. Boves, "Text representations for patent classification," Computational Linguistics, vol. 39, no. 3, pp. 755-775, 2013.

[18] C. Blake and W. Pratt, "Better rules, fewer features: a semantic approach to selecting features from text," in Proceedings of the IEEE International Conference on Data Mining (ICDM '01), pp. 59-66, IEEE, San Jose, Calif, USA, 2001.

[19] M. Mitra, C. Buckley, A. Singhal, and C. Cardie, "An analysis of statistical and syntactic phrases," in Proceedings of the 5th International Conference on Computer-Assisted Information Searching on Internet (RIAO '97), pp. 200-214, Montreal, Canada, 1997.

[20] J. Turian, L. Ratinov, and Y. Bengio, "Word representations: a simple and general method for semi-supervised learning," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 384-394, July 2010.

[21] M. Sahlgren, "Vector-based semantic analysis: representing word meanings based on random labels," in Proceedings of the Semantic Knowledge Acquisition and Categorisation Workshop at the European Summer School in Logic, Language and Information (ESSLLI XIII '01), Helsinki, Finland, August 2001.

[22] M. Sahlgren, The Word-Space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations between Words in High-Dimensional Vector Spaces, Stockholm University, Stockholm, Sweden, 2006.

[23] P. Cimiano, A. Hotho, and S. Staab, "Learning concept hierarchies from text corpora using formal concept analysis," Journal of Artificial Intelligence Research, vol. 24, no. 1, pp. 305-339, 2005.

[24] A. Kehagias, V. Petridis, V. G. Kaburlasos, and P. Fragkou, "A comparison of word- and sense-based text categorization using several classification algorithms," Journal of Intelligent Information Systems, vol. 21, no. 3, pp. 227-247, 2003.

[25] M. Rajman and R. Besancon, "Stochastic distributional models for textual information retrieval," in Proceedings of the 9th Conference of the Applied Stochastic Models and Data Analysis (ASMDA '99), pp. 80-85, 1999.

[26] G. E. Hinton, "Learning distributed representations of concepts," in Proceedings of the 8th Annual Conference of the Cognitive Science Society, pp. 1-12, 1986.

[27] H. Ritter and T. Kohonen, "Self-organizing semantic maps," Biological Cybernetics, vol. 61, no. 4, pp. 241-254, 1989.

[28] T. Honkela, V. Pulkki, and T. Kohonen, "Contextual relations of words in Grimm tales, analyzed by self-organizing map," in Proceedings of the Hybrid Neural Systems, 1995.

[29] T. Honkela, "Self-organizing maps of words for natural language processing application," in Proceedings of the International ICSC Symposium on Soft Computing, 1997.

[30] T. Mikolov, S. W. Yih, and G. Zweig, "Linguistic regularities in continuous space word representations," in Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT '13), 2013.

[31] W. Xu and A. Rudnicky, "Can artificial neural networks learn language models?" in Proceedings of the International Conference on Statistical Language Processing, pp. 1-13, 2000.

[32] M. I. Mandel, R. Pascanu, D. Eck et al., "Contextual tag inference," ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 7S, no. 1, 2011.

[33] A. Bordes, X. Glorot, J. Weston, and Y. Bengio, "Joint learning of words and meaning representations for open-text semantic parsing," Journal of Machine Learning Research, vol. 22, pp. 127-135, 2012.

[34] F. Huang, A. Ahuja, D. Downey, Y. Yang, Y. Guo, and A. Yates, "Learning representations for weakly supervised natural language processing tasks," Computational Linguistics, vol. 40, no. 1, pp. 85-120, 2014.

[35] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179-211, 1990.

[36] A. Mnih and G. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS '08), pp. 1081-1088, December 2008.

[37] G. Mesnil, Y. Dauphin, X. Glorot et al., "Unsupervised and transfer learning challenge: a deep learning approach," Journal of Machine Learning Research, vol. 27, no. 1, pp. 97-110, 2012.

[38] S. G. Kobourov, "Spring embedders and force directed graph drawing algorithms," in Proceedings of the ACM Symposium on Computational Geometry, Chapel Hill, NC, USA, June 2012.

[39] M. J. Bannister, D. Eppstein, M. T. Goodrich, and L. Trott, "Force-directed graph drawing using social gravity and scaling," in Graph Drawing, W. Didimo and M. Patrignani, Eds., vol. 7704 of Lecture Notes in Computer Science, pp. 414-425, 2013.

[40] A. Efrat, D. Forrester, A. Iyer, S. G. Kobourov, C. Erten, and O. Kilic, "Force-directed approaches to sensor localization," ACM Transactions on Sensor Networks, vol. 7, no. 3, article 27, 2010.

[41] T. Chan, J. Cong, and K. Sze, "Multilevel generalized force-directed method for circuit placement," in Proceedings of the International Symposium on Physical Design (ISPD '05), pp. 185-192, Santa Rosa, Calif, USA, April 2005.

[42] E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng, "Improving word representations via global context and multiple word prototypes," in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL '12), pp. 873-882, July 2012.

[43] T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur, "Recurrent neural network based language model," in Proceedings of INTERSPEECH, 2010.


Page 3: Research Article Fracture Mechanics Method for Word

Computational Intelligence and Neuroscience 3

Table 1 Properties of word-particle

Name NotesID IdentifierPos Coordinates (word embedding)Word Corresponding wordMass Synthetic alivenessTmpMass All backward related particle alivenesshistory Mass Historical alivenessCurrent Mass Current alivenessBase Mass IdentifierChain Backward related indexMax flaw Backward related degreeFlaw Current flaw lengthRadius Repulsion radiusPull Radius Tension-start radiusVelocity Current velocity of particle

represents the semantic relation among words as a tree struc-ture The deep learning mechanism is tried to build a NLM[37]

23 Force Simulated Method The force-directed algorithmsare mainly applied in data visualization Reference [38] com-pares several force-directed algorithms Reference [39] usesthesemethods for analyzing a complex social network whichadds a gravity to draw the graph of social network In somecross-domain applications wireless sensors network uses it tobuild layouts [40] and [41] performs electrical circuit layoutsautomatically based on the force rules

The reason why we use the force simulated approachto improve the generation of word embedding is that therelation between words semantic is relatively stable within acertain numbers of documents that depict the similar topicThis is somewhat like the convergence of the stable energystatus in the force-directed algorithm We are inspired it touse the idea of the force-related methods to improve theproblems of NLM

3 Methodology

31 Word-Particle Model To map word into a particlessystem we must define a mapping rule that specifies whichattribute of particle corresponds to which feature of wordsemantics The linking table is shown in Table 1 The tableconsists of two parts where the left part designates the namesof each property for particles and the right part gives theexplanation for the corresponding semantic feature

For more explanation we explain them one by one

(i) ID each word has a unique identifier for relating thecorresponding particle

(ii) Pos it is exactly the attribute that we want to trainwhich is also called word embedding that denotes thecoordinate of a particle in a plane actually

Hello

Boy

Bye

World

Figure 1 An example of the word-particle model

(iii) Mass we use the concept of particle mass to denotethe occurrence frequency of a word

(iv) TmpMass we use the temporal mass of particle torepresent the coword frequency

(v) Current Mass amp history Mass amp base Mass theyrepresent the current mass historical mass and basicmass of a particle respectively A function combinedwith them that is used to control the occurrencefrequency of coword within a sampling period isdescribed as

Mass = base Mass +history Mass2

+

Current Mass2

(1)

For controlling the intensity of relevance betweenwords we use the concept of edge in physic todescribe it

(vi) Chain it is an array for recording the ID of thebackward relating word

(vii) Max flaw it is the maximum associating strengthregarding a group of coword

(viii) Flaw it describes the associating strength regarding agroup of cowords

(ix) Radius it is the repulsion radius that keeps a mini-mum distance from other word-particles

(x) Pull Radius it is the pulling radius which means ifother word-particles break in they will be pushedaway keeping the radius distance from the intrudedword-particle

(xi) Velocity it defines a semantic shifting trend that canstrengthen the relation of two force-affected words

In Figure 1 the word-particles world and boy are back-ward related to hello and the word-particle bye is an isolatedword-particle The intensity of relation between the word-particles hello and world is stronger than the intensity ofrelation between hello and boy The directed line denotes thedirection ofword order where the pointedword-particles candrive their linking word-particles For example the cowordhello world appearsmore frequent than the coword hello boy

4 Computational Intelligence and Neuroscience

The BombTank By JDAMInput sentence

Objective word particle Backward relatedword particle

Tank

State 2

State 3

State 4

State 1Bomb

Tank Bomb

f

Tank Bomb

Fire

Plane Pilot

Air

TankBomb

Bomb

tt minus 1

v

v

v

998400

F

Figure 2 The semantic relating rule

32 Semantic Relation Building Based on the word-particlemodel we define the semantic relating rule to control themotion of particles within a given text context The doc-uments for training play the role of a driven-force sourcemaking the words have more opportunities to come togetherwhich appear in similar contextsThe procedure is as follows

Step 1 Theword embedding is trained by the document Eachdocument will be sampled sentence by sentence via a 2-gramwindow In the 2-gram window the first word is assumedas the target object and the second word is assumed as theassociated object The assumption means that the associatedword-particle will be forced tomove towards the target word-particle The related word-particle will be given an impulseThis can drive the word-particle with a certain velocity Theprogress is illustrated as state 1 in Figure 2 The word-particlebomb is associated with tank moving with the velocity V

Step 2 Given an impulse the word-particle can be initializedwith a velocityMeanwhile it will be slowed down by the forcethat comes from friction until its velocity reduces to zero anddoes not get in the repulsion radius of its objective word-particle When the word-particle moves into the repulsionradius of the objective word-particle it will be stopped at theedge keeping a distance of repulsion radius from the objectiveword-particle This is shown as state 2 A velocity affects theword-particle tank and the word-particle bomb is affectedcontinuously by the friction force 119891

Step 3 During a certain period of document learning someword-particles will set up some relations with other word-particlesWe establish a chain reacting rule for simulating thecontext feature The rule specifies the impulsion transition

in the way of particle by particle and the initial energy willdegrade at each reaction This passive action simulates thephenomenon that a topic word in a document has moresemantics and can be an index for document retrieval Theprogress is controlled by (2) Given 119898

0denotes the property

Mass of the impactedword-particle and119898119894denotes this prop-

erty of other word-particles The relation-building conditionis

119894 isin Chain

119889119894gt Pull Radius

(2)

where 119894 denotes the ID of the 119894th word-particle that relates tothe object word-particle and 119889

119894denotes the corresponding

distance of the 119894th word-particle between the object word-particles The velocity V

119905minus1will update V

119905via (3) if the word-

particle satisfies the conditionThis procedurewill repeat iter-atively till the velocity falls to zero For example in state 3 theword-particle bomb has two backward associating particlesfire and plane Its initial velocity will be decomposed withplane according to (3) if given an impulsion towards tankBut the word-particle fire fails to move because it is outsideof the Pull Radius distance of bombThe decomposition willbe delivered repetitively if the velocity fits the condition andis greater than zero

1198980V119905minus1= (

119896

sum

119894isinChain 119889119894gtPull Radius119898119894+ 1198980) V119905 (3)

Step 4 We add a repulsion radius for keeping the uniquenessof every word-particle because each word embedding isunique When the moving word-particle intrudes the repul-sion radius of other particles it will stop and stay at theedge of the affectedword-particles keeping a repulsion radiusdistanceThe progress is shown as state 4 Generally the wordrelation network is expected growing stably we present aninspecting criterion to check the convergence which is asfollows

V119905=

1198980V119905minus1

lim119896rarrinfinsum119896

119894isinChain 119889119894gtPull Radius119898119894 + 1198980997888rarr 0 (4)

In (4) the initial velocity will trend to zero with the relationincreasing of the number of associated word-particles thatis V119905rarr 0 When the movement of an activated particle

becomes tiny such convergence means the property Poshas reached relatively fixed coordinates This indicates thatthe word already has situated in a relatively stable semanticnetwork

33 Semantic Relation Separating For controlling the growthof words association those words that are of low frequency in2-gram context should be filtered whereas those words withhigh frequency of relations should be retained We proposeto use a linear elastic fracture mechanics to control suchfiltering A rope model is presented to represent the cowordrelation which can be assumed as a flaw that comes from atype ofmaterialThe strengthening or weakening of a relation

Computational Intelligence and Neuroscience 5

s

s

W

2a

Bomb

RelationTank

Figure 3 The flaw model

between words is controlled via the corresponding flaw sizeAn illustration is shown in Figure 3

More formally given 119882 denotes the width of rope itsvalue is obtained through the counting of a 2-gram samplingIts maximum value corresponds to the property Max flawGiven 2119886 denotes the size of a flaw corresponding to theproperty Flaw Given 119904 is the pull force that is calculated by119904 = Mass times Velocity We use the stress-intensity factor 119870 tocontrol the size of a flaw 119870 can be obtained as follows

119870 = 119898relationVradic1205871198862119886

119882

(5)

In (5) the variant 119898relation corresponds to the propertyregarding the synthetically occurring frequency of a word-particle and V corresponds to the velocity of an activatedword-particle The value of 119870 is in proportion to the size of2119886 which refers to the concept of flaw Moreover the flawextending speed is denoted by 119889119886119889119873 as

lg 119889119886119889119873

= lg119862 + 119899 lgΔ119870 (6)

In (6) lg119862 denotes a compensation constant and 119899 is ascale factor 119889119886119889119873 is in proportion of 119870 The condition isthat 119870 will decrease if119882 goes beyond 2119886 When the size offlaw is up to 119882 a separation will happen in the semanticrelation This means the associated word-particles are nolonger affected by the initial impulses which are generatedfrom their objective word-particles

4 Simulations

In this section we compare the proposed word embeddingwith the three classic NLM-based word embeddings Huang2012 [42] CampW [3] andMampS-Turian [20] Huang 2012 is theword embedding that uses multiple embeddings per wordCampW is the embedding that is trained by a feedforwardneural network and MampS-Turian is the embedding that isobtained by an improved RBM training model The twodatasets Reuters 21578 and RCV1 are used for training andevaluation In Reuters 21578 we extracted 21578 documentsfrom the raw XML format and discarded the original classlabels and titles only using the description sectionTheRCV1contains 5000 documents that are written from 50 authors70 percent of random sampling among these documentswas used to train the word embedding and the remainder

was used to evaluate the NLP tasks of POS NER and SRLwith other three types of embedding All words will keeptheir original forms whereas the numbers symbols and stopwords are kept to be trained together Those words that arenot included in training corpus will be discarded We regardthese tasks as a classification problem and apply a unifiedfeedforward neural linguistic model to perform the tasksThe compared word embeddings are readymade but theparameters of the neural networks require to be trained bythe corresponding embedding We use the benchmark thatis provided by [3] The results regarding the NLP tasks arecompared based on this benchmark

41 Evaluation The benchmark is measured in terms ofprecision recall and 1198651 [3 20 42] Assuming 119873 words arewaiting for labeling there exit 1198731 | 1198731 le 119873 words thatare labeled correctly and 1198732 | 1198732 le 119873 words that arelabeled wrong The value of precision is used to evaluate theaccuracy of labeling on POS NER and SRL The precisioncan be obtained as

119901 =

1198731

1198731+ 1198732

(7)

The value of recall is used to evaluate the coverage oflabeling on POS NER and SRL The recall can be calculatedas

119903 =

1198731

119873

(8)

The 1198651 is a combining evaluation with precision andrecall which are as follows

1198651 =

2119901119903

119901 + 119903

(9)

42 Training The parameters of the physical system are setas followsThe coefficient of fiction is set to 01 the coefficientof gravity is set to 98 and the initial velocity is set to 2 Theparameters that control semantic separating are set such thatMax flaw is set to 02 the initial value of Flaw is 0 Radiusis set to 1 and Pull Raduis is set to 20 For controlling theflaw extending speed lg119862 is set to 1 and 119899 is set to 1 Wedemonstrate the training result in terms of the generatingprocedure of word graph and average speed of word-particles A word graph can give an intuitive visualization forobserving a group of word relations We test a small numberof datasets to simulate the word embedding generation Theresult is shown in Figure 4 which contains the names from 19countries

In Figure 4(a) all the words appear in the plain physicalsystem and obtain a certain position because the trainingdocuments had given some forces to direct the word arrang-ing that follows the context in original text But in this stagesome word-particles still have a certain degree of speed tomove Those frequent word-particles such as China USAGermany and France have a relatively high speed they movepulling their backward relating word-particles (Figures 4(b)ndash4(d)) For example China has four backward relating word-particles Pakistan India USA and UK Germany has two

6 Computational Intelligence and Neuroscience

IndiaPakistan

Germany

SwedenThailand

CanadaMexico

Hungary

NewZelandPoland

France

USAUKFinland

Russia

Ukraine

Singapore

Austrilia

China

(a)

IndiaPakistan

Germany

Sweden

Thailand

Canada

Mexico

Hungary

NewZeland

PolandFranceUSA

UK

Finland

Russia

UkraineSingapore

Austrilia

China

(b)

IndiaPakistan

Germany

Sweden

Thailand

CanadaMexico

Hungary

NewZeland

PolandFrance

USA

UK

Finland

Russia

UkraineSingapore

AustriliaChina

(c)

IndiaPakistan

Germany

Sweden

Thailand

CanadaMexico

Hungary

NewZeland

PolandFrance

USA

UK

Finland

Russia

UkraineSingapore

Austrilia

China

(d)

IndiaPakistan

Germany

Sweden

Thailand

CanadaMexico

Hungary

NewZeland

PolandFrance

USA UK

Finland

Russia

UkraineSingapore

AustriliaChina

(e)

IndiaPakistan

Germany

Sweden

Thailand

Canada

Mexico

Hungary

NewZeland

PolandFrance

USA UK

Finland

Russia

UkraineSingapore

AustriliaChina

(f)

Figure 4 Force-directed embedding for country names

backward relating word-particles France and Finland Theother isolated word-particles have situated in a relativelystable position with a few movements We can see that someword-particles overlay with each other (Figure 4(d)) forexample India and Pakistan are too close at China andCanada overlays USA The repulsing force starts to functionat this timeThe too close word-particles will push each otheruntil they reach a balance distance (Figure 4(e)) When theinputting documents are all about similar topics the positionsof word-particles will also not vary too much showing arelatively stable topological graph (Figure 4(f))

The training result of dataset Reuters 21578 is shownin Figure 5 Each word-particle is colored with a greenblock The intensity of the relation between word-particlesis represented by a blue line where the thicker lines meana higher frequency of coword relation The position of eachword-particle is a 2-dimensional coordinate that is exactlythe training result word embeddingThe result shows that thenumbers of word relations and new word-particles will grow

with the training which iterates from 1000 to 10000 Theparticles system expands the region outward gradually Theparticle distribution presents as an ellipse for accommodatingmore new words-particles

The training result of RCV1 is shown in Figure 6 Suchdistance-based semantic measurement can be interpretedfrom some viewpoints For example the country and geo-graphical word-particles German Russian US and Bei-jing are clustered together The geo-related word-particlesNiagara WTO and US-based are pushed closely to thesewords Such word-particle graph can present an intuitiveresult for evaluating the training of word embedding nomatter during the training or after training we can intervenethe position of word-particle to improve the final result in adata-visualization based way

On the other hand we use (3) to estimate the averagevelocity of word-particles for evaluating the training processfrom a statistics viewpoint When the velocity decreasesto 1 below the convergence is assumed to be happening

Computational Intelligence and Neuroscience 7

1000 5000

10000

minus40 minus20 0 20 40 60

minus40 minus20 0 20 40 60

minus20 0 20 40 60 80

100

80

60

40

20

0

100

80

60

40

20

0

60

40

20

0

minus20

minus40

1000

minusminusminusminusminusminusminusminusminusminusminusminusminusminusminus20222222222222222222222222222222 00000000000000000000000000000000000000000 20202220222222222202222022022202222 40 60 80

100

80

60

40

20

0

Figure 5 Force-directed embedding for Reuters 21578

and the assumption coincides with the reality roughly Weexperiment 50 times for the two datasets respectively Theresult is shown as the boxplots in Figure 7 From thedownwards trends of average velocity the assumption that aword-semantic networkwill be stabilized in a certain numberof similar documents coincides with the result of two datasetsroughly Both the convergences of the two datasetsrsquo trainingappear at the training stage around the 20000th documents

In the presented word embedding learning there is nota specific converging criterion for terminating the trainingbecause the object functions of these neural based modelsare nonconvex so there may not exist an optimum valuetheoretically Empirically these word embedding learningmethods require repeating documents 20 or 30 times [3] Itbrings a serious problem that time consumption is propor-tional to the number of documents Such procedure usuallyrequires undergoing a long-term training time [3 30] But inour proposed model we demonstrate that setting a semantic

convergence condition is more convenient to select thanthose neural based approaches The convergence criterionprovides amore explicit direction for word embedding learn-ing Meanwhile the result demonstrates that to learn anacceptable word embedding requiring a certain number ofdocuments small or medium scale datasets may not beappropriate

5 Reuters 21578 and RCV1

For testing the usability we compare the word embeddingwith other three word embeddings on the NLP tasks usingthe same type of neural linguistic model The testing itemsare performed on POS NER and SRL The results are listedin Tables 2 and 3 In Reuters 21578 the labeling system usingour word embedding obtains 910 on POS 834 on NERand 673 on SRL for 1198651 This 1198651 score gets the third place inPOS and the second place on both NER and SRL In RCV1 it

8 Computational Intelligence and Neuroscience

Everen bound

statementrequested

eastern

intial

strengthens

Therersquod

title

revive

Thompson

Conservative

12-13

gearedAdding

Range

declaration

Niagara

metal

tragiclisting

deals

US-basedpreception

InteriorFine-tuning

accomplishments

Reformrsquos

appropriate

uprevolt

uncertainty center

WTO

German

US

Russian

canceleddent

lively

testing

Burden

Beijing

42-2-2423-0003

owner

Figure 6 Force-directed embedding for RCV1

achieves 891 on POS 821 on NER and 659 on SRL for1198651 The 1198651 scores obtain the second place in POS the thirdplace in NER and the second place in SRL

The performance of the proposed word embedding isclose to the best results in [3] but the dimensional numberis two which is far less than the 50- or 100-dimensionalword embeddings [3 43] This brings a benefit for reducingthe number of neural cells in performing NLP tasks by

such type of linguistic models Implementing these NLPtasks we construct a 3-layer feedforward neural networkwith a 5-cell inputting layer and a 100-cell middle layerand a 25-cell outputting layer To utilize the compared wordembeddings the number of the inputting vectors is set to500 because all of themare the 100-dimensional embeddingsBut our corresponding inputting layer just requires 10-dimensional vector The structure of model is simplified

Computational Intelligence and Neuroscience 9

1000 5000 10000 20000 250000

05

1

15

2

25

Reuters 21578(a)

RCV11000 5000 10000 20000 25000

0

05

1

15

2

25

(b)

Figure 7 The 119910-axis represents the velocity of a word-particle the 119909-axis represents the number of documents Both of them show thedownwards trends

Table 2 Comparison of POSNERSRL on Reuters 21578

Precision Recall 1198651

Force-directed 925841702 895827646 910834673Huang 2012 942862748 938868741 940865744CampW 936824678 928815658 932819668MampS-Turian 914821636 862752625 887785630

Table 3 Comparison of POSNERSRL on RCV1

Precision Recall 1198651

Force-directed 901827663 882815655 891821659Huang 2012 884832713 907846703 895839708CampW 885835646 852826631 868830638MampS-Turian 908805636 903739657 905771646

which can reduce the complexity of neural networks Thisadvantage will improve the performance of such modelssuch as reducing training time and improving the speed oflabeling The result also demonstrates that learning a groupof word embeddings cannot be high dimensional and dependon the neural network based approaches It means wordrepresentation learning and the task system constructingcan be decomposed to two individual parts The two-stepframework could achieve the same goal with the all-in-onemodels [3]

6 Conclusions and Future Work

In this paper we propose a force-directed method that usesa fracture mechanic model to learn word embedding Theresult demonstrates that the physical simulation approach isfeasible It improves the procedure of the traditional NLM-based approaches in terms of parameters training and tasksperforming (POS and NER and SRL) The next works areas follows The model will be improved to suit streamingdata using a one-step solution for predicting the coordinateof word-particle which will improve the performance ofour system packaging the properties of word-particle with

the gefx file format (Graph Exchange XML Format) that canprovide a capability for data sharing across multiple datavisualizing tools for example Gephi

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this manuscript

References

[1] Y Bengio R Ducharme P Vincent and C Jauvin ldquoA neuralprobabilistic language modelrdquo Journal of Machine LearningResearch vol 3 pp 1137ndash1155 2003

[2] S R Bowman G Angeli C Potts and C D Manning ldquoA largeannotated corpus for learning natural language inferencerdquo inProceedings of the Conference on Empirical Methods in NaturalLanguage Processing (EMNLP rsquo15) 2015

[3] R Collobert J Weston L Bottou M Karlen K Kavukcuogluand P Kuksa ldquoNatural language processing (almost) fromscratchrdquo Journal of Machine Learning Research vol 3 no 12 pp2493ndash2537 2011

[4] J Turian L Ratinov Y Bengio and D Roth ldquoA preliminaryevaluation of word representations for named-entity recog-nitionrdquo in Proceedings of the NIPS Workshop on GrammarInduction Representation of Language and Language Learning2009

[5] S R Bowman C Potts and C D Manning ldquoLearningdistributed word representations for natural logic reasoningrdquoin Proceedings of the AAAI Spring Symposium on KnowledgeRepresentation and Reasoning Stanford Calif USA March2015

[6] B Fortuna M Grobelnik and D Mladenic ldquoVisualization oftext document corpusrdquo Informatica vol 29 no 4 pp 497ndash5022005

[7] F Morin and Y Bengio ldquoHierarchical probabilistic neuralnetwork language modelrdquo in Proceedings of the InternationalWorkshop on Artificial Intelligence and Statistics (AISTATS rsquo05)vol 5 pp 246ndash252 Barbados Caribbean 2005

[8] R Navigli and S P Ponzetto ldquoBabelNet building a very largemultilingual semantic networkrdquo in Proceedings of the 48th

10 Computational Intelligence and Neuroscience

Annual Meeting of the Association for Computational Linguistics(ACL rsquo10) pp 216ndash225 2010

[9] P. Wang, Y. Qian, F. K. Soong, L. He, and H. Zhao, "Learning distributed word representations for bidirectional LSTM recurrent neural network," in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL '16), San Diego, Calif, USA, June 2016.

[10] A. Mnih and G. Hinton, "Three new graphical models for statistical language modelling," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 641–648, June 2007.

[11] Z. Li, H. Zhao, C. Pang, L. Wang, and H. Wang, A Constituent Syntactic Parse Tree Based Discourse Parser, CoNLL-2016 Shared Task, Berlin, Germany, 2016.

[12] Z. Zhang, H. Zhao, and L. Qin, "Probabilistic graph-based dependency parsing with convolutional neural network," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL '16), pp. 1382–1392, Association for Computational Linguistics, Berlin, Germany, August 2016.

[13] R. Wang, M. Utiyama, I. Goto, E. Sumita, H. Zhao, and B.-L. Lu, "Converting continuous-space language models into N-gram language models with efficient bilingual pruning for statistical machine translation," ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 15, no. 3, pp. 1–26, 2016.

[14] G. Mesnil, A. Bordes, J. Weston, G. Chechik, and Y. Bengio, "Learning semantic representations of objects and their parts," Machine Learning, vol. 94, no. 2, pp. 281–301, 2014.

[15] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.

[16] G. Salton, A. Wong, and C. S. Yang, "Vector space model for automatic indexing," Communications of the ACM, vol. 18, no. 11, pp. 613–620, 1975.

[17] E. D'hondt, S. Verberne, C. Koster, and L. Boves, "Text representations for patent classification," Computational Linguistics, vol. 39, no. 3, pp. 755–775, 2013.

[18] C. Blake and W. Pratt, "Better rules, fewer features: a semantic approach to selecting features from text," in Proceedings of the IEEE International Conference on Data Mining (ICDM '01), pp. 59–66, IEEE, San Jose, Calif, USA, 2001.

[19] M. Mitra, C. Buckley, A. Singhal, and C. Cardie, "An analysis of statistical and syntactic phrases," in Proceedings of the 5th International Conference Computer-Assisted Information Searching on Internet (RIAO '97), pp. 200–214, Montreal, Canada, 1997.

[20] J. Turian, L. Ratinov, and Y. Bengio, "Word representations: a simple and general method for semi-supervised learning," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 384–394, July 2010.

[21] M. Sahlgren, "Vector-based semantic analysis: representing word meanings based on random labels," in Proceedings of the Semantic Knowledge Acquisition and Categorisation Workshop at the European Summer School in Logic, Language and Information (ESSLLI XIII '01), Helsinki, Finland, August 2001.

[22] M. Sahlgren, The Word-Space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations between Words in High-Dimensional Vector Spaces, Stockholm University, Stockholm, Sweden, 2006.

[23] P. Cimiano, A. Hotho, and S. Staab, "Learning concept hierarchies from text corpora using formal concept analysis," Journal of Artificial Intelligence Research, vol. 24, no. 1, pp. 305–339, 2005.

[24] A. Kehagias, V. Petridis, V. G. Kaburlasos, and P. Fragkou, "A comparison of word- and sense-based text categorization using several classification algorithms," Journal of Intelligent Information Systems, vol. 21, no. 3, pp. 227–247, 2003.

[25] M. Rajman and R. Besancon, "Stochastic distributional models for textual information retrieval," in Proceedings of the 9th Conference of the Applied Stochastic Models and Data Analysis (ASMDA '99), pp. 80–85, 1999.

[26] G. E. Hinton, "Learning distributed representations of concepts," in Proceedings of the 8th Annual Conference of the Cognitive Science Society, pp. 1–12, 1986.

[27] H. Ritter and T. Kohonen, "Self-organizing semantic maps," Biological Cybernetics, vol. 61, no. 4, pp. 241–254, 1989.

[28] T. Honkela, V. Pulkki, and T. Kohonen, "Contextual relations of words in Grimm tales analyzed by self-organizing map," in Proceedings of the Hybrid Neural Systems, 1995.

[29] T. Honkela, "Self-organizing maps of words for natural language processing application," in Proceedings of the International ICSC Symposium on Soft Computing, 1997.

[30] T. Mikolov, S. W. Yih, and G. Zweig, "Linguistic regularities in continuous space word representations," in Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT '13), 2013.

[31] W. Xu and A. Rudnicky, "Can artificial neural networks learn language models?" in Proceedings of the International Conference on Statistical Language Processing, pp. 1–13, 2000.

[32] M. I. Mandel, R. Pascanu, D. Eck et al., "Contextual tag inference," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7S, no. 1, 2011.

[33] A. Bordes, X. Glorot, J. Weston, and Y. Bengio, "Joint learning of words and meaning representations for open-text semantic parsing," Journal of Machine Learning Research, vol. 22, pp. 127–135, 2012.

[34] F. Huang, A. Ahuja, D. Downey, Y. Yang, Y. Guo, and A. Yates, "Learning representations for weakly supervised natural language processing tasks," Computational Linguistics, vol. 40, no. 1, pp. 85–120, 2014.

[35] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.

[36] A. Mnih and G. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS '08), pp. 1081–1088, December 2008.

[37] G. Mesnil, Y. Dauphin, X. Glorot et al., "Unsupervised and transfer learning challenge: a deep learning approach," Journal of Machine Learning Research, vol. 27, no. 1, pp. 97–110, 2012.

[38] S. G. Kobourov, "Spring embedders and force directed graph drawing algorithms," in Proceedings of the ACM Symposium on Computational Geometry, Chapel Hill, NC, USA, June 2012.

[39] M. J. Bannister, D. Eppstein, M. T. Goodrich, and L. Trott, "Force-directed graph drawing using social gravity and scaling," in Graph Drawing, W. Didimo and M. Patrignani, Eds., vol. 7704 of Lecture Notes in Computer Science, pp. 414–425, 2013.

[40] A. Efrat, D. Forrester, A. Iyer, S. G. Kobourov, C. Erten, and O. Kilic, "Force-directed approaches to sensor localization," ACM Transactions on Sensor Networks, vol. 7, no. 3, article 27, 2010.

[41] T. Chan, J. Cong, and K. Sze, "Multilevel generalized force-directed method for circuit placement," in Proceedings of the International Symposium on Physical Design (ISPD '05), pp. 185–192, Santa Rosa, Calif, USA, April 2005.

[42] E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng, "Improving word representations via global context and multiple word prototypes," in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL '12), pp. 873–882, July 2012.

[43] T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur, "Recurrent neural network based language model," in Proceedings of INTERSPEECH, 2010.




[35] J L Elman ldquoFinding structure in timerdquo Cognitive Science vol14 no 2 pp 179ndash211 1990

[36] A Mnih and G Hinton ldquoA scalable hierarchical distributedlanguage modelrdquo in Proceedings of the 22nd Annual Conferenceon Neural Information Processing Systems (NIPS rsquo08) pp 1081ndash1088 December 2008

[37] G Mesnil Y Dauphin X Glorot et al ldquoUnsupervised andtransfer learning challenge a deep learning approachrdquo Journalof Machine Learning Research vol 27 no 1 pp 97ndash110 2012

[38] S G Kobourov ldquoSpring embedders and force directed graphdrawing algorithmsrdquo in Proceedings of the ACM Symposium onComputational Geometry Chapel Hill NC USA June 2012

[39] M J Bannister D Eppstein M T Goodrich and L TrottldquoForce-directed graph drawing using social gravity and scalingrdquoinGraphDrawingWDidimo andM Patrignani Eds vol 7704of Lecture Notes in Computer Science pp 414ndash425 2013

[40] A Efrat D Forrester A Iyer S G Kobourov C Erten and OKilic ldquoForce-directed approaches to sensor localizationrdquo ACMTransactions on Sensor Networks vol 7 no 3 article 27 2010

[41] T Chan J Cong and K Sze ldquoMultilevel generalized force-directed method for circuit placementrdquo in Proceedings of the

Computational Intelligence and Neuroscience 11

International Symposium on Physical Design (ISPD rsquo05) pp 185ndash192 Santa Rosa Calif USA April 2005

[42] E H Huang R Socher C D Manning and A Y Ng ldquoImprov-ing word representations via global context and multiplewordprototypesrdquo in Proceedings of the 50th Annual Meeting of theAssociation for Computational Linguistics (ACL rsquo12) pp 873ndash882 July 2012

[43] T Mikolov M Karafiat L Burget J Cernocky and S Khu-danpur ldquoRecurrent neural network based language modelrdquo inProceedings of the INTERSPEECH 2010


Figure 4: Force-directed embedding for country names; panels (a)–(f) show the country word-particles at successive stages of the simulation.

backward-relating word-particles France and Finland. The other isolated word-particles settle into relatively stable positions with only small movements. Some word-particles overlap with each other (Figure 4(d)); for example, India and Pakistan lie too close to China, and Canada overlays USA. The repulsing force starts to act at this point: word-particles that are too close push each other apart until they reach a balanced distance (Figure 4(e)). When the input documents all concern similar topics, the positions of the word-particles no longer vary much, showing a relatively stable topological graph (Figure 4(f)).

The training result on the Reuters 21578 dataset is shown in Figure 5. Each word-particle is drawn as a green block, and the intensity of the relation between word-particles is represented by a blue line, where a thicker line means a higher frequency of co-word relation. The position of each word-particle is a 2-dimensional coordinate, which is exactly the trained word embedding. The result shows that the numbers of word relations and new word-particles grow as training iterates from 1000 to 10000 documents: the particle system gradually expands its region outward, and the particle distribution takes the shape of an ellipse to accommodate new word-particles.
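The intensity values behind those blue lines can be accumulated with simple bookkeeping. The sketch below is our own illustration, assuming a "co-word relation" is counted whenever two words appear as an adjacent 2-gram; the helper name is ours:

```python
from collections import Counter
from itertools import tee

def coword_counts(tokenized_docs):
    """Count how often each adjacent word pair (2-gram) occurs; used as edge intensity."""
    counts = Counter()
    for tokens in tokenized_docs:
        a, b = tee(tokens)
        next(b, None)              # shift the second iterator by one token
        counts.update(zip(a, b))   # adjacent pairs
    return counts

edges = coword_counts([["the", "word", "embedding"], ["word", "embedding", "model"]])
print(edges[("word", "embedding")])  # 2 -> drawn as a thicker line
```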

The training result on RCV1 is shown in Figure 6. Such distance-based semantic measurement can be interpreted from several viewpoints. For example, the country and geographical word-particles German, Russian, US, and Beijing are clustered together, and the geo-related word-particles Niagara, WTO, and US-based are pushed close to these words. Such a word-particle graph gives an intuitive view for evaluating the training of word embedding: both during and after training, we can intervene on the position of a word-particle to improve the final result in a data-visualization-based way.

On the other hand, we use (3) to estimate the average velocity of the word-particles, which evaluates the training process from a statistical viewpoint. When the average velocity decreases to below 1, convergence is assumed to have been reached.

Figure 5: Force-directed embedding for Reuters 21578 (panels show the particle layout after 1000, 5000, and 10000 training documents; axes are the 2-dimensional particle coordinates).

This assumption roughly agrees with what we observe. We ran the experiment 50 times on each of the two datasets; the results are shown as boxplots in Figure 7. The downward trend of the average velocity supports the assumption that a word-semantic network stabilizes after a certain number of similar documents, and this holds roughly for both datasets. For both datasets, convergence appears at the training stage around the 20000th document.

In previous word embedding learning approaches there is no specific convergence criterion for terminating the training, because the objective functions of these neural-based models are nonconvex, so a theoretical optimum may not exist. Empirically, such word embedding learning methods require passing over the documents 20 or 30 times [3]. This brings a serious problem: time consumption is proportional to the number of documents, and the procedure usually requires a long training time [3, 30]. In our proposed model, by contrast, we demonstrate that a semantic convergence condition is more convenient to set than in those neural-based approaches, and the convergence criterion provides a more explicit stopping direction for word embedding learning. Meanwhile, the result shows that learning an acceptable word embedding requires a certain number of documents, so small or medium scale datasets may not be appropriate.
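Such a stopping rule is straightforward to state in code. The following is a minimal sketch, assuming the average velocity of (3) is the mean Euclidean displacement of all word-particles between consecutive updates (the function names are ours):

```python
import numpy as np

def average_velocity(prev_pos: np.ndarray, curr_pos: np.ndarray) -> float:
    """Mean Euclidean displacement of all word-particles for one update step.

    prev_pos, curr_pos: arrays of shape (num_particles, 2) holding 2-D coordinates.
    """
    return float(np.linalg.norm(curr_pos - prev_pos, axis=1).mean())

def converged(prev_pos: np.ndarray, curr_pos: np.ndarray, threshold: float = 1.0) -> bool:
    """Terminate training once the average particle velocity drops below the threshold."""
    return average_velocity(prev_pos, curr_pos) < threshold
```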

5. Reuters 21578 and RCV1

To test usability, we compare our word embedding with three other word embeddings on NLP tasks using the same type of neural linguistic model. The tests cover POS, NER, and SRL, and the results are listed in Tables 2 and 3. On Reuters 21578, the labeling system using our word embedding obtains F1 scores of 91.0 on POS, 83.4 on NER, and 67.3 on SRL, which rank third on POS and second on both NER and SRL.

Figure 6: Force-directed embedding for RCV1 (word-particles such as German, Russian, US, Beijing, WTO, Niagara, and US-based appear at their learned 2-dimensional positions).

On RCV1, it achieves F1 scores of 89.1 on POS, 82.1 on NER, and 65.9 on SRL, which rank second on POS, third on NER, and second on SRL.
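For reference, the F1 values in Tables 2 and 3 are the harmonic means of the corresponding precision and recall; a quick check of the Reuters 21578 POS row (the helper name is ours):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Force-directed row of Table 2, POS column: precision 92.5, recall 89.5
print(round(f1(92.5, 89.5), 1))  # 91.0, matching the reported F1
```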

The performance of the proposed word embedding is close to the best results in [3], but its dimensionality is two, far less than the 50- or 100-dimensional word embeddings [3, 43]. This reduces the number of neural cells needed when such linguistic models perform NLP tasks. To implement these NLP tasks we construct a 3-layer feedforward neural network with a 5-cell input layer, a 100-cell middle layer, and a 25-cell output layer. To utilize the compared word embeddings, the input dimension is set to 500, because all of them are 100-dimensional embeddings; our corresponding input layer requires only a 10-dimensional vector. The structure of the model is therefore simplified,
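To illustrate how the embedding dimensionality drives the input size, here is a minimal sketch of such a window-based feedforward tagger; the layer sizes follow the description above, the naming is ours, and we assume the five input cells hold the embeddings of a 5-word context window:

```python
import numpy as np

def make_tagger(embedding_dim: int, window: int = 5,
                hidden: int = 100, labels: int = 25, seed: int = 0):
    """Random weights for a 3-layer feedforward tagger: input -> hidden -> labels."""
    rng = np.random.default_rng(seed)
    d_in = window * embedding_dim  # flattened window of word embeddings
    w1 = rng.normal(scale=0.1, size=(d_in, hidden))
    w2 = rng.normal(scale=0.1, size=(hidden, labels))
    return w1, w2

def forward(window_embeddings: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """window_embeddings: (window, embedding_dim) -> unnormalized label scores."""
    h = np.tanh(window_embeddings.reshape(-1) @ w1)
    return h @ w2

# 100-dimensional embeddings give a 500-dimensional input layer,
# while 2-dimensional force-directed embeddings need only 10 inputs.
print(make_tagger(100)[0].shape[0])  # 500
print(make_tagger(2)[0].shape[0])    # 10
```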

Figure 7: Boxplots of average word-particle velocity for (a) Reuters 21578 and (b) RCV1. The y-axis represents the velocity of a word-particle and the x-axis the number of documents (1000 to 25000); both show a downward trend.

Table 2: Comparison of POS/NER/SRL on Reuters 21578 (each entry lists POS/NER/SRL).

Method           Precision         Recall            F1
Force-directed   92.5/84.1/70.2    89.5/82.7/64.6    91.0/83.4/67.3
Huang 2012       94.2/86.2/74.8    93.8/86.8/74.1    94.0/86.5/74.4
C&W              93.6/82.4/67.8    92.8/81.5/65.8    93.2/81.9/66.8
M&S-Turian       91.4/82.1/63.6    86.2/75.2/62.5    88.7/78.5/63.0

Table 3: Comparison of POS/NER/SRL on RCV1 (each entry lists POS/NER/SRL).

Method           Precision         Recall            F1
Force-directed   90.1/82.7/66.3    88.2/81.5/65.5    89.1/82.1/65.9
Huang 2012       88.4/83.2/71.3    90.7/84.6/70.3    89.5/83.9/70.8
C&W              88.5/83.5/64.6    85.2/82.6/63.1    86.8/83.0/63.8
M&S-Turian       90.8/80.5/63.6    90.3/73.9/65.7    90.5/77.1/64.6

which reduces the complexity of the neural network. This advantage improves the performance of such models, for example by reducing training time and speeding up labeling. The result also demonstrates that learning a group of word embeddings need not be high dimensional or depend on neural-network-based approaches. In other words, word representation learning and task-system construction can be decomposed into two separate stages, and this two-step framework can achieve the same goal as the all-in-one models [3].

6. Conclusions and Future Work

In this paper we propose a force-directed method that uses a fracture mechanics model to learn word embeddings. The results demonstrate that the physical simulation approach is feasible: it improves on the procedure of traditional NLM-based approaches in terms of parameter training and task performance (POS, NER, and SRL). Future work is as follows: the model will be improved to suit streaming data, using a one-step solution for predicting the coordinates of word-particles, which will improve the performance of our system; and the properties of word-particles will be packaged in the GEXF file format (Graph Exchange XML Format), which enables data sharing across multiple data-visualization tools, for example, Gephi.
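For example, a particle layout could be exported roughly as follows. This is our own sketch, assuming each word-particle stores only its label and 2-dimensional position; it writes GEXF 1.2 with the viz extension that Gephi reads:

```python
from xml.sax.saxutils import escape

def write_gexf(particles, path):
    """particles: dict mapping word -> (x, y); writes a minimal GEXF file for Gephi."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<gexf xmlns="http://www.gexf.net/1.2draft" '
        'xmlns:viz="http://www.gexf.net/1.2draft/viz" version="1.2">',
        '  <graph defaultedgetype="undirected">',
        '    <nodes>',
    ]
    for i, (word, (x, y)) in enumerate(particles.items()):
        lines.append(f'      <node id="{i}" label="{escape(word)}">')
        lines.append(f'        <viz:position x="{x}" y="{y}" z="0.0"/>')
        lines.append('      </node>')
    lines += ['    </nodes>', '  </graph>', '</gexf>']
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))

write_gexf({"Germany": (12.5, 3.0), "France": (11.8, 2.6)}, "particles.gexf")
```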

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this manuscript.

References

[1] Y Bengio R Ducharme P Vincent and C Jauvin ldquoA neuralprobabilistic language modelrdquo Journal of Machine LearningResearch vol 3 pp 1137ndash1155 2003

[2] S R Bowman G Angeli C Potts and C D Manning ldquoA largeannotated corpus for learning natural language inferencerdquo inProceedings of the Conference on Empirical Methods in NaturalLanguage Processing (EMNLP rsquo15) 2015

[3] R Collobert J Weston L Bottou M Karlen K Kavukcuogluand P Kuksa ldquoNatural language processing (almost) fromscratchrdquo Journal of Machine Learning Research vol 3 no 12 pp2493ndash2537 2011

[4] J Turian L Ratinov Y Bengio and D Roth ldquoA preliminaryevaluation of word representations for named-entity recog-nitionrdquo in Proceedings of the NIPS Workshop on GrammarInduction Representation of Language and Language Learning2009

[5] S R Bowman C Potts and C D Manning ldquoLearningdistributed word representations for natural logic reasoningrdquoin Proceedings of the AAAI Spring Symposium on KnowledgeRepresentation and Reasoning Stanford Calif USA March2015

[6] B Fortuna M Grobelnik and D Mladenic ldquoVisualization oftext document corpusrdquo Informatica vol 29 no 4 pp 497ndash5022005

[7] F Morin and Y Bengio ldquoHierarchical probabilistic neuralnetwork language modelrdquo in Proceedings of the InternationalWorkshop on Artificial Intelligence and Statistics (AISTATS rsquo05)vol 5 pp 246ndash252 Barbados Caribbean 2005

[8] R Navigli and S P Ponzetto ldquoBabelNet building a very largemultilingual semantic networkrdquo in Proceedings of the 48th

10 Computational Intelligence and Neuroscience

Annual Meeting of the Association for Computational Linguistics(ACL rsquo10) pp 216ndash225 2010

[9] P Wang Y Qian F K Soong L He and H Zhao ldquoLearningdistributed word representations for bidirectional LSTM recur-rent neural networkrdquo in Proceedings of the Conference of theNorth American Chapter of the Association for ComputationalLinguistics Human Language Technologies (NAACL rsquo16) SanDiego Calif USA June 2016

[10] A Mnih and G Hinton ldquoThree new graphical models forstatistical language modellingrdquo in Proceedings of the 24th Inter-national Conference on Machine Learning (ICML rsquo07) pp 641ndash648 June 2007

[11] Z Li H Zhao C Pang L Wang and H Wang A ConstituentSyntactic Parse Tree Based Discourse Parser CoNLL-2016Shared Task Berlin Germany 2016

[12] Z Zhang H Zhao and L Qin ldquoProbabilistic graph-baseddependency parsing with convolutional neural networkrdquo inProceedings of the 54th Annual Meeting of the Association forComputational Linguistics (ACL rsquo16) pp 1382ndash1392 Associationfor Computational Linguistics Berlin Germany August 2016

[13] RWangM Utiyama I Goto E Sumita H Zhao and B-L LuldquoConverting continuous-space language models into N-gramlanguage models with efficient bilingual pruning for statisticalmachine translationrdquoACMTransactions onAsian Low-ResourceLanguage Information Process vol 15 no 3 pp 1ndash26 2016

[14] G Mesnil A Bordes J Weston G Chechik and Y BengioldquoLearning semantic representations of objects and their partsrdquoMachine Learning vol 94 no 2 pp 281ndash301 2014

[15] Y Bengio A Courville and P Vincent ldquoRepresentation learn-ing a review and new perspectivesrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1798ndash1828 2013

[16] G Salton A Wong and C S Yang ldquoVector space model forautomatic indexingrdquo Communications of the ACM vol 18 no11 pp 613ndash620 1975

[17] E Drsquohondt S Verberne C Koster and L Boves ldquoText repre-sentations for patent classificationrdquo Computational Linguisticsvol 39 no 3 pp 755ndash775 2013

[18] C Blake and W Pratt ldquoBetter rules fewer features a semanticapproach to selecting features from textrdquo in Proceedings of theIEEE International Conference on Data Mining (ICDM rsquo01) pp59ndash66 IEEE San Jose Calif USA 2001

[19] M Mitra C Buckley A Singhal and C Cardie ldquoAn analysis ofstatistical and syntactic phrasesrdquo in Proceedings of the 5th Inter-national Conference Computer-Assisted Information Searchingon Internet (RIAO rsquo97) pp 200ndash214 Montreal Canada 1997

[20] J Turian L Ratinov and Y Bengio ldquoWord representations asimple and general method for semi-supervised learningrdquo inProceedings of the 48th Annual Meeting of the Association forComputational Linguistics (ACL rsquo10) pp 384ndash394 July 2010

[21] M Sahlgren ldquoVector-based semantic analysis representingword meanings based on random labelsrdquo in Proceedings of theSemantic Knowledge Acquisition and Categorisation Workshopat European Summer School in Logic Language and Information(ESSLLI XIII rsquo01) Helsinki Finland August 2001

[22] M Sahlgren The Word-Space Model Using DistributionalAnalysis to Represent Syntagmatic and Paradigmatic Relationsbetween Words in High-Dimensional Vector Spaces StockholmUniversity Stockholm Sweden 2006

[23] P Cimiano A Hotho and S Staab ldquoLearning concept hierar-chies from text corpora using formal concept analysisrdquo Journal

of Artificial Intelligence Research vol 24 no 1 pp 305ndash3392005

[24] A Kehagias V Petridis V G Kaburlasos and P FragkouldquoA comparison of word- and sense-based text categorizationusing several classification algorithmsrdquo Journal of IntelligentInformation Systems vol 21 no 3 pp 227ndash247 2003

[25] M Rajman and R Besancon ldquoStochastic distributional modelsfor textual information retrievalrdquo in Proceedings of the 9thConference of the Applied Stochastic Models and Data Analysis(ASMDA rsquo99) pp 80ndash85 1999

[26] G E Hinton ldquoLearning distributed representations of con-ceptsrdquo in Proceedings of the 8th Annual Conference of theCognitive Science Society pp 1ndash12 1986

[27] H Ritter and T Kohonen ldquoSelf-organizing semantic mapsrdquoBiological Cybernetics vol 61 no 4 pp 241ndash254 1989

[28] T Honkela V Pulkki and T Kohonen ldquoContextual relationsof words in grimm tales analyzed by self-organizing maprdquo inProceedings of the Hybrid Neural Systems 1995

[29] THonkela ldquoSelf-organizingmaps ofwords for natural languageprocessing applicationrdquo in Proceedings of the International ICSCSymposium on Soft Computing 1997

[30] T Mikolov S W Yih and G Zweig ldquoLinguistic regularitiesin continuous space word representationsrdquo in Proceedings ofthe 2013 Conference of the North American Chapter of theAssociation for Computational Linguistics Human LanguageTechnologies (NAACL-HLT rsquo13) 2013

[31] W Xu and A Rudnicky ldquoCan artificial neural networks learnlanguagemodelsrdquo in Proceedings of the International Conferenceon Statistical Language Processing pp 1ndash13 2000

[32] M I Mandel R Pascanu D Eck et al ldquoContextual taginferencerdquo ACM Transactions on Multimedia Computing Com-munications and Applications (TOMM) vol 7S no 1 2011

[33] A Bordes X Glorot J Weston and Y Bengio ldquoJoint learningof words and meaning representations for open-text semanticparsingrdquo Journal of Machine Learning Research vol 22 pp 127ndash135 2012

[34] F Huang A Ahuja D Downey Y Yang Y Guo and AYates ldquoLearning representations for weakly supervised naturallanguage processing tasksrdquo Computational Linguistics vol 40no 1 pp 85ndash120 2014

[35] J L Elman ldquoFinding structure in timerdquo Cognitive Science vol14 no 2 pp 179ndash211 1990

[36] A Mnih and G Hinton ldquoA scalable hierarchical distributedlanguage modelrdquo in Proceedings of the 22nd Annual Conferenceon Neural Information Processing Systems (NIPS rsquo08) pp 1081ndash1088 December 2008

[37] G Mesnil Y Dauphin X Glorot et al ldquoUnsupervised andtransfer learning challenge a deep learning approachrdquo Journalof Machine Learning Research vol 27 no 1 pp 97ndash110 2012

[38] S G Kobourov ldquoSpring embedders and force directed graphdrawing algorithmsrdquo in Proceedings of the ACM Symposium onComputational Geometry Chapel Hill NC USA June 2012

[39] M J Bannister D Eppstein M T Goodrich and L TrottldquoForce-directed graph drawing using social gravity and scalingrdquoinGraphDrawingWDidimo andM Patrignani Eds vol 7704of Lecture Notes in Computer Science pp 414ndash425 2013

[40] A Efrat D Forrester A Iyer S G Kobourov C Erten and OKilic ldquoForce-directed approaches to sensor localizationrdquo ACMTransactions on Sensor Networks vol 7 no 3 article 27 2010

[41] T Chan J Cong and K Sze ldquoMultilevel generalized force-directed method for circuit placementrdquo in Proceedings of the

Computational Intelligence and Neuroscience 11

International Symposium on Physical Design (ISPD rsquo05) pp 185ndash192 Santa Rosa Calif USA April 2005

[42] E H Huang R Socher C D Manning and A Y Ng ldquoImprov-ing word representations via global context and multiplewordprototypesrdquo in Proceedings of the 50th Annual Meeting of theAssociation for Computational Linguistics (ACL rsquo12) pp 873ndash882 July 2012

[43] T Mikolov M Karafiat L Burget J Cernocky and S Khu-danpur ldquoRecurrent neural network based language modelrdquo inProceedings of the INTERSPEECH 2010

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 7: Research Article Fracture Mechanics Method for Word

Computational Intelligence and Neuroscience 7

1000 5000

10000

minus40 minus20 0 20 40 60

minus40 minus20 0 20 40 60

minus20 0 20 40 60 80

100

80

60

40

20

0

100

80

60

40

20

0

60

40

20

0

minus20

minus40

1000

minusminusminusminusminusminusminusminusminusminusminusminusminusminusminus20222222222222222222222222222222 00000000000000000000000000000000000000000 20202220222222222202222022022202222 40 60 80

100

80

60

40

20

0

Figure 5 Force-directed embedding for Reuters 21578

and the assumption coincides with the reality roughly Weexperiment 50 times for the two datasets respectively Theresult is shown as the boxplots in Figure 7 From thedownwards trends of average velocity the assumption that aword-semantic networkwill be stabilized in a certain numberof similar documents coincides with the result of two datasetsroughly Both the convergences of the two datasetsrsquo trainingappear at the training stage around the 20000th documents

In the presented word embedding learning there is nota specific converging criterion for terminating the trainingbecause the object functions of these neural based modelsare nonconvex so there may not exist an optimum valuetheoretically Empirically these word embedding learningmethods require repeating documents 20 or 30 times [3] Itbrings a serious problem that time consumption is propor-tional to the number of documents Such procedure usuallyrequires undergoing a long-term training time [3 30] But inour proposed model we demonstrate that setting a semantic

convergence condition is more convenient to select thanthose neural based approaches The convergence criterionprovides amore explicit direction for word embedding learn-ing Meanwhile the result demonstrates that to learn anacceptable word embedding requiring a certain number ofdocuments small or medium scale datasets may not beappropriate

5 Reuters 21578 and RCV1

For testing the usability we compare the word embeddingwith other three word embeddings on the NLP tasks usingthe same type of neural linguistic model The testing itemsare performed on POS NER and SRL The results are listedin Tables 2 and 3 In Reuters 21578 the labeling system usingour word embedding obtains 910 on POS 834 on NERand 673 on SRL for 1198651 This 1198651 score gets the third place inPOS and the second place on both NER and SRL In RCV1 it

8 Computational Intelligence and Neuroscience

Everen bound

statementrequested

eastern

intial

strengthens

Therersquod

title

revive

Thompson

Conservative

12-13

gearedAdding

Range

declaration

Niagara

metal

tragiclisting

deals

US-basedpreception

InteriorFine-tuning

accomplishments

Reformrsquos

appropriate

uprevolt

uncertainty center

WTO

German

US

Russian

canceleddent

lively

testing

Burden

Beijing

42-2-2423-0003

owner

Figure 6 Force-directed embedding for RCV1

achieves 891 on POS 821 on NER and 659 on SRL for1198651 The 1198651 scores obtain the second place in POS the thirdplace in NER and the second place in SRL

The performance of the proposed word embedding isclose to the best results in [3] but the dimensional numberis two which is far less than the 50- or 100-dimensionalword embeddings [3 43] This brings a benefit for reducingthe number of neural cells in performing NLP tasks by

such type of linguistic models Implementing these NLPtasks we construct a 3-layer feedforward neural networkwith a 5-cell inputting layer and a 100-cell middle layerand a 25-cell outputting layer To utilize the compared wordembeddings the number of the inputting vectors is set to500 because all of themare the 100-dimensional embeddingsBut our corresponding inputting layer just requires 10-dimensional vector The structure of model is simplified

Computational Intelligence and Neuroscience 9

1000 5000 10000 20000 250000

05

1

15

2

25

Reuters 21578(a)

RCV11000 5000 10000 20000 25000

0

05

1

15

2

25

(b)

Figure 7 The 119910-axis represents the velocity of a word-particle the 119909-axis represents the number of documents Both of them show thedownwards trends

Table 2 Comparison of POSNERSRL on Reuters 21578

Precision Recall 1198651

Force-directed 925841702 895827646 910834673Huang 2012 942862748 938868741 940865744CampW 936824678 928815658 932819668MampS-Turian 914821636 862752625 887785630

Table 3 Comparison of POSNERSRL on RCV1

Precision Recall 1198651

Force-directed 901827663 882815655 891821659Huang 2012 884832713 907846703 895839708CampW 885835646 852826631 868830638MampS-Turian 908805636 903739657 905771646

which can reduce the complexity of neural networks Thisadvantage will improve the performance of such modelssuch as reducing training time and improving the speed oflabeling The result also demonstrates that learning a groupof word embeddings cannot be high dimensional and dependon the neural network based approaches It means wordrepresentation learning and the task system constructingcan be decomposed to two individual parts The two-stepframework could achieve the same goal with the all-in-onemodels [3]

6 Conclusions and Future Work

In this paper we propose a force-directed method that usesa fracture mechanic model to learn word embedding Theresult demonstrates that the physical simulation approach isfeasible It improves the procedure of the traditional NLM-based approaches in terms of parameters training and tasksperforming (POS and NER and SRL) The next works areas follows The model will be improved to suit streamingdata using a one-step solution for predicting the coordinateof word-particle which will improve the performance ofour system packaging the properties of word-particle with

the gefx file format (Graph Exchange XML Format) that canprovide a capability for data sharing across multiple datavisualizing tools for example Gephi

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this manuscript

References

[1] Y Bengio R Ducharme P Vincent and C Jauvin ldquoA neuralprobabilistic language modelrdquo Journal of Machine LearningResearch vol 3 pp 1137ndash1155 2003

[2] S R Bowman G Angeli C Potts and C D Manning ldquoA largeannotated corpus for learning natural language inferencerdquo inProceedings of the Conference on Empirical Methods in NaturalLanguage Processing (EMNLP rsquo15) 2015

[3] R Collobert J Weston L Bottou M Karlen K Kavukcuogluand P Kuksa ldquoNatural language processing (almost) fromscratchrdquo Journal of Machine Learning Research vol 3 no 12 pp2493ndash2537 2011

[4] J Turian L Ratinov Y Bengio and D Roth ldquoA preliminaryevaluation of word representations for named-entity recog-nitionrdquo in Proceedings of the NIPS Workshop on GrammarInduction Representation of Language and Language Learning2009

[5] S R Bowman C Potts and C D Manning ldquoLearningdistributed word representations for natural logic reasoningrdquoin Proceedings of the AAAI Spring Symposium on KnowledgeRepresentation and Reasoning Stanford Calif USA March2015

[6] B Fortuna M Grobelnik and D Mladenic ldquoVisualization oftext document corpusrdquo Informatica vol 29 no 4 pp 497ndash5022005

[7] F Morin and Y Bengio ldquoHierarchical probabilistic neuralnetwork language modelrdquo in Proceedings of the InternationalWorkshop on Artificial Intelligence and Statistics (AISTATS rsquo05)vol 5 pp 246ndash252 Barbados Caribbean 2005

[8] R Navigli and S P Ponzetto ldquoBabelNet building a very largemultilingual semantic networkrdquo in Proceedings of the 48th

10 Computational Intelligence and Neuroscience

Annual Meeting of the Association for Computational Linguistics(ACL rsquo10) pp 216ndash225 2010

[9] P Wang Y Qian F K Soong L He and H Zhao ldquoLearningdistributed word representations for bidirectional LSTM recur-rent neural networkrdquo in Proceedings of the Conference of theNorth American Chapter of the Association for ComputationalLinguistics Human Language Technologies (NAACL rsquo16) SanDiego Calif USA June 2016

[10] A Mnih and G Hinton ldquoThree new graphical models forstatistical language modellingrdquo in Proceedings of the 24th Inter-national Conference on Machine Learning (ICML rsquo07) pp 641ndash648 June 2007

[11] Z Li H Zhao C Pang L Wang and H Wang A ConstituentSyntactic Parse Tree Based Discourse Parser CoNLL-2016Shared Task Berlin Germany 2016

[12] Z Zhang H Zhao and L Qin ldquoProbabilistic graph-baseddependency parsing with convolutional neural networkrdquo inProceedings of the 54th Annual Meeting of the Association forComputational Linguistics (ACL rsquo16) pp 1382ndash1392 Associationfor Computational Linguistics Berlin Germany August 2016

[13] RWangM Utiyama I Goto E Sumita H Zhao and B-L LuldquoConverting continuous-space language models into N-gramlanguage models with efficient bilingual pruning for statisticalmachine translationrdquoACMTransactions onAsian Low-ResourceLanguage Information Process vol 15 no 3 pp 1ndash26 2016

[14] G Mesnil A Bordes J Weston G Chechik and Y BengioldquoLearning semantic representations of objects and their partsrdquoMachine Learning vol 94 no 2 pp 281ndash301 2014

[15] Y Bengio A Courville and P Vincent ldquoRepresentation learn-ing a review and new perspectivesrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1798ndash1828 2013

[16] G Salton A Wong and C S Yang ldquoVector space model forautomatic indexingrdquo Communications of the ACM vol 18 no11 pp 613ndash620 1975

[17] E Drsquohondt S Verberne C Koster and L Boves ldquoText repre-sentations for patent classificationrdquo Computational Linguisticsvol 39 no 3 pp 755ndash775 2013

[18] C Blake and W Pratt ldquoBetter rules fewer features a semanticapproach to selecting features from textrdquo in Proceedings of theIEEE International Conference on Data Mining (ICDM rsquo01) pp59ndash66 IEEE San Jose Calif USA 2001

[19] M Mitra C Buckley A Singhal and C Cardie ldquoAn analysis ofstatistical and syntactic phrasesrdquo in Proceedings of the 5th Inter-national Conference Computer-Assisted Information Searchingon Internet (RIAO rsquo97) pp 200ndash214 Montreal Canada 1997

[20] J Turian L Ratinov and Y Bengio ldquoWord representations asimple and general method for semi-supervised learningrdquo inProceedings of the 48th Annual Meeting of the Association forComputational Linguistics (ACL rsquo10) pp 384ndash394 July 2010

[21] M Sahlgren ldquoVector-based semantic analysis representingword meanings based on random labelsrdquo in Proceedings of theSemantic Knowledge Acquisition and Categorisation Workshopat European Summer School in Logic Language and Information(ESSLLI XIII rsquo01) Helsinki Finland August 2001

[22] M Sahlgren The Word-Space Model Using DistributionalAnalysis to Represent Syntagmatic and Paradigmatic Relationsbetween Words in High-Dimensional Vector Spaces StockholmUniversity Stockholm Sweden 2006

[23] P Cimiano A Hotho and S Staab ldquoLearning concept hierar-chies from text corpora using formal concept analysisrdquo Journal

of Artificial Intelligence Research vol 24 no 1 pp 305ndash3392005

[24] A Kehagias V Petridis V G Kaburlasos and P FragkouldquoA comparison of word- and sense-based text categorizationusing several classification algorithmsrdquo Journal of IntelligentInformation Systems vol 21 no 3 pp 227ndash247 2003

[25] M Rajman and R Besancon ldquoStochastic distributional modelsfor textual information retrievalrdquo in Proceedings of the 9thConference of the Applied Stochastic Models and Data Analysis(ASMDA rsquo99) pp 80ndash85 1999

[26] G E Hinton ldquoLearning distributed representations of con-ceptsrdquo in Proceedings of the 8th Annual Conference of theCognitive Science Society pp 1ndash12 1986

[27] H Ritter and T Kohonen ldquoSelf-organizing semantic mapsrdquoBiological Cybernetics vol 61 no 4 pp 241ndash254 1989

[28] T Honkela V Pulkki and T Kohonen ldquoContextual relationsof words in grimm tales analyzed by self-organizing maprdquo inProceedings of the Hybrid Neural Systems 1995

[29] THonkela ldquoSelf-organizingmaps ofwords for natural languageprocessing applicationrdquo in Proceedings of the International ICSCSymposium on Soft Computing 1997

[30] T Mikolov S W Yih and G Zweig ldquoLinguistic regularitiesin continuous space word representationsrdquo in Proceedings ofthe 2013 Conference of the North American Chapter of theAssociation for Computational Linguistics Human LanguageTechnologies (NAACL-HLT rsquo13) 2013

[31] W Xu and A Rudnicky ldquoCan artificial neural networks learnlanguagemodelsrdquo in Proceedings of the International Conferenceon Statistical Language Processing pp 1ndash13 2000

[32] M I Mandel R Pascanu D Eck et al ldquoContextual taginferencerdquo ACM Transactions on Multimedia Computing Com-munications and Applications (TOMM) vol 7S no 1 2011

[33] A Bordes X Glorot J Weston and Y Bengio ldquoJoint learningof words and meaning representations for open-text semanticparsingrdquo Journal of Machine Learning Research vol 22 pp 127ndash135 2012

[34] F Huang A Ahuja D Downey Y Yang Y Guo and AYates ldquoLearning representations for weakly supervised naturallanguage processing tasksrdquo Computational Linguistics vol 40no 1 pp 85ndash120 2014

[35] J L Elman ldquoFinding structure in timerdquo Cognitive Science vol14 no 2 pp 179ndash211 1990

[36] A Mnih and G Hinton ldquoA scalable hierarchical distributedlanguage modelrdquo in Proceedings of the 22nd Annual Conferenceon Neural Information Processing Systems (NIPS rsquo08) pp 1081ndash1088 December 2008

[37] G Mesnil Y Dauphin X Glorot et al ldquoUnsupervised andtransfer learning challenge a deep learning approachrdquo Journalof Machine Learning Research vol 27 no 1 pp 97ndash110 2012

[38] S G Kobourov ldquoSpring embedders and force directed graphdrawing algorithmsrdquo in Proceedings of the ACM Symposium onComputational Geometry Chapel Hill NC USA June 2012

[39] M J Bannister D Eppstein M T Goodrich and L TrottldquoForce-directed graph drawing using social gravity and scalingrdquoinGraphDrawingWDidimo andM Patrignani Eds vol 7704of Lecture Notes in Computer Science pp 414ndash425 2013

[40] A Efrat D Forrester A Iyer S G Kobourov C Erten and OKilic ldquoForce-directed approaches to sensor localizationrdquo ACMTransactions on Sensor Networks vol 7 no 3 article 27 2010

[41] T Chan J Cong and K Sze ldquoMultilevel generalized force-directed method for circuit placementrdquo in Proceedings of the

Computational Intelligence and Neuroscience 11

International Symposium on Physical Design (ISPD rsquo05) pp 185ndash192 Santa Rosa Calif USA April 2005

[42] E H Huang R Socher C D Manning and A Y Ng ldquoImprov-ing word representations via global context and multiplewordprototypesrdquo in Proceedings of the 50th Annual Meeting of theAssociation for Computational Linguistics (ACL rsquo12) pp 873ndash882 July 2012

[43] T Mikolov M Karafiat L Burget J Cernocky and S Khu-danpur ldquoRecurrent neural network based language modelrdquo inProceedings of the INTERSPEECH 2010

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 8: Research Article Fracture Mechanics Method for Word

8 Computational Intelligence and Neuroscience

Everen bound

statementrequested

eastern

intial

strengthens

Therersquod

title

revive

Thompson

Conservative

12-13

gearedAdding

Range

declaration

Niagara

metal

tragiclisting

deals

US-basedpreception

InteriorFine-tuning

accomplishments

Reformrsquos

appropriate

uprevolt

uncertainty center

WTO

German

US

Russian

canceleddent

lively

testing

Burden

Beijing

42-2-2423-0003

owner

Figure 6 Force-directed embedding for RCV1

achieves 891 on POS 821 on NER and 659 on SRL for1198651 The 1198651 scores obtain the second place in POS the thirdplace in NER and the second place in SRL

The performance of the proposed word embedding isclose to the best results in [3] but the dimensional numberis two which is far less than the 50- or 100-dimensionalword embeddings [3 43] This brings a benefit for reducingthe number of neural cells in performing NLP tasks by

such type of linguistic models Implementing these NLPtasks we construct a 3-layer feedforward neural networkwith a 5-cell inputting layer and a 100-cell middle layerand a 25-cell outputting layer To utilize the compared wordembeddings the number of the inputting vectors is set to500 because all of themare the 100-dimensional embeddingsBut our corresponding inputting layer just requires 10-dimensional vector The structure of model is simplified

Computational Intelligence and Neuroscience 9

1000 5000 10000 20000 250000

05

1

15

2

25

Reuters 21578(a)

RCV11000 5000 10000 20000 25000

0

05

1

15

2

25

(b)

Figure 7 The 119910-axis represents the velocity of a word-particle the 119909-axis represents the number of documents Both of them show thedownwards trends

Table 2 Comparison of POSNERSRL on Reuters 21578

Precision Recall 1198651

Force-directed 925841702 895827646 910834673Huang 2012 942862748 938868741 940865744CampW 936824678 928815658 932819668MampS-Turian 914821636 862752625 887785630

Table 3 Comparison of POSNERSRL on RCV1

Precision Recall 1198651

Force-directed 901827663 882815655 891821659Huang 2012 884832713 907846703 895839708CampW 885835646 852826631 868830638MampS-Turian 908805636 903739657 905771646

which can reduce the complexity of neural networks Thisadvantage will improve the performance of such modelssuch as reducing training time and improving the speed oflabeling The result also demonstrates that learning a groupof word embeddings cannot be high dimensional and dependon the neural network based approaches It means wordrepresentation learning and the task system constructingcan be decomposed to two individual parts The two-stepframework could achieve the same goal with the all-in-onemodels [3]

6 Conclusions and Future Work

In this paper we propose a force-directed method that usesa fracture mechanic model to learn word embedding Theresult demonstrates that the physical simulation approach isfeasible It improves the procedure of the traditional NLM-based approaches in terms of parameters training and tasksperforming (POS and NER and SRL) The next works areas follows The model will be improved to suit streamingdata using a one-step solution for predicting the coordinateof word-particle which will improve the performance ofour system packaging the properties of word-particle with

the gefx file format (Graph Exchange XML Format) that canprovide a capability for data sharing across multiple datavisualizing tools for example Gephi

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this manuscript

References

[1] Y Bengio R Ducharme P Vincent and C Jauvin ldquoA neuralprobabilistic language modelrdquo Journal of Machine LearningResearch vol 3 pp 1137ndash1155 2003

[2] S R Bowman G Angeli C Potts and C D Manning ldquoA largeannotated corpus for learning natural language inferencerdquo inProceedings of the Conference on Empirical Methods in NaturalLanguage Processing (EMNLP rsquo15) 2015

[3] R Collobert J Weston L Bottou M Karlen K Kavukcuogluand P Kuksa ldquoNatural language processing (almost) fromscratchrdquo Journal of Machine Learning Research vol 3 no 12 pp2493ndash2537 2011

[4] J Turian L Ratinov Y Bengio and D Roth ldquoA preliminaryevaluation of word representations for named-entity recog-nitionrdquo in Proceedings of the NIPS Workshop on GrammarInduction Representation of Language and Language Learning2009

[5] S R Bowman C Potts and C D Manning ldquoLearningdistributed word representations for natural logic reasoningrdquoin Proceedings of the AAAI Spring Symposium on KnowledgeRepresentation and Reasoning Stanford Calif USA March2015

[6] B Fortuna M Grobelnik and D Mladenic ldquoVisualization oftext document corpusrdquo Informatica vol 29 no 4 pp 497ndash5022005

[7] F Morin and Y Bengio ldquoHierarchical probabilistic neuralnetwork language modelrdquo in Proceedings of the InternationalWorkshop on Artificial Intelligence and Statistics (AISTATS rsquo05)vol 5 pp 246ndash252 Barbados Caribbean 2005

[8] R Navigli and S P Ponzetto ldquoBabelNet building a very largemultilingual semantic networkrdquo in Proceedings of the 48th

10 Computational Intelligence and Neuroscience

Annual Meeting of the Association for Computational Linguistics(ACL rsquo10) pp 216ndash225 2010

[9] P Wang Y Qian F K Soong L He and H Zhao ldquoLearningdistributed word representations for bidirectional LSTM recur-rent neural networkrdquo in Proceedings of the Conference of theNorth American Chapter of the Association for ComputationalLinguistics Human Language Technologies (NAACL rsquo16) SanDiego Calif USA June 2016

[10] A Mnih and G Hinton ldquoThree new graphical models forstatistical language modellingrdquo in Proceedings of the 24th Inter-national Conference on Machine Learning (ICML rsquo07) pp 641ndash648 June 2007

[11] Z Li H Zhao C Pang L Wang and H Wang A ConstituentSyntactic Parse Tree Based Discourse Parser CoNLL-2016Shared Task Berlin Germany 2016

[12] Z Zhang H Zhao and L Qin ldquoProbabilistic graph-baseddependency parsing with convolutional neural networkrdquo inProceedings of the 54th Annual Meeting of the Association forComputational Linguistics (ACL rsquo16) pp 1382ndash1392 Associationfor Computational Linguistics Berlin Germany August 2016

[13] RWangM Utiyama I Goto E Sumita H Zhao and B-L LuldquoConverting continuous-space language models into N-gramlanguage models with efficient bilingual pruning for statisticalmachine translationrdquoACMTransactions onAsian Low-ResourceLanguage Information Process vol 15 no 3 pp 1ndash26 2016

[14] G Mesnil A Bordes J Weston G Chechik and Y BengioldquoLearning semantic representations of objects and their partsrdquoMachine Learning vol 94 no 2 pp 281ndash301 2014

[15] Y Bengio A Courville and P Vincent ldquoRepresentation learn-ing a review and new perspectivesrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1798ndash1828 2013

[16] G Salton A Wong and C S Yang ldquoVector space model forautomatic indexingrdquo Communications of the ACM vol 18 no11 pp 613ndash620 1975

[17] E Drsquohondt S Verberne C Koster and L Boves ldquoText repre-sentations for patent classificationrdquo Computational Linguisticsvol 39 no 3 pp 755ndash775 2013

[18] C Blake and W Pratt ldquoBetter rules fewer features a semanticapproach to selecting features from textrdquo in Proceedings of theIEEE International Conference on Data Mining (ICDM rsquo01) pp59ndash66 IEEE San Jose Calif USA 2001

[19] M Mitra C Buckley A Singhal and C Cardie ldquoAn analysis ofstatistical and syntactic phrasesrdquo in Proceedings of the 5th Inter-national Conference Computer-Assisted Information Searchingon Internet (RIAO rsquo97) pp 200ndash214 Montreal Canada 1997

[20] J Turian L Ratinov and Y Bengio ldquoWord representations asimple and general method for semi-supervised learningrdquo inProceedings of the 48th Annual Meeting of the Association forComputational Linguistics (ACL rsquo10) pp 384ndash394 July 2010

[21] M Sahlgren ldquoVector-based semantic analysis representingword meanings based on random labelsrdquo in Proceedings of theSemantic Knowledge Acquisition and Categorisation Workshopat European Summer School in Logic Language and Information(ESSLLI XIII rsquo01) Helsinki Finland August 2001

[22] M Sahlgren The Word-Space Model Using DistributionalAnalysis to Represent Syntagmatic and Paradigmatic Relationsbetween Words in High-Dimensional Vector Spaces StockholmUniversity Stockholm Sweden 2006

[23] P Cimiano A Hotho and S Staab ldquoLearning concept hierar-chies from text corpora using formal concept analysisrdquo Journal

of Artificial Intelligence Research, vol. 24, no. 1, pp. 305–339, 2005.

[24] A. Kehagias, V. Petridis, V. G. Kaburlasos, and P. Fragkou, "A comparison of word- and sense-based text categorization using several classification algorithms," Journal of Intelligent Information Systems, vol. 21, no. 3, pp. 227–247, 2003.

[25] M. Rajman and R. Besancon, "Stochastic distributional models for textual information retrieval," in Proceedings of the 9th Conference of the Applied Stochastic Models and Data Analysis (ASMDA '99), pp. 80–85, 1999.

[26] G. E. Hinton, "Learning distributed representations of concepts," in Proceedings of the 8th Annual Conference of the Cognitive Science Society, pp. 1–12, 1986.

[27] H. Ritter and T. Kohonen, "Self-organizing semantic maps," Biological Cybernetics, vol. 61, no. 4, pp. 241–254, 1989.

[28] T. Honkela, V. Pulkki, and T. Kohonen, "Contextual relations of words in Grimm tales analyzed by self-organizing map," in Proceedings of the Hybrid Neural Systems, 1995.

[29] T. Honkela, "Self-organizing maps of words for natural language processing application," in Proceedings of the International ICSC Symposium on Soft Computing, 1997.

[30] T. Mikolov, S. W. Yih, and G. Zweig, "Linguistic regularities in continuous space word representations," in Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT '13), 2013.

[31] W. Xu and A. Rudnicky, "Can artificial neural networks learn language models?" in Proceedings of the International Conference on Statistical Language Processing, pp. 1–13, 2000.

[32] M. I. Mandel, R. Pascanu, D. Eck et al., "Contextual tag inference," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7S, no. 1, 2011.

[33] A. Bordes, X. Glorot, J. Weston, and Y. Bengio, "Joint learning of words and meaning representations for open-text semantic parsing," Journal of Machine Learning Research, vol. 22, pp. 127–135, 2012.

[34] F. Huang, A. Ahuja, D. Downey, Y. Yang, Y. Guo, and A. Yates, "Learning representations for weakly supervised natural language processing tasks," Computational Linguistics, vol. 40, no. 1, pp. 85–120, 2014.

[35] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.

[36] A. Mnih and G. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS '08), pp. 1081–1088, December 2008.

[37] G. Mesnil, Y. Dauphin, X. Glorot et al., "Unsupervised and transfer learning challenge: a deep learning approach," Journal of Machine Learning Research, vol. 27, no. 1, pp. 97–110, 2012.

[38] S. G. Kobourov, "Spring embedders and force directed graph drawing algorithms," in Proceedings of the ACM Symposium on Computational Geometry, Chapel Hill, NC, USA, June 2012.

[39] M. J. Bannister, D. Eppstein, M. T. Goodrich, and L. Trott, "Force-directed graph drawing using social gravity and scaling," in Graph Drawing, W. Didimo and M. Patrignani, Eds., vol. 7704 of Lecture Notes in Computer Science, pp. 414–425, 2013.

[40] A. Efrat, D. Forrester, A. Iyer, S. G. Kobourov, C. Erten, and O. Kilic, "Force-directed approaches to sensor localization," ACM Transactions on Sensor Networks, vol. 7, no. 3, article 27, 2010.

[41] T. Chan, J. Cong, and K. Sze, "Multilevel generalized force-directed method for circuit placement," in Proceedings of the International Symposium on Physical Design (ISPD '05), pp. 185–192, Santa Rosa, Calif, USA, April 2005.

[42] E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng, "Improving word representations via global context and multiple word prototypes," in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL '12), pp. 873–882, July 2012.

[43] T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur, "Recurrent neural network based language model," in Proceedings of the INTERSPEECH, 2010.


