Research Article
Fracture Mechanics Method for Word Embedding Generation of Neural Probabilistic Linguistic Model
Size Bi, Xiao Liang, and Ting-lei Huang
Institute of Electronics, Chinese Academy of Sciences, Beijing, China
Correspondence should be addressed to Ting-lei Huang; tlhuang@mail.ie.ac.cn
Received 3 January 2016; Revised 16 July 2016; Accepted 26 July 2016
Academic Editor: Trong H. Duong
Copyright © 2016 Size Bi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Word embedding, a lexical vector representation generated via a neural linguistic model (NLM), has been empirically demonstrated to improve the performance of traditional language models. However, the high dimensionality inherent in NLMs leads to problems with hyperparameter selection and long training times. Here we propose a force-directed method that alleviates these problems and simplifies the generation of word embeddings. In this framework, each word is treated as a point in the real world, so it can approximately simulate physical movement under certain mechanics. To simulate the variation of meaning in phrases, we use fracture mechanics to model the formation and breakdown of the meaning of a 2-gram word group. Experiments on the natural language tasks of part-of-speech tagging, named entity recognition, and semantic role labeling demonstrate that the 2-dimensional word embedding rivals the word embeddings generated by classic NLMs in terms of accuracy, recall, and text visualization.
1 Introduction
Word embedding is a numerical word representation that uses a continuous vector space to represent a group of words [1]. In the word vector space, each word corresponds to a unique point. Intuitively, points with similar meanings should be placed close together, while those distant in meaning should be placed far apart. Based on this space, the degree of relation between words can be estimated by computing the distance between vector points, such as the Euclidean or Mahalanobis distance. Such semantic numeralization enables textual and symbolic data to be processed by traditional neural models. For this reason, more natural language processing (NLP) systems are being improved with these neural network technologies [2], enhancing the performance of common natural language tasks [3] such as POS (part-of-speech) tagging, chunking, NER (named entity recognition) [4], and SRL (semantic role labeling). However, the training of such word embeddings is time consuming and requires sophisticated parameter tuning [5], because these models have a huge number of trainable parameters, which are high dimensional and sparse. Moreover, text visualization for such high-dimensional data suffers from information distortion introduced by dimensionality-reduction preprocessing, confusing the understanding of data relations [6].
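As a minimal illustration of this distance-based view of word similarity, the Euclidean distance between two embedding points can be computed as follows (the vectors here are invented for the example, not taken from any trained model):

```python
import math

def euclidean(u, v):
    # Straight-line distance between two word-embedding points.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical 2-dimensional embeddings (invented for illustration).
hello = (1.0, 2.0)
world = (1.5, 2.0)
bye = (8.0, -3.0)

# "hello" is closer to "world" than to "bye", i.e., more related in meaning.
assert euclidean(hello, world) < euclidean(hello, bye)
```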
Word vectorization has always been an active topic in text mining and NLP. The generating approaches can be roughly divided into two categories: manual coding and autocoding. Manual coding assigns codes to words according to the knowledge of domain experts. It is heavy work, since the value of each word must be considered individually. For example, WordNet is a huge project whose aim is to build a word graph database by language experts, where word meanings are associated and presented via a tree-based structural graph [7, 8]. Such a representation can associate only a small range of words for each word; it is insufficient to build a global relation over all words. On the other hand, autocoding codes words by neural network models [9]. Every word is initialized with a random vector, which then varies as the parameters are tuned according to the range of contexts around the word. Generally, such methods are performed by training an NLM, where the word embedding is a part of the result obtained when the objective function converges [1]. However, the NLM-based approaches, such as feedforward neural networks [1],
Hindawi Publishing Corporation, Computational Intelligence and Neuroscience, Volume 2016, Article ID 3506261, 11 pages, http://dx.doi.org/10.1155/2016/3506261
recurrent neural networks (RNNs) [8], and restricted Boltzmann machines (RBMs) [10], have suffered from high learning complexity [11], sophisticated preferences [12], and the curse of dimensionality [13].

In this article, we present a novel method for word embedding learning that reduces the high dimensionality inherent in traditional NLMs. We assume that the generation of word embeddings can be viewed as particle movement on a plane: particles that are close represent words with similar meanings, whereas particles that are distant represent words that are far apart in meaning. To simulate text semantics as accurately as possible, a fracture mechanics model is presented to control the generating process of word embeddings. We aim to provide an effective, intuitive approach for learning a 2-dimensional word vector that is applicable to general natural language tasks. In particular, we ignore homonymy and polysemy to keep the word representation consistent; in this setting, each word corresponds to a single vector. The generating model based on neural networks is replaced by a particle system, which can simulate the process of semantic correlation between words.
Our specific contributions are as follows
(i) We propose a force-directed model based on fracture mechanics to generate word embeddings. A linear elastic fracture model is introduced to control the progress of word semantic variation.

(ii) We use a feedforward-NLM-based language model to experiment with the word embedding on the tasks of POS, NER, and SRL, where the word embedding is the input of the language model.

(iii) The coordinates of the 2-dimensional word embedding can be used for word visualization, facilitating intuitive observation of the degree of relation among words.
The next section describes related work on numerical word representation. Section 3 introduces our methodology. Sections 4 and 5 give the results and discussion. Section 6 concludes and outlines possible future work.
2 Background
2.1. Text Representation by Word Embedding. Choosing an appropriate data representation can facilitate the performance of a machine learning algorithm. The related methods have developed to the level of automatic feature selection according to the requirements of applications [14]. As a branch of machine learning, representation learning [15] has gradually become active in several well-known communities, especially for investigating knowledge extraction from raw datasets. Text representation in natural language tasks can be divided into the three levels of corpus [16, 17], paragraph [18, 19], and word [20-22]. This paper focuses on representation learning for words. Feature words and context have been considered the foundation for text representation learning and for constructing an NLP system [23-25]. We follow this direction, aiming at mapping word text to a vector space. Compared with the representation levels of paragraph and corpus, the word is the finest granularity of semantics and is more suitable for analysis by vector distance.
The idea of using a vector space to map word meaning was initially proposed in [26]. The earlier representation learning approaches fail to consider semantic measurement of the differences between words and emphasize how to quantize feature words. For example, in [27], to represent the word "dove," the first byte of the corresponding representation denotes the property of its shape, which is set to "1" if the dove satisfies some conditions. That line of representation learning did not focus on context features but presented some approaches for measuring differences in word semantics. The self-organizing map (SOM) [27] is employed to compute word vector distance, using length to represent the degree of neighboring between word meanings. Based on SOM, [28] maps high-frequency cowords to a 90-dimensional vector space. The investigations [29, 30] of SOM-based word vectors apply them in the fields of NLP and data mining.
2.2. Generation of Word Embedding. Word representation has begun to integrate measuring approaches for text semantics with the development of neural networks in probabilistic linguistic models. Reference [31] proposed using a neural network to build a language model, under the assumption that the numerical results of words with similar meanings should be placed close together, whereas the results for words with distant meanings should be placed far apart. Meanwhile, a probability function is introduced to add a probabilistic output to the NLM, which can give a statistical result for estimating a given n-gram word combination. Among the proposed models, a convolutional neural network [3], tuned somewhat by hand after learning, obtains an accuracy of 97.20% in POS, 93.65% in chunking, 88.67% in NER, and 74.15% in SRL. A comparison between conditional restricted Boltzmann machine models and support vector machines [32] is performed for a music annotator system based on the context around lyrics, which can use cooccurrence and sequential features to improve labeling accuracy. The SENNA software [3, 33] performs word sense disambiguation with an accuracy of 72.35%. Word representation is denoted by multiple vectors [4] to express polysemy; tested on NER, it rivals the traditional linguistic models. In addition, word representation learning has been trained based on n-gram models, hidden Markov models, and the partial lattice Markov random field model [34].
For representation learning by NLM, [1] proposes a feedforward neural network to train word embeddings, which are regarded as internal parameters requiring tuning under the objective function. Reference [35] presents a text sequence learning model called the RNN, which is capable of capturing local context features in a sequence of interest. Following this route, more machine learning algorithms have been introduced to address weaknesses in natural language tasks. Reference [10] uses the restricted Boltzmann machine to improve the performance of the sequential probability function. Reference [36] creates a hierarchical learning that
Computational Intelligence and Neuroscience 3
Table 1: Properties of a word-particle.

Name — Notes
ID — Identifier
Pos — Coordinates (word embedding)
Word — Corresponding word
Mass — Synthetic aliveness
TmpMass — All backward-related particle aliveness
history_Mass — Historical aliveness
Current_Mass — Current aliveness
base_Mass — Basic aliveness
Chain — Backward-related index
Max_flaw — Backward-related degree
Flaw — Current flaw length
Radius — Repulsion radius
Pull_Radius — Tension-start radius
Velocity — Current velocity of particle
represents the semantic relations among words as a tree structure. A deep learning mechanism has also been tried for building an NLM [37].
2.3. Force Simulated Methods. Force-directed algorithms are mainly applied in data visualization. Reference [38] compares several force-directed algorithms. Reference [39] uses these methods for analyzing a complex social network, adding gravity to draw the graph of the social network. In some cross-domain applications, wireless sensor networks use them to build layouts [40], and [41] performs electrical circuit layouts automatically based on force rules.
The reason we use the force-simulated approach to improve the generation of word embeddings is that the relation between word semantics is relatively stable within a certain number of documents that address similar topics. This is somewhat like the convergence to a stable energy state in a force-directed algorithm. This inspired us to use the ideas of the force-related methods to address the problems of NLMs.
3 Methodology
3.1. Word-Particle Model. To map words into a particle system, we must define a mapping rule that specifies which attribute of a particle corresponds to which feature of word semantics. The linking table is shown in Table 1. The table consists of two parts: the left part designates the names of each particle property, and the right part explains the corresponding semantic feature.

We explain them one by one:
(i) ID: each word has a unique identifier relating it to the corresponding particle.

(ii) Pos: the attribute we want to train, also called the word embedding; it denotes the coordinates of a particle in a plane.
Figure 1: An example of the word-particle model (word-particles Hello, Boy, Bye, and World).
(iii) Mass: we use the concept of particle mass to denote the occurrence frequency of a word.

(iv) TmpMass: we use the temporary mass of a particle to represent the coword frequency.

(v) Current_Mass & history_Mass & base_Mass: these represent the current mass, historical mass, and basic mass of a particle, respectively. A function combining them, used to control the occurrence frequency of a coword within a sampling period, is described as

Mass = base_Mass + history_Mass/2 + Current_Mass/2. (1)

For controlling the intensity of relevance between words, we use the concept of an edge in physics to describe it:

(vi) Chain: an array recording the IDs of the backward-related words.

(vii) Max_flaw: the maximum associating strength for a group of cowords.

(viii) Flaw: the current associating strength for a group of cowords.

(ix) Radius: the repulsion radius that keeps a minimum distance from other word-particles.

(x) Pull_Radius: the pulling radius; if other word-particles break in, they will be pushed away, keeping this radius distance from the intruded word-particle.

(xi) Velocity: defines a semantic shifting trend that can strengthen the relation of two force-affected words.
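The property table above can be sketched as a simple data structure. This is a hypothetical rendering for illustration only (field names follow Table 1, the synthetic-mass rule follows (1), and the default values follow the settings in Section 4.2), not the authors' implementation:

```python
from dataclasses import dataclass, field

@dataclass
class WordParticle:
    # Properties from Table 1 of the word-particle model.
    id: int                     # unique identifier
    word: str                   # corresponding word
    pos: tuple = (0.0, 0.0)     # 2-D coordinates = the word embedding
    base_mass: float = 1.0      # basic aliveness
    history_mass: float = 0.0   # historical aliveness
    current_mass: float = 0.0   # current aliveness
    chain: list = field(default_factory=list)  # IDs of backward-related particles
    max_flaw: float = 0.2       # maximum associating strength (Section 4.2)
    flaw: float = 0.0           # current flaw length
    radius: float = 1.0         # repulsion radius (Section 4.2)
    pull_radius: float = 20.0   # tension-start radius (Section 4.2)
    velocity: tuple = (0.0, 0.0)

    @property
    def mass(self) -> float:
        # Synthetic aliveness, per (1): Mass = base + history/2 + current/2.
        return self.base_mass + self.history_mass / 2 + self.current_mass / 2

p = WordParticle(id=0, word="hello", history_mass=2.0, current_mass=4.0)
assert p.mass == 4.0  # 1.0 + 2.0/2 + 4.0/2
```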
In Figure 1, the word-particles world and boy are backward related to hello, and the word-particle bye is an isolated word-particle. The intensity of the relation between the word-particles hello and world is stronger than that between hello and boy. The directed line denotes the direction of word order, where the pointed-to word-particles can drive their linked word-particles. For example, the coword hello world appears more frequently than the coword hello boy.
Figure 2: The semantic relating rule (states 1-4 for an input sentence, showing the objective word-particle and its backward-related word-particles).
3.2. Semantic Relation Building. Based on the word-particle model, we define a semantic relating rule to control the motion of particles within a given text context. The documents used for training play the role of a driving-force source, giving words that appear in similar contexts more opportunities to come together. The procedure is as follows.
Step 1. The word embedding is trained document by document. Each document is sampled sentence by sentence via a 2-gram window. In the 2-gram window, the first word is taken as the target object and the second word as the associated object. This means the associated word-particle will be forced to move towards the target word-particle: the related word-particle is given an impulse, which drives it with a certain velocity. The progress is illustrated as state 1 in Figure 2, where the word-particle bomb is associated with tank, moving with velocity v.
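The 2-gram sampling of Step 1 can be sketched as follows (a minimal illustration; the whitespace tokenizer and the example sentence are our own assumptions):

```python
def two_gram_pairs(sentence):
    # Slide a 2-gram window over the sentence:
    # the first word is the target object, the second the associated object.
    words = sentence.split()
    return [(words[i], words[i + 1]) for i in range(len(words) - 1)]

pairs = two_gram_pairs("the bomb tank by jdam")
# Each pair means: the second word is impelled towards the first.
assert ("bomb", "tank") in pairs
assert len(pairs) == 4
```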
Step 2. Given an impulse, the word-particle is initialized with a velocity. It is then slowed by friction until its velocity falls to zero, provided it does not enter the repulsion radius of its objective word-particle. When the word-particle moves into the repulsion radius of the objective word-particle, it stops at the edge, keeping a distance of the repulsion radius from the objective word-particle. This is shown as state 2: a velocity affects the word-particle tank, and the word-particle bomb is affected continuously by the friction force f.
Step 3. During a certain period of document learning, some word-particles establish relations with other word-particles. We establish a chain-reaction rule to simulate the context feature. The rule specifies that the impulsion transits particle by particle, and the initial energy degrades at each reaction. This passive action simulates the phenomenon that a topic word in a document carries more semantics and can serve as an index for document retrieval. The progress is controlled by (2). Let m_0 denote the property Mass of the impacted word-particle and m_i denote this property of the other word-particles. The relation-building condition is

i ∈ Chain, d_i > Pull_Radius, (2)
where i denotes the ID of the i-th word-particle related to the object word-particle, and d_i denotes the corresponding distance between the i-th word-particle and the object word-particle. The velocity v_{t-1} is updated to v_t via (3) if the word-particle satisfies the condition. This procedure repeats iteratively until the velocity falls to zero. For example, in state 3 the word-particle bomb has two backward associating particles, fire and plane. Its initial velocity will be decomposed with plane according to (3) if given an impulsion towards tank. But the word-particle fire fails to move, because it is outside the Pull_Radius distance of bomb. The decomposition is delivered repeatedly while the velocity satisfies the condition and remains greater than zero:

m_0 v_{t-1} = ( Σ_{i ∈ Chain, d_i > Pull_Radius}^{k} m_i + m_0 ) v_t. (3)
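The velocity update in (3) can be sketched as follows, under our reading that the impacted particle's momentum is shared among the in-range backward-related particles (the variable names are ours):

```python
def updated_velocity(v_prev, m0, neighbor_masses):
    # Eq. (3): m0 * v_{t-1} = (sum of related masses + m0) * v_t,
    # so the velocity shrinks as more related particles share the impulse.
    return m0 * v_prev / (sum(neighbor_masses) + m0)

# A particle of mass 1 with two related particles of mass 1 each:
v_t = updated_velocity(v_prev=3.0, m0=1.0, neighbor_masses=[1.0, 1.0])
assert v_t == 1.0  # 1 * 3 / (1 + 1 + 1)
```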
Step 4. We add a repulsion radius to preserve the uniqueness of every word-particle, because each word embedding is unique. When a moving word-particle intrudes into the repulsion radius of other particles, it stops and stays at the edge of the affected word-particles, keeping a repulsion-radius distance. The progress is shown as state 4. Generally, the word relation network is expected to grow stably; we present an inspecting criterion to check the convergence, which is as follows:

v_t = m_0 v_{t-1} / ( lim_{k→∞} Σ_{i ∈ Chain, d_i > Pull_Radius}^{k} m_i + m_0 ) → 0. (4)

In (4), the initial velocity tends to zero as the number of associated word-particles increases, that is, v_t → 0. When the movement of an activated particle becomes tiny, such convergence means the property Pos has reached relatively fixed coordinates. This indicates that the word is already situated in a relatively stable semantic network.
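The convergence criterion (4) can be checked numerically; this sketch (the threshold value is our own choice, not from the paper) declares convergence when the residual velocity becomes tiny:

```python
def is_converged(m0, v_prev, neighbor_masses, eps=1e-3):
    # Eq. (4): v_t = m0 * v_{t-1} / (sum(m_i) + m0) -> 0
    # as the number of related particles grows.
    v_t = m0 * v_prev / (sum(neighbor_masses) + m0)
    return v_t < eps

# With many related particles the residual velocity is tiny:
assert is_converged(1.0, 2.0, [1.0] * 10_000)
# With none, the velocity is unchanged and not converged:
assert not is_converged(1.0, 2.0, [])
```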
3.3. Semantic Relation Separating. To control the growth of word associations, words that are of low frequency in the 2-gram context should be filtered out, whereas word relations with high frequency should be retained. We propose using linear elastic fracture mechanics to control such filtering. A rope model is presented to represent the coword relation, which can be regarded as a flaw in a piece of material. The strengthening or weakening of a relation
Computational Intelligence and Neuroscience 5
Figure 3: The flaw model (a rope of width W with a flaw of size 2a under pull force s, representing the relation between Bomb and Tank).
between words is controlled via the corresponding flaw size. An illustration is shown in Figure 3.

More formally, let W denote the width of the rope; its value is obtained by counting the 2-gram samples, and its maximum value corresponds to the property Max_flaw. Let 2a denote the size of a flaw, corresponding to the property Flaw. Let s be the pull force, calculated by s = Mass × Velocity. We use the stress-intensity factor K to control the size of a flaw. K can be obtained as follows:

K = m_relation · v · √(πa) · (2a / W). (5)
In (5), the variable m_relation corresponds to the property denoting the synthetic occurrence frequency of a word-particle, and v corresponds to the velocity of an activated word-particle. The value of K is proportional to the size of 2a, which refers to the concept of a flaw. Moreover, the flaw extending speed is denoted by da/dN as

lg(da/dN) = lg C + n · lg ΔK. (6)

In (6), lg C denotes a compensation constant and n is a scale factor; da/dN is proportional to K. The condition is that K will decrease if W goes beyond 2a. When the size of the flaw reaches W, a separation happens in the semantic relation. This means the associated word-particles are no longer affected by the initial impulses generated from their objective word-particles.
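The separating rule can be sketched as follows, under our reading of (5)-(6): the log-form growth law in (6) is the Paris-law-style relation da/dN = C·ΔK^n, the constants follow Section 4.2 (lg C = 1 gives C = 10, n = 1), and the one-increment-per-step update is our own assumption:

```python
import math

def stress_intensity(m_relation, v, a, W):
    # Eq. (5): K grows with the particle's mass, velocity, and flaw size.
    return m_relation * v * math.sqrt(math.pi * a) * (2 * a / W)

def flaw_growth(delta_K, C=10.0, n=1.0):
    # Eq. (6) in non-log form: da/dN = C * (delta_K)^n.
    return C * delta_K ** n

def step_relation(flaw, m_relation, v, W):
    # Grow the flaw by one increment; the relation separates when 2a >= W.
    K = stress_intensity(m_relation, v, flaw / 2, W)
    flaw += flaw_growth(K)
    return flaw, flaw >= W

flaw, separated = 0.01, False
while not separated:
    flaw, separated = step_relation(flaw, m_relation=1.0, v=0.5, W=0.2)
# A weak (low-frequency) relation eventually breaks.
assert separated
```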
4 Simulations
In this section, we compare the proposed word embedding with three classic NLM-based word embeddings: Huang 2012 [42], C&W [3], and M&S-Turian [20]. Huang 2012 is a word embedding that uses multiple embeddings per word; C&W is an embedding trained by a feedforward neural network; and M&S-Turian is an embedding obtained by an improved RBM training model. Two datasets, Reuters 21578 and RCV1, are used for training and evaluation. From Reuters 21578, we extracted 21,578 documents from the raw XML format and discarded the original class labels and titles, using only the description section. RCV1 contains 5,000 documents written by 50 authors. Seventy percent of a random sample of these documents was used to train the word embedding, and the remainder was used to evaluate the NLP tasks of POS, NER, and SRL against the other three types of embedding. All words keep their original forms, and numbers, symbols, and stop words are kept and trained together. Words not included in the training corpus are discarded. We regard these tasks as classification problems and apply a unified feedforward neural linguistic model to perform them. The compared word embeddings are ready-made, but the parameters of the neural networks require training with the corresponding embedding. We use the benchmark provided by [3]; the results on the NLP tasks are compared on this benchmark.
4.1. Evaluation. The benchmark is measured in terms of precision, recall, and F1 [3, 20, 42]. Assume N words are waiting for labeling, with N1 (N1 ≤ N) words labeled correctly and N2 (N2 ≤ N) words labeled incorrectly. The value of precision is used to evaluate the accuracy of labeling on POS, NER, and SRL. The precision can be obtained as

p = N1 / (N1 + N2). (7)

The value of recall is used to evaluate the coverage of labeling on POS, NER, and SRL. The recall can be calculated as

r = N1 / N. (8)

F1 is a combined evaluation of precision and recall, as follows:

F1 = 2pr / (p + r). (9)
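The three measures (7)-(9) can be computed directly (a minimal helper sketch, not the benchmark code of [3]):

```python
def precision_recall_f1(n_correct, n_wrong, n_total):
    # Eqs. (7)-(9): p = N1/(N1+N2), r = N1/N, F1 = 2pr/(p+r).
    p = n_correct / (n_correct + n_wrong)
    r = n_correct / n_total
    f1 = 2 * p * r / (p + r)
    return p, r, f1

p, r, f1 = precision_recall_f1(n_correct=80, n_wrong=10, n_total=100)
assert round(p, 4) == 0.8889 and r == 0.8
assert round(f1, 4) == 0.8421
```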
4.2. Training. The parameters of the physical system are set as follows. The coefficient of friction is set to 0.1, the coefficient of gravity is set to 9.8, and the initial velocity is set to 2. The parameters that control semantic separating are set such that Max_flaw is 0.2, the initial value of Flaw is 0, Radius is 1, and Pull_Radius is 20. For controlling the flaw extending speed, lg C is set to 1 and n is set to 1. We demonstrate the training result in terms of the generating procedure of the word graph and the average speed of word-particles. A word graph can give an intuitive visualization for observing a group of word relations. We test a small dataset to simulate the word embedding generation. The result is shown in Figure 4, which contains the names of 19 countries.

In Figure 4(a), all the words appear in the plain physical system and obtain certain positions, because the training documents have applied forces directing the word arrangement to follow the context in the original text. At this stage, some word-particles still have a certain speed and keep moving. The frequent word-particles, such as China, USA, Germany, and France, have relatively high speeds; they move, pulling their backward-related word-particles (Figures 4(b)-4(d)). For example, China has four backward-related word-particles: Pakistan, India, USA, and UK. Germany has two
Figure 4: Force-directed embedding for country names; panels (a)-(f) show successive stages of the layout of the 19 country names.
backward-related word-particles: France and Finland. The other, isolated word-particles have settled in relatively stable positions with few movements. We can see that some word-particles overlay each other (Figure 4(d)); for example, India and Pakistan are too close to China, and Canada overlays USA. The repulsing force starts to function at this time: word-particles that are too close push each other away until they reach a balanced distance (Figure 4(e)). When the input documents are all about similar topics, the positions of word-particles do not vary much, showing a relatively stable topological graph (Figure 4(f)).
The training result on the dataset Reuters 21578 is shown in Figure 5. Each word-particle is colored with a green block. The intensity of the relation between word-particles is represented by a blue line, where thicker lines mean a higher frequency of coword relation. The position of each word-particle is a 2-dimensional coordinate, which is exactly the training result: the word embedding. The result shows that the numbers of word relations and new word-particles grow with the training, which iterates from 1,000 to 10,000 documents. The particle system expands its region outward gradually. The particle distribution presents as an ellipse, accommodating more new word-particles.
The training result on RCV1 is shown in Figure 6. Such distance-based semantic measurement can be interpreted from several viewpoints. For example, the country and geographical word-particles German, Russian, US, and Beijing are clustered together. The geo-related word-particles Niagara, WTO, and US-based are pushed close to these words. Such a word-particle graph presents an intuitive result for evaluating the training of word embeddings; both during and after training, we can intervene in the positions of word-particles to improve the final result in a data-visualization-based way.
On the other hand, we use (3) to estimate the average velocity of word-particles, evaluating the training process from a statistical viewpoint. When the velocity decreases below 1, convergence is assumed to be happening,
Figure 5: Force-directed embedding for Reuters 21578, after 1,000, 5,000, and 10,000 documents.
and the assumption coincides roughly with reality. We ran the experiment 50 times for each of the two datasets. The result is shown as the boxplots in Figure 7. From the downward trends of average velocity, the assumption that a word-semantic network will be stabilized by a certain number of similar documents roughly coincides with the results on the two datasets. For both datasets, convergence appears at the training stage around the 20,000th document.
In the presented word embedding learning, there is no specific convergence criterion for terminating the training of the neural-based models, because their objective functions are nonconvex, so an optimum value may not exist theoretically. Empirically, these word embedding learning methods require repeating the documents 20 or 30 times [3]. This brings a serious problem: time consumption is proportional to the number of documents, and such a procedure usually requires a long training time [3, 30]. In our proposed model, we demonstrate that a semantic convergence condition is more convenient to select than in those neural-based approaches. The convergence criterion provides a more explicit direction for word embedding learning. Meanwhile, the result demonstrates that learning an acceptable word embedding requires a certain number of documents; small or medium scale datasets may not be appropriate.
5 Reuters 21578 and RCV1
To test usability, we compare our word embedding with the other three word embeddings on the NLP tasks, using the same type of neural linguistic model. The tests are performed on POS, NER, and SRL. The results are listed in Tables 2 and 3. On Reuters 21578, the labeling system using our word embedding obtains F1 scores of 91.0% on POS, 83.4% on NER, and 67.3% on SRL. This F1 score takes third place in POS and second place on both NER and SRL. On RCV1, it
Figure 6: Force-directed embedding for RCV1.
achieves F1 scores of 89.1% on POS, 82.1% on NER, and 65.9% on SRL. These F1 scores take second place in POS, third place in NER, and second place in SRL.
The performance of the proposed word embedding is close to the best results in [3], but its dimensionality is two, which is far less than the 50- or 100-dimensional word embeddings [3, 43]. This brings the benefit of reducing the number of neural cells needed to perform NLP tasks with such linguistic models. To implement these NLP tasks, we construct a 3-layer feedforward neural network with a 5-cell input layer, a 100-cell middle layer, and a 25-cell output layer. To utilize the compared word embeddings, the dimensionality of the input vector is set to 500, because all of them are 100-dimensional embeddings, but our corresponding input layer requires only a 10-dimensional vector. The structure of the model is simplified,
Figure 7: The y-axis represents the velocity of a word-particle; the x-axis represents the number of documents (1,000 to 25,000). Both panels, (a) Reuters 21578 and (b) RCV1, show downward trends.
Table 2: Comparison of POS/NER/SRL on Reuters 21578 (values in %, given as POS/NER/SRL).

Method — Precision — Recall — F1
Force-directed — 92.5/84.1/70.2 — 89.5/82.7/64.6 — 91.0/83.4/67.3
Huang 2012 — 94.2/86.2/74.8 — 93.8/86.8/74.1 — 94.0/86.5/74.4
C&W — 93.6/82.4/67.8 — 92.8/81.5/65.8 — 93.2/81.9/66.8
M&S-Turian — 91.4/82.1/63.6 — 86.2/75.2/62.5 — 88.7/78.5/63.0
Table 3: Comparison of POS/NER/SRL on RCV1 (values in %, given as POS/NER/SRL).

Method — Precision — Recall — F1
Force-directed — 90.1/82.7/66.3 — 88.2/81.5/65.5 — 89.1/82.1/65.9
Huang 2012 — 88.4/83.2/71.3 — 90.7/84.6/70.3 — 89.5/83.9/70.8
C&W — 88.5/83.5/64.6 — 85.2/82.6/63.1 — 86.8/83.0/63.8
M&S-Turian — 90.8/80.5/63.6 — 90.3/73.9/65.7 — 90.5/77.1/64.6
which can reduce the complexity of the neural networks. This advantage will improve the performance of such models, for example, reducing training time and improving labeling speed. The result also demonstrates that learning a group of word embeddings need not be high dimensional or depend on neural-network-based approaches. It means that word representation learning and the construction of the task system can be decomposed into two individual parts; the two-step framework can achieve the same goal as the all-in-one models [3].
6 Conclusions and Future Work
In this paper, we propose a force-directed method that uses a fracture mechanics model to learn word embeddings. The result demonstrates that the physical simulation approach is feasible. It improves on the traditional NLM-based approaches in terms of parameter training and task performance (POS, NER, and SRL). Future work is as follows: the model will be improved to suit streaming data, using a one-step solution for predicting the coordinates of word-particles, which will improve the performance of our system; and the properties of word-particles will be packaged in the GEXF file format (Graph Exchange XML Format), which provides a capability for data sharing across multiple data-visualization tools, for example, Gephi.
Competing Interests
The authors declare that there is no conflict of interests regarding the publication of this manuscript.
References
[1] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, "A neural probabilistic language model," Journal of Machine Learning Research, vol. 3, pp. 1137–1155, 2003.

[2] S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning, "A large annotated corpus for learning natural language inference," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '15), 2015.

[3] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, "Natural language processing (almost) from scratch," Journal of Machine Learning Research, vol. 3, no. 12, pp. 2493–2537, 2011.

[4] J. Turian, L. Ratinov, Y. Bengio, and D. Roth, "A preliminary evaluation of word representations for named-entity recognition," in Proceedings of the NIPS Workshop on Grammar Induction, Representation of Language and Language Learning, 2009.

[5] S. R. Bowman, C. Potts, and C. D. Manning, "Learning distributed word representations for natural logic reasoning," in Proceedings of the AAAI Spring Symposium on Knowledge Representation and Reasoning, Stanford, Calif, USA, March 2015.

[6] B. Fortuna, M. Grobelnik, and D. Mladenic, "Visualization of text document corpus," Informatica, vol. 29, no. 4, pp. 497–502, 2005.

[7] F. Morin and Y. Bengio, "Hierarchical probabilistic neural network language model," in Proceedings of the International Workshop on Artificial Intelligence and Statistics (AISTATS '05), vol. 5, pp. 246–252, Barbados, Caribbean, 2005.

[8] R. Navigli and S. P. Ponzetto, "BabelNet: building a very large multilingual semantic network," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 216–225, 2010.
[9] P. Wang, Y. Qian, F. K. Soong, L. He, and H. Zhao, "Learning distributed word representations for bidirectional LSTM recurrent neural network," in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL '16), San Diego, Calif, USA, June 2016.

[10] A. Mnih and G. Hinton, "Three new graphical models for statistical language modelling," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 641–648, June 2007.

[11] Z. Li, H. Zhao, C. Pang, L. Wang, and H. Wang, A Constituent Syntactic Parse Tree Based Discourse Parser, CoNLL-2016 Shared Task, Berlin, Germany, 2016.

[12] Z. Zhang, H. Zhao, and L. Qin, "Probabilistic graph-based dependency parsing with convolutional neural network," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL '16), pp. 1382–1392, Association for Computational Linguistics, Berlin, Germany, August 2016.

[13] R. Wang, M. Utiyama, I. Goto, E. Sumita, H. Zhao, and B.-L. Lu, "Converting continuous-space language models into N-gram language models with efficient bilingual pruning for statistical machine translation," ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 15, no. 3, pp. 1–26, 2016.

[14] G. Mesnil, A. Bordes, J. Weston, G. Chechik, and Y. Bengio, "Learning semantic representations of objects and their parts," Machine Learning, vol. 94, no. 2, pp. 281–301, 2014.

[15] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.

[16] G. Salton, A. Wong, and C. S. Yang, "Vector space model for automatic indexing," Communications of the ACM, vol. 18, no. 11, pp. 613–620, 1975.

[17] E. D'hondt, S. Verberne, C. Koster, and L. Boves, "Text representations for patent classification," Computational Linguistics, vol. 39, no. 3, pp. 755–775, 2013.

[18] C. Blake and W. Pratt, "Better rules, fewer features: a semantic approach to selecting features from text," in Proceedings of the IEEE International Conference on Data Mining (ICDM '01), pp. 59–66, IEEE, San Jose, Calif, USA, 2001.

[19] M. Mitra, C. Buckley, A. Singhal, and C. Cardie, "An analysis of statistical and syntactic phrases," in Proceedings of the 5th International Conference on Computer-Assisted Information Searching on Internet (RIAO '97), pp. 200–214, Montreal, Canada, 1997.

[20] J. Turian, L. Ratinov, and Y. Bengio, "Word representations: a simple and general method for semi-supervised learning," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 384–394, July 2010.

[21] M. Sahlgren, "Vector-based semantic analysis: representing word meanings based on random labels," in Proceedings of the Semantic Knowledge Acquisition and Categorisation Workshop at the European Summer School in Logic, Language and Information (ESSLLI XIII '01), Helsinki, Finland, August 2001.

[22] M. Sahlgren, The Word-Space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations between Words in High-Dimensional Vector Spaces, Stockholm University, Stockholm, Sweden, 2006.

[23] P. Cimiano, A. Hotho, and S. Staab, "Learning concept hierarchies from text corpora using formal concept analysis," Journal of Artificial Intelligence Research, vol. 24, no. 1, pp. 305–339, 2005.

[24] A. Kehagias, V. Petridis, V. G. Kaburlasos, and P. Fragkou, "A comparison of word- and sense-based text categorization using several classification algorithms," Journal of Intelligent Information Systems, vol. 21, no. 3, pp. 227–247, 2003.

[25] M. Rajman and R. Besancon, "Stochastic distributional models for textual information retrieval," in Proceedings of the 9th Conference of the Applied Stochastic Models and Data Analysis (ASMDA '99), pp. 80–85, 1999.

[26] G. E. Hinton, "Learning distributed representations of concepts," in Proceedings of the 8th Annual Conference of the Cognitive Science Society, pp. 1–12, 1986.

[27] H. Ritter and T. Kohonen, "Self-organizing semantic maps," Biological Cybernetics, vol. 61, no. 4, pp. 241–254, 1989.

[28] T. Honkela, V. Pulkki, and T. Kohonen, "Contextual relations of words in Grimm tales, analyzed by self-organizing map," in Proceedings of the Hybrid Neural Systems, 1995.

[29] T. Honkela, "Self-organizing maps of words for natural language processing applications," in Proceedings of the International ICSC Symposium on Soft Computing, 1997.

[30] T. Mikolov, S. W. Yih, and G. Zweig, "Linguistic regularities in continuous space word representations," in Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT '13), 2013.

[31] W. Xu and A. Rudnicky, "Can artificial neural networks learn language models?" in Proceedings of the International Conference on Statistical Language Processing, pp. 1–13, 2000.

[32] M. I. Mandel, R. Pascanu, D. Eck et al., "Contextual tag inference," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7S, no. 1, 2011.

[33] A. Bordes, X. Glorot, J. Weston, and Y. Bengio, "Joint learning of words and meaning representations for open-text semantic parsing," Journal of Machine Learning Research, vol. 22, pp. 127–135, 2012.

[34] F. Huang, A. Ahuja, D. Downey, Y. Yang, Y. Guo, and A. Yates, "Learning representations for weakly supervised natural language processing tasks," Computational Linguistics, vol. 40, no. 1, pp. 85–120, 2014.

[35] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.

[36] A. Mnih and G. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS '08), pp. 1081–1088, December 2008.

[37] G. Mesnil, Y. Dauphin, X. Glorot et al., "Unsupervised and transfer learning challenge: a deep learning approach," Journal of Machine Learning Research, vol. 27, no. 1, pp. 97–110, 2012.

[38] S. G. Kobourov, "Spring embedders and force directed graph drawing algorithms," in Proceedings of the ACM Symposium on Computational Geometry, Chapel Hill, NC, USA, June 2012.

[39] M. J. Bannister, D. Eppstein, M. T. Goodrich, and L. Trott, "Force-directed graph drawing using social gravity and scaling," in Graph Drawing, W. Didimo and M. Patrignani, Eds., vol. 7704 of Lecture Notes in Computer Science, pp. 414–425, 2013.

[40] A. Efrat, D. Forrester, A. Iyer, S. G. Kobourov, C. Erten, and O. Kilic, "Force-directed approaches to sensor localization," ACM Transactions on Sensor Networks, vol. 7, no. 3, article 27, 2010.

[41] T. Chan, J. Cong, and K. Sze, "Multilevel generalized force-directed method for circuit placement," in Proceedings of the International Symposium on Physical Design (ISPD '05), pp. 185–192, Santa Rosa, Calif, USA, April 2005.

[42] E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng, "Improving word representations via global context and multiple word prototypes," in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL '12), pp. 873–882, July 2012.

[43] T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur, "Recurrent neural network based language model," in Proceedings of INTERSPEECH, 2010.
recurrent neural network (RNN) [8] and restricted Boltzmann machine (RBM) [10], have also suffered from high learning complexity [11], sophisticated preferences [12], and the curse of dimensionality [13].
In this article, we present a novel method for word embedding learning that can reduce the high dimensionality inherent in the traditional NLM. We assume that the generation of word embeddings can be viewed as particle movement on a plane: particles that are close represent words with similar meanings, whereas particles that are distant represent words that are far apart in meaning. To simulate text semantics as correctly as possible, a fracture mechanics model is presented to control the generating process of word embeddings. We aim to provide an effective, intuitive approach for learning a 2-dimensional word vector that is applicable to general natural language tasks. In particular, we omit homonymy and polysemy to keep the consistency of word representation; in this setting, each word corresponds to a single vector. The generating model based on neural networks is substituted by a particle system, which can simulate the process of semantic correlation between words.
Our specific contributions are as follows
(i) We propose a force-directed model based on fracture mechanics to generate word embeddings. A linear elastic fracture model is introduced to control the varying progress of word semantics.

(ii) We use a language model based on the feedforward NLM to experiment with the word embedding on the tasks of POS, NER, and SRL, where the word embedding is the input of the language model.

(iii) The coordinates of the 2-dimensional word embedding can be used for word visualization, facilitating intuitive observation of the degree of relation among words.
The next section describes related work regarding word numerical representation. Section 3 introduces our methodology. Sections 4 and 5 give the results and discussion. Section 6 concludes and outlines possible future work.
2 Background
2.1. Text Representation by Word Embedding. Choosing an appropriate data representation can facilitate the performance of a machine learning algorithm. The related methods have developed to the level of automatic feature selection according to the requirements of applications [14]. As a branch of machine learning, representation learning [15] has gradually become active in some famous communities, especially for investigating knowledge extraction from raw datasets. Text representation in natural linguistic tasks can be divided into the three levels of corpus [16, 17], paragraph [18, 19], and word [20–22]. This paper focuses on representation learning for words. Feature words and context have been considered the foundation for text representation learning and the construction of NLP systems [23–25]. We follow this direction, aiming at mapping word text to a vector space. Compared with the representation levels of paragraph and corpus, the word is the finer granularity of semantics and is more suitable to be analyzed by a vector distance.
The idea of using a vector space to map word meaning was initially proposed in [26]. The earlier representation learning failed to consider semantic measurement of the differences between words and emphasized how to quantize feature words. For example, in [27], to represent the word "dove," the first byte of the corresponding representation denotes the property of its shape, which is set to "1" if the dove satisfies some conditions. That representation learning did not focus on the context feature but presented some approaches for measuring the differences regarding word semantics. The self-organizing map (SOM) [27] is employed to compute word vector distance, which uses the length to represent the neighboring degree of word meanings. Based on SOM, [28] maps high-frequency cowords to a 90-dimensional vector space. The investigations [29, 30] of SOM-based word vectors apply them in the fields of NLP and data mining.
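The vector-distance idea above is straightforward to state in code. The sketch below uses Euclidean distance on hypothetical 2-D coordinates; the words and positions are invented for illustration.

```python
import math

def euclidean(u, v):
    """Distance between two word vectors; smaller means closer in meaning."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical 2-D embeddings: "dove" and "pigeon" should sit close,
# while an unrelated word sits far away.
dove, pigeon, tank = (1.0, 2.0), (1.2, 2.1), (8.0, -3.0)
print(euclidean(dove, pigeon) < euclidean(dove, tank))  # True
```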
2.2. Generation of Word Embedding. Word representation has started to integrate measuring approaches for text semantics with the development of neural networks in probabilistic linguistic models. Reference [31] proposed to use a neural network to build a language model. There exists an assumption that the numerical results of similar-meaning words should be placed closely, whereas the results for distant-meaning words should be placed far away. Meanwhile, the probability function is introduced to add a probability output to the NLM, which can give a statistical result for estimating a given n-gram word combination. Among the proposed models, a convolutional neural network [3], tuned somewhat by hand after learning, obtains an accuracy of 97.20% in POS, 93.65% in chunking, 88.67% in NER, and 74.15% in SRL. A comparison between conditional restricted Boltzmann machine models and support vector machines [32] is performed for a music annotator system, which is based on the context around lyrics and can use cooccurrence and sequential features to improve the accuracy of labeling. The SENNA software [3, 33] performs word sense disambiguation with an accuracy of 72.35%. Word representation is denoted by multiple vectors [4] to express polysemy, which is tested on NER and shows that it rivals the traditional linguistic models. In addition, word representation learning has been trained based on n-gram models, hidden Markov models, and a partial-lattice Markov random field model [34].
For representation learning by NLM, [1] proposes a feedforward neural network to train the word embedding, which is regarded as internal parameters requiring tuning following the objective function. Reference [35] presents a text sequence learning model, called RNN, that is capable of capturing local context features in a sequence of interest. Following this route, more machine learning algorithms have been introduced to improve the weaknesses of natural linguistic tasks. Reference [10] uses the restricted Boltzmann machine to improve the performance of the sequential probability function. Reference [36] creates a hierarchical learning that
Table 1: Properties of word-particle.

Name            Notes
ID              Identifier
Pos             Coordinates (word embedding)
Word            Corresponding word
Mass            Synthetic aliveness
TmpMass         All backward-related particle aliveness
history_Mass    Historical aliveness
Current_Mass    Current aliveness
base_Mass       Basic aliveness
Chain           Backward-related index
Max_flaw        Backward-related degree
Flaw            Current flaw length
Radius          Repulsion radius
Pull_Radius     Tension-start radius
Velocity        Current velocity of particle
represents the semantic relations among words as a tree structure. The deep learning mechanism has also been tried for building a NLM [37].
2.3. Force Simulated Method. Force-directed algorithms are mainly applied in data visualization. Reference [38] compares several force-directed algorithms. Reference [39] uses these methods for analyzing a complex social network, adding gravity to draw the graph of the social network. In some cross-domain applications, wireless sensor networks use them to build layouts [40], and [41] performs electrical circuit layouts automatically based on the force rules.
The reason why we use the force-simulated approach to improve the generation of word embeddings is that the relation between word semantics is relatively stable within a certain number of documents that depict similar topics. This is somewhat like the convergence to a stable energy state in a force-directed algorithm. This inspires us to use the idea of force-related methods to improve the problems of NLM.
3 Methodology
3.1. Word-Particle Model. To map words into a particle system, we must define a mapping rule that specifies which attribute of a particle corresponds to which feature of word semantics. The linking table is shown in Table 1. The table consists of two parts: the left part designates the names of each property of particles, and the right part gives the explanation of the corresponding semantic feature.
We explain them one by one:
(i) ID: each word has a unique identifier relating it to the corresponding particle.

(ii) Pos: this is exactly the attribute that we want to train, also called the word embedding, which actually denotes the coordinates of a particle in a plane.
[Figure 1: An example of the word-particle model, containing the word-particles hello, world, boy, and bye.]
(iii) Mass: we use the concept of particle mass to denote the occurrence frequency of a word.

(iv) TmpMass: we use the temporal mass of a particle to represent the coword frequency.

(v) Current_Mass & history_Mass & base_Mass: they represent the current mass, historical mass, and basic mass of a particle, respectively. A function combining them, used to control the occurrence frequency of a coword within a sampling period, is described as
Mass = base_Mass + history_Mass/2 + Current_Mass/2.  (1)
To control the intensity of the relevance between words, we use the concept of an edge in physics to describe it.
(vi) Chain: an array recording the IDs of the backward-related words.

(vii) Max_flaw: the maximum associating strength of a group of cowords.

(viii) Flaw: the current associating strength of a group of cowords.

(ix) Radius: the repulsion radius that keeps a minimum distance from other word-particles.

(x) Pull_Radius: the pulling radius; if other word-particles break in, they will be pushed away, keeping this radius as their distance from the intruded word-particle.

(xi) Velocity: it defines a semantic shifting trend that can strengthen the relation of two force-affected words.
In Figure 1, the word-particles world and boy are backward related to hello, and the word-particle bye is an isolated word-particle. The intensity of the relation between the word-particles hello and world is stronger than the intensity of the relation between hello and boy. The directed line denotes the direction of word order, where the pointed word-particles can drive their linked word-particles. For example, the coword hello world appears more frequently than the coword hello boy.
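Table 1 and equation (1) can be summarized as a small record type. The field names follow the paper's properties; the default values and example numbers are assumptions for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class WordParticle:
    # A minimal sketch of Table 1; defaults are illustrative assumptions.
    id: int
    word: str
    pos: tuple = (0.0, 0.0)      # 2-D coordinate, i.e., the word embedding
    base_mass: float = 1.0
    history_mass: float = 0.0
    current_mass: float = 0.0
    chain: list = field(default_factory=list)  # IDs of backward-related words
    flaw: float = 0.0            # current flaw length
    max_flaw: float = 0.2
    radius: float = 1.0          # repulsion radius
    pull_radius: float = 20.0    # tension-start radius
    velocity: tuple = (0.0, 0.0)

    def mass(self) -> float:
        # Equation (1): Mass = base_Mass + history_Mass/2 + Current_Mass/2.
        return self.base_mass + self.history_mass / 2 + self.current_mass / 2

hello = WordParticle(id=0, word="hello", history_mass=4.0, current_mass=2.0)
print(hello.mass())  # 4.0
```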
[Figure 2: The semantic relating rule. An input sentence ("The bomb tank by JDAM") is sampled into an objective word-particle (tank) and a backward-related word-particle (bomb); states 1–4 illustrate the movement of the word-particles under the impulse, friction, chain-reaction, and repulsion stages.]
3.2. Semantic Relation Building. Based on the word-particle model, we define the semantic relating rule to control the motion of particles within a given text context. The documents for training play the role of a driving-force source, making words that appear in similar contexts have more opportunities to come together. The procedure is as follows.

Step 1. The word embedding is trained document by document. Each document is sampled sentence by sentence via a 2-gram window. In the 2-gram window, the first word is assumed to be the target object, and the second word is assumed to be the associated object. The assumption means that the associated word-particle will be forced to move towards the target word-particle. The related word-particle is given an impulse, which can drive the word-particle with a certain velocity. The progress is illustrated as state 1 in Figure 2: the word-particle bomb is associated with tank, moving with velocity v.
Step 2. Given an impulse, the word-particle is initialized with a velocity. Meanwhile, it is slowed down by the force of friction until its velocity reduces to zero, provided it does not get within the repulsion radius of its objective word-particle. When the word-particle moves into the repulsion radius of the objective word-particle, it is stopped at the edge, keeping a distance of the repulsion radius from the objective word-particle. This is shown as state 2: a velocity affects the word-particle tank, and the word-particle bomb is affected continuously by the friction force f.
Step 3. During a certain period of document learning, some word-particles will establish relations with other word-particles. We establish a chain-reacting rule to simulate the context feature. The rule specifies that the impulsion transits particle by particle, and the initial energy degrades at each reaction. This passive action simulates the phenomenon that a topic word in a document has more semantics and can be an index for document retrieval. The progress is controlled by (2). Given that m_0 denotes the property Mass of the impacted word-particle and m_i denotes this property of the other word-particles, the relation-building condition is

    i ∈ Chain,  d_i > Pull_Radius,  (2)

where i denotes the ID of the ith word-particle that relates to the object word-particle, and d_i denotes the corresponding distance of the ith word-particle from the object word-particle. The velocity v_{t-1} is updated to v_t via (3) if the word-particle satisfies the condition. This procedure repeats iteratively until the velocity falls to zero. For example, in state 3, the word-particle bomb has two backward-associating particles, fire and plane. Its initial velocity will be decomposed with plane according to (3) if given an impulsion towards tank. But the word-particle fire fails to move, because it is outside the Pull_Radius distance of bomb. The decomposition is delivered repetitively while the velocity fits the condition and is greater than zero:

    m_0 v_{t-1} = (∑_{i ∈ Chain, d_i > Pull_Radius} m_i + m_0) v_t.  (3)
Step 4. We add a repulsion radius to keep the uniqueness of every word-particle, because each word embedding is unique. When a moving word-particle intrudes into the repulsion radius of other particles, it stops and stays at the edge of the affected word-particles, keeping a repulsion-radius distance. The progress is shown as state 4. Generally, the word-relation network is expected to grow stably; we present an inspecting criterion to check the convergence, which is as follows:

    v_t = m_0 v_{t-1} / (lim_{k→∞} ∑^{k}_{i ∈ Chain, d_i > Pull_Radius} m_i + m_0) → 0.  (4)

In (4), the initial velocity tends to zero as the number of associated word-particles increases, that is, v_t → 0. When the movement of an activated particle becomes tiny, such convergence means the property Pos has reached relatively fixed coordinates. This indicates that the word is already situated in a relatively stable semantic network.
3.3. Semantic Relation Separating. To control the growth of word associations, those words that are of low frequency in the 2-gram context should be filtered, whereas those words with a high frequency of relations should be retained. We propose to use linear elastic fracture mechanics to control such filtering. A rope model is presented to represent the coword relation, which can be assumed to contain a flaw, as in a material. The strengthening or weakening of a relation
[Figure 3: The flaw model. A rope of width W containing a flaw of size 2a, under pull force s, represents the relation between bomb and tank.]
between words is controlled via the corresponding flaw size. An illustration is shown in Figure 3.

More formally, let W denote the width of the rope; its value is obtained through the counting of the 2-gram sampling, and its maximum value corresponds to the property Max_flaw. Let 2a denote the size of a flaw, corresponding to the property Flaw, and let s be the pull force, calculated by s = Mass × Velocity. We use the stress-intensity factor K to control the size of a flaw. K can be obtained as follows:

    K = m_relation · v · √(πa · (2a/W)).  (5)

In (5), the variant m_relation corresponds to the property regarding the synthetic occurrence frequency of a word-particle, and v corresponds to the velocity of an activated word-particle. The value of K is in proportion to the size of 2a, which refers to the concept of a flaw. Moreover, the flaw extending speed is denoted by da/dN as

    lg(da/dN) = lg C + n lg ΔK.  (6)

In (6), lg C denotes a compensation constant, and n is a scale factor; da/dN is in proportion to K. The condition is that K will decrease if W goes beyond 2a. When the size of the flaw is up to W, a separation happens in the semantic relation. This means the associated word-particles are no longer affected by the initial impulses generated from their objective word-particles.
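A minimal sketch of the separation rule in (5) and (6), using the parameter values given later in Section 4.2 (lg C = 1, i.e., C = 10, and n = 1); the growth loop and its step size are assumptions for illustration.

```python
import math

def stress_intensity(m_relation, v, a, W):
    # Equation (5): K = m_relation * v * sqrt(pi * a * (2a / W)).
    return m_relation * v * math.sqrt(math.pi * a * (2 * a / W))

def flaw_growth_rate(delta_K, C=10.0, n=1.0):
    # Equation (6) rewritten in exponential form: da/dN = C * delta_K ** n.
    return C * delta_K ** n

def separated(a, W):
    # The relation breaks once the flaw 2a spans the whole rope width W.
    return 2 * a >= W

# A weakly reinforced relation: the flaw grows step by step until the
# rope separates and the coword relation is filtered out.
a, W = 0.01, 1.0
while not separated(a, W):
    a += 0.001 * flaw_growth_rate(stress_intensity(1.0, 2.0, a, W))
print(separated(a, W))  # True
```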
4 Simulations
In this section, we compare the proposed word embedding with three classic NLM-based word embeddings: Huang 2012 [42], C&W [3], and M&S-Turian [20]. Huang 2012 is the word embedding that uses multiple embeddings per word; C&W is the embedding trained by a feedforward neural network; and M&S-Turian is the embedding obtained by an improved RBM training model. The two datasets Reuters 21578 and RCV1 are used for training and evaluation. In Reuters 21578, we extracted 21578 documents from the raw XML format and discarded the original class labels and titles, using only the description section. The RCV1 contains 5000 documents written by 50 authors. Seventy percent of a random sampling of these documents was used to train the word embedding, and the remainder was used to evaluate the NLP tasks of POS, NER, and SRL against the other three types of embedding. All words keep their original forms, and the numbers, symbols, and stop words are kept to be trained together. Those words that are not included in the training corpus are discarded. We regard these tasks as a classification problem and apply a unified feedforward neural linguistic model to perform the tasks. The compared word embeddings are ready-made, but the parameters of the neural networks require training with the corresponding embedding. We use the benchmark provided by [3]; the results regarding the NLP tasks are compared based on this benchmark.
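The 70/30 split described above can be sketched as a random partition (the seed and the document representation are assumptions; the paper does not specify them):

```python
import random

def split_corpus(documents, train_frac=0.7, seed=0):
    """Randomly sample train_frac of the documents for embedding training;
    hold out the rest for the POS/NER/SRL evaluation."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)
    cut = int(len(docs) * train_frac)
    return docs[:cut], docs[cut:]

# RCV1 as described: 5000 documents, 70% for training.
train, test = split_corpus(range(5000))
print(len(train), len(test))  # 3500 1500
```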
4.1. Evaluation. The benchmark is measured in terms of precision, recall, and F1 [3, 20, 42]. Assuming N words are waiting for labeling, there exist N_1 (N_1 ≤ N) words that are labeled correctly and N_2 (N_2 ≤ N) words that are labeled wrong. The value of precision is used to evaluate the accuracy of labeling on POS, NER, and SRL. The precision can be obtained as

    p = N_1 / (N_1 + N_2).  (7)

The value of recall is used to evaluate the coverage of labeling on POS, NER, and SRL. The recall can be calculated as

    r = N_1 / N.  (8)

The F1 is a combined evaluation of precision and recall, as follows:

    F1 = 2pr / (p + r).  (9)
4.2. Training. The parameters of the physical system are set as follows. The coefficient of friction is set to 0.1, the coefficient of gravity is set to 9.8, and the initial velocity is set to 2. The parameters that control semantic separating are set such that Max_flaw is set to 0.2, the initial value of Flaw is 0, Radius is set to 1, and Pull_Radius is set to 20. For controlling the flaw extending speed, lg C is set to 1 and n is set to 1. We demonstrate the training result in terms of the generating procedure of the word graph and the average speed of word-particles. A word graph can give an intuitive visualization for observing a group of word relations. We test a small dataset to simulate the word embedding generation. The result is shown in Figure 4, which contains the names of 19 countries.
In Figure 4(a), all the words appear in the plain physical system and obtain a certain position, because the training documents have given some forces to direct the word arrangement following the context of the original text. But at this stage, some word-particles still have a certain speed and continue to move. Those frequent word-particles, such as China, USA, Germany, and France, have a relatively high speed; they move, pulling their backward-related word-particles (Figures 4(b)–4(d)). For example, China has four backward-related word-particles: Pakistan, India, USA, and UK. Germany has two
[Figure 4: Force-directed embedding for country names. Panels (a)–(f) show the positions of the 19 country-name word-particles at successive stages of training.]
backward-related word-particles, France and Finland. The other, isolated word-particles settle in relatively stable positions with little movement. We can also see that some word-particles overlap each other (Figure 4(d)); for example, India and Pakistan are too close to China, and Canada overlaps USA. At this point the repulsing force starts to act: word-particles that are too close push each other apart until they reach a balanced distance (Figure 4(e)). When the input documents are all about similar topics, the positions of the word-particles no longer vary much, showing a relatively stable topological graph (Figure 4(f)).
The training result on the dataset Reuters 21578 is shown in Figure 5. Each word-particle is marked with a green block. The intensity of the relation between word-particles is represented by a blue line, where a thicker line means a higher coword frequency. The position of each word-particle is a 2-dimensional coordinate, which is exactly the training result: the word embedding. The result shows that the numbers of word relations and of new word-particles grow as the training iterates from 1000 to 10000 documents. The particle system expands its region outward gradually, and the particle distribution takes the form of an ellipse to accommodate more new word-particles.
The training result on RCV1 is shown in Figure 6. The distance-based semantic measurement can be interpreted from several viewpoints. For example, the country and geographical word-particles German, Russian, US, and Beijing are clustered together, and the geography-related word-particles Niagara, WTO, and US-based are pushed close to them. Such a word-particle graph presents an intuitive result for evaluating the training of word embedding; whether during or after training, we can intervene on the position of a word-particle to improve the final result in a data-visualization-based way.
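Because each embedding is just a 2-dimensional coordinate, the distance-based reading of Figure 6 can be reproduced with a plain Euclidean nearest-neighbour query. A minimal sketch; the coordinates below are invented for illustration:

```python
import math

def nearest(word, embedding, k=2):
    """Return the k words closest to `word` by Euclidean distance."""
    wx, wy = embedding[word]
    others = [(math.hypot(x - wx, y - wy), w)
              for w, (x, y) in embedding.items() if w != word]
    return [w for _, w in sorted(others)[:k]]

# Hypothetical 2-D word embeddings (positions of word-particles).
emb = {"US": (0.0, 0.0), "Russian": (1.0, 0.5), "German": (1.5, 0.8),
       "WTO": (0.5, 0.2), "owner": (9.0, 7.0)}

print(nearest("US", emb))  # → ['WTO', 'Russian']
```

The same query works with any distance measure mentioned in the introduction, for example Mahalanobis distance, by swapping out the metric.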
On the other hand, we use (3) to estimate the average velocity of the word-particles, evaluating the training process from a statistical viewpoint. When the velocity decreases below 1, convergence is assumed to have occurred,
Figure 5: Force-directed embedding for Reuters 21578 (snapshots after 1000, 5000, and 10000 training documents).
and this assumption coincides roughly with reality. We ran the experiment 50 times on each of the two datasets; the results are shown as the boxplots in Figure 7. The downward trends of the average velocity roughly confirm, on both datasets, the assumption that a word-semantic network stabilizes after a certain number of similar documents. On both datasets, the training converges at around the 20000th document.
In the existing word embedding learning, there is no specific convergence criterion for terminating the training, because the objective functions of the neural-based models are nonconvex, so an optimum value may not exist theoretically. Empirically, these word embedding learning methods require repeating the documents 20 or 30 times [3], which brings a serious problem: time consumption is proportional to the number of documents, and the procedure usually requires a long training time [3, 30]. In our proposed model, by contrast, we demonstrate that a semantic convergence condition is more convenient to select than in the neural-based approaches; the convergence criterion provides a more explicit direction for word embedding learning. Meanwhile, the result demonstrates that learning an acceptable word embedding requires a certain number of documents, so small or medium-scale datasets may not be appropriate.
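The convergence criterion described here can be sketched as a guard in the training loop. The velocity threshold of 1 follows the text; `update`, standing for one force-directed pass over a document, and the other names are our own scaffolding:

```python
def average_velocity(particles):
    # particles: list of dicts carrying a "velocity" magnitude (cf. Table 1)
    return sum(p["velocity"] for p in particles) / len(particles)

def train(documents, particles, update, threshold=1.0):
    """Feed documents until the semantic network stabilizes."""
    for n, doc in enumerate(documents, 1):
        update(particles, doc)            # one force-directed pass for this document
        if average_velocity(particles) < threshold:
            return n                      # converged after n documents
    return len(documents)                 # ran out of documents before converging
```

Unlike the fixed 20–30 passes of the neural-based methods, the loop stops as soon as the statistic drops below the threshold, which is the "explicit direction" the text refers to.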
5. Reuters 21578 and RCV1
For testing usability, we compare our word embedding with three other word embeddings on the NLP tasks, using the same type of neural linguistic model. The testing items are POS, NER, and SRL, and the results are listed in Tables 2 and 3. On Reuters 21578, the labeling system using our word embedding obtains F1 scores of 91.0% on POS, 83.4% on NER, and 67.3% on SRL, which place it third on POS and second on both NER and SRL. On RCV1, it
Figure 6: Force-directed embedding for RCV1.
achieves F1 scores of 89.1% on POS, 82.1% on NER, and 65.9% on SRL, which place it second on POS, third on NER, and second on SRL.
The performance of the proposed word embedding is close to the best results in [3], but its dimensionality is two, far less than that of the 50- or 100-dimensional word embeddings [3, 43]. This brings a benefit: it reduces the number of neural cells needed to perform NLP tasks with this type of linguistic model. To implement these NLP tasks, we construct a 3-layer feedforward neural network with a 5-cell input layer, a 100-cell middle layer, and a 25-cell output layer. For the compared word embeddings, the number of input values is set to 500, because all of them are 100-dimensional embeddings; our corresponding input layer requires only a 10-dimensional vector. The structure of the model is thus simplified,
Figure 7: The y-axis represents the velocity of a word-particle and the x-axis the number of documents, for (a) Reuters 21578 and (b) RCV1. Both show downward trends.
Table 2: Comparison of POS/NER/SRL on Reuters 21578 (each cell: POS/NER/SRL, %).

Method           Precision         Recall            F1
Force-directed   92.5/84.1/70.2    89.5/82.7/64.6    91.0/83.4/67.3
Huang 2012       94.2/86.2/74.8    93.8/86.8/74.1    94.0/86.5/74.4
C&W              93.6/82.4/67.8    92.8/81.5/65.8    93.2/81.9/66.8
M&S-Turian       91.4/82.1/63.6    86.2/75.2/62.5    88.7/78.5/63.0

Table 3: Comparison of POS/NER/SRL on RCV1 (each cell: POS/NER/SRL, %).

Method           Precision         Recall            F1
Force-directed   90.1/82.7/66.3    88.2/81.5/65.5    89.1/82.1/65.9
Huang 2012       88.4/83.2/71.3    90.7/84.6/70.3    89.5/83.9/70.8
C&W              88.5/83.5/64.6    85.2/82.6/63.1    86.8/83.0/63.8
M&S-Turian       90.8/80.5/63.6    90.3/73.9/65.7    90.5/77.1/64.6
which can reduce the complexity of the neural network. This advantage improves the performance of such models, for example, by reducing the training time and increasing the labeling speed. The result also demonstrates that learning a group of word embeddings need not be high dimensional or depend on neural-network-based approaches: word representation learning and task-system construction can be decomposed into two individual parts, and this two-step framework can achieve the same goal as the all-in-one models [3].
6. Conclusions and Future Work

In this paper, we propose a force-directed method that uses a fracture mechanics model to learn word embeddings. The results demonstrate that the physical simulation approach is feasible; it improves on the procedure of the traditional NLM-based approaches in terms of parameter training and task performance (POS, NER, and SRL). Future work is as follows: the model will be improved to suit streaming data, using a one-step solution for predicting the coordinates of word-particles, which will improve the performance of our system; and the properties of word-particles will be packaged in the GEXF (Graph Exchange XML Format) file format, which provides the capability for data sharing across multiple data-visualization tools, for example, Gephi.
Competing Interests
The authors declare that there are no competing interests regarding the publication of this manuscript.
References
[1] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, "A neural probabilistic language model," Journal of Machine Learning Research, vol. 3, pp. 1137–1155, 2003.
[2] S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning, "A large annotated corpus for learning natural language inference," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '15), 2015.
[3] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, "Natural language processing (almost) from scratch," Journal of Machine Learning Research, vol. 12, pp. 2493–2537, 2011.
[4] J. Turian, L. Ratinov, Y. Bengio, and D. Roth, "A preliminary evaluation of word representations for named-entity recognition," in Proceedings of the NIPS Workshop on Grammar Induction, Representation of Language and Language Learning, 2009.
[5] S. R. Bowman, C. Potts, and C. D. Manning, "Learning distributed word representations for natural logic reasoning," in Proceedings of the AAAI Spring Symposium on Knowledge Representation and Reasoning, Stanford, Calif, USA, March 2015.
[6] B. Fortuna, M. Grobelnik, and D. Mladenic, "Visualization of text document corpus," Informatica, vol. 29, no. 4, pp. 497–502, 2005.
[7] F. Morin and Y. Bengio, "Hierarchical probabilistic neural network language model," in Proceedings of the International Workshop on Artificial Intelligence and Statistics (AISTATS '05), vol. 5, pp. 246–252, Barbados, 2005.
[8] R. Navigli and S. P. Ponzetto, "BabelNet: building a very large multilingual semantic network," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 216–225, 2010.
[9] P. Wang, Y. Qian, F. K. Soong, L. He, and H. Zhao, "Learning distributed word representations for bidirectional LSTM recurrent neural network," in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL '16), San Diego, Calif, USA, June 2016.
[10] A. Mnih and G. Hinton, "Three new graphical models for statistical language modelling," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 641–648, June 2007.
[11] Z. Li, H. Zhao, C. Pang, L. Wang, and H. Wang, A Constituent Syntactic Parse Tree Based Discourse Parser, CoNLL-2016 Shared Task, Berlin, Germany, 2016.
[12] Z. Zhang, H. Zhao, and L. Qin, "Probabilistic graph-based dependency parsing with convolutional neural network," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL '16), pp. 1382–1392, Association for Computational Linguistics, Berlin, Germany, August 2016.
[13] R. Wang, M. Utiyama, I. Goto, E. Sumita, H. Zhao, and B.-L. Lu, "Converting continuous-space language models into N-gram language models with efficient bilingual pruning for statistical machine translation," ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 15, no. 3, pp. 1–26, 2016.
[14] G. Mesnil, A. Bordes, J. Weston, G. Chechik, and Y. Bengio, "Learning semantic representations of objects and their parts," Machine Learning, vol. 94, no. 2, pp. 281–301, 2014.
[15] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.
[16] G. Salton, A. Wong, and C. S. Yang, "A vector space model for automatic indexing," Communications of the ACM, vol. 18, no. 11, pp. 613–620, 1975.
[17] E. D'hondt, S. Verberne, C. Koster, and L. Boves, "Text representations for patent classification," Computational Linguistics, vol. 39, no. 3, pp. 755–775, 2013.
[18] C. Blake and W. Pratt, "Better rules, fewer features: a semantic approach to selecting features from text," in Proceedings of the IEEE International Conference on Data Mining (ICDM '01), pp. 59–66, San Jose, Calif, USA, 2001.
[19] M. Mitra, C. Buckley, A. Singhal, and C. Cardie, "An analysis of statistical and syntactic phrases," in Proceedings of the 5th International Conference on Computer-Assisted Information Searching on the Internet (RIAO '97), pp. 200–214, Montreal, Canada, 1997.
[20] J. Turian, L. Ratinov, and Y. Bengio, "Word representations: a simple and general method for semi-supervised learning," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 384–394, July 2010.
[21] M. Sahlgren, "Vector-based semantic analysis: representing word meanings based on random labels," in Proceedings of the Semantic Knowledge Acquisition and Categorisation Workshop at the European Summer School in Logic, Language and Information (ESSLLI XIII '01), Helsinki, Finland, August 2001.
[22] M. Sahlgren, The Word-Space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations between Words in High-Dimensional Vector Spaces, Stockholm University, Stockholm, Sweden, 2006.
[23] P. Cimiano, A. Hotho, and S. Staab, "Learning concept hierarchies from text corpora using formal concept analysis," Journal of Artificial Intelligence Research, vol. 24, no. 1, pp. 305–339, 2005.
[24] A. Kehagias, V. Petridis, V. G. Kaburlasos, and P. Fragkou, "A comparison of word- and sense-based text categorization using several classification algorithms," Journal of Intelligent Information Systems, vol. 21, no. 3, pp. 227–247, 2003.
[25] M. Rajman and R. Besancon, "Stochastic distributional models for textual information retrieval," in Proceedings of the 9th Conference of the Applied Stochastic Models and Data Analysis (ASMDA '99), pp. 80–85, 1999.
[26] G. E. Hinton, "Learning distributed representations of concepts," in Proceedings of the 8th Annual Conference of the Cognitive Science Society, pp. 1–12, 1986.
[27] H. Ritter and T. Kohonen, "Self-organizing semantic maps," Biological Cybernetics, vol. 61, no. 4, pp. 241–254, 1989.
[28] T. Honkela, V. Pulkki, and T. Kohonen, "Contextual relations of words in Grimm tales analyzed by self-organizing map," in Proceedings of Hybrid Neural Systems, 1995.
[29] T. Honkela, "Self-organizing maps of words for natural language processing applications," in Proceedings of the International ICSC Symposium on Soft Computing, 1997.
[30] T. Mikolov, S. W. Yih, and G. Zweig, "Linguistic regularities in continuous space word representations," in Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT '13), 2013.
[31] W. Xu and A. Rudnicky, "Can artificial neural networks learn language models?" in Proceedings of the International Conference on Statistical Language Processing, pp. 1–13, 2000.
[32] M. I. Mandel, R. Pascanu, D. Eck, et al., "Contextual tag inference," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7S, no. 1, 2011.
[33] A. Bordes, X. Glorot, J. Weston, and Y. Bengio, "Joint learning of words and meaning representations for open-text semantic parsing," Journal of Machine Learning Research, vol. 22, pp. 127–135, 2012.
[34] F. Huang, A. Ahuja, D. Downey, Y. Yang, Y. Guo, and A. Yates, "Learning representations for weakly supervised natural language processing tasks," Computational Linguistics, vol. 40, no. 1, pp. 85–120, 2014.
[35] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
[36] A. Mnih and G. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS '08), pp. 1081–1088, December 2008.
[37] G. Mesnil, Y. Dauphin, X. Glorot, et al., "Unsupervised and transfer learning challenge: a deep learning approach," Journal of Machine Learning Research, vol. 27, no. 1, pp. 97–110, 2012.
[38] S. G. Kobourov, "Spring embedders and force directed graph drawing algorithms," in Proceedings of the ACM Symposium on Computational Geometry, Chapel Hill, NC, USA, June 2012.
[39] M. J. Bannister, D. Eppstein, M. T. Goodrich, and L. Trott, "Force-directed graph drawing using social gravity and scaling," in Graph Drawing, W. Didimo and M. Patrignani, Eds., vol. 7704 of Lecture Notes in Computer Science, pp. 414–425, 2013.
[40] A. Efrat, D. Forrester, A. Iyer, S. G. Kobourov, C. Erten, and O. Kilic, "Force-directed approaches to sensor localization," ACM Transactions on Sensor Networks, vol. 7, no. 3, article 27, 2010.
[41] T. Chan, J. Cong, and K. Sze, "Multilevel generalized force-directed method for circuit placement," in Proceedings of the International Symposium on Physical Design (ISPD '05), pp. 185–192, Santa Rosa, Calif, USA, April 2005.
[42] E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng, "Improving word representations via global context and multiple word prototypes," in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL '12), pp. 873–882, July 2012.
[43] T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur, "Recurrent neural network based language model," in Proceedings of INTERSPEECH, 2010.
Table 1: Properties of a word-particle.

Name           Notes
ID             Identifier
Pos            Coordinates (word embedding)
Word           Corresponding word
Mass           Synthetic aliveness
TmpMass        All backward-related particle aliveness
history_Mass   Historical aliveness
Current_Mass   Current aliveness
base_Mass      Basic aliveness
Chain          Backward-related index
Max_flaw       Backward-related degree
Flaw           Current flaw length
Radius         Repulsion radius
Pull_Radius    Tension-start radius
Velocity       Current velocity of the particle
represents the semantic relation among words as a tree structure. The deep learning mechanism has also been tried for building an NLM [37].
2.3. Force-Simulated Methods. Force-directed algorithms are mainly applied in data visualization. Reference [38] compares several force-directed algorithms. Reference [39] uses these methods to analyze a complex social network, adding gravity to draw the graph of the social network. In cross-domain applications, wireless sensor networks use them to build layouts [40], and [41] performs electrical circuit layout automatically based on force rules.
The reason we use a force-simulated approach to improve the generation of word embedding is that the semantic relations between words are relatively stable within a certain number of documents that depict similar topics. This is somewhat like the convergence to a stable energy state in a force-directed algorithm, which inspired us to use the idea of force-related methods to address the problems of NLMs.
3. Methodology
3.1. Word-Particle Model. To map words into a particle system, we must define a mapping rule that specifies which attribute of a particle corresponds to which feature of word semantics. The linking table is shown in Table 1; its left column gives the name of each particle property, and its right column explains the corresponding semantic feature. We explain them one by one:

(i) ID: each word has a unique identifier relating it to the corresponding particle.

(ii) Pos: this is exactly the attribute we want to train, also called the word embedding; it denotes the coordinates of a particle in a plane.
Figure 1: An example of the word-particle model.
(iii) Mass: we use the concept of particle mass to denote the occurrence frequency of a word.

(iv) TmpMass: we use the temporary mass of a particle to represent the coword frequency.

(v) Current_Mass & history_Mass & base_Mass: these represent the current, historical, and basic mass of a particle, respectively. A function combining them, used to control the occurrence frequency of cowords within a sampling period, is

    Mass = base_Mass + history_Mass/2 + Current_Mass/2. (1)

For controlling the intensity of relevance between words, we use the concept of an edge in physics to describe it.
(vi) Chain: an array recording the IDs of the backward-related words.

(vii) Max_flaw: the maximum associating strength of a coword group.

(viii) Flaw: the current associating strength of a coword group.

(ix) Radius: the repulsion radius, which keeps a minimum distance from other word-particles.

(x) Pull_Radius: the pulling radius; if other word-particles break into it, they are pushed away, keeping the Radius distance from the intruded word-particle.

(xi) Velocity: defines a semantic shifting trend that can strengthen the relation of two force-affected words.
In Figure 1, the word-particles world and boy are backward-related to hello, and the word-particle bye is an isolated word-particle. The relation between the word-particles hello and world is stronger than that between hello and boy. A directed line denotes the word order, where the pointed-to word-particle can drive the word-particles linked to it. For example, the coword hello world appears more frequently than the coword hello boy.
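The property list above maps naturally onto a small data structure. A minimal sketch, assuming Python; the field names follow Table 1, the default values echo Section 4.2, and the rest is our own scaffolding, including the mass synthesis of (1):

```python
from dataclasses import dataclass, field

@dataclass
class WordParticle:
    id: int                       # unique identifier
    word: str                     # corresponding word
    pos: tuple = (0.0, 0.0)       # coordinates = the 2-D word embedding
    base_mass: float = 1.0        # basic aliveness
    history_mass: float = 0.0     # historical aliveness
    current_mass: float = 0.0     # current aliveness
    chain: list = field(default_factory=list)  # IDs of backward-related particles
    max_flaw: float = 0.2         # maximum backward-related degree
    flaw: float = 0.0             # current flaw length (2a)
    radius: float = 1.0           # repulsion radius
    pull_radius: float = 20.0     # tension-start radius
    velocity: float = 0.0         # current velocity magnitude

    @property
    def mass(self):
        # Equation (1): Mass = base_Mass + history_Mass/2 + Current_Mass/2
        return self.base_mass + self.history_mass / 2 + self.current_mass / 2

p = WordParticle(0, "hello", history_mass=2.0, current_mass=4.0)
print(p.mass)  # → 4.0
```

Keeping `pos` as a plain coordinate pair is what makes the trained embedding directly plottable, as in Figures 4–6.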
Figure 2: The semantic relating rule.
3.2. Semantic Relation Building. Based on the word-particle model, we define a semantic relating rule to control the motion of particles within a given text context. The training documents play the role of a driving-force source, giving words that appear in similar contexts more opportunities to come together. The procedure is as follows.
Step 1. The word embedding is trained document by document. Each document is sampled sentence by sentence via a 2-gram window. In the 2-gram window, the first word is taken as the target object and the second word as the associated object; this means the associated word-particle will be forced to move towards the target word-particle. The related word-particle is given an impulse, which drives it with a certain velocity. The progress is illustrated as state 1 in Figure 2: the word-particle bomb is associated with tank, moving with velocity v.
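Step 1's sampling scheme can be sketched as a small generator. Splitting on periods and whitespace is our simplification of the tokenization, which the paper does not specify:

```python
def bigram_pairs(document):
    """Yield (target, associated) word pairs from each sentence by
    scanning a 2-gram window: the first word is the target object,
    the second the associated object that will be pulled towards it."""
    for sentence in document.split("."):
        words = sentence.split()
        for first, second in zip(words, words[1:]):
            yield (first, second)

pairs = list(bigram_pairs("hello world. hello boy"))
# → [("hello", "world"), ("hello", "boy")]
```

Each emitted pair corresponds to one impulse given to the associated word-particle.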
Step 2. Given an impulse, the word-particle is initialized with a velocity. Meanwhile, it is slowed down by the force of friction until its velocity reduces to zero, unless it first enters the repulsion radius of its objective word-particle. When the word-particle moves into the repulsion radius of the objective word-particle, it is stopped at the edge, keeping a distance of the repulsion radius from the objective word-particle. This is shown as state 2: a velocity affects the word-particle tank, and the word-particle bomb is continuously affected by the friction force f.
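A minimal one-dimensional sketch of this motion rule, assuming unit particle mass, simple Euler integration, and the friction and gravity coefficients given later in Section 4.2:

```python
def approach(distance, v0, friction=0.1, gravity=9.8, radius=1.0, dt=0.01):
    """Move an associated particle towards its target from `distance` away.
    Friction decelerates it (deceleration = friction * gravity for unit mass);
    it stops when the velocity reaches 0 or at the target's repulsion radius."""
    v, d = v0, distance
    while v > 0 and d > radius:
        d = max(d - v * dt, radius)   # advance towards the target, clamped at radius
        v -= friction * gravity * dt  # slowed by the friction force
    return d

# Launched fast enough from nearby, the particle is caught at the repulsion radius:
print(approach(distance=1.5, v0=2.0))   # → 1.0
```

With a weak impulse over a long distance, friction wins first and the particle stops short of the target, which is exactly the fire case described in Step 3.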
Step 3. During a certain period of document learning, some word-particles set up relations with other word-particles. We establish a chain-reacting rule to simulate the context feature: the rule passes the impulsion on particle by particle, and the initial energy degrades at each reaction. This passive action simulates the phenomenon that a topic word in a document carries more semantics and can serve as an index for document retrieval. The progress is controlled by (2). Let m_0 denote the property Mass of the impacted word-particle and m_i denote this property of the other word-particles. The relation-building condition is

    i ∈ Chain,    d_i > Pull_Radius, (2)

where i denotes the ID of the ith word-particle related to the object word-particle and d_i denotes the corresponding distance between the ith word-particle and the object word-particle. The velocity v_{t−1} will update to v_t via (3) if the word-particle satisfies the condition; this procedure repeats iteratively until the velocity falls to zero. For example, in state 3 the word-particle bomb has two backward-associated particles, fire and plane. Given an impulsion towards tank, its initial velocity is decomposed with plane according to (3); but the word-particle fire fails to move, because it is outside the Pull_Radius distance of bomb. The decomposition is delivered repeatedly while the velocity satisfies the condition and is greater than zero:

    m_0 · v_{t−1} = (∑^k_{i ∈ Chain, d_i > Pull_Radius} m_i + m_0) · v_t. (3)
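Equation (3) conserves the quantity m_0 · v_{t−1} while sharing it across the impacted particle and its in-range chain members; solving it for v_t gives a one-line update. A sketch (function and argument names are ours):

```python
def decompose_velocity(v_prev, m0, chain_masses):
    """Update rule of (3): the impulse m0 * v_prev is shared with the
    backward-related particles satisfying condition (2), so
    v_t = m0 * v_prev / (sum(chain_masses) + m0)."""
    return m0 * v_prev / (sum(chain_masses) + m0)

# With no related particles the velocity is unchanged; each additional
# particle damps it further, driving v_t towards 0 (cf. equation (4)).
print(decompose_velocity(2.0, 1.0, []))          # → 2.0
print(decompose_velocity(2.0, 1.0, [1.0]))       # → 1.0
print(decompose_velocity(2.0, 1.0, [1.0, 2.0]))  # → 0.5
```

The damping with a growing chain is what makes the convergence criterion of (4) work: as more relations accumulate, each new impulse moves the particle less.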
Step 4. We add a repulsion radius to keep the uniqueness of every word-particle, because each word embedding is unique. When a moving word-particle intrudes into the repulsion radius of another particle, it stops and stays at the edge of the affected word-particle, keeping a distance of the repulsion radius. The progress is shown as state 4. Generally, the word-relation network is expected to grow stably, and we present an inspecting criterion to check the convergence, as follows:

    v_t = m_0 · v_{t−1} / (lim_{k→∞} ∑^k_{i ∈ Chain, d_i > Pull_Radius} m_i + m_0) → 0. (4)

In (4), the initial velocity tends to zero as the number of associated word-particles grows, that is, v_t → 0. When the movement of an activated particle becomes tiny, such convergence means the property Pos has reached relatively fixed coordinates. This indicates that the word is situated in a relatively stable semantic network.
3.3. Semantic Relation Separating. To control the growth of word associations, words with a low frequency in the 2-gram context should be filtered out, whereas words with a high frequency of relations should be retained. We propose to use linear elastic fracture mechanics to control such filtering. A rope model represents the coword relation, which can be treated as a flaw in a type of material. The strengthening or weakening of a relation
Figure 3: The flaw model.
between words is controlled via the corresponding flaw size. An illustration is shown in Figure 3.
More formally, let W denote the width of the rope; its value is obtained by counting 2-gram samples, and its maximum corresponds to the property Max_flaw. Let 2a denote the size of a flaw, corresponding to the property Flaw, and let s be the pull force, calculated as s = Mass × Velocity. We use the stress-intensity factor K to control the size of a flaw; K can be obtained as

    K = m_relation · v · √(πa) · (2a / W). (5)

In (5), the variable m_relation corresponds to the property for the synthetic occurrence frequency of a word-particle, and v corresponds to the velocity of an activated word-particle. The value of K is proportional to the size 2a of the flaw. Moreover, the flaw-extending speed is denoted by da/dN, as

    lg(da/dN) = lg C + n · lg ΔK. (6)

In (6), lg C denotes a compensation constant and n is a scale factor; da/dN is proportional to K. The condition is that K will decrease if W grows beyond 2a. When the size of the flaw reaches W, a separation happens in the semantic relation: the associated word-particles are no longer affected by the initial impulses generated from their objective word-particles.
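The separation test can be sketched as follows. This assumes our reading of (5) as K = m_relation · v · √(πa) · (2a/W) and uses the Section 4.2 settings lg C = 1 and n = 1; all function names are ours:

```python
import math

def stress_intensity(m_relation, velocity, a, width):
    # Equation (5), under the assumed form K = m * v * sqrt(pi*a) * (2a / W);
    # K grows with the half-flaw size a, as the text states.
    return m_relation * velocity * math.sqrt(math.pi * a) * (2 * a / width)

def flaw_growth_rate(delta_k, lg_c=1.0, n=1.0):
    # Paris-law-style equation (6): lg(da/dN) = lg C + n * lg(delta K),
    # i.e. da/dN = 10**lg_c * delta_k**n.
    return 10 ** (lg_c + n * math.log10(delta_k))

def separated(flaw_2a, width):
    # A semantic relation breaks once the flaw 2a spans the rope width W.
    return flaw_2a >= width
```

So a weakly reinforced relation whose flaw keeps growing eventually satisfies `separated` and is pruned, while frequent cowords keep W ahead of 2a.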
4. Simulations
In this section, we compare the proposed word embedding with three classic NLM-based word embeddings: Huang 2012 [42], C&W [3], and M&S-Turian [20]. Huang 2012 is a word embedding that uses multiple embeddings per word, C&W is an embedding trained by a feedforward neural network, and M&S-Turian is an embedding obtained by an improved RBM training model. Two datasets, Reuters 21578 and RCV1, are used for training and evaluation. From Reuters 21578 we extracted 21578 documents from the raw XML format, discarding the original class labels and titles and using only the description section. RCV1 contains 5000 documents written by 50 authors. A random sample of 70 percent of these documents was used to train the word embedding, and the remainder was used to evaluate the NLP tasks of POS, NER, and SRL against the other three types of embedding. All words keep their original forms, and numbers, symbols, and stop words are kept and trained together; words not included in the training corpus are discarded. We treat these tasks as a classification problem and apply a unified feedforward neural linguistic model to perform them. The compared word embeddings are ready-made, but the parameters of the neural networks must be trained with the corresponding embedding. We use the benchmark provided by [3]; the results on the NLP tasks are compared on this basis.
41 Evaluation The benchmark is measured in terms ofprecision recall and 1198651 [3 20 42] Assuming 119873 words arewaiting for labeling there exit 1198731 | 1198731 le 119873 words thatare labeled correctly and 1198732 | 1198732 le 119873 words that arelabeled wrong The value of precision is used to evaluate theaccuracy of labeling on POS NER and SRL The precisioncan be obtained as
119901 =
1198731
1198731+ 1198732
(7)
The value of recall is used to evaluate the coverage oflabeling on POS NER and SRL The recall can be calculatedas
119903 =
1198731
119873
(8)
The 1198651 is a combining evaluation with precision andrecall which are as follows
\[ F_1 = \frac{2pr}{p + r}. \tag{9} \]
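The three measures in (7)-(9) can be computed directly. The following sketch (function and variable names are ours, not the authors' code) assumes N labeled words, of which n1 are labeled correctly and n2 incorrectly:

```python
# Precision, recall, and F1 as defined in (7)-(9).

def precision(n1, n2):
    return n1 / (n1 + n2)          # p = N1 / (N1 + N2)

def recall(n1, n_total):
    return n1 / n_total            # r = N1 / N

def f1_score(p, r):
    return 2 * p * r / (p + r)     # F1 = 2pr / (p + r)
```

For example, the Force-directed POS row of Table 2 (precision 92.5, recall 89.5) yields an F1 of about 91.0, matching the reported value.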
4.2. Training. The parameters of the physical system are set as follows. The coefficient of friction is set to 0.1, the coefficient of gravity to 9.8, and the initial velocity to 2. The parameters that control semantic separating are set such that Max flaw is 0.2, the initial value of Flaw is 0, Radius is 1, and Pull Radius is 20. For controlling the flaw-extending speed, lg C is set to 1 and n is set to 1. We demonstrate the training result in terms of the generating procedure of the word graph and the average speed of word-particles. A word graph gives an intuitive visualization for observing a group of word relations. We use a small dataset to simulate the word embedding generation; the result, containing the names of 19 countries, is shown in Figure 4.
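As a hypothetical illustration (the container, names, and the per-step decay form are our assumptions, not the paper's code), the settings above and the friction slow-down of Step 2 might be organized as:

```python
# Physical-system settings from Section 4.2; key names mirror the paper's
# properties, but the dict layout is our own assumption.
PARAMS = {
    "friction": 0.1,         # coefficient of friction
    "gravity": 9.8,          # coefficient of gravity
    "initial_velocity": 2.0,
    "max_flaw": 0.2,         # Max flaw
    "flaw": 0.0,             # initial Flaw
    "radius": 1.0,           # repulsion Radius
    "pull_radius": 20.0,     # Pull Radius
    "lg_c": 1.0,             # compensation constant in (6)
    "n": 1.0,                # scale factor in (6)
}

def decelerate(v, dt=1.0, params=PARAMS):
    # One friction step: the particle loses mu * g of speed per unit time
    # until it stops (the explicit per-step form is our assumption).
    return max(0.0, v - params["friction"] * params["gravity"] * dt)
```

With these settings, a particle launched at the initial velocity of 2 stops after roughly two time steps.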
In Figure 4(a), all the words appear in the plain physical system and obtain a certain position, because the training documents have exerted forces that arrange the words according to the contexts of the original text. At this stage, some word-particles still move with a certain speed. The frequent word-particles, such as China, USA, Germany, and France, have relatively high speeds; they move while pulling their backward-related word-particles (Figures 4(b)-4(d)). For example, China has four backward-related word-particles: Pakistan, India, USA, and UK; Germany has two
6 Computational Intelligence and Neuroscience
Figure 4: Force-directed embedding for country names. Panels (a)-(f) show the positions of the 19 country-name word-particles at successive stages of training.
backward-related word-particles: France and Finland. The other, isolated word-particles settle into relatively stable positions with few movements. Some word-particles overlay each other (Figure 4(d)); for example, India and Pakistan are too close to China, and Canada overlays USA. The repulsing force starts to act at this point: word-particles that are too close push each other away until they reach a balanced distance (Figure 4(e)). When the input documents all concern similar topics, the positions of the word-particles do not vary much, showing a relatively stable topological graph (Figure 4(f)).
The training result on the Reuters 21578 dataset is shown in Figure 5. Each word-particle is colored with a green block. The intensity of the relation between word-particles is represented by a blue line, where a thicker line means a higher frequency of co-word relation. The position of each word-particle is a 2-dimensional coordinate that is exactly the training result, the word embedding. The result shows that the numbers of word relations and of new word-particles grow as the training iterates from 1,000 to 10,000 documents. The particle system expands outward gradually, and the particle distribution takes the shape of an ellipse to accommodate new word-particles.
The training result on RCV1 is shown in Figure 6. This distance-based semantic measurement can be interpreted from several viewpoints. For example, the country and geographical word-particles German, Russian, US, and Beijing are clustered together, and the geo-related word-particles Niagara, WTO, and US-based are pushed close to them. Such a word-particle graph presents an intuitive result for assessing the training of the word embedding; whether during or after training, we can intervene on the position of a word-particle to improve the final result in a data-visualization-based way.
On the other hand, we use (3) to estimate the average velocity of word-particles, evaluating the training process from a statistical viewpoint. When the velocity decreases below 1, convergence is assumed to have occurred,
Figure 5: Force-directed embedding for Reuters 21578, shown after 1,000, 5,000, and 10,000 training documents.
and this assumption roughly coincides with reality. We run the experiment 50 times for each of the two datasets; the results are shown as boxplots in Figure 7. The downward trends of average velocity on both datasets roughly confirm the assumption that a word-semantic network stabilizes after a certain number of similar documents. For both datasets, convergence appears at the training stage of around the 20,000th document.
In neural-based word embedding learning, there is no specific convergence criterion for terminating the training, because the objective functions of these models are nonconvex, so a theoretical optimum may not exist. Empirically, such word embedding learning methods require repeating the documents 20 or 30 times [3]. This brings a serious problem: time consumption is proportional to the number of documents, and the procedure usually requires a long training time [3, 30]. In our proposed model, we demonstrate that a semantic convergence condition is more convenient to select than in those neural-based approaches; the convergence criterion provides a more explicit direction for word embedding learning. Meanwhile, the result demonstrates that learning an acceptable word embedding requires a certain number of documents; small- or medium-scale datasets may not be appropriate.
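The convergence criterion described above can be implemented as a simple monitor over the particle velocities; a minimal sketch, with the threshold of 1 taken from the text and the function name our own:

```python
# Declare convergence once the average word-particle velocity drops below 1.
def has_converged(velocities, threshold=1.0):
    return sum(velocities) / len(velocities) < threshold
```

Training can then be stopped as soon as the monitor fires, instead of fixing the number of passes over the corpus in advance.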
5. Reuters 21578 and RCV1
To test usability, we compare our word embedding with the other three word embeddings on the NLP tasks, using the same type of neural linguistic model. The tests are performed on POS, NER, and SRL, and the results are listed in Tables 2 and 3. On Reuters 21578, the labeling system using our word embedding obtains an F1 of 91.0 on POS, 83.4 on NER, and 67.3 on SRL. This F1 score takes third place on POS and second place on both NER and SRL. On RCV1, it
Figure 6: Force-directed embedding for RCV1.
achieves an F1 of 89.1 on POS, 82.1 on NER, and 65.9 on SRL, taking second place on POS, third place on NER, and second place on SRL.
The performance of the proposed word embedding is close to the best results in [3], but its dimensionality is two, far less than the 50- or 100-dimensional word embeddings [3, 43]. This brings the benefit of reducing the number of neural cells needed to perform NLP tasks with
such linguistic models. To implement these NLP tasks, we construct a 3-layer feedforward neural network with a 5-cell input layer (one cell per word in the context window), a 100-cell middle layer, and a 25-cell output layer. To utilize the compared word embeddings, the input vector must be 500-dimensional, because all of them are 100-dimensional embeddings; our corresponding input layer requires just a 10-dimensional vector. The structure of the model is simplified,
Figure 7: Average word-particle velocity on (a) Reuters 21578 and (b) RCV1, from 1,000 to 25,000 documents. The y-axis represents the velocity of a word-particle, and the x-axis the number of documents; both show downward trends.
Table 2: Comparison of POS/NER/SRL on Reuters 21578 (values given as POS/NER/SRL, %).

                 Precision         Recall            F1
Force-directed   92.5/84.1/70.2    89.5/82.7/64.6    91.0/83.4/67.3
Huang 2012       94.2/86.2/74.8    93.8/86.8/74.1    94.0/86.5/74.4
C&W              93.6/82.4/67.8    92.8/81.5/65.8    93.2/81.9/66.8
M&S-Turian       91.4/82.1/63.6    86.2/75.2/62.5    88.7/78.5/63.0

Table 3: Comparison of POS/NER/SRL on RCV1 (values given as POS/NER/SRL, %).

                 Precision         Recall            F1
Force-directed   90.1/82.7/66.3    88.2/81.5/65.5    89.1/82.1/65.9
Huang 2012       88.4/83.2/71.3    90.7/84.6/70.3    89.5/83.9/70.8
C&W              88.5/83.5/64.6    85.2/82.6/63.1    86.8/83.0/63.8
M&S-Turian       90.8/80.5/63.6    90.3/73.9/65.7    90.5/77.1/64.6
which reduces the complexity of the neural networks. This advantage improves the performance of such models, for example, by reducing training time and increasing labeling speed. The result also demonstrates that learning a group of word embeddings need not be high-dimensional, nor need it depend on neural network based approaches. Word representation learning and task-system construction can thus be decomposed into two individual parts; this two-step framework can achieve the same goal as the all-in-one models [3].
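The dimensionality saving described above follows directly from how the tagger's input layer is sized; a brief illustration (names are ours, not the authors' code):

```python
# Input width of the 3-layer feedforward tagger: one embedding per word in
# the 5-word context window, concatenated into a single input vector.

def input_width(window_size, embedding_dim):
    # total input units = words in window x dimensions per word
    return window_size * embedding_dim
```

With the compared 100-dimensional embeddings the input layer needs 500 units; with the proposed 2-dimensional embedding it needs only 10.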
6. Conclusions and Future Work
In this paper, we propose a force-directed method that uses a fracture mechanics model to learn word embeddings. The results demonstrate that the physical simulation approach is feasible; it improves on the traditional NLM-based approaches in terms of parameter training and task performance (POS, NER, and SRL). Future work is as follows: the model will be improved to suit streaming data, using a one-step solution for predicting the coordinate of a word-particle, which will improve the performance of our system; and the properties of word-particles will be packaged in the GEXF file format (Graph Exchange XML Format), providing data sharing across multiple data-visualization tools, for example, Gephi.
Competing Interests
The authors declare that there is no conflict of interests regarding the publication of this manuscript.
References
[1] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, "A neural probabilistic language model," Journal of Machine Learning Research, vol. 3, pp. 1137-1155, 2003.
[2] S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning, "A large annotated corpus for learning natural language inference," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '15), 2015.
[3] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, "Natural language processing (almost) from scratch," Journal of Machine Learning Research, vol. 12, pp. 2493-2537, 2011.
[4] J. Turian, L. Ratinov, Y. Bengio, and D. Roth, "A preliminary evaluation of word representations for named-entity recognition," in Proceedings of the NIPS Workshop on Grammar Induction, Representation of Language and Language Learning, 2009.
[5] S. R. Bowman, C. Potts, and C. D. Manning, "Learning distributed word representations for natural logic reasoning," in Proceedings of the AAAI Spring Symposium on Knowledge Representation and Reasoning, Stanford, Calif, USA, March 2015.
[6] B. Fortuna, M. Grobelnik, and D. Mladenic, "Visualization of text document corpus," Informatica, vol. 29, no. 4, pp. 497-502, 2005.
[7] F. Morin and Y. Bengio, "Hierarchical probabilistic neural network language model," in Proceedings of the International Workshop on Artificial Intelligence and Statistics (AISTATS '05), vol. 5, pp. 246-252, Barbados, Caribbean, 2005.
[8] R. Navigli and S. P. Ponzetto, "BabelNet: building a very large multilingual semantic network," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 216-225, 2010.
[9] P. Wang, Y. Qian, F. K. Soong, L. He, and H. Zhao, "Learning distributed word representations for bidirectional LSTM recurrent neural network," in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL '16), San Diego, Calif, USA, June 2016.
[10] A. Mnih and G. Hinton, "Three new graphical models for statistical language modelling," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 641-648, June 2007.
[11] Z. Li, H. Zhao, C. Pang, L. Wang, and H. Wang, A Constituent Syntactic Parse Tree Based Discourse Parser, CoNLL-2016 Shared Task, Berlin, Germany, 2016.
[12] Z. Zhang, H. Zhao, and L. Qin, "Probabilistic graph-based dependency parsing with convolutional neural network," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL '16), pp. 1382-1392, Association for Computational Linguistics, Berlin, Germany, August 2016.
[13] R. Wang, M. Utiyama, I. Goto, E. Sumita, H. Zhao, and B.-L. Lu, "Converting continuous-space language models into N-gram language models with efficient bilingual pruning for statistical machine translation," ACM Transactions on Asian Low-Resource Language Information Processing, vol. 15, no. 3, pp. 1-26, 2016.
[14] G. Mesnil, A. Bordes, J. Weston, G. Chechik, and Y. Bengio, "Learning semantic representations of objects and their parts," Machine Learning, vol. 94, no. 2, pp. 281-301, 2014.
[15] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, 2013.
[16] G. Salton, A. Wong, and C. S. Yang, "A vector space model for automatic indexing," Communications of the ACM, vol. 18, no. 11, pp. 613-620, 1975.
[17] E. D'hondt, S. Verberne, C. Koster, and L. Boves, "Text representations for patent classification," Computational Linguistics, vol. 39, no. 3, pp. 755-775, 2013.
[18] C. Blake and W. Pratt, "Better rules, fewer features: a semantic approach to selecting features from text," in Proceedings of the IEEE International Conference on Data Mining (ICDM '01), pp. 59-66, IEEE, San Jose, Calif, USA, 2001.
[19] M. Mitra, C. Buckley, A. Singhal, and C. Cardie, "An analysis of statistical and syntactic phrases," in Proceedings of the 5th International Conference on Computer-Assisted Information Searching on the Internet (RIAO '97), pp. 200-214, Montreal, Canada, 1997.
[20] J. Turian, L. Ratinov, and Y. Bengio, "Word representations: a simple and general method for semi-supervised learning," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 384-394, July 2010.
[21] M. Sahlgren, "Vector-based semantic analysis: representing word meanings based on random labels," in Proceedings of the Semantic Knowledge Acquisition and Categorisation Workshop at the European Summer School in Logic, Language and Information (ESSLLI XIII '01), Helsinki, Finland, August 2001.
[22] M. Sahlgren, The Word-Space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations between Words in High-Dimensional Vector Spaces, Stockholm University, Stockholm, Sweden, 2006.
[23] P. Cimiano, A. Hotho, and S. Staab, "Learning concept hierarchies from text corpora using formal concept analysis," Journal of Artificial Intelligence Research, vol. 24, no. 1, pp. 305-339, 2005.
[24] A. Kehagias, V. Petridis, V. G. Kaburlasos, and P. Fragkou, "A comparison of word- and sense-based text categorization using several classification algorithms," Journal of Intelligent Information Systems, vol. 21, no. 3, pp. 227-247, 2003.
[25] M. Rajman and R. Besancon, "Stochastic distributional models for textual information retrieval," in Proceedings of the 9th Conference of the Applied Stochastic Models and Data Analysis (ASMDA '99), pp. 80-85, 1999.
[26] G. E. Hinton, "Learning distributed representations of concepts," in Proceedings of the 8th Annual Conference of the Cognitive Science Society, pp. 1-12, 1986.
[27] H. Ritter and T. Kohonen, "Self-organizing semantic maps," Biological Cybernetics, vol. 61, no. 4, pp. 241-254, 1989.
[28] T. Honkela, V. Pulkki, and T. Kohonen, "Contextual relations of words in Grimm tales, analyzed by self-organizing map," in Proceedings of Hybrid Neural Systems, 1995.
[29] T. Honkela, "Self-organizing maps of words for natural language processing applications," in Proceedings of the International ICSC Symposium on Soft Computing, 1997.
[30] T. Mikolov, S. W. Yih, and G. Zweig, "Linguistic regularities in continuous space word representations," in Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT '13), 2013.
[31] W. Xu and A. Rudnicky, "Can artificial neural networks learn language models?" in Proceedings of the International Conference on Statistical Language Processing, pp. 1-13, 2000.
[32] M. I. Mandel, R. Pascanu, D. Eck et al., "Contextual tag inference," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7S, no. 1, 2011.
[33] A. Bordes, X. Glorot, J. Weston, and Y. Bengio, "Joint learning of words and meaning representations for open-text semantic parsing," Journal of Machine Learning Research, vol. 22, pp. 127-135, 2012.
[34] F. Huang, A. Ahuja, D. Downey, Y. Yang, Y. Guo, and A. Yates, "Learning representations for weakly supervised natural language processing tasks," Computational Linguistics, vol. 40, no. 1, pp. 85-120, 2014.
[35] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179-211, 1990.
[36] A. Mnih and G. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS '08), pp. 1081-1088, December 2008.
[37] G. Mesnil, Y. Dauphin, X. Glorot et al., "Unsupervised and transfer learning challenge: a deep learning approach," Journal of Machine Learning Research, vol. 27, no. 1, pp. 97-110, 2012.
[38] S. G. Kobourov, "Spring embedders and force directed graph drawing algorithms," in Proceedings of the ACM Symposium on Computational Geometry, Chapel Hill, NC, USA, June 2012.
[39] M. J. Bannister, D. Eppstein, M. T. Goodrich, and L. Trott, "Force-directed graph drawing using social gravity and scaling," in Graph Drawing, W. Didimo and M. Patrignani, Eds., vol. 7704 of Lecture Notes in Computer Science, pp. 414-425, 2013.
[40] A. Efrat, D. Forrester, A. Iyer, S. G. Kobourov, C. Erten, and O. Kilic, "Force-directed approaches to sensor localization," ACM Transactions on Sensor Networks, vol. 7, no. 3, article 27, 2010.
[41] T. Chan, J. Cong, and K. Sze, "Multilevel generalized force-directed method for circuit placement," in Proceedings of the International Symposium on Physical Design (ISPD '05), pp. 185-192, Santa Rosa, Calif, USA, April 2005.
[42] E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng, "Improving word representations via global context and multiple word prototypes," in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL '12), pp. 873-882, July 2012.
[43] T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur, "Recurrent neural network based language model," in Proceedings of INTERSPEECH, 2010.
Figure 2: The semantic relating rule. An input sentence ("The Bomb Tank By JDAM") drives states 1-4 of the objective word-particle (Tank) and its backward-related word-particles (e.g., Bomb, Fire, Plane, Pilot, Air).
3.2. Semantic Relation Building. Based on the word-particle model, we define the semantic relating rule to control the motion of particles within a given text context. The training documents play the role of a driving-force source, giving words that appear in similar contexts more opportunities to come together. The procedure is as follows.
Step 1. The word embedding is trained document by document. Each document is sampled sentence by sentence via a 2-gram window. In the 2-gram window, the first word is taken as the target object and the second word as the associated object, meaning the associated word-particle is forced to move towards the target word-particle. The associated word-particle is given an impulse, which drives it with a certain velocity. The process is illustrated as state 1 in Figure 2: the word-particle bomb is associated with tank, moving with velocity v.
Step 2. Given an impulse, the word-particle is initialized with a velocity. It is then slowed down by the friction force until its velocity reduces to zero, provided it does not enter the repulsion radius of its objective word-particle. When the word-particle moves into the repulsion radius of the objective word-particle, it stops at the edge, keeping a distance of the repulsion radius from the objective word-particle. This is shown as state 2: a velocity affects the word-particle tank, and the word-particle bomb is continuously affected by the friction force f.
Step 3. During a certain period of document learning, some word-particles build up relations with other word-particles. We establish a chain-reacting rule to simulate the context feature. The rule specifies that the impulsion transits particle by particle, and the initial energy degrades at each reaction. This passive action simulates the phenomenon that a topic word in a document carries more semantics and can serve as an index for document retrieval. The process is controlled by (2). Let m_0 denote the property Mass of the impacted word-particle and m_i this property of the other word-particles. The relation-building condition is

\[ i \in \text{Chain}, \quad d_i > \text{Pull\ Radius}, \tag{2} \]
where i denotes the ID of the ith word-particle related to the object word-particle, and d_i denotes the corresponding distance between the ith word-particle and the object word-particle. The velocity v_{t-1} updates to v_t via (3) if the word-particle satisfies the condition. This procedure repeats iteratively until the velocity falls to zero. For example, in state 3 the word-particle bomb has two backward-associating particles, fire and plane. If an impulsion towards tank is given, its initial velocity is decomposed with plane according to (3); but the word-particle fire fails to move, because it is outside the Pull Radius distance of bomb. The decomposition is delivered repeatedly while the velocity satisfies the condition and remains greater than zero:

\[ m_0 v_{t-1} = \Bigl( \sum_{i \in \text{Chain},\, d_i > \text{Pull Radius}}^{k} m_i + m_0 \Bigr) v_t. \tag{3} \]
Step 4. We add a repulsion radius to preserve the uniqueness of every word-particle, because each word embedding is unique. When a moving word-particle intrudes into the repulsion radius of other particles, it stops and stays at the edge of the affected word-particles, keeping a distance of the repulsion radius. The process is shown as state 4. Generally, the word-relation network is expected to grow stably; we present an inspecting criterion to check convergence, as follows:

\[ v_t = \frac{m_0 v_{t-1}}{\lim_{k \to \infty} \sum_{i \in \text{Chain},\, d_i > \text{Pull Radius}}^{k} m_i + m_0} \longrightarrow 0. \tag{4} \]

In (4), the initial velocity tends to zero as the number of associated word-particles increases, that is, v_t → 0. When the movement of an activated particle becomes tiny, such convergence means the property Pos has reached relatively fixed coordinates, indicating that the word is already situated in a relatively stable semantic network.
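The velocity decomposition of (3) can be sketched as follows. This is a minimal illustration: the function name is ours, and the caller is assumed to have already selected the chained neighbours that satisfy the condition in (2).

```python
# Velocity update of (3): the impulse momentum m0 * v_{t-1} is shared among
# the impacted particle and its chained neighbours, so
# v_t = m0 * v_{t-1} / (sum of neighbour masses + m0).

def decompose_velocity(m0, v_prev, neighbour_masses):
    return m0 * v_prev / (sum(neighbour_masses) + m0)
```

As the number of associated particles grows, v_t tends to zero, which is exactly the convergence behaviour stated in (4).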
3.3. Semantic Relation Separating. To control the growth of word associations, words of low frequency in the 2-gram context should be filtered out, whereas words with high-frequency relations should be retained. We propose to use linear elastic fracture mechanics to control this filtering. A rope model represents the co-word relation, which can be treated as a flaw in a type of material. The strengthening or weakening of a relation
Figure 3: The flaw model. A rope of width W under pull force s contains a flaw of size 2a, representing the relation between the word-particles Tank and Bomb.
between words is controlled via the corresponding flaw size; an illustration is shown in Figure 3.
More formally, let W denote the width of the rope; its value is obtained by counting the 2-gram samples, and its maximum value corresponds to the property Max flaw. Let 2a denote the size of a flaw, corresponding to the property Flaw, and let s be the pull force, calculated as s = Mass × Velocity. We use the stress-intensity factor K to control the size of a flaw; K is obtained as

\[ K = m_{\text{relation}}\, v \sqrt{\pi a}\, \frac{2a}{W}. \tag{5} \]
In (5), the variable m_relation corresponds to the property for the overall occurrence frequency of a word-particle, and v corresponds to the velocity of an activated word-particle. The value of K is in proportion to the flaw size 2a. Moreover, the flaw-extending speed, denoted da/dN, follows

\[ \lg \frac{da}{dN} = \lg C + n \lg \Delta K. \tag{6} \]
In (6), lg C denotes a compensation constant and n is a scale factor; da/dN is in proportion to K. The condition is that K decreases if W exceeds 2a. When the size of the flaw reaches W, a separation happens in the semantic relation, meaning the associated word-particles are no longer affected by the initial impulses generated from their objective word-particles.
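Under the settings of Section 4.2 (lg C = 1, n = 1), the flaw-growth rule of (6) and the separation test can be sketched as follows. The explicit growth-rate form is our reading of the Paris-law relation in (6), and the function names are our own.

```python
import math

# Flaw growth per (6): lg(da/dN) = lg C + n * lg(delta K),
# i.e. da/dN = 10 ** (lg C + n * log10(delta K)).
def flaw_growth_rate(delta_k, lg_c=1.0, n=1.0):
    return 10.0 ** (lg_c + n * math.log10(delta_k))

# A relation separates once the flaw size 2a spans the rope width W.
def is_separated(flaw_2a, width_w):
    return flaw_2a >= width_w
```

Infrequent co-word relations thus see their flaws grow until 2a reaches W and the link breaks, while frequently reinforced relations survive.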
4 Simulations
In this section we compare the proposed word embeddingwith the three classic NLM-based word embeddings Huang2012 [42] CampW [3] andMampS-Turian [20] Huang 2012 is theword embedding that uses multiple embeddings per wordCampW is the embedding that is trained by a feedforwardneural network and MampS-Turian is the embedding that isobtained by an improved RBM training model The twodatasets Reuters 21578 and RCV1 are used for training andevaluation In Reuters 21578 we extracted 21578 documentsfrom the raw XML format and discarded the original classlabels and titles only using the description sectionTheRCV1contains 5000 documents that are written from 50 authors70 percent of random sampling among these documentswas used to train the word embedding and the remainder
was used to evaluate the NLP tasks of POS NER and SRLwith other three types of embedding All words will keeptheir original forms whereas the numbers symbols and stopwords are kept to be trained together Those words that arenot included in training corpus will be discarded We regardthese tasks as a classification problem and apply a unifiedfeedforward neural linguistic model to perform the tasksThe compared word embeddings are readymade but theparameters of the neural networks require to be trained bythe corresponding embedding We use the benchmark thatis provided by [3] The results regarding the NLP tasks arecompared based on this benchmark
41 Evaluation The benchmark is measured in terms ofprecision recall and 1198651 [3 20 42] Assuming 119873 words arewaiting for labeling there exit 1198731 | 1198731 le 119873 words thatare labeled correctly and 1198732 | 1198732 le 119873 words that arelabeled wrong The value of precision is used to evaluate theaccuracy of labeling on POS NER and SRL The precisioncan be obtained as
119901 =
1198731
1198731+ 1198732
(7)
The value of recall is used to evaluate the coverage oflabeling on POS NER and SRL The recall can be calculatedas
119903 =
1198731
119873
(8)
The 1198651 is a combining evaluation with precision andrecall which are as follows
1198651 =
2119901119903
119901 + 119903
(9)
42 Training The parameters of the physical system are setas followsThe coefficient of fiction is set to 01 the coefficientof gravity is set to 98 and the initial velocity is set to 2 Theparameters that control semantic separating are set such thatMax flaw is set to 02 the initial value of Flaw is 0 Radiusis set to 1 and Pull Raduis is set to 20 For controlling theflaw extending speed lg119862 is set to 1 and 119899 is set to 1 Wedemonstrate the training result in terms of the generatingprocedure of word graph and average speed of word-particles A word graph can give an intuitive visualization forobserving a group of word relations We test a small numberof datasets to simulate the word embedding generation Theresult is shown in Figure 4 which contains the names from 19countries
In Figure 4(a) all the words appear in the plain physicalsystem and obtain a certain position because the trainingdocuments had given some forces to direct the word arrang-ing that follows the context in original text But in this stagesome word-particles still have a certain degree of speed tomove Those frequent word-particles such as China USAGermany and France have a relatively high speed they movepulling their backward relating word-particles (Figures 4(b)ndash4(d)) For example China has four backward relating word-particles Pakistan India USA and UK Germany has two
6 Computational Intelligence and Neuroscience
IndiaPakistan
Germany
SwedenThailand
CanadaMexico
Hungary
NewZelandPoland
France
USAUKFinland
Russia
Ukraine
Singapore
Austrilia
China
(a)
IndiaPakistan
Germany
Sweden
Thailand
Canada
Mexico
Hungary
NewZeland
PolandFranceUSA
UK
Finland
Russia
UkraineSingapore
Austrilia
China
(b)
IndiaPakistan
Germany
Sweden
Thailand
CanadaMexico
Hungary
NewZeland
PolandFrance
USA
UK
Finland
Russia
UkraineSingapore
AustriliaChina
(c)
IndiaPakistan
Germany
Sweden
Thailand
CanadaMexico
Hungary
NewZeland
PolandFrance
USA
UK
Finland
Russia
UkraineSingapore
Austrilia
China
(d)
IndiaPakistan
Germany
Sweden
Thailand
CanadaMexico
Hungary
NewZeland
PolandFrance
USA UK
Finland
Russia
UkraineSingapore
AustriliaChina
(e)
IndiaPakistan
Germany
Sweden
Thailand
Canada
Mexico
Hungary
NewZeland
PolandFrance
USA UK
Finland
Russia
UkraineSingapore
AustriliaChina
(f)
Figure 4 Force-directed embedding for country names
backward-relating word-particles, France and Finland. The other, isolated word-particles settle into relatively stable positions with few movements. Some word-particles overlay each other (Figure 4(d)); for example, India and Pakistan are too close to China, and Canada overlays USA. The repulsing force starts to act at this point: word-particles that are too close push each other away until they reach a balanced distance (Figure 4(e)). When the input documents all concern similar topics, the positions of the word-particles do not vary much, showing a relatively stable topological graph (Figure 4(f)).
The training result on the Reuters-21578 dataset is shown in Figure 5. Each word-particle is colored with a green block. The intensity of the relation between word-particles is represented by a blue line, where a thicker line means a higher frequency of co-word relation. The position of each word-particle is a 2-dimensional coordinate, which is exactly the training result: the word embedding. The result shows that the numbers of word relations and new word-particles grow as the training iterates from 1,000 to 10,000 documents. The particle system expands its region outward gradually, and the particle distribution takes the shape of an ellipse to accommodate new word-particles.
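The blue relation lines in Figure 5 are driven by 2-gram (co-word) counts. A minimal sketch of how such counts could be accumulated from a token stream (our own illustration of the bookkeeping, not the authors' code):

```python
from collections import Counter

def coword_counts(tokens):
    """Count adjacent 2-gram word pairs; a higher count corresponds
    to a thicker relation line between two word-particles."""
    pairs = Counter()
    for left, right in zip(tokens, tokens[1:]):
        pairs[(left, right)] += 1
    return pairs

counts = coword_counts(["trade", "deal", "with", "china", "trade", "deal"])
# ("trade", "deal") occurs twice in this toy stream
```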
The training result on RCV1 is shown in Figure 6. The distance-based semantic measurement can be interpreted from several viewpoints. For example, the country and geographical word-particles German, Russian, US, and Beijing are clustered together, and the geo-related word-particles Niagara, WTO, and US-based are pushed close to them. Such a word-particle graph presents an intuitive way to evaluate the training of the word embedding: whether during or after training, we can intervene in the positions of word-particles to improve the final result in a data-visualization-based way.
On the other hand, we use (3) to estimate the average velocity of the word-particles, evaluating the training process from a statistical viewpoint. When the velocity decreases below 1, convergence is assumed to have occurred,
Figure 5: Force-directed embedding for Reuters-21578 after 1,000, 5,000, and 10,000 training documents (axes show 2-D word-particle coordinates).
and this assumption roughly coincides with reality. We run the experiment 50 times for each of the two datasets; the results are shown as boxplots in Figure 7. The downward trends of average velocity roughly confirm, on both datasets, the assumption that a word-semantic network stabilizes after a certain number of similar documents. For both datasets, convergence appears at the training stage around the 20,000th document.
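The convergence test above (average velocity dropping below 1) can be sketched directly; the velocity lists and the threshold value are illustrative, with the threshold of 1 taken from the text.

```python
def average_velocity(velocities):
    """Mean speed over all word-particles (cf. equation (3))."""
    return sum(velocities) / len(velocities)

def has_converged(velocities, threshold=1.0):
    """Training is assumed converged once the average
    word-particle velocity falls below the threshold."""
    return average_velocity(velocities) < threshold

assert not has_converged([2.1, 1.8, 2.4])  # early training: particles still moving
assert has_converged([0.6, 0.9, 0.7])      # stabilized semantic network
```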
In existing word embedding learning there is no specific convergence criterion for terminating training, because the objective functions of the neural-based models are nonconvex, so a theoretical optimum may not exist. Empirically, these word embedding learning methods require repeating the documents 20 or 30 times [3], which brings a serious problem: time consumption is proportional to the number of documents, and the procedure usually requires a long training time [3, 30]. In our proposed model, by contrast, we demonstrate that a semantic convergence condition is more convenient to select than in the neural-based approaches; the convergence criterion provides a more explicit direction for word embedding learning. Meanwhile, the result demonstrates that learning an acceptable word embedding requires a certain number of documents, so small or medium-scale datasets may not be appropriate.
5. Reuters-21578 and RCV1
To test usability, we compare our word embedding with the other three word embeddings on the NLP tasks, using the same type of neural linguistic model. The testing items are POS, NER, and SRL; the results are listed in Tables 2 and 3. On Reuters-21578, the labeling system using our word embedding obtains F1 scores of 91.0 on POS, 83.4 on NER, and 67.3 on SRL. These F1 scores take third place in POS and second place in both NER and SRL. On RCV1, it
Figure 6: Force-directed embedding for RCV1 (word-particles such as German, Russian, US, Beijing, Niagara, WTO, and US-based appear clustered by relatedness).
achieves F1 scores of 89.1 on POS, 82.1 on NER, and 65.9 on SRL. These F1 scores take third place in POS, third place in NER, and second place in SRL.
The performance of the proposed word embedding is close to the best results in [3], but its dimensionality is two, far less than the 50- or 100-dimensional word embeddings [3, 43]. This brings the benefit of reducing the number of neural cells needed when performing NLP tasks with this type of linguistic model. To implement these NLP tasks we construct a 3-layer feedforward neural network with a 5-cell input layer, a 100-cell middle layer, and a 25-cell output layer. To utilize the compared word embeddings, the size of the input vector is set to 500, because all of them are 100-dimensional embeddings (5 cells × 100 dimensions); our corresponding input layer requires only a 10-dimensional vector (5 × 2). The structure of the model is thus simplified,
Figure 7: Average word-particle velocity (y-axis) versus number of training documents (x-axis, 1,000 to 25,000) for (a) Reuters-21578 and (b) RCV1; both show downward trends.
Table 2: Comparison of POS/NER/SRL on Reuters-21578 (each cell: POS/NER/SRL).

                 Precision         Recall            F1
Force-directed   92.5/84.1/70.2    89.5/82.7/64.6    91.0/83.4/67.3
Huang 2012       94.2/86.2/74.8    93.8/86.8/74.1    94.0/86.5/74.4
C&W              93.6/82.4/67.8    92.8/81.5/65.8    93.2/81.9/66.8
M&S-Turian       91.4/82.1/63.6    86.2/75.2/62.5    88.7/78.5/63.0

Table 3: Comparison of POS/NER/SRL on RCV1 (each cell: POS/NER/SRL).

                 Precision         Recall            F1
Force-directed   90.1/82.7/66.3    88.2/81.5/65.5    89.1/82.1/65.9
Huang 2012       88.4/83.2/71.3    90.7/84.6/70.3    89.5/83.9/70.8
C&W              88.5/83.5/64.6    85.2/82.6/63.1    86.8/83.0/63.8
M&S-Turian       90.8/80.5/63.6    90.3/73.9/65.7    90.5/77.1/64.6
which can reduce the complexity of the neural networks. This advantage improves the performance of such models, for example by reducing training time and increasing labeling speed. The result also demonstrates that learning a group of word embeddings need not be high-dimensional or depend on neural-network-based approaches: word representation learning and task-system construction can be decomposed into two individual parts, and this two-step framework can achieve the same goal as the all-in-one models [3].
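The saving from 2-dimensional embeddings can be made concrete. With the 5-word input window and the 100-cell middle and 25-cell output layers described above, the first-layer parameter count shrinks roughly fifteenfold; the sketch below only counts weights (ignoring biases) and is our own back-of-the-envelope illustration, not the authors' implementation.

```python
def feedforward_weight_count(input_dim, hidden_dim=100, output_dim=25):
    """Weights (ignoring biases) in the 3-layer network described
    in the text: input -> 100-cell middle -> 25-cell output."""
    return input_dim * hidden_dim + hidden_dim * output_dim

WINDOW = 5
classic = feedforward_weight_count(WINDOW * 100)  # 100-d embeddings: 500 inputs
ours = feedforward_weight_count(WINDOW * 2)       # 2-d embeddings: 10 inputs
# classic = 52500, ours = 3500: about a 15x reduction
```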
6. Conclusions and Future Work
In this paper we propose a force-directed method that uses a fracture mechanics model to learn word embeddings. The results demonstrate that the physical simulation approach is feasible: it improves on the traditional NLM-based approaches in terms of parameter training and task performance (POS, NER, and SRL). Future work is as follows: the model will be improved to suit streaming data, using a one-step solution for predicting the coordinates of word-particles, which will improve the performance of our system; and the properties of word-particles will be packaged in the GEXF file format (Graph Exchange XML Format), which provides the capability of sharing data across multiple data-visualization tools, for example, Gephi.
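The planned GEXF export can be sketched with a plain XML writer. The node ids, labels, and coordinates below are illustrative; the `viz` namespace carrying node positions is the extension Gephi reads, and in practice a library writer (e.g., networkx's GEXF support) could be used instead.

```python
def to_gexf(positions, edges):
    """Serialize word-particle coordinates and relations as GEXF 1.2
    (illustrative hand-rolled writer, not the authors' exporter)."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<gexf xmlns="http://www.gexf.net/1.2draft" '
        'xmlns:viz="http://www.gexf.net/1.2draft/viz" version="1.2">',
        '<graph defaultedgetype="undirected">',
        '<nodes>',
    ]
    ids = {word: i for i, word in enumerate(positions)}
    for word, (x, y) in positions.items():
        lines.append(f'<node id="{ids[word]}" label="{word}">'
                     f'<viz:position x="{x}" y="{y}" z="0.0"/></node>')
    lines.append('</nodes>')
    lines.append('<edges>')
    for k, (a, b) in enumerate(edges):
        lines.append(f'<edge id="{k}" source="{ids[a]}" target="{ids[b]}"/>')
    lines += ['</edges>', '</graph>', '</gexf>']
    return "\n".join(lines)

doc = to_gexf({"China": (1.0, 2.0), "India": (3.5, 0.5)}, [("China", "India")])
```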
Competing Interests
The authors declare that there are no competing interests regarding the publication of this manuscript.
References
[1] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, "A neural probabilistic language model," Journal of Machine Learning Research, vol. 3, pp. 1137–1155, 2003.
[2] S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning, "A large annotated corpus for learning natural language inference," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '15), 2015.
[3] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, "Natural language processing (almost) from scratch," Journal of Machine Learning Research, vol. 12, pp. 2493–2537, 2011.
[4] J. Turian, L. Ratinov, Y. Bengio, and D. Roth, "A preliminary evaluation of word representations for named-entity recognition," in Proceedings of the NIPS Workshop on Grammar Induction, Representation of Language and Language Learning, 2009.
[5] S. R. Bowman, C. Potts, and C. D. Manning, "Learning distributed word representations for natural logic reasoning," in Proceedings of the AAAI Spring Symposium on Knowledge Representation and Reasoning, Stanford, Calif, USA, March 2015.
[6] B. Fortuna, M. Grobelnik, and D. Mladenic, "Visualization of text document corpus," Informatica, vol. 29, no. 4, pp. 497–502, 2005.
[7] F. Morin and Y. Bengio, "Hierarchical probabilistic neural network language model," in Proceedings of the International Workshop on Artificial Intelligence and Statistics (AISTATS '05), vol. 5, pp. 246–252, Barbados, 2005.
[8] R. Navigli and S. P. Ponzetto, "BabelNet: building a very large multilingual semantic network," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 216–225, 2010.
[9] P. Wang, Y. Qian, F. K. Soong, L. He, and H. Zhao, "Learning distributed word representations for bidirectional LSTM recurrent neural network," in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL '16), San Diego, Calif, USA, June 2016.
[10] A. Mnih and G. Hinton, "Three new graphical models for statistical language modelling," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 641–648, June 2007.
[11] Z. Li, H. Zhao, C. Pang, L. Wang, and H. Wang, A Constituent Syntactic Parse Tree Based Discourse Parser, CoNLL-2016 Shared Task, Berlin, Germany, 2016.
[12] Z. Zhang, H. Zhao, and L. Qin, "Probabilistic graph-based dependency parsing with convolutional neural network," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL '16), pp. 1382–1392, Berlin, Germany, August 2016.
[13] R. Wang, M. Utiyama, I. Goto, E. Sumita, H. Zhao, and B.-L. Lu, "Converting continuous-space language models into N-gram language models with efficient bilingual pruning for statistical machine translation," ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 15, no. 3, pp. 1–26, 2016.
[14] G. Mesnil, A. Bordes, J. Weston, G. Chechik, and Y. Bengio, "Learning semantic representations of objects and their parts," Machine Learning, vol. 94, no. 2, pp. 281–301, 2014.
[15] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.
[16] G. Salton, A. Wong, and C. S. Yang, "A vector space model for automatic indexing," Communications of the ACM, vol. 18, no. 11, pp. 613–620, 1975.
[17] E. D'hondt, S. Verberne, C. Koster, and L. Boves, "Text representations for patent classification," Computational Linguistics, vol. 39, no. 3, pp. 755–775, 2013.
[18] C. Blake and W. Pratt, "Better rules, fewer features: a semantic approach to selecting features from text," in Proceedings of the IEEE International Conference on Data Mining (ICDM '01), pp. 59–66, San Jose, Calif, USA, 2001.
[19] M. Mitra, C. Buckley, A. Singhal, and C. Cardie, "An analysis of statistical and syntactic phrases," in Proceedings of the 5th International Conference on Computer-Assisted Information Searching on Internet (RIAO '97), pp. 200–214, Montreal, Canada, 1997.
[20] J. Turian, L. Ratinov, and Y. Bengio, "Word representations: a simple and general method for semi-supervised learning," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 384–394, July 2010.
[21] M. Sahlgren, "Vector-based semantic analysis: representing word meanings based on random labels," in Proceedings of the Semantic Knowledge Acquisition and Categorisation Workshop at the European Summer School in Logic, Language and Information (ESSLLI XIII), Helsinki, Finland, August 2001.
[22] M. Sahlgren, The Word-Space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations between Words in High-Dimensional Vector Spaces, Stockholm University, Stockholm, Sweden, 2006.
[23] P. Cimiano, A. Hotho, and S. Staab, "Learning concept hierarchies from text corpora using formal concept analysis," Journal of Artificial Intelligence Research, vol. 24, no. 1, pp. 305–339, 2005.
[24] A. Kehagias, V. Petridis, V. G. Kaburlasos, and P. Fragkou, "A comparison of word- and sense-based text categorization using several classification algorithms," Journal of Intelligent Information Systems, vol. 21, no. 3, pp. 227–247, 2003.
[25] M. Rajman and R. Besancon, "Stochastic distributional models for textual information retrieval," in Proceedings of the 9th Conference of the Applied Stochastic Models and Data Analysis (ASMDA '99), pp. 80–85, 1999.
[26] G. E. Hinton, "Learning distributed representations of concepts," in Proceedings of the 8th Annual Conference of the Cognitive Science Society, pp. 1–12, 1986.
[27] H. Ritter and T. Kohonen, "Self-organizing semantic maps," Biological Cybernetics, vol. 61, no. 4, pp. 241–254, 1989.
[28] T. Honkela, V. Pulkki, and T. Kohonen, "Contextual relations of words in Grimm tales, analyzed by self-organizing map," in Proceedings of Hybrid Neural Systems, 1995.
[29] T. Honkela, "Self-organizing maps of words for natural language processing applications," in Proceedings of the International ICSC Symposium on Soft Computing, 1997.
[30] T. Mikolov, S. W. Yih, and G. Zweig, "Linguistic regularities in continuous space word representations," in Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT '13), 2013.
[31] W. Xu and A. Rudnicky, "Can artificial neural networks learn language models?" in Proceedings of the International Conference on Statistical Language Processing, pp. 1–13, 2000.
[32] M. I. Mandel, R. Pascanu, D. Eck, et al., "Contextual tag inference," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7S, no. 1, 2011.
[33] A. Bordes, X. Glorot, J. Weston, and Y. Bengio, "Joint learning of words and meaning representations for open-text semantic parsing," Journal of Machine Learning Research, vol. 22, pp. 127–135, 2012.
[34] F. Huang, A. Ahuja, D. Downey, Y. Yang, Y. Guo, and A. Yates, "Learning representations for weakly supervised natural language processing tasks," Computational Linguistics, vol. 40, no. 1, pp. 85–120, 2014.
[35] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
[36] A. Mnih and G. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS '08), pp. 1081–1088, December 2008.
[37] G. Mesnil, Y. Dauphin, X. Glorot, et al., "Unsupervised and transfer learning challenge: a deep learning approach," Journal of Machine Learning Research, vol. 27, no. 1, pp. 97–110, 2012.
[38] S. G. Kobourov, "Spring embedders and force directed graph drawing algorithms," in Proceedings of the ACM Symposium on Computational Geometry, Chapel Hill, NC, USA, June 2012.
[39] M. J. Bannister, D. Eppstein, M. T. Goodrich, and L. Trott, "Force-directed graph drawing using social gravity and scaling," in Graph Drawing, W. Didimo and M. Patrignani, Eds., vol. 7704 of Lecture Notes in Computer Science, pp. 414–425, 2013.
[40] A. Efrat, D. Forrester, A. Iyer, S. G. Kobourov, C. Erten, and O. Kilic, "Force-directed approaches to sensor localization," ACM Transactions on Sensor Networks, vol. 7, no. 3, article 27, 2010.
[41] T. Chan, J. Cong, and K. Sze, "Multilevel generalized force-directed method for circuit placement," in Proceedings of the International Symposium on Physical Design (ISPD '05), pp. 185–192, Santa Rosa, Calif, USA, April 2005.
[42] E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng, "Improving word representations via global context and multiple word prototypes," in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL '12), pp. 873–882, July 2012.
[43] T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur, "Recurrent neural network based language model," in Proceedings of INTERSPEECH, 2010.
Figure 3: The flaw model (a rope of width W with a flaw of size 2a, under pull force s).
between words is controlled via the corresponding flaw size. An illustration is shown in Figure 3.

More formally, let W denote the width of the rope; its value is obtained by counting a 2-gram sample, and its maximum corresponds to the property Max_flaw. Let 2a denote the size of a flaw, corresponding to the property Flaw, and let s be the pull force, calculated as s = Mass × Velocity. We use the stress-intensity factor K to control the size of the flaw; K is obtained as follows:

K = m_relation · V · √(πa) · (2a / W). (5)

In (5), the variable m_relation corresponds to the property recording the overall occurrence frequency of a word-particle, and V corresponds to the velocity of an activated word-particle. The value of K is proportional to the flaw size 2a. Moreover, the flaw-extending speed is denoted by da/dN:

lg(da/dN) = lg C + n · lg ΔK. (6)

In (6), lg C denotes a compensation constant and n is a scale factor; da/dN is proportional to K. Note that K decreases if W grows beyond 2a. When the size of the flaw reaches W, a separation happens in the semantic relation: the associated word-particles are no longer affected by the initial impulses generated from their objective word-particles.
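Equations (5) and (6) together with the separation condition can be combined into one update step. The sketch below is our reading of the model (with "lg" taken as log base 10, as is conventional); the parameter values are illustrative and the functions are not the authors' code.

```python
import math

def stress_intensity(m_relation, velocity, flaw, width):
    """Equation (5): K = m_relation * V * sqrt(pi * a) * (2a / W),
    where `flaw` is the flaw size 2a and `width` is the rope width W."""
    a = flaw / 2.0
    return m_relation * velocity * math.sqrt(math.pi * a) * (flaw / width)

def flaw_growth(delta_k, lg_c=1.0, n=1.0):
    """Equation (6): lg(da/dN) = lg C + n * lg(delta K); returns da/dN."""
    return 10 ** (lg_c + n * math.log10(delta_k))

def step_flaw(flaw, delta_k, width, lg_c=1.0, n=1.0):
    """Grow the flaw by da/dN; once it reaches the rope width W,
    the semantic relation separates (fracture)."""
    flaw = flaw + flaw_growth(delta_k, lg_c, n)
    return flaw, flaw >= width
```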
4. Simulations
In this section we compare the proposed word embedding with three classic NLM-based word embeddings: Huang 2012 [42], C&W [3], and M&S-Turian [20]. Huang 2012 is a word embedding that uses multiple embeddings per word; C&W is an embedding trained by a feedforward neural network; and M&S-Turian is an embedding obtained by an improved RBM training model. Two datasets, Reuters-21578 and RCV1, are used for training and evaluation. From Reuters-21578 we extracted 21,578 documents from the raw XML format and discarded the original class labels and titles, using only the description section. The RCV1 set contains 5,000 documents written by 50 authors. A random sample of 70 percent of these documents was used to train the word embedding, and the remainder was used to evaluate the NLP tasks of POS, NER, and SRL against the other three types of embedding. All words keep their original forms, and numbers, symbols, and stop words are kept and trained together; words not included in the training corpus are discarded. We regard these tasks as classification problems and apply a unified feedforward neural linguistic model to perform them. The compared word embeddings are ready-made, but the parameters of the neural networks must be trained with the corresponding embedding. We use the benchmark provided by [3], and the results on the NLP tasks are compared on this benchmark.
4.1. Evaluation. The benchmark is measured in terms of precision, recall, and F1 [3, 20, 42]. Assume N words are waiting for labeling, of which N1 (N1 ≤ N) words are labeled correctly and N2 (N2 ≤ N) words are labeled wrongly. The value of precision is used to evaluate the accuracy of labeling on POS, NER, and SRL; it is obtained as

p = N1 / (N1 + N2). (7)

The value of recall is used to evaluate the coverage of labeling on POS, NER, and SRL; it is calculated as

r = N1 / N. (8)

The F1 combines precision and recall as follows:

F1 = 2pr / (p + r). (9)
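Equations (7)–(9) translate directly into code. The example counts below are illustrative; note that under this benchmark N1 + N2 can be less than N when some words receive no label at all.

```python
def precision(n_correct, n_wrong):
    """Equation (7): p = N1 / (N1 + N2)."""
    return n_correct / (n_correct + n_wrong)

def recall(n_correct, n_total):
    """Equation (8): r = N1 / N."""
    return n_correct / n_total

def f1(p, r):
    """Equation (9): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Illustrative numbers: 90 labeled correctly, 10 wrongly, out of 120 words.
p, r = precision(90, 10), recall(90, 120)
# p = 0.9, r = 0.75
```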
[10] A Mnih and G Hinton ldquoThree new graphical models forstatistical language modellingrdquo in Proceedings of the 24th Inter-national Conference on Machine Learning (ICML rsquo07) pp 641ndash648 June 2007
[11] Z Li H Zhao C Pang L Wang and H Wang A ConstituentSyntactic Parse Tree Based Discourse Parser CoNLL-2016Shared Task Berlin Germany 2016
[12] Z Zhang H Zhao and L Qin ldquoProbabilistic graph-baseddependency parsing with convolutional neural networkrdquo inProceedings of the 54th Annual Meeting of the Association forComputational Linguistics (ACL rsquo16) pp 1382ndash1392 Associationfor Computational Linguistics Berlin Germany August 2016
[13] RWangM Utiyama I Goto E Sumita H Zhao and B-L LuldquoConverting continuous-space language models into N-gramlanguage models with efficient bilingual pruning for statisticalmachine translationrdquoACMTransactions onAsian Low-ResourceLanguage Information Process vol 15 no 3 pp 1ndash26 2016
[14] G Mesnil A Bordes J Weston G Chechik and Y BengioldquoLearning semantic representations of objects and their partsrdquoMachine Learning vol 94 no 2 pp 281ndash301 2014
[15] Y Bengio A Courville and P Vincent ldquoRepresentation learn-ing a review and new perspectivesrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1798ndash1828 2013
[16] G Salton A Wong and C S Yang ldquoVector space model forautomatic indexingrdquo Communications of the ACM vol 18 no11 pp 613ndash620 1975
[17] E Drsquohondt S Verberne C Koster and L Boves ldquoText repre-sentations for patent classificationrdquo Computational Linguisticsvol 39 no 3 pp 755ndash775 2013
[18] C Blake and W Pratt ldquoBetter rules fewer features a semanticapproach to selecting features from textrdquo in Proceedings of theIEEE International Conference on Data Mining (ICDM rsquo01) pp59ndash66 IEEE San Jose Calif USA 2001
[19] M Mitra C Buckley A Singhal and C Cardie ldquoAn analysis ofstatistical and syntactic phrasesrdquo in Proceedings of the 5th Inter-national Conference Computer-Assisted Information Searchingon Internet (RIAO rsquo97) pp 200ndash214 Montreal Canada 1997
[20] J Turian L Ratinov and Y Bengio ldquoWord representations asimple and general method for semi-supervised learningrdquo inProceedings of the 48th Annual Meeting of the Association forComputational Linguistics (ACL rsquo10) pp 384ndash394 July 2010
[21] M Sahlgren ldquoVector-based semantic analysis representingword meanings based on random labelsrdquo in Proceedings of theSemantic Knowledge Acquisition and Categorisation Workshopat European Summer School in Logic Language and Information(ESSLLI XIII rsquo01) Helsinki Finland August 2001
[22] M Sahlgren The Word-Space Model Using DistributionalAnalysis to Represent Syntagmatic and Paradigmatic Relationsbetween Words in High-Dimensional Vector Spaces StockholmUniversity Stockholm Sweden 2006
[23] P Cimiano A Hotho and S Staab ldquoLearning concept hierar-chies from text corpora using formal concept analysisrdquo Journal
of Artificial Intelligence Research vol 24 no 1 pp 305ndash3392005
[24] A Kehagias V Petridis V G Kaburlasos and P FragkouldquoA comparison of word- and sense-based text categorizationusing several classification algorithmsrdquo Journal of IntelligentInformation Systems vol 21 no 3 pp 227ndash247 2003
[25] M Rajman and R Besancon ldquoStochastic distributional modelsfor textual information retrievalrdquo in Proceedings of the 9thConference of the Applied Stochastic Models and Data Analysis(ASMDA rsquo99) pp 80ndash85 1999
[26] G E Hinton ldquoLearning distributed representations of con-ceptsrdquo in Proceedings of the 8th Annual Conference of theCognitive Science Society pp 1ndash12 1986
[27] H Ritter and T Kohonen ldquoSelf-organizing semantic mapsrdquoBiological Cybernetics vol 61 no 4 pp 241ndash254 1989
[28] T Honkela V Pulkki and T Kohonen ldquoContextual relationsof words in grimm tales analyzed by self-organizing maprdquo inProceedings of the Hybrid Neural Systems 1995
[29] THonkela ldquoSelf-organizingmaps ofwords for natural languageprocessing applicationrdquo in Proceedings of the International ICSCSymposium on Soft Computing 1997
[30] T Mikolov S W Yih and G Zweig ldquoLinguistic regularitiesin continuous space word representationsrdquo in Proceedings ofthe 2013 Conference of the North American Chapter of theAssociation for Computational Linguistics Human LanguageTechnologies (NAACL-HLT rsquo13) 2013
[31] W Xu and A Rudnicky ldquoCan artificial neural networks learnlanguagemodelsrdquo in Proceedings of the International Conferenceon Statistical Language Processing pp 1ndash13 2000
[32] M I Mandel R Pascanu D Eck et al ldquoContextual taginferencerdquo ACM Transactions on Multimedia Computing Com-munications and Applications (TOMM) vol 7S no 1 2011
[33] A Bordes X Glorot J Weston and Y Bengio ldquoJoint learningof words and meaning representations for open-text semanticparsingrdquo Journal of Machine Learning Research vol 22 pp 127ndash135 2012
[34] F Huang A Ahuja D Downey Y Yang Y Guo and AYates ldquoLearning representations for weakly supervised naturallanguage processing tasksrdquo Computational Linguistics vol 40no 1 pp 85ndash120 2014
[35] J L Elman ldquoFinding structure in timerdquo Cognitive Science vol14 no 2 pp 179ndash211 1990
[36] A Mnih and G Hinton ldquoA scalable hierarchical distributedlanguage modelrdquo in Proceedings of the 22nd Annual Conferenceon Neural Information Processing Systems (NIPS rsquo08) pp 1081ndash1088 December 2008
[37] G Mesnil Y Dauphin X Glorot et al ldquoUnsupervised andtransfer learning challenge a deep learning approachrdquo Journalof Machine Learning Research vol 27 no 1 pp 97ndash110 2012
[38] S G Kobourov ldquoSpring embedders and force directed graphdrawing algorithmsrdquo in Proceedings of the ACM Symposium onComputational Geometry Chapel Hill NC USA June 2012
[39] M J Bannister D Eppstein M T Goodrich and L TrottldquoForce-directed graph drawing using social gravity and scalingrdquoinGraphDrawingWDidimo andM Patrignani Eds vol 7704of Lecture Notes in Computer Science pp 414ndash425 2013
[40] A Efrat D Forrester A Iyer S G Kobourov C Erten and OKilic ldquoForce-directed approaches to sensor localizationrdquo ACMTransactions on Sensor Networks vol 7 no 3 article 27 2010
[41] T Chan J Cong and K Sze ldquoMultilevel generalized force-directed method for circuit placementrdquo in Proceedings of the
Computational Intelligence and Neuroscience 11
International Symposium on Physical Design (ISPD rsquo05) pp 185ndash192 Santa Rosa Calif USA April 2005
[42] E H Huang R Socher C D Manning and A Y Ng ldquoImprov-ing word representations via global context and multiplewordprototypesrdquo in Proceedings of the 50th Annual Meeting of theAssociation for Computational Linguistics (ACL rsquo12) pp 873ndash882 July 2012
[43] T Mikolov M Karafiat L Burget J Cernocky and S Khu-danpur ldquoRecurrent neural network based language modelrdquo inProceedings of the INTERSPEECH 2010
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
6 Computational Intelligence and Neuroscience
Figure 4: Force-directed embedding for country names. Panels (a)-(f) show snapshots of the country-name word-particles (India, Pakistan, Germany, Sweden, Thailand, Canada, Mexico, Hungary, New Zealand, Poland, France, USA, UK, Finland, Russia, Ukraine, Singapore, Australia, and China) settling into stable positions as training proceeds.
backward-relating word-particles France and Finland. The other isolated word-particles have settled into relatively stable positions with only small movements. Some word-particles overlap each other (Figure 4(d)); for example, India and Pakistan are too close, and China and Canada overlay USA. The repulsive force starts to act at this point: word-particles that are too close push each other away until they reach a balanced distance (Figure 4(e)). When the input documents are all about similar topics, the positions of the word-particles do not vary much, showing a relatively stable topological graph (Figure 4(f)).
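The repulsion step described above can be sketched as a pairwise pass over the particle positions. This is a minimal illustration, not the paper's implementation: `balance_dist` and the `step` constant are assumed values, not the authors' tuned parameters.

```python
import math

def repel(positions, balance_dist=1.0, step=0.1):
    """One repulsion pass: push apart word-particles closer than balance_dist."""
    words = list(positions)
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            ax, ay = positions[a]
            bx, by = positions[b]
            dx, dy = ax - bx, ay - by
            d = math.hypot(dx, dy) or 1e-9  # guard against exact overlap
            if d < balance_dist:
                # Displacement grows as the pair gets closer than the balance distance.
                push = step * (balance_dist - d) / d
                positions[a] = (ax + dx * push, ay + dy * push)
                positions[b] = (bx - dx * push, by - dy * push)
    return positions

pos = {"India": (0.0, 0.0), "Pakistan": (0.1, 0.0), "China": (5.0, 5.0)}
repel(pos)  # India and Pakistan move apart; China is already beyond balance_dist
```

Iterating such passes until no pair sits inside the balance distance reproduces the push-until-balance behavior of Figure 4(e).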
The training result for the Reuters 21578 dataset is shown in Figure 5. Each word-particle is marked with a green block, and the intensity of the relation between word-particles is represented by a blue line, where thicker lines mean a higher frequency of co-word relation. The position of each word-particle is a 2-dimensional coordinate, which is exactly the training result, the word embedding. The results show that the numbers of word relations and new word-particles grow as training iterates from 1000 to 10000 documents. The particle system gradually expands its region outward, and the particle distribution takes the shape of an ellipse to accommodate more new word-particles.
The training result for RCV1 is shown in Figure 6. Such distance-based semantic measurement can be interpreted from several viewpoints. For example, the country and geographical word-particles German, Russian, US, and Beijing are clustered together, and the geo-related word-particles Niagara, WTO, and US-based are pushed close to these words. Such a word-particle graph presents an intuitive way to evaluate the training of a word embedding: whether during or after training, we can intervene in the position of a word-particle to improve the final result in a data-visualization-based way.
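Once trained, the distance-based semantic measurement the figure illustrates reduces to a nearest-neighbor lookup in the 2-dimensional space. The coordinates below are hypothetical stand-ins mimicking the clusters of Figure 6, not values from the trained model:

```python
import math

def nearest(embeddings, word, k=2):
    """Rank the other words by Euclidean distance to `word` in a 2-D embedding."""
    wx, wy = embeddings[word]
    others = [(math.hypot(x - wx, y - wy), w)
              for w, (x, y) in embeddings.items() if w != word]
    return [w for _, w in sorted(others)[:k]]

# Hypothetical coordinates: geography-related words clustered, "owner" far away.
emb = {"German": (1.0, 1.0), "Russian": (1.2, 0.9), "US": (0.9, 1.3),
       "Beijing": (1.1, 1.1), "owner": (8.0, -3.0)}
nearest(emb, "German")  # returns the closest cluster members first
```

With a Mahalanobis distance, as mentioned in the introduction, only the distance function would change; the ranking logic stays the same.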
On the other hand, we use (3) to estimate the average velocity of word-particles, evaluating the training process from a statistical viewpoint. When the average velocity decreases below 1, convergence is assumed to have occurred.
Figure 5: Force-directed embedding for Reuters 21578, shown after 1000, 5000, and 10000 training documents; the axes are the two embedding coordinates (roughly -40 to 100).
This assumption roughly coincides with observation. We repeated the experiment 50 times for each of the two datasets; the results are shown as the boxplots in Figure 7. The downward trends of the average velocity roughly confirm, for both datasets, the assumption that a word-semantic network stabilizes after a certain number of similar documents. The training of both datasets converges around the 20000th document.
In neural-based word embedding learning, there is no specific convergence criterion for terminating the training, because the objective functions of these models are nonconvex, so an optimum value may not exist theoretically. Empirically, these word embedding learning methods require repeating the documents 20 or 30 times [3]. This brings a serious problem: time consumption is proportional to the number of documents, and the procedure usually requires a long training time [3, 30]. In our proposed model, by contrast, we demonstrate that setting a semantic convergence condition is more convenient than in those neural-based approaches. The convergence criterion provides a more explicit direction for word embedding learning. Meanwhile, the results demonstrate that learning an acceptable word embedding requires a certain number of documents; small or medium scale datasets may not be appropriate.
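The velocity-based stopping rule can be sketched as a small check on the history of average velocities. The threshold of 1 comes from the text; sampling every 1000 documents and requiring the criterion to hold over a short window are assumptions added here to make the rule robust to noise:

```python
def converged(avg_velocities, threshold=1.0, window=3):
    """Stop training once the average word-particle velocity stays below
    `threshold` for `window` consecutive measurements."""
    recent = avg_velocities[-window:]
    return len(recent) == window and all(v < threshold for v in recent)

history = [2.4, 1.8, 1.3, 0.9, 0.8, 0.7]  # e.g. sampled every 1000 documents
converged(history)  # True: the last three averages are all below 1
```

This is the kind of explicit termination condition the neural-based approaches lack, since their nonconvex objectives offer no comparable signal.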
5 Reuters 21578 and RCV1
To test usability, we compare our word embedding with three other word embeddings on NLP tasks, using the same type of neural linguistic model. The test items are POS, NER, and SRL, and the results are listed in Tables 2 and 3. On Reuters 21578, the labeling system using our word embedding obtains F1 scores of 91.0 on POS, 83.4 on NER, and 67.3 on SRL; this takes the third place in POS and the second place on both NER and SRL. On RCV1, it
Figure 6: Force-directed embedding for RCV1. Country- and geography-related word-particles (German, Russian, US, Beijing, WTO, Niagara, US-based) cluster together, while unrelated word-particles such as owner, metal, and title sit apart.
achieves F1 scores of 89.1 on POS, 82.1 on NER, and 65.9 on SRL; these take the second place in POS, the third place in NER, and the second place in SRL.
The performance of the proposed word embedding is close to the best results in [3], but its dimensionality is two, far less than the 50- or 100-dimensional word embeddings [3, 43]. This brings the benefit of reducing the number of neural cells when performing NLP tasks with such linguistic models. To implement these NLP tasks, we construct a 3-layer feedforward neural network with a 5-cell input layer, a 100-cell middle layer, and a 25-cell output layer. To utilize the compared word embeddings, the number of input units is set to 500, because all of them are 100-dimensional embeddings; our corresponding input layer requires only a 10-dimensional vector. The structure of the model is thus simplified.
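The parameter saving can be made concrete by counting first-layer weights. The layer sizes (5-word context, 100-cell middle layer) come from the text; treating the input width as context size times embedding dimension, and ignoring bias terms, are assumptions of this sketch:

```python
def first_layer_weights(embedding_dim, context=5, hidden=100):
    """Input-to-hidden weight count for the 3-layer feedforward tagger:
    (context words x embedding dimension) inputs, fully connected to `hidden` cells."""
    return context * embedding_dim * hidden

first_layer_weights(100)  # 50000 weights for 100-dim embeddings (500-unit input)
first_layer_weights(2)    # 1000 weights for the 2-dim embedding (10-unit input)
```

Under these assumptions the 2-dimensional embedding cuts the first-layer weight count by a factor of 50, which is the source of the faster training and labeling claimed below.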
Figure 7: The y-axis represents the average velocity of a word-particle and the x-axis the number of documents (1000 to 25000), for (a) Reuters 21578 and (b) RCV1. Both panels show downward trends.
Table 2: Comparison of POS/NER/SRL on Reuters 21578 (each cell lists POS/NER/SRL, %).

Method            Precision          Recall             F1
Force-directed    92.5/84.1/70.2     89.5/82.7/64.6     91.0/83.4/67.3
Huang 2012        94.2/86.2/74.8     93.8/86.8/74.1     94.0/86.5/74.4
C&W               93.6/82.4/67.8     92.8/81.5/65.8     93.2/81.9/66.8
M&S-Turian        91.4/82.1/63.6     86.2/75.2/62.5     88.7/78.5/63.0

Table 3: Comparison of POS/NER/SRL on RCV1 (each cell lists POS/NER/SRL, %).

Method            Precision          Recall             F1
Force-directed    90.1/82.7/66.3     88.2/81.5/65.5     89.1/82.1/65.9
Huang 2012        88.4/83.2/71.3     90.7/84.6/70.3     89.5/83.9/70.8
C&W               88.5/83.5/64.6     85.2/82.6/63.1     86.8/83.0/63.8
M&S-Turian        90.8/80.5/63.6     90.3/73.9/65.7     90.5/77.1/64.6
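The F1 values in Tables 2 and 3 follow from precision and recall via the harmonic mean; the snippet below re-derives the Force-directed POS cell of Table 2 as a sanity check:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

round(f1(92.5, 89.5), 1)  # 91.0, matching the POS F1 reported for Force-directed
```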
This reduces the complexity of the neural network, which improves the performance of such models, for example by reducing training time and increasing labeling speed. The result also demonstrates that learning a group of word embeddings need not be high-dimensional nor depend on neural-network-based approaches. It means that word representation learning and task-system construction can be decomposed into two individual parts; the two-step framework can achieve the same goal as the all-in-one models [3].
6 Conclusions and Future Work
In this paper, we propose a force-directed method that uses a fracture mechanics model to learn word embeddings. The results demonstrate that the physical simulation approach is feasible: it improves on the procedure of the traditional NLM-based approaches in terms of parameter training and task performance (POS, NER, and SRL). Future work is as follows: the model will be improved to suit streaming data, using a one-step solution for predicting the coordinates of word-particles, which will improve the performance of our system; and the properties of word-particles will be packaged in the GEXF file format (Graph Exchange XML Format), which provides data sharing across multiple data-visualization tools, for example, Gephi.
Competing Interests
The authors declare that there are no competing interests regarding the publication of this manuscript.
References
[1] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, "A neural probabilistic language model," Journal of Machine Learning Research, vol. 3, pp. 1137–1155, 2003.
[2] S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning, "A large annotated corpus for learning natural language inference," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '15), 2015.
[3] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, "Natural language processing (almost) from scratch," Journal of Machine Learning Research, vol. 12, pp. 2493–2537, 2011.
[4] J. Turian, L. Ratinov, Y. Bengio, and D. Roth, "A preliminary evaluation of word representations for named-entity recognition," in Proceedings of the NIPS Workshop on Grammar Induction, Representation of Language and Language Learning, 2009.
[5] S. R. Bowman, C. Potts, and C. D. Manning, "Learning distributed word representations for natural logic reasoning," in Proceedings of the AAAI Spring Symposium on Knowledge Representation and Reasoning, Stanford, Calif, USA, March 2015.
[6] B. Fortuna, M. Grobelnik, and D. Mladenic, "Visualization of text document corpus," Informatica, vol. 29, no. 4, pp. 497–502, 2005.
[7] F. Morin and Y. Bengio, "Hierarchical probabilistic neural network language model," in Proceedings of the International Workshop on Artificial Intelligence and Statistics (AISTATS '05), vol. 5, pp. 246–252, Barbados, 2005.
[8] R. Navigli and S. P. Ponzetto, "BabelNet: building a very large multilingual semantic network," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 216–225, 2010.
[9] P. Wang, Y. Qian, F. K. Soong, L. He, and H. Zhao, "Learning distributed word representations for bidirectional LSTM recurrent neural network," in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL '16), San Diego, Calif, USA, June 2016.
[10] A. Mnih and G. Hinton, "Three new graphical models for statistical language modelling," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 641–648, June 2007.
[11] Z. Li, H. Zhao, C. Pang, L. Wang, and H. Wang, A Constituent Syntactic Parse Tree Based Discourse Parser, CoNLL-2016 Shared Task, Berlin, Germany, 2016.
[12] Z. Zhang, H. Zhao, and L. Qin, "Probabilistic graph-based dependency parsing with convolutional neural network," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL '16), pp. 1382–1392, Berlin, Germany, August 2016.
[13] R. Wang, M. Utiyama, I. Goto, E. Sumita, H. Zhao, and B.-L. Lu, "Converting continuous-space language models into N-gram language models with efficient bilingual pruning for statistical machine translation," ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 15, no. 3, pp. 1–26, 2016.
[14] G. Mesnil, A. Bordes, J. Weston, G. Chechik, and Y. Bengio, "Learning semantic representations of objects and their parts," Machine Learning, vol. 94, no. 2, pp. 281–301, 2014.
[15] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.
[16] G. Salton, A. Wong, and C. S. Yang, "A vector space model for automatic indexing," Communications of the ACM, vol. 18, no. 11, pp. 613–620, 1975.
[17] E. D'hondt, S. Verberne, C. Koster, and L. Boves, "Text representations for patent classification," Computational Linguistics, vol. 39, no. 3, pp. 755–775, 2013.
[18] C. Blake and W. Pratt, "Better rules, fewer features: a semantic approach to selecting features from text," in Proceedings of the IEEE International Conference on Data Mining (ICDM '01), pp. 59–66, San Jose, Calif, USA, 2001.
[19] M. Mitra, C. Buckley, A. Singhal, and C. Cardie, "An analysis of statistical and syntactic phrases," in Proceedings of the 5th International Conference on Computer-Assisted Information Searching on Internet (RIAO '97), pp. 200–214, Montreal, Canada, 1997.
[20] J. Turian, L. Ratinov, and Y. Bengio, "Word representations: a simple and general method for semi-supervised learning," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 384–394, July 2010.
[21] M. Sahlgren, "Vector-based semantic analysis: representing word meanings based on random labels," in Proceedings of the Semantic Knowledge Acquisition and Categorisation Workshop at the European Summer School in Logic, Language and Information (ESSLLI XIII '01), Helsinki, Finland, August 2001.
[22] M. Sahlgren, The Word-Space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations between Words in High-Dimensional Vector Spaces, Stockholm University, Stockholm, Sweden, 2006.
[23] P. Cimiano, A. Hotho, and S. Staab, "Learning concept hierarchies from text corpora using formal concept analysis," Journal of Artificial Intelligence Research, vol. 24, no. 1, pp. 305–339, 2005.
[24] A. Kehagias, V. Petridis, V. G. Kaburlasos, and P. Fragkou, "A comparison of word- and sense-based text categorization using several classification algorithms," Journal of Intelligent Information Systems, vol. 21, no. 3, pp. 227–247, 2003.
[25] M. Rajman and R. Besancon, "Stochastic distributional models for textual information retrieval," in Proceedings of the 9th Conference of the Applied Stochastic Models and Data Analysis (ASMDA '99), pp. 80–85, 1999.
[26] G. E. Hinton, "Learning distributed representations of concepts," in Proceedings of the 8th Annual Conference of the Cognitive Science Society, pp. 1–12, 1986.
[27] H. Ritter and T. Kohonen, "Self-organizing semantic maps," Biological Cybernetics, vol. 61, no. 4, pp. 241–254, 1989.
[28] T. Honkela, V. Pulkki, and T. Kohonen, "Contextual relations of words in Grimm tales analyzed by self-organizing map," in Proceedings of Hybrid Neural Systems, 1995.
[29] T. Honkela, "Self-organizing maps of words for natural language processing applications," in Proceedings of the International ICSC Symposium on Soft Computing, 1997.
[30] T. Mikolov, S. W. Yih, and G. Zweig, "Linguistic regularities in continuous space word representations," in Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT '13), 2013.
[31] W. Xu and A. Rudnicky, "Can artificial neural networks learn language models?" in Proceedings of the International Conference on Statistical Language Processing, pp. 1–13, 2000.
[32] M. I. Mandel, R. Pascanu, D. Eck, et al., "Contextual tag inference," ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 7S, no. 1, 2011.
[33] A. Bordes, X. Glorot, J. Weston, and Y. Bengio, "Joint learning of words and meaning representations for open-text semantic parsing," Journal of Machine Learning Research, vol. 22, pp. 127–135, 2012.
[34] F. Huang, A. Ahuja, D. Downey, Y. Yang, Y. Guo, and A. Yates, "Learning representations for weakly supervised natural language processing tasks," Computational Linguistics, vol. 40, no. 1, pp. 85–120, 2014.
[35] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
[36] A. Mnih and G. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS '08), pp. 1081–1088, December 2008.
[37] G. Mesnil, Y. Dauphin, X. Glorot, et al., "Unsupervised and transfer learning challenge: a deep learning approach," Journal of Machine Learning Research, vol. 27, no. 1, pp. 97–110, 2012.
[38] S. G. Kobourov, "Spring embedders and force directed graph drawing algorithms," in Proceedings of the ACM Symposium on Computational Geometry, Chapel Hill, NC, USA, June 2012.
[39] M. J. Bannister, D. Eppstein, M. T. Goodrich, and L. Trott, "Force-directed graph drawing using social gravity and scaling," in Graph Drawing, W. Didimo and M. Patrignani, Eds., vol. 7704 of Lecture Notes in Computer Science, pp. 414–425, 2013.
[40] A. Efrat, D. Forrester, A. Iyer, S. G. Kobourov, C. Erten, and O. Kilic, "Force-directed approaches to sensor localization," ACM Transactions on Sensor Networks, vol. 7, no. 3, article 27, 2010.
[41] T. Chan, J. Cong, and K. Sze, "Multilevel generalized force-directed method for circuit placement," in Proceedings of the International Symposium on Physical Design (ISPD '05), pp. 185–192, Santa Rosa, Calif, USA, April 2005.
[42] E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng, "Improving word representations via global context and multiple word prototypes," in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL '12), pp. 873–882, July 2012.
[43] T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur, "Recurrent neural network based language model," in Proceedings of INTERSPEECH, 2010.
Computational Intelligence and Neuroscience 7
1000 5000
10000
minus40 minus20 0 20 40 60
minus40 minus20 0 20 40 60
minus20 0 20 40 60 80
100
80
60
40
20
0
100
80
60
40
20
0
60
40
20
0
minus20
minus40
1000
minusminusminusminusminusminusminusminusminusminusminusminusminusminusminus20222222222222222222222222222222 00000000000000000000000000000000000000000 20202220222222222202222022022202222 40 60 80
100
80
60
40
20
0
Figure 5 Force-directed embedding for Reuters 21578
and the assumption coincides with the reality roughly Weexperiment 50 times for the two datasets respectively Theresult is shown as the boxplots in Figure 7 From thedownwards trends of average velocity the assumption that aword-semantic networkwill be stabilized in a certain numberof similar documents coincides with the result of two datasetsroughly Both the convergences of the two datasetsrsquo trainingappear at the training stage around the 20000th documents
In neural word embedding learning there is no specific convergence criterion for terminating training, because the objective functions of these neural models are nonconvex, so a theoretical optimum may not exist. Empirically, such methods must iterate over the documents 20 or 30 times [3], which makes time consumption proportional to the number of documents and usually leads to long training times [3, 30]. In our proposed model, by contrast, a semantic convergence condition is more convenient to set than in those neural based approaches, and the convergence criterion gives word embedding learning a more explicit stopping rule. Meanwhile, the results show that learning an acceptable word embedding requires a certain number of documents, so small or medium scale datasets may not be sufficient.
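As a sketch, the semantic convergence criterion described above can be implemented as a small monitoring loop over the document stream. Everything here is illustrative: `step` stands in for one force-directed update (returning the current word-particle velocities), and the threshold `eps` and averaging `window` are assumed values, not ones taken from the paper.

```python
import math

def average_velocity(velocities):
    """Mean speed (Euclidean norm) over all word-particles."""
    return sum(math.hypot(vx, vy) for vx, vy in velocities) / len(velocities)

def train_until_stable(documents, step, eps=0.05, window=1000):
    """Feed documents one by one; stop once the mean particle velocity
    averaged over the last `window` documents drops below `eps`."""
    recent = []
    for i, doc in enumerate(documents, 1):
        recent.append(average_velocity(step(doc)))  # step(): one force-directed update
        if len(recent) > window:
            recent.pop(0)
        if i >= window and sum(recent) / len(recent) < eps:
            return i            # number of documents needed to converge
    return None                 # corpus exhausted before stabilizing
```

This replaces the fixed 20-30 passes over the corpus with an explicit, data-driven stopping rule.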
5 Reuters 21578 and RCV1
To test usability, we compare our word embedding with three other word embeddings on NLP tasks, using the same type of neural linguistic model. The tests cover POS, NER, and SRL; the results are listed in Tables 2 and 3. On Reuters 21578, the labeling system using our word embedding obtains an F1 of 91.0% on POS, 83.4% on NER, and 67.3% on SRL, placing third on POS and second on both NER and SRL. On RCV1, it
Figure 6: Force-directed embedding for RCV1.
achieves an F1 of 89.1% on POS, 82.1% on NER, and 65.9% on SRL, placing second on POS, third on NER, and second on SRL.
The performance of the proposed word embedding is close to the best results in [3], yet its dimensionality is two, far lower than the 50- or 100-dimensional word embeddings [3, 43]. This reduces the number of neural cells needed when performing NLP tasks with such linguistic models. To implement these NLP tasks, we construct a 3-layer feedforward neural network with a 5-cell input layer (one cell per word in a 5-word window), a 100-cell middle layer, and a 25-cell output layer. For the compared word embeddings, the input dimension is set to 500, because all of them are 100-dimensional embeddings; our corresponding input layer requires only a 10-dimensional vector. The structure of the model is simplified,
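The sizing argument can be made concrete with a small helper. The constants follow the text (5-word window, 100 middle cells, 25 output cells), while the helper names and the weight-count comparison are ours, added for illustration:

```python
# Input-layer sizing for the 3-layer labeling network described above.
WINDOW = 5        # words fed to the network at once
HIDDEN = 100      # middle-layer cells
OUTPUT = 25       # output-layer cells

def layer_sizes(embedding_dim):
    """Return (input, hidden, output) layer sizes for a given embedding."""
    return (WINDOW * embedding_dim, HIDDEN, OUTPUT)

def parameter_count(sizes):
    """Fully connected weight count (biases ignored for simplicity)."""
    n_in, n_hid, n_out = sizes
    return n_in * n_hid + n_hid * n_out

compared = layer_sizes(100)   # 100-d embeddings -> (500, 100, 25)
ours     = layer_sizes(2)     # 2-d force-directed -> (10, 100, 25)
print(compared, parameter_count(compared))   # (500, 100, 25) 52500
print(ours, parameter_count(ours))           # (10, 100, 25) 3500
```

Under these assumptions the 2-dimensional embedding cuts the weight count by more than an order of magnitude.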
Figure 7: The y-axis represents the velocity of a word-particle; the x-axis represents the number of documents. (a) Reuters 21578; (b) RCV1. Both panels show downward trends.
Table 2: Comparison of POS/NER/SRL on Reuters 21578 (each cell lists POS/NER/SRL, in %).

Method          Precision         Recall            F1
Force-directed  92.5/84.1/70.2    89.5/82.7/64.6    91.0/83.4/67.3
Huang 2012      94.2/86.2/74.8    93.8/86.8/74.1    94.0/86.5/74.4
C&W             93.6/82.4/67.8    92.8/81.5/65.8    93.2/81.9/66.8
M&S-Turian      91.4/82.1/63.6    86.2/75.2/62.5    88.7/78.5/63.0
Table 3: Comparison of POS/NER/SRL on RCV1 (each cell lists POS/NER/SRL, in %).

Method          Precision         Recall            F1
Force-directed  90.1/82.7/66.3    88.2/81.5/65.5    89.1/82.1/65.9
Huang 2012      88.4/83.2/71.3    90.7/84.6/70.3    89.5/83.9/70.8
C&W             88.5/83.5/64.6    85.2/82.6/63.1    86.8/83.0/63.8
M&S-Turian      90.8/80.5/63.6    90.3/73.9/65.7    90.5/77.1/64.6
which reduces the complexity of the neural networks. This advantage improves the performance of such models, for example, by shortening training time and speeding up labeling. The result also demonstrates that learning a group of word embeddings need not be high dimensional nor depend on neural network based approaches: word representation learning and task system construction can be decomposed into two individual parts. This two-step framework can achieve the same goal as the all-in-one models [3].
6 Conclusions and Future Work
In this paper we propose a force-directed method that uses a fracture mechanics model to learn word embeddings. The results demonstrate that the physical simulation approach is feasible: it improves on traditional NLM-based approaches in terms of parameter training and task performance (POS, NER, and SRL). Future work is as follows: the model will be improved to suit streaming data, using a one-step solution for predicting word-particle coordinates to improve the performance of our system; and the word-particle properties will be packaged in the GEXF file format (Graph Exchange XML Format), which enables data sharing across multiple data visualization tools, for example, Gephi.
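A minimal sketch of that planned GEXF export, using only the standard library. The words and coordinates below are hypothetical, and for brevity the 2-d positions are stored as plain node attributes rather than via the `viz:position` extension that Gephi also understands.

```python
import xml.etree.ElementTree as ET

def write_gexf(words, edges, path):
    """Write word-particles as a minimal GEXF 1.2 graph: one node per
    word (with its 2-d coordinates as plain attributes) and one edge
    per linked 2-gram. Gephi can open the resulting file directly."""
    gexf = ET.Element("gexf", xmlns="http://www.gexf.net/1.2draft", version="1.2")
    graph = ET.SubElement(gexf, "graph", defaultedgetype="undirected")
    nodes = ET.SubElement(graph, "nodes")
    ids = {}
    for i, (word, (x, y)) in enumerate(words.items()):
        ids[word] = str(i)
        ET.SubElement(nodes, "node", id=str(i), label=word, x=str(x), y=str(y))
    edges_el = ET.SubElement(graph, "edges")
    for j, (a, b) in enumerate(edges):
        ET.SubElement(edges_el, "edge", id=str(j), source=ids[a], target=ids[b])
    ET.ElementTree(gexf).write(path, encoding="utf-8", xml_declaration=True)

# Hypothetical coordinates from the force-directed layout
write_gexf({"market": (12.3, -4.1), "trade": (11.8, -3.7)},
           [("market", "trade")], "word_particles.gexf")
```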
Competing Interests
The authors declare that there is no conflict of interest regarding the publication of this manuscript.
References
[1] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, "A neural probabilistic language model," Journal of Machine Learning Research, vol. 3, pp. 1137-1155, 2003.
[2] S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning, "A large annotated corpus for learning natural language inference," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '15), 2015.
[3] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, "Natural language processing (almost) from scratch," Journal of Machine Learning Research, vol. 12, pp. 2493-2537, 2011.
[4] J. Turian, L. Ratinov, Y. Bengio, and D. Roth, "A preliminary evaluation of word representations for named-entity recognition," in Proceedings of the NIPS Workshop on Grammar Induction, Representation of Language and Language Learning, 2009.
[5] S. R. Bowman, C. Potts, and C. D. Manning, "Learning distributed word representations for natural logic reasoning," in Proceedings of the AAAI Spring Symposium on Knowledge Representation and Reasoning, Stanford, Calif, USA, March 2015.
[6] B. Fortuna, M. Grobelnik, and D. Mladenic, "Visualization of text document corpus," Informatica, vol. 29, no. 4, pp. 497-502, 2005.
[7] F. Morin and Y. Bengio, "Hierarchical probabilistic neural network language model," in Proceedings of the International Workshop on Artificial Intelligence and Statistics (AISTATS '05), vol. 5, pp. 246-252, Barbados, Caribbean, 2005.
[8] R. Navigli and S. P. Ponzetto, "BabelNet: building a very large multilingual semantic network," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 216-225, 2010.
[9] P. Wang, Y. Qian, F. K. Soong, L. He, and H. Zhao, "Learning distributed word representations for bidirectional LSTM recurrent neural network," in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL '16), San Diego, Calif, USA, June 2016.
[10] A. Mnih and G. Hinton, "Three new graphical models for statistical language modelling," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 641-648, June 2007.
[11] Z. Li, H. Zhao, C. Pang, L. Wang, and H. Wang, A Constituent Syntactic Parse Tree Based Discourse Parser, CoNLL-2016 Shared Task, Berlin, Germany, 2016.
[12] Z. Zhang, H. Zhao, and L. Qin, "Probabilistic graph-based dependency parsing with convolutional neural network," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL '16), pp. 1382-1392, Association for Computational Linguistics, Berlin, Germany, August 2016.
[13] R. Wang, M. Utiyama, I. Goto, E. Sumita, H. Zhao, and B.-L. Lu, "Converting continuous-space language models into N-gram language models with efficient bilingual pruning for statistical machine translation," ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 15, no. 3, pp. 1-26, 2016.
[14] G. Mesnil, A. Bordes, J. Weston, G. Chechik, and Y. Bengio, "Learning semantic representations of objects and their parts," Machine Learning, vol. 94, no. 2, pp. 281-301, 2014.
[15] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, 2013.
[16] G. Salton, A. Wong, and C. S. Yang, "A vector space model for automatic indexing," Communications of the ACM, vol. 18, no. 11, pp. 613-620, 1975.
[17] E. D'hondt, S. Verberne, C. Koster, and L. Boves, "Text representations for patent classification," Computational Linguistics, vol. 39, no. 3, pp. 755-775, 2013.
[18] C. Blake and W. Pratt, "Better rules, fewer features: a semantic approach to selecting features from text," in Proceedings of the IEEE International Conference on Data Mining (ICDM '01), pp. 59-66, IEEE, San Jose, Calif, USA, 2001.
[19] M. Mitra, C. Buckley, A. Singhal, and C. Cardie, "An analysis of statistical and syntactic phrases," in Proceedings of the 5th International Conference on Computer-Assisted Information Searching on Internet (RIAO '97), pp. 200-214, Montreal, Canada, 1997.
[20] J. Turian, L. Ratinov, and Y. Bengio, "Word representations: a simple and general method for semi-supervised learning," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 384-394, July 2010.
[21] M. Sahlgren, "Vector-based semantic analysis: representing word meanings based on random labels," in Proceedings of the Semantic Knowledge Acquisition and Categorisation Workshop at the European Summer School in Logic, Language and Information (ESSLLI XIII '01), Helsinki, Finland, August 2001.
[22] M. Sahlgren, The Word-Space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations between Words in High-Dimensional Vector Spaces, Stockholm University, Stockholm, Sweden, 2006.
[23] P. Cimiano, A. Hotho, and S. Staab, "Learning concept hierarchies from text corpora using formal concept analysis," Journal of Artificial Intelligence Research, vol. 24, no. 1, pp. 305-339, 2005.
[24] A. Kehagias, V. Petridis, V. G. Kaburlasos, and P. Fragkou, "A comparison of word- and sense-based text categorization using several classification algorithms," Journal of Intelligent Information Systems, vol. 21, no. 3, pp. 227-247, 2003.
[25] M. Rajman and R. Besancon, "Stochastic distributional models for textual information retrieval," in Proceedings of the 9th Conference of the Applied Stochastic Models and Data Analysis (ASMDA '99), pp. 80-85, 1999.
[26] G. E. Hinton, "Learning distributed representations of concepts," in Proceedings of the 8th Annual Conference of the Cognitive Science Society, pp. 1-12, 1986.
[27] H. Ritter and T. Kohonen, "Self-organizing semantic maps," Biological Cybernetics, vol. 61, no. 4, pp. 241-254, 1989.
[28] T. Honkela, V. Pulkki, and T. Kohonen, "Contextual relations of words in Grimm tales analyzed by self-organizing map," in Proceedings of Hybrid Neural Systems, 1995.
[29] T. Honkela, "Self-organizing maps of words for natural language processing applications," in Proceedings of the International ICSC Symposium on Soft Computing, 1997.
[30] T. Mikolov, S. W. Yih, and G. Zweig, "Linguistic regularities in continuous space word representations," in Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT '13), 2013.
[31] W. Xu and A. Rudnicky, "Can artificial neural networks learn language models?" in Proceedings of the International Conference on Statistical Language Processing, pp. 1-13, 2000.
[32] M. I. Mandel, R. Pascanu, D. Eck, et al., "Contextual tag inference," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7S, no. 1, 2011.
[33] A. Bordes, X. Glorot, J. Weston, and Y. Bengio, "Joint learning of words and meaning representations for open-text semantic parsing," Journal of Machine Learning Research, vol. 22, pp. 127-135, 2012.
[34] F. Huang, A. Ahuja, D. Downey, Y. Yang, Y. Guo, and A. Yates, "Learning representations for weakly supervised natural language processing tasks," Computational Linguistics, vol. 40, no. 1, pp. 85-120, 2014.
[35] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179-211, 1990.
[36] A. Mnih and G. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS '08), pp. 1081-1088, December 2008.
[37] G. Mesnil, Y. Dauphin, X. Glorot, et al., "Unsupervised and transfer learning challenge: a deep learning approach," Journal of Machine Learning Research, vol. 27, no. 1, pp. 97-110, 2012.
[38] S. G. Kobourov, "Spring embedders and force directed graph drawing algorithms," in Proceedings of the ACM Symposium on Computational Geometry, Chapel Hill, NC, USA, June 2012.
[39] M. J. Bannister, D. Eppstein, M. T. Goodrich, and L. Trott, "Force-directed graph drawing using social gravity and scaling," in Graph Drawing, W. Didimo and M. Patrignani, Eds., vol. 7704 of Lecture Notes in Computer Science, pp. 414-425, 2013.
[40] A. Efrat, D. Forrester, A. Iyer, S. G. Kobourov, C. Erten, and O. Kilic, "Force-directed approaches to sensor localization," ACM Transactions on Sensor Networks, vol. 7, no. 3, article 27, 2010.
[41] T. Chan, J. Cong, and K. Sze, "Multilevel generalized force-directed method for circuit placement," in Proceedings of the International Symposium on Physical Design (ISPD '05), pp. 185-192, Santa Rosa, Calif, USA, April 2005.
[42] E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng, "Improving word representations via global context and multiple word prototypes," in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL '12), pp. 873-882, July 2012.
[43] T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur, "Recurrent neural network based language model," in Proceedings of INTERSPEECH, 2010.
8 Computational Intelligence and Neuroscience
Everen bound
statementrequested
eastern
intial
strengthens
Therersquod
title
revive
Thompson
Conservative
12-13
gearedAdding
Range
declaration
Niagara
metal
tragiclisting
deals
US-basedpreception
InteriorFine-tuning
accomplishments
Reformrsquos
appropriate
uprevolt
uncertainty center
WTO
German
US
Russian
canceleddent
lively
testing
Burden
Beijing
42-2-2423-0003
owner
Figure 6 Force-directed embedding for RCV1
achieves 891 on POS 821 on NER and 659 on SRL for1198651 The 1198651 scores obtain the second place in POS the thirdplace in NER and the second place in SRL
The performance of the proposed word embedding isclose to the best results in [3] but the dimensional numberis two which is far less than the 50- or 100-dimensionalword embeddings [3 43] This brings a benefit for reducingthe number of neural cells in performing NLP tasks by
such type of linguistic models Implementing these NLPtasks we construct a 3-layer feedforward neural networkwith a 5-cell inputting layer and a 100-cell middle layerand a 25-cell outputting layer To utilize the compared wordembeddings the number of the inputting vectors is set to500 because all of themare the 100-dimensional embeddingsBut our corresponding inputting layer just requires 10-dimensional vector The structure of model is simplified
Computational Intelligence and Neuroscience 9
1000 5000 10000 20000 250000
05
1
15
2
25
Reuters 21578(a)
RCV11000 5000 10000 20000 25000
0
05
1
15
2
25
(b)
Figure 7 The 119910-axis represents the velocity of a word-particle the 119909-axis represents the number of documents Both of them show thedownwards trends
Table 2 Comparison of POSNERSRL on Reuters 21578
Precision Recall 1198651
Force-directed 925841702 895827646 910834673Huang 2012 942862748 938868741 940865744CampW 936824678 928815658 932819668MampS-Turian 914821636 862752625 887785630
Table 3 Comparison of POSNERSRL on RCV1
Precision Recall 1198651
Force-directed 901827663 882815655 891821659Huang 2012 884832713 907846703 895839708CampW 885835646 852826631 868830638MampS-Turian 908805636 903739657 905771646
which can reduce the complexity of neural networks Thisadvantage will improve the performance of such modelssuch as reducing training time and improving the speed oflabeling The result also demonstrates that learning a groupof word embeddings cannot be high dimensional and dependon the neural network based approaches It means wordrepresentation learning and the task system constructingcan be decomposed to two individual parts The two-stepframework could achieve the same goal with the all-in-onemodels [3]
6 Conclusions and Future Work
In this paper we propose a force-directed method that usesa fracture mechanic model to learn word embedding Theresult demonstrates that the physical simulation approach isfeasible It improves the procedure of the traditional NLM-based approaches in terms of parameters training and tasksperforming (POS and NER and SRL) The next works areas follows The model will be improved to suit streamingdata using a one-step solution for predicting the coordinateof word-particle which will improve the performance ofour system packaging the properties of word-particle with
the gefx file format (Graph Exchange XML Format) that canprovide a capability for data sharing across multiple datavisualizing tools for example Gephi
Competing Interests
The authors declare that there is no conflict of interestsregarding the publication of this manuscript
References
[1] Y Bengio R Ducharme P Vincent and C Jauvin ldquoA neuralprobabilistic language modelrdquo Journal of Machine LearningResearch vol 3 pp 1137ndash1155 2003
[2] S R Bowman G Angeli C Potts and C D Manning ldquoA largeannotated corpus for learning natural language inferencerdquo inProceedings of the Conference on Empirical Methods in NaturalLanguage Processing (EMNLP rsquo15) 2015
[3] R Collobert J Weston L Bottou M Karlen K Kavukcuogluand P Kuksa ldquoNatural language processing (almost) fromscratchrdquo Journal of Machine Learning Research vol 3 no 12 pp2493ndash2537 2011
[4] J Turian L Ratinov Y Bengio and D Roth ldquoA preliminaryevaluation of word representations for named-entity recog-nitionrdquo in Proceedings of the NIPS Workshop on GrammarInduction Representation of Language and Language Learning2009
[5] S R Bowman C Potts and C D Manning ldquoLearningdistributed word representations for natural logic reasoningrdquoin Proceedings of the AAAI Spring Symposium on KnowledgeRepresentation and Reasoning Stanford Calif USA March2015
[6] B Fortuna M Grobelnik and D Mladenic ldquoVisualization oftext document corpusrdquo Informatica vol 29 no 4 pp 497ndash5022005
[7] F Morin and Y Bengio ldquoHierarchical probabilistic neuralnetwork language modelrdquo in Proceedings of the InternationalWorkshop on Artificial Intelligence and Statistics (AISTATS rsquo05)vol 5 pp 246ndash252 Barbados Caribbean 2005
[8] R Navigli and S P Ponzetto ldquoBabelNet building a very largemultilingual semantic networkrdquo in Proceedings of the 48th
10 Computational Intelligence and Neuroscience
Annual Meeting of the Association for Computational Linguistics(ACL rsquo10) pp 216ndash225 2010
[9] P Wang Y Qian F K Soong L He and H Zhao ldquoLearningdistributed word representations for bidirectional LSTM recur-rent neural networkrdquo in Proceedings of the Conference of theNorth American Chapter of the Association for ComputationalLinguistics Human Language Technologies (NAACL rsquo16) SanDiego Calif USA June 2016
[10] A Mnih and G Hinton ldquoThree new graphical models forstatistical language modellingrdquo in Proceedings of the 24th Inter-national Conference on Machine Learning (ICML rsquo07) pp 641ndash648 June 2007
[11] Z Li H Zhao C Pang L Wang and H Wang A ConstituentSyntactic Parse Tree Based Discourse Parser CoNLL-2016Shared Task Berlin Germany 2016
[12] Z Zhang H Zhao and L Qin ldquoProbabilistic graph-baseddependency parsing with convolutional neural networkrdquo inProceedings of the 54th Annual Meeting of the Association forComputational Linguistics (ACL rsquo16) pp 1382ndash1392 Associationfor Computational Linguistics Berlin Germany August 2016
[13] RWangM Utiyama I Goto E Sumita H Zhao and B-L LuldquoConverting continuous-space language models into N-gramlanguage models with efficient bilingual pruning for statisticalmachine translationrdquoACMTransactions onAsian Low-ResourceLanguage Information Process vol 15 no 3 pp 1ndash26 2016
[14] G Mesnil A Bordes J Weston G Chechik and Y BengioldquoLearning semantic representations of objects and their partsrdquoMachine Learning vol 94 no 2 pp 281ndash301 2014
[15] Y Bengio A Courville and P Vincent ldquoRepresentation learn-ing a review and new perspectivesrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1798ndash1828 2013
[16] G Salton A Wong and C S Yang ldquoVector space model forautomatic indexingrdquo Communications of the ACM vol 18 no11 pp 613ndash620 1975
[17] E Drsquohondt S Verberne C Koster and L Boves ldquoText repre-sentations for patent classificationrdquo Computational Linguisticsvol 39 no 3 pp 755ndash775 2013
[18] C Blake and W Pratt ldquoBetter rules fewer features a semanticapproach to selecting features from textrdquo in Proceedings of theIEEE International Conference on Data Mining (ICDM rsquo01) pp59ndash66 IEEE San Jose Calif USA 2001
[19] M Mitra C Buckley A Singhal and C Cardie ldquoAn analysis ofstatistical and syntactic phrasesrdquo in Proceedings of the 5th Inter-national Conference Computer-Assisted Information Searchingon Internet (RIAO rsquo97) pp 200ndash214 Montreal Canada 1997
[20] J Turian L Ratinov and Y Bengio ldquoWord representations asimple and general method for semi-supervised learningrdquo inProceedings of the 48th Annual Meeting of the Association forComputational Linguistics (ACL rsquo10) pp 384ndash394 July 2010
[21] M Sahlgren ldquoVector-based semantic analysis representingword meanings based on random labelsrdquo in Proceedings of theSemantic Knowledge Acquisition and Categorisation Workshopat European Summer School in Logic Language and Information(ESSLLI XIII rsquo01) Helsinki Finland August 2001
[22] M Sahlgren The Word-Space Model Using DistributionalAnalysis to Represent Syntagmatic and Paradigmatic Relationsbetween Words in High-Dimensional Vector Spaces StockholmUniversity Stockholm Sweden 2006
[23] P Cimiano A Hotho and S Staab ldquoLearning concept hierar-chies from text corpora using formal concept analysisrdquo Journal
of Artificial Intelligence Research vol 24 no 1 pp 305ndash3392005
[24] A Kehagias V Petridis V G Kaburlasos and P FragkouldquoA comparison of word- and sense-based text categorizationusing several classification algorithmsrdquo Journal of IntelligentInformation Systems vol 21 no 3 pp 227ndash247 2003
[25] M Rajman and R Besancon ldquoStochastic distributional modelsfor textual information retrievalrdquo in Proceedings of the 9thConference of the Applied Stochastic Models and Data Analysis(ASMDA rsquo99) pp 80ndash85 1999
[26] G E Hinton ldquoLearning distributed representations of con-ceptsrdquo in Proceedings of the 8th Annual Conference of theCognitive Science Society pp 1ndash12 1986
[27] H Ritter and T Kohonen ldquoSelf-organizing semantic mapsrdquoBiological Cybernetics vol 61 no 4 pp 241ndash254 1989
[28] T Honkela V Pulkki and T Kohonen ldquoContextual relationsof words in grimm tales analyzed by self-organizing maprdquo inProceedings of the Hybrid Neural Systems 1995
[29] THonkela ldquoSelf-organizingmaps ofwords for natural languageprocessing applicationrdquo in Proceedings of the International ICSCSymposium on Soft Computing 1997
[30] T Mikolov S W Yih and G Zweig ldquoLinguistic regularitiesin continuous space word representationsrdquo in Proceedings ofthe 2013 Conference of the North American Chapter of theAssociation for Computational Linguistics Human LanguageTechnologies (NAACL-HLT rsquo13) 2013
[31] W Xu and A Rudnicky ldquoCan artificial neural networks learnlanguagemodelsrdquo in Proceedings of the International Conferenceon Statistical Language Processing pp 1ndash13 2000
[32] M I Mandel R Pascanu D Eck et al ldquoContextual taginferencerdquo ACM Transactions on Multimedia Computing Com-munications and Applications (TOMM) vol 7S no 1 2011
[33] A Bordes X Glorot J Weston and Y Bengio ldquoJoint learningof words and meaning representations for open-text semanticparsingrdquo Journal of Machine Learning Research vol 22 pp 127ndash135 2012
[34] F Huang A Ahuja D Downey Y Yang Y Guo and AYates ldquoLearning representations for weakly supervised naturallanguage processing tasksrdquo Computational Linguistics vol 40no 1 pp 85ndash120 2014
[35] J L Elman ldquoFinding structure in timerdquo Cognitive Science vol14 no 2 pp 179ndash211 1990
[36] A Mnih and G Hinton ldquoA scalable hierarchical distributedlanguage modelrdquo in Proceedings of the 22nd Annual Conferenceon Neural Information Processing Systems (NIPS rsquo08) pp 1081ndash1088 December 2008
[37] G Mesnil Y Dauphin X Glorot et al ldquoUnsupervised andtransfer learning challenge a deep learning approachrdquo Journalof Machine Learning Research vol 27 no 1 pp 97ndash110 2012
[38] S G Kobourov ldquoSpring embedders and force directed graphdrawing algorithmsrdquo in Proceedings of the ACM Symposium onComputational Geometry Chapel Hill NC USA June 2012
[39] M J Bannister D Eppstein M T Goodrich and L TrottldquoForce-directed graph drawing using social gravity and scalingrdquoinGraphDrawingWDidimo andM Patrignani Eds vol 7704of Lecture Notes in Computer Science pp 414ndash425 2013
[40] A Efrat D Forrester A Iyer S G Kobourov C Erten and OKilic ldquoForce-directed approaches to sensor localizationrdquo ACMTransactions on Sensor Networks vol 7 no 3 article 27 2010
[41] T Chan J Cong and K Sze ldquoMultilevel generalized force-directed method for circuit placementrdquo in Proceedings of the
Computational Intelligence and Neuroscience 11
International Symposium on Physical Design (ISPD rsquo05) pp 185ndash192 Santa Rosa Calif USA April 2005
[42] E H Huang R Socher C D Manning and A Y Ng ldquoImprov-ing word representations via global context and multiplewordprototypesrdquo in Proceedings of the 50th Annual Meeting of theAssociation for Computational Linguistics (ACL rsquo12) pp 873ndash882 July 2012
[43] T Mikolov M Karafiat L Burget J Cernocky and S Khu-danpur ldquoRecurrent neural network based language modelrdquo inProceedings of the INTERSPEECH 2010
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience 9
1000 5000 10000 20000 250000
05
1
15
2
25
Reuters 21578(a)
RCV11000 5000 10000 20000 25000
0
05
1
15
2
25
(b)
Figure 7 The 119910-axis represents the velocity of a word-particle the 119909-axis represents the number of documents Both of them show thedownwards trends
Table 2 Comparison of POSNERSRL on Reuters 21578
Precision Recall 1198651
Force-directed 925841702 895827646 910834673Huang 2012 942862748 938868741 940865744CampW 936824678 928815658 932819668MampS-Turian 914821636 862752625 887785630
Table 3 Comparison of POSNERSRL on RCV1
Precision Recall 1198651
Force-directed 901827663 882815655 891821659Huang 2012 884832713 907846703 895839708CampW 885835646 852826631 868830638MampS-Turian 908805636 903739657 905771646
which can reduce the complexity of neural networks Thisadvantage will improve the performance of such modelssuch as reducing training time and improving the speed oflabeling The result also demonstrates that learning a groupof word embeddings cannot be high dimensional and dependon the neural network based approaches It means wordrepresentation learning and the task system constructingcan be decomposed to two individual parts The two-stepframework could achieve the same goal with the all-in-onemodels [3]
6 Conclusions and Future Work
In this paper, we propose a force-directed method that uses a fracture-mechanics model to learn word embeddings. The results demonstrate that the physical-simulation approach is feasible: it improves on traditional NLM-based approaches in terms of parameter training and task performance (POS, NER, and SRL). Future work is as follows. First, the model will be adapted to streaming data by using a one-step solution for predicting the coordinates of a word-particle, which should improve the performance of our system. Second, the properties of word-particles will be packaged in the GEXF file format (Graph Exchange XML Format), enabling data sharing across multiple visualization tools, for example, Gephi.
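The GEXF export mentioned above can be done with nothing beyond the standard library. The following is a minimal sketch, not the paper's implementation: the word coordinates are invented, and for simplicity the 2-D position is stored as plain node attributes rather than the `viz:position` element Gephi's full schema defines.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"

def to_gexf(word_particles):
    """word_particles: {word: (x, y)}; returns a GEXF document as a string."""
    ET.register_namespace("", GEXF_NS)
    gexf = ET.Element(f"{{{GEXF_NS}}}gexf", version="1.2")
    graph = ET.SubElement(gexf, f"{{{GEXF_NS}}}graph", defaultedgetype="undirected")
    nodes = ET.SubElement(graph, f"{{{GEXF_NS}}}nodes")
    for i, (word, (x, y)) in enumerate(sorted(word_particles.items())):
        node = ET.SubElement(nodes, f"{{{GEXF_NS}}}node", id=str(i), label=word)
        # Plain x/y attributes keep the sketch dependency-free; a full export
        # would emit <viz:position x=".." y=".."/> in the viz namespace.
        node.set("x", str(x))
        node.set("y", str(y))
    return ET.tostring(gexf, encoding="unicode")

doc = to_gexf({"king": (1.0, 2.0), "queen": (1.1, 2.1)})
print(doc[:60])
```

Because GEXF is plain XML, the resulting file can be opened directly in Gephi or any other tool that reads the format, which is what makes it a convenient interchange target for the word-particle data.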
Competing Interests
The authors declare that there are no competing interests regarding the publication of this manuscript.
References
[1] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, "A neural probabilistic language model," Journal of Machine Learning Research, vol. 3, pp. 1137-1155, 2003.
[2] S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning, "A large annotated corpus for learning natural language inference," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '15), 2015.
[3] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, "Natural language processing (almost) from scratch," Journal of Machine Learning Research, vol. 12, pp. 2493-2537, 2011.
[4] J. Turian, L. Ratinov, Y. Bengio, and D. Roth, "A preliminary evaluation of word representations for named-entity recognition," in Proceedings of the NIPS Workshop on Grammar Induction, Representation of Language and Language Learning, 2009.
[5] S. R. Bowman, C. Potts, and C. D. Manning, "Learning distributed word representations for natural logic reasoning," in Proceedings of the AAAI Spring Symposium on Knowledge Representation and Reasoning, Stanford, Calif, USA, March 2015.
[6] B. Fortuna, M. Grobelnik, and D. Mladenic, "Visualization of text document corpus," Informatica, vol. 29, no. 4, pp. 497-502, 2005.
[7] F. Morin and Y. Bengio, "Hierarchical probabilistic neural network language model," in Proceedings of the International Workshop on Artificial Intelligence and Statistics (AISTATS '05), vol. 5, pp. 246-252, Barbados, 2005.
[8] R. Navigli and S. P. Ponzetto, "BabelNet: building a very large multilingual semantic network," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 216-225, 2010.
[9] P. Wang, Y. Qian, F. K. Soong, L. He, and H. Zhao, "Learning distributed word representations for bidirectional LSTM recurrent neural network," in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL '16), San Diego, Calif, USA, June 2016.
[10] A. Mnih and G. Hinton, "Three new graphical models for statistical language modelling," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 641-648, June 2007.
[11] Z. Li, H. Zhao, C. Pang, L. Wang, and H. Wang, A Constituent Syntactic Parse Tree Based Discourse Parser, CoNLL-2016 Shared Task, Berlin, Germany, 2016.
[12] Z. Zhang, H. Zhao, and L. Qin, "Probabilistic graph-based dependency parsing with convolutional neural network," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL '16), pp. 1382-1392, Berlin, Germany, August 2016.
[13] R. Wang, M. Utiyama, I. Goto, E. Sumita, H. Zhao, and B.-L. Lu, "Converting continuous-space language models into N-gram language models with efficient bilingual pruning for statistical machine translation," ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 15, no. 3, pp. 1-26, 2016.
[14] G. Mesnil, A. Bordes, J. Weston, G. Chechik, and Y. Bengio, "Learning semantic representations of objects and their parts," Machine Learning, vol. 94, no. 2, pp. 281-301, 2014.
[15] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, 2013.
[16] G. Salton, A. Wong, and C. S. Yang, "A vector space model for automatic indexing," Communications of the ACM, vol. 18, no. 11, pp. 613-620, 1975.
[17] E. D'hondt, S. Verberne, C. Koster, and L. Boves, "Text representations for patent classification," Computational Linguistics, vol. 39, no. 3, pp. 755-775, 2013.
[18] C. Blake and W. Pratt, "Better rules, fewer features: a semantic approach to selecting features from text," in Proceedings of the IEEE International Conference on Data Mining (ICDM '01), pp. 59-66, San Jose, Calif, USA, 2001.
[19] M. Mitra, C. Buckley, A. Singhal, and C. Cardie, "An analysis of statistical and syntactic phrases," in Proceedings of the 5th International Conference on Computer-Assisted Information Searching on the Internet (RIAO '97), pp. 200-214, Montreal, Canada, 1997.
[20] J. Turian, L. Ratinov, and Y. Bengio, "Word representations: a simple and general method for semi-supervised learning," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 384-394, July 2010.
[21] M. Sahlgren, "Vector-based semantic analysis: representing word meanings based on random labels," in Proceedings of the Semantic Knowledge Acquisition and Categorisation Workshop at the European Summer School in Logic, Language and Information (ESSLLI XIII '01), Helsinki, Finland, August 2001.
[22] M. Sahlgren, The Word-Space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations between Words in High-Dimensional Vector Spaces, Stockholm University, Stockholm, Sweden, 2006.
[23] P. Cimiano, A. Hotho, and S. Staab, "Learning concept hierarchies from text corpora using formal concept analysis," Journal of Artificial Intelligence Research, vol. 24, no. 1, pp. 305-339, 2005.
[24] A. Kehagias, V. Petridis, V. G. Kaburlasos, and P. Fragkou, "A comparison of word- and sense-based text categorization using several classification algorithms," Journal of Intelligent Information Systems, vol. 21, no. 3, pp. 227-247, 2003.
[25] M. Rajman and R. Besancon, "Stochastic distributional models for textual information retrieval," in Proceedings of the 9th Conference of the Applied Stochastic Models and Data Analysis (ASMDA '99), pp. 80-85, 1999.
[26] G. E. Hinton, "Learning distributed representations of concepts," in Proceedings of the 8th Annual Conference of the Cognitive Science Society, pp. 1-12, 1986.
[27] H. Ritter and T. Kohonen, "Self-organizing semantic maps," Biological Cybernetics, vol. 61, no. 4, pp. 241-254, 1989.
[28] T. Honkela, V. Pulkki, and T. Kohonen, "Contextual relations of words in Grimm tales analyzed by self-organizing map," in Proceedings of Hybrid Neural Systems, 1995.
[29] T. Honkela, "Self-organizing maps of words for natural language processing applications," in Proceedings of the International ICSC Symposium on Soft Computing, 1997.
[30] T. Mikolov, S. W. Yih, and G. Zweig, "Linguistic regularities in continuous space word representations," in Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT '13), 2013.
[31] W. Xu and A. Rudnicky, "Can artificial neural networks learn language models?" in Proceedings of the International Conference on Statistical Language Processing, pp. 1-13, 2000.
[32] M. I. Mandel, R. Pascanu, D. Eck, et al., "Contextual tag inference," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7S, no. 1, 2011.
[33] A. Bordes, X. Glorot, J. Weston, and Y. Bengio, "Joint learning of words and meaning representations for open-text semantic parsing," Journal of Machine Learning Research, vol. 22, pp. 127-135, 2012.
[34] F. Huang, A. Ahuja, D. Downey, Y. Yang, Y. Guo, and A. Yates, "Learning representations for weakly supervised natural language processing tasks," Computational Linguistics, vol. 40, no. 1, pp. 85-120, 2014.
[35] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179-211, 1990.
[36] A. Mnih and G. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS '08), pp. 1081-1088, December 2008.
[37] G. Mesnil, Y. Dauphin, X. Glorot, et al., "Unsupervised and transfer learning challenge: a deep learning approach," Journal of Machine Learning Research, vol. 27, no. 1, pp. 97-110, 2012.
[38] S. G. Kobourov, "Spring embedders and force directed graph drawing algorithms," in Proceedings of the ACM Symposium on Computational Geometry, Chapel Hill, NC, USA, June 2012.
[39] M. J. Bannister, D. Eppstein, M. T. Goodrich, and L. Trott, "Force-directed graph drawing using social gravity and scaling," in Graph Drawing, W. Didimo and M. Patrignani, Eds., vol. 7704 of Lecture Notes in Computer Science, pp. 414-425, 2013.
[40] A. Efrat, D. Forrester, A. Iyer, S. G. Kobourov, C. Erten, and O. Kilic, "Force-directed approaches to sensor localization," ACM Transactions on Sensor Networks, vol. 7, no. 3, article 27, 2010.
[41] T. Chan, J. Cong, and K. Sze, "Multilevel generalized force-directed method for circuit placement," in Proceedings of the International Symposium on Physical Design (ISPD '05), pp. 185-192, Santa Rosa, Calif, USA, April 2005.
[42] E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng, "Improving word representations via global context and multiple word prototypes," in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL '12), pp. 873-882, July 2012.
[43] T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur, "Recurrent neural network based language model," in Proceedings of INTERSPEECH, 2010.