an approach for translating mathematics

Upload: tran-khai-thien

Post on 06-Apr-2018


  • 8/3/2019 An Approach for Translating Mathematics

    1/7

    An Approach for Translating Mathematics Problems in Natural Language to Specification Language COKB of Intelligent Education Software

    Nhon Do, Hoa Pan Trong, Tuyen Trong Tran

    Information Technology University

    Economics and Law University

    Binh Duong University

    Abstract: Nowadays, research on intelligent systems for problem solving in education has many practical applications. A system based on the COKB model and a specification language can process problems and produce step-by-step guidance or full solution explanations. However, these systems limit users because they require both the input data and the output result to be written not in natural language but in the specification language, and few users know how to use this language to enter problems into the systems. Therefore, to improve the ability to interact in natural language, this paper proposes a new approach for translating mathematics problems from natural language into the specification language, and presents a model of a natural language translation method associated with the knowledge base domain for effective translation. In addition, this paper shows how the approach can be applied to different knowledge domains.

    Index Terms: Artificial intelligence, Education software, Knowledge base system, Automatic problem solving system.

    1. INTRODUCTION

    Nowadays many applications, such as search and database management, use knowledge bases, but most of them are not yet able to accept data input in natural language (NL). Such software requires query requests to be entered through an input interface that lets users choose from given parameters or classified data inputs. Other applications use a language, called a specification language (SL), to specify data for a compatible application and knowledge domain. Among them are intelligent systems for problem solving (ISPS) in geometry [2], which give user-friendly solution explanations and are built on a rather complete knowledge base model called COKB [9]. However, both the problem input and the explanation output must be written in SL. Similarly, structured query language (SQL), used to query databases, is generated from data that users select through interfaces. This limits how widely the programs can be used.

    978--444-696-9/0/$600 00 IEEE

    Today natural language processing is also applied in many domains, such as document classification and machine translation. However, general-purpose processing is not effective when applied to a particular knowledge domain. Above all, Vietnamese has too many specific characteristics for results obtained on other languages to be applied to it directly. Among present Vietnamese translation methods, some research associated with knowledge domains is in progress, but with limited effect.

    With this in mind, it is necessary to design a translation method that can translate geometry problems from NL to SL as input data for ISPS, and translate their solutions from SL back to NL, so that these systems are easier to use.

    This research aims to advance a general translation model associated with a knowledge base domain, a model of knowledge base organization and knowledge processing methods, as well as processes for translating from NL into a general SL. These can be applied to many other domains, for example, public administration and education.

    The available natural language processing methods are not effective enough to be applied in every general case, and geometry problem solving systems are used specifically for the mathematics knowledge domain. Therefore, this research experiments on analytic geometry problems (7th grade geometry in particular) to find the best approach model, relying on the COKB knowledge base model.

    2. NATURAL VIETNAMESE PROCESSING AND INTELLIGENT PROBLEM SOLVING SYSTEMS

    2.1 Natural Vietnamese processing


    Present approaches to natural language processing have had some results in practical applications, for example, in translation and in text classification for search. However, Vietnamese processing has been limited in general use across different domains. Some results from Vietnamese processing are as follows.

    Methods for word segmentation in Vietnamese

    Word segmentation and classification are basic natural language processing algorithms that have been applied to Vietnamese. Some methods are:

    - Maximum Matching method [in Chih-Hao Tsai, 2000]: its results are not fully accurate because it depends heavily on dictionaries.

    - Transformation-based Learning (TBL), the word segmentation model with weighted finite-state transducers (WFST), and neural networks use a linguistically labeled corpus to learn these rules automatically. However, it is difficult to construct a detailed Vietnamese corpus because it takes a lot of time and effort.

    - Dynamic programming method: with a 51% probability of the right word compared with a 5% probability of an approximate word, this method is less effective than the others above.

    - Internet and Genetic Algorithm-based Text Categorization for Documents in Vietnamese (GATEC) has lower accuracy and a low running time, and has not been tested on large data.
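    A minimal Python sketch of the greedy longest-match idea behind the Maximum Matching method follows. It is not the implementation from [Chih-Hao Tsai, 2000]; the toy dictionary `DICTIONARY` of unaccented Vietnamese words and the bound `MAX_WORD_LEN` are hypothetical, and syllables are assumed to be separated by spaces. It also shows the dictionary dependence noted above: any compound missing from the dictionary falls back to single syllables.

```python
# Hypothetical toy dictionary: multi-syllable Vietnamese words (diacritics omitted).
DICTIONARY = {"tam giac", "trung diem", "vuong goc", "dien tich", "chung minh"}
MAX_WORD_LEN = 3  # upper bound on word length in syllables (assumed)


def forward_maximum_matching(sentence: str) -> list[str]:
    """Greedy left-to-right segmentation: at each position take the longest
    syllable sequence found in DICTIONARY, falling back to a single syllable."""
    syllables = sentence.lower().split()
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(MAX_WORD_LEN, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + length])
            if length == 1 or candidate in DICTIONARY:
                words.append(candidate)
                i += length
                break
    return words


print(forward_maximum_matching("Tinh dien tich tam giac ABC"))
# ['tinh', 'dien tich', 'tam giac', 'abc']
```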

    From a general view, the word-based method achieves a high accuracy of over 95% thanks to a large, accurately annotated training corpus. However, the algorithm's output depends entirely on this training corpus. Above all, the WFST method is the best choice because the author's goal is to segment words accurately for machine translation. These methods, in which dictionaries or training corpora are essential, help us not only segment words accurately but also use the marked information for other purposes such as part-of-speech (POS) specification, machine translation, spell checking, or synonym dictionaries. Therefore, the word-based approach for machine translation has brought worthwhile results in spite of rather long training time, complex installation, and the effort of constructing a training corpus.

    This research uses the most successful method of word recognition, so building a good wordnet is essential. The knowledge domain is implemented on geometry problems, for which it is not too difficult to make a dictionary for word segmentation and POS tagging (WSPT).

    Text classification

    Some works on text classification for text storage or search have achieved satisfactory results, for example, Vietnamese text classification in [1], and text classification for e-newspapers in [2] to set up a multilingual information search system with the phrase segmentation technique in [5]. Techniques based on word and word-form statistics are also used for text classification applied to social statistics.

    Machine translation

    Language translation research has been presented with satisfactory results; however, Vietnamese translation is still limited in use. Experiments with machine translation ability are also presented in [9]. The transfer of knowledge from NL to SL by computer has produced some first results in translating NL to SQL []. However, this research is still in progress, so most other applications today have to specify knowledge manually.

    2.2 Intelligent Systems for Problem Solving

    In [3], a system with a general "C-Object Solver" package for the C-Object knowledge base (COKB) in MAPLE can solve problems automatically for some kinds of geometric objects, such as triangles and quadrangles, together with related knowledge of deduction and manipulation performed on the C-Object model. MAPLE, a powerful computer algebra system with programming support for complex abstract data structures, is appropriate for coding problem solving models and methods.

    Problem solving process

    The problem solving process accesses C-Objects (CO) [3] based on the COKB model. It also constructs the knowledge base that the program uses for deduction and manipulation. This knowledge base includes objects, object relations, facts, operators, rules, and so on.

    The general diagram of the problem solving process works as follows. When a problem is to be solved, the system receives it in SL, then analyzes and processes it. Problem analysis establishes the model of objects, attributes of interest, variables, events (including events of manipulating relations, object classification, etc.) and the problem's goal. After that, the problem solving module searches for an explanation based on the available knowledge base. Finally, the explanation is presented.
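    The analyze-then-search loop described above can be illustrated by a small forward-chaining sketch in Python. It is not the MAPLE C-Object Solver of [3]: the `Rule` record, the attribute names (`BC`, `AC`, `AB`, `S`) and the numeric values are hypothetical, chosen to mirror a triangle that is right at A. A rule fires whenever all its inputs are known, and each firing is recorded as one explanation step.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    inputs: tuple[str, ...]         # attributes that must already be known
    output: str                     # attribute this rule computes
    compute: Callable[..., float]   # how to compute the output from the inputs
    description: str                # text used in the explanation


# Hypothetical rules for a triangle ABC that is right at A.
RULES = [
    Rule(("BC", "AC"), "AB", lambda bc, ac: math.sqrt(bc**2 - ac**2),
         "Pythagoras: AB = sqrt(BC^2 - AC^2)"),
    Rule(("AB", "AC"), "S", lambda ab, ac: ab * ac / 2,
         "Area of a right triangle: S = AB*AC/2"),
]


def solve(facts: dict[str, float], goal: str) -> list[str]:
    """Forward chaining: apply rules until the goal is known or nothing changes.
    Returns the explanation steps; raises ValueError if the goal is unreachable."""
    known = dict(facts)
    steps = []
    changed = True
    while goal not in known and changed:
        changed = False
        for rule in RULES:
            if rule.output not in known and all(x in known for x in rule.inputs):
                known[rule.output] = rule.compute(*(known[x] for x in rule.inputs))
                steps.append(f"{rule.description} -> {rule.output} = {known[rule.output]:g}")
                changed = True
    if goal not in known:
        raise ValueError(f"cannot derive {goal}")
    return steps


for step in solve({"BC": 5.0, "AC": 3.0}, "S"):
    print(step)
# Pythagoras: AB = sqrt(BC^2 - AC^2) -> AB = 4
# Area of a right triangle: S = AB*AC/2 -> S = 6
```

    Recording the rule description at each firing is what turns the search into a step-by-step explanation rather than just a final value.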


    However, the processes of problem analysis and explicit explanation in NL are carried out manually, based on the users' knowledge. This research experiments with the Intelligent Analytical Geometry Problem Solving System (IAGPSS) to learn about its SL and knowledge representation model, and then finds a translation model and organizes the knowledge base domain for knowledge translation, in order to translate NL into the knowledge specification.

    Plane Geometry problem solving system

    As introduced above, the Plane Geometry problem solving system is based on the C-Object knowledge base (COKB) model. The problem solving process with the C-Object network model follows the diagram presented above for C-Object solving (as in Fig. 2.2.1). The problems to be processed are specified in a language that the system can process, called the specification language. From the specification input, the system finds the explanation for each step based on the knowledge base model. The parts of interest for assisting the translation process are the SL and the knowledge representation model.

    Specification Language

    In [3], the specification of the problem input is presented in two parts: (1) the hypothesis, which starts at "begin_hypothesis" and ends at "end_hypothesis"; (2) the goal, which starts at "begin_goal" and ends at "end_goal". This SL is constructed in the problem solving system. Its structure is:

    begin_hypothesis [ {<parameters>}, {<objects>}, {<facts>}, {<operators>} ] end_hypothesis

    begin_goal [ <goal> ] end_goal
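    The following Python sketch assembles an SL problem text in the layout above from already formatted items. The function name `build_sl_problem` and the exact spacing are assumptions; only the begin_/end_ keywords and the four brace-delimited hypothesis parts follow the structure described here.

```python
def build_sl_problem(params: list[str], objects: list[str], facts: list[str],
                     operators: list[str], goal: str) -> str:
    """Render an SL problem (hypothetical helper): a four-part hypothesis
    followed by a goal. Each list holds already formatted SL items."""
    def group(items: list[str]) -> str:
        return "{" + ", ".join(items) + "}"

    parts = (params, objects, facts, operators)
    hypothesis = "[" + ", ".join(group(part) for part in parts) + "]"
    return ("begin_hypothesis " + hypothesis + " end_hypothesis\n"
            "begin_goal [" + goal + "] end_goal")


print(build_sl_problem([], ['[TAMGIAC[A,B,C],"TAMGIAC"]'], ['["VUONG",A]'], [],
                       "TINH[TAMGIAC[A,B,C].S]"))
# begin_hypothesis [{}, {[TAMGIAC[A,B,C],"TAMGIAC"]}, {["VUONG",A]}, {}] end_hypothesis
# begin_goal [TINH[TAMGIAC[A,B,C].S]] end_goal
```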

    In the hypothesis, users have to specify the parameters, objects, facts, etc. of the problem. The parameters appearing in the problem are presented in the parameter part. The keywords for declaring an object are the C-Object names in the object knowledge base of the plane geometry domain, such as "DIEM" (point), "DOAN" (segment), "GOC" (angle), "TAMGIAC" (triangle), "TAMGIACCAN" (isosceles triangle), and so on. The syntax of an object declaration is:

    [<object>, "<C-Object name>"]

    The object being specified is an object name or a structured type from the object base list. For example, "cho điểm M" (let M be a point) is presented as [M,"DIEM"]; in another example, "cho tam giác ABC" (let ABC be a triangle) is specified as [TAMGIAC[A,B,C],"TAMGIAC"], where A, B, C is the list of points.

    The facts presenting relations, features, or properties are defined as: [<relation name>, {<object list>}]

    The keywords of relations are given in [], such as "CAT" (intersect), "VUONG" (right), "VUONGGOC" (perpendicular), "SONGSONG" (parallel), and so on. For example, the relation "Cạnh CD cắt cạnh AB" (the segment CD intersects AB) is specified as [CAT, DOAN[A,B], DOAN[D,C]].

    Moreover, the hypothesis and goal declarations can use structured types of objects that have not been declared above. For example, once the points A, B, C, D (objects of type "DIEM") are declared, it is not necessary to declare objects such as segment AB, segment AC, segment BC, etc.; these can still be used in related facts in the fact declaration. For example, "Đoạn AB vuông góc với đoạn CD" (the segment AB is perpendicular to the segment CD) is specified, without declaring segment AB or segment CD, as:

    VIE: [VUONG, DOAN[A,B], DOAN[C,D]]

    ENG: [RIGHT, segment[A,B], segment[C,D]]

    or, with the object types written out:

    VIE: ["VUONG", [DOAN[A,B],"DOAN"], [DOAN[C,D],"DOAN"]]

    ENG: ["RIGHT", [segment[A,B],"SEGMENT"], [segment[C,D],"SEGMENT"]]
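    Both fact notations above, with bare structured objects or with each object paired with its type, can be generated mechanically. The helpers below (`segment`, `fact_short`, `fact_typed`) are hypothetical Python sketches, assuming a segment is identified by its two endpoint names; they produce the VIE forms for the segments AB and CD.

```python
def segment(p: str, q: str) -> str:
    """Structured object for the segment pq, e.g. DOAN[A,B] (hypothetical helper)."""
    return f"DOAN[{p},{q}]"


def fact_short(relation: str, *objects: str) -> str:
    """Fact without object types, e.g. [VUONG, DOAN[A,B], DOAN[C,D]]."""
    return "[" + ", ".join([relation, *objects]) + "]"


def fact_typed(relation: str, *typed: tuple[str, str]) -> str:
    """Fact in which each object is paired with its C-Object type."""
    parts = [f'"{relation}"'] + [f'[{obj},"{kind}"]' for obj, kind in typed]
    return "[" + ",".join(parts) + "]"


ab, cd = segment("A", "B"), segment("C", "D")
print(fact_short("VUONG", ab, cd))
# [VUONG, DOAN[A,B], DOAN[C,D]]
print(fact_typed("VUONG", (ab, "DOAN"), (cd, "DOAN")))
# ["VUONG",[DOAN[A,B],"DOAN"],[DOAN[C,D],"DOAN"]]
```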

    Operators include operator symbols (arithmetic and comparison operators), functions, and operators used to access the components of an object. Parameters include the names of constants or constant objects, e.g., the number Pi (π) or the origin O of coordinates in 2D or 3D analytic geometry.

    The goal specification states what the problem asks for and has the same structure as above. However, it adds action verbs such as "chứng minh rằng" (prove that), "giải" (solve), "so sánh" (compare), "định" (determine), and so on.

    Knowledge base structure

    The problem solving system accesses the knowledge base according to the C-Object knowledge base (COKB) model. The COKB structure in [3] consists of text files containing the knowledge that the program uses in deduction and manipulation: the files "Objects.txt", "Relations.txt", the C-Object "Hierarchy.txt", "<C-Object name>.txt", "Operators.txt", "Facts.txt", and "Rules.txt".

    The Intelligent Analytical Geometry Problem Solving Systems have been successful in producing gradual guidance for geometry problem solving. They help users reason about geometry problems; however, their interaction with users is not really effective. Objects, relations, rules, etc. are specified in SL, and so is the explanation. These systems therefore limit users, because they require both the data input and the result output to be in SL rather than NL, and few users know how to use SL. For these systems to become practical application products, the problem input must be processed into SL automatically, without the assistance of specialists. The goal of this research is to propose a new approach for automatically translating a problem from NL to SL in accordance with the problem solving systems.

    3. APPROACH METHOD

    By analyzing and studying the SL and the COKB model in the intelligent problem solving system, this paper presents an NLP method for Vietnamese that translates NL to SL, and we hope these results can be fully applied to other knowledge domains. In addition, the approach uses the object knowledge base for translation.


    Fig. 1. Main processing method diagram

    The main processing method is as follows. After the problems in NL are input, the system extracts them into sentences and automatically classifies the sentences into hypothesis and goal. Each sentence is segmented into words, which are tagged with their POS. The sentence is then classified and grouped into a sentence form with word segmentation and POS tagging (POS sentence form). If a POS sentence form (POSSF) does not exist in the POSSF database, it is transformed into a pre-SL sentence form (a sentence form for pre-processing the translation into an SL sentence), and the POSSF and its pre-SL sentence form are stored in the database. If the POSSF exists, or after it has been inserted into the database, its pre-SL sentence form is retrieved by querying the POSSF database. The sentence and the pre-SL sentence form are input to the pre-SL sentence transform module, whose output is a pre-SL sentence, a condensed and simple sentence. When this sentence is input to the detailed SL module associated with the knowledge base (KB) of the domain (as in Fig. 3.1), the output is the sentence in SL form.
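    The paper does not detail how sentences are split and classified into hypothesis and goal. One plausible heuristic, sketched below in Python purely as an assumption, is to treat a sentence as part of the goal when it starts with one of the goal action verbs of the SL (prove, solve, compare, compute, determine); the unaccented verb list `GOAL_VERBS` is hypothetical.

```python
import re

# Hypothetical, unaccented goal verbs: prove, solve, compare, compute, determine.
GOAL_VERBS = ("chung minh", "giai", "so sanh", "tinh", "xac dinh")


def split_problem(text: str) -> tuple[list[str], list[str]]:
    """Split a problem into sentences at '.', '?' or '!' and classify each one:
    sentences starting with a goal verb go to the goal, the rest to the hypothesis."""
    sentences = [s.strip() for s in re.split(r"[.?!]", text) if s.strip()]
    hypothesis, goal = [], []
    for s in sentences:
        words = s.lower().split()
        # Compare whole words so that, e.g., "tinh" does not match a longer word.
        is_goal = any(words[:len(v.split())] == v.split() for v in GOAL_VERBS)
        (goal if is_goal else hypothesis).append(s)
    return hypothesis, goal


h, g = split_problem("Cho tam giac ABC. Vuong tai A. AC bang 3cm. Tinh dien tich tam giac ABC.")
print(h)  # ['Cho tam giac ABC', 'Vuong tai A', 'AC bang 3cm']
print(g)  # ['Tinh dien tich tam giac ABC']
```

    Splitting on '.' is deliberately naive: a decimal value such as 2.5cm would be cut in two, so a real sentence splitter would need to handle numbers.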

    To improve the problem solving system, the Plane Geometry knowledge base domain is chosen for testing. This domain is used in the systems of [3] and [9]. The sentences in the problems are in simple forms; complex sentences are changed to simple form.

    NLP algorithms are used to segment sentences and tag the POS of words. POS tagging is based on a wordnet whose words come only from the knowledge base. From the knowledge base and the Plane Geometry problems in Vietnam, the POSSF database is built by analyzing and classifying sentences. The method starts by extracting the problem sentence by sentence, then searching the wordnet to get each word's POS, using the NLP algorithms.
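    The lookup described above can be sketched in Python as follows. The tag names (doituong, ten, dacdiem, gioitu) follow the sentence-form examples used in this paper; the tiny `WORDNET` dictionary, the rule that tags all-uppercase tokens such as ABC as names (ten), and keeping the keyword cho literally (as in the form cho doituong ten) are assumptions.

```python
# Hypothetical domain wordnet: segmented (unaccented) word -> POS tag.
WORDNET = {
    "tam giac": "doituong",   # triangle -> object
    "goc": "doituong",        # angle -> object
    "doan": "doituong",       # segment -> object
    "trung diem": "dacdiem",  # midpoint -> property
    "vuong": "dacdiem",       # right -> property
    "cua": "gioitu",          # of -> preposition
    "tai": "gioitu",          # at -> preposition
}
KEYWORDS = {"cho"}  # kept literally in the sentence form (assumed)


def tag(word: str) -> str:
    """Return the POS tag of one segmented word."""
    key = word.lower()
    if key in KEYWORDS:
        return key
    if key in WORDNET:
        return WORDNET[key]
    if word.isalpha() and word.isupper():
        return "ten"  # object names such as ABC or M (assumed rule)
    raise KeyError(f"word not in wordnet: {word!r}")


def sentence_form(words: list[str]) -> str:
    """Map a segmented sentence to its POS sentence form (POSSF)."""
    return " ".join(tag(w) for w in words)


print(sentence_form(["Cho", "goc", "ABC"]))  # cho doituong ten
```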

    With this model, building the POSSF database and processing it are the most important modules. So the two main tasks are:

    1. Constructing an SF database with pre-SL forms

    2. Building a model for translating NL to SL in the knowledge base domain, with two steps:

    - transforming to a pre-SL sentence

    - detailing the pre-SL sentence into an SL sentence

    3.1. Construct an SF database with pre-SL form sentences

    Pre-SL form sentence module

    Using NLP algorithms for Vietnamese, the natural language sentence is transformed into a word-form sentence by the word segmentation and POS tagging (WS-PT) module. For testing, all sentences (over 2000) in Plane Geometry problems were input to construct the knowledge base (though only simple sentences are used in testing). We obtained over 300 sentence forms in this knowledge base domain. Since then, the database has been updated automatically whenever the system recognizes a new form. These sentence forms help us find the rules for transforming to pre-SL. Some examples of sentence forms are:

    dacdiem ten dacdiem gioitu ten

    (property/name/property/preposition/name)

    doituong ten dacdiem doituong ten

    (object/name/property/object/name)


    To transform to pre-SL and store in the database, all sentence forms in the problem solving system are collected and classified. Some pre-SL forms are shown below.

    Pre-SL form

    a. Object definition: [name,"object"]

    Ex: Tam giác ABC (triangle ABC): [ABC,"TAMGIAC"]

    b. Object attribute: [object].attribute

    Ex: Cho diện tích tam giác ABC (let the area of triangle ABC): [ABC,"TAMGIAC"].S

    c. Relation: [relationname, {object set}], [relationname, object, value], [relationname, attribute, value]

    Ex: Cạnh AB vuông góc với BC (the segment AB is perpendicular to BC): ["VUONGGOC",[AB,"CANH"],[BC,?]], where "?" means not yet known

    d. Property: [property, object]

    Ex: Cho A vuông (A is right): ["VUONG",A]

    e. Function: [property, {object set}]

    Ex: Cho M trung điểm của AB (M is the midpoint of AB): ["trungdiem",M,AB]

    3.2 Process model

    Fig. 3.2. SF processing diagram (components: NLP algorithm, wordnet, WF-sentence)

    A general algorithm for automatically transforming a sentence form into a pre-SL sentence form is researched for all sentences. Each sentence form and its pre-SL sentence form are stored in the database. During processing, the program simply looks up the sentence form and its pre-SL sentence form in the database, without spending time on the transformation to pre-SL. If a new sentence form appears, it is processed and inserted into the database. In addition, experts can easily update a special sentence when a wrong form is found in the database. Here, however, the processing algorithm takes priority. The process model of this module is presented in Fig. 3.2.
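    The store-and-reuse behavior described above can be sketched as a small Python class: a sentence form is looked up first and transformed only when it is new, and an expert may overwrite a wrong entry. The class `SentenceFormDB`, the in-memory dict standing in for the database, and the toy `naive_transform` standing in for the general transformation algorithm are all hypothetical.

```python
from typing import Callable


class SentenceFormDB:
    """Hypothetical cache mapping a POS sentence form (POSSF) to its pre-SL form."""

    def __init__(self, transform: Callable[[str], str]):
        self._forms: dict[str, str] = {}
        self._transform = transform  # stands in for the general algorithm
        self.transform_calls = 0

    def pre_sl_form(self, possf: str) -> str:
        """Return the stored pre-SL form, computing and storing it if the form is new."""
        if possf not in self._forms:
            self._forms[possf] = self._transform(possf)
            self.transform_calls += 1
        return self._forms[possf]

    def expert_update(self, possf: str, pre_sl: str) -> None:
        """Let an expert overwrite a wrong form stored in the database."""
        self._forms[possf] = pre_sl


def naive_transform(possf: str) -> str:
    # Toy rule for illustration only: "cho doituong ten" -> "Cho[ten, doituong]".
    head, *rest = possf.split()
    return head.capitalize() + "[" + ", ".join(reversed(rest)) + "]"


db = SentenceFormDB(naive_transform)
print(db.pre_sl_form("cho doituong ten"))  # Cho[ten, doituong]
db.pre_sl_form("cho doituong ten")         # served from the database
print(db.transform_calls)                  # 1
```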

    3.3 Build a model for translating NL to SL in the knowledge base domain

    This process includes two steps: the first step is transforming to the POSSF, and the second is detailing the POSSF to SL.

    Step 2 is the main part of this research. This step must be associated with the knowledge base of the domain to succeed in fully detailing the POSSF to SL. The process is shown in the model of Fig. 3.1.

    a. The step for obtaining the pre-SL sentence and the step for building the sentence form database are the same. First, given the sentence form after word segmentation and POS tagging, the pre-SL sentence form is found by searching the sentence form database. Second, combining it with the sentence yields the pre-SL sentence. For example: Cho góc ABC (let ABC be an angle).

    Sentence form: cho doituong ten

    Searching in the database, the result is:

    cho doituong ten, with pre-SL sentence form: Cho[ten,doituong]

    Combining with "let angle ABC", the pre-SL sentence is:

    Cho[ABC,GOC]

    The algorithm for solving this problem is not special, so we do not present it here.
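    Since the gathering algorithm is not given, here is one simple way it could work, sketched in Python as an assumption: align the sentence form tag by tag with the segmented sentence, map object words to C-Object names, and substitute the values for the tag placeholders in the pre-SL template. The mapping `OBJECT_NAMES` is hypothetical, and each tag is assumed to occur once.

```python
import re

# Hypothetical mapping from (unaccented) object words to C-Object names.
OBJECT_NAMES = {"goc": "GOC", "tam giac": "TAMGIAC", "doan": "DOAN", "diem": "DIEM"}


def gather(words: list[str], form: str, template: str) -> str:
    """Fill a pre-SL template with the words aligned to the tags of a sentence form.

    Assumes one tag per segmented word and that each tag occurs at most once.
    """
    tags = form.split()
    if len(tags) != len(words):
        raise ValueError("sentence and sentence form are not aligned")
    values = {}
    for tag_name, word in zip(tags, words):
        # Object words become C-Object names; names such as ABC are kept as written.
        values[tag_name] = OBJECT_NAMES[word.lower()] if tag_name == "doituong" else word
    if not values:
        return template
    # Replace whole-word placeholders in one pass, so inserted text is never re-scanned.
    pattern = r"\b(" + "|".join(map(re.escape, values)) + r")\b"
    return re.sub(pattern, lambda m: values[m.group(1)], template)


print(gather(["Cho", "goc", "ABC"], "cho doituong ten", "Cho[ten,doituong]"))
# Cho[ABC,GOC]
```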

    b. Detailing the POSSF to an SL sentence

    As in Fig. 3.1, the pre-SL sentence from the previous step is very close to an SL sentence of the problem solving system. But how is it transformed into an SL sentence? An SL sentence is one with the full details of the knowledge. In fact, transforming a pre-SL sentence into an SL one is the process of adding more details to the pre-SL sentence. When studying the knowledge base, we found it necessary to combine the NLP algorithm and the knowledge base model to detail the pre-SL sentence into an SL one.

    The problem in natural language is input and then extracted into sentences. After that, they are transformed into WS-PT sentences by the WS-PT module. The pre-SL sentence (not yet detailed) is obtained by searching the SF database. For example:


    Cho tam giác ABC. Vuông tại A. Cạnh BC = cm. AC bằng 3cm. Tính diện tích tam giác ABC.

    (Let ABC be a triangle, right at A, with side BC = cm and AC = 3 cm. Compute the area of triangle ABC.)

    end_problem

    begin_hypothesis [ {}, {[TAMGIAC[A,B,C],TAMGIAC], [A,DIEM], [B,DIEM], [C,DIEM], [DOAN[B,C],DOAN], [DOAN[A,C],DOAN]}, {DOAN[B,C].a = cm, DOAN[A,C].a = 3cm}, {[VUONG,A]} ] end_hypothesis

    begin_goal [TINH[TAMGIAC[A,B,C].S]] end_goal

    end_exercise
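    The detailing step adds knowledge-base details to a pre-SL object. The Python sketch below assumes a hypothetical knowledge-base fragment `KB_COMPONENTS` stating how many points each C-Object is built from; from the pre-SL object [ABC,"TAMGIAC"] it derives the structured object TAMGIAC[A,B,C] and the point declarations, as in the hypothesis of the example above. Segments, facts, and the real COKB files of [3] are outside this sketch.

```python
# Hypothetical knowledge-base fragment: C-Object name -> (component type, count).
KB_COMPONENTS = {"TAMGIAC": ("DIEM", 3), "DOAN": ("DIEM", 2), "GOC": ("DIEM", 3)}


def detail_object(name: str, kind: str) -> list[str]:
    """Expand a pre-SL object such as [ABC,"TAMGIAC"] into SL declarations:
    the structured object itself followed by its component points."""
    part_kind, count = KB_COMPONENTS[kind]
    points = list(name)  # assumes one letter per point, e.g. "ABC"
    if len(points) != count:
        raise ValueError(f"{kind} needs {count} points, got {name!r}")
    declarations = [f"[{kind}[{','.join(points)}],{kind}]"]
    declarations += [f"[{p},{part_kind}]" for p in points]
    return declarations


print(detail_object("ABC", "TAMGIAC"))
# ['[TAMGIAC[A,B,C],TAMGIAC]', '[A,DIEM]', '[B,DIEM]', '[C,DIEM]']
```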

    5. CONCLUSION

    The paper proposes a method for translating mathematics problems in natural language (NL) into specification language (SL) with two main algorithms: transforming an NL sentence into a pre-SL sentence, and detailing the pre-SL sentence (associated with the KB model) into an SL sentence. This result improves the natural interaction ability of intelligent problem solving systems.

    In addition, an approach for translating natural language associated with a knowledge base model is presented. It can also be applied to other knowledge base domains, and may be applied effectively to intelligent search engines.

    The translation software, coded for testing in Plane Geometry, is able to translate all 7th grade problems into the specification language as input for the intelligent problem solving system. It works effectively.

    However, some problems still need to be studied to improve this work. For instance, the sentences used in testing are in simple sentence form, so more research on sentence variation and on complex sentences in natural language is required. The method of knowledge base representation should be made more adaptable so that it can be applied to other knowledge base domains. Further study of a general knowledge base model for language translation is also necessary.

    REFERENCES

    [1] Dinh Dien, Hoang Kiem, Nguyen Van T., "Vietnamese Word Segmentation," pp. 749-76, The Sixth Natural Language Processing Pacific Rim Symposium (NLPRS-2001), Tokyo, Japan, 2001.

    [2] Le Thanh Huong, Pham Hong Quang, and Nguyen Thanh Thuy, "An approach to automatically analyze syntax of Vietnamese text," Journal of Informatics, 2000.

    [3] Do Van Nhon, "Constructing intelligent systems for computation: Research and development of knowledge representation models to design systems for automated problem solving," Ph.D. thesis, National University of Ho Chi Minh City, 2001-2002.

    [4] Le An Ha, "A method for word segmentation in Vietnamese," Proceedings of Corpus Linguistics 2003, Lancaster, UK.

    [5] Nguyen Kim Anh, "Translating conceptual graph queries into SQL queries," Informatics Faculty, Technology University of Ha Noi.

    [6] Nguyen Kim Anh, Pham Thi Thu Hoai, "Imperfect natural language queries to relational databases," Informatics Faculty, Technology University of Ha Noi.

    [7] Le Thanh Huong, "A Case Study on Vietnamese Syntactic Parsing."

    [8] Dinh Dien, Natural Language Processing (book), University of Science, HCM City, 12/2004.

    [9] Nhon Do, "An Ontology for Knowledge Representation and Applications," Proceedings of World Academy of Science, Engineering and Technology, Volume 32, August 2008, ISSN 2070-3740.