PROJECT REPORT
ON
Natural Language Database Interface
By
Ryan Sequeira
Abhijeet Badale
Roshan Tirkey
Swapnil Waghmode
DEPARTMENT OF INFORMATION TECHNOLOGY
IMPERIAL COLLEGE OF ENGINEERING AND RESEARCH
WAGHOLI, PUNE
Contents
1 Introduction
  1.1 Motivation
  1.2 Background
2 Literature Survey
  2.1 Advantages of NLDBI System
  2.2 Disadvantages of NLDBI System
  2.3 Some Already Developed NLDBI Systems
  2.4 Commercially Available Systems
  2.5 Various Approaches Used for Development of NLDBI Systems
    2.5.1 Symbolic Approach (Rule-Based Approach)
    2.5.2 Empirical Approach (Corpus-Based Approach)
    2.5.3 Connectionist Approach (Using Neural Networks)
3 Proposed Work
  3.1 Problem Definition
  3.2 Features
  3.3 Goals
  3.4 Objectives
  3.5 Constraints
4 Project Design
  4.1 SRS - Software Tools
  4.2 Hardware Tools
  4.3 System Architecture
  4.4 UML Diagrams
    4.4.1 Use Case Diagram
    4.4.2 Sequence Diagram
    4.4.3 Activity Diagram
  4.5 High Level Design
    4.5.1 Module Tree
    4.5.2 Database Design
  4.6 Screenshots
5 Test Design & Report
  5.1 Test Design for NLDBI
  5.2 Test Report
    5.2.1 Input/Output testing
    5.2.2 Suggestion list testing
6 Implementation and Coding
  6.1 List of questions answered by the system
    6.1.1 Who questions
    6.1.2 What questions
    6.1.3 Which questions
    6.1.4 When questions
  6.2 Parse Trees
  6.3 Context Free Grammar
    6.3.1 LEX specification
    6.3.2 YACC specification
7 Schedule of Work
8 Conclusion and Future Scope
  8.1 Conclusion
  8.2 Future Scope
Bibliography
List of Figures
1.1 Existing NLDBI system with old user interface development techniques
4.1 System Architecture
4.2 Use Case Diagram
4.3 Sequence Diagram
4.4 Activity Diagram
4.5 Module Tree
4.6 NLDBI screenshot interfaced with the movie database and designed to answer questions related to movies
5.1 System behaviour for wrong question inputs
5.2 System behaviour for irrelevant inputs
5.3 System behaviour for correct questions
5.4 System behaviour for no input or null input
5.5 System behaviour if data is not present in the database
5.6 System's suggestion list behaviour for no keyword inputs
5.7 System's suggestion list behaviour for relevant keyword inputs
5.8 System's suggestion list behaviour for irrelevant inputs
6.1 Who questions class one
6.2 Who questions class two
6.3 Who questions class three
6.4 What questions class one
6.5 What questions class two
6.6 What questions class three
6.7 What questions class four
7.1 Gantt chart for preliminary research
7.2 Gantt chart for initial system design
7.3 Gantt chart for final development phase
Chapter 1
Introduction
A Natural Language Interface to Database (NLDBI) can act as an alternative interface for finding structured information in a database, particularly on small hand-held devices, because writing questions in natural language is much easier for a casual user than the complicated and time-consuming navigation required by traditional database interfaces. An NLDBI shifts the burden from the user, who no longer needs to learn how to operate the interface, to the system, which must interpret the user's description of his or her information need. An NLDBI thus demands fewer input-output and processing facilities, which makes it more useful for mobile devices.
A natural language interface to a database is not a new area; research has been going on for a long time. Most of the NLDBI systems developed so far use shallow analysis, which relies on domain knowledge rather than linguistic knowledge to interpret the meaning of the input question and therefore yields a low success rate. This is the main difficulty in putting an NLDBI to practical use. To increase the success rate, deep analysis of the input question can be used. Deep analysis uses linguistic knowledge for detailed understanding, and it has the advantage of portability, which is not possible with shallow analysis because of its heavy dependence on the database.
To explore deep analysis for NLDBI, movie inquiry was selected as the domain. A corpus of questions was collected for this domain and analysed to find the patterns of questions used in it. Based on these patterns, a context-free grammar was designed to represent the syntax of input questions. Representing the syntax is an important step, since in the deep-analysis method the subsequent steps depend on it.
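To make this concrete, a few productions of such a context-free grammar might look like the following BNF-style sketch. This is an illustration only, not the project's actual grammar, which is given in Chapter 6:

```
question  -> WHO verb movie_name              ; "Who directed Titanic?"
question  -> WHAT IS attribute OF movie_name  ; "What is the budget of Avatar?"
verb      -> DIRECTED | PRODUCED
attribute -> GENRE | RATING | BUDGET | LANGUAGE | RELEASE_DATE
```

Each input question is matched against productions of this kind, and the matched non-terminals (the attribute asked for, the movie named) feed the later translation steps.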
1.1 Motivation
Downsides of existing query languages (e.g., SeRQL, SPARQL):
1. Complex syntax.
2. Not easy to learn.
3. Writing queries is an error-prone task.
4. They require an understanding of Semantic Web technologies.
Downsides of an existing system (Figure 1.1): This system first requires the user to select the columns that he or she would like to use in the query. Next the user has to write an appropriate query with no guidance from the system. Then the user is supposed to press "Do query" to generate the output.
This system appears complex and requires the user to perform many steps with no guidance whatsoever. Even though so many technologies are available for designing rich and user-friendly interfaces, the existing systems still seem difficult to use, leaving scope to apply the knowledge of Human Computer Interaction to develop a user-friendly system that performs better than the existing systems.
Figure 1.1: Existing NLDBI system with old user interface development techniques
1.2 Background
Traditionally, most natural language interfaces follow a natural-language front-end paradigm, in which natural language queries are translated into formal database queries (i.e., queries in some formal query language). These queries are then processed by a database management system. The requested data can be returned as is, or passed back to the natural-language component, which generates a natural-language version of the data. This effectively divides the problem of natural-language database access into two sub-problems: the natural-language component and the database access component.
The methods for translating natural language into formal query languages have varied over the years. The Lunar Sciences Natural Language Information System (LUNAR) is one of the oldest natural-language interfaces to a database: it performed a simple syntactic analysis of the English query, followed by a semantic analysis of the resulting parse tree.
Chapter 2
Literature Survey
2.1 Advantages of NLDBI System
The advantages of a Natural Language Interface to Database are as follows:
• No Artificial Language
One advantage of NLDBIs is supposed to be that the user is not required to learn an artificial communication language. Formal query languages like SQL are difficult to learn and master, at least for non-computer-specialists.
• No Need for Training
Graphical interfaces and form-based interfaces are easier for occasional users; still, invoking forms, linking frames, selecting restrictions from menus, etc. constitute artificial communication languages that have to be learned and mastered by the end-user. In contrast, an ideal NLDBI would allow queries to be formulated in the user's native language.
• Better for Some Questions
It has been argued that some kinds of questions (e.g. questions involving negation or quantification) can be easily expressed in natural language but seem difficult (or at least tedious) to express using graphical or form-based interfaces. For example, "Which department has no programmers?" (negation), or "Which company supplies every department?" (universal quantification), can be easily expressed in natural language, but would be difficult to express in most graphical or form-based interfaces. Questions like the above can, of course, be expressed in database query languages like SQL, but complex query-language expressions may have to be written.
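To illustrate the last point, the two example questions can indeed be answered in SQL, but only with fairly involved expressions. The sketch below uses an invented toy schema and Python's sqlite3 module purely for illustration; none of these table or column names come from the report:

```python
import sqlite3

# Toy schema invented for illustration: departments, programmers, suppliers.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE department (dept TEXT PRIMARY KEY);
    CREATE TABLE programmer (name TEXT, dept TEXT);
    CREATE TABLE supplies   (company TEXT, dept TEXT);
    INSERT INTO department VALUES ('IT'), ('HR'), ('Sales');
    INSERT INTO programmer VALUES ('Ann', 'IT'), ('Bob', 'Sales');
    INSERT INTO supplies   VALUES ('Acme', 'IT'), ('Acme', 'HR'),
                                  ('Acme', 'Sales'), ('Zeta', 'IT');
""")

# "Which department has no programmers?"  (negation)
no_programmers = [r[0] for r in con.execute("""
    SELECT dept FROM department
    WHERE dept NOT IN (SELECT dept FROM programmer)
""")]

# "Which company supplies every department?"  (universal quantification,
# expressed in SQL as a double NOT EXISTS: no department exists that the
# company does not supply)
supplies_all = [r[0] for r in con.execute("""
    SELECT DISTINCT s.company FROM supplies s
    WHERE NOT EXISTS (
        SELECT 1 FROM department d
        WHERE NOT EXISTS (
            SELECT 1 FROM supplies s2
            WHERE s2.company = s.company AND s2.dept = d.dept))
""")]

print(no_programmers)  # ['HR']
print(supplies_all)    # ['Acme']
```

The double NOT EXISTS needed for the quantification query shows how a one-line natural language question can require a nested, non-obvious SQL expression.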
• Easy to Use for Multiple Database Tables
Queries that involve multiple database tables, like "list the address of the farmers who got a bonus greater than 10000 rupees for the crop of wheat", are more difficult to form in a graphical user interface than in a natural language interface.
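Such a multi-table query might look as follows. The schema and column names here are invented for illustration, since the report does not specify the farmer database:

```python
import sqlite3

# Hypothetical farmer schema, invented for this example.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE farmer (farmer_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE bonus  (farmer_id INTEGER, crop TEXT, amount INTEGER);
    INSERT INTO farmer VALUES (1, 'Patil', 'Wagholi'), (2, 'Deshmukh', 'Pune');
    INSERT INTO bonus  VALUES (1, 'wheat', 15000), (2, 'wheat', 8000),
                              (2, 'rice', 20000);
""")

# "List the address of the farmers who got a bonus greater than
#  10000 rupees for the crop of wheat" -- a two-table join.
rows = [r[0] for r in con.execute("""
    SELECT f.address
    FROM farmer f JOIN bonus b ON f.farmer_id = b.farmer_id
    WHERE b.crop = 'wheat' AND b.amount > 10000
""")]
print(rows)  # ['Wagholi']
```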
2.2 Disadvantages of NLDBI System
Disadvantages of natural language interface to database system are as follows:
• Linguistic Coverage Not Obvious
A frequent complaint against NLDBIs is that the system's linguistic capabilities are not obvious to the user. Current NLDBIs can only cope with limited subsets of natural language, and users find it difficult to understand (and remember) what kinds of questions the NLDBI can or cannot cope with. Formal query languages, form-based interfaces, and graphical interfaces typically do not suffer from this problem. In the case of formal query languages, the syntax of the query language is usually well documented, and any syntactically correct query is guaranteed to be given an answer. In the case of form-based and graphical interfaces, the user can usually understand what sorts of questions can be input by browsing the options offered on the screen, and any query that can be input is guaranteed to be given an answer.
For example, if the database does not contain information about the profit of farmers and the user inputs the query "which farmer gets maximum profit", this query is outside the linguistic coverage of the system.
• Linguistic vs. Conceptual Failures
When the NLDBI cannot understand a question, it is often not clear to the user whether the rejected question is outside the system's linguistic coverage or outside its conceptual coverage. Thus, users often try to rephrase questions referring to concepts the system does not know (e.g. rephrasing questions about salaries to a system that knows nothing about salaries), because they think the problem is caused by the system's limited linguistic coverage. In other cases, users do not try to rephrase questions the system could conceptually handle, because they do not realize that the particular phrasing of the question is outside the linguistic coverage, and that an alternative phrasing of the same question could be answered. Some NLDBIs attempt to solve this problem by providing diagnostic messages showing the reason a question cannot be handled.
For example, the user asks the query "list the names of farmers who are 35 years old"; the database has information about the age of the farmers, but the word "age" is not in the query. This query is not outside the linguistic coverage, but conceptually it fails, because the system does not understand what 35 represents, i.e. whether it refers to age or to address.
• Users Assume Intelligence
NLDBI users are often misled by the system's ability to process natural language: they assume that the system is intelligent, that it has common sense, or that it can deduce facts, while in fact most NLDBIs have no reasoning abilities. This problem does not arise with formal query languages, form-based interfaces, and graphical interfaces, where the capabilities of the system are more obvious to the user. For example, when a user asks the query "list the names of farmers who are 35 years old", he or she does not specify the word "age", assuming the system will understand it automatically; but the system is not that intelligent.
• Inappropriate Medium
It has been argued that natural language is not an appropriate medium for communicating with a computer system; it is claimed to be too verbose or too ambiguous for human-computer interaction. NLDBI users have to type long questions, while in form-based interfaces only fields have to be filled in, and in graphical interfaces most of the work can be done by mouse clicks. In a natural language interface the user has to type a full sentence with all the connectors (articles, prepositions, etc.), but in graphical or form-based interfaces this is not required.
2.3 Some Already Developed NLDBI Systems
• LUNAR System
This system appeared in the early seventies (1973). LUNAR is a system that answers questions about samples of rock brought back from the moon; the name of the system refers to its relation to the moon. The system was informally introduced in 1971. To accomplish its function the LUNAR system uses two databases: one for the chemical analyses and the other for literature references. The LUNAR system uses an Augmented Transition Network (ATN) parser and Woods' Procedural Semantics. According to Woods, the LUNAR system's performance was quite impressive: it managed to handle 78% of the requests.
• LADDER
This system was developed in 1978. It was designed as a natural language interface to a database of information about US Navy ships. According to Hendrix, the LADDER system uses a semantic grammar to parse questions in order to query a distributed database. Although semantic grammars helped to implement systems with impressive characteristics, the resulting systems proved difficult to port to different application domains; indeed, a different grammar had to be developed whenever LADDER was configured for a new application. The system uses the semantic-grammar technique, which interleaves syntactic and semantic processing. Question answering is done by parsing the input and mapping the parse tree to a database query. LADDER was implemented in LISP. At the time of its creation, LADDER was able to process a database equivalent to a relational database with 14 tables and 100 attributes.
• CHAT-80
This is one of the best-known NLDBIs of the early eighties. CHAT-80 was implemented entirely in Prolog. It transformed English questions into Prolog expressions, which were evaluated against the Prolog database. The code of CHAT-80 was circulated widely and formed the basis of several other experimental NLDBIs. The database of CHAT-80 consists of facts (i.e. oceans, major seas, major rivers and major cities) about 150 countries of the world and a small set of English vocabulary sufficient for querying the database.
2.4 Commercially Available Systems
Some of the commercially available NLDBIs are as follows:
• IBM's LANGUAGE ACCESS: This system stopped being commercially available in October 1992.
• SQL Tutor: It was developed in 1998. SQL can be very difficult for beginners to understand. The SQL-Tutor program tutors students by assisting them through a number of database questions from four different databases. A student model is kept for each student based on query constraints (each constraint represents a part of the query that is necessary to answer the question). Each time a particular query constraint is used, SQL-Tutor records whether it was used successfully or unsuccessfully. In this way a model of a student's strengths and weaknesses is generated, and SQL-Tutor can select questions which reinforce problem areas or introduce new query concepts.
• INTELLECT: It was from Trinzic (formed by the merger of AICorp and Aion). This system is based on experience from ROBOT.
• BBN's PARLANCE: It was built on experience from the development of the Rus and Irus systems.
• ENGLISH WIZARD: It was developed by Linguistic Technology Corporation. The company was founded by the author of AICorp's original INTELLECT.
2.5 Various Approaches Used for Development of NLDBI Systems
Natural language is a topic of interest from the computational viewpoint because of the implicit ambiguity that language possesses. Several researchers have applied different techniques to deal with language.
2.5.1 Symbolic Approach (Rule Based Approach)
Natural Language Processing appears to be a strongly symbolic activity. Words are symbols that stand for objects and concepts in the real world, and they are put together into sentences that obey well-specified grammar rules. Hence, for several decades Natural Language Processing research was dominated by the symbolic approach: knowledge about language is explicitly encoded in rules or other forms of representation. Language is analysed at various levels to obtain information, and rules are then applied to this information to achieve linguistic functionality. As human language capability includes rule-based reasoning, it is supported well by symbolic processing, in which rules are formed for every level of linguistic analysis and the meaning of the language is captured on the basis of these rules.
2.5.2 Empirical Approach (Corpus Based Approach)
Empirical approaches are based on statistical and other data-driven analyses of raw data in the form of text corpora. A corpus is a collection of machine-readable text. The approach has been around since NLP began in the early 1950s, but only in the last decade or so has empirical NLP emerged as a major alternative to rationalist rule-based Natural Language Processing. Corpora are primarily used as a source of information about language, and a number of techniques have emerged to enable the analysis of corpus data. Syntactic analysis can be achieved on the basis of statistical probabilities estimated from a training corpus, and lexical ambiguities can be resolved by considering the likelihood of one or another interpretation in context. Recent research in computational linguistics indicates that empirical or corpus-based methods are currently the most promising approach for developing robust, efficient Natural Language Processing (NLP) systems. These methods automate the acquisition of much of the complex knowledge required for NLP by training on suitably annotated natural language corpora, e.g. treebanks of parsed sentences. Most empirical NLP methods employ statistical techniques such as n-gram models, Hidden Markov Models (HMMs), and Probabilistic Context-Free Grammars (PCFGs). Given the successes of empirical NLP methods, researchers have recently begun to apply learning methods to the construction of information extraction systems. Several different symbolic and statistical methods have been employed, but most of them are used to generate one part of a larger information extraction system. Researchers have also experimented with n-gram based language modelling, claiming a language-independent approach to IR and Natural Language Processing.
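As a minimal, self-contained illustration of one of the statistical techniques mentioned above, the toy snippet below (not part of the report's system) estimates bigram probabilities from a tiny corpus:

```python
from collections import Counter

# Tiny toy corpus of movie questions, invented for illustration.
corpus = [
    "who directed titanic",
    "who produced avatar",
    "who directed avatar",
]

# Count unigrams and bigrams over the corpus.
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def p(word, prev):
    """Maximum-likelihood bigram probability P(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

# "who" is followed by "directed" in 2 of its 3 occurrences.
print(p("directed", "who"))
```

Real systems smooth these counts and combine them with richer models (HMMs, PCFGs), but the principle of estimating probabilities from corpus counts is the same.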
2.5.3 Connectionist Approach (Using Neural Network)
Since human language capability is based on neural networks in the brain, Artificial Neural Networks (also called connectionist networks) provide a natural starting point for modelling language processing. In recent years the field of connectionist processing has seen remarkable development. The sub-symbolic neural network approach holds a lot of promise for modelling the cognitive foundations of natural language processing. Instead of symbols, the approach is based on distributed representations that correspond to statistical regularities in language. There has been significant research applying neural-network methods to language processing. However, there has been relatively little recent language research using sub-symbolic learning, although some recent systems have successfully employed decision trees, transformation rules, and other symbolic methods. SHRUTI is a neurally inspired system for event modelling and temporal processing at a connectionist level.
Chapter 3
Proposed Work
3.1 Problem Definition
A Natural Language Interface to Database system provides an interface that helps the user query the database in his or her natural language. Query languages like SQL are very difficult for ordinary people to use, and they find them very hard to learn. So, to make database applications easy to use for people who do not know query languages, natural language interfaces to various databases have been developed.
3.2 Features
1. The GUI is very similar to a chat application, making it familiar to the user and hence easy to use.
2. The system provides keyword-based suggestions to help the user find an appropriate question: it filters the list of suggested questions based on the keywords entered.
3. It provides question-analysis information, i.e. the goal of the question and the constraints applied.
4. It shows the SQL query generated during translation, which may help the user learn or validate queries.
5. The system also provides an automatic spell check to correct misspelled words.
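The keyword-based suggestion feature can be sketched as a simple filter over the corpus of supported questions. The function below is a hypothetical Python illustration; the report's actual implementation (in Java with JavaCC) is not shown here:

```python
def suggest(keywords, questions):
    """Return the questions that contain every entered keyword (case-insensitive)."""
    words = [k.lower() for k in keywords.split()]
    return [q for q in questions
            if all(w in q.lower() for w in words)]

# Invented sample of the corpus of supported questions.
questions = [
    "Who directed the movie Titanic?",
    "What is the budget of the movie Avatar?",
    "Which movies were released in 2009?",
]

print(suggest("movie budget", questions))
# ['What is the budget of the movie Avatar?']
```

As the user types more keywords, the suggestion list narrows, guiding the user towards a question the system can answer.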
3.3 Goals
The goal of NLP, as stated above, is "to accomplish human-like language processing". The choice of the word 'processing' is deliberate and should not be replaced with 'understanding': although the field was originally referred to as Natural Language Understanding (NLU) in the early days of AI, it is well agreed today that while the goal of NLP is true NLU, that goal has not yet been accomplished. A full NLU system would be able to:
1. Paraphrase an input text
2. Translate the text into another language
3. Answer questions about the contents of the text
4. Draw inferences from the text
While NLP has made serious inroads into accomplishing goals 1 to 3, the fact that NLP systems cannot, of themselves, draw inferences from text means that NLU still remains the goal of NLP. There are also more practical goals for NLP, many related to the particular application for which it is utilized. For example, an NLP-based IR system has the goal of providing more precise, complete information in response to a user's real information need. The goal of the NLP system here is to represent the true meaning and intent of the user's query, which can be expressed as naturally in everyday language as if the user were speaking to a reference librarian. Also, the contents of the documents being searched are represented at all their levels of meaning, so that a true match between need and response can be found, no matter how either is expressed in its surface form. In this sense, the goal of natural language processing is to design and build computer systems that are able to analyse natural languages like German or English, and that generate their output in a natural language too; typical applications of NLP are information retrieval, language understanding, and text classification.
3.4 Objectives
The objectives of the system are:
1. Provide a set of questions that answers most of the user's queries.
2. Divide each question into two parts:
   (a) Goal: the data that the user wishes to find.
   (b) Condition: the constraints used to narrow down the results to the appropriate data.
3. Compile the question, translating the goals and conditions into a syntactically correct and database-relevant query.
4. Retrieve the answers and present them in a well-formatted way.
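For example, for the question "What is the budget of the movie Titanic?", the goal is the budget attribute and the condition is movie_name = 'Titanic'. A translator along the lines of objective 3 can then assemble the SQL query; the sketch below is a hypothetical simplification, not the report's JavaCC-based translator:

```python
def compile_query(goal, conditions, table="movie"):
    """Translate a goal and a set of conditions into a SQL SELECT statement."""
    where = " AND ".join(f"{col} = '{val}'" for col, val in conditions.items())
    return f"SELECT {goal} FROM {table} WHERE {where}"

# "What is the budget of the movie Titanic?"
print(compile_query("budget", {"movie_name": "Titanic"}))
# SELECT budget FROM movie WHERE movie_name = 'Titanic'
```

A production translator would also quote values safely (e.g. with bound parameters) and validate the goal and condition columns against the database schema.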
3.5 Constraints
The constraints of the system are:
1. It uses a corpus-based approach; hence the system is not capable of answering any question outside the corpus.
2. The natural-language interface is designed for a specific database, so any change in the structure of the database will need a lot of time to be translated into the interface.
3. The system answers one question at a time, i.e. the question must contain only one goal.
Chapter 4
Project Design
4.1 SRS - Software Tools
The SRS tools used are as follows:
• NetBeans IDE
• JavaCC plug-in compatible with NetBeans (for compiler implementation in Java)
• MySQL Server 5.5
• Stanford Parser
4.2 Hardware Tools
The hardware tools required are as follows:
• Microsoft Windows XP Professional SP3 / Vista SP1 / Windows 7 Professional
• Processor: 2.6 GHz Intel Pentium IV or equivalent
• Memory: 2 GB
• Disk space: 1 GB of free disk space
4.3 System Architecture
Figure 4.1: System Architecture
4.4 UML Diagrams
4.4.1 Use Case Diagram
Figure 4.2: Use Case Diagram
4.4.2 Sequence Diagram
Figure 4.3: Sequence Diagram
4.4.3 Activity Diagram
Figure 4.4: Activity Diagram
4.5 High Level Design
4.5.1 Module Tree
Figure 4.5: Module Tree
4.5.2 Database Design
Movie Table
This table contains a list of movie names and other details related to the movies, such as the genre of the movie, its rating, budget, language, and release date. As it is the main table, it references the actor, director, and producer tables by using actor_id, director_id, and producer_id as foreign keys. This is done to normalize the database to 3NF.

Movie
Parameter name    Parameter type
movie_name        varchar(50)
director_id       int(11)
producer_id       int(11)
actor_id          int(11)
genre             varchar(50)
rating            int(11)
budget            varchar(50)
language          varchar(50)
release_date      varchar(50)
Table 4.1: Movie Table
Actor Table
This table contains a list of all the actors that have acted in the movies stored in the Movie Table. The primary key of this table is actor_id.

Actor
Parameter name    Parameter type
actor_id          int(11)
actor_name        varchar(50)
gender            varchar(10)
Table 4.2: Actor Table
Director Table
This table contains a list of all the directors that have directed the movies stored in the Movie Table. The primary key of this table is director_id.

Director
Parameter name    Parameter type
director_id       int(11)
director_name     varchar(50)
Table 4.3: Director Table
Producer Table
This table contains a list of all the producers that have produced the movies stored in the Movie Table. The primary key of this table is producer_id.

Producer
Parameter name    Parameter type
producer_id       int(11)
producer_name     varchar(50)
Table 4.4: Producer Table
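The four tables above can be created, for instance, with DDL along the following lines. This is shown in SQLite syntax for a self-contained illustration; the report uses MySQL 5.5, and the underscored column names are an assumption:

```python
import sqlite3

# Create the movie database schema described in Tables 4.1-4.4.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE actor    (actor_id    INTEGER PRIMARY KEY, actor_name TEXT, gender TEXT);
    CREATE TABLE director (director_id INTEGER PRIMARY KEY, director_name TEXT);
    CREATE TABLE producer (producer_id INTEGER PRIMARY KEY, producer_name TEXT);
    CREATE TABLE movie (
        movie_name   TEXT,
        director_id  INTEGER REFERENCES director(director_id),
        producer_id  INTEGER REFERENCES producer(producer_id),
        actor_id     INTEGER REFERENCES actor(actor_id),
        genre        TEXT,
        rating       INTEGER,
        budget       TEXT,
        language     TEXT,
        release_date TEXT
    );
""")

tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['actor', 'director', 'movie', 'producer']
```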
4.6 Screenshots
Figure 4.6: NLDBI screenshot, interfaced with the movie database and designed to answer questions related to movies
Chapter 5
Test Design & Report
5.1 Test Design for NLDBI
The tests have been divided into two parts:
1. Input/Output testing
(a) Test for wrong questions
(b) Test for irrelevant input
(c) Test for correct questions
(d) Test for no input
(e) Test for data not present
2. Suggestion list testing
(a) Test for no keywords
(b) Test for relevant keywords
(c) Test for irrelevant keywords
Test Description
Input/Output testing
This test is designed to check how the system responds to the data given as input. The main output, the SQL query, and the goal and conditions generated by the system are considered in the testing.
Suggestion list testing
This test is designed to check whether the suggestion utility works as intended, as this utility plays an important role in making the system easy to use and is a key feature of the system.
5.2 Test Report
The test report contains screen shots and the description of each test with the observed
output. The report is a proof that the system has passes all the intended tested under
the specified constraints.
5.2.1 Input/Output testing
1. Test for wrong questions
Figure 5.1: System behaviour for wrong question inputs
Description: Designed to test the system for wrong questions, or questions outside the domain.

Expected output -> Actual output:
- The system prompts the user that the query asked cannot be answered -> System prompts "I didn't understand your question"
- The system should not generate any goal -> The system does not generate any goal
- The system should not generate any condition -> The system does not generate any condition
- The system should not generate any query -> The system does not generate any query
Passed: Yes

Table 5.1: Test report for wrong question
2. Test for irrelevant input
Figure 5.2: System behaviour for irrelevant inputs
Description: Designed to test the system for irrelevant input.

Expected output -> Actual output:
- The system prompts the user that the query asked cannot be answered -> System prompts "I didn't understand your question"
- The system should not generate any goal -> The system does not generate any goal
- The system should not generate any condition -> The system does not generate any condition
- The system should not generate any query -> The system does not generate any query
Passed: Yes

Table 5.2: Test report for irrelevant input
3. Test for correct questions
Figure 5.3: System behaviour for correct questions
Description: Designed to test the system for correct questions that are within
its answering domain.

Expected output: The system answers the question with correct data.
Actual output:   System answers the question with the right data.

Expected output: The system should generate the correct goal.
Actual output:   The system generates the right goal.

Expected output: The system should generate correct conditions.
Actual output:   The system generates the right conditions.

Expected output: The system should generate the correct query.
Actual output:   The system generates the right query.

Passed: Yes

Table 5.3: Test report for correct questions
4. Test for no input
Figure 5.4: System behaviour for no input or null input
Description: Designed to test the system for no or null input.

Expected output: The system should not react to null input.
Actual output:   System does not respond.

Expected output: The system should not generate any goal.
Actual output:   The system does not generate any goal.

Expected output: The system should not generate any condition.
Actual output:   The system does not generate any condition.

Expected output: The system should not generate any query.
Actual output:   The system does not generate any query.

Passed: Yes

Table 5.4: Test report for no input
5. Test for data not present
Figure 5.5: System behaviour if data not present in database
Description: Designed to test the system when it does not have the data to
answer the question.

Expected output: The system prompts the user that the data needed to answer the
question is not present.
Actual output:   System prompts "Data Not Found".

Expected output: The system should generate the correct goal.
Actual output:   The system generates the right goal.

Expected output: The system should generate correct conditions.
Actual output:   The system generates the right conditions.

Expected output: The system should generate the correct query.
Actual output:   The system generates the right query.

Passed: Yes

Table 5.5: Test report for data not present
5.2.2 Suggestion list testing
1. Test for no keywords
Figure 5.6: System’s suggestion list behaviour for no keyword inputs
Description: Designed to test the suggestion utility for no keywords.

Expected output: The system should not display the suggestion list.
Actual output:   System reacts as expected by not displaying the suggestion list.

Passed: Yes

Table 5.6: Test report for no keywords
2. Test for relevant keywords
Figure 5.7: System’s suggestion list behaviour for relevant keyword inputs
Description: Designed to test the suggestion utility for correct keywords.

Expected output: The system should display the suggestion list with correct
suggestions.
Actual output:   System displays the suggestion list with helpful suggestions.

Passed: Yes

Table 5.7: Test report for relevant keywords
3. Test for irrelevant keywords
Figure 5.8: System’s suggestion list behaviour for irrelevant inputs
Description: Designed to test the suggestion utility for wrong keywords.

Expected output: The system should not display the suggestion list.
Actual output:   System reacts as expected by not displaying the suggestion list.

Passed: Yes

Table 5.8: Test report for irrelevant keywords
Chapter 6
Implementation and Coding
6.1 List of questions answered by the system
This is the list of questions answered by the NLDBI system:
6.1.1 Who questions
1. Who is the director of ”[movie name]” ?
2. Who is the producer of ”[movie name]” ?
3. Who is the actor of ”[movie name]” ?
4. Who is the director of the movie ”[movie name]” ?
5. Who is the producer of the movie ”[movie name]” ?
6. Who is the actor of the movie ”[movie name]” ?
7. Who directed ”[movie name]” ?
8. Who produced ”[movie name]” ?
6.1.2 What questions
1. What is the genre of ”[movie name]” ?
2. What is the rating of ”[movie name]” ?
3. What is the cast of ”[movie name]” ?
4. What is the release date of ”[movie name]” ?
5. What is the language of ”[movie name]” ?
6. What is the duration of ”[movie name]” ?
7. What is the budget of ”[movie name]” ?
8. What are the movies directed by ”[director name]” ?
9. What are the movies produced by ”[producer name]” ?
10. What are the movies acted by ”[actor name]” ?
11. What are the movies directed by ”[director name]” and produced by ”[producer
name]” ?
12. What are the movies produced by ”[producer name]” and directed by ”[director
name]” ?
13. What are the movies directed by ”[director name]” and acted by ”[actor name]” ?
14. What are the movies acted by ”[actor name]” and directed by ”[director name]” ?
15. What are the movies acted by ”[actor name]” and produced by ”[producer name]”
?
16. What are the movies produced by ”[producer name]” and acted by ”[actor name]”
?
17. What are the movies directed by ”[director name]” and produced by ”[producer
name]” and acted by ”[actor name]” ?
18. What are the movies directed by ”[director name]” and acted by ”[actor name]”
and produced by ”[producer name]” ?
19. What are the movies produced by ”[producer name]” and directed by ”[director
name]” and acted by ”[actor name]” ?
20. What are the movies produced by ”[producer name]” and acted by ”[actor name]”
and directed by ”[director name]” ?
21. What are the movies acted by ”[actor name]” and produced by ”[producer name]”
and directed by ”[director name]” ?
22. What are the movies acted by ”[actor name]” and directed by ”[director name]”
and produced by ”[producer name]” ?
23. What are the movies directed by ”[director name]” in ”[year]” ?
24. What are the movies produced by ”[producer name]” in ”[year]” ?
25. What are the movies acted by ”[actor name]” in ”[year]” ?
26. What are the movies directed by ”[director name]” and produced by ”[producer
name]” in ”[year]” ?
27. What are the movies produced by ”[producer name]” and directed by ”[director
name]” in ”[year]” ?
28. What are the movies directed by ”[director name]” and acted by ”[actor name]” in
”[year]” ?
29. What are the movies acted by ”[actor name]” and directed by ”[director name]” in
”[year]” ?
30. What are the movies acted by ”[actor name]” and produced by ”[producer name]”
in ”[year]” ?
31. What are the movies produced by ”[producer name]” and acted by ”[actor name]”
in ”[year]”?
32. What are the movies directed by ”[director name]” and produced by ”[producer
name]” and acted by ”[actor name]” in ”[year]” ?
33. What are the movies directed by ”[director name]” and acted by ”[actor name]”
and produced by ”[producer name]” in ”[year]” ?
34. What are the movies produced by ”[producer name]” and directed by ”[director
name]” and acted by ”[actor name]” in ”[year]” ?
35. What are the movies produced by ”[producer name]” and acted by ”[actor name]”
and directed by ”[director name]” in ”[year]” ?
36. What are the movies acted by ”[actor name]” and produced by ”[producer name]”
and directed by ”[director name]” in ”[year]” ?
37. What are the movies acted by ”[actor name]” and directed by ”[director name]”
and produced by ”[producer name]” in ”[year]” ?
38. What are the ”[genre]” movies released in ”[year]” ?
39. What are the ”[genre]” movies directed by ”[director name]” ?
40. What are the ”[genre]” movies produced by ”[producer name]” ?
41. What are the ”[genre]” movies acted by ”[actor name]” ?
42. What are the ”[genre]” movies directed by ”[director name]” and produced by
”[producer name]” ?
43. What are the ”[genre]” movies produced by ”[producer name]” and directed by
”[director name]” ?
44. What are the ”[genre]” movies directed by ”[director name]” and acted by ”[actor
name]” ?
45. What are the ”[genre]” movies acted by ”[actor name]” and directed by ”[director
name]” ?
46. What are the ”[genre]” movies acted by ”[actor name]” and produced by ”[producer
name]” ?
47. What are the ”[genre]” movies produced by ”[producer name]” and acted by ”[actor
name]” ?
48. What are the ”[genre]” movies directed by ”[director name]” and produced by
”[producer name]” and acted by ”[actor name]” ?
49. What are the ”[genre]” movies directed by ”[director name]” and acted by ”[actor
name]” and produced by ”[producer name]” ?
50. What are the ”[genre]” movies produced by ”[producer name]” and directed by
”[director name]” and acted by ”[actor name]” ?
51. What are the ”[genre]” movies produced by ”[producer name]” and acted by ”[actor
name]” and directed by ”[director name]” ?
52. What are the ”[genre]” movies acted by ”[actor name]” and produced by ”[producer
name]” and directed by ”[director name]” ?
53. What are the ”[genre]” movies acted by ”[actor name]” and directed by ”[director
name]” and produced by ”[producer name]” ?
54. What are the ”[genre]” movies directed by ”[director name]” in ”[year]” ?
55. What are the ”[genre]” movies produced by ”[producer name]” in ”[year]” ?
56. What are the ”[genre]” movies acted by ”[actor name]” in ”[year]” ?
57. What are the ”[genre]” movies directed by ”[director name]” and produced by
”[producer name]” in ”[year]” ?
58. What are the ”[genre]” movies produced by ”[producer name]” and directed by
”[director name]” in ”[year]” ?
59. What are the ”[genre]” movies directed by ”[director name]” and acted by ”[actor
name]” in ”[year]” ?
60. What are the ”[genre]” movies acted by ”[actor name]” and directed by ”[director
name]” in ”[year]” ?
61. What are the ”[genre]” movies acted by ”[actor name]” and produced by ”[producer
name]” in ”[year]” ?
62. What are the ”[genre]” movies produced by ”[producer name]” and acted by ”[actor
name]” in ”[year]”?
63. What are the ”[genre]” movies directed by ”[director name]” and produced by
”[producer name]” and acted by ”[actor name]” in ”[year]” ?
64. What are the ”[genre]” movies directed by ”[director name]” and acted by ”[actor
name]” and produced by ”[producer name]” in ”[year]” ?
65. What are the ”[genre]” movies produced by ”[producer name]” and directed by
”[director name]” and acted by ”[actor name]” in ”[year]” ?
66. What are the ”[genre]” movies produced by ”[producer name]” and acted by ”[actor
name]” and directed by ”[director name]” in ”[year]” ?
67. What are the ”[genre]” movies acted by ”[actor name]” and produced by ”[producer
name]” and directed by ”[director name]” in ”[year]” ?
68. What are the ”[genre]” movies acted by ”[actor name]” and directed by ”[director
name]” and produced by ”[producer name]” in ”[year]” ?
6.1.3 Which questions
1. Which ”[genre]” movies released in ”[year]” ?
2. Which movies are directed by ”[director name]” ?
3. Which movies are produced by ”[producer name]” ?
4. Which movies are acted by ”[actor name]” ?
5. Which movies are directed by ”[director name]” and produced by ”[producer name]”
?
6. Which movies are produced by ”[producer name]” and directed by ”[director name]”
?
7. Which movies are directed by ”[director name]” and acted by ”[actor name]” ?
8. Which movies are acted by ”[actor name]” and directed by ”[director name]” ?
9. Which movies are acted by ”[actor name]” and produced by ”[producer name]” ?
10. Which movies are produced by ”[producer name]” and acted by ”[actor name]” ?
11. Which movies are directed by ”[director name]” and produced by ”[producer name]”
and acted by ”[actor name]” ?
12. Which movies are directed by ”[director name]” and acted by ”[actor name]” and
produced by ”[producer name]” ?
13. Which movies are produced by ”[producer name]” and directed by ”[director name]”
and acted by ”[actor name]” ?
14. Which movies are produced by ”[producer name]” and acted by ”[actor name]” and
directed by ”[director name]” ?
15. Which movies are acted by ”[actor name]” and produced by ”[producer name]” and
directed by ”[director name]” ?
16. Which movies are acted by ”[actor name]” and directed by ”[director name]” and
produced by ”[producer name]” ?
17. Which movies are directed by ”[director name]” in ”[year]” ?
18. Which movies are produced by ”[producer name]” in ”[year]” ?
19. Which movies are acted by ”[actor name]” in ”[year]” ?
20. Which movies are directed by ”[director name]” and produced by ”[producer name]”
in ”[year]” ?
21. Which movies are produced by ”[producer name]” and directed by ”[director name]”
in ”[year]” ?
22. Which movies are directed by ”[director name]” and acted by ”[actor name]” in
”[year]” ?
23. Which movies are acted by ”[actor name]” and directed by ”[director name]” in
”[year]” ?
24. Which movies are acted by ”[actor name]” and produced by ”[producer name]” in
”[year]” ?
25. Which movies are produced by ”[producer name]” and acted by ”[actor name]” in
”[year]”?
26. Which movies are directed by ”[director name]” and produced by ”[producer name]”
and acted by ”[actor name]” in ”[year]” ?
27. Which movies are directed by ”[director name]” and acted by ”[actor name]” and
produced by ”[producer name]” in ”[year]” ?
28. Which movies are produced by ”[producer name]” and directed by ”[director name]”
and acted by ”[actor name]” in ”[year]” ?
29. Which movies are produced by ”[producer name]” and acted by ”[actor name]” and
directed by ”[director name]” in ”[year]” ?
30. Which movies are acted by ”[actor name]” and produced by ”[producer name]” and
directed by ”[director name]” in ”[year]” ?
31. Which movies are acted by ”[actor name]” and directed by ”[director name]” and
produced by ”[producer name]” in ”[year]” ?
32. Which ”[genre]” movies are released in ”[year]” ?
33. Which ”[genre]” movies are directed by ”[director name]” ?
34. Which ”[genre]” movies are produced by ”[producer name]” ?
35. Which ”[genre]” movies are acted by ”[actor name]” ?
36. Which ”[genre]” movies are directed by ”[director name]” and produced by ”[pro-
ducer name]” ?
37. Which ”[genre]” movies are produced by ”[producer name]” and directed by ”[di-
rector name]” ?
38. Which ”[genre]” movies are directed by ”[director name]” and acted by ”[actor
name]” ?
39. Which ”[genre]” movies are acted by ”[actor name]” and directed by ”[director
name]” ?
40. Which ”[genre]” movies are acted by ”[actor name]” and produced by ”[producer
name]” ?
41. Which ”[genre]” movies are produced by ”[producer name]” and acted by ”[actor
name]” ?
42. Which ”[genre]” movies are directed by ”[director name]” and produced by ”[pro-
ducer name]” and acted by ”[actor name]” ?
43. Which ”[genre]” movies are directed by ”[director name]” and acted by ”[actor
name]” and produced by ”[producer name]” ?
44. Which ”[genre]” movies are produced by ”[producer name]” and directed by ”[di-
rector name]” and acted by ”[actor name]” ?
45. Which ”[genre]” movies are produced by ”[producer name]” and acted by ”[actor
name]” and directed by ”[director name]” ?
46. Which ”[genre]” movies are acted by ”[actor name]” and produced by ”[producer
name]” and directed by ”[director name]” ?
47. Which ”[genre]” movies are acted by ”[actor name]” and directed by ”[director
name]” and produced by ”[producer name]” ?
48. Which ”[genre]” movies are directed by ”[director name]” in ”[year]” ?
49. Which ”[genre]” movies are produced by ”[producer name]” in ”[year]” ?
50. Which ”[genre]” movies are acted by ”[actor name]” in ”[year]” ?
51. Which ”[genre]” movies are directed by ”[director name]” and produced by ”[pro-
ducer name]” in ”[year]” ?
52. Which ”[genre]” movies are produced by ”[producer name]” and directed by ”[di-
rector name]” in ”[year]” ?
53. Which ”[genre]” movies are directed by ”[director name]” and acted by ”[actor
name]” in ”[year]” ?
54. Which ”[genre]” movies are acted by ”[actor name]” and directed by ”[director
name]” in ”[year]” ?
55. Which ”[genre]” movies are acted by ”[actor name]” and produced by ”[producer
name]” in ”[year]” ?
56. Which ”[genre]” movies are produced by ”[producer name]” and acted by ”[actor
name]” in ”[year]”?
57. Which ”[genre]” movies are directed by ”[director name]” and produced by ”[producer
name]” and acted by ”[actor name]” in ”[year]” ?
58. Which ”[genre]” movies are directed by ”[director name]” and acted by ”[actor
name]” and produced by ”[producer name]” in ”[year]” ?
59. Which ”[genre]” movies are produced by ”[producer name]” and directed by ”[di-
rector name]” and acted by ”[actor name]” in ”[year]” ?
60. Which ”[genre]” movies are produced by ”[producer name]” and acted by ”[actor
name]” and directed by ”[director name]” in ”[year]” ?
61. Which ”[genre]” movies are acted by ”[actor name]” and produced by ”[producer
name]” and directed by ”[director name]” in ”[year]” ?
62. Which ”[genre]” movies are acted by ”[actor name]” and directed by ”[director
name]” and produced by ”[producer name]” in ”[year]” ?
6.1.4 When question
1. When did ”[movie name]” release ?
6.2 Parse Trees
These parse trees are generated by capturing the commonalities in the list of questions.
Figure 6.1: Who questions class one
Figure 6.2: Who questions class two
Figure 6.3: Who questions class three
Figure 6.4: What questions class one
Figure 6.5: What questions class two
Figure 6.6: What questions class three
Figure 6.7: What questions class four
6.3 Context Free Grammar
The context-free grammar is the heart of the compiler: it translates the
natural-language question into an SQL query through lexical analysis, syntax
analysis and semantic analysis.
6.3.1 LEX specification:
The LEX specification defines all the tokens used by the compiler. Each word
entered by the user first goes through lexical analysis; in this step the
compiler classifies words according to the regular expressions specified below
and passes the resulting tokens to the later phases of the compiler.
%{
#include "y.tab.h"
#include <stdio.h>
#include <string.h>
%}
%%
[ \t]                    ;
[?]                      { yylval.s = strdup(yytext); return QS; }
directed|acted|produced  { yylval.s = strdup(yytext); return VBD; }
released                 { yylval.s = strdup(yytext); return VBN; }
who|what                 { yylval.s = strdup(yytext); return WP; }
is                       { yylval.s = strdup(yytext); return VBZ; }
the                      { yylval.s = strdup(yytext); return DT; }
horror|comedy|action|drama|suspence|thriller|sci-fi { yylval.s = strdup(yytext); return JJ; }
and                      { yylval.s = strdup(yytext); return CC; }
which                    { yylval.s = strdup(yytext); return WDT; }
when                     { yylval.s = strdup(yytext); return WRB; }
did                      { yylval.s = strdup(yytext); return VBDX; }
director|producer|actor|actress|movie|release-date|genre|rating|cast|language|duration|budget|movies|year { yylval.s = strdup(yytext); return NN; }
release                  { yylval.s = strdup(yytext); return VB; }
are                      { yylval.s = strdup(yytext); return VBP; }
of|by|in                 { yylval.s = strdup(yytext); return IN; }
[0-9]+                   { if (atoi(yytext) > 1950 && atoi(yytext) < 2012) { yylval.s = strdup(yytext); return CD; } }
[a-zA-Z]+                { yylval.s = strdup(yytext); return NNP; }
%%
int yywrap()
{
    return 1;
}
6.3.2 YACC Specification:
%{
#include <stdio.h>
#include <string.h>

int mflag = 0, i = 0;
char goal[5][100];       /* goal table      */
char condition[5][100];  /* condition table */
char test[5][100];       /* test-value table, parallel to condition */
int gp = 0, cp = 0;

void addcon(char x[100]);
void addgoal(char x[100]);
void addtest(char x[100]);
%}
%union
{
    int y;
    char *s;
}
%token <s> QS
%token <s> VBD
%token <s> WP
%token <s> NN
%token <s> NNP
%token <s> VBZ
%token <s> DT
%token <s> IN
%token <s> VBP
%token <s> CD
%token <s> CC
%token <s> VBN
%token <s> JJ
%token <s> WDT
%token <s> VBDX
%token <s> VB
%token <s> WRB
%%
START : SBAR QS {
            printf("=============goal==============\n");
            for (i = 0; i < gp; i++)
                printf("%s\n", goal[i]);
            printf("=============conn==============\n");
            for (i = 0; i < cp; i++)
                printf("%s = \t %s\n", condition[i], test[i]);
        } ;
SBAR : WHNP SBARQ | WHADVP SQ ;
S : NPW VPJJ | NPW VBP VP ;
NPW : JJtemp NNtemp | NNtemp ;
SBARQ : S | SQ ;
WHADVP : WRB { addgoal("release-date"); addcon("moviename"); } ;
WHNP : WP | WDT ;
SQ : VBZ NPZ PP | VBDtemp NPZ | VBP NPX | VBDX NNPtemp VB ;
NPX : NPZ VP | NPJJ VPJJ | NPJJ VP ;
VPX : VBDtemp PP PPX ;
VP : VP CC VPX | VPX ;
PPX : | PP ;
NPZ : DT NNtemp | DT NNtemp NNPtemp | NNPtemp | CD { addcon("release-date"); addtest($1); } ;
NPJJ : DT JJtemp NNtemp ;
VPJJ : VBN PP ;
JJtemp : JJ { addcon("genre"); addtest($1); } ;
PP : IN NPZ ;
VBDtemp : VBD {
            if (mflag == 0) {
                /* the verb names the goal of the question */
                if (strcmp($1, "directed") == 0) addgoal("director");
                if (strcmp($1, "produced") == 0) addgoal("producer");
                if (strcmp($1, "acted") == 0)    addgoal("actor");
                addcon("moviename");
            } else {
                /* after "movies" has been seen, the verb names a condition */
                if (strcmp($1, "directed") == 0) addcon("director");
                if (strcmp($1, "produced") == 0) addcon("producer");
                if (strcmp($1, "acted") == 0)    addcon("actor");
            }
        } ;
NNtemp : NN {
            if (strcmp($1, "movie") == 0) ;
            if (strcmp($1, "year") == 0) { addgoal("release-date"); addcon("moviename"); }
            if (strcmp($1, "movies") == 0) { mflag = 1; addgoal("moviename"); }
            else { addgoal($1); addcon("moviename"); }
        } ;
NNPtemp : NNP { addtest($1); } ;
%%
int main()
{
    yyparse();
    return 0;
}
void yyerror(char *s1)
{
    printf("%s", s1);
}
void addcon(char x[100])
{
    strcpy(condition[cp++], x);
}
void addgoal(char x[100])
{
    strcpy(goal[gp++], x);
}
void addtest(char x[100])
{
    strcpy(test[cp - 1], x);
}
Chapter 7
Schedule of work
Figure 7.1: Gantt chart for preliminary research
Figure 7.2: Gantt chart for initial system design
Figure 7.3: Gantt chart for final development phase
Chapter 8
Conclusion and Future Scope
8.1 Conclusion
The Natural Language Interface to Database system, built for movie enquiry over
a movie database, accepts a query in natural language and translates it into an
SQL query using a compiler tailored for this purpose. The SQL query is then
executed on the database to produce output for the user. The major steps in
this system are:
- Tokenizing the input query and checking whether all the tokens are relevant.
After analysing the query, the system divides it into goals and conditions and
uses this information to form an appropriate SQL query.
- Executing the SQL query on the database to retrieve the required information,
and presenting the output to the user in a well-formatted manner.
8.2 Future Scope
In future, the system can be extended to perform the following tasks:
- The system can make use of the internet, using web search to find relevant data.
- The system could make use of natural language processing techniques to
understand any natural-language question.
Bibliography
[1] Avinash J. Agrawal and O. G. Kakde, "Corpus Based Context Free Grammar
Design for Natural Language Interface to Database", ISSN (Online): 1694-0814.
[2] A. Aho and J. Ullman, "Principles of Compiler Design", Narosa Publications,
2002.
[3] I. Androutsopoulos, G. D. Ritchie, and P. Thanisch, "Natural Language
Interfaces to Databases - An Introduction", Journal of Natural Language
Engineering, Part 1, 1995, pp. 29-81.
[4] "The Stanford Parser (statistical parser)",
http://nlp.stanford.edu/software/lex-parser.
[5] S. Ruwanpura (2000), "SQ-HAL: Natural Language to SQL Translator",
http://www.csse.monash.edu.au/hons/projects/2000/Supun.Ruwanpura, Monash
University.
[6] Elaine Rich and Kevin Knight, "Artificial Intelligence", McGraw-Hill
Companies Inc., 2006, Chapter 15, pp. 377-426.
[7] D. Sleator and D. Temperley (1991), "Parsing English with a Link Grammar",
Carnegie Mellon University Computer Science Technical Report, October 1991.
[8] A. El-Mouadib, Zakaria Suliman Zubi, Ahmed A. Almagrous, and I. El-Feghi,
"Interactive Natural Language Interface", WSEAS Transactions on Computers,
Issue 4, Volume 8, 2009, pp. 5-9.
[9] Tanveer Siddiqui and U. S. Tiwary, "Natural Language Processing and
Information Retrieval", Oxford University Press, 2008.
[10] K. S. Suh and W. C. Perkins, "The effects of a system echo in a restricted
natural language database interface for novice users", in IEEE System Sciences,
4, pp. 594-599, 1994.