CHEMISTRY STUDIO: AN INTELLIGENT TUTORING SYSTEM
Ankit Kumar, Abhishek Kar, Ashish Gupta, Akshay Mittal
Mentors:
Dr. Sumit Gulwani (MSR, Redmond)
Dr. Ashish Tiwari (SRI Intl.)
Dr. Amey Karkare (IIT Kanpur)
Introduction
Aim: build an intelligent tutoring system for the Periodic Table domain (Chemistry)
Solves problems by emulating the thought processes and lines of reasoning employed by students
Much more than a problem solver: aids learning by generating hints and intelligent problems
System Overview
System divided into two components:
• Natural Language Component: translates natural language input to an intermediate logical representation
• Problem Solving Component: solves problems, generates hints and new problems of graded difficulty (more info: Problem Solving team)
Natural Language Component
[Pipeline diagram. Stages: Lexer, Option Parsing, Parser Tier 1, Parser Tier 2. Labels: Input Problem, Tokens, Terms in logic, Domain information, Full logical representation]
An Example - Lexer
Input problem: Which element in group 2 has the maximum metallic property? i) Be ii) Mg iii) Ca iv) Sr
Question text: Which element in Group 2 has the maximum metallic character?
Remaining input at successive lexer steps:
• Group 2 has the maximum metallic character?
• 2 has the maximum metallic character?
• maximum metallic character?
• metallic character?
Tokens produced: Group 2, Max, MetallicProperty
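The lexing step in this example can be sketched as a greedy longest-match scan over a phrase table. The table entries and the `lex` function are illustrative assumptions, not the system's actual lexicon; words without an entry (such as "which", "element", "has") are simply skipped.

```python
import re

# Hypothetical phrase table: surface phrase -> token name (assumed for illustration).
PHRASES = {
    ("maximum",): "Max",
    ("metallic", "character"): "MetallicProperty",
    ("metallic", "property"): "MetallicProperty",
}

def lex(sentence):
    """Scan left to right, emitting the longest phrase that matches at each position."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    longest = max(len(phrase) for phrase in PHRASES)
    tokens = []
    i = 0
    while i < len(words):
        # "group N" becomes a single token carrying the group number.
        if words[i] == "group" and i + 1 < len(words) and words[i + 1].isdigit():
            tokens.append(f"Group {words[i + 1]}")
            i += 2
            continue
        for n in range(longest, 0, -1):
            key = tuple(words[i:i + n])
            if key in PHRASES:
                tokens.append(PHRASES[key])
                i += n
                break
        else:
            i += 1  # no entry for this word: skip it
    return tokens

print(lex("Which element in Group 2 has the maximum metallic character?"))
# ['Group 2', 'Max', 'MetallicProperty']
```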
Parsing Tier 2
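[Tree diagrams. Before hole filling: Max with two Hole children; Same with children Group 2 and Hole. After hole filling: Max with children MetallicProperty and Same; Same with children Group 2 and $1; MetallicProperty applied to $1]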
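A minimal sketch of typed hole filling, under stated assumptions: the type signatures, the `INSERTABLE` list (which lets the parser add `Same` when no input token fits) and the greedy first-match strategy are all hypothetical simplifications. The slides indicate that the real system generates several candidate trees and ranks them.

```python
# Assumed signatures: token -> (argument types, result type).
SIGNATURES = {
    "Max": (["prop", "bool"], "elem"),
    "Same": (["group", "elem"], "bool"),
    "MetallicProperty": (["elem"], "prop"),
    "Group 2": ([], "group"),
}
INSERTABLE = ["Same"]  # assumed: tokens the parser may add when no input token fits

def fill(token, unused):
    """Fill each hole of `token` with a token whose result type matches the hole."""
    arg_types, _ = SIGNATURES[token]
    args = []
    for wanted in arg_types:
        match = next((t for t in unused if SIGNATURES[t][1] == wanted), None)
        if match is not None:
            unused.remove(match)
        else:
            match = next((t for t in INSERTABLE if SIGNATURES[t][1] == wanted), None)
        if match is not None:
            args.append(fill(match, unused))
        elif wanted == "elem":
            args.append("$1")  # nothing fits an elem hole: use a free variable
        else:
            args.append("Hole")  # hole stays unfilled
    return f"{token}({', '.join(args)})" if args else token

def parse(tokens, root):
    unused = [t for t in tokens if t != root]
    return fill(root, unused)

print(parse(["Group 2", "Max", "MetallicProperty"], "Max"))
# Max(MetallicProperty($1), Same(Group 2, $1))
```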
Introduction of Variables
Free variables needed to formulate a valid logical formula are introduced implicitly.
Example: Alkali metals belong to Group 1
The system must intelligently guess when a variable is required. Two situations:
• A hole (of type elem) is present and no token in the unused list satisfies it (even after replication)
• A hole (of type elem) is present, no tokens are left in the unused list, and no replicated original token satisfies it
In both cases: introduce a new variable!
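The two situations reduce to one check: no unused token and no replicated original token can fill the elem hole. A small sketch, where the `RESULT_TYPE` table and the function name are assumptions made for illustration:

```python
# Assumed result types for a few tokens (illustrative only).
RESULT_TYPE = {"AlkaliMetal": "bool", "Group 1": "group", "Na": "elem"}

def needs_new_variable(hole_type, unused, originals):
    """True when a hole of type elem cannot be satisfied by any available token."""
    if hole_type != "elem":
        return False

    def fits(tokens):
        return any(RESULT_TYPE[t] == hole_type for t in tokens)

    # Situation 1: unused tokens exist but none fits, even after replication.
    # Situation 2: no unused tokens, and no replicated original token fits.
    return not fits(unused) and not fits(originals)

# "Alkali metals belong to Group 1": no token denotes an element.
print(needs_new_variable("elem", ["Group 1"], ["AlkaliMetal", "Group 1"]))  # True
```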
Handling Quantifiers
Universal quantifiers: general scheme ∀x: A(x) ⇒ B(x)
Existential quantifiers: general scheme ∃x: A(x) ∧ B(x)
Assumptions:
• Quantification over a single variable
• No nesting of quantifiers
Universal Quantification
Problems:
• Finding the position of the implication
• Finding the antecedent and consequent
Example: Alkali metals show metallic character
Solution: ForAll($1, Implies(AlkaliMetal($1), Metallic($1)))
Position of implication ≈ position of verb
Deciding the antecedent and consequent is more complicated
ForAll Resolution Algorithm
Active vs. passive voice (Stanford CoreNLP):
• Alkali metals show metallic character
• Metallic character is shown by alkali metals
Both have the same translation!
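A rough sketch of the idea: split the sentence at the verb, then swap the two sides when the sentence is passive so that the grammatical agent becomes the antecedent. The verb phrase and the passive flag are supplied by hand here (the slides obtain voice information from Stanford CoreNLP), and the `LEXICON` mapping is a hypothetical stand-in.

```python
# Hypothetical phrase -> predicate mapping (assumed for illustration).
LEXICON = {"alkali metals": "AlkaliMetal", "metallic character": "Metallic"}

def forall_formula(sentence, verb_phrase, passive):
    """Build ForAll($1, Implies(A($1), B($1))) with the implication placed at the verb."""
    left, right = (part.strip().lower()
                   for part in sentence.rstrip(".").split(verb_phrase))
    # In passive voice the agent follows the verb, so the two sides swap roles.
    antecedent, consequent = (right, left) if passive else (left, right)
    return (f"ForAll($1, Implies({LEXICON[antecedent]}($1), "
            f"{LEXICON[consequent]}($1)))")

active = forall_formula("Alkali metals show metallic character", " show ", passive=False)
passive = forall_formula("Metallic character is shown by alkali metals",
                         " is shown by ", passive=True)
print(active)
print(active == passive)
```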
Assertion Based Questions
Assert facts, then pose questions; may span multiple sentences
Example: An element A forms covalent bond with oxygen. It has high electronegativity and belongs to group 13. What is its atomic number?
Problem: anaphora resolution!
Solution: use Stanford CoreNLP to get the coreference graph
Assertion Based Questions
Method for translating assertion based questions:
• Construct the logical formula corresponding to each sentence independently
• Use the coreference graph to find variables referring to the same entity
• Construct the formula A1(x) ∧ A2(x) ∧ … ∧ An(x), where Ai(x) is the logical formula of the i-th sentence
• Quantify over the free variable(s)
Questions typically ask about a single entity, so existential quantification suffices
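The method above can be sketched as follows. Each sentence formula is a template with its own variable name, and the coreference clusters are given by hand in place of the CoreNLP coreference graph. Predicate names such as `CovalentBondWith` and `HighElectronegativity` are hypothetical, and the final question sentence is left out.

```python
def merge_assertions(formulas, clusters):
    """Unify coreferent variables, conjoin the sentence formulas, then quantify.

    formulas: one template per sentence, e.g. "P({s1})"
    clusters: groups of variable names that refer to the same entity
    """
    rename = {}
    for n, cluster in enumerate(clusters, start=1):
        for var in cluster:
            rename[var] = f"${n}"
    bodies = [f.format(**rename) for f in formulas]
    conj = bodies[0]
    for body in bodies[1:]:
        conj = f"And({conj}, {body})"
    # Existential quantification over every variable that remains free.
    for var in sorted(set(rename.values()), reverse=True):
        conj = f"Exists({var}, {conj})"
    return conj

formulas = [
    "CovalentBondWith({s1}, Oxygen)",
    "And(HighElectronegativity({s2}), Same(Group 13, {s2}))",
]
print(merge_assertions(formulas, clusters=[["s1", "s2"]]))
```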
Negations
Non-: Which of the following non-metals is a gas at STP?
• Couple non with the predicate immediately next to it
• And(IsGasAtSTP($1), Not(Metallic($1)))
Not: Not all alkali metals form basic oxides.
• Negation of the statement to the right of not
• Not(ForAll($1, Implies(AlkaliMetal($1), BasicOxide($1))))
Negations
No: No halogen is metallic in nature.
• Natural interpretation of no as “there does not exist”
• Not(Exists($1, And(Halogen($1), Metallic($1))))
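The three negation patterns can be written as small formula builders. The function names are illustrative assumptions; the outputs reproduce the formulas shown on the slides.

```python
def negate_non(negated_predicate, other_predicate):
    """non-: negate only the predicate that immediately follows 'non'."""
    return f"And({other_predicate}($1), Not({negated_predicate}($1)))"

def negate_not(statement):
    """not: negate the whole statement to the right of 'not'."""
    return f"Not({statement})"

def negate_no(first_predicate, second_predicate):
    """no: read as 'there does not exist'."""
    return f"Not(Exists($1, And({first_predicate}($1), {second_predicate}($1))))"

print(negate_non("Metallic", "IsGasAtSTP"))
print(negate_not("ForAll($1, Implies(AlkaliMetal($1), BasicOxide($1)))"))
print(negate_no("Halogen", "Metallic"))
```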
Ranking Algorithm
Need to rank the different representation trees generated
Heuristics:
• Greater cover, greater confidence
• Higher confidence for filling a hole with a token closer to its parent in the English sentence
Penalize when:
• Tokens are replicated: larger tokens, more penalization
• Handcrafted tokens are inserted: And, Or, Implies
• Tokens are left unused: greater proportion of unused tokens, more penalization
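One way to combine these heuristics is a single weighted score per candidate tree. The sketch below is an assumption throughout: the `Candidate` fields, the linear form and the weight values are made up for illustration and are not the weights used by the system.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    formula: str
    covered: int            # input tokens used in the tree
    total: int              # input tokens produced by the lexer
    hole_distances: list    # word distance between each filler and its parent
    replicated_sizes: list  # size (in words) of each replicated token
    handcrafted: int        # number of inserted And/Or/Implies tokens

def score(c, w_dist=0.1, w_rep=0.5, w_hand=0.5, w_unused=2.0):
    """Higher is better: reward cover, penalize the situations listed above."""
    cover = c.covered / c.total
    s = cover
    s -= w_dist * sum(c.hole_distances)   # fillers far from their parent
    s -= w_rep * sum(c.replicated_sizes)  # larger replicated tokens cost more
    s -= w_hand * c.handcrafted           # inserted handcrafted tokens
    s -= w_unused * (1 - cover)           # proportion of unused tokens
    return s

def rank(candidates):
    return sorted(candidates, key=score, reverse=True)

a = Candidate("tree A", covered=3, total=3, hole_distances=[1, 2],
              replicated_sizes=[], handcrafted=0)
b = Candidate("tree B", covered=2, total=3, hole_distances=[1],
              replicated_sizes=[2], handcrafted=1)
print([c.formula for c in rank([b, a])])
```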
Evaluation
Currently able to solve 70 of the 126 problems collected from the Tata McGraw Hill textbook for Grade XI
More problems can be solved by modeling of more chemistry-specific predicates.
This just corresponds to adding domain knowledge to our system
Another evaluation metric could be the ratio of the number of rules encoded to the size of the corpus of problems solved.
We encode 173 predicates/entities/functions in our algorithm (out of which 118 are names of elements).
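For reference, the figures above can be turned into ratios with a few lines of arithmetic. The variable names are illustrative, and treating the 173 encoded predicates/entities/functions as the "rules encoded" is an assumption about how the proposed metric would be computed.

```python
solved, collected = 70, 126
encoded, element_names = 173, 118

solve_rate = solved / collected            # fraction of collected problems solved
non_element = encoded - element_names      # entries that are not element names
entries_per_solved = encoded / solved      # encoded entries per solved problem
non_element_per_solved = non_element / solved

print(f"solve rate: {solve_rate:.1%}")
print(f"non-element entries: {non_element}")
print(f"entries per solved problem: {entries_per_solved:.2f}")
print(f"non-element entries per solved problem: {non_element_per_solved:.2f}")
```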
Conclusions
While contemporary works focus on analyzing languages by learning, we hypothesize that for a simpler structured domain like Chemistry, a much simpler type-theoretic approach armed with some heuristics observed from the domain can achieve similar, if not better, success.
During the later phase of the project, we tried to use some techniques of learning to improve upon our system and were successful in doing so.
In conclusion, we feel that a combination of such a type-theoretic approach and the standard machine learning techniques can achieve good success for a well structured domain like Chemistry.
Future Work
• Disambiguate At, As, In (names of elements)
• 1 = 1st = first (Stanford CoreNLP NER tool)
• And(And(x,y),z) = And(And(x,z),y)
• Model electronic configuration
• Better modelling of conjunctions, e.g. “Alkali metals belong to group 1 and are metallic in nature”
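Recognizing that And(And(x,y),z) and And(And(x,z),y) are the same formula could be handled by normalizing conjunctions before comparison. The sketch below flattens nested `And` nodes and sorts their arguments; the tuple representation of formulas is an assumption made for illustration.

```python
def flatten_and(formula):
    """Collect the conjuncts of arbitrarily nested And nodes into a flat list."""
    if isinstance(formula, tuple) and formula[0] == "And":
        parts = []
        for arg in formula[1:]:
            parts.extend(flatten_and(arg))
        return parts
    return [formula]

def canonical(formula):
    """Return a normal form in which conjunctions are flat and sorted."""
    if not isinstance(formula, tuple):
        return formula
    if formula[0] == "And":
        parts = sorted((canonical(p) for p in flatten_and(formula)), key=repr)
        return ("And", *parts)
    return (formula[0], *(canonical(arg) for arg in formula[1:]))

left = ("And", ("And", "x", "y"), "z")
right = ("And", ("And", "x", "z"), "y")
print(canonical(left) == canonical(right))  # True
```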
Stanford CoreNLP
Collection of commonly used NLP tools – POS tagging, parsing, coreference analysis, NER
Problem: integrating a Java package with C#
Command line interface is slow: it needs to load large data models (17 secs per question!)
Solution: query the online demo and get an XML response
http://nlp.stanford.edu:8080/corenlp/
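Once an XML response is available, coreference chains can be read from it with a standard XML parser. The sketch below makes no network request; the `SAMPLE` document is a hand-written, simplified assumption about the shape of the response, not actual CoreNLP output, and the real element layout should be checked against a live response.

```python
import xml.etree.ElementTree as ET

# Hand-written sample (assumed shape, heavily simplified).
SAMPLE = """<root><document>
  <coreference>
    <coreference>
      <mention representative="true"><sentence>1</sentence><text>An element A</text></mention>
      <mention><sentence>2</sentence><text>It</text></mention>
      <mention><sentence>3</sentence><text>its</text></mention>
    </coreference>
  </coreference>
</document></root>"""

def coref_chains(xml_text):
    """Return one list of (sentence number, mention text) pairs per chain."""
    root = ET.fromstring(xml_text)
    chains = []
    for chain in root.findall("./document/coreference/coreference"):
        chains.append([(int(m.findtext("sentence")), m.findtext("text"))
                       for m in chain.findall("mention")])
    return chains

print(coref_chains(SAMPLE))
```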