reading comprehension on squad - stanford … · reading comprehension on squad zachary taylor,...
TRANSCRIPT
Reading Comprehension on SQuAD
Zachary Taylor, Brexton Pham, Meghana Rao
Stanford University, Department of Computer Science CS224n Winter 2017
Codalab username: ztaylor
Abstract
We implement three models to answer questions in the SQuAD dataset based on their context paragraphs. We compare their results. Our first model consists of a
question encoding using a bidirectional LSTM, a context encoding using an LSTM and taking the question into account, a straightforward attention
mechanism, and a linear decoding. Our second model is an extension of this first model to include Dynamic Coattention. Our third model is also our simplest, and
consists of separate question and context encodings using LSTMs and a dot product decoding of each context hidden state with the question representation. Initial evaluations demonstrated overfitting on the training data. The team chose
to focus on debugging and understanding errors with a barebones baseline model. Ultimately, the model achieved a 24.25 F1 score and a14.01 exact match
score.
1 Introduction 1.1 The Task
The objective of this research is to build a question-answering machine comprehension system that performs on the SQuAD dataset. In the SQuAD task, answering a question is defined as predicting an answer span within a given context paragraph. The goal is to predict an answer span tuple {as, ae} given a question of length n, q = {q1, q2, ..., qn}, and a supporting context paragraph p = {p1, p2, ..., pm} of length m. Therefore, the model must comprehend the question and reason through the passage to identify the correct answer span. The model learns a function that, given a pair of sequences (q, p) returns a sequence of two scalar indices {as, ae} indicating the start position and end position of the answer in paragraph p, respectively. Note that in this task as ≤ ae, and 0 ≤ as, ae ≤ m. Previously, the task of machine comprehension was difficult to attempt due to datasets limited in size, but the SQuAD dataset enables researchers to now build end-to-end models. It is over double the size of other current machine comprehension datasets and all questions are hand-written by humans, requiring different forms of reasoning to answer.
To approach the task, a baseline neural network model will be implemented. The architecture of the baseline model consists of feeding the context paragraph and question into an LSTM, and for each hidden state in our context representation, computing a simple dot product against the final question state to compute the probability that any index was the start or end index. After implementing the baseline model, the objective is to implement more complex models that are inspired from current state-of-the-art research. The first advanced model to implement involves utilizing a dynamic coattention network. The model also incorporates a bidirectional LSTMs and relies on linear decoding.
1.2 Background/Related Work Wang et al from the IBM Watson Research Center built a Multi-Perspective Context
Matching (MPCM) model to approach the task of machine comprehension which predicts answer beginnings and endings in context paragraphs. The model multiplies each word embedding vector in the paragraph by a relevancy weight computed against a question and then uses a bi-directional LSTM to encode the question and passage. The model compares the context at each point in the paragraph against the question from multiple perspectives to find the beginning and ending point of the answer (Wang et al., 2016). A team of researchers from Salesforce Research implemented a dynamic coattention network which overcomes limitations that occur in a single-pass model. In their research, they implemented co-dependent representations of both the context paragraphs and the questions and then iterate over all potential answer spans using a dynamic pointing decoder. The co-dependent representations are computed by making an affinity matrix that contains affinity scores for each document word and each question word which they then fuse with temporal information using an LSTM. Their ensemble method achieved an F1 score of 80.4% on the SQuAD dataset (Socher, 2017). Jiang and Wang from Singapore Management University tackled the machine comprehension challenge by implementing a model that relies on a match-LSTM and an answer pointer. In the match-LSTM model, two sentences are fed in, where one is the premise and the other is the hypothesis. The model checks the words of the hypothesis against the words of the premise and creates a vector representation at each token. An attention mechanism is used to obtain the weighted vector representation which is then combined with the other previously constructed representation and fed into an LSTM. The answer pointer selects a position from the input text as an output symbol instead of picking an output token from a given fixed vocabulary. They achieved a 77% F1 score on the SQuAD dataset (Jiang, 2016).
2 Approach Data Preprocessing:
All data preprocessing was completed in train.py. The validation dataset was built by reading through the validation answers, spans, and context files. The training dataset was built by reading through and compiling the training answers, spans, context paragraphs. Each dataset also consisted of question and context paragraph masks and placeholders for the start and end indices of the answer. Padded token vectors were appended to the ends of the context paragraphs and questions that did not align in length in the dataset. Initialization of the model also occurs in train.py and is where previous saved versions of the model can be evaluated.
Baseline Model:
The baseline model was implemented in qa_model.py. In the baseline model, the questions and context are both encoded using a basic LSTM. In the encoder class, the basic LSTM cell is initialized and the outputs are created by fully dynamically unrolling the inputs using a dynamic RNN cell. The decoder class takes in a knowledge representation and outputs a probability estimation over all paragraph tokens, which helps determine which token should be the start or end of the answer spam. The knowledge representation takes in q and X, our final question hidden state and our context hidden states. The class QASystem is then initialized and is passed in the encoded question and context as well as the decoder object, embedding path and question and context masks. In the QASystem class, the placeholders for the start and end answer indices, the questions, the question masks, the context paragraphs, and the context paragraph masks are made. The probability distributions for the start and end answer are initialized.
Within QASystem, the embeddings, system, and loss are also initialized. The embeddings are initialized using the glove pretrained embeddings (we use 100 dimensional glove vectors). The loss is calculated using the softmax cross entropy loss with logits, and is the sum of the loss over the start and end placeholders. The loss is optimized using the Adam optimizer.
In the main training loop of the model, minibatches of random samples from the dataset of a preset size are assembled. Within each epoch, new minibatches are made until the entire training dataset has been covered. The inputs, and start and end answer labels are optimized. After
each epoch, the loss on the validation set is calculated. To obtain the predicted start and end indices, the answer function is called, which calls a separate decode function. In decode, the probability distribution over different positions in the paragraph are returned. In the answer evaluation portion of the model, a random subsample of the validation dataset is read into the model. The F1 and exact match scores are initialized. The data is broken into segments of the size of the batch size in order to fit the placeholders initialized with the system setup.
For each sample, the string of the predicted answer is obtained by indexing into the context paragraph and grabbing the words that correspond to the predicted span of the answer. The true answer is obtained from indexing into the trained answers file. Both the predicted and true answer are passed into the F1 and EM functions and averaged.
Advnaced Model Our most advanced model incorporated dynamic coattention (see “Dynamic Coattention Networks for Question Answering”) with our previous structure of bidirectional LSTMs and co-question-context representations. We calculate attention over the question and context representations simultaneously by first calculating an affinity matrix between them, then we introduce some nonlinearities (we used softmax). We then calculate context summaries, taking into account each word of the context, by multiplying the original question representation concatenated with the combined question representation after the softmax by the post-softmax paragraph representation. We then do a similar set of calculations to get question summaries. We then use linear transformations with learnable weights to feed into our decoder. 3 Experiments To improve our model, we experimented with a wide variety of different techniques and hyperparameters. We had the goal of optimizing both our F1 score and our exact match (EM) score, which represent (roughly) how many of the words we predict as being in the answer actually are in the answer, as well as how many answers we get exactly right. When we initially implemented a baseline, we saw near-zero performance in terms of EM score on our validate set. We were able to overfit a small training sample, and even to overfit the overall training sample relatively well, but we weren’t seeing performance on the validate set even after 7 epochs:
Model F1 Score EM Score
Bidirectional Encoding w/ Attention 3.02 0.02
Partly in the attempt to fix this problem and partly in the attempt to implement a better model, we implemented a version of attention very similar to the Dynamic Coattention mechanism outlined in the paper “Dynamic Coattention Networks for Question Answering” in our references. After debugging our evaluate script and running this network on our validation set, however, we experienced similar problems:
Model F1 Score EM Score
Dynamic Coattention 3.42 0.01
Looking deeper into these results showed that our both of our models just weren’t learning anything generalizable. For example, here are some answers that our model predicted compared
with the correct answers (there appears to be little correlation, and in fact, the lengths are consistently much longer than the correct answers):
Predicted Answer Correct Answer
The Sultanate of Ifat , led by the Walashma dynasty with its the Red Sea
Europe regularly for holidays . In 1889 Ireland
Highland Quichuas living in the valleys of the Sierra region . Primarily consisting
Inca
Kingdom of Great Britain and Northern Ireland and its interests and Great Britain and Northern Ireland
As the following loss chart (loss during training) for our Dynamic Coattention model shows, we were fitting our train data at least somewhat (the loss decreases a little bit), but the test performance was not good. We suspect that there was a bug in our model somewhere.
At this point in the project, we had invested incredible amounts of time but we were still seeing no results and had limited time remaining, so we decided to experiment with an extremely simple model in order to get some limited performance. To this end, we designed our simplest baseline model that fed the question and context through separate LSTMs and computed dot products as scores for both the start and the end index. This allowed us to train epochs extremely quickly compared to before (~30 minutes vs 3 hours), which in turn allowed us to debug our model more quickly and get the framework correct at a simple level. One experimental change that ended up helping us was to decrease the learning rate. By using a learning rate that started at .001 instead of .01 (the Adam Optimizer theoretically changes this, but it still effected our learning), our simple model seemed to make more progress in terms of loss after many epochs:
Our simple model, as seen in the graph above, achieved a sustained lower loss than the other models (it levelled out at around 6.5). This wasn’t sure proof that we had fixed our problem, but we were delighted to find that this model actually was doing much better than our previous ones on the validation set. It achieved the following performance on the small subset we were predicting on:
Model F1 Score EM Score
Separate Encodings 22.51 9.96
We then dug deeper to find out why the model was performing better. We looked at examples such as the following:
Predicted Answer Correct Answer
hunting no natural predators
claims religious bigotry
March Bonus March
1639 1639
Catholic true apostolic succession
Toronto Toronto
system Jawed
inventor universities
The examples above show a couple answers that our models predicts correctly. Given the time crunch, we were not able to get our more advanced models to surpass this performance, but we did tune hyperparameters of our simple model (learning rate, batch size, hidden state size, etc) in order to find a good balance of speed, which allowed for more training, and flexibility. This eventually led to our FINAL TEST SET RESULTS (on codalab):
Model F1 Score EM Score
Separate Encodings 24.25 14.01
In the analysis section, we will discuss what the drawbacks and the advantages of our model were, and what specific types of questions it answers poorly and well.
Evaluation:
Each model was trained for 10 epochs, and each epoch consisted of 2544 minibatches. Every 500 minibatches, the model was saved at its current timestamp in the .tmp directory in order to be evaluated. The validation cost on the validation dataset was calculated after each epoch using softmax cross entropy loss with logits. To evaluate the model, the F1 score and exact match score were calculated, two metrics introduced in the original SQuAD paper. Exact match measures the percentage of the predicted answers that match the exact ground truth answers. The F1 score is the average overlap between the ground truth and prediction, and is calculated by multiplying the precision and recall scores by two and dividing by their sum. The F1 score and exact match score were calculated as the average of the scores over the number of samples evaluated.
4 Analysis Our prediction result on dev-v1.1.json resulted in performance metrics of 23.85 and 13.556 for F1 and EM, respectively. One crucial characteristic of our model that was both a help and a hindrance to the model’s performance, based on the examples in the experiments section, was its limitation to predicting one word answers. Since we used the same decoding for the start and end labels, each start and end label was the same. This meant that the model got none of the answers correct that were more than one word, which is an obvious and large drawback. However, it also didn’t have the downside that our earlier models had of predicting spans that were too long in length. In fact, since there were a fair amount of one-word answers in the dataset, the model was able to get about 13% of the questions correct (13.556 EM score), and it also managed to sometimes pick a word that was part of the answer even if it didn’t capture the whole thing (23.85 F1). Another interesting phenomena we noticed was that our model was better at predicting certain types of one word answers than others. For example, if the correct answer was a name or a year, it was more likely to guess that answer correctly. We hypothesize that this is because the frequency of overall answers that are proper nouns or years is higher than other types of words, and the model learned to differentiate those. Also, there are usually clear identifiers in the question that would let the model know that it is looking for a certain category of noun (‘when’, ‘where’, ‘who’, etc). This probably implies that our simple model is not utilizing context as well as it could, but does take into account the likelihood based on word type. For example, consider the following visualization of the question and context paragraph that led to one of the answers from before:
Visualization
Question: Where is the Center for New Religions located ?
Context Paragraph: Robert S. Wood has argued that the United States is a model for the world in terms of how a separation of church and state—no state-run or state-established church—is good for both the church and the state , allowing a variety of religions to flourish . Speaking at the Toronto-based Center for New Religions , Wood said that the freedom of conscience and assembly allowed under such a system has led to a " remarkable religiosity " in the United States that is n't present in other industrialized nations . Wood believes that the U.S. operates on " a sort of civic religion , " which includes a generally-shared belief in a creator who " expects better of us . " Beyond that , individuals are free to decide how they want to believe and fill in their own
creeds and express their conscience . He calls this approach the " genius of religious sentiment in the United States . "
Predicted Answer, Correct Answer: Toronto, Toronto
Analysis: Our suspicion is that the level that our simple model is operating at is one where it didn’t necessarily take the word “based” into account, as we were not using a bidirectional LSTM, making it hard to capture that relationship because the word “based” came later in the sentence. We would only capture forward relationships. Rather, our model probably managed to get the right answer by accurately representing the key word “located” as part of the question representation, and learning that the word “located” corresponds to location answers, such as Toronto. Being the only clear city name in the paragraph, it chose it in this case, though the model would likely not be able to differentiate very well between multiple city options if they were present (we saw this is in some other examples).
5 Conclusion
After completing this project, we have gained perspective on the challenges involved with implementing a successful neural network for the SQuAD reading comprehension task. We ultimately demonstrated a functioning baseline model after investing into implementing more complex models that had long training durations and overfit the training data. Initial successes on the training data with more complex models did not perform well on the validation set and thus catalyzed a series of simplifications and debugging procedures to scale back the model to a leaner and more iterable scale. In retrospect, we wish to have implemented the barebones model first before attempting to implement more ambitious models. Nevertheless, we at least managed to get some positive performance eventually, and we learned an incredible amount about implementing neural network models for NLP. Also, we thought that even 13% of correct answers was a worthy accomplishment on such a difficult machine comprehension task. For possible future directions, an obvious area of improvement would be changing the model to be able to predicted longer-than-one-word answers. This would involve careful experimentation and modification of the one model that we managed to get some meaningful performance out of. We could then graduate to implementing more advanced attention mechanisms. Potentially, we could learn what about our more advanced models didn’t manage to translate to the evaluation set, and fix that element. In terms of other advanced models that we are interested in, answer pointers seem like a potentially valuable next step to take if we got the other elements working. 6 Contributions Zach- Implemented the data preprocessing/reading into the model. Implemented the full baseline model, including read in, variable set up, loss, optimization, encoder/decoder, etc. Implemented the dynamic coattention model. Implemented the simple baseline model that achieved our group’s top performance. Implemented qa_answer.py to produce predictions for evaluation. Went through the codalab submission process and was responsible for producing results on the leaderboard. Wrote the “advanced models”, “experiments”, and “analysis” portions of the paper. Contributed to the poster. Meghana - Implemented evaluation answer, decode, and validation cost in baseline model. Monitored multiple training runs of the baseline model and worked on debugging the saving and reloading mechanism. Initialized validation dataset. Wrote the introduction, background, conclusion, and baseline approach of the paper and formatted the paper. Main person in charge of managing VM resources, etc. Formatted and contributed to the poster. Brexton - Also implemented evaluation answer, decode, and validation cost. Contributed to
debugging the saving and reloading mechanism. Implemented methods of validating and evaluating on larger batch sizes. Spent lots of time debugging the performance of the initial models. Ran multiple tests on the virtual environment of baseline and advanced models. Contributed to poster and background information. Main person in charge of previous work research.
7 References Chen, D., Bolton, J., Manning, C. 2016. Stanford Attentive Reader. A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task. arXiv preprint arXiv:1606.02858v2
Jiang, J., Wang, S. 2016. Match-LSTM with Answer-Pointer. arXiv preprint arXiv: 1608.07905
Lee et al., 2017. Recurrent Span Representation, RaSoR. arXiv preprint arXiv:1611.01436v2.
Seo et al., 2017. Bi-directional Attention Flow, BiDAF. arXiv preprint arXiv arXiv:1611.01603v5.
Socher, R., Xiong, C., Zhong, V. 2016. Dynamic Coattention Networks for Question Answering. arXiv preprint arXiv:1611.01604
Wang et al., 2016. Multi-perspective context matching for machine comprehension. arXiv preprint arXiv:1612.04211, 2016
Yang et al., 2016. Words or Characters? Fine-Grained Grating for Reading Comprehension. arXiv preprint arXiv:1611.01724v1.
Yu et al., 2016. Dynamic Chunk Reader. arXiv preprint arXiv:1610.09996v2.
8 Supplementary Materials t r a i n . p y f r o m _ _ f u t u r e _ _ i m p o r t a b s o l u t e _ i m p o r t f r o m _ _ f u t u r e _ _ i m p o r t d i v i s i o n f r o m _ _ f u t u r e _ _ i m p o r t p r i n t _ f u n c t i o n i m p o r t o s i m p o r t j s o n i m p o r t t e n s o r f l o w a s t f i m p o r t n u m p y a s n p f r o m q a _ m o d e l i m p o r t E n c o d e r , Q A S y s t e m , D e c o d e r f r o m o s . p a t h i m p o r t j o i n a s p j o i n i m p o r t l o g g i n g i m p o r t s y s l o g g i n g . b a s i c C o n f i g ( l e v e l = l o g g i n g . I N F O ) t f . a p p . f l a g s . D E F I N E _ f l o a t ( " t r a i n i n g _ c a p " , 3 2 , " T r a i n i n g c a p . " ) t f . a p p . f l a g s . D E F I N E _ f l o a t ( " l e a r n i n g _ r a t e " , 0 . 0 0 1 , " L e a r n i n g r a t e . " ) t f . a p p . f l a g s . D E F I N E _ f l o a t ( " m a x _ g r a d i e n t _ n o r m " , 2 0 . 0 , " C l i p g r a d i e n t s t o t h i s n o r m . " ) t f . a p p . f l a g s . D E F I N E _ f l o a t ( " d r o p o u t " , 0 . 1 5 , " F r a c t i o n o f u n i t s r a n d o m l y d r o p p e d o n n o n - r e c u r r e n t c o n n e c t i o n s . " ) t f . a p p . f l a g s . D E F I N E _ i n t e g e r ( " b a t c h _ s i z e " , 3 2 , " B a t c h s i z e t o u s e d u r i n g t r a i n i n g . " ) t f . a p p . f l a g s . D E F I N E _ i n t e g e r ( " e p o c h s " , 1 5 , " N u m b e r o f e p o c h s t o t r a i n . " )
# t f . a p p . f l a g s . D E F I N E _ i n t e g e r ( " s a m p l e S i z e " , 4 0 , " N u m b e r o f e p o c h s t o t r a i n . " ) # t h i s i s t h e n u m b e r o f s a m p l e s f e d i n t o e v a l u a t e a n s w e r t f . a p p . f l a g s . D E F I N E _ i n t e g e r ( " s t a t e _ s i z e " , 1 2 4 , " S i z e o f e a c h m o d e l l a y e r . " ) t f . a p p . f l a g s . D E F I N E _ s t r i n g ( " t r a i n _ d i r " , " t r a i n " , " T r a i n i n g d i r e c t o r y ( d e f a u l t : . / t r a i n ) . " ) t f . a p p . f l a g s . D E F I N E _ i n t e g e r ( " o u t p u t _ s i z e " , 4 0 0 , " T h e o u t p u t s i z e o f y o u r m o d e l . " ) t f . a p p . f l a g s . D E F I N E _ i n t e g e r ( " e m b e d d i n g _ s i z e " , 1 0 0 , " S i z e o f t h e p r e t r a i n e d v o c a b u l a r y . " ) t f . a p p . f l a g s . D E F I N E _ s t r i n g ( " d a t a _ d i r " , " d a t a / s q u a d " , " S Q u A D d i r e c t o r y ( d e f a u l t . / d a t a / s q u a d ) " ) t f . a p p . f l a g s . D E F I N E _ s t r i n g ( " l o a d _ t r a i n _ d i r " , " t r a i n " , " T r a i n i n g d i r e c t o r y t o l o a d m o d e l p a r a m e t e r s f r o m t o r e s u m e t r a i n i n g ( d e f a u l t : { t r a i n _ d i r } ) . " ) t f . a p p . f l a g s . D E F I N E _ s t r i n g ( " l o g _ d i r " , " l o g " , " P a t h t o s t o r e l o g a n d f l a g f i l e s ( d e f a u l t : . / l o g ) " ) t f . a p p . f l a g s . D E F I N E _ s t r i n g ( " o p t i m i z e r " , " a d a m " , " a d a m / s g d " ) t f . a p p . f l a g s . D E F I N E _ i n t e g e r ( " p r i n t _ e v e r y " , 1 , " H o w m a n y i t e r a t i o n s t o d o p e r p r i n t . " ) t f . a p p . f l a g s . D E F I N E _ i n t e g e r ( " k e e p " , 0 , " H o w m a n y c h e c k p o i n t s t o k e e p , 0 i n d i c a t e s k e e p a l l . " ) t f . a p p . f l a g s . D E F I N E _ s t r i n g ( " v o c a b _ p a t h " , " d a t a / s q u a d / v o c a b . d a t " , " P a t h t o v o c a b f i l e ( d e f a u l t : . / d a t a / s q u a d / v o c a b . d a t ) " ) t f . a p p . f l a g s . D E F I N E _ s t r i n g ( " e m b e d _ p a t h " , " " , " P a t h t o t h e t r i m m e d G L o V e e m b e d d i n g ( d e f a u l t : . / d a t a / s q u a d / g l o v e . t r i m m e d . { e m b e d d i n g _ s i z e } . n p z ) " ) F L A G S = t f . a p p . f l a g s . F L A G S d e f i n i t i a l i z e _ m o d e l ( s e s s i o n , m o d e l , t r a i n _ d i r ) : p r i n t ( ' t r a i n d i r ' , t r a i n _ d i r ) c k p t = t f . t r a i n . g e t _ c h e c k p o i n t _ s t a t e ( t r a i n _ d i r ) p r i n t ( ' c k p t ' , c k p t ) v 2 _ p a t h = c k p t . m o d e l _ c h e c k p o i n t _ p a t h + " . i n d e x " i f c k p t e l s e " " p r i n t ( ' v 2 _ p a t h ' , v 2 _ p a t h ) i f c k p t a n d ( t f . g f i l e . E x i s t s ( c k p t . m o d e l _ c h e c k p o i n t _ p a t h ) o r t f . g f i l e . E x i s t s ( v 2 _ p a t h ) ) : l o g g i n g . i n f o ( " R e a d i n g m o d e l p a r a m e t e r s f r o m % s " % c k p t . m o d e l _ c h e c k p o i n t _ p a t h ) p r i n t ( ' c k p t . m o d e l _ c h e c k p o i n t _ p a t h ' , c k p t . m o d e l _ c h e c k p o i n t _ p a t h ) m o d e l . s a v e r . r e s t o r e ( s e s s i o n , c k p t . m o d e l _ c h e c k p o i n t _ p a t h ) e l s e : l o g g i n g . i n f o ( " C r e a t e d m o d e l w i t h f r e s h p a r a m e t e r s . " ) s e s s i o n . r u n ( t f . g l o b a l _ v a r i a b l e s _ i n i t i a l i z e r ( ) ) l o g g i n g . i n f o ( ' N u m p a r a m s : % d ' % s u m ( v . g e t _ s h a p e ( ) . n u m _ e l e m e n t s ( ) f o r v i n t f . t r a i n a b l e _ v a r i a b l e s ( ) ) ) r e t u r n m o d e l d e f i n i t i a l i z e _ v o c a b ( v o c a b _ p a t h ) : i f t f . g f i l e . E x i s t s ( v o c a b _ p a t h ) : r e v _ v o c a b = [ ] w i t h t f . g f i l e . G F i l e ( v o c a b _ p a t h , m o d e = " r b " ) a s f : r e v _ v o c a b . e x t e n d ( f . r e a d l i n e s ( ) ) r e v _ v o c a b = [ l i n e . s t r i p ( ' \ n ' ) f o r l i n e i n r e v _ v o c a b ] v o c a b = d i c t ( [ ( x , y ) f o r ( y , x ) i n e n u m e r a t e ( r e v _ v o c a b ) ] ) r e t u r n v o c a b , r e v _ v o c a b e l s e : r a i s e V a l u e E r r o r ( " V o c a b u l a r y f i l e % s n o t f o u n d . " , v o c a b _ p a t h ) d e f g e t _ n o r m a l i z e d _ t r a i n _ d i r ( t r a i n _ d i r ) : " " "
A d d s s y m l i n k t o { t r a i n _ d i r } f r o m / t m p / c s 2 2 4 n - s q u a d - t r a i n t o c a n o n i c a l i z e t h e f i l e p a t h s s a v e d i n t h e c h e c k p o i n t . T h i s a l l o w s t h e m o d e l t o b e r e l o a d e d e v e n i f t h e l o c a t i o n o f t h e c h e c k p o i n t f i l e s h a s m o v e d , a l l o w i n g u s a g e w i t h C o d a L a b . T h i s m u s t b e d o n e o n b o t h t r a i n . p y a n d q a _ a n s w e r . p y i n o r d e r t o w o r k . " " " # g l o b a l _ t r a i n _ d i r = ' . / t r a i n / 2 0 1 7 0 3 1 6 _ 0 0 2 0 0 2 ' # g l o b a l _ t r a i n _ d i r = ' . / t r a i n / 2 0 1 7 0 3 1 6 _ 0 0 2 0 0 2 ' # g l o b a l _ t r a i n _ d i r = ' . / t r a i n / 2 0 1 7 0 3 1 6 _ 0 0 2 0 0 2 / m o d e l . w e i g h t s ' g l o b a l _ t r a i n _ d i r = " / t m p / c s 2 2 4 n - s q u a d - t r a i n " # g l o b a l _ t r a i n _ d i r = t r a i n _ d i r # o s . r e m o v e ( g l o b a l _ t r a i n _ d i r ) i f o s . p a t h . e x i s t s ( g l o b a l _ t r a i n _ d i r ) : o s . u n l i n k ( g l o b a l _ t r a i n _ d i r ) i f n o t o s . p a t h . e x i s t s ( t r a i n _ d i r ) : o s . m a k e d i r s ( t r a i n _ d i r ) o s . s y m l i n k ( o s . p a t h . a b s p a t h ( t r a i n _ d i r ) , g l o b a l _ t r a i n _ d i r ) r e t u r n g l o b a l _ t r a i n _ d i r d e f m a i n ( _ ) : e m b e d _ p a t h = F L A G S . e m b e d _ p a t h o r p j o i n ( " d a t a " , " s q u a d " , " g l o v e . t r i m m e d . { } . n p z " . f o r m a t ( F L A G S . e m b e d d i n g _ s i z e ) ) v o c a b _ p a t h = F L A G S . v o c a b _ p a t h o r p j o i n ( F L A G S . d a t a _ d i r , " v o c a b . d a t " ) v o c a b , r e v _ v o c a b = i n i t i a l i z e _ v o c a b ( v o c a b _ p a t h ) d e f p a d d e d _ t o k e n _ v e c t o r ( v e c t o r , m a x _ l e n g t h ) : r e t u r n [ v o c a b [ v e c t o r [ i ] ] i f i < l e n ( v e c t o r ) e l s e v o c a b [ ' < p a d > ' ] f o r i i n r a n g e ( m a x _ l e n g t h ) ] # D o w h a t y o u n e e d t o l o a d d a t a s e t s f r o m F L A G S . d a t a _ d i r # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # T R A I N D A T A S E T # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # q u e s t i o n s = [ ] q u e s t i o n _ m a s k = [ ] w i t h o p e n ( F L A G S . d a t a _ d i r + " / t r a i n . q u e s t i o n " ) a s f i l e : f o r l i n e i n f i l e : q = s t r . s p l i t ( l i n e ) q u e s t i o n _ m a s k . a p p e n d ( l e n ( q ) ) q u e s t i o n s . a p p e n d ( p a d d e d _ t o k e n _ v e c t o r ( q , 6 0 ) ) o v e r f l o w _ i n d i c e s = [ ] c o n t e x t = [ ] c o n t e x t _ m a s k = [ ] w i t h o p e n ( F L A G S . d a t a _ d i r + " / t r a i n . c o n t e x t " ) a s f i l e : # i = 0 f o r i , l i n e i n e n u m e r a t e ( f i l e ) : # i f i < 3 2 : c = s t r . s p l i t ( l i n e ) i f l e n ( c ) > F L A G S . o u t p u t _ s i z e : o v e r f l o w _ i n d i c e s . a p p e n d ( i ) c o n t e x t _ m a s k . a p p e n d ( l e n ( c ) ) c o n t e x t . a p p e n d ( p a d d e d _ t o k e n _ v e c t o r ( c , F L A G S . o u t p u t _ s i z e ) ) # i + = 1 c o n t e x t W o r d s = [ ] # g e t a l l t h e w o r d s f r o m e a c h c o n t e x t
w i t h o p e n ( F L A G S . d a t a _ d i r + " / t r a i n . c o n t e x t " ) a s f i l e : # i = 0 f o r l i n e i n f i l e : # i f i < 3 2 : n e w C o n t e x t = [ ] c = s t r . s p l i t ( l i n e ) f o r e a c h C o n t e x t i n c : w o r d = e a c h C o n t e x t . s p l i t ( " " ) n e w C o n t e x t . a p p e n d ( w o r d ) c o n t e x t W o r d s . a p p e n d ( n e w C o n t e x t ) # i + = 1 l a b e l s = [ ] l a b e l s _ s t a r t = [ ] l a b e l s _ e n d = [ ] w i t h o p e n ( F L A G S . d a t a _ d i r + " / t r a i n . s p a n " ) a s f i l e : l i n e N u m = 0 # i = 0 f o r l i n e i n f i l e : # i f i < 3 2 : s t a r t , e n d = [ i n t ( s ) f o r s i n s t r . s p l i t ( l i n e ) ] l a b e l s _ s t a r t . a p p e n d ( s t a r t ) l a b e l s _ e n d . a p p e n d ( e n d ) s t a r t _ w o r d = c o n t e x t W o r d s [ l i n e N u m ] [ s t a r t ] e n d _ w o r d = c o n t e x t W o r d s [ l i n e N u m ] [ e n d ] l i n e N u m = l i n e N u m + 1 # p r i n t ( s t a r t _ w o r d , e n d _ w o r d ) # s y s . e x i t ( ) # c r e a t e t h e s e t w o r a r r a y s s ) # i + = 1 t r a i n A n s w e r s = [ ] w i t h o p e n ( F L A G S . d a t a _ d i r + " / t r a i n . a n s w e r " ) a s f i l e : t r a i n A n s w e r s = f i l e . r e a d l i n e s ( ) t r a i n A n s w e r s = [ x . s t r i p ( ) f o r x i n t r a i n A n s w e r s ] # [ : 3 2 ] c o n t e x t P a r a g r a p h s = [ ] w i t h o p e n ( F L A G S . d a t a _ d i r + " / t r a i n . c o n t e x t " ) a s f i l e : c o n t e x t P a r a g r a p h s = f i l e . r e a d l i n e s ( ) c o n t e x t P a r a g r a p h s = [ x . s t r i p ( ) f o r x i n c o n t e x t P a r a g r a p h s ] # [ : 3 2 ] # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # V A L I D A T E D A T A S E T # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # v a l _ q u e s t i o n s = [ ] v a l _ q u e s t i o n _ m a s k = [ ] w i t h o p e n ( F L A G S . d a t a _ d i r + " / v a l . q u e s t i o n " ) a s f i l e : f o r l i n e i n f i l e : q = s t r . s p l i t ( l i n e ) v a l _ q u e s t i o n _ m a s k . a p p e n d ( l e n ( q ) ) v a l _ q u e s t i o n s . a p p e n d ( p a d d e d _ t o k e n _ v e c t o r ( q , 6 0 ) ) v a l _ o v e r f l o w _ i n d i c e s = [ ] v a l _ c o n t e x t = [ ] v a l _ c o n t e x t _ m a s k = [ ] w i t h o p e n ( F L A G S . d a t a _ d i r + " / v a l . c o n t e x t " ) a s f i l e : f o r i , l i n e i n e n u m e r a t e ( f i l e ) : c = s t r . s p l i t ( l i n e ) i f l e n ( c ) > F L A G S . o u t p u t _ s i z e : v a l _ o v e r f l o w _ i n d i c e s . a p p e n d ( i ) v a l _ c o n t e x t _ m a s k . a p p e n d ( l e n ( c ) ) v a l _ c o n t e x t . a p p e n d ( p a d d e d _ t o k e n _ v e c t o r ( c , F L A G S . o u t p u t _ s i z e ) )
v a l _ c o n t e x t W o r d s = [ ] # g e t a l l t h e w o r d s f r o m e a c h c o n t e x t w i t h o p e n ( F L A G S . d a t a _ d i r + " / v a l . c o n t e x t " ) a s f i l e : f o r l i n e i n f i l e : n e w C o n t e x t = [ ] c = s t r . s p l i t ( l i n e ) f o r e a c h C o n t e x t i n c : w o r d = e a c h C o n t e x t . s p l i t ( " " ) n e w C o n t e x t . a p p e n d ( w o r d ) v a l _ c o n t e x t W o r d s . a p p e n d ( n e w C o n t e x t ) v a l _ l a b e l s = [ ] v a l _ l a b e l s _ s t a r t = [ ] v a l _ l a b e l s _ e n d = [ ] w i t h o p e n ( F L A G S . d a t a _ d i r + " / v a l . s p a n " ) a s f i l e : l i n e N u m = 0 f o r l i n e i n f i l e : s t a r t , e n d = [ i n t ( s ) f o r s i n s t r . s p l i t ( l i n e ) ] v a l _ l a b e l s _ s t a r t . a p p e n d ( s t a r t ) v a l _ l a b e l s _ e n d . a p p e n d ( e n d ) v a l A n s w e r s = [ ] w i t h o p e n ( F L A G S . d a t a _ d i r + " / v a l . a n s w e r " ) a s f i l e : v a l A n s w e r s = f i l e . r e a d l i n e s ( ) v a l A n s w e r s = [ x . s t r i p ( ) f o r x i n v a l A n s w e r s ] # [ : 3 2 ] v a l P a r a g r a p h s = [ ] w i t h o p e n ( F L A G S . d a t a _ d i r + " / v a l . c o n t e x t " ) a s f i l e : v a l P a r a g r a p h s = f i l e . r e a d l i n e s ( ) v a l P a r a g r a p h s = [ x . s t r i p ( ) f o r x i n c o n t e x t P a r a g r a p h s ] # [ : 3 2 ] f o r i i n r e v e r s e d ( o v e r f l o w _ i n d i c e s ) : q u e s t i o n s . p o p ( i ) q u e s t i o n _ m a s k . p o p ( i ) c o n t e x t . p o p ( i ) c o n t e x t _ m a s k . p o p ( i ) c o n t e x t W o r d s . p o p ( i ) l a b e l s _ s t a r t . p o p ( i ) l a b e l s _ e n d . p o p ( i ) t r a i n A n s w e r s . p o p ( i ) c o n t e x t P a r a g r a p h s . p o p ( i ) # l a b e l s = [ ] # w i t h o p e n ( F L A G S . d a t a _ d i r + " / t r a i n . s p a n " ) a s f i l e : # f o r l i n e i n f i l e : # s t a r t , e n d = [ i n t ( s ) f o r s i n s t r . s p l i t ( l i n e ) ] # a = [ 1 i f i > = s t a r t a n d i < = e n d e l s e 0 f o r i i n r a n g e ( 7 6 6 ) ] # l a b e l s _ s t a r t . a p p e n d ( s t a r t ) # l a b e l s _ e n d . a p p e n d ( e n d ) # c r e a t e t h e s e t w o r a r r a y s s ) i n p u t s = ( q u e s t i o n s , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k ) v a l _ d a t a s e t = ( v a l _ q u e s t i o n s , v a l _ c o n t e x t , v a l _ q u e s t i o n _ m a s k , v a l _ c o n t e x t _ m a s k , v a l _ l a b e l s _ s t a r t , v a l _ l a b e l s _ e n d , v a l _ c o n t e x t W o r d s , v a l A n s w e r s , v a l P a r a g r a p h s ) # d a t a s e t = ( q u e s t i o n s , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k , l a b e l s ) d a t a s e t = ( q u e s t i o n s , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k , l a b e l s _ s t a r t , l a b e l s _ e n d , c o n t e x t W o r d s , t r a i n A n s w e r s , c o n t e x t P a r a g r a p h s )
q u e s t i o n _ e n c o d e r = E n c o d e r ( s i z e = F L A G S . s t a t e _ s i z e ) c o n t e x t _ e n c o d e r = E n c o d e r ( s i z e = F L A G S . s t a t e _ s i z e ) d e c o d e r = D e c o d e r ( o u t p u t _ s i z e = F L A G S . o u t p u t _ s i z e ) q a = Q A S y s t e m ( q u e s t i o n _ e n c o d e r , c o n t e x t _ e n c o d e r , d e c o d e r , e m b e d _ p a t h , q u e s t i o n _ m a s k , c o n t e x t _ m a s k ) i f n o t o s . p a t h . e x i s t s ( F L A G S . l o g _ d i r ) : o s . m a k e d i r s ( F L A G S . l o g _ d i r ) f i l e _ h a n d l e r = l o g g i n g . F i l e H a n d l e r ( p j o i n ( F L A G S . l o g _ d i r , " l o g . t x t " ) ) l o g g i n g . g e t L o g g e r ( ) . a d d H a n d l e r ( f i l e _ h a n d l e r ) p r i n t ( v a r s ( F L A G S ) ) w i t h o p e n ( o s . p a t h . j o i n ( F L A G S . l o g _ d i r , " f l a g s . j s o n " ) , ' w ' ) a s f o u t : j s o n . d u m p ( F L A G S . _ _ f l a g s , f o u t ) w i t h t f . S e s s i o n ( ) a s s e s s : # l o a d _ t r a i n _ d i r = g e t _ n o r m a l i z e d _ t r a i n _ d i r ( F L A G S . l o a d _ t r a i n _ d i r o r F L A G S . t r a i n _ d i r ) # l o a d _ t r a i n _ d i r = ' / t m p / c s 2 2 4 n - s q u a d - t r a i n / 2 0 1 7 0 3 1 9 _ 1 6 0 4 0 3 ' l o a d _ t r a i n _ d i r = ' . / t r a i n / 2 0 1 7 0 3 2 1 _ 1 1 0 7 2 3 ' i n i t i a l i z e _ m o d e l ( s e s s , q a , l o a d _ t r a i n _ d i r ) s a v e _ t r a i n _ d i r = g e t _ n o r m a l i z e d _ t r a i n _ d i r ( F L A G S . t r a i n _ d i r ) q a . t r a i n ( s e s s , d a t a s e t , s a v e _ t r a i n _ d i r , v a l _ d a t a s e t ) q a . e v a l u a t e _ a n s w e r ( s e s s , v a l _ d a t a s e t , l o g = T r u e ) i f _ _ n a m e _ _ = = " _ _ m a i n _ _ " : t f . a p p . r u n ( ) q a _ m o d e l . p y f r o m _ _ f u t u r e _ _ i m p o r t a b s o l u t e _ i m p o r t f r o m _ _ f u t u r e _ _ i m p o r t d i v i s i o n f r o m _ _ f u t u r e _ _ i m p o r t p r i n t _ f u n c t i o n i m p o r t t i m e i m p o r t p d b i m p o r t l o g g i n g i m p o r t d a t e t i m e i m p o r t o s i m p o r t r a n d o m i m p o r t s y s i m p o r t n u m p y a s n p f r o m s i x . m o v e s i m p o r t x r a n g e # p y l i n t : d i s a b l e = r e d e f i n e d - b u i l t i n i m p o r t t e n s o r f l o w a s t f f r o m t e n s o r f l o w . p y t h o n . o p s i m p o r t v a r i a b l e _ s c o p e a s v s f r o m e v a l u a t e i m p o r t e x a c t _ m a t c h _ s c o r e , f 1 _ s c o r e f r o m t e n s o r f l o w . p y t h o n . o p s . n n i m p o r t
s p a r s e _ s o f t m a x _ c r o s s _ e n t r o p y _ w i t h _ l o g i t s a s s s c e l o g g i n g . b a s i c C o n f i g ( l e v e l = l o g g i n g . I N F O ) d e f g e t _ o p t i m i z e r ( o p t ) : i f o p t = = " a d a m " : o p t f n = t f . t r a i n . A d a m O p t i m i z e r e l i f o p t = = " s g d " : o p t f n = t f . t r a i n . G r a d i e n t D e s c e n t O p t i m i z e r e l s e : a s s e r t ( F a l s e ) r e t u r n o p t f n c l a s s E n c o d e r ( o b j e c t ) : d e f _ _ i n i t _ _ ( s e l f , s i z e ) : s e l f . s i z e = s i z e s e l f . F L A G S = t f . a p p . f l a g s . F L A G S d e f e n c o d e ( s e l f , i n p u t s , m a s k s , s c o p e _ n a m e , e n c o d e r _ s t a t e _ i n p u t = N o n e ) : " " " I n a g e n e r a l i z e d e n c o d e f u n c t i o n , y o u p a s s i n y o u r i n p u t s , m a s k s , a n d a n i n i t i a l h i d d e n s t a t e i n p u t i n t o t h i s f u n c t i o n . : p a r a m i n p u t s : S y m b o l i c r e p r e s e n t a t i o n s o f y o u r i n p u t : p a r a m m a s k s : t h i s i s t o m a k e s u r e t f . n n . d y n a m i c _ r n n d o e s n ' t i t e r a t e t h r o u g h m a s k e d s t e p s : p a r a m e n c o d e r _ s t a t e _ i n p u t : ( O p t i o n a l ) p a s s t h i s a s i n i t i a l h i d d e n s t a t e t o t f . n n . d y n a m i c _ r n n t o b u i l d c o n d i t i o n a l r e p r e s e n t a t i o n s : r e t u r n : a n e n c o d e d r e p r e s e n t a t i o n o f y o u r i n p u t . I t c a n b e c o n t e x t - l e v e l r e p r e s e n t a t i o n , w o r d - l e v e l r e p r e s e n t a t i o n , o r b o t h . " " " w i t h v s . v a r i a b l e _ s c o p e ( s c o p e _ n a m e ) : c e l l = t f . n n . r n n _ c e l l . B a s i c L S T M C e l l ( s e l f . s i z e ) i f e n c o d e r _ s t a t e _ i n p u t = = N o n e : i n i t i a l _ s t a t e = c e l l . z e r o _ s t a t e ( s e l f . F L A G S . b a t c h _ s i z e , t f . f l o a t 6 4 ) # o u t p u t s , ( o u t p u t _ f w , o u t p u t _ b w ) = t f . n n . b i d i r e c t i o n a l _ d y n a m i c _ r n n ( c e l l _ f w = c e l l , c e l l _ b w = c e l l , i n p u t s = i n p u t s , s e q u e n c e _ l e n g t h = m a s k s , i n i t i a l _ s t a t e _ f w = i n i t i a l _ s t a t e , i n i t i a l _ s t a t e _ b w = i n i t i a l _ s t a t e ) # c , h = ( t f . c o n c a t ( 1 , [ o u t p u t _ f w . c , o u t p u t _ b w . c ] ) , t f . c o n c a t ( 1 , [ o u t p u t _ f w . h , o u t p u t _ b w . h ] ) ) # o u t p u t = t f . n n . r n n _ c e l l . L S T M S t a t e T u p l e ( c , h ) # o u t p u t _ s t a t e s = t f . c o n c a t ( 2 , [ o u t p u t s [ 0 ] , o u t p u t s [ 1 ] ] ) o u t p u t _ s t a t e s , o u t p u t = t f . n n . d y n a m i c _ r n n ( c e l l , i n p u t s = i n p u t s , s e q u e n c e _ l e n g t h = m a s k s , i n i t i a l _ s t a t e = i n i t i a l _ s t a t e ) # o u t p u t = t f . c o n c a t ( 1 , [ o u t p u t . c , o u t p u t . h ] ) e l s e : i n i t i a l _ s t a t e = e n c o d e r _ s t a t e _ i n p u t o u t p u t _ s t a t e s , o u t p u t = t f . n n . d y n a m i c _ r n n ( c e l l , i n p u t s = i n p u t s , s e q u e n c e _ l e n g t h = m a s k s , i n i t i a l _ s t a t e = i n i t i a l _ s t a t e ) r e t u r n o u t p u t . h , o u t p u t _ s t a t e s
c l a s s D e c o d e r ( o b j e c t ) : d e f _ _ i n i t _ _ ( s e l f , o u t p u t _ s i z e ) : s e l f . o u t p u t _ s i z e = o u t p u t _ s i z e s e l f . F L A G S = t f . a p p . f l a g s . F L A G S d e f d e c o d e ( s e l f , k n o w l e d g e _ r e p ) : " " " t a k e s i n a k n o w l e d g e r e p r e s e n t a t i o n a n d o u t p u t a p r o b a b i l i t y e s t i m a t i o n o v e r a l l p a r a g r a p h t o k e n s o n w h i c h t o k e n s h o u l d b e t h e s t a r t o f t h e a n s w e r s p a n , a n d w h i c h s h o u l d b e t h e e n d o f t h e a n s w e r s p a n . : p a r a m k n o w l e d g e _ r e p : i t i s a r e p r e s e n t a t i o n o f t h e p a r a g r a p h a n d q u e s t i o n , d e c i d e d b y h o w y o u c h o o s e t o i m p l e m e n t t h e e n c o d e r : r e t u r n : " " " b o u n d a r y _ m o d e l = T r u e i f b o u n d a r y _ m o d e l : # h _ q , h _ p = k n o w l e d g e _ r e p # w i t h v s . v a r i a b l e _ s c o p e ( " a n s w e r _ s t a r t " ) : # a _ s = t f . n n . r n n _ c e l l . _ l i n e a r ( [ h _ q , h _ p ] , s e l f . o u t p u t _ s i z e , T r u e , 1 . 0 ) # w i t h v s . v a r i a b l e _ s c o p e ( " a n s w e r _ e n d " ) : # a _ e = t f . n n . r n n _ c e l l . _ l i n e a r ( [ h _ q , h _ p ] , s e l f . o u t p u t _ s i z e , T r u e , 1 . 0 ) # r e t u r n ( a _ s , a _ e ) q , X = k n o w l e d g e _ r e p a = t f . s q u e e z e ( t f . m a t m u l ( X , t f . r e s h a p e ( q , [ - 1 , s e l f . F L A G S . s t a t e _ s i z e , 1 ] ) ) ) r e t u r n ( a , a ) e l s e : P = k n o w l e d g e _ r e p p r i n t ( ' P : ' , P ) h _ s i z e = s e l f . F L A G S . s t a t e _ s i z e W _ s = t f . g e t _ v a r i a b l e ( " W _ s " , s h a p e = [ 1 , h _ s i z e , 1 ] , i n i t i a l i z e r = t f . c o n t r i b . l a y e r s . x a v i e r _ i n i t i a l i z e r ( ) , d t y p e = t f . f l o a t 6 4 ) t i l e d _ W _ s = t f . t i l e ( W _ s , t f . p a c k ( [ s e l f . F L A G S . b a t c h _ s i z e , 1 , 1 ] ) ) S = t f . s q u e e z e ( t f . m a t m u l ( P , t i l e d _ W _ s ) ) p r i n t ( ' S : ' , S ) c e l l = t f . n n . r n n _ c e l l . B a s i c L S T M C e l l ( h _ s i z e ) i n i t i a l _ s t a t e = c e l l . z e r o _ s t a t e ( s e l f . F L A G S . b a t c h _ s i z e , t f . f l o a t 6 4 ) P _ n e w , f i n a l _ s t a t e s = t f . n n . d y n a m i c _ r n n ( c e l l , i n p u t s = P , i n i t i a l _ s t a t e = i n i t i a l _ s t a t e ) p r i n t ( ' f i n a l L S T M o u t p u t s ' , P _ n e w ) W _ e = t f . g e t _ v a r i a b l e ( " W _ e " , s h a p e = [ 1 , h _ s i z e , 1 ] , i n i t i a l i z e r = t f . c o n t r i b . l a y e r s . x a v i e r _ i n i t i a l i z e r ( ) , d t y p e = t f . f l o a t 6 4 ) t i l e d _ W _ e = t f . t i l e ( W _ e , t f . p a c k ( [ s e l f . F L A G S . b a t c h _ s i z e , 1 , 1 ] ) ) E = t f . s q u e e z e ( t f . m a t m u l ( P _ n e w , t i l e d _ W _ e ) ) p r i n t ( ' E ' , E ) r e t u r n S , E c l a s s Q A S y s t e m ( o b j e c t ) : d e f _ _ i n i t _ _ ( s e l f , q u e s t i o n _ e n c o d e r , c o n t e x t _ e n c o d e r , d e c o d e r , * a r g s ) : " " " I n i t i a l i z e s y o u r S y s t e m : p a r a m e n c o d e r : a n e n c o d e r t h a t y o u c o n s t r u c t e d i n t r a i n . p y : p a r a m d e c o d e r : a d e c o d e r t h a t y o u c o n s t r u c t e d i n t r a i n . p y : p a r a m a r g s : p a s s i n m o r e a r g u m e n t s a s n e e d e d
" " " s e l f . F L A G S = t f . a p p . f l a g s . F L A G S s e l f . q u e s t i o n _ e n c o d e r = q u e s t i o n _ e n c o d e r s e l f . c o n t e x t _ e n c o d e r = c o n t e x t _ e n c o d e r s e l f . d e c o d e r = d e c o d e r s e l f . m a x _ q u e s t i o n _ l e n g t h = 6 0 # r a n s c r i p t t o f i g u r e t h i s o u t s e l f . m a x _ c o n t e x t _ l e n g t h = s e l f . F L A G S . o u t p u t _ s i z e s e l f . e m b e d _ p a t h = a r g s [ 0 ] s e l f . q u e s t i o n _ m a s k = a r g s [ 1 ] s e l f . c o n t e x t _ m a s k = a r g s [ 2 ] # = = = = s e t u p p l a c e h o l d e r t o k e n s = = = = = = = = s e l f . l a b e l s _ p l a c e h o l d e r = t f . p l a c e h o l d e r ( t f . i n t 3 2 , s h a p e = [ N o n e , s e l f . m a x _ c o n t e x t _ l e n g t h ] ) s e l f . l a b e l s _ s t a r t _ p l a c e h o l d e r = t f . p l a c e h o l d e r ( t f . i n t 3 2 , s h a p e = [ s e l f . F L A G S . b a t c h _ s i z e ] ) s e l f . l a b e l s _ e n d _ p l a c e h o l d e r = t f . p l a c e h o l d e r ( t f . i n t 3 2 , s h a p e = [ s e l f . F L A G S . b a t c h _ s i z e ] ) s e l f . q u e s t i o n s _ p l a c e h o l d e r = t f . p l a c e h o l d e r ( t f . i n t 3 2 , s h a p e = [ N o n e , s e l f . m a x _ q u e s t i o n _ l e n g t h ] ) s e l f . q u e s t i o n _ m a s k _ p l a c e h o l d e r = t f . p l a c e h o l d e r ( t f . i n t 3 2 , s h a p e = [ N o n e , ] ) s e l f . c o n t e x t _ p l a c e h o l d e r = t f . p l a c e h o l d e r ( t f . i n t 3 2 , s h a p e = [ N o n e , s e l f . m a x _ c o n t e x t _ l e n g t h ] ) s e l f . c o n t e x t _ m a s k _ p l a c e h o l d e r = t f . p l a c e h o l d e r ( t f . i n t 3 2 , s h a p e = [ N o n e , ] ) s e l f . a _ s = N o n e s e l f . a _ e = N o n e s e l f . c o n t e x t W o r d s = N o n e # u s e s i m p l e r f o r m f o r n o w \ \ s e l f . a t t e n t i o n = t f . g e t _ v a r i a b l e ( " A " , s h a p e = [ ] , i n i t i a l i z e r = t f . c o n t r i b . l a y e r s . x a v i e r _ i n i t i a l i z e r ( ) ) # = = = = a s s e m b l e p i e c e s = = = = w i t h t f . v a r i a b l e _ s c o p e ( " q a " , i n i t i a l i z e r = t f . u n i f o r m _ u n i t _ s c a l i n g _ i n i t i a l i z e r ( 1 . 0 ) ) : s e l f . s e t u p _ e m b e d d i n g s ( s e l f . e m b e d _ p a t h ) s e l f . s e t u p _ s y s t e m ( ) s e l f . s e t u p _ l o s s ( ) # = = = = s e t u p t r a i n i n g / u p d a t i n g p r o c e d u r e = = = = o p t f n = g e t _ o p t i m i z e r ( " a d a m " ) o p t i m i z e r = o p t f n ( s e l f . F L A G S . l e a r n i n g _ r a t e ) g r a d s _ a n d _ v a r s = o p t i m i z e r . c o m p u t e _ g r a d i e n t s ( s e l f . l o s s ) g r a d i e n t s , p a r a m e t e r s = z i p ( * g r a d s _ a n d _ v a r s ) g r a d i e n t s = t f . c l i p _ b y _ g l o b a l _ n o r m ( g r a d i e n t s , s e l f . F L A G S . m a x _ g r a d i e n t _ n o r m ) [ 0 ] s e l f . g r a d _ n o r m = t f . g l o b a l _ n o r m ( g r a d i e n t s ) n e w _ g r a d s _ a n d _ v a r s = z i p ( g r a d i e n t s , p a r a m e t e r s ) s e l f . t r a i n _ o p = o p t i m i z e r . a p p l y _ g r a d i e n t s ( n e w _ g r a d s _ a n d _ v a r s ) p a s s d e f s e t u p _ s y s t e m ( s e l f ) : " " " A f t e r y o u r m o d u l a r i z e d i m p l e m e n t a t i o n o f e n c o d e r a n d d e c o d e r y o u s h o u l d c a l l v a r i o u s f u n c t i o n s i n s i d e e n c o d e r , d e c o d e r h e r e t o a s s e m b l e y o u r r e a d i n g c o m p r e h e n s i o n s y s t e m ! : r e t u r n : " " "
q u e s t i o n _ r e p , q u e s t i o n _ s t a t e s = s e l f . q u e s t i o n _ e n c o d e r . e n c o d e ( s e l f . q u e s t i o n _ e m b e d d i n g s , s e l f . q u e s t i o n _ m a s k _ p l a c e h o l d e r , " e n c o d e _ q u e s t i o n " ) p r i n t ( ' q u e s t i o n r e p ' , q u e s t i o n _ r e p ) # 3 2 b y 4 0 0 c o n t e x t _ r e p , c o n t e x t _ s t a t e s = s e l f . c o n t e x t _ e n c o d e r . e n c o d e ( s e l f . c o n t e x t _ e m b e d d i n g s , s e l f . c o n t e x t _ m a s k _ p l a c e h o l d e r , " e n c o d e _ c o n t e x t " ) p r i n t ( ' c o n t e x t s t a t e s ' , c o n t e x t _ s t a t e s ) # p r i n t ( ' c o n t e x t ' , c o n t e x t _ r e p ) # # c o n t e x t r e p i s 3 2 x 4 0 0 # p r i n t ( ' q ' , q u e s t i o n _ r e p ) # r e s h a p e d _ q u e s t i o n _ r e p = t f . r e s h a p e ( q u e s t i o n _ r e p , s h a p e = [ 2 , - 1 , 1 , s e l f . F L A G S . s t a t e _ s i z e ] ) # # p r i n t ( ' r e s h a p e d q u e s t i o n ' , r e s h a p e d _ q u e s t i o n _ r e p ) # a t t e n t i o n _ v e c = t f . n n . s o f t m a x ( t f . r e d u c e _ s u m ( t f . m u l t i p l y ( c o n t e x t _ r e p , q u e s t i o n _ r e p ) , a x i s = [ 3 ] ) ) # a n s w e r _ v e c s = t f . m u l t i p l y ( c o n t e x t _ s t a t e s , t f . r e s h a p e ( a t t e n t i o n _ v e c , s h a p e = [ 2 , - 1 , s e l f . m a x _ c o n t e x t _ l e n g t h , 1 ] ) ) # # a n s w e r _ v e c s = t f . a d d _ n ( c o n t e x t _ s t a t e s , t f . r e s h a p e ( a t t e n t i o n _ v e c , s h a p e = [ 2 , - 1 , s e l f . m a x _ c o n t e x t _ l e n g t h , 1 ] ) ) # s e l f . p r o b _ d i s t r i b u t i o n = s e l f . d e c o d e r . d e c o d e ( t f . r e s h a p e ( a n s w e r _ v e c s [ 1 , : , : , : ] , s h a p e = [ - 1 , s e l f . m a x _ c o n t e x t _ l e n g t h , s e l f . F L A G S . s t a t e _ s i z e ] ) ) p r i n t ( ' c o n t e x t s t a t e s ' , c o n t e x t _ s t a t e s ) p r i n t ( ' c o n t e x t ' , c o n t e x t _ r e p ) # c o n t e x t r e p i s 3 2 x 4 0 0 p r i n t ( ' q ' , q u e s t i o n _ r e p ) b o u n d a r y _ m o d e l = T r u e i f b o u n d a r y _ m o d e l : # p r i n t ( ' r e s h a p e d q u e s t i o n ' , r e s h a p e d _ q u e s t i o n _ r e p ) # a t t e n t i o n _ v e c = t f . n n . s o f t m a x ( t f . r e d u c e _ s u m ( t f . m u l t i p l y ( c o n t e x t _ r e p , q u e s t i o n _ r e p ) , a x i s = [ 3 ] ) ) # a n s w e r _ v e c s = t f . m u l t i p l y ( c o n t e x t _ s t a t e s , t f . r e s h a p e ( a t t e n t i o n _ v e c , s h a p e = [ 2 , - 1 , s e l f . m a x _ c o n t e x t _ l e n g t h , 1 ] ) ) # a n s w e r _ v e c s = t f . a d d _ n ( c o n t e x t _ s t a t e s , t f . r e s h a p e ( a t t e n t i o n _ v e c , s h a p e = [ 2 , - 1 , s e l f . m a x _ c o n t e x t _ l e n g t h , 1 ] ) ) # s e l f . p r o b _ d i s t r i b u t i o n = s e l f . d e c o d e r . d e c o d e ( t f . r e s h a p e ( a n s w e r _ v e c s [ 1 , : , : , : ] , s h a p e = [ - 1 , s e l f . m a x _ c o n t e x t _ l e n g t h , s e l f . F L A G S . s t a t e _ s i z e ] ) ) # r e s h a p e d _ q u e s t i o n _ r e p = t f . r e s h a p e ( q u e s t i o n _ r e p , s h a p e = [ 2 , - 1 , 1 , s e l f . F L A G S . s t a t e _ s i z e ] ) # s e l f . a _ s , s e l f . a _ e = s e l f . d e c o d e r . d e c o d e ( [ q u e s t i o n _ r e p . h , c o n t e x t _ r e p ] ) s e l f . a _ s , s e l f . a _ e = s e l f . d e c o d e r . d e c o d e ( [ q u e s t i o n _ r e p , c o n t e x t _ s t a t e s ] ) e l s e : Q = q u e s t i o n _ s t a t e s D = c o n t e x t _ s t a t e s L = t f . m a t m u l ( D , t f . t r a n s p o s e ( Q , [ 0 , 2 , 1 ] ) ) A D = t f . n n . s o f t m a x ( L ) A Q = t f . n n . s o f t m a x ( t f . t r a n s p o s e ( L , [ 0 , 2 , 1 ] ) ) C Q = t f . m a t m u l ( A Q , D ) p r i n t ( ' Q : ' , Q ) p r i n t ( ' C Q : ' , C Q ) p r i n t ( ' A D : ' , A D ) C D = t f . m a t m u l ( A D , t f . c o n c a t ( 2 , [ Q , C Q ] ) )
h _ s i z e = s e l f . F L A G S . s t a t e _ s i z e W = t f . g e t _ v a r i a b l e ( " W " , s h a p e = [ 1 , h _ s i z e * 6 , h _ s i z e ] , i n i t i a l i z e r = t f . c o n t r i b . l a y e r s . x a v i e r _ i n i t i a l i z e r ( ) , d t y p e = t f . f l o a t 6 4 ) b = t f . g e t _ v a r i a b l e ( " b " , s h a p e = [ 1 , s e l f . m a x _ c o n t e x t _ l e n g t h , h _ s i z e ] , i n i t i a l i z e r = t f . c o n t r i b . l a y e r s . x a v i e r _ i n i t i a l i z e r ( ) , d t y p e = t f . f l o a t 6 4 ) p r i n t ( ' C D : ' , C D ) p r i n t ( ' D : ' , D ) t i l e d _ W = t f . t i l e ( W , t f . p a c k ( [ s e l f . F L A G S . b a t c h _ s i z e , 1 , 1 ] ) ) p r i n t ( ' t i l e d _ W ' , t i l e d _ W ) D = t f . m a t m u l ( t f . c o n c a t ( 2 , [ C D , D ] ) , t i l e d _ W ) + b p r i n t ( ' n e w D : ' , D ) s e l f . a _ s , s e l f . a _ e = s e l f . d e c o d e r . d e c o d e ( D ) d e f s e t u p _ l o s s ( s e l f ) : " " " S e t u p y o u r l o s s c o m p u t a t i o n h e r e : r e t u r n : " " " w i t h v s . v a r i a b l e _ s c o p e ( " l o s s " ) : l 1 = s s c e ( l a b e l s = s e l f . l a b e l s _ s t a r t _ p l a c e h o l d e r , l o g i t s = s e l f . a _ s ) l 2 = s s c e ( l a b e l s = s e l f . l a b e l s _ e n d _ p l a c e h o l d e r , l o g i t s = s e l f . a _ e ) s e l f . l o s s = l 1 + l 2 s e l f . s a v e r = t f . t r a i n . S a v e r ( ) d e f s e t u p _ e m b e d d i n g s ( s e l f , e m b e d _ p a t h ) : " " " L o a d s d i s t r i b u t e d w o r d r e p r e s e n t a t i o n s b a s e d o n p l a c e h o l d e r t o k e n s : r e t u r n : " " " w i t h v s . v a r i a b l e _ s c o p e ( " e m b e d d i n g s " ) : e m b e d d i n g _ d i c t = n p . l o a d ( e m b e d _ p a t h ) p r e t r a i n e d _ e m b e d d i n g = t f . c o n s t a n t ( e m b e d d i n g _ d i c t [ ' g l o v e ' ] ) p r i n t ( s e l f . q u e s t i o n s _ p l a c e h o l d e r ) s e l f . q u e s t i o n _ e m b e d d i n g s = t f . r e s h a p e ( t f . n n . e m b e d d i n g _ l o o k u p ( p r e t r a i n e d _ e m b e d d i n g , s e l f . q u e s t i o n s _ p l a c e h o l d e r ) , s h a p e = [ - 1 , s e l f . m a x _ q u e s t i o n _ l e n g t h , s e l f . F L A G S . e m b e d d i n g _ s i z e ] ) s e l f . c o n t e x t _ e m b e d d i n g s = t f . r e s h a p e ( t f . n n . e m b e d d i n g _ l o o k u p ( p r e t r a i n e d _ e m b e d d i n g , s e l f . c o n t e x t _ p l a c e h o l d e r ) , s h a p e = [ - 1 , s e l f . m a x _ c o n t e x t _ l e n g t h , s e l f . F L A G S . e m b e d d i n g _ s i z e ] ) p r i n t ( s e l f . q u e s t i o n _ e m b e d d i n g s ) d e f o p t i m i z e ( s e l f , s e s s i o n , t r a i n _ x , t r a i n _ y 1 , t r a i n _ y 2 ) : " " " T a k e s i n a c t u a l d a t a t o o p t i m i z e y o u r m o d e l T h i s m e t h o d i s e q u i v a l e n t t o a s t e p ( ) f u n c t i o n : r e t u r n : " " " # p r i n t ( ' t r a i n y 1 ' , t r a i n _ y 1 ) # p r i n t ( ' t r a i n y 2 ' , t r a i n _ y 2 ) i n p u t _ f e e d = { } t r a i n _ q u e s t i o n s , t r a i n _ c o n t e x t , t r a i n _ q _ m a s k , t r a i n _ c _ m a s k = t r a i n _ x i n p u t _ f e e d [ s e l f . q u e s t i o n s _ p l a c e h o l d e r ] = t r a i n _ q u e s t i o n s
i n p u t _ f e e d [ s e l f . c o n t e x t _ p l a c e h o l d e r ] = t r a i n _ c o n t e x t i n p u t _ f e e d [ s e l f . q u e s t i o n _ m a s k _ p l a c e h o l d e r ] = t r a i n _ q _ m a s k i n p u t _ f e e d [ s e l f . c o n t e x t _ m a s k _ p l a c e h o l d e r ] = t r a i n _ c _ m a s k i n p u t _ f e e d [ s e l f . l a b e l s _ s t a r t _ p l a c e h o l d e r ] = t r a i n _ y 1 i n p u t _ f e e d [ s e l f . l a b e l s _ e n d _ p l a c e h o l d e r ] = t r a i n _ y 2 o u t p u t _ f e e d = [ s e l f . t r a i n _ o p , s e l f . l o s s ] o u t p u t s = s e s s i o n . r u n ( o u t p u t _ f e e d , i n p u t _ f e e d ) # r e t u r n s a l l t h r e e o u t p u t f e e d v a l u e s p r i n t ( ' l o s s : ' , n p . m e a n ( o u t p u t s [ 1 ] ) ) r e t u r n o u t p u t s [ 0 ] d e f d e c o d e ( s e l f , s e s s i o n , t e s t _ x ) : " " " R e t u r n s t h e p r o b a b i l i t y d i s t r i b u t i o n o v e r d i f f e r e n t p o s i t i o n s i n t h e p a r a g r a p h s o t h a t o t h e r m e t h o d s l i k e s e l f . a n s w e r ( ) w i l l b e a b l e t o w o r k p r o p e r l y : r e t u r n : " " " i n p u t _ f e e d = { } q u e s t i o n , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k , l a b e l s _ s t a r t , l a b e l s _ e n d , c o n t e x t W o r d , t r a i n A n s w e r , c o n t e x t P a r a g r a p h = z i p ( * t e s t _ x ) i n p u t _ f e e d [ s e l f . q u e s t i o n s _ p l a c e h o l d e r ] = q u e s t i o n i n p u t _ f e e d [ s e l f . c o n t e x t _ p l a c e h o l d e r ] = c o n t e x t i n p u t _ f e e d [ s e l f . q u e s t i o n _ m a s k _ p l a c e h o l d e r ] = q u e s t i o n _ m a s k i n p u t _ f e e d [ s e l f . c o n t e x t _ m a s k _ p l a c e h o l d e r ] = c o n t e x t _ m a s k i n p u t _ f e e d [ s e l f . l a b e l s _ s t a r t _ p l a c e h o l d e r ] = l a b e l s _ s t a r t i n p u t _ f e e d [ s e l f . l a b e l s _ e n d _ p l a c e h o l d e r ] = l a b e l s _ e n d o u t p u t _ f e e d = [ s e l f . a _ s , s e l f . a _ e ] o u t p u t s = s e s s i o n . r u n ( o u t p u t _ f e e d , i n p u t _ f e e d ) r e t u r n o u t p u t s d e f a n s w e r ( s e l f , s e s s i o n , t e s t _ x ) : y p , y p 2 = s e l f . d e c o d e ( s e s s i o n , t e s t _ x ) a _ s = n p . a r g m a x ( y p , a x i s = 1 ) a _ e = n p . a r g m a x ( y p 2 , a x i s = 1 ) r e t u r n ( a _ s , a _ e ) d e f s t a r t V a l i d a t e ( s e l f , s e s s i o n , v a l _ d a t a s e t ) : s a m p l e = 3 2 # c u r r e n t l y o n l y l o o k i n g a t 3 2 e x a m p l e s , c o u l d b e l o o k i n g a t e n t i r e d a t a s e t v a l _ q u e s t i o n s , v a l _ c o n t e x t , v a l _ q u e s t i o n _ m a s k , v a l _ c o n t e x t _ m a s k , v a l _ l a b e l s _ s t a r t , v a l _ l a b e l s _ e n d , v a l _ c o n t e x t W o r d s , v a l A n s w e r s , v a l P a r a g r a p h s = v a l _ d a t a s e t v a l _ t h i n g s = z i p ( v a l _ q u e s t i o n s , v a l _ c o n t e x t , v a l _ q u e s t i o n _ m a s k , v a l _ c o n t e x t _ m a s k , v a l _ l a b e l s _ s t a r t , v a l _ l a b e l s _ e n d ) i n d i c e s = [ r a n d o m . r a n d i n t ( 0 , l e n ( v a l _ q u e s t i o n s ) - 1 ) f o r i i n r a n g e ( s a m p l e ) ] s a m p l e D a t a = [ v a l _ t h i n g s [ i ] f o r i i n i n d i c e s ] # c a l c u l a t e t h e v a l i d a t i o n c o s t f o r t h e v a l _ d a t a s e t # d o w e d o t h i s t o 1 0 0 s a m p l e s o r t h e w h o l e v a l i d a t i o n d a t a s e t 3
v a l _ c o s t = 0 . 0 v a l _ c o s t = s e l f . v a l i d a t e ( s e s s i o n , s a m p l e D a t a ) p r i n t ( ' t h i s i s v a l _ c o s t ' , v a l _ c o s t ) r e t u r n v a l _ c o s t d e f v a l i d a t e ( s e l f , s e s s , v a l _ t h i n g s ) : " " " I t e r a t e t h r o u g h t h e v a l i d a t i o n d a t a s e t a n d d e t e r m i n e w h a t t h e v a l i d a t i o n c o s t i s . T h i s m e t h o d c a l l s s e l f . t e s t ( ) w h i c h e x p l i c i t l y c a l c u l a t e s v a l i d a t i o n c o s t . H o w y o u i m p l e m e n t t h i s f u n c t i o n i s d e p e n d e n t o n h o w y o u d e s i g n y o u r d a t a i t e r a t i o n f u n c t i o n : r e t u r n : " " " v a l i d _ c o s t = 0 v a l i d _ c o s t = s e l f . t e s t ( s e s s , v a l _ t h i n g s ) v a l i d _ c o s t = ( n p . s u m ( v a l i d _ c o s t ) * 1 . 0 ) / l e n ( v a l _ t h i n g s ) r e t u r n v a l i d _ c o s t d e f t e s t ( s e l f , s e s s i o n , v a l _ t h i n g s ) : " " " i n h e r e y o u s h o u l d c o m p u t e a c o s t f o r y o u r v a l i d a t i o n s e t a n d t u n e y o u r h y p e r p a r a m e t e r s a c c o r d i n g t o t h e v a l i d a t i o n s e t p e r f o r m a n c e : r e t u r n : " " " v a l _ q u e s t i o n s , v a l _ c o n t e x t , v a l _ q u e s t i o n _ m a s k , v a l _ c o n t e x t _ m a s k , v a l _ l a b e l s _ s t a r t , v a l _ l a b e l s _ e n d = z i p ( * v a l _ t h i n g s ) i n p u t _ f e e d = { } i n p u t _ f e e d [ s e l f . q u e s t i o n s _ p l a c e h o l d e r ] = v a l _ q u e s t i o n s i n p u t _ f e e d [ s e l f . c o n t e x t _ p l a c e h o l d e r ] = v a l _ c o n t e x t i n p u t _ f e e d [ s e l f . q u e s t i o n _ m a s k _ p l a c e h o l d e r ] = v a l _ q u e s t i o n _ m a s k i n p u t _ f e e d [ s e l f . c o n t e x t _ m a s k _ p l a c e h o l d e r ] = v a l _ c o n t e x t _ m a s k i n p u t _ f e e d [ s e l f . l a b e l s _ s t a r t _ p l a c e h o l d e r ] = v a l _ l a b e l s _ s t a r t i n p u t _ f e e d [ s e l f . l a b e l s _ e n d _ p l a c e h o l d e r ] = v a l _ l a b e l s _ e n d o u t p u t _ f e e d = [ s e l f . l o s s ] o u t p u t s = s e s s i o n . r u n ( o u t p u t _ f e e d , i n p u t _ f e e d ) r e t u r n o u t p u t s d e f c h u n k s ( s e l f , l , n ) : " " " Y i e l d s u c c e s s i v e n - s i z e d c h u n k s f r o m l . " " " f o r i i n r a n g e ( 0 , l e n ( l ) , n ) : y i e l d l [ i : i + n ] d e f c l i p ( s e l f , l , s i z e ) : " " " C l i p s l i s t t o e l i m i n a t e e x c e s s d a t a " " " i f l e n ( l [ - 1 ] ) ! = s i z e : r e t u r n l [ : - 1 ] e l s e : r e t u r n l d e f a u g m e n t ( s e l f , l , s i z e ) : i f l e n ( l [ - 1 ] ) ! = s i z e : a u g m e n t e d _ l a s t _ b a t c h = [ l [ - 1 ] [ i ] i f i < l e n ( l [ - 1 ] ) e l s e N o n e f o r i i n r a n g e ( s i z e ) ] l . p o p ( - 1 ) l . a p p e n d ( a u g m e n t e d _ l a s t _ b a t c h )
r e t u r n l d e f p r e d i c t _ a n s w e r s ( s e l f , s e s s i o n , d a t a s e t ) : q u e s t i o n s , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k = d a t a s e t z i p p e d D a t a = s e l f . c h u n k s ( z i p ( q u e s t i o n s , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k ) , s e l f . F L A G S . b a t c h _ s i z e ) z i p p e d D a t a = s e l f . c l i p ( [ i t e m f o r i t e m i n z i p p e d D a t a ] , s e l f . F L A G S . b a t c h _ s i z e ) a _ s = [ ] a _ e = [ ] f o r c h u n k i n z i p p e d D a t a : i n p u t _ f e e d = { } q u e s t i o n , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k = z i p ( * c h u n k ) i n p u t _ f e e d [ s e l f . q u e s t i o n s _ p l a c e h o l d e r ] = q u e s t i o n i n p u t _ f e e d [ s e l f . c o n t e x t _ p l a c e h o l d e r ] = c o n t e x t i n p u t _ f e e d [ s e l f . q u e s t i o n _ m a s k _ p l a c e h o l d e r ] = q u e s t i o n _ m a s k i n p u t _ f e e d [ s e l f . c o n t e x t _ m a s k _ p l a c e h o l d e r ] = c o n t e x t _ m a s k o u t p u t _ f e e d = [ s e l f . a _ s , s e l f . a _ e ] a _ s _ c h u n k , a _ e _ c h u n k = s e s s i o n . r u n ( o u t p u t _ f e e d , i n p u t _ f e e d ) a _ s _ c h u n k = n p . a r g m a x ( a _ s _ c h u n k , a x i s = 1 ) a _ e _ c h u n k = n p . a r g m a x ( a _ e _ c h u n k , a x i s = 1 ) a _ s _ c h u n k = a _ s _ c h u n k . t o l i s t ( ) a _ e _ c h u n k = a _ e _ c h u n k . t o l i s t ( ) f o r i i n r a n g e ( s e l f . F L A G S . b a t c h _ s i z e ) : a _ s . a p p e n d ( a _ s _ c h u n k [ i ] ) a _ e . a p p e n d ( a _ e _ c h u n k [ i ] ) # p r i n t ( ' a _ s ' , a _ s ) # p r i n t ( ' a _ e ' , a _ e ) r e t u r n a _ s , a _ e d e f e v a l u a t e _ a n s w e r ( s e l f , s e s s i o n , d a t a s e t , s a m p l e = 3 2 , l o g = F a l s e ) : # n e e d t o g e t t h i s t o w o r k w h e n s a m p l e s i z e i s n o t b a t c h s i z e " " " E v a l u a t e t h e m o d e l ' s p e r f o r m a n c e u s i n g t h e h a r m o n i c m e a n o f F 1 a n d E x a c t M a t c h ( E M ) w i t h t h e s e t o f t r u e a n s w e r l a b e l s T h i s s t e p a c t u a l l y t a k e s q u i t e s o m e t i m e . S o w e c a n o n l y s a m p l e 1 0 0 e x a m p l e s f r o m e i t h e r t r a i n i n g o r t e s t i n g s e t . : p a r a m s e s s i o n : s e s s i o n s h o u l d a l w a y s b e c e n t r a l l y m a n a g e d i n t r a i n . p y : p a r a m d a t a s e t : a r e p r e s e n t a t i o n o f o u r d a t a , i n s o m e i m p l e m e n t a t i o n s , y o u c a n p a s s i n m u l t i p l e c o m p o n e n t s ( a r g u m e n t s ) o f o n e d a t a s e t t o t h i s f u n c t i o n : p a r a m s a m p l e : h o w m a n y e x a m p l e s i n d a t a s e t w e l o o k a t : p a r a m l o g : w h e t h e r w e p r i n t t o s t d o u t s t r e a m : r e t u r n : " " " s a m p l e = 2 0 0 q u e s t i o n s , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k , l a b e l s _ s t a r t , l a b e l s _ e n d , c o n t e x t W o r d s , t r a i n A n s w e r s , c o n t e x t P a r a g r a p h s = d a t a s e t i n d i c e s = [ r a n d o m . r a n d i n t ( 0 , l e n ( q u e s t i o n s ) - 1 ) f o r i i n r a n g e ( s a m p l e ) ] z i p p e d D a t a = z i p ( q u e s t i o n s , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k , l a b e l s _ s t a r t , l a b e l s _ e n d , c o n t e x t W o r d s , t r a i n A n s w e r s , c o n t e x t P a r a g r a p h s ) # s a m p l e D a t a = [ z i p p e d D a t a [ i ] f o r i i n i n d i c e s ] z i p p e d D a t a = [ z i p p e d D a t a [ i ] f o r i i n i n d i c e s ] # [ : 3 2 ]
f 1 = 0 e m = 0 # p r i n t ( ' z i p p e d D a t a ' , z i p p e d D a t a ) z i p p e d D a t a = s e l f . c h u n k s ( z i p p e d D a t a , s e l f . F L A G S . b a t c h _ s i z e ) z i p p e d D a t a = [ i t e m f o r i t e m i n z i p p e d D a t a ] z i p p e d D a t a = s e l f . c l i p ( z i p p e d D a t a , s e l f . F L A G S . b a t c h _ s i z e ) a _ s = [ ] a _ e = [ ] # p r i n t ( ' z i p p e d l e n g t h s ' , [ l e n ( x ) f o r x i n z i p p e d D a t a ] ) f o r c h u n k i n z i p p e d D a t a : a _ s _ c h u n k , a _ e _ c h u n k = s e l f . a n s w e r ( s e s s i o n , c h u n k ) a _ s _ c h u n k = a _ s _ c h u n k . t o l i s t ( ) a _ e _ c h u n k = a _ e _ c h u n k . t o l i s t ( ) a _ s . a p p e n d ( a _ s _ c h u n k ) a _ e . a p p e n d ( a _ e _ c h u n k ) # p r i n t ( ' a _ s l e n s ' , [ l e n ( x ) f o r x i n a _ s ] ) a _ s = s e l f . c l i p ( a _ s , s e l f . F L A G S . b a t c h _ s i z e ) a _ e = s e l f . c l i p ( a _ e , s e l f . F L A G S . b a t c h _ s i z e ) # p r i n t ( ' a _ s l e n s ' , [ l e n ( x ) f o r x i n a _ s ] ) i n d i c e s = s e l f . c h u n k s ( i n d i c e s , s e l f . F L A G S . b a t c h _ s i z e ) i n d i c e s = [ i t e m f o r i t e m i n i n d i c e s ] i n d i c e s = s e l f . c l i p ( i n d i c e s , s e l f . F L A G S . b a t c h _ s i z e ) c o n t e x t W o r d s C o u n t e r = 0 # f o r i i n r a n g e ( l e n ( z i p p e d D a t a ) ) : f o r i , k i n e n u m e r a t e ( i n d i c e s ) : f o r j , i n d e x i n e n u m e r a t e ( k ) : c u r r e n t _ z i p p e d D a t a _ b u c k e t = z i p p e d D a t a [ i ] # p r i n t ( ' c z b ' , c u r r e n t _ z i p p e d D a t a _ b u c k e t ) c u r r e n t _ a _ s _ b u c k e t = a _ s [ i ] c u r r e n t _ a _ e _ b u c k e t = a _ e [ i ] # f o r j i n r a n g e ( l e n ( c u r r e n t _ z i p p e d D a t a _ b u c k e t ) ) : p = c o n t e x t W o r d s [ i n d e x ] p = [ i t e m [ 0 ] f o r i t e m i n p ] s t a r t A n s w e r I n d e x = c u r r e n t _ a _ s _ b u c k e t [ j ] e n d A n s w e r I n d e x = c u r r e n t _ a _ e _ b u c k e t [ j ] a n s w e r = ( p [ s t a r t A n s w e r I n d e x : e n d A n s w e r I n d e x + 1 ] ) a n s w e r = " " . j o i n ( s t r ( x ) f o r x i n a n s w e r ) t r u e A n s w e r = t r a i n A n s w e r s [ i n d e x ] p r i n t ( ' t r u e a n s w e r : ' , t r u e A n s w e r ) p r i n t ( ' p r e d i c t e d a n s w e r : ' , a n s w e r ) f 1 + = f 1 _ s c o r e ( a n s w e r , t r u e A n s w e r ) e m + = e x a c t _ m a t c h _ s c o r e ( a n s w e r , t r u e A n s w e r ) c o n t e x t W o r d s C o u n t e r + = 1 l e n g t h = s u m ( [ l e n ( i ) f o r i i n i n d i c e s ] ) f 1 = ( f 1 * 1 . 0 ) / ( l e n g t h ) e m = ( e m * 1 . 0 ) / ( l e n g t h ) p r i n t ( ' f 1 ' , f 1 ) p r i n t ( ' e m ' , e m ) r e t u r n f 1 , e m d e f t r a i n ( s e l f , s e s s i o n , d a t a s e t , t r a i n _ d i r , v a l _ d a t a s e t ) : " " "
I m p l e m e n t m a i n t r a i n i n g l o o p T I P S : Y o u s h o u l d a l s o i m p l e m e n t l e a r n i n g r a t e a n n e a l i n g ( l o o k i n t o t f . t r a i n . e x p o n e n t i a l _ d e c a y ) C o n s i d e r i n g t h e l o n g t i m e t o t r a i n , y o u s h o u l d s a v e y o u r m o d e l p e r e p o c h . M o r e a m b i t i o u s a p p o a r c h c a n i n c l u d e i m p l e m e n t e a r l y s t o p p i n g , o r r e l o a d p r e v i o u s m o d e l s i f t h e y h a v e h i g h e r p e r f o r m a n c e t h a n t h e c u r r e n t o n e A s s u g g e s t e d i n t h e d o c u m e n t , y o u s h o u l d e v a l u a t e y o u r t r a i n i n g p r o g r e s s b y p r i n t i n g o u t i n f o r m a t i o n e v e r y f i x e d n u m b e r o f i t e r a t i o n s . W e r e c o m m e n d y o u e v a l u a t e y o u r m o d e l p e r f o r m a n c e o n F 1 a n d E M i n s t e a d o f j u s t l o o k i n g a t t h e c o s t . : p a r a m s e s s i o n : i t s h o u l d b e p a s s e d i n f r o m t r a i n . p y : p a r a m d a t a s e t : a r e p r e s e n t a t i o n o f o u r d a t a , i n s o m e i m p l e m e n t a t i o n s , y o u c a n p a s s i n m u l t i p l e c o m p o n e n t s ( a r g u m e n t s ) o f o n e d a t a s e t t o t h i s f u n c t i o n : p a r a m t r a i n _ d i r : p a t h t o t h e d i r e c t o r y w h e r e y o u s h o u l d s a v e t h e m o d e l c h e c k p o i n t : r e t u r n : " " " # s o m e f r e e c o d e t o p r i n t o u t n u m b e r o f p a r a m e t e r s i n y o u r m o d e l # i t ' s a l w a y s g o o d t o c h e c k ! # y o u w i l l a l s o w a n t t o s a v e y o u r m o d e l p a r a m e t e r s i n t r a i n _ d i r # s o t h a t y o u c a n u s e y o u r t r a i n e d m o d e l t o m a k e p r e d i c t i o n s , o r # e v e n c o n t i n u e t r a i n i n g d e f g e t _ m i n i b a t c h e s ( d a t a , m i n i b a t c h _ s i z e , s h u f f l e = T r u e ) : " " " I t e r a t e s t h r o u g h t h e p r o v i d e d d a t a o n e m i n i b a t c h a t a t t i m e . Y o u c a n u s e t h i s f u n c t i o n t o i t e r a t e t h r o u g h d a t a i n m i n i b a t c h e s a s f o l l o w s : f o r i n p u t s _ m i n i b a t c h i n g e t _ m i n i b a t c h e s ( i n p u t s , m i n i b a t c h _ s i z e ) : . . . O r w i t h m u l t i p l e d a t a s o u r c e s : f o r i n p u t s _ m i n i b a t c h , l a b e l s _ m i n i b a t c h i n g e t _ m i n i b a t c h e s ( [ i n p u t s , l a b e l s ] , m i n i b a t c h _ s i z e ) : . . . A r g s : d a t a : t h e r e a r e t w o p o s s i b l e v a l u e s : - a l i s t o r n u m p y a r r a y - a l i s t w h e r e e a c h e l e m e n t i s e i t h e r a l i s t o r n u m p y a r r a y m i n i b a t c h _ s i z e : t h e m a x i m u m n u m b e r o f i t e m s i n a m i n i b a t c h s h u f f l e : w h e t h e r t o r a n d o m i z e t h e o r d e r o f r e t u r n e d d a t a R e t u r n s : m i n i b a t c h e s : t h e r e t u r n v a l u e d e p e n d s o n d a t a : - I f d a t a i s a l i s t / a r r a y i t y i e l d s t h e n e x t m i n i b a t c h o f d a t a .
- I f d a t a a l i s t o f l i s t s / a r r a y s i t r e t u r n s t h e n e x t m i n i b a t c h o f e a c h e l e m e n t i n t h e l i s t . T h i s c a n b e u s e d t o i t e r a t e t h r o u g h m u l t i p l e d a t a s o u r c e s ( e . g . , f e a t u r e s a n d l a b e l s ) a t t h e s a m e t i m e . " " " l i s t _ d a t a = t y p e ( d a t a ) i s l i s t a n d ( t y p e ( d a t a [ 0 ] ) i s l i s t o r t y p e ( d a t a [ 0 ] ) i s n p . n d a r r a y ) d a t a _ s i z e = l e n ( d a t a [ 0 ] ) i f l i s t _ d a t a e l s e l e n ( d a t a ) i n d i c e s = n p . a r a n g e ( d a t a _ s i z e ) i f s h u f f l e : n p . r a n d o m . s h u f f l e ( i n d i c e s ) f o r m i n i b a t c h _ s t a r t i n n p . a r a n g e ( 0 , d a t a _ s i z e , m i n i b a t c h _ s i z e ) : m i n i b a t c h _ i n d i c e s = i n d i c e s [ m i n i b a t c h _ s t a r t : m i n i b a t c h _ s t a r t + m i n i b a t c h _ s i z e ] y i e l d [ m i n i b a t c h ( d , m i n i b a t c h _ i n d i c e s ) f o r d i n d a t a ] i f l i s t _ d a t a \ e l s e m i n i b a t c h ( d a t a , m i n i b a t c h _ i n d i c e s ) d e f m i n i b a t c h ( d a t a , m i n i b a t c h _ i d x ) : i f t y p e ( d a t a ) i s n p . n d a r r a y : r e t u r n d a t a [ m i n i b a t c h _ i d x ] e l s e : r e s u l t = [ ] f o r i i n m i n i b a t c h _ i d x : r e s u l t . a p p e n d ( d a t a [ i ] ) r e t u r n r e s u l t t i c = t i m e . t i m e ( ) p a r a m s = t f . t r a i n a b l e _ v a r i a b l e s ( ) n u m _ p a r a m s = s u m ( m a p ( l a m b d a t : n p . p r o d ( t f . s h a p e ( t . v a l u e ( ) ) . e v a l ( ) ) , p a r a m s ) ) t o c = t i m e . t i m e ( ) l o g g i n g . i n f o ( " N u m b e r o f p a r a m s : % d ( r e t r e i v a l t o o k % f s e c s ) " % ( n u m _ p a r a m s , t o c - t i c ) ) q u e s t i o n s , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k , l a b e l s _ s t a r t , l a b e l s _ e n d , c o n t e x t W o r d s , t r a i n A n s w e r s , c o n t e x t P a r a g r a p h s = d a t a s e t s e l f . c o n t e x t W o r d s = c o n t e x t W o r d s f o r e p o c h i n r a n g e ( s e l f . F L A G S . e p o c h s ) : p r i n t ( " E p o c h % d o u t o f % d " % ( e p o c h + 1 , s e l f . F L A G S . e p o c h s ) ) i = 1 p r i n t ( s e l f . F L A G S . b a t c h _ s i z e ) f o r q _ m b , c _ m b , q _ m _ m b , c _ m _ m b , l a b e l s _ s t a r t _ m i n i b a t c h , l a b e l s _ e n d _ m i n i b a t c h i n g e t _ m i n i b a t c h e s ( [ q u e s t i o n s , c o n t e x t , q u e s t i o n _ m a s k , c o n t e x t _ m a s k , l a b e l s _ s t a r t , l a b e l s _ e n d ] , s e l f . F L A G S . b a t c h _ s i z e ) : i f ( l e n ( c o n t e x t ) - ( i - 1 ) * ( s e l f . F L A G S . b a t c h _ s i z e ) ) > = s e l f . F L A G S . b a t c h _ s i z e : i f i % 1 0 = = 0 : p r i n t ( ' M i n i b a t c h n u m b e r : ' , i ) i + = 1 i n p u t s _ m i n i b a t c h = ( q _ m b , c _ m b , q _ m _ m b , c _ m _ m b ) # p r i n t ( i n p u t s _ m i n i b a t c h ) s e l f . o p t i m i z e ( s e s s i o n , i n p u t s _ m i n i b a t c h , l a b e l s _ s t a r t _ m i n i b a t c h , l a b e l s _ e n d _ m i n i b a t c h ) i f i % 5 0 0 = = 0 : r e s u l t s _ p a t h = " / { : % Y % m % d _ % H % M % S } / " . f o r m a t ( d a t e t i m e . d a t e t i m e . n o w ( ) ) # r e s u l t s _ p a t h = " / n e w F i l e " # m o d e l _ p a t h = r e s u l t s _ p a t h + " m o d e l . w e i g h t s / "
# g l o b a l _ t r a i n _ d i r = " / t m p / c s 2 2 4 n - s q u a d - t r a i n " m o d e l _ p a t h = t r a i n _ d i r + r e s u l t s _ p a t h p r i n t ( " t r a i n _ d i r " , t r a i n _ d i r ) i f n o t o s . p a t h . e x i s t s ( m o d e l _ p a t h ) : o s . m a k e d i r s ( m o d e l _ p a t h ) l o g g i n g . g e t L o g g e r ( ) . i n f o ( " N e w b e s t s c o r e ! S a v i n g m o d e l i n % s " , m o d e l _ p a t h ) # s e l f . s a v e r . s a v e ( s e s s i o n , m o d e l _ p a t h ) s e l f . s a v e r . s a v e ( s e s s i o n , m o d e l _ p a t h + " m o d e l . c k p t " ) r e s u l t s _ p a t h = " / { : % Y % m % d _ % H % M % S } / " . f o r m a t ( d a t e t i m e . d a t e t i m e . n o w ( ) ) # r e s u l t s _ p a t h = " / n e w F i l e " # m o d e l _ p a t h = r e s u l t s _ p a t h + " m o d e l . w e i g h t s / " m o d e l _ p a t h = t r a i n _ d i r + r e s u l t s _ p a t h i f n o t o s . p a t h . e x i s t s ( m o d e l _ p a t h ) : o s . m a k e d i r s ( m o d e l _ p a t h ) l o g g i n g . g e t L o g g e r ( ) . i n f o ( " N e w b e s t s c o r e ! S a v i n g m o d e l i n % s " , m o d e l _ p a t h ) # s e l f . s a v e r . s a v e ( s e s s i o n , m o d e l _ p a t h ) p r i n t ( " T H I S I S H E R E " , m o d e l _ p a t h + " / m o d e l . c k p t " ) s e l f . s a v e r . s a v e ( s e s s i o n , m o d e l _ p a t h + " m o d e l . c k p t " ) s e l f . s t a r t V a l i d a t e ( s e s s i o n , v a l _ d a t a s e t ) q a _ a n s w e r . p y
f rom __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import io
import os
import json
import sys
import random
from os.path import join as pjoin
from tqdm import tqdm
import numpy as np
from six.moves import xrange
import tensorf low as tf
from qa_model import Encoder, QASystem, Decoder
from preprocessing.squad_preprocess import data_from_json, maybe_download, squad_base_url , \
invert_map, tokenize, token_idx_map
import qa_data
import logging
logging.basicConfig( level=logging.INFO)
FLAGS = tf .app.f lags.FLAGS
tf .app.f lags.DEFINE_float(" learning_rate", 0.001, "Learning rate.")
t f .app.f lags.DEFINE_float("dropout", 0.15, "Fraction of units randomly dropped on non-recurrent connections.")
t f .app.f lags.DEFINE_integer("batch_size", 32, "Batch size to use during training.")
t f .app.f lags.DEFINE_integer("epochs", 0, "Number of epochs to train.")
t f .app.f lags.DEFINE_integer("state_size", 124, "Size of each model layer.")
t f .app.f lags.DEFINE_integer("embedding_size", 100, "Size of the pretrained vocabulary.")
t f .app.f lags.DEFINE_integer("output_size", 400, "The output size of your model.")
t f .app.f lags.DEFINE_integer("keep", 0, "How many checkpoints to keep, 0 indicates keep al l .")
#tf .app.f lags.DEFINE_string("train_dir" , "train/20170316_002002/model.weights", "Training directory (default : . / train) .")
# tf .app.f lags.DEFINE_str ing("train_dir" , "train", "Training directory (default : . / train) .")
#tf .app.f lags.DEFINE_string("train_dir" , "train/20170318_192824/model.weights", "Training directory (default : . / train) .")
#tf .app.f lags.DEFINE_string("train_dir" , "train /20170319_154527/model.weights", "Training directory (default : . / train) .")
t f .app.f lags.DEFINE_string("log_dir" , " log", "Path to store log and f lag f i les (default : . / log)")
t f .app.f lags.DEFINE_string("vocab_path", "data/squad/vocab.dat", "Path to vocab f i le (default : . /data/squad/vocab.dat)")
t f .app.f lags.DEFINE_string("embed_path", "" , "Path to the tr immed GLoVe embedding (default : . /data/squad/glove.tr immed.{embedding_size}.npz)")
t f .app.f lags.DEFINE_string("dev_path", "data/squad/dev-v1.1. json", "Path to the JSON dev set to evaluate against (default : . /data/squad/dev-v1.1. json)")
t f .app.f lags.DEFINE_float("max_gradient_norm", 20.0, "Cl ip gradients to this norm.")
t f .app.f lags.DEFINE_string("optimizer", "adam", "adam / sgd")
def init ial ize_model(session, model, train_dir) :
ckpt = tf .train.get_checkpoint_state(train_dir)
v2_path = ckpt.model_checkpoint_path + " . index" i f ckpt else ""
#print( 'checkpoint path' , ckpt.model_checkpoint_path)
#print( ' f i rst bool ' , t f .gf i le.Exists(v2_path))
#print( 'second bool ' , t f .gf i le.Exists(ckpt.model_checkpoint_path))
i f ckpt and (tf .gf i le.Exists(ckpt.model_checkpoint_path) or tf .gf i le.Exists(v2_path)) :
logging. info("Reading model parameters from %s" % ckpt.model_checkpoint_path)
model.saver.restore(session, ckpt.model_checkpoint_path)
else:
logging. info("Created model with fresh parameters.")
session.run(tf .global_variables_init ial izer() )
logging. info( 'Num params: %d' % sum(v.get_shape() .num_elements() for v in tf .trainable_variables()) )
return model
def init ial ize_vocab(vocab_path):
i f t f .gf i le.Exists(vocab_path):
rev_vocab = [ ]
with tf .gf i le.GFile(vocab_path, mode="rb") as f :
rev_vocab.extend(f .readlines())
rev_vocab = [ l ine.str ip( ' \n ' ) for l ine in rev_vocab]
vocab = dict([ (x, y) for (y, x) in enumerate(rev_vocab)])
return vocab, rev_vocab
else:
raise ValueError("Vocabulary f i le %s not found.", vocab_path)
def read_dataset(dataset, t ier, vocab):
"""Reads the dataset, extracts context, question, answer,
and answer pointer in their own f i le. Returns the number
of questions and answers processed for the dataset"""
context_data = [ ]
query_data = [ ]
question_uuid_data = [ ]
for art icles_id in tqdm(range(len(dataset[ 'data'] ) ) , desc="Preprocessing {}" . format(t ier)) :
art icle_paragraphs = dataset[ 'data'] [art icles_id][ 'paragraphs']
for pid in range(len(art icle_paragraphs)) :
context = art icle_paragraphs[pid][ 'context ' ]
# The fol lowing replacements are suggested in the paper
# BidAF (Seo et al . , 2016)
context = context.replace(" ' '" , '" ' )
context = context.replace("` `" , '" ' )
context_tokens = tokenize(context)
qas = art icle_paragraphs[pid][ 'qas']
for qid in range(len(qas)) :
question = qas[qid][ 'question']
question_tokens = tokenize(question)
question_uuid = qas[qid][ ' id ' ]
context_ids = [str(vocab.get(w, qa_data.UNK_ID)) for w in context_tokens]
qustion_ids = [str(vocab.get(w, qa_data.UNK_ID)) for w in question_tokens]
context_data.append( ' ' . join(context_ids))
query_data.append( ' ' . join(qustion_ids))
question_uuid_data.append(question_uuid)
return context_data, query_data, question_uuid_data
def prepare_dev(prefix, dev_f i lename, vocab):
# Don't check f i le size, since we could be using other datasets
dev_dataset = maybe_download(squad_base_url , dev_f i lename, prefix)
dev_data = data_from_json(os.path. join(prefix, dev_f i lename))
context_data, question_data, question_uuid_data = read_dataset(dev_data, 'dev' , vocab)
return context_data, question_data, question_uuid_data
def generate_answers(sess, model, dataset, rev_vocab):
"""
Loop over the dev or test dataset and generate answer.
Note: output format must be answers[uuid] = "real answer"
You must provide a str ing of words instead of just a l ist , or start and end index
In main() function we are dumping onto a JSON f i le
evaluate.py wil l take the output JSON along with the original JSON f i le
and output a F1 and EM
You must implement this function in order to submit to Leaderboard.
:param sess: active TF session
:param model: a bui lt QASystem model
:param rev_vocab: this is a l ist of vocabulary that maps index to actual words
: return:
"""
question_data, context_data, question_masks, context_masks, question_uuid_data = dataset
predict_dataset = (question_data, context_data, question_masks, context_masks)
a_s, a_e = model.predict_answers(sess, predict_dataset)
answers = (context_data[i ] [s:e + 1] for i , (s, e) in enumerate(zip(a_s, a_e)))
answers = [" " . join(rev_vocab[x] for x in answer) for answer in answers]
answers = [answers[i ] i f i < len(answers) else "overf low" for i in range(len(question_data))]
answers_dict = { }
for a, uuid in zip(answers, question_uuid_data):
answers_dict[uuid] = a
return answers_dict
def get_normalized_train_dir(train_dir) :
"""
Adds symlink to {train_dir} from /tmp/cs224n-squad-train to canonicalize the
f i le paths saved in the checkpoint. This al lows the model to be reloaded even
i f the location of the checkpoint f i les has moved, al lowing usage with CodaLab.
This must be done on both train.py and qa_answer.py in order to work.
"""
global_train_dir = ' / tmp/cs224n-squad-train'
# global_train_dir = ' . / train/20170316_002002/model.weights'
i f os.path. lexists(global_train_dir) :
pr int( ' lexists' )
os.unl ink(global_train_dir)
i f not os.path.exists(train_dir) :
pr int( 'path does not exist ' )
os.makedirs(train_dir)
pr int( 'path: ' , os.path.abspath(train_dir) )
pr int( 'global train dir : ' , global_train_dir)
os.symlink(os.path.abspath(train_dir) , global_train_dir)
return global_train_dir
def main(_) :
def padded_vector(vector, max_length):
return [vector[ i ] i f i < len(vector) else vocab[ '<pad>'] for i in range(max_length)]
vocab, rev_vocab = init ial ize_vocab(FLAGS.vocab_path)
embed_path = FLAGS.embed_path or pjoin("data", "squad", "glove.tr immed.{} .npz".format(FLAGS.embedding_size))
i f not os.path.exists(FLAGS.log_dir) :
os.makedirs(FLAGS.log_dir)
f i le_handler = logging.Fi leHandler(pjoin(FLAGS.log_dir , " log.txt"))
logging.getLogger() .addHandler(f i le_handler)
pr int(vars(FLAGS))
with open(os.path. join(FLAGS.log_dir , "f lags. json") , 'w') as fout:
json.dump(FLAGS.__f lags, fout)
# ========= Load Dataset =========
# You can change this code to load dataset in your own way
max_question_length = 60
max_context_length = FLAGS.output_size
dev_dirname = os.path.dirname(os.path.abspath(FLAGS.dev_path))
dev_f i lename = os.path.basename(FLAGS.dev_path)
context_data, question_data, question_uuid_data = prepare_dev(dev_dirname, dev_f i lename, vocab)
context_data = [ [ int(n) for n in c.spl it ( ) ] for c in context_data]
question_data = [ [ int(n) for n in q.spl it ( ) ] for q in question_data]
context_masks = [ len(c) for c in context_data]
question_masks = [ len(q) for q in question_data]
context_data = [padded_vector(c, max_context_length) for c in context_data]
question_data = [padded_vector(q, max_question_length) for q in question_data]
#print( 'question uuids' , question_uuid_data)
#print( 'context data' , context_data)
#print( 'dirname, f i lename: ' , dev_dirname, dev_f i lename)
dataset = (question_data, context_data, question_masks, context_masks, question_uuid_data)
# ========= Model-specif ic =========
# You must change the fol lowing code to adjust to your model
question_encoder = Encoder(size=FLAGS.state_size)
context_encoder = Encoder(size=FLAGS.state_size)
decoder = Decoder(output_size=FLAGS.output_size)
qa = QASystem(question_encoder, context_encoder, decoder, embed_path, question_masks, context_masks)
with tf .Session() as sess:
train_dir = get_normalized_train_dir( ' train' )
#train_dir = ' . / train/20170321_110723'
init ial ize_model(sess, qa, train_dir)
answers = generate_answers(sess, qa, dataset, rev_vocab)
# write to json f i le to root dir
with io.open( 'dev-prediction. json' , 'w' , encoding='utf -8 ' ) as f :
f .write(unicode(json.dumps(answers, ensure_ascii=False)))
i f __name__ == "__main__":
t f .app.run()
Reading Comprehension on SQuADStanford University (CS224N)
Meghana Rao, Brexton Pham, Zach TaylorMarch 2017
Background
What are we looking at? Question AnsweringPrevious approaches
- Multiperspective Context Matching (MPCM) Model
- Dynamic Coattention Network (DCN)
- Match-LSTM with Answer Pointer
DatasetsFor our dataset, we used the Stanford
Question Answering Dataset (SQuAD),
which contains context paragraphs that
were extracted from a set of articles from
Wikipedia. Humans then generated
questions using that paragraph as a context,
and selected a span from the same
paragraph as the target answer.SQuAD is comprised of 100K question-answer pairs, along with a context paragraph. The answers are of variable lengths and require various forms of multi-sentence reasoning. The span of answers are accessible in a reference document.
Methods/Algorithms/Models1: A simple baseline in which we fed both the
question and the context through an LSTM, and
then for each hidden state in our context LSTM,
we ran a simple dot product mechanism against
the final question state to compute the probability
that any index was the start or end index.
2: A model utilizing bidrectional LSTM’s along
with a question-aware context encoding. This
model also had a simple attention mechanism, and
we decoded the representations linearly.
3: A dynamic coattention model with an LSTM
classifier as the decoder
Experimental Evaluation and Findings
Future DirectionsFor possible future directions, an obvious area of improvement would be changing the model to be able to predicted longer-than-one-word answers. This would involve careful experimentation and modification of the one model that we managed to get some meaningful performance out of. We could then graduate to implementing more advanced attention mechanisms. Potentially, we could learn what about our more advanced models didn’t manage to translate to the evaluation set, and fix that element. In terms of other advanced models that we are interested in , answer pointers seem like a potentially valuable next step to take if we got the other elements working.
Problem StatementIn the SQuAD task, the goal is to predict an
answer span tuple {as, ae} given a question
of length n, q = {q1, q2, ..., qn}, and a
supporting context paragraph p = {p1, p2, ...,
pm} of length m. Thus, the model learns a
function that, given a pair of sequences (q,
p) returns a sequence of two scalar indices
{as, ae} indicating the start position and end
position of the answer in paragraph p,
respectively.
Scenarios to Consider1. Overfitting and underfitting
2. Accounting for unknown words in
vocab
3. Amount of parameters and
hyperparameter tuning
ConclusionAfter completing this project, we have gained
perspective into the challenges involved with
implementing a successful neural network for the
SQuAD reading comprehension task. We
ultimately demonstrated a functioning baseline
model after investing into implementing more
complex models that had long training durations
and overfit the training data. Initial successes on
the training data with more complex models did
not perform well on the validation set and thus
catalyzed a series of simplifications and
debugging procedures to scale back the model to
a slender and iterable scale. In retrospect, we wish
to have implemented the barebones model first
before attempting to implement more ambitious
models. Nevertheless, we at least managed to get
some positive performance eventually, and we
learned an incredible amount about implementing
neural network models for NLP. Also, we thought
that even 13% of correct answers was a worthy
accomplishment on such a difficult machine
comprehension task.
Predicted Answer
Correct Answer
hunting no natural predators
claims religious bigotry
March Bonus March
1639 1639
Catholic true apostolic succession
Toronto Toronto
Model F1 Score
EM Score
Separate Encodings
24.25 14.01
FINAL TEST SET RESULTS (on codalab):
Visualization
Question: Where is the Center for New Religions located ?
Context Paragraph: Robert S. Wood has argued that the
United States is a model for the world in terms of how a
separation of church and state—no state-run or
state-established church—is good for both the church and the
state , allowing a variety of religions to flourish . Speaking at
the Toronto-based Center for New Religions , Wood said that
the freedom of conscience and assembly allowed under such
a system has led to a " remarkable religiosity " in the United
States that is n't present in other industrialized nations .
Wood believes that the U.S. operates on " a sort of civic
religion , " which includes a generally-shared belief in a
creator who " expects better of us . " Beyond that , individuals
are free to decide how they want to believe and fill in their
own creeds and express their conscience . He calls this
approach the " genius of religious sentiment in the United
States .
"Predicted Answer, Correct Answer: Toronto, Toronto