It's been a while since my last post. I took some time off to really think about the project and figure out how the heck I would actually build it (it was a bit more ambitious than I had thought). I've now finished implementing the training algorithm and model construction. My next steps are to train various models and play around with them. I will also need to do more testing of checkpoint saving and restoration.
This post, like most of my others, will probably only be of interest to people just starting out in deep learning. I am very new to it myself, and am still just 'learning the ropes'. This post will take my readers through my triumphs and failures (emphasis on the latter).
Unlike machine translation, conversational modelling has continuous source-target pairs: what we say several time steps from now is partially determined by what we are talking about now. When translating, by contrast, one sentence doesn't depend on the previous ones, though I suppose they could provide context. If we treated conversation the same way, it would be like talking to someone with a short-term memory of a couple of seconds. This changes a few things. As far as I can tell, because conversations are continuous, order will matter in training, which means I shouldn't be randomly shuffling the data rows. It also means that the hidden state should not be cleared until n source-target pairs have been passed through. I will probably run multiple tests to determine a good value for n. Ideally I would want n to be infinite, but practicality dictates that I pick a smaller, finite value.
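To make the "keep order, reset every n pairs" idea concrete, here is a minimal Python sketch. It is not the project's Torch code: `iterate_with_carried_state`, `toy_step` and the sample conversation are all made up for illustration, and the "hidden state" is just a counter standing in for whatever a real recurrent model would carry forward.

```python
def iterate_with_carried_state(pairs, n, step):
    """Feed source-target pairs in their original order (no shuffling),
    carrying state across pairs and clearing it every n pairs."""
    state = None
    for index, (source, target) in enumerate(pairs):
        if index % n == 0:
            state = None  # forget everything from the previous window
        state = step(source, target, state)
        yield state


def toy_step(source, target, state):
    """Hypothetical stand-in for one training step.

    A real step would run the encoder/decoder and return the new hidden
    state; here the state only counts pairs seen since the last reset.
    """
    return 1 if state is None else state + 1


# Invented example conversation, kept in the order it was spoken.
conversation = [
    ("hi", "hello"),
    ("how are you", "fine"),
    ("and you", "good"),
    ("bye", "see you"),
    ("wait", "yes?"),
]

turns_remembered = list(iterate_with_carried_state(conversation, n=2, step=toy_step))
```

With n=2 the counter never grows past two, which is the "short-term memory" trade-off described above: a larger n remembers more of the conversation but costs more to train.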
A second consideration is that I am now using an embedding layer in the encoder that is learned during training. Originally I was planning on pre-initializing it with word2vec representations. In hindsight, I'm glad I chose to use a learned embedding layer, for a few reasons. First off, it's conceptually easier and less work to just 'throw' in an input embedding layer. Secondly, it allows program-specific embeddings to be learned.
What I mean by program-specific embeddings is this: suppose the chatbot is created to help with IT support. In that context you may want words like 'mouse' and 'usb' to be relatively close in cosine distance. Embeddings trained on a more general corpus, however, may put those words somewhat farther apart, depending on the corpus they were trained on.
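As a rough illustration of what "close in cosine distance" means, the sketch below compares two invented sets of 3-dimensional vectors. The numbers are made up purely for the example and do not come from word2vec or from any trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: in the "general" space the two words point in
# different directions, in the "IT support" space they nearly line up.
general = {"mouse": [0.9, 0.1, 0.0], "usb": [0.1, 0.9, 0.2]}
it_support = {"mouse": [0.7, 0.6, 0.1], "usb": [0.6, 0.7, 0.2]}

general_similarity = cosine_similarity(general["mouse"], general["usb"])
it_similarity = cosine_similarity(it_support["mouse"], it_support["usb"])
```

Cosine distance is usually taken as one minus this similarity, so the higher similarity in the IT-support vectors corresponds to the smaller distance.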
Redoing Most of it
What tripped me up a few months ago when I began this project was simply figuring out how to build the computation graph in code. I had of course looked through other GitHub projects and related articles/papers to try to get a grasp of it, but nothing was clicking. Eventually I noticed a new Torch library by Element Research. After all the work I had done before to try and 'make it work', I was hesitant to redo it all. In the end, sanity won out and I went with the newer, cleaner library. In doing so I feel like I gained a much broader understanding of how these directed acyclic graphs can actually be built in code.
In all honesty, redoing the project did not take me as long as I thought it would. I would not hesitate to redo similar projects in the future if I discover easier ways to do them that significantly improve understanding and reduce development time.
Road Map for What's Next
I still have some work to do on this project, most notably:
- Finish GPU computation ability (without which training models is absurdly difficult)
- Train some more complex models
- Finish the user interface to actually get user readable results
- Add some sort of script that lets anyone run the beast with one click (including a data downloader)
My next post related to this project will be a final summary of the model, some parameters that work, and most importantly text results.
Stay Tuned - Sentiment Analysis
On another note, I've built a sentiment analyzer using TensorFlow. I was unable to find one (done with TensorFlow using deep neural nets) on GitHub, so I figured it would be a good project. It took considerably less time than the seq2seq chatbot for various reasons, including but not limited to:
- Python has a library for everything
- It was a much easier graph to build (just an embedding layer to LSTM to logistic regression output)
- Just getting better with experience
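To show how small the embedding to LSTM to logistic regression graph is, here is a forward-pass-only sketch in plain Python. It is not the TensorFlow code from the project: `TinySentimentNet`, its sizes and its random weights are all invented for illustration, gate biases are left out for brevity, and nothing here is trained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinySentimentNet:
    """Hypothetical untrained model: embedding -> LSTM -> logistic output."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.embedding = rand_matrix(vocab_size, embed_dim, rng)
        # One weight matrix per gate, applied to the concatenation [x; h].
        self.gates = {
            name: rand_matrix(hidden_dim, embed_dim + hidden_dim, rng)
            for name in ("input", "forget", "output", "candidate")
        }
        self.out_weights = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.out_bias = 0.0

    def forward(self, token_ids):
        """Return a probability-like score in (0, 1) for a token sequence."""
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # cell state
        for token in token_ids:
            xh = self.embedding[token] + h  # list concatenation: [x; h]
            i = [sigmoid(v) for v in matvec(self.gates["input"], xh)]
            f = [sigmoid(v) for v in matvec(self.gates["forget"], xh)]
            o = [sigmoid(v) for v in matvec(self.gates["output"], xh)]
            g = [math.tanh(v) for v in matvec(self.gates["candidate"], xh)]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        # Logistic regression on the final hidden state.
        logit = sum(w * hv for w, hv in zip(self.out_weights, h)) + self.out_bias
        return sigmoid(logit)


net = TinySentimentNet(vocab_size=10, embed_dim=4, hidden_dim=3)
score = net.forward([1, 5, 2])  # token ids are arbitrary placeholders
```

Because the weights are random, the score is meaningless until the model is trained. The point is only that the whole graph is a lookup, a loop over time steps and one sigmoid at the end.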
I will probably just make one post for it highlighting how it was built, going through my successes and failures in building it. Of course, all this will be done when I get caught up on the mountain of school work I've been neglecting.
Thanks for reading,