Programming Details
The text is generated by a 3-gram model as described in chapter 6 of "Foundations of Statistical Natural Language Processing".
It took me a while to realize that these three things are all more or less the same:
- n-gram model
- Markov model (a 'visible' Markov model, where you know all the states)
- nondeterministic finite state automaton
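To make the 3-gram generation concrete, here is a minimal sketch in Python (not the code behind the site; the toy corpus and names are just for illustration). It also shows why the three views above coincide: each pair of preceding words is a state, and sampling the next word is a weighted transition to the next state.

```python
import random
from collections import defaultdict

def build_trigram_model(tokens):
    """Count how often each word follows each pair of preceding words."""
    counts = defaultdict(lambda: defaultdict(int))
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        counts[(w1, w2)][w3] += 1
    return counts

def generate(counts, w1, w2, length=20):
    """Walk the model like a Markov chain: the next word depends only on
    the previous two words (the current state)."""
    out = [w1, w2]
    for _ in range(length):
        followers = counts.get((w1, w2))
        if not followers:
            break  # dead end: this state has no outgoing transitions
        words = list(followers)
        weights = [followers[w] for w in words]
        w1, w2 = w2, random.choices(words, weights=weights)[0]
        out.append(w2)
    return " ".join(out)

if __name__ == "__main__":
    corpus = ("the quick brown fox jumps over the lazy dog . "
              "the quick brown dog sleeps .").split()
    model = build_trigram_model(corpus)
    print(generate(model, "the", "quick"))
```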
The text is guided toward certain topics (for the debate and the picture captions) by increasing the probability of words related to a few key words, which amounts to activating a part of WordNet.
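A rough sketch of that guiding step, assuming the boost is a simple multiplicative reweighting of the trigram follower counts (the actual weighting scheme isn't spelled out above), with a hand-made topic set standing in for words gathered from WordNet:

```python
def bias_distribution(followers, topic_words, boost=3.0):
    """Multiply the counts of on-topic follower words by `boost`.

    `followers` maps candidate next words to their trigram counts;
    `topic_words` is the set of words related to the key words (in the
    real system it would be gathered from WordNet - here it is a
    hand-made stand-in)."""
    return {w: c * boost if w in topic_words else c
            for w, c in followers.items()}

followers = {"dog": 4, "fox": 2, "election": 1}
topic_words = {"election", "vote", "president"}  # pretend these came from WordNet
print(bias_distribution(followers, topic_words))
# {'dog': 4, 'fox': 2, 'election': 3.0}
```

The biased weights would then be used in place of the raw counts when sampling the next word, so on-topic words surface more often without breaking the local grammar of the trigrams.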
The language model was built with the CMU-Cambridge Statistical Language Modeling Toolkit on data from Project Gutenberg and whitehouse.gov. I used the JWNL interface to WordNet.
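For reference, building such a model with the toolkit usually follows its standard documented pipeline, roughly as below; the exact options, cutoffs, and smoothing used for this project aren't given, and the filenames are placeholders.

```python
import subprocess

# Typical CMU-Cambridge SLM toolkit pipeline (a sketch, not the exact
# commands used here): word frequencies -> vocabulary -> id 3-grams ->
# ARPA-format language model.
steps = [
    "text2wfreq < corpus.txt > corpus.wfreq",
    "wfreq2vocab < corpus.wfreq > corpus.vocab",
    "text2idngram -vocab corpus.vocab < corpus.txt > corpus.idngram",
    "idngram2lm -idngram corpus.idngram -vocab corpus.vocab -arpa corpus.arpa",
]
for step in steps:
    subprocess.run(step, shell=True, check=True)
```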
A list of resources for Computer Generated Writing.