This post first appeared on pygaze.org, in March 2016.
Sigmund Freud is back! He has returned in the form of a Twitter bot that replies when someone uses the hashtag #askFreud in a tweet. Not unlike the real Freud, Sigbot produces nonsensical but real-looking text, which it generates using a Markov chain. The bot can recognise and respond to specific keywords, and it can speak both German and English.
What does it say?
Sigbot Freud replies to every tweet that has the #askFreud hashtag. The bot can pick up on keywords that relate to psychoanalysis, and it will reply in Freud’s writing style. You can try it for yourself by posting a tweet with the #askFreud hashtag, or have a look at Sigbot’s timeline (@SigbotFreud).
As you might well have heard, Sigmund Freud (1856 – 1939) was a famous (and infamous) psychologist. He is the founder of psychoanalysis, which a lot of people consider to be the basis of our current psychological therapy. Nowadays, you would be hard pressed to find a psychologist who actually believes in Freud’s theories, but that does not mean his legacy is worthless. Freud introduced or popularised several key psychological concepts, such as talk-based therapy, the unconscious, and childhood trauma (or regular development). In addition, he was an important voice against (theistic) religion, and against overly prudish and restrictive societies. In sum, although Freud’s ideas are highly controversial, he did have an undeniable impact on philosophy and psychology.
How does it work?
Thanks to the efforts of Project Gutenberg, a lot of Sigmund Freud’s books are available online, for free (copyrights on his works have expired). Sigbot has read a lot of these books to learn about how Freud would phrase things. The bot was specifically interested in the superficial statistics of Freud’s texts, and asked the question: How often do words occur in each other’s vicinity?
After learning about the statistics of Freud’s writing, Sigbot uses a Markov chain to generate random text that follows those statistics. The principle is as follows: If you feed Sigbot two words, it will check what other words in Freud’s writing are likely to follow your two words. The bot will then randomly choose a likely match (with more probable matches being selected more often). You now have three words: the two original ones, and the one generated by the bot. The bot will use the last two words (one original and one bot-generated) to generate a fourth word. It will then go on to generate a fifth word, again based on the last two words. The cool thing is that the bot will continue to generate words until you are satisfied with the length of the produced text.
The above theory might be a bit confusing, so here is an example. In the sentences “My mother works on Mondays” and “My mother cycles around town”, the words “My mother” are followed by “works” and “cycles”. A Markov-based bot that is learning about these sentences will notice and remember the co-occurrence of “My mother” and “cycles” / “works”. In other words, the bot learned about the statistics of the two sentences.
After the bot has learned about the sentences, you can feed it the words “My mother”. The bot will produce either “cycles” or “works”, because these were the words that co-occurred with “My mother” in the two training sentences. Let’s say the bot chooses “cycles”. It can then go on by itself, using the words “mother cycles” to generate another word. In the two sentences that the bot learned about, the only word that could follow “mother cycles” was “around”, which means that the bot can only generate that word.
The current sentence is “My mother cycles around”, and the bot can again use the last two words to generate a word that is likely to follow “cycles around”. In this case that word would be “town”, as this is the only word in the two learned sentences that co-occurs with “cycles around”. Nothing follows “around town” in the learned sentences, so this is where the example bot has to stop.
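The example bot can be sketched in a few lines of Python. This is only a minimal illustration of the principle, not Sigbot’s actual code: the function names `train` and `generate` are made up for this example, and splitting sentences on whitespace is a simplifying assumption.

```python
import random
from collections import defaultdict


def train(sentences):
    """Map every pair of consecutive words to the words seen right after it."""
    model = defaultdict(list)
    for sentence in sentences:
        words = sentence.split()
        for first, second, third in zip(words, words[1:], words[2:]):
            # A follower that occurs several times is stored several times,
            # so random.choice will pick frequent followers more often.
            model[(first, second)].append(third)
    return model


def generate(model, first, second, max_words=20, rng=random):
    """Extend a two-word seed until max_words, or until no follower is known."""
    words = [first, second]
    while len(words) < max_words:
        followers = model.get((words[-2], words[-1]))
        if not followers:
            break
        words.append(rng.choice(followers))
    return " ".join(words)


sentences = ["My mother works on Mondays", "My mother cycles around town"]
model = train(sentences)

print(model[("My", "mother")])              # ['works', 'cycles']
print(generate(model, "mother", "cycles"))  # mother cycles around town
print(generate(model, "My", "mother"))      # one of the two training sentences
```

With only two training sentences, the bot can do little more than repeat what it has read. The more text it learns from, the more often a word pair has several possible followers, and the more varied the output becomes.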
The example bot has only learned about two sentences, so it will only be able to produce a very limited amount of text. The Sigbot, on the other hand, had 6 English and 21(!) German books to learn from. This means it can produce an incredibly large number of different texts!
To make Sigbot more interactive, I calculated word frequencies in Freud’s books. The result is a very long list of words that occur very often. Some of these are obvious, such as ‘and’, ‘or’, and ‘the’, but I have filtered those out (that was the only manual labour; if you know of a way to automate it, I would love to hear about it!). What remains after filtering out the boring words is a large selection of hundreds of keywords that relate to Freud’s work. If you use these keywords in tweets with #askFreud, Sigbot will recognise them, and it will use them to generate its response to your tweets.
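Counting words and spotting keywords in a tweet could look roughly like the sketch below. It is not the code behind Sigbot: the tiny `STOP_WORDS` set, the example text, the “at least twice” threshold, and the function names are all assumptions made for illustration. A ready-made stop-word list is one possible way to automate the filtering, although the result would still need a check by hand.

```python
import re
from collections import Counter

# Hypothetical and deliberately tiny stop-word list; a real one is much longer.
STOP_WORDS = {"a", "and", "in", "is", "of", "or", "the", "to"}


def keyword_counts(text, stop_words=STOP_WORDS):
    """Count how often each word occurs, skipping the boring stop words."""
    words = re.findall(r"[a-zäöüß]+", text.lower())
    return Counter(word for word in words if word not in stop_words)


def find_keywords(tweet, keywords):
    """Return the known keywords in a tweet, in order of appearance."""
    words = re.findall(r"[a-zäöüß]+", tweet.lower())
    return [word for word in words if word in keywords]


# Made-up stand-in for the contents of a book.
text = "The dream is the fulfilment of a wish, and the wish is hidden in the dream."
counts = keyword_counts(text)

# Assumed rule for this example: a keyword occurs at least twice.
keywords = {word for word, n in counts.items() if n >= 2}

print(find_keywords("What does my dream mean? #askFreud", keywords))  # ['dream']
```

A keyword found this way could then serve as one of the seed words for the Markov chain, so that the reply relates to the question.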
Is Sigbot Intelligent?
After hearing about Sigbot’s efforts, you might think that it is quite a clever bot. After all, it managed to read books, it can find and remember patterns in language, and it can use its knowledge to talk. It can even reply to the specific things that people ask. However, this does not mean Sigbot is intelligent! The bot has no idea what it is doing, it does not understand language, and it does not understand your questions. Everything Sigbot does is purely probabilistic.
In Freud’s writing, some words occur more frequently in combination with certain other words. Sigbot simply uses those frequencies to produce text that matches the word-combination frequencies in Freud’s work. That’s it. Unlike the bot in the relevant xkcd comic, Sigbot won’t become sentient, and it won’t try to harm you.
UPDATE (3 March 2016, 00:36): Having said all that stuff about Sigbot not becoming any cleverer, I was a bit surprised to see the tweet below. Sigbot seems to be trying to sell someone an eBook. Maybe it is becoming sentient!
@van_Vulpen Dream about the Mission of Project Gutenberg’s Reflections on War and Death, by Sigmund Freud This eBook is for rest.
— Sigmund Freud (@SigbotFreud) March 2, 2016
Freud wrote in German, and his books have been translated into English by others. Sigbot is completely agnostic to language, and has learned about the books in each language separately. This means it can produce both German and English responses to your questions. Twitter can automatically detect the language of your tweets, and Sigbot will use this information to select the right language.
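Selecting the language could be as simple as the sketch below. It assumes that the detected language arrives as a short code such as `en` or `de` (tweets carry this in a `lang` field), and that English is used whenever the tweet is in any other language. The function name `pick_language` and the English fallback are assumptions for this example, not a description of Sigbot’s code.

```python
def pick_language(tweet_lang, available=("en", "de"), default="en"):
    """Choose which of the separately trained models should write the reply."""
    # Reduce a code like "de-AT" to "de"; treat a missing code as unknown.
    code = (tweet_lang or "").lower().split("-")[0]
    # Assumed fallback: reply in English if no model exists for the language.
    return code if code in available else default


print(pick_language("de"))   # de
print(pick_language("nl"))   # en
print(pick_language(None))   # en
```

Because the German and English books were learned separately, the chosen code only has to point at the matching Markov chain; the text generation itself is the same for both.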