2019: A.I. Dungeon
Update: Find out more about the 50 Years of Text Games book and the revised final version of this article!
by Nick Walton
Launched: April 15, 2019
Platform: Web interface to GPT model
Adventurer@AIDungeon:~$ ./EnterDungeon You enter a dungeon with your trusty sword and shield. You are searching for the evil necromancer who killed your family. You've heard that he resides at the bottom of the dungeon, guarded by legions of the undead. You enter the first door and see....
On April 15th, 2019, history was burning. The Notre-Dame cathedral in Paris had caught fire that evening, causing growing damage to an institution symbolizing fixity and tradition in an uncertain world. The extent of the damage was at first unclear. By 9:40 that night, a Paris fire chief was announcing it was still possible the fire could claim the whole cathedral, making the destruction total. And at almost the same moment, eight time zones to the west in the American state of Utah, a college student released a project that would start its own kind of fire amidst the long-enduring institution of hand-crafted, human-authored text games. It’s still not clear how far the flames will spread.
The project was AI Dungeon, created by an undergrad at Brigham Young University named Nick Walton. Though he’d started college as a mechanical engineering major, Walton switched to computer science when he became fascinated by AI technologies like computer vision and deep learning. He interned at a handful of self-driving car startups and earned a reputation for throwing himself fully into any project he tackled, whether for work, school, or fun.
Then in spring 2019, he participated in a hackathon for which he’d had a great idea. He’d recently started playing Dungeons & Dragons with his brothers, and like many hackers before him thought it would be an interesting challenge to make a digital dungeon master. “One thing that’s so cool about Dungeons & Dragons is that you can do anything,” he recalled, “and the Dungeon Master can decide what happens as a result of that. You can be so creative compared to other games.” The freeform, improvisational nature of D&D was one of the hardest parts to capture in a digital form. But Walton had heard of a new piece of tech and was eager to set it loose on the problem: an AI text generation system called GPT-2.
You enter the first door and see a giant individual. He flails at you with a longsword that you attack with a special magic. Do you shoot partway through, or do you jump to the upper levels of the stairs, where there are many ghouls waiting? Throw a dungeon you can use as your hideout. Then jump over a passage that includes openings for other monsters.
0) You attack at the moment both small and great piles of corpses and gnomes.
1) You use the "hidden tunnel" in order to escape from the massive, updrafted cloak of hold gun to the false corpse of an innkeeper.
2) You tell the creature in front of you that you receive the retribution of your brothers and sisters.
3) You go through a passage with other ghouls, but discover that there are two other people there, worse that you think!
Which action do you choose? 3
You go through a passage with other ghouls, but discover that there are two other people there, worse that you think! Your encounters with the large ghouls are so bad that you end up joining them at the end of another battle. It's this much deadly chaos that makes dungeon crawls so dangerous, Meg to you.
0) You attack me with a slice of bread; you don't get a chance.
1) You use down the hand of the eighth skilled adventurer completing a specific quest, as well as completing a few of the dungeon's more difficult quests.
2) You tell Meg to enjoy the story of Alt-Ronin it gives you more random heroes and enemies to fight at case you are curious, but I cant give up on you for that chance yet!
3) You go tell them about the content, Meg to you.
Which action do you choose?
As the user selected from each option list, new text would scroll slowly in to mask the long delay while the server-side code came up with the next response; retro green-on-black text in a fixed-width font perhaps aimed to evoke an earlier era by way of apology. But this game was fundamentally different from nearly anything that had come earlier. “Generally, a game, even a procedurally-generated game, begins with an idea of what you can do and how exactly it expects things to play out,” wrote one explainer. “AI Dungeon is not one of those games.” None of the text except the starting seed was pre-written by a human author. It instead was assembled by an artificial intelligence system called GPT. In early versions like the one that generated the extract above, this was fairly obvious. But GPT would rapidly improve into a far more coherent text generation system.
The acronym stood for Generative Pre-Training, and the approach had first been announced in June 2018 by a curious non-profit called OpenAI. Originally founded by a consortium of researchers and Silicon Valley personalities (including Elon Musk and Peter Thiel), it had received an improbable billion dollars of seed money with the goal of creating a “positive human impact” by “advanc[ing] digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return.” GPT accordingly had a fully open first release, meaning all code and training data was free for anyone to download and use. An iteration on the cutting edge of text generation technology, at its heart it was quite simple: given a huge corpus of human-authored text to study, it could find patterns in structure, syntax, or content, and encode them as “parameters” that could predict new text that might plausibly carry on from a starter prompt. Given the starter “Would you prefer chicken or...”, GPT was likely to supply “beef?” or “fish?” and unlikely to suggest a non-food word, add more words to the sentence, or end it with something other than a question mark, because few of those behaviors had been observed in the corpus of human-authored texts that trained it. GPT-2, released in February 2019 shortly before the hackathon Walton took part in, used a model with 117 million learned parameters encoding proper textual behavior, trained on over forty gigabytes of text scraped from up-voted links shared on aggregator Reddit.
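The idea of predicting a plausible next word from patterns observed in a corpus can be illustrated at toy scale. The sketch below is not how GPT works internally (GPT is a large neural network, not a lookup table); it is a minimal bigram counter, written for this illustration, that only looks at the single preceding word. The four-sentence corpus and all function names are invented for the example.

```python
from collections import Counter, defaultdict

# Tiny invented corpus; GPT-2 learned from roughly forty gigabytes of text.
CORPUS = [
    "would you prefer chicken or beef ?",
    "would you prefer chicken or fish ?",
    "would you prefer tea or coffee ?",
    "do you want chicken or beef ?",
]


def train_bigrams(sentences):
    """Count, for every word, which words were seen directly after it."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def next_word_probabilities(follows, prompt):
    """Return {word: probability} for the word following the prompt's last word."""
    last = prompt.split()[-1]
    counts = follows[last]
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


model = train_bigrams(CORPUS)
probs = next_word_probabilities(model, "would you prefer chicken or")

# Print candidates from most to least likely.
for word, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{word}: {p:.2f}")
```

Because this toy only remembers one word of context, it also offers "coffee" after "chicken or", a mistake that a model conditioning on the whole prompt would be far less likely to make. Words never seen after "or" in the corpus get no probability at all.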
The results were passably entertaining, but still largely incoherent. The output could easily go off the rails, get into loops of endless repetition, or fail to maintain any kind of internal logic or sense. “As you can probably tell,” Walton wrote when he released his hackathon project that April, “there’s still a ways to go before AI will be your group’s dungeon master.” The game was still novel enough to attract a few thousand players and for Walton to consider commercializing or improving it, but first he had another summer internship to do at a self-driving car startup, as well as his degree to finish up: he was hoping to graduate by the end of the year.
Then in November, OpenAI released an improved version of GPT-2 with a tenfold improvement in the intelligence behind its pattern matching, now using 1.5 billion parameters. The extended version had been created at the same time as the original, but OpenAI had at first refused to release it as an “experiment in responsible disclosure,” citing fears it would be misused to “generate misleading news articles, impersonate others online, [or] automate the production of abusive or faked content to post on social media.” But by the end of the year, the experiment was apparently over. That summer the nonprofit had swapped chief executives, and was now run by Sam Altman, former head of trendy start-up accelerator Y Combinator. It quickly spawned a for-profit division and brokered another billion-dollar deal, this time with Microsoft: half in cash and half in the cloud computing credits so vital to big-data training. A year later the company would license GPT’s tech stack exclusively to Microsoft: while the public model would remain available, only the software giant would get access under the hood to iterate and improve on it. OpenAI “was supposed to benefit humanity,” wrote the MIT Technology Review when the deal was announced. “Now it’s simply benefiting one of the richest companies in the world.”
The full-sized version of GPT-2, months earlier deemed too dangerous to society to release, was released. Walton plugged it into his game and the improvements astonished him. The deeper training produced far more coherent and plausible text. He made a few changes to the game’s structure and interface, including fine-tuning its training using multiple-choice stories scraped from digital gamebook portal ChooseYourStory.com, and replacing the generated menu options with a freeform text field: the user could type their own narration (a “Story” input) or give a command (“Do”) like a classic text adventure. Except this adventure had no parser: no set patterns of inputs that were pre-programmed to be understandable. You could enter any command you wanted into AI Dungeon, and it would do its best to continue your story.
“You are Ragnar,” the new AI Dungeon’s human-authored seed for a fantasy story began, customizing the name and character type to the player’s selections: “a wizard living in the kingdom of Larion. You have a staff and a spellbook. You finish your long journey and finally arrive at the ruin you’ve been looking for. You have come here searching for a mystical spellbook of great power called the book of essence. You look around and see that it is indeed there, but...” The game then begins dreaming its own continuation:
...but unfortunately not as you had hoped. The book has been destroyed by fire or some other kind of heat source.
"I can't believe they would do this to me," you say. "It's like I'm just another item on their books."
[Do:] search for another book
> You search for another book.
The next day you go back to the ruins and find a small book with an old leather cover. It looks almost exactly like the one you were looking for, except this one is much older than the last one.
[Do:] read the book
> You read the book.
You open up the book and begin reading. As you do so, you notice something strange about it. It seems to be written in a language that no human could ever understand.
[Story:] Your mind begins vibrating with some strange, alien frequency.
Your body feels very odd as well. You feel weak and tired. In fact, you don't even know why you're doing what you're doing right now.
Your body feels different too. You feel stronger and more energetic than you ever have before. This must be what the book was talking about when it said you possessed magical powers.
[Do:] see if I can fly
> You see if you can fly.
You attempt to levitate yourself into the air. Unfortunately, you fail miserably and fall straight down. You try again and succeed this time.
"Wow! That worked!" you exclaim excitedly.
[Do:] use my magic to rebuild the library and regenerate the Book of Essence
> You use your magic to rebuild the library and regenerate the Book of Essence.
You immediately begin rebuilding the library. You make sure to keep all the books intact and don't destroy any of them. When you're finished, you put everything back where it belongs.
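The transcript above shows each “Do” command being echoed back as second-person narration before the model continues the story. A minimal sketch of that kind of rewrite follows; the article does not describe the game's actual rules, so the word table and the function name here are hypothetical and cover only the simple cases seen in the excerpts.

```python
# Hypothetical first-person to second-person swaps (not the game's real rule set).
PRONOUN_SWAPS = {
    "i": "you",
    "me": "you",
    "my": "your",
    "mine": "yours",
    "myself": "yourself",
    "am": "are",
}


def do_to_narration(command: str) -> str:
    """Turn a typed 'Do' command into a '> You ...' narration line."""
    words = command.strip().rstrip(".").split()
    # Swap whole words only; anything not in the table is kept unchanged.
    swapped = [PRONOUN_SWAPS.get(word.lower(), word) for word in words]
    return "> You " + " ".join(swapped) + "."


print(do_to_narration("see if I can fly"))
print(do_to_narration("use my magic to rebuild the library"))
```

A word-by-word swap like this breaks on punctuation attached to a pronoun and on verbs that need re-conjugating, which is one reason to treat it as an illustration rather than a reconstruction.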
Story after story, the new model impressed Walton with the way it could remember context, craft plausible sentences, and respond sensibly to player commands. The stories it produced were often wacky, like the one where he started a band with a group of skeletons, or another where he brought a tree to life and became its best friend. But for the first time they felt more like stories than random gibberish. He started seriously wondering whether a GPT-based game could become a commercial hit. Crunching to finish his last quarter of classes, he thought he’d throw up the improved version of the game online to see if he could spark a new wave of interest, originally billing this version as AI Dungeon 2.
This time, the fire spread.
The larger GPT-2 model took a serious amount of storage space and computing power to run, and users at first could only access it via a web interface called Google Colab, which allowed for connecting to cloud-based high performance computing clusters and interacting with them via Python code. But each time someone spun up a session, five gigabytes of model data had to be transferred from one remote server to another, and Google charged the host for the bandwidth: around 30 or 40 cents for each transfer. As the new AI Dungeon began to go viral, costs went through the roof. Walton remembers:
When the daily bill hit $2,000 per day, BYU’s Perception, Control, and Cognition Lab (PCCL) was kind enough to handle the charges. When the cost hit $7,000, they were fine with it. At $15,000, they started to get nervous. At $20,000, we all agreed we needed to do something. At $30,000, they prepared to pull the plug.
By the time all was said and done, the total bill had reached $50,000 in three days.
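Those figures imply a striking number of sessions. The back-of-envelope sketch below assumes, purely for illustration, that the entire $50,000 went to per-session transfer fees at the quoted 30 to 40 cents each; the real bill may have included other charges, so the results are rough bounds rather than reported numbers.

```python
# Figures taken from the text; amounts are in cents to avoid floating-point rounding.
TOTAL_BILL_CENTS = 50_000 * 100
COST_LOW_CENTS = 30
COST_HIGH_CENTS = 40
GIGABYTES_PER_SESSION = 5
DAYS = 3


def sessions_for(bill_cents: int, cents_per_transfer: int) -> int:
    """Whole number of transfers a bill could pay for at a given price."""
    return bill_cents // cents_per_transfer


# Assumption: every cent of the bill was a transfer fee.
fewest_sessions = sessions_for(TOTAL_BILL_CENTS, COST_HIGH_CENTS)
most_sessions = sessions_for(TOTAL_BILL_CENTS, COST_LOW_CENTS)

print(f"Sessions in {DAYS} days: {fewest_sessions:,} to {most_sessions:,}")
print(f"Data moved: {fewest_sessions * GIGABYTES_PER_SESSION:,} GB "
      f"to {most_sessions * GIGABYTES_PER_SESSION:,} GB")
```

Under that assumption the game was being started somewhere between 125,000 and about 167,000 times over the three days.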
Desperate not to bankrupt his department or faculty advisor, Walton pulled the game offline amidst a still-growing wave of popularity. Within days, he’d worked with fans to move the gigabytes of model data to a torrent which could be shared peer-to-peer so tech-savvy fans could run it on their own computers, and accelerated plans for a commercial version to offset some of the enormous costs. He founded a company called Latitude with his brother Alan, and began work on a mobile version of the game with in-app subscriptions.
In the meantime his experiment had become a sensation. In those first days, Walton recalls, “we hit the top of Hacker News, a few popular gamers posted video play throughs, and Twitter was full of screenshots of ML [machine learning] generated adventure. Within a week we had 100,000 players.” By February 2020, the number had passed a million.
Many noticed that AI Dungeon was not limited to telling stories about dungeons. While the model had been fine-tuned with text from interactive stories, its GPT core was perfectly capable of imitating any style of text found in its training set, which included millions of web pages on any topic imaginable. You might start off with a seed suggesting an interview with Mahatma Gandhi, and the text coming back would continue as if Gandhi was answering, maybe emulating his speaking style or even peppering in facts about his life that had become encoded into the model’s parameters, based on hundreds of examples of the real Gandhi’s speech scraped from the web. OpenAI had noted that “topics that are highly represented in the data (Brexit, Miley Cyrus, Lord of the Rings, and so on)” were especially likely to produce plausible answers. But GPT could riff on nearly anything ever discussed online.
The game went viral. Most AI in the 2010s was still being shared to make fun of how bad it was: NPCs without common sense, Bosch-like horrors of generative cat pics gone wrong. But here was an AI creator that, at times, seemed pretty damn good. People didn’t know what to make of it: some were dismissive, others amused, still others disturbed. “Imagine a book,” wrote one reviewer, “but instead of being a static object of ink on paper, it’s written while you wait, by a room full of unhinged authors hopped up on sherbet and gin.” Another described it as like doing “undergrad improv with a really terrible partner,” and even less charitably as “computer-assisted literary masturbation.” Other reviewers called it “a wild ride,” “messy yet occasionally flabbergasting,” and a glimpse into “an endless world of dreamlike storytelling.”
Like dreams, the outputs from AI Dungeon were often more interesting to the player who’d generated them than anyone else. The game presented an existential challenge to reviewers: what critique can be offered of a text game where everyone’s text is different? How do you review a story that wasn’t written by a human, with no artistic intention or conscious thought behind it: and why would you bother to try? What role, if any, was left for the critic? Most reviewers, whether articulating this challenge or not, fell back to simply posting lengthy transcripts of their own interactions with the game, reveling in or insulting the mix of absurd and plausible outputs, but mostly just letting the writing speak for itself. Unable to engage with either the writing process or the aesthetic effect, the only thing left to report back on was the words.
As the game’s surge of popularity began to attract a sustainable base of paying players, Latitude hired engineers to start improving their product’s core (and only) functionality. They refined the UI to help set player expectations, and added restrictions and assists to help the AI produce more consistent and plausible stories. New buttons could pin facts for the AI to remember: additional sentences that would be fed into each request for text alongside the player’s latest input. Tools to edit, undo, or regenerate responses let players more tightly curate a story in progress, keeping it from getting too weird (or too normal, depending on preference). The amount of text fed into the system expanded to include the last eight interaction pairs between human and machine, giving stories more coherence as details from the last few pages, not just the last few paragraphs, might recur. Starting seeds were added for different kinds of stories: horror, cyberpunk, zombie invasions. Eventually fans would gain the ability to publish their own shareable “worlds” for AI Dungeon, combinations of starter prompts and consistent facts that could kick off a story in any milieu imaginable.
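The mechanics described here amount to assembling a block of text for each request: pinned facts, a window of recent exchanges, and the newest input. The sketch below shows one way that assembly could look. It is an assumption-laden illustration, not Latitude's code: the `Story` class, its method names, and the newline-joined layout are all hypothetical, and only the eight-pair window comes from the article.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MAX_PAIRS = 8  # from the article: the last eight interaction pairs are sent


@dataclass
class Story:
    """Hypothetical container for one story in progress."""
    pinned_facts: List[str] = field(default_factory=list)
    turns: List[Tuple[str, str]] = field(default_factory=list)  # (player, model)

    def add_turn(self, player_text: str, model_text: str) -> None:
        self.turns.append((player_text, model_text))

    def undo(self) -> None:
        """Drop the most recent exchange, as an undo button might."""
        if self.turns:
            self.turns.pop()

    def build_prompt(self, latest_input: str) -> str:
        """Pinned facts first, then recent exchanges, then the new input."""
        lines = list(self.pinned_facts)
        for player_text, model_text in self.turns[-MAX_PAIRS:]:
            lines.append(player_text)
            lines.append(model_text)
        lines.append(latest_input)
        return "\n".join(lines)


story = Story(pinned_facts=["You are Ragnar, a wizard living in the kingdom of Larion."])
for i in range(10):
    story.add_turn(f"> You take step {i}.", f"Reply {i}.")

prompt = story.build_prompt("> You read the book.")
print(prompt)
```

With ten turns recorded, the two oldest fall out of the window while the pinned fact is still sent every time, which is why pinning helps the model keep hold of details it would otherwise lose.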
But improvements had to work around a looming brick wall. While a normal game would have a simulated world model that kept track of characters, locations, or inventory, and contained logic for movement or combat, GPT’s text algorithm had nothing of the sort. It was in essence a clever black box: words went in, and new words came out. Nothing was kept track of or simulated in the traditional sense. Earlier AI-driven text games like those from the Oz Project or Versu still had structured assumptions and procedures that humans could refine, customize, and tune; but the millions of learned parameters in GPT models were not accessible or even necessarily understandable to human operators. So while adding an inventory system to an in-progress parser game would be relatively trivial, teaching the same sort of concept to a GPT-driven game—other than via the tedious repetition of real-world examples—was almost impossible. Most of Latitude’s workarounds boiled down to clever ways to shove more text into the black box along with the player’s input, or to understand more about its output: a quest tracking system added in fall 2020, for instance, added a separate machine learning model trained to detect strings of text that indicated a goal had been achieved, like “at last, you have claimed the sword!”
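To make the quest-tracking idea concrete, here is a deliberately crude stand-in. The article says Latitude used a separate trained machine learning model; the sketch below swaps in a hand-written keyword heuristic instead, simply to show where such a detector sits: it reads the generated text after the fact rather than consulting any game state. The cue phrases and function name are invented for illustration.

```python
import re

# Invented completion cues; a trained classifier would learn these from examples.
COMPLETION_CUES = re.compile(
    r"\b(at last|finally|you have (claimed|found|defeated|won))\b",
    re.IGNORECASE,
)


def quest_completed(model_output: str, quest_keyword: str) -> bool:
    """Guess whether the text reports finishing the quest.

    True only when the output mentions the quest's object and also
    contains one of the completion cues.
    """
    mentions_goal = quest_keyword.lower() in model_output.lower()
    has_cue = COMPLETION_CUES.search(model_output) is not None
    return mentions_goal and has_cue


print(quest_completed("At last, you have claimed the sword!", "sword"))
print(quest_completed("You look for the sword in the cave.", "sword"))
```

A heuristic like this is easy to fool in both directions, which suggests why a learned detector was the more practical choice for text as unpredictable as GPT's.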
The biggest improvements by far continued to be in the underlying model. GPT-3 launched in May 2020, with a mind-boggling 175 billion parameters (a thousand-fold increase over the system’s debut only two years earlier) and a training set that now included over a trillion words from Wikipedia, online book libraries, and the web at large. The model had cost millions of dollars in computing power to train, and required a small mountain of dedicated GPUs to run: ironically, far more than were needed to produce contemporary game graphics. In June, Latitude added access to this new model under the brand name “Dragon” as a premium option for paid subscribers. The stories Dragon told could be disturbingly coherent.
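The scale of these jumps is easier to see side by side. The short calculation below uses only the parameter counts quoted in this article; since the article does not give a count for the 2018 model, the 117 million figure for the first GPT-2 release is used as the baseline, which is an assumption of this sketch.

```python
# Parameter counts as quoted in the article.
PARAMETERS = {
    "GPT-2, first release": 117_000_000,
    "GPT-2, full model": 1_500_000_000,
    "GPT-3": 175_000_000_000,
}


def growth(smaller: str, larger: str) -> float:
    """How many times more parameters the larger model has."""
    return PARAMETERS[larger] / PARAMETERS[smaller]


gpt2_growth = growth("GPT-2, first release", "GPT-2, full model")
gpt3_growth = growth("GPT-2, first release", "GPT-3")
full_to_gpt3 = growth("GPT-2, full model", "GPT-3")

print(f"First GPT-2 to full GPT-2: about {gpt2_growth:.0f} times")
print(f"First GPT-2 to GPT-3: about {gpt3_growth:.0f} times")
print(f"Full GPT-2 to GPT-3: about {full_to_gpt3:.0f} times")
```

On these numbers the full GPT-2 was roughly 13 times the size of the first release, and GPT-3 roughly 1,500 times, so “tenfold” and “thousand-fold” are both conservative roundings.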
The door is locked so you knock. The music suddenly stops and you hear footsteps coming to the door.
"Hello?" A voice says behind the door.
"I'm looking for the book of essence," You say.
"What do you want it for?" The voice answers, a bit of an edge to it.
[Do:] tell them I am a humble seeker of knowledge
> You tell them you are a humble seeker of knowledge.
"Ah, I see. You seek to learn about the essence of reality, and the universe." The voice says. "I know of this book. I am the guard of this library. The only way you can get to it, is if you get past me. Are you up for the challenge?"
[Do:] ask them what the challenge is
> You ask them what the challenge is.
"The challenge is, you have to win a game of chess against me. If you win, then you may try to take the book. I will get the board." The door unlocks and opens.
Some writers began sounding an alarm. The New York Times covered GPT-3 under the headline “How Do You Know A Human Wrote This?”, calling the technology “at once amazing, spooky, humbling and more than a little terrifying.” The Daily Beast wrote: “There’s something depressing about language generation when it works well: a machine tapping into one of the few things that feel uniquely human, producing not just speech, but story, character, and jokes.” The warnings that OpenAI itself had once made were now taken up by mainstream cultural critics, who in the midst of the Trump administration’s ongoing assault on facts warned of “a future in which we are even less sure if what we are reading is real or fake.” Questions of ownership and attribution also had no settled answers, either legally or morally. Could writers claim copyright on text produced by their inputs into AI Dungeon? Could Latitude? What about the thousands of anonymous authors whose words were slurped up for training data without their consent? Some of the writers at ChooseYourStory became upset when they discovered the scraped contents of their stories posted to a public GitHub account. In later months, they would notice that some of their characters’ names showed up suspiciously often in AI Dungeon stories.
Worse, any AI model trained on unfiltered text from the internet ran the risk of mindlessly perpetuating its darker aspects. In 2016, Microsoft had released a chatbot named Tay designed to improve its conversational skills by interacting with humans on Twitter. Within sixteen hours of its release, the bot was taken down permanently after it had begun making racist and inflammatory tweets, egged on by an army of online trolls. “Do you support genocide?” a user tweeted at Tay in one of the tamer examples: “i do indeed,” the bot replied. The bot didn’t really know what it was saying, of course, just as GPT doesn’t; but the incident became one of the decade’s most famous examples of the dangers of AI that mindlessly remixed and parroted unfiltered inputs with none of the social, emotional, or cultural understanding of a person. Writing about GPT-3 a few years after Tay, one reporter noted the risk in “its inability to distinguish between truth and lie, between fact and fiction.” GPT-3 will cheerfully generate text that sounds plausible but has no basis in reality: “What makes this dangerous, especially when it comes to questions related to medicine, for example, is that even when the model doesn’t have a clue what it’s talking about, it responds with absolute confidence.”
The issue of tech like GPT “disrupting” human creators had also come to seem like more than a fantasy by 2020. “I am fairly convinced at this point that people will lose their jobs because of advancements in language models like GPT-3,” wrote one tech analyst: “The only question in my mind is whether this will happen in the next five years or the next ten.” Indeed, Walton’s vision of AI Dungeon’s commercial potential seemed to rely on this, framing the system’s future value as a matter of reducing “developer costs”:
In a game like Skyrim, you’re talking about employing hundreds of people over roughly five years. It’s so expensive. Ideally, AI lets us cut down on a lot of what developers normally do.
Application of technology to save labor has long been both a utopian dream and a capitalist nightmare. Does the time saved free more people from drudgery, or is it converted into more profit for their bosses? The great cost of mustering enough computational power to feed GPT-based systems might suggest they’re unlikely to become a tool for the masses (although a fully open-source model called GPT-J is now available, a good first step). But it’s still too early to know whether the fears around GPT will seem laughable or prescient in the years to come: as with the immediate aftermath of the Notre-Dame disaster, we can’t yet appreciate the true extent of the fire. “GPT-3 is going to be this decade’s iPhone,” predicted Alex Hern, technology editor for The Guardian, in 2020: “in terms of a singular artefact that is quite clearly the axis around which the next 10 years rotate.” But tech predictions are a tricky business. GPT or something like it might catch fire, or collapse under its own considerable weight, expense, and inscrutability.
AI Dungeon points to one radically different possible future for text games, one which might seem nearly unrecognizable to fans of a medium that’s always been more lovingly hand-crafted than automated. And yet in another sense, it continues traditions enmeshed in text games from the beginning: not just in its dungeon-crawl and parser aesthetics, but in the way it recalls early chatbots like Eliza, PARRY, and RACTER, which fascinated the earliest computer users with their illusions of humanity. The yearning for computers to dream and create alongside us has existed since the earliest software written for pleasure. The challenge, as with any new technology, lies in making sure it enriches our lives rather than diminishing them.
Next week: the final installment in our series, a very different game about unexpected texts and the pleasures of lovingly lingering over them.
You can play the latest version of AI Dungeon online for free, though a paid account is needed to unlock unlimited stories and the more powerful GPT models. The original launch version was also still online as of this writing, and the code for the version called “AI Dungeon 2” is publicly available. Thanks to Salvador Jordan for providing the excerpts from different GPT generations quoted herein. You can find Nick Walton on Twitter @nickwalton00, and more information about GPT-J online.