Members of the Legendary Computer Bugs Tribunal, esteemed guests, may I have your attention? I humbly submit a new contender for your honorable judgment. You may not find it novel, you may not even call it a bug, but I assure you that you will find it entertaining.
Consider NetHack. It is one of the trickiest games of all time, and I mean that in the strictest sense of the term. Content is procedurally generated, death is permanent, and the only thing you carry over from one game to the next is your skill and knowledge. I realize that the only thing two roguelike fans can agree on is how wrong a third fan's definition of "roguelike" is, but please, let's move on.
NetHack is great for machine learning…
Being a difficult game full of sequential decisions and random challenges, as well as a single-agent game that can be generated and played at lightning speed on modern computers, NetHack is a great fit for people working in machine learning, and imitation learning in particular, as detailed in Jens Tuyls' paper on how compute scaling affects single-agent game learning. Using Tuyls' expert model of NetHack behavior, Bartłomiej Cupiał and Maciej Wołczyk trained a neural network to play the game and improve itself through reinforcement learning.
By mid-May of this year, the two had their model consistently scoring around 5,000 points by their own metrics. Then, in one run, the model suddenly got worse, by about 40 percent: it scored 3,000 points. Machine learning models tend to improve gradually at this kind of problem, not regress overnight. It didn't make sense.
Cupiał and Wołczyk tried quite a few things: rolling back their code, restoring their entire software stack from a Singularity container backup, and rolling back their CUDA libraries. The result? 3,000 points. They rebuilt everything from scratch, and it was still 3,000 points.
… except on certain nights
As detailed in Cupiał's thread on X (formerly Twitter), what followed was several hours of confused trial and error by him and Wołczyk. "I'm starting to feel like a lunatic. I can't even watch a TV show without thinking about the bug the whole time," Cupiał wrote. In desperation, he asked model author Tuyls if he knew what could be wrong, and woke up in Kraków to an answer:
“Oh yeah, maybe it’s a full moon today.”
In NetHack, the game in which the DevTeam has thought of everything, if the game detects from your system clock that it should be a full moon, it generates a message: "You are lucky! Full moon tonight." A full moon confers a few benefits on the player: a single point is added to Luck, and werecreatures stay mostly in their animal forms.
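For the curious, here is a minimal Python sketch of how a program can decide "is it a full moon tonight?" from nothing but the system clock, using the classic golden-number/epact approximation of the lunar cycle. It is an illustration in the spirit of NetHack's well-known moon-phase check, not the game's actual C source; the function name and the printed message are assumptions made for the example.

    # Illustrative sketch only, not NetHack's source code: estimate the moon's
    # phase (0 = new, 4 = full) purely from the system clock, via the classic
    # golden-number/epact approximation of the 19-year Metonic cycle.
    import time

    def approximate_moon_phase(secs=None):
        """Return a phase index 0-7; 4 means (roughly) a full moon."""
        lt = time.localtime(secs)             # secs=None means "right now"
        day_of_year = lt.tm_yday - 1          # days elapsed since January 1
        golden = (lt.tm_year % 19) + 1        # position in the Metonic cycle
        epact = (11 * golden + 18) % 30       # moon's approximate age on Jan 1
        if (epact == 25 and golden > 11) or epact == 24:
            epact += 1                        # calendar correction
        # Six lunar months span roughly 177 days; split them into 8 phases.
        return ((((day_of_year + epact) * 6 + 11) % 177) // 22) & 7

    if __name__ == "__main__":
        if approximate_moon_phase() == 4:
            print("You are lucky!  Full moon tonight.")

The point of the sketch: the game's behavior depends on an input, the wall clock, that no amount of code rollback or container restoration will reset.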
A full moon makes the game slightly easier, all things considered, so why would the learning agent's score drop? Simply put, the agent had never encountered the full-moon state in its training data, so the cascade of unfamiliar branching decisions likely led to worse results, or just plain confusion. It was indeed a full moon over Kraków when the 3,000-point results started showing up. What a terrible night to have a learning model.
Of course, score is not a true metric of success in NetHack, as Cupiał himself noted. Ask a model to maximize its score and it will farm early-game monsters, because it never gets bored. "Finding items needed for [ascension] or even [just] doing a quest is too much for pure RL agent," Cupiał wrote. Another bot, AutoAscend, does a better job of progressing through the game, but "even it can only solve sokoban and reach the bottom of the mines," Cupiał notes.
Is it a bug?
I submit that, although NetHack responded to the full moon exactly as intended, this strange, very-hard-to-fathom stumble in a machine-learning journey was truly a bug, and one worthy of the pantheon. It's not the Harvard moth, nor the 500-mile email, but then, what is?
Because the team used Singularity to back up and restore their stack, the machine's clock, and with it the bug, followed them through every attempted fix. The resulting behavior was so strange, and seemingly driven by unseen forces, that it sent one coder into a tizzy. And the story has a beginning, a climactic middle, and an ending that teaches us something, however dark.
NetHack's lunar learning bug is, I submit, well worth remembering. Thank you for your time.