Confront the 20 Percenters
As CCP Games found out, not all users are the same. Finding ways to deal with the small fraction of users that demand the most resources can result in immense savings.
In CCP's case, analysis of game use pinpointed accounts that were dominating the game's resources. The worst offenders were users who traded the in-game currency, called ISK, for real-world cash. These real-money traders ran macros to automate activities (such as mining) that generate money for their in-game players. Those macros consume an inordinate amount of resources, says CCP's Wyld.
Combine that with the fact that the exploitative users played nearly all the time, and CCP Games saw an extreme twist on the 80/20 rule: 2 per cent of their user base was responsible for 30 per cent of the workload in the data centre.
"The impact on our data centre was that we had to keep expanding the dedicated resources for mission-running areas of the game," says Wyld, "and eventually, we had a large number of these areas quite clogged and overused."
Once CCP dealt with the resource-hogging users, loads on the computers hosting strained sections of the game world dropped from 100 per cent to zero overnight.
Virtual Data Exists in a Real World
In 2004, Blizzard learned that the physical world can have an inordinate impact on the virtual, when a tornado hit the data centre hosting a late beta project.
While the company now maintains data centres all over the world, hosting some 13,000 blade servers using more than 112 terabytes of memory, back in 2004 it had just a single data centre hosting its World of Warcraft beta. When IT managers heard about the bad weather, they called the data centre managers, who inexplicably told them that everything was fine, J. Allen Brack told attendees in September at the Game Developers Conference. Yet Blizzard had cameras watching its servers, and those cameras showed rainwater pouring into the data centre.
Blizzard sent a team out to the facility to help protect the servers and get them back up and running. It took three days, Brack said, and taught them a valuable lesson. "It is important to monitor more than just the hardware," Brack told attendees. "You also have to monitor the conditions in the data centre, and you have to be prepared for disaster recovery."
Focus on What Is Important
No one wants downtime, but game companies traditionally undergo shutdowns on a monthly, if not weekly, basis. For a game, such downtime is tolerable. For a bank, of course, it's not.
"The risk of a game shutting down and losing a $10 subscription is a really different problem than a multimillion-dollar infrastructure shutting down," Greenberg says. Yet, that's not to say that game companies can take a freewheeling attitude toward their network connectivity, since they have some of the most dedicated users, the analyst says.
"CIOs can take a lesson from MMORPGs and learn that the quality of their service to their client has to be great," Greenberg says. "Game companies care more because of their rabid user base."