"Everything in war is simple, but the simplest thing is difficult." -- Carl von Clausewitz
Games are proving grounds for software architecture. They combine scale, high performance, challenging problems, a rabid user base, cost sensitivity, and the need for profit. And when games have in-game currency, like EVE Online has, there's money at play, so you can't just get away with a c'est la vie attitude. Engineering must be applied.
In "Planning for war: how the EVE Online servers deal with a 3,000 person battle", we learn some techniques EVE Online uses to handle large games:
- Do nothing. Most games are manageable or have spikes that quickly dissipate.
- Run it Hot. There's nothing to throttle as servers run at 100%. Why waste money? Use all your CPU.
- Shard it. Games are sharded by solar system and multiple solar systems run on a node.
- Move it. Games are moved when a machine becomes overloaded. Live Node Remap, where a live game is moved to a new node without disconnecting users, is not supported. Users are disconnected during a move, but they get a better experience once the move has completed. If there's a large game on a node, the smaller games can be moved, leaving more resources available for the larger game.
- Supernodes. When anticipating a big fight, you can request that a game be run on a machine with exceptional hardware.
- Throttle expensive operations. This is explained by user JKEFKA in a comment. A big source of load is a session change, which happens when a user changes to another system, joins or leaves a ship, or joins a fleet. A character has possibly hundreds of skills, which can affect dozens of aspects of a game. On a session change the game must recalculate a user's impact on a ship, which is a big operation. You are limited to no more than one session change every 10 seconds.
- Brain-in-a-Box. Also explained by JKEFKA. When a fleet enters a new system the session change overhead has a huge impact. The idea is that instead of recalculating on every change, you dedicate nodes to tracking a player's skills and the ship they are in, and those nodes then send the results in one update.
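The session change throttle in the list above is easy to picture in code. What follows is only a minimal sketch, not EVE's actual implementation: the class name `SessionChangeThrottle`, the method `try_change`, and the injectable `clock` are all hypothetical names made up for illustration. The only detail taken from the article is the limit of one session change every 10 seconds.

```python
import time


class SessionChangeThrottle:
    """Hypothetical sketch: allow one session change per character per cooldown."""

    def __init__(self, cooldown_seconds=10.0, clock=time.monotonic):
        # The 10 second default comes from the article; the rest is assumed.
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self._last_change = {}  # character id -> time of last accepted change

    def try_change(self, character_id):
        """Return True if the change is allowed, False if it is throttled."""
        now = self.clock()
        last = self._last_change.get(character_id)
        if last is not None and now - last < self.cooldown_seconds:
            # Still cooling down: skip the expensive recalculation entirely.
            return False
        self._last_change[character_id] = now
        return True


if __name__ == "__main__":
    throttle = SessionChangeThrottle()
    print(throttle.try_change("pilot-1"))  # first change is accepted
    print(throttle.try_change("pilot-1"))  # an immediate second change is rejected
```

A rejected attempt does not reset the timer in this sketch, so a player who keeps retrying is not penalized further. Whether the real game behaves that way is not stated in the source.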
1 Really Surprising...Time Dilation
Time Dilation. A tricky strategy for responding to server overload by slowing down time. It's like how time goes slow-mo during an accident, but it happens for everyone playing the game, instead of some people experiencing the game in real time while others experience an unresponsive client. Dilation is not a function of your computer's speed or graphics processing prowess. It's not about cheap servers or bad internet connections. It's about the load on EVE servers, and any server can become overloaded.
Time Dilation is not the same as lag, where your client is out of sync with the game state because there aren't enough CPU cycles or network bandwidth to keep clients up to date. By slowing down time the game can process more missiles (or whatever) so clients can be kept in sync. Everyone is still getting an accurate view of the game, which means players are still playing against an accurate game state. Brilliant.
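To make the idea concrete, here is a rough sketch of how a server might scale game time by load. It is an illustration under stated assumptions, not how EVE computes dilation: the function `dilation_factor`, the class `DilatedClock`, the parameter names, and the `min_factor` floor of 0.1 are all hypothetical choices. The assumption is that the server knows how many seconds of work are queued for a tick and how many seconds of real time it has budgeted to do that work.

```python
def dilation_factor(pending_work_seconds, tick_budget_seconds, min_factor=0.1):
    """Return the fraction of real time by which game time should advance.

    1.0 means full speed. The min_factor floor is an assumed value for
    illustration only.
    """
    if pending_work_seconds <= tick_budget_seconds:
        return 1.0  # the server is keeping up, so no dilation is needed
    return max(min_factor, tick_budget_seconds / pending_work_seconds)


class DilatedClock:
    """Hypothetical game clock that every client of a system would share."""

    def __init__(self):
        self.game_time = 0.0

    def advance(self, real_dt, factor):
        # Everyone sees the same slowed clock, so game state stays consistent.
        self.game_time += real_dt * factor
        return self.game_time


if __name__ == "__main__":
    clock = DilatedClock()
    # Twice as much work as fits in the budget: game time runs at half speed.
    factor = dilation_factor(pending_work_seconds=2.0, tick_budget_seconds=1.0)
    print(factor, clock.advance(real_dt=1.0, factor=factor))
```

The point of the sketch is that the slowdown is applied to the shared game clock rather than to individual clients, which is what separates dilation from lag.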
- On Hacker News for How the EVE Online Servers Deal with a 3,000 Person Battle
- ASAKAI AFTERMATH: ALL OVER A COBALT MOON
- How FarmVille Scales To Harvest 75 Million Players A Month
- Zynga's Z Cloud - Scale Fast Or Fail Fast By Merging Private And Public Clouds
- Playfish's Social Gaming Architecture - 50 Million Monthly Users And Growing
- Scaling In Games & Virtual Worlds
- World Of Warcraft's Lead Designer Rob Pardo On The Role Of The Cloud In Games
- How FarmVille Scales - The Follow-Up
- Simpler, Cheaper, Faster: Playtomic's Move From .NET To Node And Heroku
- Scaling Spam Eradication Using Purposeful Games: Die Spammer Die!
- Second Life Architecture - The Grid
- EVE Online Architecture
- EVE Evolved: EVE Online's server model