Wednesday, January 22, 2014

Only you can prevent Hamster Fires

In my previous post, I took a stab at explaining the mathematics of lag. Subsequently, I was goaded, pestered, challenged, encouraged by CSM Vice-Chair Ripard Teg to explain in more detail how the problem of lag might be addressed. As he put it, it "might require you doing more than one post in a month, but it'd be worth it."

In my post, I pointed out that (a) faster computers aren't going to help, and (b) more efficient code is helpful, but not helpful enough. The fundamental problem is that because it's never a bad idea to bring more people to a fight, "fleets expand to fill the lag available."

Fixing that is a hugely difficult problem that will touch many areas of the game. But perhaps there are some significantly simpler game design changes that can, if not solve the problem, at least buy us more time in which to deal with it. This is what I'd like to address in this post.

As with the previous post, all the examples given below are very simplified for purposes of explanation. And I want to make it absolutely clear that I am not involved in Sov-warfare, so I am sure that there will be many flaws in the example I give below. This is just an exercise in identifying the problems that need to be solved.

At the most basic level, what causes lag is that everybody has to fight in the same place at the same time. Since every object on a grid can potentially interact with every other object on the same grid, the amount of computation needed to process a fight grows faster than linearly -- doubling the number of objects more than doubles the horsepower you need.

The key thing to keep in mind is that the important variable is "number of objects that can interact". For purposes of illustration, let's say that the computation load increases as the square of the number of objects; twice as many objects means four times as much computation.

As is well known, one of CCP's secret weapons is that their servers are powered not by common everyday electricity, but by genetically-engineered hamsters. Let's say that with the hamsters running flat-out in 10% TiDi, CCP's best server can handle 2000 players fighting it out on the same grid. 2000-squared is 4,000,000, so the server can be rated at 4 million hamsterpower.
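
For the spreadsheet-minded, here is a minimal sketch of that arithmetic. It assumes the purely illustrative load model used above (load grows as the square of the number of players); the names `hamsterpower` and `SERVER_RATING` are made up for this example and are not anything CCP uses.

```python
def hamsterpower(players: int) -> int:
    """Illustrative load model from the text: load = players squared."""
    return players ** 2


# The example server tops out at 2000 players on one grid.
SERVER_RATING = hamsterpower(2000)  # 4,000,000 hamsterpower

for players in (500, 1000, 2000, 3000):
    load = hamsterpower(players)
    status = "ok" if load <= SERVER_RATING else "hamsters on fire"
    print(f"{players:>5} players -> {load:>10,} hamsterpower ({status})")
```

Under this model, going from 2000 to 3000 players asks for 9 million hamsterpower from a server rated at 4 million.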

If you draw a graph with # of players along the horizontal axis, and # of hamsters required along the vertical, you get something that looks like this:


The technical term for this situation is "Hamster Abuse."

Alas, when the line hits the top of the graph, we run out of hamsters; even worse, all the hamsters are working so hard that they catch on fire. As the aroma of roasted rodent chokes CCP's server complex, two things happen: soul-crushing lag descends upon New Eden, and tomorrow's lunch menu at CCP gets an additional "meat dish".

But all is not lost; if we could split the fights up into 1000-player battles in the same system, then each battle only requires 1000-squared, or 1 million hamsterpower. Running flat out, the faithful hamsters can handle 4 of these battles simultaneously on a single server.
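
A rough sketch of why splitting helps, under the same illustrative assumptions: load is the square of the players on a grid, and battles on separate grids do not interact at all. The function names are hypothetical.

```python
def hamsterpower(players: int) -> int:
    """Illustrative load model: load = players squared."""
    return players ** 2


def battles_per_server(server_rating: int, players_per_battle: int) -> int:
    """How many independent battles of a given size fit on one server."""
    return server_rating // hamsterpower(players_per_battle)


SERVER_RATING = 4_000_000  # the example server from the text

for size in (2000, 1000, 500):
    battles = battles_per_server(SERVER_RATING, size)
    print(f"{size}-player battles: {battles} at once, "
          f"{battles * size} players in total")
```

The same hamsters that choke on one 2000-player fight can, in this toy model, serve 4000 players split into four fights of 1000.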

And even better, if we could spread out the fights to different systems, then each could potentially run on its own server, with its own set of hamsters.

So this is the first thing we need: Sov-warfare should take place on a constellation-wide basis, and require multiple simultaneous fights in multiple systems. And to discourage people from just jumping and bridging around to create local concentrations of force, force projection by cyno must be cleverly nerfed.

Another thing to consider is that at present, battles are focused not only in space but also in time, thanks to the timer system. This is one consequence of the fact that Sov is a binary state; you either have it, or you don't. So timers have to go, and that means that Sovereignty must become a continuum.

Finally, shooting structures is boring, so let's get rid of it.

I spent a few hours and came up with a humble suggestion that tries to incorporate all of these concepts:

Sovereignty is determined on a constellation-wide basis. In each system in the constellation, there is a Sovereignty Control Monitor (SCM).

The SCMs give out Political Points (PP) for ratting, mining, and just plain being in space, but they also give points for any kills that took place in the system. These points are given to the player who struck the final blow, and the amount depends on the value of the kill. SCMs that are giving out more than the constellation's average amount of PP reduce their awards, to encourage multiple fights. Obviously, this has to be structured so that sitting in a big blob in one or a few systems while your opponent hangs around in all the other ones is a losing strategy. I'll be the first to admit that figuring out a PP-awarding system that is resistant to abuse is a challenging problem, but I don't think it is an unsolvable one.
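
To make the "busy SCMs pay less" idea concrete, here is one possible sketch. Everything in it is an assumption for illustration: the text does not specify a formula, so the scaling rule (shrink the award by the ratio of the constellation average to the SCM's own total), the `SCM` class, and the `pp_per_million_isk` rate are all hypothetical, and abuse resistance is not addressed.

```python
from dataclasses import dataclass


@dataclass
class SCM:
    system: str
    pp_awarded: float = 0.0  # running total of PP this SCM has handed out


def award_for_kill(scm, constellation, kill_value_misk, pp_per_million_isk=1.0):
    """Award PP for a kill; SCMs above the constellation average pay less.

    kill_value_misk is the kill value in millions of ISK (assumed unit).
    """
    award = kill_value_misk * pp_per_million_isk
    average = sum(s.pp_awarded for s in constellation) / len(constellation)
    if scm.pp_awarded > average:
        # Hypothetical damping rule: scale by average / own total.
        award *= average / scm.pp_awarded
    scm.pp_awarded += award
    return award


busy, quiet, idle = SCM("A", 300.0), SCM("B"), SCM("C")
constellation = [busy, quiet, idle]

print(f"kill in busy system:  {award_for_kill(busy, constellation, 100):.1f} PP")
print(f"kill in quiet system: {award_for_kill(quiet, constellation, 100):.1f} PP")
```

With these made-up numbers, the same kill is worth about a third as much in the system where everyone is already blobbing.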

During peacetime, and periods when one side has timezone dominance, people can do stuff and get some points, but the real payoff happens when you fight and win multiple fights.

And yes, a super-rich alliance could descend upon a constellation and AWOX themselves to generate PP. Good for them, they are demonstrating their awesome economic might in a massive space Potlatch!

Each ship has a PP accumulator. The accumulator records how much PP the ship has, and which constellations it is valid in. Every day at downtime, 25% of your PP drains away, so it's use it or lose it. You can transfer your PP to another ship if you do not have an aggression timer, and you can also siphon PP off a wreck by salvaging it. If you have PP that is valid in a particular constellation, you can go to any SCM in the constellation and deposit it to either increase or degrade Sov, which is now a continuum -- let's say from 0 to 10. If your deposit is the one that drives Sov to 0, then Sov resets to 1 and the entire constellation belongs to your alliance.
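
How fast does "use it or lose it" bite? A quick sketch, assuming the 25% drain applies to whatever balance is left at each downtime (compounding) and that no new PP is earned in between; the function name is hypothetical.

```python
import math

DAILY_DRAIN = 0.25  # fraction of PP lost at each downtime, per the proposal


def pp_after_downtimes(pp: float, downtimes: int) -> float:
    """PP remaining after a number of downtimes with no new PP earned."""
    return pp * (1 - DAILY_DRAIN) ** downtimes


for days in range(4):
    print(f"after {days} downtime(s): {pp_after_downtimes(1000, days):.1f} PP")

# Number of downtimes for a hoard to fall to half its original size.
half_life = math.log(0.5) / math.log(1 - DAILY_DRAIN)
print(f"half-life: about {half_life:.1f} downtimes")
```

Under these assumptions a hoard loses more than half its value in three downtimes, so sitting on PP for a week is mostly pointless.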

Depositing PP takes a little time, and everyone in the system will know you are doing it. Stealing an idea from the ESS, the SCMs have bubbles around them.

When Sov is low, it takes more PP to degrade it than it does to improve it; when Sov is high, it takes more PP to improve it than to degrade it (and a certain amount just to maintain it). The closer you are to either 0 or 10, the harder it should be to push it there. Sov battles become a tug of war, with the emphasis on war.
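
One way to picture the tug of war is as a pair of cost curves. The sketch below is only an illustration of the shape described above: the linear formula and the constants `BASE_COST` and `STEEPNESS` are invented for this example, and maintenance costs are left out.

```python
SOV_MIN, SOV_MAX = 0.0, 10.0
BASE_COST = 100.0  # hypothetical PP per Sov point at the cheap end
STEEPNESS = 0.5    # hypothetical: how quickly costs rise near the extremes


def cost_to_degrade(sov: float) -> float:
    """PP per point to push Sov down; dearer the closer Sov already is to 0."""
    return BASE_COST * (1 + STEEPNESS * (SOV_MAX - sov))


def cost_to_improve(sov: float) -> float:
    """PP per point to push Sov up; dearer the closer Sov already is to 10."""
    return BASE_COST * (1 + STEEPNESS * (sov - SOV_MIN))


for sov in (1, 3, 5, 7, 9):
    print(f"Sov {sov}: degrade {cost_to_degrade(sov):.0f} PP/point, "
          f"improve {cost_to_improve(sov):.0f} PP/point")
```

With this shape the two costs are equal at Sov 5, attackers pay a premium for the final push toward 0, and defenders pay a premium for the last steps toward 10.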

Finally, we need to nerf force projection. We want to make both caps and subcaps useful and important ships in combat with distinct roles, and in particular, we don't want cynos and bridging to be used to leap around the constellation playing whackamole, because that stresses out the hamsters. So how about this? Ships get a new aggression counter; if they cyno or bridge within an hour of committing an aggressive act, they trigger a cooldown that does not let them cyno or bridge again for an hour or two.

So you can cyno and bridge across the galaxy, and you can cyno and bridge into a combat, and you can cyno and bridge out of that combat, but then things get a little interesting.
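
The rule is easier to follow as a tiny state machine. This is a sketch under assumptions: time is counted in minutes, the cooldown is set to 90 minutes (the proposal only says "an hour or two"), and the `Ship` class and its method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

AGGRESSION_WINDOW = 60  # minutes; "within an hour" of an aggressive act
COOLDOWN = 90           # minutes; assumed value for "an hour or two"


@dataclass
class Ship:
    last_aggression: Optional[int] = None  # time of last aggressive act
    cooldown_until: int = 0                # no cyno/bridge before this time

    def aggress(self, now: int) -> None:
        self.last_aggression = now

    def try_jump(self, now: int) -> bool:
        """Attempt a cyno or bridge; return True if it is allowed."""
        if now < self.cooldown_until:
            return False
        recently_aggressed = (
            self.last_aggression is not None
            and now - self.last_aggression <= AGGRESSION_WINDOW
        )
        if recently_aggressed:
            # This jump goes through, but the next one is locked out.
            self.cooldown_until = now + COOLDOWN
        return True


ship = Ship()
print("bridge into the fight:", ship.try_jump(now=0))
ship.aggress(now=5)
print("bridge out of the fight:", ship.try_jump(now=20))
print("bridge to the next fight:", ship.try_jump(now=30))
```

In this walk-through the first two jumps succeed and the third is refused, which is exactly the whackamole the rule is meant to stop.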

As I said before, I am sure there are many awful problems with the above, but perhaps it will inspire someone to come up with something that will work.

The hamsters will thank you!

Sunday, January 19, 2014

The Cold Equations of Soul-Crushing Lag

Yesterday, there was a bit of a commotion in HED-GP. Thousands of pilots attempted to engage in the largest battle in EVE history, and despite the best efforts of CCP, they were devoured by the monster that is Soul-Crushing Lag.

The dreaded Lag Monster has returned to New Eden more and more often in recent months, and it will continue to do so with increased frequency in the months and years to come. This post is my attempt to explain why I believe that even heroic technical achievements like Time Dilation will only put Lag into remission; to cure it requires some fundamental changes to the core game design of EVE.

As befits players of a game that has been described as Spreadsheets in Space, let us begin with some mathematics, to explain why lag can't be conquered by technical means. For most of you, especially those who are computer programmers, this will be basic stuff, but bear with me. All the numbers given below were chosen simply to illustrate the problem.

The basic problem is this: "doubling the number of people in a fight more than doubles the amount of work the server needs to do."

To give a horribly simplified example (forgive me, anyone who's taken a 200-level CS course), consider a fight with 100 people; for each person in the fight, the server needs to update their position, execute their commands, and so on. That part of the computational load is roughly linear, so adding another 100 ships would roughly double the amount of work. But then the server has to tell everyone how the universe has changed, and everyone can potentially have a different view of the universe (consider: cloaked vs. uncloaked; information about the ships you have locked; watchlists; etc). So now instead of 100 ships each getting info about 100 ships, you have 200 ships each getting info about 200 ships; that's 4 times as much information. If you have 500 ships, that's 25 times as much; 1600 ships requires 256 times as much. All that data has to be organized and sent to the clients. Things get very bad, very fast; then they get even worse, even faster.
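
The multipliers above can be checked in a few lines. This sketch assumes the same horribly simplified model (every ship gets an update about every ship) and uses the 100-ship fight as the baseline; the function name is made up.

```python
BASELINE_SHIPS = 100


def update_volume(ships: int) -> int:
    """Each of N ships receives info about N ships: N * N updates per tick."""
    return ships * ships


for ships in (100, 200, 500, 1600):
    ratio = update_volume(ships) // update_volume(BASELINE_SHIPS)
    print(f"{ships:>5} ships: {ratio}x the information of a 100-ship fight")
```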

Now in the real world of extremely clever programmers, of which CCP has more than their fair share, there are a lot of things you can do to make things (to use the technical term) "less horrible". But what you can't do is get linear scaling; doubling the number of people in a fight will always more than double the amount of work the server needs to do.

So what can be done to address this problem? Well, there are several basic approaches, and CCP can use any or all of them; they are not either-or choices:
  • Use faster computers (aka "If brute force isn't working, you aren't using enough"). Unfortunately, because of design decisions made back in the early days of EVE development (decisions that I think were made for very good reasons, btw), computers are not getting faster in a way that benefits EVE as much as everyone would like. Time Dilation is actually an example of this approach -- when TiDi hits 10%, the server has 10 times as much time to process each tick of the game clock, so it's like it's running 10 times faster.
  • Make the code more efficient, so that it scales better. In our simple example above, 2x the number of ships meant 4x the work; if that gets reduced to 3x the work, then handling 1600 ships is only 81 times more difficult instead of 256 times as hard, and that turns out to roughly double the number of people that the server can handle before the lag monster appears. A lot of work -- extremely difficult work -- is being done in this area, and IMHO CCP needs to put even more resources into it (and not just because of lag, either).
  • Change the game so that fight sizes are naturally limited to sizes the servers can handle. By "naturally limited" I mean make changes such that effective fleet commanders will have sound tactical and strategic reasons to limit their fleet sizes and/or divide into sub-fleets with different objectives.
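
The numbers in the second bullet can be reproduced with a small sketch. It assumes work scales as a power of fleet size: "2x ships means 4x work" is an exponent of 2, and "2x ships means 3x work" is an exponent of log2(3), roughly 1.58. The function names and the 100-ship baseline are illustrative choices.

```python
import math

QUADRATIC = 2.0          # doubling the ships quadruples the work
IMPROVED = math.log2(3)  # doubling the ships only triples the work


def relative_work(ships: float, exponent: float, baseline: float = 100) -> float:
    """Work relative to a baseline-sized fight, for a given scaling exponent."""
    return (ships / baseline) ** exponent


def capacity(budget: float, exponent: float, baseline: float = 100) -> float:
    """Largest fight a server with the given work budget can handle."""
    return baseline * budget ** (1 / exponent)


budget = relative_work(1600, QUADRATIC)  # what a 1600-ship server can afford
print(f"1600 ships, old code: {budget:.0f}x the baseline work")
print(f"1600 ships, new code: {relative_work(1600, IMPROVED):.0f}x")
print(f"same budget, new code: about {capacity(budget, IMPROVED):.0f} ships")
```

Under these assumptions the same server goes from handling 1600 ships to a little over 3300, which is where the "roughly double" comes from.
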
Let's examine these possibilities in more detail.

Time Dilation was introduced about 3 years ago, and has been in-game for just over 2 years. As explained above, it basically speeds up a server by a factor of 10. Yet in only 2 years, lag is back. Why? Well, to quote that original devblog, "Here's how I envision this working for a large engagement (say, 1600 or so)". Yesterday, there were over 3400 ships in HED-GP. Even worse, the meta has changed and the type of ships being used likely made the load even worse.

TiDi sped things up by 10x, and EVE players chewed through that in 2 years. While I expect that making the code more efficient will reap great benefits, particularly in chopping the peaks off lag spikes that occur when very "expensive" events occur (such as bridging), I don't think it's going to bring a further 10x improvement. If Team Gridlock proves me wrong, then I will be very impressed and will nominate them for the Galactic Institute's Prize for Extreme Cleverness, but even so, that's just another 2 years or so before EVE players start whining about lag again.

Four years ago, when I first ran for CSM, one of the planks of my manifesto concerned Lag. Here is what I wrote at the time:
While there are clearly many cute hacks that can (and will) reduce lag, the blunt fact of the matter is that such fixes are at best temporary fixes, because as soon as you defeat the lag-monster for N-player battles, the current design of the game encourages bringing extra people to the fight -- which means you have N+500-player battles, and lag returns. In other words, "Fleets expand to fill the lag available". As the EVE population grows, the problem will only get worse.
And it's even worse than what I said back then, because since the introduction of TiDi, the size of "large engagements" has increased much faster than the number of EVE subscribers.

Anyway, the tl;dr is that technical fixes, while wonderful and needed, just address the symptoms; they don't tackle the disease. The disease itself is simple: in EVE, like Soviet Russia, quantity has a quality all its own.

So what game design changes could be made to address this? Figuring that out is why CCP devs get paid the big bucks. But until they do, lag will never go away.