As you all know, gearing is a standard component of RPGs. MMOs make it a more intricate process, with players needing to track down many more pieces of gear: 14-16, compared with the usual 6 or so. It is also more time-consuming, since gear is rarely a guaranteed reward. Instead it drops randomly, and not everything you acquire is worth using because of poor stat optimization. Today I want to discuss the underlying system: what stats were on gear, what they meant, and how and why that system has evolved. Some changes were made to add fun and variety, but half the time a change fixed a problem that a prior change had created. Fair warning: this post covers 11 years of a game’s design across 7 iterations, so it’s going to be long and detailed.
It should be noted first that in all cases, the stats on gear scaled linearly with item level. Which stats were present varied a fair amount. With the exception of trinkets, gear had 2-3 stats: 1 primary, 1 secondary, and a 3rd that leaned toward one side or the other. Eventually that shifted to 4 stats with a 2/2 split. Occasionally there’s a piece with only 3 stats, in which case it’s a 2/1 primary/secondary split. Trinkets are usually more creative than flat number bonuses, often carrying powerful temporary bonuses that trigger randomly or can be activated by the player on command.
Vanilla, The Burning Crusade and Wrath of the Lich King
Stats on gear have always been divided between primary and secondary stats, as in most RPGs. The primary stats were Strength, Agility, Intellect and Stamina. And as in most RPGs, those values held no intrinsic meaning. Instead, they created modifiers to smaller, more concrete things like skills, talents and damage. In this system, Strength created a secondary stat called Attack Power, which in turn modified how much melee weapon damage a player did. Agility also generated Attack Power, though far less of it. Mostly it generated the secondary stats Critical Strike (a chance to deal bonus damage) and Dodge (a chance to avoid incoming attacks). Intellect determined the size of a player’s mana pool, and about 85% of the classes in the game relied on mana to use spells and abilities. Stamina determined the size of a player’s health pool.
Secondary stats included Critical Strike (mentioned above) and Haste (faster attack speed). Attack Power and Spell Power were occasionally present and gave a flat bonus to all damage dealt. Spirit increased mana regeneration (no mana, no actions), so almost everyone wanted at least some. However, damage dealers didn’t need to go all-out. They could already dish out more damage than was safe for their survival. A damage dealer who was too good at the job drew the boss’s ire (in the lingo, too much threat draws aggro) and got squashed within seconds. So while Spirit was helpful, damage dealers weren’t in a position where more was always better, and they could diversify. Healers had to find the right balance based on the needs of the group. The other 2 stats were Hit Chance and Expertise, which were fairly similar. All attacks had a small, natural chance to miss, and Hit Chance offset it. Likewise, all players and enemies had a small natural chance to block, parry or dodge attacks, and Expertise offset that. These stats were considered essential to accumulate up to their caps, the point where your attacks would never miss or be blocked. Anything beyond the cap suffered drastic diminishing returns, if it wasn’t totally ineffective. So there was a rough balance of 3 damage-enhancing stats and 3 stats that supported the damage dealt.
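A cap like this is easy to model. The sketch below is a minimal illustration, assuming Hit Chance simply subtracts from a flat base miss chance. The 8% base miss chance and the function names are hypothetical, not the game’s actual values.

```python
def effective_miss_chance(hit_chance: float, base_miss: float = 0.08) -> float:
    """Miss chance left after Hit Chance is applied, floored at zero.

    Hypothetical model: each point of Hit Chance removes one point of miss
    chance, and the floor at zero acts as the 'cap' (base_miss is assumed).
    """
    return max(base_miss - hit_chance, 0.0)


def expected_damage(raw_damage: float, hit_chance: float, base_miss: float = 0.08) -> float:
    """Average damage per attack once misses are factored in."""
    return raw_damage * (1.0 - effective_miss_chance(hit_chance, base_miss))


if __name__ == "__main__":
    # Damage climbs until the cap (8% in this model), then flattens out completely.
    for hit in (0.00, 0.04, 0.08, 0.12):
        print(f"hit {hit:.0%}: {expected_damage(1000, hit):.0f} average damage")
```

Every point up to the cap adds damage and every point past it adds nothing. That is why the cap became a target rather than something to stack.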
There’s one outlier here, and one other thing to explain. Gear also came with an Armor rating (though I don’t think rings, necklaces and trinkets did). Armor reduced physical damage taken. Armor was standard on gear and increased with item level, so players didn’t think much about it: more was better, and it followed a linear progression. Gear also followed the usual RPG trope of being divided into cloth, leather, mail and plate, and not everyone used a shield. Classes came with a linear escalation of the armor they could equip, so anyone trained in mail could, in theory, also use cloth or leather. As part of this progression, plate offered more Armor than mail, which offered more Armor than leather, and so on. If a player chose a “lesser” armor type, they sacrificed some Armor rating, but nothing else was intrinsically affected by the choice. Every class that used cloth was a caster, so cloth gear always had ideal stats for casters. Leather and mail casters, on the other hand, had trouble finding good gear of their own type. It was common for them to compete for cloth drops in order to be more effective.
In this picture, almost every stat is useful and some balance is required. This wasn’t a terribly delicate balance, so a higher item level on a piece of gear was enough to tell you it was better than what you had. Player power growth was easy to track and quite linear. At the time, players measured this with gear score, the total of the item levels of all their pieces of gear. Players used an addon to do this simple but tedious addition for them. Gear Score became the metric for how powerful a character was and whether they were equipped well enough for a raid. Nowadays the game offers its own version of this system: it averages the item level of a player’s gear and calls it, plainly, Average Item Level.
Technically, Haste didn’t exist in the game for the first year or two, but I’m going to gloss over that. This description more or less covers the system through the initial game and 2 expansions. I’m not covering Armor Penetration, which was before my time and wasn’t around for long, or niche stats like Block. I’m also not going into PvP stats like Resilience. That’s a whole essay in itself, and I am nowhere near informed enough to talk about it.
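Both metrics are simple arithmetic over the same list of item levels. Here is a minimal sketch, assuming a character is represented as a plain list of per-slot item levels. The function names are illustrative, and real addons may have weighted slots differently.

```python
def gear_score(item_levels: list[int]) -> int:
    """Total of the item levels of every equipped piece (the summed gear score described above)."""
    return sum(item_levels)


def average_item_level(item_levels: list[int]) -> float:
    """Mean item level across equipped pieces, as the game reports it today."""
    if not item_levels:
        raise ValueError("no equipped items")
    return sum(item_levels) / len(item_levels)


if __name__ == "__main__":
    # Hypothetical character with only four slots filled, to keep the numbers small.
    equipped = [200, 200, 213, 226]
    print(gear_score(equipped))          # 839
    print(average_item_level(equipped))  # 209.75
```

The average divides by the number of pieces, so it stays on the same scale as the item levels themselves. That makes it easier to read at a glance than a large running total.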
Cataclysm and Mists of Pandaria
Cataclysm brought the first major overhaul. The primary stats were narrowed in scope. Strength provided damage only to the classes where it made sense and did nothing for everyone else. Agility provided damage and bonus Critical Strike only to physical damage classes that didn’t use Strength. Intellect still provided a mana pool, but now it also gave damage to casters. Here we began to see classes shift away from mana toward more unique, custom resource bars. Classes that still used mana got bonuses to ensure they had enough to do what they needed without collecting gear with Intellect on it. Secondary stats started to get more interesting. Attack Power and Spell Power were removed as unnecessary, since primary stats now drove a player’s baseline output. Spirit became more firmly a secondary stat, relevant only to casters and to 2 other specs, where it doubled as both mana regeneration and hit rating. A new stat, Mastery, was created, providing a varied and unique bonus to each of the 30-odd specializations. Players were also strongly encouraged to stay within their armor type with a 5% bonus to stats for doing so, and enough appropriate gear was provided to make this viable.
In this iteration, players are penned into specific boxes where they can be quantified more precisely and the game balanced around that. Instead of a varied balance of stats, players follow a narrower scope. Primary stats and Armor on gear became standardized, so it was easier to calculate player power, with the bulk of it coming from the core stats that drove each class. Primary stats were powerful enough to contribute some 2/3 of a player’s output, lending credence to the equation that a higher item level means more power. The balance of secondary stats shifted, though, toward damage-enhancing stats. With more damage-enhancing stats to choose from, clearer right and wrong answers emerged: optimal and sub-optimal choices that affect performance. With player output more optimized in this system, threat generation rules for tanks were changed to keep up with what damage dealers could do. The meta-game of monitoring threat while dealing damage fell by the wayside as damage dealer gameplay became about putting up optimal numbers. Conversely, tank gameplay stopped being about threat management and focused instead on surviving bigger hits. In essence, each role became more specialized at what it did, and their interdependence increased. Now, if a damage dealer did pull threat, they would die near instantly.
But there was one major new change to stats: the creation of reforging. Reforging was created to simplify gearing and ensure that any gear drop with a higher item level would be an upgrade. It let a player modify the secondary stats on a piece of gear by draining 40% of one secondary stat and funneling it into another secondary stat not already on the item. This sounds like a win for players, who could better optimize their gear. It was also a cheat for the designers to keep the linearity of their system intact. It didn’t work out so well. Instead of making things simpler, it made them far more complex. Reforging added a lot of mathematical complexity around Hit Chance and Expertise. Every player knew these had to be maxed out, and that going over was a waste. If most of a player’s power is built in from standardized primary stats, then the damage-enhancing secondary stats are comparatively weak. Hit Chance and Expertise, however, don’t enhance damage; they ensure that attacks connect. So they were considerably more powerful than the damage-enhancing secondary stats. To put this in more concrete terms: suppose 2/3 of a player’s damage comes from the primary stats already on their gear and 1/3 comes from damage enhancement. A stat that ensures you hit protects all of your damage, including that 2/3 from primary stats, and 2/3 is a lot more than 1/3. So reforging became an obsessive exercise in balancing for the exact amount of Hit Chance and Expertise. Only then did players maximize whatever they needed most: Critical Strike, Haste or Mastery. An external site cropped up offering to do this math for players. Players in the know used the site or an addon to calculate the optimal build, while casual players flailed under the new system.
The skill gap widened between well-informed players and players who don’t study game systems in order to twink.
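The reforge rule itself is a small transformation on an item’s stats. Here is a minimal sketch, assuming an item is a dict mapping stat names to ratings. The stat names and the round-down behavior are assumptions, not the game’s exact implementation.

```python
REFORGE_PERCENT = 40  # share of the source stat that gets moved, per the rule described above


def reforge(item_stats: dict[str, int], source: str, target: str) -> dict[str, int]:
    """Return a copy of item_stats with 40% of `source` moved into `target`.

    Follows the rule as described: the target must be a secondary stat that is
    not already on the item. Rounding the moved amount down is an assumption.
    """
    if source not in item_stats:
        raise ValueError(f"{source!r} is not on this item")
    if target in item_stats:
        raise ValueError(f"{target!r} is already on this item")
    moved = item_stats[source] * REFORGE_PERCENT // 100
    reforged = dict(item_stats)  # leave the original item untouched
    reforged[source] -= moved
    reforged[target] = moved
    return reforged


if __name__ == "__main__":
    # Hypothetical item: trade some Critical Strike for Hit Chance to approach the cap.
    item = {"crit": 300, "mastery": 200}
    print(reforge(item, "crit", "hit"))  # {'crit': 180, 'mastery': 200, 'hit': 120}
```

Players then had to sum the result across every slot and compare it to the Hit Chance and Expertise caps. That is the kind of bookkeeping the external site automated.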
Warlords of Draenor
Despite the problems, this system lasted for 2 expansions before we got another overhaul. The core scaling from primary stats, along with item level standardization, was working okay. It provided a strong baseline for calculating player strength, which made the game easier to balance. However, the emphasis on primary stats and how much power they provided brought home to roost a problem ingrained in the game’s gearing system since day 1. When a game has an ever-increasing scale of player damage and output, at what point do the numbers become so high that they’re silly? Raiding is largely responsible for this. During leveling, gear sees smooth, incremental gains. But over an expansion players may see 3 or 4 raid tiers. Each tier needs to provide a significant upgrade in power for players to want to chase it and swap to the new gear. These jumps in power are not linear but exponential. So each expansion’s new raids were eventually going to take the system to absurd heights, and in the game’s 4th expansion we hit that threshold. In the original game, a raider might have 4K health. That grew to 12K by the end of the first expansion, 40K by the end of the second, 180K by the end of the 3rd and 500K by the end of the 4th (tanks actually had about 1M health, due to overspecialization in roles). Damage output saw similar escalation. To correct this and bring the numbers down to more reasonable levels, Blizzard retroactively reduced the numbers on gear from all prior expansions. Even then, this didn’t fix the problem. It just pushed it back to the horizon.
Blizzard released this graph showing the escalation of player power over time. Each cliff in the graph denotes a new expansion. Note that the numbers for levels 1-60 barely even register.
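The health figures above make the point with a few lines of arithmetic: each step multiplies the previous total rather than adding a fixed amount. This sketch only reuses the numbers quoted in this post.

```python
# End-of-era raider health pools quoted above (approximate, non-tank).
HEALTH_BY_ERA = [
    ("original game", 4_000),
    ("end of 1st expansion", 12_000),
    ("end of 2nd expansion", 40_000),
    ("end of 3rd expansion", 180_000),
    ("end of 4th expansion", 500_000),
]


def growth_factors(values: list[int]) -> list[float]:
    """Ratio of each value to the one before it."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


if __name__ == "__main__":
    health = [amount for _, amount in HEALTH_BY_ERA]
    for (label, amount), factor in zip(HEALTH_BY_ERA[1:], growth_factors(health)):
        print(f"{label}: {amount:,} health (x{factor:.2f})")
    print(f"overall: x{health[-1] / health[0]:.0f}")
```

The ratios land between roughly 2.8x and 4.5x per expansion, for 125x overall. That is what exponential rather than linear growth looks like.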
As part of this change, primary stats were removed from trinkets, rings and necklaces so as not to rush toward that horizon as quickly. Trinkets saw the largest overhaul, though. Over time, trinkets had become more standardized. That often meant they carried a secondary or primary stat plus a very large short-term bonus to a secondary or primary stat, either triggered by chance or released on command. Damage dealers milked that system to produce the insane numbers discussed above by stacking these bonuses. When you stack bonuses, they synergize and have a multiplicative effect, as opposed to the additive effect they have when used independently. Think of the difference between 10 + 10 and 10 x 10. Trinket bonuses had to be reined in to curb the escalation of numbers, since multiplying one’s way there was far too fast. Part of this meant trinket bonuses were limited to secondary stats instead of primary stats, roughly halving their effect.
A rough graph of what player power looks like post-squish. The upturned points denote raid content. The gear progression no longer takes these into account, since no one goes into old raids to gear up while leveling. It’s too time-consuming and not worth the effort.
But that horizon of absurd gear numbers keeps approaching faster with the introduction of new raid difficulties over the past few years. In the beginning there were simply raids, so the escalation of gear was small. In the 3rd expansion, raids were offered in 2 difficulties, originally separated by size (10/25 man) and later separated into normal/heroic. Heroic raids needed to offer better gear than normal, both because they were more difficult and needed better gear to overcome, and because they had to offer proper rewards. So each raid tier meant a bigger jump in item level to account for the higher gear rewards. Looking For Raid (LFR) added another difficulty and set of gear rewards to this cycle, and Flex raiding introduced a 4th 2 years later. So what might have been only a 5 item level jump between raid tiers has instead become a 20 item level jump. Why add new raid difficulties? Blizzard released internal metrics a few years ago saying that only about 10% of the player base actually got to raid during current content before LFR was created. Since LFR, that figure had grown to 48% of the player base, and I suspect it has gone up in the 3 years since those stats were released. The game is so focused on end-game content that it’s a good thing to want more of your player base to not feel barred from it.
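The 10 + 10 versus 10 x 10 comparison can be restated in terms of percentage bonuses, which is closer to how stacked trinket procs behave. Here is a minimal sketch, assuming every active bonus is a flat percentage increase to damage. The proc values are made up for illustration.

```python
from math import prod


def additive(bonuses: list[float]) -> float:
    """Damage multiplier if percentage bonuses simply add: +30% and +20% -> x1.50."""
    return 1.0 + sum(bonuses)


def multiplicative(bonuses: list[float]) -> float:
    """Damage multiplier if each bonus multiplies the others: +30% and +20% -> x1.56."""
    return prod(1.0 + b for b in bonuses)


if __name__ == "__main__":
    # Hypothetical proc values stacked into one burst window.
    procs = [0.30, 0.20, 0.20, 0.15]
    print(f"additive:       x{additive(procs):.2f}")
    print(f"multiplicative: x{multiplicative(procs):.2f}")
```

With these four made-up procs, the additive total is x1.85 while the multiplicative total is about x2.15. The gap widens with every additional bonus stacked into the same window.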
You can see the slowly increasing jump of item levels in gear with each passing expansion due to new raid difficulties being implemented.
The complexity issue needed to be addressed as well. Reforging was removed for obvious reasons. Hit Chance and Expertise were also removed because they added more problems than fun. With the greater emphasis on primary stats, Hit Chance and Expertise were too powerful. Players no longer chose them but were obligated to pursue them, so they functioned as a sort of stat tax: players had to commit x amount of gear to satisfying them before moving on to anything else. In their place, we got 2 new damage-enhancing secondary stats: Multistrike and Versatility. Multistrike worked like a mini-me version of Critical Strike. Instead of infrequent large hits, it produced more frequent smaller ones. I wrote a blog entry on Multistrike a while back arguing that it’s the designers’ preferred Critical Strike, because it’s more predictable and easier to model for balance purposes. Versatility was a flat bonus to all damage dealt and a reduction to all damage taken.
Another, smaller change was the Warforged system. Whenever gear dropped, it had a chance to become a variant of the standard item, randomly receiving 1 or more of 3 bonuses:
- +6 item level, for an incremental increase in stats
- a socket for a player-made gem with a stat bonus
- a new set of stats called tertiary stats
Tertiary stats did not affect dps. They were little perks like a 1-2% bonus to movement speed, a 1-2% self-heal based on damage dealt, or gear that took no damage when you died, saving on repair costs. Since these did not boost output, players’ response to tertiary stats was lukewarm. These bonuses are small, but they were included for raiders who had thoroughly beaten a raid and had no more gear to gain. Now they always had a chance of getting a minor upgrade to their current gear, which would hopefully keep them from getting bored. For everyone else, it was just meant to be a little something extra beyond the expected reward. Blizzard has held to the belief that these unexpected extras are what breed excitement for players, and they pursue this as a design goal.
The problem with this iteration was that there were simply too many secondary stats. You may ask how, since 2 were taken away and 2 were added. But the 2 taken away weren’t choices; they were absolute must-haves with satiation points. The 2 new ones were on equal footing with the other 3 damage-enhancing stats, where more is more. First, gear carried a combination of 2 of these stats, or sometimes a single stat in a larger amount, so there were now 20 combinations of secondary stats. That made it harder for players to find the optimal gear. Second, secondary stats aren’t created equal. They perform with different effectiveness for each class depending on how it’s built. So more secondary stats means greater variance in player performance and a harder time closing that gap. Moreover, with 20 stat combinations and Warforged procs, it still wasn’t simple to determine whether a piece of gear was an upgrade, a downgrade or a wash. Players still had to resort to third-party programs to crunch the numbers for them. Lastly, Multistrike was made too powerful, so almost every specialization in the game had it among its top 3 desired stats, which left little real choice.
Legion
This brings us to the current expansion. Multistrike was removed as a secondary stat. This cut the number of possible gear combinations by a full 1/4, to something more manageable, and removed some randomness. Two things combined to bring home to roost the problem of numbers inflating to absurd proportions. One was the standardization of item levels, which provided the backbone for empowering primary stats and a core to balance the game around. The other was the addition of new raid difficulties to open raiding up to the masses. Blizzard is still combating that, and we’re likely to see another retroactive stat squish in the next expansion. They were too pressed for time to do it for the one that just launched. To cover for this, they removed UI options from the game’s menu to obscure player numbers. We now see just a health bar instead of the bar plastered with our millions of health. Our damage taken has always been displayed as scrolling numbers. Those are now massive and difficult to read in a split second, so they’re hidden entirely. Instead, we get a cloudy red haze over the screen as we near death, the way most cover shooters do it.
A new design philosophy was introduced. As gear escalates over the course of an expansion, players don’t just end it far more powerful; their class also handles differently. For example, a Fury Warrior is centered on scoring critical strikes with an ability that sends them into an enraged state, where they do a large amount of bonus damage (figure 20-40% depending on gear). This enrage window only lasts 6 seconds, though. Fury works off a resource called rage and can stockpile enough of it to let off about 4 abilities, all of which fit into the brief enrage window. The ability that triggers the enrage has a 4-second cooldown, so every 4 seconds you have a chance to do really great damage, and that’s fun. That chance is determined by the player’s Critical Strike chance. At the start of an expansion, a player may have a 35% chance to crit with that ability. So on average they go about 12 seconds between 6-second enrages, roughly a 30% uptime spent in the fun-zone. The damage in that window is so much higher that the player spends the downtime conserving rage in order to spend it all in that brief window. This means waiting and pressing as few buttons as possible. By the end of the expansion, that chance will have grown to 70 or 80%. The player is then enraged for something more like 14 seconds out of an 18-second window, spending most of their time in the fun-zone. About half of the specializations have some equivalent: the secondary stats accrued over the course of the expansion let the class play more smoothly and stay in its zone. So player enjoyment goes up over the course of the expansion as secondary stats generate higher percentages of random events. To address this, gear started carrying a large increase in secondary stats in order to rush players into the zone far earlier in the expansion.
This is a good design philosophy for evening out the fun curve and removing the delay in getting there. But it has created a new set of problems: the abundance of secondary stats on gear has made them more powerful than primary stats. The first half of the problem is that this exacerbates the discrepancy between optimal and sub-optimal secondary stat builds. That undermines the balance struck by the revision to primary stats, in which they provided a strong baseline of performance to balance the game around. Second, with primary stats no longer the most potent, item level can no longer answer whether a piece of gear is actually an upgrade. Secondary stats create so much variance in performance that having more is not always better. It’s the difference between having 10 pennies and 5 nickels. Sure, 10 is more than 5, but the 10 pennies are worth less, so it’s not really more. So players still need to consult a third party to evaluate whether gear is an upgrade. In some cases it’s clear, but about half the time the player has no clue. I should clarify that this problem should even out in time, though I’m not sure how long that will take. The big jump in secondary stats was there for launch and will not keep escalating at that rate, so eventually primary stats will reassert their place in ratings. Blizzard is aware of this problem and discussing how to fix it in the future.
Muddying the waters further, the Warforged system was expanded to scale even higher. Bonus item level procs now come in increments of 5 instead of 6, an easier number scheme to count with. But the proc can now stack more than 15 times over, meaning a player could earn a piece of gear that should be item level 815 but turns out to be 895. Sure, that’ll be exciting when you get it, because you’re not going to find anything better for the next 6 months. But it also means every piece of gear you get for that slot over the next 6 months is probably worthless and not worth getting excited about or pursuing. And what if that 1-in-a-million bit of luck (fairly accurate odds, actually) boosted a piece of gear with terrible stats? An item 20 item levels lower might improve on it. That wouldn’t feel so good anymore. The effect is that viable gear comes from every area of the game. First, we once more need a third party to sort through dozens of possible options for every slot to find what’s an upgrade and what isn’t. Second, Blizzard looks at this as a plus because no content becomes obsolete (which is what we get for leaving the game in droves last expansion over a lack of content). But players respond by obsessing again over running all the content that might offer an upgrade. They devote more play time to their main, neglect alts, and devote more play time in general. I’ve mentioned player obsession twice now, and it should be made clear that we do this to ourselves, because players who twink want every advantage. It’s a sad truth, but designers need to be careful to save players from themselves. Our eyes are always bigger than our stomachs, so to speak. A designer has to strike a difficult balance between giving us what we want and knowing when to cut us off.
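The Fury example can be turned into a small probability model. This is a sketch under simplifying assumptions: a press exactly every 4 seconds, independent crits, and a 6-second enrage that restarts on each crit. It ignores rage entirely, so its exact percentages won’t match the rough figures above. What it shows is how uptime climbs as Critical Strike chance rises.

```python
import math


def enrage_uptime(crit_chance: float, cooldown: float = 4.0, duration: float = 6.0) -> float:
    """Long-run fraction of time spent enraged under a simplified model.

    Assumptions (illustrative, not the game's exact rules): the trigger ability
    is pressed exactly every `cooldown` seconds, each press crits independently
    with probability `crit_chance`, and a crit (re)starts an enrage lasting
    `duration` seconds. A moment is covered if any press in the preceding
    `duration` seconds crit.
    """
    presses_per_window = duration / cooldown  # 1.5 presses per 6 s window with the defaults
    k = math.floor(presses_per_window)        # every window holds at least k presses...
    extra = presses_per_window - k            # ...and k + 1 presses this fraction of the time
    no_crit = 1.0 - crit_chance
    return (1 - extra) * (1 - no_crit ** k) + extra * (1 - no_crit ** (k + 1))


if __name__ == "__main__":
    for crit in (0.35, 0.55, 0.75):
        print(f"{crit:.0%} crit -> {enrage_uptime(crit):.0%} of the time enraged")
```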
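The pennies-versus-nickels problem is why players lean on stat weights: a per-point value for each stat, specific to a spec. The minimal sketch below uses made-up weights for a hypothetical spec. It shows how a drop with a higher item level can still score lower than what you’re wearing. Third-party tools typically go further, for example by simulating combat, so treat this as an illustration of the idea rather than how any particular tool works.

```python
# Hypothetical stat weights for one spec: the value of a single point of each stat.
# Real weights differ by spec and by the rest of a player's gear; these are made up.
WEIGHTS = {"agility": 1.00, "crit": 1.10, "mastery": 1.05, "versatility": 0.85, "haste": 0.60}


def item_score(stats: dict[str, int], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of an item's stats; stats without a weight count for nothing."""
    return sum(weights.get(name, 0.0) * amount for name, amount in stats.items())


if __name__ == "__main__":
    # Two hypothetical items for the same slot.
    equipped = {"item_level": 840, "stats": {"agility": 1000, "crit": 600, "mastery": 500}}
    new_drop = {"item_level": 850, "stats": {"agility": 1070, "haste": 700, "versatility": 450}}
    for name, item in (("equipped", equipped), ("new drop", new_drop)):
        print(f"{name}: ilvl {item['item_level']}, score {item_score(item['stats']):.1f}")
```

Here the new drop is 10 item levels higher, but its stats land on lower-weighted stats, so it scores below the equipped piece.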
Related to the Warforged system is the expansion of dungeons through the Mythic+ system. Dungeons used to cap at heroic, so after a few weeks raiders were done with them. All of their gear came from raids or from vendors associated with those raids. The Mythic+ system lets dungeons scale up infinitely, providing an ever-increasing challenge for players that rivals raids and offers equivalent gear. Except there are 10 dungeons, and they have very large loot tables. Moreover, at these higher difficulties the loot doesn’t drop from the bosses but from a chest at the end of the dungeon, and that chest’s loot table is the entire dungeon’s. To clarify, let’s say a boss drops 12 pieces of gear in total, 3 of which can be used by any given player, and the player wants 1 of those 3 specifically. If the boss dropped the loot and you won a piece, you’d have a 33% chance of getting what you wanted. Now say there are 5 bosses in the dungeon, each of which drops 3 pieces of loot you can use, but only 3 pieces across the whole dungeon are actually useful to you. The chest at the end draws from all 15 of those pieces, giving you a 20% chance of getting loot you want, assuming any loot drops at all. Then factor in that the gear that does drop may have a Warforged proc, elevating its rating. Players now need to look in several places to find the best gear. Diversity is fine, but it also means contending with exponentially more randomness in the quest for gear.
On the simplification side of things, players no longer hunt for weapons in this expansion. Blizzard assessed that weapons had stopped being an interesting piece of gear. They had become just another item collected for its stats, lost in the cumulative stats of the other 13 pieces of gear. So Blizzard removed weapon drops and instead gave each specialization a unique weapon. It grows over time and provides new abilities and bonuses that are essentially baked into the class’s balance. This was also part of their push to make each class and specialization feel more unique and more fully immersed in its fantasy. This removed our need to farm for another piece of gear, but it meant a year-long journey to fully empower the weapon. So far, that has proven to be a more interesting grind.
Conversely, there are several new legendary pieces of gear to be farmed in the world. They’re the most powerful pieces of gear we can acquire, and they render almost any other gear for the same slot useless. The idea, once more, is that we’ll be excited when we get one. But then we won’t care about any other gear we get for that slot for months to come. Moreover, the grind to get one of these is long and arduous and based solely on luck. It has what Blizzard refers to as bad luck protection. Let’s say you run content that has a 0.5% chance of dropping one of these items. If it fails to drop, the next time it becomes a 0.6% chance, and so on, until eventually you get one. Except there are 8 of these items, which one you get is random, and players consider only 2 or 3 of them worth having.
There are a handful of design goals at work here, and they don’t always play well with each other:
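The loot comparison above reduces to one ratio per case. Here is a minimal sketch, assuming every usable item in a pool is equally likely. For the multi-run figure it also assumes each run yields exactly one usable item; the caveat above that loot may not drop at all is not modeled.

```python
from fractions import Fraction


def chance_wanted(wanted: int, usable_pool: int) -> Fraction:
    """Chance that one usable drop is an item you want, if every usable item is equally likely."""
    return Fraction(wanted, usable_pool)


def chance_within(runs: int, per_run: Fraction) -> Fraction:
    """Chance of at least one wanted drop across several runs, assuming one usable drop per run."""
    return 1 - (1 - per_run) ** runs


if __name__ == "__main__":
    boss = chance_wanted(wanted=1, usable_pool=3)    # one boss, 3 usable items
    chest = chance_wanted(wanted=3, usable_pool=15)  # 5 bosses' usable loot pooled into one chest
    print(f"boss drop: {float(boss):.0%}, end-of-dungeon chest: {float(chest):.0%}")
    print(f"chest, within 5 runs: {float(chance_within(5, chest)):.0%}")
```

Exact fractions keep the arithmetic free of rounding: 1/3 for the single boss, 1/5 for the pooled chest.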
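The escalating chance in that example can be worked through exactly. The minimal sketch below uses only the illustrative numbers above: a 0.5% base chance, plus 0.1% per failed attempt. The real bad luck protection values may differ, and which of the 8 items you receive is not modeled.

```python
def drop_chance(failures: int, base: float = 0.005, step: float = 0.001) -> float:
    """Chance on the next attempt after `failures` misses (0.5% base, +0.1% per miss, as in the example)."""
    return min(base + step * failures, 1.0)


def expected_attempts(base: float = 0.005, step: float = 0.001) -> float:
    """Average number of attempts until the first drop under the escalating-chance rule.

    Uses E[N] = sum over n >= 1 of P(N >= n): it adds up the probability of
    still being empty-handed before each attempt.
    """
    expected = 0.0
    still_waiting = 1.0  # probability that no drop has happened yet
    failures = 0
    while still_waiting > 0.0:
        expected += still_waiting
        still_waiting *= 1.0 - drop_chance(failures, base, step)
        failures += 1
    return expected


if __name__ == "__main__":
    print(f"with bad luck protection: {expected_attempts():.1f} attempts on average")
    print(f"flat 0.5% chance: {1 / 0.005:.0f} attempts on average")
```

Compared with the 200-attempt average of a flat 0.5% chance, the escalation cuts the expected wait sharply. It still says nothing about whether the item that finally drops is one of the 2 or 3 worth having.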
- Higher item level gear should be an upgrade – players should not need a third-party program to evaluate gear
- Keep content relevant to players – this comes in the form of casual players having access to raids but also means raiders shouldn’t discount content they’ve out-geared.
- Gear rewards should provide occasional moments of joy when something really awesome drops
- Players shouldn’t want to vendor off a gear reward they worked for, especially one of a higher item level
- A class should be fun to play early in the expansion, rather than needing a year to hit its sweet spot
- Keep the system’s numbers to a reasonable level
- Stats on gear should feel meaningful
- Have a baseline to be able to balance such a complex game
- Embrace class fantasy – there are 12 classes with 36 specializations in the game and they should each feel unique in gameplay yet still arrive at roughly equal performance