Does diversity make a good metagame?

If you follow discussions of Magic: the Gathering metagames, especially Modern, it seems common knowledge that a diverse metagame is a good metagame. The more viable decks there are, the more fun you can have, right? While I don’t necessarily disagree with this statement, I am always wary of trusting common knowledge – especially common knowledge formed from anonymous internet discussions.

For those who don’t know me, my background is in biological science, and by harnessing years of student debt, I hope to apply some scientific method to assess whether diversity is in fact good for Magic. In nature, biodiversity is a great tool for assessing how healthy an area is. For example, having lots of different plant species present in a field results in a greater variety of insects and birds which can forage for their favourite foods. Consequently, biological methods are quite good at finding ways to quantify how diverse an area is.

When I think about the top decklists for a GP, I can’t help but imagine a similar situation, one where each deck archetype is its own “species” and together they form the biodiverse landscape of the format. I think we would all agree that a diverse Magic meta includes both a lot of different decks (richness) and that none of these decks are overwhelmingly present compared to any other (evenness). It is these concepts of richness and evenness that we will be using to define diversity, and in turn to see if “good” and “bad” metas have different levels of diversity.


This is our plan of attack:

First we will explore exactly how we calculate diversity.

Second, we will introduce our six candidate Standard metagames, spanning the last fifteen years.

Third we will process these Standard samples, and with a bit of statistics, see if they have different diversities.


So to begin with, a science lesson…

Research Assistant and Lab Maniac

Simpson’s Diversity Index is (you guessed it) a measure of diversity. It combines our two concepts: richness, the number of different species, and evenness, how abundant each species is relative to the others. The Index is below, but to really understand it we’re going to run through an example together.

Simpson's Diversity Index

D is our Simpson’s Diversity Index value, and the higher it is the more diversity is present. Lowercase n is the number of individuals in each group, while uppercase N is the total number of individuals. The weird E (an upright capital Greek Sigma) just means sum of, so it’s showing the total of n(n-1) for all of the different groups.
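For anyone who prefers to see it typed out, here is the index in LaTeX notation. This is a reconstruction from the symbol descriptions above and the worked example that follows, so treat the exact layout as an assumption rather than a copy of the original figure.

```latex
D = 1 - \frac{\sum n(n-1)}{N(N-1)}
```

Here the sum runs over every group (species or deck archetype), n is the count in that group, and N is the total count across all groups.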

So, let’s say that you’ve headed out to a field, waved a net around, and captured sixteen butterflies. You group them by species and count how many of each species makes up your total sixteen.

Butterfly Example

We know we caught sixteen overall so N=16, and N(N-1) is 16*15 is 240. As we use n(n-1), all of the species we only found one of end up as 1*0, so we don’t need to worry about them. The remaining species give us 6 (3*2), 12 (4*3), and 2 (2*1). If we sum those together, we get a total of 20.

Plug these numbers into the equation, and we have 1-(20/240) = 0.916667, which for our purposes we’ll say is 0.92. Et voila! Our diversity index (or SDI) for the field is a very diverse 0.92!
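If you would rather let a computer do the arithmetic, the same calculation can be sketched in a few lines of Python. The function name `simpsons_diversity` is my own invention for illustration, and the list of counts assumes seven single-specimen species, which is inferred from the total of sixteen butterflies minus the groups of three, four, and two.

```python
def simpsons_diversity(counts):
    """Return 1 - sum(n*(n-1)) / (N*(N-1)) for a list of per-group counts."""
    total = sum(counts)  # N: total number of individuals
    if total < 2:
        raise ValueError("need at least two individuals to compute the index")
    # Groups with a single individual contribute 1*0 = 0 to this sum.
    pair_sum = sum(n * (n - 1) for n in counts)
    return 1 - pair_sum / (total * (total - 1))


# Butterfly example: groups of 3, 4 and 2, plus seven species seen once
# (the seven singletons are an inference from N = 16, not a quoted figure).
butterflies = [3, 4, 2, 1, 1, 1, 1, 1, 1, 1]
sdi = simpsons_diversity(butterflies)
print(round(sdi, 2))  # 0.92
```

Swapping the list for the deck counts of a top 16 gives the SDI of that event in exactly the same way.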

Obviously this field isn’t real, but GP Warsaw back in 2013 was, which is what I’ve chosen for our sample of the beloved Innistrad – Return to Ravnica Standard. If we remove those butterfly species and replace them with the ten different decks that made up the top 16 of that event, you can begin to see my justification for using this approach.

Metagame for INN-RTR
If that’s not good enough to convince you, the Royal Geographical Society even suggests using Simpson’s Diversity beyond ecological applications.

Dreadbore and Huntmaster of the Fells

Now that we’ve learnt how to delve into diversity, it’s time for me to show which metas I’ve chosen to compare, and how they’ve been grouped into “good” and “bad” metas. I’d like to clarify that all of Team MoM helped me decide on these metagames before I told them what the article was about, so as not to bias the suggestions.

The Good:

ISD-RTR: The Fan Favourite GP Warsaw 2013

We’ve already shown this was a very diverse event, and from my understanding the ability to play almost whatever you wanted is why this period of time is so well remembered. The wealth of powerful cards meant you felt evenly matched whether playing Liliana of the Veil, Snapcaster Mage, or Huntmaster of the Fells – most decks felt comfortable.

IXL – GRN: Recently well-received GP Shizuoka 2018 

It’s hard to be nostalgic about something so recent, but general consensus seems to be that recently Standard has been in exceptional shape. A variety of strong and proactive strategies existed at the tail-end of last year, and I think control being non-dominant also appealed to a lot of players. Carnage Tyrant, March of the Multitudes, Niv-Mizzet, Parun – all of these made sure whatever your inclination you could play something viable.

ALA-C11: Calm before the Caw GP Manila 2010

Bloodbraid Elf, Jace, the Mind Sculptor, Noble Hierarch. Who needs Modern when you had ALA-C11? A lot of very powerful cards meant a lot of powerful and interesting strategies could exist without any one deck ruining the format. 

The Bad:

ZEN-NPH: The Caw Blade Era at GP Singapore 2011 

Did you ever hear the tragedy of Caw Blade? I thought not. It’s not a story that Standard players would tell you. It’s a pre-Modern legend. Caw Blade was a deck so powerful and format-warping it could use Squadron Hawk carrying a Batterskull to gain you life… It had such dominance it could use Jace, the Mind Sculptor and Stoneforge Mystic. It became so powerful the only thing it was afraid of was totally killing Standard as a format, which eventually, of course, it did. Unfortunately it taught WotC all it could, and then WotC banned the deck out of existence shortly afterwards. Ironic. Other decks could not beat it, but it brought about its own demise.

BFZ-KLD: The Era of Aetherworks and Emrakul GP Madrid 2016

There were actually a lot of deck choices around this time period, but it turns out that not printing answers to mechanics in the set they feature in leads to a generally miserable time. This particular GP was subject to the fun of Aetherworks Marvel cheating titans into play and Emrakul, the Promised End stealing turns from the opponent, before they were banned (or rather, “imprisoned in the moon”).

ONS-DST: Affinity Aggravation GP Bruxelles 2004

Affinity was a deck that abused Skullclamp along with cheap creatures and the artifact lands (such as Vault of Whispers) to do all manner of unfair things. Peak Affinity is generally considered to have arrived with the release of Fifth Dawn, when Cranial Plating helped it edge out Goblins as a tier 1 deck. Sadly for us, the only deck information I could get for that era was the top 8 of GP Kuala Lumpur 2004, so we’ll have to go with the slightly less miserable cutoff at Darksteel.

The Ugly:

Unfortunately, with a lot of these events I couldn’t get more than the top 16. Since I want to make sure each event has the same sample size, I wasn’t able to carry out my original plan of looking at top 64s. If you’d like me to look at the wider metagame, please leave a comment on Facebook or Twitter!

Aetherworks Marvel and Squadron Hawk

With that out the way, let’s get on with the results! (Don’t worry, I’m not going to go step by step through all of it!)

Here are the raw numbers from the top 16s of the GPs we’re looking at.

Metagame Diversity Roundup
Other than INN-RTR being particularly diverse and ZEN-NPH being exceptionally lacking, there’s a lot of overlap. But why should we stop at just looking when we could use… statistics! I don’t want to subject you all to another lesson, but the short story is that we’re going to have a look at the mean values and then use something called a t-test to see if the two groups are distinct from each other (if you want to know more, I highly recommend Crash Course). In the handy graph below, the bars are the mean values for each group, with the sideways H things displaying the standard deviation, which shows how much the values vary within each group.

Plot of Standard diversity from different GPs

Though the mean values are quite far apart, our standard deviation bars suggest a lot of overlap, meaning that our Good and Bad metas don’t really differ that much in terms of diversity. Our t-test says the same thing: with a p-value well above the conventional 0.05 cutoff, it shows only weak support for the diversity of the two groups being different (t=-1.3, df=2.98, p=0.285 for anyone wanting the full information). To improve our confidence in this result, we would want to look at more top 16 GP samples to make sure this wasn’t just a fluke.
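For the curious, here is a rough sketch of how a t statistic and its degrees of freedom can be computed by hand. The fractional degrees of freedom reported above suggest the unequal-variance (Welch) form of the t-test, but that is my assumption, and the function name `welch_t` and the two sample lists are hypothetical placeholders, not the SDI values from the table.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    # Squared standard error of each group's mean (sample variance / n).
    se_a = variance(a) / len(a)
    se_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df


# Placeholder numbers purely to show the mechanics (NOT the article's data).
group_a = [1, 2, 3]
group_b = [2, 4, 6]
t, df = welch_t(group_a, group_b)
print(round(t, 2), round(df, 2))  # -1.55 2.94
```

Turning t and df into a p-value needs the t-distribution, which the standard library does not provide. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the Welch version and reports the p-value directly.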

Wrap up

So what did we learn from all this?

We can definitely take away that particular events can show something being very wrong, or very right. I never played in the Caw-Blade era myself but I’d heard the horror stories, and the format being 25% less diverse than when Aetherworks Marvel was in Standard really put it into perspective. In contrast, I was surprised by just how diverse INN-RTR was, with so many strong cards and viable deck strategies being available.

If we take what I’ve done at full face value, then it seems that diversity isn’t really indicative of which formats are good or bad, or at least it isn’t the only factor. Another point is that top 16s show only the highest performing players in big tournaments, so if there’s a high number of tier 2 or 3 decks, your experience at your average FNM won’t be reflected. A lot of other factors can contribute to how much people enjoy formats: it could be that people enjoy a format less when the best decks are control or combo, or when every deck is reliant on an expensive manabase (cough KTK-OGW 4c Control cough).

Likewise, looking at just three GP top 16s for each group isn’t terribly reliable. If we took, say, ten GPs and looked at the top 64 of each, we might find that our initial results are more of an outlier, and our conclusions would come crumbling down… or maybe it would show them even more clearly! It’s worth noting, too, that “good” and “bad” metas are subjective – maybe you loved Caw-Blade, or maybe you think that GP Madrid isn’t reflective of when that Standard format was at its worst. Out of interest, I looked at the top 8 of GP Kuala Lumpur 2004, when Fifth Dawn was in Standard, and it had an SDI of 0.6, so the difficulty of finding information for older GPs is frustratingly a bit of a limitation of this method.

I hope you enjoyed the article; I’d love to do more like it. I consider this a proof of concept more than anything else, and I’m already working on a project using top 64 decklists. If you like this approach to Magic, or have any ideas to suggest, then please let us know through the website, Facebook, or Twitter!

