The Math Behind ELO
If you’ve ever participated in some form of ranked sport or video game, chances are that you have heard about the ELO Rating System. Originally designed for ranking professional chess players, it has been adopted to many other domains such as college football, Scrabble, and video games. ELO’s popularity has been primarily driven by its impressive accuracy considering its simplicity, making it an ideal solution for someone that wants to add a ranking system to their own projects. even the initial version of the “Facebook Algorithm” was implemented using ELO!
I’ll give a brief overview of the ELO algorithm, its applications, and how it can be extended to better model the skill distributions of various playerbases.
The Algorithm #
ELO is generally used to rank a set of players by using a set of historical match results. Suppose we have 4 competitive pingpong players; Amy, Brad, Cindy, and Dirk.
Winner  Loser  ELO Change  Amy  Brad  Cindy  Dirk  

      1200  1200  1200  1200  
Amy  Brad  20  1220  1180  1200  1200  
Dirk  Cindy  20  1220  1180  1180  1220  
Amy  Cindy  18  1238  1180  1162  1220  
Dirk  Cindy  17  1238  1180  1162  1237 
Above, the ELO Change column is the result that ELO algorithm generates for us. Given three values  PlayerA ELO, PlayerB ELO, and PlayerA’s Score (whether they win or lost)  the algorithm calculates the change amount, which represents the amount of ELO points that playerA will take from playerB.
The concept of an ELO implementation can be defined in a few steps:
 Every player in a league has an “Elo Rating”, which is a point total representing their skill level relative to the rest of the league.
 New players are automatically assigned an initial rating.
 When matches are completed between two players, their ELO points are compared. The winner of the match will “take” a certain number of ELO points from the loser.
 Players with higher ELOs are expected to win, so winning will net them few points. Conversely, when an underdog wins, they take many points from the winner.
Performance in an ELO system is not measured absolutely; rather, it is inferred from wins, losses, and draws against other players.
The actual ELO algorithm consists of 5 parts:
 Rn  The new rating of the player (After the match)
 Ro  The old rating of the player (Before the match)
 S  The actual score/outcome of the match
 E  The expected score/outcome of the match
 K  The Kfactor
Rn = Ro + K * (S  E)
Rn and Ro should be selfexplanatory; these are simply the previous ELOs of each player. To fully understand the algorithm, lets take a deeper look at S, E, and K.
S (Actual Score) #
This is the actual score of the match. It will always be 0 <= x <= 1
. In the case of chess:
 a win counts as 1,
 a tie counts as 0.5,
 and a loss counts as 0.
In each case, this is the outcome for the player whose ELO is currently being calculated (So in our example above where Amy beat Brad, Amy would have a score of 1 for this calculation, while Brad would have a score of 0).
Note: Some elo implementations will account for the margin of victory (One example being 538’s NFL Ratings). This is nonstandard, and will not be covered it in this post.
E (Expected Outcome) #
This is the meat of the algorithm. It takes two inputs: Ra and Rb (The old ratings for player A and player B), then returns an expected win percentage for player A. It uses a Cumulative Distribution Function for Score Estimation.
While you don’t need to know what a CDF is, it is helpful to understand: 1 2
While any CDF can be used as a score estimation function, the most popular choices are the Normal and Logistic functions. While Arpad Elo’s original algorithm utilized the Normal CDF, the World Chess Federation has since transitioned to use the Logistic implementation.
(Logistic is blue, Normal is red)
The core difference is that the default Logistic distribution has thicker tails, meaning that “upsets” are assumed to be more likely (which helps account for inconsistency in player ability). I’ll show how to implement the algorithm using a Logistic CDF, but note that a Normal CDF (or any other CDF) could be easily substituted for the score estimation function.
Score Estimation  Logistic CDF
The Logistic CDF looks like this:
I’ve implemented it in pseudocode below .
(You can see that we use mean μ=0)
x = Ra  Rb
exponent = (x/scale)
expected = 1/(1 + e^exponent)
There are a few parts here:
 Ra  Player A’s old elo
 Rb  Player B’s old elo
 x  The difference of Ra and Rb
 e  Euler’s Number (A mathematical constant)
 scale  The preselected standard deviation for the scoring system
The scale affects the “width” of the distribution. Larger scales will stretch the distribution, increasing the ELOs of the best players, while decreasing the ELOs of the worst players (Conversely, smaller scales will compress the distribution).
However, the scale parameter by itself is difficult to reason about. How much more is a scale of 2 than a scale of 1? We can make some substitutions to this scale parameter to allow us to better reason about its relationship with player skill.
The FIDE Chess implementation transforms this scale into a factor which represents a “10X” skill differential. By substituting s > n/ln(10)
we are now effectively saying that if PlayerA has n more ELO points than PlayerB, they are 10 times better (and should win 10/11, ~90.1% of the time).
[See footnote for indepth explanation of this substitution]^{1}
Our formula becomes:
x = Ra  Rb
s = n/ln(10)
exponent = (x/s)
E = 1/(1+e^exponent)
Which can be further simplified into:
x = Ra  Rb
exponent = (x/n)
E = 1/(1+10^exponent)
Now, after selecting a value for n, all you need to do is pass in Ra and Rb, and the Score estimation result will be returned. (The FIDE implementation uses n=400, but any positive value can be used). Let’s take a look at a few examples.
# When players are equal strength (n = 100)
A = 1000
B = 1000
x = (1000  1000) = 0
exponent = (x/100) = 0
expected = 1/(1+10^exponent) = 0.5
# ∴ Score estimation for player A = 50%
# When player A is 10x better (n = 100)
A = 1100
B = 1000
x = (1100  1000) = 100
exponent = (100/100) = 1
expected = 1/(1+10^exponent) = 0.9090...
# ∴ Score estimation for player A = 90.90%
# When player A is 40 ELO points better (n = 100)
A = 1234
B = 1184
x = (1234  1194) = 40
exponent = (40/100) = 0.4
expected = 1/(1+10^exponent) = 0.7152
# ∴ Score estimation for player A = 71.52%
# Calculation for player B
x2 = (1184 1234) = 50
exponent2 = (50/100) = 0.5
expected2 = 1/(1+10^exponent) = 0.2847
# ∴ Score estimation for player B = 28.47%
# Notice how it equals (1estimation(A))
Now, we can make predictions of match outcomes  we just need to figure out the Kfactor multiplier that is required to calculate the postmatch ELO.
Kfactor #
An effective Kfactor is an extremely important part of any ELO implementation. The Kfactor determines the “sensitivity” of how wins and losses impact your ELO.
Suppose we use the following model:
 everyone starts with an ELO of 1200
 we use a score estimation function where n=400.
If K is large (eg. 70), then unexpected wins (or losses) can have a huge impact on a player’s ELO, and cause the rankings to be volatile.
playerA = 1000
playerB = 1300
# playerA is expected to lose against playerB
E = 1/(1+10^(300/400)) = 0.150979557
# playerA wins against playerB
playerA = 1000 + 70 * (1  0.150979557)
# ∴ playerA = 1059.43143
# ∴ playerB = 1240.56857
This is a huge shift in ELO  even though playerB is expected to be ~5.6 (0.849/0.151) times better than playerA, a few lucky wins for playerA (or a few unlucky losses for playerB) could easily close the gap between their ELO rankings.
However, small K factors (eg. 5) can result in stagnant rankings, and would make it very difficult for players to gain ELO  even if they have been practicing/training, and are much more skilled than they had previously been.
playerA = 1000
playerB = 1300
# playerA is expected to lose against playerB
E = 1/(1+10^(300/400)) = 0.150979557
# playerA wins against playerB
playerA = 1000 + 5 * (1  0.150979557)
# ∴ playerA = 1004.2451
# ∴ playerB = 1295.7549
It could take hundreds of games for playerA to match playerB’s ELO, even if playerA has trained and is now of similar skill [to playerB].
Kfactor and Score Estimation
One last thing to consider is that Kfactor should be appropriate for the current value of n (for score estimation).
The Kfactor essentially represents the “Maximum” number of points that a player can gain during a match (If the players are of equal skill level, they will gain k/2 ELO points).
The n score estimation parameter represents the ELO point differential for a player that is 10x better than another player (meaning that a player with an ELO of 1400 is 10x better than a 1000 ELO player).
This means that, if a player always plays against players of equal ELO level, they would have to win n/(k/2)
games in order to be 10 times better than they originally were. For a typical online chess player, this would typically be 400/(15/2) = 53.33 games
, or K = 800/53.33
The official World Chess Federation ratings actually use a tiered Kfactor system, meaning that players could have different Kfactors (and recall that they use an n of 400 for score estimation):
 K=40 for new players until they play 30 games
 K=20 for players with > 30 games and never had an ELO > 2400
 K=10 for players with > 30 games and have had an ELO > 2400
This system assumes an amount of uncertainty for the ratings of new players, making it easier for them to reach their appropriate ELO quickly. It and also makes it more difficult for extremely skilled players to lose ELO (eg. due to unlucky losses). Similar implementations can be seen in video games such as League of Legends or Overwatch, where you must play 10 “placement matches” before your ELO becomes publicly available.
Note that when players have differing ELOs:
 playerA gains/loses
Rn = Ro + K_a*(SE)
 playerB gains/loses
Rn = Ro + K_b*(SE)
You can see that this differs from our analogy of “taking” points from the other player (because each player would gain/lose a different amount). That is okay and is intentional, but should be taken into consideration when deciding to implement variable Kfactors.
Soccer/Football rankings also use the Kfactor in interesting ways. Sometimes, models will give matches of high importance a higher Kvalue (also with n = 400):
 K=60 for World Cup Finals
 K=40 for World Cup Qualifiers
 K=30 for other other tournaments K=20 for friendly matches
In matches such as the World Cup Finals, the players are typically trying as hard as possible, so it is an accurate representation of their skill level. However, in friendly matches the team might not be trying as hard as they can  so upsets may be more likely  and thus why the ranking systems assign less importance to those matches.
To wrap up my recommendation for Kfactors:
 If few matches will be played (such as American Football, with ~16 games), then the rankings should be more volatile. Higher K factors [relative to n] would be better, so a K factor from
K = (2*n)/20
toK = (2*n)/25
is more appropriate.  If many matches will be played (such as Baseball, with ~162 games), then the rankings should be less volatile. Lower K factors [relative to n] would be better, so a K factor from
K = (2*n)/30
toK = (2*n)/45
is more appropriate.
(Recall that the denominator in the above calculations is the “Number of wins to become 10 times better”).
Some leagues will start with a base Kfactor for the first season or two of play, and then refine the Kfactor once they have more data. While Kfactor tuning is outside the scope of this post, you can take a look here, here, or here.
Some algorithms even use a variable K factor for every player, based on the current “confidence” of their ELO ranking. One fantastic example is the Glicko Rating System. Another example is the Sonas Rating System.
You can also read some more posts about the KFactor here, here, or here
Example Implementation #
At this point, we know how to design, tune, and implement an ELO algorithm from the ground up. Lets walk through the implementation and application of ELO for our initial pingpong matches.
First, we need to make 3 decisions.
 n for scoring estimation. Lets (arbitrarily) select 50.

Kfactor. Because it is unlikely that we will have a lot of matches, lets select a higher Kfactor [relative to n]:
K = (2*50)/20 = 5
.  Initial ELO. Lets (arbitrarily) select 100.
Now, we can see our initial table rankings (no games have been played yet.
Winner  Loser  ELO Change  Amy  Brad  Cindy  Dirk  

      100  100  100  100 
Game 1
We have the first game coming up: Amy vs. Brad. The score estimations are easy:
# x = Ra  Rb
# E = 1/(1+10^(x/n))
Amy_E = 1/(1+10^(0/50)) = 0.5
Brad_E = 1/(1+10^(0/50)) = 0.5
The game is played, and Amy wins! Lets calculate their new ELOs.
# Rn = Ro + K*(S  E)
Amy = 100 + 5*(1Amy_E) = 102.5
Brad = 100 + 5*(0Brad_E) = 97.5
The resulting table of rankings:
Winner  Loser  ELO Change  Amy  Brad  Cindy  Dirk  

      100  100  100  100  
Amy  Brad  2.5  102.5  97.5  100  100 
Game 2
In the next game Dirk plays against Cindy, and wins. Using similar calculations to those above, we will get the following result:
Winner  Loser  ELO Change  Amy  Brad  Cindy  Dirk  

      100  100  100  100  
Amy  Brad  2.5  102.5  97.5  100  100  
Dirk  Cindy  2.5  102.5  97.5  97.5  102.5 
Game 3
Now, Amy plays against Cindy. They have differing ELOs, so this calculation is a little more interesting than the last two:
Amy_E = 1/(1+10^(5/50)) = 0.557311634
Cindy_E = 1/(1+10^(5/50)) = 0.442688366
You can see that the algorithm has given Amy a slight edge. She has won one game, and Cindy has lost one game, so we know that Amy is better  but at this point, it is tough to say how much better she really is.
The game is played, and Amy wins again. Lets calculate their new ELOs.
Amy = 102.5 + 5*(1Amy_E) = 104.713442
Cindy = 97.5 + 5*(0Cindy_E) = 95.2865582
Some algorithms will round to a full integer, and others will leave it as a full float value. In our case, lets round to two decimal points.
Winner  Loser  ELO Change  Amy  Brad  Cindy  Dirk  

      100  100  100  100  
Amy  Brad  2.5  102.5  97.5  100  100  
Dirk  Cindy  2.5  102.5  97.5  97.5  102.5  
Amy  Cindy  2.5  104.71  97.5  95.29  102.5 
Game 4
Lets play one more match. Dirk vs. Cindy.
Cindy_E = 1/(1+10^(7.21/50)) = 0.417746101
Dirk_E = 1/(1+10^(7.21/50)) = 0.582253899
We can see that Dirk has the advantage. The game is played, and Dirk wins against Cindy. Their resulting ELOs:
Cindy = 95.29 + 5*(0Cindy_E) = 93.201269495
Dirk = 102.5 + 5*(1Dirk_E) = 104.588730505
Winner  Loser  ELO Change  Amy  Brad  Cindy  Dirk  

      100  100  100  100  
Amy  Brad  2.5  102.5  97.5  100  100  
Dirk  Cindy  2.5  102.5  97.5  97.5  102.5  
Amy  Cindy  2.5  104.71  97.5  95.29  102.5  
Dirk  Cindy  2.5  104.71  97.5  93.20  104.59 
A really interesting point to note is that Amy actually ends up with a higher ELO than Dirk  even though they started with the same ELOs before they played against Cindy (which they both ended up winning). The reason for this is that Amy played Cindy before Dirk did. At that point, Cindy had a higher ELO (and was expected to be better). By the time Dirk played her, Cindy had a lower ELO, and thus he received less points for winning (as his expected win chance was higher). This is an interesting side effect of ELO, and shows how ELO rankings can be relatively unreliable until many games are played (which is when the ELO rankings start to become a better representation of each player’s relative skill).
WrapUp #
All things considered, ELO is a fantastic ranking system. It’s relative simplicity makes it easy to implement and understand, while also making it easy to tweak the algorithm and apply it to many different situations.
The next time you need to add a ranking system to your app or game, I hope that you consider using ELO.
Footnotes #
^{1} We are trying to make the scale parameter more meaningful for our model. The goal will be to find a substitution ny for s such that, when playerA has n more ELO points than playerB, they will be expected to be 10 times better than playerB (Meaning that our of 11 games, they are expected to win 10 and lose 1). In this case we have arbitrarily chosen 10x  we could have also chosen 5x or 2x (and ended up with a different resulting model). The unknown that we are solving for is y.
# Original Logistic CDF
10*(1/(1+e^(x/s))) == 1/(1+e^(x/s))
# Substitute s > ny
10*(1/(1+e^(x/ny))) == 1/(1+e^(x/ny))
10*(1/(1+e^(n/(ny)))) == 1/(1+e^(n/(ny)))
10*(1/(1+e^(1/y))) == 1/(1+e^(1/y))
10*(1/(1+e^(1/y))) == 1/(1+e^(1/y))
# Reciprocals of both sides
(1/10)*(1+e^(1/y)) == 1+e^(1/y)
1+e^(1/y) == 10*(1+e^(1/y))
1+e^(1/y) == 10*+10*e^(1/y)
1+e^(1/y) == 10*+10/e^(1/y)
1+e^(1/y) == (10*(e^(1/y))*+10)/e^(1/y)
e^(1/y)(1+e^(1/y))== 10*(e^(1/y))*+10
e^(1/y)(1+e^(1/y))== 10 + 10*(e^(1/y))
e^(1/y)(1+e^(1/y))== 10(1 + e^(1/y))
e^(1/y)== 10
1/y== ln(10)
y == 1/ln(10)
# Therefore, s > n*(1/ln(10)) is the required substitution.