|2016 Major League Baseball||regular season (2016-10-02) (0)||all games (2016-10-20) (0)|
|2016 Major League Soccer||all games (2016-10-16) (0)|
|2016 NCAA Division I FBS||all games (2016-10-20) (1)|
|2016 National Football League||all games (2016-10-20) (1)|
What are the Z-Ratings?
The Z-Ratings are an application of the Bradley-Terry system for paired comparisons. The Bradley-Terry system was first applied to American collegiate ice hockey many years ago by Ken Butler, an English statistician studying in Canada. This application, popularly known as KRACH, has been refined and enhanced by John Whelan, a math professor at Rochester Institute of Technology and one of the most venerated members of the Lynah Faithful, the rabid fan base of Cornell University men's hockey. The demonstrated superiority of KRACH over other rating systems (most notably the RPI, as discussed below) has led me to develop Bradley-Terry based ratings for other NCAA sports and professional leagues.
If it's the Bradley-Terry system, why the name 'Z-Ratings'?
I wanted something more compact. The system was originally discovered in 1929 by a German mathematician named Zermelo, hence the use of the letter Z.
How do the Z-Ratings work?
The Z-Ratings take into account only the strength of a team's schedule (according to the games it has already played) and the team's won-lost-tied record. Only games played against other teams in the rating set are considered in the win-loss-tie records (e.g. a Division I team's win over a Division II team doesn't count). Ties are considered to be half a win and half a loss, and only the final result of a game is considered (i.e. a loss in overtime is treated the same as a loss in regulation time). Other factors like margin of victory, home site, injured players, and future games are not considered. Because they rely only on results already in the books, the Z-Ratings are retrodictive, even though a probability of victory for any conceivable matchup is obtainable. (Differences between retrodictive and predictive rating systems)
To quote John Whelan's KRACH page (see above link), "a team's rating is meant to indicate its strength on a multiplicative scale, so that the ratio of two teams' ratings gives the expected odds of each team winning a game between them." For example, if team A has a rating of 300 and team B a rating of 100, then team A is three times as likely as team B to win when they play.
A team's expected winning percentage against any opponent is easily derived. Continuing with the teams above, team A's expected winning percentage against team B is 300/(300+100) = 300/400 = .750. Similarly, team B's expected winning percentage is .250, which makes sense since team A is three times as likely to win as team B. This probability is known as the Expected Winning Percentage, or EWP, of team A over team B. The average of a team's EWPs against all other rated teams is the Balanced Schedule Winning Percentage, or BSWP, which is listed in the ratings tables.
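As a minimal sketch of the arithmetic above (an illustration in Python, not the code behind these ratings), EWP and BSWP could be computed as follows. The function names `ewp` and `bswp`, the third team C, and its rating are made up for the example.

```python
def ewp(rating_a, rating_b):
    """Expected winning percentage of a team rated rating_a over one rated rating_b."""
    return rating_a / (rating_a + rating_b)


def bswp(team, ratings):
    """Average of `team`'s EWPs against every other team in the `ratings` dict."""
    others = [r for name, r in ratings.items() if name != team]
    return sum(ewp(ratings[team], r) for r in others) / len(others)


# Hypothetical ratings: A and B are the teams from the text, C is invented.
ratings = {"A": 300.0, "B": 100.0, "C": 100.0}

print(ewp(ratings["A"], ratings["B"]))  # 0.75, matching the worked example
print(bswp("B", ratings))               # average of .250 (vs. A) and .500 (vs. C)
```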
If you sum a team's EWPs in all the games it has played, the sum will equal the number of wins that team actually has (with each tie counted as half a win). The correct ratings are those that satisfy this condition for all teams. Once this condition is satisfied, you could multiply all the ratings by any constant, and they would still be valid. Following the example of KRACH, the Z-Ratings are defined such that a team with a rating of 100 has a BSWP of .500.
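The condition can be checked directly for a small case. The sketch below is only an illustration in Python: the three-team round robin, the function name `expected_wins`, and the ratings are hypothetical, and ties are left out to keep it short.

```python
def expected_wins(team, games, ratings):
    """Sum of `team`'s EWPs over the games it actually played.

    `games` is a list of (winner, loser) pairs; ties are not handled here.
    """
    total = 0.0
    for winner, loser in games:
        if team in (winner, loser):
            opponent = loser if team == winner else winner
            total += ratings[team] / (ratings[team] + ratings[opponent])
    return total


# Hypothetical round robin in which every team goes 1-1.
games = [("A", "B"), ("B", "C"), ("C", "A")]
ratings = {"A": 100.0, "B": 100.0, "C": 100.0}

for team in ratings:
    actual = sum(1 for winner, _ in games if winner == team)
    print(team, actual, expected_wins(team, games, ratings))

# Multiplying every rating by a constant leaves the expected wins unchanged.
scaled = {team: 7.0 * r for team, r in ratings.items()}
print(expected_wins("A", games, scaled))
```

Equal ratings satisfy the condition here because each team's two EWPs of .500 add up to its one actual win.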
In order to deal with teams that have either won or lost all their games, each team is assumed to have tied a (fictitious) game against a team with rating 100. Ken Butler's KRACH page explains more fully why this is necessary. The weighting of these 'tied' games is different for each rating set, dependent on the number of such teams in the set, and is given with each set in italics above (and on the ratings sheets as 'γ'). If all the teams in a set have at least one win or tie and one loss or tie, the fictitious games are not necessary and are not used. Generally, as a season progresses, this weighting factor can be continually lowered, to the point where it is nearly insignificant by the end of the season (if it's still used at all).
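One way to find ratings that satisfy the condition, fictitious ties included, is repeated substitution: set each rating to the team's (adjusted) wins divided by the sum of 1/(own rating + opponent's rating) over its games, and repeat until the numbers settle. The Python sketch below assumes that approach purely for illustration; it is not necessarily how the published ratings are computed. The function name `fit_ratings`, the fixed iteration count, and the sample results are hypothetical. It also skips the final rescaling that makes a rating of 100 correspond to a BSWP of .500, and it expects γ > 0 so that the fictitious opponent pins down the scale.

```python
def fit_ratings(games, gamma=1.0, anchor=100.0, iterations=1000):
    """Fit Bradley-Terry ratings by repeated substitution.

    `games` is a list of (team_a, team_b, result_a), where result_a is
    1 for a win by team_a, 0.5 for a tie and 0 for a loss.
    Each team also gets a fictitious tie, weighted by `gamma`, against
    an opponent whose rating is fixed at `anchor`.
    """
    teams = sorted({t for a, b, _result in games for t in (a, b)})

    # Actual wins (ties count one half) plus half a win per fictitious tie.
    wins = {t: gamma / 2 for t in teams}
    for a, b, result_a in games:
        wins[a] += result_a
        wins[b] += 1.0 - result_a

    ratings = {t: anchor for t in teams}
    for _step in range(iterations):
        updated = {}
        for t in teams:
            # Fictitious game first, then every real game the team played.
            denom = gamma / (ratings[t] + anchor)
            for a, b, _result in games:
                if t == a:
                    denom += 1.0 / (ratings[t] + ratings[b])
                elif t == b:
                    denom += 1.0 / (ratings[t] + ratings[a])
            updated[t] = wins[t] / denom
        ratings = updated
    return ratings


# Hypothetical results: A is unbeaten and C is winless, so without the
# fictitious ties their ratings would run off to infinity and zero.
games = [("A", "B", 1), ("A", "C", 1), ("B", "C", 1)]
fitted = fit_ratings(games, gamma=1.0)
print(fitted)
```

With γ = 1 the unbeaten team ends up with a large but finite rating, and the sum of each team's EWPs (fictitious game included) matches its adjusted win total.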
A note on the NHL Z-Ratings: During the first four seasons of the shootout in regular-season NHL games (2005-06 through 2008-09), the winner of a shootout was credited with a win and the loser with a loss. Since the 2009-10 season, shootout results have been removed from the records; if an NHL game goes to a shootout, both teams involved are credited with a tie.
Break down the ratings tables for me.
Note: in this table, I use the word "conference" when referring to subgroupings. These are the same as the divisions in the MLB, NFL, NBA, and NHL ratings sets.
|Rk (leftmost)||The ordinal rank of the team, ordered by the team's Z-Rating.|
|Conf. / Div.||A two-letter abbreviation for the team's conference. The correspondence between conference names and abbreviations can be found in the "Conference ratings" table at the bottom of the ratings sheet, or by moving the mouse over the abbreviation.|
|Z-Rating||The actual rating of the team; as stated above, the ratio X = Za/Zb can be read to mean that the system evaluates team 'a' to be X times as strong as team 'b'.|
|BSWP||Balanced Schedule Winning Percentage: a team's expected winning percentage in an equal number of games against every other team in the group.|
|nCWP/nDWP||non-(Conference/Division) Winning Percentage: a team's expected winning percentage in an equal number of games against every other team in the group, except for the other teams in its own conference.|
|Record||The team's win-loss-draw record.|
|Pct.||The team's winning percentage; the ratio of games won (plus half of games tied, where applicable) to total games played.|
|Ratio||The team's 'success ratio'; the ratio of wins to losses (with ties counting as one-half of each), counting the 'fake' tied games with a weighting of γ.|
|SOS||Strength of Schedule. This is the Z-Rating divided by the success ratio.|
|Rk (rightmost)||The team's ordinal rank, ordered by strength of schedule.|
How are the conferences' strengths computed?
The BSWP listed for each conference in the "Conference ratings" table is the average of the nCWP of all the conference's teams. In other words, it is the expected winning percentage of the teams of the Ivy League (for example) if each played a game against every non-Ivy team. The independents are calculated together as a conference; that is, their EWPs against each other are not considered in the calculation of their BSWP as a group.
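A minimal Python sketch of that averaging, under made-up data: the team names, the conference codes "AA" and "BB", and the function names `ncwp` and `conference_bswp` are all hypothetical. Independents would simply share one code of their own.

```python
def ncwp(team, ratings, conference):
    """Average EWP of `team` against every rated team outside its own conference."""
    outsiders = [t for t in ratings if conference[t] != conference[team]]
    return sum(ratings[team] / (ratings[team] + ratings[t])
               for t in outsiders) / len(outsiders)


def conference_bswp(code, ratings, conference):
    """Average of the nCWPs of all teams in the conference with this code."""
    members = [t for t in ratings if conference[t] == code]
    return sum(ncwp(t, ratings, conference) for t in members) / len(members)


# Hypothetical two-conference rating set.
ratings = {"A1": 300.0, "A2": 100.0, "B1": 100.0, "B2": 300.0}
conference = {"A1": "AA", "A2": "AA", "B1": "BB", "B2": "BB"}

print(ncwp("A1", ratings, conference))             # average of .750 and .500
print(conference_bswp("AA", ratings, conference))
```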
How is strength of schedule computed?
The simple SOS definition for team 'a' is Za / [(Wa + (γ/2)) / (La + (γ/2))], where Za is the team's rating, Wa is the number of games it has won, La is the number of games it has lost, and γ is the weighting of the 'tied' games (again, actual ties count as half a win and half a loss). A team's Z-Rating is therefore the product of its SOS and its winning ratio (defined as wins divided by losses, after factoring in the 'tied' games necessary to render the ratings finite).
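The formula translates into a few lines of Python. This is a sketch with hypothetical function names and a made-up record, not the code used to produce the ratings sheets.

```python
def success_ratio(wins, losses, ties=0, gamma=0.0):
    """Wins over losses, with real ties and the fictitious ties split in half."""
    adjusted_wins = wins + ties / 2 + gamma / 2
    adjusted_losses = losses + ties / 2 + gamma / 2
    return adjusted_wins / adjusted_losses


def strength_of_schedule(rating, wins, losses, ties=0, gamma=0.0):
    """SOS = Z-Rating divided by the success ratio."""
    return rating / success_ratio(wins, losses, ties, gamma)


# Hypothetical team: rating 300, record 6-2-0, no fictitious ties needed.
print(success_ratio(6, 2))              # 3.0
print(strength_of_schedule(300, 6, 2))  # 100.0

# An unbeaten 3-0 team has a finite ratio only because of gamma.
print(success_ratio(3, 0, gamma=1.0))   # 7.0
```

In the first case the team has played a schedule equivalent to a rating of 100 and won three times as often as it lost, giving 100 × 3 = 300.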
Why are the Z-Ratings better than RPI?
The Rating Percentage Index (RPI) is a linear combination of a team's winning percentage (25%), the winning percentages of its opponents (50%), and the winning percentages of its opponents' opponents (25%). The latter two factors are included so that the strength of a team's schedule is factored into its rating. Using these criteria, a group of teams (e.g. a conference) can accumulate inflated ratings by playing a large number of games amongst themselves.
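For reference, the weighting itself is simple arithmetic. The Python sketch below only combines the three components with the weights quoted above; how each component is tallied from game results is omitted, and the function name and sample values are hypothetical.

```python
def rpi(wp, owp, oowp):
    """Combine the three RPI components with the 25/50/25 weights.

    wp   -- the team's own winning percentage
    owp  -- its opponents' winning percentage
    oowp -- its opponents' opponents' winning percentage
    """
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp


# Hypothetical team: wins 80% of its games against a .500 schedule.
print(rpi(0.800, 0.500, 0.500))
```

Note that three quarters of the result comes from other teams' records rather than the team's own.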
In NCAA hockey, the RPI's shortcomings are much more apparent than in NCAA basketball, because as much as 80% of a college hockey team's schedule is within its conference, as compared to around 50% in college basketball and lacrosse. The shortcomings still exist in those sports, however, and can skew the RPI in favor of teams of lesser quality. Because the Z-Ratings use the ratings themselves to determine the strength of a team's opposition, the system is more robust, particularly when comparing teams from different conferences.
As noted above, only the opponents played and the won-lost records of the teams being compared factor into the Z-Ratings. While some, including ESPN's Jay Bilas, have attacked the RPI for not including other factors, I think limiting the composition of the ratings to wins and losses is essential. To do otherwise could invite allegations of bias. How much do you value home site advantage? Should teams be encouraged to run up the score so their rankings improve?
The NCAA has apparently come up with an answer to the first of those two questions for men's basketball - starting in 2004-05, home wins and road losses are valued at 0.6, and road wins and home losses are valued at 1.4. This adjustment has been seen to benefit top teams in the "mid-major" conferences, at the expense of teams in the middle of the "power" leagues. Applying that sort of "one-size-fits-all" solution discounts the fact that the advantage a team gains from playing at home varies greatly. Since there's really no way to quantify this effect, I have chosen to keep site advantage out of the Z-Ratings.
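To see what that adjustment does to a record, here is a small Python sketch using only the 0.6 and 1.4 weights quoted above. The function name and the sample record are hypothetical, and neutral-site games are not handled.

```python
# Weights quoted above for the men's basketball RPI adjustment.
WEIGHTS = {
    ("win", "home"): 0.6,
    ("win", "road"): 1.4,
    ("loss", "home"): 1.4,
    ("loss", "road"): 0.6,
}


def weighted_winning_pct(results):
    """Winning percentage with each (outcome, site) result weighted as above."""
    wins = sum(WEIGHTS[r] for r in results if r[0] == "win")
    losses = sum(WEIGHTS[r] for r in results if r[0] == "loss")
    return wins / (wins + losses)


# Hypothetical 10-5 team: 5 home wins, 5 road wins, 5 road losses.
results = [("win", "home")] * 5 + [("win", "road")] * 5 + [("loss", "road")] * 5
print(weighted_winning_pct(results))  # 10/13, versus an unweighted 10/15
```

The same 10-5 record would be worth less if the wins had all come at home, which is exactly the kind of fixed, across-the-board adjustment discussed above.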
What are some other rating systems out there?
Disclaimer: The Z-Ratings are not affiliated in any way with WHTZ-FM, more commonly known as Z-100.