Turning March Madness Upset Down Part 2:

Bracket Results Analysis

If there is one thing that screams April, it is the championship game of March Madness. This year I decided to join a March Madness pool along with 55 other people and two cats. My pool’s scoring system is unconventional; instead of simply getting points for making the correct pick, your score is based on how many other people did not make the correct pick. Obviously, I had to come up with an algorithm to optimize my bracket and win back my five dollars with interest.

My pool awards points for making correct picks that other participants do not. For every correct pick, points = round number × (number of people who did not make that pick). Since expected payoff = points × probability of the pick being correct, I can easily calculate the expected payoff of any given bracket. I used expected payoffs to inform a greedy algorithm, which I tweaked a bit after my last post but before the Madness actually began. My pre-March Madness post can be found here.
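To make that concrete, here is a minimal sketch of the payoff math in Python. All the numbers are placeholders, not my actual data:

```python
# Sketch of the pool's payoff math; all numbers are placeholders.
pool_size = 58        # people in the pool
round_num = 3         # a Sweet 16 pick
pick_rate = 0.20      # assumed share of the pool making this pick
win_prob = 0.35       # FiveThirtyEight probability the pick is correct

non_pickers = pool_size - round(pool_size * pick_rate)
points = round_num * non_pickers            # points if the pick hits
expected_payoff = points * win_prob
print(points, round(expected_payoff, 1))    # 138 points, 48.3 expected
```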

In this post, I first revisit my algorithm to explain my tweaks. Next I analyze sources of error in my approach and areas for improvement. Finally, I detail the results in my pool.


Optimized Final Four + Greedy Algorithm

The algorithm I used for my bracket was slightly different from what I described in my previous post. Instead of just running a greedy algorithm, I first found an optimal Final Four and then ran the greedy algorithm.[1] By fixing the Final Four first, I ensure the greedy algorithm does not choose a slightly better champion only to be stuck with much worse options for spots two through four.

I found the optimal Final Four by looking at all 524,288[2] possible Final Four placements and calculating each one’s expected value. I then ran the Final Fours with the highest values through the greedy algorithm and saw which bracket had the highest total expected value.
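A minimal sketch of that enumeration is below. The dummy team labels and the stub `final_four_ev` function stand in for the real FiveThirtyEight/ESPN data, not my actual code:

```python
from itertools import product

# Sketch of the Final Four search; dummy team labels and a stub EV
# function stand in for the real probability and pick-rate data.
regions = ["East", "West", "Midwest", "South"]
teams = {r: [f"{r} seed {s}" for s in range(1, 17)] for r in regions}

def final_four_ev(four, finish):
    """Stub: the real version sums points x probability over the Final
    Four, finalist, and champion picks implied by `four` and `finish`."""
    return 0.0

candidates = []
for four in product(*(teams[r] for r in regions)):  # 16^4 = 65,536 slates
    for finish in range(8):                         # 2 x 2 x 2 possible finishes
        candidates.append((final_four_ev(four, finish), finish, four))
candidates.sort(reverse=True)                       # 524,288 scored placements
```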

There was a clear group of four Final Fours with expected values above 580 that I tested further with the greedy algorithm. I chose this cutoff because there was a 12-point drop-off between the fourth- and fifth-best Final Four combinations. The top four Final Fours are displayed in Table 1.

Table 1. Final Fours with Highest Expected Values[3]

| East Region | West Region | Midwest Region | South Region | Expected Value |
|---|---|---|---|---|
| Virginia | Gonzaga (1st) | Louisville | Kentucky (2nd) | 587 |
| Villanova | Gonzaga (1st) | Louisville | Kentucky (2nd) | 582 |
| SMU | Gonzaga (1st) | Louisville | Kentucky (2nd) | 580 |
| Villanova (1st) | Saint Mary’s | Louisville | Kentucky (2nd) | 580 |

The greedy algorithm repeatedly calculates the expected value of each remaining team going as far as possible in the tournament given the picks already made, and then picks the team with the highest expected value.[4] The second Final Four combination had the highest bracket expected value at 1,318, so I chose that one.
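In rough Python, the greedy loop looks something like this. The two helpers are toy stubs standing in for the real bracket machinery, not my implementation:

```python
# Sketch of the greedy loop; `open_slots` and `expected_value` are toy
# stubs for the real bracket machinery.

def open_slots(bracket):
    """Stub: return (team, slot) candidates for picks not yet made."""
    return [(t, s) for s, t in enumerate(["A", "B"]) if s not in bracket]

def expected_value(team, slot, bracket):
    """Stub: real version = points x probability of `team` advancing
    through `slot`, given the picks already locked in."""
    return 1.0

def greedy_fill(bracket):
    while True:
        candidates = open_slots(bracket)
        if not candidates:
            return bracket
        # Pick the single highest-expected-value remaining option.
        _, team, slot = max(
            (expected_value(t, s, bracket), t, s) for t, s in candidates
        )
        bracket[slot] = team

greedy_fill({})  # fills both toy slots greedily
```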


Sources of Error and Areas for Improvement

Like any smart data scientist, I made sure the model wasn’t perfect so I would have more work in the future. The three main sources of error in the algorithm are inaccurate win probabilities from FiveThirtyEight, ESPN user pick data that does not match my pool’s incentives, and model misspecification.


Win Probabilities

Inaccurate win probabilities hinder the algorithm by producing inaccurate expected values, which can cause the wrong team to be chosen. FiveThirtyEight publishes easy-to-use, round-by-round probabilities for each team that seem to be well calibrated and competitive with the other best predictors. FiveThirtyEight is also free. Since I am not going to calculate my own win probabilities, the only way to improve this input is to find a better data source next year. There is minimal room for improvement here.


User Picks

User pick data is the other input to my model, and I used ESPN user picks to estimate how often teams would be picked in my pool. ESPN user pick data is attractive because of its large sample size. Its drawback is that ESPN uses a different scoring system, so its users face different incentives than the users in my league.

I used raw ESPN user pick data without adjusting it to my league’s incentives. The mean absolute difference between ESPN user picks and my league’s was 2.1 percentage points. This may seem small, but there is still a lot of room for improvement because the largest discrepancies were in the most important picks. Five of the top eight seeds had absolute differences above five percentage points. Unfortunately for me, the team with the largest discrepancy by far was my champion Gonzaga, which was picked 15+ percentage points more often in the later rounds in my league than on ESPN.[5] There is room for improvement in my user pick estimation.
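For reference, the comparison itself is just a mean absolute difference over pick rates, sketched here with made-up numbers:

```python
# Pick-rate comparison, sketched with made-up numbers (not my actual data).
espn_picks = {"Gonzaga": 0.20, "Villanova": 0.45, "Saint Mary's": 0.08}
pool_picks = {"Gonzaga": 0.36, "Villanova": 0.40, "Saint Mary's": 0.17}

diffs = [abs(espn_picks[t] - pool_picks[t]) for t in espn_picks]
print(f"mean absolute difference: {sum(diffs) / len(diffs):.1%}")  # 10.0%
```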

In the future, I would like to improve the approximated user picks by fitting a model that minimizes the difference (e.g., the sum of squared errors) between the ESPN user picks and my league’s picks. I could weight the error to put more emphasis on getting later-round approximations correct, because later rounds are worth more points. This model could be as simple as a multiplier that discounts favorites and boosts underdogs, or a more complex machine learning algorithm that takes in a team’s ESPN user picks, seed, Elo rating, conference, etc. and predicts the percentage of user picks in my pool.
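One minimal version of that idea, sketched below with illustrative arrays: fit a single flattening exponent (a stand-in for the “simple multiplier”) so that an exponent below one discounts favorites and boosts underdogs.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# One simple version of the idea: a single flattening exponent fit to
# historical pick rates (arrays are illustrative). alpha < 1 discounts
# favorites and boosts underdogs; a richer model could add seed, Elo,
# conference, etc.
espn = np.array([0.45, 0.20, 0.08, 0.04])  # ESPN pick rates, one round
pool = np.array([0.40, 0.36, 0.17, 0.05])  # my pool's pick rates, same teams

def sse(alpha):
    adjusted = espn ** alpha
    adjusted *= pool.sum() / adjusted.sum()  # keep totals comparable
    return ((adjusted - pool) ** 2).sum()

alpha = minimize_scalar(sse, bounds=(0.1, 3.0), method="bounded").x
```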


Greedy Algorithm

The final area for improvement is the greedy algorithm itself. A major weakness is that the win probabilities are not matchup-dependent. For example, suppose the algorithm picks Gonzaga to win the tournament, and subsequently chooses Gonzaga to beat North Carolina (a one seed), Villanova (a one seed), Arizona (a two seed), and West Virginia (a very good four seed). This is the hardest path possible for Gonzaga, so its likelihood of winning the tournament would be significantly less than the average likelihood given by FiveThirtyEight, quite possibly making another team a more attractive champion pick. Ideally, the algorithm should be able to account for how individual matchups affect the likelihood of advancing to each round.
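A sketch of what matchup awareness could look like: multiply game-level win probabilities along the exact path the bracket chooses, rather than using an averaged round probability. The probabilities here are illustrative, not FiveThirtyEight’s:

```python
from math import prod

# Path-conditional probability: multiply hypothetical game-level win
# probabilities along the exact path the bracket sends Gonzaga through,
# instead of using an averaged round probability. Numbers are illustrative.
gonzaga_vs = {
    "West Virginia": 0.65,
    "Arizona": 0.58,
    "North Carolina": 0.55,
    "Villanova": 0.52,
}
p_title_given_path = prod(gonzaga_vs.values())
print(f"{p_title_given_path:.1%}")  # ~10.8% along this hard path
```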

In addition, a drawback of all greedy algorithms is that by making locally optimal decisions, they may sacrifice a globally optimal outcome. I have partially mitigated this risk by finding the best Final Four combinations first and then running the greedy algorithm. Since the choice of the Final Four accounts for 40-50 percent of the total score, it is reasonably safe to assume a very good Final Four combination will lead to a very good bracket. Still, the algorithm could be improved by considering more than the top four Final Four combinations, or by starting with the best Elite Eight combinations.


Analyzing My Bracket’s Results

As is often the case in data science, I spent 80 percent of my time fixing the names of the play-in teams and 20 percent actually writing the algorithm. And, as is often the case in data science, after dozens of hours of work I produced a single graphic: the bracket. Figure 1 displays my bracket with correct picks in green and incorrect picks in red.

Figure 1. March Madness Bracket with Results


The bracket’s three most important aspects were two lower-seeded Elite Eight picks, four of the one and two seeds being eliminated early, and the other four one and two seeds making the Final Four. These picks were key to my performance.

Elite Eight Underdogs

The algorithm chose seventh-seeded Saint Mary’s and sixth-seeded Southern Methodist to both reach the Elite Eight. This makes sense, as both teams were underpicked by ESPN users relative to their win probabilities. This is especially true of Saint Mary’s, which ESPN users picked at rates 27 and 22 percentage points below its win probabilities in the second and third rounds. Even if I had perfect information about what the other users in my pool picked, both would still have been reasonable choices. Table 2 displays the first three rounds of data for Saint Mary’s and Southern Methodist.

Table 2. Saint Mary’s and Southern Methodist Win Probability and Pick Percentage in First Three Rounds

| Round | Saint Mary’s Win Prob | Saint Mary’s ESPN Users | Saint Mary’s My Pool | SMU Win Prob | SMU ESPN Users | SMU My Pool |
|---|---|---|---|---|---|---|
| Round of 64 | 74% | 56% | 66% | 78% | 85% | 81% |
| Round of 32 | 35% | 8% | 17% | 46% | 37% | 45% |
| Sweet 16 | 26% | 4% | 12% | 20% | 9% | 9% |

In order to win my pool, I probably needed at least one of these teams to reach the Elite Eight. Neither did. Saint Mary’s lost in the second round and Southern Methodist lost in the first. Out of their combined expected value of 157 points,[6] I got 20 points.


Early Upsets

The second important aspect of my bracket was picking four of the top seeds to be eliminated early. The algorithm picked these upsets because top-seeded teams earn very few points in the early rounds, so once the Final Four spots are filled it does not make sense to pick a top seed to reach only a middle round.
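A quick illustration of that asymmetry under my pool’s scoring rule. The pick counts are assumed, but the upset math lines up with the South Carolina pick described below:

```python
# The scoring rule's asymmetry in an early round (pick counts are assumed).
round_num = 2                       # a second-round game, pool of 58
favorite_non_pickers = 2            # assumed: nearly everyone picks the favorite
underdog_non_pickers = 56           # assumed: almost no one picks the upset

favorite_points = round_num * favorite_non_pickers  # 4 points if chalk holds
underdog_points = round_num * underdog_non_pickers  # 112 points if the upset hits
```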

The top seeds the algorithm picked against were Duke, Arizona, Kansas, and North Carolina. I probably needed to correctly pick at least two of these upsets to win my pool. I only picked one. My bracket correctly picked South Carolina to beat Duke in the second round, which ended up being worth 112 points. Arizona did lose in the third round, but I did not correctly pick Xavier (like pretty much everyone else). Both Kansas and North Carolina reached at least round four. I was hoping to get at least 250 points from these upsets, but I only scored 112.


Final Four

The third and final important aspect of my bracket was, just like everyone else’s, correctly picking the Final Four. My only correct Final Four team was Gonzaga. Villanova’s early loss was bad but not devastating, as it had the lowest expected value of my Final Four teams at 74 points. Louisville’s second-round loss was slightly worse, as I received only 3 points against an expected value of 103 and a maximum value of over 300. Kentucky did okay for me, scoring 108 points against an expected value of 138.[7] Gonzaga nearly made up for all of the other Final Four teams by scoring 387 points, more than double its expected value of 179. Getting just one more Final Four team correct could easily have added 200+ points to my final score.


Conclusion

In the end, I scored 1,063 points, finishing 24th out of 58, behind one of the cats. I know some dogs can detect cancer, so maybe I can incorporate cats into my algorithm next year to help predict the tournament. Using the ESPN user pick data, my expected value was 1,318 points, which decreased to 1,225 points once the users in my pool made their picks. My score was not too far below my expected value considering the wide variance in bracket scoring. If Kentucky’s 2-point loss to North Carolina had gone the other way, I would have hit my expected value. If Gonzaga had won the championship, I would have hit my expected value. The number of “what ifs” can drive you crazy. I guess that’s why they call it March Madness. It is not the crazy upsets, but the madness within ourselves.


[1] The greedy algorithm is described in detail in my previous post.

[2] There are 16^4 = 65,536 possible Final Fours, and each can finish eight ways (two possible winners in each semifinal and two in the final: 2 × 2 × 2 = 8). 8 × 65,536 = 524,288.

[3] Expected value in a 58-person pool, using ESPN user pick data from March 13, 2017 and FiveThirtyEight pre-tournament round probabilities.

[4] The greedy algorithm is described in detail in my previous post.

[5] Overpicking Gonzaga meant that the expected value I had calculated for Gonzaga was 16 percent too high.

[6] Expected value was calculated using pool user pick data.

[7] Ibid.