Full transparency: March is my favorite sports month of the year. Baseball is about to start, the NHL and NBA playoffs are upon us, and even lacrosse coming back is interesting. However, none of these can top the magnificent glory of the NCAA Tournament. For three weeks, fans and novices alike take time to fill out brackets, talk about sleepers, and stop working to watch the games, 67 in total.
I always look forward to the release of the bracket so I can start the painstaking task of closely analyzing each game, while doubt creeps into my head alongside the ever-so-small voice asking, “What if you get the perfect bracket?” Of course I never get very close, as the annual parade of “wait, how did a 13 seed beat my champ on the first day?” marches on. No matter, I love it all the same.
The people I’m in these pools with see me strain over these predictions and become confused. It’s all basically random, right? Just go by your gut and you’ll have the same results, right? WRONG! That’s not how you have to approach this kind of task. I realized years ago that I could do better in my predictions if I could only find some kind of mathematical way to inform my picks.
Colin Cowherd said a few years ago that a team’s road record is the most accurate way to tell if it will make a run. He argued that road games test a team’s resilience and talent, arguably the two most important team attributes for March, better than playing in front of a friendly crowd. The theory sounded plausible enough, so I picked every game that year strictly on each team’s road record. That was, predictably, a disaster. I ended up having 14-seed Bucknell in the Elite Eight, and they got blown out in their first game.
The next year, I realized the process needed to be standardized. A team going .500 on the road in the Big 12 is still much better than a low-major that never loses away from home. I still liked the idea of factoring in the road record, but I added neutral-court games since that’s what the tourney games are. I also limited the record to games played after January 1, since recent performance is much more indicative of a team than a non-con game before Thanksgiving. I turned the record into a win percentage and multiplied it by the team’s RPI so that each team’s strength was factored into the equation. Sadly, this too was inadequate. My brackets certainly improved with the change, but I still didn’t get any of the champions right and generally finished in the middle of my pools.
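For anyone who wants to try that second-year formula, here is a rough sketch in Python. It is only an illustration under stated assumptions: the function name `adjusted_score`, the tuple layout for a game, the schedule, and the RPI value of 0.625 are all made up for the example.

```python
from datetime import date

def adjusted_score(games, rpi, cutoff):
    """Road/neutral win percentage since `cutoff`, multiplied by RPI.

    `games` is a list of (day, site, won) tuples, where site is
    "home", "road", or "neutral". This layout is an assumption.
    """
    # Keep only games away from home played from the cutoff date on.
    recent = [won for day, site, won in games
              if day >= cutoff and site in ("road", "neutral")]
    if not recent:
        return 0.0  # no qualifying games, so nothing to score
    win_pct = sum(recent) / len(recent)
    return win_pct * rpi

# Hypothetical schedule for one team (not real results).
games = [
    (date(2012, 11, 20), "neutral", False),  # before the cutoff, ignored
    (date(2013, 1, 5), "road", True),
    (date(2013, 1, 12), "home", True),       # home game, ignored
    (date(2013, 2, 2), "road", False),
    (date(2013, 2, 16), "road", True),
    (date(2013, 3, 9), "neutral", True),
]

score = adjusted_score(games, rpi=0.625, cutoff=date(2013, 1, 1))
print(score)  # 3 wins in 4 qualifying games: 0.75 * 0.625
```

Multiplying by RPI is what keeps an unbeaten low-major from outranking a .500 power-conference team, but the score still describes only one team over a couple of months.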
This year, I’m trying something different: going with my gut. Just kidding, that would be too easy. I spent all day compiling data from the last 8 tournaments and this is what I’ve found.
My theory this year is that my statistically based projections have failed because I’ve focused on individual teams rather than larger trends. Meta-analysis generally works better in forecasts than taking small amounts of data, like road wins over a couple of months. I wanted to look at expected wins, so I took the records of every team on one seed line and compiled them for each year. I then aggregated these records and divided the number of wins by the number of years (8) and the number of teams per seed line (4). From these data, we can gather three statistics: a seed’s win percentage, wins per seed line, and expected wins per team. It all looked something like this:
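For anyone who wants to redo the arithmetic, here is a minimal sketch in Python. The yearly totals are made up (they are not the real tournament data), `seed_summary` is a hypothetical helper, and reading "wins per seed line" as the seed line's average wins per tournament is an assumption.

```python
TEAMS_PER_SEED = 4  # four regions, so four teams on each seed line

def seed_summary(wins_by_year, losses_by_year):
    """Summarize one seed line from its combined wins and losses per year."""
    years = len(wins_by_year)
    total_wins = sum(wins_by_year)
    total_games = total_wins + sum(losses_by_year)
    return {
        # share of all games played by this seed line that it won
        "win_pct": total_wins / total_games,
        # average wins for the whole seed line in one tournament
        "wins_per_year": total_wins / years,
        # average wins for a single team on this seed line
        "expected_wins_per_team": total_wins / (years * TEAMS_PER_SEED),
    }

# Made-up totals for one seed line over 8 tournaments (not the real data).
wins = [4, 2, 6, 3, 5, 1, 4, 7]
losses = [4, 4, 4, 4, 4, 4, 4, 4]  # each team loses once unless it wins the title

summary = seed_summary(wins, losses)
print(summary)
```

With these invented numbers the seed line wins 32 of 64 games, so each team would be expected to win one game per tournament.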
From there I calculated each seed’s standard deviation of wins and constructed a 95% confidence interval. I then highlighted any win total outside of this interval, red for below and green for above. Lastly, I counted which seeds had the most years outside of one standard deviation to test volatility. By the end, each seed group looked like this:
In layman’s terms, I divided each seed’s total wins by 32 (8 years times 4 teams) to see how many games each team from that seed won on average in each tournament, then looked at which win totals were very far off from the average and flagged those. This allows us to see which seeds are more likely to have wildly different win totals than average. If a seed line doesn’t have many years outside of the confidence interval, we can be fairly confident in the expected win total for both the seed in general and the individual team.
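The flagging step can be sketched the same way. This is one plausible reading, not necessarily the exact calculation: it uses the sample standard deviation, treats the band as the mean plus or minus `k` standard deviations (`k=1` for the volatility count; roughly `k=2` would approximate a 95% band if the yearly totals were normally distributed), and runs on made-up numbers. The names `flag_years` and `volatility` are hypothetical.

```python
from statistics import mean, stdev

def flag_years(wins_by_year, k=1.0):
    """Label each year's win total as "below", "above", or "inside"
    the band mean +/- k sample standard deviations."""
    center = mean(wins_by_year)
    spread = stdev(wins_by_year)  # sample standard deviation (n - 1)
    low, high = center - k * spread, center + k * spread
    labels = []
    for wins in wins_by_year:
        if wins < low:
            labels.append("below")   # would be highlighted red
        elif wins > high:
            labels.append("above")   # would be highlighted green
        else:
            labels.append("inside")
    return labels

def volatility(wins_by_year, k=1.0):
    """Count the years that fall outside the band."""
    return sum(label != "inside" for label in flag_years(wins_by_year, k))

# Made-up yearly win totals for one seed line (not the real data).
wins = [4, 2, 6, 3, 5, 1, 4, 7]
labels = flag_years(wins)
outside = volatility(wins)
print(labels)
print(outside)
```

Running `volatility` for every seed line and sorting by the count gives a most-to-least-volatile ordering of the seeds.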
First, we can list the win percentages from each seed:
1 (78.7%), 2 (69.6%), 4 (64.4%), 3 (63.1%), 7 (53.7%), 5 (50%), 8 (49.2%), 11 (48.4%), 6 (41.8%), 10 (37.3%), 12 (33.3%), 9 (31.9%), 13 (17.9%), 14 (15.8%), 15 (13.5%), and 16 (0%)
A few takeaways:
-11 seeds have a much higher win percentage than 6 seeds despite having a nearly equal record against each other in the first round. This means we can conclude that 11 seeds are more likely to win multiple games in a tournament than 6 seeds, assuming they get past each other.
-I know there are no elite teams this year, but for the love of god, don’t ever, ever, ever pick a 16 seed. They’ve never won. Don’t do it.
-The 8/9 matchup is always the trickiest, but 9s never make it very far even with a win in the first round.
-1 seeds have won it all 5 of the past 8 years, so don’t feel nervous about picking the team with pressure on them.
Next, we can rank the seeds going from most volatile to least volatile:
3, 4, 2, 8, 10, 13, 1, 7, 12, 14, 5, 6, 11, 15, 9, 16
A few takeaways:
-You can just about bank on the 6/11 matchups being split 2-2. They have similar win expectancies with the least volatility among the relevant seeds.
-I have absolutely no idea what’s going to happen with the 3 seeds. They’re extremely volatile, as only one observed year has had a win total within one standard deviation. You can still probably expect each of them to win at least 1 game.
-1 seeds can be trusted to make it out of the first round. The data show that they are relatively stable at collecting at least two wins, so pencil them into the Sweet 16.
Make sure to take all this with a grain of salt. I’ve done the best I can for years now, but there’s simply no math, no algorithm, no gut feeling that can fully predict the correct bracket. All these projections can do is give you trends to better inform your picks. That being said, this whole bracket experience is about fun, and nothing is more fun than winning your pool. Hopefully, I’ve been able to help you; I’ll be on my couch watching college basketball.