Interesting article here from Arxiv titled Finding Common Characteristics Among NBA Playoff Teams: A Machine Learning Approach.
Everyone should skim the paper because it’ll show you that these academic papers aren’t overwhelming, and a lot of times, they’re good at showing steps that go into tackling a machine learning problem. This paper, for example, goes over the basics of decision trees, pruning decision trees, as well as more high powered decision trees. Great progression.
As for what they found, apparently opponent’s stats are the most important determinants in whether a team makes the playoffs, with opponent field goal percentage and opponent points per game leading the way. Play good defense, and doesn’t matter how many points you score yourselves.
Only comment is that if you’re sitting out there thinking “well NBA teams don’t play defense anyway blah blah blah” you’re wrong. Defense is probably the most impressive aspect of a game to watch, especially in the playoffs currently going on.
Importance of variables is a really interesting topic in machine learning, sports especially. Knowing which variables matter can help someone in charge figure out who to draft (Moneyball style) or possibly what aspects to focus on during games or practice. Considering I have a bunch of PGA Tour data, maybe figuring out which stats are important for golfers should be something to focus on here…
Funny, they say that their “dataset was quite large”, so they only provided a sample of the data. 30 teams * 15 years * 44 variables = 19,800 data points. Too big to fit on a table in a paper sure, but don’t think I agree it’s quite large. More like big-ish. Fits in perfectly here.