Bayesian thinking & inference, part 2
Level ∞ - BILL BETTOR
Welcome Degen Gambler!
Bayes Craze! It’s time for the second part of our Bayesian series, a series which aims to introduce you to the world of Bayesian learning & put you in a Bayesian state of mind, or dare we say, a correct state of mind.
If you haven’t read part 1 yet, begin there.
The concepts presented in this series are relatively tough to grasp and unless you have had a proper amount of training with thinking in conditionals, there’ll be a lot of hidden insights that you simply won’t *see* at first glance. Therefore, take your time with the material. Read. Think. Read again. Or, equivalently: Have a belief. Read. Update your belief.
In the conclusion of Part 1, two problems were posed. The first, the investigation of a backtest/sequence of bets, was handled in a comprehensive manner in “Bet Sequences, an analysis”. Today we’ll have a deep look at the second problem:
After having read our posts on web scraping, you have just scraped and stored all the NHL data from the previous season. You are now interested in learning as much as possible about it. For example, since you find live odds intriguing, you wonder: what is the probability that a team wins a game, given that it takes a lead into the third period?
We’ll examine this question on multiple levels.
First, we’ll do the most basic thing: collect data on prior NHL games, look at the subset of games where one team has taken a lead into the third period and from there determine how often those teams have been able to maintain their lead. This will provide us with a simple, fundamental fraction to keep in mind while e.g. betting same game parlays in the NHL.
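To make the basic step concrete, here is a minimal sketch of that fraction in Python. It is not the code from our repo: the `Game` record, its field names and the toy games are hypothetical, made up purely for illustration, and we assume `winner` refers to the final result including overtime and shootouts.

```python
from dataclasses import dataclass


@dataclass
class Game:
    # Hypothetical record layout, not the schema of the actual data file.
    home_goals_after_2: int  # home goals after two periods
    away_goals_after_2: int  # away goals after two periods
    winner: str              # "home" or "away" (assumed: final result incl. OT/SO)


def lead_hold_rate(games):
    """Return (held, leads, rate) for teams leading after two periods."""
    leads = 0  # games where one team led entering the third period
    held = 0   # of those, games the leading team went on to win
    for g in games:
        if g.home_goals_after_2 == g.away_goals_after_2:
            continue  # tied after two periods: not part of the subset
        leads += 1
        leader = "home" if g.home_goals_after_2 > g.away_goals_after_2 else "away"
        if g.winner == leader:
            held += 1
    rate = held / leads if leads else float("nan")
    return held, leads, rate


# Made-up toy data, NOT real NHL results.
games = [
    Game(2, 1, "home"),  # home led, home won
    Game(0, 3, "away"),  # away led, away won
    Game(1, 1, "home"),  # tied after two, excluded
    Game(3, 2, "away"),  # home led, away came back
    Game(1, 2, "away"),  # away led, away won
]

held, leads, rate = lead_hold_rate(games)
print(f"{held}/{leads} leads held, fraction = {rate:.2f}")
```

Note that tied games are dropped before counting, so the denominator is the number of games with a leader after two periods, not the total number of games.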
As soon as we’re done with this initial step, we’ll advance the analysis with an alternative perspective, a Bayesian perspective. By approaching the problem from these two angles, the deficiency & NGMI-level of the former will become apparent. As we’ll see, there can be huge differences between “learning” and *learning*.
Throughout the post we’ll do our best to discuss details to consider when running similar statistical analyses *in real life*. Remember, we’re not here to write papers, we’re here to back our beliefs with real world money. Thus, we cannot afford to overlook or misinterpret important features related to the management & understanding of data.
A trivial conditional measure
Let’s begin with the basics. We’ve fetched the 2022/2023 NHL regular season data from what we believe is the official NHL API [EXAMPLE of full game data for the first 2022-2023 game]. The data relevant for our purposes has been stored in the file below.
Python code for this post can be found in this folder in our RandomSubstackMaterial repo on Github.