Conditional probability & Bayes Theorem

Level 2 - DEEP DIVER

Sep 15, 2022


Welcome Degen Gambler!

It is *theory time*.

→ You better turn your brain *on*.

Probability is, as we have seen before, a critical concept in betting due to the duality between odds and probabilities. In our Betting 101 we described and discussed this notion, probably the most fundamental one in betting, of how to generate probabilities given certain odds and vice versa. In fact, by iterating this idea over a full set of odds and/or combining different market odds in a clever way, it is sometimes even possible to generate complete probability distributions in games, horse fields etc. For example, by using the market odds for the win, exacta and show markets in horse racing it is not difficult to map the different odds combinations into implied probability distributions for the top three positions as seen in the picture below. The advantage of this mapping is that in most cases, probabilities are much more intuitive to analyze and work with than pure odds.

What you aim to do in practice is to map market odds → probabilities, perform your research and adjust the probabilities, then execute the inverse map probabilities → odds and identify +EV bets. Every experienced bettor does this on the go, often without even considering the process more than subconsciously.

A one-to-one correspondence between odds and probabilities, i.e. if one is known so is the other one
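
As a minimal sketch of the odds ↔ probabilities map, assuming decimal odds: the odds format, the function names and the proportional way of removing the bookmaker's margin are all our assumptions for illustration, and the market numbers are made up.

```python
def implied_probability(decimal_odds: float) -> float:
    """Map decimal odds to the raw implied probability (1 / odds)."""
    return 1.0 / decimal_odds


def fair_odds(probability: float) -> float:
    """Inverse map: probability back to decimal odds."""
    return 1.0 / probability


def normalize(probabilities: list[float]) -> list[float]:
    """Rescale raw implied probabilities so they sum to 1.

    Proportional scaling is only one of several ways to strip the margin.
    """
    total = sum(probabilities)
    return [p / total for p in probabilities]


# Hypothetical three-way market (home, draw, away), numbers made up.
market_odds = [2.10, 3.40, 3.60]
raw = [implied_probability(o) for o in market_odds]
fair = normalize(raw)
print([round(p, 3) for p in fair])
```

The raw implied probabilities sum to more than 1 because of the margin, which is why some normalization step is needed before you start adjusting them with your own research.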

Autist note: How would you generate the implied probability of a horse finishing third only by having access to the show and exacta markets?

Show odds - odds for a horse finishing in the top 3 positions
Exacta odds [A, B] - odds for horse A winning with horse B finishing second
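
One possible line of attack, sketched under the assumption that the show and exacta odds have already been converted to probabilities and that exacta prices exist for every ordered pair of horses. ‘Wins’, ‘finishes second’ and ‘finishes third’ are disjoint events that together make up ‘finishes in the top 3’. The symbol E(A, B) is our own shorthand for the implied exacta probability.

```latex
\begin{aligned}
P(A \text{ wins}) &= \sum_{B \neq A} E(A, B) \\
P(A \text{ second}) &= \sum_{B \neq A} E(B, A) \\
P(A \text{ third}) &= P(A \text{ in top 3}) - P(A \text{ wins}) - P(A \text{ second})
\end{aligned}
```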

Though fundamental and obviously extremely useful, standard, ordinary probabilities are not all there is to probability theory. Another handy, albeit slightly harder to grasp, tool from probability theory is conditional probability, which is the primary topic of this post.

What is conditional probability?

In probability theory, conditional probability is a measure of the probability of an event occurring, given that another event (by assumption, presumption, assertion or evidence) has already occurred.

The standard notation for such a probability is P(A|B) (read probability of event A given event B), i.e. the vertical bar is a symbol for ‘given’ or ‘conditional on’.

To get a better understanding of what a conditional probability is, we list a couple of examples below:

The probability of becoming a BILL BETTOR conditional on reading this text is higher than the probability of becoming a BILL BETTOR conditional on *not* reading this text.
P(BILL BETTOR | READER) > P(BILL BETTOR | NOT A READER).
P(Stanley Cup winners | Conference finals winners) is a conditional probability, and it is of course higher than the simple P(Stanley Cup winners).
P(Man Utd winning the game | Ronaldo scoring a hat-trick) should probably be quite high.
On the other hand, P(Ronaldo scoring a hat-trick | Man Utd winning the game) should be pretty low.
As the previous example shows, P(A|B) should *never* be confused with P(B|A). They are related (see the section below) but they are absolutely not the same. To see this even more clearly, you could for example consider the earlier example of P(Stanley Cup winners | Conference finals winners), which obviously cannot be the same as the sure thing P(Conference finals winners | Stanley Cup winners).
P(Tails | Heads 14 times in a row) is a conditional probability, equal to P(Tails) if the flips are independent, a standard assumption when flipping a coin. This is also why the probability of a set of different independent events occurring together can be calculated by computing the product of the elementary probabilities. P(A and B and C) = P(A) * P(B and C | A) = P(A) * P(B | A) * P(C | B and A) = P(A) * P(B) * P(C) since by the assumption of independence, knowing something about one of the events gives no information about the rest of them.
However, say one has assigned a non-zero probability to the event ‘the coin is loaded’ in the previous example. In that case one may reason that P(Tails | Heads n times in a row) should decrease as n increases, a reasoning that indeed makes sense. As will be seen in higher level posts where we will be discussing Bayesian inference*, this kind of thinking serves as an excellent basis for learning the Bayesian way of observing real-world data.
*Inference; learning from data. Geeks would probably involve esoteric terminology to explain the word, we prefer understandable explanations.
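
The loaded coin reasoning can be made concrete with a small sketch. It uses the update rule derived in the Bayes Theorem section below. The 1 % prior for ‘the coin is loaded’ and the 90 % heads probability of a loaded coin are made-up numbers, and the function name is hypothetical.

```python
def p_tails_after_heads(n: int, prior_loaded: float = 0.01,
                        loaded_heads: float = 0.9) -> float:
    """P(Tails on the next flip | n heads in a row) for a possibly loaded coin."""
    # Likelihood of n straight heads under each hypothesis.
    like_loaded = loaded_heads ** n
    like_fair = 0.5 ** n
    # Bayes: posterior probability that the coin is loaded.
    evidence = prior_loaded * like_loaded + (1 - prior_loaded) * like_fair
    post_loaded = prior_loaded * like_loaded / evidence
    # Mix the two tails probabilities by the posterior weights.
    return post_loaded * (1 - loaded_heads) + (1 - post_loaded) * 0.5


for n in (0, 5, 10, 14, 20):
    print(n, round(p_tails_after_heads(n), 3))
```

With `prior_loaded=0.0` the function returns 0.5 for every n, which is the independent-flips case from the previous example. With any positive prior, each additional head shifts weight towards ‘loaded’ and pushes the tails probability down.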

Bayes Theorem

We mentioned Bayesian inference in the previous paragraph and you have probably seen the terms ‘Bayes’ and ‘Bayesian’ being thrown around many times before. What is all of this ‘Bayes’ stuff fundamentally about?

One simple theorem, yet an infinitude of possible applications. Bayes formula, which acts as the foundation for learning from data in a continuous manner, is an equation that lets us calculate a conditional probability from quantities we already have available. This is convenient as it allows us to update our probabilities as new information emerges. In other words, we are now in possession of an instrument that at all times helps us make use of the crucial up-to-date data out there.

Quick example: Lakers are estimated to have a 53 % probability of winning a game. News arrives, LeBron is out. What should the updated probability be? Ask Bayes!

How is the formula constructed then? Imagine two events, A and B, with non-zero probabilities P(A) and P(B). What is then the probability of the occurrence of both A and B? It is simply the probability of A multiplied by the probability of B *given* A, or more compactly P(A) * P(B | A). Equivalently it could of course be described as the probability of B times the probability of A *given* B, briefly P(B) * P(A | B). Hence, we have,

P(A and B) = P(B) * P(A | B) = P(A) * P(B | A)

By focusing on the right-most equality,

P(B) * P(A | B) = P(A) * P(B | A)

and dividing through by P(B), we arrive at Bayes Theorem,

P(A | B) = P(A) * P(B | A) / P(B)

which gives an explicit expression for the conditional probability of A given B. As hinted at in the previous section, you can now see how P(A | B) and P(B | A) are related.

Bayes visually. P(A | B) is the yellow area divided by the green one, a subset we now know for certain we must be within since B is given.
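
To see the formula at work on numbers that are easy to verify by hand, here is a minimal sketch using a fair six-sided die. The helper name `bayes` and the die example are ours, chosen only for illustration.

```python
from fractions import Fraction


def bayes(prior_a: Fraction, likelihood_b_given_a: Fraction,
          evidence_b: Fraction) -> Fraction:
    """Return P(A | B) = P(A) * P(B | A) / P(B)."""
    if evidence_b == 0:
        raise ValueError("P(B) must be non-zero")
    return prior_a * likelihood_b_given_a / evidence_b


# A = "the die shows 6", B = "the die shows an even number".
p_a = Fraction(1, 6)          # P(A)
p_b = Fraction(1, 2)          # P(B)
p_b_given_a = Fraction(1)     # a 6 is always even
p_a_given_b = bayes(p_a, p_b_given_a, p_b)
print(p_a_given_b)            # 1/3

# The reverse conditional is a different number: P(B | A) = 1.
```

Knowing the roll is even lifts the probability of a 6 from 1/6 to 1/3, while P(B | A) stays at 1. Same two events, two very different conditional probabilities.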

An application of Bayes, the famous Monty Hall problem

Yes, this is a level 2 post, and the Monty Hall & Bayes combination should probably not be introduced at this level. However, being such a famous brain teaser + having it mentioned by our resident Devil on Twitter earlier today, we just felt we had to include it in this text since we are more or less presenting the full toolbox needed to solve such a problem rigorously.

We begin by stating the probability puzzle:

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

Note: It should be clarified that the host obviously will not open a door which hides the car as it would completely ruin the show.

Now, to solve the problem in our Bayesian setting, we will need to identify which parts of the problem might interest us. The first evident fact is that we have three doors and one car, hence our initial probabilities should equal 1/3 for all three doors unless more information is provided. Then comes the crucial step, after we have chosen a door the host will open one of the other ones, and since he possesses the knowledge of which door it is that contains the car, this could perhaps provide us with some valuable information. As we have learnt in the previous section, new information should give rise to the thought “we need to update our existing beliefs” immediately, and consequently we will now make an attempt at doing so by an application of Bayes Theorem.

Let

P(door #1) = probability of the car being hidden behind door #1 

P(door #2) = … door #2 

P(door #3) = … door #3

Assume now, as in the description of the problem, that we pick door #1 at the beginning of the game, and that the host (who knows what is behind the doors) continues by revealing the goat behind door #3 prior to asking us the decisive ‘switch’ question. Translating the problem to mathematics/probabilities, what we would like to compute is P(door #1 | host opens door #3). By applying Bayes Theorem, we have

P(door #1 | host opens door #3) = P(door #1) * P(host opens door #3 | door #1) / P(host opens door #3)

and therefore, in order for us to solve the problem, we will need to compute the three probabilities on the right hand side of the equality.

We begin with the simplest one, the prior probability (in lack of further information) of the car being behind door #1,

P(door #1) = 1/3

owing to the simple fact that there are three doors and one car.

Onto the next one!

P(host opens door #3 | door #1)

Okay, so now we assume we know that the car is behind door #1, a fact the host is also familiar with. Since we initially picked door #1 and the car is behind this door, the host is indifferent to which of the other doors to open. Since he has two options, door #2 or door #3, he opens door #3 with probability 1/2, and thus we conclude,

P(host opens door #3 | door #1) = 1/2

We are getting closer! Last one remaining,

P(host opens door #3)

Again, remember that we picked door #1 initially, i.e. he is forced to open either door #2 or door #3 as part of his act. There are three cases to consider:

If the car hides behind door #1, we just concluded that he opens door #3 with probability 1/2. Since the probability of the car being behind door #1 is 1/3, the product 1/2 * 1/3 = 1/6 yields the probability of the car hiding behind door #1 AND the host opening door #3.
If the car hides behind door #2, he is forced to open door #3, therefore in this case him opening door #3 is a sure thing with probability 1. For that reason the probability of the car hiding behind door #2 AND the host opening door #3 is 1/3.
At last, if the car is hiding behind door #3, he will by design of the show not open it → probability equals 0 in this case.

Since the car *must* be behind exactly one of the three doors (the cases are disjoint*), we may now sum up the probabilities and get that,

*Disjoint/mutually exclusive outcomes: Disjoint/mutually exclusive outcomes cannot coexist. *Either* team A wins *or* team B wins. *Either* horse C wins *or* another horse wins.

P(host opens door #3)

= P(case 1 & host opens door #3) + P(case 2 & host opens door #3) + P(case 3 & host opens door #3)

= 1/6 + 1/3 + 0

= 1/2

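and we are finally ready to compute the probability we are trying to find, P(door #1 | host opens door #3):

P(door #1 | host opens door #3) = (1/3 * 1/2) / (1/2) = 1/3

Since the car cannot be behind the opened door #3, the remaining probability sits on door #2: P(door #2 | host opens door #3) = 1 - 1/3 = 2/3.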

Indeed a very interesting and unintuitive result, revealing that the optimal strategy must be to switch our choice. Two doors, one car, yet only seemingly equal probabilities. It should furthermore be noted that by symmetry of the problem the strategy of switching must be the optimal one independent of both our and the host’s initial choices of doors to request/open.
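
The whole calculation fits in a few lines if you want to check it with exact fractions. This is a sketch of the steps above, assuming we picked door #1 and the host opened door #3. The variable names are ours.

```python
from fractions import Fraction

# Prior: the car is equally likely to be behind each door.
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# P(host opens door #3 | car behind door d), given that we picked door #1.
likelihood = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(0)}

# Law of total probability over the three disjoint cases.
evidence = sum(prior[d] * likelihood[d] for d in prior)

# Bayes Theorem for every door.
posterior = {d: prior[d] * likelihood[d] / evidence for d in prior}

for door, p in posterior.items():
    print(f"P(door #{door} | host opens door #3) = {p}")
```

Staying on door #1 keeps its 1/3, door #2 ends up with 2/3 and the opened door #3 drops to 0.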

Using Bayes, we managed to solve the puzzle!

Switch your choice!

Confused? Do not worry, you are most likely in good company.
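
If the algebra does not convince you, brute force might. Below is a small Monte Carlo sketch of the game. The function names, the number of rounds and the seed are arbitrary choices of ours, and the host is assumed to pick at random whenever he has two goat doors to choose from.

```python
import random


def play(switch: bool, rng: random.Random) -> bool:
    """Play one round; return True if the player ends up with the car."""
    doors = [1, 2, 3]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch: bool, rounds: int = 100_000, seed: int = 1) -> float:
    """Fraction of rounds won with a fixed stay/switch strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(rounds)) / rounds


print("stay  :", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

The two printed win rates should land close to 1/3 and 2/3 respectively, in line with the Bayes calculation.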

Note: To make the puzzle slightly easier to decipher, you could instead imagine 1 000 doors with one car and 999 goats. After you have expressed your choice, the host, who again knows the position of the car (and will not open that door), opens 998 of the other doors, all hiding goats, and asks, “do you want to switch?”
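
The many-doors version can be sketched the same way. We assume the host always leaves exactly one other door closed, so switching wins precisely when the first pick was wrong. The function names, the number of rounds and the seed are ours.

```python
import random
from fractions import Fraction


def switch_win_probability(n_doors: int) -> Fraction:
    """Exact chance that switching wins when the host opens n_doors - 2 goat doors."""
    # Switching wins exactly when the first pick was wrong.
    return 1 - Fraction(1, n_doors)


def simulate_switch(n_doors: int, rounds: int = 20_000, seed: int = 7) -> float:
    """Estimate the switching win rate by simulation."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(rounds):
        car = rng.randrange(n_doors)
        pick = rng.randrange(n_doors)
        if pick == car:
            # Host leaves one random goat door closed.
            remaining = rng.choice([d for d in range(n_doors) if d != car])
        else:
            # Host must leave the car's door closed.
            remaining = car
        wins += remaining == car
    return wins / rounds


print(switch_win_probability(3), switch_win_probability(1000))
print(simulate_switch(1000))
```

With three doors switching wins 2/3 of the time, with 1 000 doors 999/1000 of the time. The larger the field, the more obvious it becomes that the one door the host avoided is where the car almost certainly is.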

Summary + BR-updates

Conditional probability and Bayesian thinking is, at least initially and especially with problems such as Monty Hall, probably the most confusing subject one encounters while on the path to mastery of the theory of betting. This means one thing, and one thing only:

Truly learning it, in a way that it becomes second nature, might take some time but will result in huge betting advantages. Why?
- It is applicable basically everywhere.
- We all know *no one else* will take their time to learn it. This stuff is, if at all, given a cursory treatment (1-2 lectures) in university curriculums. You think people will learn it without being forced to? *Zero* competition.

BR-updates: