Awesome post!

I confess to never having thought deeply enough about the differences between outright winner and places but this spells them out nicely. Very interesting.

Something I'm not able to grasp: how is an invented model of five horses' time performance extrapolated to real, concrete races, where there can be many more horses, different race distances, etc.?

Thank you and best regards!

edited Jul 10, 2023

Practical example starting at 22:30 here: https://www.youtube.com/watch?v=YOVrZrJ-wtc

By training your models on previous data, you can come up with estimates for each horse's mean & variance, which, under the normal distribution assumption, fully define the horse's performance distribution. Then simulate from these distributions ("run" 10,000 races) to come up with precise probabilities.
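A minimal sketch of that simulation step, in Python with only the standard library. The five horses and their (mean, standard deviation) finishing times are invented for illustration, performances are assumed independent and normal, and `simulate` and `places` are hypothetical names, not anything from the original post:

```python
import random

# Hypothetical (mean, standard deviation) of finishing time in seconds.
# These numbers are made up; in practice they would come from a fitted model.
HORSES = {
    "A": (100.0, 1.0),
    "B": (100.5, 2.5),
    "C": (101.0, 1.5),
    "D": (101.5, 3.0),
    "E": (102.0, 0.5),
}

def simulate(horses, n_races=10_000, places=2, seed=0):
    """Estimate win and place probabilities from independent normal times."""
    rng = random.Random(seed)
    wins = {name: 0 for name in horses}
    placed = {name: 0 for name in horses}
    for _ in range(n_races):
        # Draw one finishing time per horse; the lowest time wins.
        times = {name: rng.gauss(mu, sigma) for name, (mu, sigma) in horses.items()}
        order = sorted(times, key=times.get)
        wins[order[0]] += 1
        for name in order[:places]:
            placed[name] += 1
    win_prob = {name: wins[name] / n_races for name in horses}
    place_prob = {name: placed[name] / n_races for name in horses}
    return win_prob, place_prob

win_prob, place_prob = simulate(HORSES)
for name in HORSES:
    print(f"{name}: win={win_prob[name]:.3f} place={place_prob[name]:.3f}")
```

The output is only as good as the assumed means and variances, and the estimates carry simulation noise that shrinks as `n_races` grows.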

It's a very, very demanding project though.

What I've personally had more success with is analyzing those invented, theoretical races *extensively* in order to understand the role means & variances play for different kinds of bets in a horse race. It makes you ask the crucial questions.

- Is the mean really that important?

- Can there be any dependence between the performance of two/more horses? What would that imply for the simulations mentioned above? What does it mean for exactas/trifectas etc.?

- Are the performance distributions of horses necessarily unimodal?
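To make the first of these questions concrete, here is a small sketch of one such invented race. It assumes independent, normally distributed finishing times, and the field (four identical steady horses plus one slower but erratic horse) is entirely made up; `win_and_place` is a hypothetical helper name:

```python
import random

# Invented field: (mean, standard deviation) of finishing time in seconds.
FIELD = {
    "steady1": (100.0, 1.0),
    "steady2": (100.0, 1.0),
    "steady3": (100.0, 1.0),
    "steady4": (100.0, 1.0),
    "erratic": (101.0, 4.0),  # slower on average, far more variable
}

def win_and_place(field, n_races=20_000, places=3, seed=1):
    """Return simulated win and top-`places` frequencies for each horse."""
    rng = random.Random(seed)
    win = dict.fromkeys(field, 0)
    place = dict.fromkeys(field, 0)
    for _ in range(n_races):
        times = {h: rng.gauss(mu, sd) for h, (mu, sd) in field.items()}
        order = sorted(times, key=times.get)  # fastest first
        win[order[0]] += 1
        for h in order[:places]:
            place[h] += 1
    return ({h: w / n_races for h, w in win.items()},
            {h: p / n_races for h, p in place.items()})

win, place = win_and_place(FIELD)
for h in FIELD:
    print(f"{h:8s} win={win[h]:.2f} place={place[h]:.2f}")
```

With these made-up numbers the erratic horse should win more often than any steady horse despite the worst mean, yet finish in the top three less often than any of them. The sketch does not touch the dependence or unimodality questions, which need a different performance model.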

Okay, right. Would you be able to apply this reasoning to any horse race using only the data provided by the bookie? I mean, for instance, the finishing position of each horse in its last 4 races?

No, absolutely not.

If you're taking the modelling route, you need:

- Sufficient database.

- Model.

- Performance: your model can't take 2 hours to run.

It takes years (at minimum) to build out the infrastructure for such a project.

If not, you need:

- Domain knowledge.

- Some kind of mental model for how different features of a horse affect its performance distribution.

I wouldn't recommend the first route to anyone (if you're *the guy*, you won't listen to my recommendations anyway). The second one is doable but takes time.