Disclaimer: I am not a betting expert. Please do not take this as financial advice. Do your own research before making any bets.

I did this analysis just for fun. I am not sure it has any practical value; I was simply curious whether always betting on the favourite would be a good strategy. My intuition was that it cannot be a good strategy, because the odds on the favourite are always lower and the favourite does not win every time.
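To put a number on that intuition, here is a minimal sketch of the break-even win rate for a flat 1-unit stake at given decimal odds. The function name and the two example odds are illustrative assumptions, not part of the analysis itself.

```python
def break_even_win_rate(decimal_odds: float) -> float:
    """Win rate at which a flat 1-unit stake at these odds breaks even."""
    # A win pays (odds - 1) and a loss costs 1, so break-even means
    # p * (odds - 1) = (1 - p), which gives p = 1 / odds.
    return 1.0 / decimal_odds


# Illustrative odds: a strong favourite and a clear underdog.
rate_favourite = break_even_win_rate(1.40)
rate_underdog = break_even_win_rate(10.00)
print(f"favourite must win {rate_favourite:.1%} of the time")
print(f"underdog must win {rate_underdog:.1%} of the time")
```

The lower the odds, the more often the bet has to win just to break even, which is why a favourite that wins often can still lose money.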

Betting data

To answer the question, I collected the data for 3 different leagues:

  • English Premier League
  • Spanish La Liga
  • German Bundesliga

The data contains all matches from the 2018/2019 season to the 2023/2024 season.

The data which I used looks like this:

| | id | home | away | home-name | away-name | sport-id | date-start-timestamp | result | homeResult | awayResult | home-winner | away-winner | postmatchResult | country-id | country-name | 1_avg_odds | x_avg_odds | 2_avg_odds | 1_max_odds | x_max_odds | 2_max_odds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3263915 | 8002902 | 8002903 | Liverpool | Manchester City | 1 | 1538926200 | 0:0 | 0 | 0 | draw | draw | 0:0 | 198 | England | 2.75 | 3.60 | 2.60 | 2.75 | 3.60 | 2.60 |
| 1 | 3263917 | 8002906 | 8002907 | Southampton | Chelsea | 1 | 1538918100 | 0:3 | 0 | 3 | lost | win | 0:3 | 198 | England | 6.50 | 4.20 | 1.57 | 6.50 | 4.20 | 1.57 |
| 2 | 3263913 | 8002898 | 8002899 | Fulham | Arsenal | 1 | 1538910000 | 1:5 | 1 | 5 | lost | win | 1:5 | 198 | England | 5.00 | 4.40 | 1.66 | 5.00 | 4.40 | 1.66 |
| 3 | 3263916 | 8002904 | 8002905 | Manchester Utd | Newcastle | 1 | 1538843400 | 3:2 | 3 | 2 | win | lost | 3:2 | 198 | England | 1.40 | 4.75 | 10.00 | 1.40 | 4.75 | 10.00 |
| 4 | 3263911 | 8002894 | 8002895 | Burnley | Huddersfield | 1 | 1538834400 | 1:1 | 1 | 1 | draw | draw | 1:1 | 198 | England | 2.35 | 3.00 | 3.80 | 2.35 | 3.00 | 3.80 |

The data contains all the information necessary to calculate the return on a bet. However, it needed to be cleaned and transformed before it could be used.

The following preprocessing steps were applied:

  1. Remove the columns that are not needed
  2. Convert the timestamp to a datetime
  3. Add a winner column
  4. Map the winner column to numeric values
  5. Convert the result columns to integers
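The steps above can be sketched for a single match record in plain Python. This is only an illustration under assumptions: the helper name `preprocess_row` is hypothetical, the dropped columns and the draw=0 / home=1 / away=2 mapping are inferred from the tables in this post, the timestamp is assumed to be Unix seconds in UTC, and the winner is derived here by comparing the goals, which may differ from how the original preprocessing did it.

```python
from datetime import datetime, timezone

# Assumed mapping, inferred from the transformed table: draw=0, home=1, away=2.
WINNER_TO_NUM = {"draw": 0, "home": 1, "away": 2}

# Columns that no longer appear after the transformation.
DROP = {"id", "sport-id", "date-start-timestamp", "postmatchResult", "country-id",
        "country-name", "1_max_odds", "x_max_odds", "2_max_odds"}


def preprocess_row(row: dict) -> dict:
    """Apply the five steps to one raw match record (hypothetical helper)."""
    out = {k: v for k, v in row.items() if k not in DROP}  # step 1
    # Step 2: Unix seconds -> datetime (UTC assumed).
    out["date"] = datetime.fromtimestamp(row["date-start-timestamp"], tz=timezone.utc)
    # Step 5: goals as integers.
    home, away = int(row["homeResult"]), int(row["awayResult"])
    out["home_result"], out["away_result"] = home, away
    # Steps 3 and 4: winner label and its numeric code.
    out["winner"] = "home" if home > away else "away" if away > home else "draw"
    out["winner_num"] = WINNER_TO_NUM[out["winner"]]
    return out


raw = {"id": 3263917, "home-name": "Southampton", "away-name": "Chelsea",
       "date-start-timestamp": 1538918100, "homeResult": "0", "awayResult": "3",
       "1_avg_odds": 6.50, "x_avg_odds": 4.20, "2_avg_odds": 1.57}
clean = preprocess_row(raw)
print(clean["date"], clean["winner"], clean["winner_num"])
```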

After transforming the data, the table looks like this:

| | home | away | home-name | away-name | result | homeResult | awayResult | home-winner | away-winner | 1_avg_odds | x_avg_odds | 2_avg_odds | date | winner | winner_num | home_result | away_result |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8002902 | 8002903 | Liverpool | Manchester City | 0:0 | 0 | 0 | draw | draw | 2.75 | 3.60 | 2.60 | 2018-10-07 15:30:00 | draw | 0 | 0 | 0 |
| 1 | 8002906 | 8002907 | Southampton | Chelsea | 0:3 | 0 | 3 | lost | win | 6.50 | 4.20 | 1.57 | 2018-10-07 13:15:00 | away | 2 | 0 | 3 |
| 2 | 8002898 | 8002899 | Fulham | Arsenal | 1:5 | 1 | 5 | lost | win | 5.00 | 4.40 | 1.66 | 2018-10-07 11:00:00 | away | 2 | 1 | 5 |
| 3 | 8002904 | 8002905 | Manchester Utd | Newcastle | 3:2 | 3 | 2 | win | lost | 1.40 | 4.75 | 10.00 | 2018-10-06 16:30:00 | home | 1 | 3 | 2 |
| 4 | 8002894 | 8002895 | Burnley | Huddersfield | 1:1 | 1 | 1 | draw | draw | 2.35 | 3.00 | 3.80 | 2018-10-06 14:00:00 | draw | 0 | 1 | 1 |

Betting strategy

I decided to build a class for the betting strategy that can easily be reused and adjusted for different strategies. The analysis of other strategies is out of scope for this post, though. Let's focus on the simplest strategy: always betting on the favourite.


from abc import ABC, abstractmethod

import polars as pl


class BettingStrategy(ABC):
    def __init__(self, df: pl.DataFrame, target_columns: list[str]):
        self.df = df
        self.target_columns = target_columns

    @abstractmethod
    def add_bet(self, df: pl.DataFrame) -> pl.DataFrame:
        """
        Add a column to the dataframe which will be the bet number (1, 0 or 2)
        """
        pass

    def add_bet_won_column(self, df):
        """
        Add a column to the dataframe which will be True if the bet has won.
        """
        return df.with_columns(bet_won=pl.col("winner_num").eq(pl.col("bet")))

    def add_odds_to_use(self, df):
        """
        Add a column with the odds of the actual outcome (winner_num). For a
        winning bet these equal the odds of the bet, which is all that
        calculate_return needs.
        """
        return df.with_columns(
            odds_to_use=pl.when(pl.col("winner_num").eq(1))
            .then(pl.col("1_avg_odds"))
            .when(pl.col("winner_num").eq(2))
            .then(pl.col("2_avg_odds"))
            .otherwise(pl.col("x_avg_odds"))
        )

    def calculate_return(self, df):
        """
        Add a column to the dataframe which will be the return on the bet.
        """
        return df.with_columns(
            return_on_bet=pl.when(pl.col("bet_won"))
            .then(pl.col("odds_to_use") - 1)
            .otherwise(pl.lit(-1))
        )

    def get_underdog(self, df):
        """
        Add a column to the dataframe which will be the underdog team (1=home, 2=away, or None)
        """
        return df.with_columns(
            underdog=pl.when(pl.col("1_avg_odds") < pl.col("2_avg_odds"))
            .then(pl.lit(2))
            .when(pl.col("2_avg_odds") < pl.col("1_avg_odds"))
            .then(pl.lit(1))
            .otherwise(pl.lit(None))
        )

    def get_favourite(self, df):
        """
        Add a column to the dataframe which will be the favourite team (1=home, 2=away, or None)
        """
        return df.with_columns(
            favourite=pl.when(pl.col("1_avg_odds") < pl.col("2_avg_odds"))
            .then(pl.lit(1))
            .when(pl.col("2_avg_odds") < pl.col("1_avg_odds"))
            .then(pl.lit(2))
            .otherwise(pl.lit(None))
        )

    def has_favourite_won(self, df):
        """
        Add a column to the dataframe which will be True if the favourite has won.
        """
        return df.with_columns(
            has_favourite_won=pl.col("favourite").eq(pl.col("winner_num"))
        )

    def has_underdog_won(self, df):
        """
        Add a column to the dataframe which will be True if the underdog has won.
        """
        return df.with_columns(
            has_underdog_won=pl.col("underdog").eq(pl.col("winner_num"))
        )

    def apply_strategy(self):
        """
        Apply the strategy to the dataframe.
        """
        prep_df = (
            self.df.pipe(self.get_underdog)
            .pipe(self.get_favourite)
        )
        bet_df = self.add_bet(prep_df)

        # check if the method has been implemented correctly
        if "bet" not in bet_df.columns:
            raise ValueError(
                "The add_bet method has not been implemented correctly. "
                "Please add a column called 'bet'."
            )

        # check if the bet column only contains 0, 1 or 2
        allowed_bet_values = [0, 1, 2]
        if not bet_df["bet"].is_in(allowed_bet_values).all():
            raise ValueError(
                "The add_bet method has not been implemented correctly. "
                f"Please add a column called 'bet' with the values {allowed_bet_values}."
            )

        return (
            bet_df.pipe(self.add_bet_won_column)
            .pipe(self.add_odds_to_use)
            .pipe(self.has_favourite_won)
            .pipe(self.has_underdog_won)
            .pipe(self.calculate_return)
        )

The class I defined is an abstract class, which means that it cannot be instantiated directly, but it can be used as a base class for other strategies. Several methods are common to all strategies, such as add_bet_won_column or calculate_return. The add_bet method, however, is abstract and has to be implemented in every subclass. With the help of the abstract class, we can define the BetAlwaysOnFavourite strategy like this:

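class BetAlwaysOnFavourite(BettingStrategy):
    def add_bet(self, df: pl.DataFrame) -> pl.DataFrame:
        # pl.lit(1) bets on outcome 1 (home win) in every row, which is what
        # the "bet" column in the results shows. To follow the "favourite"
        # column computed by get_favourite, use pl.col("favourite") instead.
        return df.with_columns(bet=pl.lit(1))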

As you can see, the implementation is straightforward: only the add_bet method needs to be implemented. For more complicated strategies it might be necessary to override other methods as well.
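To illustrate the pattern without polars, here is a simplified, dictionary-based sketch of the same idea. The names `RowStrategy`, `pick`, `settle` and `BetOnLowerOdds` are hypothetical stand-ins, not part of the class above. The subclass picks whichever of the home and away sides has the lower odds; sending ties to the home side is an arbitrary assumption.

```python
from abc import ABC, abstractmethod


class RowStrategy(ABC):
    """Hypothetical, simplified stand-in for BettingStrategy working on dicts."""

    @abstractmethod
    def pick(self, row: dict) -> int:
        """Return the outcome to bet on: 1 = home, 0 = draw, 2 = away."""

    def settle(self, row: dict) -> float:
        """Return the profit of a 1-unit bet on this row."""
        bet = self.pick(row)
        odds = {1: row["1_avg_odds"], 0: row["x_avg_odds"], 2: row["2_avg_odds"]}
        # A winning bet pays odds - 1, a losing bet costs the stake.
        return odds[bet] - 1 if bet == row["winner_num"] else -1.0


class BetOnLowerOdds(RowStrategy):
    def pick(self, row: dict) -> int:
        # The side with the lower odds is the favourite; ties go to home.
        return 1 if row["1_avg_odds"] <= row["2_avg_odds"] else 2


# Southampton vs Chelsea from the sample data: away favourite, away win.
row = {"1_avg_odds": 6.50, "x_avg_odds": 4.20, "2_avg_odds": 1.57, "winner_num": 2}
profit = BetOnLowerOdds().settle(row)

try:
    RowStrategy()  # abstract: pick() is not implemented
    instantiated = True
except TypeError:
    instantiated = False

print(profit, instantiated)
```

Only `pick` differs between strategies here, which mirrors how add_bet is the single method a subclass has to provide.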

Applying the strategy to the data:


betting_strategy = BetAlwaysOnFavourite(df, target_columns)
result = betting_strategy.apply_strategy()

We will get the following result:

| | home | away | home-name | away-name | result | homeResult | awayResult | home-winner | away-winner | 1_avg_odds | x_avg_odds | 2_avg_odds | date | winner | winner_num | home_result | away_result | underdog | favourite | bet | bet_won | odds_to_use | has_favourite_won | has_underdog_won | return_on_bet |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8002902 | 8002903 | Liverpool | Manchester City | 0:0 | 0 | 0 | draw | draw | 2.75 | 3.60 | 2.60 | 2018-10-07 15:30:00 | draw | 0 | 0 | 0 | 1.0 | 2.0 | 1 | False | 3.60 | False | False | -1.0 |
| 1 | 8002906 | 8002907 | Southampton | Chelsea | 0:3 | 0 | 3 | lost | win | 6.50 | 4.20 | 1.57 | 2018-10-07 13:15:00 | away | 2 | 0 | 3 | 1.0 | 2.0 | 1 | False | 1.57 | True | False | -1.0 |
| 2 | 8002898 | 8002899 | Fulham | Arsenal | 1:5 | 1 | 5 | lost | win | 5.00 | 4.40 | 1.66 | 2018-10-07 11:00:00 | away | 2 | 1 | 5 | 1.0 | 2.0 | 1 | False | 1.66 | True | False | -1.0 |
| 3 | 8002904 | 8002905 | Manchester Utd | Newcastle | 3:2 | 3 | 2 | win | lost | 1.40 | 4.75 | 10.00 | 2018-10-06 16:30:00 | home | 1 | 3 | 2 | 2.0 | 1.0 | 1 | True | 1.40 | True | False | 0.4 |
| 4 | 8002894 | 8002895 | Burnley | Huddersfield | 1:1 | 1 | 1 | draw | draw | 2.35 | 3.00 | 3.80 | 2018-10-06 14:00:00 | draw | 0 | 1 | 1 | 2.0 | 1.0 | 1 | False | 3.00 | False | False | -1.0 |

The last column of the table shows the return on each bet. The sum of this column is the total return of the strategy. In our case the sum is -300.87, which means that staking 1€ on every match would leave us 300.87€ down. This is obviously a losing strategy.

Furthermore, we can plot the cumulative sum of the return to see how the strategy performs over time:

[Figure: cumulative sum of the return on bet over time]

There are some periods in which the strategy performs better, but overall the trend is negative and you cannot make money with it.
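As a rough illustration of how such a curve is built, the running total can be computed with a cumulative sum once the bets are in chronological order. The sketch below uses only the five sample rows shown in the result table, so it says nothing about the full data set; the variable names are my own.

```python
from itertools import accumulate

# (kick-off time, return_on_bet) for the five sample rows from the result table
bets = [
    ("2018-10-07 15:30", -1.0),
    ("2018-10-07 13:15", -1.0),
    ("2018-10-07 11:00", -1.0),
    ("2018-10-06 16:30", 0.4),
    ("2018-10-06 14:00", -1.0),
]

# Sort chronologically first; the running sum only makes sense in time order.
# Timestamps in this year-month-day format sort correctly as plain strings.
ordered = sorted(bets)
cumulative = list(accumulate(ret for _, ret in ordered))
total = cumulative[-1]  # the last running value equals the plain sum

for (kickoff, _), running in zip(ordered, cumulative):
    print(kickoff, round(running, 2))
```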

Statistics

I was curious to see how the strategy performs if we run it many times, each time on a different part of the data. To do so, I created a function which runs the selected strategy multiple times on randomly sampled games.


def run_strategy_n_times(
    strategy: type[BettingStrategy], df: pl.DataFrame, n_times: int, n_games: int, target_columns: list[str]
) -> list[float]:
    """
    Run a betting strategy multiple times on randomly sampled subsets of data.

    This function applies a given betting strategy to randomly sampled subsets of the input
    DataFrame multiple times and returns a list of the total returns for each run.

    Args:
        strategy (type[BettingStrategy]): The betting strategy class to be applied.
        df (pl.DataFrame): The input DataFrame containing the full dataset of games and their information.
        n_times (int): The number of times to run the strategy.
        n_games (int): The number of games to sample for each run of the strategy.
        target_columns (list[str]): The target columns to be used in the strategy.

    Returns:
        list[float]: A list containing the total return on bets for each run of the strategy.

    Notes:
        - DataFrame.sample draws without replacement by default, so a game appears
          at most once within a single run but may appear again in other runs.
    """
    results = []
    for _ in range(n_times):
        # instantiate the strategy class on a fresh random sample of games
        result = strategy(
            df.sample(n_games, shuffle=True), target_columns
        ).apply_strategy()
        results.append(result["return_on_bet"].sum())
    return results
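The same resampling idea can be sketched with the standard library alone. This is a simplified illustration, not the function above: `simulate_totals` is a hypothetical helper, and the per-bet returns are made-up toy numbers rather than the match data.

```python
import random
import statistics


def simulate_totals(returns: list[float], n_times: int, n_games: int, seed: int = 0) -> list[float]:
    """Sum the returns of n_games randomly drawn bets, repeated n_times."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    # rng.sample draws without replacement within a single run
    return [sum(rng.sample(returns, n_games)) for _ in range(n_times)]


# Made-up per-bet returns, not the article's data:
# 40 winning bets at odds 1.50 and 60 losing bets.
toy_returns = [0.5] * 40 + [-1.0] * 60
totals = simulate_totals(toy_returns, n_times=1_000, n_games=20)
mean_total = statistics.mean(totals)
print(f"mean total return over 20 bets: {mean_total:.2f}")
```

With these toy numbers the average bet loses 0.4 units, so the mean total over 20 bets should land near -8.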

Let's run the strategy 100,000 times with 100 random games each.

If we plot the results we can see the distribution of the returns:

[Figure: distribution of the total returns across the simulated runs]

The simulation gives a nearly normal distribution with a mean of -4.01. This estimates the expected return of the strategy after 100 games with a 1€ stake on each, i.e. a loss of about 4 cents per euro staked.

Conclusion

As expected, always betting on the favourite is a losing strategy. However, I was surprised to see how much we lose on average, and if we keep betting, the losses accumulate. With that said, I would not recommend always betting on the favourite.