Disclaimer: I am not a betting expert. Please do not take this as financial advice. Do your own research before making any bets.

I did this analysis just for fun. I am not sure it has any practical value; I was simply curious whether always betting on the favourite would be a good strategy. My intuition says it cannot be, because the odds on the favourite are always lower and the favourite does not win every time.

Betting data

To answer the question, I collected the data for 3 different leagues:

English Premier League
Spanish La Liga
German Bundesliga

The data contains all matches from the 2018/2019 season through the 2023/2024 season.

The data which I used looks like this:

	id	home	away	home-name	away-name	sport-id	date-start-timestamp	result	homeResult	awayResult	home-winner	away-winner	postmatchResult	country-id	country-name	1_avg_odds	x_avg_odds	2_avg_odds	1_max_odds	x_max_odds	2_max_odds
0	3263915	8002902	8002903	Liverpool	Manchester City	1	1538926200	0:0	0	0	draw	draw	0:0	198	England	2.75	3.60	2.60	2.75	3.60	2.60
1	3263917	8002906	8002907	Southampton	Chelsea	1	1538918100	0:3	0	3	lost	win	0:3	198	England	6.50	4.20	1.57	6.50	4.20	1.57
2	3263913	8002898	8002899	Fulham	Arsenal	1	1538910000	1:5	1	5	lost	win	1:5	198	England	5.00	4.40	1.66	5.00	4.40	1.66
3	3263916	8002904	8002905	Manchester Utd	Newcastle	1	1538843400	3:2	3	2	win	lost	3:2	198	England	1.40	4.75	10.00	1.40	4.75	10.00
4	3263911	8002894	8002895	Burnley	Huddersfield	1	1538834400	1:1	1	1	draw	draw	1:1	198	England	2.35	3.00	3.80	2.35	3.00	3.80

The data contains all the information needed to calculate the return on each bet. However, it had to be cleaned and transformed before it could be used.

The following preprocessing steps were applied:

Remove the columns that are not needed
Convert the timestamp to a datetime
Add a winner column (home, away or draw)
Map the winner column to numeric values (1 = home, 2 = away, 0 = draw)
Convert the result columns to integers

After transforming the data, the table looks like this:

	home	away	home-name	away-name	result	homeResult	awayResult	home-winner	away-winner	1_avg_odds	x_avg_odds	2_avg_odds	date	winner	winner_num	home_result	away_result
0	8002902	8002903	Liverpool	Manchester City	0:0	0	0	draw	draw	2.75	3.60	2.60	2018-10-07 15:30:00	draw	0	0	0
1	8002906	8002907	Southampton	Chelsea	0:3	0	3	lost	win	6.50	4.20	1.57	2018-10-07 13:15:00	away	2	0	3
2	8002898	8002899	Fulham	Arsenal	1:5	1	5	lost	win	5.00	4.40	1.66	2018-10-07 11:00:00	away	2	1	5
3	8002904	8002905	Manchester Utd	Newcastle	3:2	3	2	win	lost	1.40	4.75	10.00	2018-10-06 16:30:00	home	1	3	2
4	8002894	8002895	Burnley	Huddersfield	1:1	1	1	draw	draw	2.35	3.00	3.80	2018-10-06 14:00:00	draw	0	1	1

Betting strategy

I decided to build a class for the betting strategy that can easily be reused and adjusted for different strategies. Analysing other strategies is out of scope for this post, so let's focus on the simplest one: always betting on the favourite.


from abc import ABC, abstractmethod

import polars as pl


class BettingStrategy(ABC):
    def __init__(self, df: pl.DataFrame, target_columns: list[str]):
        self.df = df
        self.target_columns = target_columns

    @abstractmethod
    def add_bet(self, df: pl.DataFrame) -> pl.DataFrame:
        """
        Add a 'bet' column holding the outcome that is bet on
        (1 = home win, 0 = draw, 2 = away win).
        """
        pass

    def add_bet_won_column(self, df):
        """
        Add a column to the dataframe which will be True if the bet has won.
        """
        return df.with_columns(bet_won=pl.col("winner_num").eq(pl.col("bet")))

    def add_odds_to_use(self, df):
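        """
        Add a column with the odds of the outcome that actually occurred.

        calculate_return only reads this value when the bet won, in which case
        it equals the odds of the outcome that was bet on.
        """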
        return df.with_columns(
            odds_to_use=pl.when(pl.col("winner_num").eq(1))
            .then(pl.col("1_avg_odds"))
            .when(pl.col("winner_num").eq(2))
            .then(pl.col("2_avg_odds"))
            .otherwise(pl.col("x_avg_odds"))
        )

    def calculate_return(self, df):
        """
        Add a column to the dataframe which will be the return on the bet.
        """
        return df.with_columns(
            return_on_bet=pl.when(pl.col("bet_won"))
            .then(pl.col("odds_to_use") - 1)
            .otherwise(pl.lit(-1))
        )

    def get_underdog(self, df):
        """
        Add a column to the dataframe which will be the underdog team (1=home, 2=away, or None)
        """
        return df.with_columns(
            underdog=pl.when(pl.col("1_avg_odds") < pl.col("2_avg_odds"))
            .then(pl.lit(2))
            .when(pl.col("2_avg_odds") < pl.col("1_avg_odds"))
            .then(pl.lit(1))
            .otherwise(pl.lit(None))
        )

    def get_favourite(self, df):
        """
        Add a column to the dataframe which will be the favourite team (1=home, 2=away, or None)
        """
        return df.with_columns(
            favourite=pl.when(pl.col("1_avg_odds") < pl.col("2_avg_odds"))
            .then(pl.lit(1))
            .when(pl.col("2_avg_odds") < pl.col("1_avg_odds"))
            .then(pl.lit(2))
            .otherwise(pl.lit(None))
        )

    def has_favourite_won(self, df):
        """
        Add a column to the dataframe which will be True if the favourite has won.
        """
        return df.with_columns(
            has_favourite_won=pl.col("favourite").eq(pl.col("winner_num"))
        )

    def has_underdog_won(self, df):
        """
        Add a column to the dataframe which will be True if the underdog has won.
        """
        return df.with_columns(
            has_underdog_won=pl.col("underdog").eq(pl.col("winner_num"))
        )

    def apply_strategy(self):
        """
        Apply the strategy to the dataframe.
        """
        prep_df = (
            self.df.pipe(self.get_underdog)
            .pipe(self.get_favourite)
        )
        bet_df = self.add_bet(prep_df)

        # check that add_bet created the 'bet' column
        if "bet" not in bet_df.columns:
            raise ValueError(
                "The add_bet method has not been implemented correctly. "
                "Please add a column called 'bet'."
            )

        # check that the bet column only contains 0, 1 or 2
        allowed_bet_values = [0, 1, 2]
        if not bet_df["bet"].is_in(allowed_bet_values).all():
            raise ValueError(
                "The add_bet method has not been implemented correctly. "
                f"Please add a column called 'bet' with the values {allowed_bet_values}."
            )

        return (
            bet_df.pipe(self.add_bet_won_column)
            .pipe(self.add_odds_to_use)
            .pipe(self.has_favourite_won)
            .pipe(self.has_underdog_won)
            .pipe(self.calculate_return)
        )

The class I defined is abstract, which means it cannot be instantiated directly, but it can serve as a base class for concrete strategies. Several methods are common to all strategies, such as add_bet_won_column and calculate_return. The add_bet method, however, is abstract and has to be implemented in every subclass. With the help of the abstract class, we can define the BetAlwaysOnFavourite strategy like this:

class BetAlwaysOnFavourite(BettingStrategy):
    def add_bet(self, df: pl.DataFrame) -> pl.DataFrame:
        return df.with_columns(bet=pl.lit(1))

As you can see, the implementation is straightforward: only the add_bet method needs to be implemented. More complicated strategies might need to override other methods as well.

Applying the strategy to the data:


betting_strategy = BetAlwaysOnFavourite(df, target_columns)
result = betting_strategy.apply_strategy()

We will get the following result:

	home	away	home-name	away-name	result	homeResult	awayResult	home-winner	away-winner	1_avg_odds	x_avg_odds	2_avg_odds	date	winner	winner_num	home_result	away_result	underdog	favourite	bet	bet_won	odds_to_use	has_favourite_won	has_underdog_won	return_on_bet
0	8002902	8002903	Liverpool	Manchester City	0:0	0	0	draw	draw	2.75	3.60	2.60	2018-10-07 15:30:00	draw	0	0	0	1.0	2.0	1	False	3.60	False	False	-1.0
1	8002906	8002907	Southampton	Chelsea	0:3	0	3	lost	win	6.50	4.20	1.57	2018-10-07 13:15:00	away	2	0	3	1.0	2.0	1	False	1.57	True	False	-1.0
2	8002898	8002899	Fulham	Arsenal	1:5	1	5	lost	win	5.00	4.40	1.66	2018-10-07 11:00:00	away	2	1	5	1.0	2.0	1	False	1.66	True	False	-1.0
3	8002904	8002905	Manchester Utd	Newcastle	3:2	3	2	win	lost	1.40	4.75	10.00	2018-10-06 16:30:00	home	1	3	2	2.0	1.0	1	True	1.40	True	False	0.4
4	8002894	8002895	Burnley	Huddersfield	1:1	1	1	draw	draw	2.35	3.00	3.80	2018-10-06 14:00:00	draw	0	1	1	2.0	1.0	1	False	3.00	False	False	-1.0

The last column of the table shows the return on each bet. The sum of this column is the total return of the strategy. In our case the sum is -300.87, which means that if we always bet 1€ on the favourite, we would end up with a net loss of 300.87€. This is obviously a losing strategy.
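
To make the arithmetic behind the return column concrete, here is a small dependency-free sketch. `return_on_bet` is a hypothetical helper that mirrors calculate_return for a configurable stake, and the `bets` list holds the five sample rows from the table above, not the full dataset.

```python
def return_on_bet(odds: float, bet_won: bool, stake: float = 1.0) -> float:
    """Net profit of one bet at decimal odds: stake * (odds - 1) on a win, -stake on a loss."""
    return stake * (odds - 1) if bet_won else -stake


# (odds of the outcome that was bet on, did the bet win?) for the five rows above.
# Every bet in that table is on outcome 1, so the relevant odds are 1_avg_odds.
bets = [(2.75, False), (6.50, False), (5.00, False), (1.40, True), (2.35, False)]

returns = [return_on_bet(odds, won) for odds, won in bets]
total = sum(returns)
print(round(total, 2))  # -3.6: four lost stakes and one win at odds 1.40
```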

Furthermore, we can plot the cumulative sum of the returns to see how the strategy performs over time:

[Figure: cumulative sum of the return on bet over time]

There are some periods where the strategy performs better, however in general the trend is negative, and you cannot make money!
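
The cumulative sum behind such a plot can be sketched without any dependencies. The rows below are the five sample matches with their returns. Sorting by date first is an assumption of this sketch: the tables above list the most recent match first, and the post does not show how the plotted data was ordered.

```python
from itertools import accumulate

# (match date, return on bet) for the five sample rows, in table order
rows = [
    ("2018-10-07 15:30", -1.0),
    ("2018-10-07 13:15", -1.0),
    ("2018-10-07 11:00", -1.0),
    ("2018-10-06 16:30", 0.4),
    ("2018-10-06 14:00", -1.0),
]

# ISO-style date strings sort chronologically, so a plain sort is enough here
chronological = sorted(rows)
running_total = list(accumulate(ret for _, ret in chronological))

for (date, _), total in zip(chronological, running_total):
    print(date, round(total, 2))
```

The last value of the running total equals the plain sum of the return column, so the end point of the curve is the overall result of the strategy.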

Statistics

I was curious to see how the strategy performs if we try it multiple times, each time on a different part of the data. To do so, I created a function that runs the selected strategy multiple times on randomly sampled games.


def run_strategy_n_times(
    strategy: type[BettingStrategy],
    df: pl.DataFrame,
    n_times: int,
    n_games: int,
    target_columns: list[str],
) -> list[float]:
    """
    Run a betting strategy multiple times on randomly sampled subsets of data.

    This function applies a given betting strategy to randomly sampled subsets of the input
    DataFrame multiple times and returns a list of the total returns for each run.

    Args:
        strategy (type[BettingStrategy]): The betting strategy class (not an instance) to be applied.
        df (pl.DataFrame): The input DataFrame containing the full dataset of games and their information.
        n_times (int): The number of times to run the strategy.
        n_games (int): The number of games to sample for each run of the strategy.
        target_columns (list[str]): The target columns to be used in the strategy.

    Returns:
        list[float]: A list containing the total return on bets for each run of the strategy.

    Notes:
        - df.sample draws without replacement by default, so a game appears at most
          once within a run but may appear in several different runs.
        - n_games must therefore not exceed the number of rows in df.
    """
    results = []
    for _ in range(n_times):
        result = strategy(
            df.sample(n_games, shuffle=True), target_columns
        ).apply_strategy()
        results.append(result["return_on_bet"].sum())
    return results

Let's run the strategy 100,000 times, each time on 100 randomly sampled games.
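
The call for this step is not shown; with the function above it would presumably look something like `run_strategy_n_times(BetAlwaysOnFavourite, df, 100_000, 100, target_columns)`. The resampling idea itself can be illustrated with a dependency-free sketch. `simulate_totals` is a hypothetical stand-in for the Polars version, and `toy_returns` is made-up data for illustration only, not the data from this post.

```python
import random
import statistics


def simulate_totals(
    returns: list[float], n_times: int, n_games: int, seed: int = 0
) -> list[float]:
    """Sum the returns of n_games randomly drawn bets, repeated n_times."""
    rng = random.Random(seed)
    # rng.sample draws without replacement, like df.sample does by default in Polars
    return [sum(rng.sample(returns, n_games)) for _ in range(n_times)]


# Made-up per-bet returns (wins pay odds - 1, losses cost the stake of 1)
toy_returns = [0.4, -1.0, 0.66, -1.0, 0.57, -1.0, 1.75, -1.0, 0.9, -1.0]

totals = simulate_totals(toy_returns, n_times=1_000, n_games=5)
mean_total = statistics.mean(totals)
print(f"mean total return over {len(totals)} runs: {mean_total:.2f}")
```

Fixing the seed makes the runs reproducible, which helps when comparing strategies on the same random subsets.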

If we plot the results we can see the distribution of the returns:

[Figure: distribution of the total return across the simulated runs]

The simulation yields a nearly normal distribution with a mean of -4.01. This is an estimate of the expected total return after 100 bets of 1€ on the favourite, i.e. a loss of about 0.04€ per 1€ staked.

Conclusion

As expected, always betting on the favourite is a losing strategy. However, I was surprised to see how much we lose on average, and if we keep betting, the losses keep accumulating. With that said, I would not recommend always betting on the favourite.

Winning by betting always on the favourite?
