Multi-Armed Bandit A/B Tests Explained For Dummies

Published On Thu 11 November, 2021

Most companies have heard about the benefits of traditional A/B testing. Make a hypothesis, create a variation to test your theory, start an experiment, then run it until you have enough data to prove one way or the other whether you are a genius or a massive numpty. A/B testing is great if you have time. If you can afford to wait three months, you will get lots of useful data and metrics that will hopefully improve your conversion rates and give you a long-term competitive edge over your sneaky rivals!

What happens, though, if it's Black Friday next week and you have just been sent some hot new products? You have no useful data to decide which one to feature on your homepage. You could guess which one will convert best over your sale period, but that is not being data-driven. An A/B test is not the ideal way to figure out which product will improve click rates here either. A/B testing gives you great long-term benefits, but while the test is running you have to sacrifice some sales: until the data tells you which variation converts better, you keep sending part of your traffic to the one that converts worse. During an extremely busy sale period, you probably don't want to lose sales just to prove which variation would convert better during a quieter period. So how can you be data-driven in these scenarios? Read on to find out 🔥🔥🔥
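To put a rough number on that sacrifice, here is a back-of-the-envelope Python sketch. The traffic and conversion rates are made up, and it assumes the test splits visitors 50/50 and compares against sending everyone to the better variation from the start:

```python
def conversions_lost_to_even_split(visitors, best_rate, worse_rate):
    """Expected conversions given up by a 50/50 A/B test, compared with
    sending every visitor to the better variation from the start."""
    visitors_on_worse = visitors / 2  # half the traffic sees the weaker variation
    return visitors_on_worse * (best_rate - worse_rate)


# Hypothetical sale: 20,000 visitors, variation A converts at 5%, B at 3%.
lost = conversions_lost_to_even_split(20_000, best_rate=0.05, worse_rate=0.03)
print(f"Expected conversions lost during the test: {lost:.0f}")  # → 200
```

Those 200 are conversions you could have had. Of course, you don't know up front which variation is better, which is the whole point of testing; a bandit tries to shrink that number by shifting traffic as soon as the data starts pointing one way.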

Instead of having to guess which product to show during the sale, wouldn't it be great if you could use an algorithm that drives traffic to the best-performing variation in real time, without you needing to do anything? This approach might not give you the best data for long-term decisions, but during a sale period it will improve your sales and make you more money. This is where a multi-armed bandit test can be useful.

A one-armed bandit is a slang term for a slot machine, or as we call them in the UK, a fruit machine. The multi-armed bandit (MAB) problem is a maths challenge about a gambler trying to make the most money from a row of slot machines, each paying out at a different, unknown rate. The challenge is to create an algorithm that finds the best strategy for the gambler to win the most money over the long term. Which machines should the gambler play? How many times should the gambler play each machine? In which order should the gambler play them? Should the gambler stick with the current machine or try a different one?

There is a whole bandit-solving vernacular you need to master to understand how the various algorithms solve this problem, including complex-sounding phrases like ε-Greedy, Markov decision processes, Hoeffding’s inequality, UCB1, Bayesian methods, and Thompson sampling. As the maths and terminology behind this challenge would hurt most of our heads (and, let's be honest, many of us don't care), let's skip it and talk about how we can use multi-armed bandit algorithms in web experimentation.

A good multi-armed bandit algorithm balances two techniques, known as exploration and exploitation, to make quicker use of its data. When the test starts, the algorithm has no data, so during this initial phase it uses exploration to collect some, randomly assigning customers in equal numbers to either variation A or variation B. Once enough data has been collected, the algorithm exploits it straight away by driving most of the traffic to the variation that is currently winning, while still exploring a little in case the picture changes.
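To make exploration and exploitation concrete, here is a minimal Python sketch of ε-Greedy, one of the algorithms named above, with an initial warm-up phase of equal random assignment. The conversion rates, the 1,000-visitor warm-up and the 10% exploration rate (`epsilon`) are all made-up assumptions, and conversions are simulated rather than read from real traffic:

```python
import random


def epsilon_greedy(true_rates, visitors, warmup=1_000, epsilon=0.1, seed=42):
    """Simulate an epsilon-greedy bandit. true_rates are hypothetical and hidden
    from the algorithm; it only sees which visitors converted."""
    rng = random.Random(seed)
    arms = range(len(true_rates))
    shown = [0 for _ in arms]      # visitors sent to each variation
    converted = [0 for _ in arms]  # conversions observed per variation

    for visitor in range(visitors):
        if visitor < warmup or 0 in shown or rng.random() < epsilon:
            # Explore: pick a variation uniformly at random.
            arm = rng.randrange(len(true_rates))
        else:
            # Exploit: pick the variation with the best conversion rate so far.
            arm = max(arms, key=lambda i: converted[i] / shown[i])
        shown[arm] += 1
        if rng.random() < true_rates[arm]:  # simulate whether this visitor converts
            converted[arm] += 1
    return shown, converted


# Hypothetical test: variation A converts at 5%, variation B at 3%.
shown, converted = epsilon_greedy([0.05, 0.03], visitors=20_000)
print("visitors per variation:", shown)
print("conversions per variation:", converted)
```

After the warm-up, roughly 90% of visitors go to whichever variation currently looks best, and the remaining 10% keep exploring so a slow starter can still catch up.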

Like a Bayesian model, an optimal multi-armed bandit algorithm incorporates the data collected so far to decide how best to win money in the future. As a multi-armed bandit test runs, the algorithm uses this past data to determine which strategy (variation) will give the best return. This makes it ideal for short sale periods. Instead of running an A/B test and sending customers to an inferior variation until statistical significance is reached, customers are automatically directed to the currently winning variation in real time. Using real-time data to send each customer to the best variation can give your business a very real competitive edge.
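Thompson sampling, another name from the vernacular above, is one common way of doing this Bayesian-style bookkeeping. The minimal sketch below assumes each visitor either converts or doesn't, keeps a Beta distribution of plausible conversion rates per variation, and shows each visitor the variation whose random draw is highest. The rates are hypothetical, and this is an illustration of the technique, not how any particular tool implements it:

```python
import random


def thompson_sampling(true_rates, visitors, seed=7):
    """Beta-Bernoulli Thompson sampling over hypothetical variations."""
    rng = random.Random(seed)
    n = len(true_rates)
    wins = [0] * n    # conversions seen so far (the "past data")
    losses = [0] * n  # visitors who did not convert
    shown = [0] * n
    for _ in range(visitors):
        # Draw a plausible conversion rate for each variation from its
        # Beta(1 + wins, 1 + losses) posterior, then show the highest draw.
        samples = [rng.betavariate(1 + wins[i], 1 + losses[i]) for i in range(n)]
        arm = max(range(n), key=lambda i: samples[i])
        shown[arm] += 1
        if rng.random() < true_rates[arm]:  # simulate the visitor's response
            wins[arm] += 1
        else:
            losses[arm] += 1
    return shown, wins


shown, conversions = thompson_sampling([0.05, 0.03], visitors=20_000)
print("visitors per variation:", shown)
print("conversions per variation:", conversions)
```

Early on the distributions are wide, so both variations get shown; as evidence piles up, the weaker variation's draws rarely win and its share of traffic shrinks on its own, with no separate exploration setting to tune.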

Multi-armed bandit tests pay off faster, because traffic starts shifting to the better variation while the test is still running rather than only after a single winner is declared. You might be asking yourself, why don't we just use multi-armed bandits for everything? Surely we want all our tests to convert quickly and optimally!?!?! The catch, as mentioned earlier, is that the weaker variations receive less traffic, so you collect less data for long-term decisions. Multi-armed bandit tests tend to work best for short-period tests (sale periods) and never-ending tests. I have already used Black Friday as a short-period example, so let's look at a longer-term one. Imagine you are an online newspaper and want to decide which article should be your headline. New articles are constantly being published. Instead of guessing, a multi-armed bandit test lets your site always show the most interesting article as the main headline. For a newspaper, this is a feature you need to run forever, so a multi-armed bandit test makes much more sense than a classic A/B test. A final example is deciding which ad to display to a customer: if you want the ad to update in real time, an A/B test would not be ideal.
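For a never-ending test like the newspaper headline, the bandit has to cope with new variations arriving mid-test. Here is a minimal sketch, again using Thompson sampling. `HeadlinePicker`, the article names and the click rates are all hypothetical. A new article starts with no data, which gives it a wide spread of plausible click rates, so it gets tried straight away and takes over the headline slot if it performs well:

```python
import random


class HeadlinePicker:
    """Hypothetical Thompson-sampling picker; new articles can join at any time."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.stats = {}  # article -> [clicks, non_clicks]

    def add_article(self, article):
        # A new article starts with a flat Beta(1, 1) prior, so it gets
        # explored naturally until the data says otherwise.
        self.stats.setdefault(article, [0, 0])

    def choose(self):
        # Sample a plausible click rate per article and headline the best draw.
        return max(
            self.stats,
            key=lambda a: self.rng.betavariate(1 + self.stats[a][0], 1 + self.stats[a][1]),
        )

    def record(self, article, clicked):
        self.stats[article][0 if clicked else 1] += 1


# Made-up click rates, used only to simulate readers.
true_click_rates = {"budget-story": 0.04, "weather-story": 0.02, "breaking-story": 0.09}
picker = HeadlinePicker(seed=1)
picker.add_article("budget-story")
picker.add_article("weather-story")
reader_rng = random.Random(2)
headlines = []
for visitor in range(20_000):
    if visitor == 10_000:
        picker.add_article("breaking-story")  # a new article is published mid-test
    article = picker.choose()
    picker.record(article, reader_rng.random() < true_click_rates[article])
    headlines.append(article)

late = headlines[10_000:]
print({a: late.count(a) for a in picker.stats})
```

A real news site would probably also want older clicks to count for less as interest in a story fades; that is left out here to keep the sketch short.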

One of the downsides of multi-armed bandit tests is that they are hard to create and implement. At a minimum, you need a developer and a data scientist to hang out and spend a lot of time figuring out how to implement one within your digital estate. This is where an off-the-shelf tool like Optimizely Web can help. Instead of trying to figure this complex stuff out yourself, you can install a JavaScript snippet on your website and start running these types of tests within 10 minutes, using clicks, not code. Happy Coding 🤘