Pages

Sunday, January 29, 2023

Banditopod

Recently as part of my research, I've been trying to measure a probability distribution – specifically, the chances that we've seen a certain signal in LISA. The trouble is, there are many random noise factors that go into the calculation of whether we see the signal or not, so it's not a straight equation I can plug things into. Instead we need to sample it many times to estimate the distribution, and this can be expensive. My colleague Henri suggested I could use a technique called Markov Chain Monte Carlo (MCMC). I thought to get a better feel for the method, I'd try out a simple example here.

There's a traditional problem in probability theory called the "two-armed bandit." Imagine a slot machine with two levers – You insert a coin and choose which lever to pull. Each arm has a certain probability of paying out, but the only way to find out is by playing, and looking at how often you win or lose. What then is the best strategy for choosing a lever? You may have gotten lucky your first few pulls of one lever and overestimated its chances of winning.

We can make this more like my research by extending to a multi-armed bandit – Each arm represents a set of parameters we're searching for, and we want to pick the arm with the biggest payout/best fit to the data. Still to be answered though is how we pick which arm to play: Imagine a set of players, who can choose an arm at each step based on the wins/loses they've seen. Each one is more likely to pick an arm with lots of wins, but might try another arm just in case. Now, if we look at the estimated probabilities for each arm as time goes on, we might think we'd get a good idea of the true values:

The blue line is the true probability for each arm, and the orange dots are the estimates based on the average number of wins. The dots are jumping around so much though that it's hard to see how well we're doing. Instead of animating in time, we can try looking at how frequently we play each arm:

Pretty quickly, each arm gets a consistent rate of pulls, but it looks like we're undersampling the highest-probability arms. I think this may be due to the top-probability arms having fairly similar values – As I pointed out above, we can't tell whether we have the best lever, or just a streak of luck, so we hedge our bets. A common technique with MCMC is run a "burn in" for a while to let the players move around the parameter space, then reset the probability estimates and continue running.

As a final view of the data, we look at how the players distribute themselves among the arms through time [NB: The x-values are off by 1 compared to the earlier plot due to the way I gave the distribution to the MCMC tool I used]:

It starts off fairly flat – the parameter exploration I was talking about – but after a certain point, the distribution establishes itself, and from there the shape simply scales upward. However, even if you could afford to play tens of thousands of times, I think you'll be hard-pressed to find a slot machine paying out as frequently as these!