Acquisition functions

Each acquisition function balances the exploration-exploitation trade-off in a different way. There is no universally best method.

Below we give a few examples of commonly used acquisition functions. These are based on the surrogate models constructed via Gaussian processes described in the previous notebook.

Expected improvement

\[ \alpha_{n}(x) = \mathbb{E}\left[\max(0,f(x)-f_{n}^{\star})\mid x_{1:n},f(x_{1:n})\right] = \max(0,\mu_{n}(x)-f_{n}^{\star}) + \sigma_{n}(x)\,\phi\left(\frac{\mu_{n}(x)-f_{n}^{\star}}{\sigma_{n}(x)}\right)-\left|\mu_{n}(x)-f_{n}^{\star}\right|\,\Phi\left(-\frac{\left|\mu_{n}(x)-f_{n}^{\star}\right|}{\sigma_{n}(x)}\right), \]

where \(f_{n}^{\star} = \max_{i=1,\ldots,n}f(x_{i})\), and \(\phi\) and \(\Phi\) denote the probability density function (PDF) and the cumulative distribution function (CDF) of the standard normal distribution, respectively.
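As a sketch of how the closed form above can be evaluated numerically (the function name and argument layout here are illustrative, assuming the posterior mean \(\mu_n\), posterior standard deviation \(\sigma_n\), and incumbent \(f_n^{\star}\) are already available as arrays/scalars):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_star):
    """Closed-form EI for maximization.

    mu, sigma : posterior mean and standard deviation at the query points
    f_star    : best observed function value so far
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    delta = mu - f_star
    # Guard against sigma == 0 (e.g. exactly at an observed point).
    z = delta / np.maximum(sigma, 1e-12)
    return (np.maximum(delta, 0.0)
            + sigma * norm.pdf(z)
            - np.abs(delta) * norm.cdf(-np.abs(z)))
```

This expression is algebraically equal to the perhaps more familiar form \((\mu_n(x)-f_n^{\star})\Phi(z)+\sigma_n(x)\phi(z)\) with \(z=(\mu_n(x)-f_n^{\star})/\sigma_n(x)\), and is always non-negative.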

Upper confidence bounds - UCB

\[ \alpha_{n}(x) = \mu_{n}(x) + \beta_{n} \sigma_{n}(x), \] where \(\beta_n\geq0\) is a scalar that explicitly lets us balance exploration and exploitation.

Thompson sampling - TS

This is the acquisition function we use in our experiments.

\[ \alpha_{n}(x) = g(x), \]

where \(g\sim\mathcal{GP}(\mu_n,c_n)\) and

\[ c_n(x,y) = k(x,y) - \Sigma_{0}(x,x_{1:n})\Sigma_{0}(x_{1:n},x_{1:n})^{-1}\Sigma_{0}(x_{1:n},y). \]

Later, when optimizing this acquisition function, two alternatives are:

- sample \(g\) on a finite set of points and find the \(\arg\max\), or
- use Fourier features to get a continuous, differentiable sample \(g\) and then optimize it; see [Rahimi and Recht, 2007].

TS naturally allows for parallel function evaluations by simply drawing multiple posterior samples.
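The finite-grid variant of TS, including the parallel case, can be sketched as follows. This is an illustrative implementation, not the one used in the experiments: it assumes a squared-exponential kernel with a hypothetical length scale, 1-D inputs, and draws `q` posterior samples on a grid, proposing the argmax of each sample as a batch of candidates:

```python
import numpy as np

def rbf(a, b, length_scale=0.2):
    # Squared-exponential kernel matrix between 1-D point sets a and b.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def thompson_batch(x_obs, y_obs, x_grid, q=4, noise=1e-6, seed=0):
    """Draw q samples from the GP posterior on x_grid and return the
    argmax location of each sample (a batch of q proposals)."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_star = rbf(x_grid, x_obs)
    K_inv = np.linalg.inv(K)
    # Posterior mean mu_n and covariance c_n restricted to the grid.
    mu = K_star @ K_inv @ y_obs
    cov = rbf(x_grid, x_grid) - K_star @ K_inv @ K_star.T
    cov += 1e-8 * np.eye(len(x_grid))  # jitter for numerical stability
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mu, cov, size=q)  # (q, n_grid)
    return x_grid[np.argmax(samples, axis=1)]
```

Each row of `samples` is one posterior draw \(g\); taking `q = 1` recovers sequential TS, while `q > 1` yields a batch that can be evaluated in parallel.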