Normal Bayesian two-armed bandits
The undiscounted normal two-armed bandit is examined from a Bayesian point of view for independent and singular priors on the mean vector $(\theta_1, \theta_2)$. The well-accepted notion that an apparently inferior source needs to be sampled now and then is quantified. The optimal strategy is defined in terms of the source differential function, $\Delta^n = V_y^n - V_x^n$, where $V_x^n$ and $V_y^n$ are the valuations of sampling the two respective sources. For the independent prior case, bounds and linear approximations for $\Delta^n$ are obtained by recursion. The limiting behavior of $\Delta^n$ is discussed in terms of certain summary parameters of location and information. In the more tractable singular case, the optimal strategy is myopic when the prior information on the two sources is equal.
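To make the term "myopic" concrete, the following is a minimal sketch of a myopic rule for a normal two-armed bandit. It is not the procedure of this work. It assumes unit sampling variance, independent normal priors on the two means, and that "information" means prior precision. The names `posterior_update`, `myopic_choice`, and `run_myopic` are hypothetical. The sketch always samples the source with the larger current posterior mean. The abstract asserts that such a rule is optimal only in the singular case with equal prior information, so the sketch illustrates the rule itself rather than the optimal strategy for independent priors.

```python
import random


def posterior_update(mean, precision, obs, obs_precision=1.0):
    """Conjugate normal update for one observation with known sampling precision."""
    new_precision = precision + obs_precision
    new_mean = (precision * mean + obs_precision * obs) / new_precision
    return new_mean, new_precision


def myopic_choice(mean_x, mean_y):
    """Pick the source with the larger current posterior mean (ties go to x)."""
    return "y" if mean_y > mean_x else "x"


def run_myopic(theta, prior, horizon, seed=0):
    """Simulate `horizon` undiscounted draws under the myopic rule.

    theta: true means, e.g. {"x": 0.0, "y": 1.0} (unknown to the rule).
    prior: (mean, precision) per source, e.g. {"x": (0.0, 1.0), "y": (0.0, 1.0)}.
    Returns the total reward and the final posterior parameters.
    """
    rng = random.Random(seed)
    post = dict(prior)
    total = 0.0
    for _ in range(horizon):
        arm = myopic_choice(post["x"][0], post["y"][0])
        obs = rng.gauss(theta[arm], 1.0)  # unit sampling variance (assumption)
        post[arm] = posterior_update(*post[arm], obs)
        total += obs
    return total, post
```

A rule of this kind never revisits a source merely because little is known about it, which is the behavior the source differential function $\Delta^n$ is meant to correct in the independent prior case.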