In probability theory, the multi-armed bandit problem (sometimes called the K- or N-armed bandit problem) is a problem in which a fixed, limited set of resources must be allocated among competing (alternative) choices so as to maximize the expected gain, when each choice's properties are only partially known at the time of allocation and may become better understood as time passes or as resources are allocated to it.
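To make the allocation trade-off concrete, here is a minimal Python sketch. It assumes Bernoulli arms (each play pays 1 with a fixed probability that the player does not know) and uses an epsilon-greedy policy, a simple heuristic chosen only for illustration and not an optimal solution. The function name `epsilon_greedy`, the arm probabilities and the value of `epsilon` are hypothetical.

```python
import random

def epsilon_greedy(true_means, horizon, epsilon=0.1, seed=0):
    """Play a Bernoulli K-armed bandit for `horizon` rounds with epsilon-greedy.

    Assumption for illustration: arm a pays 1 with probability true_means[a],
    which the policy never sees directly. Returns (total_reward, pull_counts).
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # how many times each arm has been played
    estimates = [0.0] * k   # running mean reward observed for each arm
    total = 0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick an arm uniformly at random
        else:
            # exploit: pick the arm with the best estimate so far
            arm = max(range(k), key=lambda a: estimates[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        # incremental update of the arm's empirical mean
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total, counts

# Hypothetical three-armed example.
total, counts = epsilon_greedy([0.2, 0.5, 0.7], horizon=5000)
print(total, counts)
```

Each play both earns a reward and improves the estimate for the arm played, which is exactly the tension described above: exploring gathers information, exploiting uses it.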

Bandit Processes and Dynamic Allocation Indices, by J. C. Gittins (Keble College, Oxford). Read before the Royal Statistical Society at a meeting organized by the Research Section on Wednesday, February 14th, with Professor J. F. C. Kingman in the Chair. Summary: the paper aims to give a unified account of the central concepts in recent work on bandit processes and dynamic allocation.

Multi-armed Bandits and the Gittins Index Theorem, by Richard Weber (Statistical Laboratory, University of Cambridge), a talk to accompany Lecture 7.

[Figure: Two-armed bandit example with sequences of numeric rewards.]

…the dynamic programming approach to multi-armed bandits initiated by Bellman, culminating in the index policy based on the dynamic allocation index of an arm (Gittins; Whittle), which was shown by Chang and Lai and by Lai to be asymptotically equivalent to the upper confidence bound (UCB) rule. This unified theory is reviewed in Section …

The first edition of this book set out Gittins' pioneering index solution to the multi-armed bandit problem and his subsequent investigation of a wide range of sequential resource allocation and stochastic scheduling problems. Since then there has been a remarkable flowering of new insights, generalizations and applications, to which Glazebrook and Weber have made major contributions.
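An index policy assigns each arm a number (its index) and plays the arm whose index is largest; for the Gittins dynamic allocation index that number depends only on the arm's own state. The sketch below shows this structure using UCB1 (Auer, Cesa-Bianchi and Fischer, 2002), a swapped-in upper-confidence-bound index. It is not the Gittins index and not necessarily the exact UCB rule analysed by Chang and Lai. Bernoulli arms, the function name `ucb1` and the arm probabilities are assumptions made for illustration.

```python
import math
import random

def ucb1(true_means, horizon, seed=0):
    """UCB1 index policy on Bernoulli arms (hypothetical success probabilities).

    Index of arm i = empirical mean + sqrt(2 * ln(n) / n_i), where n is the
    total number of plays so far and n_i the plays of arm i.
    Returns (total_reward, pull_counts).
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k    # n_i: plays of arm i so far
    means = [0.0] * k   # empirical mean reward of arm i
    total = 0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play every arm once so each index is defined
        else:
            n = t - 1    # total plays made before this round
            # play the arm with the largest index (mean plus exploration bonus)
            arm = max(range(k),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(n) / counts[a]))
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        total += reward
    return total, counts

# Hypothetical three-armed example.
total, counts = ucb1([0.2, 0.5, 0.7], horizon=5000)
print(total, counts)
```

The exploration bonus shrinks for arms that have been played often, so suboptimal arms are played a number of times that grows only logarithmically with the horizon, which is the guarantee proved for UCB1.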

