Implementation of several algorithms for combinatorial bandits, a kind of reinforcement learning.
Author dourouc05
7 Stars
Updated Last
8 Months Ago
Started In
January 2020


Project Status: Active – The project has reached a stable, usable state and is being actively developed. The MIT License example workflow Coverage Status

This package implements several algorithms to deal with combinatorial multi-armed bandit (CMAB), including the first polynomial-time optimum-regret algorithms: AESCB (described in our paper) and AOSSB (article in press).

See also Bandits.jl, focusing on multi-armed bandits (i.e. not combinatorial).

To install:

]add CombinatorialBandits

Example usage:

using CombinatorialBandits, Distributions

n = 20
m = 8
ε = 0.1
distr = Distribution[Bernoulli(.5 + ((i % 3 == 0) ? ε : -ε)) for i in 1:n]

i = MSet(distr, 8, MSetAlgosSolver())
@time simulate(i, ThompsonSampling(), 200)
@time simulate(i, LLR(), 200)
@time simulate(i, CUCB(), 200)
@time simulate(i, ESCB2(ESCB2Budgeted(.1, true)), 200)


If you use this package in your research, please cite either article:

    author = {Cuvelier, Thibaut and Combes, Richard and Gourdin, Eric},
    title = {Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits},
    year = {2021},
    issue_date = {March 2021},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    volume = {5},
    number = {1},
    url = {},
    doi = {10.1145/3447387},
    journal = {Proc. ACM Meas. Anal. Comput. Syst.},
    month = feb,
    articleno = {09},
    numpages = {31},
    keywords = {combinatorial bandits, combinatorial optimization, bandits}

  title={Asymptotically Optimal Strategies For Combinatorial Semi-Bandits in Polynomial Time},
  author={Cuvelier, Thibaut and Combes, Richard and Gourdin, Eric},
  journal={arXiv preprint arXiv:2102.07254},