POMDPs.jl

MDPs and POMDPs in Julia - An interface for defining, solving, and simulating fully and partially observable Markov decision processes on discrete and continuous spaces.
Started in June 2015.

This package provides a core interface for working with Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). For examples, please see POMDPExamples, QuickPOMDPs, and the Gallery.

Our goal is to provide a common programming vocabulary for:

  1. Expressing problems as MDPs and POMDPs.
  2. Writing solver software.
  3. Running simulations efficiently.

There are nested interfaces for expressing and interacting with (PO)MDPs. When the explicit interface is used, the transition and observation probabilities are defined explicitly through API functions; when the generative interface is used, only a single-step simulator (e.g. (s', o, r) = G(s, a)) needs to be defined. Problems may also be defined with probability tables or with the simplified QuickPOMDPs interfaces.
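To make the idea of a single-step simulator concrete, here is a minimal sketch of what a function G(s, a) could look like for the Tiger problem used in the Quick Start. It is plain Julia and deliberately does not use the POMDPs.jl API; the name tiger_step and the constant DOORS are hypothetical, and the probabilities and rewards are copied from the Quick Start model.

```julia
using Random

const DOORS = [:left, :right]

# Hypothetical stand-in for a generative model (s', o, r) = G(s, a).
# Plain Julia only; this is NOT the POMDPs.jl interface.
function tiger_step(rng::AbstractRNG, s::Symbol, a::Symbol)
    if a == :listen
        sp = s                                    # tiger stays behind the same door
        other = sp == :left ? :right : :left
        o = rand(rng) < 0.85 ? sp : other         # hear the correct door 85% of the time
        r = -1.0                                  # cost of listening
    else
        sp = rand(rng, DOORS)                     # a door was opened: problem resets
        o = rand(rng, DOORS)                      # observation carries no information
        r = s == a ? -100.0 : 10.0                # opened the tiger's door, or escaped
    end
    return (sp, o, r)
end

rng = MersenneTwister(1)
sp, o, r = tiger_step(rng, :left, :listen)
println("sp: $sp, o: $o, r: $r")
```

A solver that only needs samples can work with a function of this shape, whereas a solver that reasons over full distributions needs the explicit transition and observation probabilities.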

Python can be used to define and solve MDPs and POMDPs via the QuickPOMDPs or tabular interfaces and pyjulia (Example: tiger.py).

For help, please post to the Google group or on Gitter. Check releases for information on changes. POMDPs.jl and all packages in the JuliaPOMDP project are fully supported on Linux and OS X. Windows is supported for all native solvers*; most non-native solvers should also work, but may require additional configuration.

Installation

To install POMDPs.jl, run the following in the Julia REPL:

using Pkg; pkg"add POMDPs"

To install supported JuliaPOMDP packages including various solvers, first add the JuliaPOMDP registry:

using Pkg; pkg"registry add https://github.com/JuliaPOMDP/Registry"

Note: to use this registry, JuliaPro users must also run edit(normpath(Sys.BINDIR,"..","etc","julia","startup.jl")), comment out the line ENV["DISABLE_FALLBACK"] = "true", save the file, and restart JuliaPro as described in this issue.

You can then list the available packages with POMDPs.available() and install a solver (say SARSOP.jl) with:

using Pkg; pkg"add SARSOP"

Quick Start

To run a simple simulation of the classic Tiger POMDP using a policy created by the QMDP solver, you can use the following code (note that POMDPs.jl is not limited to discrete problems with explicitly-defined distributions like this):

using POMDPs, QuickPOMDPs, POMDPModelTools, POMDPSimulators, QMDP

m = QuickPOMDP(
    states = [:left, :right],
    actions = [:left, :right, :listen],
    observations = [:left, :right],
    initialstate_distribution = Uniform([:left, :right]),
    discount = 0.95,

    transition = function (s, a)
        if a == :listen
            return Deterministic(s) # tiger stays behind the same door
        else # a door is opened
            return Uniform([:left, :right]) # reset
        end
    end,

    observation = function (s, a, sp)
        if a == :listen
            if sp == :left
                return SparseCat([:left, :right], [0.85, 0.15]) # sparse categorical distribution
            else
                return SparseCat([:right, :left], [0.85, 0.15])
            end
        else
            return Uniform([:left, :right])
        end
    end,

    reward = function (s, a, sp, o...) # QMDP needs R(s,a,sp), but simulations use R(s,a,sp,o)
        if a == :listen  
            return -1.0
        elseif s == a # the tiger was found
            return -100.0
        else # the tiger was escaped
            return 10.0
        end
    end
)

solver = QMDPSolver()
policy = solve(solver, m)

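# Step through one simulation of at most 10 steps and accumulate the reward.
rsum = 0.0
for (s,b,a,o,r) in stepthrough(m, policy, "s,b,a,o,r", max_steps=10)
    # x (not s) is the loop variable here, so the true state s is not shadowed
    println("s: $s, b: $([pdf(b,x) for x in states(m)]), a: $a, o: $o")
    global rsum += r
end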
println("Undiscounted reward was $rsum.")
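The belief b printed in the loop above is a probability distribution over the two states. As a rough illustration of how such a belief changes after a :listen action, the following standalone sketch applies one step of a discrete Bayes filter using the 0.85/0.15 observation probabilities from the model. It is plain Julia with hypothetical names (obs_prob, update_listen) and is not the belief updater the library uses internally.

```julia
# Observation model from the quick start: listening reports the tiger's
# true door with probability 0.85 and the other door with probability 0.15.
obs_prob(o::Symbol, sp::Symbol) = o == sp ? 0.85 : 0.15

# One Bayes-filter step for the :listen action. The tiger does not move
# when listening, so only the observation likelihood reweights the belief.
# `b` maps each state to its probability.
function update_listen(b::Dict{Symbol,Float64}, o::Symbol)
    unnormalized = Dict(s => obs_prob(o, s) * p for (s, p) in b)
    total = sum(values(unnormalized))
    return Dict(s => p / total for (s, p) in unnormalized)
end

b0 = Dict(:left => 0.5, :right => 0.5)   # uniform initial belief
b1 = update_listen(b0, :left)            # hear the tiger on the left once
b2 = update_listen(b1, :left)            # and a second time
println("after one observation:  ", b1[:left])
println("after two observations: ", b2[:left])
```

Each consistent observation moves more probability mass toward the door the tiger was heard behind, which is why a policy typically listens a few times before opening a door.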

For more examples with visualization see POMDPGallery.jl.

Tutorials

Several tutorials are hosted in the POMDPExamples repository.

Documentation

Detailed documentation can be found here.

Supported Packages

Many packages use the POMDPs.jl interface, including MDP and POMDP solvers, support tools, and extensions to the POMDPs.jl interface.

Tools:

POMDPs.jl itself contains only the interface for communicating about problem definitions. Most of the functionality for interacting with problems is actually contained in several support tools packages:

Package
POMDPModelTools
BeliefUpdaters
POMDPPolicies
POMDPSimulators
POMDPModels
POMDPTesting
ParticleFilters
RLInterface

MDP solvers:

| Package | Online/Offline | Continuous States | Continuous Actions |
|---|---|---|---|
| Value Iteration | Offline | N | N |
| Local Approximation Value Iteration | Offline | Y | N |
| Global Approximation Value Iteration | Offline | Y | N |
| Monte Carlo Tree Search | Online | Y (DPW) | Y (DPW) |

POMDP solvers:

| Package | Online/Offline | Continuous States | Continuous Actions | Continuous Observations |
|---|---|---|---|---|
| QMDP | Offline | N | N | N |
| FIB | Offline | N | N | N |
| BeliefGridValueIteration | Offline | N | N | N |
| SARSOP* | Offline | N | N | N |
| BasicPOMCP | Online | Y | N | N¹ |
| ARDESPOT | Online | Y | N | N¹ |
| MCVI | Offline | Y | N | Y |
| POMDPSolve* | Offline | N | N | N |
| IncrementalPruning | Offline | N | N | N |
| POMCPOW | Online | Y | Y² | Y |
| AEMS | Online | N | N | N |

1: Will run, but will not converge to the optimal solution.

2: Will run, but convergence to the optimal solution is not proven, and it will likely not work well on multidimensional action spaces.

Reinforcement Learning:

| Package | Continuous States | Continuous Actions |
|---|---|---|
| TabularTDLearning | N | N |
| DeepQLearning | Y¹ | N |

1: For POMDPs, it will use the observation instead of the state as input to the policy. See RLInterface.jl for more details.

Packages Awaiting Update

These packages were written for POMDPs.jl in Julia 0.6 and have not yet been updated to Julia 1.0.

Package
DESPOT

Performance Benchmarks:

Package
DESPOT

*These packages require non-Julia dependencies

Citing POMDPs

If POMDPs is useful in your research and you would like to acknowledge it, please cite this paper:

@article{egorov2017pomdps,
  author  = {Maxim Egorov and Zachary N. Sunberg and Edward Balaban and Tim A. Wheeler and Jayesh K. Gupta and Mykel J. Kochenderfer},
  title   = {{POMDP}s.jl: A Framework for Sequential Decision Making under Uncertainty},
  journal = {Journal of Machine Learning Research},
  year    = {2017},
  volume  = {18},
  number  = {26},
  pages   = {1-5},
  url     = {http://jmlr.org/papers/v18/16-300.html}
}