This package implements the Local Approximation Value Iteration algorithm in Julia for solving
Markov Decision Processes (MDPs). Algorithmically it is very similar to the DiscreteValueIteration.jl
package, but it represents the state space in a fundamentally different manner, as explained below.
As with DiscreteValueIteration, the user should define the problem according to the API in POMDPs.jl. Examples of problem definitions can be found in POMDPModels.jl.
You need to have POMDPs.jl installed and its registry added (see the POMDPs.jl README). Thereafter, you can add LocalApproximationValueIteration from the Julia REPL:
using Pkg
Pkg.add("LocalApproximationValueIteration")
This solver is one example of Approximate Dynamic Programming, which seeks approximately optimal value functions and policies for large or continuous state spaces. Local Approximation Value Iteration assumes that states near each other (under an appropriate distance metric) have similar values. It therefore computes values at a finite, user-defined set of states and interpolates the value function over the entire state space using a local function approximation technique. Details of this approach are described in Section 4.5.1 of the book Decision Making Under Uncertainty: Theory and Application.
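As a rough illustration of this idea (not the package's actual implementation), a single value iteration sweep over the finite set of support states might look like the sketch below; interpolate_value is a hypothetical placeholder for whatever local approximation scheme supplies values at states outside the support set.
using POMDPs

# Rough sketch of one local-approximation value iteration sweep.
# support_states: the finite set of states at which values are stored.
# interpolate_value(support_states, values, sp): hypothetical helper that
# blends stored values of nearby support states to estimate the value at sp.
function approximate_backup(mdp::MDP, support_states, values)
    γ = discount(mdp)
    new_values = similar(values)
    for (i, s) in enumerate(support_states)
        best_q = -Inf
        for a in actions(mdp, s)
            td = transition(mdp, s, a)
            q = 0.0
            for sp in support(td)
                q += pdf(td, sp) * (reward(mdp, s, a, sp) +
                                    γ * interpolate_value(support_states, values, sp))
            end
            best_q = max(best_q, q)
        end
        new_values[i] = best_q
    end
    return new_values
end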
For value function approximation, the solver depends on the LocalFunctionApproximation.jl package. The LocalApproximationValueIteration solver must be initialized with an appropriate LocalFunctionApproximator object that approximates the computed value function over the entire state space, either by interpolation over a multi-dimensional grid discretization of the state space or by k-nearest-neighbor averaging over a randomly drawn set of state-space samples. The resulting policy uses this object to compute the action-value function or the best action for any arbitrary state query.
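For example, a k-nearest-neighbor approximator over random state-space samples could be set up roughly as follows. This is a sketch modeled on the LocalFunctionApproximation.jl documentation; the sample count, the neighborhood size, and the sampling range are placeholder choices, and the exact constructor signature should be checked against that package.
using StaticArrays
using NearestNeighbors
using LocalFunctionApproximation

# Sketch: 1000 random samples of a 2-D state space on [1, 100] x [1, 100],
# with values averaged over the k = 10 nearest neighbors of any query point
points = [SVector(1.0 + 99.0 * rand(), 1.0 + 99.0 * rand()) for _ in 1:1000]
nntree = KDTree(points)
knnfa = LocalNNFunctionApproximator(nntree, points, 10)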
A key operational requirement that the solver places on the MDP is that any state can be represented by an equivalent real-valued vector. This is enforced by two convert_s function requirements that convert an instance of the MDP's State type to a real-valued vector and vice versa. The user is required to implement these two functions for the State type of their MDP problem model.
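For the SimpleGridWorld model used later in this README, whose GWPos states are already static integer 2-vectors, the two methods might look like the following sketch:
using POMDPs, POMDPModels, StaticArrays

# Sketch: GWPos (the SimpleGridWorld state type) <-> real-valued vector.
# GWPos is a static 2-vector of Ints, so only the element type changes.
function POMDPs.convert_s(::Type{V}, s::GWPos, mdp::SimpleGridWorld) where {V<:AbstractVector{Float64}}
    return SVector{2,Float64}(s[1], s[2])
end

function POMDPs.convert_s(::Type{GWPos}, v::AbstractVector{Float64}, mdp::SimpleGridWorld)
    return GWPos(round(Int, v[1]), round(Int, v[2]))
end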
POMDPLinter.jl has a macro @show_requirements that determines the functions necessary to use a given solver on a specific MDP model. As mentioned above, the LocalApproximationValueIteration solver depends on a LocalFunctionApproximator object, so that object must first be created in order to invoke the solver's requirements. Following our running example in test/runtests_versus_discrete_vi.jl, a function approximation object that uses grid interpolation (LocalGIFunctionApproximator) is created after the appropriate RectangleGrid is constructed (see GridInterpolations.jl for more details):
using POMDPs, POMDPModels
using GridInterpolations
using LocalFunctionApproximation
using LocalApproximationValueIteration
VERTICES_PER_AXIS = 10 # Controls the resolution along each grid axis
grid = RectangleGrid(
range(1, 100, length=VERTICES_PER_AXIS), # x
range(1, 100, length=VERTICES_PER_AXIS) # y
)
interp = LocalGIFunctionApproximator(grid)
The user should modify the above steps depending on the kind of interpolation they want and its parameters. We have delegated this step to the user because it is extremely problem- and domain-specific. Note that the solver supports both explicit and generative transition models for the MDP (more on that here). The is_mdp_generative and n_generative_samples arguments of the LocalApproximationValueIteration solver should be set accordingly, and there are different @requirements depending on which kind of model the MDP has.
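For instance, if the MDP only exposes a generative model, the solver could be configured roughly as follows; this is a sketch, and the sample count is an arbitrary placeholder.
# Sketch: solver configuration for a generative (sampling-based) transition model,
# drawing 10 next-state samples per state-action backup
gen_solver = LocalApproximationValueIterationSolver(interp,
                                                    max_iterations=1000,
                                                    is_mdp_generative=true,
                                                    n_generative_samples=10)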
Once all the necessary functions have been defined, the solver can be created. A SimpleGridWorld MDP from POMDPModels is defined with a 100 x 100 grid and appropriate reward states:
mdp = SimpleGridWorld(
size = (100,100),
rewards = Dict(GWPos(x,y)=>10. for x ∈ 40:60, y ∈ 40:60)
)
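With both the function approximator and the MDP model in hand, the solver's requirements can be checked before solving; the following is a sketch assuming POMDPLinter is installed.
using POMDPLinter

# Sketch: list the functions this solver requires of this particular MDP
req_solver = LocalApproximationValueIterationSolver(interp, is_mdp_generative=false)
@show_requirements solve(req_solver, mdp)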
Finally, the solver can be created using the function approximation object and other necessary parameters (this model is explicit), and the MDP can be solved:
approx_solver = LocalApproximationValueIterationSolver(interp, verbose=true, max_iterations=1000, is_mdp_generative=false)
approx_policy = solve(approx_solver, mdp)
The API for querying the final policy object is identical to DiscreteValueIteration, i.e. the action and value functions can be called on the solved policy for any state s:
v = value(approx_policy, s) # returns the approximately optimal value for state s
a = action(approx_policy, s) # returns the approximately optimal action for state s
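To see the policy in action, a standard simulation loop can be used; the following sketch assumes the stepthrough utility from POMDPTools is available.
using POMDPTools

# Sketch: simulate the approximate policy for a few steps
for (s, a, r) in stepthrough(mdp, approx_policy, "s,a,r", max_steps=20)
    println("state: $s, action: $a, reward: $r")
end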