This package implements the simplest version of Differential Evolution, extended with the ability to specify custom 'predictors', which make it possible to incorporate knowledge about the problem into the optimization.
For example, a standard machine learning model can be trained on a set of examples of the optimization problem, and its predictions can then be used to guide the optimization.
The key idea of Differential Evolution, without the additions of mutations and crossovers, is to evolve a population of candidates towards the optimum by (sketched in code after this list):

- selecting 3 random candidates `a`, `b`, `c`
- comparing the costs of `a` and `a' = a + lambda * (c - b)`
- replacing `a` with `a'` if `a'` has lower cost
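To make the update rule concrete, here is a minimal sketch of one such step in plain Julia. `de_step!`, its arguments, and the in-place update are our own illustration, not the package's internal code:

```julia
# One DE update step for a population with one candidate per column.
# Illustrative sketch only; distinctness of the three picks is not enforced.
function de_step!(pop, costf, lambda)
    npop = size(pop, 2)
    i, j, k = rand(1:npop, 3)          # pick three random candidates a, b, c
    a, b, c = pop[:,i], pop[:,j], pop[:,k]
    a2 = a + lambda*(c - b)            # proposal a' = a + lambda*(c - b)
    if costf(a2) < costf(a)            # replace a with a' if a' has lower cost
        pop[:,i] = a2
    end
    pop
end
```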
Predictors allow comparing `a` not only to `a'`, but also to one or more hypotheses `a'' = predict(a)`. Predictions are computed for the entire population at once, so the signature of a predictor is actually `predictor(currentgeneration, currentcosts)`.
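For illustration, the plain DE step described above can itself be written in this population-level predictor form. The following is only a sketch assuming one candidate per column; it is not the package's built-in `:default` predictor:

```julia
# Proposes one hypothesis per candidate: a'' = a + lambda*(c - b), with
# b and c drawn at random per column. `costs` is unused in this simple case.
function de_predictor(pop, costs)
    lambda = 0.85
    npop = size(pop, 2)
    b = pop[:, rand(1:npop, npop)]     # one random candidate b per column
    c = pop[:, rand(1:npop, npop)]     # one random candidate c per column
    pop + lambda*(c - b)
end
```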
A basic usage example:

```julia
optimum = [-2 4]'
# cost: oscillation on top of the distance to `optimum`, many local minima
costf(x) = (r = sqrt(sum((x-optimum).^2)); sin(r)^2 + r/2)
best, info = de(costf, [-10,-10], [10,10])
@show optimum best optimum-best
```
`de` has the following signature:

```julia
de(costfunction, minima, maxima; kargs...)
```
The `costfunction` is expected to take a single parameter of type `Array{Number,2}` and return a scalar value. `minima` and `maxima` are arrays of type `Array{Number,2}` specifying the possible range of values passed to `costfunction`.
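To make that contract concrete, here is a second, hypothetical cost function; `rosenbrock` is our own toy example, not something shipped with the package, and the call mirrors the example above:

```julia
# each candidate arrives as a column of numbers; a single scalar is returned
rosenbrock(x) = (1 - x[1])^2 + 100*(x[2] - x[1]^2)^2
best, info = de(rosenbrock, [-5,-5], [5,5])
```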
The optimization process can be controlled through the following keyword arguments of `de`, listed with their default values (an example follows the list):

- `npop = 100`, number of candidates in the population
- `maxiter = 1e6`, maximum number of iterations
- `lambda = 0.85`, value of `lambda` used in the default DE predictor (see above)
- `initpop = mi .+ rand(length(mi), npop) .* (ma - mi)`, initial population (`mi` and `ma` being `minima` and `maxima`)
- `recordhistory = false`, record details about each candidate's evolution
- `continueabove = -Inf`, stop when the cost of the best candidate reaches this value
- `replaceworst = 0.0`, percentage specified in the range `0.0 ... 1.0`; in each iteration this percentage of the candidates is replaced with copies of the best candidate
- `roundto = 1e-6`, size of the grid in the parameter space where the candidates live. Set to `0` for full numerical precision. Can also be a vector of the same size as `minima`/`maxima`, in which case the grid resolution can differ per parameter, e.g. set it to `1` to enforce integers
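As a sketch of how these keywords combine (the values are arbitrary and chosen purely for illustration), reusing `costf` from above:

```julia
# smaller population, fewer iterations, and an integer grid per parameter
best, info = de(costf, [-10,-10], [10,10];
                npop = 50, maxiter = 1e4, roundto = 1)
```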
In this example we assume that a model was trained on a couple of examples of the problem in a prior training phase; the resulting model is then used in our predictor:

```julia
# example predictor which learned to predict offsets
predictor(pop, costs) = pop + predict(somemodel, pop)
best, info = de(costf, [-10,-10], [10,10], {predictor, :default})
```
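A predictor does not have to come from a trained model. As a further hypothetical sketch in the same style, here is one that simply nudges every candidate toward the current best (`towardbest` is our own name, not part of the package):

```julia
# move each candidate half-way toward the candidate with the lowest cost
function towardbest(pop, costs)
    best = pop[:, indmin(costs)]
    pop .+ 0.5 .* (best .- pop)
end
best, info = de(costf, [-10,-10], [10,10], {towardbest, :default})
```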
There is a helper function available to compare different strategies:

```julia
using InformedDEHelpers, PyPlot
analyze(f, mi, ma, [
    ("default",    {:default}, {}),
    ("pred first", {predictor, :default}, {}),
    ("pred last",  {:default, predictor}, {}),
    ("pred only",  {predictor}, {}),
    ("tryall",     {predictor, :default}, {(:tryallpredictors, true)})
])
```