Efficient optimal selection for tree breeding

Optimal selection problems are to find optimum of selection of genotypes that maximizes genetic gains under a constraint on genetic diversity which involves Wright's numerator relationship matrix.

Using a conic programming approach, this package provides efficient numerical methods for optimal selection problems arising from tree breeding. This package now implements two methods

- the compact SOCP formulation for unequally deployment problem
- the steepest-ascent method for equally deployment problem

`import Pkg; Pkg.add("OpSel")`

or in the package mode by `]`

,

`pkg> add https://github.com/makoto-yamashita/OpSel.jl`

The dataset of the package includes the sizes of Z = 200, 1050, 2045, 5050, 5255, 10100, 15100, and 15222; and N = 50 and 100.

```
Using OpSel
OpSel.testUnequal(2045);
```

Afther the execution of Ipopt, the summary will be reported as follows:

```
=== Result Summary ===
JuMP status = LOCALLY_SOLVED
Z = 2045, N_s = 14, N = 2800
gx = 439.116285, xAx = 0.071429, 2theta = 0.071429
time(s): build = 0.049, solver = 1.392, total = 1.441
```

The obtained objective value is =439.116825, and we can see that the quadratic constraint is satisfied. The computation time to build the SOCP is 0.049 seconds, the time by Ipopt is 1.392 seconds, and the entire time is 1.441 seconds.

```
Using OpSel
OpSel.testEqual(200,50);
```

Afther the execution, the summary will be reported as follows:

```
SOCP = 26.155540, STEEP = 25.090400, gap = 4.072330%
time(s): build = 0.016, solver = 0.088, steep = 0.820, total = 0.925
```

The objective value of the SOCP relaxation is 26.155540, and the objective value after the steepst-ascent method is 25.090440. The gap is computed from these two objective values. The exact optimal value must be between these two objective values, thus if the gap is 0, we can obtain the exact optimal value. In the computation time, steep corresponds to the steepest-ascent part.

- If an input CSV file is available, the methods can be executed as follows:

```
sp_csv = OpSel.loadFile(filename)
(x_result, info_result) = OpSel.compactSOCP(sp_csv)
```

or

```
sp_csv = OpSel.loadFile(filename)
(x_result, info_result) = OpSel.steepestAscent(sp_csv, N=50)
```

The columns of the input CSV file should be:
`i(id), p(parent1), p(parent2), g(EBV), u(upperbound), h(inbreeding)`

For example, the following line in a CSV
`1040, 782, 751, 3.1313800000000001, 50, 0.0410156250000000`

indicates that the parents of 1040 are 782 and 751. The EBV of 1040 is 3.13138, and the upperbound is 50 (this will be divided by 2800) and the inbreeding value is 0.041015625.

An unequally-type of optimization problem is of form:

Here, the decision variable is . The constant vector is estimated breeding values. The matrix is Wright's numerator relationship matrix, and the theta is a threashold. The vector is the vector of all ones. The vectors and are the lower and upper bounds of , respectively.

This optimization problem was defined in

- T.H.E. Meuwissen, "Maximizing the response of selection with a predefined rate of inbreeding", Journal of Animal Science, Vol. 75, pp. 934-940, 1997.

An equally-type of optimization problem is of form:

Here, is the given parameter, thus each genotype should contribute nothing or the same amount .

For optimal selection problems, GENCONT by Meuwissen (http://www.genebankdata.cgn.wur.nl/gencont/gencont.html) is often used. The main advantage of this package is its computation speed. The compact SOCP formulation is also available through OPSEL (https://www.skogforsk.se/opsel/). For more details, please refer to the two papers below at "Papers."

The two methods were proposed in the two papers below.

- the compact SOCP formulation for unequally deployment problem

- Makoto Yamashita, Tim J. Mullin, Sena Safarina, "An efficient second-order cone programming approach for optimal selection in tree breeding," Optimization Letters, Vol. 12 , No. 7, pp. 1683-1697, 2018. https://link.springer.com/article/10.1007/s11590-018-1229-y

```
@article{Yamashita2018,
author="Yamashita, Makoto and Mullin, Tim J. and Safarina, Sena",
title="An efficient second-order cone programming approach for optimal selection in tree breeding",
journal="Optimization Letters",
year="2018",
month="Oct",
day="01",
volume="12",
number="7",
pages="1683--1697"
}
```

- the steepest-ascent method for equally deployment problem

- Sena Safarina, Satoko Moriguchi, Tim J. Mullin, and Makoto Yamashita, "Conic relaxation approaches for equal deployment problems," Discrete Applied Mathematics, Vol. 275, pp. 111--125, 2020. http://www.sciencedirect.com/science/article/pii/S0166218X19304184

```
@article{Safarina2020,
title = "Conic relaxation approaches for equal deployment problems",
journal = "Discrete Applied Mathematics",
volume = "275",
pages = "111--125",
year = "2020",
issn = "0166-218X",
doi = "https://doi.org/10.1016/j.dam.2019.04.032",
url = "http://www.sciencedirect.com/science/article/pii/S0166218X19304184"
}
```

The datasets with sizes 2045 and 15222 were avaiable at the Dryad Digital Repository (http://dx.doi.org/10.5061/dryad.9pn5m). The other data were produced with POPSIM (https://www.skogforsk.se/popsim/). In each dataset, the candidates are sorted so that all the ids are sequential numbers.

- In the two papers above, ECOS (https://github.com/JuliaOpt/ECOS.jl) was used as the SOCP solver. However, recent versions of ECOS was not numerically stable for optimal selection problems. Instead, Ipopt (https://github.com/JuliaOpt/Ipopt.jl) is used in this package.
- The inbreeding value (the last column of the input CSV file) was computed by Quaas's algorithm

- R. L. Quaas, "Computing the diagonal elements and inverse of a large numerator relationship matrix," Biometrics, Vol. 32, pp. 949–953, 1976.