## SortingLab.jl

Faster sorting algorithms (sort and sortperm) for Julia
Popularity
23 Stars
Updated Last
1 Year Ago
Started In
January 2018
author title date
Dai ZJ
2019--09-28

# SortingLab

An alternative implementation of sorting algorithms and APIs. The ultimate aim is to contribute back to Julia base or SortingAlgorithms.jl. However, there is commitment to keep this package's API stable and supported, so other developers can rely on the implementation and API here.

# Faster Sort and Sortperm

The main function exported by SortingLab is `fsort` and `fsortperm` which generally implements faster algorithms than `sort` and `sortperm` for `CategoricalArrays.CategoricalVector`, `Vector{T}`, `Vector{Union{T, Missing}}` where `T` is

• Int*, UInt*, Float*, String

Note: The reason why we restrict the type to `Vector` is that SortingLab.jl assumes something about memory layout and hence `Vector` provides that guarantee in the types supported.

## Usage

```using SortingLab;
using Test
N = 1_000_000;
K = 100;

svec_sorted = fsort(svec);
@test issorted(svec_sorted)
@test issorted(svec) == false```
``````Test Passed
``````
```# faster string sortperm
sorted_idx = fsortperm(svec)
issorted(svec[sorted_idx]) #true

# in place string sort
fsort!(svec);
issorted(svec) # true```
``````true
``````
```# CategoricalArray sort
using CategoricalArrays
pools = "id".*string.(1:100,3);
byvec = CategoricalArray{String, 1}(rand(UInt32(1):UInt32(length(pools)), N), CategoricalPool(pools, false));
byvec = compress(byvec);

byvec_sorted = fsort(byvec);
@test issorted(byvec_sorted)```
``````Test Passed
``````

### Sorting `Vector{Union{T, Missing}}`

For vectors that contain `missing`, the `sort` and `sortperm` performance is often sub-optimal in `Base` and is not supported in `SortingAlgorithms.jl`'s radixsort implementation. This is solved by `SortingLab.jl` `fsort`, see Benchmarks Section

```using Test
using Missings: allowmissing
x = allowmissing(rand(1:10_000, 1_000_000))
x[rand(1:length(x), 100_000)] .= missing

using SortingLab
@test isequal(fsort(x), sort(x))```
``````Test Passed
``````

## Benchmarking code

```using SortingLab;
using BenchmarkTools;
import Random: randstring

N = 1_000_000;
K = 100;

sort_id_1m = @belapsed sort(\$svec);

sortperm_id_1m = @belapsed sortperm(\$svec);
fsortperm_id_1m = @belapsed fsortperm(\$svec);

rsvec = rand([randstring(rand(1:32)) for i = 1:N÷K], N);
sort_r_1m = @belapsed sort(\$rsvec);

sortperm_r_1m = @belapsed sortperm(\$rsvec);
fsortperm_r_1m = @belapsed fsortperm(\$rsvec);

using Plots
using StatsPlots
groupedbar(
repeat(["IDs", "Random len 32"], inner=2),
group = repeat(["Base.sort","SortingLab.radixsort"], outer = 2),
title = "Strings sort (1m rows): Base vs SortingLab")

groupedbar(
repeat(["IDs", "Random len 32"], inner=2),
[sortperm_id_1m, fsortperm_id_1m, sortperm_r_1m, fsortperm_r_1m],
group = repeat(["Base.sortperm","SortingLab.fsortperm"], outer = 2),
title = "Strings sortperm (1m rows): Base vs SortingLab")
savefig("benchmarks/sortperm_vs_fsortperm.png")```

# Similar package

https://github.com/JuliaCollections/SortingAlgorithms.jl