SortPerf.jl: Module to test the performance of sorting algorithms
The purpose of this module is to test the performance of the different sort (and related) algorithms in Julia. See https://github.com/kmsquire/SortPerf.jl/raw/master/sortperf.pdf for an example output from Version 0.3.0-prerelease+125.
std_sort_tests(;sort_algs=SortPerf.sort_algs, # [InsertionSort, HeapSort, MergeSort, # QuickSort, RadixSort, TimSort] types=SortPerf.std_types, # [Int32, Int64, Int128, Float32, Float64, String] range=6:20, # Array size 2^6 through 2^20, by powers of 2 replicates=3, # lt::Function=isless, # \ by::Function=identity, # | sort(...) options rev::Bool=false, # | order::Ordering=Forward, # / save::Bool=false, # create and save timing tsv and pdf plot prefix="sortperf") # prefix for saved files
You can also test individual algorithms with
sortperf(Algorithm(s), data, [size,] [replicates=xxx])
sortperf(QuickSort, Int, 10_000) # Test QuickSort on 10,000 random ints sortperf(MergeSort, [Float32, String], 6:2:10) # Test MergeSort on 2^6, 2^8, and 2^10 float 32s and strings sortperf([QuickSort, MergeSort, TimSort], # Test QuickSort, MergeSort, and TimSort on [Int, Float32, Float64, String], # Arrays of Int, Float32, Float64, and String 6:20; # ranging from 2^6 elements to 2^20 elements, by replicates=5) # powers of 2, and run each test 5 times
Ordering parameters accepted by sort!(...) will be passed through.
The actual tests run include sorting arrays with the following characteristics:
- sorted, but with 3 random exchanges
- sorted, with 10 random values appended
- 4 unique values
- all equal
- quicksort median killer: first half descending, second half ascending
The tests were inspired by similar tests used by sortperf in Python. See http://svn.python.org/projects/python/trunk/Objects/listsort.txt for more details.
Suggestions based on basic tests
Here is a table and some notes on the Julia implementations of the
various algorithms. The table indicates the recommended sort
algorithm for the given size (
small < ~2^12 (=8,192) items < large)
and type (string, floating point, or integer) of data.
- Random means that the data is permuted randomly.
- Structured here means that the data contains partially sorted runs (such as when adding random data to an already sorted array).
- Few unique indicates that the data only contains a few unique values.
|(Un)stable (small)||Stable (small)||(Un)stable (large)||Stable (large)||In-place (large)|
|- Few Unique||Q||M||Q||M||Q|
|- Few Unique||Q||M||Q||R||Q|
|- Few Unique||Q||M||R||R||Q|
Except for pathological cases, small arrays are sorted best with
QuickSort(unstable) or `MergeSort`` (stable)
When sorting large arrays with sections of already-sorted data, use
TimSort. The only structured case it does not handle well is reverse-sorted data with large numbers of repeat elements. An unstable version of
TimSort(to be contributed to Julia soon) will handle this case
For numerical data (Ints or Floats) without structure,
RadixSortis the best choice, except for 1) 128-bit values, or 2) 64-bit integers which span the full range of values.
When memory is tight,
QuickSortis the best in-place algorithm. If there is concern about pathological cases, use
HeapSort. All stable algorithms use additional memory, but
TimSortis (probably) the most frugal.
Composite types may behave differently. If sorting is important to your application, you should test the different algorithms on your own data. This package facilitates that.