A threaded data iterator for machine learning on datasets that don't fit into memory, inspired by PyTorch's `DataLoader` class.

It loads data in parallel on worker threads while keeping the primary thread free. It can also load data in place to avoid allocations.
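
To make the idea of in-place loading concrete, here is a minimal sketch in plain Julia. It does not use DataLoaders' internals; the helper `loadbatch!` and the variable names are hypothetical and exist only for illustration. The point is that one batch buffer is allocated up front and overwritten on every iteration.

```julia
# Plain-Julia sketch of in-place batch loading (not the DataLoaders.jl internals).
x = rand(Float32, 128, 1000)   # 1000 observations of size 128
batchsize = 16

# Hypothetical helper: copy the observations at `idxs` into a preallocated buffer.
function loadbatch!(buffer::AbstractMatrix, data::AbstractMatrix, idxs)
    for (j, i) in enumerate(idxs)
        copyto!(view(buffer, :, j), view(data, :, i))
    end
    return buffer
end

# Allocated once, then reused for every batch.
buffer = similar(x, size(x, 1), batchsize)

for start in 1:batchsize:size(x, 2) - batchsize + 1
    idxs = start:start + batchsize - 1
    batch = loadbatch!(buffer, x, idxs)
    @assert batch === buffer        # the same memory is returned each time
    @assert batch == x[:, idxs]     # and it holds the requested observations
end
```

Because the buffer is reused, a batch obtained this way is only valid until the next one is loaded. Copy it if you need to keep it around.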

Many data containers work out of the box and it is easy to extend with your own.

DataLoaders is built on top of, and fully compatible with, MLDataPattern.jl's Data Access Pattern, a functional interface for machine learning datasets.
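
As a rough sketch of what such a functional interface looks like, a data container needs to answer two questions: how many observations it holds, and what observation `i` is. The example below defines local stand-in functions `nobs` and `getobs` so that it runs without any packages. With the real packages you would instead extend the corresponding library functions (assumed here to be `LearnBase.nobs` and `LearnBase.getobs`; check the documentation for your version). `SquaresDataset` is a hypothetical container made up for illustration.

```julia
# Stand-ins for the two interface functions, defined locally so the sketch is
# self-contained. In real use, extend the library's functions instead.
function nobs end
function getobs end

# Hypothetical lazy container: observation `i` is computed on demand,
# so nothing has to be held in memory up front.
struct SquaresDataset
    n::Int
end

# Number of observations in the container.
nobs(ds::SquaresDataset) = ds.n

# Load a single observation by index.
getobs(ds::SquaresDataset, idx::Int) = Float32[idx, idx^2]

ds = SquaresDataset(1_000_000)
@assert nobs(ds) == 1_000_000
@assert getobs(ds, 3) == Float32[3, 9]
```

Any type that implements these two functions can be treated as a dataset, regardless of where its observations actually come from (files on disk, a database, or a computation as above).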


using DataLoaders

x = rand(128, 10000)  # 10000 observations of size 128
y = rand(1, 10000)

# Iterate over the data in batches of 16 observations
dataloader = DataLoader((x, y), 16)

for (xs, ys) in dataloader
    @assert size(xs) == (128, 16)
    @assert size(ys) == (1, 16)
end

Of course, the dataset in the example above fits in memory, so parallel workers are unnecessary. See Custom data containers for a more realistic example.

