A threaded data iterator for machine learning on out-of-memory datasets. Inspired by PyTorch's DataLoader.
It uses worker threads to load data in parallel while keeping the primary thread free for training. It can also load data in place to avoid allocations.
Many data containers work out of the box and it is easy to extend with your own.
DataLoaders is built on top of and fully compatible with
MLDataPattern.jl's Data Access Pattern, a functional interface for machine learning datasets.
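To give a flavor of that interface: a data container only needs to say how many observations it holds and how to load one of them. The sketch below is a hypothetical example, assuming the `nobs`/`getobs` functions from LearnBase.jl that MLDataPattern.jl builds on; `FileDataset` is an illustrative name, not part of the library.

```julia
import LearnBase: getobs, nobs

# Hypothetical container: a list of file paths, where each file
# holds one observation and is only read when requested.
struct FileDataset
    paths::Vector{String}
end

# Number of observations in the container.
nobs(ds::FileDataset) = length(ds.paths)

# Load a single observation on demand (here: the file's raw bytes).
getobs(ds::FileDataset, i::Int) = read(ds.paths[i])
```

Because loading happens lazily inside `getobs`, such a container can represent a dataset far larger than memory, and the worker threads do the expensive reads in parallel.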
```julia
x = rand(128, 10000)  # 10000 observations of size 128
y = rand(1, 10000)
dataloader = DataLoader((x, y), 16)
for (xs, ys) in dataloader
    @assert size(xs) == (128, 16)
    @assert size(ys) == (1, 16)
end
```
Of course, in the example above the dataset fits in memory, so parallel workers are not actually needed. See Custom data containers for a more realistic example.
If you already know PyTorch's DataLoader, see Quickstart for PyTorch users.
Otherwise, read on here.
Available methods are documented here.