HuggingFaceDatasets.jl

A Julia package for interacting with the Hugging Face dataset repository.
Author: JuliaGenAI
Popularity: 30 stars
Last updated: 10 months ago
Started: May 2022

HuggingFaceDatasets.jl is an unofficial Julia wrapper around datasets, the Python package from Hugging Face. The datasets package gives access to a large collection of machine learning datasets hosted on the Hugging Face Hub, and this package makes them available to the Julia ecosystem.

This package is built on top of PythonCall.jl.

Installation

HuggingFaceDatasets.jl is a registered Julia package, so you can install it from the package manager (press ] at the Julia REPL to enter pkg mode):

pkg> add HuggingFaceDatasets
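
The same installation can also be scripted (for example in a CI job) through Julia's standard Pkg API. A minimal sketch; it assumes network access so the package can be downloaded from the General registry:

```julia
using Pkg

# Non-interactive equivalent of `pkg> add HuggingFaceDatasets`:
# installs the registered package into the active environment.
Pkg.add("HuggingFaceDatasets")

# Load the package; this brings `load_dataset` into scope.
using HuggingFaceDatasets
```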

Usage

HuggingFaceDatasets.jl provides wrappers around types from the Python datasets package (e.g. Dataset and DatasetDict), along with a few related methods.

Check out the examples/ folder for usage examples.

julia> using HuggingFaceDatasets

julia> train_data = load_dataset("mnist", split = "train")
Dataset({
    features: ['image', 'label'],
    num_rows: 60000
})

# Indexing starts at 1, following the Julia convention.
# Python types are returned by default.
julia> train_data[1]
Python: {'image': <PIL.PngImagePlugin.PngImageFile image mode=L size=28x28 at 0x7F04DE661CD0>, 'label': 5}

julia> length(train_data)
60000
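
In this default format each observation is a PythonCall.jl `Py` object (hence the `Python:` prefix above). A minimal sketch of reading a single field and converting it to a Julia value with PythonCall's `pyconvert`; `label_of` is a hypothetical helper, not part of HuggingFaceDatasets.jl:

```julia
using PythonCall

# Hypothetical helper (not part of HuggingFaceDatasets.jl): look up the
# "label" key of a Python-format observation (a Python dict wrapped in `Py`)
# and convert the resulting Python int to a Julia Int.
label_of(obs::Py) = pyconvert(Int, obs["label"])
```

With train_data loaded as above, label_of(train_data[1]) would be expected to return 5, matching the printed observation. For working with whole observations, switching to the julia format is usually more convenient.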

# Now we set the julia format
julia> train_data = load_dataset("mnist", split = "train").with_format("julia");

# Returned observations are now julia objects
julia> train_data[1]
Dict{String, Any} with 2 entries:
  "label" => 5
  "image" => Gray{N0f8}[Gray{N0f8}(0.0) Gray{N0f8}(0.0) … Gray{N0f8}(0.0) Gray{N0f8}(0.0); Gray{N0f8}(0.0) Gray{N0f8}(0.0) … Gray{N0f8}(0.0) Gray{N0f8}(0.0); … ; Gray{N0f8}(0.0) Gray{N0f8}(0.0) ……

julia> train_data[1:2]
Dict{String, Vector} with 2 entries:
  "label" => [5, 0]
  "image" => ReinterpretArray{Gray{N0f8}, 2, UInt8, Matrix{UInt8}, false}[[Gray{N0f8}(0.0) Gray{N0f8}(0.0) … Gray{N0f8}(0.0) Gray{N0f8}(0.0); Gray{N0f8}(0.0) Gray{N0f8}(0.0) … Gray{N0f8}(0.0) Gra…

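A julia-format batch is a Dict of vectors, which typically still has to be packed into dense arrays before training. A minimal sketch, assuming an MNIST-style batch with an "image" vector of equally sized matrices and integer "label"s in 0:9; batch_to_arrays is a hypothetical helper, not part of the package. Each pixel is converted with Float32(...), which for Gray{N0f8} pixels relies on the grayscale-to-real conversion from ColorTypes.jl (an assumption), giving values in [0, 1]:

```julia
# Hypothetical helper (not part of HuggingFaceDatasets.jl): pack a julia-format
# batch such as `train_data[1:2]` into a Float32 image array and one-hot labels.
# Assumes an "image" vector of equally sized matrices and integer "label"s
# in 0:(nclasses - 1), as in MNIST.
function batch_to_arrays(batch::AbstractDict{String}; nclasses::Integer = 10)
    images = batch["image"]
    labels = batch["label"]
    n = length(images)
    length(labels) == n ||
        throw(DimensionMismatch("got $n images but $(length(labels)) labels"))
    h, w = size(first(images))

    # Images are stacked along a trailing batch dimension: h × w × n.
    x = Array{Float32}(undef, h, w, n)
    for (k, img) in enumerate(images)
        size(img) == (h, w) ||
            throw(DimensionMismatch("image $k has size $(size(img)), expected ($h, $w)"))
        # Element-wise Float32 conversion; for Gray{N0f8} pixels this relies on
        # ColorTypes.jl's gray-to-real conversion (assumption), giving [0, 1].
        x[:, :, k] .= Float32.(img)
    end

    # One-hot labels: nclasses × n, with 0-based class ids shifted to 1-based rows.
    y = zeros(Float32, nclasses, n)
    for (k, label) in enumerate(labels)
        0 <= label < nclasses ||
            throw(ArgumentError("label $label outside 0:$(nclasses - 1)"))
        y[label + 1, k] = 1
    end
    return x, y
end
```

Calling x, y = batch_to_arrays(train_data[1:2]) would then give a 28×28×2 array and a 10×2 one-hot matrix. Keeping the batch dimension last matches the column-major layout that Julia ML libraries such as Flux.jl commonly expect; if raw UInt8 pixels are passed instead of Gray{N0f8}, the values stay in 0 to 255.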