ARFFFiles.jl

Load and save ARFF files
Author: cjdoris
Popularity: 5 stars
Last updated: 2 years ago
Started in: August 2020

Load and save ARFF (Attribute Relation File Format) files.

Integrates with Tables.jl, so loaded data converts easily to your favourite table type.

Install

] add ARFFFiles

Quick start

To load an ARFF file as a DataFrame:

using ARFFFiles, DataFrames
df = ARFFFiles.load(DataFrame, "mytable.arff")

Replace DataFrame with your favourite table type, or leave it out to get an ARFFTable.

To save any Tables.jl-compatible table:

using ARFFFiles
ARFFFiles.save("mytable.arff", df)
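The two calls above can be combined into a round trip that does not need DataFrames. This is a minimal sketch, assuming Tables.jl is available in the active environment; the column names, values and temporary file name are made up for illustration.

```julia
using ARFFFiles, Tables

# A NamedTuple of equal-length vectors is a Tables.jl-compatible column table.
table = (id = [1, 2, 3], name = ["ann", "bob", "cy"])

# Hypothetical file name inside a fresh temporary directory.
path = joinpath(mktempdir(), "roundtrip.arff")
ARFFFiles.save(path, table)

# Passing Tables.columntable as the first argument converts the loaded
# ARFFTable into a NamedTuple of column vectors.
cols = ARFFFiles.load(Tables.columntable, path)

cols.id    # numeric attributes are loaded as Float64 values
cols.name  # string attributes are loaded as String values
```

Note that the integer column is written as a numeric attribute, so it is read back as floating point.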

Loading

  • load(file) loads the table in the given file (filename or IO stream) as an ARFFTable.
  • load(func, file) is equivalent to func(load(file)), except that func is also applied recursively to any relational columns.
  • loadstreaming(file) returns an ARFFReader object r:
    • Satisfies the Tables.jl interface, so can be materialized as a table.
    • r.header contains the header parsed from file.
    • Iterates rows of type ARFFRow.
    • read(r), read(r, n) and read!(r, x) read rows of the table.
    • readcolumns(r, [maxbytes=nothing]) reads the whole table into a columnar format. Specify maxbytes to read a portion of the rows.
    • close(r) closes the underlying IO stream, unless own=false.
  • loadstreaming(func, file) is equivalent to func(loadstreaming(file)) but ensures the file is closed afterwards.
  • loadchunks(file) returns an iterator of ARFFTables for efficiently streaming very large tables. Equivalent to Tables.partitions(loadstreaming(file)).
  • loadchunks(func, file) is equivalent to func(loadchunks(file)) but ensures the file is closed afterwards.
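As an illustration of the streaming entry points listed above, the following sketch counts rows chunk by chunk and then materializes the streaming reader as a column table. It assumes Tables.jl is available; the file name and the single column x are hypothetical, and a file this small will most likely arrive as a single chunk.

```julia
using ARFFFiles, Tables

# Write a small single-column table to stream back (hypothetical file name).
path = joinpath(mktempdir(), "big.arff")
ARFFFiles.save(path, (x = collect(1.0:1000.0),))

# Count rows one chunk at a time. The do-block form is loadchunks(func, file),
# so the file is closed once the block returns.
nrows = ARFFFiles.loadchunks(path) do chunks
    n = 0
    for chunk in chunks                      # each chunk is an ARFFTable
        chunkcols = Tables.columntable(chunk)
        n += length(chunkcols.x)
    end
    n
end

# The reader satisfies the Tables.jl interface, so it can be materialized
# directly; loadstreaming(func, file) also closes the file afterwards.
cols = ARFFFiles.loadstreaming(Tables.columntable, path)
```

The chunked form keeps only one chunk in memory at a time, which is the point of using it on very large files.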

Types. Numbers load as Float64, strings as String, dates as DateTime, nominals as CategoricalValue{String} (from CategoricalArrays) and relationals as ARFFTable.

Keyword options.

  • missingcols=:auto: Controls which columns may contain missing data (?). It can be :auto, :all, :none, a set or vector of column names (symbols), or a function taking a symbol and returning true if that column may contain missing values. If the table is being read in a streaming fashion, then :auto behaves the same as :all.
  • missingnan=false: When true, convert missing values in numeric columns to NaN. This is equivalent to excluding these columns from missingcols.
  • categorical=true: When false, nominal columns are converted to String instead of CategoricalValue{String}.
  • chunkbytes=2^26: Read approximately this many bytes per chunk when iterating over chunks or rows.
  • own=false: Signals whether or not to close the underlying IO stream when close(::ARFFReader) is called.
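To see how some of these options change the loaded types, here is a small sketch that writes an ARFF file by hand and loads it twice. The file contents and names are invented for the example, and it assumes the keyword options are accepted by load(func, file) in the same way as by load(file).

```julia
using ARFFFiles, Tables

# Hand-written ARFF file with one missing numeric value (marked by ?).
path = joinpath(mktempdir(), "options.arff")
write(path, """
@relation demo
@attribute x numeric
@attribute c {a,b}
@data
1.5,a
?,b
""")

# Defaults: the missing entry in x is loaded as `missing`,
# and the nominal column c is loaded as categorical values.
default = ARFFFiles.load(Tables.columntable, path)

# missingnan=true maps missing numeric entries to NaN;
# categorical=false loads the nominal column as plain strings.
plain = ARFFFiles.load(Tables.columntable, path; missingnan=true, categorical=false)

default.x  # second entry is missing
plain.x    # second entry is NaN
plain.c    # plain String values
```

Using missingnan=true keeps numeric columns as plain Float64 vectors, which can be convenient for numerical code that already treats NaN as "no value".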

Saving

  • save(file, table) saves the Tables.jl-compatible table to file (a filename or IO stream).

Types. Real is saved as numeric, AbstractString as string, DateTime and Date as date, and CategoricalValue{<:AbstractString} as nominal.

Keyword options.

  • relation="data": The relation name.
  • comment: A comment to print at the top of the file.
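
A short sketch of saving with both keyword options, then printing the resulting file to inspect the header. The table contents, relation name and comment text are hypothetical; Dates is the Julia standard library that provides DateTime.

```julia
using ARFFFiles, Dates

# Column table with a numeric, a string and a date column (made-up values).
table = (
    temp = [21.5, 19.0],
    note = ["sunny", "overcast"],
    when = [DateTime(2020, 8, 1), DateTime(2020, 8, 2)],
)

path = joinpath(mktempdir(), "weather.arff")
ARFFFiles.save(path, table; relation = "weather", comment = "exported for a demo")

# Read the file back as text to see the comment, relation name and attributes.
text = read(path, String)
print(text)
```

Printing the file is a quick way to check how each column type was mapped to an ARFF attribute type.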