Load and save ARFF (Attribute Relation File Format) files.
Integrated into Tables.jl for easily converting to your favourite table types.
] add ARFFFiles
To load an ARFF file as a DataFrame:
using ARFFFiles, DataFrames
df = ARFFFiles.load(DataFrame, "mytable.arff")
Replace DataFrame with your favourite table type, or leave it out to get an ARFFTable.
To save any Tables.jl-compatible table:
using ARFFFiles
ARFFFiles.save("mytable.arff", df)
- load(file) loads the table in the given file (filename or IO stream) as an ARFFTable.
- load(func, file) is equivalent to func(load(file)) but operates recursively on any relational columns.
- loadstreaming(file) returns an ARFFReader object r:
  - Satisfies the Tables.jl interface, so can be materialized as a table.
  - r.header contains the header parsed from io.
  - Iterates rows of type ARFFRow.
  - read(r), read(r, n) and read!(r, x) read rows of the table.
  - readcolumns(r, [maxbytes=nothing]) reads the whole table into a columnar format. Specify maxbytes to read a portion of the rows.
  - close(r) closes the underlying io stream, unless own=false.
- loadstreaming(func, file) is equivalent to func(loadstreaming(file)) but ensures the file is closed afterwards.
- loadchunks(file) returns an iterator of ARFFTables for efficiently streaming very large tables. Equivalent to Tables.partitions(loadstreaming(file)).
- loadchunks(func, file) is equivalent to func(loadchunks(file)) but ensures the file is closed afterwards.
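As a rough sketch of the streaming functions above, the following writes a small ARFF file to a temporary directory and then reads it back row by row and chunk by chunk. The file contents, the variable names and the use of Tables.columntable to count rows are illustrative assumptions, and the example assumes Tables.jl is installed alongside ARFFFiles.

```julia
using ARFFFiles, Tables

# Write a tiny ARFF file so the example is self-contained (contents are made up).
path = joinpath(mktempdir(), "example.arff")
write(path, """
@relation example
@attribute x numeric
@attribute label {a,b}
@data
1.5,a
2.5,b
3.5,a
""")

# Stream the file row by row; the do-block form closes the file afterwards.
nrows = ARFFFiles.loadstreaming(path) do reader
    count = 0
    for row in reader
        count += 1
    end
    count
end

# Process the file chunk by chunk and add up the rows in each chunk.
chunk_rows = ARFFFiles.loadchunks(path) do chunks
    total = 0
    for chunk in chunks
        cols = Tables.columntable(chunk)
        total += length(cols.x)
    end
    total
end
```

For a file this small the default chunk size should cover everything at once; the chunked form is mainly useful when the table does not fit comfortably in memory.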
Types. Numbers load as Float64, strings as String, dates as DateTime, nominals as CategoricalValue{String} (from CategoricalArrays) and relationals as ARFFTable.
Keyword options.
- missingcols=:auto: Controls which columns may contain missing data (?). It can be :auto, :all, :none, a set or vector of column names (symbols), or a function taking a symbol and returning true if that column can contain missing. If the table is being read in a streaming fashion, then :auto behaves the same as :all.
- missingnan=false: Convert missing values in numeric columns to NaN. This is equivalent to excluding these columns in missingcols.
- categorical=true: When false, nominal columns are converted to String instead of CategoricalValue{String}.
- chunkbytes=2^26: Read approximately this many bytes per chunk when iterating over chunks or rows.
- own=false: Signals whether or not to close the underlying IO stream when close(::ARFFReader) is called.
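A minimal sketch of how two of these options might be combined, assuming a small made-up file with one missing numeric value. The file contents and variable names are hypothetical, and Tables.columntable is used only to get at the columns by name.

```julia
using ARFFFiles, Tables

# A made-up ARFF file with a missing numeric value (?) in the second row.
path = joinpath(mktempdir(), "options.arff")
write(path, """
@relation options
@attribute x numeric
@attribute label {a,b}
@data
1.5,a
?,b
""")

# missingnan=true: missing numeric values are loaded as NaN.
# categorical=false: nominal values are loaded as String.
table = ARFFFiles.load(path; missingnan=true, categorical=false)
cols = Tables.columntable(table)
```

Here cols.x and cols.label hold the two columns; with these options the missing entry in x should come back as NaN rather than missing.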
save(file, table) saves the Tables.jl-compatible table to file (a filename or IO stream).
Types. Real is saved as numeric, AbstractString as string, DateTime and Date as date, and CategoricalValue{<:AbstractString} as nominal.
Keyword options.
- relation="data": The relation name.
- comment: A comment to print at the top of the file.
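The following sketch shows one way the save options might be used, followed by a round trip through load as a sanity check. The table contents, relation name and comment text are invented for illustration, and a NamedTuple of vectors stands in for any Tables.jl-compatible table.

```julia
using ARFFFiles, Tables

# A NamedTuple of vectors is a simple Tables.jl-compatible table (values are made up).
table = (x = [1.0, 2.0, 3.0], name = ["a", "b", "c"])

# Save with a custom relation name and a comment at the top of the file.
path = joinpath(mktempdir(), "saved.arff")
ARFFFiles.save(path, table; relation = "example", comment = "Written by ARFFFiles.save")

# Load the file back and materialize it as named columns to check the round trip.
loaded = Tables.columntable(ARFFFiles.load(path))
```

Following the type rules above, the Float64 column is written as numeric and the String column as string, so both should load back with the same values.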