Load and save ARFF (Attribute Relation File Format) files.
Integrated into Tables.jl for easily converting to your favourite table types.
To install, run the following at the Julia REPL:
] add ARFFFiles
To load an ARFF file as a DataFrame:
using ARFFFiles, DataFrames
df = ARFFFiles.load(DataFrame, "mytable.arff")
Replace DataFrame with your favourite table type, or leave it out to get an ARFFTable.
To save any Tables.jl-compatible table:
using ARFFFiles
ARFFFiles.save("mytable.arff", df)
- load(file) loads the table in the given file (filename or IO stream) as an ARFFTable.
- load(func, file) is equivalent to func(load(file)) but operates recursively on any relational columns.
- loadstreaming(file) returns an ARFFReader object r:
  - Satisfies the Tables.jl interface, so can be materialized as a table.
  - r.header contains the header parsed from the file.
  - Iterates rows of type ARFFRow.
  - read(r), read(r, n) and read!(r, x) read rows of the table.
  - readcolumns(r, [maxbytes=nothing]) reads the whole table into a columnar format. Specify maxbytes to read a portion of the rows.
  - close(r) closes the underlying IO stream, unless own=false.
- loadstreaming(func, file) is equivalent to func(loadstreaming(file)) but ensures the file is closed afterwards.
- loadchunks(file) returns an iterator of ARFFTables for efficiently streaming very large tables. Equivalent to Tables.partitions(loadstreaming(file)).
- loadchunks(func, file) is equivalent to func(loadchunks(file)) but ensures the file is closed afterwards.
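As a rough illustration of the streaming entry points described above, the sketch below writes a tiny ARFF file to a temporary path and then reads it back chunk by chunk and row by row. The file contents, the column names x and label, and the temporary path are made up for this example; only the load functions themselves come from the package.

```julia
using ARFFFiles, Tables

# Hypothetical sample data, written to a temporary file so the sketch is self-contained.
path = tempname() * ".arff"
write(path, """
@relation demo
@attribute x numeric
@attribute label {a,b}
@data
1.0,a
2.5,b
4.0,a
""")

# Count rows one chunk at a time. The do-block form closes the file afterwards.
nrows = ARFFFiles.loadchunks(path) do chunks
    total = 0
    for chunk in chunks
        # Each chunk is a Tables.jl-compatible table; materialize its columns.
        total += length(Tables.columntable(chunk).x)
    end
    total
end

# Alternatively, iterate over individual rows of an ARFFReader.
xsum = ARFFFiles.loadstreaming(path) do reader
    sum(Tables.getcolumn(row, :x) for row in reader)
end
```

For a file this small a plain load(path) would be simpler; the chunked and streaming forms are intended for tables too large to hold in memory at once.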
Types. Numbers load as Float64, strings as String, dates as DateTime, nominals as CategoricalValue{String} (from CategoricalArrays) and relationals as ARFFTable.
Keyword options.
- missingcols=:auto: Controls which columns may contain missing data (?). It can be :auto, :all, :none, a set or vector of column names (symbols), or a function taking a symbol and returning true if that column can contain missing. If the table is being read in a streaming fashion, then :auto behaves the same as :all.
- missingnan=false: Convert missing values in numeric columns to NaN. This is equivalent to excluding these columns from missingcols.
- categorical=true: When false, nominal columns are converted to String instead of CategoricalValue{String}.
- chunkbytes=2^26: Read approximately this many bytes per chunk when iterating over chunks or rows.
- own=false: Signals whether or not to close the underlying IO stream when close(::ARFFReader) is called.
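A minimal sketch of how two of these options might be combined, assuming the keywords are passed directly to load. The sample file, its column names and the choice of Tables.columntable as the target table type are assumptions made for illustration.

```julia
using ARFFFiles, Tables

# Hypothetical sample data with one missing numeric value (?).
path = tempname() * ".arff"
write(path, """
@relation demo
@attribute x numeric
@attribute label {a,b}
@data
1.0,a
?,b
""")

# missingnan=true: missing numeric values are loaded as NaN.
# categorical=false: nominal columns are loaded as plain strings.
tbl = ARFFFiles.load(Tables.columntable, path; missingnan=true, categorical=false)

xs = tbl.x          # numeric column, with NaN in place of the missing entry
labels = tbl.label  # nominal column as strings
```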
save(file, table) saves the Tables.jl-compatible table to file (a filename or IO stream).
Types. Real is saved as numeric, AbstractString as string, DateTime and Date as date, and CategoricalValue{<:AbstractString} as nominal.
Keyword options.
- relation="data": The relation name.
- comment: A comment to print at the top of the file.
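The following sketch saves a small table with both keyword options set and then loads it back. The table contents, the relation name "people" and the comment text are invented for the example; a NamedTuple of vectors is used here only because it is a simple Tables.jl-compatible table.

```julia
using ARFFFiles, Tables

# Hypothetical table: a NamedTuple of equal-length vectors.
table = (id = [1, 2, 3], name = ["ann", "bob", "cy"])

path = tempname() * ".arff"
ARFFFiles.save(path, table; relation="people", comment="Generated for illustration")

# Load the file back as a column table.
roundtrip = ARFFFiles.load(Tables.columntable, path)

ids = roundtrip.id      # integers were saved as numeric, so they load as Float64
names = roundtrip.name  # strings load as String
```

Note that the round trip does not preserve the integer element type: as described under the load types, every numeric column comes back as Float64.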