Common table operations on Tables.jl compatible sources
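Installation: at the Julia REPL, import Pkg; Pkg.add("TableOperations"). The examples also call Tables.jl functions such as Tables.columntable, so add and load Tables as well: Pkg.add("Tables"), then using Tables, TableOperations.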
Maintenance: TableOperations is maintained collectively by the JuliaData collaborators. Responsiveness to pull requests and issues can vary, depending on the availability of key collaborators.
The TableOperations.select function allows specifying a custom subset and order of columns from a Tables.jl source, like:
ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])
table_subset = ctable |> TableOperations.select(:C, :A) |> Tables.columntable
This "selects" the C and A columns from the original table and reorders them so that C comes first. Column names can be provided as Strings, Symbols, or Integers (column indices).
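To illustrate the accepted name forms, the sketch below repeats the same selection with Symbols, with Strings, and with integer column indices. It assumes that an integer refers to a column's position in the source (C is the third column, A the first) and that all three forms yield the same table; isequal is used instead of == because column A contains missing.

```julia
using Tables, TableOperations

ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])

# The same selection written three ways: Symbols, Strings, and column indices
by_symbol = ctable |> TableOperations.select(:C, :A) |> Tables.columntable
by_string = ctable |> TableOperations.select("C", "A") |> Tables.columntable
by_index  = ctable |> TableOperations.select(3, 1) |> Tables.columntable

# Column order follows the order of the arguments, not the order in the source
@show keys(by_symbol)   # (:C, :A)

# isequal treats missing == missing as true, unlike ==
@show isequal(by_symbol, by_string) && isequal(by_symbol, by_index)
```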
The TableOperations.transform function allows specifying a "transform" function per column that is applied to each element. This is handy when a simple transformation is needed for a specific column (or columns). Note that this doesn't allow the creation of new columns; it only applies the transform function to the specified column, replacing the original column. Usage is like:
ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])
table = ctable |> TableOperations.transform(C=x->Symbol(x)) |> Tables.columntable
Here, we're providing the transform function x->Symbol(x), which turns an argument into a Symbol, and saying we should apply it to the C column.
Multiple transform functions can be provided for multiple columns, and the column-to-function mapping can also be provided as a Dict whose keys are column names given as Strings, Symbols, or even Ints (referring to the column index).
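A minimal sketch of both forms: keyword arguments that transform two columns at once, and a Dict keyed by column-name Strings. To stay on the safe side it keeps every Dict key of a single type (String here); whether mixed key types are accepted is not assumed. Column B has no transform, so it should pass through unchanged, and missing * 10 is still missing.

```julia
using Tables, TableOperations

ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])

# Several columns at once via keyword arguments; B is left untouched
kw_table = ctable |> TableOperations.transform(A = x -> x * 10, C = uppercase) |> Tables.columntable

# The same kind of mapping expressed as a Dict with String keys (one key type only)
dict_table = ctable |> TableOperations.transform(Dict("C" => uppercase)) |> Tables.columntable

@show kw_table.A     # [10, missing, 30]
@show kw_table.C     # ["HEY", "THERE", "SAILOR"]
@show dict_table.C   # ["HEY", "THERE", "SAILOR"]
```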
The TableOperations.filter function allows applying a "filter" function to each row in the input table source, keeping rows for which f(row) is true. Usage is like:
ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])
table = ctable |> TableOperations.filter(x->Tables.getcolumn(x, :B) > 2.0) |> Tables.columntable
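Column A contains missing, and in Julia a comparison such as missing > 1 evaluates to missing rather than a Bool, so a predicate like row -> Tables.getcolumn(row, :A) > 1 would most likely raise an error on the second row. The sketch below guards the comparison with coalesce, treating an unknown result as false; the predicate name keep_large_A is illustrative only.

```julia
using Tables, TableOperations

ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])

# `missing > 1` is `missing`, so coalesce turns it into `false` (row dropped)
keep_large_A(row) = coalesce(Tables.getcolumn(row, :A) > 1, false)

filtered = ctable |> TableOperations.filter(keep_large_A) |> Tables.columntable

@show filtered.C   # ["sailor"]
```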
The TableOperations.map function allows applying a "mapping" function to each row in the input table source; the function f should take and return a Tables.jl Row-compatible object. Usage is like:
ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])
table = ctable |> TableOperations.map(x->(A=Tables.getcolumn(x, :A), C=Tables.getcolumn(x, :C), B=Tables.getcolumn(x, :B) * 2)) |> Tables.columntable
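Since f only has to return a Row-compatible object, the output rows need not have the same columns as the input. The sketch below assumes this holds for TableOperations.map: it keeps C and adds a hypothetical len column holding the length of each string, using a NamedTuple as the returned row.

```julia
using Tables, TableOperations

ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])

# Hypothetical output row: keep C and add a new `len` column (string length);
# assumes map accepts rows whose columns differ from the input's
add_len(row) = (C = Tables.getcolumn(row, :C), len = length(Tables.getcolumn(row, :C)))

with_len = ctable |> TableOperations.map(add_len) |> Tables.columntable

@show keys(with_len)   # (:C, :len)
@show with_len.len     # [3, 5, 6]
```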
The TableOperations.narrowtypes function infers narrower column element types that better fit the stored data, which helps when columns are stored with an abstract element type such as Any. Usage is like:
ctable_type_any = (A=Any[1, missing, 3], B=Any[1.0, 2.0, 3.0], C=Any["hey", "there", "sailor"])
table = TableOperations.narrowtypes(ctable_type_any) |> Tables.columntable
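One way to see the effect is to print each column's element type before and after narrowing. This sketch assumes the narrowed table stores concretely typed columns, so the Float64 and String columns should lose their Any element type; the exact type chosen for the column mixing Int and missing is printed rather than assumed.

```julia
using Tables, TableOperations

ctable_type_any = (A=Any[1, missing, 3], B=Any[1.0, 2.0, 3.0], C=Any["hey", "there", "sailor"])
narrowed = TableOperations.narrowtypes(ctable_type_any) |> Tables.columntable

# Compare each column's element type before and after narrowing
for name in keys(ctable_type_any)
    println(name, ": ", eltype(ctable_type_any[name]), " => ", eltype(narrowed[name]))
end
```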
The TableOperations.dropmissing function lazily removes every row that contains a missing value in any column. Usage is like:
ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])
table = ctable |> TableOperations.dropmissing |> Tables.columntable
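In this example only the second row has a missing value (in A), so two rows should remain. For comparison, the sketch also writes a hand-rolled filter that keeps a row only if none of its values is missing; no_missing is a hypothetical helper that happens to give the same result here, not the library's implementation.

```julia
using Tables, TableOperations

ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])

dropped = ctable |> TableOperations.dropmissing |> Tables.columntable

# Hypothetical equivalent for comparison: keep a row only if no column value is missing
no_missing(row) = all(nm -> !ismissing(Tables.getcolumn(row, nm)), Tables.columnnames(row))
filtered = ctable |> TableOperations.filter(no_missing) |> Tables.columntable

@show dropped.C   # ["hey", "sailor"]
```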
The TableOperations.joinpartitions function lazily chains (or "joins") the partitions of a partitioned table source, such as one built with Tables.partitioner, into a single long table. Usage is like:
ctables = Tables.partitioner(i -> (A=fill(i, 10), B=rand(10) * i), 1:3)
table = ctables |> TableOperations.joinpartitions |> Tables.columntable
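Because the example above fills B with random numbers, its output changes on every run. The deterministic variant below (partition contents are illustrative) makes the stacking visible, assuming partitions are concatenated in iteration order: the rows of partition 1 come first, then 2, then 3.

```julia
using Tables, TableOperations

# Three small partitions built lazily from 1:3, two rows each
parts = Tables.partitioner(i -> (A = fill(i, 2), B = fill(string("part", i), 2)), 1:3)

joined = parts |> TableOperations.joinpartitions |> Tables.columntable

@show joined.A   # [1, 1, 2, 2, 3, 3]
@show joined.B   # ["part1", "part1", "part2", "part2", "part3", "part3"]
```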
Contributions are very welcome, as are feature requests and suggestions. Please open an issue if you encounter any problems or would just like to ask a question.