DataConvenience.jl

Convenience functions missing in Julia
Author: xiaodaigh
Popularity: 1 Star
Last Updated: 5 Months Ago
Started In: October 2019

DataConvenience

An eclectic collection of convenience functions for you.

Data

cleannames!

Somewhat similar to R's janitor::clean_names: cleannames!(df) cleans the column names of a DataFrame in place.
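To give a feel for what "cleaning" names usually means, here is a minimal sketch using only Base Julia. The helpers clean_name and clean_names are hypothetical and are not part of DataConvenience.jl. The exact rules that cleannames! applies are not stated here, so the rules below (lowercase, non-alphanumeric runs become underscores) are an assumption modelled on janitor-style cleaning.

```julia
# Hypothetical helper, not part of DataConvenience.jl: one plausible way to
# normalise a single column name into snake_case.
function clean_name(name::AbstractString)
    s = lowercase(strip(name))
    s = replace(s, r"[^a-z0-9]+" => "_")   # runs of other characters become "_"
    return String(strip(s, '_'))            # drop leading/trailing underscores
end

# Apply the rule to a whole collection of names.
clean_names(names) = [clean_name(n) for n in names]

cleaned = clean_names(["First Name", "Total.Sales (\$)", "ID"])
```

With a real DataFrame, the cleaned names would then be applied to the columns, which is the step cleannames!(df) performs for you.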

CSV Chunk Reader

You can read a CSV in chunks and apply logic to each chunk. The type of each column is inferred by CSV.read.

for chunk in CsvChunkIterator(filepath)
  # chunk is a DataFrame
  # do something with chunk
end

The chunk iterator accepts CSV.read parameters. You can pass type or types to dictate the type of each column, e.g.

# read all columns as String
for chunk in CsvChunkIterator(filepath, type=String)
  # chunk is a DataFrame where each column is String
  # do something with chunk
end
# read a three-column CSV where the column types are String, Int, Float32
for chunk in CsvChunkIterator(filepath, types=[String, Int, Float32])
  # do something with chunk
end

Note: The chunks MAY have different column types, because the types are inferred separately for each chunk unless you specify them.
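The following sketch illustrates why per-chunk inference can give different column types. It uses only Base Julia and does not call CsvChunkIterator or CSV.read. The helper infer_type and the in-memory csv_text are hypothetical stand-ins, and the inference rule is a deliberately simplified assumption.

```julia
# Stand-in data: the second column looks like Int in the first chunk only.
csv_text = """
id,value
1,10
2,20
3,1.5
4,2.5
"""

# Hypothetical helper: pick the narrowest type that parses every value.
function infer_type(values)
    all(v -> tryparse(Int, v) !== nothing, values) && return Int
    all(v -> tryparse(Float64, v) !== nothing, values) && return Float64
    return String
end

io = IOBuffer(csv_text)
header = split(readline(io), ',')
chunk_types = []
for rows in Iterators.partition(eachline(io), 2)   # 2 data rows per chunk
    fields = [split(row, ',') for row in rows]
    # infer each column's type from this chunk alone
    push!(chunk_types, [infer_type([f[j] for f in fields]) for j in eachindex(header)])
end
```

Here the value column is inferred as Int in the first chunk and Float64 in the second. Passing types explicitly, as in the examples above, avoids this kind of mismatch.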

Statistics & Correlations

Canonical Correlation

Computes the first component of the canonical correlation between x and y.

canonicalcor(x, y)
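For readers unfamiliar with the quantity, here is a minimal sketch of one standard way to compute the first canonical correlation, via QR decompositions and singular values. The function first_canonical_cor is hypothetical and is not claimed to be how canonicalcor is implemented. It assumes x and y are matrices with observations in rows and full column rank.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper, not the package's implementation.
function first_canonical_cor(X::AbstractMatrix, Y::AbstractMatrix)
    Xc = X .- mean(X, dims=1)          # centre each column
    Yc = Y .- mean(Y, dims=1)
    Qx = Matrix(qr(Xc).Q)              # orthonormal basis for the columns of Xc
    Qy = Matrix(qr(Yc).Q)
    # the singular values of Qx'Qy are the canonical correlations
    return maximum(svdvals(Qx' * Qy))
end

x = reshape([1.0, 2.0, 3.0, 4.0, 5.0], :, 1)
y = reshape([2.0, 1.0, 4.0, 3.0, 6.0], :, 1)
rho = first_canonical_cor(x, y)
```

With a single column on each side, the result reduces to the absolute value of the ordinary Pearson correlation.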

Correlation for Bool

cor(x::Bool, y) - allows you to treat Bool values as 0/1 when computing a correlation
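Treating Bool as 0/1 can be sketched with the Statistics standard library alone by converting explicitly. This only illustrates the idea and is not the package's method. The variable names flags and scores are made up for the example.

```julia
using Statistics

flags  = [true, false, true, true, false]
scores = [0.9, 0.2, 0.8, 0.7, 0.1]

# true becomes 1.0 and false becomes 0.0 before the usual Pearson correlation
r = cor(Float64.(flags), scores)
```

The value r is positive here because the larger scores coincide with true.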

Correlation for DataFrames

dfcor(df::AbstractDataFrame, cols1=names(df), cols2=names(df), verbose=false)

Computes correlations in a DataFrame between one set of columns, cols1, and another set, cols2. The correlation is computed for every pair in the Cartesian product of cols1 and cols2.
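The Cartesian-product idea can be sketched without DataFrames by using a NamedTuple of columns as a stand-in table. The helper pairwise_cor and the shape of its result are assumptions made for illustration. The actual return format of dfcor is not described here.

```julia
using Statistics

# Stand-in for a DataFrame: a NamedTuple of equal-length columns.
table = (a = [1.0, 2.0, 3.0, 4.0],
         b = [2.0, 4.1, 5.9, 8.2],
         c = [4.0, 3.0, 2.5, 1.0])

# Hypothetical helper: correlate every column in cols1 with every column in cols2.
function pairwise_cor(table, cols1, cols2)
    return [(col1 = c1, col2 = c2, r = cor(table[c1], table[c2]))
            for c1 in cols1 for c2 in cols2]
end

result = pairwise_cor(table, [:a, :b], [:b, :c])
```

Two columns in each set give four pairs: (a, b), (a, c), (b, b) and (b, c).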

Miscellaneous

@replicate

@replicate times code will run code the given number of times, e.g.

@replicate 10 randstring(8)
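For comparison, the same repetition can be written in plain Julia as a comprehension. This sketch assumes the macro collects the result of each run into a vector, which is not confirmed here.

```julia
using Random

# run randstring(8) ten times and keep every result
samples = [randstring(8) for _ in 1:10]
```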

StringVector

StringVector(v::CategoricalVector{String}) - converts v efficiently to a WeakRefStrings.StringVector
