MaskArrays.jl

Author cscherrer
Started in March 2021

MaskArrays


When working with missing values in an array, there are a few challenges:

  1. Missing values add some overhead (the element type becomes a Union such as Union{Missing, Float64}), and they prevent BLAS operations
  2. For imputation, we need to track which values were imputed
  3. For many inference algorithms, it's convenient to have the imputed values together in a dense array
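The three points can be seen with plain Base Julia, before MaskArrays enters the picture. This is a minimal sketch that assumes 0.0 as an arbitrary placeholder for the missing entries; the variable names are illustrative only.

```julia
using LinearAlgebra

x = [-0.742, missing, -0.301, -0.954, missing, missing, -0.436]
# eltype(x) is Union{Missing, Float64}

# Point 2: record which positions will need imputed values
miss_idx = findall(ismissing, x)      # [2, 5, 6]

# Point 1: a dense Vector{Float64} copy (0.0 placeholders in the missing slots);
# for this element type, dot and matrix products are handed to BLAS
dense = coalesce.(x, 0.0)
s = dot(dense, dense)

# Point 3: the entries to impute, gathered in one dense array.
# Note this is a copy: changing it does not update `dense`.
imputed_vals = dense[miss_idx]
```

The last line shows the gap MaskArrays fills: a plain copy of the imputed entries is dense, but it is disconnected from the array it came from.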

MaskArrays addresses these issues. For example, say you're given an array like

julia> x
7-element Vector{Union{Missing, Float64}}:
 -0.742
   missing
 -0.301
 -0.954
   missing
   missing
 -0.436

Then we can convert it with maskarray:

julia> ma = maskarray(x)
7-element MaskArray{Float64,1}:
 -0.742
  6.9439480399727e-310
 -0.301
 -0.954
  0.0
  0.0
 -0.436
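One way to picture what such a conversion produces is a dense vector holding every entry, plus a record of which positions were missing. The type ToyMask and the function toymask below are hypothetical, written for illustration; this is a minimal sketch of the idea, not MaskArrays' actual definition. In this sketch the missing slots are left uninitialized, so they can hold arbitrary values until something is assigned to them.

```julia
# Hypothetical toy type illustrating the idea; not MaskArrays' definition.
struct ToyMask{T} <: AbstractVector{T}
    data::Vector{T}           # dense storage for observed and imputed values
    imputed_idx::Vector{Int}  # positions that were missing in the input
end

Base.size(m::ToyMask) = size(m.data)
Base.IndexStyle(::Type{<:ToyMask}) = IndexLinear()
Base.getindex(m::ToyMask, i::Int) = m.data[i]
Base.setindex!(m::ToyMask, v, i::Int) = (m.data[i] = v)

function toymask(x::AbstractVector)
    T = nonmissingtype(eltype(x))          # Union{Missing, Float64} -> Float64
    data = Vector{T}(undef, length(x))     # missing slots start uninitialized
    idx = findall(ismissing, x)
    for i in eachindex(x)
        ismissing(x[i]) || (data[i] = x[i])  # copy observed values only
    end
    return ToyMask(data, idx)
end

x = [-0.742, missing, -0.301, -0.954, missing, missing, -0.436]
m = toymask(x)
```

Because ToyMask subtypes AbstractVector and defines size and getindex, it prints and indexes like an ordinary Float64 vector while still remembering which entries were imputed.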

The imputed values (positions 2, 5 and 6) are represented as a view into the data. Until we assign them, they hold arbitrary placeholder values:

julia> imputed(ma)
3-element view(::Vector{Float64}, [2, 5, 6]) with eltype Float64:
 6.9439480399727e-310
 0.0
 0.0
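The view in this output is Base Julia's SubArray: it indexes into the parent vector through the list of positions [2, 5, 6] without copying, so assigning through it changes the parent. A plain-Julia illustration, independent of MaskArrays, using the observed values from the example and 0.0 as placeholders:

```julia
data = [-0.742, 0.0, -0.301, -0.954, 0.0, 0.0, -0.436]

# A view indexed by a vector of positions: no copy is made
imp = view(data, [2, 5, 6])

imp .= 1:3             # broadcast assignment writes into `data`
parent(imp) === data   # true: the view shares memory with `data`
```

After the assignment, data reads -0.742, 1.0, -0.301, -0.954, 2.0, 3.0, -0.436.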

Because it is a view, assigning to it updates the MaskArray in place:

julia> imputed(ma) .= 1:3
3-element view(::Vector{Float64}, [2, 5, 6]) with eltype Float64:
 1.0
 2.0
 3.0

julia> ma
7-element MaskArray{Float64,1}:
 -0.742
  1.0
 -0.301
 -0.954
  2.0
  3.0
 -0.436

Buffers

A MaskArray has a "buffer" that lets it connect easily to outside data sources. By default, the buffer is the imputed values themselves, so no extra allocation is needed.
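The default arrangement can be pictured in plain Julia: when the buffer is the imputed view itself, the two names refer to the same object, so no storage is allocated and a write to the buffer is a write to the data. The variable names here are hypothetical; this only sketches the aliasing idea, not MaskArrays' internals.

```julia
data = [-0.742, 1.0, -0.301, -0.954, 2.0, 3.0, -0.436]
imp = view(data, [2, 5, 6])

# Default case: the buffer is the imputed view (same object, no new storage)
buffer = imp
buffer .= [10.0, 20.0, 30.0]   # lands directly in `data`

# By contrast, a separately allocated buffer does not touch `data`
other = zeros(3)
other .= [7.0, 8.0, 9.0]       # `data` still holds 10.0, 20.0, 30.0
```

A separate buffer like `other` is what makes an explicit copy step necessary.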

For example, say we have

julia> outside_data = randn(10)
10-element Vector{Float64}:
 -0.42452477906454783
  0.03203787170597264
  1.1366181451933932
 -2.018667288063533
  1.3208417491973015
  0.07966694888217887
  1.063328831016872
  0.07649454253602395
 -2.4029119018577814
  0.6908031059739369

we can connect a subset of it to our imputed values with replace_buffer (the view has three elements, one per imputed value):

julia> ma2 = replace_buffer(ma, view(outside_data, 3:5))
7-element MaskArray{Float64,1}:
 -0.742
  1.0
 -0.301
 -0.954
  2.0
  3.0
 -0.436

Note that ma2 still shows the old imputed values: replacing the buffer does not copy anything. After the buffer changes, we call sync! to push its contents into the data:

julia> sync!(ma2)
7-element MaskArray{Float64,1}:
 -0.742
  1.1366181451933932
 -0.301
 -0.954
 -2.018667288063533
  1.3208417491973015
 -0.436
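A sync step of this kind can be sketched as a copy from the buffer into the imputed positions. The type ToyBuffered and the functions toy_replace_buffer and toy_sync! below are hypothetical stand-ins written for illustration, not the package's implementation. They follow the sequence shown above: replacing the buffer leaves the data untouched until the sync copies the buffer in.

```julia
# Hypothetical stand-in for a masked array with a buffer (not MaskArrays' code).
struct ToyBuffered{T,B<:AbstractVector}
    data::Vector{T}           # dense storage: observed and imputed values
    imputed_idx::Vector{Int}  # positions holding imputed values
    buffer::B                 # source of imputed values for the next sync
end

# Swap in a new buffer. Nothing is copied, and in this toy the result
# shares the same `data` vector as `m`.
toy_replace_buffer(m::ToyBuffered, buf::AbstractVector) =
    ToyBuffered(m.data, m.imputed_idx, buf)

# Copy the buffer's current contents into the imputed positions of `data`.
function toy_sync!(m::ToyBuffered)
    length(m.buffer) == length(m.imputed_idx) ||
        throw(DimensionMismatch("buffer length must match the number of imputed values"))
    for (i, v) in zip(m.imputed_idx, m.buffer)
        m.data[i] = v
    end
    return m
end

data = [-0.742, 1.0, -0.301, -0.954, 2.0, 3.0, -0.436]
idx = [2, 5, 6]
m = ToyBuffered(data, idx, view(data, idx))  # default: buffer aliases the imputed values

outside = [0.5, 1.5, 2.5, 3.5, 4.5]
m2 = toy_replace_buffer(m, view(outside, 3:5))
# m2.data[idx] is still [1.0, 2.0, 3.0] here
toy_sync!(m2)
# m2.data[idx] is now [2.5, 3.5, 4.5]
```

Keeping the copy in an explicit sync step means the outside source can be updated many times cheaply, with the data refreshed only when it is actually needed.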

Now say we make a change to the outside data:

julia> outside_data .= 99999
10-element Vector{Float64}:
 99999.0
 99999.0
 99999.0
 99999.0
 99999.0
 99999.0
 99999.0
 99999.0
 99999.0
 99999.0

Calling sync! again copies the new outside values into ma2:

julia> sync!(ma2)
7-element MaskArray{Float64,1}:
    -0.742
 99999.0
    -0.301
    -0.954
 99999.0
 99999.0
    -0.436
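The second sync! picks up the new values because the buffer is a view into outside_data, not a copy: the view always reads the parent's current contents, and the sync then copies them into the imputed positions. The same chain can be traced in plain Julia. The names are hypothetical, and copyto! stands in for the sync step as an assumption based on the output above, not as the package's code.

```julia
outside = collect(1.0:10.0)
data = [-0.742, 0.0, -0.301, -0.954, 0.0, 0.0, -0.436]
idx = [2, 5, 6]

buffer = view(outside, 3:5)          # reads outside[3:5] lazily
copyto!(view(data, idx), buffer)     # "sync": data[idx] == [3.0, 4.0, 5.0]

outside .= 99999                     # the view now reads 99999.0 at all three positions
copyto!(view(data, idx), buffer)     # sync again: data[idx] is all 99999.0
```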
