SerializedElementArrays.jl

Julia arrays whose elements are saved to disk using serialization.
Author: ITensor · Started: May 2021 · Last updated: 2 years ago



Installation

Install with the Julia package manager: import Pkg; Pkg.add("SerializedElementArrays").

Introduction

This package introduces a function, disk, that copies an AbstractArray in memory to an array stored on disk, called a SerializedElementArray. Each element of the original array is serialized and, by default, saved to its own file in a randomly generated directory inside the system's temporary directory.

For example:

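# `using M: x` does not bring `M` itself into scope, so import the type name directly
using SerializedElementArrays: SerializedElementArray, disk, pathname

a = reshape(1:6, 2, 3)
d = disk(a)  # serialize each element of `a` to disk
@show d isa SerializedElementArray
@show a[1, 2]
@show d[1, 2]  # reads the element back from disk
@show readdir(pathname(d))  # by default, one file per element
d[2, 2] = 3  # writes the new value to disk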
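To make the storage scheme concrete, here is a minimal sketch of a one-file-per-element array written with only Base and the Serialization standard library. It is not the package's implementation: the type ToyDiskArray, the function toydisk, the use of mktempdir, and the naming of files by linear index are all hypothetical choices for illustration.

```julia
using Serialization  # stdlib: serialize / deserialize

# Hypothetical toy type, NOT the package's SerializedElementArray:
# each element lives in its own file under `dir`.
struct ToyDiskArray{T,N} <: AbstractArray{T,N}
    dir::String
    dims::NTuple{N,Int}
end

# Assumed naming scheme: one file per element, named by its linear index.
elementpath(d::ToyDiskArray, i::Int) = joinpath(d.dir, string(i, ".jls"))

Base.size(d::ToyDiskArray) = d.dims
Base.IndexStyle(::Type{<:ToyDiskArray}) = IndexLinear()

# Reading an element deserializes its file.
Base.getindex(d::ToyDiskArray, i::Int) = deserialize(elementpath(d, i))

# Writing an element serializes the value to its file, replacing any old one.
function Base.setindex!(d::ToyDiskArray{T}, x, i::Int) where {T}
    serialize(elementpath(d, i), convert(T, x))
    return d
end

# Copy an in-memory array to disk, in a fresh directory under tempdir().
function toydisk(a::AbstractArray{T,N}) where {T,N}
    d = ToyDiskArray{T,N}(mktempdir(), size(a))
    for (i, x) in enumerate(a)  # column-major order matches linear indices
        d[i] = x
    end
    return d
end

a = reshape(1:6, 2, 3)
d = toydisk(a)
@show d[1, 2]                 # 3, read back from disk
@show length(readdir(d.dir))  # 6 files, one per element
d[2, 2] = 30
@show d[2, 2]                 # 30
```

Declaring IndexLinear() means only size and single-integer getindex/setindex! have to be defined; Base converts Cartesian indices such as d[1, 2] into linear ones.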

Normal array operations like getindex and setindex! work on SerializedElementArrays, but note that they involve reading from and writing to disk, so they will be much slower than the same operations on an Array. Keep this in mind when using a SerializedElementArray, and organize your code to minimize the number of individual element accesses.
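The cost of element access can be seen with a small stdlib-only sketch that counts disk reads. The helper load_counted and the two weighted_sum_* functions are hypothetical and do not use the package; the point is only that reading an element once and reusing the in-memory copy turns n disk reads into one.

```julia
using Serialization

# Hypothetical helper (not part of SerializedElementArrays): deserialize one
# element from disk and count how many reads have happened so far.
const READS = Ref(0)
function load_counted(path)
    READS[] += 1
    return deserialize(path)
end

# Slow pattern: the element is read from disk on every loop iteration.
function weighted_sum_repeated(path, n)
    total = 0.0
    for k in 1:n
        total += k * sum(load_counted(path))
    end
    return total
end

# Faster pattern: read the element once, then reuse the in-memory copy.
function weighted_sum_hoisted(path, n)
    m = load_counted(path)
    total = 0.0
    for k in 1:n
        total += k * sum(m)
    end
    return total
end

path = joinpath(mktempdir(), "element.jls")
serialize(path, ones(100, 100))

READS[] = 0
s1 = weighted_sum_repeated(path, 10)
reads_repeated = READS[]

READS[] = 0
s2 = weighted_sum_hoisted(path, 10)
reads_hoisted = READS[]

@show s1 == s2 reads_repeated reads_hoisted
```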

disk also accepts Arrays with undefined elements, which gives an array stored on disk whose elements are likewise undefined:

using SerializedElementArrays: disk, pathname

a = Array{Matrix{Float64}}(undef, 2, 3)  # elements are unassigned
d = disk(a)
@show isassigned(a, 1, 2)
@show isassigned(d, 1, 2)
@show readdir(pathname(d))  # no element files have been written yet
x = randn(5, 5)
d[1, 2] = x  # serializes x to disk
@show x == d[1, 2]
@show readdir(pathname(d))  # now lists the file for d[1, 2]

When disk is given an Array with undefined elements, no files are created up front; elements can still be set, and each one is written to disk when it is set.
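The lazy behavior can be sketched without the package by storing only a directory and a shape, creating a file only when an element is set, and treating "assigned" as "the file exists". All names here (LazyDiskStore, new_lazy_store, filepath, isassigned_on_disk, store!, load_element) and the file naming scheme are hypothetical, and SerializedElementArrays may track assignment differently.

```julia
using Serialization

# Hypothetical sketch, not the package's code: an "undefined" disk-backed
# array is only a directory plus a shape; no element files exist yet.
struct LazyDiskStore
    dir::String
    dims::Tuple{Vararg{Int}}
end

new_lazy_store(dims::Int...) = LazyDiskStore(mktempdir(), dims)

# Assumed naming scheme: one file per element, named by its linear index.
function filepath(s::LazyDiskStore, I::Int...)
    i = LinearIndices(s.dims)[I...]  # throws BoundsError for bad indices
    return joinpath(s.dir, string(i, ".jls"))
end

# An element counts as assigned exactly when its file exists.
isassigned_on_disk(s::LazyDiskStore, I::Int...) = isfile(filepath(s, I...))

# Setting an element is what creates its file.
store!(s::LazyDiskStore, x, I::Int...) = serialize(filepath(s, I...), x)

load_element(s::LazyDiskStore, I::Int...) = deserialize(filepath(s, I...))

s = new_lazy_store(2, 3)
@show readdir(s.dir)               # String[]: nothing written yet
@show isassigned_on_disk(s, 1, 2)  # false
x = [1.0 2.0; 3.0 4.0]
store!(s, x, 1, 2)
@show isassigned_on_disk(s, 1, 2)  # true
@show load_element(s, 1, 2) == x   # true
@show readdir(s.dir)               # ["3.jls"]
```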

Internally, files are written to a path in the system's temporary directory created by tempname(). In Julia 1.4 and later, the files are cleaned up once the Julia process finishes (see the Julia documentation for tempname). You can use disk(a; cleanup=false) to keep the files after the process ends.

However, because serialization is used (via the standard library module Serialization), files written by one version of Julia, or by a Julia instance with a different system image, are not in general guaranteed to be readable or writable by another. For more stable reading and writing across different versions of Julia, we recommend packages like HDF5, JLD, or JLD2.

The aim of this package is to make it easier to perform calculations with collections of very large objects that collectively might not fit in memory, that are read and written infrequently during the calculation, and that are not necessarily needed long term after the calculation finishes.
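The tempname-plus-Serialization pattern can be tried directly with Base and the Serialization standard library. In this sketch the file name 1.jls and the stored value are arbitrary; it passes cleanup=false to tempname (analogous to disk(a; cleanup=false)) and then removes the directory by hand. The round trip happens within one Julia session, which is the case Serialization is designed for.

```julia
using Serialization

# tempname() only returns a fresh path; nothing is created until we write.
# With the default cleanup=true (Julia 1.4+), the path is deleted when the
# process exits; cleanup=false keeps it, so we remove it manually below.
dir = tempname(; cleanup=false)
mkpath(dir)

x = Dict("a" => [1, 2, 3])
file = joinpath(dir, "1.jls")
serialize(file, x)

# Same-session round trip. Reading this file from a different Julia version
# or system image is not guaranteed to work.
y = deserialize(file)
@show y == x  # true

rm(dir; recursive=true)  # manual cleanup, since cleanup=false
```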

Future plans

  • Automate caching of recently accessed elements to speed up repeated access of the same elements. This could use something like LRUCache.jl.
  • Make a dictionary interface through a type SerializedElementDict. A design question is whether the file structure should be "nested" or "shallow": when saving nested dictionaries, should the inner dictionaries themselves be serialized and saved to files, or should the individual elements of the nested dictionaries be saved to files?
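As a rough illustration of the caching idea in the first item, here is a hypothetical least-recently-used cache in front of deserialize, written with a Dict and a Vector instead of LRUCache.jl so that it needs only the standard library. CachedReader and read_element! are made-up names, not a planned API.

```julia
using Serialization

# Hypothetical LRU cache (not LRUCache.jl, not a package API): keep the
# `capacity` most recently read elements in memory.
mutable struct CachedReader
    capacity::Int
    cache::Dict{String,Any}
    order::Vector{String}  # least recently used first
    disk_reads::Int
end

function CachedReader(capacity::Int)
    capacity >= 1 || throw(ArgumentError("capacity must be at least 1"))
    return CachedReader(capacity, Dict{String,Any}(), String[], 0)
end

function read_element!(r::CachedReader, path::String)
    if haskey(r.cache, path)
        # Cache hit: move `path` to the most recently used position.
        deleteat!(r.order, findfirst(==(path), r.order))
        push!(r.order, path)
        return r.cache[path]
    end
    # Cache miss: read from disk, evicting the least recently used entry if full.
    x = deserialize(path)
    r.disk_reads += 1
    if length(r.order) >= r.capacity
        delete!(r.cache, popfirst!(r.order))
    end
    r.cache[path] = x
    push!(r.order, path)
    return x
end

dir = mktempdir()
paths = [joinpath(dir, string(i, ".jls")) for i in 1:3]
for (i, p) in enumerate(paths)
    serialize(p, fill(i, 2, 2))
end

r = CachedReader(2)
read_element!(r, paths[1])  # miss
read_element!(r, paths[1])  # hit
read_element!(r, paths[2])  # miss
read_element!(r, paths[3])  # miss, evicts paths[1]
read_element!(r, paths[1])  # miss again, evicts paths[2]
@show r.disk_reads          # 4
```

Moving an entry to the end of order on every hit costs time proportional to the capacity, which is fine for a handful of large elements; a dedicated package such as LRUCache.jl is the better fit for bigger caches.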

Related packages: