Julia Dict and Set data structures safely persisted to disk.
All collections are backed by LMDB, a very fast B-Tree based embedded key-value database with ACID guarantees. As with other B-Tree based databases, reads are faster than writes. However, write performance is still decent (expect 1k-10k TPS).
Care was taken to make the data structures thread-safe. LMDB handles most of the locking well. We only have to lock the LMDB.Environment exclusively when writing, to prevent multiple threads from opening multiple write transactions at once (which would deadlock).
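The locking idea can be illustrated with a minimal, self-contained sketch. It does not use the package's internals: `GuardedStore` and `locked_write!` are hypothetical names, and a plain `Dict` stands in for the LMDB environment. It only shows the general pattern of serializing writers behind one exclusive lock.

```julia
# Hypothetical stand-in for an environment guarded by a write lock.
# Neither GuardedStore nor locked_write! is part of PersistentCollections.
struct GuardedStore
    lock::ReentrantLock
    data::Dict{String,Int}
end

GuardedStore() = GuardedStore(ReentrantLock(), Dict{String,Int}())

# Perform a write while holding the exclusive lock,
# so that only one task can be writing at any time.
function locked_write!(store::GuardedStore, key::String, value::Int)
    lock(store.lock) do
        store.data[key] = value
    end
    return store
end

store = GuardedStore()

# Spawn many concurrent writers; the lock serializes them.
@sync for i in 1:100
    Threads.@spawn locked_write!(store, "key$i", i)
end

@assert length(store.data) == 100
```

Reads are left unguarded in this sketch, mirroring the statement above that only writes need the exclusive lock.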
- Install this package:
    import Pkg
    Pkg.add(url="https://github.com/blenessy/PersistentCollections.jl.git")
- Create an LMDB.Environment in a directory called data (in your current working directory):
    using PersistentCollections
    env = LMDB.Environment("data")
- Create an AbstractDict in your LMDB environment:
    dict = PersistentDict{String,String}(env)
- Use it like any other dict:
    dict["foo"] = "bar"
    @assert dict["foo"] == "bar"
    @assert collect(keys(dict)) == ["foo"]
    @assert collect(values(dict)) == ["bar"]
- (Optional) Note the asymmetric performance characteristics of an LMDB (B-Tree) based database:
    @time dict["bar"] = "baz";  # Writes to LMDB (B-Tree) are relatively slow
    @time dict["bar"];          # Reads are very fast though :)
It is possible to create a persistent collection of Any type, although some methods will not be able to convert the value to the correct type because no type metadata is stored in the DB.
Most notably, the getindex method (e.g. dict["foo"]) will not return a converted value. To mitigate this limitation, use the get method, which takes a default value.
The type of the default value (if other than nothing) is used to convert the stored value to the desired type.
env = LMDB.Environment("data")
dict = PersistentDict{Any,Any}(env)
dict["foo"] = "bar"
dict["foo"] # PersistentCollections.LMDB.MDBValue{Nothing}(0x0000000000000003, Ptr{Nothing} @0x000000012c806ffd, nothing)
get(dict, "foo", "") # "bar"
convert(String, dict["foo"]) # "bar"
Several persistent collections can share one LMDB environment, which is useful if you need transactional consistency between them:
- Create your LMDB.Environment with "named database" support by specifying the number of persistent collections you want with the maxdbs keyword argument:
    env = LMDB.Environment("data", maxdbs=2)
- Instantiate your persistent collections with an id that is unique within the LMDB environment:
    dict1 = PersistentDict{String,String}(env, id="mydict1")
    dict2 = PersistentDict{String,Int}(env, id="mydict2")
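You can expect a significant increase in write throughput if you are willing to risk losing your last written transactions. Database integrity is not in danger here (there is no risk of corruption), provided the filesystem preserves write order and MDB_WRITEMAP is not used; a system crash may only undo the final transactions.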
unsafe_env = LMDB.Environment("data", flags=LMDB.MDB_NOSYNC)
unsafe_dict = PersistentDict{String,String}(unsafe_env)
flush(unsafe_env) do
    unsafe_dict["foo"] = "bar"
    unsafe_dict["foo"] = "baz"
end # <== data is flushed to disk here
This is equivalent to:
unsafe_env = LMDB.Environment("data", flags=LMDB.MDB_NOSYNC)
unsafe_dict = PersistentDict{String,String}(unsafe_env)
try
    unsafe_dict["foo"] = "bar"
    unsafe_dict["foo"] = "baz"
finally
    flush(unsafe_env)
end
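The equivalence between the do-block form and the try/finally form can be sketched with a small stand-alone example. It is an illustration only, not the package's implementation: `FakeEnv` and `with_flush` are hypothetical names, and incrementing a counter stands in for syncing to disk.

```julia
# Hypothetical stand-in for an environment; counts how often it was flushed.
mutable struct FakeEnv
    flushes::Int
end

# Illustrative helper (not the package's API): run f, then always flush.
function with_flush(f::Function, env::FakeEnv)
    try
        return f()
    finally
        env.flushes += 1   # stands in for syncing data to disk
    end
end

fake_env = FakeEnv(0)

# do-block syntax passes the block as the first argument f.
result = with_flush(fake_env) do
    "written"
end
@assert result == "written"
@assert fake_env.flushes == 1

# The flush still happens when the block throws.
try
    with_flush(fake_env) do
        error("write failed")
    end
catch
end
@assert fake_env.flushes == 2
```

The point of the finally clause is that the flush runs whether or not the writes succeed, which matches the try/finally version shown above.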
make test
make coverage
make bench
- Travis CI integration
- Coveralls integration (when public)
- All platforms supported
- Part of Julia Registry
- Optimised implementation
- Thread Safe
- MDB_NOSYNC support
- Named database support
- Manual flush (sync) to disk
- Implemented
- Thread Safe
- MDB_NOSYNC support
- Named database support
- Manual flush (sync) to disk
Lots of LMDB wrapping magic was pinched from wildart/LMDB.jl, whose author deserves a lot of credit.