FunctionalData.jl

FunctionalData

FunctionalData is a package for fast and expressive data modification.

Built around a simple memory layout convention, it provides a small set of general purpose functional constructs as well as routines for efficient computation with dense numerical arrays.

Optionally, it supplies a syntax for clean, concise code:

wordcount(filename) = @p read filename String | lines | map split | flatten | length

Memory Layout

Indexing is simplified for dense n-dimensional arrays, which are viewed as collections of (n-1)-dimensional items.

For example, this allows to use the exact same code for 2D patches and 3D blocks:

a = [1 2 3; 4 5 6]
b = ones(2, 2, 10)          #  10 2D patches
c = ones(2, 2, 2, 10)       #  10 3D blocks

len(a)       =>   3
len(b)       =>  10
len(c)       =>  10

at(a,2)      =>  [2 5]'
part(a,2:3)  =>  [2 3; 5 6]

normsum(x) = x/sum(x)

map(b, normsum)   =>  [0.25 ...  ] of size 2 x 2 x 10
map(c, normsum)   =>  [0.125 ... ] of size 2 x 2 x 2 x 10

#  Result shape may change:
map(b, sum)       =>  [4 ... ]     of size 1 x 10
map(c, sum)       =>  [8 ... ]     of size 1 x 10

Efficiency

Using a custom View type based on this memory layout assumption, the provided map operations can be considerably faster than built-ins. Given our data and desired operation:

a = rand(10, 1000000)   #  =>  80 MB

csum!(x) = for i = 2:length(x) x[i] += x[i-1] end
csumoncopy(x) = (for i = 2:length(x) x[i] += x[i-1] end; x)

we can use the following simple, general and efficient statement:

map!(a, csum!) 
#  elapsed time: 0.027491752 seconds (256 bytes allocated)

Built-in alternatives are either slower or require manual inlining, for a specific data layout:

mapslices(csumoncopy, a, [1])
#  elapsed time: 0.85726391 seconds (404 MB allocated, 5.34% gc time)

f(a) = for i = 1:size(a,2)  a[:,i] = csumoncopy(a[:,i])  end
#  elapsed time: 0.110978216 seconds (144 MB allocated, 3.86% gc time)

f2(a) = for i = 1:size(a,2)  csum!(sub(a,:,i))  end
#  elapsed time: 0.071394038 seconds (160 MB allocated, 16.46% gc time)

function f3(a)
    for n = 1:size(a,2)
        for m = 2:size(a,1)  a[m,n] += a[m-1,n]  end
    end
end
#  elapsed time: 0.017072235 seconds (80 bytes allocated)

function f4(a)
    for n = 1:size(a,1):length(a)
        for m = 1:size(a,1)-1  a[n+m] += a[n+m-1]  end
    end
end
#  elapsed time: 0.013347679 seconds (80 bytes allocated)

With the exact same syntax we can easily parallelize our code using the local workers via shared memory or Julia's inter-process serialization, both on the local host or all machines:

shmap!(a, csum!)      # local processes, shared memory
lmap!(a, csum!)       # local processes
pmap!(a, csum!)       # all available processes

For each of these variants there are optimized functions available for in-place operation on the input array, in-place operation on a new output array, or fallback options for functions which do not work in-place. For details, see the section on map and Friends.

News

0.0.9

version requirement for 0.4 build
map and mapmap for Dict
fix typed

0.0.7 / 0.0.8

fixed repeat for numeric arrays
made test_equal more robust
reworked map and view for Array{T,1} / scalar return values
fix partsoflen, concat
add takelast(a), unequal, sortpermrev, filter
fix map for Dict

0.0.6

added localworkers and hostpids
added hmap and variants, which map tasks to the first pid of each machine
removed makeliteral, as the built-in repr does the same
sped up matrix
added map2, map3, map4, map5
fixed unzip
added flip, flipdims
added extract, removed @getfield

Documentation

Please see the overview below for one-line descriptions of each function. More details and examples can then be found in the following sections (work in progress)

Overview

Length and Size [details]

len(a)                              # length
siz(a)                              # lsize, ndims x 1
siz3(a)                             # lsize, 3 x 1

Data Access [details]

at(a, i)                            # item i
setat!(a, i, value)                 # set item i to value
fst(a)                              # first item
snd(a)                              # second item
third(a)                            # third item
last(a)                             # last item
part(a, ind)                        # items at indices ind
trimmedpart(a, ind)                 # items at ind, no error if a is too short
take(a, n)                          # the first up to n elements
takelast(a,n=1)                     # the last up to elements
drop(a,n)                           # a, except for the first n elements
droplast(a,n=1)                     # a, except for the last n elements
partition(a, n)                     # partition into n parts
partsoflen(a, n)                    # partition into parts of length n
extract(a, field, default)          # get key x of dict or field x of composite type instance

Data Layout [details]

row(a)                              # reshape into row vector
col(a)                              # reshape into column vector
reshape(a, siz)                     # reshape into size in ndim x 1 vector siz
split(a, x or f)                    # split a where item == x or f(item) == true                         
concat(a...)                        # same as flatten([a...])
subtoind(sub, a)                    # transform ndims x npoints sub to linear ind for a
indtosub(ind, a)                    # transform linear ind to ndims x len(ind) sub for a
stack(a)                            # concat along the n + 1st dim of the items in a
flatten(a)                          # reduce the nestedness of a
unstack(a)                          # split the dense array a into array of items
riffle(a, x)                        # insert x between the items of a
matrix(a)                           # reshape items of a to column vectors
unmatrix(a, example)                # reshape the column vector items in a according to example
lines(a)                            # split the text a into array of lines
unlines(a)                          # concat a with newlines 
unzip(a)                            # unzip items
findsub(a)                          # return sub for the non-zero entries
randsample(a, n)                    # draw n items from a with repetition
randperm(a)                         # randomly permute order of items
flip(a)                             # reverse the order of items
flipdims(a,d1,d2)                   # flip dims d1 and d2

Pipeline Syntax [details]

r = @p f1 a b | f2 | f3 c           # pipeline macro, equals f3(f2(f1(a,b)),c)
r = @p f1 a | f2 b _ | f3 e         # equals f3(f2(b,f1(a)),c)

Efficient Views [details]

view(a,i)                           # lightweight view of item i of a
view(a,i,v)                         # lightweight view of item i of a, reusing v
next!(v)                            # make v point to the i + 1th item of a
trytoview(a,v)                      # for dense array, use view, otherwise part
trytoview(a,v,i)                    # for dense array, use view reusing v, otherwise part

Computing: map and Friends [details]

map(a, f)                           # apply f to each item
map!(a, f!)                         # apply f! to each item in-place
map!r(a, f)                         # apply f to each item, overwriting a                         
map2!(a, f, f!)                     # apply f to fst(a), f! to other items
map2!(a, r, f!)                     # apply f!(resultitem, item) to each item
shmap(a, f)                         # parallel map f to shared array a, accross procs(a)
shmap!(a, f!)                       # inplace shmap f!, overwriting a, accross procs(a)
shmap!r(a, f)                       # apply f to each item, overwriting a, accross procs(a)                         
shmap2!(a, f, f!)                   # apply f to fst(a), f! to other items, accross procs(a)
shmap2!(a, r, f!)                   # apply f!(resultitem, item), accross procs(a)
pmap(a, f)                          # parallel map of f accross all workers
lmap(a, f)                          # parallel map of f accross local workers
mapmap(a, f)                        # shorthand for map(a, x->map(x,f))
map2(a,b,f), map3, map4, map5       # map over a and b invoking f(x,y)
work(a, f)                          # apply f to each item, no result value
pwork, lwork, shwork, workwork      # like the corresponding map variants
any(a, f)                           # is any f(item) true
anyequal(a, x)                      # is any item == x
all(a, f)                           # are all f(item) true
allequal(a, x)                      # are all items == x
unequal(a,b)                        # shortcut for !isequal(a,b)
sort(a, f; kargs...)                # sort a accorting to f(item)
uniq(a[, f])                        # unique elements of a or uniq(a,map(a,f))
table(f, a...)                      # like [f(m,n) for m in a[1], n in a[2]], for any length of a
ptable, ltable                      # parallel table using all workers, local workes
tableany, ptableany, ltableany      # like table, but does not flatten result

Output [details]

showinfo
tee

I/O [details]

read
write
existsfile
mkdir 
filenames
filepaths
dirnames
dirpaths
readmat
writemat

Helpers [details]

zerossiz(s, typ)                    # zeros(s...), default typ is Float64
shzerossiz(s, typ)                  # shared zerossiz
shzeros([typ,] s...)                # shared zeros
onessiz(s, typ)                     # ones(s...), default typ is Float64
shonessiz(s, typ)                   # shared onessiz
shones([typ,] s...)                 # shared ones
randsiz(s, typ)                     # rand(s...), default typ is Float64
shrandnsiz(s, typ)                  # shared randsiz
shrand([typ,] s...)                 # shared rand
randnsiz(s, typ)                    # randn(s...), default typ is Float64
shrandnsiz(s, typ)                  # shared randnsiz
shrandn([typ,] s...)                # shared randn
zeroel(a)                           # zero(eltype(a))
oneel                               # one(eltype(a))
@dict a b c ...                     # Dict("a" => a, "b" => b, "c" => c, ...)
+
* 
repeat(a, n)                        # repeat a n times
nop()                               # no-op
id(a...)                            # returns a...
istrue(a or f)                      # is a or result of f true
isfalse(a or f)                     # !istrue
not                                 # alias for !
or                                  # alias for ||
and                                 # alias for &&
plus                                # alias for .+
minus                               # alias for .-
times                               # alias for .*
divby                               # alias for ./

Unit Tests [details]

@test_equal a b                     # test a and b for equality, show detailed info if not
@assert_equal a b                   # like test_equal, then throws error
@test_almostequal a b maxdiff       # like test_equal, but allows up to maxdiff difference