## FunctionalData.jl

Functional, efficient data manipulation framework
Popularity
26 Stars
Updated Last
2 Years Ago
Started In
January 2015

## FunctionalData

`FunctionalData` is a package for fast and expressive data modification.

Built around a simple memory layout convention, it provides a small set of general purpose functional constructs as well as routines for efficient computation with dense numerical arrays.

Optionally, it supplies a syntax for clean, concise code:

`wordcount(filename) = @p read filename String | lines | map split | flatten | length`

#### Memory Layout

Indexing is simplified for dense n-dimensional arrays, which are viewed as collections of (n-1)-dimensional items.

For example, this allows to use the exact same code for 2D patches and 3D blocks:

```a = [1 2 3; 4 5 6]
b = ones(2, 2, 10)          #  10 2D patches
c = ones(2, 2, 2, 10)       #  10 3D blocks

len(a)       =>   3
len(b)       =>  10
len(c)       =>  10

at(a,2)      =>  [2 5]'
part(a,2:3)  =>  [2 3; 5 6]

normsum(x) = x/sum(x)

map(b, normsum)   =>  [0.25 ...  ] of size 2 x 2 x 10
map(c, normsum)   =>  [0.125 ... ] of size 2 x 2 x 2 x 10

#  Result shape may change:
map(b, sum)       =>  [4 ... ]     of size 1 x 10
map(c, sum)       =>  [8 ... ]     of size 1 x 10```

#### Efficiency

Using a custom `View` type based on this memory layout assumption, the provided `map` operations can be considerably faster than built-ins. Given our data and desired operation:

```a = rand(10, 1000000)   #  =>  80 MB

csum!(x) = for i = 2:length(x) x[i] += x[i-1] end
csumoncopy(x) = (for i = 2:length(x) x[i] += x[i-1] end; x)```

we can use the following simple, general and efficient statement:

```map!(a, csum!)
#  elapsed time: 0.027491752 seconds (256 bytes allocated)```

Built-in alternatives are either slower or require manual inlining, for a specific data layout:

```mapslices(csumoncopy, a, )
#  elapsed time: 0.85726391 seconds (404 MB allocated, 5.34% gc time)

f(a) = for i = 1:size(a,2)  a[:,i] = csumoncopy(a[:,i])  end
#  elapsed time: 0.110978216 seconds (144 MB allocated, 3.86% gc time)

f2(a) = for i = 1:size(a,2)  csum!(sub(a,:,i))  end
#  elapsed time: 0.071394038 seconds (160 MB allocated, 16.46% gc time)

function f3(a)
for n = 1:size(a,2)
for m = 2:size(a,1)  a[m,n] += a[m-1,n]  end
end
end
#  elapsed time: 0.017072235 seconds (80 bytes allocated)

function f4(a)
for n = 1:size(a,1):length(a)
for m = 1:size(a,1)-1  a[n+m] += a[n+m-1]  end
end
end
#  elapsed time: 0.013347679 seconds (80 bytes allocated)```

With the exact same syntax we can easily parallelize our code using the local workers via shared memory or Julia's inter-process serialization, both on the local host or all machines:

```shmap!(a, csum!)      # local processes, shared memory
lmap!(a, csum!)       # local processes
pmap!(a, csum!)       # all available processes```

For each of these variants there are optimized functions available for in-place operation on the input array, in-place operation on a new output array, or fallback options for functions which do not work in-place. For details, see the section on map and Friends.

## News

#### 0.0.9

• version requirement for 0.4 build
• `map` and `mapmap` for `Dict`
• fix `typed`

#### 0.0.7 / 0.0.8

• fixed `repeat` for numeric arrays
• made `test_equal` more robust
• reworked `map` and `view` for `Array{T,1}` / scalar return values
• fix `partsoflen`, `concat`
• add `takelast(a)`, `unequal`, `sortpermrev`, `filter`
• fix `map` for `Dict`

#### 0.0.6

• added `localworkers` and `hostpids`
• added `hmap` and variants, which map tasks to the first pid of each machine
• removed `makeliteral`, as the built-in `repr` does the same
• sped up `matrix`
• added map2, map3, map4, map5
• fixed unzip

## Documentation

Please see the overview below for one-line descriptions of each function. More details and examples can then be found in the following sections (work in progress)

### Overview

###### Length and Size [details]
```len(a)                              # length
siz(a)                              # lsize, ndims x 1
siz3(a)                             # lsize, 3 x 1```
###### Data Access [details]
```at(a, i)                            # item i
setat!(a, i, value)                 # set item i to value
fst(a)                              # first item
snd(a)                              # second item
third(a)                            # third item
last(a)                             # last item
part(a, ind)                        # items at indices ind
trimmedpart(a, ind)                 # items at ind, no error if a is too short
take(a, n)                          # the first up to n elements
takelast(a,n=1)                     # the last up to elements
drop(a,n)                           # a, except for the first n elements
droplast(a,n=1)                     # a, except for the last n elements
partition(a, n)                     # partition into n parts
partsoflen(a, n)                    # partition into parts of length n
extract(a, field, default)          # get key x of dict or field x of composite type instance```
###### Data Layout [details]
```row(a)                              # reshape into row vector
col(a)                              # reshape into column vector
reshape(a, siz)                     # reshape into size in ndim x 1 vector siz
split(a, x or f)                    # split a where item == x or f(item) == true
concat(a...)                        # same as flatten([a...])
subtoind(sub, a)                    # transform ndims x npoints sub to linear ind for a
indtosub(ind, a)                    # transform linear ind to ndims x len(ind) sub for a
stack(a)                            # concat along the n + 1st dim of the items in a
flatten(a)                          # reduce the nestedness of a
unstack(a)                          # split the dense array a into array of items
riffle(a, x)                        # insert x between the items of a
matrix(a)                           # reshape items of a to column vectors
unmatrix(a, example)                # reshape the column vector items in a according to example
lines(a)                            # split the text a into array of lines
unlines(a)                          # concat a with newlines
unzip(a)                            # unzip items
findsub(a)                          # return sub for the non-zero entries
randsample(a, n)                    # draw n items from a with repetition
randperm(a)                         # randomly permute order of items
flip(a)                             # reverse the order of items
flipdims(a,d1,d2)                   # flip dims d1 and d2```
###### Pipeline Syntax [details]
```r = @p f1 a b | f2 | f3 c           # pipeline macro, equals f3(f2(f1(a,b)),c)
r = @p f1 a | f2 b _ | f3 e         # equals f3(f2(b,f1(a)),c)```
###### Efficient Views [details]
```view(a,i)                           # lightweight view of item i of a
view(a,i,v)                         # lightweight view of item i of a, reusing v
next!(v)                            # make v point to the i + 1th item of a
trytoview(a,v)                      # for dense array, use view, otherwise part
trytoview(a,v,i)                    # for dense array, use view reusing v, otherwise part```
###### Computing: map and Friends [details]
```map(a, f)                           # apply f to each item
map!(a, f!)                         # apply f! to each item in-place
map!r(a, f)                         # apply f to each item, overwriting a
map2!(a, f, f!)                     # apply f to fst(a), f! to other items
map2!(a, r, f!)                     # apply f!(resultitem, item) to each item
shmap(a, f)                         # parallel map f to shared array a, accross procs(a)
shmap!(a, f!)                       # inplace shmap f!, overwriting a, accross procs(a)
shmap!r(a, f)                       # apply f to each item, overwriting a, accross procs(a)
shmap2!(a, f, f!)                   # apply f to fst(a), f! to other items, accross procs(a)
shmap2!(a, r, f!)                   # apply f!(resultitem, item), accross procs(a)
pmap(a, f)                          # parallel map of f accross all workers
lmap(a, f)                          # parallel map of f accross local workers
mapmap(a, f)                        # shorthand for map(a, x->map(x,f))
map2(a,b,f), map3, map4, map5       # map over a and b invoking f(x,y)
work(a, f)                          # apply f to each item, no result value
pwork, lwork, shwork, workwork      # like the corresponding map variants
any(a, f)                           # is any f(item) true
anyequal(a, x)                      # is any item == x
all(a, f)                           # are all f(item) true
allequal(a, x)                      # are all items == x
unequal(a,b)                        # shortcut for !isequal(a,b)
sort(a, f; kargs...)                # sort a accorting to f(item)
uniq(a[, f])                        # unique elements of a or uniq(a,map(a,f))
table(f, a...)                      # like [f(m,n) for m in a, n in a], for any length of a
ptable, ltable                      # parallel table using all workers, local workes
tableany, ptableany, ltableany      # like table, but does not flatten result```
###### Output [details]
```showinfo
tee```
```read
write
existsfile
mkdir
filenames
filepaths
dirnames
dirpaths
writemat```
###### Helpers [details]
```zerossiz(s, typ)                    # zeros(s...), default typ is Float64
shzerossiz(s, typ)                  # shared zerossiz
shzeros([typ,] s...)                # shared zeros
onessiz(s, typ)                     # ones(s...), default typ is Float64
shonessiz(s, typ)                   # shared onessiz
shones([typ,] s...)                 # shared ones
randsiz(s, typ)                     # rand(s...), default typ is Float64
shrandnsiz(s, typ)                  # shared randsiz
shrand([typ,] s...)                 # shared rand
randnsiz(s, typ)                    # randn(s...), default typ is Float64
shrandnsiz(s, typ)                  # shared randnsiz
shrandn([typ,] s...)                # shared randn
zeroel(a)                           # zero(eltype(a))
oneel                               # one(eltype(a))
@dict a b c ...                     # Dict("a" => a, "b" => b, "c" => c, ...)
+
*
repeat(a, n)                        # repeat a n times
nop()                               # no-op
id(a...)                            # returns a...
istrue(a or f)                      # is a or result of f true
isfalse(a or f)                     # !istrue
not                                 # alias for !
or                                  # alias for ||
and                                 # alias for &&
plus                                # alias for .+
minus                               # alias for .-
times                               # alias for .*
divby                               # alias for ./```
###### Unit Tests [details]
```@test_equal a b                     # test a and b for equality, show detailed info if not
@assert_equal a b                   # like test_equal, then throws error
@test_almostequal a b maxdiff       # like test_equal, but allows up to maxdiff difference```