lambda: An R package for easier iteration coding
Package: lambda
Version: 1.0
Date: 2012-01-11
Author: eOnOs
Maintainer: contact me!
License: GPL-3
You can download the package HERE, then run the following command from R:
install.packages("lambda_1.0.tar.gz", type="source", repos=NULL)
1 Introduction
As you know, ’for’ loops are frowned upon in R. You should prefer the vectorized tools provided in the base package to run your computations. But sometimes you need to go through your data in a particular way that is hard to express as clear, understandable code.
At EONOS, we developed a package to handle specific loops over arrays (or array-like structures). You want to compute a column-wise standard deviation on a rolling 50-row window, every 5 rows? This package is for you.
This package will run a function on a bunch of subparts of a data-structure according to a specific pattern, with the possibility to remodel outputs into a convenient form.
1.1 Data structure
Arrays and matrices are convenient structures for storing data, but lists are very useful too! By default, the accepted data objects are arrays and lists of (lists of (...)) arrays. Let’s consider two simple data structures for the rest of this document:
- let y be a list of 2 matrices of dimension (10, 4), with the dimnames shown below:
[[1]]
col_1 col_2 col_3 col_4
row_1 1 11 21 31
row_2 2 12 22 32
row_3 3 13 23 33
row_4 4 14 24 34
row_5 5 15 25 35
row_6 6 16 26 36
row_7 7 17 27 37
row_8 8 18 28 38
row_9 9 19 29 39
row_10 10 20 30 40
[[2]]
col_1 col_2 col_3 col_4
row_1 -1 -11 -21 -31
row_2 -2 -12 -22 -32
row_3 -3 -13 -23 -33
row_4 -4 -14 -24 -34
row_5 -5 -15 -25 -35
row_6 -6 -16 -26 -36
row_7 -7 -17 -27 -37
row_8 -8 -18 -28 -38
row_9 -9 -19 -29 -39
row_10 -10 -20 -30 -40
- x will be a simple matrix, the first part of y (x = y[[1]]).
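For readers who want to follow along, here is a minimal base R sketch that rebuilds these two objects. It only assumes what the printed output shows: values 1 to 40 filled column by column, and a second matrix that is the negation of the first. The helper name dn is ours, not part of the package.

```r
# Row and column labels matching the printed matrices
dn <- list(paste0("row_", 1:10), paste0("col_", 1:4))

# x: a 10 x 4 matrix filled column by column with 1..40
x <- matrix(1:40, nrow = 10, ncol = 4, dimnames = dn)

# y: a list holding x and its negation
y <- list(x, -x)

print(y)
```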
1.2 The ’lambda’ function
The main function of this package is the function lambda. You call it this way:
lambda( FUN, DATA, WAY, ... )
Where FUN and DATA are obviously the function we want to apply and the data structure we have, and WAY describes how we want to run through the data structure (DATA). Think of the ’apply’ function, where you specify the subscripts the function is applied over. The philosophy is the same here: you can set WAY=1 or WAY=c(1,2), but you can also use more sophisticated arguments.
# note: ’lambda’, ’Idx’, ’win’ are part of the package.
# --- a colSums proxy ---
lambda( sum, x, 2 )
# --- a column-wise cumsum ---
# here we specify
# NULL for subscript 1 -> take all of them
# Idx for subscript 2 -> use expanding windows
# (i.e. all windows of the form [1:i])
lambda( sum, x, list(NULL,Idx) )
# --- a rolling colSums for 3-rows windows ---
# (i.e. all windows of the form [i:(i+2)])
lambda( sum, x, win(3) )
And a more complicated call on y, summing the pairs of numbers located at the same position in both matrices (the struct argument is described in section 3):
lambda( sum, y, c(2,3), struct=FALSE )
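To make "summing pairs placed at the same positions" concrete, here is a base R sketch of that computation. It does not call lambda, and the exact shape of lambda’s return value is not assumed; pair_sums is just an illustrative name.

```r
# Rebuild the example data (x is the first element of y)
dn <- list(paste0("row_", 1:10), paste0("col_", 1:4))
x <- matrix(1:40, nrow = 10, ncol = 4, dimnames = dn)
y <- list(x, -x)

# Add the matrices of the list element by element:
# position (i, j) holds y[[1]][i, j] + y[[2]][i, j]
pair_sums <- Reduce(`+`, y)

print(pair_sums)
```

With this particular y, every pair cancels out, because the second matrix is the negation of the first.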
The WAY argument can be a vector of integers (acting as in the apply function) or a list of iterators. An iterator is a function that interprets a data structure and a subscript (the position of the iterator in the list), and returns how to go through that dimension.
2 Iterators
When ’lambda’ is called on a data structure, you need to specify the way you want to go through it. Specifically, you need to provide one iterator per dimension. Calling a sum over a 3-dimensional array needs 3 iterators, passed as a list argument. Here are the descriptions of the available iterators:
2.1 Trivial (keyword: NULL)
The default iterator; it takes all the elements.
lambda( sum, x, list(NULL,NULL) )
returns the sum of all the elements of x.
2.2 Atomic (keyword: dx)
Considers each element separately.
lambda( sum, x, list(dx,NULL) )
result:
row_1 row_2 row_3 row_4 row_5 row_6 row_7 row_8 row_9 row_10
64 68 72 76 80 84 88 92 96 100
Because dx is associated with the first subscript and NULL (trivial) with the second, this call returns a rowSums.
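The same numbers can be cross-checked in base R. This sketch relies only on rowSums and apply, not on the package, and the variable names are illustrative.

```r
dn <- list(paste0("row_", 1:10), paste0("col_", 1:4))
x <- matrix(1:40, nrow = 10, ncol = 4, dimnames = dn)

# One value per row, all columns taken together
row_totals <- rowSums(x)

# The equivalent apply() call over subscript 1
row_totals_apply <- apply(x, 1, sum)

print(row_totals)
```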
2.3 Rolling Windows (keyword: win)
This iterator needs an argument, the size of the window. Here is a sample call:
lambda( sum, x, list(win(5), dx) )
result:
col_1 col_2 col_3 col_4
row_1 NA NA NA NA
row_2 NA NA NA NA
row_3 NA NA NA NA
row_4 NA NA NA NA
row_5 15 65 115 165
row_6 20 70 120 170
row_7 25 75 125 175
row_8 30 80 130 180
row_9 35 85 135 185
row_10 40 90 140 190
Here the sum is rolled over row windows of size 5, for each column (dx in the second slot).
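As a point of comparison, here is a hand-written base R sketch that reproduces the table above. The helper roll_sum is hypothetical (it is not part of the package) and assumes, as the output suggests, that each window ends at the current row and that incomplete windows yield NA.

```r
dn <- list(paste0("row_", 1:10), paste0("col_", 1:4))
x <- matrix(1:40, nrow = 10, ncol = 4, dimnames = dn)

# Sum of the k values ending at each position; NA while the window is incomplete
roll_sum <- function(v, k) {
  vapply(seq_along(v), function(i) {
    if (i < k) NA_real_ else as.numeric(sum(v[(i - k + 1):i]))
  }, numeric(1))
}

# Apply the rolling sum to each column separately
res <- apply(x, 2, roll_sum, k = 5)
rownames(res) <- rownames(x)

print(res)
```

This is the kind of boilerplate the win iterator is meant to replace.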
2.4 Expanding Window (keyword: Idx)
Each window takes every element from the first to the current one.
lambda( sum, x, list(Idx, dx) )
result:
col_1 col_2 col_3 col_4
row_1 1 11 21 31
row_2 3 23 43 63
row_3 6 36 66 96
row_4 10 50 90 130
row_5 15 65 115 165
row_6 21 81 141 201
row_7 28 98 168 238
row_8 36 116 196 276
row_9 45 135 225 315
row_10 55 155 255 355
This returns a cumulative sum for each column (like apply( x, 2, cumsum )).
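To see why expanding windows amount to a cumulative sum, the sketch below builds the windows [1:i] explicitly in base R and compares the result with apply( x, 2, cumsum ). It does not use the package; expanding and same are illustrative names.

```r
dn <- list(paste0("row_", 1:10), paste0("col_", 1:4))
x <- matrix(1:40, nrow = 10, ncol = 4, dimnames = dn)

# For each row i, sum rows 1..i of every column
expanding <- t(sapply(seq_len(nrow(x)), function(i) {
  colSums(x[1:i, , drop = FALSE])
}))
rownames(expanding) <- rownames(x)

# Compare with the column-wise cumulative sum
same <- all(expanding == apply(x, 2, cumsum))

print(expanding)
print(same)
```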
2.5 All paths (keyword: ap)
Applies your function to every possible window of the chosen dimension (here, every pair of start and end rows).
lambda( sum, x, list(ap, dx) )
result:
col_1 col_2 col_3 col_4
row_1 row_2 3 23 43 63
row_1 row_3 6 36 66 96
row_1 row_4 10 50 90 130
row_1 row_5 15 65 115 165
row_1 row_6 21 81 141 201
row_1 row_7 28 98 168 238
row_1 row_8 36 116 196 276
row_1 row_9 45 135 225 315
row_1 row_10 55 155 255 355
row_2 row_3 5 25 45 65
row_2 row_4 9 39 69 99
row_2 row_5 14 54 94 134
row_2 row_6 20 70 120 170
row_2 row_7 27 87 147 207
row_2 row_8 35 105 175 245
row_2 row_9 44 124 204 284
row_2 row_10 54 144 234 324
row_3 row_4 7 27 47 67
row_3 row_5 12 42 72 102
row_3 row_6 18 58 98 138
row_3 row_7 25 75 125 175
row_3 row_8 33 93 153 213
row_3 row_9 42 112 182 252
row_3 row_10 52 132 212 292
row_4 row_5 9 29 49 69
row_4 row_6 15 45 75 105
row_4 row_7 22 62 102 142
row_4 row_8 30 80 130 180
row_4 row_9 39 99 159 219
row_4 row_10 49 119 189 259
row_5 row_6 11 31 51 71
row_5 row_7 18 48 78 108
row_5 row_8 26 66 106 146
row_5 row_9 35 85 135 185
row_5 row_10 45 105 165 225
row_6 row_7 13 33 53 73
row_6 row_8 21 51 81 111
row_6 row_9 30 70 110 150
row_6 row_10 40 90 140 190
row_7 row_8 15 35 55 75
row_7 row_9 24 54 84 114
row_7 row_10 34 74 114 154
row_8 row_9 17 37 57 77
row_8 row_10 27 57 87 117
row_9 row_10 19 39 59 79
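The table above can be reproduced in base R by enumerating every (start, end) pair of rows with combn. This is only a sketch of the computation, not the package’s implementation, and the row labels are built by hand to mimic the printed output.

```r
dn <- list(paste0("row_", 1:10), paste0("col_", 1:4))
x <- matrix(1:40, nrow = 10, ncol = 4, dimnames = dn)

# Each column of `pairs` is one window: (start, end) with start < end
pairs <- combn(nrow(x), 2)

# Column sums of x over each window, one window per row of the result
res <- t(apply(pairs, 2, function(p) {
  colSums(x[p[1]:p[2], , drop = FALSE])
}))
rownames(res) <- paste(rownames(x)[pairs[1, ]], rownames(x)[pairs[2, ]])

print(res)
```

With 10 rows there are choose(10, 2) = 45 such windows, which matches the number of rows in the table.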
3 Miscellaneous
Iterator argument shortcuts
- If you don’t specify all the iterators, the trailing missing ones are replaced by NULL
- c(2, 3) passed as the WAY argument is equivalent to list(NULL, dx, dx), mimicking an ’apply’ call
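For reference, this is the ’apply’ behaviour being mimicked, shown in base R on a small 3-dimensional array. The array a is an arbitrary example introduced here, not data from this document.

```r
# A 2 x 3 x 4 array filled with 1..24
a <- array(1:24, dim = c(2, 3, 4))

# Keep subscripts 2 and 3 apart, and sum over everything in subscript 1
by_23 <- apply(a, c(2, 3), sum)

print(dim(by_23))
print(by_23)
```

Each cell of by_23 is the sum of the two values sharing the same positions on subscripts 2 and 3, which is the role list(NULL, dx, dx) plays in the shortcut above.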
Killing structure: struct
By default, iterators don’t change the structure of DATA: a list of matrices remains a list of matrices regardless of the WAY used to iterate over it. If you want to unlist/vectorize the intermediate data, just specify struct=FALSE when calling lambda.
lambda( sum, y, c(2,3) ) # KO
lambda( sum, y, c(2,3), struct=FALSE ) # all good!
The first expression isolates every element on subscripts 2 and 3, leaving each extracted piece structured as list[2 elmts] x matrix[1,1]. You can’t compute a sum on this structure unless you unlist it, which is what the struct parameter does.
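The underlying base R behaviour is easy to check by hand. The sketch below builds one such list of two 1 x 1 matrices manually (an assumption about what the extracted piece looks like, based on the description above) and shows that sum fails on the list but works once it is unlisted.

```r
dn <- list(paste0("row_", 1:10), paste0("col_", 1:4))
x <- matrix(1:40, nrow = 10, ncol = 4, dimnames = dn)
y <- list(x, -x)

# One extracted piece: the (1, 1) cell of each matrix, kept as 1 x 1 matrices
piece <- lapply(y, function(m) m[1, 1, drop = FALSE])

# sum() refuses a list argument
failed <- inherits(try(sum(piece), silent = TRUE), "try-error")

# Flattening the list first makes the sum possible
total <- sum(unlist(piece))

print(failed)
print(total)
```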
Understanding extractions
To dive deeper into the underlying process, just try the following expression on your data:
lambda( identity, DATA, WAY, simp=FALSE )
This call lets you visualize the intermediate data structures for a specific list of iterators.