Wednesday, January 18, 2012

R package lambda

lambda: An R package for easier iteration coding

Package: lambda
Version: 1.0
Date: 2012-01-11
Author: eOnOs
Maintainer: contact me!
License: GPL-3
 
You can download the package HERE, then run the following command from R:
install.packages("lambda_1.0.tar.gz", type="source", repos=NULL)

Introduction

As you know, ’for’ loops are frowned upon in R. You should prefer the vectorized tools provided in the base package to run your computations. But sometimes you need to go through your data in a particular way that doesn’t lead to clear and understandable code.
At EONOS, we developed a package to handle specific loops on arrays (or array-like structures). Do you want to compute a column-wise standard deviation on a rolling 50-row window, every 5 rows? This package is for you.
This package runs a function on a set of subparts of a data structure according to a specific pattern, with the option of reshaping the outputs into a convenient form.

 

1.1  Data structure

Arrays and matrices are convenient structures for storing data, but lists are very useful too! By default, the accepted data objects are arrays and lists of (lists of (...)) arrays. Let’s consider two simple data structures for the rest of this document:
  • let y be a list of two 10 x 4 matrices, with the dimnames shown below:

[[1]]        
       col_1 col_2 col_3 col_4 
row_1      1    11    21    31 
row_2      2    12    22    32 
row_3      3    13    23    33 
row_4      4    14    24    34 
row_5      5    15    25    35 
row_6      6    16    26    36 
row_7      7    17    27    37 
row_8      8    18    28    38 
row_9      9    19    29    39 
row_10    10    20    30    40

[[2]]        
       col_1 col_2 col_3 col_4 
row_1     -1   -11   -21   -31 
row_2     -2   -12   -22   -32 
row_3     -3   -13   -23   -33 
row_4     -4   -14   -24   -34 
row_5     -5   -15   -25   -35 
row_6     -6   -16   -26   -36 
row_7     -7   -17   -27   -37 
row_8     -8   -18   -28   -38 
row_9     -9   -19   -29   -39 
row_10   -10   -20   -30   -40

  • x is a simple matrix, the first element of y (x = y[[1]]).
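
For readers who want to follow along, both structures can be rebuilt in base R. This is a minimal sketch that reproduces the printed values and dimnames; the construction actually used by the author is not shown in the post.

```r
# Row and column labels matching the printout above
dn <- list(paste0("row_", 1:10), paste0("col_", 1:4))

# matrix() fills column by column, so col_1 holds 1..10, col_2 holds 11..20, ...
x <- matrix(1:40, nrow = 10, ncol = 4, dimnames = dn)

# y is a list holding x and its negation
y <- list(x, -x)
```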

 

1.2 The ’lambda’ function

The main function of this package is lambda. You call it this way:
lambda( FUN, DATA, WAY, ... )
FUN is the function to apply, DATA is the data structure, and WAY describes how to run through DATA. Think of the ’apply’ function, where you specify the subscripts the function is applied over. The philosophy is the same here: you can set WAY=1 or WAY=c(1,2), but you can also use more sophisticated arguments.
# note: ’lambda’, ’Idx’, ’win’ are part of the package.

# --- a colSums proxy ---
lambda( sum, x, 2 )

# --- a cumulative sum across columns ---
# here we specify
# NULL for subscript 1 -> take all of them
#  Idx for subscript 2 -> use expanding windows
#     (i.e. all windows of the form [1:i])
lambda( sum, x, list(NULL,Idx) )

# --- a rolling colSums for 3-row windows ---
#    (i.e. all windows of the form [i:(i+2)])
lambda( sum, x, win(3) )         
                                 
A more complicated call on y sums, position by position, the pairs of numbers taken from both matrices (struct=FALSE is needed for this call; see ’Killing structure: struct’ in section 3):
lambda( sum, y, c(2,3), struct=FALSE )
The WAY argument can be a vector of integers (acting like the ’apply’ function) or a list of iterators. An iterator is a function that interprets a data structure and a subscript (the place of the iterator in the list) and returns the way to go through that dimension.

 

2 Iterators

When ’lambda’ is called on a data structure, you need to specify the way you want to go through it. Specifically, you need to provide one iterator per dimension. Calling a sum over a 3-dimensional array needs 3 iterators, passed as a list argument. Here are the available iterators:
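
One way to picture an iterator is as a function that turns the length of a dimension into a list of index sets, one set per call of FUN. The helpers below are hypothetical stand-ins written in base R for illustration only; their names are assumptions and they are not the package's implementation.

```r
# Hypothetical helpers: each returns a list of index sets
# for a dimension of length n
all_at_once <- function(n) list(seq_len(n))             # one set with every index
one_by_one  <- function(n) as.list(seq_len(n))          # one set per index
expanding   <- function(n) lapply(seq_len(n), seq_len)  # 1, 1:2, 1:3, ...
rolling     <- function(n, size) {
  stopifnot(size >= 1, size <= n)
  lapply(seq_len(n - size + 1), function(i) i:(i + size - 1))
}

# Applying sum over each index set of a small vector
v <- c(1, 2, 3, 4)
cum_sums  <- sapply(expanding(length(v)), function(idx) sum(v[idx]))
pair_sums <- sapply(rolling(length(v), 2), function(idx) sum(v[idx]))
```

Here cum_sums holds the sums over 1, 1:2, 1:3 and 1:4, while pair_sums holds the sums of adjacent pairs.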

 

2.1 Trivial (keyword: NULL)

The default iterator; it takes all the elements.
lambda( sum, x, list(NULL,NULL) )
returns the sum of all the elements of x.

 

2.2 Atomic (keyword: dx)

Considers each element separately.
lambda( sum, x, list(dx,NULL) )
result:
row_1  row_2  row_3  row_4  row_5  row_6  row_7  row_8  row_9 row_10      
   64     68     72     76     80     84     88     92     96    100 
Because dx is associated with the first subscript and NULL (trivial) with the second, this call returns the row sums (like rowSums(x)).
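
As a cross-check in base R (the lambda package is not needed for this sketch), the same numbers come out of apply over the first subscript, or of rowSums:

```r
x <- matrix(1:40, nrow = 10, ncol = 4,
            dimnames = list(paste0("row_", 1:10), paste0("col_", 1:4)))

# One call of sum per row, all columns taken together
by_row <- apply(x, 1, sum)

# rowSums gives the same totals
stopifnot(all(by_row == rowSums(x)))
by_row
```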

 

2.3 Rolling Windows (keyword: win)

This iterator needs an argument, the size of the window. Here is a sample call:
lambda( sum, x, list(win(5), dx) )
result:
      col_1 col_2 col_3 col_4 
row_1    NA    NA    NA    NA 
row_2    NA    NA    NA    NA 
row_3    NA    NA    NA    NA 
row_4    NA    NA    NA    NA 
row_5    15    65   115   165 
row_6    20    70   120   170 
row_7    25    75   125   175 
row_8    30    80   130   180 
row_9    35    85   135   185 
row_10   40    90   140   190
The sum is computed over rolling row windows of size 5, separately for each column (dx in the second slot). The first four rows are NA because no complete window is available there.
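
For comparison, the same table can be rebuilt in base R. The helper below is a hypothetical sketch (its name and its NA padding are chosen to match the output above); it says nothing about how win is implemented in the package.

```r
x <- matrix(1:40, nrow = 10, ncol = 4,
            dimnames = list(paste0("row_", 1:10), paste0("col_", 1:4)))

# Hypothetical helper: trailing rolling column sums,
# NA until a full window of `size` rows exists
rolling_col_sums <- function(m, size) {
  stopifnot(size >= 1, size <= nrow(m))
  out <- matrix(NA_real_, nrow(m), ncol(m), dimnames = dimnames(m))
  ends <- size:nrow(m)  # last row of each complete window
  sums <- vapply(ends,
                 function(i) colSums(m[(i - size + 1):i, , drop = FALSE]),
                 numeric(ncol(m)))
  out[ends, ] <- t(sums)  # vapply returns one column per window
  out
}

rolling_col_sums(x, 5)
```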

 

2.4 Expanding Window (keyword: Idx)

Each window takes every element from the first to the current one.
lambda( sum, x, list(Idx, dx) )
result:
       col_1 col_2 col_3 col_4 
row_1      1    11    21    31 
row_2      3    23    43    63 
row_3      6    36    66    96 
row_4     10    50    90   130 
row_5     15    65   115   165 
row_6     21    81   141   201 
row_7     28    98   168   238 
row_8     36   116   196   276 
row_9     45   135   225   315 
row_10    55   155   255   355
This returns a cumulative sum for each column (like apply( x, 2, cumsum )).
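
To make the expanding windows explicit, here is a base-R sketch (not using the package) that sums rows 1 to i of every column and checks the result against the cumulative sum:

```r
x <- matrix(1:40, nrow = 10, ncol = 4,
            dimnames = list(paste0("row_", 1:10), paste0("col_", 1:4)))

# Column sums over rows 1..i, for i = 1..10; sapply yields one column per window
expanding <- t(sapply(seq_len(nrow(x)),
                      function(i) colSums(x[seq_len(i), , drop = FALSE])))
rownames(expanding) <- rownames(x)

# Same values as the cumulative sum of each column
stopifnot(all(expanding == apply(x, 2, cumsum)))
```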

 

2.5 All paths (keyword: ap)

Applies your function to every possible window of the chosen dimension. In the result below, each row is labelled with the first and last elements of its window.
lambda( sum, x, list(ap, dx) )
result:

             col_1 col_2 col_3 col_4 
row_1 row_2      3    23    43    63 
row_1 row_3      6    36    66    96 
row_1 row_4     10    50    90   130 
row_1 row_5     15    65   115   165 
row_1 row_6     21    81   141   201 
row_1 row_7     28    98   168   238 
row_1 row_8     36   116   196   276 
row_1 row_9     45   135   225   315 
row_1 row_10    55   155   255   355 
row_2 row_3      5    25    45    65 
row_2 row_4      9    39    69    99 
row_2 row_5     14    54    94   134 
row_2 row_6     20    70   120   170 
row_2 row_7     27    87   147   207 
row_2 row_8     35   105   175   245 
row_2 row_9     44   124   204   284 
row_2 row_10    54   144   234   324 
row_3 row_4      7    27    47    67 
row_3 row_5     12    42    72   102 
row_3 row_6     18    58    98   138 
row_3 row_7     25    75   125   175 
row_3 row_8     33    93   153   213 
row_3 row_9     42   112   182   252 
row_3 row_10    52   132   212   292 
row_4 row_5      9    29    49    69 
row_4 row_6     15    45    75   105 
row_4 row_7     22    62   102   142 
row_4 row_8     30    80   130   180 
row_4 row_9     39    99   159   219 
row_4 row_10    49   119   189   259 
row_5 row_6     11    31    51    71 
row_5 row_7     18    48    78   108 
row_5 row_8     26    66   106   146 
row_5 row_9     35    85   135   185 
row_5 row_10    45   105   165   225 
row_6 row_7     13    33    53    73 
row_6 row_8     21    51    81   111 
row_6 row_9     30    70   110   150 
row_6 row_10    40    90   140   190 
row_7 row_8     15    35    55    75 
row_7 row_9     24    54    84   114 
row_7 row_10    34    74   114   154 
row_8 row_9     17    37    57    77 
row_8 row_10    27    57    87   117 
row_9 row_10    19    39    59    79 
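
The 45 rows above correspond to every (first, last) pair of rows with first < last. A base-R sketch of the same enumeration, using combn to list the pairs (an illustration only, not the package's code):

```r
x <- matrix(1:40, nrow = 10, ncol = 4,
            dimnames = list(paste0("row_", 1:10), paste0("col_", 1:4)))

# Each column of `pairs` is a (first, last) pair of row indices, first < last
pairs <- combn(nrow(x), 2)

# Column sums over rows first..last for every pair; one result row per window
all_windows <- t(apply(pairs, 2,
                       function(p) colSums(x[p[1]:p[2], , drop = FALSE])))
rownames(all_windows) <- paste(rownames(x)[pairs[1, ]],
                               rownames(x)[pairs[2, ]])
```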

 

3 Miscellaneous

 

Iterator argument shortcuts

  • If you don’t specify all the iterators, the trailing missing ones are replaced by NULL
  • c(2, 3) passed as the WAY argument is equivalent to list(NULL, dx, dx), and mimics an ’apply’ call

 

Killing structure: struct

By default, iterators don’t change the structure of DATA: a list of matrices remains a list of matrices regardless of the WAY used to iterate over it. If you want to unlist/vectorize the intermediate data, just specify struct=FALSE when calling lambda.
lambda( sum, y, c(2,3) )               # fails
lambda( sum, y, c(2,3), struct=FALSE ) # all good!
The first expression isolates every element on subscripts 2 and 3, leaving the extracted data structured as list[2 elements] x matrix[1,1]. You can’t perform a sum on this structure unless you unlist it, which is what struct=FALSE does.
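
The failure can be reproduced in base R without the package. The sketch below builds by hand the kind of piece described above (a list of two 1 x 1 matrices, here for position [1, 1]); that lambda builds it exactly this way internally is an assumption.

```r
x <- matrix(1:40, nrow = 10, ncol = 4,
            dimnames = list(paste0("row_", 1:10), paste0("col_", 1:4)))
y <- list(x, -x)

# The piece extracted at position [1, 1]: a list of two 1 x 1 matrices
piece <- lapply(y, function(m) m[1, 1, drop = FALSE])

# sum() rejects a list ...
failed <- inherits(try(sum(piece), silent = TRUE), "try-error")

# ... but accepts the unlisted values: 1 + (-1)
total <- sum(unlist(piece))
```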

 

Understanding extractions

To dive deeper into the underlying process, just try the following expression on your data:
lambda( identity, DATA, WAY, simp=FALSE )
This call lets you visualize the intermediate data structures produced by a specific list of iterators.