Ankur Goel

On hunt of awesomeness!

Utility Functions and Documentation

I have been coding utility functions for matrices and vectors that we need, or will need, frequently in upcoming functionality.
One of them is add_constant, which prepends or appends a column of ones to a matrix if it doesn’t already have one.

#=Adds a column of constants.
#Prepends a column of ones to the matrix by default; appends it if the first argument is false.
#First checks whether a column of ones is already present;
#if so, the original matrix (self) is returned unchanged.
def add_constant(prepend = true)
  #for Matrix
  (0...column_size).each do |i|
    if self.column(i).map(&:to_f) == Object::Vector.elements(Array.new(row_size, 1.0))
      return self
    end
  end
  #append/prepend a column of one's
  vectors = (0...row_size).map do |r|
    if prepend
      [1.0].concat(self.row(r).to_a)
    else
      self.row(r).to_a.push(1.0)
    end
  end
  return Matrix.rows(vectors)
end

There are other such methods, like chain_dot, which carries out dot multiplication of matrices in a chain. It uses Ruby’s reduce to fold the given arguments (matrices) into their successive product.

#=Chain Product
#Class method
#Returns the chain product of the given matrices
#==Usage:
#Let `a` be 4 * 3 matrix, 
#Let `b` be 3 * 3 matrix, 
#Let `c` be 3 * 1 matrix,
#then `Matrix.chain_dot(a, b, c)`
#===*NOTE*: Send the matrices in multiplicative order with proper dimensions
def self.chain_dot(*args)
  #inspired by Statsmodels
  begin
    args.reduce { |x, y| x * y } #perform matrix multiplication in order
  rescue ExceptionForMatrix::ErrDimensionMismatch
    puts "ExceptionForMatrix: Please provide matrices with proper multiplicative dimensions"
  end
end
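
As a quick illustration of the reduce idea, here is a minimal sketch. The standalone helper chain_product is a hypothetical name used only for this example (it is not the gem's method), and the matrices are made-up values with compatible dimensions.

```ruby
require 'matrix'

# Hypothetical standalone helper, for illustration only.
# Multiplies the matrices left to right using reduce.
def chain_product(*matrices)
  matrices.reduce { |x, y| x * y }
end

a = Matrix[[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, 1]] # 4 x 3
b = Matrix.identity(3)                                 # 3 x 3
c = Matrix.column_vector([1, 1, 1])                    # 3 x 1

product = chain_product(a, b, c)                       # 4 x 1
puts product.row_size    # number of rows of the result
puts product.column_size # number of columns of the result
```

Because reduce folds from the left, the arguments have to be passed in multiplicative order, as the note in the method documentation says.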

Apart from adding this functionality, I have documented the whole of statsample-timeseries. I made sure to explain the role of each function, every input parameter, and the return type. In most cases, I also added usage examples.
Later, I will add usage examples to the methods that still lack them, and parameter details wherever they are still missing.

After this, with great help from Claudio and Ra’s pointers about the modular (namespace) hierarchical convention, we managed to make it more conventional. :)
Here are the final results:

module Statsample::TimeSeries #Module for all Timeseries related stuff
class Statsample::TimeSeries::Series < Statsample::Vector  # Class containing a timeseries object and general related methods

module Statsample::TimeSeries::Pacf #Pacf related methods
class Statsample::TimeSeries::Arima #Arima class, which is initialized by class method
class Statsample::TimeSeries::Arima::KalmanFilter # For Kalman Filter on ARIMA

We are now reading about the Kalman filter and continuing to code it. Hope it doesn’t stay tricky. :)

Till next time,
Cheers,
- Ankur Goel

Kalman and Cholesky Decomposition

Hi everyone!

First, this post is coming a bit later than usual; sorry for that. I was traveling to my hometown (Delhi) for an occasion and couldn’t do much in the last 3 days. I am thankful to Claudio for his support.

So, in this phase, as discussed, we continue to write estimation methods for ARMA/ARIMA. Good news: most of the methods seem to be in place. Even if we manage to complete at least one or two, we should be in a good position. Bad news: the estimation methods I am working on have a lot of prerequisites, both theoretical and technical. So I’m currently coding them as I go. This comes with a plus: these methods will be extremely valuable in many other analyses. ;)

Kalman Filter

So, we started by developing the Kalman filter. The Kalman filter is one of the crucial methods for fitting an ARIMA model. It is primarily characterized by 3 matrices:

  • T Matrix : It is the coefficient matrix for the state vector in the state equation.
  • R Matrix : It is the coefficient matrix for the state vector in the observation equation.
  • Z Matrix : It is the selector matrix.

Currently, these methods are available as class methods of the new class - KalmanFilter in ARIMA. It can be found here

An example snippet of the T matrix code:

  def self.T(r, k, p)
    arr = Matrix.zero(r)
    params_padded = Statsample::Vector.new(Array.new(r, 0), :scale)

    params_padded[0...p] = params[k...(p+k)]
    intermediate_matrix = (r-1).times.map { Array.new(r, 0) }
    #appending an array filled with padded values in beginning
    intermediate_matrix[0,0] = [params_padded]

    #now generating column matrix for that:
    arr = Matrix.columns(intermediate_matrix)
    arr_00 = arr[0,0]

    #identity matrix substitution in matrix except row[0] and column[0]
    r.times do |i|
      arr[i,i] = 1
    end
    arr[0,0] = arr_00
    arr
  end
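
Since the snippet above is still work in progress, here is a minimal standalone sketch of one way such a matrix could be built with Ruby's immutable Matrix. It rests on assumptions: the layout (zero-padded AR coefficients in the first column, a shifted identity block above the diagonal) follows the convention used in Statsmodels, the name transition_matrix is hypothetical, and params is passed in explicitly as a plain array.

```ruby
require 'matrix'

# Hypothetical helper, for illustration only; not the gem's method.
# params: array of coefficients, with the p AR coefficients starting at index k
# r:      dimension of the state vector
def transition_matrix(params, r, k, p)
  # AR coefficients, zero-padded to length r
  padded = Array.new(r, 0.0)
  padded[0...p] = params[k...(k + p)].map(&:to_f)

  Matrix.build(r, r) do |i, j|
    if j.zero?
      padded[i] # first column: padded AR coefficients
    elsif j == i + 1
      1.0       # identity block shifted one column to the right
    else
      0.0
    end
  end
end

t = transition_matrix([0.5, 0.3], 3, 0, 2)
puts t
```

Using Matrix.build avoids element assignment, which is not publicly available on Matrix in older Ruby versions.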

The complete coding of R matrix is still pending.

Cholesky Decomposition:

While venturing into another estimation method, I encountered the Cholesky decomposition of a matrix, and it took me by surprise. Cholesky decomposition factors a Hermitian (for real matrices, symmetric), positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose.

I implemented it as an extension of Matrix here. Since the matrix has to be symmetric before it can be decomposed, I also wrote an is_symmetric? method to check whether the matrix is symmetric. Though symmetric? is present in Ruby Matrix 1.9+, this was necessary to keep backward compatibility with Ruby 1.8.
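
To make the idea concrete, here is a minimal sketch of a Cholesky factorization for real matrices. It is not the gem's code: the names symmetric_matrix? and cholesky_lower are hypothetical, and it assumes a real, symmetric, positive-definite input.

```ruby
require 'matrix'

# Hypothetical helpers, for illustration only; not the gem's methods.

# True when the matrix is square and equal to its transpose.
def symmetric_matrix?(m)
  return false unless m.square?
  (0...m.row_size).all? { |i| (0...i).all? { |j| m[i, j] == m[j, i] } }
end

# Returns the lower triangular matrix L such that L * L.transpose == m.
def cholesky_lower(m)
  raise ArgumentError, 'matrix must be symmetric' unless symmetric_matrix?(m)
  n = m.row_size
  l = Array.new(n) { Array.new(n, 0.0) }
  n.times do |i|
    (0..i).each do |j|
      # dot product of the already computed parts of rows i and j
      sum = (0...j).reduce(0.0) { |acc, k| acc + l[i][k] * l[j][k] }
      if i == j
        diag = m[i, i] - sum
        raise ArgumentError, 'matrix must be positive definite' unless diag > 0
        l[i][j] = Math.sqrt(diag)
      else
        l[i][j] = (m[i, j] - sum) / l[j][j]
      end
    end
  end
  Matrix.rows(l)
end

m = Matrix[[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
lower = cholesky_lower(m)
puts lower
```

Multiplying lower by its transpose gives back the original matrix, which is a handy sanity check when testing an implementation like this.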

That’s pretty much it for now.

Continuing the work.

Cheers,
- Ankur Goel

Bio-statsample-timeseries Deliverables

Hi there!

This summer, I’m working on the new statsample-timeseries gem. It will act as an extension to the existing Statsample by Claudio Bustos. I aim to add support for timeseries and related functionality to it.

I am using MiniTest for unit testing and Cucumber for feature testing. In the initial period of the first phase, I aimed to complete as much of the testing as possible for the existing Statsample.

Another goal is to make statsample work on current and previous Ruby versions. We are taking care of that at travis-ci.org/AnkurGel/statsample-timeseries by maintaining support for Ruby 2.0.0, 1.9.3, 1.9.2, jruby-19mode and rbx-19mode.

Now, we have many basic and advanced functions in place for timeseries. We have enabled:

  • Autocorrelation
  • Partial autocorrelation with:
    • yule-walker
    • levinson-durbin
    • biased levinson-durbin
  • Autocovariance
  • Lag and mean of series
  • correlation (almost complete)
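
To give a feel for the first of these, here is a minimal sketch of sample autocovariance and autocorrelation on a plain Ruby array. The method names are hypothetical and not the gem's API, and the biased estimator (dividing by n) is an assumption made for this example.

```ruby
# Hypothetical helpers, for illustration only; not the statsample-timeseries API.

# Sample autocovariance at the given lag (biased estimator: divides by n).
def sample_autocovariance(series, lag)
  n = series.size
  mean = series.reduce(:+).to_f / n
  total = (lag...n).reduce(0.0) do |sum, t|
    sum + (series[t] - mean) * (series[t - lag] - mean)
  end
  total / n
end

# Sample autocorrelation: autocovariance scaled by its lag-0 value.
def sample_autocorrelation(series, lag)
  sample_autocovariance(series, lag) / sample_autocovariance(series, 0)
end

series = [1, 2, 3, 4, 5]
acf = (0..2).map { |lag| sample_autocorrelation(series, lag) }
puts acf.inspect
```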

Apart from these, we are also working on the ARIMA module (another goal of the project) and have implemented simulation of:

  • AR (Autoregressive) model
  • MA (Moving Average) model
  • ARMA model

For those simulations, the prerequisite was to already have the values of the parameters against which the simulation is generated. For a pure model, in this phase we aim to work on and complete most of the functions that estimate them. We have completed:

  • Yule-walker for AR modelling.
  • Levinson-Durbin for AR modelling
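
As an illustration of the second item, here is a minimal sketch of the Levinson-Durbin recursion, which solves the Yule-Walker equations for the AR coefficients from a list of autocovariances. The name levinson_durbin_sketch and its signature are hypothetical, not the gem's API, and the input is assumed to start at lag 0.

```ruby
# Hypothetical helper, for illustration only; not the gem's method.
# acov:  autocovariances [gamma_0, gamma_1, ..., gamma_order]
# order: AR order to fit
# Returns [coefficients, prediction_error_variance].
def levinson_durbin_sketch(acov, order)
  phi = []               # AR coefficients of the current order
  error = acov[0].to_f   # prediction error variance at order 0
  (1..order).each do |m|
    # part of gamma_m not explained by the order m-1 model
    acc = acov[m] - (0...phi.size).reduce(0.0) { |s, j| s + phi[j] * acov[m - 1 - j] }
    reflection = acc / error
    # update the earlier coefficients, then append the new one
    phi = phi.each_index.map { |j| phi[j] - reflection * phi[m - 2 - j] } + [reflection]
    error *= (1 - reflection**2)
  end
  [phi, error]
end

# Autocovariances of an AR(1) process with coefficient 0.5 (scaled so gamma_0 = 1)
coefficients, variance = levinson_durbin_sketch([1.0, 0.5, 0.25], 2)
puts coefficients.inspect
puts variance
```

For this input the second coefficient comes out as zero, as expected for data that really follows an AR(1) process.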

I will now start on other such modelling methods, like the Burg algorithm and IRLS. :)
All of this makes up the deliverables of the project. Since the estimation methods for these models pose a lot of theoretical and accuracy challenges, the pace of achieving them may not be too fast. :)

Documentation

One thing I liked in R and Statsmodels is the amount of documentation and the detail of the API for developers and users. I wish to have a similar amount of documentation for Statsample, so as to attract more Ruby developers and scientists.
Considering the amount of code present in Statsample and statsample-timeseries combined, devoting considerable quality time to RDoc documentation may be a good idea!

After

I have already expressed to Claudio my wish to continue contributing to the project after the timeline ends. We will continue to work on it, moving on to the next module once this one is near completion. :)