StatSample | Code Begins

Hi everyone!

This summer, I am working with the Ruby Science Foundation on the StatSample project. As you may have read in previous blog posts, StatSample is a powerful statistical library in Ruby. Unfortunately, development of this great utility has been on hold for the last 2 years. My project aims to revamp StatSample, primarily by enhancing its functionality for TimeSeries and Generalized Linear Models.

You can read more about my proposal here.

During the community bonding period, I studied a few topics my project is concerned with, primarily estimation methods for models like ARIMA. I looked at how ARIMA is implemented in other statistical applications such as R and StatsModels. StatsModels uses the Kalman filter for maximum likelihood estimation and also provides other routines, such as log-likelihood and conditional-sum-of-squares computations. The basic interface for ARIMA in StatsModels is as follows:

ARIMA class in StatsModels (source code of the class):

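from statsmodels.tsa.arima_model import ARIMA  # module path StatsModels used at the time
ARIMA(series, order, dates=None)
#series => list of timeseries values (the endog argument)
#order  => ARIMA order (p, d, q): p = autoregressive order, d = degree of differencing, q = moving-average order
#dates  => [optional] timeline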
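The d in the order is the number of times the series is differenced before the ARMA part of the model is fitted. As a minimal Ruby sketch of that step (the `difference` helper is hypothetical and is not part of StatSample or StatsModels):

```ruby
# Hypothetical helper (not StatSample API): apply first differencing d times,
# which is what the "d" in an ARIMA(p, d, q) order refers to.
def difference(series, d = 1)
  d.times do
    # each_cons(2) yields consecutive pairs [x_(t-1), x_t]
    series = series.each_cons(2).map { |prev, curr| curr - prev }
  end
  series
end

p difference([1, 4, 9, 16, 25])     # => [3, 5, 7, 9]
p difference([1, 4, 9, 16, 25], 2)  # => [2, 2, 2]
```

Each pass shortens the series by one observation, so an ARIMA(p, d, q) fit effectively works on n - d values.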

The returned ARIMA object provides these methods:

fit(...) estimates the model parameters, primarily with three methods: exact maximum likelihood ('mle'), conditional sum of squares ('css'), and CSS followed by maximum likelihood ('css-mle').
predict(...) is a recursive function that returns a list of predicted values for the supplied series.
loglike_css(...) returns the log-likelihood based on the conditional sum of squares (CSS).
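To make the CSS idea concrete, here is a rough Ruby sketch, assuming a zero-mean ARMA(p, q) model, pre-sample errors set to zero, and the first p observations used only as conditioning values. The helpers `css_residuals`, `css`, `css_loglike`, and `ar_forecast` are hypothetical names for illustration, not StatsModels' or StatSample's code. The log-likelihood shown is the Gaussian one with the variance concentrated out (sigma² = CSS / n), and `ar_forecast` illustrates the recursive idea behind predict for a pure AR model: each forecast is fed back in as input for the next step.

```ruby
# Hypothetical sketch: residuals of a zero-mean ARMA(p, q) model computed
# recursively; pre-sample errors are 0 and the first p values only condition.
def css_residuals(x, ar, ma)
  p = ar.size
  errors = Array.new(x.size, 0.0)
  (p...x.size).each do |t|
    ar_part = ar.each_with_index.sum { |phi, i| phi * x[t - i - 1] }
    ma_part = ma.each_with_index.sum do |theta, j|
      k = t - j - 1
      k >= 0 ? theta * errors[k] : 0.0 # guard against Ruby's negative indexing
    end
    errors[t] = x[t] - ar_part - ma_part
  end
  errors.drop(p) # only residuals for t >= p enter the sum of squares
end

# Conditional sum of squares.
def css(x, ar, ma)
  css_residuals(x, ar, ma).sum { |e| e * e }
end

# Gaussian log-likelihood with sigma^2 estimated as CSS / n.
def css_loglike(x, ar, ma)
  e = css_residuals(x, ar, ma)
  n = e.size
  sigma2 = e.sum { |v| v * v }.to_f / n
  -n / 2.0 * (Math.log(2 * Math::PI) + Math.log(sigma2) + 1)
end

# Recursive multi-step forecast for a pure AR(p) model.
def ar_forecast(x, ar, steps)
  history = x.dup
  steps.times.map do
    next_value = ar.each_with_index.sum { |phi, i| phi * history[-i - 1] }
    history << next_value # feed the forecast back in for the next step
    next_value
  end
end

series = [1.0, 2.0, 3.0]
p css_residuals(series, [0.5], [])  # => [1.5, 2.0]
p css(series, [0.5], [])            # => 6.25
p ar_forecast([1.0, 2.0], [0.5], 3) # => [1.0, 0.5, 0.25]
```

Fitting by CSS would then mean searching for the `ar` and `ma` coefficients that minimize `css`; a library would hand that to a numerical optimizer rather than doing it by hand.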

The R Project also has substantial ARIMA support. I discussed it on the mailing list, and thanks to John's concerns, it made more sense to research StatsModels further rather than R. In StatSample, we should design the ARIMA module idiomatically, as StatsModels has done.
Besides this, I honestly didn't have much time to devote to the project during this period because of my semester examinations, which were ongoing at the time and which I had brought to my mentors' notice in advance.

Currently, I am working on repairing the tests and bringing uniformity to them. StatSample's tests are written primarily in MiniTest, with some making use of the shoulda DSL. Tests using the latter are breaking on my system with:

➜  test git:(master) ✗ ruby test_anovaoneway.rb
test_anovaoneway.rb:3:in `<class:StatsampleAnovaOneWayTestCase>': undefined method `context' for StatsampleAnovaOneWayTestCase:Class (NoMethodError)
  from test_anovaoneway.rb:2:in `<main>'
➜  test git:(master) ✗ ruby test_regression.rb 
test_regression.rb:4:in `<class:StatsampleRegressionTestCase>': undefined method `context' for StatsampleRegressionTestCase:Class (NoMethodError)
  from test_regression.rb:3:in `<main>'
➜  test git:(master) ✗

To fix this, I am correcting and re-testing the specs, as in this commit:

➜  statsample git:(master) ✗ ruby test/test_regression2.rb
Run options: --seed 40873

# Running tests:

..S......

Finished tests in 0.176938s, 50.8652 tests/s, 740.3708 assertions/s.

9 tests, 131 assertions, 0 failures, 0 errors, 1 skips
➜  statsample git:(master) ✗
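The `context` and `should` methods come from the shoulda DSL (the shoulda-context gem), so the NoMethodError above most likely means shoulda is not loaded into, or not compatible with, the installed MiniTest. One way to bring uniformity is to rewrite such blocks as plain MiniTest classes. The sketch below, assuming minitest 5 (`Minitest::Test`), shows that conversion on a hypothetical `one_way_anova_f` helper standing in for the library code under test; it is not the actual contents of StatSample's test files.

```ruby
require 'minitest/autorun'

# Hypothetical helper standing in for the library code under test:
# F statistic of a one-way ANOVA over an array of groups.
def one_way_anova_f(groups)
  all = groups.flatten
  grand_mean = all.sum.to_f / all.size
  means = groups.map { |g| g.sum.to_f / g.size }
  ss_between = groups.zip(means).sum { |g, m| g.size * (m - grand_mean)**2 }
  ss_within = groups.zip(means).sum { |g, m| g.sum { |v| (v - m)**2 } }
  df_between = groups.size - 1
  df_within = all.size - groups.size
  (ss_between / df_between) / (ss_within / df_within)
end

# Shoulda style (requires shoulda-context to be loaded):
#   context "one-way ANOVA" do
#     setup { @groups = [[1, 2, 3], [4, 5, 6]] }
#     should "compute the F statistic" do
#       assert_in_delta 13.5, one_way_anova_f(@groups), 1e-9
#     end
#   end
#
# Plain MiniTest equivalent: setup becomes a setup method and each
# should block becomes a test_* method.
class OneWayAnovaTest < Minitest::Test
  def setup
    @groups = [[1, 2, 3], [4, 5, 6]]
  end

  def test_f_statistic
    assert_in_delta 13.5, one_way_anova_f(@groups), 1e-9
  end
end
```

Because the plain version only depends on MiniTest itself, it runs the same way regardless of which shoulda version is installed.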

Hopefully, getting the codebase into good shape now will pay off as I delve further into coding TimeSeries.

Github: http://github.com/AnkurGel/statsample

Cheers!
-Ankur Goel