This summer, I am working with the Ruby Science Foundation on the StatSample project. As you may have read in previous blog posts, StatSample is a powerful statistical library in Ruby. Unfortunately, development of this great utility has been on hold for the last two years. My project aims to revamp StatSample, primarily by enhancing its functionality for TimeSeries and Generalized Linear Models.
You can read more about my proposal here.
During the community bonding period, I first studied a few topics that my project is concerned with, primarily time series models like ARIMA and how they are estimated. I looked at its implementation in other statistical packages such as R and StatsModels. StatsModels uses a Kalman filter for maximum likelihood estimation and also provides other computations such as log-likelihood and conditional-sum-of-squares.
In the basic StatsModels interface, constructing a model returns an ARIMA object that can be called with:
fit(...) estimates the model parameters, with primarily three methods: maximum-likelihood, conditional-sum-of-squares, and css-then-mle.
predict(...) is a recursive function that returns a list of predicted values for the supplied series.
loglike_css(...), where css stands for conditional-sum-of-squares, returns the aggregated css value.
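To make the conditional-sum-of-squares idea and the recursive prediction concrete, here is a minimal Ruby sketch for the simplest case, an AR(1) model x[t] = phi * x[t-1] + e[t]. The method names ar1_css, ar1_css_fit and ar1_forecast are hypothetical and exist in neither StatSample nor StatsModels; this only illustrates the technique and is not how either library implements it.

```ruby
# Assumption: a zero-mean AR(1) model, x[t] = phi * x[t-1] + e[t].
# All method names below are hypothetical, for illustration only.

# Conditional sum of squares: condition on the first observation and
# add up the squared one-step-ahead residuals for the rest of the series.
def ar1_css(series, phi)
  series.each_cons(2).sum do |prev, curr|
    residual = curr - phi * prev
    residual**2
  end
end

# For AR(1), the phi that minimises the CSS has a closed form:
# the least-squares slope of x[t] on x[t-1] (no intercept).
def ar1_css_fit(series)
  pairs = series.each_cons(2).to_a
  numerator = pairs.sum { |prev, curr| prev * curr }
  denominator = pairs.sum { |prev, _curr| prev**2 }
  numerator / denominator.to_f
end

# Recursive forecast: each prediction is fed back in as the input
# for the next step.
def ar1_forecast(last_value, phi, steps)
  return [] if steps <= 0

  next_value = phi * last_value
  [next_value] + ar1_forecast(next_value, phi, steps - 1)
end

series = [1.0, 0.5, 0.25, 0.125]
phi = ar1_css_fit(series)            # => 0.5
ar1_css(series, phi)                 # => 0.0
ar1_forecast(series.last, phi, 2)    # => [0.0625, 0.03125]
```

A full ARIMA model adds differencing and moving-average terms, for which the CSS no longer has a closed-form minimiser and has to be optimised numerically.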
The R Project also has substantial work on ARIMA. I discussed it on the mailing list. Thanks to John’s concerns, I concluded that researching StatsModels further was a better idea than researching R. In StatSample, we should build the ARIMA module as idiomatically as they have done in StatsModels.
Besides this, I honestly didn’t get much time to devote to the project during this period because of my then-ongoing semester examinations, which I had brought to my mentors’ notice at the outset.
Currently, I am working on repairing the tests and bringing uniformity to them. StatSample’s tests are written primarily in MiniTest, with some making use of the shoulda DSL. Tests using the latter are breaking on my system.
To address this, I am correcting and testing the specs, as in this commit.
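As a rough illustration of what bringing uniformity to the tests can look like, here is a minimal sketch of a shoulda-style test expressed in plain Minitest. It assumes the aim is to drop the shoulda dependency, which may not be what the commit above actually does; the mean helper and the MeanTest class are hypothetical stand-ins, not StatSample code.

```ruby
require "minitest/autorun"

# Hypothetical helper standing in for a library routine under test.
def mean(values)
  values.sum / values.size.to_f
end

# A shoulda-style test would read roughly like this:
#
#   context "a small sample" do
#     setup { @values = [1, 2, 3, 4] }
#     should "have the expected mean" do
#       assert_in_delta 2.5, mean(@values), 1e-9
#     end
#   end
#
# The same check in plain Minitest, with no extra DSL:
class MeanTest < Minitest::Test
  def setup
    @values = [1, 2, 3, 4]
  end

  def test_small_sample_has_expected_mean
    assert_in_delta 2.5, mean(@values), 1e-9
  end
end
```

Keeping every test in a single style means the suite depends on one framework only, so a breakage in an add-on DSL cannot take part of the suite down with it.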
Hopefully, getting the codebase into good shape will pay off as I delve further into coding with TimeSeries.