Ankur Goel

On hunt of awesomeness!

Statistics With Ruby - TimeSeries and GLM

Introduction:

Statsample is a basic and advance statistics suite on Ruby. It is majorly supported by many Ruby 1.8.7, 1.9.1, 1.9.2, 1.9.3 and 2.0.0.

Statsample is perfect library for anyone who is interested in exploring statistical aspects and even a little familiar(or interested) in Ruby.
It had very rich API; except that of TimeSeries and Generalized linear models, which was rather basic.

So, in this Google Summer of Code 2013 program, with SciRuby, I released two extensions - Statsample TimeSeries and Statsample GLM. These gems aim to take Statsample further and incorporate various functionalities and estimation techniques on continuous data.

Statsample TimeSeries

TimeSeries is equipped with wide amount of operations, which are directly available on Series object. Few of those functionalities are:

  • Autocorrelation of series
  • Partial autocorrelation with yule walker
  • Levinson Durbin estimation
  • AutoRegressive and Moving Average(ARIMA)
  • KalmanFiltering, Log Likelihood estimation
  • Moving averages, differences and autocovariances.

To get your hands dirty,

  • Install statsample with gem install statsample
  • Now, simply install timeseries gem by gem install statsample-timeseries.

Now, let’s make a simple TimeSeries object:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
require 'statsample-timeseries'
include Statsample::TimeSeries
#creates a randomized timeseries of 100 continuous elements
ts = 100.times.map { rand(100) }

#Gives autocorrelation of series
ts.acf

#Gives partial autocorrelation of series
ts.pacf
#Partial autocorrelation with 11 lags by maximum likelihood estimation
ts.pacf(11, 'mle')

#autoregressive coefficients:
ts.ar

#ARIMA(2, 1, 1)
k_obj = TimeSeries.arima(ts, 2, 1, 1)
k_obj.ar #Gives autoregressive coefficients
k_obj.ma #Gives moving average coefficients

You can go through the documentation and API, here

Statsample-GLM

Statsample GLM includes many helpful regression techniques, which can be used for regression analysis on data.
Some of those techniques are:

  • Poisson Regression
  • Logistic Regression
  • Exponential Regression
  • Iteratively Reweighted Least Squares

The toplevel module for Regression techniques is Statsample::Regression Using it as simple as ever,

  * First, install `statsample` by `gem install statsample`
  * Now, install GLM by `gem install `statsample-glm`

Let’s get quickly started:

1
2
3
4
5
6
7
8
9
10
11
require 'statsample-glm'
#Creating datasets:
x1 = Statsample::Vector.new([0.537322309644812,-0.717124209978434,-0.519166718891331,0.434970973986765,-0.761822002215759,1.51170030921189,0.883854199811195,-0.908689798854196,1.70331977539793,-0.246971150634099,-1.59077593922623,-0.721548040910253,0.467025703920194,-0.510132788447137,0.430106510266798,-0.144353683251536,-1.54943800728303,0.849307651309298,-0.640304240933579,1.31462478279425,-0.399783455165345,0.0453055645017902,-2.58212161987746,-1.16484414309359,-1.08829266466281,-0.243893919684792,-1.96655661929441,0.301335373291024,-0.665832694463588,-0.0120650855753837,1.5116066367604,0.557300353673344,1.12829931872045,0.234443748015922,-2.03486690662651,0.275544751380246,-0.231465849558696,-0.356880153225012,-0.57746647541923,1.35758352580655,1.23971669378224,-0.662466275100489,0.313263561921793,-1.08783223256362,1.41964722846899,1.29325100940785,0.72153880625103,0.440580131022748,0.0351917814720056, -0.142353224879252],:scale)
x2 = Statsample::Vector.new([-0.866655707911859,-0.367820249977585,0.361486610435,0.857332626245179,0.133438466268095,0.716104533073575,1.77206093023382,-0.10136697295802,-0.777086491435508,-0.204573554913706,0.963353531412233,-1.10103024900542,-0.404372761837392,-0.230226345183469,0.0363730246866971,-0.838265540390497,1.12543549657924,-0.57929175648001,-0.747060244805248,0.58946979365152,-0.531952663697324,1.53338594419818,0.521992029051441,1.41631763288724,0.611402316795129,-0.518355638373296,-0.515192557101107,-0.672697937866108,1.84347042325327,-0.21195540664804,-0.269869371631611,0.296155694010096,-2.18097898069634,-1.21314663927206,1.49193669881581,1.38969280369493,-0.400680808117106,-1.87282814976479,1.82394870451051,0.637864732838274,-0.141155946382493,0.0699950644281617,1.32568550595165,-0.412599258349398,0.14436832227506,-1.16507785388489,-2.16782049922428,0.24318371493798,0.258954871320764,-0.151966534521183],:scale)

y = Statsample::Vector.new([0,0,1,0,1,1,1,1,0,1,1,1,1,0,1,0,1,1,0,1,0,1,1,1,1,0,0,1,1,0,0,1,0,0,1,1,0,0,1,1,0,1,1,1,1,0,0,0,1,1],:scale)

x = Statsample::Dataset.new({"i"=>intercept,"x1"=>x1,"x2"=>x2})

obj = Statsample::Regression.glm(x, y, :binomial)
#=> Returns logistic regression object

The documentation and API details is available here

We have some more plans for GLM module. First in list is to make the algorithms work with SVD, because manual inversion of matrices is not cool for larger values in Poisson Regression.

I have blogged about most of the functionalities on my blog - www.ankurgoel.com.
Please explore and use the libraries; I will be waiting for your inputs, suggestions and questions. Feel free to leave your questions on Github issues.

Had an amazing summer!
Stay tuned and Enjoy. :)

- Ankur Goel