Ankur Goel

On hunt of awesomeness!

Cuke Up With Cucumber

Recently, I wrote few features in Cucumber. Cucumber is a powerful tool which enables us to write automated tests in functional descriptions. These descriptions are as easy to comprehend, as plain English. The purpose of this tool is to perfom BDD(Behavior-Driven-Development).

Consider this small snippet from my pacf feature:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Feature: PACF

  As a statistician
  So that I can quickly evaluate partial autocorrelation of a series
  I want to evaluate pacf

Background: a timeseries

  Given the following values in a timeseries:
    | timeseries |
    | 10  20  30  40  50  60  70  80  90  100 |
    | 110 120 130 140 150 160 170 180 190 200 |

Scenario: check pacf for 10 lags with unbiased
  When I provide 10 lags for pacf
  When I provide yw yule walker as method
  Then I should get Array as resultant output
  Then I should get 11 values in resultant pacf

Scenario: check pacf for 5 lags with mle
  When I provide 5 lags for pacf
  When I provide mle yule walker as method
  Then I should get Array as resultant output
  Then I should get 6 values in resultant pacf

Yes, these are tests! And they perform the operations as they say.

  • Feature denotes the feature this test will cover. It is followed by the description of the feature as:

    • As a statistician -> (use-case)
    • So that I can quickly evaluate pacf of a seires -> (purpose)
    • I want to evaluate pacf -> (expected result)
  • Given is analogous to before in RSpec. In context of Background, it denotes before all. That is, the forementioned time-series will be available in all scenarios furhter. This timeseries is resolved by Gherkin parser. This is further resolved after parsing by following definition:

1
2
3
4
5
6
7
Given /^the following values in a timeseries:$/ do |series|
  arr = []
  series.hashes.each do |sequence|
    arr += sequence['timeseries'].split(' ').map(&:to_i).to_ts
  end
  @timeseries = arr.to_ts
end
  • Scenarios cover the test cases with the combination of When, And, Then keywords. They are regular English sentences and combine to form a gramatically sound process. These sentences are then captured by regular-expressions written by programmer. For example;
Scenario’s When clause
1
  When I provide 10 lags for pacf
Converted DSL
1
2
3
When /^I provide (\d+) lags for p?acf$/ do |lags|
  @lags = lags.to_i
end

Above will capture the lags and the strings like:

  • When I provide 5 lags for pacf
  • When I provide 10 lags for acf

Result: Compliant for both acf and pacf. :)

You can check my features and step definitions here.

Cheers
Ankur Goel

Statsample - Partial Autocorrelation

The partial autocorrelation(pacf) of an ARMA process is the function defined by the equation:
f(0) = 1, f(x) = g(x)(x) correlation of series with itself. for x >= 1

The first component of every pacf series is 1.

I implemented pacf with yule-walker equations of unbiased and mle outcomes. Yule-walker equations are the set of equations represented by:
Courtesy: Wikipedia

Yule-walker uses the Toeplitz matrix(gives same output when stored in either row-major or column-major form) inverse with the outcomes to generate the intermediate vector results.

Here, we can generate pacf by making use of either unbiased and mle method with yule-walker function. For unbiased, the denominator is (n-k) whereas for mle, it is n (n is the size of time-series). To achieve that, I made use of fantastic Ruby lambdas to make a closure over the variable k as:

1
2
3
4
5
6
7
8
if method.downcase.eql? 'yw'
  #unbiased => denominator = (n - k)
  denom =->(k) { n - k }
else
  #mle
  #denominator => (n)
  denom =->(k) { n }
end

Below might have been a viable shortcut, but I used former for maintaining descriptive comments and simplicity in code:

1
denom =->(k) { method.downcase.eql?('yw') ? (n - k) : n }

Here is the useful description and theoretical implementation of yule-walker by University of Pennsylvania.

Henceforth, the overall yule-walker method looks like following:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
def yule_walker(series, k = 1, method='yw')
  #From the series, estimates AR(p)(autoregressive) parameter
  #using Yule-Waler equation. See -
  #http://en.wikipedia.org/wiki/Autoregressive_moving_average_model

  #parameters:
  #ts = series
  #k = order, default = 1
  #method = can be 'yw' or 'mle'. If 'yw' then it is unbiased, denominator
  #is (n - k)

  #returns:
  #rho => autoregressive coefficients
  ts = series #timeseries
  ts = ts - ts.mean
  n = ts.size
  if method.downcase.eql? 'yw'
    #unbiased => denominator = (n - k)
    denom =->(k) { n - k }
  else
    #mle
    #denominator => (n)
    denom =->(k) { n }
  end
  r = Array.new(k + 1) { 0.0 }
  r[0] = ts.map { |x| x ** 2 }.inject(:+).to_f / denom.call(0).to_f

  for l in (1..k)
    r[l] = (ts[0...-l].zip(ts[l...ts.size])).map do |x|
      x.inject(:*)
    end.inject(:+).to_f / denom.call(l).to_f
  end

  r_R = toeplitz(r[0...-1])

  mat = Matrix.columns(r_R).inverse()
  solve_matrix(mat, r[1..r.size])
end

toeplitz method generates the Toeplitz matrix, and solve_matrix solves the equation by using the inverse and matrix muliplication.

pacf is available in Statsample::TimeSeries and can be called as:

1
2
3
4
series = (1..20).map { |x| x * 10 }.to_ts
#Usage: pacf(lags, method), method = 'yw' => unbiased, 'mle' => mle
series.pacf(10, 'yw')
#=> results

The entire implementation can be seen at : https://github.com/AnkurGel/statsample/blob/master/lib/statsample/timeseries.rb#L151 with it’s tests at : https://github.com/AnkurGel/statsample/blob/master/test/test_pacf.rb

Cheers, /-Ankur Goel

Statsample - Wald Test

Hi everyone,
After completing and verifying the integrity of all tests and Ruby versions, I and Claudio started with implementation of Wald Test. He explained it pretty well and was very patient. :)

Wald test is used to test if a series of n acf or pacf indeces are equal to 0.
For acf, the distribution for a white noise sationary process are approximately independent and identically distributed normal random variables with mean 0 and variance n-1.

What that means is, if terms in an acf of a timeseries with k lags are squared and added (sum-of-squares), then that statistic is chi-square distributed over degree of freedom, directly dependent on the k number of lags.

I will demonstrate this with example:

1
2
3
4
5
6
7
8
9
10
11
12
#Create time series

require 'statsample'
include Statsample::TimeSeries
series = (1..30).map { rand(100) }.to_time_series

#find and stores it acf with specific lags
k = 10
series_acf = series.acf(k)

#find sum of squares for series_acf using powerful Ruby map and inject.
sum_of_sq = series_acf.map { |x| x ** 2 }.inject(:+)

So far, we have managed to find the sum of squares of a acf-series with k = 10 = number of lags.
Now, we will check whether or not it is less than quantile 0.95 of a chi-square with k degree of freedom.

For that, include Distribution as:

1
2
3
4
5
6
7
#continuing from last snippet

include Distribution
cdf = ChiSquare.cdf(sum_of_sq, k)

cdf >= 0.05
#=> True

This verifies the Wald test.

The tests can be found on Github repository at: https://github.com/AnkurGel/statsample/blob/master/test/test_wald2.rb

Cheers,
Ankur Goel