Ankur Goel

On hunt of awesomeness!

Examples With Statsample

TimeSeries

Statsample provides a time series module, Statsample::TimeSeries. Its TimeSeries class lets users operate on a sequence of data points indexed by time and ordered from earliest to latest; stock prices are a typical example. Suppose we have a time series such as:

timeseries = (1..10).map { rand 100 }.to_ts
#=> Time Series(type:scale, n:10)[62,91,92,71,86,99,80,64,15,94]

The returned TimeSeries object supports several interesting operations, such as:

Lag

timeseries.lag
#=> Vector(type:scale, n:10)[nil,62,91,92,71,86,99,80,64,15]
timeseries.lag(3)
# Lagging the series by three units places nil in the first three positions.
#=> Vector(type:scale, n:10)[nil,nil,nil,62,91,92,71,86,99,80]
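
To make the shifting explicit, here is a rough plain-Ruby sketch of the same idea on an ordinary Array. It does not use Statsample; lag_series is a hypothetical helper name I made up for illustration, not a library method.

```ruby
# Hypothetical helper (not part of Statsample): shift a series right by k
# positions, padding the front with nil so the length stays the same.
def lag_series(series, k = 1)
  raise ArgumentError, "k must be non-negative" if k < 0
  k = [k, series.size].min
  # Array.new(k) yields k nils; the last k observations fall off the end.
  Array.new(k) + series.first(series.size - k)
end

series = [62, 91, 92, 71, 86, 99, 80, 64, 15, 94]
p lag_series(series)     # one-step lag
p lag_series(series, 3)  # three-step lag
```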

Auto-Correlation

This is a frequently used statistical operation. In digital signal processing, the autocorrelation of a series is the cross-correlation of the signal with itself, without normalization. In statistics, however, the result is normalized.

timeseries.acf
# Returns the autocorrelation of the series.
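
For intuition, the normalized (statistical) autocorrelation can be sketched in plain Ruby as below. This is my own minimal version of the textbook estimator, not Statsample's implementation; the name autocorrelation and the max_lag parameter are assumptions made for this example.

```ruby
# Hypothetical helper (not Statsample's acf): sample autocorrelation for
# lags 0..max_lag, normalized by the lag-0 sum of squared deviations.
def autocorrelation(series, max_lag = 10)
  n = series.size
  raise ArgumentError, "series is empty" if n.zero?
  mean = series.sum.to_f / n
  centered = series.map { |x| x - mean }
  denom = centered.sum { |d| d * d }
  raise ArgumentError, "series is constant" if denom.zero?
  max_lag = [max_lag, n - 1].min
  (0..max_lag).map do |k|
    # Sum of products of each deviation with the deviation k steps earlier.
    numer = (k...n).sum { |t| centered[t] * centered[t - k] }
    numer / denom
  end
end

p autocorrelation([1, 2, 3, 4, 5], 2)
```

Dropping the division by denom gives the unnormalized form mentioned for signal processing.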

Diff

diff computes the first difference of the series, that is, the difference between the series and its first lag.

timeseries.diff
#=> Time Series(type:scale, n:10)[nil,29,1,-21,15,13,-19,-16,-49,79]
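
The same computation can be sketched on a plain Array, which shows where the leading nil comes from. first_difference is a hypothetical name used only for this illustration; it is not a Statsample method.

```ruby
# Hypothetical helper (not part of Statsample): subtract each observation's
# predecessor from it. The first position has no predecessor, so it is nil.
def first_difference(series)
  return [] if series.empty?
  [nil] + series.each_cons(2).map { |prev, curr| curr - prev }
end

series = [62, 91, 92, 71, 86, 99, 80, 64, 15, 94]
p first_difference(series)
```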

Exponential moving average

A moving average is a finite impulse response filter that produces a series of averages over subsets of the full data set, which helps in analyzing the given data points.
An exponential moving average (EMA) is similar to a moving average, but gives more weight to the most recent data.
In Statsample, EMA is available by calling ema on a time series. Example:

t_series = (1..100).map { rand }.to_ts
t_series.ema
t_series.ema(15, true)
# Uses 15 observations and sets the Welles Wilder coefficient to true.

ema takes two optional parameters: n (default: 10), the number of observations to consider, and the Welles Wilder coefficient (default: false), which selects a smoothing value of 2/(n + 1) when false and 1/n when true.
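
To show how the two smoothing values enter the calculation, here is a small plain-Ruby sketch of an EMA. It is not Statsample's code: exponential_moving_average is a hypothetical name, and seeding the recursion with the simple mean of the first n observations (with nil before that point) is my assumption; other seeding conventions exist.

```ruby
# Hypothetical helper (not Statsample's ema). Assumption: the recursion is
# seeded with the simple mean of the first n observations.
def exponential_moving_average(series, n = 10, wilder = false)
  raise ArgumentError, "n must be at least 1" if n < 1
  raise ArgumentError, "need at least n observations" if series.size < n
  # Smoothing value: 1/n with the Welles Wilder coefficient, else 2/(n + 1).
  alpha = wilder ? 1.0 / n : 2.0 / (n + 1)
  seed = series.first(n).sum.to_f / n
  result = Array.new(n - 1) + [seed]
  series.drop(n).each do |x|
    # Blend the new observation with the previous average.
    result << alpha * x + (1 - alpha) * result.last
  end
  result
end

p exponential_moving_average([1, 2, 3, 4, 5], 3)
#=> [nil, nil, 2.0, 3.0, 4.0]
```

With wilder set to true the smoothing value is smaller, so each new observation moves the average less.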

As can be seen, the TimeSeries module can become highly sophisticated with the inclusion of other methods such as ARMA estimation.

Simple Random Sampling

SRS is an unbiased technique for choosing a subset of individuals (a sample) from a larger set (the population). Each individual in the sample is selected entirely at random, with the same probability as every other individual. Various techniques for SRS are given here.
SRS is a module in Statsample that comprises sections for proportion estimation, confidence intervals, standard deviation, mean estimation, and so on.
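
As a rough illustration of sampling and proportion estimation, the sketch below draws a simple random sample without replacement using Ruby's Array#sample and estimates a proportion with its standard error. It does not call the Statsample SRS module; srs_proportion and its arguments are hypothetical names, and the standard error uses the usual finite population correction for sampling without replacement.

```ruby
# Hypothetical helper (not the Statsample::SRS API): draw a simple random
# sample without replacement and estimate the share of items matching a block.
def srs_proportion(population, sample_size, rng: Random.new, &predicate)
  unless sample_size.between?(2, population.size)
    raise ArgumentError, "sample_size must be between 2 and the population size"
  end
  # Array#sample picks distinct elements, each equally likely to be chosen.
  sample = population.sample(sample_size, random: rng)
  p_hat = sample.count(&predicate).to_f / sample_size
  # Finite population correction: shrinks the variance as n approaches N.
  fpc = 1.0 - sample_size.to_f / population.size
  std_error = Math.sqrt(fpc * p_hat * (1 - p_hat) / (sample_size - 1))
  { sample: sample, proportion: p_hat, std_error: std_error }
end

population = (1..100).to_a
estimate = srs_proportion(population, 20) { |x| x.even? }
puts "estimated proportion of even numbers: #{estimate[:proportion]}"
puts "standard error: #{estimate[:std_error]}"
```
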
I covered various tests of SRS methods here, as I explored and understood them. I am still writing a few more tests for this and other modules in Statsample.

I will update the post as soon as I write them. If anyone would like me to write about the detailed functionality of this module too, please comment. I will be delighted to do that.

Cheers,
-Ankur Goel