As discussed, I am implementing estimation methods for phi in AR modelling. We covered Yule-Walker earlier; I'll write a post about that. After its implementation, we go ahead with another estimation method - Levinson-Durbin.
Levinson-Durbin requires the time series to be demeaned (series = series - series.mean) and its autocovariance.
The autocovariance of a series at lag k is the summation of the product of the series with itself at lag k, that is, the summation of (x_i * x_{i+k}). It is also directly related to the acf of the series, as acf(k) = acvf(k) / acvf(0). Its code can now be found in Statsample::TimeSeries's acvf method.
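To make the relation concrete, here is a minimal sketch of the sample autocovariance on a demeaned series; the helper name sample_acvf and the choice of denominator (n rather than n - k) are assumptions for illustration, the library's own implementation is the acvf method mentioned above.

def sample_acvf(series, k)
  mean = series.inject(:+).to_f / series.size
  demeaned = series.map { |x| x - mean }
  n = demeaned.size
  # summation of (x_i * x_{i+k}) over the demeaned series
  (0...(n - k)).map { |i| demeaned[i] * demeaned[i + k] }.inject(:+) / n
end

# acf at lag k is then just the normalized autocovariance:
# acf_k = sample_acvf(series, k) / sample_acvf(series, 0)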
Now, with the help of the autocovariance series, our levinson_durbin function recursively computes the following parameters:
sigma_v : estimate of the error variance
arcoefs : AR phi values for the time series
pac : unbiased Levinson pacf estimate
sigma : sigma for AR.
L-D performs recursive matrix and vector multiplications to populate its Toeplitz matrix. Here is some code depicting those manipulations:
order = nlags
phi = Matrix.zero(nlags + 1)
sig = Array.new(nlags + 1)

# setting initial point for recursion:
phi[1, 1] = series[1] / series[0]
sig[1] = series[0] - phi[1, 1] * series[1]

2.upto(order).each do |k|
  phi[k, k] = (series[k] - (Statsample::Vector.new(phi[1...k, k - 1]) * series[1...k].reverse.to_ts).sum) / sig[k - 1]
  # some serious refinement needed in above for matrix manipulation. Will do today
  1.upto(k - 1).each do |j|
    phi[j, k] = phi[j, k - 1] - phi[k, k] * phi[k - j, k - 1]
  end
  sig[k] = sig[k - 1] * (1 - phi[k, k] ** 2)
end
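Once this recursion is wrapped up inside levinson_durbin, I expect it to be called along these lines; the receiver and the exact return order below are assumptions based on the parameter list above, not the final API:

# hypothetical usage; method name and return order are assumptions
sigma_v, arcoefs, pac, sigma, phi = levinson_durbin(demeaned_series, nlags)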
This week, I will integrate this into AR modelling and perform some tests to verify the estimation. And I will soon start with the next estimation method :)
As mentioned in the previous post, I have been working with Autoregressive and Moving Average simulations.
To test the correctness of the estimations from our simulations, we make use of acf (autocorrelation) and pacf (partial autocorrelation). For different orders of AR and MA, we get varying visualizations with them, such as:
Exponential decreasing curves.
Damped sine waves.
Positive and negative spikes, etc.
While analyzing and writing tests for the same, I also took some time to visualize that data on line and bar charts to get a clearer picture:
AR(1) process
The AR(1) process is the autoregressive simulation with order p = 1, i.e., with one value of phi.
An ideal AR(p) process is represented by:
x_t = phi_1 * x_{t-1} + phi_2 * x_{t-2} + ... + phi_p * x_{t-p} + e_t
Here, the number of observations is n = 1500 (a larger value is preferable for a better fit), p = 1, and phi = [0.9].
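A hedged sketch of how this series might be generated with the ar_sim simulator from Statsample::ARIMA::ARIMA (the constructor call and the argument order n, phi, sigma are assumptions):

# assumed setup: ar_sim(n, phi, sigma) from Statsample::ARIMA::ARIMA
simulator = Statsample::ARIMA::ARIMA.new
ar_1 = simulator.ar_sim(1500, [0.9], 1)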
ACF
To generate its autocorrelation:

acf = ar_1.to_ts.acf
For an AR(1) process, the acf must decay exponentially if phi > 0, or alternate in sign if phi < 0. Go through the analysis above. It can be visualized as:
When phi > 0, the acf decreases exponentially:
When phi < 0, the acf lags alternate in sign:
PACF
To generate its partial autocorrelation:

pacf = ar_1.to_ts.pacf
For an AR(1) process, the pacf must have a spike at lag 1 and then be 0. That spike must be positive if phi > 0, otherwise negative. Have a look at the pacf series generated above. On visualizing the data:
When phi > 0, there is a positive spike at lag 1 (lag 0 always contains 1.0):
When phi < 0, there is a negative spike at lag 1:
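A small hedged check of the positive-phi case above, assuming pacf returns a plain array indexed from lag 0:

pacf = ar_1.to_ts.pacf
puts pacf[1]               # expected to be close to phi = 0.9
puts pacf[2..5].inspect    # expected to be close to zero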
Here is the representation of ideal acf-vs-pacf for positive phi in AR(1):
For AR(p), the acf must give a damped sine wave. The pattern depends greatly on the value and sign of the phi parameters.
When the phi coefficients are predominantly positive, the sine wave starts from the positive side; otherwise, it starts from the negative side.
Notice the damped sine wave starting from the positive side here:
and from the negative side here:
PACF
The pacf gives a spike at lag 0 (value = 1.0, by default) and spikes from lag 1 to lag p. The example above features an AR(2) process, for which we must get spikes at lags 1 and 2, as:
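A hedged sketch of how such an AR(2) series could be generated and its pacf inspected; the phi values below are illustrative, and the ar_sim setup is the same assumption as before:

ar_2 = simulator.ar_sim(1500, [0.5, 0.3], 1)
pacf = ar_2.to_ts.pacf
# expected: clear spikes at lags 1 and 2, values near zero afterwards
puts pacf[1..4].inspect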
MA(1) process
The MA(1) process is the moving average simulation with order q = 1, i.e., with one value of theta.
To simulate this, use the ma_sim method from Statsample::ARIMA::ARIMA.
For theta > 0 in MA(1), we must get a positive spike at lag 1, as:
For theta < 0, the spike at lag 1 must be in the negative direction, as:
When I put these two visualizations side by side, they match quite well:
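A hedged usage sketch of simulating such an MA(1) series and taking its acf; the argument order n, theta, sigma for ma_sim is an assumption:

# assumed signature: ma_sim(n, theta, sigma)
ma_1 = simulator.ma_sim(1500, [0.8], 1)
acf = ma_1.to_ts.acf   # expected: a single positive spike at lag 1 for theta > 0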
MA(q) process
An MA(q) process has order q, i.e., q theta coefficients.
An ideal MA(q) process is represented by:
x_t = e_t + theta_1 * e_{t-1} + theta_2 * e_{t-2} + ... + theta_q * e_{t-q}
ACF
Similar to the spike pattern seen above, the acf will have spikes from lag 1 to lag q, as:
PACF
In the pacf of the MA(q) simulation, we observe an exponentially decaying / damped sine wave.
ARMA(p, q) process
ARMA(p, q) is a combination of the autoregressive and moving average simulations.
When q = 0, the process is called a pure autoregressive process; when p = 0, the process is purely moving average.
The ARMA simulator can be found as arma_sim in Statsample::ARIMA::ARIMA.
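A hedged usage sketch; the argument order n, phi, theta, sigma for arma_sim is an assumption:

# assumed signature: arma_sim(n, phi, theta, sigma)
arma_1_1 = simulator.arma_sim(1500, [0.5], [0.3], 1)
acf  = arma_1_1.to_ts.acf
pacf = arma_1_1.to_ts.pacf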
For an ARMA(1, 1) process, here is a comparison of the visualizations from R and from this code, which just made my day :)
I have been reading and coding AR (Autoregressive Model) and Moving Average (MA) basics.
First, I am very grateful to Claudio Bustos for all his patience in helping me understand this. There is still a lot to learn in these models, but I will keep bugging him. :D
We wrote the simulations for AR(p) and MA(q). The idea behind creating them is to first simulate the process with known coefficients, and then move on to writing an ARMA process which could also estimate the coefficients.
Autoregressive Model
This model is represented by AR(p), where p is the order of the model. For a pure autoregressive model, we take the order of the moving average part to be 0.
AR(p) is represented by the following equation:
x_t = phi_1 * x_{t-1} + phi_2 * x_{t-2} + ... + phi_p * x_{t-p} + e_t
Here, phi_1, ..., phi_p are the parameters of the model, and e_t is the error noise.
To realize this model, we have to observe and keep track of the previous x(t) values, and compute the current value as the summation of those weighted values plus the error noise.
Here is what the code fragment of the general AR(p) simulator does:
It initially creates a buffer of random values to prevent 'broken values', and trims that buffer before returning the result.
It goes on to create the backshifts vector. The backshift vector depends on the ongoing iteration from (1..n).
After creating the backshifts vector, it extracts the required phi values from the stack.
Then it performs the vector multiplication backshifts * parameters and sums the result.
The current x(t) value is then set to this summation plus active white noise.
def ar_sim(n, phi, sigma)
  err_nor = Distribution::Normal.rng(0, sigma)
  # creating buffer with 10 random values
  buffer = Array.new(10, err_nor.call())
  x = buffer + Array.new(n, 0)
  # For now "phi" are the known model parameters,
  # later we will obtain them by Yule-Walker/Burg
  11.upto(n + 11) do |i|
    if i <= phi.size
      # dependent on previous accumulation of x
      backshifts = create_vector(x[0...i].reverse)
    else
      # dependent on number of phi size/order
      backshifts = create_vector(x[(i - phi.size)...i].reverse)
    end
    parameters = create_vector(phi[0...backshifts.size])
    summation = (backshifts * parameters).inject(:+)
    x[i] = summation + err_nor.call()
  end
  x - buffer
end
There are certain tests which will now be performed in the context of acf and pacf, which I coded earlier. They form the basis of correctness for our autoregressive estimation. :) Expect the next post to be about those tests.
Moving Average Model
This model is represented by MA(q), where q is the order of the model. Again, for a pure moving-average model, we take p = 0.
This is represented by the following equation:
x_t = e_t + theta_1 * e_{t-1} + theta_2 * e_{t-2} + ... + theta_q * e_{t-q}
Unlike the autoregressive model, this model was somewhat harder to obtain. It needs to observe the previous error noise values instead of the previous x(t) values, and the series largely depends upon the order of the model.
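To make that concrete, here is a minimal sketch of how an MA(q) simulator could look, analogous to ar_sim above; this is an illustration under assumptions (the name ma_sketch and the structure are mine), not the library's actual ma_sim implementation:

def ma_sketch(n, theta, sigma)
  err_nor = Distribution::Normal.rng(0, sigma)
  q = theta.size
  # pre-generate white noise, including q warm-up values
  noise = Array.new(n + q) { err_nor.call() }
  (q...(n + q)).map do |i|
    # current value = current noise + weighted sum of the previous q noise terms
    lagged = (1..q).map { |j| theta[j - 1] * noise[i - j] }
    noise[i] + lagged.inject(:+)
  end
end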
I am currently working on the tests and analysis I mentioned, for various combinations of AR(p) and MA(q). Once they are done, I will move on to implementing a few algorithms like Yule-Walker and Burg for estimation of the coefficients.