# Time Series Notes (5) - Parameter estimation

## Introduction

Suppose the hyperparameters of a time series model, for example $p,d,q$ for an $\text{ARIMA}(p,d,q)$ model, are already known. The model is then completely determined once the values of the parameters

$\phi_1,\dots,\phi_p,\theta_1,\dots,\theta_q,\theta_0\text{ and }\sigma_a^2$

are also known.

This note covers methods for estimating these values from the observed values of $\{Z_t\}$.

## Conditional least squares (CLS)

Conditional least squares is derived from basic linear regression: we treat $Z_t$ as the response $Y$ and the lagged values $Z_{t-1},\dots,Z_{t-p}$ as the regressors $X_i$.

Consider an $\text{AR}(p)$ model $Z_t=\theta_0+\phi_1Z_{t-1}+\phi_2Z_{t-2}+\cdots+\phi_pZ_{t-p}+a_t$. The conditional least squares estimates are

$(\hat\theta_0,\hat\phi_1,\dots,\hat\phi_p)=\underset{\theta_0,\phi_1,\dots,\phi_p}{\text{argmin}}\sum_{t=p+1}^n(Z_t-\theta_0-\phi_1Z_{t-1}-\cdots-\phi_pZ_{t-p})^2$

and

$\hat\sigma_a^2=\frac{1}{n-p}\sum_{t=p+1}^n(Z_t-\hat\theta_0-\hat\phi_1Z_{t-1}-\cdots-\hat\phi_pZ_{t-p})^2$

Notice that the two expressions above share a common term. We call

$S_C(\theta_0,\phi_1,\dots,\phi_p)=\sum_{t=p+1}^n(Z_t-\theta_0-\phi_1Z_{t-1}-\cdots-\phi_pZ_{t-p})^2$

the conditional sum of squares function, which lets us rewrite the formulas as:

\begin{align} & (\hat\theta_0,\hat\phi_1,\dots,\hat\phi_p)=\underset{\theta_0,\phi_1,\dots,\phi_p}{\text{argmin}}\space S_C(\theta_0,\phi_1,\dots,\phi_p) \\ & \hat\sigma_a^2=\frac{1}{n-p}S_C(\hat\theta_0,\hat\phi_1,\dots,\hat\phi_p) \end{align}
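
Because the objective is linear in the parameters, the CLS estimates of an $\text{AR}(p)$ model can be computed by ordinary least squares. Below is a minimal `numpy` sketch of this idea; the function name `ar_cls` and its interface are illustrative, not from the source.

```python
import numpy as np

def ar_cls(z, p):
    """Conditional least squares for an AR(p) model with intercept theta_0.

    Regresses Z_t on (1, Z_{t-1}, ..., Z_{t-p}) for t = p+1, ..., n,
    i.e., computes the argmin of S_C above.
    """
    z = np.asarray(z, dtype=float)
    n = len(z)
    # Design matrix: a column of ones, then the p lagged series.
    X = np.column_stack(
        [np.ones(n - p)] + [z[p - k : n - k] for k in range(1, p + 1)]
    )
    y = z[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # (theta_0, phi_1, ..., phi_p)
    resid = y - X @ beta
    sigma2_hat = resid @ resid / (n - p)          # S_C(hat) / (n - p)
    return beta, sigma2_hat
```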

### CLS on $\text{AR}(1)$ process

The conditional sum of squares function for an $\text{AR}(1)$ process is

$S_C(\theta_0,\phi)=\sum_{t=2}^n[Z_t-\theta_0-\phi Z_{t-1}]^2$

Therefore, the CLS estimates $(\hat\theta_0,\hat\phi)$ satisfy

\begin{align} & \hat\phi=\frac{\sum_{t=2}^n(Z_t-\bar y)(Z_{t-1}-\bar x)}{\sum_{t=2}^n(Z_{t-1}-\bar x)^2} \\ & \bar y=\hat\theta_0+\hat\phi\bar x \end{align}

where

$\bar y=\frac{1}{n-1}\sum_{t=2}^nZ_t,\space\space\space\space\bar x=\frac{1}{n-1}\sum_{t=2}^nZ_{t-1}$

In particular, for large $n$,

$\bar y\approx\bar Z=\frac{1}{n}\sum_{t=1}^nZ_t,\space\space\space\space\bar x\approx\bar Z$

Therefore,

$\hat\theta_0\approx(1-\hat\phi)\bar Z,\space\space\space\space\hat\mu=\frac{\hat\theta_0}{1-\hat\phi}\approx\bar Z$

and

$\hat\phi=\frac{\sum_{t=2}^n(Z_t-\bar y)(Z_{t-1}-\bar x)}{\sum_{t=2}^n(Z_{t-1}-\bar x)^2}\approx\frac{\sum_{t=2}^n(Z_t-{\color{red}\bar Z})(Z_{t-1}-{\color{red}\bar Z})}{\sum_{t=2}^n(Z_{t-1}-{\color{red}\bar Z})^2}\approx\frac{\sum_{t=2}^n(Z_t-{\color{red}\bar Z})(Z_{t-1}-{\color{red}\bar Z})}{{\color{green}\sum_{t=1}^n(Z_t-\bar Z)^2}}=r_1\space(=\hat\rho_1)$

That is, the CLS estimate of $\phi$ is approximately the lag-$1$ sample autocorrelation.
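
As a numerical check of the approximation $\hat\phi\approx r_1$, here is a short simulation sketch (the parameter values and seed are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(1) path Z_t = theta_0 + phi * Z_{t-1} + a_t.
n, phi, theta0 = 500, 0.6, 1.0
z = np.empty(n)
z[0] = theta0 / (1 - phi)  # start at the process mean
for t in range(1, n):
    z[t] = theta0 + phi * z[t - 1] + rng.standard_normal()

# Exact CLS slope: regression of Z_t on Z_{t-1} for t = 2, ..., n.
y, x = z[1:], z[:-1]
phi_cls = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)

# Lag-1 sample autocorrelation r_1.
zc = z - z.mean()
r1 = np.sum(zc[1:] * zc[:-1]) / np.sum(zc ** 2)

print(phi_cls, r1)  # close to each other (and to phi) for large n
```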

### CLS on $\text{MA}(1)$ process

Suppose we have an invertible $\text{MA}(1)$ process $Z_t=a_t+\theta a_{t-1}$, where $\vert\theta\vert<1$. The conditional sum of squares function is

$S_C(\theta)=\sum_{t=1}^n a_t^2$

By invertibility, and truncating with $Z_0=Z_{-1}=Z_{-2}=\cdots=0$, the above sum of squares can be written as

$S_C(\theta)=\sum_{t=1}^n[Z_t-\theta Z_{t-1}+\theta^2Z_{t-2}-\cdots+(-\theta)^{t-1}Z_1]^2$

Note that if $a_0=0$, then we have

$a_1=Z_1,\space\space\space\space a_2=Z_2-\theta Z_1=Z_2-\theta a_1,\space\space\space\space a_3=Z_3-\theta Z_2+\theta^2Z_1=Z_3-\theta a_2,\space\space\cdots\space\space a_n=Z_n-\theta a_{n-1}$

Because $S_C(\theta)$ is not quadratic in the parameters, the conditional least squares estimates for MA and ARMA models cannot be calculated directly in closed form. Numerical optimization methods, such as Gauss-Newton, are usually used to search for the estimates, as in the sketch below.
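
Here is a minimal sketch that evaluates $S_C(\theta)$ with the recursion above and minimizes it by a bounded one-dimensional search over the invertibility region, a simpler stand-in for Gauss-Newton in this single-parameter case (the names `ma1_sc` and `ma1_cls` are my own):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ma1_sc(theta, z):
    """Conditional sum of squares for MA(1) Z_t = a_t + theta * a_{t-1},
    using the recursion a_t = Z_t - theta * a_{t-1} with a_0 = 0."""
    a_prev, total = 0.0, 0.0
    for zt in z:
        a = zt - theta * a_prev
        total += a * a
        a_prev = a
    return total

def ma1_cls(z):
    # One-dimensional search over the invertibility region |theta| < 1.
    res = minimize_scalar(lambda th: ma1_sc(th, z),
                          bounds=(-0.99, 0.99), method="bounded")
    return res.x
```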

## Maximum likelihood (ML) and unconditional least squares (ULS)

For any set of observations $Z_1,Z_2,\cdots,Z_n$ (time series or otherwise), the likelihood function $L$ is defined to be the probability (density) of obtaining the data actually observed, regarded as a function of the parameters of the model.

Advantages: first, all of the information in the data is used, rather than just the first and second moments as with least squares; second, many large-sample results are known under very general conditions.

Disadvantage: the method needs the specific joint probability density function of the process, which is sometimes complex.

### ULS and ML on $\text{AR}(1)$ model

Consider an $\text{AR}(1)$ model $Z_t-\mu=\phi(Z_{t-1}-\mu)+a_t$, where the white noise $\{a_t\}\overset{\text{i.i.d.}}{\sim}N(0,\sigma_a^2)$, and the unknown parameters are $\mu,\phi,\sigma_a^2$.

The random variable $Z_t$, conditional on $\{Z_{t-1},Z_{t-2},\dots\}$, follows the normal distribution $N(\mu+\phi(Z_{t-1}-\mu),\sigma_a^2)$, and hence has the density

$f(z|Z_{t-1},Z_{t-2},\dots)=f(z|Z_{t-1})=(2\pi\sigma_a^2)^{-\frac{1}{2}}\exp\{-\frac{[z-\mu-\phi(Z_{t-1}-\mu)]^2}{2\sigma_a^2}\}$

Writing $Z_1$ in its MA representation shows that the marginal distribution is $Z_1\sim N(\mu,\sigma_a^2/(1-\phi^2))$, so the likelihood function for $Z_1,\dots,Z_n$ is

$\begin{split} L(\phi,\mu,\sigma_a^2)&=f(z_n,z_{n-1},\dots,z_1) = f(z_n|z_{n-1},\dots,z_1)f(z_{n-1},\dots,z_1) \\ &=f(z_n|z_{n-1})f(z_{n-1},\dots,z_1)=\cdots \\ &=f(z_n|z_{n-1})f(z_{n-1}|z_{n-2})\cdots f(z_2|z_1)f(z_1) \\ &=(2\pi\sigma_a^2)^{-\frac{n}{2}}(1-\phi^2)^{\frac{1}{2}}\exp\{-\frac{1}{2\sigma_a^2}S(\phi,\mu)\} \end{split}$

where

$\begin{split} {\color{green}S(\phi,\mu)}&=\sum_{t=2}^n[(Z_t-\mu)-\phi(Z_{t-1}-\mu)]^2+(1-\phi^2)(Z_1-\mu)^2 \\ &={\color{red}S_C(\phi,\mu)}+(1-\phi^2)(Z_1-\mu)^2 \end{split}$

is called the unconditional sum of squares function.

When $\mu=0$ and we set $Z_1=0$, $S(\phi,\mu)$ reduces to $S_C(\phi,\mu)$ and the ML method essentially coincides with the CLS method.

The ML estimates for $\mu,\phi,\sigma_a^2$ maximize the log-likelihood function

$-\frac{n}{2}\log(2\pi)-\frac{n}{2}\log(\sigma_a^2)+\frac{1}{2}\log(1-\phi^2)-\frac{1}{2\sigma_a^2}S(\phi,\mu)$

As a compromise between conditional least squares (CLS) estimates and full maximum likelihood (ML) estimates, the unconditional least squares (ULS) estimates are

\begin{align} & (\hat\phi,\hat\mu)=\text{argmin}\space S(\phi,\mu)=\text{argmin}\{S_C(\phi,\mu)+(1-\phi^2)(Z_1-\mu)^2\} \\ & \hat\sigma_a^2=\frac{1}{n-1}S(\hat\phi,\hat\mu) \end{align}
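
As a sketch of the full ML computation for $\text{AR}(1)$: given $(\phi,\mu)$, the likelihood is maximized over $\sigma_a^2$ at $\sigma_a^2=S(\phi,\mu)/n$, so one can profile it out and minimize the remaining negative log-likelihood over $(\phi,\mu)$ numerically. The function name `ar1_exact_ml` and the choice of Nelder-Mead are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def ar1_exact_ml(z):
    """Exact Gaussian ML for Z_t - mu = phi * (Z_{t-1} - mu) + a_t."""
    z = np.asarray(z, dtype=float)
    n = len(z)

    def unconditional_ss(phi, mu):
        # S(phi, mu) = S_C(phi, mu) + (1 - phi^2)(Z_1 - mu)^2
        d = (z[1:] - mu) - phi * (z[:-1] - mu)
        return d @ d + (1 - phi ** 2) * (z[0] - mu) ** 2

    def neg_loglik(params):
        phi, mu = params
        if abs(phi) >= 1:
            return np.inf  # enforce stationarity
        sigma2 = unconditional_ss(phi, mu) / n  # profiled-out sigma_a^2
        # Negative of the log-likelihood above, with sigma2 plugged in.
        return 0.5 * (n * np.log(2 * np.pi * sigma2)
                      - np.log(1 - phi ** 2) + n)

    res = minimize(neg_loglik, x0=[0.0, z.mean()], method="Nelder-Mead")
    phi_hat, mu_hat = res.x
    return phi_hat, mu_hat, unconditional_ss(phi_hat, mu_hat) / n
```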

## Properties of the estimates

The CLS, ULS and ML estimates have the same large-sample properties.

Asymptotic variances of the estimates for a few low-order ARMA models are as follows:

\begin{align} \text{AR}(1)&:\text{Var}(\hat\phi)\approx\frac{1}{n}(1-\phi^2) \\ \text{AR}(2)&:\left\{\begin{matrix}\text{Var}(\hat\phi_1)\approx\text{Var}(\hat\phi_2)\approx\frac{1}{n}(1-\phi_2^2)\\\text{corr}(\hat\phi_1,\hat\phi_2)\approx-\frac{\phi_1}{1-\phi_2}=-\rho_1\end{matrix}\right. \\ \text{MA}(1)&:\text{Var}(\hat\theta)\approx\frac{1}{n}(1-\theta^2) \\ \text{MA}(2)&:\left\{\begin{matrix}\text{Var}(\hat\theta_1)\approx\text{Var}(\hat\theta_2)\approx\frac{1}{n}(1-\theta_2^2)\\\text{corr}(\hat\theta_1,\hat\theta_2)\approx\frac{\theta_1}{1+\theta_2}\end{matrix}\right. \\ \text{ARMA}(1,1)&:\left\{\begin{matrix}\text{Var}(\hat\phi)\approx\frac{1}{n}(1-\phi^2)(\frac{1+\phi\theta}{\phi+\theta})^2\\\text{Var}(\hat\theta)\approx\frac{1}{n}(1-\theta^2)(\frac{1+\phi\theta}{\phi+\theta})^2\\\text{corr}(\hat\phi,\hat\theta)\approx\frac{\sqrt{(1-\phi^2)(1-\theta^2)}}{1+\phi\theta}\end{matrix}\right. \end{align}
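
These formulas translate directly into approximate standard errors for fitted models; a small sketch (the helper names are my own):

```python
import numpy as np

def ar1_se(phi_hat, n):
    """Approximate s.e. of phi-hat from Var(phi-hat) ~ (1 - phi^2) / n."""
    return np.sqrt((1 - phi_hat ** 2) / n)

def ma1_se(theta_hat, n):
    """Approximate s.e. of theta-hat from Var(theta-hat) ~ (1 - theta^2) / n."""
    return np.sqrt((1 - theta_hat ** 2) / n)

# Example: phi_hat = 0.6 estimated from n = 100 observations.
print(ar1_se(0.6, 100))  # 0.08
```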

When an $\text{AR}(1)$ process is estimated with an $\text{AR}(2)$ model, the variance of the estimates increases: with the true $\phi_2=0$, $\text{Var}(\hat\phi_1)\approx\frac{1}{n}\ge\frac{1}{n}(1-\phi^2)$. The same holds for overfitting the $\text{MA}$ models.

For an $\text{ARMA}(1,1)$ process with $\phi+\theta\approx0$, the variances may be very large, since the common factor $(\frac{1+\phi\theta}{\phi+\theta})^2$ blows up; this is a symptom of near parameter redundancy.