Technical Discussion
Ordinary Least Squares
Weighted Least Squares
Seemingly Unrelated Regression (SUR)
Two-Stage Least Squares (TSLS) and Weighted TSLS
Three-Stage Least Squares (3SLS)
Full Information Maximum Likelihood (FIML)
Generalized Method of Moments (GMM)
White’s Heteroskedasticity Consistent Covariance Matrix
Heteroskedasticity and Autocorrelation Consistent (HAC) Covariance Matrix
Kernel Options
Bandwidth Selection
Multivariate ARCH
Diagonal VECH
Constant Conditional Correlation (CCC)
Diagonal BEKK
While the discussion to follow is expressed in terms of a balanced system of linear equations, the analysis carries forward in a straightforward way to unbalanced systems containing nonlinear equations.
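Denote a system of $M$ equations in stacked form as:
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X_M \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_M \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_M \end{bmatrix}$$
where $y_m$ is a $T$-vector, $X_m$ is a $T \times k_m$ matrix, and $\beta_m$ is a $k_m$-vector of coefficients. The error terms $\epsilon$ have an $MT \times MT$ covariance matrix $V$. The system may be written in compact form as:
$$y = X\beta + \epsilon$$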
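Under the standard assumptions, the residual variance matrix from this stacked system is given by:
$$V = E(\epsilon\epsilon') = \sigma^2(I_M \otimes I_T)$$
Other residual structures are of interest. First, the errors may be heteroskedastic across the $M$ equations. Second, they may be heteroskedastic and contemporaneously correlated. We can characterize both of these cases by defining the $M \times M$ matrix of contemporaneous covariances, $\Sigma$, where the (i,j)-th element of $\Sigma$ is given by $\sigma_{ij} = E(\epsilon_{it}\epsilon_{jt})$ for all $t$. If the errors are contemporaneously uncorrelated, then $\sigma_{ij} = 0$ for $i \neq j$, and we can write:
$$V = \operatorname{diag}(\sigma_{11}, \sigma_{22}, \ldots, \sigma_{MM}) \otimes I_T$$
More generally, if the errors are heteroskedastic and contemporaneously correlated:
$$V = \Sigma \otimes I_T$$
Lastly, at the most general level, there may be heteroskedasticity, contemporaneous correlation, and autocorrelation of the residuals. The general variance matrix of the residuals may be written:
$$V = \begin{bmatrix} \sigma_{11}\Sigma_{11} & \sigma_{12}\Sigma_{12} & \cdots & \sigma_{1M}\Sigma_{1M} \\ \sigma_{21}\Sigma_{21} & \sigma_{22}\Sigma_{22} & \cdots & \sigma_{2M}\Sigma_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{M1}\Sigma_{M1} & \sigma_{M2}\Sigma_{M2} & \cdots & \sigma_{MM}\Sigma_{MM} \end{bmatrix}$$
where $\Sigma_{ij}$ is a $T \times T$ autocorrelation matrix for the i-th and j-th equations.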
Ordinary Least Squares
The OLS estimate of the coefficient variance matrix is valid under the assumption that $V = \sigma^2(I_M \otimes I_T)$. The estimator for $\beta$ is given by,
$$b_{LS} = (X'X)^{-1}X'y$$
and the variance estimator is given by:
$$\operatorname{var}(b_{LS}) = s^2(X'X)^{-1}$$
where $s^2$ is the residual variance estimate for the stacked system.
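The stacked OLS computation can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not EViews' implementation: `stack_system` and `ols_system` are hypothetical helper names, the data are simulated, and the degrees-of-freedom correction in $s^2$ is an assumption because the text does not specify one.

```python
import numpy as np

def stack_system(ys, Xs):
    """Stack M equations: y is (M*T,), X is block diagonal with X_1, ..., X_M."""
    T = len(ys[0])
    X = np.zeros((T * len(Xs), sum(Xm.shape[1] for Xm in Xs)))
    col = 0
    for m, Xm in enumerate(Xs):
        X[m * T:(m + 1) * T, col:col + Xm.shape[1]] = Xm
        col += Xm.shape[1]
    return np.concatenate(ys), X

def ols_system(y, X):
    """b_LS = (X'X)^{-1} X'y and var(b_LS) = s^2 (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ y)
    e = y - X @ b
    s2 = (e @ e) / (len(y) - X.shape[1])   # assumption: degrees-of-freedom corrected s^2
    return b, s2 * XtX_inv

# Two equations with simulated (hypothetical) data
rng = np.random.default_rng(0)
T = 200
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])
y1 = X1 @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=T)
y2 = X2 @ np.array([-1.0, 0.5]) + 0.1 * rng.normal(size=T)
y, X = stack_system([y1, y2], [X1, X2])
b_ls, var_ls = ols_system(y, X)
```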
Weighted Least Squares
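The weighted least squares estimator is given by:
$$b_{WLS} = \left(X'\hat{V}^{-1}X\right)^{-1}X'\hat{V}^{-1}y \qquad (39.16)$$
where $\hat{V} = \operatorname{diag}(s_{11}, s_{22}, \ldots, s_{MM}) \otimes I_T$ is a consistent estimator of $V$, and $s_{ii}$ is the residual variance estimator obtained by setting $j = i$ in:
$$s_{ij} = \frac{(y_i - X_ib_{LS})'(y_j - X_jb_{LS})}{\max(T_i, T_j)} \qquad (39.17)$$
where $T_i$ is the number of non-missing observations in equation $i$, and the inner product is taken over the non-missing common elements of $i$ and $j$. The max function in Equation (39.17) is designed to handle the case of unbalanced data by down-weighting the covariance terms. Provided the missing values are asymptotically negligible, this yields a consistent estimator of the variance elements. Note also that there is no adjustment for degrees of freedom.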
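A minimal sketch of the pairwise residual covariance with the max divisor, assuming the residuals are held in a $T \times M$ array with NaN marking missing observations (the function name `residual_cov` is hypothetical):

```python
import numpy as np

def residual_cov(E):
    """s_ij = e_i'e_j / max(T_i, T_j) over common non-missing rows; E is (T, M), NaN = missing."""
    valid = ~np.isnan(E)
    counts = valid.sum(axis=0)                 # T_i: non-missing observations per equation
    M = E.shape[1]
    S = np.empty((M, M))
    for i in range(M):
        for j in range(M):
            common = valid[:, i] & valid[:, j]
            # no degrees-of-freedom adjustment, as in the text
            S[i, j] = E[common, i] @ E[common, j] / max(counts[i], counts[j])
    return S

E = np.array([[1.0, 2.0], [np.nan, 1.0], [3.0, -1.0]])
S = residual_cov(E)
```

Here the second observation is missing for the first equation, so the covariance term uses only rows 1 and 3 but still divides by $\max(T_1, T_2) = 3$, which is the down-weighting described above.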
When specifying your estimation method, you are given a choice of which coefficients to use in computing the $s_{ij}$. If you choose not to iterate the weights, the OLS coefficient estimates will be used to estimate the variances. If you choose to iterate the weights, the current parameter estimates (which may be based on the previously computed weights) are used in computing the $s_{ij}$. This latter procedure may be iterated until the weights and coefficients converge.
The estimator for the coefficient variance matrix is:
$$\operatorname{var}(b_{WLS}) = \left(X'\hat{V}^{-1}X\right)^{-1}$$
The weighted least squares estimator is efficient, and the variance estimator consistent, under the assumption that there is heteroskedasticity, but no serial or contemporaneous correlation in the residuals.
It is worth pointing out that if there are no cross-equation restrictions on the parameters of the model, weighted LS on the entire system yields estimates that are identical to those obtained by equation-by-equation LS. Consider the following simple model:
$$y_1 = X_1\beta_1 + \epsilon_1$$
$$y_2 = X_2\beta_2 + \epsilon_2$$
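If $\beta_1$ and $\beta_2$ are unrestricted, the WLS estimator given in Equation (39.16) yields:
$$b_{WLS} = \begin{bmatrix} \left((X_1'X_1)/s_{11}\right)^{-1}\left((X_1'y_1)/s_{11}\right) \\ \left((X_2'X_2)/s_{22}\right)^{-1}\left((X_2'y_2)/s_{22}\right) \end{bmatrix} = \begin{bmatrix} (X_1'X_1)^{-1}X_1'y_1 \\ (X_2'X_2)^{-1}X_2'y_2 \end{bmatrix}$$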
The expression on the right is equivalent to equation-by-equation OLS. Note, however, that even without cross-equation restrictions, the standard errors are not the same in the two cases.
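The equivalence can be checked numerically. The sketch below (simulated data, NumPy only, no cross-equation restrictions) computes equation-by-equation OLS, forms $\hat V = \operatorname{diag}(s_{11}, s_{22}) \otimes I_T$ from the OLS residuals without a degrees-of-freedom adjustment, and runs system WLS:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 50
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])
y1 = X1 @ np.array([1.0, 0.5]) + rng.normal(scale=1.0, size=T)
y2 = X2 @ np.array([-2.0, 3.0]) + rng.normal(scale=5.0, size=T)

# Equation-by-equation OLS
b1 = np.linalg.lstsq(X1, y1, rcond=None)[0]
b2 = np.linalg.lstsq(X2, y2, rcond=None)[0]

# OLS residual variances, no degrees-of-freedom adjustment
s11 = np.mean((y1 - X1 @ b1) ** 2)
s22 = np.mean((y2 - X2 @ b2) ** 2)

# System WLS with V_hat = diag(s11, s22) kron I_T (only its diagonal is needed)
X = np.zeros((2 * T, 4))
X[:T, :2], X[T:, 2:] = X1, X2
y = np.concatenate([y1, y2])
w = np.concatenate([np.full(T, 1.0 / s11), np.full(T, 1.0 / s22)])  # diagonal of V_hat^{-1}
XtW = X.T * w                                     # X' V_hat^{-1}
b_wls = np.linalg.solve(XtW @ X, XtW @ y)
var_wls = np.linalg.inv(XtW @ X)                  # (X' V_hat^{-1} X)^{-1}
```

The point estimates agree, while the WLS variance block equals $s_{11}(X_1'X_1)^{-1}$. In this sketch it differs from a conventional single-equation OLS variance, which divides the residual sum of squares by $T - k_1$, by the factor $(T - k_1)/T$.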
Seemingly Unrelated Regression (SUR)
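SUR is appropriate when all the right-hand side regressors $X$ are assumed to be exogenous, and the errors are heteroskedastic and contemporaneously correlated so that the error variance matrix is given by $V = \Sigma \otimes I_T$. Zellner’s SUR estimator of $\beta$ takes the form:
$$b_{SUR} = \left(X'(\hat\Sigma \otimes I_T)^{-1}X\right)^{-1}X'(\hat\Sigma \otimes I_T)^{-1}y$$
where $\hat\Sigma$ is a consistent estimate of $\Sigma$ with typical element $s_{ij}$, for all $i$ and $j$.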
If you include AR terms in equation $j$, EViews transforms the model (see “Specifying AR Terms”) and estimates the following equation:
$$y_{jt} = X_{jt}\beta_j + \sum_{r=1}^{p_j}\rho_{jr}\left(y_{j(t-r)} - X_{j(t-r)}\beta_j\right) + \epsilon_{jt}$$
where $p_j$ is the order of the AR terms in equation $j$, and $\epsilon_j$ is assumed to be serially independent, but possibly correlated contemporaneously across equations. At the beginning of the first iteration, we estimate the equation by nonlinear LS and use the estimates to compute the residuals $\hat\epsilon$. We then construct an estimate of $\Sigma$ using $s_{ij} = (\hat\epsilon_i'\hat\epsilon_j)/\max(T_i, T_j)$ and perform nonlinear GLS to complete one iteration of the estimation procedure. These iterations may be repeated until the coefficients and weights converge.
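An iterated SUR (feasible GLS) loop can be sketched as follows, assuming a balanced system without AR terms; `sur` is a hypothetical helper, not EViews code. It starts from OLS, forms $\hat\Sigma$ with typical element $s_{ij} = \hat\epsilon_i'\hat\epsilon_j/T$, and applies $b_{SUR} = (X'(\hat\Sigma \otimes I_T)^{-1}X)^{-1}X'(\hat\Sigma \otimes I_T)^{-1}y$ until the coefficients stop changing. The usage example relies on the textbook result that SUR reduces to equation-by-equation OLS when every equation has the same regressors.

```python
import numpy as np

def sur(ys, Xs, iterate=True, tol=1e-10, max_iter=100):
    """Iterated feasible GLS (Zellner SUR) for a balanced system without AR terms."""
    M, T = len(ys), len(ys[0])
    ks = [Xm.shape[1] for Xm in Xs]
    X = np.zeros((M * T, sum(ks)))
    col = 0
    for m, Xm in enumerate(Xs):                        # block-diagonal stacked X
        X[m * T:(m + 1) * T, col:col + ks[m]] = Xm
        col += ks[m]
    y = np.concatenate(ys)
    b = np.linalg.lstsq(X, y, rcond=None)[0]           # start from OLS
    for _ in range(max_iter):
        E = (y - X @ b).reshape(M, T)                  # row m holds the residuals of equation m
        S = E @ E.T / T                                # Sigma_hat with s_ij = e_i'e_j / T
        W = np.kron(np.linalg.inv(S), np.eye(T))       # (Sigma_hat kron I_T)^{-1}
        b_new = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if not iterate or converged:
            break
    return b, S

# Hypothetical data: both equations share the same regressors
rng = np.random.default_rng(2)
T = 100
Xc = np.column_stack([np.ones(T), rng.normal(size=T)])
U = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 2.0]], size=T)
ya = Xc @ np.array([1.0, 1.0]) + U[:, 0]
yb = Xc @ np.array([0.0, -1.0]) + U[:, 1]
b_sur, S = sur([ya, yb], [Xc, Xc])
b_ols = np.concatenate([np.linalg.lstsq(Xc, yy, rcond=None)[0] for yy in (ya, yb)])
```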
Two-Stage Least Squares (TSLS) and Weighted TSLS
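TSLS is a single equation estimation method that is appropriate when some of the variables in $X$ are endogenous. Write the j-th equation of the system as,
$$Y\Gamma_j + XB_j + \epsilon_j = 0$$
or, alternatively:
$$y_j = Y_j\gamma_j + X_j\beta_j + \epsilon_j = Z_j\delta_j + \epsilon_j$$
where $\Gamma_j' = (-1, \gamma_j', 0)$, $B_j' = (\beta_j', 0)$, $Z_j = (Y_j, X_j)$, and $\delta_j' = (\gamma_j', \beta_j')$. $Y$ is the matrix of endogenous variables and $X$ is the matrix of exogenous variables; $Y_j$ is the matrix of endogenous variables not including $y_j$ (the right-hand side endogenous variables of equation $j$).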
In the first stage, we regress the right-hand side endogenous variables $Y_j$ on all exogenous variables $X$ and get the fitted values:
$$\hat{Y}_j = X(X'X)^{-1}X'Y_j$$
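In the second stage, we regress $y_j$ on $\hat{Y}_j$ and $X_j$ to get:
$$\hat\delta_{2SLS} = \left(\hat{Z}_j'\hat{Z}_j\right)^{-1}\hat{Z}_j'y_j$$
where $\hat{Z}_j = (\hat{Y}_j, X_j)$. The residuals from an equation using these coefficients are used to form weights.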
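The two stages can be sketched directly in NumPy, under the assumptions of simulated data and a single equation; the helper `tsls` is hypothetical. The closed form $(Z_j'PZ_j)^{-1}Z_j'Py_j$ with $P = X(X'X)^{-1}X'$ is computed alongside as a check, since $\hat Z_j = PZ_j$ when $X_j$ is contained in $X$.

```python
import numpy as np

def tsls(y, Y_j, X_j, X):
    """TSLS for y = Y_j g + X_j b + e, using all exogenous variables X as instruments."""
    Y_hat = X @ np.linalg.lstsq(X, Y_j, rcond=None)[0]       # first stage: X (X'X)^{-1} X' Y_j
    Z_hat = np.column_stack([Y_hat, X_j])                    # Z_hat = (Y_hat, X_j)
    return np.linalg.lstsq(Z_hat, y, rcond=None)[0]          # second stage

# Hypothetical data in which y2 is endogenous in the y1 equation
rng = np.random.default_rng(3)
T = 2000
z1, z2, v = rng.normal(size=(3, T))
u = 0.8 * v + 0.5 * rng.normal(size=T)                       # correlated with y2 through v
y2 = 0.5 + z1 + z2 + v
X = np.column_stack([np.ones(T), z1, z2])                    # all exogenous variables
X_j = X[:, :2]                                               # exogenous variables in this equation
y1 = 2.0 * y2 + 1.0 + 0.5 * z1 + u
d_tsls = tsls(y1, y2[:, None], X_j, X)                       # order: (gamma, const, z1)

# Closed-form check: (Z'PZ)^{-1} Z'Py with P = X(X'X)^{-1}X'
Z = np.column_stack([y2, X_j])
ZX, XXi = Z.T @ X, np.linalg.inv(X.T @ X)
d_iv = np.linalg.solve(ZX @ XXi @ ZX.T, ZX @ XXi @ (X.T @ y1))
```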
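Weighted TSLS applies the weights in the second stage so that:
$$\hat\delta_{W2SLS} = \left(\hat{Z}'\hat{V}^{-1}\hat{Z}\right)^{-1}\hat{Z}'\hat{V}^{-1}y$$
where $\hat{Z}$ is the block-diagonal matrix with the $\hat{Z}_j$ on its diagonal, and the elements of the variance matrix $\hat{V} = \operatorname{diag}(s_{11}, \ldots, s_{MM}) \otimes I_T$ are estimated in the usual fashion using the residuals from unweighted TSLS.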
If you choose to iterate the weights, $\hat{V}$ is estimated at each step using the current values of the coefficients and residuals.
Three-Stage Least Squares (3SLS)
Since TSLS is a single equation estimator that does not take account of the covariances between residuals, it is not, in general, fully efficient. 3SLS is a system method that estimates all of the coefficients of the model, then forms weights and reestimates the model using the estimated weighting matrix. It should be viewed as the endogenous variable analogue to the SUR estimator described above.
The first two stages of 3SLS are the same as in TSLS. In the third stage, we apply feasible generalized least squares (FGLS) to the equations in the system in a manner analogous to the SUR estimator.
SUR uses the OLS residuals to obtain a consistent estimate of the cross-equation covariance matrix $\Sigma$. This covariance estimator is not, however, consistent if any of the right-hand side variables are endogenous. 3SLS uses the 2SLS residuals to obtain a consistent estimate of $\Sigma$.
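In the balanced case, we may write the estimator as,
$$\hat\delta_{3SLS} = \left(Z'\left(\hat\Sigma^{-1} \otimes X(X'X)^{-1}X'\right)Z\right)^{-1}Z'\left(\hat\Sigma^{-1} \otimes X(X'X)^{-1}X'\right)y$$
where $Z$ is the block-diagonal matrix with the $Z_j$ on its diagonal, and $\hat\Sigma$ has typical element:
$$s_{ij} = \frac{\left(y_i - Z_i\hat\delta_{i,2SLS}\right)'\left(y_j - Z_j\hat\delta_{j,2SLS}\right)}{\max(T_i, T_j)}$$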
If you choose to iterate the weights, the current coefficients and residuals will be used to estimate $\hat\Sigma$.
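A compact 3SLS sketch for a balanced system follows, assuming NumPy and simulated data from a hypothetical two-equation simultaneous model. Rather than forming the $MT \times MT$ matrix $\hat\Sigma^{-1} \otimes X(X'X)^{-1}X'$ explicitly, it assembles the $(i,j)$ blocks $\hat\sigma^{ij}Z_i'PZ_j$ (where $\hat\sigma^{ij}$ is an element of $\hat\Sigma^{-1}$ and $P = X(X'X)^{-1}X'$), which is algebraically the same.

```python
import numpy as np

def three_sls(ys, Zs, X):
    """3SLS for a balanced system: ys[i] is (T,), Zs[i] holds the RHS variables of eq i, X all exogenous."""
    M, T = len(ys), X.shape[0]
    XXi = np.linalg.inv(X.T @ X)
    PZ = [X @ (XXi @ (X.T @ Z)) for Z in Zs]                 # P Z_i with P = X(X'X)^{-1}X'
    E = np.empty((M, T))
    for i in range(M):                                       # stages 1-2: 2SLS residuals
        d = np.linalg.solve(PZ[i].T @ Zs[i], PZ[i].T @ ys[i])
        E[i] = ys[i] - Zs[i] @ d
    S_inv = np.linalg.inv(E @ E.T / T)                       # Sigma_hat^{-1}
    # Stage 3: blocks of Z'(Sigma^{-1} kron P)Z and Z'(Sigma^{-1} kron P)y
    lhs = np.block([[S_inv[i, j] * (PZ[i].T @ Zs[j]) for j in range(M)] for i in range(M)])
    rhs = np.concatenate([sum(S_inv[i, j] * (PZ[i].T @ ys[j]) for j in range(M)) for i in range(M)])
    return np.linalg.solve(lhs, rhs), E

# Hypothetical simultaneous system: y1 = 0.5 y2 + 1 + x1 + u1,  y2 = -0.3 y1 + 2 + x2 + u2
rng = np.random.default_rng(4)
T = 4000
x1, x2 = rng.normal(size=(2, T))
U = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=T)
G = np.array([[1.0, -0.5], [0.3, 1.0]])                      # coefficients on (y1, y2)
R = np.column_stack([1 + x1 + U[:, 0], 2 + x2 + U[:, 1]])
Y = np.linalg.solve(G, R.T).T                                # solve the system for (y1, y2)
X = np.column_stack([np.ones(T), x1, x2])
Z1 = np.column_stack([Y[:, 1], np.ones(T), x1])
Z2 = np.column_stack([Y[:, 0], np.ones(T), x2])
d_3sls, E = three_sls([Y[:, 0], Y[:, 1]], [Z1, Z2], X)
```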
Full Information Maximum Likelihood (FIML)
Following the discussion in Amemiya (1977), recall that we have
$$f(y_t, x_t, \beta) = \epsilon_t$$
where $y_t$ is a vector of endogenous variables and $x_t$ is a vector of exogenous variables. The Full Information Maximum Likelihood (FIML) estimator finds the vector of parameters $\beta$ by maximizing the likelihood under the assumption that $\epsilon_t$ is a vector of i.i.d. multivariate normal random variables with covariance matrix $\Sigma$.
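Under the normality assumption, the log-likelihood is given by
$$l(\beta, \Sigma) = -\frac{TM}{2}\log(2\pi) + \frac{T}{2}\log\left|\Sigma^{-1}\right| - \frac{1}{2}\sum_{t=1}^{T}f(y_t, x_t, \beta)'\Sigma^{-1}f(y_t, x_t, \beta) + \sum_{t=1}^{T}\log\left|\det\left(\frac{\partial f(y_t, x_t, \beta)}{\partial y_t'}\right)\right| \qquad (39.31)$$
where $T$ is the number of observations and $M$ is the number of equations. Note that the log determinant of the derivatives of $f$ captures the simultaneity in the system of equations.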
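For the unrestricted and diagonal restricted covariance variants of the model, we may use the first-order conditions for the variance parameters and rewrite the likelihood in concentrated form:
$$l^*(\beta) = -\frac{TM}{2}\left(1 + \log(2\pi)\right) - \frac{T}{2}\log\left|\hat\Sigma(\beta)\right| + \sum_{t=1}^{T}\log\left|\det\left(\frac{\partial f(y_t, x_t, \beta)}{\partial y_t'}\right)\right|$$
where
$$\hat\Sigma(\beta) = \frac{1}{T}\sum_{t=1}^{T}f(y_t, x_t, \beta)f(y_t, x_t, \beta)'$$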
The diagonal restricted estimator replaces the off-diagonal terms of the cross-product matrix $\hat\Sigma(\beta)$ with zeros. The corresponding FIML estimator maximizes the concentrated likelihood with respect to $\beta$ (or equivalently, the full likelihood with respect to $\beta$ and the free parameters of $\Sigma$).
The FIML estimator for models with user-restricted covariances maximizes the full likelihood in Equation (39.31) with respect to $\beta$ given the user-specified value for $\Sigma$.
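The full likelihood (39.31) and its concentrated form can be sketched for a hypothetical linear two-equation system $f_t = (y_{1t} - a y_{2t} - x_{1t},\; y_{2t} - b y_{1t} - x_{2t})'$, whose Jacobian $\partial f_t/\partial y_t'$ has the constant determinant $1 - ab$. The data are arbitrary placeholders and the function names are illustrative. Evaluating the full likelihood at $\hat\Sigma(\beta)$ should reproduce the concentrated value, and any other $\Sigma$ should give a value no larger.

```python
import numpy as np

def full_loglik(F, Sigma, jac_logdet):
    """Full log likelihood (39.31). Rows of F are f(y_t, x_t, beta)'; jac_logdet = sum_t log|det(df/dy_t')|."""
    T, M = F.shape
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum("ti,ij,tj->", F, np.linalg.inv(Sigma), F)   # sum_t f_t' Sigma^{-1} f_t
    return -T * M / 2 * np.log(2 * np.pi) - T / 2 * logdet - quad / 2 + jac_logdet

def concentrated_loglik(F, jac_logdet, diagonal=False):
    """Concentrated form with Sigma_hat(beta) = F'F / T (off-diagonals zeroed if diagonal=True)."""
    T, M = F.shape
    S = F.T @ F / T
    if diagonal:
        S = np.diag(np.diag(S))
    _, logdet = np.linalg.slogdet(S)
    return -T * M / 2 * (1 + np.log(2 * np.pi)) - T / 2 * logdet + jac_logdet

# Hypothetical linear system: f_t = (y1 - a*y2 - x1, y2 - b*y1 - x2), det(df/dy') = 1 - a*b
rng = np.random.default_rng(5)
T = 300
Ydat, Xdat = rng.normal(size=(T, 2)), rng.normal(size=(T, 2))   # placeholder data
a, b = 0.4, -0.2
F = np.column_stack([Ydat[:, 0] - a * Ydat[:, 1] - Xdat[:, 0],
                     Ydat[:, 1] - b * Ydat[:, 0] - Xdat[:, 1]])
jac = T * np.log(abs(1 - a * b))
S_hat = F.T @ F / T
```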
The estimator for $\beta$ is asymptotically normally distributed, with coefficient covariance typically computed using either the partitioned inverse of the outer product of the gradients of the full likelihood (OPG) or the inverse of the negative of the observed Hessian of the concentrated likelihood. EViews employs the OPG covariance by default, but there is evidence that one should take seriously the choice of method (Calzolari and Panattoni, 1988). In addition, EViews offers a QML covariance computation that employs a Huber-White sandwich using both the OPG and the inverse negative Hessian.
Over the years, a number of approaches for FIML estimation have been proposed (see, for example, Parke 1982, Belsley 1980, Dagenais 1978, or Amemiya 1977). EViews offers standard BFGS, Newton-Raphson, and OPG/BHHH algorithms with various step methods in trust region form, as well as a simple implementation of BHHH with Marquardt and line search steps (see “Optimization Algorithms”). See Calzolari and Panattoni (1987) and Weihs, Calzolari, and Panattoni (1986) for simulation results on the performance of various estimators.
Whichever method you select, we encourage you to perform sensitivity analysis.
Generalized Method of Moments (GMM)
The basic idea underlying GMM is simple and intuitive. We have a set of theoretical moment conditions that the parameters of interest $\beta$ should satisfy. We denote these moment conditions as:
$$E\left(m(y, \beta)\right) = 0 \qquad (39.33)$$
The method of moments estimator is defined by replacing the moment condition (39.33) by its sample analog:
$$\frac{1}{T}\sum_{t=1}^{T}m(y_t, \beta) = 0 \qquad (39.34)$$
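However, condition (39.34) will not be satisfied for any $\beta$ when there are more moment restrictions in $m$ than there are parameters in $\beta$. To allow for such overidentification, the GMM estimator is defined by minimizing the following criterion function:
$$J(\beta) = \bar{m}(\beta)'A\,\bar{m}(\beta), \qquad \bar{m}(\beta) = \frac{1}{T}\sum_{t=1}^{T}m(y_t, \beta)$$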
which measures the “distance” between the sample moments and zero. $A$ is a weighting matrix that weights each moment condition. Any symmetric positive definite matrix $A$ will yield a consistent estimate of $\beta$. However, it can be shown that a necessary (but not sufficient) condition to obtain an (asymptotically) efficient estimate of $\beta$ is to set $A$ equal to the inverse of the covariance matrix of the sample moments $m$. This follows intuitively, since we want to put less weight on the conditions that are more imprecise.
To obtain GMM estimates in EViews, you must be able to write the moment conditions in Equation (39.33) as an orthogonality condition between the residuals of a regression equation, $u(y, X, \beta)$, and a set of instrumental variables, $Z$, so that:
$$m(\beta, y, X, Z) = Z'u(\beta, y, X) \qquad (39.36)$$
For example, the OLS estimator is obtained as a GMM estimator with the orthogonality conditions:
$$X'(y - X\beta) = 0$$
For the GMM estimator to be identified, there must be at least as many instrumental variables $Z$ as there are parameters $\beta$. See the section on “Generalized Method of Moments” for additional examples of GMM orthogonality conditions.
An important aspect of specifying a GMM problem is the choice of the weighting matrix $A$. EViews uses the optimal $A = \hat\Omega^{-1}$, where $\hat\Omega$ is the estimated long-run covariance matrix of the sample moments $m$. EViews uses the consistent TSLS estimates for the initial estimate of $\beta$ in forming the estimate of $\Omega$.
White’s Heteroskedasticity Consistent Covariance Matrix
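If you choose the GMM-Cross section option, EViews estimates $\Omega$ using White’s heteroskedasticity consistent covariance matrix:
$$\hat\Omega_W = \hat\Gamma(0) = \frac{1}{T}\sum_{t=1}^{T}Z_t'u_tu_t'Z_t$$
where $u_t$ is the $M$-vector of residuals at observation $t$ (one per equation), and $Z_t$ is an $M \times r$ matrix such that the $r$ moment conditions at $t$ may be written as $m(\beta, y_t, X_t, Z_t) = Z_t'u(\beta, y_t, X_t)$.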
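Two-step GMM with the White weighting matrix can be sketched for a single linear equation with moments $m_t = z_t u_t$, where $u_t = y_t - x_t'\beta$; the system case stacks the same pieces. Following the text, the first step uses TSLS (weighting matrix $(Z'Z)^{-1}$), its residuals feed $\hat\Omega_W$, and the second step uses $A = \hat\Omega_W^{-1}$. The helper name and simulated data are hypothetical; this is not EViews' code.

```python
import numpy as np

def gmm_two_step(y, X, Z):
    """Two-step linear GMM with moments z_t u_t and a White (cross-section) weighting matrix."""
    T = len(y)
    ZX, Zy = Z.T @ X, Z.T @ y
    # Step 1: TSLS, i.e. GMM with A = (Z'Z)^{-1}
    A = np.linalg.inv(Z.T @ Z)
    b1 = np.linalg.solve(ZX.T @ A @ ZX, ZX.T @ A @ Zy)
    # White estimate of Omega from the step-1 residuals: (1/T) sum_t z_t u_t^2 z_t'
    u = y - X @ b1
    Zu = Z * u[:, None]
    omega = Zu.T @ Zu / T
    # Step 2: optimal weighting A = Omega^{-1}
    W = np.linalg.inv(omega)
    b2 = np.linalg.solve(ZX.T @ W @ ZX, ZX.T @ W @ Zy)
    return b2, omega

# Hypothetical overidentified model with an endogenous, heteroskedastic error
rng = np.random.default_rng(6)
T = 2000
z = rng.normal(size=(T, 2))
v = rng.normal(size=T)
x = z @ np.array([1.0, 0.5]) + v
u = (0.5 + np.abs(z[:, 0])) * (0.7 * v + rng.normal(size=T))
X = np.column_stack([np.ones(T), x])
Z = np.column_stack([np.ones(T), z])            # 3 instruments, 2 parameters
y = X @ np.array([1.0, 2.0]) + u
b_gmm, omega = gmm_two_step(y, X, Z)

# Exactly identified case: GMM reduces to (Z'X)^{-1} Z'y
Z_ex = Z[:, :2]
b_exact, _ = gmm_two_step(y, X, Z_ex)
b_iv = np.linalg.solve(Z_ex.T @ X, Z_ex.T @ y)
```

When the model is exactly identified, the weighting matrix drops out and GMM reproduces the simple IV estimator, which the last lines confirm.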
Heteroskedasticity and Autocorrelation Consistent (HAC) Covariance Matrix
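If you choose the GMM-Time series option, EViews estimates $\Omega$ by,
$$\hat\Omega_{HAC} = \hat\Gamma(0) + \sum_{j=1}^{T-1}\kappa(j, q)\left(\hat\Gamma(j) + \hat\Gamma(j)'\right)$$
where
$$\hat\Gamma(j) = \frac{1}{T}\sum_{t=j+1}^{T}Z_t'u_tu_{t-j}'Z_{t-j}$$
You also need to specify the kernel function $\kappa$ and the bandwidth $q$.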
Kernel Options
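The kernel function $\kappa$ is used to weight the covariances so that $\hat\Omega$ is ensured to be positive semi-definite. EViews provides two choices for the kernel, Bartlett and quadratic spectral (QS). The Bartlett kernel is given by:
$$\kappa(j, q) = \begin{cases} 1 - \dfrac{j}{q} & 0 \le j \le q \\ 0 & \text{otherwise} \end{cases}$$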
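while the quadratic spectral (QS) kernel is given by:
$$\kappa(j, q) = \frac{25}{12(\pi x)^2}\left(\frac{\sin(6\pi x/5)}{6\pi x/5} - \cos(6\pi x/5)\right)$$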
where $x = j/q$. The QS kernel has a faster rate of convergence than the Bartlett kernel and is smooth and not truncated (Andrews 1991). Note that even though the QS kernel is not truncated, it still depends on the bandwidth $q$ (which need not be an integer).
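Both kernels and the HAC sum can be sketched in NumPy, writing each kernel as a function of $x = j/q$ and storing the sample moments as a $T \times r$ array whose rows are $m_t'$. This sketch assumes $\hat\Gamma(j) = T^{-1}\sum_t m_t m_{t-j}'$ with no small-sample correction; function names are hypothetical.

```python
import numpy as np

def bartlett(x):
    """Bartlett kernel in x = j/q: 1 - x for 0 <= x <= 1, else 0."""
    x = np.abs(np.asarray(x, dtype=float))
    return np.where(x <= 1.0, 1.0 - x, 0.0)

def quadratic_spectral(x):
    """Quadratic spectral kernel in x = j/q; its limit at x = 0 is 1."""
    x = np.asarray(x, dtype=float)
    z = 6.0 * np.pi * x / 5.0
    with np.errstate(divide="ignore", invalid="ignore"):
        k = 25.0 / (12.0 * (np.pi * x) ** 2) * (np.sin(z) / z - np.cos(z))
    return np.where(x == 0.0, 1.0, k)

def hac(m, q, kernel=bartlett):
    """Gamma(0) + sum_j kernel(j/q) (Gamma(j) + Gamma(j)'); rows of m (T, r) are m_t'."""
    T = m.shape[0]
    omega = m.T @ m / T                          # Gamma(0): the White estimate
    for j in range(1, T):
        w = float(kernel(j / q))
        if w != 0.0:
            G = m[j:].T @ m[:-j] / T             # Gamma(j) = (1/T) sum_t m_t m_{t-j}'
            omega += w * (G + G.T)
    return omega

moments = np.random.default_rng(7).normal(size=(50, 2))   # hypothetical sample moments
omega_bartlett = hac(moments, 1.0)
omega_qs = hac(moments, 2.5, kernel=quadratic_spectral)
```

With the Bartlett kernel and $q = 1$, every lag receives zero weight, so the estimate collapses to $\hat\Gamma(0)$; the QS kernel, by contrast, gives nonzero weight to every lag.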
Bandwidth Selection
The bandwidth $q$ determines how the weights given by the kernel change with the lags in the estimation of $\Omega$. Newey-West fixed bandwidth is based solely on the number of observations in the sample and is given by:
$$q = \operatorname{int}\left(4(T/100)^{2/9}\right)$$
where int( ) denotes the integer part of the argument.
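EViews also provides two “automatic”, or data-dependent, bandwidth selection methods that are based on the autocorrelations in the data. Both methods select the bandwidth according to the rule:
$$q = \begin{cases} 1.1447\left(\hat\alpha(1)T\right)^{1/3} & \text{for the Bartlett kernel} \\ 1.3221\left(\hat\alpha(2)T\right)^{1/5} & \text{for the QS kernel} \end{cases}$$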
The two methods, Andrews and Variable-Newey-West, differ in how they estimate $\hat\alpha(1)$ and $\hat\alpha(2)$.
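The fixed and automatic bandwidth rules are simple arithmetic. A sketch with hypothetical function names, taking $\hat\alpha(1)$ or $\hat\alpha(2)$ as an input (how that input is estimated is what distinguishes the Andrews and Newey-West methods):

```python
def nw_fixed_bandwidth(T):
    """Newey-West fixed bandwidth q = int(4 (T/100)^(2/9))."""
    return int(4 * (T / 100) ** (2 / 9))

def automatic_bandwidth(alpha, T, kernel="bartlett"):
    """Data-dependent rule: Bartlett uses alpha(1), QS uses alpha(2)."""
    if kernel == "bartlett":
        return 1.1447 * (alpha * T) ** (1 / 3)
    if kernel == "qs":
        return 1.3221 * (alpha * T) ** (1 / 5)
    raise ValueError("kernel must be 'bartlett' or 'qs'")
```

For example, $T = 1000$ gives a fixed bandwidth of $\operatorname{int}(4 \cdot 10^{2/9}) = \operatorname{int}(6.67) = 6$.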
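Andrews (1991) is a parametric method that assumes the sample moments follow an AR(1) process. We first fit an AR(1) to each sample moment (39.36) and estimate the autocorrelation coefficients $\hat\rho_i$ and the residual variances $\hat\sigma_i^2$ for $i = 1, 2, \ldots, r$, where $r$ is the number of moment conditions. Then $\hat\alpha(1)$ and $\hat\alpha(2)$ are estimated by:
$$\hat\alpha(1) = \left(\sum_{i=1}^{r}\frac{4\hat\rho_i^2\hat\sigma_i^4}{(1-\hat\rho_i)^6(1+\hat\rho_i)^2}\right) \Big/ \left(\sum_{i=1}^{r}\frac{\hat\sigma_i^4}{(1-\hat\rho_i)^4}\right)$$
$$\hat\alpha(2) = \left(\sum_{i=1}^{r}\frac{4\hat\rho_i^2\hat\sigma_i^4}{(1-\hat\rho_i)^8}\right) \Big/ \left(\sum_{i=1}^{r}\frac{\hat\sigma_i^4}{(1-\hat\rho_i)^4}\right)$$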
Note that we weight all moments equally, including the moment corresponding to the constant.
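A sketch of the Andrews computation, assuming each AR(1) is fitted by OLS without an intercept (the text does not say whether an intercept is included) and that the moments arrive as a $T \times r$ array; function names are hypothetical. Every moment receives the same weight, as noted above.

```python
import numpy as np

def fit_ar1(m):
    """OLS AR(1) without intercept for each column of m (T, r): returns rho_i and sigma_i^2."""
    lag, cur = m[:-1], m[1:]
    rho = (lag * cur).sum(axis=0) / (lag ** 2).sum(axis=0)
    resid = cur - lag * rho
    return rho, (resid ** 2).mean(axis=0)

def andrews_alphas(rho, s2):
    """alpha(1) and alpha(2) from AR(1) estimates, weighting every moment equally."""
    s4 = s2 ** 2
    denom = np.sum(s4 / (1 - rho) ** 4)
    a1 = np.sum(4 * rho ** 2 * s4 / ((1 - rho) ** 6 * (1 + rho) ** 2)) / denom
    a2 = np.sum(4 * rho ** 2 * s4 / (1 - rho) ** 8) / denom
    return a1, a2

a1, a2 = andrews_alphas(np.array([0.5]), np.array([2.0]))
rho, s2 = fit_ar1(np.array([[1.0], [2.0], [4.0], [8.0]]))
```

With a single moment, $\hat\sigma^4$ cancels and $\hat\alpha(2) = 4\hat\rho^2/(1-\hat\rho)^4$, so $\hat\rho = 0.5$ gives 16.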
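Newey-West (1994) is a nonparametric method based on a truncated weighted sum of the estimated cross-moments $\hat\Gamma(j)$. $\hat\alpha(1)$ and $\hat\alpha(2)$ are estimated by,
$$\hat\alpha(p) = \left(\frac{l'\hat{F}(p)l}{l'\hat{F}(0)l}\right)^2$$
where $l$ is a vector of ones and:
$$\hat{F}(0) = \hat\Gamma(0) + \sum_{i=1}^{L}\left(\hat\Gamma(i) + \hat\Gamma(i)'\right), \qquad \hat{F}(p) = \sum_{i=1}^{L}i^p\left(\hat\Gamma(i) + \hat\Gamma(i)'\right)$$
for $p = 1, 2$.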
One practical problem with the Newey-West method is that we have to choose a lag selection parameter $L$. The choice of $L$ is arbitrary, subject to the condition that it grow at a certain rate. EViews sets the lag parameter to:
$$L = \operatorname{int}\left(4(T/100)^{a}\right)$$
where $a = 2/9$ for the Bartlett kernel and $a = 2/25$ for the quadratic spectral kernel.
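A sketch of the Newey-West estimates, assuming $\hat\Gamma(j) = T^{-1}\sum_t m_t m_{t-j}'$, and that $\hat F(0)$ includes $\hat\Gamma(0)$ while $\hat F(p)$ for $p = 1, 2$ sums only the lagged terms, as in Newey and West (1994); function names are hypothetical. Pre- and post-multiplying by the vector of ones $l$ simply sums every element of the matrix.

```python
import numpy as np

def nw_lag(T, kernel="bartlett"):
    """Lag truncation L = int(4 (T/100)^a), a = 2/9 (Bartlett) or 2/25 (QS)."""
    a = 2 / 9 if kernel == "bartlett" else 2 / 25
    return int(4 * (T / 100) ** a)

def nw_alpha(m, p, L):
    """alpha(p) = (l'F(p)l / l'F(0)l)^2 from the (T, r) moment matrix m."""
    if L < 1:
        raise ValueError("L must be at least 1")
    T = m.shape[0]

    def gamma(j):
        return m[j:].T @ m[:T - j] / T            # Gamma(j) = (1/T) sum_t m_t m_{t-j}'

    F0 = gamma(0) + sum(gamma(i) + gamma(i).T for i in range(1, L + 1))
    Fp = sum(i ** p * (gamma(i) + gamma(i).T) for i in range(1, L + 1))
    return (Fp.sum() / F0.sum()) ** 2             # l'Fl is the sum of all elements

alt = np.array([[1.0], [-1.0], [1.0], [-1.0]])    # hypothetical alternating moment
```

For the alternating series, $\hat\Gamma(0) = 1$ and $\hat\Gamma(1) = -0.75$, so with $L = 1$ the ratio is $-1.5/-0.5 = 3$ and $\hat\alpha(1) = 9$.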
You can also choose to prewhiten the sample moments $m$ to “soak up” the correlations in $m$ prior to GMM estimation. We first fit a VAR(1) to the sample moments:
$$m_t = Am_{t-1} + v_t$$
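Then the variance $\Omega$ of $m$ is estimated by $\hat\Omega = (I - \hat{A})^{-1}\hat\Omega^*\left((I - \hat{A})^{-1}\right)'$, where $\hat\Omega^*$ is the long-run variance of the residuals $v_t$ computed using any of the above methods. The GMM estimator is then found by minimizing the criterion function:
$$u'Z\hat\Omega^{-1}Z'u$$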
Note that while Andrews and Monahan (1992) adjust the VAR estimates to avoid singularity when the moments are near unit root processes, EViews does not perform this eigenvalue adjustment.
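Prewhitening and recoloring can be sketched as follows, assuming the VAR(1) is fitted by OLS without an intercept and that the caller supplies the long-run variance routine for the VAR residuals. To keep the example short it passes $\hat\Gamma(0)$ of $v_t$ (no further kernel smoothing), which is appropriate here only because the simulated moment is an exact AR(1). Names are hypothetical, and, as in the text, no eigenvalue adjustment is applied.

```python
import numpy as np

def prewhitened_lrv(m, lrv):
    """Fit m_t = A m_{t-1} + v_t by OLS, get the LRV of v_t from `lrv`, then recolor."""
    lag, cur = m[:-1], m[1:]
    C = np.linalg.lstsq(lag, cur, rcond=None)[0]    # cur ~ lag @ C, so A = C'
    A = C.T
    v = cur - lag @ C                                # VAR(1) residuals
    D = np.linalg.inv(np.eye(m.shape[1]) - A)
    return D @ lrv(v) @ D.T, A                       # (I - A)^{-1} Omega* ((I - A)^{-1})'

# Hypothetical AR(1) moment with coefficient 0.5 and unit innovation variance
rng = np.random.default_rng(8)
T = 20000
e = rng.normal(size=T)
m = np.empty((T, 1))
m[0] = e[0]
for t in range(1, T):
    m[t] = 0.5 * m[t - 1] + e[t]
omega, A_hat = prewhitened_lrv(m, lambda v: v.T @ v / len(v))   # Gamma(0) of v_t only
```

For this process the true long-run variance is $1/(1 - 0.5)^2 = 4$.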
Multivariate ARCH
ARCH estimation uses maximum likelihood to jointly estimate the parameters of the mean and the variance equations.
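Assuming multivariate normality, the log likelihood contributions for GARCH models are given by:
$$l_t = -\frac{m}{2}\log(2\pi) - \frac{1}{2}\log\left|H_t\right| - \frac{1}{2}\epsilon_t'H_t^{-1}\epsilon_t$$
where $m$ is the number of mean equations, $\epsilon_t$ is the $m$-vector of mean equation residuals, and $H_t$ is the conditional covariance matrix. For Student's t-distribution, the contributions are of the form:
$$l_t = \log\Gamma\left(\frac{v+m}{2}\right) - \log\Gamma\left(\frac{v}{2}\right) - \frac{m}{2}\log\left((v-2)\pi\right) - \frac{1}{2}\log\left|H_t\right| - \frac{v+m}{2}\log\left(1 + \frac{\epsilon_t'H_t^{-1}\epsilon_t}{v-2}\right)$$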
where $v$ is the estimated degrees of freedom.
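Both log-likelihood contributions can be sketched with NumPy and the standard library's `math.lgamma`; `slogdet` and `solve` avoid forming $H_t^{-1}$ explicitly. The function names and inputs are hypothetical. As a sanity check under these assumptions, the Student's t contribution approaches the normal one as $v$ grows large.

```python
import numpy as np
from math import lgamma

def loglik_normal(eps, H):
    """l_t = -(m/2) log(2 pi) - (1/2) log|H_t| - (1/2) eps' H_t^{-1} eps."""
    m = eps.shape[0]
    _, logdet = np.linalg.slogdet(H)
    return -0.5 * m * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * eps @ np.linalg.solve(H, eps)

def loglik_t(eps, H, v):
    """Student's t contribution with v > 2 degrees of freedom; H is the covariance of eps."""
    m = eps.shape[0]
    _, logdet = np.linalg.slogdet(H)
    q = eps @ np.linalg.solve(H, eps)
    const = lgamma((v + m) / 2) - lgamma(v / 2) - 0.5 * m * np.log((v - 2) * np.pi)
    return const - 0.5 * logdet - 0.5 * (v + m) * np.log1p(q / (v - 2))

eps = np.array([0.5, -0.3])
H = np.array([[1.0, 0.2], [0.2, 2.0]])
ln_norm = loglik_normal(eps, H)
ln_t = loglik_t(eps, H, 1e6)       # very large v: close to the normal contribution
```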
Given a specification for the mean equation and a distributional assumption, all that we require is a specification for the conditional covariance matrix. We consider, in turn, each of the three basic specifications: Diagonal VECH, Constant Conditional Correlation (CCC), and Diagonal BEKK.
Diagonal VECH
Bollerslev et al. (1988) introduce a restricted version of the general multivariate VECH model of the conditional covariance with the following formulation:
$$H_t = Q + A \bullet \epsilon_{t-1}\epsilon_{t-1}' + B \bullet H_{t-1}$$
where the coefficient matrices $A$, $B$, and $Q$ are $N \times N$ symmetric matrices ($N$ being the number of equations), and the operator “•” is the element-by-element (Hadamard) product. The coefficient matrices may be parameterized in several ways. The most general way is to allow the parameters in the matrices to vary without any restrictions, i.e., parameterize them as indefinite matrices. In that case the model may be written in single equation format as:
$$(H_t)_{ij} = (Q)_{ij} + (A)_{ij}\epsilon_{j,t-1}\epsilon_{i,t-1} + (B)_{ij}(H_{t-1})_{ij}$$
where, for instance, $(H_t)_{ij}$ is the element in the i-th row and j-th column of matrix $H_t$.
Each matrix contains $N(N+1)/2$ parameters. This model is the most unrestricted version of a Diagonal VECH model. At the same time, it does not ensure that the conditional covariance matrix is positive semidefinite (PSD). As summarized in Ding and Engle (2001), there are several approaches for specifying coefficient matrices that restrict $H_t$ to be PSD, possibly by reducing the number of parameters. One example is:
$$Q = \tilde{Q}\tilde{Q}', \quad A = \tilde{A}\tilde{A}', \quad B = \tilde{B}\tilde{B}'$$
where the raw matrices $\tilde{A}$, $\tilde{B}$, and $\tilde{Q}$ are any $N \times N$ matrices up to rank $N$. For example, one may use the rank $N$ Cholesky factorized matrix of the coefficient matrix. This method is labeled Full Rank Matrix in the coefficient Restriction selection of the system ARCH dialog. While this method contains the same number of parameters as the indefinite version, it does ensure that the conditional covariance is PSD.
A second method, which we term Rank One, reduces the number of parameters estimated to $N$ and guarantees that the conditional covariance is PSD. In this case, the estimated raw matrix is restricted, with all but the first column of coefficients equal to zero.
In both of these specifications, the reported raw variance coefficients are elements of $\tilde{A}$, $\tilde{B}$, and $\tilde{Q}$. These coefficients must be transformed to obtain the matrices of interest: $A = \tilde{A}\tilde{A}'$, $B = \tilde{B}\tilde{B}'$, and $Q = \tilde{Q}\tilde{Q}'$. These transformed coefficients are reported in the extended variance coefficient section at the end of the system estimation results.
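A sketch of the Diagonal VECH recursion with Full Rank and Rank One parameterizations, assuming NumPy, hypothetical raw coefficient values, and arbitrary residuals used only to drive the filter (this is not an estimator). Because each of $Q$, $A$, and $B$ has the form $\tilde M\tilde M'$ and Hadamard products of PSD matrices are PSD (the Schur product theorem), every $H_t$ stays PSD when the starting value is PSD.

```python
import numpy as np

def coef_from_raw(raw, rank_one=False):
    """PSD coefficient matrix raw @ raw'; Rank One zeroes all but the first raw column."""
    raw = np.array(raw, dtype=float)
    if rank_one:
        raw[:, 1:] = 0.0
    return raw @ raw.T

def diag_vech_filter(eps, Q, A, B, H0):
    """H_t = Q + A • (eps_{t-1} eps_{t-1}') + B • H_{t-1}, with • the Hadamard product."""
    T, N = eps.shape
    H = np.empty((T, N, N))
    H[0] = H0
    for t in range(1, T):
        H[t] = Q + A * np.outer(eps[t - 1], eps[t - 1]) + B * H[t - 1]
    return H

# Hypothetical lower-triangular raw matrices for N = 2
Q = coef_from_raw([[0.3, 0.0], [0.1, 0.2]])                     # Full Rank
A = coef_from_raw([[0.3, 0.0], [0.2, 0.1]], rank_one=True)      # Rank One
B = coef_from_raw([[0.9, 0.0], [0.85, 0.1]])                    # Full Rank
eps = np.random.default_rng(9).normal(size=(200, 2))            # arbitrary residuals
H = diag_vech_filter(eps, Q, A, B, H0=np.eye(2))
```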
There are two other covariance specifications that you may employ. First, the values in a coefficient matrix (for example, $B$) may be a constant, so that:
$$B = b \cdot ii'$$
where $b$ is a scalar and $i$ is an $N \times 1$ vector of ones. This Scalar specification implies that for a particular term, the parameters of the variance and covariance equations are restricted to be the same. Alternately, the matrix coefficients may be parameterized as Diagonal so that all off-diagonal elements are restricted to be zero. In both of these parameterizations, the coefficients are not restricted to be positive, so that $H_t$ is not guaranteed to be PSD.
Lastly, for the constant matrix $Q$, we may also impose a Variance Target on the coefficients which restricts the values of the coefficient matrix so that:
$$Q = Q_0 \bullet (ii' - A - B)$$
where $Q_0$ is the unconditional sample variance matrix of the residuals. When using this option, the constant matrix is not estimated, reducing the number of estimated parameters.
You may specify a different type of coefficient matrix for each term. For example, if one estimates a multivariate GARCH(1,1) model with an indefinite matrix coefficient for the constant while specifying the coefficients of the ARCH and GARCH terms to be rank one matrices, then the number of parameters will be $N(N+1)/2 + 2N$, instead of $3N(N+1)/2$.
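The parameter count is simple bookkeeping per term. A hypothetical helper for a GARCH(1,1) Diagonal VECH, with the per-term counts taken from the descriptions above (the Variance Target entry applies only to the constant term):

```python
def n_variance_params(N, constant, arch, garch):
    """Variance-equation parameter count for a GARCH(1,1) Diagonal VECH (hypothetical helper)."""
    per_term = {
        "indefinite": N * (N + 1) // 2,
        "full_rank": N * (N + 1) // 2,   # same count as indefinite, but PSD by construction
        "rank_one": N,
        "diagonal": N,
        "scalar": 1,
        "variance_target": 0,            # constant term only: Q is implied, not estimated
    }
    return per_term[constant] + per_term[arch] + per_term[garch]
```

For $N = 3$, the mixed specification in the text has $6 + 3 + 3 = 12$ parameters, against $3 \times 6 = 18$ for three indefinite matrices.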
Constant Conditional Correlation (CCC)
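Bollerslev (1990) specifies the elements of the conditional covariance matrix as follows (shown here for a GARCH(1,1) variance specification):
$$(H_t)_{ii} = c_i + a_i\epsilon_{i,t-1}^2 + b_i(H_{t-1})_{ii}$$
$$(H_t)_{ij} = \rho_{ij}\sqrt{(H_t)_{ii}(H_t)_{jj}}$$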
Restrictions may be imposed on the constant term using variance targeting so that:
$$c_i = \sigma_{0,i}^2(1 - a_i - b_i)$$
where $\sigma_{0,i}^2$ is the unconditional variance.
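A sketch of the CCC covariance, assuming GARCH(1,1) variances for each equation, no exogenous variance regressors, and the sample mean of squared residuals as the unconditional variance for variance targeting; all names and values are hypothetical.

```python
import numpy as np

def ccc_filter(eps, c, a, b, R, h0):
    """GARCH(1,1) variance per equation, combined through a constant correlation matrix R."""
    T, N = eps.shape
    h = np.empty((T, N))
    h[0] = h0
    for t in range(1, T):
        h[t] = c + a * eps[t - 1] ** 2 + b * h[t - 1]
    sd = np.sqrt(h)
    return R[None, :, :] * sd[:, :, None] * sd[:, None, :]     # (H_t)_ij = rho_ij sqrt(h_ii h_jj)

def variance_target_constant(eps, a, b):
    """c_i = sigma_i^2 (1 - a_i - b_i), with sigma_i^2 the mean squared residual (assumption)."""
    return np.mean(eps ** 2, axis=0) * (1 - a - b)

eps = np.array([[0.5, -1.0], [1.0, 0.2], [-0.3, 0.4]])
c, a, b = np.array([0.1, 0.2]), np.array([0.1, 0.05]), np.array([0.8, 0.9])
R = np.array([[1.0, 0.3], [0.3, 1.0]])
H = ccc_filter(eps, c, a, b, R, h0=np.array([1.0, 4.0]))
c_vt = variance_target_constant(eps, a, b)
```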
When exogenous variables are included in the variance specification, you may choose between individual coefficients and common coefficients. For common coefficients, exogenous variables are assumed to have the same slope for every equation. Individual coefficients allow each exogenous variable effect to differ across equations.
Diagonal BEKK
BEKK (Engle and Kroner, 1995) is defined as:
$$H_t = \Omega\Omega' + A\epsilon_{t-1}\epsilon_{t-1}'A' + BH_{t-1}B'$$
EViews does not estimate the general form of BEKK in which $A$ and $B$ are unrestricted. However, a common and popular form, diagonal BEKK, may be specified that restricts $A$ and $B$ to be diagonal. This Diagonal BEKK model is identical to the Diagonal VECH model where the ARCH and GARCH coefficient matrices are rank one matrices. For convenience, EViews provides an option to estimate the Diagonal VECH model, but display the result in Diagonal BEKK form.
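The equivalence can be checked numerically for one step of the recursion. With $A = \operatorname{diag}(a)$, $A\epsilon\epsilon'A' = (aa') \bullet \epsilon\epsilon'$, so a Diagonal BEKK is a Diagonal VECH with rank one ARCH and GARCH matrices $aa'$ and $bb'$ and constant matrix $\Omega\Omega'$. The sketch uses hypothetical coefficient values and NumPy.

```python
import numpy as np

def diag_bekk_step(C, a, b, e_prev, H_prev):
    """One step of H_t = C C' + A e e' A' + B H B' with A = diag(a), B = diag(b); C plays the role of Omega."""
    A, B = np.diag(a), np.diag(b)
    return C @ C.T + A @ np.outer(e_prev, e_prev) @ A.T + B @ H_prev @ B.T

def as_diag_vech(C, a, b):
    """Equivalent Diagonal VECH coefficients: Q = CC', A = aa' (rank one), B = bb' (rank one)."""
    return C @ C.T, np.outer(a, a), np.outer(b, b)

# Hypothetical coefficients for N = 3
C = np.array([[0.2, 0.0, 0.0], [0.05, 0.1, 0.0], [0.02, 0.03, 0.15]])
a = np.array([0.3, 0.25, 0.2])
b = np.array([0.9, 0.92, 0.95])
e = np.random.default_rng(10).normal(size=3)
H_prev = np.eye(3) + 0.1
H_bekk = diag_bekk_step(C, a, b, e, H_prev)
Q_vech, A_vech, B_vech = as_diag_vech(C, a, b)
H_vech = Q_vech + A_vech * np.outer(e, e) + B_vech * H_prev
```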