User’s Guide : Multiple Equation Analysis : Vector Autoregression and Error Correction Models : Bayesian VAR
Bayesian VAR
Estimating a Bayesian VAR in EViews
Prior Type
Prior Specification
Litterman/Minnesota Prior
Normal-Wishart Prior
Sims-Zha Priors
An Example
Alternate priors
Technical Background
Litterman or Minnesota prior
Normal-Wishart prior
Sims-Zha priors
Sims-Zha normal-Wishart prior
Sims-Zha normal-flat prior
VARs are frequently used in the study of macroeconomic data. Because VARs require estimation of a large number of parameters, over-parameterization is often a problem, with too few observations available to estimate all of the parameters of the model.
One approach for solving this problem is shrinkage, where we impose restrictions on parameters to reduce the parameter set. Bayesian VAR (BVAR) methods (Litterman, 1986; Doan, Litterman, and Sims, 1984; Sims and Zha, 1998) are one popular approach for achieving shrinkage, since Bayesian priors provide a logical and consistent method of imposing parameter restrictions.
The remainder of this discussion describes the estimation of VARs with Bayesian shrinkage restrictions. We first describe the set of EViews tools for estimating and working with BVARs and provide examples of the approach. This first section assumes that you are familiar with the various methods outlined in the literature. The remaining section outlines the methods in somewhat more detail.
Estimating a Bayesian VAR in EViews
To estimate a Bayesian VAR in EViews, click on Quick/Estimate VAR... or type var in the command window to bring up the VAR Specification dialog. Select Bayesian VAR as the VAR type using the radio buttons on the left-hand side of the dialog.
The dialog will change to the BVAR version of the VAR Specification dialog. As with a standard VAR, you may use the Basics page to list the endogenous variables, the included lags, and any exogenous variables, and to specify the estimation sample:
The two BVAR specific tabs, Prior type and Prior specification, allow you to customize your specification. The following discussion of these settings assumes that you are familiar with the basics of the various prior types and associated settings. For additional detail, see “Technical Background”.
Prior Type
The Prior type tab lets you specify the type of prior you wish to use, along with options for calculating the initial residual covariance matrix.
You may use the drop-down menu to choose between Litterman/Minnesota, normal-Wishart, Sims-Zha normal-Wishart, and Sims-Zha normal-flat priors.
For the priors other than normal-Wishart, you may select a method for estimating the initial (or prior) residual covariance matrix, and whether you would like to apply a degrees-of-freedom correction to that estimate.
Prior Specification
The Prior specification tab lets you further specify the prior distributions by either assigning hyper-parameter values, or providing a user-supplied prior matrix. If you wish to assign hyper-parameter values, you should select the Hyper-parameters radio button in the Prior specification type box.
Litterman/Minnesota Prior
For the Litterman/Minnesota prior depicted here, you may specify the hyper-parameters using the four scalars $\mu_1$, $\lambda_1$, $\lambda_2$, and $\lambda_3$.
As described below, the prior mean is likely to have most or all of its elements set to zero to lessen the risk of over-fitting, and this implies that $\mu_1$ should be close to zero.
$\lambda_1$ is the overall tightness on the variance (of the first lag) and controls the relative importance of sample and prior information. Note that if $\lambda_1$ is small, prior information dominates the sample information. $\lambda_2$ represents the relative tightness of the variance of other variables. Setting $\lambda_2 = 0$ implies that the VAR collapses to a vector of univariate models. $\lambda_3$ represents the relative tightness of the variance of lags. For reference, Koop and Korobilis (2009) set $\lambda_3$ equal to 2, whereas Kadiyala and Karlsson (1997) choose $\lambda_3$ to be 1 (a special case, linear decay) for their particular application.
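For intuition about the lag-decay setting, note that the prior standard deviation on a lag-$l$ coefficient is proportional to $1/l^{\lambda_3}$, so larger values of $\lambda_3$ shrink coefficients on distant lags more aggressively. A minimal sketch (plain Python; the values are illustrative only):

```python
# Illustrate how the lag-decay hyper-parameter lambda3 shrinks the
# prior standard deviation of coefficients at increasing lags.
# The prior std on a lag-l coefficient is proportional to 1 / l**lambda3.

def lag_decay(lag, lambda3):
    """Relative prior standard deviation for a coefficient at `lag`."""
    return 1.0 / lag ** lambda3

for lambda3 in (1.0, 2.0):  # linear decay vs. the Koop-Korobilis choice of 2
    decays = [round(lag_decay(l, lambda3), 3) for l in (1, 2, 3, 4)]
    print(f"lambda3={lambda3}: {decays}")
```

With $\lambda_3 = 2$ the prior variance on the fourth lag is already shrunk by a factor of 256 relative to the first lag, compared with a factor of 16 under linear decay.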
To specify your own hyper-parameter values, select the User-specified radio button. If you choose User-specified, you should provide the following information:
Coefficient means. Fill in the edit box with the name of a vector in the workfile containing a prior mean for the coefficients.
Coefficient covariance. If desired, you may provide the name of a matrix containing a prior covariance for the coefficients.
Normal-Wishart Prior
For the normal-Wishart prior, you can specify the two hyper-parameters $\mu_1$ and $\lambda_1$ (where the prior coefficient mean and covariance are $\mu_1 \iota$ and $\lambda_1 I$, respectively, for an $n$-element unit vector $\iota$ and an identity matrix $I$).
Note that the prior covariance has a Kronecker form (to ensure natural conjugacy of the prior). This result implies that the prior covariance pattern in every equation is identical up to scale, which may be an undesirable restriction.
If you select User-specified you should enter the name of a vector in your workfile containing a prior mean for the coefficients.
Sims-Zha Priors
The hyper-parameters for both Sims-Zha priors may be specified by setting the five scalar values $\lambda_0$, $\lambda_1$, $\lambda_3$, $\mu_5$, and $\mu_6$.
The parameter $\mu_5$ is used to set prior weights on dummy observations for a sum-of-coefficients prior that implies beliefs about the presence of unit roots. $\mu_6$ controls the initial dummy observations. Note that the dummy variables can introduce correlations among coefficients, and therefore as $\mu_5 \to \infty$ and $\mu_6 \to \infty$, the prior imposes more constraints on the model. Specifically, $\mu_5 \to \infty$ implies that there are as many unit roots as variables and there is no cointegration. When $\mu_6 \to \infty$, the model tends to a form in which either all variables are stationary with means equal to the sample averages of the initial conditions (i.e., the dummies are set to the averages of the initial conditions), or there are unit root components without drift (linear trend) terms.
Following Litterman, the hyper-parameter $\lambda_1$ controls overall tightness, $\lambda_3$ controls the rate at which the prior variance shrinks with increasing lag length, and $\lambda_0$ controls the tightness of beliefs on the residual covariance.
If you select User-specified you should provide the name of a matrix containing a prior covariance for the coefficients in the H matrix edit box, and the name of a matrix containing a residual prior scale matrix in the Residual scale matrix edit box.
An Example
To illustrate the Bayesian approach, we now estimate the coefficients of a VAR(2) model using the first differences of the logarithms of investment (DLINVESTMENT), income (DLINCOME), and consumption (DLCONSUMPTION) from the example data. The raw data are provided in the EViews workfile “wgmacro.WF1”. This data set was examined by Lütkepohl (2007, page 228).
Click on Quick/Estimate VAR... to open the main VAR specification dialog. In the VAR type box, select Bayesian VAR and in the Endogenous Variables box, type:
dlincome dlinvestment dlconsumption
Here, you will see the pre-filled settings, including the variable names. You may change the default settings, but for now we assume that the default settings are used.
Next, click on the Prior type tab to select the prior type for the VAR. By default, EViews will choose the Litterman/Minnesota prior and the Univariate AR estimate for the Initial residual covariance options, but you can change the prior type and the initial covariance estimation option from the menus.
The Prior specification tab shows the hyper-parameter settings. Note that the settings may vary depending on the prior type. We will use the default settings for our example so that you may click on OK to continue.
EViews estimates the VAR and displays the results view. The top portion of the main results is shown below. The heading information provides the basic information about the settings used in estimation, and the basic prior information:
In his study of this data, Lütkepohl chose a set of hyper-parameters different from those set by default in EViews, and chose to use a diagonal VAR to estimate the initial residual covariance. We can replicate his results by selecting the Diagonal VAR estimate on the Prior type tab of the dialog.
Since the estimates in the third row of Table 5.3 of Lütkepohl’s example may be obtained using EViews’ default hyper-parameter values, click on OK to estimate the modified BVAR specification.
The results in the other rows of Table 5.3 may be obtained by changing the hyper-parameters. For example, to obtain the results in the fourth row, go to the Prior specification tab in the estimation dialog and change Lambda1 to 0.01:
Click on OK to estimate the updated specification. The resulting estimation output is displayed below:
Alternate priors
To illustrate the importance of the prior selection, we estimate the same model using the Sims-Zha normal-flat prior, with a univariate AR estimate for the initial residual covariance, and the default hyper-parameter settings.
The results of this estimation are shown below:
We can see that the point estimates of the coefficients have changed, in some cases by a large degree, when compared to our initial BVAR estimation using default settings. For example, the coefficient in the DLINVESTMENT equation for the lagged value of DLCONSUMPTION has decreased from a value of 0.272 to 0.004, with a corresponding change in t-statistic from 0.76 to 0.02.
Technical Background
Bayesian analysis requires knowledge of the distributional properties of the prior, likelihood, and posterior. In Bayesian statistics and econometrics, anything about which we are uncertain, including the true value of a parameter, can be thought of as a random variable to which we can assign a probability distribution.
The prior is external distributional information based on the researcher's beliefs about the parameters of interest. The likelihood is the data information contained in the sample probability density function (pdf). Combining the prior distribution with the data likelihood via Bayes' theorem yields the posterior distribution.
In particular, denote the parameters of interest in a given model by $\theta$ and the data by $y$. Let us say that the prior distribution is $p(\theta)$ and the likelihood is $p(y \mid \theta)$; then the posterior distribution $p(\theta \mid y)$ is the distribution of $\theta$ given the data $y$ and may be derived by

$p(\theta \mid y) = \dfrac{p(y \mid \theta)\, p(\theta)}{p(y)}$
Note that the denominator $p(y)$ is a normalizing constant which has no randomness, and thus the posterior is proportional to the product of the likelihood and the prior:

$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)$
The main goal of Bayesian estimation is to find the posterior moments of the parameters of interest. For instance, location and dispersion are the most common estimates, comparable to those obtained in classical estimation (namely the classical coefficient estimate and coefficient standard error). These point estimates can easily be derived from the posterior, because the posterior distribution contains all the available information on the parameter $\theta$.
To relate this general framework to Bayesian VAR (BVAR) models, suppose that we have the VAR(p) model:

$y_t = a_0 + A_1 y_{t-1} + \cdots + A_p y_{t-p} + \epsilon_t$

where $y_t$ for $t = 1, \ldots, T$ is an $m \times 1$ vector containing observations on $m$ different series and $\epsilon_t$ is an $m \times 1$ vector of errors, where we assume $\epsilon_t$ is i.i.d. $N(0, \Sigma)$. For compactness we may rewrite the model as:

$y = Z\beta + \epsilon$ (40.42)

where $y = (y_1', \ldots, y_T')'$ and $\epsilon = (\epsilon_1', \ldots, \epsilon_T')'$ are $mT \times 1$ vectors, $Z_t = I_m \otimes (1, y_{t-1}', \ldots, y_{t-p}')$ is an $m \times m(mp+1)$ matrix for $t = 1, \ldots, T$, $I_m$ is the identity matrix of dimension $m$, $Z = (Z_1', \ldots, Z_T')'$, and $\beta = \mathrm{vec}(a_0, A_1, \ldots, A_p)$. Using Equation (40.42) the likelihood function is

$p(y \mid \beta, \Sigma) \propto |\Sigma|^{-T/2} \exp\left\{-\tfrac{1}{2} \sum_{t=1}^{T} (y_t - Z_t \beta)' \Sigma^{-1} (y_t - Z_t \beta)\right\}$ (40.43)

To illustrate how to derive the posterior moments, let us assume $\Sigma$ is known and adopt a multivariate normal prior for $\beta$:

$\beta \sim N(\beta_0, V_0)$

where $\beta_0$ is the prior mean and $V_0$ is the prior covariance. When we combine this prior with the likelihood function in Equation (40.43), the posterior density can be written as

$p(\beta \mid y) \propto \exp\left\{-\tfrac{1}{2}\left[(\beta - \beta_0)' V_0^{-1} (\beta - \beta_0) + \sum_{t=1}^{T} (y_t - Z_t \beta)' \Sigma^{-1} (y_t - Z_t \beta)\right]\right\}$ (40.45)

which is a multivariate normal pdf. For simplicity, define

$\bar{V} = \left(V_0^{-1} + \sum_{t=1}^{T} Z_t' \Sigma^{-1} Z_t\right)^{-1}$

Then the exponent in Equation (40.45) can be written as

$(\beta - \bar{\beta})' \bar{V}^{-1} (\beta - \bar{\beta}) + \text{(terms not involving } \beta)$ (40.47)

where the posterior mean is

$\bar{\beta} = \bar{V}\left(V_0^{-1} \beta_0 + \sum_{t=1}^{T} Z_t' \Sigma^{-1} y_t\right)$

Since $\Sigma$ is known, the second term of Equation (40.47) has no randomness about $\beta$. The posterior therefore may be summarized as

$\beta \mid y \sim N(\bar{\beta}, \bar{V})$

and the posterior covariance is given by $\bar{V}$ as defined above.
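The posterior mean and covariance are straightforward to compute numerically. The following NumPy sketch forms them for a small simulated VAR(1) with known residual covariance; the data, dimensions, and prior values are illustrative assumptions, not taken from the example workfile:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, T = 2, 1, 50                      # 2 series, 1 lag, 50 observations
k = m * (m * p + 1)                     # total number of coefficients

# Simulate a stable VAR(1) with intercept, just to have data to work with.
A = np.array([[0.5, 0.1], [0.0, 0.4]])  # lag coefficient matrix
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
Y = np.zeros((T + 1, m))
for t in range(1, T + 1):
    Y[t] = 0.1 + Y[t - 1] @ A.T + rng.multivariate_normal(np.zeros(m), Sigma)

# Stack the model as y_t = Z_t beta + eps_t, with Z_t = I_m kron x_t'.
Sigma_inv = np.linalg.inv(Sigma)
V0_inv = np.eye(k) / 10.0               # prior precision (prior covariance 10*I)
beta0 = np.zeros(k)                     # zero prior mean

precision = V0_inv.copy()               # accumulates V0^-1 + sum Z_t' Sigma^-1 Z_t
rhs = V0_inv @ beta0                    # accumulates V0^-1 beta0 + sum Z_t' Sigma^-1 y_t
for t in range(1, T + 1):
    x_t = np.concatenate(([1.0], Y[t - 1]))   # constant plus lagged values
    Z_t = np.kron(np.eye(m), x_t)             # m x k design for observation t
    precision += Z_t.T @ Sigma_inv @ Z_t
    rhs += Z_t.T @ Sigma_inv @ Y[t]

V_bar = np.linalg.inv(precision)        # posterior covariance
beta_bar = V_bar @ rhs                  # posterior mean
print(beta_bar.round(3))
```

With a large sample the posterior mean approaches the classical estimates, while a tighter prior (larger `V0_inv`) pulls the coefficients toward the prior mean of zero.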
A fundamental feature of Bayesian econometrics is the formulation of the prior distribution of the parameters, based upon information which reflects researchers’ beliefs. A proper Bayesian analysis will incorporate the prior information to strengthen inferences about the true value of the parameters. An obvious argument against the use of prior distributions is that a prior is intrinsically subjective and therefore offers the potential for manipulation.
EViews offers four different priors which have been popular in the BVAR literature:
1. The Litterman/Minnesota prior: a normal prior on $\beta$ with fixed $\Sigma$.
2. The normal-Wishart prior: a normal prior on $\beta$ and a Wishart prior on $\Sigma^{-1}$.
3. The Sims-Zha normal-Wishart prior.
4. The Sims-Zha normal-flat prior: a normal prior on $\beta$ and a non-informative (flat) prior on $\Sigma$.
It is worth noting that EViews only offers conjugate priors (priors whose posterior has the same distributional family as the prior distribution). This restriction allows for analytical calculation of the Bayesian VAR estimates, rather than the simulation-based estimation (e.g., MCMC methods) that is generally required otherwise. It is also worth noting that the choice of priors does not imply the need for different Bayesian estimation techniques. Disagreement over the priors may be addressed by post-estimation sensitivity analysis, evaluating the robustness of posterior quantities of interest to different prior specifications.
Litterman or Minnesota prior
Early work on Bayesian VAR priors was done by researchers at the University of Minnesota and the Federal Reserve Bank of Minneapolis (see Litterman (1986) and Doan, Litterman, and Sims (1984)), and these early priors are often referred to as the “Litterman prior” or the “Minnesota prior”. This family of priors is based on the assumption that $\Sigma$ is known, replacing $\Sigma$ with its estimate $\hat{\Sigma}$. This assumption yields simplifications in prior elicitation and computation of the posterior.
EViews offers three choices for the estimator $\hat{\Sigma}$:
Univariate AR: $\hat{\Sigma}$ is restricted to be a diagonal matrix, where $\hat{\sigma}_{ii}$, the i-th diagonal element of $\hat{\Sigma}$, is the standard OLS estimate of the error variance calculated from a univariate AR regression using the i-th variable.
Full VAR: estimates a standard classical VAR and uses the covariance matrix from that estimation as the initial estimate of $\Sigma$. This choice is not always feasible, since there may not be enough observations to estimate the full VAR.
Diagonal VAR: $\hat{\Sigma}$ is restricted to be a diagonal matrix (as in the Univariate AR estimator); however, the diagonal elements of the matrix are calculated from the full classical VAR (i.e., the diagonal elements are equal to those of the Full VAR method, and the off-diagonal elements are set to zero).
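The three choices can be illustrated with a small NumPy sketch. This is an illustrative implementation of the descriptions above, not EViews internals; the function names are assumptions:

```python
import numpy as np

def ols_residuals(y, X):
    """Residuals from an OLS regression of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

def initial_cov(Y, p=1, method="univariate_ar"):
    """Initial residual covariance estimates mimicking the three choices.

    Y is a T x m data matrix; p is the lag length.  Hypothetical helper
    following the descriptions above, not the EViews implementation."""
    T, m = Y.shape
    # Common regressor matrix: constant plus p lags of all series.
    X = np.hstack([np.ones((T - p, 1))] +
                  [Y[p - l - 1:T - l - 1] for l in range(p)])
    if method == "univariate_ar":
        # Diagonal matrix of error variances from univariate AR(p) fits.
        variances = []
        for i in range(m):
            Xi = np.hstack([np.ones((T - p, 1))] +
                           [Y[p - l - 1:T - l - 1, [i]] for l in range(p)])
            e = ols_residuals(Y[p:, i], Xi)
            variances.append(e @ e / len(e))
        return np.diag(variances)
    resid = np.column_stack([ols_residuals(Y[p:, i], X) for i in range(m)])
    full = resid.T @ resid / len(resid)       # full classical VAR covariance
    if method == "full_var":
        return full
    if method == "diagonal_var":
        return np.diag(np.diag(full))         # keep only the diagonal
    raise ValueError(method)
```

Note that, as described above, the diagonal VAR estimate shares its diagonal with the full VAR estimate but discards the covariances.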
Since $\Sigma$ is replaced by $\hat{\Sigma}$, we need only specify a prior for the VAR coefficient $\beta$. The Litterman prior assumes that the prior of $\beta$ is

$\beta \sim N(\mu_1, V)$

(where the hyper-parameter $\mu_1 = 0$ indicates a zero-mean model) with nonzero prior covariance $V$. Note that although the choice of zero mean lessens the risk of over-fitting, theoretically any value for $\mu_1$ is possible.
To explain the Minnesota/Litterman prior for the covariance $V$, note that the explanatory variables in any equation of the VAR can be divided into own lags of the dependent variable, lags of the other dependent variables, and finally any exogenous variables, including the constant term. The elements of $V$ corresponding to exogenous variables are set to infinity (i.e., no information about the exogenous variables is contained within the prior).
The remainder of $V$ is then a diagonal matrix whose diagonal elements, for lags $l = 1, \ldots, p$, are

$\left(\dfrac{\lambda_1}{l^{\lambda_3}}\right)^2$ for coefficients on own lags

$\left(\dfrac{\lambda_1 \lambda_2\, \sigma_i}{l^{\lambda_3}\, \sigma_j}\right)^2$ for coefficients on lags of variable $j$ in equation $i$ ($i \neq j$)

where $\sigma_i^2$ is the i-th diagonal element of $\hat{\Sigma}$.
This prior setting simplifies the complicated choice of specifying all the elements of $V$ down to choosing three scalars $\lambda_1$, $\lambda_2$, and $\lambda_3$. The first two scalars, $\lambda_1$ and $\lambda_2$, are the overall tightness and the relative cross-variable weight, respectively. $\lambda_3$ captures lag decay: as lag length increases, coefficients are increasingly shrunk toward zero.
Note that changes in these hyper-parameter values may lead to smaller (or larger) variances of coefficients, which is called tightening (or loosening) the prior. The exact choice of values for these three scalars depends on the empirical application, so researchers are encouraged to experiment with different values. Litterman (1986) provides additional discussion of these choices.
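To make the construction concrete, the following sketch builds the prior variances for a small system using one common parameterization of the rules above (own-lag variance $(\lambda_1/l^{\lambda_3})^2$, cross-lag variance $(\lambda_1\lambda_2\sigma_i/(l^{\lambda_3}\sigma_j))^2$). The helper is hypothetical, not EViews code:

```python
import numpy as np

def minnesota_prior_variances(sigma2, p, lam1=0.1, lam2=0.99, lam3=1.0):
    """Prior variances of the lag coefficients under a Minnesota prior.

    sigma2: length-m vector of residual variances (diagonal of Sigma-hat).
    Returns an m x m x p array v[i, j, l-1] with the prior variance of the
    coefficient on lag l of variable j in equation i.  Hypothetical helper
    following one common parameterization, not EViews internals."""
    m = len(sigma2)
    sigma = np.sqrt(np.asarray(sigma2, dtype=float))
    v = np.empty((m, m, p))
    for l in range(1, p + 1):
        decay = l ** lam3
        for i in range(m):
            for j in range(m):
                if i == j:                      # own lags
                    v[i, j, l - 1] = (lam1 / decay) ** 2
                else:                           # lags of other variables
                    v[i, j, l - 1] = (lam1 * lam2 * sigma[i]
                                      / (decay * sigma[j])) ** 2
    return v

v = minnesota_prior_variances([1.0, 4.0], p=2)
print(v[0, 0, 0], v[0, 1, 0])   # own-lag vs cross-lag variance in equation 1
```

Tightening the prior (smaller $\lambda_1$) scales every entry down; setting $\lambda_2$ near zero shuts off the influence of other variables, collapsing the system toward univariate models.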
Given this choice of prior, the posterior for $\beta$ takes the form

$\beta \mid y \sim N(\bar{\beta}, \bar{V})$

where the posterior mean $\bar{\beta}$ and covariance $\bar{V}$ take the forms derived above, evaluated at $\Sigma = \hat{\Sigma}$.
A primary advantage of the Minnesota/Litterman prior is that it leads to simple posterior inference. The prior does not, however, provide a full Bayesian treatment of as an unknown, so it ignores uncertainty in this parameter.
Normal-Wishart prior
When the assumption that $\Sigma$ is known is loosened, a prior for the residual covariance can also be chosen. One well-known conjugate prior for normal data is the normal-Wishart:

$\beta \mid \Sigma \sim N(\mu_1, \Sigma \otimes V_0)$

where the prior coefficient mean and covariance are governed by the two prior hyper-parameters $\mu_1$ and $\lambda_1$, and

$\Sigma^{-1} \sim W(S_0^{-1}, \nu_0)$

where $\nu_0$ is the degrees of freedom and $S_0$ is the scale matrix. Any values for the hyper-parameters can be chosen; however, it is worth noting that a non-informative prior is obtained by letting the prior precision and the degrees of freedom go to zero. It can be seen that the non-informative prior leads to a posterior based on OLS quantities, identical to classical VAR estimation results.
According to the Bayes updating rule, the posterior is again normal-Wishart:

$\beta \mid \Sigma, y \sim N(\bar{\beta}, \Sigma \otimes \bar{V}), \qquad \Sigma^{-1} \mid y \sim W(\bar{S}^{-1}, T + \nu_0)$

with the standard OLS estimate entering the posterior mean $\bar{\beta}$, posterior covariance factor $\bar{V} = (V_0^{-1} + Z'Z)^{-1}$, and posterior scale matrix $\bar{S}$ combining the prior scale with the OLS residual moments.
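The textbook normal-Wishart updating can be sketched in matrix form for the regression $Y = ZB + E$. This is an illustrative implementation of the standard conjugate formulas (as in, e.g., Kadiyala and Karlsson, 1997), not EViews code:

```python
import numpy as np

def normal_wishart_update(Y, Z, B0, V0, S0, nu0):
    """Posterior moments for Y = Z B + E under the normal-Wishart prior
    B | Sigma ~ MN(B0, Sigma kron V0), Sigma^-1 ~ W(S0^-1, nu0).

    Standard textbook conjugate formulas; an illustrative sketch only."""
    T = Y.shape[0]
    V0_inv = np.linalg.inv(V0)
    ZtZ = Z.T @ Z
    B_ols = np.linalg.solve(ZtZ, Z.T @ Y)          # OLS estimate
    V_bar = np.linalg.inv(V0_inv + ZtZ)            # posterior column covariance
    B_bar = V_bar @ (V0_inv @ B0 + ZtZ @ B_ols)    # posterior coefficient mean
    E_ols = Y - Z @ B_ols
    # Posterior scale: prior scale + OLS residual moment + shrinkage penalty.
    D = B_ols - B0
    S_bar = (S0 + E_ols.T @ E_ols
             + D.T @ np.linalg.inv(V0 + np.linalg.inv(ZtZ)) @ D)
    nu_bar = nu0 + T                               # posterior degrees of freedom
    return B_bar, V_bar, S_bar, nu_bar
```

The posterior mean is a precision-weighted average of the prior mean and the OLS estimate, which is why a diffuse prior reproduces the classical VAR results.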
Since the natural conjugate priors have the same distributional form for the prior, likelihood, and posterior, the prior can be interpreted as a set of dummy observations. In the following section, we discuss how this interpretation is used to develop priors for structural VARs.
Sims-Zha priors
Sims and Zha (1998) show how the dummy observations approach can be used to elicit priors for structural VAR models. To illustrate the Sims-Zha priors, suppose that the series are contemporaneously correlated, so that the model can be written as:

$y_t' A_0 = x_t' A_+ + \epsilon_t'$

where $x_t = (y_{t-1}', \ldots, y_{t-p}', 1)'$ and $\epsilon_t \sim N(0, I)$. Note that, given appropriate identifying restrictions, there is a mapping from the parameters of the reduced form VAR to the structural VAR. This form can also be written in multivariate regression form by defining $A_+$ to be the matrix of coefficients on the lagged variables:

$Y A_0 = X A_+ + E$

where $Y$ is $T \times m$, $A_0$ is $m \times m$, $X$ is $T \times k$, $A_+$ is $k \times m$, and $E$ is $T \times m$. Note that $X$ contains the lagged $y$'s and a column of 1's corresponding to the constant.
Sims and Zha suggest a conditional prior (the Sims-Zha prior) on $A_0$ and $A_+$. In particular,

$p(A_0, A_+) = p(A_0)\, p(A_+ \mid A_0)$ (40.51)

where $p(A_0)$ is a marginal distribution of $A_0$ and $p(A_+ \mid A_0)$ is a normal density. Note that EViews sets the marginal distribution of $A_0$ to be normal. The conditional likelihood can be expressed in a compact form:

$p(Y \mid A_0, A_+) \propto |A_0|^{T} \exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\left[(Y A_0 - X A_+)'(Y A_0 - X A_+)\right]\right\}$ (40.52)

Combining Equation (40.51) and Equation (40.52), we can derive the posterior density, where $\mathrm{vec}$ denotes matrix vectorization. Since this posterior has a nonstandard form, a direct analysis of the likelihood may be computationally infeasible. However, the conditional posterior distribution of $\mathrm{vec}(A_+)$ given $A_0$ is normal and can be derived analytically.
This specification differs from the Litterman/Minnesota case in a few respects. First, there is no distinction between the prior variances on own lags and those on other lags. Second, there is only one scale factor $\sigma_j$ in the denominator, rather than the ratio of scale factors $\sigma_i / \sigma_j$. In particular, the conditional prior standard deviation for the coefficient on the l-th lag of variable $j$, for $l = 1, \ldots, p$ and $j = 1, \ldots, m$, is written as

$\dfrac{\lambda_0 \lambda_1}{\sigma_j\, l^{\lambda_3}}$

where $\sigma_j^2$ is the j-th diagonal element of $\hat{\Sigma}$.
EViews offers two different choices for the estimate $\hat{\Sigma}$: Univariate AR and Diagonal VAR, as described above. The three hyper-parameters $\lambda_0$, $\lambda_1$, and $\lambda_3$ reflect general beliefs about the VAR, and in practice these are specified on the basis of the researcher's prior knowledge. Specifically, $\lambda_0$ is the overall tightness of beliefs on $\Sigma$, $\lambda_1$ is the standard deviation around the first-lag coefficients, and $\lambda_3$ represents lag decay.
Based on the recognition that prior information can be treated as dummy observations, Sims and Zha suggest two extra sets of dummy observations: ($Y^{d1}$, $X^{d1}$), which account for unit roots, and ($Y^{d2}$, $X^{d2}$), which account for trends. The model is then written with the dummy observations appended to the data matrices $Y$ and $X$.
The first set of dummies contains one observation per variable and is given by

$Y^{d1} = \mu_5\, \mathrm{diag}(\bar{y}_0), \qquad X^{d1} = (Y^{d1}, \ldots, Y^{d1}, 0)$

with the block $Y^{d1}$ repeated once for each of the $p$ lags, where $\bar{y}_0$ is the vector of averages of the initial values of each variable and the hyper-parameter $\mu_5$ expresses the strength of beliefs about the presence of unit roots. Note that the last columns of $X^{d1}$, which correspond to the constant term and any exogenous variables, are set to zero.
The second set of dummies reflects a belief that the average of the initial values of variable $i$ (i.e., the average of its first $p$ observations) is likely to be a good forecast of $y_{it}$. The dummies for the initial observation form the single row

$Y^{d2} = \mu_6\, \bar{y}_0', \qquad X^{d2} = (\mu_6\, \bar{y}_0', \ldots, \mu_6\, \bar{y}_0', \; \mu_6)$

where the nonzero constant-term element $\mu_6$ allows for a common trend.
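One common construction of these dummy observations can be sketched as follows. The helper name and layout are illustrative assumptions based on the description above, not EViews internals:

```python
import numpy as np

def sims_zha_dummies(Y0, p, mu5=1.0, mu6=1.0):
    """Dummy observations built from Y0, the p initial observations (p x m).

    Returns (Yd, Xd), where each X row is (lag-1 block, ..., lag-p block,
    constant).  Illustrative sketch of the sum-of-coefficients (mu5) and
    initial-observation (mu6) dummies, not the EViews implementation."""
    _, m = Y0.shape
    ybar = Y0.mean(axis=0)                 # averages of initial conditions

    # Sum-of-coefficients dummies: one row per variable; variable i's average
    # appears in the Y block and in every lag block; constant column is zero.
    Yd1 = mu5 * np.diag(ybar)
    Xd1 = np.hstack([np.tile(mu5 * np.diag(ybar), (1, p)),
                     np.zeros((m, 1))])

    # Initial-observation dummy: a single row with mu6 * ybar everywhere,
    # including the constant column (allowing for a common trend).
    Yd2 = (mu6 * ybar)[None, :]
    Xd2 = np.hstack([np.tile(mu6 * ybar, p), [mu6]])[None, :]

    return np.vstack([Yd1, Yd2]), np.vstack([Xd1, Xd2])
```

Stacking these rows on top of the actual data before estimation is equivalent to imposing the unit-root and common-trend beliefs, with $\mu_5$ and $\mu_6$ acting as the weights on the artificial observations.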
EViews provides two different Sims-Zha priors, normal-Wishart and normal-flat, which apply different distributional families to the covariance matrix (i.e., $\Sigma$ can follow either a Wishart or a flat distribution). It is worth noting that the normal-flat prior places non-informative information on the covariance matrix.
Sims-Zha normal-Wishart prior
For notational consistency, let us denote the coefficient parameter by $b = \mathrm{vec}(A_+)$. For the natural conjugate normal-Wishart prior, the prior mean is given as in Equation (40.54) and its posterior is updated as in Equation (40.55). The prior for the covariance is a Wishart distribution,

$\Sigma^{-1} \sim W(S_0^{-1}, \nu_0)$

where $\nu_0$ is the degrees of freedom and $S_0$ is the scale matrix. The posterior is analytically calculated as a Wishart distribution with an updated scale matrix and degrees of freedom $T + \nu_0$.
Sims-Zha normal-flat prior
The normal-flat prior is a weak conjugate prior which carries no meaningful prior information on $\Sigma$.
After some mathematical calculation, the posteriors may again be derived analytically.
Note that the coefficient parameter is updated by the rule in Equation (40.55).