Background
The following discussion describes only the basic features of switching models. Switching models have a long history in economics that is detailed in numerous surveys (Goldfeld and Quandt, 1973, 1976; Maddala, 1986; Hamilton, 1994; Frühwirth-Schnatter, 2006), and we encourage you to explore these resources for additional discussion.
The Basic Model
Suppose that the random variable of interest, $y_t$, follows a process that depends on the value of an unobserved discrete state variable $s_t$. We assume there are $M$ possible regimes, and we are said to be in state or regime $m$ in period $t$ when $s_t = m$, for $m = 1, \ldots, M$.
The switching model assumes that there is a different regression model associated with each regime. Given regressors $X_t$ and $Z_t$, the conditional mean of $y_t$ in regime $m$ is assumed to be the linear specification:
$$\mu_t(m) = X_t'\beta_m + Z_t'\gamma \tag{35.1}$$
where $\beta_m$ and $\gamma$ are $k_X$ and $k_Z$ vectors of coefficients. Note that the $\beta_m$ coefficients for $X_t$ are indexed by regime and that the $\gamma$ coefficients associated with $Z_t$ are regime invariant.
Lastly, we assume that the regression errors are normally distributed with variance that may depend on the regime. Then we have the model:
$$y_t = \mu_t(m) + \sigma(m)\,\epsilon_t \tag{35.2}$$
when $s_t = m$, where $\epsilon_t$ is standard normally distributed. Note that the standard deviation $\sigma$ may be regime dependent, $\sigma(m) = \sigma_m$.
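The likelihood contribution for a given observation may be formed by weighting the density function in each of the regimes by the one-step ahead probability of being in that regime:
$$L_t(\beta, \gamma, \sigma, \delta) = \sum_{m=1}^{M} \frac{1}{\sigma_m}\,\phi\!\left(\frac{y_t - \mu_t(m)}{\sigma_m}\right) P(s_t = m \mid \mathcal{I}_{t-1}, \delta) \tag{35.3}$$
Here $\beta = (\beta_1, \ldots, \beta_M)$, $\sigma = (\sigma_1, \ldots, \sigma_M)$, $\delta$ are parameters that determine the regime probabilities, $\phi(\cdot)$ is the standard normal density function, and $\mathcal{I}_{t-1}$ is the information set in period $t-1$. In the simplest case, the $\delta$ represent the regime probabilities themselves.
The full log-likelihood is a normal mixture
$$l(\beta, \gamma, \sigma, \delta) = \sum_{t=1}^{T} \log\left\{ \sum_{m=1}^{M} \frac{1}{\sigma_m}\,\phi\!\left(\frac{y_t - \mu_t(m)}{\sigma_m}\right) P(s_t = m \mid \mathcal{I}_{t-1}, \delta) \right\} \tag{35.4}$$
which may be maximized with respect to $(\beta, \gamma, \sigma, \delta)$.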
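As a rough illustration of the normal mixture log-likelihood, the following sketch evaluates it for given regime means, standard deviations, and one-step ahead regime probabilities. The function and argument names (`mixture_loglik`, `mu`, `prob`) are assumptions made for this example and do not correspond to any particular software interface.

```python
import numpy as np

def normal_pdf(z):
    """Standard normal density."""
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

def mixture_loglik(y, mu, sigma, prob):
    """Normal mixture log-likelihood (illustrative sketch).

    y     : (T,)   observations
    mu    : (T, M) conditional mean in each regime
    sigma : (M,)   regime standard deviations
    prob  : (T, M) one-step ahead regime probabilities (rows sum to 1)
    """
    # Density of each observation under each regime, shape (T, M)
    dens = normal_pdf((y[:, None] - mu) / sigma) / sigma
    # Weight by the regime probabilities, sum over regimes, then over time
    return np.sum(np.log(np.sum(dens * prob, axis=1)))
```

In practice the means and probabilities would themselves be functions of the parameters, and the function above would be passed (with its sign flipped) to a numerical optimizer.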
Simple Switching
To this point, we have treated the regime probabilities in an abstract fashion. This section considers a simple switching model featuring independent regime probabilities. We begin by focusing on the specification of the regime probabilities, then describe likelihood evaluation and estimation of those probabilities.
It should be emphasized that the following discussion is valid only for specifications with uncorrelated errors. Models with correlated errors are described in “Serial Correlation”.
Regime Probabilities
In the case where the probabilities are constant values, we could simply treat them as additional parameters in the likelihood in Equation (35.4). More generally, we may allow for varying probabilities by assuming that $p_m$ is a function of vectors of exogenous observables $G_{t-1}$ and coefficients $\delta$ parameterized using a multinomial logit specification:
$$P(s_t = m \mid \mathcal{I}_{t-1}, \delta) = p_m(G_{t-1}, \delta) = \frac{\exp(G_{t-1}'\delta_m)}{\sum_{j=1}^{M} \exp(G_{t-1}'\delta_j)} \tag{35.5}$$
for $\delta = (\delta_1, \ldots, \delta_M)$ with the identifying normalization $\delta_M = 0$. The special case of constant probabilities is handled by choosing $G_{t-1}$ to be identically equal to 1.
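A minimal sketch of the multinomial logit parameterization follows, assuming the coefficients for the first $M-1$ regimes are stored column by column and the last regime's coefficients are fixed at zero. The name `logit_probs` and the array layout are illustrative choices only.

```python
import numpy as np

def logit_probs(G, delta_free):
    """Multinomial logit regime probabilities (illustrative sketch).

    G          : (T, k)   probability regressors, one row per period
    delta_free : (k, M-1) coefficients for regimes 1..M-1;
                 regime M's coefficients are normalized to zero
    Returns a (T, M) array whose rows sum to one.
    """
    T = G.shape[0]
    # Append a zero score for the normalized last regime
    scores = np.hstack([G @ delta_free, np.zeros((T, 1))])
    # Subtract the row maximum for numerical stability (ratios unchanged)
    scores -= scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)
```

Passing a column of ones for `G` gives the constant-probability case, with the single coefficient acting as a log-odds relative to the last regime.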
Likelihood Evaluation
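We may use Equation (35.4) and Equation (35.5) to obtain a normal mixture log-likelihood function:
$$l(\beta, \gamma, \sigma, \delta) = \sum_{t=1}^{T} \log\left\{ \sum_{m=1}^{M} \frac{1}{\sigma_m}\,\phi\!\left(\frac{y_t - \mu_t(m)}{\sigma_m}\right) p_m(G_{t-1}, \delta) \right\} \tag{35.6}$$
This likelihood may be maximized with respect to the parameters $(\beta, \gamma, \sigma, \delta)$ using iterative methods.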
It is worth noting that the likelihood function for this normal mixture model is unbounded for certain parameter values. However, local optima have the usual consistency, asymptotic normality, and efficiency properties. See Maddala (1986) for discussion of this issue as well as a survey of different algorithms and approaches for estimating the parameters.
Given parameter point-estimates, coefficient covariances may be estimated using conventional methods, e.g., inverse negative Hessian, inverse outer-product of the scores, and robust sandwich.
Filtering
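The likelihood expression in Equation (35.6) depends on the one-step ahead probabilities of being in a regime: $P(s_t = m \mid \mathcal{I}_{t-1})$. Note, however, that observing the value of the dependent variable in a given period provides additional information about which regime is in effect. We may use this contemporaneous information to obtain updated estimates of the regime probabilities.
The process by which the probability estimates are updated is commonly termed filtering. By Bayes’ theorem and the laws of conditional probability, we have the filtering expressions:
$$P(s_t = m \mid \mathcal{I}_t) = P(s_t = m \mid y_t, \mathcal{I}_{t-1}) = \frac{f(y_t \mid s_t = m, \mathcal{I}_{t-1})\, P(s_t = m \mid \mathcal{I}_{t-1})}{f(y_t \mid \mathcal{I}_{t-1})} \tag{35.7}$$
The expressions on the right-hand side are obtained as a by-product of the densities obtained during likelihood evaluation. Substituting, we have:
$$P(s_t = m \mid \mathcal{I}_t) = \frac{\dfrac{1}{\sigma_m}\,\phi\!\left(\dfrac{y_t - \mu_t(m)}{\sigma_m}\right) p_m(G_{t-1}, \delta)}{\sum_{j=1}^{M} \dfrac{1}{\sigma_j}\,\phi\!\left(\dfrac{y_t - \mu_t(j)}{\sigma_j}\right) p_j(G_{t-1}, \delta)} \tag{35.8}$$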
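The Bayes update for the simple switching model can be sketched as follows. This is an illustration under the assumption that the regime means, standard deviations, and one-step ahead probabilities have already been computed; `filter_simple` is a hypothetical name.

```python
import numpy as np

def filter_simple(y, mu, sigma, prob):
    """Filtered regime probabilities for independent switching (sketch).

    y     : (T,)   observations
    mu    : (T, M) regime-specific conditional means
    sigma : (M,)   regime standard deviations
    prob  : (T, M) one-step ahead regime probabilities
    """
    z = (y[:, None] - mu) / sigma
    dens = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * sigma)
    # Numerator of Bayes' rule: regime density times prior probability
    joint = dens * prob
    # Denominator: the likelihood contribution for each observation
    return joint / joint.sum(axis=1, keepdims=True)
```

When an observation is equally likely under every regime the filtered probabilities equal the one-step ahead values; the further apart the regime densities, the more the update moves toward the better-fitting regime.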
Markov Switching
The Markov switching regression model extends the simple exogenous probability framework by specifying a first-order Markov process for the regime probabilities. We begin by describing the regime probability specification, then discuss likelihood computation, filtering, and smoothing.
Regime Probabilities
The first-order Markov assumption requires that the probability of being in a regime depends on the previous state, so that
$$P(s_t = j \mid s_{t-1} = i) = p_{ij}(t) \tag{35.9}$$
Typically, these probabilities are assumed to be time-invariant so that $p_{ij}(t) = p_{ij}$ for all $t$, but this restriction is not required.
We may write these probabilities in a transition matrix
$$p(t) = \begin{pmatrix} p_{11}(t) & \cdots & p_{1M}(t) \\ \vdots & \ddots & \vdots \\ p_{M1}(t) & \cdots & p_{MM}(t) \end{pmatrix} \tag{35.10}$$
where the $ij$-th element represents the probability of transitioning from regime $i$ in period $t-1$ to regime $j$ in period $t$. (Note that some authors use the transpose of $p(t)$ so that all of their indices are reversed from those used here.)
As in the simple switching model, we may parameterize the probabilities in terms of a multinomial logit. Note that since each row of the transition matrix specifies a full set of conditional probabilities, we define a separate multinomial specification for each row $i$ of the matrix
$$p_{ij}(G_{t-1}, \delta_i) = \frac{\exp(G_{t-1}'\delta_{ij})}{\sum_{s=1}^{M} \exp(G_{t-1}'\delta_{is})} \tag{35.11}$$
for $j = 1, \ldots, M$ and $i = 1, \ldots, M$ with the normalizations $\delta_{iM} = 0$.
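The row-by-row logit parameterization of the transition matrix might be coded along the following lines. The array layout for the coefficients (origin regime first, then destination regime, then regressor) and the name `transition_matrix` are assumptions made for this sketch.

```python
import numpy as np

def transition_matrix(g, delta):
    """Transition matrix from row-wise multinomial logits (sketch).

    g     : (k,)        probability regressors for one period
    delta : (M, M-1, k) coefficients; delta[i, j] governs the move from
            regime i to regime j, with the last destination normalized to 0
    Returns an (M, M) matrix: rows are origin regimes, columns destinations.
    """
    M = delta.shape[0]
    # Scores for destinations 1..M-1, plus a zero score for destination M
    scores = np.concatenate([delta @ g, np.zeros((M, 1))], axis=1)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)
```

With `g = np.array([1.0])` the matrix is constant over time, and each free coefficient is the log-odds of the corresponding transition relative to moving into the last regime.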
As noted earlier, Markov switching models are generally specified with constant probabilities so that $G_{t-1}$ contains only a constant. Hamilton’s (1989) model of GDP is a notable example of a constant transition probability specification. Alternately, Diebold, Lee, and Weinbach (1994) and Filardo (1994) adopt two-state models that employ time-varying transition probabilities with a logistic parameterization.
Likelihood Evaluation and Filtering
The Markov property of the transition probabilities implies that the expressions on the right-hand side of Equation (35.4) must be evaluated recursively.
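Briefly, each recursion step begins with filtered estimates of the regime probabilities for the previous period. Given filtered probabilities, $P(s_{t-1} = m \mid \mathcal{I}_{t-1})$, the recursion may be broken down into four steps:
1. We first form the one-step ahead predictions of the regime probabilities using basic rules of probability and the Markov transition matrix:
$$P(s_t = m \mid \mathcal{I}_{t-1}) = \sum_{j=1}^{M} P(s_t = m \mid s_{t-1} = j)\, P(s_{t-1} = j \mid \mathcal{I}_{t-1}) = \sum_{j=1}^{M} p_{jm}(G_{t-1}, \delta_j)\, P(s_{t-1} = j \mid \mathcal{I}_{t-1}) \tag{35.12}$$
2. Next, we use these one-step ahead probabilities to form the one-step ahead joint densities of the data and regimes in period $t$:
$$f(y_t, s_t = m \mid \mathcal{I}_{t-1}) = \frac{1}{\sigma_m}\,\phi\!\left(\frac{y_t - \mu_t(m)}{\sigma_m}\right) P(s_t = m \mid \mathcal{I}_{t-1}) \tag{35.13}$$
3. The likelihood contribution for period $t$ is obtained by summing the joint probabilities across unobserved states to obtain the marginal distribution of the observed data
$$L_t(\beta, \gamma, \sigma, \delta) = f(y_t \mid \mathcal{I}_{t-1}) = \sum_{j=1}^{M} f(y_t, s_t = j \mid \mathcal{I}_{t-1}) \tag{35.14}$$
4. The final step is to filter the probabilities by using the results in Equation (35.13) to update one-step ahead predictions of the probabilities:
$$P(s_t = m \mid \mathcal{I}_t) = \frac{f(y_t, s_t = m \mid \mathcal{I}_{t-1})}{\sum_{j=1}^{M} f(y_t, s_t = j \mid \mathcal{I}_{t-1})} \tag{35.15}$$
These steps are repeated successively for each period, $t = 1, \ldots, T$. All that we require for implementation are the initial filtered probabilities, $P(s_0 = m \mid \mathcal{I}_0)$, or alternately, the initial one-step ahead regime probabilities $P(s_1 = m \mid \mathcal{I}_0)$. See “Initial Probabilities” for discussion.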
The likelihood obtained by summing the terms in Equation (35.14) may be maximized with respect to the parameters using iterative methods. Coefficient covariances may be estimated using standard approaches.
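The recursion described above can be sketched in a few lines. The version below assumes a time-invariant transition matrix with rows indexed by the origin regime, and takes the regime means as given; `hamilton_filter` and its arguments are illustrative names rather than an established interface.

```python
import numpy as np

def hamilton_filter(y, mu, sigma, P, init):
    """Basic Markov switching filter (illustrative sketch).

    y     : (T,)   observations
    mu    : (T, M) regime-specific conditional means
    sigma : (M,)   regime standard deviations
    P     : (M, M) transition matrix, P[i, j] = prob. of moving from i to j
    init  : (M,)   filtered regime probabilities for period 0
    Returns the log-likelihood, one-step ahead and filtered probabilities.
    """
    T, M = mu.shape
    pred = np.empty((T, M))
    filt = np.empty((T, M))
    loglik = 0.0
    prev = np.asarray(init, dtype=float)
    for t in range(T):
        # Step 1: one-step ahead regime probabilities
        pred[t] = prev @ P
        # Step 2: joint density of the observation and each regime
        z = (y[t] - mu[t]) / sigma
        joint = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * sigma) * pred[t]
        # Step 3: likelihood contribution (marginal density of y[t])
        lik = joint.sum()
        loglik += np.log(lik)
        # Step 4: filtered probabilities
        filt[t] = joint / lik
        prev = filt[t]
    return loglik, pred, filt
```

A time-varying specification would replace the single matrix `P` with one matrix per period inside the loop.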
Smoothing
Estimates of the regime probabilities may be improved by using all of the information in the sample. The smoothed estimates for the regime probabilities in period $t$ use the information set in the final period, $\mathcal{I}_T$, in contrast to the filtered estimates which employ only contemporaneous information, $\mathcal{I}_t$. Intuitively, using information about future realizations of the dependent variable $y_s$ ($s > t$) improves our estimates of being in regime $m$ in period $t$ because the Markov transition probabilities link together the likelihood of the observed data in different periods.
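Kim (1994) provides an efficient smoothing algorithm that requires only a single backward recursion through the data. Under the Markov assumption, Kim shows that the joint probability is given by
$$\begin{aligned} P(s_t = i, s_{t+1} = j \mid \mathcal{I}_T) &= P(s_t = i \mid s_{t+1} = j, \mathcal{I}_T)\, P(s_{t+1} = j \mid \mathcal{I}_T) \\ &= P(s_t = i \mid s_{t+1} = j, \mathcal{I}_t)\, P(s_{t+1} = j \mid \mathcal{I}_T) \\ &= \frac{P(s_t = i, s_{t+1} = j \mid \mathcal{I}_t)}{P(s_{t+1} = j \mid \mathcal{I}_t)}\, P(s_{t+1} = j \mid \mathcal{I}_T) \end{aligned} \tag{35.16}$$
The key in moving from the first to the second line of Equation (35.16) is the fact that under appropriate assumptions, if $s_{t+1}$ were known, there is no additional information about $s_t$ in the future data $(y_{t+1}, \ldots, y_T)$.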
The smoothed probability in period $t$ is then obtained by marginalizing the joint probability with respect to $s_{t+1}$:
$$P(s_t = i \mid \mathcal{I}_T) = \sum_{j=1}^{M} P(s_t = i, s_{t+1} = j \mid \mathcal{I}_T) \tag{35.17}$$
Note that apart from the smoothed probability terms, $P(s_{t+1} = j \mid \mathcal{I}_T)$, all of the terms on the right-hand side of Equation (35.16) are obtained as part of the filtering computations. Given the set of filtered probabilities, we initialize the smoother using $P(s_T = j \mid \mathcal{I}_T)$, and iterate computation of Equation (35.16) and Equation (35.17) for $t = T-1, \ldots, 1$ to obtain the smoothed values.
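A minimal sketch of the backward smoothing recursion is given below. It assumes a time-invariant transition matrix, strictly positive one-step ahead probabilities, and filtered and one-step ahead probabilities stored row by row as produced by a forward filtering pass; the name `kim_smoother` is illustrative.

```python
import numpy as np

def kim_smoother(filt, pred, P):
    """Backward smoothing recursion (illustrative sketch).

    filt : (T, M) filtered probabilities, P(s_t = m | I_t)
    pred : (T, M) one-step ahead probabilities, P(s_t = m | I_{t-1})
    P    : (M, M) transition matrix, rows indexed by origin regime
    Returns the (T, M) smoothed probabilities P(s_t = m | I_T).
    """
    T, M = filt.shape
    smooth = np.empty_like(filt)
    # Initialize with the filtered probabilities in the final period
    smooth[-1] = filt[-1]
    for t in range(T - 2, -1, -1):
        # joint[i, j] = P(s_t = i, s_{t+1} = j | I_T):
        # filtered joint probability, rescaled by the ratio of the
        # smoothed to the one-step ahead probability of s_{t+1} = j
        ratio = smooth[t + 1] / pred[t + 1]
        joint = filt[t][:, None] * P * ratio[None, :]
        # Marginalize over the next-period regime
        smooth[t] = joint.sum(axis=1)
    return smooth
```

When every row of `P` is the same, the future carries no information about the current regime and the smoothed probabilities coincide with the filtered ones.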
Initial Probabilities
The Markov switching filter requires initialization of the filtered regime probabilities in period 0, $P(s_0 = m \mid \mathcal{I}_0)$.
There are a few ways to proceed. Most commonly, the initial regime probabilities are set to the ergodic (steady state) values implied by the Markov transition matrix (see, for example Hamilton (1999, p. 192) or Kim and Nelson (1999, p. 70) for discussion and results). The values are thus treated as functions of the parameters that determine the transition matrix.
Alternately, we may use prior knowledge to specify regime probability values, or we can be agnostic and assign equal probabilities to regimes. Lastly, we may treat the initial probabilities as parameters to be estimated.
Note that the initialization to ergodic values using period 0 information is somewhat arbitrary in the case of time-varying transition probabilities.
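For a constant transition matrix, the ergodic probabilities solve $\pi' p = \pi'$ subject to the probabilities summing to one. One way to compute them, shown here as a sketch rather than as the method used by any particular program, is to stack the two conditions into a single linear system and solve it by least squares. The function name `ergodic_probs` is hypothetical.

```python
import numpy as np

def ergodic_probs(P):
    """Steady-state regime probabilities of a transition matrix (sketch).

    P : (M, M) transition matrix, rows indexed by origin regime.
    Solves pi = pi @ P together with sum(pi) = 1.
    """
    M = P.shape[0]
    # Stationarity conditions (I - P')pi = 0 stacked on the adding-up row
    A = np.vstack([np.eye(M) - P.T, np.ones((1, M))])
    b = np.zeros(M + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi
```

For a two-regime chain with staying probabilities 0.9 and 0.8, this gives probabilities of 2/3 and 1/3. The calculation presumes the chain has a unique steady state.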
Dynamic Models
We may extend the basic switching model to allow for dynamics in the form of lagged endogenous variables and serially correlated errors. The two methods require different assumptions about the dynamic response to changes in regime.
Our discussion is very brief. Frühwirth-Schnatter (2006) offers a nice overview of the differences between these two approaches, and provides further discussion and references.
Dynamic Regression
The most straightforward method of adding dynamics to the switching model is to include lagged endogenous variables. For a model with $p$ lagged endogenous regressors, and random state variable $s_t$ taking the value $m$, we have:
$$y_t = X_t'\beta_m + Z_t'\gamma + \sum_{r=1}^{p} \alpha_{rm}\, y_{t-r} + \sigma(m)\,\epsilon_t \tag{35.18}$$
where $\epsilon_t$ is again standard normally distributed. The coefficients $\alpha_{rm}$ on the lagged endogenous variable are allowed to be regime-varying, but this generality is not required.
In the Markov switching context, this model has been termed the “Markov switching dynamic regression” (MSDR) model (Frühwirth-Schnatter, 2006). In the special case where the lagged endogenous coefficients are regime-invariant, the model may be viewed as a variant of the “Markov switching intercept” (MSI) specification (Krolzig, 1997).
Of central importance is the fact that the mean specification depends only on the contemporaneous state variable, so that lagged endogenous regressors may be treated as additional regime-specific or regime-invariant regressors for purposes of likelihood evaluation, filtering, and smoothing. Thus, the discussions in “Simple Switching” and “Markov Switching” are directly applicable in MSDR settings.
Serial Correlation
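An alternative dynamic approach assumes that the errors are serially correlated (Hamilton, 1989). With serial correlation of order $p$, we have the AR specification
$$\left(1 - \sum_{r=1}^{p} \rho_r(s_t)\, L^r\right)\bigl(y_t - \mu_t(s_t)\bigr) = \sigma(s_t)\,\epsilon_t \tag{35.19}$$
Rearranging terms and applying the lag operator, we have:
$$y_t = \mu_t(s_t) + \sum_{r=1}^{p} \rho_r(s_t)\bigl(y_{t-r} - \mu_{t-r}(s_{t-r})\bigr) + \sigma(s_t)\,\epsilon_t \tag{35.20}$$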
In the Markov switching literature, this specification has been termed the “Markov switching autoregressive” (MSAR) (Frühwirth-Schnatter, 2006) or the “Markov switching mean” (MSM) model (Krolzig, 1997). The MSAR model is perhaps most commonly referred to as the “Hamilton model” of switching with dynamics.
Note that, in contrast to the MSDR specification, the mean equation in the MSAR model depends on lagged states. The presence of the regime-specific lagged mean adjustments on the right-hand side of Equation (35.20) implies that probabilities for a $(p+1)$ dimensional state vector representing the current and $p$ previous regimes are required to obtain a representation of the likelihood.
For example, in a two regime model with an AR(1), we have the standard prediction error representation of the likelihood:
$$L_t = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{1}{\sigma(i)}\,\phi\!\left(\frac{y_t - \mu_t(i) - \rho_1(i)\bigl(y_{t-1} - \mu_{t-1}(j)\bigr)}{\sigma(i)}\right) P(s_t = i, s_{t-1} = j \mid \mathcal{I}_{t-1}) \tag{35.21}$$
which requires that we consider probabilities for the four potential regime outcomes for the state vector $(s_t, s_{t-1})$.
More generally, since there is a $(p+1)$ dimensional state vector and $M$ regimes, the number of potential realizations is $M^{p+1}$. The description of the basic Markov switching model above (“Markov Switching”) is no longer valid since it does not handle the filtering and smoothing for the full vector of probabilities.
Markov Switching AR
Hamilton (1989) derived the form of the MSAR specification and outlined an operational filtering procedure for evaluating the likelihood function. Hamilton (1989), Kim (1994), and Kim and Nelson (1999, Chapter 4) all offer excellent descriptions of the construction of this lagged-state filtering procedure.
Briefly, the Hamilton filter extends the analysis in “Markov Switching” to handle the larger dimensional state vector. While the mechanics of the procedure are a bit more involved, the concepts follow directly from the simple filter described above (“Likelihood Evaluation and Filtering”). The filtered probabilities for lagged values of the states, $s_{t-1}, \ldots, s_{t-p}$, conditional on the information set $\mathcal{I}_{t-1}$, are obtained from the previous iteration of the filter, and the one-step ahead joint probabilities for the state vector are obtained by applying the Markov updates to the filtered probabilities. These joint probabilities are used to evaluate a likelihood contribution and to obtain updated filtered probabilities.
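To make the lagged-state mechanics concrete, the following sketch implements the filter for the AR(1) case only, where the state vector is the pair of current and previous regimes. It conditions on the first observation, assumes a time-invariant transition matrix, and uses illustrative names (`msar1_filter`, `rho`, `init`) that are not taken from any reference implementation.

```python
import numpy as np

def msar1_filter(y, mu, sigma, rho, P, init):
    """Lagged-state filter for a switching AR(1) error model (sketch).

    y     : (T,)   observations; y[0] is used only as a lag
    mu    : (T, M) regime-specific conditional means
    sigma : (M,)   regime standard deviations
    rho   : (M,)   AR(1) coefficient in each regime
    P     : (M, M) transition matrix, rows indexed by origin regime
    init  : (M,)   filtered probabilities for the regime at y[0]
    Returns the log-likelihood and filtered probabilities for y[1:].
    """
    T, M = mu.shape
    prev = np.asarray(init, dtype=float)
    filt = np.empty((T - 1, M))
    loglik = 0.0
    for t in range(1, T):
        # pred[i, j]: one-step ahead prob. of (current = i, previous = j)
        pred = P.T * prev[None, :]
        # Prediction error for each (current, previous) regime pair
        lagged = y[t - 1] - mu[t - 1]
        resid = y[t] - mu[t][:, None] - rho[:, None] * lagged[None, :]
        s = sigma[:, None]
        dens = np.exp(-0.5 * (resid / s) ** 2) / (np.sqrt(2.0 * np.pi) * s)
        joint = dens * pred
        lik = joint.sum()
        loglik += np.log(lik)
        # Filtered probability of the current regime (sum over previous)
        prev = joint.sum(axis=1) / lik
        filt[t - 1] = prev
    return loglik, filt
```

For higher AR orders the same steps apply, but the joint probabilities must be carried over all combinations of the current and lagged regimes rather than over pairs.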
Hamilton also offers a modified lag-state smoothing algorithm that may be used with the MSAR model, but the approach is computationally unwieldy. Kim (1994) improves significantly on the Hamilton smoother with an efficient smoothing filter that handles the $M^{p+1}$ probabilities using a single backward recursion pass through the data. This approach is a straightforward extension of the basic Kim smoother (see “Smoothing”).
Simple Switching AR
The simple switching results outlined earlier (“Simple Switching”) do not hold for the simple switching with autocorrelation (SSAR) model. As with the MSAR specification, the presence of lagged states in the specification complicates the dynamics and requires handling a $(p+1)$ dimensional state variable representing current and lagged states.
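Conveniently, we may obtain results for the specification by treating it as a restricted Markov switching model with transition probabilities that do not depend on the origin regime:
$$P(s_t = j \mid s_{t-1} = i) = p_{ij}(t) = p_j(t) \tag{35.22}$$
so that the rows of the transition matrix are identical
$$p(t) = \begin{pmatrix} p_1(t) & \cdots & p_M(t) \\ \vdots & \ddots & \vdots \\ p_1(t) & \cdots & p_M(t) \end{pmatrix} \tag{35.23}$$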
We may then apply the Hamilton filter and Kim smoother to this restricted specification to obtain the one-step ahead, likelihood, filtered, and smoothed values.
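The restriction amounts to stacking the same vector of regime probabilities in every row of the transition matrix. A small sketch, with the hypothetical helper name `restricted_transition`:

```python
import numpy as np

def restricted_transition(p):
    """Transition matrix with identical rows (illustrative sketch).

    p : (M,) regime probabilities that do not depend on the origin regime.
    Returns an (M, M) matrix in which every row equals p.
    """
    p = np.asarray(p, dtype=float)
    return np.tile(p, (p.size, 1))
```

Because every row is the same, the one-step ahead probabilities equal `p` whatever the filtered probabilities from the previous period, which is the independence property of simple switching expressed in Markov form.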
Initial Probabilities
In the serial correlation setting, the Markov switching filter requires initialization of the vector of $M^{p+1}$ probabilities associated with the $(p+1)$ dimensional state vector. We may proceed as in the uncorrelated model by setting the $M$ initial probabilities in period $-p$, the earliest period in the initial state vector $(s_0, s_{-1}, \ldots, s_{-p})$, using one of the methods described in “Initial Probabilities” and recursively applying Markov transition updates to obtain the joint initial probabilities for the $M^{p+1}$ dimensional initial probability vector in period $0$.
Again note that the initialization to steady state values using the period $-p$ information is somewhat arbitrary in the case of time-varying transition probabilities.
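The construction of the joint initial probabilities can be sketched as follows, assuming a constant transition matrix and assuming that the marginal regime probabilities are first set for the earliest period of the initial state vector (taken here to be $p$ periods before period 0). The name `initial_joint_probs` and the axis ordering are choices made for this example.

```python
import numpy as np

def initial_joint_probs(pi, P, p):
    """Joint initial probabilities for the lagged-state vector (sketch).

    pi : (M,)   regime probabilities in the earliest period
    P  : (M, M) transition matrix, rows indexed by origin regime
    p  : number of lagged states (AR order)
    Returns an array with p + 1 axes of length M, ordered from the
    earliest period to period 0, holding the joint regime probabilities.
    """
    joint = np.asarray(pi, dtype=float)
    for _ in range(p):
        # Add an axis for the next period's regime and apply the
        # transition probabilities from the most recent regime
        shape = (1,) * (joint.ndim - 1) + P.shape
        joint = joint[..., None] * P.reshape(shape)
    return joint
```

The result has $M^{p+1}$ entries that sum to one. If `pi` holds the ergodic probabilities of `P`, the implied marginal distribution for the period 0 regime is again `pi`.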