An Illustration

For our example of robust regression estimation, we employ the “salinity” data set taken from Rousseeuw and Leroy (1987, page 82), which has been used in many studies of robust regression and outlier effects. See, for example, Rousseeuw and van Zomeren (1992) and Fung (1993). The data consist of 28 observations on water salinity (salt concentration) and river discharge measurements taken from Pamlico Sound in North Carolina.

We are interested in modeling the relationship between the amount of discharge and the level of salinity. The regression model of interest is:

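$$S_t = \beta_0 + \beta_1 S_{t-1} + \beta_2 T_t + \beta_3 D_t + \epsilon_t \tag{31.20}$$

where $S_t$ is the salinity level, $S_{t-1}$ is its lagged value, $D_t$ is discharge, and $T_t$ represents a time trend.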

The data are provided in the workfile “Rousseeuw and Leroy.wf1” located in the EViews application data directory. The series SALINITY and DISCHARGE contain the salt and discharge measurements, TREND contains the number of biweekly periods since the start of spring, and LAGSEL contains the lagged value of SALINITY.

We begin with ordinary least squares estimation of this specification. The equation EQ01 in the workfile contains these least squares estimates:

Rousseeuw and Leroy identify observation 16 as being an outlier. We can confirm this finding by looking at the influence statistics and leverages for this equation. From the EQ01 menu, display the influence statistics dialog by selecting View/Stability Diagnostics/Influence Statistics...

Check the box labeled Hat Matrix to tell EViews that you want to view the diagonals of the hat matrix along with the default results, then click on OK to display the graphs:

The spikes in the graphs for all four influence measures point to observation 16 as being an outlier. This finding is confirmed by the leverage plot view of EQ01. Select View/Stability Diagnostics/Leverage Plots... and click on OK to accept the default settings:

The graphs support the view that observation 16 has high leverage, especially in the relationship between SALINITY and DISCHARGE. Using the mouse pointer to hover over the outlier confirms the identity of the outlier observation. (For additional discussion of these diagnostics, see “Leverage Plots” and “Influence Statistics”.)
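The hat-matrix diagonals mentioned above measure how far each observation's regressor values lie from the center of the data. The following is a minimal sketch, not EViews code, for the special case of a regression on a constant and one regressor, where the diagonal has the closed form $h_{ii} = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$. The data values are hypothetical (they are not the salinity data), and the cutoff of twice the average leverage is a common rule of thumb assumed here for illustration.

```python
def leverages(x):
    """Hat-matrix diagonals for a regression of y on a constant and x."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]

# Hypothetical regressor values (NOT the salinity data).
# The last point lies far from the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
h = leverages(x)

# Rule of thumb (an assumption, not an EViews default):
# flag h_ii > 2 * k / n, where k = 2 coefficients are estimated.
k = 2
flagged = [i for i, hi in enumerate(h) if hi > 2 * k / len(x)]
print(flagged)
```

The leverages always sum to the number of estimated coefficients, so a single value close to 1 means that one observation is largely determining its own fitted value.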

M-estimation example

Given the presence of outliers, we re-estimate the regression using robust M-estimation. Create a new equation object by clicking on Quick/Estimate Equation…, or by selecting Object/New Object…/Equation, and then select ROBUSTLS from the Method dropdown menu. Enter the dependent variable followed by the list of regressors in the Equation specification edit field:

salinity c lagsel trend discharge

and click on OK to instruct EViews to estimate the specification using the default estimator and settings. (For convenience, we have included the equation object EQ02 estimated using these settings in your workfile.)

A description of the settings used in the M-estimation is presented at the top of the output. Here we see that the Bisquare function with a default tuning parameter value of 4.685 was used, that the scale was estimated using the median centered, median absolute deviation method, and that the z-statistics in the output are based on Huber Type I covariance estimates.
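To make these settings concrete, the sketch below fits a line by M-estimation using bisquare weights with tuning constant 4.685 and a median-centered MAD scale. It is an illustration under stated assumptions, not a description of EViews internals: it uses iteratively reweighted least squares (a common way to compute M-estimates), divides the MAD by 0.6745 (the usual normal-consistency factor), handles only one regressor plus a constant, and runs on synthetic data rather than the salinity series. All function names are hypothetical.

```python
import math
import statistics

def bisquare_weight(u, c=4.685):
    """Bisquare weight: (1 - (u/c)^2)^2 for |u| < c, and 0 otherwise."""
    if abs(u) >= c:
        return 0.0
    return (1.0 - (u / c) ** 2) ** 2

def mad_scale(resid):
    """Median-centered MAD, divided by 0.6745 (assumed consistency factor)."""
    med = statistics.median(resid)
    return statistics.median([abs(r - med) for r in resid]) / 0.6745

def weighted_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def m_estimate(x, y, c=4.685, iters=50):
    """Bisquare M-estimate of a line by iteratively reweighted least squares."""
    w = [1.0] * len(x)
    a, b = weighted_line(x, y, w)  # start from ordinary least squares
    for _ in range(iters):
        resid = [yi - a - b * xi for xi, yi in zip(x, y)]
        s = mad_scale(resid)                       # re-estimate the scale
        w = [bisquare_weight(r / s, c) for r in resid]
        a, b = weighted_line(x, y, w)
    return a, b, w

# Synthetic data (NOT the salinity data): y = 2 + 0.5*x plus bounded noise,
# with one gross outlier added at index 10.
x = [float(i) for i in range(20)]
y = [2.0 + 0.5 * xi + 0.5 * math.sin(1.7 * xi) for xi in x]
y[10] += 8.0

ols_a, ols_b = weighted_line(x, y, [1.0] * len(x))
rob_a, rob_b, weights = m_estimate(x, y)
print(ols_a, ols_b)
print(rob_a, rob_b, weights[10])
```

In this toy example the outlier's scaled residual falls outside the bisquare cutoff, so it receives zero weight and no longer affects the fit.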

Turning to the coefficient estimates, we see the effect of moving from least squares to robust M-estimation. The M-estimator produces a much larger negative impact of DISCHARGE on SALINITY than does ordinary least squares (-0.623 versus -0.295), with the M-estimator coefficient estimated with similar precision (0.091 versus 0.107). The sensitivity of the DISCHARGE coefficient estimate to robust estimation is in accord with the earlier EQ01 diagnostics, which suggested that observation 16 had high leverage for the relationship between SALINITY and DISCHARGE.

The bottom portion of the output displays the $R^2$ and $R_w^2$ goodness-of-fit measures and their adjusted counterparts, which indicate that the model accounts for roughly 60-90% of the variation in the constant-only model. The $R_n^2$ statistic of 202.906 and corresponding p-value of 0.00 indicate strong rejection of the null hypothesis that all non-intercept coefficients are equal to zero. Lastly, the output shows the value of the deviance, information criteria, and the estimated scale. These measures may be of use when comparing models. See “M-estimator summary statistics” for formulae and discussion.

MM-estimation example

Next, we estimate the equation using MM-estimation. In the Specification tab:

• Specify your equation estimation method as ROBUSTLS - Robust Least Squares.

• Change the Robust Estimation Type dropdown to MM-estimation.

• Fill out the Specification edit field with “salinity c lagsel trend discharge” as before.

Next, we will provide values for the tuning and breakdown values, and will specify options to control the S-estimation refinement.

Click on the Options tab to display the additional estimation settings:

• For the objective specification, we will provide tuning and breakdown values. Select the User-specified constants radio button and enter the values as depicted. The S-tuning value of 2.937 is chosen to provide a breakdown of 0.25; the M-tuning value of 3.44 is chosen to produce relative efficiency of 0.85.

• Select Huber Type II standard errors.

• Under S options, enter values for the Number of trials and Max refinements as depicted. The Initial sample size will be pre-filled with the number of regressor variables specified on the first tab of the dialog; we will leave this at the default setting. Enter “5” in the Number of comparisons edit field so that EViews will refine the best 5 of the 200 trials.

Additional detail on these settings is provided in “S-estimation” and “S-estimation options”.
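The link between the tuning constants and the stated breakdown and efficiency values can be checked numerically. The sketch below assumes the standard textbook definitions for the bisquare function under Gaussian errors: the breakdown point is $E[\rho(Z)]$ with $\rho$ scaled to a maximum of 1, and the relative efficiency is $(E[\psi'(Z)])^2 / E[\psi(Z)^2]$, where $Z$ is standard normal. The helper names are hypothetical and the integrals are approximated with Simpson's rule.

```python
import math

def normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def bisquare_rho(u, c):
    """Bisquare rho function, scaled so that its maximum value is 1."""
    if abs(u) >= c:
        return 1.0
    return 1.0 - (1.0 - (u / c) ** 2) ** 3

def breakdown(c):
    """E[rho(Z)]: breakdown point implied by S-tuning constant c."""
    return simpson(lambda u: bisquare_rho(u, c) * normal_pdf(u), -10.0, 10.0)

def efficiency(c):
    """(E[psi'(Z)])^2 / E[psi(Z)^2]: Gaussian efficiency for M-tuning constant c."""
    def psi(u):
        return u * (1.0 - (u / c) ** 2) ** 2
    def dpsi(u):
        t = (u / c) ** 2
        return (1.0 - t) * (1.0 - 5.0 * t)
    # psi and its derivative are zero outside [-c, c].
    e_dpsi = simpson(lambda u: dpsi(u) * normal_pdf(u), -c, c)
    e_psi2 = simpson(lambda u: psi(u) ** 2 * normal_pdf(u), -c, c)
    return e_dpsi ** 2 / e_psi2

print(round(breakdown(2.937), 3))   # S-tuning value used in this example
print(round(efficiency(3.44), 3))   # M-tuning value used in this example
print(round(efficiency(4.685), 3))  # default M-estimation tuning value
```

Smaller tuning constants reject more observations, which raises the breakdown point but lowers efficiency when the errors are in fact Gaussian.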

Click on OK to accept the specification and options and to estimate the equation. EViews will display the results of the MM-estimation.

Notice that the top of the output displays various settings for both the S and the M-portions of the MM-estimation. In addition to showing the S-tuning value of 2.937 and associated breakdown value of 0.25, EViews reports that the S-estimation consists of 200 trials with initial coefficients obtained from a random initial sample size of 4 and 2 initial refinement steps. The final comparison involves fully refining 5 sets of the scale estimates and choosing the smallest scale estimate.
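The scale being compared across candidate fits can be illustrated with the textbook definition of an S-scale: for a set of residuals $r_i$, it is the value $s$ that solves $\frac{1}{n}\sum_i \rho(r_i / s) = b$, where $b$ is the breakdown value. The sketch below assumes this definition with the bisquare function, $c = 2.937$ and $b = 0.25$, and solves for $s$ by bisection. It does not reproduce the subsampling and refinement steps that EViews performs, and the residuals shown are hypothetical.

```python
def bisquare_rho(u, c):
    """Bisquare rho function, scaled so that its maximum value is 1."""
    if abs(u) >= c:
        return 1.0
    return 1.0 - (1.0 - (u / c) ** 2) ** 3

def mean_rho(resid, s, c):
    return sum(bisquare_rho(r / s, c) for r in resid) / len(resid)

def m_scale(resid, c=2.937, b=0.25, iters=200):
    """Solve mean(rho(r_i / s)) = b for s by bisection.

    The left-hand side is non-increasing in s, exceeds b for very small s
    (when the residuals are nonzero), and is below b for very large s.
    """
    lo, hi = 1e-12, 10.0 * max(abs(r) for r in resid)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_rho(resid, mid, c) > b:
            lo = mid   # scale too small: scaled residuals are too large
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical residuals from two candidate coefficient vectors
# (NOT taken from the salinity example). Candidate B fits twice as badly.
resid_a = [0.3, -0.5, 0.1, 0.8, -0.2, 0.4, -0.6, 9.0]
resid_b = [2.0 * r for r in resid_a]

scale_a = m_scale(resid_a)
scale_b = m_scale(resid_b)
best = "A" if scale_a < scale_b else "B"
print(scale_a, scale_b, best)
```

Because the bisquare function is bounded, the single large residual in each set contributes at most 1 to the sum, so it cannot inflate the scale without limit.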

Once the scale estimate is obtained, EViews performs fixed scale M-estimation using the reported 3.44 tuning parameter.

EViews also reports information on the random number generator used to obtain the random subsamples, and the method used to obtain coefficient estimate covariances.

Turning to the results, we note that despite the differences in robust estimation method, relative efficiency settings, and method of computing standard errors, the results from the M-estimation and the MM-estimation are generally quite similar. Most importantly, both estimates show statistically significant DISCHARGE coefficients of around -0.62 with roughly comparable coefficient standard errors (0.089 versus 0.13). The results for other coefficients are even closer.

The MM-estimate of the scale is considerably larger than that obtained from M-estimation (1.16 versus 0.73), but the overall goodness-of-fit measures and the $R_n^2$ statistic and test results are quite similar.