FAQ: 2. Working with Data


2.1 Series

2.1.1. Where is Hodrick-Prescott filtering described in the EViews User's Guide?

Hodrick-Prescott filtering was added to Version 2.0 of EViews after the manuals went to press. As with other late additions to the program, the procedure is documented in our on-line help system. To access the documentation, select Help...Read Me from the main EViews menu, then select Hodrick-Prescott.


2.1.2. How do I standardize a series?

EViews provides built-in functions for computing the mean and variance of a series. You can use these tools in two ways.

First, simply generate a new series containing the standardized value. From the menus, select Quick...Generate Series, and enter the expression to standardize the series. Alternatively, from the command line, simply type GENR [new variable] and the new expression. For example, if Y is the series to be standardized, the command line form:

GENR YSTD=(Y-@MEAN(Y))/SQR(@VAR(Y))

will create a new series containing the standardized value.

An alternative approach is to use EViews expressions to create an auto-series. For example, if you wanted to use the standardized series as a regressor in a least-squares problem, you could generate the new standardized series and then include it as a regressor. Alternatively, you could simply enter the expression for the new series directly:

LS Z C (Y-@MEAN(Y))/SQR(@VAR(Y))


2.1.3. How can I easily create dummy variables?

EViews does not have specialized functions for dummy variables since the GENR command will create dummy variables quite easily. To understand how to create a dummy variable, note that if there is a Boolean expression on the right hand side of an equation, EViews interprets this as a 0,1 variable. Thus, to GENR a dummy variable, you only need to put your condition on the right-hand side of the equation. For example:

GENR HIGHINC=(INCOME>7000)
GENR MIDINC=(INCOME>3000)*(INCOME<=7000)

will create two dummy variables: HIGHINC will take the value of 1 if INCOME is greater than 7000 and 0 otherwise; MIDINC will take the value of 1 if INCOME is greater than 3000 and less than or equal to 7000, and 0 otherwise.
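
The Boolean-as-0/1 rule can also be checked outside EViews. The following Python sketch only illustrates the logic and is not EViews code; the income values are made up for the example.

```python
# Illustrative incomes (hypothetical data, not taken from a workfile).
income = [2500, 3000, 3001, 7000, 7001]

# A comparison yields True/False; int() turns it into the 1/0 dummy.
highinc = [int(x > 7000) for x in income]

# Multiplying two 0/1 indicators acts as a logical AND.
midinc = [int(x > 3000) * int(x <= 7000) for x in income]
```

Note that an income of exactly 3000 falls outside MIDINC while an income of exactly 7000 falls inside it, because the first comparison is strict and the second is not.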

An alternative method uses SMPL statements to create a dummy variable. First generate a series containing only 0 values for the entire sample. Next, use the SMPL statement to restrict the sample to the observations for which you want the variable to take a value of 1. Using a GENR statement to set the series to 1 will only apply to this sample. For example:

SMPL @ALL
GENR HIGHINC=0
SMPL IF INCOME>7000
GENR HIGHINC=1
SMPL @ALL

SMPL @ALL
GENR EARLY=0
SMPL 1990:1 1991:4
GENR EARLY=1
SMPL @ALL


2.1.4. What's the best way to recode data in EViews?

Since there are many ways that you might wish to recode data, there is no single answer to this question, but one useful technique uses the Boolean variables described above in 2.1.3.

In this example, suppose that you wish to censor outliers so that incomes above 7000 and below 3000 are set to those limit values. Then you can recode the data in a single command (all on one line):

GENR INCLIMIT = (INCOME>7000)*7000 + (INCOME<=3000)*3000 + (INCOME>3000)*(INCOME<=7000)*INCOME

The key to this approach is that the Boolean conditions act as IF statements, and the multiplicative value enters in only if the Boolean condition is TRUE. Note also that this calculation could have been performed using SMPL IF statements; many computations that would involve multiple SMPL IF statements can use this approach:

SMPL @ALL
GENR INCLIMIT=INCOME
SMPL @ALL IF INCOME>7000
GENR INCLIMIT=7000
SMPL @ALL IF INCOME<=3000
GENR INCLIMIT=3000
SMPL @ALL
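
As a cross-check of the recoding logic, here is a minimal Python sketch (not EViews code) showing that the Boolean-weighted expression gives the same result as clamping each value to the interval from 3000 to 7000. The function names and sample incomes are hypothetical.

```python
def recode_boolean(income, low=3000, high=7000):
    """Boolean-weighted recode: exactly one of the three terms is nonzero."""
    return ((income > high) * high
            + (income <= low) * low
            + (income > low) * (income <= high) * income)


def recode_clamp(income, low=3000, high=7000):
    """The same result written as a clamp to the interval [low, high]."""
    return max(low, min(high, income))


incomes = [1200, 3000, 4500, 7000, 9800]  # hypothetical values
limited = [recode_boolean(x) for x in incomes]
```

The three conditions are mutually exclusive and cover every possible income, which is what makes the sum of products behave like a chain of IF statements.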


2.1.5. How do I generate the cumulative sum of a series?

If the series to be accumulated contains no NAs you can perform this calculation in a couple of simple steps. The idea is to initialize the series containing the cumulative sum to the original series, and then to accumulate values for the second through the last observation. For example, suppose that we have an undated workfile with 100 observations, and that we want to place the cumulative sum of the series X in the series SUM.

SMPL 1 1
GENR SUM=X
SMPL 2 100
GENR SUM=SUM(-1)+X

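Note that the first SMPL statement is not necessary if your sample is already set to the entire workfile range: the first GENR then copies X into every observation of SUM, and the second GENR overwrites observations 2 through 100 with the running total.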

If the series to be accumulated contains NAs, then a slightly more complicated form is necessary:

SMPL @ALL
GENR SUM=X
SMPL IF SUM(-1)<>NA AND X<>NA
GENR SUM=SUM(-1)+X

This example has the property that SUM 'resets' whenever it encounters an NA in X and then continues to accumulate. For example, the commands above would generate the following values for SUM given the values of X:

Observation    X     SUM
1              NA    NA
2              4     4
3              5     9
4              3     12
5              NA    NA
6              NA    NA
7              2     2
8              1     3
9              4     7
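
The reset behavior shown in the table can be reproduced with a short Python sketch. This is an illustration of the logic only, not EViews code; None stands in for NA and the function name is hypothetical.

```python
def cumsum_with_reset(x):
    """Running sum that restarts after a missing value (None)."""
    out = []
    prev = None  # previous value of the running sum
    for value in x:
        if value is None:
            current = None          # missing input gives a missing sum
        elif prev is None:
            current = value         # start (or restart) the accumulation
        else:
            current = prev + value  # normal accumulation
        out.append(current)
        prev = current
    return out


# The X column from the table above.
x = [None, 4, 5, 3, None, None, 2, 1, 4]
total = cumsum_with_reset(x)
```

The middle branch corresponds to the observations that the SMPL IF condition excludes: there SUM simply keeps the value of X assigned by the first GENR.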

A second approach to the handling of NAs in accumulating a series relies on a peculiarity of the implementation of NAs in the current version of EViews. For example:

SMPL @ALL
GENR SUM=(X<>NA)*(SUM(-1)<>NA)*(SUM(-1)+X) + (X<>NA)*(SUM(-1)=NA)*X + (X=NA)*NA

This approach relies on the fact that, at present, zero multiplied by an NA is defined in EViews to equal zero. For future versions of EViews we will be moving to an implementation of NAs which follows the rules of IEEE NaNs which will mean that this particular code will no longer work. However, we will be providing a similar mechanism to facilitate coding in this style.
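
To see why code of this kind stops working under IEEE rules, consider how a NaN behaves in a language that already follows them. The Python sketch below is an illustration only and says nothing about how a future version of EViews will be implemented.

```python
import math

nan = float("nan")

# Under IEEE 754 rules, NaN propagates through arithmetic,
# so multiplying by zero does not "switch off" a missing value.
product = 0.0 * nan

# NaN also compares unequal to everything, including itself,
# so missing values are detected with math.isnan rather than == or !=.
is_missing = math.isnan(product)
```

Under these rules a Boolean multiplier can no longer mask a missing term, so the missing value has to be tested for explicitly before it enters the arithmetic.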


2.2 Pools

2.2.1. What is the best way to read data into EViews so that it can be used by a POOL object?

The current version of EViews does not have built-in procedures for importing and exporting data from our POOL objects. This makes data entry more cumbersome than we would like. We will be adding features to facilitate data I/O in the next version, but for now, the following discussion may prove useful.

The most common form for saved data is in a stacked spreadsheet or ASCII file. For example, we recently received the following e-mail message:

> I downloaded some pooled data from SAS and put it into an excel spreadsheet. It looks like this:
>
> country date x1 .... x10
> 111 1960
> .
> .
> 111 1994 x1 .....x10
> 124 1960 x1 .....x10
> .
> 124 1994 x1 .....x10
> .
> .
> 967 1994 x1 .....x10
>
> What's the best way to read these data into EViews?

There are two basic ways to read stacked data. The first is to use cut-and-paste in a Windows application such as an Excel or Lotus spreadsheet. In the following I assume a balanced panel where the data are stacked by country. If the data are not balanced, you can artificially balance the panel by adding blank lines to the spreadsheet. These extra observations will be read as missing values and handled accordingly.

Next, create an EViews workfile of the proper frequency and size. In the example above, we create an annual workfile dated 1960 to 1994. Create a POOL object with the cross-section identifiers listed in the same order as the stacked data. For the example above, the cross-section identifiers will be listed as 111, 124, ... , 967.

Once your POOL is defined, select View ... Spreadsheet (stacked data) and then list the series you want with ? identifiers as desired. For the data above, you would enter something like

ID YEAR X1_? X2_? X3_? X4_? X5_? X6_? X7_? X8_? X9_? X10_?

EViews will create a stacked representation of your data that matches the Excel spreadsheet described above (my convention here is to name the series X1_111, X1_124, ..., X10_967). At this point, all of the data are displayed as NA since you have not entered any data. If necessary, click on the Edit +/- button to enable editing of cell values.

Now go to your Excel spreadsheet, highlight the data, and select Edit ... Copy. Switch back to EViews, point to the upper left-hand cell of the stacked representation, and select Edit ... Paste to paste the data in this cell. Click on Edit +/- to turn off editing.

For writing data to a spreadsheet, reverse the procedure.

A second method allows for reading data from the command line. In this method, we will use two workfiles: an undated workfile containing the stacked data, and an annual workfile, containing the final POOL form of the data.

First read the stacked data into an undated EViews workfile. The data in the example above would be kept in the variables ID, YEAR, X1, X2, .... , X10.

Next, create the annual workfile for the dates 1960 to 1994.

Then, from the undated workfile, use the export/ASCII command in EViews to write the series X1 to a temporary file on disk.

From the annual workfile, use the import/ASCII features of EViews to read the data from the temporary file. Note that if there are K cross-section units, this workfile is 1/K as long as the stacked data so that you will need to provide K variables to correspond to the stacked series X1. I recommend appending the cross-section identifier to the original variable name (X1_111, X1_124, ... , X1_967).

Repeat the export/import for each of the remaining variables.

Here's an example of a program which performs this procedure:

=====

' read in the stacked data
new workfile stacked u 350
read(o) stacked.txt id year x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
new workfile poolx a 1960 1994

' loop over the variable names
for %y x1 x2 x3 x4 x5 x6 x7 x8 x9 x10

' switch to the first workfile and export the current
' series
workfile stacked
write(s) export.txt %y

' switch to the second workfile and import the current series,
' appending the cross-section identifiers
workfile poolx
read(s) export.txt %y_111 %y_124 [...other names...] %y_967

next

save poolx

=====

If you have further questions or have data that do not fit the above example, please feel free to contact us; we may have suggestions for your particular situation.
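
If you prefer to rearrange the file before it reaches EViews, the same stacked-to-unstacked reshaping can be sketched in Python. This is a minimal illustration under stated assumptions, not an EViews feature: the function name, the tuple layout (id, year, values...) and the sample numbers are all hypothetical.

```python
from collections import defaultdict


def unstack(rows, variables):
    """Turn stacked rows (id, year, v1, v2, ...) into one series per
    variable and cross-section, keyed by names such as 'X1_111'."""
    series = defaultdict(dict)
    for row in rows:
        unit, year, *values = row
        for name, value in zip(variables, values):
            series[f"{name}_{unit}"][year] = value
    return dict(series)


# Hypothetical stacked data: two countries, two years, two variables.
rows = [
    (111, 1960, 1.0, 10.0),
    (111, 1961, 1.5, 11.0),
    (124, 1960, 2.0, 20.0),
    (124, 1961, 2.5, 21.0),
]
wide = unstack(rows, ["X1", "X2"])
```

Each key follows the naming convention recommended above (variable name, underscore, cross-section identifier), and each value maps a year to the observation for that unit.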


2.2.2. Can POOLs be used with unbalanced data?

Yes. EViews handles unbalanced pooled data (a different number of time-series observations for different cross-sectional units) with no additional work. Just set the range of your workfile to include all of the potential observations, and enter or read in the available data. Data that are not available for a given cross-section unit should be coded as NA and will automatically be ignored in all estimation and computation.

For example, suppose you have four cross-sectional units _UNIT1, _UNIT2, _UNIT3 and _UNIT4 and a series Y? which varies across the units:

Year    Y_UNIT1    Y_UNIT2    Y_UNIT3    Y_UNIT4
1990    34.234     NA         NA         54.256
1991    38.342     28.734     NA         55.457
1992    40.324     30.542     44.124     57.412
1993    41.864     31.774     45.218     54.253
1994    42.662     32.887     56.124     60.412
1995    44.124     34.124     65.253     64.214
1996    50.342     35.665     62.425     NA

Note that EViews stores these pooled data in the separate series Y_UNIT1, Y_UNIT2, Y_UNIT3, Y_UNIT4 with no restrictions on the observations that are available for each series. In this example, each of the series is available for a different time period.

In all procedures, EViews will automatically adjust samples to account for the NAs in the Y? (and other) series.


2.2.3. How do I use a POOL object to "de-mean" my data?

The answer depends upon which mean you wish to remove from your series. Suppose, as in 2.2.2 above, that there is an underlying series Y which differs across the cross-sectional identifiers _UNIT1, _UNIT2, _UNIT3 and _UNIT4. The workfile then will contain the series Y_UNIT1, Y_UNIT2, Y_UNIT3, and Y_UNIT4, which we collectively term the POOL series Y?.

Generally speaking, there are two means you might wish to remove from each series in Y?: (a) the mean for that series taken across time; (b) the mean for the POOL series taken across cross-sectional units.

(a) Before discussing the first case, note that EViews' POOL object automates the estimation of fixed-effects models so in principle you should not have to "de-mean" your data. If, however, you still need to subtract off cross-section means, the POOL Genr procedure may be used to quickly create new series. From a POOL object window, click on Genr, then enter

YM?=Y?-@MEAN(Y?)

This command creates new variables YM_UNIT1, YM_UNIT2, ... which contain the cross-section specific Y series with the mean removed. If the POOL object is named (say, POOL1) you can compute this from the command line or in a program by executing the command

POOL1.GENR YM?=Y?-@MEAN(Y?)

You could, of course, do the computation in place:

POOL1.GENR Y?=Y?-@MEAN(Y?)

(b) In the example above, the POOL Genr command operates by looping over the cross-section identifiers and performing the Genr after substituting in each identifier. We will use this feature in our second computation. Suppose you wish to subtract off the means for a given period (or observation) taken across all of the cross-section units. You will first have to compute a series representing the cross-section mean for each observation. This is accomplished in two steps. First create and initialize a series to contain the means using the standard (non-POOL) Genr command. Select Quick...Generate Series from the main menu and enter

CXMEANS=0

Next we will use the POOL Genr to loop over the cross-section units and accumulate their sum. From the POOL object window, click on Genr and enter

CXMEANS=CXMEANS+Y?

The implicit loop of the POOL Genr will sum up across the cross-section units for each observation. Then compute the "de-meaned" series by using the POOL Genr to compute

POOL1.GENR YM?=Y?-CXMEANS/4

where the 4 in this case represents the number of cross-section units.

In the case where there are missing values, this computation is a bit more complicated since adding in the NA associated with a given observation will make the entire sum an NA. Thus we need to test for NAs before doing the accumulation. First initialize CXMEANS to 0 as above. Also initialize a series CXCOUNT to 0. This series will contain the number of cross-section units with valid data for each observation. Then use the POOL Genr to compute

CXMEANS=CXMEANS+(Y?<>NA)*Y?
CXCOUNT=CXCOUNT+(Y?<>NA)

The idea here is that we use a Boolean indicator (Y?<>NA) to test whether the observation for Y for a given cross-sectional unit is an NA. Only if it is not do we add to the sum and to the count of valid observations used in de-meaning the data.

This approach relies on the fact that, at present, zero multiplied by an NA is defined in EViews to equal zero. For future versions of EViews we will be moving to an implementation of NAs which follows the rules of IEEE NaNs which will mean that this particular code will no longer work. However, we will be providing a similar mechanism to facilitate coding in this style.

Then setting the sample to consider only those observations with valid data, we can use the POOL Genr to compute the new POOL series YM?. From the command line:

SMPL IF CXCOUNT>0
POOL1.GENR YM?=Y?-CXMEANS/CXCOUNT
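
The sum-and-count logic can be verified against the data from 2.2.2 with a short Python sketch. This is an illustration only, not EViews code: None stands in for NA, and the variable names simply mirror CXMEANS, CXCOUNT and YM? from the text.

```python
# Pooled data from the table in 2.2.2; None marks an NA.
years = [1990, 1991, 1992, 1993, 1994, 1995, 1996]
y = {
    "UNIT1": [34.234, 38.342, 40.324, 41.864, 42.662, 44.124, 50.342],
    "UNIT2": [None, 28.734, 30.542, 31.774, 32.887, 34.124, 35.665],
    "UNIT3": [None, None, 44.124, 45.218, 56.124, 65.253, 62.425],
    "UNIT4": [54.256, 55.457, 57.412, 54.253, 60.412, 64.214, None],
}

n = len(years)
cxsum = [0.0] * n   # plays the role of CXMEANS before division
cxcount = [0] * n   # plays the role of CXCOUNT

# Loop over cross-section units, as the POOL Genr does implicitly.
for values in y.values():
    for t, v in enumerate(values):
        if v is not None:
            cxsum[t] += v
            cxcount[t] += 1

# De-mean each unit, leaving missing observations missing.
ym = {
    unit: [None if v is None else v - cxsum[t] / cxcount[t]
           for t, v in enumerate(values)]
    for unit, values in y.items()
}
```

For 1990 only two units have data, so the mean is (34.234 + 54.256)/2 = 44.245 and the de-meaned value for UNIT1 is about -10.011. Every period in this table has at least one valid observation, which is why the sketch can divide without the CXCOUNT>0 check.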


2.3 Groups

2.3.1. How can I put the results of the EViews covariance/correlation matrix calculation in a MATRIX object?

EViews displays a view depicting the covariance/correlation matrix calculation for a group of series. Unfortunately, we do not presently provide a simple function which will extract that information and place it in a MATRIX object. This oversight makes it difficult for users to perform additional calculations on the covariance/correlation matrix. Because of the desirability of such a feature, the omission will be remedied in the next release of EViews.

For now, however, the usual Windows cut-and-paste techniques may be used to fill the matrix. First create and display a matrix of the desired size. Once you've created the matrix, if necessary, click on the Edit +/- button in the matrix window to turn on editing [you'll be able to tell that Edit mode is on by the additional edit window that opens just below the menubar]. Next, display the covariance [correlation] view of the group. Highlight the data you wish to extract, and select Edit...Copy from the menu. You will want to copy in "Unformatted" mode to retain full numeric precision. Click on the upper left-hand cell of the MATRIX and select Edit...Paste from the main menu. [If nothing happens, make certain that you both copied the data from the GROUP view, and that the Edit mode is turned on in the MATRIX object.]

For those who wish to do this computation in a program, you will have to use our built-in series functions for computing covariances and correlations (@COV, @COR) and compute the desired elements of the matrix in pairwise fashion. Thus, for a 2x2 case:

SYM(2) MYCOV
MYCOV(1,1)=@VAR(X)
MYCOV(2,2)=@VAR(Y)
MYCOV(1,2)=@COV(X,Y)

Clearly, for a large group, this is cumbersome. Note that we have used a SYM instead of an arbitrary square matrix for simplicity and efficiency.
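
The pairwise fill generalizes to any number of series with a double loop over the upper triangle. The Python sketch below illustrates the idea; it is not EViews code, the data are hypothetical, and the divisor n is an assumption, since the text does not state which divisor @COV uses.

```python
def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Covariance of two equal-length lists, dividing by n.
    (Assumption: the divisor used by @COV is not stated in the text.)"""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)


def cov_matrix(columns):
    """Fill a symmetric matrix element by element, computing each
    off-diagonal pair once and mirroring it."""
    k = len(columns)
    m = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            m[i][j] = m[j][i] = cov(columns[i], columns[j])
    return m


# Hypothetical data for three series.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
z = [4.0, 3.0, 2.0, 1.0]
mycov = cov_matrix([x, y, z])
```

Only k(k+1)/2 distinct elements need to be computed for k series, which is the same saving a symmetric matrix object offers over a general square matrix.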


2.3.2. How do I compute the eigenvalues of the covariance/correlation matrix of a group of series?

You should place the covariance/correlation matrix of the GROUP in a matrix object (a SYM, as in 2.3.1.), and then use the @EIGENVALUES function. See the answer to 2.3.1., above, for details on the first step.
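
For a 2x2 symmetric matrix the eigenvalues have a closed form, which gives a quick way to check the results you obtain. The Python sketch below is an illustration only, with a hypothetical function name and made-up matrix entries.

```python
import math


def eigenvalues_sym2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]],
    returned in ascending order."""
    half_trace = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return half_trace - radius, half_trace + radius


# Hypothetical covariance matrix [[2, 1], [1, 2]].
low, high = eigenvalues_sym2(2.0, 1.0, 2.0)
```

The two eigenvalues sum to the trace of the matrix, and for a valid covariance matrix neither should be negative.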


2.4 Matrices/Vectors/Scalars

2.4.1. How do I display or print a scalar?

There may be occasions when you wish to place a single numeric value in a reusable named object: SCALAR objects serve this purpose. But SCALARs are unlike most other EViews objects in that they do not have their own display windows and may not directly be printed.

To display the value of a SCALAR object, simply type SHOW [scalar name] at the EViews command line, or click on Quick...Show and enter the scalar name. The value of the SCALAR will be displayed in the EViews Status Line at the bottom left-hand corner of the EViews window.

To print the value of a SCALAR object, you will need to place the result in a VECTOR object which can be displayed and printed. For example, to create a VECTOR named X containing the value of the SCALAR Y, simply type:

VECTOR X=Y

at the command line. The VECTOR object X may be displayed and printed in the usual fashion. Alternatively, you could place the SCALAR in a TABLE object which can also be displayed and printed.


2.4.2. How can I put the results of the EViews covariance/correlation matrix calculation in a MATRIX object?

See the answer to 2.3.1. above.


2.5 Miscellaneous

2.5.1. EViews doesn't appear to have a built-in constant for Pi. How can I use Pi in my calculations?

While EViews does not supply a built-in constant or SCALAR object for Pi, there are several internal functions which may be used to compute Pi. One easy way to get Pi is to use the @ASIN function.

If you need to use Pi in your calculations, we recommend that you generate a SCALAR containing the value of Pi, which you can then use in all of your expressions. For example, if you type

SCALAR PI=2*@ASIN(1)

at the EViews command line, EViews will create a SCALAR object PI containing the value 3.14159... You may then use PI in any expression where you would otherwise enter the numeric value. Suppose, for example, that Z and MU are series or scalars. Then

GENR Y=1/(SQR(2*PI))*(Z-MU)^2

will generate a series Y.

The SCALAR PI will be saved (like any other EViews object) when you save your workfile. PI may also be STOREd to disk and FETCHed into any workfile.
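
The identity behind this trick is that the arcsine of 1 equals Pi/2. It can be confirmed in any environment with an arcsine function; the Python lines below are an illustration only.

```python
import math

# asin(1) is pi/2, so doubling it recovers pi.
pi_from_asin = 2 * math.asin(1)
```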


2.5.2. I have data contained in an ASCII data file which EViews doesn't seem to read correctly. What could be the problem?

The most important thing to realize about the current version of EViews is that line formatting is ignored when reading data in from an ASCII file. EViews simply reads through the file from beginning to end, filling in observations for each of the variables one by one. Consequently, if EViews for some reason does not find the number of data items expected on a single line, it uses data from the next line to fill in the remaining variables. The result is that if your list of variables does not match your data, variables in the EViews workfile may appear 'scrambled', with data values from the ASCII file being scattered across the series in the EViews workfile. Typically, there will also be missing values at the bottom of the imported series, resulting from EViews padding out the series after the data available in the ASCII file were exhausted.

To get around problems of this sort, simply make certain that the list of variables that EViews attempts to read exactly matches the data fields that EViews actually sees when reading in each line. There are a number of considerations to bear in mind when trying to match your data.

The first thing to note is that the current release version of EViews does not support reading of files with fixed width fields. This has a number of consequences. First, fixed width fields consisting entirely of spaces will be ignored rather than coded as missing values. Second, adjacent columns of numbers which are packed together with no spacing will be read as a single number. Both these problems are quite difficult to get around and may require altering the existing format of your data.

When reading delimited files, EViews treats all characters which are not numerical as delimiting characters, and all adjacent delimiting characters are treated as a single delimiter. Furthermore, any delimiters at the beginning of a line are ignored. There are two situations where this behavior can cause problems. First, if the file uses adjacent delimiters to indicate missing values (such as the 'CSV' format produced by some programs) EViews will not be able to read in the file. Second, mixed alphanumeric identifiers may cause trouble for EViews, particularly if the number of "switches" between alphabetical and numerical characters varies from identifier to identifier (for example, a number plate field which may contain custom number plate registrations).

To see how these rules operate in practice, consider the data given in:

B428394d7 3/21/68 Boston 24123 5.4

EViews sees this line as containing seven entries: 428394, 7, 3, 21, 68, 24123 and 5.4. For this to be imported correctly, seven variable names should be specified either in the import dialog or at the top of the data file.

Note, however, that if the next line in the file contains:

B428265dc 9/12/76 Dallas 24169 2.7

then EViews will read this as containing only six entries: 428265, 9, 12, 76, 24169 and 2.7. Since EViews reads the first entry on each line as a different number of fields, you will not be able to import this file into EViews without first editing the file to remove the first entry on each line.

Sometimes you will be able to import your data in a slightly different way and then perform transformations to return to the original data. For example, consider the line:

USA-1994 $1,000,347 $11,591,465

EViews will read this line as containing seven entries: 1994, 1, 0, 347, 11, 591 and 465. You could read this in to EViews by specifying seven series names in the import dialog such as "year x1 x2 x3 y1 y2 y3". You could then reconstruct the original data with the commands:

GENR X = 1000000*X1 + 1000*X2 + X3
GENR Y = 1000000*Y1 + 1000*Y2 + Y3

However, note that if the next line of the file is:

AUS-1994 $300,657 $2,591,465

then EViews will read this as containing only six entries: 1994, 300, 657, 2, 591 and 465. Consequently, it would not be possible to import this data file into EViews without first modifying the file.
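
Before importing, it can help to scan the file for lines whose number of numeric fields differs from the first line. The Python sketch below approximates the rules described above under a simplifying assumption: a field is a run of digits with an optional decimal part, and every other character is a delimiter. The actual EViews reader may treat signs or other characters differently, and the function names are hypothetical.

```python
import re

# Assumption: a field is a run of digits with an optional decimal part;
# every other character (letters, '/', '$', ',', '-') is a delimiter.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numeric_fields(line):
    """Return the numeric tokens found on one line of text."""
    return NUMBER.findall(line)


def inconsistent_lines(lines):
    """Return (line_number, field_count) for lines whose field count
    differs from that of the first line."""
    counts = [len(numeric_fields(line)) for line in lines]
    return [(i + 1, c) for i, c in enumerate(counts) if c != counts[0]]


lines = [
    "B428394d7 3/21/68 Boston 24123 5.4",
    "B428265dc 9/12/76 Dallas 24169 2.7",
]
problems = inconsistent_lines(lines)
```

Applied to the two identifier lines from the example above, the check flags the second line, which yields six fields against seven on the first. An empty result means every line has the same field count as the first.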

We hope to remove some of these limitations in the next version of EViews.



For sales info, send email to sales@eviews.com. For technical support, send email to support@eviews.com.  Please include your serial number with all support questions. Send email to webmaster@eviews.com with questions or comments about this web site.

Last modified: November 13, 2008 06:03 PM

Copyright © 1997-2008 Quantitative Micro Software. All rights reserved.