EViews Help: import

import

Imports data from a foreign file into the matrix object.

Syntax

matrix_name.import([type=]) source_description import_specification

• source_description should contain a description of the file from which the data is to be imported. The description is usually just the path and file name of the file, but you can also specify more precise information. See wfopen for more details on the specification of source_description.

• The optional “type=” option may be used to specify a source type. In most cases you do not need to specify a “type=” option, as EViews will automatically determine the type from the filename. The following table summarizes the various source formats with the corresponding “type=” keywords:

Source Format	Option Keyword
Excel (through 2003)	“excel”
Excel 2007 (xml)	“excelxml”
HTML	“html”
Text / ASCII	“text”

• import_specification can be used to provide additional information about the file to be read. The details of import_specification will depend upon the type of file being imported.

Excel Files

The syntax for reading Excel files is:

matrix_name.import(type=excel[xml]) source_description [table_description] [variables_description]

The following table_description elements may be used when reading Excel data:

• “range = arg”, where arg is a range of cells to read from the Excel workbook, following the standard Excel format [worksheet!][topleft_cell[:bottomright_cell]].

If the worksheet name contains spaces, it should be placed in single quotes. If the worksheet name is omitted, the cell range is assumed to refer to the currently active sheet. If only a top left cell is provided, a bottom right cell will be chosen automatically to cover the range of non-empty cells adjacent to the specified top left cell. If only a sheet name is provided, the first set of non-empty cells in the top left corner of the chosen worksheet will be selected automatically. As an alternative to specifying an explicit range, you may use a name that has been defined inside the Excel workbook to refer to a range or cell.

• “byrow”, transpose the incoming data. This option allows you to read files where the series are contained in rows (one row per series) rather than columns.
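The shape of the “range=” argument can be illustrated with a short Python sketch. This is not EViews code and not how EViews parses the argument; parse_range is a hypothetical helper that only splits a string of the form [worksheet!][topleft_cell[:bottomright_cell]] into its parts, treating anything without a “!” that does not look like a cell reference as a bare sheet or defined name.

```python
import re

# One cell reference such as A1 or $B$12, optionally followed by ":" and a second cell.
_CELLS = re.compile(r"(\$?[A-Za-z]{1,3}\$?\d+)(?::(\$?[A-Za-z]{1,3}\$?\d+))?")


def parse_range(arg):
    """Hypothetical helper: return (name, top_left, bottom_right).

    Parts that are absent from the argument are returned as None.
    A bare name that happens to look like a cell (e.g. a sheet called
    "A1") is treated as a cell here; that choice is an assumption.
    """
    name, sep, cells = arg.rpartition("!")
    if sep:
        # "worksheet!cells": quoted sheet names lose their single quotes.
        name = name.strip("'")
        if cells == "":
            return name, None, None
        match = _CELLS.fullmatch(cells)
        if match is None:
            raise ValueError(f"not a cell range: {cells!r}")
        return name, match.group(1), match.group(2)
    match = _CELLS.fullmatch(arg)
    if match:
        # Cells only: the sheet defaults to the active one.
        return None, match.group(1), match.group(2)
    # Neither "!" nor a cell pattern: a sheet name or a defined name.
    return arg.strip("'"), None, None


if __name__ == "__main__":
    for text in ["Sheet1!A1:C10", "'GDP data'!B2", "B2:D5", "GDP data"]:
        print(text, "->", parse_range(text))
```

A missing bottom right cell or a missing sheet comes back as None, which corresponds to the cases where EViews chooses the cell block or the active sheet automatically.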

The optional variables_description may be formed using the elements:

• “colhead=int”, number of table rows to be treated as column headers.

• “na="arg1"”, text used to represent observations that are missing from the file. The text should be enclosed in double quotes.

• “scan=[int|all]”, number of rows of the table to scan during automatic format detection (“scan=all” scans the entire file).

• “firstobs=int”, first observation to be imported from the data (default is 1). This option may be used to start reading rows from partway through the table.

• “lastobs = int”, last observation to be read from the data (default is last observation of the file). This option may be used to read only part of the file, which may be useful for testing.
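As a rough illustration of how the variables_description elements interact, the following Python sketch applies “colhead”, “na”, “firstobs” and “lastobs” to a table that has already been read into a list of rows. It is not EViews code. The function name, the use of None as the missing-value marker, and the assumption that observations are counted from the first row after the column headers are all illustrative choices, not documented EViews behaviour.

```python
def apply_variables_description(rows, colhead=0, na="NA", firstobs=1, lastobs=None):
    """Hypothetical helper modelling colhead, na, firstobs and lastobs.

    rows      list of rows, each a list of cell strings
    colhead   number of leading rows treated as column headers
    na        text that marks a missing observation
    firstobs  first observation to keep (1-based, counted after the headers)
    lastobs   last observation to keep (inclusive); None keeps through the end
    """
    headers = rows[:colhead]
    data = rows[colhead:]
    # Convert the 1-based inclusive bounds to a Python slice.
    data = data[firstobs - 1:lastobs]
    # Replace the missing-value text with None (assumed marker).
    cleaned = [[None if cell == na else cell for cell in row] for row in data]
    return headers, cleaned


if __name__ == "__main__":
    table = [["x", "y"], ["1", "2"], ["3", "."], ["5", "6"], ["7", "8"]]
    headers, data = apply_variables_description(
        table, colhead=1, na=".", firstobs=2, lastobs=3
    )
    print(headers)
    print(data)
```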

Excel Examples

matrix_name.import "c:\data files\data.xls"

loads the active sheet of “Data.XLS” into the MATRIX_NAME matrix object.

matrix_name.import "c:\data files\data.xls" range="GDP data"

reads the data contained in the “GDP data” sheet of “Data.XLS” into the MATRIX_NAME object.

HTML Files

The syntax for reading HTML pages is:

matrix_name.import(type=html) source_description [table_description] [variables_description]

The following table_description elements may be used when reading an HTML file or page:

• “table = arg”, where arg specifies which HTML table to read in an HTML file/page containing multiple tables.

When specifying arg, you should remember that tables are named automatically following the pattern “Table01”, “Table02”, “Table03”, etc. If no table name is specified, the largest table found in the file will be chosen by default. Note that the table numbering may include trivial tables that are part of the HTML content of the file, but would not normally be considered as data tables by a person viewing the page.

• “skip = int”, where int is the number of rows to discard from the top of the HTML table.

• “byrow”, transpose the incoming data. This option allows you to import files where the series are contained in rows (one row per series) rather than columns.

The optional variables_description may be formed using the elements:

• “colhead=int”, number of table rows to be treated as column headers.

• “na="arg1"”, text used to represent observations that are missing from the file. The text should be enclosed in double quotes.

• “scan=[int|all]”, number of rows of the table to scan during automatic format detection (“scan=all” scans the entire file).

• “firstobs=int”, first observation to be imported from the table of data (default is 1). This option may be used to start reading rows from partway through the table.

• “lastobs = int”, last observation to be read from the table of data (default is last observation of the file). This option may be used to read only part of the file, which may be useful for testing.

HTML Examples

mat1.import "c:\data.html"

loads the data in the HTML file “Data.HTML”, located on the C:\ drive, into the MAT1 matrix object.

mat1.import(type=html) "http://www.tradingroom.com.au/apps/mkt/forex.ac" colhead=3

loads into the MAT1 matrix object the data at the given URL on the website “http://www.tradingroom.com.au”. The first three rows of the table are treated as column headers.

Text and Binary Files

The syntax for reading text or binary files is:

matrix_name.import(type=arg) source_description [table_description] [variables_description]

If a table_description is not provided, EViews will attempt to read the file as a free-format text file. The following table_description elements may be used when reading a text or binary file:

• “ftype = [ascii|binary]” specifies whether numbers and dates in the file are stored in a human readable text (ASCII), or machine readable (Binary) form.

• “rectype = [crlf|fixed|streamed]” describes the record structure of the file:

“crlf”, each row in the output table is formed using a fixed number of lines from the file (where lines are separated by carriage return/line feed sequences). This is the default setting.

“fixed”, each row in the output table is formed using a fixed number of characters from the file (specified in “reclen= arg”). This setting is typically used for files that contain no line breaks.

“streamed”, each row in the output table is formed by reading a fixed number of fields, skipping across lines if necessary. This option is typically used for files that contain line breaks, but where the line breaks are not relevant to how rows from the data should be formed.

• “reclines =int”, number of lines to use in forming each row when “rectype=crlf” (default is 1).

• “reclen=int”, number of bytes to use in forming each row when “rectype=fixed”.

• “recfields=int”, number of fields to use in forming each row when “rectype=streamed”.

• “skip=int”, number of lines (if rectype is “crlf”) or bytes (if rectype is not “crlf”) to discard from the top of the file.

• “comment=string”, where string is a double-quoted string, specifies one or more characters to treat as a comment indicator. When a comment indicator is found, everything on the line to the right of where the comment indicator starts is ignored.

• “emptylines=[keep|drop]”, specifies whether empty lines should be ignored (“drop”), or treated as valid lines (“keep”) containing missing values. The default is to ignore empty lines.

• “tabwidth=int”, specifies the number of characters between tab stops when tabs are being replaced by spaces (default=8). Note that tabs are automatically replaced by spaces whenever they are not being treated as a field delimiter.

• “fieldtype=[delim|fixed|streamed|undivided]”, specifies the structure of fields within a record:

“delim”, fields are separated by one or more delimiter characters.

“fixed”, each field is a fixed number of characters.

“streamed”, fields are read from left to right, with each field starting immediately after the previous field ends.

“undivided”, read entire record as a single series.

• “quotes=[single|double|both|none]”, specifies the character used for quoting fields, where “single” is the apostrophe, “double” is the double quote character, and “both” means that either single or double quotes are allowed (default is “both”). Characters contained within quotes are never treated as delimiters.

• “singlequote”, same as “quotes = single”.

• “custom="arg1"”, specifies custom delimiter characters in the double quoted string. Use the character “t” for tab, “s” for space and “a” for any character.

• “mult=[on|off]”, to treat multiple delimiters as one. Default value is “on” if “delim” is “space”, “dblspace”, “white”, or “dblwhite”, and “off” otherwise.

• “endian = [big|little]”, selects the endianness of numeric fields contained in binary files.

• “string = [nullterm|nullpad|spacepad]”, specifies how strings are stored in binary files. If “nullterm”, strings shorter than the field width are terminated with a single zero character. If “nullpad”, strings shorter than the field width are followed by extra zero characters up to the field width. If “spacepad”, strings shorter than the field width are followed by extra space characters up to the field width.

• “byrow”, transpose the incoming data. This option allows you to import files where the series are contained in rows (one row per series) rather than columns.

• “lastcol”, include implied last column. For lines that end with a delimiter, this option adds an additional column. When importing a CSV file, EViews normally treats a line that has the delimiter as its last character (for example: “name, description, date,”) as having 3 columns. With this option, EViews will treat the line as having 4 columns. Note that this is not the same as a line containing “name, description, date”, which EViews will always treat as having 3 columns regardless of whether the option is set.
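The three “rectype=” settings described in the list above can be pictured with the following Python sketch. It is not EViews code. The function split_records is hypothetical, and it makes several simplifying assumptions: the input is plain text (so characters and bytes coincide), lines joined under “reclines” are separated by a single space, and fields in “streamed” mode are whitespace-delimited.

```python
def split_records(text, rectype="crlf", reclines=1, reclen=None, recfields=None):
    """Hypothetical helper: split raw text into one string per output row."""
    if rectype == "crlf":
        # A fixed number of lines forms each row.
        lines = text.splitlines()
        return [
            " ".join(lines[i:i + reclines])
            for i in range(0, len(lines), reclines)
        ]
    if rectype == "fixed":
        # A fixed number of characters forms each row; line breaks are not special.
        if not reclen:
            raise ValueError("rectype='fixed' needs reclen")
        return [text[i:i + reclen] for i in range(0, len(text), reclen)]
    if rectype == "streamed":
        # A fixed number of fields forms each row, reading across line breaks.
        if not recfields:
            raise ValueError("rectype='streamed' needs recfields")
        fields = text.split()
        return [
            " ".join(fields[i:i + recfields])
            for i in range(0, len(fields), recfields)
        ]
    raise ValueError(f"unknown rectype: {rectype!r}")


if __name__ == "__main__":
    print(split_records("a b\nc d\ne f\ng h\n", rectype="crlf", reclines=2))
    print(split_records("AAABBBCCC", rectype="fixed", reclen=3))
    print(split_records("1 2 3\n4\n5 6", rectype="streamed", recfields=3))
```

In the last call the line breaks fall in the middle of the second row, which is the situation the “streamed” setting is described as handling.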

A central component of the table_description element is the format statement. You may specify the data format using the following table descriptors:

• Fortran Format:

fformat=([n1]Type[Width][.Precision], [n2]Type[Width][.Precision], ...)

where Type specifies the underlying data type, and may be one of the following,

I - integer

F - fixed precision

E - scientific

A - alphanumeric

X - skip

and n1, n2, ... are the number of times to read using the descriptor (default=1). More complicated Fortran compatible variations on this format are possible.

• Column Range Format:

rformat="[n1]Type[Width][.Precision], [n2]Type[Width][.Precision], ..."

where optional type is “$” for string or “#” for number, and n1, n2, n3, n4, etc. are the range of columns containing the data.

• C printf/scanf Format:

cformat="fmt"

where fmt follows standard C language (printf/scanf) format rules.
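To make the Fortran format descriptor [n]Type[Width][.Precision] more concrete, here is a simplified Python sketch that expands the repeat counts and slices one fixed-width line accordingly. It is not EViews code and covers only the simple form shown above. The helper names are hypothetical, a missing width is assumed to mean one character, and the precision is parsed but ignored, so the Fortran convention of an implied decimal point is not modelled.

```python
import re

# [n]Type[Width][.Precision], e.g. "2F8.2", "I4", "3X", "A10"
_DESCRIPTOR = re.compile(r"(\d*)([IFEAX])(\d*)(?:\.(\d+))?", re.IGNORECASE)


def expand_format(fmt):
    """Expand e.g. '(2F8.2, I4)' into [('F', 8), ('F', 8), ('I', 4)]."""
    expanded = []
    for item in fmt.strip("() ").split(","):
        match = _DESCRIPTOR.fullmatch(item.strip())
        if match is None:
            raise ValueError(f"unsupported descriptor: {item!r}")
        count, type_code, width, _precision = match.groups()
        # Missing repeat count and missing width both default to 1 (assumption for width).
        expanded.extend([(type_code.upper(), int(width or 1))] * int(count or 1))
    return expanded


def read_line(line, fmt):
    """Slice one fixed-width line according to fmt and convert each field."""
    values, pos = [], 0
    for type_code, width in expand_format(fmt):
        field = line[pos:pos + width]
        pos += width
        if type_code == "X":
            continue  # skip these characters
        if type_code == "I":
            values.append(int(field))
        elif type_code in ("F", "E"):
            values.append(float(field))
        else:  # "A": alphanumeric
            values.append(field.strip())
    return values


if __name__ == "__main__":
    sample = "    1.50" + "   -2.25" + "  42" + "  " + "abc"
    print(read_line(sample, "(2F8.2, I4, 2X, A3)"))
```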

The optional variables_description may be formed using the elements:

• “colhead=int”, number of table rows to be treated as column headers.

• “na="arg1"”, text used to represent observations that are missing from the file. The text should be enclosed in double quotes.

• “scan=[int|all]”, number of rows of the table to scan during automatic format detection (“scan=all” scans the entire file).

• “firstobs=int”, first observation to be imported from the table of data (default is 1). This option may be used to start reading rows from partway through the table.

Text and Binary File Examples (.txt, .csv, etc.)

mat2.import c:\data.csv skip=5

reads “Data.CSV” into the MAT2 matrix object, skipping the first 5 rows.

mat2.import(type=text) c:\date.txt delim=comma

loads the comma delimited data “Date.TXT” into the MAT2 matrix object.

Cross-references