Tuesday, January 31, 2012

Basic parts of a SAS program

Courtesy of
http://www.kellogg.northwestern.edu/rc/docs/sas_programming_skills.pdf

• There are two basic building blocks in a SAS program: DATA ‘steps’ and PROC
(procedures). SAS procedures do not require the execution of a data step before them.

• In addition, OPTIONS statements control the appearance of output and log files.
• The DATA step is where you manipulate the data (creating variables, recoding,
subsetting, etc). The data step puts the data in a format SAS can understand.
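
As a rough illustration of how these pieces fit together, here is a minimal sketch of an OPTIONS statement, a DATA step and a PROC. The dataset name (scores), the variable names and the values are hypothetical, made up only for this example.

```sas
/* Hypothetical example: dataset, variables and values are illustrative only. */
options nodate nonumber linesize=80;  /* control the appearance of the output */

data scores;                  /* DATA step: read the data and create variables */
    input id score;
    passed = (score >= 60);   /* new variable: 1 if the comparison is true, else 0 */
    datalines;
1 72
2 55
3 90
;
run;

proc print data=scores;       /* PROC step: list the dataset just created */
run;
```

The PROC here follows a DATA step only because the example builds its own data; a PROC can just as well run directly on a dataset that already exists.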

• A SAS data file can have up to three parts:
                         (a) the headers – descriptive information of the dataset’s contents;
                         (b) the matrix of data values; and
                         (c) if created, indexes.
Indexes are saved to separate “physical” files, but SAS considers them part of the data
file. Thus, if an index exists, it should not be removed from the directory where the
data file resides. Version 8 data files have the extension “.sas7bdat”, while index files
have extension “.sas7bndx”. Most data files in WRDS have indexes.

• SAS reads and executes a data step statement by statement, observation by
observation. The variables in the portion of memory that processes the current
observation (input buffer or program data vector, depending on whether you are
reading “raw” data or a SAS data file) are reset to missing in each iteration of the
data step. The exception is variables read from an existing SAS data file with SET or
MERGE, which SAS retains automatically. The RETAIN statement prevents the reset for
the variables you name.
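
A small sketch of RETAIN in use, assuming a hypothetical dataset (running) with a single made-up variable (amount). The goal is a running total, which only works if the total survives from one iteration to the next.

```sas
/* Hypothetical example: names and values are illustrative only. */
data running;
    input amount;
    retain total 0;           /* keep total across iterations, starting at 0 */
    total = total + amount;   /* running total: 10, 30, 60 */
    datalines;
10
20
30
;
run;
```

Without the RETAIN statement, total would be reset to missing at the start of every iteration, so total + amount would also be missing for every observation.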

• Missing values: Generally, missing values are denoted by a period (“.”). Most
operators propagate missing values. For example, if you have three variables (v1, v2,
v3) and for observation 10, v2 is missing, creating total=v1+v2+v3 will result in a
missing value for observation 10 [if you would like the sum of the non-missing
values, use the SUM function: total=sum(v1,v2,v3)]. However, for comparison
operators (<, >, <= or >=), SAS treats missing values as infinitely negative numbers
(-∞) [unlike Stata, which treats them as infinitely positive].
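
The behaviour described above can be checked with a short sketch. The dataset name (missing_demo), the derived variable names and the data values are assumptions for illustration; v1, v2 and v3 follow the text.

```sas
/* Hypothetical example: dataset name and values are illustrative only. */
data missing_demo;
    input v1 v2 v3;
    total_plus = v1 + v2 + v3;     /* missing if any operand is missing */
    total_sum  = sum(v1, v2, v3);  /* adds only the non-missing values */
    v2_below_0 = (v2 < 0);         /* 1 when v2 is missing: missing compares below any number */
    datalines;
1 2 3
4 . 6
;
run;
```

For the second observation, total_plus is missing, total_sum is 10 and v2_below_0 is 1. If missing values should not count as "less than", one option is to test for them explicitly, for example with not missing(v2) and v2 < 0.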


  • PROCs are used to run statistics on existing datasets and, in turn, can generate datasets as output. Output data sets are extremely useful and can simplify a great deal of the data manipulation.
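
As one possible illustration of a PROC that writes an output dataset, the sketch below uses PROC MEANS on sashelp.class, a sample dataset that ships with SAS. The output dataset name (height_stats) and the statistic names (mean_height, n_obs) are made up for the example.

```sas
/* Sketch: summarize a sample dataset and keep the results as a dataset. */
proc means data=sashelp.class noprint;   /* noprint: suppress the printed report */
    class sex;                           /* one summary row per value of sex */
    var height;
    output out=height_stats mean=mean_height n=n_obs;
run;

proc print data=height_stats;            /* the summary is now an ordinary dataset */
run;
```

Because of the CLASS statement, height_stats also contains an overall row (where _TYPE_ is 0) alongside the per-group rows. The result can be merged back or processed further in a later DATA step.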

  • DATA step and PROC boundary: Under Windows, all procedures must end with a “RUN;” statement. In UNIX, this is not necessary since SAS reads the entire program before execution. Thus, it determines DATA step boundaries when it encounters a PROC and knows the boundary of a PROC when it is followed by another PROC or a DATA step. The one exception to this rule is in the context of SAS macros, where you may need to add the “RUN;” statement.
