Tuesday, January 31, 2012

Basic parts of a SAS program

Courtesy of
http://www.kellogg.northwestern.edu/rc/docs/sas_programming_skills.pdf

• There are two basic building blocks in a SAS program: DATA ‘steps’ and PROC
(procedures). SAS procedures do not require the execution of a data step before them.

• In addition, OPTIONS to control appearance of output and log files.
• The DATA step is where you manipulate the data (creating variables, recoding,
subsetting, etc). The data step puts the data in a format SAS can understand.

• A SAS data file can have up to three parts:
                         (a) the headers – descriptive information of the dataset’s contents;
                         (b) the matrix of data values; and
                         (c) if created, indexes.
Indexes are saved to separate “physical” files, but SAS considers it part of the data
file. Thus, if an index exists, it should not be removed from the directory where the
data file resides. Version 8 data files have the extension “.sas7bdat”, while index files
have extension “.sas7bndx”. Most data files in WRDS have indexes.

• SAS reads and executes a data step statement by statement, observation by
observation. All the variables in the portion of memory that processes the current
each observation (input buffer or program data vector, depending on whether you are
reading “raw” data or a SAS data file) are reset to missing in each iteration of the
data step. The RETAIN statement prevents this from happening.

• Missing values: Generally, missing values are denoted by a period (“.”). Most
operators propagate missing values. For example, if you have three variables (v1, v2,
v3) and for observation 10, v2 is missing, creating total=v1+v2+v3 will result in a
missing value for observation 10 [if you would like the sum of the non-missing
values, use the SUM function: total=sum(v1,v2,v3)]. However, for comparison
operators (<, >, <= or >=), SAS treats missing values as infinitely negative numbers
(- ∞) [unlike Stata, which treats them as infinitely positive]


  • PROCs are used to run statistics on existing datasets and, in turn, can generate datasets as output. Output data sets are extremely useful and can simplify a great deal of the data manipulation.

  • DATA step and PROC boundary: Under Windows, all procedures must end with a “RUN;” statement. In UNIX, this is not necessary since SAS reads the entire program before execution. Does, it determines DATA step boundaries when it encounters a PROC and knows the boundary of a PROC when it is followed by another PROC or a DATA step. The one exception to this rule is in the context of SAS macros, where you may need to add the “RUN;” statement.

Monday, January 30, 2012

SAS Tips

Some very good tips in this presentation

http://www.wuss.org/proceedings09/09WUSSProceedings/papers/tut/TUT-Lafler2.pdf

SAS programming manua

http://support.sas.com/documentation/cdl/en/basess/58133/PDF/default/basess.pdf

SAS® Programming Guidelines
http://www2.sas.com/proceedings/sugi31/123-31.pdf

Proc SQL And UNION

There are five Union operators in Proc SQL. Four perform very different operations than the simple concatenation seen here – and several of those are very difficult to code in the SAS Data Step.

Union - matches by column position, not name, and drops duplicate rows

Union All - matches by column position, not name, and doesn’t drop duplicate rows

Union Corresponding - matches by column name and drops duplicate rows

Except - matches by column name and drops the rows found in both tables

Intersection - matches by column name and keeps the unique rows in both tables

The Unions can be combined also. Two tables could be concatenated with a Union Corresponding and then that result intersected with another table to find the records in the last table that were also in either the first table or the second.

Copy/Move Files using Data Step

I found this post on "The SAS Dummy" Blog by Chris Hemedinger. It is an example of how to copy and move files anywhere. Thank You Chris. I have found your blog very informative and helpful.


/* these IN and OUT filerefs can point to anything */
filename in "c:\dataIn\input.xlsx";
filename out "c:\dataOut\output.xlsx";

/* copy the file byte-for-byte */
data _null_;
length filein 8 fileid 8;
filein = fopen('in','I',1,'B');
fileid = fopen('out','O',1,'B');
rec = '20'x;
do while(fread(filein)=0);
rc = fget(filein,rec,1);
rc = fput(fileid, rec);
rc =fwrite(fileid);
end;
rc = fclose(filein);
rc = fclose(fileid);
run;

Proc SQL - Unmatched Records

1) How to keep only unmatched records from 2 tables;
data a; input varA @@; cards; 200 300 400 500 ; run;
data b; input varB @@; cards;600 700 200 300 ; run;

2) there are 2 tables one is master table with 10 million rows and second has 10000 rows

Need to update the master table records with the records from 2nd table.

Solution
Proc SQL

proc sql;
select VAR, src
from
(Select distinct a.VAR, 'a' as src from work.a a
outer union corr
Select distinct b.VAR, 'b' as src from work.b b
)
group by VAR
having count(VAR) eq 1
;
quit;