Basics of SAS Programming

Basic Rules:

All SAS programs consist of a sequence of statements organized into "steps". There are only two kinds of steps:

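DATA step: reads or creates data and builds a SAS dataset.
PROC step: runs a predefined procedure (analysis, printing, sorting, etc.) on a dataset.
Note: Once a dataset has been created, it can be processed by any subsequent "DATA" or "PROC" step.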

A SAS program can contain any number of "DATA" and "PROC" steps.
Most SAS statements start with a keyword (DATA, INPUT, PROC, etc.); assignment statements such as Z=X+Y; are the main exception.
All SAS statements end with a semicolon ";".
SAS statements can be entered in free format. That is, they can begin in any column, there may be multiple statements per line, and a statement may be split over several lines (as long as no word is split).
Uppercase and lowercase are equivalent, except inside quote marks (lang = 'c'; is not the same as lang = 'C';).
Naming Convention:
Names may be up to 32 characters long (Note: v6.12 users: 8 characters).

Names must begin with a letter (A-Z) or an underscore (_).
Names cannot contain blanks or special symbols (e.g., &, %, $, #, etc.).

SAS Types:

Character variables (the variable name is followed by $ in the INPUT statement)
Numeric variables

Note: Missing data is represented by '.' for numeric variables and by ' ' (i.e. a blank) for character variables.

General statements:

These are statements that do not belong to a particular DATA or PROC step. They have a global effect.

footnote statement
title statement
options statement
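
As a rough illustration, the global statements listed above might appear at the top of a program, before any DATA or PROC step. The title text, footnote text and option values are placeholders chosen for this sketch, not part of the original example.

```sas
* Global statements stay in effect until changed or cleared;
OPTIONS LINESIZE=80 NODATE;
TITLE 'CD-ROM Drive Comparison';
FOOTNOTE 'Illustrative footnote text';
```

A TITLE or FOOTNOTE statement with no text (for example TITLE;) clears the current setting.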

DATA step statements:

SAS carries out all statements in the DATA step in order for each input observation.

The "DATA" statement identifies the start of a DATA step and names the data structure to be created.

DATA <dataset_name>;

The "INFILE" statement identifies an external file from which data will be read. The file is given either as a quoted path or as a fileref.

INFILE <file_name>;

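The "DATALINES" statement is used if you choose to embed data in your program instead of reading it from an external file. It must be the last statement in the DATA step; the data lines follow it and end with a line containing a semicolon.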

DATALINES;

The "INPUT" statement defines variables, their type, and specifies how data is to be read:

INPUT <variable_name [type] [position]>;

Example:

i) INPUT MFG $ @@; (the trailing @@ holds the input line so that several observations can be read from it)

ii) INPUT MFG $ TYPE $ SEEK TRANSFER;

iii) INPUT MFG $ 1-8 TYPE $ 11-12 SEEK 13-16 TRANSFER 17-19;
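
A minimal sketch of how INPUT fits into a complete DATA step, assuming the data is embedded with DATALINES. The dataset name SEEKS is hypothetical; the data values are borrowed from the CDROM example later in this post.

```sas
* List input: values are separated by blanks and $ marks character variables;
DATA CDROM;
   INPUT MFG $ TYPE $ SEEK TRANSFER;
   DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
;
RUN;

* A trailing @@ holds the line so several observations can share it;
DATA SEEKS;
   INPUT SEEK @@;
   DATALINES;
7.3 23.1 40.1 13.5 5.5
;
RUN;
```

The first step creates two observations with four variables each. The second creates five observations of the single variable SEEK from one data line.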

Variables may be given descriptive names:

LABEL <variable_name='label'>...;

Example:

i) LABEL MFG='Manufacturer';

ii) LABEL MFG='Manufacturer'

         SEEK='Seek Time';

Assignment statements create new variables. The usual arithmetic operations are available:

Symbol   Operation        Example
**       Exponentiation   Z=X**2;
*        Multiplication   Z=X*Y;
/        Division         Z=X/Y;
+        Addition         Z=X+Y;
-        Subtraction      Z=X-Y;
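
The operators can be tried in a small DATA step such as the sketch below. The dataset name ARITH, the result variable names and the two data lines are all made up for illustration.

```sas
DATA ARITH;
   INPUT X Y;
   POWER    = X ** 2;   * exponentiation;
   PRODUCT  = X * Y;    * multiplication;
   QUOTIENT = X / Y;    * division;
   TOTAL    = X + Y;    * addition;
   DIFF     = X - Y;    * subtraction;
   DATALINES;
6 3
10 4
;
RUN;

PROC PRINT DATA=ARITH;
RUN;
```

For the first data line (X=6, Y=3) the new variables are 36, 18, 2, 9 and 3.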

The "DROP" and "KEEP" statements are used to remove variables and all associated values from a dataset:

DROP <variable_name>...;

removes named variables from the dataset and keeps unnamed variables.

KEEP <variable_name>...;

keeps named variables and drops unnamed variables from the dataset.
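
DROP and KEEP are two ways of stating the same choice. In the sketch below, both steps produce a dataset containing MFG, SEEK and TRANSFER. The dataset names NOTYPE1 and NOTYPE2 are hypothetical.

```sas
* DROP lists the variables to leave out;
DATA NOTYPE1;
   INPUT MFG $ TYPE $ SEEK TRANSFER;
   DROP TYPE;
   DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
;
RUN;

* KEEP lists the variables to retain;
DATA NOTYPE2;
   INPUT MFG $ TYPE $ SEEK TRANSFER;
   KEEP MFG SEEK TRANSFER;
   DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
;
RUN;
```

Which form to use is mostly a matter of which list is shorter.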

The "IF" statement is used for conditional processing:

IF <expression> THEN <statement>;

ELSE <statement>;

Note: The ELSE statement is optional. The IF ... THEN parts comprise a single statement:

i) IF SEEK < 15 THEN CLASS = 'FAST';

ELSE CLASS = 'SLOW';

ii) CLASS='SLOW';

IF SEEK < 15 THEN CLASS = 'FAST';

SAS comparison operators are shown below. You can use either the symbol or the two-letter abbreviation.

Symbol   Abbrev
<        LT
<=       LE
>        GT
>=       GE
=        EQ
^=       NE
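
As a small sketch, the same test can be written with either the symbol or the abbreviation. The variables CLASS2 and NOTSONY are hypothetical names added only to show both forms side by side.

```sas
DATA CDROM;
   INPUT MFG $ TYPE $ SEEK TRANSFER;
   * Symbol form;
   IF SEEK < 15 THEN CLASS = 'FAST';
   ELSE CLASS = 'SLOW';
   * Abbreviation form of the same test;
   IF SEEK LT 15 THEN CLASS2 = 'FAST';
   ELSE CLASS2 = 'SLOW';
   * Comparison operators also work on character values;
   IF MFG NE 'SONY' THEN NOTSONY = 1;
   ELSE NOTSONY = 0;
   DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
;
RUN;
```

CLASS and CLASS2 always hold the same value, since < and LT are the same operator.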

A special form of the "IF" statement is used for subsetting a dataset, that is, for selecting or excluding particular observations.

DATA CDROM;

INPUT MFG $ TYPE $ SEEK TRANSFER;

IF SEEK < 15;

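The statement IF SEEK < 15; has the same effect as either of the following (for the first form, provided no later statement in the step changes the observation):

i) IF SEEK < 15 THEN OUTPUT;

ii) IF SEEK >= 15 THEN DELETE;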

Comments:
Two types of comments:

i) * ... ;

ii) /* ... */

DATA CDROM;

* Read in variables;

INPUT MFG $ TYPE $ TRANSFER SEEK;

/* ignore next statement

SEEKMIN = SEEK/60000; */

The example below reads CDROM data and creates additional variables:

DATA CDROM;

INPUT MFG $ TYPE $ SEEK TRANSFER;

IF SEEK < 15 THEN CLASS='FAST';

ELSE CLASS='SLOW';

DROP MFG TYPE;

DATALINES;

NEC 12X 7.3 105

SONY 6X 23.1 830

SONY 4X 40.1 330

CANON 6X 13.5 530

SONY 12X 5.5 1000

;

RUN;

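The resulting dataset will contain all five observations. MFG and TYPE are dropped and CLASS is added, so it will look like:

SEEK TRANSFER CLASS

7.3 105 FAST

23.1 830 SLOW

40.1 330 SLOW

13.5 530 FAST

5.5 1000 FAST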

PROC step statements:

PROC steps run predefined procedures, which may be either statistical or utility procedures. The dataset processed is the most recently created one unless another is named in a "DATA=" option.

PROC <procedure_name>;

[procedure_statement];

The available "procedure_statement" entries depend on the procedure, but some are shared by many procedures:

VAR <variable_name>;

Indicates which variables are to be analyzed. If this statement is omitted, the default is to include all variables of the appropriate type.

BY <variable_name>;

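Required by the SORT procedure, where it names the sort keys. In other procedures it is optional and requests a separate analysis for each BY group; the dataset must already be sorted by those variables.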
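
A short sketch of VAR and BY working together, assuming a CDROM dataset that still contains MFG (unlike the earlier example, nothing is dropped here). The data is sorted by MFG first so that PROC MEANS can report each manufacturer separately.

```sas
DATA CDROM;
   INPUT MFG $ TYPE $ SEEK TRANSFER;
   DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
SONY 4X 40.1 330
CANON 6X 13.5 530
SONY 12X 5.5 1000
;
RUN;

* BY is required here: it names the sort key;
PROC SORT DATA=CDROM;
   BY MFG;
RUN;

* VAR limits the analysis to two variables and BY splits it by manufacturer;
PROC MEANS DATA=CDROM;
   VAR SEEK TRANSFER;
   BY MFG;
RUN;
```

Without the PROC SORT step, the BY statement in PROC MEANS would fail because the observations are not in MFG order.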

You should be familiar with the following procedures:

Correlations among a set of variables:

PROC CORR [options];

[VAR <variable_name>;]

Means, standard deviations, and other univariate statistics (N; MEAN; STD; MIN; MAX; SUM; VAR):

PROC MEANS [options];

Univariate statistics, that is, means, standard deviations, medians, etc. Also provides options to generate a p-value for a normality test and to produce box, stem-and-leaf and normal plots:

PROC UNIVARIATE [options];

[VAR <variable_name>...;]

Print a SAS data set:

PROC PRINT [options];

[VAR <variable_name>...;]

Sort a SAS data set according to one or more variables:

PROC SORT [options];

BY <variable_name>...;

Plot y against x. May be used to create a scatter plot or a residual plot:

PROC PLOT [options];

PLOT <dep_var_name>*<indep_var_name>='*' [options];
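
The procedures above can be strung together in one program. The sketch below assumes the CDROM data from earlier in this post; the choice of SEEK and TRANSFER as the analysis variables, and of TRANSFER as the dependent variable in the plot, is only for illustration.

```sas
DATA CDROM;
   INPUT MFG $ TYPE $ SEEK TRANSFER;
   DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
SONY 4X 40.1 330
CANON 6X 13.5 530
SONY 12X 5.5 1000
;
RUN;

* List the dataset;
PROC PRINT DATA=CDROM;
RUN;

* Correlation between the two numeric variables;
PROC CORR DATA=CDROM;
   VAR SEEK TRANSFER;
RUN;

* NORMAL requests normality tests and PLOT requests the plots;
PROC UNIVARIATE DATA=CDROM NORMAL PLOT;
   VAR SEEK;
RUN;

* Scatter plot of TRANSFER against SEEK using * as the plotting symbol;
PROC PLOT DATA=CDROM;
   PLOT TRANSFER*SEEK='*';
RUN;
QUIT;
```

Each step names its input with DATA=CDROM, so the program does not rely on CDROM being the most recently created dataset.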

Monday, 16 March 2015
