- Basic Rules:
- All SAS programs consist of a sequence of statements
organized into "steps". There are only two kinds of steps:
- DATA step:
- PROC step:
Note: Once a dataset has been created, it can be processed by any subsequent "DATA" or "PROC" step.
- A SAS program can contain any number of
"DATA" and "PROC" steps.
- Most SAS statements start with a keyword (DATA, INPUT,
PROC, etc.). Assignment statements such as Z=X+Y; are an exception.
- All SAS statements end with a semicolon ";".
- SAS statements can be entered in free-format. That is,
they can begin in any column; there may be multiple statements per line;
you may split a statement over several lines (as long as no word is
split).
- Uppercase and lowercase are equivalent, except inside
quote marks (lang = 'c'; is not the same as lang = 'C';).
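The rules above can be seen together in a short program. This is a minimal sketch only; the dataset name DEMO and the variables LANG, SCORE and FLAG are made up for illustration.

```sas
* One DATA step followed by one PROC step;
data demo;                         /* keywords may be typed in lowercase */
   input lang $ score;
   /* Text inside quote marks is case-sensitive:
      only the observation with lowercase c gets FLAG = 1 */
   if lang = 'c' then flag = 1;
   else flag = 0;
   datalines;
c 10
C 20
;
run;

PROC PRINT DATA=demo; RUN;         /* two statements on one line */

proc print
   data=demo;                      /* one statement split over two lines */
run;
```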
- Naming Convention:
- up to 32 characters in length (Note: v6.12 users: 8 characters in length)
- begin with A-Z or _ (i.e. underscore)
- cannot contain blanks or special symbols (e.g., &, %, $, #, etc.)
- SAS Types:
- Character variables (name followed by $ in the INPUT statement)
- Numeric variables
Note: Missing data are represented by '.' for numeric
variables and by ' ' (i.e. a space) for character variables.
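As a small illustration of the two types and of a missing numeric value, the sketch below reads one character and one numeric variable. The names TYPES_DEMO, NAME, AGE and AGE_KNOWN are hypothetical.

```sas
data types_demo;
   input name $ age;               /* NAME is character ($), AGE is numeric */
   /* A numeric missing value is written as a period */
   if age = . then age_known = 'no ';
   else age_known = 'yes';
   datalines;
Ann 34
Bob .
;
run;

proc print data=types_demo;
run;
```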
- General statements:
These are statements that do not belong to a particular DATA or PROC step. They have a global
effect.
- footnote statement
- title statement
- options statement
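A possible use of the three global statements is sketched below. The title and footnote text, the particular options chosen, and the dataset CDROM_SMALL are assumptions for illustration only.

```sas
/* Global statements: they stay in effect until changed or cleared */
options nodate nonumber linesize=80;
title 'CD-ROM drives';
footnote 'Example data only';

data cdrom_small;
   input mfg $ seek;
   datalines;
NEC 7.3
SONY 23.1
;
run;

proc print data=cdrom_small;       /* output carries the title and footnote */
run;

title;                             /* a TITLE statement with no text clears the title */
footnote;                          /* same for the footnote */
```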
- DATA step statements:
SAS carries out all statements in the DATA step, in order, for each input observation.
- The "DATA" statement identifies the start of a DATA step and names the data structure to be created.
DATA <dataset_name>;
- The "INFILE" statement identifies an
external file from which data will be read.
INFILE <file_name>;
- The "DATALINES" statement is used if you
choose to embed data in your program instead of reading it from an
external file.
DATALINES;
- The "INPUT" statement defines variables and their types, and specifies how data are to be read:
INPUT <variable_name [type] [position]>;
Example:
i) INPUT MFG @@;
ii) INPUT MFG $ TYPE $ SEEK TRANSFER;
iii) INPUT MFG $ 1-8 TYPE $ 11-12 SEEK 13-16
TRANSFER 17-19;
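The three INPUT styles can be tried with embedded data. This is a sketch under assumptions: the dataset names are invented, and the column ranges in the second step were chosen to fit the sample lines rather than taken from example iii).

```sas
/* List input: values separated by blanks; $ marks character variables */
data list_in;
   input mfg $ type $ seek transfer;
   datalines;
NEC 12X 7.3 105
SONY 6X 23.1 830
;
run;

/* Column input: each variable is read from fixed columns */
data column_in;
   input mfg $ 1-8 type $ 11-13 seek 15-18 transfer 20-23;
   datalines;
NEC       12X  7.3  105
SONY      6X  23.1  830
;
run;

/* Trailing @@ holds the line, so several observations can share one line */
data held_in;
   input mfg $ @@;
   datalines;
NEC SONY CANON
;
run;
```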
- Variables may be given descriptive labels:
LABEL <variable_name='label'>...;
Example:
i) LABEL MFG='Manufacturer';
ii) LABEL MFG='Manufacturer'
SEEK='Seek Time';
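Labels only show up where a procedure is asked to use them. The sketch below assumes a small dataset named CDROM and uses the LABEL option of PROC PRINT so that the labels replace the variable names as column headings.

```sas
data cdrom;
   input mfg $ seek;
   label mfg  = 'Manufacturer'
         seek = 'Seek Time';
   datalines;
NEC 7.3
SONY 23.1
;
run;

proc print data=cdrom label;       /* LABEL option: print labels as headings */
run;
```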
- Assignment statements create new variables. The usual
arithmetic operations are available:
Symbol Operation Example
** Exponentiation Z=X**2;
* Multiplication Z=X*Y;
/ Division Z=X/Y;
+ Addition Z=X+Y;
- Subtraction Z=X-Y;
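A quick way to check the operators is to apply them to one observation. The dataset CALC and the variable names below are illustrative only.

```sas
data calc;
   input x y;
   sq    = x**2;                   /* exponentiation:  4**2 = 16 */
   prod  = x*y;                    /* multiplication:  4*2  = 8  */
   ratio = x/y;                    /* division:        4/2  = 2  */
   total = x+y;                    /* addition:        4+2  = 6  */
   diff  = x-y;                    /* subtraction:     4-2  = 2  */
   datalines;
4 2
;
run;

proc print data=calc;
run;
```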
- The "DROP" and "KEEP" statements
are used to remove variables and all associated values from a dataset:
DROP <variable_name>...;
removes named variables from the dataset and keeps unnamed
variables.
KEEP <variable_name>...;
keeps named variables and drops unnamed variables from the
dataset.
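The two statements are mirror images of each other, as the following sketch shows. The dataset names NO_IDS and IDS_ONLY are hypothetical.

```sas
data no_ids;
   input mfg $ type $ seek transfer;
   drop mfg type;                  /* dataset keeps SEEK and TRANSFER */
   datalines;
NEC 12X 7.3 105
;
run;

data ids_only;
   input mfg $ type $ seek transfer;
   keep mfg type;                  /* dataset keeps MFG and TYPE only */
   datalines;
NEC 12X 7.3 105
;
run;
```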
- The "IF" statement is used for conditional
processing:
IF <expression> THEN <statement>;
ELSE <statement>;
Note: The ELSE statement is optional. The IF ... THEN parts comprise a single statement:
i) IF SEEK < 15 THEN CLASS = 'FAST';
ELSE CLASS = 'SLOW';
ii) CLASS='SLOW';
IF SEEK < 15 THEN CLASS = 'FAST';
SAS comparison operators
are shown below. You can use either the symbol or the two-letter abbreviation.
Symbol Abbrev
<, <= LT, LE
>, >= GT, GE
=, ^= EQ, NE
A special form of the
"IF" statement is used for subsetting a dataset, that is
selecting/excluding particular observations.
DATA CDROM;
INPUT MFG $ TYPE $ SEEK TRANSFER;
IF SEEK < 15;
The statement IF SEEK < 15; is equivalent to either of:
i) IF SEEK < 15 THEN OUTPUT;
ii) IF SEEK >=15 THEN DELETE;
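A runnable version of the subsetting step might look like the sketch below. The data lines are the CDROM values used in these notes; the dataset name FAST_DRIVES is an assumption.

```sas
data fast_drives;
   input mfg $ type $ seek transfer;
   if seek < 15;                   /* subsetting IF: keep only these observations */
   /* Alternative forms of the same filter:
        if seek < 15 then output;
        if seek >= 15 then delete;   */
   datalines;
NEC 12X 7.3 105
SONY 6X 23.1 830
SONY 4X 40.1 330
CANON 6X 13.5 530
SONY 12X 5.5 1000
;
run;

proc print data=fast_drives;       /* lists the NEC, CANON and SONY 12X drives */
run;
```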
- Comments:
Two types of comments:
i) * ... ;
ii) /* ... */
DATA CDROM;
* Read in variables;
INPUT MFG $ TYPE $
TRANSFER SEEK;
/* ignore next statement
SEEKMIN = SEEK/60000;
*/
The example below reads CDROM data and creates additional variables:
DATA CDROM;
INPUT MFG $ TYPE $ SEEK
TRANSFER;
IF SEEK < 15 THEN
CLASS='FAST';
ELSE CLASS='SLOW';
DROP MFG TYPE;
DATALINES;
NEC 12X 7.3 105
SONY 6X 23.1 830
SONY 4X 40.1 330
CANON 6X 13.5 530
SONY 12X 5.5 1000
;
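Because the IF ... THEN/ELSE statement only assigns CLASS and does not subset, the resulting dataset will contain all five observations, without MFG and TYPE, and will look like:
SEEK TRANSFER CLASS
7.3 105 FAST
23.1 830 SLOW
40.1 330 SLOW
13.5 530 FAST
5.5 1000 FAST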
- PROC step statements:
PROC steps execute
predefined procedures which may be either statistical or utility procedures.
The dataset processed is the most recently created dataset unless
otherwise specified in a "DATA=" option.
PROC <procedure_name>;
[procedure_statement];
"procedure_statement" typically depends on the procedure but some may be used with many procedures:
VAR <variable_name>;
Indicates which variables are to be analyzed. If this statement is omitted, the default is to include all variables of the appropriate type.
BY <variable_name>;
Used primarily with the SORT procedure in which case it cannot be omitted.
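The effect of the "DATA=" option and the VAR statement can be sketched as follows. The second dataset, OTHER, is hypothetical and exists only to show that the most recently created dataset is not always the one you want.

```sas
data cdrom;
   input mfg $ type $ seek transfer;
   datalines;
NEC 12X 7.3 105
SONY 6X 23.1 830
CANON 6X 13.5 530
;
run;

data other;                        /* now the most recently created dataset */
   input x;
   datalines;
1
2
;
run;

proc means data=cdrom;             /* DATA= picks CDROM instead of OTHER */
   var seek;                       /* analyze SEEK only, not every numeric variable */
run;
```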
You should be familiar
with the following procedures:
- Correlations among a set of variables:
PROC CORR [options];
[VAR <variable_name>;]
- Means, standard deviations, and other univariate
statistics (N; MEAN; STD; MIN; MAX; SUM; VAR):
PROC MEANS [options];
- Univariate statistics, that is, means, standard
deviations, medians, etc. Also provides options to generate a p-value for a
normality test and to produce box, stem-and-leaf, and normal probability
plots:
PROC UNIVARIATE [options];
[VAR <variable_name>...;]
- Print a SAS data set:
PROC PRINT [options];
[VAR <variable_name>...;]
- Sort a SAS data set according to one or more
variables:
PROC SORT [options];
BY <variable_name>...;
- Plot y against x. May be used to create a scatter plot
or a residual plot:
PROC PLOT [options];
PLOT <dep_var_name>*<indep_var_name>='*' [/ options];
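The procedures listed above can be strung together on the CDROM data. This is a sketch rather than a recommended analysis: the choice of options (N MEAN STD MIN MAX, NORMAL, PLOT) and of variables is an assumption made for illustration.

```sas
data cdrom;
   input mfg $ type $ seek transfer;
   datalines;
NEC 12X 7.3 105
SONY 6X 23.1 830
SONY 4X 40.1 330
CANON 6X 13.5 530
SONY 12X 5.5 1000
;
run;

proc sort data=cdrom;              /* BY is required for PROC SORT */
   by mfg seek;
run;

proc print data=cdrom;
   var mfg seek transfer;
run;

proc means data=cdrom n mean std min max;
   var seek transfer;
run;

proc corr data=cdrom;
   var seek transfer;
run;

proc univariate data=cdrom normal plot;   /* normality test and plots */
   var seek;
run;

proc plot data=cdrom;
   plot transfer*seek='*';         /* TRANSFER on the y axis, SEEK on the x axis */
run;
quit;
```

With only five observations the statistics are not meaningful; the point is the order of the steps and the placement of the VAR and BY statements.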