WESTERN MICHIGAN UNIVERSITY
COMPUTER CENTER
LIBRARY PROGRAM #1.3.2
CALLING NAME: STEPR
PREPARED BY: RUSSELL R. BARR III & SAM ANEMA
STATISTICAL CONSULTANT: MICHAEL R. STOLINE
PROGRAMMED BY: *
APPROVED BY: JACK R. MEAGHER
DATE: FEBRUARY 1977 (VERSION 4)
STEPWISE REGRESSION
1.0 PURPOSE
LET THERE BE GIVEN M OBSERVATIONS ON N VARIABLES X(1),...,X(N). THE USER MAY
OBTAIN CORRELATIONS, REGRESSIONS, AND STEPWISE REGRESSIONS ON THIS DATA. PRO-
VISIONS ARE AVAILABLE FOR HANDLING MISSING DATA. TRANSFORMATIONS OF VARIABLES
ARE POSSIBLE PRIOR TO THE REGRESSION ANALYSIS SO THAT THIS PROGRAM COULD BE
USED FOR CURVILINEAR OR CERTAIN NON-LINEAR STEPWISE REGRESSIONS.
MISSING DATA AND TRANSFORMATION OF VARIABLES ARE JUST 2 OF 13 OPTIONS AVAIL-
ABLE FOR THE USER. THESE OPTIONS ARE DESCRIBED IN DETAIL IN SECTION 3.0.
FOR ANY SINGLE PROGRAM THE USER MAY ELECT NONE, SOME, OR ALL OF THESE OPTIONS.
EACH OF THE 13 OPTIONS IS IDENTIFIED WITH A UNIQUE 3-LETTER CODE. THE USER
RESPONDS TO THE COMPUTER PROMPT ENTER OPTIONS OR TYPE "HELP" BY:
(I) PUSHING THE RETURN KEY IF NO OPTIONS ARE WANTED (SEE SECTION 2.0
FOR MORE DETAILS ON THIS), OR
(II) ENTERING A 3-LETTER CODE FOR EACH OPTION DESIRED. THE CODES ARE
SEPARATED BY COMMAS AND NO PARTICULAR ORDER IS REQUIRED.
TABLE 1
OPTION DESCRIPTION AND CODE TABLE
CODE DESCRIPTION
MIS* MISSING DATA
TRA* TRANSFORMATION OF VARIABLES
MVS* MEANS, STANDARD DEVIATIONS, AND VARIANCES
XPR RAW SUMS OF SQUARES AND CROSS PRODUCTS
COV VARIANCE - COVARIANCE MATRIX
COR CORRELATION, FISHER Z, AND T MATRICES
FVL F-VALUES
FOR FORCE VARIABLES
ELM ELIMINATE VARIABLES
RES RESIDUALS
ANA ANALYSIS OF VARIANCE AT FINAL STAGE
DUR DURBIN-WATSON STATISTICS
ZER FORCE REGRESSION THROUGH ZERO
*READ PERTINENT REMARKS IN SECTION 3.0 BEFORE USING.
---------------
*THIS WAS PROGRAMMED BY SAM ANEMA PATTERNED AFTER A PROGRAM GIVEN BY WAYNE
STATE UNIVERSITY.
2.0 AUTOMATIC STEPWISE REGRESSION
IF NO OPTIONS ARE SPECIFIED, THE USER AUTOMATICALLY OBTAINS A GENERAL STEPWISE
REGRESSION IN WHICH THE INDEPENDENT VARIABLES ARE ENTERED INTO THE REGRESSION
EQUATION ONE AT A TIME UNTIL ALL N-1 INDEPENDENT VARIABLES ARE IN THE
REGRESSION EQUATION OF THE LAST STAGE.
THE USER AUTOMATICALLY OBTAINS THE INTERMEDIATE EQUATIONS AT EACH STAGE OF THE
STEPWISE PROCEDURE:
Y = B(0) + B(1) X(1)
Y = B(0)' + B(1)' X(1) + B(2) X(2)
Y = B(0)" + B(1)" X(1) + B(2)' X(2) + B(3) X(3)
ETC.
THE VARIABLE NUMBERS 1,2,...,N ARE NOT NECESSARILY INCLUDED IN THE STEPWISE
REGRESSION IN THE ORDER 1,2,...,N. RATHER, THE NEXT VARIABLE TO BE SELECTED
FOR INCLUSION WILL BE THAT VARIABLE FOUND TO CONTRIBUTE THE MOST TO THE IN-
CREASE IN THE COEFFICIENT OF DETERMINATION (MULTIPLE R*R), WHICH IS THE PRO-
PORTION OF VARIANCE OF THE DEPENDENT VARIABLE ACCOUNTED FOR BY THE INDEPENDENT
VARIABLES INCLUDED IN THE REGRESSION EQUATION.
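THE SELECTION RULE ABOVE CAN BE SKETCHED IN PYTHON AS FOLLOWS. THIS IS A
HYPOTHETICAL ILLUSTRATION, NOT THE STEPR SOURCE: THE NAMES solve, fit, AND
forward_stepwise ARE INVENTED HERE, AND EACH CANDIDATE EQUATION IS SIMPLY REFIT
FROM SCRATCH BY LEAST SQUARES INSTEAD OF BEING UPDATED FROM STAGE TO STAGE. THE
DATA ARE THE SAMPLE RUN OF SECTION 6.0 AFTER THE ZERO (MISSING) VALUES OF
VARIABLES 1 AND 2 HAVE BEEN REPLACED BY THEIR MEANS, WITH VARIABLE 4 ELIMINATED
AND VARIABLE 5 = E**(VARIABLE 4).

```python
import math

# Hypothetical sketch of forward stepwise selection: at each stage the
# candidate giving the largest R*R (largest increase in R*R) enters.
# Every candidate equation is refit from scratch; STEPR does not work this way.


def solve(a, g):
    """Solve the linear system a*b = g by Gaussian elimination with pivoting."""
    n = len(g)
    m = [row[:] + [g[i]] for i, row in enumerate(a)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[piv] = m[piv], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    b = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(m[k][c] * b[c] for c in range(k + 1, n))
        b[k] = (m[k][n] - s) / m[k][k]
    return b


def fit(cols, y):
    """Least squares fit of y on the given columns plus a constant term.
    Returns (list of B(0), B(1), ..., R*R)."""
    rows = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    g = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    b = solve(a, g)
    ybar = sum(y) / len(y)
    sse = sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
              for r, yi in zip(rows, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return b, 1.0 - sse / sst


def forward_stepwise(xs, y):
    """xs maps variable number -> column of values.
    Returns one (variable entering, R*R, increase in R*R) tuple per stage."""
    chosen, stages, r2_prev = [], [], 0.0
    remaining = sorted(xs)
    while remaining:
        best = max(remaining,
                   key=lambda v: fit([xs[c] for c in chosen + [v]], y)[1])
        chosen.append(best)
        remaining.remove(best)
        r2 = fit([xs[c] for c in chosen], y)[1]
        stages.append((best, r2, r2 - r2_prev))
        r2_prev = r2
    return stages


# Section 6.0 sample data after mean replacement of the zeros in variables
# 1 and 2; variable 4 is eliminated and variable 5 = E**(variable 4).
y = [1, 20 / 6, 1, 7, 1, 1, 9]
x2 = [3, 1, 4, 1, 1, 17 / 6, 7]
x3 = [2, 3, 2, 4, 2, 5, 8]
x5 = [math.exp(v) for v in [4, 2, 3, 5, 1, 5, 7]]

for stage in forward_stepwise({2: x2, 3: x3, 5: x5}, y):
    print(stage)
```

ON THIS DATA THE SKETCH SHOULD ENTER VARIABLE 5 FIRST AND VARIABLE 2 SECOND,
WITH R*R CLOSE TO THE 0.61488 AND 0.79953 PRINTED IN SECTION 6.0.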
AT EACH STAGE IN THE STEPWISE REGRESSION THE FOLLOWING ITEMS ARE PRINTED:
(I) VARIABLE NUMBER ENTERING
(II) PARTIAL REGRESSION COEFFICIENTS AND THEIR STANDARD ERRORS
(III) STANDARD ERROR OF ESTIMATE
(IV) CONSTANT TERM
(V) MULTIPLE R AND COEFFICIENT OF DETERMINATION = R*R
(VI) INCREASE IN COEFFICIENT OF MULTIPLE DETERMINATION OVER THE PREVIOUS
STAGE
(VII) F-LEVEL. AT EACH STAGE THE USER CAN TEST THE HYPOTHESIS:
H(0) : VARIABLE JUST ADDED IS NOT SIGNIFICANT
H(1) : VARIABLE JUST ADDED IS SIGNIFICANT
THIS IS DONE BY COMPARING THE PRINTED F-LEVEL TO A TABLED F POINT
FOR 1 AND M-K-1 DEGREES OF FREEDOM, WHERE
M = NUMBER OF OBSERVATIONS
K = NUMBER OF VARIABLES INCLUDED IN THE REGRESSION AT THIS
PARTICULAR STAGE
H(0) IS REJECTED IN FAVOR OF H(1) IF THE F-LEVEL IS GREATER THAN THE TABLED F.
TABLE 2 GIVES THE ALPHA = .05 AND ALPHA = .01 CRITICAL VALUES FOR AN F DISTRIBUTION
WITH 1 AND M-K-1 DEGREES OF FREEDOM. THE COLUMN HEADED N GIVES THE SECOND
(DENOMINATOR) DEGREES OF FREEDOM, N = M-K-1.
TABLE 2
N ALPHA =.05 ALPHA =.01
2 18.5 98.5
3 10.1 34.1
4 7.71 21.2
5 6.61 16.3
6 5.99 13.7
7 5.59 12.2
8 5.32 11.3
9 5.12 10.6
10 4.96 10.0
11 4.84 9.65
12 4.75 9.33
15 4.54 8.68
20 4.35 8.10
24 4.26 7.82
30 4.17 7.56
40 4.08 7.31
60 4.00 7.08
120 3.92 6.85
INFINITY 3.84 6.64
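THE F-LEVEL FOR THE VARIABLE JUST ADDED CAN BE COMPUTED FROM THE COEFFICIENTS
OF DETERMINATION BEFORE AND AFTER IT ENTERS:
F = (INCREASE IN R*R) / ((1 - NEW R*R)/(M-K-1)). THE PYTHON SKETCH BELOW (THE
NAME f_level AND THE DICTIONARY F05 ARE HYPOTHETICAL) APPLIES THIS TO STAGES 1
AND 2 OF THE SAMPLE RUN IN SECTION 6.0 AND COMPARES THE RESULTS WITH THE
ALPHA = .05 COLUMN OF TABLE 2.

```python
# Hypothetical helper: F-level of the variable just added, computed from the
# coefficients of determination before (r2_old) and after (r2_new) it entered.


def f_level(r2_new, r2_old, m, k):
    """m = number of observations, k = variables in the equation at this stage."""
    df = m - k - 1  # second (denominator) degrees of freedom
    return (r2_new - r2_old) / ((1.0 - r2_new) / df)


# ALPHA = .05 critical values of F(1, N) copied from TABLE 2, N = 2..12.
F05 = {2: 18.5, 3: 10.1, 4: 7.71, 5: 6.61, 6: 5.99, 7: 5.59,
       8: 5.32, 9: 5.12, 10: 4.96, 11: 4.84, 12: 4.75}

# Stages 1 and 2 of the sample run in section 6.0 (M = 7 observations).
f1 = f_level(0.61488, 0.0, m=7, k=1)
f2 = f_level(0.79953, 0.61488, m=7, k=2)
print(f"stage 1: F = {f1:.4f}, significant at .05: {f1 > F05[7 - 1 - 1]}")
print(f"stage 2: F = {f2:.4f}, significant at .05: {f2 > F05[7 - 2 - 1]}")
```

WITH THE PRINTED VALUES THE SKETCH REPRODUCES THE F-LEVELS 7.98 AND 3.68 TO
WITHIN ROUNDING. THE FIRST EXCEEDS F(1,5) = 6.61 BUT THE SECOND DOES NOT EXCEED
F(1,4) = 7.71, WHICH AGREES WITH THE PRINTED F-PROB VALUES OF 0.037 AND 0.127.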
THE USER MAY ELECT TO HAVE THE COMPUTER ADD OR DELETE VARIABLES AT EACH STAGE
ACCORDING TO USER-SPECIFIED F-VALUES BY USE OF THE OPTION FVL. ALSO, THE FOR
AND ELM OPTIONS FORCE CERTAIN SPECIFIED VARIABLES INTO, OR ELIMINATE THEM FROM,
THE REGRESSION EQUATIONS. THESE OPTIONS ARE EXPLAINED IN SECTION 3.0.
AT THE FINAL STAGE OF THE STEPWISE PROCEDURE THE FOLLOWING ENTRIES ARE PRINTED
IN ADDITION TO ITEMS (I) - (VII) LISTED ABOVE:
(A) THE STANDARDIZED PARTIAL REGRESSION COEFFICIENTS.
THE B(I) (UNSTANDARDIZED PARTIAL REGRESSION COEFFICIENTS) ARE RELATED
TO BETA(I) (STANDARDIZED PARTIAL REGRESSION COEFFICIENTS) BY THE FORMULA
B(I) =(BETA(I)*SD(Y))/SD(X(I)) , WHERE
SD(X(I)) AND SD(Y) ARE THE STANDARD DEVIATIONS OF X(I) AND Y RESPEC-
TIVELY. Y IS THE DEPENDENT VARIABLE.
IN THE OUTPUT, THE HEADING COEFFICIENT REFERS TO THE UNSTANDARDIZED COEFFICIENTS.
(B) T-VALUES, WHICH ARE USED FOR TESTING THE SIGNIFICANCE OF THE PARTIAL
REGRESSION COEFFICIENTS. EACH T-VALUE IS OBTAINED BY DIVIDING A COEFFICIENT
BY ITS STANDARD ERROR. (FOR TESTING, REFER TO A T-TABLE WITH
M-K-1 DEGREES OF FREEDOM, WHERE K = NUMBER OF VARIABLES INCLUDED
IN THE FINAL REGRESSION EQUATION.)
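AS A CHECK ON FORMULAS (A) AND (B), THE PYTHON SKETCH BELOW (HYPOTHETICAL
HELPER NAMES) RECOMPUTES THE STANDARDIZED COEFFICIENT AND T-VALUE OF X(2) FROM
THE FINAL STAGE OF THE SAMPLE RUN IN SECTION 6.0, USING THE STANDARD DEVIATIONS
PRINTED BY THE MVS OPTION.

```python
# Hypothetical helpers for the final-stage quantities (A) and (B).


def standardized(b, sd_x, sd_y):
    """BETA(I) = B(I)*SD(X(I))/SD(Y), i.e. formula (A) solved for BETA(I)."""
    return b * sd_x / sd_y


def t_value(b, se_b):
    """T-value = coefficient divided by its standard error, as in (B)."""
    return b / se_b


# X(2) at the final stage of the sample run in section 6.0.
beta2 = standardized(-1.17425, sd_x=2.192158, sd_y=3.349959)
t2 = t_value(-1.17425, se_b=0.611774)
print(f"BETA(2) = {beta2:.6f}, T = {t2:.3f}")
```

THE RESULTS AGREE WITH THE PRINTED STANDARDIZED COEFFICIENT (-0.768411) AND
T-VALUE (-1.919) TO WITHIN ROUNDING.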
3.0 DETAILED OPTION DESCRIPTIONS
OPTION 1. MIS - MISSING DATA
THE USER ELECTS THIS OPTION WHENEVER AT LEAST ONE DATA POINT IS MISSING
FROM THE DATA SUBMITTED. WHEN THE OPTION IS USED, THE USER MUST MAKE
TWO DECISIONS:
(I) WHETHER TO USE A SINGLE MISSING DATA SYMBOL FOR ALL VARIABLES
OR TO USE A SEPARATE MISSING DATA SYMBOL FOR EACH OF THE N
VARIABLES, AND
(II) WHETHER TO INSTRUCT THE COMPUTER TO REPLACE MISSING VALUES
BY THE MEAN OF THE NON-MISSING DATA POINTS FOR THAT VARIABLE
OR TO DELETE THAT PARTICULAR OBSERVATION FROM ALL REGRESSION
CALCULATIONS.
RULES FOR MISSING DATA SYMBOLS:
(I) A MISSING DATA SYMBOL MUST BE AN INTEGER OR A DECIMAL NUMBER.
(II) A LETTER CANNOT BE A MISSING DATA SYMBOL.
(III) THE NUMBER USED FOR A MISSING DATA SYMBOL MUST NOT BE EQUAL TO
ANY VALID INPUT DATA POINT SUBMITTED.
(IV) WHEN A SEPARATE MISSING DATA SYMBOL IS USED FOR EACH VARIABLE,
THEN
(A) THEY ARE SEPARATED BY COMMAS.
(B) THERE MUST BE EXACTLY AS MANY MISSING DATA SYMBOLS AS
THERE ARE VARIABLES, EVEN THOUGH SOME VARIABLES DO NOT
CONTAIN MISSING DATA.
(C) IF THERE ARE MORE THAN 10 MISSING DATA SYMBOLS, THEY MUST
BE ENTERED AT THE RATE OF 10 PER LINE.
WE WILL ILLUSTRATE THE USE OF THIS OPTION WITH TWO EXAMPLES.
FOR EXAMPLE 1 AND 2, WE ARE SHOWING ONLY THOSE INSTRUCTIONS, QUESTIONS, AND
RESPONSES WE WANT TO EMPHASIZE.
EXAMPLE 1
HOW MANY VARIABLES:
3
ENTER OPTIONS OR TYPE "HELP"
MIS
IS THERE MORE THAN ONE MISSING DATA SYMBOL?
NO
ENTER MISSING DATA SYMBOL
1
(1 IS THE MISSING DATA SYMBOL FOR VARIABLES 1, 2, AND 3.)
EXAMPLE 2
IS THERE MORE THAN ONE MISSING DATA SYMBOL?
YES
ENTER MISSING DATA SYMBOLS
9,8,9
(9 IS THE MISSING DATA SYMBOL FOR VARIABLES 1 AND 3; 8 IS
THE MISSING DATA SYMBOL FOR VARIABLE 2.)
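THE TWO WAYS OF COMPENSATING FOR MISSING DATA CAN BE SKETCHED IN PYTHON AS
FOLLOWS. THE ROW-BY-ROW DATA LAYOUT AND THE FUNCTION NAMES ARE ASSUMPTIONS FOR
ILLUSTRATION, NOT THE STEPR INTERNALS. THE DATA ARE THE SEVEN OBSERVATIONS OF
SECTION 6.0, WHERE 0 IS THE SINGLE MISSING DATA SYMBOL.

```python
# Hypothetical sketch of the two missing-data treatments (not STEPR source).
# rows is a list of observations; symbols[j] is the missing data symbol
# for variable j+1 (repeat one symbol n times for the single-symbol case).


def replace_by_mean(rows, symbols):
    """Replace each missing value by the mean of the non-missing values of
    that variable. Assumes every variable has at least one valid value."""
    n = len(symbols)
    means = []
    for j in range(n):
        valid = [r[j] for r in rows if r[j] != symbols[j]]
        means.append(sum(valid) / len(valid))
    return [[means[j] if r[j] == symbols[j] else r[j] for j in range(n)]
            for r in rows]


def delete_observations(rows, symbols):
    """Drop every observation that contains a missing value."""
    return [r for r in rows if all(v != s for v, s in zip(r, symbols))]


# Sample data from section 6.0; 0 is the missing data symbol for all 4 variables.
data = [[1, 3, 2, 4], [0, 1, 3, 2], [1, 4, 2, 3], [7, 1, 4, 5],
        [1, 1, 2, 1], [1, 0, 5, 5], [9, 7, 8, 7]]
filled = replace_by_mean(data, [0, 0, 0, 0])
kept = delete_observations(data, [0, 0, 0, 0])
print(filled[1], filled[5], len(kept))
```

WITH MEAN REPLACEMENT, VARIABLE 1 HAS MEAN 20/6 = 3.333333, AS PRINTED IN
SECTION 6.0. WITH DELETION, THE TWO OBSERVATIONS CONTAINING A 0 ARE DROPPED,
LEAVING FIVE.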
OPTION 2. TRA - TRANSFORMATION OF VARIABLES
THIS OPTION ALLOWS ONE TO MAKE TRANSFORMATIONS OF VARIABLES PRIOR TO THE
REGRESSION ANALYSIS.
TABLE 3 CONTAINS 15 POSSIBLE TRANSFORMATIONS WHICH CAN BE USED. RULES FOR
USING TRANSFORMATIONS ARE:
(I) TRANSFORMATIONS WILL BE PERFORMED IN THE ORDER THAT THE USER
SUBMITS THEM.
(II) NO MORE THAN 40 TRANSFORMATIONS ARE ALLOWED.
(III) A VARIABLE MAY BE TRANSFORMED MORE THAN ONCE.
(IV) NEW VARIABLES MAY BE GENERATED.
(V) AFTER ALL THE TRANSFORMATIONS HAVE BEEN GIVEN, THE USER MUST
TYPE END AND ENTER RETURN.
(VI) ALL TRANSFORMATIONS MUST BE OF ONE OF THE FORMS IN TABLE 3.
THREE ILLUSTRATIVE EXAMPLES OF THE TRANSFORMATION OPTION ARE GIVEN BELOW.
NOTE THAT A**B MEANS A RAISED TO THE POWER B.
TABLE 3
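1.  (I) = -(J)        VARIABLE I BECOMES (-VARIABLE J)
2.  (I) = (J)         VARIABLE I = VARIABLE J
3.  (I) = (J)**A      VARIABLE I = (VARIABLE J)**A
4.  (I) = (J)+(K)     VARIABLE I = (VARIABLE J)+(VARIABLE K)
5.  (I) = (J)*(K)     VARIABLE I = (VARIABLE J)*(VARIABLE K)
6.  (I) = LN(J)       VARIABLE I = NATURAL LOGARITHM OF VARIABLE J
7.  (I) = LOG(J)      VARIABLE I = BASE-10 LOGARITHM OF VARIABLE J
8.  (I) = E**(J)      VARIABLE I = E RAISED TO THE POWER (VARIABLE J)
9.  (I) = 10**(J)     VARIABLE I = 10 RAISED TO THE POWER (VARIABLE J)
10. (I) = (J)+A       VARIABLE I = (VARIABLE J)+A
11. (I) = (J)*A       VARIABLE I = A(VARIABLE J)
12. (I) = SIN(J)      VARIABLE I = SINE OF VARIABLE J
13. (I) = COS(J)      VARIABLE I = COSINE OF VARIABLE J
14. (I) = (J)/(K)     VARIABLE I = (VARIABLE J)/(VARIABLE K)
15. (I) = (I)-(J)     VARIABLE I = (VARIABLE I)-(VARIABLE J)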
NOTATION: (I), (J), (K) DENOTE VARIABLES. A DENOTES A CONSTANT WHICH THE
USER SPECIFIES.
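ONE WAY TO PICTURE THE RULES ABOVE IS A SMALL INTERPRETER THAT APPLIES TABLE 3
FORMS IN THE ORDER GIVEN. IN THIS PYTHON SKETCH EACH TRANSFORMATION IS WRITTEN
AS A HYPOTHETICAL TUPLE (FORM NUMBER, I, J, K, A) RATHER THAN IN THE (I)=...
SYNTAX THE PROGRAM ACTUALLY READS. LOG IS ASSUMED TO BE THE BASE-10 LOGARITHM,
AND SIN AND COS ARE ASSUMED TO TAKE ARGUMENTS IN RADIANS.

```python
import math

# Hypothetical encoding: each transformation is a tuple (form, i, j, k, a),
# where form is the TABLE 3 item number; unused fields are None.
# Assumptions: LOG is base 10; SIN and COS take radians (Python's math module).
FORMS = {
    1: lambda xi, xj, xk, a: -xj,
    2: lambda xi, xj, xk, a: xj,
    3: lambda xi, xj, xk, a: xj ** a,
    4: lambda xi, xj, xk, a: xj + xk,
    5: lambda xi, xj, xk, a: xj * xk,
    6: lambda xi, xj, xk, a: math.log(xj),
    7: lambda xi, xj, xk, a: math.log10(xj),
    8: lambda xi, xj, xk, a: math.exp(xj),
    9: lambda xi, xj, xk, a: 10.0 ** xj,
    10: lambda xi, xj, xk, a: xj + a,
    11: lambda xi, xj, xk, a: xj * a,
    12: lambda xi, xj, xk, a: math.sin(xj),
    13: lambda xi, xj, xk, a: math.cos(xj),
    14: lambda xi, xj, xk, a: xj / xk,
    15: lambda xi, xj, xk, a: xi - xj,
}


def apply_transformations(variables, transformations):
    """variables maps variable number -> list of values and is updated in place.
    Transformations run in the order given (rule I); a variable may be
    transformed repeatedly (rule III); a new number creates a new variable (rule IV)."""
    if len(transformations) > 40:  # rule II
        raise ValueError("no more than 40 transformations are allowed")
    m = len(next(iter(variables.values())))
    for form, i, j, k, a in transformations:
        f = FORMS[form]
        xi = variables.get(i, [None] * m)
        xk = variables.get(k, [None] * m)
        variables[i] = [f(xi[r], variables[j][r], xk[r], a) for r in range(m)]
    return variables


# EXAMPLE 1 below: (3)=(2)**2 and (4)=(2)**3 on two hypothetical observations.
v = {1: [1.0, 2.0], 2: [2.0, 3.0]}
apply_transformations(v, [(3, 3, 2, None, 2), (3, 4, 2, None, 3)])
print(v)
```

IN THIS ENCODING, EXAMPLE 3 BELOW WOULD BE WRITTEN (10, 2, 2, None, 1),
(3, 2, 2, None, -1), (12, 2, 2, None, None).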
IN EXAMPLES 1, 2, AND 3 WE ARE SHOWING ONLY THOSE INSTRUCTIONS, QUESTIONS, AND
RESPONSES WE WANT TO EMPHASIZE.
EXAMPLE 1
POLYNOMIAL REGRESSION (CUBIC)
HOW MANY VARIABLES?
2
ENTER OPTIONS OR TYPE "HELP"
TRA
ENTER TRANSFORMATIONS
(3)=(2)**2
(4)=(2)**3
END
WHICH IS THE DEPENDENT VARIABLE?
1
(LETTING Y=VARIABLE 1 AND X=VARIABLE 2, THE MODEL IS:
Y = A + B(1)X + B(2)X**2 + B(3)X**3, WHERE VARIABLE 3=X*X AND VARIABLE 4=X*X*X.)
EXAMPLE 2
SECOND DEGREE RESPONSE SURFACE
HOW MANY VARIABLES:
3
ENTER OPTIONS OR TYPE "HELP"
TRA
ENTER TRANSFORMATIONS
(4)=(2)**2
(5)=(2)*(3)
(6)=(3)**2
END
WHICH IS THE DEPENDENT VARIABLE?
1
(LETTING Y=VARIABLE 1, X=VARIABLE 2, AND Z=VARIABLE 3, THE MODEL IS:
Y = A + B(1)X + B(2)Z + B(3)X**2 + B(4)XZ + B(5)Z**2, WHERE VARIABLE 4=X*X,
VARIABLE 5=XZ, AND VARIABLE 6=Z*Z.)
EXAMPLE 3
NON-LINEAR FIT
MODEL: Y = VARIABLE 1
X = VARIABLE 2
FIND LEAST SQUARES ESTIMATES OF A AND B, WHERE
Y = A + B SIN(1/(1+X))
SOLUTION:
HOW MANY VARIABLES:
2
ENTER OPTIONS
TRA
ENTER TRANSFORMATIONS
(2)=(2)+1 (X IS REPLACED BY X+1)
(2)=(2)**-1 (X IS REPLACED BY 1/(1+X))
(2)=SIN(2) (X IS REPLACED BY SIN(1/(1+X)))
END
WHICH IS THE DEPENDENT VARIABLE?
1
(VARIABLE 2 IS NOW SIN(1/(1+X))).
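ONCE VARIABLE 2 HOLDS SIN(1/(1+X)), THE MODEL IS AN ORDINARY STRAIGHT-LINE FIT
OF Y ON THE TRANSFORMED VARIABLE, SO A AND B ARE THE USUAL SIMPLE LEAST SQUARES
ESTIMATES. THE PYTHON SKETCH BELOW CHECKS THIS ON HYPOTHETICAL DATA GENERATED
EXACTLY FROM A = 2 AND B = 3; THE X VALUES AND THE FUNCTION NAME ARE INVENTED
FOR ILLUSTRATION.

```python
import math


def simple_least_squares(x, y):
    """Return (A, B) minimizing the sum of (y - A - B*x)**2."""
    m = len(x)
    xbar, ybar = sum(x) / m, sum(y) / m
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    return ybar - b * xbar, b


# Apply the three transformations of EXAMPLE 3 to a hypothetical X column.
x = [0.0, 0.5, 1.0, 2.0, 4.0]
z = [xi + 1 for xi in x]          # (2)=(2)+1
z = [zi ** -1 for zi in z]        # (2)=(2)**-1
z = [math.sin(zi) for zi in z]    # (2)=SIN(2)

# Hypothetical Y values generated exactly from A = 2, B = 3.
y = [2 + 3 * math.sin(1 / (1 + xi)) for xi in x]
a, b = simple_least_squares(z, y)
print(a, b)
```

BECAUSE THE HYPOTHETICAL Y VALUES CONTAIN NO ERROR, THE FIT RECOVERS A = 2 AND
B = 3 UP TO ROUNDOFF.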
OPTION 3. MVS - THE MEAN, STANDARD DEVIATION, AND VARIANCE OF EACH VARIABLE
ARE PRINTED.
OPTION 4. XPR - RAW SUMS OF SQUARES AND CROSS PRODUCTS FOR EACH PAIR OF
VARIABLES ARE PRINTED. (CROSS PRODUCT = SUM OF X(I,J)X(I,K) FOR EACH
VARIABLE PAIR J AND K, WHERE I GOES FROM 1 TO M.)
OPTION 5. COV - THE N BY N VARIANCE-COVARIANCE MATRIX OF THE N VARIABLES IS
PRINTED.
OPTION 6. COR - THE N BY N CORRELATION, FISHER Z, AND T-VALUE MATRICES ARE
PRINTED.
(LET R(I,J) = CORRELATION OF VARIABLES I AND J. THEN
FISHER Z = Z(I,J) = 1/2*LN((1+R(I,J))/(1-R(I,J)))*SQRT(M-3) AND
T-VALUE = T(I,J) = R(I,J)*SQRT((N(I,J)-2)/(1-R(I,J)*R(I,J))).)
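THE QUANTITIES PRINTED BY OPTIONS 4 TO 6 CAN BE SKETCHED IN PYTHON AS FOLLOWS.
THE FUNCTION NAMES ARE HYPOTHETICAL. THE COVARIANCE USES THE DIVISOR M-1, WHICH
MATCHES THE VARIANCES PRINTED IN SECTION 6.0, AND SINCE THIS DOCUMENT DOES NOT
DEFINE N(I,J), THE SKETCH ASSUMES COMPLETE DATA SO THAT N(I,J) = M.

```python
import math


def cross_product(xj, xk):
    """XPR entry: raw sum over observations of X(I,J)*X(I,K)."""
    return sum(a * b for a, b in zip(xj, xk))


def covariance(xj, xk):
    """COV entry, using divisor M-1."""
    m = len(xj)
    mj, mk = sum(xj) / m, sum(xk) / m
    return sum((a - mj) * (b - mk) for a, b in zip(xj, xk)) / (m - 1)


def correlation_stats(xj, xk):
    """COR entries R, Fisher Z and T, assuming N(I,J) = M (no missing data)."""
    m = len(xj)
    r = covariance(xj, xk) / math.sqrt(covariance(xj, xj) * covariance(xk, xk))
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(m - 3)
    t = r * math.sqrt((m - 2) / (1 - r * r))
    return r, z, t


xa = [1.0, 2.0, 3.0, 4.0, 5.0]
xb = [2.0, 1.0, 4.0, 3.0, 5.0]
print(cross_product(xa, xb), covariance(xa, xb), correlation_stats(xa, xb))
```

FOR THESE TWO HYPOTHETICAL COLUMNS THE SKETCH GIVES A RAW CROSS PRODUCT OF 53,
A COVARIANCE OF 2, R = 0.8, Z OF ABOUT 1.554 AND T OF ABOUT 2.309.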
OPTION 7. FVL - F-VALUES. ALLOWS ONE TO SPECIFY F-VALUES FOR ENTERING AND
DELETING VARIABLES FROM THE REGRESSION EQUATION.
TWO RULES FOR USING FVL:
(I) TWO F-VALUES MUST BE SPECIFIED BY THE USER; ONE FOR ENTER-
ING A VARIABLE AND ONE FOR DELETING A VARIABLE.
(II) THE F-VALUE FOR ENTERING A VARIABLE MUST BE EQUAL TO OR
LARGER THAN THE F-VALUE FOR REMOVING A VARIABLE.
A VARIABLE MAY BE SIGNIFICANT AT AN EARLY STAGE OF THE STEPWISE PROCEDURE AND
THUS ENTER THE EQUATION; BUT LATER, AFTER MORE VARIABLES HAVE BEEN ADDED, THIS
VARIABLE MAY BECOME INSIGNIFICANT. IT WILL THEN BE DELETED FROM THE REGRESSION
EQUATION BEFORE AN ADDITIONAL VARIABLE IS ADDED. ONLY SIGNIFICANT (AT THE
USER'S SPECIFIED LEVEL) VARIABLES ARE INCLUDED IN THE FINAL EQUATIONS.
SIGNIFICANCE OF A VARIABLE IS INDICATED BY THE FACT THAT THE F-LEVEL CALCULATED
BY THE STEPR PROGRAM FOR THAT VARIABLE IS GREATER THAN OR EQUAL TO THE USER
SPECIFIED F-VALUE.
THE F-STATISTIC (LEVEL) CALCULATED BY THE STEPR PROGRAM FOR ENTERING A
VARIABLE HAS AN F-DISTRIBUTION WITH 1 AND M-K-1 DEGREES OF FREEDOM, WHERE
M = NUMBER OF OBSERVATIONS, AND
K = NUMBER OF VARIABLES INCLUDED IN THE REGRESSION AT THIS PARTICULAR STAGE.
HOWEVER, THE USER SPECIFIES A SINGLE F-VALUE FOR ENTERING A VARIABLE, AND IT IS
USED AT EVERY STAGE EVEN THOUGH M-K-1 CHANGES FROM STAGE TO STAGE. HENCE THE
TESTS FOR ENTERING AND DELETING A VARIABLE DO NOT CORRESPOND TO AN EXACT
SIGNIFICANCE LEVEL AND ARE SOMEWHAT ARBITRARY.
TABLE 2 GIVES THE ALPHA = .05 AND ALPHA = .01 CRITICAL VALUES FOR AN F-DISTRIBUTION
WITH 1 AND M-K-1 DEGREES OF FREEDOM.
EXAMPLE
ENTER OPTIONS OR TYPE "HELP"
FVL
ENTER F-VALUE FOR ENTERING A VARIABLE
4.0
ENTER F-VALUE FOR OMITTING A VARIABLE
3.5
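THE ENTER/DELETE LOGIC DESCRIBED ABOVE CAN BE SKETCHED FOR A SINGLE STAGE AS
FOLLOWS. THE FUNCTION NAMES AND THE DICTIONARIES OF F-LEVELS (VARIABLE NUMBER
MAPPED TO ITS CURRENT F-LEVEL) ARE HYPOTHETICAL. THE SKETCH ONLY ENCODES THE
RULES STATED IN THIS SECTION: THE ENTERING VALUE MUST BE AT LEAST THE REMOVING
VALUE, A VARIABLE THAT HAS BECOME INSIGNIFICANT IS DELETED BEFORE ANOTHER IS
ADDED, AND A VARIABLE IS SIGNIFICANT WHEN ITS F-LEVEL IS GREATER THAN OR EQUAL
TO THE USER'S VALUE.

```python
# Hypothetical sketch of one stage of the FVL enter/delete decision.
# candidates: F-levels of variables not yet in the equation.
# current:    F-levels of variables already in the equation.


def check_f_values(f_enter, f_remove):
    """Rule (II): the entering F-value must be >= the removing F-value."""
    if f_enter < f_remove:
        raise ValueError("F-value for entering must be >= F-value for removing")


def next_action(candidates, current, f_enter, f_remove):
    """Return ("DELETE", var), ("ENTER", var) or ("STOP", None)."""
    check_f_values(f_enter, f_remove)
    # A variable that has become insignificant is deleted first.
    if current:
        weakest = min(current, key=current.get)
        if current[weakest] < f_remove:
            return ("DELETE", weakest)
    # Otherwise the candidate with the largest F-level enters if significant.
    if candidates:
        best = max(candidates, key=candidates.get)
        if candidates[best] >= f_enter:
            return ("ENTER", best)
    return ("STOP", None)


print(next_action({3: 5.0, 6: 2.0}, {1: 10.0}, 4.0, 3.5))
print(next_action({3: 5.0}, {1: 3.0}, 4.0, 3.5))
```

WITH THE EXAMPLE VALUES 4.0 AND 3.5, A CANDIDATE WITH F-LEVEL 5.0 WOULD ENTER,
WHILE A VARIABLE ALREADY IN THE EQUATION WHOSE F-LEVEL HAS FALLEN TO 3.0 WOULD
FIRST BE DELETED. WITH BOTH VALUES ZERO (THE DEFAULT), EVERY CANDIDATE ENTERS
AND NONE IS DELETED.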
WHEN OPTION FVL IS NOT SPECIFIED, THE COMPUTER ASSUMES THAT BOTH F-VALUES ARE
ZERO; HENCE, ALL VARIABLES ARE INCLUDED IN THE FINAL REGRESSION EQUATION AND
NONE ARE ELIMINATED. INTERMEDIATE STEPS ARE STILL PRINTED. NOTE: ROUNDOFF ERROR
OR OTHER CAUSES (FOR EXAMPLE, ONE VARIABLE BEING AN EXACT LINEAR COMBINATION OF
OTHERS) MAY MAKE THE CORRELATION MATRIX SINGULAR OR NEARLY SINGULAR. THIS COULD
CAUSE ONE OR MORE VARIABLES NOT TO BE INCLUDED IN THE FINAL REGRESSION EQUATION.
OPTION 8. FOR - SPECIFIED VARIABLES ARE FORCED INTO THE FINAL REGRESSION
EQUATION. INTERMEDIATE STEPS ARE PRINTED ONLY FOR THOSE VARIABLES
WHICH ARE NOT FORCED INTO THE REGRESSION EQUATION. IF YOU FORCE ALL
THE INDEPENDENT VARIABLES INTO THE EQUATION, NO INTERMEDIATE STEPS
ARE PRINTED AND ONLY THE FINAL REGRESSION EQUATION IS SHOWN; FORCING
ALL THE VARIABLES THUS MAKES THE STEPR PROGRAM BEHAVE AS AN ORDINARY
MULTIPLE REGRESSION PROGRAM.
NOTE: DO NOT FORCE THE DEPENDENT VARIABLE.
FOR THE LAST TWO EXAMPLES GIVEN BELOW, ONLY THOSE INSTRUCTIONS, QUESTIONS, AND
RESPONSES WE WANT TO EMPHASIZE ARE SHOWN.
EXAMPLE
ENTER OPTIONS OR TYPE "HELP"     (SUPPOSE THAT 6 VARIABLES HAVE BEEN ENTERED)
FOR
ENTER NUMBER OF VARIABLES TO BE FORCED INTO THE REGRESSION
3
WHICH ARE THEY? (MAX: 20 PER LINE)
1,4,5
(IF VARIABLE 2 HAS BEEN SELECTED AS THE DEPENDENT VARIABLE, VARIABLES 1, 4,
AND 5 ARE FORCED INTO THE REGRESSION EQUATION, AND ONLY THE REMAINING
VARIABLES 3 AND 6 ARE CONSIDERED STEPWISE.)
OPTION 9. ELM - THIS OPTION ALLOWS THE USER TO ELIMINATE SPECIFIED VARIABLES
FROM THE STEPWISE REGRESSION AT ALL STAGES.
EXAMPLE
ENTER NUMBER OF VARIABLES
6
ENTER OPTIONS OR TYPE "HELP"
ELM
WHICH IS THE DEPENDENT VARIABLE?
2
HOW MANY VARIABLES WOULD YOU LIKE TO ELIMINATE?
2
WHICH ARE THEY? (MAX: 20 PER LINE)
1,5
(THE STEPWISE REGRESSION IS THEN RUN ON THE MODEL WITH VARIABLE 2 AS THE DE-
PENDENT VARIABLE AND VARIABLES 3,4, AND 6 AS THE INDEPENDENT VARIABLES.)
NOTE: DO NOT ELIMINATE THE DEPENDENT VARIABLE!
OPTION 10. RES - RESIDUALS ARE PRINTED AT THE FINAL STEP OF THE REGRESSION.
(IN THE OUTPUT TABLE TITLED PREDICTED VS ACTUAL RESULTS, THE NUMBERS UNDER
ACTUAL ARE THE VALUES OF THE DEPENDENT VARIABLE, EITHER AS SUPPLIED BY THE USER
OR AS RESULTING FROM TRANSFORMATION OF THE USER'S OBSERVATIONS. THE NUMBERS
UNDER PREDICTED ARE CALCULATED FROM THE FINAL REGRESSION EQUATION, I.E.,
Y(P,I) = B(0) + B(1) X(1,I) + ... + B(K) X(K,I),
WHERE X(1,I),...,X(K,I) ARE THE VALUES OF THE I-TH OBSERVATION OF THE K
INDEPENDENT VARIABLES IN THE FINAL EQUATION AND Y(P,I) IS THE PREDICTED VALUE
FOR THE I-TH OBSERVATION OF THE DEPENDENT VARIABLE.)
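THE PREDICTED VALUES ABOVE, AND RESIDUALS FROM THEM, CAN BE SKETCHED IN PYTHON
AS FOLLOWS. THE FUNCTION NAME IS HYPOTHETICAL, AND THE SIGN CONVENTION
RESIDUAL = ACTUAL - PREDICTED IS AN ASSUMPTION (THE USUAL ONE), NOT SOMETHING
THIS DOCUMENT STATES.

```python
def predicted_and_residuals(b0, coefs, xs, y):
    """b0 = B(0); coefs[j] = B(j+1); xs[j] = values of the j-th independent
    variable in the final equation; y = ACTUAL values of the dependent variable.
    Returns (predicted, residual); residual assumed to be actual - predicted."""
    m = len(y)
    pred = [b0 + sum(b * x[i] for b, x in zip(coefs, xs)) for i in range(m)]
    resid = [y[i] - pred[i] for i in range(m)]
    return pred, resid


# Hypothetical one-variable equation Y = 1 + 2*X(1) and three observations.
pred, resid = predicted_and_residuals(1.0, [2.0], [[0.0, 1.0, 2.0]], [1.0, 4.0, 4.0])
print(pred, resid)
```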
OPTION 11. ANA - AN ANALYSIS OF VARIANCE IS PRINTED AFTER THE FINAL STAGE IN
THE REGRESSION.
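THE ENTRIES OF THE ANALYSIS OF VARIANCE TABLE FOLLOW FROM THE TOTAL SUM OF
SQUARES ABOUT THE MEAN AND THE FINAL COEFFICIENT OF DETERMINATION: THE
REGRESSION SUM OF SQUARES IS R*R TIMES THE TOTAL, WITH K DEGREES OF FREEDOM,
AND THE ERROR SUM OF SQUARES IS THE REMAINDER, WITH M-K-1. THE PYTHON SKETCH
BELOW (HYPOTHETICAL FUNCTION NAME) APPLIES THIS TO THE FINAL STAGE OF THE
SAMPLE RUN IN SECTION 6.0.

```python
def anova(r2, sst, m, k):
    """Analysis of variance for a fit with k independent variables and a
    constant, from R*R and the total sum of squares about the mean."""
    ssr = r2 * sst            # regression sum of squares
    sse = sst - ssr           # error sum of squares
    msr, mse = ssr / k, sse / (m - k - 1)
    return {"REGRESSION": (ssr, k, msr),
            "ERROR": (sse, m - k - 1, mse),
            "TOTAL": (sst, m - 1),
            "F": msr / mse}


# Final stage of the sample run: R*R = 0.79953, total SS = 67.333334, M = 7, K = 2.
table = anova(0.79953, 67.333334, m=7, k=2)
print(table)
```

USING THE ROUNDED R*R, THE SKETCH REPRODUCES THE PRINTED REGRESSION SUM OF
SQUARES (53.835 AGAINST 53.834723) AND F (ABOUT 7.977 AGAINST 7.976) TO WITHIN
ROUNDING.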
OPTION 12. DUR - DURBIN-WATSON STATISTICS ARE PRINTED. THE DURBIN-WATSON
STATISTIC TESTS THE RESIDUALS FOR POSITIVE OR NEGATIVE SERIAL
CORRELATION. SPECIAL DURBIN-WATSON TABLES MUST BE CONSULTED IN ORDER
TO INTERPRET THE SIGNIFICANCE OF THIS STATISTIC.
FOR A COMPLETE DISCUSSION OF THIS STATISTIC AND TABLES, THE USER
SHOULD CONSULT "TESTING FOR SERIAL CORRELATION IN LEAST SQUARES
REGRESSION I", BIOMETRIKA, VOLUME 37 (1950), PAGES 409-428 AND
"TESTING FOR SERIAL CORRELATION IN LEAST SQUARES REGRESSION II",
BIOMETRIKA, VOLUME 38 (1951), PAGES 159-178. BOTH ARTICLES ARE BY
J. DURBIN AND G.S. WATSON.
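THE STATISTIC ITSELF IS THE STANDARD ONE FROM THE DURBIN AND WATSON PAPERS: THE
SUM OF SQUARED DIFFERENCES OF SUCCESSIVE RESIDUALS DIVIDED BY THE SUM OF
SQUARED RESIDUALS. THE PYTHON SKETCH BELOW (HYPOTHETICAL FUNCTION NAME) ASSUMES
THE RESIDUALS ARE LISTED IN OBSERVATION ORDER; IT IS NOT TAKEN FROM THE STEPR
SOURCE.

```python
def durbin_watson(residuals):
    """d = sum over t >= 2 of (e(t) - e(t-1))**2, divided by the sum of e(t)**2.
    d lies between 0 and 4; values near 2 suggest no first-order serial
    correlation, values well below 2 positive, well above 2 negative."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # alternating signs
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # identical residuals
```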
OPTION 13. ZER - FORCING REGRESSION THROUGH ZERO. THIS OPTION FORCES THE
CONSTANT TERM OF THE UNSTANDARDIZED REGRESSION EQUATION TO BE ZERO.
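FOR A SINGLE INDEPENDENT VARIABLE, FORCING THE CONSTANT TERM TO ZERO CHANGES
THE LEAST SQUARES SLOPE TO SUM(X*Y)/SUM(X*X). THE SKETCH BELOW (HYPOTHETICAL
NAME, ONE VARIABLE ONLY FOR BREVITY) SHOWS THIS; WITH SEVERAL VARIABLES THE SAME
IDEA APPLIES, WITH THE NORMAL EQUATIONS FORMED FROM RAW RATHER THAN
MEAN-CORRECTED SUMS OF SQUARES AND CROSS PRODUCTS.

```python
def fit_through_zero(x, y):
    """Least squares slope B for Y = B*X with the constant term fixed at zero."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


print(fit_through_zero([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```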
4.0 LIMITATIONS
1. NO MORE THAN 70 VARIABLES.
2. NO MORE THAN 40 TRANSFORMATIONS.
3. NO MORE THAN 5 FORMAT CARDS.
4. NO LIMIT ON OBSERVATIONS.
5. IF THERE ARE MORE THAN 10 MISSING DATA SYMBOLS, THEY MUST BE ENTERED AT
THE RATE OF 10 PER LINE.
6. WITH STANDARD FORMAT IF THERE ARE MORE THAN 10 VARIABLES, THEY MUST BE
ENTERED AT THE RATE OF 10 PER LINE.
5.0 METHOD OF USE
AN EXAMPLE OF RUNNING THE STEPR PROGRAM IS GIVEN HERE WITH A LINE BY LINE
EXPLANATION FOLLOWING IT. ^Z AND LINES ENDING WITH <CR> CONTAIN USER'S RESPONSES.
AN ^ MEANS THAT THE CHARACTER FOLLOWING IT IS TYPED WHILE DEPRESSING THE CONTROL
(CTRL) KEY. A <CR> INDICATES THAT THE RETURN KEY IS PRESSED.
.R STEPR<CR>
WMU STEPWISE REGRESSION
LINE 1 OUTPUT? TTY:<CR>
LINE 2 INPUT? TTY:<CR>
LINE 3 FORMAT: (F-TYPE ONLY)
LINE 4 STD<CR>
LINE 5 ENTER NUMBER OF VARIABLES
LINE 6 4<CR>
LINE 7 ENTER IDENTIFICATION IF DESIRED OTHERWISE RETURN
LINE 8 TRIAL RUN<CR>
LINE 9 ENTER OPTIONS OR TYPE "HELP"
LINE 10 MIS,TRA,MVS,FVL,FOR,ELM,ANA<CR>
LINE 11 IS THERE MORE THAN ONE MISSING DATA SYMBOL?
LINE 12 NO<CR>
LINE 13 ENTER MISSING DATA SYMBOL
LINE 14 0<CR>
LINE 15 HOW WOULD YOU LIKE TO COMPENSATE FOR MISSING DATA? TYPE:
1 - TO REPLACE MISSING DATA BY MEAN VALUE
2 - TO DELETE THE OBSERVATION
LINE 16 1<CR>
LINE 17 ENTER TRANSFORMATIONS
LINE 18 (5)=E**(4)<CR>
LINE 19 END<CR>
LINE 20 ENTER DATA (AT MOST 10 PER LINE)
LINE 21 1,3,2,4<CR>
0,1,3,2<CR>
1,4,2,3<CR>
7,1,4,5<CR>
1,1,2,1<CR>
1,0,5,5<CR>
9,7,8,7<CR>
^Z
[OUTPUT]
LINE 22 WHICH IS THE DEPENDENT VARIABLE?
LINE 23 1<CR>
LINE 24 ENTER F-VALUE FOR ENTERING A VARIABLE.
LINE 25 .2<CR>
LINE 26 ENTER F-VALUE FOR OMITTING A VARIABLE.
LINE 27 .2<CR>
LINE 28 HOW MANY VARIABLES WOULD YOU LIKE TO ELIMINATE?
LINE 29 1<CR>
LINE 30 WHICH ARE THEY? (MAX: 20 PER LINE)
LINE 31 4<CR>
LINE 32 ENTER NUMBER OF VARIABLES TO BE FORCED INTO THE REGRESSION.
LINE 33 2<CR>
LINE 34 WHICH ARE THEY? (MAX: 20 PER LINE)
LINE 35 2,3<CR>
[OUTPUT]
LINE 36 DO YOU WISH TO REANALYZE THE SAME DATA?
LINE 37 NO<CR>
LINE 38 INPUT? FINISH<CR>
CPU TIME: 2.03 ELAPSED TIME: 10:1.35
NO EXECUTION ERRORS DETECTED
EXIT
.
THE FOLLOWING IS AN EXPLANATION OF THE EXAMPLE LISTED ABOVE.
LINE 1 OUTPUT?
LINE 2 INPUT?
LINES 1 AND 2 DEFINE WHERE THE USER INTENDS TO WRITE HIS OUTPUT FILE (LINE 1)
AND FROM WHERE THE USER EXPECTS TO READ HIS INPUT DATA (LINE 2). SEE NOTE (2)
BELOW FOR OTHER INPUT OPTIONS.
THE PROPER RESPONSE TO EACH OF THESE QUESTIONS CONSISTS OF THREE BASIC PARTS:
A DEVICE, A FILENAME, AND A PROJECT-PROGRAMMER NUMBER.
THE GENERAL FORMAT FOR THESE THREE PARTS IS AS FOLLOWS:
DEV:FILE.EXT[PROJ,PROG]
1) DEV: ANY OF THE FOLLOWING DEVICES ARE APPROPRIATE WHERE INDICATED:
DEVICE LIST DEFINITION STATEMENT USE
TTY: TERMINAL INPUT OR OUTPUT
DSK: DISK INPUT OR OUTPUT
CDR: CARD READER INPUT ONLY
LPT: LINE PRINTER OUTPUT ONLY
DTA0: DECTAPE 0 INPUT OR OUTPUT
DTA1: DECTAPE 1 INPUT OR OUTPUT
DTA2: DECTAPE 2 INPUT OR OUTPUT
DTA3: DECTAPE 3 INPUT OR OUTPUT
DTA4: DECTAPE 4 INPUT OR OUTPUT
DTA5: DECTAPE 5 INPUT OR OUTPUT
DTA6: DECTAPE 6 INPUT OR OUTPUT
DTA7: DECTAPE 7 INPUT OR OUTPUT
MTA0: MAGNETIC TAPE 0 INPUT OR OUTPUT
MTA1: MAGNETIC TAPE 1 INPUT OR OUTPUT
INPUT MAY NOT BE DONE FROM THE LINE PRINTER NOR MAY OUTPUT GO TO THE CARD
READER.
2) FILE.EXT IS THE NAME AND EXTENSION OF THE FILE TO BE USED. THIS PART OF
THE SPECIFICATION IS USED ONLY IF DISK OR DECTAPE IS USED.
3) [PROJ,PROG] IF A DISK IS USED AND THE USER WISHES TO READ A FILE IN
ANOTHER PERSON'S DIRECTORY, HE MAY DO SO BY SPECIFYING THE PROJECT-PROGRAMMER
NUMBER OF THE DIRECTORY FROM WHICH HE WISHES TO READ. THE PROJECT NUMBER AND
THE PROGRAMMER NUMBER MUST BE SEPARATED BY A COMMA AND ENCLOSED IN BRACKETS.
OUTPUT MUST GO TO YOUR OWN AREA.
EXAMPLE: OUTPUT? LPT:/2
INPUT? DSK:DATA.DAT[71171,71026]
IN THE EXAMPLE, TWO COPIES OF THE OUTPUT ARE TO BE PRINTED BY THE HIGH SPEED
LINE PRINTER. THE INPUT DATA IS A DISK FILE OF NAME DATA.DAT IN USER DIRECTORY
[71171,71026].
DEFAULTS:
1) IF NO DEVICE IS SPECIFIED BUT A FILENAME IS SPECIFIED THE DEFAULT DEVICE
WILL BE DSK:
2) IF NO FILENAME IS SPECIFIED AND A DISK OR DECTAPE IS USED THE DEFAULT ON
INPUT WILL BE FROM INPUT.DAT; ON OUTPUT IT WILL BE OUTPT.DAT.
3) IF THE PROGRAM IS RUN FROM THE TERMINAL AND NO SPECIFICATION IS GIVEN (JUST
A CARRIAGE RETURN) BOTH INPUT AND OUTPUT DEVICES WILL BE THE TERMINAL.
4) IF THE PROGRAM IS RUN THROUGH BATCH AND NO SPECIFICATION IS GIVEN, (A
BLANK CARD) THE INPUT DEVICE WILL BE CDR: AND THE OUTPUT DEVICE WILL BE LPT:
5) IF NO PROJECT-PROGRAMMER NUMBER IS GIVEN, THE USER'S OWN NUMBER WILL BE
ASSUMED.
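THE DEFAULTS ABOVE CAN BE SUMMARIZED AS A SMALL PYTHON SKETCH THAT SPLITS A
RESPONSE OF THE FORM DEV:FILE.EXT[PROJ,PROG], OPTIONALLY FOLLOWED BY /N, AND
FILLS IN DEFAULTS 1, 2, 3, AND 5. IT IS A HYPOTHETICAL ILLUSTRATION OF THE
RULES, NOT THE PARSER STEPR USES: THE PATTERN IS AN ASSUMPTION, A BLANK
RESPONSE IS TREATED AS A TERMINAL RUN (DEFAULT 4 FOR BATCH JOBS IS NOT
MODELED), AND THE PROJECT-PROGRAMMER NUMBER IS KEPT AS TEXT.

```python
import re

# Hypothetical pattern for DEV:FILE.EXT[PROJ,PROG]/N (all parts optional).
SPEC = re.compile(
    r"^(?:(?P<dev>[A-Z0-9]+):)?"               # DEV:
    r"(?P<name>[A-Z0-9]*)"                     # FILE
    r"(?:\.(?P<ext>[A-Z0-9]*))?"               # .EXT
    r"(?:\[(?P<proj>\d+),(?P<prog>\d+)\])?"    # [PROJ,PROG]
    r"(?:/(?P<copies>\d+))?$"                  # /N copies (LPT: only)
)


def parse_spec(text, is_input, own_ppn):
    """Return (device, filename, ppn, copies) after applying defaults 1, 2, 3, 5."""
    m = SPEC.match(text.strip().upper())
    if m is None:
        raise ValueError("unrecognized specification: " + text)
    dev, name, ext = m.group("dev"), m.group("name"), m.group("ext")
    if dev is None:
        dev = "DSK" if name else "TTY"      # default 1, or default 3 for a bare <CR>
    filename = None
    if dev == "DSK" or dev.startswith("DTA"):
        if name:
            filename = name + ("." + ext if ext else "")
        else:
            filename = "INPUT.DAT" if is_input else "OUTPT.DAT"   # default 2
    if m.group("proj"):
        ppn = (m.group("proj"), m.group("prog"))
    else:
        ppn = own_ppn                        # default 5
    copies = int(m.group("copies") or 1)
    return dev, filename, ppn, copies


print(parse_spec("DSK:DATA.DAT[71171,71026]", True, ("1", "2")))
print(parse_spec("LPT:/2", False, ("1", "2")))
```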
NOTE: (1) IF LPT: IS USED AS AN OUTPUT DEVICE MULTIPLE COPIES MAY BE OBTAINED
BY SPECIFYING LPT:/N WHERE N REFERS TO THE NUMBER OF COPIES DESIRED.
(2) THE FOLLOWING TWO RESPONSES ARE NOT AVAILABLE FOR THE FIRST DATA SET;
THEY APPLY ONLY WHEN THE PROGRAM ASKS INPUT? AGAIN (LINE 38) AFTER LINES 1-37
HAVE BEEN COMPLETED ONCE.
(A) SAME OPTION
UPON RETURNING FROM LINE 37, IF THE SAME DATA FILE IS TO BE
USED AGAIN SIMPLY ENTER "SAME<CR>", OTHERWISE, EITHER USE THE
FINISH OPTION OR ENTER ANOTHER FILENAME ETC.
(B) FINISH OPTION
THE USER MUST ENTER "FINISH<CR>" TO BRANCH OUT OF THE STEPR
PROGRAM. FAILURE TO DO SO MIGHT RESULT IN LOSING THE ENTIRE
OUTPUT FILE.
LINES 3-4
THERE ARE 3 OPTIONS AVAILABLE FOR THE FORMAT, NAMELY:
(A) STANDARD FORMAT OPTION
UNLESS OTHERWISE SPECIFIED, THE PROGRAM ASSUMES THE STANDARD
OPTION. IN THIS OPTION, THE DATA ARE ENTERED AT MOST 10 VALUES
PER LINE, WITH VALUES SEPARATED BY COMMAS.
TO USE THIS OPTION, SIMPLY TYPE "<CR>" ON TERMINAL JOBS,
USE A BLANK CARD FOR BATCH JOBS, OR ENTER "STD<CR>".
(B) OBJECT TIME FORMAT OPTION
IF THE DATA IS SUCH THAT A USER'S OWN FORMAT IS REQUIRED,
SIMPLY ENTER A LEFT PARENTHESIS FOLLOWED BY THE FIRST FORMAT
SPECIFICATION, A COMMA AND THE SECOND SPECIFICATION, ETC.
WHEN YOU FINISH ENTER A RIGHT PARENTHESIS, AND THEN A CARRIAGE
RETURN. THERE CAN BE A MAXIMUM OF 5 LINES FOR THE FORMAT,
EACH LINE BEING 80 COLUMNS LONG.
NOTE THAT THE FORMAT SPECIFICATION LIST MUST USE THE FLOATING
POINT (F-TYPE) NOTATION AND MUST CONTAIN SPECIFICATION FOR
EACH OF THE VARIABLES. THE SPECIFICATIONS FOR THE FORMAT
ITSELF ARE THE SAME AS FOR THE FORTRAN IV FORMAT STATEMENT.
(C) SAME OPTION
THE SAME OPTION IS APPLICABLE ONLY TO JOBS THAT USE MORE THAN
ONE DATA FILE. IF AN OBJECT TIME FORMAT WAS USED ON A DATA
SET AND THE SUCCEEDING DATA SET UTILIZES THE SAME FORMAT,
SIMPLY ENTER "SAME<CR>".
LINES 5-6. ON LINE 6 ENTER THE NUMBER OF VARIABLES TO BE ENTERED INTO THE
PROGRAM FROM THE DATA. DO NOT COUNT ANY VARIABLES GENERATED DURING
TRANSFORMATIONS. THIS NUMBER MUST BE LESS THAN OR EQUAL TO 70.
LINES 7-8. THE USER MAY IDENTIFY HIS OUTPUT FROM THIS PROGRAM BY UP TO 80
CHARACTERS BY ENTERING THEM ON LINE 8. IF NO OUTPUT IDENTIFICATION IS WANTED,
THE ENTERING OF A CARRIAGE RETURN ON LINE 8 WILL CAUSE THE COMPUTER TO SKIP TO
LINE 9.
LINES 9-10. AT THIS POINT THE USER MUST SUBMIT THE OPTIONS HE HAS SELECTED
(SEE SECTION 3.0). THESE THREE-LETTER CODES MAY BE SUBMITTED IN ANY ORDER.
TYPING HELP GIVES AN OUTPUT SIMILAR TO TABLE 1, AND THEN THE QUESTION
IS REPEATED.
LINES 11-14. THE QUESTIONS ON LINES 11, 13, AND 15 WILL ONLY BE ASKED AND
ANSWERS WILL ONLY BE REQUIRED IF THE MIS OPTION IS SELECTED.
THE USER CAN SPECIFY A SINGLE MISSING DATA SYMBOL FOR ALL VARIABLES. IN THIS
CASE NO IS TYPED ON LINE 12 AND THE SINGLE MISSING DATA SYMBOL IS TYPED ON
LINE 14.
THE ALTERNATIVE TO SPECIFYING A SINGLE MISSING DATA SYMBOL IS TO SPECIFY A
MISSING DATA SYMBOL FOR EACH VARIABLE SUBMITTED BY THE USER. IN THIS CASE YOU
MUST TYPE YES ON LINE 12. THEN THIS PROGRAM WILL RESPOND BY CAUSING THE
PRINTING OF ENTER MISSING DATA SYMBOLS ON LINE 13. YOU MUST THEN TYPE ON LINE
14 THE MISSING DATA SYMBOLS FOR EACH VARIABLE SEPARATED BY COMMAS.
A MISSING DATA SYMBOL MUST BE AN INTEGER OR A DECIMAL NUMBER. A LETTER CANNOT
BE A MISSING DATA SYMBOL. THE NUMBER USED FOR A MISSING DATA SYMBOL MUST NOT BE
EQUAL TO ANY VALID INPUT NUMBER SUBMITTED BY THE USER.
EXAMPLE:
LINE 11. IS THERE MORE THAN ONE MISSING DATA SYMBOL?
LINE 12. YES
LINE 13. ENTER MISSING DATA SYMBOLS
LINE 14. 9,8,9
9 IS THE MISSING DATA SYMBOL FOR VARIABLES 1 AND 3; AND 8 IS THE MISSING DATA
SYMBOL FOR VARIABLE 2. IF THERE ARE MORE THAN 10 MISSING DATA SYMBOLS, ENTER
THEM AT THE RATE OF 10 PER LINE.
CAUTION: THERE MUST BE EXACTLY AS MANY MISSING DATA SYMBOLS AS THERE ARE
VARIABLES SUBMITTED BY THE USER, EVEN THOUGH SOME VARIABLES DO NOT CONTAIN
MISSING DATA.
LINES 15-16. SELECT THE METHOD TO BE USED TO COMPENSATE FOR MISSING DATA. IF
1 IS SELECTED, EACH MISSING DATA CODE WILL BE REPLACED BY THE MEAN OF THE
VALID VALUES OF THE VARIABLE IN WHICH IT APPEARS. IF 2 IS SELECTED, ANY
OBSERVATION IN WHICH A MISSING DATA CODE APPEARS WILL BE OMITTED FROM THE
ANALYSIS.
LINES 17-19. THE QUESTION ON LINE 17 WILL APPEAR ONLY IF THE TRA OPTION HAS
BEEN SELECTED. THE USER MUST SUBMIT TRANSFORMATIONS OF THE ORIGINAL DATA AND
ENTER AN END WHEN ALL TRANSFORMATIONS HAVE BEEN ENTERED. SEE THE DETAILED
DESCRIPTION OF THE TRA OPTION UNDER PART 3.0 OF THE DOCUMENT.
LINES 20-21. IF THE INPUT DEVICE SELECTED IN LINE 2 WAS TTY: THE DATA MUST BE
ENTERED AT THIS POINT. BE SURE TO SUBMIT IT IN THE FORMAT SELECTED IN LINE 4.
DATA ENTRY IS TERMINATED BY A ^Z. IF SOME OTHER DEVICE WAS SELECTED FOR INPUT
THE DATA WILL BE READ FROM THE DEVICE AT THIS POINT. (IT MAY TAKE A FEW
MINUTES TO READ THE DATA.)
LINES 22-23. AT THIS POINT ONE OF THE VARIABLES THAT WERE EITHER SUBMITTED OR
GENERATED WITH TRANSFORMATIONS MUST BE SELECTED AS A DEPENDENT VARIABLE.
LINES 24-27. IF THE FVL OPTION WAS SELECTED, THE F-VALUE CRITERIA FOR ENTERING
AND OMITTING VARIABLES MUST BE SUBMITTED ON LINES 25 AND 27. SEE THE
DESCRIPTION OF THE FVL OPTION UNDER PART 3.0 OF THIS DOCUMENT.
LINES 28-31. IF THE ELM OPTION WAS SELECTED THE QUESTIONS ON LINES 28 AND 30
WILL BE ASKED. THE RESPONSES TO THESE QUESTIONS GIVE THE USER THE OPPORTUNITY
TO PERFORM A STEPWISE REGRESSION ON A SUBSET OF THE EXISTING VARIABLES. SEE
THE ELM OPTION UNDER PART 3.0 OF THIS DOCUMENT.
LINES 32-35. IF THE FOR OPTION WAS SELECTED THE QUESTIONS ON LINES 32 AND 34
WILL BE ASKED. THE RESPONSES TO THESE QUESTIONS GIVE THE USER THE ABILITY TO
FORCE VARIABLES INTO THE REGRESSION REGARDLESS OF THEIR STATISTICAL
SIGNIFICANCE. SEE THE FOR OPTION UNDER PART 3.0 OF THIS DOCUMENT.
LINES 36-37. IF THE RESPONSE TO THE QUESTION ON LINE 36 IS YES, THE PROGRAM
WILL TRANSFER CONTROL TO LINE 22, AND OTHER VARIABLES MAY BE FORCED OR OMITTED
AND OTHER F-VALUES MAY BE SELECTED. IF THE RESPONSE IS NO, THE PROGRAM WILL
ASK FOR MORE INPUT (LINE 38).
LINE 38. IF THE USER WISHES TO PROCESS ANOTHER SET OF DATA, HE MAY DO SO BY
SELECTING THE APPROPRIATE INPUT OPTION (SEE LINE 2), OR HE MAY EXIT FROM THE
PROGRAM BY TYPING FINISH.
6.0 SAMPLE TERMINAL RUN
.R STEPR
WMU STEPWISE REGRESSION
OUTPUT? (TYPE HELP IF NEEDED)--TTY:
INPUT? (TYPE HELP IF NEEDED)--TTY:
FORMAT: (F-TYPE ONLY)
STD
ENTER NUMBER OF VARIABLES
4
ENTER IDENTIFICATION IF DESIRED OTHERWISE RETURN
TRIAL RUN
ENTER OPTIONS OR TYPE "HELP"
MIS,TRA,MVS,FVL,ELM,ANA
IS THERE MORE THAN ONE MISSING DATA SYMBOL ?
NO
ENTER MISSING DATA SYMBOL
0
HOW WOULD YOU LIKE TO COMPENSATE FOR MISSING DATA? TYPE:
1 - TO REPLACE MISSING DATA BY MEAN VALUE
2 - TO DELETE THE OBSERVATION
1
ENTER TRANSFORMATIONS
(5)=E**(4)
END
ENTER DATA (AT MOST 10 PER LINE)
1,3,2,4
0,1,3,2
1,4,2,3
7,1,4,5
1,1,2,1
1,0,5,5
9,7,8,7
^Z
TRIAL RUN
THE NUMBER OF OBSERVATIONS IS 7
VARIABLE MEAN VARIANCE STD. DEV.
1 3.333333 11.22222 3.349959
2 2.833333 4.805556 2.192158
3 3.714286 4.904762 2.214670
4 3.857143 4.142857 2.035401
5 211.1786 156321.4 395.3751
WHICH IS THE DEPENDENT VARIABLE?
1
ENTER F-VALUE FOR ENTERING A VARIABLE.
.2
ENTER F-VALUE FOR OMITTING A VARIABLE.
.2
HOW MANY VARIABLES WOULD YOU LIKE TO ELIMINATE?
1
WHICH ARE THEY?(MAX: 20 PER LINE)
4
VARIABLE NO. 1 IS DEPENDENT
ELIMINATE 4
STANDARD ERROR OF Y = 3.349959
STEP NO. 1
VARIABLE ENTERING 5
F LEVEL 7.9830 F-PROB = 0.03687
STANDARD ERROR OF ESTIMATE = 2.2773
COEFFICIENT OF DETERMINATION = 0.61488
COEFFICIENT OF MULTIPLE REGRESSION = 0.78414
INCREASE IN COEFFICIENT OF DETERMINATION = 0.61488
DEGREES OF FREEDOM = 5
CONSTANT STD. ERR. T T-PROB
1.930277 0.9937278 1.942460 0.10972
VARIABLE COEFFICIENT STD ERROR OF COEF
X= 5 0.00664 0.00235
STEP NO. 2
VARIABLE ENTERING 2
F LEVEL = 3.6842 F-PROB = 0.12736
STANDARD ERROR OF ESTIMATE = 1.8370
COEFFICIENT OF DETERMINATION = 0.79953
COEFFICIENT OF MULTIPLE REGRESSION = 0.89416
INCREASE IN COEFFICIENT OF DETERMINATION = 0.18465
DEGREES OF FREEDOM = 4
CONSTANT STD. ERR. T T-PROB
4.117491 1.393215 2.955387 0.04175
VARIABLE COEFF. STD ERR OF COEFF. STANDARDIZED COEFF. T-VALUE T-PROB
X( 2) -1.17425 0.611774 -0.768411 -1.919 0.12736
X( 5) 0.01204 0.003392 1.421175 3.550 0.02380
ANALYSIS OF VARIANCE
SOURCE SUM OF SQ. DF MEAN SQ. F F-PROB
REGRESSION 53.834723 2 26.91736 7.976 0.04019
ERROR 13.498612 4 3.37465
TOTAL 67.333334 6
DO YOU WISH TO REANALYZE THE SAME DATA?
NO
INPUT? (TYPE HELP IF NEEDED)--FINISH
END OF EXECUTION
CPU TIME: 0.80 ELAPSED TIME: 9.73
EXIT
7.0 BATCH OPERATION
THE FOLLOWING IS A BATCH JOB SET UP: (EACH LINE REPRESENTS ONE CARD, EACH
CARD STARTING IN COLUMN 1; DO NOT INCLUDE THE COMMENTS AT THE RIGHT.)
--------------------------------------------------------------------------------
COMMENTS
$JOB [#,#] JOB CARD; INSERT USER'S PROJECT-
PROGRAMMER NUMBER WITHIN THE BRACKET
$PASSWORD ###### IN PLACE OF THE 6#'S, PUT IN THE
PASSWORD
$DATA SIGNIFY BEGINNING OF DATA DECK
(DATA CARDS) INSERT THE DATA CARD DECK TO BE
ANALYZED
$EOD SIGNIFY THE END OF DATA CARD DECK
.R STEPR START THE EXECUTION
(RESPONSES TO LINES 1-38 IN
SECTION 5.0 REPEATED OR NOT) USER'S RESPONSE
(EOF) END-OF-FILE CARD
--------------------------------------------------------------------------------
REFERENCE: "MATHEMATICAL METHODS FOR DIGITAL COMPUTERS", A. RALSTON AND
H.S. WILF, 1960, JOHN WILEY & SONS, INC., ARTICLE BY M.A. EFROYMSON