Trailing-Edge - PDP-10 Archives - decuslib20-02 - decus/20-0026/stprg.doc
SUBROUTINE STPRG

PURPOSE
   TO PERFORM A STEPWISE MULTIPLE REGRESSION ANALYSIS FOR A
   DEPENDENT VARIABLE AND A SET OF INDEPENDENT VARIABLES.  AT
   EACH STEP, THE VARIABLE ENTERED INTO THE REGRESSION EQUATION
   IS THE ONE THAT EXPLAINS THE GREATEST AMOUNT OF THE REMAINING
   VARIANCE IN THE DEPENDENT VARIABLE (I.E. THE VARIABLE
   WITH THE HIGHEST PARTIAL CORRELATION WITH THE DEPENDENT
   VARIABLE).  ANY VARIABLE CAN BE DESIGNATED AS THE DEPENDENT
   VARIABLE.  ANY INDEPENDENT VARIABLE CAN BE FORCED INTO OR
   DELETED FROM THE REGRESSION EQUATION, IRRESPECTIVE OF ITS
   CONTRIBUTION TO THE EQUATION.

USAGE
   CALL STPRG (M,N,D,XBAR,IDX,PCT,NSTEP,ANS,L,B,S,T,LL,IER)

DESCRIPTION OF PARAMETERS
   M	- TOTAL NUMBER OF VARIABLES IN DATA MATRIX
   N	- NUMBER OF OBSERVATIONS
   D	- INPUT MATRIX (M X M) OF SUMS OF CROSS-PRODUCTS OF
	  DEVIATIONS FROM MEAN.  THIS MATRIX WILL BE DESTROYED.
   XBAR - INPUT VECTOR OF LENGTH M OF MEANS
   IDX	- INPUT VECTOR OF LENGTH M HAVING ONE OF THE FOLLOWING
	  CODES FOR EACH VARIABLE.
	    0 - INDEPENDENT VARIABLE AVAILABLE FOR SELECTION
	    1 - INDEPENDENT VARIABLE TO BE FORCED INTO THE
		REGRESSION EQUATION
	    2 - VARIABLE NOT TO BE CONSIDERED IN THE EQUATION
	    3 - DEPENDENT VARIABLE
	  THIS VECTOR WILL BE DESTROYED
   PCT	- A CONSTANT VALUE INDICATING THE MINIMUM PROPORTION
	  OF THE TOTAL VARIANCE THAT AN INDEPENDENT VARIABLE MUST
	  EXPLAIN.  THOSE INDEPENDENT VARIABLES WHICH FALL
	  BELOW THIS PROPORTION WILL NOT ENTER THE REGRESSION
	  EQUATION.  TO ENSURE THAT ALL VARIABLES ENTER THE
	  EQUATION, SET PCT = 0.0.
   NSTEP- OUTPUT VECTOR OF LENGTH 5 CONTAINING THE FOLLOWING
	  INFORMATION
	     NSTEP(1)- THE NUMBER OF THE DEPENDENT VARIABLE
	     NSTEP(2)- NUMBER OF VARIABLES FORCED INTO THE
		       REGRESSION EQUATION
	     NSTEP(3)- NUMBER OF VARIABLES DELETED FROM THE
		       EQUATION
	     NSTEP(4)- THE NUMBER OF THE LAST STEP
	     NSTEP(5)- THE NUMBER OF THE LAST VARIABLE ENTERED
   ANS	- OUTPUT VECTOR OF LENGTH 11 CONTAINING THE FOLLOWING
	  INFORMATION FOR THE LAST STEP
	     ANS(1)- SUM OF SQUARES REDUCED BY THIS STEP
	     ANS(2)- PROPORTION OF TOTAL SUM OF SQUARES REDUCED
	     ANS(3)- CUMULATIVE SUM OF SQUARES REDUCED UP TO
		     THIS STEP
	     ANS(4)- CUMULATIVE PROPORTION OF TOTAL SUM OF
		     SQUARES REDUCED
	     ANS(5)- SUM OF SQUARES OF THE DEPENDENT VARIABLE
	     ANS(6)- MULTIPLE CORRELATION COEFFICIENT
	     ANS(7)- F RATIO FOR SUM OF SQUARES DUE TO
		     REGRESSION
	     ANS(8)- STANDARD ERROR OF THE ESTIMATE (SQUARE ROOT
		     OF THE RESIDUAL MEAN SQUARE)
	     ANS(9)- INTERCEPT CONSTANT
	     ANS(10)-MULTIPLE CORRELATION COEFFICIENT ADJUSTED
		     FOR DEGREES OF FREEDOM.
	     ANS(11)-STANDARD ERROR OF THE ESTIMATE ADJUSTED
		     FOR DEGREES OF FREEDOM.
   L	- OUTPUT VECTOR OF LENGTH K, WHERE K IS THE NUMBER OF
	  INDEPENDENT VARIABLES IN THE REGRESSION EQUATION.
	  THIS VECTOR CONTAINS THE NUMBERS OF THE INDEPENDENT
	  VARIABLES IN THE EQUATION.
   B	- OUTPUT VECTOR OF LENGTH K, CONTAINING THE PARTIAL
	  REGRESSION COEFFICIENTS CORRESPONDING TO THE
	  VARIABLES IN VECTOR L.
   S	- OUTPUT VECTOR OF LENGTH K, CONTAINING THE STANDARD
	  ERRORS OF THE PARTIAL REGRESSION COEFFICIENTS,
	  CORRESPONDING TO THE VARIABLES IN VECTOR L.
   T	- OUTPUT VECTOR OF LENGTH K, CONTAINING THE COMPUTED
	  T-VALUES CORRESPONDING TO THE VARIABLES IN VECTOR L.
   LL	- WORKING VECTOR OF LENGTH M
   IER	- 0, IF THERE IS NO ERROR.
	  1, IF RESIDUAL SUM OF SQUARES IS NEGATIVE OR IF THE
	  PIVOTAL ELEMENT IN THE STEPWISE INVERSION PROCESS IS
	  ZERO.  IN THIS CASE, THE VARIABLE WHICH CAUSES THIS
	  ERROR IS NOT ENTERED IN THE REGRESSION, THE RESULT
	  PRIOR TO THIS STEP IS RETAINED, AND THE CURRENT
	  SELECTION IS TERMINATED.
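
   AS A MINIMAL SKETCH OF HOW THE INPUT ARGUMENTS MIGHT BE
   PREPARED, THE PROGRAM BELOW COMPUTES XBAR AND D FROM A SMALL
   RAW DATA ARRAY AND SETS THE IDX CODES.  THE PROGRAM NAME
   STPDRV, THE ARRAY X AND THE DATA VALUES ARE HYPOTHETICAL, AND
   STORING D AS A FULL M X M ARRAY COLUMN BY COLUMN IS AN
   ASSUMPTION.  THE CALL TO STPRG APPEARS ONLY AS A COMMENT SO
   THAT THE SKETCH COMPILES WITHOUT THE LIBRARY.

```fortran
      PROGRAM STPDRV
      ! HYPOTHETICAL DRIVER: PREPARES XBAR, D AND IDX FOR STPRG.
      ! THE DATA VALUES BELOW ARE ILLUSTRATIVE ONLY.
      INTEGER M, N
      PARAMETER (M=3, N=5)
      INTEGER IDX(M), I, J, K
      REAL X(N,M), XBAR(M), D(M,M)
      ! RAW DATA, ONE COLUMN OF X PER VARIABLE
      DATA (X(I,1), I=1,N) /1.0, 2.0, 3.0, 4.0, 5.0/
      DATA (X(I,2), I=1,N) /2.0, 1.0, 4.0, 3.0, 5.0/
      DATA (X(I,3), I=1,N) /3.0, 3.0, 7.0, 7.0, 10.0/
      ! MEAN OF EACH VARIABLE
      DO J = 1, M
         XBAR(J) = 0.0
         DO I = 1, N
            XBAR(J) = XBAR(J) + X(I,J)
         END DO
         XBAR(J) = XBAR(J) / REAL(N)
      END DO
      ! SUMS OF CROSS-PRODUCTS OF DEVIATIONS FROM THE MEANS
      ! (ASSUMED LAYOUT: FULL SYMMETRIC M X M ARRAY)
      DO K = 1, M
         DO J = 1, M
            D(J,K) = 0.0
            DO I = 1, N
               D(J,K) = D(J,K) + (X(I,J)-XBAR(J))*(X(I,K)-XBAR(K))
            END DO
         END DO
      END DO
      ! CODE 0 = AVAILABLE FOR SELECTION, CODE 3 = DEPENDENT
      IDX(1) = 0
      IDX(2) = 0
      IDX(3) = 3
      WRITE (*,'(1X,A,3F8.3)') 'XBAR =', (XBAR(J), J=1,M)
      DO J = 1, M
         WRITE (*,'(1X,A,3F8.3)') 'D    =', (D(J,K), K=1,M)
      END DO
      WRITE (*,'(1X,A,3I3)') 'IDX  =', (IDX(J), J=1,M)
      ! WITH STPRG AND A USER-WRITTEN STOUT LINKED IN, THE CALL IS
      !   CALL STPRG (M,N,D,XBAR,IDX,PCT,NSTEP,ANS,L,B,S,T,LL,IER)
      ! D AND IDX ARE OVERWRITTEN, SO KEEP COPIES IF NEEDED LATER.
      END
```

   BECAUSE D AND IDX ARE DESTROYED BY STPRG, A CALLER WHO WANTS
   TO RUN A SECOND ANALYSIS ON THE SAME DATA WOULD NEED TO
   REBUILD OR COPY BOTH ARRAYS FIRST.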

REMARKS
   THE NUMBER OF DATA POINTS MUST BE GREATER THAN THE NUMBER
   OF INDEPENDENT VARIABLES PLUS ONE.  FORCED VARIABLES
   ARE ENTERED INTO THE REGRESSION EQUATION BEFORE ALL OTHER
   INDEPENDENT VARIABLES.  WITHIN THE SET OF FORCED VARIABLES,
   THE ONE TO BE CHOSEN FIRST WILL BE THAT ONE WHICH EXPLAINS
   THE GREATEST AMOUNT OF VARIANCE.
   INSTEAD OF USING, AS A STOPPING CRITERION, A PROPORTION OF
   THE TOTAL VARIANCE, SOME OTHER CRITERION MAY BE ADDED TO
   SUBROUTINE STOUT.
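
   THE REQUIREMENT ON THE NUMBER OF DATA POINTS IN THE FIRST
   REMARK CAN BE SEEN FROM THE RESIDUAL DEGREES OF FREEDOM,
   N - K - 1, WHICH MUST BE POSITIVE.  THE SKETCH BELOW USES THE
   TEXTBOOK RELATIONS AMONG THE TOTAL AND REGRESSION SUMS OF
   SQUARES, THE MULTIPLE CORRELATION, THE F RATIO AND THE
   STANDARD ERROR OF THE ESTIMATE.  WHETHER STPRG COMPUTES
   ANS(6) THROUGH ANS(8) IN EXACTLY THIS WAY IS AN ASSUMPTION,
   AND THE PROGRAM NAME ANSREL AND ALL VALUES ARE HYPOTHETICAL.

```fortran
      PROGRAM ANSREL
      ! TEXTBOOK RELATIONS ONLY.  WHETHER STPRG USES EXACTLY
      ! THESE FORMULAS IS AN ASSUMPTION.
      INTEGER N, K, NDF
      REAL SST, SSR, SSE, RSQ, R, F, SE
      ! ILLUSTRATIVE VALUES, NOT OUTPUT FROM STPRG
      N = 7
      K = 2
      SST = 20.0
      SSR = 15.0
      ! RESIDUAL DEGREES OF FREEDOM MUST BE POSITIVE
      NDF = N - K - 1
      IF (NDF .LE. 0) THEN
         WRITE (*,'(1X,A)') 'TOO FEW OBSERVATIONS: NEED N > K + 1'
         STOP
      END IF
      ! RESIDUAL SUM OF SQUARES AND PROPORTION EXPLAINED
      SSE = SST - SSR
      RSQ = SSR / SST
      ! MULTIPLE CORRELATION, F RATIO, STANDARD ERROR OF ESTIMATE
      R = SQRT(RSQ)
      F = (SSR / REAL(K)) / (SSE / REAL(NDF))
      SE = SQRT(SSE / REAL(NDF))
      WRITE (*,'(1X,A,F8.4)') 'MULTIPLE R     =', R
      WRITE (*,'(1X,A,F8.4)') 'F RATIO        =', F
      WRITE (*,'(1X,A,F8.4)') 'STD ERR OF EST =', SE
      END
```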

SUBROUTINES AND FUNCTION SUBPROGRAMS REQUIRED
   STOUT(NSTEP,ANS,L,B,S,T,NSTOP)
   THIS SUBROUTINE MUST BE PROVIDED BY THE USER.  IT IS AN
   OUTPUT ROUTINE WHICH WILL PRINT THE RESULTS OF EACH STEP OF
   THE REGRESSION ANALYSIS.  NSTOP IS AN OPTION CODE WHICH IS
   ONE IF THE STEPWISE REGRESSION IS TO BE TERMINATED, AND IS
   ZERO IF IT IS TO CONTINUE.  THE USER MUST TAKE THIS INTO
   ACCOUNT IF A STOPPING CRITERION OTHER THAN VARIANCE
   PROPORTION IS TO BE USED.
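
   A USER-WRITTEN STOUT MIGHT LOOK LIKE THE SKETCH BELOW.  IT
   ASSUMES THAT ONE VARIABLE ENTERS PER STEP, SO THAT THE LENGTH
   K OF L, B, S AND T EQUALS NSTEP(4).  THE 0.95 THRESHOLD ON
   ANS(4) IS A HYPOTHETICAL EXTRA STOPPING RULE, AND THE DRIVER
   DEMOST WITH ITS VALUES IS MADE UP SO THAT THE ROUTINE CAN BE
   RUN WITHOUT STPRG.

```fortran
      PROGRAM DEMOST
      ! HYPOTHETICAL DRIVER WITH MADE-UP VALUES, NOT STPRG OUTPUT
      INTEGER NSTEP(5), L(2), NSTOP, I
      REAL ANS(11), B(2), S(2), T(2)
      DATA NSTEP /3, 0, 0, 2, 1/
      DATA L /2, 1/
      DATA B /1.5, 0.5/
      DATA S /0.3, 0.2/
      DATA T /5.0, 2.5/
      DO I = 1, 11
         ANS(I) = 0.0
      END DO
      ANS(4) = 0.97
      ANS(9) = 0.25
      CALL STOUT (NSTEP, ANS, L, B, S, T, NSTOP)
      WRITE (*,'(1X,A,I2)') 'NSTOP =', NSTOP
      END

      SUBROUTINE STOUT (NSTEP, ANS, L, B, S, T, NSTOP)
      ! SAMPLE OUTPUT ROUTINE: PRINTS ONE STEP AND SETS NSTOP
      INTEGER NSTEP(5), L(*), NSTOP, K, J
      REAL ANS(11), B(*), S(*), T(*)
      ! ASSUMPTION: ONE VARIABLE ENTERS PER STEP, SO K = NSTEP(4)
      K = NSTEP(4)
      WRITE (*,'(1X,A,I3)') 'STEP             =', NSTEP(4)
      WRITE (*,'(1X,A,I3)') 'VARIABLE ENTERED =', NSTEP(5)
      WRITE (*,'(1X,A,F8.4)') 'CUM. PROPORTION  =', ANS(4)
      ! VARIABLE NUMBER, COEFFICIENT, STANDARD ERROR, T-VALUE
      DO J = 1, K
         WRITE (*,'(1X,I4,3F12.4)') L(J), B(J), S(J), T(J)
      END DO
      WRITE (*,'(1X,A,F12.4)') 'INTERCEPT =', ANS(9)
      ! HYPOTHETICAL EXTRA RULE: STOP ONCE 95 PERCENT IS EXPLAINED
      NSTOP = 0
      IF (ANS(4) .GE. 0.95) NSTOP = 1
      RETURN
      END
```

   WITH THE MADE-UP VALUE ANS(4) = 0.97 THE ROUTINE SETS NSTOP
   TO ONE.  REPLACING THE THRESHOLD TEST WITH NSTOP = 0 LEAVES
   TERMINATION TO THE PCT CRITERION ALONE.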

METHOD
   THE ABBREVIATED DOOLITTLE METHOD IS USED TO (1) DECIDE WHICH
   VARIABLES ENTER THE REGRESSION AND (2) COMPUTE REGRESSION
   COEFFICIENTS.  REFER TO C. A. BENNETT AND N. L. FRANKLIN,
   'STATISTICAL ANALYSIS IN CHEMISTRY AND THE CHEMICAL
   INDUSTRY', JOHN WILEY AND SONS, 1954, APPENDIX 6A.