.LM0;.RM75;.PS41,75;.TS71;.LC;.AP;.NNM;.FL CAP;#
.BR;^FIRST PAGE
.PAGE;#
.SK10;.C;^^A USER PROGRAM FOR MULTIPLE LINEAR REGRESSION ANALYSIS\\
.BR;.C;======================================================
.SK7;.C;^MARTEN VAN ^GELDEREN
.SK2;.C;^VERSION 5^H(246)
.SK2;.C;16-^FEB-80
.PAGE;#
.SK23;.C;^COPYRIGHT (^C) 1975, 1979, 1980 BY
.BR;.C;^FOUNDATION ^MATHEMATICAL ^CENTRE, ^AMSTERDAM
.BR;.C;^INSTITUTE FOR ^NUCLEAR ^PHYSICS ^RESEARCH, ^AMSTERDAM
.SK;.LM+4;.RM-4;^GENERAL PERMISSION TO MAKE FAIR USE IN TEACHING OR RESEARCH
OF ALL OR PART OF THIS MATERIAL IS GRANTED TO INDIVIDUAL READERS AND TO
NONPROFIT ORGANIZATIONS, PROVIDED THAT THE COPYRIGHT NOTICE OF EITHER THE
^FOUNDATION ^MATHEMATICAL ^CENTRE OR THE ^INSTITUTE FOR ^NUCLEAR ^PHYSICS
^RESEARCH IS GIVEN AND THAT REFERENCE IS MADE TO THIS PUBLICATION AND TO THE
FACT THAT REPRINTING PRIVILEGES WERE GRANTED BY PERMISSION OF ONE OF THE
ABOVE MENTIONED ORGANIZATIONS.
.LM-4;.RM+4;.PAGE;<CONTENTS ^PAGE
.SK2;^ABSTRACT ##1
.SK;^INTRODUCTION ##2
.LM+12;.SK;.I-12;^CHAPTER 1 - ^MULTIPLE LINEAR REGRESSION ANALYSIS ##4
.BR;1.###^THE REGRESSION MODEL ##4
.BR;2.###^LEAST SQUARES ##5
.BR;2.1##^WEIGHTED LEAST SQUARES ##8
.BR;2.2##^RESIDUAL ANALYSIS ##9
.BR;3.###^TESTS OF HYPOTHESES #10
.BR;3.1##^A PARTICULAR REGRESSION COEFFICIENT #10
.BR;3.2##^ANALYSIS OF VARIANCE #10
.SK;.I-12;^CHAPTER 2 - ^THE INPUT TO THE PROGRAM #14
.BR;1.###^THE MODEL SPECIFICATION #14
.BR;1.1##^TRANSFORMATIONS #15
.BR;2.###^THE INPUT SPECIFICATION #17
.BR;3.###^THE OPTION SPECIFICATION #19
.BR;4.###^THE DATA SPECIFICATION #21
.BR;5.###^THE USER PROGRAM #22
.BR;6.###^EXAMPLES #23
.SK;.I-12;^CHAPTER 3 - ^THE OUTPUT FROM THE PROGRAM #25
.BR;1.###^STANDARD AND OPTIONAL PRINTED OUTPUT #25
.BR;2.###^STANDARD AND OPTIONAL DATA OUTPUT #27
.BR;3.###^ERROR MESSAGES #28
.BR;4.###^EXAMPLES #30
.SK;.I-12;^APPENDICES##1.###^TECHNICAL REMARKS #31
.BR;2.###^DEFINITION OF THE SYNTAX OF A USER PROGRAM #34
.BR;3.###^TECHNICAL DESCRIPTION OF THE PROGRAM #37
.SK;.I-12;^REFERENCES #41
.NM1;.LM-12;.PAGE;<ABSTRACT
^PERFORMING MULTIPLE LINEAR REGRESSION ANALYSIS ON AN ELECTRONIC COMPUTER
CAN BE VERY LABORIOUS WHEN PREPARATION OF THE INPUT IS COMPLICATED.
^THE PROGRAM INTRODUCED IN THIS DOCUMENT IS ESPECIALLY DESIGNED TO HAVE
FLEXIBLE AND COMPREHENSIBLE MODEL AND INPUT SPECIFICATIONS. ^IT ACCEPTS
A "^MODEL" FORMULA WHICH RESEMBLES THE
NOTATION OF REGRESSION MODELS IN COMMON STATISTICAL LITERATURE QUITE CLOSELY. ^AN ACCOMPANYING
"^INPUT" FORMULA PROVIDES THE PROGRAM WITH INFORMATION ABOUT THE
ARRANGEMENT OF THE OBSERVATIONS IN THE INPUT "^DATA", WHICH
CONSISTS OF A SERIES OF NUMBERS IN FREEFIELD FORMAT. ^A "^RUN"
COMMAND ACTIVATES THE PROGRAM, WHILE AN "^EXIT" COMMAND CAUSES THE PROGRAM
TO STOP. ^EXTENSIVE RUNTIME "^HELP" INFORMATION IS AVAILABLE.
^THE FOLLOWING PIECE OF PROGRAM MAY SERVE AS AN EXAMPLE OF
SOME OF THESE IDEAS:
.LM+9;.NO FILL;.NO JUSTIFY;
.SK;"^MODEL" Y = ALFA0 + ALFA1 * X + ALFA2 * X _^ 2;
.BR;"^INPUT" 5 * ([X], N, N * [Y]);
.BR;"^OPTIONS" ^TRANSFORMED DATA MATRIX, ^PROCESS SUBMODELS (1);
.BR;"^DATA" 1 4 1.1 0.7 1.8 0.4
.BR; 3 5 3.0 1.4 4.9 4.4 4.5
.BR; 5 3 7.3 8.2 6.2
.BR; 10 4 12.0 13.1 12.6 13.2
.BR; 15 4 18.7 19.7 17.4 17.1
.BR;"^RUN"
.BR;"^EXIT"
.LM-9;.FILL;.JUSTIFY;
.PAGE;<INTRODUCTION
.FILL;.JUSTIFY
^MANY COMPUTER PROGRAMS OR SUBROUTINES EXIST, WHICH IN ONE WAY OR
ANOTHER CALCULATE ESTIMATES FOR THE PARAMETERS OF A MULTIPLE LINEAR
REGRESSION MODEL, BUT MOST OF THEM CAN HARDLY BE USED BY
THE LAYMAN. ^SUBROUTINES HAVE TO BE EMBEDDED IN A HIGHER LEVEL LANGUAGE
PROGRAM TO PERFORM THE NECESSARY INPUT AND OUTPUT; AS A LAYMAN
HOWEVER, ONE IS NOT ALWAYS ACQUAINTED WITH THESE LANGUAGES.
^THE PROGRAMS USUALLY REQUIRE THE INPUT DATA TO BE PRESENTED IN A SPECIFIC
STANDARD FORM, WHICH IN PRACTICE MEANS THAT INPUT DATA THAT ARE ALREADY
AVAILABLE IN A MACHINE READABLE FORM MUST BE TRANSFORMED INTO THAT
STANDARD FORM OR THAT THE INPUT DATA SHOULD BE PUNCHED EXACTLY IN THAT
STANDARD FORM. ^MORE DIFFICULT STILL AND OFTEN MORE CONFUSING, IS THE
WAY IN WHICH THE PROGRAM IS TOLD WHICH REGRESSION MODEL THE USER WANTS TO CONSIDER.
^FOR INSTANCE, FOR THE PROGRAM FOR MULTIPLE POLYNOMIAL REGRESSION ANALYSIS OF
THE ^MATHEMATICAL ^CENTRE A CODE MATRIX OF ZEROS AND ONES HAD TO BE GIVEN
IN ORDER TO INDICATE THE FORM OF THE REGRESSION POLYNOMIAL. ^TRANSFORMATIONS OF ONE OR
MORE VARIABLES ARE HARDLY EVER AUTOMATICALLY POSSIBLE AND REQUIRE A SEPARATE
PROGRAM TO BE RUN IN ADVANCE TO PREPARE THE TRANSFORMED DATA MATRIX.
^IN THIS DOCUMENT A PROGRAM IS DESCRIBED THAT WILL
MAKE IT POSSIBLE FOR ALMOST EVERYONE TO OBTAIN HIS DESIRED RESULTS
WITH NO MORE THAN A SUPERFICIAL KNOWLEDGE OF THE UNDERLYING
PROGRAMMING SYSTEM.
^IT HAS BEEN RECOGNIZED RECENTLY THAT STANDARD PROGRAMS SHOULD
NOT BOTHER THE USER TOO MUCH WITH AWKWARD INPUT SPECIFICATIONS.
^A STATISTICIAN IS MORE INTERESTED IN THE RESULTS OF A PROGRAM AND
IN HOW HE CAN OBTAIN THEM WITHOUT INTERFERENCE OF SOFTWARE SPECIALISTS OF THE
COMPUTER CENTRE, THAN IN A FEW SECONDS GAIN IN ACTUAL COMPUTING TIME.
^THEREFORE ONE OF THE OBJECTIVES WAS TO DEVELOP A PROGRAM WHICH
WOULD ENABLE THE USER TO SPECIFY THE FORM OF THE REGRESSION MODEL AND
THE ARRANGEMENT OF THE NUMBERS IN HIS INPUT DATA, IN A STRAIGHTFORWARD MANNER,
EVEN IF HE HAS HARDLY ANY KNOWLEDGE OF PROGRAMMING LANGUAGES AT ALL.
^THE PROGRAM THUS ACCEPTS A MODEL FORMULA WHICH
RESEMBLES THE NOTATION OF REGRESSION MODELS IN COMMON STATISTICAL LITERATURE
QUITE CLOSELY; THE ACCOMPANYING INPUT FORMULA INDICATES WHICH NUMBERS OR
SERIES OF NUMBERS IN THE INPUT DATA BELONG TO WHICH VARIABLE IN THE MODEL FORMULA.
^THIS INPUT SCHEME GIVES THE USER THE OPPORTUNITY TO WORK WITH EXISTING DATA AND
TO PROCESS POSSIBLE TRANSFORMATIONS OF THE VARIABLES IN THE MODEL FORMULA WITHOUT
THE NEED FOR EXTERNAL DATA ADJUSTMENT. ^HOWEVER, THE
MODEL FORMULA MUST BE GIVEN EXPLICITLY AND COMPLETELY AND THE
STRUCTURE OF THE INPUT DATA MUST BE KNOWN EXACTLY BEFORE AN INPUT FORMULA CAN BE
CONSTRUCTED.
^IN THE OUTPUT THE RESULTS ARE IDENTIFIED WITH THE NAMES GIVEN BY THE
USER IN THE MODEL FORMULA. ^ANY TECHNICAL (MACHINE DEPENDENT)
ACTION REQUIRED TO RUN THE PROGRAM IS DESCRIBED IN APPENDIX 1.
.SK2;<ACKNOWLEDGEMENTS
^THE BASIC IDEAS FOR THE INPUT ARRANGEMENT PRESENTED IN THIS DOCUMENT
ORIGINATED FROM <A.P.B.M.#<VEHMEYER.
^THE ARTICLE '<ALGOL 60 TRANSLATION FOR EVERYBODY' BY <F.E.J.#<KRUSEMAN#<ARETZ
[6] PROVIDED THE FOUNDATION FOR THE TRANSLATOR AND THE EXECUTION SECTION OF THE PROGRAM,
WHILE THE TECHNIQUES USED IN THE STORAGE SECTION WERE DESIGNED BY <L.J.#<OOSTRIJK.
^THE MATRIX AND LEAST SQUARES ROUTINES USED IN THE PROGRAM WERE COPIED FROM
'<ALGOL 60 PROCEDURES IN NUMERICAL ALGEBRA, PART 1', BY <T.J.#<DEKKER [2].
^THE CLASSIC BOOK '^APPLIED ^REGRESSION ^ANALYSIS' BY <N.R.#<DRAPER _& ^H.#<SMITH [3] SERVED AS A FRAME OF REFERENCE THROUGHOUT THE DESIGN AND IMPLEMENTATION OF THE PROGRAM.
^NUMEROUS RECENTLY INCORPORATED STATISTICAL IMPROVEMENTS WERE SUGGESTED BY <R.D.#<GILL.
.PAGE;.C;<CHAPTER 1
.SK;.C;^^MULTIPLE LINEAR REGRESSION ANALYSIS\\
.SK2;1.1##^^THE REGRESSION MODEL\\
^IN A REGRESSION PROBLEM THE RESEARCHER POSTULATES A CERTAIN RELATIONSHIP
BETWEEN A RANDOM VARIABLE Y (THE REALIZATIONS OF WHICH ARE SUBJECT
TO SOME FORM OF DISTURBANCE) ON THE ONE SIDE AND A NUMBER OF VARIABLES
X1,...,XP
(WHICH ARE WITHOUT OR AT LEAST ALMOST WITHOUT DISTURBANCES) ON THE
OTHER SIDE. ^THIS RELATIONSHIP IS EXPRESSED BY A MATHEMATICAL FORMULA,
WHICH IS CALLED THE (LINEAR) REGRESSION MODEL, FOR INSTANCE:
.TS72;.SK;.I18;Y = A0 + A1 * X1 +#...#+ AP * XP + E (1)
.SK;IN WHICH A0,...,AP REPRESENT UNKNOWN REGRESSION COEFFICIENTS (PARAMETERS)
WHICH ARE TO BE ESTIMATED AND E REPRESENTS THE DISTURBANCE.
^IF A CONSTANT TERM IS PRESENT IN THE MODEL FORMULA (IN (1) THE A0), THE MODEL IS
SAID TO BE AN 'INTERCEPT#MODEL', IF NO CONSTANT TERM IS PRESENT, THE MODEL IS
CALLED A 'NO-INTERCEPT#MODEL'.
.BR;^THE VARIABLES X1,...,XP AND THE VARIABLE Y CAN ALSO REPRESENT
(OTHER) TRANSFORMED VARIABLES. ^THE RESEARCHER MIGHT HAVE REASONS
TO BELIEVE (FROM BACKGROUND INFORMATION CONCERNING THE EXPERIMENT)
THAT TRANSFORMATIONS ARE NECESSARY, FOR INSTANCE:
.BR;1)#TO OBTAIN NORMALLY DISTRIBUTED DISTURBANCES,
.BR;2)#TO OBTAIN A GREATER HOMOGENEITY OF THE VARIANCES OF THE DISTURBANCES,
.BR;3)#TO LINEARIZE NON-LINEAR REGRESSION MODELS (IF POSSIBLE).
.BR;^THE TRANSFORMED REGRESSION MODEL CAN BE WRITTEN AS:
.SK;.I5;^G(Y) = A0 + A1 * ^F1(X1,...,XM) +#...#+ AP * ^FP(X1,...,XM) + E (2)
.SK;IN WHICH ^G, ^F1,...,^FP REPRESENT THE TRANSFORMATIONS,
.BR;.I12;A0,...,AP REPRESENT THE PARAMETERS TO BE ESTIMATED,
.BR;.I20;Y REPRESENTS THE DEPENDENT VARIABLE,
.BR;.I12;X1,...,XM REPRESENT THE INDEPENDENT VARIABLES,
.BR;.I20;E REPRESENTS THE DISTURBANCE.
^THE CHOICE OF A TRANSFORMATION BY MEANS OF 'TRIAL AND ERROR' IS RATHER
TIME CONSUMING AND COSTLY. ^MUCH OF THE DIFFICULTY STEMS FROM THE IMPORTANCE OF
THE LOCATION PARAMETER: IT IS NOT UNUSUAL THAT ^LOG#(X) YIELDS NO IMPROVEMENT,
BUT THAT ^LOG#(C+X) GIVES BETTER RESULTS FOR A PARTICULAR CHOICE OF C.
^BECAUSE THIS HOLDS FOR ALMOST ANY TRANSFORMATION OF SOME IMPORTANCE,
WE MUST ACTUALLY SOLVE IN EACH CASE A NONLINEAR ADJUSTMENT PROBLEM. ^OFTEN
THOUGH, A SIMPLE FORM OF THE TRANSFORMATION IS SUGGESTED BY THE RESEARCHER
WHO IS BETTER ACQUAINTED WITH THE PECULIARITIES OF THE EXPERIMENT.
.SK2;1.2##^^LEAST SQUARES\\
^REGRESSION ANALYSIS CONSISTS IN FACT OF THE ADJUSTMENT OF A HYPERPLANE
OF THE REQUIRED DIMENSION TO THE DATA. ^THE FITTING IS DONE WITH THE METHOD
OF LEAST SQUARES, WHICH MEANS THAT THE SUM OF THE SQUARES OF THE DIFFERENCES
BETWEEN THE OBSERVED VALUES FOR Y AND THE ESTIMATED VALUES FOR THE EXPECTATION
OF Y IS MINIMIZED. ^THIS SUM OF SQUARES IS ALSO CALLED THE RESIDUAL
SUM OF SQUARES.
^IN MATRIX NOTATION THE REGRESSION MODEL CAN BE WRITTEN AS
(CF.#<DRAPER#_&#<SMITH [3] PP.#58-62):
.SK;.I30;^Y = ^XA + E (3)
.SK;IN WHICH ^Y IS A (N*1) RANDOM VECTOR OF OBSERVATIONS,
.BR;.I9;^X IS A (N*P) MATRIX OF KNOWN (FIXED) VALUES,
.BR;.I9;A IS A (P*1) VECTOR OF (UNKNOWN) PARAMETERS,
.BR;.I5;AND E IS A (N*1) RANDOM VECTOR OF DISTURBANCES.
.SK;^IT IS SUPPOSED THAT ^E(E)#=#0 AND VAR(E)#=#^ISIGMA_^2, IN WHICH ^I
IS THE UNIT MATRIX, THUS:
.SK;.I31;^E(^Y) = ^XA (4)
^THE SUM OF SQUARES OF THE DIFFERENCES BETWEEN THE OBSERVED VALUES OF ^Y AND
THE ESTIMATED VALUES FOR THE EXPECTATION OF ^Y THUS EQUALS:
.SK;.I17;(^Y-^XA)'(^Y-^XA) = <Y'Y - 2A'<X'Y + A'^X'^XA (5)
.SK;(FOR A'<X'Y IS A SCALAR AND THEREFORE EQUAL TO ^Y'^XA).
^CHOOSING AS LEAST SQUARES ESTIMATOR B THAT VALUE OF A WHICH MINIMIZES (5),
INVOLVES DIFFERENTIATING WITH RESPECT TO THE ELEMENTS OF A AND EQUATING
THE RESULT TO ZERO:
.SK;.I19;-2<X'Y + 2^X'^XB = 0,##THUS:##<X'Y = ^X'^XB (6)
.SK;^THIS SYSTEM IS CALLED THE NORMAL EQUATIONS.
^IF THE RANK OF ^X EQUALS P, <X'X IS NONSINGULAR AND THE INVERSE OF <X'X
EXISTS. ^IN THAT CASE THE SOLUTION OF THE NORMAL EQUATIONS CAN BE WRITTEN AS:
.SK;.I29;B = INV(<X'X)X'Y (7)
.SK;^OBSERVE THAT P _<= N MUST HOLD, IN ORDER THAT THE RANK OF ^X CAN BE P AT ALL.
^THEREFORE AT LEAST AS MANY OBSERVATIONS MUST BE MADE, AS THERE ARE PARAMETERS IN THE MODEL.
^ALSO OBSERVE THAT ^E(B)#=#INV(<X'X)^X'^E(^Y)#=#A, THUS B IS AN UNBIASED
ESTIMATOR OF A.
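.SK;^THE SOLUTION OF THE NORMAL EQUATIONS (6) CAN BE ILLUSTRATED WITH A SHORT
^PYTHON SKETCH. ^IT IS NOT THE CODE OF THE PROGRAM ITSELF: THE FUNCTION NAMES
ARE HYPOTHETICAL, AND PLAIN ^GAUSSIAN ELIMINATION WITH PARTIAL PIVOTING IS
ASSUMED HERE PURELY FOR BREVITY. ^THE DATA ARE THOSE OF THE EXAMPLE IN THE
ABSTRACT, WITH THE COLUMNS OF ^X FORMED FOR THE QUADRATIC MODEL GIVEN THERE.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: the rank of X is less than p")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(x_rows, y):
    """Return b solving the normal equations X'X b = X'y."""
    p = len(x_rows[0])
    xtx = [[sum(row[i] * row[j] for row in x_rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x_rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Observations from the example in the abstract: each x with its repeated y values.
groups = {1: [1.1, 0.7, 1.8, 0.4], 3: [3.0, 1.4, 4.9, 4.4, 4.5], 5: [7.3, 8.2, 6.2],
          10: [12.0, 13.1, 12.6, 13.2], 15: [18.7, 19.7, 17.4, 17.1]}
xs = [x for x, values in groups.items() for _ in values]
ys = [y for values in groups.values() for y in values]
# Columns of X for the model y = alfa0 + alfa1 * x + alfa2 * x ^ 2.
design = [[1.0, x, x * x] for x in xs]
b = least_squares(design, ys)
print(b)
```

^THE ERROR RAISED IN THE SKETCH CORRESPONDS TO THE CASE IN WHICH THE RANK OF ^X
IS LESS THAN P, SO THAT THE INVERSE OF <X'X DOES NOT EXIST.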
.SK;^THE LEAST SQUARES ESTIMATOR HAS THE FOLLOWING PROPERTIES:
.LM+3;.I-3;1.#^IT IS AN ESTIMATOR WHICH MINIMIZES THE SUM OF SQUARES OF
DEVIATIONS, IRRESPECTIVE OF ANY DISTRIBUTION PROPERTIES OF THE DISTURBANCES.
^THE ASSUMPTION THAT THE DISTURBANCES ARE NORMALLY DISTRIBUTED IS, OF COURSE,
NECESSARY FOR TESTS WHICH DEPEND ON THIS ASSUMPTION, SUCH AS T- OR ^F- TESTS,
OR FOR OBTAINING CONFIDENCE INTERVALS BASED ON T- OR ^F- DISTRIBUTIONS.
.BR;.I-3;2.#^ACCORDING TO THE ^GAUSS-^MARKOV THEOREM, THE ELEMENTS OF B ARE
UNBIASED ESTIMATORS, WHICH HAVE MINIMUM VARIANCE (OF ANY LINEAR FUNCTION OF
THE ^Y'S WHICH PROVIDES UNBIASED ESTIMATORS), AGAIN IRRESPECTIVE OF THE
DISTRIBUTION PROPERTIES OF THE DISTURBANCES.
.BR;.I-3;3.#^IF THE DISTURBANCES ARE MUTUALLY INDEPENDENT AND NORMALLY
DISTRIBUTED (WITH ^E(E)#=#0 AND VAR(E)#=#^ISIGMA_^2), THEN B IS ALSO THE
MAXIMUM LIKELIHOOD ESTIMATOR.
.LM-3;.SK;^THE VARIANCE-COVARIANCE MATRIX OF B IS:
.SK;.I25;VAR(B) = INV(^X'^X)SIGMA_^2 (8)
.SK;^THE VARIANCES ARE THE DIAGONAL AND THE COVARIANCES THE
OFF-DIAGONAL ELEMENTS.
.SK;^AN UNBIASED ESTIMATOR FOR SIGMA_^2 IS GIVEN BY:
.SK;.I23;S_^2 = (<Y'Y - B'<X'Y) / (N-P) (9)
.SK;^THE SQUARE ROOT OF THIS ESTIMATOR IS FREQUENTLY CALLED 'STANDARD
ERROR OF ESTIMATE'. ^IN THE PRINTED OUTPUT OF THE PROGRAM IT IS
INDICATED MORE PROPERLY AS 'STANDARD DEVIATION OF THE ERROR TERM'.
^LET VIJ BE THE ELEMENT IN THE I-TH ROW AND J-TH COLUMN OF INV(<X'X),
THEN SDI = S * ^SQRT(VII) ESTIMATES THE STANDARD DEVIATION OF BI, AND
CIJ = VIJ / ^SQRT(VII * VJJ) GIVES THE CORRELATION COEFFICIENT BETWEEN
BI AND BJ FOR I = 1,...,P AND J = 1,...,P. ^THUS:
.TS71;.SK;.I27;VII = (SDI / S)_^2 (10)
.BR;AND
.BR;.I10;VIJ = CIJ * ^SQRT(VII * VJJ) = CIJ * (SDI * SDJ) / S_^2 (11)
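.SK;^THE RELATIONS BETWEEN INV(<X'X), THE STANDARD DEVIATIONS SDI AND THE
CORRELATION COEFFICIENTS CIJ CAN BE CHECKED WITH THE FOLLOWING ^PYTHON SKETCH.
^THE FUNCTION NAMES AND THE SMALL MATRIX ARE ASSUMPTIONS MADE FOR ILLUSTRATION
ONLY. ^SINCE ^SQRT(VII * VJJ) = (SDI / S) * (SDJ / S), THE PRODUCT SDI * SDJ
IS DIVIDED BY THE SQUARE OF S WHEN VIJ IS RECOVERED.

```python
import math


def coefficient_summary(v, s):
    """Given v = inv(X'X) and the standard deviation s of the error term,
    return (sd, corr): estimated standard deviations and correlations of b."""
    p = len(v)
    sd = [s * math.sqrt(v[i][i]) for i in range(p)]
    corr = [[v[i][j] / math.sqrt(v[i][i] * v[j][j]) for j in range(p)]
            for i in range(p)]
    return sd, corr


def recover_v(sd, corr, s):
    """Rebuild inv(X'X) from the standard deviations and correlations."""
    p = len(sd)
    return [[corr[i][j] * sd[i] * sd[j] / s ** 2 for j in range(p)]
            for i in range(p)]


v = [[0.5, -0.1], [-0.1, 0.04]]   # made-up inverse of X'X, for illustration only
s = 2.0                           # made-up standard deviation of the error term
sd, corr = coefficient_summary(v, s)
print(sd, corr[0][1])
print(recover_v(sd, corr, s))     # reproduces v up to rounding
```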
.SK; ^A FREQUENTLY USED STATISTICAL MEASURE FOR EVALUATING REGRESSION MODELS
IS THE MULTIPLE CORRELATION COEFFICIENT ^R WHICH IS DEFINED IN THE INTERCEPT MODEL AS THE SQUARE
ROOT OF THE PROPORTION OF THE CORRECTED TOTAL SUM OF SQUARES ACCOUNTED FOR BY THE
MODEL. ^IF THE CORRECTION FOR MEANS IS DENOTED BY NU_^2, WITH U = ^SUM(I,1,N,YI)/N,
THEN ^R CAN BE DEFINED BY:
.SK;.I7;^R_^2 = (B'^X'^Y-NU_^2)/(^Y'^Y-NU_^2) = 1 - (^Y'^Y-B'^X'^Y)/(^Y'^Y-NU_^2) (12)
.SK;^HOWEVER, WE MUST DIVIDE ^Y'^Y-B'^X'^Y BY N-P, NOT BY N, TO OBTAIN AN
UNBIASED ESTIMATOR OF SIGMA_^2; MOREOVER, IT IS CUSTOMARY TO DIVIDE ^Y'^Y-NU_^2
BY N-1, NOT BY N. ^IF WE ADOPT BOTH MODIFICATIONS WE OBTAIN THE ADJUSTED
MULTIPLE CORRELATION COEFFICIENT, WHICH CAN THUS BE DEFINED BY:
.SK;.I11;ADJ(^R)_^2 = 1 - (N-1)/(N-P) * (^Y'^Y-B'^X'^Y)/(^Y'^Y-NU_^2) (13)
^IN THE NO-INTERCEPT MODEL THE CORRECTION FOR MEANS IS IGNORED, GIVING
AS DEFINITION OF ^R_^2: B'^X'^Y/^Y'^Y#=#1#-#(^Y'^Y-B'^X'^Y)/^Y'^Y,#
WHILE THE ADJ(^R)_^2 IS DEFINED CORRESPONDINGLY AS: 1#-#N/(N-P)#*#(^Y'^Y-B'^X'^Y)/^Y'^Y.
^R_^2 ITSELF IS OFTEN CALLED THE 'PROPORTION OF VARIATION EXPLAINED'
(CF.#<THEIL [11] PP.#178-179).
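.SK;^DEFINITIONS (12) AND (13), TOGETHER WITH THEIR NO-INTERCEPT COUNTERPARTS,
CAN BE SKETCHED IN ^PYTHON AS FOLLOWS. ^THE FUNCTION NAME AND THE FOUR
OBSERVATIONS ARE ASSUMPTIONS MADE FOR ILLUSTRATION; THE FITTED VALUES ARE
THOSE OF THE LEAST SQUARES LINE Y = 0.5 + 0.6 * X AT X = 1, 2, 3, 4.

```python
def r_squared(y, fitted, p, intercept=True):
    """Return (R^2, adjusted R^2).

    p is the number of estimated parameters, the constant term included.
    fitted must hold least squares fitted values, so that the sum of squared
    residuals equals y'y - b'X'y.
    """
    n = len(y)
    residual_ss = sum((yi - zi) ** 2 for yi, zi in zip(y, fitted))
    if intercept:
        mean = sum(y) / n
        total_ss = sum(yi * yi for yi in y) - n * mean * mean   # y'y - n u^2
        scale = (n - 1) / (n - p)
    else:
        total_ss = sum(yi * yi for yi in y)                     # y'y
        scale = n / (n - p)
    ratio = residual_ss / total_ss
    return 1 - ratio, 1 - scale * ratio


y = [1.0, 2.0, 2.0, 3.0]
fitted = [1.1, 1.7, 2.3, 2.9]
print(r_squared(y, fitted, p=2))
```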
.PAGE;1.2.1##^^WEIGHTED LEAST SQUARES\\
^IT SOMETIMES HAPPENS THAT SOME OF THE OBSERVATIONS FOR THE DEPENDENT
VARIABLE ARE 'LESS RELIABLE' THAN OTHERS. ^THIS USUALLY MEANS THAT THE
VARIANCES OF THE OBSERVATIONS ARE NOT ALL EQUAL; IN OTHER WORDS THE MATRIX
^V#=#VAR(E) IS NOT OF THE FORM ^ISIGMA_^2, BUT IS DIAGONAL WITH UNEQUAL
DIAGONAL ELEMENTS. ^THE BASIC IDEA FOR SOLVING THIS PROBLEM IS TO TRANSFORM
^Y TO OTHER VARIABLES, WHICH DO APPEAR TO SATISFY THE USUAL TENTATIVE
MODEL ASSUMPTIONS, AND THEN APPLY THE USUAL (UNWEIGHTED) ANALYSIS
TO THE VARIABLES SO OBTAINED. ^THE ESTIMATES CAN THEN BE RE-EXPRESSED IN
TERMS OF THE ORIGINAL VARIABLES ^Y (CF.#<DRAPER#_&#<SMITH [3] PP.#77-81).
^LET THE ORIGINAL REGRESSION MODEL BE: ^Y#=#^XA#+#E, WITH ^E(E)#=#0 AND
VAR(E)#=#^VSIGMA_^2, WITH ^V DIAGONAL WITH UNEQUAL DIAGONAL ELEMENTS,
AND LET ^P#=#INV(^V). ^PREMULTIPLYING THE ORIGINAL REGRESSION MODEL
WITH ^Q#=#^SQRT(^P) GIVES AS TRANSFORMED REGRESSION MODEL:
.SK;.I29;<QY = ^Q^XA + ^QE (14)
.SK;WITH ^E(^QE)#=#0 AND VAR(^QE)#=#^ISIGMA_^2. ^THE NORMAL EQUATIONS THEN BECOME:
.SK;.I27;<(QX)'QY = (^Q^X)'^Q^XA (15)
.SK;GIVING AS SOLUTION IF THE INDICATED INVERSE MATRIX EXISTS:
.SK;.I16;B = INV((<QX)'QX)(QX)'QY = INV(<X'PX)X'PY (16)
.SK;WITH VARIANCE-COVARIANCE MATRIX:
.SK;.I23;VAR(B) = INV(^X'^P^X)SIGMA_^2 (17)
^IN PRACTICAL SITUATIONS IT IS OFTEN DIFFICULT TO OBTAIN SPECIFIC INFORMATION
ON THE FORM OF ^V AT FIRST. ^FOR THIS REASON IT IS SOMETIMES NECESSARY TO MAKE
THE (KNOWN TO BE ERRONEOUS) ASSUMPTION ^V#=#^I AND THEN ATTEMPT TO DISCOVER
SOMETHING ABOUT THE FORM OF ^V BY EXAMINING THE RESIDUALS FROM THE REGRESSION
ANALYSIS.
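.SK;^THE EQUIVALENCE STATED IN (16) CAN BE VERIFIED NUMERICALLY WITH THE
FOLLOWING ^PYTHON SKETCH, IN WHICH THE WEIGHTS ARE THE DIAGONAL ELEMENTS OF ^P
(THE RECIPROCALS OF THOSE OF ^V). ^THE FUNCTION NAMES, THE DATA AND THE WEIGHTS
ARE ASSUMPTIONS MADE FOR ILLUSTRATION ONLY, AND THE ELIMINATION ROUTINE IS A
STAND-IN, NOT THE ROUTINE USED BY THE PROGRAM.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_least_squares(x_rows, y, weights):
    """Solve X'PX b = X'Py with P = diag(weights)."""
    p = len(x_rows[0])
    xpx = [[sum(w * r[i] * r[j] for r, w in zip(x_rows, weights)) for j in range(p)]
           for i in range(p)]
    xpy = [sum(w * r[i] * yi for r, yi, w in zip(x_rows, y, weights)) for i in range(p)]
    return solve(xpx, xpy)


def transformed_least_squares(x_rows, y, weights):
    """Premultiply by Q = sqrt(P), then apply the unweighted analysis."""
    q = [math.sqrt(w) for w in weights]
    qx = [[qi * value for value in row] for qi, row in zip(q, x_rows)]
    qy = [qi * yi for qi, yi in zip(q, y)]
    return weighted_least_squares(qx, qy, [1.0] * len(y))


x_rows = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0.1, 0.9, 2.2, 2.8]
weights = [1.0, 4.0, 1.0, 0.25]
print(weighted_least_squares(x_rows, y, weights))
print(transformed_least_squares(x_rows, y, weights))   # same values up to rounding
```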
.SK2;1.2.2##^^RESIDUAL ANALYSIS\\
^THE VECTOR OF RESIDUALS ^D IS DEFINED AS THE DIFFERENCE BETWEEN THE VECTOR
OF OBSERVATIONS ^Y AND THE VECTOR OF FITTED VALUES ^Z, OBTAINED BY USING THE
REGRESSION EQUATION##^Z#=#^XB. ^SO ^D#=#^Y#-#^Z OR DI#=#YI#-#ZI FOR I#=#1,...,N.
^IF THE MODEL IS CORRECT, THE RESIDUAL MEAN SQUARE <MSE = S_^2 ESTIMATES SIGMA_^2, AND
THE ESTIMATED STANDARD DEVIATION OF THE FITTED VALUE ZI AT XI = (XI1,...,XIP)' IS:
.SK;.I21;SD(ZI)#=#S * ^SQRT(XI'INV(^X'^X)XI) (18)
.SK;WHICH CAN BE USED TO CONSTRUCT A CONFIDENCE INTERVAL FOR THE EXPECTED
VALUE OF YI:#^E(YI) AT XI = (XI1,...,XIP)', OR TO CONSTRUCT A PREDICTION
INTERVAL FOR THE MEAN OF H NEW OBSERVATIONS AT THIS POINT
(CF.#<DRAPER#_&#<SMITH [3] PP.#121-122).
^IN THE FIRST CASE THE CONFIDENCE INTERVAL IS:
.SK;.I11;ZI +- T(N-P,1-ALPHA/2) * S * ^SQRT(XI'INV(^X'^X)XI) (19)
.SK;AND IN THE SECOND CASE THE PREDICTION INTERVAL IS:
.SK;.I8;ZI +- T(N-P,1-ALPHA/2) * S * ^SQRT(1/H + XI'INV(^X'^X)XI) (20)
^RESEARCHERS OFTEN DIVIDE THE RESIDUALS DI BY S, RESULTING IN THE STANDARDIZED
RESIDUALS, WHICH CAN BE EXAMINED TO SEE IF THEY MAKE IT APPEAR THAT THE ASSUMPTION
EI/SIGMA ~ ^N(0,1) IS VIOLATED (CF.#<DRAPER#_&#<SMITH [3] PP.#86-97).
^IT MIGHT BE EXPECTED THAT ROUGHLY 95% OF THE
DI/S WERE BETWEEN THE LIMITS (-2,2).
^HOWEVER, THE VARIANCES OF THE
RESIDUALS ARE NOT CONSTANT BUT A FUNCTION OF THE ^X MATRIX (SEE (18)),
WHICH SUGGESTS AS STANDARDIZATION:
.SK;.I19;TI = DI / S / ^SQRT(1 - XI'INV(^X'^X)XI) (21)
.SK;GIVING THE STUDENTIZED RESIDUAL.
^THE MAXIMUM STUDENTIZED RESIDUAL CAN BE USED IN A TEST FOR DETECTING
OUTLIERS, AS FOLLOWS: LET T_^2#=#MAX(TI_^2),
THEN##MIN(1,#N#*#(1-^FISHER(1,#N-P-1,#T_^2*(N-P-1)/(N-P-T_^2)))) IS AN
'UPPER BOUND FOR THE RIGHT TAIL PROBABILITY OF THE LARGEST
ABSOLUTE STUDENTIZED RESIDUAL' (CF.#<LUND [9] PP.#473-474).
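.SK;^THE STUDENTIZED RESIDUALS (21) CAN BE SKETCHED IN ^PYTHON FOR THE SIMPLEST
INTERCEPT MODEL, Y = A0 + A1 * X. ^THE RESTRICTION TO THIS MODEL IS AN
ASSUMPTION MADE TO KEEP THE EXAMPLE SHORT: FOR IT, XI'INV(^X'^X)XI REDUCES TO
1/N PLUS THE SQUARED DEVIATION OF XI FROM THE MEAN OF X DIVIDED BY THE SUM OF
ALL SUCH SQUARED DEVIATIONS. ^THE FUNCTION NAME AND THE DATA ARE HYPOTHETICAL,
AND THE UPPER BOUND OF <LUND IS NOT COMPUTED HERE.

```python
import math


def studentized_residuals(x, y):
    """Studentized residuals for the straight-line model y = a0 + a1 * x."""
    n, p = len(x), 2
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # s^2 is the residual mean square with n - p degrees of freedom.
    s = math.sqrt(sum(d * d for d in residuals) / (n - p))
    # For this two-parameter model xi' inv(X'X) xi = 1/n + (xi - xbar)^2 / sxx.
    leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return [d / s / math.sqrt(1 - h) for d, h in zip(residuals, leverage)]


t = studentized_residuals([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 2.0, 3.0])
print(t)
```

^A PERFECT FIT (S EQUAL TO ZERO) IS NOT HANDLED BY THIS SKETCH.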
.SK2;1.3##^^TESTS OF HYPOTHESES\\
.SK;1.3.1##^^A PARTICULAR REGRESSION COEFFICIENT\\
^IF THE DISTURBANCES ARE MUTUALLY INDEPENDENT AND NORMALLY DISTRIBUTED
(WITH ^E(E)#=#0 AND VAR(E)#=#^ISIGMA_^2) AND WITH A (PRESET) LEVEL OF
SIGNIFICANCE ALPHA, A SIGNIFICANCE TEST FOR A PARTICULAR
REGRESSION COEFFICIENT CAN BE PERFORMED, OR MORE SPECIFICALLY:
THE NULL HYPOTHESIS IS:
.SK;.I10;^H0: AI = 0 (GIVEN THAT ALL OTHER AJ ARE IN THE MODEL),
.SK;WHICH IS TESTED AGAINST THE ALTERNATIVE HYPOTHESIS:
.SK;.I10;^H1: AI IS NOT EQUAL TO ZERO,
.SK;BY TREATING ^F^RI = BI_^2/VAR(BI) AS A REALIZATION OF A ^FISHER(1,N-P) VARIATE.
^HOWEVER, THIS TEST MUST BE USED WITH CAUTION, BECAUSE WITH THE (PRESET)
LEVEL OF SIGNIFICANCE ALPHA, ONLY ONE COEFFICIENT CAN BE TESTED
PROPERLY, WHILE THE COMPUTER OUTPUT LISTS STATISTICS FOR ALL COEFFICIENTS.
^IT SEEMS VERY TEMPTING TO TEST THE COEFFICIENTS SERIALLY ONE AT A TIME,
BUT ONE MUST KEEP IN MIND THAT IN DOING SO THE LEVEL OF SIGNIFICANCE
OF THE WHOLE TEST RISES ABOVE THE NOMINAL VALUE (CF.#<DRAPER#_&#<SMITH [3] P.#65).
.SK2;1.3.2##^^ANALYSIS OF VARIANCE\\
.TS16,22,40,54,64
^IN THE ANALYSIS OF VARIANCE TABLE THE DIFFERENT CONTRIBUTIONS TO THE TOTAL
UNCORRECTED SUM OF SQUARES <Y'Y (WHICH IS THE FIRST PART IN THE TABLE) ARE GIVEN
(CF.#<DRAPER#_&#<SMITH [3] PP.#57#_>).
^THE SECOND PART OF THE TABLE ASSUMES THE PRESENCE OF AN (UNKNOWN) CONSTANT
TERM IN THE MODEL; IF THIS TERM IS ABSENT, THE 'MEAN'-LINE
DISAPPEARS AND IN THE 'REGRESSION'-LINE P-1 CHANGES INTO P AND
B'^X'^Y-NU_^2 CHANGES INTO B'^X'^Y.
.SK;#####^THE THIRD PART OF THE TABLE IS ONLY PRESENT WHEN REPEATED OBSERVATIONS
FOR THE DEPENDENT VARIABLE ARE AVAILABLE, IN WHICH CASE:
.BR;K##IS THE NUMBER OF GROUPS OF REPLICATIONS,
.BR;MI IS THE NUMBER OF REPLICATIONS IN GROUP I, AND
.BR;^W = (W1,...,WK)', WITH WI = ^SUM(J,1,MI,YIJ) / ^SQRT(MI), FOR I = 1,...,K.
^THE FOURTH PART OF THE TABLE IS ONLY PRESENT IF A REDUCTION
IS REQUESTED AND POSSIBLE. <SSQ THEN STANDS FOR THE RESIDUAL SUM
OF SQUARES FROM A REGRESSION ANALYSIS WITH THE FIRST P-Q OUT OF THE
P INDEPENDENT VARIABLES (1#_<=#Q#_<=#P-1),
WHILE <SSE STANDS FOR THE RESIDUAL SUM OF SQUARES FROM A REGRESSION
ANALYSIS WITH THE ORIGINAL P INDEPENDENT VARIABLES.
.SK;.C;^ANALYSIS OF VARIANCE
.BR;SOURCE OF					RIGHT##TAIL
.BR;VARIATION	#DF	SUM OF SQUARES	MEAN SQUARE	^F-RATIO	PROBABILITY
.BR;---------------------------------------------------------------------------
.BR;TOTAL	#N	######<Y'Y
.BR;---------------------------------------------------------------------------
.BR;MEAN	#1	#####NU_^2	#<MSM#=#NU_^2	##<FRM	#<P(FM_>=FRM)
.BR;REGRESSION	P-1	#B'^X'^Y#-#NU_^2	####<MSR	##<FRR	#<P(FR_>=FRR)
.BR;RESIDUAL	N-P	##<Y'Y#-#B'^X'^Y	#<MSE#=#S_^2
.BR;---------------------------------------------------------------------------
.BR;#LACK OF FIT	K-P	##<W'W#-#B'^X'^Y	####<MSL	##<FRL	#<P(FL_>=FRL)
.BR;#PURE ERROR	N-K	###<Y'Y#-#<W'W	####<MSP
.BR;---------------------------------------------------------------------------
.BR;#REDUCTION	#Q	###<SSQ#-#<SSE	####<MSQ	##<FRQ	#<P(FQ_>=FRQ)
.BR;---------------------------------------------------------------------------
^THE COLUMN 'MEAN SQUARE' IS OBTAINED BY DIVISION OF THE SUMS OF SQUARES
BY THEIR CORRESPONDING DEGREES OF FREEDOM.
^THE COLUMN '^F-RATIO' IS OBTAINED BY DIVISION OF THE MEAN SQUARES BY THE RESIDUAL MEAN SQUARE,
EXCEPT FOR THE LACK OF FIT ^F-RATIO, WHICH IS OBTAINED BY DIVISION OF
THE LACK OF FIT MEAN SQUARE BY THE PURE ERROR MEAN SQUARE, THUS:
.SK;.I3;<MSM = NU_^2/1, <MSR = (B'^X'^Y-NU_^2)/(P-1), <MSE = S_^2 = (^Y'^Y-B'^X'^Y)/(N-P),
.SK;.I5;<MSL = (^W'^W-B'^X'^Y)/(K-P), <MSP = (^Y'^Y-^W'^W)/(N-K), <MSQ = (^S^S^Q-^S^S^E)/Q,
.SK;.I7;<FRM = <MSM/MSE, <FRR = <MSR/MSE, <FRL = <MSL/MSP AND <FRQ = <MSQ/MSE.
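.SK;^THE LACK OF FIT AND PURE ERROR LINES OF THE TABLE CAN BE SKETCHED IN
^PYTHON AS FOLLOWS. ^THE FUNCTION NAME AND THE SIX OBSERVATIONS ARE ASSUMPTIONS
MADE FOR ILLUSTRATION. ^THE OBSERVATIONS ARE TAKEN IN PAIRS AT X = 0, 1, 2, AND
THE RESIDUAL SUM OF SQUARES 9 IS THAT OF THE LEAST SQUARES LINE
Y = 1.5 + 2.5 * X FITTED TO THEM.

```python
def lack_of_fit(groups, residual_ss, p):
    """Return (MSL, MSP, FRL) for grouped observations of the dependent variable.

    groups      one list of repeated y values per group of replications
    residual_ss y'y - b'X'y from the fitted model
    p           number of parameters in the model (k > p and n > k are required)
    """
    n = sum(len(g) for g in groups)
    k = len(groups)
    yty = sum(y * y for g in groups for y in g)
    # w_i = (sum of group i) / sqrt(m_i), so w'w = sum of (group sum)^2 / m_i.
    wtw = sum(sum(g) ** 2 / len(g) for g in groups)
    pure_error_ss = yty - wtw
    # w'w - b'X'y = (y'y - b'X'y) - (y'y - w'w)
    lack_of_fit_ss = residual_ss - pure_error_ss
    msl = lack_of_fit_ss / (k - p)
    msp = pure_error_ss / (n - k)
    return msl, msp, msl / msp


groups = [[1.0, 3.0], [2.0, 4.0], [6.0, 8.0]]
print(lack_of_fit(groups, residual_ss=9.0, p=2))
```
.SK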
^IF THE DISTURBANCES ARE MUTUALLY INDEPENDENT AND NORMALLY DISTRIBUTED
(WITH ^E(E)#=#0 AND VAR(E)#=#^ISIGMA_^2) AND WITH A (PRESET)
LEVEL OF SIGNIFICANCE ALPHA, SIGNIFICANCE TESTS CAN BE PERFORMED FOR:
.SK;.LM+4;.I-4;1.##^THE MEAN OF THE OBSERVATIONS FOR THE DEPENDENT VARIABLE,
OR MORE SPECIFICALLY: THE NULL HYPOTHESIS IS:
.SK;.I6;^H0: ^E(U) = 0,
.SK;WHICH IS TESTED AGAINST THE ALTERNATIVE HYPOTHESIS:
.SK;.I6;^H1: ^E(U) IS NOT EQUAL TO ZERO,
.SK;BY TREATING <FRM AS A REALIZATION OF A ^FISHER(1,N-P) VARIATE.
.SK;.I-4;2.##^THE REGRESSION EQUATION, OR MORE SPECIFICALLY:
THE NULL HYPOTHESIS IS:
.SK;.I6;^H0: A1 =#...#= AP = 0, EXCEPT FOR THE AI THAT DENOTES
.BR;.I30;THE CONSTANT TERM (IF PRESENT),
.SK;WHICH IS TESTED AGAINST THE ALTERNATIVE HYPOTHESIS:
.SK;.I6;^H1: AT LEAST ONE OF A1,...,AP IS NOT EQUAL TO ZERO,
.SK;BY TREATING <FRR AS A REALIZATION OF A ^FISHER(P-1,N-P) VARIATE.
.SK;.I-4;3.##^THE ADEQUACY (LINEARITY) OF THE MODEL, OR MORE SPECIFICALLY:
THE NULL HYPOTHESIS IS:
.SK;.I6;^H0: THE LINEAR MODEL IS ADEQUATE (THAT IS, NO MODEL SIGNIFI-
.BR;.I11;CANTLY IMPROVES THE PREDICTION OF ^Y OVER THE LINEAR MODEL),
.SK;WHICH IS TESTED AGAINST THE ALTERNATIVE HYPOTHESIS:
.SK;.I6;^H1: THE LINEAR MODEL IS NOT ADEQUATE,
.SK;BY TREATING <FRL AS A REALIZATION OF A ^FISHER(K-P,N-K) VARIATE.
.SK;.I-4;4.##^A SUBSET OF REGRESSION COEFFICIENTS, OR MORE SPECIFICALLY:
SUPPOSE WITHOUT LOSS OF GENERALITY THAT THE SUBSET CONSISTS OF THE
LAST Q COEFFICIENTS, THEN THE NULL HYPOTHESIS IS:
.SK;.I6;^H0: AR =#...#= AP = 0, WITH R = P-Q+1,
.SK;WHICH IS TESTED AGAINST THE ALTERNATIVE HYPOTHESIS:
.SK;.I6;^H1: AT LEAST ONE OF AR,...,AP IS NOT EQUAL TO ZERO,
.SK;BY TREATING <FRQ AS A REALIZATION OF A ^FISHER(Q,N-P) VARIATE.
.SK;.I-4;5.##^A LINEAR COMBINATION OF THE REGRESSION COEFFICIENTS,
OR MORE SPECIFICALLY: THE NULL HYPOTHESIS IS:
.SK;.I6;^H0: C'A = M, IN WHICH C IS A VECTOR OF CONSTANTS WITH ORDER Q+1,
.SK;WHICH IS TESTED AGAINST THE ALTERNATIVE HYPOTHESIS:
.SK;.I6;^H1: C'A IS NOT EQUAL TO M,
.SK;BY SUBSTITUTING C'A = M IN THE ORIGINAL MODEL, SHIFTING THE KNOWN TERMS
TO THE LEFT HAND PART, COMBINING THE CORRESPONDING TERMS IN THE RIGHT HAND
PART, AND TESTING THE THUS DERIVED SO CALLED 'REDUCED MODEL'
BY TREATING <FRQ AS A REALIZATION OF A ^FISHER(Q,N-P) VARIATE.
.LM-4;.SK;^IN EACH CASE THE RIGHT TAIL PROBABILITY <P(F#_>=#<FR) CAN
BE FOUND IN THE LAST COLUMN OF THE ANALYSIS OF VARIANCE TABLE
(CF.#<DRAPER#_&#<SMITH [3] PP.#63-64, 68#_H-75).
.PAGE;.C;<CHAPTER 2
.SK;.C;^^THE INPUT TO THE PROGRAM\\
.SK2;2.0##^THE PURPOSE OF THE INPUT SYSTEM IS TO GIVE THE USER A
SIMPLE AND ADEQUATE FORMALISM TO TELL THE PROGRAM WHAT HE WANTS.
^IN ORDER TO SPECIFY THE REGRESSION MODEL AND THE CORRESPONDENCE
BETWEEN THE VARIABLES IN THE MODEL AND THE NUMBERS IN THE
INPUT DATA, THE USER MUST PROVIDE A SO CALLED USER PROGRAM, WHICH
CONSISTS OF ONE OR MORE JOBS WHICH IN THEIR TURN ARE MADE UP OUT OF ONE OR MORE SPECIFICATIONS.
^EACH SPECIFICATION STARTS WITH A KEYWORD AND TERMINATES AT THE NEXT
KEYWORD. ^THE KEYWORDS ARE: "^MODEL",
"^INPUT", "^OPTIONS", "^DATA", "^RUN", "^EXIT" AND "^HELP" (QUOTES INCLUDED).
.SK2;2.1##^^THE MODEL SPECIFICATION\\
^TO LET THE PROGRAM KNOW BETWEEN WHICH VARIABLES THE STATISTICIAN EXPECTS
A CERTAIN KIND OF RELATIONSHIP, HE MUST PROVIDE A MODEL SPECIFICATION,
WHICH CONSISTS OF THE KEYWORD "^MODEL" FOLLOWED BY A FORMULA (THE MODEL
STATEMENT), WHICH RESEMBLES THE NOTATION OF REGRESSION MODELS IN COMMON STATISTICAL
LITERATURE QUITE CLOSELY. ^FOR INSTANCE:
.SK;.C;"^MODEL" Y = ALPHA0 + ALPHA1 * X1 + ALPHA2 * X2;
^A MODEL FORMULA CONSISTS OF AN IDENTIFIER TO DENOTE THE DEPENDENT
VARIABLE (THE LEFT HAND PART), FOLLOWED BY AN '='#(EQUAL), FOLLOWED BY
THE SUM OF A NUMBER OF TERMS (THE RIGHT HAND PART), WHILE IT IS TERMINATED
WITH A ';'#(SEMICOLON). ^EACH TERM MUST
BE THE PRODUCT OF AN IDENTIFIER TO DENOTE THE PARAMETER (WHICH IS TO BE
ESTIMATED) AND AN IDENTIFIER TO DENOTE THE INDEPENDENT VARIABLE.
^AN EXCEPTION IS MADE FOR THE OPTIONAL CONSTANT TERM,
WHICH IS GIVEN AS A SINGLE IDENTIFIER DENOTING THAT CONSTANT TERM,
AND WHICH MAY BE PLACED ANYWHERE IN THE MODEL.
.SK;#####^EACH IDENTIFIER MUST START WITH A LETTER AND IS ALLOWED TO CONTAIN ANY
NUMBER OF LETTERS, DIGITS AND BLANKS.
^AS MOST PERIPHERAL EQUIPMENT OF A COMPUTER IS UNABLE TO
PROCESS SUB- OR SUPERSCRIPTIONS OR ^GREEK LETTERS, WE WRITE ALPHA0, ALPHA1
AND ALPHA2. ^IDENTIFIERS HAVE NO INHERENT MEANING, BUT SERVE FOR THE
IDENTIFICATION OF VARIABLES, PARAMETERS AND FUNCTIONS. ^THEY MAY BE CHOSEN
FREELY (EXCEPT FOR THE TWENTYONE STANDARD FUNCTION NAMES AND THE TEN OPTION NAMES,
CF.#SECTION 2.1.1 AND SECTION 2.3). ^IT IS ADVISED NOT TO USE THE SAME
IDENTIFIER TO DENOTE TWO (OR MORE) DIFFERENT QUANTITIES; FOR
REGRESSION PARAMETERS, HOWEVER, IT WILL NOT LEAD TO FATAL ERRORS, WHEREAS FOR
THE DEPENDENT AND INDEPENDENT VARIABLES DISTINGUISHABLE IDENTIFIERS MUST BE USED INDEED.
^CORRECT MODEL FORMULAE ARE FOR INSTANCE:
.SK;.C;"^MODEL" Y VARIABLE = CONSTANT + PARAMETER * X VARIABLE;
.BR;AND
.BR;.C;"^MODEL" DEPVAR = CONST + BETA1 * XVAR1 + BETA2 * XVAR2;
.SK2;2.1.1##^^TRANSFORMATIONS\\
^ALMOST ALL TRANSFORMATIONS A USER WOULD LIKE TO PERFORM ON HIS INPUT DATA FIT
QUITE NATURALLY IN THE MODEL FORMULA: EACH TRANSFORMATION IS EXPRESSED AS A
FORMULA ITSELF. ^IF, FOR INSTANCE, THE USER WANTS TO INCLUDE IN THE MODEL FORMULA
AS AN INDEPENDENT VARIABLE THE NATURAL LOGARITHM OF THE SUM OF TWO OTHER
VARIABLES, CALLED XVAR1 AND XVAR2, HE WRITES:
.SK;.C;^LN (XVAR1 + XVAR2)
.SK;^IN MODEL FORMULAE THE OPERATORS '+'#(PLUS), '-'#(MINUS), '*'#(ASTERISK),
AND '/'#(SLASH) ARE ALLOWED, ALL WITH THEIR CONVENTIONAL MEANING OF
ADDITION, SUBTRACTION, MULTIPLICATION AND DIVISION RESPECTIVELY.
^OF COURSE THE NORMAL OPERATOR PRECEDENCE RULES ARE OBEYED.
^SPECIAL OPERATORS ARE: ':'#(COLON) FOR INTEGER DIVISION
AND##'_^'#(UPARROW) FOR EXPONENTIATION.
.BR;^THE OPERATION TERM#:#FACTOR IS DEFINED ONLY FOR OPERANDS BOTH OF
TYPE INTEGER AND WILL YIELD A RESULT OF TYPE INTEGER, WITH THE SAME
SIGN AS WOULD BE OBTAINED BY NORMAL DIVISION, WHILE THE MAGNITUDE IS
FOUND BY DIVIDING THE TWO QUANTITIES AND TAKING THE WHOLE PART; MATHEMATICALLY
IT CAN BE DEFINED AS: A#:#B#=#^SIGN#(A#/#B)#*#^ENTIER#(^ABS#(A#/#B)),
.BR;FOR INSTANCE: 5#:#2#=#2 AND -7#:#2#=#-3.
.BR;^THE OPERATION FACTOR#_^#PRIMARY DENOTES EXPONENTIATION, WHERE
THE FACTOR IS THE BASE AND THE PRIMARY IS THE EXPONENT,
.BR;FOR INSTANCE: 5#_^#2#=#25 AND 2#_^#3#_^#2#=#64 BUT 2#_^#(3#_^#2)#=#512.
.SK;^ALSO THE FOLLOWING TWENTYONE STANDARD FUNCTIONS ARE ALLOWED:
.LM+3;.RM-3;.SK
^ABS#(^E), ^SIGN#(^E), ^SQRT#(^E), ^SIN#(^E), ^COS#(^E), ^TAN#(^E), ^LN#(^E),
^LOG#(^E), ^EXP#(^E), ^ENTIER#(^E), ^ROUND#(^E), ^MOD#(^E1,#^E2), ^MIN#(^E1,#^E2),
^MAX#(^E1,#^E2), ^ARCSIN#(^E), ^ARCCOS#(^E), ^ARCTAN#(^E), ^SINH#(^E),
^COSH#(^E), ^TANH#(^E) AND ^INDICATOR#(^E1,#^E2,#^E3)
.LM-3;.RM+3;.SK
IN WHICH ^E, ^E1, ^E2 AND ^E3 ARE EXPRESSIONS IN TERMS OF VARIABLES,
OPERATORS AND STANDARD FUNCTIONS. ^ROUND (^E) IS DEFINED AS: ^ENTIER (^E + 0.5) AND
.BR;^INDICATOR (^E1,#^E2,#^E3) IS DEFINED AS: <IF ^E1 _<= ^E2 _<= ^E3 <THEN 1 <ELSE 0.
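.SK;^THE DEFINITIONS OF THE INTEGER DIVISION AND EXPONENTIATION OPERATORS AND OF
THE STANDARD FUNCTIONS ^ROUND AND ^INDICATOR GIVEN ABOVE CAN BE MIRRORED IN A
SHORT ^PYTHON SKETCH. ^THE FUNCTION NAMES ARE HYPOTHETICAL; ONLY THE DEFINITIONS
AND THE NUMERICAL EXAMPLES ARE TAKEN FROM THE TEXT. ^NOTE THAT THE EXAMPLES IN
THE TEXT IMPLY EVALUATION OF REPEATED EXPONENTIATION FROM LEFT TO RIGHT,
WHEREAS THE POWER OPERATOR OF ^PYTHON ITSELF GROUPS FROM RIGHT TO LEFT.

```python
import math


def integer_divide(a, b):
    """a : b = sign(a / b) * entier(abs(a / b)), i.e. truncation towards zero."""
    sign = -1 if (a < 0) != (b < 0) else 1
    return sign * (abs(a) // abs(b))


def power_chain(operands):
    """Evaluate a ^ b ^ c ... from left to right."""
    result = operands[0]
    for exponent in operands[1:]:
        result = result ** exponent
    return result


def round_entier(e):
    """ROUND (E) = ENTIER (E + 0.5); entier is the largest integer not above its argument."""
    return math.floor(e + 0.5)


def indicator(e1, e2, e3):
    """INDICATOR (E1, E2, E3) = 1 if E1 <= E2 <= E3, otherwise 0."""
    return 1 if e1 <= e2 <= e3 else 0


print(integer_divide(5, 2), integer_divide(-7, 2))    # 2 -3
print(power_chain([5, 2]), power_chain([2, 3, 2]))    # 25 64
print(2 ** (3 ** 2))                                  # 512
print(round_entier(2.5), indicator(1, 2, 3), indicator(1, 4, 3))
```
.SK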
^THE DEPENDENT VARIABLE MAY BE TRANSFORMED IN A SIMILAR WAY AND AS A
CONSEQUENCE THE MODEL FORMULA IN ITS MOST GENERAL FORM LOOKS LIKE:
.SK;###"^MODEL" ^G (Y) = B0 + B1 * ^F1 (X1,...,XM) +#...#+ BP * ^FP (X1,...,XM);
.SK;^SOME EXAMPLES OF TRANSFORMED MODEL FORMULAE ARE:
.SK;.C;"^MODEL" Y = A0 + A1 * ^SQRT (X1 + X2) + A2 * ^SQRT (X3);
.BR;AND
.BR;.C;"<MODEL" ^ARCSIN (^SQRT (^Y)) = ^A0 + ^A1 * ^X + ^A2 * ^X _^ 2;
^A USER CAN SPECIFY MODEL FORMULAE IN WHICH TERMS WITH KNOWN REGRESSION
COEFFICIENTS APPEAR, BY SUBTRACTING THOSE TERMS FROM THE LEFT HAND PART OF
THE MODEL FORMULA, FOR INSTANCE:
.SK;.C;"^MODEL" Y - 5.4321 * X3 = A0 + A1 * X + A2 * X _^ 2;
.SK;^THIS APPLIES ESPECIALLY TO THE CONSTANT TERM;#IF THIS TERM IS KNOWN IT MUST
BE SHIFTED TO THE LEFT HAND PART.
^IF WEIGHTS ARE PRESENT IN THE INPUT DATA (OR CAN BE COMPUTED OUT OF THE
INPUT DATA), TO INDICATE THAT THE VARIANCES OF THE OBSERVATIONS ARE NOT ALL
EQUAL (CF.#SECTION 1.2.1), THE LEFT#HAND#PART OF THE MODEL#FORMULA CAN BE EXPANDED WITH A SO CALLED
WEIGHT#PART (WHICH CAN BE AN EXPRESSION), PRECEDED BY A '_&'#(AMPERSAND),
FOR INSTANCE:
.SK;.C;"^MODEL" ^DEPVAR _& ^MAX (^ABS (^WEIGHT), 10) = ^CONST + ^PARAM * ^INDEPVAR;
.SK2;2.2##^^THE INPUT SPECIFICATION\\
^TO INDICATE WHICH NUMBERS OR SERIES OF NUMBERS IN THE INPUT DATA
BELONG TO WHICH VARIABLE IN THE MODEL FORMULA AND WHICH NUMBERS CAN BE
SKIPPED, THE PROGRAM EXPECTS AN INPUT SPECIFICATION. ^IT CONSISTS OF
THE KEYWORD "^INPUT", FOLLOWED BY A FORMULA (THE INPUT STATEMENT)
WHICH DESCRIBES THE ARRANGEMENT OF THE OBSERVATIONS IN THE INPUT DATA,
WHILE IT IS TERMINATED WITH A ';'#(SEMICOLON). ^THE BASIC IDEA
IS THAT NUMBERS FROM THE INPUT DATA ARE IDENTIFIED WITH THE NAMES
FROM THE INPUT FORMULA IN SUCH A WAY THAT (IN ORDER OF ENTRY)
NUMBERS BELONGING TO THE SAME NAME ARE PUT IN A QUEUE APPENDED TO
THAT NAME. ^FOR INSTANCE:
.SK;.C;"^INPUT" 100 * (CODENR, 10 * [YVAR], [XVAR1, XVAR2], -1);
.SK;MEANS THAT ONE HUNDRED SERIES OF NUMBERS (EACH, AS A CHECK,
TERMINATED IN THIS EXAMPLE BY -1) ARE PRESENT IN THE INPUT DATA. ^EACH SERIES
CONSISTS OF FOURTEEN NUMBERS: FIRST ONE VALUE WHICH IS READ AND
ASSIGNED TO THE NAME CODENR, THEN TEN VALUES FOR THE NAME YVAR, THEN
ONE VALUE FOR THE NAME XVAR1, FOLLOWED BY ONE VALUE FOR THE NAME
XVAR2 AND FINALLY THE VALUE -1.
^THE BASIC CONSTITUENT OF AN INPUT FORMULA IS A VARIABLE
ENCLOSED IN SQUARE BRACKETS, IN THE EXAMPLE:#[YVAR]. ^THE
CORRESPONDING NUMBER FROM THE INPUT DATA WILL BE APPENDED TO THE
QUEUE FOR THAT NAME. ^SEVERAL VARIABLES CAN BE PUT TOGETHER
IN A VARIABLE LIST BY SEPARATING THEM BY COMMAS AND ENCLOSING THEM
IN SQUARE BRACKETS, IN THE EXAMPLE:#[XVAR1,#XVAR2]. ^THIS ONLY SERVES
TO SAVE THE WRITING OF SEVERAL OPENING AND CLOSING BRACKETS.
^SEPARATE NUMBERS, SERIES OR BLOCKS OF NUMBERS CAN BE TREATED BY
PUTTING A REPETITION FACTOR (CONTROL) FOLLOWED BY AN ASTERISK
IN FRONT OF A VARIABLE LIST (OR IN FRONT OF AN INPUT FORMULA
WHICH MUST THEN BE ENCLOSED IN PARENTHESES), IN THE EXAMPLE: 100 *
AND 10 * .
^IF A REPETITION FACTOR IS 1, IT MAY BE OMITTED TOGETHER WITH THE
ASTERISK AND A PARENTHESES PAIR, BUT SQUARE BRACKET PAIRS MUST REMAIN.
^WHEN A NAME IS USED AS A REPETITION FACTOR, A VALUE MUST ALREADY
HAVE BEEN ASSIGNED TO IT, WHICH IS DONE BY GIVING THAT NAME, WITHOUT
SQUARE BRACKETS AND FOLLOWED BY A COMMA (OR CLOSING PARENTHESIS), EARLIER IN
THE INPUT FORMULA THAN THE USE OF THAT NAME AS A REPETITION FACTOR. ^THE
CORRESPONDING NUMBER FROM THE INPUT DATA IS THEN ASSIGNED AS A VALUE TO
THAT NAME. ^IF SUCH NAMES ARE USED REPEATEDLY IN THE INPUT FORMULA,
THE CORRESPONDING NUMBERS FROM THE INPUT DATA ARE COMPARED WITH THE FIRST ONE
AND, IN THE CASE OF INEQUALITY, AN ERROR MESSAGE IS SUPPLIED.
^THIS MAY SERVE AS A CHECK AGAINST SHIFTED DATA READING.
^A SIMILAR CHECK CAN BE OBTAINED BY GIVING AN EXPLICIT NUMBER FOLLOWED BY A
COMMA (OR CLOSING PARENTHESIS), IN THE EXAMPLE:#THE -1.
^THE CORRESPONDING NUMBER FROM THE INPUT DATA IS THEN COMPARED WITH
THAT GIVEN NUMBER AND, IN THE CASE OF INEQUALITY, AN ERROR
MESSAGE IS PRODUCED.
^ALSO AN EXPRESSION IS ALLOWED AS A REPETITION
FACTOR, OR FOR THAT MATTER, AS A CHECK VALUE, PROVIDED THAT IT IS
ENCLOSED IN ANGLE BRACKETS, FOR INSTANCE: _<K+N_>. ^AS IN THE CASE OF SINGLE NAMES USED
AS A REPETITION FACTOR EACH (NON-STANDARD#FUNCTION AND NON-OPTION) NAME USED IN
SUCH A (SPECIAL) EXPRESSION MUST HAVE BEEN GIVEN,
FOLLOWED BY A COMMA, EARLIER IN THE INPUT FORMULA THAN THE USE
OF THAT NAME IN THE EXPRESSION.
^THE LINKAGE BETWEEN THE MODEL FORMULA AND THE INPUT
FORMULA IS ESTABLISHED BY USING THE SAME NAMES IN THE MODEL TERMS
AND IN THE INPUT VARIABLE LISTS. ^NUMBERS FROM THE INPUT DATA THAT
BELONG TO SUCH INPUT NAMES WILL BE TREATED AS OBSERVATIONS FOR THE
MODEL VARIABLES, WHILE NUMBERS THAT BELONG TO INPUT NAMES BETWEEN
SQUARE BRACKETS WHICH DO NOT APPEAR IN THE MODEL FORMULA, ARE SKIPPED.
^OFTEN, REPEATED OBSERVATIONS FOR THE DEPENDENT VARIABLE ARE
AVAILABLE. ^IN ORDER TO BE ABLE TO PROCESS THESE OBSERVATIONS
AUTOMATICALLY, IT IS NECESSARY THAT A VARIABLE LIST CONSISTING
ENTIRELY OF DEPENDENT VARIABLES IS PRECEDED BY A REPETITION FACTOR
(FOLLOWED BY AN ASTERISK) INDICATING THE NUMBER OF REPETITIONS.
^IF A VARIABLE LIST CONTAINS INDEPENDENT AS WELL AS DEPENDENT
VARIABLES, THE NUMBER OF REPLICATIONS IS ASSUMED TO BE 1. ^A SERIES
OF (SAY 100) OBSERVATIONS FOR A DEPENDENT VARIABLE WITH NO
REPLICATIONS IS DENOTED AS:
.SK;.C;100 * ([DEP VAR])
.SK;^THE REPETITION FACTOR IN FRONT OF THE OPENING SQUARE BRACKET IS
OMITTED (BECAUSE IT IS 1), ALTHOUGH THE PARENTHESES ARE NOT.
^WITHOUT THE PARENTHESES IT WOULD MEAN 100 REPLICATIONS OF [DEP VAR].
.SK;<EXAMPLE####"^INPUT" K, N, _<K+N_> * (C, M, M * [Y], [X1,X2,X3,X4], C), -99;
.LM+12;.SK;.I-12;MEANS THAT:#THE FIRST NUMBER IS READ AND ITS VALUE ASSIGNED TO K,
.BR;THE NEXT NUMBER IS READ AND ITS VALUE ASSIGNED TO N,
.BR;THEN K+N TIMES THE FOLLOWING HAPPENS:
.LM+5;.BR;A NUMBER IS READ AND ITS VALUE ASSIGNED TO C,
.BR;THE NEXT NUMBER IS READ AND ITS VALUE ASSIGNED TO M,
.BR;THEN THE M REPLICATIONS FOR Y ARE READ,
.BR;NEXT THE OBSERVATIONS FOR X1, X2, X3 AND X4 ARE READ,
.BR;THEN A NUMBER IS READ AND ITS VALUE COMPARED WITH C,
.LM-5;.BR;FINALLY A NUMBER IS READ AND ITS VALUE COMPARED WITH -99.
.LM-12;.SK;^IF THE COMPARISONS FAIL, AN ERROR MESSAGE IS SUPPLIED AND EXECUTION OF THE JOB IS TERMINATED,
OTHERWISE (K+N) OBSERVATIONS FOR X1, X2, X3, X4 AND FOR EACH QUADRUPLE M REPLICATIONS FOR Y, HAVE BEEN IDENTIFIED.
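.SK;^AS AN ILLUSTRATION ONLY, THE READING RULES OF THIS SECTION CAN BE
SKETCHED IN ^PYTHON. ^THE SKETCH ASSUMES THAT THE INPUT FORMULA HAS ALREADY
BEEN PARSED INTO NESTED TUPLES; THE TUPLE KINDS AND ALL NAMES IN THE CODE
ARE HYPOTHETICAL AND ARE NOT TAKEN FROM THE <SIMULA IMPLEMENTATION.
^IT WALKS THE EXAMPLE FORMULA ABOVE OVER A SHORT, MADE-UP SERIES OF NUMBERS.
.SK;.LITERAL
```python
from collections import defaultdict


def read_formula(parts, data):
    """Walk a pre-parsed input formula over a flat series of numbers."""
    numbers = iter(data)
    values = {}                 # names given without square brackets
    queues = defaultdict(list)  # names given inside square brackets

    def take():
        try:
            return next(numbers)
        except StopIteration:
            raise ValueError("more input data requested than provided") \
                from None

    def control(c):
        # A control is a number, a name or a (callable) expression.
        if callable(c):
            return c(values)
        return values[c] if isinstance(c, str) else c

    def walk(items):
        for kind, arg, *body in items:
            if kind == "vars":        # [A, B]: append to the queues
                for name in arg:
                    queues[name].append(take())
            elif kind == "set":       # first use of a bare name
                values[arg] = take()
            elif kind == "check":     # number, repeated name or <expr>
                expected, found = control(arg), take()
                if found != expected:
                    raise ValueError(f"expected {expected}, read {found}")
            elif kind == "repeat":    # control * ( ... )
                for _ in range(int(control(arg))):
                    walk(body[0])
            else:
                raise ValueError(f"unknown part {kind!r}")

    walk(parts)
    return dict(queues)


# "Input" K, N, <K+N> * (C, M, M * [Y], [X1,X2,X3,X4], C), -99;
formula = [
    ("set", "K"), ("set", "N"),
    ("repeat", lambda v: v["K"] + v["N"], [
        ("set", "C"), ("set", "M"),
        ("repeat", "M", [("vars", ["Y"])]),
        ("vars", ["X1", "X2", "X3", "X4"]),
        ("check", "C"),
    ]),
    ("check", -99),
]
data = [1, 1,
        5, 2, 0.1, 0.2, 1, 2, 3, 4, 5,
        6, 1, 0.3, 5, 6, 7, 8, 6,
        -99]
queues = read_formula(formula, data)
```
.END LITERAL
^IN THIS SKETCH A FAILED COMPARISON RAISES AN EXCEPTION, WHICH STANDS IN FOR
THE ERROR MESSAGE AND JOB TERMINATION DESCRIBED ABOVE.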
.SK2;2.3##^^THE OPTION SPECIFICATION\\
^IT IS POSSIBLE TO HAVE THE PROGRAM PERFORM SOME TASKS OPTIONALLY
BY PROVIDING AN OPTION SPECIFICATION IN A JOB. ^IT CONSISTS OF
THE KEYWORD "^OPTIONS" FOLLOWED BY A LIST OF OPTION IDENTIFIERS OR
CORRESPONDING OPTION NUMBERS (THE OPTION STATEMENT), SEPARATED BY COMMAS AND TERMINATED WITH A
';'#(SEMICOLON). ^THE FOLLOWING TEN OPTIONS ARE AVAILABLE:
.SK;.TS20,40;.I15;OPTION NUMBER ##OPTION NAME
.SK;# #1 ^TRANSFORMED DATA MATRIX
.BR;# #2 ^CORRELATION MATRIX
.BR;# #3 ^RESIDUAL ANALYSIS
.BR;# #4 ^NO REGRESSION ANALYSIS
.BR;# #5 ^PROCESS SUBMODELS
.BR;# #6 ^PRINT INPUT DATA
.BR;# #7 ^NO INPUT DATA REWIND
.BR;# #8 ^SAVE ORIGINAL MODEL
.BR;# #9 ^TEST REDUCED MODEL
.BR;# 10 ^MISSING VALUES
^OPTIONS 1, 2, 3 AND 6 CAUSE THE CORRESPONDING PIECE OF INFORMATION TO BE
PRINTED. ^HOWEVER, OPTION 1 LISTS ONLY THOSE (POSSIBLY TRANSFORMED) VARIABLES
THAT ARE PRESENT IN THE MODEL FORMULA IN A NEAT TABULAR FORM, WHILE OPTION 6
LISTS ALL THE ORIGINAL INPUT DATA SERIALLY (ELEVEN NUMBERS PER LINE) WITHOUT
ANY SPECIAL LAYOUT, BECAUSE THE INPUT DATA CONSISTS (BY DEFINITION) OF AN
UNSTRUCTURED SERIES OF NUMBERS (CF.#SECTION 2.4).
.SK;.I5;^OPTION 4 SUPPRESSES THE REGRESSION ANALYSIS; IT IS MEANT TO BE USED IN
COMBINATION WITH OPTION 1 AND/OR 2.
^OPTION 5 CAUSES THE PROGRAM TO PROCESS SUBMODELS, WHICH ARE FORMED BY
A FORM OF BACKWARD ELIMINATION: EACH TIME THE LAST TERM FROM THE
RIGHT HAND PART FROM THE MODEL FORMULA IS OMITTED, BY DELETING
THE LAST COLUMN FROM THE DESIGN MATRIX, AND A REGRESSION ANALYSIS
IS PERFORMED WITH THE REDUCED DESIGN MATRIX. ^MESSAGES ARE GENERATED ABOUT
WHICH TERMS ARE OMITTED, WHILE FURTHER PROCESSING OF THE JOB CEASES WHEN THE
RESULTING MODEL FORMULA IS OF THE FORM: Y#=#C. ^MOREOVER A TEST IS MADE (UNDER THE
USUAL ASSUMPTIONS) WHETHER THE OMITTED TERMS DID CONTRIBUTE
SIGNIFICANTLY TO THE REGRESSION SUM OF SQUARES (CF.#SECTION 1.3.2.4).
^TO OPTION 5 A SPECIFIER LIST
MAY BE APPENDED, TO PREVENT THE PRODUCTION OF WASTE OUTPUT FOR UNWANTED
SUBMODELS. ^IN THIS LIST THE NUMBER OF TERMS TO BE OMITTED FROM THE MODEL FORMULA (COUNTING BACKWARDS, STARTING AT THE END) MUST BE GIVEN
ENCLOSED IN PARENTHESES. ^FOR EXAMPLE THE OPTION: PROCESS SUBMODELS (6, 10)
INSTRUCTS THE PROGRAM TO PROCESS ONLY TWO SUBMODELS, ONE WITH THE LAST
SIX TERMS OMITTED AND ONE WITH THE LAST TEN TERMS OMITTED (FROM THE ORIGINAL
MODEL#FORMULA). ^IF THE USER ASKS FOR MORE TERMS TO BE OMITTED THAN ARE PRESENT IN THE
MODEL FORMULA, AN ERROR MESSAGE IS SUPPLIED AND THE EXECUTION OF THAT JOB IS
TERMINATED. ^MOREOVER, IF NO EXPLICIT SPECIFIER LIST IS APPENDED TO
OPTION 5, THE OPTIONS 2 AND 3 YIELD NO EFFECT
(EVEN IF SPECIFIED), WHICH IS ALSO TO PREVENT THE PRODUCTION OF WASTE OUTPUT FOR THE SUBMODELS.
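.SK;^THE EFFECT OF A SPECIFIER LIST ON THE MODEL FORMULA CAN BE SKETCHED
AS FOLLOWS. ^THIS IS A MINIMAL ^PYTHON ILLUSTRATION, NOT THE PROGRAM'S OWN
CODE: THE FUNCTION NAME IS HYPOTHETICAL, THE TERMS ARE TREATED AS PLAIN
STRINGS, AND THE MODEL FORMULA IS THAT OF THE FIRST EXAMPLE IN SECTION 2.6.
.SK;.LITERAL
```python
def submodels(terms, omit_counts):
    """Return the term lists left after dropping the last k terms."""
    result = []
    for k in omit_counts:
        if k > len(terms):
            raise ValueError(f"cannot omit {k} of {len(terms)} terms")
        # The text does not say what happens for k == len(terms);
        # this sketch then simply yields an empty term list.
        result.append(terms[:len(terms) - k])
    return result


# "Model" y = c * Log (x) + a + b * x;  with  Process submodels (1, 2)
terms = ["c * Log (x)", "a", "b * x"]
reduced = submodels(terms, [1, 2])
```
.END LITERAL
.SK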
^OPTION 7 GIVES THE USER THE OPPORTUNITY TO PROCESS CONSECUTIVE
PIECES OF INPUT DATA IN CONSECUTIVE JOBS. ^NORMALLY THE PROCESSING OF THE
INPUT DATA FOR EACH JOB STARTS WITH THE FIRST NUMBER IN THE DATA SPECIFICATION
(OR WITH THE FIRST NUMBER IN THE DATASTREAM), AND THE PROGRAM
GIVES A (WARNING) MESSAGE IF THE INPUT FORMULA DOES NOT MATCH THE INPUT DATA
PRECISELY. ^THIS OPTION DISENGAGES THE MESSAGE AND CAUSES THE
PROGRAM TO CONTINUE PROCESSING INPUT DATA WHERE THE PREVIOUS JOB HAD FINISHED.
^OPTION 8 CAUSES THE RESIDUAL DEGREES OF FREEDOM AND RESIDUAL SUM OF
SQUARES FROM THE CURRENT JOB TO BE SAVED, IN ORDER TO BE ABLE IN THE NEXT JOB,
BY MEANS OF SPECIFYING OPTION 9, TO TEST WHETHER THE MODEL UNDER
CONSIDERATION IN THAT NEXT JOB, SHOWS A SIGNIFICANT INCREASE IN RESIDUAL
SUM OF SQUARES IN COMPARISON WITH THE MODEL IN THE PREVIOUS JOB.
^IN EFFECT THIS GIVES THE POSSIBILITY OF TESTING A HYPOTHESIS
CONCERNING A LINEAR COMBINATION OF THE PARAMETERS FROM A MODEL (CF.#SECTION 1.3.2.5),
FOR INSTANCE (CF.#<SEARLE [11] PP.#121-122):
.LM+13;.SK;"^MODEL 1"##Y = B1 * X1 + B2 * X2 + B3 * X3;
.BR;"^OPTIONS"##^SAVE ORIGINAL MODEL;
.BR;"^RUN"
.BR;"^MODEL 2"##Y - 4 * X1 = B2 * (X1 + X2) + B3 * X3;
.BR;"^OPTIONS"##^TEST REDUCED MODEL;
.BR;"^RUN"
.LM-13;.SK;CAUSES THE NULL HYPOTHESIS:#B1 = B2 + 4 TO BE TESTED (IN THE SECOND JOB).
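.SK;^THE COMPARISON OF THE TWO RESIDUAL SUMS OF SQUARES CAN BE ILLUSTRATED
WITH THE USUAL ^F-STATISTIC FOR NESTED LINEAR MODELS. ^THE ^PYTHON SKETCH
BELOW ASSUMES THAT THIS IS THE QUANTITY TESTED; THE AUTHORITATIVE FORMULA IS
THE ONE IN SECTION 1.3.2.5, AND THE FUNCTION AND PARAMETER NAMES ARE
HYPOTHETICAL.
.SK;.LITERAL
```python
def reduced_model_f(rss_full, df_full, rss_reduced, df_reduced):
    """F-ratio for a reduced model against the saved original model.

    rss_* are residual sums of squares, df_* residual degrees of freedom.
    """
    extra_df = df_reduced - df_full   # number of restrictions imposed
    if extra_df <= 0 or df_full <= 0:
        raise ValueError("reduced model must have more residual d.f.")
    return ((rss_reduced - rss_full) / extra_df) / (rss_full / df_full)


# Made-up numbers, only to show the arithmetic: (6 / 2) / (10 / 5).
f_ratio = reduced_model_f(10.0, 5, 16.0, 7)
```
.END LITERAL
^THE RATIO IS REFERRED TO AN ^F-DISTRIBUTION WITH THE NUMBER OF RESTRICTIONS
AND THE RESIDUAL DEGREES OF FREEDOM OF THE ORIGINAL MODEL AS ITS DEGREES OF
FREEDOM.
.SK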
^OPTION 10 MAY BE USED TO IDENTIFY SOME OBSERVATIONS OR REPETITIONS AS#'MISSING'.
^IN A SPECIFIER LIST, APPENDED TO THIS OPTION, THE MISSING VALUES MUST BE GIVEN ENCLOSED IN PARENTHESES.
^WHEN A REPETITION EQUAL TO A MISSING VALUE IS ENCOUNTERED IN THE INPUT DATA,
THE CORRESPONDING SET OF REPETITIONS FOR THE DEPENDENT VARIABLE(S) IS NOT
INCLUDED IN THE DESIGN MATRIX.
^WHEN AN OBSERVATION EQUAL TO A MISSING VALUE IS ENCOUNTERED IN THE INPUT DATA,
OR WHEN NONE OF THE REPETITIONS ARE INCLUDED IN THE DESIGN MATRIX, THE
CORRESPONDING SET OF OBSERVATIONS FOR THE INDEPENDENT VARIABLES TOGETHER WITH
THE (POSSIBLY EMPTY) SET OF REPETITIONS FOR THE DEPENDENT VARIABLE(S)
(I.E.#THE 'CASE') IS NOT INCLUDED IN THE DESIGN MATRIX.
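.SK;^THE TWO RULES FOR MISSING VALUES CAN BE SKETCHED AS FOLLOWS.
^THIS ^PYTHON ILLUSTRATION ASSUMES A SINGLE DEPENDENT VARIABLE AND CASES
THAT HAVE ALREADY BEEN SPLIT INTO OBSERVATIONS AND REPETITIONS; THE
FUNCTION NAME AND THE DATA LAYOUT ARE HYPOTHETICAL.
.SK;.LITERAL
```python
def usable_cases(cases, missing):
    """Filter cases given as (observations, repetitions) pairs.

    missing is the set of values from the specifier list.
    """
    kept = []
    for xs, ys in cases:
        if any(x in missing for x in xs):
            continue                  # missing observation: drop the case
        ys = [y for y in ys if y not in missing]
        if not ys:
            continue                  # no repetition left: drop the case
        kept.append((list(xs), ys))
    return kept


# Made-up data with -9 as the missing value, as in: Missing values (-9)
cases = [
    ([1, 2], [0.5, -9, 0.7]),   # one repetition is dropped
    ([-9, 3], [1.0]),           # observation missing: case dropped
    ([4, 5], [-9, -9]),         # all repetitions missing: case dropped
    ([6, 7], [2.0]),            # complete case
]
kept = usable_cases(cases, {-9})
```
.END LITERAL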
.SK2;2.4##^^THE DATA SPECIFICATION\\
^THE DATA SPECIFICATION CONSISTS OF AN UNSTRUCTURED SERIES OF NUMBERS
(THE DATA STATEMENT), PRECEDED BY THE KEYWORD "^DATA" AND TERMINATED AT
THE NEXT KEYWORD. ^THE STRUCTURE IS IMPOSED ONTO IT BY THE INPUT FORMULA.
^A SEQUENCE OF SYMBOLS IS CONSIDERED A NUMBER WHEN IT SATISFIES
THE DEFINITION OF NUMBER IN APPENDIX 2, TOGETHER WITH THE MACHINE DEPENDENT
RESTRICTIONS IMPOSED BY THE UNDERLYING <SIMULA PROGRAMMING SYSTEM.
^NOTE THAT THE DEFINITION OF NUMBER DOES NOT ALLOW ^^FORTRAN\\-LIKE
NUMBERS AS FOR INSTANCE##10.#OR#1_# (AT LEAST ONE DIGIT MUST FOLLOW).
^IT IS RECOMMENDED ALWAYS TO USE BLANKS AS DELIMITERS BETWEEN NUMBERS,
BUT THE USE OF OTHER NON-NUMERICAL SYMBOLS AS DELIMITERS WILL NOT
LEAD TO FATAL ERRORS, ONLY TO A SLIGHT INCREASE IN PROCESSING TIME.
^IF A DATAFILE IS SPECIFIED IN RESPONSE TO THE DATASTREAM REQUEST,
OR IS APPENDED TO THE "^RUN" KEYWORD (CF.#APPENDIX#1),
THE PROGRAM WILL TRY TO READ A RECORD OF INPUT DATA, WHICH THEN DO NOT HAVE TO BE
PRECEDED BY THE KEYWORD "^DATA", FROM THAT FILE. ^THE RECORD ENDS AT THE
NEXT "^EOR"#KEYWORD. ^HOWEVER, A NON EMPTY DATA SPECIFICATION IN THE USER
PROGRAM WILL GET PRIORITY OVER READING INPUT DATA FROM THE SPECIFIED FILE,
WHILE AN EMPTY DATA SPECIFICATION CAUSES THE PROGRAM TO START READING
THE NEXT RECORD OF INPUT DATA FROM THE SPECIFIED FILE. ^IF THE DATA
SPECIFICATION AS WELL AS THE NEXT NONEMPTY RECORD IN THE DATA FILE DOES NOT
CONTAIN ANY NUMERICAL INFORMATION, AN ERROR MESSAGE IS SUPPLIED.
.SK;.TS20,40;<EXAMPLES REAL NUMBER ####VALUE
.SK;# ###1.234 ######1.234
.BR;# ####.98 ######0.98
.BR;# ##-0.5673_#2 ####-56.73
.BR;# ###+.02_#-1 ######0.002
.BR;# ####_#+3 ###1000.0
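.SK;^THE DEFINITION OF NUMBER IN APPENDIX 2 CAN BE TURNED INTO A SMALL
RECOGNIZER. ^THE ^PYTHON SKETCH BELOW IS AN ILLUSTRATION UNDER THE
ASSUMPTION THAT THE EXPONENT SYMBOL IS THE CHARACTER '_#' AS PRINTED IN THE
EXAMPLES ABOVE; THE NAMES IN IT ARE HYPOTHETICAL AND THE MACHINE DEPENDENT
RESTRICTIONS ARE NOT MODELLED.
.SK;.LITERAL
```python
import re

# [sign] [decimal number] [exponent part]; at least one of the last two.
NUMBER = re.compile(r"([+-]?)(\d+(?:\.\d+)?|\.\d+)?(?:#([+-]?\d+))?")


def parse_number(text):
    """Convert one number in the notation of Appendix 2 to a float."""
    m = NUMBER.fullmatch(text)
    if m is None or (m.group(2) is None and m.group(3) is None):
        raise ValueError(f"not a number: {text!r}")
    sign, mantissa, exponent = m.groups()
    # A bare exponent part such as '#+3' stands for 1 times that power.
    literal = mantissa if mantissa is not None else "1"
    if exponent is not None:
        literal += "e" + exponent
    value = float(literal)
    return -value if sign == "-" else value


examples = ["1.234", ".98", "-0.5673#2", "+.02#-1", "#+3"]
values = [parse_number(s) for s in examples]
```
.END LITERAL
^STRINGS SUCH AS '10.' OR '1_#' ARE REJECTED BY THIS SKETCH, IN LINE WITH
THE NOTE ABOVE THAT AT LEAST ONE DIGIT MUST FOLLOW.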
.SK2;2.5##^^THE USER PROGRAM\\
^IN A USER PROGRAM SEVERAL JOBS CAN BE SUBMITTED TO THE PROGRAM.
^EACH JOB IS SEPARATED FROM ITS PRECEDING ONE BY THE KEYWORD "^RUN",
WHILE THE ENTIRE USER PROGRAM IS TERMINATED WITH THE KEYWORD "^EXIT".
^IN THE FIRST JOB, THE MODEL, INPUT AND DATA SPECIFICATION MUST BE GIVEN
IN SOME ORDER. ^THE OPTION SPECIFICATION IS, OF COURSE, OPTIONAL. ^IN EACH
FOLLOWING JOB A SPECIFICATION WHICH IS NOT CHANGED MAY BE OMITTED; THE
PROGRAM THEN RETAINS THE LAST GIVEN SPECIFICATION. ^IF OPTIONS HAVE BEEN
SPECIFIED IN A PREVIOUS JOB AND ONE WANTS TO DELETE THEM, THIS IS DONE
BY PROVIDING A NEW OPTION SPECIFICATION WHICH MAY BE EMPTY IF NO OPTIONS ARE TO BE
EXECUTED (THAT IS BY ONLY PROVIDING: "^OPTIONS";).
^IN FRONT OF EACH JOB OR IN FRONT OF THE KEYWORD "^EXIT" A TEXT
MAY BE GIVEN FOR FURTHER IDENTIFICATION OF THE OUTPUT OF A JOB OR OF
THE OUTPUT OF THE ENTIRE USER PROGRAM. ^THE USE OF QUOTES IN THAT TEXT
SHOULD BE AVOIDED IN VIEW OF CONFUSION WITH THE KEYWORDS. ^THE PROGRAM
STARTS READING A (POSSIBLY EMPTY) TEXT AT THE BEGINNING OF THE NEXT LINE AFTER THE KEYWORD "^RUN" OF
THE PREVIOUS JOB (WITH THE FIRST JOB THE PROGRAM STARTS WITH THE
FIRST LINE IN THE INPUTSTREAM).
.SK2;2.6##^^EXAMPLES\\
^THE FOLLOWING USER PROGRAM CAN BE SUBMITTED WITHOUT ANY MODIFICATION TO THE##^MULTIPLE ^LINEAR ^REGRESSION ^ANALYSIS PROGRAM.
^IT CONSISTS OF FOUR JOBS, THREE OF WHICH ARE PRECEDED BY AN IDENTIFYING HEADER,
WHILE THE WHOLE USER PROGRAM ENDS WITH AN IDENTIFYING TRAILER:
.SK2;.TS8,16,24,32,40,48,56,64,72;.LITERAL
**********************************
* Example 1 originates from: *
* DE JONGE [4], pp. 472 & 479. *
**********************************
"Model" y = c * Log (x) + a + b * x;
"Input" 5 * ([x], 10 * [y]);
"Options" Transformed data matrix, Correlation matrix,
Residual analysis, Process submodels (1, 2);
"Data" 25 0.67 0.70 0.75 0.76 0.78 0.80 0.83 0.84 0.88 0.89
50 0.88 0.92 0.93 0.96 0.98 1.00 1.01 1.03 1.06 1.07
80 0.96 0.98 0.99 1.03 1.05 1.06 1.08 1.11 1.15 1.17
130 1.07 1.09 1.11 1.13 1.14 1.14 1.19 1.22 1.25 1.29
180 1.10 1.13 1.17 1.19 1.20 1.21 1.23 1.25 1.28 1.33
"Run"
**********************************
* Example 2 originates from: *
* SEARLE [11], pp. 121-123 *
**********************************
"Input" 5 * [y, x1, x2, x3];
"Data" 8 2 1 4
10 -1 2 1
9 1 -3 4
6 2 1 2
12 1 4 6
"Model 1" y = a3 * x3 + a2 * x2 + a1 * x1;
"Options" Save original model, Process submodels (1);
"Run"
"Model 2" y - 4 * x1 = b2 * (x1 + x2) + b3 * x3; (eqn. 118, p. 121)
"Options" Test reduced model, Transformed data matrix;
"Run"
****************************************
* Example 3 originates from: *
* AFIFI & AZEN [1], pp. 88 & 93-100. *
****************************************
"Model" y = alfa0 + alfa1 * x;
"Input" 5 * ([x], n, n * [y]);
"Option" Transformed data matrix, Print input data;
"Data" 1 4 1.1 0.7 1.8 0.4
3 5 3.0 1.4 4.9 4.4 4.5
5 3 7.3 8.2 6.2
10 4 12.0 13.1 12.6 13.2
15 4 18.7 19.7 17.4 17.1
"Run"
*** Marten van Gelderen; Mathematisch Centrum ***
"Exit"
.END LITERAL
.PAGE;.C;<CHAPTER 3
.SK;.C;^^THE OUTPUT FROM THE PROGRAM\\
.SK2;3.1##^^STANDARD AND OPTIONAL PRINTED OUTPUT\\
^AFTER HAVING READ THE KEYWORD "^RUN", THE PROCESSING OF THE JOB
IS INITIATED. ^FIRST THE MODEL, INPUT AND OPTION TEXTS ARE PRINTED IN
THIS ORDER. ^NEXT AN ATTEMPT IS MADE AT TRANSLATING THE SPECIFICATIONS.
^ERRORS AGAINST SYNTAX OR SEMANTICS CAUSE ERROR MESSAGES TO BE
PRINTED BELOW EACH SPECIFICATION, WHILE FURTHER PROCESSING OF THAT JOB
CEASES. ^NOTE THAT THE PROCESSING OF THE NEXT JOB, IF PRESENT, WILL BE OF LITTLE OR NO USE
UNLESS THE SPECIFICATION WHICH DEVELOPED THE ERROR(S) IS CHANGED.
^NEXT THE (TRANSFORMED) DATA MATRIX IS FORMED AND PASSED TO THE
REGRESSION ROUTINES, WHICH SUPPLY THE FOLLOWING PRINTED OUTPUT IN THE ORDER
INDICATED:
.LM+3;.SK;#1)#A LISTING OF THE ORIGINAL INPUT DATA (OPTION 6),
.BR;#2)#THE (TRANSFORMED) DATA MATRIX (OPTION 1),
.BR;#3)#PER (TRANSFORMED) VARIABLE THE:
.BR;.I4;MEAN, STANDARD DEVIATION, MINIMUM AND MAXIMUM,
.BR;#4)#THE CORRELATION MATRIX OF THE (TRANSFORMED) VARIABLES (OPTION 2),
.BR;#5)#THE MULTIPLE CORRELATION COEFFICIENT (WITH ADJUSTMENT),
.BR;#6)#THE PROPORTION OF VARIATION EXPLAINED (WITH ADJUSTMENT),
.BR;#7)#THE STANDARD DEVIATION OF THE ERROR TERM,
.BR;#8)#THE ESTIMATES FOR THE REGRESSION PARAMETERS WITH
.BR;.I4;ESTIMATED STANDARD DEVIATION, ^F-RATIO AND RIGHT TAIL PROBABILITY,
.BR;#9)#THE CORRELATION MATRIX OF THE ESTIMATES (OPTION 2),
.BR;10)#THE ANALYSIS OF VARIANCE TABLE,
.BR;11)#THE RESIDUAL ANALYSIS (OPTION 3).
.LM+4;.SK;.I-6;^AD#1)#CF.#SECTION 2.4.
.BR;.I-6;^AD#2)#^THE TRANSFORMED DATA MATRIX GIVES THE INPUT DATA AFTER
POSSIBLE TRANSFORMATIONS ACCORDING TO THE MODEL SPECIFICATIONS HAVE BEEN
APPLIED. ^IF THE MODEL FORMULA CONTAINS NO TRANSFORMATIONS, THE ORIGINAL INPUT DATA
ARE GIVEN. ^THE DEPENDENT VARIABLE IS GIVEN AS A SEPARATE COLUMN.
^IN THE CASE OF REPLICATIONS FOR THE DEPENDENT VARIABLE, THE MEAN VALUE OF THEM IS
GIVEN, AND THE NUMBER OF REPLICATIONS IS GIVEN AS AN EXTRA (LAST) COLUMN.
^IF A WEIGHT-VARIABLE (OR -EXPRESSION) IS SPECIFIED IN THE MODEL FORMULA, THE (TRANSFORMED) DATA COMPRISING THE WEIGHTS ARE GIVEN AS AN EXTRA (LAST) COLUMN.
^EACH (TRANSFORMED) INDEPENDENT VARIABLE IS INDICATED BY
ITS CORRESPONDING PARAMETER. ^THIS ORIGINATED FROM THE FACT THAT
IT#IS#NOT OBVIOUS HOW TO DENOTE A VARIABLE WHICH IS TRANSFORMED LIKE:
^ARCSIN#(^SQRT#(Y+25)), WITH '^ARCSIN', WITH '^SQRT' OR PERHAPS WITH 'Y' ITSELF.
^THE DEPENDENT VARIABLE IS INDICATED BY 'DEP.VAR.'.
.BR;.I-6;^AD#4) AND 9) ^THE MATRICES OF THE ESTIMATED CORRELATION
COEFFICIENTS OF THE VARIABLES AND OF THE ESTIMATES ARE BOTH SUPPLIED
ONLY IF OPTION 2 IS SPECIFIED.
.BR;.I-6;^AD#5), 6) AND 7) CF.#SECTION 1.2.
.BR;.I-6;^AD#8)#^THE ^F-RATIO AND RIGHT TAIL PROBABILITY GIVE THE
USER THE OPPORTUNITY TO TEST THE SIGNIFICANCE OF A PARTICULAR REGRESSION
COEFFICIENT (CF.#SECTION 1.3.1).
.BR;.I-7;^AD#10)#^THE LAYOUT OF THE TABLE CLOSELY RESEMBLES THAT OF THE TABLE IN SECTION 1.3.2.
^THE ^F-RATIOS AND RIGHT TAIL PROBABILITIES GIVE THE
USER THE OPPORTUNITY TO TEST THE SIGNIFICANCE OF ALL THE REGRESSION
COEFFICIENTS OR OF A SUBSET OR COMBINATION THEREOF OR TO TEST THE ADEQUACY OF THE
(LINEAR) MODEL (CF.#SECTION 1.3.2).
.BR;.I-7;^AD#11)#^A TABLE OF OBSERVATIONS, FITTED VALUES, STANDARD
DEVIATIONS OF THE FITTED VALUES, RESIDUALS, STANDARDIZED RESIDUALS AND
STUDENTIZED RESIDUALS IS PROVIDED (CF.#SECTION 1.2.2).
^AS A CHECK ON COMPUTATIONS,
THE SUM OF THE RESIDUALS IS ALSO GIVEN. ^IF AN UNKNOWN CONSTANT TERM
IS PRESENT IN THE MODEL FORMULA, THIS SUM SHOULD BE ZERO. ^FURTHERMORE THE
UPPERBOUND FOR THE RIGHT TAIL PROBABILITY OF THE LARGEST ABSOLUTE STUDENTIZED
RESIDUAL IS GIVEN.
.LM-7; ^WITHOUT OPTIONS SPECIFIED, THE PRINTED OUTPUT FROM THE PROGRAM CONSISTS
OF 3), 5), 6), 7), 8) AND 10). ^IF OPTION 5 IS SPECIFIED, THE OUTPUT FOR THE
MODEL ITSELF IS GIVEN AS SPECIFIED BY THE OTHER OPTIONS, BUT FOR THE
SUBMODELS IT DEPENDS ON THE USE OF A SUBMODEL SPECIFIER LIST.
^WITHOUT THAT LIST THE OUTPUT FROM THE OPTIONS 1, 2 AND 3 IS
SUPPRESSED (EVEN IF THOSE OPTIONS ARE SPECIFIED).
^WITH THAT LIST ONLY THE SUPERFLUOUS PARTS OF THE OUTPUT (THAT
IS THE TRANSFORMED DATA MATRIX AND THE CORRELATION MATRIX OF THE
VARIABLES) ARE SUPPRESSED.
.PAGE;3.2##^^STANDARD AND OPTIONAL DATA OUTPUT\\
^IF AN OUTPUTFILE IS SPECIFIED IN RESPONSE TO THE OUTPUTSTREAM REQUEST,
OR IS APPENDED TO THE "^RUN" KEYWORD (CF.#APPENDIX#1), THE PROGRAM WRITES THE
FOLLOWING PIECES OF INFORMATION IN ONE RECORD TO THAT FILE:
.BR;.LM+3;.I-3;1)#IF OPTION 1 IS SPECIFIED: THE TRANSFORMED DATA MATRIX,
PRECEDED BY THE NUMBER OF ROWS AND COLUMNS RESPECTIVELY,
.BR;.I-3;2)#IF OPTION 2 IS SPECIFIED: THE CORRELATION MATRIX OF THE
VARIABLES, PRECEDED BY ITS ORDER,
.BR;.I-3;3)#THE NUMBER OF SUBMODELS SPECIFIED IN THE LIST APPENDED TO OPTION 5,
IF THAT OPTION IS SPECIFIED AT ALL; OTHERWISE THE NUMBER 1.
^IT IS FOLLOWED BY (FOR EACH (SUB) MODEL): THE NUMBER OF ESTIMATED
PARAMETERS, THE ESTIMATES FOR THE PARAMETERS OF THE (SUB) MODEL, AND:
.BR;.LM+3;.I-3;A)#IF OPTION 2 IS SPECIFIED: THE VARIANCE-COVARIANCE
MATRIX OF THE ESTIMATES, PRECEDED BY ITS ORDER,
(<BE <CAREFUL: THIS IS <NOT THE CORRELATION MATRIX OF THE ESTIMATES,
WHICH IS PRINTED; HOWEVER, THE CORRESPONDENCE BETWEEN THE TWO MATRICES
IS ESTABLISHED BY THE RELATIONS (10) AND (11) IN "^HELP"/^THEORY),
.BR;.I-3;B)#IF OPTION 3 IS SPECIFIED: THE NUMBER OF RESPONDENTS, FOLLOWED
BY FOR EACH RESPONDENT THE: OBSERVATION, FITTED VALUE, STANDARD DEVIATION,
RESIDUAL, STANDARDIZED RESIDUAL AND STUDENTIZED RESIDUAL,
.LM-6;.BR;AND FINISHES BY WRITING AN "^EOR"#KEYWORD.
^AS IN THE CASE OF PRINTED OUTPUT, THE OUTPUT DESCRIBED IN 3) IS ONLY EFFECTED
FOR SUBMODELS, IF AN EXPLICIT SUBMODEL SPECIFIER LIST IS APPENDED
TO OPTION 5.
^AN INPUT SPECIFICATION TO DESCRIBE ONE RECORD OF DATA WRITTEN TO THE OUTPUTSTREAM WHEN
OPTIONS 1,#2,#3 AND 5 (WITH A SUBMODEL SPECIFIER LIST APPENDED TO IT) ARE SPECIFIED, COULD READ:
.SK;"^INPUT"##N, M, N * (M * [TRANSFORMED DATA ELEMENT]),
.BR;.I9;P, _<P * (P+1)#:#2_> * [CORRELATION ELEMENT],
.BR;.I9;S, S * (T, T * [ESTIMATE],
.BR;.I17;Q, _<Q * (Q+1)#:#2_> * [COVARIANCE ELEMENT],
.BR;.I17;R, R * (6 * [RESIDUAL ELEMENT]) );
^FOR THE ORIGINAL MODEL THE FOLLOWING RELATIONS HOLD:#Q#=#T, T#=#M-1, R#=#N AND P#=#M (OR P#=#M-1 IF REPLICATIONS AND/OR WEIGHTS ARE SPECIFIED); S IS THE NUMBER OF
PROCESSED (SUB)MODELS;#FOR EACH SUBMODEL T AND Q ARE DECREASED WITH THE NUMBER OF TERMS THAT ARE OMITTED FROM THE ORIGINAL MODEL.
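.SK;^AS A CHECK ON THE LAYOUT JUST DESCRIBED, THE NUMBER OF VALUES IN ONE
SUCH RECORD CAN BE COUNTED. ^THE ^PYTHON SKETCH BELOW ASSUMES THAT ALL
PARTS OF THE RECORD ARE PRESENT, THAT ':' DENOTES INTEGER DIVISION AND THAT
THE CLOSING "^EOR" KEYWORD IS NOT COUNTED; THE FUNCTION NAMES ARE
HYPOTHETICAL.
.SK;.LITERAL
```python
def triangle(order):
    """Number of elements in the stored half of a symmetric matrix."""
    return order * (order + 1) // 2


def record_length(n, m, p, models):
    """Count the numbers in one output record.

    models holds one (t, q, r) triple per processed (sub)model.
    """
    count = 2 + n * m               # N, M and the data matrix
    count += 1 + triangle(p)        # P and the correlation elements
    count += 1                      # S, the number of (sub)models
    for t, q, r in models:
        count += 1 + t              # T and the estimates
        count += 1 + triangle(q)    # Q and the covariance elements
        count += 1 + 6 * r          # R and six numbers per line
    return count


# Original model only, N = 5 and M = 3, hence T = Q = 2, R = 5, P = 3.
length = record_length(5, 3, 3, [(2, 2, 5)])
```
.END LITERAL
.SK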
^REAL NUMBERS IN THE PRINTED OUTPUT ARE GIVEN IN FIXED POINT FORMAT WITH A
SIX DECIMAL FRACTIONAL PART; THE ONLY EXCEPTIONS ARE THE ESTIMATES FOR THE REGRESSION PARAMETERS WITH THEIR STANDARD DEVIATIONS,
WHICH HAVE A TEN DECIMAL FRACTIONAL PART AND THE NUMBERS IN THE LISTINGS OF THE INPUT DATA AND
THE TRANSFORMED DATA MATRIX, WHICH HAVE A THREE DECIMAL FRACTIONAL PART. ^REAL NUMBERS IN THE DATA OUTPUT ARE GIVEN IN
FLOATING POINT FORMAT WITH A SIXTEEN DECIMAL MANTISSA AND A TWO DECIMAL
EXPONENT PART.
.SK2;3.3##^^ERROR MESSAGES\\
.SK;^ERROR MESSAGES AGAINST SYNTAX OR SEMANTICS HAVE THE FOLLOWING LAYOUT:
.SK;.C;^ERROR#:#_<ERROR TEXT_> OR _<ERROR NUMBER_>
.SK;^THE ERROR TEXT CORRESPONDING TO THE ERROR NUMBERS IS:
.SK;.NO FILL;.NO JUSTIFY;.TAB STOPS 5;
1 ^NO INPUT DATA GIVEN.
2 ^ALL INPUT DATA HAS BEEN SKIPPED.
3 ^ATTEMPT TO PROCESS MORE INPUT DATA THAN PROVIDED.
4 ^NUMBER IN THE INPUT DATA IS INCORRECT OR TOO LARGE.
5 ^IN A NUMBER '.' IS NOT FOLLOWED BY A DIGIT.
6 ^IN A NUMBER '_#' IS NOT FOLLOWED BY '+', '-' OR A DIGIT.
.SK
10 ^NO MODEL FORMULA GIVEN.
11 ^LEFT HAND PART IS NOT FOLLOWED BY '='.
12 ^EXPRESSION IS NOT FOLLOWED BY ')'.
13 ^OPTION NAME USED IN A PRIMARY IN AN EXPRESSION.
14 ^INCORRECT PRIMARY IN A FACTOR IN AN EXPRESSION.
15 ^INCORRECT (CONTROL) IDENTIFIER IN AN EXPRESSION.
16 ^PARAMETER LIST OF A STANDARD FUNCTION IS NOT FOLLOWED BY ')'.
17 ^STANDARD FUNCTION CALL WITH INCORRECT NUMBER OF PARAMETERS.
.SK
20 ^NO INPUT FORMULA GIVEN.
21 ^EXPRESSION IN A CONTROL IS NOT FOLLOWED BY '_>'.
22 ^OPTION NAME USED IN A CONTROL IN AN INPUT STATEMENT.
23 ^INPUT STATEMENT IN A DESCRIPTION IS NOT FOLLOWED BY ')'.
24 ^VARIABLE LIST IN A DESCRIPTION IS NOT FOLLOWED BY ']'.
25 ^INCORRECT DESCRIPTION IN AN INPUT STATEMENT.
26 ^INCORRECT IDENTIFIER IN A VARIABLE LIST.
27 ^ITEM IN A VARIABLE LIST IS NOT AN IDENTIFIER.
.SK
30 ^INCORRECT OPTION NUMBER IN AN OPTION STATEMENT.
31 ^INCORRECT OPTION NAME IN AN OPTION STATEMENT.
32 ^SPECIFIER LIST IS NOT FOLLOWED BY ')'.
33 ^NUMBER IN A SPECIFIER LIST IS INCORRECT OR TOO LARGE.
34 ^SPECIFIER LIST IS APPENDED TO INCORRECT OPTION.
35 ^SPECIFIER IS NOT A NUMBER.
36 ^SPECIFICATION IS NOT PROPERLY CONTINUED.
37 ^SPECIFICATION IS NOT TERMINATED WITH ';'.
.SK
40 ^NO DEFINED (INDEPENDENT) IDENTIFIER TO THE RIGHT OF '='.
41 ^INCORRECT USE OF A PARAMETER IN A REGRESSION TERM.
42 ^UNDEFINED (WEIGHT) IDENTIFIER TO THE LEFT OF '='.
43 ^UNDEFINED (DEPENDENT) IDENTIFIER TO THE LEFT OF '='.
44 ^NUMBER IN A REGRESSION TERM IS INCORRECT OR TOO LARGE.
45 ^TERM DOES NOT HAVE THE FORM: PARAM * FACTOR OR FACTOR * PARAM.
46 ^UNDEFINED (INDEPENDENT) IDENTIFIER IN A REGRESSION TERM.
47 ^NO REGRESSION PARAMETER IN A REGRESSION TERM.
.SK
50 ^DIVISION BY ZERO.
51 ^INTEGER DIVISION BY ZERO.
52 ^OBSERVATION FOR DEPENDENT VARIABLE IS IN ABSOLUTE VALUE TOO LARGE.
53 ^OBSERVATION FOR INDEPENDENT VARIABLE IS IN ABSOLUTE VALUE TOO LARGE.
54 ^EXPONENTIATION WITH ZERO BASE AND NON POSITIVE EXPONENT.
55 ^EXPONENTIATION WITH NEGATIVE BASE AND REAL EXPONENT.
56 ^WEIGHT FACTOR IS NOT POSITIVE.
.SK
60 ^ARGUMENT OF '^SQRT' IS NEGATIVE.
61 ^ARGUMENT OF '^LN' IS NOT POSITIVE.
62 ^ARGUMENT OF '^LOG' IS NOT POSITIVE.
63 ^ARGUMENT OF '^EXP' IS TOO LARGE.
64 ^ARGUMENT OF '^ARCSIN' IS IN ABSOLUTE VALUE LARGER THAN ONE.
65 ^ARGUMENT OF '^ARCCOS' IS IN ABSOLUTE VALUE LARGER THAN ONE.
66 ^ARGUMENT OF '^SINH' IS IN ABSOLUTE VALUE TOO LARGE.
67 ^ARGUMENT OF '^COSH' IS IN ABSOLUTE VALUE TOO LARGE.
.SK
70 ^NUMBER OF OBSERVATIONS FOR THE FIRST DEPENDENT VARIABLE IS ZERO.
71 ^NUMBERS OF OBSERVATIONS FOR THE DEPENDENT VARIABLES ARE NOT EQUAL.
72 ^NUMBER OF OBSERVATIONS FOR THE FIRST INDEPENDENT VARIABLE IS ZERO.
73 ^NUMBERS OF OBSERVATIONS FOR THE INDEPENDENT VARIABLES ARE NOT EQUAL.
74 ^CONTROL READS AN INCORRECT NUMBER IN THE INPUT DATA.
75 ^NUMBERS OF REPLICATIONS FOR THE DEPENDENT VARIABLES ARE NOT EQUAL.
76 ^GIVEN, READ OR COMPUTED REPLICATION FACTOR IS NOT INTEGRAL.
.FILL;.JUSTIFY;.SK
^IF THE ERROR NUMBER LIES BETWEEN:
.BR;.LM+11;.I-11;#5#AND#37,
IT IS FOLLOWED BY THE MOST RECENTLY PROCESSED IDENTIFIER,
NUMBER AND SYMBOL. ^ONLY THE FIRST EIGHT CHARACTERS OF EACH NAME
ARE DISPLAYED.
.BR;.I-11;41#AND#47,
IT IS FOLLOWED BY THE NUMBER OF THE RIGHT HAND PART REGRESSION
TERM WHICH CAUSES THE ERROR, OR A ZERO IF THE LEFT HAND PART IS
AT FAULT.
.BR;.I-11;50#AND#67,
IT IS FOLLOWED BY THE WRONG VALUE AND THE NUMBER
OF THE LINE IN THE TRANSFORMED DATA MATRIX WHICH CAUSES THE ERROR.
^INSTEAD OF THE WRONG VALUE, THE NUMBER OF THE RIGHT HAND PART REGRESSION TERM
WHICH CAUSES THE ERROR IS DISPLAYED WHEN THE ERROR NUMBER
LIES BETWEEN 50 AND 53.
.BR;.I-11;70#AND#76,
IT IS FOLLOWED BY THE CHECK VALUE AND THE WRONG VALUE.
^INSTEAD OF THE WRONG VALUE, THE VALUE OF THE CONTROLLING VARIABLE OF THE NEXT
ENCLOSING REPETITION LOOP IS DISPLAYED WHEN THE ERROR NUMBER IS 76.
.LM-11;.SK2;3.4##<EXAMPLES
^AN IMPRESSION OF THE PRINTED OUTPUT OF THE ^MULTIPLE#^LINEAR#^REGRESSION
^ANALYSIS PROGRAM MAY BE OBTAINED BY ACTUALLY SUBMITTING THE
USER PROGRAM IN SECTION 2.6 (WHICH RESIDES ON: ^HLP:#^MULEXA.HLP)
TO THE <MULREG PROGRAM.
.PG;.C;<APPENDIX 1
.SK;.C;^^TECHNICAL REMARKS\\
^THE FOLLOWING TECHNICAL REMARKS REFLECT THE##<SIMULA IMPLEMENTATION OF
THE ^MULTIPLE ^LINEAR ^REGRESSION ^ANALYSIS PROGRAM AS OF VERSION 5^H(246).
^ANY COMMENTS OR QUERIES CONCERNING THE FUNCTIONING OF THE SOFTWARE DESCRIBED
IN THIS DOCUMENT SHOULD BE ADDRESSED TO:
.BR;.C;^MARTEN VAN ^GELDEREN, <IKO ^COMPUTER ^SYSTEMS ^GROUP, ^POSTBOX 4395,
.BR;.C;1009#<AJ##^AMSTERDAM, ^THE ^NETHERLANDS.##(TELEPHONE:#31-(0)20-930951).
^THE PROGRAM RESIDES ON A DEVICE CALLED <USR:.
^IT IS STARTED BY TYPING: .<R#<MULREG, AND RESPONDS TO THE
STANDARD OUTPUT DEVICE (USUALLY <TTY:) WITH AN IDENTIFYING HEADER AND REQUESTS
TO SPECIFY FOUR FILES, AS FOLLOWS:
.SK;^MULTIPLE ^LINEAR ^REGRESSION ^ANALYSIS
.SK;^ENTER FILE SPECIFICATIONS
.BR;^INPUTSTREAM##:
.BR;^PRINTSTREAM##:
.BR;^DATASTREAM###:
.BR;^OUTPUTSTREAM#:
^THE INPUTSTREAM SERVES TO READ THE USER PROGRAM FROM;
THE PRINTSTREAM RECEIVES THE PRINTED OUTPUT FROM THE PROGRAM;
THE DATASTREAM SERVES TO READ THE SEPARATE INPUT DATA RECORDS FROM;
THE OUTPUTSTREAM RECEIVES THE DATA OUTPUT RECORDS FROM THE PROGRAM.
^IF ONLY A CARRIAGE RETURN (THE DEFAULT) IS GIVEN IN RESPONSE TO THE DATA- AND OUTPUTSTREAM
REQUESTS, THE PROGRAM
ASSUMES THAT NO SEPARATE INPUT DATA ARE PRESENT AND THAT NO DATA OUTPUT
IS REQUIRED.
^IF ONLY A CARRIAGE RETURN (THE DEFAULT) IS GIVEN IN RESPONSE TO THE INPUT- AND PRINTSTREAM
REQUESTS, THE PROGRAM
CONNECTS THE STANDARD INPUT AND OUTPUT DEVICES (USUALLY <TTY:) TO THE
INPUT- AND PRINTSTREAM RESPECTIVELY. ^TO NOTIFY THAT THE INPUTSTREAM
IS CONNECTED TO THE STANDARD INPUT DEVICE, THE PROGRAM DISPLAYS
THE PROMPTING CHARACTER '*' (ASTERISK).
^IF BOTH THE INPUT- AND
PRINTSTREAM ARE CONNECTED TO THE STANDARD INPUT AND OUTPUT DEVICES, THE PROGRAM
ECHOES EVERY TEXT IT CANNOT INTERPRET PROPERLY, PRECEDED BY
THE ERROR CHARACTER '?' (QUESTION MARK). ^HOWEVER, IN
RESPONSE TO A SINGLE CARRIAGE RETURN THE TEXT '^FOR#HELP#TYPE:#"^HELP"' IS DISPLAYED.
^TO THE "^RUN" KEYWORD A FILE SPECIFICATION LIST MAY BE APPENDED,
PRECEDED BY THE CHARACTER '/'#(SLASH), WITH THE FOLLOWING GENERAL FORMAT:
.SK;.C;/PRINT-SPEC;OUTPUT-SPEC=INPUT-SPEC;DATA-SPEC
.SK;^EACH OF THE SPECIFICATIONS WILL BE CONNECTED TO THE CORRESPONDING PROGRAM
STREAM RESPECTIVELY. ^ALSO, EACH OF THE SPECIFICATIONS MAY BE OMITTED,
WHICH MEANS THAT THE CORRESPONDING STREAM IS LEFT UNCHANGED.
^HOWEVER, IF THE CHARACTER '#'#(BLANK) IS SUBSTITUTED FOR ONE (OR MORE) OF THE SPECIFICATIONS, THE DEFAULTS
-#AS DESCRIBED PREVIOUSLY#- WILL BE CONNECTED TO THE CORRESPONDING STREAMS RESPECTIVELY.
^IF THE SPECIFIED FILES DO NOT EXIST (FOR INPUT- OR DATASTREAMS) OR CANNOT
BE CREATED (FOR PRINT- OR OUTPUTSTREAMS), THE CORRESPONDING STREAMS ARE
DISPLAYED AGAIN, FOLLOWED BY THE CHARACTER '?' (QUESTION MARK), TO INDICATE
THE ERRONEOUS SITUATION AND TO ENABLE THE SPECIFICATION OF OTHER FILES
(OR DEFAULTS).
^IF THE PROGRAM ENCOUNTERS A PREMATURE END-OF-FILE CONDITION IN
THE INPUTSTREAM, IT WILL CONNECT THE INPUTSTREAM TO THE STANDARD INPUT
DEVICE AND THUS RESPOND WITH THE PROMPTING CHARACTER. ^NEW SPECIFICATIONS
FOR THE "^MODEL", "^INPUT", "^OPTIONS" OR "^DATA"
(OR KEYWORDS LIKE "^RUN" OR "^EXIT") MAY THEN BE ENTERED.
^IN THE INPUTSTREAM, THE PROGRAM DOES NOT
DISCRIMINATE BETWEEN UPPER AND LOWER CASE LETTERS. ^IDENTIFIERS MAY
CONTAIN ANY NUMBER OF BLANKS; HOWEVER, A CARRIAGE RETURN IS NOT
PERMITTED, RESTRICTING THE MAXIMUM LENGTH OF IDENTIFIERS TO
THE MAXIMUM NUMBER OF CHARACTERS IN ONE INPUT LINE.
^IN FRONT OF AND FOLLOWING THE OPENING QUOTE OF KEYWORDS ONLY
NON-PRINTING <ASCII CHARACTERS LIKE
TABS AND/OR BLANKS ARE PERMITTED, OTHERWISE THE KEYWORD
(AND THE WHOLE LINE FOLLOWING IT) IS NOT RECOGNIZED.
^TWO IMPLEMENTATION DEPENDENT RESTRICTIONS ARE IMPOSED ON USER PROGRAMS:
THE MAXIMUM NUMBER OF DIFFERENTLY SPELLED IDENTIFIERS AND NUMBERS IS 789
AND THE MAXIMUM NUMBER OF NESTED PARENTHESES IS 62.
^IN ADDITION TWO MACHINE DEPENDENT RESTRICTIONS ARE IMPOSED:
THE MAXIMUM NUMBER OF CHARACTERS IN ONE INPUT LINE IS 132
AND THE NUMBER OF SIGNIFICANT DIGITS IN COMPUTATIONS IS 18.
^IN RESPONSE TO THE KEYWORD "^HELP", THE INFORMATION IN THIS APPENDIX
(WHICH RESIDES ON <HLP:)
IS COPIED TO THE PRINTSTREAM. ^EACH OF THE FOLLOWING SWITCHES
MAY BE APPENDED TO THE "^HELP" KEYWORD, PRECEDED BY THE CHARACTER '/' (SLASH),
IN ORDER TO OBTAIN MORE DETAILED INFORMATION (WHICH ALSO RESIDES ON <HLP:).
.LM+5;.TS15
.SK;/^THEORY ^REGRESSION MODEL _& LEAST SQUARES (SECTION#1.1#_.2),
.BR;/^TESTS ^POSSIBLE TESTS OF HYPOTHESES (SECTION#1.3),
.BR;/^MODEL ^SPECIFICATION OF THE MODEL FORMULA (SECTION#2.1),
.BR;/^INPUT ^SPECIFICATION OF THE INPUT FORMULA (SECTION#2.2),
.BR;/^OPTIONS ^POSSIBLE OPTIONS AND THEIR EFFECTS (SECTION#2.3),
.BR;/^DATA ^ACCEPTABLE NUMBERS AND THEIR DELIMITERS (SECTION#2.4),
.BR;/^USER ^SETUP OF A COMPLETE USER PROGRAM (SECTION#2.5),
.BR;/^EXAMPLE ^EXAMPLE OF A COMPLETE USER PROGRAM (SECTION#2.6),
.BR;/^PRINT ^STANDARD AND OPTIONAL PRINTED OUTPUT (SECTION#3.1),
.BR;/^OUTPUT ^STANDARD AND OPTIONAL DATA OUTPUT (SECTION#3.2),
.BR;/^ERRORS ^MEANING OF THE ERROR NUMBERS (SECTION#3.3),
.BR;/^SYNTAX ^DEFINITION OF THE SYNTAX OF A USER PROGRAM (APPENDIX#2).
.LM-5;.SK;^IF NO HELP INFORMATION IS AVAILABLE AN APPROPRIATE MESSAGE IS DISPLAYED.
.PAGE;.C;<APPENDIX 2
.SK;.C;^^DEFINITION OF THE SYNTAX OF A USER PROGRAM\\
.SK; ^THE SYNTAX OF A USER PROGRAM IS DEFINED IN AN EXTENDED VERSION OF A NOTATION KNOWN AS THE
^BACKUS#^NAUR#^FORM, IN SHORT: <BNF (CF.#<NAUR [10]).
^THE EXTENSIONS COMPRISE AN EXPLICIT REPETITION AND OPTIONALITY
CONSTRUCT TOGETHER WITH THE POSSIBILITY OF FACTORIZATION.
^THE <BNF MAY BE REGARDED AS A METALANGUAGE FOR THE DESCRIPTION OF A
USER PROGRAM.
^IN ADDITION TO THE SYMBOLS THAT ARE ADMISSIBLE IN A USER PROGRAM,
THE METALANGUAGE REQUIRES A NUMBER OF EXTRA SYMBOLS, CALLED
METASYMBOLS. ^THE TEN METASYMBOLS USED IN EXTENDED <BNF ARE:##::=, |, _<, _>,
{, }, [, ], ( AND ). ^THE#,#AND#.#ARE PART OF THE METALANGUAGE ^ENGLISH IN WHICH WE
ARE DESCRIBING <BNF. ^WE WRITE:
.SK;.I10;_<EXPRESSION_>::= ['+'#|#'-'] _<TERM_> { ('+'#|#'-') _<TERM_> }
^THE METASYMBOLS _< AND _> ARE USED AS DELIMITERS TO ENCLOSE THE NAME OF
A CLASS. ^THE METASYMBOL#::=#MAY BE READ AS 'IS DEFINED AS' OR AS 'CONSISTS OF'.
^THE METASYMBOL#|#IS READ AS 'OR'.
^REPETITION IS DENOTED BY CURLY BRACKETS, I.E.#{#A#} STANDS FOR E#|#A#|#AA#|#...
^OPTIONALITY IS EXPRESSED BY SQUARE BRACKETS, I.E. [#A#] STANDS FOR E#|#A.
^PARENTHESES MERELY SERVE FOR GROUPING (FACTORIZATION) I.E. (A#|#B)#C STANDS FOR AC#|#BC.
^TERMINAL SYMBOLS APPEAR ENCLOSED IN SINGLE APOSTROPHES.
^THE ABOVE PHRASE DEFINES AN EXPRESSION AS
A TERM, OPTIONALLY PRECEDED BY A '+' OR A '-' AND FOLLOWED BY AN ARBITRARY
REPETITION OF TERMS, EACH PRECEDED BY A '+' OR A '-'.
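.SK;^HOW THE OPTIONALITY AND REPETITION CONSTRUCTS OF THE ABOVE PHRASE
TRANSLATE INTO A RECOGNIZER CAN BE SKETCHED IN ^PYTHON. ^THE SKETCH IS AN
ILLUSTRATION ONLY: A TERM IS SIMPLIFIED TO ONE ALPHANUMERIC TOKEN, THE
INPUT IS ASSUMED TO BE SPLIT INTO TOKENS ALREADY, AND THE FUNCTION NAME IS
HYPOTHETICAL.
.SK;.LITERAL
```python
def is_expression(tokens):
    """Recognize ['+'|'-'] term { ('+'|'-') term } over a token list."""
    pos = 0

    def term():
        # Simplification: a term is a single alphanumeric token.
        nonlocal pos
        if pos < len(tokens) and tokens[pos].isalnum():
            pos += 1
            return True
        return False

    # Square brackets: the leading sign may be present or absent.
    if pos < len(tokens) and tokens[pos] in ("+", "-"):
        pos += 1
    if not term():
        return False
    # Curly brackets: zero or more repetitions of a sign and a term.
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        pos += 1
        if not term():
            return False
    return pos == len(tokens)


accepted = is_expression(["-", "a", "+", "b", "-", "25"])
rejected = is_expression(["a", "+"])
```
.END LITERAL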
.SK;^THE SYNTAX OF A USER PROGRAM CAN THUS BE DEFINED AS FOLLOWS:
.PAGE;.NO FILL;.NO JUSTIFY;
.BR;_<LETTER_>::= ^^'A'|'B'|'C'|'D'|'E'|'F'|'G'|'H'|'I'|'J'|'K'|'L'|'M'|
.BR;.I12;'N'|'O'|'P'|'Q'|'R'|'S'|'T'|'U'|'V'|'W'|'X'|'Y'|'Z'|\\
.BR;.I12;'A'|'B'|'C'|'D'|'E'|'F'|'G'|'H'|'I'|'J'|'K'|'L'|'M'|
.BR;.I12;'N'|'O'|'P'|'Q'|'R'|'S'|'T'|'U'|'V'|'W'|'X'|'Y'|'Z'
.SK;_<DIGIT_>::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
.SK;_<MODEL KEYWORD_>::= '"^MODEL"' | '"<MO"'
.BR;_<INPUT KEYWORD_>::= '"^INPUT"' | '"<IN"'
.BR;_<OPTION KEYWORD_>::= '"^OPTIONS"' | '"<OP"'
.BR;_<DATA KEYWORD_>::= '"^DATA"' | '"<DA"'
.BR;_<RUN KEYWORD_>::= '"^RUN"' | '"<RU"'
.BR;_<EXIT KEYWORD_>::= '"^EXIT"' | '"<EX"'
.SK;_<FUNCTION NAME_>::= '^ABS' | '^SIGN' | '^SQRT' | '^SIN' | '^COS' | '^TAN' |
.BR;.I19;'^LN' | '^LOG' | '^EXP' | '^ENTIER' | '^ROUND' | '^MOD' |
.BR;.I19;'^MIN' | '^MAX' | '^ARCSIN' | '^ARCCOS' | '^ARCTAN' |
.BR;.I19;'^SINH' | '^COSH' | '^TANH' | '^INDICATOR'
.SK;_<OPTION NAME_>::= '^TRANSFORMED DATA MATRIX' | '^CORRELATION MATRIX' |
.BR;.I17;'^RESIDUAL ANALYSIS' | '^NO REGRESSION ANALYSIS' |
.BR;.I17;'^PROCESS SUBMODELS' | '^PRINT INPUT DATA' |
.BR;.I17;'^NO INPUT DATA REWIND' | '^SAVE ORIGINAL MODEL' |
.BR;.I17;'^TEST REDUCED MODEL' | '^MISSING VALUES'
.SK;_<OPTION NUMBER_>::= '1' | '2' | '3' | '4' | '5' |
.BR;.I19;'6' | '7' | '8' | '9' | '10'
.SK;_<NUMBER_>::= ['+' | '-'] _<UNSIGNED NUMBER_>
.BR;_<UNSIGNED NUMBER_>::= _<DECIMAL NUMBER_> | _<EXPONENT PART_> |
.BR;.I21;_<DECIMAL NUMBER_> _<EXPONENT PART_>
.BR;_<DECIMAL NUMBER_>::= _<UNSIGNED INTEGER_> | _<FRACTIONAL PART_> |
.BR;.I20;_<UNSIGNED INTEGER_> _<FRACTIONAL PART_>
.BR;_<EXPONENT PART_>::= '_#' _<INTEGER_>
.BR;_<FRACTIONAL PART_>::= '.' _<UNSIGNED INTEGER_>
.BR;_<INTEGER_>::= ['+' | '-'] _<UNSIGNED INTEGER_>
.BR;_<UNSIGNED INTEGER_>::= _<DIGIT_> { _<DIGIT_> }
.SK;_<IDENTIFIER_>::= _<LETTER_> { _<LETTER_> | _<DIGIT_> }
.SK;_<DATA SPECIFICATION_>::= _<DATA KEYWORD_> [ _<INPUT DATA_> ]
.BR;_<INPUT DATA_>::= _<NUMBER_> { _<NUMBER_> }
.SK;_<OPTION SPECIFICATION_>::= _<OPTION KEYWORD_> [ _<OPTION STATEMENT_> ] ';'
.BR;_<OPTION STATEMENT_>::= _<OPTION_> { ',' _<OPTION_> }
.BR;_<OPTION_>::= _<SIMPLE OPTION_> [ '(' _<SPECIFIER LIST_> ')' ]
.BR;_<SIMPLE OPTION_>::= _<OPTION NAME_> | _<OPTION NUMBER_>
.BR;_<SPECIFIER LIST_>::= _<SPECIFIER_> { ',' _<SPECIFIER_> }
.BR;_<SPECIFIER_>::= _<NUMBER_>
.SK;_<INPUT SPECIFICATION_>::= _<INPUT KEYWORD_> _<INPUT STATEMENT_> ';'
.BR;_<INPUT STATEMENT_>::= _<INPUT PART_> { ',' _<INPUT PART_> }
.BR;_<INPUT PART_>::= _<CONTROL_> | _<DESCRIPTION_> | _<CONTROL_> '*' _<DESCRIPTION_>
.BR;_<CONTROL_>::= _<NUMBER_> | _<IDENTIFIER_> | '_<' _<EXPRESSION_> '_>'
.BR;_<DESCRIPTION_>::= '(' _<INPUT STATEMENT_> ')' | '[' _<VARIABLE LIST_> ']'
.BR;_<VARIABLE LIST_>::= _<VARIABLE_> { ',' _<VARIABLE_> }
.BR;_<VARIABLE_>::= _<IDENTIFIER_>
.SK;_<MODEL SPECIFICATION_>::= _<MODEL KEYWORD_> _<MODEL STATEMENT_> ';'
.BR;_<MODEL STATEMENT_>::= _<LEFT HAND PART_> '=' _<RIGHT HAND PART_>
.BR;_<LEFT HAND PART_>::= _<EXPRESSION_> [ '_&' _<WEIGHT PART_> ]
.BR;_<WEIGHT PART_>::= _<EXPRESSION_>
.BR;_<RIGHT HAND PART_>::= ['+'] _<TERM_> { '+' _<TERM_> }
.BR;_<EXPRESSION_>::= ['+' | '-'] _<TERM_> { ('+' | '-') _<TERM_> }
.BR;_<TERM_>::= _<FACTOR_> { ('*' | '/' | ':') _<FACTOR_> }
.BR;_<FACTOR_>::= _<PRIMARY_> { '_^' _<PRIMARY_> }
.BR;_<PRIMARY_>::= _<UNSIGNED NUMBER_> | _<IDENTIFIER_> |
.BR;.I13;_<FUNCTION DESIGNATOR_> | '(' _<EXPRESSION_> ')'
.BR;_<FUNCTION DESIGNATOR_>::= _<FUNCTION NAME_> [ '(' _<PARAMETER LIST_> ')' ]
.BR;_<PARAMETER LIST_>::= _<PARAMETER_> { ',' _<PARAMETER_> }
.BR;_<PARAMETER_>::= _<EXPRESSION_>
.SK;_<USER PROGRAM_>::= { _<JOB_> } _<EXIT KEYWORD_>
.BR;_<JOB_>::= { _<SPECIFICATION_> } _<RUN KEYWORD_>
.BR;_<SPECIFICATION_>::= _<MODEL SPECIFICATION_> | _<INPUT SPECIFICATION_> |
.BR;.I19;_<OPTION SPECIFICATION_> | _<DATA SPECIFICATION_>
.PAGE;.FILL;.JUSTIFY;.C;<APPENDIX 3
.SK;.C;^^TECHNICAL DESCRIPTION OF THE PROGRAM\\
.SK; ^IN THIS APPENDIX A MORE OR LESS TECHNICAL DESCRIPTION OF THE PROGRAM
IS GIVEN IN TERMS OF VARIABLES, PROCEDURES AND CONTROL FLOW.
.SK;^BASICALLY THE PROGRAM LOGIC IS AS FOLLOWS:
.SK;.LM+17;^INIT PROGRAM
.BR;^ENTER _& OPEN FILES
.BR;.I-5;^JOB:#^READ JOB
.BR;.I-5;^RUN:#^INIT COMPILER TABLES
.BR;^COMPILE MODEL _& INPUT
.BR;^INIT DATA BUFFERS
.BR;^EXECUTE (TO PRODUCE DESIGN MATRIX)
.BR;^REGRESSION ANALYSIS
.BR;^PRINT RESULTS
.BR;<GOTO ^JOB
.BR;.I-6;^EXIT:#^CLOSE FILES
.LM-17;.SK;^IF AN ERROR SITUATION ARISES IN ONE OF THE SECTIONS, NO FURTHER
ACTION IS UNDERTAKEN AND CONTROL IS TRANSFERRED TO ^JOB.
.SK;1. ^THE PROGRAM IS INITIALIZED BY SETTING VARIOUS DECLARED SYSTEM AND
COMPILER CONSTANTS TO THEIR APPROPRIATE VALUES. ^SOME OF THEM ARE HIGHLY
MACHINE DEPENDENT CONSTANTS, OTHERS ARE INTERNAL CODINGS OR TABLE LIMITS.
^THE FOUR BASIC INFORMATION STREAMS IN THE PROGRAM (THE INPUT, PRINT,
DATA AND OUTPUT STREAMS) ARE CONNECTED TO THE EXTERNAL FILE
SPECIFICATIONS ENTERED BY THE USER; THIS IS DONE IN THE PROCEDURE ENTERFILES.
.SK;2. ^A JOB IS READ SPECIFICATION BY SPECIFICATION (AFTER PRINTING AND
SKIPPING LEADING TEXT BY A CALL ON ECHOTEXT) BY MEANS OF CALLS ON
READTEXT OR READDATA, DEPENDING ON WHICH SPECIFICATION FROM THE JOB IS READ.
^BOTH PROCEDURES FIRST SET UP A BUFFER ADMINISTRATION AS WILL BE DESCRIBED IN SECTION 3,
THEN PROCESS AN INPUT LINE OR INPUT NUMBERS AND FINALLY
CALL ON READLINE TO OBTAIN FURTHER INFORMATION. ^THEIR TASK IS FINISHED
WHEN THE VARIABLE ENDTEXT BECOMES TRUE, THAT IS WHEN THE NEXT INPUT LINE STARTS WITH
A '"' (QUOTE), WHICH IS THE BEGINNING OF WHAT CAN BE A KEYWORD, OR WHEN THE
END-OF-FILE CONDITION IS MET.
^IF AN INPUT DATA RECORD IS TO BE READ FROM A FILE, THE SAME ADMINISTRATION IS
SET UP AND THE SAME ROUTINE FOR READING THE ACTUAL NUMBERS IS USED,
EXCEPT THAT THE INFORMATION IS OBTAINED FROM THE DATA STREAM
RATHER THAN THE INPUT STREAM. ^THE ECHOING OF INPUT TEXT AND INPUT DATA IS
DONE VIA CALLS ON THE PROCEDURES PRINTTEXT AND PRINTDATA RESPECTIVELY.
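.SK;^THE WAY A JOB IS CUT INTO SPECIFICATIONS CAN BE ILLUSTRATED BY THE
FOLLOWING MINIMAL SKETCH IN ^PYTHON. ^IT IS NOT THE ACTUAL READING ROUTINE:
THE FUNCTION NAME READJOB, THE REDUCTION OF EVERY KEYWORD TO ITS FIRST TWO
LETTERS AND THE INDIFFERENCE TO UPPER AND LOWER CASE ARE ASSUMPTIONS MADE
FOR THE ILLUSTRATION. ^ONLY THE RULE THAT A LINE STARTING WITH A QUOTE ENDS THE
CURRENT TEXT IS TAKEN FROM THE DESCRIPTION ABOVE.
.SK;.NO FILL;.NO JUSTIFY;
.LITERAL
```python
def readjob(lines):
    """Split input lines into (keyword, text lines) pairs for one job."""
    specifications = []
    for line in lines:
        if line.startswith('"'):
            # A leading quote ends the current text: a keyword may follow.
            word, _, rest = line[1:].partition('"')
            # Assumption: two letters identify a keyword, case is ignored.
            keyword = word.strip().upper()[:2]
            specifications.append((keyword, [rest.strip()]))
            if keyword in ("RU", "EX"):
                break                      # RUN or EXIT closes the job
        elif specifications:
            specifications[-1][1].append(line.strip())
        # Lines before the first keyword are leading text, skipped here.
    return specifications


job = readjob([
    "TITLE OF THE JOB",
    '"MODEL" Y = A + B*X;',
    '"INPUT" 10*[X, Y];',
    '"DATA" 1 2',
    "3 4",
    '"RUN"',
    '"EXIT"',
])
print([keyword for keyword, text in job])
```
.END LITERAL
.FILL;.JUSTIFY;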
.SK;3. ^THE TEXT AND DATA STORAGE SECTION PROVIDES TWO KINDS OF
<SIMULA CLASSES, ONE FOR TEXT STORAGE AND ANOTHER FOR DATA STORAGE.
^THE CLASS TEXTSTORAGE PROVIDES A PROCEDURE TO STORE LINES OF TEXT
INTO A LINKED LIST OF TEXTBUFFERS (STARTING AT BASE). ^THE PROCEDURE
NEXTLINE RETRIEVES THOSE LINES (AFTER A RESET). ^THE CLASS DATASTORAGE
PROVIDES A PROCEDURE TO STORE REAL NUMBERS INTO A DATABUFFER WHICH IS
LINKED INTO A LINKED LIST OF SUCH BUFFERS (STARTING AT BASE). ^IN
CASE OF BUFFER OVERFLOW A NEW BUFFER IS CREATED AND LINKED INTO THE LIST.
^A SUPER CLASS INPUTSTORAGE PROVIDES THE RESET PROCEDURE AND A PROCEDURE
NEXTNUMBER TO RETRIEVE NUMBERS OUT OF THE BUFFERS IN A <FIFO (FIRST
IN, FIRST OUT) MANNER. ^ANOTHER SUPER CLASS RIGHTSTORAGE PROVIDES
A PROCEDURE LASTNUMBER, WHICH IS QUITE SIMILAR TO NEXTNUMBER, EXCEPT THAT
IT RETRIEVES NUMBERS OUT OF THE BUFFERS IN A <LIFO (LAST IN, FIRST OUT)
MANNER.
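.SK;^THE BUFFER ADMINISTRATION DESCRIBED ABOVE CAN BE ILLUSTRATED BY THE
FOLLOWING MINIMAL SKETCH IN ^PYTHON. ^IT IS NOT THE ACTUAL <SIMULA TEXT:
THE BUFFER CAPACITY, THE ATTRIBUTE NAMES AND THE CHOICE TO LET LASTNUMBER
REMOVE THE NUMBER IT DELIVERS ARE ASSUMPTIONS MADE FOR THE ILLUSTRATION,
AND THE TWO SUPERCLASSES ARE MERGED INTO ONE CLASS FOR BREVITY.
.SK;.NO FILL;.NO JUSTIFY;
.LITERAL
```python
class DataBuffer:
    """One fixed-capacity buffer in the linked list."""

    def __init__(self):
        self.numbers = []
        self.next = None
        self.prev = None


class DataStorage:
    def __init__(self, capacity=4):          # capacity is an assumed value
        self.capacity = capacity
        self.base = DataBuffer()             # the list starts at base
        self.last = self.base
        self.reset()

    def store(self, value):
        if len(self.last.numbers) == self.capacity:
            # Buffer overflow: create a new buffer and link it into the list.
            buffer = DataBuffer()
            buffer.prev = self.last
            self.last.next = buffer
            self.last = buffer
        self.last.numbers.append(float(value))

    def reset(self):
        self.current, self.index = self.base, 0

    def nextnumber(self):
        """Deliver the numbers first in, first out."""
        while self.current is not None and self.index >= len(self.current.numbers):
            self.current, self.index = self.current.next, 0
        if self.current is None:
            raise IndexError("no more numbers")
        value = self.current.numbers[self.index]
        self.index += 1
        return value

    def lastnumber(self):
        """Deliver the numbers last in, first out (assumed to remove them)."""
        while not self.last.numbers and self.last.prev is not None:
            self.last = self.last.prev
            self.last.next = None
        if not self.last.numbers:
            raise IndexError("no more numbers")
        return self.last.numbers.pop()


storage = DataStorage(capacity=2)
for number in (1, 2, 3, 4, 5):
    storage.store(number)
storage.reset()
fifo = [storage.nextnumber() for _ in range(5)]
lifo = [storage.lastnumber(), storage.lastnumber()]
```
.END LITERAL
.FILL;.JUSTIFY;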
.SK;4. ^THE COMPILER USES TWO TABLES:#ONE CALLED PROGRAM AND ANOTHER CALLED
HASHTABLE. ^THE FIRST TABLE ACCEPTS THE (MACRO) INSTRUCTIONS GENERATED
BY THE COMPILER VIA CALLS OF LOAD. ^THE SECOND TABLE IS USED BY
THE PROCEDURE NEXTATOM TO IDENTIFY THE VARIOUS ITEMS IN THE MODEL
AND INPUT FORMULAE.
^THE RECOGNITION METHOD USED IS
TWIN-PRIME HASHING, DESCRIBED IN <KNUTH [5] P.#522.
^ALPHANUMERICAL ITEMS CAN BELONG TO ONE OF THE FOLLOWING
CLASSES: IDENTIFIER, NUMBER, FUNCTIONNAME OR OPTIONNAME,
WHICH ARE ALL SUBCLASSES OF ATOMCELL.
^THE PROCEDURE NEXTATOM
CALLS ON NEXTCHAR AND DELIVERS A POINTER TO AN ATOMCELL OF THE APPROPRIATE
TYPE CONTAINING, AMONG OTHERS, THE ACTUAL TEXT OF THE ITEM AND ITS
INDEX IN THE HASHTABLE.
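.SK;^THE TABLE LOOKUP CAN BE ILLUSTRATED BY THE FOLLOWING MINIMAL SKETCH IN
^PYTHON. ^IT ASSUMES THAT TWIN-PRIME HASHING MEANS OPEN ADDRESSING WITH
DOUBLE HASHING IN A TABLE OF SIZE ^M, WHERE ^M AND ^M-2 ARE BOTH PRIME.
^THE CLASS NAME, THE TABLE SIZE AND THE WAY A TEXT IS TURNED INTO A
NUMERICAL KEY ARE HYPOTHETICAL AND ARE NOT TAKEN FROM THE PROGRAM.
.SK;.NO FILL;.NO JUSTIFY;
.LITERAL
```python
class TwinPrimeTable:
    def __init__(self, size=1021):
        # Assumption: size and size - 2 are twin primes (1021 and 1019).
        self.size = size
        self.slots = [None] * size

    @staticmethod
    def key(text):
        # Hypothetical key: the characters read as digits of a large number.
        value = 0
        for character in text:
            value = value * 128 + ord(character)
        return value

    def lookup(self, text):
        """Return the index of text, entering it when it is not yet present."""
        value = self.key(text)
        index = value % self.size                 # first probe
        step = 1 + value % (self.size - 2)        # probe distance, never 0
        for _ in range(self.size):
            if self.slots[index] is None:
                self.slots[index] = text
                return index
            if self.slots[index] == text:
                return index
            index = (index - step) % self.size
        raise OverflowError("hash table is full")


table = TwinPrimeTable(13)                        # 13 and 11 are twin primes
first = table.lookup("X1")
again = table.lookup("X1")
```
.END LITERAL
.FILL;.JUSTIFY;
.SK;^BECAUSE THE TABLE SIZE IS PRIME, EVERY PROBE DISTANCE VISITS ALL
POSITIONS OF THE TABLE BEFORE IT REPEATS ITSELF.
.SK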
^THE COMPILER ASSUMES THAT THE RUNNING SYSTEM HAS AT ITS
DISPOSAL A (PROGRAMMED PSEUDO) REGISTER ^F WHICH IS CAPABLE OF
HANDLING BOTH INTEGER AND REAL NUMBERS. ^FURTHERMORE A (PROGRAMMED
PSEUDO) MEMORY ORGANIZATION KNOWN AS A STACK MUST BE AVAILABLE TO THE RUNNING
SYSTEM. ^A STACKPOINTER REFERS TO THE FIRST FREE POSITION
IN THE STACK. ^ALL BINARY OPERATIONS WILL TAKE PLACE WITH
THE TOP OF THE STACK AS FIRST OPERAND AND ^F AS THE SECOND. ^THE RESULT
IS DELIVERED IN ^F AND AS A SIDE EFFECT THE STACKPOINTER IS DECREASED
BY 1. ^WHEN THE CONTENTS OF ^F IS SAVED IN THE STACK, THE STACKPOINTER
IS INCREASED BY 1.
^THE FUNDAMENTAL IDEA BEHIND MOST PROCEDURES FOR TRANSLATING
THE MODEL AND INPUT FORMULAE IS THAT THE FIRST ATOM OF THE SYNTACTICAL
UNIT TO BE PROCESSED BY THAT PROCEDURE HAS BEEN READ ALREADY (ITS 'VALUE'
BEING ASSIGNED TO LASTATOM). ^THE PROCEDURE CONSIDERS ITSELF TO HAVE
FINISHED ITS TASK AFTER READING THE FIRST ATOM THAT CAN NO LONGER BELONG
TO THAT UNIT SYNTACTICALLY. ^MEANWHILE THE TRANSLATION OF THAT UNIT
HAS BEEN PRODUCED. ^A MORE ELABORATE DESCRIPTION OF THE PROCEDURE
SYSTEM FOR (ARITHMETIC) EXPRESSIONS CAN BE FOUND IN <KRUSEMAN <ARETZ [6] AND [7]. ^REFERENCE
[8] PROVIDES THE DESCRIPTION OF A COMPLETE <ALGOL 60 COMPILER. ^WE ONLY
MENTION HERE THAT EVERY EXPRESSION IS TRANSFORMED INTO A MACRO PROGRAM
THAT CORRESPONDS TO THE REVERSED POLISH FORM, THUS:
.SK;.C;(A+B)#*#(C-D)#_^#E##BECOMES:##AB+#CD-#E#_^#*.
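.SK;^THE TRANSLATION INTO REVERSED POLISH FORM CAN BE ILLUSTRATED BY THE
FOLLOWING MINIMAL SKETCH IN ^PYTHON. ^IT FOLLOWS THE SYNTAX RULES FOR
EXPRESSION, TERM, FACTOR AND PRIMARY OF APPENDIX 2 AND THE CONVENTION
THAT EACH PROCEDURE FINDS ITS FIRST ATOM IN LASTATOM. ^IT IS NOT THE ACTUAL
COMPILER: THE FUNCTION NAME TORPN, THE SPLITTING INTO ATOMS, THE MARK NEG
FOR A LEADING MINUS SIGN AND THE OMISSION OF FUNCTION DESIGNATORS AND
EXPONENT PARTS ARE ASSUMPTIONS MADE FOR THE ILLUSTRATION.
.SK;.NO FILL;.NO JUSTIFY;
.LITERAL
```python
import re


def torpn(text):
    """Translate an expression into a list of atoms in reversed Polish form."""
    # Simplified atoms: identifiers, plain numbers, single symbols.
    atoms = re.findall(r"[A-Za-z][A-Za-z0-9]*|\d+\.?\d*|\S", text) + ["<end>"]
    position = 0
    lastatom = atoms[0]          # the first atom has been read already
    output = []

    def nextatom():
        nonlocal position, lastatom
        position += 1
        lastatom = atoms[position]

    def primary():
        if lastatom == "(":
            nextatom()
            expression()
            if lastatom != ")":
                raise SyntaxError("')' expected")
        elif re.match(r"[A-Za-z0-9]", lastatom):
            output.append(lastatom)
        else:
            raise SyntaxError("primary expected, found " + lastatom)
        nextatom()

    def factor():
        primary()
        while lastatom == "^":
            nextatom()
            primary()
            output.append("^")

    def term():
        factor()
        while lastatom in ("*", "/", ":"):
            operator = lastatom
            nextatom()
            factor()
            output.append(operator)

    def expression():
        sign = lastatom if lastatom in ("+", "-") else None
        if sign:
            nextatom()
        term()
        if sign == "-":
            output.append("NEG")     # hypothetical mark for a leading minus
        while lastatom in ("+", "-"):
            operator = lastatom
            nextatom()
            term()
            output.append(operator)

    expression()
    if lastatom != "<end>":
        raise SyntaxError("unexpected " + lastatom)
    return output


print(" ".join(torpn("(A+B) * (C-D) ^ E")))
```
.END LITERAL
.FILL;.JUSTIFY;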
.SK;^THE PROCEDURE FOR TRANSLATING THE INPUT FORMULA MUST, AMONG
OTHERS, GENERATE INSTRUCTIONS TO PERFORM THE LINKAGE BETWEEN IDENTIFIERS
FROM THE MODEL FORMULA AND NUMBERS FROM THE INPUT DATA.
^WHILE TRANSLATING A MODEL FORMULA, IDENTIFIERS TO THE RIGHT OF THE
EQUAL SIGN ARE ASSIGNED TYPE 1, THOSE TO THE LEFT OF THE EQUAL SIGN
TYPE 2. ^IF THESE IDENTIFIERS APPEAR IN A VARIABLE LIST THE TYPES
ARE CHANGED INTO 3 AND 5 RESPECTIVELY. ^WHILE TRANSLATING AN INPUT SPECIFICATION,
IDENTIFIERS IN A VARIABLE LIST
NOT APPEARING IN THE MODEL FORMULA ARE ASSIGNED TYPE 4, THOSE IN THE
INPUT FORMULA NOT APPEARING IN A VARIABLE LIST ARE ASSIGNED TYPE 6.
^MEANWHILE INSTRUCTIONS ARE GENERATED TO PUT THE NEXT NUMBER FROM THE
INPUT DATA IN THE APPROPRIATE COLUMN OF THE (YET UNTRANSFORMED) DESIGN MATRIX,
OR TO SKIP THAT NUMBER. ^FOR VARIABLE LISTS THAT CONSIST ENTIRELY OF
IDENTIFIERS NOT APPEARING IN THE MODEL FORMULA, SPECIAL INSTRUCTIONS
ARE GENERATED THAT SKIP ALL THE CORRESPONDING NUMBERS IN ONE STEP.
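.SK;^THE ASSIGNMENT OF TYPES TO IDENTIFIERS CAN BE ILLUSTRATED BY THE
FOLLOWING MINIMAL SKETCH IN ^PYTHON. ^THE FUNCTION NAME AND ITS FOUR
ARGUMENTS ARE HYPOTHETICAL. ^THE SKETCH ASSUMES THAT NO IDENTIFIER APPEARS
ON BOTH SIDES OF THE EQUAL SIGN AND THAT AN IDENTIFIER ALREADY TYPED BY
THE MODEL FORMULA KEEPS ITS TYPE WHEN IT OCCURS IN THE INPUT FORMULA
OUTSIDE A VARIABLE LIST; THE TEXT ABOVE DOES NOT SETTLE THESE CASES.
.SK;.NO FILL;.NO JUSTIFY;
.LITERAL
```python
def assigntypes(left, right, variablelists, otherinput):
    """Return a dictionary from identifier to its type (1 to 6)."""
    types = {}
    for name in right:               # model formula, right of the equal sign
        types[name] = 1
    for name in left:                # model formula, left of the equal sign
        types[name] = 2
    for name in variablelists:       # identifiers in some variable list
        if types.get(name) == 1:
            types[name] = 3
        elif types.get(name) == 2:
            types[name] = 5
        elif name not in types:
            types[name] = 4          # read from the data, unused in the model
    for name in otherinput:          # input formula, outside variable lists
        types.setdefault(name, 6)    # assumption: model types are kept
    return types


types = assigntypes(left=["Y"], right=["A", "X1"],
                    variablelists=["Y", "X1", "Z"], otherinput=["N"])
print(types)
```
.END LITERAL
.FILL;.JUSTIFY;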
.SK;5. ^THE PROCEDURE CHECK MODEL CHECKS WHETHER THE MODEL
FORMULA, AFTER THE LINKAGE TO THE INPUT DATA (BY MEANS OF THE INPUT FORMULA),
STILL SATISFIES SOME ELEMENTARY STATISTICAL CONDITIONS, SUCH AS:
.LM+3;.BR;.I-3;A)#EACH TERM MUST BE THE PRODUCT OF A PARAMETER AND A FACTOR,
.BR;.I-3;B)#IN THAT FACTOR NO IDENTIFIER MAY APPEAR THAT IS NOT PRESENT
IN A VARIABLE LIST IN THE INPUT FORMULA. (^AN ATTEMPT TO PERFORM
REGRESSION ANALYSIS WITH VARIABLES FOR WHICH NO INPUT DATA IS PRESENT MAY
NOT SUCCEED.)
.LM-3; ^THE PROCEDURE CHECK INPUT CHECKS WHETHER AN EQUAL NUMBER OF VALUES
IS PRESENT FOR EACH VARIABLE IN THE MODEL FORMULA. ^MOREOVER,
A CHECK IS MADE WHETHER ALL NUMBERS IN THE INPUT DATA HAVE ACTUALLY BEEN
PROCESSED (OPTION 7 DISENGAGES THIS CHECK).
.SK;6. ^THE EXECUTION SECTION OF THE PROGRAM IS ACTIVATED BY A CALL
OF THE PROCEDURE EXECUTE WHICH, AMONG OTHER THINGS, SIMULATES THE
BASIC CYCLE OF A COMPUTER:
.SK;.LM+15;.I-6;NEXT:#GET THE INSTRUCTION INDICATED BY THE PROGRAMCOUNTER
.BR;INCREASE THE PROGRAMCOUNTER BY 1
.BR;ISOLATE THE INSTRUCTION AND ADDRESS PART
.BR;EXECUTE THE INSTRUCTION
.BR;<GOTO NEXT
.LM-15;.SK;^THIS CYCLE ENDS WHEN THE PROGRAMCOUNTER TRIES TO LEAVE THE
(TRANSLATED MACRO) PROGRAM. ^THE (MACRO) INSTRUCTIONS THEMSELVES ARE CODED
VIA THE SWITCH LISTS MACRO AND MACRO2.
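.SK;^THE BASIC CYCLE, TOGETHER WITH THE REGISTER ^F AND THE STACK OF
SECTION 4, CAN BE ILLUSTRATED BY THE FOLLOWING MINIMAL SKETCH IN ^PYTHON.
^THE INSTRUCTION NAMES (LOAD, SAVE, ADD, SUB, MUL, DIV, POW) AND THE
REPRESENTATION OF AN INSTRUCTION AS A PAIR OF AN INSTRUCTION PART AND AN
ADDRESS PART ARE HYPOTHETICAL; THE ACTUAL MACRO INSTRUCTIONS ARE NOT
LISTED IN THIS APPENDIX.
.SK;.NO FILL;.NO JUSTIFY;
.LITERAL
```python
import operator

# Hypothetical binary instructions: top of the stack first, F second.
BINARY = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul,
          "DIV": operator.truediv, "POW": operator.pow}


def execute(program, data):
    """Run a list of (instruction, address) pairs and return register F."""
    f = 0.0                  # the pseudo register F
    stack = []               # its length plays the role of the stackpointer
    programcounter = 0
    while 0 <= programcounter < len(program):
        word = program[programcounter]       # get the instruction
        programcounter += 1                  # increase the programcounter
        instruction, address = word          # isolate both parts
        if instruction == "LOAD":
            f = data[address]
        elif instruction == "SAVE":
            stack.append(f)                  # stackpointer increased by 1
        elif instruction in BINARY:
            f = BINARY[instruction](stack.pop(), f)   # decreased by 1
        else:
            raise ValueError("unknown instruction " + instruction)
    return f


# (A+B) * (C-D) ^ E in reversed Polish form: AB+ CD- E ^ *
program = [("LOAD", "A"), ("SAVE", None), ("LOAD", "B"), ("ADD", None),
           ("SAVE", None), ("LOAD", "C"), ("SAVE", None), ("LOAD", "D"),
           ("SUB", None), ("SAVE", None), ("LOAD", "E"), ("POW", None),
           ("MUL", None)]
result = execute(program, {"A": 1.0, "B": 2.0, "C": 5.0, "D": 3.0, "E": 2.0})
print(result)
```
.END LITERAL
.FILL;.JUSTIFY;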
.SK;7. ^AFTER THE EXECUTION OF THE INPUT AND MODEL INSTRUCTIONS, THE
(TRANSFORMED) DESIGN MATRIX IS DELIVERED TO THE REGRESSION ROUTINE(S).
^THE ACTUAL COMPUTATION OF THE REGRESSION COEFFICIENTS IS DONE VIA A
CALL OF LSQDEC FOLLOWED BY CALLS OF LSQSOL AND LSQINV. ^THE FIRST TWO
OF THESE PROCEDURES ARE DESCRIBED EXTENSIVELY IN <DEKKER [2] PP.#65-69.
^THE VECTOR AND MATRIX MULTIPLICATIONS ARE DONE VIA CALLS OF VECVEC, MATVEC,
TAMVEC AND TAMMAT, DESCRIBED IN <DEKKER [2] PP.#8-9.
^THE ALGORITHMS FOR PHI AND ^FISHER ARE COPIED FROM <CACM:#ALGORITHMS
209 AND 322 RESPECTIVELY.
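.SK;^THE COMPUTATION OF THE REGRESSION COEFFICIENTS FROM THE DESIGN MATRIX
CAN BE ILLUSTRATED BY THE FOLLOWING MINIMAL SKETCH IN ^PYTHON. ^IT IS NOT A
TRANSCRIPTION OF LSQDEC AND LSQSOL: IT ASSUMES A DECOMPOSITION BY
^HOUSEHOLDER REFLECTIONS WITHOUT COLUMN INTERCHANGES, FOLLOWED BY BACK
SUBSTITUTION, AND THE FUNCTION NAME LSQFIT IS HYPOTHETICAL. ^WEIGHTS AND
THE INVERSE NEEDED FOR THE STANDARD ERRORS ARE LEFT OUT.
.SK;.NO FILL;.NO JUSTIFY;
.LITERAL
```python
import math


def lsqfit(x, y):
    """Return the least squares coefficients and the residual sum of squares."""
    n, p = len(x), len(x[0])
    a = [list(row) + [value] for row, value in zip(x, y)]   # augmented [X | y]
    for k in range(p):
        norm = math.sqrt(sum(a[i][k] ** 2 for i in range(k, n)))
        if norm == 0.0:
            raise ValueError("design matrix is rank deficient")
        alpha = -norm if a[k][k] >= 0 else norm
        v = [a[i][k] for i in range(k, n)]       # Householder vector
        v[0] -= alpha
        vv = sum(t * t for t in v)
        for j in range(k, p + 1):                # reflect columns k..p
            s = 2.0 * sum(v[i] * a[k + i][j] for i in range(n - k)) / vv
            for i in range(n - k):
                a[k + i][j] -= s * v[i]
    b = [0.0] * p
    for k in range(p - 1, -1, -1):               # back substitution
        s = a[k][p] - sum(a[k][j] * b[j] for j in range(k + 1, p))
        b[k] = s / a[k][k]
    rss = sum(a[i][p] ** 2 for i in range(p, n))
    return b, rss


# Model Y = B0 + B1 * T with three observations.
design = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
coefficients, rss = lsqfit(design, [0.0, 1.0, 1.0])
print(coefficients, rss)
```
.END LITERAL
.FILL;.JUSTIFY;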
.SK;^ALL OTHER COMPUTATIONS ARE STRAIGHTFORWARD.
.PAGE;<REFERENCES
.LM+12;.SK2;.I-12;#[1]##<AFIFI,#<A.A.#_&#<S.P.#<AZEN, ^STATISTICAL ^ANALYSIS;
^A ^COMPUTER ^ORIENTED ^APPROACH; ^ACADEMIC ^PRESS, (1972).
.SK;.I-12;#[2]##<DEKKER,#<T.J., <ALGOL 60 PROCEDURES IN NUMERICAL
ALGEBRA, PART 1; <MC ^TRACT 22, ^MATHEMATISCH ^CENTRUM, ^AMSTERDAM.
.SK;.I-12;#[3]##<DRAPER,#<N.R.#_&#^H.#<SMITH, ^APPLIED ^REGRESSION
^ANALYSIS; ^JOHN ^WILEY _& ^SONS, (1966).
.SK;.I-12;#[4]##<JONGE,#^H.#<DE, ^INLEIDING TOT DE ^MEDISCHE
^STATISTIEK, DEEL <II; ^NEDERLANDS ^INSTITUUT VOOR ^PRAEVENTIEVE
^GENEESKUNDE, (1960).
.SK;.I-12;#[5]##<KNUTH,#<D.E., ^THE ^ART OF ^COMPUTER ^PROGRAMMING,
^VOL.#3, ^SORTING AND ^SEARCHING; ^ADDISON ^WESLEY, (1973).
.SK;.I-12;#[6]##<KRUSEMAN#<ARETZ,#<F.E.J., <ALGOL 60 TRANSLATION
FOR EVERYBODY; ^ELEKTRONISCHE ^DATENVERARBEITUNG, ^VOL.#6 (1964), 6, PP.#233-244.
.SK;.I-12;#[7]##<KRUSEMAN#<ARETZ,#<F.E.J., ^PROGRAMMEREN VOOR
REKENAUTOMATEN, (^DE <MC <ALGOL 60 VERTALER VOOR DE <EL ^X8); <MC ^SYLLABUS
13, ^MATHEMATISCH ^CENTRUM, (1972).
.SK;.I-12;#[8]##<KRUSEMAN#<ARETZ,#<F.E.J., <P.J.W.#<TEN#<HAGEN#_&#
<H.L.#<OUDSHOORN, ^AN <ALGOL 60 COMPILER IN <ALGOL 60; <MC ^TRACT 48,
^MATHEMATISCH ^CENTRUM, (1973).
.SK;.I-12;#[9]##<LUND,#<R.E., ^TABLES FOR AN APPROXIMATE TEST FOR
OUTLIERS IN LINEAR MODELS; ^TECHNOMETRICS, ^VOL.#17 (1975), 4, PP.#473-476.
.SK;.I-12;[10]##<NAUR,#^P.#(ED.), ^REVISED REPORT ON THE ALGORITHMIC
LANGUAGE <ALGOL 60; ^REGNECENTRALEN, ^COPENHAGEN, (1964).
.SK;.I-12;[11]##<SEARLE,#<S.R., ^LINEAR ^MODELS; ^JOHN ^WILEY _&
^SONS, (1971).
.SK;.I-12;[12]##<THEIL,#^H., ^PRINCIPLES OF ^ECONOMETRICS; ^JOHN
^WILEY _& ^SONS, (1971).
.LM-12;.PAGE;#