PDP-10 Archive: decus/20-0149/multes.hlp from decuslib20-05

Trailing-Edge - PDP-10 Archives - decuslib20-05 - decus/20-0149/multes.hlp
There are 2 other files named multes.hlp in the archive. Click here to see a list.
 
Multiple Linear Regression Analysis

TESTS OF HYPOTHESES

A PARTICULAR REGRESSION COEFFICIENT

     If the disturbances are mutually independent and normally  distributed
(with E(e) = 0 and var(e) = Isigma^2) and with a (preset) level of signifi-
cance alpha, a significance test for a  particular  regression  coefficient
can be performed, or more specifically:  the null hypothesis is:

          H0:  ai = 0 (given that all other aj are in the model),

which is tested against the alternative hypothesis:

          H1:  ai is not equal to zero,

by treating FRi = bi^2/var(bi) as a realization of a Fisher(1,n-p) variate.

     However, this test  must  be  used  with  caution,  because  with  the
(preset)  level  of  significance alpha, only one coefficient can be tested
properly, while the computer output lists statistics for all  coefficients.
It seems very tempting to test the coefficients serially one at a time, but
one must keep in mind that in doing so the level  of  significance  of  the
whole test rises above the nominal value.


ANALYSIS OF VARIANCE

     In the analysis of variance table the different contributions  to  the
total uncorrected sum of squares Y'Y (which is the first part in the table)
are given.  

     The second part of the table assumes  the  presence  of  an  (unknown)
constant  term  in  the  model;   if  this  term is absent, the 'mean'-line
disappears and in the 'regression'-line p-1 changes into p  and  b'X'Y-nu^2
changes into b'X'Y.

     The third part of the table is only present when repeated observations
for the dependent variable are available, in which case:
k  is the number of groups of replications,
mi is the number of replications in group i, and
W = (w1,...,wk)', with wi = Sum(j,1,mi,yij) / Sqrt(mi), for i = 1,...,k.

     The fourth part of the  table  is  only  present  if  a  reduction  is
requested  and  possible.   SSQ then stands for the residual sum of squares
from a regression analysis with the first p-q  out  of  the  p  independent
variables (1 <= q <= p-1), while SSE stands for the residual sum of squares
from a regression analysis with the original p independent variables.

                           Analysis of variance
source of                                                       right  tail
variation        df   sum of squares    mean square   F-ratio   probability
---------------------------------------------------------------------------
total            n          Y'Y
---------------------------------------------------------------------------
mean             1         nu^2          MSM = nu^2     FRM      P(FM>=FRM)
regression      p-1    b'X'Y - nu^2         MSR         FRR      P(FR>=FRR)
residual        n-p     Y'Y - b'X'Y      MSE = s^2
---------------------------------------------------------------------------
 lack of fit    k-p     W'W - b'X'Y         MSL         FRL      P(FL>=FRL)
 pure error     n-k      Y'Y - W'W          MSP
---------------------------------------------------------------------------
 reduction       q       SSQ - SSE          MSQ         FRQ      P(FQ>=FRQ)
---------------------------------------------------------------------------

     The column 'mean square' is  obtained  by  division  of  the  sums  of
squares by their corresponding degrees of freedom;  The column 'F-ratio' is
obtained by division of the mean  squares  by  the  residual  mean  square,
except  for  the  lack of fit F-ratio, which is obtained by division of the
lack of fit mean square by the pure error mean square, thus:

   MSM = nu^2/1, MSR = (b'X'Y-nu^2)/(p-1), MSE = s^2 = (Y'Y-b'X'Y)/(n-p),

     MSL = (W'W-b'X'Y)/(k-p), MSP = (Y'Y-W'W)/(n-k), MSQ = (SSQ-SSE)/q,

       FRM = MSM/MSE, FRR = MSR/MSE, FRL = MSL/MSP and FRQ = MSQ/MSE.


     If the disturbances are mutually independent and normally  distributed
(with E(e) = 0 and var(e) = Isigma^2) and with a (preset) level of signifi-
cance alpha, a significance tests can be performed for:

1.  The mean of the observations for the dependent variable, or more speci-
    fically:  the null hypothesis is:

          H0:  E(u) = 0,

    which is tested against the alternative hypothesis:

          H1:  E(u) is not equal to zero,

    by treating FRM as a realization of a Fisher(1,n-p) variate.

2.  The regression equation, or more specifically:  the null hypothesis is:

          H0:  a1 = ... = ap = 0, except for the ai that denotes
                                  the constant term (if present),

    which is tested against the alternative hypothesis:

          H1:  at least one of a1,...,ap is not equal to zero,

    by treating FRR as a realization of a Fisher(p-1,n-p) variate.

3.  The adequacy (linearity) of the model, or more specifically:  the  null
    hypothesis is:

          H0:  the linear model is adequate (that is, no model signifi-
               cantly improves the prediction of Y over the linear model),

    which is tested against the alternative hypothesis:

          H1:  the linear model is not adequate,

    by treating FRL as a realization of a Fisher(k-p,n-k) variate.

4.  A subset of regression coefficients,  or  more  specifically:   suppose
    without  loss  of  generality  that  the  subset consists of the last q
    coefficients, then the null hypothesis is:

          H0:  ar = ... = ap = 0, with r = p-q+1,

    which is tested against the alternative hypothesis:

          H1:  at least one of ar,...,ap is not equal to zero,

    by treating FRQ as a realization of a Fisher(q,n-p) variate.

5.  A linear combination of the regression coefficients, or  more  specifi-
    cally:  the null hypothesis is:

          H0:  c'a = m, in which c is a vector of constants with order q+1,

    which is tested against the alternative hypotheses:

          H1:  c'a is not equal to m,

    by substituting c'a = m in the original model, shifting the known terms
    to  the  left hand part, combining the corresponding terms in the right
    hand part, and testing the thus derived reduced model 
    by treating FRQ as a realization of a Fisher(q,n-p) variate.

In each case the right tail probability P(F >= FR) can be found in the last
column of the analysis of variance table.