PDP-10 Archive: decus/20-0149/mulmod.hlp from decuslib20-05

Trailing-Edge - PDP-10 Archives - decuslib20-05 - decus/20-0149/mulmod.hlp
There are 2 other files named mulmod.hlp in the archive. Click here to see a list.
 
Multiple Linear Regression Analysis

THE MODEL SPECIFICATION

     To let the program  know  between  which  variables  the  statistician
expects  a  certain  kind of relationship, he must provide a model specifi-
cation, which consists of the keyword "Model" followed by  a  formula  (the
model  statement),  which  resembles  the  notation of regression models in
common statistical literature quite closely.  For instance:

             "Model"  y = alpha0 + alpha1 * x1 + alpha2 * x2;

     A model formula consists of an  identifier  to  denote  the  dependent
variable  (the left hand part), followed by an '=' (equal), followed by the
sum of a number of terms (the right hand part), while it is terminated with
a  ';' (semicolon).   Each  term  must  be  the product of an identifier to
denote the parameter (which is to be estimated) and an identifier to denote
the  independent  variable.  An exception is made for the optional constant
term, which is given as a single identifier denoting  that  constant  term,
and which may be placed anywhere in the model.

     Each identifier must start with a letter and is allowed to contain any
number  of  letters,  digits and blanks.  As most peripheral equipment of a
computer is unable to process sub- or superscriptions or Greek letters,  we
write alpha0, alpha1 and alpha2.  Identifiers have no inherent meaning, but
serve for the identification of variables, parameters and functions.   They
may  be chosen freely (except for the twentyone standard function names and
the ten option names, cf. "Help"/Options).  It is advised not  to  use  the
same  identifier  to  denote  two  (or  more)  different  quantities;   for
regression parameters, however, it will not lead to fatal  errors,  whereas
for  the  dependent  and  independent variables distinguishable identifiers
must be used indeed.  Correct model formulae are for instance:

          "Model" y variable = constant + parameter * x variable;
and
          "Model" depvar = const + beta1 * xvar1 + beta2 * xvar2;


TRANSFORMATIONS

     Almost all transformations a user would like to perform on  his  input
data  fit  quite  naturally  in  the model formula:  each transformation is
expressed as a formula itself.  If, for instance, the user wants to include
in  the  model  formula as an independent variable the natural logarithm of
the sum of two other variables, he writes:  (if those two  other  variables
are called:  xvar1 and xvar2)

                           Ln (xvar1 + xvar2) .

In model formulae the operators  '+' (plus),  '-' (minus),  '*' (asterisk),
and  '/' (slash)  are  allowed,  all  with  their  conventional  meaning of
addition, subtraction, multiplication and division respectively.  Of course
the  normal  operator  precedence rules are obeyed.  Special operators are:
':' (colon), integer division and  '^' (uparrow), exponentiation.
The operation term : factor is defined  only  for  operands  both  of  type
integer  and  will  yield  a  result of type integer, with the same sign as
would be obtained by normal division,  while  the  magnitude  is  found  by
dividing  the  two quantities and taking the whole part;  mathematically it
can be defined as:  a : b = Sign (a / b) * Entier (Abs (a / b)),
for instance:  5 : 2 = 2 and -7 : 2 = -3.
The operation factor ^ primary denotes exponentiation, where the factor  is
the base and the primary is the exponent,
for instance:  5 ^ 2 = 25 and 2 ^ 3 ^ 2 = 64 but 2 ^ (3 ^ 2) = 512.  

Also the following twentyone standard functions are allowed:

   Abs (E),  Sign (E),  Sqrt (E),  Sin (E),  Cos (E),  Tan (E),  Ln (E),
   Log (E),  Exp (E), Entier (E), Round (E), Mod (E1, E2), Min (E1, E2),
   Max (E1, E2), Arcsin (E), Arccos (E), Arctan (E), Sinh (E), Cosh (E),
   Tanh (E) and Indicator (E1, E2, E3)

in which E, E1, E2 and E3 are expressions in terms of variables,  operators
and standard functions.  Round (E) is defined as:  Entier (E + 0.5) and
Indicator (E1, E2, E3) is defined as:  IF E1 <= E2 <= E3 THEN 1 ELSE 0.

     The dependent variable may be transformed in a similar way  and  as  a
consequence the model formula in its most general form looks like:  

   "Model" G (y) = b0 + b1 * F1 (x1,...,xm) + ... + bp * Fp (x1,...,xm);

Some examples of transformed model formulae are:

          "model" y = a0 + a1 * Sqrt (x1 + x2) + a2 * Sqrt (x3);
and
          "MODEL" Arcsin (Sqrt (Y)) = A0 + A1 * X + A2 * X ^ 2;

     A user can specify model formulae in which terms with known regression
coefficients  appear, by subtracting those terms from the left hand part of
the model formula, for instance:

           "Model" y - 5.4321 * x3 = a0 + a1 * x + a2 * x ^ 2;

This applies especially to the constant term;  if this  term  is  known  it
must be shifted to the left hand part.

     If weights are present in the input data (or can be  computed  out  of
the input data), to indicate that the variances of the observations are not
all equal (cf. "Help"/Theory), the left hand part of the model formula  can
be  expanded  with  a  so  called weight part (which can be an expression),
preceeded by a '&' (ampersand), for instance:

   "Model" Depvar & Max (Abs (Weight), 10) = Const + Param * Indepvar;