PDP-10 Archive: decus/20-0149/mulinp.hlp from decuslib20-05

Trailing-Edge - PDP-10 Archives - decuslib20-05 - decus/20-0149/mulinp.hlp
There are 2 other files named mulinp.hlp in the archive. Click here to see a list.
 
Multiple Linear Regression Analysis

THE INPUT SPECIFICATION

     To indicate which numbers or series  of  numbers  in  the  input  data
belong  to  which  variable  in  the model formula and which numbers can be
skipped, the program expects an input specification.  It  consists  of  the
keyword  "Input",  followed  by  a  formula  (the  input  statement)  which
describes the arrangement of the observations in the input data,  while  it
is  terminated with a ';' (semicolon).  The basic idea is that numbers from
the input data are identified with the names from the input formula in such
a  way  that (in order of entry) numbers belonging to the same name are put
in a queue appended to that name.  For instance:

         "Input" 100 * (codenr, 10 * [yvar], [xvar1, xvar2], -1);

means that one hundred series of numbers (each, as a check,  terminated  in
this example by -1) are present in the input data.  Each series consists of
fourteen numbers:  first one value which is read and assigned to  the  name
codenr,  then  ten  values  for  the name yvar, then one value for the name
xvar1, followed by one value for the name xvar2 and finally the value -1.

     The basic constituent of an input formula is a  variable  enclosed  in
square brackets, in the example: [yvar].  The corresponding number from the
input data will be appended to the queue for that name.  Several  variables
can  be  put  together  in a variable list by separating them by commas and
enclosing them in square brackets, in  the  example: [xvar1, xvar2].   This
only serves to save the writing of several opening and closing brackets.

     Separate numbers, series or  blocks  of  numbers  can  be  treated  by
putting a repetition factor (control) followed by an asterisk in front of a
variable list (or in front of an input formula which must then be  enclosed
in parentheses), in the example:  100 * and 10 * .

     If a repetition factor is 1, it  may  be  omitted  together  with  the
asterisk  and  a  parentheses  pair,  but square bracket pairs must remain.
When a name is used as a repetition factor, a value must already have  been
assigned  to it, which is done by giving that name, without square brackets
and followed by a comma (or closing  parenthesis),  earlier  in  the  input
formula  than  the  use  of  that name as a repetition factor.  The corres-
ponding number from the input data is then assigned  as  a  value  to  that
name.   If  such  names  are  used  repeatedly  in  the  input formula, the
corresponding numbers from the input data are compared with the  first  one
and,  in  the  case  of inequality, an error message is supplied.  This may
serve as a check against shifted data reading.   A  similar  check  can  be
obtained  by  giving  an  explicit  number  followed by a comma (or closing
parenthesis), in the example: the -1.  The corresponding  number  from  the
input  data  is  then  compared  with that given number and, in the case of
inequality, an error message is produced.

     Also an expression is allowed as a  repetition  factor,  or  for  that
matter,  as  a check value, provided that it is enclosed in angle brackets,
for instance: <k+n>.  As in the case of single names used as  a  repetition
factor  each  (non-standard function  and  non-option)  name used in such a
(special) expression must have been given, followed by a comma, earlier  in
the input formula than the use of that name in the expression.

     The linkage between  the  model  formula  and  the  input  formula  is
established  by  using  the  same names in the model terms and in the input
variable lists.  Numbers from the input data  that  belong  to  such  input
names  will  be  treated  as  observations  for  the model variables, while
numbers that belong to input names between square  brackets  which  do  not
appear in the model formula, are skipped.

     Often, repeated observations for the dependent variable are available.
In  order  to  be  able  to process these observations automatically, it is
necessary that a variable list consisting entirely of  dependent  variables
is  preceeded  by  a repetition factor (followed by an asterisk) indicating
the number of repetitions.  If a variable list contains independent as well
as  dependent  variables, the number of replications is assumed to be 1.  A
series  of  (say  100)  observations  for  a  dependent  variable  with  no
replications is denoted as:

                             100 * ([dep var])

The repetition factor in front of the opening  square  bracket  is  omitted
(because  it  is  1),  although  the  parentheses  are  not.   Without  the
parentheses it would mean 100 replications of [dep var].

EXAMPLE    "Input" k, n, <k+n> * (c, m, m * [y], [x1,x2,x3,x4], c), -99;

means that: the first number is read and its value assigned to k,
            the next number is read and its value assigned to n,
            then k+n times the following happens:
                 a number is read and its value assigned to c,
                 the next number is read and its value assigned to m,
                 then the m replications for y are read,
                 next the observations for x1, x2, x3 and x4 are read,
                 then a number is read and its value compared with c,
            finally a number is read and its value compared with -99.

If the comparisons fail, an error message is supplied and execution of  the
job  is terminated, otherwise (k+n) observations for x1, x2, x3, x4 and for
each quadruple m replications for y, have been identified.