.BOTTOM NUMBER
.NONUMBER
.roman numerals
.UPPER CASE
.FILL
.SPACING 1
.JUSTIFY
.RIGHT MARGIN 70
.DOCUMENT ID CLUSTR USERS MANUAL
.BOTH SIDES
.FIGURE 8
.CENTRE 75
^A NUMERICAL CLASSIFICATION SUITE\
.SKIP 4
.CENTRE 75
^CLUSTR\
.SKIP 6
.CENTRE 75
^C.J. ANDREWS\
.DOCUMENT ID
.PAGE
.DOCUMENT ID CLUSTR USERS MANUAL
.PAGE
.LEFT MARGIN 5
.CENTRE
^&ABSTRACT\&
.lower case
.NUMBER
.PARAGRAPH 5
^THIS MANUAL DESCRIBES A SUITE OF PROGRAMS CAPABLE OF DEALING EFFECTIVELY WITH SETS OF DATA WHICH ARE TO BE NUMERICALLY CLASSIFIED.
^THE DATA REPRESENT SEVERAL ENTITIES WHICH ARE DESCRIBED BY RELEVANT
ATTRIBUTES.
.PARAGRAPH
^THE METHOD BY WHICH THE CLASSIFICATION IS PERFORMED MAY BE CONTROLLED IN A MOST FLEXIBLE MANNER, BY SEVERAL EASILY SET USER OPTIONS. ^THESE
OPTIONS CONTROL THE FOLLOWING STEPS IN THE CLASSIFICATION PROCESS: 
.LEFT MARGIN 10
.RIGHT MARGIN 60
.UPPER CASE
.skip 1
(i)   A transformation of the raw data may optionally be carried out in one of several ways.
.break
(ii)  A choice of dissimilarity indices may be made.
.break
(iii) A choice of sorting and clustering strategies is available.
.break
(iv)  Output optionally available includes printouts of trellis diagrams, two way tables and summaries of the raw data, and plots of derived
dendrograms from the sorting strategies.
.break
(v)   Ordination, derived from the methods of Principal Component
Analysis and/or Principal Coordinate Analysis, may optionally be
selected.
.skip 1
.RIGHT MARGIN 72
.left margin 5
.paragraph
The program as outlined performs both normal and inverse analyses
of two-dimensional raw data in the form of entities versus
attributes.  Such data are commonly generated in psychological, taxonomic, and biological studies and also in studies in other social sciences.
.paragraph
CLUSTR can also be used to classify three dimensional data (entity-1 x entity-2 x  attribute) as is often required in ecological
study, for example in <sites x times x species> analysis.  This extension in no way affects the two-dimensional study of data, and is entirely
transparent to users of the latter facility.
.paragraph
An extension of the well-known TAXAN program has been accomplished, and the
output options of CLUSTR may be coupled with the ability of TAXAN to
handle disordered multistate data.
This facility is useful for taxonomists.
.DOCUMENT ID
.page
.DOCUMENT ID CLUSTR USERS MANUAL
.NONUMBER
.PAGE
.NUMBER
.centre
^&ACKNOWLEDGEMENTS\&
.paragraph
The author gratefully acknowledges the help of several people in the
preparation of these programs.  Prof. W. Stephenson
provided the initial motivation and need for the suite, and over a
long period has provided the author with many helpful discussions
and suggestions as to its functioning.  Dr. H.T. Clifford has 
contributed to the author's thinking in many areas of classification
and is similarly acknowledged.
.paragraph
Dr. Clifford provided access to Dr. E. Burr's program TAXAN which forms
the basis of the sorting section of CLUSTR.  It has been modified 
with the help of Mr. R.A. Cook of La Trobe University.  Mr. R.D. Nilsson provided certain ideas for flexible core management which
the author has subsequently incorporated into CLUSTR.
.paragraph
The University of Queensland Prentice Computer Centre has provided
finance for the latest stage of development of this suite and is also
gratefully acknowledged.
.skip 2
.centre
^&SUPPORT\&
.paragraph
Support for this suite is available in two forms.
.paragraph
Users who desire advice on classificatory  methodology
may consult Dr. H.T. Clifford
of the Botany Department, University of Queensland. Such users
should first have gained familiarity with the methods, via one of the
standard texts (e.g. 1,2).
.paragraph
Those who desire some assistance with the construction and running
of CLUSTR data decks may consult the University Computer Centre via its user consultation services. An attempt should first have 
been made to set up the deck before consultation, as
shown in this manual.
.DOCUMENT ID
.page
.DOCUMENT ID CLUSTR USERS MANUAL
.NONUMBER
.PAGE
.SPACING 2
.NUMBER
.centre
Contents
.nofill
.nojustify
(i)   Abstract                                        
(ii)  Acknowledgements                                
(iii) Support                                         
(iv)  Contents
.skip 1
Part I The Analysis of Two Dimensional (2D) Data
1.0   The Classification Process - an overview
2.0   CLUSTR operations for 2D data
3.0   Data input format for 2D data
4.0   TAXAN-CLUSTR interface
.skip 1
Part II  The Analysis of Three Dimensional (3D) Data
5.0   3D Classification - an overview
6.0   CLUSTR operations for 3D data
7.0   Data input format for 3D data
.skip 1
Part III  CLUSTR on the PDP-10
8.0   Controlling CLUSTR on the PDP-10
      8.1  Controlling CLUSTR on the PDP-10 Batch System
      8.2  Controlling CLUSTR via a PDP-10 remote terminal
9.0   CLUSTR system components and interactions
.SPACING 1
.fill
.justify
.chapter 1
.arabic numerals
.skip 10
.centre
Part 1
.skip 15
.centre
The Analysis of Two Dimensional (2D) Data
.page
1.0  ^&The Classification Process - an overview\&
.paragraph
Throughout this part it is assumed that a user has a data
matrix that he wishes to "have classified" and that he
has some basic knowledge of what sort of classification he
wishes to have performed on that  matrix.  Those users who
desire more information on the theory of
numerical classification itself are referred to one of the
standard texts on the subject (e.g. 1,2).
.footnote 6
.CENTRE
* * * * * * * * * *
1.  Clifford, H.T. and Stephenson, W., "An Introduction
to Numerical Classification", Academic Press, 1975.
.break
2. Sneath,P.H.A., and Sokal,R.R., "Numerical Taxonomy", Freeman, 1973.
!
It will be the object of this section (section 1.0) to define terminology
used with regard to this program, and set the stage for describing the
options this program provides at each stage of its execution.
.paragraph
An examination is now made of the processes which are carried out on a set
of data in classifying it.
.paragraph
It is assumed that n entities are described by the values of v variates.
The data for each of the entities can be set out in a matrix X (an n x v
matrix), viz:
.test page 10
.blank 10
.paragraph
In this matrix Xij represents the value of the ith attribute (variate)
of the jth entity.
.paragraph
Taking into account all v variates for each entity it is
possible to produce (n#x#n) indices of dissimilarity between all the
possible pairs of entities.  It may not however always
be desirable to produce these indices from the raw data and so
a new matrix F (n x v) can first be produced such that each
element of F is a TRANSFORMED version of the elements of X (i.e. as in matrix F below)
.figure 10
The transforming function could be defined for example as
.skip 1
.nofill
.nojustify
              f(Xij)=log(Xij+1)
 
            or f(Xij)=Xij   i.e. operate on raw data
.fill
.justify
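As a concrete illustration, the transformation step can be sketched in a few lines (the sketch below is in Python purely for illustration; it is not part of the CLUSTR suite, and the base-10 logarithm is an assumption):

```python
import math

def transform(X, f):
    """Apply the element-wise transforming function f to the raw
    data matrix X, producing the matrix F of section 1.0."""
    return [[f(x) for x in row] for row in X]

# A small raw data matrix: 2 variates (rows) by 3 entities (columns).
X = [[0.0, 9.0, 99.0],
     [1.0, 0.0, 9.0]]

# The log(Xij + 1) transform (base-10 log assumed here).
F = transform(X, lambda x: math.log10(x + 1.0))
# F[0] is [0.0, 1.0, 2.0]
```

Choosing f(Xij)=Xij (the identity) reproduces the "operate on raw data" case.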
.paragraph
From matrix F can now be produced the matrix D, a square nxn matrix
whose i,j th element Dij represents the dissimilarity between elements
i and j of the F matrix.  This matrix has elsewhere been termed the
Q matrix (3);
.footnote 3
.CENTRE
* * * * * * * * * *
3. Gower, J.C., "Some Distance Properties of Latent Root and Vector Methods Used in Multivariate Analysis", Biometrika 53, 325-338, 1966.
!
Also the term TRELLIS DIAGRAM has been applied. 
.paragraph
The trellis diagram can now be "sorted" so that elements of closest
affinity can be brought together progressively in clusters.  This
process has variously been termed sorting, clustering, grouping or
fusing.  The means of determining which element of D is next fused
into an existing group is called the "sorting strategy".  Initially
each entity is regarded as a group of size 1.
The sorting strategy therefore
operates on the D matrix, and successive fusions progressively reduce
the size of the matrix. What is produced in essence is
a re-ordering of the rows (or columns) of the D matrix.
.paragraph
Two factors are noted when entities fuse with existing groups (which
may possibly be other single entities).  These are, the entities which
fuse, and the dissimilarity level at which they fuse.  The result
of all fusions for one trellis diagram can be expressed in terms of
these two factors in a DENDROGRAM e.g.
.test page 10
.blank 10
.paragraph
In the above case entities 1 and 2 have fused to form group 6, 5 and 4
to give 7, and entity 3 has fused with group 7 to form a larger group
8. Similarly, 6 and 8 fuse to give 9 and the process is complete at this stage
since one cluster alone finally exists.
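The fusion sequence of this worked example can be recorded as a small table, and the base-line ordering of the original entities recovered by a recursive walk down the dendrogram. The sketch below (Python, illustrative only) uses the group numbers of the example:

```python
# Fusions from the worked example: group 6 = (1,2), group 7 = (5,4),
# group 8 = (3, group 7), group 9 = (6, 8).  Groups 1-5 are the
# original entities.
fusions = {6: (1, 2), 7: (5, 4), 8: (3, 7), 9: (6, 8)}

def base_line(group):
    """Left-to-right ordering of the original entities lying under
    the given group of the dendrogram."""
    if group not in fusions:        # an original entity
        return [group]
    left, right = fusions[group]
    return base_line(left) + base_line(right)

print(base_line(9))   # -> [1, 2, 3, 5, 4]
```

This base-line ordering is exactly the re-ordering applied to the columns (or rows) of the X matrix when a two-way table is produced.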
.paragraph
The base line of this dendrogram can be used to give an ordering of
the initial entities into "groups of closest affinity" at various
levels of affinity (see for example groups B and C above).
.paragraph
It is now possible to rearrange the columns of the original X
matrix to reflect this ordering and in so doing the grouping
of entities should become visually more obvious in that
matrix.
.paragraph
So far this general discussion has centred on classifying entities
against each other taking account of all possible variates.
It is also possible to perform the so-called inverse analysis,
classifying attributes against each other.  In this case it is
possible, and may be desirable though not mandatory, to use a
different transformation,
different dissimilarity index, and different sorting strategy than was
used for the "normal" analysis.  The end products, however, are the same
- a dendrogram specifying in this case attribute correlations, and
a resorting this time of the rows of the X matrix.
.paragraph
When both resorting of rows and columns of the X matrix occurs, a so called
TWO-WAY table is produced.
.paragraph
Once a trellis diagram has been obtained, it is possible to perform
a principal co-ordinate analysis of this diagram to obtain a
representation of the n entities or v variates in a space which can be
interpreted geometrically.  The clusters may then be viewed visually.
The method by which this can be done is elsewhere described and is
quite complex.  For details of its derivation the user is
referred to references (3) and (4).
.footnote 3
.CENTRE
* * * * * * * * * *
4.  Gower, J.C., "Multivariate Analysis and Multidimensional
Geometry", The Statistician, 17, 13-28, 1967.
!
.paragraph
Principal Component Analysis may also be performed, and by this alternate
method the n entities may be represented in a v-dimensional space.
The latter analysis operates on the original F matrix, and is equivalent to
a rotation of the original v axes against which the entities of F
are represented, to new orthogonal positions which maximise attribute
variance along their length.
.paragraph
The  two Principal Analyses are members of a group of analyses
known under the general title of Ordination.
.paragraph
The CLUSTR program provides users with all the features outlined above,
i.e. transformation, dissimilarity calculation, sorting, 
production of two-way tables, and ordination, for both
normal and inverse analyses, with possible differences in strategy for each.
.paragraph
The various different strategies available will be defined in Section
2.0 and how they may be specified in running the program will
be shown in Section 3.0 of this manual.
.page
2.0  ^&CLUSTR operations for 2D data\&
.paragraph
The various options available to a user at each level of the clustering
process are now outlined.  It is assumed at this stage that the raw
data matrix has been read from an input medium.  The method by which
this is done will be given in Section 3.0 and 8.0.
.skip 2
2.1  ^&Summary of Raw Data\&
.paragraph
An initial summary of the raw data is printed in 2D analysis.  For an
X matrix of the following form,
.test page 10
.blank 10
.paragraph
a quantity Pi is derived which may be either
.indent 10
(a) the total of all variate values for that entity.
.nofill
.nojustify
.skip 1
.test page 3
                    v
              Pi =  Σ  Xji           [later termed S ]
                   j=1
.fill
.justify
.skip 1
.indent 7
or (b) the average variate value for that entity
.nofill
.nojustify
.skip 1
.test page 5
                     v
                     Σ  Xji
                    j=1
            Pi =  ---------           [later termed A]
                        v
.skip 2
.fill
.justify
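The two summary quantities can be sketched as follows (Python, illustrative only; X is indexed as X[j][i], variate j of entity i, following the manual's Xji convention):

```python
def summaries(X):
    """Return, for each entity i, the summed variate value S
    (Pi as a total) and the average variate value A (Pi as a mean)."""
    v, n = len(X), len(X[0])
    S = [sum(X[j][i] for j in range(v)) for i in range(n)]
    A = [s / v for s in S]
    return S, A

X = [[1.0, 4.0],
     [3.0, 0.0]]        # v = 2 variates, n = 2 entities
S, A = summaries(X)     # S = [4.0, 4.0], A = [2.0, 2.0]
```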
2.2 ^&Transformation Option\&
.paragraph
The transformation option available for each element of the X matrix
(F(Xij) of 1.0) may be:
.nofill
.nojustify
.skip 1
   either (i) F(Xij)=log(Xij)    log transform     [L]
.skip 1
       or(ii) F(Xij)= Xij        no transformation  [N]
.skip 1
.test page 5
                           Xij       (standardize by row
      or(iii) F(Xij) = ---------                   [S]
                          n
                          Σ  Xik      total for normal
                         k=1          analysis)
.skip 1
.test page 5
                           Xij       (standardize by column
              F(Xij) = ---------                   [S]
                          v
                          Σ  Xkj      total for normal
                         k=1          analysis)
.skip 1
        or(iv) F(Xij) = Xij ** (1/n) where n is integral  [P]
.skip 1
        or(v)  F(Xij) = (Xij-m)/s                  [V]
               where m and s are the mean and standard
               deviation in rows or columns.
.skip 2
.fill
.justify
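The five options may be sketched element-wise as below (Python, illustrative only; the function names are invented for the sketch, and the log base is an assumption):

```python
import math

def t_log(x):                  # [L]  log transform
    return math.log10(x)

def t_none(x):                 # [N]  no transformation
    return x

def t_standardise(x, total):   # [S]  divide by the row or column total
    return x / total

def t_power(x, n):             # [P]  n-th root, n integral
    return x ** (1.0 / n)

def t_deviate(x, m, s):        # [V]  deviation from mean m in units
    return (x - m) / s         #      of the standard deviation s
```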
2.3  ^&Dissimilarity Index Option\&
.paragraph
The current dissimilarity indices allowed are:
.skip 1
.nofill
.nojustify
       (i)  Bray-Curtis                           [B]
.skip 1
.test page 7
                  v
                  Σ  |Fki-Fkj|    (For
                 k=1               normal
       dij =  --------------       analysis)
                  v
                  Σ  (Fki+Fkj)
                 k=1
.skip 1
.test page 7
                  n
                  Σ  |Fik-Fjk|    (For
                 k=1               inverse
       dij =  --------------
                  n
                  Σ  (Fik+Fjk)     analysis)
                 k=1
.skip 1
.fill
.justify
.indent 10
where Fij is the element found in the transformation step as
F(Xij).
.skip 1
.nofill
.nojustify
       (ii) Manhattan Metric                      [M]
.skip 1
.test page 3
                v
         dij =  Σ  |Fki-Fkj|
               k=1
.skip 1
.test page 3
                n
   or    dij =  Σ  |Fik-Fjk|
               k=1              for analyses as before
.skip 1
       (iii) Canberra Metric                     [C]
.skip 1
.test page 3
                1   v  |Fki-Fkj|
         dij =  -   Σ  ---------
                v  k=1 (Fki+Fkj)
.skip 1
.test page 3
   or           1   n  |Fik-Fjk|
         dij =  -   Σ  ---------
                n  k=1 (Fik+Fjk)
.skip 1
               for analyses as before
.skip 1
       (iv) D-squared Euclidean Distance         [D]
.skip 1
.test page 3
                 v           2
         dij =   Σ  (Fki-Fkj)
                k=1
.skip 1
.test page 3
       or        n           2
         dij =   Σ  (Fik-Fjk)
                k=1
.skip 1
                 for analyses as before
.skip 1
       (v) Matching Coefficient                  [A]
.skip 1
.test page 5
                 v
                 Σ  (Wij)k
                k=1
         dij = -----------
                     v
.skip 1
.test page 5
       or        n
                 Σ  (Wij)k
                k=1
         dij = -----------
                     n
.skip 1
.fill
.justify
.paragraph
where (Wij)k#=#1 if elements i and j have equal values for
Fik and Fjk, and 0 otherwise.  Thus both negative and positive matches are considered.
.nofill
.nojustify
.skip 1
.test page 6
      (vi) ratio coefficient                     [R]
.skip 1
.test page 4
                            g
                  v   (Fik)
       dij = v -  Σ   -----
                 k=1  (Fjk)
.skip 1
.test page 4
        or                  g
                  n   (Fki)
       dij = n -  Σ   -----
                 k=1  (Fkj)
.skip 1
.fill
.justify
.paragraph
where g is chosen to be either +1 or -1 to make the fraction
less than or equal to unity.
.break
.skip 1
General Note on Dissimilarity Measures
.paragraph
It will be seen later that missing data may be specified in the CLUSTR data
input deck. If any of the elements of F specified above are missing
for particular items in the summations, those items contribute
nothing to the sums.
.fill
.justify
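As an illustration of this convention, the sketch below (Python, illustrative only; missing values marked None) computes the Bray-Curtis and Canberra measures for the normal analysis:

```python
def bray_curtis(fi, fj):
    """Bray-Curtis dissimilarity between two entities (columns of F);
    a missing value in either column contributes nothing to either
    summation."""
    num = den = 0.0
    for a, b in zip(fi, fj):
        if a is None or b is None:
            continue
        num += abs(a - b)
        den += a + b
    return num / den

def canberra(fi, fj):
    """Canberra metric, (1/v) * sum of |a-b|/(a+b); missing pairs
    add nothing to the sum (the divisor is kept at v here, one
    reading of the general note)."""
    v = len(fi)
    s = sum(abs(a - b) / (a + b) for a, b in zip(fi, fj)
            if a is not None and b is not None)
    return s / v

print(bray_curtis([1.0, 2.0, None], [3.0, 2.0, 5.0]))   # 2/8 = 0.25
```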
.skip 2
2.4  ^&Sorting Strategies\&
.paragraph
Several commonly used sorting strategies are available in this program
suite.  The point at which a sorting strategy becomes identifiable is
that point at which the dissimilarity index is recalculated between
a new group and all other groups after the former group has received
a new member.  Lance and Williams (5) have shown that some
common strategies (at least most of those described here) could be
expressed in terms of the following general form:
.footnote 3
.CENTRE
* * * * * * * * * *
5.  Lance, G.N. and Williams, W.T., "A Generalised Sorting
Strategy for Computer Classifications", Nature (Lond.) 212, 218, 1966.
!
.test page 5
.blank 5
.paragraph
where the choice of the parameters αi, αj, β and γ determines
the strategy. The following diagram defines the other terms in the
expression.
.test page 15
.figure 15
.paragraph
The strategies allowed in CLUSTR are defined in terms of these parameters as follows:
.paragraph
(i) Nearest Neighbour
.test page 3
.figure 3
.paragraph
(ii) Furthest Neighbour
.test page 3
.figure 3
.paragraph
(iii) Group Average
.test page 3
.figure 3
.paragraph
(iv) Simple Average
.test page 3
.figure 3
.paragraph
(v) Centroid
.test page 3
.figure 3
.paragraph
(vi) Incremental Sum Of Squares
.indent 15
Not of the Lance and Williams type previously mentioned.
The user is referred to Burr(6).
.footnote 3
.centre
*#*#*#*#*#*#*#*#*#*#
(6) Burr, E.J., 'Cluster Sorting with Mixed Character Types. II.
Fusion Strategies', Aust. Comp. Jour. 2, 98-103, 1970.
!
.paragraph
(vii) Variance
.indent 15
Again not of the Lance and Williams type; see(6).
.paragraph
(viii) Flexible
.test page 3
.figure 3
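The recurrence of Lance and Williams (5) can be sketched directly. The sketch below is in Python for illustration only; the parameter values shown are the standard published ones for three of the strategies, an assumption to be checked against reference (5) since this manual presents the values in separate figures:

```python
def lw_update(d_hi, d_hj, d_ij, ai, aj, beta, gamma):
    """Lance-Williams recurrence: the dissimilarity between a group h
    and the group newly formed by fusing groups i and j."""
    return ai * d_hi + aj * d_hj + beta * d_ij + gamma * abs(d_hi - d_hj)

# (alpha_i, alpha_j, beta, gamma) -- standard published values.
NEAREST_NEIGHBOUR  = (0.5, 0.5, 0.0, -0.5)   # gives min(d_hi, d_hj)
FURTHEST_NEIGHBOUR = (0.5, 0.5, 0.0,  0.5)   # gives max(d_hi, d_hj)
SIMPLE_AVERAGE     = (0.5, 0.5, 0.0,  0.0)

print(lw_update(2.0, 6.0, 3.0, *NEAREST_NEIGHBOUR))    # -> 2.0
print(lw_update(2.0, 6.0, 3.0, *FURTHEST_NEIGHBOUR))   # -> 6.0
```

At each fusion the recurrence is applied to every remaining group h, which is how successive fusions progressively shrink the D matrix.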
.skip 2
2.5  ^&Other Options\&
.paragraph
At each stage of the classification process, the user may optionally
select output of various results.  These include trellis diagrams,
dendrograms and two way tables.
.skip 2
2.6 ^&Ordination\&
.paragraph
Ordination is treated as a slightly separate topic as it may 
proceed fairly independently of other analyses.
Once trellis diagrams have been obtained, Principal Coordinate analysis
may proceed, and once transformations have been carried out (i.e.
matrix F options have been specified) Principal Components Analysis may
be performed.
.skip 2
2.6.1 ^&Method of Principal Coordinate Analysis\&
.paragraph
The method used for Principal Coordinate Analysis is that given by Gower(4)
as modified by Williams and Dale(7).
.footnote 2
.LEFT MARGIN 5
.CENTRE
* * * * * * * * * *
7.  Dale, M.B. Personal Communication
.left margin 10
!
.paragraph
Given a dissimilarity matrix D of elements Dij the procedure is as
follows:
.left margin 21
.indent -6
1 (a)  For Bray Curtis trellises form the similarity matrix Sij=1 - Dij
.break
.indent -6
##(b)  For Manhattan Metric trellises form the matrix
       Sij=(-1/2)*Dij**2
.break
.indent -6
2.  "Transform" the elements by
     Sij <= Sij - S.j - Si. + S..
where . represents the mean over the appropriate row or column
.break
.indent -6
3.  Find the eigenvectors and eigenvalues of the matrix (Sij) and
standardise each vector so that the sum of squares of its elements is equal to the corresponding eigenvalue.
.break
.indent -6
4.  The kth element of each of the n vectors then represents the
co-ordinates of the kth point with respect to axes defined by the n
vectors.  The relative magnitude of each eigenvalue gives the relative
"importance" of each eigenvector axis.
.left margin 5
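Steps 1(b) and 2 of this procedure can be sketched as below (Python, illustrative only); the eigen-decomposition of step 3 is left to a numerical library:

```python
def pcoa_transform(D):
    """Form Sij = -(1/2)*Dij**2 from a Manhattan Metric trellis D,
    then centre:  Sij <- Sij - S.j - Si. + S..   (steps 1(b) and 2)."""
    n = len(D)
    S = [[-0.5 * D[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(S[i]) / n for i in range(n)]                       # Si.
    col = [sum(S[i][j] for i in range(n)) / n for j in range(n)]  # S.j
    grand = sum(row) / n                                          # S..
    return [[S[i][j] - row[i] - col[j] + grand for j in range(n)]
            for i in range(n)]
```

After centring, every row and column of the result sums to zero, which is what makes the eigenvector co-ordinates of steps 3 and 4 interpretable geometrically.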
.skip 2
2.6.2 ^&Principal Component Analysis\&
.paragraph
The method used for Principal Component Analysis is that outlined by various
authors e.g. Seal(8),Blackith and Reyment(9), and Sneath
and Sokal(2).
.footnote 6
.centre
* * * * * * * * * *
(8) Seal, H.L., "Multivariate Statistical Analysis For Biologists",
Methuen, London, 1964.
.break
(9) Blackith,R.E., and Reyment, R.A., (eds), "Multivariate Morphometrics"
, Ac. Pr., N.Y., 1971.
.left margin 10
!
.paragraph
Given a matrix F the method is
.break
.left margin 15
1. Form the matrix, R, of variance and covariance between entities
or attributes.
.break
2. The eigenvectors of R give the Principal Components we are
seeking.
.break
3. Standardize the eigenvectors so that they are of length 1, giving
matrix V.
.break
4. The matrix of new coordinate points P of the entities represented
by F is given by P=V'F.
.left margin 5
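Steps 1-3 can be sketched for the leading component only (Python, illustrative only; power iteration stands in for a full eigen-routine, and the divisor n in the covariance is an assumption):

```python
def covariance(F):
    """Step 1: the matrix R of variances and covariances between the
    rows of F (each row one variate observed over n columns)."""
    n = len(F[0])
    means = [sum(row) / n for row in F]
    return [[sum((F[a][k] - means[a]) * (F[b][k] - means[b])
                 for k in range(n)) / n
             for b in range(len(F))]
            for a in range(len(F))]

def first_component(R, iters=200):
    """Steps 2-3, leading component only: power iteration converges to
    the dominant eigenvector of R, standardised to length 1."""
    v = [1.0] * len(R)
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(len(R)))
             for i in range(len(R))]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v
```

Step 4 is then the projection P = V'F of the entities onto the components found.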
.skip 1
2.7  ^&Conclusion\&
.paragraph
Section 2 has given the formal definition of each of the facilities
available in CLUSTR.  How to specify these to the running program will
be examined in Section 3.
.page
3.0 ^&Data Input Format For 2D Data\&
.paragraph
To run the CLUSTR program, a user codes his data onto cards and forms
a data deck. The data deck contains four types of card, arranged in
the manner described in 3.1:
.skip 2
3.1 ^&Data Deck Structure\&
.paragraph
The data deck consists of four types of cards placed in the following
order:
.test page 15
.blank 15
.paragraph
Each of these cards will be discussed in the following sections.
The samples card and dimensions card  will be outlined first
as they are least complex. The options card will next be detailed, and the
section concluded with details of the missing data cards.
.skip 2
3.2 ^&Dimensions Card\&
.paragraph
The dimensions card contains the maximum dimensions of the matrix to be
input. The format is:
.test page 10
.blank 10
.paragraph
The title may be up to 65 characters in length and serves to identify all
subsequent output.
.skip 2
3.3 ^&Samples Cards\&
.paragraph
The samples cards contain the data for the n x v attribute matrix X.
.paragraph
There are two different forms for the input which are selectable depending
on which the user finds most convenient.
The forms are either to input the data by entity, or by attribute, i.e.
row-wise or column-wise in the X matrix.
.skip 1
Entity format:
.paragraph
A user specifies the various attributes for a given entity together with
an attribute score. The form is
.test page 10
.blank 10
.paragraph
More than one card for a given entity is permissible.
.skip 1
Attribute Format:
.paragraph
A user specifies
the scores for several entities for one given attribute per card.
The form is
.test page 10
.blank 10
.paragraph
More than one card per attribute is permissible.
.skip 2
3.4 ^&Options Card\&
.paragraph
The options card specifies the manner in which the analysis is to be
performed. It contains codes to indicate to the program which of the
particular facilities in section 2.0 a user wishes to specify at each
stage of his analysis.
.paragraph
It is organized into 11 blocks each of which contains 5 columns, as follows:
.test page 10
.figure 10
.test page 10
.figure 10
.paragraph
Each block declares the options required for an identifiable stage in the
analysis. The form of the blocks will next be given.
.SKIP 2
3.4.1 ^&DATA INPUT BLOCK\&
.paragraph
The data input block contains three fields which signify the form of the
input deck following.
.left margin 15
.skip 1
(a)  Samples Card Format
.break
Column 1 contains an option which specifies which format the samples cards will appear in:
.indent 10
E - entity format
.indent 10
A - attribute format
.skip 1
(b)  Divide Option
.skip 1
Columns 2-4 contain a numeric constant which will be used to divide into
each incoming sampled number.  If left blank no division is performed.
This allows the user to scale his input deck, for example by 10 or 100.
.skip 1
.test page 3
(c)  Listing Option
.skip 1
The program initially lists summary totals or averages as previously
defined.  Column 5 contains one of the following:
.indent 10
S - summed values
.indent 10
A - average values
.left margin 5
A sample form is:
.test page 10
.blank 10
.skip 2
3.4.2  ^&Logarithm Option Block\&
.paragraph
While the transformation log (Xij + 1) might more logically appear
to be included in the transformation block (3.4.3), it is given a
separate block because
this provides greater flexibility for the 3D analysis extension later
described.
.paragraph
If a log (Xij + 1) transformation of the complete X matrix
is required then the L option is specified in column 9 of the logarithm
option block. Further transformation may optionally be applied as
previously defined by using options in the transformation block.
.paragraph
A sample form for this block is:
.figure 10
.skip 2
.test page 3
3.4.3 ^&Transformation Block\&
.paragraph
The transformations of the X matrix (log'ed or not) are specified in
columns 11-15 of the options card.  The options are:
.left margin 15
(i)###standardise (by row or column)#####-##S option
.break
(ii)##power option#######################-##P option
.break
.left margin 20
the power to which each data element is raised is specified in 3.4.4.
This option is used for taking the nth root of an element.
.left margin 15
(iii)#do nothing to the element##########-##N option
.break
(iv)##express as std deviation from mean#-##V option
.left margin 5
.skip 1
The form of the Transformation block is:
.break
.test page 9
.blank 9
.skip 2
3.4.4  ^&Power Block\&
.paragraph
If the P option has been used in 3.4.3 the power block specifies
the power to which each element is raised.  If 3 is placed in the
appropriate column each element is raised to the 1/3 power and so
on.  In general if n is specified the elements are raised to the 1/n
power.
.paragraph
The form is:
.test page 9
.blank 9
.paragraph
If the P option is not used, these columns may be left blank.
.skip 2
3.4.5  ^&Dissimilarity Block\&
.paragraph
This block indicates the type of dissimilarity measure to be
used in creating the trellis diagrams for later sorting.  The options are:
.left margin 15
.skip 1
(i)    "B" option - Bray-Curtis measure
.break
(ii)  "M" option - Manhattan Metric
.break
(iii)"C" option - Canberra Metric
.break
(iv)  "D" option - Euclidean distanc squared
.break
(v)    "A" option - Matching Co-efficient
.break
(vi)  "R" option - Ratio Measure
.break
.skip 1
This field may be blank if dissimilarity calculation is not required.
The form is e.g.:
.left margin 5
.test page 10
.blank 10
.skip 2
3.4.6  ^&Printout Block\&
.paragraph
This block is a simple Yes/No option block to control the printout of each triangular trellis diagram.  The options are:
.skip 1
.indent 10
(i)  Y  -  Yes, print it out
.indent 10
(ii) N - don't print it out.
.test page 10
.blank 10
.skip 2
3.4.7  ^&Sorting Block\&
.paragraph
This block indicates the type of sorting algorithm to be used in
generating the dendrograms from the triangular matrices.  The algorithms are:
.left margin 25
.nofill
.nojustify
^&Option\&
N   nearest neighbour
F   furthest neighbour
G   group average
A   simple average
C   centroid
I   incremental sum of squares
V   variance
X   flexible
.skip 1
This field may be left blank if a sort is not required.
.left margin 5
.fill
.justify
A sample form is:
.test page 10
.blank 10
.skip 2
3.4.8  ^&Dendrogram Block\&
.paragraph
This block controls whether each dendrogram is produced and the
form of its output. The options are:
.skip 1
.LEFT MARGIN 19
.indent -4
L - produce dendrogram, output is listing of group fusions
.indent -4
P - produce plot of dendrogram
.indent -4
B - produce both plot and listing of dendrogram
.indent -4
N - (or blank column) dendrogram not required
.skip 1
.LEFT MARGIN 5
Example:
.test page 10
.blank 10
.skip 2
3.4.9  ^&Two Way Table Block\&
.paragraph
Data which have previously been log-transformed may be de-transformed
when printing out a two-way table. The options which may be specified are:
.skip 1
.left margin 15
L - 'leave as is' printout (don't attempt to detransform)
.break
U - 'un-logged' printout (detransform the data)
.break
B - both L and U above
.break
N - (or blank column) not required
.skip 1
.break
.left margin 5
.test page 11
Example:
.blank 10
.skip 2
3.4.10 ^&Ordination Block\&
.paragraph
The Ordination block specifies whether or not to produce
an ordination analysis of the appropriate trellis diagram, or F matrix.
The options available are:
.skip 1
.break
.left margin 15
R - produce Principal Coordinate analysis
.break
M - produce Principal Component  analysis
.break
B - produce both R and M above
.break
N - (or blank column) ordination not required
.break
.left margin 5
.paragraph
Its form is:
.blank 10
.paragraph
In Principal Component analysis the F matrix is multiplied by a weighting
matrix, W, to give the matrix of new coordinates, C, i.e.
.test page 10
.blank 10
.paragraph
Using the normal notation of matrix algebra, it will be seen that the element
Wij represents the weighting placed on old axis j in the F data,
in its contribution to the score on the new axis i in the C data.
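In matrix terms this is an ordinary matrix product, sketched below (Python, illustrative only; the numbers are invented):

```python
def matmul(W, F):
    """C = W x F:  Cik, the score of entity k on new axis i, weights
    each old-axis value Fjk by Wij."""
    return [[sum(W[i][j] * F[j][k] for j in range(len(F)))
             for k in range(len(F[0]))]
            for i in range(len(W))]

W = [[0.6, 0.8],        # rows: new axes; columns: old axes
     [-0.8, 0.6]]
F = [[1.0, 0.0],        # two entities scored on two old axes
     [0.0, 1.0]]
C = matmul(W, F)        # here C equals W, since F is the identity
```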
.paragraph
The W matrix is automatically printed out in  Principal Component analysis.
.paragraph
It should be noted that the analyses of large matrices for
eigenvectors and eigenvalues involves extensive calculation, and
may therefore be expensive to perform.
.skip 2
3.4.11 ^&Note\&
.paragraph
If any particular option is not required for the user's particular
analysis it may be left blank.  An interpretation of the options card
is printed out at the beginning of the output to serve as a check for
the user.
.skip 2
3.4.12  ^&Examples\&
.paragraph
The following are sample options cards for illustrative purposes:
.figure 10
.skip 1
.nofill
.nojustify
.left margin 15
.indent -5
Example 1 above specifies
.skip 1
i    input deck in entity form to be divided by 10
ii   initial listing to be averaged values
iii  log transform the data
iv   standardize the logged data in columns by total, for entity analysis
     and in rows by total, for attribute analysis
v    use Bray-Curtis for entity analysis
     use Manhattan Metric for attribute analysis
vi   print out the entity trellis but not the attribute trellis
vii  sort both trellises using group average sorting
viii produce both dendrograms
ix   print both logged and unlogged two-way tables
x    perform a principal component analysis of the attributes
.figure 10
.skip 1
.indent -5
Example 2 above specifies
.skip 1
i    Deck is in attribute format, no division, summed printout
ii   no log transformation
iii  entities untouched, attributes standardised
iv   entities Bray-Curtis, attributes Manhattan Metric
v    Attribute trellis only
vi   entities nearest neighbour sorting, attributes flexible sorting
vii  attribute dendrogram only
viii principal coordinate analysis of entities
     both analyses of attributes
.left margin 5
.fill
.justify
.skip 2
3.5 ^&Missing data options\&
.paragraph
Any entity-attribute pair for which a value will not occur, and which
should be flagged as such, is specified on a missing data card. This
card has the following form:
.figure 10
.paragraph
Up to seven entity-attribute pairs may be specified, and more than one missing
data card is permissible. Users are referred to the general note regarding
missing data in section 2.3 of this manual.
.page
4.0 ^&TAXAN-CLUSTR Interface\&
.paragraph
TAXAN is a program which provides a subset of the facilities of
CLUSTR, but in addition handles disordered multistate data.
.paragraph
TAXAN has been modified so that it calls on sections of CLUSTR to
provide some of the advanced output and analysis features available
in other CLUSTR analyses. Thereby the user gains the advantage of being 
able to use disordered multistate data.
.paragraph
The input form of a TAXAN data deck is summarized below. The program
as modified provides:
.left margin 15
.skip 1
(a) normal TAXAN output
.break
(b) plot of a dendrogram
.break
(c) principal coordinate analysis of the produced dissimilarity matrix
.break
.left margin 5
.skip 3
TAXAN Card Deck
.skip 1
.left margin 15
.indent -5
Card 1
.skip 1
 A 0-65 character title starting in column 1
.skip 1
.indent -5
Card 2
.skip 1
The numbers NENT NB NN ND NO J1 J2 PLOT ORDIN in that order in any
columns across the card. The numbers must be separated by spaces,
and zeroes must be explicitly stated. The numbers represent:
.skip 1
NENT - number of entities in analysis
.break
NB###-#number#of#binary#attributes
.break
NN###-####"###"##numeric####"
.break
ND###-####"###"##disordered#m/s#"
.break
NO###-####"###"##ordered#m/s####"
.break
J1###-#clustering#option
.break
.left margin 20
0,1 nearest neighbour
.break
2   furthest neighbour
.break
3   group average
.break
4   simple average
.break
5   centroid
.break
6   incremental sum of squares
.break
7   variance
.break
.left margin 15
J2   - standardisation option (numeric and ordered m/s only)
.break
.left margin 20
1   standardise by division by range
.break
2########"######"#####"#####"##twice#variance
.break
.left margin 15
PLOT - optional plot of dendrogram
.break
.left margin 20
0 plot required
.break
1 no plot required
.break
.left margin 15
ORDIN - optional Principal Coordinate analysis
.break
.left margin 20
0 ordination required
.break
1 ordination not required
.left margin 15
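The free-format reading of Card 2 may be pictured with a short present-day
sketch. The following Python fragment is illustrative only and is not part of
TAXAN; the function name is hypothetical.

```python
def parse_card2(line):
    # NENT NB NN ND NO J1 J2 PLOT ORDIN: space separated, zeroes explicit
    names = ["NENT", "NB", "NN", "ND", "NO", "J1", "J2", "PLOT", "ORDIN"]
    values = [int(token) for token in line.split()]
    return dict(zip(names, values))
```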
.skip 1
.indent -5
Card 3
.skip 1
Sizes of the multistates, 40 per card, using sufficient cards to give
ND+NO values. The values on each card should appear in
columns 1,3,5,...,77,79.
.skip 1
.indent -5
Data Cards
.skip 1
The data cards for each entity follow in the following order:
.skip 1
.left margin 20
.nofill
.nojustify
Binary data for entity 1
numeric  "   "     "   1
disordered m/s     "   1
ordered  m/s       "   1
 
  repeat for entity 2
 
       .
       .
       .
  repeat for entity NENT
 
.fill
.justify
.left margin 15
.skip 1
.indent -5
Binary data format
.skip 1
Binary data appear as 1, 0, '*', or ' ' in successive columns of the
data cards, 60 values per card, using as many cards as are necessary
to supply the required NB values.
.skip 1
.indent -5
Numeric data format
.skip 1
Numeric data appear 30 numbers per card, again using as many cards as
are necessary to fill NN values. The numbers appear in any columns,
separated by spaces.
.skip 1
.indent -5
Multistate data format
.skip 1
Multistate data appear as numerically coded values or '*' or ' '.
They appear 40 values per card, using as many cards as are necessary
to satisfy ND or NO values respectively. They are coded to appear
in columns 1,3,5,...,77,79 of a data card.
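The alternate-column coding may be pictured with the following Python sketch
(illustrative only; the function name is hypothetical), which extracts the 40
values from columns 1,3,5,...,79 of a card image:

```python
def multistate_values(card):
    # pad the card image to 80 columns, then take every second column,
    # i.e. columns 1,3,5,...,79 in 1-based card numbering
    card = card.ljust(80)
    return [card[c] for c in range(0, 80, 2)]
```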
.left margin 5
.skip 1
.TEST PAGE 6
LIMITATIONS
.paragraph
Presently TAXAN caters for a maximum of
.skip 1
.nofill
.nojustify
        NENT = 84
        NN   = 40
        NB   = 40
        NO+ND= 40
.fill
.justify
.paragraph
Further enquiries regarding the properties of TAXAN should be directed to Dr. Clifford
of the University of Queensland's Botany Department.
Problems with its operational aspects should, as with
CLUSTR, be directed to the Computer Centre.
.chapter
.skip 20
.centre
Part II
.skip 15
.centre
The Analyses of Three Dimensional Data
.page
5.0 ^&3D CLASSIFICATION - AN OVERVIEW\&
.BREAK
.paragraph
Three dimensional classification has found application particularly
in the ecological literature (e.g. 10,11).
.footnote 7
.centre
*#*#*#*#*#*#*#*#*#*
(10) Williams,W.T., and Stephenson,W., "The analysis of three-dimensional
data (sites x species x times) in marine ecology", J. Exp. Mar.
Biol. Ecol. 11, 207-227.
.break
(11) Stephenson,W., Williams,W.T., and Cook, S., "The macrobenthos
of soft bottoms in southern Moreton Bay (south of Peel Island)"
, Mem. Queensl. Mus. 17,73-124.
!
The processes and operations which are performed in 3D analysis
have analogues in 2D analysis, and so this section of the manual merely
extends the concepts introduced in sections 1 and 2.
.paragraph
The X matrix now takes on the following form
.blank 10
.paragraph
The value of attribute k, when entity-1 takes the value i and entity-2
the value j, is given by Xijk.
.paragraph
It may therefore be seen that 3D analysis concerns itself with classifying
along two entity dimensions, using the values of the given attributes occurring
for each possible co-incident pair of entities. It may of course not be possible
for a given entity-1 entity-2 combination to occur, and so the missing data
options in 3D analysis are extended.
.paragraph
Having defined the 3D data matrix, the 
3D classification reduces itself to two 2D classification problems. From
the X matrix (3D), one produces two 2D auxiliary data matrices,
A# and A#, shown diagrammatically below.
.test page 10
.blank 10
.paragraph
The A# (n#x v) matrix is produced by a mathematical reduction over all
entity-2 values.
.paragraph
The A# (n#x v) matrix is produced by a mathematical reduction over all
entity-1 values.
.paragraph
Once this reduction is accomplished, both normal and inverse
analyses of both A# and A# matrices can be performed as outlined in 
part#I of the manual.
.paragraph
The various methods of mathematical reduction are defined formally in section
6.0 of this manual, and the method of setting out 3D data, options,
dimensions, and missing data cards is given in section 7.0.
.page
6.0 ^&CLUSTR operations for 3D data.\&
.paragraph
The various options available to a user at each level of the classification
process are now formally defined. It is assumed at this stage that the raw
data matrix has been read from an input medium. The method by
which this may be done will be given in sections 7.0 and 8.0.
.break
.skip 2
6.1 ^&Summary of raw data\&
.break
.paragraph
An initial summary of the raw data is printed in 3D analysis. For an X
matrix of the following form:
.test page 10
.blank 10
.paragraph
quantities Pi and Qi are derived, representing  entity-1 and entity-2 summaries.
The quantities may be:
.nofill
.nojustify
.break
.left margin 15
.test page 8
(a) totals     [S]

        Pi = SUM(j,k) Xijk

        Qj = SUM(i,k) Xijk

.test page 11
(b) averages   [A]

        Pi = ( SUM(j,k) Xijk ) / (n v)

        Qj = ( SUM(i,k) Xijk ) / (n v)

.fill
.justify
.left margin 5
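As an illustration only (the function below is hypothetical and not part of
CLUSTR), the Pi and Qj summaries may be computed from a 3D matrix held as
nested Python lists:

```python
def raw_summaries(X, average=False):
    # X[i][j][k] holds the value of attribute k for entity-1 i, entity-2 j
    n1, n2, v = len(X), len(X[0]), len(X[0][0])
    P = [sum(X[i][j][k] for j in range(n2) for k in range(v)) for i in range(n1)]
    Q = [sum(X[i][j][k] for i in range(n1) for k in range(v)) for j in range(n2)]
    if average:                       # option [A]: totals divided by n x v
        P = [p / (n2 * v) for p in P]
        Q = [q / (n1 * v) for q in Q]
    return P, Q
```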
.skip 2
.test page 18
6.2 ^&Reduction options\&
.paragraph
For an X matrix of the form shown below,
.figure 10
.tab stops 40,45
.nofill
.nojustify
	n  entity-1
	n  entity-2
	v  variates
	 Xijk value of variate k when
	 entity-1=i and entity-2=j
.fill
.justify
.paragraph
the quantities in the A# matrix may be defined as:
.nofill
.nojustify
.left margin 15
 
.test page 5
(a) log-total       [T]

        A ik =  log  ( SUM(j) Xijk  + 1 )
         1         10


.test page 7
(b) log-average     [L]

        A ik =  log  ( ( SUM(j) Xijk ) / n  + 1 )
         1         10


.test page 7
(c) average         [A]

        A ik =  ( SUM(j) Xijk ) / n
         1

.test page 6
(d) summed          [S]

        A ik =  SUM(j) Xijk
         1
 
 
.fill
.justify
.left margin 5
.paragraph
Similarly, the quantities in A# are defined by:
.nofill
.nojustify
.left margin 15
 
.test page 5
(a) log-total       [T]

        A jk =  log  ( SUM(i) Xijk  + 1 )
         2         10


.test page 7
(b) log-average     [L]

        A jk =  log  ( ( SUM(i) Xijk ) / n  + 1 )
         2         10


.test page 7
(c) average         [A]

        A jk =  ( SUM(i) Xijk ) / n
         2

.test page 6
(d) summed          [S]

        A jk =  SUM(i) Xijk
         2
 
 
.fill
.justify
.left margin 5
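The four reduction options may be sketched in present-day terms as follows.
This Python fragment is illustrative only and is not CLUSTR code; the function
name is hypothetical, and the A# reduction is obtained analogously by summing
over entity-1 rather than entity-2.

```python
import math

def reduce_A1(X, option):
    # reduce X[i][j][k] over entity-2 (index j); option is 'T', 'L', 'A' or 'S'
    n1, n2, v = len(X), len(X[0]), len(X[0][0])
    A1 = [[0.0] * v for _ in range(n1)]
    for i in range(n1):
        for k in range(v):
            s = sum(X[i][j][k] for j in range(n2))
            if option == 'T':                    # log-total
                A1[i][k] = math.log10(s + 1)
            elif option == 'L':                  # log-average
                A1[i][k] = math.log10(s / n2 + 1)
            elif option == 'A':                  # average
                A1[i][k] = s / n2
            else:                                # 'S': summed
                A1[i][k] = s
    return A1
```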
.paragraph
Subsequent transformation of the A# and A# matrices, and indeed all subsequent
classification processes, are carried out exactly as if A# and A# were
independent 2D matrices for analysis as in part I of this manual. In particular,
A# and A# are operated on as in section 2.2 to produce independent F matrices.
.paragraph
It will now be obvious why the statement in the introductory paragraph of 3.4.2
is made. The log transformation of data is part of the process (at least
computationally) of reducing 3D data to 2D data, and so merits a separate
block in the options card. Not that the effect obtained is conceptually
different from other conditionings#-#but the allocation of a separate
block allows log transformations to be performed in conjunction with,
for example, standardisations.
It is also noted that log transformation of a 2D matrix (in the part#I
sense) is achieved by treating it as a degenerate 3D matrix.
.paragraph
How to specify subsequent operations on the A# and A# matrices is a topic
discussed in section 7.0 following.
.PAGE
7.0 ^&Data input format for 3D data.\&
.paragraph
The data deck structure for 3D data is exactly the same as for
2D data shown in section 3.1. In general, the options specified for 2D data
are immediately transferable to 3D data, however some extensions are
allowable and will be discussed in the following paragraphs.
.skip 1
7.1 ^&Dimensions card\&
.paragraph
The format for the dimensions card is as follows:
.blank 10
7.2 ^&Samples cards\&
.paragraph
Once again there are two formats for data input.
.skip 1
Entity format
.paragraph
A user specifies attribute scores for each entity1-entity2 combination
as follows:
.test page 10
.blank 10
.paragraph
More than one card for a given entity pair is allowed.
.skip 1
Attribute format
.paragraph
A user specifies the scores for several entity pairs for that one attribute
on one or more cards. The form is:
.test page 10
.figure 15
.blank 10
.paragraph
Each entity pair is specified in its own 5 column block as follows:
.test page 9
.blank 9
.paragraph
More than one card per attribute is permissible.
.skip 1
7.3 ^&3D options card\&
.paragraph
The options card for 3D analysis is merely an extension of the
2D options card to allow for the extra 2D analysis required for A#.
.paragraph
It is organized into exactly the same eleven blocks as shown in 3.4, with
the exception that the log option block (cols 6-10)
is now better termed the REDUCTION BLOCK (see also 6.2).
.skip 1
7.3.1 ^&Data input block\&
.paragraph
The form of the data input block is exactly the same for 3D
analysis as it is for 2D analysis as shown in 3.4.1.
.skip 1
7.3.2 ^&Reduction block\&
.paragraph
The reduction block specifies the way in which the 3D matrix
is reduced to two 2D matrices. The options as defined previously are
.skip 1
.left margin 15
.nofill
.nojustify
T - log total
L - log average
A - average
S - summation
.left margin 5
.fill
.justify
.paragraph
The form of the block is:
.test page 11
.blank 11
.nofill
.nojustify
.test page 6
7.3.3 Transformation block
7.3.4 Power block
7.3.5 Dissimilarity block
7.3.6 Printout block
7.3.7 Sorting block
7.3.8 Dendrogram block
.fill
.justify
.paragraph
Each of these blocks in 3D analysis contains exactly the same range of options
as it did in 2D analysis (see 3.4.3 to 3.4.8 for available options).
The blocks specify options for each of the four possible analyses,
as follows:
.test page 15
.blank 15
7.3.9 ^&Two way table block\&
.break
.paragraph
The options available for two-way table printout are the same as outlined
in 3.4.9, the ordering within the block being the same as in 7.3.2 .
.skip 1
7.3.10 ^&Ordination block\&
.paragraph
The options available for ordination are the same as those outlined in 
3.4.10, the ordering being the same within the block as shown in 7.3.3-7.3.8 .
.skip 1
7.3.11 ^&Note\&
.paragraph
If a particular analysis is not required for the particular user's task,
the appropriate option column may be left blank where indicated in the previous
text. CLUSTR recognizes the situation where 3D data are input but only one
of the 2D analyses is required, by recognizing the appropriate blank column
in the reduction block.
.skip 1
7.3.12 ^&Example\&
.paragraph 
The following example is an extension of the example in 3.4.12. As well
as all the options specified there, the extra options specified
for entity-2 attribute analysis are enumerated below.
.test page 7
.blank 7
.nofill
.nojustify
.left margin 15
(i)  'average' reduction for ent2 attribute matrix
(ii) standardise for normal analysis
     no transform for inverse analysis
(iii)Canberra metric for normal analysis
     Euclidean distance squared for inverse analysis
(iv) no trellis printout
(v)  flexible sorting for both analyses
(vi) all dendrograms required
(vii)both two way tables required
(viii)  no ordination required
.fill
.justify
.left margin 5
.skip 2
7.4 ^&Missing data options\&
.paragraph
There are three types of missing data cards in 3D analysis.
They are enumerated below.
.skip 1
(a) Missing entity1-entity2 combination
.paragraph
If a given entity1-entity2 combination does not occur, i.e. the values of
all attributes for that combination cannot occur, the following missing
data card is used:
.test page 10
.blank 10
(b) Missing entity-1 attribute combination
.paragraph
If for a given entity1 a given attribute is impossible, regardless
of the value of entity-2, the following form is used:
.test page 10
.blank 10
(c) Missing entity-2 attribute combination
.paragraph
If for a given entity-2, a given attribute cannot occur, regardless of
the value of entity-1, the following form is used:
.figure 10
.centre
*#*#*#*#*#*#*#*#*#*#*#*#*
.chapter
.centre
Part III
.skip 15
.centre
CLUSTR on the PDP10
.page
8.0 ^&CLUSTR on the PDP-10\&
.paragraph
This section defines the system control cards required to control
CLUSTR on the University of Queensland PDP-10 timesharing system
where it currently operates under version 6.02 of the monitor.
.skip 1
8.1 ^&The UQ Batch System\&
.paragraph
CLUSTR requires input from a card reader, and gives output to both
line printer and plotter. Intermediate storage is required on
moving head disc, on the users area.
.paragraph
The line printer, card reader and plotter on the PDP-10 are so-called
spooling devices. That is, input from, or output to the real devices
occurs well before or after the program's actual running. In the
intervening time interval data to or from these devices is stored
as files of information on moving head disc.
.paragraph
The task of providing control cards for CLUSTR therefore involves
associating the appropriate spooled disc files with the running of
the program at the appropriate time.
.paragraph
The following deck setup is typical and will be examined in some detail.
.nofill
.nojustify
.left margin 15
.skip 2
$SEQUENCE
$JOB   <user supplied parameters>
$DATA
 
 
   <  CLUSTR  data deck  >
 
 
$EOD
$TOPS10
_.NOERROR
_.R  STA:CLUSTR
_.PRINT *.LPT
_.PLOT PLT1:=*.PLT
_.DEL *.DAT
$EOJ
.fill
.justify
.LEFT MARGIN 5
.skip 2
.paragraph
The $SEQUENCE and $JOB cards identify the current users job to the
system, the job being terminated by a $EOJ card.
.paragraph
The $DATA to $EOD cards include the CLUSTR data deck and 
instruct the batch system to set up a disc file.
 This file will later be used for input to CLUSTR as if it had come
directly from the card reader. It is thus one of the spooled files.
The disc file will be placed on the disc area set aside for the
current user.
.paragraph
Having set up the appropriate input streams, the command .R STA:CLUSTR
then causes CLUSTR to begin execution.
.paragraph
During the course of its execution CLUSTR may produce
disc files meant for the line printer and plotter, i.e. spooled output files.
These files are placed in the appropriate queues for the real devices by 
the .PRINT and .PLOT commands.
.paragraph
The .DELete command tidies the user disc area by deleting the disc
files which CLUSTR has created during the course of its running
( xxxxxx.DAT files ).
.paragraph
This sample deck setup is quite typical of most operations a user is likely
to perform. It is modifiable, and will largely depend on whether or not 
the CLUSTR data deck already exists on disc as from a previous run or is
to be read during this run using a $DATA - $EOD construction.
.skip 1
8.2 ^&The UQ Remote Terminal\&
.paragraph
CLUSTR is primarily designed for use via the PDP-10 batch stream.
To use it via a remote terminal a little knowledge of the FORTRAN
operating system is required, though once again only sufficient to
organise the required input/output.
.paragraph
CLUSTR recognises when it is being run from a remote terminal, and
expects input from FORTRAN unit number 5, and does output to unit
number 6 (as opposed to 2 and 3 when running from the batch system).
A user then uses interactive commands to associate units 5 and 6 with
the appropriate (possibly spooled) peripheral devices. The following command
sequence is typical:
.left margin 15
.nofill
.nojustify
 
#.ASSIGN CDR:5
#.ASSIGN LPT:6
#    .
#    .
#.SET CDR ABC
#    .
#    .
#.R STA:CLUSTR
 
 
.fill
.justify
.left margin 5
.paragraph
The .ASSIGN commands associate units 5 and 6 with the card reader and line
printer respectively. The .R command is as previously discussed in
8.1.
.paragraph
Then a user invokes exactly the same .PRINT, .PLOT and .DEL 
commands as shown in 8.1.
.left margin 5
.paragraph
It has been assumed that a .CDR file has already been set up on user
disc area, perhaps by a previous $DATA - $EOD combination from a
previous batch run, or perhaps using EDITOR to create the file
interactively.
.paragraph
A possible variation on the above interactive procedure is the following
.skip 1
.left margin 15
.nofill
.nojustify
#.ASSIGN DSK:5
#.ASSIGN DSK:6
#.R STA:CLUSTR
.fill
.justify
.left margin 5
.paragraph
In this case the non-spooled disc device has been assigned for input and
output. By convention the disc file expected for input is the file FOR05.DAT,
and output is sent to FOR06.DAT. No .SET CDR command is required as
the card reader (spooled) is no longer involved.
.left margin 5
.page
9.0 ^&CLUSTR system components and interactions\&
.paragraph
This section is not aimed at the average user, but is rather designed
to give an overview of CLUSTR structure to the programmer who may
be faced with the maintenance or modification of CLUSTR. A resume of each
program is given, together with the way in which inter-program communication
is achieved via disc files. Further, the structure of each of the disc files
is given.
The explanations are all in terms of 3D analyses, because as far as CLUSTR
is concerned ALL analyses are 3D, possibly with unit thickness in the 2D
case.
.paragraph
It is assumed that the entity1 x entity2 x attribute analysis to be performed
has dimensions n#xn#xv.
.paragraph
The diagram in fig. 9.1 summarizes CLUSTR action.
.paragraph
It may be seen that CLUSTR is really only the first of seven programs
which are run in sequence automatically. The programs communicate via disc
data files which are all binary files.
It will also be evident that if execution aborts at any stage due to
some external error (etc.), execution may be re-commenced at one of the
intervening stages.
.skip 1
9.1 ^&Program synopses\&
.skip 1
CLUSTR
.paragraph
The first program in the suite reads dimensions and options cards, de-codes
them and writes codified versions of them to disc for use by later programs.
It then reads the samples cards, storing two matrices for each of the
A# and A# reduced matrices. The two matrices in each case record total
entity 1 or 2 attribute values, and the number of occurrences
of each entity 1 or 2, to allow for possible missing data.
.paragraph
Throughout the listings copious references to sites, times and species
occur. CLUSTR was originally written for ecological analyses,
and later generalised. Sites, times and species in the listings
should be read as entity1, entity2 and attributes respectively.
.paragraph
Routines LISTC and LISRAW generate data summaries.
.paragraph
Routines LOGRD1,AV1,LOGAV1,LOGRD2,AV2, and LOGAV2 then perform
the reductions using the four 'A# and A#' matrices. These are written to disc
for CONDIT.
.test page 45
.page
.skip 1
CONDIT
.paragraph
The second program is responsible for conditioning the reduced matrices
ready for dissimilarity analysis by BRACUR.
.paragraph
Four matrices are generated in the conditioning process,
corresponding to one for each of the inverse and normal analyses
for both entity1/attribute and entity2/attribute data.
.paragraph
As BRACUR does dissimilarity calculations only by column, for inverse
analysis CONDIT writes the matrix transpose to disc.
.skip 1
BRACUR
.paragraph
The third program writes four trellises ready for sorting by DENDRO.
.skip 1
DENDRO
.paragraph
This program is the later part of Prof. Burr's TAXAN2 program. The
CLUSTR suite grew from an original desire to add facilities
to this program. It was considered that better facilities could be made
available if the sorting section only of TAXAN2 was added to CLUSTR
and not vice versa. Consequently most of TAXAN2 has been discarded
and this is the only section of code remaining not written by
the author. It would, for obvious reasons gleaned from the listings,
bear replacement, and may be so replaced at a later stage.
.paragraph
The basic method used is that given by Clifford and Stephenson(1),
and following the Lance and Williams approach(5).
.skip 1
DENPLT
.paragraph
The DENPLT program plots dendrograms from the output
of the DENDRO program.
.paragraph
It is basically a recursive binary tree drawer(!), made iterative due
to the constraints of FORTRAN.
.paragraph
The ordering of leaves along the baseline of the dendrogram defines
the resorting of the original data matrix, and is written to disc
for use by the two-way table printer.
.skip 1
RESLST
.paragraph
The RESLST program prints a resorted version of the matrices contained
in AVGE.DAT, after resorting according to the ordering reflected
in RESVC.DAT, written out by DENPLT.
.paragraph
RSAV.DAT is an auxiliary file used in the sorting.
.skip 1
PCA
.paragraph
PCA is the program which performs all the ordination using either
trellis diagrams (DENn.DAT) or conditioned data matrices
(MATn.DAT).
.skip 2
.left margin 10
.indent -2
General notes
.skip 1
.indent -2
1. At each stage of the classification process only those options specified
in AVGE.DAT from the options card are performed.
.skip 1
.indent -2
2. Owing to a quirk on the author's part, all arrays are
stored by row and not by the conventional FORTRAN column. This
means that the first subscript always refers to a column position
along a row, and the second refers to a row position down a column for
each element.
.paragraph
Particular note of this fact should be taken in PCA where interface
to standard SSP routines occurs.
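The subscript convention can be illustrated with a small sketch (modern
Python, purely illustrative): given a conventional rows-by-columns table, a
CLUSTR-style reference ARRAY(col,row) addresses the transposed element.

```python
# a 2 row x 3 column table; element value "rc" encodes (row r, column c)
rows = [[11, 12, 13],
        [21, 22, 23]]

def clustr_elem(col, row):
    # what CLUSTR code written as ARRAY(col, row) actually addresses:
    # first subscript = column position, second = row position
    return rows[row][col]
```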
.skip 1
.indent -2
3. Use is made throughout of FORTRAN's variable dimension facility,
and the utility routine MORCOR. For further details consult the listing of MORCOR.
.skip 1
.indent -2
4. In each case where four matrices are written out, their
names are of a form like for example MATn.DAT. The 'n' specifies
the analysis for which the data is intended, viz:
.skip 1
.nofill
.nojustify
.left margin 15
1  entity1/attribute data  normal analysis
2          "               inverse    "
3  entity2/attribute  "    normal     "
4          "          "    inverse    "
.fill
.justify
.left margin 5
.skip 2
9.2 ^&File Formats\&
.paragraph
The following is a record by record description of each binary file:
.skip 2
.nofill
.nojustify
.left margin 10
.indent -3
AVGE.DAT
.skip 1
a)  1 record  of  3 words  n# n# v
              and 13 words A5 title
b)  1 record  of  36 words coded options
c)  v records of  n# words A# matrix
d)  v records of  n# words A# matrix
 
             (TAXAN dummy AVGE.DAT does not include c and d)
.skip 2
.indent -3
MATn.DAT  (e.g. MAT1.DAT )
.skip 1
a) 1 record  of 2 words ( number rows, number columns)
                          e.g. v n#
b) conditioned matrix e.g. v records of n# words
.skip 2
.indent -3
DENn.DAT   (e.g. DEN1.DAT)
.skip 1
a) 1 record of 1 word - size of (square) matrix
b) 1 record of 21 words - title and subtitle
c) lower triangle of the square matrix
         e.g. n#-1 records of 1,2,3,4...n#-1 words each
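The lower-triangle record layout can be sketched as follows (illustrative
Python with hypothetical function names; CLUSTR itself writes binary FORTRAN
records):

```python
def triangle_records(D):
    # record i (i = 1 .. n-1) holds D[i][0], ..., D[i][i-1]
    return [D[i][:i] for i in range(1, len(D))]

def matrix_from_records(records):
    # rebuild the full symmetric trellis with a zero diagonal
    n = len(records) + 1
    D = [[0.0] * n for _ in range(n)]
    for i, record in enumerate(records, start=1):
        for j, d in enumerate(record):
            D[i][j] = D[j][i] = d
    return D
```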
.skip 2
.indent -3
DENIn.DAT  (e.g.  DENI1.DAT)
.skip 1
a) 1 record of 1 word - no. of entities on dendrogram baseline
b) 1 record of 21 words - title subtitle for dendrogram
c) fusions e.g. n#-1 records with four words a,b,c,d.
     these indicate entities a and b fuse to give c at level d
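The fusion records fully determine the dendrogram. As an illustration
(hypothetical Python, not part of DENPLT), the group memberships may be
recovered by replaying the records in order:

```python
def replay_fusions(n, fusions):
    # each record (a, b, c, d): groups a and b fuse to form group c at level d
    members = {i: [i] for i in range(1, n + 1)}
    for a, b, c, d in fusions:
        members[c] = sorted(members.pop(a) + members.pop(b))
    return members
```

With three entities, fusing 1 and 2 into group 4 and then 4 and 3 into
group 5 leaves a single group containing all entities.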
.skip 2
.indent -3
RESVC.DAT
.skip 1
a) n# records of 1 word each giving order along baseline
      of entity1 from dendrogram 1.
b) v  records similarly from dendrogram 2
c) n#    "        "       "       "     3
d) v     "        "       "       "     4
.skip 2
.indent -3
RSAV.DAT  - resorted conditioned matrix
.skip 1
a) v records of n# entities
b) v records of n# entities