PDP-10 Archive: swskit-documentation/cfs-info.memos from QT020_T20_4.1_6.1_SWSKIT

Trailing-Edge - PDP-10 Archives - QT020_T20_4.1_6.1_SWSKIT_851021 - swskit-documentation/cfs-info.memos

MEMOS IN THIS FILE:


   o	CFS Functional Specification
   o	Global CFS Job Numbers Specification
   o	Structure Handling Modifications for CFS
   o	Dual-ported Disk Handling with CFS
   o	CFS Processing Description
   o	CFS Error Recovery Processing
   o	CFS Resource Handling
   o	CFS Changes to TOPS-20 to Support 6.1

-------------------




                Functional Specification for


                 TOPS20 COMMON FILE SYSTEM



1.0  PRODUCT OVERVIEW

1.1  Product Description

   This project is to develop a  "Common  File  System"  for
TOPS20.   The Common File System capabilities are applicable
to configurations of two or  more  36-bit  processors,  each
with its own main memory, interconnected by a high speed bus
("CI").  The objective of the Common File System ("CFS")  is
that  disk  structures  and  files  within such a system are
available to jobs  on  all  processors,  regardless  of  the
physical connections of the disk devices.



1.1.1  Architectural Position -

   CFS is  a  component  of  the  "loosely-coupled  systems"
architecture  and is the first piece of that architecture to
be implemented.  Some of the other components are:

     1.  DECnet/CI

     2.  CI-wide IPCF

     3.  CI-wide ENQ/DEQ

     4.  CI-wide GALAXY

As can be seen, the ultimate LCS product  is  an  extensible
multi-processor  system.   CFS  is  being  implemented first
because it is the most visible of the pieces and because  it
provides  a  useful  extension  to  TOPS-20 even without the
other LCS components.



1.1.2  Relationship To Other CI Products -

   CFS is independent of the other high-level CI  protocols.
That  is,  CFS  can  exist on a system that does not support
MSCP.  All that is required is  the  SCA  layer  of  the  CI
protocol.   In  the  following  sections, mention is made of
MSCP, the MSCP server and other CI  applications  protocols.
These references are provided to explain the relationship of
CFS to the other committed  CI  products,  but  CFS  remains
distinct  and  independent  of them.  The specifics of these
other protocols, and any limitations  or  restrictions,  are
described in other documents.



1.2  Markets

   The Common File System  is  a  general  operating  system
capability  and  is  applicable  to all present DECSYSTEM-20
markets.



1.3  Competitive Analysis

   This  project  provides  more  and  larger  configuration
alternatives than previously available.  This project is not
closely related to "distributed processing" in  that  it  is
only  applicable  to  configurations  on  a CI and therefore
within the 100 meter limit of the CI.

   VAX/VMS is developing a means for multiple processors  to
reference files on a single disk system;  however, the basic
difference in filesystem architecture between TOPS20 and VMS
makes the projects somewhat different.

   Related capabilities include  "Network  File  Access"  or
other  techniques for moving files among nodes of a network.
CFS is a more powerful and transparent form of  file  access
because it implements all monitor file primitives visible to
the user program and operates over a high speed bus.

   "Multiple Processors" (implying  shared  memory  as  with
TOPS10 SMP) is a related capability.  SMP is a more powerful
approach to the  use  of  multiple  processors  in  that  it
provides   greater  transparency  and  better  dynamic  load
leveling.  There are compensating advantages of CFS over SMP
in  the  area  of failsoft and isolation of failures, and in
the maximum size of configurations which can be supported.



1.4  Product Audience

   The principal customer for CFS is one who now has, or who
needs, multiple KL-10 processors and wishes to run them as a
single system.  Since DEC is not offering a follow-on PDP-10
processor,  most,  if not all, of the LCG customers fit this
description.

   CFS-20 is meant as a complement to DECnet  services,  and
in  some  cases  sites  with  multiple  KL10's may find that
sharing via DECnet is adequate.



2.0  PRODUCT GOALS

2.1  Performance

   1.  All unprivileged  monitor  calls  which  affect  disk
files  on present one-processor TOPS20 systems will work and
will have the intended effect on any disk  structure  within
the configuration.

   2.  The overhead associated with maintaining  the  common
file data base on multiple processors will cause an increase
of not more than 10% in execution time  of  file  primitives
and operations.

   3.  A processor referencing files on a disk not  directly
connected  will incur no additional overhead in transferring
data.

   We expect to use MSCP (Mass Storage Control Protocol) for
the  data  transfers to support file operations over the CI.
This will  exist  on  the  CI  along  with  other  protocols
supporting  other  functions.  MSCP should achieve efficient
use of the CI, low-overhead operation of the  monitors,  and
high-bandwidth file interchange.  File structure information
such as directories and index blocks is  passed  exactly  as
read  from  disk.   By passing TOPS20 file data directly, we
avoid the overhead of copying and conversion  incurred  with
other protocols.

   However, a processor acting as a file server for  another
processor will incur overhead for this activity not relating
to jobs running on it.  This overhead will involve primarily
instructions  executed  at  interrupt  level and main memory
space to buffer data being transferred.

   CFS supports shared-writable  pages  (simultaneous  write
file  access)  on  multiple  processors.   This  is used for
various internal mechanisms (e.g.   directory  lookup,  disk
allocation  tables) as well as user program functions.  This
type of  access  generates  IO  activity  and  overhead  not
present  on  single-processor  systems.  Because data cannot
actually be referenced simultaneously by two processors,  it
must  be  moved from one to another by the operating system.
Users  will  be  advised  of   this   and   should   arrange
applications so as to avoid frequent write references to the
same data from different processors.

   Since the monitor itself uses this facility, we conducted
a  study  of  monitor reference patterns to ensure that this
activity will not be a significant bottleneck.  We  recorded
monitor   reference   patterns   to   directories  and  disk
allocation tables under actual and  simulated  loads.   This
was  done  by  using the SNOOP facility to detect and record
references where the job making the reference  is  different
from  the job which made the most recent previous reference.
This provides worst-case data on  the  frequency  of  moving
monitor  data between processors.  We determined that only a
few (3 or fewer) directories  were  referenced  sufficiently
often  to  be  of  interest.   These  were all common system
directories (e.g.  <SUBSYS>), and the frequency was  not  so
high  as  to  suggest  a  problem.   This  small  additional
overhead is greatly outweighed by the disk space savings  of
not  having to duplicate the SYS:  files for each of the CFS
processors.
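
   The counting rule used in the study can be illustrated with a
short sketch.  This is not the SNOOP instrumentation itself.  The
trace format (time-ordered pairs of job number and object name),
the function name, and the sample data are assumptions made for
illustration.  A reference is counted when the job making it
differs from the job that most recently referenced the same
object.

```python
from collections import Counter


def count_job_switches(trace):
    """Count, per object, references made by a different job than the previous one.

    trace: iterable of (job_number, object_name) pairs in time order.
    """
    last_job = {}          # object -> job that referenced it most recently
    switches = Counter()   # object -> number of job-to-job hand-offs
    for job, obj in trace:
        if obj in last_job and last_job[obj] != job:
            switches[obj] += 1
        last_job[obj] = job
    return switches


# Hypothetical trace: three jobs touching two directories.
trace = [(1, "<SUBSYS>"), (2, "<SUBSYS>"), (2, "<SUBSYS>"),
         (1, "<SUBSYS>"), (3, "<USER>"), (3, "<USER>")]
switches = count_job_switches(trace)
```

   Because every job change is counted as if the two jobs were on
different processors, the count is an upper bound on the number
of inter-processor moves, which is why the text calls the data
worst-case.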



2.2  Environments

   Minimum configuration requires two processors and  a  CI.
Each  processor  must  have  a  connection  to the CI.  (The
question has been raised as to whether CFS might be operable
over  an NI connection.  This will not be supported in first
release.  There should be no logical reason that NI couldn't
be used, but additional study and experience are necessary to
understand   the   performance   implications.    Additional
implementation work would also be needed.)

   Each processor must have its own  main  memory,  swapping
device, boot device, and console.

   Each processor must have direct access to its public disk
structure.   In  a  future  release,  it  may be possible to
eliminate  this  requirement.   However,  there  are   other
requirements  for a directly connected disk (e.g.  swapping)
which will also have to be addressed.

   The maximum configuration for first release of CFS is two
processors;  however there shall be no CFS-specific software
limitation on a larger number.  This limit is based  on  our
current  knowledge of the CI and the lack of experience with
this architecture.  The practical limit may  be  higher.   A
maximum of one CI will be supported for first release.



2.3  Reliability Goals

   1.  A customer should  be  able  to  improve  net  system
availability   of  his  configuration  by  use  of  multiple
processors and the CFS.

   2.  The CFS should cause no significant decrease  in  the
reliability of each single processor.

   3.  Failure of one processor will have no effect on other
processors  except  for  file data which is in the memory of
the failing processor.



2.4  Non Goals

   1.  CFS will support only disks.

   2.  CFS is not intended to work  with  operating  systems
other  than  TOPS20 or with machine architectures other than
36-bit.

   3.  CFS does not provide any automatic balancing  of  job
load  among  processors  in  a  configuration  as  does SMP.
However, users should find it convenient  to  login  to  the
less  loaded  processor  and/or  to  switch  processors  (by
logging  out  and  back  in  again)  if  the  load   becomes
unbalanced.

   4.  Applications that rely on ENQ/DEQ and OF%DUD are  not
supported by CFS.

   5.  IPCF applications will  not  communicate  across  CFS
processors.



3.0  FUNCTIONAL DEFINITION

3.1  Operational Description

   Two  or  more  processors  are   interconnected   via   a
high-speed bus ("CI") having a bandwidth at least comparable
to disk transfer rates.  A disk which is to  be  used  by  a
processor  must  have  a direct path to that processor;  the
disk must be either on  the  CI,  on  a  directly  connected
MASSBUS,  or  attached  to  another  KL-10  running  an MSCP
server.




               ------------       --------------
               !   HSC    !       !   HSC      !
               !   DISK   !       !   DISK     !
               !          !       !            !
               ------------       --------------
                    !!                  !!
                    !!                  !!
     CI =====================================/ / / =============
           !!                 !!                     !!
           !!                 !!                     !!
      ------------        --------------         -------------
      !   KL10   !        !   KL10     !         !  KL10     !
      !  CPU&MEM !        !  CPU&MEM   !         ! CPU&MEM   !
      ------------        --------------         -------------
             !!
             !! MASSBUS
         -------------
         !           !
         ! MASSBUS   !
         !      DISKS!
         -------------



   One or more logical structures exist on the set of disks.
All  of  these  structures are visible to jobs on all of the
processors  unless  the  system  administrator  specifically
declares   particular   structures   as   "exclusive"  to  a
particular processor.

   In order to provide access to Massbus disks connected  to
a  KL10,  the  KL10 will act as a logical disk controller on
the  CI  for  the  Massbus  disks.   There  is  no   visible
distinction between a disk structure directly connected to a
processor and one which is accessed via  another  processor.
The  usual  monitor  calls  are  used  to  access  files and
structures, and all file open modes  are  allowed  with  the
exceptions  listed  below.  Shared file access is permitted,
and programs need not be aware that  other  jobs  sharing  a
file  are  on  different  processors;   however,  it  may be
advisable for reasons of efficiency  to  avoid  simultaneous
modification of a file on different processors.

   File facilities specifically include:

        1.  File naming and  lookup  conventions  (GTJFN)  -
     File names on the common file system include structure,
     directory and  subdirectories,  file  name,  extension,
     generation  number,  and  attributes.  Full recognition
     and wild-carding is available;  name stepping  (GNJFN);
     normal access to FDB.

        2.   Usual  open  and  close  modes  (OPENF,  CLOSF,
     CLZFF).

        3.  Usual data transfer primitives, both  sequential
     and  random  (BIN,  BOUT,  SIN, SOUT, SINR, SOUTR, RIN,
     ROUT, DUMPI, DUMPO).

        4.  File-to-process  mapping  (PMAP)  including  all
     modes   (shared   read,  copy-on-write,  shared  write,
     unrestricted read).

        5.  The device type associated  with  files  on  the
     common  file  system is the same as that presently used
     for disk.

        6.  Privileged operations MSTR (mount structure) and
     DSKOP.

   The above includes all file system primitives relating to
accessing  files  and transferring data but does not include
other primitives which may use certain file system  entities
but  which  are  considered separate and distinct facilities
(e.g.  ENQ/DEQ).



3.2  Restrictions

   A file open  with  OF%DUD  (don't  update  disk)  on  one
processor  may  not  be opened on any other processor.  This
results from the fact that processors  share  file  data  by
writing any changed files to the disk before passing control
to another processor.  Since OF%DUD implies  that  the  disk
copy  of  a  file  may not be changed until the user process
approves the change, OF%DUD cannot be supported with CFS.

   Other devices such as magtapes and line printers are  not
part  of  CFS and may not be open simultaneously on multiple
processors.

   Use of simultaneous write access with active  writing  of
file  data  by  jobs  on  different  processors requires the
system to move pages among the processors and hence will  be
much  slower than on a single processor.  The write token is
maintained on a per-OFN basis.  This means  that  a  program
requiring  write  access  to any one or more pages must have
exclusive access to the entire  OFN.   Each  OFN  represents
256K  words  of  the  file.   For  large  files, programs on
different processors could be executing  simultaneous  write
references  with  no  delay if they were referencing data in
different 256K sections of the file.
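
   The per-OFN granularity of the write token can be illustrated
with a little arithmetic.  The sketch below assumes that OFN
sections are aligned on multiples of 256K words from the start of
the file and that positions are given as word offsets.  The
function names are hypothetical and are not monitor routines.

```python
WORDS_PER_PAGE = 512                 # one TOPS-20 page
WORDS_PER_OFN = 256 * 1024           # 256K words of the file per OFN
PAGES_PER_OFN = WORDS_PER_OFN // WORDS_PER_PAGE


def ofn_section(word_offset):
    """Return the index of the 256K-word section that holds word_offset."""
    return word_offset // WORDS_PER_OFN


def writes_contend(offset_a, offset_b):
    """True if writers at these offsets need the same OFN write token."""
    return ofn_section(offset_a) == ofn_section(offset_b)
```

   Under these assumptions, writers at word offsets 10 and 200000
fall in the same section and must pass the token back and forth,
while writers at offsets 10 and 300000 fall in different sections
and can proceed without delaying each other.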

   A structure must be "mounted" on any processor  which  is
to  access  files  on  it.   To be physically removed from a
drive, a structure must be  dismounted  by  all  processors.
The relevant Galaxy components should be modified to provide
mount information from processors  other  than  the  one  on
which  they  are running, but this is not planned for FCS of
CFS.  Hence an operator will have to query the  OPR  program
on  each processor to find out what users have the structure
mounted.  Each processor will  know,  however,  which  other
processors  have  the structure mounted so that the operator
can quickly determine if the structure can be removed.

   Finally,  it  is  not  possible  for  two  or  more   CFS
processors  to  establish an ENQ resource for the same file.
This restriction of ENQ is made to prevent malfunctioning of
programs  that  rely  on ENQ as a file semaphore and will be
removed once the LCS-wide ENQ/DEQ facility is provided.



4.0  COMPATIBILITY

4.1  DEC Products

   All program  and  user  interfaces  are  compatible  with
previous versions of TOPS20.

   Mountable disk structures are  compatible  with  previous
versions of TOPS20.



4.2  DEC Standards

   The CFS will use the corporate SCA protocol on the CI bus
and will use a private SYSAP-level protocol.

   The CFS will not use DECNET.



4.3  External Standards

   None applicable.



5.0  EXTERNAL INTERACTIONS AND IMPACT

5.1  Users

   All users of the disk file system are potential users  of
CFS;  however most users will not be aware of or affected by
CFS.  Some applications developers will rely on CFS to allow
applications to exist on multiple processors and communicate
through files.



5.2  Products That Use This Product

   The following may use CFS:  RMS, DIF, language OTS's.



5.3  Products That This Product Uses

   The following hardware components are required:

        KLIPA (CI20) - Interface between KL10 and CI bus.

        KL10 Microcode -  modifications  to  support  "write
     access in CST".

   The following software modules are required:

        KLIPA driver

        Systems Communications Services (SCA/SCS).

   The following are optional:

        MSCP driver

        MSCP server



5.4  Other Systems (Networks)

   The CFS is not visible to other network hosts;  the files
in the CFS disk structures may be accessible by remote nodes
as provided by  other  facilities  (DAP,  NFT,  etc.).  Each
processor  in a CFS configuration is a separate network node
with its own node name.

   The CFS itself does not use node names to reference files
and  hence is independent of any constraints or requirements
of network node naming.



5.5  System Date And Time

   The CFS systems guarantee that they all use the same date
and  time.   This  requirement insures that files written on
one of the processors will have a  creation  date  and  time
consistent with the other CFS processors.  If the processors
were allowed to have different date and time values, many of
the file-oriented utilities would malfunction.

   This is accomplished by having the  systems  inform  each
other  whenever the local date and time is changed.  Also, a
newly loaded system will use the date and time  provided  by
the  other  CFS  systems.  This last item implies that a CFS
system loaded while at least one other CFS system is already
running  will  not  have to prompt the operator for the date
and time.  In order to make the  start-up  dialog  seem  the
same, the system will type:


               The date and time is:  xxxxxxxxxx


where it now prompts for  the  date  and  time.   This  also
serves as a check on the date and time.
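
   The two rules above (propagate every local change, and adopt
the configuration's date and time when loading) can be sketched
as follows.  The class, its method names, and the sample
timestamp are hypothetical.  The real exchange takes place inside
the monitor over the CI and is not shown here.

```python
class CfsSystem:
    """Toy model of one CFS system's date-and-time handling."""

    def __init__(self, name):
        self.name = name
        self.date_time = None
        self.peers = []

    def start_up(self, running_systems, prompt_operator):
        """Join the configuration and return the start-up dialog line."""
        self.peers = list(running_systems)
        for peer in self.peers:
            peer.peers.append(self)
        if self.peers:
            # Another CFS system is already running: adopt its date and time.
            self.date_time = self.peers[0].date_time
            return "The date and time is:  %s" % self.date_time
        # First system up: the operator must supply the date and time.
        self.date_time = prompt_operator()
        return "(operator prompted)"

    def set_date_time(self, value):
        """Change the local date and time and inform every other system."""
        self.date_time = value
        for peer in self.peers:
            peer.date_time = value


a = CfsSystem("A")
first_line = a.start_up([], lambda: "21-Oct-85 10:00:00")
b = CfsSystem("B")
second_line = b.start_up([a], lambda: None)
b.set_date_time("21-Oct-85 10:05:00")
```

   In this model system B never prompts the operator, and a
change made on B is visible on A immediately.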



5.6  Job Numbers

   The CFS systems must use a mutually exclusive set of  job
numbers.   This  is  because many system utilities, and user
programs, include the job number in the  name  of  "session"
files  and  other per job data to avoid conflicts among jobs
using the same file directories.   CFS  systems,  therefore,
will  acquire  a  set  of global job numbers to use and will
insure that no other CFS system uses  those  numbers.   This
implies  that  the  user-visible  job  number,  as seen in a
SYSTAT command, may not correspond to the monitor's internal
representation for that job.  However all JSYSes that either
provide or accept a job number will be modified  to  account
for the new global job numbers.
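
   One way to picture the distinction between internal and
global job numbers is a translation step at the JSYS boundary.
The sketch below assumes each system owns one contiguous block of
global numbers.  That allocation scheme, the block size of 128,
and all names are assumptions made for illustration.  This
section does not say how the sets are actually chosen.

```python
class GlobalJobMap:
    """Maps a system's internal job numbers onto its own block of global numbers."""

    def __init__(self, base, size):
        self.base = base      # first global job number owned by this system
        self.size = size      # how many global numbers this system owns

    def to_global(self, local_job):
        """Translate an internal job number to the user-visible global one."""
        if not 0 <= local_job < self.size:
            raise ValueError("local job number out of range")
        return self.base + local_job

    def to_local(self, global_job):
        """Translate a global job number back to this system's internal one."""
        local = global_job - self.base
        if not 0 <= local < self.size:
            raise ValueError("global job number not owned by this system")
        return local


def blocks_disjoint(a, b):
    """True if two systems' global job number blocks do not overlap."""
    return a.base + a.size <= b.base or b.base + b.size <= a.base


system_a = GlobalJobMap(base=0, size=128)
system_b = GlobalJobMap(base=128, size=128)
```

   With these hypothetical blocks, internal job 5 on the second
system would appear to users as job 133, and no job on the first
system could ever carry that number.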



5.7  Interprocessor Communications

   CFS  provides  only  sharing  of  files.   Without   some
ancillary capability, such as DECnet, processes on different
CFS processors have no way  of exchanging "events"  such  as
interrupts.   Processes  on the same processor have a choice
of several IPC mechanisms, such as

     1.  DECnet

     2.  IPCF

     3.  ENQ/DEQ

     4.  THIBR/TWAKE

All of these provide inter-process events (viz.  interrupts)
and  may  also  "carry"  some  amount of data (e.g.  an IPCF
message).  However, CFS provides a data carrying  mechanism,
namely  shared  files,  but  it  provides no intrinsic event
generator.

   CI/DECnet is the ideal mechanism for a CFS IPC.  However,
CI/DECnet  may  not  be  available with the first release of
CFS.  Therefore, there will be no reliable IPC  for  use  by
"distributed" applications.

   It is possible, however, to implement THIBR/TWAKE  across
CFS  processors.   This  is  true because CFS will guarantee
that the job numbers used  by  the  various  processors  are
mutually exclusive of one another.

   Presently, there is no commitment to provide  a  CFS-wide
TWAKE  (THIBR  needs  no  changes), but the work required is
modest.



5.8  Data Storage, File/Data Formats, And Retrieval

   The CFS requires an open file data base which is resident
in each processor of a configuration.  2-4 words per OFN are
required.  Other resident storage requirements are one  page
(512  words)  or  less.   As  a  side effect of allowing all
processors access to  all  mounted  structures,  it  may  be
desirable to build standard monitors with a larger number of
mountable structures than at present.

   The  file  structure  will  be  identical  with  previous
releases of TOPS20.

   Files may be  saved  and  restored  with  DUMPER  without
regard  to  which  processor  DUMPER  is run on, except that
DUMPER must be running on the  processor  which  has  direct
connection to the required tape drive.



5.9  File And Data Location

   CFS is unaware of the physical  location  of  file  data.
That is, a shared file may be located on a CI disk, a shared
Massbus disk or on a disk accessed by the MSCP server.

   This latter case, that of the MSCP server, should be used
only  when  absolutely required.  That is, if the file could
be located on a CI disk or a shared Massbus disk, it  should
be.   Files  accessed  through  the  MSCP  server  impose  a
significant burden on the processor running the server,  and
if  such  files are accessed frequently, the result may well
be unacceptable.  Clearly, files that  must  reside  on  PS:
structures  and  must  also  be  shared  may  be shared only
through an MSCP server.  However, such files should  not  be
frequently accessed by other processors.

   It is, for example, entirely inappropriate to  place  all
of  the  SYS:   files for all of the CFS processors on disks
that must be accessed by the MSCP server.



5.10  Protocols

   CFS will use the corporate SCA protocol on the CI bus.

   CFS will use a  private  protocol  for  control  of  file
openings,  structure  mounts,  file  state transitions, etc.
There is no present corporate protocol which supports  these
functions.

   The CFS protocol uses only  SCA  messages.   The  general
format of a CFS message is:

DEFSTR  CFUNQ,SCALEN,35,18      ;NUMBER OF THIS VOTE OR REQ UNIQUE CODE
DEFSTR  CFCOD,SCALEN,17,6       ;OPCODE FOR VOTING
        .CFVOT==1               ;VOTER
        .CFREP==2               ;REPLY TO VOTE
        .CFRFR==3               ;RESOURCE FREED
        .CFCEZ==4               ;SEIZE RESOURCE
        .CFBOW==5               ;Broadcast OFN change
        .CFBEF==6               ;Broadcast EOF
DEFSTR  CFFLG,SCALEN,11,12      ;Flags
 DEFSTR CFODA,SCALEN,0,1        ;Opt data present
 DEFSTR CFVUC,SCALEN,1,1        ;Vote to include HSHCOD
CFROT==SCALEN+1                 ;ROOT CODE FOR THIS VOTE
CFQAL==SCALEN+2                 ;QUALIFIER CODE FOR THIS VOTE
CFTYP==SCALEN+3                 ;Vote reply or request type
CFDAT==SCALEN+4                 ;Optional data, if present
CFDT1==SCALEN+5                 ; second word of optional data
CFDST0==CFDT1+1                 ;STR free count in bit table
CFDST1==CFDST0+1                ;Transaction count of CFDST0

   This format is used to both request CFS resources and  to
reply to resource requests.
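
   The layout of the first message word can be made concrete by
packing and unpacking it.  The sketch below reads each DEFSTR as
(name, location, rightmost bit position, field size) with PDP-10
bit numbering, in which bit 0 is the high-order bit of the 36-bit
word and bit 35 the low-order bit.  That reading, and the helper
names, are assumptions of this sketch.  The field names and
opcode values are taken from the definitions above.

```python
MASK36 = (1 << 36) - 1


def _field(pos, size):
    """Convert a DEFSTR (rightmost bit, size) pair to (shift, size)."""
    return (35 - pos, size)


CFUNQ = _field(35, 18)   # vote number or request unique code
CFCOD = _field(17, 6)    # opcode
CFFLG = _field(11, 12)   # flags
CFODA = _field(0, 1)     # optional data present (within the flags field)
CFVUC = _field(1, 1)     # vote to include HSHCOD (within the flags field)

CFVOT, CFREP, CFRFR, CFCEZ, CFBOW, CFBEF = 1, 2, 3, 4, 5, 6


def load(word, field):
    """Extract a field from a 36-bit word."""
    shift, size = field
    return (word >> shift) & ((1 << size) - 1)


def store(word, field, value):
    """Return word with the field replaced by value."""
    shift, size = field
    mask = ((1 << size) - 1) << shift
    return ((word & ~mask) | ((value << shift) & mask)) & MASK36


# Build the header word of a vote request that carries optional data.
header = 0
header = store(header, CFUNQ, 0o1234)
header = store(header, CFCOD, CFVOT)
header = store(header, CFODA, 1)
```

   Under this reading CFUNQ occupies the right half of the word,
CFCOD the six bits to its left, and the flags the top twelve
bits, so the three fields together fill the word exactly.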

   The  SYSAP  name  for  CFS  is:   LCS20$CFS.   This  name
uniquely  identifies the TOPS-20 CFS SYSAP for a homogeneous
CI environment.  Since there is no central registry of SYSAP
names,  configuring  a  CI  with other processor types (e.g.
VAX) may result in confusion of names and protocols.



5.11  Protocol Operation

   The CFS protocol is a "veto"  protocol.   That  is,  each
request  must be approved by all of the CFS processors or it
is disallowed.  Therefore, a single dissenting processor  is
sufficient to refuse a request.

   Each processor is required to remember only the resources
it  owns.  Therefore, when it "votes", it expresses only the
relationship  of  the  request   to   its   own   resources.
Consequently, each processor must be polled every time a CFS
resource change is to occur.
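
   The voting rule can be sketched in a few lines.  Each
processor consults only its own table of owned resources, and a
request succeeds only if no processor objects.  The class and
function names are hypothetical, and the conflict rule used here
(shared access is compatible only with shared access) is an
assumption chosen to keep the example small.  The real vote
depends on the resource type.

```python
class Processor:
    """Toy CFS processor that remembers only the resources it owns."""

    def __init__(self, name):
        self.name = name
        self.owned = {}   # resource name -> "shared" or "exclusive"

    def vote(self, resource, access):
        """Return True to approve a request, False to veto it."""
        held = self.owned.get(resource)
        if held is None:
            return True   # no local interest in this resource
        return held == "shared" and access == "shared"


def request(requester, others, resource, access):
    """Poll every other processor; a single 'no' refuses the request."""
    if all(p.vote(resource, access) for p in others):
        requester.owned[resource] = access
        return True
    return False


a, b, c = Processor("A"), Processor("B"), Processor("C")
r1 = request(a, [b, c], "X", "exclusive")   # nobody holds X
r2 = request(b, [a, c], "X", "shared")      # A holds X exclusively
r3 = request(c, [a, b], "Y", "shared")      # nobody holds Y
r4 = request(b, [a, c], "Y", "shared")      # C holds Y shared
r5 = request(a, [b, c], "Y", "exclusive")   # B and C hold Y shared
```

   Note that no processor keeps a global table.  The requester
learns the state of a resource only by polling, which is why
every resource change requires a poll of all the processors.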



5.12  Modifications To MSTR

   The MSTR JSYS has been modified to allow structures to be
declared  to be "shared" or "exclusive".  A shared structure
may be mounted by other CFS processors, whereas an exclusive
structure may be mounted only on this processor.

   The  structure  status  bit,  MS%EXL,  declares  that   a
structure  is to be mounted exclusively and is returned with
the appropriate value with the structure status.

   Also, there is a new MSTR function, .MSCSM, that  changes
the  shared/exclusive attribute of a mounted structure.  The
calling sequence is:
                        MSTR

AC1: -2,,.MSCSM
AC2: ADDR

ADDR:   device designator
ADDR+1: new attribute




5.13  CFS Components

   CFS  is  implemented  throughout  the  TOPS-20   monitor.
However,  the code specific to the CFS protocol is contained
in the module CFSSRV.  CFSSRV is the CFS SYSAP as well as  a
collection  of  routines  to  interface  to  the preexisting
TOPS-20 services.  CFSSRV uses the following SCA call backs:

     1.  .SSMGR message received

     2.  .SSPBC port broke connection

     3.  .SSCTL connect to listen

     4.  .SSCRA connect response available

     5.  .SSMSC message/datagram send complete

     6.  .SSNCO node on-line

     7.  .SSNWO node off-line

     8.  .SSOSD OK to send data

     9.  .SSRID remote initiated disconnect

    10.  .SSCIA credit available


   In addition, CFS uses the following SCAMPI routines:

     1.  SC.SOA

     2.  SC.RCD

     3.  SC.CON

     4.  SC.DIS

     5.  SC.SMG

     6.  SC.RMG

     7.  SC.LIS

     8.  SC.REJ

     9.  SC.ACC


   SCAMPI must reliably inform CFS of any  CI  configuration
changes,  including newly established or failed port-to-port
VCs.

   The remaining CFS  code  is  found  in-line  as  part  of
existing TOPS-20 file system services.

   CFS uses only SCA messages.



5.14  Significant Data Structures

   The  advent  of  CFS  creates  the  following  new   data
structures and conventions:

     1.  A new per-OFN word, SPTO2

     2.  directory locks are now CFS resource blocks

     3.  directory allocation entries are now  CFS  resource
         blocks

     4.  frozen write file openings create two resources

     5.  other file openings create only one resource

     6.  each OFN has a CFS "access token" as a CFS resource

     7.  each mounted structure creates two CFS resources

     8.  BAT block locks are CFS resource blocks


   Note that in some cases the  CFS  resource  replaces  the
existing lock, viz.  directory locks, and in other cases the
CFS resource exists as a "copy"  of  the  information,  viz.
structure  mounts,  so  that  the  CFS  protocol service can
manage the resource locally.   In  principle,  there  is  no
difference  between  these  kinds of resources, and only the
higher-level monitor code  that  creates  the  CFS  resource
knows which kind each is.

   The bundled CFS, that is the release  6  monitor  without
CFS  support, still uses CFS to manage the "changed" monitor
resources.   However,  in  many  cases,  as  with  the  file
resources,  the  CFS  resource  is  not created as it is not
needed for any internal monitor coordination.



5.15  Interfaces To CFSSRV

   CFSSRV  contains  a  number  of  jacket   routines   that
interface  between  the  TOPS-20  file system and the CFSSRV
resource manager.  The significant interface routines are:

                        CFSAWT/CFSAWP

        T1/ OFN
        T2/ access needed

Returns:        +1 always

Called to manage the access token

                        CFSLDR/CFSRDR

        T1/ Structure number
        T2/ directory number

Returns:        +1 always

Lock/unlock directory

                        CFSSMT

        T1/ Structure number
        T2/ access needed

Returns:        +1 failed. Access invalid
                +2 success

Mount structure

                        CFSSDM

        T1/ Structure number

Returns:        +1 always

Dismount structure

                        CFSSUG

        T1/ Structure number
        T2/ access

Returns:        +1 can't change access
                +2 success

Change structure access

                        CFSGFA

                T1/ Structure number
                T2/ XB address
                T3/ Access type

Returns:        +1 access conflicts with other system(s)
                +2 success

Acquire file open locks

                        CFSFFL

                T1/ Structure number
                T2/ XB address

Returns:        +1 always

Delete file open resources

                        CFSFWL

                T1/ Structure number
                T2/ XB address

Returns:        +1 always

Free frozen write resource

                        CFSGWL

                T1/ Structure number
                T2/ XB address

Returns:        +1 conflict with other CFS system
                +2 success

Acquire frozen writer resource


   As these jacket routines are really an integral  part  of
the file system, the interfaces to these routines are really
internal  file   system   conventions   and   not   external
interfaces.   Therefore,  the detail of how these interfaces
work is beyond the scope of this functional document.



5.16  PHYSIO Services Required

   CFSSRV requires a  routine  in  PHYSIO  to  request  that
dual-ported  disks  not  be accessed by this processor.  The
call is:

   CALL PHYMPR

   Returns:     +1

   Also, it requires a  routine  to  cancel  the  action  of
PHYMPR:

   CALL PHYUPR

   Returns:     +1

   In addition to these, CFS requires that  PHYSIO  and  its
lower  level drivers correctly support access to dual-ported
Massbus disks.  In particular, work  must  be  completed  in
managing  dual-ported disks and in insuring that the port is
released at the proper times.



6.0  RELIABILITY/AVAILABILITY/SERVICEABILITY (RAS)

6.1  Failures Within The Product

   Failures  within  the  CFS-specific  software  will  most
likely  cause  a crash of one processor in a multi-processor
environment.  Such failures may  include  loss  of  recently
modified file data.  Failures which affect inactive files or
file  directories  are  possible,  but  should  be  no  more
frequent than at present.



6.2  Failure Of A CFS Processor

   Operation of CFS should permit crash of one processor for
any   reason   without  loss  of  other  processors  in  the
configuration.  CFS relies on SCAMPI to detect  a  processor
failure  and  consequently the CFS protocol has no mechanism
for  idle  polling.   If  a  processor  fails,   the   other
processors  will  be unaffected, except that the CFS code on
each of the  surviving  processors  must  "renegotiate"  any
outstanding requests for file accesses.

   A processor may be brought  on  line  without  restarting
other processors in the configuration.

   Any disks which are available only via a failed processor
will   be   unavailable   so   long  as  that  processor  is
inoperative.  If such disks are dual-ported to  a  different
processor, they may be mounted via that processor and remain
in use although all open files must be re-opened.

   With HSC50, most disk errors will  not  be  seen  by  the
processor(s).   All  recovery and logging will be handled by
the HSC50.   Any  disk  errors  that  are  reported  to  the
processor  will  be logged in the system error file for that
processor.  Disk errors occurring on pages  that  are  being
"passed  through"  a  processor  (e.g.   a  KL10 servicing a
request for a Massbus disk) will be logged on the  processor
to  which the disk is directly connected.  If a hard failure
occurs such  that  the  server  processor  must  inform  the
requesting   processor   that   the  request  could  not  be
completed, then the requesting processor will also  log  the
failure.



6.3  CI Failures

   Should the CI fail, or should a processor's  KLIPA  fail,
the  CFS processors must ensure that data on shared disks is
not corrupted.  This is accomplished as follows.

   If a processor detects it is no longer connected  to  the
CI  it  must refrain from referencing any sharable disks.  A
sharable disk is  any  HSC-based  disk  or  any  dual-ported
MASSBUS  disk.  A processor is considered no longer attached
to the CI if it cannot send a "loopback" message to itself.

   Should a processor's KLIPA fail, and  then  be  restarted
(e.g.   by  reloading the microcode), the processor will not
be  able  to  continue  running  if  there  are  other   CFS
processors  on  the CI.  This prohibition avoids the problem
of the system rejoining the CFS network  having  stale  data
about  the  CFS  resources,  or having data about a previous
incarnation of the network.

   Should a processor be "cut off" from the CI indefinitely,
it  may  continue  running  but without being able to access
sharable disks.
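
   The access rule of this section can be sketched in  Python  as
follows.   This is only an illustration of the rule as stated:  the
dictionary fields ("bus", "dual_ported") and both  function  names
are assumptions of the sketch and do not correspond to any monitor
symbols.

```python
def is_sharable(disk: dict) -> bool:
    """A sharable disk is any HSC-based disk or any dual-ported MASSBUS disk."""
    return disk["bus"] == "HSC" or (disk["bus"] == "MASSBUS" and disk["dual_ported"])


def may_reference(disk: dict, loopback_ok: bool) -> bool:
    """When the loopback message to ourselves fails, we are off the CI and
    only non-sharable disks may still be referenced."""
    return loopback_ok or not is_sharable(disk)


# Hypothetical disks used only to exercise the rule.
hsc_disk = {"bus": "HSC", "dual_ported": False}
shared_rp06 = {"bus": "MASSBUS", "dual_ported": True}
private_rp06 = {"bus": "MASSBUS", "dual_ported": False}
```

   A processor that is cut off from the  CI  keeps  access  to
private_rp06 in this model but not to the other two disks.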



6.4  Testing For Errors

   CFS will be run in the DVT environment so that it may  be
evaluated with regard to faults.

   Many of the "normal" CFS errors  may  be  tested  without
explicit  fault  insertion.  The following simple procedures
test much of the CFS error recovery code:

     1.  halt one of the CFS processors

     2.  bring up a new CFS processor

     3.  reload one of the KLIPAs



7.0  PACKAGING AND SYSTEM GENERATION

7.1  Distribution Media

   CFS is an unbundled product.  Each monitor has  the  bulk
of  the  CFS  support,  but  non-CFS  monitors  have a dummy
version of the  CFSSRV  protocol  module.   CFS  sites  will
receive a separate tape containing the proper CFSSRV.



7.2  Sysgen Procedures

   Only CFSSRV differs between a bundled and  an  unbundled
monitor.  The SYSFLG switch, CFSSCA, specifies the  type  of
monitor.



7.3  Bundled And Unbundled CFS Systems

   There is no protection  in  the  monitor  for  running  a
bundled  and  an  unbundled monitor on the same CI.  This is
particularly  important  for  F-S  procedures  as  the  KLAD
monitor  may  not be compatible with the system environment.
Running a mixed configuration is potentially catastrophic as
file structures may be destroyed.



8.0  REFERENCES

   1.  Functional Specification for Loosely Coupled  Systems
(LCS) - Fred Engel, 30 April 1980

   2.  LCG CI Port  Architecture  Specification,  11-July-83
(Keenan)

   3.  LCS and the Common File System (Memo) -  Dan  Murphy,
15 Jan 1980

   4.  CFSDOC (memo) - Arnold Miller




                  CFS Global Job Numbering

                  Functional Specification





1.0  INTRODUCTION

     The development of Common File System (CFS) on  TOPS-20
produces  a unique problem, requiring changes to the way job
numbers  on  TOPS-20  are  handled.   Many  applications  on
TOPS-20  systems  assume  that their job number represents a
unique identifier for their job, and use that number in  the
process of synthesizing temporary file names for work space.
TOPS-20/CFS presents the possibility of  two  or  more  jobs
from  different  systems  accessing the same user directory,
and hence those jobs are required to have unique job numbers
throughout the CFS network.



2.0  BACKGROUND

     Applications commonly require temporary files in  which
to  store  interim  data  to  be  read,  written, sorted, or
otherwise manipulated.  Programs such  as  MACRO  and  LINK
commonly  create  temporary files when the volume of data to
be processed (in this example,  a  program  being  assembled
and/or  linked)  exceeds  the  capacity  of  virtual memory.
These applications must give names to their temporary  files
that  make  them  unique  in  their file system environment,
since it is always possible to have multiple copies  of  the
same  application  accessing  the  same user directory.  The
solution to this identity crisis has always been to use  the
local  job  number as part of the text of the temporary file
name itself.

     In a CFS environment,  not  only  is  it  possible  for
multiple  users  on  a  single processor to be accessing the
same user directory, it is also possible for multiple  users
on  separate processors to do so.  Because of this, a method
was developed to provide every job in a CFS environment with
a  job  number  that uniquely identifies that job throughout
the CFS environment.  This, in turn, changes the way TOPS-20
treats job identifiers internally.
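
     The collision described above can be illustrated with  a
short  Python  sketch.   The  file  name pattern and the helper
temp_name are hypothetical;  they only show why  a  job  number
that  is  unique  on  one processor is not sufficient once two
processors share a directory.

```python
def temp_name(job_number: int, stem: str = "LNK") -> str:
    """Build a hypothetical temporary file name that embeds the job number."""
    return f"{job_number:03d}{stem}.TMP"


# Two processors each run a job whose local number is 17, and both jobs
# work in the same user directory.
name_on_system_a = temp_name(17)
name_on_system_b = temp_name(17)
collision_with_local_numbers = name_on_system_a == name_on_system_b

# With job numbers that are unique across the CFS environment, the second
# job holds a different number (529 is an arbitrary illustrative value).
name_with_global_a = temp_name(17)
name_with_global_b = temp_name(529)
collision_with_global_numbers = name_with_global_a == name_with_global_b
```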

     Job numbers on TOPS-20 systems have actually been  used
for  two purposes, which up until Release 6 were combined in
a single identifier - the 'job number'.  The job number  was
both  a  unique identifier for each job, as well as an index
into various internal monitor tables containing  information
about that job.  With TOPS-20 Release 6, these two functions
have been split.  A 'Global Job Number' acts as  the  unique
identifier  for  each  job  on  all  of the systems of a CFS
environment.  A 'Local Job Index'  is  used  by  TOPS-20  to
access  monitor  job  tables for information about any given
job on the local  processor.   Global  Job  Numbers  can  be
translated to and from the local job index for any given job
on the system by  calling  CFS  routines  described  in  the
following section.

     To avoid further conflict of terminology, this document
will  always  refer  to a Global Job Number as simply a 'job
number', and to a Local Job Index as a 'job index'.



3.0  NEW FUNCTIONALITY

     There are four basic functions  required  to  implement
Global   Job  Numbers  in  TOPS-20.   Initialization,  which
includes allocation of 'blocks' of Global Job  Numbers  from
the  CFS  pool;  assignment, which allocates a single Global
Job number to a given local  job  index;   deassignment,  to
deallocate a Global job number of a job that is logging out;
and translation, to provide the monitor with access  to  the
Global job number of a given local index, and vice versa.

     One side effect of these changes has been to invalidate
all  comparisons  of  a  user-specified  job number with the
highest  valid  job  number  on  a  given  system,  a  value
represented  by  the  symbol  NJOBS.   This  symbol does not
represent the highest Global Job Number possible, but merely
the size of the local monitor's job tables.  The translation
routines, then, should also  provide  validation  of  a  job
number  or  job index being translated, and error returns to
signify when invalid data was provided, or when a given  job
number or index simply doesn't exist.



4.0  NEW DATA STRUCTURES

     All of the new data structures for implementing  Global
Job Numbers can be found in STG.  These include:

     1.  JOBGLB - A table of twelve-bit entries  indexed  by
         Global  job  numbers  used to look up the local job
         index for each global job.  This table  resides  in
         the resident monitor.

     2.  GLBJOB - A 12-bit byte pointer that  ALWAYS  points
         to the beginning of JOBGLB.  This is used via ADJBP
         n,GLBJOB, where n contains the global  job  number,
         to look up the local job index.

     3.  JOBMBT - A bit table of MXGLBS bits,  one  bit  for
         each  possible  global job number.  Bits set to one
         indicate available global job slots.  This table is
         in the resident monitor.

     4.  GBLJNO - a new JSB location  used  to  contain  the
         Global  job  number for the current job.  JOBNO (in
         the PSB) contains  the  local  job  index  for  the
         current job.
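
     As a rough model of the two tables above,  the  following
Python  sketch packs 12-bit entries three to a 36-bit word, in
the order a byte pointer would step through them, and keeps the
availability  bits  in  a  single integer.  The value chosen for
MXGLBS and all function names are assumptions of  the  sketch;
Python  integers  and  lists  stand in for monitor storage.  The
bit order chosen for the JOBMBT model is arbitrary.

```python
MXGLBS = 48          # assumed size for this sketch only
BYTES_PER_WORD = 3   # three 12-bit bytes fit in one 36-bit word

# JOBGLB model: 36-bit words holding packed 12-bit local job indices.
jobglb = [0] * ((MXGLBS + BYTES_PER_WORD - 1) // BYTES_PER_WORD)


def _shift(global_job: int) -> int:
    # Byte 0 of each word is the leftmost 12 bits.
    return 12 * (BYTES_PER_WORD - 1 - global_job % BYTES_PER_WORD)


def store_index(global_job: int, local_index: int) -> None:
    """Deposit a local job index into the entry for global_job."""
    word, sh = global_job // BYTES_PER_WORD, _shift(global_job)
    jobglb[word] = (jobglb[word] & ~(0o7777 << sh)) | ((local_index & 0o7777) << sh)


def load_index(global_job: int) -> int:
    """Fetch the local job index stored for global_job."""
    return (jobglb[global_job // BYTES_PER_WORD] >> _shift(global_job)) & 0o7777


# JOBMBT model: one bit per global job number, 1 meaning "available".
ALL_AVAILABLE = (1 << MXGLBS) - 1


def is_available(table: int, global_job: int) -> bool:
    return bool((table >> global_job) & 1)


def claim(table: int, global_job: int) -> int:
    """Return a copy of the bit table with global_job marked in use."""
    return table & ~(1 << global_job)
```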




5.0  ROUTINES AFFECTED

     There are currently several modules  affected  by  this
new   functionality.   CFSSRV  contains  the  following  new
routines to  support  the  four  major  functions  described
above:

     1.  CFGTJB - to negotiate with the  other  CFS  systems
         for  blocks  of  unique job numbers, and initialize
         the Global Job Number data structures.

     2.  JBGET1 - given a local job index, this will  return
         a  Global  Job Number assigned to that index.  This
         routine will only return an error when there are no
         more available global job numbers.

     3.  JBAVAL - the counterpart of JBGET1,  this  routine
         releases  a global job number when a job is logging
         out.

     4.  GL2LCL and LCL2GL  -  to  translate  a  Global  Job
         Number  into  a  local  job  index (GL2LCL), and to
         translate a local index into a  Global  Job  Number
         (LCL2GL).   These  routines will return error codes
         to indicate if the caller has specified an  invalid
         or a non-existent job number or index.
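
     The division of labor among these routines can be modeled
with  a  small  Python class.  This is a sketch of the behavior
described above, not of the CFSSRV code:  the class, its method
names,  the  exception  type  and  the  error  strings  are all
hypothetical, and dictionaries replace the monitor tables.

```python
class JobNumberError(Exception):
    """Raised for an invalid or non-existent job number or index."""


class GlobalJobTable:
    def __init__(self, block, njobs, mxglbs):
        self.free = sorted(block)   # numbers obtained from the CFS pool (CFGTJB's role)
        self.njobs = njobs          # size of the local job tables
        self.mxglbs = mxglbs        # number of possible global job numbers
        self.to_local = {}          # global job number -> local job index
        self.to_global = {}         # local job index -> global job number

    def assign(self, local_index):
        """Role of JBGET1: fails only when no global numbers remain."""
        if not self.free:
            raise JobNumberError("no more global job numbers")
        global_job = self.free.pop(0)
        self.to_local[global_job] = local_index
        self.to_global[local_index] = global_job
        return global_job

    def release(self, global_job):
        """Role of JBAVAL: give the number back when the job logs out."""
        local_index = self.to_local.pop(global_job)
        del self.to_global[local_index]
        self.free.append(global_job)

    def gl2lcl(self, global_job):
        # Range check against the global limit, never against NJOBS.
        if not 0 <= global_job < self.mxglbs:
            raise JobNumberError("invalid job number")
        if global_job not in self.to_local:
            raise JobNumberError("no such job")
        return self.to_local[global_job]

    def lcl2gl(self, local_index):
        if not 0 <= local_index < self.njobs:
            raise JobNumberError("invalid job index")
        if local_index not in self.to_global:
            raise JobNumberError("no such job")
        return self.to_global[local_index]
```

     Note that a global job number larger than the local table
size is perfectly legal in this model, which is why a comparison
against NJOBS no longer validates a user-supplied job number.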


     In addition, the following modules require  changes  to
support  the  new  discrepancy  between  job numbers and job
indices:


     1.  APRSRV

         In BUGH5, Print Global Job number,  not  local  Job
         index (JOBNO)

     2.  DIRECT

         In  DELTS1,  check  temp  file  generation   number
         against  global  job number, instead of local index
         (JOBNO)

     3.  ENQ

         In  ENQOK,  translate  user-specified  Global   Job
         Number to local index

         In VALRQ1, use GBLJNO, instead of JOBNO.

         In ENCF0H, Translate Local Job index to Global  Job
         Number before returning it to the user.

     4.  FORK

         In .SJPRI, don't compare Job number to  NJOBS,  use
         GL2LCL to convert it.

     5.  FUTILI

         In CKJBNO/CKJBLI,  the  argument  is  a  local  job
         index, so make sure the comments say so.

     6.  GTJFN

         In DEFVER, use  Global  job  number  in  GBLJNO  to
         create Temp file version

     7.  IPCF

         In MSEND1, call MSHEAD with GBLJNO instead of JOBNO

         In MUTCRE, Assume all job numbers are  global,  and
         convert  them  to  locals  before attempting to use
         them.  -1 means use GBLJNO and convert it.

         In MUTFOJ, convert local job index  to  Global  Job
         number before returning it to the user.

         In SPLMES, use GBLJNO to  build  message,  but  use
         JOBNO for index into job tables.

         In LOGIMS/LOGOMS, use Global job number  in  RH  of
         message header

         Same thing in LOGOMO.

     8.  JSYSA

         In ACES01, if  user  specifies  another  job,  call
         GL2LCL  to convert it from a global job number to a
         local index

         In CRJOB1, save Global job number of caller, rather
         than local index

         In CRJDSN, call  GL2LCL  to  translate  Global  job
         number to local...

         In  .GACCT,  translate  user-specified  Global  Job
         number to a local index

         In ALOCRS, get caller's job number from GBLJNO, not
         JOBNO

         In  .SETJB,  translate  user-specified  Global  Job
         number to a local index

         In UFNI01, use global job number (GBLJNO) in UHJNO,
         not JOBNO

     9.  MEXEC

         In SYSINE, assign a Global Job number from CFS, and
         save it in GBLJNO

         In  RUNDD3,  Initialize  CFS  Global   Job   Number
         database...

         In LOG2, call JBAVAL to release Global  job  number
         just before HLTJB call.

         In LOGJOB, Print Global job number during logout.

         In  ELOGO,  translate  user-specified  global   job
         number into local index

         In .GJINF, use GBLJNO  to  return  user's  own  job
         number, instead of JOBNO

         In .GETAB, overhaul GTTAB table to  use  new  GTJOB
         routine  to  translate  user-specified  Global  Job
         number into local  job  index.   Make  the  tables'
         'size'  be  the  highest  legal  Global Job number,
         MXGLBS, for range checking instead of NJOBS,  which
         is only the highest index value.

         In  .GETJI,  translate  user-specified  global  job
         number into local index

         In GETJIT  table,  return other job's Global  Job
         number (from JSB)

         In ATACH1, same as .GETJI

    10.  MSTR

         In  MSTJOB,  always  return  local  job  index  for
         Global, convert globals.

    11.  SCHED

         In   .TWAKE,   SKDRTJ,   and    SKDSJC,    convert
         user-supplied job number from Global to local.

    12.  STG

         Make JBWDS be a function of MXGLBS, not  NJOBS,  so
         global job numbers will work.

         A new JSB location, GBLJNO, will contain the Global
         job number for the current job.


1.0 Introduction

This memo describes the changes to TOPS-20 structure handling made
for the CFS-20 project.

A structure is mounted either for exclusive use or for shared use.
A structure mounted for exclusive use may be mounted on exactly
one processor and a structure mounted for shared use may
be mounted on one or more CFS-20 processors.

(Items denoted by a * are still under consideration, but are unlikely to
change very much).

2.0 MSTR changes

The designation of shared or exclusive is available as a new
structure status bit, MS%EXC. If this bit is set in the returned
status word of the .MSSGS function, the structure is mounted on
this processor for exclusive use. If the bit is not set, it is mounted on this,
and perhaps other, processors for shared use.

A structure is declared to be shared or exclusive when it is mounted.
A new flag bit, MS%EXL, has been defined for the .MSMNT function to declare
that the structure is to be mounted for exclusive use. If this bit is not
set, the structure is to be mounted for shared use. A mounted structure
is either mounted shared or exclusive; there is no "promiscuous" or
"unrestricted" mount option.

A structure that is mounted for the use of a single job, MS%XCL, will
implicitly be mounted for the exclusive use of this processor. That
is, setting MS%XCL implies the setting of MS%EXL.
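
The relationship between the two mount flags can be sketched in Python
as follows. The bit values below are placeholders chosen for the sketch
(the real values are defined with the MSTR symbols), and the function
names are hypothetical.

```python
# Placeholder bit assignments, NOT the real MSTR flag values.
MS_XCL = 0b01   # stands in for MS%XCL: structure is for the use of a single job
MS_EXL = 0b10   # stands in for MS%EXL: mount for exclusive use of this processor


def effective_mount_flags(requested: int) -> int:
    """Apply the rule that setting MS%XCL implies setting MS%EXL."""
    if requested & MS_XCL:
        requested |= MS_EXL
    return requested


def mount_type(requested: int) -> str:
    """A structure is always either shared or exclusive; there is no third state."""
    return "exclusive" if effective_mount_flags(requested) & MS_EXL else "shared"
```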

*A new function has been added to MSTR, .MSCSM. This function is used
to change the designation of a mounted structure from exclusive to
shared or from shared to exclusive. The function requires a structure
designator and the desired structure mount type. The specification
is as follows:

		MSTR

T1/	count,,.MSCSM
T2/	address of argument block (E)

E/	structure device designator
E+1/	new mount attribute as follows:

	MS%EXL =>	structure is to be exclusive
	0=>		structure is to be shared

Errors:

MSTRX2 Wheel or Operator privilege required
MSTX17 Status change denied by CFS
MSTX16 Status change not available for non-CFS systems


If the structure is already mounted with the requested attribute, the
JSYS will succeed.
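
The outcomes listed above can be modeled in Python as a sketch. The
function, its parameters, and the order in which the checks are made
are assumptions; only the three error mnemonics and the "already as
requested" rule come from this memo. Treating "other CFS systems have
the structure mounted" as the reason for MSTX17 follows the use MOUNTR
makes of .MSCSM, as described in the GALAXY section of this memo.

```python
def change_mount_type(current, requested, *, privileged, cfs_system, other_users):
    """Return the resulting mount attribute, or raise with the error mnemonic.

    current, requested: "shared" or "exclusive"
    other_users: set of other CFS processors that have the structure mounted
    """
    if not privileged:
        raise PermissionError("MSTRX2")   # Wheel or Operator privilege required
    if not cfs_system:
        raise RuntimeError("MSTX16")      # not available for non-CFS systems
    if requested == current:
        return current                    # already as requested: the call succeeds
    if requested == "exclusive" and other_users:
        raise RuntimeError("MSTX17")      # status change denied by CFS
    return requested
```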

*3.0 GALAXY changes (details to be provided by GALAXY group)

MOUNTR supports two new structure attributes, SHARED and EXCLUSIVE.
MOUNTR.CMD may contain these new attributes. If a structure is not
designated either SHARED or EXCLUSIVE, it will default to SHARED for
CFS-20 systems and EXCLUSIVE for non-CFS-20 systems. This defaulting
is a function of the MSTR JSYS.

Likewise, OPR recognizes these new attributes as well. If one
directs OPR to change the structure mount attribute of a mounted
structure, MOUNTR will use the .MSCSM function of MSTR to request
the change.

MOUNTR will also use .MSCSM to implement the DISMOUNT/REMOVE operation.
When a user requests that a removable structure be removed, MOUNTR
will attempt to set the structure mount attribute for the structure
to EXCLUSIVE. This is so it can determine if any other CFS-20
systems are using the structure. If the request succeeds, the
structure may be removed. If it fails, the operator must first
dismount the structure on the other systems that have it mounted
for SHARED access. This inelegant approach is in lieu of a high-level
operator protocol that would serve to unify the operation of
a CFS-20 installation.

4.0 Miscellaneous changes

4.1 ENQ/DEQ

CFS does not provide for a distributed ENQ/DEQ facility. The current
wisdom says that ENQ/DEQ is an LCS feature and is independent of
CFS. This means that we will develop a distinct SYSAP for ENQ/DEQ
along with its protocol and functional description. However we
will not be able to do this for release 6.

Without a distributed ENQ/DEQ, we are unable to distribute many
commercial and data base applications. However, some of these
applications, when distributed, will appear to run and consequently
jeopardize the integrity of the data base (that is, TOPS-20 does
not prevent someone from attempting to distribute the application).

Applications that rely on the OF%DUD option of the OPENF JSYS
will not be allowed to run on two or more CFS nodes. Only processes
on a single machine will be able to open the file as CFS will
detect a distributed application that uses OF%DUD.

Applications that do not rely on OF%DUD, but still rely on ENQ/DEQ
for coordination, will malfunction. This is because the various
ENQ/DEQ data bases are maintained independently of one another,
and the resources locked on one processor will not preclude
resources locked on another. Therefore, it is possible, for example,
for a process on one processor to have an exclusive ENQ on
a file and a process on a different processor to also have
an exclusive ENQ on this file! Clearly, this is a violation of
the application's intent.

In order to prevent such a malfunction, ENQ/DEQ has been changed
for release 6. If an ENQ is done specifying a JFN as the
locked resource, ENQ will attempt to acquire a CFS file
resource. This CFS resource will be "exclusive" and therefore
will prevent any other CFS processor from performing an ENQ
on the same file, even if the ENQ is compatible with the
original one. DEQ will release the CFS resource.

This "feature" assumes that all ENQs are exclusive ones, clearly
a false assumption. However, it allows a simple, fool-proof
solution to the problem described above. Note that this change
does not affect ENQs done by other processes on the same processor.
These "local" ENQs continue to function as they always have - for better
or for worse.

The only change that a user will notice is that an ENQ might now
fail with a "file is busy" error. The monitor will not enqueue
requests that fail because of a conflict over the CFS resource,
even if the user requested waiting. To do that, or anything
else, would require implementing part or all of the distributed
ENQ/DEQ service, and we've simply not the time or resources
to consider that.
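
The behavior described in this section can be sketched in Python. The
class and method names are hypothetical, and counting local ENQs so
that the CFS resource is released on the last DEQ is an assumption of
the sketch; the memo says only that DEQ releases the resource.

```python
class FileBusy(Exception):
    """Stands in for the "file is busy" failure described above."""


class CfsFileResources:
    def __init__(self):
        self.owner = {}   # file -> processor holding the exclusive CFS resource
        self.count = {}   # file -> local ENQs outstanding on that processor

    def enq(self, processor, file, wait=False):
        holder = self.owner.get(file)
        if holder is not None and holder != processor:
            # The request is never queued, even if the caller asked to wait.
            raise FileBusy(file)
        # ENQs from the owning processor are left to the local ENQ/DEQ rules.
        self.owner[file] = processor
        self.count[file] = self.count.get(file, 0) + 1

    def deq(self, processor, file):
        if self.owner.get(file) != processor:
            return
        self.count[file] -= 1
        if self.count[file] == 0:
            del self.owner[file], self.count[file]
```

In this model a second processor is refused for as long as any ENQ on
the file is outstanding on the first, whether or not the two ENQs
would have been compatible.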




+---------------+
! d i g i t a l !   I N T E R O F F I C E  M E M O R A N D U M
+---------------+

TO:  
                                       DATE:  12-Nov-84

                                       FROM:  Clair Grant
                                              Ron McLean

                                       DEPT:  Large Systems
                                              Software Engineering

                                       LOC:   MRO1-2/L10

                                       EXT:   6877


SUBJ:  TOPS-20 Multi-Access Disk Management Specification

     Most of the important actions described in this specification  involve
accessing  a  disk when CI communication is, or has been, disrupted.  Thus,
while still true for an  HSC-controlled  disk,  most  of  the  details  are
interesting only if you are referring to a dual-ported MASSBUS disk because
if you can't communicate over the CI, you can't  access  an  HSC-controlled
disk anyway.


        1.0     GOALS
        1.1       No Data Corruption
        1.2       5.1 Compatibility
        1.3       Minimal Overhead Writing To The Disk
        2.0     RESTRICTIONS
        2.1       Disk Serial Numbers
        2.2       Configurations
        3.0     DEPENDENCIES ON RSX20F
        3.1       Disk Configuration
        3.2       Stopping The CI Microcode
        4.0     USER INTERFACE
        4.1       CHECKD Program
        4.2       SMON%
        4.3       SETSPD Program
        4.4       BUGHLTs
        4.5       PAR>SHUT
        5.0     DATA STRUCTURES
        5.1       Processor Data Block (PDB)
        5.2       Unit Data Block (UDB)
        5.3       Request-ID Status (RIDSTS)
        5.4       CI Wire Status (WIRSTS)
        6.0     DISK ACCESS LOGIC



1.0  GOALS

     The multi-access project has 3 goals:  1) no data corruption,  2)  5.1
compatibility,  and 3) minimal writing of management data (overhead) to the
disk.  The  second  goal  is  a  bit  unusual  in  that  we  are  providing
compatibility for a feature we didn't support prior to 6.0;  we just didn't
prevent it.  Some customers have come to depend on this and it  is  in  our
best interest to allow them to continue as they have in the past.



1.1  No Data Corruption

     The major goal of this project is to  ensure  data  integrity  on  all
multi-accessed  disks  in the TOPS-20 file system.  This is accomplished by
allowing TOPS-20 to write to such a disk in only 2 cases:  1)  when  it  is
communicating via CFS with the other CPUs in the CI network, or 2) when the
other CPUs are known to be down.



1.2  5.1 Compatibility

     Prior to Release 6.0 TOPS-20 did not  provide  a  facility  to  manage
access  to  multi-accessed  disks.  But, it did not prevent a customer from
porting an RP06 to  2  systems  and  writing  software  to  manage  such  a
configuration.  This was clearly stated as unsupported by TOPS-20.

     In Release 6.0 TOPS-20 manages multi-accessed disks with  CFS,  yet  a
customer may still wish to use whatever local management scheme was used in
the past on certain disks.  TOPS-20  provides  for  this  by  allowing  the
customer  to  declare  disk  drives  and disk packs as "don't-care" access,
meaning they don't-care to have TOPS-20  manage  multi-access  and  TOPS-20
should honor all write requests.


                                   NOTE

               The customer  must  explicitly  request  this
               "don't-care"  designation,  as  described  in
               later sections.





1.3  Minimal Overhead Writing To The Disk

     Some amount of writing to all multi-access disks is required  by  this
management  scheme;   this  will be kept to a minimum.  This is the primary
reason a keep-alive mechanism was rejected, to avoid  constant  writing  to
the disk.



2.0  RESTRICTIONS

2.1  Disk Serial Numbers

     All disks in the TOPS-20 file system must have unique  serial  numbers
in  order  for  TOPS-20  to  operate  properly;   serial numbers are now an
integral part of TOPS-20's disk management and  TOPS-20  outputs  a  BUGCHK
when it discovers a disk without a serial number.

     Any disks without serial numbers must be  fixed.   Since  RP20s  don't
have  serial  numbers,  a CHECKD command will be created to assign a serial
number to an RP20.

     Unfortunately, TOPS-20 can't  tell  the  difference  between  2  disk
drives  with  the  same  serial  number  (which is bad) and a disk which is
accessible via 2 RH20 channels (which is OK).  Therefore, it is up  to  the
system manager to guarantee all disks have unique serial numbers.



2.2  Configurations

     TOPS-20 will not properly manage data in the following configurations:

     1.  It is illegal to have a MASSBUS disk drive ported to 2 CPUs  which
         have the same CI node number on different CI networks.

     2.  It is also illegal to define a disk drive as don't-care to one CPU
         and  do-care (the default) to another CPU when the 2 CPUs can both
         access the disk.




3.0  DEPENDENCIES ON RSX20F

     Work in RSX20F is required by this project;  this  work  will  provide
TOPS-20 with better disk and CI configuration information, allowing TOPS-20
to more accurately manage multi-accessed disks.



3.1  Disk Configuration

     RSX20F must communicate its disk configuration to TOPS-20  by  passing
the  drive  serial  numbers  in a configuration packet.  This helps TOPS-20
determine which disk drives are potentially being accessed  by  other  CPUs
connected  to  the  STAR,  as  opposed  to those whose other port is to the
Console Front End.



3.2  Stopping The CI Microcode

     RSX20F must stop the CI-20 u-code whenever the HALT or ABORT  commands
are  executed  by the PARSER.  Also, HALT.CMD should cause the CI-20 u-code
to halt.  The instruction CONO KLP,400000 will halt the CI u-code.


                                   NOTE

               The CONTINUE command should not  do  anything
               to  the  CI-20.  This means that after a HALT
               (which stops the CI-20) the CI-20 will not be
               restarted  by the CONTINUE.  The halted CI-20
               will  be  restarted  when  detected  by   the
               once-a-second check in PHYKLP.





4.0  USER INTERFACE

4.1  CHECKD Program

     CHECKD needs new commands which allow a user to declare a structure as
"don't-care"  and  "do-care".   These commands will use the MSTR%  function
.MSHOM (modify home block) to set the newly-defined word HOMDCF in the home
block of each disk in the structure.

CHECKD>ENABLE DON'T-CARE structure-name

CHECKD>DISABLE DON'T-CARE structure-name


     CHECKD needs another new command to set the serial number of  an  RP20
disk.

CHECKD>SET DRIVE-SERIAL-NUMBER (FOR RP20)
Enter decimal serial number:




4.2  SMON%

     A new SMON% function .SFDCD  is  needed  to  declare  to  the  running
monitor that a disk drive is "don't-care".  Its arguments are:

AC1/ channel
AC2/ controller
AC3/ unit



4.3  SETSPD Program

     SETSPD needs to have a new  command  which  will  use  the  new  SMON%
function.  This command will be used in n-CONFIG.CMD.

DONTCARE channel controller unit



4.4  BUGHLTs

     The TOPS-20 BUGHLT code will stop the CI-20 u-code.



4.5  PAR>SHUT

     If a SHUT (or PAR>DEP 20=1) causes the KL to halt, then the KLIPA will
also be halted.  If you end up at a breakpoint or nothing at all happens on
the KL, nothing will be done to the KLIPA.



5.0  DATA STRUCTURES

5.1  Processor Data Block (PDB)

     The PDB resides in physical block 3 on a disk and  has  the  following
format:

        -----------------------------------------------
        !     Current Drive Serial Number (word 1)    !
        -----------------------------------------------
        !     Current Drive Serial Number (word 2)    !
        -----------------------------------------------
        !Non-CI CPU Serial Number ,,                  !
        -----------------------------------------------
        ! Node 0 Serial Number  ,,                !A!B!
        -----------------------------------------------
        ! Node 1 Serial Number  ,,                !A!B!
        -----------------------------------------------
        ! Node 2 Serial Number  ,,                !A!B!
        -----------------------------------------------
        ! Node 3 Serial Number  ,,                !A!B!
        -----------------------------------------------
        ! Node 4 Serial Number  ,,                !A!B!
        -----------------------------------------------
        ! Node 5 Serial Number  ,,                !A!B!
        -----------------------------------------------
        ! Node 6 Serial Number  ,,                !A!B!
        -----------------------------------------------
        ! Node 7 Serial Number  ,,                !A!B!
        -----------------------------------------------
        ! Node 8 Serial Number ,,                 !A!B!
        -----------------------------------------------
        ! Node 9 Serial Number ,,                 !A!B!
        -----------------------------------------------
        ! Node 10 Serial Number ,,                !A!B!
        -----------------------------------------------
        ! Node 11 Serial Number ,,                !A!B!
        -----------------------------------------------
        ! Node 12 Serial Number ,,                !A!B!
        -----------------------------------------------
        ! Node 13 Serial Number ,,                !A!B!
        -----------------------------------------------
        ! Node 14 Serial Number ,,                !A!B!
        -----------------------------------------------
        ! Node 15 Serial Number ,,                !A!B!
        -----------------------------------------------


1B34 - this node's wire A to the STAR is good

1B35 - this node's wire B to the STAR is good
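
     The layout above can be illustrated with a short Python sketch  that
packs  and  unpacks  one  node entry as a 36-bit word.  The function names
are hypothetical.  The word offsets are read off the diagram (two words of
drive  serial  number,  one  non-CI CPU word, then nodes 0 through 15) and
assume the first word of the PDB is word 0.

```python
WIRE_A_GOOD = 1 << 1   # 1B34: bit 34 of a 36-bit word (bit 0 is the leftmost bit)
WIRE_B_GOOD = 1 << 0   # 1B35: the rightmost bit

FIRST_NODE_WORD = 3    # after 2 drive serial number words and the non-CI CPU word
PDB_WORDS = FIRST_NODE_WORD + 16


def node_word_offset(node: int) -> int:
    """Offset within the PDB of the entry for a CI node number 0..15."""
    if not 0 <= node <= 15:
        raise ValueError("CI node number out of range")
    return FIRST_NODE_WORD + node


def pack_node_entry(cpu_serial: int, wire_a_good: bool, wire_b_good: bool) -> int:
    """CPU serial number in the left half, wire status bits at the right end."""
    word = (cpu_serial & 0o777777) << 18
    if wire_a_good:
        word |= WIRE_A_GOOD
    if wire_b_good:
        word |= WIRE_B_GOOD
    return word


def unpack_node_entry(word: int):
    return word >> 18, bool(word & WIRE_A_GOOD), bool(word & WIRE_B_GOOD)
```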


     When a disk unit is discovered, at system start up  or  coming  online
when  the  system  is  running,  PHYSIO  will  make the following series of
checks:

     1.  If the drive serial number in the disk's  PDB  doesn't  match  the
         current  drive's  serial  number  (the  disk  pack has been moved;
         therefore any data in the PDB is invalid), zero the entire PDB and
         then  fill in the drive serial number and our CPU information (CPU
         serial number and CI wire status).  Otherwise, move  to  the  next
         check...

     2.  If the disk is dual-ported,  fill  in  our  CPU  information  (CPU
         serial  number and CI wire status).  Otherwise, the disk is single
         ported to us so eliminate any old data  by  zeroing  the  PDB  and
         filling  in  our  CPU  information  (CPU serial number and CI wire
         status).
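
     The two checks can be expressed as a small Python function.  This is
a  sketch  of  the  decision  only:   the PDB is modeled as a dictionary, and
the function and parameter names are assumptions, not PHYSIO symbols.

```python
def refresh_pdb(pdb, drive_serial, dual_ported, our_node, our_entry):
    """Return the PDB contents to use for a newly discovered disk unit.

    pdb: {"drive_serial": int, "nodes": {node_number: entry}}
    our_entry: our CPU information (CPU serial number and CI wire status)
    """
    if pdb["drive_serial"] != drive_serial:
        # The pack has been moved, so nothing in the old PDB can be trusted.
        nodes = {}
    elif dual_ported:
        # Keep what the other CPU recorded and only refresh our own entry.
        nodes = dict(pdb["nodes"])
    else:
        # Single-ported to us: discard any old data.
        nodes = {}
    nodes[our_node] = our_entry
    return {"drive_serial": drive_serial, "nodes": nodes}
```

     Only the case of a matching drive serial number on a dual-ported disk
preserves the entries written by other processors.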




5.2  Unit Data Block (UDB)

     A new bit U1.FED will be set in the UDB of  a  front-end  disk.   (See
section 3.1).

     A new bit U1.DCU in the UDB indicates the unit (disk drive) as  having
been  declared  don't-care by SETSPD;  another new bit U1.DCD indicates the
disk's home block had the HOMDCF word set by SETSPD.  When a disk is  found
spinning  on a drive, U1.DCU and U1.DCD are set appropriately;  if they are
both set, the disk will be declared don't-care.  However, if one bit is  on
and the other is off, TOPS-20 will treat the disk as do-care.

     The UDB contains a copy of the PDB.



5.3  Request-ID Status (RIDSTS)

     The table RIDSTS (indexed by  CI  node  number)  contains  information
about the current status of each path to each of the other nodes on the CI.
This status is a result of periodically  sending  REQUEST-IDs  to  all  the
other  nodes  on  the  CI,  alternating  paths  on  consecutive sends to an
individual node.  Of interest to  the  disk  service  are  the  bits  which
indicate if the last REQUEST-ID received an answer or there was no response
on that path to the node.  If REQUEST-IDs  are  being  answered  we  assume
there is a TOPS-20 running on the remote system, and if REQUEST-IDs are not
being answered we assume TOPS-20 is not currently  running  on  the  remote
system.



5.4  CI Wire Status (WIRSTS)

     The locations CIWIRA and CIWIRB indicate the  results  of  the  latest
periodic loopback packets to the STAR on the 2 wires;  0 = succeeded, non-0
= failed.



6.0  DISK ACCESS LOGIC

     Upon receiving a transfer request from PAGEM,  PHYSIO  will  make  the
following series of checks:

     1.  If the disk is single-ported, is don't-care, has both ports to us,
         or belongs to 20F, allow access.  Otherwise (the disk is
         dual-ported to another CPU), move to the next check...

     2.  If other nodes have never accessed the disk (their offsets in the
         PDB are 0), allow access.  Otherwise, move to the next check...

     3.  If both of our CI wires are bad, don't allow access.  Otherwise,
         move to the next check...

     4.  If REQUEST-IDs are not being answered by the other nodes, allow
         access.  Otherwise, move to the next check...

     5.  If there are CFS connections to the other nodes, allow access.
         Otherwise, disallow access.
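
     The five checks form a simple decision chain, sketched here.  The Disk
and CI records and the function allow_access are hypothetical names for
illustration.  The sketch also assumes that "the other nodes" means every
node with a nonzero PDB entry, that check 4 passes only when none of them
answers, and that check 5 requires a CFS connection to all of them.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    single_ported: bool = False
    dont_care: bool = False
    both_ports_ours: bool = False
    belongs_to_20f: bool = False
    other_nodes: frozenset = frozenset()    # nodes with nonzero PDB entries


@dataclass
class CI:
    wire_a_good: bool = True
    wire_b_good: bool = True
    answering: frozenset = frozenset()      # nodes answering REQUEST-IDs
    cfs_connected: frozenset = frozenset()  # nodes with a CFS connection


def allow_access(disk, ci):
    """Return True if the transfer may proceed, applying checks 1 to 5 in order."""
    # Check 1: nobody else can be contending for this disk.
    if (disk.single_ported or disk.dont_care
            or disk.both_ports_ours or disk.belongs_to_20f):
        return True
    # Check 2: no other node has ever accessed the disk.
    if not disk.other_nodes:
        return True
    # Check 3: with both wires bad we cannot coordinate with anyone.
    if not (ci.wire_a_good or ci.wire_b_good):
        return False
    # Check 4: none of the other nodes answers, so none is assumed running.
    if not (disk.other_nodes & ci.answering):
        return True
    # Check 5: access is safe only if CFS can coordinate with the others.
    return disk.other_nodes <= ci.cfs_connected
```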

CFS PROCESSING DOCUMENTATION


CFSSRV is a lock manager. The locks it manages represent resources in
the  system,  but  CFSSRV  is  not  aware  of  the mapping of lock to
resource. The mapping, or meaning, is made  by  the  creator  of  the
resource.

Files  are a resource with CFS locks. Each file has the following CFS
locks:

	. open type

	. write access

	. ENQ/DEQ lock

In addition, each of the sections of the file, represented by an OFN,
has an access token. Therefore a file has up to 512 access tokens.

When a file is opened, the "open type" and "write access" locks are
acquired. The "open type" is one of:

	. shared read (frozen)

	. shared read/write (thawed)

	. exclusive (restricted)

	. promiscuous (unrestricted)

The word in parentheses represents the argument to OPENF%.

If the opener requests "frozen write" access, then if the "open type"
lock  is  successfully  locked,  i.e.  no  one has the file open in a
conflicting mode, the "write access" lock is  acquired.  This  is  an
exclusive  lock that represents the single "frozen write" user of the
file. The lock is held by the system that has the file opened "frozen
write".

Each of the locks described above applies to a file, that is, something
described  by an FDB. In addition to these, each file has some number
of OFNs, one for each file section that is in use. Therefore, a  file
may have up to 512 OFNs or file sections.

Each  active  OFN  has  an  "access  token"  lock.  The  access token
represents the ability of the system to access the data described  by
the OFN. The access token may be held in one of the following modes:

	. place-holder

	. read-only

	. exclusive (read or write)

A  read-only  access  token  may  be  held  by  any number of systems
simultaneously. An exclusive token is held  by  only  one  system.  A
"place-holder"  access  token  is  an  artifact  that permits the CFS
systems to agree on the  end-of-file  correctly.  It  also  has  some
ramifications  for  bit  table  access  tokens that will be described
later.  Place-holder  tokens  are  also  an  optimization  to   avoid
reallocating tokens that have been "lost" to another system.

The  file access token is the most fundamental CFS lock in that it is
used not only to control simultaneous access to user files, but  also
to manage directories and bit tables.

The  access  token  state  transition  table is given below, with the
action required to make the designated state change:

\
 \ new		read		exclusive	place-holder
  \
   \
old \
------------------------------------------------------------
read		nothing		vote		DDMP*

exclusive	DDMP**		nothing		DDMP*

place-holder	vote		vote		nothing

Where:

vote	means that the other CFS systems must be asked for permission
	to make the state transition. Voting is a fundamental operation
	of CFSSRV and is done by a software implemented broadcast.

DDMP*	means that DDMP must run and remove all of the OFN's pages
	from memory and update the disk copy of any modified pages.

DDMP**	means that DDMP must run to update to disk any modified
	pages and any in memory pages must be set to "read only".
	This latter operation is performed by clearing the CST write
	bit. The CST write bit has been implemented in KL paging explicitly
	to support loosely-coupled multi-processors.
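
The table can be restated as a lookup, which also makes one pattern
visible: if the modes are ranked place-holder, read, exclusive, every
upgrade requires a vote and every downgrade requires DDMP. This is a
sketch only; the names TRANSITIONS, RANK and action_for are illustrative
and do not appear in the monitor.

```python
# (old state, new state) -> action, copied from the table above.
TRANSITIONS = {
    ("read", "read"): "nothing",
    ("read", "exclusive"): "vote",
    ("read", "place-holder"): "DDMP*",
    ("exclusive", "read"): "DDMP**",
    ("exclusive", "exclusive"): "nothing",
    ("exclusive", "place-holder"): "DDMP*",
    ("place-holder", "read"): "vote",
    ("place-holder", "exclusive"): "vote",
    ("place-holder", "place-holder"): "nothing",
}

# Illustrative ranking of the modes from weakest to strongest.
RANK = {"place-holder": 0, "read": 1, "exclusive": 2}


def action_for(old, new):
    """Return the action needed to move a token from old to new."""
    return TRANSITIONS[(old, new)]
```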

While  DDMP  is performing a CFS-directed operation, all pages of the
OFN are inaccessible to any other process. This is achieved by a bit,
SPTFO, set in SPTO2 by DDMP.

Access permission to a file moves among the CFS  systems  on  demand.
Each system must remember its state of the token so it may respond to
requests for the access permission.

The token consists of:

	. The structure name

	. the OFN disk address

	. a flag bit to indicate this is the access token

	. state

	. end-of-file pointer

	. end-of-file transaction number

	. fairness timer

	. the OFN this token is for

and, if this is a token for a bit table:

	. structure free count

	. structure free count transaction number

The fairness timer is a CFS service that allows a resource to be held
on  a  node  for a guaranteed interval. Therefore, the owner need not
lock the resource and arrange to unlock it later.  Rather  it  simply
places  the  guarantee  interval  in  the  resource block and the CFS
protocol takes care of the rest.

Place-holder tokens exist principally to hold the  values  associated
with the end-of-file pointer and with the structure free count. It is
important that these be held by each system, because the owner of the
OFN  token  may  crash  and  therefore  the last known state of these
quantities must be remembered so that the remaining  nodes  may  have
the  best  possible value for them. The transaction count is intended
to determine whose value is the most recent should the owner  not  be
present  to  contribute  the  current  value.  During  the voting for
acquiring a token, these values are passed among the CFS  nodes,  and
the  node  conducting the vote retains the values associated with the
largest transaction number.
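
A minimal sketch of that selection, assuming each node's contribution
is reduced to a (transaction number, value) pair. The function name
newest and the tie rule (keep the local copy on equal numbers) are
assumptions made for the example.

```python
def newest(local, replies):
    """Pick the (transaction number, value) pair with the largest number.

    local is this node's own pair; replies are the pairs carried in the
    vote replies from the other CFS nodes.
    """
    best = local
    for candidate in replies:
        if candidate[0] > best[0]:
            best = candidate
    return best
```

For example, a node holding end-of-file 1000 at transaction 3 that
receives replies at transactions 2, 7 and 5 keeps the value that came
with transaction 7.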

The  file  access  token  represents  the rights that a system has to
access a file section. That is, the  token  is  associated  with  the
file's contents.

However,  the  owner  of  a  file,  i.e. the system holding exclusive
rights to access the file, also has the right to  modify  the  file's
index  block.  The  owning system may add pages to the file or delete
pages from the file.

OFNs are treated specially in TOPS-20. Unlike the file's data pages,
an OFN may not simply be discarded when the system gives up its
access to the file and then reread from its home on the disk when the
access is reacquired. An active index block, represented by an OFN, contains
paging information that must be retained while the  file  is  opened.
For  this  reason,  a  system needs to be informed if the index block
contents are changed by another system.

This  information is disseminated in CFS by a broadcast message. Each
time a system writes a changed index block to disk, it informs all of
the other  CFS  systems  by  a  broadcast  message.  Note  that  this
broadcasting  is done only when the changed index block is written to
disk, and not each time the index  block  is  modified.  A  broadcast
message  is  used instead of including this in optional data with the
access token for  reasons  explained  in  a  later  section  of  this
document.

When  a  CFS  system receives such a message, it sets a status bit in
the appropriate OFN so that the  next  time  a  process  attempts  to
reference the OFN the following will happen:

	. the disk copy of the index block is examined.

	. for each changed entry, update the local OFN

This  reconciliation  of  the  index  block  with  the  local  OFN is
accomplished by the routine DDXBI.

VOTING IN CFS

When a node needs to "upgrade" its access to  a  resource,  including
acquiring  a  new resource, it must poll each of the other CFS nodes.
This is so because none of the CFS nodes is a  master  and  therefore
there  is  no a priori location for resolving access requests. CFS is
not only a democracy, but somewhat of a cacophony.

Voting, then, requires "broadcasting" to each other node the required
resource and access. Each node must respond with  its  permission  or
denial.

The  CI  does not support broadcast, and even if it did, it would not
support a reliable broadcast. Therefore, CFS implements  broadcasting
by sending a message to each of the other nodes, one-at-a-time.

A vote request contains:

	. function code

	. resource "name" (seventy-two bits)

	. access desired

	. vote number

A reply contains:

	. function code

	. resource name (seventy-two bits)

	. reply (yes, no or "qualified yes")

	. vote number

	. optional data

The  message  contains  a function code because votes and replies are
only one kind of CFS to CFS communication.

The vote number is used to ensure that the reply  is  to  the  proper
request. The requestor may "restart" a vote at any time. It does this
by  "canceling"  the  current  vote, acquiring a new vote number, and
broadcasting the new  request.  A  vote  number  is  a  monotonically
increasing, thirty-six bit quantity.

A vote will be restarted for one of the following reasons:

	. a configuration change is reported by SCA

	. a previous vote "times out".

The  latter  should  rarely  occur,  and  is  likely  indicative of a
malfunctioning CFS on some other system. In some cases, a  node  will
not  reply  if  it  is  unable  to  acquire the appropriate space for
constructing a message. There are a small number of cases where  this
is  legal,  and  for  these  cases,  the  requestor  must revote when
appropriate.

When a reply is received, the vote number must match  the  number  in
the associated resource block.
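
The software broadcast and the vote number check can be sketched as
follows. This is an illustration under stated assumptions, not the
CFSSRV code: ResourceBlock, start_vote and on_reply are hypothetical
names, messages are modeled as dictionaries, the send argument stands
in for whatever delivers a message to one node, and wraparound of the
thirty-six bit vote number is ignored.

```python
import itertools

# Monotonically increasing vote numbers.
_vote_numbers = itertools.count(1)


class ResourceBlock:
    def __init__(self, name):
        self.name = name          # stands in for the seventy-two bit name
        self.vote_number = None   # number of the vote now in progress
        self.replies = {}         # node -> reply for the current vote


def start_vote(block, other_nodes, access, send):
    """Start (or restart) a vote by messaging each node one at a time."""
    block.vote_number = next(_vote_numbers)
    block.replies = {}            # restarting cancels the earlier tallies
    for node in other_nodes:
        send(node, {"function": "vote", "name": block.name,
                    "access": access, "vote_number": block.vote_number})
    return block.vote_number


def on_reply(block, node, reply):
    """Record a reply only if it answers the vote currently in progress."""
    if reply["vote_number"] != block.vote_number:
        return False              # reply to a canceled vote: ignore it
    block.replies[node] = reply["reply"]
    return True
```

Because a restart always takes a new vote number, replies that were in
flight for the canceled vote are discarded without further bookkeeping.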

The replies to a vote are:

	. unconditional yes.

	. no

	. conditional yes.

	. cancel yes condition

A conditional yes means that the respondent will approve the request,
but  it  needs  to  perform a local housekeeping operation first. The
most common form of this is voting for  an  access  token  where  the
respondent  must  first update the disk copy of the file, and perhaps
flush all of its local copies of the file data.  When  the  condition
has been satisfied, a "condition satisfied" reply is sent.

Each resource has a "delay mask". This mask has a bit for each of the
other  CFS nodes, and whenever a node replies with "conditional yes",
its bit is set in the resource's delay  mask.  Therefore,  a  process
that  is  waiting for the conditions to be satisfied, simply examines
the delay mask periodically and waits for all of the delay bits to be
cleared. While any delay bits are set, the vote is considered  to  be
still  in  progress,  and  therefore  any  configuration  change will
require restarting the vote.
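
A small sketch of the delay mask, assuming one bit per CI node number.
The class name DelayMask and its method names are illustrative only.

```python
class DelayMask:
    """One bit per CFS node; a set bit means that node still owes us a
    "condition satisfied" reply."""

    def __init__(self):
        self.bits = 0

    def conditional_yes(self, node):
        self.bits |= 1 << node       # node approved, but must finish work first

    def condition_satisfied(self, node):
        self.bits &= ~(1 << node)    # node has finished its housekeeping

    def vote_in_progress(self):
        return self.bits != 0        # any set bit keeps the vote open
```

A waiting process need only test the mask against zero; it does not
have to remember which nodes it is waiting for.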

Conditional yes votes, and the associated delay mask, are provided to
eliminate the need for nodes to reply "no" when there  are  temporary
conditions  preventing  the  approval  of  the  request. The overhead
required to process such replies, and to wait for them, is offset  by
the gains in not having to revote in the face of such conditions.

CFS provides the following basic voting services:

	. Acquire a resource. If the resource is known on this
	node, but the current state conflicts with the request, the
	currently held resource is released and a vote is taken.

	This service is called specifying either "retry until
	successful", or return after one try.

	. Upgrade a resource. This service tries only once. It
	also guarantees that the currently held resource will not
	be released. In fact, the resource may be held and "locked"
	locally when "upgrade" is requested.

	. Acquire local resource. This is used for resources
	not shared by other CFS nodes, but managed by CFS. Examples
	are directory locks on exclusive structures.


VOTE MECHANISM

A vote is started by the routine VOTEW. Ordinarily, one does not call
this  routine  directly,  but  rather one requests a resource, and if
necessary, VOTEW will be called to conduct a vote.

VOTEW always waits for the vote results. The results are  tallied  at
interrupt  level  by  noting  the  number  of replies received in the
associated resource block. VOTEW periodically examines  the  resource
block testing for:

	. all tallies received

	. a "no" vote recorded

	. a configuration change

The actions taken are as follows:

	. configuration change: restart the vote

	. a "no" vote: return to the caller

	. all tallies received:

	 . if no "conditional yes" votes, return to caller

	 . If one or more "conditional yes" votes, wait for
	   the "condition satisfied" replies. While waiting,
	   a configuration change could occur, requiring the vote
	   to be restarted.
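
One pass of the periodic examination can be sketched as a function of
the resource block contents. The function name votew_poll, its
arguments and the returned strings are hypothetical. The order in
which the tests are applied (configuration change first, then "no",
then the tally) is an assumption; the text lists the tests but does
not fix their priority.

```python
def votew_poll(expected, tallies, saw_no, config_changed, delay_mask):
    """Decide what a waiting voter should do after one look at the block.

    expected       number of nodes that were polled
    tallies        number of replies recorded at interrupt level
    saw_no         True if any reply was "no"
    config_changed True if SCA reported a configuration change
    delay_mask     nonzero while "conditional yes" nodes are still busy
    """
    if config_changed:
        return "restart vote"
    if saw_no:
        return "return no"
    if tallies < expected:
        return "keep waiting"
    if delay_mask:
        return "wait for conditions"
    return "return yes"
```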


RESOURCE ACQUISITION AND UPDATING

CFS  resources  are acquired and changed in response to requests from
other parts of the monitor. Rather than describe each one, it will be
instructive to consider how the file related resources are  acquired,
maintained, and destroyed.

When  a  file  is  opened,  and  the first OFN is created, ASOFN will
create the static CFS resources: open type and, if  appropriate,  the
frozen writer token.

Anytime  an OFN is created, be it in response to opening the file, or
one of the "long file" OFNs, ASOFN will create the access token.

The access token state is verified by various file system and memory
management routines. The most common place for this is in the
page  fault  handler.  The two exceptions to this are for a bit table
access token and a long file "super index block". The bit table token
is acquired and "locked" when  the  bit  table  lock  is  locked  and
released  only  when the bit table is unlocked. The token for a super
index block is occasionally acquired in DISC by the routine  (NEWLFT)
that  creates  new long file index blocks. In theory, these exception
cases need not be exceptions. That is, the code could simply rely  on
the  normal management of the token during page faults to ensure data
integrity. However, in these cases, the code  must  perform  multiple
operations on the file data "atomically". That is, it must modify two
or  more  pages,  or  it  must  "test  and  set"  a location with the
assurance that no other accesses to the data occur between the steps.
On a single system, this is done by a NOSKED  to  prevent  any  other
process from running. In an LCS environment, NOSKED is not sufficient
(although  it  is necessary!). Another form of interlock must be used
to prevent a process on another system from  examining  or  modifying
the  data.  It  turns  out  that the access token satisfies this need
quite well.

The  above  discussion  implies  that the page fault handler, when it
acquires an access token for an OFN, does not "lock" the token on the
system. That is, the token is  acquired  but  not  "held".  This  may
result  in  the  token  being  preempted by another system before the
process is able to reexecute the instruction  that  caused  the  page
fault.  The  "fairness" timer in the token resource is one attempt to
minimize such thrashing.

The access token is acquired on the following conditions:

	. when an OFN is being created

	. when the OFN is locked

	. when a page fault occurs because the current access is
	 not correct

The  current  state of the token is kept in the CFS resource block as
well as in the OFN data base. The field SPTST holds the current state
of an OFN. The values are:

	0 => no access

	.SPSRD => read only

	.SPSWR => read/write

SPTST  is  modified  by the routines in CFSSRV that are called to set
the state of the file. The values are set here,  and  not  in  PAGEM,
PAGFIL  or  PAGUTL  because  the  OFN state must be set while the CFS
resource block is interlocked against change.

The routines to modify the state of an OFN token are:

	. CFSAWT - acquire token but don't hold it

	. CFSAWP - acquire token and hold it

TOKEN MANAGEMENT

Once  a  token  is  "owned" on a system, it will remain in that state
until it is required on another system. That is, if the token is held
for read/write access (exclusive), then all references to  the  pages
of the OFN will succeed without CFSSRV being invoked.

If  a  token  must be revoked because another system needs it, CFSSRV
signals DDMP to process the data pages. This is done by:

	. Setting bits in the field STPSR in the OFN data base.

	. Setting the OFN's bit in the bit mask OFNCFS.

	. Waking up DDMP.

The  field  STPSR is a two-bit quantity indicating the type of access
required by the requesting system. DDMP's action is as follows:

read-only needed:

	Write all modified pages to the disk. Clear all of the CST
	write bits in all in-memory pages.

read/write needed:

	Write all modified pages to disk. Flush all "local" copies
	of data including any copies on the swapping space. Swap out
	the OFN page if it is in memory (actually, simply place it on
	RPLQ).

Once  DDMP  has  performed  the necessary operation, it calls CFSFOD.
This  routine  will  set  the  OFN  state  and  the  resource   state
appropriately as follows:

read-only requested:

	set OFN state to .SPSRD and set resource state to "read".

read/write requested:

	set OFN state to 0 and set resource state to "place-holder".

CFSFOD  also  copies  the current end-of-file information from OFNLEN
into  the  resource  block  and  finally  it  sends  the   "condition
satisfied" message to the requestor.
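
The revocation sequence can be modeled in a few lines. This is a toy
model under stated assumptions, not the monitor's data structures:
Page, OFN and ddmp_revoke are hypothetical names, the writable flag
stands in for the CST write bit, the strings ".SPSRD" and ".SPSWR"
stand in for the symbolic SPTST values, and the DDMP and CFSFOD steps
are folded into one function. The swapping-space copies and the OFN
page itself are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    modified: bool = False
    writable: bool = True          # stands in for the CST write bit


@dataclass
class OFN:
    pages: dict = field(default_factory=dict)  # in-memory pages by page number
    sptst: object = ".SPSWR"       # OFN state
    resource_state: str = "exclusive"
    disk_writes: int = 0           # count of pages written back


def ddmp_revoke(ofn, needed):
    """Give up access so another system can have `needed` access."""
    # DDMP: in both cases, write all modified pages to the disk.
    for page in ofn.pages.values():
        if page.modified:
            ofn.disk_writes += 1
            page.modified = False
    if needed == "read-only":
        for page in ofn.pages.values():
            page.writable = False              # clear the write bits
        ofn.sptst, ofn.resource_state = ".SPSRD", "read"
    elif needed == "read/write":
        ofn.pages.clear()                      # flush all local copies
        ofn.sptst, ofn.resource_state = 0, "place-holder"
    else:
        raise ValueError(needed)
    # CFSFOD: the requestor is told it may proceed.
    return "condition satisfied"
```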

While  DDMP  is performing its work on behalf of CFS, it sets the bit
SPTFO in the OFN data base. This bit is examined by  the  page  fault
handler,  and  by  CFSAWP/CFSAWT to see if the OFN is in a transition
state. If SPTFO is set, and the process  requiring  the  OFN  is  not
DDMP,  then the process is blocked until SPTFO is cleared by DDMP. In
order to facilitate identifying DDMP from all other processes, a  new
word  has been added to the PSB called DDPFRK. If DDPFRK is non-zero,
then the current process is indeed DDMP and SPTFO should be ignored.

UNUSED RESOURCES

Whenever  a  node  replies  "no"  to  a  request, it remembers in the
associated resource block the node(s) that have  been  rejected.  The
only  reason  for  unconditionally  denying  a  request  is  that the
resource is "held" locally. If a resource cannot be  granted  because
of  the  fairness  timer, the "no" response includes an optional data
word of the time the resource is to be held. Therefore, the requestor
knows precisely when to request the resource anew.

When a held resource is "released" (or undeclared), CFS examines  the
rejection  mask  for  the  resource.  For each node identified in the
mask, a "resource released" message is sent indicating that this is a
propitious time to try to acquire the resource. There is no guarantee
the new request will be granted as the resource could be held  again,
or  another node could have requested, and been granted, the resource
first.
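
The rejection bookkeeping can be sketched as follows. HeldResource
and its methods are hypothetical names, the rejection mask is modeled
as a set of node numbers, and the fairness timer case is left out. A
request against a resource that is not held simply answers "yes" here,
which is a simplification.

```python
class HeldResource:
    def __init__(self, name):
        self.name = name
        self.held = True
        self.rejected = set()      # stands in for the rejection mask

    def on_request(self, node):
        """Answer a vote request from another node."""
        if self.held:
            self.rejected.add(node)   # remember whom we turned away
            return "no"
        return "yes"

    def release(self, send):
        """Release the resource and notify every node that was rejected."""
        self.held = False
        for node in sorted(self.rejected):
            send(node, ("resource released", self.name))
        self.rejected.clear()
```

The notification is only a hint: a notified node must still vote, and
may lose to another requestor.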

DELETING FILE RESOURCES

The   access   token  is  deleted  whenever  the  associated  OFN  is
deassigned.

The static file resources are released when the file is closed.  This
is performed in RELOFN.


CHANGES TO EXISTING CONCURRENCY CONTROL SCHEMES

As  a  result  of CFS, much of the concurrency control in TOPS-20 has
become distributed. In some cases, this has been done by  creating  a
companion  resource to an already existing one. An example of this is
the file open mode resource described above.

In other cases, existing locks have been replaced by CFS resources.

The  decision  as  to  which  technique  to  employ  was  made  on  a
case-by-case basis. The significant criterion was how easy it was  to
eliminate  the  existing  concurrency control and replace it with the
CFS management. The file resources proved difficult to  do.  However,
there  are  two important pieces of the monitor's structure that were
easily  and  efficiently  replaced:  directory  locks  and  directory
allocation tables.

Directory locks are now CFS  resources.  A  directory  lock  resource
contains:

	. the seventy-two bit identifier

	. owning fork

	. access type

	. share count

	. waiting fork bit table

In  fact,  a  directory  lock resource is the sole instance of a "CFS
long block".

Directory  locks  are  always  acquired  for  exclusive use. However,
unlike  file  access  tokens,  directory  locks  are  never   granted
"conditionally".  This  is  because  directories  are  files, and the
directory contents are subject to negotiation by the associated  file
access  token. That is, acquiring exclusive use of the directory lock
resource is independent of acquiring permission to read or write  the
directory  contents.  When some process on the owning system attempts
to read or write the directory contents, it must  first  acquire  the
file  access token in the proper state. Although this sounds somewhat
inefficient, i.e. requiring  the  node  to  acquire  two  independent
resources, it is in fact a remarkably efficient adaptation of the CFS
resource  scheme.  This  is  so  because a node need not know how the
directory contents will be used when it acquires the directory  lock.
That  is the way the lock was handled before CFS, and preserving this
convention means that the code to acquire the  directory  lock  under
CFS  is as efficient as possible. The state of the file access token,
and consequently the degree of sharing of the directory contents,  is
determined  by  how  the  contents  are referenced and not by how the
directory is locked. This means that a process may lock the directory
lock without knowing how it will reference the associated  data,  and
its   reference   patterns  determine  what  other  negotiations  are
required.

The directory allocation table is a local "cache" for the information
normally  stored in the directory. Each active OFN is associated with
a  directory  allocation  entry.  Each  entry  is  for  exactly   one
directory.  The  entry,  before  CFS,  contained:  structure  number,
directory number, share count, and remaining allocation.

Under  CFS,  an  active  allocation entry contains: structure number,
directory number, share count, and pointer to the CFS resource block.
The CFS resource block  contains,  besides  the  normal  CFS  control
information,  the  remaining  allocation  for  the  directory  and  a
transaction number. The transaction number serves the same purpose as
the transaction number associated with a file end-of-file pointer.

CFS  may  have  an "unused" resource block for a directory allocation
entry. That is, even though there is no active  directory  allocation
entry,  there may be a CFS resource block representing the directory.
This is because CFS attempts to retain knowledge of resources for  as
long  as possible to avoid having to vote when some process wishes to
create the resource  anew.  However,  CFS  will  destroy  any  unused
resource allocation entry that is requested by another system.

TRANSACTION NUMBER

The  optional  data  items, "end-of-file pointer" and "structure free
space", have an associated value called the "transaction number".

One  uses  either  centralized  or   decentralized   control   in   a
"loosely-coupled  multiprocessor"  system.  In  a centralized system,
control  information  and  updating  are coordinated  by  a   master.
Transactions  are "serialized" by virtue of having a single owner for
the resource and therefore a single manager of the resource data.  In
a  decentralized  system,  the various systems share the ownership of
resources and use some sort of  "concurrency  control"  technique  to
manage resources.

CFS  is a decentralized system. A resource is not owned or managed by
any particular system, but rather the responsibility for the resource
is passed from system to system as required.  As  such,  it  may  not
always  be  possible  to uniquely identify a particular system as the
owner. This may cause a problem when a system  needs  to  become  the
owner,  and  therefore  must  determine  the  current  status  of the
resource in question.

There are two possibilities that a nascent owner may encounter:

	. The previous owner is present and identifiable.

	. There is no system that is the previous owner

and of this latter case:

	 . the existing control information is accurate

	 . the existing control information is not accurate.

Clearly,  if  the previous owner is present, the new owner has all of
the information it needs to proceed with its transaction.

If the previous owner cannot be identified, then the new  owner  must
be  able  to  determine  which of the systems has the current control
information about the resource. It may be that none of them has,  and
this  is a problem that exists even on a single-processor system. The
result of such a problem may be "lost pages", inconsistent data bases
and other such  phenomena.  As  in  a  single-processor  system,  the
problem occurs because the resource control information is lost as an
effect of a system crashing.

In  order  to  determine  the  most  up-to-date  information  about a
resource, each system maintains a transaction count  along  with  the
information.   Whenever   it   acquires  information  with  a  larger
transaction count than its own value, it knows  that  information  is
more  current  and it must replace its own copy with the new data and
count. Whenever a system unilaterally changes its copy of the control
information, it must also increment the associated transaction count.
Since a system may perform such an update only when it has  write  or
exclusive  access  to  the  resource,  the  system  need  change  the
transaction count only when it must downgrade its access.
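
These rules can be sketched for a single piece of control
information. The class ControlInfo and its method names are
illustrative, and the "dirty" flag used to defer the increment until
downgrade is one possible way to realize the last sentence above, not
necessarily the monitor's.

```python
class ControlInfo:
    """A value (for example an end-of-file pointer) with its transaction count."""

    def __init__(self, value=0, count=0):
        self.value = value
        self.count = count
        self.dirty = False   # changed locally since the count last moved

    def receive(self, value, count):
        """Adopt information from another system only if it is newer."""
        if count > self.count:
            self.value, self.count = value, count

    def local_update(self, value):
        """Change the value while holding write or exclusive access."""
        self.value = value
        self.dirty = True    # the count need not change yet

    def downgrade(self):
        """Give up write access; bump the count once if anything changed."""
        if self.dirty:
            self.count += 1
            self.dirty = False
        return self.value, self.count
```

Any number of local updates made under one period of exclusive access
therefore cost a single increment of the count.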

Due to the nature of the CFS voting and resource management, it is
possible for a system to acquire a resource but to receive a
different value for the resource control information from each of the
other systems. This will happen only if the owner crashed; if the
owner didn't crash, then at least two of the other systems must have
the same control information and transaction count. When the values
do differ, the transaction counts are used to identify the most
up-to-date value.

The   transaction   count  is  really  a  "clock"  that  is  used  to
"time-stamp" information. When systems communicate with  one another,
they synchronize the clocks by sending each other the current counts.
Most network concurrency schemes use clocks for similar purposes, and
most  of  the  uses  and implementations are considerably more exotic
than this one. However, since CFS needs the clock only  to  determine
relative ages, and not absolute ages, of information, this simplified
clock is adequate.

An  alternative to using transaction counts is to "broadcast" changes
to resources. This has the disadvantage that it  is  costly  in  both
processor  and  communications  time and resources. However, CFS does
use broadcasting  in  a  few  cases  where  the  lack  of  up-to-date
information could result in data being destroyed. The two cases are:

	. an OFN being modified and written to disk

	. an EOF value being written into the directory copy on the disk

As  both  of  these  represent  changes  in the permanent copy of the
resource, it is essential that all of the other systems have  current
copies or knowledge of the update.

CFS MESSAGE SUMMARY

Items marked with a "*" are sent as broadcast messages.

*1. request resource  (vote)

2. reply to request:

	a. unconditional yes

	b. unconditional no

	c. no with retry time

	d. conditional yes

3. resource available

4. condition satisfied

*5. OFN updated

*6. EOF changed

In addition, each message type may carry specific optional data items,
up to four words of optional data per message.


CFS ERROR RECOVERY PROCESSING


1.0 Introduction

Failures in a CFS-20 network may result in various partitions of
the original homogeneous system. Many of these partitions are
easily detected and harmless even if not properly administered. However,
there is an important class of partitioning that is both
difficult to detect and extremely harmful to the shared data
bases.

2.0 Failures

The CFS network exhibits many potential failure cases. The most
common will be a single processor crashing. Fortunately, this
common case poses no real threat to the integrity of the disks.

A processor is considered to have crashed if its SCA keep-alive
ceases. Once this is determined, SCA will inform all of the
SYSAPs that the circuit to that processor has failed and
the CFS SYSAP will begin the recovery described above.

In general, if a processor crashes, the remaining processors will
simply stop sending it CFS messages. Also, when the crash is
first detected, all outstanding CFS votes are restarted so that
the requesting processor will not have to determine if it has already
recorded the vote of the failed processor. These two measures
are adequate to synchronize the CFS network protocol.

This is all well and good so long as the reason the remote
SCA has ceased to maintain its keep-alive counter is that
the processor has crashed and will be restarted. However,
this same effect will be observed if the remote processor's
link to the CI, the KLIPA, fails.

CI failures are manifested in two important ways: a single processor's
link to the CI fails, and there is some failure on the CI cable.

2.1 Single processor link failure

In this case, the port or link has failed. This processor is unable
to send or receive data over the CI. Other CI processors cannot
send data to this processor.

This failure is detectable by the attached processor in one of several
ways:

	. The local CI port raises a unique error

	. The local CI port no longer responds to local commands
	(e.g. DATAI or CONI).

	. A CI failure can be diagnosed (see 2.2)

2.2 CI failure

In general, the entire CI will not fail. This is because the CI is
made up of many cables, all connected together with a "star coupler".
Therefore, if a single cable is crushed or cut, only the processor
connected to that cable will suffer. Of course, some catastrophe
could befall the star coupler, but this is so remote that we won't
consider it.

Therefore, it is possible to detect this case by sending a message
to oneself. Such a message will be delivered only if the path
from the local port to the star coupler is working. This technique,
called "loopback", is quite a powerful diagnostic and may be used
to detect port failures as well.

2.3 CI failure consequences

In general, a CI failure should result in no harm to any HSC-based disks.
This is so because the failed processor can neither participate in CFS
negotiations nor can it access the HSC.

However, if the failed processor is connected to one or more Massbus
disks, and one or more of these disks is also connected to another
processor, the data on the shared disks is in jeopardy. This is so
because the failed processor and the other one can no longer coordinate
accesses to the shared disk. If one of the processors does not unilaterally
cease accessing the disk, the data and disk structure will be damaged.

While it is possible for a CI or port failure to occur that a processor
cannot detect, this is a remote possibility. Therefore, we will assume
it is possible to decide which processor has been "cut off" from the
CFS network (at least THAT processor can determine it is the one)
and therefore it is possible to negotiate the ownership of the
shared disks.

In summary, the problem of shared Massbus disks connected to a
processor that is "cut off" from the CI is an important CFS failure
that must be detected and properly handled.

3.0 Massbus disks

Massbus disks are of two varieties: DCL and non-DCL disks.

A DCL disk is either an RP04, RP06 or RP07. Each of these disk drives
provides a status bit called "programmable". A drive displaying this
bit has a dual-port option installed, and the port select switch is
in the A/B position. There is no indication whether the other
port is attached to anything or whether the port is in use.

A non-DCL disk is an RP20. An RP20 drive does not provide the
programmable bit. However, the RP20 does have an onboard controller
with its own unit number. By convention, a dual-ported onboard controller
has a non-zero unit number and a single-ported onboard controller
has a unit number of zero. Note that this is an installation guideline
and not a hardware enforced requirement. However, the CFS failure recovery
algorithm depends on its being adhered to.

TOPS-20 maintains a software status bit, available via the MSTR
read status call, called "sharable". A drive that is sharable is
either a DCL drive displaying the programmable bit or an RP20
that has an onboard controller with a non-zero unit number.
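
The "sharable" rule can be written as a short predicate. This is a
sketch in Python, not monitor code; the function name and argument
names are assumptions made for the illustration.

```python
DCL_TYPES = {"RP04", "RP06", "RP07"}

def is_sharable(drive_type: str, programmable: bool = False,
                controller_unit: int = 0) -> bool:
    """Return True if the drive would be marked sharable."""
    if drive_type in DCL_TYPES:
        # DCL drive: sharable only when the programmable bit is displayed
        return programmable
    if drive_type == "RP20":
        # RP20: by installation convention, non-zero unit means dual-ported
        return controller_unit != 0
    return False
```

Note that the RP20 branch is only as reliable as the installation
guideline it depends on.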

4.0 Fail safe procedures

There appear to be two choices for recovering from a CI failure:

	1. The processor that is cut off must refrain from
	using any of the sharable Massbus disks.

	2. The processor that is cut off determines which of the
	sharable disks are in use by another processor, and it
	refrains from using only those disks.

The first choice is simple to implement and entirely safe. However, it
does preclude certain "legitimate" configurations and therefore
does not provide the maximum availability.

The second choice is fraught with complications, although some of
the unsupported configurations from the first choice are available.

Implementing choice 2 is complicated and the benefits derived
are a function of the likelihood of a CI failure. Such failures
should be rare, and therefore the effort to implement choice
2 is inappropriate. Therefore, TOPS-20 release 6 will implement
choice 1 as follows.

3.1 The algorithm

A processor that is "cut off" from the CI, and is either being loaded
or is running when the fault occurs, must refrain from addressing
any sharable disks. The disks that may be referenced are:

	. Any non-sharable disk

	. The disk being used as the front-end file system (see section 3.2)

This will be enforced by means of a new unit status bit (a bit in UDBSTS).
When this bit is set, all references to a sharable
disk unit will be returned by PHYSIO with an error indication. A DCL drive
is sharable only while its port select switch is in the A/B position, so
a unit that is not usable when the fault is detected may be made usable
by switching its port selection switch to either the A only or the B only
position.
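
The rule for choice 1 can be sketched as follows. The Unit record and
the function name are hypothetical; the real check is a status bit
tested by PHYSIO, not a Python predicate.

```python
from dataclasses import dataclass

@dataclass
class Unit:
    name: str
    sharable: bool
    front_end_fs: bool = False  # holds the "in use" front-end file system

def may_reference(unit: Unit, cut_off: bool) -> bool:
    """Choice 1: a cut-off processor avoids every sharable disk except
    the one holding the front-end file system."""
    if not cut_off:
        return True  # normal CFS coordination applies
    return (not unit.sharable) or unit.front_end_fs
```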

The errors reported back by PHYSIO will likely be manifested as
"IO data errors" to the users.

The monitor should also report this to the operator via a unique
BUGCHK so that the operator may decide whether to change
the port selection or not.

3.2 The front-end file system disk pack

As noted in section 3.1, TOPS-20 needs to know which pack, if any,
contains the "in use" front-end file system. The front end's file system
may be:

	. on a floppy

	. on a drive available only to the front-end

	. on a drive shared by the front-end and the KL10

In addition, any valid TOPS-20 structure may contain a front-end file
system, but at most one of these structures may be in use by the
front-end.

It may not be possible for TOPS-20 to determine which of the valid
structures is being used by the front-end. However, certain configuration
measures can eliminate the ambiguities. Following is the algorithm
for making the determinations:

	. TOPS-20 must determine from the -11 reload word the boot
	device for the front-end. This word is provided by RSX20F
	during "protocol initialization" and is saved by DTESRV in
	the "comm region".

	. If the boot device is a disk, the reload word will contain
	the RH11 unit number. This unit number will be the same
	as the RH20/Massbus unit number.

	. At this point we have to rely on configuration guidelines.
	If TOPS-20 can detect only one drive with the specified
	unit number that also contains a valid front-end file system,
	then it may assume that pack is being shared with RSX20F.
	Note that this does not have to be true. The drive being
	used by RSX20F could be connected to only the RH11, and the
	drive that TOPS-20 can "see" could be connected to another
	KL10. However, either we state and require these configuration
	rules, or we have to forego many important error recovery
	procedures.

	One way to obviate some of the ambiguities would be for RSX20F
	to also relate whether the drive it is using for its file
	system is programmable. If it is, then it is reasonable to assume
	that the other port is connected to the KL10 (again, not necessarily
	so, but to not do so makes little or no sense).
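
The determination above can be sketched in Python. The reload word is
assumed to be decoded already into "boot device is a disk" and a unit
number; the tuple layout and the function name are hypothetical.

```python
def find_front_end_pack(boot_is_disk, boot_unit, drives):
    """drives: list of (name, unit_number, has_valid_fe_fs) visible to TOPS-20.

    Returns the name of the drive assumed to be shared with RSX20F, or
    None when the configuration is ambiguous or the boot device is not
    a disk.
    """
    if not boot_is_disk:
        return None
    matches = [name for name, unit, has_fe_fs in drives
               if unit == boot_unit and has_fe_fs]
    # Exactly one candidate is required; otherwise nothing can be assumed.
    return matches[0] if len(matches) == 1 else None
```

Returning None for two or more candidates reflects the reliance on
configuration guidelines: the monitor cannot resolve the ambiguity
by itself.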

One remaining problem is the collection of drives connected to the
RH11 and to the KL10 that are not being used by the front-end. Although it
is perfectly safe for a "cut off" CFS node to continue using any such
disk drive, it cannot tell that the drive is not connected to another
KL10 and therefore cannot use the drive. This implies that a "cut off"
CFS node may not access drives shared with the front-end's RH11 unless the
drive is first locked onto the RH20 port. This limitation will preclude
performing certain FILES-11 maintenance functions together with
routine time-sharing access to these drives.

5.0 Choice 2

This section is a brief outline of the steps required to implement choice
2. This is not meant to be an exhaustive description of the algorithm,
but rather a sketch of the required elements.

5.1 Conflict detection

A processor that is cut off from the CI has only one means of communicating
with the system sharing a given Massbus disk: the disk drive itself.
Therefore, each processor attached to a sharable Massbus disk must
signal its presence now and then so that the other system, when need be,
may detect that presence.

The most straightforward way to do this is for each processor attached
to a sharable disk to guarantee that it will change some register
in the drive periodically. That is, each processor must either
use that register to accomplish a data transfer, or it must
store a canonical value into the register. If we select the
"drive command register" as the "keep-alive" register, then
each processor must either store a drive command (i.e. do a
legal operation) or store its CI drop number in the register (it would
do this without setting the "go" bit so that the drive would not interpret
this datum as a command).

Then, a system that wishes to determine if a processor is attached to
the other port would:

	. write its CI drop number in the drive command register

	. wait 5 seconds.

	. read the drive command register. If it has been changed, then
	the other port is in use.

	. if it is unchanged, and this is the second poll, the drive
	is not being used. If it is the first poll, repeat from the
	5 second wait (step 2) onward.

This algorithm will determine whether the processor that has been "cut
off" may continue to use the drive.
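
The polling procedure can be sketched in Python as shown here. The
register access and the wait are passed in as functions so the sketch
can be run without hardware; FakeDrive and all names are hypothetical
stand-ins, and the drop number used is arbitrary.

```python
POLL_WAIT_SECONDS = 5
MAX_POLLS = 2

def other_port_in_use(read_reg, write_reg, my_drop_number, sleep):
    """Return True if another system changed the keep-alive register."""
    write_reg(my_drop_number)            # stored without the "go" bit
    for _ in range(MAX_POLLS):
        sleep(POLL_WAIT_SECONDS)
        if read_reg() != my_drop_number:
            return True                  # someone else touched the register
    return False                         # unchanged after the second poll

class FakeDrive:
    """Stand-in for the drive command register. `writes` maps a wait
    count to the value the other system stores during that wait."""
    def __init__(self, writes=None):
        self.reg = 0
        self.waits = 0
        self.writes = writes or {}
    def write(self, value):
        self.reg = value
    def read(self):
        return self.reg
    def sleep(self, seconds):
        self.waits += 1
        if self.waits in self.writes:
            self.reg = self.writes[self.waits]
```

With an idle FakeDrive the function waits twice and reports the drive
as unused; if the other system writes during either wait, it reports
the drive as in use.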

5.2 Ownership

The technique in 5.1 does not, however, prevent the case where
the other processor is not detectable because it is not running.
In this case, when this other processor is loaded, and if it
is connected to the CI, it will conclude it may use the drive also.
This will again create the case of two processors using a drive
without the benefit of a concurrency control procedure.

Therefore, we must use the medium itself to detect and manage this case.
In the case where a processor is being loaded and it discovers
a valid medium on a sharable drive, or the case where a valid
medium appears on a previously off-line drive, a concurrency
control algorithm must be executed by the processor or processors
attached to the drive.

The only case of interest is that of a "cut off" processor having
concluded it may use a disk drive, verified the medium on the drive,
and made the data available to the system. All other cases are
covered by the drive conflict algorithm.

Following is a sufficient concurrency control algorithm:


	. if this is a "cut off" processor, and the drive
	is sharable, but not in use on the other port,
	write a predefined word in the home block with a
	code to indicate this use.

	. when a "CI connected processor" is loaded, for each
	"sharable" drive that is being used on the other port
	and that has a valid medium mounted, it must read the
	home block to see if the medium is being used
	by a "cut off" processor.
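
The two steps can be sketched in Python. The home block is modeled as
a dictionary, and the field name "owner_mark" and the marker value are
hypothetical; the memo does not define the actual word or code.

```python
CUT_OFF_IN_USE = "cut-off-in-use"   # hypothetical marker value

def claim_if_cut_off(home_block, cut_off, sharable, other_port_in_use):
    """Step 1: a cut-off processor marks a sharable, otherwise unused pack."""
    if cut_off and sharable and not other_port_in_use:
        home_block["owner_mark"] = CUT_OFF_IN_USE
        return True
    return False

def claimed_by_cut_off(home_block):
    """Step 2: a CI-connected processor checks the mark when it is loaded."""
    return home_block.get("owner_mark") == CUT_OFF_IN_USE
```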

Note that this algorithm has the problem that disk packs
may be dismounted without the "cut off" word being reset. This
will happen if the "cut off" processor crashes before the structure
is formally dismounted. In this case, it may not be possible to
mount the pack again. However, this case can be detected and reported
by OPR, thereby giving the operator the choice of forcibly
clearing the restriction.

5.3 Conclusion

As can be seen, choice 2 requires a significant effort in the monitor.
Perhaps the technique presented could be refined some more to
reduce the work and the risk, but in any event it will be more
complicated than choice 1. The nature of Massbus disks would
seem to militate against supporting any such "high availability"
option and therefore, TOPS-20 will not do so.

CFS-RESOURCES.TXT


                                 CFS Resources
                                David Lomartire
                                    7-Nov-84

                            Last Update:  10-Apr-85

				================








	Files:
		o  File open token..............Page 2
		o  Frozen writer token..........Page 4
		o  File access token............Page 5

	Directory Locks:

		o  Directory lock token.........Page 14
		o  Directory allocation token...Page 16

	Structures:

		o  Structure name token.........Page 22
		o  Drive serial number token....Page 22

	BAT Blocks:

		o  BAT Block lock token.........Page 26

	ENQ/DEQ:

		o  ENQ Token....................Page 28

	Appendix A

		o  Flow chart of CFSGET.........Page 29

									Page 1
1.   Files  - An opened file has one or two CFS file resources (file open token
and possibly frozen writer token) and a file access token for each  active  OFN
(at least one). The field SPTST of SPTO2 holds the access specified in the file
access  token.  While DDMP is performing CFS-directed operations on a file, all
pages of the OFN are inaccessible to any other process. This is achieved  by  a
bit SPTFO, set in SPTO2 by DDMP. FKSTA2 will contain the resource block address
(when appropriate) of the CFS resource the fork is waiting on.

     Below  is the table of contents fragment from CFSSRV which illustrates the
various file related routines:

   15. File open resource manager
        15.1.   CFSGFA (Acquire file opening locks) . . . . .   83
        15.2.   CFSFFL (Release file locks) . . . . . . . . .   84
        15.3.   CFSURA (Downgrade to promiscuous) . . . . . .   85
   16. Frozen writer resource manager
        16.1.   CFSGWL (Get write access) . . . . . . . . . .   86
        16.2.   CFSFWL (Free write access). . . . . . . . . .   87
   18. File access token resource manager
        18.1.   CFSGWT (Get write token value). . . . . . . .   91
        18.2.   CFSAWP/CFSAWT (Acquire write token) . . . . .   92
        18.3.   CFSDWT (Write token revoked). . . . . . . . .   95
        18.4.   CFSOVT (Approve sharing of OFN resource). . .   95
        18.5.   CFSGOC (Get count of resource sharers). . . .   96
        18.6.   CFSDAR (Optional data for access token) . . .   97
        18.7.   CFSFWT (Free write token) . . . . . . . . . .   97
        18.8.   CFSUWT (Release access token) . . . . . . . .   98
        18.9.   CFSBOW (Broadcast OFN update) . . . . . . . .  100
        18.10.  CFSBEF (Broadcast EOF). . . . . . . . . . . .  101
        18.11.  CFSBRD (Main broadcast routine) . . . . . . .  102
        18.12.  CFSFOD (DDMP force out done). . . . . . . . .  103

									Page 2
                                File Open Token
                                ---------------

CFS routines:	CFSGFA - Acquire file opening locks
		CFSFFL - Release file locks
		CFSURA - Downgrade to promiscuous (unrestricted, OF%RDU)

     When the file is opened via OPENF%, it will have one of the following open
types assigned to it:

Open type (spec)   CFS term (code)		OPENF% term	OPENF% bits
---------------------------------------------------------------------------
shared read	   .HTOSH - read-only shared	Frozen		not OF%THW
shared read/write  .HTOAD - full sharing     	Thawed		OF%THW
exclusive	   .HTOEX - exclusive		Restricted	OF%RTD
promiscuous	   .HTOPM - promiscuous read	Unrestricted	OF%RDU
local exclusive	   1B0!.HTOEX - local exclusive	   --		OF%DUD
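
For readers unfamiliar with MACRO notation, 1B0 places a 1 in bit 0
(the leftmost bit of the 36-bit word) and "!" is inclusive OR. The
Python sketch below shows how the local exclusive code is composed.
The numeric value passed for .HTOEX is a placeholder, since the table
does not give it, and the function names are hypothetical.

```python
WORD_BITS = 36

def bit(n: int) -> int:
    """Mask for PDP-10 bit n, where bit 0 is the leftmost of 36 bits."""
    return 1 << (WORD_BITS - 1 - n)

LOCAL_EXCLUSIVE = bit(0)            # written 1B0 in the table

def local_exclusive_code(htoex: int) -> int:
    """Compose 1B0!.HTOEX from a given .HTOEX value."""
    return LOCAL_EXCLUSIVE | htoex

def split_code(code: int):
    """Return (is_local, base_access) for an access code."""
    return bool(code & LOCAL_EXCLUSIVE), code & ~LOCAL_EXCLUSIVE
```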


                   ** CFSGFA - Acquire file opening locks **

     Called by:		GETCFS in PAGUTL

     Upon  entry to CFSGFA, the access type is converted into one of the access
codes shown above. Next, HSHLOK is called to see if a file open  token  already
exists  for  this  file.  If  it  does, a call is made to CFSUGD to upgrade the
already existing access to the new access which is requested.

     If  a  file  open  token does not already exist, CFSSPC is called to get a
short request block. CFSSPC will return a block with HSFLAG, HSHPST, and HSHOKV
zeroed as well as HSHRET set to 1,,SHTADD. Then, the following is placed in the
block:

	HSHROT		six-bit structure name
	HSHQAL		index block address
	HSFLAG		HSHTYP=access, HSHLCL set if local exclusive

     Finally,  CFSGTT  is  called  to get the token (with "try only once" set).
(Note, if the structure is set local exclusive, CFSGTT will discover  this  and
use  CFSGTL.  This will mean that HSHVTP will not be updated with the access of
the vote since no vote is required.) If the token is  acquired,  the  following
has been updated in the resource block:

	HSFLAG		HSHVTP=access of vote,  HSHCNT incremented (owned) 
	HSHFRK		FORKX of running fork
	HSHTIM		TODCLK stamp when vote approved and token obtained

     If, upon return from CFSGTT, we do not need the newly created block (T1 is
non-zero), SHTADD will be called to return the extra block to the CFS pool.

 									Page 3
                       ** CFSFFL - Release file locks **

     Called by:		FRECFS in PAGUTL

     The routine CFSFFL simply calls CFSNDO to release the file open token. See
the discussion of CFSAWT/CFSAWP for a description of CFSNDO.


         ** CFSURA - Downgrade to promiscuous (unrestricted, OF%RDU) **

     Called by:		RELOFN in PAGUTL when OFOPC in SPTO2 is 0
			(no more "normal" (non-unrestricted) openings)

     The  routine  CFSURA  is called by PAGEM to downgrade the access of a file
open token to .HTOPM whenever all open OFNs are closed. It calls CFSUGD with  a
new access of .HTOPM and decrements HSHCNT when the new access is obtained.

									Page 4
                              Frozen Writer Token
			      -------------------

CFS routines:	CFSGWL - Get write access
		CFSFWL - Free write access

     If  a  file  is  opened  for frozen write (OF%WR and not OF%THW), then the
frozen writer token is acquired after the file open token is obtained. This  is
an exclusive access token that represents the single "frozen write" user of the
file. It is held only by the system which has the file open for frozen write.


                        ** CFSGWL - Get write access **

     Called by:		CHKACC and GETCFS in PAGUTL

     Upon  entry  to  CFSGWL,  a short resource block is obtained via a call to
CFSSPC. CFSSPC will return a block with HSFLAG, HSHPST, and  HSHOKV  zeroed  as
well as HSHRET set to 1,,SHTADD. The following is then placed in the block:

	HSHROT		six-bit structure name
	HSHQAL		FILEWL+index block address
	HSFLAG		HSHTYP=.HTOEX

     Finally,  CFSGTT  is  called  to get the token (with "try only once" set).
(Note, if the structure is set local exclusive, CFSGTT will discover  this  and
use  CFSGTL.  This will mean that HSHVTP will not be updated with the access of
the vote since no vote is required.) If the token is  acquired,  the  following
has been updated in the resource block:

	HSFLAG		HSHVTP=access of vote,  HSHCNT incremented (owned) 
	HSHFRK		FORKX of running fork
	HSHTIM		TODCLK stamp when vote approved and token obtained

     If, upon return from CFSGTT, we do not need the newly created block (T1 is
non-zero), SHTADD will be called to return the extra block to the CFS pool.


                        ** CFSFWL - Free write access **

     Called by:		RELOFN and FRECFS in PAGUTL

     The  routine CFSFWL simply calls CFSNDO to release the write access token.
See the discussion of CFSAWT/CFSAWP for a description of CFSNDO.

									Page 5
                      File Access Token (OFN access token)
                      ------------------------------------

CFS routines:	CFSAWT/CFSAWP - Acquire/Acquire and reserve access token
		CFSUWT        - Release access token
		CFSFWT	      - Free write token       
		CFSDWT	      - Write token revoked	       (callback)       
		CFSOVT	      - Approve sharing of OFN	       (callback)       
		CFSDAR	      - Optional data for access token (callback)
		CFSBOW	      - Broadcast OFN change
		CFSBEF	      - Broadcast EOF
		CFSFOD	      - DDMP force out done

     Each  active  OFN  has  an access token. It may be in one of the following
modes:

	*  place-holder 	     - .HTPLH  (this value must be zero!)
	*  full sharing 	     - .HTOAD
	*  exclusive (read or write) - .HTOEX

     The location CFSOFN points to a table which is NOFN long and is indexed by
OFN. It contains the address of the resource block which describes that OFN.


         ** CFSAWT/CFSAWP - Acquire/Acquire and reserve access token **

     Called by:		NEWLFP in DISC   for .HTOEX access  (CFSAWP)
			UPDLEN in DISC   for .HTOEX access    
			GETLEN in DISC   for .HTOAD access    
			MAPBTB in DSKALC for .HTOEX access  (CFSAWP and CFSAWT)
			RELMPG in PAGEM  for .HTOAD access  (CFSAWP)
			NTWRTK in PAGEM  for .HTOEX access   
			NIC    in PAGEM  for .HTOEX access	    
			OFNTKN in PAGUTL for .HTOAD access  
			DDXBI  in PAGUTL for .HTOAD access  (CFSAWP)
			UPDPGS in PAGUTL for .HTOAD access  (CFSAWP)
			ASGOFN in PAGUTL for .HTOAD access  (CFSAWP)
			LCKOFN in PAGUTL for .HTOAD access  (CFSAWP)
			MRKOFN in PAGUTL for .HTOAD access  (CFSAWP)

     The routines CFSAWT and CFSAWP acquire the access token. The latter leaves
the resource block reserved on the system. The former does not.

     Upon  entry,  the  access  type  is checked. If zero, full shared (.HTOAD)
access is acquired.  If  not  equal  to  zero,  exclusive  (.HTOEX)  access  is
acquired.  Next,  the  SPTFO  bit in SPTO2 is checked to see if DDMP is forcing
this OFN to disk. If so, the fork goes into WTFOD wait. Otherwise we proceed to
lookup in the CFS OFN table (pointed to by CFSOFN) the address of the  resource
block  for  this OFN. If none exists (entry is zero), we continue at CFSAW1 and
add an entry.

									Page 6
     At  CFSAW1,  GNAME  is  called  to  get the structure name. Then CFSSPC is
called to obtain a short resource  block.  CFSSPC  will  return  a  block  with
HSFLAG,  HSHPST,  and  HSHOKV  zeroed  as  well as HSHRET set to 1,,SHTADD. The
following is then placed in the block:

	HSHROT		six-bit structure name
	HSHQAL		FILEWT+index block address
	HSFLAG		HSHTYP=access, HSHKPH set
	HSHCOD		OFN
	HSHPST		1,,CFSDWT
	HSHOKV		1,,CFSOVT
	HSHCDA		1,,CFSDAR
	HSHOP1		0 (transaction number)

     Next,  a  call  is  made  to  get  the  resource.  CFSGET is called if the
structure is shared. If the structure is set exclusive, CFSGTL is  used.  (Note
that  if  CFSGTL  is  called, HSHVTP will not be updated with the access of the
vote since no vote is required.) The call is made to "retry  until  successful"
so,  upon  the  return,  we  have acquired the resource. The following has been
updated in the resource block:

	HSFLAG		HSHVTP=access of vote,  HSHCNT incremented (owned) 
	HSHFRK		FORKX of running fork
	HSHTIM		TODCLK stamp when vote approved and token obtained

     Now, HSHFCT is set to be TODCLK+WRTTIM. A check is made to see if optional
data  was  returned  during the vote. The optional data will be the most recent
value of OFNLEN in HSHOPT and the transaction  number  in  HSHOP1.  This  value
represents  what  the  other  node believes OFNLEN is for that OFN. If there is
optional data present from the vote (HSHODA is set),  this  must  be  the  most
current  value of the file length and it is stored in OFNLEN. In this case, the
other node had more recent information than us so we must update  our  copy  of
OFNLEN.  Otherwise, no other copy exists and we initialize the optional data of
the resource block to contain the current OFNLEN entry for that OFN. HSHOP1  is
incremented  to  initialize the transaction number. In this case, no other node
had more recent information than us so we establish ourselves as the node which
knows the state of OFNLEN. Note that if this access token was for a bit  table,
the  structure  free count must also be maintained or established. The callback
routine CFSDAR will ensure that STOFRC is called to update the  structure  free
count if appropriate.
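
     The handling of the optional data can be sketched in Python. The resource
block is modeled as a dictionary keyed by field name and OFNLEN as a dictionary
keyed by OFN; both representations, and the function name, are assumptions made
for the illustration.

```python
def reconcile_ofnlen(block, ofnlen, ofn):
    """Bring OFNLEN and the block's optional data into agreement."""
    if block.get("HSHODA"):
        # Another node returned optional data: its length is more recent.
        ofnlen[ofn] = block["HSHOPT"]
    else:
        # No other copy exists: this node becomes the keeper of OFNLEN.
        block["HSHOPT"] = ofnlen[ofn]
        block["HSHOP1"] = block.get("HSHOP1", 0) + 1
```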

     If  CFSAWT  was  originally called, CFSNDO will be called to undeclare the
resource. CFSNDO will decrement HSHCNT and HSHNBT will  be  "searched"  looking
for  previously rejected hosts to notify (via CFNOHN). If CFSAWP was originally
called, the resource will remain  owned  by  the  fork.  In  either  case,  the
resource  block  remains in the hash table and HSHKPH will remain set. The only
distinction is whether we own the resource or not.

     Finally,  the SPTST field in SPTO2 is set to the correct state; .SPSWR (2)
for .HTOEX access tokens or .SPSRD (1) for .HTOAD access tokens.

									Page 7
     The  description above describes what occurs when CFSAW1 is entered to add
a new OFN entry. However, there is another option. If CFSOFN  already  contains
the  address  of  a resource block, then this OFN has already been added and is
known to this CFS system.

     First,  the post callback address (HSHPST) is checked and set to 1,,CFSDWT
if it was zero. The block is then locked against removal by  either  CFSRSE  or
CFSUWT  by  incrementing  HSHLKF.  Both  of  these routines can remove resource
blocks from the hash table. So, we  are  locking  the  resource  block  against
possible  removal  from  the  hash table via the use of HSHLKF. Now, a check is
made to see if there is anyone waiting for this block. This is done by checking
HSHTWF, HSHUGD, and HSHWVT. HSHTWF gets set when the fork is going to go into a
wait state on that block (like CFSRWT). The setting of STKVAR WTFLAG  indicates
to  CFSUGW  (the  wait  routine)  if we set HSHTWF. If WTFLAG is -1, HSHTWF was
already set and if it is zero, we set HSHTWF. HSHTWF is also set when  we  call
CFSUGD  to upgrade access to the token. HSHUGD is set when we are performing an
upgrade vote on the resource. HSHWVT is set when we are voting on a resource.

     So,  if  there is someone waiting on this block, we continue at CFSUGW. We
pass into CFSUGW the address of the wait  routine;  in  this  case  CFGVOT.  In
CFSUGW,  we  place  TODCLK+^D500  in  HSHTIM  and call CFSWUP (the general wait
routine) to wait until the vote has completed. Upon  return  from  CFSWUP,  the
block  will  no longer be in a "wait state". We will clear HSHTWF if we had set
it (WTFLAG=0) and check HSHPST to see if the block has been released  (it  will
be  zero  if  so).  If  it  has not been released, we unlock it by decrementing
HSHLKF, clear some bits which could  be  left  over  from  voting  (HSHRTY  and
HSHVRS)  and  start over again at CFSAWL to try to acquire the access token. If
the block has been released, we get the value of HSHLKF and  decrement  it.  If
the  resulting  value  is  non-zero, we are not the last locker, so we couldn't
obtain the access token  and  return  without  changing  the  access  token  we
currently have (current access reflected in SPTO2). If the decremented value of
HSHLKF  is  now  zero, then we are the "owner" of the resource block. The block
address in CFSOFN will be cleared and CFSRMV will be called to remove the block
from the hash table. Again, we will return without changing the access  of  the
OFN. CFSRMV will not post the removal.
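
     The decisions made after the wait completes can be sketched in Python. The
block is modeled as a dictionary, CFSOFN as a dictionary keyed by OFN, and the
removal as a callback; these, the function name, and the returned strings are
hypothetical.

```python
def after_wait(block, cfsofn, ofn, we_set_twf, remove_block):
    """Handle a resource block once the wait on it has ended."""
    if we_set_twf:                      # WTFLAG = 0 case
        block["HSHTWF"] = False
    if block["HSHPST"] != 0:            # block was not released
        block["HSHLKF"] -= 1            # unlock
        block["HSHRTY"] = False         # clear leftover voting bits
        block["HSHVRS"] = False
        return "retry"                  # start over at CFSAWL
    block["HSHLKF"] -= 1                # block was released
    if block["HSHLKF"] != 0:
        return "unchanged"              # not the last locker
    cfsofn[ofn] = 0                     # last locker cleans up
    remove_block(block)                 # CFSRMV, no posting
    return "removed"
```

     In both the "unchanged" and "removed" outcomes the caller returns without
changing the access recorded in SPTO2.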

									Page 8
     If  there  is  no  one  waiting  on the resource, we can proceed to try to
upgrade our access. First, we check STRTAB to see if the structure  is  mounted
exclusively.  If  so,  we  make  HSHTYP  be  exclusive  (.HTOEX).  This is done
regardless of the access we were asking for  because  access  on  an  exclusive
structure is always .HTOEX (no other access is needed because this is the
only node which can use this structure). We will never be refused access to  an
OFN on an exclusive structure due to setting HSHTYP to .HTOEX (as shown below).

     Now,  we  check HSHTYP to see what kind of access we hold on the resource.
If it is exclusive (.HTOEX), then we are granted access. If CFSAWP was  called,
HSHCNT  is  incremented  in order to hold ownership. Finally, the OFN access is
set in SPTO2.
 
     If HSHTYP is not exclusive, we will have to upgrade our access to the OFN.
First  we set HSHTWF to indicate we are processing this resource block. Next we
call CFSUGD to try to upgrade our access. CFSUGD will modify HSHTYP and  HSHCNT
to  reflect  the  state  of the resource after the upgrade attempt. HSHTYP will
contain the current access and HSHCNT will contain the number of owners of  the
resource.

     If  CFSUGD successfully allows the upgrade, we will clear HSHTWF and place
TODCLK+WRTTIM in HSHFCT. The block will be unlocked via a decrement  of  HSHLKF
and,  if  CFSAWT  was originally called, HSHCNT will be decremented so that the
resource will not be held. If HSHCNT is zero, CFNOHS will be called  to  notify
any  nodes  which  were rejected in the interim. Next, if any optional data was
returned in the upgrade vote, it is placed in OFNLEN. This would  be  the  file
length for that OFN. Finally, the OFN access in SPTO2 is updated to reflect the
new access state.

     If  CFSUGD does not allow the upgrade, HSHWTM is checked to see if a retry
wait time was given. If so, it  is  added  to  TODCLK  and  placed  in  HSHTIM.
Otherwise,  TODCLK+^D20  is  used. Processing will now continue at CFSUGW, with
the wait address specified as CFSRWT. CFSRWT  will  awaken  under  one  of  the
following conditions:

	1.  The block can no longer be found in the hash table (it has been 
	    released)

	2.  The same block is found by HSHLOK (address match) and:
		a) HSHVRS or HSHRTY is set in the resource block
		b) HSHTIM is less than or equal to TODCLK

	3.  A different block is found by HSHLOK and:
		a) HSHCNT is zero
		b) HSHTYP is not .HTOEX but does match the desired access

									Page 9
                      ** CFSUWT - Release access token **

     Called by:		FRECFS in PAGUTL

     When  a  file is closed, CFSUWT is called to release the access token. The
resource block is found (via HSHLOK) and the OFN is retrieved from HSHCOD.  The
SPTO2 entry for this OFN is checked to see if anyone is waiting for this access
token.  This is indicated by the field SPTFR being set which indicates that CFS
has requested a DDMP force out to be done for that access token. (SPTFR is  set
in routine CFSDWT and cleared in CFSFOD.)

     If  SPTFR is set then a force out has been requested for this OFN. At this
point we clear the bit in OFNCFS which indicates which OFN  DDMP  should  force
out  and  call  CFSFDF  to  signal  that  the  force out is done. (CFSFDF is an
alternate entry point to CFSFOD which will always signal that the force out  is
done  regardless  of the number of sharers remaining on that OFN. In effect, it
"forces" the force out regardless of the current number of sharers of that OFN.
CFSFOD will only signal that the force out is done when there are  2  or  fewer
sharers  indicated by HSHCNT.) (OFNCFS is a multi-word bit mask scanned by DDMP
to determine which OFNs need to be forced out.) Finally, CFSUWT is continued at
the beginning to try to release the access token again.

     If SPTFR is zero, then this OFN is not being forced out. We call CFNOHS to
notify any hosts which we rejected for this OFN. Next we check HSHLKF to see if
the  resource block is locked. This will be set by CFSAWT/CFSAWP to prevent the
block from being removed from the hash table. If the block is locked, we simply
clear HSHTYP, HSHCNT, HSHKPH, and HSHPST. By  clearing  HSHKPH,  the  block  is
eligible for removal as stale by routine CFSRSE. If the block is not locked, we
clear  the  corresponding  entry  in  CFSOFN and remove the block from the hash
table via a call to CFSRMV. CFSRMV will not post the removal.
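
     The path taken when SPTFR is zero can be sketched in Python. The block and
CFSOFN are modeled as dictionaries and the calls to CFNOHS and CFSRMV as
callbacks; all names other than the field names are hypothetical.

```python
def release_access_block(block, cfsofn, ofn, notify_hosts, remove_block):
    """Release an access token whose OFN is not being forced out."""
    notify_hosts(block)                  # CFNOHS: tell rejected hosts
    if block["HSHLKF"] > 0:
        # Locked by CFSAWT/CFSAWP: leave the block for CFSRSE to reclaim.
        for field in ("HSHTYP", "HSHCNT", "HSHKPH", "HSHPST"):
            block[field] = 0
        return "left as stale"
    cfsofn[ofn] = 0
    remove_block(block)                  # CFSRMV, no posting
    return "removed"
```

     Because .HTPLH must be zero, clearing HSHTYP in the locked case leaves the
block as a place-holder.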


                        ** CFSFWT - Free write token **

     Called by:		DDXBI  in PAGUTL
			UPDPGS in PAGUTL
			ASGOFN in PAGUTL
			ULKOFN in PAGUTL
			UMPBTB in DSKALC

     The  routine CFSFWT simply calls CFSNDS to release the write access token.
CFSNDS is an alternate entry point to CFSNDO  which  will  employ  a  "fairness
test" when notifying other nodes of the resource release. It will set HSHRFF in
the  block  just  before  calling CFNOHS. CFSFWT is the only routine that calls
CFSNDS. See the discussion of CFSAWT/CFSAWP for a description of CFSNDO.

									Page 10
                 ** CFSDWT - Write token revoked  (callback) **

     Called by:		CFSRTV when we want to release the resource
			(Note:  CFSRSE has the ability to call a post
				routine but it should never be called 
				for a file access token since HSHKPH
				will be set and this will prevent 
				the block's removal.  In fact, if
				CFSDWT were to be called, incorrect
				or needless DDMP action could result.)

     This  routine  is  the  callback  routine placed in HSHPST when the access
token is formed. To post removals, CFSRMX is called. CFSRMX  is  the  alternate
entry  point  to CFSRMV used to do posting of removals. CFSRMX will ensure that
CFSDWT is called to do any cleanup that is needed before the resource block  is
removed from the hash table.

     CFSDWT  simply  invokes  DDMP  to  force  out  the  pages of the OFN being
released. If the resource is a place holder, then nothing is  done  and  CFSDWT
just  returns. Otherwise, SPTFR is set in SPTO2 for the OFN. Note that SPTFR is
a two bit field (bits 22 and 23) so both bits will be set. Next, if we own  the
resource exclusively (HSHTYP) and the vote request is for any access other than
exclusive  (HSHVTP),  we  will clear bit 22, which is named SPTSR. If we do not
own it exclusively (we own it .HTOAD) then SPTSR will remain set.

     SPTSR  is  checked  by routine DDOCFS (in PAGUTL) when deciding what to do
with the copy of the pages. If SPTSR is zero, then the vote request was not for
exclusive access (and we had .HTOEX access) so UPDPGY is called  to  update  to
disk any modified pages and set any in memory pages to "read-only" (via the use
of the CST write bit). (In the access token state transition table in the spec,
this  is  shown  as  DDMP**.)  This  will result in a new access for the OFN of
.HTOAD (full  sharing)  on  the  processor  which  used  to  own  the  resource
exclusively.  If  SPTSR  is set, then the vote request was for exclusive (or we
only had .HTOAD access) so UPDPGX is called to  update  to  disk  any  modified
pages  and  remove  all  the  OFN pages from memory. (In the access token state
transition table in the spec, this is shown as DDMP*.) This will  result  in  a
new  access for the OFN of .HTPLH (place-holder) on the processor which used to
own the resource exclusively.
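
     The SPTFR/SPTSR handling described above can be sketched as follows. This
is an illustrative Python model, not the monitor code: the function names and
the string access names are assumptions made for the example, and the masks
rely only on the PDP-10 convention that bit 0 is the leftmost bit of a 36-bit
word.

```python
WORD_BITS = 36

def bit(n):
    """Mask for PDP-10 bit n (bit 0 is the most significant of 36)."""
    return 1 << (WORD_BITS - 1 - n)

SPTSR = bit(22)            # stays set when the pages must leave memory
SPTFR = bit(22) | bit(23)  # two-bit "force out requested" field

def mark_force_out(spto2, owned, requested):
    """Model of the flag setting done by CFSDWT (hypothetical signature)."""
    spto2 |= SPTFR
    # Exclusive owner asked for something less than exclusive: a downgrade
    # to full sharing is enough, so clear SPTSR.
    if owned == "HTOEX" and requested != "HTOEX":
        spto2 &= ~SPTSR
    return spto2

def force_out_action(spto2):
    """Model of the choice made by DDOCFS: (routine, resulting access)."""
    if spto2 & SPTSR:
        return ("UPDPGX", "HTPLH")   # write modified pages, drop all pages
    return ("UPDPGY", "HTOAD")       # write modified pages, keep read-only
```

     For example, force_out_action(mark_force_out(0, "HTOEX", "HTOAD")) selects
UPDPGY and leaves the former exclusive owner with full-sharing access.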

     Finally, DDCFSF is incremented to wake up DDMP and the appropriate OFN bit
in the OFNCFS bit-mask is set to indicate to DDMP which OFN to force out. Also,
T1  is  set  to  zero.  This is important because, upon return to CFSRMX, T1 is
checked and, if zero, a -1 will be placed in T1 upon return to CFSRTV  and  the
block  will not be removed from the hash table. This value in T1 is taken to be
the vote type which is placed in .CFTYP. So, this is how a -1  (or  conditional
yes)  is  generated; namely, a vote comes in which causes an access token to be
released and requires DDMP to run. When  this  is  done,  CFSFOD  will  send  a
"condition  satisfied"  message (.CFTYP = -2) to indicate that the force out is
done. (The appropriate HSHDLY bit for the node is set when the -1  is  received
and cleared when the -2 is received. This is done in CFSRVT.)
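
     The three vote reply types mentioned above (0, -1, and -2) and the
per-node delay bit can be modeled with the short sketch below. The class and
method names are hypothetical, and the rule that the requester may proceed
only when no delay bits remain is an assumption drawn from the description,
not from the CFSRVT source.

```python
VOTE_YES = 0         # unconditional yes
VOTE_COND_YES = -1   # conditional yes: a DDMP force out is still pending
VOTE_COND_DONE = -2  # "condition satisfied": the force out has completed

class DelayMask:
    """Per-node delay bits, loosely modeled on HSHDLY (hypothetical class)."""

    def __init__(self):
        self.bits = 0

    def record_reply(self, node, reply_type):
        """Set the node's bit on a -1 reply and clear it on a -2 reply."""
        if reply_type == VOTE_COND_YES:
            self.bits |= 1 << node
        elif reply_type == VOTE_COND_DONE:
            self.bits &= ~(1 << node)

    def may_proceed(self):
        """Assumed rule: no node still owes a 'condition satisfied'."""
        return self.bits == 0
```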

                ** CFSOVT - Approve sharing of OFN (callback) **

     Called by:		CFSRTV when vote is to be approved

     This  routine  is  placed  in  HSHOKV when the access token is formed. The
routine is called when the vote is to be approved in order to place this node's
optional data in the vote packet.

     If HSHOP1 is non-zero, then this node has some copy (it may be old) of the
file  length  information  (OFNLEN for that OFN). Both HSHOPT (the file length)
and HSHOP1 (the transaction number) are placed in CFDAT and CFDT1  of  the  CFS
send packet. Also, the CFODA flag will be set to indicate that optional data is
present. If HSHOP1 is zero (no transaction number), then this node has no
optional data to contribute concerning the file length, and CFDT1 (the file
length transaction number) is set to zero. This is done because a structure
free count can also be sent in the same packet, so CFODA alone does not show
which data is present. Sending a transaction number of zero ensures that the
receiver ignores the file length data.

     Finally  the  HSHBTF flag is checked to see if this is a bit table OFN. If
it is, SNDFRC will be called to set up the send packet with the structure  free
count  data.  SNDFRC  will call GETFRC and place the returned value of the free
count in CFDST0 of the send packet. Next, the current transaction number
for  the  structure  free count is retrieved from the CFSSTR table. This table,
indexed by structure number, contains the transaction number  values  for  each
structure.  This  transaction  value  is  placed  in  CFDST1 and, if we are the
exclusive owner of the OFN, it is incremented. This will insure that the  count
is  updated  on  the  remote system. Finally, CFODA is set to indicate there is
optional data present in the vote packet.

     This  optional  data  information will be processed by the CFSDAR callback
routine (described below).


           ** CFSDAR - Optional data for access token  (callback) **

     Called by:		CFSRVT when a vote packet arrives with optional data

     This  routine  is  placed  in  HSHCDA when the access token is formed. The
routine is called when the vote packet arrives in order to process the optional
data in the packet.

     First, HSHBTF is checked to see if this is a bit table. If it is, then the
structure free count may have to be updated. If structure free  count  data  is
present  (CFDST1  non-zero)  and  if  the  remote  node's  structure free count
transaction number is greater than our own (CFDST1  >  CFSSTR(str)),  then  our
copy of CFSSTR is updated and STOFRC is called to update the free count.

     Next, we continue at CFADAR and check CFDT1 to see if any file length
optional data is present. If it is and the transaction number is greater than
ours, we store the data and transaction number in HSHOPT and HSHOP1, set HSHODA
to indicate optional data is present, and return +2 to CFSRVT. (This data will
be taken out of the resource block and placed in OFNLEN by CFSAWT/CFSAWP.) If
the transaction number in the packet is not greater than ours, we return to
CFSRVT and ignore the optional data.
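
     The "newer transaction number wins" rule used for the file length data can
be sketched as below. This is a minimal Python model under stated assumptions:
the resource block is represented by a dictionary, the function names are
hypothetical, and transaction numbers are taken to be non-negative integers
with zero meaning "no data".

```python
def accept_optional_data(local_txn, packet_txn):
    """True when the packet carries data newer than ours (zero means none)."""
    return packet_txn != 0 and packet_txn > local_txn

def merge_file_length(block, packet_len, packet_txn):
    """Update a dict standing in for the resource block.

    Returns True when the packet data was stored (the +2 case in the text)
    and False when it was ignored.
    """
    if not accept_optional_data(block["HSHOP1"], packet_txn):
        return False
    block["HSHOPT"] = packet_len   # file length
    block["HSHOP1"] = packet_txn   # transaction number
    block["HSHODA"] = True         # optional data present
    return True
```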

                       ** CFSFOD - DDMP force out done **

     Called by:		DDOCFS in PAGUTL

     This  routine  is  responsible  for  signaling that DDMP has completed the
force out of OFN pages. The corresponding entry in CFSOFN is checked to see  if
an  access  token exists for it. If the table entry is empty, then this node no
longer has the resource so SPTST (the OFN access) and SPTFR (the DDMP force out
flag bits) are cleared and CFSFOD returns successfully.

     If  there is a CFSOFN table entry, it contains the address of the resource
block for the access token. We retrieve the number of sharers of  the  resource
from  HSHCNT  and,  if  greater  than  2, we cannot signal success so we return
failure. Next, any optional data that is present in the  access  token  (HSHOPT
and  HSHOP1)  is placed in a newly acquired vote packet (obtained via a call to
GVOTE1). Note that if HSHODA is zero, then there is no  optional  data  in  the
token  so  the  file length information is obtained directly from OFNLEN. If we
own exclusive access to the  OFN,  then  the  transaction  number  (HSHOP1)  is
incremented  to insure that the remote system will use the optional data we are
providing since it is the most current. If this is a bit table OFN,  SNDFRC  is
called to place structure free count optional data into the vote packet.

     Next,  we clear HSHRFF and set the new access of the OFN. If SPTSR is set,
then HSHTYP is set to .HTPLH and SPTST is set to 0. If SPTSR is not  set,  then
HSHTYP  becomes  .HTOAD  and  SPTST  becomes  .SPSRD.  Finally, we clear SPTFR,
decrement HSHCNT to "unown" the resource and  call  SCASND  to  send  the  vote
packet.  The packet type code is a -2, which is a "condition satisfied" message
which is used to indicate to the remote that the DDMP force out has  completed.
We will then return successfully.

2.   Directory locks and directory allocation - Directory locks are now managed
by  CFS  and  are  a CFS resource. The old LOKTAB and associated storage is now
gone. Each time a directory is locked or unlocked, a CFS resource is created or
modified. The remaining directory allocation of each active directory is also a
CFS resource.

     Below  is the table of contents fragment from CFSSRV which illustrates the
various directory related routines:

   14. Directory lock resource manager
        14.1.   CFSLDR (Lock directory) . . . . . . . . . . .   75
        14.2.   CFSRDR (Unlock directory) . . . . . . . . . .   77
        14.3.   CFSDAU (Acquire allocation entry) . . . . . .   78
        14.4.   CFAFND/CFAGET (Find/Get allocation table) . .   78
        14.5.   CFASTO (Store new allocation value) . . . . .   80
        14.6.   CFAULK (Unlock allocation entry). . . . . . .   80
        14.7.   CFAREM (Remove allocation entry). . . . . . .   80
        14.8.   CFAUPB (Undo keep here bit) . . . . . . . . .   80
        14.9.   CFAVOK (Vote to be approved). . . . . . . . .   81
        14.10.  CFADAR (Optional data present). . . . . . . .   81
        14.11.  CFARMV (Voter remove entry) . . . . . . . . .   81
        14.12.  GETDBK (Find resource block). . . . . . . . .   82

                              Directory Lock Token
                              --------------------

CFS routines:	CFSLDR - Lock directory
		CFSRDR - Unlock directory

     Directory  locks  are  the  only  example  of  long resource blocks. Also,
directory locks are always exclusive (.HTOEX) resources. For  the  duration  of
the lock, the process is CSKED.


                         ** CFSLDR - Lock directory **

     Called by:		LCKDNM in DIRECT (via CALLRET)

     First,  a check is made to see if the resource already exists. If it does,
and it is in "use" (HSHWVT or  HSHCNT  are  set),  then  a  vote  is  required.
Otherwise,  we  can  lock  the  directory  and  the following is updated in the
resource block:

	HSFLAG		HSHCNT incremented (owned)
	HSHFRK		FORKX of running fork
	HSHTIM		TODCLK stamp to indicate when resource acquired
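
     The local fast path just described can be sketched as follows. This is an
illustrative Python model only: the dictionary stands in for the resource
block, and the function name and its boolean result are assumptions.

```python
def lock_directory_locally(block, forkx, todclk):
    """Try to take the directory lock without a vote.

    Returns True when the lock was taken locally and False when a vote is
    required (no block on this node, or the block is in use).
    """
    if block is None:
        return False
    if block["HSHWVT"] or block["HSHCNT"] > 0:
        return False
    block["HSHCNT"] += 1        # owned
    block["HSHFRK"] = forkx     # FORKX of the running fork
    block["HSHTIM"] = todclk    # time the resource was acquired
    return True
```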

     If the resource is not known to this node, or the current resource block
for the directory is in use, then a vote is required. First, CFSSPL is called
to obtain a long resource block. CFSSPL will return a block with HSHLOS set in
HSFLAG, with HSHPST and HSHOKV zeroed, and with HSHRET set to 1,,LNGADD. The
following is then placed in the block:

	HSHROT		six-bit structure name
	HSHQAL		DRBASE + directory number
	HSFLAG		HSHTYP=.HTOEX
	HSHCOD		DRBASE
	HSHFCT		TODCLK+DIRTIM

     Finally,  CFSGTT  is called to get the token with "retry until successful"
set so, upon return,  we  will  have  acquired  the  resource.  (Note,  if  the
structure  is  set  local  exclusive, CFSGTT will discover this and use CFSGTL.
This will mean that HSHVTP will not be updated with  the  access  of  the  vote
since  no  vote  is  required.)  The following has been updated in the resource
block:

	HSFLAG		HSHVTP=access of vote,  HSHCNT incremented (owned) 
	HSHFRK		FORKX of running fork
	HSHTIM		TODCLK stamp when vote approved and token obtained

     If, upon return from CFSGTT, we do not need the newly created block (T1 is
non-zero), LNGADD will be called to return the extra block to the CFS pool.

                        ** CFSRDR - Unlock directory **

     Called by:		ULKDNM in DIRECT (via CALLRET)

     The  routine  CFSRDR  simply  calls  CFSNDO  to release the directory lock
token. For these long blocks, CFSNDO will do something extra before the call to
CFSOHS; it will check the HSHBTT for any waiting forks to wake up. If it  finds
one,  CFSOHS will not be called to notify remotes of the token release. In this
way, the forks on the local node will have priority over forks on remote  nodes
for  acquiring  directory  locks.  See  the  discussion  of CFSAWT/CFSAWP for a
description of CFSNDO.
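
     The local-priority rule for directory locks can be sketched as below. The
function name is hypothetical, the list of waiting forks stands in for
whatever HSHBTT actually records, and waking the first waiter in the list is
an assumption made for the example.

```python
def release_directory_lock(local_waiters):
    """Return (fork_to_wake, notify_remotes) for a released directory lock.

    If a local fork is waiting, it is woken and remote nodes are not told
    about the release; otherwise the remotes are notified.
    """
    if local_waiters:
        return local_waiters.pop(0), False
    return None, True
```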

                           Directory Allocation Token
			   --------------------------

CFS routines:   CFSDAU 	      - Acquire allocation entry
		CFAFND/CFAGET - Find/Get allocation table
		CFASTO 	      - Store new allocation value
		CFAULK 	      - Unlock allocation entry
		CFAREM 	      - Remove allocation entry
		CFAUPB 	      - Undo keep here bit
		CFAVOK 	      - Vote to be approved	(callback)
		CFADAR 	      - Optional data present	(callback)
		CFARMV 	      - Voter remove entry	(callback)

     The  remaining  allocation  of  a  directory is cached in memory to aid in
processing page faults and file page creation. For each active directory, there
is an allocation entry and this entry is a CFS resource. The optional data in
the resource block carries the remaining allocation for this directory.



                    ** CFSDAU - Acquire allocation entry **

     Called by:		Various routines in PAGEM and PAGUTL.
			Each call provides one of the following function
			codes:

			.CFAGT==:0   -   Get and lock current allocation
			.CFAST==:1   -	 Store allocation
			.CFARL==:2   -	 Release allocation table
			.CFARM==:3   -	 Remove entry
			.CFAUP==:4   -	 Undo hold bit
			.CFAFD==:5   -	 Find it

     This  routine  is  called  and a function code is provided. Then, based on
this code, we dispatch off to the correct routine.

		Function code		Dispatch routine
		-------------		----------------
		   .CFAGT		    CFAGET
		   .CFAST		    CFASTO
		   .CFARL		    CFAULK  
		   .CFARM		    CFAREM  
		   .CFAUP		    CFAUPB  
		   .CFAFD		    CFAFND
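
     The dispatch above amounts to a table lookup on the function code. A
minimal Python sketch follows; the function code values are the ones listed
above, while the Python names and the error raised for an unknown code are
assumptions.

```python
# Function codes as listed for CFSDAU (.CFAGT through .CFAFD).
CFAGT, CFAST, CFARL, CFARM, CFAUP, CFAFD = range(6)

DISPATCH = {
    CFAGT: "CFAGET",   # get and lock current allocation
    CFAST: "CFASTO",   # store allocation
    CFARL: "CFAULK",   # release allocation table
    CFARM: "CFAREM",   # remove entry
    CFAUP: "CFAUPB",   # undo hold bit
    CFAFD: "CFAFND",   # find it
}

def cfsdau_dispatch(function_code):
    """Return the name of the routine that handles a CFSDAU function code."""
    try:
        return DISPATCH[function_code]
    except KeyError:
        raise ValueError("unknown CFSDAU function code: %r" % (function_code,))
```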

                ** CFAFND/CFAGET - Find/Get allocation table **

     Called by:		QLOK 	      in PAGEM
			ASGALC 	      in PAGUTL (with CF%PRM set)
			REMALC 	      in PAGUTL
			GETCAL/GETCAH in PAGUTL (with CF%NUL and CF%HLD set)

     These  routines are used to lock the allocation table. CFAFND differs from
CFAGET only in that it will not create a  new  resource  block;  it  will  only
succeed  if  the  block already exists and can be found via the routine GETDBK.
Currently, CFAFND is never dispatched to. CFAGET will return +1  if  a  resched
took place during its operation and +2 if one did not.

     Upon entry to CFAGET, the access is determined. If CF%HLD was specified in
the  flag  bits  (the  left  half  of  T3, the Flags,,Operation word), then the
requested access is exclusive (.HTOEX).  Otherwise,  the  access  is  for  full
sharing (.HTOAD). GETDBK is called to obtain the address of the resource block.
If one does not exist, we continue at CFSDAO.

     At  CFSDAO, we will create a new resource block and vote for access to the
token. CFSSPC is called to get a short request  block.  CFSSPC  will  return  a
block  with  HSFLAG,  HSHPST,  and  HSHOKV  zeroed  as  well  as  HSHRET set to
1,,SHTADD. Then, the following is placed in the block:

	HSHROT		six-bit structure name
	HSHQAL		DRBAS0+directory number
	HSFLAG		HSHTYP=access, HSHKPH set if CF%PRM specified
	HSHPST		1,,CFARMV
	HSHOKV		1,,CFAVOK
	HSHCDA		1,,CFADAR
	HSHOPT		0
	HSHOP1		0 (transaction number)

     Finally,  CFSGTT is called to get the token (with "retry until successful"
set). (Note, if the structure is set local exclusive, CFSGTT will discover this
and use CFSGTL. This will mean that HSHVTP will not be updated with the  access
of  the  vote  since  no  vote  is  required.)  When the token is acquired, the
following has been updated in the resource block:

	HSFLAG		HSHVTP=access of vote,  HSHCNT incremented (owned) 
	HSHFRK		FORKX of running fork
	HSHTIM		TODCLK stamp when vote approved and token obtained

     If, upon return from CFSGTT, we do not need the newly created block (T1 is
non-zero), SHTADD will be called to return the extra block to the CFS pool.

     Next,  HSHLOK  is  called  to  retrieve  the resource block and the access
desired in the call to CFSDAU is checked. If it is not CF%HLD, then we  do  not
want write access so we "unown" the resource by decrementing HSHCNT.

     Finally, HSHOP1 is checked to see if optional data was returned in the
voting process. If it is non-zero, then optional data was returned (the
current allocation) and it is in HSHOPT. We return +1 from CFAGET (and CFSDAU)
now with T1 containing the current allocation, T2 the resource block address,
and T3 the transaction number. If HSHOP1 is zero, then no optional data was
returned. This means that no other node had more recent information than ours
to contribute, so we must establish ourselves as the node that holds the latest
information. The value of the allocation (passed into CFSDAU in T3 and held in
ALLC) is placed in HSHOPT, and HSHOP1 (the transaction number) is incremented
only if this is not a temporary entry (CF%NUL was not specified in the flag
bits passed into CFSDAU). We return +1 with the allocation in T1 and resource
block address in T2.
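
     The handling of the optional data after a new allocation token is acquired
can be sketched as follows. This is a hedged Python model: the dictionary
stands in for the resource block, and the function name and its tuple result
are assumptions made for illustration.

```python
def settle_allocation(block, caller_alloc, temporary):
    """Return (allocation, transaction number) for a newly acquired token.

    If a voter supplied optional data (HSHOP1 non-zero), that data is used.
    Otherwise the caller's value becomes the current allocation, and the
    transaction number is bumped unless the entry is temporary (CF%NUL).
    """
    if block["HSHOP1"] != 0:
        return block["HSHOPT"], block["HSHOP1"]
    block["HSHOPT"] = caller_alloc
    if not temporary:
        block["HSHOP1"] += 1
    return block["HSHOPT"], block["HSHOP1"]
```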

     The  description  above  outlines  what  happens when CFAGET is invoked to
return the allocation for a resource which did not exist before on  this  node.
However,  if  one  already  exists  (and is found by GETDBK) then the following
takes place.

     First,  we  check  the access bits set in T2 to see if this is a permanent
block (CF%PRM set) and, if so, HSHKPH is set in the HSFLAG  word.  A  check  is
then  made to see if anyone is using this block by checking HSHTWF, HSHWVT, and
HSHUGD. If any of these bits are set, then we wait a while at  routine  CFGVOT.
This wait routine will wake up when HSHTWF, HSHWVT, and HSHUGD are all zero.
Upon wakeup, we try again at the top of the CFAGET routine.

     If  no  one is waiting for this block, we check to see what kind of access
we have to this resource (in HSHTYP). If it matches the type of access  we  are
requesting,  HSHCNT  is incremented. Otherwise, our access must be upgraded. We
set HSHTWF to indicate we are waiting  for  this  token,  and  call  CFSUGD  to
upgrade  our  access.  If  CFSUGD allows the upgrade, we will have obtained the
desired access to the resource and HSHTWF will be  cleared.  Otherwise,  CFSUGD
has  denied  our  upgrade  attempt and we must wait. If HSHWTM is non-zero (the
wait time for when to try again) then this is placed in HSHTIM.  If  HSHWTM  is
zero,  then  TODCLK+^D500  is  placed  in  HSHTIM. Then we wait at CFSRWT. Upon
return from CFSRWT (see the discussion of file access token for the  conditions
of  return  for CFSRWT), we try to get the token again by continuing at the top
of CFAGET.
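
     One pass over an existing block, as described above, can be sketched as
below. This is an illustrative Python model under several assumptions: the
dictionary stands in for the resource block, upgrade_granted stands in for the
result of CFSUGD, and the names are hypothetical. The text does not say
whether HSHTWF stays set after a denied upgrade; this sketch leaves it set.

```python
DEFAULT_RETRY_TICKS = 500  # the ^D500 added to TODCLK in the text

def upgrade_or_wait(block, wanted, upgrade_granted, todclk):
    """Return "acquired" or "wait" for one attempt on an existing block."""
    if block["HSHTYP"] == wanted:
        block["HSHCNT"] += 1           # same access: just own it again
        return "acquired"
    block["HSHTWF"] = True             # we are waiting for this token
    if upgrade_granted:
        block["HSHTYP"] = wanted
        block["HSHTWF"] = False
        return "acquired"
    # Denied: choose when to try again, then wait (CFSRWT in the text).
    if block["HSHWTM"] != 0:
        block["HSHTIM"] = block["HSHWTM"]
    else:
        block["HSHTIM"] = todclk + DEFAULT_RETRY_TICKS
    return "wait"
```

     A caller that receives "wait" would retry from the top, which mirrors the
"continue at the top of CFAGET" step in the description.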

     Once  we have acquired the desired access to the token, we check to see if
we wanted write access (CF%HLD set). If not, HSHCNT is decremented  to  "unown"
the  resource. Next, our fork number is placed in HSHFRK. Finally, T1 is loaded
with the allocation (HSHOPT), T3 with the transaction number (HSHOP1),  and  T2
with  the  resource  block address. We will return +1 if we ever had to enter a
wait routine or if we needed to upgrade our access. Otherwise, we  will  return
+2.

                   ** CFASTO - Store new allocation value **

     Called by:		QSET   in PAGEM
			ADJALC in PAGUTL

     This routine is used to place a new allocation value into the resource
block for a particular directory allocation token. Once the value has been
stored, the resource will be released. So, a store of an allocation entry has
an implied release following it.

     The  block  is  located  via  a  call to GETDBK. Then the allocation value
(which is passed in and resides in ALLC) is placed in HSHOPT.  The  transaction
number  is  incremented  (HSHOP1) to insure that this is the most current entry
known. Finally, we  fall  through  into  CFAULK  to  release  the  resource.  A
description of CFAULK follows.


                     ** CFAULK - Unlock allocation entry **

     Called by:		QREL          in PAGEM
			GETCAL/GETCAH in PAGUTL

     The routine CFAULK simply calls CFSNDO to release the directory allocation
token. See the discussion of CFSAWT/CFSAWP for a description of CFSNDO.


                     ** CFAREM - Remove allocation entry **

     Called by:		REMALC        in PAGUTL
			GETCAL/GETCAH in PAGUTL

     This routine removes a resource from the hash table. The resource block is
located  via  a  call  to  GETDBK  and  HSHCNT is decremented. If the number of
sharers goes to zero, then no one is using this resource and CFSRMV is called
to remove it from the hash table. CFSRMV will not post the removal.


                       ** CFAUPB - Undo keep here bit **

     Called by:		DASALC in PAGUTL

     This  routine  is  used to "unlock" a resource from the node. The resource
block is found via a call to  GETDBK  and  the  "keep  here"  bit  (HSHKPH)  is
cleared.  (This  will  allow  the  routine CFARMV to signal to CFSRMX that this
resource block should not be held on the node and can be removed from the  hash
table.  CFSRMX is called from CFSRTV when an incoming vote results in releasing
the resource. If HSHKPH is set when CFARMV is called, the  resource  becomes  a
place-holder  since  it was desired that the block always remain on this node.)
Finally, CFSNDO is called to undeclare the  resource.  See  the  discussion  of
CFSAWT/CFSAWP for a description of CFSNDO.

                 ** CFAVOK - Vote to be approved (callback) **

     Called by:		CFSRTV when vote is to be approved

     This  routine is placed in HSHOKV when the allocation token is formed. The
routine is called when a vote is to be approved for this resource in  order  to
place  this node's optional data (the allocation data held by this node) in the
vote packet. The "optional data present in vote" flag (CFODA)  is  set  in  the
vote  packet so that the optional data will be noticed by CFSRVT when this vote
packet is received.


                ** CFADAR - Optional data present (callback) **

     Called by:		CFSRVT when a vote packet arrives with optional data

     This  routine is placed in HSHCDA when the allocation token is formed. The
routine is called when the vote packet arrives in order to process the optional
data in the packet.

     The  transaction  number  present  in the packet is compared to our own in
HSHOP1. If the packet's value is greater than ours, we return +2 to CFSRVT  and
store  the  data  and transaction number in HSHOPT and HSHOP1 and set HSHODA to
indicate optional data is present. (This will be  noticed  by  CFAGET  and  the
allocation  will  be  returned to the caller.) If the transaction number in the
packet is not greater than ours, we return to CFSRVT and  ignore  the  optional
data.


                  ** CFARMV - Voter remove entry (callback) **

     Called by:		CFSRTV when we want to release the resource
			(Note:  CFSRSE has the ability to call a post
				routine but it should never be called 
				for an allocation token since HSHKPH
				will be set and this will prevent 
				the block's removal.)

     This  routine is the callback routine placed in HSHPST when the allocation
token is formed. To post removals, CFSRMX is called. CFSRMX  is  the  alternate
entry  point  to CFSRMV used to do posting of removals. CFSRMX will insure that
CFARMV is called to do any cleanup that is needed before the resource block  is
removed from the hash table.

     Basically,  CFARMV checks HSHKPH to see if the block should be kept on the
node. If not, CFARMV will return +1 and CFSRMX will remove the  resource  block
from  the  hash  table  and  return  indicating the resource is unconditionally
available. Otherwise, the block is to be kept on the system, so CFARMV zeroes
HSHTYP (this sets the state to place-holder; .HTPLH) and returns +2 to CFSRMX.
CFSRMX will then zero HSHCNT and HSHTYP and return to CFSRTV with 0 in T1. This value
will be used as  the  vote  type  (.CFTYP)  to  the  other  node;  0  indicates
unconditional  yes.  (Note  that  CFARMV assumes T1 contains the address of the
vote packet and is, therefore, not 0. So, before returning to CFSRMX, T1 is not
changed. This is important because CFSRMX will check T1 and,  if  it  is  zero,
assumes  that  this is a delayed yes return for a file access token. So, CFARMV
is depending upon the fact that T1 is non-zero when returning to  CFSRMX.  This
should never change.)
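
     The interaction between CFARMV and CFSRMX for an allocation token can be
sketched as follows. This is a simplified Python model, not the monitor code:
the hash table is a dictionary keyed by a hypothetical name, the +1/+2 skip
returns are modeled as the integers 1 and 2, and the T1 convention discussed
above is not modeled.

```python
HTPLH = 0  # place-holder; the text says zeroing HSHTYP gives .HTPLH

def cfarmv(block):
    """Return 1 if the block may be removed, 2 if it must stay on the node."""
    if not block["HSHKPH"]:
        return 1
    block["HSHTYP"] = HTPLH
    return 2

def cfsrmx(hash_table, key):
    """Post the removal and return the vote type sent to the other node."""
    block = hash_table[key]
    if cfarmv(block) == 1:
        del hash_table[key]        # block leaves the hash table
    else:
        block["HSHCNT"] = 0        # kept here as a place-holder
        block["HSHTYP"] = HTPLH
    return 0                       # 0 indicates unconditional yes
```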

3.   Structures  -  Structure mounting is managed by CFS in order to coordinate
access to the structure by various  CFS  processors.  CFS  requires  that  each
mounted  structure be mounted with the same access by all accessing processors,
and that the structure  have  the  same  "alias"  name  on  all  the  accessing
processors.   This   is  accomplished  by  providing  2  resources  to  control
structures: structure name and drive serial number resources.

     Below  is the table of contents fragment from CFSSRV which illustrates the
various structure related routines:

   19. Structure resource manager
        19.1.   CFSSMT (Acquire structure resource) . . . . .  105
        19.2.   CFMNAM (Register structure name). . . . . . .  107
        19.3.   CFMDSN (Register drive serial number) . . . .  108
        19.4.   CFSSUG (Upgrade or downgrade mount) . . . . .  109
        19.5.   CFSSDM (Release mount resource) . . . . . . .  110
        19.6.   STRVER (Structure verify) . . . . . . . . . .  111

               Structure Name Token and Drive Serial Number Token
	       --------------------------------------------------

CFS routines:	CFSSMT - Acquire structure resource
		CFSSUG - Upgrade or downgrade mount
		CFSSDM - Release mount resource
		CFMNAM - Register structure name
		CFMDSN - Register drive serial number

     In  order  to  mount  a structure, both the structure name token and drive
serial number token must be acquired. The access type (whether the structure is
accessed shared or exclusive) is controlled  by  the  DSN  resource  only.  The
structure  name  token  is always created with full sharing. When a structure's
access type is changed,  only  the  DSN  resource  needs  to  be  updated.  For
multi-pack  structures,  only  the first unit is described by a resource block.
Since CFS matches a structure with a DSN, it is important that if the structure
is moved  to  another  drive  that  the  CFS  resources  be  renamed.  This  is
accomplished by having PHYSIO call CFS at CFRDSN describing the old and new UDB
for  the disk pack. Also, with the new HDA disks, it is possible for the DSN to
change during timesharing. If this occurs, PHYSIO will call CFSDSN so that  the
DSN resource can be renamed.


                   ** CFSSMT - Acquire structure resource **

     Called by:		MNTPS  in DSKALC  (for exclusive and shared)
			MSTMNT in MSTR	  (access based on user flags)

     This routine is called when a structure is first mounted on a system. It,
in turn, calls routines CFMNAM to register the structure name and CFMDSN to
register the drive serial number. CFSSMT ensures that the alias for the
structure is not already in use for another structure and that the name of  the
structure  is  the  same  as  what is in use by any other CFS system. These two
conditions are sufficient to allow the alias to be used as the root name of the
structure.

     Upon  entry  to  CFSSMT, a check is made to see if this is a "reduced" CFS
system. This is controlled by defining the switch CFSDUM. If this  is  defined,
then  this is a "reduced" CFS. This system is on the CI and uses SCA to connect
to other CI-based systems. However, this processor will  not  share  structures
with  any  other  system  but  will  insure that the structures it is using are
mutually exclusive from structures used by any other CI-based  TOPS-20  system.
This  implies  that  this system will establish connections to other reduced or
full CFS systems and will participate in structure mounting votes.

     If this is a reduced system (MYPOR1 is not less than zero), then the
access is forced to be exclusive (.HTOEX). Otherwise, the access is determined
from the call (passed in T2): if T2 is zero, the access is full sharing
(.HTOAD); otherwise it is exclusive (.HTOEX).
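
     The choice of mount access can be written as a small function. This is a
sketch only: the function name and the boolean reduced_cfs argument (standing
in for the reduced-system test) are assumptions, and the access types are
represented as strings.

```python
def mount_access(reduced_cfs, t2):
    """Return the access type CFSSMT would request for the mount."""
    if reduced_cfs:
        return "HTOEX"                       # reduced CFS: always exclusive
    return "HTOAD" if t2 == 0 else "HTOEX"   # T2 zero means full sharing
```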

     Finally,  CFMNAM  is  called  to register the name and CFMDSN is called to
register the serial number. If the call to  CFMNAM  fails,  we  RETBAD  and  if
CFMDSN fails, we continue at CFSSDM to undo the mount and then return failure.

     At  CFMNAM, we will create a new resource block and vote for access to the
structure name token. CFSSPC is called to get a  short  request  block.  CFSSPC
will  return  a  block with HSFLAG, HSHPST, and HSHOKV zeroed as well as HSHRET
set to 1,,SHTADD. Then, the following is placed in the block:

	HSHROT		six-bit structure name or alias
	HSHQAL		STRCTN
	HSFLAG		HSHTYP=.HTOAD, HSHAVT set, HSVUC set
	HSHCOD		UDBDSN XOR (STRCTK+UDBDSH)

     Finally,  CFSGET is called to get the token (with "try only once" set). If
the token is acquired, the following has been updated in the resource block:

	HSFLAG		HSHVTP=access of vote,  HSHCNT incremented (owned) 
	HSHFRK		FORKX of running fork
	HSHTIM		TODCLK stamp when vote approved and token obtained

     If, upon return from CFSGET, we do not need the newly created block (T1 is
non-zero), SHTADD will be called to return the extra block to the CFS pool.

     At  CFMDSN, we will create a new resource block and vote for access to the
drive serial number token. CFSSPC is called  to  get  a  short  request  block.
CFSSPC  will  return  a block with HSFLAG, HSHPST, and HSHOKV zeroed as well as
HSHRET set to 1,,SHTADD. Then, the following is placed in the block:

	HSHROT		UDBDSN (swapped)
	HSHQAL		STRCTK+UDBDSH
	HSFLAG		HSHTYP=access, HSHAVT set, HSVUC set
	HSHCOD		six-bit structure name or alias

     Finally,  CFSGET is called to get the token (with "try only once" set). If
the token is acquired, the following has been updated in the resource block:

	HSFLAG		HSHVTP=access of vote,  HSHCNT incremented (owned) 
	HSHFRK		FORKX of running fork
	HSHTIM		TODCLK stamp when vote approved and token obtained

     If, upon return from CFSGET, we do not need the newly created block (T1 is
non-zero), SHTADD will be called to return the extra block to the CFS pool.

     If  we just got the token exclusively, the STEXL flag is set in the status
bits of the SDB for that structure.

     If  we  were  unable  to  acquire  either of the structure resources, then
CFSGET will return failure and T2 will contain the reason for the failure. This
denial reason code is formed on a node which is responding  "NO"  to  our  vote
request.  This  code  is placed in the vote message buffer, extracted by CFSRVT
when the vote response arrives, and then  placed  in  HSHDRC  in  the  resource
block. Before CFSGET releases the block, it retrieves the code and places it in
T2.  (If  the failure was due to a conflict on the voting node, and no vote was
sent out, T2 will be set to -1.) This code will be interpreted and a meaningful
TOPS-20 error code will be passed back.

                   ** CFSSUG - Upgrade or downgrade mount **

     Called by:		MSTCSM in MSTR

     This  routine is used to change the access to a mount resource. It is only
valid for full CFS systems. The new access is  passed  in  T2  (T2  is  zero  =
.HTOAD, otherwise .HTOEX). Then the resource is located via a call to HSHLOK and
CFSUGA is called to upgrade (or downgrade) to the desired access.

     If  we  were  unable to upgrade our access to the structure resource, then
CFSUGA will return failure and T2 will contain the reason for the failure. This
denial reason code is formed on a node which is responding  "NO"  to  our  vote
request.  This  code  is placed in the vote message buffer, extracted by CFSRVT
when the vote response arrives, and then  placed  in  HSHDRC  in  the  resource
block.  Before  CFSUGA  returns, it retrieves the code and places it in T2. (If
the failure was due to a conflict on the voting node, and no vote was sent out,
T2 will be set to -1.) This code will be interpreted and a  meaningful  TOPS-20
error code will be passed back.


                     ** CFSSDM - Release mount resource **

     Called by:		MNTER4 in MSTR
			MSTDIS in MSTR

     This  routine is called to release the mount resource upon a dismount. For
both the structure name token and the drive serial number token, the resource
block  is  found  via  a  call to HSHLOK and then removed via a call to CFSRMV.
CFSRMV will not post the removal.

4.   BAT  Block Lock - The BAT block lock on a structure is now a CFS resource.
It is an exclusive resource.

     Below  is the table of contents fragment from CFSSRV which illustrates the
various BAT block related routines:

   17. BAT block resource manager
        17.1.   CFGBBS (Set BAT block lock) . . . . . . . . .   89
        17.2.   CFFBBS (Release BAT block lock) . . . . . . .   90

                              BAT Block Lock Token
			      --------------------

CFS routines:	CFGBBS - Set BAT block lock
		CFFBBS - Release BAT block lock


                       ** CFGBBS - Set BAT block lock **

     Called by: 	LKBAT in DSKALC

     Upon  entry  to CFGBBS, CFSSPC is called to obtain a short resource block.
CFSSPC will return a block with HSFLAG, HSHPST, and HSHOKV zeroed  as  well  as
HSHRET set to 1,,SHTADD. The following is then placed in the block:

	HSHROT		six-bit structure name
	HSHQAL		-1
	HSFLAG		HSHTYP=.HTOEX

     Next,  a  call  is  made  to  get  the  resource.  CFSGET is called if the
structure is shared. If the structure is set exclusive, CFSGTL is  used.  (Note
that  if  CFSGTL  is  called, HSHVTP will not be updated with the access of the
vote since no vote is required.) The call is made to "retry  until  successful"
so,  upon  the  return,  we  have acquired the resource. The following has been
updated in the resource block:

	HSFLAG		HSHVTP=access of vote,  HSHCNT incremented (owned) 
	HSHFRK		FORKX of running fork
	HSHTIM		TODCLK stamp when vote approved and token obtained

     If, upon return from CFSGET, we do not need the newly created block (T1 is
non-zero), SHTADD will be called to return the extra block to the CFS pool.
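
     The  steps  above  can  be sketched as follows.  This is a minimal Python
model under stated assumptions: the Block class, the table, the free_pool list,
and  the vote callback are all hypothetical stand-ins, and contention between
forks on the same system is not modeled.  The field comments name  the  memo's
fields that each attribute is meant to represent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

HTOEX = "exclusive"  # stands in for .HTOEX


@dataclass
class Block:
    root: str        # HSHROT: six-bit structure name
    qualifier: int   # HSHQAL: -1 for the BAT block lock
    access: str      # HSHTYP
    count: int = 0   # HSHCNT: incremented when the resource is owned


def get_bat_lock(table: Dict[Tuple[str, int], Block], free_pool: List[Block],
                 structure: str, shared: bool,
                 vote: Callable[[Block], bool]) -> Block:
    """Acquire the BAT block lock for `structure`, retrying until it succeeds."""
    new = Block(root=structure, qualifier=-1, access=HTOEX)
    block = table.setdefault((structure, -1), new)
    # A shared structure needs approval from the other systems (CFSGET);
    # a structure set exclusive needs no vote (CFSGTL).
    while shared and not vote(block):
        pass  # "retry until successful"
    block.count += 1
    if block is not new:
        free_pool.append(new)  # the extra block was not needed; return it
    return block
```

     The  last  test  mirrors  the  T1 non-zero case: a resource block already
existed, so the freshly built one goes back to the pool.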


                     ** CFFBBS - Release BAT block lock **

     Called by:		ULKBAT in DSKALC

     The  routine  CFFBBS  simply  calls  CFSNDO  to release the BAT block lock
token. See the discussion of CFSAWT/CFSAWP for a description of CFSNDO.

									Page 27
5.   ENQ/DEQ - In order to allow ENQ/DEQ to operate in a CFS environment, there
is  a  temporary CFS resource representing the ENQ on a file. Note that this is
not the same as global ENQ/DEQ across all the systems  in  a  CFS  environment.
This is always an exclusive resource.

     Below  is the table of contents fragment from CFSSRV which illustrates the
various ENQ related routines:

   20. File enqueue resource manager
        20.1.   CFSENQ (Get ENQ resource) . . . . . . . . . .  112
        20.2.   CFSDEQ (Release ENQ resource) . . . . . . . .  113

									Page 28
                                   ENQ Token
				   ---------

CFS routines:	CFSENQ - Get ENQ resource
		CFSDEQ - Release ENQ resource

     Each  time  an  ENQ file resource is first requested (an ENQ lock block is
created), CFS is called to register an exclusive ENQ resource for the file.  If
the  requesting  processor  succeeds in creating the ENQ resource, then the ENQ
will be allowed. Otherwise, the ENQ is denied.


                        ** CFSENQ - Get ENQ resource **

     Called by:		CFETST in ENQ

     Upon entry to CFSENQ, ENQSET is called to set up T1 and T2 with the proper
root  and  qualifier.  Then  CFSSPC is called to obtain a short resource block.
CFSSPC will return a block with HSFLAG, HSHPST, and HSHOKV zeroed  as  well  as
HSHRET set to 1,,SHTADD. The following is then placed in the block:

	HSHROT		six-bit structure name
	HSHQAL		FILEEQ+index block address
	HSFLAG		HSHTYP=.HTOEX, HSHLCL is set

     Next,  a  call  is  made  to  get  the  resource.  CFSGET is called if the
structure is shared. If the structure is set exclusive, CFSGTL is  used.  (Note
that  if  CFSGTL  is  called, HSHVTP will not be updated with the access of the
vote since no vote is required.) The call is made to "try  only  once".  If  we
acquire the resource, the following has been updated in the resource block:

	HSFLAG		HSHVTP=access of vote,  HSHCNT incremented (owned) 
	HSHFRK		FORKX of running fork
	HSHTIM		TODCLK stamp when vote approved and token obtained

     If, upon return from CFSGET, we do not need the newly created block (T1 is
non-zero), SHTADD will be called to return the extra block to the CFS pool.
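
     The  "try  only  once"  behaviour can be sketched as follows.  This is a
hedged Python model; the function name, the owned dictionary,  and  the  vote
callback  are hypothetical, and the qualifier is passed in as a plain integer
rather than being built from FILEEQ and the index block address.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[str, int]  # (structure name, qualifier)


def enq_request(owned: Dict[Key, int], structure: str, qualifier: int,
                shared: bool, vote: Callable[[Key], bool]) -> bool:
    """Try once to register the exclusive ENQ resource for a file.

    Returns True if the ENQ may proceed, False if it must be denied.
    """
    key = (structure, qualifier)
    # Only a shared structure requires a vote; a single refusal is final.
    if shared and not vote(key):
        return False
    owned[key] = owned.get(key, 0) + 1  # models the HSHCNT increment
    return True
```

     A  False  return  corresponds  to the ENQ being denied; nothing is left
behind in the table in that case.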


                      ** CFSDEQ - Release ENQ resource **

     Called by:		CRELOK in ENQ
			LOKREL in ENQ

     The  routine  CFSDEQ simply calls CFSNDO to release the ENQ token. See the
discussion of CFSAWT/CFSAWP for a description of CFSNDO.

									Page 29
                                   Appendix A

                 CFS File Access Token State Transition Diagram
		 ----------------------------------------------


                                (Grant request)
				  /#\	   |
		        Any REMOTE #	   |
			  request  #	   |
                                   #      \|/
			   *************************
      LOCAL read reference * Place-holder (.HTPLH) * LOCAL write reference
               ------------*           or	   *------------
               |           *   Not locally known   *	       |       
	       |	   *************************	       |       
     --------->|			  -------------------->|<---------
     |	       |			  |                    |	 |
     | Denied \|/			  |		      \|/ Denied |
     <-------[VOTE]			  |		    [VOTE]------->
	       |			  |		       |
      	       | Granted		  |		       | Granted
	      \|/			  |		      \|/     
      ----------------------		  |	     -----------------------
      | Update OFN access  |		  |	     | Update OFN access   |
      | to "read" (.SPSRD) |		  |	     | to "write" (.SPSWR) |
      ----------------------		  |	     -----------------------
	       |			  |		       |
	       |			  |		       |
	      \|/	      LOCAL write |		      \|/     
     ************************  reference  |	   *************************
 --->* Read Access (.HTOAD) *------------->	   * Write Access (.HTOEX) *<---
 |   ************************			   *************************   |
 |	       #  #					       #       	       |   
 | REMOTE read #  #					       # Any REMOTE    |   
 |   request   #  # REMOTE write			       #   request     |   
 |	       #  #   request				      \#/     	    (Deny)
 <---(Grant)<###  #				      --------------------     |   
  \		  #				      |	Is resource held | Yes |
(Deny) 		 \#/				      |	      or      	 >----->
  |	 --------------------			      |	Fairness valid?	 |
  |  Yes | Is resource held |			      --------\|/---------
  <------<       or	    |	    			       |       
         | Fairness valid?  |				       | No    
	 --------\|/---------				       |      
		  |					      \|/     
		  | No				    -----------------------
		  |				Yes | Was REMOTE request  |
		  |<--------------------------------< for "write" access? |
		  |  				    ----------\|/----------
		 \|/					       |       
     ---------------------------			       | No
     | Update modified to disk |			       | 
     |   Flush in-core pages   |			      \|/
     ---------------------------		------------------------------
		  |				|  Update modified to disk   |
		  |				| Set in-core to "read-only" |
		 \|/				------------------------------
        ----------------------				       |       
        | Update OFN access  |				      \|/      
        | to "none" (0)      |			      ----------------------
        ----------------------			      | Update OFN access  |
		  |				      |	to "read" (.SPSRD) |
		  |				      ----------------------
		 \|/				               |
	       (Grant)					      \|/      
		  |					    (Grant)   
		  |					       |
		 \|/					      \|/      
       * Place-holder access *				* Read access *
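
     The  handling  of REMOTE requests in the diagram above can be restated as
a small transition function.  This is a Python sketch of the diagram only; the
function  name,  the  reply strings, and the action strings are hypothetical,
and "held_or_fair" collapses the "Is resource held or Fairness  valid?"  test
into a single flag.  LOCAL references and the voting they require are omitted.

```python
from typing import List, Tuple

PLACEHOLDER, READ, WRITE = ".HTPLH", ".HTOAD", ".HTOEX"


def remote_request(state: str, wants_write: bool,
                   held_or_fair: bool) -> Tuple[str, str, List[str]]:
    """Return (reply, new_state, local_actions) for one remote request."""
    if state == PLACEHOLDER:
        return "grant", PLACEHOLDER, []
    if state == READ and not wants_write:
        return "grant", READ, []
    # Every remaining case would take access away from this system.
    if held_or_fair:
        return "deny", state, []
    if wants_write:
        return "grant", PLACEHOLDER, [
            "update modified pages to disk",
            "flush in-core pages",
            "set OFN access to none",
        ]
    # Remote read request against a local write token: downgrade to read.
    return "grant", READ, [
        "update modified pages to disk",
        "set in-core pages read-only",
        "set OFN access to read",
    ]
```

     Each  branch  corresponds  to  one  path  through  the lower half of the
diagram; the two action lists are the boxes passed on the way to "(Grant)".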

CFSCOD.MEM



                                CFS Code Changes

                                     to the

                                 TOPS-20 Monitor
  

                                      ---


                                David Lomartire
                             TOPS-20 Monitor Group

                                   28-Jun-85

                            Last Update:  28-Jun-85


                                      ===



                                  Introduction


     This document describes the code which has been added or changed to support
CFS in Version 6.1. Only the changes which  are  enclosed  by  the  conditional
assembly switch entitled CFSCOD are described.



                                    Contents


		o  Code changes to JSYSA.................Page 1
                o  Code changes to PHYSIO................Page 1
		o  Code changes to MSTR..................Page 1
		o  Code changes to MEXEC.................Page 2
		o  Code changes to DIRECT................Page 3
		o  Code changes to DISC..................Page 4
		o  Code changes to DSKALC................Page 6
		o  Code changes to PAGEM.................Page 8
		o  Code changes to PAGUTL................Page 12

								Page 1

                             Code Changes to JSYSA
			     ---------------------


	[]  STAD1 (.STAD)

		o  Calls CFTADB
			-  Broadcasts the time to the other cluster systems
			   when a STAD% is done.



                             Code Changes to PHYSIO
			     ----------------------


	[]  CHK18 (PHYCHK)

		o  Calls CFRDSN
			-  Informs CFS of a unit switch
			-  Called from scheduler level via CLK2CL dispatch



                              Code Changes to MSTR
			      --------------------

	[]  MSTM30 (MSTMNT)

		o  Calls CFSSMT
			-  Registers the mount with CFS


	[]  MNTER4 (MSTMNT error)

		o  Calls CFSSDM
			-  Error recovery after mount has occurred
			-  Releases the CFS mount resource during clean up


	[]  MSTD50 (MSTDIS)

		o  Calls CFSSDM
			-  Releases the CFS mount resource upon dismount


	[]  MSTCSM

		o  Calls CFSSUG
			-  New routine added to change structure access
			-  Calls CFS to set structure SHARED/EXCLUSIVE

								Page 2

                             Code Changes to MEXEC
			     ---------------------


	[]  GETSWM

		o  Calls CFSCSC
			-  Creates CFS private pages during startup


	[]  RUNDD3 (RUNDD)

		o  AOS CFSSKC after KLIPA is loaded and started
			-  Causes the scheduler to run and call CFONLT
			-  Checks the state of our wires to the star

		o  Calls FILRST, CFSJYN, CFGTJB, and MNTPS after SETSPD run
			-  Various cluster initialization procedures


	[]  RUNDI4 (RUNDD)

		o  Calls CFTADC
			-  Obtain time received by CFS, if any
			-  Cluster time preferred over front-end time
			-  Front-end time preferred over user input

		o  Calls BRDTIM
			-  Broadcast time if entered by operator at startup
			-  Higher serial numbered systems will broadcast to 
			   lower numbered ones
			-  NOTE:  Race condition exists if systems are started
				  at the same time.  Cluster time may not
			          agree if lower numbered systems start after
			     	  higher numbered ones.
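
The order of preference among time sources, and the direction of the
broadcast, can be sketched as follows.  This is a Python illustration only;
both function names are hypothetical, times are plain integers, and a source
that supplied no time is represented by None.

```python
from typing import Iterable, List, Optional


def pick_startup_time(cluster: Optional[int], front_end: Optional[int],
                      operator: Optional[int]) -> Optional[int]:
    """Return the first available time in the memo's order of preference."""
    for candidate in (cluster, front_end, operator):
        if candidate is not None:
            return candidate
    return None


def broadcast_targets(my_serial: int, serials: Iterable[int]) -> List[int]:
    """Systems that would receive a broadcast from `my_serial`.

    Models "higher serial numbered systems will broadcast to lower
    numbered ones".
    """
    return sorted(s for s in serials if s < my_serial)
```

The race noted above is visible in the second function: the lowest numbered
system has no targets, so a time entered there is never pushed to the others.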


	[]  CHKR

		o  Calls CFTADC
			-  Check to see if CFS has received a new time

		o  Calls CFSJ0
			-  Checks on desire to run "background" task
			-  Currently, only STRVER will be called to run, as a
			   result of CFSJYN setting SCVER.

								Page 3

                             Code Changes to DIRECT
			     ----------------------


	[]  LCKDNM				<Never called NOSKED>

		o  All old code replaced by CALLRET CFSLDR
			-  CFS manages directory locking
			-  LCKDNM is invoked from GDIRST and SETDIR
			-  CRSKED while directory is locked


	[]  ULKDNM				<Never called NOSKED>

		o  All old code replaced by CALLRET CFSRDR
			-  CFS manages directory unlocking
			-  ULKDNM invoked from ULKDM0; which is itself
			   invoked via the ULKDIR macro or CALL USTDIR


	[]  INVIDX

		o  Calls REMALC (see PAGUTL section)
			-  Removes allocation entry when directory deleted

								Page 4

                              Code Changes to DISC
			      --------------------


	[]  NEWLFT (NEWLFP)			<Never called NOSKED>

		o  Calls to FRECFL and GETCFL (see PAGUTL section)
			-  Used to release old OFN token and acquire new
			   one on PTT when file goes long

		o  Call to CFSAWP
			-  Gets and holds access token for PTT
			   when PTT already exists

		o   Calls to FRETOK
			-  CALLRET to CFSFWT to free the access token
			   acquired above
			-  Called upon LNGFX1 error, DSKASN failure, 
			   ASLOFN failure and in exit path


	[]  FRETOK				<Never called NOSKED>

		o  CALLRET to CFSFWT
			-  Releases the write token acquired at NEWLFT


	[]  DSKCL8 (DSKCLZ)

		o  Calls CFSBEF
			-  Tell CFS to broadcast the EOF upon file close


	[]  UPDLEN/UPDFLN			<Never called NOSKED>

		o  Calls CFSAWT
			-  Obtains the write token for OFN being updated
			-  NOTE:  Why is CFSAWT used and not CFSAWP?  
			          I assume that the token is acquired in the 
			          first place so that the OFNLEN value is forced 
			          to be correct.  (Similar to code in GETLEN)
			          Yet, CFSAWT allows for the token to be given
			          up.  So, there is no guarantee that the 
			          system will have the write token when the 
			          new length is placed in OFNLEN.  Will the
				  transaction numbers force this all to work
			          itself out?  Why was the write token needed?
				  Why not just the read token?

								Page 5

	[]  GETLEN				<Never called NOSKED>

		o  Calls CFSAWT
			-  Acquires the read token for the OFN so that the
			   most current information will be placed in 
			   OFNLEN (from the voting process)


	[]  DSKREN

		o  Calls QCHK (see PAGEM section)
			-  Checks the current quota during rename

								Page 6

                             Code Changes to DSKALC
			     ----------------------


	[]  DSKAWM/DSKASW			<May be called NOSKED>

		o  Calls MAPBTF (see below)
			-  Alternate entry points to DSKASN
			-  DSKAWM does preallocation, DSKASW does not
			-  Both called only from DSKGET in PAGUTL depending
			   upon whether SPTNA set in SPTO2 and if SF%DPR is
			   set in FACTSW
			-  Both return if MAPBTF does a "resched"


	[]  MAPBTF				<Returns NOSKED upon success>

		o  Calls CFSAWT				<May be called NOSKED>
			-  Obtain write token to bit table
			-  Done only if system does not already have
			   .SPSWR access
			-  Will cause "resched" return from MAPBTF

		o  Calls XBCHK (see PAGEM section)	<May be called NOSKED>
			-  Reverify index block; done if SPTSFD set
			-  Will cause "resched" return from MAPBTF
			-  Done only if system already has .SPSWR access

		o  Calls CFSAWP				<Always called NOSKED>
			-  Obtain and hold write token to bit table
			-  Done only if system already has .SPSWR access
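
The three cases listed for MAPBTF amount to a short decision procedure,
sketched here in Python.  The function name, the parameter names, and the
returned strings are hypothetical; only the routine names and the conditions
are taken from the list above.

```python
from typing import Tuple


def map_bit_table(has_spswr: bool, sptsfd_set: bool) -> Tuple[str, str]:
    """Return (outcome, CFS call made) for one pass through MAPBTF."""
    if not has_spswr:
        # No write access yet: obtain the write token; caller must retry.
        return "resched", "CFSAWT"
    if sptsfd_set:
        # Write access held but index block needs reverifying.
        return "resched", "XBCHK"
    # Write access held and index block valid: hold the token.
    return "success", "CFSAWP"
```

A caller such as DSKAWM or DSKASW would simply return when the outcome is
"resched" and expect to be entered again.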


	[]  UMPBTB				<Returns OKSKED upon success>

		o  Calls CFSFWT				<Always called NOSKED>
			-  Called to release the write token to bit table


	[]  LKBAT				<Returns NOSKED upon success>

		o  Calls CFGBBS				<Always called NOSKED>
			-  Called to get the Bat Block lock


	[]  ULKBAT				<Returns OKSKED upon success>

		o  Calls CFFBBS		       
			-  Called to release the Bat Block lock

								Page 7

	[]  MNTBTB/WRTBTB

		o  Calls CFSSBB
			-  Called to update the resource block to reflect
			   that it is for a bit table


	[]  UPDBTB

		o  New code added which will RET if bit table OFN already
		   purged (SPTSFD set) or if system does not have .SPSWR
		   access to the OFN.  Skips the disk update of the bit table.


	[]  MNTPS

		o  Calls CFSSMT
			-  Routine called at startup to mount the PS:

								Page 8

                             Code Changes to PAGEM
			     ---------------------


	[]  SETP7A (MVPT)

		o  Calls QSET and QRUP
			-  Modifying the disk allocation during disk 
			   address assignment via call to DSKGET
			-  QRUP call on failure or resched by DSKGET


	[]  RELP4 (RELMP5)

		o  Calls CFSAWP			       	<Never called NOSKED>
			-  Obtains and holds the read token to the OFN
			-  Code goes CRSKED and OKSKED before call, 
			   NOSKED after


	[]  RELADR (RELMP5)

		o  Calls QCHKH and QSET
			-  Get and set new allocation upon "abort" unmap
			-  QCHKH always called NOSKED
			-  OKSKED done before call to QSET


	[]  RELP3 (RELMP5)

		o  Calls QCHKH and QSET
			-  Get and set new allocation when private page 
			-  OKSKED and NOINT done before both calls


	[]  GETT1A (GETTPD)

		o  New routine added to handle analysis of page or
		   page table not in core
			-  Dispatch is NTWRTK if write token needs to 
			   be modified
			-  Dispatch is GETXSM when XB must be verified
			-  Dispatch is NIC if page is not in core
			   (this is the action of the old code in this case)


	[]  GETT1B (GETTPD)

		o  Code to handle page age less than 100 in CST
			-  Dispatch is FRCWT if OFN being forced out;
			   this will cause fork to enter WTFOD
			-  Dispatch is GETXSM when XB must be verified
			-  Dispatch is TRP0 to handle each possible age value
			   (this is the action of the old code in this case)

								Page 9

	[]  GETTIC (GETTPD)

		o  Code to handle null pointer; page doesn't exist
			-  Dispatch is NTWRTX if write token is needed
			-  Dispatch is NPG otherwise
			   (this is the action of the old code in this case)


	[]  NTWRTK/NTWRTX/NTWRT0

		o  Calls CFSAWT				<Always called NOSKED>
			-  Routines added to acquire the OFN access token
			-  NTWRTK will obtain what is required (read/write)
			-  NTWRTX will always obtain the write token
			-  NTWRT0 has desired access passed into T2 (but 
			   write access will always be obtained if needed)


	[]  GETXSM

		o  Calls XBCHK
			-  Code added to cause the index block to be verified


	[]  XBCHK

		o  CALLRET to DDXBI (see PAGUTL section)
			-  Routine which will verify the index block


	[]  NWRBS

		o  Routine added to handle no write access in CST
			-  If exclusive needed and not currently held,
			   JRST NTWRT0 to obtain write token
			-  NOTE:  If OFN being forced out, code does
				  JRST NTWRT0 without setting T2. 
			-  Otherwise, code simply sets CSWRB in CST


	[]  NIC6A (NIC)

		o  Sets dispatch address to NTWRTX if CALL GETTPD 
		   provided a dispatch of NTWRTK
			-  Code to handle fault when page has an 
			   unassigned address
			-  Must dispatch out of NIC code if GETTPD
			   indicates the correct dispatch is NTWRTK

								Page 10

	[]  NIC62 (NIC6A)

		o  Calls NIC6UC
			-  Unlock the PTT if CALL QCHKHW did a resched

		o  Calls QREL
			-  Releases hold on quota if OFN is locked
			-  Goes into OFNLTK scheduler test

		o  Calls QREL and CFSAWT
			-  Done to obtain the write token when write 
			   access is required
			-  The allocation token is also released
			-  OKSKED done before both calls, NOSKED after

		o  Calls QSET and QRUP
			-  Update allocation after call to DSKGET assigns page
			-  QRUP called when DSKGET fails or rescheds


	[]  QLOK

		o  Calls CFSDAU
			-  Function .CFAGT for exclusive (CF%HLD)
			-  Gets allocation and locks entry
			-  RETs on a "resched" return from CFSDAU


	[]  QREL/QREL1

		o  Calls CFSDAU
			-  Function .CFARL
			-  Releases the allocation entry lock
			-  NOPs on a "resched" return from CFSDAU


	[]  QCHK

		o  Calls QLOK and QREL
			-  Checks the allocation, returns +1 if over, 
			   +2 if ok (with allocation in T1)
			-  Calls QLOK for read only
			-  NOPs on a "resched" return from QLOK


	[]  QCHKH

		o  Calls QLOK
			-  Complete replacement of old routine
			-  Check allocation and lock entry
			-  Calls QLOK for write access
			-  NOPs on a "resched" return from QLOK

								Page 11

	[]  QCHKHW

		o  Calls QLOK and QREL and QREL1
			-  Complete replacement of old routine
			-  Check allocation and lock entry only if the
			   lock is obtained without a "resched"
			-  Calls QREL1 on a "resched" return from QLOK
			   and QREL if over quota and not privileged


	[]  QSET

		o  Calls CFSDAU
			-  Function .CFAST
			-  Sets the allocation and releases the entry lock
			-  NOPs on a "resched" return from QLOK


	[]  QRUP

		o  Calls QLOK and QSET
			-  Routine to undo decrement of quota and 
			   release the lock
			-  NOPs on a "resched" return from QLOK
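
How the quota routines fit together can be sketched as follows.  This is a
loose Python model, not the monitor's logic: the AllocEntry class, the
assign_page helper, and the dskget callback are hypothetical, "ok" is assumed
to mean that the remaining allocation is positive, the decrement is assumed
to precede the DSKGET call, and "resched" returns are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class AllocEntry:
    remaining: int        # pages the directory may still allocate
    locked: bool = False  # models the allocation entry lock


def qlok(entry: AllocEntry) -> int:
    """Lock the entry and return the current allocation."""
    entry.locked = True
    return entry.remaining


def qrel(entry: AllocEntry) -> None:
    """Release the entry lock."""
    entry.locked = False


def qset(entry: AllocEntry, value: int) -> None:
    """Set the allocation and release the entry lock."""
    entry.remaining = value
    entry.locked = False


def qchk(entry: AllocEntry) -> Tuple[bool, int]:
    """Return (ok, allocation); the entry is left unlocked."""
    value = qlok(entry)
    qrel(entry)
    return value > 0, value


def qrup(entry: AllocEntry) -> None:
    """Undo an earlier decrement of the allocation."""
    qset(entry, qlok(entry) + 1)


def assign_page(entry: AllocEntry, dskget: Callable[[], bool]) -> bool:
    """Charge one page against the allocation, undoing it if DSKGET fails."""
    value = qlok(entry)
    if value <= 0:
        qrel(entry)
        return False
    qset(entry, value - 1)
    if not dskget():
        qrup(entry)
        return False
    return True
```

The point of the sketch is the pairing: every path that locks the entry
also releases it, either through QREL or through QSET.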


	[]  SWPIQ3 (SWPIQ1)

		o  Sets or clears CSWRB appropriately for swapped in page

								Page 12

                             Code Changes to PAGUTL
			     ----------------------


	[]  OFNTKN

		o  Calls CFSAWT				<Never called NOSKED>
			-  Routine used by bit table logic (GSTRPG)
			   to ensure that the file is accessible (read access)


	[]  DDOCFS (DDMPF)

		o  Calls to CFSFOD, CFSGOC, UPDBTB, UPDPGX, UPDPGY, 
	                    UPDOF0,SWPOUX
			-  New routine to force out pages requested by
			   another processor;  called by DDMPA
			-  CSKED done before all calls


	[]  DDXBI

		o  Calls CFSAWP				<May be called NOSKED>
			-  Routine to swap in a forced out index block
			-  Obtains the read token in order to hold access
			   to the OFN on the system
			-  CSKED done before call

		o  Calls CFSFWT
			-  Releases the read token for the OFN
			-  Call is made while still CRSKED


	[]  UPDOF0

		o  Calls CFSBOW
			-  Routine called to update an OFN to disk
			-  Informs CFS that it should broadcast
			   the OFN write - this will cause the XB to 
			   be verified on the other system if it gets
			   the token


	[]  UPDPGS/UPDPGR/UPDPGX/UPDPGY/UPDPG0
	  (regular/quick/flush/readonly/swaplow)

		o  Calls CFSAWP				<Never called NOSKED>
			-  Obtains and holds the read token for the OFN

		o  Calls CFSFWT
			-  Releases the read token acquired above

								Page 13

	[]  ASGOFW (ASNOFN)

		o  Calls CFSAWP				<Always called NOSKED>
			-  Obtains and holds the read token for an OFN
          		   which already exists on the system
			-  Always called CRSKED and NOSKED

		o  Calls CFSFWT
			-  Releases the read token for the OFN
			-  NOTE:  ULKOFN has been changed to also call
				  CFSFWT.  All of the exit points depend
				  on this.  Also, LCKOFN is not used to 
				  lock the OFN, an IORM is.  So, the code
				  does the CSKED and NOINT so that ULKOFN
				  is happy when it does its ECSKED and
			    	  OKINT.

		o  Calls UPDPGX
			-  Flush all pages of OFN if this is the first
			   real (not OF%RDU) opening
			-  OKSKED done before the call, NOSKED after


	[]  ASOF6 (ASNOFN)

		o  Calls CFSAWP and GETCFS		<Never called NOSKED>
			-  Obtains and holds the read token for the 
			   newly created OFN
			-  Acquires any other file tokens needed
			-  OKSKED, NOINT, and CSKED done before calls, 
			   NOSKED after


	[]  ASOF4/ASOF5/ASOF3 (ASNOFN)			<Always called NOSKED>

		o  Calls FRECFS
			-  Various exits from ASOF6
			-  Releases read token acquired above


	[]  ASOF6A (ASGALC)			<Always called NOSKED>

		o  Calls CFSDAU
			-  Function .CFAGT for shared and held (CF%PRM)
			-  Routine to create a new directory allocation 
			   entry
			-  NOPs on a "resched" return from CFSDAU


	[]  CHKLAC				<Always called NOSKED>

		o  Calls GETCF0 and FRECF0
			-  Checks CFS access to OFN that is not locally
			   known to this system
			-  OKSKED done before both calls

								Page 14

	[]  CHKACC				<Always called NOSKED>

		o  Calls GETCFS
			-  Done when the count of regular opens is 
			   zero in order to check the CFS access
			-  OKSKED done before call, NOSKED after

		o  Calls CFSGWL
			-  Checks for frozen writer access to an 
			   already opened file
			-  OKSKED done before call, NOSKED after


	[]  CHKOFN				<Never called NOSKED>

		o  Calls GETCF0 and FRECF0
			-  Checks for access to an OFN which is not 
			   currently on this system
			-  NOTE:  Comment says "ask for exclusive" but
				  the code actually asks for shared 
				  (THAWB).  Which is correct?


	[]  DASOFN

		o  Code added to do ECSKED if SPTLKB set
			-  Done because ASGOFW locks OFN without 
			   calling LCKOFN


	[]  DASALC

		o  Calls CFSDAU
			-  Function .CFAUP
			-  Clears the allocation entry for an OFN
			-  NOPs on a "resched" return from CFSDAU


	[]  REMALC

		o  Calls CFSDAU
			-  Functions .CFAGT and .CFARM
			-  New routine used to remove an allocation entry
			   when a directory is deleted
			-  NOPs on a "resched" return from CFSDAU


	[]  LCKOFN/LCKOFA			<May be called NOSKED>

		o  Calls CFSAWP
			-  Obtains and holds the access token to the OFN
			-  LCKOFN acquires the read token, LCKOFA the write
			-  CSKED done before call

								Page 15

	[]  ULKOFN

		o  Calls CFSFWT
			-  Releases the access token acquired above
			-  Does ECSKED after the call


	[]  RLOF00 (RELOFN)

		o  Calls QCHKH and QSET			<Always called NOSKED>
			-  Code added to set new allocation


	[]  RELOF7 (RELOFN)

		o  Calls FRECFS				<Always called NOSKED>
			-  Release all CFS file tokens for the OFN being 
			   released


	[]  RELOF6 (RELOFN)

		o  Calls CFSFWL and CFSURA
			-  Code done when file still open because share
			   count is not zero
			-  Free write lock if no more writers
			-  Downgrade to promiscuous if no more real opens


	[]  CLROFN

		o  Calls FRECFS
			-  Routine to clear OFN when swapping out page
			-  Releases all CFS file resources for the OFN


	[]  INVOFX (INVOFN)

		o  Calls FRECFS				<Always called NOSKED>
			-  Release all CFS file resources for an OFN on 
			   a dismounted structure
			-  NOTE:  A bug exists in this code.  The exit
			   	  out of INVOFX goes OKSKED but it is 
				  possible to JRST to INVOFX without 
				  going NOSKED first.


	[]  GETCFS/GETCFL/GETCF0		<Never called NOSKED>

		o  Calls CFSGFA, CFSGWL, and FRECF0
			-  New routine to acquire all needed CFS file 
			   resources for an OFN

								Page 16

	[]  FRECFS/FRECFL/FRECF0		<May be called NOSKED>

		o  All entry points may call CFSFFL and CFSFWL
			-  Release file open and frozen writer token always
			   unless OFN is OFN2XB

		o  FRECFS and FRECF0 call CFSUWT
			-  Always release the OFN access token
			-  NOTE:  CFSUWT will simulate a CFSFOD call if 
				  the OFN is marked as one requested to 
			  	  be forced out.  Is it always guaranteed 
				  that by the call to FRECFS/FRECF0, the 
				  disk for the OFN is in a consistent 
				  state and fully updated?  Any nodes
				  waiting for this OFN to be forced out
				  will be told "ok" and DDOCFS will not 
				  force out the OFN.


	[]  MRKOFN				<Never called NOSKED>

		o  Calls CFSAWP
			-  Obtains the read token for the OFN to be 
			   marked dismounted
			-  CSKED and NOINT done before call; done for 
			   the same reason as code in ASGOFW


	[]  GETCAL/GETCAH			<May be called NOSKED>

		o  Calls CFSDAU
			-  Functions CF%NUL+CF%HLD+.CFAGT,
			   .CFARL (if GETCAL), .CFARM
			-  Routine to search for the directory allocation
			-  NOPs on the "resched" returns from CFSDAU


	[]  ADJALC

		o  Calls CFSDAU				<Always called NOSKED>
			-  Function .CFAST
			-  Adjusts allocation entry value
			-  NOPs on a "resched" return from CFSDAU