MEMOS IN THIS FILE:
o CFS Functional Specification
o Global CFS Job Numbers Specification
o Structure Handling Modifications for CFS
o Dual-ported Disk Handling with CFS
o CFS Processing Description
o CFS Error Recovery Processing
o CFS Resource Handling
-------------------
Functional Specification for
TOPS20 COMMON FILE SYSTEM
1.0 PRODUCT OVERVIEW
1.1 Product Description
This project is to develop a "Common File System" for
TOPS20. The Common File System capabilities are applicable
to configurations of two or more 36-bit processors, each
with its own main memory, interconnected by a high speed bus
("CI"). The objective of the Common File System ("CFS") is
that disk structures and files within such a system are
available to jobs on all processors, regardless of the
physical connections of the disk devices.
1.1.1 Architectural Position -
CFS is a component of the "loosely-coupled systems"
architecture and is the first piece of that architecture to
be implemented. Some of the other components are:
1. DECnet/CI
2. CI-wide IPCF
3. CI-wide ENQ/DEQ
4. CI-wide GALAXY
As can be seen, the ultimate LCS product is an extensible
multi-processor system. CFS is being implemented first
because it is the most visible of the pieces and because it
provides a useful extension to TOPS-20 even without the
other LCS components.
1.1.2 Relationship To Other CI Products -
CFS is independent of the other high-level CI protocols.
That is, CFS can exist on a system that does not support
MSCP. All that is required is the SCA layer of the CI
protocol. In the following sections, mention is made of
MSCP, the MSCP server and other CI applications protocols.
These references are provided to explain the relationship of
CFS to the other committed CI products, but CFS remains
distinct and independent of them. The specifics of these
other protocols, and any limitations or restrictions, are
described in other documents.
1.2 Markets
The Common File System is a general operating system
capability and is applicable to all present DECSYSTEM-20
markets.
1.3 Competitive Analysis
This project provides more and larger configuration
alternatives than previously available. This project is not
closely related to "distributed processing" in that it is
only applicable to configurations on a CI and therefore
within the 100 meter limit of the CI.
VAX/VMS is developing a means for multiple processors to
reference files on a single disk system; however, the basic
difference in filesystem architecture between TOPS20 and VMS
makes the projects somewhat different.
Related capabilities include "Network File Access" or
other techniques for moving files among nodes of a network.
CFS is a more powerful and transparent form of file access
because it implements all monitor file primitives visible to
the user program and operates over a high speed bus.
"Multiple Processors" (implying shared memory as with
TOPS10 SMP) is a related capability. SMP is a more powerful
approach to the use of multiple processors in that it
provides greater transparency and better dynamic load
leveling. There are compensating advantages of CFS over SMP
in the area of failsoft and isolation of failures, and in
the maximum size of configurations which can be supported.
1.4 Product Audience
The principal customer for CFS is one who now has, or who
needs, multiple KL-10 processors and wishes to run them as a
single system. Since DEC is not offering a follow-on PDP-10
processor, most, if not all, of the LCG customers fit this
description.
CFS-20 is meant as a complement to DECnet services, and
in some cases sites with multiple KL10's may find that
sharing via DECnet is adequate.
2.0 PRODUCT GOALS
2.1 Performance
1. All unprivileged monitor calls which affect disk
files on present one-processor TOPS20 systems will work and
will have the intended effect on any disk structure within
the configuration.
2. The overhead associated with maintaining the common
file data base on multiple processors will cause an increase
of not more than 10% in execution time of file primitives
and operations.
3. A processor referencing files on a disk not directly
connected will incur no additional overhead in transferring
data.
We expect to use MSCP (Mass Storage Control Protocol) for
the data transfers to support file operations over the CI.
This will exist on the CI along with other protocols
supporting other functions. MSCP should achieve efficient
use of the CI, low-overhead operation of the monitors, and
high-bandwidth file interchange. File structure information
such as directories and index blocks is passed exactly as
read from disk. By passing TOPS20 file data directly, we
avoid the overhead of copying and conversion incurred with
other protocols.
However, a processor acting as a file server for another
processor will incur overhead for this activity not relating
to jobs running on it. This overhead will involve primarily
instructions executed at interrupt level and main memory
space to buffer data being transferred.
CFS supports shared-writable pages (simultaneous write
file access) on multiple processors. This is used for
various internal mechanisms (e.g. directory lookup, disk
allocation tables) as well as user program functions. This
type of access generates IO activity and overhead not
present on single-processor systems. Because data cannot
actually be referenced simultaneously by two processors, it
must be moved from one to another by the operating system.
Users will be advised of this and should arrange
applications so as to avoid frequent write references to the
same data from different processors.
Since the monitor itself uses this facility, we conducted
a study of monitor reference patterns to ensure that this
activity will not be a significant bottleneck. We recorded
monitor reference patterns to directories and disk
allocation tables under actual and simulated loads. This
was done by using the SNOOP facility to detect and record
references where the job making the reference is different
from the job which made the most recent previous reference.
This provides worst-case data on the frequency of moving
monitor data between processors. We determined that only a
few (3 or fewer) directories were referenced sufficiently
often to be of interest. These were all common system
directories (e.g. <SUBSYS>), and the frequency was not so
high as to suggest a problem. This small additional
overhead is greatly outweighed by the disk space savings of
not having to duplicate the SYS: files for each of the CFS
processors.
2.2 Environments
Minimum configuration requires two processors and a CI.
Each processor must have a connection to the CI. (The
question has been raised as to whether CFS might be operable
over an NI connection. This will not be supported in first
release. There should be no logical reason that NI couldn't
be used, but additional study and experience is necessary to
understand the performance implications. Additional
implementation work would also be needed.)
Each processor must have its own main memory, swapping
device, boot device, and console.
Each processor must have direct access to its public disk
structure. In a future release, it may be possible to
eliminate this requirement. However, there are other
requirements for a directly connected disk (e.g. swapping)
which will also have to be addressed.
The maximum configuration for first release of CFS is two
processors; however there shall be no CFS-specific software
limitation on a larger number. This limit is based on our
current knowledge of the CI and the lack of experience with
this architecture. The practical limit may be higher. A
maximum of one CI will be supported for first release.
2.3 Reliability Goals
1. A customer should be able to improve net system
availability of his configuration by use of multiple
processors and the CFS.
2. The CFS should cause no significant decrease in the
reliability of each single processor.
3. Failure of one processor will have no effect on other
processors except for file data which is in the memory of
the failing processor.
2.4 Non Goals
1. CFS will support only disks.
2. CFS is not intended to work with operating systems
other than TOPS20 or with machine architectures other than
36-bit.
3. CFS does not provide any automatic balancing of job
load among processors in a configuration as does SMP.
However, users should find it convenient to login to the
less loaded processor and/or to switch processors (by
logging out and back in again) if the load becomes
unbalanced.
4. Applications that rely on ENQ/DEQ and OF%DUD are not
supported by CFS.
5. IPCF applications will not communicate across CFS
processors.
3.0 FUNCTIONAL DEFINITION
3.1 Operational Description
Two or more processors are interconnected via a
high-speed bus ("CI") having a bandwidth at least comparable
to disk transfer rates. A disk which is to be used by a
processor must have a direct path to that processor; the
disk must be either on the CI, on a directly connected
MASSBUS, or attached to another KL-10 running an MSCP
server.
------------ --------------
! HSC ! ! HSC !
! DISK ! ! DISK !
! ! ! !
------------ --------------
!! !!
!! !!
CI =====================================/ / / =============
!! !! !!
!! !! !!
------------ -------------- -------------
! KL10 ! ! KL10 ! ! KL10 !
! CPU&MEM ! ! CPU&MEM ! ! CPU&MEM !
------------ -------------- -------------
!!
!! MASSBUS
-------------
! !
! MASSBUS !
! DISKS!
-------------
One or more logical structures exist on the set of disks.
All of these structures are visible to jobs on all of the
processors unless the system administrator specifically
declares particular structures as "exclusive" to a
particular processor.
In order to provide access to Massbus disks connected to
a KL10, the KL10 will act as a logical disk controller on
the CI for the Massbus disks. There is no visible
distinction between a disk structure directly connected to a
processor and one which is accessed via another processor.
The usual monitor calls are used to access files and
structures, and all file open modes are allowed with the
exceptions listed below. Shared file access is permitted,
and programs need not be aware that other jobs sharing a
file are on different processors; however, it may be
advisable for reasons of efficiency to avoid simultaneous
modification of a file on different processors.
File facilities specifically include:
1. File naming and lookup conventions (GTJFN) -
File names on the common file system include structure,
directory and subdirectories, file name, extension,
generation number, and attributes. Full recognition
and wild-carding is available; name stepping (GNJFN);
normal access to FDB.
2. Usual open and close modes (OPENF, CLOSF,
CLZFF).
3. Usual data transfer primitives, both sequential
and random (BIN, BOUT, SIN, SOUT, SINR, SOUTR, RIN,
ROUT, DUMPI, DUMPO).
4. File-to-process mapping (PMAP) including all
modes (shared read, copy-on-write, shared write,
unrestricted read).
5. The device type associated with files on the
common file system is the same as that presently used
for disk.
6. Privileged operations MSTR (mount structure) and
DSKOP.
The above includes all file system primitives relating to
accessing files and transferring data but does not include
other primitives which may use certain file system entities
but which are considered separate and distinct facilities
(e.g. ENQ/DEQ).
3.2 Restrictions
A file open with OF%DUD (don't update disk) on one
processor may not be opened on any other processor. This
results from the fact that processors share file data by
writing any changed files to the disk before passing control
to another processor. Since OF%DUD implies that the disk
copy of a file may not be changed until the user process
approves the change, OF%DUD cannot be supported with CFS.
Other devices such as magtapes and line printers are not
part of CFS and may not be open simultaneously on multiple
processors.
Use of simultaneous write access with active writing of
file data by jobs on different processors requires the
system to move pages among the processors and hence will be
much slower than on a single processor. The write token is
maintained on a per-OFN basis. This means that a program
requiring write access to any one or more pages must have
exclusive access to the entire OFN. Each OFN represents
256K words of the file. For large files, programs on
different processors could be executing simultaneous write
references with no delay if they were referencing data in
different 256K sections of the file.
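To make the granularity concrete, here is a small sketch of
the arithmetic (ours, in C; not monitor code): with 512-word
pages, each 256K-word file section spans 512 pages, and two
writers conflict only when their pages fall in the same
section.
    #define PGSIZ  512          /* words per TOPS-20 page              */
    #define SECPGS 512          /* pages per 256K-word section (OFN)   */

    unsigned file_section(unsigned file_page)
    {
        return file_page / SECPGS;   /* section index, 0-511           */
    }
    /* Writers on different processors are delayed only when
       file_section(pageA) == file_section(pageB).                     */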
A structure must be "mounted" on any processor which is
to access files on it. To be physically removed from a
drive, a structure must be dismounted by all processors.
The relevant Galaxy components should be modified to provide
mount information from processors other than the one on
which they are running, but this is not planned for FCS of
CFS. Hence an operator will have to query the OPR program
on each processor to find out what users have the structure
mounted. Each processor will know, however, which other
processors have the structure mounted so that the operator
can quickly determine if the structure can be removed.
Finally, it is not possible for two or more CFS
processors to establish an ENQ resource for the same file.
This restriction of ENQ is made to prevent malfunctioning of
programs that rely on ENQ as a file semaphore and will be
removed once the LCS-wide ENQ/DEQ facility is provided.
4.0 COMPATIBILITY
4.1 DEC Products
All program and user interfaces are compatible with
previous versions of TOPS20.
Mountable disk structures are compatible with previous
versions of TOPS20.
4.2 DEC Standards
The CFS will use the corporate SCA protocol on the CI bus
and will use a private SYSAP-level protocol.
The CFS will not use DECNET.
4.3 External Standards
None applicable.
5.0 EXTERNAL INTERACTIONS AND IMPACT
5.1 Users
All users of the disk file system are potential users of
CFS; however most users will not be aware of or affected by
CFS. Some applications developers will rely on CFS to allow
applications to exist on multiple processors and communicate
through files.
5.2 Products That Use This Product
The following may use CFS: RMS, DIF, language OTS's.
5.3 Products That This Product Uses
The following hardware components are required:
KLIPA (CI20) - Interface between KL10 and CI bus.
KL10 Microcode - modifications to support "write
access in CST".
The following software modules are required:
KLIPA driver
Systems Communications Services (SCA/SCS)
The following are optional:
MSCP driver
MSCP server
5.4 Other Systems (Networks)
The CFS is not visible to other network hosts; the files
in the CFS disk structures may be accessible by remote nodes
as provided by other facilities (DAP, NFT, etc.) Each
processor in a CFS configuration is a separate network node
with its own node name.
The CFS itself does not use node names to reference files
and hence is independent of any constraints or requirements
of network node naming.
5.5 System Date And Time
The CFS systems guarantee that they all use the same date
and time. This requirement insures that files written on
one of the processors will have a creation date and time
consistent with the other CFS processors. If the processors
were allowed to have different date and time values, many of
the file-oriented utilities would malfunction.
This is accomplished by having the systems inform each
other whenever the local date and time is changed. Also, a
newly loaded system will use the date and time provided by
the other CFS systems. This last item implies that a CFS
system loaded while at least one other CFS system is already
running will not have to prompt the operator for the date
and time. In order to make the start-up dialog seem the
same, the system will type:
The date and time is: xxxxxxxxxx
where it now prompts for the date and time. This also
serves as a check on the date and time.
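The following sketch (in C, with invented names; these are
not monitor symbols) illustrates the two rules just
described: broadcasting a local date-and-time change, and
adopting the running systems' time at start-up.
    #include <stdbool.h>
    #include <stdio.h>

    static long date_and_time;        /* this system's clock          */

    /* Rule 1: a local change informs every other CFS system.         */
    void set_time(long new_time, int nodes, void (*tell)(int, long))
    {
        date_and_time = new_time;
        for (int n = 0; n < nodes; n++)
            tell(n, new_time);        /* broadcast the change         */
    }

    /* Rule 2: a newly loaded system adopts a running system's time
       instead of prompting the operator.                             */
    void startup_time(bool peer_up, long peer_time)
    {
        if (peer_up) {
            date_and_time = peer_time;
            printf("The date and time is: %ld\n", date_and_time);
        }
        /* otherwise prompt the operator as before */
    }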
5.6 Job Numbers
The CFS systems must use a mutually exclusive set of job
numbers. This is because many system utilities, and user
programs, include the job number in the name of "session"
files and other per job data to avoid conflicts among jobs
using the same file directories. CFS systems, therefore,
will acquire a set of global job numbers to use and will
insure that no other CFS system uses those numbers. This
implies that the user-visible job number, as seen in a
SYSTAT command, may not correspond to the monitor's internal
representation for that job. However all JSYSes that either
provide or accept a job number will be modified to account
for the new global job numbers.
5.7 Interprocessor Communications
CFS provides only sharing of files. Without some
ancillary capability, such as DECnet, processes on different
CFS processors have no way of exchanging "events" such as
interrupts. Processes on the same processor have a choice
of several IPC mechanisms, such as
1. DECnet
2. IPCF
3. ENQ/DEQ
4. THIBR/TWAKE
All of these provide inter-process events (viz. interrupts)
and may also "carry" some amount of data (e.g. an IPCF
message). However, CFS provides a data carrying mechanism,
namely shared files, but it provides no intrinsic event
generator.
CI/DECnet is the ideal mechanism for a CFS IPC. However,
CI/DECnet may not be available with the first release of
CFS. Therefore, there will be no reliable IPC for use by
"distributed" applications.
It is possible, however, to implement THIBR/TWAKE across
CFS processors. This is true because CFS will guarantee
that the job numbers used by the various processors are
mutually exclusive of one another.
Presently, there is no commitment to provide a CFS-wide
TWAKE (THIBR needs no changes), but the work required is
modest.
5.8 Data Storage, File/Data Formats, And Retrieval
The CFS requires an open file data base which is resident
in each processor of a configuration. 2-4 words per OFN are
required. Other resident storage requirements are one page
(512 words) or less. As a side effect of allowing all
processors access to all mounted structures, it may be
desirable to build standard monitors with a larger number of
mountable structures than at present.
The file structure will be identical with previous
releases of TOPS20.
Files may be saved and restored with DUMPER without
regard to which processor DUMPER is run on, except that
DUMPER must be running on the processor which has direct
connection to the required tape drive.
5.9 File And Data Location
CFS is unaware of the physical location of file data.
That is, a shared file may be located on a CI disk, a shared
Massbus disk or on a disk accessed by the MSCP server.
This latter case, that of the MSCP server, should be used
only when absolutely required. That is, if the file could
be located on a CI disk or a shared Massbus disk, it should
be. Files accessed through the MSCP server impose a
significant burden on the processor running the server, and
if such files are accessed frequently, the result may well
be unacceptable. Clearly, files that must reside on PS:
structures and must also be shared may be shared only
through an MSCP server. However, such files should not be
frequently accessed by other processors.
It is, for example, entirely inappropriate to place all
of the SYS: files for all of the CFS processors on disks
that must be accessed by the MSCP server.
5.10 Protocols
CFS will use the corporate SCA protocol on the CI bus.
CFS will use a private protocol for control of file
openings, structure mounts, file state transitions, etc.
There is no present corporate protocol which supports these
functions.
The CFS protocol uses only SCA messages. The general
format of a CFS message is:
DEFSTR CFUNQ,SCALEN,35,18 ;NUMBER OF THIS VOTE OR REQ UNIQUE CODE
DEFSTR CFCOD,SCALEN,17,6 ;OPCODE FOR VOTING
.CFVOT==1 ;VOTER
.CFREP==2 ;REPLY TO VOTE
.CFRFR==3 ;RESOURCE FREED
.CFCEZ==4 ;SEIZE RESOURCE
.CFBOW==5 ;Broadcast OFN change
.CFBEF==6 ;Broadcast EOF
DEFSTR CFFLG,SCALEN,11,12 ;Flags
DEFSTR CFODA,SCALEN,0,1 ;Opt data present
DEFSTR CFVUC,SCALEN,1,1 ;Vote to include HSHCOD
CFROT==SCALEN+1 ;ROOT CODE FOR THIS VOTE
CFQAL==SCALEN+2 ;QUALIFIER CODE FOR THIS VOTE
CFTYP==SCALEN+3 ;Vote reply or request type
CFDAT==SCALEN+4 ;Optional data, if present
CFDT1==SCALEN+5 ; second word of optional data
CFDST0==CFDT1+1 ;STR free count in bit table
CFDST1==CFDST0+1 ;Transaction count of CFDST0
This format is used to both request CFS resources and to
reply to resource requests.
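For illustration, the header fields above can be modeled in
C as shown below. This is our sketch, assuming each 36-bit
word is held right-justified in a 64-bit integer and bits
are numbered 0 (leftmost) to 35 as in DEFSTR; it is not part
of the monitor.
    #include <stdint.h>

    /* Extract the field whose rightmost bit is p and whose size
       is s, using the PDP-10 bit-numbering convention.              */
    static uint64_t fld(uint64_t word, int p, int s)
    {
        return (word >> (35 - p)) & ((1ULL << s) - 1);
    }

    uint64_t cfunq(uint64_t w) { return fld(w, 35, 18); } /* unique code */
    uint64_t cfcod(uint64_t w) { return fld(w, 17,  6); } /* opcode      */
    uint64_t cfflg(uint64_t w) { return fld(w, 11, 12); } /* flags       */
    uint64_t cfoda(uint64_t w) { return fld(w,  0,  1); } /* opt data    */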
The SYSAP name for CFS is: LCS20$CFS. This name
uniquely identifies the TOPS-20 CFS SYSAP for a homogeneous
CI environment. Since there is no central registry of SYSAP
names, configuring a CI with other processor types (e.g.
VAX) may result in confusion of names and protocols.
5.11 Protocol Operation
The CFS protocol is a "veto" protocol. That is, each
request must be approved by all of the CFS processors or it
is disallowed. Therefore, a single dissenting processor is
sufficient to refuse a request.
Each processor is required to remember only the resources
it owns. Therefore, when it "votes", it expresses only the
relationship of the request to its own resources.
Consequently, each processor must be polled every time a CFS
resource change is to occur.
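A sketch of the veto rule follows (ours; the ask_node
routine is a stand-in for the actual .CFVOT/.CFREP message
exchange, not a real monitor routine).
    #include <stdbool.h>

    enum vote { APPROVE, VETO };

    /* ask_node() stands in for sending a .CFVOT message to one
       node and waiting for its .CFREP reply.                        */
    extern enum vote ask_node(int node, const void *request);

    bool cfs_request(int nodes, int self, const void *request)
    {
        for (int n = 0; n < nodes; n++) {
            if (n == self)
                continue;             /* only the others are polled  */
            if (ask_node(n, request) == VETO)
                return false;         /* one dissent is enough       */
        }
        return true;                  /* unanimous approval          */
    }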
5.12 Modifications To MSTR
The MSTR JSYS has been modified to allow structures to be
declared to be "shared" or "exclusive". A shared structure
may be mounted by other CFS processors, whereas an exclusive
structure may be mounted only on this processor.
The structure status bit MS%EXL declares that a
structure is to be mounted exclusively and is returned with
the appropriate value in the structure status.
Also, there is a new MSTR function, .MSCSM, that changes
the shared/exclusive attribute of a mounted structure. The
calling sequence is:
MSTR
AC1: -2,,.MSCSM
AC2: ADDR
ADDR: device designator
ADDR+1: new attribute
5.13 CFS Components
CFS is implemented throughout the TOPS-20 monitor.
However, the code specific to the CFS protocol is contained
in the module CFSSRV. CFSSRV is the CFS SYSAP as well as a
collection of routines to interface to the preexisting
TOPS-20 services. CFSSRV uses the following SCA call backs:
1. .SSMGR - message received
2. .SSPBC - port broke connection
3. .SSCTL - connect to listen
4. .SSCRA - connect response available
5. .SSMSC - message/datagram send complete
6. .SSNCO - node on-line
7. .SSNWO - node off-line
8. .SSOSD - OK to send data
9. .SSRID - remote initiated disconnect
10. .SSCIA - credit available
In addition, CFS uses the following SCAMPI routines:
1. SC.SOA
2. SC.RCD
3. SC.CON
4. SC.DIS
5. SC.SMG
6. SC.RMG
7. SC.LIS
8. SC.REJ
9. SC.ACC
SCAMPI must reliably inform CFS of any CI configuration
changes, including newly established or failed port-to-port
VCs.
The remaining CFS code is found in-line as part of
existing TOPS-20 file system services.
CFS uses only SCA messages.
5.14 Significant Data Structures
The advent of CFS creates the following new data
structures and conventions:
1. A new per-OFN word, SPTO2
2. directory locks are now CFS resource blocks
3. directory allocation entries are now CFS resource
blocks
4. frozen write file openings create two resources
5. other file openings create only one resource
6. each OFN has a CFS "access token" as a CFS resource
7. each mounted structure creates two CFS resources
8. BAT block locks are CFS resource blocks
Note that in some cases the CFS resource replaces the
existing lock, viz. directory locks, and in other cases the
CFS resource exists as a "copy" of the information, viz.
structure mounts, so that the CFS protocol service can
manage the resource locally. In principle, there is no
difference between these kinds of resources, and only the
higher-level monitor code that creates the CFS resource
knows which kind each is.
The bundled CFS, that is the release 6 monitor without
CFS support, still uses CFS to manage the "changed" monitor
resources. However, in many cases, as with the file
resources, the CFS resource is not created as it is not
needed for any internal monitor coordination.
5.15 Interfaces To CFSSRV
CFSSRV contains a number of jacket routines that
interface between the TOPS-20 file system and the CFSSRV
resource manager. The significant interface routines are:
CFSAWT/CFSAWP
T1/ OFN
T2/ access needed
Returns: +1 always
Called to manage the access token
CFSLDR/CFSRDR
T1/ Structure number
T2/ directory number
Returns: +1 always
Lock/unlock directory
CFSSMT
T1/ Structure number
T2/ access needed
Returns: +1 failed. Access invalid
+2 success
Mount structure
CFSSDM
T1/ Structure number
Returns: +1 always
Dismount structure
CFSSUG
T1/ Structure number
T2/ access
Returns: +1 can't change access
+2 success
Change structure access
CFSGFA
T1/ Structure number
T2/ XB address
T3/ Access type
Returns: +1 access conflicts with other system(s)
+2 success
Acquire file open locks
CFSFFL
T1/ Structure number
T2/ XB address
Returns: +1 always
Delete file open resources
CFSFWL
T1/ Structure number
T2/ XB address
Returns: +1 always
Free frozen write resource
CFSGWL
T1/ Structure number
T2/ XB address
Returns: +1 conflict with other CFS system
+2 success
Acquire frozen writer resource
As these jacket routines are an integral part of the
file system, their interfaces are internal file system
conventions rather than external interfaces. Therefore, the
detail of how these interfaces work is beyond the scope of
this functional document.
5.16 PHYSIO Services Required
CFSSRV requires a routine in PHYSIO to request that
dual-ported disks not be accessed by this processor. The
call is:
CALL PHYMPR
Returns: +1
Also, it requires a routine to cancel the action of
PHYMPR:
CALL PHYUPR
Returns: +1
In addition to these, CFS requires that PHYSIO and its
lower level drivers correctly support access to dual-ported
Massbus disks. In particular, work must be completed in
managing dual-ported disks and in insuring that the port is
released at the proper times.
6.0 RELIABILITY/AVAILABILITY/SERVICEABILITY (RAS)
6.1 Failures Within The Product
Failures within the CFS-specific software will most
likely cause a crash of one processor in a multi-processor
environment. Such failures may include loss of recently
modified file data. Failures which affect inactive files or
file directories are possible, but should be no more
frequent than at present.
6.2 Failure Of A CFS Processor
Operation of CFS should permit crash of one processor for
any reason without loss of other processors in the
configuration. CFS relies on SCAMPI to detect a processor
failure and consequently the CFS protocol has no mechanism
for idle polling. If a processor fails, the other
processors will be unaffected, except that the CFS code on
each of the surviving processors must "renegotiate" any
outstanding requests for file accesses.
A processor may be brought on line without restarting
other processors in the configuration.
Any disks which are available only via a failed processor
will be unavailable so long as that processor is
inoperative. If such disks are dual-ported to a different
processor, they may be mounted via that processor and remain
in use although all open files must be re-opened.
With HSC50, most disk errors will not be seen by the
processor(s). All recovery and logging will be handled by
the HSC50. Any disk errors that are reported to the
processor will be logged in the system error file for that
processor. Disk errors occurring on pages that are being
"passed through" a processor (e.g. a KL10 servicing a
request for a Massbus disk) will be logged on the processor
to which the disk is directly connected. If a hard failure
occurs such that the server processor must inform the
requesting processor that the request could not be
completed, then the requesting processor will also log the
failure.
6.3 CI Failures
Should the CI fail, or should a processor's KLIPA fail,
the CFS processors must insure that data on shared disks is
not corrupted. This is accomplished as follows.
If a processor detects it is no longer connected to the
CI it must refrain from referencing any sharable disks. A
sharable disk is any HSC-based disk or any dual-ported
MASSBUS disk. A processor is considered no longer attached
to the CI if it cannot send a "loopback" message to itself.
Should a processor's KLIPA fail, and then be restarted
(e.g. by reloading the microcode), the processor will not
be able to continue running if there are other CFS
processors on the CI. This prohibition avoids the problem
of the system rejoining the CFS network having stale data
about the CFS resources, or having data about a previous
incarnation of the network.
Should a processor be "cut off" from the CI indefinitely,
it may continue running but without being able to access
sharable disks.
6.4 Testing For Errors
CFS will be run in the DVT environment so that it may be
evaluated with regard to faults.
Many of the "normal" CFS errors may be tested without
explicit fault insertion. The following simple procedures
test much of the CFS error recovery code:
1. halt one of the CFS processors
2. bring up a new CFS processor
3. reload one of the KLIPAs
7.0 PACKAGING AND SYSTEM GENERATION
7.1 Distribution Media
CFS is an unbundled product. Each monitor has the bulk
of the CFS support, but non-CFS monitors have a dummy
version of the CFSSRV protocol module. CFS sites will
receive a separate tape containing the proper CFSSRV.
7.2 Sysgen Procedures
Only CFSSRV differs between a bundled and an unbundled
monitor. The SYSFLG switch, CFSSCA, specifies the type of
monitor.
7.3 Bundled And Unbundled CFS Systems
There is no protection in the monitor for running a
bundled and an unbundled monitor on the same CI. This is
particularly important for F-S procedures as the KLAD
monitor may not be compatible with the system environment.
Running a mixed configuration is potentially catastrophic as
file structures may be destroyed.
8.0 REFERENCES
1. Functional Specification for Loosely Coupled Systems
(LCS) - Fred Engel, 30 April 1980
2. LCG CI Port Architecture Specification, 11-July-83
(Keenan)
3. LCS and the Common File System (Memo) - Dan Murphy,
15 Jan 1980
4. CFSDOC (memo) - Arnold Miller
CFS Global Job Numbering
Functional Specification
1.0 INTRODUCTION
The development of Common File System (CFS) on TOPS-20
produces a unique problem, requiring changes to the way job
numbers on TOPS-20 are handled. Many applications on
TOPS-20 systems assume that their job number represents a
unique identifier for their job, and use that number in the
process of synthesizing temporary file names for work space.
TOPS-20/CFS presents the possibility of two or more jobs
from different systems accessing the same user directory,
and hence those jobs are required to have unique job numbers
throughout the CFS network.
2.0 BACKGROUND
Applications commonly require temporary files in which
to store interim data to be read, written, sorted, or
otherwise manipulated. Programs such as MACRO and LINK
commonly create temporary files when the volume of data to
be processed (in this example, a program being assembled
and/or linked) exceeds the capacity of virtual memory.
These applications must give names to their temporary files
that make them unique in their file system environment,
since it is always possible to have multiple copies of the
same application accessing the same user directory. The
solution to this identity crisis has always been to use the
local job number as part of the text of the temporary file
name itself.
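For illustration only, a C sketch of the convention; the
name pattern shown is invented for the example, not one used
by any particular DEC program.
    #include <stdio.h>

    /* e.g. job 23 yields "023SRT.TMP", so two jobs sharing a
       directory never collide on their scratch files.               */
    void temp_name(char *buf, size_t len, int job_number)
    {
        snprintf(buf, len, "%03dSRT.TMP", job_number);
    }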
In a CFS environment, not only is it possible for
multiple users on a single processor to be accessing the
same user directory, it is also possible for multiple users
on separate processors to do so. Because of this, a method
was developed to provide every job in a CFS environment with
a job number that uniquely identifies that job throughout
the CFS environment. This, in turn, changes the way TOPS-20
treats job identifiers internally.
Job numbers on TOPS-20 systems have actually been used
for two purposes, which up until Release 6 were combined in
a single identifier - the 'job number'. The job number was
both a unique identifier for each job, as well as an index
into various internal monitor tables containing information
about that job. With TOPS-20 Release 6, these two functions
have been split. A 'Global Job Number' acts as the unique
identifier for each job on all of the systems of a CFS
environment. A 'Local Job Index' is used by TOPS-20 to
access monitor job tables for information about any given
job on the local processor. Global Job Numbers can be
translated to and from the local job index for any given job
on the system by calling CFS routines described in the
following section.
To avoid further conflict of terminology, this document
will always refer to a Global Job Number as simply a 'job
number', and to a Local Job Index as a 'job index'.
3.0 NEW FUNCTIONALITY
There are four basic functions required to implement
Global Job Numbers in TOPS-20. Initialization, which
includes allocation of 'blocks' of Global Job Numbers from
the CFS pool; assignment, which allocates a single Global
Job number to a given local job index; deassignment, to
deallocate a Global job number of a job that is logging out;
and translation, to provide the monitor with access to the
Global job number of given local index, and vice versa.
One side effect of these changes has been to invalidate
all comparisons of a user-specified job number with the
highest valid job number on a given system, a value
represented by the symbol NJOBS. This symbol does not
represent the highest Global Job Number possible, but merely
the size of the local monitor's job tables. The translation
routines, then, should also provide validation of a job
number or job index being translated, and error returns to
signify when invalid data was provided, or when a given job
number or index simply doesn't exist.
4.0 NEW DATA STRUCTURES
All of the new data structures for implementing Global
Job Numbers can be found in STG. These include:
1. JOBGLB - A table of twelve-bit entries indexed by
Global job numbers used to look up the local job
index for each global job. This table resides in
the resident monitor.
2. GLBJOB - A 12-bit byte pointer that ALWAYS points
to the beginning of JOBGLB. This is used via ADJBP
n,GLBJOB, where n contains the global job number,
to look up the local job index.
3. JOBMBT - A bit table of MXGLBS bits, one bit for
each possible global job number. Bits set to one
indicate available global job slots. This table is
in the resident monitor.
4. GBLJNO - a new JSB location used to contain the
Global job number for the current job. JOBNO (in
the PSB) contains the local job index for the
current job.
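The sketch below (ours, in C, with an assumed MXGLBS of
4096) models how the JOBGLB and JOBMBT tables might be
consulted; the real monitor uses 12-bit byte pointers and
ADJBP as described above.
    #include <stdint.h>

    #define MXGLBS 4096                   /* assumed table size       */

    static uint16_t jobglb[MXGLBS];       /* global -> local index    */
    static uint32_t jobmbt[MXGLBS / 32];  /* 1 = number available     */

    int gl2lcl(int global)                /* cf. GL2LCL               */
    {
        if (global < 0 || global >= MXGLBS)
            return -1;                    /* invalid argument         */
        return jobglb[global] & 07777;    /* 12-bit local index       */
    }

    int alloc_global(void)                /* claim a free number      */
    {
        for (int g = 0; g < MXGLBS; g++)
            if (jobmbt[g / 32] & (1u << (g % 32))) {
                jobmbt[g / 32] &= ~(1u << (g % 32));
                return g;
            }
        return -1;                        /* pool exhausted           */
    }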
5.0 ROUTINES AFFECTED
There are currently several modules affected by this
new functionality. CFSSRV contains the following new
routines to support the four major functions described
above:
1. CFGTJB - to negotiate with the other CFS systems
for blocks of unique job numbers, and initialize
the Global Job Number data structures.
2. JBGET1 - given a local job index, this will return
a Global Job Number assigned to that index. This
routine will only return an error when there are no
more available global job numbers.
3. JBAVAL - the complement of JBGET1; this routine
releases a global job number when a job is logging
out.
4. GL2LCL, and LCL2GL - to translate a Global Job
Number into a local job index (GL2LCL), and to
translate a local index into a Global Job Number
(LCL2GL). These routines will return error codes
to indicate if the caller has specified an invalid
or a non-existent job number or index.
In addition, the following modules require changes to
support the new distinction between job numbers and job
indices:
1. APRSRV
In BUGH5, Print Global Job number, not local Job
index (JOBNO)
2. DIRECT
In DELTS1, check temp file generation number
against global job number, instead of local index
(JOBNO)
3. ENQ
In ENQOK, translate user-specified Global Job
Number to local index
In VALRQ1, use GBLJNO, instead of JOBNO.
In ENCF0H, Translate Local Job index to Global Job
Number before returning it to the user.
4. FORK
In .SJPRI, don't compare Job number to NJOBS, use
GL2LCL to convert it.
5. FUTILI
In CKJBNO/CKJBLI, the argument is a local job
index, so make sure the comments say so.
6. GTJFN
In DEFVER, use Global job number in GBLJNO to
create Temp file version
7. IPCF
In MSEND1, call MSHEAD with GBLJNO instead of JOBNO
In MUTCRE, Assume all job numbers are global, and
convert them to locals before attempting to use
them. -1 means use GBLJNO and convert it.
In MUTFOJ, convert local job index to Global Job
number before returning it to the user.
In SPLMES, use GBLJNO to build message, but use
JOBNO for index into job tables.
In LOGIMS/LOGOMS, use Global job number in RH of
message header
Same thing in LOGOMO.
8. JSYSA
In ACES01, if user specifies another job, call
GL2LCL to convert it from a global job number to a
local index
In CRJOB1, save Global job number of caller, rather
than local index
In CRJDSN, call GL2LCL to translate Global job
number to local...
In .GACCT, translate user-specified Global Job
number to a local index
In ALOCRS, get caller's job number from GBLJNO, not
JOBNO
In .SETJB, translate user-specified Global Job
number to a local index
In UFNI01, use global job number (GBLJNO) in UHJNO,
not JOBNO
9. MEXEC
In SYSINE, assign a Global Job number from CFS, and
save it in GBLJNO
In RUNDD3, Initialize CFS Global Job Number
database...
In LOG2, call JBAVAL to release Global job number
just before HLTJB call.
In LOGJOB, Print Global job number during logout.
In ELOGO, translate user-specified global job
number into local index
In .GJINF, use GBLJNO to return user's own job
number, instead of JOBNO
In .GETAB, overhaul GTTAB table to use new GTJOB
routine to translate user-specified Global Job
number into local job index. Make the tables'
'size' be the highest legal Global Job number,
MXGLBS, for range checking instead of NJOBS, which
is only the highest index value.
In .GETJI, translate user-specified global job
number into local index
In GETJIT table, return other jobs Global Job
number (from JSB)
In ATACH1, same as .GETJI
10. MSTR
In MSTJOB, always return local job index for
Global, convert globals.
11. SCHED
In .TWAKE, SKDRTJ, and SKDSJC, convert
user-supplied job number from Global to local.
12. STG
Make JBWDS be a function of MXGLBS, not NJOBS, so
global job numbers will work.
A new JSB location, GBLJNO, will contain the Global
job number for the current job.
1.0 Introduction
This memo describes the changes to TOPS-20 structure handling made
for the CFS-20 project.
A structure is mounted either for exclusive use or for shared use.
A structure mounted for exclusive use may be mounted on exactly
one processor and a structure mounted for shared use may
be mounted on one or more CFS-20 processors.
(Items denoted by a * are still under consideration, but are unlikely to
change very much).
2.0 MSTR changes
The designation of shared or exclusive is available as a new
structure status bit, MS%EXC. If this bit is set in the returned
status word of the .MSSGS function, the structure is mounted on
this processor for exclusive use. If the bit is not set, it is mounted on this,
and perhaps other, processors for shared use.
A structure is declared to be shared or exclusive when it is mounted.
A new flag bit, MS%EXL, has been defined for the .MSMNT function to declare
that the structure is to be mounted for exclusive use. If this bit is not
set, the structure is to be mounted for shared use. A mounted structure
is either mounted shared or exclusive; there is no "promiscuous" or
"unrestricted" mount option.
A structure that is mounted for the use of a single job, MS%XCL, will
implicitly be mounted for the exclusive use of this processor. That
is, setting MS%XCL implies the setting of MS%EXL.
*A new function has been added to MSTR, .MSCSM. This function is used
to change the designation of a mounted structure from exclusive to
shared or from shared to exclusive. The function requires a structure
designator and the desired structure mount type. The specification
is as follows:
MSTR
T1/ count,,.MSCSM
T2/ address of argument block (E)
E/ structure device designator
E+1/ new mount attribute as follows:
MS%EXL => structure is to be exclusive
0=> structure is to be shared
Errors:
MSTRX2 Wheel or Operator privilege required
MSTX17 Status change denied by CFS
MSTX16 Status change not available for non-CFS systems
If the structure is already mounted with the requested attribute, the
JSYS will succeed.
*3.0 GALAXY changes (details to be provided by GALAXY group)
MOUNTR supports two new structure attributes, SHARED and EXCLUSIVE.
MOUNTR.CMD may contain these new attributes. If a structure is not
designated either SHARED or EXCLUSIVE, it will default to SHARED for
CFS-20 systems and EXCLUSIVE for non-CFS-20 systems. This defaulting
is a function of the MSTR JSYS.
Likewise, OPR recognizes these new attributes as well. If one
directs OPR to change the structure mount attribute of a mounted
structure, MOUNTR will use the .MSCSM function of MSTR to request
the change.
MOUNTR will also use .MSCSM to implement the DISMOUNT/REMOVE operation.
When a user requests that a removable structure be removed, MOUNTR
will attempt to set the structure mount attribute for the structure
to EXCLUSIVE. This is so it can determine if any other CFS-20
systems are using the structure. If the request succeeds, the
structure may be removed. If it fails, the operator must first
dismount the structure on the other systems that have it mounted
for SHARED access. This inelegant approach is in lieu of a high-level
operator protocol that would serve to unify the operation of
a CFS-20 installation.
4.0 Miscellaneous changes
4.1 ENQ/DEQ
CFS does not provide for a distributed ENQ/DEQ facility. The current
wisdom says that ENQ/DEQ is an LCS feature and is independent of
CFS. This means that we will develop a distinct SYSAP for ENQ/DEQ
along with its protocol and functional description. However we
will not be able to do this for release 6.
Without a distributed ENQ/DEQ, we are unable to distribute many
commercial and data base applications. However, some of these
applications, when distributed, will appear to run and consequently
jeopardize the integrity of the data base (that is, TOPS-20 does
not prevent someone from attempting to distribute the application).
Applications that rely on the OF%DUD option of the OPENF JSYS
will not be allowed to run on two or more CFS nodes. Only processes
on a single machine will be able to open the file as CFS will
detect a distributed application that uses OF%DUD.
Applications that do not rely on OF%DUD, but still rely on ENQ/DEQ
for coordination, will malfunction. This is because the various
ENQ/DEQ data bases are maintained independently of one another,
and the resources locked on one processor will not preclude
resources locked on another. Therefore, it is possible, for example,
for a process on one processor to have an exclusive ENQ on
a file and a process on a different processor to also have
an exclusive ENQ on this file! Clearly, this is a violation of
the application's intent.
In order to prevent such a malfunction, ENQ/DEQ has been changed
for release 6. If an ENQ is done specifying a JFN as the
locked resource, ENQ will attempt to acquire a CFS file
resource. This CFS resource will be "exclusive" and therefore
will prevent any other CFS processor from performing an ENQ
on the same file, even if the ENQ is compatible with the
original one. DEQ will release the CFS resource.
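A sketch of the release-6 rule (ours; the routine names are
invented, not monitor symbols):
    #include <stdbool.h>

    extern bool cfs_get_exclusive(int str, int file);  /* veto poll   */
    extern int  local_enq(int jfn);

    int enq_on_jfn(int str, int file, int jfn)
    {
        if (!cfs_get_exclusive(str, file))
            return -1;             /* "file is busy"; never queued    */
        return local_enq(jfn);     /* local ENQs behave as before     */
    }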
This "feature" assumes that all ENQs are exclusive ones, clearly
a false assumption. However, it allows a simple, fool-proof
solution to the problem described above. Note that this change
does not affect ENQs done by other processes on the same processor.
These "local" ENQs continue to function as they always have - for better
or for worse.
The only change that a user will notice is that an ENQ might now
fail with a "file is busy" error. The monitor will not enqueue
requests that fail because of a conflict over the CFS resource,
even if the user requested waiting. To do that, or anything
else, would require implementing part or all of the distributed
ENQ/DEQ service, and we've simply not the time or resources
to consider that.
+---------------+
! d i g i t a l ! I N T E R O F F I C E M E M O R A N D U M
+---------------+
TO:
DATE: 12-Nov-84
FROM: Clair Grant
Ron McLean
DEPT: Large Systems
Software Engineering
LOC: MRO1-2/L10
EXT: 6877
SUBJ: TOPS-20 Multi-Access Disk Management Specification
Most of the important actions described in this specification involve
accessing a disk when CI communication is, or has been, disrupted. Thus,
while still true for an HSC-controlled disk, most of the details are
interesting only if you are referring to a dual-ported MASSBUS disk because
if you can't communicate over the CI, you can't access an HSC-controlled
disk anyway.
1.0 GOALS . . . . . . . . . . . . . . . . . . . . . . . 3
1.1 No Data Corruption . . . . . . . . . . . . . . . . 3
1.2 5.1 Compatibility . . . . . . . . . . . . . . . . 3
1.3 Minimal Overhead Writing To The Disk . . . . . . . 3
2.0 RESTRICTIONS . . . . . . . . . . . . . . . . . . . . 4
2.1 Disk Serial Numbers . . . . . . . . . . . . . . . 4
2.2 Configurations . . . . . . . . . . . . . . . . . . 4
3.0 DEPENDENCIES ON RSX20F . . . . . . . . . . . . . . . 4
3.1 Disk Configuration . . . . . . . . . . . . . . . . 4
3.2 Stopping The CI Microcode . . . . . . . . . . . . 5
4.0 USER INTERFACE . . . . . . . . . . . . . . . . . . . 5
4.1 CHECKD Program . . . . . . . . . . . . . . . . . . 5
4.2 SMON% . . . . . . . . . . . . . . . . . . . . . . 5
4.3 SETSPD Program . . . . . . . . . . . . . . . . . . 6
4.4 BUGHLTs . . . . . . . . . . . . . . . . . . . . . 6
4.5 PAR>SHUT . . . . . . . . . . . . . . . . . . . . . 6
5.0 DATA STRUCTURES . . . . . . . . . . . . . . . . . . 6
5.1 Processor Data Block (PDB) . . . . . . . . . . . . 6
5.2 Unit Data Block (UDB) . . . . . . . . . . . . . . 7
5.3 Request-ID Status (RIDSTS) . . . . . . . . . . . . 8
5.4 CI Wire Status (WIRSTS) . . . . . . . . . . . . . 8
6.0 DISK ACCESS LOGIC . . . . . . . . . . . . . . . . . 8
1.0 GOALS
The multi-access project has 3 goals: 1) no data corruption, 2) 5.1
compatibility, and 3) minimal writing of management data (overhead) to the
disk. The second goal is a bit unusual in that we are providing
compatibility for a feature we didn't support prior to 6.0; we just didn't
prevent it. Some customers have come to depend on this and it is in our
best interest to allow them to continue as they have in the past.
1.1 No Data Corruption
The major goal of this project is to ensure data integrity on all
multi-accessed disks in the TOPS-20 file system. This is accomplished by
allowing TOPS-20 to write to such a disk in only 2 cases: 1) when it is
communicating via CFS with the other CPUs in the CI network, or 2) when the
other CPUs are known to be down.
1.2 5.1 Compatibility
Prior to Release 6.0 TOPS-20 did not provide a facility to manage
access to multi-accessed disks. But, it did not prevent a customer from
porting an RP06 to 2 systems and writing software to manage such a
configuration. This was clearly stated as unsupported by TOPS-20.
In Release 6.0 TOPS-20 manages multi-accessed disks with CFS, yet a
customer may still wish to use whatever local management scheme was used in
the past on certain disks. TOPS-20 provides for this by allowing the
customer to declare disk drives and disk packs as "don't-care" access,
meaning the customer does not want TOPS-20 to manage multi-access and TOPS-20
should honor all write requests.
NOTE
The customer must explicitly request this
"don't-care" designation, as described in
later sections.
1.3 Minimal Overhead Writing To The Disk
Some amount of writing to all multi-access disks is required by this
management scheme; this will be kept to a minimum. This is the primary
reason a keep-alive mechanism was rejected, to avoid constant writing to
the disk.
2.0 RESTRICTIONS
2.1 Disk Serial Numbers
All disks in the TOPS-20 file system must have unique serial numbers
in order for TOPS-20 to operate properly; serial numbers are now an
integral part of TOPS-20's disk management and TOPS-20 outputs a BUGCHK
when it discovers a disk without a serial number.
Any disks without serial numbers must be fixed. Since RP20s don't
have serial numbers, a CHECKD command will be created to assign a serial
number to an RP20.
Unfortunately, TOPS-20 can't tell the difference between 2 disk
drives with the same serial number (which is bad) and a disk which is
accessible via 2 RH20 channels (which is OK). Therefore, it is up to the
system manager to guarantee all disks have unique serial numbers.
2.2 Configurations
TOPS-20 will not properly manage data in the following configurations:
1. It is illegal to have a MASSBUS disk drive ported to 2 CPUs which
have the same CI node number on different CI networks.
2. It is also illegal to define a disk drive as don't-care to one CPU
and do-care (the default) to another CPU when the 2 CPUs can both
access the disk.
3.0 DEPENDENCIES ON RSX20F
Work in RSX20F is required by this project; this work will provide
TOPS-20 with better disk and CI configuration information, allowing TOPS-20
to more accurately manage multi-accessed disks.
3.1 Disk Configuration
RSX20F must communicate its disk configuration to TOPS-20 by passing
the drive serial numbers in a configuration packet. This helps TOPS-20
determine which disk drives are potentially being accessed by other CPUs
connected to the STAR, as opposed to those whose other port is to the
Console Front End.
3.2 Stopping The CI Microcode
RSX20F must stop the CI-20 u-code whenever the HALT or ABORT commands
are executed by the PARSER. Also, HALT.CMD should cause the CI-20 u-code
to halt. The instruction CONO KLP,400000 will halt the CI u-code.
NOTE
The CONTINUE command should not do anything
to the CI-20. This means that after a HALT
(which stops the CI-20) the CI-20 will not be
restarted by the CONTINUE. The halted CI-20
will be restarted when detected by the
once-a-second check in PHYKLP.
4.0 USER INTERFACE
4.1 CHECKD Program
CHECKD needs new commands which allow a user to declare a structure as
"don't-care" and "do-care". This command will use the MSTR% function
.MSHOM (modify home block) to set the newly-defined word HOMDCF in the home
block of each disk in the structure.
CHECKD>ENABLE DON'T-CARE structure-name
CHECKD>DISABLE DON'T-CARE structure-name
CHECKD needs another new command to set the serial number of an RP20
disk.
CHECKD>SET DRIVE-SERIAL-NUMBER (FOR RP20)
Enter decimal serial number:
4.2 SMON%
A new SMON% function .SFDCD is needed to declare to the running
monitor that a disk drive is "don't-care". Its arguments are:
AC1/ channel
AC2/ controller
AC3/ unit
4.3 SETSPD Program
SETSPD needs to have a new command which will use the new SMON%
function. This command will be used in n-CONFIG.CMD.
DONTCARE channel controller unit
4.4 BUGHLTs
The TOPS-20 BUGHLT code will stop the CI-20 u-code.
4.5 PAR>SHUT
If a SHUT (or PAR>DEP 20=1) causes the KL to halt, then the KLIPA will
also be halted. If you end up at a breakpoint or nothing at all happens on
the KL, nothing will be done to the KLIPA.
5.0 DATA STRUCTURES
5.1 Processor Data Block (PDB)
The PDB resides in physical block 3 on a disk and has the following
format:
-----------------------------------------------
! Current Drive Serial Number (word 1) !
-----------------------------------------------
! Current Drive Serial Number (word 2) !
-----------------------------------------------
!Non-CI CPU Serial Number ,, !
-----------------------------------------------
! Node 0 Serial Number ,, !A!B!
-----------------------------------------------
! Node 1 Serial Number ,, !A!B!
-----------------------------------------------
! Node 2 Serial Number ,, !A!B!
-----------------------------------------------
! Node 3 Serial Number ,, !A!B!
-----------------------------------------------
! Node 4 Serial Number ,, !A!B!
-----------------------------------------------
! Node 5 Serial Number ,, !A!B!
-----------------------------------------------
! Node 6 Serial Number ,, !A!B!
-----------------------------------------------
! Node 7 Serial Number ,, !A!B!
-----------------------------------------------
! Node 8 Serial Number ,, !A!B!
-----------------------------------------------
! Node 9 Serial Number ,, !A!B!
-----------------------------------------------
! Node 10 Serial Number ,, !A!B!
-----------------------------------------------
! Node 11 Serial Number ,, !A!B!
-----------------------------------------------
! Node 12 Serial Number ,, !A!B!
-----------------------------------------------
! Node 13 Serial Number ,, !A!B!
-----------------------------------------------
! Node 14 Serial Number ,, !A!B!
-----------------------------------------------
! Node 15 Serial Number ,, !A!B!
-----------------------------------------------
1B34 - this node's wire A to the STAR is good
1B35 - this node's wire B to the STAR is good
When a disk unit is discovered, at system start up or coming online
when the system is running, PHYSIO will make the following series of
checks:
1. If the drive serial number in the disk's PDB doesn't match the
current drive's serial number (the disk pack has been moved;
therefore any data in the PDB is invalid), zero the entire PDB and
then fill in the drive serial number and our CPU information (CPU
serial number and CI wire status). Otherwise, move to the next
check...
2. If the disk is dual-ported, fill in our CPU information (CPU
serial number and CI wire status). Otherwise, the disk is single
ported to us so eliminate any old data by zeroing the PDB and
filling in our CPU information (CPU serial number and CI wire
status).
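The following C sketch (ours; the structure is a simplified
rendering of the block-3 layout above, not monitor source)
combines the two checks:
    #include <stdint.h>
    #include <string.h>

    #define NODES 16

    struct pdb {
        uint64_t drive_serial[2];    /* words 1-2                     */
        uint64_t noncpu_serial;      /* non-CI CPU serial ,,          */
        uint64_t node[NODES];        /* serial ,, wire bits A,B       */
    };

    void discover(struct pdb *p, const uint64_t serial[2],
                  int self, uint64_t my_entry, int dual_ported)
    {
        if (memcmp(p->drive_serial, serial, sizeof p->drive_serial) != 0
            || !dual_ported) {
            /* pack moved, or single-ported: old PDB data is invalid  */
            memset(p, 0, sizeof *p);
            memcpy(p->drive_serial, serial, sizeof p->drive_serial);
        }
        p->node[self] = my_entry;    /* CPU serial + CI wire status   */
    }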
5.2 Unit Data Block (UDB)
A new bit U1.FED will be set in the UDB of a front-end disk. (See
section 3.1).
A new bit U1.DCU in the UDB indicates the unit (disk drive) as having
been declared don't-care by SETSPD; another new bit U1.DCD indicates the
disk's home block had the HOMDCF word set by SETSPD. When a disk is found
spinning on a drive, U1.DCU and U1.DCD are set appropriately; if they are
both set, the disk will be declared don't-care. However, if one bit is on
and the other is off, TOPS-20 will treat the disk as do-care.
The UDB contains a copy of the PDB.
5.3 Request-ID Status (RIDSTS)
The table RIDSTS (indexed by CI node number) contains information
about the current status of each path to each of the other nodes on the CI.
This status is a result of periodically sending REQUEST-IDs to all the
other nodes on the CI, alternating paths on consecutive sends to an
individual node. Of interest to the disk service are the bits which
indicate if the last REQUEST-ID received an answer or there was no response
on that path to the node. If REQUEST-IDs are being answered we assume
there is a TOPS-20 running on the remote system, and if REQUEST-IDs are not
being answered we assume TOPS-20 is not currently running on the remote
system.
5.4 CI Wire Status (WIRSTS)
The locations CIWIRA and CIWIRB indicate the results of the latest
periodic loopback packets to the STAR on the 2 wires; 0 = succeeded, non-0
= failed.
6.0 DISK ACCESS LOGIC
Upon receiving a transfer request from PAGEM, PHYSIO will make the
following series of checks:
1. If the disk is single-ported, don't-care, both ports to us, or
belongs to 20F, allow access. Otherwise (the disk is dual-ported
to another CPU), move to the next check...
2. If other nodes have never accessed the disk (their offsets in the
PDB are 0), allow access. Otherwise, move to the next check...
3. If both of our CI wires are bad, don't allow access. Otherwise,
move to the next check...
4. If REQUEST-IDs are not being answered by the other nodes, allow
access. Otherwise, move to the next check...
5. If there are CFS connections to the other nodes, allow access.
Otherwise, disallow access.
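The five checks can be summarized in a sketch (ours; the
boolean parameters stand in for the monitor state described
in sections 5.1 through 5.4):
    #include <stdbool.h>

    bool allow_access(bool single_ported, bool dont_care,
                      bool both_ports_ours, bool frontend_disk,
                      bool others_ever_accessed, bool both_wires_bad,
                      bool reqids_answered, bool cfs_connected)
    {
        if (single_ported || dont_care || both_ports_ours || frontend_disk)
            return true;            /* 1: no other CPU can write     */
        if (!others_ever_accessed)
            return true;            /* 2: PDB offsets all zero       */
        if (both_wires_bad)
            return false;           /* 3: we may be cut off          */
        if (!reqids_answered)
            return true;            /* 4: other nodes are down       */
        return cfs_connected;       /* 5: must be talking via CFS    */
    }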
CFS PROCESSING DOCUMENTATION
CFSSRV is a lock manager. The locks it manages represent resources in
the system, but CFSSRV is not aware of the mapping of lock to
resource. The mapping, or meaning, is made by the creator of the
resource.
Files are a resource with CFS locks. Each file has the following CFS
locks:
. open type
. write access
. ENQ/DEQ lock
In addition, each of the sections of the file, represented by an OFN,
has an access token. Therefore a file has up to 512 access tokens.
When a file is opened, the "open type" and "write access" locks are
acquired. The "open type" is either:
. shared read (frozen)
. shared read/write (thawed)
. exclusive (restricted)
. promiscuous (unrestricted)
The word in parentheses represents the argument to OPENF%.
If the opener requests "frozen write" access, then if the "open type"
lock is successfully locked, i.e. no one has the file open in a
conflicting mode, the "write access" lock is acquired. This is an
exclusive lock that represents the single "frozen write" user of the
file. The lock is held by the system that has the file opened "frozen
write".
Each of the locks described above applies to a file, that is, something
described by an FDB. In addition to these, each file has some number
of OFNs, one for each file section that is in use. Therefore, a file
may have up to 512 OFNs or file sections.
Each active OFN has an "access token" lock. The access token
represents the ability of the system to access the data described by
the OFN. The access token may be held in one of the following modes:
. place-holder
. read-only
. exclusive (read or write)
A read-only access token may be held by any number of systems
simultaneously. An exclusive token is held by only one system. A
"place-holder" access token is an artifact that permits the CFS
systems to agree on the end-of-file correctly. It also has some
ramifications for bit table access tokens that will be described
later. Place-holder tokens are also an optimization to avoid
reallocating tokens that have been "lost" to another system.
The file access token is the most fundamental CFS lock in that it is
used not only to control simultaneous access to user files, but also
to manage directories and bit tables.
The access token state transition table is given below, with the
action required to make the designated state change:

   old \ new       read          exclusive     place-holder
   ---------------------------------------------------------
   read            nothing       vote          DDMP*
   exclusive       DDMP**        nothing       DDMP*
   place-holder    vote          vote          nothing
Where:
vote means that the other CFS systems must be asked for permission
to make the state transition. Voting is a fundamental operation
of CFSSRV and is done by a software implemented broadcast.
DDMP* means that DDMP must run and remove all of the OFN's pages
from memory and update the disk copy of any modified pages.
DDMP** means that DDMP must run to update to disk any modified
pages and any in memory pages must be set to "read only".
This latter operation is performed by clearing the CST write
bit. The CST write bit has been implemented in KL paging explicitly
to support loosely-coupled multi-processors.
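The same table rendered as C data, for concreteness (a sketch; the enumerator
names are invented):

    enum tok_state  { TK_READ, TK_EXCL, TK_PLHD };
    enum tok_action { A_NOTHING, A_VOTE,
                      A_DDMP_FLUSH,        /* DDMP*  */
                      A_DDMP_WRITEBACK };  /* DDMP** */

    static const enum tok_action transition[3][3] = {
        /* old \ new       TK_READ             TK_EXCL    TK_PLHD      */
        /* TK_READ */    { A_NOTHING,          A_VOTE,    A_DDMP_FLUSH },
        /* TK_EXCL */    { A_DDMP_WRITEBACK,   A_NOTHING, A_DDMP_FLUSH },
        /* TK_PLHD */    { A_VOTE,             A_VOTE,    A_NOTHING    },
    };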
While DDMP is performing a CFS-directed operation, all pages of the
OFN are inaccessible to any other process. This is achieved by a bit,
SPTFO, set in SPTO2 by DDMP.
Access permission to a file moves among the CFS systems on demand.
Each system must remember its state of the token so it may respond to
requests for the access permission.
The token consists of:
. The structure name
. the OFN disk address
. a flag bit to indicate this is the access token
. state
. end-of-file pointer
. end-of-file transaction number
. fairness timer
. the OFN this token is for
and, if this is a token for a bit table:
. structure free count
. structure free count transaction number
The fairness timer is a CFS service that allows a resource to be held
on a node for a guaranteed interval. Therefore, the owner need not
lock the resource and arrange to unlock it later. Rather it simply
places the guarantee interval in the resource block and the CFS
protocol takes care of the rest.
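For concreteness, the token contents can be pictured as the following C
structure. This is a sketch only; the field names and widths are invented,
and the actual layout is defined by CFSSRV.

    #include <stdint.h>
    #include <stdbool.h>

    struct access_token {
        uint64_t structure_name;   /* sixbit structure name            */
        uint64_t ofn_disk_addr;    /* OFN (index block) disk address   */
        bool     is_access_token;  /* flag: this is the access token   */
        int      state;            /* place-holder, read, or exclusive */
        uint64_t eof_pointer;      /* end-of-file pointer              */
        uint64_t eof_txn;          /* end-of-file transaction number   */
        uint64_t fairness_timer;   /* guarantee interval for the owner */
        int      ofn;              /* the OFN this token is for        */
        /* present only for bit table tokens: */
        uint64_t free_count;       /* structure free count             */
        uint64_t free_count_txn;   /* free count transaction number    */
    };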
Place-holder tokens exist principally to hold the values associated
with the end-of-file pointer and with the structure free count. It is
important that these be held by each system, because the owner of the
OFN token may crash and therefore the last known state of these
quantities must be remembered so that the remaining nodes may have
the best possible value for them. The transaction count is intended
to determine whose value is the most recent should the owner not be
present to contribute the current value. During the voting for
acquiring a token, these values are passed among the CFS nodes, and
the node conducting the vote retains the values associated with the
largest transaction number.
The file access token represents the rights that a system has to
access a file section. That is, the token is associated with the
file's contents.
However, the owner of a file, i.e. the system holding exclusive
rights to access the file, also has the right to modify the file's
index block. The owning system may add pages to the file or delete
pages from the file.
OFNs are treated specially in TOPS-20. Unlike the file's data pages,
an OFN may not simply be discarded when the system gives up its access
to the file and then reread from its home on the disk when the access
is reacquired. An active index block, represented by an OFN, contains
paging information that must be retained while the file is opened.
For this reason, a system needs to be informed if the index block
contents are changed by another system.
This information is disseminated in CFS by a broadcast message. Each
time a system writes a changed index block to disk, it informs all of
the other CFS systems by a broadcast message. Note that this
broadcasting is done only when the changed index block is written to
disk, and not each time the index block is modified. A broadcast
message is used instead of including this in optional data with the
access token for reasons explained in a later section of this
document.
When a CFS system receives such a message, it sets a status bit in
the appropriate OFN so that the next time a process attempts to
reference the OFN the following will happen:
. the disk copy of the index block is examined.
. for each changed entry, update the local OFN
This reconciliation of the index block with the local OFN is
accomplished by the routine DDXBI.
VOTING IN CFS
When a node needs to "upgrade" its access to a resource, including
acquiring a new resource, it must poll each of the other CFS nodes.
This is so because none of the CFS nodes is a master and therefore
there is no a priori location for resolving access requests. CFS is
not only a democracy, but somewhat of a cacophony.
Voting, then, requires "broadcasting" to each other node the required
resource and access. Each node must respond with its permission or
denial.
The CI does not support broadcast, and even if it did, it would not
support a reliable broadcast. Therefore, CFS implements broadcasting
by sending a message to each of the other nodes, one-at-a-time.
A vote request contains:
. function code
. resource "name" (seventy-two bits)
. access desired
. vote number
A reply contains:
. function code
. resource name (seventy-two bits)
. reply (yes, no or "qualified yes")
. vote number
. optional data
The message contains a function code because votes and replies are
only one kind of CFS to CFS communication.
The vote number is used to insure that the reply is to the proper
request. The requestor may "restart" a vote at any time. It does this
by "canceling" the current vote, acquiring a new vote number, and
broadcasting the new request. A vote number is a monotonically
increasing, thirty-six bit quantity.
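A sketch of the two message formats as C structures follows. The field names
and widths are invented; the actual packet layout is defined by the CFS
protocol, and the 36-bit quantities are shown here in 64-bit fields.

    #include <stdint.h>

    struct cfs_vote_request {
        uint32_t function;     /* votes are one kind of CFS message     */
        uint64_t name[2];      /* two 36-bit words: seventy-two bits    */
        uint32_t access;       /* access desired                        */
        uint64_t vote_number;  /* monotonically increasing, 36 bits     */
    };

    struct cfs_vote_reply {
        uint32_t function;     /* message type: reply                   */
        uint64_t name[2];      /* resource name, as in the request      */
        int32_t  reply;        /* yes, no, or "qualified yes"           */
        uint64_t vote_number;  /* must match the outstanding request    */
        uint64_t optional[4];  /* e.g. EOF pointer, transaction number  */
    };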
A vote will be restarted for one of the following reasons:
. a configuration change is reported by SCA
. a previous vote "times out".
The latter should rarely occur, and is likely indicative of a
malfunctioning CFS on some other system. In some cases, a node will
not reply if it is unable to acquire the appropriate space for
constructing a message. There are a small number of cases where this
is legal, and for these cases, the requestor must revote when
appropriate.
When a reply is received, the vote number must match the number in
the associated resource block.
The replies to a vote are:
. unconditional yes
. no
. conditional yes
. cancel yes condition
A conditional yes means that the respondent will approve the request,
but it needs to perform a local housekeeping operation first. The
most common form of this is voting for an access token where the
respondent must first update the disk copy of the file, and perhaps
flush all of its local copies of the file data. When the condition
has been satisfied, a "condition satisfied" reply is sent.
Each resource has a "delay mask". This mask has a bit for each of the
other CFS nodes, and whenever a node replies with "conditional yes",
its bit is set in the resource's delay mask. Therefore, a process
that is waiting for the conditions to be satisfied, simply examines
the delay mask periodically and waits for all of the delay bits to be
cleared. While any delay bits are set, the vote is considered to be
still in progress, and therefore any configuration change will
require restarting the vote.
Conditional yes votes, and the associated delay mask, are provided to
eliminate the need for nodes to reply "no" when there are temporary
conditions preventing the approval of the request. The overhead
required to process such replies, and to wait for them, is offset by
the gains in not having to revote in the face of such conditions.
CFS provides the following basic voting services:
. Acquire a resource. If the resource is known on this
node, but the current state conflicts with the request, the
currently held resource is released and a vote is taken.
This service is called specifying either "retry until
successful" or "return after one try".
. Upgrade a resource. This service tries only once. It
also guarantees that the currently held resource will not
be released. In fact, the resource may be held and "locked"
locally when "upgrade" is requested.
. Acquire local resource. This is used for resources
not shared by other CFS nodes, but managed by CFS. Examples
are directory locks on exclusive structures.
VOTE MECHANISM
A vote is started by the routine VOTEW. Ordinarily, one does not call
this routine directly, but rather one requests a resource, and if
necessary, VOTEW will be called to conduct a vote.
VOTEW always waits for the vote results. The results are tallied at
interrupt level by noting the number of replies received in the
associated resource block. VOTEW periodically examines the resource
block testing for:
. all tallies received
. a "no" vote recorded
. a configuration change
The actions taken are as follows (the wait loop is sketched in C after this list):
. configuration change: restart the vote
. a "no" vote: return to the called
. all tallies received:
. if no "conditional yes" votes, return to caller
. If one or more "conditional yes" votes, wait for
the "condition satisified" replies. While waiting,
a configuration change could occur, requiring the vote
to be restarted.
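The wait loop can be sketched in C as follows. The resource fields and the
helper routines are invented stand-ins for the monitor's data base and
scheduler services.

    #include <stdbool.h>
    #include <stdint.h>

    struct resource {
        unsigned tallies;     /* replies received (at interrupt level)  */
        unsigned nodes;       /* number of other CFS nodes polled       */
        unsigned no_votes;    /* count of "no" replies recorded         */
        uint32_t delay_mask;  /* one bit per "conditional yes" node     */
    };

    static bool config_changed(void) { return false; }  /* stand-in */
    static void sleep_briefly(void)  { }                /* stand-in */

    enum vote_result { VOTE_GRANTED, VOTE_DENIED, VOTE_RESTART };

    enum vote_result votew_wait(volatile struct resource *r)
    {
        for (;;) {
            if (config_changed())
                return VOTE_RESTART;         /* restart the vote        */
            if (r->no_votes)
                return VOTE_DENIED;          /* return to caller        */
            if (r->tallies == r->nodes && r->delay_mask == 0)
                return VOTE_GRANTED;         /* all tallies, no delays  */
            sleep_briefly();                 /* examine periodically    */
        }
    }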
RESOURCE ACQUISITION AND UPDATING
CFS resources are acquired and changed in response to requests from
other parts of the monitor. Rather than describe each one, it will be
instructive to consider how the file related resources are acquired,
maintained, and destroyed.
When a file is opened, and the first OFN is created, ASOFN will
create the static CFS resources: open type and, if appropriate, the
frozen writer token.
Anytime an OFN is created, be it in response to opening the file, or
one of the "long file" OFNs, ASOFN will create the access token.
The access token state is verified by various of the file system and
memory management routines. The most common place for this is in the
page fault handler. The two exceptions to this are for a bit table
access token and a long file "super index block". The bit table token
is acquired and "locked" when the bit table lock is locked and
released only when the bit table is unlocked. The token for a super
index block is occasionally acquired in DISC by the routine (NEWLFT)
that creates new long file index blocks. In theory, these exception
cases need not be exceptions. That is, the code could simply rely on
the normal management of the token during page faults to insure data
integrity. However, in these cases, the code must perform multiple
operations on the file data "atomically". That is, it must modify two
or more pages, or it must "test and set" a location with the
assurance that no other accesses to the data occur between the steps.
On a single system, this is done by a NOSKED to prevent any other
process from running. In an LCS environment, NOSKED is not sufficient
(although it is necessary!). Another form of interlock must be used
to prevent a process on another system from examining or modifying
the data. It turns out that the access token satisfies this need
quite well.
The above discussion implies that the page fault handler, when it
acquires an access token for an OFN, does not "lock" the token on the
system. That is, the token is acquired but not "held". This may
result in the token being preempted by another system before the
process is able to reexecute the instruction that caused the page
fault. The "fairness" timer in the token resource is one attempt to
minimize such thrashing.
The access token is acquired on the following conditions:
. when an OFN is being created
. when the OFN is locked
. when a page fault occurs because the current access is
not correct
The current state of the token is kept in the CFS resource block as
well as in the OFN data base. The field SPTST holds the current access
state of an OFN. The values are:
0 => no access
.SPSRD => read only
.SPSWR => read/write
SPTST is modified by the routines in CFSSRV that are called to set
the state of the file. The values are set here, and not in PAGEM,
PAGFIL or PAGUTL because the OFN state must be set while the CFS
resource block is interlocked against change.
The routines to modify the state of an OFN token are:
. CFSAWT - acquire token but don't hold it
. CFSAWP - acquire token and hold it
TOKEN MANAGEMENT
Once a token is "owned" on a system, it will remain in that state
until it is required on another system. That is, if the token is held
for read/write access (exclusive), then all references to the pages
of the OFN will succeed without CFSSRV being invoked.
If a token must be revoked because another system needs it, CFSSRV
signals DDMP to process the data pages. This is done by:
. Setting bits in the field STPSR in the OFN data base.
. Setting the OFN's bit in the bit mask OFNCFS.
. Waking up DDMP.
The field STPSR is a two-bit quantity indicating the type of access
required by the requesting system. DDMP's action is as follows:
read-only needed:
Write all modified pages to the disk. Clear all of the CST
write bits in all in-memory pages.
read/write needed:
Write all modified pages to disk. Flush all "local" copies
of data including any copies on the swapping space. Swap out
the OFN page if it is in memory (actually, simply place it on
RPLQ).
Once DDMP has performed the necessary operation, it calls CFSFOD.
This routine will set the OFN state and the resource state
appropriately as follows:
read-only requested:
set OFN state to .SPSRD and set resource state to "read".
read/write requested:
set OFN state to 0 and set resource state to "place-holder".
CFSFOD also copies the current end-of-file information from OFNLEN
into the resource block and finally it sends the "condition
satisfied" message to the requestor.
While DDMP is performing its work on behalf of CFS, it sets the bit
SPTFO in the OFN data base. This bit is examined by the page fault
handler, and by CFSAWP/CFSAWT to see if the OFN is in a transition
state. If SPTFO is set, and the process requiring the OFN is not
DDMP, then the process is blocked until SPTFO is cleared by DDMP. In
order to facilitate identifying DDMP from all other processes, a new
word has been added to the PSB called DDPFRK. If DDPFRK is non-zero,
then the current process is indeed DDMP and SPTFO should be ignored.
UNUSED RESOURCES
Whenever a node replies "no" to a request, it remembers in the
associated resource block the node(s) that have been rejected. The
only reason for unconditionally denying a request is that the
resource is "held" locally. If a resource cannot be granted because
of the fairness timer, the "no" response includes an optional data
word of the time the resource is to be held. Therefore, the requestor
knows precisely when to request the resource anew.
When a held resource is "released" (or undeclared), CFS examines the
rejection mask for the resource. For each node identified in the
mask, a "resource released" message is sent indicating that this is a
propitious time to try to acquire the resource. There is no guarantee
the new request will be granted as the resource could be held again,
or another node could have requested, and been granted, the resource
first.
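A C sketch of the rejection-mask bookkeeping follows; the names are invented
for the example.

    #include <stdint.h>

    #define MAXNODE 16

    struct res { uint32_t reject_mask; };  /* one bit per rejected node */

    static void send_resource_released(struct res *r, int node)
    { (void)r; (void)node; /* stand-in for the CFS message send */ }

    void deny_request(struct res *r, int node)
    {
        r->reject_mask |= 1u << node;      /* remember whom we refused  */
    }

    void release_resource(struct res *r)
    {
        for (int node = 0; node < MAXNODE; node++)
            if (r->reject_mask & (1u << node))
                send_resource_released(r, node);
        r->reject_mask = 0;                /* no guarantee they'll win  */
    }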
DELETING FILE RESOURCES
The access token is deleted whenever the associated OFN is
deassigned.
The static file resources are released when the file is closed. This
is performed in RELOFN.
CHANGES TO EXISTING CONCURRENCY CONTROL SCHEMES
As a result of CFS, much of the concurrency control in TOPS-20 has
become distributed. In some cases, this has been done by creating a
companion resource to an already existing one. An example of this is
the file open mode resource described above.
In other cases, existing locks have been replaced by CFS resources.
The decision as to which technique to employ was made on a
case-by-case basis. The significant criterion was how easy it was to
eliminate the existing concurrency control and replace it with the
CFS management. The file resources proved difficult to do. However,
there are two important pieces of the monitor's structure that were
easily and efficiently replaced: directory locks and directory
allocation tables.
Directory locks are now CFS resources. A directory lock resource
contains:
. the seventy-two bit identifier
. owning fork
. access type
. share count
. waiting fork bit table
In fact, a directory lock resource is the sole instance of a "CFS
long block".
Directory locks are always acquired for exclusive use. However,
unlike file access tokens, directory locks are never granted
"conditionally". This is because directories are files, and the
directory contents are subject to negotiation by the associated file
access token. That is, acquiring exclusive use of the directory lock
resource is independent of acquiring permission to read or write the
directory contents. When some process on the owning system attempts
to read or write the directory contents, it must first acquire the
file access token in the proper state. Although this sounds somewhat
inefficient, i.e. requiring the node to acquire two independent
resources, it is in fact a remarkably efficient adaptation of the CFS
resource scheme. This is so because a node need not know how the
directory contents will be used when it acquires the directory lock.
That is the way the lock was handled before CFS, and preserving this
convention means that the code to acquire the directory lock under
CFS is as efficient as possible. The state of the file access token,
and consequently the degree of sharing of the directory contents, is
determined by how the contents are referenced and not by how the
directory is locked. This means that a process may lock the directory
lock without knowing how it will reference the associated data, and
its reference patterns determine what other negotiations are
required.
The directory allocation table is a local "cache" for the information
normally stored in the directory. Each active OFN is associated with
a directory allocation entry. Each entry is for exactly one
directory. The entry, before CFS, contained: structure number,
directory number, share count, and remaining allocation.
Under CFS, an active allocation entry contains: structure number,
directory number, share count, and pointer to the CFS resource block.
The CFS resource block contains, besides the normal CFS control
information, the remaining allocation for the directory and a
transaction number. The transaction number serves the same purpose as
the transaction number associated with a file end-of-file pointer.
CFS may have an "unused" resource block for a directory allocation
entry. That is, even though there is no active directory allocation
entry, there may be a CFS resource block representing the directory.
This is because CFS attempts to retain knowledge of resources for as
long as possible to avoid having to vote when some process wishes to
create the resource anew. However, CFS will destroy any unused
resource allocation entry that is requested by another system.
TRANSACTION NUMBER
The optional data items, "end-of-file pointer" and "structure free
space", have an associated value called the "transaction number".
One uses either centralized or decentralized control in a
"loosely-coupled multiprocessor" system. In a centralized system,
control information and updating is coordinated by a master.
Transactions are "serialized" by virtue of having a single owner for
the resource and therefore a single manager of the resource data. In
a decentralized system, the various systems share the ownership of
resources and use some sort of "concurrency control" technique to
manage resources.
CFS is a decentralized system. A resource is not owned or managed by
any particular system, but rather the responsibility for the resource
is passed from system to system as required. As such, it may not
always be possible to uniquely identify a particular system as the
owner. This may cause a problem when a system needs to become the
owner, and therefore must determine the current status of the
resource in question.
There are two possibilities that a nascent owner may encounter:
. The previous owner is present and identifiable.
. There is no system that is the previous owner
and of this latter case:
. the existing control information is accurate
. the existing control information is not accurate.
Clearly, if the previous owner is present, the new owner has all of
the information it needs to proceed with its transaction.
If the previous owner cannot be identified, then the new owner must
be able to determine which of the systems has the current control
information about the resource. It may be that none of them has, and
this is a problem that exists even on a single-processor system. The
result of such a problem may be "lost pages", inconsistent data bases
and other such phenomena. As in a single-processor system, the
problem occurs because the resource control information is lost as an
effect of a system crashing.
In order to determine the most up-to-date information about a
resource, each system maintains a transaction count along with the
information. Whenever it acquires information with a larger
transaction count than its own value, it knows that information is
more current and it must replace its own copy with the new data and
count. Whenever a system unilaterally changes its copy of the control
information, it must also increment the associated transaction count.
Since a system may perform such an update only when it has write or
exclusive access to the resource, the system need change the
transaction count only when it must downgrade its access.
Due to the nature of the CFS voting and resource management, it is
possible for a system to acquire a resource but to receive a
different value for the resource control information from each of the
other systems (this will happen only if the owner crashed. If the
owner didn't crash, then at least two of the other systems must have
the same control information and transaction count). In this case,
the transaction counts are used to identify the most up-to-date
value.
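The rule can be stated compactly in C; this is a sketch with invented names.

    #include <stdint.h>

    struct ctl { uint64_t value, txn; };  /* e.g. EOF pointer + count */

    /* On receiving another node's copy, keep the more recent one. */
    void merge(struct ctl *mine, const struct ctl *theirs)
    {
        if (theirs->txn > mine->txn)
            *mine = *theirs;
    }

    /* A system that changed the value while holding write or exclusive
       access need only bump the count when it downgrades its access. */
    void downgrade_with_update(struct ctl *mine, uint64_t new_value)
    {
        mine->value = new_value;
        mine->txn  += 1;
    }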
The transaction count is really a "clock" that is used to
"time-stamp" information. When systems communciate with one-another,
they synchronize the clocks by sending each other the current counts.
Most network concurrency schemes use clocks for similar purposes, and
most of the uses and implementations are considerably more exotic
than this one. However, since CFS needs the clock only to determine
relative ages, and not absolute ages, of information, this simplified
clock is adequate.
An alternative to using transaction counts is to "broadcast" changes
to resources. This has the disadvantage that it is costly in both
processor and communications time and resources. However, CFS does
use broadcasting in a few cases where the lack of up-to-date
information could result in data being destroyed. The two cases are:
. an OFN being modified and written to disk
. an EOF value being written into the directory copy on the disk
As both of these represent changes in the permanent copy of the
resource, it is essential that all of the other systems have current
copies or knowledge of the update.
CFS MESSAGE SUMMARY
Items marked with a "*" are sent as broadcast messages.
*1. request resource (vote)
2. reply to request:
a. unconditional yes
b. unconditional no
c. no with retry time
d. conditional yes
3. resource available
4. condition satisfied
*5. OFN updated
*6. EOF changed
In addition, each message type may carry specific optional data items,
up to four words of optional data per message.
CFS ERROR RECOVERY PROCESSING
1.0 Introduction
Failures in a CFS-20 network may result in various partitions of
the original homogeneous system. Many of these partitions are
easily detected and harmless even if not properly administered. However,
there is an important class of partitioning that is both
difficult to detect and extremely harmful to the shared data
bases.
2.0 Failures
The CFS network exhibits many potential failure cases. The most
common will be a single processor crashing. Fortunately, this
common case poses no real threat to the integrity of the disks.
A processor is considered to have crashed if its SCA keep-alive
ceases. Once this is determined, SCA will inform all of the
SYSAPs that the circuit to that processor has failed and
the CFS SYSAP will begin the recovery described below.
In general, if a processor crashes, the remaining processors will
simply stop sending it CFS messages. Also, when the crash is
first detected, all outstanding CFS votes are restarted so that
the requesting processor will not have to determine if it has already
recorded the vote of the failed processor. These two measures
are adequate to synchronize the CFS network protocol.
This is all well and good so long as the reason the remote
SCA has ceased to maintain its keep-alive counter is that
the processor has crashed and will be restarted. However,
this same effect will be observed if the remote processor's
link to the CI, the KLIPA, fails.
CI failures are manifested in two important ways: a single processor's
link to the CI fails, and there is some failure on the CI cable.
2.1 Single processor link failure
In this case, the port or link has failed. This processor is unable
to send or receive data over the CI. Other CI processors cannot
send data to this processor.
This failure is detectable by the attached processor in one of several
ways:
. The local CI port raises a unique error
. The local CI port no longer responds to local commands
(e.g. DATAI or CONI).
. A CI failure can be diagnosed (see 2.2)
2.2 CI failure
In general, the entire CI will not fail. This is because the CI is
made up of many cables, all connected together with a "star coupler".
Therefore, if a single cable is crushed or cut, only the processor
connected to that cable will suffer. Of course, some catastrophe
could befall the star coupler, but this is so remote that we won't
consider it.
Therefore, it is possible to detect this case by sending a message
to oneself. Such a message will be delivered only if the path
from the local port to the star coupler is working. This technique,
called "loopback", is quite a powerful diagnostic and may be used
to detect port failures as well.
2.3 CI failure consequences
In general, a CI failure should result in no harm to any HSC-based disks.
This is so because the failed processor can neither participate in CFS
negotiations nor can it access the HSC.
However, if the failed processor is connected to one or more Massbus
disks, and one or more of these disks is also connected to another
processor, the data on the shared disks is in jeopardy. This is so
because the failed processor and the other one can no longer coordinate
accesses to the shared disk. If one of the processors does not unilaterally
cease accessing the disk, the data and disk structure will be damaged.
While it is possible for a CI or port failure to occur that a processor
cannot detect, this is a remote possibility. Therefore, we will assume
it is possible to decide which processor has been "cut off" from the
CFS network (at least THAT processor can determine it is the one)
and therefore it is possible to negotiate the ownership of the
shared disks.
In summary, the problem of shared Massbus disks connected to a
processor that is "cut off" from the CI is an important CFS failure
that must be detected and properly handled.
3.0 Massbus disks
Massbus disks are of two varieties: DCL and non-DCL disks.
A DCL disk is either an RP04, RP06 or RP07. Each of these disk drives
provides a status bit called "programmable". A drive displaying this
bit has a dual-port option installed, and the port select switch is
in the A/B position. There is no indication whether the other
port is attached to anything or whether the port is in use.
A non-DCL disk is an RP20. An RP20 drive does not provide the
programmable bit. However, the RP20 does have an onboard controller
with its own unit number. By convention, a dual-ported onboard controller
has a non-zero unit number and a single-ported onboard controller
has a unit number of zero. Note that this is an installation guideline
and not a hardware enforced requirement. However, the CFS failure recovery
algorithm depends on its being adhered to.
TOPS-20 maintains a software status bit, available via the MSTR
read status call, called "sharable". A drive that is sharable is
either a DCL drive displaying the programmable bit or an RP20
that has an onboard controller with a non-zero unit number.
4.0 Fail safe procedures
There appear to be two choices for recovering from a CI failure:
1. The processor that is cut off must refrain from
using any of the sharable Massbus disks.
2. The processor that is cut off determines which of the
sharable disks are in use by another processor, and it
refrains from using only those disks.
The first choice is simple to implement and entirely safe. However, it
does preclude certain "legitimate" configurations and therefore
does not provide the maximum availability.
The second choice is fraught with complications, although it does make
available some of the configurations that the first choice precludes.
Implementing choice 2 is complicated and the benefits derived
are a function of the likelihood of a CI failure. Such failures
should be rare, and therefore the effort to implement choice
2 is inappropriate. Therefore, TOPS-20 release 6 will implement
choice 1 as follows.
4.1 The algorithm
A processor that is "cut off" from the CI, and is either being loaded
or is running when the fault occurs must refrain from addressing
any sharable disks. The disks that may be referenced are:
. Any non-sharable disk
. The disk being used as the front-end file system (see section 4.2)
This will be enforced by means of a new unit status bit (a bit in UDBSTS).
When this bit is set, all references to a sharable
disk unit will be returned by PHYSIO with an error indication. Therefore,
a unit that is not usable when the fault is detected may be made usable
by switching its port selection switch to either the A only or the B only
position.
The errors reported back by PHYSIO will likely be manifested as
"IO data errors" to the users.
The monitor should also report this to the operator via a unique
BUGCHK so that the operator may decide whether to change
the port selection or not.
4.2 The front-end file system disk pack
As noted in section 4.1, TOPS-20 needs to know which pack, if any,
contains the "in use" front-end file system. The front end's file system
may be:
. on a floppy
. on a drive available only to the front-end
. on a drive shared by the front-end and the KL10
In addition, any valid TOPS-20 structure may contain a front-end file
system, but at most one of these structures may be in use by the
front-end.
It may not be possible for TOPS-20 to determine which of the valid
structures is being used by the front-end. However, certain configuration
measures can eliminate the ambiguities. Following is the algorithm
for making the determinations:
. TOPS-20 must determine from the -11 reload word the boot
device for the front-end. This word is provided by RSX20F
during "protocol intialization" and is saved by DTESRV in
the "comm region".
. If the boot device is a disk, the reload word will contain
the RH11 unit number. This unit number will be the same
as the RH20/Massbus unit number.
. At this point we have to rely on configuration guidelines.
If TOPS-20 can detect only one drive with the specified
unit number that also contains a valid front-end file system,
then it may assume that pack is being shared with RSX20F.
Note that this does not have to be true. The drive being
used by RSX20F could be connected to only the RH11, and the
drive that TOPS-20 can "see" could be connected to another
KL10. However, either we state and require these configuration
rules, or we have to forego many important error recovery
procedures.
One way to obviate some of the ambiguities would be for RSX20F
to also relate whether the drive it is using for its file
system is programmable. If it is, then it is reasonable to assume
that the other port is connected to the KL10 (again, this is not
necessarily so, but any other configuration makes little or no sense).
One remaining problem is the collection of drives connected to the
RH11 and to the KL10 that are not being used by the front-end. Although it
is perfectly safe for a "cut off" CFS node to continue using any such
disk drive, it cannot tell that the drive is not connected to another
KL10 and therefore cannot use the drive. This implies that a "cut off"
CFS may not access drives shared with the front-end's RH11 unless the
drive is first locked onto the RH20 port. This limitation will preclude
performing certain files-11 maintenance functions together with
routine time-sharing access to these drives.
5.0 Choice 2
This section is a brief outline of the steps required to implement choice
2. This is not meant to be an exhaustive description of the algorithm,
but rather a sketch of the required elements.
5.1 Conflict detection
A processor that is cut off from the CI has only one means of communicating
with the system sharing a given Massbus disk: the disk drive itself.
Therefore, each processor attached to a sharable Massbus disk must
signal its presence now and then so that the other system, when need be,
may detect that presence.
The most straightforward way to do this is for each processor attached
to a sharable disk to guarantee that it will change some register
in the drive periodically. That is, each processor must either
use that register to accomplish a data transfer, or it must
store a canonical value into the register. If we select the
"drive command register" as the "keep-alive" register, then
each processor must either store a drive command (i.e. do a
legal operation) or store its CI drop number in the register (it would
do this without setting the "go" bit so that the drive would not interpret
this datum as a command).
Then, a system that wishes to determine if a processor is attached to
the other port would:
. write its CI drop number in the drive command register
. wait 5 seconds.
. read the drive command register. If it has been changed, then
the other port is in use.
. if it is unchanged, and this is the second poll, the drive
is not being used. If it is the first poll, repeat steps 2 and beyond.
This algorithm will determine whether the processor that has been "cut
off" may continue to use the drive.
5.2 Ownership
The technique in 5.1 does not, however, prevent the case where
the other processor is not detectable because it is not running.
In this case, when this other processor is loaded, and if it
is connected to the CI, it will conclude it may use the drive also.
This will again create the case of two processors using a drive
without the benefit of a concurrency control procedure.
Therefore, we must use the medium itself to detect and manage this case.
In the case where a processor is being loaded and it discovers
a valid medium on a sharable drive, or the case where a valid
medium appears on a previously off-line drive, a concurrency
control algorithm must be executed by the processor or processors
attached to the drive.
The only case of interest is that of a "cut off" processor having
concluded it may use a disk drive, verified the medium on the drive,
and made the data available to the system. All other cases are
covered by the drive conflict algorithm.
Following is a sufficient concurrency control algorithm (a sketch in C
appears after the list):
. if this a "cut off" processor, and the drive
is sharable, but not in use on the other port,
write a predefined word in the home block with a
code to indicate this use.
. when a "CI connected processor" is loaded, for each
"sharable" drive that is being used on the other port
and that has a valid medium mounted, it must read the
home block to see if the medium is being used
by a "cut off" processor.
Note that this algorithm has the problem that disk packs
may be dismounted without the "cut off" word being reset. This
will happen if the "cut off" processor crashes before the structure
is formally dismounted. In this case, it may not be possible to
mount the pack again. However, this case can be detected and reported
by OPR, thereby giving the operator the choice of forcibly
clearing the restriction.
5.3 Conclusion
As can be seen, choice 2 requires a significant effort in the monitor.
Perhaps the technique presented could be refined some more to
reduce the work and the risk, but in any event it will be more
complicated than choice 1. The nature of Massbus disks would
seem to militate against supporting any such "high availability"
option and therefore, TOPS-20 will not do so.
CFS-RESOURCES.TXT
CFS Resources
David Lomartire
7-Nov-84
================
Files:
o File open token..............Page 2
o Frozen writer token..........Page 4
o File access token............Page 5
Directory Locks:
o Directory lock token.........Page 14
o Directory allocation token...Page 16
Structures:
o Structure name token.........Page 22
o Drive serial number token....Page 22
BAT Blocks:
o BAT Block lock token.........Page 26
ENQ/DEQ:
o ENQ Token....................Page 28
Appendix A
o Flow chart of CFSGET.........Page 29
Page 1
1. Files - An opened file has one or two CFS file resources (file open token
and possibly frozen writer token) and a file access token for each active OFN
(at least one). The field SPTST of SPTO2 holds the access specified in the file
access token. While DDMP is performing CFS-directed operations on a file, all
pages of the OFN are inaccessible to any other process. This is achieved by a
bit SPTFO, set in SPTO2 by DDMP. FKSTA2 will contain the resource block address
(when appropriate) of the CFS resource the fork is waiting on.
Below is the table of contents fragment from CFSSRV which illustrates the
various file related routines:
15. File open resource manager
15.1. CFSGFA (Acquire file opening locks) . . . . . 83
15.2. CFSFFL (Release file locks) . . . . . . . . . 84
15.3. CFSURA (Downgrade to promiscuous) . . . . . . 85
16. Frozen writer resource manager
16.1. CFSGWL (Get write access) . . . . . . . . . . 86
16.2. CFSFWL (Free write access). . . . . . . . . . 87
18. File access token resource manager
18.1. CFSGWT (Get write token value). . . . . . . . 91
18.2. CFSAWP/CFSAWT (Acquire write token) . . . . . 92
18.3. CFSDWT (Write token revoked). . . . . . . . . 95
18.4. CFSOVT (Approve sharing of OFN resource). . . 95
18.5. CFSGOC (Get count of resource sharers). . . . 96
18.6. CFSDAR (Optional data for access token) . . . 97
18.7. CFSFWT (Free write token) . . . . . . . . . . 97
18.8. CFSUWT (Release access token) . . . . . . . . 98
18.9. CFSBOW (Broadcast OFN update) . . . . . . . . 100
18.10. CFSBEF (Broadcast EOF). . . . . . . . . . . . 101
18.11. CFSBRD (Main broadcast routine) . . . . . . . 102
18.12. CFSFOD (DDMP force out done). . . . . . . . . 103
Page 2
File Open Token
---------------
CFS routines: CFSGFA - Acquire file opening locks
CFSFFL - Release file locks
CFSURA - Downgrade to promiscuous (unrestricted, OF%RDU)
When the file is opened via OPENF%, it will have one of the following open
types assigned to it:
Open type (spec)    CFS term (code)               OPENF% term   OPENF% bits
----------------------------------------------------------------------------
shared read         .HTOSH - read-only shared     Frozen        not OF%THW
shared read/write   .HTOAD - full sharing         Thawed        OF%THW
exclusive           .HTOEX - exclusive            Restricted    OF%RTD
promiscuous         .HTOPM - promiscuous read     Unrestricted  OF%RDU
local exclusive     1B0!.HTOEX - local exclusive  --            OF%DUD
** CFSGFA - Acquire file opening locks **
Called by: GETCFS in PAGUTL
Upon entry to CFSGFA, the access type is converted into one of the access
codes shown above. Next, HSHLOK is called to see if a file open token already
exists for this file. If it does, a call is made to CFSUGD to upgrade the
already existing access to the new access which is requested.
If a file open token does not already exist, CFSSPC is called to get a
short request block. CFSSPC will return a block with HSFLAG, HSHPST, and HSHOKV
zeroed as well as HSHRET set to 1,,SHTADD. Then, the following is placed in the
block:
HSHROT six-bit structure name
HSHQAL index block address
HSFLAG HSHTYP=access, HSHLCL set if local exclusive
Finally, CFSGTT is called to get the token (with "try only once" set).
(Note, if the structure is set local exclusive, CFSGTT will discover this and
use CFSGTL. This will mean that HSHVTP will not be updated with the access of
the vote since no vote is required.) If the token is acquired, the following
has been updated in the resource block:
HSFLAG HSHVTP=access of vote, HSHCNT incremented (owned)
HSHFRK FORKX of running fork
HSHTIM TODCLK stamp when vote approved and token obtained
If, upon return from CFSGTT, we do not need the newly created block (T1 is
non-zero), SHTADD will be called to return the extra block to the CFS pool.
Page 3
** CFSFFL - Release file locks **
Called by: FRECFS in PAGUTL
The routine CFSFFL simply calls CFSNDO to release the file open token. See
the discussion of CFSAWT/CFSAWP for a description of CFSNDO.
** CFSURA - Downgrade to promiscuous (unrestricted, OF%RDU) **
Called by: RELOFN in PAGUTL when OFOPC in SPTO2 is 0
(no more "normal" (non-unrestricted) openings)
The routine CFSURA is called by PAGEM to downgrade the access of a file
open token to .HTOPM whenever all open OFNs are closed. It calls CFSUGD with a
new access of .HTOPM and decrements HSHCNT when the new access is obtained.
Page 4
Frozen Writer Token
-------------------
CFS routines: CFSGWL - Get write access
CFSFWL - Free write access
If a file is opened for frozen write (OF%WR and not OF%THW), then the
frozen writer token is acquired after the file open token is obtained. This is
an exclusive access token that represents the single "frozen write" user of the
file. It is held only by the system which has the file open for frozen write.
** CFSGWL - Get write access **
Called by: CHKACC and GETCFS in PAGUTL
Upon entry to CFSGWL, a short resource block is obtained via a call to
CFSSPC. CFSSPC will return a block with HSFLAG, HSHPST, and HSHOKV zeroed as
well as HSHRET set to 1,,SHTADD. The following is then placed in the block:
HSHROT six-bit structure name
HSHQAL FILEWL+index block address
HSFLAG HSHTYP=.HTOEX
Finally, CFSGTT is called to get the token (with "try only once" set).
(Note, if the structure is set local exclusive, CFSGTT will discover this and
use CFSGTL. This will mean that HSHVTP will not be updated with the access of
the vote since no vote is required.) If the token is acquired, the following
has been updated in the resource block:
HSFLAG HSHVTP=access of vote, HSHCNT incremented (owned)
HSHFRK FORKX of running fork
HSHTIM TODCLK stamp when vote approved and token obtained
If, upon return from CFSGTT, we do not need the newly created block (T1 is
non-zero), SHTADD will be called to return the extra block to the CFS pool.
** CFSFWL - Free write access **
Called by: RELOFN and FRECFS in PAGUTL
The routine CFSFWL simply calls CFSNDO to release the write access token.
See the discussion of CFSAWT/CFSAWP for a description of CFSNDO.
Page 5
File Access Token (OFN access token)
------------------------------------
CFS routines: CFSAWT/CFSAWP - Acquire/Acquire and reserve access token
CFSUWT - Release access token
CFSFWT - Free write token
CFSDWT - Write token revoked (callback)
CFSOVT - Approve sharing of OFN (callback)
CFSDAR - Optional data for access token (callback)
CFSBOW - Broadcast OFN change
CFSBEF - Broadcast EOF
CFSFOD - DDMP force out done
Each active OFN has an access token. It may be in one of the following
modes:
* place-holder - .HTPLH (this value must be zero!)
* full sharing - .HTOAD
* exclusive (read or write) - .HTOEX
The location CFSOFN points to a table which is NOFN long and is indexed by
OFN. It contains the address of the resource block which describes that OFN.
** CFSAWT/CFSAWP - Acquire/Acquire and reserve access token **
Called by: NEWLFP in DISC for .HTOEX access (CFSAWP)
UPDLEN in DISC for .HTOEX access
GETLEN in DISC for .HTOAD access
MAPBTB in DSKALC for .HTOEX access (CFSAWP and CFSAWT)
RELMPG in PAGEM for .HTOAD access (CFSAWP)
NTWRTK in PAGEM for .HTOEX access
NIC in PAGEM for .HTOEX access
OFNTKN in PAGUTL for .HTOAD access
DDXBI in PAGUTL for .HTOAD access (CFSAWP)
UPDPGS in PAGUTL for .HTOAD access (CFSAWP)
ASGOFN in PAGUTL for .HTOAD access (CFSAWP)
LCKOFN in PAGUTL for .HTOAD access (CFSAWP)
MRKOFN in PAGUTL for .HTOAD access (CFSAWP)
The routines CFSAWT and CFSAWP acquire the access token. The latter leaves
the resource block reserved on the system. The former does not.
Upon entry, the access type is checked. If zero, full shared (.HTOAD)
access is acquired. If not equal to zero, exclusive (.HTOEX) access is
acquired. Next, the SPTFO bit in SPTO2 is checked to see if DDMP is forcing
this OFN to disk. If so, the fork goes into WTFOD wait. Otherwise we proceed to
look up, in the CFS OFN table (pointed to by CFSOFN), the address of the resource
block for this OFN. If none exists (entry is zero), we continue at CFSAW1 and
add an entry.
Page 6
At CFSAW1, GNAME is called to get the structure name. Then CFSSPC is
called to obtain a short resource block. CFSSPC will return a block with
HSFLAG, HSHPST, and HSHOKV zeroed as well as HSHRET set to 1,,SHTADD. The
following is then placed in the block:
HSHROT six-bit structure name
HSHQAL FILEWT+index block address
HSFLAG HSHTYP=access, HSHKPH set
HSHCOD OFN
HSHPST 1,,CFSDWT
HSHOKV 1,,CFSOVT
HSHCDA 1,,CFSDAR
HSHOP1 0 (transaction number)
Next, a call is made to get the resource. CFSGET is called if the
structure is shared. If the structure is set exclusive, CFSGTL is used. (Note
that if CFSGTL is called, HSHVTP will not be updated with the access of the
vote since no vote is required.) The call is made to "retry until successful"
so, upon the return, we have acquired the resource. The following has been
updated in the resource block:
HSFLAG HSHVTP=access of vote, HSHCNT incremented (owned)
HSHFRK FORKX of running fork
HSHTIM TODCLK stamp when vote approved and token obtained
Now, HSHFCT is set to be TODCLK+WRTTIM. A check is made to see if optional
data was returned during the vote. The optional data will be the most recent
value of OFNLEN in HSHOPT and the transaction number in HSHOP1. This value
represents what the other node believes OFNLEN is for that OFN. If there is
optional data present from the vote (HSHODA is set), this must be the most
current value of the file length and it is stored in OFNLEN. In this case, the
other node had more recent information than us so we must update our copy of
OFNLEN. Otherwise, no other copy exists and we initialize the optional data of
the resource block to contain the current OFNLEN entry for that OFN. HSHOP1 is
incremented to initialize the transaction number. In this case, no other node
had more recent information than us so we establish ourselves as the node which
knows the state of OFNLEN. Note that if this access token was for a bit table,
the structure free count must also be maintained or established. The callback
routine CFSDAR will insure that STOFRC is called to update the structure free
count if appropriate.
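The OFNLEN reconciliation just described can be sketched in C as follows; the
structure fields are invented analogues of HSHODA, HSHOPT, and HSHOP1.

    #include <stdbool.h>
    #include <stdint.h>

    struct resblk {
        bool     has_opt;   /* HSHODA: optional data present in vote */
        uint64_t opt_len;   /* HSHOPT: most recent OFNLEN value      */
        uint64_t opt_txn;   /* HSHOP1: transaction number            */
    };

    void reconcile_ofnlen(struct resblk *b, uint64_t *ofnlen_entry)
    {
        if (b->has_opt) {
            /* Another node had more recent information: adopt it. */
            *ofnlen_entry = b->opt_len;
        } else {
            /* No other copy exists: we establish the value. */
            b->opt_len  = *ofnlen_entry;
            b->opt_txn += 1;    /* initialize the transaction number */
        }
    }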
If CFSAWT was originally called, CFSNDO will be called to undeclare the
resource. CFSNDO will decrement HSHCNT and HSHNBT will be "searched" looking
for previously rejected hosts to notify (via CFNOHN). If CFSAWP was originally
called, the resource will remain owned by the fork. In either case, the
resource block remains in the hash table and HSHKPH will remain set. The only
distinction is whether we own the resource or not.
Finally, the SPTST field in SPTO2 is set to the correct state; .SPSWR (2)
for .HTOEX access tokens or .SPSRD (1) for .HTOAD access tokens.
Page 7
The description above describes what occurs when CFSAW1 is entered to add
a new OFN entry. However, there is another option. If CFSOFN already contains
the address of a resource block, then this OFN has already been added and is
known to this CFS system.
First, the post callback address (HSHPST) is checked and set to 1,,CFSDWT
if it was zero. The block is then locked against removal by either CFSRSE or
CFSUWT by incrementing HSHLKF. Both of these routines can remove resource
blocks from the hash table. So, we are locking the resource block against
possible removal from the hash table via the use of HSHLKF. Now, a check is
made to see if there is anyone waiting for this block. This is done by checking
HSHTWF, HSHUGD, and HSHWVT. HSHTWF gets set when the fork is going to go into a
wait state on that block (like CFSRWT). The setting of STKVAR WTFLAG indicates
to CFSUGW (the wait routine) if we set HSHTWF. If WTFLAG is -1, HSHTWF was
already set and if it is zero, we set HSHTWF. HSHTWF is also set when we call
CFSUGD to upgrade access to the token. HSHUGD is set when we are performing an
upgrade vote on the resource. HSHWVT is set when we are voting on a resource.
So, if there is someone waiting on this block, we continue at CFSUGW. We
pass into CFSUGW the address of the wait routine; in this case CFGVOT. In
CFSUGW, we place TODCLK+^D500 in HSHTIM and call CFSWUP (the general wait
routine) to wait until the vote has completed. Upon return from CFSWUP, the
block will no longer be in a "wait state". We will clear HSHTWF if we had set
it (WTFLAG=0) and check HSHPST to see if the block has been released (it will
be zero if so). If it has not been released, we unlock it by decrementing
HSHLKF, clear some bits which could be left over from voting (HSHRTY and
HSHVRS) and start over again at CFSAWL to try to acquire the access token. If
the block has been released, we get the value of HSHLKF and decrement it. If
the resulting value is non-zero, we are not the last locker, so we couldn't
obtain the access token and return without changing the access token we
currently have (current access reflected in SPTO2). If the decremented value of
HSHLKF is now zero, then we are the "owner" of the resource block. The block
address in CFSOFN will be cleared and CFSRMV will be called to remove the block
from the hash table. Again, we will return without changing the access of the
OFN. CFSRMV will not post the removal.
Page 8
If there is no one waiting on the resource, we can proceed to try to
upgrade our access. First, we check STRTAB to see if the structure is mounted
exclusively. If so, we make HSHTYP be exclusive (.HTOEX). This is done
regardless of the access we were asking for because access on an exclusive
structure is always .HTOEX (it needs to be nothing other because this is the
only node which can use this structure). We will never be refused access to an
OFN on an exclusive structure due to setting HSHTYP to .HTOEX (as shown below).
Now, we check HSHTYP to see what kind of access we hold on the resource.
If it is exclusive (.HTOEX), then we are granted access. If CFSAWP was called,
HSHCNT is incremented in order to hold ownership. Finally, the OFN access is
set in SPTO2.
If HSHTYP is not exclusive, we will have to upgrade our access to the OFN.
First we set HSHTWF to indicate we are processing this resource block. Next we
call CFSUGD to try to upgrade our access. CFSUGD will modify HSHTYP and HSHCNT
to reflect the state of the resource after the upgrade attempt. HSHTYP will
contain the current access and HSHCNT will contain the number of owners of the
resource.
If CFSUGD successfully allows the upgrade, we will clear HSHTWF and place
TODCLK+WRTTIM in HSHFCT. The block will be unlocked via a decrement of HSHLKF
and, if CFSAWT was originally called, HSHCNT will be decremented so that the
resource will not be held. If HSHCNT is zero, CFNOHS will be called to notify
any nodes which were rejected in the interim. Next, if any optional data was
returned in the upgrade vote, it is placed in OFNLEN. This would be the file
length for that OFN. Finally, the OFN access in SPTO2 is updated to reflect the
new access state.
If CFSUGD does not allow the upgrade, HSHWTM is checked to see if a retry
wait time was given. If so, it is added to TODCLK and placed in HSHTIM.
Otherwise, TODCLK+^D20 is used. Processing will now continue at CFSUGW, with
the wait address specified as CFSRWT. CFSRWT will awaken under one of the
following conditions:
1. The block can no longer be found in the hash table (it has been
released)
2. The same block is found by HSHLOK (address match) and:
a) HSHVRS or HSHRTY is set in the resource block
b) HSHTIM is less than or equal to TODCLK
3. A different block is found by HSHLOK and:
a) HSHCNT is zero
b) HSHTYP is not .HTOEX but does match the desired access
Page 9
** CFSUWT - Release access token **
Called by: FRECFS in PAGUTL
When a file is closed, CFSUWT is called to release the access token. The
resource block is found (via HSHLOK) and the OFN is retrieved from HSHCOD. The
SPTO2 entry for this OFN is checked to see if anyone is waiting for this access
token. This is indicated by the field SPTFR being set which indicates that CFS
has requested a DDMP force out to be done for that access token. (SPTFR is set
in routine CFSDWT and cleared in CFSFOD.)
If SPTFR is set then a force out has been requested for this OFN. At this
point we clear the bit in OFNCFS which indicates which OFN DDMP should force
out and call CFSFDF to signal that the force out is done. (CFSFDF is an
alternate entry point to CFSFOD which will always signal that the force out is
done regardless of the number of sharers remaining on that OFN. In effect, it
"forces" the force out regardless of the current number of sharers of that OFN.
CFSFOD will only signal that the force out is done when there are 2 or fewer
sharers indicated by HSHCNT.) (OFNCFS is a multi-word bit mask scanned by DDMP
to determine which OFNs need to be forced out.) Finally, CFSUWT is continued at
the beginning to try to release the access token again.
If SPTFR is zero, then this OFN is not being forced out. We call CFNOHS to
notify any hosts which we rejected for this OFN. Next we check HSHLKF to see if
the resource block is locked. This will be set by CFSAWT/CFSAWP to prevent the
block from being removed from the hash table. If the block is locked, we simply
clear HSHTYP, HSHCNT, HSHKPH, and HSHPST. By clearing HSHKPH, the block is
eligible for removal as stale by routine CFSRSE. If the block is not locked, we
clear the corresponding entry in CFSOFN and remove the block from the hash
table via a call to CFSRMV. CFSRMV will not post the removal.
** CFSFWT - Free write token **
Called by: DDXBI in PAGUTL
UPDPGS in PAGUTL
ASGOFN in PAGUTL
ULKOFN in PAGUTL
UMPBTB in DSKALC
The routine CFSFWT simply calls CFSNDS to release the write access token.
CFSNDS is an alternate entry point to CFSNDO which will employ a "fairness
test" when notifying other nodes of the resource release. It will set HSHRFF in
the block just before calling CFNOHS. CFSFWT is the only routine that calls
CFSNDS. See the discussion of CFSAWT/CFSAWP for a description of CFSNDO.
Page 10
** CFSDWT - Write token revoked (callback) **
Called by: CFSRTV when we want to release the resource
(Note: CFSRSE has the ability to call a post
routine but it should never be called
for a file access token since HSHKPH
will be set and this will prevent
the block's removal. In fact, if
CFSDWT were to be called, incorrect
or needless DDMP action could result.)
This routine is the callback routine placed in HSHPST when the access
token is formed. To post removals, CFSRMX is called. CFSRMX is the alternate
entry point to CFSRMV used to do posting of removals. CFSRMX will insure that
CFSDWT is called to do any cleanup that is needed before the resource block is
removed from the hash table.
CFSDWT simply invokes DDMP to force out the pages of the OFN being
released. If the resource is a place holder, then nothing is done and CFSDWT
just returns. Otherwise, SPTFR is set in SPTO2 for the OFN. Note that SPTFR is
a two bit field (bits 22 and 23) so both bits will be set. Next, if we own the
resource exclusively (HSHTYP) and the vote request is for any access other than
exclusive (HSHVTP), we will clear bit 22, which is named SPTSR. If we do not
own it exclusively (we own it .HTOAD) then SPTSR will remain set.
SPTSR is checked by routine DDOCFS (in PAGUTL) when deciding what to do
with the copy of the pages. If SPTSR is zero, then the vote request was not for
exclusive access (and we had .HTOEX access) so UPDPGY is called to update to
disk any modified pages and set any in memory pages to "read-only" (via the use
of the CST write bit). (In the access token state transition table in the spec,
this is shown as DDMP**.) This will result in a new access for the OFN of
.HTOAD (full sharing) on the processor which used to own the resource
exclusively. If SPTSR is set, then the vote request was for exclusive (or we
only had .HTOAD access) so UPDPGX is called to update to disk any modified
pages and remove all the OFN pages from memory. (In the access token state
transition table in the spec, this is shown as DDMP*.) This will result in a
new access for the OFN of .HTPLH (place-holder) on the processor which used to
own the resource exclusively.
Finally, DDCFSF is incremented to wake up DDMP and the appropriate OFN bit
in the OFNCFS bit-mask is set to indicate to DDMP which OFN to force out. Also,
T1 is set to zero. This is important because, upon return to CFSRMX, T1 is
checked and, if zero, a -1 will be placed in T1 upon return to CFSRTV and the
block will not be removed from the hash table. This value in T1 is taken to be
the vote type which is placed in .CFTYP. So, this is how a -1 (or conditional
yes) is generated; namely, a vote comes in which causes an access token to be
released and requires DDMP to run. When this is done, CFSFOD will send a
"condition satisfied" message (.CFTYP = -2) to indicate that the force out is
done. (The appropriate HSHDLY bit for the node is set when the -1 is received
and cleared when the -2 is received. This is done in CFSRVT.)
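
The SPTFR/SPTSR bookkeeping is compact enough to restate in C. In this
sketch SPTFR is modeled as a two-bit field whose high bit stands in for
SPTSR; the access values and helper names are likewise stand-ins.

    #include <stdbool.h>

    #define HTOEX 2             /* stand-ins for .HTOEX / .HTOAD     */
    #define HTOAD 1
    #define SPTSR 2u            /* high bit of the 2-bit SPTFR field */

    struct token { int hshtyp, hshvtp; };  /* owned / vote access    */
    struct ofn   { unsigned sptfr; };      /* 2-bit force-out field  */

    static void updpgx(void) { } /* write back and evict all pages   */
    static void updpgy(void) { } /* write back, mark pages read-only */

    /* CFSDWT: request the force out, recording how much DDMP must do. */
    void cfsdwt(const struct token *t, struct ofn *o)
    {
        o->sptfr = 3;           /* set both bits (22 and 23)         */
        if (t->hshtyp == HTOEX && t->hshvtp != HTOEX)
            o->sptfr &= ~SPTSR; /* clear SPTSR: sharing will do      */
    }

    /* DDOCFS: consult SPTSR when the force out actually runs. */
    void ddocfs(const struct ofn *o)
    {
        if (o->sptfr & SPTSR)
            updpgx();           /* DDMP*:  we drop to .HTPLH         */
        else
            updpgy();           /* DDMP**: we keep .HTOAD access     */
    }
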
Page 11
** CFSOVT - Approve sharing of OFN (callback) **
Called by: CFSRTV when vote is to be approved
This routine is placed in HSHOKV when the access token is formed. The
routine is called when the vote is to be approved in order to place this node's
optional data in the vote packet.
If HSHOP1 is non-zero, then this node has some copy (it may be old) of the
file length information (OFNLEN for that OFN). Both HSHOPT (the file length)
and HSHOP1 (the transaction number) are placed in CFDAT and CFDT1 of the CFS
send packet. Also, the CFODA flag will be set to indicate that optional data is
present. If HSHOP1 is zero (no transaction number) then this node has no
optional data to contribute concerning the file length, and CFDT1 (the file
length transaction number) is set to zero. The explicit zero is needed because
the packet can also carry a structure free count, so the CFODA flag alone does
not say which data is valid; a transaction number of zero insures that the
file length data will be ignored.
Finally, the HSHBTF flag is checked to see if this is a bit table OFN. If
it is, SNDFRC will be called to set up the send packet with the structure free
count data. SNDFRC will call GETFRC and place the returned value of the free
count in CFDST0 of the send packet. Next, the current transaction number
for the structure free count is retrieved from the CFSSTR table. This table,
indexed by structure number, contains the transaction number values for each
structure. This transaction value is placed in CFDST1 and, if we are the
exclusive owner of the OFN, it is incremented. This will insure that the count
is updated on the remote system. Finally, CFODA is set to indicate there is
optional data present in the vote packet.
This optional data information will be processed by the CFSDAR callback
routine (described below).
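
A C sketch of the sender's side of this transaction-number scheme follows.
The packet and token structs mirror the fields named above but are
illustrative stand-ins, not real monitor layouts.

    #include <stdbool.h>
    #include <stdint.h>

    struct votepkt {
        bool     cfoda;         /* "optional data present" flag      */
        uint32_t cfdat, cfdt1;  /* file length and its txn number    */
    };
    struct token { uint32_t hshopt, hshop1; }; /* cached length, txn */

    /* CFSOVT: attach this node's file length data to an approving
       vote. The structure free count (SNDFRC) follows the same
       pattern, with the extra twist that the exclusive owner
       increments the txn number so the remote takes the new count. */
    void cfsovt(const struct token *t, struct votepkt *p)
    {
        if (t->hshop1 != 0) {   /* we hold some copy of the length   */
            p->cfdat = t->hshopt;
            p->cfdt1 = t->hshop1;
            p->cfoda = true;
        } else {
            p->cfdt1 = 0;       /* txn 0: receiver must ignore the   */
        }                       /* length even if CFODA is set       */
    }
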
** CFSDAR - Optional data for access token (callback) **
Called by: CFSRVT when a vote packet arrives with optional data
This routine is placed in HSHCDA when the access token is formed. The
routine is called when the vote packet arrives in order to process the optional
data in the packet.
First, HSHBTF is checked to see if this is a bit table. If it is, then the
structure free count may have to be updated. If structure free count data is
present (CFDST1 non-zero) and if the remote node's structure free count
transaction number is greater than our own (CFDST1 > CFSSTR(str)), then our
copy of CFSSTR is updated and STOFRC is called to update the free count.
Next, we continue at CFADAR and check CFDT1 to see if any file length
optional data is present. If it is and the transaction number is greater than
ours, we return +2 to CFSRVT and store the data and transaction number in
HSHOPT and HSHOP1 and set HSHODA to indicate optional data is present. (This
data will be taken out of the resource block and placed in OFNLEN by
CFSAWT/CFSAWP). If the transaction number in the packet is not greater than
ours, we return to CFSRVT and ignore the optional data.
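
The receiving side is symmetric: newer transaction numbers win, while ties
and stale data lose. A minimal C sketch, again with stand-in names:

    #include <stdbool.h>
    #include <stdint.h>

    struct token {
        uint32_t hshopt, hshop1; /* cached value and its txn number  */
        bool     hshoda;         /* optional data present in token   */
    };

    /* Receive path (CFSDAR here, CFADAR for allocations): data is
       taken only when the sender's txn number is strictly newer
       than ours. Returns true for the +2 return described above. */
    bool take_optional(struct token *t, uint32_t value, uint32_t txn)
    {
        if (txn <= t->hshop1)   /* stale, equal, or absent (txn 0)   */
            return false;       /* ignore the optional data          */
        t->hshopt = value;      /* CFSAWT/CFSAWP will later move     */
        t->hshop1 = txn;        /* this into OFNLEN                  */
        t->hshoda = true;
        return true;
    }
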
Page 12
** CFSFOD - DDMP force out done **
Called by: DDOCFS in PAGUTL
This routine is responsible for signaling that DDMP has completed the
force out of OFN pages. The corresponding entry in CFSOFN is checked to see if
an access token exists for it. If the table entry is empty, then this node no
longer has the resource so SPTST (the OFN access) and SPTFR (the DDMP force out
flag bits) are cleared and CFSFOD returns successfully.
If there is a CFSOFN table entry, it contains the address of the resource
block for the access token. We retrieve the number of sharers of the resource
from HSHCNT and, if greater than 2, we cannot signal success so we return
failure. Next, any optional data that is present in the access token (HSHOPT
and HSHOP1) is placed in a newly acquired vote packet (obtained via a call to
GVOTE1). Note that if HSHODA is zero, then there is no optional data in the
token so the file length information is obtained directly from OFNLEN. If we
own exclusive access to the OFN, then the transaction number (HSHOP1) is
incremented to insure that the remote system will use the optional data we are
providing since it is the most current. If this is a bit table OFN, SNDFRC is
called to place structure free count optional data into the vote packet.
Next, we clear HSHRFF and set the new access of the OFN. If SPTSR is set,
then HSHTYP is set to .HTPLH and SPTST is set to 0. If SPTSR is not set, then
HSHTYP becomes .HTOAD and SPTST becomes .SPSRD. Finally, we clear SPTFR,
decrement HSHCNT to "unown" the resource and call SCASND to send the vote
packet. The packet type code is a -2, a "condition satisfied" message used to
indicate to the remote that the DDMP force out has completed.
We will then return successfully.
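
In outline, CFSFOD is a guarded downgrade. The C sketch below restates the
flow; the constants and helper are stand-ins, and the optional-data fill of
the vote packet is elided.

    #include <stdbool.h>

    #define HTPLH 0             /* stand-ins for .HTPLH / .HTOAD     */
    #define HTOAD 1
    #define SPSRD 1             /* stand-in for .SPSRD               */
    #define SPTSR 2u            /* high bit of the SPTFR field       */

    struct token { int hshtyp; unsigned hshcnt; bool hshrff; };
    struct ofn   { unsigned sptfr; int sptst; };

    static void send_condition_satisfied(void) { /* .CFTYP = -2 */ }

    /* CFSFOD: DDMP has finished forcing out this OFN's pages.
       Returns false while more than two sharers remain. */
    bool cfsfod(struct token *t, struct ofn *o)
    {
        if (t->hshcnt > 2)
            return false;       /* cannot signal success yet         */
        t->hshrff = false;
        if (o->sptfr & SPTSR) { /* requester wanted exclusive:       */
            t->hshtyp = HTPLH;  /* we become a place-holder          */
            o->sptst  = 0;
        } else {                /* downgrade to full sharing         */
            t->hshtyp = HTOAD;
            o->sptst  = SPSRD;
        }
        o->sptfr = 0;           /* clear SPTFR                       */
        t->hshcnt--;            /* "unown" the resource              */
        send_condition_satisfied();
        return true;
    }
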
Page 13
2. Directory locks and directory allocation - Directory locks are now managed
by CFS and are a CFS resource. The old LOKTAB and its associated storage are
now gone. Each time a directory is locked or unlocked, a CFS resource is
created or
modified. The remaining directory allocation of each active directory is also a
CFS resource.
Below is the table of contents fragment from CFSSRV which illustrates the
various directory related routines:
14. Directory lock resource manager
14.1. CFSLDR (Lock directory) . . . . . . . . . . . 75
14.2. CFSRDR (Unlock directory) . . . . . . . . . . 77
14.3. CFSDAU (Acquire allocation entry) . . . . . . 78
14.4. CFAFND/CFAGET (Find/Get allocation table) . . 78
14.5. CFASTO (Store new allocation value) . . . . . 80
14.6. CFAULK (Unlock allocation entry). . . . . . . 80
14.7. CFAREM (Remove allocation entry). . . . . . . 80
14.8. CFAUPB (Undo keep here bit) . . . . . . . . . 80
14.9. CFAVOK (Vote to be approved). . . . . . . . . 81
14.10. CFADAR (Optional data present). . . . . . . . 81
14.11. CFARMV (Voter remove entry) . . . . . . . . . 81
14.12. GETDBK (Find resource block). . . . . . . . . 82
Page 14
Directory Lock Token
--------------------
CFS routines: CFSLDR - Lock directory
CFSRDR - Unlock directory
Directory locks are the only example of long resource blocks. Also,
directory locks are always exclusive (.HTOEX) resources. For the duration of
the lock, the process is CSKED.
** CFSLDR - Lock directory **
Called by: LCKDNM in DIRECT (via CALLRET)
First, a check is made to see if the resource already exists. If it does,
and it is in "use" (HSHWVT or HSHCNT are set), then a vote is required.
Otherwise, we can lock the directory and the following is updated in the
resource block:
HSFLAG HSHCNT incremented (owned)
HSHFRK FORKX of running fork
HSHTIM TODCLK stamp to indicate when resource acquired
If the resource is not known to this node, or the current resource block
for the directory is in use, then a vote is required. First, CFSSPL is called
to obtain a long resource block. CFSSPL will return a block with HSHLOS set in
HSFLAG; HSHPST and HSHOKV zeroed; and HSHRET set to 1,,LNGADD. The
following is then placed in the block:
HSHROT six-bit structure name
HSHQAL DRBASE + directory number
HSFLAG HSHTYP=.HTOEX
HSHCOD DRBASE
HSHFCT TODCLK+DIRTIM
Finally, CFSGTT is called to get the token with "retry until successful"
set so, upon return, we will have acquired the resource. (Note, if the
structure is set local exclusive, CFSGTT will discover this and use CFSGTL.
This will mean that HSHVTP will not be updated with the access of the vote
since no vote is required.) The following has been updated in the resource
block:
HSFLAG HSHVTP=access of vote, HSHCNT incremented (owned)
HSHFRK FORKX of running fork
HSHTIM TODCLK stamp when vote approved and token obtained
If, upon return from CFSGTT, we do not need the newly created block (T1 is
non-zero), LNGADD will be called to return the extra block to the CFS pool.
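
Schematically, CFSLDR is a fast local grant with a voting fallback. The C
sketch below shows only that shape; all names and the stub behaviors are
illustrative stand-ins.

    #include <stdbool.h>
    #include <stddef.h>

    struct resblk { bool hshwvt; unsigned hshcnt; int hshfrk; };

    /* Stub helpers; the real routines are described in the text. */
    static struct resblk *hsh_find(void) { return NULL; }
    static struct resblk *cfsspl(void)
        { static struct resblk b; return &b; }
    static void cfsgtt_retry(struct resblk *b) { b->hshcnt = 1; }

    /* CFSLDR in outline: grant locally when the block exists and is
       idle, otherwise build a long block and vote until it is won. */
    void cfsldr(int forkx)
    {
        struct resblk *b = hsh_find();
        if (b != NULL && !b->hshwvt && b->hshcnt == 0) {
            b->hshcnt++;        /* fast path: lock it locally        */
            b->hshfrk = forkx;
            return;
        }
        b = cfsspl();           /* build the .HTOEX long block       */
        cfsgtt_retry(b);        /* "retry until successful" vote     */
        b->hshfrk = forkx;      /* we now own the directory lock     */
    }
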
Page 15
** CFSRDR - Unlock directory **
Called by: ULKDNM in DIRECT (via CALLRET)
The routine CFSRDR simply calls CFSNDO to release the directory lock
token. For these long blocks, CFSNDO will do something extra before the call to
CFNOHS; it will check HSHBTT for any waiting forks to wake up. If it finds
one, CFNOHS will not be called to notify remotes of the token release. In this
way, forks on the local node have priority over forks on remote nodes
for acquiring directory locks. See the discussion of CFSAWT/CFSAWP for a
description of CFSNDO.
Page 16
Directory Allocation Token
--------------------------
CFS routines: CFSDAU - Acquire allocation entry
CFAFND/CFAGET - Find/Get allocation table
CFASTO - Store new allocation value
CFAULK - Unlock allocation entry
CFAREM - Remove allocation entry
CFAUPB - Undo keep here bit
CFAVOK - Vote to be approved (callback)
CFADAR - Optional data present (callback)
CFARMV - Voter remove entry (callback)
The remaining allocation of a directory is cached in memory to aid in
processing page faults and file page creation. For each active directory, there
is an allocation entry and this entry is a CFS resource. The optional data in
the resource block carries the remaining allocation for this directory.
** CFSDAU - Acquire allocation entry **
Called by: Various routines in PAGEM and PAGUTL.
Each call provides one of the following function
codes:
.CFAGT==:0 - Get and lock current allocation
.CFAST==:1 - Store allocation
.CFARL==:2 - Release allocation table
.CFARM==:3 - Remove entry
.CFAUP==:4 - Undo hold bit
.CFAFD==:5 - Find it
This routine is called and a function code is provided. Then, based on
this code, we dispatch off to the correct routine.
Function code Dispatch routine
------------- ----------------
.CFAGT CFAGET
.CFAST CFASTO
.CFARL CFAULK
.CFARM CFAREM
.CFAUP CFAUPB
.CFAFD CFAFND
Page 17
** CFAFND/CFAGET - Find/Get allocation table **
Called by: QLOK in PAGEM
ASGALC in PAGUTL (with CF%PRM set)
REMALC in PAGUTL
GETCAL/GETCAH in PAGUTL (with CF%NUL and CF%HLD set)
These routines are used to lock the allocation table. CFAFND differs from
CFAGET only in that it will not create a new resource block; it will only
succeed if the block already exists and can be found via the routine GETDBK.
Currently, CFAFND is never dispatched to. CFAGET will return +1 if a resched
took place during its operation and +2 if one did not.
Upon entry to CFAGET, the access is determined. If CF%HLD was specified in
the flag bits (the left half of T3, the Flags,,Operation word), then the
requested access is exclusive (.HTOEX). Otherwise, the access is for full
sharing (.HTOAD). GETDBK is called to obtain the address of the resource block.
If one does not exist, we continue at CFSDAO.
At CFSDAO, we will create a new resource block and vote for access to the
token. CFSSPC is called to get a short request block. CFSSPC will return a
block with HSFLAG, HSHPST, and HSHOKV zeroed as well as HSHRET set to
1,,SHTADD. Then, the following is placed in the block:
HSHROT six-bit structure name
HSHQAL DRBAS0+directory number
HSFLAG HSHTYP=access, HSHKPH set if CF%PRM specified
HSHPST 1,,CFARMV
HSHOKV 1,,CFAVOK
HSHCDA 1,,CFADAR
HSHOPT 0
HSHOP1 0 (transaction number)
Finally, CFSGTT is called to get the token (with "retry until successful"
set). (Note, if the structure is set local exclusive, CFSGTT will discover this
and use CFSGTL. This will mean that HSHVTP will not be updated with the access
of the vote since no vote is required.) When the token is acquired, the
following has been updated in the resource block:
HSFLAG HSHVTP=access of vote, HSHCNT incremented (owned)
HSHFRK FORKX of running fork
HSHTIM TODCLK stamp when vote approved and token obtained
If, upon return from CFSGTT, we do not need the newly created block (T1 is
non-zero), SHTADD will be called to return the extra block to the CFS pool.
Next, HSHLOK is called to retrieve the resource block and the access
desired in the call to CFSDAU is checked. If it is not CF%HLD, then we do not
want write access so we "unown" the resource by decrementing HSHCNT.
Page 18
Finally, HSHOP1 is checked to see if optional data was returned in the
voting process. If it is non-zero, then optional data (the current allocation)
was returned and is in HSHOPT. We return +1 from CFAGET (and CFSDAU) now with
T1 containing the current allocation, T2 the resource block address, and T3
the transaction number. If HSHOP1 is zero, then no optional data was returned.
This means that no other node had more recent information than us to
contribute, so we must establish ourselves as the node that holds the latest
information. The value of the allocation (passed into CFSDAU in T3 and held in
ALLC) is placed in HSHOPT and HSHOP1 (the transaction number) is incremented
only if this is not a temporary entry (CF%NUL was not specified in the flag
bits passed into CFSDAU). We return +1 with the allocation in T1 and resource
block address in T2.
The description above outlines what happens when CFAGET is invoked to
return the allocation for a resource which did not exist before on this node.
However, if one already exists (and is found by GETDBK) then the following
takes place.
First, we check the access bits set in T2 to see if this is a permanent
block (CF%PRM set) and, if so, HSHKPH is set in the HSFLAG word. A check is
then made to see if anyone is using this block by checking HSHTWF, HSHWVT, and
HSHUGD. If any of these bits are set, then we wait a while at routine CFGVOT.
This wait routine will wake up when HSHTWF, HSHWVT, and HSHUGD are all zero.
Upon wakeup, we try again at the top of the CFAGET routine.
If no one is waiting for this block, we check to see what kind of access
we have to this resource (in HSHTYP). If it matches the type of access we are
requesting, HSHCNT is incremented. Otherwise, our access must be upgraded. We
set HSHTWF to indicate we are waiting for this token, and call CFSUGD to
upgrade our access. If CFSUGD allows the upgrade, we will have obtained the
desired access to the resource and HSHTWF will be cleared. Otherwise, CFSUGD
has denied our upgrade attempt and we must wait. If HSHWTM is non-zero (the
wait time for when to try again) then this is placed in HSHTIM. If HSHWTM is
zero, then TODCLK+^D500 is placed in HSHTIM. Then we wait at CFSRWT. Upon
return from CFSRWT (see the discussion of file access token for the conditions
of return for CFSRWT), we try to get the token again by continuing at the top
of CFAGET.
Once we have acquired the desired access to the token, we check to see if
we wanted write access (CF%HLD set). If not, HSHCNT is decremented to "unown"
the resource. Next, our fork number is placed in HSHFRK. Finally, T1 is loaded
with the allocation (HSHOPT), T3 with the transaction number (HSHOP1), and T2
with the resource block address. We will return +1 if we ever had to enter a
wait routine or if we needed to upgrade our access. Otherwise, we will return
+2.
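
The seeding rule can be condensed into a few lines of C. The names below
are stand-ins; "temporary" corresponds to CF%NUL having been specified in
the flag bits.

    #include <stdbool.h>
    #include <stdint.h>

    struct token { uint32_t hshopt, hshop1; }; /* allocation + txn   */

    /* After the vote in CFAGET: either some node supplied newer data
       in the vote, or nobody knew better and we seed the token
       ourselves. "allc" is the caller's allocation value (ALLC). */
    uint32_t cfaget_alloc(struct token *t, uint32_t allc,
                          bool temporary)
    {
        if (t->hshop1 != 0)     /* optional data came back: it is    */
            return t->hshopt;   /* the freshest allocation known     */
        t->hshopt = allc;       /* seed with our own value           */
        if (!temporary)
            t->hshop1++;        /* claim the latest txn number       */
        return t->hshopt;
    }
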
Page 19
** CFASTO - Store new allocation value **
Called by: QSET in PAGEM
ADJALC in PAGUTL
This routine is used to place a new allocation value into the resource
block for a particular directory allocation token. Once the value has been
stored, the resource will be released. So, a store to an allocation entry has
an implied release following.
The block is located via a call to GETDBK. Then the allocation value
(which is passed in and resides in ALLC) is placed in HSHOPT. The transaction
number is incremented (HSHOP1) to insure that this is the most current entry
known. Finally, we fall through into CFAULK to release the resource. A
description of CFAULK follows.
** CFAULK - Unlock allocation entry **
Called by: QREL in PAGEM
GETCAL/GETCAH in PAGUTL
The routine CFAULK simply calls CFSNDO to release the directory allocation
token. See the discussion of CFSAWT/CFSAWP for a description of CFSNDO.
** CFAREM - Remove allocation entry **
Called by: REMALC in PAGUTL
GETCAL/GETCAH in PAGUTL
This routine removes a resource from the hash table. The resource block is
located via a call to GETDBK and HSHCNT is decremented. If the number of
sharers goes to zero, then no one is using this resource and CFSRMV is called
to remove it from the hash table. CFSRMV will not post the removal.
** CFAUPB - Undo keep here bit **
Called by: DASALC in PAGUTL
This routine is used to "unlock" a resource from the node. The resource
block is found via a call to GETDBK and the "keep here" bit (HSHKPH) is
cleared. (This will allow the routine CFARMV to signal to CFSRMX that this
resource block should not be held on the node and can be removed from the hash
table. CFSRMX is called from CFSRTV when an incoming vote results in releasing
the resource. If HSHKPH is set when CFARMV is called, the resource becomes a
place-holder since it was desired that the block always remain on this node.)
Finally, CFSNDO is called to undeclare the resource. See the discussion of
CFSAWT/CFSAWP for a description of CFSNDO.
Page 20
** CFAVOK - Vote to be approved (callback) **
Called by: CFSRTV when vote is to be approved
This routine is placed in HSHOKV when the allocation token is formed. The
routine is called when a vote is to be approved for this resource in order to
place this node's optional data (the allocation data held by this node) in the
vote packet. The "optional data present in vote" flag (CFODA) is set in the
vote packet so that the optional data will be noticed by CFSRVT when this vote
packet is received.
** CFADAR - Optional data present (callback) **
Called by: CFSRVT when a vote packet arrives with optional data
This routine is placed in HSHCDA when the allocation token is formed. The
routine is called when the vote packet arrives in order to process the optional
data in the packet.
The transaction number present in the packet is compared to our own in
HSHOP1. If the packet's value is greater than ours, we return +2 to CFSRVT and
store the data and transaction number in HSHOPT and HSHOP1 and set HSHODA to
indicate optional data is present. (This will be noticed by CFAGET and the
allocation will be returned to the caller.) If the transaction number in the
packet is not greater than ours, we return to CFSRVT and ignore the optional
data.
** CFARMV - Voter remove entry (callback) **
Called by: CFSRTV when we want to release the resource
(Note: CFSRSE has the ability to call a post
routine but it should never be called
for an allocation token since HSHKPH
will be set and this will prevent
the block's removal.)
This routine is the callback routine placed in HSHPST when the allocation
token is formed. To post removals, CFSRMX is called. CFSRMX is the alternate
entry point to CFSRMV used to do posting of removals. CFSRMX will insure that
CFARMV is called to do any cleanup that is needed before the resource block is
removed from the hash table.
Basically, CFARMV checks HSHKPH to see if the block should be kept on the
node. If not, CFARMV will return +1 and CFSRMX will remove the resource block
from the hash table and return indicating the resource is unconditionally
available. Otherwise, the block is to be kept on the system so HSHTYP is zeroed
(this sets the state to place-holder; .HTPLH) and we return +2 to CFSRMX. CFSRMX
will then zero HSHCNT and HSHTYP and return to CFSRTV with 0 in T1. This value
will be used as the vote type (.CFTYP) to the other node; 0 indicates
unconditional yes. (Note that CFARMV assumes T1 contains the address of the
vote packet and is therefore non-zero, so T1 is deliberately left unchanged
before returning to CFSRMX. This is important because CFSRMX checks T1 and,
if it is zero, assumes this is a delayed yes return for a file access token.
CFARMV thus depends on T1 being non-zero when returning to CFSRMX; this
should never change.)
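
Pulling the last few sections together, the vote types and the T1
convention can be restated as a small C fragment; the enum and function
names are descriptive stand-ins.

    /* Vote types (.CFTYP) implied by the removal callbacks. */
    enum cftyp {
        CF_YES       =  0,      /* unconditional yes (CFARMV path)   */
        CF_COND_YES  = -1,      /* conditional yes: DDMP force out   */
                                /* pending (CFSDWT leaves T1 zero)   */
        CF_SATISFIED = -2       /* condition satisfied, sent later   */
    };                          /* by CFSFOD                         */

    /* CFSRMX's rule: T1 zero from the post routine means a delayed
       yes for a file access token; non-zero (CFARMV leaves the vote
       packet address there) means an unconditional yes. */
    enum cftyp vote_type_from_t1(long t1)
    {
        return t1 == 0 ? CF_COND_YES : CF_YES;
    }
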
Page 21
3. Structures - Structure mounting is managed by CFS in order to coordinate
access to the structure by various CFS processors. CFS requires that each
mounted structure be mounted with the same access by all accessing processors,
and that the structure have the same "alias" name on all the accessing
processors. This is accomplished by providing 2 resources to control
structures: structure name and drive serial number resources.
Below is the table of contents fragment from CFSSRV which illustrates the
various structure related routines:
19. Structure resource manager
19.1. CFSSMT (Acquire structure resource) . . . . . 105
19.2. CFMNAM (Register structure name). . . . . . . 107
19.3. CFMDSN (Register drive serial number) . . . . 108
19.4. CFSSUG (Upgrade or downgrade mount) . . . . . 109
19.5. CFSSDM (Release mount resource) . . . . . . . 110
19.6. STRVER (Structure verify) . . . . . . . . . . 111
Page 22
Structure Name Token and Drive Serial Number Token
--------------------------------------------------
CFS routines: CFSSMT - Acquire structure resource
CFSSUG - Upgrade or downgrade mount
CFSSDM - Release mount resource
CFMNAM - Register structure name
CFMDSN - Register drive serial number
In order to mount a structure, both the structure name token and drive
serial number token must be acquired. The access type (whether the structure is
accessed shared or exclusive) is controlled by the DSN resource only. The
structure name token is always created with full sharing. When a structure's
access type is changed, only the DSN resource needs to be updated. Since CFS
matches a structure with a DSN, it is important that the CFS resources be
renamed if the structure is moved to another drive. This is accomplished by
having PHYSIO call CFS at CFRDSN, describing the old and new UDBs for the
disk pack.
** CFSSMT - Acquire structure resource **
Called by: MNTPS in DSKALC (for exclusive and shared)
MSTMNT in MSTR (access based on user flags)
This routine is called when a structure is first mounted on a system. It,
in turn, calls CFMNAM to register the structure name and CFMDSN to register
the drive serial number. CFSSMT insures that the alias for the
structure is not already in use for another structure and that the name of the
structure is the same as what is in use by any other CFS system. These two
conditions are sufficient to allow the alias to be used as the root name of the
structure.
Upon entry to CFSSMT, a check is made to see if this is a "reduced" CFS
system; this is the case when the switch CFSDUM is defined. A reduced system
is on the CI and uses SCA to connect to other CI-based systems. However, this
processor will not share structures
with any other system but will insure that the structures it is using are
mutually exclusive from structures used by any other CI-based TOPS-20 system.
This implies that this system will establish connections to other reduced or
full CFS systems and will participate in structure mounting votes.
If this is a reduced system (MYPOR1 is not less than zero), then the
access is forced to be exclusive (.HTOEX). Otherwise, the access is determined
from the call (passed in T2) and it will be set to either full sharing (.HTOAD)
or exclusive (T2 is zero = .HTOAD, otherwise .HTOEX).
Finally, CFMNAM is called to register the name and CFMDSN is called to
register the serial number. If the call to CFMNAM fails, we RETBAD; if CFMDSN
fails, we continue at CFSSDM to undo the mount and then return failure.
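
The overall shape of CFSSMT is acquire-with-rollback. A minimal C sketch,
with stub routines standing in for the registration and release steps:

    #include <stdbool.h>

    static bool cfmnam(void) { return true; } /* register str name   */
    static bool cfmdsn(void) { return true; } /* register drive s/n  */
    static void cfssdm(void) { }              /* release mount token */

    /* CFSSMT in outline: a mount needs both tokens, and a failure
       on the second undoes the first. */
    bool cfssmt(void)
    {
        if (!cfmnam())
            return false;       /* RETBAD: name/alias conflict       */
        if (!cfmdsn()) {
            cfssdm();           /* undo the name registration        */
            return false;
        }
        return true;            /* structure mounted consistently    */
    }
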
Page 23
At CFMNAM, we will create a new resource block and vote for access to the
structure name token. CFSSPC is called to get a short request block. CFSSPC
will return a block with HSFLAG, HSHPST, and HSHOKV zeroed as well as HSHRET
set to 1,,SHTADD. Then, the following is placed in the block:
HSHROT six-bit structure name
HSHQAL STRCTN
HSFLAG HSHTYP=.HTOAD, HSHAVT set, HSVUC set
HSHCOD UDBDSN XOR (STRCTK+UDBDSH)
Finally, CFSGET is called to get the token (with "try only once" set). If
the token is acquired, the following has been updated in the resource block:
HSFLAG HSHVTP=access of vote, HSHCNT incremented (owned)
HSHFRK FORKX of running fork
HSHTIM TODCLK stamp when vote approved and token obtained
If, upon return from CFSGET, we do not need the newly created block (T1 is
non-zero), SHTADD will be called to return the extra block to the CFS pool.
At CFMDSN, we will create a new resource block and vote for access to the
drive serial number token. CFSSPC is called to get a short request block.
CFSSPC will return a block with HSFLAG, HSHPST, and HSHOKV zeroed as well as
HSHRET set to 1,,SHTADD. Then, the following is placed in the block:
HSHROT UDBDSN
HSHQAL STRCTK+UDBDSH
HSFLAG HSHTYP=access, HSHAVT set, HSVUC set
HSHCOD six-bit alias name
Finally, CFSGET is called to get the token (with "try only once" set). If
the token is acquired, the following has been updated in the resource block:
HSFLAG HSHVTP=access of vote, HSHCNT incremented (owned)
HSHFRK FORKX of running fork
HSHTIM TODCLK stamp when vote approved and token obtained
If, upon return from CFSGET, we do not need the newly created block (T1 is
non-zero), SHTADD will be called to return the extra block to the CFS pool.
If we just got the token exclusively, the STEXL flag is set in the status
bits of the SDB for that structure.
Page 24
** CFSSUG - Upgrade or downgrade mount **
Called by: MNTCSM in MSTR
This routine is used to change the access to a mount resource. It is only
valid for full CFS systems. The new access is passed in T2 (T2 is zero =
.HTOAD, otherwise .HTOEX). Then the resource is located via a call to HSHLOK and
CFSUGA is called to upgrade (or downgrade) to the desired access.
** CFSSDM - Release mount resource **
Called by: MNTER4 in MSTR
MSTDIS in MSTR
This routine is called to release the mount resource upon a dismount. For
both the structure name token and the drive serial number token, the resource
block is found via a call to HSHLOK and then removed via a call to CFSRMV.
CFSRMV will not post the removal.
Page 25
4. BAT Block Lock - The BAT block lock on a structure is now a CFS resource.
It is an exclusive resource.
Below is the table of contents fragment from CFSSRV which illustrates the
various BAT block related routines:
17. BAT block resource manager
17.1. CFGBBS (Set BAT block lock) . . . . . . . . . 89
17.2. CFFBBS (Release BAT block lock) . . . . . . . 90
Page 26
BAT Block Lock Token
--------------------
CFS routines: CFGBBS - Set BAT block lock
CFFBBS - Release BAT block lock
** CFGBBS - Set BAT block lock **
Called by: LKBAT in DSKALC
Upon entry to CFGBBS, CFSSPC is called to obtain a short resource block.
CFSSPC will return a block with HSFLAG, HSHPST, and HSHOKV zeroed as well as
HSHRET set to 1,,SHTADD. The following is then placed in the block:
HSHROT six-bit structure name
HSHQAL -1
HSFLAG HSHTYP=.HTOEX
Next, a call is made to get the resource. CFSGET is called if the
structure is shared. If the structure is set exclusive, CFSGTL is used. (Note
that if CFSGTL is called, HSHVTP will not be updated with the access of the
vote since no vote is required.) The call is made to "retry until successful"
so, upon the return, we have acquired the resource. The following has been
updated in the resource block:
HSFLAG HSHVTP=access of vote, HSHCNT incremented (owned)
HSHFRK FORKX of running fork
HSHTIM TODCLK stamp when vote approved and token obtained
If, upon return from CFSGET, we do not need the newly created block (T1 is
non-zero), SHTADD will be called to return the extra block to the CFS pool.
** CFFBBS - Release BAT block lock **
Called by: ULKBAT in DSKALC
The routine CFFBBS simply calls CFSNDO to release the BAT block lock
token. See the discussion of CFSAWT/CFSAWP for a description of CFSNDO.
Page 27
5. ENQ/DEQ - In order to allow ENQ/DEQ to operate in a CFS environment, there
is a temporary CFS resource representing the ENQ on a file. Note that this is
not the same as global ENQ/DEQ across all the systems in a CFS environment.
This is always an exclusive resource.
Below is the table of contents fragment from CFSSRV which illustrates the
various ENQ related routines:
20. File enqueue resource manager
20.1. CFSENQ (Get ENQ resource) . . . . . . . . . . 112
20.2. CFSDEQ (Release ENQ resource) . . . . . . . . 113
Page 28
ENQ Token
---------
CFS routines: CFSENQ - Get ENQ resource
CFSDEQ - Release ENQ resource
Each time an ENQ file resource is first requested (an ENQ lock block is
created), CFS is called to register an exclusive ENQ resource for the file. If
the requesting processor succeeds in creating the ENQ resource, then the ENQ
will be allowed. Otherwise, the ENQ is denied.
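
A minimal C sketch of this grant/deny rule follows; the key construction
and the try-once stub are illustrative stand-ins (the FILEEQ value here is
invented, not the monitor's).

    #include <stdbool.h>
    #include <stdint.h>

    #define FILEEQ 0400000      /* stand-in for the FILEEQ offset    */

    /* Stub: one voting attempt for an exclusive token ("try only
       once"); true means the token was acquired. */
    static bool cfsget_once(uint64_t root, uint64_t qual)
    {
        (void)root; (void)qual;
        return true;
    }

    /* CFSENQ in outline: key the resource by structure name and by
       FILEEQ plus the file's index block address, try once, and map
       the outcome directly onto the ENQ grant/deny decision. */
    bool cfsenq(uint64_t sixbit_strname, uint64_t index_block_addr)
    {
        return cfsget_once(sixbit_strname,
                           FILEEQ + index_block_addr);
    }
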
** CFSENQ - Get ENQ resource **
Called by: CFETST in ENQ
Upon entry to CFSENQ, ENQSET is called to set up T1 and T2 with the proper
root and qualifier. Then CFSSPC is called to obtain a short resource block.
CFSSPC will return a block with HSFLAG, HSHPST, and HSHOKV zeroed as well as
HSHRET set to 1,,SHTADD. The following is then placed in the block:
HSHROT six-bit structure name
HSHQAL FILEEQ+index block address
HSFLAG HSHTYP=.HTOEX, HSHLCL is set
Next, a call is made to get the resource. CFSGET is called if the
structure is shared. If the structure is set exclusive, CFSGTL is used. (Note
that if CFSGTL is called, HSHVTP will not be updated with the access of the
vote since no vote is required.) The call is made to "try only once". If we
acquire the resource, the following has been updated in the resource block:
HSFLAG HSHVTP=access of vote, HSHCNT incremented (owned)
HSHFRK FORKX of running fork
HSHTIM TODCLK stamp when vote approved and token obtained
If, upon return from CFSGET, we do not need the newly created block (T1 is
non-zero), SHTADD will be called to return the extra block to the CFS pool.
** CFSDEQ - Release ENQ resource **
Called by: CRELOK in ENQ
LOKREL in ENQ
The routine CFSDEQ simply calls CFSNDO to release the ENQ token. See the
discussion of CFSAWT/CFSAWP for a description of CFSNDO.
Page 29
Appendix A