EDIT DESCRIPTIONS FOR TOPS-10-KS-V702
EDIT 11139 FOR 702
[SYMPTOM]
Monitor too slow.
[DIAGNOSIS]
Unnecessary calls to SETCHP when user sets or clears XOFFed
status for a terminal.
[CURE]
Only call SETCHP when necessary.
Note that there are two FILCOMs present. The first is for 7.02,
and the second is for 7.01A with MCO 10260 installed.
********************************************************************************
EDIT 11140 FOR 702
[SYMPTOM]
Sporadic "?Illegal address in UUO" messages when creating an .EXE
file via LINK on SMP systems.
[DIAGNOSIS]
The CLOSE UUO calls RELSEG to see if we just superseded a known
segment. We wander into FNDSGU, where we set up a path pointer in
.JDAT+SGAPPN. Then, we call FNDSEG, which calls SRCSEG, which gets
MM. When MM is not available, we do a context switch, thus clobbering
the dump ac we so cleverly put our path pointer in.
[CURE]
PUSH/POP SGAPPN around the call to UPMM in SRCSEG.
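The failure mode is a value parked in a scratch location that another context can overwrite while this one is blocked. The fix saves that value on the stack around the call that may block. Below is a minimal Python sketch of the idea. It assumes a single shared slot standing in for .JDAT+SGAPPN and a hypothetical `upmm` that simulates another context reusing the slot when MM is unavailable. None of these are the monitor's actual data structures.

```python
class Scratch:
    """Hypothetical stand-in for the dump-AC area that holds SGAPPN."""

    def __init__(self) -> None:
        self.sgappn = None


def upmm(scratch: Scratch, mm_available: bool) -> None:
    """Hypothetical UPMM: if MM is busy we block, and another context
    runs and reuses the same scratch slot before we resume."""
    if not mm_available:
        scratch.sgappn = "path pointer belonging to someone else"


def srcseg(scratch: Scratch, mm_available: bool, save_around_upmm: bool):
    """Call UPMM, optionally saving/restoring SGAPPN on a private stack."""
    stack = []
    if save_around_upmm:
        stack.append(scratch.sgappn)      # PUSH SGAPPN
    upmm(scratch, mm_available)
    if save_around_upmm:
        scratch.sgappn = stack.pop()      # POP SGAPPN
    return scratch.sgappn
```

Without the save, `srcseg` returns whatever the other context left in the slot whenever MM was busy. With the save, the original path pointer comes back intact. The bug is sporadic because it only shows up when MM is contended, as on the SMP systems in the symptom.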
********************************************************************************
EDIT 11142 FOR 702
[SYMPTOM]
JBSET. UUO to set page fault timer traps will always set them for
the job doing the UUO, rather than for the requested target job.
[DIAGNOSIS]
Timer trap words are in the UPT, and we can't change UPTs at UUO
level.
[CURE]
Move the timer trap values to the PDB.
********************************************************************************
EDIT 11144 FOR 702
[SYMPTOM]
DEVCHR UUO returns wrong status for RSX20F lines which have done
a SET HOSTESS command.
[DIAGNOSIS]
No check made for VTM status.
[CURE]
Add code.
********************************************************************************
EDIT 11149 FOR 702
[SYMPTOM]
Batch PTYs get handed to unsuspecting users when connecting a PTY
to an MPX channel.
[DIAGNOSIS]
The batch PTY bit is not cleared before assigning the PTY to the
user's MPX channel.
[CURE]
Clear the bit.
********************************************************************************
EDIT 11150 FOR 702
[SYMPTOM]
Error return from FILOP. function .FOMTP (MTAPE) is confusing.
[DIAGNOSIS]
When we determine (from DEVIOS) that an error has occurred, we
return ERIPP% (No such PPN on structure).
[CURE]
Return the status bits instead. This makes .FOMTP consistent
with the other I/O functions to FILOP.
********************************************************************************
EDIT 11180 FOR 702
[SYMPTOM]
TTY characters lost on "high speed input", in particular with regard
to ANF-10/DN8x terminals.
[DIAGNOSIS]
The problem is several-fold. The "fixes" described below do not
constitute a rigid solution to the problem; they merely describe a set
of "adjustments" to ameliorate the problems. We believe that any
further improvement would require a change in the NCL protocol, and we
will keep this under consideration for possible future releases.
The following discussion is based on the 7.02 version of both the -10
and -11 code (although the -10 code (SCNSER) has not changed
substantially, the -11 terminal service has undergone significant work
since the 7.01A release).
I.
First and foremost is the "flow-control protocol" between the host
(TOPS-10) and the remote (typically a DN8x, but can also be another
host). Basically, when the -10 decides that the TTY input stream is
"full", a logical XOFF is transmitted to the terminal. For "hardwired
lines" such as the DZs on the KS-10, a real XOFF character is sent to
the terminals. For network terminals such as those on the DN8x, a "buffer
low" control message is sent to the remote - where it is the remote's
responsibility to cause the flow of data into the -10 to cease (and in
fact the DN8x sends an XOFF to the terminal on behalf of the -10,
bypassing the normal output character stream buffered up within the
-11).
Independently of the host's buffering and processing of characters,
the remote is also flow-controlling the terminal line, buffering
characters to be sent to the host. When the remote's local TTY input
stream is "full" it will XOFF the terminal independently of (and
essentially invisibly to) the host.
It is this parallel and simultaneous flow-control which leads to
confusion between the remote and the host. In particular, if the host
sends an XOFF just as the remote is deciding that it should also XOFF
the terminal, the remote will then shortly XON the terminal (as soon
as it can flush its pending buffered characters to the host and has
buffer space), so getting more terminal characters which are in turn
dutifully sent to the host, which will then run out of buffer space
and discard the "excess" characters.
The accompanying DNTTY patch addresses this problem by attempting to
"override" the remote flow control with the host flow control (DN8x
only; no provision is made herein for mainframe hosts acting as
remotes). The "BIC #DS.IST,@J" instruction clears the record that the
DN8x has already sent an XOFF, and relies on the terminal to honor the
XOFF generated, and so not push the DN8x into exerting its own flow
control.
This patch still leaves a narrow window, however, where the DN8x will
immediately re-assert DS.IST - if a character is in the process of
being received (by the hardware) as the XOFF is generated on behalf of
the host (and the DS.IST flag is cleared), and if that character
overflows the DN8x buffer "warning" level, then the DN8x will start
exerting its own flow control again, defeating the host's attempt to
override the DN8x. As soon as the DN8x can ship its buffered data to
the host, it will XON the terminal (clearing DS.IST), thus restarting
terminal input, and thus exceed the host buffering, resulting in lost
data, albeit somewhat less often.
To address this problem, the SNDXOF routine in the TOPS-10 monitor
(see the SCNSER FILCOM) is also modified. This results in the host's
periodically attempting to "retry" an XOFF if previous attempts to
XOFF the terminal have elicited no success. The exact definition of
"periodically" is left dependent on the buffering values of the host
and the remote (see "tuning" further down), and is chosen so that an
XOFF retry should only be attempted roughly once per possible
"spurious" XON in the remote. This also works equally well for
non-network terminals (which could conceivably have lost the XOFF for
their own reasons).
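To make the retry idea concrete, here is a minimal Python sketch of a host that keeps receiving characters after its first XOFF and re-sends the XOFF every `retry_every` characters. The fixed character-count interval is an assumption for illustration only. The monitor's actual retry interval is derived from the host and remote buffering values as described above.

```python
def xoff_points(chars_received: int, warn: int, retry_every: int) -> list[int]:
    """Return the buffered-character counts at which the host (re)sends
    an XOFF, assuming no XOFF is honored and the program reads nothing.
    `retry_every` is a hypothetical fixed retry interval in characters."""
    points = []
    next_xoff = warn + 1                    # first XOFF once the warning level is exceeded
    for count in range(1, chars_received + 1):
        if count == next_xoff:
            points.append(count)
            next_xoff = count + retry_every  # schedule the next retry
    return points
```

Take TTIWRN=200 and a hypothetical retry every 120 characters (2*TTYMIC for TTYMIC=60). In that case `xoff_points(500, 200, 120)` returns `[201, 321, 441]`. That is three XOFF attempts before the count reaches the default TTIMAX of 500.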
II.
A secondary problem now arises in the host ("SCNSER") code in the
calculation of warning and discard thresholds. For PIM mode input the
thresholds are calculated dynamically. Actually only the warning
threshold is dynamic, the discard limit is a fixed function of the
warning level, and it is this fixed-function discard limit that can
cause "unwarranted" loss of data.
First, the fixed offset is too small. Since an incoming DN8x terminal
data message can hold up to TTYMIC characters (see "tuning" further
down), where TTYMIC is by field-image default (^D60) greater than the
warning-to-discard offset (^D20), it is possible for a single input
data message to store some characters, exceed the warning level, queue
up a "buffer low" control message (see above), continue storing more
characters, and finally exceed the discard limit, thus discarding the
remainder of the data message.
However, even making the fixed-offset much larger still leaves open
another more insidious problem - it is possible for a previous data
message to have completely "fit" without exceeding the warning level,
then the warning level gets dynamically recalculated (because of a
sudden flurry of activity on a lot of terminals for example), dropping
by a sufficiently large amount that the new discard limit is less than
the previous warning level - ANY data now arriving will be discarded
with no warning whatsoever having been given to the terminal/sender.
For this reason, the warning and discard thresholds for PIM mode
terminals have been made static (ASCII and IMAGE mode were already
static, and so didn't suffer the dynamic threshold problem).
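Both failure modes can be reproduced with a small model of how one incoming data message is stored against a warning level and a discard limit. This Python sketch simplifies under stated assumptions. Characters are stored until the buffered count reaches the discard limit. Any character stored past the warning level would queue a "buffer low" message. The numbers are illustrative, not taken from the monitor.

```python
def deliver(buffered: int, message_len: int, warn: int, discard: int):
    """Store one incoming data message of `message_len` characters.

    Returns (stored, discarded, warned): how many characters were kept,
    how many were dropped at the discard limit, and whether storing this
    message pushed the count past the warning level (which would queue a
    "buffer low" / XOFF).
    """
    stored = discarded = 0
    warned = False
    for _ in range(message_len):
        if buffered >= discard:
            discarded += 1            # at or past the discard limit: drop it
            continue
        buffered += 1
        stored += 1
        if buffered > warn:
            warned = True             # warning level exceeded
    return stored, discarded, warned
```

Consider first a small fixed offset: a warning level of 100, a discard limit of 120 (the ^D20 offset), and 95 characters already buffered. Here `deliver(95, 60, 100, 120)` returns `(25, 35, True)`. A "buffer low" is queued, yet 35 characters of the same 60-character (TTYMIC) message are discarded. Next consider a dynamic drop. Suppose 290 characters fit under an old warning level of 300, and the warning level is then recalculated to 200. The new discard limit is 220, so `deliver(290, 60, 200, 220)` returns `(0, 60, False)`. The whole message is lost and it raises no warning.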
III.
Finally, some words regarding "tuning" for safe high speed input
operations.
It is important to bear in mind that the above "fixes" merely provide
the framework in which the window for loss of data can be made
arbitrarily small - THE POSSIBILITY STILL EXISTS FOR DATA TO BE LOST.
This section describes the tradeoffs involved in tuning for high speed
input.
First off, the parameters involved:
TTYMIC Defined in DNCNFG.P11; settable via the DN8x configuration
       parameter file at DN8x assembly time.
       The size of the DN8x's terminal input "buffer": how many
       characters the DN8x will buffer before discarding excess
       input. The XOFF threshold is "TTYMIC-20".
       Also defined in SCNSER.MAC, so that SCNSER can calculate a
       reasonable XOFF retry interval based on the "known" operation
       of the DN8x remotes.
       Default value: 60 (decimal).
TTIBRK Defined in SCNSER.MAC.
       The number of characters that SCNSER will buffer in ASCII mode
       before triggering a "break" condition and causing the user
       program to wake up and see a "line" of input available. Does
       not affect IMAGE or PIM input.
       Default value: 132 (decimal).
TTIWRN Defined in SCNSER.MAC.
       The number of characters that SCNSER will buffer in either
       ASCII or IMAGE mode before attempting to shut down the input
       stream via a "buffer low" or XOFF condition.
       Default value: 200 (decimal).
TTIMAX Defined in SCNSER.MAC.
       The absolute maximum number of characters that SCNSER will
       buffer for a given terminal line in ASCII or IMAGE mode
       before discarding characters.
       Default value: TTIWRN + 5*TTYMIC.
TTPBRK Defined in SCNSER.MAC.
       The number of characters that SCNSER will buffer in PIM mode
       before triggering a "break" condition and causing the user
       program to wake up and see a buffer of input available. Does
       not affect ASCII or IMAGE input.
       Default value: 132 (decimal).
TTPWRN Defined in SCNSER.MAC.
       The number of characters that SCNSER will buffer in PIM mode
       before attempting to flow-control the input stream via a
       "buffer low" or XOFF condition.
       Default value: 500 (decimal).
TTPMAX Defined in SCNSER.MAC.
       The absolute maximum number of characters that SCNSER will
       buffer for a given terminal line in PIM mode before discarding
       characters.
       Default value: TTPWRN + 5*TTYMIC.
TTCHKN Defined in COMDEV.MAC, used in COMMON.MAC; settable via the
       MONGEN "symbol,value" dialog.
       The total number of chunks available for SCNSER to distribute
       among all terminals on a demand basis.
       Default value: 6 * <total number of TTYs+PTYs+CTYs in system>.
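A few lines of arithmetic check how these defaults relate to one another. The following Python sketch uses the default values listed above. The terminal counts passed to `ttchkn` are a made-up example.

```python
TTYMIC = 60    # DN8x terminal input buffer size (decimal)
TTIBRK = 132   # ASCII break threshold
TTIWRN = 200   # ASCII/IMAGE warning threshold
TTPBRK = 132   # PIM break threshold
TTPWRN = 500   # PIM warning threshold

TTIMAX = TTIWRN + 5 * TTYMIC   # ASCII/IMAGE discard limit
TTPMAX = TTPWRN + 5 * TTYMIC   # PIM discard limit
DN8X_XOFF = TTYMIC - 20        # level at which the DN8x XOFFs the terminal itself


def ttchkn(ttys: int, ptys: int, ctys: int) -> int:
    """Default TTCHKN: 6 chunks per TTY, PTY and CTY in the system."""
    return 6 * (ttys + ptys + ctys)
```

This gives TTIMAX = 500, TTPMAX = 800 and a DN8x XOFF level of 40 characters. A hypothetical system with 32 TTYs, 20 PTYs and 1 CTY would get `ttchkn(32, 20, 1)` = 318 chunks by default.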
The TTYMIC parameter directly controls the buffering available in the
DN8x, and indirectly controls the ultimate limit on throughput to the
host. The network terminal input protocol is essentially a
half-duplex protocol requiring that each terminal input data message
be ACKed before the remote can send another data message. (The term
"half duplex" used here is solely internal to the operation of the
network, the terminal is still a "full duplex TTY".) The larger the
value of TTYMIC, the larger the individual bursts of data to the host
can be, and the greater the possible throughput.
For a value of 60 (decimal), a DN87S can maintain a 7.5-8.0Kb rate of
data flow to the host for an indefinite period of time (all data rates
assume the terminal running at 9600 baud, PIM mode input (ASCII
typically about 10% less), no other load on the system, and a DN87S
talking to a KL10 via DTE, unless otherwise qualified). For a DN82
using a 19.2Kb link to a DN87S using a DTE to a KL10, the rate drops
to about 4.5-5.0Kb (this is the manifestation of the half-duplex
network terminal protocol). The DN87S is XOFFing the terminal line
about 10-15% of the time, while the DN82 is XOFFing the line almost
50% of the time.
Increasing TTYMIC to 120 (decimal) raises the DN87S data rate to about
9.0Kb, and the DN82 data rate to about 6.0Kb. The DN87S is
effectively never XOFFing the terminal (the test system generating the
data flow was itself only running at about 9.0Kb output of data),
while the DN82 was still XOFFing the terminal for a significant amount
of time.
Further raising TTYMIC to 180 (decimal) has no significant impact on
the DN87S, but raises the DN82 data rate to about 9.0Kb. At this
point neither the DN87S nor the DN82 is XOFFing the terminal.
The significance of the DN8x XOFFing the terminal is simply the
probability of exercising the window mentioned in "I" above - namely
the destructive interaction of the host and remote both trying to flow
control the same terminal at the same time. The less likely it is
that the DN8x must XOFF the terminal, the less likely it is for the
host flow control to be defeated by the DN8x flow control. The
tradeoff in increasing the size of TTYMIC is how much DN8x memory
("chunk") space will be dedicated to holding terminal characters,
although the difference between 60 and 180 is only two chunks
(assuming default 64 byte chunk size).
The TTIWRN/TTPWRN and TTIMAX/TTPMAX parameters carry a little more
significance: they control how much data the host must buffer on a
long-term interval (until the program gets around to reading the data,
on a loaded system this might be many seconds). (The remotes never
"store" data, they always ship it to the host as soon as possible.)
For any TTIWRN/TTPWRN value significantly over TTYMIC (say 2*TTYMIC)
an unloaded host can trivially keep up with the remote. The default
values selected for TTIWRN/TTPWRN seem reasonable for most systems:
the ASCII value assumes user typein (even a heavily loaded system can
keep up with most humans' ability to type), while the larger PIM value
assumes more "pure data", probably arriving at a machine-generated rate
many times faster than people can type.
The TTIMAX/TTPMAX limits are the critical limits - they determine when
the total datapath goes into an overrun condition (i.e., when the
datapath fails and loses data). They must be set sufficiently larger
than the corresponding TTIWRN/TTPWRN thresholds so that the total
datapath can "synchronize" and agree to come to a halt. In
particular, as described in "II" above they must be sufficiently large
to allow leeway for enough attempts at XOFFing the terminal (or "total
datapath", whatever happens to be out there) to ensure that at least
one XOFF works and the terminal stops sending input. The worst
(network) case would be a full (TTYMIC-sized) message just received,
and another full message's worth of data in the remote by the time the
buffer-low/XOFF message reaches it, so that each "retry leeway" would
require that the TTIMAX/TTPMAX limit be 2*TTYMIC larger than the
TTIWRN/TTPWRN threshold. In general, of course, the worst case is not
fully realized, so some further leeway is gained.
The default TTIMAX/TTPMAX values selected allow 5*TTYMIC total leeway.
The metric used in arriving at the 5*TTYMIC value was the observation
that (in the data rate tests described above, with the "system"
artificially loaded down so that the input program could only respond
at a 2.5Kb reading rate, and with TTYMIC=60) the host buffer low/XOFF
stood about a 10% chance of getting lost, requiring a second XOFF.
This second XOFF in its turn stood an equal chance of getting lost,
requiring a third XOFF about 1% of the time. In "many" minutes (about
20 actually) a fourth XOFF was never observed to be required, and no
characters were ever lost. (However it should also be pointed out
that ANY attempt to use a terminal line as a data path MUST be able to
tolerate loss and corruption of the character data - asynchronous RS232
communications are notorious for being noisy and unreliable -
especially if routed over an unconditioned phone line, and very few
real-life applications obey the 50-foot maximum cable run allowed
for RS232 EIA links.)
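The observed retry rates fit a simple model in which each XOFF is lost independently with probability p. Here is a Python sketch that assumes p = 0.1 (the roughly 10% figure above) and independence between attempts. The test data only approximate this.

```python
def prob_needs_attempt(k: int, p_lost: float = 0.1) -> float:
    """Probability that at least k XOFF attempts are needed, i.e. that
    the first k-1 attempts were all lost (independent losses assumed)."""
    if k < 1:
        raise ValueError("k must be >= 1")
    return p_lost ** (k - 1)
```

Under this model a second XOFF is needed about 10% of the time and a third about 1% of the time, in line with the observations. A fourth would be needed about 0.1% of the time, which makes it rare over a 20-minute run.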
The usual tradeoff applies for the TTIWRN/TTPWRN/TTIMAX/TTPMAX values
as well - the bigger the threshold, the more memory (chunks) is
required. SCNSER stores 12 characters per chunk, so a limit of 500
requires at least 42 chunks - and the overall system default is only 6
chunks per terminal! If more chunks are deemed necessary, the value
TTCHKN can be set via MONGEN to generate more chunks for the system.
Alternatively the monitor .EXE file can be patched (or the freshly
loaded but not yet started monitor can be patched via EDDT) by
changing the left half of location "TTCLST" to be the desired number
of chunks.
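The chunk arithmetic above is a ceiling division. Here is a Python sketch that assumes the 12-characters-per-chunk figure given for SCNSER.

```python
CHARS_PER_CHUNK = 12   # SCNSER characters per chunk (figure from the text above)


def chunks_needed(chars: int) -> int:
    """Minimum number of SCNSER chunks needed to hold `chars` characters."""
    return -(-chars // CHARS_PER_CHUNK)   # ceiling division without floats
```

For example, `chunks_needed(500)` is 42 (500/12 is about 41.7). The default TTPMAX of 800 gives `chunks_needed(800)` = 67. Both figures are for a single line, compared with a system default of 6 chunks per terminal.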
[CURE]
Embedded in the above.
********************************************************************************
EDIT 11182 FOR 702
[SYMPTOM]
STOPCD FOP
[DIAGNOSIS]
User dismounts a pack that was spinning at once-only time. It
therefore used once-only core for its SAB, SPT, et al. On dismount,
FILFND calls GVFWDS to return the core. GVFWDS gets confused because
of its use of bit 0 in pointers sometimes but not others. Extended
addressing mnemonics were used where they aren't correct.
[CURE]
Turn several selected SSX(E)s into HRLIs. Turn several
SPUSH/SPOPs into PUSH/POPs. Note that this only causes a code change
for KI-paging systems. The only one of these that we currently
support is the KS.
********************************************************************************
EDIT 11232 FOR 702
[SYMPTOM]
IME at CLSNM1, etc.
[DIAGNOSIS]
A little-known feature of TOPS10: Open device "DSK:" in buffered
mode and LOOKUP a UFD (or SFD). The LOOKUP will succeed for the 1st
STR in your JSL. Upon encountering EOF, however, the monitor will
perform an implicit LOOKUP on the 2nd STR, etc. The net result is to
concatenate the UFDs from all the STRs in your JSL.
Upon hitting EOF on the last STR, the DDB is left in a funny
state and subsequent UUOs will do bizarre things. An ENTER UUO will
cause an IME. A "REWIND" (i.e. USETI to block 1) will get an IO.IMP
error.
[CURE]
Upon exhausting the JSL, re-open the first STR before returning
EOF to the user. This will leave the DDB in a consistent state.
Moreover, a subsequent USETI 1 will get you to the right place.
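The intended behavior can be modelled as a reader that walks each structure in turn. At the end of the search list, it repositions to the first structure before reporting EOF. The Python sketch below is a model only. The class and method names are hypothetical, the model ignores everything except positioning, and it assumes USETI 1 applies to whichever structure is currently open.

```python
class JslUfdReader:
    """Hypothetical model of reading a UFD concatenated across the JSL."""

    def __init__(self, strs: list[list[str]]) -> None:
        self.strs = strs        # UFD entries per structure, in JSL order
        self.index = 0          # which STR in the JSL is currently open
        self.pos = 0            # position within that STR's UFD
        self.at_eof = False

    def useti1(self) -> None:
        """USETI 1: back to block 1 of the currently open STR (assumed)."""
        self.pos, self.at_eof = 0, False

    def read(self):
        """Return the next entry, or None at EOF."""
        if self.at_eof:
            return None
        while self.index < len(self.strs):
            entries = self.strs[self.index]
            if self.pos < len(entries):
                self.pos += 1
                return entries[self.pos - 1]
            self.index, self.pos = self.index + 1, 0   # implicit LOOKUP on next STR
        # JSL exhausted: re-open the first STR, then report EOF (EDIT 11232 cure)
        self.index, self.pos, self.at_eof = 0, 0, True
        return None
```

Without the reset to `index = 0`, the model would be left pointing past the last structure. A later `useti1()` followed by `read()` would then find nothing, which is the model's analogue of the inconsistent DDB.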
********************************************************************************
EDIT 11245 FOR 702
[SYMPTOM]
ORION can't find out where a 'SEND OPR' command came from when
the sender wasn't logged in.
[DIAGNOSIS]
By the time the IPCF packet gets to ORION the sender's job is
gone or has been given to someone else.
[CURE]
Add a new QUEUE. UUO arg block, .QBTTY, and include it in the
IPCF message the monitor sends to ORION for a 'SEND OPR' command. Put
the SIXBIT TTY name of the sender and the corresponding node and line
numbers in the block. This new arg block should be reserved for the
monitor's use only -- ORION will only accept it if it comes from
[SYSTEM]GOPHER.
********************************************************************************
EDIT 11289 FOR 702
[SYMPTOM]
1. IME while doing an ENQC. status function.
2. Occasionally an unowned long-term lock will never go away.
3. Some locks don't get their date-time stamp fixed up on a SET
DATE or SET DAYTIME command.
4. It is possible for a job to log off with a permanent lock
still active, contrary to documentation.
5. The number of queued requests for a lock (returned by the
ENQC. status function) is always zero.
[DIAGNOSIS]
1. The ENQC. status function didn't allow for the possibility
that a long-term lock might not have any owners or queued
requests at all. The code simply treated the lock block as a
queue block entry, and things went downhill from there.
2. Off-by-one bug in ENQMIN (once-a-minute code).
3. Off-by-one bug in ENQSDT (set date-time code).
4. Off-by-one bug in ENQNDR ("no delete on reset" code).
5. After carefully storing the count of queued requests in the
left half of P2, QUESER does a LOAD of a half word quantity
into P2. The LOAD assembles to an HLRZ, which will zero the
left half of P2.
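Item 5 is easiest to see with 36-bit halfwords modelled explicitly. In this Python sketch, a word is an integer whose high 18 bits are the PDP-10 left half and whose low 18 bits are the right half. `hrl` and `hlrz` follow the hardware definitions of HRL and HLRZ. The register contents used with them are made up.

```python
HALF = 0o777777                        # 18-bit halfword mask


def hrl(ac: int, value: int) -> int:
    """HRL: right half of `value` into the left half of `ac`; AC right half kept."""
    return ((value & HALF) << 18) | (ac & HALF)


def hlrz(mem: int) -> int:
    """HLRZ: left half of `mem` into the right half of the AC; AC left half zeroed."""
    return (mem >> 18) & HALF
```

Suppose a count of 3 is stored with `p2 = hrl(0, 3)`. A later `p2 = hlrz(word)` replaces the whole register, so the left half, and with it the count, becomes zero. This is why the returned count was always zero.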
[CURE]
1. If there are no owners or queued requests, return -1 in the
right half of the first status word, and zero for everything
else.
2. SOJG ==> SOJGE
3. SOJG ==> SOJGE
4. SOJG ==> SOJGE
5. This last bug is very interesting. It would seem that no one
has ever used this field. However, instead of fixing it to
correspond to the documentation, change the documentation and
then change the code to reflect the new documentation. The
documentation will now state that the ENQC. status function
returns the number of sharers of the resource in the left
half of the third status word.
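Items 2 through 4 share the same pattern. SOJG subtracts one from the AC and jumps if the result is greater than zero, while SOJGE also jumps when the result is zero. In a loop that counts an index down through a zero-based table, SOJG therefore never processes entry 0. The Python sketch below assumes the loop processes the entry before decrementing. That is one plausible shape for the ENQMIN/ENQSDT/ENQNDR loops, not their actual code.

```python
def indices_visited(start: int, jump_if_zero: bool) -> list[int]:
    """Simulate  LOOP: <process entry AC> ; SOJG (or SOJGE) AC,LOOP."""
    ac, visited = start, []
    while True:
        visited.append(ac)          # process the entry indexed by AC
        ac -= 1                     # SOJx: subtract one from AC
        if ac > 0 or (jump_if_zero and ac == 0):
            continue                # SOJG jumps on > 0; SOJGE also on == 0
        return visited
```

Starting from 3, `indices_visited(3, False)` (SOJG) gives `[3, 2, 1]`. `indices_visited(3, True)` (SOJGE) gives `[3, 2, 1, 0]`.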
********************************************************************************
EDIT 11328 FOR 702
[SYMPTOM]
Funny space is exhausted. This causes numerous UUOs
to fail. In particular, the OPEN UUO fails. The user can't
even run a program.
[DIAGNOSIS]
Spooled DDBs are never returned to the free list. We
eventually run out of funny space.
[CURE]
Return to free list.
********************************************************************************
EDIT 11329 FOR 702
[SYMPTOM]
Spurious IO.IMP error while doing output to spooled
device.
[DIAGNOSIS]
The first output to a spooled device forces an ENTER
UUO. The monitor makes up a filename that it hopes will be
unique. If it is not unique (i.e. the ENTER returns error
code 4) then the monitor tries for a new filename.
Unfortunately, the error code is only 18 bits wide and the
code does a 36 bit compare.
[CURE]
Do 18 bit compare. Check for both error codes 3 and 4
(file being modified, and file already exists,
respectively).
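The bug and the fix amount to masking before comparing. This Python sketch assumes the error code sits in the right half of the returned word and that the left half may hold unrelated bits. The exact word layout is not given here, and the constant names are chosen for illustration.

```python
RIGHT_HALF = 0o777777          # low 18 bits of a 36-bit word
FILE_BEING_MODIFIED = 3        # ENTER error code 3
FILE_ALREADY_EXISTS = 4        # ENTER error code 4


def retry_with_new_name_buggy(word: int) -> bool:
    """Old behavior: 36-bit compare, fails whenever the left half is nonzero."""
    return word == FILE_ALREADY_EXISTS


def retry_with_new_name(word: int) -> bool:
    """Fixed behavior: 18-bit compare against both error codes."""
    return (word & RIGHT_HALF) in (FILE_BEING_MODIFIED, FILE_ALREADY_EXISTS)
```

Consider a word with code 4 in the right half and junk in the left half. The buggy check rejects it, so the monitor never tries a new name, while the fixed check accepts it.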
********************************************************************************
EDIT 11348 FOR 702
[SYMPTOM]
DZ based dataset lines behave VERY slowly on a KS10. It may take
tens of seconds for the autobaud character to be recognized.
[DIAGNOSIS]
DZ dataset timing is based on being called once/tic. If M.STOF
is non-zero, it won't be. So with M.STOF=7 (a typical value for the
KS), it takes 16 seconds for carrier stabilization to time out,
instead of 2.
[CURE]
Teach DZQADD to allow for M.STOF. Make STOPAT global - this
makes it easier for those folks who like to patch this on the fly as
well.
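The 16-versus-2-second figure follows under one assumption. The DZ dataset routine runs only once every M.STOF+1 tics (8 tics for M.STOF=7), while its timeout counts calls as if each were one tic. The Python sketch below works under that assumption, with a hypothetical 60 Hz tic and a 2-second (120-tic) timeout. Scaling the decrement is an illustration of "allowing for M.STOF", not the actual DZQADD change.

```python
import math

TICS_PER_SECOND = 60   # assumed line-clock rate; the 8:1 ratio does not depend on it


def timeout_seconds(timeout_tics: int, m_stof: int, scaled: bool) -> float:
    """Real time until a timeout of `timeout_tics` expires when the routine
    is called every m_stof+1 tics. If `scaled`, each call counts as m_stof+1
    tics; otherwise each call counts as a single tic."""
    tics_per_call = m_stof + 1
    decrement = tics_per_call if scaled else 1
    calls = math.ceil(timeout_tics / decrement)
    return calls * tics_per_call / TICS_PER_SECOND
```

With M.STOF=7, `timeout_seconds(120, 7, False)` is 16.0 seconds, and `timeout_seconds(120, 7, True)` is 2.0.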
********************************************************************************
EDIT 11354 FOR 702
[SYMPTOM]
New: Add a new IPCF message from the gopher to MDA. This
message will inform QUASAR that a structure has been mounted by a job
other than PULSAR. The new IPCC function is .IPCST (function 47).
[DIAGNOSIS]
[CURE]
********************************************************************************
EDIT 11358 FOR 702
[SYMPTOM]
Performance problem.
[DIAGNOSIS]
"KEEP ME" bit doesn't get turned on in section 0 and 1 map
pointers.
[CURE]
Turn it on.
********************************************************************************
EDIT 11395 FOR 702
[SYMPTOM]
User ACs get clobbered if an E, D, or VERSION command requires
that a page be paged in or have access allowed set for it.
[DIAGNOSIS]
Previous job run ACs get stored when the command completes.
[CURE]
Save/restore user ACs in/from .USUAC.
********************************************************************************
EDIT 11397 FOR 702
[SYMPTOM]
Too much funny space used for spool parameter messages. QUASAR
crashes.
[DIAGNOSIS]
IPCSER computes a value for the length of a SPB which is one word
too long. This causes QUASAR's template not to match. If a customer
adds a word to the SPB, the only rational place is at the end of the
standard block, since the format must match SPPRM. As a result, the
first customer word gets garbage.
[CURE]
+1 becomes +0
********************************************************************************
EDIT 11431 FOR 702
[SYMPTOM]
1). When renaming a file across SFDs, "set watch files" will
type the wrong SFD (it types the old SFD not the new one).
2). The typeout may IME.
3). If the RENAME UUO gets error code 17 (partial allocation
error), then "set watch files" will type the wrong error code.
[DIAGNOSIS]
1). The typeout is done at the time of UUO exit. The UUO leaves
garbage in DEVSFD so the typeout gets the wrong SFD.
2). Even if DEVSFD pointed at the right SFD, the use count is
not up so we would have no guarantee that the core grabber didn't
steal the NMB out from under us.
3). If the RENAME gets error 17, then M is left pointing at the
wrong place so the typeout can't find the error code.
[CURE]
The typeout must be done before the RENAME UUO closes the file.
It must be done while DEVSFD still points at the right SFD. It must
be done while the use counts are still up. It must be done while M
still points at something meaningful. It should not be done, however,
until after we know for certain whether or not the UUO will get an
error code.
Upon doing the typeout, light a bit in .USBTS to flag that the
code at UUO exit shouldn't do the typeout a second time.
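The "type once" handshake between RENAME and UUO exit can be sketched as a flag set by whichever side types first. In this Python sketch, the `flags` set stands in for the bit in .USBTS, and the messages are placeholders, not the real WATCH format. Clearing the flag at UUO exit is an assumption.

```python
def rename_uuo(flags: set, log: list, new_sfd: str, error_code: int) -> None:
    """Hypothetical RENAME: type the WATCH line while the SFD and error
    code are still valid, then mark that the typeout has been done."""
    log.append(f"watch: rename into {new_sfd}, error {error_code}")
    flags.add("WATCH_TYPED")          # stand-in for the .USBTS bit


def uuo_exit(flags: set, log: list) -> None:
    """Hypothetical UUO exit: type only if RENAME has not already typed."""
    if "WATCH_TYPED" in flags:
        flags.discard("WATCH_TYPED")  # assumed: clear the flag for the next UUO
    else:
        log.append("watch: typed at UUO exit")
```

After `rename_uuo` followed by `uuo_exit`, exactly one line has been typed. A later UUO that does not go through the RENAME path still gets its typeout at UUO exit.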
********************************************************************************
EDIT 11438 FOR 702
[SYMPTOM]
A PATH UUO of device SSL: returns your default PPN but doesn't
tell you the SFD.
[DIAGNOSIS]
Device SSL: should work just like device ALL:. It should return
the default path (including SFDs) but leave PT.IPP zero (i.e. no
implied path).
[CURE]
Make SSL: work like ALL:.
********************************************************************************
EDIT 11448 FOR 702
[SYMPTOM]
1. If a remote node crashes (or the path to the remote node is
otherwise lost) whilst a CONNECT to that node is pending then
the program or job that issued the CONNECT (e.g., OPEN
monitor call or ASSIGN monitor command) is stuck in network
Event Wait.
2. Programs using PSI trapping are not informed when network
devices (especially TSK devices) go "offline" due to the
remote node's having crashed (or otherwise become
inaccessible).
[DIAGNOSIS]
1. The code to handle "Node Down" in NETSER was overly paranoid
about not NETWAKing a job when it shouldn't. In particular,
for the case of a CONNECT awaiting confirmation, the DDB will
NOT have either ASSCON or ASSPRG set for the device. These
flags won't get set until NETSER has successfully connected
the device and returned it to UUOCON for active use.
2. This is essentially the same as "1" above, aggravated by the
general lack of consistency of handling of this case by the
various device-type-specific service routines in NETDEV.
[CURE]
Both problems are fixed via a general cleanup of the "Network
Disconnect" and "Node Down" processing. This fix implements a new
extended I/O status code IONDD% which is returned when a network
device initiates a disconnect (rather than the owning job). This
error code distinguishes a "Node Down" condition (extended error code
IONND%) from a disconnecting device (currently only TSK can initiate a
disconnect).
********************************************************************************
EDIT 11468 FOR 702
[SYMPTOM]
CMU on KS after removing once-only structure
[DIAGNOSIS]
CHKTAL checks for free pages starting at SYSSIZ. ONCE-only core
is below SYSSIZ. MCO 11182 allows this core to become free, as
intended.
[CURE]
Don't. Check all memory instead.
********************************************************************************
EDIT 11474 FOR 702
[SYMPTOM]
IME using DCP
[DIAGNOSIS]
OneWordGlobalBytePointeritis
[CURE]
Revamp EBI2BI routine in NETSER
********************************************************************************
END OF TOPS-10-KS-V702