Trailing-Edge - PDP-10 Archives - tops20_v6_1_tcpip_distribution_tp_ft6 - 6-1-documentation/tops20.tco
TCO-number: 6.1.1000
Written-by: MCINTEE Creation-date: 24-Feb-83 10:03:57
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Problem:
Diagnosis:
Solution:
[End of TCO 6.1.1000]
TCO-number: 6.1.1003
Written-by: GUNN Creation-date: 20-Jul-83 12:21:32
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: Yes
Program: MONITOR
Routines-affected: LLMOP STG MONSYM
Problem: Digital Network Architecture (DNA) Phase IV requires
minimum subset Low Level Maintenance OPeration (LLMOP) support
for Ethernet.
Diagnosis: Need to add code to TOPS-20 Monitor to implement
part of the LLMOP functions.
Solution: Add new module LLMOP and code in various other
modules to implement Ethernet Loopback Protocol Server,
Remote Console Protocol Server, and LLMOP% JSYS as interface
to Ethernet Loopback Requestor and Remote Console Requestor.
[End of TCO 6.1.1003]
TCO-number: 6.1.1004
Written-by: GLINDELL Creation-date: 28-Nov-83 11:15:44
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCJSYS MONSYM STG PROLOG GLOB
Problem:
There is a need for a way to set and read link parameters and quotas
for a logical link.
Diagnosis:
Not needed before.
Solution:
Add 4 new MTOPR functions: set/read link parameters, and set/read link
quotas.
[End of TCO 6.1.1004]
TCO-number: 6.1.1007
Written-by: GLINDELL Creation-date: 12-Jun-84 16:37:36
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: scjsys
Problem: DECnet X25 object numbers do not have names defined
Diagnosis: Not needed before
Solution: Add names/object numbers X25GAT/31, X29SRV/34 and X25HST/36
[End of TCO 6.1.1007]
TCO-number: 6.1.1008
Written-by: GLINDELL Creation-date: 3-Jul-84 13:46:57
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: jntman
Problem:
DECnet startup is slow when there are many nodes to define. Engineering
net currently has more than 2000 nodes defined.
Diagnosis:
Solution:
Add a new function to the NODE% jsys that allows a table of node names
and numbers to be inserted into the monitor. The SETNOD program will use
this function.
[End of TCO 6.1.1008]
TCO-number: 6.1.1009
Written-by: PAETZOLD Creation-date: 18-Jul-84 19:14:10
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: monitor
Routines-affected: PHYSIO PHYH2 STG pagutl
Problem:
Low on address space. However need to support 4 meg of memory.
Diagnosis:
Need to move more stuff out of PC section.
Solution:
Move CST5.
[End of TCO 6.1.1009]
TCO-number: 6.1.1010
Written-by: PRATT Creation-date: 30-Jul-84 11:17:48
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: STG GLOBS IPNIDV ANAUNV IPIPIP IMPDV
IPFREE MNETDV MONSYM PARAMS TTYSRV TTANDV
TTPHDV
Problem: Can't transmit TCP/IP over Ethernet
Diagnosis: No code
Solution: Write the code
In addition, changes are needed to TTYSRV and TTANDV
so that TTANDV can assemble independently of TTYSRV.
[End of TCO 6.1.1010]
TCO-number: 6.1.1011
Written-by: GROSSMAN Creation-date: 17-Aug-84 20:22:49
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: Yes
Program: MONITOR
Routines-affected: MONSYM STG
Problem: Users can't do Ethernet functions directly.
Diagnosis: No interface.
Solution: Add the NI% JSYS. The functions come later.
[End of TCO 6.1.1011]
TCO-number: 6.1.1021
Written-by: GLINDELL Creation-date: 10-Oct-84 16:03:40
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: APRSRV globs JSYSA LDINIT MEXEC PAGEM
pagutl PROLOG STG
Problem: Merge 6.1 address space changes.
[End of TCO 6.1.1021]
TCO-number: 6.1.1022
Written-by: PRATT Creation-date: 12-Oct-84 06:49:46
Edited-by: PRATT Edit-date: 25-Oct-84 13:51:36
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSA TTYSRV LATSRV TTANDV TTPHDV MONSYM
Problem:
There is no jsys which provides a means for finding out
the originating node for a given job coming into the 20
from a network.
Diagnosis: No code
Solution:
Add a new jsys call NTINF% which will be a generic network
information jsys. The new standard jsys calling sequence
will be used for passing arguments.
Given the terminal number, job #, or -1 (for self), function
.NWRRH will return the remote hostname for the job.
[End of TCO 6.1.1022]
TCO-number: 6.1.1024
Written-by: PAETZOLD Creation-date: 17-Oct-84 20:34:25
Edited-by: PAETZOLD Edit-date: 17-Oct-84 21:31:06
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: monitor
Routines-affected: MNETDV
Problem:
No way to return list of Internet addresses for the system.
Diagnosis:
Oversight.
Solution:
Add function .GTHLA to the GTHST% JSYS. Calling sequence (similar to
conventions for this JSYS) is:
1/ .GTHLA
3/ Destination Address
4/ Count of items to return
Non skip return indicates error (ARGX24 only possible)
skip return indicates success. T4 contains count of items returned.
[End of TCO 6.1.1024]
TCO-number: 6.1.1026
Written-by: PRATT Creation-date: 20-Oct-84 17:55:06
Edited-by: PRATT Edit-date: 22-Oct-84 12:37:12
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: COMND
Related-QAR: 706037
Problem:
We have a way of retrieving the last command if it had an error
but have no way to retrieve it if that command completed successfully.
Diagnosis: No code
Solution: Add the code which allows ^H to retrieve the last
command without the confirmation character.
[End of TCO 6.1.1026]
TCO-number: 6.1.1030
Written-by: PRATT Creation-date: 1-Nov-84 14:14:57
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: ANAUNV STG GLOBS MNETDV IPIPIP
Problem: No way to run TCP/IP over the CI
Diagnosis: No code
Solution:
Create a new module called IPCIDV which interfaces
Multinet to SCA.
Change MNETDV so that it accepts an IPCI device
Change ANAUNV to build an NCT for IPCI
Change IPIPIP to call CIPSRV from the Internet fork
Change STG to define storage for IPCIDV
[End of TCO 6.1.1030]
TCO-number: 6.1.1032
Written-by: PRATT Creation-date: 5-Nov-84 16:30:19
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PROLOG STG GLOBS TTYSRV TTPHDV TTANDV
TTYDEF
Problem:
TTYSRV/TTYDEF can't be compiled in the normal M61: area
for Arpanet monitors or any other monitor which would turn
off LAT, or CTERM.
Also, we still have lots of references to line types which
don't exist, such as the DZ, DC, and RP line types.
Diagnosis:
TTYDEF.MAC contains conditional assembly for Arpanet, LAT, CTERM
and NRT. TTYSRV uses TTYDEF.UNV to find out which line types
to assemble into the TDCALL's and other things. What results is a
TTYSRV.REL which may or may not have any one of the particular
line types turned on even though the monitor is built with the
corresponding device dependent code. BADTTY BUGHLTs usually result.
Further complications can occur when assembling the device
dependent modules with an unknown TTYDEF.UNV.
Solution:
Remove traces of the DC, DZ, and RP line types and the
KCFLG conditional code. Save the code for historical reasons
in a module called TTDZKC.
Update the line type values in PROLOG
Change the names of some local routines within TTYSRV which the
NRT and FE code references because they are already globally
defined elsewhere in the monitor.
Remove the TDCALL macros from TTYDEF and rewrite them so they
always assemble the device specific code. Dummy symbols will
be defined within STG if the device specific code is not loaded.
Move some storage that was defined within TTYSRV to STG
Move the FE line code to a module called RSXSRV.MAC
Move the NRT code to a module called NRTSRV.MAC
Make TTANDV.MAC become TVTSRV.MAC
Make a bunch of symbols global.
Add the LOADMODULES for the new device specific modules
Turn on the device specific code based on the global
flags used for each network or device.
[End of TCO 6.1.1032]
TCO-number: 6.1.1033
Written-by: GLINDELL Creation-date: 6-Nov-84 17:10:38
Edited-by: GLINDELL Edit-date: 6-Nov-84 17:20:17
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: Yes
Program: MONITOR
Routines-affected: sclink
Problem: In a large network, there will be a lot of 'node online'
and 'node offline' messages. What the operator is probably interested
in seeing is if any nodes go offline that people have open links to.
Diagnosis:
Solution: Remove the 'node online' and 'node offline' code from GALAXY.
Make SCLINK generate 'link broken' messages.
When SCLINK discovers that a user link is broken, the node in question
will be added to an offline table. Every 10 seconds the table will be
checked, and an operator message will be generated if the table was
non-empty. The message format will be (approximately)
15:22:03 -- Message from monitor --
User links to the following DECnet nodes were broken:
KL2102 GIDNEY CLOYD
If there are more than 5 nodes, only the first 5 will be typed out followed
by 'and more'.
The operator will be able to suppress typeout of these messages by
DISABLE OUTPUT-DISPLAY (OF) DECNET-LINK-MESSAGES
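
The message rule above (list at most five nodes, then 'and more') can be sketched as follows. This is a minimal Python illustration, not the SCLINK code; the function name, its arguments, and the exact spacing are assumptions, and the TCO itself calls the format approximate.

```python
def format_broken_links_message(offline_nodes, timestamp, limit=5):
    """Build the operator message for nodes whose user links were broken.

    Returns None when the offline table is empty, because no message is
    generated in that case. All names here are hypothetical.
    """
    if not offline_nodes:
        return None
    node_line = " ".join(offline_nodes[:limit])
    if len(offline_nodes) > limit:
        # Only the first `limit` nodes are typed out, followed by 'and more'.
        node_line += " and more"
    return "\n".join([
        f"{timestamp} -- Message from monitor --",
        "User links to the following DECnet nodes were broken:",
        node_line,
    ])
```

Calling it with the three nodes from the example message reproduces the three lines shown above; the 10-second polling of the table is left out of the sketch.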
[End of TCO 6.1.1033]
TCO-number: 6.1.1035
Written-by: PALMIERI Creation-date: 7-Nov-84 15:20:44
Edited-by: PALMIERI Edit-date: 7-Nov-84 16:11:28
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: ROUTER DNADLL D36COM SCLINK JNTMAN D36PAR
Problem: Can't send buffers larger than 576 on the Ethernet or the CI.
Diagnosis: No code
Solution: Add code to select a receive blocksize based on the circuit
type. Create DECnet buffers that are as large as the largest receive
blocksize. The default size is 576 bytes and a larger or smaller
blocksize may be selected in CONFIG.CMD. Provide a routine for the
session control to determine the largest blocksize that it can use on
transmit for a given logical link. Large buffers can only be used to
adjacent nodes which support large blocksizes. If large buffers are
in use over a circuit and the circuit fails another path to the
adjacent node may be selected. If the new circuit has a smaller
blocksize than the previous the link will be aborted.
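
The blocksize rules above can be modelled in a few lines of Python. This is a sketch under assumptions: the function names are hypothetical, and taking the minimum of the two ends is one reading of "large buffers can only be used to adjacent nodes which support large blocksizes".

```python
DEFAULT_BLOCKSIZE = 576  # default size in bytes, as stated in the TCO


def transmit_blocksize(circuit_blocksize, adjacent_node_blocksize):
    """Largest blocksize usable on transmit for a logical link.

    Assumption: the usable size is limited by both the local circuit and
    what the adjacent node supports.
    """
    return min(circuit_blocksize, adjacent_node_blocksize)


def link_survives_reroute(previous_circuit_blocksize, new_circuit_blocksize):
    """After a circuit failure, the link is aborted if the newly selected
    circuit has a smaller blocksize than the previous one."""
    return new_circuit_blocksize >= previous_circuit_blocksize
```

For example, a link that was using a large-buffer circuit and is rerouted over a circuit with the default 576-byte blocksize would not survive under this model.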
[End of TCO 6.1.1035]
TCO-number: 6.1.1036
Written-by: PAETZOLD Creation-date: 7-Nov-84 18:33:24
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PROLOG
Problem:
XNENT and XRENT need to be able to define global symbols.
Diagnosis:
They do not do this now.
Solution:
Add an optional argument to the macro. If it is set to a non null string
("G" is recommended) then make the symbol internal.
[End of TCO 6.1.1036]
TCO-number: 6.1.1037
Written-by: PAETZOLD Creation-date: 9-Nov-84 11:13:28
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PROLOG
Problem:
No easy way to call into msec1 from xcdsec.
Diagnosis:
Solution:
Add a new macro called callx which is just like xcall except that it does
not do an EA.ENT.
[End of TCO 6.1.1037]
TCO-number: 6.1.1038
Written-by: GROSSMAN Creation-date: 9-Nov-84 13:02:56
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: STG
Problem: Too much scheduler overhead
Diagnosis: LV8CHK does a CALL R 50 times a second.
Solution: Remove the CALL R in LV8CHK. Actually, it is really a CALL NISCH.
Since PHYKNI does not need a scheduler level entry point, NISCH was redefined
to be R.
[End of TCO 6.1.1038]
TCO-number: 6.1.1039
Written-by: PAETZOLD Creation-date: 9-Nov-84 18:17:12
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: monitor
Routines-affected: SCHED
Problem:
MRETN to monitor context from XCDSEC causes previous caller ACs to not be
restored correctly.
Diagnosis:
MRETN is running in XCDSEC also (in this case). Microcode problem causes
the BLT to restore ACs to fail.
Solution:
Force monitor into section one at MRETN1.
[End of TCO 6.1.1039]
TCO-number: 6.1.1040
Written-by: PRATT Creation-date: 9-Nov-84 18:30:50
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CTHSRV
Problem:
Setting the pause/unpause characters on a system you are
CTERM'd to can generate a confusing message from the cterm-server
and also mess up the echoing of those characters.
Diagnosis:
There were actually a few problems:
The CTHPPC routine sends a message to tell the host to
change its local echoing for the pause/unpause characters.
After entering the routine, by mistake, the code tries to save the
characters twice and unfortunately does it wrong both times.
The 1st time it is saved in the wrong AC, and the 2nd time
it picks up the characters out of a smashed AC.
This produced weird characters which were sent to the
cterm-server program on the other system and exercises
a bug there as well.
Solution:
Save the characters in the right AC before they are smashed.
[End of TCO 6.1.1040]
TCO-number: 6.1.1041
Written-by: PAETZOLD Creation-date: 11-Nov-84 11:39:33
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: IPIPIP IPFREE IPNIDV IPCIDV TCPTCP TCPCRC
TCPBBN TCPJFN IMPANX IMPDV MNETDV ANAUNV
TVTSRV STG
Problem:
Address space.
Diagnosis:
Not enough.
Solution:
Move ARPANET code to XCDSEC. Move almost all of it. Some of the TCOPR%
JSYS code remains in MSEC1, TVTSRV remains in MSEC1 but calls into XCDSEC.
This frees up 37 pages in MSEC1.
[End of TCO 6.1.1041]
TCO-number: 6.1.1045
Written-by: GROSSMAN Creation-date: 12-Nov-84 17:09:02
Edited-by: GROSSMAN Edit-date: 13-Nov-84 01:27:51
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: FORK JSYSA NIPAR NISRV MONSYM NIUSR
PHYKNI
Problem: No NI% JSYS code.
Diagnosis: Code not written.
Solution: Write the code.
Note that a new KNILDR is required. Also note that the new ERRMES.BIN
should be put up.
[End of TCO 6.1.1045]
TCO-number: 6.1.1046
Written-by: GROSSMAN Creation-date: 12-Nov-84 23:36:57
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: nisrv
Problem: Closing of an NISRV portal occasionally results in a KNIBER BUGHLT.
Diagnosis: Overly paranoid programmer. KNIBER (KNI Bad Error Return) happens
whenever PHYKNI gives an unexpected error return to NISRV.
Solution: There is no need for such paranoia, pass the error upwards and
let the caller deal with the problem (the problem is usually a memory
allocation failure).
[End of TCO 6.1.1046]
TCO-number: 6.1.1047
Written-by: GROSSMAN Creation-date: 13-Nov-84 00:00:42
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: phykni
Problem: Various and sundry fixes in preparation for NI% JSYS:
1) PHYKNI can now abort commands for a channel that isn't running. This
causes the command to be returned (via the callback mechanism) with the
error code UNCAB%. This means that a portal can now be closed even
though the channel is dead.
2) A mousetrap has been put in to help track down the spurious KNISTP's
that people have been seeing. This will cause a KNICRS (Can't Read
Station Info) BUGCHK to print out if PHYKNI is unable to queue up a
command to the port to see if it's alive.
3) KNISTP was fixed so that it will really stop the KLNI. This allows
KNILDR to dump and reload it. (Unfortunately, KNILDR has a bug which
currently prevents this from happening).
4) Fix PXCT bug in FIXBSD when dealing with user mode addresses.
5) Make sure that Receive Failure Bit mask is 0 if Receive Failure count
is 0.
6) Fix race in NISTP.
[End of TCO 6.1.1047]
TCO-number: 6.1.1048
Written-by: GROSSMAN Creation-date: 13-Nov-84 00:15:06
Edited-by: GROSSMAN Edit-date: 13-Nov-84 00:21:00
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NIPAR
Problem: KLNI state symbols in the monitor do not correspond with NI%
state symbols in MONSYM.
Symbols aren't global.
TABENT macro cannot deal with arguments that expand to multiple lines of
macro.
Diagnosis:
Solution: Make all UNS.xx symbols in the monitor correspond with .NISxx
symbols in MONSYM.
Make all definitions in NIPAR be global to avoid confusion if values should
happen to change.
Rewrite TABENT and friends. Now you can generate a table with LOADs and
STORs as arguments.
[End of TCO 6.1.1048]
TCO-number: 6.1.1051
Written-by: GROSSMAN Creation-date: 13-Nov-84 16:48:19
Edited-by: GROSSMAN Edit-date: 13-Nov-84 16:51:15
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKNI
Problem: Programs using NISRV can get hung if KNILDR never completes reload
of the KLNI.
Diagnosis:
Solution: Time out KNILDR. If KNILDR doesn't complete in 15. seconds, then
a KNIRTO BUGCHK will occur. We will also put the port in the "Can't Reload"
state, and all portals will be informed that the KLNI is now OFF.
[End of TCO 6.1.1051]
TCO-number: 6.1.1052
Written-by: GROSSMAN Creation-date: 13-Nov-84 22:55:33
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: APRSRV
Problem: EA.ENT much too slow.
Diagnosis: The routine $EAENT was written in the stone age of extended
addressing.
Solution: Take advantage of new inventions (like XJRST). This considerably
simplifies switching from section 0 to section 1.
[End of TCO 6.1.1052]
TCO-number: 6.1.1054
Written-by: GROSSMAN Creation-date: 15-Nov-84 14:06:27
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MONSYM NIUSR
Problem: NI% JSYS symbols conflict with other symbols in the monitor.
Solution: Change prefix of Buffer Descriptor blocks from BD to BX. Change
prefix of NI% JSYS argument block from NI to EI.
Change a spec. Change 7 programs.
[End of TCO 6.1.1054]
TCO-number: 6.1.1055
Written-by: MELOHN Creation-date: 16-Nov-84 19:26:05
Edited-by: MELOHN Edit-date: 17-Nov-84 16:24:02
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSA SETSPD GLOB LATSRV
Problem: Need SETSPD command and corresponding SMON% to set default LAT state
at startup time.
Diagnosis: No Code.
Solution: Add new SMON function to set LAT initial startup state. Change SETSPD
to set this state (default is LAT ON). Users can set LAT state off, and then use
LCP commands (LATOP%) to set groups, IDs, etc, before turning LAT on. Most users
will ignore this command, and LAT will come up by default, with LAT group 0
enabled.
[End of TCO 6.1.1055]
TCO-number: 6.1.1056
Written-by: PRATT Creation-date: 18-Nov-84 13:23:53
Edited-by: PRATT Edit-date: 18-Nov-84 13:30:27
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: COMND
Related-QAR: 706236
Problem:
COMND returns NPXAMB (ambiguous) when given a FDB list of
.CMSWI followed by any other function if the user types
"/<ESC>".
COMND should just beep and continue parsing.
Diagnosis:
CMAMBT gets called when an escape is seen and there is no data
in the atom buffer. It then checks for another FDB in the list
and if it finds one, attempts to parse using the new one. Since a
"/" was already typed the next FDB in the list will probably not
be able to parse causing the error return.
Solution:
Check to see if we have a prefix character. If so, do not
try to parse the next FDB, just beep and continue trying
to parse this field.
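
The decision described in the diagnosis and solution can be summarised as a small predicate. This is a Python sketch of the logic only, not the CMAMBT code; the function and parameter names are hypothetical.

```python
def should_try_next_fdb(prefix_typed, has_next_fdb):
    """Decide what happens when ESC is seen with no data in the atom buffer.

    prefix_typed: True if a prefix character (such as "/" for a switch)
                  has already been typed in this field.
    has_next_fdb: True if another FDB follows in the list.

    Before the fix the next FDB was tried whenever one existed. After the
    fix a typed prefix character suppresses that, so the caller just beeps
    and keeps parsing the current field.
    """
    return has_next_fdb and not prefix_typed
```

With "/<ESC>" and a second FDB present, the predicate is false, so parsing stays on the .CMSWI field instead of failing with NPXAMB.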
[End of TCO 6.1.1056]
TCO-number: 6.1.1059
Written-by: MELOHN Creation-date: 19-Nov-84 15:26:59
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: TTYSRV CTHSRV
Problem: Detaching a job from a CTERM terminal leaves the terminal in a
weird state, where only the escape sequence and control C do anything.
Diagnosis: The TDCALL for detaching a CTERM terminal does a front end request
which doesn't do much for a non-front end terminal.
Solution: Change the TDCALL to work just like NRT and LAT terminals, which
don't do FE requests.
[End of TCO 6.1.1059]
TCO-number: 6.1.1061
Written-by: GROSSMAN Creation-date: 20-Nov-84 00:57:44
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKNI
Problem: DECnet and LAT do not survive KLNI reloads.
Diagnosis: State mappings in PHYKNI are incorrect.
Solution: Rewrite SETSTA. Make it table driven.
[End of TCO 6.1.1061]
TCO-number: 6.1.1062
Written-by: HAUDEL Creation-date: 21-Nov-84 08:01:49
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGUTL
Problem: Monitor code moved to extended sections and the "keep"
bit is not set for machines that have the MCA25.
Diagnosis: No code to do so.
Solution: Add code. The "keep" bit will now be set for RSCOD,RSDAT,
RSVAR,XRCOD,and XRVAR. The CSTs will also have the "keep" bit set.
[End of TCO 6.1.1062]
TCO-number: 6.1.1064
Written-by: GROSSMAN Creation-date: 28-Nov-84 13:44:13
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NISRV
Problem: NISRV is dependent upon D36COM.
Diagnosis: NISRV uses D36COM's memory manager.
Solution: Use the memory manager in FREE (ie: ASGRES/RELRES).
[End of TCO 6.1.1064]
TCO-number: 6.1.1067
Written-by: GROSSMAN Creation-date: 3-Dec-84 13:40:02
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NITEST
Problem: NITEST uses DNGWDS.
Solution: Make it use ASGRES.
[End of TCO 6.1.1067]
TCO-number: 6.1.1072
Written-by: GROSSMAN Creation-date: 5-Dec-84 00:33:22
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: Yes
Program: MONITOR
Routines-affected: NISRV PHYKNI
Problem: 1) Unused code in NISRV.
2) Error responses not handled appropriately.
Solution: 1) Remove unused code.
2) Dispatch on various error types instead of using catchall KNISCE. We
now have:
KNIBLV (Halt) - Buffer length violation
KNIIEC (Halt) - Illegal error code
KNICCF (Chk) - Carrier check failed
KNICDF (Chk) - Collision detect check failed
KNIFTL (Chk) - Frame too long
KNIRFD (Chk) - Remote failure to defer
KNIFTS (Halt) - Frame too short
KNIDOV (Chk) - NIA buffer overrun
Some of these errors are also passed up to the user in the form of an NISRV
error code. Some errors that used to be reported via KNISCE now just get
passed up to the user (such as Queue length violations).
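
The per-error dispatch listed above is naturally expressed as a table. The Python sketch below only restates that list; reading "(Halt)" as BUGHLT and "(Chk)" as BUGCHK is an assumption, and the helper name is hypothetical.

```python
# Error name -> (severity as listed in the TCO, description)
KNI_ERRORS = {
    "KNIBLV": ("Halt", "Buffer length violation"),
    "KNIIEC": ("Halt", "Illegal error code"),
    "KNICCF": ("Chk", "Carrier check failed"),
    "KNICDF": ("Chk", "Collision detect check failed"),
    "KNIFTL": ("Chk", "Frame too long"),
    "KNIRFD": ("Chk", "Remote failure to defer"),
    "KNIFTS": ("Halt", "Frame too short"),
    "KNIDOV": ("Chk", "NIA buffer overrun"),
}


def bug_kind(name):
    """Map a listed error to the kind of bug report it raises.

    Assumption: "Halt" means BUGHLT and "Chk" means BUGCHK.
    Raises KeyError for names that are not in the TCO's list.
    """
    severity, _description = KNI_ERRORS[name]
    return "BUGHLT" if severity == "Halt" else "BUGCHK"
```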
[End of TCO 6.1.1072]
TCO-number: 6.1.1074
Written-by: GUNN Creation-date: 6-Dec-84 14:58:47
Edited-by: GUNN Edit-date: 8-Jan-85 17:01:49
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: Yes Hardware-related: No
Program: MONITOR
Routines-affected: IPCF FREE
Related-SPR: 18886 20473
Problem: @RETRIEVE *.* command on directory with large number of files
(300+) may encounter random failures for some files. For example:
@RETRIEVE *.*
F001.DAT.1 [OK]
F002.DAT.1 [OK]
...
F213.DAT.1 [OK]
F214.DAT.1 Archive system request not completed
F215.DAT.1 [OK]
F216.DAT.1 Archive system request not completed
F217.DAT.1 Archive system request not completed
F218.DAT.1 [OK]
...
@
Diagnosis: The ARCF% JSYS function .ARRST sends an IPCF
packet to QUASAR for each file to be retrieved. The PID QUASAR uses to
receive its IPCF packets is not quota controlled. Any sender has the
potential to 'flood' QUASAR, especially under conditions where
QUASAR might not be able to receive and process its packets in a
timely fashion. It is possible under these conditions for all of the
IPCF free space to be used up temporarily until QUASAR receives the
packets and releases the space.
The routine ARCMSG in IPCF is responsible for sending packets
to QUASAR for the archive functions. It calls the common routine
MESTOR to send the packets. MESTOR can fail in two cases which are
potentially recoverable, if the receiver's PID is over quota (IPCFX7),
or the call to ASGIPC fails to get free space for the packet.
Currently, ARCMSG doesn't return error information to its caller.
There is code that attempts to protect against over quota failures by
going OKINT and DISMS'ing until the receiver has gone back under
quota, but this code has the potential of leaving the caller NOINT.
Solution: Make ARCMSG return an error code in T1 on failure. Have
ARCMSG pass up the error code from ASGIPC or MESTOR failures. Add a
mechanism to RELIPC to flag when free space is again available and
have callers (particularly code at ARRFR in JSYSF) of ARCMSG go OKINT
and DISMS until the recoverable conditions have changed and try again.
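
The retry behaviour in the solution can be sketched as a loop that distinguishes recoverable from unrecoverable failures. This is an illustrative Python model, not the monitor code: the error strings, the callables, and the attempt limit are all assumptions (the TCO does not mention a limit).

```python
RECOVERABLE = {"over_quota", "no_free_space"}  # hypothetical error names


def send_with_retry(send, wait_for_change, max_attempts=10):
    """Send a packet, retrying while the failure is recoverable.

    send():            returns None on success, or an error string.
    wait_for_change(): blocks until the recoverable condition may have
                       cleared (the monitor goes OKINT and DISMSes here).
    Returns None on success, otherwise the last error seen.
    """
    error = "not_attempted"
    for _ in range(max_attempts):
        error = send()
        if error is None:
            return None
        if error not in RECOVERABLE:
            return error  # pass unrecoverable errors straight up
        wait_for_change()
    return error
```

The point of the design is that the sender always reports an error code to its caller, and the waiting happens in the caller rather than inside the send routine.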
[End of TCO 6.1.1074]
TCO-number: 6.1.1076
Written-by: MCCOLLUM Creation-date: 7-Dec-84 16:38:41
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: COMND
Problem:
MONNEJ bugchks
Diagnosis:
Missing ERJMPs after MTOPRs in COMND
Solution:
Add ERJMPs
[End of TCO 6.1.1076]
TCO-number: 6.1.1079
Written-by: PAETZOLD Creation-date: 11-Dec-84 13:13:27
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: monitor
Routines-affected: mnetdv
Problem:
ATNVT% referencing section 1 stuff from 6 without proper care.
Diagnosis:
EBD.
Solution:
Use an XJRST when referencing TVTJFN.
[End of TCO 6.1.1079]
TCO-number: 6.1.1080
Written-by: GROSSMAN Creation-date: 11-Dec-84 14:38:33
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NIPAR
Problem: Bit masks returned by Read Channel Counters function of NISRV
are undefined.
Solution: Define the bits. They live in the fields CCRFM and CCSFM (receive
and send failure masks).
[End of TCO 6.1.1080]
TCO-number: 6.1.1081
Written-by: GROSSMAN Creation-date: 11-Dec-84 14:50:32
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NISRV
Problem: If the .NICLO function of NISRV fails, the portal that was being
closed can no longer be used in any way (such as re-trying the .NICLO
function).
Diagnosis: When .NICLO closes a portal, it sets the "closing" flag for that
portal. This flag prevents people from doing anything with the portal while
it is being closed. Unfortunately, when an error occurred during a close,
the "closing" flag was not being reset, and therefore nobody could play with
the portal anymore.
The most common error that occurs during a close is a resource error.
Usually, this happens during system startup, or heavy Ethernet traffic.
Solution: Clear the "closing" flag when giving an error back to the user.
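
A minimal Python model of the fix: the "closing" flag is cleared on the error path so that the close can be retried. The class, the exception type, and the `release` callable are hypothetical stand-ins for the NISRV portal and its resource handling.

```python
class ResourceError(Exception):
    """Hypothetical stand-in for a resource (memory) allocation failure."""


class Portal:
    def __init__(self):
        self.closing = False
        self.closed = False

    def close(self, release):
        """Close the portal; `release` does the work and may raise ResourceError."""
        if self.closing or self.closed:
            raise RuntimeError("portal is closed or being closed")
        self.closing = True
        try:
            release()
        except ResourceError:
            # The fix: clear the flag before giving the error back,
            # so the caller can retry the close later.
            self.closing = False
            raise
        self.closing = False
        self.closed = True
```

Without the line that resets `closing` in the error path, a second call to `close` would be rejected forever, which is the symptom described in the problem statement.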
[End of TCO 6.1.1081]
TCO-number: 6.1.1082
Written-by: GROSSMAN Creation-date: 11-Dec-84 15:20:01
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NIUSR
Problem: The NI% JSYS:
1) Buffer status is not being returned upon completion of xmits and rcvs.
2) ^C can't get out of a blocking receive.
3) Random user ?Ill mem refs from programs that use the NI% JSYS.
Diagnosis: 1) Oops
2) Wrong check in the receive complete and transmit complete scheduler tests.
3) User was putting a receive buffer on a copy on write page. The receive
buffer code locks the page down. The user then attempts to modify the page,
which causes the monitor to attempt to give the user his own copy of the
page. Unfortunately, the page is locked down (by NIUSR) and PAGEM refuses
to do the copy on write. This eventually turns into an ?Illegal write ref...
Solution: 1) Write the code.
2) Check FKPS1 instead of FKPS0 inside the scheduler test.
3) Attempt to write a byte into the user's receive buffer. If the page is
copy on write, he will get his own private writeable copy of the page and
all is well. If the page is not writeable, he will get an illegal write
reference trap of some sort. If the page is writeable, no problem.
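
The three outcomes in item 3 of the solution can be modelled as follows. This Python sketch only mirrors the case analysis; the `Page` type, its field names, and the use of `PermissionError` for the illegal write reference are assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Page:
    access: str            # "writable", "copy_on_write", or "read_only"
    private: bool = False  # True once the user has his own copy
    locked: bool = False


def prepare_receive_buffer(page):
    """Touch the page with a test write, then lock it down."""
    if page.access == "read_only":
        # Not writeable: the user gets an illegal write reference.
        raise PermissionError("illegal write reference")
    if page.access == "copy_on_write":
        # The test write gives the user a private writeable copy
        # before the page is locked.
        page = replace(page, access="writable", private=True)
    return replace(page, locked=True)
```

Doing the write before the lock means the copy-on-write has already happened by the time the page is locked down, so a later modification by the user no longer needs PAGEM to copy a locked page.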
[End of TCO 6.1.1082]
TCO-number: 6.1.1086
Written-by: GRANT Creation-date: 13-Dec-84 07:01:34
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: phyklp scampi
Problem: Poorly coordinated BUGxxx.
Diagnosis: SCA and CI-20 device driver don't handle the closing of a virtual
circuit in the same manner when it comes to outputting stuff on
the CTY.
Solution: Remove SCACVC. Make SCATMO a BUGINF rather than a BUGCHK.
Create KLPCVC (closed virtual circuit). Change KLPNUP to KLPOVC
(opened virtual circuit).
Now, whenever TOPS-20 opens a virtual circuit you will get a KLPOVC
and whenever TOPS-20 closes a virtual circuit you will get a KLPCVC.
[End of TCO 6.1.1086]
TCO-number: 6.1.1087
Written-by: HAUDEL Creation-date: 13-Dec-84 10:09:26
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MONSYM
Problem: DIAG% functions for the Reading and Writing of maintenance data
do not have any entries in MONSYM.
Diagnosis: Entries never added.
Solution: Add .DGWMD and .DGRMD to Monsym.
[End of TCO 6.1.1087]
TCO-number: 6.1.1088
Written-by: MELOHN Creation-date: 13-Dec-84 15:24:47
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: latsrv
Problem: If LAT-STATE is OFF in config file and user attempts to connect
to host before the first multicast message is sent out, system crashes with
SKDPF1.
Diagnosis: If a start message is received from a LAT server and the LAT
circuit state is off, routine HMSTRT calls LCLHLT to shut down the circuit.
Since no circuit yet exists, we get a SKDPF1.
Solution: Re-arrange the checks in HMSTRT so that we don't bother checking
any circuit related parameters if the circuit doesn't exist yet.
[End of TCO 6.1.1088]
TCO-number: 6.1.1089
Written-by: MELOHN Creation-date: 13-Dec-84 16:02:12
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CTHSRV
Related-TCO: 6.1.1058
Problem: System crashes with ILMNRF before TCO 6.1.1058, and MCLNSK after.
Diagnosis:
(courtesy of Gunnar Lindell)
1. User is running on a CTERM line and does a jsys that will affect the
PSI system, for instance ATI or STIW.
2. Since the PSI system will be affected, the fork lock is acquired.
As a consequence, the job goes CRSKED.
3. TTYSRV is called by the jsys to process the function. TTYSRV gets
the terminal lock (LCKTTY). As a consequence, the job goes NOINT.
4. The connection with the remote host is broken some time after that
ULKTTY was called, and before (6).
5. TTYSRV calls CTHSRV to process the function (CTHSPS for instance).
The first thing CTHSRV does is to lock the CTERM database (LOKCDB).
6. LOKCDB will check if the link state is RUN. It won't be, since the
connection was broken. So LOKCDB will decide to 'blow the link away'.
It does this by calling MSGREL (in CTHSRV). MSGREL queues up a
'carrier off' PSI. It won't of course take effect yet, since the job
is still NOINT.
7. After calling MSGREL, LOKCD0 goes on to call ULKTTY, and this is
where the roof falls in. ULKTTY will do an OKINT, this will let
the carrier off interrupt in. That will trap to FLOGO1. There
a jsys entry is simulated by calling MCENTR. Since we are still
CRSKED (from the fork lock) we bughlt MCLNSK!
Solution: Remove call to ULKTTY at LOKCD1. There should be no reason
to ever have to unlock the TDB on error, since the caller who locked
the TDB in the first place will unlock it as well. This also fixes the
case where if we fail for some reason to get the CDB, an ULKBAD BUGCHK
occurs, since we tried to unlock the TTLOK twice.
[End of TCO 6.1.1089]
TCO-number: 6.1.1090
Written-by: HAUDEL Creation-date: 17-Dec-84 08:40:19
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCSJSY SCHED MONSYM
Problem: Possible "race condition" errors in SCSJSY.MAC
Diagnosis: Code not written to handle such conditions.
Solution: Change code in SCSJSY, change the way SCSJSY code is
called from SCHED, and change/delete some error codes in MONSYM.
[End of TCO 6.1.1090]
TCO-number: 6.1.1091
Written-by: GRANT Creation-date: 17-Dec-84 10:19:41
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: phymsc STG globs
Problem: Too many CI-related BUGxxx.
Diagnosis: Many CI-related BUGxxx were created for debugging purposes but aren't
necessary during normal operation.
Solution: Create the cell CIBUGX; the default for its contents is zero. If
CIBUGX is non-zero you will get more CI-related BUGxxx.
[End of TCO 6.1.1091]
TCO-number: 6.1.1092
Written-by: TBOYLE Creation-date: 17-Dec-84 15:41:52
Edited-by: TBOYLE Edit-date: 18-Dec-84 12:13:14
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC EXECSU TTYSRV
Problem:
Jobs lying around in LOGOUT or Not logged in EXEC jobs. This
happens on LAT, CTM, NRT and also FE lines.
Diagnosis:
Along the process of LOGOUT somebody usually does an unconditional
block like a DOBE in the EXEC's case after printing "Autologout" and a
call to TTDOBE in the hangup code.
We should not do this because the final rundown in LGOUT% JSYS
tries SOBEs for 15 seconds and then gives up so that LOGOUT can proceed.
Solution:
Remove DOBE in AUTOL6 and remove CALL TTDOBE in the TTHNGU code.
Move the CFOBF to be after the 15 seconds rundown so that FE lines
don't have the previous guy's logout message hanging around in them.
[End of TCO 6.1.1092]
TCO-number: 6.1.1095
Written-by: HAUDEL Creation-date: 19-Dec-84 12:58:27
Edited-by: HAUDEL Edit-date: 19-Dec-84 13:04:16
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYSIO
Related-SPR: 20000
Problem:
I/O could fail to restart if the tape "rewind timer" mechanism
set the tape status to indicate BOT and there are IORBs queued
to the device via the UDBTWQ of the UDB.
Diagnosis:
The "rewind timer" mechanism did not include a way for this
to happen.
Solution:
Have the "rewind timer" code set US.OIR when it sets US.BOT.
[End of TCO 6.1.1095]
TCO-number: 6.1.1096
Written-by: LOMARTIRE Creation-date: 21-Dec-84 08:54:26
Edited-by: LOMARTIRE Edit-date: 21-Dec-84 08:55:43
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKLP JSYSA MONSYM GLOBS CFSSRV
Problem: There is no way to obtain the names of the HSC nodes to which we have
open connections.
Diagnosis: No function to do this.
Solution: Add a function to the CNFIG% JSYS. This function, .CFHSC, will
return the node names of any HSCs to which we have an open VC. The argument
block returned is identical in format to the one returned by the .CFCND
function (return all CFS nodes).
[End of TCO 6.1.1096]
TCO-number: 6.1.1097
Written-by: GRANT Creation-date: 28-Dec-84 09:32:43
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYMVR SCAMPI GLOBS STG
Problem: It is difficult to debug in a multi-CPU configuration because various
CI-related timers go off and cause connections to close.
Diagnosis: There is no nice way of turning off these timers.
Solution: Create the cell CITIMR and make it non-0 if you are debugging and
want to stop on breakpoints without having the other node(s)
time you out.
[End of TCO 6.1.1097]
TCO-number: 6.1.1099
Written-by: GRANT Creation-date: 30-Dec-84 06:36:21
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYPAR
Problem: It is difficult to find the device unit number, given a CDB, KDB,
and UDB.
Diagnosis: The data structures are not always interpreted the same and the
definitions in PHYPAR don't provide much help.
Solution: Enhance definitions of UDBSLV and CDBUDB.
[End of TCO 6.1.1099]
TCO-number: 6.1.1100
Written-by: PAETZOLD Creation-date: 31-Dec-84 13:04:31
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: monitor
Routines-affected: ipcidv
Problem:
Low order octet of the local internet address defined for the internet
CI interface must agree with the CI node number. It is, however, easy to
get this wrong.
Diagnosis:
Solution:
In CIPRST, generate a CIPBAD BUGINF if the address is wrong and do not
initialize the multinet interface.
[End of TCO 6.1.1100]
TCO-number: 6.1.1101
Written-by: PAETZOLD Creation-date: 1-Jan-85 13:53:09
Edited-by: PAETZOLD Edit-date: 2-Jan-85 11:20:39
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: monitor
Routines-affected: GTJFN
Problem:
Most JFN informational JSYSes (e.g., DVCHR, JFNS) do not work with DSK*: JFNs.
Diagnosis:
GTJFN is overzealous in trimming free space blocks and is overtrimming
the device name block when DSK*: is used.
Solution:
Make STRDEV fix up FILOPT(JFN) to make sure enough space is reserved for
a full device name.
[End of TCO 6.1.1101]
TCO-number: 6.1.1108
Written-by: GRANT Creation-date: 4-Jan-85 08:08:58
Edited-by: GRANT Edit-date: 4-Jan-85 08:18:25
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: phyklp STG globs
Related-QAR: 706372
Problem: If you have a broken CI-20 it may be impossible to boot your
system because you may always get a KLPNRL BUGHLT.
Diagnosis: If TOPS-20 decides to reload the port during startup (after its
initial attempt), a KLPNRL is pretty much guaranteed. There
needs to be a clean way to boot the system and have it ignore
the port so you can run SPEAR and find out what the problem is.
Solution: Create the cell NOKLIP. If it contains a non-0 value at system
startup, the port will be reset and then ignored by TOPS-20.
[End of TCO 6.1.1108]
TCO-number: 6.1.1109
Written-by: GROSSMAN Creation-date: 4-Jan-85 08:44:28
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NISRV PHYKNI
Problem: 1) Disabling multicasts would hose the monitor over.
2) Promiscuous & Unknown modes of operation did not work.
3) NISRV too slow.
4) Unused code.
5) KNIRTO BUGCHKs
6) KLNI variables not being handled properly after a restart.
Solution: 1) Throw out the disable multicast code and start over.
3) Implement a memory cache for all memory required by transmits or receives.
When blocks on the cache aren't used for a minute or more, return them to
the resident free space pool.
4) Remove unused code.
5) KNIRTO timer was too small. Increase it from 15. to 30. seconds.
6) Implement a validity flag for each KLNI variable we maintain. When the
KLNI gets restarted, ensure that all valid variables get set in the KLNI.
In addition, add a cell called TOTINT which contains the cumulative run time
NISRV spends at interrupt level. This time is in the same format as that
returned by RDTIME.
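Note: The cache in item 3 can be sketched as follows. This is a minimal
Python model and not the NISRV code; the class name, the allocate/release
callbacks (standing in for the resident free space pool), and the explicit
"now" argument are assumptions made for illustration.

```python
class BufferCache:
    """Free-list cache; blocks idle for max_idle seconds go back to the pool."""

    def __init__(self, allocate, release, max_idle=60.0):
        self._allocate = allocate    # hypothetical: get a block from the pool
        self._release = release      # hypothetical: return a block to the pool
        self._max_idle = max_idle
        self._free = []              # (time_cached, block) pairs

    def get(self):
        # Reuse a cached block when one exists; go to the pool only on a miss.
        if self._free:
            _, block = self._free.pop()
            return block
        return self._allocate()

    def put(self, block, now):
        # Remember when the block was cached so that idle time can be measured.
        self._free.append((now, block))

    def trim(self, now):
        """Release blocks idle for max_idle seconds or more; return the count."""
        stale = [b for (t, b) in self._free if now - t >= self._max_idle]
        self._free = [(t, b) for (t, b) in self._free if now - t < self._max_idle]
        for block in stale:
            self._release(block)
        return len(stale)
```

A periodic job would call trim() with the current time. The one-minute
threshold is the only figure taken from this TCO.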
[End of TCO 6.1.1109]
TCO-number: 6.1.1110
Written-by: LEACHE Creation-date: 4-Jan-85 13:51:52
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: STG
Problem:
(1) Symbols created in KDDT user mode get lost.
(2) Symbol-table growth in user mode causes the table
to cross a section boundary, causing section 36 to be
mapped.
Diagnosis:
(1) Symbol table pointers not being correctly managed by the
pre and post KDDT code in STG.
(2) Symbol table being BLT'ed into the low end of section 37
instead of the high end.
Solution:
(1) Manage the symbol table pointers correctly.
(2) Move the symbol table to the high end of section 37 while in
KDDT user mode.
[End of TCO 6.1.1110]
TCO-number: 6.1.1111
Written-by: MCCOLLUM Creation-date: 4-Jan-85 15:00:39
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC
Problem:
Job 0 JFNs can be released by a fork other than the one that originally got it.
Diagnosis:
Solution:
Set bit GJ%ACC in the GTJFN call at USGOPN
[End of TCO 6.1.1111]
TCO-number: 6.1.1112
Written-by: MCCOLLUM Creation-date: 4-Jan-85 15:39:13
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC
Problem:
GETAB% returns the local job index rather than the global job number
when an entry from the DEVUNT table is requested. Any program which
attempts to use this job number will get unexpected results.
Diagnosis:
GETAB% returns the value directly from the LH of DEVUNT and does not
convert it to a global job number first.
Solution:
If the LH of the DEVUNT entry is .GE. zero, call LCL2GL and return
the global job number to the user.
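Note: The conversion can be sketched as follows. This is a minimal Python
model of a 36-bit word with 18-bit halves; the function names are
hypothetical, local_to_global stands in for LCL2GL, and passing a negative
left half through unchanged is an assumption drawn from the ".GE. zero"
condition above.

```python
HALF_MASK = 0o777777        # 18 bits
HALF_SIGN = 0o400000        # sign bit of an 18-bit half


def left_half_signed(word):
    """Return the left half of a 36-bit word as a signed 18-bit value."""
    lh = (word >> 18) & HALF_MASK
    return lh - (1 << 18) if lh & HALF_SIGN else lh


def getab_devunt(word, local_to_global):
    """Replace a non-negative LH (local job index) with the global job number."""
    lh = left_half_signed(word)
    if lh < 0:
        return word                       # negative LH: returned unchanged
    global_job = local_to_global(lh)      # stands in for LCL2GL
    return ((global_job & HALF_MASK) << 18) | (word & HALF_MASK)
```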
[End of TCO 6.1.1112]
TCO-number: 6.1.1113
Written-by: GROSSMAN Creation-date: 5-Jan-85 15:53:46
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NISRV STG PHYKNI
Problem: NISRV wastes lots of paper by printing too many BUGxxxs.
Diagnosis: KNIFQE (Free Queue Empty), and KNIDMD/KNIDM1 are nice to have for
debugging purposes. However, they are not good for production situations.
Solution: Create a cell called NIBUGX that will enable the extraneous BUGxxxs
if it contains non-zero. It will default to 0.
In addition, move LOADMODULE of NITEST into STG, and make its loading dependent
upon the DEBUG conditional.
[End of TCO 6.1.1113]
TCO-number: 6.1.1114
Written-by: GRANT Creation-date: 6-Jan-85 20:03:51
Edited-by: GRANT Edit-date: 7-Jan-85 08:16:55
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: STG params
Related-QAR: 706369
Problem: SAVTRE facility not readily accessible to non-source sites.
Diagnosis: Must poke with DDT to turn it on.
Solution: Create symbol SAVTRF and define it as an NDG in PARAMS. This way
you can override it with your PARAM0.
[End of TCO 6.1.1114]
TCO-number: 6.1.1115
Written-by: HAUDEL Creation-date: 7-Jan-85 08:29:18
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC
Related-QAR: 706387
Problem: MONNEJ's when job logs out.
Diagnosis: No ERJMP after some JSYSes in the Monitor.
Solution: Add the ERJMPs.
[End of TCO 6.1.1115]
TCO-number: 6.1.1116
Written-by: GRANT Creation-date: 7-Jan-85 08:31:03
Edited-by: GRANT Edit-date: 7-Jan-85 09:03:06
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: mstr PHYSIO
Related-QAR: 706408
Problem: OPR>SHOW STATUS DISK incorrectly claims a disk is dual-ported
to another KL when the other port is actually the front end.
Diagnosis: MSTR% returns incorrect MS%2PT bit value. When the front end
started sending us its disk configuration packet which causes the U1.FED
bit to get set in the UDB, the MSTR% support routines were never updated.
Solution: In PHYSIO's GETSTR routine, add the check for U1.FED and don't
return MS%2PT if it is on. Also, in MSTR, eliminate the check for the disk
being part of PS:; it's no longer needed.
Note: Although not part of this QAR's problem, we need to check for
don't-care disk, too.
[End of TCO 6.1.1116]
TCO-number: 6.1.1118
Written-by: MCCOLLUM Creation-date: 7-Jan-85 16:22:37
Edited-by: MCCOLLUM Edit-date: 11-Jan-85 15:13:09
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSF
Problem:
ARCF% function .ARGST is too slow.
Diagnosis:
The ARCF% code updates the directory on disk even though .ARGST is a read-only
function.
Solution:
For ARCF% function .ARGST, don't update the directory. Updating the directory
slows this function down by 100%.
[End of TCO 6.1.1118]
TCO-number: 6.1.1119
Written-by: MCCOLLUM Creation-date: 8-Jan-85 14:09:39
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: TTYSRV
Related-QAR: 706399
Problem:
Failing TTMSG's can leave a process NOINT.
Diagnosis:
Error returns from routine SALLIN neglect to go OKINT.
Solution:
Add OKINTs in error returns.
[End of TCO 6.1.1119]
TCO-number: 6.1.1120
Written-by: TBOYLE Creation-date: 8-Jan-85 14:54:53
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYMSC
Related-QAR: 706362
Problem: CHANS displays cylinder and sector as huge negative numbers
on MSCP disks.
Diagnosis: PHYMSC forgets to turn off the physical bit before
computing UDBPS1 and UDBPS2.
Solution: Turn off the IRBPAD bit in the area of MSCS6A.
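Note: The effect of leaving a flag bit set in an address word can be sketched
as follows. This is a minimal Python model; placing the flag at the sign bit
of a 36-bit word is an assumption (the TCO does not give the bit position of
IRBPAD), and the function names and the sectors-per-cylinder figure are
hypothetical.

```python
WORD_BITS = 36
PHYSICAL_FLAG = 1 << (WORD_BITS - 1)   # assumed position: sign bit of the word


def as_signed(word):
    """Interpret a 36-bit word as a two's-complement signed integer."""
    return word - (1 << WORD_BITS) if word & PHYSICAL_FLAG else word


def cylinder_and_sector(address_word, sectors_per_cylinder):
    """Clear the flag bit before splitting the address."""
    address = address_word & ~PHYSICAL_FLAG
    return divmod(address, sectors_per_cylinder)
```

With the flag left on, as_signed() yields a large negative value, which is
consistent with the symptom reported for CHANS.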
[End of TCO 6.1.1120]
TCO-number: 6.1.1121
Written-by: LOMARTIRE Creation-date: 9-Jan-85 10:49:16
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: GTJFN
Related-SPR: 20453
Related-QAR: 706323
Problem: ILMNRF while processing a parse-only JFN in GTJFN.
Diagnosis: RECDIR is attempting to parse the directory name and finds that it
cannot find the directory. DIRLUK will return no match but will not return an
updated byte pointer.
For parse-only JFNs, RECDIR assumes a no match return from DIRLUK is ambiguous
and "updates" FILOPT with the new byte pointer. This destroys FILOPT.
This eventually leads to an ILMNRF.
Solution: Still treat the match as ambiguous but do not update FILOPT for "no
match" returns from DIRLUK in RECDIR.
[End of TCO 6.1.1121]
TCO-number: 6.1.1122
Written-by: GROSSMAN Creation-date: 9-Jan-85 10:58:06
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: Yes
Program: MONITOR
Routines-affected: NISRV PHYKNI
Problem: Various and sundry fixes:
1) Move NISRV to section 6. This frees up 6 section 0/1 pages from the
bootable monitor.
2) Fix hung KLNI detector (KNISTP generator) so that resource errors don't
result in spurious KNISTPs.
3) Fix SBD code in KNIJB0 (runs KNILDR).
4) Re-write GETCOR so that memory is no longer 4 word aligned. Alignment is
not necessary on KL10s.
[End of TCO 6.1.1122]
TCO-number: 6.1.1125
Written-by: GRANT Creation-date: 9-Jan-85 14:41:06
Edited-by: GRANT Edit-date: 9-Jan-85 14:46:31
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYP4 phymsc
Related-QAR: 706405
Problem: TOPS-20 has wrong drive serial number.
Diagnosis: While TOPS-20 remained running, a disk had its HDA replaced.
Once we have a UDB, we never reread the serial number.
Solution: Whenever we get an online interrupt (Massbus) or online an
MSCP disk, get the DSN again and put it in the UDB.
[End of TCO 6.1.1125]
TCO-number: 6.1.1126
Written-by: MOSER Creation-date: 10-Jan-85 16:53:41
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYMVR
Problem: Too many Bugchks.
Diagnosis: Some are useless.
Solution: Remove MSSGON and only do MSSSHT when it is interesting.
[End of TCO 6.1.1126]
TCO-number: 6.1.1127
Written-by: LOMARTIRE Creation-date: 10-Jan-85 16:56:44
Edited-by: LOMARTIRE Edit-date: 20-Mar-85 10:48:07
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: APRSRV
Related-TCO: 6.1.1279
Related-SPR: 20429
Problem: ILLUUO from a bad argument passed to GTJFN.
Diagnosis: Code in GTJFN processes the user's byte pointer by placing a XCTBU
of an ILDB on to the stack and then doing an XCT -2(P). If the byte pointer is
bogus, then this will result in the KIMXLP trace of the XCT to go to -2(P) to
try to find the next instruction. However, since the stack is now changed,
this will produce random results.
Solution: Once KIMUO4 has determined that an XCT caused the UUO, continue the
search from the trapping instruction which is passed in in T2. This will avoid
extra iterations and the confusion that XCT -2(P) causes.
[End of TCO 6.1.1127]
TCO-number: 6.1.1128
Written-by: MOSER Creation-date: 10-Jan-85 17:31:07
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYMVR
Problem: MSSNUT BUGHLT.
Diagnosis: When a drive is write locked the MSCP driver may ask the server to
online the disk with the UF.WPR bit set. The server will reject this but it
uses the wrong length message to do so. The sanity check catches this and
crashes.
Solution: Use the right length after SETUID fails.
[End of TCO 6.1.1128]
TCO-number: 6.1.1129
Written-by: GRANT Creation-date: 11-Jan-85 08:51:07
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: diag
Problem: DFPTA diagnostic (CI/NI port selector) failures due to DIAG%
returning "Diagnostic owns the channel" error.
Diagnosis: The diagnostic is trying to assign RH20 channel 0 and the monitor
is checking to see that there are no disks dual-ported to another KL. But,
it fails to test for offline, which would allow the DIAG% to succeed.
Solution: In DIAG.MAC's DGUDPT routine, check for disk offline (US.OFS)
before proceeding to the dual-ported type checks.
[End of TCO 6.1.1129]
TCO-number: 6.1.1132
Written-by: LOMARTIRE Creation-date: 14-Jan-85 11:05:04
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSF
Related-SPR: 20490
Problem: Edit 2612 to RCUSR% causes LCKDIR BUGHLTs.
Diagnosis: The edit was done incorrectly and introduced a path which caused the
LCKDIRs under certain RCUSR% combinations.
Solution: Rewrite the patch correctly. If a user specifies RC%STP but the
user name does not contain wild cards, fail and return RC%NMD in the
flags.
[End of TCO 6.1.1132]
TCO-number: 6.1.1133
Written-by: LEACHE Creation-date: 15-Jan-85 08:30:58
Edited-by: LEACHE Edit-date: 15-Jan-85 09:27:47
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: JSYSA DISC GLOBS
Related-SPR: 13064 15605 17743 17832 18333
Problem:
Disc allocation (as returned by INFO DISK) is often wrong for SYSTEM
directory.
Diagnosis:
When an OFN is acquired for the accounting file, the value supplied
in the call to ASGOFN is supposed to be the remaining allocation for the
directory, but the value 377777,,0 is erroneously used. This causes the
EXEC to display incorrect (sometimes negative) values in the INFO DISK
command.
Solution:
Get the current remaining allocation for directory SYSTEM and use
that value in the call to ASGOFN.
[End of TCO 6.1.1133]
TCO-number: 6.1.1134
Written-by: LEACHE Creation-date: 15-Jan-85 09:10:25
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: DISC
Related-SPR: 15605 17743 17832 18333
Problem: Disk allocation (as displayed by INFO DISK) is always wrong for
ROOT-DIRECTORY.
Diagnosis: Assigning an OFN for ROOT-DIRECTORY presents a chicken-and-egg
problem: the directory allocation is required to get the OFN, but an OFN
is required to get the directory allocation. The monitor attempts to solve
this by specifying the value 377777,,0 in the call to ASGOFN. This will
cause the INFO DISK command to report erroneous values.
Solution: In routine ASROFN, acquire the first OFN for ROOT-DIRECTORY
using a recognizably unique value for the directory allocation. On the
next OFN assignment for ROOT-DIRECTORY, fetch the true allocation and
call ADJALC to adjust the allocation remaining for the directory.
[End of TCO 6.1.1134]
TCO-number: 6.1.1136
Written-by: GROSSMAN Creation-date: 15-Jan-85 13:55:49
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: GLOBS LLMOP NISRV PHYKNI SYSFLG
Problem: Too many references to KNIN throughout the monitor. Customers
should be able to control KLNIness by changing the definition of KNIN in
PARAMS. Therefore, the only references to KNIN should be in STG.
Solution: Remove many unneeded references to KNIN.
[End of TCO 6.1.1136]
TCO-number: 6.1.1137
Written-by: GROSSMAN Creation-date: 15-Jan-85 14:23:11
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKNI
Problem: NISRV crashes the monitor when it can't find KNILDR.
Diagnosis: The routine KNIJB0 (which is in section 0/1) was trying to call a
routine in section 6. Unfortunately, it ended up calling HSYS, in section
0/1 and crashed the monitor.
Solution: Move most of KNIJB0 to section 6. Leave the entry point in section
0/1.
[End of TCO 6.1.1137]
TCO-number: 6.1.1138
Written-by: GROSSMAN Creation-date: 16-Jan-85 15:52:43
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NIUSR
Problem: There is no interlock mechanism to prevent multiple jobs from
playing with the KLNI.
Solution: Create a new NI% JSYS function called .EIGET, which will acquire
ownership of the KLNI. Only the "owner" of a KLNI is allowed to alter its
state or set its address. If there is no owner, anybody will be allowed to
do these functions.
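Note: The ownership rule can be sketched as follows. This is a minimal Python
model, not the NIUSR code; the class and method names are hypothetical, and
refusing a second job's attempt to acquire an owned device is an assumption.

```python
class OwnedDevice:
    """Device whose state may be changed by its owner, or by anyone if unowned."""

    def __init__(self):
        self.owner = None
        self.state = "off"

    def acquire(self, job):
        # Stands in for the .EIGET function: succeed if unowned or already ours.
        if self.owner is not None and self.owner != job:
            return False
        self.owner = job
        return True

    def set_state(self, job, state):
        # Anyone may alter an unowned device; otherwise only the owner may.
        if self.owner is not None and self.owner != job:
            return False
        self.state = state
        return True
```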
[End of TCO 6.1.1138]
TCO-number: 6.1.1139
Written-by: MOSER Creation-date: 16-Jan-85 16:36:03
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CFSSRV PHYMVR PHYSIO
Problem: SKED too high. Jobs get dismissed and rescheduled too much. Scheduler
thrashes when system is IO bound.
Diagnosis: Routines flag running the scheduler by setting PSKED which forces
DISMSJ. PSKD1 is more appropriate - it means "there may be a scheduling event but
don't dump the current fork".
Solution: Set PSKD1 instead of PSKED where appropriate.
[End of TCO 6.1.1139]
TCO-number: 6.1.1140
Written-by: GROSSMAN Creation-date: 16-Jan-85 16:50:27
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NIUSR STG
Problem: GLFNFs and various other scheduler weirdness...
Diagnosis: NIUSR was calling the scheduler at interrupt level to request a
PSI (PSIRQ). It works most of the time, but the scheduler isn't interlocked
with respect to interrupt level, so races can occur.
Solution: Make a routine that gets called in LV8CHK whenever interrupt level
needs a PSI to be generated.
[End of TCO 6.1.1140]
TCO-number: 6.1.1141
Written-by: GROSSMAN Creation-date: 16-Jan-85 17:10:26
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NIUSR
Problem: NI% JSYS too slow.
Diagnosis: It calls ASGRES and RELRES for every packet it transmits or receives.
Solution: Make a cache for transmit and receive memory. Search the cache first,
and call ASGRES only if no blocks are found.
[End of TCO 6.1.1141]
TCO-number: 6.1.1142
Written-by: MOSER Creation-date: 17-Jan-85 13:38:20
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCHED
Problem: SUMNR1 BUGCHKs when running the "new" scheduler.
Diagnosis: There is now a path through the code that can change the working
set size of a fork in the balance set. Previously this was impossible. The
path is Balance set wait satisfied calls NEWST which gives a big boost and
calls NEWST which calls NEWWSS which calls ADJWSS.
Solution: Make ADJWSS fix SUMBNR if FKIB% is on in FKSWP.
[End of TCO 6.1.1142]
TCO-number: 6.1.1143
Written-by: LOMARTIRE Creation-date: 17-Jan-85 15:52:54
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: GTJFN
Related-SPR: 20532
Problem: If GTJFN% is performed on a filespec whose device field is a
system-wide logical name, and the first entry in that system-wide logical name
is a job-wide logical name, the files pointed to by the job-wide logical name
are not found. However, a job-wide logical name anywhere except first in the
system-wide logical name works correctly.
Diagnosis: If during the logical name evaluation (done in CHKLNM), a
system-wide logical is found, the SAWSLN bit is set. Then, until we step to
the next entry in the logical name definition, we will only search the
system-wide logical name table. So, for a system-wide logical of:
A: => ONE:, TWO:
with job-wide logicals of:
ONE: => PS:LOGIN.CMD
TWO: => PS:LOGOUT.CMD
a DIR of A: will not find PS:LOGIN.CMD (since ONE: is the first entry in a
system-wide logical but it is not a system-wide logical itself). However,
the file pointed to by TWO: will be found since TWO: is the second entry in
the system-wide logical name.
Solution: In CHKLNM, ignore the setting of SAWSLN when deciding whether to
search job-wide then system-wide logicals or just system-wide logicals. This
will make system-wide logicals perform correctly regardless of the placement of
the individual entries. Namely, job-wide will be searched first, then
system-wide.
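Note: The corrected search order can be sketched as follows, using the example
from the Diagnosis. This is a minimal Python model, not the CHKLNM code; the
function name, the dictionary representation of the two logical name tables,
and the depth limit are assumptions.

```python
def expand(name, job_names, system_names, depth=0):
    """Expand a logical name, searching job-wide and then system-wide
    definitions at every step, regardless of how we got here."""
    if depth > 16:                        # hypothetical guard against loops
        raise ValueError("logical name definitions nest too deeply")
    definition = job_names.get(name)
    if definition is None:
        definition = system_names.get(name)
    if definition is None:
        return [name]                     # not a logical name: use as given
    result = []
    for entry in definition:
        result.extend(expand(entry, job_names, system_names, depth + 1))
    return result


system_names = {"A:": ["ONE:", "TWO:"]}
job_names = {"ONE:": ["PS:LOGIN.CMD"], "TWO:": ["PS:LOGOUT.CMD"]}
files = expand("A:", job_names, system_names)
```

Here files contains both PS:LOGIN.CMD and PS:LOGOUT.CMD, because ONE: is
looked up in the job-wide table even though it is the first entry of a
system-wide definition.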
[End of TCO 6.1.1143]
TCO-number: 6.1.1145
Written-by: LOMARTIRE Creation-date: 17-Jan-85 16:04:06
Edited-by: LOMARTIRE Edit-date: 17-Jan-85 16:07:30
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: FILMSC
Related-SPR: 20459
Problem: A fork is assigned a controlling terminal and goes into TCITST on
that terminal line. Now, the fork is frozen, is assigned a new controlling
terminal, and is resumed. It will still be in TCITST on the old terminal line.
Diagnosis: TCOs 6.1526 and 6.2031 handle the case where a job controlling
terminal is changed in the above manner. However, there is no code to handle
the case of a fork controlling terminal.
Solution: Add code in TTYIN2 (after the job controlling terminal check) to
check if the JFN is for the fork's controlling terminal. If it is, place the
line number in the left half of DEV.
[End of TCO 6.1.1145]
TCO-number: 6.1.1146
Written-by: PAETZOLD Creation-date: 19-Jan-85 16:18:39
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: monitor
Routines-affected: niusr
Problem:
SKDPF1s from NIUSR.
Diagnosis:
EBD.
Solution:
.NIOFF needs to be resident as well as NIJJIF.
[End of TCO 6.1.1146]
TCO-number: 6.1.1147
Written-by: MAYO Creation-date: 21-Jan-85 10:00:52
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSF
Related-SPR: 19735
Problem: ARCF% discard function doesn't clear AR%WRN. REAPER checks
AR%WRN to see if it should delete archived files. So, if a user
retrieves a file, keeps it a while, gets a warning from REAPER that
the file will soon expire and be deleted, and decides to keep it by
discarding archive information, he will be very surprised in a few weeks
when REAPER expires and deletes the file anyway.
Diagnosis: ARCF% discard doesn't clear enough FDB bits.
Solution: Have it clear AR%WRN.
[End of TCO 6.1.1147]
TCO-number: 6.1.1149
Written-by: LEACHE Creation-date: 22-Jan-85 14:41:54
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CRYPT
Related-QAR: 706373
Problem: CHKPEV always returns failure for customer encryption algorithms.
Diagnosis: Wrong flavor of test instruction.
Solution: Simplify the routine and the bug goes away.
[End of TCO 6.1.1149]
TCO-number: 6.1.1150
Written-by: MELOHN Creation-date: 22-Jan-85 21:08:27
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CTHSRV
Problem: The CTERM fork runs every 100ms whether or not it has anything to
do. SYSDPY seems to indicate that it is using .3 seconds of runtime every
minute even when there are no users of CTERM.
Diagnosis: The CTERM would run MUCH less often if it had a scheduler test
that woke up when there was something for the CTERM fork to do.
Solution: Add scheduler test CTMTST, and make the CTERM fork MDISMS with it
until it has something to do.
[End of TCO 6.1.1150]
TCO-number: 6.1.1152
Written-by: PAETZOLD Creation-date: 22-Jan-85 22:03:58
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: monitor
Routines-affected: phykni
Problem:
KPALHVs from NISRV looping in NIDPT2 when a portal is closed and the
monitor has arpanet code and the ethernet cable has a UNIX based system
on it and the TOPS-20 system in question has had very little ethernet
IP traffic in the last several minutes before the portal is closed.
Diagnosis:
A UNIX system sends various ethernet multicast messages with the IP
protocol type. TOPS-20 does not enable this multicast address and a
KNIDMD/KNIDM1 combination will result. After this event the buffer and
its CM block will once again be placed on the free queue without a
callback to the client.
At this point we decide to close the portal. NIDPT goes into a loop at
NIDPT2 to NIDPT1 attempting to release all buffers back to the client.
These buffers have never had IO performed on them (since they were
queued) and may still have the multicast address set in the CM block.
At this point MSGAVA will once again generate a KNIDMD/KNIDM1
combination and requeue the block back onto the free queue. We now have
a loop trying to remove this block from the free queue.
After attempting to do this many tens of thousands of times we finally
crash with a KPALHV.
Solution:
Make sure the destination ethernet address of a buffer coming out of
the NIDPT2 loop has a clear address without any multicast bits set.
The reason this problem was never reproducible on the CHIP/ETHER
systems is that there is no UNIX system on the private ethernet but
there is on the public ethernet.
[End of TCO 6.1.1152]
TCO-number: 6.1.1153
Written-by: GRANT Creation-date: 23-Jan-85 16:11:36
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYSIO
Problem: Unusual events can occur with don't-care disks and no one is notified.
Diagnosis: When a don't-care mismatch occurs (drive and disk are not the same
status) we treat the disk under standard access rules but say nothing about
what was discovered.
Solution: Add PHYDCU and PHYDCD BUGCHKs and PHYDCR BUGINF.
PHYDCU is for standard disk on don't-care drive; PHYDCD is don't-care disk on
standard drive; PHYDCR means we are treating the disk as don't-care.
[End of TCO 6.1.1153]
TCO-number: 6.1.1154
Written-by: HAUDEL Creation-date: 23-Jan-85 16:15:52
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSF
Related-QAR: 706126
Problem: MONNEJ bugchk.
Diagnosis: No ERJMP after a RPACS jsys in JSYSF.
Solution: Add an ERJMPR.
[End of TCO 6.1.1154]
TCO-number: 6.1.1155
Written-by: WAGNER Creation-date: 24-Jan-85 14:17:07
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CFSSRV
Problem: CFS Hashing algorithm uses non-prime numbers.
Diagnosis: Theoretically primes are better
Solution: Make HSHLEN prime (either ^D509 or ^D251, based on CFSSCA)
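Why a prime length can help is easiest to see with keys that share a common factor with the table length. The Python sketch below is an illustration only: the TCO does not state the old value of HSHLEN or the shape of the keys, so the power-of-two length 512 and the stride-8 keys are assumptions chosen for the demonstration.

```python
def buckets_used(keys, table_len):
    """Count distinct buckets hit when hashing by simple remainder."""
    return len({k % table_len for k in keys})


# Assumed workload: 1000 keys that are all multiples of 8.
keys = range(0, 8 * 1000, 8)

old_spread = buckets_used(keys, 512)   # assumed non-prime length
new_spread = buckets_used(keys, 509)   # prime length named in the TCO
small_spread = buckets_used(keys, 251) # the other prime named in the TCO
```

With the assumed non-prime length, the strided keys reach only a fraction of the buckets; with either prime length they reach every bucket.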
[End of TCO 6.1.1155]
TCO-number: 6.1.1156
Written-by: HAUDEL Creation-date: 24-Jan-85 16:00:17
Edited-by: HAUDEL Edit-date: 19-Feb-85 16:51:33
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCHED
Related-QAR: 706331
Problem: Cannot use the .SKSCJ (set job class) function of the SKED% for the
calling job with privs disabled.
Diagnosis: The code sets up the job number in T1 and then uses T2 in testing
job number.
Solution: Change CAMN T2,JOBNO to CAMN T1,JOBNO at SKDSJC+16 in SCHED.MAC.
[End of TCO 6.1.1156]
TCO-number: 6.1.1157
Written-by: GROSSMAN Creation-date: 28-Jan-85 10:15:50
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: STG NIUSR
Problem: Entry into NIJJIF too slow.
Diagnosis: XCT is slower than SKIPE.
Solution: Replace XCT of CALL NIJJIF with SKIPE flag followed by CALL NIJJIF.
[End of TCO 6.1.1157]
TCO-number: 6.1.1158
Written-by: PAETZOLD Creation-date: 28-Jan-85 15:22:33
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: IMPANX IPNIDV IPCIDV
Problem:
Interrupt context code (eg. device drivers) for internet often request service
by the internet fork by incrementing INTFLG. If the system is in the null
job the scheduler will not notice this for a while. This causes extra
latency on unloaded systems.
Diagnosis:
Solution:
AOS PSKD1 as well as INTFLG.
[End of TCO 6.1.1158]
TCO-number: 6.1.1159
Written-by: PAETZOLD Creation-date: 28-Jan-85 15:52:00
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: TTYSRV TVTSRV
Problem:
TVT output performance could be faster and could make better use of networking
resources.
Diagnosis:
Currently monitor assumes TVTs are slow speed.
Solution:
Make monitor believe they are fast.
[End of TCO 6.1.1159]
TCO-number: 6.1.1160
Written-by: PAETZOLD Creation-date: 28-Jan-85 16:37:25
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: monitor
Routines-affected: ipfree
Problem:
None observed but code is wrong. DEFSTR for USIZE in IPFREE has a 36 bit
field ending on bit 17.
Diagnosis:
Solution:
Make it an 18 bit field.
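The inconsistency can be checked mechanically. The Python sketch below assumes the usual PDP-10 convention that bit 0 is the most significant bit of a 36-bit word and that a field is described by the position of its rightmost bit plus its size; load_field is a hypothetical helper, not a monitor routine.

```python
WORD_BITS = 36


def load_field(word, pos, size):
    """Extract a field whose rightmost bit is `pos` (bit 0 = most significant)."""
    if size > pos + 1:
        # Only pos+1 bits exist at or to the left of bit `pos`.
        raise ValueError("field would extend past bit 0 of the word")
    shift = WORD_BITS - 1 - pos
    return (word >> shift) & ((1 << size) - 1)


word = 0o123456_654321
left_half = load_field(word, 17, 18)   # the corrected 18-bit definition
```

A field ending on bit 17 has room for at most 18 bits, which is why the 36-bit declaration could not be right.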
[End of TCO 6.1.1160]
TCO-number: 6.1.1161
Written-by: PAETZOLD Creation-date: 28-Jan-85 17:15:28
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: monitor
Routines-affected: tcpjfn
Problem:
It is possible for the monitor to leave a TCP: JFN in a locked state if
you attempt to use TCOPR% on a TCP: JFN that is no longer associated
with a TCB. The problem is caused by using the RETERR macro without
first making sure that the JFN is indeed unlocked. The problem occurs
in two places in TCPJFN.MAC.
Diagnosis:
EBD.
Solution:
Fix it.
[End of TCO 6.1.1161]
TCO-number: 6.1.1162
Written-by: GROSSMAN Creation-date: 29-Jan-85 14:27:57
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKNI NIPAR MONSYM
Problem: The "SET PORT NI AVAILABLE" command to OPR doesn't restart the KLNI
as advertised.
Diagnosis: OPR is trying to put the KLNI into the RUN state. Unfortunately,
the KLNI may contain bad ucode. The monitor doesn't know this, and just starts
the KLNI anyway. Usually, a KNIPER results, sometimes death is the result.
Solution: Create a new state called the "Reload Requested" state (.EISRR).
Setting the KLNI into this state causes KNILDR to run and reload the KLNI.
This state may only be set if the current KLNI state is not RUN.
[End of TCO 6.1.1162]
TCO-number: 6.1.1163
Written-by: MCCOLLUM Creation-date: 29-Jan-85 15:35:25
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: FORK
Problem:
PTAIC bughlts
Diagnosis:
The code around KSEF1 SETOMs the entry in the SYSFK table before routine
CLNZSC is called to delete the user's non-zero sections. CLNZSC does an
SMAP% JSYS to delete any non-zero sections using .FHSLF as a process handle.
Routine FKHPTX attempts to translate .FHSLF by looking in the SYSFK table
and gets the -1 put there by KSEF1. A string of incorrect references based
on this -1 eventually causes SETCPT to try to map in a page table for the
section using a bogus SPT index. A reference to this page table by SECPTR
causes the crash.
Solution:
Do not clear the entry in SYSFK until after the non-zero sections are
successfully deleted by CLNZSC.
[End of TCO 6.1.1163]
TCO-number: 6.1.1164
Written-by: GRANT Creation-date: 31-Jan-85 07:48:54
Edited-by: GRANT Edit-date: 31-Jan-85 07:50:45
Edit-checked: No Document: Yes TCO-tested: Yes
Maintenance-release: Yes Hardware-related: No
Program: MONITOR
Routines-affected: phyp2 PHYSIO
Related-QAR: 706360
Problem: RP20s don't have drive serial numbers and the current method of faking
them doesn't work very well.
Diagnosis: Using CHECKD to put a number in the disk's homeblock is a nuisance
for the system manager and also causes some special case RP20 code in the
monitor which seems needless and appears to be bug-prone.
Solution: When an RP20's UDB is created, the monitor will make up a drive
serial number by adding the unit number to 8000 (decimal) and placing it in
the UDB. Then whenever a serial number is required, the RP20 is guaranteed
to have one, just like any other disk.
The scheme implies a restriction, namely, all RP20s connected to systems in
a cluster must have unique unit numbers.
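The numbering scheme and the restriction it implies can be sketched as follows. This is a Python illustration with hypothetical names; only the base value of 8000 (decimal) comes from the text above.

```python
RP20_DSN_BASE = 8000  # decimal, per the TCO text


def rp20_serial(unit_number):
    """Made-up drive serial number for an RP20 unit."""
    return RP20_DSN_BASE + unit_number


def find_conflicts(unit_numbers):
    """Return the unit numbers that occur more than once across a cluster."""
    seen, duplicates = set(), set()
    for unit in unit_numbers:
        if unit in seen:
            duplicates.add(unit)
        seen.add(unit)
    return duplicates
```

Two RP20s with the same unit number would produce the same serial number, which is why unit numbers must be unique cluster-wide.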
[End of TCO 6.1.1164]
TCO-number: 6.1.1165
Written-by: GRANT Creation-date: 31-Jan-85 13:08:46
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: phyp2
Problem: Homeblocks don't get checked when an RP20 comes online.
Diagnosis: No code.
Solution: Before calling PHYONL, set US.CHB.
[End of TCO 6.1.1165]
TCO-number: 6.1.1166
Written-by: MOSER Creation-date: 31-Jan-85 15:25:53
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: STG GLOBS PROLOG CFSSRV LINEPR DISC
PAGUTL
Problem: The monitor is too slow. Especially when accessing long files
randomly.
Diagnosis: Management of OFN resources is not done optimally. Since
many Jsyses use OFNs any improvement in this area can potentially
speed up the system significantly. Long files are especially bad since they
contain many OFNs.
Solution: Change OFN assignment to use a hash and link algorithm. Change
OFNJFN to do a single compare instead of a costly search for long files.
Change all ASxOFN callers to conform to the new sequence. Most of the
changes are outlined in the OFN-MANAGMENT-PERFORMANCE.MEM spec.
[End of TCO 6.1.1166]
TCO-number: 6.1.1167
Written-by: TBOYLE Creation-date: 31-Jan-85 15:50:45
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYP2
Problem: RP20's never get bad blocks into the BAT BLOCKS.
Diagnosis: PHYP2 never properly handled the CLASS 5 ERROR.
CLASS 5 is for DX20 requested command retry. This happens
on all data errors. In IBM land, when a data error occurs,
the drive requests the channel to retry the transfer several
times before reporting an error. i.e. the channels perform
error recovery.
We have always ignored the CLASS 5 ERROR. The microcode as of
edit 17 also gives us this error under some specific conditions
which we will need to check for.
Solution: Change the CLASS 5 error handler to look for appropriate
data errors and flag them properly for BAT BLOCK processing. Errors
that are not due to data are carefully checked for and left as
device errors.
[End of TCO 6.1.1167]
TCO-number: 6.1.1169
Written-by: GROSSMAN Creation-date: 2-Feb-85 11:11:25
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: GLOBS STG PHYH2 PHYKNI DIAG
Problem: Diagnostics DFPTA and DFNIE do not work with the KLNI.
Diagnosis: They were trying to do DIAG% functions to read and write the channel
logout areas. These functions would fail because there was no CDB for the
channel the KLNI uses (channel 5).
Solution: Create a CDB and dispatch table for the KLNI. Fill it in with the
bare minimum of information needed to support DIAG and keep PHYSIO off my
back. In addition, move the initialization of the KLNI from PHYH2 to PHYSIO
(by modifying PHYCHT in STG). Also, put some checks into various DIAG functions
so that programmers cannot attempt to do disk type stuff to the KLNI.
[End of TCO 6.1.1169]
TCO-number: 6.1.1170
Written-by: GROSSMAN Creation-date: 2-Feb-85 11:37:59
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: Yes
Program: MONITOR
Routines-affected: PHYKNI
Problem: The KNISTP BUGCHK does not print the micro PC if the KLNI is still
running.
Solution: Stop the KLNI just before doing the BUGCHK. This will help detect
microcode loops.
[End of TCO 6.1.1170]
TCO-number: 6.1.1171
Written-by: PAETZOLD Creation-date: 3-Feb-85 15:13:15
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: monitor
Routines-affected: PHYH2
Problem:
ILLGO BUGHLTs on systems with 4096K.
Diagnosis:
Channels talk to physical memory. At the end of a transfer the logout
area contains the address+1 of the last word written (or read). This is
a modulo 22 bit address. ie. when writing to page 17777 of memory the
logout area will have a zero.
The code at CKERR2 fetches the logout address and decrements it. However
this results in a minus one and not 17777. The CAME then fails and we get
an ILLGO.
Solution:
Insert an ANDX to retain only the desired bits (and simulate modulo 22 bit
addressing).
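The wraparound can be worked through numerically. The Python sketch below is an illustration, not the PHYH2 code: the function name is hypothetical, and it assumes 512-word pages so that page 17777 (octal) is the top page of a 22-bit address space.

```python
ADDR_MASK = (1 << 22) - 1   # 22-bit physical address
PAGE_SHIFT = 9              # assumed 512-word pages


def last_word_address(logout_address):
    """Address of the last word transferred, given the logout address+1."""
    # Masking after the decrement simulates modulo 22-bit arithmetic.
    return (logout_address - 1) & ADDR_MASK


# A transfer ending at the top of memory leaves zero in the logout area.
top = last_word_address(0)
top_page = top >> PAGE_SHIFT
```

Without the mask, the decrement of zero yields minus one and the comparison against the expected address fails.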
[End of TCO 6.1.1171]
TCO-number: 6.1.1172
Written-by: PAETZOLD Creation-date: 3-Feb-85 16:33:01
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: monitor
Routines-affected: DSKALC
Problem:
assembly errors from dskalc.
Diagnosis:
hosers.
Solution:
Change OFNPTT symbol to OFPTT to avoid conflict with new stuff in PROLOG.
[End of TCO 6.1.1172]
TCO-number: 6.1.1173
Written-by: GLINDELL Creation-date: 5-Feb-85 09:55:22
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: sclink ntman router
Problem: LOOP NODE does not work
Diagnosis: Never tested
Solution: Fix it
[End of TCO 6.1.1173]
TCO-number: 6.1.1174
Written-by: GLINDELL Creation-date: 5-Feb-85 10:18:38
Edited-by: GLINDELL Edit-date: 5-Feb-85 10:20:05
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: FORK
Related-QAR: 706220
Problem:
It is common practice to set T3 to -1 in the EPCAP% jsys. This will
enable all possible capabilities. When the EPCAP% asks the ACJ for
permission, it passes the unmodified T3 to ACJ. When the ACJ sees
the -1 it is impossible to make a decision since the actual bits to
be enabled cannot be distinguished.
Diagnosis:
Solution:
If the user passes -1 in T3, then calculate the bits the user
is actually trying to enable.
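One way to read this solution is sketched below in Python. The TCO does not say how the bits are calculated, so the sketch assumes that the effective request is the set of capabilities the process could possibly enable; bits_to_enable and its arguments are hypothetical names.

```python
WORD_MASK = (1 << 36) - 1
ALL_ONES = WORD_MASK  # -1 in a 36-bit word


def bits_to_enable(requested, possible):
    """Return the capability bits to report to the access-control job."""
    requested &= WORD_MASK
    if requested == ALL_ONES:
        # Assumption: "enable everything" means everything that is possible.
        return possible & WORD_MASK
    return requested
```

With the request reduced to concrete bits, the ACJ can make its decision on the capabilities that would really be turned on.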
[End of TCO 6.1.1174]
TCO-number: 6.1.1175
Written-by: TBOYLE Creation-date: 5-Feb-85 13:35:17
Edited-by: TBOYLE Edit-date: 5-Feb-85 13:43:13
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYP2
Problem: SPEAR does not report all of the extended status bytes for
RP20 device/data errors.
Diagnosis: Although there is room for them in the SYSERR block,
the monitor does not fill them all in.
Solution: Fix PHYP2 to use all 80 status bytes. Change SNSNUM, and
fix the calculation for number of words based on SNSNUM. We will
use 20 words.
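The word count follows from the byte count by a rounded-up division. The Python sketch below assumes four 8-bit status bytes per 36-bit word, which is consistent with 80 bytes occupying 20 words; the names are hypothetical.

```python
BYTES_PER_WORD = 4  # assumption: four 8-bit bytes packed per 36-bit word


def sense_words(sense_bytes):
    """Number of words needed to hold the given number of status bytes."""
    return (sense_bytes + BYTES_PER_WORD - 1) // BYTES_PER_WORD
```

Rounding up keeps the calculation correct if the byte count is ever changed to a value that is not a multiple of four.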
[End of TCO 6.1.1175]
TCO-number: 6.1.1176
Written-by: GLINDELL Creation-date: 5-Feb-85 15:17:56
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSA
Related-QAR: 706380
Problem:
Can't delete section zero pages from code in non-zero section.
Diagnosis:
PMAP% code ignores PM%EPN if it's the "delete process page" function.
Solution:
At PMAP0 + a few, if it's the delete option, read the user's value of
PM%EPN before calling FKHPTX.
Note: the documentation for PMAP% should be changed. It currently
states that PM%EPN cannot be used with "delete process pages" (case IV)
in the documentation. As of this TCO, this restriction has been
removed.
[End of TCO 6.1.1176]
TCO-number: 6.1.1177
Written-by: GLINDELL Creation-date: 5-Feb-85 17:11:05
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SYSERR
Related-QAR: 706428
Problem:
The bug description is trashed when looking at bug entries with SPEAR.
Diagnosis:
18-bit arithmetic for 30-bit values.
Solution:
Single instruction patch at SEBCP3 to use a 1-word global bytepointer.
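The class of error named in the diagnosis can be illustrated as follows. This Python sketch uses hypothetical helper names and treats a 30-bit address as a section number in the upper bits and an 18-bit offset in the lower bits.

```python
HALF_MASK = (1 << 18) - 1
ADDR30_MASK = (1 << 30) - 1


def advance_18bit(addr, n):
    """Advance an address using only 18-bit arithmetic (the faulty way)."""
    section = addr & ~HALF_MASK
    return section | ((addr + n) & HALF_MASK)  # carry into the section is lost


def advance_30bit(addr, n):
    """Advance an address using full 30-bit arithmetic."""
    return (addr + n) & ADDR30_MASK


near_boundary = 0o1_777776  # section 1, two words before the section end
```

Advancing past a section boundary with 18-bit arithmetic wraps back into the same section, so the resulting pointer addresses the wrong data.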
[End of TCO 6.1.1177]
TCO-number: 6.1.1180
Written-by: GRANT Creation-date: 7-Feb-85 08:28:52
Edited-by: GRANT Edit-date: 7-Feb-85 08:34:53
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYSIO
Problem: Wrong CI wire information in PDB on disk.
Diagnosis: When a pack gets moved from one drive to another, the PDB is cleared,
the new drive serial number is set along with our node's info. But, a bad
SKIP instruction was preventing the current CI wire status from getting set.
Solution: In routine CLRPDB, change SKIPGE to SKIPL so CALL to PTHSTS occurs.
[End of TCO 6.1.1180]
TCO-number: 6.1.1181
Written-by: GROSSMAN Creation-date: 7-Feb-85 14:31:21
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKNI
Problem:
Support new KLNI microcode version number format. There are now major and
minor version numbers, and an edit number. NISRV will not start the KLNI
if the major and minor version do not exactly match the expected values.
The expected values are in UCVMAJ for the major version number, and UCVMIN
for the minor version number. If the microcode version does not match the
values in UCVMAJ and UCVMIN, a KNIVER BUGxxx will result. The data items
are:
1) Bad major version
2) Bad minor version
3) Expected major version
4) Expected minor version
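The check described above can be sketched in Python as follows. The function name and the tuple return value are hypothetical; the order of the values mirrors the four data items listed, and the edit number is deliberately not compared.

```python
def check_ucode_version(major, minor, expected_major, expected_minor):
    """Return None on an exact match, else the four KNIVER-style data items."""
    if (major, minor) == (expected_major, expected_minor):
        return None
    # Data items in the order listed: bad major, bad minor,
    # expected major, expected minor.
    return (major, minor, expected_major, expected_minor)
```

A mismatch in either number is enough to refuse to start the KLNI.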
[End of TCO 6.1.1181]
TCO-number: 6.1.1183
Written-by: GRANT Creation-date: 8-Feb-85 17:09:17
Edited-by: GRANT Edit-date: 8-Feb-85 17:12:55
Edit-checked: Yes Document: Yes TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYMVR MONSYM
Problem: Booting a system with NOKLIP patched to contain a non-zero value causes
SETSPD to produce the error MSCPX1 (No MSCP server in current monitor) for each
ALLOW command in CONFIG.
Diagnosis: The error code is being returned from the monitor to SETSPD for each
SMON%. The error is misleading since it's the same one you get if you build a
monitor without the MSCP server module PHYMVR.MAC.
Solution: Create a new error code MSCPX4 whose meaning is "MSCP server not
currently running" and have PHYMVR return it instead of MSCPX1 when the server
has not been initialized, which is the state it is in if the system is not
using a CI.
[End of TCO 6.1.1183]
TCO-number: 6.1.1184
Written-by: GRANT Creation-date: 11-Feb-85 08:37:55
Edited-by: GRANT Edit-date: 11-Feb-85 09:36:58
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYSIO phyklp
Problem: TOPS-20 refuses to access a disk when TOPS-10 is run on one of the
KLs in the cluster which once ran TOPS-20.
Diagnosis: TOPS-20 assumed that 1) systems on the CI don't change their node
numbers, 2) there are no VAXes on the CI, and 3) a KL on the CI must be
running TOPS-20.
Solution: Add logic to handle the following cases: 1) node x was once a KL
but is now an HSC or VAX, and 2) TOPS-20 and TOPS-10 can be run
interchangeably on a KL in the cluster.
[End of TCO 6.1.1184]
TCO-number: 6.1.1185
Written-by: GLINDELL Creation-date: 11-Feb-85 11:31:05
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSA
Related-QAR: 838001
Problem:
All accounts that have an /EXPIRATION date set in ACCOUNTS-TABLE.BIN
will get "Account has expired" independent of what the expiration date
was set to.
Diagnosis:
Someone changed CHKEXP to CALL LGTAD instead of doing a GTAD% jsys.
That was a good idea, but unfortunately LGTAD does not preserve T2/B.
Solution:
Save T2 over the call to LGTAD.
[End of TCO 6.1.1185]
TCO-number: 6.1.1187
Written-by: GRANT Creation-date: 12-Feb-85 07:07:25
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYSIO
Problem: Disk forced offline when it should be accessible.
Diagnosis: Bring up a system not on a CI and have a MASSBUS disk dual-ported
to another system. TOPS-20 will refuse access to the disk; this is the
desired action. However, single-porting the disk should cause TOPS-20 to
allow access to the disk but it doesn't due to a bug which didn't get the
forced-offline bit cleared.
Solution: At location UPDBYE, ANDCAM the U1.OFS bit into the status word.
[End of TCO 6.1.1187]
TCO-number: 6.1.1188
Written-by: WAGNER Creation-date: 12-Feb-85 09:56:16
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCHED
Problem: DDMP AND CHKR CHECKED TOO OFTEN IN SKDLV8
Diagnosis: NO NEED TO CHECK EVERY 20 mS
Solution: MOVE THE CHECKING TO CLK2CL, AND CHECK EVERY 10 S.
[End of TCO 6.1.1188]
TCO-number: 6.1.1189
Written-by: GRANT Creation-date: 12-Feb-85 10:20:51
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYSIO
Problem: Logic of code too difficult to follow.
Diagnosis: Routine names are confusing and routines do more than one
logical function.
Solution: Move some code so that CHKPDB, CLRPDB, and RSTPDB actually
check, clear, and reset the PDB, respectively.
[End of TCO 6.1.1189]
TCO-number: 6.1.1190
Written-by: MELOHN Creation-date: 12-Feb-85 14:26:19
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: PLUTOs running LAT V1.1 software can occasionally crash at
PC 000002 when a user disconnects from a TOPS-20 host.
Diagnosis: The PLUTO crashed with an invalid stop message. What
happens is this: when the number of slots on a TOPS-20 host is
decreased to 0, the spec says the circuit should be stopped. Both
TOPS-20 AND the server try to send a stop circuit message to close the
circuit. If the server's message arrives after we have cleared the
circuit database but before the server itself receives the TOPS-20
generated stop message, TOPS-20 generates a second, bogus stop message
which crashes the server.
Solution: Don't send the stop message to the server when the number
of slots becomes zero. As part of a more general fix, a host timer
will be implemented such that if the server doesn't stop the circuit
within the keep-alive-timer * 2, TOPS-20 will send the stop circuit
message itself.
[End of TCO 6.1.1190]
TCO-number: 6.1.1191
Written-by: MELOHN Creation-date: 12-Feb-85 14:37:06
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: Poseidons (DECserver-100) occasionally get -207- protocol
violation messages, which stop the affected slot. Pluto based LAT
users see a "node stopped circuit" message.
Diagnosis: The slot multiplexor routine can be called to send a "must
reply NOW" message to the server. In the case that there is no slot
data to be sent to the server, the slot formatting routine is
bypassed. This does not correctly zero the number of slots in the
message, and therefore a valid message with an invalid number of slots
is sent to the server.
Solution: Zero the number of slots in the main loop of the slot
multiplexor routine, not during slot formatting.
[End of TCO 6.1.1191]
TCO-number: 6.1.1192
Written-by: MELOHN Creation-date: 12-Feb-85 14:57:01
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: Clearing all LAT service names with LCP crashes the monitor
with SKDPF1.
Diagnosis: The multi-cast building routine assumes that there is at
least one service offered in the multi-cast message. If that service
is deleted, the routine attempts to load a byte from a garbage
location.
A LAT host by definition must offer at least one service, so it should
not be possible for the user to clear all offered services.
Solution: Fix the multi-cast building routine to check to make sure
there is at least one service before rebuilding the multi-cast
message. Change the LATOP% jsys to return LATX07 (Invalid or unknown
LAT service name) if the user attempts to clear the last service name
offered.
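The guard on the last service name can be sketched as below. This is a Python illustration rather than the LATOP% code: LatError and clear_service are hypothetical names, and returning LATX07 for an unknown name is inferred from the error text quoted above.

```python
class LatError(Exception):
    """Hypothetical exception carrying a LAT error code."""


def clear_service(services, name):
    """Remove a service name, refusing to remove the last one offered."""
    if name not in services:
        raise LatError("LATX07")
    if len(services) == 1:
        # A LAT host must always offer at least one service.
        raise LatError("LATX07")
    services.remove(name)
```

With this guard in place, the routine that builds the multi-cast message always has at least one service to describe.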
[End of TCO 6.1.1192]
TCO-number: 6.1.1193
Written-by: PALMIERI Creation-date: 12-Feb-85 16:18:17
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: GLOB STG
Problem: DECnet called unnecessarily from LV8CHK before it is initialized.
LV8CHK calls to DECnet are into section 1 when calls to section 6 would
require less code in the DECnet modules and be somewhat faster.
Diagnosis: No code
Solution: Check D36IFG in LV8CHK before calling DECnet. Make calls to
DECnet be of the form: CALL @[XCDSEC,,ADDRESS]
[End of TCO 6.1.1193]
TCO-number: 6.1.1194
Written-by: LOMARTIRE Creation-date: 13-Feb-85 12:06:26
Edited-by: LOMARTIRE Edit-date: 13-Feb-85 14:18:04
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CFSSRV MSTR
Related-TCO: 6.1.1195
Problem: The current error codes returned on a failing structure mount request
are not very descriptive and do not imply what the cause of the failure was.
Diagnosis: CFS will vote in order to gain the requested access to the
structure. If the vote fails, there is no way to determine why we were told
NO.
Solution: Implement a way for a reason code to be passed back when a CFS node
decides to say NO to an incoming vote. Have the voting node interpret this
reason code and transform it into a meaningful TOPS-20 error code. Currently,
only the structure resource handling routines will do this. Below are the
error codes which will be returned. Note that some are new and others are just
new text on an already existing error code.
MSTX44 - Mount type refused by another CFS processor
MSTX45 - Structure naming or drive serial number conflict in CFS cluster
MSTX47 - Shared access denied; already set exclusive in CFS cluster
MSTX48 - Exclusive access denied; access conflict in CFS cluster
MSTX49 - Structure naming conflict in CFS cluster
[End of TCO 6.1.1194]
TCO-number: 6.1.1196
Written-by: LOMARTIRE Creation-date: 13-Feb-85 12:20:29
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCAPAR
Problem: SC.IDL gets called too often from the scheduler.
Diagnosis: The constant used to determine the interval between calls is being
calculated incorrectly.
Solution: Fix the calculation. Instead of the timing being once every 3
milliseconds, it will be once every 160 (decimal) milliseconds.
This will result in SC.IDL being called roughly half as often.
[End of TCO 6.1.1196]
TCO-number: 6.1.1197
Written-by: WAGNER Creation-date: 13-Feb-85 13:45:53
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCHED
Problem: BGND includes time that is actually idle.
Diagnosis: New code tries some background tasks before running the NUL job.
Because of the way the code was written, the time that it spends
doing this is charged to BGND.
Solution: Modify RDSIVL to not accumulate time if flag BKIDLF is set. This
way time is charged to whichever idle is appropriate for the time
that we spend doing background tasks before running the NUL job.
[End of TCO 6.1.1197]
TCO-number: 6.1.1198
Written-by: GRANT Creation-date: 13-Feb-85 14:32:53
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: phyklp
Problem: Over zealous sending of REQUEST-IDs.
Diagnosis: When a virtual circuit is closed TOPS-20 tries madly to
re-establish the connection by sending REQUEST-IDs once a second to the
node which went away. This doesn't seem necessary.
Solution: In the once-a-second checker, remove the code which checks for
closed virtual circuits. Communication will be resumed by more standard
methods when the node reappears.
[End of TCO 6.1.1198]
TCO-number: 6.1.1200
Written-by: LOMARTIRE Creation-date: 14-Feb-85 15:52:01
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CFSSRV
Problem: With the new DSA disks, the HDA can be replaced during timesharing and
the drive serial number of the drive changes. This causes a lot of problems
with CFS since now this structure may be known by another "name" since CFS uses
the serial number as root of one of the structure resources.
Diagnosis: CFS was not coded to handle this case.
Solution: Whenever PHYSIO detects the case of a serial number change, it will
update the UDB of the disk and call CFSDSN in order to update the structure
resource. The old resource block will be unlinked, updated, and relinked.
[End of TCO 6.1.1200]
TCO-number: 6.1.1201
Written-by: GRANT Creation-date: 18-Feb-85 11:09:38
Edited-by: GRANT Edit-date: 18-Feb-85 11:17:32
Edit-checked: Yes Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYSIO
Problem: The UDBs for all HSC-based disks contain the wrong value for the
high-order word of the drive serial number.
Diagnosis: Since non-HSC-based disks only have a 1-word DSN, TOPS-20 makes
up the high-order word to fill in the UDB. It was not correctly making the
distinction between HSC and non-HSC disks and, thus, smashing the high-order
word of the HSC-based disks' UDBs.
Solution: In routine PHYDUA, fix the index register to be P1 instead of P3
so the CDB status word is correctly obtained.
[End of TCO 6.1.1201]
TCO-number: 6.1.1202
Written-by: GROSSMAN Creation-date: 18-Feb-85 22:14:03
Edited-by: GROSSMAN Edit-date: 20-Feb-85 09:39:12
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NIUSR MONSYM
Problem: Implement Read Channel Counters.
Diagnosis:
Solution:
[End of TCO 6.1.1202]
TCO-number: 6.1.1203
Written-by: GROSSMAN Creation-date: 18-Feb-85 22:50:03
Edited-by: GROSSMAN Edit-date: 20-Feb-85 09:48:36
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NIUSR
Problem: Programs doing blocking transmits with the NI% JSYS wake up too
frequently.
Diagnosis: The scheduler test for blocking transmits always succeeds because
an AC was not being set up.
Solution: Setup the AC before using it.
[End of TCO 6.1.1203]
TCO-number: 6.1.1204
Written-by: GROSSMAN Creation-date: 18-Feb-85 23:10:53
Edited-by: GROSSMAN Edit-date: 20-Feb-85 09:57:12
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKNI
Problem: PHYKNI treats KLNI CRAM parity error 7777 as an unplanned CRAM
parity error, when in reality it is planned (ie: intentional).
Diagnosis: PHYKNI treats KLNI CRAM parity errors in the range 7750-7775
(inclusive) as Planned CRAM Parity Errors. Unfortunately, the microcoders
are now using 7777 as a PCPE. So, now the rules are: all parity errors
between 7750 and 7777 (inclusive), but excluding 7776 are Planned, and
must be treated special.
Solution: Follow the rules stated above.
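The rule stated in the diagnosis reduces to a single range test with one exclusion. The Python sketch below uses a hypothetical function name; the octal bounds are taken directly from the text.

```python
def is_planned_cram_parity_error(pc):
    """True for parity-error PCs 7750-7777 (octal), excluding 7776."""
    return 0o7750 <= pc <= 0o7777 and pc != 0o7776
```

Under this rule 7777 is treated as planned, while 7776 remains an unplanned error.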
[End of TCO 6.1.1204]
TCO-number: 6.1.1205
Written-by: GROSSMAN Creation-date: 18-Feb-85 23:34:05
Edited-by: GROSSMAN Edit-date: 20-Feb-85 10:05:49
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKNI
Problem: Code wrong at NIDPT + a few.
Diagnosis: The code was expecting a temp AC to be preserved across a
subroutine call. In reality, the code worked correctly, but it was just
pure luck. Don't depend on luck.
Solution: Use a more permanent AC.
[End of TCO 6.1.1205]
TCO-number: 6.1.1206
Written-by: PALMIERI Creation-date: 19-Feb-85 10:21:45
Edited-by: PALMIERI Edit-date: 19-Feb-85 16:50:13
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DATIME
Problem: IDTIM fails when asked to suppress inputting of date and time
Diagnosis: IDTNCS subroutine called by IDTIM notices that inputting of date
and time are suppressed and attempts to return the current date and time.
After doing the ODCNV it forgets that time input is suppressed and returns
what it thinks it input for time which is garbage. IDTIM then calls IDCNV
to convert the date and time to internal format and notices the garbage
time and returns an error in T1 when IDTIM expects it in T2.
Solution: Return current time when inputting of time is suppressed.
Look for error code in correct AC.
[End of TCO 6.1.1206]
TCO-number: 6.1.1207
Written-by: LOMARTIRE Creation-date: 19-Feb-85 10:21:58
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CFSSRV
Problem: If a structure is mounted by one system but not another,
and it is moved, then no other system will be able to mount the structure
until it is dismounted.
Diagnosis: PHYSIO calls CFS at CFRDSN in order to allow CFS to update the
structure tokens with the new drive serial number. However, it calculates
the new HSHCOD value for the structure name token wrong. So, this system will
always refuse any mount requests for the structure because the HSHCOD values
won't match.
Solution: Correctly calculate HSHCOD.
[End of TCO 6.1.1207]
TCO-number: 6.1.1208
Written-by: HAUDEL Creation-date: 19-Feb-85 11:49:50
Edited-by: HAUDEL Edit-date: 19-Feb-85 16:53:10
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Related-QAR: 706364
Problem: CHECKD's "REBUILD" command fails if the DSKBTTBL file does
not exist.
Diagnosis: The DSKAS% tries to get a JFN on an existing DSKBTTBL
file and if the file is not found, it does not try to
write a new one. Even if the DSKAS% does find an existing DSKBTTBL file,
it does not seem to do anything with the contents.
Solution: If the DSKAS% fails to get a JFN because the file does not
exist, JRST to the code that writes a new DSKBTTBL.
[End of TCO 6.1.1208]
TCO-number: 6.1.1209
Written-by: GROSSMAN Creation-date: 19-Feb-85 11:58:07
Edited-by: GROSSMAN Edit-date: 20-Feb-85 10:11:12
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKNI
Problem: The KLNI sometimes doesn't restart after a continuable error. It
just hangs in the INIT state.
Diagnosis: Sometimes when the KLNI gets an error, there are items left on the
response queue. When the KLNI gets restarted, it never gets an interrupt
to tell it to look at the response queue. This happens because the interrupt
is generated only if a response is put onto an empty response queue.
Solution: Clean up (empty out) the response queue during error processing.
[End of TCO 6.1.1209]
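The hang described in TCO 6.1.1209 can be illustrated with a toy model. This is only a sketch under the assumption stated in the diagnosis (an interrupt is raised only when a response lands on an empty queue); the class and function names are hypothetical and do not correspond to PHYKNI code.

```python
from collections import deque


class ResponseQueue:
    """Toy model: an interrupt fires only when a response is put
    onto an empty queue, as described in the diagnosis."""

    def __init__(self):
        self.items = deque()
        self.interrupts = 0

    def put(self, response):
        if not self.items:          # empty -> non-empty transition
            self.interrupts += 1
        self.items.append(response)

    def flush(self):
        """Empty the queue, as the fix does during error processing."""
        self.items.clear()


def interrupts_after_restart(flush_on_error: bool) -> int:
    """Return how many interrupts the first post-restart response raises."""
    q = ResponseQueue()
    q.put("response-1")             # raises an interrupt (queue was empty)
    q.put("response-2")             # no interrupt (queue already non-empty)
    # ... error occurs here and the host stops draining the queue ...
    if flush_on_error:
        q.flush()
    before = q.interrupts
    q.put("init-done")              # first response after the restart
    return q.interrupts - before
```

Without the flush the call returns 0 (no interrupt, so the port appears stuck in INIT); with the flush it returns 1.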
TCO-number: 6.1.1210
Written-by: GROSSMAN Creation-date: 19-Feb-85 12:19:47
Edited-by: GROSSMAN Edit-date: 20-Feb-85 10:12:36
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NIUSR MONSYM
Problem: Implement Read Portal Counters.
Diagnosis:
Solution:
[End of TCO 6.1.1210]
TCO-number: 6.1.1213
Written-by: PALMIERI Creation-date: 19-Feb-85 17:30:54
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: Ntman
Problem: No entry in line parameter table for TOPS20 specific parameter
RECEIVE BUFFER SIZE (2500)
Diagnosis: Never put in
Solution: Add it
[End of TCO 6.1.1213]
TCO-number: 6.1.1214
Written-by: MELOHN Creation-date: 19-Feb-85 18:54:58
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: TOPS-20 provided host services with "Dynamic" ratings never change.
Diagnosis: Code to set the rating based on the load average was not
being called regularly. The formula 255-INT(15-minute load average) should be
more flexible to provide a better indication of how loaded a system is.
Solution: Rewrite the DYNRAT routine to use the formula
255-INT(4*(15-minute load average)), add a cell to the host node
database to contain the host's current rating, and check to see if
this rating needs to be updated each time the multi-cast message is
sent out.
[End of TCO 6.1.1214]
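The two rating formulas quoted in TCO 6.1.1214 can be compared with a short sketch. The function names are illustrative, and clamping the result at zero is an assumption of this sketch; the TCO does not say how DYNRAT treats very large load averages.

```python
import math

MAX_RATING = 255  # constant taken from the formulas quoted in the TCO


def old_rating(load_avg_15min: float) -> int:
    """Original formula: 255 - INT(15-minute load average)."""
    return MAX_RATING - math.floor(load_avg_15min)


def new_rating(load_avg_15min: float) -> int:
    """Revised formula: 255 - INT(4 * (15-minute load average)).

    The clamp at 0 is an assumption, not something the TCO states.
    """
    return max(0, MAX_RATING - math.floor(4 * load_avg_15min))
```

For a load average of 2.5 the old formula gives 253 and the new one gives 245, so the revised rating moves four times as far for the same change in load.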
TCO-number: 6.1.1215
Written-by: PALMIERI Creation-date: 20-Feb-85 14:21:22
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: STG DNADLL GLOBS
Problem: Too many instructions executed in LV8CHK if DECnet doesn't have
anything to do. D36IFG flag check every time through LV8CHK.
Diagnosis: LV8CHK calls DNADLL every time to see if something to do. LV8CHK
always checks the DECnet initialized flag.
Solution: Have LV8CHK check DNADLL's queue and only call it if something to
do. Remove check of D36IFG since all callees do the right thing if DECnet
is not initialized.
[End of TCO 6.1.1215]
TCO-number: 6.1.1216
Written-by: MELOHN Creation-date: 22-Feb-85 12:08:43
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: Stop circuit messages from the server are not acted upon by
the TOPS-20 Host code.
Diagnosis: Routine HMSTOP checks incoming stop messages using the
Remote ID field when it should be using the Local ID field.
Solution: Use the Local circuit ID field instead.
[End of TCO 6.1.1216]
TCO-number: 6.1.1217
Written-by: GROSSMAN Creation-date: 22-Feb-85 17:09:10
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKNI
Problem: Not setting an NIA20's Ethernet address (via the SETSPD command
ETHERNET), results in the system using a 0 address.
Diagnosis: PHYKNI was getting the Ethernet address from a location that
doesn't get initialized if the SETSPD command is never issued.
Solution: Initialize the address only if we have a valid address to use.
[End of TCO 6.1.1217]
TCO-number: 6.1.1218
Written-by: WAGNER Creation-date: 25-Feb-85 08:30:52
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSA
Related-QAR: 706379
Problem: NIN% only handles radices up to decimal 10
Diagnosis: Code overly restrictive
Solution: Make code able to accept radices up to decimal 36. NOUT% can handle
these, so no changes there. Change MONSYM to report IFIXX1 based on new range.
[End of TCO 6.1.1218]
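As an illustration of what accepting radices up to 36 means in TCO 6.1.1218, the sketch below parses digits 0-9 and A-Z. It is not the NIN% implementation: sign handling and terminator behavior are not modeled, the lower bound of 2 is an assumption, and a Python ValueError stands in for the monitor's error return.

```python
import string

# '0'-'9' followed by 'A'-'Z': 36 digit symbols in total.
DIGITS = string.digits + string.ascii_uppercase


def parse_number(text: str, radix: int) -> int:
    """Convert an unsigned digit string in the given radix to an integer."""
    if not 2 <= radix <= 36:
        raise ValueError("radix out of range")
    if not text:
        raise ValueError("no digits supplied")
    value = 0
    for ch in text.upper():
        digit = DIGITS.find(ch)
        if digit < 0 or digit >= radix:
            raise ValueError(f"{ch!r} is not a digit in radix {radix}")
        value = value * radix + digit
    return value
```

For example, parse_number("777", 8) yields 511 and parse_number("ZZ", 36) yields 1295.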
TCO-number: 6.1.1219
Written-by: WAGNER Creation-date: 25-Feb-85 11:25:28
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSF
Related-QAR: 706383
Problem: User with infinite quota can create inferiors with negative quotas.
Diagnosis: Poor check made, assumption is that normal return from CKLIQ means
infinite quota, but it can also mean negative.
Solution: Make the jump conditional upon quota being non-negative.
[End of TCO 6.1.1219]
TCO-number: 6.1.1222
Written-by: LOMARTIRE Creation-date: 27-Feb-85 10:12:51
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: GTJFN
Related-SPR: 17125
Problem: G1%NLN (no long names) does not work when recognition is used.
Diagnosis: When the field is recognized, there is no check for the length
of the field which is being returned.
Solution: Before the field which was recognized is output, check the length
of the field. If it is too large, return the appropriate error; either
GJFX41 or GJFX42. Note that this is only done for recognition of file names
and extensions (from routines DEFNAM, DEFEXT, RECNAM, RECEXT).
[End of TCO 6.1.1222]
TCO-number: 6.1.1223
Written-by: MELOHN Creation-date: 27-Feb-85 21:10:30
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: If a LAT server crashes, TOPS-20 doesn't find out until the
server comes back and the first person tries to reconnect to the host.
This first person sees the banner followed immediately by "node stopped
circuit", at which time all TTYs on that server are detached and
JOBCOFed. The person has to connect again in order to establish a new
circuit on the host, and the period between the time the server
crashes and the first person attempts to re-connect can be weeks,
during which time all of the users on that server remain connected and
can be charged for TTY connect time.
Diagnosis: A keep-alive timer needs to exist on the host side of the
LAT circuit to detect those times when the server does not or cannot
send its normal keep-alive message. If a reasonable number of server
keep alive intervals has passed without a message from the server, it
is safe to assume that the server has passed away and the circuit
should be stopped.
Solution: Implement such a timer, initially based on 6 times the
server keep-alive timer number of seconds. Stop the circuit if no
messages have been received from the server within that interval.
[End of TCO 6.1.1223]
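The host-side keep-alive check from TCO 6.1.1223 amounts to a single comparison. The sketch below assumes times are in seconds and that the circuit is stopped once strictly more than six keep-alive intervals have elapsed; the function name and the strict comparison are assumptions, not LATSRV code.

```python
KEEPALIVE_MULTIPLIER = 6  # "6 times the server keep-alive timer", per the TCO


def circuit_timed_out(now: float, last_message_time: float,
                      server_keepalive_secs: float) -> bool:
    """True if no message has arrived from the server within the limit."""
    limit = KEEPALIVE_MULTIPLIER * server_keepalive_secs
    return (now - last_message_time) > limit
```

With a 20-second server keep-alive timer, the host would give up on the circuit once more than 120 seconds pass without a message.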
TCO-number: 6.1.1225
Written-by: PAETZOLD Creation-date: 28-Feb-85 09:42:47
Edited-by: PAETZOLD Edit-date: 6-Mar-85 16:08:49
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: IPNIDV
Problem:
ILULK2 BUGHLTS from NISRV and IPIPIP.
Diagnosis:
This is a catch all BUGHLT for problems in the TCP/IP Ethernet buffer
handling stuff.
Caused by a race in the NIPSTO (start output routine).
Solution:
Use NTOB in the NCT as an interlock for NIPSTO so that all other
callers will stay away from this routine. This is OK since the current
possessor of the interlock will queue all linked buffers to NISRV
anyways. Initial code in NIPSTO should be PIOFF instead of NOSKED. Also
reset NBQUE field of buffers in NIPQUE when dequeueing them.
Add IPNDSW debugging code. This code links all send and receive buffers
given to NISRV and BUGHLTs (IPNMIS and IPNHIT) on any discrepancies.
Other causes of ILULK2s during the interim period of these changes were
caused by bugs induced by the debugging code. There are no known ILULK2
problems in the TCP/IP at this time.
[End of TCO 6.1.1225]
TCO-number: 6.1.1227
Written-by: LOMARTIRE Creation-date: 4-Mar-85 07:57:31
Edited-by: LOMARTIRE Edit-date: 12-Mar-85 10:45:17
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC
Related-TCO: 6.1.1247
Problem: CFSVFL BUGHLTs
Diagnosis: The starting of CFS at system startup is very timing dependent
and is sensitive to timing changes elsewhere in the monitor.
Recently, something has changed in the system startup timing that causes the
system not to be fully "joined" to the cluster when we try to mount our PS.
This can result in confusion since we may obtain the wrong access. This will
eventually result in a CFSVFL BUGHLT. One reason we were not fully joined is
that SCA usually is stuck on buffers due to MSCP connections being established
before we join the cluster. A temporary fork is around to help solve this
problem but it is started too late to be of help once we call CFSJYN.
Solution: TEFORK is the temporary fork used at system startup to insure that
the SCA buffers are replenished. Move it after the call to PPDINX and before
the call to IPACHK. In this way, SCA has been initialized and we have a TEFORK
ready to call SC.ALM whenever needed.
[End of TCO 6.1.1227]
TCO-number: 6.1.1228
Written-by: MOSER Creation-date: 5-Mar-85 10:31:50
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: TTYSRV
Problem: Still more ITRLGOs. This one is from TTYSRV when the ACJ refuses to
allow line speed changes.
Solution: ERJMP .+1 after MTOPR in TTCKSP.
[End of TCO 6.1.1228]
TCO-number: 6.1.1229
Written-by: WAGNER Creation-date: 5-Mar-85 10:48:46
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: FORK
Related-QAR: 838045
Problem: Performance Issues
Diagnosis: SETJSB doesn't check to see if it needs to do mapping before doing
it. Mapping is expensive.
Solution: Let's see if we need to spend the effort first.
[End of TCO 6.1.1229]
TCO-number: 6.1.1231
Written-by: PALMIERI Creation-date: 5-Mar-85 13:48:03
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: DTESRV GLOBS STG
Problem: Too many BUGINFs when MCB DTE is initialized or rebooted.
Diagnosis:
Solution: Add DTBUGX which when zero suppresses BUGINFs DTESUI, DTETPR, DN20ST
for MCB DTEs only. Default is non-zero. Place DTEWAT BUGINF under
FTDEBUG.
[End of TCO 6.1.1231]
TCO-number: 6.1.1232
Written-by: HAUDEL Creation-date: 5-Mar-85 14:32:35
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCAPAR MONSYM
Problem: SCA's connection management symbols are not available to
the users of the SCS%.
Diagnosis: The connection management symbols are defined in SCAPAR and
not in MONSYM.
Solution: Define new symbols in MONSYM and have the symbols in
SCAPAR point to those in MONSYM.
[End of TCO 6.1.1232]
TCO-number: 6.1.1233
Written-by: GLINDELL Creation-date: 5-Mar-85 15:01:44
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: nrtsrv cthsrv
Related-QAR: 838026
Problem:
NRT and CTERM do not do the right thing when
^ESET NO LOGINS DECNET-TERMINAL is set. NRT tests
for 'remote terminals' instead of for 'decnet terminals'
while CTERM doesn't test anything at all. Also, NRT
returns the wrong reason code 'node shutting down' instead
of 'access not permitted'.
Diagnosis:
Solution:
Test SF%MCB for both NRT and CTERM. Use disconnect reason
code RSNACR (access not permitted).
[End of TCO 6.1.1233]
TCO-number: 6.1.1234
Written-by: GLINDELL Creation-date: 5-Mar-85 15:37:39
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSA
Related-QAR: 838052
Problem:
Any user can get any password on a system that is not using password
encryption. This bug is also in all 4.1 and 5.1 systems. Also, this
can be done in a finite time, more exactly proportional to 128 times
the number of characters in the password.
Diagnosis:
Check password code in JSYSA is not defensive enough. Monitor uses
one address when reading bytes that match the correct password, another
address when reading bytes that did not match. Clever usage of address
break will use this fact.
Solution:
Use the one and same address whether good or bad bytes are read.
The address break will not reveal anything that way. Code affected
is at CHKPS3 and CHKPS5.
[End of TCO 6.1.1234]
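The "128 times the number of characters" figure in TCO 6.1.1234 follows from being able to test one character position at a time. The sketch below models this with a hypothetical oracle that reports how many leading characters of a guess matched; it stands in for the information leak described in the diagnosis and does not reproduce the monitor code or the address-break mechanism.

```python
ALPHABET_SIZE = 128  # 7-bit character codes, matching the figure in the TCO


def make_oracle(secret: str):
    """Hypothetical leak: returns the length of the matching prefix."""
    def matched_prefix_len(guess: str) -> int:
        n = 0
        for g, s in zip(guess, secret):
            if g != s:
                break
            n += 1
        return n
    return matched_prefix_len


def recover(oracle, length: int):
    """Recover a secret one position at a time; return (secret, probes)."""
    known = ""
    probes = 0
    for pos in range(length):
        for code in range(ALPHABET_SIZE):
            probes += 1
            candidate = known + chr(code)
            if oracle(candidate) > pos:   # this position now matches
                known = candidate
                break
    return known, probes
```

With such a leak a 6-character password needs at most 128 * 6 = 768 probes, against 128 ** 6 for an exhaustive search. Reading good and bad bytes through the same address, as the solution does, removes the per-position signal.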
TCO-number: 6.1.1236
Written-by: PALMIERI Creation-date: 6-Mar-85 10:19:10
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: SCJSYS D36PAR
Problem: Random BUGHLTs when privileged user continually opens the maximum
number of logical links
Diagnosis: Port database in SCJSYS is built to accommodate maximum number
logical links and is indexed by link number. However link numbers
are assigned by the lower layer (SCLINK) when the logical link
is opened and numbers greater than maximum logical link value
may be assigned. This occurs because SCLINK does not immediately
release an SLB for a logical link but instead puts it on a
queue to be released later. The user is told that the link is
closed and may then open another. If that happens before the
SLB for the previous link is released the link number cannot be
reused. SCLINK expands its database and can then assign a link
number that exceeds maximum links. This number is given to SCJSYS.
This causes SCJSYS to index off the end of its port database and
trash memory it does not own.
Solution: Add a table of pointers to the port database indexed by port number.
Make this table maximum links times 2 for unprivileged users and
maximum links plus 10 for privileged users. If the link number
assigned exceeds the size of the indirect table close the link
and return MONX07 error to user. The indirect table facilitates
not having to build the port database until the port is opened.
[End of TCO 6.1.1236]
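The bounds-checked indirect table in TCO 6.1.1236 can be sketched as follows. The table sizes are the ones given in the solution; everything else (the class, the use of a string for the error, building an entry on first open) is a hypothetical simplification rather than the SCJSYS data layout.

```python
class PortTable:
    """Indirect table of pointers to port entries, filled in lazily."""

    def __init__(self, max_links: int, privileged: bool):
        # Sizes as stated in the TCO.
        self.size = max_links + 10 if privileged else max_links * 2
        self.slots = [None] * self.size

    def open_port(self, link_number: int):
        """Return the port entry, or "MONX07" if the number is off the table."""
        if not 0 <= link_number < self.size:
            return "MONX07"          # caller is expected to close the link
        if self.slots[link_number] is None:
            # The port entry is only built when the port is opened.
            self.slots[link_number] = {"link": link_number}
        return self.slots[link_number]
```

The point of the check is that a link number assigned by the lower layer can never index past the end of the table, whatever value it has.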
TCO-number: 6.1.1238
Written-by: MCCOLLUM Creation-date: 6-Mar-85 14:54:28
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MANY
Problem:
There are nearly 400 undocumented BUGxxx's in the monitor as well as a few
hundred improperly documented BUGxxx's.
Diagnosis:
Same.
Solution:
Document all the BUGxxx's that are not under DEBUG=1 and fix all the improperly
documented BUGxxx's so the documentation people can distribute BUGS.MAC.
[End of TCO 6.1.1238]
TCO-number: 6.1.1239
Written-by: PAETZOLD Creation-date: 6-Mar-85 16:11:00
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: monitor
Routines-affected: STG
Problem:
Should be able to turn off the IPNIN and IPCIN code even though it is not
a supported configuration. This is possible in 5.4.
Diagnosis:
oversight.
Solution:
Key the IMPANX and IMPDV loadmodules off of ANXN under control of NETN.
Also create a dummy for IMPCHK when ANXN is off and NETN is on.
[End of TCO 6.1.1239]
TCO-number: 6.1.1241
Written-by: GROSSMAN Creation-date: 7-Mar-85 16:09:39
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LLMOP
Problem: COMMMS BUGHLTs when running DFNIS (sometimes called RMTCON).
Diagnosis: ^C'ing out of DFNIS at the wrong moment leaves some buffers lying
around. When the fork gets reset, these buffers get returned to the memory
manager. Unfortunately, NISRV still has pointers to the buffers, and
eventually returns them to LLMOP. LLMOP then tries to return them to the
memory manager again, resulting in a COMMMS BUGHLT.
Solution: Create an "ABORT" bit for requests. When LLMOP tries to clean up
the request buffers, it first sees if the request has completed. If it has,
it returns the buffer immediately, otherwise, it just sets the "ABORT" bit.
Eventually, NISRV returns the buffer. If the abort bit is on, the buffer
is just released with no further ado.
[End of TCO 6.1.1241]
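The "ABORT" bit scheme in TCO 6.1.1241 guarantees that a buffer is released exactly once regardless of which path runs first. The sketch below counts releases to show this; the names are hypothetical and the real LLMOP request block is not modeled.

```python
class Request:
    def __init__(self):
        self.completed = False  # NISRV has handed the buffer back
        self.aborted = False    # the "ABORT" bit
        self.releases = 0       # times the buffer went to the memory manager


def cleanup(req: Request) -> None:
    """Cleanup path, run when the fork is reset."""
    if req.completed:
        req.releases += 1       # safe: nobody else still holds the buffer
    else:
        req.aborted = True      # defer the release to the completion path


def buffer_returned(req: Request) -> None:
    """Completion path, run when NISRV returns the buffer."""
    req.completed = True
    if req.aborted:
        req.releases += 1       # requester is gone; just release the buffer
```

Whether cleanup runs before or after the buffer comes back, the release count ends at one, which is the property the COMMMS BUGHLT was detecting a violation of.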
TCO-number: 6.1.1242
Written-by: PRATT Creation-date: 7-Mar-85 19:13:44
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: IPCF
Problem:
1) Must be whopr to use the QUEUE jsys.
2) IP%MON is not being cleared when it should which would
allow a user to set the "sent by monitor" bit.
Diagnosis:
1) The VALARG routine was checking 3 fields including
the field IP%CFC for a non-zero value when the user
wasn't priv'd. The new .IPCCG value (sent on behalf
of the QUEUE jsys) doesn't need privs. All the old values do.
2) Typo in the code
Solution:
1) Check IP%CFC for .IPCCG after checking the status of
IP%CFP, and IP%CFM.
2) Change the T1 to a P1
[End of TCO 6.1.1242]
TCO-number: 6.1.1244
Written-by: GLINDELL Creation-date: 7-Mar-85 22:19:11
Edited-by: GLINDELL Edit-date: 7-Mar-85 22:22:26
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: POSTLD
Related-QAR: 838076
Problem:
Illegal memory read in POSTLD if the monitor PDV is missing.
This should only happen if a bad .CCL file is used to link the
monitor.
Diagnosis:
POSTLD needs to find the monitor PDV in order to locate the symbol
table. The .POLOC function of PDVOP% is used. If there is no name
with the requested name, then the PDVOP% will return with a 0 PDVA
and a 0 count of items returned. However, when I wrote the code
I thought the PDVOP% would generate an error if no matching PDV
was found. Since this is not the case, POSTLD will pick up a 0
for the PDVA and after that it's all downhill.
Solution:
Check for 0 items returned after the PDVOP%, and if no PDV is found,
issue an appropriate error message and abort POSTLD.
Also, add some information on how to debug POSTLD in a comment at the
top of the module.
[End of TCO 6.1.1244]
TCO-number: 6.1.1245
Written-by: GRANT Creation-date: 11-Mar-85 07:46:45
Edited-by: GRANT Edit-date: 11-Mar-85 10:00:03
Edit-checked: Yes Document: Yes TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: cfssrv MEXEC phymvr globs PROLOG STG
Problem: Jobs get hung when another system in the cluster is shut down for PM.
Diagnosis: The MSCP server is invisible to the operator on the system going
down and to users on the other systems who have structures mounted through the
MSCP server. The software provides no help in this situation.
Solution: When a system is going to cease timesharing, check to see if it has
any disks onlined by other systems through its MSCP server. If so, warn the
operator that the other systems must be checked for possible structure
dismounting instructions. On the other systems, check for any structures
mounted through the MSCP server of the system going down. If there are any,
warn the operator about the other system's pending shutdown and list the
structures that should be dismounted.
[End of TCO 6.1.1245]
TCO-number: 6.1.1247
Written-by: LOMARTIRE Creation-date: 12-Mar-85 10:45:16
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CFSSRV MEXEC
Related-TCO: 6.1.1227
Problem: CFSVFL BUGHLTs at system startup.
Diagnosis: TCO 6.1.1227 tried to solve this problem but only fixed part of
the problem. There still must be a more deterministic way to insure that
we have joined with all existing TOPS-20 systems.
Solution: Add code to routine CFSJYN which will make this routine return
only once we are sure that we have completely joined the cluster. We will
wait until we have a CFS connection to every TOPS-20 system to which we have
at least one open path. The extra DISMS in MEXEC after the CALL CFSJYN is
now no longer needed.
[End of TCO 6.1.1247]
TCO-number: 6.1.1250
Written-by: MELOHN Creation-date: 12-Mar-85 18:46:38
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: When bias knob 20 is set, LAT ends up retransmitting up to
40% of all of its messages. This means more work for LATSRV, the
server, and more unnecessarily retransmitted messages on the Ethernet.
Also, LCP displays the circuit retransmit timer as if it were in seconds,
when it is actually in scheduler cycles. Quite a difference.
Diagnosis: The LAT host retransmit timer was designed based on the
assumption that a scheduler cycle happens every 20ms. This is not a
valid assumption when the bias knob is set or when the system is very
idle and the LAT scheduler routine is run as part of the idle loop.
Solution: Change the retransmit timer to be based on TODCLK, like the
keep alive timer. Change the LATOP% jsys to expect the retransmit
timer value in milliseconds, just like the various circuit timers in
the server.
[End of TCO 6.1.1250]
TCO-number: 6.1.1251
Written-by: MELOHN Creation-date: 12-Mar-85 18:51:56
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: Various LATOP% parameters default to bogus values. The Host
number which is supposed to be settable via the NODE command in SETSPD
isn't.
Diagnosis: The defaults grew out of the values used in standalone. No
code existed to set the host number.
Solution: Make the default values and ranges more realistic. Read the
host number from RTRADR at LATINI time.
[End of TCO 6.1.1251]
TCO-number: 6.1.1252
Written-by: MELOHN Creation-date: 12-Mar-85 18:58:57
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: Users report being "dropped" by the LATBOX but there is no
evidence that anything was wrong, other than a stopped session at the
user's terminal.
Diagnosis: When an illegal message type is received, a LATIMT BUGCHK
is recorded and the circuit terminated. If the first part of the
message is readable, but the slot data in the message is garbaged, the
circuit is stopped, but no BUGCHK issued. LATIMT should probably be a
BUGINF anyway.
Solution: Make the LATIMT a BUGINF. Add a new BUGINF LATIST which
will print out when a user is dropped due to an illegal slot within a
seemingly legal message.
[End of TCO 6.1.1252]
TCO-number: 6.1.1253
Written-by: MCCOLLUM Creation-date: 13-Mar-85 14:27:38
Edited-by: MCCOLLUM Edit-date: 13-Mar-85 14:41:07
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DISC
Related-QAR: 838064
Problem:
GTFDB3 crashes when renaming files.
Diagnosis:
When renaming a file to itself, RNAMF% should return a RNMX10 (Source file
is not closed) error. There is a coding error in this path that prevents
RNAMF% from performing the check that determines if the file is indeed
open. It wrongly assumes that the file is not open and proceeds to trash
the FDB of the already existent file. A last second sanity check call to
GETFDB finds this and causes the crash.
Solution:
Fix the coding error and allow RNAMF% to return RNMX10.
[End of TCO 6.1.1253]
TCO-number: 6.1.1255
Written-by: PALMIERI Creation-date: 13-Mar-85 15:38:12
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: SCJSYS
Related-QAR: 838043
Problem: Previous DECnet monitors allow a program to do an implied connect
accept by doing a SOUT/SOUTR after the OPENF on a DECnet link.
6.1 does not.
Diagnosis: No code
Solution: Add code to wait for connect and accept it if SOUT/SOUTR is done
and the link is in connect wait/connect received state.
[End of TCO 6.1.1255]
TCO-number: 6.1.1256
Written-by: PAETZOLD Creation-date: 13-Mar-85 16:37:40
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: Yes
Program: monitor
Routines-affected: STG
Problem:
Channel detected write parity errors from the last quadword of memory.
Diagnosis:
Hardware design problem in the KL10.
Solution:
Add conditional assembly in STG to force NMAXPG down by one page if MAXCOR
is set for 4.0 Meg.
[End of TCO 6.1.1256]
TCO-number: 6.1.1258
Written-by: PALMIERI Creation-date: 13-Mar-85 16:51:31
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: D36COM SCLINK
Related-QAR: 838082
Problem: 36 bit byte mode does not work from 6.1 to 5.1 or 6.1 to 6.1.
Diagnosis: If segment size of message falls somewhere in a full word other
than at the end, the rest of the bytes of the word may not be sent.
Solution: Send a maximum of segment size modulo 9 bytes in a message. Send
any remaining bytes in the following message.
[End of TCO 6.1.1258]
TCO-number: 6.1.1260
Written-by: PALMIERI Creation-date: 13-Mar-85 17:03:39
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: ROUTER
Problem: Loop node does not work when ROUTER is an endnode.
Diagnosis: The routing algorithm chooses the destination as the next hop for
the NI but since we are always the destination when
performing loop node we fail to get the message back since
the NI cannot receive its own transmitted messages.
Solution: If message is destined for ourselves send it to the designated
router if there is one. If not send it to ourselves and never get
it.
[End of TCO 6.1.1260]
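The next-hop rule from TCO 6.1.1260 can be written as a small decision function. This is a sketch of the endnode case only; the function and parameter names are hypothetical, and None is used here to mean that no designated router is known.

```python
def next_hop(destination, own_address, designated_router=None):
    """Choose the next hop on the NI for an endnode.

    Normally the destination itself is the next hop. A message addressed
    to ourselves goes to the designated router instead, if there is one,
    because the NI cannot receive its own transmissions.
    """
    if destination == own_address and designated_router is not None:
        return designated_router
    return destination
```

With no designated router, a message to ourselves is still sent to our own address and, as the solution notes, is never received.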
TCO-number: 6.1.1263
Written-by: WAGNER Creation-date: 14-Mar-85 13:46:34
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MONSYM
Related-SPR: 20644
Problem: DOCUMENTATION FOR .SKRJP FUNCTION OF SKED% IS WRONG, AND MISSING SOME
SYMBOLS
Diagnosis: SEE ABOVE
Solution: CORRECT THE DOCUMENTATION, AND IMPLEMENT 2 NEW SYMBOLS: .SACSH, AND
.SACLU
[End of TCO 6.1.1263]
TCO-number: 6.1.1264
Written-by: MELOHN Creation-date: 14-Mar-85 15:35:39
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: The third service offered by a TOPS-20 host to a PLATO does
not appear on the PLATO list of available services.
Diagnosis: The Multi-cast change flags were not being set when a new
service was added. Apparently VMS doesn't set them either, but the
PLATO allows the second service to be added without setting the change
flags so as not to upset VMS while the third or more service must have
the change flags set. *yuk*
Solution: Set the change flags whenever we add a new service, up to
the maximum number of services offered.
[End of TCO 6.1.1264]
TCO-number: 6.1.1265
Written-by: MELOHN Creation-date: 14-Mar-85 15:52:31
Edited-by: PRATT Edit-date: 14-Mar-85 16:33:03
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: STG
Related-TCO: 6.1.1268
Related-QAR: 838073
Problem: Need more customer defined terminal types.
Diagnosis: If we reserve a new block for DEC as well, customers who
need larger amounts of terminals than we can build in can add them
at the end without worrying about DEC coming along some day and using
the slots for our new terminal types.
Solution: Add more - 10 reserved for customers, 10 for DEC.
[End of TCO 6.1.1265]
TCO-number: 6.1.1267
Written-by: MELOHN Creation-date: 14-Mar-85 16:31:12
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: TTYSRV TTYDEF NRTSRV
Related-QAR: 838078
Problem: System halted with COMMMS. There were 13 SCLSBJ and 2
NVTPCL BUGCHKs queued on SEBQOU.
A user on terminal 31 was set hosted (.MOSNH had been issued). TTNUS
was set in the terminal's dynamic data but the contents of TTULL was
101 (octal) and not the address of a NRB. This discrepancy caused the
BUGHLT and BUGCHKs when MCSRV tried to service the line.
Diagnosis:
After careful scrutiny we noticed that TTLMAX and TTULL share the same
word in the dynamic data. TTLMAX is the maximum of TTLINE, the line
counter. TTULL is the address of the NET USER logical link. The
comment lines in TTYDEF tell us the assumption is TTLMAX will not be
in use when SET HOST is in effect. Unfortunately there is no code in
TTYSRV to support this assumption.
The value in TTULL at the time of the crash was the same as the value
in TTLINE, indicating that routine INCLIN(TTYSRV) had recently
deposited the value. INCLIN is called for example by the BOUT% JSYS
(TCO->TCOY->TTCO1 (via CHITAB dispatch)) when a ^J is being output to
the terminal.
Another way for TTLMAX to get set (and TTULL to get zapped) is by the
MTOPR function .MOSLM which directly stores into TTLMAX.
Since the SET HOST (.MOSNH) function does not freeze any processes in
the job, or prevent non job output, it is possible for the value in
TTULL to get lost.
The careful reader would have immediately noticed that under versions
6.0 and 5.1 TTULL was also overlaid on the TTLMAX word. Correct, but
now the field for the escape character (TTUEC) is 774000,,0 whereas
under previous versions it was 177,,0. This change has been made to
accommodate a 29 (?) bit address for the field TTULL, which is
allocated out of extended resident free space.
This means that under versions 6.0 and before the test at INCLIN would
invariably prevent the store of a new value in TTLMAX. However, with
the new position of TTUEC a user supplying an escape character of "@"
or greater causes the value in TTLMAX to be negative! This negative
value of TTUEC and TTULL then gets overwritten by the code at INCLIN.
Of course our user was an extremist, he supplied tilde (code 176) for
his escape character.
Solution:
Forget about saving a word in the dynamic data, nobody will notice
because the system will be crashing. Instead increase the dynamic
data and separate TTLMAX and TTULL.
Change the lines in TTYDEF (changes are in lowercase):
TTLMAX==34 ;MAXIMUM OF TTLINE
DEFSTR TTULL,TTLMAX,35,29 ;NET USER LOGICAL LINK
; - WHEN TTLMAX NOT IN USE
DEFSTR TTUEC,TTLMAX,6,7 ;NET USER ESCAPE CHAR
:
:
TTDDLN==37 ;DEFAULT DYNAMIC DATA SIZE
to be:
TTLMAX==34 ;MAXIMUM OF TTLINE
:
:
ttlnuw==37 ; net user word
defstr ttull,ttlnuw,35,29 ; net user logical link
defstr ttuec,ttlnuw,6,7 ; net user escape char
:
:
ttddln==40 ; default dynamic data size
[End of TCO 6.1.1267]
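The sign problem analysed in TCO 6.1.1267 can be checked with a few lines of bit arithmetic. The sketch assumes PDP-10 bit numbering (bit 0 is the leftmost bit of a 36-bit word) and that the two numeric DEFSTR arguments are the rightmost bit position and the field width. The old TTUEC position (rightmost bit 17) is inferred from the 177,,0 mask quoted in the diagnosis. The helper names are hypothetical.

```python
WORD_BITS = 36


def field_mask(pos: int, size: int) -> int:
    """Mask for a field whose rightmost bit is `pos` (bit 0 = leftmost)."""
    return ((1 << size) - 1) << (WORD_BITS - 1 - pos)


def store_field(word: int, pos: int, size: int, value: int) -> int:
    """Return `word` with `value` stored in the given field."""
    shift = WORD_BITS - 1 - pos
    mask = field_mask(pos, size)
    return (word & ~mask) | ((value << shift) & mask)


def is_negative(word: int) -> bool:
    """A 36-bit word is negative when bit 0 (the sign bit) is set."""
    return bool((word >> (WORD_BITS - 1)) & 1)


NEW_TTUEC = (6, 7)    # new layout: bits 0-6, mask 774000,,0
OLD_TTUEC = (17, 7)   # older layout: bits 11-17, mask 177,,0
TTULL = (35, 29)      # bits 7-35
```

Storing "@" (octal 100) or anything above it in the new TTUEC field sets bit 0, so the shared word reads as negative; in the older layout the escape character never touched the sign bit.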
TCO-number: 6.1.1271
Written-by: PRATT Creation-date: 14-Mar-85 23:00:19
Edited-by: PRATT Edit-date: 14-Mar-85 23:13:22
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSA
Problem:
Any user supplying bad length arguments to the QUEUE% jsys can
cause a ILLUUO bughalt.
Diagnosis:
The Queue jsys code was trying to verify the user's arguments and
found an error. It took the error return which tried to clean up
by releasing free space used for building an IPCF packet.
Unfortunately we hadn't gotten the free space yet which happens
later on in the code. The QUMSG location had garbage in it and
when we tried to release that, we die a horrible death.
Solution:
In QUVERF, change the two error conditions before the call to ASGPGS
to generate an illegal instruction trap. After the call, use RETBAD
so we can return and clean up the free space used.
[End of TCO 6.1.1271]
TCO-number: 6.1.1272
Written-by: GRANT Creation-date: 15-Mar-85 08:18:30
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYSIO
Problem: Impossible to figure out the algorithm for homeblock checking.
Diagnosis: Routine UPDPDB is too hard to read.
Solution: Restructure the code. No logic changes intended.
[End of TCO 6.1.1272]
TCO-number: 6.1.1273
Written-by: GRANT Creation-date: 18-Mar-85 10:05:38
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: phyklp
Problem: Occasional problems with CI wires going from good to bad and from
bad to good.
Diagnosis: There is a problem in the KLIPA (to be ECOed) which is
aggravated by frequent loopback. TOPS-20 sends 2 loopbacks per second.
Solution: Change TOPS-20 to send only 1 loopback per second; this should
not diminish the monitor's ability to detect real problems.
[End of TCO 6.1.1273]
TCO-number: 6.1.1275
Written-by: GROSSMAN Creation-date: 18-Mar-85 15:12:10
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LLINKS
Problem: Invalid DECnet event 3.0 from LLINKS.
Diagnosis: An NSP Disconnect Confirm message with a bad reason code was
being handled incorrectly.
Solution: Make LLINKS follow the NSP version 4.0 spec in this regard.
Note that if VMS followed the spec in this regard, this problem would not
have occurred.
[End of TCO 6.1.1275]
TCO-number: 6.1.1276
Written-by: PALMIERI Creation-date: 18-Mar-85 16:56:29
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: GTJFN
Problem: Wildcard for nodename not permitted for network parse-only JFN.
Diagnosis: Code is too restrictive.
Solution: Remove restriction and allow wildcard in nodename.
[End of TCO 6.1.1276]
TCO-number: 6.1.1277
Written-by: MELOHN Creation-date: 19-Mar-85 17:30:33
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: TTYSRV
Problem: Users on VT2xx terminals in VT2xx, 7-bit mode see garbage
when hosting via NRT to a remote 20.
Diagnosis: Parity checking is enabled for NRT (and all other line
types). VT200 in 7-bit control mode use the 8th bit not for parity but
for 8-bit characters.
Solution: Parity is for farmers, not for NRT, CTERM, or LAT lines.
Remove the TRZ(TRO) to clear/set parity in PARTBL:.
[End of TCO 6.1.1277]
TCO-number: 6.1.1278
Written-by: GROSSMAN Creation-date: 20-Mar-85 09:40:01
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NIUSR
Problem: The Read Portal Counters function of the NI% JSYS does not convert
global job numbers to local indexes.
Diagnosis: Oops!
Solution: Add a call to GL2LCL to the portal locating routine.
[End of TCO 6.1.1278]
TCO-number: 6.1.1279
Written-by: LOMARTIRE Creation-date: 20-Mar-85 10:48:07
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: APRSRV GTJFN
Related-TCO: 6.1.1127
Problem: ILLUUO BUGHLTs.
Diagnosis: TCO 6.1.1127 attempted to solve the cause of ILLUUOs from passing a
bad byte pointer to GTJFN. The changes were made to KIMXLP under a false
assumption.
Solution: Remove the code added to KIMXLP by TCO 6.1.1127. Instead, rewrite
GTJFN so that it does an XCT Q1 not an XCT -2(P) since P is not preserved when
we are in KIMXLP.
[End of TCO 6.1.1279]
TCO-number: 6.1.1280
Written-by: MELOHN Creation-date: 20-Mar-85 14:55:44
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: LCP show server Quack does not display the location string
of server Quack.
Diagnosis: Recent additions to the LAT circuit database were made at
the end of the CB BEGSTR. Unfortunately the LATOP% jsys BLTs the last
half of the CB to user context as the return for the get server
information function.
Solution: Move the new cells to a more appropriate place in the CB.
Add the warning to the CB that the data structures in the last half of
the BEGSTR are returned as part of the jsys, and should not be changed
without corresponding changes to the LATOP code.
[End of TCO 6.1.1280]
TCO-number: 6.1.1281
Written-by: MOSER Creation-date: 21-Mar-85 11:15:03
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC
Problem: When a job is logged out because of a carrier off action the wishes
of the ACJ are ignored. If the ACJ refuses logout the job is logged out anyway!
Diagnosis: Coded that way. This is inconsistent with regular logout
using the LGOUT Jsys where the ACJ's wish is honored.
Solution: If the ACJ denies the request to logout wait 1 minute and ask it
again.
[End of TCO 6.1.1281]
TCO-number: 6.1.1282
Written-by: LOMARTIRE Creation-date: 21-Mar-85 16:13:30
Edited-by: LOMARTIRE Edit-date: 17-Apr-85 14:27:20
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGUTL STG GLOBS
Related-TCO: 6.1.1328
Problem: PITRAP BUGHLTs
Diagnosis: During the lookup of a resource block in HSHLOK, one of the
pages which is referenced is no longer in core. Since CFS locks all its pages
down, and never unlocks them, someone else is unlocking the page. We need a
diagnostic patch which will catch the culprit.
Solution: In the routine MULK1, check to see if the page to be unlocked is
a page from CFSSEC. If so, die with an ILULK5 BUGHLT. This feature is
controlled by the location CFUNLF. When zero, this checking will be done.
When non-zero, it will not. CFUNLF is zero by default.
[End of TCO 6.1.1282]
TCO-number: 6.1.1283
Written-by: GROSSMAN Creation-date: 21-Mar-85 23:18:52
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKNI
Problem: KNISTP BUGCHK's
Diagnosis: Keep alive timer for KLNI was implemented wrong. Every five
seconds, KLNI is checked for activity within the last 10 seconds. If
there has been no activity in the last 10 seconds, NISRV gives the KLNI
a command. This means that on an idle system, the KLNI is processing
a command every 10 seconds. If the keep alive routine in NISRV determines
that it hasn't heard from the KLNI in 15 seconds, it generates a KNISTP
BUGCHK, and reloads the KLNI.
Due to variability in the second level clocks from the scheduler, the
KLNI keep alive routine may be called in such a manner that it won't queue up
a command to the KLNI for about 15 seconds. This results in a KNISTP BUGCHK.
Solution: Halve the minimum activity interval. Ie: if the KLNI has done nothing
in the last 5 seconds, give it a command. This allows a large margin of
error for level 2 clock drift.
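A simplified Python model of the timing follows (an assumption-laden sketch, not the NISRV code). It treats the KLNI as otherwise idle, issues a command at any check where the idle time has reached the threshold, and reports the longest gap between commands. The jittered check times are invented for illustration.

```python
WATCHDOG_SECONDS = 15.0  # KNISTP is declared after this much silence


def longest_idle_gap(check_times, idle_threshold):
    """Longest interval between commands given to an otherwise idle device.

    check_times: moments (seconds) at which the keep-alive routine runs.
    A command is issued at a check when the device has been idle for at
    least idle_threshold seconds.
    """
    last_activity = 0.0
    longest = 0.0
    for now in check_times:
        idle = now - last_activity
        longest = max(longest, idle)
        if idle >= idle_threshold:
            last_activity = now  # the command counts as activity
    return longest


# Nominal 5-second checks, with one call slightly early and the next slightly late.
jittered_checks = [5.0, 9.9, 15.1, 20.0, 25.0]
```

With a 10-second threshold the early check at 9.9 seconds issues nothing, so the gap reaches the watchdog limit; with a 5-second threshold the same schedule stays well under it.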
[End of TCO 6.1.1283]
TCO-number: 6.1.1284
Written-by: GRANT Creation-date: 22-Mar-85 06:45:40
Edited-by: GRANT Edit-date: 22-Mar-85 07:02:04
Edit-checked: No Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC scampi phyklp PROLOG globs STG
Related-QAR: 838136 838142
Problem: SCAEBD BUGHLT (Error while doing Buffer Deferral)
Diagnosis: Two systems are simultaneously running CHECKD at system startup.
Facts to remember: 1) MSCP server doesn't open a listener until after CHECKD
finishes, 2) MSCP continuously tries to connect to another 20's MSCP server
once the virtual circuit is open. These are both the desired actions. However,
during the time MSCP's connection attempts fail, many SCA connect blocks are
marked for reaping but the fork which does the reaping doesn't get started
until after CHECKD finishes! This deadly embrace causes SCA to use up the
entire section's worth of buffers for the connect attempts, thus producing the
SCABSF (Buffer Section Full) BUGCHK and finally the SCAEBD BUGHLT.
Solution: What was once TEFORK (temporary fork) will now live on through the
life of the system. It gets started immediately after starting the CI20 and
will handle SCA buffer creation, SCA connect block reaping, and (while we're
at it) loading/dumping of the CI20. The fork will be called CIFORK and usually
will be sitting in the scheduler test CITEST waiting for any of the bits in the
flag word CIFRKF to get set. There are bits for each of the 3 actions CIFORK
performs.
Comments: This eliminates SCA's use of DDMP to do SCA buffer creation. Also,
moving loading/dumping of the CI20 away from Job 0 may eliminate some of the
KLPNRL BUGHLTs you get due to the deadly embraces that occur, but it is not
a 100% cure; that is planned for Release 7.0.
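The flag-word arrangement might look like the following Python sketch. The bit assignments and function names are hypothetical; only the idea of one request bit per action, with the fork runnable when any bit is set, comes from the text above.

```python
# Hypothetical bit assignments within the flag word CIFRKF.
CF_BUFFERS = 0o1    # create SCA buffers
CF_REAP = 0o2       # reap SCA connect blocks
CF_LOAD_DUMP = 0o4  # load or dump the CI20


def citest(flags):
    """Scheduler test: the fork is runnable when any request bit is set."""
    return flags != 0


def cifork_pass(flags):
    """Service every pending request once; return (actions, remaining flags)."""
    actions = []
    for bit, name in ((CF_BUFFERS, "buffers"),
                      (CF_REAP, "reap"),
                      (CF_LOAD_DUMP, "load/dump")):
        if flags & bit:
            actions.append(name)
            flags &= ~bit  # clear the request once it has been handled
    return actions, flags
```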
[End of TCO 6.1.1284]
TCO-number: 6.1.1285
Written-by: GROSSMAN Creation-date: 25-Mar-85 10:50:56
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NISRV PHYKNI
Problem: Spurious KLNI Reload Timeouts, resulting in KNIRTO BUGCHK's. Sometimes
KLNI is not started because of this.
Diagnosis: Reload timeouts being performed wrong. Timing starts whenever the
reload request is made. During system startup, the reload request happens
almost immediately, KNILDR doesn't run until CHKR gets around to it. If the
startup procedures take too long, KNIRTO's result. Usually, they are spurious
and have no effect on the KLNI, but sometimes, they cause the KLNI to be
shut down.
Solution: Get rid of KNIRTO. Don't time out KNILDR. When CHKR (via KNIJB0)
runs KNILDR, check the state of the KLNI after KNILDR completes. If the
KLNI isn't running, shutdown the KLNI and issue a KNIRLF (Reload Failed)
BUGCHK.
This can happen if KNILDR dies while reloading a KLNI, or if someone put a
bogus program that just happens to be called KNILDR up on SYSTEM:.
[End of TCO 6.1.1285]
TCO-number: 6.1.1286
Written-by: LOMARTIRE Creation-date: 25-Mar-85 11:20:50
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYSIO MSTR MONSYM
Related-QAR: 706407
Problem:
Whenever a cluster change occurs, the monitor is going to force all dual
ported disks offline while it goes through a homeblock check on them. This
causes MOUNTR to print out that the "previously mounted structure is no longer
mounted". This can be confusing since this homeblock check is temporary and
should be transparent to the user.
Diagnosis:
If a .MSRUS function of MSTR% is done during the time that TOPS-20 has
forced the disk offline, MS%OFL will be returned. MOUNTR interprets this as a
true physical offline when, in fact, the disk is still online but it is just
temporarily inaccessible.
Solution:
The monitor needs a way to return the status of a disk which it has forced
offline in a way other than MS%OFL. Invent a new bit, called MS%IAC, which will
be returned whenever the disk is U1.OFS (forced offline). This bit is returned
in addition to whatever other bits are appropriate (such as MS%OFL). Now, MOUNTR
can test for MS%IAC and take more desirable actions under this special
situation.
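A caller such as MOUNTR could test the new bit before the old one, roughly as in this Python sketch. The numeric bit values are placeholders (the real ones are defined in MONSYM), and classify_unit is a hypothetical helper.

```python
# Placeholder values; the real MS%OFL and MS%IAC bits are defined in MONSYM.
MS_OFL = 0o1  # unit is offline
MS_IAC = 0o2  # unit was forced offline by the monitor (temporarily inaccessible)


def classify_unit(status):
    """Check MS_IAC first, since it may be returned together with MS_OFL."""
    if status & MS_IAC:
        return "temporarily inaccessible"
    if status & MS_OFL:
        return "offline"
    return "online"
```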
[End of TCO 6.1.1286]
TCO-number: 6.1.1287
Written-by: MCCOLLUM Creation-date: 25-Mar-85 15:01:51
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: FORK
Problem:
CFORK% returns illegal instruction trap for certain resource exhaustion
problems. CFORK% is documented as returning +1 and error code in AC 1
for all error types.
Diagnosis:
As above.
Solution:
Change an ITERR to a RETERR.
[End of TCO 6.1.1287]
TCO-number: 6.1.1288
Written-by: LOMARTIRE Creation-date: 26-Mar-85 08:51:11
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: FORK
Related-QAR: 706305
Problem:
The RT%DIM bit of the RTIW% JSYS has no effect. The deferred terminal
interrupt mask is always returned regardless of the setting of the RT%DIM bit.
Diagnosis:
There was never any code to check for the RT%DIM bit.
Solution:
Add code to check to see if the user specified RT%DIM before placing the
mask in T3.
[End of TCO 6.1.1288]
TCO-number: 6.1.1289
Written-by: LOMARTIRE Creation-date: 26-Mar-85 11:15:10
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSA
Related-QAR: 706307
Problem:
GTDIR% returns illegal memory write error, yet argument block and all
addresses are writeable
Diagnosis:
The range checking code is off by one.
Solution:
Decrement the value of Q2 (the argument block length) before the
comparisons so that it reflects the highest value to be returned.
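The off-by-one can be illustrated in Python. This is a sketch of the general shape of such a check, not the JSYSA code; the addresses and helper names are invented. A block of N words occupies offsets 0 through N-1, so the highest address written is base + N - 1.

```python
def buggy_is_writable(base, block_length, last_writable):
    # Compares one word past the end of the block.
    return base + block_length <= last_writable


def fixed_is_writable(base, block_length, last_writable):
    # Decrement the length first so it names the highest offset written.
    highest_offset = block_length - 1
    return base + highest_offset <= last_writable
```

A block that ends exactly on the last writable word passes the fixed check and fails the buggy one.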
[End of TCO 6.1.1289]
TCO-number: 6.1.1292
Written-by: LOMARTIRE Creation-date: 28-Mar-85 09:34:19
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: GTJFN
Related-SPR: 19910
Problem:
If there are files (such as A.A.2, B.A.2, C.A.2) which have been deleted
but not expunged with a higher generation number than those of the same name
(such as A.A.1, B.A.1, C.A.1) a command like COPY *.A.0 (TO) NUL: will only
copy the first file; A.A.1.
Diagnosis:
GNJFN% always has VERLUK look for deleted files even if the GTJFN% call
did not consider them. So, B.A.1 and C.A.1 will not be found because the
higher, deleted, versions will be found first. Then, when GNJFN% notices that
the file is deleted, it attempts to find the next one. Of course, there is no
next one, so the GNJFN% fails.
Solution:
In GNJFN1, check the flags passed to GNJFN% which reflect the original
GTJFN% call. If GJ%GND (deleted files were not considered) is set, do not set
IGDLF (ignore the deleted bit). This way, deleted files will be found only if
they were originally requested.
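The intended lookup behaviour can be sketched in Python with the files from the example. This models only the visible result, not VERLUK or GNJFN1; the data layout and function names are assumptions.

```python
# (name, generation, deleted) for the files in the example above.
FILES = [
    ("A.A", 1, False), ("A.A", 2, True),
    ("B.A", 1, False), ("B.A", 2, True),
    ("C.A", 1, False), ("C.A", 2, True),
]


def highest_generation(files, name, consider_deleted):
    """Highest generation of name, skipping deleted files unless requested."""
    gens = [gen for n, gen, deleted in files
            if n == name and (consider_deleted or not deleted)]
    return max(gens) if gens else None


def wildcard_highest(files, consider_deleted):
    """Step through every name, as a *.A.0 wildcard would."""
    found = []
    for name in sorted({n for n, _, _ in files}):
        gen = highest_generation(files, name, consider_deleted)
        if gen is not None:
            found.append((name, gen))
    return found
```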
[End of TCO 6.1.1292]
TCO-number: 6.1.1293
Written-by: GROSSMAN Creation-date: 28-Mar-85 09:57:21
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: APRSRV STG GLOBS
Problem: BUGHLT code can lose valuable PI information if no SYSERR blocks are
available. BUGHLT code also loses state of PI system ON/OFF bit.
Diagnosis: BUGH0 (in APRSRV) does a PIOFF before saving any PI information.
This loses the state of the PI system ON/OFF bit.
Solution: When a BUGHLT occurs:
1) Save the current CONI PI, in PISV1.
2) Turn off the PI system (PIOFF).
3) Acquire the bug lock (AOSE BUGLCK).
4) Copy PISV1 to PISAV.
In case a recursive BUGHLT occurs, only PISV1 will be disturbed. PISAV will
contain the PI status from the original BUGHLT.
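The four steps can be modelled in Python as below. The class and its fields are illustrative; in particular, what the real code does when the lock is already held is not stated here, so the sketch simply returns.

```python
class Machine:
    """Toy model of the PI state and the two save cells."""

    def __init__(self, pi_status):
        self.pi_status = pi_status
        self.pisv1 = None
        self.pisav = None
        self.bug_lock_held = False

    def bughlt(self):
        self.pisv1 = self.pi_status  # 1) save the current PI status
        self.pi_status = "off"       # 2) turn off the PI system
        if self.bug_lock_held:       # 3) a recursive entry cannot get the lock
            return False
        self.bug_lock_held = True
        self.pisav = self.pisv1      # 4) copy PISV1 to PISAV
        return True
```

A second, recursive call overwrites only pisv1; pisav still holds the status seen by the first call.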
[End of TCO 6.1.1293]
TCO-number: 6.1.1294
Written-by: GROSSMAN Creation-date: 28-Mar-85 15:06:19
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LLMOP
Problem: LLMOP% JSYS does not accept a service password in the Reserve
Console (.RCRSV) function.
Diagnosis: Code never implemented.
Solution: Write the code.
[End of TCO 6.1.1294]
TCO-number: 6.1.1295
Written-by: LOMARTIRE Creation-date: 28-Mar-85 15:25:18
Edited-by: LOMARTIRE Edit-date: 28-Mar-85 15:27:42
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DIRECT
Related-QAR: 838071
Problem:
TCO 6.2005 was never installed in either 6.0 or 6.1. It attempts to solve
the problem of receiving "?No such directory name" during directory name field
recognition when a file of the same name exists.
Diagnosis:
Solution:
Install it.
[End of TCO 6.1.1295]
TCO-number: 6.1.1296
Written-by: MCCOLLUM Creation-date: 29-Mar-85 15:24:51
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: GTJFN
Problem:
RELBAD BUGHLTs.
Diagnosis:
When attempting to perform recognition on a file name and the default device
name in the GTJFN block is DSK*:, routine DEFDEV will call STRDVD to translate
DSK*: to the public structure name. STRDVD alters the byte pointer in FILOPT
when it changes the free space block to accommodate longer strings. In this case,
however, other routines depend upon the old value in FILOPT and assume it
will not change over the call to DEFDEV.
Solution:
Since the new value in FILOPT is only in use during the call to DEFDEV, save
the value of FILOPT before calling DEFDEV. Restore FILOPT to its initial
state after the call to DEFDEV is completed.
[End of TCO 6.1.1296]
TCO-number: 6.1.1297
Written-by: TBOYLE Creation-date: 29-Mar-85 15:58:02
Edited-by: TBOYLE Edit-date: 1-Apr-85 16:06:05
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGEM
Problem: PTNON0 BUGHLTS. The application was 1022.
Diagnosis: PMAP to change another fork's address space crashes
because the code in MSETPT does not remain NOSKED between
the removing of page-table entries and the adding of new ones.
In this case, the target fork intervened and faulted a private
page.
Solution: Be NOSKED during the release and set page-table entry
process. Swap pages in beforehand to prevent NOSKED page-faults.
This is also a significant improvement over going OKSKED in the middle
of critical code to prevent NOSKED page-faults!
We will include this change for 6.1 to solve 1022's invocation
of these PTNON0's. However, we must plan to revamp this code
during the next release because several of these routines have
races and also racy methods of preventing NOSKED page-faults.
[End of TCO 6.1.1297]
TCO-number: 6.1.1298
Written-by: GROSSMAN Creation-date: 30-Mar-85 09:33:49
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LLMOP
Problem: KPALVH's, ILMNRF's and many other BUG.'s sometime after receiving
an LLMOP Request Counters message.
Diagnosis: When LLMOP receives a Request Counters message, it generates an
internal request block and puts it on a request queue. It then asks NISRV
for the desired counters. When NISRV returns the counters sometime later,
LLMOP completes the request, and deallocates the request block.
Unfortunately, LLMOP never removed the request block from the request queue,
and it now has a stale pointer to some memory it used to own.
In any case, somebody else eventually acquires the deallocated memory, and
it's downhill from there. In one particular case, the memory was picked up
by SCLINK, and LLMOP's request queue ended up pointing at SCLINK's logical
link list. LLMOP then tried to find something on the request queue, and
got lost, resulting in a KPALVH.
Solution: Don't queue the request. There was never any reason to do so, as
LLMOP kept track of the request block by storing its address in UNRID of
the NISRV arg block.
[End of TCO 6.1.1298]
TCO-number: 6.1.1299
Written-by: GROSSMAN Creation-date: 30-Mar-85 11:24:15
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LLMOP
Problem: Lost memory when using LLMOP functions .RCRSV, .RCREL, and .RCRBT.
Diagnosis: Each of these functions allocates an LLMOP request block, and
never returns it.
Solution: Set the 'abort' bit in the request block for each of these functions.
When the transmit complete interrupt happens, this will cause the block to
be deallocated.
[End of TCO 6.1.1299]
TCO-number: 6.1.1300
Written-by: PAETZOLD Creation-date: 31-Mar-85 13:07:39
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: monitor
Routines-affected: tcptcp
Problem:
TCP hangs up. Lots of FLKTIMs. TCPHLK is locked.
Diagnosis:
Solution:
BUFHNT: needs to check for a null buffer.
[End of TCO 6.1.1300]
TCO-number: 6.1.1301
Written-by: PAETZOLD Creation-date: 31-Mar-85 13:16:34
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: monitor
Routines-affected: tcpjfn
Problem:
;LOCAL-HOST and ;FOREIGN-HOST GTJFN attributes do not work.
Diagnosis:
Solution:
HSTHST is returning results in T1 and not T2 like it is supposed to.
[End of TCO 6.1.1301]
TCO-number: 6.1.1302
Written-by: PAETZOLD Creation-date: 31-Mar-85 13:27:24
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: IPNIDV
Problem:
ARPBFL BUGCHKs and eventual ARP shutdown.
Diagnosis:
IPDWNS BUGINFs caused by, among other things, carrier failures on the NI.
Eventually we run out of ARP buffers and an IPABFL results.
Solution:
ARPCDS transfers to CDSERR on a send failure but forgets to release the
ARP buffer. Fix up ARPCDS to do the correct thing and make it display
the real error code instead of the NISRV dispatch address for ARP service.
[End of TCO 6.1.1302]
TCO-number: 6.1.1303
Written-by: PAETZOLD Creation-date: 2-Apr-85 08:17:31
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: monitor
Routines-affected: ipipip
Problem:
Resource problems in the TCP/IP code, specifically IPIBLP and KNIFQE
BUGINFs.
Diagnosis:
At times of high load the internet fork is not running fast enough.
Solution:
Use jobbit to set the priority of the internet fork up. The code used to
be this way but was changed a while back as an experiment. The experiment
failed.
[End of TCO 6.1.1303]
TCO-number: 6.1.1304
Written-by: WAGNER Creation-date: 2-Apr-85 10:55:22
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCHED
Related-SPR: 20483
Problem: SKDIDL overflows into SKDTM0 after 95 hours, 26 minutes, 37 seconds
of idle time.
Diagnosis: SKDIDL is kept in HP time units, only so many can fit. We convert
these to mS and put them in SKDTM0. But a conversion of overflowed
garbage is still garbage.
Solution: Check for impending overflow, correct down by a constant, remember
that constant. Convert to mS where it is subsequently used, add
back in that converted constant. Since SKDIDL is only used to be
converted to mS anyway, and only in one place (two if class scheduling)
it is more efficient this way (one compare each load average update)
than if we changed all the calculations to use double word arithmetic.
Besides, the compare only succeeds every 95 hours, etc.
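The figure of 95 hours, 26 minutes, 37 seconds is what a 35-bit positive count of 10-microsecond ticks would give. The tick size is an inference from that figure, not something stated in this TCO. The Python sketch below checks the arithmetic and models the correction scheme; IdleCounter and its LIMIT are hypothetical.

```python
TICKS_PER_SECOND = 100_000  # assumption: one HP time unit is 10 microseconds
TICKS_PER_MS = 100
MAX_POSITIVE = 2**35        # positive range of a 36-bit signed word


def overflow_time(max_ticks=MAX_POSITIVE):
    """Return (hours, minutes, seconds) of idle time that fits in one word."""
    seconds = max_ticks // TICKS_PER_SECOND
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return hours, minutes, secs


class IdleCounter:
    """Keep the tick count small; remember what was removed, already in ms."""

    LIMIT = 10**10  # hypothetical correction constant, below 2**35

    def __init__(self):
        self.ticks = 0
        self.removed_ms = 0

    def add(self, ticks):
        self.ticks += ticks
        if self.ticks >= self.LIMIT:  # the one compare per update
            self.ticks -= self.LIMIT
            self.removed_ms += self.LIMIT // TICKS_PER_MS

    def milliseconds(self):
        # Convert where used, then add back the converted constant.
        return self.ticks // TICKS_PER_MS + self.removed_ms
```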
[End of TCO 6.1.1304]
TCO-number: 6.1.1308
Written-by: PAETZOLD Creation-date: 5-Apr-85 15:04:37
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: monitor
Routines-affected: ipipip
Problem:
ILULK2 BUGHLTs from RCVGAT in IPIPIP when a system has multiple network
interfaces and the system is a gateway and an interface is down and a
gateway client sends packets to be forwarded on the interface which is down.
Diagnosis:
The packet first comes into RCVGAT. RCVGAT checks for a full length
buffer and unlocks the buffer accordingly. Since the buffer was a
receive buffer received from a hardware interface it is full length.
The destination address is not for the local host so the packet is to
be forwarded. The packet is now given to another interface to forward.
Unfortunately the target interface is down.
GWYLUK is called to find another interface for the packet. GWYLUK
returns an interface which is currently up. It is determined that the
local host is indeed the gateway and the packet is forwarded to SNDLCL.
SNDLCL locks down the packet again, but uses the actual length of the
data and not the length of the buffer. This is usually OK because
SNDLCL is not usually sending out receive buffers.
RCVGAT gets the packet again and unlocks it. RCVGAT unlocks the whole
buffer but SNDLCL only locked the data portion. If the buffer crosses a
page boundary (probability .5) an ILULK2 will result.
Solution:
Change SNDLC4 to lock the whole buffer if the buffer is full size. This
appears to be a day one BBN problem.
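The lock and unlock mismatch can be reproduced with a toy page-lock counter in Python. The 512-word page size is the TOPS-20 page size; the buffer address and lengths are invented, and PageLocks is a hypothetical model rather than the monitor's locking code.

```python
PAGE_WORDS = 512  # words per TOPS-20 page


def pages(start, length):
    """Page numbers touched by length words beginning at word address start."""
    first = start // PAGE_WORDS
    last = (start + length - 1) // PAGE_WORDS
    return range(first, last + 1)


class PageLocks:
    """Per-page lock counts; unlocking an unlocked page is an error."""

    def __init__(self):
        self.counts = {}

    def lock(self, start, length):
        for page in pages(start, length):
            self.counts[page] = self.counts.get(page, 0) + 1

    def unlock(self, start, length):
        for page in pages(start, length):
            if self.counts.get(page, 0) == 0:
                raise RuntimeError(f"unlocking page {page}, which is not locked")
            self.counts[page] -= 1
```

Locking only a 50-word data portion at address 400 and then unlocking the full 300-word buffer fails on the second page, because the buffer crosses a page boundary and the data does not. Locking the full buffer makes the pair balance.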
[End of TCO 6.1.1308]
TCO-number: 6.1.1309
Written-by: GRANT Creation-date: 5-Apr-85 20:09:29
Edited-by: GRANT Edit-date: 6-Apr-85 12:30:59
Edit-checked: No Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: scampi
Problem: KPALVH, SCASCQ, and KLPNOM BUGHLTs. There were probably others
but we've lost track.
Diagnosis: A packet was appearing on 2 of the CI port's queues at the same
time, leading to massive confusion. The results were CI-related BUGHLTs
of various flavors.
Solution: When CIFORK was created SCAMPI was made to set a bit whenever it
needed buffers allocated or connect blocks reaped. Newly-added code caused
a buffer to get returned when it shouldn't have been. This was caused by
incorrectly skipping over a RET.
[End of TCO 6.1.1309]
TCO-number: 6.1.1310
Written-by: GRANT Creation-date: 8-Apr-85 07:42:47
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: phyklp
Problem: Can't test CI microcode's NO-ANSWER feature which is used by
diagnostics.
Diagnosis: No code.
Solution: Add a routine which does the SET-COUNTER function to set the
NO-ANSWER bit. This is never called by the standard system; it is only used
for debugging the microcode.
[End of TCO 6.1.1310]
TCO-number: 6.1.1312
Written-by: PALMIERI Creation-date: 8-Apr-85 14:08:51
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: SCPAR SCJSYS
Problem: COMMMS BUGHLTs
Diagnosis:
SCJSYS is releasing SAB blocks twice. If a link blocks as a result of
a SOUT/SOUTR the SAB it was using is entered into the "active" slot of
the SAB indirect table. If the fork runs at a higher priority level
before the output completes the monitor will notice the incomplete
output and attempt to complete it. If it succeeds, the SAB will be
returned to the monitor free space pool. After the higher priority's
output completes the blocked lower priority will be wakened. It still
has a pointer to the now released SAB in its ACs and may attempt to
release it a second time, resulting in a COMMMS BUGHLT.
Solution:
Keep an indirect "active" slot for each PSI level (normal,1,2,3).
Only attempt to complete blocked output that is at the current PSI
level. Do not return a SAB to its "normal" (not active) slot if the
SAB indirect table pointer (PSBSAB) is zero.
[End of TCO 6.1.1312]
TCO-number: 6.1.1313
Written-by: PALMIERI Creation-date: 8-Apr-85 16:01:14
Edited-by: PALMIERI Edit-date: 8-Apr-85 16:05:50
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: SCLINK
Related-QAR: 838176
Problem: 36-bit byte mode does not send all bytes.
Diagnosis: SCLINK tries to determine if all bytes in user's buffer will fit
into a message. AC P1 is used as a flag to indicate more to send.
If SCLINK thinks all bytes will fit into a segment but the copy
routine does not, SCLINK does not notice.
Solution: Update P1 after user's data is copied.
[End of TCO 6.1.1313]
TCO-number: 6.1.1314
Written-by: PALMIERI Creation-date: 9-Apr-85 11:16:45
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: SCJSYS
Problem: DECnet free space used up
Diagnosis: SCJSYS not releasing port indirect tables
Solution: Change SKIPN to SKIPE in RELSJB
[End of TCO 6.1.1314]
TCO-number: 6.1.1316
Written-by: MELOHN Creation-date: 9-Apr-85 15:16:07
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CTHSRV
Problem: When SET HOSTing from VMS to TOPS-20 several of the initial
characteristics sent out by TOPS-20 are ignored by VMS.
Diagnosis: VMS does not correctly handle an init and a
characteristics CTERM message in the same foundation common data message.
Solution: Provide characteristics in smaller spoonfuls for VMS by
putting each CTERM message in its own common data foundation message
type.
[End of TCO 6.1.1316]
TCO-number: 6.1.1317
Written-by: MELOHN Creation-date: 9-Apr-85 15:44:25
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CTHSRV
Problem: System crashes when user CTERMs in and HOSTs out.
Diagnosis: CTHOOE, the TDCALL which determines out-of-band echoing
for CTERM terminals, is in swappable code. It is called at scheduler
level when the .MOSNH MTOPR% is being used. *BAM*
Solution: Put CTHOOE in RESCD.
[End of TCO 6.1.1317]
TCO-number: 6.1.1318
Written-by: TBOYLE Creation-date: 11-Apr-85 15:56:19
Edited-by: TBOYLE Edit-date: 11-Apr-85 15:59:07
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: FORK
Problem: SPLFK% can cause a job to run out of fork handles if
another fork obtains handles on either of the forks used in the
call that exercises the new SPLFK% with suicide option.
This can occur if you ^C out of LINK when it has two forks and do
an INFORMATION FORK, and it also happens by just using LINK with
Rutgers "WATCH" program running with the "program-watch" option.
Diagnosis: Part of the code must exchange the Job-wide data on
the two forks so that one becomes the other. The Job-fork-handle
share counts should not, however, be exchanged. This is because
other forks have relative fork handles that will point to the
new forks and they must remain the same so that the Job-fork-handles
can be properly released.
Solution: Add code to ensure that the share counts (FKHCNT) remain
the same after the splice with suicide option.
[End of TCO 6.1.1318]
TCO-number: 6.1.1319
Written-by: PAETZOLD Creation-date: 12-Apr-85 11:28:53
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: IPCIDV
Problem:
TCP/IP on the CI does not initialize.
Diagnosis:
Job zero startup has changed around and internet is usually not initialized
when IPCIDV tries to initialize. This is possible and IPCIDV handles the
situation. However it handles it wrong and marks the interface desired
state to be down.
Solution:
Change a jrst into a ret at CIPRST+1.
[End of TCO 6.1.1319]
TCO-number: 6.1.1320
Written-by: LEACHE Creation-date: 14-Apr-85 14:39:22
Edited-by: LEACHE Edit-date: 14-Apr-85 17:00:07
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: BOOT
Related-SPR: 19606
Problem: BOOT does not always reload all DX20's on the system. Also,
BOOT will often unnecessarily reload some DX20's more than once.
Diagnosis: Lost in the dawn of history is the reason why BOOT specifically
avoids reloading tape DX20's. The unnecessary reloadings are an artifact
of the design of pre-DX20 BOOT.
Solution: Make each invocation of BOOT (whether manual or auto-reload)
load each DX20 on the system exactly once.
[End of TCO 6.1.1320]
TCO-number: 6.1.1321
Written-by: LEACHE Creation-date: 14-Apr-85 14:47:11
Edited-by: LEACHE Edit-date: 14-Apr-85 17:12:37
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: BOOT CHECKD DSKALC PROLOG
Related-SPR: 19084
Problem: BOOT will abort an auto-reload if it gets a dump error.
Diagnosis: This change was made in V5 so that if the dump was important
the dump could be attempted again. This is acceptable behaviour on a
development system, but not on a production machine where the most important
thing is to get the system back up.
Solution: Create a home-block cell for storing BOOT parameters and modify
CHECKD to read and write these parameters. Define a parameter that, when
set, will cause BOOT to halt when dump errors are encountered during an
auto-reload. The default behaviour has been changed to proceed on dump
errors.
The CHECKD command ENABLE BOOT-PARAMETERS will set and clear the parameters.
SHOW BOOT-PARAMETERS will display the settings.
The first bit in the parameter-word controls whether BOOT will read the
remaining flags or not. If the first bit is set, then BOOT will read the
remaining flags (only 1 of which is defined: halt-on-dump-errors). When
BOOT encounters an enabled parameter word it will change its prompt to
[*BOOT ...] to indicate on the console that it is reading parameters that
may change its behaviour.
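The gating of the flags by the first bit can be sketched in Python. The TCO says only that the first bit enables the rest and that one flag is defined; the position chosen here for halt-on-dump-errors is an assumption, as are the function names.

```python
PARAMS_ENABLED = 1 << 35       # first (leftmost) bit of a 36-bit word
HALT_ON_DUMP_ERRORS = 1 << 34  # hypothetical position of the one defined flag


def halt_on_dump_error(word):
    """The remaining flags are read only when the first bit is set."""
    if not word & PARAMS_ENABLED:
        return False  # default behaviour: proceed on dump errors
    return bool(word & HALT_ON_DUMP_ERRORS)


def prompt_is_starred(word):
    """BOOT marks its prompt with '*' when it is reading parameters."""
    return bool(word & PARAMS_ENABLED)
```

A word with the halt flag set but the enable bit clear still proceeds on dump errors, since the flag is never read.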
[End of TCO 6.1.1321]
TCO-number: 6.1.1323
Written-by: GLINDELL Creation-date: 16-Apr-85 21:03:45
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCLINK STG
Related-QAR: 838204
Problem:
Remember the TCO last week about the number of nodes growing
on the Enet? Well, you can forget it now. Instead of having
a static maximum size of the database, make it dynamic.
Diagnosis:
Solution:
Instead of allocating a chunk of memory at initialization time,
get a page at a time when needed. Use ASGVAS.
[End of TCO 6.1.1323]
TCO-number: 6.1.1324
Written-by: GROSSMAN Creation-date: 16-Apr-85 23:10:33
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LLMOP
Problem: Remote console programs, and NCP's TRIGGER NODE command eventually stop
working.
Diagnosis: LLMOP loses track of the number of receive buffers it has posted for
the Remote Console protocol type. It gets into a state where it believes that
it has two receive buffers posted, when in reality, it has none posted.
This was caused by the use of a bizarre mutation of the INCR/DECR macros.
Solution: Change all occurrences of the INCRF and DECRF macros into INCR and DECR
macros as appropriate. Delete definitions of INCRF and DECRF to prevent
future abuse.
[End of TCO 6.1.1324]
TCO-number: 6.1.1325
Written-by: GRANT Creation-date: 17-Apr-85 13:01:01
Edited-by: GRANT Edit-date: 17-Apr-85 13:09:20
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYSIO
Problem: If a cluster system gets hung during startup, the console message on
the other systems is not informative, namely, " %Problem Drive Dual
Ported to Unknown System ".
Diagnosis: It doesn't tell you what the problem is.
Solution: Change the message to, " %Drive forced offline because a running
system hasn't joined the cluster ".
[End of TCO 6.1.1325]
TCO-number: 6.1.1326
Written-by: PALMIERI Creation-date: 17-Apr-85 13:24:40
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: SCJSYS
Related-QAR: 838190
Problem: No interrupt if connect is rejected.
Diagnosis: If connect initiate is rejected interrupt is given on the data
channel rather than the connect channel.
Solution: Give interrupt on the connect channel.
[End of TCO 6.1.1326]
TCO-number: 6.1.1327
Written-by: LOMARTIRE Creation-date: 17-Apr-85 13:56:11
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CFSSRV
Problem:
PITRAP BUGHLTs.
Diagnosis:
The routine CFCARV is not correctly calculating which page to lock down.
If the old page has been completely used and a new page is being acquired, then
page N+2, not N+1 is locked down. Thus, the next page which will be used is now
unlocked.
Solution:
First, ask Tom Moser for help. Then, change an ADDI T1,PGSIZ to ADD
T1,CFNXSZ.
[End of TCO 6.1.1327]
TCO-number: 6.1.1328
Written-by: LOMARTIRE Creation-date: 17-Apr-85 14:27:20
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGUTL STG GLOBS
Related-TCO: 6.1.1282
Problem:
TCO 6.1.1282 is no longer needed. The PITRAP problem has been found.
Diagnosis:
Solution:
Remove it.
[End of TCO 6.1.1328]
TCO-number: 6.1.1329
Written-by: GLINDELL Creation-date: 17-Apr-85 15:56:49
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCLINK
Problem:
It is possible to redefine the executor name after DECnet has
initialized. This does not cause any problems except perhaps
user confusion. However, the philosophy and documentation say
that it is not allowed to change either the executor name or
address.
Diagnosis:
The 'add node' routine SCTAND in SCLINK checked if the user
was changing the executor address, but not if the name was
changed.
Solution:
When not finding the node name to add in the database, see if
the node address is in the home area. If so, clear out any
ADRTAB entry that there may be.
[End of TCO 6.1.1329]
TCO-number: 6.1.1330
Written-by: GRANT Creation-date: 17-Apr-85 17:23:08
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: phymsc
Problem: MSCP tries to connect to TOPS-10 systems.
Diagnosis: MSCP tries to connect to HSC50s and KL10s.
Solution: Change MSCP to attempt connections based on software type, not
hardware type. Try to connect to systems that run either "HSC" or
"T-20", as indicated in their start packets.
[End of TCO 6.1.1330]
TCO-number: 6.1.1331
Written-by: GLINDELL Creation-date: 18-Apr-85 11:56:49
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: D36COM
Problem:
ILMNRF's when generating a 3.0 event caused by a bad incoming
'disconnect initiate' DECnet message.
Diagnosis:
DNCM2B is called to copy the disconnect data from the message block
into a disconnect block. Doing so, it resets the index pointer in the
MSD from T6 to T3 and leaves it that way. The poor event code later
tries to read the message again to create the event, and uses the T3
that was left in the MSD as index pointer instead of the correct T6.
Solution:
Restore the index AC used before saving the update byte pointer in
DNCM2B.
[End of TCO 6.1.1331]
TCO-number: 6.1.1332
Written-by: GLINDELL Creation-date: 19-Apr-85 14:26:49
Edited-by: GLINDELL Edit-date: 19-Apr-85 15:24:30
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CIDLL
Problem:
ZERO LINE CI-0-0 COUNTERS produced amusing results.
Diagnosis:
DECnet does not maintain any CI line counters. CIDLL therefore
returned error code NF.FNS which means "function not supported"
when asked to zero these counters. This caused great consternation
in NMLT20.
Solution:
Make CIDLL return "NF.NDP" which means 'function succeeded but
no data present to return' when asked to do something to the
CI line counters. This will be more to NMLT20's taste.
Also fix NMLT20 to handle an error return from the ZERO COUNTER
call.
[End of TCO 6.1.1332]
TCO-number: 6.1.1333
Written-by: PAETZOLD Creation-date: 22-Apr-85 09:44:58
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: IPIPIP
Problem:
asniq jsys checks for sc%nwz but does not check for sc%whl.
Diagnosis:
Solution:
make wheel and operator good enough as well as sc%nwz.
[End of TCO 6.1.1333]
TCO-number: 6.1.1334
Written-by: LOMARTIRE Creation-date: 23-Apr-85 08:34:04
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYMVR
Problem:
MSSCFS BUGCHKs.
Diagnosis:
During startup, the system is careful to ensure that the system has joined
the cluster before allowing it to serve any of its disks with the MSCP server.
However, this is not the case if we are already up, connections are broken, and
then reestablished. In this case, we have already allowed access to the disks.
If the MSCP connection is established before the CFS one, then the server could
be handed a queued up IORB from the other system before CFS has started. This
should not cause any problems since CFS has already arbitrated the I/O request,
but the BUGCHK is not very reassuring.
Solution:
First change MSSCFS into a BUGHLT to try to trap any real problems. Next,
reject any connect requests to the server if we have not established, or are
not in the process of establishing, a CFS connection. The MSCP driver will repeat the
attempt later so, eventually, we should get a connection established.
[End of TCO 6.1.1334]
TCO-number: 6.1.1335
Written-by: GRANT Creation-date: 23-Apr-85 08:46:53
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYSIO
Related-QAR: 838025
Problem: If you have an RP06 with both its ports connected to the same KL, you
can't switch from Port A to Port B while work is going on.
Diagnosis: %Problem on device ....... messages result. TOPS-20 won't use
the newly selected port.
Solution: The code which searches for a second path to the same disk wrongly
establishes the primary and secondary paths. In some cases it was
setting the primary path to be the side which is now offline. Fix
the code so it always has an active port as the primary path.
[End of TCO 6.1.1335]
TCO-number: 6.1.1336
Written-by: GROSSMAN Creation-date: 23-Apr-85 17:01:05
Edited-by: GROSSMAN Edit-date: 23-Apr-85 17:03:47
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MONSYM NIUSR
Problem: No way for users to find out the Ethernet address of the local
node via the NI% JSYS.
Diagnosis: OOPS.
Solution: Return the Physical address (the current address), and the
Hardware address in the .EIRCI (Read Channel Info) function.
[End of TCO 6.1.1336]
TCO-number: 6.1.1337
Written-by: GROSSMAN Creation-date: 23-Apr-85 17:27:21
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NIUSR NISRV
Problem: SMON% code for setting Ethernet address in wrong module.
Diagnosis: The code belongs in NIUSR, but NIUSR didn't exist when the code
was written.
Solution: Move the code from NISRV to NIUSR.
[End of TCO 6.1.1337]
TCO-number: 6.1.1339
Written-by: PALMIERI Creation-date: 24-Apr-85 10:09:43
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: STG FREE
Related-QAR: 838215
Problem: DECnet swappable free space is available but never used in 6.1.
Diagnosis: Pre 6.1 needs it but it is always configured regardless of setting
of FTNSPSRV.
Solution: Add FTNSPSRV around parameter that makes its size non-zero.
[End of TCO 6.1.1339]
TCO-number: 6.1.1341
Written-by: GROSSMAN Creation-date: 24-Apr-85 13:03:26
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NIUSR
Problem: Ill mon refs if bad byte pointers given to NI% JSYS.
Diagnosis: No ERJMPx after some PXCT's.
Solution: Put ERJMPs after all PXCTs that can occur while NOINT.
[End of TCO 6.1.1341]
TCO-number: 6.1.1342
Written-by: GROSSMAN Creation-date: 24-Apr-85 13:40:04
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NISRV NIUSR NIPAR
Problem: When the NI% JSYS returns monitor portal ID's to the user, these
ID's are just NISRV's portal block addresses. There are a number of
problems with this:
1) User portal ID's are only a halfword, but monitor portal ID's
are fullwords.
2) If the user passes a portal ID to the NI% JSYS, it shouldn't
be an address, because he might pass a bad address.
3) Monitor addresses are ugly looking ID's to pass back to a user.
Solution: Have NISRV generate a unique 'external' ID for all portals that
are opened. Now, each portal will have a unique 9 bit code associated
with it. This code can then be returned to the user as the portal ID.
Also, create a new NISRV function which will translate 'external' portal
ID's to real portal ID's.
The 'external' portal ID will now be returned by the NU.RPI (read portal
info) function of NISRV.
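The idea of handing out small external IDs in place of monitor addresses can be sketched in Python as below. This is only an illustration: the class, method names, and allocation policy are hypothetical, and the only detail taken from the text is that the external code is 9 bits wide and that a translation function maps it back to the real portal.

```python
class PortalTable:
    """Maps small external portal IDs to internal portal blocks."""

    MAX_IDS = 1 << 9  # a 9-bit code allows 512 distinct values

    def __init__(self):
        self._by_id = {}
        self._next = 0

    def open_portal(self, portal_block):
        """Assign an unused external ID to a newly opened portal."""
        if len(self._by_id) >= self.MAX_IDS:
            raise RuntimeError("no free external portal IDs")
        # A free slot is guaranteed to exist, so this loop terminates.
        while self._next in self._by_id:
            self._next = (self._next + 1) % self.MAX_IDS
        ext_id = self._next
        self._by_id[ext_id] = portal_block
        self._next = (self._next + 1) % self.MAX_IDS
        return ext_id

    def translate(self, ext_id):
        """Return the internal portal block, or None for a bad ID."""
        return self._by_id.get(ext_id)

    def close_portal(self, ext_id):
        """Release an external ID; unknown IDs are ignored."""
        self._by_id.pop(ext_id, None)
```

Because the user only ever supplies a table index, a bad ID results in a failed lookup rather than a reference through a bad address.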
[End of TCO 6.1.1342]
TCO-number: 6.1.1343
Written-by: PALMIERI Creation-date: 25-Apr-85 16:11:15
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: JNTMAN
Problem: SYSDPY gets wrong information as to whether task is active (DCN) or
passive (SRV).
Diagnosis: JNTMAN doesn't understand the new way to access port blocks.
Solution: Teach JNTMAN about the port indirect table.
[End of TCO 6.1.1343]
TCO-number: 6.1.1346
Written-by: GROSSMAN Creation-date: 29-Apr-85 14:28:36
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGEM
Problem: IDFOD2 BUGCHK'S, PAGLCK BUGHLT'S sometime later on. Seem to be
related to doing word searches in MDDT.
Diagnosis: MDDT gets bad information from the MRPAC% JSYS, and touches
CXBPGA. This causes the page to be created with the section 6 map as
its owner, instead of the currently running process. When the process
gets destroyed, it attempts to deassign the page, and a PAGLCK results.
Solution: Add a check to the MRPAC% JSYS to explicitly check for section
XCDSEC and the special pages, and treat them as process private pages.
[End of TCO 6.1.1346]
TCO-number: 6.1.1347
Written-by: GROSSMAN Creation-date: 29-Apr-85 14:36:41
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGEM
Problem: MDDT hangs when doing word searches in section 6 (XCDSEC).
Diagnosis: MDDT touches a page that is not mapped into section 6, but is
mapped into section 0/1. The test for page ownership in FPTA incorrectly
claimed that the page was owned by section 0/1, and paged in the wrong
page. PAGEM eventually restarts the faulting instruction, and the fault
occurs again and again ad nauseam.
Solution: Fix the page ownership test for XCDSEC pages (FTPAXC) to claim
that all pages between NRCOD and NRCODZ (within section 6) are owned by
the XCDSEC map. All other pages in section 6 are owned by the section 0/1
map.
[End of TCO 6.1.1347]
TCO-number: 6.1.1348
Written-by: GROSSMAN Creation-date: 29-Apr-85 14:54:05
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LLMOP
Problem: LLMOP% remote console loses buffers; eventually, it stops working
altogether.
Diagnosis: When LLMOP receives a message with an error from NISRV, it neglects
to update its outstanding receive buffer count (SVBPC). Eventually when it
calls PSTBUF to setup more receive buffers, PSTBUF doesn't queue up any
because SVBPC is too high. Eventually, all receive buffers are lost and
no more messages come in for the remote console protocol type.
Solution: Re-arrange the routine LLMRCX (the common receive processing
routine for all LLMP protocol types). Ensure that SVBPC gets decremented
for all buffers received back from NISRV. Also, keep separate counts for
the receipt of unsupported messages, and bad messages.
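The accounting rule in this fix (decrement the outstanding count for every buffer that comes back, whatever its status) can be modelled with a minimal Python sketch. The class, the status strings, and the target count are hypothetical; only the role of SVBPC and the separate unsupported/bad counters come from the text.

```python
class ReceiveAccounting:
    """Toy model of outstanding receive buffer bookkeeping."""

    def __init__(self, target):
        self.target = target   # buffers we want posted at any one time
        self.posted = 0        # plays the role of SVBPC
        self.unsupported = 0   # count of unsupported messages received
        self.bad = 0           # count of bad messages received

    def post_buffers(self):
        """Post enough buffers to reach the target; return how many."""
        needed = max(self.target - self.posted, 0)
        self.posted += needed
        return needed

    def buffer_returned(self, status):
        """Handle a buffer handed back to us, whatever its status."""
        self.posted -= 1       # always decrement, even on an error
        if status == "unsupported":
            self.unsupported += 1
        elif status == "bad":
            self.bad += 1
```

If the decrement were skipped on the error path, post_buffers would see an inflated count and post nothing, which is the failure described in the diagnosis.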
[End of TCO 6.1.1348]
TCO-number: 6.1.1350
Written-by: LOMARTIRE Creation-date: 30-Apr-85 15:52:12
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC
Related-QAR: 838042
Problem:
When a system in a CFS cluster automatically reloads, the cluster-wide
time may be set backwards by some amount as dictated by the reloading machine.
Diagnosis:
The startup code in MEXEC sets the system time before joining the cluster
by obtaining the time from the front end. This time may actually lag the
cluster-wide time by a large amount. Depending on the sequence of CFS
connections at startup, this bogus time may propagate to other machines.
Solution:
Do not set the system time until after we have joined the cluster. Prefer
the cluster time to that supplied by other sources. The ordering will be as
follows:
1. the cluster time
2. the front-end
3. the person at the console
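The preference order above amounts to taking the first source that can supply a time. A small Python sketch, with hypothetical parameter names and None standing for "this source has no time to offer":

```python
def choose_system_time(cluster_time=None, front_end_time=None, ask_operator=None):
    """Pick the time from the first available source, in preference order."""
    if cluster_time is not None:
        return cluster_time, "cluster"
    if front_end_time is not None:
        return front_end_time, "front-end"
    if ask_operator is not None:
        # ask_operator is a callable that prompts the person at the console.
        return ask_operator(), "operator"
    raise RuntimeError("no time source available")
```

The operator is modelled as a callable so that nobody is prompted unless both other sources fail.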
[End of TCO 6.1.1350]
TCO-number: 6.1.1351
Written-by: GLINDELL Creation-date: 30-Apr-85 16:11:31
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NTMAN
Related-QAR: 838253
Problem:
SHOW LOOP NODES returns a node number for the loop node names.
Loop node names do not have node numbers. This causes a funny-looking
display in NCP.
Diagnosis:
Loopback code in NTMAN calls NMXS2A (convert sixbit to ascii) and
thinks NMXS2A returns +1. It doesn't, it returns +2 always.
Solution:
Change NMXS2A to always return +1. This will fix another case
in NTMAN when a routine didn't expect NMXS2A to return +2. That
case is more interesting - although I haven't seen any problems
with it - every time NTMAN was asked to convert a node number to
a name (often) it would fall through to the code on the next page.
It didn't seem to do any harm though.
[End of TCO 6.1.1351]
TCO-number: 6.1.1354
Written-by: GROSSMAN Creation-date: 2-May-85 09:53:16
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC
Problem: Programs doing CRJOB JSYS can hang permanently.
Diagnosis: The job that is being created by the CRJOB JSYS (hereafter
called the 'object' job) can be logged out before being completely created.
If this happens, the object job never gets to setup CRJANS (the CRJOB answer
cell), and the job that did the CRJOB JSYS hangs forever waiting for CRJANS
to become non-zero.
Although there is a relatively small window between the time of the job's
creation, and its setup of CRJANS, this window can be lengthened quite
a lot by an ACJ that takes a long time when monitoring LOGINs.
Solution: Have the logout code at FLOGO check to see if this job is the
object of a CRJOB. If it is, ensure that CRJANS gets set to -1 to wake up
the program doing the CRJOB.
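The hang and its fix can be modelled loosely in Python with a waiter blocked on an answer cell. Everything here is a hypothetical stand-in for the monitor's mechanism; in particular, setting -1 only when no answer has been stored yet is an assumption of this sketch, not something the text states.

```python
import threading


class CrjobRequest:
    """Stand-in for the CRJOB answer cell (CRJANS); 0 means no answer yet."""

    def __init__(self):
        self.answer = 0
        self._ready = threading.Event()

    def set_answer(self, value):
        self.answer = value
        self._ready.set()

    def wait(self, timeout=None):
        """Block until answered; return the answer, or None on timeout."""
        return self.answer if self._ready.wait(timeout) else None


def logout(request=None):
    """Logout path: wake the creator if this job is the object of a CRJOB."""
    # Assumption: only answer if the object job never got to answer itself.
    if request is not None and request.answer == 0:
        request.set_answer(-1)
```

Without the check in logout, a waiter on a request whose object job was logged out early would block forever.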
[End of TCO 6.1.1354]
TCO-number: 6.1.1355
Written-by: GROSSMAN Creation-date: 2-May-85 17:12:46
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKNI
Problem: KNIIPT BUGHLTs.
Diagnosis: When closing an NISRV portal, a resource failure (from ASGRES)
can occur. This error gets passed back up to the user, who may then
optionally try closing the portal again. Unfortunately, PHYKNI was not
cleaning up properly after the failure. When the user re-tried the close
(somewhat later), a sanity check failed, and a KNIIPT BUGHLT resulted.
Solution: Re-do some code so that the resource failure cannot occur in the
routine NIDPT.
[End of TCO 6.1.1355]
TCO-number: 6.1.1359
Written-by: WAGNER Creation-date: 6-May-85 11:22:49
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGUTL
Problem: PMCTL% returns "page available" status, even if page number requested
is greater than physical memory.
Diagnosis: Code was dreaming, comparing against maximum possible, not against
what we can actually afford.
Solution: Make check against reality, in this case NHIPG instead of MAXCOR.
[End of TCO 6.1.1359]
TCO-number: 6.1.1360
Written-by: TBOYLE Creation-date: 6-May-85 14:30:21
Edited-by: TBOYLE Edit-date: 6-May-85 14:49:02
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DSKALC
Problem: Unnecessary pages marked in the batblocks. Extra entries
present that could have been a single entry.
Diagnosis: Disks that have sectors/page equal to one (HSC, RP20)
allocate BAT entries as if there were 4 sectors per page. Also, a
miscalculation of sector rounding causes a bad page to be entered as
a separate entry even if an entry for the next page exists in the
batblocks. It should alter the existing entry to include the new bad page.
Solution: Use SECPAG based on the UDB to properly add pages to
existing entries, and new entries to the batblocks, in the code around DOPAIR.
[End of TCO 6.1.1360]
TCO-number: 6.1.1363
Written-by: GROSSMAN Creation-date: 6-May-85 15:43:08
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: TTYSRV
Problem: ILMNRFs, potential core trashing when doing sendalls.
Diagnosis: The routine ASGSHT was smashing T2 on error returns. The caller
expected T2 to contain the line number upon return from ASGSHT. It then used
the bad T2 to zero an entry in TTACTL. In this case, we were very lucky,
because we tried to zero write protected memory.
Solution: Make ASGSHT preserve T2 even for error returns.
[End of TCO 6.1.1363]
TCO-number: 6.1.1364
Written-by: GROSSMAN Creation-date: 6-May-85 15:53:54
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKNI
Problem: Systems without KLNI's get messages at startup indicating that the
KLNI is being reloaded. In addition, the reload fails, and a KNIRTO BUGCHK
results.
Diagnosis: When I moved the KLNI initialization from PHYH2 to PHYSIO, I
lost the check for KLNIness. So, no matter what happens to be in RH slot
5, NISRV will pounce on it and do lots of nasty things. This is generally
not a problem if there is nothing in RH slot 5. However, if there is an
RH20 in slot 5, chaos would result.
Solution: Put the check for KLNIness back into NIINI.
[End of TCO 6.1.1364]
TCO-number: 6.1.1365
Written-by: GLINDELL Creation-date: 6-May-85 16:35:32
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCLINK
Problem:
COM* bug* when doing SHOW KNOWN NODES in NCP.
Diagnosis:
The buffer to return 'known nodes' was allocated based on a symbol
that is not guaranteed to be kept up to date. When the latest round
of node name tables came around, node numbers that were higher in value
than the symbol used occurred.
Solution:
Don't try to be smart - always set up a buffer big enough for the highest
possible node number (1023). The memory allocated is only 1 bit per node.
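To see why the fixed-size buffer is cheap: at one bit per node, node numbers 0 through 1023 need 1024 bits, which is 29 words of 36 bits. The Python sketch below works this out and shows a simple bit map. The helper names and the choice of numbering bits from the left of each word are assumptions for illustration only.

```python
WORD_BITS = 36   # PDP-10 word size
MAX_NODE = 1023  # highest possible node number, per the text


def words_needed(max_node=MAX_NODE, word_bits=WORD_BITS):
    """Words required for one bit per node number 0..max_node."""
    bits = max_node + 1
    return -(-bits // word_bits)  # ceiling division


def set_node(bitmap, node):
    """Mark a node as known."""
    word, bit = divmod(node, WORD_BITS)
    bitmap[word] |= 1 << (WORD_BITS - 1 - bit)


def has_node(bitmap, node):
    """Test whether a node is marked as known."""
    word, bit = divmod(node, WORD_BITS)
    return bool(bitmap[word] & (1 << (WORD_BITS - 1 - bit)))
```

A buffer of words_needed() words can record any legal node number, so no symbol has to be kept up to date.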
[End of TCO 6.1.1365]
TCO-number: 6.1.1366
Written-by: TBOYLE Creation-date: 6-May-85 17:03:51
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYPAR PHYP2
Problem: OVRDTA, DXBEWC, Hung Jobs and of course, headaches.
Diagnosis: There is a class of servo errors that seem to wedge
the DX20. On an unrecoverable error (seemingly a servo track
error.) The DX20 remains locked to one drive. A hang occurs
when an attempt is made to transfer to a different drive.
If another transfer to the same drive occurs successfully,
the error is reset. This will usually happen right away because
the monitor will write the errant page to the batblocks so
as to never use the bad page again and the error will
usually be reset. There is a problem window however.
Solution: Include a new bit in the UDB to watch for an overdue
transfer that occurs twice. When it happens, restart the microcode;
this will reset the world. This is ok, because PHYSIO will retry
pending transfers. It will also recover from this error so that
processing may proceed.
In the event that there are other forms of errors that hang the DX20,
this TCO will keep them from killing the system. Since they
all seem to happen with unrecoverable errors, we can be content
because they will be marked bad and not used again.
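One way to read the "overdue twice" rule is as a one-bit watchdog per unit. The Python sketch below follows that reading. The class, the callback, and the choice to clear the bit on a successful transfer are all assumptions of the example, not details taken from PHYP2.

```python
class DriveState:
    """Per-unit state; overdue_seen stands in for the new UDB bit."""

    def __init__(self):
        self.overdue_seen = False


def on_overdue_transfer(drive, restart_microcode):
    """Call when a transfer is overdue; return True if a restart was done."""
    if drive.overdue_seen:
        # Second overdue transfer in a row: reset the controller.
        drive.overdue_seen = False
        restart_microcode()
        return True
    drive.overdue_seen = True
    return False


def on_transfer_done(drive):
    """A successful transfer clears the watchdog bit (assumed behaviour)."""
    drive.overdue_seen = False
```

After a restart the caller is expected to retry pending transfers, as the text says PHYSIO does.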
[End of TCO 6.1.1366]
TCO-number: 6.1.1369
Written-by: TBOYLE Creation-date: 7-May-85 16:32:44
Edited-by: TBOYLE Edit-date: 7-May-85 16:37:38
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYP2
Problem: Bad pages used over and over again on RP20's.
Diagnosis: The error handler is not catching all possible media
and HDA weakened errors. The error handler is mostly biased toward
considering errors that are not clean data errors as device errors.
There are, however, numerous nasty errors such as weak or defects on
servo track, bad formatting, etc. There are also media errors
where the information returned is incomplete because these errors
cause the microcode to have head pains. Since all these errors are
flagged as device errors, they never make it into the BAT blocks.
Solution: Bias the error handler toward media errors. Pick out the
controller, DCU errors, parity errors, etc. and flag them as retriable
device errors. Treat all others as data errors. If they do not succeed
after the many retries, this will ensure that they are put into the
BAT blocks.
[End of TCO 6.1.1369]
TCO-number: 6.1.1370
Written-by: MELOHN Creation-date: 7-May-85 17:47:30
Edited-by: MELOHN Edit-date: 14-May-85 16:55:51
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CTHSRV CTERMD
Related-TCO: 6.1.1390
Problem: SKDPF1/SKDCL1 in LOKWAI+a few.
Diagnosis: Problems with the CTERM interlock scheme between the
CTERM fork, which runs at process level, and TTYSRV TDCalls which want
to mung around in the CDB.
If someone other than the CTERM fork wants the CTERM lock, their
TDB is unlocked and they wait in scheduler test LOKWAI for the CTERM
lock. With the TDB unlocked, Bad Things can happen to the TDB, and if
TDB goes away, LOKWAI will blow up. If the DECnet link goes away first,
LOKWAI returns to LOKCDB, which returns to its caller with the TDB
still unlocked. When the caller tries to unlock the unlocked TDB,
ULKBADs can happen. It is really a bad idea for CTERM to unlock
something it didn't lock in the first place.
Solution: Do NOT try to unlock the TDB while waiting around for the
CDB lock.
[End of TCO 6.1.1370]
TCO-number: 6.1.1371
Written-by: MELOHN Creation-date: 7-May-85 17:55:24
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CTHSRV CTERMD
Problem: CTERM gets wedged such that it will not accept any new
connections and its current connections are hung.
Diagnosis: Another problem with the CTERM interlock scheme between the
CTERM fork, which runs at process level, and TTYSRV TDCalls which want
to mung around in the CDB.
Many error conditions (DECnet link gone, CTERM protocol error, etc)
decide to "blow the link away". As part of this process, the CDB is
deallocated. If however the CTERM line requires service, the CTERM
fork (which is always given the CTERM lock) will hang trying to mung
the deallocated CDB.
Solution: Add a new state for the CDB - .STDEL. Instead of "blowing
the link away", put the CDB in this state and tell the CTERM fork to
dispose of it as part of its service routine.
[End of TCO 6.1.1371]
TCO-number: 6.1.1372
Written-by: GROSSMAN Creation-date: 9-May-85 09:30:03
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKNI
Problem: ILMNRF's, ILLUUO's, and various other assorted problems.
Diagnosis: A bad KLNI microcode resulted in the generation of an error
response which was unknown. The error code was used to index into a
dispatch table. Unfortunately, the entry for the error code had no IFIW,
and subsequently caused PHYKNI (in section 6) to jump somewhere into
section 0, resulting in total chaos.
Solution: Add IFIW's to all the dispatch words in the error dispatch table.
Now, if we get an unknown error code a KNIIEC BUGHLT will result.
[End of TCO 6.1.1372]
TCO-number: 6.1.1373
Written-by: LOMARTIRE Creation-date: 9-May-85 15:28:59
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DSKALC MSTR
Problem:
If a structure which was previously mounted is re-created, MOUNTR will
refuse to mount it due to an ambiguous ID. The ID for the disk is found in
UDBMID and is returned to MOUNTR via an MSTR%.
Diagnosis:
Whenever a homeblock is created, a new media ID is written on it. This ID
is not placed in the UDB (or SDB) and this causes confusion.
Solution:
Make CRTHOM and MAKHOM smarter and have them update UDBMID and SDBPUC when
a new media ID is created and written on the homeblocks.
[End of TCO 6.1.1373]
TCO-number: 6.1.1374
Written-by: LOMARTIRE Creation-date: 9-May-85 15:33:49
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CFSSRV
Related-QAR: 838252
Problem:
CFRECN BUGHLTs as a result of a large configuration.
Diagnosis:
Currently, the delay time set by CFS is not large enough to handle a large
configuration. These sites will experience a CFRECN whenever their port dies
due to the amount of time it takes to reestablish all the required connections.
Solution:
Make the delay a function of the configuration size. Wait 10 seconds to
reload the CI microcode (very generous) and 5 seconds per node on the CI.
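The delay rule stated above is simple arithmetic; a Python sketch with hypothetical constant and function names:

```python
MICROCODE_RELOAD_SECONDS = 10  # allowance for reloading the CI microcode
SECONDS_PER_NODE = 5           # allowance per node on the CI


def reconnect_delay(node_count):
    """Seconds to wait before declaring a reconnect failure."""
    return MICROCODE_RELOAD_SECONDS + SECONDS_PER_NODE * node_count
```

For example, a CI with 4 nodes would be given 10 + 5 * 4 = 30 seconds.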
[End of TCO 6.1.1374]
TCO-number: 6.1.1375
Written-by: LOMARTIRE Creation-date: 9-May-85 16:33:26
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CFSSRV
Problem:
When two systems are coming up simultaneously, they will always have
different times.
Diagnosis:
When both systems are waiting in "Enter date and time:", the system with the
larger serial number is supposed to broadcast its time to the systems on the
CI with a lower serial number. However, the routine is missing an index
register in a key place and so we never broadcast to anyone.
Solution:
Add the index register in routine BRDTIM. Note that a bug still exists if
you proceed the higher numbered system first. In this case, the lower one will
proceed second and it will not broadcast to the higher. Once again a time
mismatch. However, the solution to this one is left until later.
[End of TCO 6.1.1375]
TCO-number: 6.1.1376
Written-by: GROSSMAN Creation-date: 9-May-85 16:58:49
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: STG PHYKNI GLOBS
Problem: Undefined global symbols when KNIN=0. The symbols DLLUNI, LLMRSJ,
LLMRSF, NIJKFK, and LLMINI are all undefined at LINK time when STG is
compiled with KNIN=0.
Diagnosis: Oops. Several misspellings. Now, DLLUNI will return an error
code of UNIFC% (invalid function code) if NISRV is not loaded.
[End of TCO 6.1.1376]
TCO-number: 6.1.1377
Written-by: GRANT Creation-date: 12-May-85 08:34:47
Edited-by: GRANT Edit-date: 12-May-85 08:41:02
Edit-checked: No Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC phyklp
Related-QAR: 838252
Problem: KLPNRL BUGHLTs
Diagnosis: Any number of deadly embraces can occur when there is heavy CI
activity and TOPS-20 tries to reload the CI microcode. These are all
related to the monitor's having to read in IPALOD and run it.
Solution: At system startup, read the CI microcode into resident memory.
Then, whenever the monitor needs to reload the CI, it can simply do the
DATAOs itself.
[End of TCO 6.1.1377]
TCO-number: 6.1.1378
Written-by: GRANT Creation-date: 12-May-85 08:52:10
Edited-by: GRANT Edit-date: 12-May-85 08:53:38
Edit-checked: No Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: phyklp
Problem: None observed, but code is wrong.
Diagnosis: At system startup, code currently does a CONO KLP,400000 and then
does a CONI to verify that the thing in RH slot 7 is really a KLIPA.
If an RH20 were in slot 7, it just got its PIA zapped.
Solution: Verify KLIPAness before shooting it.
[End of TCO 6.1.1378]
TCO-number: 6.1.1379
Written-by: GRANT Creation-date: 12-May-85 09:01:36
Edited-by: GRANT Edit-date: 12-May-85 09:05:46
Edit-checked: No Document: No TCO-tested: Yes
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: phyklp
Problem: Unnecessary BUGCHKs, namely, KLPNMG and KLPNDG.
Diagnosis: PHYKLP outputs a BUGCHK if it is called to remove a buffer from an
empty free queue. This is unnecessary since it gets a buffer from the SCA
pool and returns that to the caller; if this should fail, the caller
will do whatever it feels necessary, including BUGxxxing.
Solution: Put KLPNMG and KLPNDG under CIBUGX control.
[End of TCO 6.1.1379]
TCO-number: 6.1.1380
Written-by: PAETZOLD Creation-date: 12-May-85 13:54:19
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: monitor
Routines-affected: mnetdv STG
Problem:
Normal NIC supplied host table no longer fits.
Diagnosis:
This time the file has grown to a point larger than the code driving the
existing data structures is capable of handling. Most hosts now have
multiple addresses.
Solution:
Make HOSTN twice as large as NHOSTS. Fix code in MNETDV that assumes
all tables are of the same length. Also make the initialization code
that used to be a loop using setzm's use blt's.
[End of TCO 6.1.1380]
TCO-number: 6.1.1381
Written-by: GLINDELL Creation-date: 13-May-85 11:07:00
Edited-by: GLINDELL Edit-date: 13-May-85 11:20:08
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCJSYS JSYSF
Related-QAR: 838154
Problem:
SWJFN% jsys does not work properly with DECnet JFN's.
Programs that do SWJFN% involving DECnet JFN's get spurious IO data
errors (IOX5). Jobs which run such programs will eventually get DCNX5
(No more logical links available) errors upon trying to create such
links.
Diagnosis:
Not all cells of the JFN blocks are swapped. One of the cells it does
not swap is the cell that contains the index of the DECnet-36 channel
represented by a DECnet JFN.
Solution:
Have SWJFN% swap all cells of a JFN block, removing the list of cells
to be swapped in favour of a simple loop swapping all cells. Also call
a routine in SCJSYS to reevaluate the channel numbers for the swapped
JFN's.
Thank you Rob Gingell for analysis and suggested fix.
BTW, SWJFN% can never have worked for any devices that use the new IO
byte pointer stuff (Arpanet?). This will hopefully fix that as well.
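A minimal sketch of why swapping every cell is safer than swapping a listed
subset, written in Python rather than monitor code. The block length, the
position of the channel cell, and all names here are hypothetical; the real
JFN block layout is not given in this TCO.

```python
# Hypothetical model: a JFN block is a fixed-length list of cells.
JFN_BLOCK_LEN = 8   # assumed length, not the real monitor value
CHANNEL_CELL = 5    # assumed index of the DECnet channel cell


def swap_listed(a, b, cells):
    """Old style: swap only the cells named in an explicit list."""
    for i in cells:
        a[i], b[i] = b[i], a[i]


def swap_all(a, b):
    """New style: swap every cell, so no cell can be forgotten."""
    for i in range(JFN_BLOCK_LEN):
        a[i], b[i] = b[i], a[i]


def demo():
    jfn1 = [f"a{i}" for i in range(JFN_BLOCK_LEN)]
    jfn2 = [f"b{i}" for i in range(JFN_BLOCK_LEN)]
    listed = [0, 1, 2, 3]             # a list that omits CHANNEL_CELL
    swap_listed(jfn1, jfn2, listed)
    stale = jfn1[CHANNEL_CELL]        # channel cell was left behind
    swap_listed(jfn1, jfn2, listed)   # undo the partial swap
    swap_all(jfn1, jfn2)
    return stale, jfn1[CHANNEL_CELL]
```

In this model the partial swap leaves the channel cell with its old owner,
while the full loop moves it along with the rest of the block.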
[End of TCO 6.1.1381]
TCO-number: 6.1.1382
Written-by: PALMIERI Creation-date: 13-May-85 11:57:51
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: SCJSYS
Related-QAR: 838222
Problem: Can't get JFN on "SRV:" or "SRV:objectid.taskname".
Diagnosis: Code doesn't parse network JFNs correctly. Also doesn't
allow non-privileged user to open generic TASK.
Solution: Rewrite the SRV parsing code.
[End of TCO 6.1.1382]
TCO-number: 6.1.1384
Written-by: GLINDELL Creation-date: 13-May-85 17:23:42
Edited-by: GLINDELL Edit-date: 13-May-85 17:30:11
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LLINKS
Problem:
No problem, except that we could be a little nicer and not exercise
a RSX DECnet bug.
Diagnosis:
Currently we send a 'flow off' when our local buffer resources are
completely depleted. It is likely that we will drop incoming messages
that were already under way on the floor. This is ok, the other end
should retransmit these. However, unless we are congested we may as
well keep an extra few messages around. The "goal" concept in LLINKS
was supposed to address this, but it was never fully implemented.
Solution:
Implement a static goal, defined by the value of NSGOAL (currently 8).
Unless the system is congested, the following will now happen if messages
come in faster than we can process them:
1) Green zone: we accept messages up to the user's quota and put them on
the user's queues.
2) If more messages come in than fit in the user's quota, then we send
a 'flow off' and enter yellow zone. We continue to accept messages
up till NSGOAL.
3) If we run out of NSGOAL, we enter red zone and drop all incoming messages.
If the system is already globally congested, then we skip yellow zone and
go directly to red zone.
Also, remove all the old 'goal' stuff and put it under feature switch
FTGOL in LLINKS. Leave the data fields in the ELB and SLB in case we want
to change again, but these fields should probably be removed before SDC.
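The green/yellow/red rules above can be sketched as a small decision
function. This is an illustration in Python, not the LLINKS code. It assumes
that NSGOAL counts messages held beyond the user's quota, which the TCO does
not state explicitly, and it does not model whether a 'flow off' is sent when
going straight to red zone. All function and parameter names are hypothetical.

```python
NSGOAL = 8  # value quoted in the TCO


def on_message(queued, quota, over_quota, congested):
    """Classify one incoming message.

    queued     -- messages already on the user's queues
    quota      -- the user's quota
    over_quota -- messages currently held beyond the quota (assumed meaning)
    congested  -- True if the system is globally congested
    Returns (zone, action).
    """
    if queued < quota:
        return "green", "accept"
    if congested or over_quota >= NSGOAL:
        # Congestion skips yellow zone entirely.
        return "red", "drop"
    # The first message past the quota is the one that triggers flow off.
    action = "accept+flow-off" if over_quota == 0 else "accept"
    return "yellow", action
```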
[End of TCO 6.1.1384]
TCO-number: 6.1.1385
Written-by: GLINDELL Creation-date: 13-May-85 17:36:57
Edited-by: Edit-date: 13-May-85 17:37:30
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LLINKS
Problem:
None observed, but we could send ACK's faster.
Diagnosis:
LLINKS used the concept of "buffer-rich" sublinks in order to delay
sending ACK's till clock level, hoping that more messages would come
in and therefore making it possible to ack more than one message.
However, in NSP 4.1, the ACK DELAY concept takes over this function
and removes the need for "buffer-rich" sublinks. Indeed, when we do
get a message that actually requests an ACK, then we should ack immediately
and not defer it.
Solution:
Remove "buffer-rich" code and put it under feature switch FTBFR.
Leave the data structures in in case we want to put it back. However,
the BFR subfield of the ES structure should probably be removed before
SDC ship.
[End of TCO 6.1.1385]
TCO-number: 6.1.1387
Written-by: NICHOLS Creation-date: 14-May-85 11:37:34
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LLINKS
Problem: When LLINKS receives a DECnet Link Service message from
a remote system using No Flow Control, it rejects the message if the
FCVAL field is non-zero. The Phase IV+ architecture may make use
of non-zero values here.
Diagnosis: LLINKS is older than Phase IV+
Solution: Remove the check.
[End of TCO 6.1.1387]
TCO-number: 6.1.1390
Written-by: MELOHN Creation-date: 14-May-85 16:55:51
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CTHSRV
Related-TCO: 6.1.1370
Problem: ULKBADs whenever logging in on CTERM TTYs since TCO 6.1.1370 added.
Diagnosis: I removed the global case of locking and unlocking the
TDB, however a few cases remained which fool around with TLOCKing.
None of these seems to be necessary since CTERM, LAT, and NRT TDBs are
marked as non-deletable and so the TLOCK count is meaningless.
Solution: Remove the rest of the TLOCK/ULKTTYs from CTHSRV.
[End of TCO 6.1.1390]
TCO-number: 6.1.1391
Written-by: PALMIERI Creation-date: 15-May-85 22:07:12
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: ROUTER
Problem: NMLT20 can hang in SHOW CIRCUIT NI-0-0 COUNTERS
Diagnosis: NMLT20 is dismissed while NISRV asks port to return counter data.
If the port dies and cannot be restarted the counters may never be
returned. NISRV will not respond to the read counters request until
the portal is closed.
Solution: Whenever NISRV reports that the port has died close the DECnet
portal which will cause NISRV to return an error to the read counters
request and this will be enough to satisfy the scheduler test and get
NML running again.
[End of TCO 6.1.1391]
TCO-number: 6.1.1392
Written-by: PALMIERI Creation-date: 15-May-85 22:19:32
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: ROUTER
Problem: Protocol initialization to MCB DTE often fails and is never retried.
Diagnosis: First attempt to restart MCB protocol after reload of KL succeeds
and Router is told that the line is okay to use. However, the first
message that Router tries to queue to DTESRV fails because the MCB
has terminated protocol. (Don't know why it does this). Router
then notifies NMLT20 of the termination and closes that circuit
waiting for the next pass through the once a second code to reopen it.
If NMLT20 attempts to restart protocol before the next second DECnet
will not act on a protocol up since the circuit is closed. NMLT20 will
not try again and the circuit will never come up.
Solution: Add TOPS20 specific event 96.7 which will notify NMLT20 that Router
is attempting to use the circuit. NML will then re-initialize MCB
protocol.
[End of TCO 6.1.1392]
TCO-number: 6.1.1393
Written-by: LOMARTIRE Creation-date: 16-May-85 12:06:18
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGEM
Related-QAR: 838336
Problem:
Code is wrong in NWRBS. This could cause DDMP to hang in WTFOD.
Diagnosis:
A SKIPN is done over a TMNN macro which expands to 2 instructions.
Solution:
Add an IFSKP.
[End of TCO 6.1.1393]
TCO-number: 6.1.1394
Written-by: LOMARTIRE Creation-date: 16-May-85 12:16:27
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: GTJFN
Related-QAR: 838295
Problem:
File extension recognition/completion does not work when wildcards are
specified in the filespec.
Diagnosis:
It appears that routine ENDALZ was rewritten to use IFSKPs and friends. In
the process a semi-colon was mistakenly inserted in the mask field of a TXNN
instruction. This causes the instruction not to skip when it should. So instead
of returning ambiguous, it completes the field with a bogus extension.
Solution:
Remove the semi-colon.
[End of TCO 6.1.1394]
TCO-number: 6.1.1395
Written-by: LOMARTIRE Creation-date: 16-May-85 12:29:32
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGUTL
Related-QAR: 838294
Problem:
When an OPENF% is done on a system which has no more OFNs, the user may
receive a garbage error code.
Diagnosis:
When failing, the correct error code is not always setup.
Solution:
Fix routine BGCTYP to always return OPNX10.
[End of TCO 6.1.1395]
TCO-number: 6.1.1396
Written-by: NICHOLS Creation-date: 16-May-85 21:18:15
Edited-by: NICHOLS Edit-date: 16-May-85 21:31:43
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LLINKS
Problem: Hung DECnet link
Diagnosis:
This node sends a message to another node which is using No Flow Control. The
remote node cannot accept the message yet and sends back a Link Service OFF
message. Then this node's application asks for a synchronous disconnect,
which requires that all outstanding messages be ACKed before the Disconnect
Initiate message is sent. LLINKS puts the logical link into Disconnect
Initiate state to prevent the application from sending any more messages. If
the remote node never sends a Link Service ON message, LLINKS will wait forever
to send the Disconnect Initiate message and close the logical link.
Normally such a hung link is detected by sending Link Service messages with
zero data requests and then timing out the ACK. LLINKS was only doing this
"pinging" for links in the RUN state.
Solution:
Check for logical link inactivity in DI state as well as RUN state.
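A minimal sketch of the changed check, in Python. The state names follow the
text; the function, its parameters, and the idle limit are hypothetical and
only illustrate widening the set of states that get probed.

```python
OLD_PROBE_STATES = {"RUN"}        # before this TCO
NEW_PROBE_STATES = {"RUN", "DI"}  # after this TCO


def should_probe(state, idle_seconds, idle_limit, probe_states):
    """True if a link in this state has been idle long enough to be probed."""
    return state in probe_states and idle_seconds >= idle_limit
```

With the old set, a link stuck in DI state is never probed, so the missing
ACK is never noticed and the link cannot be timed out.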
[End of TCO 6.1.1396]
TCO-number: 6.1.1397
Written-by: MCCOLLUM Creation-date: 17-May-85 15:08:44
Edited-by: MCCOLLUM Edit-date: 17-May-85 15:09:17
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: FORK
Related-QAR: 838213 838214
Problem:
CFORK does not clean up on certain resource exhausted errors.
Diagnosis:
If CFORK gets a resource exhaustion error after it has assigned a system
and job wide fork handle, it returns to the user without killing the
newly created fork.
Solution:
Kill the newly created fork when resource exhaustion errors are
encountered.
[End of TCO 6.1.1397]
TCO-number: 6.1.1398
Written-by: MELOHN Creation-date: 17-May-85 18:03:16
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: TTYSRV
Problem: LAT lines are considered "local".
Diagnosis: No LAT line type was set up; consensus was that they should be
grouped with remote lines, not local lines.
Solution: make them remote lines; change remote line message from
?LOGGING IN ON DATASETS IS NOT ALLOWED to
?LOGGING IN ON REMOTE TERMINALS IS NOT ALLOWED.
[End of TCO 6.1.1398]
TCO-number: 6.1.1399
Written-by: MOSER Creation-date: 20-May-85 13:37:42
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGUTL
Problem: OFNBDB BUGHLTs when running CFS with the OFN performance mods.
Diagnosis: This is a long file/CFS bug. The problem arises because the
OFN database SPTO4 disagrees with the data provided by a caller of ASNOFN.
Examination of the dump reveals that the caller expects to assign an ofn for
second level index block 0 in a long file. The system data reflects the fact
that the OFN currently exists and is a regular index block in a long file. When
a file goes from short to long the OFN database is updated and so this is
unexpected.
Under CFS the problem can arise when:
- Users G and C running on systems G and C respectively each open the same
file when it is short.
- User G extends the file so it goes long (correctly updating G's local data)
- User P running on C now opens the same file again (it is long since G
extended it but C's database does not know)!
Solution: Expect this to happen and update SPTO4 when it does instead of
crashing. OFNBDB will remain for other real errors.
[End of TCO 6.1.1399]
TCO-number: 6.1.1401
Written-by: MCCOLLUM Creation-date: 20-May-85 14:23:38
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSA
Related-QAR: 838367
Problem:
The BREAKI BUGINF provides the structure index of the structure under
"attack". Since the index is a dynamically assigned value, after-the-fact
analysis of this BUGINF can be difficult.
Diagnosis:
Same.
Solution:
Change the value provided in the additional data to the sixbit structure
name.
[End of TCO 6.1.1401]
TCO-number: 6.1.1402
Written-by: MCCOLLUM Creation-date: 20-May-85 14:47:38
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MONSYM
Related-QAR: 838366
Problem:
Customers need reserved error codes in MONSYM to support local additions
to TOPS-20.
Diagnosis:
Solution:
Reserve a block of 1000 (octal) error codes from 6000 to 6777 for
customer use.
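As a quick check of the arithmetic, a short Python sketch. The TCO does not
say whether 6000 to 6777 are offsets from an error base or complete codes;
the sketch treats them as plain octal values, and the names are hypothetical.

```python
CUSTOMER_FIRST = 0o6000
CUSTOMER_LAST = 0o6777

# 6000 through 6777 octal inclusive is 1000 octal codes, i.e. 512 decimal.
RESERVED_COUNT = CUSTOMER_LAST - CUSTOMER_FIRST + 1


def is_customer_code(code):
    """True if code falls in the block reserved for customer use."""
    return CUSTOMER_FIRST <= code <= CUSTOMER_LAST
```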
[End of TCO 6.1.1402]
TCO-number: 6.1.1403
Written-by: MELOHN Creation-date: 23-May-85 15:13:48
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: SET TER NO PAU COMMAND resulted in the LAT server clearing
input flow control but not output flow control. Likewise setting TER
PAU COMMAND did not reset output flow control.
Diagnosis: The XON/XOFF characters in the DATA_B slot were shifted
incorrectly in the slot data information.
Solution: Flush the shifting in favor of canned message which
supplies the correct input and output flow control characters.
[End of TCO 6.1.1403]
TCO-number: 6.1.1405
Written-by: MCCOLLUM Creation-date: 24-May-85 12:13:46
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: scsjsy
Problem:
SCSFR1 BUGCHKs.
Diagnosis:
When SCSKIL is called to close the CI connections of a fork, an SCSFR1
BUGCHK will result if the fork number provided is not the current fork.
Currently this can only occur when a CLZFF% JSYS is performed given as
an argument a fork handle other than the current fork.
Solution:
If the fork number passed to SCSKIL does not match the current fork,
let the caller get away with it, but don't do any actual work. Note
that this is not the proper solution, but will do for now. The correct
solution would be to make SCSKIL close the CI connections of any fork
that is an inferior of the current fork.
[End of TCO 6.1.1405]
TCO-number: 6.1.1406
Written-by: MCCOLLUM Creation-date: 24-May-85 13:36:03
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: ALL MODULES AND UTILITIES
Problem:
The copyrights need updating.
Diagnosis:
Solution:
Do it.
[End of TCO 6.1.1406]
TCO-number: 6.1.1407
Written-by: PALMIERI Creation-date: 24-May-85 15:03:50
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: ROUTER
Related-QAR: 838359
Problem: Partial routing update loss event does not show correct highest
address.
Diagnosis: No code to search for highest address so garbage is displayed.
Also the rest of message causing the update loss is not being
processed and some routing information may be lost.
Solution: Process all of message and remember highest address.
[End of TCO 6.1.1407]
TCO-number: 6.1.1408
Written-by: PALMIERI Creation-date: 24-May-85 15:10:46
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: DNADLL
Problem: DECnet is in a funny state if it tries to use the Ethernet when the
physical address is not DECnet's.
Diagnosis: Code to check for DECnet address is commented out.
Solution: Enable the checking and add robustness to it.
[End of TCO 6.1.1408]
TCO-number: 6.1.1409
Written-by: GRANT Creation-date: 28-May-85 09:29:50
Edited-by: GRANT Edit-date: 28-May-85 09:30:45
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: STG
Related-QAR: 838288
Problem: PHYICE and MSCTMU BUGCHKs.
Diagnosis: Not enough section 0/1 resident free space in the Units Pool.
Solution: Add another page.
[End of TCO 6.1.1409]
TCO-number: 6.1.1410
Written-by: NICHOLS Creation-date: 28-May-85 15:21:45
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: llinks
Problem: DECnet links hang if device drivers lose messages
Diagnosis: LLINKS waits for all output message blocks to be returned from
the lower layers before it will close a link. It now appears that there
are cases in which the KLIPAs can crash in such a way that the drivers
cannot return all the output messages reliably.
Solution: Allow links to close even if the out-in-router count is non-zero
and check all output done messages to see that the corresponding link block
is still active. For debugging purposes, a new switch FTORC can be set
non-zero to make links wait for output completion.
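A sketch of the two halves of this change, in Python: closing without waiting
for outstanding output, and ignoring output-done notifications for links that
no longer exist. The class, its fields, and the `wait_for_output` parameter
(standing in for FTORC) are hypothetical.

```python
class Links:
    def __init__(self):
        self.active = {}  # link id -> count of messages still in lower layers

    def open(self, link_id):
        self.active[link_id] = 0

    def send(self, link_id):
        self.active[link_id] += 1

    def close(self, link_id, wait_for_output=False):
        """Close a link. With wait_for_output set (the debugging case),
        refuse to close while output is still outstanding."""
        if wait_for_output and self.active[link_id] > 0:
            return False
        del self.active[link_id]
        return True

    def output_done(self, link_id):
        """Handle an output-done message from a lower layer."""
        if link_id not in self.active:
            return False  # link already closed: ignore the late completion
        self.active[link_id] -= 1
        return True
```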
[End of TCO 6.1.1410]
TCO-number: 6.1.1411
Written-by: MOSER Creation-date: 28-May-85 15:57:47
Edited-by: MOSER Edit-date: 28-May-85 17:18:33
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: FORK STG GLOBS
Problem: FLKTIM does not actually time out the fork lock. It simply
reports the problem without unlocking. This is desirable when
debugging but is a problem for customers.
Diagnosis: We recode this every release and it is a BIG DRAG.
Solution: Fix this ONCE AND FOR ALL. The following rules now apply:
- DEBUG monitor - never unlock
- NON-DEBUG monitor and DBUGSW <> 0 - Never unlock
- otherwise (no debugging of any kind) unlock when FLKTIM occurs.
It may be desirable to change the timeout value from the current 2 minutes.
Make this a parameter, FLKTMV, in STG and set it to 2 minutes as the default.
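The three rules reduce to one predicate. A minimal Python sketch follows.
The function and argument names are hypothetical, and the unit in which
FLKTMV is actually stored in STG is not given in this TCO.

```python
FLKTMV_SECONDS = 2 * 60  # default of 2 minutes; storage unit is an assumption


def unlock_on_flktim(debug_monitor, dbugsw):
    """Decide whether FLKTIM should break the fork lock.

    debug_monitor -- True if this is a DEBUG monitor build
    dbugsw        -- value of DBUGSW
    """
    if debug_monitor:
        return False   # DEBUG monitor: never unlock
    if dbugsw != 0:
        return False   # non-DEBUG monitor with DBUGSW set: never unlock
    return True        # no debugging of any kind: unlock
```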
[End of TCO 6.1.1411]
TCO-number: 6.1.1413
Written-by: GROSSMAN Creation-date: 28-May-85 16:57:30
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKNI
Problem: NISRV's Read Portal Info function (NU.RPI) does not return list of
multicast addresses correctly for portals that have more than one multicast
enabled.
Diagnosis: Too complicated to explain, and not worth listening to.
Solution: An ADDI and a JUMPE in the correct places in NIRPI.
[End of TCO 6.1.1413]
TCO-number: 6.1.1414
Written-by: GROSSMAN Creation-date: 28-May-85 17:04:16
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NISRV
Problem: Return Portal Info (NU.RPI) function of NISRV did not check
portal ID's correctly. It also was not returning the User ID.
Diagnosis: Oops. Forgot to set the FC.POR bit in the NISRV function dispatch.
[End of TCO 6.1.1414]
TCO-number: 6.1.1415
Written-by: GROSSMAN Creation-date: 28-May-85 17:31:42
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: NIUSR
Problem: The NI% JSYS didn't deal with global portal ID's correctly.
Also, the function NU.RPI did not return multicast address list in the
correct format.
Diagnosis: Designer brain damage. Monitor portal ID's are fullwords, and user
portal ID's are halfwords. I had to figure out some way to identify monitor
portals to users. Just returning the address was not good enough.
Solution: Invent an "external" portal ID for monitor portals. This ID is
created by NISRV, and is guaranteed to be unique. This id is also guaranteed
to fit into 18 bits. This ID is what the NI% JSYS will deal with when talking
about monitor based portals.
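One way such an ID could be produced is a counter plus a lookup table, shown
here in Python. This is only an illustration: the TCO does not say how NISRV
generates the ID, and the class and method names are hypothetical.

```python
MAX_EXTERNAL_ID = (1 << 18) - 1  # largest value that fits in a halfword


class PortalIdTable:
    """Maps small external IDs to fullword monitor portal IDs."""

    def __init__(self):
        self._by_external = {}
        self._next = 1

    def assign(self, monitor_portal_id):
        """Hand out a fresh external ID for a monitor portal."""
        if self._next > MAX_EXTERNAL_ID:
            raise RuntimeError("out of external portal IDs")
        external = self._next
        self._next += 1
        self._by_external[external] = monitor_portal_id
        return external

    def lookup(self, external):
        """Return the monitor portal ID, or None if the ID is unknown."""
        return self._by_external.get(external)
```

A counter never repeats a value, so uniqueness and the 18-bit limit are both
easy to check in this model.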
[End of TCO 6.1.1415]
TCO-number: 6.1.1416
Written-by: PALMIERI Creation-date: 29-May-85 15:07:04
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: SCJSYS
Related-QAR: 838141
Problem: If a user issues a SOUT/SOUTR before the connect confirm is received
on a DCN link SCJSYS calls SCLINK without waiting for the connect
confirm. Possible race condition if SCJSYS blocks in IMPWAT.
Diagnosis: No code to wait for connect confirm. IMPWAT does not revalidate
JFN before returning to caller.
Solution: Add routine WCCFRM to await the confirm before calling SCLINK with
the user's buffer. Have IMPWAT call SCLFNU to revalidate the JFN
before returning to caller.
[End of TCO 6.1.1416]
TCO-number: 6.1.1417
Written-by: GROSSMAN Creation-date: 29-May-85 15:34:09
Edited-by: GROSSMAN Edit-date: 30-May-85 14:25:39
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: APRSRV
Related-QAR: 838362
Problem: Names of BUGHLTs are messed up in ERROR.SYS if the BUGHLT occurs in
XCDSEC.
Diagnosis: The routine BUGH0 (BUGHLT processor) was using a HRRZ to fetch
the PC of the BUGHLT when passing it on to the SYSERR block generator.
Solution: Load the full PC by using the mask EXPCBT when calling BUGSTO.
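The effect of HRRZ on an extended PC can be shown with plain integer masks
in Python. HRRZ keeps only the right half (18 bits) of a 36-bit word. The
30-bit mask below is an assumed stand-in for EXPCBT, whose actual value is
not given here, and the section number in the example is made up.

```python
RIGHT_HALF = 0o777777         # the 18 bits HRRZ keeps
ADDRESS_MASK = (1 << 30) - 1  # assumed stand-in for EXPCBT


def hrrz(word):
    """Right half only: the section number is lost."""
    return word & RIGHT_HALF


def full_pc(word):
    """Section and in-section address together."""
    return word & ADDRESS_MASK


# Hypothetical PC: section 6, offset 123456 (octal).
pc = (0o6 << 18) | 0o123456
```

In this model `hrrz(pc)` gives 123456 octal, which looks like a section 0
address, while `full_pc(pc)` preserves the section.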
[End of TCO 6.1.1417]
TCO-number: 6.1.1418
Written-by: GLINDELL Creation-date: 30-May-85 16:27:04
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: ntman
Problem:
Event block is not deallocated if user requested a signal and the
signal queue was full (very unlikely).
Diagnosis:
Solution:
Call DNFWDS to deallocate event block if signal queue is full.
Thank you Bill Davenport.
[End of TCO 6.1.1418]
TCO-number: 6.1.1419
Written-by: MOSER Creation-date: 31-May-85 15:59:30
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGEM
Problem: SKDPF1 when extended addressing (mapping sections indirect) and
working set preloading.
Diagnosis: Similar problem as TCO 6.2000. The monitor looks at entries in
the working set cache but may page fault on another fork's PSB. What is
desired is to remove that page from the WSC. We crash trying to look at the
PSB that is swapped out.
Solution: Change PRELW1 to call FPTAXP and expect to get -1,,SPT indicating
SPX is not in core (or would cause a PF). If this is the case then delete the
page from the working set cache.
[End of TCO 6.1.1419]
TCO-number: 6.1.1420
Written-by: TBOYLE Creation-date: 3-Jun-85 11:31:21
Edited-by: TBOYLE Edit-date: 3-Jun-85 11:49:34
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DIAG
Problem:
ILULK4 crashes when killing the diagnostic monitor D20MON
under certain circumstances.
Diagnosis:
When ports CI and NI are set unavailable, they are both
in maintenance mode. This situation confuses the DGFKIL routines
which heretofore did not expect such a situation. The routines
DGEXRL and DGUNLK don't check to see that the lock word is
no longer in use on the second call to release resources.
Further confusion resulted when making pointers out of -1.
Solution:
Add an additional paranoia check to the routines. Have them
check to see if the DIAG lock is no longer in use.
[End of TCO 6.1.1420]
TCO-number: 6.1.1421
Written-by: TBOYLE Creation-date: 3-Jun-85 11:50:24
Edited-by: TBOYLE Edit-date: 3-Jun-85 12:00:18
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGEM MONSYM
Problem:
Monitor hangs.
Diagnosis:
This TCO replaces TCO 6.1.1297 which was supposed to fix PTNON0
BUGHLTs. This new scheme was necessary because CFSWUP likes to go
OKSKED at times to get its work done when NOSKED is in its way. The
callers of CFSWUP are usually never aware of this behaviour.
Solution:
Work around the behaviour of CFSWUP. Remove the extra NOSKED
and OKSKED in MSETPT. Have SETPT0 inform its caller if the page-slot
set failed. Have caller in MSETPT retry if this happens. Teach
the other callers to SETPT0 to crash as they did before with PTNON0
when the page-slot set fails.
[End of TCO 6.1.1421]
TCO-number: 6.1.1423
Written-by: WAGNER Creation-date: 4-Jun-85 11:30:35
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: SCHED phymvr STG globs
Problem: MSCP server gets higher priority than it should. Party line says that
served disks are there for customer convenience, hence we should not
give them the priority that they get.
Diagnosis: We call MSSCHK to check for server requests within 20mS of the
Server requesting to be scheduled. Currently the server requests
this by AOSing SRVSKD, which the scheduler notices at the next
short cycle, and therefore will always check the server within
20 mS, removing the currently running fork more often than not.
Adding insult to injury, the server already has a mechanism to
be called when needed, but at the 100mS long cycle through CLK2.
This is done by SETZMing MSSTIM, which is noticed in the long
cycle, and MSSCHK is called at that time too!
We only need to call MSSCHK every long cycle, there is no need
to forcibly dismiss the currently running process within 20mS
because of a server request.
Solution: Get rid of the SRVSKD flag, replacing AOSes of it with SETZMs of
MSSTIM. This is done in a new routine MSSCZK (MSS Czech...).
This will make it check server requests only when other running have
been forcibly dismissed anyway.
Note: This has the pleasant side effect of reducing the overhead that
running DUMPER from another system to our served disks has on other
jobs on the poor server system. If the server system is standalone
no effect in DUMPER performance is seen.
[End of TCO 6.1.1423]
TCO-number: 6.1.1424
Written-by: GROSSMAN Creation-date: 5-Jun-85 10:23:00
Edited-by: GROSSMAN Edit-date: 5-Jun-85 10:44:20
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKNI NIPAR
Related-QAR: 838380
Problem: Not enough information is returned when the NIA20 gets a planned CRAM
parity error.
Diagnosis: The KNIPPE BUGCHK was used for all planned CRAM parity errors. This
required that someone go look up the code returned by KNIPPE in order to fix the
problem.
Solution: Create a separate BUGINF for each possible planned CRAM parity
error. Make the short text and the long text be fairly explicit about what's
going on, and also return the info that CSSE wants to see.
[End of TCO 6.1.1424]
TCO-number: 6.1.1425
Written-by: GROSSMAN Creation-date: 5-Jun-85 10:38:14
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: phykni
Problem: KNIIPF (Illegal PHYSIO Function) should be a BUGHLT. There is not
enough context saved by BUGCHKs to figure out the problem.
Also, start removing useless TOPS-10 conditionals from PHYKNI.
[End of TCO 6.1.1425]
TCO-number: 6.1.1426
Written-by: LOMARTIRE Creation-date: 5-Jun-85 12:52:18
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DISC
Related-QAR: 838337
Problem:
Job can hang in CFSRWT when doing a RENAME. This happens when renaming the
dump file which is being examined by FILDDT.
Diagnosis:
OPNLNG calls ASNOFN without setting up the ACs it expects. This causes
ALOC1 to be screwed up which later confuses CFS.
Solution:
Call GASOG before calling ASNOFN at OPNLNG.
[End of TCO 6.1.1426]
TCO-number: 6.1.1428
Written-by: MELOHN Creation-date: 5-Jun-85 15:06:38
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSA
Related-QAR: 838321
Problem: Can't build monitor with LAHFLG=0
Diagnosis: SMON code to set LAT-STATE was not under LAHFLG conditional.
Solution: Put it under said conditional
[End of TCO 6.1.1428]
TCO-number: 6.1.1429
Written-by: MELOHN Creation-date: 5-Jun-85 16:22:09
Edited-by: MELOHN Edit-date: 5-Jun-85 16:24:47
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: TTYSRV CTHSRV CTERMD TTYDEF GLOBS
Related-QAR: 838331 838376 838309 838441
Problem: Wrapping lines, ^U, ^R, all work inconsistently with
CTERM. Programs that do multi-line TEXTI%, (like MS) lose track of the
current character position, and produce text that is garbled garbage.
Diagnosis: Both TOPS-20 and CTERM have two basic modes of doing
output to TTYs, binary mode and ascii mode. TOPS-20 uses several
different means of doing combinations of binary mode and ascii mode to
the same TTY. (example, the BLANK command in the EXEC, which puts out
the escape sequence to clear the screen in binary and then prints the
EXEC prompt in ascii). TOPS-20 also maintains the wrap count on the
line on the basis of how many ascii (non-binary) characters have been
output to the terminal. It is therefore critical to output characters
in the server in the same mode in which they were generated on the
host.
Originally CTHSRV set the mode of the message based on the value of
TT%DAM in the JFN mode word at the time when the output was to be sent
to the server. This didn't work because it is possible to output
binary characters to the TTY without changing the JFN mode word by
opening a JFN on the TTY with bytesize of 8. It turns out also that
the CTERM fork which moves the output from the TTY line buffers to the
CTERM message sent to the server runs asynchronously from the user
process which sets and clears the JFN mode word.
Edit 842 to CTHSRV recognized that the above couldn't work, and
therefore always put the message in binary mode. This worked for
normal output in most cases. Unfortunately, the binary output did not
correctly update the current line and character position on the server
terminal, with the result that wrapping the line occurred at seemingly
random times, and control-R, control-U, and DELETE across line
boundaries produced incorrect results. These include the symptoms
described in QARs 838331, 838376, 838309, and 838441. In all of these
cases the line and character counts were only being updated when the
server echoed characters locally (since they were echoed in non-binary
mode). The line and character position was unaffected by the output
from the remote system, and therefore remained wherever the last
character read and echoed from the terminal left off.
Solution: The only way to make the remote server do the right thing
is send it binary output in binary (transparent) mode, and non-binary
output in ascii mode. Since there are several ways to switch back and
forth between binary and non-binary mode to the TTY, the only
practical place to tell when we are in binary mode and when we are not
is in TCO, the first level output routine in TTYSRV. I propose to put
markers in the output stream for cterm terminals only that signal when
the output mode is switching between ascii and binary, and between
binary and ascii mode. CTHSRV, when it copies characters from TTY
line buffers to CTERM messages will look for these markers, set the
transparent bit in the CTERM message as appropriate, and segment the
message such that it contains only binary or only ascii mode data.
To implement this I have added TTOASC and TTOBIN markers to the output
markers in TTYDEF; TT%BIN to the TDB which tells whether the terminal
is in binary mode or not, and CH%BIN to the CDB which tells whether
the last message sent to the server was in binary mode or not. This
last flag is necessary since the output markers only signal the
transition between modes, so the mode must remain sticky between
different cterm output messages.
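
The segmentation described above can be sketched roughly as follows. This
is an illustration in Python, not the monitor code: the marker values, the
function name segment_output, and the list-based stream are assumptions
made for the example. The last_was_binary argument plays the role that the
text assigns to CH%BIN.

```python
# Hypothetical marker values; the real TTOASC/TTOBIN encodings are in TTYDEF.
TTOASC = "<TTOASC>"
TTOBIN = "<TTOBIN>"

def segment_output(stream, last_was_binary=False):
    """Split a stream of byte values and markers into (is_binary, data) messages.

    last_was_binary stands in for CH%BIN: the mode of the previous message,
    needed because markers only signal transitions between modes.
    Returns the list of messages and the mode to carry into the next call.
    """
    messages = []
    binary = last_was_binary
    current = bytearray()
    for item in stream:
        if item in (TTOASC, TTOBIN):
            new_mode = item == TTOBIN
            # Close the current message only when the mode really changes.
            if new_mode != binary and current:
                messages.append((binary, bytes(current)))
                current = bytearray()
            binary = new_mode
        else:
            current.append(item)
    if current:
        messages.append((binary, bytes(current)))
    return messages, binary
```

Each returned message holds data of a single mode, so a transparent bit
can be set per message; the returned mode is what must stay sticky between
calls.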
[End of TCO 6.1.1429]
TCO-number: 6.1.1431
Written-by: MELOHN Creation-date: 5-Jun-85 17:03:31
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: If you change the server name of a LAT server to a name
shorter than the original name, you get an extra character from the
old name at the end of the output of the NTINF% jsys.
Diagnosis: Routine MMVAZO doesn't work with non-ASCIZ strings.
Solution: Make it work right with strings not terminated with a null
byte. Fix UMVAZO to do the same.
[End of TCO 6.1.1431]
TCO-number: 6.1.1432
Written-by: PALMIERI Creation-date: 5-Jun-85 17:39:34
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: DNADLL
Problem: DECnet close of Ethernet portal causes DNDNCE bugchks.
Diagnosis: DNADLL gives NISRV a portal ID of zero. This happens because
the UN block that was used by DNADLL to open the portal is not the
one it uses to close the portal. DNADLL expects them to be the same.
Routine CHKADR was switching UN blocks in the process of opening a
portal.
Solution: Have CHKADR use the same UN block that the open portal routines
are using.
[End of TCO 6.1.1432]
TCO-number: 6.1.1433
Written-by: PALMIERI Creation-date: 5-Jun-85 21:02:18
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: GTJFN
Problem: Can't use wild cards in filename in parse only network JFNs.
Diagnosis: Code does not allow wildcards in filename.
Solution: Remove restriction for parse only network JFN.
[End of TCO 6.1.1433]
TCO-number: 6.1.1434
Written-by: PALMIERI Creation-date: 6-Jun-85 10:44:26
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: DNADLL
Problem: If received data on the Ethernet exceeds the size of the allocated
buffer a "frame too long" event is generated. This event does not
include the source and destination addresses of the oversize message.
Diagnosis: No code to supply the addresses.
Solution: Add necessary code.
[End of TCO 6.1.1434]
TCO-number: 6.1.1435
Written-by: PALMIERI Creation-date: 6-Jun-85 10:56:13
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: D36COM ROUTER JNTMAN
Problem: Big buffers on the Ethernet not as big as they could be.
Diagnosis: Too many bytes reserved for Routing and who knows what overhead.
Solution: Make an accurate computation of overhead and subtract it from
the Ethernet maximum of 1504 not 1498.
[End of TCO 6.1.1435]
TCO-number: 6.1.1436
Written-by: MOSER Creation-date: 6-Jun-85 20:53:14
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: APRSRV
Problem: Wrong Bughlt info. Bogus dumps. TRAPPC points to BUGH5+5.
Diagnosis: Monitor does a LOAD FKJSB but FX is garbage.
Solution: Have a good FX.
[End of TCO 6.1.1436]
TCO-number: 6.1.1437
Written-by: PAETZOLD Creation-date: 10-Jun-85 10:04:29
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: monitor
Routines-affected: STG
Problem:
nimaxh too low. customers are running bigger ethernets than we thought.
Diagnosis:
Solution:
increase ght size from 16. to 128.
[End of TCO 6.1.1437]
TCO-number: 6.1.1438
Written-by: LOMARTIRE Creation-date: 10-Jun-85 11:12:14
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGUTL
Problem:
Bad code at BGCTYP.
Diagnosis:
The routine is doing a GTAD% instead of calling LGTAD.
Solution:
Change it to call LGTAD to get the internal date.
[End of TCO 6.1.1438]
TCO-number: 6.1.1439
Written-by: PALMIERI Creation-date: 10-Jun-85 11:18:15
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: SCJSYS
Related-QAR: 838464
Problem: User gets message "No response from destination process" when ACJ
on the local system refuses network access.
Diagnosis: Wrong error code returned by DCN OPENF routine if ACJ refuses
network access.
Solution: Return correct error code in DCNOPN. This error was caused by
one of two incorrect error codes in the error conversion table.
Fix both of them.
[End of TCO 6.1.1439]
TCO-number: 6.1.1440
Written-by: PALMIERI Creation-date: 11-Jun-85 13:44:36
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: D36PAR D36COM
Related-QAR: 838296
Problem: Page faults while in the scheduler. Can be manifested by OKSKBG
BUGHLTs.
Diagnosis: DECnet-36 often backs up byte pointers with an ADJBP. This
can create byte pointers of the form 0410xx,,-1. If xx contains
the first address in a section an effective address will be
computed that is the last address in the previous section.
If this section is not mapped...the monitor will die somewhere.
Solution: In routines that adjust the byte pointer check to see if the pointer
is of the form 0410xx,,-1. If so change the byte pointer to
4410xx,,0.
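
A minimal sketch of the pointer rewrite, assuming a 36-bit word held in a
Python integer with the left half in the upper 18 bits. The function name
fix_backed_up_pointer is hypothetical; only the two pointer forms come
from the text above.

```python
MASK18 = 0o777777  # one 18-bit half word

def fix_backed_up_pointer(word):
    """Rewrite 0410xx,,-1 as 4410xx,,0; return any other word unchanged."""
    left = (word >> 18) & MASK18
    right = word & MASK18
    if (left & 0o777700) == 0o041000 and right == MASK18:
        xx = left & 0o77               # the "xx" digits are preserved as-is
        return (0o441000 | xx) << 18   # right half becomes 0
    return word
```

For example, fix_backed_up_pointer((0o041005 << 18) | 0o777777) returns
0o441005 << 18, while a pointer whose right half is not -1 is left alone.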
[End of TCO 6.1.1440]
TCO-number: 6.1.1442
Written-by: MCCOLLUM Creation-date: 11-Jun-85 15:21:47
Edited-by: MCCOLLUM Edit-date: 11-Jun-85 15:28:59
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: GTJFN
Related-QAR: 838346 838418 838431
Problem:
RELBAD BUGHLTs.
Diagnosis:
When parsing a field of a file spec in GTJFN, the right half of FILTMP(JFN)
is used to store the address of the block of free space that contains the
text of the field currently being parsed. FILOPT(JFN) is used to hold a
byte pointer to the end of this string. If the field being parsed is after
the device field (e.g. the directory, file name or extension fields) and
the device field is defaulted to DSK*: or a logical name defined as DSK*:,
STRDVD is called to translate DSK*: to the name of the first structure in
STRTAB. In all cases this is the public structure. STRDVD allocates a new
block of free space if the structure name does not fit into the block of
free space which currently holds the string "DSK*". It then updates the
pointer in FILOPT(JFN). This is incorrect. FILOPT(JFN) should only be
updated if the field currently being parsed is the device field. The result
is that ENDTMP is eventually called to trim the block of free space pointed
to by the right half of FILTMP(JFN). It assumes FILOPT(JFN) contains the
pointer to the end of the string stored in this block, but it no longer
does and a RELBAD BUGHLT results.
Solution:
There is an alternate entry to STRDVD (STRDEV) that is used when GTJFN is
currently parsing the device name field. Remember which entry point was
used and only update FILOPT(JFN) if the device field is currently being
parsed.
[End of TCO 6.1.1442]
TCO-number: 6.1.1443
Written-by: PAETZOLD Creation-date: 12-Jun-85 08:31:27
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: monitor
Routines-affected: FREE
Problem:
ILMNRF from job zero when shrinking a resident free space pool.
Diagnosis:
a bad byte pointer was being constructed due to an AC being trashed.
Solution:
restore the ac after destroying it. this fix thanks to bill schilitt
and debs.
[End of TCO 6.1.1443]
TCO-number: 6.1.1445
Written-by: MELOHN Creation-date: 12-Jun-85 14:58:07
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: LATIST BUGINFs
Diagnosis: LAT Slot demultiplexor routine (LSDMUX) did not handle the
case where multiple start slots were received in the same message. It
also didn't always correctly adjust the byte pointer when an invalid
or unexpected slot was received.
Solution: Fix LSDMUX to parse slots on the basis of the slot size,
rather than assuming (sometimes incorrectly) how much data is left in
the slot and adjusting the byte pointer by that amount.
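
The idea of advancing by the declared slot size can be sketched as below.
The slot layout used here (one type byte, one length byte, then the data)
and the type codes are simplifications invented for the example; they are
not the actual LAT slot format, and demux_slots is not LSDMUX.

```python
KNOWN_TYPES = {1, 2}  # hypothetical codes: 1 = start slot, 2 = data slot

def demux_slots(message):
    """Return (slot_type, data) for each recognized slot in the message."""
    slots = []
    pos = 0
    while pos + 2 <= len(message):
        slot_type = message[pos]
        size = message[pos + 1]
        end = pos + 2 + size
        if end > len(message):
            break                      # truncated slot: stop parsing
        if slot_type in KNOWN_TYPES:
            slots.append((slot_type, bytes(message[pos + 2:end])))
        # Unknown slots are dropped, but the position still advances by the
        # declared size, so the slots that follow stay aligned.
        pos = end
    return slots
```

Because the position is always derived from the slot's own size, two start
slots in one message, or an unexpected slot in the middle, do not disturb
the parsing of later slots.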
[End of TCO 6.1.1445]
TCO-number: 6.1.1446
Written-by: GRANT Creation-date: 12-Jun-85 17:24:02
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKLP
Problem: Monitor doesn't handle another system running CI diagnostics very
well.
Diagnosis: When another system ACKs REQUEST-IDs but doesn't return IDRECs,
we continue to send STARTs.
Solution: Notice that REQUEST-IDs are no longer being answered and return
the state of the v.c. to closed.
[End of TCO 6.1.1446]
TCO-number: 6.1.1447
Written-by: MOSER Creation-date: 13-Jun-85 11:34:57
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGEM
Problem: Many engineers waste lots of time looking at bogus OKSKBG dumps.
Diagnosis: The real problem is a PF in the scheduler, SKDPF1, but the
code only detects this when CKSPFL is turned on. The PF handler
goes NOSKED using the macro but goes OKSKED by doing instructions. If
INSKED then NSKED gets erroneously decremented.
Solution: Always check for SKDPF1 (this takes 1 instruction). Always
use NOSKED and OKSKED macros.
[End of TCO 6.1.1447]
TCO-number: 6.1.1448
Written-by: MOSER Creation-date: 13-Jun-85 11:45:35
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: FILMSC
Problem: ILMNRF.
Diagnosis: User does a BIN using .TTDES+line number. If the terminal
switches and the code rechecks then it can crash as it expects JFN
to contain an index into the JFN blocks not 400nnn.
Solution: Expect this and do the "right" thing. Note that in the dumps I examined
the "right" thing is probably not what the user wants because the user
has bad code but it is the logically correct action.
[End of TCO 6.1.1448]
TCO-number: 6.1.1449
Written-by: WAGNER Creation-date: 13-Jun-85 14:51:29
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: cthsrv STG
Related-QAR: 838315 838455
Problem: Doing CTERM host output uses 50-70% of CPU in scheduler, and 20-25%
gets charged to user doing the output. This is independent of baud
rate, and decreases only slightly on heavily loaded systems.
The result is that one user doing a large, non-blocking EXEC type
command (or a large SOUT% or its friends) can effectively hog an
entire system.
Diagnosis: The CTERM output processing code does an MDISMS on a flag, CTMATN,
that gets set whenever there is output to do on a line (other conditions
also set this "attention" flag as well). The real problem is that we
are a JP%SYS process that is also CRSKED at the time of the MDISMS.
This causes us to get 200 mS of balance set hold time, negating the
effect of the wait for the MDISMS to get checked. The result is that
when we request CPU, we get it.
Solution: Implement a new flag, CTMWAG (CTerM Wait And Go), that gets set
when we want to do the output. Now OR this flag in with a SETZed
CTMATN at the scheduler short cycle (LV8CHK), so that if we have
output to do, we will notice it every 20mS at the most. This has
the additional benefit of treating CTERM output just like RSX20F
and NRT output (incidentally, the case of NRT output hogging the
system was fixed in a similar manner).
The results are quite noticeable: the user only gets charged for
a more RSX20F and NRT-like 8%-12% CPU while doing the output.
The scheduler overhead is reduced from 50%-70% down to about 1%
on a standalone system. The actual throughput is down from roughly
1000 cps to 950 cps, a reduction of only 5%!
[End of TCO 6.1.1449]
TCO-number: 6.1.1451
Written-by: NICHOLS Creation-date: 17-Jun-85 10:15:44
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: D36COM
Problem: JSR BUGHLT in D36COM
Solution: Replace JSR BUGHLT with a real BUGHLT.
[End of TCO 6.1.1451]
TCO-number: 6.1.1452
Written-by: GROSSMAN Creation-date: 17-Jun-85 10:57:43
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKNI
Problem: Symbols in PHYKNI that are related to SYSERR entries are wrong or
misleading.
Diagnosis: Wealth of confusion from CSSE error specs and SPEAR documents.
Solution: Change the symbol names to be more meaningful, also update the
appropriate figures.
[End of TCO 6.1.1452]
TCO-number: 6.1.1453
Written-by: GROSSMAN Creation-date: 17-Jun-85 11:08:02
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKNI
Problem: NIA20 keep-alive routines do not always detect a dead KLNI.
Diagnosis: It is possible for the NIA20 to get messed up in such a way that
it no longer processes command queue entries. However, if there is a steady
stream of incoming datagrams, the NIA20 will cause the KL to wake up and
process them. The incoming datagrams are treated just like regular command
responses. This fools PHYKNI into believing that the NIA20 is working just
fine, because it sees these responses frequently enough to keep the keep-
alive process happy. Unfortunately, the KLNI is not responding to commands
during this period, and things like transmits just hang forever.
Solution: Simplify the keep-alive process. ALWAYS give the KLNI a Read Station
Info command every five seconds. If that command is not processed within five
seconds, kill the KLNI and give a KNISTP BUGCHK.
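
The simplified keep-alive can be sketched as a small state machine. The
class and method names are hypothetical, and the timer is assumed to fire
once every five seconds; the point illustrated is that only the response
to the keep-alive command counts as proof of life.

```python
INTERVAL_SECONDS = 5  # from the text: one command every five seconds

class KeepAlive:
    def __init__(self):
        self.outstanding = False   # a keep-alive command awaits its response
        self.alive = True

    def datagram_received(self):
        """Incoming traffic is deliberately ignored by the keep-alive."""

    def response_received(self):
        """The Read Station Info command was processed by the port."""
        self.outstanding = False

    def on_timer(self):
        """Called once per INTERVAL_SECONDS; returns False once the port is dead."""
        if not self.alive:
            return False
        if self.outstanding:
            self.alive = False     # stands in for stopping the KLNI (KNISTP)
        else:
            self.outstanding = True  # issue the next keep-alive command
        return self.alive
```

With this shape, a port that keeps delivering datagrams but never answers
the command is declared dead on the second timer tick.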
[End of TCO 6.1.1453]
TCO-number: 6.1.1454
Written-by: MOSER Creation-date: 17-Jun-85 15:43:46
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: GLOBS IPCF
Problem: Global subroutine MTAMES is unused.
Solution: Remove it.
[End of TCO 6.1.1454]
TCO-number: 6.1.1455
Written-by: MELOHN Creation-date: 18-Jun-85 12:05:20
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CTHSRV FILMSC GLOBS
Problem: COMND (and perhaps users) need a way to determine whether a
CTERM terminal supports the full CTERM implementation (like TOPS-20,
TOPS-10, DECnet-DOS, RSX) or just a limited, bug-filled implementation
(like VMS).
Diagnosis: The .MOCTM MTOPR% should return two different values; 1 if
the terminal in question is a True CTERM terminal, and 2 if the
terminal is a VMS CTERM terminal. Users may find this useful as well,
since many things don't work on VMS CTERM terminals, and will not
until VMS is fixed.
Solution: Do it.
[End of TCO 6.1.1455]
TCO-number: 6.1.1456
Written-by: LEACHE Creation-date: 18-Jun-85 13:12:30
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: STG
Problem: A few sites have insufficient IPCF or ENQ freepool space.
Diagnosis: Pools not large enough.
Solution: Increase size of pool allocation.
[End of TCO 6.1.1456]
TCO-number: 6.1.1457
Written-by: PALMIERI Creation-date: 18-Jun-85 15:34:45
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: STG DNADLL
Problem: Endnodes on the Ethernet may have more overhead than is desirable.
Diagnosis: Non-routing nodes on the Ethernet always enable to receive
multicast packets sent to the ID "All Routers". This allows
DECnet to eavesdrop on routing messages so a database can be
maintained for INFO DECnet.
Solution: Only enable "all routers" multicast address in endnodes on the
Ethernet when variable EVSDRP is non-zero. The default value for
EVSDRP is -1.
[End of TCO 6.1.1457]
TCO-number: 6.1.1459
Written-by: MOSER Creation-date: 18-Jun-85 16:32:22
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: GTJFN
Related-QAR: 838488
Problem: GTJFN no longer returns GJFX39 (logical name loop detected) it returns
GJFX24 (file not found) instead.
Diagnosis: TCO 6.2261 caused this by always returning GJFX24 when SETDEV
fails.
Solution: Return GJFX24 if SETDEV returns STRX09 otherwise return the SETDEV
error.
NOTE: *****************************************************************
* THIS TCO DOES NOT IMPLY ANY "OWNERSHIP" OF GTJFN *
* I WILL DISAVOW ANY KNOWLEDGE OF THAT MODULE *
*****************************************************************
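
The error selection in the Solution above amounts to the following
sketch. Error codes are represented as strings purely for illustration,
and the function name is hypothetical.

```python
def gtjfn_error_after_setdev(setdev_error):
    """Pick the error GTJFN reports when SETDEV fails.

    STRX09 from SETDEV is reported as GJFX24 (file not found); every other
    SETDEV error is passed through unchanged.
    """
    if setdev_error == "STRX09":
        return "GJFX24"
    return setdev_error
```

Assuming SETDEV reports a logical name loop as GJFX39, that code now
reaches the caller instead of being replaced by GJFX24.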
[End of TCO 6.1.1459]
TCO-number: 6.1.1460
Written-by: MOSER Creation-date: 19-Jun-85 09:35:44
Edit-checked: No Document: Yes TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: DISC
Problem: JSR BUGHLT in DISC.
Solution: Make it a "real" BUGHLT, XTRAPT (long file has extra page table).
[End of TCO 6.1.1460]
TCO-number: 6.1.1461
Written-by: GROSSMAN Creation-date: 19-Jun-85 10:32:12
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LLMOP MEXEC STG GLOBS
Problem: Ethernet system ID's generated by LLMOP did not contain correct
system time information.
Diagnosis: The system date and time was hardwired to be 23-Mar-1984 12:30:30.50.
This was probably done because the system ID message could be generated at
interrupt level. The date and time conversion routines are not available at
interrupt level so the programmer just fudged up some numbers.
Solution: Generate system ID's in CHKR. This way, LLMOP can use the date and
time conversion JSYSs to acquire the necessary values for the system ID message.
Now, when a Request ID message is received at interrupt level, a request gets
queued up to CHKR level, and job 0 gets run immediately. Also, periodically
(every 10 minutes) CHKR will call LLMOP to generate an unsolicited System ID
message.
[End of TCO 6.1.1461]
TCO-number: 6.1.1462
Written-by: GRANT Creation-date: 19-Jun-85 15:26:33
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKLP
Problem: CIPDFQ BUGINFs after reloading the CI.
Diagnosis: TOPS-20 is not processing the response queue. This occurred
after a planned CRAM parity error; one action taken by TOPS-20
in the error processing is that of processing the response queue
and cleaning all the command queues. The routine KLPRQA is called
in this case just as it is during normal operation. During the
error processing case, the port has been disabled but by calling
KLPRQA we incorrectly enable the port for a small amount of time.
During this time the port is capable of putting packets on the
response queue.
Thus, when we finally reload and start the port, it finds the
response queue non-empty and never generates an interrupt.
Solution: Create a second entry point KLPRQC to be used when we want to
process the response queue without enabling the port.
[End of TCO 6.1.1462]
TCO-number: 6.1.1463
Written-by: MOSER Creation-date: 19-Jun-85 16:21:29
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PAGUTL
Problem: OFNBDB BUGHLT.
Diagnosis: The problem occurs when a file is going from short to long.
The following scenario reproduces the bug.
Users Moe and Larry both open STOOGES.DAT as a short file.
User Moe goes out for lunch while Larry extends the file long. When
Larry makes the file go long the following counts are in effect.
Super PT share count = 3 (1 long opener [Larry] and 2 second level PTs)
Share count on second level PT0 = 3 (Moe, Larry and extra count from Larry)
Now Larry closes the file (Moe is still at lunch). The share count on the Super
PT becomes 0 and the OFN is deassigned. The share count on the PT for file
section 0 is still non zero because of Moe (still at lunch).
Now enter Curly who also opens the file. He assigns an OFN for the
Super PT and gets a different one than Larry had. When Curly tries to
acquire an OFN for file section 0 he finds the one that Moe is holding
but the data in SPTO4 does not agree with the data provided by Curly
in the call. An OFNBDB results.
Solution: When assigning an OFN for a long file section 0 always use the
caller's data. OFNBDB will still exist for other cases.
[End of TCO 6.1.1463]
TCO-number: 6.1.1464
Written-by: MELOHN Creation-date: 19-Jun-85 20:31:44
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: COMND CTERMD CTHSRV FILMSC TTYDEF TTYSRV
Problem: CTERM terminals do not "unwrap" successfully; that is, if
you create a long enough line to wrap, and then edit that line to be
only one line long again, the CTERM-SERVER loses the prompt that began
the line.
Diagnosis: In this case, the server must have a local copy of the
prompt in order to reprint the line successfully.
Solution: Add a pointer to the prompt string as an argument to the
.MOTXT MTOPR% (AC 4). Make TEXTI% and friends fill in this prompt
string with the user's ^R buffer pointer. Make CTHSRV send this
prompt to the server. Make the server know that the prompt is really
an ^R buffer, and load it into the TEXTI% on the remote system.
[End of TCO 6.1.1464]
TCO-number: 6.1.1465
Written-by: MELOHN Creation-date: 19-Jun-85 20:38:43
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CTHSRV CTERMD
Problem: DECnet-DOS won't talk CTERM with us.
Diagnosis: Multiple problems; the LOKCDB routine has some code which
normal speed CTERM connections apparently never tested. The GETIMG
routine, which is supposed to parse a DNA image field, doesn't work, and
always uses the defaults, which keep everyone but DOS happy. We treat
DOS, and any non-10-20 system like VMS and don't trust it to do
editing. DOS can do editing.
Solution: Fix problems, make CTERM assume that all implementations
handle the entire protocol with the exception of VMS.
[End of TCO 6.1.1465]
TCO-number: 6.1.1466
Written-by: PALMIERI Creation-date: 20-Jun-85 15:16:34
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: STG
Problem: No DECnet nodename if not defined in CONFIG.CMD.
Diagnosis: The default name is nulls.
Solution: Make the nodename and nodename count RSIs of TOPS20 and 6
respectively.
[End of TCO 6.1.1466]
TCO-number: 6.1.1467
Written-by: GRANT Creation-date: 20-Jun-85 18:02:22
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: phyklp
Problem: Unnecessary CI command when CI microcode is reloaded.
Diagnosis: Now that CI microcode is loaded directly from the monitor we
get the version from the load file; we don't need to do the
READ-COUNTER command to find out the microcode version anymore.
Solution: Remove the READ-COUNTER command from routine STRKLP, but have
routine LODUCD put the edit number in the CDB so the utilities
can find it. The entire version (major, minor, edit) is in
location UCDVER.
[End of TCO 6.1.1467]
TCO-number: 6.1.1468
Written-by: GRANT Creation-date: 21-Jun-85 12:38:20
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYSIO
Related-QAR: 838233
Problem: PHYTPD BUGINF description is misleading.
Solution: Change name to PHYCPI (CI path ignored) and make the description
more informative.
[End of TCO 6.1.1468]
TCO-number: 6.1.1469
Written-by: PALMIERI Creation-date: 21-Jun-85 15:07:17
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: SCJSYS
Related-QAR: 838514
Problem: DECnet DCN write only logical links sometimes hang other end's
SRV link.
Diagnosis: A CLOSF% may complete "too soon" if the logical link is write
only. When the user issues a close a synchronous disconnect
call is issued to SCLINK. This causes a DI to be sent to the other
end of the link. If the DI can not go out immediately it is queued
up for transmission later. SCJSYS notices that the link is "write
only" and decides that it can remove the entire port database at this
time since there should not be any pending input from the network.
In cleaning up the link it issues a Release for the link to SCLINK.
If the DI is still on the queue to be transmitted it is discarded.
The other end of the link never gets the DI and must wait for
a "no confidence" on the link before cleaning up.
Solution: Don't check in DNETIN to see if the link is "write only".
Instead, always call SCLINK to read any pending data (there shouldn't
be any). If there is no data SCLINK will block until the DC arrives
which means the other end of the link has received and processed the
DI. The only need of the write only check was for the MTOPR function
READ LINK STATUS so make the check there.
[End of TCO 6.1.1469]
TCO-number: 6.1.1470
Written-by: MELOHN Creation-date: 21-Jun-85 17:33:49
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: CTHSRV
Problem: CTERM loses characters on input
Diagnosis: CTHSRV sent start read messages requesting more input
than the line buffer could hold.
Solution: Build start read messages with input length equal to TIMAX
minus TTICT, which is the remaining space in the line buffer. Defer
additional start reads until the line buffer has more than five bytes
free.
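
The length computation can be sketched as below. TIMAX and TTICT are the
names from the text; the function name and the None return for "defer" are
assumptions, and for simplicity the sketch applies the five-byte threshold
to every start read rather than only to additional ones.

```python
MIN_FREE_BYTES = 5  # defer until more than this many bytes are free

def start_read_length(timax, ttict):
    """Return the input length for the next start read, or None to defer.

    timax is the line buffer capacity and ttict the count of characters
    already in it, so their difference is the remaining space.
    """
    free = timax - ttict
    if free <= MIN_FREE_BYTES:
        return None
    return free
```

Requesting no more than the remaining space is what keeps the server from
sending input that the line buffer cannot hold.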
[End of TCO 6.1.1470]
TCO-number: 6.1.1472
Written-by: PALMIERI Creation-date: 24-Jun-85 14:38:02
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: ROUTER
Related-QAR: 838475
Problem: Setting cost on DTE circuits seems to have no effect on cost to
nodes whose path is over the DTE.
Diagnosis: Cost parameter for other than first circuit is ignored when doing
routing updates.
Solution: When stepping through the circuits to build the routing vector
use the cost for that circuit when computing costs to nodes over that
circuit.
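
A sketch of the corrected computation, using an invented data layout: each
circuit is a pair of its own cost and a table of the costs reported over
it. The function name and the UNREACHABLE value are assumptions for the
example, not ROUTER internals.

```python
UNREACHABLE = 1023  # hypothetical "infinite" cost for this sketch

def build_routing_vector(circuits):
    """circuits: list of (circuit_cost, {node: cost reported over that circuit}).

    Returns the lowest total cost to each reachable node.
    """
    best = {}
    for circuit_cost, reported in circuits:
        for node, cost in reported.items():
            # Use the cost of the circuit currently being stepped through,
            # not the cost of the first circuit.
            total = min(circuit_cost + cost, UNREACHABLE)
            if total < best.get(node, UNREACHABLE):
                best[node] = total
    return best
```

With circuits [(1, {"A": 2, "B": 5}), (10, {"B": 0, "C": 1})], node C costs
11 because it is only reachable over the cost-10 circuit; applying the
first circuit's cost everywhere would wrongly give 2.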
[End of TCO 6.1.1472]
TCO-number: 6.1.1473
Written-by: GRANT Creation-date: 25-Jun-85 13:19:13
Edited-by: GRANT Edit-date: 25-Jun-85 13:22:10
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: phyklp cfssrv PHYSIO
Problem: Current CI diagnostic strategy using NO-ANSWER doesn't work well.
Diagnosis: The CI microcode doesn't control the ACKing/NAKing of incoming
packets. So, the best it can do is not send back an IDREC even
though the REQUEST-ID has been ACKed. This isn't good enough because
the other system sees that the NO-ANSWER system is there because it
ACKed the packet. This results in KLPNOA BUGCHKs rather than simply
ignoring it.
Solution: Instead of faking non-existence, set the maintenance bit in the
port state field of the IDREC. This will alert the other systems.
[End of TCO 6.1.1473]
TCO-number: 6.1.1474
Written-by: LEACHE Creation-date: 25-Jun-85 13:58:34
Edited-by: LEACHE Edit-date: 25-Jun-85 14:06:59
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: JSYSA
Related-QAR: 838507 838509
Problem: Password expiration not working correctly.
Diagnosis: Leftover development code at LOGI2 is interfering with real
code at CHKPSW.
Solution: Remove bogus code.
[End of TCO 6.1.1474]
TCO-number: 6.1.1475
Written-by: MOSER Creation-date: 28-Jun-85 11:19:21
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: GETSAV
Problem: GET of some execute only programs fails.
Diagnosis: AC not always setup properly for calls to SREADF and CREADF.
Solution: Set up T1.
[End of TCO 6.1.1475]
TCO-number: 6.1.1476
Written-by: PALMIERI Creation-date: 28-Jun-85 15:07:10
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: LATSRV
Problem: COMMMS bughlt's as a result of memory being returned by LATSRV.
Diagnosis: If an attempt to post a buffer to NISRV fails the failure
reason is returned in T1. LATSRV originally had the buffer address
in T1 and never saved it. Instead it tries to return whatever T1
now points to.
Solution: Save buffer address in a STKVAR and restore it before calling
DNFWDS.
[End of TCO 6.1.1476]
TCO-number: 6.1.1477
Written-by: GRANT Creation-date: 29-Jun-85 22:21:24
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: PHYKLP MEXEC
Problem: Running IPALOD by hand causes confusion.
Diagnosis: Now that the CI20 microcode is read into memory at system startup,
you can't load a different version without rebooting the system.
However, if you put a new IPALOD up and run it, it comes out and
says "Loading .......x.y(z)" indicating it loaded the new version
of the microcode when it actually reloaded the version that was read
in at system startup.
Solution: Make 2 entry points for IPALOD. When a user tries to run IPALOD
it will not say "Loading.....x.y(z)", but will say "Loading
microcode that was read in at system startup".
[End of TCO 6.1.1477]
TCO-number: 6.1.1478
Written-by: LEACHE Creation-date: 10-Jul-85 13:32:01
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: FORK
Problem: PDVOP% causes freespace damage, RELBAD's, ILMNRF's, etc.
Diagnosis: PDVOP function .POLOC causes recursive execution of PDVOP
with function .PONAM. The PDVOP code fails to reinitialize the datablock
size in the argblock, so that the 2'nd through n'th recursion has an
enormously high value (something like 1,,1) stipulated as the size of
a block that is really 8 words long. This usually causes no problem,
since the 8 word block is more than enough space to hold most program
name strings. However, bogus executions of PDVOP (such as recently
performed by the EXEC) can create PDV's containing program name strings
that exceed 8 words in length. The recursive PDVOP will then destroy
information in the freespace block adjacent to the PDVOP data block.
Solution: Reinitialize the blocksize value before each recursive execution
of PDVOP.
[End of TCO 6.1.1478]
TCO-number: 6.1.1479
Written-by: WAGNER Creation-date: 10-Jul-85 13:38:14
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: MONSYM
Problem: MONSYM is very large
Diagnosis: It is not purging unneeded storage.
Solution: Have it purge .ERCOD on the second pass since it is used as internal
symbol only.
[End of TCO 6.1.1479]
TCO-number: 6.1.1480
Written-by: MELOHN Creation-date: 10-Jul-85 16:50:04
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Problem: ILMNRFs when PROs or VAXes acting as LAT servers with PO/S
or Oracle attempt to connect to TOPS-20 and any user does an NTINF% of them.
Diagnosis: These brain-damaged LAT implementations do not set the LAT
server name. In this case we should display the Hex hardware address
of the remote server, but the code to do this jumps to the wrong part
of the routine and the system trips because the appropriate ACs are
not set up.
Solution: Jump to the right place to correctly display the hardware
address of these "pseudo-lat-servers". Thanks to Peter Donahue for
helping find and exterminate this bug.
[End of TCO 6.1.1480]
TCO-number: 6.1.1481
Written-by: PALMIERI Creation-date: 15-Jul-85 15:01:04
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: CTHSRV
Problem: COMMMS BUGHLTs
Diagnosis: If CTHSRV fails to initialize a connect block when opening a SRV
connection, it attempts to release the associated buffer, a pointer
to which is in the CDB. It uses P3 as a pointer to the CDB instead
of the correct AC CDB.
Solution: Change P3 to CDB.
[End of TCO 6.1.1481]
TCO-number: 6.1.1482
Written-by: PALMIERI Creation-date: 15-Jul-85 15:11:10
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: Monitor
Routines-affected: PHYKNI
Problem: KNIIPF BUGHLT if system power fails.
Diagnosis: PHYKNI believes that it should never receive a RESET CHANNEL
call from PHYSIO and BUGHLTs if it gets one. However, this call
is executed as a result of a power failure.
Solution: Add routine KNIRSC to handle the reset channel call and stop and
request reload for all KLNIs.
[End of TCO 6.1.1482]
TCO-number: 6.1.1483
Written-by: MELOHN Creation-date: 15-Jul-85 15:39:11
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: latsrv
Problem: RESCHK BUGHLTs
Diagnosis: The header on a LAT transmit buffer was smashed. Examination of the
previous buffer revealed that it was a LAT circuit block. The code which clears
the counters at the end of the circuit block XBLTs one too many words and zeros
the first location of the next block of memory. We crash if we try to return
this block, since the check word has been cleared.
Solution: Fix the routine that clears the circuit counters to zero the correct
number of words.
[End of TCO 6.1.1483]
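The off-by-one in TCO 6.1.1483 can be modeled in Python with a flat list standing in for memory. The check-word value, the block sizes, and all names here are hypothetical; the sketch only shows how clearing one word too many zeros the header of the following block.

```python
CHECK_WORD = 0o525252  # hypothetical header marker value
COUNTER_WORDS = 3      # hypothetical number of counter words


def build_memory():
    """Return (memory, start of next block) for two adjacent blocks."""
    # Circuit block: header plus five words, the last three being counters.
    circuit_block = [CHECK_WORD] + [7] * 5
    # The following block (a transmit buffer in the TCO) has its own header.
    transmit_buffer = [CHECK_WORD] + [9] * 3
    return circuit_block + transmit_buffer, len(circuit_block)


def clear_counters(memory, first, count):
    """Zero `count` consecutive words starting at index `first`."""
    for i in range(first, first + count):
        memory[i] = 0


def block_is_valid(memory, start):
    """A block is returnable only if its check word is intact."""
    return memory[start] == CHECK_WORD
```

Calling `clear_counters` with `COUNTER_WORDS` leaves the next header intact; calling it with `COUNTER_WORDS + 1` reproduces the smashed check word.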
TCO-number: 6.1.1484
Written-by: MELOHN Creation-date: 15-Jul-85 15:42:21
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: latsrv
Problem: ILMNRF BUGHLT
Diagnosis: The LAT server sends us an invalid circuit block vector, which we
blithely use to index into space.
Solution: Check all CB vectors sent from the server to see if they make sense.
If they do not, consider it an illegal message.
[End of TCO 6.1.1484]
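A minimal Python sketch of the validation in TCO 6.1.1484 follows. The TCO says only that vectors are checked "to see if they make sense"; the specific checks here (integer, in range, slot in use) and the names `IllegalMessage` and `lookup_circuit_block` are assumptions for illustration.

```python
class IllegalMessage(Exception):
    """Raised when a received message fails validation."""


def lookup_circuit_block(table, vector):
    """Return the circuit block for `vector`, rejecting bad indexes.

    The checks are assumed: the vector must be an integer inside the
    table and must point at an allocated slot.
    """
    if not isinstance(vector, int) or not 0 <= vector < len(table):
        raise IllegalMessage(f"circuit block vector out of range: {vector!r}")
    if table[vector] is None:
        raise IllegalMessage(f"circuit block vector not in use: {vector}")
    return table[vector]
```

Validating before indexing turns a would-be wild reference into an ordinary protocol error that the caller can count and discard.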
TCO-number: 6.1.1485
Written-by: MELOHN Creation-date: 16-Jul-85 14:13:56
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: STG
Related-QAR: 838536
Problem: BADTTYs can occur if you build a monitor with LAHFLG turned off.
Diagnosis: NTTLAH should be set to zero if LAHFLG is turned off.
Solution: Do it.
[End of TCO 6.1.1485]
TCO-number: 6.1.1486
Written-by: MELOHN Creation-date: 16-Jul-85 14:17:55
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV
Related-QAR: 838534
Problem: Stop reason code sent to servers is garbage when host gets a LATIST
BUGINF.
Diagnosis: Reason code is not set up after LATIST call.
Solution: Set up the code to be invalid slot/format error and return it to the
server.
[End of TCO 6.1.1486]
TCO-number: 6.1.1487
Written-by: MOSER Creation-date: 16-Jul-85 16:00:09
Edit-checked: No Document: No TCO-tested: No
Maintenance-release: No Hardware-related: No
Program: MONITOR
Routines-affected: APRSRV
Problem: BOGUS dumps. PF in the Bughlt code. TRAPS0 shows PF occurred
trying to reference word 6,,0.
Diagnosis: The Bughlt code tries to store the previous context ACs using
a STPAC. macro. This does an XCTBMU (data from prev context into monitor) of
a BLT of the ACs. When the previous context is monitor and the section is not
0/1, this BLT references memory, not the ACs. This is a bug in the microcode.
Solution: Do not do a STPAC. but do a PXCT of an XBLT. instead. This might
eventually get fixed in the u-code but for now this will work.
[End of TCO 6.1.1487]