PDP-10 Archive: mco24.rpt from bb-bt99r-bb

MCO: 14172		Name: DPM		Date: 24-Feb-89:04:01:53


[Symptom]
     A batch login may fail if the number of logged-in jobs minus  the
number of reserved batch job slots is greater than LOGMAX.

[Diagnosis]
     The difference between LOGMAX and JOBMAX is the  number  of  jobs
reserved  for  emergency  logins.  A logging-in timesharing job may be
granted access if LOGNUM will not exceed LOGMAX, and providing  BATMIN
job  slots are reserved for batch logins.  However, if the job logging
in is running under batch, then BATMIN must not  be  included  in  the
computation.

[Cure]
     Don't account for BATMIN job slots when a batch  job  is  logging
in.  Its inclusion is only meaningful for timesharing logins.
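
A minimal sketch of the corrected check, in Python for illustration
only.  The function name and the exact arithmetic are assumptions; the
report names LOGNUM, LOGMAX, and BATMIN but does not show the code at
ACCLOG.

```python
def may_log_in(lognum: int, logmax: int, batmin: int, is_batch: bool) -> bool:
    """Return True if one more job may log in (hypothetical model of the check)."""
    # Slots held back for batch only constrain timesharing logins.
    reserved = 0 if is_batch else batmin
    # The new job must fit under LOGMAX once the reserve is set aside.
    return lognum + 1 + reserved <= logmax
```

With 10 jobs logged in, LOGMAX of 12, and BATMIN of 2, this model
admits a batch login but refuses a timesharing login.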

[Keywords]
BATMIN

[Related MCOs]
13932, 13137

[Related SPRs]
36246

[MCO status]
None

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	424	UUOCON	ACCLOG
704A	


[End of MCO 14172]

MCO: 14173		Name: DPM		Date: 24-Feb-89:09:34:47


[Symptom]
The old methods of DAEMON error logging leave something to be desired.

[Diagnosis]
Currently, most of the monitor expects DAEMON to gather additional data
for ERROR.SYS beyond what it's initially given.  This exercises race
conditions, slows performance because jobs are sometimes stopped until
DAEMON is finished, and makes DAEMON dependent upon monitor versions
and data structure formats.

[Cure]
Start converting the old-style DAEMON calls to use System Error Blocks.
SEBs eliminate the race conditions because one is queued up for each
error log entry rather than always overwriting the same storage with
new error data.  Performance is improved by not having to prevent jobs
from running while DAEMON is logging the error.  This also eliminates
the dependency of DAEMON upon the monitor because the monitor will
format the entire record.  DAEMON merely copies SEBs into ERROR.SYS.

This edit will do:

	DL10 error records
	I/O BUS LPT error records
	Stopcode records
	Software Events (POKE, RTTRP, SNOOP, and TRPSET)
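
The difference between the two approaches can be modeled roughly as
follows.  This is a Python sketch, not monitor code; both class names
and the record contents are hypothetical.

```python
from collections import deque

class SingleBuffer:
    """Old style (modeled): one shared storage area, overwritten on each error."""
    def __init__(self):
        self.data = None

    def report(self, record):
        self.data = record              # a second error replaces the first

    def drain(self):
        out = [] if self.data is None else [self.data]
        self.data = None
        return out

class SebQueue:
    """SEB style (modeled): one fully formatted block queued per error."""
    def __init__(self):
        self.blocks = deque()

    def report(self, record):
        self.blocks.append(dict(record))    # snapshot taken when the error occurs

    def drain(self):
        out = list(self.blocks)
        self.blocks.clear()
        return out
```

If two errors are reported before the logger runs, the single buffer
yields only the second record, while the queue yields both.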

[Keywords]
ERROR LOGGING

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
New development MCO

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	424	CLOCK1	DAEEST
704A		COMDEV	DL10EL
		ERRCON	DIELOG,XFRSEB
		LPTSER	LPTSYR
		RTTRP	RTRET
		S	EX.SYE,EX.DEL
		UUOCON	POKE2,SNPIBP,TRPSTX


[End of MCO 14173]

MCO: 14174		Name: DPM		Date: 28-Feb-89:05:45:07


[Symptom]
New:  To accommodate future tape service bug fixes and enhancements,
increase the size of the IORB.  Do this by defining a set of "common"
IORB definitions, to be used initially by tape service, and possibly
later by FILSER.  Append to the common portion, the tape-specific
words.

Common words:
		 .ORG	0
	IRBLNK::!BLOCK	1		;FORWARD LINK TO NEXT IORB
	IRBACC::!BLOCK	1		;ACTIVE (CURRENT) CHANNEL COMMAND
	IRBCCW::!BLOCK	<MXPORT==:4>	;ADDRESSES OF CHANNEL COMMANDS
	IRBIVA::!BLOCK	1		;ADDRESS OF INTERRUPT ROUTINE
	IRBDDB::!BLOCK	1		;ADDRESS OF DDB BEING SERVICED
	IRBSIZ::!			;LENGTH OF COMMON IORB
		 .ORG

Tape-specific words:
		 .ORG	IRBSIZ
	TRBFNC::!BLOCK	1		;FUNCTION DATA
	TRBSTS::!BLOCK	1		;TERMINATION STATUS
	TRBRCT::!BLOCK	1		;BYTE COUNT OF TRANSFER, IF DATA READ
	TRBLEN::!			;LENGTH OF BLOCK
		 .ORG

IRBLNK is the old TRBLNK, but a full word quantity.
IRBCCW is the merger of TRBXCW and TRBEXL.
IRBIVA is the old TRBIVA.

TRBFNC is the old LH of TRBLNK and now can grow beyond bit 17.
TRBSTS could also be made a full word quantity.
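
The word offsets implied by the two .ORG blocks can be worked out
mechanically.  The Python sketch below only mirrors the BLOCK sizes
quoted above; the helper name layout is hypothetical.

```python
MXPORT = 4  # from the MCO: IRBCCW is BLOCK <MXPORT==:4>

def layout(origin, fields):
    """Assign consecutive word offsets starting at origin; return (offsets, end)."""
    offsets = {}
    loc = origin
    for name, words in fields:
        offsets[name] = loc
        loc += words
    return offsets, loc

# Common portion starts at 0; its end value plays the role of IRBSIZ.
common, IRBSIZ = layout(0, [("IRBLNK", 1), ("IRBACC", 1), ("IRBCCW", MXPORT),
                            ("IRBIVA", 1), ("IRBDDB", 1)])

# Tape-specific words are appended at IRBSIZ; the end value is TRBLEN.
tape, TRBLEN = layout(IRBSIZ, [("TRBFNC", 1), ("TRBSTS", 1), ("TRBRCT", 1)])
```

Under these definitions IRBSIZ works out to 8 words (10 octal) and
TRBLEN to 11 words (13 octal), with TRBFNC at offset 8.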

[Diagnosis]

[Cure]

[Keywords]
MAGTAPE

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
New development MCO
Documentation change

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	424	DEVPRM
704A		SCAPRM
		TAPSER
		TAPUUO
		T78KON
		TCXKON
		TD2KON
		TM2KON
		TMXKON
		TS1KON
		TX1KON


[End of MCO 14174]

MCO: 14175		Name: JEG/DPM		Date: 28-Feb-89:06:06:19


[Symptom]
ADP code reading.  Jeff Gunter points out that SCNPIF doesn't include
DSKBIT in configurations which have only a single CPU.  Why is that,
he said?

[Diagnosis]
Don't know.  Looks like an oversight.  While this is a common configuration,
it could only cause problems when the monitor is in the middle of a SCNOFF
and FILSER decides to print "problem on device" at interrupt level.  The
SCNOFF will not have turned off DSKCHN, thus allowing FILSER to do obscene
things at inappropriate times.

[Cure]
Probably doesn't happen a lot.  Remove the conditional assembly and always
include DSKBIT in SCNPIF.  This is necessary only because FILSER insists
on typing out at interrupt level.

[Keywords]
SCNSER

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
New development MCO

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	424	COMMON	SCNPIF
704A	


[End of MCO 14175]

MCO: 14176		Name: DPM		Date:  7-Mar-89:05:38:49


[Symptom]
It has been brought to our attention that some customer(s) want to
install RM05s on their DEC-10.  So be it; however, this is not
recommended and will remain UNSUPPORTED.

RM05s are interesting devices.  They are faster than an RP06 and
consume less power, as they require only single phase power.  An
RM05 has 30 sectors/track (10 more than an RP06), yet it runs
at the same 3600 RPM.  Therefore, the capacity and the transfer
rate are about one half greater than those of an RP06.

Don't look a gift horse in the mouth.  For starters, the head crash
rate is rather high.  It seems that RM05s work best when left alone.
Despite the fact that they use removable media, frequent disk pack
changes greatly increase the chance of a head crash.  The heads fly
fairly close to an RM05 pack; much closer than in an RP06.  Presumably,
this is the main cause of head crashes.  Also, parts for RM05s are not
nearly as plentiful as are those for RP06s.

[Diagnosis]
Missing table entries in RPXKON.

[Cure]
Add entries to the tables for blocks per unit, etc.  This is all that's
required to make RM05s work.  In all other manners, RM05s behave like
an RP06.

[Keywords]
RPXKON

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
New development MCO

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	425	AUTCON	DTRTBL
		DEVPRM	TY.RM5
		RPXKON	TYPTAB
		UUOSYM	.DCUR5


[End of MCO 14176]

MCO: 14177		Name: DPM		Date:  6-Mar-89:05:54:25


[Symptom]
DAEMON error logging.

[Diagnosis]
Yes.

[Cure]
Convert more old-style calls to use System Error Blocks.  Changes
in this edit include:

1. Channel NXM & parity error logging.
   7.04 records written by DAEMON contained mostly junk.

2. DECtape error logging.

3. KS10 memory error logging.
   Doc change:  This adds one word (.CPMFL) to the CPU subtable
   for KS10 memory errors.  This word is a flag which indicates
   the last type of error (0 = soft, 1 = hard).  Also, the length
   of the subtable (.CPMSL) was off by one word and has now been
   corrected.

4. KS10 card reader & line printer error logging.

[Keywords]
ERROR LOGGING

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
New development MCO
Documentation change

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	425	APRSER	MEMCHK
704A		CD2SER	CDRSYR
		COMDEV	DTXEST,DTXEFL,DTXEBK
		COMMON	.CPMFL,.CPMSL
		DTASER	ERRS,DTASYR
		ERRCON	CHNCO3
		LP2SER	LPTSYR


[End of MCO 14177]

MCO: 14178		Name: RCB		Date: 13-Mar-89:15:12:41


[Symptom]
STOPCDs AAO and IME, undeserved address checks, and undeserved checksum errors
during dump-mode I/O.

[Diagnosis]
In the old days before 7.03, LRNGE was called to range-check an IOWD.  It
checked everything we needed to have checked just fine.  One of the things
which it checks is that the range of addresses does not cross a section
boundary.  Thus, it was no longer appropriate once .FOFXI/.FOFXO (extended dump
I/O) were added to the FILOP. UUO.  MONPFH does not check that old-style
IOWD-based I/O does not cross a section boundary, nor does it check that the
I/O is not done to the ACs.  This can lead to AAOs.  If the user's working set
includes swapper-write-locked pages, then MONPFH will call LRNGE, even though
it might be doing extended I/O, thus resulting in an undeserved address check
error for an I/O doubleword which crosses a section boundary.

If FILIO has to perform error recovery and retries during a dump-mode I/O
operation which ends at a section boundary, under some circumstances it leaves
DEVISN containing bogus information in the DDB.  If this was also the first
block in a retrieval pointer, we will then proceed to attempt to calculate the
checksum based on a user address which we calculate, in part, from this junk in
DEVISN(F).  This can cause either an IME or an undeserved checksum error.

Finally, much of the above is exacerbated by NXCMR in UUOCON, which is the
common routine used to fetch and validate the next IOWD in a user's channel
command list.  It does not validate correctly when MONPFH passes it an IOWD
which either starts with or crosses a section boundary.

[Cure]
Teach NXCMR how to validate all IOWDs which PFDOIO might pass.  Correct all
incorrect uses of DEVISN(F).  Teach PFDOIO to use ZRNGE rather than LRNGE when
it wants to fix up swapper-write-locked pages.  Teach PFHDMP to give an address
check error when an old-style IOWD crosses a section boundary.  Teach CHKSUM to
use GETEWD rather than GETWRD, so that it always fetches the correct word from
the user's buffer.  Teach PFDOIO to validate the range of addresses for I/O in
order to be sure that I/O is not attempted to the ACs.
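
The two range checks described above can be sketched as follows.  This
is an illustrative Python model, not the code in MONPFH or UUOCON.  The
function names are hypothetical, and treating in-section addresses 0-17
(octal) as the ACs in every section is a simplifying assumption.

```python
SECTION_WORDS = 1 << 18        # a section holds 2**18 words
AC_LIMIT = 0o20                # in-section addresses 0-17 (octal) are the ACs

def crosses_section(first: int, count: int) -> bool:
    """True if the words first..first+count-1 span more than one section."""
    last = first + count - 1
    return first // SECTION_WORDS != last // SECTION_WORDS

def touches_acs(first: int, count: int) -> bool:
    """True if any word in the range has an in-section address below 20 (octal)."""
    if crosses_section(first, count):
        return True   # the range runs through in-section address 0 of the next section
    return first % SECTION_WORDS < AC_LIMIT

def old_style_iowd_ok(first: int, count: int) -> bool:
    """Old-style IOWD transfers must stay in one section and avoid the ACs."""
    return count > 0 and not crosses_section(first, count) and not touches_acs(first, count)
```

A transfer that ends exactly on the last word of a section passes; one
word more and it is rejected as crossing the boundary.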

[Keywords]
DUMP I/O
AAO
IME
ADDRESS CHECK
CHECKSUM ERROR
IOIMPM
IO.IMP

[Related MCOs]
13932, 13137

[Related SPRs]
35576, 36064

[MCO status]
Checked

[MCO attributes]
Extended addressing only

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	426	UUOCON	FOPN9B,UINITC,RELEA4,NXCHIT
704A		MONPFH	PFHDM1,DOIO2
		FILIO	SATADR,MONIOY,SETLS7,POSER2,ECC2,ECC3,NOECC,CHKSUM,CSHC2B,CSHB2C
		FILUUO	DUMPG9


[End of MCO 14178]

MCO: 14179		Name: JEG/DPM		Date: 21-Mar-89:05:41:31


[Symptom]
FILSER doesn't usually continue from a DHD stopcode (Don't Have DA).

[Diagnosis]
If IOSDA is off in S (but not necessarily in DEVIOS), then a DHD
will result.  But if the job really does own the DA resource, it
will hang, since the DA is never released.

[Cure]
Let the DHD return .+1.  Further checks will prevent the DA from
being returned for the wrong job (a RWD is likely).  If we manage
not to get a RWD, then the DA will be released and the monitor
will continue with no problems.

[Keywords]
STOPCODE DHD

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	427	FILIO	DWNDA
704A	


[End of MCO 14179]

MCO: 14180		Name: JEG/DPM		Date: 21-Mar-89:05:45:20


[Symptom]
Stopcode KLPKAF following parity scans.

[Diagnosis]
A parity scan requires more than KAFTIM seconds to complete.
If PPDSEC doesn't get called soon enough (and it won't because
of the scan), it declares the KLIPA dead.

[Cure]
Increase KAFTIM from 10 to 35 seconds.  This allows about 8 seconds
per meg plus a few extra for good measure.  Increase KNISER's timer
(also called KAFTIM) from 30 to 35 seconds too.
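
The arithmetic behind the new value can be sketched in Python.  The
figure of 8 seconds per meg comes from the text above; the memory size
of 4 meg and the slack of 3 seconds are assumptions chosen only because
they reproduce 35.

```python
SECONDS_PER_MEG = 8   # "about 8 seconds per meg" from the MCO

def scan_budget(megs: int, slack: int) -> int:
    """Seconds to allow for a parity scan of `megs` of memory plus slack."""
    return megs * SECONDS_PER_MEG + slack

# Assumed configuration: 4 meg of memory and 3 seconds of slack.
kaftim = scan_budget(4, 3)
```

Under those assumptions the budget is 35 seconds, well above the old
value of 10.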

[Keywords]
KLPKAF

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	427	KLPSER	KAFTIM
704A		KNISER	KAFTIM


[End of MCO 14180]

MCO: 14181		Name: JEG/DPM		Date: 21-Mar-89:05:49:40


[Symptom]
DI hangs on RA failovers.  A failover can leave several jobs stuck in
"problem on device" mode for the old unit, even after lots of time
passes.

[Diagnosis]
PCLDSK may inadvertently get called with an "old" unit if a failover
is happening while another CPU is preparing to start I/O.  The "old"
unit was OK, but now, KDBCAM contains zero, causing PCLDSK to get
called.  PCLDSK sees no CPUs (and indeed there aren't any with the
old unit) and calls HNGSTP, eventually looping back to PCLDSK again
with the "old" unit.

[Cure]
If there is an online alternate port, use it and bypass HNGSTP.

[Keywords]
FAILOVER

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	427	FILIO	PCLDSK
704A	


[End of MCO 14181]

MCO: 14182		Name: JEG/DPM		Date: 21-Mar-89:05:54:13


[Symptom]
If a CPU croaks before it can be warm-restarted successfully, and
field service is able to fix it "on the fly", sometimes bad things
(usually hangs) happen immediately following the J 400.

[Diagnosis]
This can happen because a CPU restart clears SP.CJn for all jobs,
and then CPUZAPs the "running job" for the CPU, leaving a small
window when the job can be scheduled to run on another CPU.

[Cure]
Change SPRINI to call CPUZAP first, and then clear SP.CJn.

[Keywords]
WARM RESTART

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	427	COMMON	SPRLP1,SPRI11
704A	


[End of MCO 14182]

MCO: 14183		Name: JEG/DPM		Date: 21-Mar-89:05:59:19


[Symptom]
Stopcode KAF in QUESER.

[Diagnosis]
It is possible for one CPU to be diddling the database at UUO level
with the EQ lock, while ENQMIN runs at interrupt level on another
CPU.  If the UUO-level CPU removes and releases the free core holding
a block that is being scanned by ENQMIN, KAFs or other stopcodes may
result.

[Cure]
Implement a scheme where UUO level waits for interrupt level and
interrupt level punts if UUO level holds the EQ resource.
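
The policy can be modeled as a pair of flags.  This Python sketch shows
only who defers to whom; the class and method names are hypothetical,
and a real multi-CPU implementation would need atomic operations that
this single-threaded model omits.

```python
class EqInterlock:
    """Hypothetical model of the UUO-level / interrupt-level handshake."""

    def __init__(self):
        self.uuo_holds_eq = False
        self.interrupt_scanning = False

    def interrupt_begin(self) -> bool:
        """Interrupt level: punt (return False) if UUO level holds EQ."""
        if self.uuo_holds_eq:
            return False
        self.interrupt_scanning = True
        return True

    def interrupt_end(self) -> None:
        self.interrupt_scanning = False

    def uuo_try_acquire(self) -> bool:
        """UUO level: must wait (return False) while an interrupt scan runs."""
        if self.interrupt_scanning:
            return False
        self.uuo_holds_eq = True
        return True

    def uuo_release(self) -> None:
        self.uuo_holds_eq = False
```

In this model the interrupt-level scan never overlaps a UUO-level
update: one side waits and the other gives up for this pass.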

[Keywords]
STOPCODE KAF

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	427	QUESER	ENQMN2,EQLOCK,LOKINQ
704A	


[End of MCO 14183]

MCO: 14184		Name: JEG/DPM		Date: 21-Mar-89:06:03:20


[Symptom]
If a CI disk contains HOM blocks which look valid but contain
a zero word for the structure name, a failover will cause PULSAR
to sniff out the disk and mount a structure with no name.

[Diagnosis]
Monitor never checks for a zero structure name in DEFSTR.

[Cure]
Return "illegal structure name" error when no name is given.

[Keywords]
DEFINE STRUCTURE

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	427	FILFND	DEFSTR
704A	


[End of MCO 14184]

MCO: 14186		Name: DPM		Date: 24-Mar-89:08:37:20


[Symptom]
Stopcode KAF in KNISER.

[Diagnosis]
On a very busy Ethernet wire, it is possible to spend more than 6
seconds at interrupt level taking packets off the KLNI.  RSX-20F
has little patience for this sort of nonsense, so it KAFs the -10.

[Cure]
Put an arbitrary limit on the number of packets that we'll process
in a single interrupt.  Experimentation has proven that trying to
remove 2100 (decimal) or more packets from the queue will result in
a KAF.  Therefore, set the limit to 2000.  Location .PBMPP (maximum
packets processed) in the KDB/PCB contains the limit and can easily
be patched to a different value.  When the limit is exceeded, a
KNIKSP (KLNI Service Paused) info stopcode will be typed on the CTY.
Then the PIA will be removed for one second to let things settle down.
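
A rough Python model of the per-interrupt limit follows.  The function
name and the use of a deque as the packet queue are assumptions; only
the limit of 2000 comes from the text.

```python
from collections import deque

MAX_PACKETS = 2000   # the limit the MCO stores in .PBMPP

def service_interrupt(queue: deque, limit: int = MAX_PACKETS):
    """Remove at most `limit` packets; report whether service should pause."""
    processed = 0
    while queue and processed < limit:
        queue.popleft()
        processed += 1
    paused = bool(queue)      # packets remain, so the limit was hit
    return processed, paused
```

With 2500 packets queued, this model processes 2000, leaves 500, and
reports that service should pause.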

[Keywords]
KNISER KAF

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	430	KNISER	KNIRQ1,KNIPAU,KNICON
704A	


[End of MCO 14186]

MCO: 14189		Name: JEG/DPM		Date: 28-Mar-89:08:22:29


[Symptom]
If a program dies with infinite IPCF quotas and freecore is very low
or about to expire, the system grinds to a standstill.  Some jobs are
stuck NApping and others get unexpected error returns.  Trying to log
off the offending job fails.

[Diagnosis]
IPCLGO does two things.  It sends a logout message to QUASAR and it
turns around all unreceived messages; in that order.  The send to
QUASAR will fail because there is no available freecore, and the
logging out job owns a large chunk of it.

[Cure]
Reverse the order of things.  First, empty the send and receive queues,
then send the logout message to QUASAR.
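
Why the order matters can be seen in a toy model.  This Python sketch
assumes that emptying the job's queues releases free core immediately;
the pool class and the logout function are hypothetical and are not the
monitor's allocator or IPCLGO itself.

```python
class FreeCore:
    """Toy free-core pool counted in blocks."""

    def __init__(self, free: int):
        self.free = free

    def take(self) -> bool:
        if self.free == 0:
            return False
        self.free -= 1
        return True

    def give(self, n: int) -> None:
        self.free += n

def logout(pool: FreeCore, queued_blocks: int, fixed_order: bool) -> bool:
    """Return True if the logout message to QUASAR could be sent."""
    if fixed_order:
        pool.give(queued_blocks)     # empty the job's queues first
        return pool.take()           # then send the logout message
    sent = pool.take()               # old order: send first
    pool.give(queued_blocks)         # then turn the messages around
    return sent
```

With no free core left and five blocks tied up in the job's queues, the
old order fails to send while the new order succeeds.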

[Keywords]
IPCF LOGOUT

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	430	IPCSER	IPCLGO
704A	


[End of MCO 14189]

MCO: 14190		Name: JEG/DPM		Date:  4-Apr-89:05:14:36


[Symptom]
Stopcode IME removing a structure.  Other problems possible too.  When
allocation is in progress, or the ACCs and NMBs are in transition, and
a structure is being removed, an IME is likely to occur on a busy system.

[Diagnosis]
TAKBLK and friends rely on DEVUNI(F) to indicate the target unit for a
structure is still valid.  FILSER normally depends upon TSTGEN checking
UNIGEN.  The window is sufficiently large to allow the SKIPN DEVUNI to
work while REMSTR is removing a structure.

[Cure]
Change TAKBLK to call TSTGEN.  Make BMPGEN get and release the DA around
the update of UNIGEN.

[Keywords]
DISMOUNT

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	431	FILFND	BMPGN1
704A		FILIO	TAKBL0,TAKBLJ


[End of MCO 14190]

MCO: 14191		Name: RCB		Date:  5-Apr-89:22:16:13


[Symptom]
Hung ANF traffic to a node.  Especially common over an Ethernet channel.  It
may (sometimes) correct itself eventually, especially if it was not an Ethernet
channel that was involved.

[Diagnosis]
After NETWRT queues an output message (PCB) to the FEK, it calls its device
driver to perform the output.  This can happen several times before the device
driver tells the FEK routine that the output has happened.  At that point, the
FEK routine tells NETSER that the message has been sent.  This causes the PCB to
be placed on a generic output-done queue for NETSCN to process.  Once we get to
NETSCN, we move PCBs from this queue to a queue for the NDB for the node to
which we were sending the message.  The subroutine responsible for this,
NTSC.O, is also responsible for keeping the NDBLMS (last message sent) field
updated.  It does this by noting the message number of each PCB it places into
the output-pending queue in NDBLMS.  However, the PCB queue from which it is
taking these messages is unordered, and this can lead to having a very long
list of messages, with NDBLMS reflecting only (for example) the first of them.
Once this has happened, CHKNCA (check network-control ACK) will ignore an ACK
for any message beyond that in NDBLMS.  However, the remote is quite likely to
send us an ACK for the actual last message in the ACK-pending queue.  This
leads to a full output queue and a refusal to transmit any further data
messages, at least until the REP/NAK timer causes us to send a REP, which will
result in a NAK.  Because we ignored the implicit ACK present in the NAK, we
will still have a queue of outstanding messages, which the NAK will cause us to
retransmit all at once.  Unless the device driver stutters in a friendly
manner, this will merely get us into the same mess again with the same set of
messages, and no progress will ever be seen.

[Cure]
In NTSC.O, only change NDBLMS if it's moving in a forward direction.  In
INCTNK, where we resend the queue in response to a NAK, reset NDBLMS to NDBLAP
in order to avoid possible ACK races.
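
The "forward only" rule can be sketched in Python.  The modulus of 256
is an assumption (8-bit message numbers that wrap), and both function
names are hypothetical.

```python
MODULUS = 256   # assumption: 8-bit message numbers that wrap

def is_forward(current: int, candidate: int) -> bool:
    """True if candidate is ahead of current by less than half the number space."""
    delta = (candidate - current) % MODULUS
    return 0 < delta < MODULUS // 2

def note_sent(last_sent: int, msg_number: int) -> int:
    """Return the new last-message-sent value; never move it backward."""
    return msg_number if is_forward(last_sent, msg_number) else last_sent
```

Feeding the unordered sequence 5, 7, 6 through note_sent starting from
4 leaves the value at 7, so an ACK for message 7 is no longer ignored.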

[Keywords]
ANF Ethernet
Hung ANF

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
Checked

[MCO attributes]
HOSS attention

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	432	NETSER	NTSC.O,INCTNK
704A	


[End of MCO 14191]

MCO: 14192		Name: RCB		Date:  5-Apr-89:22:55:57


[Symptom]
Terminal characteristics get handled incorrectly during a SET HOST
session which is handled by NETVTM.

[Diagnosis]
Setting the terminal type happens after all the other characteristics
get set, and clobbers them.

[Cure]
Save the other characteristics until after we set the terminal type in VTMCHR.

[Keywords]
NETVTM
SET HOST
terminal characteristics

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
Checked

[MCO attributes]
HOSS attention

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	432	NETVTM	VTMCHR
704A	


[End of MCO 14192]

MCO: 14193		Name: DPM		Date:  6-Apr-89:11:36:40


[Symptom]
MCO 14190 went a bit too far.

[Diagnosis]
In trying to close the window where a structure could be removed
while other things were being done to the ACC/NMB blocks, BMPGEN
was modified to get and give the DA resource.  However, one needs
a DDB to use the DA and REMSTR doesn't have one to use.  Also,
BMPGEN expects F to contain a STR DB addr, not a DDB.

[Cure]
Can't plug the hole that tight.  Remove references to the DA in
BMPGEN and live with occasional IMEs.  There is no structure-wide
resource to take care of this situation.  Too bad.

[Keywords]
REMSTR

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
New development MCO

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	432	FILFND	BMPGEN
704A	


[End of MCO 14193]

MCO: 14194		Name: KDO		Date:  6-Apr-89:12:18:04


[Symptom]
Invalid status returned in ETHNT. UUO User Buffer Descriptor (UBD) blocks.

[Diagnosis]
Missing code.

[Cure]
Add code.

[Keywords]

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
KL10 only

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	432	ETHUUO	ENCXDG

704A	


[End of MCO 14194]

MCO: 14195		Name: KDO		Date: 10-Apr-89:10:54:04


[Symptom]
Adjacency up/down events for DECnet endnodes on multi-area LANs.

[Diagnosis]
DECnet is choosing a designated router outside its area.

[Cure]
Ignore Ethernet Router Hello messages from outside our area.

[Keywords]

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
KL10 only

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	432	ROUTER	RHMASE

705	


[End of MCO 14195]

MCO: 14197		Name: DPM		Date: 11-Apr-89:08:44:04


[Symptom]
REFSTR creates files with strange version numbers.

[Diagnosis]
Storing REFSTR's version rather than the monitor's is, at best,
non-standard.  But when displayed by DIRECT, it looks like a bug.

[Cure]
Use CNFDVN instead.

[Keywords]
REFRESH

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	423	REFSTR	RIBST1


[End of MCO 14197]

MCO: 14198		Name: LWS		Date: 17-Apr-89:10:18:26


[Symptom]
1. Same TM02/3 controller register dumped 9 times in TUB
on error.

2. SPEAR doesn't know how to interpret TM02/3 controlled tape drive
error entries or the monitor doesn't give SPEAR what it expects, take
your pick.

[Diagnosis]
1. RDREGS in TM2KON expects T2 to still contain controller
register number on return from RDMBR. RDMBR clears all but register
data in T2.

2. SPEAR expects 2 equal length blocks of error status information
in the error entry (IEP and FEP data). However, TM2KON's IEP length
is 1 and FEP length is 16 (octal). So we only write 1 word of "IEP"
information. This causes SPEAR's interpretation of the error to be
garbage (1. above doesn't help either).

Note: The TUB for a TM02/3 controlled tape drive contains 2 blocks
of TM2ELN words each for "IEP" and "FEP" error information. But
the IEP word is set for only a length of 1. Why? I don't know.
Poking the IEP word on 2476 to be the same as the FEP word causes
2 sets of error information to be dumped and SPEAR correctly
interprets the error. So it seems we can change SPEAR to handle
unequal length "IEP" and "FEP" error blocks, or have the monitor
dump equal length blocks.

[Cure]
1. PUSH/POP T2 around call to RDMBR at RDREG1.

2. Change LH of TUBIEP in TM2KON to be -TM2ELN.
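
A word whose left half holds a negative count can be built and decoded
as below.  This Python sketch assumes 18-bit two's-complement halfwords
and, for the example, that TM2ELN equals 16 (octal); the right-half
value and the function names are hypothetical.

```python
HALF = 1 << 18          # a PDP-10 halfword is 18 bits
MASK18 = HALF - 1

def neg_count_word(count: int, rh: int) -> int:
    """Build a 36-bit word with -count in the left half and rh in the right half."""
    return (((-count) & MASK18) << 18) | (rh & MASK18)

def count_of(word: int) -> int:
    """Recover the positive count from the left half of such a word."""
    return (HALF - (word >> 18)) & MASK18

TM2ELN = 0o16           # assumed value, taken from the FEP length quoted above
tubiep = neg_count_word(TM2ELN, 0)
```

A left half of 777777 (octal) encodes a length of 1, which matches the
one-word IEP block described in the diagnosis; 777762 encodes 16
(octal).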

[Keywords]
SPEAR
TM02/3
TU77

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
New development MCO
Field service attention
PCO required

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	433	TM2KON	TUBIEP,RDREG1


[End of MCO 14198]

MCO: 14199		Name: LWS		Date: 17-Apr-89:11:17:10


[Symptom]
Can't assign a network device if a device of the same
type doesn't exist on the local host.

[Diagnosis]
If the device doesn't exist on the local host there
will not be an entry in GENTAB for the corresponding device.
The call to CHKGEN in DVSTAS will fail and we bomb the user even
though the network device does exist and is assignable.

[Cure]
At the non-skip return after the call to CHKGEN in DVSTAS
load F with the start of the DDB chain and fall through into
code that will eventually do the right stuff. But! This is not
going to work correctly all the time. If no local line printers
exist and we're trying to find a network printer DDB, we eventually
build a DDB for the network printer and try to link it between
the 'DSK' DDB and the 'SWAP' DDB - ding ding ding, IME. This happens
because LNKDDB in AUTCON likes to keep the DDB chain in sorted order
by device name. So 'LPT' falls between 'DSK' and 'SWAP', but 'DSK'
DDBs are in the hiseg. In order to avoid the wrath of FILSER, change
the name of SWPDDB to 'DSKSWP'.
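
The effect of the rename on a name-sorted chain can be illustrated in
Python.  This assumes the chain is ordered by plain comparison of the
device names, which matches SIXBIT ordering for upper-case letters; the
function link_ddb is a hypothetical stand-in for LNKDDB.

```python
import bisect

def link_ddb(chain: list, name: str) -> int:
    """Insert name into the sorted chain; return the index where it landed."""
    index = bisect.bisect_left(chain, name)
    chain.insert(index, name)
    return index

before = ["DSK", "SWAP"]
after = ["DSK", "DSKSWP"]
pos_before = link_ddb(before, "LPT")   # lands between DSK and SWAP
pos_after = link_ddb(after, "LPT")     # lands after DSKSWP
```

With the old name, LPT is linked between the two disk-related DDBs;
with DSKSWP it sorts after both of them.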

[Keywords]
NETWORK DEVICE

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
Beware file entry required
New development MCO
PCO required

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	433	COMMOD	DEVNAM
		UUOCON	DVSTAS


[End of MCO 14199]

MCO: 14200		Name: LWS/DPM		Date: 20-Apr-89:10:52:06


[Symptom]
Tape UDBs on KS not filled in with prototype data.

[Diagnosis]
AUTUDB doesn't compute ending address for BLT.

[Cure]
ADDI P2,(U)

[Keywords]
KS
SPEAR

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
New development MCO
Field service attention
PCO required
Single-section monitors only

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	434	AUTCON	AUTUD1


[End of MCO 14200]

MCO: 14201		Name: RCB		Date: 25-Apr-89:07:05:00


[Symptom]
FILSER's error reporting leaves too big a window for DAEMON to get stale
information for SPEAR to report.  Not only that, but DAEMON even has to guess
just what kind of error it is supposed to report.

[Diagnosis]
The ERRPT. UUO just doesn't give us enough to work with.  We need to use
system error blocks if we're going to get it right.

[Cure]
Do so.  This adds EX.AVL to the bits which can be set in the transfer table
header by the SEBTBL macro.  If EX.AVL is set, the error entry will be copied
to AVAIL.SYS as well as to ERROR.SYS.  This also changes the way in which all
disks report their errors.  There is now a kontroller dispatch entry, KONELG,
which is used by FILIO to format an error block and queue it up for DAEMON.

[Keywords]
Disk errors
Error logging
DAEMON
System error blocks
SPEAR

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
Checked

[MCO attributes]
Beware file entry required
New development MCO

[BEWARE text]
The format of the DSK KDB has changed again, with the addition of the KONELG
dispatch entry for error logging.  Any local disk device drivers will need to
be changed accordingly.  See MDEELG in FILIO for an example of how to do this.
DAEMON version 23A(1026) or later must be installed before this MCO.

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
704A		FILIO
705	434	COMMON
		DPXKON
		FHXKON
		FSXKON
		RAXKON
		RHXKON
		RNXKON
		RPXKON
		DSXKON
		COMMOD
		DEVPRM
		S
		ERRCON
		DTASER
		COMDEV


[End of MCO 14201]

MCO: 14202		Name: RCB		Date: 25-Apr-89:07:16:53


[Symptom]
Jobs get stuck in event wait for system IPCF, and need manual intervention to
be restarted.  If they were logging out at the time, the job slot is stuck and
useless.

[Diagnosis]
[SYSTEM]GOPHER is completely ignorant of the possibility that a system program
like the account daemon might die and get logged out, thus causing its IPCF
receive queue to be "returned to sender, address unknown".  It just throws the
returned messages on the floor, and leaves the user's job waiting for an
acknowledgement message which will never come.

[Cure]
Educate the rodent.  Check the returned message field, and validate it
against the expected sequence number.  If it matches, give the user an error
return from SENDSP, so that a QUEUE. UUO (for example) will give the "component
not running" error, and FILDAE messages will be handled as though FILDAE had
never been running.

[Keywords]
EW hang
System IPCF wait

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
Checked

[MCO attributes]
New development MCO

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
704A		IPCSER
705	434


[End of MCO 14202]

MCO: 14203		Name: RCB		Date: 25-Apr-89:08:11:52


[Symptom]
System error blocks can eat up all of free core if DAEMON isn't running.

[Diagnosis]
Once they get queued, they are only deleted when some privileged program
executes a SEBLK. UUO.

[Cure]
Add a timer.  Once a minute, we will look for any blocks which are older than
SEBAGE minutes and delete them.  SEBAGE defaults to 10 (decimal), and can be
changed with MONGEN.  If SEBAGE is set to zero, the error blocks will live
forever.
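
The aging rule can be sketched as follows.  This is a Python model
only; representing the queue as a list of (time, payload) pairs is an
assumption, as is the function name.

```python
SEBAGE_DEFAULT = 10   # minutes, per the MCO

def purge_old_blocks(blocks, now_minutes, sebage=SEBAGE_DEFAULT):
    """Return the blocks that survive a once-a-minute purge.

    blocks is a list of (queued_at_minutes, payload) pairs, a hypothetical
    stand-in for the queued system error blocks.
    """
    if sebage == 0:
        return list(blocks)          # zero means blocks live forever
    return [b for b in blocks if now_minutes - b[0] <= sebage]
```

At minute 12, a block queued at minute 0 is dropped while blocks queued
at minutes 5 and 12 are kept; with SEBAGE set to zero, all three remain.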

[Keywords]
System error blocks
free core limits

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
Checked

[MCO attributes]
New development MCO
Documentation change

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	434	ERRCON
704A		COMMON
		CLOCK1


[End of MCO 14203]

MCO: 14204		Name: DPM		Date: 27-Apr-89:06:36:33


[Symptom]
     In some configurations, LINK will report NDBNNM as undefined even
though ANF-10 network software is loaded.

[Diagnosis]
     This problem is one of programming style  and  MACRO's  tolerance
for  conflicting  symbol  definitions.   NDBNNM  is defined in NETPRM,
which is searched by NETSER.  The first  several  references  to  this
symbol are properly made.  However, at NETAS1 the symbol is referenced
as external.   MACRO  should  probably  flag  this  as  an "E"  error.
Instead,  the original value of the symbol is lost and MACRO generates
global fixup requests for all references to NDBNNM.   It's  not  clear
why  this  problem  has  surfaced  now,  as the code at NETAS1 has not
changed for several monitor releases, but correcting the reference  in
NETAS1 resolves the undefined global.

[Cure]
     Reference NDBNNM as an internal quantity.

[Keywords]
UNDEFINED GLOBAL

[Related MCOs]
13932, 13137

[Related SPRs]
36260

[MCO status]
None

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	435	NETSER	NETAS1
704A	


[End of MCO 14204]

MCO: 14205		Name: RCB		Date: 27-Apr-89:18:49:36


[Symptom]
MCO 14165 didn't go far enough.  PAGE. UUO function .PAGAC still isn't always
right.  Spy pages for sections 3-36 are sometimes reported as being unreadable.

[Diagnosis]
PAGA93, which finds a page number to return for a spy page in sections 3-36,
doesn't preserve T2.  Its caller wants T2 to contain the map entry after the
call, as well as before.

[Cure]
Preserve the map entry in T2.

[Keywords]

PAGE. UUO
Page accessibility

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
Checked

[MCO attributes]
New development MCO

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	435	VMSER	PAGA93
704A	


[End of MCO 14205]

MCO: 14206		Name: DPM		Date: 28-Apr-89:05:51:47


[Symptom]
Stopcode IME removing a structure (revisited).

[Diagnosis]
Previous MCOs didn't plug all the holes, although the window was made
much smaller.

[Cure]
Prevent races by incrementing UNIGEN while holding the DA.  Conceptually,
this is easy, but BMPGEN is called with F pointing to a STR, not a DDB.
Therefore, change UPDA & DWNDA to get the job number from .USJOB rather
than from PJOBN.  This is OK since the use of DA requires a job to be
mapped to reference PJOBN anyway.

[Keywords]
REMOVE STRUCTURE

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
New development MCO

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	435	FILFND	BMPGN1
704A		FILIO	UPDA,DWNDA


[End of MCO 14206]

MCO: 14207		Name: LWS		Date: 29-Apr-89:14:48:25


[Symptom]
Can't create an SSL larger than .SLMXJ (maximum JSL size)
structures using STRUUO.

[Diagnosis]
Code in SLSTRR and SLCHK always uses .SLMXJ as a maximum
without checking to see if it's a JSL or the SSL.

[Cure]
Check search list type (RH(F)=0 means SSL) and use appropriate
maximum value.
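
The check can be sketched as follows.  This is an illustration in
Python, not the monitor's MACRO code.  The numeric limits and the name
SLMXS are assumptions; the MCO gives neither the value of .SLMXJ nor
the name of the SSL maximum.

```python
# Hypothetical limits; the real JSL and SSL maxima are not given here.
SLMXJ = 9    # assumed maximum structures in a job search list (JSL)
SLMXS = 36   # assumed maximum structures in the system search list (SSL)


def max_structures(rh_f: int) -> int:
    """Return the limit for the search list selected by RH(F)."""
    # RH(F) = 0 identifies the SSL; anything else is treated as a JSL.
    return SLMXS if rh_f == 0 else SLMXJ


def check_list_size(rh_f: int, count: int) -> bool:
    """True if a list of `count` structures fits its own maximum."""
    return count <= max_structures(rh_f)
```

With these assumed values, a 20-structure list is accepted for the SSL
and rejected for a JSL.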

[Keywords]
SSL

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
New development MCO

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	435	FILFND	SLSTRR,SLCHK

704A	


[End of MCO 14207]

MCO: 14209		Name: DPM		Date:  8-May-89:08:57:19


[Symptom]
Pathological names whose first component is NUL do not necessarily
behave as the NUL device.  A DEVCHR or DEVTYP of the sixbit name
returns disk-only bits.  The same is true if you do one of these
UUOs on an open channel.  However, if a LOOKUP or ENTER is done,
then the right thing comes back.  Also, DEVNAM never returns NUL
and WATCH FILES doesn't expand the filespec correctly.

[Diagnosis]
The monitor believes a pathological name can only be a disk device
and everybody knows that NUL is really a disk even though it claims
to be all devices.  But FILSER doesn't make that claim often enough.

[Cure]
Fix SETDDB to test for pathological NUL as well as assigned NUL.  Change
NULTST to test for DVDSK and DVTTY instead of sixbit NUL.  Fix PRTDDB
to print NUL instead of a logical device name.  Add crock routine
LNMNUL to do the grunt work when it's really necessary to know if
it's the NUL device.

[Keywords]
NUL

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
New development MCO

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	436	COMCON	PRTDDB
		FILUUO	NULTST,LNMNUL,SETDDB
		UUOCON	DVCHR,UDVNAM

704A	


[End of MCO 14209]

MCO: 14211		Name: DPM		Date: 15-May-89:09:29:52


[Symptom]
It's difficult to measure magtape performance on a per-kontroller
basis without using any counters.

[Diagnosis]
Never done before I guess.

[Cure]
Add two new counters to the KDB:  TKBCRD counts characters read and
TKBCWR counts characters written.

[Keywords]
MAGTAPE PERFORMANCE

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
New development MCO

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	437	DEVPRM	TKBCRD,TKBCWR
		T78KON
		TCXKON
		TD2KON
		TM2KON
		TMXKON
		TS1KON
		TX1KON

704A	


[End of MCO 14211]

MCO: 14212		Name: LWS		Date: 15-May-89:10:21:02


[Symptom]
Undeserved ?Illegal memory reference in jobs with a shared
hiseg.

[Diagnosis]
If a sharable hiseg is expanding and there are enough
secondary map slots available to map the expansion, RDOMP is not
set for any other job using the same hiseg.

[Cure]
In GTHMAP, if there are enough map slots for the expansion, call
HRDOMP via HGHAPP so all other users of the same hiseg will have their
maps redone before they run again.

[Keywords]
Sharable high segments

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
New development MCO
PCO required

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	437	SEGCON	GTHMP1

704A	


[End of MCO 14212]

MCO: 14214		Name: JAD		Date: 25-May-89:07:51:12


[Symptom]
Possible SCAFOO stopcodes in a maximally-configured CI network.

[Diagnosis]
Insufficient path blocks available for the number of CI nodes and
CPUs in the CI/system configuration.  There is space available for
32 path blocks, but a maximally-configured system could require
many more.  Problem occurs with definition of C%PBLL (number of
path blocks) - it is defined as 2*C%SBLL (number of system blocks).
Depending on the number of CI nodes and CPUs, this definition may
leave insufficient path blocks.

[Cure]
Redefine C%PBLL as 6*C%SBLL - this will allow for the largest
possible CI and CPU configuration.
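
The arithmetic behind the change, as a small Python sketch.  The
value of C%SBLL is inferred rather than quoted: the diagnosis says
the old definition (2*C%SBLL) leaves room for 32 path blocks, which
implies C%SBLL = 16.

```python
# Inferred from the diagnosis: 2*C%SBLL == 32 path blocks.
OLD_PATH_BLOCKS = 32
C_SBLL = OLD_PATH_BLOCKS // 2      # 16 system blocks (an inference)

old_c_pbll = 2 * C_SBLL            # old definition
new_c_pbll = 6 * C_SBLL            # definition after this MCO

print(old_c_pbll, new_c_pbll)      # prints "32 96"
```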

[Keywords]
CI
SCAFOO

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
New development MCO
KL10 only

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705		SCAPRM	C%PBLL


[End of MCO 14214]

MCO: 14217		Name: DPM/RJF		Date: 30-May-89:06:58:55


[Symptom]
Various problems suspending and resuming a system:

1. KLIPAs and KLNIs don't get reloaded.

2. KLNIs will be restarted even if they had first been removed.

3. Stopcode NULFNC during the suspend.

[Diagnosis]
1. Code to call PPDINX and KNIINI is under an IFG <M.CPU-1> conditional
   in COMMON, so it is not included in single CPU configurations.

2. When a KLNI is removed, the bit corresponding to the proper KLNI on
   a given CPU is set to indicate that the device is to be ignored on
   subsequent initialization calls.  However, IPAMSK is never checked
   on KLNI restarts.

3. For reasons that escape me, the NULFEK is being called on system
   sleep/resume when it hadn't before.  Apparently this never worked
   before, but it went unnoticed.  The dispatch table does not contain
   the appropriate entries for these NETSER functions.

[Cure]
1. Move the calls to PPDINX and KNIINI outside the IFG <M.CPU-1> conditional.

2. Teach KNIINI to respect IPAMSK on KLNI restarts.

3. Add system sleep/resume entry point to NULFEK's dispatch table.

[Keywords]
SYSTEM SLEEP

[Related MCOs]
13932, 13137

[Related SPRs]
36269

[MCO status]
None

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	441	COMMON	SPRIN5
		KNISER	KNIINI
		NULFEK	NLFDSP

704A	


[End of MCO 14217]

MCO: 14218		Name: LWS		Date:  8-Jun-89:09:08:11


[Symptom]
Undeserved memory parity errors on KLs with 4MW of memory.

[Diagnosis]
RH20s do undetermined things when accessing the last physical
(quad)word in 4MW. This is an RH20 problem. This problem was never
encountered in previous versions of the monitor and BOOT. The monitor
used to put its hiseg at the very top of memory. Then BOOT occupied the
top of memory. Now, BOOT is still there, but it now frees the pages
at the top because they contain tape drivers that are not needed once
BOOT is done. So, these pages at the top of memory are free to use
by the monitor. When a user gets the last page of memory, it's fair
game for I/O by an RH20.

[Cure]
For lack of something better to do at the moment, if the last
page of a 4MW system is free, mark it as nonexistent in NXMTAB and PAGTAB
and set MEMSIZ to 17,,777000 instead of 20,,000000.
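
The two MEMSIZ values differ by exactly one 512-word page.  A short
Python check of the octal arithmetic, using a helper (halfwords, a
name invented for this sketch) to print the left,,right notation:

```python
PAGE_WORDS = 0o1000          # 512 words per page


def halfwords(value: int) -> str:
    """Format a value in 'left,,right' octal halfword notation."""
    left, right = value >> 18, value & 0o777777
    return f"{left:o},,{right:06o}"


four_mw = 4 * 1024 * 1024    # 4 megawords

print(halfwords(four_mw))                # 20,,000000
print(halfwords(four_mw - PAGE_WORDS))   # 17,,777000
```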

[Keywords]
4 MW
parity

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
New development MCO
Field service attention
KL10 only

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	442	SYSINI	MMTIN9

704	


[End of MCO 14218]

MCO: 14219		Name: DPM		Date: 27-Jun-89:06:37:34


[Symptom]
There appears to be no upper bound on the number of extended RIBs
FILSER is content to create.  You can literally fill a disk with
extended RIBs for a single file.  When you CLOSE the file, you might
as well take the rest of the day off, because FILSER has lots of
bookkeeping to perform.

[Diagnosis]
RIBXRA contains an 8-bit field for the extended RIB number.  FILSER
never checks for field wrap around.  The RIB number is only read back
when a user specifies a negative USETI, and otherwise serves no real
purpose.

[Cure]
Check for wrap around and impose an additional limit based on the
contents of MUSTMX when RIBs are created.  Set the maximum number
of USETIs to 255 decimal.
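
A minimal Python sketch of a wrap-around check on an 8-bit counter.
The function name and the `limit` parameter are hypothetical; `limit`
stands in for the MUSTMX-based limit, whose derivation the MCO does
not describe.

```python
RIB_NUMBER_BITS = 8                            # field width in RIBXRA
MAX_RIB_NUMBER = (1 << RIB_NUMBER_BITS) - 1    # 255 decimal


def next_rib_number(current: int, limit: int = MAX_RIB_NUMBER) -> int:
    """Return the next extended RIB number, or raise if none is allowed."""
    # Never let the number exceed what the 8-bit field can hold.
    ceiling = min(limit, MAX_RIB_NUMBER)
    if current >= ceiling:
        raise OverflowError("extended RIB limit reached")
    return current + 1
```

Refusing the increment at the ceiling is what keeps the field from
silently wrapping back to zero.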

[Keywords]
EXTENDED RIBS

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
New development MCO

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	443	COMMOD	DESRBC,MUSTMX
		FILIO	EXTRB2

704A	


[End of MCO 14219]

MCO: 14220		Name: RCB		Date: 30-Jun-89:00:15:53


[Symptom]
An OPEN which specifies a logical name or a pathological name can fail or find
the wrong device.

[Diagnosis]
The DDB search logic does not allow certain names to be found unless they are
assigned to disks (i.e., funny-space DDBs).  CK2CHR gets called when it should
not.  For that matter, LP will match a terminal assigned as LPT but not as LP.

[Cure]
For 2-character device names which CK2CHR changes, do the DDB searching twice.
First, try the original name.  If that fails or returns DSKDDB, then try again
with the expanded name.  If the second search fails but the first returned
DSKDDB, then return the results from the first DDB search.  Eliminate the hacks
for CK2CHR and SY: from the search loop.
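
The two-pass search can be sketched in Python as below.  The
dictionary of DDBs, the EXPANSIONS table, and the function name are
illustrative assumptions; only the order of the decisions is taken
from the cure text.

```python
DSKDDB = "DSKDDB"   # stand-in for the generic disk DDB

# Hypothetical 2-character expansion of the kind CK2CHR performs.
EXPANSIONS = {"LP": "LPT"}


def find_ddb(name, ddbs):
    """Two-pass lookup; `ddbs` maps device names to DDBs."""
    first = ddbs.get(name)
    expanded = EXPANSIONS.get(name)
    if expanded is None:
        return first                 # CK2CHR would not change this name
    if first is not None and first != DSKDDB:
        return first                 # original name matched a real device
    second = ddbs.get(expanded)
    if second is not None:
        return second                # expanded name matched
    return first                     # DSKDDB from pass one, or None
```

A terminal assigned the logical name LP is therefore found on the
first pass, before LP is ever expanded to LPT.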

[Keywords]
PDP-11 names

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
Checked

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	444	UUOCON	DDBSCC
704A	


[End of MCO 14220]

MCO: 14221		Name: RCB		Date:  1-Jul-89:01:56:17


[Symptom]
KAF at PI level of the NIA20.

[Diagnosis]
Taking too long to empty the response queue (MCO 14186 revisited).

[Cure]
Check .CPTMF to try to be sure that too much time won't pass during a single
KLNI interrupt.  Also, move the check to after the callback so that we don't
drop the buffers on the floor.  Otherwise, after long enough, the protocols
will run out of buffers (especially DECnet).

Because .CPTMF is slightly bogus just as the system is coming up, ignore it
until .CPUPT is at least 2 (ticks).  Note that the counters and limits added
by MCO 14186 are still present and in force.

[Keywords]
KAF
NIA20
KNIKSP

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
Checked

[MCO attributes]
Field service attention
HOSS attention
KL10 only

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	444	KNISER	KNIRQ1
704A	


[End of MCO 14221]

MCO: 14222		Name: RCB		Date:  7-Jul-89:23:34:54


[Symptom]
System is annoyingly sluggish at system startup time.

[Diagnosis]
Trying to run dozens of copies of INITIA on random terminals all at the same
time, in dozens of job slots.

[Cure]
Only sort of.  Invent a new MONGEN-definable symbol, DSDRIC (dataset devices
run INITIA CUSP), to control whether INITIA runs on dataset lines.  It will
default to one, which means that INITIA will continue to run on datasets at
system startup.  If set to zero at MONGEN time, TTYINI will not force INITIA
commands on the datasets.  For the curious, the reason INITIA runs on datasets
at startup time is because of the existence of hardware interfaces which need
to have parameters set even before a call comes in to the modem.  However, most
sites probably have more well-behaved interfaces, and will be able to set
DSDRIC to zero.

[Keywords]
sluggish startup
INITIA
datasets

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
Checked

[MCO attributes]
New development MCO
Documentation change

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	444	COMDEV	DSCTAB
704A		SCNSER	TTINI2


[End of MCO 14222]

MCO: 14223		Name: DPM		Date: 18-Jul-89:05:32:00


[Symptom]
DA28s don't work.

[Diagnosis]
XTCLNK assigns junk names to UDBs.  Later calls to build DDBs fail
because the target UDBs cannot be found.  Also, XTCSER will not
assemble with FTMP turned off because of references to SCNLOK and
OUCHE.

[Cure]
Correct logic that builds UDB names.  Put IFN FTMP conditionals
around the reference to SCNLOK.  Make OUCHE available in all KL10
configurations.

[Keywords]
DA28

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	445	APRSER	OUCHE
		COMMON	OUCHTB
		XTCSER	XTCLN2,CHKTYP,MPIOWD

704A	


[End of MCO 14223]

MCO: 14224		Name: DPM		Date: 20-Jul-89:11:58:59


[Symptom]
Random job tables (mostly JBTSTS) get clobbered, weird crashes,
general mayhem.

[Diagnosis]
Steve Perkins is running .EXE files created on the -20 again.
If the .EXE directory claims to have sharable pages that aren't
also marked as high segment pages, GETEXE returns flags indicating
the image is sharable, but with no high segment.  Parts of GET
clean up assume that if the sharable bit is on, then there must
be a high segment.  This is true for .EXE files created on a -10,
but not otherwise.  Anyway, making this assumption, SEGCON blindly
picks up high seg block addresses (which are usually zero) and
indexing off of zero, proceeds to write all over the monitor's
low segment.

[Cure]
While processing .EXE directory entries, turn off the sharable
bit if the high segment bit is not turned on.
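
In outline, the fix is a single flag test.  A Python sketch, with
bit values chosen only for illustration (the real .EXE directory
flag positions are not given in this MCO):

```python
# Hypothetical flag bits for one .EXE directory entry.
SHARABLE = 0b01
HIGH_SEG = 0b10


def sanitize_flags(flags: int) -> int:
    """Clear the sharable bit unless the high segment bit is also set."""
    if flags & SHARABLE and not flags & HIGH_SEG:
        flags &= ~SHARABLE
    return flags
```

After this step, "sharable" always implies "has a high segment", which
is the assumption the rest of the GET cleanup relies on.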

[Keywords]
TOPS-20 EXE FILES

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	446	SEGCON	WANTIT
704A	


[End of MCO 14224]

MCO: 14225		Name: DPM		Date: 25-Jul-89:05:28:01


[Symptom]
SA10s don't function in an environment where DF10C-based device drivers
exist (TM2KON for one).

[Diagnosis]
DF10C drivers fail to test for the presence of SA10 devices.  Therefore,
SA10s look like 18-bit DF10s.

[Cure]
Test SI.SAX in the CONI word in the appropriate xxxCFG routines.

[Keywords]
SA10

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	446	FSXKON	FSXCFG
		RPXKON	RPXCFG
		TM2KON	TM2CFG

704A	


[End of MCO 14225]

MCO: 14226		Name: DPM		Date:  3-Aug-89:09:18:56


[Symptom]
Several annoying problems that prevent SA10-based tape from working well.

[Diagnosis]
1. SAXSER & TS1KON bum a bit in the KDBUNI word to indicate a
   software interrupt was requested.  This means that KDBs can't be
   compared against each other, so AUTCON will build multiple KDBs
   for a single SA10 kontroller.

2. Tapes ported between a DX10 or a DX20 and an SA10 will have duplicate
   UDBs and DDBs built.  This is because TD2KON and TX1KON do not know
   how to extract drive serial numbers.  Subsequent comparisons between
   a drive S/N and an existing one don't match, so AUTCON believes it's
   looking at two different drives.

3. The code to compare drive serial numbers is not interlocked in AUTCON.
   Under the right circumstances, two configuring CPUs which have detected
   the same drive might not notice each other.

[Cure]
1. Move the software bit into KDBSTS.  It's a better place for such
   things.

2. Fix TX1KON and TD2KON.

3. SYSPIF/SYSPIN around much of AUTDPU.
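
Item 3 is the classic compare-then-insert race.  The Python sketch
below is only an analogy: a threading.Lock plays the role of the
SYSPIF/SYSPIN interlock, and the DriveTable class is invented for the
illustration.

```python
import threading


class DriveTable:
    """Toy stand-in for the list of drives already configured."""

    def __init__(self):
        self._lock = threading.Lock()   # analogue of SYSPIF/SYSPIN
        self.serials = []

    def configure(self, serial: int) -> bool:
        """Add a drive once; return False if it is already known."""
        with self._lock:
            # Compare and insert as one step, so two configuring CPUs
            # cannot both conclude that the drive is new.
            if serial in self.serials:
                return False
            self.serials.append(serial)
            return True
```

Without the lock, two callers could both pass the comparison before
either one records the serial number, and the drive would be built
twice.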

[Keywords]
SA10

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	447	AUTCON	AUTDPU
		DEVPRM	KD.SIR
		SAXPRM	SA.SIR
		SAXSER	SAXINT
		TD2KON	TD2DRV
		TS1KON	TS1DRV
		TX1KON	TX1DRV

704A	


[End of MCO 14226]

MCO: 14227		Name: DPM		Date:  3-Aug-89:09:20:23


[Symptom]
Possible tape hangs after a CPU restart.

[Diagnosis]
SPRINI doesn't clear the TAPSER interlock nesting flag.

[Cure]
Do so.

[Keywords]
INTERLOCKS

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	447	COMMON	SPRI10
704A	


[End of MCO 14227]

MCO: 14228		Name: RCB		Date:  3-Aug-89:15:33:21


[Symptom]
Problems setting explicit speeds on TTY lines in the ANF front ends.

[Diagnosis]
Trying to do autobaud even though the speed has been set to something
other than the autobaud speed.

[Cure]
Don't do that.  If the speed is set in the config.P11 file, and that speed
is not the autobaud speed (currently 2400 baud), override the ABD
characteristic for the line.
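
The rule reads naturally as a small predicate.  A Python sketch; the
function and parameter names are hypothetical, and the real change
lives in the .P11 front-end sources rather than in code like this.

```python
AUTOBAUD_SPEED = 2400   # current autobaud speed, per the cure text


def effective_autobaud(abd: bool, configured_speed=None) -> bool:
    """Return whether the line should still autobaud.

    `configured_speed` is None when the configuration sets no speed.
    """
    if configured_speed is not None and configured_speed != AUTOBAUD_SPEED:
        return False      # an explicit speed overrides ABD
    return abd
```

A line configured at the autobaud speed itself keeps whatever ABD
setting it already had.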

[Keywords]
Autobaud
Non-autobaud
TnXS
ANF10

[Related MCOs]
13932, 13137

[Related SPRs]
36270, 36268

[MCO status]
None

[MCO attributes]
Field service attention
HOSS attention

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	450	CONFIG	P11
704A		DNTTY	P11
		DNLBLK	P11
		MACROS	P11


[End of MCO 14228]

MCO: 14229		Name: RCB		Date:  8-Aug-89:22:22:51


[Symptom]
Monitor too big and slow.  Not enough free bits in JBTSTS.

[Diagnosis]
Lots of places in the monitor test bit JDC from JBTSTS.  A few
others clear it.  Only DAECOM can set it, and DAECOM is unreachable code, left
over from the old DCORE and DUMP commands and the days when DAEMON handled
virtual references for EXAMINE, DEPOSIT, and VERSION commands.  The JDC bit is
consequently never set, and all the tests for it are redundant.

[Cure]
Free up the bit in JBTSTS, and eliminate all references to it.

[Keywords]
PERFORMANCE

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
Checked

[MCO attributes]
New development MCO
Documentation change

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	450	COMCON
704A		CLOCK1
		SCHED1
		SCNSER
		S


[End of MCO 14229]

MCO: 14230		Name: DPM		Date: 15-Aug-89:06:36:14


[Symptom]
More error logging stuff ...

[Diagnosis]
Yes.

[Cure]
Convert more old-style DAEMON error logging calls to use the
System Error Blocks.  This edit converts:

1. CPU attached/detached records.
2. Node online/offline records.
3. Date/time change records.

Code is also in place to handle system reload (.ERWHY) records, but
because of interface problems with DAEMON and AVAIL.SYS, this call
will be temporarily neutered.

[Keywords]
DAEMON ERROR LOGGING

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
New development MCO

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	450	COMCON	SETDAT
		CPNSER	CPUCSC
		NETSER	NODEAM
		S	.ERMRV
		SYSINI	SYSRLD,SYSAVL

704A	


[End of MCO 14230]

MCO: 14233		Name: RCB		Date: 22-Aug-89:09:55:53


[Symptom]
Undeserved KNIKSP stopcodes.

[Diagnosis]
 .CPTMF limit is exceeded at system startup time.

[Cure]
If .CPUPT is lower, then don't KNIKSP.

[Keywords]

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
KL10 only

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	451	KNISER	KNIRQ1
704A	


[End of MCO 14233]

MCO: 14234		Name: DPM		Date: 23-Aug-89:07:34:30


[Symptom]
Programs using external tasks (XTCSER) hang following attempts to JAM
powered off remote computers.

[Diagnosis]
If FTMP is turned off, the call to CHKTYP from DWNUNI says to never do
typeout.  DWNUNI simply returns without clearing any DA28 errors which
caused the unit to be declared down.  Thus, the DA28 becomes unusable
for all other users.  A similar situation exists where connect errors
are processed.  In this case, we forget to force the unit offline.

[Cure]
Three things.  First, fix CHKTYP to work correctly with FTMP turned off.
Second, if no typeout is to be done, skip around the message generation
code and clear the DA28.  Finally, on connect errors, always force the
unit offline whether or not we'll type a message.

[Keywords]
DA28 ERRORS

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	451	XTCSER	CHKTYP,DWNUNI,CHKCER
704A	


[End of MCO 14234]

MCO: 14235		Name: DPM		Date: 29-Aug-89:07:32:01


[Symptom]
More error logging stuff.

[Diagnosis]
Yes.

[Cure]
Teach the monitor to write the following records as system
error blocks:

	.ERCSC	Configuration status change (memory on/off line)
	.ERKSN	KS10 NXM trap
	.ERKPT	KL10/KS10 parity trap
	.ERCSB	CPU status block
	.ERDSB	Device status block

[Keywords]
DAEMON ERROR LOGGING

[Related MCOs]
13932, 13137

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
New development MCO

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	452	APRSER	PRHMF7,DAELOG,MEMELG
		COMCON	MEMONU,MEMON8
		COMMON	OLDNXM,DIACSB,DIADSB
		LOKCON	MEMOFU,MEMOF2

704A	


[End of MCO 14235]

MCO: 14238		Name: JC		Date:  1-Sep-89:13:34:18


[Symptom]
TOPS-10 is missing the TRANSLate command.

[Diagnosis]
No one ever put it in.

[Cure]
Add one.

[Keywords]
TRANSL
LOGIN
commands

[Related MCOs]
None

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
New development MCO
Documentation change

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	453	COMMON	COMTAB


[End of MCO 14238]

MCO: 14240		Name: DPM		Date:  5-Sep-89:05:41:06


[Symptom]
More error logging stuff.

[Diagnosis]
Yes.

[Cure]
1. Add support for .ERSNX (NXM sweep).

2. Add support for .ERSPR (parity sweep).

3. Turn on .ERWHY/.ERMRV logging.

[Keywords]
ERROR LOGGING

[Related MCOs]
None

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
Beware file entry required

[BEWARE text]
DAEMON version 23A(1027) or later is required.  Earlier versions
will cause .ERMRV records to be written into ERROR.SYS instead of
AVAIL.SYS.  When this happens, SPEAR will report an unknown record
type in ERROR.SYS.

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	453	ERRCON	PARSWP,PARELG,NXMSWP,NXMELG,XFRSE2
		S	EX.NER
		SYSINI	LLMSTR,AVLTBL

704A	


[End of MCO 14240]

MCO: 14241		Name: DPM		Date:  6-Sep-89:07:23:57


[Symptom]
Stopcode OVA on a KS10 during SYSINI.

[Diagnosis]
EVA pages overflow BOOT address space because the high segment
grew a bit.

[Cure]
Slide the high segment origin down 2 pages.

[Keywords]
HIGH SEGMENT

[Related MCOs]
None

[Related SPRs]
None

[MCO status]
None

[MCO attributes]
None

[Validity]

Monitor	 Load	Module	 Tags
-------	------	------	------
705	453	COMMON	MONORG
704A	


[End of MCO 14241]