BB-JR93N-BB_1990 - 10,7/mon/monitr.mco
MCO: 14111 Name: JMF Date: 1-Sep-88:07:17:30
[Symptom]
Patches made to virtual user mode programs with FILDDT disappear.
[Diagnosis]
If the patch happens to get made to a write locked page, the page
doesn't get written to the swapping space the next time the job gets
swapped out.
[Cure]
If the job and the page are in core and the page is write locked,
write enable the page and decrement .USWLP before copying the data
from the patcher to the patchee.
[Keywords]
JOBPEK
[Related MCOs]
None
[Related QARs]
None
[MCO status]
Deferred
[MCO attributes]
New development MCO
QAR answer
[Validity]
Monitor  Load    Module  Tags
-------  ------  ------  ------
705      410     UUOCON  JOBPK3,JPKLW?
                 VMSER   RTNFS0
[End of MCO 14111]
MCO: 14127 Name: JMF Date: 27-Sep-88:05:23:44
[Symptom]
Non-zero section address break doesn't work as expected.
[Diagnosis]
1) Section number gets lost in DATAO APR, in SSEUB.
2) SET BREAK command changing conditions but not break address zaps section
number.
[Cure]
1) DATAO APR,.CPAPR
2) DPB rather than various flavors of HLLxy.
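The difference between the two approaches in item 2 can be sketched in Python. The field layout used here (6 condition bits at the top of a 36-bit word, followed by a 30-bit address) is an assumption made for illustration and is not taken from the monitor sources; hll and dpb are hypothetical helpers that model the effect of the HLL and DPB instructions.

```python
WORD_MASK = (1 << 36) - 1          # PDP-10 words are 36 bits wide
LEFT_HALF = 0o777777 << 18         # the left half of a word

# Assumed layout, for illustration only: 6 condition bits at the top of
# the word, then a 30-bit address (12-bit section number + 18-bit offset).
COND_SHIFT = 30
COND_MASK = 0o77 << COND_SHIFT


def hll(dest, src):
    """Model of HLL: replace the whole left half of dest with that of src."""
    return (dest & ~LEFT_HALF & WORD_MASK) | (src & LEFT_HALF)


def dpb(dest, value, shift, mask):
    """Model of DPB: deposit value into one field, leaving other bits alone."""
    return (dest & ~mask & WORD_MASK) | ((value << shift) & mask)


def section(word):
    """Extract the assumed 12-bit section number."""
    return (word >> 18) & 0o7777


old = (0o12 << COND_SHIFT) | (0o3 << 18) | 0o1234   # break set in section 3
new_conditions = 0o40

clobbered = hll(old, new_conditions << COND_SHIFT)   # section number is lost
preserved = dpb(old, new_conditions, COND_SHIFT, COND_MASK)

print(section(clobbered), section(preserved))
```

Because the half-word move replaces all 18 bits of the left half, any section number stored there goes with it; the byte deposit touches only the condition field.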
[Keywords]
extended addressing
address break
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Deferred
[MCO attributes]
New development MCO
KL10 only
Extended addressing only
[Validity]
Monitor  Load    Module  Tags
-------  ------  ------  ------
705      410     APRSER  SSEU2
                 COMCON  SETB10
[End of MCO 14127]
MCO: 14131 Name: ERS Date: 13-Oct-88:09:41:14
[Symptom]
Not all known bad areas on a disk are known to the monitor. An IME is
possible, but unlikely.
[Diagnosis]
When we're scanning the BAT blocks we first figure out how many we
have to scan. To get this we add the number of bad regions the monitor found
to the number of areas the disk started with (bad regions found by the various
diagnostic programs). However, the latter we get by indexing off of T3. T3
happens to point to outer space.
[Cure]
Set up T3.
[Keywords]
Bad regions
Swap read errors?
[Related MCOs]
None
[Related QARs]
None
[MCO status]
None
[MCO attributes]
PCO required
QAR answer
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 REFSTR SCNBAT
[End of MCO 14131]
MCO: 14132 Name: JAD Date: 13-Oct-88:10:47:42
[Symptom]
Possible inconsistent runtimes on the KL (MCO 13856 revisited).
[Diagnosis]
Forgot one case where "Inhibit Update" was set needlessly.
[Cure]
Clean it up.
[Keywords]
RUNTIME
[Related MCOs]
13856
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 APRSER SSEUB
[End of MCO 14132]
MCO: 14133 Name: JAD Date: 13-Oct-88:10:55:32
[Symptom]
Protocol pause doesn't exist under secondary protocol, but DTESER
doesn't check before trying to effect protocol pause.
[Diagnosis]
Missing test.
[Cure]
Test ED.PPC at SETPP before doing anything rash.
[Keywords]
PROTOCOL PAUSE
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 DTESER SETPP
[End of MCO 14133]
MCO: 14134 Name: JAD Date: 14-Oct-88:10:57:48
[Symptom]
(Unsupported) feature to print PC during SET WATCH FILE output gets
wrong PC during RENAME.
[Diagnosis]
PATH UUO done by PTHFIL blows away .USMUO.
[Cure]
Use JOBPDO+1 for PC.
[Keywords]
WATCH FILE
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 UUOCON WCHPCP
[End of MCO 14134]
MCO: 14135 Name: JAD Date: 14-Oct-88:11:04:43
[Symptom]
Including expensive "want to run" time calculation is an all or
nothing proposition.
[Diagnosis]
Either you JFCL RQTPAT or you don't. If you do, it happens
every tick.
[Cure]
Invent a MONGEN-definable symbol M.NRQT which is the number of
ticks between the "want to run" time calculation. If zero, the
expensive calculation is never done. Patchable on the fly by
twiddling a variable in SCHED1.
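A minimal sketch of the interval idea, written in Python rather than MACRO-10. The class and method names are hypothetical; only the behavior (run the calculation every N ticks, never when N is zero) comes from the description above.

```python
class WantToRunTimer:
    """Run an expensive calculation only every `interval` clock ticks."""

    def __init__(self, interval):
        self.interval = interval      # plays the role of M.NRQT; 0 disables
        self.countdown = interval
        self.calculations = 0

    def expensive_calculation(self):
        self.calculations += 1        # stand-in for the real work

    def tick(self):
        if self.interval == 0:        # zero: the calculation is never done
            return
        self.countdown -= 1
        if self.countdown <= 0:
            self.countdown = self.interval
            self.expensive_calculation()


def run(interval, ticks):
    """Count how many calculations happen over a number of ticks."""
    timer = WantToRunTimer(interval)
    for _ in range(ticks):
        timer.tick()
    return timer.calculations


print(run(0, 60), run(1, 60), run(6, 60))
```

Because the interval is re-read on every tick, changing it while the system runs takes effect at the next tick, which mirrors the "patchable on the fly" property.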
[Keywords]
WANT TO RUN TIME
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor  Load    Module  Tags
-------  ------  ------  ------
705      410     COMMON  M.NRQT
                 SCHED1  RQTPAT
[End of MCO 14135]
MCO: 14136 Name: DPM Date: 18-Oct-88:03:04:41
[Symptom]
Giving up the CX resource for the wrong job.
[Diagnosis]
In CTXSER when setting context and saved page quotas, we get the CX
resource if the target job is not ourselves. This works just fine
because the purpose of the CX is to prevent a context block or PDB
from changing out from under us. However at completion of the UUO
function, we only give back the CX if we were changing our quotas.
[Cure]
Only give back the CX if the target is not ourselves.
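The mismatch between the acquire and release tests can be sketched as follows. CxResource and set_quotas are hypothetical stand-ins, not CTXSER routines; the fixed flag selects between the corrected test and the inverted one described in the diagnosis.

```python
class CxResource:
    """Toy stand-in for the CX resource: counts gets and gives."""

    def __init__(self):
        self.held = 0

    def get(self):
        self.held += 1

    def give(self):
        if self.held == 0:
            raise RuntimeError("giving up a CX resource we do not own")
        self.held -= 1


def set_quotas(cx, my_job, target_job, fixed=True):
    """Return how many CX holds remain after the quota change."""
    need_cx = target_job != my_job
    if need_cx:
        cx.get()
    # ... the quota change itself would happen here ...
    if fixed:
        release = need_cx                 # same test as the acquire
    else:
        release = target_job == my_job    # inverted test (the bug)
    if release:
        cx.give()
    return cx.held


print(set_quotas(CxResource(), my_job=5, target_job=7))
```

With the inverted test, changing another job's quotas leaves the resource held, and changing one's own quotas tries to give back a resource that was never taken.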
[Keywords]
CONTEXTS
[Related MCOs]
11102
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 CTXSER XITQTA
704A
703A
[End of MCO 14136]
MCO: 14137 Name: LWS/DPM Date: 19-Oct-88:18:30:13
[Symptom]
Autoconfiguring of -20F devices works by sheer luck or
doesn't work at all. When it does work, the devices sometimes work.
[Diagnosis]
1. We send the "request for device status" msg to -20F in
the wrong format, i.e. 0 byte,,unit # byte.
2. In DCRSER and DLPSER we use the wrong half of an AC to pick up
FE device unit number.
3. We "timeshare" the same word in the device DDB for two different
things.
[Cure]
1. Change FNCTAB dispatch of .EMRDS msg to use "line/data"
format instead of "line" format. This causes msg to be sent in
correct format, i.e. unit # byte,,0 byte.
2. HRRx's --> HLRx's
3. .ORG DEVLSD's --> .ORG DEVLEN's
[Keywords]
FE devices
RSX20F
printers
readers
[Related MCOs]
None
[Related QARs]
None
[MCO status]
None
[MCO attributes]
KL10 only
QAR answer
[Validity]
Monitor  Load    Module  Tags
-------  ------  ------  ------
705      410     DTESER  FNCTAB
                 DLPSER  DLPDT1,.ORG
                 DCRSER  DCRDT1,.ORG
[End of MCO 14137]
MCO: 14138 Name: LWS Date: 19-Oct-88:19:06:10
[Symptom]
MCO 14126 incomplete
[Diagnosis]
In TPDSMM/CMM all tape kontrollers on the same channel
are put in maintenance mode, but I forgot about dual ported units.
Trying to put the DX20 on 1026 in maintenance mode using MTA0 as
the arg to the DIAG. UUO puts the DX10 in maintenance mode.
[Cure]
Add code to check UDBKDB and put all kontrollers found
in maintenance mode also.
[Keywords]
DIAGs
[Related MCOs]
14126
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 TAPUUO TPDSMM,TPDCMM
[End of MCO 14138]
MCO: 14140 Name: JEG Date: 25-Oct-88:04:35:36
[Symptom]
1. SA10 related crashes not as useful as they could be.
2. Missing improvements in disk code.
[Diagnosis]
1. SAXSER would squirrel away interesting data in the KDBs
on a crash if only someone would ask it to.
2. I've been busy.
[Cure]
1. Call SAXDMP from COMMON in DVCSTS.
2. Implement improved disk driver.
[Keywords]
SA10
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor  Load    Module  Tags
-------  ------  ------  ------
705      410     COMMON  DVCST2
                 DSXKON  LOTS
[End of MCO 14140]
MCO: 14141 Name: DPM/JMF Date: 25-Oct-88:04:55:24
[Symptom]
Stopcode KAF trying to start I/O BUS printer.
[Diagnosis]
Hard to say, but it looks like LPTINI was never called, although it's
not obvious how that could happen. Further inspection reveals that the
length of the DDB is wrong. LPTCHF (PI channel flags), value 24 is the
first word in the device dependent portion of the DDB. That's also the
value of DEVCTR. If DEVCTR gets zeroed, the PI channel flags get wiped
out and the contents put into the RH of the CONSO skip chain test. The
next interrupt would not be serviced because the condition bits were all
zeroed and a KAF results.
Other devices could have other problems depending upon the usage of the
words between the starting origin (DEVLSD, DEVLLD, etc.) and DEVLEN.
[Cure]
For all incorrectly defined DDBs, origin the device dependent portion at
DEVLEN.
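The overlap can be illustrated with a small Python model of offset assignment. The value 24 (octal) for DEVCTR and LPTCHF comes from the diagnosis; the value chosen for DEVLEN and the name LPTSTS are assumptions made only so the sketch runs.

```python
DEVCTR = 0o24          # offset of DEVCTR in the common part (from the text)
DEVLEN = 0o30          # length of the common part: an assumed value


def device_words(origin, names):
    """Assign consecutive offsets to device-dependent words, like .ORG."""
    return {name: origin + i for i, name in enumerate(names)}


def overlaps(layout):
    """Return the device-dependent words that land inside the common part."""
    return sorted(name for name, off in layout.items() if off < DEVLEN)


# LPTSTS is a hypothetical second word, used only to show the pattern.
bad = device_words(0o24, ["LPTCHF", "LPTSTS"])      # origin too low
good = device_words(DEVLEN, ["LPTCHF", "LPTSTS"])   # origin at DEVLEN

print(overlaps(bad), overlaps(good))
```

In the bad layout LPTCHF shares an offset with DEVCTR, so zeroing one wipes out the other; starting the device dependent words at DEVLEN removes every overlap.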
[Keywords]
DEVLEN
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor  Load    Module  Tags
-------  ------  ------  ------
705      410     CD2SER  CD2DDB
                 CDRSER  CDRDDB
                 LP2SER  LP2DDB
                 LPTSER  LPTDDB
                 PLTSER  PLTDDB
                 PTYSER  PTYDDB
[End of MCO 14141]
MCO: 14142 Name: RCB Date: 25-Oct-88:05:36:08
[Symptom]
STRUUO .FSRSL (read search list) is less friendly than the GOBSTR loop that
it's supposed to replace.
[Diagnosis]
Demanding godly privs or same job to read a search list, when GOBSTR only
requires that the invoking job have the same PPN as the target job, or have
some flavor of PEEK/SPY privs, or that the job be reading the SSL.
[Cure]
Change the STRUUO's priv checking to match that of GOBSTR.
[Keywords]
STRUUO .FSRSL
GOBSTR
consistency
[Related MCOs]
13314
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 FILFND RSLSTR
[End of MCO 14142]
MCO: 14143 Name: RCB Date: 25-Oct-88:05:51:33
[Symptom]
Too hard to tell which Autopatch tape a customer is running when we get the
dumps.
[Diagnosis]
No way to distinguish between post-7.04 release monitors.
[Cure]
Change the way A00SVN and A00DLN are used in building A00VER and AXXDVN.
(These are GETTAB items %CNVER and %CNDVN.) This week's monitor will be
load 410 of 7.04A as far as the macros in COMMON are concerned. The load
numbers will be recycled annually, at the same time as we bump the minor
version number (A00SVN). This way, the version stamp on the dump will narrow
down which tape it could have been from, and a check of MONVER will allow us
to tell even more precisely. A00MCO should have been good enough, but it seems
that some customers like to change it when they install published patches.
[Keywords]
Autopatch
Revision control
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704A 410 COMMON AXXVER
705 410
[End of MCO 14143]
MCO: 14144 Name: RCB Date: 25-Oct-88:06:41:41
[Symptom]
Can't always connect to TSK devices on other nodes when we should be able.
[Diagnosis]
NETDEV (called from AUTLNK) updates our NDB with our new configuration
(without benefit of interlock) but never tells anyone else in the network about
our changes.
[Cure]
Change NETDEV to light a flag for NETSCN to recompute our configuration. If
it changes, we'll mark everyone else's NDB as needing to hear about it. Later
on in NETSCN, we'll try to tell them all about it.
[Keywords]
ERTNA%
[Related MCOs]
13924
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor  Load    Module  Tags
-------  ------  ------  ------
705      410     NETSER  NETDEV,NCSCNF,NETSCN,ICMRCF
                 NETPRM  NDB.XC
[End of MCO 14144]
MCO: 14145 Name: DPM Date: 31-Oct-88:03:58:25
[Symptom]
New: Add a couple of items that were omitted from 704 because of last minute
documentation constraints.
1. Make control-T print the CPU the job last ran on.
2. Make SET WATCH FILES print the PC of the UUO.
[Diagnosis]
[Cure]
[Keywords]
CONTROL-T
SET WATCH FILES
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor  Load    Module  Tags
-------  ------  ------  ------
705      411     COMCON  USECPU
                 UUOCON  WCHPCP
704A
[End of MCO 14145]
MCO: 14147 Name: JMF Date: 9-Nov-88:07:48:35
[Symptom]
MX gets a protection failure when it tries to append to a mail file
if it's running virtual.
[Diagnosis]
A page fault can occur after the updating ENTER has been done. If the UUO
is restarted, the combination of FO.PRV and junk in E+3 left over from the
LOOKUP/ENTER results in a protection failure.
[Cure]
If appending in buffered mode, call OUTF early (before updating ENTER)
to eliminate page faults after ENTER has been done.
[Keywords]
.FOAPP
FO.PRV
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 412 UUOCON FOPEN2,FOPN9B
[End of MCO 14147]
MCO: 14148 Name: DPM Date: 14-Nov-88:04:58:54
[Symptom]
Stopcode IME while performing magtape I/O.
[Diagnosis]
If buffered I/O is being done on a DX10 and if the buffer
overhead words (.BFSTS, .BFHDR, and .BFCNT) are split across a page
boundary such that .BFCNT resides in the page following .BFHDR, and
that page happens to get destroyed, then an IME will result when
MAKLST tries to read the user's word count for the buffer. No address
checking is done on the word count word in this case.
[Cure]
Add a call to IADRCK.
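A rough Python sketch of why each overhead word needs its own address check. The 512-word page size is the standard PDP-10 page size; the consecutive offsets follow the order in which the text lists the words, and header_is_addressable is a hypothetical helper, not the IADRCK routine.

```python
PAGE_SIZE = 512                      # words per PDP-10 page

# Offsets of the three overhead words, in the order the text lists them.
BFSTS, BFHDR, BFCNT = 0, 1, 2


def page_of(addr):
    """Page number containing a word address."""
    return addr // PAGE_SIZE


def header_is_addressable(buffer_addr, valid_pages):
    """Check every overhead word, not just the first two."""
    return all(page_of(buffer_addr + off) in valid_pages
               for off in (BFSTS, BFHDR, BFCNT))


# A buffer whose first two overhead words end page 1; .BFCNT starts page 2.
buffer_addr = 2 * PAGE_SIZE - 2

print(header_is_addressable(buffer_addr, valid_pages={1}))
print(header_is_addressable(buffer_addr, valid_pages={1, 2}))
```

If page 2 has been destroyed, a check that stops at .BFHDR passes while the later read of the word count faults.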
[Keywords]
MAKLST
[Related MCOs]
None
[Related SPRs]
36173
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 412 TAPUUO CHNLS2
704A
[End of MCO 14148]
MCO: 14152 Name: RCB Date: 22-Nov-88:06:01:09
[Symptom]
Files created in SYS: no longer get PRVSYS or PRYSYS as appropriate to the
extension (non-.SYS or .SYS).
[Diagnosis]
Not sure when this broke, but SYSDEV gets cleared in LH(F) and never set again.
[Cure]
Fix the places that want to know or that already check to get SYSDEV right.
In particular, don't just range check against SYSNDX, since that keeps STD: from
lighting SYSDEV. Check the actual PPN of the device instead.
[Keywords]
SYSDEV
PRVSYS
PRYSYS
[Related MCOs]
None
[Related SPRs]
36161
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 413 FILUUO SDVTSS,TSTDSK,FOUND0,CREAL5,CURPP1
704A
[End of MCO 14152]
MCO: 14153 Name: DPM Date: 28-Nov-88:09:24:30
[Symptom]
Attempts to log off a job which was stopped in the process of
logging out get a "No such device" error.
[Diagnosis]
If a job was somehow stopped while logging out, the job may have
been partially destroyed. In particular, there may be no remaining
context blocks. Subsequent attempts to kill the job fail because the
run of the LOGIN program to log the job out will not succeed. This is
because DDBSRC fails when no context block is found.
[Cure]
The situations surrounding this problem are pretty arcane.
Typically, a job gets into this state because an idle job killer
incorrectly selects a job which is already logging out. The usual
methods of such programs include forcibly HALTing the job in a manner
which bypasses JACCT and Control-C trapping. Hence, the resulting
problem of a halted and partially destroyed job is caused by a
privileged program circumventing privileged protection schemes.
There are a couple of different approaches to solving this
problem. The simplest is to defend against idle job killers. If the
job is logging out, never allow a job to be stopped. This is most
easily accomplished by testing PD.LGO in word .PDDFL of the PDB in the
routine SIMCHK. PD.LGO is turned on by the LOGOUT UUO.
[Keywords]
LOGOUT
[Related MCOs]
None
[Related SPRs]
35781, 36146
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 414 CLOCK1 SIMCHK
704A
[End of MCO 14153]
MCO: 14154 Name: KDO Date: 28-Nov-88:19:39:01
[Symptom]
Definition of the context block is esthetically unappealing.
[Diagnosis]
"symbol" == "previous symbol" + "a bunch"
[Cure]
Use .ORG instead.
[Keywords]
maintainability
cleanliness
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 414 CTXSER .CTFLG
704A
[End of MCO 14154]
MCO: 14155 Name: KDO Date: 18-Dec-88:18:02:24
[Symptom]
Cannot define the default circuit cost for each device type.
[Diagnosis]
No code.
[Cure]
Add the following symbols to COMDEV:
%RTCTST circuit cost for TST device
%RTCDTE circuit cost for DTE device
%RTCKDP circuit cost for KDP device
%RTCDDP circuit cost for DDP device
%RTCCIP circuit cost for CI device
%RTCETH circuit cost for Ethernet device
%RTCDMR circuit cost for DMR device
These symbols are used in the KONCST table of D36COM.
[Keywords]
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor  Load    Module  Tags
-------  ------  ------  ------
705      415     COMDEV
                 D36COM  KONCST
                 ROUTER
[End of MCO 14155]
MCO: 14156 Name: DPM Date: 4-Jan-89:06:28:51
[Symptom]
The system wide VM counters for IW and NIW page faults are
half-word quantities which don't take too long to overflow.
[Diagnosis]
Old monitors didn't page fault too often. Now they do.
[Cure]
Add two new GETTABs:
%VMIWS==42,,113 ;SYSTEM COUNT OF "IN WORKING SET" FAULTS
%VMNIW==43,,113 ;SYSTEM COUNT OF "NOT IN WORKING SET" FAULTS
Also, because SYSTAT and SYSDPY are crufty programs and not easily
modified keep SYSVCT up to date, but mark it and GETTAB %VMSPF as
obsolete to entice programs to use the new counters. If SYSTAT and
SYSDPY are ever fixed, the monitor will cease to maintain SYSVCT,
so programs shouldn't rely on %VMSPF.
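To give a feel for the numbers, the following Python sketch compares an 18-bit half-word counter with a full 36-bit word. The fault rate of 100 per second is an arbitrary assumption, and simple wraparound at overflow is also assumed.

```python
HALF_WORD_BITS = 18
FULL_WORD_BITS = 36


def bump(counter, bits):
    """Increment a counter held in a field of the given width, wrapping."""
    return (counter + 1) & ((1 << bits) - 1)


def seconds_until_wrap(bits, faults_per_second):
    """Time for a counter starting at zero to wrap back to zero."""
    return (1 << bits) / faults_per_second


RATE = 100   # assumed system-wide page faults per second

print(seconds_until_wrap(HALF_WORD_BITS, RATE) / 60)             # minutes
print(seconds_until_wrap(FULL_WORD_BITS, RATE) / (86400 * 365))  # years
```

At the assumed rate the half-word counter wraps in roughly 44 minutes, while a full-word counter lasts more than twenty years.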
[Keywords]
VM COUNTERS
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
UUOSYM change
[Validity]
Monitor  Load    Module  Tags
-------  ------  ------  ------
705      416     COMMON  SYSVCT,SYSIWS,SYSNIW
704A             MONPFH  PFHXCI,PFHXCN
                 UUOSYM  %VMIWS,%VMNIW
                 VMSER   USRFL7
[End of MCO 14156]
MCO: 14157 Name: RCB Date: 11-Jan-89:20:14:12
[Symptom]
Problems with TSK devices:
1) Can't always do an "enter passive" which is restricted to a specific node.
2) The count of TSK devices is decremented more often than it's incremented.
[Diagnosis]
1) The remote doesn't admit to TSKs until someone does an unrestricted
"enter passive" there.
2) AUTKIL is checking the next DDB's station number value rather than that of
the DDB being removed when deciding whether to decrement the device count.
[Cure]
1) Always claim at least one TSK DDB if TSK service is loaded.
2) Check the right DDB in AUTKIL.
[Keywords]
TSK
NETCNF
DDBCNT
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704A 417 NETSER NTSC.C
705 417 AUTCON AUTKI4
[End of MCO 14157]
MCO: 14158 Name: RCB Date: 11-Jan-89:21:17:51
[Symptom]
Jobs using the MIC RESPONSE feature hang sometimes on a terminal, and always
on a PTY.
[Diagnosis]
Race condition in MICLG3 which can cause us never to notify MIC that it's time
to take the response, and a mistaken test in PTYSER (the JOBSTS UUO) that won't
let us even try to notify MIC that the time has come.
[Cure]
Yes.
[Keywords]
MIC RESPONSE
MIC UNDER BATCH
[Related MCOs]
13932, 13137
[Related SPRs]
36167
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704A 417 SCNSER MICLG3,TOPMCL,TOPMS1,TOPMG1
705 417 PTYSER UJBST6
[End of MCO 14158]
MCO: 14159 Name: RCB Date: 20-Jan-89:14:07:59
[Symptom]
Fallback presentation of eight-bit characters doesn't work when a free CRLF is
required by the character expansion.
[Diagnosis]
The code to re-eat a character for echo or output doesn't handle the case of a
multi-part character expansion.
[Cure]
Keep track of which character (from an expansion or otherwise) caused the line
wrap, so we can send the right one when the time comes to re-eat it.
[Keywords]
Two-part characters
Three-part characters
Fallback presentation
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704A 417 SCNSER LDBOST,XMTCH1,XMTREO,XMTREE,REEAT
705
[End of MCO 14159]
MCO: 14160 Name: LWS Date: 29-Jan-89:20:49:51
[Symptom]
1. Problems detecting "data errors" on 20F card readers.
2. 20F card reader ignored after reading a card with a 9-punch in
column 1.
[Diagnosis]
1. Part of the problem is 20F itself. In V16-00, when a data
error occurs, the bad data is passed to the -10. Then the status msg
comes, but since I/O is not in progress we pitch the status msg.
V16-01 of RSX20F fixes the problem of passing the bad data instead of
just sending a status msg. (and fixes the problem where it always sends
a status msg after any data transfer from the reader).
The monitor never checks status bits that indicate a data error. The bit
it checks is not set by 20F when a read/stack/pick check occurs.
2. Before processing any msg from 20F, DCRSER calls SETRGS to set up
ACs and find the DDB etc. The first thing SETRGS does is test the 1st byte
of the msg for the "non-existent" device bit - useful during autoconfiguration
when examining a status msg. However, on a data transfer, a 9-punch
in Col. 1 happens to be the same bit!
[Cure]
1. Check read/pick/stack check bits in status byte also.
2. Change SETRGS entry point to SETRGX and only call it when a status
msg is received (the ONLY time we care about non-existent devices).
Move SETRGS entry down a few instructions where it starts looking
for a reader DDB.
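The second fix amounts to making the "non-existent device" test depend on the message type. A small Python sketch, in which the bit value and both function names are assumptions for illustration:

```python
NXD_BIT = 0o200   # assumed position of the "non-existent device" bit


def handle_message_old(kind, first_byte):
    """Old behavior: test the bit on every message, whatever its type."""
    if first_byte & NXD_BIT:
        return "ignore"
    return "process"


def handle_message_new(kind, first_byte):
    """New behavior: the bit only has this meaning in a status message."""
    if kind == "status" and first_byte & NXD_BIT:
        return "ignore"
    return "process"


# A data byte that happens to have the same bit set (the 9-punch case).
print(handle_message_old("data", NXD_BIT), handle_message_new("data", NXD_BIT))
```

Under the old test, card data with that bit set is mistaken for a report that the reader does not exist; the new test still honors the bit in genuine status messages.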
[Keywords]
card readers
RSX20F
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 420 DCRSER SETRGS,F11DVS
[End of MCO 14160]
MCO: 14161 Name: DPM Date: 1-Feb-89:08:24:52
[Symptom]
In some configurations, an offline alternate port claims to be an
RP04.
[Diagnosis]
This problem is highly dependent upon timing and configuration,
and only affects MASSBUS disks. At system startup time, the disk
drives are autoconfigured. Drive type information is gathered and
properly stored in the unit data blocks. Later, ONCMOD will build the
in-core structure data base and again, attempt to read the drive
types. This redundant drive type check exists to guard against the
operator swapping LAP plugs, thus changing an RP06 into an RP05. If
the drive type register cannot be read, then incorrect data is stored
in the drive type byte in the unit data block.
It is not clear why the second attempt to read the drive type
register fails. The DATAI to read the register returns zeros.
Normally, this could happen because the other port is busy or if the
last I/O operation on the other port failed to do a dual-port drive
release upon completion. Since the drive is offline, no I/O was
started. Also, it has been observed that if one or more online drives
exist with a higher unit number, then the problem disappears. This
indicates a possible hang in the controller. In addition, if the
interval between checking the primary port and the alternate port
is sufficiently long, then the DATAI always succeeds.
[Cure]
Problems similar to this have existed for at least 3 monitor
releases. The only flaw which all monitors have in common is that the
failure to read the drive type is ignored and junk overwrites the
drive type code in the unit data block. A simple solution is to jump
around the code which stores the drive type byte. After all this
time, it seems unlikely we will determine the real nature of the DATAI
failure, so this work around must suffice.
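The workaround can be sketched in a few lines of Python. The dictionary standing in for the unit data block, the function name, and the type codes in the usage lines are all hypothetical; the only behavior taken from the text is that a read returning zeros must not overwrite the stored drive type.

```python
def store_drive_type(unit, register_value):
    """Update the unit's drive type only when the register read succeeded."""
    if register_value == 0:          # the read returned zeros: it failed
        return False                 # keep the type found by autoconfigure
    unit["drive_type"] = register_value
    return True


unit = {"drive_type": 0o22}          # hypothetical code stored at startup
store_drive_type(unit, 0)            # failed re-read leaves the type alone
print(oct(unit["drive_type"]))
```

A successful re-read still updates the stored type, so the guard against swapped LAP plugs keeps working.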
[Keywords]
RP04
[Related MCOs]
13932, 13137
[Related SPRs]
36230
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 420 ONCMOD TRYUNI
704A
[End of MCO 14161]
MCO: 14162 Name: JEG/DPM Date: 6-Feb-89:05:34:53
[Symptom]
Day one 7-series bug: Stopcode DAU and corrupt user core images.
[Diagnosis]
A user job enables for clock interrupts via APRENB. APRSUB proceeds
in a normal service-a-clock-tick fashion, but notices that the user
has requested a clock-interrupt, and so it exits not with POPJ but
with a JRST off to APRUTP. APRUTP may decide to fall thru to APRUT2.
If T4 doesn't have UE.PEF/UE.NXM on (and it won't of course) it will
continue to fall thru. APRUT2 will decide there is a loop in the trap
handler (and there is). At this point APRUT2 loads the double word PC
into T3/T4, and saves it off to .CPAPC for the error message. Then
it branches off to APRUTW. APRUTW sets up the APRLOP PC, and exits off
to APRSTU. APRSTU looks at T4 expecting possibly to find UE.PEF!UE.NXM,
but instead it has some PC bits left over from APRUT2. This fools
APRSTU into calling DIENLK.
[Cure]
At APRUT2, don't clobber T4 with a PC. Use T1/T2 instead.
[Keywords]
CLOCK INTERRUPT
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 421 CLOCK1 APRUT2
704A
[End of MCO 14162]
MCO: 14163 Name: DPM/RCB Date: 7-Feb-89:07:57:02
[Symptom]
A user can PIVOT away from a PPN that the CHGPPN checks
will not allow him to return to.
[Diagnosis]
Oversight.
[Cure]
Always allow CHGPPN to work if returning back to the job's
logged-in PPN.
[Keywords]
CHGPPN
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 421 UUOCON CHGPPN
704A
[End of MCO 14163]
MCO: 14164 Name: TL Date: 10-Feb-89:15:34:21
[Symptom]
RX2 STOPCDs
[Diagnosis]
If the RX20 (RX211) controller is broken such that TR is not returned to
STRTIO, it is possible (but unlikely) for the RX20 controller to post an
error interrupt.
If it does, then the error interrupt service routine will free the controller,
or, worse yet, schedule IO for another drive. In either case, we return from
the interrupt back into STARTIO, where we now write the drive registers out of
sync with what the controller expects.
This causes an error interrupt, and since no drive is (probably) active,
an RX2 STOPCD.
[Cure]
Turn the PI system OFF while in STARTIO. On TR errors, deschedule
the controller before turning it back ON. Since it's possible for the
KS to accept a vectored interrupt before the deschedule code resets
the interrupt enable bit, teach RX2INT to dismiss unexpected interrupts
rather than STOPCD.
[Keywords]
RX2
STOPCD
RX2SER
RX20
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
KS10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 422 RX2SER RX2INT,STRTIO,SETPAR
704A
[End of MCO 14164]
MCO: 14165 Name: RCB Date: 14-Feb-89:07:45:35
[Symptom]
PAGE. UUO function .PAGAC does not work right for non-existent pages in
mapped sections.
[Diagnosis]
The code to report non-existent pages and/or independent sections is
not sufficiently forgiving of dependent sections.
[Cure]
Always return the mapping information for dependent sections.
[Keywords]
Page Accessibility
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Extended addressing only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 422 VMSER PAGAC1,PAGAC6,PAGAC7
704A
[End of MCO 14165]
MCO: 14166 Name: DPM Date: 14-Feb-89:07:59:06
[Symptom]
Ill mem ref running SPEAR following magtape error logging. Seen mostly
with multi-ported tapes, but theoretically possible on any tape.
[Diagnosis]
When two or more kontrollers have access to the same tape drive, the IEP
and FEP blocks are timeshared. It is expected that DAEMON will finish
servicing one error before the monitor queues up data for the next. In
practice, this isn't always the case.
[Cure]
Convert TAPSER, TAPUUO, and the drivers to use system error blocks. When
an error occurs, the data will be copied into the SEBs and queued up for
DAEMON to write into ERROR.SYS. The biggest problem with doing this is
the monitor must format the error record itself, as SEBs are merely copied
into the error file without modification. No big deal. This will reduce
the monitor's dependency on DAEMON.
[Keywords]
MAGTAPE ERROR LOGGING
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor  Load    Module  Tags
-------  ------  ------  ------
705      422     AUTCON  RDDTN
704A             DEVPRM  TUB
                 S
                 TAPSER  TAPDRV
                 TAPUUO  LOTS
                 T78KON
                 TCXKON
                 TD2KON
                 TM2KON
                 TMXKON
                 TS1KON
                 TX1KON
[End of MCO 14166]
MCO: 14168 Name: RCB Date: 20-Feb-89:12:55:04
[Symptom]
Two complaints received regarding ONCMOD:
KLAD pack interface isn't as friendly as it could be.
Bad block typeout is sometimes a little too terse.
[Diagnosis]
We special-case the KLAD structure in several places, but we don't
treat it specially when gathering units for defining a structure.
After all, we know that the KLAD pack is just one pack.
If the user requests that bad blocks be shown for a unit, but the unit
has no bad blocks recorded, then we don't type out anything about
bad blocks. This leaves the user wondering whether we forgot about
the request to show them.
[Cure]
Only ask for one spindle when gathering units for structure KLAD.
Add the message "[No bad blocks found on unit <unitname>]".
[Keywords]
KLAD
BAF
BAT
Bad blocks
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704A 423 ONCMOD GETBAT,GETUN6
705
[End of MCO 14168]
MCO: 14169 Name: RCB Date: 21-Feb-89:08:45:24
[Symptom]
Invalid prompt for first logical block for swapping when defining a structure.
[Diagnosis]
Re-use of the old value for a unit which is no longer valid after other
changes to its swapping parameters.
[Cure]
Range check the old value, and don't use it for the default if it's invalid.
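A minimal Python sketch of the range check. The function name and the use of None to mean "offer no default" are assumptions; the text does not say what ONCMOD prompts with when the old value is rejected.

```python
def default_first_block(old_value, lowest, highest):
    """Offer old_value as the prompt default only if it is still in range."""
    if old_value is not None and lowest <= old_value <= highest:
        return old_value
    return None     # no usable default: the operator must supply a value


print(default_first_block(5000, lowest=100, highest=9000))
print(default_first_block(5000, lowest=100, highest=4000))
```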
[Keywords]
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704A 423 ONCMOD GETSW1
705
[End of MCO 14169]
MCO: 14171 Name: LWS Date: 22-Feb-89:18:41:42
[Symptom]
1. Problems running DFDXC and DFDXD in user mode. Specifically
when using the "Specify Channel Program" DIAG. function, .DISCP.
2. We try to load the DX10 on CPU0 from CPU1 when a DX20 diag exits.
[Diagnosis]
1. The diags use the .DIAAU (Assign all units) DIAG. function
to keep things nice when it starts init'ing the channel, etc. When
the .DISCP DIAG. function is used, the monitor grabs the DDB for the
tape drive from the PDB. However, the DDB in the PDB when the .DIAAU
function is used is the last tape drive DDB. Not necessarily the DDB
for the tape drive the diag is using. So the CCW list is built for
the wrong drive (except when the last drive is used). After subsequent
calls using .DISCP, we run out of free core for CCWs.
2. When a DX20 diag puts the controller in maintenance mode, we detect
that the DX10 can also access the drives so we put it in maintenance
mode also. This keeps TAPSEC happy. The diag sets CPU to CPU1 because
that's where the DX20 is located. When the diag releases the DX20,
or is ^C'd, TPMCMX is called to free up everything. Since we are
running on CPU1, the load of the DX10 fails (because we use the TPKRES
and TPKLOD dispatches from TPMCMX).
[Cure]
1. Can of worms. Because of the way the DIAG. functions work
with respect to DIAKDU and DIADEV, make the diags do a .DIASU (Assign single
unit) DIAG. function so that the proper DDB is placed in the PDB and
is found by the next .DISCP DIAG. function. In order for this
to work, we have to let TPDASU (and TPDAAU for consistency) do their
stuff even if F is nonzero on entry to the routine. They still make
sure that the current job is the one executing the UUO if the DDB is
already "owned".
2. In TPMCMX, check the KDBCAM mask against .CPBIT before calling TPKRES
and TPKLOD routines.
[Keywords]
Diagnostics
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 423 TAPUUO TPDASU,TPDAAU,TPMCMX,TPDHVF
[End of MCO 14171]
MCO: 14172 Name: DPM Date: 24-Feb-89:04:01:53
[Symptom]
A batch login may fail if the number of logged-in jobs minus the
number of reserved batch job slots is greater than LOGMAX.
[Diagnosis]
The difference between LOGMAX and JOBMAX is the number of jobs
reserved for emergency logins. A logging-in timesharing job may be
granted access if LOGNUM will not exceed LOGMAX, and providing BATMIN
job slots are reserved for batch logins. However, if the job logging
in is running under batch, then BATMIN must not be included in the
computation.
[Cure]
Don't account for BATMIN job slots when a batch job is logging
in. Its inclusion is only meaningful for timesharing logins.
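A minimal sketch of the corrected check, written in Python rather than the monitor's MACRO-10. The function name and parameters are hypothetical; only LOGNUM, LOGMAX, and BATMIN come from the text above, and the real ACCLOG computation may differ in detail.

```python
def may_log_in(lognum: int, logmax: int, batmin: int, is_batch: bool) -> bool:
    """Return True if one more job may log in (simplified model)."""
    # Slots held back for batch only matter to timesharing logins;
    # a batch login must not count them against itself.
    reserved_for_batch = 0 if is_batch else batmin
    # The job logging in counts as one more logged-in job.
    return lognum + 1 + reserved_for_batch <= logmax
```

With LOGNUM=10, LOGMAX=12, and BATMIN=2, this model admits a batch login but refuses a timesharing login.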
[Keywords]
BATMIN
[Related MCOs]
13932, 13137
[Related SPRs]
36246
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 424 UUOCON ACCLOG
704A
[End of MCO 14172]
MCO: 14173 Name: DPM Date: 24-Feb-89:09:34:47
[Symptom]
The old methods of DAEMON error logging leave something to be desired.
[Diagnosis]
Currently, most of the monitor expects DAEMON to gather additional data
for ERROR.SYS beyond what it's initially given. This exercises race
conditions, slows performance because jobs are sometimes stopped until
DAEMON is finished, and makes DAEMON dependent upon monitor versions
and data structure formats.
[Cure]
Start converting the old-style DAEMON calls to use System Error Blocks.
SEBs eliminate the race conditions because one is queued up for each
error log entry rather than always overwriting the same storage with
new error data. Performance is improved by not having to prevent jobs
from running while DAEMON is logging the error. This also eliminates
the dependency of DAEMON upon the monitor because the monitor will
format the entire record. DAEMON merely copies SEBs into ERROR.SYS.
This edit will do:
DL10 error records
I/O BUS LPT error records
Stopcode records
Software Events (POKE, RTTRP, SNOOP, and TRPSET)
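The difference between the two reporting styles can be sketched as follows. This is an illustrative Python model, not monitor code; both class names are hypothetical. It only shows why a queue of per-error blocks cannot lose a record the way a single shared storage area can.

```python
from collections import deque

class SharedErrorBuffer:
    """Old style (model): one storage area, overwritten by each new error."""
    def __init__(self):
        self.data = None

    def report(self, record):
        self.data = record            # a second error overwrites the first

    def drain(self):
        out = [] if self.data is None else [self.data]
        self.data = None
        return out

class ErrorBlockQueue:
    """SEB style (model): one block is queued per error log entry."""
    def __init__(self):
        self.blocks = deque()

    def report(self, record):
        self.blocks.append(record)    # nothing is overwritten

    def drain(self):
        out = list(self.blocks)
        self.blocks.clear()
        return out
```

If two errors are reported before the logger runs, the first model yields only the second record, while the queue yields both in order.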
[Keywords]
ERROR LOGGING
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 424 CLOCK1 DAEEST
704A COMDEV DL10EL
ERRCON DIELOG,XFRSEB
LPTSER LPTSYR
RTTRP RTRET
S EX.SYE,EX.DEL
UUOCON POKE2,SNPIBP,TRPSTX
[End of MCO 14173]
MCO: 14174 Name: DPM Date: 28-Feb-89:05:45:07
[Symptom]
New: To accommodate future tape service bug fixes and enhancements,
increase the size of the IORB. Do this by defining a set of "common"
IORB definitions, to be used initially by tape service, and possibly
later by FILSER. Append to the common portion, the tape-specific
words.
Common words:
.ORG 0
IRBLNK::!BLOCK 1 ;FORWARD LINK TO NEXT IORB
IRBACC::!BLOCK 1 ;ACTIVE (CURRENT) CHANNEL COMMAND
IRBCCW::!BLOCK <MXPORT==:4> ;ADDRESSES OF CHANNEL COMMANDS
IRBIVA::!BLOCK 1 ;ADDRESS OF INTERRUPT ROUTINE
IRBDDB::!BLOCK 1 ;ADDRESS OF DDB BEING SERVICED
IRBSIZ::! ;LENGTH OF COMMON IORB
.ORG
Tape-specific words:
.ORG IRBSIZ
TRBFNC::!BLOCK 1 ;FUNCTION DATA
TRBSTS::!BLOCK 1 ;TERMINATION STATUS
TRBRCT::!BLOCK 1 ;BYTE COUNT OF TRANSFER, IF DATA READ
TRBLEN::! ;LENGTH OF BLOCK
.ORG
IRBLNK is the old TRBLNK, but a full word quantity.
IRBCCW is the merger of TRBXCW and TRBEXL.
IRBIVA is the old TRBIVA.
TRBFNC is the old LH of TRBLNK and now can grow beyond bit 17.
TRBSTS could also be made a full word quantity.
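The word offsets implied by the two .ORG blocks above can be worked out mechanically. The Python sketch below is only an aid for reading the definitions; the helper name `layout` is hypothetical, and offsets are computed in decimal.

```python
MXPORT = 4  # from the IRBCCW definition above

def layout(fields, origin=0):
    """Assign consecutive word offsets, like BLOCK n following .ORG origin."""
    offsets, here = {}, origin
    for name, words in fields:
        offsets[name] = here
        here += words
    return offsets, here  # 'here' is the value of the trailing length symbol

common, IRBSIZ = layout([
    ("IRBLNK", 1), ("IRBACC", 1), ("IRBCCW", MXPORT),
    ("IRBIVA", 1), ("IRBDDB", 1),
])
tape, TRBLEN = layout(
    [("TRBFNC", 1), ("TRBSTS", 1), ("TRBRCT", 1)], origin=IRBSIZ
)
```

Under these definitions IRBSIZ comes out as 8 words and TRBLEN as 11 words (decimal), with TRBFNC at offset 8, immediately after the common portion.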
[Diagnosis]
[Cure]
[Keywords]
MAGTAPE
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 424 DEVPRM
704A SCAPRM
TAPSER
TAPUUO
T78KON
TCXKON
TD2KON
TM2KON
TMXKON
TS1KON
TX1KON
[End of MCO 14174]
MCO: 14175 Name: JEG/DPM Date: 28-Feb-89:06:06:19
[Symptom]
ADP code reading. Jeff Gunter points out that SCNPIF doesn't include
DSKBIT in configurations which have only a single CPU. Why is that,
he said?
[Diagnosis]
Don't know. Looks like an oversight. While this is a common configuration,
it could only cause problems when the monitor is in the middle of a SCNOFF
and FILSER decides to print "problem on device" at interrupt level. The
SCNOFF will not have turned off DSKCHN, thus allowing FILSER to do obscene
things at inappropriate times.
[Cure]
Probably doesn't happen a lot. Remove the conditional assembly and always
include DSKBIT in SCNPIF. This is necessary only because FILSER insists
on typing out at interrupt level.
[Keywords]
SCNSER
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 424 COMMON SCNPIF
704A
[End of MCO 14175]
MCO: 14176 Name: DPM Date: 7-Mar-89:05:38:49
[Symptom]
It has been brought to our attention that some customer(s) want to
install RM05s on their DEC-10. So be it, however, this is not
recommended and will remain UNSUPPORTED.
RM05s are interesting devices. They are faster than an RP06 and
consume less power, as they require only single phase power. An
RM05 has 30 sectors/track (10 more than an RP06), yet they run
at the same 3600 RPM. Therefore, the capacity and the transfer
rate are about one half greater than those of an RP06.
Don't look a gift horse in the mouth. For starters, the head crash
rate is rather high. It seems that RM05s work best when left alone.
Despite the fact that they use removable media, frequent disk pack
changes greatly increase the chance of a head crash. The heads fly
fairly close to an RM05 pack; much closer than in an RP06. Presumably,
this is the main cause of head crashes. Also, parts for RM05s are not
nearly as plentiful as are those for RP06s.
[Diagnosis]
Missing table entries in RPXKON.
[Cure]
Add entries to the tables for blocks per unit, etc. This is all that's
required to make RM05s work. In all other manners, RM05s behave like
an RP06.
[Keywords]
RPXKON
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 425 AUTCON DTRTBL
DEVPRM TY.RM5
RPXKON TYPTAB
UUOSYM .DCUR5
[End of MCO 14176]
MCO: 14177 Name: DPM Date: 6-Mar-89:05:54:25
[Symptom]
DAEMON error logging.
[Diagnosis]
Yes.
[Cure]
Convert more old-style calls to use System Error Blocks. Changes
in this edit include:
1. Channel NXM & parity error logging.
7.04 records written by DAEMON contained mostly junk.
2. DECtape error logging.
3. KS10 memory error logging.
Doc change: This adds one word (.CPMFL) to the CPU subtable
FOR KS10 memory errors. This word is a flag which indicates
the last type of error (0 = soft, 1 = hard). Also, the length
of the subtable (.CPMSL) was off by one word and has now been
corrected.
4. KS10 card reader & line printer error logging.
[Keywords]
ERROR LOGGING
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 425 APRSER MEMCHK
704A CD2SER CDRSYR
COMDEV DTXEST,DTXEFL,DTXEBK
COMMON .CPMFL,.CPMSL
DTASER ERRS,DTASYR
ERRCON CHNCO3
LP2SER LPTSYR
[End of MCO 14177]
MCO: 14178 Name: RCB Date: 13-Mar-89:15:12:41
[Symptom]
STOPCDs AAO and IME, undeserved address checks, and undeserved checksum errors
during dump-mode I/O.
[Diagnosis]
In the old days before 7.03, LRNGE was called to range-check an IOWD. It
checked everything we needed to have checked just fine. One of the things
which it checks is that the range of addresses does not cross a section
boundary. Thus, it was no longer appropriate once .FOFXI/.FOFXO (extended dump
I/O) were added to the FILOP. UUO. MONPFH does not check that old-style
IOWD-based I/O does not cross a section boundary, nor does it check that the
I/O is not done to the ACs. This can lead to AAOs. If the user's working set
includes swapper-write-locked pages, then MONPFH will call LRNGE, even though
it might be doing extended I/O, thus resulting in an undeserved address check
error for an I/O doubleword which crosses a section boundary.
If FILIO has to perform error recovery and retries during a dump-mode I/O
operation which ends at a section boundary, under some circumstances it leaves
DEVISN containing bogus information in the DDB. If this was also the first
block in a retrieval pointer, we will then proceed to attempt to calculate the
checksum based on a user address which we calculate, in part, from this junk in
DEVISN(F). This can cause either an IME or an undeserved checksum error.
Finally, much of the above is exacerbated by NXCMR in UUOCON, which is the
common routine used to fetch and validate the next IOWD in a user's channel
command list. It does not validate correctly when MONPFH passes it an IOWD
which either starts with or crosses a section boundary.
[Cure]
Teach NXCMR how to validate all IOWDs which PFDOIO might pass. Correct all
incorrect uses of DEVISN(F). Teach PFDOIO to use ZRNGE rather than LRNGE when
it wants to fix up swapper-write-locked pages. Teach PFHDMP to give an address
check error when an old-style IOWD crosses a section boundary. Teach CHKSUM to
use GETEWD rather than GETWRD, so that it always fetches the correct word from
the user's buffer. Teach PFDOIO to validate the range of addresses for I/O in
order to be sure that I/O is not attempted to the ACs.
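The two range checks named in the cure (no section-boundary crossing for old-style IOWDs, no I/O to the ACs) can be sketched in Python. This is a simplified model, not the NXCMR or PFDOIO code: the function names are hypothetical, and treating only addresses 0-17 (octal) of section 0 as the ACs is an assumption made to keep the example short.

```python
SECTION_BITS = 18   # a section holds 2**18 words
AC_COUNT = 0o20     # accumulators occupy addresses 0-17 (octal)

def section_of(addr: int) -> int:
    return addr >> SECTION_BITS

def crosses_section(start: int, count: int) -> bool:
    """True if the words start .. start+count-1 span two sections."""
    return section_of(start) != section_of(start + count - 1)

def touches_acs(start: int, count: int) -> bool:
    """True if a non-empty transfer begins inside the ACs (model: section 0)."""
    return count > 0 and start < AC_COUNT

def check_old_style_iowd(start: int, count: int) -> str:
    """Return 'address check' or 'ok' for an old-style transfer (model only)."""
    if crosses_section(start, count) or touches_acs(start, count):
        return "address check"
    return "ok"
```

A 4-word transfer starting at 777776 (octal) crosses into the next section and is rejected in this model, while the same transfer starting at 1000000 (octal) is accepted.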
[Keywords]
DUMP I/O
AAO
IME
ADDRESS CHECK
CHECKSUM ERROR
IOIMPM
IO.IMP
[Related MCOs]
13932, 13137
[Related SPRs]
35576, 36064
[MCO status]
Checked
[MCO attributes]
Extended addressing only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 426 UUOCON FOPN9B,UINITC,RELEA4,NXCHIT
704A MONPFH PFHDM1,DOIO2
FILIO SATADR,MONIOY,SETLS7,POSER2,ECC2,ECC3,NOECC,CHKSUM,CSHC2B,CSHB2C
FILUUO DUMPG9
[End of MCO 14178]
MCO: 14179 Name: JEG/DPM Date: 21-Mar-89:05:41:31
[Symptom]
FILSER doesn't usually continue from a DHD stopcode (Don't Have DA).
[Diagnosis]
If IOSDA is off in S (but not necessarily in DEVIOS), then a DHD
will result. But if the job really does own the DA resource, it
will hang, since the DA is never released.
[Cure]
Let the DHD return .+1. Further checks will prevent the DA from
being returned for the wrong job (a RWD is likely). If we manage
not to get a RWD, then the DA will be released and the monitor
will continue with no problems.
[Keywords]
STOPCODE DHD
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 427 FILIO DWNDA
704A
[End of MCO 14179]
MCO: 14180 Name: JEG/DPM Date: 21-Mar-89:05:45:20
[Symptom]
Stopcode KLPKAF following parity scans.
[Diagnosis]
A parity scan requires more than KAFTIM seconds to complete.
If PPDSEC doesn't get called soon enough (and it won't because
of the scan), it declares the KLIPA dead.
[Cure]
Increase KAFTIM from 10 to 35 seconds. This allows about 8 seconds
per meg plus a few extra for good measure. Increase KNISER's timer
(also called KAFTIM) from 30 to 35 seconds too.
[Keywords]
KLPKAF
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 427 KLPSER KAFTIM
704A KNISER KAFTIM
[End of MCO 14180]
MCO: 14181 Name: JEG/DPM Date: 21-Mar-89:05:49:40
[Symptom]
DI hangs on RA failovers. A failover can leave several jobs stuck in
"problem on device" mode for the old unit, even after lots of time
passes.
[Diagnosis]
PCLDSK may inadvertently get called with an "old" unit if a failover
is happening while another CPU is preparing to start I/O. The "old"
unit was OK, but now, KDBCAM contains zero, causing PCLDSK to get
called. PCLDSK sees no CPUs (and indeed there aren't any with the
old unit) and calls HNGSTP, eventually looping back to PCLDSK again
with the "old" unit.
[Cure]
If there is an online alternate port, use it and bypass HNGSTP.
[Keywords]
FAILOVER
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 427 FILIO PCLDSK
704A
[End of MCO 14181]
MCO: 14182 Name: JEG/DPM Date: 21-Mar-89:05:54:13
[Symptom]
If a CPU croaks before it can be warm-restarted successfully, and
field service is able to fix it "on the fly", sometimes bad things
(usually hangs) happen immediately following the J 400.
[Diagnosis]
This can happen because a CPU restart clears SP.CJn for all jobs,
and then CPUZAPs the "running job" for the CPU, leaving a small
window when the job can be scheduled to run on another CPU.
[Cure]
Change SPRINI to call CPUZAP first, and then clear SP.CJn.
[Keywords]
WARM RESTART
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 427 COMMON SPRLP1,SPRI11
704A
[End of MCO 14182]
MCO: 14183 Name: JEG/DPM Date: 21-Mar-89:05:59:19
[Symptom]
Stopcode KAF in QUESER.
[Diagnosis]
It is possible for one CPU to be diddling the database at UUO level
with the EQ lock, while ENQMIN runs at interrupt level on another
CPU. If the UUO-level CPU removes and releases the free core holding a
block that is being scanned by ENQMIN, KAFs or other stopcodes may
result.
[Cure]
Implement a scheme where UUO level waits for interrupt level and
interrupt level punts if UUO level holds the EQ resource.
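The wait-versus-punt scheme resembles a blocking acquire on one side and a try-lock on the other. The Python sketch below illustrates that pattern with a standard `threading.Lock`; it is an analogy only, the function names are hypothetical, and the real QUESER interlock is not a Python lock.

```python
import threading

eq_lock = threading.Lock()   # stands in for the EQ interlock

def uuo_level_remove(database: dict, key) -> None:
    # UUO level is allowed to wait until the interlock is free.
    with eq_lock:
        database.pop(key, None)

def interrupt_level_scan(database: dict):
    # Interrupt level must never wait: if UUO level holds the
    # interlock, punt and let a later pass do the scan.
    if not eq_lock.acquire(blocking=False):
        return None
    try:
        return sorted(database)
    finally:
        eq_lock.release()
```

A scan attempted while the lock is held returns `None` instead of walking a structure that may be changing underneath it.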
[Keywords]
STOPCODE KAF
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 427 QUESER ENQMN2,EQLOCK,LOKINQ
704A
[End of MCO 14183]
MCO: 14184 Name: JEG/DPM Date: 21-Mar-89:06:03:20
[Symptom]
If a CI disk contains HOM blocks which look valid but contain
a zero word for the structure name, a failover will cause PULSAR
to sniff out the disk and mount a structure with no name.
[Diagnosis]
Monitor never checks for a zero structure name in DEFSTR.
[Cure]
Return "illegal structure name" error when no name is given.
[Keywords]
DEFINE STRUCTURE
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 427 FILFND DEFSTR
704A
[End of MCO 14184]
MCO: 14186 Name: DPM Date: 24-Mar-89:08:37:20
[Symptom]
Stopcode KAF in KNISER.
[Diagnosis]
On a very busy Ethernet wire, it is possible to spend more than 6
seconds at interrupt level taking packets off the KLNI. RSX-20F
has little patience for this sort of nonsense, so it KAFs the -10.
[Cure]
Put an arbitrary limit on the number of packets that we'll process
in a single interrupt. Experimentation has proven that trying to
remove 2100 (decimal) or more packets from the queue will result in
a KAF. Therefore, set the limit to 2000. Location .PBMPP (maximum
packets processed) in the KDB/PCB contains the limit and can easily
be patched to a different value. When the limit is exceeded, a
KNIKSP (KLNI Service Paused) info stopcode will be typed on the CTY.
Then the PIA will be removed for one second to let things settle down.
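The per-interrupt limit can be sketched as a bounded drain loop. This Python model is illustrative only: the function and parameter names are hypothetical, and the one-second pause and the KNIKSP stopcode are represented by nothing more than a returned flag.

```python
from collections import deque

MAX_PACKETS_PER_INTERRUPT = 2000  # the limit chosen in the cure above

def service_interrupt(queue: deque, handle, limit: int = MAX_PACKETS_PER_INTERRUPT):
    """Process at most `limit` packets; return (processed, paused)."""
    processed = 0
    while queue and processed < limit:
        handle(queue.popleft())
        processed += 1
    # Packets left over mean the limit was hit: the caller would
    # pause service briefly before taking interrupts again.
    paused = bool(queue)
    return processed, paused
```

With 2100 packets queued, one call handles 2000, leaves 100, and reports that service should pause.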
[Keywords]
KNISER KAF
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 430 KNISER KNIRQ1,KNIPAU,KNICON
704A
[End of MCO 14186]
MCO: 14189 Name: JEG/DPM Date: 28-Mar-89:08:22:29
[Symptom]
If a program dies with infinite IPCF quotas and freecore is very low
or about to expire, the system grinds to a standstill. Some jobs are
stuck NApping and others get unexpected error returns. Trying to log
off the offending job fails.
[Diagnosis]
IPCLGO does two things. It sends a logout message to QUASAR and it
turns around all unreceived messages; in that order. The send to
QUASAR will fail because there is no available freecore, and the
logging out job owns a large chunk of it.
[Cure]
Reverse the order of things. First, empty the send and receive queues,
then send the logout message to QUASAR.
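Why the order matters can be shown with a toy free-core pool. In this Python sketch every name is hypothetical, and turning messages around is modeled simply as releasing the core they occupied, which is a simplification of what IPCLGO does.

```python
class FreeCore:
    """Toy free-core pool counted in arbitrary units."""
    def __init__(self, units: int):
        self.free = units

    def allocate(self, n: int = 1) -> bool:
        if self.free < n:
            return False
        self.free -= n
        return True

    def release(self, n: int = 1) -> None:
        self.free += n

def logout_old_order(core: FreeCore, queued_units: int) -> bool:
    sent = core.allocate()        # send to QUASAR first: may fail
    core.release(queued_units)    # then empty the job's queues
    return sent

def logout_new_order(core: FreeCore, queued_units: int) -> bool:
    core.release(queued_units)    # empty the queues first
    return core.allocate()        # now the send has core to use
```

With no free core and five units tied up in the job's queues, the old order fails to send the logout message and the new order succeeds.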
[Keywords]
IPCF LOGOUT
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 430 IPCSER IPCLGO
704A
[End of MCO 14189]
MCO: 14190 Name: JEG/DPM Date: 4-Apr-89:05:14:36
[Symptom]
Stopcode IME removing a structure. Other problems possible too. When
allocation is in progress, or the ACCs and NMBs are in transition, and
a structure is being removed, an IME is likely to occur on a busy system.
[Diagnosis]
TAKBLK and friends rely on DEVUNI(F) to indicate the target unit for a
structure is still valid. FILSER normally depends upon TSTGEN checking
UNIGEN. The window is sufficiently large to allow the SKIPN DEVUNI to
work while REMSTR is removing a structure.
[Cure]
Change TAKBLK to call TSTGEN. Make BMPGEN get and release the DA around
the update of UNIGEN.
[Keywords]
DISMOUNT
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 431 FILFND BMPGN1
704A FILIO TAKBL0,TAKBLJ
[End of MCO 14190]
MCO: 14191 Name: RCB Date: 5-Apr-89:22:16:13
[Symptom]
Hung ANF traffic to a node. Especially common over an Ethernet channel. It
may (sometimes) correct itself eventually, especially if it was not an Ethernet
channel that was involved.
[Diagnosis]
After NETWRT queues an output message (PCB) to the FEK, it calls its device
driver to perform the output. This can happen several times before the device
driver tells the FEK routine that the output has happened. At that point, the
FEK routine tells NETSER that the message has been sent. This causes the PCB to
be placed on a generic output-done queue for NETSCN to process. Once we get to
NETSCN, we move PCBs from this queue to a queue for the NDB for the node to
which we were sending the message. The subroutine responsible for this,
NTSC.O, is also responsible for keeping the NDBLMS (last message sent) field
updated. It does this by noting the message number of each PCB it places into
the output-pending queue in NDBLMS. However, the PCB queue from which it is
taking these messages is unordered, and this can lead to having a very long
list of messages, with NDBLMS reflecting only (for example) the first of them.
Once this has happened, CHKNCA (check network-control ACK) will ignore an ACK
for any message beyond that in NDBLMS. However, the remote is quite likely to
send us an ACK for the actual last message in the ACK-pending queue. This
leads to a full output queue and a refusal to transmit any further data
messages, at least until the REP/NAK timer causes us to send a REP, which will
result in a NAK. Because we ignored the implicit ACK present in the NAK, we
will still have a queue of outstanding messages, which the NAK will cause us to
retransmit all at once. Unless the device driver stutters in a friendly
manner, this will merely get us into the same mess again with the same set of
messages, and no progress will ever be seen.
[Cure]
In NTSC.O, only change NDBLMS if it's moving in a forward direction. In
INCTNK, where we resend the queue in response to a NAK, reset NDBLMS to NDBLAP
in order to avoid possible ACK races.
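The "only move forward" rule for NDBLMS can be sketched in Python. The function names are hypothetical, and the assumption that message numbers are 8-bit values that wrap around is mine, not something stated above; the comparison would need adjusting for a different number space.

```python
MODULUS = 256  # assumption: message numbers are 8 bits and wrap

def is_forward(new: int, old: int, modulus: int = MODULUS) -> bool:
    """True if `new` is ahead of `old` in wrapping sequence space."""
    return 0 < (new - old) % modulus < modulus // 2

def note_sent_old(last_sent: int, message_number: int) -> int:
    # Old behaviour (model): always overwrite, whatever the order.
    return message_number

def note_sent_new(last_sent: int, message_number: int) -> int:
    # New behaviour (model): only advance, never go backwards.
    return message_number if is_forward(message_number, last_sent) else last_sent

def replay(update, start: int, numbers) -> int:
    last = start
    for n in numbers:
        last = update(last, n)
    return last
```

Replaying the unordered sequence 5, 7, 6, 3, 4 from a starting value of 2 leaves the old rule at 4, so an ACK for message 7 would be ignored, while the new rule ends at 7.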
[Keywords]
ANF Ethernet
Hung ANF
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
HOSS attention
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 432 NETSER NTSC.O,INCTNK
704A
[End of MCO 14191]
MCO: 14192 Name: RCB Date: 5-Apr-89:22:55:57
[Symptom]
Terminal characteristics get handled incorrectly during a SET HOST
session which is handled by NETVTM.
[Diagnosis]
Setting the terminal type happens after all the other characteristics
get set, and clobbers them.
[Cure]
Save the other characteristics until after we set the terminal type in VTMCHR.
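The ordering problem can be modeled with a dictionary of terminal settings. Everything in this Python sketch is hypothetical, including the terminal type and its default values; it only shows why applying the saved characteristics after setting the type preserves them.

```python
TYPE_DEFAULTS = {"VT100": {"width": 80, "lowercase": True}}  # made-up values

def set_type(term: dict, name: str) -> None:
    # Setting the type resets every other characteristic to its default.
    term.clear()
    term.update(TYPE_DEFAULTS[name])
    term["type"] = name

def apply_old_order(requested: dict, type_name: str) -> dict:
    term = dict(requested)        # characteristics set first ...
    set_type(term, type_name)     # ... then clobbered by the type
    return term

def apply_new_order(requested: dict, type_name: str) -> dict:
    term = {}
    set_type(term, type_name)     # type first
    term.update(requested)        # saved characteristics applied last
    return term
```

A requested width of 132 survives only in the second ordering.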
[Keywords]
NETVTM
SET HOST
terminal characteristics
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
HOSS attention
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 432 NETVTM VTMCHR
704A
[End of MCO 14192]
MCO: 14193 Name: DPM Date: 6-Apr-89:11:36:40
[Symptom]
MCO 14190 went a bit too far.
[Diagnosis]
In trying to close the window where a structure could be removed
while other things were being done to the ACC/NMB blocks, BMPGEN
was modified to get and give the DA resource. However, one needs
a DDB to use the DA and REMSTR doesn't have one to use. Also,
BMPGEN expects F to contain a STR DB addr, not a DDB.
[Cure]
Can't plug the hole that tight. Remove references to the DA in
BMPGEN and live with occasional IMEs. There is no structure-wide
resource to take care of this situation. Too bad.
[Keywords]
REMSTR
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 432 FILFND BMPGEN
704A
[End of MCO 14193]
MCO: 14194 Name: KDO Date: 6-Apr-89:12:18:04
[Symptom]
Invalid status returned in ETHNT. UUO User Buffer Descriptor (UBD) blocks.
[Diagnosis]
Missing code.
[Cure]
Add code.
[Keywords]
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 432 ETHUUO ENCXDG
704A
[End of MCO 14194]
MCO: 14195 Name: KDO Date: 10-Apr-89:10:54:04
[Symptom]
Adjacency up/down events for DECnet endnodes on multi-area LANs.
[Diagnosis]
DECnet is choosing a designated router outside its area.
[Cure]
Ignore Ethernet Router Hello messages from outside our area.
[Keywords]
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 432 ROUTER RHMASE
705
[End of MCO 14195]
MCO: 14197 Name: DPM Date: 11-Apr-89:08:44:04
[Symptom]
REFSTR creates files with strange version numbers.
[Diagnosis]
Sticking in REFSTR's version rather than the monitor's is, at best,
non-standard. But when displayed by DIRECT, it looks like a bug.
[Cure]
Use CNFDVN instead.
[Keywords]
REFRESH
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 423 REFSTR RIBST1
[End of MCO 14197]
MCO: 14198 Name: LWS Date: 17-Apr-89:10:18:26
[Symptom]
1. Same TM02/3 controller register dumped 9 times in TUB
on error.
2. SPEAR doesn't know how to interpret TM02/3 controlled tape drive
error entries or the monitor doesn't give SPEAR what it expects, take
your pick.
[Diagnosis]
1. RDREGS in TM2KON expects T2 to still contain controller
register number on return from RDMBR. RDMBR clears all but register
data in T2.
2. SPEAR expects 2 equal length blocks of error status information
in the error entry (IEP and FEP data). However, TM2KON's IEP length
is 1 and FEP length is 16 (octal). So we only write 1 word of "IEP"
information. This causes SPEAR's interpretation of the error to be
garbage (1. above doesn't help either).
Note: The TUB for a TM02/3 controlled tape drive contains 2 blocks
of TM2ELN words each for "IEP" and "FEP" error information. But
the IEP word is set for only a length of 1. Why? I don't know.
Poking the IEP word on 2476 to be the same as the FEP word causes
2 sets of error information to be dumped and SPEAR correctly
interprets the error. So it seems we can change SPEAR to handle
unequal length "IEP" and "FEP" error blocks, or have the monitor
dump equal length blocks.
[Cure]
1. PUSH/POP T2 around call to RDMBR at RDREG1.
2. Change LH of TUBIEP in TM2KON to be -TM2ELN.
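For readers unfamiliar with negative-count halfwords, the effect of item 2 can be sketched in Python. The helper names are hypothetical, the right-half value is a placeholder, and taking TM2ELN to be 16 (octal) is an assumption based on the FEP length quoted in the diagnosis.

```python
HALFWORD_MASK = 0o777777   # 18 bits
TM2ELN = 0o16              # assumption: same as the stated FEP length

def neg_count_word(length: int, right_half: int = 0) -> int:
    """Build a 36-bit word with -length in the LH (18-bit two's complement)."""
    left = (-length) & HALFWORD_MASK
    return (left << 18) | (right_half & HALFWORD_MASK)

def word_count(word: int) -> int:
    """Recover the positive count from the LH of such a word."""
    left = (word >> 18) & HALFWORD_MASK
    return ((1 << 18) - left) if left else 0
```

With a length of 1 only one word is described; with -TM2ELN in the left half (777762 octal under the assumption above) the count becomes 14 decimal, matching the FEP block.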
[Keywords]
SPEAR
TM02/3
TU77
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
Field service attention
PCO required
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 433 TM2KON TUBIEP,RDREG1
[End of MCO 14198]
MCO: 14199 Name: LWS Date: 17-Apr-89:11:17:10
[Symptom]
Can't assign a network device if a device of the same
type doesn't exist on the local host.
[Diagnosis]
If the device doesn't exist on the local host there
will not be an entry in GENTAB for the corresponding device.
The call to CHKGEN in DVSTAS will fail and we bomb the user even
though the network device does exist and is assignable.
[Cure]
At the non-skip return after the call to CHKGEN in DVSTAS
load F with the start of the DDB chain and fall through into
code that will eventually do the right stuff. But! This is not
going to work correctly all the time. If no local line printers
exist and we're trying to find a network printer DDB, we eventually
build a DDB for the network printer and try to link it between
the 'DSK' DDB and the 'SWAP' DDB - ding ding ding, IME. This happens
because LNKDDB in AUTCON likes to keep the DDB chain in sorted order
by device name. So 'LPT' falls between 'DSK' and 'SWAP', but 'DSK'
DDBs are in the hiseg. In order to avoid the wrath of FILSER, change
the name of SWPDDB to 'DSKSWP'.
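The effect of the rename on a name-sorted chain can be checked with a small Python sketch. It assumes the chain is ordered by the numeric value of the SIXBIT device name, which is my reading of "sorted order by device name"; the helper names are hypothetical, and unit numbers are ignored.

```python
def sixbit(name: str) -> int:
    """Pack up to six characters into a 36-bit SIXBIT word, left-justified."""
    word = 0
    for ch in name.upper().ljust(6)[:6]:
        word = (word << 6) | (ord(ch) - 0o40)   # SIXBIT code = ASCII - 40 octal
    return word

def chain_order(names):
    """Order device names as a name-sorted DDB chain would (model)."""
    return sorted(names, key=sixbit)
```

With the old name, 'LPT' sorts between 'DSK' and 'SWAP'. With the swapper DDB renamed to 'DSKSWP', 'LPT' sorts after both, so nothing is linked between them.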
[Keywords]
NETWORK DEVICE
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
Beware file entry required
New development MCO
PCO required
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 433 COMMOD DEVNAM
UUOCON DVSTAS
[End of MCO 14199]
MCO: 14200 Name: LWS/DPM Date: 20-Apr-89:10:52:06
[Symptom]
Tape UDBs on KS not filled in with prototype data.
[Diagnosis]
AUTUDB doesn't compute ending address for BLT.
[Cure]
ADDI P2,(U)
[Keywords]
KS
SPEAR
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Field service attention
PCO required
Single-section monitors only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 434 AUTCON AUTUD1
[End of MCO 14200]
MCO: 14201 Name: RCB Date: 25-Apr-89:07:05:00
[Symptom]
FILSER's error reporting leaves too big a window for DAEMON to get stale
information for SPEAR to report. Not only that, but DAEMON even has to guess
just what kind of error it is supposed to report.
[Diagnosis]
The ERRPT. UUO just doesn't give us enough to work with. We need to use
system error blocks if we're going to get it right.
[Cure]
Do so. This adds EX.AVL to the bits which can be set in the transfer table
header by the SEBTBL macro. If EX.AVL is set, the error entry will be copied
to AVAIL.SYS as well as to ERROR.SYS. This also changes the way in which all
disks report their errors. There is now a kontroller dispatch entry, KONELG,
which is used by FILIO to format an error block and queue it up for DAEMON.
[Keywords]
Disk errors
Error logging
DAEMON
System error blocks
SPEAR
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
Beware file entry required
New development MCO
[BEWARE text]
The format of the DSK KDB has changed again, with the addition of the KONELG
dispatch entry for error logging. Any local disk device drivers will need to
be changed accordingly. See MDEELG in FILIO for an example of how to do this.
DAEMON version 23A(1026) or later must be installed before this MCO.
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704A FILIO
705 434 COMMON
DPXKON
FHXKON
FSXKON
RAXKON
RHXKON
RNXKON
RPXKON
DSXKON
COMMOD
DEVPRM
S
ERRCON
DTASER
COMDEV
[End of MCO 14201]
MCO: 14202 Name: RCB Date: 25-Apr-89:07:16:53
[Symptom]
Jobs get stuck in event wait for system IPCF, and need manual intervention to
be restarted. If they were logging out at the time, the job slot is stuck and
useless.
[Diagnosis]
[SYSTEM]GOPHER is completely ignorant of the possibility that a system program
like the account daemon might die and get logged out, thus causing its IPCF
receive queue to be "returned to sender, address unknown". It just throws the
returned messages on the floor, and leaves the user's job waiting for an
acknowledgement message which will never come.
[Cure]
Educate the rodent. Check the returned message field, and validate it
against the expected sequence number. If it matches, give the user an error
return from SENDSP, so that a QUEUE. UUO (for example) will give the "component
not running" error, and FILDAE messages will be handled as though FILDAE had
never been running.
[Keywords]
EW hang
System IPCF wait
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704A IPCSER
705 434
[End of MCO 14202]
MCO: 14203 Name: RCB Date: 25-Apr-89:08:11:52
[Symptom]
System error blocks can eat up all of free core if DAEMON isn't running.
[Diagnosis]
Once they get queued, they are only deleted when some privileged program
executes a SEBLK. UUO.
[Cure]
Add a timer. Once a minute, we will look for any blocks which are older than
SEBAGE minutes and delete them. SEBAGE defaults to 10 (decimal), and can be
changed with MONGEN. If SEBAGE is set to zero, the error blocks will live
forever.
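The once-a-minute sweep can be sketched as a simple filter. This Python model is illustrative: the function name and the representation of a block as a (queued-at minute, payload) pair are hypothetical; only SEBAGE, its default of 10, and the meaning of zero come from the text above.

```python
SEBAGE_DEFAULT = 10  # minutes, decimal

def expire_blocks(blocks, now_minutes: int, sebage: int = SEBAGE_DEFAULT):
    """Return the blocks that survive one sweep.

    `blocks` is a list of (queued_at_minutes, payload) pairs.
    """
    if sebage == 0:
        return list(blocks)          # zero means blocks live forever
    # Delete only blocks strictly older than SEBAGE minutes.
    return [(t, p) for (t, p) in blocks if now_minutes - t <= sebage]
```

At minute 15, blocks queued at minutes 0, 5, and 12 have ages 15, 10, and 3, so only the first is deleted under the default setting.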
[Keywords]
System error blocks
free core limits
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 434 ERRCON
704A COMMON
CLOCK1
[End of MCO 14203]
MCO: 14204 Name: DPM Date: 27-Apr-89:06:36:33
[Symptom]
In some configurations, LINK will report NDBNNM as undefined even
though ANF-10 network software is loaded.
[Diagnosis]
This problem is one of programming style and MACRO's tolerance
for conflicting symbol definitions. NDBNNM is defined in NETPRM,
which is searched by NETSER. The first several references to this
symbol are properly made. However, at NETAS1 the symbol is referenced
as external. MACRO should probably flag this as an "E" error.
Instead, the original value of the symbol is lost and MACRO generates
global fixup requests for all references to NDBNNM. It's not clear
why this problem has surfaced now, as the code at NETAS1 has not
changed for several monitor releases, but correcting the reference in
NETAS1 resolves the undefined global.
[Cure]
Reference NDBNNM as an internal quantity.
[Keywords]
UNDEFINED GLOBAL
[Related MCOs]
13932, 13137
[Related SPRs]
36260
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 435 NETSER NETAS1
704A
[End of MCO 14204]
MCO: 14205 Name: RCB Date: 27-Apr-89:18:49:36
[Symptom]
MCO 14165 didn't go far enough. PAGE. UUO function .PAGAC still isn't always
right. Spy pages for sections 3-36 are sometimes reported as being unreadable.
[Diagnosis]
PAGA93, which finds a page number to return for a spy page in sections 3-36,
doesn't preserve T2. Its caller wants T2 to contain the map entry after the
call, as well as before.
[Cure]
Preserve the map entry in T2.
[Keywords]
PAGE. UUO
Page accessibility
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 435 VMSER PAGA93
704A
[End of MCO 14205]
MCO: 14206 Name: DPM Date: 28-Apr-89:05:51:47
[Symptom]
Stopcode IME removing a structure (revisited).
[Diagnosis]
Previous MCOs didn't plug all the holes, although the window was made
much smaller.
[Cure]
Prevent races by incrementing UNIGEN while holding the DA. Conceptually,
this is easy, but BMPGEN is called with F pointing to a STR, not a DDB.
Therefore, change UPDA & DWNDA to get the job number from .USJOB rather
than from PJOBN. This is OK since the use of DA requires a job to be
mapped to reference PJOBN anyway.
[Keywords]
REMOVE STRUCTURE
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 435 FILFND BMPGN1
704A FILIO UPDA,DWNDA
[End of MCO 14206]
MCO: 14207 Name: LWS Date: 29-Apr-89:14:48:25
[Symptom]
Can't create a SSL larger than .SLMXJ (maximum JSL size)
structures using STRUUO.
[Diagnosis]
Code in SLSTRR and SLCHK always uses .SLMXJ as a maximum
without checking to see if it's a JSL or the SSL.
[Cure]
Check search list type (RH(F)=0 means SSL) and use appropriate
maximum value.
[Keywords]
SSL
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 435 FILFND SLSTRR,SLCHK
704A
[End of MCO 14207]
MCO: 14209 Name: DPM Date: 8-May-89:08:57:19
[Symptom]
Pathological names whose first component is NUL do not necessarily
behave as the NUL device. A DEVCHR, or DEVTYP of the sixbit name
returns disk-only bits. The same is true if you do one of these
UUOs on an open channel. However, if a LOOKUP or ENTER is done,
then the right thing comes back. Also, DEVNAM never returns NUL
and WATCH FILES doesn't expand the filespec correctly.
[Diagnosis]
The monitor believes a pathological name can only be a disk device
and everybody knows that NUL is really a disk even though it claims
to be all devices. But FILSER doesn't make that claim often enough.
[Cure]
Fix SETDDB to test for pathological NUL as well as assigned NUL. Change
NULTST to test for DVDSK and DVTTY instead of sixbit NUL. Fix PRTDDB
to print NUL instead of a logical device name. Add crock routine
LNMNUL to do the grunt work when it's really necessary to know if
it's the NUL device.
[Keywords]
NUL
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 436 COMCON PRTDDB
FILUUO NULTST,LNMNUL,SETDDB
UUOCON DVCHR,UDVNAM
704A
[End of MCO 14209]
MCO: 14211 Name: DPM Date: 15-May-89:09:29:52
[Symptom]
It's difficult to measure magtape performance on a per-kontroller
basis without using any counters.
[Diagnosis]
Never done before I guess.
[Cure]
Add two new counters to the KDB: TKBCRD counts characters read and
TKBCWR counts characters written.
[Keywords]
MAGTAPE PERFORMANCE
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 437 DEVPRM TKBCRD,TKBCWR
T78KON
TCXKON
TD2KON
TM2KON
TMXKON
TS1KON
TX1KON
704A
[End of MCO 14211]
MCO: 14212 Name: LWS Date: 15-May-89:10:21:02
[Symptom]
Undeserved ?Illegal memory reference in jobs with a shared
hiseg.
[Diagnosis]
If a sharable hiseg is expanding and there are enough
secondary map slots available to map the expansion, RDOMP is not
set for any other job using the same hiseg.
[Cure]
In GTHMAP, if there are enough map slots for the expansion, call
HRDOMP via HGHAPP so all other users of the same hiseg will have their
maps redone before they run again.
[Keywords]
Sharable high segments
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
PCO required
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 437 SEGCON GTHMP1
704A
[End of MCO 14212]
MCO: 14214 Name: JAD Date: 25-May-89:07:51:12
[Symptom]
Possible SCAFOO stopcodes in a maximally-configured CI network.
[Diagnosis]
Insufficient path blocks available for the number of CI nodes and
CPUs in the CI/system configuration. There is space available for
32 path blocks, but a maximally-configured system could require
much more. Problem occurs with definition of C%PBLL (number of
path blocks) - it is defined as 2*C%SBLL (number of system blocks).
Depending on the number of CI nodes and CPUs, this definition may
leave insufficient path blocks.
[Cure]
Redefine C%PBLL as 6*C%SBLL - this will allow for the largest
possible CI and CPU configuration.
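As a rough check of the numbers above, the following sketch assumes that the 32 path blocks quoted in the diagnosis are exactly the old 2*C%SBLL definition, which would make C%SBLL equal to 16. That value is an inference, not something the MCO states.

```python
# Inferred, not stated in the MCO: 32 path blocks under the old
# definition C%PBLL = 2*C%SBLL implies C%SBLL = 16.
OLD_PATH_BLOCKS = 32
C_SBLL = OLD_PATH_BLOCKS // 2   # number of system blocks (inferred)

old_c_pbll = 2 * C_SBLL         # definition before this MCO
new_c_pbll = 6 * C_SBLL         # definition after this MCO

print("old C%PBLL:", old_c_pbll)
print("new C%PBLL:", new_c_pbll)
```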
[Keywords]
CI
SCAFOO
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 SCAPRM C%PBLL
[End of MCO 14214]
MCO: 14217 Name: DPM/RJF Date: 30-May-89:06:58:55
[Symptom]
Various problems suspending and resuming a system:
1. KLIPAs and KLNIs don't get reloaded.
2. KLNIs will be restarted even if they had first been removed.
3. Stopcode NULFNC during the suspend.
[Diagnosis]
1. Code to call PPDINX and KNIINI is under an IFG <M.CPU-1> conditional
in COMMON, so it is not included in single CPU configurations.
2. When a KLNI is removed, the bit corresponding to the proper KLNI on
a given CPU is set to indicate that the device is to be ignored on
subsequent initialization calls. However, IPAMSK is never checked
on KLNI restarts.
3. For reasons that escape me, the NULFEK is being called on system
sleep/resume when it hadn't before. Apparently this never worked
before, but it went unnoticed. The dispatch table does not contain
the appropriate entries for these NETSER functions.
[Cure]
1. Move the calls to PPDINX and KNIINI outside the IFG <M.CPU-1> conditional.
2. Teach KNIINI to respect IPAMSK on KLNI restarts.
3. Add system sleep/resume entry point to NULFEK's dispatch table.
[Keywords]
SYSTEM SLEEP
[Related MCOs]
13932, 13137
[Related SPRs]
36269
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 441 COMMON SPRIN5
KNISER KNIINI
NULFEK NLFDSP
704A
[End of MCO 14217]
MCO: 14218 Name: LWS Date: 8-Jun-89:09:08:11
[Symptom]
Undeserved memory parity errors on KLs with 4MW of memory.
[Diagnosis]
RH20s do undetermined things when accessing the last physical
(quad)word in 4MW. This is an RH20 problem. This problem was never
encountered in previous versions of the monitor and BOOT. The monitor
used to put its hiseg at the very top of memory. Then BOOT occupied the
top of memory. Now, BOOT is still there, but it now frees the pages
at the top because they contain tape drivers that are not needed once
BOOT is done. So, these pages at the top of memory are free to use
by the monitor. When a user gets the last page of memory, it's fair
game for I/O by an RH20.
[Cure]
For lack of something better to do at the moment, if the last
page of a 4MW system is free, mark it as non-existent in NXMTAB and PAGTAB
and set MEMSIZ to 17,,777000 instead of 20,,000000.
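The two MEMSIZ values differ by exactly one page. A small sketch of the arithmetic, assuming the usual LH,,RH notation for two 18-bit halves and the 512-word page size; the helper name is made up for illustration:

```python
PAGE_WORDS = 0o1000  # 512 words per page


def halves(lh, rh):
    """Combine an 'LH,,RH' pair of 18-bit halves into a single value."""
    return (lh << 18) | rh


full_4mw = halves(0o20, 0o000000)  # 20,,000000
trimmed = halves(0o17, 0o777000)   # 17,,777000

print("full 4MW words:", full_4mw)
print("words dropped :", full_4mw - trimmed)
```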
[Keywords]
4 MW
parity
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
Field service attention
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 442 SYSINI MMTIN9
704
[End of MCO 14218]
MCO: 14219 Name: DPM Date: 27-Jun-89:06:37:34
[Symptom]
There appears to be no upper bound on the number of extended RIBs
FILSER is content to create. You can literally fill a disk with
extended RIBs for a single file. When you CLOSE the file, you might
as well take the rest of the day off, because FILSER has lots of
bookkeeping to perform.
[Diagnosis]
RIBXRA contains an 8-bit field for the extended RIB number. FILSER
never checks for field wrap around. The RIB number is only read back
when a user specifies a negative USETI, and otherwise serves no real
purpose.
[Cure]
Check for wrap around and impose an additional limit based on the
contents of MUSTMX when RIBs are created. Set the maximum number
of USETIs to 255 decimal.
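The wrap-around check can be sketched roughly as follows. This is only an illustration of the idea, not the FILSER code: the function name is hypothetical, and the `limit` parameter stands in for whatever limit the monitor derives from MUSTMX.

```python
RIB_NUMBER_BITS = 8                          # field width in RIBXRA, per the MCO
MAX_RIB_NUMBER = (1 << RIB_NUMBER_BITS) - 1  # largest value the field can hold


def next_rib_number(current, limit=MAX_RIB_NUMBER):
    """Return the next extended RIB number, or None if creating another
    RIB would wrap the 8-bit field or exceed the configured limit.

    'limit' is a hypothetical stand-in for the MUSTMX-based limit.
    """
    nxt = current + 1
    if nxt > MAX_RIB_NUMBER or nxt > limit:
        return None
    return nxt
```

A caller would refuse to extend the file when `None` comes back, rather than letting the RIB number silently wrap to zero.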
[Keywords]
EXTENDED RIBS
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 443 COMMOD DESRBC,MUSTMX
FILIO EXTRB2
704A
[End of MCO 14219]
MCO: 14220 Name: RCB Date: 30-Jun-89:00:15:53
[Symptom]
An OPEN which specifies a logical name or a pathological name can fail or find
the wrong device.
[Diagnosis]
The DDB search logic does not allow certain names to be found unless they are
assigned to disks (i.e., funny-space DDBs). CK2CHR gets called when it should
not. For that matter, LP will match a terminal assigned as LPT but not as LP.
[Cure]
For 2-character device names which CK2CHR changes, do the DDB searching twice.
First, try the original name. If that fails or returns DSKDDB, then try again
with the expanded name. If the second search fails but the first returned
DSKDDB, then return the results from the first DDB search. Eliminate the hacks
for CK2CHR and SY: from the search loop.
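The two-pass search described above can be sketched as follows. This models only the decision logic; `search` and `expand` are hypothetical callables standing in for the DDB search and for CK2CHR's name expansion, and `DSKDDB` is a placeholder value.

```python
DSKDDB = "DSKDDB"  # placeholder for the generic disk DDB


def find_ddb(name, search, expand):
    """Two-pass DDB lookup.

    search(name) returns a DDB or None (hypothetical).
    expand(name) returns the expanded name, or the same name when the
    expansion would not change it (hypothetical).
    """
    first = search(name)
    expanded = expand(name)
    if expanded == name:
        return first            # name is not changed: one search only
    if first is not None and first != DSKDDB:
        return first            # original name matched a real device
    second = search(expanded)
    if second is not None:
        return second           # expanded name matched
    return first                # DSKDDB from the first pass, or None
```

With this ordering, a terminal assigned the logical name LP is found under LP before the expanded name is ever tried.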
[Keywords]
PDP-11 names
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 444 UUOCON DDBSCC
704A
[End of MCO 14220]
MCO: 14221 Name: RCB Date: 1-Jul-89:01:56:17
[Symptom]
KAF at PI level of the NIA20.
[Diagnosis]
Taking too long to empty the response queue (MCO 14186 revisited).
[Cure]
Check .CPTMF to try to be sure that too much time won't pass during a single
KLNI interrupt. Also, move the check to after the callback so that we don't
drop the buffers on the floor. Otherwise, after long enough, the protocols
will run out of buffers (especially DECnet).
Because .CPTMF is slightly bogus just as the system is coming up, ignore it
until .CPUPT is at least 2 (ticks). Note that the counters and limits added
by MCO 14186 are still present and in force.
[Keywords]
KAF
NIA20
KNIKSP
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
Field service attention
HOSS attention
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 444 KNISER KNIRQ1
704A
[End of MCO 14221]
MCO: 14222 Name: RCB Date: 7-Jul-89:23:34:54
[Symptom]
System is annoyingly sluggish at system startup time.
[Diagnosis]
Trying to run dozens of copies of INITIA on random terminals all at the same
time, in dozens of job slots.
[Cure]
Only sort of. Invent a new MONGEN-definable symbol, DSDRIC (dataset devices
run INITIA CUSP), to control whether INITIA runs on dataset lines. It will
default to one, which means that INITIA will continue to run on datasets at
system startup. If set to zero at MONGEN time, TTYINI will not force INITIA
commands on the datasets. For the curious, the reason INITIA runs on datasets
at startup time is because of the existence of hardware interfaces which need
to have parameters set even before a call comes in to the modem. However, most
sites probably have more well-behaved interfaces, and will be able to set
DSDRIC to zero.
[Keywords]
sluggish startup
INITIA
datasets
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 444 COMDEV DSCTAB
704A SCNSER TTINI2
[End of MCO 14222]
MCO: 14223 Name: DPM Date: 18-Jul-89:05:32:00
[Symptom]
DA28s don't work.
[Diagnosis]
XTCLNK assigns junk names to UDBs. Later calls to build DDBs fail
because the target UDBs cannot be found. Also, XTCSER will not
assemble with FTMP turned off because of references to SCNLOK and
OUCHE.
[Cure]
Correct logic that builds UDB names. Put IFN FTMP conditionals
around the reference to SCNLOK. Make OUCHE available in all KL10
configurations.
[Keywords]
DA28
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 445 APRSER OUCHE
COMMON OUCHTB
XTCSER XTCLN2,CHKTYP,MPIOWD
704A
[End of MCO 14223]
MCO: 14224 Name: DPM Date: 20-Jul-89:11:58:59
[Symptom]
Random job tables (mostly JBTSTS) get clobbered, weird crashes,
general mayhem.
[Diagnosis]
Steve Perkins is running .EXE files created on the -20 again.
If the .EXE directory claims to have sharable pages that aren't
also marked as high segment pages, GETEXE returns flags indicating
the image is sharable, but with no high segment. Parts of GET
clean up assume that if the sharable bit is on, then there must
be a high segment. This is true for .EXE files created on a -10,
but not otherwise. Anyway, making this assumption, SEGCON blindly
picks up high seg block addresses (which are usually zero) and
indexing off of zero, proceeds to write all over the monitor's
low segment.
[Cure]
While processing .EXE directory entries, turn off the sharable
bit if the high segment bit is not turned on.
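A minimal sketch of the cure, assuming made-up flag values; the real bit assignments in the .EXE directory entry are not given in this MCO, and the function name is hypothetical.

```python
# Hypothetical flag values, for illustration only.
EXE_SHARABLE = 0x1
EXE_HIGH_SEG = 0x2


def sanitize_entry_flags(flags):
    """Clear the sharable bit unless the high segment bit is also set."""
    if (flags & EXE_SHARABLE) and not (flags & EXE_HIGH_SEG):
        flags &= ~EXE_SHARABLE
    return flags
```

After this step, later code can again assume that a sharable image always has a high segment.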
[Keywords]
TOPS-20 EXE FILES
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 446 SEGCON WANTIT
704A
[End of MCO 14224]
MCO: 14225 Name: DPM Date: 25-Jul-89:05:28:01
[Symptom]
SA10s don't function in an environment where DF10C-based device drivers
exist (TM2KON for one).
[Diagnosis]
DF10C drivers fail to test for the presence of SA10 devices. Therefore,
SA10s look like 18-bit DF10s.
[Cure]
Test SI.SAX in the CONI word in the appropriate xxxCFG routines.
[Keywords]
SA10
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 446 FSXKON FSXCFG
RPXKON RPXCFG
TM2KON TM2CFG
704A
[End of MCO 14225]
MCO: 14226 Name: DPM Date: 3-Aug-89:09:18:56
[Symptom]
Several annoying problems that prevent SA10-based tape from working well.
[Diagnosis]
1. SAXSER & TS1KON bum a bit in the KDBUNI word to indicate a
software interrupt was requested. This means that KDBs can't be
compared against each other, so AUTCON will build multiple KDBs
for a single SA10 kontroller.
2. Tapes ported between a DX10 or a DX20 and an SA10 will have duplicate
UDBs and DDBs built. This is because TD2KON and TX1KON do not know
how to extract drive serial numbers. Subsequent comparisons between
a drive S/N and an existing one don't match, so AUTCON believes it's
looking at two different drives.
3. The code to compare drive serial number is not interlocked in AUTCON.
Under the right circumstances, two configuring CPUs which have detected
the same drive might not notice each other.
[Cure]
1. Move the software bit into KDBSTS. It's a better place for such
things.
2. Fix TX1KON and TD2KON.
3. SYSPIF/SYSPIN around much of AUTDPU.
[Keywords]
SA10
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 447 AUTCON AUTDPU
DEVPRM KD.SIR
SAXPRM SA.SIR
SAXSER SAXINT
TD2KON TD2DRV
TS1KON TS1DRV
TX1KON TX1DRV
704A
[End of MCO 14226]
MCO: 14227 Name: DPM Date: 3-Aug-89:09:20:23
[Symptom]
Possible tape hangs after a CPU restart.
[Diagnosis]
SPRINI doesn't clear the TAPSER interlock nesting flag.
[Cure]
Do so.
[Keywords]
INTERLOCKS
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 447 COMMON SPRI10
704A
[End of MCO 14227]
MCO: 14228 Name: RCB Date: 3-Aug-89:15:33:21
[Symptom]
Problems setting explicit speeds on TTY lines in the ANF front ends.
[Diagnosis]
Trying to do autobaud even though the speed has been set to something
other than the autobaud speed.
[Cure]
Don't do that. If the speed is set in the config.P11 file, and that speed
is not the autobaud speed (currently 2400 baud), override the ABD
characteristic for the line.
[Keywords]
Autobaud
Non-autobaud
TnXS
ANF10
[Related MCOs]
13932, 13137
[Related SPRs]
36270, 36268
[MCO status]
None
[MCO attributes]
Field service attention
HOSS attention
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 450 CONFIG P11
704A DNTTY P11
DNLBLK P11
MACROS P11
[End of MCO 14228]
MCO: 14229 Name: RCB Date: 8-Aug-89:22:22:51
[Symptom]
Monitor too big and slow. Not enough free bits in JBTSTS.
[Diagnosis]
Lots of places in the monitor test bit JDC from JBTSTS. A few
others clear it. Only DAECOM can set it. It is unreachable code, left over
from the old DCORE and DUMP commands and the days when DAEMON handled
virtual references for EXAMINE, DEPOSIT, and VERSION commands. The JDC bit is
consequently never set, and all the tests for it are redundant.
[Cure]
Free up the bit in JBTSTS, and eliminate all references to it.
[Keywords]
PERFORMANCE
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 450 COMCON
704A CLOCK1
SCHED1
SCNSER
S
[End of MCO 14229]
MCO: 14230 Name: DPM Date: 15-Aug-89:06:36:14
[Symptom]
More error logging stuff ...
[Diagnosis]
Yes.
[Cure]
Convert more old-style DAEMON error logging calls to use the
System Error Blocks. This edit converts:
1. CPU attached/detached records.
2. Node online/offline records.
3. Date/time change records.
Code is also in place to handle system reload (.ERWHY) records, but
because of interface problems with DAEMON and AVAIL.SYS, this call
will be temporarily neutered.
[Keywords]
DAEMON ERROR LOGGING
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 450 COMCON SETDAT
CPNSER CPUCSC
NETSER NODEAM
S .ERMVR
SYSINI SYSRLD,SYSAVL
704A
[End of MCO 14230]
MCO: 14231 Name: RCB/DPM Date: 15-Aug-89:07:43:18
[Symptom]
Convert DAEMON reporting of KL error chunks from RSX20F to use system
error blocks. This eliminates two words in the CDB, .CPETM and .CPEAD.
In order to accomplish this cleanly, there is now a new routine in IPCSER,
OPRMSG, which allows one to queue up messages for ORION. If ORION is not
running, the messages can optionally be sent to OPR: or the CTY. See IPCSER
for the calling sequence. The behavior is controlled by bits in T1 on the
call, of the form OPM.??, which are defined in S.
[Diagnosis]
[Cure]
[Keywords]
KL error chunks
system error blocks
system messages
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
Deferred
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 450 DTEPRM
704A DTESER
S
IPCSER
ERRCON
COMCON
CLOCK1
COMMON
[End of MCO 14231]
MCO: 14233 Name: RCB Date: 22-Aug-89:09:55:53
[Symptom]
Undeserved KNIKSP stopcodes.
[Diagnosis]
.CPTMF limit is exceeded at system startup time.
[Cure]
If .CPUPT is lower, then don't KNIKSP.
[Keywords]
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 451 KNISER KNIRQ1
704A
[End of MCO 14233]
MCO: 14234 Name: DPM Date: 23-Aug-89:07:34:30
[Symptom]
Programs using external tasks (XTCSER) hang following attempts to JAM
powered off remote computers.
[Diagnosis]
If FTMP is turned off, the call to CHKTYP from DWNUNI says to never do
typeout. DWNUNI simply returns without clearing any DA28 errors which
caused the unit to be declared down. Thus, the DA28 becomes unusable
for all other users. A similar situation exists where connect errors
are processed. In this case, we forget to force the unit offline.
[Cure]
Three things. First, fix CHKTYP to work correctly with FTMP turned off.
Second, if no typeout is to be done, skip around the message generation
code and clear the DA28. Finally, on connect errors, always force the
unit offline whether or not we'll type a message.
[Keywords]
DA28 ERRORS
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 451 XTCSER CHKTYP,DWNUNI,CHKCER
704A
[End of MCO 14234]
MCO: 14235 Name: DPM Date: 29-Aug-89:07:32:01
[Symptom]
More error logging stuff.
[Diagnosis]
Yes.
[Cure]
Teach the monitor to write the following records as system
error blocks:
.ERCSC Configuration status change (memory on/off line)
.ERKSN KS10 NXM trap
.ERKPT KL10/KS10 parity trap
.ERCSB CPU status block
.ERDSB Device status block
[Keywords]
DAEMON ERROR LOGGING
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 452 APRSER PRHMF7,DAELOG,MEMELG
COMCON MEMONU,MEMON8
COMMON OLDNXM,DIACSB,DIADSB
LOKCON MEMOFU,MEMOF2
704A
[End of MCO 14235]
MCO: 14236 Name: KBY Date: 29-Aug-89:08:27:46
[Symptom]
FA resource scheduling leaves something to be desired. The scheduler
knows how to wake up just the job that needs it, but everyone wakes up now
any time it's given up.
[Diagnosis]
No code.
[Cure]
Add code (the remaining routines necessary to do the unwind properly).
[Keywords]
FA
UNWIND
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Deferred
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 452 FILIO UPFA,DWNFA
704A COMMOD
S
CLOCK1 SRFREE
[End of MCO 14236]
MCO: 14237 Name: KBY Date: 29-Aug-89:08:34:37
[Symptom]
Job stuck; SYSTAT shows it's locked (even though it really isn't).
[Diagnosis]
Due to the extra calls to SCDCHK to prevent KAFs in large PAGE. UUOs,
we can potentially block in a PAGE. UUO. If pages were allocated to the job
by CHGPGS (because they were available at the time), but during the block
we decide to swap out the job, we could potentially lose those pages to
never-never land since they are not in anyone's map. To prevent this,
CHGPGS lights NSHF (but not NSWP) akin to MAPBAK so that the swapper won't
touch the job. Unfortunately, if the job has a sharable high segment, someone
else using it might call XPANDH (which can happen even without really wanting
to expand the high seg as we tend to do this at the drop of a hat) and set
JXPN for the job blocked at CHGPGS. At this point the scheduler will not
run the job because of JXPN and the swapper won't clear JXPN (even without
swapping the job which may not be necessary) because of NSHF which won't
get cleared until the job finishes running through CHGPGS (deadly embrace).
[Cure]
The scheduler will except jobs owning disk resources from the JXPN check.
Do so also with jobs having NSHF on but not NSWP (a state only the monitor
can cause in limited situations such as the above).
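The resulting scheduler test can be sketched as a predicate. The bit values and the function name are hypothetical; only the logic follows the text above.

```python
# Hypothetical bit values, for illustration only.
JXPN = 0x1   # job is waiting to be expanded
NSHF = 0x2   # do not shuffle
NSWP = 0x4   # do not swap


def blocked_by_jxpn(status, owns_disk_resource):
    """Return True if JXPN should keep the scheduler from running the job."""
    if not (status & JXPN):
        return False                    # JXPN not set: nothing to check
    if owns_disk_resource:
        return False                    # existing exception
    if (status & NSHF) and not (status & NSWP):
        return False                    # new exception added by this MCO
    return True
```

A job blocked in CHGPGS with NSHF set but NSWP clear therefore stays runnable, which breaks the deadly embrace described in the diagnosis.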
[Keywords]
JXPN
NSHF
[Related MCOs]
13932, 13137
[Related SPRs]
36245
[MCO status]
Deferred
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 452 SCHED1 CJFRCX
704A
[End of MCO 14237]
MCO: 14238 Name: JC Date: 1-Sep-89:13:34:18
[Symptom]
TOPS-10 is missing the TRANSLate command.
[Diagnosis]
No one ever put it in.
[Cure]
Add one.
[Keywords]
TRANSL
LOGIN
commands
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 453 COMMON COMTAB
[End of MCO 14238]
MCO: 14240 Name: DPM Date: 5-Sep-89:05:41:06
[Symptom]
More error logging stuff.
[Diagnosis]
Yes.
[Cure]
1. Add support for .ERSNX (NXM sweep).
2. Add support for .ERSPR (parity sweep).
3. Turn on .ERWHY/.ERMRV logging.
[Keywords]
ERROR LOGGING
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
Beware file entry required
[BEWARE text]
DAEMON version 23A(1027) or later is required. Earlier versions
will cause .ERMRV records to be written into ERROR.SYS instead of
AVAIL.SYS. When this happens, SPEAR will report an unknown record
type in ERROR.SYS.
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 453 ERRCON PARSWP,PARELG,NXMSWP,NXMELG,XFRSE2
S EX.NER
SYSINI LLMSTR,AVLTBL
704A
[End of MCO 14240]
MCO: 14241 Name: DPM Date: 6-Sep-89:07:23:57
[Symptom]
Stopcode OVA on a KS10 during SYSINI.
[Diagnosis]
EVA pages overflow BOOT address space because the high segment
grew a bit.
[Cure]
Slide the high segment origin down 2 pages.
[Keywords]
HIGH SEGMENT
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 453 COMMON MONORG
704A
[End of MCO 14241]
MCO: 14242 Name: KDO Date: 11-Sep-89:14:05:55
[Symptom]
Unusable TTY DDBs.
[Diagnosis]
LATSER creates a TTY DDB for host-initiated connects, but INITIA uses a
different one, causing LATSER's to float free.
[Cure]
If it hurts every time I do this, don't do it anymore.
[Keywords]
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 454 LATSER GETTDB
704A
[End of MCO 14242]
MCO: 14243 Name: ERS Date: 12-Sep-89:08:18:55
[Symptom]
Various. Lots of monitor too big and slow. Several places where a user-mode
section number is lost. And possible working-set confusion if a multi-section
program had a PFH. (Probably wouldn't work anyway.)
[Diagnosis]
GETPC/PUTPC
[Cure]
Remove uses of GETPC/PUTPC. In some cases we simply put the same code in
minus a couple JRSTs. In other places it gets a little more complicated. The
DDT command should now include the section number in the one-word old PC in
JOBDAT. Assume that an extended user is not in his PFH. (A bit of work
would be involved in making an extended PFH work.) Rewrite DOINT. Net result
is that we'll store the section number in the old PC portion of the interrupt
block.
[Keywords]
GETPC
User-mode
Extended-addressing
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
Beware file entry required
New development MCO
Documentation change
[BEWARE text]
Some one-word PCs will now contain the section number where they did
not in the past. In particular, commands like DDT should preserve the section
number in .JBOPC. Also, the old PC in the interrupt block should now contain
the section number.
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 454 COMCON USAVE,SEGRLX
ERRCON DOINT
VMSER USRFL6,USRFL7,GETDDT,UPAGE4,PAGA1C
[End of MCO 14243]
MCO: 14244 Name: DPM Date: 18-Sep-89:06:31:17
[Symptom]
If a logical name points to NUL, the FILOP returned filespec
will not store the correct device name following a LOOKUP or
ENTER.
[Diagnosis]
Oversight. The returned device name is the logical name.
[Cure]
Call LNMNUL and return NUL if appropriate.
[Keywords]
NUL
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 455 UUOCON FOPFI0
704A
[End of MCO 14244]
MCO: 14245 Name: DPM Date: 19-Sep-89:06:27:02
[Symptom]
On a very slow system, IPCF sends to jobs which logged in
via FRCLIN can get receiver quota exhausted errors.
[Diagnosis]
The receiver hasn't had the chance to pump up its IPCF quotas.
This is most easily seen on a heavily loaded KS10.
[Cure]
Have LOGREF set the quotas to 511.
[Keywords]
IPCF QUOTAS
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 455 COMCON LOGRF2
704A
[End of MCO 14245]
MCO: 14246 Name: ERS Date: 19-Sep-89:07:57:56
[Symptom]
Monitor too big and slow.
[Diagnosis]
Old code for GET.EXE.
[Cure]
Remove it.
[Keywords]
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 455 COMCON SGSET,UGTSEG
[End of MCO 14246]
MCO: 14247 Name: ERS Date: 19-Sep-89:08:07:26
[Symptom]
GETPC/PUTPC, the second half.
[Diagnosis]
yes.
[Cure]
Yes.
[Keywords]
GETPC
PUTPC
GETPCS
byebye
[Related MCOs]
14243
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 455 ERRCON DOINT
S GETPC,GETPCS,PUTPC
CLOCK1 NOTACL,INCTM4,CIP9,STOP1H,SETPIT,SETPIU,USTART
[End of MCO 14247]
MCO: 14248 Name: RCB Date: 26-Sep-89:05:52:43
[Symptom]
MCO 14231 revisited:
Convert DAEMON reporting of KL error chunks from RSX20F to use system
error blocks. This eliminates two words in the CDB, .CPETM and .CPEAD.
[Diagnosis]
yes.
[Cure]
yes.
This also makes DTE. UUO function 20 (.DTERT) obsolete.
[Keywords]
KL error chunks
system error blocks
system messages
[Related MCOs]
14231
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 456 DTEPRM
704A DTESER
IPCSER
COMMON
[End of MCO 14248]
MCO: 14249 Name: KDO Date: 26-Sep-89:07:27:50
[Symptom]
LAT is slow to start.
[Diagnosis]
If the multicast message is sent before the Ethernet service routines have set
the channel address, LAT servers will use the wrong Ethernet address when trying
to connect to TOPS-10.
[Cure]
Delay the multicast message until after ETHSER does the Set-Channel-Address
(NU.SCA) callback.
[Keywords]
[Related MCOs]
None
[Related SPRs]
36229
[MCO status]
None
[MCO attributes]
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 456 LATSER CBRDSP,LATLSC,LATSCA
704A
[End of MCO 14249]
MCO: 14250 Name: DPM Date: 3-Oct-89:07:45:50
[Symptom]
No way to cause IPA dumps to be written cleanly.
[Diagnosis]
DAEMON currently does this by using system error blocks; a method
which is at best an ugly crock.
[Cure]
Invent a way to allow the monitor to run things at UUO level. This
amounts to adding a forced .EXEC command which when performed on
FRCLIN will create a job slot and run a specified routine at UUO
level. At completion, the control transfers to JOBKL and the job
will be destroyed. This will be used to write IPA dump files. This
MCO however, only implements the necessary code to create the job.
The actual dump stuff will happen in a later MCO.
[Keywords]
DAEMON ERROR LOGGING
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 457 COMCON LOGREF
COMMON COMTAB
CLOCK1 CIP2
SCNSER TTFCOM
704A
[End of MCO 14250]
MCO: 14253 Name: DPM Date: 17-Oct-89:07:24:25
[Symptom]
The monitor has an annoying habit of dumping even if the system has
been up for less than 5 minutes. This is contrary to previous behavior.
[Diagnosis]
While it may be a desirable thing to do under some circumstances, it
isn't desirable in all cases.
[Cure]
Make it optional. In cases where the system crashes during the first 5
minutes of uptime, dump only if the symbol ATODMP is non-zero. By default,
it will be set to 1. Sites which find this behavior disgusting can set it
to 0.
[Keywords]
DUMP
[Related MCOs]
13809
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 461 COMMON ATODMP
MONBTS RLDMON
704A
[End of MCO 14253]
MCO: 14254 Name: DPM Date: 26-Oct-89:07:43:47
[Symptom]
Occasionally, structures cannot be mounted after ATTACHing disk
drives or after a newly formatted pack has been defined.
[Diagnosis]
The routine DSKDRV is responsible for setting up a UDB following
an ATTACH. If errors occurred reading device registers, the unit
status is set appropriately to reflect the error condition. However,
if no errors occurred, DSKDRV assumes a pack must be mounted and
changes the status to 'pack is mounted'. Later, when the STRUUO is
done to define a structure, it will fail because the UDB claims a pack
is already mounted. In the case of a newly formatted pack, ONCMOD
neglects to set the unit state to 'no pack mounted' when the HOM
blocks cannot be read.
[Cure]
Following an ATTACH, do not change the unit status unless there
were errors. When HOM blocks cannot be read, set the unit status to
'no pack mounted'.
[Keywords]
ATTACH DISK
DEFINE STRUCTURE
[Related MCOs]
None
[Related SPRs]
36276
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 462 FILIO DSKDR8
ONCMOD TRYHOM
704A
[End of MCO 14254]
MCO: 14255 Name: RCB Date: 26-Oct-89:11:38:09
[Symptom]
Batch streams can hang forever when they use MIC.
[Diagnosis]
Scenario:
MIC file enables the RESPONSE feature in order to trap error messages into a
MIC variable (parameter). Some program it invokes types out an error message.
MIC wants to get the entire error message into the response buffer, and not
just a part of it, so it waits for the job to go to monitor level or to block
in TIOWQ (TTY I/O wait) before it reads the text. To make sure the text is
available for MIC to read, SCNSER refuses to allow output to happen until MIC's
conditions are satisfied *and* MIC has read the response buffer. Thus, when
the program that was invoked types out a reasonably short error message (so
that it doesn't block in TO) and then loops in NAPQ waiting for the chunks to
empty out before it decides how to type its next prompt, there is a deadlock.
The program never satisfies the MIC conditions for getting the response buffer
read, and thus output never happens, and thus the program is waiting for MIC
waiting for the program waiting for MIC ....
[Cure]
Since the MIC RESPONSE buffer is only 21 octal words in length, and is ASCIZ,
MIC will only ever see a maximum of 84 (decimal) characters of response text.
In other words, it only expects to see one line. So, add a bit in the LDB,
L1LEEL (end of error line, B6 in LDBBYT). This bit is twiddled during the same
routine that notifies us of an error character. The code in XMTMIC which
checks for whether to tell MIC that the response buffer is available will
consider having L1LEEL set to be as good as being in TIOWQ. I.e., if we have
gone back to the left margin since seeing the error character, we will tell MIC
to do its thing.
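The 84-character figure follows from the buffer size. A minimal check of
the arithmetic, assuming the usual packing of five 7-bit ASCII characters
per 36-bit word (the names below are illustrative, not monitor symbols):

```python
# Capacity of the MIC RESPONSE buffer as described in the Cure.
# Assumption: ASCIZ text is packed five 7-bit characters per 36-bit word.
BUFFER_WORDS = 0o21          # 21 octal words = 17 decimal
CHARS_PER_WORD = 36 // 7     # five 7-bit characters fit in a 36-bit word


def max_response_chars(words: int = BUFFER_WORDS) -> int:
    """Characters available after reserving one for the terminating NUL."""
    return words * CHARS_PER_WORD - 1


print(max_response_chars())
```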
[Keywords]
MIC under BATCH
hung PTY
[Related MCOs]
None
[Related SPRs]
36279
[MCO status]
Checked
[MCO attributes]
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 462 SCNSER LDBBYT,MICLG3,MICPS4
704A
[End of MCO 14255]
MCO: 14258 Name: DPM Date: 14-Nov-89:06:09:54
[Symptom]
IPA dump file writing facility appears as a wart on the error logging
code.
[Diagnosis]
Way back when, the only way to get UUO-level work done was to get
DAEMON to do some work for you. IPA dump files were processed through
the error logging code by prodding DAEMON with a SPEAR record that was
suppressed from ERROR.SYS.
[Cure]
Now that there's a way to make UUO-level things happen, teach the monitor
to write the dump files and eliminate the need for DAEMON interaction.
[Keywords]
IPA DUMP
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 464 AUTCON AUTDMP
CLOCK1 EXPJO1
COMDEV IPADMP
FILUUO UNQIFL,UNQINI
704A
[End of MCO 14258]
MCO: 14259 Name: DPM Date: 14-Nov-89:06:50:39
[Symptom]
Inaccessible code left over from efforts to clean up error
logging code.
[Diagnosis]
Was just waiting 'til it was all over.
[Cure]
Remove DAEDIE, DAEDSJ, DAEEIM, DAEERR, DAERPT, and DAESJE. Also
remove the interlock word, DAELOK. Shrinks CLOCK1 by 3 blocks.
[Keywords]
ERROR LOGGING
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 464 CLOCK1 DAEDIE,DAEDSJ,DAEEIM,DAEERR,DAERPT,DAESJE,DAELOK
704A
[End of MCO 14259]
MCO: 14261 Name: DPM Date: 21-Nov-89:08:25:53
[Symptom]
On KS10s, defining non-standard device parameters doesn't work.
COMDEV gets assembly errors.
[Diagnosis]
The MDKS10 macro has a junk parameter filled in for the MASSBUS
unit number.
[Cure]
Don't put out sixbit gibberish where a number is expected.
[Keywords]
MDKS10
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 465 MONGEN MDT3A
704A
[End of MCO 14261]
MCO: 14262 Name: JEG/DPM Date: 28-Nov-89:04:51:45
[Symptom]
Possible DI hangs or KAFs out of PCLDSK.
[Diagnosis]
When doing queued protocol for disks, if the primary port is offline,
a lot of things can go wrong requeuing the I/O to another port.
1. References to UNIKON should be indexed by T1, not U.
2. References to UNIALT are OK only for CI disks.
3. Extra JUMPN to test the results from CPUOK.
4. Merely checking KDBCAM for non-zero value doesn't guarantee
the other CPU(s) are running.
[Cure]
1. Index UNIKON by T1.
2. Test for a CI disk. If so, use UNIALT. Use UNI2ND for all others.
3. Remove JUMPN. We wouldn't have gotten to PCLOFL if the initial
call to CPUOK was successful.
4. Make a second call to CPUOK to test the new accessibility bits
from the alternate or detached port.
[Keywords]
DI HANG
KAF
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 466 FILIO PCLOFL
704A
[End of MCO 14262]
MCO: 14263 Name: DPM Date: 28-Nov-89:06:45:40
[Symptom]
With DAEMON no longer dependent upon the monitor version, the methods
of determining what is the proper DAEMON version and who is a legal
DAEMON no longer work. Also, there is still some lurking inaccessible
code.
[Diagnosis]
Time for a change.
[Cure]
A SETUUO will be provided so DAEMON can set its job number in the
monitor. It is function 53 (.STDAE). A corresponding GETTAB (%CNDJN,
212,,11) will read the job number back. The SIXBIT/DAEMON/ name and
JACCT bit are no longer required. In fact, DAEMON has been removed
from PRVTAB.
Also, remove the ERRPT. UUO as the monitor no longer leaves data for
DAEMON to scavenge by this method. The UUO, as well as GETTAB table
entries %LDERT, %LDPT1, %LDPT2, %LDLTH, and %LDESZ are now obsolete.
Stopcode IBI gets deleted along with the code at STOP1 to try to restart
DAEMON after it halts. It can never be made to work.
DAEMON version 24(1030) or later is required from now on.
[Keywords]
DAEMON
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
Beware file entry required
Documentation change
UUOSYM change
[BEWARE text]
DAEMON version 24(1030) or later is required with monitor load 466.
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 466 CLOCK1
COMCON
COMMON
COMMOD
UUOCON
UUOSYM
704A
[End of MCO 14263]
MCO: 14264 Name: DPM Date: 30-Nov-89:07:10:40
[Symptom]
Using the MONGEN option to set non-standard device parameters,
defining a printer to be upper case only has no effect. The monitor
treats the printer as lower case. Manually turning off DVLPTL in the
DEVCHR word of the DDB makes the problem disappear and the printer
behave like an upper case only printer.
[Diagnosis]
The routine AUTMDT scans MDTs for non-standard device parameters.
If the device is specified (to MONGEN) using a device code OR a
non-zero CPU number, then everything works as expected. However, if
the customer defaults the device code AND the CPU number (or supplies
CPU0), then a zero device specifier is inserted into the MDT. A zero word
signals the end of the MDT. Therefore, AUTMDT will never scan the
entire table and never find the customer specified parameters. Also,
it is possible for AUTMDT to exit without returning the MDT data under
some circumstances.
[Cure]
In MONGEN, set a bit in the device specifier word of the MDT
entry which indicates the word is valid. Thus, CPU0 with a defaulted
device code of zero will no longer look like the table terminator.
Also ensure that the MDT data is always returned properly.
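The terminator problem can be illustrated with a small model. This is
only a sketch: the word layout in make_spec and the MD_VAL bit position
are invented for the example, since the real MD.VAL value is not given
in this MCO.

```python
# Illustrative model of scanning a zero-terminated table (not monitor code).
# MD_VAL is a hypothetical "this word is valid" bit.
MD_VAL = 1 << 35


def make_spec(device_code, cpu, mark_valid):
    """Build a device specifier word; the layout is invented for the sketch."""
    word = (cpu << 18) | device_code
    return word | MD_VAL if mark_valid else word


def scan_mdt(table, wanted):
    """Return the parameters for `wanted`, or None; a zero word ends the table."""
    for spec, params in table:
        if spec == 0:            # terminator reached
            return None
        if spec == wanted:
            return params
    return None


# Before the fix: CPU0 with a defaulted device code yields a zero word,
# which hides that entry and every entry that follows it.
old_table = [(make_spec(0, 0, False), "upper case only"),
             (make_spec(0o124, 1, False), "other"),
             (0, None)]

# After the fix: the valid bit keeps the entry non-zero.
new_table = [(make_spec(0, 0, True), "upper case only"),
             (make_spec(0o124, 1, True), "other"),
             (0, None)]
```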
[Keywords]
MDT
[Related MCOs]
None
[Related SPRs]
36282
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 467 AUTCON AUTMD3,MDTDV1
DEVPRM MD.VAL
MONGEN ASKRE5,MDT6,MDTTAB
704A
[End of MCO 14264]
MCO: 14265 Name: DPM Date: 4-Dec-89:08:22:27
[Symptom]
Rewinds and skip file operations time out prematurely on 3600 foot
magtapes.
[Diagnosis]
Hung timers are based on the amount of time needed to perform a
given function on a 2400 foot magtape. The values fall short for
3600 foot reels.
[Cure]
Increase all hung timer values by one half.
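The factor of one half matches the ratio of reel lengths (3600/2400).
A sketch of the scaling, using made-up sample values because the real
HNGTBL contents are not shown in this MCO; rounding up is an assumption:

```python
# Scale a hung-timer table by one half, as the Cure describes.
from fractions import Fraction
import math

SCALE = Fraction(3600, 2400)     # reduces to 3/2


def scale_timers(values):
    """Return each timeout increased by one half, rounded up to a whole unit."""
    return [math.ceil(v * SCALE) for v in values]


print(scale_timers([10, 45, 180]))
```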
[Keywords]
HUNG TIMERS
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 467 T78KON HNGTBL
TCXKON HNGTBL
TD2KON HNGTBL
TM2KON HNGTBL
TMXKON HNGTBL
TS1KON HNGTBL
TX1KON HNGTBL
704A
[End of MCO 14265]
MCO: 14266 Name: DPM Date: 5-Dec-89:06:31:35
[Symptom]
No way for the old and new DAEMONs to tell which version ought
to be run.
[Diagnosis]
%CNDAE returns 704, but both the old and new DAEMONs run under
different flavors of 704.
[Cure]
Have %CNDAE return 705. The new DAEMON will require this, but
if it sees 704, it will run DAE704.
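The decision the new DAEMON makes can be sketched as follows. The
function name and the "unsupported" result are hypothetical; the MCO
only describes the 705 and 704 cases.

```python
# Sketch of the version check described in the Cure (not DAEMON source).
def daemon_action(cndae: int) -> str:
    """What the new DAEMON does for a given %CNDAE value."""
    if cndae == 0o705:
        return "continue"             # the value the new DAEMON requires
    if cndae == 0o704:
        return "run DAE704"           # hand off to the older DAEMON
    return "unsupported"              # assumption: not covered by the MCO
```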
[Keywords]
DAEMON
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
Beware file entry required
[BEWARE text]
DAEMON which has run with earlier versions of 7.04 should be
renamed to SYS:DAE704.EXE. DAEMON version 24 should be placed
on SYS as DAEMON.EXE. If there is a chance that an earlier 7.04
monitor may occasionally be run, the new DAEMON should also be
copied to SYS with the name DAE705.EXE. This will allow for the
proper synchronization of DAEMONs with the monitor regardless of
which version of 7.04 is run.
%CNDAE is a GETTAB which allows DAEMON to synchronize with
monitor versions. It is intended for use only by DAEMON. Other
programs such as ACTLIB, LOGIN, REACT, and WHO have incorrectly
used this GETTAB to return the monitor version where another,
more appropriate GETTAB, %CNDVN, should have been used. The
Digital programs have been changed to use %CNDVN. Sites should
make similar changes to any user-written programs which may have
used %CNDAE.
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 467 COMMON CNFDAE
704A
[End of MCO 14266]
MCO: 14267 Name: LWS Date: 11-Dec-89:17:28:27
[Symptom]
Problems assigning/init'ing etc devices in monitors with no
ANF support.
[Diagnosis]
AUTDDB does not make device names of the form DEVNNU when NN
is 00. In this case it makes a name of the form DEVU, eg. LPT0 instead
of LPT000. DVSTAS depends on the U of NNU being the last sixbit character
in DEVNAM (bits 30-35) when searching for a DDB. GALAXY spoolers generate
device names of the form DEV00U when ANF is not supported in the monitor.
Using DEV00U as a device name in various UUOs fails because DEV00U will
never match the device name in the DDB, which is DEVU.
[Cure]
Have AUTDDB always build device names of the form DEVNNU when
DR.NET is lit.
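The naming mismatch can be shown with a short model. The helpers below
are illustrative only; they assume NN is a two-digit octal node number
and U a single octal unit digit, and they ignore the DR.NET test.

```python
# Illustrative model of the two naming schemes (not AUTDDB code).
def old_name(dev: str, node: int, unit: int) -> str:
    """Old behavior: drop NN entirely when it is 00."""
    if node == 0:
        return f"{dev}{unit:o}"
    return f"{dev}{node:02o}{unit:o}"


def new_name(dev: str, node: int, unit: int) -> str:
    """Fixed behavior: always build DEVNNU."""
    return f"{dev}{node:02o}{unit:o}"


def unit_is_last_sixbit_char(name: str) -> bool:
    """DVSTAS expects U in the sixth character position (bits 30-35)."""
    return len(name) == 6
```

With these, old_name("LPT", 0, 0) gives LPT0, which never matches the
LPT000 a spooler asks for, while new_name("LPT", 0, 0) gives LPT000.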
[Keywords]
AUTOCONFIGURE
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 470 AUTCON AUTDDB
704A
[End of MCO 14267]
MCO: 14268 Name: DPM Date: 12-Dec-89:07:02:45
[Symptom]
More error logging stuff.
1. The monitor doesn't write DECtape records.
2. The definition of record type 75 is wrong.
[Diagnosis]
1. SPEAR didn't use to understand DECtape records. Now it does.
2. Record type 75 claims it's only used for IPA20 dumps. Not so.
[Cure]
1. Remove references to M.DTAE (introduced during 7.04 development)
as normally turned on. This will cause the monitor to write DECtape
error records.
2. Redefine record 75 to be a generic device dump record with the name
.ESDVD (UUOSYM) and .ERDVD (S). SPEAR also understands this record
now. The monitor still doesn't write this record, but it will soon.
[Keywords]
ERROR LOGGING
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
Documentation change
UUOSYM change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 470 COMDEV M.DTAE
DTASER M.DTAE
S .ERDVD
UUOSYM .ERDVD
704A
[End of MCO 14268]
MCO: 14269 Name: DPM/RCB Date: 19-Dec-89:07:32:29
[Symptom]
On multi-CPU systems, at system startup, one frequently
sees varying CPU uptimes and/or undeserved CPUn not running warnings
on the CTY.
[Diagnosis]
When the clocks are turned on, only the policy CPU's uptime
counter is more or less accurate. Non-policy CPUs are looping in their
AC loop waiting for the system to start. During this time, they take
no interrupts and therefore never update their uptime or OK word. When
the system starts timesharing, the uptime words are guaranteed to be
skewed and sometimes the OK words are positive, causing the warnings
on the CTY.
[Cure]
Prior to turning on the clocks, make all CPUs' uptime words agree.
Also fix the OK words to be properly negative.
[Keywords]
UPTIME
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 471 SYSINI TIMINI,NODDT
704A
[End of MCO 14269]
MCO: 14270 Name: DPM Date: 26-Dec-89:07:49:18
[Symptom]
After the release of 7.04, SNOOPY fails with the error
"? Undefined breakpoint symbol TM0IN1".
[Diagnosis]
The CPU dependent code for interval timer interrupts was removed
as part of 7.04 development. Because of this change, a SNOOP. UUO
cannot be used to patch the interval timer code without incurring
excessive overhead in the job which is doing the snooping. (It must
weed out all calls except those from the target CPU.)
[Cure]
Add a new SETUUO (.STITP==54) to allow a job to patch the interval
timer. The job must have POKE privs, be [1,2], or running with
JACCT set, and contiguously locked in EVM. The call is:
        MOVE    AC,[.STITP,,addr]
        SETUUO  AC,
          no privs, bad arguments
          success
addr:   CPU mask
        instruction to XCT (relocated)
For this to work, two CDB locations have been added. .CPITP contains
the instruction to execute and .CPITJ contains the job number which
patched the interval timer code. When interrupts are processed, if
.CPITP is non-zero, it will be executed. A suitably privileged job
may set .CPITP if .CPITJ is zero or is already owned by the job
executing the SETUUO. .CPITP may be cleared by supplying a zero for
the instruction to execute. These words will be forcibly cleared when
a job exits prematurely (ESTOP), control-C's out (STOP1) or does a
RESET UUO.
For the curious, two new GETTABs have been added. %CVITP and %CVITJ
return .CPITP and .CPITJ respectively, although SNOOPY or any other
performance measuring program should have no need to rely on these
words.
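The ownership rules for the two CDB words can be modeled as below. This
is a toy model for one CPU, not monitor code: privilege and EVM-locking
checks are left out, and it assumes that clearing the patch with a zero
instruction also releases .CPITJ, which the MCO does not state.

```python
# Toy model of the .CPITP/.CPITJ ownership rules for a single CPU.
class IntervalTimerPatch:
    def __init__(self):
        self.cpitp = 0   # instruction to execute, 0 = none
        self.cpitj = 0   # owning job number, 0 = unowned

    def setuuo(self, job, instruction):
        """Return True for the success return, False for the error return."""
        if self.cpitj not in (0, job):
            return False             # already owned by some other job
        self.cpitp = instruction
        # Assumption: a zero instruction gives up ownership as well.
        self.cpitj = job if instruction else 0
        return True

    def job_gone(self, job):
        """ESTOP, STOP1, or RESET by the owner forcibly clears both words."""
        if self.cpitj == job:
            self.cpitp = 0
            self.cpitj = 0
```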
[Keywords]
SNOOPY
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
Documentation change
UUOSYM change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 472 APRSER TIMINT,SETITP,CLRITP
CLOCK1 ESTOP1,STOP1
COMCON SETTBL
COMMON .CPITP,.CPITJ
UUOCON RESET
UUOSYM %CVITP,%CVITJ,.STITP
704A
[End of MCO 14270]
MCO: 14271 Name: RCB Date: 30-Dec-89:06:36:16
[Symptom]
Device errors and uptime statistics are getting lost.
[Diagnosis]
DAEMON is unreliable about finding crash dumps and reporting errors and
AVAIL statistics from them.
[Cure]
Have the monitor do it. This adds module CRSINI to ERRCON.MAC.
This also adds two new STOPCDs (both in CRSINI):
CRSIAF, type INFO -- CRSINI allocation failure.
CRSINI could not allocate an exec process block in order
to run its UUO-level code.
OLDMON, type INFO -- OLD monitor found in crash file
CRSINI found that the crash file pointed to by BOOT was
for an older monitor than it can process.
[Keywords]
DAEMON
SPEAR
AVAIL
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 472 SYSINI SYSINH,SYSRLD,SYSAVL,BOOTFX
704A COMMON CNFDAE
S .ERCIN,.ERHSB,.EXHDR
CLOCK1 CIP2,SETDJB
ERRCON SEBTIM,XFRSEB,CRSINI
[End of MCO 14271]
MCO: 14111 Name: JMF Date: 1-Sep-88:07:17:30
[Symptom]
Patches made to virtual user mode programs with FILDDT disappear.
[Diagnosis]
If the patch happens to get made to a write locked page, the page
doesn't get written to the swapping space the next time the job gets
swapped out.
[Cure]
If the job and the page are in core and the page is write locked,
write enable the page and decrement .USWLP before copying the data
from the patcher to the patchee.
[Keywords]
JOBPEK
[Related MCOs]
13932, 13137
[Related QARs]
None
[MCO status]
Deferred
[MCO attributes]
New development MCO
QAR answer
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 UUOCON JOBPK3,JPKLW?
VMSER RTNFS0
[End of MCO 14111]
MCO: 14127 Name: JMF Date: 27-Sep-88:05:23:44
[Symptom]
Non-zero section address break doesn't work as expected.
[Diagnosis]
1) Section number gets lost in DATAO APR, in SSEUB.
2) SET BREAK command changing conditions but not break address zaps section
number.
[Cure]
1) DATAO APR,.CPAPR
2) DPB rather than various flavors of HLLxy.
[Keywords]
extended addressing
address break
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Deferred
[MCO attributes]
New development MCO
KL10 only
Extended addressing only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 APRSER SSEU2
COMCON SETB10
[End of MCO 14127]
MCO: 14131 Name: ERS Date: 13-Oct-88:09:41:14
[Symptom]
All known bad areas on a disk are not known to the monitor. Possible, but
unlikely IME.
[Diagnosis]
When we're scanning the BAT blocks we first figure out how many we
have to scan. To get this we add the number of bad regions the monitor found
to the number of areas the disk started with (bad regions found by the various
diagnostic programs). However, the latter we get by indexing off of T3. T3
happens to point to outer space.
[Cure]
Set up T3.
[Keywords]
Bad regions
Swap read errors?
[Related MCOs]
13932, 13137
[Related QARs]
None
[MCO status]
None
[MCO attributes]
PCO required
QAR answer
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 REFSTR SCNBAT
[End of MCO 14131]
MCO: 14132 Name: JAD Date: 13-Oct-88:10:47:42
[Symptom]
Possible inconsistent runtimes on the KL (MCO 13856 revisited).
[Diagnosis]
Forgot one case where "Inhibit Update" was set needlessly.
[Cure]
Clean it up.
[Keywords]
RUNTIME
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 APRSER SSEUB
[End of MCO 14132]
MCO: 14133 Name: JAD Date: 13-Oct-88:10:55:32
[Symptom]
Protocol pause doesn't exist under secondary protocol, but DTESER
doesn't check before trying to effect protocol pause.
[Diagnosis]
Missing test.
[Cure]
Test ED.PPC at SETPP before doing anything rash.
[Keywords]
PROTOCOL PAUSE
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 DTESER SETPP
[End of MCO 14133]
MCO: 14134 Name: JAD Date: 14-Oct-88:10:57:48
[Symptom]
(Unsupported) feature to print PC during SET WATCH FILE output gets
wrong PC during RENAME.
[Diagnosis]
PATH UUO done by PTHFIL blows away .USMUO.
[Cure]
Use JOBPDO+1 for PC.
[Keywords]
WATCH FILE
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 UUOCON WCHPCP
[End of MCO 14134]
MCO: 14135 Name: JAD Date: 14-Oct-88:11:04:43
[Symptom]
Including expensive "want to run" time calculation is an all or
nothing proposition.
[Diagnosis]
Either you JFCL RQTPAT or you don't. If you do, it happens
every tick.
[Cure]
Invent a MONGEN-definable symbol M.NRQT which is the number of
ticks between the "want to run" time calculation. If zero, the
expensive calculation is never done. Patchable on the fly by
twiddling a variable in SCHED1.
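A sketch of the every-Nth-tick behavior, assuming a simple countdown;
the class and field names are hypothetical and the actual SCHED1 code
may be organized differently.

```python
# Model of doing an expensive calculation once every M.NRQT clock ticks.
class WantToRunSampler:
    def __init__(self, nrqt: int):
        self.nrqt = nrqt          # ticks between calculations, 0 = never
        self.countdown = nrqt
        self.calculations = 0

    def tick(self) -> bool:
        """Advance one tick; return True if the calculation ran this tick."""
        if self.nrqt == 0:
            return False          # feature disabled
        self.countdown -= 1
        if self.countdown > 0:
            return False
        self.countdown = self.nrqt
        self.calculations += 1    # stands in for the expensive calculation
        return True
```

Because tick rereads nrqt each time, changing it while the model runs
takes effect on the next tick, which mirrors patching the variable on
the fly.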
[Keywords]
WANT TO RUN TIME
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 COMMON M.NRQT
SCHED1 RQTPAT
[End of MCO 14135]
MCO: 14136 Name: DPM Date: 18-Oct-88:03:04:41
[Symptom]
Giving up the CX resource for the wrong job.
[Diagnosis]
In CTXSER when setting context and saved page quotas, we get the CX
resource if the target job is not ourselves. This works just fine
because the purpose of the CX is to prevent a context block or PDB
from changing out from under us. However at completion of the UUO
function, we only give back the CX if we were changing our quotas.
[Cure]
Only give back the CX if the target is not ourselves.
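The mismatch between the acquire and release tests can be sketched as
follows. The function is a stand-in for the quota code path, not the
CTXSER code itself; the list models which resources the caller holds.

```python
# Model of the CX resource being taken and given back around a quota change.
def set_quotas(caller: int, target: int, fixed: bool = True) -> list:
    """Return the resources still held when the UUO function completes."""
    held = []
    if target != caller:
        held.append("CX")                 # get CX only for another job
    # ... the quotas would be changed here ...
    if fixed:
        release = target != caller        # same test as the acquire
    else:
        release = target == caller        # old, inverted test
    if release:
        if "CX" not in held:
            raise RuntimeError("gave up CX without owning it")
        held.remove("CX")
    return held
```

In the old form the resource leaks when the target is another job and is
given up without being owned when the target is the caller.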
[Keywords]
CONTEXTS
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 CTXSER XITQTA
704A
703A
[End of MCO 14136]
MCO: 14137 Name: LWS/DPM Date: 19-Oct-88:18:30:13
[Symptom]
Autoconfiguring of -20F devices works by sheer luck or
doesn't work at all. When it does work, the devices sometimes work.
[Diagnosis]
1. We send the "request for device status" msg to -20F in
the wrong format, i.e. 0 byte,,unit # byte.
2. In DCRSER and DLPSER we use the wrong half of an AC to pick up
FE device unit number.
3. We "timeshare" the same word in the device DDB for two different
things.
[Cure]
1. Change FNCTAB dispatch of .EMRDS msg to use "line/data"
format instead of "line" format. This causes msg to be sent in
correct format, i.e. unit # byte,,0 byte.
2. HRRx's --> HLRx's
3. .ORG DEVLSD's --> .ORG DEVLEN's
[Keywords]
FE devices
RSX20F
printers
readers
[Related MCOs]
13932, 13137
[Related QARs]
None
[MCO status]
Checked
[MCO attributes]
KL10 only
QAR answer
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 DTESER FNCTAB
DLPSER DLPDT1,.ORG
DCRSER DCRDT1,.ORG
[End of MCO 14137]
MCO: 14138 Name: LWS Date: 19-Oct-88:19:06:10
[Symptom]
MCO 14126 incomplete
[Diagnosis]
In TPDSMM/CMM all tape kontrollers on the same channel
are put in maintenance mode, but I forgot about dual ported units.
Trying to put the DX20 on 1026 in maintenance mode using MTA0 as
the arg to the DIAG. UUO puts the DX10 in maintenance mode.
[Cure]
Add code to check UDBKDB and put all kontrollers found
in maintenance mode also.
[Keywords]
DIAGs
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 TAPUUO TPDSMM,TPDCMM
[End of MCO 14138]
MCO: 14140 Name: JEG Date: 25-Oct-88:04:35:36
[Symptom]
1. SA10 related crashes not as useful as they could be.
2. Missing improvements in disk code.
[Diagnosis]
1. SAXSER would squirrel away interesting data in the KDBs
on a crash if only someone would ask it to.
2. I've been busy.
[Cure]
1. Call SAXDMP from COMMON in DVCSTS.
2. Implement improved disk driver.
[Keywords]
SA10
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 COMMON DVCST2
DSXKON LOTS
[End of MCO 14140]
MCO: 14141 Name: DPM/JMF Date: 25-Oct-88:04:55:24
[Symptom]
Stopcode KAF trying to start I/O BUS printer.
[Diagnosis]
Hard to say, but it looks like LPTINI was never called, although it's
not obvious how that could happen. Further inspection reveals that the
length of the DDB is wrong. LPTCHF (PI channel flags), value 24 is the
first word in the device dependent portion of the DDB. That's also the
value of DEVCTR. If DEVCTR gets zeroed, the PI channel flags get wiped
out and the contents put into the RH of the CONSO skip chain test. The
next interrupt would not be serviced because the condition bits were all
zeroed and a KAF results.
Other devices could have other problems depending upon the usage of the
words between the starting origin (DEVLSD, DEVLLD, etc.) and DEVLEN.
[Cure]
For all incorrectly defined DDBs, origin the device dependent portion at
DEVLEN.
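A small check of why the origin matters. Offset 24 (octal) for LPTCHF
and DEVCTR comes from the Diagnosis; the DEVLEN value of 30 (octal) is
hypothetical and chosen only to make the example concrete.

```python
# Find device dependent offsets that collide with the common DDB words.
DEVCTR = 0o24        # from the Diagnosis
DEVLEN = 0o30        # hypothetical length of the common portion


def overlapping_offsets(common_len, device_origin, device_words):
    """Offsets of device dependent words that land inside the common part."""
    return [off
            for off in range(device_origin, device_origin + device_words)
            if off < common_len]


print(overlapping_offsets(DEVLEN, DEVCTR, 2))   # origin too low
print(overlapping_offsets(DEVLEN, DEVLEN, 2))   # origin at DEVLEN
```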
[Keywords]
DEVLEN
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 CD2SER CD2DDB
CDRSER CDRDDB
LP2SER LP2DDB
LPTSER LPTDDB
PLTSER PLTDDB
PTYSER PTYDDB
[End of MCO 14141]
MCO: 14142 Name: RCB Date: 25-Oct-88:05:36:08
[Symptom]
STRUUO .FSRSL (read search list) is less friendly than the GOBSTR loop that
it's supposed to replace.
[Diagnosis]
Demanding godly privs or same job to read a search list, when GOBSTR only
requires that the invoking job have the same PPN as the target job, or have
some flavor of PEEK/SPY privs, or that the job be reading the SSL.
[Cure]
Change the STRUUO's priv checking to match that of GOBSTR.
[Keywords]
STRUUO .FSRSL
GOBSTR
consistency
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 FILFND RSLSTR
[End of MCO 14142]
MCO: 14143 Name: RCB Date: 25-Oct-88:05:51:33
[Symptom]
Too hard to tell which Autopatch tape a customer is running when we get the
dumps.
[Diagnosis]
No way to distinguish between post-7.04 release monitors.
[Cure]
Change the way A00SVN and A00DLN are used in building A00VER and AXXDVN.
(These are GETTAB items %CNVER and %CNDVN.) This week's monitor will be
load 410 of 7.04A as far as the macros in COMMON are concerned. The load
numbers will be recycled annually, at the same time as we bump the minor
version number (A00SVN). This way, the version stamp on the dump will narrow
down which tape it could have been from, and a check of MONVER will allow us
to tell even more precisely. A00MCO should have been good enough, but it seems
that some customers like to change it when they install published patches.
[Keywords]
Autopatch
Revision control
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704A 410 COMMON AXXVER
705 410
[End of MCO 14143]
MCO: 14144 Name: RCB Date: 25-Oct-88:06:41:41
[Symptom]
Can't always connect to TSK devices on other nodes when we should be able.
[Diagnosis]
NETDEV (called from AUTLNK) updates our NDB with our new configuration
(without benefit of interlock) but never tells anyone else in the network about
our changes.
[Cure]
Change NETDEV to light a flag for NETSCN to recompute our configuration. If
it changes, we'll mark everyone else's NDB as needing to hear about it. Later
on in NETSCN, we'll try to tell them all about it.
[Keywords]
ERTNA%
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 NETSER NETDEV,NCSCNF,NETSCN,ICMRCF
NETPRM NDB.XC
[End of MCO 14144]
MCO: 14145 Name: DPM Date: 31-Oct-88:03:58:25
[Symptom]
New: Add a couple of items that were omitted from 704 because of last minute
documentation constraints.
1. Make control-T print the CPU the job last ran on.
2. Make SET WATCH FILES print the PC of the UUO.
[Diagnosis]
[Cure]
[Keywords]
CONTROL-T
SET WATCH FILES
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 411 COMCON USECPU
UUOCON WCHPCP
704A
[End of MCO 14145]
MCO: 14147 Name: JMF Date: 9-Nov-88:07:48:35
[Symptom]
MX gets a protection failure when it tries to append to a mail file
if it's running virtual.
[Diagnosis]
Can page fault after doing the updating ENTER and if the UUO is
restarted, the combination of FO.PRV and junk in E+3 left over from the
LOOKUP/ENTER results in a protection failure.
[Cure]
If appending in buffered mode, call OUTF early (before updating ENTER)
to eliminate page faults after ENTER has been done.
[Keywords]
.FOAPP
FO.PRV
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 412 UUOCON FOPEN2,FOPN9B
[End of MCO 14147]
MCO: 14148 Name: DPM Date: 14-Nov-88:04:58:54
[Symptom]
Stopcode IME while performing magtape I/O.
[Diagnosis]
If buffered I/O is being done on a DX10 and if the buffer
overhead words (.BFSTS, .BFHDR, and .BFCNT) are split across a page
boundary such that .BFCNT resides in the page following .BFHDR, and
that page happens to get destroyed, then an IME will result when
MAKLST tries to read the user's word count for the buffer. No address
checking is done on the word count word in this case.
[Cure]
Add a call to IADRCK.
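The split-header case can be identified arithmetically. The sketch
assumes 512-word (1000 octal) pages and consecutive offsets 0, 1, 2 for
.BFSTS, .BFHDR, and .BFCNT; the offsets are an assumption, not taken
from this MCO.

```python
# Detect a buffer whose word count word falls on the page after .BFHDR.
PAGE_WORDS = 0o1000                 # 512 words per page
BFSTS, BFHDR, BFCNT = 0, 1, 2       # assumed header word offsets


def page_of(addr: int) -> int:
    return addr // PAGE_WORDS


def count_word_on_next_page(buffer_addr: int) -> bool:
    """True if .BFCNT lies on a different page than .BFHDR."""
    return page_of(buffer_addr + BFCNT) != page_of(buffer_addr + BFHDR)
```

A buffer starting at 1776 has .BFHDR at 1777 and .BFCNT at 2000, so the
word count needs its own address check.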
[Keywords]
MAKLST
[Related MCOs]
13932, 13137
[Related SPRs]
36173
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 412 TAPUUO CHNLS2
704A
[End of MCO 14148]
MCO: 14152 Name: RCB Date: 22-Nov-88:06:01:09
[Symptom]
Files created in SYS: no longer get PRVSYS or PRYSYS as appropriate to the
extension (non-.SYS or .SYS).
[Diagnosis]
Not sure when this broke, but SYSDEV gets cleared in LH(F) and never set again.
[Cure]
Fix the places that want to know or that already check to get SYSDEV right.
In particular, don't just range check against SYSNDX, since that keeps STD: from
lighting SYSDEV. Check the actual PPN of the device instead.
[Keywords]
SYSDEV
PRVSYS
PRYSYS
[Related MCOs]
13932, 13137
[Related SPRs]
36161
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 413 FILUUO SDVTSS,TSTDSK,FOUND0,CREAL5,CURPP1
704A
[End of MCO 14152]
MCO: 14153 Name: DPM Date: 28-Nov-88:09:24:30
[Symptom]
Attempts to log off a job which was stopped in the process of
logging out get a "No such device" error.
[Diagnosis]
If a job was somehow stopped while logging out, the job may have
been partially destroyed. In particular, there may be no remaining
context blocks. Subsequent attempts to kill the job fail because the
run of the LOGIN program to log the job out will not succeed. This is
because DDBSRC fails when no context block is found.
[Cure]
The situations surrounding this problem are pretty arcane.
Typically, a job gets into this state because an idle job killer
incorrectly selects a job which is already logging out. The usual
methods of such programs include forcibly HALTing the job in a manner
which bypasses JACCT and Control-C trapping. Hence, the resulting
problem of a halted and partially destroyed job is caused by a
privileged program circumventing privileged protection schemes.
There are a couple of different approaches to solving this
problem. The simplest is to defend against idle job killers. If the
job is logging out, never allow a job to be stopped. This is most
easily accomplished by testing PD.LGO in word .PDDFL of the PDB in the
routine SIMCHK. PD.LGO is turned on by the LOGOUT UUO.
[Keywords]
LOGOUT
[Related MCOs]
13932, 13137
[Related SPRs]
35781, 36146
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 414 CLOCK1 SIMCHK
704A
[End of MCO 14153]
MCO: 14154 Name: KDO Date: 28-Nov-88:19:39:01
[Symptom]
Definition of the context block is esthetically unappealing.
[Diagnosis]
"symbol" == "previous symbol" + "a bunch"
[Cure]
Use .ORG instead.
[Keywords]
maintainability
cleanliness
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 414 CTXSER .CTFLG
704A
[End of MCO 14154]
MCO: 14155 Name: KDO Date: 18-Dec-88:18:02:24
[Symptom]
Cannot define the default circuit cost for each device type.
[Diagnosis]
No code.
[Cure]
Add the following symbols to COMDEV:
%RTCTST circuit cost for TST device
%RTCDTE circuit cost for DTE device
%RTCKDP circuit cost for KDP device
%RTCDDP circuit cost for DDP device
%RTCCIP circuit cost for CI device
%RTCETH circuit cost for Ethernet device
%RTCDMR circuit cost for DMR device
These symbols are used in the KONCST table of D36COM.
[Keywords]
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 415 COMDEV
D36COM KONCST
ROUTER
[End of MCO 14155]
MCO: 14156 Name: DPM Date: 4-Jan-89:06:28:51
[Symptom]
The system wide VM counters for IW and NIW page faults are
half-word quantities which don't take too long to overflow.
[Diagnosis]
Old monitors didn't page fault too often. Now they do.
[Cure]
Add two new GETTABs:
%VMIWS==42,,113 ;SYSTEM COUNT OF "IN WORKING SET" FAULTS
%VMNIW==43,,113 ;SYSTEM COUNT OF "NOT IN WORKING SET" FAULTS
Also, because SYSTAT and SYSDPY are crufty programs and not easily
modified keep SYSVCT up to date, but mark it and GETTAB %VMSPF as
obsolete to entice programs to use the new counters. If SYSTAT and
SYSDPY are ever fixed, the monitor will cease to maintain SYSVCT,
so programs shouldn't rely on %VMSPF.
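To put the overflow in perspective, a half-word holds 18 bits. The fault
rate below is an arbitrary example, not a measured figure.

```python
# How long an 18-bit counter lasts at a given page fault rate.
HALF_WORD_BITS = 18
HALF_WORD_MAX = (1 << HALF_WORD_BITS) - 1    # largest half-word value


def seconds_to_overflow(faults_per_second: float,
                        bits: int = HALF_WORD_BITS) -> float:
    """Seconds until a counter of the given width wraps around."""
    return (1 << bits) / faults_per_second


print(HALF_WORD_MAX)
print(seconds_to_overflow(100))
```

At 100 faults per second a half-word counter wraps in under 45 minutes.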
[Keywords]
VM COUNTERS
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
UUOSYM change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 416 COMMON SYSVCT,SYSIWS,SYSNIW
704A MONPFH PFHXCI,PFHXCN
UUOSYM %VMIWS,%VMNIW
VMSER USRFL7
[End of MCO 14156]
MCO: 14157 Name: RCB Date: 11-Jan-89:20:14:12
[Symptom]
Problems with TSK devices:
1) Can't always do an "enter passive" which is restricted to a specific node.
2) The count of TSK devices is decremented more often than it's incremented.
[Diagnosis]
1) The remote doesn't admit to TSKs until someone does an unrestricted
"enter passive" there.
2) AUTKIL is checking the next DDB's station number value rather than that of
the DDB being removed when deciding whether to decrement the device count.
[Cure]
1) Always claim at least one TSK DDB if TSK service is loaded.
2) Check the right DDB in AUTKIL.
[Keywords]
TSK
NETCNF
DDBCNT
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704A 417 NETSER NTSC.C
705 417 AUTCON AUTKI4
[End of MCO 14157]
MCO: 14158 Name: RCB Date: 11-Jan-89:21:17:51
[Symptom]
Jobs using the MIC RESPONCE feature hang sometimes on a terminal, and always
on a PTY.
[Diagnosis]
Race condition in MICLG3 which can cause us never to notify MIC that it's time
to take the response, and a mistaken test in PTYSER (the JOBSTS UUO) that won't
let us even try to notify MIC that the time has come.
[Cure]
Yes.
[Keywords]
MIC RESPONCE
MIC UNDER BATCH
[Related MCOs]
13932, 13137
[Related SPRs]
36167
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704A 417 SCNSER MICLG3,TOPMCL,TOPMS1,TOPMG1
705 417 PTYSER UJBST6
[End of MCO 14158]
MCO: 14159 Name: RCB Date: 20-Jan-89:14:07:59
[Symptom]
Fallback presentation of eight-bit characters doesn't work when a free CRLF is
required by the character expansion.
[Diagnosis]
The code to re-eat a character for echo or output doesn't handle the case of a
multi-part character expansion.
[Cure]
Keep track of which character (from an expansion or otherwise) caused the line
wrap, so we can send the right one when the time comes to re-eat it.
[Keywords]
Two-part characters
Three-part characters
Fallback presentation
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704A 417 SCNSER LDBOST,XMTCH1,XMTREO,XMTREE,REEAT
705
[End of MCO 14159]
MCO: 14160 Name: LWS Date: 29-Jan-89:20:49:51
[Symptom]
1. Problems detecting "data errors" on 20F card readers.
2. 20F card reader ignored after reading a card with a 9-punch in
column 1.
[Diagnosis]
1. Part of the problem is 20F itself. In V16-00, when a data
error occurs, the bad data is passed to the -10. Then the status msg
comes, but since I/O is not in progress we pitch the status msg.
V16-01 of RSX20F fixes the problem of passing the bad data instead of
just sending a status msg. (and fixes the problem where it always sends
a status msg after any data transfer from the reader).
The monitor never checks status bits that indicate a data error. The bit
it checks is not set by 20F when a read/stack/pick check occurs.
2. Before processing any msg from 20F, DCRSER calls SETRGS to setup
ACs and find the DDB etc. The first thing SETRGS does is test the 1st byte
of the msg for the "non-existent" device bit - useful during autoconfiguration
when examining a status msg. However, on a data transfer, a 9-punch
in Col. 1 happens to be the same bit!
[Cure]
1. Check read/pick/stack check bits in status byte also.
2. Change SETRGS entry point to SETRGX and only call it when a status
msg is received. (the ONLY time we care about non-existent devices).
Move SETRGS entry down a few instructions where it starts looking
for a reader DDB.
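The second cure amounts to interpreting the first byte as a device flag only when the message is a status message. A minimal Python sketch of that idea follows; the bit value and function name are hypothetical and chosen only for illustration:

```python
NONEXISTENT_DEVICE = 0o200   # hypothetical bit position, illustration only

def first_byte_means_no_device(first_byte, is_status_msg):
    """Treat the first byte as a device flag only for status messages.

    In a data message the same bit is just card data (for example a
    9-punch in column 1), so it must not be tested there.
    """
    return is_status_msg and bool(first_byte & NONEXISTENT_DEVICE)
```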
[Keywords]
card readers
RSX20F
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 420 DCRSER SETRGS,F11DVS
[End of MCO 14160]
MCO: 14161 Name: DPM Date: 1-Feb-89:08:24:52
[Symptom]
In some configurations, an offline alternate port claims to be an
RP04.
[Diagnosis]
This problem is highly dependent upon timing and configuration,
and only affects MASSBUS disks. At system startup time, the disk
drives are autoconfigured. Drive type information is gathered and
properly stored in the unit data blocks. Later, ONCMOD will build the
in-core structure data base and again, attempt to read the drive
types. This redundant drive type check exists to guard against the
operator swapping LAP plugs, thus changing an RP06 into an RP05. If
the drive type register cannot be read, then incorrect data is stored
in the drive type byte in the unit data block.
It is not clear why the second attempt to read the drive type
register fails. The DATAI to read the register returns zeros.
Normally, this could happen because the other port is busy or if the
last I/O operation on the other port failed to do a dual-port drive
release upon completion. Since the drive is offline, no I/O was
started. Also, it has been observed that if one or more online drives
exist with a higher unit number, then the problem disappears. This
indicates a possible hang in the controller. In addition, if the
interval between checking the primary port and the alternate port
is sufficiently long, then the DATAI always succeeds.
[Cure]
Problems similar to this have existed for at least 3 monitor
releases. The only flaw which all monitors have in common is that the
failure to read the drive type is ignored and junk overwrites the
drive type code in the unit data block. A simple solution is to jump
around the code which stores the drive type byte. After all this
time, it seems unlikely we will determine the real nature of the DATAI
failure, so this workaround must suffice.
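The shape of the fix can be sketched in Python as follows. This is an illustration only: the dictionary standing in for the unit data block and the callable standing in for the DATAI are hypothetical, and a zero result is assumed to mean the read failed.

```python
def refresh_drive_type(udb, read_drive_type_register):
    """Re-read the drive type, keeping the stored value if the read fails."""
    value = read_drive_type_register()
    if value == 0:            # register read returned zeros: treat as failure
        return False          # skip the store, leaving udb untouched
    udb["drive_type"] = value
    return True
```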
[Keywords]
RP04
[Related MCOs]
13932, 13137
[Related SPRs]
36230
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 420 ONCMOD TRYUNI
704A
[End of MCO 14161]
MCO: 14162 Name: JEG/DPM Date: 6-Feb-89:05:34:53
[Symptom]
Day one 7-series bug: Stopcode DAU and corrupt user core images.
[Diagnosis]
A user job enables for clock interrupts via APRENB. APRSUB proceeds
in a normal service-a-clock-tick fashion, but notices that the user
has requested a clock-interrupt, and so it exits not with POPJ but
with a JRST off to APRUTP. APRUTP may decide to fall thru to APRUT2.
If T4 doesn't have UE.PEF/UE.NXM on (and it won't of course) it will
continue to fall thru. APRUT2 will decide there is a loop in the trap
handler (and there is). At this point APRUT2 loads the double word PC
into T3/T4, and saves it off to .CPAPC for the error message. Then
it branches off to APRUTW. APRUTW sets up the APRLOP PC, and exits off
to APRSTU. APRSTU looks at T4 expecting possibly to find UE.PEF!UE.NXM,
but instead it has some PC bits left over from APRUT2. This fools
APRSTU into calling DIENLK.
[Cure]
At APRUT2, don't clobber T4 with a PC. Use T1/T2 instead.
[Keywords]
CLOCK INTERRUPT
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 421 CLOCK1 APRUT2
704A
[End of MCO 14162]
MCO: 14163 Name: DPM/RCB Date: 7-Feb-89:07:57:02
[Symptom]
A user can PIVOT away from a PPN that the CHGPPN checks
will not allow him to return to.
[Diagnosis]
Oversight.
[Cure]
Always allow CHGPPN to work if returning back to the job's
logged-in PPN.
[Keywords]
CHGPPN
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 421 UUOCON CHGPPN
704A
[End of MCO 14163]
MCO: 14164 Name: TL Date: 10-Feb-89:15:34:21
[Symptom]
RX2 STOPCDs
[Diagnosis]
If the RX20 (RX211) controller is broken such that TR is not returned to
STRTIO, it is possible (but unlikely) for the RX20 controller to post an
error interrupt.
If it does, then the error interrupt service routine will free the controller,
or, worse yet, schedule IO for another drive. In either case, we return from
the interrupt back into STARTIO, where we now write the drive registers out of
sync with what the controller expects.
This causes an error interrupt, and since no drive is (probably) active,
an RX2 STOPCD.
[Cure]
Turn the PI system OFF while in STARTIO. On TR errors, deschedule
the controller before turning it back ON. Since it's possible for the
KS to accept a vectored interrupt before the deschedule code resets
the interrupt enable bit, teach RX2INT to dismiss unexpected interrupts
rather than STOPCD.
[Keywords]
RX2
STOPCD
RX2SER
RX20
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
KS10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 422 RX2SER RX2INT,STRTIO,SETPAR
704A
[End of MCO 14164]
MCO: 14165 Name: RCB Date: 14-Feb-89:07:45:35
[Symptom]
PAGE. UUO function .PAGAC does not work right for non-existent pages in
mapped sections.
[Diagnosis]
The code to report non-existent pages and/or independent sections is
not sufficiently forgiving of dependent sections.
[Cure]
Always return the mapping information for dependent sections.
[Keywords]
Page Accessibility
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Extended addressing only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 422 VMSER PAGAC1,PAGAC6,PAGAC7
704A
[End of MCO 14165]
MCO: 14166 Name: DPM Date: 14-Feb-89:07:59:06
[Symptom]
Ill mem ref running SPEAR following magtape error logging. Seen mostly
with multi-ported tapes, but theoretically possible on any tape.
[Diagnosis]
When two or more kontrollers have access to the same tape drive, the IEP
and FEP blocks are timeshared. It is expected that DAEMON will finish
servicing one error before the monitor queues up data for the next. In
practice, this isn't always the case.
[Cure]
Convert TAPSER, TAPUUO, and the drivers to use system error blocks. When
an error occurs, the data will be copied into the SEBs and queued up for
DAEMON to write into ERROR.SYS. The biggest problem with doing this is
the monitor must format the error record itself, as SEBs are merely copied
into the error file without modification. No big deal. This will reduce
the monitor's dependency on DAEMON.
[Keywords]
MAGTAPE ERROR LOGGING
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 422 AUTCON RDDTN
704A DEVPRM TUB
S
TAPSER TAPDRV
TAPUUO LOTS
T78KON
TCXKON
TD2KON
TM2KON
TMXKON
TS1KON
TX1KON
[End of MCO 14166]
MCO: 14168 Name: RCB Date: 20-Feb-89:12:55:04
[Symptom]
Two complaints received regarding ONCMOD:
KLAD pack interface isn't as friendly as it could be.
Bad block typeout is sometimes a little too terse.
[Diagnosis]
We special-case the KLAD structure in several places, but we don't
treat it specially when gathering units for defining a structure.
After all, we know that the KLAD pack is just one pack.
If the user requests that bad blocks be shown for a unit, but the unit
has no bad blocks recorded, then we don't type out anything about
bad blocks. This leaves the user wondering whether we forgot about
the request to show them.
[Cure]
Only ask for one spindle when gathering units for structure KLAD.
Add the message "[No bad blocks found on unit <unitname>]".
[Keywords]
KLAD
BAF
BAT
Bad blocks
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704A 423 ONCMOD GETBAT,GETUN6
705
[End of MCO 14168]
MCO: 14169 Name: RCB Date: 21-Feb-89:08:45:24
[Symptom]
Invalid prompt for first logical block for swapping when defining a structure.
[Diagnosis]
Re-use of the old value for a unit which is no longer valid after other
changes to its swapping parameters.
[Cure]
Range check the old value, and don't use it for the default if it's invalid.
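A minimal Python sketch of the range check, assuming the valid range is known as a pair of inclusive bounds. The function name is hypothetical, and returning no default at all for a stale value is an assumption; the MCO does not say what is offered instead.

```python
def default_first_swap_block(old_value, lowest, highest):
    """Offer the old value as the default only if it is still in range."""
    if lowest <= old_value <= highest:
        return old_value
    return None   # assumption: no default, the operator supplies a value
```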
[Keywords]
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704A 423 ONCMOD GETSW1
705
[End of MCO 14169]
MCO: 14171 Name: LWS Date: 22-Feb-89:18:41:42
[Symptom]
1. Problems running DFDXC and DFDXD in user mode. Specifically
when using the "Specify Channel Program" DIAG. function, .DISCP.
2. We try to load the DX10 on CPU0 from CPU1 when a DX20 diag exits.
[Diagnosis]
1. The diags use the .DIAAU (Assign all units) DIAG. function
to keep things nice when it starts init'ing the channel, etc. When
the .DISCP DIAG. function is used, the monitor grabs the DDB for the
tape drive from the PDB. However, the DDB in the PDB when the .DIAAU
function is used is the last tape drive DDB. Not necessarily the DDB
for the tape drive the diag is using. So the CCW list is built for
the wrong drive (except when the last drive is used). After subsequent
calls using .DISCP, we run out of free core for CCWs.
2. When a DX20 diag puts the controller in maintenance mode, we detect
that the DX10 can also access the drives so we put it in maintenance
mode also. This keeps TAPSEC happy. The diag sets CPU to CPU1 because
that's where the DX20 is located. When the diag releases the DX20,
or is ^C'd, TPMCMX is called to free up everything. Since we are
running on CPU1, the load of the DX10 fails (because we use the TPKRES
and TPKLOD dispatches from TPMCMX).
[Cure]
1. Can of worms. Because of the way the DIAG. functions work
wrt DIAKDU and DIADEV, make the diags do a .DIASU (Assign single
unit) DIAG. function so that the proper DDB is placed in the PDB and
is found by the next .DISCP DIAG. function. In order for this
to work, we have to let TPDASU (and TPDAAU for consistency) do their
stuff even if F is nonzero on entry to the routine. They still make
sure that the current job is the one executing the UUO if the DDB is
already "owned".
2. In TPMCMX, check the KDBCAM mask against .CPBIT before calling TPKRES
and TPKLOD routines.
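The check in the second cure is a bit-mask test. The Python sketch below shows the idea; the bit assignments are hypothetical and do not reflect the real layout of KDBCAM or .CPBIT.

```python
def may_touch_controller(kdbcam, cpbit):
    """True if the CPU whose bit is cpbit appears in the accessibility mask."""
    return (kdbcam & cpbit) != 0

CPU0_BIT, CPU1_BIT = 0b01, 0b10   # hypothetical bit assignments
dx10_mask = CPU0_BIT              # a DX10 reachable only from CPU0
```

With these values, a job running on CPU1 would skip the reset and load of the DX10, while one on CPU0 would perform them.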
[Keywords]
Diagnostics
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 423 TAPUUO TPDASU,TPDAAU,TPMCMX,TPDHVF
[End of MCO 14171]
MCO: 14172 Name: DPM Date: 24-Feb-89:04:01:53
[Symptom]
A batch login may fail if the number of logged-in jobs minus the
number of reserved batch job slots is greater than LOGMAX.
[Diagnosis]
The difference between LOGMAX and JOBMAX is the number of jobs
reserved for emergency logins. A logging-in timesharing job may be
granted access if LOGNUM will not exceed LOGMAX, and providing BATMIN
job slots are reserved for batch logins. However, if the job logging
in is running under batch, then BATMIN must not be included in the
computation.
[Cure]
Don't account for BATMIN job slots when a batch job is logging
in. Its inclusion is only meaningful for timesharing logins.
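The corrected test can be sketched in Python as below. This is a simplified model rather than the ACCLOG code: it assumes all BATMIN slots are still unused, and the function name is hypothetical.

```python
def login_allowed(lognum, logmax, batmin, is_batch):
    """Decide whether one more job may log in.

    Simplifying assumption: every BATMIN slot is still free, so a
    timesharing login must leave that many slots available below LOGMAX.
    A batch login ignores BATMIN entirely.
    """
    reserved = 0 if is_batch else batmin
    return lognum + 1 + reserved <= logmax
```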
[Keywords]
BATMIN
[Related MCOs]
13932, 13137
[Related SPRs]
36246
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 424 UUOCON ACCLOG
704A
[End of MCO 14172]
MCO: 14173 Name: DPM Date: 24-Feb-89:09:34:47
[Symptom]
The old methods of DAEMON error logging leave something to be desired.
[Diagnosis]
Currently, most of the monitor expects DAEMON to gather additional data
for ERROR.SYS beyond what it's initially given. This exercises race
conditions, slows performance because jobs are sometimes stopped until
DAEMON is finished, and makes DAEMON dependent upon monitor versions
and data structure formats.
[Cure]
Start converting the old-style DAEMON calls to use System Error Blocks.
SEBs eliminate the race conditions because one is queued up for each
error log entry rather than always overwriting the same storage with
new error data. Performance is improved by not having to prevent jobs
from running while DAEMON is logging the error. This also eliminates
the dependency of DAEMON upon the monitor because the monitor will
format the entire record. DAEMON merely copies SEBs into ERROR.SYS.
This edit will do:
DL10 error records
I/O BUS LPT error records
Stopcode records
Software Events (POKE, RTTRP, SNOOP, and TRPSET)
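The difference between the two schemes can be illustrated with a small Python model. Both class names are hypothetical; this is a sketch of the queueing idea, not of the monitor's data structures.

```python
from collections import deque

class SharedErrorArea:
    """Old style: one fixed storage area, overwritten by each new error."""
    def __init__(self):
        self.data = None

    def log(self, record):
        self.data = record        # an earlier, unread error is lost

class SebQueue:
    """SEB style: one block queued per error, drained later by the logger."""
    def __init__(self):
        self.blocks = deque()

    def log(self, record):
        self.blocks.append(record)

    def drain(self):
        out = list(self.blocks)
        self.blocks.clear()
        return out
```

Logging two errors back to back leaves only the second in the shared area, while the queue hands both to the logger in order.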
[Keywords]
ERROR LOGGING
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 424 CLOCK1 DAEEST
704A COMDEV DL10EL
ERRCON DIELOG,XFRSEB
LPTSER LPTSYR
RTTRP RTRET
S EX.SYE,EX.DEL
UUOCON POKE2,SNPIBP,TRPSTX
[End of MCO 14173]
MCO: 14174 Name: DPM Date: 28-Feb-89:05:45:07
[Symptom]
New: To accommodate future tape service bug fixes and enhancements,
increase the size of the IORB. Do this by defining a set of "common"
IORB definitions, to be used initially by tape service, and possibly
later by FILSER. Append to the common portion, the tape-specific
words.
Common words:
.ORG 0
IRBLNK::!BLOCK 1 ;FORWARD LINK TO NEXT IORB
IRBACC::!BLOCK 1 ;ACTIVE (CURRENT) CHANNEL COMMAND
IRBCCW::!BLOCK <MXPORT==:4> ;ADDRESSES OF CHANNEL COMMANDS
IRBIVA::!BLOCK 1 ;ADDRESS OF INTERRUPT ROUTINE
IRBDDB::!BLOCK 1 ;ADDRESS OF DDB BEING SERVICED
IRBSIZ::! ;LENGTH OF COMMON IORB
.ORG
Tape-specific words:
.ORG IRBSIZ
TRBFNC::!BLOCK 1 ;FUNCTION DATA
TRBSTS::!BLOCK 1 ;TERMINATION STATUS
TRBRCT::!BLOCK 1 ;BYTE COUNT OF TRANSFER, IF DATA READ
TRBLEN::! ;LENGTH OF BLOCK
.ORG
IRBLNK is the old TRBLNK, but a full word quantity.
IRBCCW is the merger of TRBXCW and TRBEXL.
IRBIVA is the old TRBIVA.
TRBFNC is the old LH of TRBLNK and now can grow beyond bit 17.
TRBSTS could also be made a full word quantity.
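To make the resulting offsets explicit, the following Python sketch mimics how consecutive BLOCK pseudo-ops assign word offsets from an origin. The layout helper is hypothetical; the field names and sizes are taken from the definitions above.

```python
MXPORT = 4

def layout(fields, origin=0):
    """Assign consecutive word offsets, as BLOCK n does after .ORG."""
    offsets, here = {}, origin
    for name, size in fields:
        offsets[name] = here
        here += size
    return offsets, here

common, IRBSIZ = layout([("IRBLNK", 1), ("IRBACC", 1), ("IRBCCW", MXPORT),
                         ("IRBIVA", 1), ("IRBDDB", 1)])
tape, TRBLEN = layout([("TRBFNC", 1), ("TRBSTS", 1), ("TRBRCT", 1)],
                      origin=IRBSIZ)
```

With these sizes IRBSIZ comes out as 8 words and TRBLEN as 11 (decimal), so the tape-specific words start immediately after the common portion.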
[Diagnosis]
[Cure]
[Keywords]
MAGTAPE
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 424 DEVPRM
704A SCAPRM
TAPSER
TAPUUO
T78KON
TCXKON
TD2KON
TM2KON
TMXKON
TS1KON
TX1KON
[End of MCO 14174]
MCO: 14175 Name: JEG/DPM Date: 28-Feb-89:06:06:19
[Symptom]
ADP code reading. Jeff Gunter points out that SCNPIF doesn't include
DSKBIT in configurations which have only a single CPU. Why is that,
he said?
[Diagnosis]
Don't know. Looks like an oversight. While this is a common configuration,
it could only cause problems when the monitor is in the middle of a SCNOFF
and FILSER decides to print "problem on device" at interrupt level. The
SCNOFF will not have turned off DSKCHN, thus allowing FILSER to do obscene
things at inappropriate times.
[Cure]
Probably doesn't happen a lot. Remove the conditional assembly and always
include DSKBIT in SCNPIF. This is necessary only because FILSER insists
on typing out at interrupt level.
[Keywords]
SCNSER
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 424 COMMON SCNPIF
704A
[End of MCO 14175]
MCO: 14176 Name: DPM Date: 7-Mar-89:05:38:49
[Symptom]
It has been brought to our attention that some customer(s) want to
install RM05s on their DEC-10. So be it, however, this is not
recommended and will remain UNSUPPORTED.
RM05s are interesting devices. They are faster than an RP06 and
consume less power, as they require only single phase power. An
RM05 has 30 sectors/track (10 more than an RP06), yet they run
at the same 3600 RPM. Therefore, the capacity and the transfer
rate are about one half greater than an RP06.
Don't look a gift horse in the mouth. For starters, the head crash
rate is rather high. It seems that RM05s work best when left alone.
Despite the fact that they use removable media, frequent disk pack
changes greatly increase the chance of a head crash. The heads fly
fairly close to an RM05 pack; much closer than in an RP06. Presumably,
this is the main cause of head crashes. Also, parts for RM05s are not
nearly as plentiful as are those for RP06s.
[Diagnosis]
Missing table entries in RPXKON.
[Cure]
Add entries to the tables for blocks per unit, etc. This is all that's
required to make RM05s work. In all other manners, RM05s behave like
an RP06.
[Keywords]
RPXKON
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 425 AUTCON DTRTBL
DEVPRM TY.RM5
RPXKON TYPTAB
UUOSYM .DCUR5
[End of MCO 14176]
MCO: 14177 Name: DPM Date: 6-Mar-89:05:54:25
[Symptom]
DAEMON error logging.
[Diagnosis]
Yes.
[Cure]
Convert more old-style calls to use System Error Blocks. Changes
in this edit include:
1. Channel NXM & parity error logging.
7.04 records written by DAEMON contained mostly junk.
2. DECtape error logging.
3. KS10 memory error logging.
Doc change: This adds one word (.CPMFL) to the CPU subtable
for KS10 memory errors. This word is a flag which indicates
the last type of error (0 = soft, 1 = hard). Also, the length
of the subtable (.CPMSL) was off by one word and has now been
corrected.
4. KS10 card reader & line printer error logging.
[Keywords]
ERROR LOGGING
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 425 APRSER MEMCHK
704A CD2SER CDRSYR
COMDEV DTXEST,DTXEFL,DTXEBK
COMMON .CPMFL,.CPMSL
DTASER ERRS,DTASYR
ERRCON CHNCO3
LP2SER LPTSYR
[End of MCO 14177]
MCO: 14178 Name: RCB Date: 13-Mar-89:15:12:41
[Symptom]
STOPCDs AAO and IME, undeserved address checks, and undeserved checksum errors
during dump-mode I/O.
[Diagnosis]
In the old days before 7.03, LRNGE was called to range-check an IOWD. It
checked everything we needed to have checked just fine. One of the things
which it checks is that the range of addresses does not cross a section
boundary. Thus, it was no longer appropriate once .FOFXI/.FOFXO (extended dump
I/O) were added to the FILOP. UUO. MONPFH does not check that old-style
IOWD-based I/O does not cross a section boundary, nor does it check that the
I/O is not done to the ACs. This can lead to AAOs. If the user's working set
includes swapper-write-locked pages, then MONPFH will call LRNGE, even though
it might be doing extended I/O, thus resulting in an undeserved address check
error for an I/O doubleword which crosses a section boundary.
If FILIO has to perform error recovery and retries during a dump-mode I/O
operation which ends at a section boundary, under some circumstances it leaves
DEVISN containing bogus information in the DDB. If this was also the first
block in a retrieval pointer, we will then proceed to attempt to calculate the
checksum based on a user address which we calculate, in part, from this junk in
DEVISN(F). This can cause either an IME or an undeserved checksum error.
Finally, much of the above is exacerbated by NXCMR in UUOCON, which is the
common routine used to fetch and validate the next IOWD in a user's channel
command list. It does not validate correctly when MONPFH passes it an IOWD
which either starts with or crosses a section boundary.
[Cure]
Teach NXCMR how to validate all IOWDs which PFDOIO might pass. Correct all
incorrect uses of DEVISN(F). Teach PFDOIO to use ZRNGE rather than LRNGE when
it wants to fix up swapper-write-locked pages. Teach PFHDMP to give an address
check error when an old-style IOWD crosses a section boundary. Teach CHKSUM to
use GETEWD rather than GETWRD, so that it always fetches the correct word from
the user's buffer. Teach PFDOIO to validate the range of address for I/O in
order to be sure that I/O is not attempted to the ACs.
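Two of the checks described above (section crossing and I/O to the ACs) can be sketched in Python. This is a simplified model, not the NXCMR or PFDOIO logic: it assumes 18-bit in-section addresses, a word count of at least one, and it treats in-section addresses 0-17 octal as ACs in every section. The function names are hypothetical.

```python
SECTION_BITS = 18        # width of the in-section part of an address
AC_COUNT = 0o20          # accumulators occupy addresses 0-17 octal

def section_of(addr):
    """Section number of a global address."""
    return addr >> SECTION_BITS

def check_old_style_iowd(start, count):
    """Return None if the transfer is acceptable, else a reason string."""
    last = start + count - 1
    if section_of(start) != section_of(last):
        return "crosses section boundary"
    if (start & ((1 << SECTION_BITS) - 1)) < AC_COUNT:
        return "I/O to the ACs"
    return None
```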
[Keywords]
DUMP I/O
AAO
IME
ADDRESS CHECK
CHECKSUM ERROR
IOIMPM
IO.IMP
[Related MCOs]
13932, 13137
[Related SPRs]
35576, 36064
[MCO status]
Checked
[MCO attributes]
Extended addressing only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 426 UUOCON FOPN9B,UINITC,RELEA4,NXCHIT
704A MONPFH PFHDM1,DOIO2
FILIO SATADR,MONIOY,SETLS7,POSER2,ECC2,ECC3,NOECC,CHKSUM,CSHC2B,CSHB2C
FILUUO DUMPG9
[End of MCO 14178]
MCO: 14179 Name: JEG/DPM Date: 21-Mar-89:05:41:31
[Symptom]
FILSER doesn't usually continue from a DHD stopcode (Don't Have DA).
[Diagnosis]
If IOSDA is off in S (but not necessarily in DEVIOS), then a DHD
will result. But if the job really does own the DA resource, it
will hang, since the DA is never released.
[Cure]
Let the DHD return .+1. Further checks will prevent the DA from
being returned for the wrong job (a RWD is likely). If we manage
not to get a RWD, then the DA will be released and the monitor
will continue with no problems.
[Keywords]
STOPCODE DHD
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 427 FILIO DWNDA
704A
[End of MCO 14179]
MCO: 14180 Name: JEG/DPM Date: 21-Mar-89:05:45:20
[Symptom]
Stopcode KLPKAF following parity scans.
[Diagnosis]
A parity scan requires more than KAFTIM seconds to complete.
If PPDSEC doesn't get called soon enough (and it won't because
of the scan), it declares the KLIPA dead.
[Cure]
Increase KAFTIM from 10 to 35 seconds. This allows about 8 seconds
per meg plus a few extra for good measure. Increase KNISER's timer
(also called KAFTIM) from 30 to 35 seconds too.
[Keywords]
KLPKAF
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 427 KLPSER KAFTIM
704A KNISER KAFTIM
[End of MCO 14180]
MCO: 14181 Name: JEG/DPM Date: 21-Mar-89:05:49:40
[Symptom]
DI hangs on RA failovers. A failover can leave several jobs stuck in
"problem on device" mode for the old unit, even after lots of time
passes.
[Diagnosis]
PCLDSK may inadvertently get called with an "old" unit if a failover
is happening while another CPU is preparing to start I/O. The "old"
unit was OK, but now, KDBCAM contains zero, causing PCLDSK to get
called. PCLDSK sees no CPUs (and indeed there aren't any with the
old unit) and calls HNGSTP, eventually looping back to PCLDSK again
with the "old" unit.
[Cure]
If there is an online alternate port, use it and bypass HNGSTP.
[Keywords]
FAILOVER
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 427 FILIO PCLDSK
704A
[End of MCO 14181]
MCO: 14182 Name: JEG/DPM Date: 21-Mar-89:05:54:13
[Symptom]
If a CPU croaks before it can be warm-restarted successfully, and
field service is able to fix it "on the fly", sometimes bad things
(usually hangs) happen immediately following the J 400.
[Diagnosis]
This can happen because a CPU restart clears SP.CJn for all jobs,
and then CPUZAPs the "running job" for the CPU, leaving a small
window when the job can be scheduled to run on another CPU.
[Cure]
Change SPRINI to call CPUZAP first, and then clear SP.CJn.
[Keywords]
WARM RESTART
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 427 COMMON SPRLP1,SPRI11
704A
[End of MCO 14182]
MCO: 14183 Name: JEG/DPM Date: 21-Mar-89:05:59:19
[Symptom]
Stopcode KAF in QUESER.
[Diagnosis]
It is possible for one CPU to be diddling the database at UUO level
with the EQ lock, while ENQMIN runs at interrupt level on another
CPU. If UUO level CPU removes and releases the free core holding a
block that is being scanned by ENQMIN, KAFs or other stopcodes may
result.
[Cure]
Implement a scheme where UUO level waits for interrupt level and
interrupt level punts if UUO level holds the EQ resource.
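The scheme resembles a blocking acquire on one side and a try-lock on the other. The Python sketch below models it with a threading lock standing in for the EQ resource; the function names are hypothetical and the real interlock in QUESER may differ in detail.

```python
import threading

eq_lock = threading.Lock()   # stands in for the EQ resource

def uuo_level(update):
    """UUO level: wait for the lock, then modify the database."""
    with eq_lock:
        update()

def interrupt_level(scan):
    """Interrupt level: never wait; give up if the lock is already held."""
    if not eq_lock.acquire(blocking=False):
        return False          # punt; the scan is retried at a later time
    try:
        scan()
        return True
    finally:
        eq_lock.release()
```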
[Keywords]
STOPCODE KAF
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 427 QUESER ENQMN2,EQLOCK,LOKINQ
704A
[End of MCO 14183]
MCO: 14184 Name: JEG/DPM Date: 21-Mar-89:06:03:20
[Symptom]
If a CI disk contains HOM blocks which look valid but contain
a zero word for the structure name, a failover will cause PULSAR
to sniff out the disk and mount a structure with no name.
[Diagnosis]
Monitor never checks for a zero structure name in DEFSTR.
[Cure]
Return "illegal structure name" error when no name is given.
[Keywords]
DEFINE STRUCTURE
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 427 FILFND DEFSTR
704A
[End of MCO 14184]
MCO: 14186 Name: DPM Date: 24-Mar-89:08:37:20
[Symptom]
Stopcode KAF in KNISER.
[Diagnosis]
On a very busy Ethernet wire, it is possible to spend more than 6
seconds at interrupt level taking packets off the KLNI. RSX-20F
has little patience for this sort of nonsense, so it KAFs the -10.
[Cure]
Put an arbitrary limit on the number of packets that we'll process
in a single interrupt. Experimentation has proven that trying to
remove 2100 (decimal) or more packets from the queue will result in
a KAF. Therefore, set the limit to 2000. Location .PBMPP (maximum
packets processed) in the KDB/PCB contains the limit and can easily
be patched to a different value. When the limit is exceeded, a
KNIKSP (KLNI Service Paused) info stopcode will be typed on the CTY.
Then the PIA will be removed for one second to let things settle down.
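A minimal Python sketch of the per-interrupt limit follows. The function name is hypothetical, and the sketch chooses to pause when the limit is reached with packets still queued, which is one reading of "limit exceeded".

```python
from collections import deque

def service_interrupt(queue, max_packets=2000):
    """Process at most max_packets; report whether service must pause."""
    handled = 0
    while queue and handled < max_packets:
        queue.popleft()           # stand-in for handling one packet
        handled += 1
    paused = bool(queue)          # limit reached with packets still waiting
    return handled, paused
```

A queue of 2500 packets is therefore drained in two passes: 2000 on the first interrupt, followed by a pause, then the remaining 500.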
[Keywords]
KNISER KAF
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 430 KNISER KNIRQ1,KNIPAU,KNICON
704A
[End of MCO 14186]
MCO: 14189 Name: JEG/DPM Date: 28-Mar-89:08:22:29
[Symptom]
If a program dies with infinite IPCF quotas and freecore is very low
or about to expire, the system grinds to a standstill. Some jobs are
stuck NApping and others get unexpected error returns. Trying to log
off the offending job fails.
[Diagnosis]
IPCLGO does two things. It sends a logout message to QUASAR and it
turns around all unreceived messages; in that order. The send to
QUASAR will fail because there is no available freecore, and the
logging out job owns a large chunk of it.
[Cure]
Reverse the order of things. First, empty the send and receive queues,
then send the logout message to QUASAR.
[Keywords]
IPCF LOGOUT
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 430 IPCSER IPCLGO
704A
[End of MCO 14189]
MCO: 14190 Name: JEG/DPM Date: 4-Apr-89:05:14:36
[Symptom]
Stopcode IME removing a structure. Other problems possible too. When
allocation is in progress, or the ACCs and NMBs are in transition, and
a structure is being removed, an IME is likely to occur on a busy system.
[Diagnosis]
TAKBLK and friends rely on DEVUNI(F) to indicate the target unit for a
structure is still valid. FILSER normally depends upon TSTGEN checking
UNIGEN. The window is sufficiently large to allow the SKIPN DEVUNI to
work while REMSTR is removing a structure.
[Cure]
Change TAKBLK to call TSTGEN. Make BMPGEN get and release the DA around
the update of UNIGEN.
[Keywords]
DISMOUNT
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 431 FILFND BMPGN1
704A FILIO TAKBL0,TAKBLJ
[End of MCO 14190]
MCO: 14191 Name: RCB Date: 5-Apr-89:22:16:13
[Symptom]
Hung ANF traffic to a node. Especially common over an Ethernet channel. It
may (sometimes) correct itself eventually, especially if it was not an Ethernet
channel that was involved.
[Diagnosis]
After NETWRT queues an output message (PCB) to the FEK, it calls its device
driver to perform the output. This can happen several times before the device
driver tells the FEK routine that the output has happened. At that point, the
FEK routine tells NETSER that the message has been sent. This causes the PCB to
be placed on a generic output-done queue for NETSCN to process. Once we get to
NETSCN, we move PCBs from this queue to a queue for the NDB for the node to
which we were sending the message. The subroutine responsible for this,
NTSC.O, is also responsible for keeping the NDBLMS (last message sent) field
updated. It does this by noting the message number of each PCB it places into
the output-pending queue in NDBLMS. However, the PCB queue from which it is
taking these messages is unordered, and this can lead to having a very long
list of messages, with NDBLMS reflecting only (for example) the first of them.
Once this has happened, CHKNCA (check network-control ACK) will ignore an ACK
for any message beyond that in NDBLMS. However, the remote is quite likely to
send us an ACK for the actual last message in the ACK-pending queue. This
leads to a full output queue and a refusal to transmit any further data
messages, at least until the REP/NAK timer causes us to send a REP, which will
result in a NAK. Because we ignored the implicit ACK present in the NAK, we
will still have a queue of outstanding messages, which the NAK will cause us to
retransmit all at once. Unless the device driver stutters in a friendly
manner, this will merely get us into the same mess again with the same set of
messages, and no progress will ever be seen.
[Cure]
In NTSC.O, only change NDBLMS if it's moving in a forward direction. In
INCTNK, where we resend the queue in response to a NAK, reset NDBLMS to NDBLAP
in order to avoid possible ACK races.
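The forward-only rule for NDBLMS can be sketched as follows. This is an illustrative Python model, not the MACRO-10 code in NETSER; the 8-bit message-number space and all helper names are assumptions made for the example.

```python
MSG_MOD = 256  # assumed size of the message-number space (illustrative only)

def is_ahead(candidate, current, mod=MSG_MOD):
    """Return True if candidate is strictly ahead of current in wrap-around order."""
    diff = (candidate - current) % mod
    return 0 < diff < mod // 2

def note_sent(last_sent, msg_num):
    """Return the new 'last message sent' value; it never moves backward."""
    return msg_num if is_ahead(msg_num, last_sent) else last_sent

def drain_unordered(last_sent, queue):
    """Process message numbers in whatever order the unordered queue yields them."""
    for msg_num in queue:
        last_sent = note_sent(last_sent, msg_num)
    return last_sent
```

With an unordered queue such as `[13, 11, 12]`, an update that blindly records each message number would end on 12, while the forward-only update ends on 13, so an ACK for message 13 is no longer treated as out of range.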
[Keywords]
ANF Ethernet
Hung ANF
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
HOSS attention
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 432 NETSER NTSC.O,INCTNK
704A
[End of MCO 14191]
MCO: 14192 Name: RCB Date: 5-Apr-89:22:55:57
[Symptom]
Terminal characteristics get handled incorrectly during a SET HOST
session which is handled by NETVTM.
[Diagnosis]
Setting the terminal type happens after all the other characteristics
get set, and clobbers them.
[Cure]
Save the other characteristics until after we set the terminal type in VTMCHR.
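The ordering problem can be illustrated with a small Python sketch. The terminal type table, the characteristic names, and the function name below are hypothetical stand-ins, not taken from NETVTM.

```python
# Hypothetical defaults implied by a terminal type (illustrative values only).
TYPE_DEFAULTS = {"VT100": {"width": 80, "lowercase": True}}

def apply_characteristics(term_type, explicit):
    """Apply the terminal type first, then the explicitly requested settings.

    Setting the type loads a full set of defaults, so anything applied
    before it would be overwritten; applying it first lets explicit
    settings win.
    """
    state = dict(TYPE_DEFAULTS.get(term_type, {}))
    state["type"] = term_type
    state.update(explicit)  # saved characteristics are applied last
    return state
```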
[Keywords]
NETVTM
SET HOST
terminal characteristics
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
HOSS attention
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 432 NETVTM VTMCHR
704A
[End of MCO 14192]
MCO: 14193 Name: DPM Date: 6-Apr-89:11:36:40
[Symptom]
MCO 14190 went a bit too far.
[Diagnosis]
In trying to close the window where a structure could be removed
while other things were being done to the ACC/NMB blocks, BMPGEN
was modified to get and give the DA resource. However, one needs
a DDB to use the DA and REMSTR doesn't have one to use. Also,
BMPGEN expects F to contain a STR DB addr, not a DDB.
[Cure]
Can't plug the hole that tight. Remove references to the DA in
BMPGEN and live with occasional IMEs. There is no structure-wide
resource to take care of this situation. Too bad.
[Keywords]
REMSTR
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 432 FILFND BMPGEN
704A
[End of MCO 14193]
MCO: 14194 Name: KDO Date: 6-Apr-89:12:18:04
[Symptom]
Invalid status returned in ETHNT. UUO User Buffer Descriptor (UBD) blocks.
[Diagnosis]
Missing code.
[Cure]
Add code.
[Keywords]
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 432 ETHUUO ENCXDG
704A
[End of MCO 14194]
MCO: 14195 Name: KDO Date: 10-Apr-89:10:54:04
[Symptom]
Adjacency up/down events for DECnet endnodes on multi-area LANs.
[Diagnosis]
DECnet is choosing a designated router outside its area.
[Cure]
Ignore Ethernet Router Hello messages from outside our area.
[Keywords]
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 432 ROUTER RHMASE
705
[End of MCO 14195]
MCO: 14197 Name: DPM Date: 11-Apr-89:08:44:04
[Symptom]
REFSTR creates files with strange version numbers.
[Diagnosis]
Sticking REFSTR's version rather than the monitor's is, at best,
non-standard. But when displayed by DIRECT, it looks like a bug.
[Cure]
Use CNFDVN instead.
[Keywords]
REFRESH
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 423 REFSTR RIBST1
[End of MCO 14197]
MCO: 14198 Name: LWS Date: 17-Apr-89:10:18:26
[Symptom]
1. Same TM02/3 controller register dumped 9 times in TUB
on error.
2. SPEAR doesn't know how to interpret TM02/3 controlled tape drive
error entries or the monitor doesn't give SPEAR what it expects, take
your pick.
[Diagnosis]
1. RDREGS in TM2KON expects T2 to still contain controller
register number on return from RDMBR. RDMBR clears all but register
data in T2.
2. SPEAR expects 2 equal length blocks of error status information
in the error entry (IEP and FEP data). However, TM2KON's IEP length
is 1 and FEP length is 16 (octal). So we only write 1 word of "IEP"
information. This causes SPEAR's interpretation of the error to be
garbage (1. above doesn't help either).
Note: The TUB for a TM02/3 controlled tape drive contains 2 blocks
of TM2ELN words each for "IEP" and "FEP" error information. But
the IEP word is set for only a length of 1. Why? I don't know.
Poking the IEP word on 2476 to be the same as the FEP word causes
2 sets of error information to be dumped and SPEAR correctly
interprets the error. So it seems we can change SPEAR to handle
unequal length "IEP" and "FEP" error blocks, or have the monitor
dump equal length blocks.
[Cure]
1. PUSH/POP T2 around call to RDMBR at RDREG1.
2. Change LH of TUBIEP in TM2KON to be -TM2ELN.
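The effect of the second change can be shown with a little halfword arithmetic. The sketch below is Python rather than MACRO-10, and it assumes the IEP and FEP words hold a negative length in the left half and some offset in the right half; the offset value used in the usage note is purely illustrative.

```python
HALF_MASK = 0o777777  # an 18-bit halfword

def xwd(left, right):
    """Build a 36-bit word from two halfwords (like MACRO's XWD)."""
    return ((left & HALF_MASK) << 18) | (right & HALF_MASK)

def block_length(word):
    """Recover the block length from a word whose left half is -length."""
    left = (word >> 18) & HALF_MASK
    return (-left) & HALF_MASK  # negate in 18-bit two's complement

TM2ELN = 0o16  # the FEP length quoted in the diagnosis (16 octal)
```

With a left half of -1, `block_length(xwd(-1, 0o100))` is 1, which matches the single "IEP" word that was being written. With a left half of -TM2ELN the length becomes 16 octal, the same as the FEP block.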
[Keywords]
SPEAR
TM02/3
TU77
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
Field service attention
PCO required
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 433 TM2KON TUBIEP,RDREG1
[End of MCO 14198]
MCO: 14199 Name: LWS Date: 17-Apr-89:11:17:10
[Symptom]
Can't assign a network device if a device of the same
type doesn't exist on the local host.
[Diagnosis]
If the device doesn't exist on the local host there
will not be an entry in GENTAB for the corresponding device.
The call to CHKGEN in DVSTAS will fail and we bomb the user even
though the network device does exist and is assignable.
[Cure]
At the non-skip return after the call to CHKGEN in DVSTAS
load F with the start of the DDB chain and fall through into
code that will eventually do the right stuff. But! This is not
going to work correctly all the time. If no local line printers
exist and we're trying to find a network printer DDB, we eventually
build a DDB for the network printer and try to link it between
the 'DSK' DDB and the 'SWAP' DDB - ding ding ding, IME. This happens
because LNKDDB in AUTCON likes to keep the DDB chain in sorted order
by device name. So 'LPT' falls between 'DSK' and 'SWAP', but 'DSK'
DDBs are in the hiseg. In order to avoid the wrath of FILSER, change
the name of SWPDDB to 'DSKSWP'.
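Why the rename helps can be seen from the sort order alone. The following Python sketch models the DDB chain as a list of names kept in ascending order; plain string comparison is used as a stand-in for however LNKDDB actually compares device names, which is an assumption.

```python
import bisect

def insert_sorted(chain, name):
    """Insert a device name into a chain kept in ascending name order.

    Returns the index at which the new name landed.
    """
    bisect.insort(chain, name)
    return chain.index(name)

old_chain = ["DSK", "SWAP"]    # before the rename
new_chain = ["DSK", "DSKSWP"]  # after SWPDDB is renamed
```

Inserting `"LPT"` into `old_chain` places it at index 1, between the two disk entries. Inserting it into `new_chain` places it at index 2, after both, so the new DDB is never linked into the middle of the disk DDBs.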
[Keywords]
NETWORK DEVICE
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
Beware file entry required
New development MCO
PCO required
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 433 COMMOD DEVNAM
UUOCON DVSTAS
[End of MCO 14199]
MCO: 14200 Name: LWS/DPM Date: 20-Apr-89:10:52:06
[Symptom]
Tape UDBs on KS not filled in with prototype data.
[Diagnosis]
AUTUDB doesn't compute ending address for BLT.
[Cure]
ADDI P2,(U)
[Keywords]
KS
SPEAR
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Field service attention
PCO required
Single-section monitors only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 434 AUTCON AUTUD1
[End of MCO 14200]
MCO: 14201 Name: RCB Date: 25-Apr-89:07:05:00
[Symptom]
FILSER's error reporting leaves too big a window for DAEMON to get stale
information for SPEAR to report. Not only that, but DAEMON even has to guess
just what kind of error it is supposed to report.
[Diagnosis]
The ERRPT. UUO just doesn't give us enough to work with. We need to use
system error blocks if we're going to get it right.
[Cure]
Do so. This adds EX.AVL to the bits which can be set in the transfer table
header by the SEBTBL macro. If EX.AVL is set, the error entry will be copied
to AVAIL.SYS as well as to ERROR.SYS. This also changes the way in which all
disks report their errors. There is now a kontroller dispatch entry, KONELG,
which is used by FILIO to format an error block and queue it up for DAEMON.
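The effect of EX.AVL on where an entry ends up can be sketched in a few lines of Python. The bit value and the function name are hypothetical; only the flag name and the two file names come from the text above.

```python
EX_AVL = 0o1  # hypothetical bit position, for illustration only

def log_destinations(header_flags):
    """Return the log files an error entry is copied to."""
    files = ["ERROR.SYS"]          # every entry goes to ERROR.SYS
    if header_flags & EX_AVL:
        files.append("AVAIL.SYS")  # copied here as well when EX.AVL is set
    return files
```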
[Keywords]
Disk errors
Error logging
DAEMON
System error blocks
SPEAR
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
Beware file entry required
New development MCO
[BEWARE text]
The format of the DSK KDB has changed again, with the addition of the KONELG
dispatch entry for error logging. Any local disk device drivers will need to
be changed accordingly. See MDEELG in FILIO for an example of how to do this.
DAEMON version 23A(1026) or later must be installed before this MCO.
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704A FILIO
705 434 COMMON
DPXKON
FHXKON
FSXKON
RAXKON
RHXKON
RNXKON
RPXKON
DSXKON
COMMOD
DEVPRM
S
ERRCON
DTASER
COMDEV
[End of MCO 14201]
MCO: 14202 Name: RCB Date: 25-Apr-89:07:16:53
[Symptom]
Jobs get stuck in event wait for system IPCF, and need manual intervention to
be restarted. If they were logging out at the time, the job slot is stuck and
useless.
[Diagnosis]
[SYSTEM]GOPHER is completely ignorant of the possibility that a system program
like the account daemon might die and get logged out, thus causing its IPCF
receive queue to be "returned to sender, address unknown". It just throws the
returned messages on the floor, and leaves the user's job waiting for an
acknowledgement message which will never come.
[Cure]
Educate the rodent. Check the returned message field, and validate it
against the expected sequence number. If it matches, give the user an error
return from SENDSP, so that a QUEUE. UUO (for example) will give the "component
not running" error, and FILDAE messages will be handled as though FILDAE had
never been running.
[Keywords]
EW hang
System IPCF wait
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704A IPCSER
705 434
[End of MCO 14202]
MCO: 14203 Name: RCB Date: 25-Apr-89:08:11:52
[Symptom]
System error blocks can eat up all of free core if DAEMON isn't running.
[Diagnosis]
Once they get queued, they are only deleted when some privileged program
executes a SEBLK. UUO.
[Cure]
Add a timer. Once a minute, we will look for any blocks which are older than
SEBAGE minutes and delete them. SEBAGE defaults to 10 (decimal), and can be
changed with MONGEN. If SEBAGE is set to zero, the error blocks will live
forever.
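The aging rule can be modelled roughly as below. This is a Python sketch of the policy only; the representation of a queued block as a (time, payload) pair and the function name are assumptions, not the ERRCON data structures.

```python
def purge_old_blocks(blocks, now_minutes, sebage=10):
    """Return the blocks that survive one pass of the once-a-minute timer.

    blocks is a list of (queued_at_minutes, payload) pairs.
    A sebage of zero disables aging, so every block is kept.
    """
    if sebage == 0:
        return list(blocks)
    # Delete only blocks that are older than sebage minutes.
    return [(t, p) for (t, p) in blocks if now_minutes - t <= sebage]
```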
[Keywords]
System error blocks
free core limits
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 434 ERRCON
704A COMMON
CLOCK1
[End of MCO 14203]
MCO: 14204 Name: DPM Date: 27-Apr-89:06:36:33
[Symptom]
In some configurations, LINK will report NDBNNM as undefined even
though ANF-10 network software is loaded.
[Diagnosis]
This problem is one of programming style and MACRO's tolerance
for conflicting symbol definitions. NDBNNM is defined in NETPRM,
which is searched by NETSER. The first several references to this
symbol are properly made. However, at NETAS1 the symbol is referenced
as external. MACRO should probably flag this as a "E" error.
Instead, the original value of the symbol is lost and MACRO generates
global fixup requests for all references to NDBNNM. It's not clear
why this problem has surfaced now, as the code at NETAS1 has not
changed for several monitor releases, but correcting the reference in
NETAS1 resolves the undefined global.
[Cure]
Reference NDBNNM as an internal quantity.
[Keywords]
UNDEFINED GLOBAL
[Related MCOs]
13932, 13137
[Related SPRs]
36260
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 435 NETSER NETAS1
704A
[End of MCO 14204]
MCO: 14205 Name: RCB Date: 27-Apr-89:18:49:36
[Symptom]
MCO 14165 didn't go far enough. PAGE. UUO function .PAGAC still isn't always
right. Spy pages for sections 3-36 are sometimes reported as being unreadable.
[Diagnosis]
PAGA93, which finds a page number to return for a spy page in sections 3-36,
doesn't preserve T2. Its caller wants T2 to contain the map entry after the
call, as well as before.
[Cure]
Preserve the map entry in T2.
[Keywords]
PAGE. UUO
Page accessibility
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 435 VMSER PAGA93
704A
[End of MCO 14205]
MCO: 14206 Name: DPM Date: 28-Apr-89:05:51:47
[Symptom]
Stopcode IME removing a structure (revisited).
[Diagnosis]
Previous MCOs didn't plug all the holes, although the window was made
much smaller.
[Cure]
Prevent races by incrementing UNIGEN while holding the DA. Conceptually,
this is easy, but BMPGEN is called with F pointing to a STR, not a DDB.
Therefore, change UPDA & DWNDA to get the job number from .USJOB rather
than from PJOBN. This is OK since the use of DA requires a job to be
mapped to reference PJOBN anyway.
[Keywords]
REMOVE STRUCTURE
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 435 FILFND BMPGN1
704A FILIO UPDA,DWNDA
[End of MCO 14206]
MCO: 14207 Name: LWS Date: 29-Apr-89:14:48:25
[Symptom]
Can't create a SSL larger than .SLMXJ (maximum JSL size)
structures using STRUUO.
[Diagnosis]
Code in SLSTRR and SLCHK always uses .SLMXJ as a maximum
without checking to see if it's a JSL or the SSL.
[Cure]
Check search list type (RH(F)=0 means SSL) and use appropriate
maximum value.
[Keywords]
SSL
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 435 FILFND SLSTRR,SLCHK
704A
[End of MCO 14207]
MCO: 14209 Name: DPM Date: 8-May-89:08:57:19
[Symptom]
Pathological names whose first component is NUL do not necessarily
behave as the NUL device. A DEVCHR, or DEVTYP of the sixbit name
returns disk-only bits. The same is true if you do one of these
UUOs on an open channel. However, if a LOOKUP or ENTER is done,
then the right thing comes back. Also, DEVNAM never returns NUL
and WATCH FILES doesn't expand the filespec correctly.
[Diagnosis]
The monitor believes a pathological name can only be a disk device
and everybody knows that NUL is really a disk even though it claims
to be all devices. But FILSER doesn't make that claim often enough.
[Cure]
Fix SETDDB to test for pathological NUL as well as assigned NUL. Change
NULTST to test for DVDSK and DVTTY instead of sixbit NUL. Fix PRTDDB
to print NUL instead of a logical device name. Add crock routine
LNMNUL to do the grunt work when it's really necessary to know if
it's the NUL device.
[Keywords]
NUL
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 436 COMCON PRTDDB
FILUUO NULTST,LNMNUL,SETDDB
UUOCON DVCHR,UDVNAM
704A
[End of MCO 14209]
MCO: 14211 Name: DPM Date: 15-May-89:09:29:52
[Symptom]
It's difficult to measure magtape performance on a per-kontroller
basis without using any counters.
[Diagnosis]
Never done before I guess.
[Cure]
Add two new counters to the KDB: TKBCRD counts characters read and
TKBCWR counts characters written.
[Keywords]
MAGTAPE PERFORMANCE
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 437 DEVPRM TKBCRD,TKBCWR
T78KON
TCXKON
TD2KON
TM2KON
TMXKON
TS1KON
TX1KON
704A
[End of MCO 14211]
MCO: 14212 Name: LWS Date: 15-May-89:10:21:02
[Symptom]
Undeserved ?Illegal memory reference in jobs with a shared
hiseg.
[Diagnosis]
If a sharable hiseg is expanding and there are enough
secondary map slots available to map the expansion, RDOMP is not
set for any other job using the same hiseg.
[Cure]
In GTHMAP, if there are enough map slots for the expansion, call
HRDOMP via HGHAPP so all other users of the same hiseg will have their
maps redone before they run again.
[Keywords]
Sharable high segments
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
PCO required
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 437 SEGCON GTHMP1
704A
[End of MCO 14212]
MCO: 14214 Name: JAD Date: 25-May-89:07:51:12
[Symptom]
Possible SCAFOO stopcodes in a maximally-configured CI network.
[Diagnosis]
Insufficient path blocks available for the number of CI nodes and
CPUs in the CI/system configuration. There is space available for
32 path blocks, but a maximally-configured system could require
many more. Problem occurs with definition of C%PBLL (number of
path blocks) - it is defined as 2*C%SBLL (number of system blocks).
Depending on the number of CI nodes and CPUs, this definition may
leave insufficient path blocks.
[Cure]
Redefine C%PBLL as 6*C%SBLL - this will allow for the largest
possible CI and CPU configuration.
[Keywords]
CI
SCAFOO
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 SCAPRM C%PBLL
[End of MCO 14214]
MCO: 14217 Name: DPM/RJF Date: 30-May-89:06:58:55
[Symptom]
Various problems suspending and resuming a system:
1. KLIPAs and KLNIs don't get reloaded.
2. KLNIs will be restarted even if they had first been removed.
3. Stopcode NULFNC during the suspend.
[Diagnosis]
1. Code to call PPDINX and KNIINI is under an IFG <M.CPU-1> conditional
in COMMON, so it is not included in single CPU configurations.
2. When a KLNI is removed, the bit corresponding to the proper KLNI on
a given CPU is set to indicate that the device is to be ignored on
subsequent initialization calls. However, IPAMSK is never checked
on KLNI restarts.
3. For reasons that escape me, the NULFEK is being called on system
sleep/resume when it hadn't before. Apparently this never worked
before, but it went unnoticed. The dispatch table does not contain
the appropriate entries for these NETSER functions.
[Cure]
1. Move the calls to PPDINX and KNIINI outside the IFG <M.CPU-1> conditional.
2. Teach KNIINI to respect IPAMSK on KLNI restarts.
3. Add system sleep/resume entry point to NULFEK's dispatch table.
[Keywords]
SYSTEM SLEEP
[Related MCOs]
13932, 13137
[Related SPRs]
36269
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 441 COMMON SPRIN5
KNISER KNIINI
NULFEK NLFDSP
704A
[End of MCO 14217]
MCO: 14218 Name: LWS Date: 8-Jun-89:09:08:11
[Symptom]
Undeserved memory parity errors on KLs with 4MW of memory.
[Diagnosis]
RH20s do undetermined things when accessing the last physical
(quad)word in 4MW. This is an RH20 problem. This problem was never
encountered in previous versions of the monitor and BOOT. The monitor
used to put its hiseg at the very top of memory. Then BOOT occupied the
top of memory. Now, BOOT is still there, but it now frees the pages
at the top because they contain tape drivers that are not needed once
BOOT is done. So, these pages at the top of memory are free to use
by the monitor. When a user gets the last page of memory, it's fair
game for I/O by an RH20.
[Cure]
For lack of something better to do at the moment, if the last
page of a 4MW system is free, mark it as nonexistent in NXMTAB and PAGTAB
and set MEMSIZ to 17,,777000 instead of 20,,000000.
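The two MEMSIZ values follow directly from the page size. The Python sketch below reproduces the arithmetic, using 512-word pages and the left,,right halfword notation; the function names are made up for the example.

```python
PAGE_WORDS = 0o1000        # 512 words per page
FOUR_MW = 4 * 1024 * 1024  # 4MW expressed in words

def halfwords(value):
    """Format a value in left,,right halfword octal notation."""
    return "%o,,%06o" % (value >> 18, value & 0o777777)

def drop_last_page(memsiz):
    """Return the memory size with the top page excluded."""
    return memsiz - PAGE_WORDS
```

`halfwords(FOUR_MW)` gives `20,,000000`, and `halfwords(drop_last_page(FOUR_MW))` gives `17,,777000`, which is the start address of the last page.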
[Keywords]
4 MW
parity
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
Field service attention
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 442 SYSINI MMTIN9
704
[End of MCO 14218]
MCO: 14219 Name: DPM Date: 27-Jun-89:06:37:34
[Symptom]
There appears to be no upper bound on the number of extended RIBs
FILSER is content to create. You can literally fill a disk with
extended RIBs for a single file. When you CLOSE the file, you might
as well take the rest of the day off, because FILSER has lots of
bookkeeping to perform.
[Diagnosis]
RIBXRA contains an 8-bit field for the extended RIB number. FILSER
never checks for field wrap around. The RIB number is only read back
when a user specifies a negative USETI, and otherwise serves no real
purpose.
[Cure]
Check for wrap around and impose an additional limit based on the
contents of MUSTMX when RIBs are created. Set the maximum number
of USETIs to 255 decimal.
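The wrap-around hazard of an 8-bit field, and the kind of check the cure describes, can be sketched as follows. This is Python for illustration only; the `limit` parameter stands in for whatever bound is derived from MUSTMX, and the function name is hypothetical.

```python
RIB_NUMBER_BITS = 8
RIB_NUMBER_MAX = (1 << RIB_NUMBER_BITS) - 1  # 255, largest value the field holds

def next_rib_number(current, limit=RIB_NUMBER_MAX):
    """Return the next extended RIB number, refusing to wrap or exceed limit."""
    nxt = current + 1
    if nxt > RIB_NUMBER_MAX or nxt > limit:
        raise OverflowError("too many extended RIBs for one file")
    return nxt

def unchecked_next(current):
    """What happens without the check: the 8-bit field silently wraps."""
    return (current + 1) & RIB_NUMBER_MAX
```

Without the check, RIB number 255 is followed by 0; with it, the attempt to create one more extended RIB fails instead.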
[Keywords]
EXTENDED RIBS
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 443 COMMOD DESRBC,MUSTMX
FILIO EXTRB2
704A
[End of MCO 14219]
MCO: 14220 Name: RCB Date: 30-Jun-89:00:15:53
[Symptom]
An OPEN which specifies a logical name or a pathological name can fail or find
the wrong device.
[Diagnosis]
The DDB search logic does not allow certain names to be found unless they are
assigned to disks (i.e., funny-space DDBs). CK2CHR gets called when it should
not. For that matter, LP will match a terminal assigned as LPT but not as LP.
[Cure]
For 2-character device names which CK2CHR changes, do the DDB searching twice.
First, try the original name. If that fails or returns DSKDDB, then try again
with the expanded name. If the second search fails but the first returned
DSKDDB, then return the results from the first DDB search. Eliminate the hacks
for CK2CHR and SY: from the search loop.
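The two-pass search can be sketched as below. This is a Python model of the decision logic only; `search` and `expand` are hypothetical callables standing in for the DDB search and for CK2CHR, and the DDB values are plain strings.

```python
DSKDDB = "DSKDDB"  # stand-in for the generic disk DDB

def find_ddb(name, search, expand):
    """Look up a device name, retrying with the expanded 2-character form.

    search(name) returns a DDB or None; expand(name) returns the expanded
    name, or the same name if no expansion applies.
    """
    first = search(name)
    expanded = expand(name)
    if expanded == name:
        return first              # nothing to retry with
    if first is not None and first != DSKDDB:
        return first              # the original name found a real match
    second = search(expanded)
    if second is not None:
        return second             # the expanded name matched
    return first                  # DSKDDB from the first pass, or None
```

For example, with `expand` mapping `LP` to `LPT`, a terminal assigned the logical name `LP` is found on the first pass, and a real `LPT` is found on the second pass only when the first pass comes up empty or yields DSKDDB.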
[Keywords]
PDP-11 names
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 444 UUOCON DDBSCC
704A
[End of MCO 14220]
MCO: 14221 Name: RCB Date: 1-Jul-89:01:56:17
[Symptom]
KAF at PI level of the NIA20.
[Diagnosis]
Taking too long to empty the response queue (MCO 14186 revisited).
[Cure]
Check .CPTMF to try to be sure that too much time won't pass during a single
KLNI interrupt. Also, move the check to after the callback so that we don't
drop the buffers on the floor. Otherwise, after long enough, the protocols
will run out of buffers (especially DECnet).
Because .CPTMF is slightly bogus just as the system is coming up, ignore it
until .CPUPT is at least 2 (ticks). Note that the counters and limits added
by MCO 14186 are still present and in force.
[Keywords]
KAF
NIA20
KNIKSP
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
Field service attention
HOSS attention
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 444 KNISER KNIRQ1
704A
[End of MCO 14221]
MCO: 14222 Name: RCB Date: 7-Jul-89:23:34:54
[Symptom]
System is annoyingly sluggish at system startup time.
[Diagnosis]
Trying to run dozens of copies of INITIA on random terminals all at the same
time, in dozens of job slots.
[Cure]
Only sort of. Invent a new MONGEN-definable symbol, DSDRIC (dataset devices
run INITIA CUSP), to control whether INITIA runs on dataset lines. It will
default to one, which means that INITIA will continue to run on datasets at
system startup. If set to zero at MONGEN time, TTYINI will not force INITIA
commands on the datasets. For the curious, the reason INITIA runs on datasets
at startup time is because of the existence of hardware interfaces which need
to have parameters set even before a call comes in to the modem. However, most
sites probably have more well-behaved interfaces, and will be able to set
DSDRIC to zero.
[Keywords]
sluggish startup
INITIA
datasets
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 444 COMDEV DSCTAB
704A SCNSER TTINI2
[End of MCO 14222]
MCO: 14223 Name: DPM Date: 18-Jul-89:05:32:00
[Symptom]
DA28s don't work.
[Diagnosis]
XTCLNK assigns junk names to UDBs. Later calls to build DDBs fail
because the target UDBs cannot be found. Also, XTCSER will not
assemble with FTMP turned off because of references to SCNLOK and
OUCHE.
[Cure]
Correct logic that builds UDB names. Put IFN FTMP conditionals
around the reference to SCNLOK. Make OUCHE available in all KL10
configurations.
[Keywords]
DA28
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 445 APRSER OUCHE
COMMON OUCHTB
XTCSER XTCLN2,CHKTYP,MPIOWD
704A
[End of MCO 14223]
MCO: 14224 Name: DPM Date: 20-Jul-89:11:58:59
[Symptom]
Random job tables (mostly JBTSTS) get clobbered, weird crashes,
general mayhem.
[Diagnosis]
Steve Perkins is running .EXE files created on the -20 again.
If the .EXE directory claims to have sharable pages that aren't
also marked as high segment pages, GETEXE returns flags indicating
the image is sharable, but with no high segment. Parts of GET
clean up assume that if the sharable bit is on, then there must
be a high segment. This is true for .EXE files created on a -10,
but not otherwise. Anyway, making this assumption, SEGCON blindly
picks up high seg block addresses (which are usually zero) and
indexing off of zero, proceeds to write all over the monitor's
low segment.
[Cure]
While processing .EXE directory entries, turn off the sharable
bit if the high segment bit is not turned on.
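The cure amounts to a one-line flag normalization. The Python sketch below shows the idea; the bit values and names are hypothetical and do not correspond to the real .EXE directory flag layout.

```python
SHARABLE = 0o2  # hypothetical bit values, for illustration only
HIGH_SEG = 0o4

def sanitize_flags(flags):
    """Clear the sharable bit unless the high segment bit is also set."""
    if (flags & SHARABLE) and not (flags & HIGH_SEG):
        flags &= ~SHARABLE
    return flags
```

After this step, later code may safely assume that a sharable entry always has a high segment, whichever system produced the file.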
[Keywords]
TOPS-20 EXE FILES
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 446 SEGCON WANTIT
704A
[End of MCO 14224]
MCO: 14225 Name: DPM Date: 25-Jul-89:05:28:01
[Symptom]
SA10s don't function in an environment where DF10C-based device drivers
exist (TM2KON for one).
[Diagnosis]
DF10C drivers fail to test for the presence of SA10 devices. Therefore,
SA10s look like 18-bit DF10s.
[Cure]
Test SI.SAX in the CONI word in the appropriate xxxCFG routines.
[Keywords]
SA10
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 446 FSXKON FSXCFG
RPXKON RPXCFG
TM2KON TM2CFG
704A
[End of MCO 14225]
MCO: 14226 Name: DPM Date: 3-Aug-89:09:18:56
[Symptom]
Several annoying problems that prevent SA10-based tape from working well.
[Diagnosis]
1. SAXSER & TS1KON bum a bit in the KDBUNI word to indicate a
software interrupt was requested. This means that KDBs can't be
compared against each other, so AUTCON will build multiple KDBs
for a single SA10 kontroller.
2. Tapes ported between a DX10 or a DX20 and an SA10 will have duplicate
UDBs and DDBs built. This is because TD2KON and TX1KON do not know
how to extract drive serial numbers. Subsequent comparisons between
a drive S/N and an existing one don't match, so AUTCON believes it's
looking at two different drives.
3. The code to compare drive serial number is not interlocked in AUTCON.
Under the right circumstances, two configuring CPUs which have detected
the same drive, might not notice the other.
[Cure]
1. Move the software bit into KDBSTS. It's a better place for such
things.
2. Fix TX1KON and TD2KON.
3. SYSPIF/SYSPIN around much of AUTDPU.
[Keywords]
SA10
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 447 AUTCON AUTDPU
DEVPRM KD.SIR
SAXPRM SA.SIR
SAXSER SAXINT
TD2KON TD2DRV
TS1KON TS1DRV
TX1KON TX1DRV
704A
[End of MCO 14226]
MCO: 14227 Name: DPM Date: 3-Aug-89:09:20:23
[Symptom]
Possible tape hangs after a CPU restart.
[Diagnosis]
SPRINI doesn't clear the TAPSER interlock nesting flag.
[Cure]
Do so.
[Keywords]
INTERLOCKS
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 447 COMMON SPRI10
704A
[End of MCO 14227]
MCO: 14228 Name: RCB Date: 3-Aug-89:15:33:21
[Symptom]
Problems setting explicit speeds on TTY lines in the ANF front ends.
[Diagnosis]
Trying to do autobaud even though the speed has been set to something
other than the autobaud speed.
[Cure]
Don't do that. If the speed is set in the config.P11 file, and that speed
is not the autobaud speed (currently 2400 baud), override the ABD
characteristic for the line.
[Keywords]
Autobaud
Non-autobaud
TnXS
ANF10
[Related MCOs]
13932, 13137
[Related SPRs]
36270, 36268
[MCO status]
None
[MCO attributes]
Field service attention
HOSS attention
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 450 CONFIG P11
704A DNTTY P11
DNLBLK P11
MACROS P11
[End of MCO 14228]
MCO: 14229 Name: RCB Date: 8-Aug-89:22:22:51
[Symptom]
Monitor too big and slow. Not enough free bits in JBTSTS.
[Diagnosis]
Lots of places in the monitor test bit JDC from JBTSTS. A few
others clear it. Only DAECOM can set it. It is unreachable code, left over
from the old DCORE and DUMP commands and the days when DAEMON handled
virtual references for EXAMINE, DEPOSIT, and VERSION commands. The JDC bit is
consequently never set, and all the tests for it are redundant.
[Cure]
Free up the bit in JBTSTS, and eliminate all references to it.
[Keywords]
PERFORMANCE
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 450 COMCON
704A CLOCK1
SCHED1
SCNSER
S
[End of MCO 14229]
MCO: 14230 Name: DPM Date: 15-Aug-89:06:36:14
[Symptom]
More error logging stuff ...
[Diagnosis]
Yes.
[Cure]
Convert more old-style DAEMON error logging calls to use the
System Error Blocks. This edit converts:
1. CPU attached/detached records.
2. Node online/offline records.
3. Date/time change records.
Code is also in place to handle system reload (.ERWHY) records, but
because of interface problems with DAEMON and AVAIL.SYS, this call
will be temporarily neutered.
[Keywords]
DAEMON ERROR LOGGING
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 450 COMCON SETDAT
CPNSER CPUCSC
NETSER NODEAM
S .ERMVR
SYSINI SYSRLD,SYSAVL
704A
[End of MCO 14230]
MCO: 14231 Name: RCB/DPM Date: 15-Aug-89:07:43:18
[Symptom]
Convert DAEMON reporting of KL error chunks from RSX20F to use system
error blocks. This eliminates two words in the CDB, .CPETM and .CPEAD.
In order to accomplish this cleanly, there is now a new routine in IPCSER,
OPRMSG, which allows one to queue up messages for ORION. If ORION is not
running, the messages can optionally be sent to OPR: or the CTY. See IPCSER
for the calling sequence. The behavior is controlled by bits in T1 on the
call, of the form OPM.??, which are defined in S.
[Diagnosis]
[Cure]
[Keywords]
KL error chunks
system error blocks
system messages
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
Deferred
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 450 DTEPRM
704A DTESER
S
IPCSER
ERRCON
COMCON
CLOCK1
COMMON
[End of MCO 14231]
MCO: 14233 Name: RCB Date: 22-Aug-89:09:55:53
[Symptom]
Undeserved KNIKSP stopcodes.
[Diagnosis]
.CPTMF limit is exceeded at system startup time.
[Cure]
If .CPUPT is lower, then don't KNIKSP.
[Keywords]
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 451 KNISER KNIRQ1
704A
[End of MCO 14233]
MCO: 14234 Name: DPM Date: 23-Aug-89:07:34:30
[Symptom]
Programs using external tasks (XTCSER) hang following attempts to JAM
powered off remote computers.
[Diagnosis]
If FTMP is turned off, the call to CHKTYP from DWNUNI says to never do
typeout. DWNUNI simply returns without clearing any DA28 errors which
caused the unit to be declared down. Thus, the DA28 becomes unusable
for all other users. A similar situation exists where connect errors
are processed. In this case, we forget to force the unit offline.
[Cure]
Three things. First, fix CHKTYP to work correctly with FTMP turned off.
Second, if no typeout is to be done, skip around the message generation
code and clear the DA28. Finally, on connect errors, always force the
unit offline whether or not we'll type a message.
[Keywords]
DA28 ERRORS
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 451 XTCSER CHKTYP,DWNUNI,CHKCER
704A
[End of MCO 14234]
MCO: 14235 Name: DPM Date: 29-Aug-89:07:32:01
[Symptom]
More error logging stuff.
[Diagnosis]
Yes.
[Cure]
Teach the monitor to write the following records as system
error blocks:
.ERCSC Configuration status change (memory on/off line)
.ERKSN KS10 NXM trap
.ERKPT KL10/KS10 parity trap
.ERCSB CPU status block
.ERDSB Device status block
[Keywords]
DAEMON ERROR LOGGING
[Related MCOs]
13932, 13137
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 452 APRSER PRHMF7,DAELOG,MEMELG
COMCON MEMONU,MEMON8
COMMON OLDNXM,DIACSB,DIADSB
LOKCON MEMOFU,MEMOF2
704A
[End of MCO 14235]
MCO: 14236 Name: KBY Date: 29-Aug-89:08:27:46
[Symptom]
FA resource scheduling leaves something to be desired. The scheduler
knows how to wake up just the job that needs it, but everyone wakes up now
any time it's given up.
[Diagnosis]
No code.
[Cure]
Add code (the remaining routines necessary to do the unwind properly).
[Keywords]
FA
UNWIND
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Deferred
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 452 FILIO UPFA,DWNFA
704A COMMOD
S
CLOCK1 SRFREE
[End of MCO 14236]
MCO: 14237 Name: KBY Date: 29-Aug-89:08:34:37
[Symptom]
Job stuck; SYSTAT shows it's locked (even though it really isn't).
[Diagnosis]
Due to the extra calls to SCDCHK to prevent KAFs in large PAGE. UUOs,
we can potentially block in a PAGE. UUO. If pages were allocated to the job
by CHGPGS (because they were available at the time), but during the block
we decide to swap out the job, we could potentially lose those pages to
never-never land since they are not in anyone's map. To prevent this,
CHGPGS lights NSHF (but not NSWP) akin to MAPBAK so that the swapper won't
touch the job. Unfortunately, if the job has a sharable high segment, someone
else using it might call XPANDH (which can happen even without really wanting
to expand the high seg as we tend to do this at the drop of a hat) and set
JXPN for the job blocked at CHGPGS. At this point the scheduler will not
run the job because of JXPN and the swapper won't clear JXPN (even without
swapping the job which may not be necessary) because of NSHF which won't
get cleared until the job finishes running through CHGPGS (deadly embrace).
[Cure]
The scheduler will except jobs owning disk resources from the JXPN check.
Do so also with jobs having NSHF on but not NSWP (a state only the monitor
can cause in limited situations such as the above).
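As an illustration only (Python, not monitor code), the resulting JXPN
test can be modelled as a predicate. The flag names follow the text;
the function name and the dictionary layout are hypothetical, and the
real check operates on bits in the job status words.

```python
def blocked_by_jxpn(job):
    """Return True if the scheduler should decline to run the job because of JXPN.

    `job` is a hypothetical dict of booleans standing in for status bits.
    """
    if not job["JXPN"]:
        return False
    # Exemption for jobs owning disk resources.
    if job["owns_disk_resource"]:
        return False
    # Exemption for NSHF lit without NSWP, a state only the monitor creates.
    if job["NSHF"] and not job["NSWP"]:
        return False
    return True
```

With this test, a job blocked in CHGPGS with NSHF lit is still allowed
to run even after another job sets JXPN for it, which breaks the
deadly embrace described in the diagnosis.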
[Keywords]
JXPN
NSHF
[Related MCOs]
13932, 13137
[Related SPRs]
36245
[MCO status]
Deferred
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 452 SCHED1 CJFRCX
704A
[End of MCO 14237]
MCO: 14238 Name: JC Date: 1-Sep-89:13:34:18
[Symptom]
TOPS-10 is missing the TRANSLate command.
[Diagnosis]
No one ever put it in.
[Cure]
Add one.
[Keywords]
TRANSL
LOGIN
commands
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 453 COMMON COMTAB
[End of MCO 14238]
MCO: 14240 Name: DPM Date: 5-Sep-89:05:41:06
[Symptom]
More error logging stuff.
[Diagnosis]
Yes.
[Cure]
1. Add support for .ERSNX (NXM sweep).
2. Add support for .ERSPR (parity sweep).
3. Turn on .ERWHY/.ERMRV logging.
[Keywords]
ERROR LOGGING
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
Beware file entry required
[BEWARE text]
DAEMON version 23A(1027) or later is required. Earlier versions
will cause .ERMRV records to be written into ERROR.SYS instead of
AVAIL.SYS. When this happens, SPEAR will report an unknown record
type in ERROR.SYS.
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 453 ERRCON PARSWP,PARELG,NXMSWP,NXMELG,XFRSE2
S EX.NER
SYSINI LLMSTR,AVLTBL
704A
[End of MCO 14240]
MCO: 14241 Name: DPM Date: 6-Sep-89:07:23:57
[Symptom]
Stopcode OVA on a KS10 during SYSINI.
[Diagnosis]
EVA pages overflow BOOT address space because the high segment
grew a bit.
[Cure]
Slide the high segment origin down 2 pages.
[Keywords]
HIGH SEGMENT
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 453 COMMON MONORG
704A
[End of MCO 14241]
MCO: 14242 Name: KDO Date: 11-Sep-89:14:05:55
[Symptom]
Unusable TTY DDBs.
[Diagnosis]
LATSER creates a TTY DDB for host-initiated connects, but INITIA uses a
different one, causing LATSER's to float free.
[Cure]
If it hurts every time I do this, don't do it anymore.
[Keywords]
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 454 LATSER GETTDB
704A
[End of MCO 14242]
MCO: 14243 Name: ERS Date: 12-Sep-89:08:18:55
[Symptom]
Various. Lots of monitor too big and slow. Several places where a user mode
section number is lost. And possible working-set confusion if a multi-section
program had a PFH. (Probably wouldn't work anyway.)
[Diagnosis]
GETPC/PUTPC
[Cure]
Remove uses of GETPC/PUTPC. In some cases we simply put the same code in
minus a couple JRSTs. In other places it gets a little more complicated. The
DDT command should now include the section number in the one-word old PC in
JOBDAT. Assume that an extended user is not in his PFH. (A bit of work
would be involved in making an extended PFH work.) Rewrite DOINT. Net result
is that we'll store the section number in the old PC portion of the interrupt
block.
[Keywords]
GETPC
User-mode
Extended-addressing
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
Beware file entry required
New development MCO
Documentation change
[BEWARE text]
Some one word PCs will now contain the section number where they did
not in the past. In particular, commands like DDT should preserve the section
number in .JBOPC. Also, the old PC in the interrupt block should now contain
the section number.
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 454 COMCON USAVE,SEGRLX
ERRCON DOINT
VMSER USRFL6,USRFL7,GETDDT,UPAGE4,PAGA1C
[End of MCO 14243]
MCO: 14244 Name: DPM Date: 18-Sep-89:06:31:17
[Symptom]
If a logical name points to NUL, the FILOP returned filespec
will not store the correct device name following a LOOKUP or
ENTER.
[Diagnosis]
Oversight. The returned device name is the logical name.
[Cure]
Call LNMNUL and return NUL if appropriate.
[Keywords]
NUL
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 455 UUOCON FOPFI0
704A
[End of MCO 14244]
MCO: 14245 Name: DPM Date: 19-Sep-89:06:27:02
[Symptom]
On a very slow system, IPCF sends to jobs which logged in
via FRCLIN can get receiver quota exhausted errors.
[Diagnosis]
The receiver hasn't had the chance to pump up its IPCF quotas.
This is most easily seen on a heavily loaded KS10.
[Cure]
Have LOGREF set the quotas to 511.
[Keywords]
IPCF QUOTAS
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 455 COMCON LOGRF2
704A
[End of MCO 14245]
MCO: 14246 Name: ERS Date: 19-Sep-89:07:57:56
[Symptom]
Monitor too big and slow.
[Diagnosis]
Old code for GET.EXE.
[Cure]
Remove it.
[Keywords]
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 455 COMCON SGSET,UGTSEG
[End of MCO 14246]
MCO: 14247 Name: ERS Date: 19-Sep-89:08:07:26
[Symptom]
GETPC/PUTPC, the second half.
[Diagnosis]
yes.
[Cure]
Yes.
[Keywords]
GETPC
PUTPC
GETPCS
byebye
[Related MCOs]
14243
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 455 ERRCON DOINT
S GETPC,GETPCS,PUTPC
CLOCK1 NOTACL,INCTM4,CIP9,STOP1H,SETPIT,SETPIU,USTART
[End of MCO 14247]
MCO: 14248 Name: RCB Date: 26-Sep-89:05:52:43
[Symptom]
MCO 14231 revisited:
Convert DAEMON reporting of KL error chunks from RSX20F to use system
error blocks. This eliminates two words in the CDB, .CPETM and .CPEAD.
[Diagnosis]
yes.
[Cure]
yes.
This also makes DTE. UUO function 20 (.DTERT) obsolete.
[Keywords]
KL error chunks
system error blocks
system messages
[Related MCOs]
14231
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 456 DTEPRM
704A DTESER
IPCSER
COMMON
[End of MCO 14248]
MCO: 14249 Name: KDO Date: 26-Sep-89:07:27:50
[Symptom]
LAT is slow to start.
[Diagnosis]
If the multicast message is sent before the Ethernet service routines have set
the channel address, LAT servers will use the wrong Ethernet address when trying
to connect to TOPS-10.
[Cure]
Delay the multicast message until after ETHSER does the Set-Channel-Address
(NU.SCA) callback.
[Keywords]
[Related MCOs]
None
[Related SPRs]
36229
[MCO status]
None
[MCO attributes]
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 456 LATSER CBRDSP,LATLSC,LATSCA
704A
[End of MCO 14249]
MCO: 14250 Name: DPM Date: 3-Oct-89:07:45:50
[Symptom]
No way to cause IPA dumps to be written cleanly.
[Diagnosis]
DAEMON currently does this by using system error blocks; a method
which is at best an ugly crock.
[Cure]
Invent a way to allow the monitor to run things at UUO level. This
amounts to adding a forced .EXEC command which when performed on
FRCLIN will create a job slot and run a specified routine at UUO
level. At completion, the control transfers to JOBKL and the job
will be destroyed. This will be used to write IPA dump files. This
MCO however, only implements the necessary code to create the job.
The actual dump stuff will happen in a later MCO.
[Keywords]
DAEMON ERROR LOGGING
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 457 COMCON LOGREF
COMMON COMTAB
CLOCK1 CIP2
SCNSER TTFCOM
704A
[End of MCO 14250]
MCO: 14253 Name: DPM Date: 17-Oct-89:07:24:25
[Symptom]
The monitor has an annoying habit of dumping even if the system has
been up for less than 5 minutes. This is contrary to previous behavior.
[Diagnosis]
While it may be a desirable thing to do under some circumstances, it
isn't desirable in all cases.
[Cure]
Make it optional. In cases where the system crashes during the first 5
minutes of uptime, dump only if the symbol ATODMP is non-zero. By default,
it will be set to 1. Sites which find this behavior disgusting can set it
to 0.
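A rough model of the decision, in Python for illustration. ATODMP and
the 5 minute limit come from the text; the function name, the use of
seconds, and the assumption that a dump is otherwise wanted are
hypothetical.

```python
FIVE_MINUTES = 5 * 60  # uptime threshold from the text, expressed in seconds


def should_dump(uptime_seconds, atodmp=1):
    """Decide whether a crash should produce a dump.

    Assumes a dump is otherwise wanted; only the early-uptime rule is modelled.
    """
    if uptime_seconds < FIVE_MINUTES:
        # Early crash: dump only if the site left ATODMP non-zero.
        return atodmp != 0
    return True
```

The default of 1 keeps the newer behavior; a site setting ATODMP to 0
gets the older behavior back for crashes in the first 5 minutes.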
[Keywords]
DUMP
[Related MCOs]
13809
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 461 COMMON ATODMP
MONBTS RLDMON
704A
[End of MCO 14253]
MCO: 14254 Name: DPM Date: 26-Oct-89:07:43:47
[Symptom]
Occasionally, structures cannot be mounted after ATTACHing disk
drives or after a newly formatted pack has been defined.
[Diagnosis]
The routine DSKDRV is responsible for setting up a UDB following
an ATTACH. If errors occurred reading device registers, the unit
status is set appropriately to reflect the error condition. However,
if no errors occurred, DSKDRV assumes a pack must be mounted and
changes the status to 'pack is mounted'. Later, when the STRUUO is
done to define a structure, it will fail because the UDB claims a pack
is already mounted. In the case of a newly formatted pack, ONCMOD
neglects to set the unit state to 'no pack mounted' when the HOM
blocks cannot be read.
[Cure]
Following an ATTACH, do not change the unit status unless there
were errors. When HOM blocks cannot be read, set the unit status to
'no pack mounted'.
[Keywords]
ATTACH DISK
DEFINE STRUCTURE
[Related MCOs]
None
[Related SPRs]
36276
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 462 FILIO DSKDR8
ONCMOD TRYHOM
704A
[End of MCO 14254]
MCO: 14255 Name: RCB Date: 26-Oct-89:11:38:09
[Symptom]
Batch streams can hang forever when they use MIC.
[Diagnosis]
Scenario:
MIC file enables the RESPONSE feature in order to trap error messages into a
MIC variable (parameter). Some program it invokes types out an error message.
MIC wants to get the entire error message into the response buffer, and not
just a part of it, so it waits for the job to go to monitor level or to block
in TIOWQ (TTY I/O wait) before it reads the text. To make sure the text is
available for MIC to read, SCNSER refuses to allow output to happen until MIC's
conditions are satisfied *and* MIC has read the response buffer. Thus, when
the program that was invoked types out a reasonably short error message (so
that it doesn't block in TO) and then loops in NAPQ waiting for the chunks to
empty out before it decides how to type its next prompt, there is a deadlock.
The program never satisfies the MIC conditions for getting the response buffer
read, and thus output never happens, and thus the program is waiting for MIC
waiting for the program waiting for MIC ....
[Cure]
Since the MIC RESPONSE buffer is only 21 octal words in length, and is ASCIZ,
MIC will only ever see a maximum of 84 (decimal) characters of response text.
In other words, it only expects to see one line. So, add a bit in the LDB,
L1LEEL (end of error line, B6 in LDBBYT). This bit is twiddled during the same
routine that notifies us of an error character. The code in XMTMIC which
checks for whether to tell MIC that the response buffer is available will
consider having L1LEEL set to be as good as being in TIOWQ. I.e., if we have
gone back to the left margin since seeing the error character, we will tell MIC
to do its thing.
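The 84 character figure follows from packing five 7-bit ASCII
characters into each 36-bit word and reserving one character for the
ASCIZ terminator. The sketch below (Python, illustrative only) works
the arithmetic and models the revised XMTMIC condition; the function
names are hypothetical, and treating the three conditions as a simple
OR is an assumption drawn from the description above.

```python
WORD_BITS = 36
ASCII_BITS = 7


def asciz_capacity(words):
    """Characters of text that fit in an ASCIZ buffer of `words` 36-bit words."""
    chars_per_word = WORD_BITS // ASCII_BITS  # 5, with one bit left over
    return words * chars_per_word - 1         # one character goes to the null


def response_ready(at_monitor_level, in_tiowq, l1leel):
    """Hypothetical model: L1LEEL set is as good as being in TIOWQ."""
    return at_monitor_level or in_tiowq or l1leel
```

For the 21 (octal) word buffer, asciz_capacity(0o21) gives 17 * 5 - 1,
which is 84.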
[Keywords]
MIC under BATCH
hung PTY
[Related MCOs]
None
[Related SPRs]
36279
[MCO status]
Checked
[MCO attributes]
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 462 SCNSER LDBBYT,MICLG3,MICPS4
704A
[End of MCO 14255]
MCO: 14258 Name: DPM Date: 14-Nov-89:06:09:54
[Symptom]
IPA dump file writing facility appears as a wart on the error logging
code.
[Diagnosis]
Way back when, the only way to get UUO-level work done was to get
DAEMON to do some work for you. IPA dump files were processed through
the error logging code by prodding DAEMON with a SPEAR record that was
suppressed from ERROR.SYS.
[Cure]
Now that there's a way to make UUO-level things happen, teach the monitor
to write the dump files and eliminate the need for DAEMON interaction.
[Keywords]
IPA DUMP
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 464 AUTCON AUTDMP
CLOCK1 EXPJO1
COMDEV IPADMP
FILUUO UNQIFL,UNQINI
704A
[End of MCO 14258]
MCO: 14259 Name: DPM Date: 14-Nov-89:06:50:39
[Symptom]
Inaccessible code left over from efforts to clean up error
logging code.
[Diagnosis]
Was just waiting 'til it was all over.
[Cure]
Remove DAEDIE, DAEDSJ, DAEEIM, DAEERR, DAERPT, and DAESJE. Also
remove the interlock word, DAELOK. Shrinks CLOCK1 by 3 blocks.
[Keywords]
ERROR LOGGING
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 464 CLOCK1 DAEDIE,DAEDSJ,DAEEIM,DAEERR,DAERPT,DAESJE,DAELOK
704A
[End of MCO 14259]
MCO: 14261 Name: DPM Date: 21-Nov-89:08:25:53
[Symptom]
On KS10s, defining non-standard device parameters doesn't work.
COMDEV gets assembly errors.
[Diagnosis]
The MDKS10 macro has a junk parameter filled in for the MASSBUS
unit number.
[Cure]
Don't put out sixbit gibberish where a number is expected.
[Keywords]
MDKS10
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 465 MONGEN MDT3A
704A
[End of MCO 14261]
MCO: 14262 Name: JEG/DPM Date: 28-Nov-89:04:51:45
[Symptom]
Possible DI hangs or KAFs out of PCLDSK.
[Diagnosis]
When doing queued protocol for disks, if the primary port is offline,
a lot of things can go wrong requeueing the I/O to another port.
1. References to UNIKON should be indexed by T1, not U.
2. References to UNIALT are OK only for CI disks.
3. Extra JUMPN to test the results from CPUOK.
4. Merely checking KDBCAM for non-zero value doesn't guarantee
the other CPU(s) are running.
[Cure]
1. Index UNIKON by T1.
2. Test for a CI disk. If so, use UNIALT. Use UNI2ND for all others.
3. Remove JUMPN. We wouldn't have gotten to PCLOFL if the initial
call to CPUOK was successful.
4. Make a second call to CPUOK to test the new accessibility bits
from the alternate or detached port.
[Keywords]
DI HANG
KAF
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 466 FILIO PCLOFL
704A
[End of MCO 14262]
MCO: 14263 Name: DPM Date: 28-Nov-89:06:45:40
[Symptom]
With DAEMON no longer dependent upon the monitor version, the methods
of determining what is the proper DAEMON version and who is a legal
DAEMON no longer work. Also, there is still some lurking inaccessible
code.
[Diagnosis]
Time for a change.
[Cure]
A SETUUO will be provided so DAEMON can set its job number in the
monitor. It is function 53 (.STDAE). A corresponding GETTAB (%CNDJN,
212,,11) will read the job number back. The SIXBIT/DAEMON/ name and
JACCT bit are no longer required. In fact, DAEMON has been removed
from PRVTAB.
Also, remove the ERRPT. UUO as the monitor no longer leaves data for
DAEMON to scavenge by this method. The UUO, as well as GETTAB table
entries %LDERT, %LDPT1, %LDPT2, %LDLTH, and %LDESZ are now obsolete.
Stopcode IBI gets deleted along with the code at STOP1 to try to restart
DAEMON after it halts. It can never be made to work.
DAEMON version 24(1030) or later is required from now on.
[Keywords]
DAEMON
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
Beware file entry required
Documentation change
UUOSYM change
[BEWARE text]
DAEMON version 24(1030) or later is required with monitor load 466.
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 466 CLOCK1
COMCON
COMMON
COMMOD
UUOCON
UUOSYM
704A
[End of MCO 14263]
MCO: 14264 Name: DPM Date: 30-Nov-89:07:10:40
[Symptom]
Using the MONGEN option to set non-standard device parameters,
defining a printer to be upper case only has no effect. The monitor
treats the printer as lower case. Manually turning off DVLPTL in the
DEVCHR word of the DDB makes the problem disappear and the printer
behave like an upper case only printer.
[Diagnosis]
The routine AUTMDT scans MDTs for non-standard device parameters.
If the device is specified (to MONGEN) using a device code OR a
non-zero CPU number, then everything works as expected. However, if
the customer defaults the device code AND the CPU number (or supplies
CPU0), then a zero device specifier is inserted into the MDT. A zero word
signals the end of the MDT. Therefore, AUTMDT will never scan the
entire table and never find the customer specified parameters. Also,
it is possible for AUTMDT to exit without returning the MDT data under
some circumstances.
[Cure]
In MONGEN, set a bit in the device specifier word of the MDT
entry which indicates the word is valid. Thus, CPU0 with a defaulted
device code of zero will no longer look like the table terminator.
Also ensure that the MDT data is always returned properly.
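The terminator problem can be shown with a small model (Python,
illustrative only). MD.VAL is named in the tags for this MCO, but its
bit position, the packing of CPU number and device code, and the
function names here are all hypothetical.

```python
MD_VAL = 1 << 35  # hypothetical bit position; the source does not give the value


def make_specifier(cpu, device_code, mark_valid):
    """Build a device specifier word; the field packing is an assumption."""
    word = (cpu << 9) | device_code
    return word | MD_VAL if mark_valid else word


def scan_mdt(table):
    """Return the specifier words seen before the zero terminator."""
    seen = []
    for word in table:
        if word == 0:
            break
        seen.append(word)
    return seen
```

Without the valid bit, a CPU0 entry with a defaulted device code is a
zero word, so a table beginning with that entry appears empty. With
the bit set, the scan reaches every entry before the real terminator.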
[Keywords]
MDT
[Related MCOs]
None
[Related SPRs]
36282
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 467 AUTCON AUTMD3,MDTDV1
DEVPRM MD.VAL
MONGEN ASKRE5,MDT6,MDTTAB
704A
[End of MCO 14264]
MCO: 14265 Name: DPM Date: 4-Dec-89:08:22:27
[Symptom]
Rewinds and skip file operations time out prematurely on 3600 foot
magtapes.
[Diagnosis]
Hung timers are based on the amount of time needed to perform a
given function on a 2400 foot magtape. The values fall short for
3600 foot reels.
[Cure]
Increase all hung timer values by one half.
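The factor matches the ratio of reel lengths, since 3600 / 2400 is
1.5. A minimal sketch in Python, for illustration only: the table
contents are made-up values, not the real HNGTBL entries, and rounding
up is an assumption.

```python
def lengthen(seconds):
    """Increase a timer value by one half, rounding up to a whole second."""
    return (3 * seconds + 1) // 2


def scale_table(table):
    """Apply the increase to every entry of a hypothetical timer table."""
    return {function: lengthen(value) for function, value in table.items()}
```

For example, a hypothetical 60 second rewind timer becomes 90 seconds.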
[Keywords]
HUNG TIMERS
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 467 T78KON HNGTBL
TCXKON HNGTBL
TD2KON HNGTBL
TM2KON HNGTBL
TMXKON HNGTBL
TS1KON HNGTBL
TX1KON HNGTBL
704A
[End of MCO 14265]
MCO: 14266 Name: DPM Date: 5-Dec-89:06:31:35
[Symptom]
No way for the old and new DAEMONs to tell which version ought
to be run.
[Diagnosis]
%CNDAE returns 704, but both the old and new DAEMONs run under
different flavors of 704.
[Cure]
Have %CNDAE return 705. The new DAEMON will require this, but
if it sees 704, it will run DAE704.
[Keywords]
DAEMON
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
Beware file entry required
[BEWARE text]
DAEMON which has run with earlier versions of 7.04 should be
renamed to SYS:DAE704.EXE. DAEMON version 24 should be placed
on SYS as DAEMON.EXE. If there is a chance that an earlier 7.04
monitor may occasionally be run, the new DAEMON should also be
copied to SYS with the name DAE705.EXE. This will allow for the
proper synchronization of DAEMONs with the monitor regardless of
which version of 7.04 is run.
%CNDAE is a GETTAB which allows DAEMON to synchronize with
monitor versions. It is intended for use only by DAEMON. Other
programs such as ACTLIB, LOGIN, REACT, and WHO have incorrectly
used this GETTAB to return the monitor version where another,
more appropriate GETTAB, %CNDVN, should have been used. The
Digital programs have been changed to use %CNDVN. Sites should
make similar changes to any user-written programs which may have
used %CNDAE.
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 467 COMMON CNFDAE
704A
[End of MCO 14266]
MCO: 14267 Name: LWS Date: 11-Dec-89:17:28:27
[Symptom]
Problems assigning/init'ing etc devices in monitors with no
ANF support.
[Diagnosis]
AUTDDB does not make device names of the form DEVNNU when NN
is 00. In this case it makes a name of the form DEVU, eg. LPT0 instead
of LPT000. DVSTAS depends on the U of NNU being the last sixbit character
in DEVNAM (bits 30-35) when searching for a DDB. GALAXY spoolers generate
device names of the form DEV00U when ANF is not supported in the monitor.
Using DEV00U as a device name in various UUOs fails because DEV00U will
never match the device name in the DDB, which is DEVU.
[Cure]
Have AUTDDB always build device names of the form DEVNNU when
DR.NET is lit.
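The mismatch and the fix can be sketched as follows (Python,
illustrative only). The function names are hypothetical, and
formatting NN as two octal digits is an assumption; for node 00 the
radix makes no difference.

```python
def old_name(dev, node, unit):
    """Old AUTDDB behavior: drop the NN field when the node number is 00."""
    if node == 0:
        return f"{dev}{unit:o}"
    return f"{dev}{node:02o}{unit:o}"


def new_name(dev, node, unit, dr_net):
    """New behavior: always build DEVNNU when DR.NET is lit."""
    if dr_net:
        return f"{dev}{node:02o}{unit:o}"
    return f"{dev}{unit:o}"
```

A spooler asking for LPT000 fails to match the old name LPT0 but
matches the name built by new_name("LPT", 0, 0, dr_net=True).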
[Keywords]
AUTOCONFIGURE
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 470 AUTCON AUTDDB
704A
[End of MCO 14267]
MCO: 14268 Name: DPM Date: 12-Dec-89:07:02:45
[Symptom]
More error logging stuff.
1. The monitor doesn't write DECtape records.
2. The definition of record type 75 is wrong.
[Diagnosis]
1. SPEAR didn't use to understand DECtape records. Now it does.
2. Record type 75 claims it's only used for IPA20 dumps. Not so.
[Cure]
1. Remove references to M.DTAE (introduced during 7.04 development)
as normally turned on. This will cause the monitor to write DECtape
error records.
2. Redefine record 75 to be a generic device dump record with the name
.ESDVD (UUOSYM) and .ERDVD (S). SPEAR also understands this record
now. The monitor still doesn't write this record, but it will soon.
[Keywords]
ERROR LOGGING
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
Documentation change
UUOSYM change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 470 COMDEV M.DTAE
DTASER M.DTAE
S .ERDVD
UUOSYM .ERDVD
704A
[End of MCO 14268]
MCO: 14269 Name: DPM/RCB Date: 19-Dec-89:07:32:29
[Symptom]
On multi-CPU systems, at system startup, one frequently
sees varying CPU uptimes and/or undeserved CPUn not running warnings
on the CTY.
[Diagnosis]
When the clocks are turned on, only the policy CPU's uptime
counter is more or less accurate. Non-policy CPUs are looping in their
AC loop waiting for the system to start. During this time, they take
no interrupts and therefore never update their uptime or OK word. When
the system starts timesharing, the uptime words are guaranteed to be
skewed and sometimes the OK words are positive, causing the warnings
on the CTY.
[Cure]
Prior to turning on the clocks, make all CPU's uptime words agree.
Also fix the OK words to be properly negative.
[Keywords]
UPTIME
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 471 SYSINI TIMINI,NODDT
704A
[End of MCO 14269]
MCO: 14270 Name: DPM Date: 26-Dec-89:07:49:18
[Symptom]
After the release of 7.04, SNOOPY fails with the error
"? Undefined breakpoint symbol TM0IN1".
[Diagnosis]
The CPU dependent code for interval timer interrupts was removed
as part of 7.04 development. Because of this change, a SNOOP. UUO
cannot be used to patch the interval timer code without incurring
excessive overhead in the job which is doing the snooping. (It must
weed out all calls except those from the target CPU.)
[Cure]
Add a new SETUUO (.STITP==54) to allow a job to patch the interval
timer. The job must have POKE privs, be [1,2], or running with
JACCT set, and contiguously locked in EVM. The call is:
MOVE AC,[.STITP,,addr]
SETUUO AC,
no privs, bad arguments
success
addr: CPU mask
instruction to XCT (relocated)
For this to work, two CDB locations have been added. .CPITP contains
the instruction to execute and .CPITJ contains the job number which
patched the interval timer code. When interrupts are processed, if
.CPITP is non-zero, it will be executed. A suitably privileged job
may set .CPITP if .CPITJ is zero or is already owned by the job
executing the SETUUO. .CPITP may be cleared by supplying a zero for
the instruction to execute. These words will be forcibly cleared when
a job exits prematurely (ESTOP), control-C's out (STOP1) or does a
RESET UUO.
For the curious, two new GETTABs have been added. %CVITP and %CVITJ
return .CPITP and .CPITJ respectively, although SNOOPY or any other
performance measuring program should have no need to rely on these
words.
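The ownership rules for .CPITP and .CPITJ can be modelled roughly as
below (Python, illustrative only, one object per CPU). The class and
method names are hypothetical. Two points are assumptions not stated
above: that clearing the instruction also releases ownership, and that
the forced clear applies only when the exiting job is the owner.

```python
class IntervalTimerPatch:
    """Hypothetical model of one CPU's .CPITP / .CPITJ pair."""

    def __init__(self):
        self.itp = 0  # instruction to execute, zero if none
        self.itj = 0  # job number which patched the timer, zero if none

    def set(self, job, instruction, privileged):
        """Model the SETUUO: True is the success return, False the error return."""
        if not privileged:
            return False
        if self.itj not in (0, job):
            return False  # already owned by some other job
        self.itp = instruction
        # Assumption: supplying zero releases ownership as well.
        self.itj = job if instruction else 0
        return True

    def job_gone(self, job):
        """Model the forced clear on ESTOP, STOP1, or a RESET UUO."""
        if self.itj == job:  # assumption: only the owner's exit clears the words
            self.itp = 0
            self.itj = 0
```

A second privileged job is refused while the first still owns the
patch, and the words return to zero once the owner clears the
instruction or goes away.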
[Keywords]
SNOOPY
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
Documentation change
UUOSYM change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 472 APRSER TIMINT,SETITP,CLRITP
CLOCK1 ESTOP1,STOP1
COMCON SETTBL
COMMON .CPITP,.CPITJ
UUOCON RESET
UUOSYM %CVITP,%CVITJ,.STITP
704A
[End of MCO 14270]
MCO: 14271 Name: RCB Date: 30-Dec-89:06:36:16
[Symptom]
Device errors and uptime statistics are getting lost.
[Diagnosis]
DAEMON is unreliable about finding crash dumps and reporting errors and
AVAIL statistics from them.
[Cure]
Have the monitor do it. This adds module CRSINI to ERRCON.MAC.
This also adds two new STOPCDs (both in CRSINI):
CRSIAF, type INFO -- CRSINI allocation failure.
CRSINI could not allocate an exec process block in order
to run its UUO-level code.
OLDMON, type INFO -- OLD monitor found in crash file
CRSINI found that the crash file pointed to by BOOT was
for an older monitor than it can process.
[Keywords]
DAEMON
SPEAR
AVAIL
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 472 SYSINI SYSINH,SYSRLD,SYSAVL,BOOTFX
704A COMMON CNFDAE
S .ERCIN,.ERHSB,.EXHDR
CLOCK1 CIP2,SETDJB
ERRCON SEBTIM,XFRSEB,CRSINI
[End of MCO 14271]
MCO: 14272 Name: RCB Date: 9-Jan-90:07:08:37
[Symptom]
Confusion results when DAEMON appears to have restarted before the monitor
crashed and never seems to have been started after the reload.
[Diagnosis]
DAEMON writes an entry into ERROR.SYS before it reads the entries from the
monitor. Since the monitor has read the crash file and has the stopcode
information waiting for DAEMON to log, this results in the entries being
written in the wrong order. Their timestamps are correct, but it still looks
strange, leading to questions about just what's wrong.
[Cure]
Make the .STDAE SETUUO code which interlocks DAEMON startup handle queueing up
a DAEMON-restarted entry into the system error blocks. If DAEMON also writes
one, then the one which is out of order can safely be ignored. Eventually,
DAEMON will no longer write such entries and the DAEMON restarts which SPEAR
reports will always be synchronized properly with other reported events.
Since this entry is only reported, and not used by any of the COMPUTE
functions, it is safe to have extra or missing entries while advancing from one
autopatch tape to the next.
[Keywords]
DAEMON restart
SPEAR
ERROR.SYS
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 473 CLOCK1 SETDJB
704A S .ERDPE
[End of MCO 14272]
MCO: 14273 Name: RCB Date: 9-Jan-90:07:16:36
[Symptom]
Too hard for programs to call others via the CTX. UUO and tell whether the
called programs succeeded after the UUO returns. It is usually necessary to
modify the called program to pay attention to the CTX. data buffer interface in
order to obtain the desired behavior in the face of errors.
[Diagnosis]
JOBDAT location .JBERR is not being handled properly. Errors which occur (and
are counted) in the inferior context are lost.
[Cure]
Keep .JBERR updated in the superior context. The inferior will start with
zero in the word, and anything in the word at context deletion (POP) time will
be added into the superior's count.
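The bookkeeping amounts to the following (Python, illustrative only).
The class and function names are hypothetical stand-ins for the
context push and pop paths in CTXSER.

```python
class Context:
    """Hypothetical stand-in for a job context and its .JBERR count."""

    def __init__(self, superior=None):
        self.jberr = 0  # the inferior starts with zero in the word
        self.superior = superior


def push(current):
    """Create an inferior context beneath `current`."""
    return Context(superior=current)


def pop(inferior):
    """Delete the inferior, adding its error count into the superior's."""
    inferior.superior.jberr += inferior.jberr
    return inferior.superior
```

A caller can then compare its own .JBERR before and after the CTX. UUO
to tell whether the called program reported errors, without the called
program having to use the CTX. data buffer interface.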
[Keywords]
Contexts
.JBERR
[Related MCOs]
11102
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 473 CTXSER CSRTAB,CTXPOP
704A
[End of MCO 14273]
MCO: 14274 Name: RCB Date: 9-Jan-90:07:26:43
[Symptom]
It's the first full week of a new year, and we're almost out of load numbers.
[Diagnosis]
yes.
[Cure]
Recycle the load numbers but bump the minor version number.
[Keywords]
Version control
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
HOSS attention
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 410 COMMON A00DLN,A00SVN
704B
[End of MCO 14274]
MCO: 14275 Name: DPM Date: 23-Jan-90:05:59:17
[Symptom]
Methods for recording AVAIL statistics are too complex.
[Diagnosis]
DAEMON maintains AVAIL.SYS which is nearly an image of ERROR.SYS.
SPEAR will happily extract AVAIL data from ERROR.SYS if it is told
to do so.
[Cure]
Remove references to the AVAIL bits in the system error block logging
code. SPEAR %2(1152) will default to reading ERROR.SYS for AVAIL
(compute) data. DAEMON %24(1032) will no longer write AVAIL.SYS files.
[Keywords]
AVAIL.SYS
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 411 APRSER MELTBL
COMCON DTCTBL
CPNSER CPATBL,CPDTBL
ERRCON XFRSE2
FILIO CSCBEG
NETSER NOFTBL,NONTBL
S EH.AVL,EH.NER,EX.AVL,EX.NER
SYSINI AVLTBL
704B
[End of MCO 14275]
MCO: 14276 Name: DPM/RCB Date: 23-Jan-90:07:14:18
[Symptom]
Network ERROR.SYS entries written out of sequence.
[Diagnosis]
The DAEMON UUO function to append a record to ERROR.SYS
is handled asynchronously to the queueing of system error blocks.
[Cure]
Teach the monitor to intercept this DAEMON function and put
the data into a system error block.
[Keywords]
ERROR.SYS
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 411 UUOCON CALDAE
704B
[End of MCO 14276]
MCO: 14277 Name: RCB Date: 30-Jan-90:05:32:50
[Symptom]
INITIA fails to identify terminals on dataset lines. In particular, it fails
on a reverse-LAT connection to a KS-10.
[Diagnosis]
When SCNSER is told by the device driver that a local dataset has raised
carrier, it sets the 'blind' flag in the dataset control table (DSCTAB) to
allow for possible junk characters coming while the line state settles. This
is mostly an artifact of history, since it was done for acoustical couplers.
However, it causes any incoming characters in the first 1-2 seconds after
carrier is seen to be ignored. If the baud rate is high enough, and the system
is otherwise lightly loaded, INITIA will have started its escape sequence
handling by then. The first characters of the response will be thrown away,
but not all of them, and INITIA will declare the line type to be unknown.
[Cure]
Refuse to transmit characters to a local dataset until the 'blind' flag
(DSCBLI) is clear. After all, if we're worried about acoustical couplers, we
shouldn't send any data until the handset is in the cradle so that the
characters will appear on the terminal. Anyway, INITIA already waits for the
data to leave the TTY chunks before it starts its timers, so this will work for
keeping INITIA happy.
[Keywords]
Reverse-LAT
Datasets
Terminal type interrogation
[Related MCOs]
None
[Related SPRs]
36235
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 412 SCNSER XMTCHR
704B DZINT DZQADD
[End of MCO 14277]
MCO: 14278 Name: DPM/RCB Date: 30-Jan-90:06:45:30
[Symptom]
Stopcode KNIKSP at system startup.
[Diagnosis]
Our efforts to keep the system uptime and CPU OK words correct
worked well. So well in fact that it caused undeserved KNIKSPs at
system startup. The code to maintain the elapsed time during SYSINI
(and timesharing for that matter) assumes that even though the PIs get
turned off occasionally, we never miss more than one tick. Not so.
[Cure]
Do a RDTIME and compute the number of ticks which have elapsed since
the last interrupt. Use this number to keep APRTIM accurate.
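As a rough sketch of the arithmetic (Python, for illustration only): given a
free-running counter and the value saved at the last interrupt, the number of
whole ticks missed falls out of one division. UNITS_PER_TICK and the function
name are assumptions, as is the choice to carry a partial tick forward.

```python
# Sketch of catching up on missed clock ticks from a free-running counter.
# UNITS_PER_TICK and the function name are assumptions for illustration.

UNITS_PER_TICK = 1000           # hypothetical counter units in one clock tick

def ticks_elapsed(base, now):
    """Return (whole ticks since base, new base).

    The base advances only by whole ticks, so a fraction of a tick is
    carried into the next interrupt instead of being lost.
    """
    ticks = (now - base) // UNITS_PER_TICK
    return ticks, base + ticks * UNITS_PER_TICK

# Interrupts were blocked for a little over three ticks:
ticks, new_base = ticks_elapsed(base=5000, now=8250)
```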
[Keywords]
KNIKSP
APRTIM
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 412 SYSINI ONCINT,ONCIN0,APRCHK
704B
[End of MCO 14278]
MCO: 14280 Name: DPM Date: 6-Feb-90:06:53:58
[Symptom]
1. Stopcode KNIKSP during system initialization (revisited).
2. Stopcode KAF following a series of KNIKSPs under timesharing.
3. LAT boxes drop links to the -10 following a KNIKSP.
[Diagnosis]
1. Like all code which tries to prevent KAFs, KNISER looks at .CPTMF
to determine if we're about to KAF. During system initialization,
this variable is counted up, but DTEs are never serviced. Hence,
it always appears as if a KAF is about to happen and KNISER shuts
down the NIA20. One cannot simply service the DTEs during SYSINI
because much of the time, the clock channel is turned off.
2. While the monitor keeps on running even though many KNIKSPs have
occurred, repeated attempts to type out stopcode text can cause a
KAF. This is an aspect of how DIE types on the CTY.
3. LAT boxes expect host systems to keep in nearly constant contact.
The NIA20 is shut down for one second after a KNIKSP, and this
interval is just over the limit that LAT boxes will allow before
breaking contact. There's no way to tell the LAT box the -10 is
going away but will be back shortly.
[Cure]
1. Have the once-a-tick code service DTEs during system initialization.
Most of the time, no work is done except to reset .CPTMF as primary
protocol is not started until late in SYSINI. But, this is enough
to keep KNISER happy. Also, in the interest of keeping common code
between SYSINI and APRSER/CLOCK1, use the same scheme for maintaining
elapsed time under timesharing. Rely on the meters to compute the
time since the last clock interrupt. This maintains clock accuracy
over times when the PI system is shut off. This adds two words to the
CDB. .CPRTM is a double-word quantity that holds the RDTIME base at
the last interrupt.
2. Only report the first KNIKSP in the interval set in KNIOVC. By
default, this location contains a 5, which means only 1 KNIKSP every
5 seconds will appear on the CTY, but the NIA20 will still be shut
down when .CPTMF is nearing a critical threshold.
3. Reduce the time the NIA20 is shut down from one second to a half second.
This interval is stored in KNIZTM and can be patched on the fly if
a more appropriate value is needed. KNIZTM contains the "sleep" time
in ticks.
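Item 2 of the cure is a simple rate limit. A minimal sketch in Python follows;
the class is hypothetical, and only the default interval of 5 seconds is taken
from the text above.

```python
# Sketch of "report only the first KNIKSP per interval" (cure item 2).
# The class is hypothetical; only the default of 5 seconds comes from the text.

class StopcodeReporter:
    def __init__(self, interval_seconds=5):
        self.interval = interval_seconds
        self.last_report = None     # time of the last stopcode typed out

    def should_report(self, now):
        """True if this stopcode should appear on the CTY."""
        if self.last_report is None or now - self.last_report >= self.interval:
            self.last_report = now
            return True
        return False                # suppressed; the shutdown still happens

reporter = StopcodeReporter()
results = [reporter.should_report(t) for t in (0, 1, 4, 5, 7, 10)]
```

Suppression here affects only the typeout; as the cure notes, the NIA20 is
still shut down whenever .CPTMF nears its threshold.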
[Keywords]
KNIKSP
CLOCKS
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 413 APRSER APRTIM,CLKDDR,DISKAL,TIMIN2,TIMIN4
CLOCK1 APRSU2,APRSU4,INITIC
COMMON .CPRTM,SPRI5A
DTESER STAPPC,STXPP1
KNISER .PBOVF,KNIOVC,KNIPAU,KNIRQ1,KNISEC,KNIZTM
ONCMOD ONCBN1
SYSINI APRCHK,GETOPT,HAVTM5,ONCIN0,SYSIN1,TACINI
704A
[End of MCO 14280]
MCO: 14281 Name: DPM Date: 13-Feb-90:05:52:48
[Symptom]
Stopcode EUE on a KS10 doing MTAPEs.
[Diagnosis]
IRBIVA in the IORB is filled in with either a zero or a callback
address to be processed at interrupt completion. Under extended
addressing monitors, we take care to make sure that the left half
of the address contains no junk (flags) from the MTAPE dispatch
table. We neglect to do the same for KS10 monitors.
[Cure]
Clear junk in left half word.
[Keywords]
MTAPE
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
Single-section monitors only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 414 TAPUUO MTAPG2
704B
[End of MCO 14281]
MCO: 14282 Name: RCB/DPM Date: 13-Feb-90:06:36:04
[Symptom]
UIL stopcode when an SA10 controller with no attached devices thinks it found
a NXM when trying to access its low core logout area.
[Diagnosis]
Code not defensive for this case.
[Cure]
Yes.
[Keywords]
SA10
Channel NXM
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 414 SAXSER SAXINT,SAXIN1
704B
[End of MCO 14282]
MCO: 14283 Name: DPM/RCB Date: 13-Feb-90:06:51:30
[Symptom]
Too hard to look at paging crashes.
[Diagnosis]
No GETTABs available to point to the start of the various
queues and tables.
[Cure]
Add some:
%VMPTB==44,,113 ;ADDRESS OF PAGTAB
%VMPT2==45,,113 ;ADDRESS OF PT2TAB
%VMMTB==46,,113 ;ADDRESS OF MEMTAB
%VMEVM==47,,113 ;AOBJN POINTER TO EVM BITMAP
%VMPTR==50,,113 ;POINTER TO FREE PAGES (PAGPTR)
%VMINQ==51,,113 ;HEADER OF THE "IN" QUEUE
%VMINC==52,,113 ;COUNT OF PAGES IN THE "IN" QUEUE
%VMSNQ==53,,113 ;HEADER OF THE SLOW IN QUEUE
%VMSNC==54,,113 ;COUNT OF PAGES IN THE SLOW "IN" QUEUE
%VMIPQ==55,,113 ;HEADER OF THE IN-PROGRESS PAGING QUEUE
%VMIPC==56,,113 ;COUNT OF PAGES IN THE IN-PROGRESS QUEUE
%VMOUQ==57,,113 ;HEADER OF THE "OUT" PAGING QUEUE
%VMOUC==60,,113 ;COUNT OF PAGES IN THE "OUT" QUEUE
%VMLPT==61,,113 ;HEADER OF THE QUEUE OF LOCKING PAGES
%VMLPC==62,,113 ;NUMBER OF PAGES IN THE LOCK QUEUE
%VMLCT==63,,113 ;NUMBER OF AVAILABLE PAGES ACCOUNTING FOR %VMLPC
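For readers unfamiliar with the notation, each definition above is a pair of
octal halfwords: the item index on the left and the table number on the
right. The Python sketch below shows how such a pair packs into one 36-bit
word; xwd() is a hypothetical helper, not a UUOSYM macro.

```python
# Sketch: how an "index,,table" pair packs into one 36-bit word.
# xwd() is a hypothetical helper; the octal values come from the list above.

def xwd(left, right):
    """Pack two 18-bit halfwords into a 36-bit word."""
    return ((left & 0o777777) << 18) | (right & 0o777777)

VMPTB = xwd(0o44, 0o113)        # %VMPTB==44,,113
VMLCT = xwd(0o63, 0o113)        # %VMLCT==63,,113

# Unpacking recovers the two halves again.
index, table = VMPTB >> 18, VMPTB & 0o777777
```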
[Keywords]
PAGING
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
UUOSYM change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 414 COMMON .GTVM
UUOSYM
704B
[End of MCO 14283]
MCO: 14284 Name: DPM/RCB Date: 13-Feb-90:07:20:41
[Symptom]
DECnet stopcode ROUCGV at system start in routing monitors.
[Diagnosis]
When DECnet initializes, all events are logged by default. NML
hasn't started up yet, so there's no way to suppress logging of anything.
[Cure]
Change the default from logging all events to logging none. This is
consistent with the way VMS works. It does require, however, that customers
make a change to NCP.CMD to selectively enable logging events.
[Keywords]
DECnet events
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
Beware file entry required
[BEWARE text]
Customers wishing to have DECnet events logged must change
their NCP.CMD file to include the appropriate SET LOGGING FILE EVENT
command for the event types they are interested in.
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 414 NTMAN NMXFIL
704B
[End of MCO 14284]
MCO: 14285 Name: DPM Date: 20-Feb-90:08:35:03
[Symptom]
Stopcode KNIKSP. KLNI overloaded with incoming packets.
[Diagnosis]
Networks are getting busier every day.
[Cure]
Turn on hardware multi-cast filtering in the NIA20. Note
that this works only for single CPU monitors. Don't fully
understand the problems with SMP but they are definitely in
DECnet.
[Keywords]
MULTI-CAST
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 415 ETHSER
KNISER
COMMON
704B
[End of MCO 14285]
MCO: 14287 Name: DPM Date: 27-Feb-90:05:22:15
[Symptom]
If a CPU has internal memory but some external channels, the monitor
will happily try to use the external channel.
[Diagnosis]
No one ever bothered to check for this case.
[Cure]
At system startup, have AUTCON make note of the fact that internal
memory is in use. Then, when about to build a channel data block,
check for internal memory. If it is in use, don't build the channel
data block.
Note: Edit 100 to BOOT fixes an identical problem. However, the
monitor is not dependent upon the BOOT change or vice versa.
[Keywords]
EXTERNAL CHANNELS, INTERNAL MEMORY
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 416 AUTCON AUTCHN,AUTCPU,AUTMEM
704B
[End of MCO 14287]
MCO: 14288 Name: DPM Date: 27-Feb-90:05:34:47
[Symptom]
When a continuable stopcode occurs, the time of day is off by upwards
of 30 seconds when the system continues.
[Diagnosis]
Previous to MCO 14xxx, BOOT was called with the PI system turned off.
Therefore, there were no timer interrupts to count and time of day
accuracy was lost. After said MCO, the monitor relied on the meters
to maintain accurate time. But the microcode updates the mega-ticks
in the EPT. When BOOT is called, the EPT is switched and the counters
updated in the wrong place. So, APRTIM can't maintain time of day
accuracy even though it ought to work.
[Cure]
Pick up the incremental RDTIME values left by BOOT in its vector and
adjust .CPRTM appropriately.
[Keywords]
TIME OF DAY
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 416 MONBTS BTCALL
704B
[End of MCO 14288]
MCO: 14289 Name: DPM Date: 6-Mar-90:08:34:44
[Symptom]
Problems when multiple jobs try to change the policy CPU at the
same time.
[Diagnosis]
Prior to changing the policy CPU, all other CPUs are forced to jump
into their ACs. A check is made to be sure this has happened before
proceeding and if not, a call to DELAY1 will be done. DELAY1 causes
the job to be rescheduled which could allow another job to execute
the same code.
[Cure]
Don't call DELAY1. Instead wait with a JRST .-1.
[Keywords]
SET POLICY
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 417 CPNSER SETPCP
704B
[End of MCO 14289]
MCO: 14291 Name: ERS Date: 26-Mar-90:17:54:42
[Symptom]
Add/Remove CPU doesn't. (1 of 3)
[Diagnosis]
ZAPDSK doesn't do a complete job. It does not check for active DRBs.
[Cure]
Check and delete DRBs for the job being zapped.
[Keywords]
ZAPDSK
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 420 FILIO ZAPDSK
[End of MCO 14291]
MCO: 14292 Name: KDO Date: 28-Mar-90:12:15:22
[Symptom]
1. UIL stopcodes
2. Undeserved MOPIFC (INF) stopcodes
3. LLMOP UUO only useful for OPERATOR jobs.
[Diagnosis]
1. Zeroes in a dispatch table.
2. Too many undefined MOP functions.
3. Call to PRVJ instead of PRVBIT.
[Cure]
1. Avoid jumps to location zero.
2. Support more MOP function codes.
3. Call PRVBIT.
[Keywords]
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
Documentation change
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 421 LLMOP RCSFCD,ULLMOP
[End of MCO 14292]
MCO: 14293 Name: KDO Date: 28-Mar-90:12:17:18
[Symptom]
Monitor too big.
[Diagnosis]
LLMOP must be loaded.
[Cure]
Provide an unsupported feature test switch (FTEMOP) to turn off LLMOP.
[Keywords]
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
Documentation change
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 421 MONGEN
COMDEV ULLMOP,LLMINI,LLMMIN,LLMPSI
[End of MCO 14293]
MCO: 14294 Name: ERS Date: 3-Apr-90:08:59:37
[Symptom]
Add/Remove CPU doesn't. (2 of 3)
[Diagnosis]
ZAPDSK must change its mapping to touch other jobs' DDBs. However, if we
start at UUO level we'll come back with the wrong stack. Life is a serious
downer after this.
[Cure]
If we're at UUO level make sure we switch to the NULL job stack
before we change the mappings. Then switch back to the proper stack later.
[Keywords]
ZAPDSK
ADD/REMOVE CPU
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 421 FILIO ZAPDSK
[End of MCO 14294]
MCO: 14295 Name: DPM Date: 10-Apr-90:05:16:04
[Symptom]
If a DECnet node comes online as a non-router, then switches to a
routing node, our Phase IV implementation cannot handle this change
and issues a ROUNAV stopcode. ROUNAV means there is no adjacency
vector for the node which now claims to be a router. This is a
fairly common occurrence these days, as DECnet router boxes behave
in this manner. When this sequence of events occurs, all further
communications with the offending node are impossible. If the node
was the area router, the only recourse is to reload the -10.
[Diagnosis]
When the adjacency block is built, the vector is not filled in because
the node is a non-router and cannot possibly use the vector. Despite
this, core is allocated for the vector (in a perverted sort of way),
but it is never used.
[Cure]
Remember the vector address whenever the adjacency block is built.
If the node is a router, also fill in the working copy of the vector
address. Later, when a node changes its state to a router, if the
adjacency vector pointer is zero, pick up the saved copy of the vector
and use it. For the paranoid, if the copy is zero, then issue the
ROUNAV.
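The saved-copy logic can be sketched as follows. This is Python for
illustration only; every name in it is hypothetical, and the exception merely
stands in for the stopcode.

```python
# Sketch of the saved-vector logic in the cure. All names are hypothetical.

class RouterNoAdjacencyVector(Exception):
    """Stands in for the ROUNAV stopcode."""

class Adjacency:
    def __init__(self, vector, is_router):
        self.saved_vector = vector                      # always remembered
        self.vector = vector if is_router else None     # working copy

    def become_router(self):
        """Called when a node changes its state to a router."""
        if self.vector is None:
            if self.saved_vector is None:
                raise RouterNoAdjacencyVector()
            self.vector = self.saved_vector

adj = Adjacency(vector=[0] * 4, is_router=False)
adj.become_router()             # picks up the saved copy, no stopcode
```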
[Keywords]
ROUNAV
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 422 D36PAR AJRVC
ROUTER RTRBAV,RTRMAJ
704B
[End of MCO 14295]
MCO: 14296 Name: RCB Date: 10-Apr-90:07:43:46
[Symptom]
Half-implemented feature of supporting ISO/Latin-1 in parallel with DEC/MCS
doesn't really work.
[Diagnosis]
No code to support the ISO fallback mappings in the CHTRN. UUO. As a result,
if someone with an 8-bit username logs in on a terminal which supports the ISO
character set, LOGIN will convert the name to junk.
[Cure]
Add the code. This adds bit CH.ISO to the possibilities for the flags halfword
in the UUO. If set, the fallback mapping will be that of ISO Latin Alphabet
number 1 (ISO 8859-1). If clear, DEC/MCS will continue to be used.
[Keywords]
8-bit ASCII
ISO
Latin-1
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
Documentation change
UUOSYM change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 422 SCNSER
704B UUOSYM CHTRN.
[End of MCO 14296]
MCO: 14297 Name: RCB/JAD Date: 10-Apr-90:07:51:45
[Symptom]
The SNOOP. UUO doesn't work when trying to analyze code in non-zero sections.
[Diagnosis]
The code is too stupid to handle NZS references, even though the UUO
argument block likes it fine.
[Cure]
Enlighten the code.
[Keywords]
SNOOP. UUO
Multi-section code
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 422 UUOCON
704B
[End of MCO 14297]
MCO: 14298 Name: RCB Date: 10-Apr-90:07:56:34
[Symptom]
Per-CPU GETTABs for non-existent CPUs can appear to succeed when
they shouldn't.
[Diagnosis]
The extra tables are present in the monitor, but point to zeros.
[Cure]
Fix them to point to NULGTB instead, so that the GETTAB UUO will know
better than to return junk.
[Keywords]
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 422 COMMON
704B
[End of MCO 14298]
MCO: 14299 Name: RCB Date: 10-Apr-90:07:59:14
[Symptom]
Monitor too slow. Commands (at least) can be delayed longer than they
ought to be.
[Diagnosis]
SIMCHK tries to check for a monitor PC at UUODON, but only gets it
right when the UUO's return is in section zero.
[Cure]
Yes.
[Keywords]
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
Extended addressing only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 422 CLOCK1 SIMCHK
704B
[End of MCO 14299]
MCO: 14300 Name: RCB Date: 10-Apr-90:08:04:29
[Symptom]
User-mode diagnostics can flood the monitor with error packets faster
than DAEMON can process them. We run out of freecore.
[Diagnosis]
The diagnostic is trying to be safe by clearing its user-I/O bit whenever it
doesn't need it for a while, and then doing a TRPSET to get it set again. Each
TRPSET UUO makes an error entry.
[Cure]
Add bit UP.TUR (TRPSET UUO reported) to .USBTS, and use it to keep track of
whether we've reported any recent TRPSET UUOs. If we report a TRPSET, we set
this bit. If a TRPSET UUO is processed, and the bit is already set, and the
AC is zero, then we'll skip logging that call. A RESET UUO will clear the bit.
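The rule reads more easily as code. The Python sketch below is illustrative
only: the Job class and the bit value given to UP_TUR are assumptions, and
only the suppression rule itself is taken from the cure.

```python
# Sketch of the UP.TUR suppression rule. The bit value and the Job class are
# hypothetical; only the rule itself is taken from the cure.

UP_TUR = 0o1                    # assumed bit position, for illustration only

class Job:
    def __init__(self):
        self.usbts = 0          # stands in for .USBTS
        self.logged = 0         # error entries made

    def trpset(self, ac):
        if self.usbts & UP_TUR and ac == 0:
            return              # already reported recently; skip logging
        self.logged += 1
        self.usbts |= UP_TUR

    def reset(self):
        self.usbts &= ~UP_TUR   # a RESET UUO clears the bit

job = Job()
for ac in (0, 0, 0):
    job.trpset(ac)
after_three = job.logged        # only the first call made an entry
job.reset()
job.trpset(0)                   # logged again after the RESET
```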
[Keywords]
User-mode diagnostics
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 422 UUOCON TRPSTU
704B S UP.TUR
[End of MCO 14300]
MCO: 14301 Name: RCB Date: 11-Apr-90:16:47:09
[Symptom]
KAF on one CPU in a multi-CPU system can migrate to each of them.
[Diagnosis]
When BECOM0 is trying to send the message to the new CTY to inform the
operator that a new CPU has taken over policy, it violates the rules with
respect to the SCNSER interlock. This results in a nested attempt to obtain
the interlock during BECOM0, which results in another KAF, this time on the new
policy CPU. The problem then repeats until we run out of CPUs.
[Cure]
Don't do typeout at APR PI level. Make a clock queue entry to send the
message at PI 7, in (new) routine BECOM7 in CPNSER.
[Keywords]
Simultaneous KAFs
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
Multi CPU only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 423 CPNSER BECOM0
704B
[End of MCO 14301]
MCO: 14302 Name: RCB Date: 12-Apr-90:02:29:58
[Symptom]
MCO 14296 incomplete.
[Diagnosis]
Some lurking (old) bugs were made easier to exercise. Bad conversions happen
for certain characters under the right(?) combinations of bit selections.
[Cure]
Always update the character attribute bits we're checking after we've changed
our notion of the current character.
[Keywords]
8-bit ASCII
CHTRN. UUO
[Related MCOs]
14296
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 423 SCNSER CHTRN8,CHTRN2
704B
[End of MCO 14302]
MCO: 14303 Name: ERS Date: 17-Apr-90:05:58:23
[Symptom]
Job hung.
[Diagnosis]
The job in question is the current job on a CPU that fails. After
the CPU fails the job may still be the "current job" on that CPU.
[Cure]
When a CPU dies gracefully, have it submit a request to have its current
job informed of its unfortunate state and left in a more friendly state.
[Keywords]
Hung job
CPU stopcodes
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
Multi CPU only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 433 ERRCON ZAPZAP
[End of MCO 14303]
MCO: 14305 Name: RCB Date: 17-Apr-90:08:44:31
[Symptom]
Stopcodes PFL and IME seen, others possible.
[Diagnosis]
When doing a MERGE or GETSEG in a core image where section 0 is full, we find
a page in section 0 to 'hide' for a while so that the save/get code can have
its directory page. Later, we're supposed to put it back the way we found it.
However, the MERGE and GETSEG cases clobber the register in which its location
is remembered. This causes us to use a bogus number later as a disk address
or as a physical page number. In any case, if an error is encountered, we will
simply lose the page forever.
[Cure]
Add a word to the UPT, .USSDP, and use it instead of P4 when trying to
restore the user's saved page. Clear it when we're done with it, just for the
sake of paranoia. Make GTSAVP a little more robust, and then call it from
SGRELE if it looks necessary.
[Keywords]
PFL
IME
KAF
corrupted core image
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 423 S .UPPAT
704B COMCON SAVFIN,RELDR6,RELDR1,SAV3,GTSAVP,SEGRLX
SEGCON GETFI6
[End of MCO 14305]
MCO: 14306 Name: DPM Date: 3-Jul-90:08:33:24
[Symptom]
New: Define a new HOM block bit (HOMHWP==1B34) which indicates the
structure must be hardware write protected before it can be mounted.
This is most useful for archive disks. Add a new question to the
CHANGE STRUCTURE command:
Always mount structure hardware write protected (NO,YES)
The default answer is the current setting, normally NO. PULSAR version
5(544) or later respects this bit.
[Diagnosis]
[Cure]
[Keywords]
ARCHIVE,HOMHWP
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 424 COMMOD HOMHWP,STRHWP
ONCMOD TRYHO4
704B
[End of MCO 14306]
MCO: 14307 Name: RCB/LWS Date: 10-Jul-90:08:38:57
[Symptom]
Jobs using MDA-controlled tapes hang in event wait for the labeller process.
Others possible.
[Diagnosis]
Certain UUOs, especially IPCFM., sometimes call GETWRD or a similar routine
with a JCH in J. While the GETWRD series of routines has code to accept this,
the error path and its subsequent call to MONPFH fail in this case. The
end result is that a UUO will fail with an undeserved addressing error. In the
particular case reported from the field, PULSAR got an error in an IPCFM. UUO
to read the system PID index for a given PID. This caused PULSAR to ignore the
labeller request from [SYSTEM]IPCC since PULSAR believed that the IPCF message
was not from a trusted process. Thus, the job hung waiting for PULSAR.
[Cure]
Fix the GETWRD routines to handle JCHs in J during page faults. Don't
invoke the PFH with anything but a job number in J.
[Keywords]
JCH in GETWRD
MONPFH
[Related MCOs]
None
[Related SPRs]
36293
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 425 DATMAN GETWRY,PUTWRZ
704B MONPFH PFHGWF
[End of MCO 14307]
MCO: 14308 Name: RCB Date: 10-Jul-90:10:38:28
[Symptom]
Stopcode NPJ while migrating swapping space.
[Diagnosis]
While scanning jobs to try to remove all pages from the unit being taken down,
CHKMIG scans from job number 1 up through HGHJOB. If there is a gap in the
list of assigned job numbers, and if the last job before that gap was in core
(swapped in) but virtual, then PFHMIG will be called for the unassigned job
number.
[Cure]
Don't try to migrate nonexistent jobs. Test JNA in JBTSTS for each job we
start to process at CHKMI1.
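A sketch of the scan with the added test, in Python for illustration. The bit
value given to JNA and the table contents are made up; only the idea of
skipping job numbers without JNA set comes from the cure.

```python
# Sketch of the scan with the added JNA test. The JNA value and the table
# contents are made up for illustration.

JNA = 0o1                       # assumed "job number assigned" bit

def jobs_to_migrate(jbtsts, hghjob):
    """Yield job numbers 1..hghjob whose status word has JNA set."""
    for job in range(1, hghjob + 1):
        if jbtsts[job] & JNA:
            yield job

jbtsts = [0, JNA, JNA, 0, JNA]  # job 3 is a gap in the assigned numbers
selected = list(jobs_to_migrate(jbtsts, hghjob=4))
```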
[Keywords]
Migration
Swapper
PFN
[Related MCOs]
None
[Related SPRs]
36298
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 425 SCHED1 CHKMI1
704B
[End of MCO 14308]
MCO: 14309 Name: ERS Date: 4-Sep-90:17:15:13
[Symptom]
APRENB will sometimes give incorrect results.
[Diagnosis]
If the user tries to write to the high-seg the PC returned in .JBTPC is
incorrect. In this case we have stomped on T1 during our call to USRFLT.
[Cure]
Restore the double word PC after the call to USRFLT.
[Keywords]
APRENB
USRFLT
Double-word PC
[Related MCOs]
None
[Related SPRs]
36302
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 426 APRSER SUILM1
[End of MCO 14309]
MCO: 14310 Name: DPM/RCB Date: 28-Sep-90:04:29:56
[Symptom]
There is the possibility of data corruption or physical disk damage on
RA80s and RA81s when the heads are not moved for some "long" period of
time.
[Diagnosis]
It seems that tiny foreign particles inside the HDA can become
magnetized over time. If the heads are stationary long enough those
magnetized particles can become attached to the surface of a disk and
may actually write the disk. It is not clear if this can result in
permanent damage to the HDA or simply destroy the data that was on it.
[Cure]
Read a random block every 20 minutes. This change utilizes the
already existing code to read HOM blocks, except that a random block
number is substituted. FILIO will ensure that each read is not too
"close" to the previous. If the newly selected random block is within
2% of the size of the disk (or 16704 blocks for an RA81), then another
block is picked. The time interval is skewed by 1 second across
drives to minimize I/O collisions.
The UDB contains 2 new words: UNIRBT contains the initial time
interval in the left half word and the running timer in the right half
word. The granularity is in seconds. UNIRBN contains the last
selected random block number.
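The selection rule and the UNIRBT layout can be sketched as below. This is
Python for illustration, not FILIO code: both functions are hypothetical, the
disk size in the example is arbitrary, and treating "within 2%" as a distance
strictly less than 2% of the disk is an assumption.

```python
import random

# Sketch of choosing a block not too "close" to the previous one. The
# functions are hypothetical; the 2% rule comes from the cure text.

def pick_block(disk_blocks, previous, rng=random):
    """Pick a random block at least 2% of the disk away from the previous."""
    minimum_distance = disk_blocks // 50
    while True:
        block = rng.randrange(disk_blocks)
        if previous is None or abs(block - previous) >= minimum_distance:
            return block

def pack_unirbt(interval_seconds, timer_seconds):
    """Initial interval in the left halfword, running timer in the right."""
    return ((interval_seconds & 0o777777) << 18) | (timer_seconds & 0o777777)

rng = random.Random(0)
block = pick_block(disk_blocks=1000, previous=500, rng=rng)
word = pack_unirbt(20 * 60, 20 * 60)    # 20 minutes, in seconds
```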
[Keywords]
RA DISKS
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
Field service attention
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 426 DEVPRM UNIRBT,UNIRBN
FILIO CHKRRB,FINRRB,SETID3
RAXKON RAXSEC,RAXUP6
704B
[End of MCO 14310]
MCO: 14311 Name: DPM Date: 28-Sep-90:04:52:47
[Symptom]
Stopcode IME when some DX20 tape errors are encountered.
[Diagnosis]
If a microprocessor error occurs the code to read and log the DX20
registers for error analysis always assumes there must have been
some outstanding I/O request. It loads T1 up with a "saved" IORB
address to check if the request was for data or positioning. If
no IORB existed, then T1 contains a zero and the subsequent tests
generate an illegal indirect page fail. If there is a microprocessor
error, the likelihood of it happening without an IORB is pretty
high.
[Cure]
If there is not a valid IORB, don't check for a data request.
[Keywords]
DX20 TAPE ERRORS
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 426 TD2KON RDIRE1
704B
[End of MCO 14311]
MCO: 14312 Name: KBY Date: 1-Oct-90:07:34:23
[Symptom]
Stopcode KAF before MCO 14252 (still possible but less likely); monitors
with lots of big (are there any other kind?) virtual jobs run way too slow.
The OUT queue in most monitors is exceptionally long, which in itself isn't
all that bad, except most of the pages can never be reclaimed.
[Diagnosis]
It turns out there are actually two problems. The first is that
when a job deletes a page which is on the OUT queue, the page doesn't go to
the free core list then; we just zero the MEMTAB entry so the page can't be
found again (locally speaking, this is faster, easier, and less complicated
than returning the page, particularly when the code was implemented). Although
this is OK for small numbers of pages, a large (is there any other kind?)
virtual job which CORE 0's can leave a lot of pages stranded there this way.
It is true we will reclaim the pages if we need memory, but if we don't
(because the customer is just paranoid in setting low physical limits and
never believed what we told him about VM), those virtual jobs which remain
have to wade through a particularly long OUT queue for any page with a
disk address.
The second problem is that if we "page out" a page which was write-locked
by the monitor and already on the swapping space, we move the page to the
OUT queue and write the disk address in the map, but zero the MEMTAB entry
so that we can never find the page on the OUT queue anyway. This is the
biggest generator of "useless" pages on the OUT queue on 1026.
In a previous dump from Rohm, there were approximately 4400 pages on the
OUT queue, only 230 of which were reclaimable (the dump was a "legitimate"
KAF wading through the queue a number of times by all appearances). On
1026, it's more like 1300 pages without this MCO vs. ~150 with.
[Cure]
1. Return pages to the free list when deleting them from the OUT queue.
This was hard before LKPSF, but now the hard work is already done by locating
the page on the OUT queue in the first place.
2. Set MT.JOB and P2.VPN (the latter for the courtesy of SPY programs)
when "paging out" a write-locked page.
[Keywords]
KAF
CORE 0
[Related MCOs]
14252
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 426 VMSER PAGOM4,RMVPG1
704A VMSER PAGOM4,RMVPG1
[End of MCO 14312]
MCO: 14313 Name: RCB Date: 1-Oct-90:15:03:18
[Symptom]
PSIs never taken for TTYs connected to MPX channels. This used to work.
[Diagnosis]
Broken as a side effect of the change to make image mode on TTYs under MPX
work. Overzealous clearing of old variables clobbered the up pointer from the
TTY DDB to the MPX DDB.
[Cure]
Yes.
[Keywords]
MPX
TTY PSIs
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 427 SCNSER TTYMPX,RECIN6
MPXSER
704B
[End of MCO 14313]
MCO: 14314 Name: RCB Date: 2-Oct-90:05:58:07
[Symptom]
SEGOP. throws away hisegs when it shouldn't and causes IMEs.
[Diagnosis]
Clearing the "what are we doing" bits in SGAEND too soon, and
calling PUTWRD with a hiseg number in J while depositing to a hiseg.
[Cure]
Yes. This makes GETWRD/PUTWRD even more robust with respect to having
trash in J than they were before. Now, J only has to be valid as a JCH when
the UPMP (.USJOB) hasn't been setup yet.
[Keywords]
SEGOP. .SGGET
Losing hisegs
ERFNF%
IME
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 427 SEGCON GETH5A,INPSE2,INPMG2
704B COMCON SOPGE6
DATMAN GETWRA,GETWRY,PUTWRA,PUTWRZ
[End of MCO 14314]
MCO: 14315 Name: DPM Date: 5-Oct-90:06:09:15
[Symptom]
Undefined globals attempting to link a monitor without DECnet.
[Diagnosis]
Oversight.
[Cure]
Add dummy definitions for .SAVn and DDIINE to COMDEV, and conditionals
around references to .CPTPN in COMMON.
[Keywords]
DECNET
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 430 COMDEV .SAVN,DDIINE
704B
[End of MCO 14315]
MCO: 14316 Name: RCB Date: 9-Oct-90:04:34:47
[Symptom]
Crashes not getting copied and DAEMON information not getting logged at
system startup.
[Diagnosis]
EXPJOB fails, blowing away the CRSCPY command. It carefully checks for
having no outstanding command on FRCLIN and its MIC interlock being free, but
it does not notice that there's a job (INITIA) already running on the line.
The forced .EXEC loses with '?Please type ^C first' which clears the type-ahead
for the CRSCPY command.
[Cure]
Add a call to COMQ before deciding to try to force the .EXEC command.
[Keywords]
CRSCPY
AVAIL
SPEAR
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 430 COMMON DAEJOB
CLOCK1 STDAEM,EXPJOB
[End of MCO 14316]
MCO: 14317 Name: KBY Date: 13-Oct-90:12:19:50
[Symptom]
Multiple guess:
The infamous PQW bug is:
(a) A rare species; very difficult to find
(b) Of world-wide distribution, but most common in Germany
(c) Finally extinct
(d) All of the above
Other scenarios are possible, but PQW is the most common. Virtual page numbers
get sprayed into the P2.VPN field of certain pages, specifically those whose
physical page number lies between JOBMAX+M.CPU and JBTMAX+M.CPU (i.e. page
numbers that correspond to high segment numbers expressed as SPT offsets).
Potentially causes problems whenever a new high segment is initialized, as the
spray always happens then, but usually only causes problems when the specified
pages are on the IN or SN queues at the time of initialization.
[Diagnosis]
When initializing the new high seg in the user's address space, NREMAP
eventually gets called to move the pointers to the high seg map and convert
the read-in pages into indirect pointers to the secondary map, and also to
move the pointers to the correct high segment origin. MV1PG is called for
the latter, and part of the code it calls (primarily for the usage of the
MOVPGS code) checks to see if the page being moved has PM.OIQ on; if so
it changes the P2.VPN field to reflect the new virtual page the page
is being moved to. The problem is, by this time the map is supposed
to contain indirect pointers, where the PM.OIQ bit isn't valid any more (it
overlaps into the secondary map offset field in the secondary pointer).
Thus, if the address offset into the secondary map has the PM.OIQ bit on
as part of that offset (=40), the physical page corresponding to the
high seg number + M.CPU (=the SPT offset) gets sprayed with each of the
virtual page numbers of the high seg being initialized that have the PM.OIQ
bit on in the offset, ending with the last one. This means that low numbered
pages are affected and tend to get sprayed with relatively high virtual
page numbers. Things go downhill from there.
An additional problem is that NREMAP doesn't actually set up the
indirect pointers correctly anyway; these presumably get fixed up later
by some call to REDOMP before the user ever tries to use them.
[Cure]
Make NREMAP set the indirect pointers up correctly. Make MVPMT return
if the pointer isn't a direct (PM.DCD) type pointer.
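The bit overlap described in the diagnosis can be illustrated with a small
model. The Python sketch below is not monitor code: the pointer
representation, the function names, the page counts, and the target page
number are all hypothetical, and the value 40 (octal) for the position of
PM.OIQ within the offset is taken from the "(=40)" in the diagnosis. It only
shows why testing PM.OIQ on an indirect pointer overwrites one P2.VPN entry
repeatedly, and why returning early for non-direct pointers avoids that.

```python
# Illustrative model only; bit positions and names are assumptions,
# not the monitor's real definitions.
PM_OIQ = 0o40                    # assumed bit that overlaps the offset field
DIRECT, INDIRECT = "direct", "indirect"

def move_pointer_buggy(ptr, new_vpn, p2_vpn, target_page):
    """Test PM.OIQ without checking the pointer type first."""
    kind, low_bits = ptr
    if low_bits & PM_OIQ:        # for indirect pointers this is offset data
        p2_vpn[target_page] = new_vpn

def move_pointer_fixed(ptr, new_vpn, p2_vpn, target_page):
    """Return at once unless the pointer is a direct pointer."""
    kind, low_bits = ptr
    if kind != DIRECT:
        return
    if low_bits & PM_OIQ:
        p2_vpn[target_page] = new_vpn

def simulate(mover, n_pages=0o100, origin_vpn=0o400, target_page=7):
    """Move n_pages indirect pointers whose low bits hold the map offset."""
    p2_vpn = {}
    for offset in range(n_pages):
        mover((INDIRECT, offset), origin_vpn + offset, p2_vpn, target_page)
    return p2_vpn

buggy = simulate(move_pointer_buggy)
fixed = simulate(move_pointer_fixed)
```

In this model the buggy mover leaves the target page holding the last virtual
page number whose offset has the 40 bit on, while the fixed mover leaves
P2.VPN untouched.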
[Keywords]
PQW
P2.VPN spray
[Related MCOs]
None
[Related SPRs]
36285
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 430 VMSER MVPMT,NREMA6
704A
[End of MCO 14317]
MCO: 14324 Name: KBY/RCB Date: 7-Nov-90:09:47:05
[Symptom]
KAF uncovered on KS (430 doesn't).
[Diagnosis]
Losing DRP flag in GVFWDS from call to get a large (multi-page) block of
funny space.
[Cure]
Change some SPUSH/SPOP macros to PUSH/POP instructions to preserve the flag
even on the KS.
[Keywords]
KAF
KS10
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
KS10 only
Single-section monitors only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 431 VMSER GVFWD3,GVFWD4
[End of MCO 14324]
MCO: 14325 Name: DPM/RCB Date: 7-Nov-90:09:47:05
[Symptom]
Time is short.
[Diagnosis]
The end is near.
[Cure]
It's time to fade away.
The members of the TOPS-10 group, their management, and their close relatives
would like to thank you for your support over the past twenty-six years. It
has been a pleasure working with you. We hope you have fond memories of your
association with us and your DECsystem-10.
Ann Barr
Spider Boardman
Dave Braithwaite
Joann Creely
Tony Dziedzic
Dave Eklund
Linda Feldeisen
Jim Flemming
John Francini
Bob Frohreich
Ruth Fong
Tim Litt
Don Mastrovito
Kevin O'Kelley
Julie Pratt
Christine Quiriy
Ned Santee
Larry Sendlosky
Kimo Yap
[Keywords]
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 431 COMMON A00DLN
[End of MCO 14325]
MCO: 14327 Name: RCB/ERS Date: 29-Nov-90:11:34:23
[Symptom]
STOPCDs IME, IBZ, UIL during or shortly after system startup.
Uninterruptible loops at PC 0 also seen.
[Diagnosis]
Bugs in FLPUDB/DETUDB, and in LLMOP. LLMOP was trying to modify data
in section three while running in section zero, and FLPUDB was doing a POPJ with
U still pushed on the stack, due to an alternate entry point for ONCMOD for
ONCBND's use.
The latter has been observed to lead to a UIL, and could account for the PC 0
looping (as an MUUO, if the MUUO trap location got zeroed as a side-effect of
this bug).
[Cure]
SE1ENT and stack re-phasing as appropriate.
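The stack problem in the diagnosis (a POPJ executed while U is still pushed)
can be modelled in a few lines. This Python sketch is only an illustration
under stated assumptions: the list stands in for the stack, and the return
address and saved value of U are made-up numbers. It shows that an unbalanced
return takes the saved register value as its return address.

```python
RETURN_PC = 0o1234      # hypothetical return address pushed by the call
SAVED_U = 0o777000      # hypothetical value of U saved on the stack

def return_address(balanced):
    """Return the address a POPJ-style return would jump to."""
    stack = []
    stack.append(RETURN_PC)   # the call pushes the return address
    stack.append(SAVED_U)     # the routine saves U on the stack
    if balanced:
        stack.pop()           # U is restored before returning
    return stack.pop()        # the return pops whatever is on top

good = return_address(balanced=True)
bad = return_address(balanced=False)
```

With the stack balanced the return goes to the caller; without the restore,
the saved U is used as the return address.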
[Keywords]
IME
UIL
IBZ
PC 0 UUO loop
System startup
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
Field service attention
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 432 ONCMOD ONCBND
FILIO DETUDB
LLMOP LLMINI
[End of MCO 14327]
MCO: 14328 Name: RCB Date: 29-Nov-90:17:19:36
[Symptom]
FRCLIN INITIA hangs in NA state. System startup never completes.
[Diagnosis]
TTYDET does not wake up COMCON when it should.
[Cure]
Add a check for break characters into CNCMOD, and call COMSET if it
looks appropriate.
[Keywords]
Hung startup
FRCLIN INITIA
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
705 432 SCNSER CNCMOD
[End of MCO 14328]