7,6/ap017/mon703.d17
MCO: 13284 Name: DPM/TL Date: 23-Feb-87:02:33:56
[Symptom]
DECsystem-10 NOT RUNNING when TGHA runs.
[Diagnosis]
DTESER doesn't know that -20F has a protocol pause mode where it will
leave the -10 alone for about 30 seconds while it enters secondary
protocol. Doing this part is easy, however, a few other buggers were
encountered:
1. Stopcode IME referencing .PDDIA in the PDB.
2. Stopcode IME when stuffing pointers into maps using IDPB instructions.
3. Stopcode IME putting -20F into protocol pause. GETETD can't run in
section one like MOSSER does.
4. -20F enters secondary protocol upon exiting protocol pause.
5. KLPSER will shut down KLIPAs that aren't running when memory is set offline.
It will try to turn on the KLIPAs when memory is set online even if they
weren't on to begin with.
6. Ditto for KNISER.
[Cure]
1. MOSSER sets up W using a MOVE W,JBTPDB##(J). Call FNDPDS like it should.
2. GTPME hasn't returned a byte pointer for quite some time. Do a MOVEM and
an AOS instead of an IDPB.
3. Cause DTEAPP to run in section zero with the rest of the DTE code.
4. Clear out the last command -20F saw before entering protocol pause.
5. Respect IPAMSK.
6. Ditto for KNISER.
[Comments]
[Keywords]
TGHA
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 344 COMMON .CPEPW
DTESER DTEAPP,SETPP
MOSSER DIAGT1,DIAGVM
KLPSER PPDMFL,PPDMON
KNISER KNIMOF,KNIMON
703A
[End of MCO 13284]
MCO: 13286 Name: JJF Date: 24-Feb-87:11:15:49
[Symptom]
No way for customers to enter their own specially-defined PIDs
in the system PID table.
[Diagnosis]
Make room.
[Cure]
Add two negative entries to the system special PID table in COMMON.
[Comments]
For EWS, ISWS, and NETSPL
[Keywords]
SPECIAL PIDS
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Retracted
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 345 COMMON .GTCSD
703A
[End of MCO 13286]
MCO: 13289 Name: JAD Date: 26-Feb-87:12:39:36
[Symptom]
Potential software checksum errors on files which span more than
one unit in a structure.
[Diagnosis]
MCO 11937 fixed problems with returning the "DA" resource on the
"wrong" (original) unit when a file switched to a new unit. However,
DEVUNI contains the original unit rather than the new unit, causing
the checksum to be computed incorrectly.
[Cure]
Call STORU in OUTGRP after calling WRTPTR.
[Comments]
Simultaneous update on a multi-unit structure still has several
holes, primarily due to the fact we own the DA on unit "A", call
TAKCHK/TAKBLK, and believe owning the DA on unit "A" will prevent
the job from looking at unit "B" once we've stored the unit change
pointer. Tain't necessarily so.
Probably requires a per-file allocation resource (a few bits in the
NMB?) to prevent this type of race condition. One of the guys at
ADP threatened (promised?) to look at it and get back to me if he
came up with a reasonable solution.
[Keywords]
SIMULTANEOUS UPDATE
MULTI-UNIT STRUCTURES
[Related MCOs]
11937
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 345 FILIO OUTGRP
703A
[End of MCO 13289]
MCO: 13290 Name: TL Date: 27-Feb-87:09:10:10
[Symptom]
1. "Modems" don't work on the KS.
2. KS Autobaud code is different from KL/ANF.
a. It supports fewer speeds.
b. It's more susceptible to noisy lines, and defective modems.
3. ANF autobaud sometimes fails at 1200 (^C, even)
4. ANF autobaud at 600 is hopelessly unreliable.
[Diagnosis]
1. They aren't real modems, and they don't obey the EIA spec.
a. Devices which raise (or drop) RLSD and RI simultaneously lose.
b. Devices which pulse RI only once, instead of at 20 Hz like
the Bell system lose.
2. Once it did more than the others. Now, it's older and does less.
3. The "Ignore" next character bit isn't set for ^C even 1200 in
high-speed mode. This can cause a low-speed switch, which makes
it even less likely that we will recognize the ^C next time.
4. The original empirical studies made it look like it might work;
analysis shows that we are asking the UART to sample a bit while it
is in transition. Since it's the start bit, it's quite hopeless.
[Cure]
1. Violate yet more of the EIA spec. DEC's way is better.
2. Replace the autobaud code with the algorithm used by ANF-10 and RSX-20F.
Search at two speeds, and support all the same speeds as the others.
3. Set the bit.
4. Remove 600 baud from the tables in ANF.
[Comments]
Makes reverse LAT connections work. Simplifies the SPD.
SPD revisions: The SPD references the original KS behavior (which changed
long ago). Remove the KS-specific references, and merge with the KL/ANF
description. Add support for 1800 and 2400 bps. Footnote 4800 bps
to show that it is detected only with ^C, not CR.
This MCO causes ANF, the KS, and -20F to all behave the same way--years
late.
Thanks to Paul Mead and Brian Lilja of the CSSC/CS for testing this code
and assisting with remote debugging. KS4097 STILL doesn't have the
facilities for this, though the request has been pending since 14-Apr-86!
[Keywords]
VAXination
KS
Autobaud
LAT
Modem Control
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
703A DZINT LOTS,AND,LOTS
704 DNTTY LSPTAB,HSPTAB
[End of MCO 13290]
MCO: 13291 Name: RJF Date: 27-Feb-87:13:33:41
[Symptom]
Checkpointing a file using the FILOP. update-rib function can
cause a user specified blocking-factor to be messed up.
A FILOP. update-rib doesn't work correctly if the current buffer is
exactly full.
[Diagnosis]
The FILOP. code assumes that its implied "close" logic was
required to output the last buffer. It then reads the last block of
the file back into the current output buffer so the user may append to
it. A program that has done an OUT before the FILOP. update-rib doesn't
want this to happen. The OUT was done to advance to the next block of
the file.
The second problem happens if a FILOP. update-rib is done when the
current buffer is exactly full. A virgin buffer ring header is returned
that can cause the user program all kinds of problems.
[Cure]
For the first problem, only read the last block of a file in if
the FILOP. update-rib function caused it to be written.
For the second problem, always call OUTF to unvirginize the ring header,
even if we don't have to merge any data into the current output buffer.
[Comments]
I think the only thing that does this stuff is COBOL.
[Keywords]
FILOP.
UPDATE-RIB
CHECKPOINTING
.FOURB
[Related MCOs]
None
[Related SPRs]
35229
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 342 S
703A UUOCON
FILUUO
[End of MCO 13291]
MCO: 13292 Name: JAD Date: 2-Mar-87:12:20:40
[Symptom]
MCO 13277 got lost
[Diagnosis]
Don't ask me, guess that's why we're still seeing IMEs and etc.
when SABs start on a page boundary.
[Cure]
Try again.
[Comments]
I'm concerned. We seem to have lost a number of MCOs over the past
few weeks. Is someone getting careless, or is there a bug in the
procedure?
[Keywords]
CORGRS
[Related MCOs]
13277
[Related SPRs]
None
[MCO status]
Restricted distribution
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 345 ONCMOD CORGRS
703A
[End of MCO 13292]
MCO: 13299 Name: JJF Date: 5-Mar-87:13:39:03
[Symptom]
No way to easily patch in a customer system-wide PID.
[Diagnosis]
No negative entries in system PID table.
[Cure]
Add, under FTPATT, two negative words in the system PID table.
[Comments]
13286 gotten right. This allows the stupid Galaxy 2 QUASAR needed for
NETSPL to work.
[Keywords]
Customer PIDs
[Related MCOs]
13286
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 346 COMMON .GTCSD
703A
[End of MCO 13299]
MCO: 13300 Name: JAD Date: 5-Mar-87:14:17:01
[Symptom]
Probable KAF stopcode during illegal UUO (or any call to GIVRES).
[Diagnosis]
GIVRES calls DTXFRE which thinks it is being called with a DDB address
in F. Wrong. F contains zero at this point, so a LDB J,PJOBN gets the
contents of DEVJOB (= location 22 = BOOTWD). Eventually we'll call
SRFREE which calls OWNIP which loops waiting for the CX resource.
Even if J didn't contain garbage, we PUSH the ACs on the stack in the
reverse order of which SRFRDT expects.
[Cure]
Remove the LDB J,PJOBN from DTXFRE, and add a PUSH/POP of T1 so the
stack is phased as SRFRDT expects.
[Comments]
This is the cause of the KAFs we've seen under the A/P monitor.
Did ANYONE test this code?
[Keywords]
DTXFRE
KAF
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 346 DTASER DTXFRE
703A
[End of MCO 13300]
MCO: 13301 Name: JAD Date: 5-Mar-87:16:13:16
[Symptom]
Stopcode UIL, EUE, etc.
[Diagnosis]
DEFINEing breakpoints and REMOVEing them without INSERTing them
causes the replaced monitor instruction to get zeroed since it was
never really replaced (SNPRMI contains zero).
[Cure]
Call CKBINS at REMBPS and don't try to remove uninserted breakpoints.
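The failure mode can be sketched as follows. This is a Python model, not monitor code: the Breakpoint class, the memory dictionary, the inserted flag, and the BREAK_OPCODE value are all hypothetical stand-ins. Only the idea of refusing to restore a saved instruction that was never captured comes from the cure above.

```python
from dataclasses import dataclass

BREAK_OPCODE = 0o777  # hypothetical placeholder for the planted breakpoint word


@dataclass
class Breakpoint:
    address: int
    saved_instruction: int = 0   # stays zero until the breakpoint is inserted
    inserted: bool = False


def insert_breakpoint(memory, bp):
    """Save the original instruction, then plant the breakpoint."""
    bp.saved_instruction = memory[bp.address]
    memory[bp.address] = BREAK_OPCODE
    bp.inserted = True


def remove_breakpoint(memory, bp):
    """Restore the original instruction only if one was actually saved."""
    if not bp.inserted:
        return  # defined but never inserted: nothing to put back
    memory[bp.address] = bp.saved_instruction
    bp.inserted = False


memory = {0o1000: 0o254000001234}   # arbitrary instruction word
bp = Breakpoint(address=0o1000)     # DEFINE without INSERT
remove_breakpoint(memory, bp)       # REMOVE leaves memory untouched
```

Without the inserted check, the remove step would copy the zero in saved_instruction over the live instruction, which is the behavior the diagnosis describes.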
[Comments]
SNOOP. is a good way to crash the system, but this seems a little
excessive.
[Keywords]
SNOOP. UUO
[Related MCOs]
None
[Related SPRs]
35710
[MCO status]
Restricted distribution
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 346 UUOCON REMBPS
703A
[End of MCO 13301]
MCO: 13308 Name: JAD Date: 10-Mar-87:12:10:38
[Symptom]
Trying to do I/O to a file located on a structure in the job's search
list AFTER a structure which has a RIB error for the directory
fails with bizarre error codes. Specifying the structure name
explicitly or removing the structure with the RIB error for the
directory from the search list gets around the problem.
[Diagnosis]
STRLUP tries to read the UFD RIB to see if the file desired is
contained in that directory. If the read of the RIB fails, I/O
error bits are left over in S. Eventually they get stored in
DEVIOS (after the file is found on some subsequent structure)
which will cause the first I/O operation on that file to fail.
[Cure]
Clear IOIMPM/IODTER/IODERR at SCNSTR before branching back to
STRLUP to try the next structure in the search list.
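A minimal sketch of the stale-status problem, written in Python rather than MACRO. The bit names are taken from the cure above, but the numeric values, the dictionary layout of a structure, and the structure names are assumptions made for illustration.

```python
IOIMPM = 0o400000   # illustrative bit values; only the names come from the MCO
IODERR = 0o200000
IODTER = 0o100000
ERROR_BITS = IOIMPM | IODERR | IODTER


def lookup(search_list, filename, clear_between=True):
    """Walk the search list; return (structure name, status) for the first hit."""
    status = 0
    for structure in search_list:
        if structure["rib_error"]:
            status |= IODERR        # the failed UFD RIB read leaves an error bit
        elif filename in structure["files"]:
            return structure["name"], status
        if clear_between:
            status &= ~ERROR_BITS   # the fix: drop stale errors before the next try
    return None, status


search_list = [
    {"name": "DSKB", "rib_error": True, "files": set()},
    {"name": "DSKC", "rib_error": False, "files": {"FOO.TXT"}},
]
```

With clear_between=False the file is still found on the second structure, but the returned status carries the error bit from the first one, so the first I/O on the file would appear to fail.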
[Comments]
A really weird bug Eklund found while DSKB was falling apart
(I guess the HDA is 3 months old now, eh? Time for more Elmer's
glue to hold the heads on.).
We actually SELL these disks?
[Keywords]
RIB ERRORS
LOOKUP
DSK
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 347 FILFND SCNSTR
703A
[End of MCO 13308]
MCO: 13309 Name: JAD Date: 10-Mar-87:12:32:35
[Symptom]
Allocating more blocks to a file doesn't always use the expected
unit (i.e., the unit with the most free blocks) of a multi-unit
structure.
[Diagnosis]
T2 trashed before comparing it with UNITAL.
[Cure]
MOVE T2,(P) before CAMG.
[Comments]
[Keywords]
ALLOCATION
UNITAL
[Related MCOs]
None
[Related SPRs]
35711
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 347 FILUUO UPDA1A
703A
[End of MCO 13309]
MCO: 13318 Name: RDH/RCB Date: 17-Mar-87:02:56:59
[Symptom]
Problems with DECnet and DDPs:
1) DDPs don't always notice that the -10 has crashed, thus wedging
themselves.
2) DDPs can't be configured to start up in a useful state.
3) DECnet can't handle DDPs that are in a useful state at system
startup time.
4) DECnet sometimes wedges circuits when processing strange state
transitions.
[Diagnosis]
1) No code to detect dead -10.
2) No configuration option.
3) No code.
4) Not recycling the circuit fully enough.
[Cure]
1) Add code.
2) Add LnDDP to set line block 'n' to be a DDP ab initio. (.P11)
3) Add code.
4) Cycle the circuit.
[Comments]
SCCed. This is the result of RDH's visit to CCCC in Sweden.
[Keywords]
DECnet
DDP
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
Restricted distribution
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 347 DNDCMP DDPSER,DDPSI0,DDPOS0,DDPOS1,DDPIS0
703A DNLBLK LB.ST2
NETDEV NDDPCI,DDPKD7,DDPKD8
DNADLL DDIINC,DDIINE
ROUTER RTRCC1,RTRCC2,RTCINI
[End of MCO 13318]
MCO: 13347 Name: RCB Date: 4-Apr-87:00:39:57
[Symptom]
Problems with DIE:
1) JOB STOPCDs almost always reload the system.
2) Potential halts trying to figure out if we want DDT.
3) PERISH is needlessly inefficient.
[Diagnosis]
1) Somebody wiped the PI status in S which is used by
ZAPJOB to see whether we need to reload.
2) Pushing onto the possibly corrupt stack while we still hold the
DIE interlock.
3) Somebody didn't realize that being called by an XPCW to section
zero means never having to guess if you're addressable.
[Cure]
1) Obtain the PI status again in ZAPJOB.
2) Don't use the stack to preserve T1, just fetch it from the
crash ACs again when we're done with it.
3) Rearrange the PSECTs around PERISH, and save an instruction.
[Comments]
SCCed. You'd think somebody would have complained about problem #1,
wouldn't you?
[Keywords]
DIE
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 352 ERRCON PERISH,DIE01,ZAPJOB
703A
[End of MCO 13347]
MCO: 13348 Name: KBY Date: 4-Apr-87:11:37:59
[Symptom]
IME deleting a section with PAGE. UUO .PAGSC.
[Diagnosis]
Calling KILSEC with junk in LH(T1).
[Cure]
MOVE-->HRRZS.
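For readers unfamiliar with half-word operations: a PDP-10 word is 36 bits, split into two 18-bit halves. The sketch below models only the arithmetic difference between copying a whole word and keeping the right half with a zeroed left half. The function names are hypothetical and this is not the monitor's code.

```python
WORD_MASK = (1 << 36) - 1     # a full 36-bit word
RIGHT_HALF = (1 << 18) - 1    # the low-order 18 bits


def full_word(word):
    """Model of a full-word copy: junk in the left half is preserved."""
    return word & WORD_MASK


def right_half_only(word):
    """Model of a half-word copy that zeroes the left half."""
    return word & RIGHT_HALF


t1 = 0o123456000003           # section number 3 in the right half, junk on the left
```

Here full_word(t1) still carries the left-half junk, while right_half_only(t1) yields the bare section number.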
[Comments]
The code in 7.04 is different (and fixed for this bug), partly due to MHS.
This is perhaps not the best fix, but it's the quickest and safest. IME130.
I don't understand why this doesn't happen every time.
[Keywords]
Huh?
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Restricted distribution
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
703A VMSER CHGSC5
[End of MCO 13348]
MCO: 13353 Name: RJF Date: 7-Apr-87:12:01:50
[Symptom]
Two problems.
1) The FILOP. .FOAPP (Append) function didn't merge in the last block of a
file into the user's output buffer.
2) The FILOP. .FOAPP function didn't return a virgin ring header when it
created a new file. Some programs expect this.
[Diagnosis]
MCO 13291 caused both problems. However the second worked before
MCO 13291 only because of a bug.
[Cure]
First change the name of the UP.URO bit to UP.MLB (Merge Last Block).
Then set the bit before falling into FOPN9B from the .FOAPP code. This fixes
problem 1.
Add a test for a zero length file to FOPN9B so we don't unvirginize the buffer
ring header when writing to new or nul files.
[Comments]
Hope this fixes some of the MAIL problems, but I don't know why MAIL
wouldn't fail more consistently.
[Keywords]
FILOP.
APPEND
.FOAPP
[Related MCOs]
13291
[Related SPRs]
35229
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 353 S
703A UUOCON
FILUUO
[End of MCO 13353]
MCO: 13375 Name: BAH Date: 30-Apr-87:10:13:25
[Symptom]
GALAXY doesn't build.
[Diagnosis]
Some symbols were left out of the UUOSYM source for autopatch.
[Cure]
Define %LDOCS, .GTNXM, .GTBTX, .STPCP, .DCXSF.
[Comments]
These edits were required for Autopatch #16 and have been given
to Buzz. I just needed to get the source on the black packs updated.
[Keywords]
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Restricted distribution
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
703A UUOSYM
[End of MCO 13375]
MCO: 13381 Name: JMF Date: 5-May-87:10:12:54
[Symptom]
SET NOMESSAGE responds with ?No core assigned.
[Diagnosis]
Bits are wrong in UNQTB2. NOCORE is a right half bit.
[Cure]
Fix bits.
[Comments]
Wonder why this hasn't generated an SPR?
[Keywords]
NOMESS
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 356 COMMON SNAMES
703A
[End of MCO 13381]
MCO: 13386 Name: JAD Date: 7-May-87:10:45:24
[Symptom]
SYSUNI chain gets loopy (2 of 2).
[Diagnosis]
MCO 13382 didn't go far enough in finding all the holes in the
code to twiddle with the SYSUNI and SYSDET chains.
[Cure]
Take a giant step and remove the half dozen separate chunks of
code which twiddle the chains, and make subroutines to do the
appropriate foolishness. The subroutines can be made smarter
so we don't loop the chain by trying to attach a unit twice,
for example.
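The idea of funneling all chain manipulation through a pair of subroutines can be sketched as below. This is a Python illustration under assumed names (Unit, UnitChain, and the unit names are hypothetical); it shows only why a duplicate attach must be refused, not how FILIO implements it.

```python
class Unit:
    def __init__(self, name):
        self.name = name
        self.next = None


class UnitChain:
    """Singly linked chain with one attach and one detach routine."""

    def __init__(self):
        self.head = None

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node
            node = node.next

    def attach(self, unit):
        """Append a unit; attaching a unit already on the chain is a no-op."""
        last = None
        for node in self:
            if node is unit:
                return False        # already linked: relinking would loop the chain
            last = node
        unit.next = None
        if last is None:
            self.head = unit
        else:
            last.next = unit
        return True

    def detach(self, unit):
        """Unlink a unit; return False if it was not on the chain."""
        prev = None
        for node in self:
            if node is unit:
                if prev is None:
                    self.head = node.next
                else:
                    prev.next = node.next
                node.next = None
                return True
            prev = node
        return False


chain = UnitChain()
rpa0, rpa1 = Unit("RPA0"), Unit("RPA1")
chain.attach(rpa0)
chain.attach(rpa1)
second_try = chain.attach(rpa1)     # duplicate attach is refused
names = [u.name for u in chain]
```

A naive append without the membership check would point the last unit's link at itself on the second attach, and every later walk of the chain would never terminate.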
[Comments]
FINALLY solves the KAFs on CI disk failover.
Someone send me some hate mail so I Autopatch this crud.
[Keywords]
SYSUNI
LOOPS
[Related MCOs]
13382
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 357 FILIO LOTS
703A
[End of MCO 13386]
MCO: 13401 Name: KDO Date: 19-May-87:15:27:31
[Symptom]
ANF-10 file transfers don't (2 of 2).
[Diagnosis]
None, but code is wrong.
[Cure]
MCO 13233 removes an AOS (P) which causes double skip returns.
Remove the other AOS (P) also.
[Comments]
[Keywords]
ANF-10
file transfer
[Related MCOs]
13233
[Related SPRs]
35721
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 360 NETSER NTDSIB
703A NETSER NTDSIB
[End of MCO 13401]
MCO: 13407 Name: DPM Date: 26-May-87:07:55:16
[Symptom]
A job has a program-to-run set and is not supposed to return to monitor
level. If the program running does a CTX. UUO to save the current
context and create a new one without running another program, the job
will get stuck in the captive program with no way to return to the old
context.
[Diagnosis]
It's a one way street. Had the program saved the old context and run
a program in the new context, the implied restore at program exit time
would have cleaned things up. CTXSER doesn't know that's what is needed,
so things never happen automatically.
[Cure]
Disallow context saves from a captive program when no RUN UUO is being
done on behalf of the job. New error code CXCCC%==26, cannot create
context from a captive program.
[Comments]
[Keywords]
CONTEXT
[Related MCOs]
11102
[Related SPRs]
None
[MCO status]
Restricted distribution
[MCO attributes]
New development MCO
Documentation change
UUOSYM change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 361 CTXSER LGLCHK
703A UUOSYM CXCCC%
[End of MCO 13407]
MCO: 13413 Name: DPM Date: 29-May-87:05:00:22
[Symptom]
Stopcode IME when allocating non-zero section core after a series
of allocates and deallocates.
[Diagnosis]
At GVFW11, a section number is setup based on the section number
of the chunk pointed to by AC 'R'. The instruction which handles
section numbers is under an FTMP conditional when it should be under
an FTXMON conditional. FTMP controls the assembly of SMP features.
FTXMON controls extended addressing features.
[Cure]
Change the FTMP conditional to FTXMON.
[Comments]
[Keywords]
XADDR
[Related MCOs]
None
[Related SPRs]
35729
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 362 VMSER GVFW11
703A
[End of MCO 13413]
MCO: 13414 Name: DPM Date: 29-May-87:07:23:06
[Symptom]
Stopcode NIJ when sending a tape labeler message to PULSAR.
[Diagnosis]
We understand how the problem happens, but not why it happens.
If we get to LBLSND with J containing an invalid job number, an NIJ
stopcode will result when sending an IPCF packet to PULSAR. This
behavior is most infrequent and cannot be reproduced at will.
However, it is probably in the best interests of our customers to
prevent the system failures rather than allowing them to occur until
the exact cause is known.
[Cure]
In LBLSND, reset J with the job number of the owner of the tape
drive. Also remove a redundant LDB J,PJOBN## at LBLPOS which would no
longer be needed with this change installed.
[Comments]
[Keywords]
LABELED TAPES
[Related MCOs]
None
[Related SPRs]
35985
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 362 TAPUUO LBLPOS,LBLSND
703A
[End of MCO 13414]
MCO: 13418 Name: JMF Date: 1-Jun-87:10:21:42
[Symptom]
IME
[Diagnosis]
MAPHGH sometimes gets called in section 1 and does indexing
with junk in the left half of an AC.
[Cure]
Use another AC which doesn't contain the junk in the left half.
[Comments]
It mystifies me why we haven't seen this around here, but it
can clearly happen.
[Keywords]
IME
LOCK
[Related MCOs]
None
[Related SPRs]
35613
[MCO status]
Restricted distribution
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
703A CORE1 MAPHGH
[End of MCO 13418]
MCO: 13419 Name: KDO Date: 2-Jun-87:05:10:24
[Symptom]
Extended error status contains junk.
[Diagnosis]
PDVESE (in DEVESE) should be cleared in more places.
[Cure]
Clear PDVESE for OPEN, INIT, and FILOP. UUO's.
[Comments]
Doesn't solve all of LPTSPL's error recovery problems,
but it does help.
[Keywords]
Error status
PDVESE
DEVESE
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 362 UUOCON UINIT0
703A UUOCON UINIT0
[End of MCO 13419]
MCO: 13421 Name: JMF Date: 3-Jun-87:10:08:47
[Symptom]
STOPCD NPJ
[Diagnosis]
CTXSER insists that a job must have a PDB when a swapout
finishes but this "ain't necessarily so" to quote Gershwin whilst
a logout is in progress.
[Cure]
Just ignore the CTX stuff if there's no PDB.
[Comments]
Winkler has been hacking around this since 7.03 went to field test.
[Keywords]
NPJ
[Related MCOs]
11102
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
Multi CPU only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
703A 363 SCHED1 FINOUT
704 CTXSER CTXSCD
[End of MCO 13421]
MCO: 13422 Name: JMF Date: 4-Jun-87:03:55:01
[Symptom]
If doing a LOOKUP of a non-existent file from a non-zero section causes
a UFD to be read, the result is
?Illegal address in UUO at user PC xxxxxx.
[Diagnosis]
PCS is getting zapped by the monitor I/O to read the UFD.
[Cure]
Call SSPCS.
[Comments]
Maybe someone really is using extended addressing (Read that as Edgecomb).
[Keywords]
Ill add
Non-zero section
Monitor I/O
[Related MCOs]
None
[Related SPRs]
35986
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 363 FILIO SETBS2
703A
[End of MCO 13422]
MCO: 13440 Name: JAD Date: 16-Jun-87:07:56:50
[Symptom]
Potential for KLIPA microcode confusion on channel errors.
[Diagnosis]
I always thought the last of the 3 words transferred into the port
(.PCPCB, .PCPIA, and .PCIVA) was reserved. Turns out the microcode
expects .PCIVA to contain the address of word 1 in the channel logout
area. Thus, KLPSER never fills in this address, causing the KLIPA
microcode to fetch channel status from physical addresses 0 and 1
(NOT the ACs, but the first two physical addresses which are not
normally addressable).
[Cure]
Change the name of .PCIVA to reflect its new usage and fill in
that new word (.PCAL1) with the address of word 1 in the channel
logout area.
[Comments]
May improve recovery on channel errors, but I don't expect this
has anything to do with ADP's problems with error recovery. Then
again, it might.
Unusual what you find when you're reading microcode listings when you're
bored.
[Keywords]
KLIPA
PCB
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 364 KLPPRM .PCAL1
KLPSER PCBINI
703A
[End of MCO 13440]
MCO: 13447 Name: KDO Date: 22-Jun-87:23:53:33
[Symptom]
LATSER doesn't (4 of n).
[Diagnosis]
LATSER slobbers on section 3 space:
1. Slot block index numbers start with one (not zero). When used as an index
number into the SBVECT array, the first word of the table is never used, and
the address of the last SB is written in some other block.
2. The table of circuit block addresses, CBVECT, can do the same thing.
3. NFRSBQ, the table of bits used to allocate slot block indices, is initialized
to minus one, indicating that all indices are available. That's fine as long
as the number of remote terminals is evenly divisible by 36. If not, SBALOC
may see a one bit for which there is no space in SBVECT, allocate space for
a slot block, and save the address in the contents of SBVECT + "n".
[Cure]
Subtract one before saving the addresses in SBVECT and CBVECT.
Build a proper bit mask for the last word in NFRSBQ at initialization time.
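Both parts of the cure can be sketched in a few lines of Python. The function names are hypothetical, and placing the valid bits at the high-order end of the last word is an assumption based on PDP-10 bit numbering. The MCO itself only says that a proper mask is built.

```python
BITS_PER_WORD = 36
FULL_WORD = (1 << BITS_PER_WORD) - 1


def slot_vector_offset(index):
    """Map a one-based slot block index onto a zero-based vector offset."""
    if index < 1:
        raise ValueError("slot block indices start at one")
    return index - 1


def build_free_bitmap(count):
    """Return 36-bit words with exactly one bit set per allocatable index."""
    full_words, remainder = divmod(count, BITS_PER_WORD)
    words = [FULL_WORD] * full_words
    if remainder:
        # Only 'remainder' bits of the last word are valid; leave the rest clear
        # so the allocator never sees an index with no slot in the vector.
        words.append(((1 << remainder) - 1) << (BITS_PER_WORD - remainder))
    return words


bitmap = build_free_bitmap(40)      # 40 terminals: one full word plus 4 bits
free_count = sum(bin(word).count("1") for word in bitmap)
```

Initializing every word to minus one would instead advertise 72 free indices for 40 terminals, which is how an allocation could run past the end of the vector.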
[Comments]
I can't be overdrawn, I still have checks in my checkbook.
[Keywords]
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 365 LATSER LATINI
703A LATSER LATINI
[End of MCO 13447]
MCO: 13455 Name: JAD Date: 25-Jun-87:09:56:33
[Symptom]
The first word of a block of free core immediately following a
label DDB is mysteriously zeroed whenever the label process
does (dump mode) I/O to process tape labels.
[Diagnosis]
TAPUUO defines a prototype label DDB which doesn't exactly
match the regular magtape DDB definition. Several words are
incorrectly documented, but more important, the prototype DDB
is SHORTER than the regular DDB by several words. When dump
mode I/O is done, the first thing TAPUUO does is to zero the
word TDVREM in the DDB. Well, guess what? The prototype DDB
ends just BEFORE word TDVREM. If the prototype DDB ends on
the last word of a 4-word block (which it does in 7.03 with
FTMP turned off), the first word of the next 4-word block will
get zeroed. This may likely be a disk I/O request block (the
first word of which contains a link to the previous and next DRBs).
[Cure]
Get rid of the prototype label DDB. Use the prototype magtape
DDB and set/clear a few bits in the label DDB copied from the
prototype magtape DDB so the label DDB is "correct".
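Why the damage depends on the exact length of the prototype can be shown with a small model. This is a Python sketch under stated assumptions: the DDB lengths are invented, and TDVREM is modeled as the word immediately past the end of the short prototype, as the diagnosis describes. Only the 4-word block size comes from the text.

```python
BLOCK_WORDS = 4   # free core is handed out in 4-word blocks, per the diagnosis


def allocated_words(ddb_words):
    """Round a DDB length up to a whole number of 4-word blocks."""
    return -(-ddb_words // BLOCK_WORDS) * BLOCK_WORDS


def zeroing_hits_neighbor(proto_words):
    """True if zeroing the word just past the prototype lands in the next block."""
    field_offset = proto_words      # TDVREM sits just beyond the short prototype
    return field_offset >= allocated_words(proto_words)


def simulate(proto_words):
    """Zero the TDVREM word of a label DDB at address 0 and return the core image."""
    core = [0o777777] * (allocated_words(proto_words) + BLOCK_WORDS)
    core[proto_words] = 0           # the dump-mode I/O path clears TDVREM
    return core


core = simulate(8)                  # hypothetical length: exactly two blocks
neighbor_first_word = core[allocated_words(8)]
```

When the prototype length is not a multiple of four, the stray store lands in the unused tail of the label DDB's own allocation and goes unnoticed, which fits the observation that the problem shows up only in some configurations.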
[Comments]
A MAJOR contributor to Pan Am's crashes.
Question of the day: Why does a regular magtape DDB have DVLNG
set but not the label DDB?
[Keywords]
LABEL DDBS
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 366 TAPUUO LPROTO,TPLBGA
703A
[End of MCO 13455]
MCO: 13456 Name: DPM Date: 26-Jun-87:05:51:02
[Symptom]
CTX. UUO function .CTDIR (return directory of contexts) fails
with an illegal job number error.
[Diagnosis]
AC 'M' is incorrectly advanced beyond the word in the argument
block containing the target job number. When this is corrected, the
job executing the UUO may hang trying to get the MM resource on an SMP
system. This happens because the resource is already owned by the
job. On single CPU systems, the job performing the UUO may call
SCDCHK. During the interval in which the job is not running, the
target job may log out, thus deleting its PDB and context blocks.
Later references to those data structures may cause IMEs or unexpected
results.
[Cure]
Do not advance the pointer to the user's argument block. The
call to GETEW1 will do that automatically. Interlock PDB and context
block scanning by using the CX resource, not only in the case of the
.CTDIR function, but in other places as well. That was the intended
purpose of the CX resource, however, the code was written before the
resource existed and was never updated afterwards.
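The first part of the cure, the double advance of the argument pointer, can be illustrated with a toy model. This Python sketch assumes a hypothetical argument block layout (word 0 a header, word 1 the target job number) and models the fetch routine as "advance, then fetch", as the cure describes for GETEW1. The class and function names are invented.

```python
class ArgBlock:
    def __init__(self, words):
        self.words = words
        self.m = 0                  # models AC 'M', pointing at the current word

    def fetch_next(self):
        """Advance to the next word and return it."""
        self.m += 1
        return self.words[self.m]


def target_job(block, advance_first):
    """Read the job number, optionally with the erroneous manual advance."""
    if advance_first:
        block.m += 1                # the bug: pointer bumped before the fetch
    return block.fetch_next()


words = [0o3, 42, 0]                # hypothetical layout: header, job number, spare
```

With the extra advance the routine reads the word after the job number, so the value checked against the job table is whatever happens to follow it in the block.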
[Comments]
[Keywords]
CONTEXTS
[Related MCOs]
11102
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 366 CTXSER INIQTA,REDQTA,SETQTA,XITQTA,DIRECT,DIRINI,DIRRST,INFORM,INFRST
703A
[End of MCO 13456]
MCO: 13457 Name: DPM Date: 26-Jun-87:06:41:00
[Symptom]
After a series of context commands to switch between and delete
at least one context, an NNF, IME or KAF stopcode may result if the
original context had any opened files.
[Diagnosis]
When an adjacent context is created (one without a superior),
DDBs for any opened files in the old context are propagated to the new
context. In the normal case, the pointers to the DDBs are zeroed to
prevent file read and write counts from being skewed. When one
context executes the CTX. UUO function (or equivalent monitor command)
to delete another context, the DDB pointers are not zeroed.
Subsequent calls to GETMIN as part of the context block switching
sequence cause the DDBs and associated access table information to be
deleted. Later, when the original context is continued, FILSER uses
the contents of DEVACC in the DDB to find access table information.
At this point, the access blocks have been recycled and the link words
changed. Some flavor of a stopcode will result. If it happens to be
an NNF, the monitor tries to continue. This will never work because
NMB scanning happens in a very low level routine which has no error
return. An IME usually follows.
[Cure]
Zero USRHCU and .USCTA when switching context blocks for delete
functions. Also make the NNF stopcode be a STOP rather than a
continuable DEBUG stopcode.
[Comments]
[Keywords]
CONTEXTS
[Related MCOs]
11102
[Related SPRs]
35558, 35559
[MCO status]
Restricted distribution
[MCO attributes]
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 366 CTXSER DELCTX
FILUUO S..NNF
703A
[End of MCO 13457]
MCO: 13459 Name: DPM Date: 26-Jun-87:07:33:25
[Symptom]
A disconnect of a device from an MPX channel fails with the
unknown device error code from the CNECT. UUO.
[Diagnosis]
Two problems exist. First, the user's job number, not the
job/context handle is stored in the DDB when connecting a device to
the MPX channel. Disconnects fail because the standard DDB searching
routines expect a JCH, not just a job number in the DDB. Second, once
the correct DDB is located, the call to SETCPP to put the job on the
CPU which owns the device fails, resulting in the device not available
error code. Upon inspection, of the four callers of SETCPP, two
expect the non-skip routine to mean success, while the other two
expect it to indicate a dead CPU.
[Cure]
Set up AC 'J' with the job/context handle prior to assigning the
device. This will allow the disconnect function to find the DDB. Also
change the SETCPP routine to take the non-skip return if the CPU which
owns the device is dead, and change the callers of SETCPP to reflect
the change.
[Comments]
[Keywords]
CONTEXTS
[Related MCOs]
11102
[Related SPRs]
35728
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 366 COMMON SETCPP
CPNSER SETCPP
MSGSER CNDEV4
UUOCON RELEA1
703A
[End of MCO 13459]
MCO: 13460 Name: DPM Date: 26-Jun-87:08:22:02
[Symptom]
When using contexts in conjunction with eight or fewer assigned
logical names, an IME stopcode or unpredictable behavior in the use or
existence of those logical names may result.
[Diagnosis]
The PDB contains a pointer to a short assigned logical name table.
This table came into being during mid 5-series monitor development.
It was supposed to facilitate high speed DDB searching. This table
was searched prior to scanning the DEVLST chain. Prior to 7-series
monitors, all DDBs were on the DEVLST chain, so a lot of CPU cycles
were saved by using the job's logical name table. With the advent of
funny space DDBs, the table's usefulness diminished somewhat. When
contexts came into being, it was stated that funny space DDBs would be
saved and restored with other funny space quantities. In practice,
this never happened unless the job had more than eight assigned
logical names, as the table could accommodate eight DDBs. CTXSER
doesn't save and restore the table pointer since this would result in
rebuilding the table on each context block switch. Consequently,
funny space can get corrupted when assigned names are manipulated in
conjunction with context block switching.
[Cure]
Remove all references to the job's assigned logical name table.
This includes PDB location .PDDVL. The DEVLST chain is relatively
short because it contains only unit record DDBs. A substantial amount
of code existed to handle the eight name table and the overhead
incurred probably outweighs the benefits gained by using the table.
Without the table, all funny space DDBs will get saved and restored
across context block switches and potential IMEs are eliminated. In
addition, fix other related bugs which prevented these DDBs from being
deassigned if more than eight existed, and prevented any disk DDBs
from being deassigned if no arguments were given to the DEASSIGN
command.
[Comments]
[Keywords]
CONTEXTS
[Related MCOs]
11102
[Related SPRs]
None
[MCO status]
Restricted distribution
[MCO attributes]
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 366 COMCON
COMMON
ERRCON
FILUUO
NETSER
SCNSER
TAPUUO
UUOCON
703A
[End of MCO 13460]
MCO: 13472 Name: KBY Date: 4-Jul-87:15:06:08
[Symptom]
TWICE crashes the monitor in strange and wondrous ways.
[Diagnosis]
The real problem is, indeed, trying to use PG.IDC to indirect-map
a section which is already indirect mapped. All that really needs to be done,
assuming legality checks out OK (no indirect loops, etc.), is to put in
the new pointer. Unfortunately, the code assumes that if a pointer is there
already, it's an independent section pointer and thus all pages in the
section should be returned and the map killed. The routine to kill all
pages in the section is smart enough to not do anything with an indirect
section, but the map-killer just picks up the pointer, keeps what it
thinks is the physical page number, and returns that page to the
free list. This thus returns physical page JOB#+M.CPU to the free list;
a relatively low numbered page and thus in 99.99999% of cases, a random
monitor low seg page. The crash occurs after the page gets re-used and
the monitor next tries to do something with the virtual page which
maps to that physical page.
[Cure]
Don't call KILSEC or ZAPNZM for an indirect pointer.
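
The distinction the cure relies on can be sketched as below. This is an
illustrative Python model only; the `SectionPointer` type, its fields, and
the free list are hypothetical and do not reflect the real pointer format.

```python
from dataclasses import dataclass


@dataclass
class SectionPointer:
    indirect: bool
    value: int  # physical page number if direct; NOT a page number if indirect


def replace_section_pointer(old, new, free_list):
    # Only an independent (direct) section owns a map page to give back.
    # An indirect pointer's value must never be treated as a page number.
    if old is not None and not old.indirect:
        free_list.append(old.value)
    return new
```

Replacing an indirect pointer simply stores the new pointer and leaves the
free list alone.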
[Comments]
Gee, Ned, who wudda thunk?
[Keywords]
new TWICE
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Restricted distribution
[MCO attributes]
KL10 only
PCO required
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 367 VMSER CHGS11
703A
[End of MCO 13472]
MCO: 13474 Name: RCB Date: 6-Jul-87:18:29:07
[Symptom]
MCO 13460 broke TSKs, RDXs, and DDPs. Actually, they didn't always
work before, 13460 just made it more obvious.
[Diagnosis]
If there is no short logical name table (as after 13460), or if
the user has too many logical names to fit (before 13460), we don't clear
DD%LOG when searching for devices. This causes the network devices to have
real hiccups when trying to figure out whether we might have exhausted the
DDB chain.
[Cure]
Make sure that the TSTxxx routines get called in ways that they like.
[Comments]
[Keywords]
Not-PHYONLY
[Related MCOs]
13460
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
PCO required
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 367 UUOCON DDSRC3,DEVLP4
703A
[End of MCO 13474]
MCO: 13476 Name: DPM Date: 7-Jul-87:06:15:29
[Symptom]
After MCO 13456 (PCO 10-703-092), a job will hang in CX wait when
an illegal job number is supplied to CTX. UUO function .CTDIR.
[Diagnosis]
Prior to MCO 13456, the MM resource was used to interlock PDB and
context block scanning. Obtaining the MM does not require a job
number so validating the job number could happen after the resource
had been gotten. The CX is job specific. Therefore, when an illegal
job number is specified, the user's job will block indefinitely,
waiting for a resource which will never become available.
[Cure]
Validate the job number argument prior to obtaining the CX
resource.
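
The ordering matters because the resource is per job. A minimal Python
sketch of validate-then-acquire follows; `JOBMAX`, the lock table, and the
return strings are assumptions made for illustration and are not CTXSER's
data structures.

```python
import threading

JOBMAX = 4  # hypothetical job limit
cx_locks = {job: threading.Lock() for job in range(1, JOBMAX + 1)}


def ctx_directory(job):
    # Validate first: an illegal job number has no CX resource, so
    # waiting for one would never finish.
    if job not in cx_locks:
        return "illegal job number"
    with cx_locks[job]:
        return f"directory for job {job}"
```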
[Comments]
[Keywords]
CONTEXTS
[Related MCOs]
13456
[Related SPRs]
None
[MCO status]
Restricted distribution
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 367 CTXSER DIRECT
703A
[End of MCO 13476]
MCO: 13478 Name: RCB Date: 7-Jul-87:07:29:03
[Symptom]
Entry vectors still don't work right.
[Diagnosis]
MCO 13269 didn't go far enough.
[Cure]
Finish the job.
[Comments]
Famous last words.
[Keywords]
ENTRY VECTOR
[Related MCOs]
13269
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 367 SEGCON
703A COMCON
UUOCON
[End of MCO 13478]
MCO: 13479 Name: JAD Date: 7-Jul-87:07:46:17
[Symptom]
Possible scheduler confusion running HPQ jobs.
[Diagnosis]
If a job selected to run is pre-empted by an HPQ job, SP.CJn
isn't set for the job originally selected to run. This can lead
to cache confusion, etc., and odd stopcodes when the user mode
stack has gotten confused.
[Cure]
Set SP.CJn in two cases in CLOCK1. Add a routine to CPNSER
to do the dirty work.
[Comments]
Part of Mike Bisco's scheduler changes which might help fix
the EUE problems seen at Abbott. Don says put it in . . .
[Keywords]
SCHEDULER
HPQ JOBS
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Restricted distribution
[MCO attributes]
New development MCO
Multi CPU only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 367 CLOCK1 CIP6A
CPNSER SETSJ0
703A
[End of MCO 13479]
MCO: 13502 Name: RCB Date: 20-Jul-87:22:41:47
[Symptom]
TTY DEFER (command or TRMOP.) doesn't always really cause deferred echo.
[Diagnosis]
Missing code to have RECINT notify XMTECH about the desired behavior.
[Cure]
Add the code.
[Comments]
I don't know for sure why Tony only started seeing this recently, since
the code's been broken since 7.03 shipped.
[Keywords]
TTY DEFER
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 371 SCNSER RECIN3
703A
[End of MCO 13502]
MCO: 13503 Name: RCB Date: 21-Jul-87:03:08:03
[Symptom]
Fix various problems with TRMOP. and TTY commands. Some functions are
mis-defined, and return erroneous results. Some commands don't work as
expected. Some commands do not act like their parallel functions.
[Diagnosis]
Yes.
[Cure]
Yes.
[Comments]
[Keywords]
TRMOP.
[Related MCOs]
13502
[Related SPRs]
None
[MCO status]
Checked
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 371 COMCON TTCEDT,TTCREM,TTCDFR
SCNSER TOPTB1
COMDEV TERMCR
703A COMCON TTCREM,TTCDFR
SCNSER TOPTB1
[End of MCO 13503]
MCO: 13504 Name: RCB Date: 21-Jul-87:04:49:36
[Symptom]
EUE after MCOs 13460 & 13474.
[Diagnosis]
Missing code to balance the stack.
[Cure]
Add the code.
[Comments]
[Keywords]
ASSIGN
INIT
[Related MCOs]
13474, 13460
[Related SPRs]
None
[MCO status]
Checked
Restricted distribution
[MCO attributes]
PCO required
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
703A UUOCON DDBSRC
SCNSER TOPDD1,GETDDB
[End of MCO 13504]
MCO: 13507 Name: JAD Date: 22-Jul-87:09:38:42
[Symptom]
System runs out of free core. System error queue is full of entries
(which are consuming the free core) but DAEMON apparently isn't taking
any entries from the queue. DAEMON only processes the first system
error block from a crash file.
[Diagnosis]
Code which adds entries to the system error queue tries to save a
little space in the ERRPT. block by only inserting one code 13 entry
(system error block available) in the ERRPT. block when a system error
block is queued. Unfortunately, that single code 13 entry may be lost
or overwritten if errors occur too fast for DAEMON to keep up. Also,
if DAEMON dies while processing the system error queue, it will not
check the system error queue again when a new DAEMON is started.
[Cure]
Forget about the code 13 entries in the ERRPT. block. Have DAEMON
always do an SEBLK. UUO when it is woken (after the ERRPT. UUO).
This will guarantee the system error queue is processed in a timely
fashion. The single extra UUO will not add appreciable overhead to
DAEMON. Always scan the entire system error queue when processing
a crash file. Requires edit 1020 to DAEMON.
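
The idea of polling the queue on every wakeup, rather than relying on a
one-shot notification, might look like this in outline. It is a Python
sketch with hypothetical names; it does not model the ERRPT. or SEBLK. UUO
interfaces.

```python
from collections import deque

error_queue = deque()


def queue_error_block(block):
    # No separate "block available" marker is needed any more.
    error_queue.append(block)


def daemon_wakeup():
    # Drain the queue unconditionally on each wakeup, so a lost or
    # overwritten notification cannot strand entries.
    processed = []
    while error_queue:
        processed.append(error_queue.popleft())
    return processed
```

A wakeup with an empty queue costs one cheap check, which matches the
claim that the extra UUO adds no appreciable overhead.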
[Comments]
Bad design from day one. Mea culpa.
[Keywords]
System error blocks
Lost free core
[Related MCOs]
None
[Related SPRs]
36034
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 372 ERRCON QUESEB
S .ERSEB
703A
[End of MCO 13507]
MCO: 13513 Name: JAD Date: 27-Jul-87:12:22:06
[Symptom]
Undeserved checksum errors on multiple RIB files.
[Diagnosis]
Code at NXTBL6 in FILIO believes if the current operation is a
read there is no need to update changed RIB pointers. This isn't
necessarily true, since a previous write may have caused the
checksum to change.
[Cure]
Call PTRTST instead of PTRCUR. If not update mode, the call
to PTRTST is essentially a call to PTRCUR. If update mode, PTRTST
will rewrite the changed pointers if necessary.
[Comments]
Takes an odd combination of record sizes and etc. to cause
this problem, which only occurs in extended RIBs.
[Keywords]
UPDATE MODE
CHECKSUM ERRORS
EXTENDED RIBS
[Related MCOs]
None
[Related SPRs]
36001
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 372 FILIO NXTBL6
703A
[End of MCO 13513]
MCO: 13521 Name: JMF Date: 31-Jul-87:03:42:34
[Symptom]
RLT17J doesn't run.
[Diagnosis]
704 edit didn't happen to 703A even though the code is the same
in both monitors.
[Cure]
JUMPE T1,CPOPJ=>JUMPE T1,CPOPJ1##
[Comments]
This probably wouldn't happen if we patched rather than edited on
Wednesday morning. It also reinforces the need to run SMP AP monitors.
[Keywords]
Off-line
Disks
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Restricted distribution
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
703A 17000 CPNSER SETCF2
[End of MCO 13521]
MCO: 13524 Name: JAD Date: 3-Aug-87:09:58:24
[Symptom]
RTTRP UUO too restrictive with respect to EVM.
[Diagnosis]
The RTTRP UUO requires the job to be locked in Exec Virtual Memory
(EVM), even if no references will be done to the user address space
without using PXCT'd instructions. Only if an indirectly-addressed
CONSO bit mask is supplied, or if "fast mode" (no AC save/context
switching to be done on a real-time interrupt) is specified, does
the job need to be locked in EVM so the monitor can access the user
address space during real-time interrupt level.
[Cure]
Move the test for "locked in EVM" down to the end of the code
which sets up for a RTTRP UUO. Only test for locked in EVM if an
indirectly-addressed CONSO bit mask is specified, or if "fast mode"
handling is specified.
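
Reduced to a predicate, the relaxed requirement reads as follows. This is
a Python sketch; the function and parameter names are hypothetical.

```python
def needs_evm_lock(indirect_conso_mask, fast_mode):
    # The job must be locked in EVM only when the monitor will touch the
    # user address space at real-time interrupt level.
    return bool(indirect_conso_mask or fast_mode)
```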
[Comments]
Gets around the EVM bind which Rockwell complained about
in the SPR. Since we're moving things around in the address space
for 7.04, there will be much less of a bind.
"Fools rush in . . . "
[Keywords]
EVM
RTTRP
LOCK
[Related MCOs]
None
[Related SPRs]
35659
[MCO status]
None
[MCO attributes]
New development MCO
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 373 RTTRP RTTRP,RTTRP5,BLKRET
703A
[End of MCO 13524]
MCO: 13525 Name: JAD Date: 3-Aug-87:11:01:38
[Symptom]
Can't build a monitor with DECnet but without LOKCON.
[Diagnosis]
DECnet (SCLINK) calls MOVPAG in LOKCON to move unmapped pages on
a SET MEMORY OFFLINE command/UUO. If LOKCON wasn't loaded, MOVPAG
won't exist, leading to an undefined global.
[Cure]
Define MOVPAG in COMMON, and add a few FTLOCKs in D36COM and
SCLINK.
[Comments]
Ho hum
[Keywords]
MOVPAG
[Related MCOs]
None
[Related SPRs]
35767
[MCO status]
Restricted distribution
[MCO attributes]
New development MCO
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 373 COMMON MOVPAG
703A D36COM DCNMOV
SCLINK SCTMOV
[End of MCO 13525]
MCO: 13526 Name: RCB Date: 3-Aug-87:20:51:15
[Symptom]
CAL11. function to queue data is broken for IBM FEs in 7.04 and in
7.03 A/P.
[Diagnosis]
CAIE / jrst good/ jrst bad
[Cure]
CAIE/ jrst bad/ fallinto good
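
CAIE skips the next instruction when the AC equals the immediate operand,
so the two sequences behave as sketched below. This is a Python
paraphrase, assuming (as the cure implies) that the equal case is the
valid one; the function names are hypothetical.

```python
def broken(ac, expected):
    # CAIE ac,expected / JRST good / JRST bad
    if ac == expected:   # equal: CAIE skips over "JRST good"
        return "bad"
    return "good"        # not equal: "JRST good" is executed


def fixed(ac, expected):
    # CAIE ac,expected / JRST bad / (fall into good)
    if ac == expected:   # equal: CAIE skips over "JRST bad"
        return "good"
    return "bad"
```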
[Comments]
I have no idea how this got messed up, especially since this is the first
time I've touched any relevant code in 7.03A.
[Keywords]
IBMCOM
DTEs
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
PCO required
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 373 DTESER DTEQU1
703A
[End of MCO 13526]
MCO: 13528 Name: DPM Date: 4-Aug-87:03:23:12
[Symptom]
Stopcode IME updating page maps.
[Diagnosis]
At EXOPF1+1 in VMSER, a DPB instruction is used to update the
contents of a map slot. The routine GTPME no longer returns byte
pointers. It returns a full word address.
[Cure]
Change a DPB instruction to a MOVEM.
[Comments]
[Keywords]
PAGE MAPS
[Related MCOs]
None
[Related SPRs]
36000
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 373 VMSER EXOPF1
703A
[End of MCO 13528]
MCO: 13530 Name: DPM Date: 4-Aug-87:05:28:34
[Symptom]
Stopcode IME processing characters in SCNSER.
[Diagnosis]
PCO 10-703-056 added a SE1INT at TICAVL. The published answer
included this change, but the Autopatched version lost the SCNOFF.
Also during control-R processing, SCNSER interrupts are enabled
without first disabling the interrupts.
[Cure]
Add a SCNOFF at TICAVL and at the start of control-R processing.
[Comments]
[Keywords]
SCNOFF
[Related MCOs]
None
[Related SPRs]
35758
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 373 SCNSER METRT1
703A SCNSER METRT1,TICAVL
[End of MCO 13530]
MCO: 13535 Name: JAD Date: 6-Aug-87:09:12:47
[Symptom]
UNJ stopcodes and other cruft associated with command processing.
[Diagnosis]
Some rash developer moved FLMCOM from bit 18 (which translates
into the sign bit when in the left half of M) to bit 34. Turns out
a LOT of code in the monitor expects COMCON to set the sign bit of
M to indicate "command level" so ECOD doesn't attempt to store an
error code at the location addressed by M (which just happens to
be the address of the command processing routine).
[Cure]
Move FLMCOM back to bit 18 (actually, (1B0) to make it obvious
someone expects it to be a sign bit). Fix up lots of references
to "400000" to be "FLMCOM". COMMENT THE HECK OUT OF THE DEFINITION
OF FLMCOM SO NO ONE SCREWS THIS UP AGAIN!
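
The bit arithmetic behind this can be checked with a short Python sketch.
It uses PDP-10 bit numbering (bit 0 is the leftmost bit of the 36-bit
word); the helper names are hypothetical and only illustrate why bit 18
works as the sign bit of M and bit 34 does not.

```python
def bit(n):
    """36-bit word with only bit n set; bit 0 is the leftmost (sign) bit."""
    return 1 << (35 - n)


def to_left_half(half_word):
    """Place an 18-bit quantity in the left half of a 36-bit word."""
    return (half_word & 0o777777) << 18


SIGN_BIT = bit(0)
OLD_FLMCOM = bit(18)    # 400000 as a half-word quantity
MOVED_FLMCOM = bit(34)  # the relocated flag described in the diagnosis


def looks_like_command_level(m):
    """The test other modules effectively make: is the sign bit of M set?"""
    return bool(m & SIGN_BIT)
```

Placed in the left half, `OLD_FLMCOM` lands on the sign bit, while
`MOVED_FLMCOM` lands elsewhere, so code testing the sign of M no longer
sees "command level".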
[Comments]
What happens when you let the air out of Nick's tires?
Dago Wop Wop Wop
[Keywords]
COMMAND LEVEL
UNJ
FLMCOM
[Related MCOs]
None
[Related SPRs]
36030
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 374 CLOCK1
703A COMCON
COMMON
IPCSER
NETDEV
QUESER
TAPUUO
UUOCON
VMSER
[End of MCO 13535]
MCO: 13536 Name: JAD Date: 6-Aug-87:09:24:44
[Symptom]
Looping in ESTOP if LOGOUT (LOGIN) dies with a fatal job error, such
as illegal memory reference, I/O to unassigned channel, etc.
[Diagnosis]
We get to ESTOP at UUO level, which branches off to JOBKL, which
eventually winds back up at ESTOP, which . . .
[Cure]
Test .CPISF before branching off to JOBKL, and if set, go to
STOP1C instead.
[Comments]
I believe this old ADP patch fixes SNETCO's NCM stopcode bug,
but since I'm not 100% sure I'm not going to answer their 7.02 SPR
this way. They're upgrading to 7.03 and if they SPR the bug there
I'll consider propagating this fix to 7.03.
[Keywords]
JOBKL LOOP
[Related MCOs]
None
[Related SPRs]
35430, 35740
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 374 CLOCK1 ESTOP3
703A
[End of MCO 13536]
MCO: 13542 Name: KBY Date: 9-Aug-87:11:37:29
[Symptom]
Stopcode PPQ most common; other variants possible.
All will be related to and follow some page-in operation.
[Diagnosis]
For paging in, PLTSN will figure out how many of
the requested pages are on the various in-core queues and return
the count in .USTMP. Unfortunately, the pages may have moved around
from one of the IN queues to the IP queue (PPQ stopcode) or from the OUT
queue to some other job by the time PAGIMT actually gets around
to pulling the pages off the queues. This is particularly true since
MCO 13137 which liberally sprinkles SCDCHK calls around the code to
prevent KAFs; calling SCDCHK requires giving up the MM. I'm not sure
the window really existed before then in 7.03 (although the one related
incident of this did at one time exist) except in the case of
mixed page-in/page-out lists.
[Cure]
Resurrect what should have been done instead of MCO 11833. PLTSN
will now pull the pages off the in-core queues as it finds them and
place them on a "private" queue whose header is .USTMP. The left
half of .USTMP contains minus the number of pages on the queue.
The success exit from the PAGE. UUO will check to be sure
this value is non-negative (must be zero from PAGEB; could
be positive for some other users which utilize .USTMP for
other things) and will issue a PMW stopcode if .USTMP is
negative. The error exit from the PAGE. UUO will place the pages
back on the appropriate queues from whence they came and will
also issue a PMW stopcode if something doesn't match up correctly
(disk address in map doesn't match MEMTAB). PAGIMT will pull pages
off the private queue now (eliminates the PPQ stopcode).
Note that when the pages are placed on the private queue, P2.TRN (transient
page) is lit in the PT2TAB entry for the page (turned on in PLTSN; off in PAGIMT
or on error return from the PAGE. UUO when it returns pages). This is for
the benefit of CPIASN who can then figure out who owns the page by checking
MT.JOB and returning that job number.
This has the side benefit of not having to restart PLTSN if we have to
wait for a page currently on the IP queue discovered there (supersedes
MCO 11833).
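
The private-queue idea (claim pages when found, rather than count now and
collect later) can be outlined as below. This is a simplified Python model
with hypothetical data structures; it borrows only the routine names and
the negated count convention from the text above.

```python
def pltsn(requested, in_core_queue):
    """Move requested pages found in core onto a private queue."""
    private = []
    for page in requested:
        if page in in_core_queue:
            in_core_queue.remove(page)  # claimed now; cannot migrate later
            private.append(page)
    # The count is kept negated, as in the left half of .USTMP.
    return private, -len(private)


def pagimt(private):
    """Consume pages from the private queue only."""
    taken = []
    while private:
        taken.append(private.pop(0))
    return taken, -len(private)  # zero once everything has been consumed
```

Because the pages leave the shared queue at discovery time, nothing found
by `pltsn` can move to another queue or job before `pagimt` runs.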
[Comments]
I know no one likes Edgecomb's SPRs but his diagnosis did
save a lot of time.
[Keywords]
PPQ
[Related MCOs]
11833
[Related SPRs]
36028, 36036, 36038
[MCO status]
Restricted distribution
[MCO attributes]
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 374 VMSER PAGIMT,PLTSN,UPAGE4
703A S
ERRCON CPIASN
[End of MCO 13542]
MCO: 13544 Name: KBY Date: 9-Aug-87:19:23:03
[Symptom]
Inelegant grammar in CORE command output.
[Diagnosis]
Lazy pluralizing.
[Cure]
Check numbers to decide how to output "s"
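
In outline, the check amounts to the following. This is a Python sketch;
the function name and the sample noun are hypothetical, and the actual
CORE command wording is not shown in this MCO.

```python
def with_plural(count, noun):
    # Append "s" unless the count is exactly one.
    return f"{count} {noun}" + ("" if count == 1 else "s")
```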
[Comments]
[Keywords]
CORE
[Related MCOs]
None
[Related SPRs]
36013
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 374 COMCON COR5
703A
[End of MCO 13544]
MCO: 13547 Name: DPM Date: 11-Aug-87:06:02:43
[Symptom]
If the first record of a magtape has a parity error, and the user
has disabled DX10 error retry, the monitor will loop endlessly trying
to do error recovery on the tape.
[Diagnosis]
TX1KON increments TUBREC when the read done interrupt happens and
again when unit exception (tape mark seen) is checked. The path
through the code assumes that the unit exception check can only be
made if a short record length error occurs.
[Cure]
[Comments]
One half of the oldest SPR in LCG. Not bad for 11 hours of debugging.
[Keywords]
DX10
RETRY
[Related MCOs]
None
[Related SPRs]
30357
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 374 TX1KON ERRRD2
703A
[End of MCO 13547]
MCO: 13553 Name: KBY Date: 16-Aug-87:08:56:11
[Symptom]
MCO 13542 not quite complete.
[Diagnosis]
Usage of .USTMP is nice and fits in with what used to work, but
it is possible for a job to get swapped fragmented in the middle of
certain page UUOs which will lead to .USTMP getting smashed at interrupt
level.
[Cure]
New (2) temporaries .USTMU (UUO level temp) which may be safely
used by UUO level code only. Change all references in the PAGE. UUO
to use these locations (any other UUO may also do this). Actually
PAGE. UUO can run at pseudo-clock level for the CORE UUO, but this is
still essentially UUO level code.
[Comments]
Not the KAFs.
Note that in order to implement this for 7.03A, the UUO PDL will have
to be shortened (by 8. words). Since the PDL is extendable, this ought
to be OK as a short term solution.
[Keywords]
[Related MCOs]
13542
[Related SPRs]
None
[MCO status]
Restricted distribution
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 375 VMSER LOTS
703A S
[End of MCO 13553]
MCO: 13555 Name: DPM Date: 17-Aug-87:06:29:55
[Symptom]
Stopcode PDLOVF using contexts.
[Diagnosis]
If a user specifies a TMPCOR argument but no data buffer length
and address in the CTX. UUO argument block, a PDLOVF stopcode will
result. This happens because the monitor does not adequately keep
track of intermediate TMPCOR file storage when switching contexts.
TMPCOR files are appended to the end of the user's data buffer in
funny space. If no data buffer exists, arithmetic to compute the
start of the TMPCOR file falls short of our expectations, usually
yielding an address of zero.
[Cure]
Correct the faulty bookkeeping with the addition of a new word in
the context block which contains the exec address of the TMPCOR file,
thus eliminating bizarre computations in determining the file's
address.
[Comments]
[Keywords]
CONTEXTS
[Related MCOs]
11102
[Related SPRs]
36004
[MCO status]
Restricted distribution
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 375 CTXSER LOTS
703A
[End of MCO 13555]
MCO: 13556 Name: JAD Date: 17-Aug-87:08:26:15
[Symptom]
RTTRP lives up to its name (n of 2**18); incorrect arguments to
the UUO can lead to a KAF stopcode.
[Diagnosis]
Specifying a PI channel and device of zero with a non-zero CPU
argument causes the "remove device" routine to loop. The code
in REMDEV checks the device code against a real-time block, and
if a match is found, obtains the SYSPIF interlock, then re-checks
the device code and CPU number (assuming someone snuck in). If
the device code OR CPU number don't match, the code starts over
at the beginning of the loop, assuming some other job snuck in
and allocated a real-time block.
Since the CPU number in an un-initialized real-time block is 0,
the code will loop until a KAF stopcode occurs.
[Cure]
The code should be checking device code AND CPU number under
the SYSPIF interlock. Change it to do so, as well as fixing a
number of other cases where something is tested, an interlock
is obtained, then the same thing is tested.
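
One way to read the cure is that both fields are compared while the
interlock is held, so a mismatch ends the search instead of restarting it.
The Python sketch below is a rough model under that assumption; the block
layout and the lock standing in for SYSPIF are hypothetical.

```python
import threading

syspif = threading.Lock()  # stand-in for the SYSPIF interlock


def remdev(blocks, device, cpu):
    with syspif:
        for block in blocks:
            # Device code AND CPU number are both tested under the
            # interlock; no test is repeated after acquiring it.
            if block["device"] == device and block["cpu"] == cpu:
                block["device"] = 0
                block["cpu"] = 0
                return True
    return False
```

A request for device 0 on a non-zero CPU now matches no uninitialized
block (device 0, CPU 0) and returns instead of looping.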
[Comments]
Rat's nest is the only appropriate comment.
[Keywords]
RTTRP
KAF
[Related MCOs]
None
[Related SPRs]
35372
[MCO status]
None
[MCO attributes]
New development MCO
KL10 only
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 375 RTTRP REMDEV
703A
[End of MCO 13556]
MCO: 13565 Name: KBY Date: 22-Aug-87:12:14:11
[Symptom]
703A doesn't
[Diagnosis]
Yes
[Cure]
Yes
[Comments]
[Keywords]
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Restricted distribution
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
703A VMSER PAGAC1
[End of MCO 13565]
MCO: 13570 Name: JAD Date: 24-Aug-87:14:04:22
[Symptom]
Files can be marked non-existent after a RENAME error even though
the files still exist. This allows ENTERing another file in the
directory, leading to more than one entry in the directory for a
specific file name. (Gee, who snuck generations in on us?)
[Diagnosis]
RENAME error 20 clears the YES bit for the file being RENAMEd,
but leaves the KNO bit on. This will allow another ENTER with
the same file name to succeed and create a second entry in the
directory using the same name. Also, use counts are wrong after
the error.
[Cure]
Don't branch to ENERR1 in this case, add a new routine which
clears rename in progress, decrements use counts, and branches
to a more appropriate clean-up routine.
[Comments]
Only ADP . . .
Gee, I thought Steve fixed all the use count bugs.
[Keywords]
USE COUNTS
RENAME ERROR
[Related MCOs]
None
[Related SPRs]
35709
[MCO status]
None
[MCO attributes]
New development MCO
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 376 FILUUO RENA32
703A
[End of MCO 13570]
MCO: 13573 Name: DPM Date: 26-Aug-87:07:01:50
[Symptom]
Stopcode TMDELI or TMDELE in SCNSER during character processing.
[Diagnosis]
Since the beginning of 7.03 development, we have experienced these
stopcodes about once every two or three months. They indicate the
monitor has skewed its count of characters in the chunks, and that
skew is detected while performing some backspacing operation. When
this occurs, SCNSER stopcodes, corrects the character count, and the
user job continues never having known the problem existed and not
being affected in any way. The crashes never contain any useful data.
[Cure]
Change the TMDELI and TMDELE stopcodes from type DEBUG to INFO.
This will eliminate useless crashes. If a case arises where one of
these conditions can be reproduced at will, the stopcodes may be
patched to a DEBUG so potentially useful information may be captured
in a dump.
[Comments]
[Keywords]
INPUT
ECHO
[Related MCOs]
None
[Related SPRs]
35724
[MCO status]
None
[MCO attributes]
Documentation change
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 360 SCNSER TICAVL,ECCAVL
703A
[End of MCO 13573]
MCO: 13575 Name: DPM Date: 31-Aug-87:12:55:44
[Symptom]
Stopcode FOP restoring a context which had sometime previously
spawned another context using the TMPCOR feature of the CTX. UUO.
[Diagnosis]
PCO 10-703-109 separated out TMPCOR handling from the data buffer
processing. The routine ZERDAT is supposed to zero out the user and
exec addresses for both the data buffer and TMPCOR when a context is
restored. Instead the user address is zeroed twice and the exec
address is ignored. The next time a new context is created and then
restored, CTXSER will try to return the old exec address again causing
a FOP.
[Cure]
Change a SETZM .CTTCA(P1) to a SETZM .CTTCE(P1).
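
The corrected routine clears four distinct words. The Python sketch below
models the context block as a dictionary. The data buffer field names are
hypothetical, and reading .CTTCA as the TMPCOR user address and .CTTCE as
the TMPCOR exec address is inferred from the diagnosis above.

```python
def zerdat(ctx):
    ctx["data_user"] = 0  # hypothetical name: data buffer user address
    ctx["data_exec"] = 0  # hypothetical name: data buffer exec address
    ctx[".CTTCA"] = 0     # TMPCOR user address
    ctx[".CTTCE"] = 0     # TMPCOR exec address (was a second .CTTCA)
    return ctx
```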
[Comments]
[Keywords]
CONTEXTS
[Related MCOs]
11102
[Related SPRs]
None
[MCO status]
None
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 377 CTXSER ZERDAT
703A
[End of MCO 13575]
MCO: 13579 Name: KBY Date: 2-Sep-87:20:03:53
[Symptom]
703A doesn't (again)
[Diagnosis]
I sure hope so.
[Cure]
Delete an instruction turning a NOP into something useful.
[Comments]
It really isn't the same thing as it was last week.
[Keywords]
[Related MCOs]
None
[Related SPRs]
None
[MCO status]
Restricted distribution
[MCO attributes]
None
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
703A VMSER PAGAC1
[End of MCO 13579]
MCO: 13584 Name: RCB Date: 3-Sep-87:16:16:10
[Symptom]
STOPCD ANFSBA (secondary buffer allocated) at unpredictable times.
[Diagnosis]
When calling GETWDS to allocate a PCB, we call CLNPCB, which doesn't clear
enough locations in the PCB. Since GETWDS doesn't zero core, this has been a
time-bomb waiting to explode for quite some time.
[Cure]
Clear yet more locations in CLNPCB.
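
The hazard of an allocator that does not zero storage can be shown with a
toy model. This Python sketch is an assumption-laden illustration: the
list-based pool is hypothetical, and it clears every word, whereas the
actual cure only clears additional locations in the PCB.

```python
def getwds(pool, nwords):
    """Hand back storage without clearing it, as GETWDS is described."""
    return pool[:nwords]  # may still hold values from a previous owner


def clnpcb(pcb):
    # Clear every word so no stale value is mistaken for live state.
    for i in range(len(pcb)):
        pcb[i] = 0
    return pcb
```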
[Comments]
Sigh.
[Keywords]
ANFSBA
Dirty core
[Related MCOs]
None
[Related SPRs]
36045
[MCO status]
None
[MCO attributes]
PCO required
[Validity]
Monitor Load Module Tags
------- ------ ------ ------
704 310 NETSER CLNPCB
703A
[End of MCO 13584]