TOPS-20 TROUBLE-SHOOTING HANDBOOK
=================================
Release 4.1 and 6.1 Edition
October 1985
TOPS-20 Monitor Support Group
Marlboro Support Group
Software Services
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 2
Introduction
INTRODUCTION
------------
This document is the TOPS-20 Trouble-Shooting Handbook. It is a
collection of materials designed to increase the effectiveness of the
Software Specialist in the field in coping with TOPS-20 problems. Some
of the common "disasters" to befall TOPS-20 sites are discussed, along
with debugging methods in general. Though the information contained
herein is probably not sufficient to make a Specialist into a TOPS-20
"wizard", it should help ease the communication burden between the
Specialist in the field and his counterpart in Marlboro and lead to
quicker resolution of problems.
This document contains materials from many sources, and presents
some information not available anywhere else. Certain sections may be a
bit dated, but an effort has been made to remove at least some of the
old/wrong stuff along with including new articles.
There is a continuing need to update this document as part of the
SWSKIT materials, and Specialists are encouraged to give the Marlboro
Support Group feedback on these materials. This communication can be
via the Hotline, or by writing to the following address:
TOPS-20 Monitor Support Group
Digital Equipment Corporation
200 Forest Street, MRO1-2/H22
Marlboro, Massachusetts 01752
Table of Contents
-----------------
1. Introduction . . . . . . . . . . . . . . . . . . . . . 2
2. Table of Contents . . . . . . . . . . . . . . . . . . 3
3. Policy Statement . . . . . . . . . . . . . . . . . . . 5
4. Producing a Good SPR . . . . . . . . . . . . . . . . . 6
5. Using SIRUS . . . . . . . . . . . . . . . . . . . . . 9
6. DDT Patching the TOPS-20 Monitor . . . . . . . . . . . 16
7. Mapping Directories in MDDT . . . . . . . . . . . . . 20
8. Recovering from Directory Errors . . . . . . . . . . . 22
9. More About Directory Problems . . . . . . . . . . . . 25
10. JSB and PSB Mapping . . . . . . . . . . . . . . . . . 27
11. Breakpointing Multi-User Code . . . . . . . . . . . . 30
12. Using Address Break to Debug the Monitor . . . . . . . 32
13. Recovering from System Disasters . . . . . . . . . . . 35
14. Looking at Hung Tapes . . . . . . . . . . . . . . . . 41
15. A Look at Some of the Disk Stuff . . . . . . . . . . . 45
16. Disk Features of FILDDT . . . . . . . . . . . . . . . 49
17. Supported Disk Drive Parameters . . . . . . . . . . . 51
18. Supported Tape Drive Parameters . . . . . . . . . . . 52
19. TOPS-20 Scheduler Test Routines . . . . . . . . . . . 53
20. TOPS-20 Page Zero Locations . . . . . . . . . . . . . 62
21. TOPS-20 Monitor Sections . . . . . . . . . . . . . . . 67
22. TOPS-20 Monitor PSECTs . . . . . . . . . . . . . . . 68
23. TOPS-20 Monitor Universals . . . . . . . . . . . . . . 69
24. TOPS-20 Job Zero Forks . . . . . . . . . . . . . . . 70
25. Known Hardware Deficiencies List . . . . . . . . . . . 71
26. KS10 Console Information . . . . . . . . . . . . . . . 74
27. BOOT Command String Functionality . . . . . . . . . . 82
28. Crash Analysis Fundamentals . . . . . . . . . . . . . 84
29. More Crash Analysis . . . . . . . . . . . . . . . . . 103
30. Referencing the CST Entries under Release 6 . . . . . 122
31. The BUG Macro . . . . . . . . . . . . . . . . . . . . 123
32. Monitor Building Hints . . . . . . . . . . . . . . . . 125
33. EXEC Debugging . . . . . . . . . . . . . . . . . . . . 130
34. Recovering from a Bad EXEC . . . . . . . . . . . . . . 136
35. Debugging the GALAXY System . . . . . . . . . . . . . 137
36. Debugging MOUNTR . . . . . . . . . . . . . . . . . . . 151
37. Debugging PA1050 . . . . . . . . . . . . . . . . . . . 154
38. Copying Floppy Disks . . . . . . . . . . . . . . . . . 155
39. The SWSKIT Documentation Files . . . . . . . . . . . . 157
40. The SWSKIT Tools Programs . . . . . . . . . . . . . . 160
41. Index . . . . . . . . . . . . . . . . . . . . . . . . 162
LEGAL POLICY CONCERNING THE TOPS-20 SWSKIT
------------------------------------------
There is considerable confusion about the materials that make up
the SWSKIT tape and about their legal standing. This memo is an attempt
to clear up some of that confusion.
The SWSKITs are made up of an assortment of materials intended to
increase the effectiveness of the software specialist. These materials
include program sources not normally distributed or sold for a premium;
internal and company confidential documentation, which may be in part
incomplete or actually incorrect, but supplied for the information value
on subsystems which may be insufficiently documented through the usual
channels; documentation for specialists specially produced by the
corporate support people; and utility programs produced and maintained
to some extent by corporate support. In addition, the SWSKIT may
contain special or pre-release versions of supported software provided
for the incremental value a specialist may obtain from the software
under controlled circumstances. In time, utilities from the SWSKIT may
evolve into supported or generally distributed products (for example
FILDDT, SYSDPY, REV, CHANS, MONRD, UNITS, etc.).
All of the SWSKIT materials are proprietary to DIGITAL, and were
never intended to be just given to the customer. Obviously, the
materials which are otherwise sold cannot be given away; and the
company confidential materials should not be. While it is expected that
the tools programs may wind up being used at customer sites, neither are
they gifts to the customer. An effort must be made to protect DIGITAL's
rights to these proprietary materials. For instance, a PL90 contract
retains rights to all materials provided to the customer. Deleting a
tool program after use at a customer site shows intent to protect those
rights. There
should be an awareness that if a customer incurs damages due to use of
some program given to him by the specialist, even though improperly
used, then DIGITAL may be seen to be at least in part responsible. This
should be avoided.
In summary, the SWSKIT is a tool provided to increase the
effectiveness of the specialist, especially with regard to PL90 and
debugging activity, but the rights to all materials remain with DIGITAL
and the specialist should act accordingly.
THIS IS NOT A LEGAL DEPARTMENT DOCUMENT. CONSULT LEGAL IF YOU HAVE
ANY DEFINITE PROBLEMS REQUIRING RESOLUTION.
PRODUCING A GOOD SPR
--------------------
A software specialist is often asked to assist with the
submission of SPRs for a customer. It is always discouraging to
have problems getting an answer to an SPR for entirely
non-technical reasons. For that reason, below are some hints for
producing a "good" SPR which will help in getting the problem
solved more quickly.
1.0 THE SPR FORM
Much of the data on the SPR form seems unimportant until it is
omitted. The product data line is one example. Try to isolate the
problem to the correct component, since that will determine who
first receives the SPR. This will save the time it takes for,
say, the COBOL maintainer to determine that the problem is not
really in COBOL, but in PA1050 or the monitor, and the time it
takes for the next maintainer to become familiar with the problem.
Something which crashes the system is ALWAYS a monitor problem,
even if it is an EXEC command which causes the problem, or a short
BASIC program.
If you really have a problem, be sure to mark the "problem"
box, and don't use words like "we suggest you correct the
following situation...". If the people who handle the incoming
paperwork think they have a suggestion, it may get routed
elsewhere, and never seen by the appropriate maintainers. A few
problems have been greatly delayed this way.
The priority boxes are not super-critical, but if you have a
problem which is holding up production, or crashing the system
several times a day, try to make a note of that somewhere in the
description of the problem and mark the high-priority box. That
should let the maintainer know that a work-around may also be
appropriate in the short term. Customer-marked high priority SPRs
are generally the first priority for answering.
The phone number of the submitter could be important if the
problem proves not to be reproducible, or if it is complex enough
that further clarification might be needed just to understand
it. Your number here as a
software specialist provides a more informal contact than direct
maintainer-to-customer confrontation, although the customer will
be contacted directly if that is most expedient.
The attachments--be sure to mark some of these boxes if you
send along supporting materials. Since these can get separated
from the form, this will help keep them from getting permanently
lost.
The "DO NOT PUBLISH" box is for security problems and ways to
crash the system. We double-check this during incoming
processing, but if the box is checked you can be sure that the SPR
will not be published unanswered.
Describe the problem as clearly as possible in the space
provided. Try to provide enough detail to easily reproduce the
problem. Concentrate on the description of the problem, and any
diagnosis you may have made. Attempting to declare a "cure" is
not always a good idea because the actual correction may be of an
entirely different nature for a number of reasons. However, if
you have something that works, the information could be of use.
Just don't count on that exact change being the actual fix. If
the problem is not reproducible from the description given,
chances are that something you left out is relevant to the
problem. Unless the problem directly concerns them, things like
logical names, mounted structures, and other features often
obscure the problem. For the purpose of the problem description,
a terminal listing of an occurrence is often highly desirable, and
it is sometimes a good idea to create a brand-new directory
without any fancy LOGIN.CMD setups or user groups and so on to
demonstrate the problem.
2.0 THE SUPPORTING MATERIALS
As above, the listing from a terminal session is often a very
good attachment. Try to include all the relevant information.
Again, sometimes things like logical names, file and directory
protections, user groups, and other job-state variables are
important and should be included. Inclusion of data such as
program version numbers and edit levels can be essential for
products with large numbers of edits. If you are complaining of
monitor problems, which patches you have installed could be useful
information. Terminal sessions should be as clear as possible.
It should be made obvious just what is going on or the maintainer
may just see a series of commands and think "So?". Concurrent or
after-the-fact commenting is one way to accomplish this.
Many times there is a program which exercises the bug.
Sometimes these programs are alright as they are, but often they
are giant COBOL monsters working on a multi-RP06 data base, and
very unwieldy for a maintainer to try to work with. If the
program can be reduced to a small subset, do so. Many monitor
problems often turn out to be reproducible from a set of arguments
to a single JSYS. If it is a question of incorrect output from
some program, it is helpful to send along all the files needed to
reproduce the problem, and the files of incorrect output. In the
case of programs with multiple edits to field-image, this speeds
up the maintainer, since he does not have to manually apply those
edits to attempt to recreate your versions, and he can also check
the installation of the edits, if that is appropriate. And in
case the problem proves to be not easily reproducible the bad
output can at least be examined for clues.
In the case of a monitor crash, the problem may have been
reduced to a program of less than one page. It is tempting to
type this on the front of the SPR and send it in that way. While
the maintainer can type in the program easily enough (if the copy
is both legible and correct), the submitter has been lax.
Sometimes, that short program will not cause a crash, even though
run thousands of times under varying conditions by the maintainer.
And even when it does cause the crash the first time, the
submitter has lengthened the turn-around by not sending the dump
from the crash along with the SPR. Sending the dump solves both
problems. If the problem is not reproducible with ease, the dump
is VITAL to further understanding. And having the dump to start
with speeds up the work of the maintainer who now does not need to
schedule stand alone to try to exercise the bug and cause a crash
so he has a dump to look at.
When sending a dump, always send the unrun monitor along with
it. If you don't, you are just causing a delay in handling the
problem while the maintainer tries it against the standard ones,
which involves finding tapes with the standard ones, and loading
them... If you are running an unpatched standard monitor, and you
refuse to send it, at least tell which one it is somewhere on the
form. The unrun monitor is also useful for checking the existence
and correct installation of patches when that becomes an issue.
The current preferred tape format is 9-track, 1600 bpi, in
standard DUMPER format rather than INTERCHANGE format, since file
information can be lost in INTERCHANGE format. Take the time to get
a directory listing of the tape and include it with the tape. It will
help to speed things up: if it is obvious from the directory
that something is missing, you will hear about it sooner. The
listing is also some indication that the tape will indeed be readable
when received, and it partly eliminates the maintainer's usual first
step of getting a directory of the tape.
As a final word, remember that the SPR is now the ONLY
official mechanism to get software problems resolved in the
development code for Autopatch and future versions. NO other
method is guaranteed to work. So be sure an SPR is generated for
every problem, preferably by the customer; and be sure the SPR
does not make the problem harder to solve.
USING SIRUS
-----------
Did you know that you can dial into a Marlboro development system
and type out almost any patch that the Marlboro Support Group has made
to -10 or -20 software in the last several years? The program which
does this is called SIRUS, and with it you can:
1. Search through all the patches to a particular product, if you
know a problem exists but don't know what the patch is or don't
know if we've heard of the problem. If you find the patch you
want, you can then type it out.
2. Type out a particular patch to a particular product, if you
know what the edit number is.
3. Obtain the status of any SPR, including the entire answer if it
has been answered.
By using SIRUS, you can get patches whenever the system is up, even
if it's 2 A.M. and the Hotline is closed. You can print patches in
your local office without having to wait for a specialist in Marlboro to
mail you a copy. You can be sure that the patch you have is correct.
(Dictating patches over the Hotline is very prone to errors.) Even if
the problem you are experiencing cannot be found in SIRUS, you can help
us when you call by so stating. We immediately know that the problem
you are having is a new one.
There have been several articles about SIRUS in previous Large
Buffers, but none have been oriented towards specialists in the field.
This one is!
To use SIRUS, dial into system CHERRY in Marlboro, log in, and then
run it. In more detail:
1. Dial into system CHERRY. The following number will connect you
to the machines in Marlboro at 300 or 1200 baud.
297-1550 (DTN)
(617)467-1550
You will now be talking to a MICOM data switch which will
autobaud your input if you type carriage returns. It will
then prompt for a system to connect to. You should type
"CHERRY" followed by a return. Once the machine notices you,
type "SET HOST CHERRY" to ensure that you are connected to
system CHERRY. If you get the message "?Undefined Network
Node", the machine is down (try again later).
2. To login, type "LOGIN 37,#". When the machine requests a name,
type one in. You WILL need a password, which you can obtain by
calling the Hotline operator.
3. To run SIRUS, just type "R SIRUS". SIRUS takes several seconds
to initialize itself and then prompts you with "PRODUCT [H]*".
At this point, type either "10<CRLF>" or "20<CRLF>" depending
on whether the customer of concern is running TOPS10 or TOPS20.
SIRUS then prompts you with "[H] *". You are now at SIRUS
command level.
SIRUS has many commands, but only a few are of interest to the
field specialist. They are:
1. H -- for Help. This may be typed anytime SIRUS precedes its
prompt with "[H]".
2. EX -- for Exit. Use this to exit SIRUS. Then type K/N to
logout, and hang up.
3. PP -- for Peruse PCOs. PCO stands for Product Change Order and
essentially means a patch. This command is used to look
through patches for a particular product if you aren't sure
which patch you want.
4. GP -- for Get PCO. This is used to type out a particular patch
once you know which one you want.
5. GS -- for Get SPR. Use this to retrieve information on a
particular SPR.
6. NP -- for New Product. Use this command if you type the wrong
answer to "PRODUCT [H]*" as mentioned above, or use it in
association with the PP command as described below. SIRUS will
prompt you for a product again.
The three most useful of these commands are PP, GP, and GS.
3.0 PP Command
Use this command to peruse the patches for a particular product --
e.g. LINK or 603 (monitor) or BATCON -- if you want to find a
particular patch you know exists, or if you want to know if the support
group has heard of and fixed some problem you are experiencing with a
product. After you type "PP<CRLF>" SIRUS will prompt for a component.
Here type the program you're interested in -- LINK, BATCON or whatever.
A response of LIST will type the programs SIRUS knows about and then
prompt you for a component again.
Once you type in the component, SIRUS prompts with "[H] PCO #:".
There are two reasonable responses to this. The first is ALL. (Type NO
to the subsequent question about a file.) This will give you a short
summary of all the patches available for this product, one line per
patch. This includes a PCO number, the SPR for which this patch was
written, the edit number corresponding to the patch (for the TOPS10
monitor this is the MCO number), a keyword describing the bug, the
maintainer who wrote the patch, and the date it was made. The other
response you might type here is simply <CRLF>. In this case SIRUS will
type out the symptom of the newest PCO, and then prompt you with
"NEXT?". By continuing to type carriage returns, you can type all the
symptoms of all the patches for this product, from the newest to the
oldest. When you have found the patch you want (remember the PCO
number), type RETURN to get back to SIRUS command level.
If you did not find your symptom while perusing, and your product
exists on both TOPS10 and TOPS20, you should also search the PCOs for
the alternate operating system. To do this, type NP to SIRUS command
level, and then type in the other product number when SIRUS asks for it.
Then peruse PCOs for your product as you did before.
4.0 GP Command
This is used to print out a patch once you know the PCO number.
The PCO number is printed while you are perusing PCOs and is of the form
10-product-nnn or 20-product-nnn. After typing GP to SIRUS command
level, SIRUS prompts for a PCO number. The leading "10-" or "20-" is
supplied by SIRUS, so your response should be of the form "product-nnn".
In response, SIRUS types out information about the patch. The two
most useful data are labeled VLD and SAE. VLD stands for validity and
is the version of the software to which the patch applies. SAE is
Source After Edit and is the edit or MCO number of the patch. To get
the actual text of the patch, respond YES to SIRUS's question "Show
Write-up File?".
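The PCO number handling described above can be sketched in a few lines
of Python. This is only an illustration and not part of SIRUS: the
function names are invented for this example, and both the three-digit
zero padding and the characters allowed in a product name are
assumptions based on the numbers SIRUS displays (for example
20-D60SPL-015).

```python
import re

# Assumed shape of a full PCO number: "10-" or "20-", a product name,
# and a decimal patch number.  The product-name character set is a guess.
_PCO = re.compile(r"(10|20)-([A-Z0-9.]+)-(\d+)")

def split_pco(pco):
    """Split a full PCO number such as '20-D60SPL-015' into its parts."""
    match = _PCO.fullmatch(pco.strip().upper())
    if match is None:
        raise ValueError(
            "not of the form 10-product-nnn or 20-product-nnn: %r" % pco)
    system, product, number = match.groups()
    return system, product, int(number)

def gp_response(pco):
    """Return the 'product-nnn' part, which is what GP asks you to type.

    The leading '10-' or '20-' is dropped because SIRUS supplies it.
    """
    _system, product, number = split_pco(pco)
    return "%s-%03d" % (product, number)
```

For instance, gp_response("20-D60SPL-015") yields the string
"D60SPL-015", which is the form of response the GP prompt expects.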
5.0 GS Command
This is used to get the status of an SPR. SIRUS will prompt for an
SPR number, and then will provide you with info about the SPR you
specified. This includes the site that submitted the SPR, the
specialist responsible for the SPR, the date received, and the date
closed, if the SPR has been answered. If answered, it will also say
whether or not an auxiliary file was written for the SPR and what PCOs
(if any) were included. The aux file is an introductory paragraph which
is written for most SPR answers. For SPRs which do not require patches,
the aux file constitutes the entire answer. The aux file can be typed
by responding YES to "SHOW AUXILIARY FILE?". The PCOs can be typed out
with the GP command.
Finally, if SIRUS begins to give you error messages such as "File
not found", EX from SIRUS and mount a special disk pack with the monitor
command "MOUNT SIRS:". Then try again. This gives you access to more
PCOs and aux files than are normally available.
For more information, see the example run of SIRUS below, in which
user input is shown underlined, or the article on SIRUS published in
volume 409 of the Large Buffer. Finally, SIRUS is for use by DIGITAL
personnel only. DO NOT give out instructions for its use or the system
CHERRY phone numbers to customers.
.R SIRUS
- -----
SIRUS...3(3)
[WHEN '[H]' APPEARS YOU MAY TYPE 'HELP' FOR ASSISTANCE]
PRODUCT [H]* 20
--
[H] *PP
--
[H] COMPONENT TO PERUSE: D60SPL
------
[PCO LIMIT FOR 'D60SPL' IS 15]
[H] PCO #:<CR>
----
[20-D60SPL-015]
DATE: 09-JUL-79 BY: BENCE
VLD:
[SYMPTOM]
Jobs sent to the LPT queue from D60SPL are given a random
file name and are billed to OPERATOR.
NEXT?<CR>
----
[20-D60SPL-014]
DATE: 09-JUL-79 BY: WEISBACH
VLD:
[SYMPTOM]
If the spooler is pausing, typing a GO can result in an
illegal instruction.
NEXT? ALL
---
DO YOU WANT A FILE? NO
--
PCO 015 SPR 12355 (6,022) KEY= LNAME BENCE 09-JUL-79
PCO 014 SPR 12225 OUTOUT (6,020) KEY= PAUSE WEISBACH 09-JUL-79
PCO 013 SPR 11660 LODVFU 6013(6,014) KEY= VFU WEISBACH 09-JUL-79
PCO 012 SPR 13244 D60CRE 103 (6,032) KEY= CARD L.NEFF 06-JUL-79
PCO 011 SPR D60CR4 103 (6,015) KEY= CARDS L.NEFF 03-JUL-79
PCO 010 SPR REQUEU 103 (6,030) KEY= CTQMFQ L.NEFF 14-JUN-79
PCO 009 SPR 12588 INTCTC 1 (6,026) KEY= CONTROL C TEEGARDEN 17-MAY-79
PCO 008 SPR 12881 OUTE.6 103 (6,025) KEY= REQUEUE NEFF 17-APR-79
PCO 007 SPR 12139 103 (6,019) KEY= ILLEGAL WEISBACH 27-OCT-78
PCO 006 SPR 12005 (0) KEY= SIMULTANEO BENCE 22-SEP-78
PCO 005 SPR 11672 ENDJOB 103 (6,018) KEY= QUASAR BENCE 18-SEP-78
PCO 004 SPR 11841 D60STK 103 (6,016) KEY= BAD WEISBACH 23-AUG-78
PCO 003 SPR 11476 TTYOUT 103 (6,010) KEY= OVERWRITE WEISBACH 12-MAY-78
PCO 002 SPR 11431 OUTE.6 (6,007) KEY= INTERRUPTS WEISBACH 12-APR-78
PCO 001 SPR 11456 D60SPL (6,006) KEY= BLANK WEISBACH 03-APR-78
[H] PCO #: RETURN
------
[H] *GP
--
[H] PCO #: 20-D60SPL-8
[20-D60SPL-008 RETRIEVED]
PROG: NEFF
COMPONENT: D60SPL
SER/SPR:20-12881
KEYS: REQUEUE /
ROUTNS: OUTE.6 /
VLD: 103(2304)
SBE %103 (6,024)
SAE %103 (6,025)
CRIT: N
DOC: N
F/D: F
TEST FILE: : [ ]
P-IND: 10
SHOW WRITE-UP FILE? YES
---
[WRITE-UP FILE]
008 NEFF
[SYMPTOM]
If a job is requeued because of a communications failure, with
D60SPL reporting that the station has signed off, then, when the
station signs on again, the print file will be restarted from its
beginning, not from the last checkpoint.
[DIAGNOSIS]
When the error is detected, routine OUTE.6 calls IBACK to
backspace the file five pages. IBACK zeroes the page counter,
J$RNPP(J), and rewinds the file, in the belief that the forward
spacing code will update the page count as it skips to the correct
page. However, D60SPL discovers the error is not recoverable and it
requeues the job immediately. Since the page count is never updated,
DOREQ requeues the job to start at the beginning of the file.
[CURE]
Preserve the page at which to resume printing over the call to
IBACK. if the job is to be requeued immediately, restore J$RNPP(J) so
that the job will be requeued and checkpointed five pages back from
its current position.
[FILCOM]
File 1) DSK:D60SPL.MAC[4,1022] created: 1724 09-Apr-1979
File 2) DSK:D60SPL.MAC[4,417] created: 1625 10-Apr-1979
1)1 LPTEDT==6024 ;EDIT LEVEL
1) LPTWHO==1 ;WHO LAST PATCHED
****
2)1 LPTEDT==6025 ;EDIT LEVEL
2) LPTWHO==1 ;WHO LAST PATCHED
**************
1)4 ;*****End of Revision History*****
****
2)4 ;6025 If a job printing on a remote printer is interruped by
2) ; a communications failure, requeue to start five pages back
2) ; instead of at beginning of file. LLN, SPR # 20-12881,
2) ; 10-APR-79
2) ;*****End of Revision History*****
**************
1)179 PUSHJ P,IBACK ;BACKSPACE THE FILE
1) PUSHJ P,INTON ;[6007]TURN INTERRUPTS BACK ON
1) PUSHJ P,D60NRY ;PERFORM "NOT READY" DIALOG
1) JRST OUTE.7 ;ERROR IS UNRECOVERABLE
1) TELL OPR,[ASCIZ /![LPT... continueing!]
****
2)179 ;**;[6025] ADD SEVERAL LINES AT OUTE.6 + 13L. LLN, 10-APR-79
2) MOVE T1,J$RNPP(J) ;[6025] CALCULATE THE NEW
2) SUB T1,N ;[6025] DESTINATION PAGE
2) PUSH P,T1 ;[6025] AND SAVE IT
2) PUSHJ P,IBACK ;BACKSPACE THE FILE
2) PUSHJ P,INTON ;[6007]TURN INTERRUPTS BACK ON
2) PUSHJ P,D60NRY ;PERFORM "NOT READY" DIALOG
2) JRST [POP P,J$RNPP(J) ;[6025] RESTORE PAGE NO. FOR REQUEUE
2) JRST OUTE.7] ;[6025] ERROR IS UNRECOVERABLE
2) POP P,(P) ;[6025] THROW AWAY DESTINATION
2) ;[6025] PAGE - FORWARD SPACING
2) ;[6025] CODE WILL HANDLE IT
2) TELL OPR,[ASCIZ /![LPT... continueing!]
**************
[END OF WRITE-UP FILE]
[H] *EX
--
EXIT
.
DDT PATCHING THE TOPS-20 MONITOR
--------------------------------
This article discusses how DDT patches are made to TOPS-20.
From time to time the Marlboro Support Group has to describe and
explain the DDT patching of TOPS-20 to Specialists from the field. The
following is an explanation, if not a justification, of the way some
things are done.
A DDT patch to TOPS-20 as published is, in essence, a terminal log
of a session applying the patch by hand. This differs from the occasional
practice of supplying a control file containing only the typein to DDT.
The raw typein has a few disadvantages compared with the log. Bare
control characters such as linefeeds and tabs are hard to display in a
publication format like the Software Dispatch, and even harder to edit
around with the only currently supported editor, EDIT. In addition, the
full typescript builds confidence if the DDT typeout seen while applying
the patch matches the typescript, and gives cause for concern if it does
not. The published patch IS an actual typescript, and is "proof" that
the patch CAN be correctly installed.
In applying the patch, the basic methodology, lacking innate
knowledge, is to just start typing from the typescript whenever the
computer goes into input wait. Any "$" appearing in a DDT session which
is not the prompt from the enabled EXEC should be the result of typing
an ESCAPE. (ESCAPE is sometimes referred to as ALTMODE or ALT.) In
order to avoid confusion, we try never to use any symbols containing a
dollar sign, and should make special note of any that do occur.
Starting at the top of a session, there are usually a few comments
about the patch. If we are currently patching multiple releases of
TOPS-20, the specific release for the patch should be noted here. Also
noted should be any hardware or monitor dependencies: KS- or KL-only,
or 2040, 2060, or ARPA only, etc.
The first monitor command is an ENABLE, followed by a GET of the
monitor file to be patched. Unless we are patching an existing patch,
our published patches always show us patching a "virgin" monitor file,
one without any previous patches installed. You should always be able
to duplicate the patch typescript yourself on an unpatched monitor.
At this point we do a START 140 command to get into DDT. There is
a fine distinction at this step between typing START 140 and typing DDT
to get into DDT. START 140 starts up EDDT (Exec-mode DDT) running in
user mode, which is the required action. Typing DDT to the EXEC would
merge SYS:UDDT.EXE with the monitor EXE file and start up UDDT
(User-mode DDT), which is not what we want. In fact, with Release 4 of
TOPS-20 the EXEC is clever enough to start up EDDT for us on the DDT
command also, but even so, for the sake of consistency, and to avoid
confusion, published patches should still use START 140.
After entering DDT, it is common to select the local symbol table
for the module to be patched in case there might be local symbol
conflicts, etc. This is done using the MODULE-NAME$: (ESCAPE colon)
construct.
Next follows the body of the patch. We purposely avoid the fancier
DDT commands when applying patches in order to avoid confusion. We try
to limit ourselves to a few DDT commands:
ADDRESS/    (slash) to open the location at ADDRESS
ADDRESS[    (open-square-bracket) similar to / but typeout is
            numeric, not symbolic
RETURN      to close the current location, storing any new
            value specified
LINE-FEED   to close the current location, storing any new
            value specified, and open the next location
TAB         a convenience command used to close the current
            location and open the location specified by the
            last reference; commonly used to get to and
            open location FFF immediately after inserting a
            JRST FFF instruction in the code
SYMBOL:     (colon) to define a symbol at the current location;
            usually to redefine FFF: further down in the
            patch space
FFF$< or    (ESCAPE open-angle-bracket or
FFF$$<      ESCAPE ESCAPE open-angle-bracket)
            to start a patch in the patch area named FFF
$>          (ESCAPE close-angle-bracket)
            to terminate a patch, which installs the jumps
            back to the inline code, redefines the FFF
            symbol value past the used patch space, and then
            inserts the initial jump to the patch into the
            inline code
Those who apply patches are of course free to use the more sophisticated
DDT commands to achieve the same effect.
A few TOPS-20 peculiarities should be explained here. TOPS-20
patches are applied using the FFF patch area. The default DDT patch
area symbol, PAT.., (used if no argument is given to an $< or $$<
command) should NEVER be used. You are apt to wind up with system
crashes since the PAT.. area may not be locked down. FFF is defined in
the module STG.MAC (which goes to the customers), and the area is 100
octal words long for version 4.1 and defined by the user-settable
parameter FFFSZE in version 6.0 (currently has value 400). FFF is part
of the resident monitor code PSECT RSCOD for v4.1 and the data PSECT
RSDAT for v6.0, and is always in memory. Special care must be taken
when installing patches not to overrun the patch area, which could also
result in system crashes. The first symbol past the FFF area is DTSCNW
for v4.1, and SVDTRJ for v6.0. If that symbol shows up while attempting
to install a patch, you may be in trouble.
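Because the patch area sizes are given in octal, it is easy to
misjudge how much room is left. The following Python sketch shows the
arithmetic only. The table and function names are invented for this
illustration, the 6.0 figure is the default FFFSZE value quoted above
(a site may have changed it), and the number of words already used
must be determined by looking at the monitor being patched.

```python
# Size of the FFF patch area in words, from the text above.
# Both values are octal: 100 octal = 64 decimal, 400 octal = 256 decimal.
FFF_SIZE = {
    "4.1": 0o100,   # fixed size in STG.MAC
    "6.0": 0o400,   # default value of FFFSZE (assumed unchanged)
}

def patch_words_left(release, words_used):
    """Return the number of free words left in the FFF area."""
    left = FFF_SIZE[release] - words_used
    if left < 0:
        raise OverflowError(
            "patch area overrun by %o (octal) words" % -left)
    return left

def fits(release, words_used, patch_length):
    """True if a patch of patch_length words still fits in the area."""
    return patch_length <= patch_words_left(release, words_used)
```

For example, with 70 (octal) words already used on a 4.1 monitor,
fits("4.1", 0o70, 0o10) is true but fits("4.1", 0o70, 0o11) is not:
the longer patch would run past the end of the FFF area.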
NOTE
For Release 5 and 5.1 of TOPS-20, the
patch area was moved and is no longer
found in STG, but in POSTLD, at the end of
the RSDAT PSECT, and so requires changes
to the LINK CCL file to expand the area.
Care should be taken so that the next
PSECT is not overlapped with patches. The
space reserved is now 400 (octal)
locations.
There is another patch space defined in TOPS-20, called SWPF, in
the swappable portion of the monitor. We always use FFF in preference
to SWPF since first, SWPF can only be used for patches to swappable
code, but FFF will work for either. Second, two patch areas in common
use might be confusing to the customers, specialists, and us. Third, if
we get a dump to examine from a customer, we can always check the FFF
area for possible (bad) patch installation. SWPF might be swapped out,
and not in the dump.
Unconventionally enough, the symbols FFF, FFF1, and FFF2 are all
defined together in STG.MAC with the same value. When DDT decides which
to type out when printing the symbolic form of an address, it finds FFF2
first, which accounts for the common appearance of FFF2 in patches. In
addition, just the symbol FFF is redefined on patch installation to
always point to the first free word of the remaining patch area. FFF1
and FFF2 are never redefined, and so should always point to the
beginning of the initial patch area built into the monitor. FFF2 should
never be typed in to DDT explicitly; any occurrence in a published patch
is DDT typeout, probably from a LINE-FEED command. Typing in the
FFF2-based symbols is a common source of error in applying patches,
because it writes over patches installed earlier in the patch area.
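The bookkeeping described above can be sketched outside the monitor.
The following Python fragment is only a toy model, not anything DDT
itself runs: the class name PatchArea and its methods are invented for
illustration. It shows FFF moving forward as words are used while FFF2
stays put, and it refuses a patch that would overrun the area.

```python
class PatchArea:
    """Toy model of the FFF patch area: a fixed block with a moving free pointer."""

    def __init__(self, size_words):
        self.size = size_words      # total words reserved (0o100 for v4.1)
        self.free = 0               # offset of FFF from FFF2; FFF2 never moves

    def allocate(self, words):
        """Reserve `words` locations and return the offset where they start."""
        if self.free + words > self.size:
            raise OverflowError(
                f"patch needs {words:o} words, only {self.size - self.free:o} left (octal)")
        start = self.free
        self.free += words          # the effect of redefining FFF:
        return start

    def fff(self):
        """Value of FFF written relative to FFF2, as DDT tends to type it out."""
        return "FFF2" if self.free == 0 else f"FFF2+{self.free:o}"


area = PatchArea(0o100)             # Release 4.1 size from the text
xxx = area.allocate(1)              # one literal word, like an XXX: label
patch = area.allocate(6)            # 3 typed instructions + 3 words added by $>
print(area.fff())                   # FFF2+7
```

Checking the free space this way before typing in a long patch is
cheaper than discovering DTSCNW or SVDTRJ in the typeout afterwards.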
Normally, in a DDT patch, lines which follow one another
immediately in the published patch are the result of typing LINE-FEED at
the end of the line, and not RETURN and the next address symbol. When
the $< and $$< commands are used, all lines from that point to the
terminating $> command should have been ended with LINE-FEED, using
successive locations in the patch space. The patches should show breaks
in this form by inserting extra blank lines in the published patch to
indicate a new "sub-section" of the patch.
The patching session is ended by the ^Z (Control-Z) command to exit
DDT properly. The Control-Z command is the correct way to exit from DDT
when applying patches. It allows DDT to do any final cleanup it may
need to do. Exiting via Control-C is NOT recommended when you are
installing patches, and is NOT guaranteed to work.
Finally, the patched monitor is saved away on a disk file. The
published typescript shows creating a new generation of the system
MONITR.EXE file, but a more conservative approach is to save the patched
monitor as some other name, and try running it experimentally during
system time before installing it as the default monitor.
And now for an annotated example:
@
@! PATCH TO RELEASE 3 AND 3A MONITORS TO CORRECT ENQ FROM
@! APPENDING A REQUEST TO THE WRONG LOCK BLOCK WHEN A STRING
@! AND USER CODE HAPPEN TO HASH TO THE SAME ADDRESS.
@! THE MAGIC NUMBER AT XXX: IS POINT 3,T2,2
@
@ENABLE (CAPABILITIES) !Appropriate releases noted above.
$GET SYSTEM:MONITR !Get the monitor
$START 140 !Enter user mode EDDT
DDT
ENQ$: !Open the symbol table for the module
FFF/ 0 XXX: 410300,,T2 !Store into the patch area and define
FFF2+1/ 0 FFF: ! label XXX: to point to it; redefine
! FFF to be the new first unused word
STRCMP+5/ MOVE T3,T2 FFF$< !Begin an $< patch at FFF
FFF/ 0 LDB T3,XXX !This line and the next are ended by
FFF+1/ 0 CAIN T3,5 ! LINE-FEEDs
FFF+2/ 0 RET$> !Terminate the patch
FFF+3/ MOVE T3,T2 !These 4 lines are typed out by DDT on
FFF+4/ JUMPA T1,STRCMP+6 ! terminating the patch
FFF+5/ JUMPA T2,STRCMP+7
STRCMP+5/ JUMPA FFF2+1 !And another blank line indicating end
! of this sub-patch region
^Z !Control-Z to exit DDT properly
$SAVE SYSTEM:MONITR !Save away the patched monitor
<SYSTEM>MONITR.EXE.2 Saved
$
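The "magic number" 410300,,T2 in the patch above is the byte pointer
POINT 3,T2,2. As a cross-check, the Python sketch below builds a byte
pointer from its fields and performs the equivalent of LDB on it. The
function names are invented for illustration, and T2 is assumed to be
accumulator 2.

```python
def point(size, address, last_bit, index=0, indirect=False):
    """Build a 36-bit byte pointer, like MACRO's POINT size,address,last_bit.

    last_bit is the bit number (0 = leftmost) of the rightmost bit of the byte.
    """
    p = 35 - last_bit                       # bits to the right of the byte
    left = (p << 12) | (size << 6) | (int(indirect) << 4) | index
    return (left << 18) | (address & 0o777777)


def ldb(pointer, memory):
    """Load the byte a pointer designates from `memory` (dict of address -> word)."""
    p = (pointer >> 30) & 0o77
    s = (pointer >> 24) & 0o77
    word = memory[pointer & 0o777777]
    return (word >> p) & ((1 << s) - 1)


def halfwords(word):
    """Format a 36-bit word as DDT prints it: left,,right in octal."""
    return f"{word >> 18:o},,{word & 0o777777:o}"


T2 = 2                                      # assumption: T2 is accumulator 2
xxx = point(3, T2, 2)
print(halfwords(xxx))                       # 410300,,2
```

With this pointer, ldb(xxx, {T2: word}) returns the top three bits of
the word held in T2, which is the value the patch compares against 5.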
MAPPING DIRECTORIES IN MDDT
---------------------------
Release 3 and later of TOPS-20 can take advantage of the extended
addressing features of the model B processor. Some of the data has been
reorganized and moved into non-zero sections of the addressing space.
One of the things moved was directories. Directories are now mapped
into section 2, starting at the beginning of the section. Thus the old
procedure of reading a user's directory in MDDT is no longer valid.
This article describes how to map a directory correctly for Release 3
and later.
You first have to find out the structure number and directory
number for the directory to be mapped. You can use the TRANSL command
to get the directory number, or use the ^EPRINT command to list the
directory information. As an example, suppose you want to find the
directory and structure information for the directory SNARK:<CURDS>.
You use TRANSL and obtain the results:
@TRANSLATE (DIRECTORY) SNARK:<CURDS>
SNARK:<CURDS> (IS) SNARK:[4,117]
The "programmer number" obtained is the directory number, in octal. In
this example, the directory number is 117. If the directory is in bad
shape, and you can't run TRANSL or use ^EPRINT, you will have to find
out the directory number by looking at the output from a DLUSER or ULIST
run, or from BUGCHK output.
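If the TRANSLATE output has been captured in a log, the directory
number can be pulled out mechanically. The Python sketch below is one
way to do it; the function name is invented for illustration, and the
only thing taken from the text is that the second number in the
brackets is the directory number in octal.

```python
import re


def parse_translate(line):
    """Return (structure, directory_number) from a line like 'SNARK:[4,117]'."""
    m = re.search(r"(\w+):\[([0-7]+),([0-7]+)\]", line)
    if m is None:
        raise ValueError(f"no [p,pn] pair found in {line!r}")
    structure, _project, programmer = m.groups()
    return structure, int(programmer, 8)    # the number is printed in octal


name, dirnum = parse_translate("SNARK:<CURDS> (IS) SNARK:[4,117]")
print(name, oct(dirnum))                    # SNARK 0o117
```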
To find the structure number, you have to work harder. If the
structure is mounted as PS:, its structure number is always 0. For
structures mounted other than PS:, you do the following. You get into
MDDT, and look at the table STRTAB. This table contains all of the
addresses of the structure data blocks in the system. The first word of
each structure data block is the structure name in SIXBIT. So you
search the table looking for the desired structure. The offset into
the table STRTAB is then the structure number. For our example:
@ENABLE
$SDDT
DDT
JSYS 777$X
MDDT
$$6T
STRTAB/ ,8[ / PS
STRTAB+1/ M^I / REL3
STRTAB+2/ M_% / SNARK
In the example above, you see that PS: is the first structure, followed
by the structures REL3: and SNARK:. Since the offset into STRTAB was 2
for SNARK:, the structure number you want is 2.
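The search through STRTAB is easy to reproduce against a dump. The
Python sketch below decodes SIXBIT words and returns the offset of the
wanted structure. The function names and the SDB addresses are
hypothetical; only the three structure names come from the example.

```python
def sixbit_encode(name):
    """Pack up to six characters into one 36-bit SIXBIT word, left-justified."""
    word = 0
    for ch in name.upper().ljust(6)[:6]:
        word = (word << 6) | (ord(ch) - 0o40)   # SIXBIT code = ASCII code - 40 octal
    return word


def sixbit_decode(word):
    """Unpack a 36-bit SIXBIT word into a string, dropping trailing blanks."""
    chars = [chr(((word >> shift) & 0o77) + 0o40) for shift in range(30, -1, -6)]
    return "".join(chars).rstrip()


def structure_number(strtab, sdb_first_words, wanted):
    """Offset into STRTAB whose SDB starts with the wanted name, or None."""
    for offset, sdb_address in enumerate(strtab):
        if sdb_address and sixbit_decode(sdb_first_words[sdb_address]) == wanted:
            return offset
    return None


# Hypothetical SDB addresses; only the names come from the example above.
strtab = [0o54321, 0o56776, 0o12345]
first_words = {0o54321: sixbit_encode("PS"),
               0o56776: sixbit_encode("REL3"),
               0o12345: sixbit_encode("SNARK")}
print(structure_number(strtab, first_words, "SNARK"))   # 2
```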
Knowing the structure number and the directory number, you can now
map the directory and look at it. When the directory is mapped,
location DIRORA points to the area of the monitor's address space where
it can be found. To map the directory, you call the routine MAPDIR,
which is in the module DIRECT. It takes two arguments. The directory number goes in
AC1, and the structure number goes in AC2. For our example, the output
looks like:
DIRORA[ 740000
740000/ ?
1! 117
2! 2
CALL MAPDIR$X
<SKIP>
740000[ 400300,,100
The skip return from MAPDIR means you have successfully mapped the
directory. You can now look at the whole directory by examining the
proper locations. The number of pages that are mapped by MAPDIR is the
length of a directory, so the whole thing is available to look at. By
examining or changing location 740000+N in core, you are examining or
changing location N of the directory. When you are finished, you can
just leave MDDT by jumping to MRETN or by typing ^C.
In release 3 and after, however, when you examine location DIRORA
after calling MAPDIR, it doesn't have to contain a section zero address.
If it does, then your machine cannot support extended addressing and the
monitor is running the same as release 2 did. In this case you can
ignore the rest of this document. If your machine does have extended
addressing, when you examine location DIRORA you will see the number
2,,0. This address is now in section 2 of the monitor.
For Release 4 of TOPS-20, the various flavors of DDT have been
trained to understand extended addresses, so the mapping contortions
used for 3 and 3A are unnecessary. On extended machines one can
reference section two directly as below:
DIRORA[ 2,,0
2,,0[ 400300,,100
When done, you can still just ^C out or jump to MRETN.
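The rule "location N of the directory is at DIRORA+N" holds for both
the section-zero and the section-two case. The Python sketch below
just does that arithmetic in DDT's left,,right notation; the function
names are invented for illustration.

```python
def parse_halfwords(text):
    """Convert DDT-style 'left,,right' octal text (or plain octal) to an int."""
    if ",," in text:
        left, right = text.split(",,")
        return (int(left or "0", 8) << 18) | int(right or "0", 8)
    return int(text, 8)


def format_address(address):
    """Print an address as section,,offset when it lies outside section 0."""
    section, offset = address >> 18, address & 0o777777
    return f"{section:o},,{offset:o}" if section else f"{offset:o}"


def directory_word(dirora, n):
    """Monitor address of word n of the mapped directory."""
    return format_address(parse_halfwords(dirora) + n)


print(directory_word("740000", 0o100))   # 740100
print(directory_word("2,,0", 0o100))     # 2,,100
```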
NOTE: if you have the Release 5 version of MDDT/EDDT that has
sticky current address section (see DDTxx.MEM) then be careful about
doing an MRETN$G after examining section 2, as a crash will result from
transferring to MRETN in section 2.
RECOVERING FROM DIRECTORY ERRORS
--------------------------------
Sometimes after a monitor crash due to disk problems, some of the
directories on the system will contain errors. These errors cause
BUGCHKs such as DIRFDB, NAMBAD, DIRPG0, and DIRPG1. It is sometimes
possible to find the error in the directory by getting into MDDT,
mapping the directory, finding what is wrong, and fixing it. This
procedure is described in the SWSKIT. However, this is not always easy,
and may take a lot of time. It is therefore better in many cases to
simply delete the bad directory and recreate it. This is easy to do for
most directories. But special procedures are necessary for the
directories <SYSTEM> and <SUBSYS>. The rest of this memo will describe
the methods of recovering from bad directories, handling in particular
the difficult case of the <SYSTEM> directory.
You can first try to give the EXPUNGE command with the REBUILD and
PURGE subcommands. If the problem with the directory is very simple, it
may fix your problem. As an example, suppose the directory
PS:<SICK-DIRECTORY> is incorrect. You would type:
$EXPUNGE (DIRECTORY) PS:<SICK-DIRECTORY>,
$$REBUILD (SYMBOL TABLE)
$$PURGE (NOT COMPLETELY CREATED FILES)
$$
PS:<SICK-DIRECTORY> [NO PAGES FREED]
$
If this does not help the problem, you will have to delete the
directory and then recreate it. Before proceeding, you should make sure
that any files you can reference are copied to another directory, or
else are saved on tape. Now first try to delete the directory normally,
as follows:
$BUILD (USER) PS:<SICK-DIRECTORY>
[OLD]
$$KILL
[CONFIRM]
$$
$
If this is successful, then simply recreate the directory and
restore the user's files. You should recreate the directory with
the same directory number as it had before, so that DLUSER's data will
still be correct.
The procedure above will fail if either the directory is mapped by
another job, or if it is totally unusable. If it is mapped, and the
directory belongs to an ordinary user, you can wait until it is no
longer in use, or you can take the system stand-alone so that no user
can reference it.
If the directory is totally unusable, you will then have to try to
delete it the hard way. Before proceeding, you should try to delete and
expunge all files in the directory. This will minimize the number of
lost pages that result. Now there are two cases to consider. If
the directory is not a sub-directory, you type the following:
$DELETE (FILE) PS:<ROOT-DIRECTORY>SICK-DIRECTORY.DIRECTORY,
$$DIRECTORY (AND "FORGET" FILE SPACE)
$$
<ROOT-DIRECTORY>SICK-DIRECTORY.DIRECTORY.1 [OK]
$
If the directory is a subdirectory, you modify the above command by
replacing "ROOT-DIRECTORY" by the name of the next higher directory.
Thus if the directory was PS:<ANOTHER.BAD-ONE>, you type:
$DELETE (FILE) PS:<ANOTHER>BAD-ONE.DIRECTORY,
$$DIRECTORY (AND "FORGET" FILE SPACE)
$$
<ANOTHER>BAD-ONE.DIRECTORY.1 [OK]
$
The above procedure tells the monitor to treat the directory file
like a normal file, and to delete it as such. This means that any files
in the directory will become "lost". The disk pages can be recovered
later with CHECKD. If the above works, you can simply recreate the
directory and restore the files.
The only reason the above command should fail is if the directory
is still mapped. For PS:<SUBSYS>, you can bring up the system
stand-alone so that no programs are run from it, and then delete it.
For PS:<SYSTEM>, even taking the system stand-alone will not help, for
it is always mapped by job 0. But there are two procedures you can use
which do work.
The safest method can be used if the user's system has mountable
structures. If you have built another PS: structure, you can mount the
pack with the bad directory as an alias, and then the directory will not
be mapped and can be deleted. As an example:
$MOUNT STRUCTURE SICK:/STRUCTURE-ID:PS:
STRUCTURE SICK: MOUNTED
$
$DELETE (FILES) SICK:<ROOT-DIRECTORY>SYSTEM.DIRECTORY,
$$DIRECTORY (AND "FORGET" FILE SPACE)
$$
SICK:<ROOT-DIRECTORY>SYSTEM.DIRECTORY.1 [OK]
$
Then you can build the new directory, restore the files to it, and
then use it again for your normal PS: pack. Be sure to build the new
directory with the same number. This is especially important for the
special system directories.
If you do not have another disk drive or another PS: disk, or if
you don't want to bother MOUNTing the disk, you can fix the <SYSTEM>
area by using MDDT. The basic idea is to patch the monitor so that it
no longer thinks that the directory is in use. This is done as follows:
$^EQUIT
INTERRUPT AT 17117
MX>/MDDT
CHKOFN/ JSP CX,.SAVE JRST RSKP
MRETN$G
$
Then you should have no problems deleting the directory.
Immediately after doing the delete, you should reload the system. When
the system restarts, you can read the monitor and the EXEC either from
the distribution magtape or from another directory where you had kept
copies. Then recreate the <SYSTEM> area, making sure to give it the
same directory number as it had before. Then you can restore the files
and let the users back on. Finally, you should run CHECKD to recover
the lost pages.
NOTE: The special system directory numbers are:
1 - <ROOT-DIRECTORY>
2 - <SYSTEM>
3 - <SUBSYS>
4 - <ACCOUNTS>
5 - <OPERATOR>
6 - <SPOOL>
7 - <NEW-SYSTEM>
10 - <NEW-SUBSYS>
11 - <SYSTEM-ERROR>
MORE ABOUT DIRECTORY PROBLEMS
-----------------------------
SOME HINTS FOR TRACING DIRECTORY PROBLEMS
NOTE -- Use the methods documented in the Operators Guide before
resorting to the methods below.
1. There is a file on the SWSKIT called DIRTST.EXE which will test
for inconsistencies in the directory pointers. This will tell
you just about everything.
2. Another program on the SWSKIT is DIRPNT, which prints out the
contents of the chained FDBs, the entire directory, an FDB, or the
symbol table. This might not work completely if the headers are bad.
3. Still another useful SWSKIT program is DS for debugging
directory problems. It can access the disk exclusive of the
directory structure and find directory pages, search for bit
patterns, dump disk pages, extract pages to a separate file
based on address, index block or super index block, and a host
of other functions. See DS.MEM or DS.HLP for more information.
4. If you get a BUGCHK:
Go into the monitor with MDDT and set a breakpoint at the
BUGCHK address, say, FDBBAD. Do the functions that cause the
BUGCHK; DIR, say. Trace down the bug. The relevant listings
are PROLOG and DIRECT. These give the directory format and
useful symbols.
5. If the pointers are destroyed or confused you can map in the
directory as follows:
@ENA
$^EQUIT ; get into MINI-EXEC
MX>/ ; get into MDDT
; To map in the directory, put the directory number
; in AC1. You can obtain the number from DLUSER or
; TRANSL or BUILD. The structure number goes in AC2.
; To find the structure number look at the table
; STRTAB. STRTAB contains a list of pointers to the
; SDBs of structures that are mounted. The structure
; numbers are equal to the offset into the STRTAB. To
; find out which structure has structure number
; 3 look at STRTAB+3. Contents of that location points
; to the SIXBIT structure name.
STRTAB/ 54321 ; str number 0
STRTAB+1/ 56776 ; str no 1
STRTAB+2/ 12345 ; str no 2
12345$6T/ FOO ; str no 2 is FOO:
1/ DIRECTORY NUMBER
2/ STR NUMBER
CALL MAPDIR$X
; Now you can look at the header pointers etc., and
; fix things up if you're lucky.
; See the section on system disasters for a checklist
; of things that could be wrong with the directory.
; Go back to the MINI-EXEC.
^P
MX>START
$
6. If you can't (or don't want to) recover the existing files you
can delete the directory and restore the files using a DUMPER
tape. See the previous article for methods of deleting the bad
directory.
JSB and PSB Mapping
An Easy Way to Examine the PSB and JSB of Another Job
-----------------------------------------------------
There is an occasional need to look at the state in detail of another
job on the system. A common reason for doing this is to find the cause
and cure of a "hung job" which cannot be logged out. To find out what
the job is doing you usually start by looking at the JSYS stack in the
PSB. But you cannot examine such data easily because the fork data in
the PSB and the job data in the JSB are not in the monitor's address
space until the fork is run. If you try to look at the PSB or JSB using
MDDT you will see the data for your own fork. The SWSKIT program MONRD
can provide just this sort of information, but has a few limitations,
and one occasionally needs "direct" access to the data for another fork.
To get it, you must do what the monitor does, and that is to map it.
The procedure to do so is this:
1. Do a "GET" of the file the monitor was loaded from, usually
SYSTEM:MONITR.EXE.
2. Enter user mode DDT in the file you got, and then do a JSYS 777
to get into MDDT.
3. Find out the SPT indexes as before, and call MSETMP to map the
PSB or JSB to the USER address space, in the correct place!!
4. Return from MDDT, and examine PSB and JSB locations directly,
and see the correct data in the right place.
5. When you are done, just ^C and do a RESET.
The rest of this article walks through the procedure above step by
step, using an example. Assume that we wish to examine the
state of fork 105, which belongs to job 21. We then begin:
@ENABLE !Get a copy of the monitor
$GET PS:<SYSTEM>MONITR.EXE
$START 140 !Get into user DDT
DDT
JSYS 777$X !Enter MDDT
MDDT
!Following is an example of the procedure to map the JSB of a job:
FKJOB+105[ 25,,2035 !Get the SPT index of the JSB
!of fork 105
T1! 2035,,0 !Put SPT index in left half
T2! 540000,,JSBPGA !* Flags and where to map to
T3! JSLSTA'1000-JSBPGA'1000 !Number of pages to map
CALL MSETMP$X !Do the mapping
$
!Following is an example of the procedure to map the PSB of a fork:
FKPGS+105[ 2657,,2332 !Get the SPT index of the PSB
!of fork 105
T1! 2332,,PSBMAP-PSBPGA !Put SPT index in left half,
!and offset in right half
T2! 540000,,PSSPSA !* Flags and where to map to
T3! PSBMSZ !Number of pages to map
CALL MSETMP$X !Do the mapping
$
!Example of returning to user mode and looking at data from both
!the PSB and the JSB of the fork:
MRETN$G !Return to user mode
$
USRNAM[ 3 !Examine job's user name
USRNAM+1[ 422050,,546230 $T;DBELL
CTRLTT[ 777777,,777777 !Controlling terminal
FILBYT+MLJFN[ 4400,,334010 !Start of data block for JFN 1
PPC/ T1,,DISXE#+2 !Current PC of the fork
PAC+17/ -215,,UPDL+62 !Current stack pointer
UPDL/ CHKHO5# !First few stack locations
UPDL+1/ CAM CHKAE0#+12
UPDL+2/ CHKHO5#
UPDL+3/ CAM CHKAE0#+12
UPDL+4/ T1,,.COMND+1
!Example of terminating the mapping we have done:
^C
$RESET !To finish, just quit and reset
$
The procedure as given above maps the JSB and PSB write-enabled. So if
you find something you want to change, you can simply deposit the new
value into the location. If you want the data to be write-protected,
then change the 540000 to 500000 in the two steps marked with an
asterisk.
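The three arguments typed into T1, T2, and T3 are simple to work out
by hand, but a sketch may help when checking a typescript. The Python
fragment below assumes that the SPT index is the right half of the
FKJOB or FKPGS entry (as the example shows) and that ' in the T3
expression is DDT's divide operator. The function name is invented,
and the two addresses standing in for JSBPGA and JSLSTA are made up.

```python
PAGE_SIZE = 0o1000                      # words per page


def right_half(word):
    return word & 0o777777


def msetmp_args(table_word, flags, map_to, first, last, offset=0):
    """Return (T1, T2, T3) as integers for the mapping call shown above.

    table_word : contents of FKJOB+fork or FKPGS+fork; SPT index is the right half
    flags      : 0o540000 for write-enabled, 0o500000 for write-protected
    map_to     : address to map to (right half of T2)
    first,last : first and last addresses of the region (JSBPGA, JSLSTA)
    """
    t1 = (right_half(table_word) << 18) | offset
    t2 = (flags << 18) | map_to
    t3 = last // PAGE_SIZE - first // PAGE_SIZE     # page-number difference
    return t1, t2, t3


# FKJOB+105 contained 25,,2035 in the example.  The two addresses are
# made-up stand-ins for JSBPGA and JSLSTA, not real monitor values.
JSBPGA, JSLSTA = 0o620000, 0o640000
t1, t2, t3 = msetmp_args((0o25 << 18) | 0o2035, 0o540000, JSBPGA, JSBPGA, JSLSTA)
print(f"{t1 >> 18:o},,{t1 & 0o777777:o}")   # 2035,,0
print(f"{0o540000 ^ 0o500000:o}")           # 40000, the bit that enables writing
```

Note that the left half of the FKJOB entry in the example, 25 octal,
is job 21 decimal, which is a quick sanity check that the right fork
was looked up.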
WARNING: The procedure of mapping things into your user address space
has its limitations. Mapping the JSB and PSB works because the user
core used for mapping was previously empty. In general, you can only
map things into your user core if your core pages are either nonexistent
or are private. If you call MSETMP or SETMPG and map something over a
shared page, the old file page is unmapped without the share counts
being updated, which prevents your job from logging out later. To get
around this problem you can BLT your core image to force all of the
pages to be private.
The SWSKIT tools program MONRD is able to examine the JSB and PSB of any
job/fork on the system, and is now the preferred method of obtaining
this sort of information, unless the ability to modify the data or use
advanced features of DDT is required.
Breakpointing Multi-User Code
HOW TO USE BREAKPOINTS IN CODE THAT MANY USERS EXECUTE
------------------------------------------------------
When inserting a breakpoint into the running monitor, you have to be
careful that no other users will execute the code containing the
breakpoint. If some other user hits the breakpoint, they will blow up
with an illegal instruction since MDDT will not be there to handle the
breakpoint. This normally limits the places you can set breakpoints,
since most of the monitor can be gotten to by any user. Even if you run
the system stand-alone, it is possible that the routine you are
debugging will be called by job 0. However, it is still possible to do
such debugging, even on a system which is not stand-alone, and this
document will describe how this is done.
The essential element of this technique is to put in the patch in such a
way that only your own fork can ever reach the breakpoint. First you
write a simple routine which will skip if it is not being run by your
particular fork. This can be done easily if you remember that the
location FORKX contains the currently running fork number. An example
of such a routine is the following:
@ENABLE
$SDDT
DDT
JSYS 777$X
MDDT
FORKX[ 23 ; check our fork number
FFF/ 0 NOTME: PUSH P,T1 ; save an AC
NOTME+1/ 0 MOVE T1,FORKX ; get currently running fork number
NOTME+2/ 0 CAIE T1,23 ; is it us=23?
NOTME+3/ 0 AOS -1(P) ; no, setup skip return
NOTME+4/ 0 POP P,T1 ; restore the saved AC
NOTME+5/ 0 POPJ P, ; and return to caller
NOTME+6/ 0 FFF: ; reset the position of FFF
The routine above simply saves AC T1, gets the currently running fork
number, compares it with your own fork number which you obtained by
looking at location FORKX, and skips if they differ.
Now assume that you want to set a breakpoint into the following code,
which is in the routine BLKSCN in the module DIRECT.
BLKSC2/ HLRZ C,BLKTAB(B)
BLKSC2+1/ CAME A,C
BLKSC2+2/ AOBJN B,BLKSC2
BLKSC2+3/ JUMPGE B,BLKSCE
BLKSC2+4/ HRRZ B,BLKTAB(B)
Assume you want the breakpoint at location BLKSC2+3. You do the
following:
BLKSC2+3/ JUMPGE B,BLKSCE FFF$< ; patch this location
FFF/ 0 PUSHJ P,NOTME ; call the NOTME routine
FFF+1/ 0 .$B JFCL$> ; only our fork gets here, so
FFF+2/ JUMPGE B,BLKSCE ; set the breakpoint
FFF+3/ JUMPA A,BLKSC2+4
FFF+4/ JUMPA B,BLKSC2+5
BLKSC2+3/ JUMPA NOTME+6
Notice that the breakpoint has been set in the JFCL instruction
following the call to NOTME. Only your fork will execute it, so you can
now debug the section of code while other users are executing it at the
same time. Remember to remove the breakpoint when you are done.
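The control flow of the patch can be modelled in a few lines. The
Python sketch below is only an illustration of the skip-return logic,
with invented function names; it shows that a stream of forks passing
through the patched location reaches the breakpoint only when the
running fork is the one recorded from FORKX.

```python
MY_FORK = 0o23                      # value read from FORKX in the example


def notme(current_fork, my_fork=MY_FORK):
    """Model of NOTME: True means 'skip return' (some other fork is running)."""
    return current_fork != my_fork


def patched_location(current_fork, hits):
    """Model of the patch: PUSHJ P,NOTME, then the JFCL carrying the breakpoint."""
    if not notme(current_fork):     # no skip: fall into the JFCL
        hits.append(current_fork)   # the breakpoint fires only here
    # either way the original JUMPGE B,BLKSCE runs next


hits = []
for fork in (0o17, 0o23, 0o105, 0o23):
    patched_location(fork, hits)
print([oct(f) for f in hits])       # ['0o23', '0o23']
```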
To run a particular program while having breakpoints set, you must
remember that the breakpoint has to be set by the same process which you
expect to hit it. So, for example, typing ^EQUIT, setting a breakpoint,
returning to the EXEC, and running your program will not work. You must
enter MDDT and set the breakpoints from within the program you want to debug.
As an example:
@ENABLE
$GET PROGRAM ; get the program to be used
$DDT ; enter DDT
DDT
JSYS 777$X ; and enter MDDT from there
MDDT
(PUT IN "NOTME" ROUTINE AND SET BREAKPOINTS HERE)
MRETN$G ; return to the context of the test program
$
$G ; start the test program
Using Address Break to Debug the Monitor
----------------------------------------
Sometimes when examining a set of dumps, you will notice the crashes
are caused by some location being destroyed. If you have no idea
where the destruction is done from, finding the problem could be very
difficult. One useful procedure in such cases is to use the address
break feature of the hardware to track down the problem (except for
2020's!). The only problem is that the use of address break is not
obvious. This article describes how to use address break in the
TOPS-20 monitor for releases 4.1 (model A)/5.1 and 6.0 (model B).
In order to use address break, four things must be done. First,
the current routines the monitor uses to set address breaks for users
must be disabled. Secondly, your own address break must be set from
MDDT or EDDT. Thirdly, instructions which you want to execute
properly have to be modified so that they will not cause an unwanted
address break. Finally, breakpoints must be placed in the monitor so
that the state of the monitor can be examined when the address break
occurs. The following is a step by step example of doing this.
1. Load the monitor for debugging, and enter EDDT. The procedure
starting from BOOT is the following:
BOOT>/L ;Load monitor but don't start it
BOOT>/G140 ;Start EDDT
EDDT
DBUGSW/ 0 2 ;Set debugging mode
EDDTF/ 0 1 ;Keep EDDT once system starts
GOTSWM$B ;Install useful breakpoint
SYSGO1$G ;Start the monitor
[PS MOUNTED]
$1B>>GOTSWM 0$1B ;Remove breakpoint now
2. Disable the monitor's normal changing of the address break.
For Release 4.1 this is currently done at two places:
KISSAV+4/ DATAO UNPFG1+26 JFCL ;Disable instruction
SETBRK+12/ DATAO A JFCL ;Here too
For Release 6 do not change these locations. Routine STEXDM
used in the next step will take care of this.
3. Set your own address break at the desired location. Refer to
the Hardware Reference Manual for details. The instruction to
set an address break is:
DATAO APR,ADDR ;Note: APR = 0
where ADDR contains the following fields:
Bits Description
---- -----------
9 Break at given address on instruction fetches
10 Break at given address on reads
11 Break at given address on writes
12 0=exec address space, 1=user address space
13-35 Address to break on.
So now assume you want to catch a bug which is blasting
location CURDS. You want to break only for writes, and want
to use exec virtual space. Therefore you type the following:
For Release 4.1:
FFF/ 0 100000000+CURDS ;Put data in convenient place
DATAO APR,FFF$X ;Set the address break
For Release 6, STEXDM will set the break and notify the monitor:
T1/ 0 100000000+CURDS ;Put data in convenient place
CALL STEXDM$X ;Set the address break
4. Now you want to disable address break for all instructions
which you expect to change the given location. Assume in this
example that only location DIDDLE should change location
CURDS. Then you do the following for a model B CPU:
FFF! IT: ;Define location to get old flags
IT+1! ;Old PC
IT+2! ;New flags
IT+3! IT+4 ;New PC
IT+4! EXCH IT ;Save AC and get old flags
IT+5! TLO 1000 ;Set address break inhibit bit
IT+6! EXCH IT ;Restore flags and AC
IT+7! XJRSTF IT ;Return to caller
IT+10! FFF: ;Redefine FFF
DIDDLE/ MOVEM A,CURDS FFF$< ;Insert patch
FFF/ 0 XPCW IT$> ;Call above routine
FFF+1/ 0 MOVEM A,CURDS ;Typed by DDT when finishing patch
FFF+2/ 0 JUMPA A,DIDDLE+1
FFF+3/ 0 JUMPA B,DIDDLE+2
DIDDLE/ MOVEM A,CURDS JUMPA IT+10
The XPCW IT instruction is used to save the old PC at IT and
IT+1, and take a new PC from IT+2 and IT+3. There the old PC
is changed to include the address break inhibit bit. Then a
XJRSTF IT is done which returns to the caller. The next
instruction then executes without causing an address break.
You have to insert the XPCW IT instruction at every
instruction you want to succeed.
For model A CPUs the procedure is similar, but a little easier:
FFF! IT: ;Define location to hold PC
IT+1! EXCH IT ;Get old PC and save AC
IT+2! TLO 1000 ;Set address break inhibit flag
IT+3! EXCH IT ;Restore PC and AC
IT+4! JRSTF @IT ;Return to caller
IT+5! FFF: ;Redefine FFF
DIDDLE/ MOVEM A,CURDS FFF$< ;Insert patch
FFF/ 0 JSR IT$> ;Call above routine
FFF+1/ 0 MOVEM A,CURDS ;Typed by DDT when finishing patch
FFF+2/ 0 JUMPA A,DIDDLE+1
FFF+3/ 0 JUMPA B,DIDDLE+2
DIDDLE/ MOVEM A,CURDS JUMPA IT+5
5. Now put the breakpoints into the monitor so that when an
address break occurs, you will get into EDDT. There are two
locations to patch, one for PI level and one for non-PI level.
ADRCMP$B ;Set breakpoint at non-PI routine
PFCD23$B ;Set breakpoint at PI routine
$P ;Now let the monitor proceed
6. When either of the above breakpoints is hit, the flags and PC
of the instruction which caused the address break will be in
locations TRAPFL and TRAPPC. If the address break was from
JSYS level (breakpoint was to ADRCMP and location INSKED is
zero) then an $P will proceed properly. If the address break
was from the scheduler or from PI level, doing $P will be
useless since the monitor will then BUGHLT because it doesn't
want to see an address break under these conditions. However,
this is ok if all you want to do is find the instruction
causing the trashing.
If the location still gets trashed after trying to catch it this
way, then either your procedure is wrong (for example, you are trying
this on a 2020, which has no address break feature), the location is
being changed by some I/O being done (RH20s, DTEs, etc.), or the
machine is having hardware problems.
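The constant 100000000 used in step 3 follows directly from the field
table there, since bit 0 is the leftmost bit of the 36-bit word. The
Python sketch below assembles the address break word from its fields.
The function names are invented for illustration and the value used
for CURDS is made up.

```python
def bit(n):
    """Value of bit n in a 36-bit word; bit 0 is the leftmost bit."""
    return 1 << (35 - n)


ADDRESS_MASK = bit(12) - 1              # bits 13-35 hold the address


def address_break_word(address, fetch=False, read=False, write=False, user=False):
    """Assemble the word that DATAO APR, reads, from the field table in step 3."""
    word = address & ADDRESS_MASK
    if fetch:
        word |= bit(9)
    if read:
        word |= bit(10)
    if write:
        word |= bit(11)
    if user:
        word |= bit(12)
    return word


CURDS = 0o12345                         # made-up address, for illustration only
word = address_break_word(CURDS, write=True)
print(f"{word:o}")                      # 100012345
print(f"{bit(8) >> 18:o}")              # 1000, the left-half bit set by TLO 1000
```

The second line printed shows that the 1000 in the TLO 1000
instruction of step 4 is bit 8 of the flag word.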
RECOVERING FROM SYSTEM DISASTERS
--------------------------------
There are some common system disasters which in many cases can be
recovered from quickly and with a minimum of effort. The four we will
discuss in this article are:
1. Hung Terminals
2. Hung Jobs
3. Hung SETSPD
4. Trashed Disks
1.0 HUNG TERMINALS
Hung terminals are usually the result of one of two problems. Either the speed
has been set incorrectly for that terminal type or a problem exists
between the KL and the front end. If the problem is a result of an
improper speed setting, then simply resetting the speed will be
sufficient. On the other hand, if the problem is due to some sync
problem between the KL and the 11 then the easiest way to recover from
this is to reload the front end. This can be done by depressing the
halt switch on the operator's console of the 11 and then placing it back
in the enable state. After about fifteen seconds, the message
[DECsystem-20 continued]
should be printed on the CTY. If this fails to free the terminal,
perhaps the problem is a hung job. See the discussion under that
heading also. If the problem is recurrent or otherwise needs debugging,
there are a couple of ways of gathering the necessary information. One
is to take a dump of the system (see the Crash Analysis section) and the
front end. Another way is to use the DMPTTY program from the SWSKIT to
display the internal terminal state for the line. In many cases this
can substitute for the CPU dump, though any front-end state is still
unknown.
2.0 HUNG JOBS
A number of circumstances can cause a job to become hung, usually
while waiting for some resource to free up, some share count to become
zero, etc. Sometimes these tests are never satisfied and the job has
its PSI system turned off, and as a result the job becomes hung.
Freeing it up can be very tricky. The first thing to
try is to log the job out from some other terminal. If this doesn't
succeed in freeing the job up, then the next best thing is to detach the
job from the terminal and allow it to sit there. It may be using
negligible amounts of CPU time and causing no adverse effects on the
system. Zapping the job may crash the system, which in most cases is
not desirable. Use SYSDPY and note the scheduler tests
that the processes of the job are in for later reference (see the
Scheduler Tests section).
The next time the system is reloaded, be sure to get a dump of the
system with the hung job and submit it as an SPR (see the SWSKIT article
about getting informative Dumps).
3.0 HUNG SETSPD
This is a fairly common problem, usually brought on by a hardware fault.
It is possible to bring the system up without running SETSPD under JOB 0,
log in, and then try to run SETSPD under some other operator job.
If SETSPD then hangs, it is possible to CONTROL/C out of the program,
edit n-CONFIG.CMD to remove the commands suspected of hanging SETSPD,
and retry. In this way, while waiting for the problem to be
resolved, it is possible to continue timesharing.
To bring the system up without running SETSPD automatically, one need
only install the following patch to the MONITOR using EDDT on system
start up.
BOOT>/l
BOOT>/g141
EDDT
EDDTF[ 0 -1
DBUGSW[ 0 2
GOTSWM$B
SYSGO1$G
[PS MOUNTED]
1B>>GOTSWM
RUNDD3+7/ PUSHJ P,RUNDII JFCL (at RUNDD3+16 for v6.0)
0$1B
$P
%%No SETSPD
The system will then come up as usual except that SYSJOB will not run.
After the problem with SETSPD has been determined, SYSJOB can be run
by typing
COPY (FROM) <SYSTEM>SYSJOB.RUN (TO) <SYSTEM>SYSJOB.COMMANDS
This will cause all the commands in the SYSJOB.RUN file to be executed
by SYSJOB.
4.0 TRASHED DISKS
This is surely one of the biggest headaches facing a specialist.
Trashed disks come in many forms and recovering from these requires a
good knowledge of the structure of the TOPS-20 file system.
If the structure cannot be mounted, it is for one of the following
reasons:
1. Inconsistency in either of the HOM blocks
1. Word HOMNAM (0) of either HOM block not SIXBIT/HOM/
2. Word HOMCOD (176) of either HOM block not 707070
3. Word HOMHOM (5) of first HOM block not 1,,12
4. Word HOMHOM (5) of second HOM block not 12,,1
5. Word HOMFSN (173) of either HOM block not 20040,,47524
6. Word HOMFSN+1 (174) of either HOM block not 51520,,31055
7. Word HOMFSN+2 (175) of either HOM block not 20060,,20040
8. Right half of word HOMLUN (4) of either HOM block refers
either to a unit greater than the left half of word HOMLUN
or to a unit already verified
9. Word HOMSNM (3) of either HOM block does not agree with
SIXBIT/STRUCTURE-NAME/
10. No disk address for index block in word HOMRXB (10) of
either HOM block
2. Inconsistencies in Root-Directory page 0
1. Directory number in page 0 of Root-Directory not 1
2. Directory block type (DRTYP) of Root-Directory page 0 not
400300 (.TYDIR)
3. Relative Page number (DRRPN) of Root-Directory page 0 not 0
4. Top of symbol table (DRSTP) of Root-Directory page 0 out of
Directory bounds
5. Pointer to first free block (DRFFB) of Root-Directory page
0 not in page 0 of the directory
6. Pointer to Directory Name String (DRNAM) not under start of
symbol table
7. Directory name pointer (DRNAM) not 0 and Name string block
length (NMLEN) not at least 2 words long
8. Directory name pointer (DRNAM) not 0 and directory name
block header (NMTYP) not 400001 (.TYNAM)
9. Password block pointer not 0 and password string block
length (NMLEN) not at least 2 words long
10. Password block pointer not 0 and password string block
header (NMTYP) not 400001 (.TYNAM)
11. Account string block pointer not 0 and Account string block
length (NMLEN) not at least 2 words long
12. Account string block pointer not 0 and Account string block
header (NMTYP) not 400001 (.TYNAM)
13. Remote alias list pointer not 0 and Remote alias block
length (NMLEN) not at least 2 words long
14. Remote alias list pointer not 0 and Remote alias block
header (NMTYP) not 400001 (.TYNAM), and so on down the chain
3. Inconsistencies in Block types or free space in subsequent
pages of the directory.
All blocks in the directory (including free space) begin with a
block header which specifies type and length. Immediately
following one block should be a header for a new block. If
this scheme is corrupted, the mount will fail.
1. Header of a block not one of the following:
1. (.TYNAM) 400001 6. (.TYDIR) 400300
2. (.TYEXT) 400002 7. (.TYFRE) 400500
3. (.TYACC) 400003 8. (.TYFBT) 400600
4. (.TYUNS) 400004 9. (.TYGDB) 400700
5. (.TYFDB) 400100 10. (.TYRNA) 401000
2. Header of block is NAMTYP and length not at least 2 words
3. Header of block is EXTTYP and length not at least 2 words
4. Header of block is ACCTYP and length not at least 3 words
5. Header of block is USRTYP and length not at least 3 words
6. Header of block is FDBTYP and
1. Block length not at least 30 (.FBLN0) words long
2. Pointer to Author String (.FBAUT) not 0 and points to a
block outside of the directory or points to a block
that does not meet the tests for a user name string as
described above.
3. Pointer to Last Writer String (.FBLWR) not 0 and points
to a block outside of the directory or points to a
block that does not meet the tests for a user name
string block as described above.
4. Pointer to Account String (.FBACT) is not less than or
equal to zero and it points to a block outside of the
directory or it points to a block that does not meet
the tests for an account string block as described
above.
5. Pointer to Name String (.FBNAM) is not 0 and it points
to a block outside of the directory or it points to a
block that does not meet the tests for a Name String
Block as described above.
6. Pointer to Extension String (.FBEXT) is not 0 and it
points to a block outside of the directory or it points
to a block that does not meet the tests for an
Extension String Block as described above.
7. Header of a block is DIRTYP and
1. Header is not on a page boundary
2. Relative page number (DRRPN) not the calculated page
number
3. Pointer to first free block (DRFFB) does not point to a
location within the current directory page
4. Directory number (DRNUM) not 1.
8. Header of a block is FRETYP and the block is not at least two
words long, or the pointer to the next free block (FRNFB) is not
zero and points to a location not on the same page as the
current block
9. Last block did not end at DRFTP (address specified on first
page of directory)
4. BAT blocks inconsistent.
1. Either block does not contain SIXBIT/BAT/ in BATNAM (offset
0 in block)
2. Either block does not contain 606060 in BATCOD (offset 176)
3. Sector number of the BAT block (BATBLK) not the true sector
4. The BAT blocks do not compare exactly with each other
through word 176 of the blocks
5. Checksum of the Root-directory Index Block does not agree with
the checksum calculated.
Checksums are calculated as follows:
CHKSUM = 0 ;
For I = 0 to 777
If XB(I) = 0 then
CHKSUM = CHKSUM + I
Else
CHKSUM = CHKSUM + XB(I) ;
where XB is the first word of the index block.
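The checksum rule above can be written out as a short Python sketch. It
follows the pseudocode literally: each of the 1000 (octal) words of the
index block is added to the running sum, with the word's own index
substituted when the word is zero. The function name and the use of a
Python list for the block are illustrative assumptions. The text does
not say how overflow past 36 bits is handled, so the sketch leaves the
sum unreduced.

```python
def index_block_checksum(xb):
    """Sum an index block as in the pseudocode above.

    xb is a list of 1000 (octal) non-negative integers, one per word.
    A zero word contributes its index instead of its (zero) value.
    """
    if len(xb) != 0o1000:
        raise ValueError("an index block is 1000 (octal) words long")
    chksum = 0
    for i, word in enumerate(xb):
        chksum += i if word == 0 else word
    return chksum
```

An all-zero block therefore does not sum to zero: every slot contributes
its own index.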
As you can see, there are many things that could be wrong with a
structure that inhibit it from being mounted. The consistency of the
structure can be checked quite easily using the FILDDT commands
STRUCTURE and DRIVE, discussed elsewhere in the SWSKIT.
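As an illustration of the fixed-value HOM block tests listed above, the
following Python sketch checks a HOM block that has already been read
into a list of 36-bit words. It is only a sketch: the function names are
invented for the example, HOMNAM is assumed to be the first word of the
block, and 707070 is assumed to be the value of the whole HOMCOD word.
Only the tests that compare against constants are covered.

```python
def sixbit(text):
    """Pack up to six characters into a left-justified 36-bit SIXBIT word."""
    word = 0
    for ch in text.upper().ljust(6)[:6]:
        word = (word << 6) | (ord(ch) - 0o40)  # SIXBIT is ASCII minus 40 (octal)
    return word


def halves(left, right):
    """Combine two 18-bit halves, written LH,,RH in DDT, into one word."""
    return ((left & 0o777777) << 18) | (right & 0o777777)


# Offsets quoted in the list above; HOMNAM = 0 is an assumption of this sketch.
HOMNAM, HOMHOM, HOMFSN, HOMCOD = 0o0, 0o5, 0o173, 0o176
HOMFSN_WORDS = (halves(0o20040, 0o47524),
                halves(0o51520, 0o31055),
                halves(0o20060, 0o20040))


def hom_block_problems(block, primary=True):
    """Return the names of the fixed HOM-block words that fail their test."""
    problems = []
    if block[HOMNAM] != sixbit("HOM"):
        problems.append("HOMNAM")
    if block[HOMCOD] != 0o707070:
        problems.append("HOMCOD")
    expected = halves(0o1, 0o12) if primary else halves(0o12, 0o1)
    if block[HOMHOM] != expected:
        problems.append("HOMHOM")
    for i, want in enumerate(HOMFSN_WORDS):
        if block[HOMFSN + i] != want:
            problems.append("HOMFSN+%d" % i)
    return problems
```

An empty result means only that these constant words look sane; the unit
number, structure name, and index block address tests still have to be
made separately.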
For structures which are badly trashed, the only sane way of recovering
is to rebuild the structure using a catastrophe tape. For simple
inconsistencies such as a bad BAT block, CHECKD does the job well. For
more involved trashes which cannot be recovered from a backup tape
(because of a forgetful system manager) the above information can be of
great help, along with SWSKIT programs DS, DIRPNT, and DIRTST.
LOOKING AT HUNG TAPES
---------------------
A number of problems of the general classification "tape hang" have been
reported, and will probably always exist as long as we use magtapes.
Although there are apparently several variants of the problem, there are
some things which can be done by a suitably cautious specialist when
presented with a hung tape drive. Listed below are some techniques
which can be used in an attempt to investigate and perhaps alleviate the
problem. These things should, in general, be harmless to the system,
barring mis-typing in MDDT. Because they are this cautious, they may
not clear the problem.
There are several tables that are used in relation to tape drives. Some
of these tables are indexed by MT unit number, some by MTA unit number.
In general, it can be said that if a table name begins with the
characters MT, it will be indexed by MTA or physical unit number, and if
the table name begins with TL or TP, it will be indexed by MT or logical
unit number. The TL and TP tables will usually have something to do
with the tape labeling system. This article concerns itself mainly with
the more important tables relating to MTAs (physical tape units).
When playing with the tape subsystem, some care should be taken. For
instance, it always helps if no one else is actively using the tape
drives while you attempt something like reloading the microcode for a
DX20.
1. Finding the Tape Drive
There are several tables parallel to each other which concern the
ownership of a tape drive. Those of interest are DEVNAM, DEVCHR, and
DEVUNT. At DEVNAM+n is the device name in SIXBIT. At DEVUNT+n is a
word with the left half set to the assigner's job number, -1 if free, or
-2 if being controlled by the allocator. The right half contains the
unit number. Note that with tape allocation turned on, MTAs will always
indicate that job 0 has the drive assigned and that the offset to the MT
unit number will contain the job number of a user. At DEVCHR+n is the
device characteristics word. Knowing the device name or the owning job,
one can use DDT to find the table offset. See the example below.
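The layout of a DEVUNT word can be summarized in a small Python sketch.
It splits a 36-bit word into halves and interprets the left half as
described above (job number, -1 if free, -2 if controlled by the
allocator). The function name and the returned tuple are illustrative
assumptions, and the pseudo-device test uses the DV%PSD=400000 value
quoted in the example later in this article.

```python
HALF = 0o777777          # mask for one 18-bit half word
DV_PSD = 0o400000        # pseudo-device flag in the right half (DV%PSD)


def decode_devunt(word):
    """Return (owner, unit, is_pseudo) for a 36-bit DEVUNT entry."""
    left = (word >> 18) & HALF
    right = word & HALF
    if left == 0o777777:          # -1 in an 18-bit half
        owner = "unassigned"
    elif left == 0o777776:        # -2 in an 18-bit half
        owner = "allocator"
    else:
        owner = left              # assigner's job number
    return owner, right & ~DV_PSD & HALF, bool(right & DV_PSD)
```

For instance, the word 32,,400000 decodes to job 26 (decimal), unit 0,
pseudo device.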
2. Grabbing the Drive
Once the offset into DEVUNT is known, the device assignment can be freed
by putting -1 into the left half of the appropriate DEVUNT entry. The
drive can then be assigned by the normal ASSIGN command to the EXEC. In
dealing with the allocator, your own job number can be placed here if
necessary. The drive, however, will still be in no state to use. Note
that the appropriate DEVUNT entry is the one referring to the MT, not
the MTA.
3. Clearing External Errors
Make sure that there is a tape of some sort mounted, and the drive is
placed on-line. Having a write-enable ring in the tape may help in
being sure the unit is functional if the hung condition is cleared.
4. Checking the UDB
Next, the Unit Data Block status should be reset. This word can be
found using the MTCUTB table, which is indexed by MTA unit number. The
left half of each entry is the address of the channel data block (CDB),
and the right half contains the address of the UDB. The status word of
the UDB should then be reset to the base state. The right half should be
left alone--it basically contains drive type. The left half should have
only bit 16 set, which indicates a tape type device (US.TAP). The old
contents should be remembered for purposes of later analysis.
5. Checking the Status
Now, table MTASTS is examined, indexed by MTA unit number again.
Remember the old contents. Then clear the word to zero.
6. Example
@enaBLE (CAPABILITIES)
$sddt
DDT
mddt%$x
MDDT
dvxstn=21 !THIS SYMBOL PROVIDES A HANDY INDEX TO THE
!MTA OFFSETS IN THE DEVxxx TABLES.
!DEVNAM HAS SIXBIT DEVICE NAMES
devnam+21/ HLRZM P2,FKBSPW+217(T1) $6t;MTA0
DEVNAM+22/ MTA1
DEVNAM+23/ MTA2
DEVNAM+24/ MTA3
...
...
...
DEVNAM+40/ MTA17
mtan=20 !ROOM FOR 20 (OCTAL) TAPE DRIVES ALLOCATED
mtindx[ 777765,,5 !BUT ONLY 5 ACTUAL DRIVES ARE HERE
!MTs WILL APPEAR AFTER MTAs IN THE DEVxxx TABLES SO
!DVXSTN+MTAN WILL BE THE OFFSET TO THE MT ENTRIES
devnam+41/ HLRZM P1,@0 $6t;MT0
DEVNAM+42/ MT1
DEVNAM+43/ MT2
DEVNAM+44/ MT3
...
...
...
DEVNAM+60/ MT17
!DEVUNT IS PARALLEL TO DEVNAM AND PROVIDES OFFSETS INTO
!THE MTxxxx TABLES FOR MTAs AND OFFSETS INTO THE TLxxxx
!AND TPxxxx TABLES FOR MTs
devunt+21[ 0 !UNIT ZERO (OFFSET FROM DEVNAM) ASSIGNED TO JOB 0
DEVUNT+22[ 1 !JOB 0,,MTA1:
DEVUNT+23[ 2 !JOB 0,,MTA2:
DEVUNT+24[ 3 !JOB 0,,MTA3:
DEVUNT+25[ 4 !JOB 0,,MTA4:
DEVUNT+26[ 5 !JOB 0,,MTA5:
DEVUNT+27[ 777777,,6 !UNASSIGNED,,MTA6:
...
...
...
DEVUNT+40[ 777777,,17 !UNASSIGNED,,MTA17:
!DV%PSD=400000 INDICATES A PSEUDO DEVICE
!THE FOLLOWING ENTRIES FOR MTs WILL INDICATE
!THE AVAILABILITY OF LOGICAL TAPE UNITS
devunt+41[ 32,,400000 !PSEUDO DEVICE MT0: IS ASSIGNED TO
!JOB 32 OCTAL (JOB 26 IN DECIMAL)
DEVUNT+42[ 777776,,400001 !CONTROLLED BY ALLOCATOR,,MT1:
DEVUNT+43[ 777776,,400002 ! " " " ,,MT2:
DEVUNT+44[ 777776,,400003 ! " " " ,,MT3:
...
...
...
DEVUNT+60[ 777776,,400017 ! " " " ,,MT17:
!TLABR0 (INDEXED BY MT NUMBER) WILL INDICATE WHICH
!PHYSICAL TAPE UNIT WILL BE USED WHEN REFERENCING MT.
!THIS IS INDICATED BY PHYSICAL MTA NUMBER IN BITS 2-8.
tlabr0[ 405000,,0 !BIT 0 INDICATES A VALID VOLUME IS ON MTA5
mtcutb+5[ 730437,,730625 !CDB,,UDB FOR MTA5 IN USE BY JOB 26
!WHO KNOWS IT AS MT0 (SEE ABOVE)
730625[ 102,,157 !FIRST WORD OF UDB FOR MTA5
!US.WLK=1B11 ==> WRITE LOCKED
!US.TAP=1B16 ==> TAPE TYPE DEVICE
!.UTT70=17B35 ==> TU70
mtasts+5[ 0 !THIS EXAMPLE INDICATES A TAPE DRIVE THAT PROBABLY
!HASN'T BEEN REFERENCED BY THE USER YET
mretn$g !TO RETURN TO SDDT FROM MDDT
<>
^Z !TO RETURN TO THE EXEC FROM SDDT
$
If clearing MTASTS and UDBSTS for the drive doesn't seem to clear the
problem, you will probably have to do more digging around to find some
other, more obscure, inconsistency in the MTA/MT tables. This can be
accomplished by referring to the monitor tables under MTA-STORAGE-AREA.
As always, extreme caution should be exercised while fooling around in
MDDT as you can accidentally trash some random location in the monitor
just by hitting a carriage return at the wrong time.
If a DX20 controller is involved, there may be a situation of the
controller microcode hanging or malfunctioning. The SWSKIT program
DX20PC can be useful in these situations.
One last note should be made about the monitor tables here. The
description of the DEVUNT table would lead one to believe that the right
half will contain a -2 if the device is under control of the allocator.
If the device is under control of the allocator, the -2 will appear in
the left half.
A LOOK AT SOME OF THE DISK STUFF
--------------------------------
This article is an introduction to the material in the PHYPAR module.
PHYPAR is where the information may be reliably obtained, and it should
serve as the ultimate reference for these problems.
Much of the system debugging you will have to deal with will involve the
DEC-20 hardware. There always seems to be a large gap between what the
diagnostics can tolerate and what the monitor can tolerate in the way of
malfunctioning hardware. The monitor will not always point you to the
real disk or magtape problem, say, but will crash a few minutes after
something has gone wrong somewhere. Most of the hardware problems
that we have had to deal with that were really difficult to track down
and point the Field Service rep to were problems with disk hardware.
The following is information which you can use to help Field Service
trace down problems which are not reported in the diagnostics. In most
cases the Field Service rep knows what all the status bits etc. mean
but has not been able to find them in the monitor crashes or running
monitor.
CHNTAB:
CHNTAB is an ordered list of Channel Data Block addresses
starting with channel 0. RH20-0 data block address is in the
first word etc.
CDB:
CDB is the Channel Data Block. There is one CDB per channel.
The CDB contains channel dependent instructions and data, and
pointers to the unit data blocks (UDBs) in the case of RP04s,
RP05s, and RP06s. In the case of tapes the pointer is to the
Kontroller Data Block (TM02/3) which points in turn to the UDBs.
The CDB also contains information about the currently active
unit. When the channel interrupts, control passes (via a JSP)
to CDBINT. The CDB address is stored in AC1, P1 and the
principal analysis routine, PHYINT, is called.
NOTE
The CDBs are referenced in modules PHYSIO, PHYH2 (RH20
code), PHYM2 (TM02/3 code) and PHYP4 (RP04, RP05, RP06, and
RP07 code). The Channel Data Block is defined in the module
PHYPAR. The address that you get in CHNTAB is really a
pointer to word 0, which contains the status bits for this
controller (CDBSTS). Look in PHYPAR for the table
definition. Some words of interest are:
CDBaddress + CDBSTS: status and configuration info
CDBaddress + CDBUDB: table of UDB (or KDB) addresses
The status bits which are also defined in PHYPAR are listed here
for your convenience:
CS.OFL==1B0 ; offline
CS.AC1==1B1 ; primary command active
CS.AC2==1B2 ; secondary command active
CS.ACT==CS.AC1!CS.AC2 ; any active
CS.MAI==1B3 ; channel is in maintenance mode
CS.MRQ==1B4 ; maintenance mode requested for unit
CS.ERC==1B5 ; error recovery in progress
CS.STK==1B6 ; channel supports command stacking
CS.ACL==1B7 ; alternate command list is current
CS.CWP==1B8 ; Channel write parity error detected
CS.CIP==1B9 ; CI-port channel
CS.DEN==1B10 ; CI port DIAG to take channel enabled
CS.NIP==1B12 ; NI-port channel
BITs 30-32 ; PIA field
BITs 33-35 ; channel type field
KDB:
Kontroller Data Block. Defined in PHYPAR also. Referenced in
PHYM2, PHYPAR, PHYSIO. Words of interest are:
KDBADDR+KDBSTS: ; flags unit type
KDBADDR+KDBUDB: ; UDB table first word (1 word/UDB)
UDB:
Unit Data Block. There is one UDB per unit associated with a
CDB or KDB. The UDB contains information about the current
activity on the unit in question. The UDB is defined in PHYPAR
as well. Some words of interest are noted below. Look in the
listings for other information.
UDBADDR + UDBSTS: ; status and configuration (see below)
UDBADDR + UDBERR: ; error recovery status word
UDBADDR + UDBERP: ; error reporting work area if non 0
UDBADDR + UDBRED: ; reads - sectors/frames if disk/tape
UDBADDR + UDBWRT: ; writes - sectors/frames if disk/tape
UDBADDR + UDBSRE: ; soft read errors
UDBADDR + UDBSWE: ; soft write errors
UDBADDR + UDBHRE: ; hard read errors
UDBADDR + UDBHWE: ; hard write errors
UDBADDR + UDBPS1: ; current cylinder/file if disk/tape
UDBADDR + UDBPS2: ; current sector/record if disk/tape
UDBADDR + UDBSPE: ; soft positioning error
UDBADDR + UDBHPE: ; hard positioning error
; NOTE - there are several other UDB words
; including a device dependent portion
Status bits in UDBSTS (First word of UDB):
US.OFS==1B0 ; off line or unsafe
US.CHB==1B1 ; check HOME blocks before any normal I/O
US.POS==1B2 ; positioning in progress
US.ACT==1B3 ; active
US.BAT==1B4 ; on if bad BAT blocks on this unit
US.BLK==1B5 ; lock bit for this unit's BAT blocks
US.PGM==1B6 ; dual port switch in (A or B)
US.MAI==1B7 ; unit is in maintenance mode
US.MRQ==1B8 ; maintenance mode requested on this unit
US.BOT==1B9 ; unit is at BOT
US.REW==1B10 ; unit is rewinding
US.WLK==1B11 ; unit is write locked
US.CIP==1B12 ; unit is on a CI port
US.OIR==1B13 ; operator intervention required, set at
; interrupt level, checked periodically.
US.OMS==1B14 ; once a minute message to operator, used in
; conjunction with US.OIR.
US.PRQ==1B15 ; positioning required on this unit
US.TAP==1B16 ; device type tape
US.PSI==1B17 ; tape - online/offline/rewind done transition
US.DSK==1B18 ; Disk type device
US.OR1==1B19 ; 1st overdue rewind timer bit
US.OR2==1B20 ; 2nd overdue rewind timer bit
US.2PT==1B21 ; Drive is potentially dual-ported
US.ORC==US.OR1!US.OR2; overdue rewind field
US.TPD==1B22 ; Disk is offline to prevent three ports
US.BDK==1B23 ; CI broadcast needed
US.RTY==7B26 ; Retry count field (bits 24,25,26)
US.CIA==1B27 ; CI available
US.UNA==1B28 ; Device unavailable (like 16 bit disk)
BITS 32-35 contain unit type code (.USTYP):
.UTRP4 = 1 ; RP04
.UTRS4 = 2 ; RS04 (drum)
.UTT16 = 3 ; TU16 (TU45)
.UTTM2 = 4 ; TM02 as a unit
.UTRP5 = 5 ; RP05
.UTRP6 = 6 ; RP06
.UTRP7 = 7 ; RP07
.UTRP8 = 10 ; RP08
.UTRM3 = 11 ; RM03
.UTTM3 = 12 ; TM03 AS A UNIT
.UTT77 = 13 ; TU77
.UTTM7 = 14 ; TM78
.UTT78 = 15 ; TU78
.UTDXA = 16 ; DX20-A FOR TAPES
.UTT70 = 17 ; TU70
.UTT71 = 20 ; TU71
.UTT72 = 21 ; TU72
.UTT73 = 22 ; TU7x
.UTDXB = 23 ; DX20-B FOR DISKS
.UTP20 = 24 ; RP20
.UTNOD = 25 ; CI NODE WITH NO MSCP SERVER
.UTHSC = 26 ; HSC-50
.UTR80 = 27 ; RA80
.UTR81 = 30 ; RA81
.UTR60 = 31 ; RA60
.UTR82 = 32 ; RA82 (FUTURE)
.UTR62 = 33 ; RA62 (FUTURE)
.UTTA7 = 34 ; TA78
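When reading a UDBSTS word out of a dump it can help to decode it
mechanically. The Python sketch below maps the single-bit flags listed
above to their names and extracts the unit type code from the low-order
end of the word. The function name, the abbreviated type table, and the
output format are illustrative assumptions. The text places the type
code in bits 32-35, but the listed codes run up to 34 (octal), which does
not fit in four bits; the sketch therefore takes the field width as a
parameter and defaults to five bits, which is also an assumption.

```python
# Single-bit flags from the UDBSTS list above, keyed by PDP-10 bit number
# (bit 0 is the most significant bit of the 36-bit word).
UDBSTS_FLAGS = {
    0: "US.OFS", 1: "US.CHB", 2: "US.POS", 3: "US.ACT", 4: "US.BAT",
    5: "US.BLK", 6: "US.PGM", 7: "US.MAI", 8: "US.MRQ", 9: "US.BOT",
    10: "US.REW", 11: "US.WLK", 12: "US.CIP", 13: "US.OIR", 14: "US.OMS",
    15: "US.PRQ", 16: "US.TAP", 17: "US.PSI", 18: "US.DSK", 19: "US.OR1",
    20: "US.OR2", 21: "US.2PT", 22: "US.TPD", 23: "US.BDK", 27: "US.CIA",
    28: "US.UNA",
}

# A few of the unit type codes from the list above (values are octal).
UNIT_TYPES = {0o1: "RP04", 0o5: "RP05", 0o6: "RP06", 0o7: "RP07",
              0o17: "TU70", 0o24: "RP20"}


def decode_udbsts(word, type_bits=5):
    """Return (flag names set, unit type name) for a UDBSTS word."""
    flags = [name for bit, name in sorted(UDBSTS_FLAGS.items())
             if word & (1 << (35 - bit))]
    unit_type = word & ((1 << type_bits) - 1)
    return flags, UNIT_TYPES.get(unit_type, "type %o" % unit_type)
```

The word 102,,157 from the hung tape example decodes as US.WLK and
US.TAP on a TU70.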
Things are located on the disk as follows:
BLOCK 0: ; 11 bootstrap
BLOCK 1: ; primary HOME block
BLOCK 2: ; primary BAT block
BLOCKS 3-11: ; reserved
BLOCK 12 ; secondary HOME block
BLOCK 13 ; secondary BAT block
The disk addresses for the above are stored in the table HOME, which is
defined in STG. The BAT blocks are defined in PROLOG and the HOME
blocks are defined in DSKALC and PROLOG.
The SWSKIT programs DS, CHANS, and UNITS can help in displaying some of
this information on the running system.
DISK FEATURES OF FILDDT
-----------------------
The FILDDT shipped after release 4 of TOPS-20 has two new commands in
relation to disk file structure maintenance. They are:
STRUCTURE (FOR PHYSICAL I/O IS) disk-structure
Examines the specified disk structure.
DRIVE (FOR PHYSICAL I/O IS ON CHANNEL) c (CONTROLLER) k (UNIT) u
Examines the specified disk unit.
These are privileged functions and one must ENABLE to use them.
These two commands are nearly identical. Their difference is in the way
the structure is identified. To use the STRUCTURE command the structure
must be mounted. The STRUCTURE command is useful for examining a
multi-pack structure. The DRIVE command is useful for examining the
file system of a structure which cannot be mounted. Channel,
controller, and unit numbers can be found from the programs UNITS, DS,
SYSDPY, or OPR.
Word addressing is in the same format as in other forms of DDT.
It is easier to understand exactly what the disk will look like in
FILDDT if you keep in mind that all sectors will be packed in the DDT
address space, without regard for sector size, starting at DDT address
0. For instance, on an RP06 there are four sectors per memory page or
200 (octal) words per sector. Therefore, sector zero of the structure
will begin at FILDDT address 0 and end at memory address 177 (octal).
Sector 1 will begin at address 200 and end at 377. All supported disk
drives except the RP20 have 200 (octal) words per sector. On the RP20
there are 1000 (octal) words per sector (one page). All index block
addresses and most monitor disk addresses are in sectors. That is why
it is important to be able to translate between sector addresses and
FILDDT memory addresses.
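The translation between sector numbers and FILDDT addresses is simple
multiplication, as the following Python sketch shows. The function names
are invented for the illustration; the words-per-sector values (200
octal in general, 1000 octal for the RP20) are those given above.

```python
def sector_to_address(sector, words_per_sector=0o200):
    """Return (first, last) FILDDT address of a sector on a single unit."""
    first = sector * words_per_sector
    return first, first + words_per_sector - 1


def address_to_sector(address, words_per_sector=0o200):
    """Return (sector, word offset within the sector) for a FILDDT address."""
    return divmod(address, words_per_sector)
```

With the default of 200 (octal) words per sector, sector 0 spans
addresses 0 through 177 and sector 1 spans 200 through 377, matching the
RP06 description above.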
The FILDDT option of ENABLE PATCHING is also available for use with the
DRIVE and STRUCTURE commands. With this option on, the user is able to
modify specific words on the structure. Another very convenient FILDDT
command one may use in conjunction with the disk commands is LOAD
(symbols from) input file spec. One may specify any file here but a
useful one is SYSTEM:MONITR. The symbol table to the MONITOR has HOM
block sector addresses, FDB offsets etc. When a file's symbols are
loaded, one may also define one's own symbols. This is useful for
remembering addresses of data structures on the units. For example,
after finding the index block to a file, one could define a symbol,
FILIDX, at that address for easy referencing later on.
When examining a multi-pack structure using the STRUCTURE command,
addressing the first unit is exactly as if there were only one unit in
the structure. FILDDT addresses of sectors on the other units begin
immediately after the last address for the first unit of the structure.
For example, consider that we would like to examine the BAT blocks for
the second unit of a two-pack STR: on RP06 drives.
An RP06 contains 304000. sectors per unit and 128. words per sector.
The first FILDDT address for the second unit of an RP06 two-pack STR: is
304000.*128.=38912000. or 224340000 (octal)
FILDDT>LOAD (SYMBOLS FROM) SYSTEM:MONITR.EXE
[22722 symbols loaded from file]
FILDDT>STRUCTURE (FOR PHYSICAL I/O IS) PS:
[Looking at file structure PS:]
; starting address of second unit in structure plus sector
; address of BAT blocks (2) times words per sector gives
; FILDDT address of start of BAT blocks for that unit
224340000+2*200=224,,340400
224,,340400[ 424164,,0 $6T; BAT ; Found it
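The arithmetic in this example can be checked with a short Python
sketch. It assumes, as the example does, that every unit of the
structure has the same number of sectors; the function names are
invented for the illustration.

```python
def unit_base_address(unit_index, sectors_per_unit, words_per_sector=128):
    """First FILDDT address of a unit (unit 0 is the first pack)."""
    return unit_index * sectors_per_unit * words_per_sector


def halfword_format(value):
    """Format a value the way DDT types it out: LH,,RH in octal."""
    return "%o,,%o" % (value >> 18, value & 0o777777)


# Second unit of a two-pack RP06 structure, BAT block at sector 2.
base = unit_base_address(1, 304000)
bat_address = base + 2 * 0o200
```

Here base is 224340000 (octal) and halfword_format(bat_address) gives
224,,340400, the address examined in the transcript above.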
For another example, let's say we would like to find the start of the
ROOT-DIRECTORY symbol table:
NWSEC=200 ; number of words per sector
HM1BLK=1 ; sector number of HOM block
HOMRXB=10 ; offset for index block of ROOT-DIRECTORY
; HOM block sector number times words
; per sector equals address of HOM start
HM1BLK*NWSEC[ 505755,,0 $6T;HOM
HM1BLK*NWSEC+HOMRXB[ 10,,5740 ; plus offset to address of index block
; sector number of index block times
; words per sector gives address of
5740*NWSEC[ 10,,5744 ; ROOT-DIRECTORY index block
; NOTE: Bit 14 (DSKAB) specifies this
; address as a disk sector address.
; sector addresses are bits 15-35
RTDIDX: ; define symbol for index block here
; sector number of first page of
; ROOT-DIRECTORY times number of words
; per sector gives the address of first
5744*NWSEC[ 400300,,100 ; page of ROOT-DIRECTORY
RTDIR0: ; define start of page 0 of ROOT-DIR
RTDIR0+3[ 30610 ; plus 3 for start of symbol table
; NOTE: adr is a 'directory address'
; offset 610 of directory page 30
RTDIDX+30[ 10,,6250 ; get sector adr of page 30 of ROOT-DIR
; sector adr of page 30 times words per
; sector gives address of page 30 of
; ROOT-DIRECTORY.
6250*NWSEC+610[ 400400,,1 ; Add offset for symbol table start
RTDSYM: ; Define a symbol here
^E
FILDDT>EXIT
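The two address forms met in this example can be unpacked as in the
following Python sketch. A disk address word carries the sector number
in bits 15-35 with bit 14 (DSKAB) set, and a directory address is a
directory page number followed by a word offset within that 1000 (octal)
word page. The function names are assumptions made for the
illustration.

```python
DSKAB = 1 << (35 - 14)            # bit 14: word holds a disk sector address
SECTOR_MASK = (1 << 21) - 1       # bits 15-35: the sector number
NWSEC = 0o200                     # words per sector, as in the example


def disk_sector(word):
    """Return the sector number from a disk address word such as 10,,5740."""
    if not word & DSKAB:
        raise ValueError("DSKAB not set: not a disk sector address")
    return word & SECTOR_MASK


def split_directory_address(address):
    """Split a directory address into (directory page, offset in page)."""
    return address >> 9, address & 0o777


index_block = disk_sector((0o10 << 18) | 0o5740) * NWSEC
page, offset = split_directory_address(0o30610)
```

This reproduces the steps of the transcript: the index block lies at
5740*NWSEC, and 30610 refers to offset 610 of directory page 30.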
SUPPORTED DISK DRIVE PARAMETERS
-------------------------------
TYPE   SIZE (PAGES)   MEDIA   #/STRUCTURE(1)   CONTROLLER      CPU
----   ------------   -----   --------------   ----------      ---
RP04        38,000.   Pack    6                Massbus         KL(2)
RP05        38,000.   Pack    6                Massbus         KL(2)
RP06        76,000.   Pack    3(3)             Massbus         KL/KS
RM03        30,340.   Pack    2                Massbus         KS
RP20       201,420.   Fixed   3(4)             Massbus+DX20B   KL
RP07       216,376.   Fixed   2                Massbus         KL
RA80        53,508.   Fixed   6                CI20+HSC50      KL
RA81       200,928.   Fixed   3                CI20+HSC50      KL
RA60        90,516.   Pack    3                CI20+HSC50      KL
(1) -- depends on addressing, MXPGUN, MXSTRU, and BTBSIZ; SPD is final
(2) -- disk model no longer sold
(3) -- 2 packs/structure on a KS or Model A machine
(4) -- 1 spindle/structure on Model A machines
SUPPORTED TAPE DRIVE PARAMETERS
-------------------------------
TYPE   SPEED   DENSITY     #/CONTROLLER   CONTROLLER    CPU     NOTES
----   -----   -------     ------------   ----------    ---     -----
TU45   75ips   800/1600    8(KL)/4(KS)    TM02/TM03     KL/KS   (1)(2)
TU70   100     800/1600    8              DX20-A/TX02   KL      (1)
TU71   100     556/800     8              DX20-A/TX02   KL      (1)(3)
TU72   100     1600/6250   8              DX20-A/TX02   KL      (4)
TU77   125     800/1600    4              TM02/TM03     KL/KS   (2)
TU78   125     1600/6250   4              TM78          KL
TA78   125     1600/6250   4              HSC50/TS78    KL      (5)
(1) -- tape model no longer sold
(2) -- TM02 controller no longer sold
(3) -- 7 track model
(4) -- TX05 option allows 16 drives/DX20 using 2 TX02s
(5) -- Planned for some TOPS-20 release after 6.0
TOPS-20 SCHEDULER TEST ROUTINES
-------------------------------
The following is a tabulation of (hopefully) all of the scheduler tests
used by the TOPS-20 monitor, time-frame approximately Release 6.1. This
includes ARPA and DECNET tests. This is the data one finds in the
monitor table FKSTAT indexed by fork number for forks which have blocked
and left the GOLST (i.e. LH(FKPT) contains WTLST). The format of the
FKSTAT table words is TEST DATA,,TEST ROUTINE ADDRESS. Sometimes table
FKSTA2 contains additional data. (Version 6.0 and later.) The scheduler
test routines are called periodically to determine if a process can be
unblocked. This is indicated by a skip return from the scheduler test.
A nonskip return is taken if the process cannot yet be unblocked.
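The skip/nonskip convention can be modelled in a few lines of Python.
The sketch below is not monitor code: it treats each FKSTAT entry as a
(test data, test name) pair, uses a dictionary as a stand-in for monitor
memory, and implements only the simple DISET, DISGT, DISLT, and DISNT
tests tabulated in this article, returning True where the real routine
would skip. All names other than those four tests are assumptions made
for the illustration.

```python
def signed36(word):
    """Interpret a 36-bit word as a two's complement signed value."""
    word &= (1 << 36) - 1
    return word - (1 << 36) if word & (1 << 35) else word


# Each test returns True ("skip return") when the fork may be unblocked.
TESTS = {
    "DISET": lambda value: value == 0,   # wait for zero
    "DISGT": lambda value: value > 0,    # wait for greater than zero
    "DISLT": lambda value: value < 0,    # wait for less than zero
    "DISNT": lambda value: value != 0,   # wait for non-zero
}


def can_unblock(test_name, address, memory):
    """Apply one scheduler test to the word at the given address."""
    return TESTS[test_name](signed36(memory[address]))


def runnable_forks(fkstat, memory):
    """Return the fork numbers whose scheduler test is now satisfied."""
    return [fork for fork, (data, test) in fkstat.items()
            if can_unblock(test, data, memory)]
```

A fork whose test never returns True stays blocked, which is the
situation one looks for in FKSTAT when chasing a hung job.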
When examining the monitor because of a hung job or fork, the FKSTAT
table can often reveal the reason the fork is hung, and this sometimes
even allows corrective action to be taken.
The table below gives routine name, what you should expect to see in the
FKSTAT table, and the module in which the scheduler test is defined,
followed finally by a short description of what the particular condition
is which is being tested. Use SYSDPY to view the running system.
Those tests defined in PAGUTL are found in PAGEM in monitors earlier
than Release 6.0.
SCHEDULER TESTS
TEST CONTENTS OF T1 AT TIME OF SCHEDULER CALL DEFINED
---- ---------------------------------------- -------
BATTST [UNIT #,,BATTST] [DSKALC]
Wait for US.BLK, the lock bit for the BAT blocks
on the unit, in the UDB to be zero.
BLOCKM [TIME,,BLOCKM] [SCHED]
Wait for TIME in BLOCKM format which is the low
order 17 bits of the desired future time to be
compared against a suitably masked TODCLK.
BLOCKT [TIME,,BLOCKT] [SCHED]
Wait for TIME in BLOCKT format which is a
value that is shifted left 10 bits and compared
against a suitably masked TODCLK, providing a
longer delay than BLOCKM, but less precision.
BLOCKW [TIME,,BLOCKW] [SCHED]
Wait for TIME in BLOCKW format (same as BLOCKM).
CDRBLK [UNIT NUMBER,,CDRBLK] [CDRSRV]
Wait for card-reader offline, or not waiting for
a card.
CFGVOT [0,,CFGVOT] [CFSSRV]
Wait for HSHTFW, HSHWVT, or HSHUGD to be set in
block. FKSTA2 has pointer to block.
CFRCNW [TIME,,CFRCNW] [CFSSRV]
Wait for DLYLOK to be non-positive or BLOCKT form
time to have expired.
CFSRWT [0,,CFSRWT] [CFSSRV]
Wait for block released, wakeup timer, or no more
users of block. FKSTA2 has pointer to block.
CHKLOK [ADDRESS,,CHKLOK] [NSPSRV]
Wait for NSP block lock at address to free.
CLOTST [0,,CLOTST] [NIUSR]
Wait for PRCCP (port is closed) to be set in the
port block. FKSTA2 has pointer to port block.
COFTST [TIME,,COFTST] [MEXEC]
Wait for job in FKJOBN to be attached or time
in BLOCKT form to elapse.
CTMTST [0,,CTMTST] [CTHSRV]
Wait for listen (MSGCWL), linked block on output
(MSGBLW), queued CTERM lines (CTMATN), or queued
DCN links (MSGATN) set.
D6BWT [INDEX,,D6BWT] [DTESRV]
Wait for D6STS(INDEX) to be .GE. zero, indicating
a free condition.
D6DWT [INDEX,,D6DWT] [DTESRV]
Wait for D6%DDN to be set in D6STS(INDEX) to
indicate read data done.
D6RWT [INDEX,,D6RWT] [DTESRV]
Wait for D6%RDN to be set in D6STS(INDEX) to
indicate response header.
D6WKT [INDEX,,D6WKT] [DTESRV]
Wait for timer in D6CLK(INDEX) to expire.
DBWAIT [DTE #,,DBWAIT] [DTESRV]
Wait for the TO-10 doorbell from the given DTE.
DGLTST [0,,DGLTST] [DIAG]
Wait for DIAGLK lock to be free.
DGUIDL [UDB ADDRESS,,DGUIDL] [DIAG]
Wait for the unit to show as idle in the UDB.
DGUTST [UDB ADDRESS,,DGUTST] [DIAG]
Wait for the maintenance bit to set in the UDB.
DISET [ADDRESS,,DISET] [SCHED]
Wait for contents of ADDRESS to be zero.
DISGET [ADDRESS,,DISGET] [SCHED]
Wait for contents of ADDRESS to be greater than
or equal to zero.
DISGT [ADDRESS,,DISGT] [SCHED]
Wait for contents of ADDRESS to be greater than
zero.
DISLT [ADDRESS,,DISLT] [SCHED]
Wait for contents of address to be less than
zero.
DISNT [ADDRESS,,DISNT] [SCHED]
Wait for contents of ADDRESS to be non-zero.
DSKRT [PAGE #,,DSKRT] [PAGEM]
Wait for CSTAGE for PAGE # to not be PSRIP,
meaning disk read completed.
DWRTST [PAGE #,,DWRTST] [PAGUTL]
Wait for DRWBIT to clear in CST3(PAGE #),
meaning write completed.
ENQTST [FORK #,,ENQTST] [ENQ]
Wait for the lock on ENFKTB+FORK #.
FEBWT [ADDRESS OF FE UDB,,FEBWT] [FESRV]
Wait for EOF or input bytes available from FE.
Wake also on invalid assignment.
FEDOBE [ADDRESS OF FE UDB,,FEDOBE] [FESRV]
Wait for output buffer empty and all bytes are
acknowledged by the FE. Wake also if not a
valid assignment.
FEFULL [ADDRESS OF FE UDB,,FEFULL] [FESRV]
Wait for the current count of output bytes to be
less than the count of bytes in the interrupt
buffer. Wake also on invalid assignment.
FORCTM [SUPERIOR FORK INDEX,,FORCTM] [SCHED]
Identifiable wait forever, forced termination.
FRZWT [PREVIOUS TEST,,FRZWT] [FORK]
Identifiable wait forever, frozen fork.
HALTT [SUPERIOR FORK INDEX,,HALTT] [SCHED]
Identifiable wait forever for halted fork.
HIBERT [TIME,,HIBERT] [SCHED]
Wait for TIME in BLOCKT format.
INTBOT [BIT #,,INTBOT] [IPIPIP]
Wait for bit in INTWTB table to be zero.
INTBPT [0,,INTBPT] [IPIPIP]
Wait for internet fork to be runnable (INTFLG
nonzero or INTTIM has passed).
INTBZT [BIT #,,INTBZT] [IPIPIP]
Wait for bit in INTWTB table to be one.
INTOOT [<BIT1>B8+<BIT2>B17,,INTOOT] [IPIPIP]
Wait for either or both of two bits in INTWTB to
be set.
INTZOT [<BIT1>B8+<BIT2>B17,,INTZOT] [IPIPIP]
Wait for either bit1 zero or bit2 one or both
conditions in INTWTB.
J0TCOT [LINE #,,J0TCOT] [TTYSRV]
Waits for Job 0 output to CTY to complete, with
timeout checks.
JB0TST [TIME,,JB0TST] [MEXEC]
Wait for JB0FLG set nonzero for explicit request
or time in BLOCKT form to elapse.
JRET [0,,JRET] [SCHED]
Wait forever, interruptible.
JSKP [0,,JSKP] [SCHED]
Unconditional skip used to schedule immediately.
JTQWT [0,,JTQWT] [SCHED]
Wait for JSYS trap queue.
LCKTSS [ADDRESS,,LCKTSS] [IO]
Wait for lock at ADDRESS to unlock, lock it.
LKDSPT [0,,LKDSPT] [STG]
Wait for room in LDTAB table of directories
currently locked.
LKDTST [INDEX INTO LDTAB,,LKDTST] [STG]
Wait for bit in LCKDBT to clear, indicating
directory unlocked.
LODWAT [ADDRESS OF STATUS WORD,,LODWAT] [LINEPR]
Wait for flag LP%LHC to set in the addressed
word, indicating loading has completed of the
VFU or RAM file.
LOKWAI [LINE #,,LOKWAI] [CTHSRV]
Wait for link error or CTERM link can send now.
LPTDIS [UNIT ADDRESS,,LPTDIS] [LINEPR]
Wait for an error condition on the addressed
unit, or for all buffers cleared and no bytes
still in the front-end, before finishing close
operation on the device.
MTARWT [IORB ADDRESS,,MTARWT] [MAGTAP]
Wait for IRBFA in the IORB to indicate that this
IORB is no longer active.
MTAWAT [UNIT #,,MTAWAT] [MAGTAP]
Wait for all outstanding IORBs for unit to be
finished.
MTDWT1 [UNIT #,,MTDWT1] [MAGTAP]
Wait for the count of outstanding requests on the
unit to go to one.
NIDLST [0,,NIDLST] [DNADLL]
Wait for NIDLOK read counter lock to be free (-1).
NISCHK [0,,NISCHK] [DNADLL]
Wait for RCCFLG data returned flag to be negative.
NSPLWB [0,,NSPLWB] [LLINKS]
Wait for NSP lock NSPLOK to be free.
NSPTST [0,,NSPTST] [NSPSRV]
Wait for KDPFLG nonzero, indicating KMC11 wants
service, or MSGQ nonzero, indicating messages to
process.
NVTNTT [<0:8>OPTION #,<9:17>LINE #,,NVTNTT] [TVTSRV]
Wait for completed NVT negotiation.
OFNLKT [OFN,,OFNLKT] [PAGUTL]
Wait for OFN unlocked--SPTLKB zero in SPTH(OFN).
PIDWAT [FORK #,,PIDWAT] [IPCF]
Wait for bit for fork in PDFKTB to set.
RCCTST [0,,RCCTST] [NIUSR]
Wait for PSI pending for fork or callback received.
FKSTA2 has pointer to port block.
RCRWAI [REQUEST #,,RCRWAI] [LLMOP]
Wait for request complete in request block (RB).
RB address in FKSTA2.
RCVTST [0,,RCVTST] [NIUSR]
Wait for PSI pending for fork or buffers available
in the port block. FKSTA2 has pointer to port block.
RLDTST [0,,RLDTST] [DTESRV]
Wait for master DTE running.
SALTST [TIME,,SALTST] [TTYSRV]
Waits for SALLCK unlocked or time passed.
SALWAT [LINE #,,SALWAT] [TTYSRV]
Waits for line to finish using sendall pointer.
SCJBLK [0,,SCJBLK] [SCJSYS]
Wait for PTBLK (port blocked) to be zero in the
port block. FKSTA2 has pointer to port block.
SCTLWB [0,,SCTLWB] [SCLINK]
Waits for session control lock SCTLOK to be free.
SEBTST [0,,SEBTST] [SYSERR]
Wait for SECHKF to go nonzero before starting
Job 0 task to write queued SYSERR entries.
SEEALL [0,,SEEALL] [TTYSRV4]
Waits for SNDALL to go to zero, indicating the
send-all buffer available.
SJBGON [0,,SJBGON] [SCJSYS]
Wait for SLB associated with SJB to be disposed of.
FKSTA2 has pointer to SJB.
SPCTST [0,,SPCTST] [DTESRV]
Wait for a node.
SPMTST [0,,SPMTST] [PAGUTL]
Wait for page in SPMTPG to be on SPMQ or the
time SPMTIM to expire.
SQLTST [0,,SQLTST] [IMPDV]
Wait for the special queues lock SQLCK and lock
it.
STRTST [SDB ADDRESS OF STRUCTURE,,STRTST] [MSTR]
Wait for the structure lock to be free.
STSWAT [ADDRESS OF STATUS WORD,,STSWAT] [CDRSRV]
Wait for flag CD%SHA to come on in the addressed
word, indicating that cardreader status has
arrived.
STSWAT [ADDRESS OF STATUS WORD,,STSWAT] [LINEPR]
Wait for flag LP%SHA to set in the addressed
word, indicating that printer status has
arrived.
SUSFKT [FORK #,,SUSFKT] [FORK]
Wait for fork to be on WTLST in either SUSWT
or FRZWT.
SWPRT [PAGE #,,SWPRT] [PAGEM]
Wait for CSTAGE for PAGE # to not be PSRIP,
meaning swap read completed.
SWPWTT [0,,SWPWTT] [PAGEM]
Wait for NRPLQ nonzero. Increment CGFLG each
time test is unsuccessful.
TCIPIT [FORK #,,TCIPIT] [TTYSRV]
Waits for no interrupts pending for FORK #.
TCITST [LINE #,,TCITST] [TTYSRV]
Wait for line inactive, no fork in input wait,
or input buffer non-empty.
TCOTST [LINE #,,TCOTST] [TTYSRV]
Wait for line inactive, or output buffer not
too full to add a character to it.
TCPABT [FORK #,,TCPABT] [TCPBBN]
Wait for all TCP connection aborts completed.
TCPOTS [<TOPNF>B8+<TERRF>B17,,TCPOTS] [TCPJFN]
Wait for TCP connection open, or error state.
FKSTA2 has host number index.
TCPTST [ADDR,,TCPTST] [TCPJFN]
Wait for TCP%DN (buffer done) on in word .TCPBF
of block at ADDR.
TRMTS1 [0,,TRMTS1] [FORK]
Identifiable wait forever for inferior fork termination.
TRMTST [FORK #,,TRMTST] [FORK]
Wait for FORK # to be on WTLST for either HALTT
or FORCTM.
TRP0CT [MINIMUM NRPLQ,,TRP0CT] [PAGEM]
Wait for NRPLQ to be above stated minimum or
normal minimum. Increment CGFLG each time
test is unsuccessful.
TSACT1 [LINE #,,TSACT1] [TTYSRV]
Wait until line inactive, becoming active, or
has a full length dynamic block assigned.
TSACT2 [LINE #,,TSACT2] [TTYSRV]
Wait for line available--inactive or fully
active.
TSACT3 [LINE #,,TSACT3] [TTYSRV4]
Wait for line inactive--dynamic data unlocked.
TSTSAL [0,,TSTSAL] [TTYSRV4]
Wait for SALCNT to go to zero, indicating the
send-all is finished for this buffer.
TTBUFW [NUMBER,,TTBUFW] [TTYSRV]
Wait for NUMBER of buffers.
TTIBET [LINE #,,TTIBET] [TTYSRV]
Wait for line inactive or input buffer empty.
TTOAV [LINE #,,TTOAV] [TTYSRV]
Wait for line inactive and output buffer not
empty.
TTOBET [LINE #,,TTOBET] [TTYSRV]
Wait for line inactive or output buffer empty.
UDITST [0,,UDITST] [PHYSIO]
Wait for at least two free IORBs on UIOLST.
UDWDON [IORB ADDRESS,,UDWDON] [PHYSIO]
Wait for IS.DON to set in IRBSTS for this IORB.
USGWAT [0,,USGWAT] [JSYSA]
Wait for lock on queued USAGE blocks to free.
VBWAIT [0,,VBWAIT] [CFSSRV]
Wait for VOTQ to be nonzero (vote buffers are
available).
VOTDWT [0,,VOTDWT] [CFSSRV]
Wait for HSHVRS set (restart) or HSHDLY zero or
all votes in. FKSTA2 has pointer to block.
VOTSWT [0,,VOTSWT] [CFSSRV]
Wait for HSHVRS set (restart) or all votes in.
FKSTA2 has pointer to block.
VVBWAT [UNIT #,,VVBWAT] [TAPE]
Wait for the MDA to reset TPVV handling EOV.
WTFKT [FORK #,,WTFKT] [FORK]
Wait for fork to be on WTLST.
WTSPTT [PAGE #,,WTSPTT] [SCHED]
Wait for share count on PAGE # to go to 1.
XMTTST [0,,XMTTST] [NIUSR]
Wait for PSI pending for fork or PRTIP (transmit
still in progress) zero in port block. FKSTA2 has
pointer to port block.
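The bracketed T1 values in the table above use the MACRO half-word notation LH,,RH: the left half of the 36-bit word carries the datum and the right half the address of the test routine. As a minimal sketch (in Python rather than MACRO, with hypothetical helper names and a made-up routine address), a T1 value read from a dump could be split as shown. The B8 and B17 field positions assume the usual PDP-10 convention that bit 0 is the most significant bit, so <BIT1>B8 and <BIT2>B17 are two 9-bit fields of the left half.

```python
# Sketch only: helper names and the sample address are illustrative,
# not monitor symbols.

HALF_MASK = 0o777777          # 18 bits


def split_t1(word):
    """Split a 36-bit scheduler test word into (left half, right half)."""
    return (word >> 18) & HALF_MASK, word & HALF_MASK


def split_b8_b17(left_half):
    """Split a left half built as <BIT1>B8+<BIT2>B17 into two 9-bit fields."""
    return (left_half >> 9) & 0o777, left_half & 0o777


# An INTOOT-style word with BIT1=5, BIT2=12 (octal) and a made-up
# test routine address of 123456 (octal).
t1 = (((0o5 << 9) | 0o12) << 18) | 0o123456
data, test_addr = split_t1(t1)
bit1, bit2 = split_b8_b17(data)
print(oct(data), oct(test_addr), oct(bit1), oct(bit2))
```

The right half is then looked up against the symbol table to name the test routine, and the left half is interpreted according to that routine's entry in the table.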
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 62
TOPS-20 Page Zero Locations
TOPS-20 PAGE ZERO LOCATIONS
---------------------------
The following text outlines the uses of memory in page zero of the
TOPS-20 monitor as of Release 6.1.
ADDR MNEMONIC USAGE
==== ======== =====
0-17 -- Shadow ACs, not used.
20 SCTLW Scheduler halt request word (see SWTST in SCHED). Word
of function bits, current functions include Halt
timesharing, wait for system down, manual pause, and
reset FE protocol.
21 -- Used by BOOT to build CCW lists (unused by monitor).
22 -- Same as 21; both unused for KS10 systems.
23 CRSHTM Initial time for reload; -1 => time not set yet.
Contains the date/time that the system was last
reloaded. May see -1 after forced reload on KS
processor. BUGSTO (APRSRV) copies TADIDT into it for
each BUGHLT/CHK/INF.
24 SEBQOU Pointer to queued SYSERR blocks not yet written.
25 MMAPWD Pointer to MMAP for SETSPD. Contains MMAP.
26 BUGHAD Code around SYSLD1 (STG) puts LH into BUGCHK, RH into
BUGHLT after a reload. No one else uses it, so it
should contain zero.
27 CRSTD1 Current time is saved here on each BUGHLT/CHK/INF. This
is the value that gets into the SYSERR block. Contains
the date/time for the system's most recent
BUGHLT/CHK/INF.
30 SHLTW Scheduler halt word; depositing a nonzero value here
requests system shutdown.
31 RLWORD KS only; used for front-end communication, flags,
keep-alive, etc. (see PROKS). Unused on KL.
32 CTYIWD KS only; front-end communication word used as the
CTY input location. Unused on KL.
33 CTYOWD KS only; front-end communication word used as the
CTY output location. Unused on KL.
34 KLIIWD KS only; front-end communication word used as the
KLINIK input location. Unused on KL.
35 KLIOWD KS only; front-end communication word used as the
KLINIK output location. Unused on KL.
36 -- Unused/reserved. Holds KS RHBASE during boot.
37 -- Unused/reserved. Holds KS unit number during boot.
40 .JBUUO Monitor's location 40. Holds KS tape info during boot.
41 .JB41 Monitor's LUUO dispatch word.
Contains XPCW LUUBLK.
42-43 -- Unused/reserved.
44 .JBREL Job Data Area word filled in by LINK. Contains 777.
45-67 -- Unused/reserved.
70 PWRTRP Location executed by the front-end on powerfail restart.
Contains JRST PWRRST.
71 RLDADR Executed by the front-end on certain (keep-alive)
reloads. APRSRV demands this location be PWRTRP+1.
Contains XPCW RLODPC which winds up at RLDHLT for a
KPALVH BUGHLT.
72 -- Contains address of EDDTF word.
73 CRSTAD Is supposed to contain date/time of last crash. Code in
STG checks it to decide to restore the data from
BUGHAD. During system startup for KL-10s the word is
used to set the reload date/time if nonzero.
Apparently it gets no real use on KS-10s. Contains
zero while system is in normal operation.
74 .JBDDT JOBDDT location.
Contains DDTZ (EDDT entry point).
75 .JBHSO Unused/reserved.
76 -- Contains address of DBUGSW word.
77 -- Contains address of DCHKSW word.
100-107 -- Reserved for use by the front-end command language.
110 STSBLK KL-Status block pointer, virtual address. Contains zero
if status reporting is not enabled.
111 -- Physical address (MAP) of above virtual address.
112 .JBEDV Pointer to Exec Data Vector
Contains MONEDV.
113 SPRCNT Running count of SPEAR blocks queued.
Used by SETSPD. Initialized to -1.
114-117 -- Unused/reserved.
120 .JOBSA TOPS-10 style start address.
Contains NPVARZ+1,,EVGO.
121 .JBFF Contains first free address not loaded by LINK.
Contains NPVARZ+1.
122-132 -- Unused/reserved.
133 .JBCOR Job Data Area location set by LINK. LH contains highest
low segment address loaded with data. RH refers to a
SAVE argument for highest page.
134-136 -- Unused/reserved.
137 .JBVER Job Data Area version number word.
Contains current monitor version number.
140 EVDDT Monitor startup transfer vector; enter EDDT.
Contains JRST DDTZ.
141 -- Reset and go to EDDT location.
Contains JRST SYSDDT.
142 EVDDT2 Copy of 140.
Contains JRST DDTZ.
143 EVSLOD Entry to initialize file system, used for installation.
Contains JRST SYSLOD.
144 EVVSM Entry to verify swappable monitor on startup.
Contains JRST SYSVSM.
145 EVRST Restart the system location.
Contains JRST SYSRST.
146 EVLDGO Reload and start the system location.
Contains JRST SYSGO.
147 EVGO Start the monitor location.
Contains JRST SYSGO1.
150 DDTPRS DDT present flag; EDDT is present if nonzero.
Contains -1 initially, cleared later for EDDTF not set.
151 BUTRXB Defined in BOOT and STG but not used (BOOT reads the
disk address of the Root-directory from the HOM
blocks). Contains zero. Pre-version 6.
152 BUTMUN Defined in BOOT and STG but not used (BOOT reads the
values from the HOM blocks, and uses variable MAXUNI
instead). Contains zero. Pre-version 6.
153-162 BUTDRT Defined in BOOT and STG but not used (BOOT uses internal
variable DSKTAB for logical to physical structure
mapping). Contains zeros. Pre-version 6.
163-201 BUTCMD ASCIZ file name of monitor; used for booting the
swappable monitor with calls to VBOOT for segments.
Pre-version 6.
200 BUTCOD Unique code to identify TOPS-20 monitor EXE. (V6)
Contains BTCOD = 707707,,707707 to do V6-type load.
201 BUTLEN Length of BOOT communications region. (V6)
Contains BTLEN = 5.
202 BUTPGS Start,,End virtual addresses of VBOOT pages. Used to
reference and finally unlock/destroy VBOOT pages.
Pre-version 6.
BUTFLG BOOT flags. (V6)
203 BUTEPT Contains in LH: Address of the VBOOT EPT page.
RH: Address of the VBOOT page table page.
Pre-version 6.
BUTLLM Lower load limit for BOOT. (V6)
204 BUTPHY Contains in LH: Minus number of pages to map.
RH: Address of first page to map (for the monitor).
Typically contains -6,,NPVARZ for four pages of code, a
file data page and an index block page. Used with the
value in BUTVIR. Pre-version 6.
BUTULM BOOT upper load limit. (V6)
Highest monitor page BOOT will load EXE file data into.
Contains 1,,NRCODL.
205 BUTVIR Virtual address of first page of BOOT to map. Typically
will contain 772000. Used in conjunction with BUTPHY.
Pre-version 6.
BUTERR Last error from BOOT. (V6)
206 BOOTFL BOOT flags word, 0 => normal, nonzero => special boot.
The contents are supposed to be the index into a table
(BOOTD) designating how to boot the swappable monitor.
An ILBOOT BUGHLT results if the index is too large. In
the SYSGO routine the value IRBOOT is put into BOOTFL;
the table BOOTD contains entries of JRST GSMDSK for all
entries but the IRBOOT offset, which has JRST GSMIRB.
Pre-version 6.
207 BUTSTA Start address of BOOT (VBOOT) for SWPMON load.
Pre-version 6.
207-236 PHYPZS Formerly used for page zero I/O use by PHYSIO.
Currently unused, contain zero. Not defined after V6.
237 SPTWD Physical address of start of SPT, used by SETSPD in
processing the dump file for SPEAR entries.
240 MSCWD Physical address of start of MSECTB, used by SETSPD in
processing the dump file for SPEAR entries.
241-477 -- Not used, contain zero.
500-777 TMPSMM Temporary swappable monitor map saved here during EVLDGO
startup.
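When examining page zero in a dump, two of the entries above lend themselves to a mechanical check: BUTCOD (location 200) should hold 707707,,707707 in a V6-style monitor, and the pre-version 6 BUTPHY word holds a negative page count in its left half. The following Python sketch shows one way to decode such words. The helper names are hypothetical, and the octal value 654000 is a made-up stand-in for NPVARZ, whose real value depends on the monitor build.

```python
# Sketch only: helper names are illustrative; 0o654000 is a made-up
# stand-in for the symbol NPVARZ.

HALF_MASK = 0o777777
BTCOD = (0o707707 << 18) | 0o707707   # value the table gives for BUTCOD (V6)


def halves(word):
    """Return (LH, RH) of a 36-bit word."""
    return (word >> 18) & HALF_MASK, word & HALF_MASK


def signed18(half):
    """Interpret an 18-bit half word as a two's-complement number."""
    return half - (1 << 18) if half & 0o400000 else half


def is_v6_monitor(loc_200):
    """True if location 200 carries the V6 identification code."""
    return loc_200 == BTCOD


# A pre-version 6 BUTPHY word of the form -6,,NPVARZ
butphy = (0o777772 << 18) | 0o654000
lh, rh = halves(butphy)
pages_to_map = -signed18(lh)
print(pages_to_map, oct(rh), is_v6_monitor(BTCOD))
```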
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 67
TOPS-20 Monitor Sections
TOPS-20 MONITOR SECTIONS
------------------------
The TOPS-20 monitor makes use of a number of sections of address space
on extended addressing machines. For release 6.1, this number has
increased. The following table lists the defined monitor sections at
approximately the timeframe of release 6.1.
Number Symbol Use of the Section
------ ------ ------------------
0 MSEC0 Section zero data and code
1 MSEC1 Section one data and code
2 DRSECN Mapped directories
3 IDXSEC Mapped disk index table
4 BTSEC Mapped disk bit table
5 SYMSEC Monitor symbol table, DDT, CSTs,...
6 XCDSEC Extended code section
7* MFSEC0 Variable - symbol set to value of first
assignable section
(7) TABSEC Tables - DST,...
(10) DNBSE1 DECnet buffers
- CTSSEC CTS terminal database
(11) CFSSEC CFS buffers
(12) INTSEC ARPANET (Internet) buffers
(13) RESSEC RSE, NRE, NRPE psects - resident free space
(14) SWFSEC Swappable free space
(15) FFMSEC Variable - symbol set to value of first free
assignable section
(37) PCDSEC POSTCD section (user mode only)
(37) HGHSEC Highest possible section value on KL-10 processor
* Numbers in parentheses represent the values from a "typical" 6.1
monitor. These numbers are assigned dynamically. See STG.MAC for
the definition of the MSECN macro and an explanation of assignable
sections. All the sections from SYMSEC on are new for 6.0 and
6.1.
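On an extended addressing machine the section number is the part of a global address above the low 18 bits, so an address seen in a dump can be tied back to the table above. A minimal Python sketch follows. The function names are hypothetical, and the numbers for the dynamically assigned sections are the "typical" 6.1 values from the table, which a particular monitor build may not share.

```python
# Sketch only: section numbers above 6 are the "typical" values from the
# table; a real monitor assigns them dynamically.

TYPICAL_SECTIONS = {
    0o0: "MSEC0", 0o1: "MSEC1", 0o2: "DRSECN", 0o3: "IDXSEC",
    0o4: "BTSEC", 0o5: "SYMSEC", 0o6: "XCDSEC", 0o7: "TABSEC",
    0o10: "DNBSE1", 0o11: "CFSSEC", 0o12: "INTSEC", 0o13: "RESSEC",
    0o14: "SWFSEC",
}


def section_of(address):
    """Return (section number, offset within section) of a global address."""
    return address >> 18, address & 0o777777


def describe(address):
    """Format an address as section,,offset with the typical section name."""
    section, offset = section_of(address)
    name = TYPICAL_SECTIONS.get(section, "unassigned/unknown")
    return f"{section:o},,{offset:06o} ({name})"


print(describe(0o6001234))   # an address in section 6
```

Before trusting the names for sections 7 and up, compare them with the MSECN definitions in the STG.MAC used to build the monitor in question.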
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 68
TOPS-20 Monitor PSECTs
TOPS-20 MONITOR PSECTS
----------------------
The TOPS-20 monitor code assembles into a number of PSECTs with varying
purposes. Release 6 and 6.1 have created even more PSECTs to worry
about. The following table lists the defined monitor PSECTs at
approximately the timeframe of release 6.1. Most PSECT beginnings are
defined in LDINIT with the rest in STG. POSTLD terminates all the
PSECTs and handles whatever address space rearrangement is necessary
before saving the monitor EXE file.
Release PSECT Purpose
------- ----- -------
RSCOD Resident monitor code and constant data
INCOD Resident initialization code and constant data
6 SZCOD Section-zero-only resident code
5 RSDAT Resident non-zeroed data
PPVAR Processor-private pages
RSVAR Resident zero-initialized data
6 SYVAR Symbol table data
NRVAR Swappable zero-initialized data
PSVAR PSB data
JSVAR JSB data
NRCOD Swappable monitor code and constant data
NPVAR Swappable page variables
POSTCD POSTLD code and data segment
6 ERVAR Extended section resident variables
6 ENVAR Extended section swappable variables
6 EPVAR Extended section swappable page variables
6.1 ERCOD Extended section resident code
6.1 ENCOD Extended section swappable code
6.1 XRCOD KDDT in extended section
6.1 XNCOD MDDT in extended section
BGSTR Bugstring texts
BGPTR Pointers to bugstrings
See SWSKIT document MONITOR-ADDRESS-SPACE.MEMOS for more detail.
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 69
TOPS-20 Monitor Universal Files
TOPS-20 MONITOR UNIVERSAL FILES
-------------------------------
The following universal files are used to build a TOPS-20 monitor.
Release  Name    Function
-------  ----    --------
         ACTSYM  Accounting file symbol definitions
         MONSYM  General monitor symbol definitions
         MACSYM  General monitor definitions
6.0      ANAUNV  ARPANET TCP/IP symbol definitions
6.0      GLOBS   Global symbol satisfaction definitions
6.0      MSCPAR  MSCP symbol definitions
<6.1     NSPPAR  DECNET phase 2 and 3 symbol definitions
         PHYPAR  PHYSIO-level device symbol definitions
*        PROKL   KL-10 specific definitions
*        PROKS   KS-10 specific definitions
         PROLOG  General monitor definitions
6.0      SCAPAR  SCA symbol definitions
         SERCOD  SYSERR (SPEAR) file symbol definitions
6.1      CTERMD  CTERM symbol definitions
6.1      D36PAR  DECNET-36 symbol definitions
6.1      NIPAR   Definitions for NI-20 service
6.1      SCPAR   DECNET session control symbol definitions
6.1      TTYDEF  Monitor terminal definitions
NOTES:
* PROKL and PROKS have been combined into PROLOG for Release
6.0 and no longer exist independently.
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 70
TOPS-20 Job Zero Forks
TOPS-20 JOB ZERO FORKS
----------------------
The following is a table of the variables containing the process handles
for forks which run under Job Zero as inferiors of the monitor. Some of
these variables contain local fork handles and some contain system-wide
fork numbers.
Location Process handle for
-------- ------------------
CIDFRK IPADMP CI-20 microcode dumping fork
CIFORK Temporary CFS startup process
CILFRK IPALOD CI-20 microcode loading fork
CTMFRK CTERM host server system-wide fork number
DDMFRK DDMP periodic disk integrity system fork number
DNTFRK DECnet NSP fork (unused?)
EXPFRK System structure expunge task fork
INTFRK INTERNET TCP/IP task system-wide fork number
JB0FRK CHKR periodic task checking system fork number
LODFRK DTE Front-end reload fork
MOSFRK TGHA MF-20/MG-20 memory error analysis fork
RLDFRK DTE Front-end reload system-wide fork number
SEBFRK SYSERR error-logging system-wide fork number
SJBFRK SYSJOB program fork
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 71
Known Hardware Deficiencies List
Known Hardware Deficiencies List
--------------------------------
This is a collected list of known hardware characteristics which show up
from time to time as part of certain reported problems. This says
nothing about whether these characteristics are bugs or features, or
whether they will ever be fixed or changed, but merely attempts to make
them known internally.
1. DZ11 - Cannot set the speed to zero in the hardware, can only
turn off the receiver clock.
2. TM78 - ANSI ASCII was not included in the hardware format
modes.
3. TM78 - a formatter problem (corrected by ECO 12/83) causes
unreported data loss at end-of-tape.
4. TM78 - a formatter problem causes the data mode byte packing to
change from core-dump to hi-density "randomly" while reading a
record. TOPS-20 does not normally see the problem since it
usually appears as an overrun that gets retried successfully.
An ECO is planned.
5. TM02 - Can generate bad parity which it passes to memory to
cause the system memory parity errors when the data is
referenced. This is still seen with Rev 12 to the RH20.
6. TM03 - A chip race condition in the M8915 board has been known
to occur where a function register has the wrong value because it
has not settled. This generates a device error which appears
transient; i.e. CRLFing DUMPER tries the read again and
succeeds.
7. TM03 - ANSI ASCII was not included in the hardware format
modes. The TM03 does not set format error if ANSI ASCII is
selected. It will usually get a frame error; if the transfer
is a multiple of 7/8 bit bytes, the frame error is not set
either.
8. TM03 - When using industry-compatible mode, reads not of a
multiple of four bytes will produce strange results. The bytes
are counted, but the extra bytes are not written to memory,
leaving garbage.
9. TM03 - if an error occurs while rewinding, the monitor may be
left in a state of waiting for the rewind to complete, the tape
being unusable. The easiest way to clear this condition is to
reset the TM03, most easily done by the customer by powering it
down and back up.
10. TM03 - if the TM03 loses synch during the PE preamble bytes,
and it reaches 9, it will raise the postamble instead of
generating the proper error. This can result in lost records.
The usual symptom is a frame count error. This case is
recognizable by the residual frame count being the same as the
initial frame count.
11. VT100 - on a VT100 without the extended memory, one can confuse
the internal microprogram enough to have it clear sections of
the screen on Control-U, Control-R, "clear to end of screen" in
132 column mode, etc.
12. VT125 - especially with printer port option, is known to hang
in an XOFF state that cannot be cleared without resetting the
terminal.
13. VT240 - especially with printer port option, is known to hang
in an XOFF state that cannot be cleared without resetting the
terminal. Reportedly this happens much more often than with
the VT125.
14. RH20 - perfectly willing to store bad parity data into memory
until Rev 12. May still do so.
15. DX20 - is unwilling to allow registers to be examined after it
has started I/O. Can cause register access errors if not
programmed in correct sequence.
16. DX20 - there is a race condition in which the DX20 generates
an interrupt request on channel 5 for some condition, but the
code is already working with the DX20 and handles the condition,
so the DX20 lowers its request. The KL has already latched the
interrupt and tries to process it, but no device responds, so it
tries the 40+2n type, which occasionally gives a PI5ERR.
17. DX20/TU71 - the DX20 microcode does not set the 556 bpi density
correctly for TU71 (7-track) drives. This can be set
successfully from the maintenance panel.
18. DX20/TX03 - With dual-porting between systems, if the system
issues a drive clear to the DX20 during serious error recovery,
or when booting the DX, the DX resets the TX03. The reset
traps the TX03 to zero, leaving any operation on the other
channel in a hung state.
19. DX20/TX03 - when dual-porting between systems, DX2FGS BUGCHKs
occur. The DX2FGS timer in TOPS-20 has been made larger. This
should lessen the occurrence of these BUGCHKs, but may not
eliminate them entirely.
20. LP20 - at least one of the printers fails to go off-line when
there is anything in the print line buffer, even if the drum
gate is opened.
21. LP26 - fails to go offline when there is something left in the
print line buffer. When it runs out of paper, it goes offline
several lines from the actual bottom of the page.
22. LP27 - will not accept the alternate start bytes for 6/8
lines-per-inch in a VFU file. Gets a VFU load error. Use only
"LEFT-ALONE" with the "LINES-PER-INCH" command to MAKVFU.
NORMAL.VFU is fine.
23. RP07 - runs several hundred microdiagnostics at power-up,
causing hundreds of interrupts and keep-alives on -10s and
-20s.
24. RP20 - when using the dual-port option, RP20s regularly lose a
full rotation when trying to do read/write next operations.
This happens about once every ten seconds and will result in
slightly degraded performance.
25. RP20 - may evidence RMR (register modification refused) errors
due to a limp servo mechanism on one of the drives. Multiple
queued operations complete before the drive disconnects
properly and sets GO correctly for the controller to handle the
write.
26. KL10 Microcode - the ADJBP instruction does not work on the
last location of a page. Corrected in 5.1 microcode.
27. KS-10 Front End - Rev. 3. exhibits problems with the KLINIK
line. If the link is in use, it is possible to lock out the
CTY. There are problems with the password check on subsequent
tries, and problems with line hang-up. A software fix has been
implemented which clears the KLINIK output word after queueing
the KLINIK request. This appears to solve the problems.
28. KS-10 Front End - Rev. 3. exhibits some problems with
powerfail restart. If the power returns in less than 3.5
seconds or so the restart will hang. In addition if Rev. 3
and Rev. 2 boards are mixed, there is no powerfail restart or
reload capability.
29. KS10 - during a forced reload, the halt status block is written
twice, first when halting and second when rebooting; thus the
second time wipes any valuable data from the first time. It's
once again the 8080 that's responsible.
30. HSC50 - apparently there may be some problems under 6.1 and HSC
v(200) with recognizing all disk units if the HSC is configured
as node zero. This is not yet well understood.
31. HSC50 - has been reported to hang if the HSC console terminal
runs out of paper and there is output pending.
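Two of the TM03 items above describe conditions that can be checked mechanically from numbers already in hand: item 8 (industry-compatible reads whose length is not a multiple of four bytes) and item 10 (a frame count error whose residual frame count equals the initial frame count). The Python sketch below uses hypothetical function names and plain integers. It illustrates the two rules as stated in the list and is not monitor or diagnostic code.

```python
# Sketch only: function names are illustrative.


def ic_mode_stray_bytes(record_length):
    """Bytes beyond the last full group of four in an industry-compatible read.

    Per item 8, a nonzero result means the trailing bytes are counted
    but not written to memory.
    """
    return record_length % 4


def looks_like_lost_synch(initial_frame_count, residual_frame_count):
    """Per item 10: a residual equal to the initial count suggests that the
    TM03 lost synch during the PE preamble."""
    return residual_frame_count == initial_frame_count


print(ic_mode_stray_bytes(4098), looks_like_lost_synch(512, 512))
```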
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 74
KS10 Processor Console Information
KS10 PROCESSOR CONSOLE INFORMATION
----------------------------------
CSL-COMMANDS CURRENTLY IMPLEMENTED (CSL V0.161)
^Z ;enter USER mode
^\ ;enter CONSOLE mode
MK XX ;Marks microcode word at CRAM address XX (sets bit 95)
UM XX ;Unmarks Microcode at CRAM address XX
MB ;load only bootstrap of currently selected magtape
LA XX ;Load/set KS10 Memory Address
LI XX ;Load/set I/O address
LK XX ;Load/set 8080 address
LC XX ;Load/set CRAM address to be written/read
EM ;Examine KS10 Memory (last Memory location specified)
EM XX ;Examine KS10 Memory location XX
EN ;Examine Next (either from last EK, EM or EI)
EB ;Examine BUS and 8080 control registers
EI ;Examine I/O (last I/O address specified)
EI XX ;Examine I/O address XX
EK ;Examine 8080 location
EK XX ;Examine 8080 address XX
DM XX ;Deposit KS10 Memory last addressed, XX data (36 bits)
DN XX ;Deposit next (depending on last DK, DM or DI) XX data
DB XX ;Deposit BUS, XX data (36 bits)
DI XX ;Deposit I/O, XX data (16,18 or 36 bits)
DK XX ;Deposit XX (8 bits) into 8080 (Data can only be deposited
;in RAM addresses)
MR ;MASTER RESET
CS ;CPU clock start
CH ;CPU clock halt
CP XX ;CPU clock pulse (XX=number of pulses -- default 1 pulse)
SI ;Single Instruction
LF XX ;Load diagnostic write function (0-7) specifying 12 bits of
;microcode (see note at end ****)
DF XX ;Deposit Field, write microcode bits according to last LF-command
EC ;Examine CRAM: current control register, no clocks, current location as address
EC XX ;Examine CRAM at address XX
DC XX ;Deposit CRAM, XX is at least 32 octal characters. Address
;previously loaded by LC command
EX XX ;EXecute KS10 instruction XX
ST XX ;STart KS10 at address XX. Console enters user mode
SM XX ;Start microcode at XX (SM 1 causes dump of HALT-status block !!)
;Default is 0 -- Start microcode
HA ;HALT KS10 (execute HALT-instruction -- causes microcode to
; write HSB and then to enter HALT-loop)
SH ;SHUTDOWN (deposit non-zero data in memory location 30)
; causing TOPS20 to shut down
CO ;Continue (causes microcode to leave HALT-loop)
PE X ;Parity Enable (0=disable, 1=DRAM-par, 2=CRAM-par
; 4=clock-par error stop, 5=DPE/DPM, 6=CRA/CRM, 7=enable all)
CE X ;CACHE enable (0=OFF, 1=ON, <CR>=show current state)
TE X ;CPU timer (1 MSEC) enable (0= OFF, 1=ON, <CR>=show current state)
TP X ;CPU TRAPS enable (0=OFF, 1=ON (enables paging),
;<CR>=show current state)
LT ;Lamp Test, lights three lamps of front panel
RC ;Read CRAM direct, functions 0-17
; (no resets, no load diag adr, no CPU clock) (see note at end ****)
EJ ;Examine Jumps -- prints CRAM address signals (Current CRAM address,
;next CRAM address, jump address, subroutine return address)
TR XX ;TRACE - repeats CP and EJ commands until any character typed
;XX (if typed) is desired CRAM stop-address
PM ;Pulse Microcode (issue single CP and EJ)
ZM ;Zero KS10 MOS Memory (beware -- slow)
RP ;Repeat - repeats last command, or line of commands which it delimits
; Any character (except CNTRL-O) typed will stop repeat
;EXAMPLE: EM 0, EK 0, EC 0, RP will repeat execution of this line
BT ;Boot SYSTEM -- load CRAM from designated disk (see DS)
; via memory then load monitor boot from disk and start at 1000
BT 1 ;same as BT, but loads diagnostic monitor SMMON and starts at 20000
LB ;Load Bootstrap from designated disk (see DS)
LB 1 ;Load Bootstrap diagnostic monitor SMMON
DS ;Disk Select for bootstrap or microcode verification. Command prompts
;to specify UNIT NUMBER (default 0), RHBASE (default 776700),
;and UNIBUS ADAPTER (default 1) to load from when booting
MS ;Magtape Select for bootstrap or microcode verification. Command
;prompts to specify UNIT NUMBER (default 0), RH BASE (default 772440),
;UNIBUS ADAPTER (default 3), SLAVE NUMBER (default 0), and
;DENSITY (default 1600 BPI) of magtape to boot from
MT ;Magtape Boot system from selected magtape
MT 1 ;BOOT diagnostic monitor SMMAG from magtape
PW ;clears KLINIK password, or sets it (6 char's max)
KL x ;KLINIK control: 0 = off, 1 = on for remote CTY access
BC ;BOOT Check. PROM code which tests the basic 2020 system
; load path from the UNIBUS adaptor into the CRAM via memory.
CONTROL CHARACTERS
^U ;rub out current line
^O ;switch: first one stops CTY-output, second one resumes CTY-output
^S ;stop TTY-output and hangs 8080 waiting for CONTROL-Q (see below)
^Q ;resumes TTY-output
^C ;stops whatever the 8080 is doing
RUB-OUT ;rub out previous character typed
NOTE: Several commands may be put on a single line, separated by commas.
NOTE: Additional information on KS10 console commands can be found
in the KS10 MAINTENANCE GUIDE
***** CRAM Bit Formats

LF-Command CRAM Bits        RC-Command CRAM Data
--------------------        ---------------------
LF    CRAM bits             RC    Data
--    ---------             --    ------------------------------
 0    00-11                  0    CRAM bits 00-11
 1    12-23                  1    Next CRAM address
 2    24-35                  2    CRAM subroutine return address
 3    36-47                  3    current CRAM address
 4    48-59                  4    CRAM bits 12-23
 5    60-71                  5    CRAM bits 24-35 (Copy A)
 6    72-83                  6    CRAM bits 24-35 (Copy B)
 7    84-95                  7    0s
                            10    Parity bits A-F
                            11    KS10 bus bits 24-35
                            12    CRAM bits 36-47 (Copy A)
                            13    CRAM bits 36-47 (Copy B)
                            14    CRAM bits 48-59
                            15    CRAM bits 60-71
                            16    CRAM bits 72-83
                            17    CRAM bits 84-95
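
The LF column of the table follows a simple rule: each function selects
one 12-bit slice of the 96-bit CRAM word. The short Python sketch below
restates the table as a lookup, which can be handy when transcribing
console output. The function and table names are illustrative only and
are not part of any DEC software.

```python
def lf_bit_range(function):
    """Return (first_bit, last_bit) of the CRAM slice selected by LF <function>."""
    if not 0 <= function <= 0o7:
        raise ValueError("LF function must be in the range 0-7")
    first = 12 * function
    return first, first + 11


# RC functions (octal) that return CRAM bits, copied from the table above.
# Functions 1, 2, 3, 7, 10 and 11 return other data and are omitted here.
RC_CRAM_BITS = {
    0o0: (0, 11),
    0o4: (12, 23),
    0o5: (24, 35),   # Copy A
    0o6: (24, 35),   # Copy B
    0o12: (36, 47),  # Copy A
    0o13: (36, 47),  # Copy B
    0o14: (48, 59),
    0o15: (60, 71),
    0o16: (72, 83),
    0o17: (84, 95),
}
```

For example, lf_bit_range(4) and RC_CRAM_BITS[0o14] both describe CRAM
bits 48-59.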
8080-CONSOLE-ERROR-CODES
------------------------
?A/B A and B copies of CRAM bits did not match
?BC BOOT Check failed
?BFO Input Buffer Overflow
?BN Received Bad Number on input (character typed is not an
octal number)
?BT Device error or timeout during BOOT operation
?BUS BUS polluted on power-up
?CHK PROM checksum failed
?DNC Did Not Complete HALT
?DNF Did Not Finish instruction
?FRC had a forced reload
?IA Illegal Argument (address out of range, etc.)
?IL ILLEGAL Instruction
?KA KEEP ALIVE failed
?MRE Memory Refresh Error (MEM BUSY stayed set too long,
because it didn't release data on a write to memory)
?NBR Console was not granted BUS on a request
?NDA Received No Data Acknowledge on memory request
?NR-SCE Non-Reversible Soft CRAM error.
?NXM Referenced NoneXistent Memory location
?PAR ERR Report clock-freeze due to parity error,
and type out READ IO of 100,303,103
?PWL Password Length error
?RA Command Requires Argument
?RUNNING CPU clock running (command typed requires clock to be stopped
and may fail)
%SCE Soft CRAM error
?UI Unknown Interrupt
OTHER 8080 CONSOLE MESSAGES
---------------------------
BT SW message says BOOTING, using BOOT switch
BUS 0-35 message header for EB command
CYC cycle type for DB command
C CYC typed on DB-command if COM/ADR cycle blew
D CYC " " DATA cycle blew
HLTD message "HALTED/XXXXXX " where xxxxxx is data
KS10> prompt message
OFF message, says current state is off
ON message, says current state is on
RCVD data received on bus
SENT data sent to bus
>>UBA? query for UNIBUS adapter
>>UNIT? query for unit to use
>>RHBASE? query for RH11 base register address to use
>>DENS? query tape density
>>SLV? query tape slave number
8080-ERROR-Messages-during-BOOTING
----------------------------------
Disk:
On an error-condition, detected by the 8080, the
Fault-light will go on and a message of the form
?BT XXXYYY
will be printed on the CTY.
The following error-codes are only "rough" pointers, they can be
caused by any of the following problems:
Disk not a disk at all
Wrong unit selected (see DS-command)
Home blocks not readable or not there
Home blocks not set by SMFILE for 8080
8080 File-system garbage
XXX=001 Disk error encountered while trying to read HOME-blocks
Can mean incorrect RHBASE specified, wrong UBA selected,
bad disk drive, or that neither the home block nor the alternate
home block has the home block ID ("HOM" in sixbit)
XXX=002 Disk error encountered while trying to read the page of
pointers, which make up the "8080-File-System"
Can mean pack is not in format for 8080 loading, home blocks
bombed, bad drive or pack
XXX=003 Disk error encountered while trying to read a page of
microcode - can mean pack is not in 8080 format, or bad drive or
pack
XXX=004 Microcode did not successfully start running after a BT, MT,
MB, or LB command. This error will occur when an LB is done
before the system microcode is loaded.
XXX=010 Disk error encountered while trying to read PRE-BOOT
YYY are the lower 8 bits of the 8080 address of the failing
"Channel Command List" operation. At this point it is normally
a good bet to do an "EI" to get the contents of the
RH11 register that has the error-bits set.
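
As an aid when taking a report over the phone, the following Python
sketch splits a disk "?BT XXXYYY" message into its two fields and looks
up the XXX code in the list above. It assumes the six digits are octal
and that the message is typed exactly in the form shown; decode_bt and
DISK_BOOT_ERRORS are hypothetical names used only for this illustration.

```python
DISK_BOOT_ERRORS = {
    0o001: "disk error reading HOME blocks",
    0o002: "disk error reading the 8080 file-system pointer page",
    0o003: "disk error reading a page of microcode",
    0o004: "microcode did not start after BT, MT, MB, or LB",
    0o010: "disk error reading PRE-BOOT",
}


def decode_bt(message):
    """Split '?BT XXXYYY' into (XXX, YYY, description); digits assumed octal."""
    prefix, _, digits = message.strip().partition(" ")
    if prefix != "?BT" or len(digits) != 6:
        raise ValueError("expected a message of the form '?BT XXXYYY'")
    xxx = int(digits[:3], 8)  # error code
    yyy = int(digits[3:], 8)  # low 8 bits of the failing 8080 address
    return xxx, yyy, DISK_BOOT_ERRORS.get(xxx, "unknown code")
```

The description is only the "rough" pointer given above; the RH11
registers remain the real evidence.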
Magtape:
The following ERROR-messages can point to the following problem areas:
Magtape is no magtape at all
Wrong unit selected (see MS-command)
Magtape is not bootable (no microcode, no PRE-BOOT)
XXX=001 Error trying to read microcode first page
Can mean wrong unit selected, wrong RHBASE address, wrong UBA
selected, wrong slave number, wrong density, bad drive, bad
controller, bad tape, tape in wrong format
XXX=003 Error trying to read additional pages of microcode
XXX=010 Error trying to read in PRE-BOOT program
May occur while doing a skip over the microcode file, or
while reading the PRE-BOOT itself
YYY see above (disk-section)
Error-messages-out-of-PRE-BOOT
PRE-BOOT is loaded from Disk or Magtape (see 8080 commands DS, MS,
BT, BT 1, MT, MT 1)
PRE-BOOT is written onto the disk using "SMFILE.EXE", it also is written on
"standard" Diagnostic-tapes and onto the "MONITOR-INSTALLATION"-tapes.
PRE-BOOT is loaded by the 8080 into MEMORY-locations 1000 and up, and starts
at 1000. The ERROR-halts are:
1001 found "bad" core-transfer address
(page 1 is illegal - can't overload PRE-BOOT)
1003 No RH11 Base Address
1004 Magtape Skip failure
1002 Disk Retry error or Magtape Read error
At ERROR-halt time the following MEMORY-Locations contain the useful INFO :
      Disk-Booting                    Magtape-Booting
      ------------                    ---------------
100   "8080" disk-address             Not used
101   Memory transfer address         same
102   T3, selection pickup pointer    same
103   RPCS1-register                  MTCS1-register
104   RPCS2-register                  MTCS2-register
105   RPDS-register                   MTDS-register
106   RPER1-register                  MTER1-register
107   RPER2-register (RP06 only)      Not used
110   RPER3-register                  Not used
111   UBA Page RAM loc 0              same
112   UBA-status register             same
113   Version Nr. of PRE-BOOT         same
Note: The Version Nr. of PRE-BOOT will be the same as the Version Nr.
of SMFILE. The "8080" disk-address is in the form " CYL SEC SURF "
IT IS THEREFORE POSSIBLE TO ASK A CUSTOMER WITH A PRE-BOOT FAILURE
TO TYPE:
EM 77
EN,RP
...... TO TYPE ANY CHARACTER AFTER ADDRESS 115 (THIS STOPS THE REPEAT)
...... AND THEN TO TELL US WHAT HE SEES
8080-Communication-Area (KS10 Memory)
-------------------------------------
The 8080 maintains and services an in-core communication area.
Currently used are words 31 to 40. See PROKS/PROLOG for more info.
Word   Bits    Meaning
----   ----    -------
 31            Keep Alive and Status word
       4       Reload Request
       5       Keep Alive active
       6       KLINIK active
       7       PARITY Error detect enabled
       8       CRAM Parity Error detect enabled
       9       DRAM Parity Error detect enabled
       10      CACHE enabled
       11      1 msec enabled
       12      TRAPS enabled
       20-27   Keep Alive counter field
       32      BOOT SWITCH BOOT
       33      POWER FAIL
       34      Forced RELOAD
       35      Keep Alive failed to change
 32            KS-10 CTY input word (from 8080)
       20-27   0 -- no action, 1 -- CTY character pending
       28-35   CTY-character
 33            KS-10 CTY output word (to 8080)
       20-27   0 -- no action, 1 -- CTY character pending
       28-35   CTY-Character
 34            KS-10 KLINIK user input word (from 8080)
       20-27   0 -- no action, 1 -- KLINIK character,
               2 -- KLINIK active, 3 -- KLINIK carrier loss
       28-35   KLINIK-Character
 35            KS-10 KLINIK user output word (to 8080)
       20-27   0 -- no action, 1 -- KLINIK character, 2 -- Hangup request
       28-35   KLINIK-Character
 36            BOOT RH-11 Base Address
 37            BOOT Drive Number
 40            Magtape Boot Format and Slave Number
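
When word 31 has been examined from the console, its bits can be picked
apart by hand or with a few lines of code. The Python sketch below
decodes the status word using PDP-10 bit numbering, in which bit 0 is
the most significant bit of the 36-bit word. The names field,
STATUS_FLAGS and decode_word_31 are illustrative only; the flag list is
copied from the table above.

```python
WORD_BITS = 36


def field(word, first, last):
    """Extract bits first..last of a 36-bit word (bit 0 is the leftmost bit)."""
    width = last - first + 1
    shift = WORD_BITS - 1 - last
    return (word >> shift) & ((1 << width) - 1)


STATUS_FLAGS = {
    4: "reload request",
    5: "keep alive active",
    6: "KLINIK active",
    7: "parity error detect enabled",
    8: "CRAM parity error detect enabled",
    9: "DRAM parity error detect enabled",
    10: "cache enabled",
    11: "1 msec enabled",
    12: "traps enabled",
    32: "boot switch boot",
    33: "power fail",
    34: "forced reload",
    35: "keep alive failed to change",
}


def decode_word_31(word):
    """Return the flags that are set and the keep-alive counter (bits 20-27)."""
    flags = [name for bit, name in sorted(STATUS_FLAGS.items())
             if field(word, bit, bit)]
    return {"flags": flags, "keep_alive_counter": field(word, 20, 27)}
```

The word is passed as an integer, for instance int("000200000000", 8)
for a value read from the console in octal.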
OUTPUT process KS10 ==> 8080
----------------------------
The KS10 loads the character and flag into 33 and sets the 8080
interrupt. The 8080 examines 33, gets the character, clears the
interrupt, sends the character to the hardware, clears 33, and sets the
KS-10 interrupt.
INPUT process 8080 ==> KS10
---------------------------
The 8080 is interrupted with "TTY-char available". It gets the
character, delivers it into the input word (32) with its flag(s), and
sets the KS-10 interrupt.
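
The CTY input and output words (32 and 33 in the table above) share one
layout: a flag in bits 20-27 and the character in bits 28-35, that is,
the low-order 16 bits of the word. A minimal Python sketch of packing
and unpacking such a word follows; the function names are hypothetical.

```python
def pack_cty_word(char_code, flag=1):
    """Build a CTY word: flag in bits 20-27, character in bits 28-35."""
    return ((flag & 0xFF) << 8) | (char_code & 0xFF)


def unpack_cty_word(word):
    """Return (flag, char_code) from a CTY input or output word."""
    return (word >> 8) & 0xFF, word & 0xFF
```

A pending "A" (octal 101) therefore appears as 000000,,000501 in the
word, and a word of zero means no action.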
BOOT COMMAND STRING FUNCTIONALITY
---------------------------------
The BOOT program is usually invoked in one of two ways: invisibly as
part of a system dump and reload on a crash condition, or by explicit
invocation and response to the BOOT> prompt, usually with a simple
carriage return.
BOOT, however, possesses substantially more command line functionality,
at least some of which can be useful to the Specialist in a debugging
situation. This document tries to explain some of that functionality in
the context of the BOOT for Release 6.1 of TOPS-20.
BOOT parses a command string of the form:
device:<directory>file.extension.generation/switchaddress(lower,upper)
with the restrictions that some switches have precedence and so some
combinations are meaningless, and that the directory must NOT be a
subdirectory, i.e. <SYSTEM> is legal, but <SYSTEM.MONITORS> is not.
The "default" command string (in response to a simple carriage return)
is: PS:<SYSTEM>MONITR.EXE/R - i.e. load and start the resident
monitor.
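
To make the shape of the command string concrete, here is a rough
Python sketch that splits a line into the fields named above and
rejects subdirectories. It is not BOOT's parser: the regular
expression, the assumption that every field is optional, and the
assumption that addresses and bounds are octal digits are all guesses
made for illustration.

```python
import re

BOOT_COMMAND = re.compile(
    r"""^(?:(?P<device>[A-Z0-9]+):)?
        (?:<(?P<directory>[^<>]+)>)?
        (?P<file>[A-Z0-9]+)?
        (?:\.(?P<extension>[A-Z0-9]*))?
        (?:\.(?P<generation>[0-9]+))?
        (?:/(?P<switch>[A-Z])(?P<address>[0-7]+)?)?
        (?:\((?P<lower>[0-7]+),(?P<upper>[0-7]+)\))?$""",
    re.VERBOSE | re.IGNORECASE,
)


def parse_boot_command(line):
    """Split a BOOT> response into its fields; absent fields are None."""
    match = BOOT_COMMAND.match(line.strip())
    if match is None:
        raise ValueError(f"cannot parse BOOT command: {line!r}")
    parts = match.groupdict()
    # <SYSTEM> is legal, <SYSTEM.MONITORS> is not.
    if parts["directory"] and "." in parts["directory"]:
        raise ValueError("the directory must not be a subdirectory")
    return parts
```

With this sketch, "PS:<SYSTEM>MONITR.EXE/R" yields device PS, directory
SYSTEM, file MONITR, extension EXE and switch R, while "/G141" yields
only switch G and address 141.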
The available switches are:
/M MERGE - merge specification with current memory.
/L LOAD - load according to specification (suppresses
default startup).
/A ALL - load all of specified file, useful for avoiding
bounds values; loads up to page 377.
/R RUN - run specification - this is the default. Load and
start at the EXE file entry vector location. If no
(firstpage,lastpage) specification is given, then BOOT
will look in .JBSYM for the last location of the symbol
table and load up to that point (this assumes that this
is the last location in the resident monitor). If
.JBSYM is zero, or there is no page zero in the .EXE
file to find it in, then the old assembled-in default
of (0,340) will be used.
/D DUMP - dump on given specification. The default here is
PS:<SYSTEM>DUMP.EXE.1 but other existing files could be
used, e.g. if the normal dump file kept causing the
dump to fail with ?IO Error because of bad pages or was
too small, etc. one could do something like:
BOOT>JUNK:<DUMPS>DEBUG.EXE.1/D
BOOT>
BOOT has special address space knowledge which it
applies to writing out the monitor dumps, such as not
writing out pages overwritten by BOOT, etc.
/S SAVE - similar to /D, but no special knowledge applied,
saves according to specification. Useful for things
like saving image of BOOT for debugging.
/Gadr GO - transfer control to location adr; for example /G141
to invoke EDDT.
/E EDDT - load and transfer to EDDT. This is a shorthand
method for the old two command sequence of /L, then
/G141; e.g.
BOOT>NEWMON/E
EDDT
/I INFORMATION - displays the current version of BOOT and
the version numbers of any DX20A or DX20B microcode
assembled into BOOT.
/E and /I are not available in the BOOT that goes with TOPS-20 4.1.
The monitor uses the (lowerbound,upperbound) construct in loading the
swappable monitor in multiple passes into the available physical memory
by building the appropriate command string to merge the next set of
pages and invoking BOOT at the VBOOT entry point multiple times.
For version 6 of TOPS-20 there is another mechanism BOOT uses to try to
load the monitor in a single pass. See the section on Page Zero
locations to learn about the communication region.
TOPS-20 CRASH ANALYSIS FUNDAMENTALS
-----------------------------------
TOPS-20 crash analysis is a complex subject which can never be fully
taught since the areas of interest change constantly as new versions are
released, with new problems; and as problems are fixed, leaving areas
of code "stable", and no longer hot analysis prospects. Hence, like
diagnostics, things like crash analyzers tend to evolve into software
that can produce enormous quantities of uninteresting data after a lot
of computes, but come up with a bottom line of "I don't know what's
wrong".
These articles attempt to present the fundamental tools and methods that
usually wind up getting used on most crashes, and explain some of the
data structures that often need to be passed through on the way to the
answer to the problem. Some effort is made to give some of the
traditional methods by which hiding bugs may be forced into the open.
CRASH DUMPS:
-----------
Each time there is a BUGHLT there is an automatic dumping of the system
core image into PS:<SYSTEM>DUMP.EXE. If there is sufficient room on the
disk the data that was previously in DUMP.EXE will be copied into
DUMP.CPY by SETSPD after the system is reloaded. DUMP.CPY does not get
deleted and you may find several generations of DUMP.CPY.
TOPS-20 will not create a dump of the Monitor unless the system is
properly prepared to do so. This means that there must first exist a
file called PS:<SYSTEM>DUMP.EXE that will accommodate the dump. This
file can be found on the distribution tape for TOPS-20, or it can be
created by using the MAKDMP program, which will accept the memory size
from the user, and create the proper sized file. The number of pages in
the file must equal the total number of pages of physical memory in the
DECSYSTEM-20, plus enough pages to hold the EXE file directory for the
dump (generally one), minus the number of pages that BOOT overwrites,
which will not be present in the dump. For example, a system that has
1024K words of memory should have a DUMP.EXE file that is about 2048
pages long. Because a page holds 512 words, the number of pages in the
dump file is about twice the machine's memory capacity in K words.
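
The sizing rule is simple arithmetic, sketched here in Python. The
function name and its parameters are illustrative; the number of pages
BOOT overwrites is left as an argument because it depends on the
version of BOOT in use.

```python
WORDS_PER_PAGE = 512  # one TOPS-20 page


def dump_pages(memory_k_words, directory_pages=1, boot_pages=0):
    """Estimate the DUMP.EXE size in pages for a given memory size in K words."""
    memory_pages = memory_k_words * 1024 // WORDS_PER_PAGE
    return memory_pages + directory_pages - boot_pages
```

For a 1024K-word machine this gives 2048 memory pages plus one
directory page, less whatever BOOT overwrites, which matches the
"about 2048 pages" figure above.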
It is possible to give a FILENAME/D to BOOT to specify where to dump the
monitor, so it is possible to put up another pack, or whatever to get a
dump for those situations where there is no existing DUMP.EXE on the
pack to dump into. The filename given must exist however, and not be in
a subdirectory, or too small, or not all the memory will be saved. See
the article on BOOT commands for more info.
GETTING A DUMP FROM BOOT:
------------------------
Normally, when the system has crashed for whatever reason, it will
reload itself using the BOOT program. This Auto-reload feature can be
suppressed, by giving the "SET NOT RELOAD" or "CLEAR RELOAD" command to
the PARSER. The PARSER must first be set in PROGRAMMER mode, via the
"SET CONSOLE PROGRAMMER" command. These commands do not apply to 2020's,
of course. On a 2020 there is instead a location in the 8080 which,
when it contains the right number, will prevent automatic reloads after
crashes. The location depends on the revision level of the ROM, which
is typed at system startup. The following commands will turn off
auto-reload:
ROM level 0.1           ROM level 4.2
KS10>LK 20255           KS10>LK 20256
KS10>DK 303             KS10>DK 303
Also, patching the BUGHLT code where the reload is requested will
prevent an auto-reload. Placing a JFCL in locations BUGH2+3 and BUGH2+4
in the running monitor will prevent the monitor from issuing its
request.
BOOT has a limited file system capability when creating the file to
contain the dump, and in this manner avoids complicating a possibly
compromised file structure during the reload. It is for this reason
that the DUMP.EXE file must already exist on the public structure; for
BOOT can find it there, but it can not create it if it does not already
exist. Also, because BOOT resides in main memory of the host (KL10 or
KS10) processor, small portions of the Monitor will be overwritten when
BOOT is loaded into memory. Currently, BOOT is written into that area
of the resident Monitor that normally contains pure code, and as such is
not usually of much consequence. When one needs to refer to this
portion of the code, either the listings or fiche can be used, or the
MONITR.EXE file itself. At about 6.1 time, BOOT loads into pages 11-54
of memory, or 11-62 if BOOT contains both DX20A and DX20B microcode.
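
When reading a dump it helps to know whether an address falls inside
the area BOOT overwrote. The Python sketch below makes that check. It
assumes the page numbers quoted above are octal and refer to 512-word
physical pages; the function name and its parameter are hypothetical.

```python
PAGE_SHIFT = 9  # 512 words (octal 1000) per page


def in_boot_area(address, both_dx20_microcodes=False):
    """True if a physical address lies in the pages BOOT loads into (6.1 era)."""
    last_page = 0o62 if both_dx20_microcodes else 0o54
    return 0o11 <= (address >> PAGE_SHIFT) <= last_page
```

Under these assumptions addresses 11000 through 54777 (octal) hold BOOT
rather than monitor code, so the listings or MONITR.EXE should be
consulted for them instead of the dump.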
If for some reason the system fails to auto-reload, then it is still
possible to obtain a copy of the dump. To do this, the front end must
have at least loaded the BOOT program, and the console will display the
BOOT prompt:
BOOT>
BOOT has a number of commands that may be used to manipulate the
contents of the processor memory; in this case, the command we will use
will cause BOOT to copy the contents of memory into PS:<SYSTEM>DUMP.EXE:
BOOT>/D         or      BOOT>filename/D
BOOT>                   BOOT>
At this point the system may be brought up normally, and the analysis of
the dump may begin.
Similarly, a KL-10 system may be set to suppress the auto-reload
facility, and the CTY will prompt with the KLI> prompt. Simply typing
the word "BOOT" will load the BOOT program into memory. There are cases
where the system may be completely hung, and it is unclear how to best
initiate an orderly shutdown. Obviously, it is always possible to type
the control-backslash (^\) character at the CTY to get into the
front-end parser, but then what can be done? The front-end parser
allows the operator to force the processor to jump to a specified
location, and in the case described above, this feature may be used to
force a BUGHLT. This can be done after typing ^\, with the following
commands:
PAR>SET CONSOLE PROGRAMMER
CONSOLE MODE: PROGRAMMER
PAR>JUMP 71
PAR>
causing the console to return to USER mode, connected to the KL-10.
This will be followed immediately by a KPALVH BUGHLT (Keep Alive Halt),
and the system will perform the usual BUGHLT procedures. The above
command forces the processor to jump to location 71, which in turn will
cause the BUGHLT, sweeping the cache to ensure all of the dump taken
will contain valid data. Simply forcing the processor to halt, and then
reBOOTing and getting a dump, will not cause the cache to be swept, and
random locations in the dump will not contain valid data.
On the 2020 the equivalent command is "KS10>ST 71".
GETTING A FRONT-END DUMP:
------------------------
The front-end will generally create a crash dump file called
PS:<SYSTEM>0DUMP11.BIN, containing the core image of the PDP-11. If the
front-end is hung, and none of the terminals are usable, it is still
possible to obtain a dump of the -11. By setting the HALT/ENABLE switch
of the -11 to the HALT position, and then back to the ENABLE position,
the KL-10 will force the -11 to reload. In the process of reloading the
-11, the KL will indicate to the -11 that it has reloaded, and send the
necessary information to set up the terminals, and unit record devices
connected to the -11. The -11 will, in the process of reloading, dump
the old core image into the 0DUMP11.BIN file mentioned earlier. In the
event that the problem will be the subject of an SPR, the front-end
crash dump should also be included on the DUMPER tape with the SPR.
CRASH ANALYSIS MATERIALS:
------------------------
First when analyzing software or software/hardware problems be sure you
have the proper tools:
1. A SWSKIT on magtape to provide further tools and documentation.
2. A full copy of the current release microfiche MONITOR and EXEC
or equivalent listings online or on paper.
3. A copy of the Monitor Tables document from the Software
Notebooks or the SWSKIT tape.
4. A MONITOR CALLS Reference Manual.
5. A SPEAR (formerly SYSERR) manual.
6. A listing of the SPEAR/SYSERR log, especially if hardware is
suspected. The SWSKIT programs DSKERR and SWSERR produce
useful, compact extracts of the SPEAR error file, for disk and
BUGxxx entries, respectively.
7. The CTY log for BUGHLTs and BUGINFs or other problem
indications, or an accurate reproduction of this information.
8. Any other manuals you may need for reference such as the proper
version Installation Guide, Operators Guide, System Managers
Guide, etc.
9. The BUGS.MAC file for releases 4.1, 5.0, 5.1.
10. The TOPS-20.BWR file for documentation of known exceptions to
the normal documents.
11. The current FILDDT.EXE to examine the dump.
12. The MONITR.EXE responsible for the crash to load symbols from,
and, of course
13. The DUMP.EXE or DUMP.CPY resulting from the crash.
You will need the SWSKIT and perhaps listings of the latest versions of
monitor modules in case the microfiche are not up to date. FILDDT is on
the customer's distribution tape.
Be sure you have analysed the SPEAR log. This is the easiest way to
determine the hardware state of the machine at the time of the crash.
Be sure, also, that you have looked up the BUGHLT and/or BUGCHKs in
question in the listings (microfiche) and have at least read the
comments around them. Probably tracing down how it got called is a good
idea. If you happen to be without a GLOB (provided on microfiche) you
can find the BUGHLT tag of interest in the monitor as follows:
$GET <SYSTEM>MONITR.EXE
$ST 140
DDT
ILMNRF? ; BUGHLT of interest followed by "?"
PAGEM G ; it is defined in PAGEM and is global
Some other useful bits of information: There is a GLOB listing provided
in the microfiche which contains a list of all the global symbols in the
monitor. Most of the data symbols are defined in the module STG.MAC. If
you don't know a tag name but want to look at the storage for DTEs, say,
look through STG. STG also contains some small portion of code mostly
to do with restart, start, auto reload, dispatches for PI channels and a
few scheduler tests. STG stands for storage. Note that some stuff may
be defined in PROLOG, and of course lots of stuff is defined throughout
the monitor. You may also want to get a listing of MACSYM to be able to
understand the macros you see while reading the monitor listings;
MONSYM is also useful at times. Be sure you know how PARAMS has been
changed in case it has. See BUILD.MEM on the distribution tapes for the
currently distributed information on what to do to change various system
parameters in PARAM0.MAC. Be sure that you know about any variables
that the site may have changed in STG as well.
EXAMINING THE MONITOR:
---------------------
Debugging a complex, multi-process software system is largely a matter
of absorbing sufficient knowledge, experience and folklore about the
particular system with a considerable element of personal preference, or
'taste' also involved. This document is a cursory description of
features built into the system to aid debugging, and such folklore as
can be described in written English.
There are four different versions of DDT that may be used to examine the
monitor. Each is used for a different purpose and has special
capabilities. The versions of DDT are:
1. UDDT (user DDT) used to examine or modify the MONITR.EXE file.
2. MDDT (monitor DDT) used to examine or modify the running
monitor under timesharing.
3. EDDT (exec DDT) used to examine or modify the running monitor
from the CTY in a stand-alone mode.
4. FILDDT used to examine dumps.
All the DDT's are versions of TOPS-20 DDT documented in the TOPS-20 DDT
manual, and have all of the features described in the manual. See also
the document DDT41.MEM.
All four versions of DDT are used in the same way, as described later;
however, each version is started differently.
UDDT:
----
To use UDDT to modify the MONITR.EXE file on your system, you must give the
following EXEC commands:
@GET <SYSTEM>MONITR.EXE
@START 140 (on systems after Release 4, @DDT works too)
This causes EDDT to start in user mode. This is the same DDT that is
used when examining any program. You may now look at or change any part
of the monitor. If you make changes to the monitor and want to save it,
you should get back to the EXEC by typing ^Z. Then you may save the
monitor. You will probably have to be enabled in order to save the
monitor back in <SYSTEM>. This is the safest, best, and only
recommended method of putting patches into the monitor.
MDDT:
----
A version of DDT which runs in monitor space is available. It can
examine and change the running monitor, and can breakpoint code running
as a process but not at PI or scheduler level. When patching or
breakpointing the monitor, the normal write protection must be defeated,
either by setting DBUGSW to 2 on startup, or calling SWPMWE. If you
insert breakpoints with MDDT, remember monitor code is reentrant and
shared so that the breakpoint could be hit by any other process in the
system. In this event, the other process will most likely crash the
system since it will be executing a JSR to a page full of zeros.
To use MDDT you must have WHEEL or OPERATOR capabilities. You first
issue the EXEC command:
@ENABLE
$^EQUIT
; You are now in the mini-exec and receive a prompt
; of MX>. Now you give the "/" command:
MX>/
; You are now put into MDDT. To return to the EXEC
; you can issue a ^Z or a ^C which produces a
; message like "INTERRUPT AT 17372" and returns you
; to the mini-exec. If you type a ^P in MDDT you
; will get a message, "ABORT", and be returned to
; the mini-exec. If you once go into the mini-exec
; the CONTROL-P interrupt is enabled and typing this
; character will return you to the mini-exec. This
; is a good thing to use when debugging programs
; that do CONTROL-C trapping. From the mini-exec
; you may give either:
MX>S
; or
MX>E
; The S is filled out as START and the E as EXEC.
; Both of these commands will return you to the
; EXEC. See the section EXEC-DEBUGGING for more info
; about ^P and getting out of the EXEC to MX> and
; returning from MX> to either your copy of the EXEC
; or the system EXEC.
; You may also give the command:
MRETN$G
; From MDDT to return directly to the EXEC. While
; in MDDT you may examine any core location in the
; running monitor. If you wish to change any of the
; locations in the protected monitor you must give
; the command:
CALL SWPMWE$X or $W
; To write enable the monitor. After you have made
; your changes you must give the command:
CALL SWPMWP$X
; to write protect the monitor again.
MDDT may also be entered from process level via JSYS:
JSYS 777$X
or
MDDT%$X ; will enter MDDT from the context of the current process
To return to user context:
MRETN$G
To use SETMPG to map pages to this context:
Page 677 has been traditionally used for this; but any unused
page may be used. To make sure that the page is currently
unused type:
ADDRESS/ ? ; the question mark from DDT indicates that the
; page is nonexistent.
when the destination page has been found, set up AC2 as:
AC2/ ACCESS,,677000
If the page has its own SPT slot:
AC1/SPT INDEX
If the source page does not have its own SPT slot, it will belong to
either a file or process page table. It will be represented as an index
into this page table:
AC1/ SPT INDEX OF PAGE TABLE,,INDEX INTO PAGE TABLE
Access = read and/or write access
Read/Write access = 140000 in LH
Therefore, to map a page, call with either:
AC1/SPT INDEX OF PAGE
AC2/140000,,677000
or
AC1/SPT INDEX OF PAGETABLE,,INDEX INTO PAGE TABLE
AC2/140000,,677000
AND SAY:
CALL SETMPG$X
The page will then be mapped to page 677. In examining locations
677000-677777, you will be looking at the contents of the page.
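
The AC values above are written in DDT's "left,,right" halfword
notation. For readers who want to check the resulting 36-bit words, the
following Python sketch composes them. It only does the arithmetic and
calls nothing in the monitor; xwd and setmpg_ac2 are illustrative
names.

```python
HALF_BITS = 18
READ_WRITE = 0o140000  # read/write access bits for the left half of AC2


def xwd(left, right):
    """Combine two 18-bit halves into one 36-bit word (DDT's left,,right)."""
    limit = 1 << HALF_BITS
    if not (0 <= left < limit and 0 <= right < limit):
        raise ValueError("each half must fit in 18 bits")
    return (left << HALF_BITS) | right


def setmpg_ac2(page, access=READ_WRITE):
    """AC2 for SETMPG: access bits,,address of the first word of the page."""
    return xwd(access, page << 9)  # 512 (octal 1000) words per page
```

Here format(setmpg_ac2(0o677), "o") gives 140000677000, that is,
140000,,677000 as shown above.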
If you desire to map another page into this slot, merely call SETMPG
again with arguments for the new page. You need not first un-map the
old page. However, when you are finished, page 677 should be un-mapped
in the following manner:
AC1/0
AC2/ACCESS,,677000
CALL SETMPG$X
WARNING:
Calling SETMPG incorrectly can crash the system. Be CAREFUL! Do not
use SETMPG on a time sharing system if a crash will cause bad feelings.
NOTE: if you have the Release 5 version of MDDT/EDDT that has sticky
current address section (see DDTxx.MEM) then be careful about doing an
MRETN$G after examining section 2, as a crash will result from
transferring to MRETN in section 2.
EDDT:
----
NOTE
Not to be confused with ^EEDDT command to
get into UDDT used with the command
processor. See separate document on EXEC
DEBUGGING for that.
To get into EDDT you must bring the system up using the switch-register.
See the DECSYSTEM-20 Operators Guide for a discussion of switches. Go
through the KLINIT dialog and when you get the prompt BOOT>, respond
with:
BOOT>/L         or      BOOT>/E    (in version 5 or later)
BOOT>/G141
The "/L" command causes the monitor to be loaded, but not started. The
"/G141" starts the monitor at location 141, which is a jump to EDDT.
You can use EDDT like UDDT under timesharing on the MONITR.EXE file by
giving the following commands:
$GET <SYSTEM>MONITR.EXE
$START 140
EDDT is linked into the monitor and is always there. You may also get
to EDDT from MDDT (providing EDDT is locked down, see EDDTF below) by
issuing the following:
EDDT$G
from MDDT. This stops timesharing. To resume timesharing and/or get
back to MDDT give the command:
MDDT$G ; back to MDDT
MRETN$G ; back to normal timesharing
Breakpoints may be inserted in the resident monitor with EDDT, but not
in the swappable monitor in general, because its pages may be swapped
out and be unavailable to EDDT. You can bring them in by typing:
SKIP LOC$X ; where LOC is some address not in core
and then set the breakpoint. The swappable monitor must be
write-enabled to set breakpoints.
There are some locations in the monitor that are very useful when using
EDDT for debugging. They must be set before going on to start the
monitor.
They are:
EDDTF    1   keep EDDT in core when system comes up
         0   delete DDT when system comes up (default)
DBUGSW   0   do not stop on BUGHLTs, crash and reload
         1   stop on BUGHLTs (hit EDDT breakpoint)
         2   write enable the monitor, do not start up SYSJOB,
             and stop on BUGHLTs. Also, it doesn't run CHECKD
             automatically on startup.
DCHKSW   0   do not stop on BUGCHKs (default)
         1   stop on BUGCHKs (hit EDDT breakpoint)
DINFSW   0   do not stop on BUGINFs (default)
         1   stop on BUGINFs (hit EDDT breakpoint)
In addition the symbol GOTSWM appears in the code just after the
swappable monitor is loaded. So, if you want to debug the swappable
part of the monitor you must put a breakpoint at GOTSWM (to get
swappable part in core) by,
GOTSWM$B
Then start the MONITOR by,
147$G
CALL SWPMLK$X
CALL SWPMLK is used to lock swappable monitor in core for debugging.
You must have sufficient physical memory to give this command since the
resident plus swappable monitor is rather large. To start up the
monitor after you have gone into EDDT and set up your breakpoints
(remember the last two are used for BUGHLT and BUGCHK) give the command:
147$G
or
SYSGO1$G
If you are in EDDT and DBUGSW is not 2, that is, the monitor is write
protected, you can use the routines SWPMWE and SWPMWP to write enable
and write protect the monitor, i.e. CALL SWPMWE$X in DDT.
FILDDT:
------
FILDDT is distributed on the customer software tape.
The following is a chewed-up FILDDT.HLP file.
LOAD (SYMBOLS FROM) FILE SPEC
Reads specified file and builds internal symbol table. This must be the
first command to FILDDT before "GET" when looking at a dump. You will
most probably use <SYSTEM>MONITR.EXE which would have been the monitor
running at the time of the dump.
GET (FILE) FILE-SPEC
Loads a file for DDT to examine. If you are looking at a monitor dump
you must load DUMP.CPY explicitly. FILDDT looks for MUMBLE.EXE not
MUMBLE.CPY. That is, DUMP<ESC> will tell you that there is no such file
or will load DUMP.EXE. When you are looking at a dump and want symbols,
issue the LOAD command first, followed by the GET command. Be sure that
the file from which you get the symbols is the same version as the dump.
Be sure also that the monitor that was dumped is the same monitor you
use for symbols. That is, don't use MONMED symbols with a MONBCH dump,
etc.
EXIT (FROM FILDDT)
Returns to command level. You then may type a save command if a load
command was just done to preload symbols. You will get a version of
FILDDT that has the symbols you just loaded in it so you no longer need
to "LOAD" symbols. You now have a monitor specific FILDDT, which was
common practice for TOPS-10, but is not generally done for TOPS-20.
HELP
Types something like this text.
ENABLE PATCHING
Allows writing on an existing file specified by a GET.
ENABLE DATA-FILE
Assumes file is raw binary (i.e. no ACs, and not an EXE file).
DDT FEATURES:
EP$U Sets monitor context for FILDDT mapping. EP is a symbol
which is equal to the page number of the EPT. (Rel 4)
<CTRL/E> Returns to FILDDT command level.
TRACKING DOWN UNMAPPED ADDRESSES:
--------------------------------
The resident monitor may be looked at without any difficulties, but the
swappable monitor may not be in core at the time of the dump. If the
value of the symbol is in the swappable monitor you must sometimes go
through the monitor map to find where the location really is. The
location MONCOR contains the number of pages of resident monitor and the
location SWPCP0 contains the first page of real core for swapping. So
if the value of the symbol is greater than contents of MONCOR times 1000
then it is in swappable monitor. This also applies to non-monitor
pages: mapped file pages, and pages from other processes and their JSBs
and PSBs.
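
The resident/swappable test described above can be sketched in Python as follows. This is only an illustration and not part of FILDDT or any other DEC tool: the function name and the sample MONCOR contents are hypothetical, and the comparison simply restates the rule given in the text.

```python
PAGE_SIZE = 0o1000  # words per page (octal 1000)

def in_swappable_monitor(address, moncor):
    """Return True if address lies above the resident monitor.

    moncor is the contents of location MONCOR (number of pages of
    resident monitor) as read from the dump.
    """
    return address > moncor * PAGE_SIZE

# Hypothetical example: suppose MONCOR contains 300 (octal).
print(in_swappable_monitor(0o324621, 0o300))
print(in_swappable_monitor(0o123456, 0o300))
```

With the assumed MONCOR value, the first address falls in the swappable monitor and the second does not.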
If the page of the swappable monitor you want to look at is in core, it
will probably not be at the location that its address refers to, since
the dump is of core and relocation of pages does not happen. To find
where a symbol really is in the dump, first type the symbol followed by
an "=". DDT will respond with the value of this symbol. The value of
the symbol can be divided into two fields of three octal digits each.
The high order three digits are the page number and the low order three
digits are the offset into the page.
If the value of the symbol is 324621 the high order three digits, 324,
are the page number and the low order three digits, 621, are the offset
into the page. To find the location of the page in question in the dump
you must look at the monitor map indexed by the page number. For
example:
MMAP+324/
would give you the monitor map word for page 324. This word contains
some protection bits for the page and the address of the page when the
dump was taken.
The page may have been in core, on the swapping area or on the disk at
the time of the dump.
If bits 14-17 in the monitor map word are non-zero the page was
on the swapping area or disk and is no longer available.
If bits 14-17 are zero then the page was in core, and the right
half of the word contains the page number in the dump of the
page you are looking for (the dump program overwrites some pages
of memory, therefore it does not contain these pages.)
If the page was in core the new address of the symbol you are looking
for can be found by using the page number from the monitor map word and
appending the offset into the page to it. For example if MMAP+324
contains 104000,,256; then the new address of our symbol would be
256621.
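
The translation just described can be written out as a short Python sketch. It is an illustration only, under the assumptions stated in the text (a six-digit octal address, bits 14-17 of the map word as the "not in core" test, right half as the dump page number); the function names are hypothetical.

```python
HALF = 0o777777  # mask for one 18-bit half word

def split_address(address):
    """Split a monitor address into (page number, offset in page)."""
    return address >> 9, address & 0o777

def resolve_in_dump(address, mmap_word):
    """Translate a monitor address to its address in the dump.

    mmap_word is the 36-bit contents of MMAP+page.  Returns None when
    bits 14-17 are non-zero, meaning the page was on the swapping
    area or disk and is not available in the dump.
    """
    _, offset = split_address(address)
    left = (mmap_word >> 18) & HALF
    right = mmap_word & HALF
    if left & 0o17:  # bits 14-17 are the low four bits of the left half
        return None
    return (right << 9) | offset

# The example from the text: MMAP+324 contains 104000,,256.
page, offset = split_address(0o324621)
mmap_word = (0o104000 << 18) | 0o256
print(oct(page), oct(offset))                    # 0o324 0o621
print(oct(resolve_in_dump(0o324621, mmap_word)))  # 0o256621
```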
All addresses in the swappable monitor must be resolved in this manner.
In addition the pages of the JSB and PSB must sometimes be resolved in
this manner. There are some locations and tables in the monitor that
make this easy:
NAME     INDEX    DESCRIPTION
FORKX    none     Number of the fork that was running at the time of
                  the dump, -1 if in the scheduler.
JOBNO    In PSB   Job number to which current fork belongs.
FKJOB    Fork #   Job number,,SPT index of JSB
JOBDIR   Job #    logged in directory number
JOBPT    Job #    controlling TTY number,,top fork number
FKSTAT   Fork #   test data,,address of fork wait routine
FKPGS    Fork #   SPT index of page table,,SPT index of PSB
SPT indexes are indexes into a share pointer table starting at SPT. To
find the PSB of fork 20, you first look at FKPGS+20. If this location
contains 425,,426, the word at SPT+426 is the pointer to the PSB. This
pointer can point to disk, swap area, or a page in the dump. If bits
14-17 are zero it is a pointer to a page in the dump and the right half
of the SPT word is the page number of the PSB in the dump.
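
As a rough sketch of the lookup above, the following Python fragment walks from FKPGS to the SPT to find the PSB page. The tables are modelled as dictionaries from index to 36-bit word purely for illustration; the function name and the SPT contents shown are hypothetical.

```python
HALF = 0o777777  # mask for one 18-bit half word

def psb_page(fkpgs, spt, fork):
    """Return the dump page number of a fork's PSB, or None.

    fkpgs and spt map a table index to the 36-bit word found there.
    None means the SPT pointer refers to disk or the swap area.
    """
    psb_index = fkpgs[fork] & HALF       # right half: SPT index of PSB
    spt_word = spt[psb_index]
    if (spt_word >> 18) & 0o17:          # bits 14-17 non-zero
        return None
    return spt_word & HALF               # right half: page in the dump

# FKPGS+20 contains 425,,426 as in the text; the SPT word is made up.
fkpgs = {0o20: (0o425 << 18) | 0o426}
spt = {0o426: 0o1234}
print(oct(psb_page(fkpgs, spt, 0o20)))   # 0o1234
```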
BUGHLT, BUGCHK, BUGINF
------ ------ ------
The monitor contains a considerable number of internal redundancy checks
which generally serve to prevent unexpected hardware or software
failures from cascading into severely destructive reactions. Also, by
detecting failures early, they tend to expedite the correction of
errors.
There are three failure routines, BUGINF, BUGCHK, and BUGHLT, for
failures of increasing severity. Calls to them with JSR (or PUSHJ P,
for Release 5 or later BUGCHKs and BUGINFs) are included in the code by
use of a macro which records the location and a text string describing
the failure. The general form is:
for 4.1: BUG (NAME,<DATA>)
for 6.0: BUG. (TYPE,NAME,MODULE,HARD,<STRING>,<DATA>,<EXPLANATION>)
Where TYPE is HLT or CHK or INF, MODULE is the source file, DATA is
additional data, HARD is the hardware/software flag, STRING is the
short text and EXPLANATION the long text explanation of the cause. The
strings are constructed during loading and are dumped into a file. The
BUGSTRINGS.TXT file will produce an ordered listing of the bug messages
for operator or programmer use.
BUGCHK (or BUGINF) is used where the inconsistency detected is probably
not fatal to the system or to the job being run, or which can probably
be corrected automatically.
BUGHLT is used where the failure detected is likely to preclude further
proper operation of the system or file storage might be jeopardized by
attempted further operation.
NOTE
The exact form the BUGHLT/CHK/INF macro
takes is different for releases [3A and
before], [4.0, 4.1, 5.0, 5.1], and [6.0
and after], and different files and
assembly forms are used, though the action
of the code remains essentially unchanged.
See the separate article on the BUGxxx
macro for details.
The SWSKIT program SWSERR can be used to produce a compact listing of
the BUGxxx entries in the system error file in a less cumbersome manner
than SPEAR.
DBUGSW:
------
A monitor cell, DBUGSW, controls the behavior of BUGHLT and BUGCHK when
they are called. DBUGSW is set according to whether the system is
attended by system programmers.
If C(DBUGSW)=0, the system is not attended by system programmers, so all
automatic crash handling is invoked. BUGCHK will return +1 immediately,
appearing effectively as NOP. BUGHLT will, if called from the scheduler
or at PI level, invoke a total reload from the disk and a restart of the
system. The BUGCHK/INF output will appear on the CTY and in the SYSERR
log when JOB ZERO gets around to them.
If the system continues to run or is restarted properly, the location of
the bug (saved over a reload) and its message will be reported on the
CTY.
If C(DBUGSW).NEQ.0, the system is attended, and one of the EDDT
breakpoints will be hit. This allows the programmer to look for the bug
and/or possibly correct the difficulty and proceed. There are two
defined non-zero settings of DBUGSW, 1 and 2, which have the following
distinction.
C(DBUGSW) = 1
Operation is the same as with 0 except for breakpoint
action. In particular, the monitor is write protected
and SYSJOB is started at startup as described.
C(DBUGSW) = 2
Is used for actual system debugging. The monitor is
not write protected so that it may conveniently
be patched or breakpointed, and the SYSJOB operation
is not started to save time.
BUGCHK and BUGHLT procedures are the same as for 1.
The following is a summary of DBUGSW settings:
SETTING                  0              1                2
MEANING                  Unattended     Attended         Debugging
BUGCHK action            NOP            Hit Breakpoint   Hit Breakpoint
BUGHLT action            Crash System   Hit Breakpoint   Hit Breakpoint
Monitor write protect?   Yes            Yes              No
CHECKD on startup?       Yes            Yes              No
Other console functions:
-----------------------
In addition to EDDT, several other entry points are defined as absolute
addresses. The machine may be started at these as appropriate.
EVDDT    140   JRST DDTZ     ; go to EDDT
         141   JRST SYSDDT   ; reset and go to EDDT
EVDDT2   142   JRST DDTZ     ; copy of EDDT address
EVSLOD   143   JRST SYSLOD   ; initialize file system
EVVSM    144   JRST SYSVSM   ; verify swap mon on startup
EVRST    145   JRST SYSRST   ; restart
EVLDGO   146   JRST SYSGOX   ; reload and start
EVGO     147   JRST SYSGO1   ; start
The soft restart (address 145, EVRST) restarts all I/O devices, but
leaves the system tables intact. If it is successful, all jobs and all
(or all but 1) processes will continue in their previous state without
interruption. This may be used if an I/O device has malfunctioned and
not recovered properly. The total restart initializes core, swapping
storage and all monitor tables.
A very limited set of control functions for debugging purposes has been
built into the scheduler. To invoke a function, the appropriate bit or
bits are set into location 20 via MDDT. The word is scanned from left
to right (JFFO). The first 1 bit found will select the function. Refer
to routine SWTST in SCHED for the current details.
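
The left-to-right scan can be pictured with the following Python sketch, which mimics what a JFFO does on a 36-bit word. It says nothing about which function each bit selects (see SWTST for that); the function name is hypothetical.

```python
def first_one_bit(word):
    """Return the number of the first 1 bit in a 36-bit word.

    Bits are numbered as on the PDP-10: bit 0 is the leftmost (most
    significant) bit and bit 35 the rightmost.  Returns None if the
    word is zero, i.e. no function is selected.
    """
    word &= (1 << 36) - 1
    if word == 0:
        return None
    return 36 - word.bit_length()

# With bits 3 and 10 both set, only bit 3 is acted upon.
word = (1 << (35 - 3)) | (1 << (35 - 10))
print(first_one_bit(word))  # 3
```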
DDT TRICKS:
----------
Here are a few useful tidbits to use in DDT when tracking down problems:
1. Enter MDDT from a program
@ENABLE
$SDDT
DDT
JSYS 777$X
MDDT
2. Return to program from MDDT
MRETN$G ! Return from MDDT
3. Set a breakpoint in the swappable monitor in EDDT at FOO:
BOOT>/L ! Load resident part
BOOT>/G141 ! Start EDDT
EDDT
EDDTF[ 0 1 ! Set debugging flags
DBUGSW[ 0 2
GOTSWM$B 147$G ! Breakpoint at GOTSWM
$1B>>GOTSWM/MOVEI T1,FKPTRS ! SWPMON is now loaded
SKIP FOO$X ! Get FOO into memory
<>
FOO$B ! Set breakpoint
4. Find all forks of job J in MDDT or EDDT or FILDDT
-1,,0$M ! Set compare flag
FKJOB<FKJOB+NFKS-1>J,,0$W
5. Map a directory in MDDT or EDDT
1! directory number ! Put DIR in AC1
2! structure number ! and STR in AC2
CALL MAPDIR$X ! Do routine to map it
6. Write-enable monitor in MDDT or EDDT
CALL SWPMWE$X
7. Write-protect monitor in MDDT or EDDT
CALL SWPMWP$X
8. Lock swappable monitor in memory in MDDT or EDDT
CALL SWPMLK$X
9. Set monitor context for mapping in FILDDT
EP$U ! Have FILDDT do address mapping
10. Select unmapped physical addressing in FILDDT
$U ! Clear address mapping
11. Select user virtual address space mapping in FILDDT
FKPGS+forknumber/ x,,y
SPT+x/ n ! If LH(n) .NE. 0 => swapped out
n$1U ! n is address of page table (UPT)
12. Enter EDDT from MDDT
EDDT$G
13. Return to MDDT from EDDT
MDDT$G
14. See if mapped job has been in MDDT from FILDDT
Releases 4.1, 5.0, 5.1:
DDTPGA/ ? ! Page non-existence ==> NO
Release 6:
DDTPXA/ ? ! Page non-existence ==> NO
15. Find what module defines a symbol from any DDT
symbol?
16. Reference CSTn table entries
Releases 4.1, 5.0, 5.1:
CST0/ n ! Reference directly by symbol
Release 6:
CST0X[ x,,y $Q<CST0: ! Define symbol from CSTnX
CST0/ n ! table contents, then use symbol
! Note: CST5 access is same as 5.1
TOPS-20 Crash Dump Analysis
MORE TOPS-20 CRASH DUMP ANALYSIS
--------------------------------
1.0 INTRODUCTION
The purpose of this article is to provide some basic guidelines for
those who have never analyzed a TOPS-20 crash dump. The information
contained in this article refers to versions 4.1, 5.0, 5.1, and 6.0 of
the TOPS-20 Monitor, although the basic principles will also apply to
earlier and later versions of the Monitor. None of the concepts
included in this article can be considered highly advanced; indeed it
is doubtful that there exists an "advanced" methodology in crash dump
analysis. Such techniques are the result of nothing more than the
continual exercise of the basic skills. In all cases, the person who is
to perform the analysis must be familiar with the internal structures of
the Monitor. Obviously, one must know where to look for a potential
problem before hoping to solve it. For this reason, this article
assumes that the reader has an in-depth knowledge of the basic
structures of the TOPS-20 Monitor.
2.0 GENERAL INFORMATION
It would not be practical to define a method of approaching each BUGHLT
in the system, but the state of the system at the time of the crash may
be defined in terms of the data structures that it accesses. By looking
at the Monitor's stack, the status of the current job, and process, and
the condition of the Monitor's tables that were in use by the code that
BUGHLTed, we can define a limited number of "types" of crashes, e.g., a
scheduler crash, a pager crash, an APR or device interrupt crash. Each
crash will occur while the Monitor is using a specific subset of the
internal data structures of the system. We will attempt to limit the
number of "types" of crashes based upon the function being performed by
the Monitor at the time of the crash. In the sections following this
general information, we will suggest some of the areas to check when
looking at each type of crash. This information is not complete, but
contains some of the information that is more significant in each
particular context.
When you look at a dump, you should first try to find why the dump
occurred by looking at the location BUGHLT. If BUGHLT is zero then you
should check the CTY log to find out why the dump was taken and for
information like the PC at the time of the dump and the status of the PI
system. If BUGHLT is non-zero it is the address of where the BUGHLT was
issued. You should look up the BUGHLT in BUGSTRINGS.TXT or BUGS.MAC or
the source code to find additional information about the BUGHLT. If at
this point you are not sure as to why the BUGHLT occurred, you will have
to look at the listings for more information. A copy of BUGSTRINGS.TXT
is in Appendix A of the Operator's manual. You can find the location of
the call to the BUGHLT by typing the BUGHLT tag to DDT followed by a
"?". DDT will tell which monitor module the BUGHLT is in and you can go
to your microfiche and read all about the conditions precipitating the
BUGHLT.
Next if necessary look at FORKX. If it contains a -1 the scheduler was
running; otherwise it is the number of the fork that was running when
the crash occurred. The registers are saved at BUGACS on a BUGHLT, but
if BUGACS+17 contains something,,BUGPDL+n, then the registers are
invalid and you must go to the SYSERR buffer to get the good registers.
This is done by adding to the right half of the SYSERR buffer pointer,
SEBQOU, the offset into the buffer for the heading and ACs,
SEBDAT+BG%ACS. This value points to a block of 16 words containing the
user's ACs. You may have to chain down more than one queued-up SYSERR
entry to get to the BUGHLT block.
Some other locations of interest in the initial stages are:
LOCATION          DESCRIPTION
SVN               Monitor version number string
BUTCMD            BOOT filespec for loading the monitor
LSTERR            Code of the last error encountered by process
USRNAM            User name string
P                 Current stack pointer
JOBNO             Job number of currently running process
JOBPNM+(JOBNO)    SIXBIT program name of running program
UAC               User's ACs when he did his last JSYS
PAC               Monitor's ACs
PPC               Process' PC
UPDL              User's pushdown stack while in a JSYS
NSKED             0 => ok to run scheduler
                  >0 => cannot run scheduler
INTDF             -1 => ok to receive software interrupts
                  >=0 => cannot receive software interrupts
It may be useful to know the status of a fork when it is hung or you are
unsure of its status. This can be determined by looking at FKSTAT
indexed by the fork number. The right half of this location is the
address of a test routine and the left half is data to be tested. For
example if FKSTAT+12 contains 23,,FKWAT, then fork 12 is waiting for
fork 23 to complete. FKWAT is a routine that waits for another fork to
complete and its data (the left half of the word) is the number of the
fork it is waiting for. There are many different wait routines and you
will have to look at the code to see what individual ones are waiting
for, or refer to the section on scheduler tests elsewhere in this
manual.
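
Decoding a FKSTAT word by hand amounts to splitting it into halves and looking up the right half as a symbol. A minimal Python sketch of that step follows; the symbol table and the address assigned to FKWAT are hypothetical, since the real value depends on the monitor being examined.

```python
HALF = 0o777777  # mask for one 18-bit half word

def decode_fkstat(word, symbols):
    """Split a FKSTAT word into (test data, wait routine).

    symbols maps an address to a routine name.  An address with no
    entry is returned as an octal string instead.
    """
    data = (word >> 18) & HALF
    routine = word & HALF
    return data, symbols.get(routine, format(routine, "o"))

# FKSTAT+12 contains 23,,FKWAT as in the text; 54321 is a made-up
# address standing in for the value of the symbol FKWAT.
symbols = {0o54321: "FKWAT"}
data, routine = decode_fkstat((0o23 << 18) | 0o54321, symbols)
print(format(data, "o"), routine)  # 23 FKWAT
```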
You can easily determine all of the forks associated with a job by
giving the commands:
-1,,0$M
FKJOB<FKJOB+NFKS-1>N,,0$W
Where N is the job you are looking for. A fork structure can usually be
determined by looking at the FKSTAT of the forks and seeing which forks
are waiting on which forks. A FKSTAT of FKSKP indicates a fork is
inactive.
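
The masked word search used above ($M to set the mask, $W to search) compares only the bits selected by the mask. The Python sketch below shows the same comparison on a copy of the FKJOB table held as a list; the function name and the sample table contents are hypothetical.

```python
MASK_LH = 0o777777 << 18  # the -1,,0 mask: compare left halves only

def forks_of_job(fkjob, job):
    """Return the fork numbers whose FKJOB left half equals job.

    fkjob is a list of 36-bit words indexed by fork number, each of
    the form  job number,,SPT index of JSB.
    """
    target = (job << 18) & MASK_LH
    return [fork for fork, word in enumerate(fkjob)
            if (word & MASK_LH) == target]

# Made-up table: forks 1 and 2 belong to job 5.
fkjob = [
    (0 << 18) | 0o100,
    (5 << 18) | 0o200,
    (5 << 18) | 0o200,
    (7 << 18) | 0o300,
]
print(forks_of_job(fkjob, 5))  # [1, 2]
```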
You should refer to STG.MAC for other fork and job tables and other
locations in the PSB and JSB of interest. All of the above locations
can be examined with MDDT or EDDT while the monitor is running. Of
course at these times you do not have to go through MMAP and the PSB and
JSB that are in core are your own.
There are two separate patch areas in the monitor (FFF and SWPF). FFF
is the resident patch area and SWPF is the swappable patch area. These
two symbols should be updated to point to the next free location in the
patch area when a patch is inserted. By convention, all distributed
patches are applied at FFF. This serves the purposes of reducing
confusion, always working until the patch area is exhausted, and leaving
patches always present in a dump for the cases where that is important.
2.1 Identifying The Type Of Crash
The Monitor performs several basic operations, each of which has its own
set of tables and data structures. Some of these operations can be
defined as:
1. BUGHLT processing
2. JSYS processing
3. Page faults
4. PSI Service
5. Scheduling
6. DTE interrupt Service
7. Initiating I/O transfers (queueing)
8. Device interrupt Service
9. APR interrupt Service
2.2 The BUGHLT Itself
There are specific areas in any crash dump that can be examined to
determine the status and context of the system at the time of the crash.
The most obvious of these is the location called BUGHLT, which will
contain the address whence the BUGHLT code was called. When looking at
this address, remember that portions of the monitor were overwritten by
the BOOT program when the dump was taken. The contents of the location
whose address is contained in location "BUGHLT" may therefore not match
the code that the fiche or the listings show. A good example of such a
BUGHLT is PTNIC1, which is part of the APRSRV code that is overwritten
by BOOT.
See the separate discussion of the BUGxxx macro in its many forms for
more information on this useful source of problem explanation.
The BUGHLT's are performed by using the XCT instruction of a location
that contains a JSR BUGHLT instruction. In the locations following the
JSR BUGHLT, is the list of additional data addresses, and then the name
of the BUGHLT, in SIXBIT format, such as "PTNIC1". Finally in the event
of multiple BUGCHK's, BUGINF's or even nested BUGHLT's, the location
"BUGNUM" contains the number of BUGHLT's, BUGCHK's, and BUGINF's since
the last system start-up. This location is most helpful in obtaining a
clearer view of the circumstances of the crash. The case of the BUGHLT
code itself causing a BUGHLT is extremely unusual, but in certain cases
of extreme degradation of the system's data bases or "pure" code pages,
this is a possibility.
2.3 Summary Of PC Storage
The storage of the previous state PC is often context dependent;
however, some of the standard cases are listed below:
1. Crash PC - stored in location BUGHLT.
2. PC of JSYS - two copies are stored on the UPDL stack.
3. PFL/PPC - contain the current flags and PC of the process at
the last context switch. This might be a user or EXEC mode PC.
4. PIFL/PIPC - contain the flags and PC while a software interrupt
(PSI) is in progress.
5. SKDFL/SKDPC - PC saved here while process is blocking, in case
of context switch.
6. MONFL/MONPC - PC saved here while process is starting nested
JSYS, in case of context switch.
7. ENSKR/ENSKR+1 - PC saved here while entering scheduler via
ENTSKD, in case of context switch.
2.4 Summary Of AC Storage
There are various areas where the ACs may be found, often depending on
the context of the crash. The "general" ones are:
1. UAC - previous context ACs are stored here when the user is
context switched. These are the ACs the last time the process
was dismissed. If in a nested JSYS, these are the ACs the
nested JSYS was called with; the user ACs are in the UACB
stack.
2. UACB and ACBAS - the UACB block is the AC stack for nested
JSYSes, and the location ACBAS (shifted left four) is the index
to the current set.
3. PAC - the EXEC mode ACs for a process are stored here when the
process is dismissed.
4. PIAC - the EXEC mode ACs for a process are stored here when a
software interrupt (PSI) is in progress.
5. BUGACS - the EXEC mode ACs at the time of the crash.
6. BUGACU - the previous context ACs at the time of the crash.
2.5 The Monitor's Stacks
The next piece of valuable information is contained in the stack
pointer, P. This location will point to one of several possible monitor
stacks, and will give a strong indication about the context of the
monitor at the time of the crash. Identifying the type of BUGHLT will
usually be a direct indication of which stack will be in use, however
under certain circumstances, the monitor may crash while changing from
one stack to another, and such a circumstance could provide a useful
insight into the state of the system just before the crash. The
following are the names of several possible monitor stacks, and the
context under which each of them is used:
BUGPDL This stack is used while performing BUGHLT processing.
It will normally only be important if the system crashes
in a nested BUG.
BUGSPL This stack is used when generating KLSTAT blocks.
UPDL This is the user stack, in that it is used when
processing a user's JSYS in exec mode. Whenever any user
executes a JSYS, this area in his PSB is used for the
stack. Those processes under job 0 which run in exec
mode will also use this stack.
TRAPSK This stack is used by the paging code whenever a process
page faults. Normally a page fault will occur while in
the midst of performing some other function, such as a
JSYS, and the stack pointer at the time of the page fault
will be in location TRAPAP, which in turn will in this
case point to UPDL plus some offset.
CFSSTK This stack is used when processing CFS code.
PIPDB This is used by the software interrupt handler.
SKDPDL This stack is used by the scheduler.
DTESTK This stack is used by the DTE interrupt service routines.
PHYPDL This stack is used by PHYSIO code in the process of
queueing I/O request blocks (IORB's). These IORB's are
the means by which RH20/RH11 data transfers are
initiated.
PHYIPD This stack is used by the PHYSIO interrupt service
routines, and therefore is the interrupt-level equivalent
of PHYPDL. It is important to remember that these two
stacks are independent of each other, and should not be
confused.
PI5STK This stack is used for unvectored PI5 interrupts, e.g.,
KLIPA.
PI6STK This stack is used for unvectored PI6 interrupts, e.g.,
KLNI.
PIXSTK This stack is used while processing spurious unvectored
interrupts.
MEMPP This stack is used when processing APR interrupts.
IMSTK This stack is used when processing AN20 interrupt code.
The stack that is being used and the section of code that executed
the BUGHLT will indicate the type of BUGHLT that has occurred. File
system BUGHLT's will be observed either while performing a JSYS,
servicing an interrupt, or otherwise attempting to access a file system
that has been corrupted to the point of being unusable.
3.0 BUGHLT CONTEXT (BUGPDL)
The previous PC will be stored in location BUGHLT. The ACs are
saved in the block at BUGACS (and loaded into the ACs by FILDDT by
default), hence the saved stack pointer is at BUGACS+17. The previous
context ACs are stored in the block at BUGACU. These are the user mode
ACs unless in a nested JSYS at the time of the crash, in which case
BUGACU has the ACs the current JSYS was called with, and the user mode
ACs are in the UACB block.
The stack is set to BUGPDL. In the case of a nested BUGHLT, AC17
will point to BUGPDL, and location BUGLCK will display:
o -1 => no BUG in progress
o 0 => one BUG in progress (the usual case)
o +N => N nested bugs in progress (very unusual - bugs
during the BUGxxx code)
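
The three cases for BUGLCK can be expressed as a small Python sketch. It treats the location as a 36-bit two's complement word, so -1 appears in the dump as 777777,,777777; the function name is hypothetical and the wording of the results simply follows the list above.

```python
def describe_buglck(word):
    """Interpret the contents of BUGLCK read from a dump."""
    word &= (1 << 36) - 1
    # Convert the 36-bit pattern to a signed value.
    value = word - (1 << 36) if word >> 35 else word
    if value == -1:
        return "no BUG in progress"
    if value == 0:
        return "one BUG in progress"
    if value > 0:
        return "%d nested bugs in progress" % value
    return "unexpected value"

print(describe_buglck(0o777777777777))  # no BUG in progress
print(describe_buglck(0))               # one BUG in progress
print(describe_buglck(2))               # 2 nested bugs in progress
```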
4.0 JSYS CONTEXT (UPDL)
When a process executes a JSYS, the Monitor performs the JSYS by
dispatching through a table called JSTAB to the proper routine. These
routines are named by convention as the JSYS name, preceded by a ".",
thus the routine to perform the JSYS PMAP is called ".PMAP::". This
name is always a global symbol. The last JSYS executed in user context
is saved in the PSB for the process, in location KIMUU1, and KIMUU1+1.
KIMUU1/ flags,,104000
+1/ JSYS number
The second of these locations will contain the dispatch offset in
JSTAB; this number, when combined with the JSYS opcode (104000,,0), is
the last JSYS performed by the user. This, then, will point indirectly
through the JSTAB table to the place where the user JSYS began
processing. By following the code, and examining the stack, it is often
possible to reconstruct the events leading to the crash.
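
Combining the dispatch offset with the JSYS opcode is a simple OR, as the following Python sketch shows. It is an illustration only; the function names are hypothetical, and the sample offset 777 is borrowed from the JSYS 777$X example in the DDT tricks section rather than taken from a real dump.

```python
HALF = 0o777777                # mask for one 18-bit half word
JSYS_OPCODE = 0o104000 << 18   # 104000,,0

def last_jsys(dispatch_offset):
    """Rebuild the JSYS instruction from the contents of KIMUU1+1."""
    return JSYS_OPCODE | (dispatch_offset & HALF)

def halfword_format(word):
    """Format a 36-bit word as left,,right in octal, as DDT does."""
    return "%o,,%o" % ((word >> 18) & HALF, word & HALF)

print(halfword_format(last_jsys(0o777)))  # 104000,,777
```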
The stack will contain two copies of the user's program counter
(PC) and flags in the first four locations of UPDL. The PSB location
MPP will contain the stack pointer at the time of the last JSYS. Each
time the Monitor performs a JSYS internally, MPP is pushed onto the
stack and then set to the current value of P.
Initial JSYS stack set-up:
UPDL/ PC
UPDL+1/ flags
UPDL+2/ PC
UPDL+3/ flags
JSYS in Monitor context (nested JSYS):
UPDL+n/ INTDF ;old interrupts-deferred flag
/ MPP ;previous PC, or level of nesting
/ Return PC of nested JSYS
/ PC flags
So, MPP is the stack pointer for the return PC block. If this is a
nested JSYS, the ACs are saved in UACB at the proper nesting level.
Some other useful locations in JSYS context are:
JSB Locations
USRNAM This contains the name of the user, in ASCII.
PSB Locations
JOBNO Contains the number of the job for this process.
FORKN Contains the fork number for the top fork of the job in
the left half of the word, and the fork number of the
current fork in the right.
INTDF Contains -1 if process is OKINT, 0 or greater if NOINT
(defer all software interrupts for this job)
NSKED Contains 0 if process is OKSKED, 1 or greater if NOSKED.
(defer scheduling of other forks)
Monitor Fork Tables - indexed by the current fork number
FKCNO Contains the SPT offset that points to the second page of
the PSB in the left half of this word.
FKINT Contains the pseudo-interrupt communications register,
with flags in the left half describing the type of
request, and the channel number of the request in the
right half.
FKINTB Contains the pseudo-interrupt channel requests pending
since the fork's last PSI interrupt.
FKJOB Job number of the fork in the left half, and SPT index
for the JSB in the right half.
FKJTQ Part of a doubly linked list of forks that are waiting to
program software interrupt the Monitor. JTLST points to
the top fork on the list.
FKNR Contains in bits 0-8 the age stamp value at the last time
local garbage collection was performed.
FKPGS Contains the SPT indices for the process page table, in
the left half, and the PSB in the right half.
FKPGST Contains the address of the routine to test for balance
set wait satisfied in the right half, with test data in
the left. If the fork is not in the balance set, this
contains the time of day that the fork entered a wait
list.
FKPT Part of a linked list of forks on a particular scheduler
list, such as GOLST, WTLST, etc. The right half of the
word contains the address of the next element in the
list, and the left half contains the amount of runtime
the fork's job will have accumulated when the fork
exceeds its Balance Set Hold time.
FKQ1 Contains the fork's remaining run quantum. When the
quantum expires, the fork is moved to a lower run queue,
and given the appropriate new quantum.
FKQ2 Contains the fork's scheduler queue level number in the
left half, and the list address, i.e. GOLST, WTLST,
etc., in the right.
FKSTAT Contains the address of the scheduler test routine which
will determine when the fork is available to be placed on
the GOLST.
FKTIME Contains the time of day, in internal format, that the
fork was placed on its current run queue.
FKWSP Contains the number of physical pages assigned by the
fork in the right half, and the working set size of the
fork when the fork entered the balance set in the left.
5.0 PAGER CONTEXT (TRAPSK)
Page faults trap through the user's UPT, by placing the old flags
and PC for the process in locations UPTPFL and UPTPFO respectively, and
taking the new PC from location UPTPFN. UPTPFN will usually contain the
address PGRTRP, which is the beginning of the page fault code.
The location being referenced and therefore causing the page fault
is stored in UPTPFW, also called TRAPS0. This contains the virtual
address that page faulted in bits 13-35. Bit 0 of this word indicates
if the location is in user or exec (monitor) address space. If this bit
is set, the address is in user address space.
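
The two fields of the page fail word mentioned above can be pulled apart as in the following Python sketch. Only bit 0 and bits 13-35 are decoded, since those are the only fields this article describes; the function name and the sample word are hypothetical.

```python
def decode_page_fail_word(word):
    """Return (is_user_address, virtual_address) from UPTPFW/TRAPS0.

    Bit 0 (the leftmost bit) is set for a user address space
    reference; bits 13-35 hold the virtual address that faulted.
    """
    is_user = bool((word >> 35) & 1)
    virtual_address = word & ((1 << 23) - 1)  # bits 13-35 are 23 bits
    return is_user, virtual_address

# Made-up example: a user-space fault at address 1234567 (octal).
is_user, address = decode_page_fail_word((1 << 35) | 0o1234567)
print(is_user, format(address, "o"))  # True 1234567
```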
The PGRTRP code copies TRAPS0 into TRAPSW (before release 6), in
case of recursion. This code will determine the nature of the page
fault, and attempt to resolve it. UPTPFL and UPTPFO are also called
TRAPFL and TRAPPC respectively.
The old stack pointer is saved in location TRAPAP (this is only
relevant if the page fault occurred in exec mode). The new stack,
TRAPSK, is set up according to the context of the page fault, i.e., user
context, monitor context, or recursive page fault. The form of the
stack changes for Release 6. First, for earlier releases:
A page fault in user mode causes the stack to be set up with the
runtime, return PC, and return PC flags in the first three locations of
the stack:
TRAPSK/ runtime
TRAPSK+1/ return PC
TRAPSK+2/ return PC flags
Page faults from monitor context have the following initial stack
set-up: (prior to release 6)
TRAPSK/ AC1
TRAPSK+1/ AC2
TRAPSK+2/ AC3
TRAPSK+3/ AC4
TRAPSK+4/ AC7
TRAPSK+5/ AC16
TRAPSK+6/ TRAPSW
TRAPSK+7/ runtime
TRAPSK+10/ PC
TRAPSK+11/ PC flags
Recursive page faults will cause the following set up in TRAPSK, at the
time of the page fault: (prior to release 6)
/ AC1
/ AC2
/ AC3
/ AC4
/ AC7
/ AC16
/ TRAPSW
/ PC
/ PC flags
For release 6, the code becomes more uniform, and the format of the
stack is the same for all cases; however, some stack offsets are not
made use of for all types of page faults -- as above.
The code at PGRTRP sets up the TRAPSK stack, pushes CX on it, and
then calls PFAULT, which has the following TRVAR:
TRVAR <<PFACS,5>,PFHTIM,PFHTMP,PFHPFW,PFHFL,PFHPC>
and so the stack looks like this:
TRAPSK/ CX
+1/ return address of call to PFAULT
+2/ AC15 saved by TRVAR
+3/ AC1 =PFHACS
+4/ AC2
+5/ AC3
+6/ AC4
+7/ AC7 = FX
+10/ runtime =PFHTIM
+11/ TRVAR temp location =PFHTMP
+12/ UPTPFW page fail word =PFHPFW
+13/ TRAPFL flags =PFHFL
+14/ TRAPPC pc =PFHPC
+15/ .TRRET
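When reading a Release 6 dump it can help to label the words found at TRAPSK mechanically. The following Python sketch simply pairs a list of words with the offsets shown above; the table and function names are hypothetical, and the labels are copied from the layout in this section rather than from the monitor sources.

```python
# Meaning of each word, indexed by its offset from TRAPSK
# (offsets run from 0 through 15 octal, as listed above).
R6_TRAPSK_LAYOUT = [
    "CX",
    "return address of call to PFAULT",
    "AC15 saved by TRVAR",
    "AC1 (PFHACS)",
    "AC2",
    "AC3",
    "AC4",
    "AC7 (FX)",
    "runtime (PFHTIM)",
    "TRVAR temp location (PFHTMP)",
    "page fail word (PFHPFW)",
    "flags (PFHFL)",
    "PC (PFHPC)",
    ".TRRET",
]

def label_trapsk(words):
    """Pair each dumped word with its octal offset and its meaning."""
    return [
        (f"TRAPSK+{offset:o}", R6_TRAPSK_LAYOUT[offset], f"{word:012o}")
        for offset, word in enumerate(words[:len(R6_TRAPSK_LAYOUT)])
    ]

# Dummy data: fourteen words whose values equal their offsets.
for row in label_trapsk(list(range(14))):
    print(*row, sep="  ")
```

As the text notes, not every offset is meaningful for every type of page fault, so a labelled word may still hold stale data.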
Recursive page faults will indicate the level of recursion in TRAPC.
This location is normally set to -1 and is incremented every time the
page fault code is called, and decremented when a page fault has been
satisfied.
In examining a pager crash, it is usually a good idea to begin by
tracing down the Monitor's table entries for the location that faulted.
This location is stored in location TRAPS0. The identity of the page
causing the trap is stored in location TRPID, and will be in either of
two forms: page table number in left, and page number in right, or
simply the page table number in the right. The page table number is an
SPT index, and the page number, if any, is an offset into the page table
pointed to by that SPT slot. There are four Core Status Tables (CST's)
indexed by physical page number, that are used to keep track of each
page in the machine. A page fault crash will usually have bad data in
either the SPT slot indicated in TRPID, or one of the CST's for the
physical page pointed to indirectly through that SPT slot. If TRPID
contains PTN,,PN, then find location SPT+PTN. This should have a
physical page number in the right half. Look at this physical page,
offset by PN in TRPID to find the pointer to the page that caused the
fault. Shared and indirect pointers in this location will point through
another SPT location, but private pointers will point directly at the
physical page that we are looking for. If TRPID contains just PTN, then
SPT+PTN will point directly at the physical page we are looking for.
Knowing the physical page number, it is now possible to examine the CST
tables for that page. Refer to the section on referencing the non-zero
section CSTs for more info.
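The walk from TRPID to the map word can be sketched in Python as follows. This is an aid to following the description above, not monitor code: the function names, the read callback, and the sample addresses are all hypothetical, and the sketch takes the text at its word that the physical page number is found in the right half of the SPT entry. The page size of 1000 (octal) words is the standard TOPS-20 value.

```python
PAGE_SIZE = 0o1000  # words per page

def halves(word):
    """Split a 36-bit word into (left half, right half)."""
    return (word >> 18) & 0o777777, word & 0o777777

def follow_trpid(trpid, spt_base, read):
    """Follow TRPID one step, as described in the text.

    read(address) is a hypothetical callback that returns the word
    stored at an address in the dump.  Returns ("map word", physical
    address of the pointer) when TRPID is PTN,,PN, or ("page",
    physical page number) when TRPID holds only PTN.
    """
    left, right = halves(trpid)
    if left == 0:
        # Only a page table number, held in the right half.
        _, physical_page = halves(read(spt_base + right))
        return "page", physical_page
    # PTN,,PN: the SPT slot gives the page holding the page table.
    _, table_page = halves(read(spt_base + left))
    return "map word", table_page * PAGE_SIZE + right

# Made-up dump contents for illustration only.
spt = 0o40000
dump = {spt + 0o624: 0o321, spt + 0o17: 0o456}
print(follow_trpid((0o624 << 18) | 0o237, spt, dump.get))
print(follow_trpid(0o17, spt, dump.get))
```

The address returned in the first case is a physical address, so it has to be examined with physical rather than virtual mapping in effect. A shared or indirect pointer found there must then be followed through another SPT slot by hand.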
   CST0    Used principally by the pager hardware, this location
           will contain the Process Use Register, mentioned in the
           FKCNO table above, and the age stamp.
   CST1    Contains the system lock count, and the backup address
           for the page. The lock count indicates the number of
           system events that must complete before the page may be
           swapped out. The system should never swap out a page
           with a non-zero lock count. The backup address can be a
           disk or drum address for a page in memory.
   CST2    Contains the home map location of the page, and should
           match the contents of TRPID.
   CST3    Is used by the software to create lists of pages in
           various states of use. Those pages available for use
           will be on the Replaceable Queue, and linked together in
           a doubly linked list. Those pages awaiting swapping will
           be on a swapping device queue, and part of a singly
           linked list. Pages in use will contain the fork number
           of the owner in bits 3-14, and the local disk address for
           PHYSIO for the page.
   CST5    Contains the list of short I/O Request Blocks (IORB's)
           associated with the page.
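Fields such as the fork number in bits 3-14 of CST3 are awkward to pick out of an octal word by eye. The Python sketch below shows one way to extract an arbitrary bit field, assuming the usual PDP-10 numbering in which bit 0 is the most significant bit of the 36-bit word. The function name is hypothetical; the sample word 77770,,0 is the CST3 entry that appears in the FILDDT example later in this handbook.

```python
def field(word, first_bit, last_bit):
    """Extract bits first_bit..last_bit of a 36-bit word (bit 0 = MSB)."""
    width = last_bit - first_bit + 1
    return (word >> (35 - last_bit)) & ((1 << width) - 1)

cst3_entry = 0o077770_000000            # 77770,,0
print(oct(field(cst3_entry, 3, 14)))    # prints 0o7777
```

The sketch only extracts the field; what a particular value means must still be determined from the monitor sources.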
A few other significant locations for page faults are:
RPLQ Points to the beginning of the Replaceable Queue in CST3.
NRPLQ Contains the number of pages on the Replaceable Queue.
SWPLST Points to the beginning of the PHYSIO swap list, in CST3.
NOF Contains the number of OFN's in use in the SPT.
6.0 PSI CONTEXT (PIPDB)
Tables FKINT and FKINTB will be useful in determining the type and
timing of PSI interrupts pending at the time of the crash. When a
process has a PSI interrupt pending, it is flagged in the FKINT entry
for that fork, and the scheduler will take note of this event and set
the PPC location in the PSB for that process to contain the address
PIRQ. This action takes place at location SCHED5 in the scheduler.
The next time that the process is ready to run, it will continue at
location PIRQ, which will set up the PSI stack, PIPDB. SCHED5 also
moves the PSI request word from FKINT to PIMSK in the PSB. Thus, it is
possible to check this location for the last PSI request that was
scheduled.
The old contents of PPC and PFL are stored in PIPC and PIFL by the
SCHED5 routine, so these will indicate the point where the process was
interrupted. The ACs are stored in the block at PIAC, hence the
previous stack pointer is at PIAC+17.
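The "block+17" rule follows from the ACs being saved in order, with the stack pointer in AC17. A small Python sketch makes the arithmetic explicit; the function name and the sample block address are hypothetical, and the real address of PIAC has to come from the monitor symbol table.

```python
STACK_POINTER_AC = 0o17   # AC17 holds the stack pointer

def saved_ac_address(block_base, ac):
    """Address of a saved AC within a block that stores AC0-AC17 in order."""
    if not 0 <= ac <= 0o17:
        raise ValueError("AC number must be in the range 0-17 octal")
    return block_base + ac

piac = 0o612340           # made-up address standing in for PIAC
print(oct(saved_ac_address(piac, STACK_POINTER_AC)))
```

The same arithmetic applies to the other save blocks named in this section, such as DTEACB and PHYACS.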
7.0 SCHEDULER CONTEXT (SKDPDL)
The scheduler is usually invoked in one of two ways: through a
software interrupt initiated by the channel 3 PI routine, indicating that a
set period of time has elapsed since the last scheduler cycle, or
through the ENTSKD macro, which is used by a running process that is
about to dismiss. In this way the scheduler is guaranteed to run at
regular intervals, or whenever the system is idle.
The primary entry point to the scheduler is SCHED0. Control passes
through this location whenever the running process dismisses, or
whenever one of the two scheduler clock cycles elapses.
Briefly, the hardware traps on every clock tick through location
TIMVIL in the EPT. This location contains the instruction XPCW TIMINT.
Again, as in the device interrupt code, this instruction causes the
flags and PC to be placed in locations TIMINT, and TIMINT+1, and control
passes to the location in TIMINT+3, which in this case is TIMIN0.
TIMIN0 determines whether or not it is time to run the scheduler, and
dismisses the interrupt. The path taken by the KS-10 processor is
slightly different, taking a 40+2*n interrupt on the CPU channel (3),
but it winds up in the same place (TIMIN0) when it determines the
interrupt was for a clock tick.
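The XPCW dispatch described above uses a four-word block: old flags, old PC, new flags, new PC. The Python sketch below models just that data movement, which may help when locating the interrupted PC in a dump. The function name and all addresses are hypothetical; only the block layout is taken from the text.

```python
def xpcw(memory, block, old_flags, old_pc):
    """Model the data movement of XPCW on a four-word block.

    Stores the old flags and PC in block and block+1, then returns
    the new flags and new PC taken from block+2 and block+3.
    """
    memory[block] = old_flags
    memory[block + 1] = old_pc
    return memory[block + 2], memory[block + 3]

# Made-up addresses standing in for TIMINT and TIMIN0.
TIMINT, TIMIN0 = 0o5000, 0o5004
memory = {TIMINT + 2: 0, TIMINT + 3: TIMIN0}
new_flags, new_pc = xpcw(memory, TIMINT, old_flags=0o010000_000000, old_pc=0o1234)
print(oct(new_pc), oct(memory[TIMINT + 1]))
```

In a crash dump, the consequence is that the PC at the time of the last clock interrupt can be read from the second word of the block.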
If the scheduler is to be run, TIMIN0 initiates a software
interrupt on channel 7, which causes a trap through the EPT location
KIEPT+56 to PISC7R. The instruction executed in KIEPT+56 is an XPCW
PISC7R, causing the old PC and flags to be deposited at PISC7R, and
control to begin at PISC7+1. The PISC7 code sets up PPC and PFL to
contain the old PC and flags, from PISC7R, and saves the process ACs at
the time of the interrupt in a block of the PSB called PAC.
Having set up for scheduler context, the PISC7 code then transfers
control to the SCHED0 routine. Similarly, the ENTSKD macro does an XPCW
ENSKR, causing a jump to the ENSKED routine that does the context
switch.
The stack is set to SKDPDL. The previous PC is stored by the code
in PFL and PPC in the PSB. The ACs are stored in PAC (exec mode) and
UAC (previous context ACs) in the PSB. The previous stack pointer is in
the saved ACs.
Some other useful locations in scheduler context:
   FORKX   Contains a -1 if no fork is chosen or the fork number of
           the chosen fork.
   INTDF   Contains -1 if process is OKINT, 0 or greater if NOINT
           (defer all software interrupts for this job)
   NSKED   Contains 0 if process is OKSKED, 1 or greater if NOSKED.
           (defer scheduling of other forks)
   SKEDF3  If nonzero, will cause the scheduler to reevaluate the
           balance set and reschedule all forks.
   SKEDF1  If nonzero, indicates that a fork has been chosen to run,
           and the scheduler should set the fork context.
   SKEDFC  If nonzero, forces a clear of the balance set and memory.
   RSKED   Contains the instruction to be executed when a NOSKED
           process goes OKSKED.
   INSKED  If nonzero, indicates the scheduler overhead cycle has
           been entered.
   SSKED   Holds the NOSKED fork number, if any.
   SCKATM  The software clock that generates a channel 7 interrupt
           when it has been decremented to zero.
   GOLST   Points to the beginning of the GOLST in the FKPT table.
   WTLST   Points to the Wait list in the FKPT table.
   TTILST  Points to the TTY input wait list in the FKPT table.
   FRZLST  Points to the list of frozen forks.
   WT2LST  Points to the list of forks waiting to be unblocked.
           (UNBLK1)
   TRMLST  Points to the list of forks waiting for another fork to
           terminate.
   SUMNR   Contains the number of reserved pages. (locked in memory)
   BALSHC  Contains the number of pages reserved due to shared
           access.
8.0 DTE INTERRUPT CONTEXT (DTESTK)
DTE interrupts also dispatch through locations in the EPT,
depending upon which DTE is interrupting. For each DTE that could exist
on a system (4), there is an eight word block in the EPT used to keep
up-to-date information for that DTE. Not all of the DTE blocks will
necessarily be used; however, they will all exist in the EPT. These
blocks begin at location DTEEBP. The format of one of these blocks is
described below. The DTE interrupt executes the third word in this
block, which contains an XPCW DTEN0.
The old PC and flags will be stored at location DTEN0, and, since
DTEN0+3 contains ".+1", the system will begin processing the interrupt
at location DTEN0+4.
The flags and PC will be stored at DTETRA and the ACs stored at
DTEACB (previous stack at DTEACB+17). The new stack will be set to
DTESTK.
DTEN0 will then use INTDTE to process the interrupt. This code can
be found in the DTESRV module of the monitor.
The DTE control block:
DTEEBP/ To -11 byte pointer
DTETBP/ To -10 byte pointer
DTEINT/ "XPCW DTEN0" ;dispatch for DTE-0
/ reserved
DTEEPW/ Examine Protection Word
DTEERW/ Examine Relocation Word
DTEDPW/ Deposit Protection Word
DTEDRW/ Deposit Relocation Word
Note that the labels above apply only to DTE-0, and that the remaining
DTE's must be offset by DTE-number X 8.
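The offset rule for the remaining DTEs can be written out as a short Python sketch. The names of the function and table are hypothetical, the word offsets are read off the control block listing above (DTEINT being the third word), and the value used for DTEEBP in the example is a placeholder: take the real value from the monitor symbols.

```python
DTE_BLOCK_WORDS = 8

# Offset of each labelled word within one DTE control block,
# in the order listed above (word 3 is reserved).
DTE_BLOCK_OFFSETS = {
    "DTEEBP": 0, "DTETBP": 1, "DTEINT": 2,
    "DTEEPW": 4, "DTEERW": 5, "DTEDPW": 6, "DTEDRW": 7,
}

def dte_word_address(dteebp, dte_number, name):
    """EPT address of a control block word for DTE 0 through 3."""
    if not 0 <= dte_number <= 3:
        raise ValueError("DTE number must be 0-3")
    return dteebp + DTE_BLOCK_WORDS * dte_number + DTE_BLOCK_OFFSETS[name]

dteebp = 0o140            # placeholder value for DTEEBP
for n in range(4):
    print(n, oct(dte_word_address(dteebp, n, "DTEINT")))
```

With this, the dispatch word for any DTE can be located without counting words by hand.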
Some other useful locations in the EPT:
DTEFLG/ Operation Complete Flag
DTECFK/ Clock Interrupt Flag
DTECKI/ Clock Interrupt Instruction
DTET11/ To -11 argument
DTEF11/ From -11 argument
DTECMD/ Command Word
DTESEQ/ DTE20 Operation Sequence Number
DTEOPR/ Operation In Progress Flag
DTECHR/ Last Typed Character
DTETMD/ Monitor TTY Output Complete Flag
DTEMTI/ Monitor TTY Input Flag
DTESWR/ Console Switch Register
These locations are found at offsets 444 through 457 in the EPT.
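Twelve names are listed and the range 444 through 457 (octal) covers twelve words, so the individual offsets can be tabulated as in the Python sketch below. Note the assumption: the sketch takes the names to be listed in ascending offset order, which the text does not state outright, so verify any single offset against the monitor symbols before relying on it. The function and table names are hypothetical.

```python
# Names in the order they are listed above; ASSUMED to be offset order.
EPT_DTE_NAMES = [
    "DTEFLG", "DTECFK", "DTECKI", "DTET11", "DTEF11", "DTECMD",
    "DTESEQ", "DTEOPR", "DTECHR", "DTETMD", "DTEMTI", "DTESWR",
]
FIRST_OFFSET = 0o444

def ept_offsets():
    """Map each name to its assumed offset in the EPT."""
    return {name: FIRST_OFFSET + i for i, name in enumerate(EPT_DTE_NAMES)}

for name, offset in ept_offsets().items():
    print(f"{name}  EPT+{offset:o}")
```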
9.0 PHYSIO I/O QUEUEING LEVEL (PHYPDL)
All disk and tape I/O is initiated through the PHYSIO code, by
calling PHYSIO with a pointer to an I/O Request Block (IORB) in AC1, and
the addresses of the Channel Data Block (CDB) and Unit Data Block (UDB)
in AC2 (CDB,,UDB). PHYSIO validates the arguments passed to it, and
then determines whether the IORB belongs on the Position Wait Queue
(PWQ) or the Transfer Wait Queue (TWQ). These two queues are pointed to
by offsets UDBPWQ and UDBTWQ in the UDB for the device. Note that these
are offsets into the UDB, which, like the CDBs, will be in resident
free space. During processing, PHYSIO will keep the following
information in the ACs:
P1/ address of the CDB
P2/ address of the KDB (for tapes or RP20) or 0
P3/ address of the UDB
P4/ address of the IORB being processed
Since PHYSIO is called via the PUSHJ P, instruction, the previous PC is
saved on the caller's stack. The P and Q AC's are stored on the stack
via the SAVEPQ macro.
PHYSIO does use a private stack, and the old stack pointer is saved
in PHYSVP.
Also, because PHYSIO does use a private stack, it is necessary for
the process calling PHYSIO to be NOSKED. Also take note of the fact
that IORB's are associated with the physical pages of memory that are
involved with the I/O through pointers in the CST5 table for those
pages. See the next section for more information in this area.
10.0 DEVICE INTERRUPT CONTEXT (PHYIPD)
Device interrupts, in this context, refer to disk and tape
interrupts, those devices connected through the RH20's. Each RH20
channel has a "Channel Logout" area at the beginning of the EPT. This
logout area is four words in length for each channel, the fourth word of
which contains an instruction to execute on an interrupt. This
instruction causes the system to dispatch to code actually in the CDB
for the channel.
On the 2020, the interrupts work differently. The EPT contains
pointers to SM10 vector tables starting at address SMTEPT. The number
of the interrupting UBA (1 or 3) is used as an offset to SMTEPT to find
the proper vector table, and then the function and device (read done,
DZ11, etc...) is used as an offset into the vector table which contains
the appropriate XPCW instruction to transfer control to the correct
routine.
The previous PC and flags are saved in the area immediately
preceding the CDB; offset CDBINT (value -6) is the location where the
flags and PC are stored. When the interrupt occurs, the hardware
executes the instruction in the channel logout area, which is "XPCW
loc". "Loc" is the address of the CDB for this channel, offset by
CDBINT (-6). The XPCW instruction saves the flags at CDBINT(CDB), the
PC at the next location, and gets the new flags and PC from the next two
locations. This area of the CDB, then, contains the following:
CDBINT(CDB)/ old flags
-5(CDB)/ old PC
-4(CDB)/ new flags (0)
-3(CDB)/ new PC ( ".+1")
-2(CDB)/ MOVEM P1,CDBSVQ(CDB) ; save P1 in CDBSVQ
-1(CDB)/ JSP P1,PHYINT ; dispatch to interrupt code
CDBSTS(CDB)/ status and configuration flags
PHYINT sets up to use the stack PHYIPD, and saves the ACs in the
block at PHYACS, therefore the previous stack pointer is at PHYACS+17.
The KLIPA code takes a 40+2*n interrupt through the EPT to EPT+52,
thence to PISC5: (in STG) and from there to KLPSV: (in PHYKLP) and
finally to PHYINT:.
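The 40+2*n rule, where n is the priority interrupt channel, accounts for the EPT offsets quoted in this section: KIEPT+56 for the channel 7 scheduler interrupt, KIEPT+46 for the channel 3 APR interrupt, and EPT+52 for the KLIPA (channel 5 is an inference from the offset and the PISC5 label, not something the text states). A Python sketch of the arithmetic, with a hypothetical function name:

```python
def pi_vector_offset(channel):
    """Octal 40 plus twice the channel number, for PI channels 1-7."""
    if not 1 <= channel <= 7:
        raise ValueError("PI channel must be 1-7")
    return 0o40 + 2 * channel

for channel in (3, 5, 7):
    print(channel, oct(pi_vector_offset(channel)))
```

The arithmetic is in octal throughout, so channel 5 gives 52 and not 50.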
The PHYINT code, then, resolves the interrupt, and returns to the
old PC by JRSTing through offset CDBJEN in the CDB. This part of the
CDB contains the following:
CDBJEN(CDB)/ BLT 17,17
+1/ DATAO RH,CDBRST
+2/ XJEN CDBINT(P1)
The last of these locations causes the system to resume where it was
interrupted. During processing of the interrupt, the following
information may be found:
P1/ address of the CDB
P2/ address of the KDB or 0
P3/ address of the UDB
P4/ address of the IORB or argument code:
(P4) < 0 - schedule a channel cycle
(P4) = 0 - dismiss interrupt
(P4) > 0 - complete current request (IORB address)
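The three cases for P4 depend on the sign of a 36-bit word, which is easy to misread in octal. The Python sketch below classifies a P4 value taken from a dump, assuming two's complement arithmetic in which any word with bit 0 set is negative. The function names are hypothetical.

```python
def to_signed36(word):
    """Interpret a 36-bit word as a two's complement integer."""
    return word - (1 << 36) if word & (1 << 35) else word

def classify_p4(word):
    """Return the meaning of P4 according to the table above."""
    value = to_signed36(word)
    if value < 0:
        return "schedule a channel cycle"
    if value == 0:
        return "dismiss interrupt"
    return "complete current request (IORB address)"

print(classify_p4(0o777777_777777))   # -1
print(classify_p4(0))
print(classify_p4(0o000000_123456))
```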
When the system is attempting to perform I/O to or from a specific
page of physical memory, that page is locked into core, by incrementing
the lock count in the CST1 location for that page. If a device error
occurs during the transfer of data for that page, then the CST5 entry
for that page will have either a short I/O Request Block (IORB) or a
pointer to a long (magtape or DSKOP) IORB. The short IORB is only one
word in length and is used for disk transfer requests, i.e., swapping.
In either case, the first word of an IORB, called IRBSTS, contains flags
that describe the success or failure of the transfer. It may be helpful
to check these locations in the event of a PHYINT crash.
The following offsets contain useful information for PHYSIO
crashes:
In the UDB:
UDBPS1/ cylinder number
UDBPS2/ surface,, sector number
UDBERC/ error retry count
UDBERR/ status function for error retry
In the CDB:
CDBCNI/ status of channel when interrupt began.
11.0 APR INTERRUPT CONTEXT (MEMPP)
These interrupts are the result of one of numerous hardware errors
being detected -- memory parity error, address parity error, NXM error,
cache directory parity error, SBUS error, IO page fail, etc. APR
Interrupts, like Device interrupts, are vectored through the EPT, but in
the case of the APR interrupts, the vector location is a part of the
priority interrupt scheme. These are priority channel 3 interrupts, and
dispatch through location KIEPT+46, which contains an XPCW PIAPRX. This
is the channel 3 interrupt routine. As in the case of the device
interrupt, the XPCW PIAPRX will cause the PC and flags to be stored at
locations PIAPRX and PIAPRX+1, and the processor will then jump to the
location stored in PIAPRX+3, which is PIAPR+1. PIAPR actually dismisses
the APR interrupt, or BUGHLT's.
This routine will set up its own stack, MEMPP. The previous stack
pointer will be stored in MEMAP.
The current AC block is switched to AC block 2 and so the ACs are
not stored in memory.
One unusual aspect about handling APR interrupts is that the PIAPR
code changes the page fault trap vector, mentioned earlier, from PGRTRP
to MEMPTP, in UPTPFN, to handle the special case of a page fault in APR
interrupt context.
12.0 ARPANET INTERRUPT LEVEL (IMSTK)
The interrupt stack is set to IMSTK via the MKNCTS macro called in
STG. Interrupts enter through the XPCW at the NTIINT offset in the NCT,
e.g., NTIINT+(NCTVT). The previous PC is stored in a doubleword at
NTIPCW+(NCTVT). The ACs are stored at NTSVAC+(NCTVT), so the previous
stack is at NTSVAC+(NCTVT)+17.
The location NCINPC+(NCTVT) contains the initial interrupt dispatch
address. The dispatch addresses for message input and output are
NTIDSP+(NCTVT) and NTODSP+(NCTVT) respectively. See the definition of
the NCT table in ANAUNV.MAC.
REFERENCING CST ENTRIES UNDER TOPS-20 VERSION 6
-----------------------------------------------
Under Release 6 of TOPS-20, a number of data structures have been
relocated out of section zero of the monitor's address space. In some
cases this necessitated changes in the way those data structures were
accessed, and how they are accessed via FILDDT in crash dumps. The CST
tables (with the exception of CST5) are one such data structure. They
are accessed in the monitor by indirect reference through a series of
tables with names of the form CSTnX, e.g. CST3X to reference CST3. The
tables are 16 words long, where CSTnX + m is an indirect word pointing
to CSTn and indexed by register m. Therefore the monitor can use a
construct such as MOVE T1,@CST0X+P1 where previous monitors used the
form MOVE T1,CST0(P1) to fetch the CST0 entry for the page number in
register P1.
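The effect of the CSTnX indirection can be modelled in Python as shown below. This is a sketch under a stated assumption: the CSTnX words are taken to be extended-format indirect words, with the index register number in bits 2-5 and a 30-bit address in bits 6-35. The text above says only that each word points to CSTn and is indexed by register m. The function names, the use of AC 10 for the page number, and the way the sample word is built are illustrative; the base 5,,203000 is the CST0X value shown in the FILDDT example in this section.

```python
ADDRESS_MASK = (1 << 30) - 1

def indexed_indirect_address(indirect_word, registers):
    """Address reached through one indexed indirect word.

    ASSUMED layout: bits 2-5 index register, bits 6-35 address.
    The indirect bit (bit 1) is ignored in this sketch.
    """
    index = (indirect_word >> 30) & 0o17
    address = indirect_word & ADDRESS_MASK
    if index:
        address = (address + registers[index]) & ADDRESS_MASK
    return address

def halfword_format(address):
    """Format an address as section,,offset in octal."""
    return f"{address >> 18:o},,{address & 0o777777:o}"

cst0_base = (5 << 18) | 0o203000      # 5,,203000
registers = [0] * 16
registers[0o10] = 0o237               # page number held in AC 10
cst0x_plus_10 = (0o10 << 30) | cst0_base
print(halfword_format(indexed_indirect_address(cst0x_plus_10, registers)))
```

The result, 5,,203237, agrees with the address FILDDT reports for CST0+237 in the example in this section.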
The following is an example of a method that can be used in FILDDT
to access the CST tables in a crash dump, assuming we want to find out
the CST information for page 237:
@ENABLE (CAPABILITIES)
$FILDDT
FILDDT>LOAD (SYMBOLS FROM) SYSTEM:MONITR.EXE
[38136 symbols loaded from file]
FILDDT>GET (FILE) SYSTEM:DUMP.EXE
[ACs copied from BUGACS to 0-17]
[Looking at file GIDNEY:<SYSTEM>DUMP.EXE.1]
EP$U                                        ! Establish virtual mapping
0,,CST0X[   5,,203000   $Q<CST0:            ! Define symbols CST0,...,CST3
0,,CST1X[   5,,217000   $Q<CST1:            ! from the contents of the
0,,CST2X[   5,,233000   $Q<CST2:            ! CSTnX tables zeroth location
0,,CST3X[   5,,247000   $Q<CST3:
CST5=203001                                 ! CST5 was not moved for v6
CST0+237[   556000,,400321   .=5,,203237    ! Now we can reference the CST
CST1+237[   101,,0                          ! entries for page 237 in the
CST2+237[   624,,237                        ! same old way we did for
CST3+237[   77770,,0                        ! earlier releases.
CST5+237[   556000,,400321
^Z
$
See SWSKIT document MONITOR-ADDRESS-SPACE.MEMOS for more detail.
THE BUG MACRO
-------------
In the various releases of TOPS-20 the BUG macro mechanism has
changed in form many times, but its purpose has remained essentially
the same. The BUG macro generates a code
sequence in the monitor to report the occurrence of a software-detected
error and, in the case of a BUGHLT, crash the system. The code
generates an XCT bugname which is a call to the proper routine to handle
the BUGHLT/CHK/INF and provides the argument list of additional data for
the BUG. In 3- and 4-series monitors, the call is via JSR to BUGHLT,
BUGCHK, or BUGINF. In 5-series and later monitors, the call is via JSR
to BUGHLT or CALL to BGCCHK or BGCINF. In addition, the single line
descriptive text is appended to the BGSTR PSECT, and a pointer to it
placed in the BGPTR PSECT.
With current monitors, a document called BUGHLT Documentation is
included with the Software Notebook set, which brings together all the
additional data that is now part of the BUG description. This should be
considered an essential debugging document.
For 3-series monitors, all of the information for the BUG was found
in-line in the source file. There was only a single line descriptive
text, and so all information about the condition had to be gotten
directly from the code.
For 4-series monitors, there is a file called BUGS.MAC which is
part of the monitor build process and which contains the detailed BUG
descriptions as part of the DEFBUG macros. BUGS.MAC assembles as part
of the build of PROLOG.UNV, and the calls to the BUG macro in the source
look like: BUG(bugname,<additional data>). For example:
BUG(XBWERR)
BUG(WSPNEG,<<FX,D>,<T2,D>>)
That is, essentially all the descriptive text is in the BUGS.MAC file,
and not in the source. DEFBUG and BUG are defined in PROLOG.
For 5-series monitors, the same method as for 4-series monitors is
used, with the additional data field descriptors taking on mnemonics
instead of the "D". The descriptive text is still all in BUGS.MAC.
With Release 6, the procedure changes again. The whole BUG macro
text moves back in-line in each of the source modules, as in Release 3;
however, the long argument list with the long descriptive text remains.
The BUGS.MAC file disappears. The calling name becomes BUG. instead of
just BUG without the period, and some new argument options are added.
Here is a description from PROLOG, and an example of the new form
for Release 6 of TOPS-20:
;Macros for defining BUGs
;General format for in-line bug macro call is:
; BUG. (TYP,TAG,MODULE,WORD,STR,LOCS,HELP,CONTIN)
;
;TYP - Flavor, HLT, CHK, or INF
;
;TAG - Name of BUG
;
;MODULE -
; Name of module in which BUG occurs.
;
;WORD - Flavor of BUG. For instance, HARD for hardware-caused, SOFT
; for software-caused.
;
;STR - Short descriptive string describing cause of BUG, which gets
; printed on CTY when BUG occurs.
;
;LOC - List of locations whose contents should be displayed when the
; BUG occurs. Each location must be followed by a comma and
; then a one-word descriptor of what the datum represents, for
; instance UNIT or CHN. Each pair of locations and descriptors
; must be in angle brackets, and the angle-bracketed pairs must
; be separated by commas with the entire LOC argument in angle
; brackets.
;
;HELP - General documentation for the BUG
;
;CONTIN - Optional continuation address after BUGCHK or BUGINF is
; logged. Assumed to be in same section with call.
For example, from PAGEM.MAC, the PCIN0 BUGCHK:
BUG.(CHK,PCIN0,PAGEM,SOFT,<PAGEM - PC has gone into section 0>,
<<T2,PC>,<T1,PFW>>,<
Cause: A reference has been made to RSCOD or NRCOD in section 0.
This should not happen because section 0 code cannot
reference data in extended sections. As an expedient,
the page being referenced will be mapped to section 1
with an indirect pointer.
>)
There is further information in the BUGHLT Documentation section of
the TOPS-20 Notebook Set, and the SWSERR program is useful in extracting
BUGxxx entries from ERROR.SYS.
MONITOR BUILDING HINTS
----------------------
1. GENERAL
Judging from the number of requests for help on this subject, the
chances are that you will be required to rebuild a monitor sometime
during your career as a Software Specialist. The reasons are quite
simple. There are customers who simply want functionality other than
that provided by stock monitors. There are also those who are
experiencing performance problems. We cannot forget the sales folks.
It is not unusual to have to rebuild a monitor in order to run a
benchmark. A very common example is increasing the OFN area. Another
quite common requirement is to increase the patch area (FFF). Doing
either of these and simply submitting a build control file will often
produce a bad monitor.
We will talk about PSECTS in relation to the Monitor's address space
but will make no attempt to define what they do. A good detailed
discussion on the Monitor's address space is on pages 2-62 to 2-73 in
the Release 4 Update Manual. Also there is a memo on the Monitor's
address space in the SWSKIT.
2. BACKGROUND
In V3A, all of the Monitor was in the same address space. Nevertheless
there was a crunch on space. As a result some PSECTS were allowed to
overlap. So critical was the space requirement, that attempts to
increase the OFN area or FFF usually resulted in the overlapping of
PSECTS other than the ones permitted. Therein lies the problem. The
Monitor produced from such a process would ordinarily be useless.
With the development of V4, the space requirement became more
critical. The Symbol Table became the object of concern. It required
a large number of pages, and in general, it is only used infrequently
under normal conditions. Hence the Engineering folks were of the
opinion that it should be completely eliminated. We objected. It
would be a nightmare to try to debug the monitor without symbols. It
thus became our project to somehow keep the Symbol Table while
conforming with the space restrictions. We decided to remove the
Symbol Table and place it in an alternate address space. It should
be noted that this action does not impact adversely on system
performance. With this change, the build procedure and the monitor's
address space were reorganized.
3. BUILD PROCEDURE
Outlined below are some steps to guide you when rebuilding a monitor.
Bear in mind that this is a guide and might not account for all the
unusual situations. This guide however, coupled with your experience
and common sense will most likely do the trick. PLEASE READ THIS
ENTIRE MEMO BEFORE ATTEMPTING TO REBUILD YOUR MONITOR. Also please
read the build BEWARE file that is on the Installation tape.
NOTE: The customer's Distribution Tape will have all the files needed
to rebuild the monitor. All TOPS-20 modules will be in
TOPS-20.REL (or T2020.REL etc.). The control file is TOPS20.CTL
(or T2020.CTL etc.). The link file will be NAME.CCL where
"NAME" depends upon what monitor is being used (could be 2020,
ARPA etc.). For 2040/50, it is called LNKSCH.CCL. In any case
the TOPS20.CTL file will have the name. The files you will
change will be one of the PARAM files and/or STG.MAC. It
should be noted that the special LINK.EXE and MACRO.EXE needed
to build V3A are not required under V4.
====> The very first thing to do is to use all the standard files to
build a "vanilla" monitor without any changes. This will expose
most of the problems in your build procedure before your own
changes can confuse the picture, and hence should result in
a substantially reduced debugging time.
STEP 1 Restore all files needed from <n-SOURCES>. This will
usually contain the monitor modules (TOPS20.REL file),
all needed source files, all build control, command
and log files.
STEP 2 Carefully make the source changes as needed.
STEP 3 Examine the TOPS20.CTL file. This file will usually
have logical name definitions and TAKE commands along
with other things. Also look at all referenced files.
STEP 4 Examine the corresponding log file. This will show
what the result of the original build procedure was.
It should therefore be a template which should be used
to judge the validity of the new Monitor. Pay special
attention to the section which shows the PSECT layout
at the end of the BUILD procedure. This shows the
start location, the end location and the amount of
free space between each PSECT. The file used by LINK
to set up the PSECTS is called LNKSCH.CCL. You should
look at this file to get an idea of what's happening.
STEP 5 Now edit the control and command files as necessary to
reflect your environment. This will mean, among other
things, changing or eliminating logical name
definitions. Do NOT change the order of the PSECTS in
the LNKSCH.CCL file. Also do not change the starting
value for any PSECT. The starting value is the value
given to the /SET: switch.
STEP 6 Submit the control file with the /TAG:SINGLE switch.
Ensure that the control file is correct and reflects
accurately logical name definitions and the .CCL file.
Also this portion of the .CTL file has the commands
necessary to compile the changed module.
STEP 7 When the job ends, examine your log file. Correct any
compilation or missing files errors and go back to
STEP 6. Continue with STEP 8 only after all errors are
eliminated.
STEP 8 At this point you should have a MONITR.EXE. Now
examine the section in the log file which gives an
outline of the PSECTS. If any PSECTS overlap, a
message will indicate the same. If there are no
overlapping messages, go to STEP 11. NOTE: There are
some instances where PSECTs can overlap. POSTCD
and SYVAR PSECTs are allowed to overlap any xxxVAR
PSECT. This will not gain very much in storage - 4
pages to be exact. If you follow the build procedure
then overlapping PSECTs are not allowed and therefore
must be resolved. You are once again advised NOT
to re-organize the monitor's address space.
STEP 9 Start with the first overlap. Figure out the
number of words by which the first PSECT overlaps its
following PSECT. Now add this value to the start
location of the overlapped PSECT. This value quite
possibly will be a location within a page, i.e. an
address of the form 125300, where the page number is
125 and the offset into the page is 300. The starting
address of many PSECTs is required to be on a page
boundary i.e. an address of the form 126000. A good
rule to follow is: IF THE PSECT STARTED ON A PAGE
BOUNDARY BEFORE THE BUILD, THEN KEEP IT ON A PAGE
BOUNDARY. This would mean that you may be required to
add an additional value to round up to the next page.
For example the 125300 value would be rounded to
126000 if the PSECT is required on a page boundary.
The PSECT sequence and starting values are in the
LNKSCH.CCL file. NOTE: the values are all given in
OCTAL so add in OCTAL.
STEP 10 EDIT the LNKSCH.CCL file to reflect this new start
value for the overlapped PSECT. Go back to STEP 6.
Repeat these steps until there are no more error
messages. Note that changing the start location of the
overlapped PSECT can cause it to overlap its following
PSECT and the same procedure must be followed to
resolve any conflicts. Of course you must be careful
to ensure that you do not outgrow the monitor's address
space. A total of the length of all PSECTs will tell
you if the Monitor is too large.
STEP 11 At this point you should have a good Monitor. Save it
in the proper directory. The final test is getting it
up and running.
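The arithmetic in STEP 9 is easy to get wrong because everything is in octal. The following Python sketch works through it, using the 125300 to 126000 example from that step. The function names are hypothetical, and the rule applied is the one given in STEP 9: if the PSECT started on a page boundary before the build, keep it on a page boundary.

```python
PAGE_WORDS = 0o1000   # words per page

def round_up_to_page(address):
    """Round an address up to the next page boundary (no change if on one)."""
    return (address + PAGE_WORDS - 1) // PAGE_WORDS * PAGE_WORDS

def relocate_psect(old_start, overlap_words):
    """New starting value for an overlapped PSECT, as in STEP 9."""
    new_start = old_start + overlap_words
    if old_start % PAGE_WORDS == 0:
        # The PSECT was page aligned before, so keep it page aligned.
        new_start = round_up_to_page(new_start)
    return new_start

# A PSECT that started at 125000 and is overlapped by 300 words.
print(f"{relocate_psect(0o125000, 0o300):o}")   # prints 126000
```

The printed value is what would be given to the /SET: switch for that PSECT in LNKSCH.CCL before going back to STEP 6. The PSECT that follows must then be checked in the same way.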
MONITOR BUILDING NOTES FOR RELEASE 6
------------------------------------
There have been even more changes in POSTLD processing and the
monitor's address space for version 6 of TOPS-20, some of which should
be taken into account when attempting to build a new monitor from the
control file. This is a list of the changes from the version 4.1/5.1
build procedures.
1. A new file SYSFLG.MAC appears for version 6 builds. This file
is not used explicitly in the customer build, but is used to
create PROLOG.UNV (and BOOT, by the way). SYSFLG contains
system configuration flags and conditional settings, and
replaces the files KSPRE.MAC and KLPRE.MAC which now disappear
(into PROLOG for the most part), along with PROKS.UNV and
PROKL.UNV.
2. The command file ASEMBL.CMD has been split more or less
arbitrarily into two files: ASMBL1.CMD and ASMBL2.CMD to
perform exactly the same function, but to put less of a burden
on the EXEC. These two files now contain comments about the
files to be compiled also, by the way.
3. There is a change in the DDT dialog used to establish the
breakpoints for BUGHLT and BUGCHK. An additional breakpoint is
set at DDTIBP (which is XCT'ed by DDTINI), and it is set to
proceed automatically when hit. The purpose is that when
the monitor reaches a given state in initializing the system
paging, a DDT breakpoint is hit. DDT can then sense the
state of the world, according to the monitor, and can set its
own internal state as needed to reflect extended
addressing considerations for EDDT.
4. POSTLD now tries to make PSECT juggling easier by making one
try itself. If the given configuration does not work due to
overlaps, POSTLD will try to write what should be a working set
of values (if possible) into two new files: LNKNEW.CCL and
PARNEW.MAC. It will then have BATCON transfer to an error
label, where the monitor load is tried again using these new
files. A third new parameter file, LNKINI.CCL, does not contain
PSECT settings and is used in conjunction with LNKNEW.CCL in the
try-again load.
5. The format of the PSECT map printed by POSTLD has changed very
slightly, but the content is still the same. There are some
new PSECTs.
6. POSTLD now writes the MONITR.EXE file using extended sections.
This has some implications for BOOT, which must now know about
extended sections, and any other program which might have some
embedded knowledge of what the monitor .EXE file looks like.
7. The BUGSTF conditional feature has been removed, since the
bugstrings have been moved out of the way, and there is no
additional benefit derived from deleting them.
8. The HIDSYF conditional feature has likewise been removed, as it
is assumed that the monitor symbol table is always hidden now.
EXEC DEBUGGING
--------------
Now that most SWS have microfiche of the released EXEC and MONITOR, I
anticipate questions on looking at the EXEC and MONITOR. Here is a
cursory tutorial on investigating the internals of the EXEC (or command
processor, if you prefer). The examples are intended as a guide:
although the type-in is correct, the responses may not be
character-perfect. You are advised to read the other chapters in this
document for more information on DDT and MONITOR snooping and debugging.
LOOKING AT THE EXEC WITH DDT
----------------------------
You can look at either the running system EXEC or your own copy of the
EXEC, using the DDT that is loaded with the EXEC.
I. TO LOOK AT THE RUNNING EXEC:
First you must have WHEEL privileges in order to use the ^EEDDT command.
The ^EEDDT command transfers control to the DDT that is loaded with the
EXEC, with symbols. You can then use all the normal DDT functions. To
exit from DDT, type <ESC>G, which is echoed as $G. This starts your
program, which is the EXEC, so you are back at EXEC command level.
@ENABLE
$^EEDDT
DDT
.
.
.
$G
$DIS
@
II. TO LOOK AT YOUR COPY OF AN EXEC (RUNNING UNDER SYSTEM EXEC):
Get your copy of the EXEC into your address space, transfer control to
it, and start DDT as above. There are three ways to exit, depending on
the state you are in. If you are in DDT you can ^Z out to get back to
the system EXEC. If you are running your EXEC and want to exit to the
system EXEC you can ^EQUIT (if you are enabled) or "POP" (if you are not
enabled). POP is preferable. Note: if you prefer to get your EXEC and
not start it, in order to set breakpoints or put in patches before
running it, see section "VII. PATCHING" below.
EXAMPLE EXITING FROM DDT:
@GET MYEXEC.EXE
@SET NO CONTROL-C-CAPABILITY
@START
@MONNAM.TXT, TOPS-20 MONITOR (VERSION#)
@ENA
$^EEDDT
DDT
.
.
.
CINITF/ -1 0 ; reset initialization flag to
. ; run this EXEC again after saving
.
^Z ; to exit and save, for example
@ ; now you are in the monitor's EXEC
; with your EXEC in your address
@SAV MYEXEC.EXE.2 ; space. You can save it, say.
EXAMPLE, EXITING FROM YOUR RUNNING EXEC:
@GET MYEXEC.EXE
@START
@MONNAM.TXT,,TOPS-20 MONITOR(VERSION #)
@ENA
$^EEDDT
DDT
.
.
.
CINITF/ -1 0 ; clear initialization flag
$G ; running your EXEC
.
.
$^EQUIT ; return to higher (system) EXEC
$ ; you are in system EXEC
$SAV NEWEXEC ; etc.
EXAMPLE, EXITING FROM YOUR RUNNING EXEC WITH POP:
@GET MYEXEC.EXE
@START
@MONNAM.TXT,,TOPS-20 MONITOR(VERSION#)
@
.
.
.
@POP ; return to higher (system) EXEC.
@ ; now you are in system EXEC.
; NOTE: you should set CINITF to 0
; if you want to save and run this
; EXEC later. You can do it by DDT
; after the POP or ^EEDDT before.
III. GETTING OUT OF TROUBLE:
You could get into trouble with your EXEC and be unable to get out of
it: CTRL/C is trapped, or you can't POP, or whatever. There is a way to
make sure you can always exit to the MINI-EXEC. Before starting your
EXEC, issue ^EQUIT to get into the MINI-EXEC, then type "S" (start) to
get back to the system EXEC. Then get into your EXEC. If you now get
into trouble you can type ^P, which will get you back into the
MINI-EXEC. From there you can get back to the system EXEC with "S"
(start).
EXAMPLE:
@ENA
$^EQUIT
INTERRUPT AT 15657
MX>S
$ ; now back at system EXEC.
$GET MYEXEC
$
$START
@MONNAM.TXT, TOPS-20 MONITOR (VERSION)
. ; let's say your EXEC can't
. ; do anything - you are hung
. ; get out, get into MINI-EXEC
^P
INTERRUPT AT 12345
MX>S ; MINI-EXEC prompt then start.
$ ; now back at the system EXEC.
IV. RUNNING YOUR EXEC AS A TOP LEVEL FORK:
Suppose that you want to run your EXEC as the top level EXEC, that
is, not running under the system EXEC. Get into the MINI-EXEC and get
your copy of the EXEC and run it as the top level EXEC.
EXAMPLE:
@ENA
$^EQUIT
INTERRUPT AT 23456
MX>R ; Reset so you will MERGE not GET
MX>G <MYAREA>MYEXEC.EXE.2
MX>S
@ ; Now you are in your EXEC
.
.
. ; Let's say you want to get out
@^P ; Control-P to get to MINI-EXEC
INTERRUPT AT 12345
MX>R ; "RESET" resets your address space
MX>E ; You are requesting the system EXEC
@ ; You are in system EXEC
NOTE: If you had typed "S" rather than "E" above you would
have restarted your EXEC.
V. REPLACING THE SYSTEM EXEC
Once you have made a change to your personal copy of the EXEC, you
may wish to have your edited EXEC run as the SYSTEM EXEC. It is
necessary to make the saved EXEC non-writable before using it
system-wide.
EXAMPLE:
@ENABLE (CAPABILITIES)
$GET (PROGRAM) PS:<SYSTEM>EXEC.EXE
$INFORMATION (ABOUT) MEMORY-USAGE
81. pages, Entry vector loc 6000 len 3
0 PS:<SYSTEM>EXEC.EXE.1 1 R, CW, E
6-125 PS:<SYSTEM>EXEC.EXE.1 2-121 R, E
$!MAKE THE EXEC WRITABLE SO WE CAN EDIT IT
$SET PAGE-ACCESS (OF PAGES) 6:125 (ACCESS) COPY-ON-WRITE
$DDT
DDT
. ;Make the edits
.
^Z
$
$!MAKE THOSE PAGES NON-WRITABLE
$SET PAGE-ACCESS (OF PAGES) 6:125 (ACCESS) NO WRITE
$SET PAGE-ACCESS (OF PAGES) 6:125 (ACCESS) NO COPY-ON-WRITE
$!SAVE THE NEW EXEC
$SAVE EXEC.EXE.2 !New generation! (PAGES FROM) 6 (TO) 125
EXEC.EXE.2 Saved
$!RENAME THE SYSTEM EXEC SO WE CAN GET IT BACK IF WE NEED IT
$RENAME (EXISTING FILE) SYSTEM:EXEC.EXE SYSTEM:OLD-EXEC.EXE
$!AND COPY THE NEW ONE INTO PS:<SYSTEM>
$COPY (FROM) EXEC.EXE (TO) SYSTEM:EXEC.EXE.197 !New generation!
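When choosing the page range for SET PAGE-ACCESS and SAVE, remember that the page numbers in the INFORMATION MEMORY-USAGE display are octal, while the page count ("81.", with a trailing period) appears to be decimal. The arithmetic bears this out, as the following Python sketch shows; it is only a check on the numbers in the example above.

```python
def pages_in_range(first: int, last: int) -> int:
    """Number of pages in an inclusive page range."""
    return last - first + 1


if __name__ == "__main__":
    # Page 0 plus pages 6 through 125 (octal), as listed in the example.
    mapped = pages_in_range(0o0, 0o0) + pages_in_range(0o6, 0o125)
    print(mapped)  # decimal count of mapped pages
    # The file pages backing 6-125 are listed as 2-121 (octal).
    print(pages_in_range(0o2, 0o121))
```

The two ranges 6-125 and 2-121 have the same length, which is what you would expect for a one-to-one mapping of file pages to process pages.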
VI. OTHER INFORMATION:
When searching for symbols you may notice that the module name DDT
gives you differs from the name of the source file that was assembled
for the EXEC. For example, to open the symbol table for EXECED you type
CANDE$: to DDT.
The following is a correspondence list for EXECs before version 5:
FILENAME      MODULE NAME    FILENAME      MODULE NAME
=========================    =========================
EXECDE.MAC    XDEF           EXEC0.MAC     EXEC0
EXECGL.MAC    XGLOBS         EXEC1.MAC     EXEC1
EXECPR.MAC    PRIV           EXEC2.MAC     EXEC2
EXECED.MAC    CANDE          EXEC3.MAC     EXEC3
EXECCS.MAC    CSCAN          EXEC4.MAC     EXEC4
EXECSU.MAC    SUBRS          EXECMT.MAC    EXECMT
EXECVR.MAC    VER            EXECQU.MAC    EXECQU
EXECMI.MAC    MIC            EXECSE.MAC    EXECSE
                             EXECP.MAC     EXECP
For Release 5 of the EXEC, the TITLE statements in the EXEC modules have
been changed to match the module names so that this concordance is no
longer necessary.
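For pre-version-5 EXECs the correspondence list above can be kept as a small lookup table. This Python sketch holds only the eight files whose module name differs from the file name and falls back to the file name otherwise; the function name is invented for illustration, and "$" stands for the ESC key as echoed by DDT.

```python
# Source files whose TITLE (module name) differs from the file name,
# taken from the correspondence list for EXECs before version 5.
PRE_V5_MODULE_NAMES = {
    "EXECDE.MAC": "XDEF",
    "EXECGL.MAC": "XGLOBS",
    "EXECPR.MAC": "PRIV",
    "EXECED.MAC": "CANDE",
    "EXECCS.MAC": "CSCAN",
    "EXECSU.MAC": "SUBRS",
    "EXECVR.MAC": "VER",
    "EXECMI.MAC": "MIC",
}


def ddt_open_module(filename: str) -> str:
    """Return what to type to DDT to open the symbols for a source file."""
    name = filename.upper()
    module = PRE_V5_MODULE_NAMES.get(name, name.rsplit(".", 1)[0])
    return module + "$:"


if __name__ == "__main__":
    print(ddt_open_module("EXECED.MAC"))
    print(ddt_open_module("EXEC0.MAC"))
```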
The sources and .CTL file for assembling the EXEC are part of the
SWSKIT.
If you try to examine a location symbolically and get "U", implying
that the symbol is undefined, you may have to reset the symbol table
pointers. Look in location 770001 for the address that contains the
symbol table pointer, then look at location 116 to find the real symbol
table pointer. Put the contents of 116 in the location pointed to by
770001.
116/ 762600,,54463 ; real symbol table pointer
770001/ 776456 ; location of symbol table pointer
776456/ 743200,,23540 762600,,54463
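It can help to know what a pointer such as 762600,,54463 encodes before depositing it. The Python sketch below splits a 36-bit word into its two 18-bit halves. It assumes the conventional layout of a symbol table pointer, with the negative of the table length in the left half and the table address in the right half; that layout is an assumption here and is not stated in the text above.

```python
HALF_MASK = 0o777777  # 18 bits


def make_word(left: int, right: int) -> int:
    """Build a 36-bit word from left,,right halfwords."""
    return ((left & HALF_MASK) << 18) | (right & HALF_MASK)


def halves(word: int):
    """Split a 36-bit word into (left, right) 18-bit halves."""
    return (word >> 18) & HALF_MASK, word & HALF_MASK


def signed18(half: int) -> int:
    """Interpret an 18-bit halfword as a two's complement number."""
    return half - 0o1000000 if half & 0o400000 else half


if __name__ == "__main__":
    word = make_word(0o762600, 0o54463)   # contents of location 116 above
    left, right = halves(word)
    # Assumed layout: left half is minus the table length in words.
    print(f"length {-signed18(left):o} words, address {right:o}")
```

A left half that does not come out negative under this reading would be a hint that the word you are looking at is not a valid pointer.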
VII. PATCHING
There is a patch command in DDT. The form is as follows:
$< ; patch before this instruction
$$< ; patch after this instruction
$> ; end patch following this instruction
DDT will put the patch in the EXEC patch area (symbol PAT..). DDT
will insert JUMPA 1,LOC+1 and JUMPA 2,LOC+2 following the patch you
typed in, where LOC is the location of the instruction you're patching.
DDT then replaces the original instruction at LOC with a JUMPA XXXXX,
where XXXXX is the location in the patch area where your patch now
resides. Then the patch area symbol (PAT..) is redefined to follow your
last patch.
EXAMPLE:
Get a copy of <SYSTEM>EXEC, insert calls to subroutine MUMBLE and
subroutine FRATZ before location DING+1. DING+1 contains PRINT Q3
originally and contains a JUMPA to the patch area after the patch. The
patch area will contain:
CALL MUMBLE
CALL FRATZ
PRINT Q3
JUMPA 1,DING+2
JUMPA 2,DING+3
USER TYPESCRIPT FOR THE ABOVE:
@ENABLE
$GET<SYSTEM>EXEC
$SAVE NUEXEC ; you must SAVE and GET in order to
$GET NUEXEC ; write-enable the EXEC and use DDT
$DDT ; instead of ^EEDDT
DDT
EXEC0$: ; open symbols for module where DING is
DING/ PUSH P,A ; first location in routine "DING"
DING+1/ PRINT Q3 $< ; begin patching before location DING+1
PAT../ 0 CALL MUMBLE ; DDT opens up PAT.. area, you add code
PAT..+1/CALL FRATZ ; continue to insert your patch
$> ; close the patch
PAT..+2/ PRINT Q3 ; the original instruction being replaced.
PAT..+3/ JUMPA 1,DING+2 ; DDT inserts this return.
PAT..+4/ JUMPA 2,DING+3 ; in case of a SKIP inst.
DING+1/ JUMPA 12345 ; JUMPA to PAT.. replaces original LOC.
$G ; start your copy of EXEC etc.
Various methods may be used to write-enable the EXEC for patching.
You can use the GET, SAVE method above, or SET PAGE n COPY-ON-WRITE, or
the $W command in DDT to achieve the same results.
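The layout DDT builds in the patch area, as described in this section, can be modelled in a few lines. The Python sketch below is a paper model only (it does not talk to DDT); the function name and arguments are invented for illustration. It shows where the original instruction lands for the "before" ($<) and "after" ($$<) forms and how the two return JUMPAs are formed.

```python
def build_patch(symbol, offset, original, new_instructions, before=True):
    """Return the instructions placed in the patch area, in order.

    symbol and offset name the patched location (for example DING and 1
    for DING+1); offsets are printed in octal. before=True models "$<"
    (new code runs first), before=False models "$$<" (original first).
    """
    body = list(new_instructions)
    if before:
        body.append(original)
    else:
        body.insert(0, original)
    # Normal return, then the return used if the last instruction skips.
    body.append(f"JUMPA 1,{symbol}+{offset + 1:o}")
    body.append(f"JUMPA 2,{symbol}+{offset + 2:o}")
    return body


if __name__ == "__main__":
    for line in build_patch("DING", 1, "PRINT Q3", ["CALL MUMBLE", "CALL FRATZ"]):
        print(line)
```

With the arguments shown, the result has the same five instructions as the patch area listed in the example above.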
RECOVERING FROM A BAD EXEC
--------------------------
This procedure is simply a rehash of the procedure for recovering
from the case in which the EXEC refuses to log in. For more
information see the article "Looking at the EXEC with DDT".
If your system version of the EXEC blows up completely, you can
recover rather easily. You type a ^C on the CTY, and when the EXEC
blows up you will be dumped into the MINI-EXEC. Then you can use the
GET and START commands to read in a good version of the EXEC, either
from a copy on disk, or from the distribution magtapes.
If the problem with the EXEC is that it does not blow up, but it
still fails to let you log in, then you have a harder time. In this
case you have to bring up the system with the switches, and bring up
the system stand-alone. An example of what to do from the point where
the BOOT program is loaded follows:
BOOT>/L ; load in the monitor
BOOT>/G141 ; start up EDDT
EDDT
DBUGSW[ 0 2 ; set system as debugging
EDDTF[ 0 1 ; keep EDDT around
GOTSWM$B ; set a breakpoint after the swappable
; part of the monitor has been loaded
147$G ; start the system
GOTSWM$1B>> STEX+1/ HRROI T2,BOOTER+51 HRROI T2,FFF
FFF[ ""PS:<SYSTEM>OLD-EXEC.EXE"
FFF: ; change the name of the EXEC file
0$1B ; remove the GOTSWM breakpoint
$P ; proceed to bring up the system
^C ; and Control-C to get the new EXEC
If you had no old version of the EXEC around, then change the name to
some garbage, so that the monitor can't find any such program. This
will then dump you into the MINI-EXEC, and then you can read a good
EXEC in from magtape.
In release 3 of the monitor, there is a new JSYS which is very
useful for debugging new versions of the EXEC. The CRJOB JSYS
allows you to start up a new job with any program at all as its
top-level fork. You can also start the job not logged in. So you can
debug your new versions of the EXEC easily, with no possibility of
ripping yourself off. Of course the ^EQUIT, GET from MINI-EXEC is
still a valid sequence for starting a new top-level fork.
Debugging the GALAXY System
---------------------------
1.0 INTRODUCTION
The GALAXY system presents a unique problem to the software specialist
who is trying to debug one of its components. Usually, any user mode
program can be debugged under TOPS-20 by running a copy of it, loaded
with DDT, taking appropriate care that nothing is done which will affect
any users of the system. For GALAXY, however, it is very difficult to
not affect users of the system. For example, if you are trying to debug
BATCON, you will find that QUASAR will very happily schedule batch jobs
submitted by other users to be run by your BATCON. If you are not
careful, you can cause those batch jobs to be lost, or at least slowed
down, while you are debugging.
Debugging QUASAR or ORION would be even worse. Users would see PRINT,
SUBMIT, etc. commands hang when you hit a breakpoint in QUASAR.
Operators would be unable to control any system components if you were
breakpointed in ORION. On top of this, the monitor knows about QUASAR,
and you may lose messages which happen when users close a spooled
lineprinter file, or when a job logs out.
To solve these problems, the concept of a "private GALAXY system" has
been implemented in GALAXY and the EXEC. When a private GALAXY system
is operating, all of its components are completely independent of the
primary GALAXY system. QUASAR, the queue maintainer, keeps queues that
are separate from the system queues and are failsofted to a different
master queue file. This QUASAR communicates only with other components
in the same private system. It is even possible to run several complete
private GALAXY systems, with the restrictions that:
1. All components in a private system must run under the same user
name.
2. Only one private system may be run by a given user.
3. Each private QUASAR must be connected to a different directory.
4. Each private ORION must be connected to a different directory.
NOTE
This text is oriented towards version 4.0
of GALAXY, and there may be slight
differences for version 4.2 or later.
2.0 BUILDING A PRIVATE GALAXY SYSTEM
Since the changes necessary to create a private GALAXY system were
implemented in the version 4 source code, it is relatively simple to
build the system. The recommended procedure is as follows:
1. Create a directory for the private GALAXY system.
2. Restore the file EXEC-FOR-DEBUGGING-GALAXY.EXE from the SWSKIT
to this newly created directory. For Release 5 of the EXEC,
the distributed EXEC removes the need for this special
program.
3. Restore each of the following files from the proper saveset on
the TOPS-20 distribution tape to this directory.
BATCON.EXE PLEASE.EXE
CDRIVE.EXE QMANGR.EXE
GLXLIB.EXE QUASAR.EXE
LPTSPL.EXE SPRINT.EXE
OPR.EXE SPROUT.EXE
ORION.EXE
4. For each component in the above list except GLXLIB.EXE and
QMANGR.EXE, perform the following steps:
1. Give the EXEC command "GET xxxxxx.EXE"
2. Give the command "DEPOSIT 135 -1"
3. Give the command "SAVE xxxxxx"
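Since the same three commands are repeated for nine components, it may be convenient to generate them rather than type them. The Python sketch below only produces the text of the commands listed in step 4, for typing by hand or placing in a command file; the function name is invented for illustration.

```python
# Components restored in step 3 of the procedure.
COMPONENTS = [
    "BATCON", "CDRIVE", "GLXLIB", "LPTSPL", "OPR", "ORION",
    "PLEASE", "QMANGR", "QUASAR", "SPRINT", "SPROUT",
]

# Step 4 says these two are not to be poked.
NOT_PATCHED = {"GLXLIB", "QMANGR"}


def debug_build_commands(components=COMPONENTS):
    """Return the GET / DEPOSIT / SAVE commands for each component."""
    lines = []
    for name in components:
        if name in NOT_PATCHED:
            continue
        lines.append(f"GET {name}.EXE")
        lines.append("DEPOSIT 135 -1")
        lines.append(f"SAVE {name}")
    return lines


if __name__ == "__main__":
    print("\n".join(debug_build_commands()))
```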
3.0 EXAMPLE OF A PRIVATE GALAXY BUILD
It is not strictly necessary to restore all of the GALAXY components for
a one time only debugging session. To debug a component like BATCON,
you would need at a minimum:
1. Your own copy of BATCON
2. Your own copy of QUASAR for BATCON to speak to
3. Your own copy of ORION for BATCON and QUASAR to speak to
4. A copy of OPR to speak to ORION to control BATCON
5. An EXEC which knows about your QUASAR to make queue entries
The following is a log of an example build of a private GALAXY system:
@ENABLE (CAPABILITIES)
$!
$! First connect to a debugging directory
$!
$CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG>
$!
$! Now build and save debugging .EXE files
$!
$! QUASAR, the queue maintainer
$!
$GET (PROGRAM) SYS:QUASAR.EXE.55
$DEPOSIT (MEMORY LOCATION) 135 (CONTENTS) -1
[Shared]
$SAVE (ON FILE) QUASAR.EXE.1 !New file! (PAGES FROM)
QUASAR.EXE.1 Saved
$!
$! ORION, the message clearinghouse
$!
$GET (PROGRAM) SYS:ORION.EXE.53
$DEPOSIT (MEMORY LOCATION) 135 (CONTENTS) -1
[Shared]
$SAVE (ON FILE) ORION.EXE.1 !New file! (PAGES FROM)
ORION.EXE.1 Saved
$!
$! OPR, the operator interface
$!
$GET (PROGRAM) SYS:OPR.EXE.55
$DEPOSIT (MEMORY LOCATION) 135 (CONTENTS) -1
[Shared]
$SAVE (ON FILE) OPR.EXE.1 !New file! (PAGES FROM)
OPR.EXE.1 Saved
$!
$! BATCON, the batch controller
$!
$GET SYS:BATCON.EXE.39
$DEPOSIT (MEMORY LOCATION) 135 (CONTENTS) -1
[Shared]
$SAVE (ON FILE) BATCON.EXE.1 !New file! (PAGES FROM)
BATCON.EXE.1 Saved
$!
$! Now a directory of what we've got
$!
$VDIRECTORY (OF FILES) *.*.*
MISC:<HEMPHILL.GALAXY.DEBUG>
BATCON.EXE.1;P777700 16 8192(36) 13-Feb-80 22:00:37
EXEC-FOR-DEBUGGING-GALAXY.EXE.1;P777700
82 41984(36) 13-Feb-80 04:33:50
OPR.EXE.1;P777700 31 15872(36) 13-Feb-80 22:00:09
ORION.EXE.1;P777700 44 22528(36) 13-Feb-80 21:59:45
QUASAR.EXE.1;P777700 40 20480(36) 13-Feb-80 21:59:27
Total of 213 pages in 5 files
$
4.0 RUNNING THE PRIVATE GALAXY SYSTEM
Starting and running a private GALAXY system is similar to running
GALAXY in the usual manner. First QUASAR and ORION are started, then
the component you wish to debug. You will also need OPR to issue
operator commands and the EXEC to make queue entries. Since you will
need about five jobs, it is usually most convenient to run each
component as a separate subjob under PTYCON.
4.1 Starting QUASAR
QUASAR and ORION should be started before everything else. Nothing evil
happens if you start them last, but all the other components will be
waiting for these two to start. A suggested procedure is:
1. Define a subjob "Q"
2. Connect to it
3. LOGIN a job under the same user name
4. CONNECT that job to the directory in which you did the private
GALAXY build
5. ENABLE
6. RUN QUASAR
4.2 Starting ORION
Starting ORION is as painless as starting QUASAR:
1. Define a subjob "O"
2. Connect to it
3. LOGIN a job under the same user name
4. CONNECT that job to the directory in which you did the private
GALAXY build
5. ENABLE
6. RUN ORION
4.3 Starting OPR
OPR starts up using the same formula as QUASAR and ORION:
1. Define a subjob "OPR"
2. Connect to it
3. LOGIN a job under the same user name
4. CONNECT that job to the directory in which you did the private
GALAXY build
5. ENABLE
6. RUN OPR
7. You may now type OPR commands to see if QUASAR and ORION appear
to be healthy.
4.4 Starting The Component To Be Debugged
If the component you wish to debug is QUASAR, ORION, or OPR, then you
have already started it. Breakpoints could have been set and, when they
were hit, the component could have been debugged without any noticeable
effect on other users of the system. If you wish to debug PLEASE,
BATCON, LPTSPL, CDRIVE, SPRINT, or SPROUT, do the following:
1. Define a subjob with an appropriate ID (e.g. B for BATCON)
2. Connect to it
3. LOGIN a job under the same user name
4. CONNECT that job to the directory in which you did the private
GALAXY build
5. ENABLE
6. GET the component
7. Enter DDT
8. Set breakpoints, then start the program
4.5 Starting The Modified EXEC
The file "EXEC-FOR-DEBUGGING-GALAXY.EXE" which has been supplied on the
SWSKIT has exactly two commands added to its repertoire. These are
"^ESET DEBUGGING-GALAXY" and "^ESET NO DEBUGGING-GALAXY". The effect of
these commands is to select which one of two PIDs (Process IDs) to
communicate with: the system QUASAR or the private QUASAR. If "NO
DEBUGGING-GALAXY" is set, then PRINT, SUBMIT, CANCEL, MODIFY, and the
INFORMATION commands will all cause communication with the system
QUASAR. If "DEBUGGING-GALAXY" is set for this EXEC, then the commands
listed will communicate with the private QUASAR run by that user. For
Release 5 or later of the EXEC, the distributed EXEC incorporates this
functionality in the "^ESET PRIVATE-QUASAR" and "^ESET NO
PRIVATE-QUASAR" commands, and the special EXEC is unneeded.
1. Define a subjob "E"
2. Connect to it
3. LOGIN a job under the same user name
4. CONNECT that job to the directory in which you did the private
GALAXY build
5. RUN EXEC-FOR-DEBUGGING-GALAXY (or the Release 5 or later EXEC)
6. ENABLE
7. ^ESET DEBUGGING-GALAXY (or PRIVATE-QUASAR)
5.0 EXAMPLE DEBUGGING SESSION
The following is a log of a sample debugging session:
TOPS-20 Command processor 4(560)
@!
@! First run PTYCON, so we can control five jobs from one terminal
@!
@PTYCON.EXE.7
PTYCON> !
PTYCON> ! Now start up QUASAR as subjob Q
PTYCON> !
PTYCON> DEFINE (SUBJOB #) 0 (AS) Q
PTYCON> CONNECT (TO SUBJOB) Q
[CONNECTED TO SUBJOB Q(0)]
2102 Development System, TOPS-20 Monitor 4(3245)
@LOG HEMPHILL (PASSWORD)
Job 21 on TTY222 13-Feb-80 22:18:05
Structure PS: mounted
Structure MISC: mounted
@ENABLE (CAPABILITIES)
$!
$! Connect to directory where debugging .EXE files are
$!
$CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG>
$!
$! Finally run the component
$!
$RUN (PROGRAM) QUASAR.EXE.1
% QUASAR GLXIPC Becoming [HEMPHILL]QUASAR (PID = 66000031)
% QUASAR GLXIPC Waiting for ORION to start
^X
PTYCON> !
PTYCON> ! Now start up ORION as subjob O
PTYCON> !
PTYCON> DEFINE (SUBJOB #) 1 (AS) O
PTYCON> CONNECT (TO SUBJOB) O
[CONNECTED TO SUBJOB O(1)]
2102 Development System, TOPS-20 Monitor 4(3245)
@LOG HEMPHILL (PASSWORD)
Job 22 on TTY223 13-Feb-80 22:19:25
Structure PS: mounted
Structure MISC: mounted
@ENABLE (CAPABILITIES)
$!
$! Connect to directory where debugging .EXE files are
$!
$CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG>
$!
$! Finally run the component
$!
$RUN (PROGRAM) ORION.EXE.1
% ORION GLXIPC Alternate [HEMPHILL]QUASAR (PID = 66000031)
% ORION GLXIPC Becoming [HEMPHILL]ORION (PID = 70000032)
**** Q(0) 22:19:58 ****
% QUASAR GLXIPC Alternate [HEMPHILL]ORION (PID = 70000032)
**** O(1) 22:19:58 ****
^X
PTYCON> !
PTYCON> ! Now start up OPR as subjob OPR
PTYCON> !
PTYCON> DEFINE (SUBJOB #) 2 (AS) OPR
PTYCON> CONNECT (TO SUBJOB) OPR
[CONNECTED TO SUBJOB OPR(2)]
2102 Development System, TOPS-20 Monitor 4(3245)
@LOG HEMPHILL (PASSWORD)
Job 23 on TTY224 13-Feb-80 22:20:29
Structure PS: mounted
Structure MISC: mounted
@ENABLE (CAPABILITIES)
$!
$! Connect to directory where debugging .EXE files are
$!
$CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG>
$!
$! Finally run the component
$!
$RUN (PROGRAM) OPR.EXE.1
% OPR GLXIPC Alternate [HEMPHILL]QUASAR (PID = 66000031)
% OPR GLXIPC Alternate [HEMPHILL]ORION (PID = 70000032)
OPR>
22:19:59 -- Network Node 1031 is Online --
22:19:59 -- Network Node 2137 is Online --
22:19:59 -- Network Node 4097 is Online --
22:19:59 -- Network Node DN20A is Online --
22:19:59 -- Network Node MILL20 is Online --
22:19:59 -- Network Node SYS880 is Online --
OPR>!
OPR>! Let's take a look at our brand new queues
OPR>!
OPR>SHOW QUEUES
OPR>
22:21:21 --The Queues are Empty--
OPR>SHOW STATUS PRINTER
OPR>
22:21:27 --There are no Devices Started--
OPR>^X
PTYCON> !
PTYCON> ! Now start up BATCON as subjob B
PTYCON> !
PTYCON> DEFINE (SUBJOB #) 3 (AS) B
PTYCON> CONNECT (TO SUBJOB) B
[CONNECTED TO SUBJOB B(3)]
2102 Development System, TOPS-20 Monitor 4(3245)
@LOG HEMPHILL (PASSWORD)
Job 24 on TTY225 13-Feb-80 22:21:49
Structure PS: mounted
Structure MISC: mounted
@ENABLE (CAPABILITIES)
$!
$! Connect to directory where debugging .EXE files are
$!
$CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG>
$!
$! Finally run the component
$!
$RUN (PROGRAM) BATCON.EXE.1
% BATCON GLXIPC Alternate [HEMPHILL]QUASAR (PID = 66000031)
% BATCON GLXIPC Alternate [HEMPHILL]ORION (PID = 70000032)
^X
PTYCON> !
PTYCON> ! Now start up special EXEC as subjob E
PTYCON> !
PTYCON> DEFINE (SUBJOB #) 4 (AS) E
PTYCON> CONNECT (TO SUBJOB) E
[CONNECTED TO SUBJOB E(4)]
2102 Development System, TOPS-20 Monitor 4(3245)
@LOG HEMPHILL (PASSWORD)
Job 19 on TTY226 13-Feb-80 22:23:00
Structure PS: mounted
Structure MISC: mounted
@CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG>
@!
@! Run the special EXEC, which is provided on the SWSKIT
@!
@RUN (PROGRAM) EXEC-FOR-DEBUGGING-GALAXY.EXE.1
TOPS-20 Command processor 4(560)-1
@ENABLE (CAPABILITIES)
$!
$! Make this EXEC switch from system queues to private queues
$!
$^ESET DEBUGGING-GALAXY
$!
$! Use ordinary EXEC commands to examine private queues
$!
$INFORMATION (ABOUT) OUTPUT-REQUESTS
[The Queues are Empty]
$INFORMATION (ABOUT) BATCH-REQUESTS
[The Queues are Empty]
$!
$! Now switch back to look at system queues
$!
$^ESET NO DEBUGGING-GALAXY
$INFORMATION (ABOUT) OUTPUT-REQUESTS
Printer Queue:
Job Name Req# Limit User
-------- ---- ----- ------------------------
* KLERR 6 1197 DEUFEL On Unit:0
Started at 22:05:47, printed 314 of 1197 pages
XXX 3 18 KAMANITZ /Dest:4097
MS-OUT 18 117 BRAITHWAITE /Unit:0
There are 3 Jobs in the Queue (1 in Progress)
$INFORMATION (ABOUT) BATCH-REQUESTS
Batch Queue:
Job Name Req# Run Time User
-------- ---- -------- ------------------------
* DUMP 16 02:00:00 OPERATOR In Stream:0
Job# 17 Running DUMPER Last Label: A Runtime 0:23:55
BATCH 2 00:05:00 BLIZARD /Proc:FOO
SOURCE 8 00:05:00 BLOUNT /After:14-Feb-80 0:00
SRCCOM 12 00:05:00 MURPHY /After:14-Feb-80 0:00
QJD4R 13 00:05:00 SROBINSON /After:19-Feb-80 0:00
QAR 10 00:05:00 BLOUNT /After:19-Feb-80 0:14
SAVE 1 00:05:00 FICHE /After:19-Feb-80 9:10
There are 7 Jobs in the Queue (1 in Progress)
$!
$! Now let's submit a batch job to our own BATCON
$!
$^ESET DEBUGGING-GALAXY
$!
$! Make a trivial batch control file
$!
$COPY (FROM) TTY: (TO) A.CTL.1 !New file!
TTY: => A.CTL.1
@SY A
^Z
$!
$! And submit the job
$!
$SUBMIT (BATCH JOB) A.CTL.1
[Job A Queued, Request-ID 1, Limit 0:05:00]
$!
$! Now examine private queues
$!
$INFORMATION (ABOUT) BATCH-REQUESTS
Batch Queue:
Job Name Req# Run Time User
-------- ---- -------- ------------------------
A 1 00:05:00 HEMPHILL
There is 1 Job in the Queue (None in Progress)
$!
$! Our job is in the batch queue, but no batch-streams have been started
$!
$^X
PTYCON> CONNECT (TO SUBJOB) OPR
[CONNECTED TO SUBJOB OPR(2)]
OPR>START (Object) BATCH-STREAM (Stream Number) 0
OPR>
22:25:40 Batch-Stream 0 --Startup Scheduled--
22:25:40 Batch-Stream 0 --Started--
OPR>
22:25:40 Batch-Stream 0 --Begin--
Job A Req #1 for HEMPHILL
OPR>
22:25:51 Batch-Stream 0 --End--
Job A Req #1 for HEMPHILL
OPR>
^X
PTYCON> !
PTYCON> ! Cleaning up is easy
PTYCON> !
PTYCON> KILL (SUBJOB) ALL
PTYCON> EXIT (FROM PTYCON)
@
6.0 TECHNICAL DETAILS
This section explains what happens differently when a component has
had location 135 (.JBOPS) poked to -1, and presents a few helpful
tidbits of information about debugging some of the programs. .JBOPS,
incidentally, is the word in the job data area (defined under TOPS-10)
which is reserved for a program's OTS. GALAXY references this location
by the symbol "DEBUGW".
6.1 GLXLIB
GLXLIB is the GALAXY library. It consists of a code segment which
starts at address 400000 and a data segment at address 600000. Each of
the programs QUASAR, ORION, OPR, PLEASE, BATCON, LPTSPL, CDRIVE, SPRINT,
and SPROUT uses it. Part of the initialization code of each of these
programs maps in GLXLIB as a "high segment". This is in effect an
object time system for GALAXY, with many commonly used routines. Most
of the support for the private GALAXY system is in this library, enough
so that OPR, PLEASE, BATCON, LPTSPL, SPRINT and SPROUT actually have no
code which cares whether they are part of a private GALAXY. The
initialization code in each component looks in three places to find
GLXLIB.EXE: first on the structure and directory that the component
itself came from, second on DSK:, third on SYS:. This search order is
the same for both the system GALAXY and the private one.
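The three-place search for GLXLIB.EXE explains why a private build picks up whichever library sits next to the component. The Python sketch below simply lists the candidate filespecs in the order given above; the function name is invented, and the string handling is a simplification of real TOPS-20 filespec parsing.

```python
def glxlib_search_path(component_filespec: str):
    """Return the places GLXLIB.EXE is looked for, in search order.

    component_filespec is where the component itself was loaded from,
    for example MISC:<HEMPHILL.GALAXY.DEBUG>BATCON.EXE.
    """
    if ">" in component_filespec:
        # Keep structure and directory, up to and including the ">".
        origin = component_filespec[: component_filespec.rfind(">") + 1]
    elif ":" in component_filespec:
        # Only a device or logical name was given.
        origin = component_filespec[: component_filespec.rfind(":") + 1]
    else:
        origin = ""
    return [origin + "GLXLIB.EXE", "DSK:GLXLIB.EXE", "SYS:GLXLIB.EXE"]


if __name__ == "__main__":
    for spec in glxlib_search_path("MISC:<HEMPHILL.GALAXY.DEBUG>BATCON.EXE"):
        print(spec)
```

If a component behaves oddly, checking which of these files exists (and its version) is a quick way to confirm which GLXLIB it actually mapped.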
The actual changes implemented for the private GALAXY are as
follows:
1. Ordinarily, a component which stopcodes will save a crash file
on disk. When debugging, however, the crash file is not
written. In either case, if DDT is loaded with the program,
the stopcode will invoke a jump to DDT.
2. When debugging, GALAXY components do not require the packets
they receive to be privileged.
3. Ordinarily, QUASAR and ORION get special system PIDs for IPCF
communications. When debugging, they get PIDs with names of
the form "[username]QUASAR" and "[username]ORION". All GALAXY
components will then look for these PID names. Even a
pseudo-GALAXY component, such as MOUNTR or IBMSPL, will be able
to find these PIDs if its location 135 has been poked to -1,
simply because it uses GLXLIB.
4. GALAXY components print messages like:
"% QUASAR GLXIPC Waiting for ORION to start"
only while debugging.
5. ORION and QUASAR print messages about PIDs they acquire, like:
"% QUASAR GLXIPC Becoming [HEMPHILL]QUASAR (PID =
66000031)"
6. All components print messages about the special PIDs they find
for QUASAR and ORION, like:
"% ORION GLXIPC Alternate [HEMPHILL]QUASAR (PID =
66000031)"
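When several subjobs are logging at once, it is handy to pull the PID messages out of a PTYCON log and confirm that every component found the same private QUASAR and ORION. The Python sketch below parses the "Becoming" and "Alternate" messages in the format shown in the examples. Treating the PID as an octal number is an assumption, and the function name is invented for illustration.

```python
import re

# Matches lines such as:
#   % ORION GLXIPC Alternate [HEMPHILL]QUASAR (PID = 66000031)
_PID_LINE = re.compile(
    r"^%\s+(\S+)\s+GLXIPC\s+(Becoming|Alternate)\s+"
    r"\[([^\]]+)\](\S+)\s+\(PID\s*=\s*([0-7]+)\)\s*$"
)


def parse_pid_message(line: str):
    """Return (reporter, kind, user, component, pid) or None if no match."""
    match = _PID_LINE.match(line.strip())
    if match is None:
        return None
    reporter, kind, user, component, pid = match.groups()
    return reporter, kind, user, component, int(pid, 8)  # PID assumed octal


if __name__ == "__main__":
    print(parse_pid_message(
        "% QUASAR GLXIPC Becoming [HEMPHILL]QUASAR (PID = 66000031)"))
```

Other debugging messages, such as "Waiting for ORION to start", do not match and return None.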
6.2 QUASAR
1. QUASAR reads and writes private queues from its connected
directory. The full filespec is
"DSK:PRIVATE-MASTER-QUEUE-FILE.QUASAR"
2. QUASAR does absolutely no privilege checking. Anyone can
modify or kill any request in the queues (if they know how to
speak to this private QUASAR).
6.3 ORION
1. ORION will create a log file under the name of
"DSK:ORION-TEST.LOG" instead of
"PS:<SPOOL>ORION-SYSTEM-LOG.001", and does no renaming of any
old log files present.
2. ORION will not set up any NSP servers when debugging. It
therefore will not speak to remote nodes to run OPRs for them.
However, there are hooks for ORION to initialize "SRV:128"
instead of the usual "SRV:47" when debugging.
6.4 QMANGR
QMANGR has also been modified to look for a private QUASAR's PID if the
low segment has a non-zero entry in .JBOPS.
6.5 CDRIVE
CDRIVE can pose a problem to debug, since it has potentially many
inferior forks all executing the same code. For this reason each fork
automatically loads SDDT into its address space and jumps to it when it
starts up. After setting any breakpoints or otherwise modifying this
fork's code, the debugger types "GO<ESC>G" to resume the fork. While
debugging, if the fork terminates (crashes), CDRIVE will not go through
its normal purging of the crashed fork, so that its status can be
examined.
7.0 EXAMINING GALAXY CRASH FILES
All GALAXY components use the stopcode facility supplied by GLXLIB.
This facility dumps the ACs, program error codes, associated error
messages, program version numbers, and the last nine locations of the
stack onto the controlling terminal of the program executing the
stopcode. In addition, a crash file is created with a name of the
form: PS:<SPOOL>program-stopcode-CRASH.EXE. This .EXE file contains
the entire core image of the program which has crashed, and is extremely
useful in determining the cause of the crash. In particular, there is a
block of data referred to as the "crash block" which usually contains
the information most pertinent to the debugger. This information can be
read with either DDT or FILDDT. Its contents are tabulated as follows:
Location    Data

.SPC        PC of stopcode
.SCODE      SIXBIT name of stopcode
.SERR       Last TOPS-20 error code
.SACS       Contents of the sixteen accumulators
.SPTBL      Base address of page table used by GLXMEM
.SPRGM      Name of program in SIXBIT
.SPVER      Program version number
.SPLIB      GLXLIB version number
.LGERR      Last GALAXY error code
.LGEPC      PC of last GALAXY error return
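Several crash block entries are easier to read once decoded. The sketch below, in Python rather than anything supplied on the SWSKIT, decodes a 36-bit SIXBIT word such as .SCODE or .SPRGM, and splits a version word into fields. The version layout used is the standard .JBVER one (3-bit "who", 9-bit major, 6-bit minor, 18-bit edit); that .SPVER and .SPLIB follow this layout is an assumption.

```python
MASK36 = (1 << 36) - 1

def sixbit_decode(word):
    """Decode one 36-bit word holding six SIXBIT characters."""
    chars = []
    for shift in range(30, -1, -6):
        code = (word >> shift) & 0o77
        chars.append(chr(code + 0o40))   # SIXBIT code = ASCII - 40 (octal)
    return "".join(chars).rstrip()       # trailing blanks are padding

def sixbit_encode(text):
    """Pack up to six characters into one 36-bit SIXBIT word."""
    word = 0
    for ch in text.upper().ljust(6)[:6]:
        word = (word << 6) | ((ord(ch) - 0o40) & 0o77)
    return word & MASK36

def split_version(word):
    """Split a version word, assuming the standard .JBVER layout."""
    return {
        "who": (word >> 33) & 0o7,
        "major": (word >> 24) & 0o777,
        "minor": (word >> 18) & 0o77,
        "edit": word & 0o777777,
    }
```

For example, sixbit_decode(0o616541634162) gives "QUASAR".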
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 151
Debugging MOUNTR
DEBUGGING MOUNTR
----------------
1.0 INTRODUCTION
This write-up was prepared to assist developers and maintainers in
understanding and debugging the TOPS-20 tape and structure mounting
program, MOUNTR. It is assumed that the reader has a working knowledge
of TOPS-20 assembler language coding and the set of TOPS-20 monitor
calls.
2.0 SOURCES OF INFORMATION
This document will serve primarily as a guide to debugging MOUNTR
crashes. Much of the information needed to understand the data bases
and the operation of MOUNTR resides within the first 20 or 30 pages of
the MOUNTR code itself. Just make a listing and start reading.
3.0 DEBUGGING A LIVE MOUNTR
MOUNTR can be debugged as a standard GALAXY component, by
depositing -1 in location 135 of MOUNTR.EXE. MOUNTR will acquire a PID
for a private copy of QUASAR and will communicate with it.
To debug a MOUNTR which is actually recognized by the system as the
"real" MOUNTR it is usually best to run it as a separate job by
including the following commands in SYSJOB.RUN:
JOB n /LOGIN OPERATOR XX OPERATOR
ENABLE
GET SYS:MOUNTR
START
/
This job can be reached by use of the ADVISE command. MOUNTR can then be
killed and a new copy started with appropriate breakpoints or patches
installed. Before MOUNTR can be patched or breakpointed, it is necessary
to issue the DDT command $W, since MOUNTR write-protects itself during
execution. For example:
@ENABLE
$ADVISE OPERATOR
TTY2, NRT20
TTY235, OPR
TTY234, MOUNTR
TTY233, PTYCON
TTY232, EXEC
TTY: 234
[Pseudo-terminal, confirm]
Escape character is <CTRL>E, type <CTRL>^? for help
OPERATOR Job 3 MOUNTR
LINK FROM MOSER, TTY 60
[Advising]
^C !KILL OLD MOUNTR
^C
$GET SYS:MOUNTR !GET A NEW ONE
$DDT !ENTER DDT
DDT
$W !YOU MUST DO THIS
USM/ MOVEI 2,BSRRTA# .$B !SET SOME BREAKPOINTS OR WHATEVER
DDSCIH/ JSP 16,SAVEQR# .$B
^Z !EXIT DDT
$START !START MOUNTR
Depositing 1 in location CDFLG will enable CONTROL-D interrupts.
Typing CONTROL-D when enabled causes MOUNTR to enter DDT.
4.0 MOUNTR CRASHES
When MOUNTR crashes, it saves its core image in the file,
PS:<SPOOL>MOUNTR-CRASH.EXE
All crashes are initiated by a CALL STOP instruction. This may result
from a logic inconsistency, or it can happen if MOUNTR receives a
software interrupt on a panic channel. The STOP routine gathers some
important data and saves it in core. It then types a message giving the
name of the filespec wherein it is saving the core image, and issues an
SSAVE JSYS to save the image. After restoring the ACs from the time of
the crash, MOUNTR halts.
To begin debugging a MOUNTR crash, follow these steps:
1. GET PS:<SPOOL>MOUNTR-CRASH.EXE
2. Get into DDT and type STOP1$G. This will load DDT's ACs with
MOUNTR's ACs at the time of the crash and exit to the EXEC.
Give the DDT command to the EXEC again to get back into DDT.
3. Look at P (AC 17). If it contains PDL1+something, there has
been a stack trap, and the routine STOPP was called as a
result. The location BADP contains the contents of P at the
time of the trap.
4. If P contains PDL+something, type TAB to look at the top of the
stack. This will contain one plus the address of the CALL STOP
instruction. Type TAB and ^H to display the CALL STOP
instruction that invoked the crash. If MOUNTR died as a result
of a panic channel interrupt, LPC1 will contain one plus the
address of the instruction which was executing at the time of
the interrupt.
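The arithmetic in step 4 can be sketched as follows. This is an illustration only, in Python: it assumes a section-0 style stack pointer whose right half addresses the top of the stack, and it models the crash image as a dictionary from address to 36-bit word. The function names are hypothetical.

```python
HALF = 0o777777

def halves(word):
    """Split a 36-bit word into (left half, right half)."""
    return (word >> 18) & HALF, word & HALF

def call_stop_address(memory, p):
    """Address of the CALL STOP that crashed, given P (AC 17).

    The top-of-stack word holds one plus the address of the CALL STOP
    instruction in its right half.
    """
    _, top = halves(p)
    _, return_pc = halves(memory[top])
    return (return_pc - 1) & HALF

def interrupted_address(lpc1):
    """Address of the instruction executing at a panic channel interrupt.

    LPC1 holds one plus that address.
    """
    _, pc = halves(lpc1)
    return (pc - 1) & HALF
```

With a top-of-stack word whose right half is 5001 (octal), call_stop_address returns 5000 (octal), which is the location to examine in DDT.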
The following locations and data structures are helpful in locating the
cause of difficulties in MOUNTR:
NAME      FUNCTION
----      --------
CRSHAC    Contains the ACs at the time the STOP routine was called.
LPC1      For crashes caused by panic channel interrupts, LPC1 contains
          one plus the address of the instruction that caused the crash.
LSTERR    Contains the last TOPS-20 error.
MRPDB     PDB for last IPCF message received by MOUNTR.
MSTRBK    Used as an argument block for MTOPR and MSTR monitor calls.
RBUF      Last IPCF message received by MOUNTR (particularly useful if
          SSSDAT+1 contains MRCVIH, indicating that MOUNTR crashed while
          processing an incoming IPCF message).
SSSDAT    When MOUNTR crashes, SSSDAT+1 contains the address of the
          routine that was invoked by MOUNTR's scheduler. Starting here
          and using the stack, you can trace the execution of MOUNTR's
          code that led to the crash.
TBUF      Last IPCF message sent by MOUNTR.
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 154
Debugging PA1050
DEBUGGING PA1050
----------------
In order to debug the compatibility package you must have a copy of the
file called PAT.EXE. PA1050 is just the system name for PAT. If there
is no copy of PAT.EXE, then take the source program called PAT.MAC, and
assemble it, thereby creating a sharable save file called PAT.EXE. To
debug the compatibility package the following steps are required.
$RESET
$GET PROG ;Where PROG may be any program you choose
$MERGE PAT ;PAT is the source name for PA1050
$DDT
PAT$: MOVBF$B ;You set your breakpoints here
DEBUG$G
$G ;You must type $G twice because of the double symbol
   ;table
NOTE
Some of the error messages you may receive
from PA1050 may not be the true error
message. To have the correct error
message printed out use an ERJMP, or an
ERCAL after the JSYS it fails on. For
more information on ERJMP and ERCAL refer
to the Monitor Calls Reference Manual.
In order to build the compatibility package the following steps are
required:
$LOAD /CREF PAT.MAC
$START
$SAVE PAT
$GET PAT
$DDT
MAKEPF$G
Output file: PA1050.EXE
$DDT
UDDT
40000,,0$X
^Z
$I MEM
The start after loading causes the program to be moved from its location
to its running location in high core. The symbol table is also moved,
and the pointer adjusted. A sharable save file of pages 700-777 must be
made for debugging. This is created when you type MAKEPF$G and then
execute 40000,,0 in UDDT. When you type I MEM you should now have
PA1050.EXE in pages 700-730.
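To relate the page numbers above to addresses seen in DDT, recall that a TOPS-20 page is 512 (1000 octal) words. The following Python sketch converts an octal page range to its first and last word addresses; the function name is chosen here for illustration.

```python
PAGE_SIZE = 0o1000  # 512 words per TOPS-20 page

def page_range_to_addresses(first_page, last_page):
    """Return (first address, last address) covered by a page range."""
    first_address = first_page * PAGE_SIZE
    last_address = (last_page + 1) * PAGE_SIZE - 1
    return first_address, last_address

# Pages 700-777 (octal), the range of the sharable save file.
low, high = page_range_to_addresses(0o700, 0o777)
print(f"{low:o}-{high:o}")
```

Pages 700-777 therefore cover addresses 700000 through 777777 (octal), the top of the section-0 address space.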
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 155
Copying Floppy Disks
COPYING FLOPPY DISKS
--------------------
This is a description of the front end program COP (quick floppy
copy). This program should be used to create backup copies of the
distributed set of floppies.
CAUTIONARY NOTES ABOUT FLOPPY DISKS:
1) Only IBM floppies should be used. Other floppies may
destroy the DX11 drives.
2) Floppies have a finite life while mounted in the
drive. The heads do not float, and the floppies turn
continuously. This causes the magnetic surface to be
eaten away. Minimum floppy life is something like 200
hours.
3) Floppies which are dropped, badly shocked, or used as
frisbees will lose their sector headers, and will be
good for nothing.
4) Never put a floppy which you suspect is bent into the
drive -- it may damage the drive.
5) COP is discussed also in the Front End File System
Specification manual in Volume 14 of the TOPS-20
Software Notebooks, section 3.2.
COP COMMANDS:
The basic COP command string is of the form:
COP> <destination device>/<switch>=<source device>
To enter COP, type a Control-backslash to get to the
Parser, then MCR COP to start up COP. The floppies
should have already been mounted with MCR MOUNT, and
should then be dismounted with MCR DMOUNT after the
copy.
COP SWITCHES:
/HE    Help, types a list of switches
/RD    Read Device, check for errors
/CP    Copy (default action)
/VF    Verify copy (default when copy in effect)
/ZE    Zero the device
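The command form and switch list above can be captured in a small helper, for instance when preparing operator instructions. This is a Python sketch only; it formats the command string and checks the switch against the list above, and does not talk to the front end. The function name and the trailing-colon check on device names are assumptions made for illustration.

```python
# Switches as listed above.
COP_SWITCHES = {
    "HE": "Help, types a list of switches",
    "RD": "Read Device, check for errors",
    "CP": "Copy (default action)",
    "VF": "Verify copy (default when copy in effect)",
    "ZE": "Zero the device",
}

def cop_command(destination, source, switch=None):
    """Format '<destination device>/<switch>=<source device>'."""
    for device in (destination, source):
        # Assumption: device names are written with a trailing colon.
        if not device.endswith(":"):
            raise ValueError(f"device name should end with ':': {device!r}")
    if switch is None:
        return f"{destination}={source}"
    key = switch.upper().lstrip("/")
    if key not in COP_SWITCHES:
        raise ValueError(f"unknown COP switch: {switch!r}")
    return f"{destination}/{key}={source}"
```

Note the order: the destination comes first, so cop_command("DX1:", "DX0:") yields "DX1:=DX0:", which copies from DX0: onto DX1:.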
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 156
Copying Floppy Disks
COP EXAMPLE:
The following sequence of commands will succeed in
copying the contents of the floppy in DX0: (the left
hand drive) onto the floppy in DX1:, and verifying the
operation.
^\
PAR>MCR MOU
MOU>DX0:
Mount completed
MOU>DX1:
Mount completed
MOU>^Z
^\
PAR>MCR COP
COP>DX1:=DX0:
COP>^Z
^\
PAR>MCR DMO
DMO>DX0:
Dismount Complete
DMO>DX1:
Dismount Complete
DMO>^Z
The copy takes about two minutes, and the verify about the same.
Take care to specify the correct source and destination
devices.
CAUTIONARY NOTE--
If you COP for many generations you will build up
ghost bad blocks until RSX declares the floppy
useless. This is because in each generation the bad
block file of the old floppy is copied onto the new
(which will have its bad blocks in different physical
locations). A way around this is to use PIP for any
non-boot copies once every several generations.
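The accumulation described in this note can be illustrated with a toy model. The Python sketch below is a simplification, not a description of the actual RSX bad block file: it assumes each copy inherits the previous floppy's whole bad-block list and adds its own real bad blocks. A "ghost" is a recorded bad block that is not actually bad on that floppy.

```python
def recorded_bad_blocks(real_bad_blocks_per_floppy):
    """Bad-block list recorded on each generation of copy.

    Element 0 is the original floppy; each later element is a copy of
    the one before it.
    """
    recorded = []
    inherited = set()
    for real in real_bad_blocks_per_floppy:
        inherited = inherited | set(real)   # old list plus own bad blocks
        recorded.append(set(inherited))
    return recorded

def ghost_counts(real_bad_blocks_per_floppy):
    """Number of ghost bad blocks on each generation."""
    recorded = recorded_bad_blocks(real_bad_blocks_per_floppy)
    return [
        len(listed - set(real))
        for listed, real in zip(recorded, real_bad_blocks_per_floppy)
    ]

# Four floppies, each with one real bad block in a different place.
print(ghost_counts([{3}, {10}, {25}, {40}]))
```

In this model the ghost count rises by one per generation, which is why an occasional file-by-file copy with PIP (which does not carry the bad block file across) resets the build-up.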
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 157
The SWSKIT Documentation Files
THE SWSKIT DOCUMENTATION FILES
------------------------------
Following is a brief synopsis of each article/file that appears in the
SWSKIT documentation. Please note that many of these articles are
preliminary functional specs and discussions, and may contain some
information that is completely false. However, the material is provided
to be used with proper caution because it does provide information not
otherwise available in useful form at this time. Over time, many of
these documents will be replaced by SDC-type materials. For others,
these articles may be the main source of information indefinitely.
TITLE DESCRIPTION
* HANDBOOK This document is the latest revision of the
TOPS-20 Trouble-Shooting Handbook.
ACCOUNTING This article describes the changes made to
allow the billing rates for system usage to
change during the day. It also explains a
feature called disk accounting.
ACCOUNTING-TABLES This file documents the formats of
SYSTEM-DATA.BIN and CHECKPOINT.BIN in
tabular format.
ARCHIVE This document describes the functionality of
archiving, and how to use archiving.
* CFS-INFO This document describes the implementation
of the Common File System (CFS) for TOPS-20.
CI-INFO This document contains files describing the
implementation of support for the CI-20
(KLIPA) I/O port for the DECSYSTEM-20.
* CTERM-INFO This document describes the implementation
of the CTERM protocol terminal links for
TOPS-20.
DDT-INFO This document describes changes made to DDT
for versions 41, 41A, and 43.
DDP This document discusses some aspects of DDP
(Distributed Data Processing) on TOPS-20.
(Very early paper.)
DEBUGGING-GALAXY This document describes how to build a
private GALAXY system for debugging, and
gives hints on debugging various components.
DX20 This document gives a brief description of
the configuration requirements for tapes
controlled by the DX20.
EAPGMG This document describes Extended Addressing
and programming in non-zero sections.
EXECUTE-ONLY This document describes the changes,
restrictions, and implementation of an
execute-only file capability on TOPS-20.
GALAXY-TABLES This document contains the tables for GALAXY v4.2.
GALAXY-V5 This document contains discussions of the
changes to GALAXY for version 5 and
specifications of the QUEUE% JSYS.
GETOK This document describes three JSYSes:
GETOK, RCVOK, and GIVOK; and also describes
the SMON function.
HSC-INFO This document describes the programming and
use of the HSC-50 storage controller by
release 6.0 of TOPS-20.
IO This document describes some of the aspects
of how IO is done by TOPS-20.
KFS This document explains the functional
specification of the RSX-20F KLINK link.
KLCOM This document describes the KL10/PDP-11
DTE20 protocol. It explains such things as
the protocol messages, error messages, and
bootstrap procedures.
KL873 This document describes the functionality of
all the revisions of the BM873 Bootstrap ROM
for KL10 based on PDP-11 Front-Ends.
LABELED-TAPES This document describes TOPS-20's support of
labeled tapes. It also gives a description
of the monitor calls and support routines
that are used for labeled tapes.
* LAT-INFO This document collects software
specifications for the 10/20 host support of
LAT-based terminals, architecture and
implementation.
LP20 This functional specification describes the
interface to the LP-20 from the KL-10.
* MONITOR-ADDRESS-SPACE This document describes the changes to BOOT
and DDT to enable more address space. It
also explains PSECTs, overlapping BGSTR to
build monitors, write-protecting of the
resident monitor, and the parts of the
Release 6.0 address space project (moving
things out of section 0/1).
* MONITOR-TABLES This document displays most of the tables in
the monitor. This is a best effort based on
the ED SERVICES materials and will doubtless
not be as complete as the eventual ED
SERVICES document.
MOS This document describes MOS memory and the
TOPS-20 monitor support of TGHA.
MSCP-INFO This document contains the design and
functional specifications for the TOPS-20
implementation of the Mass Storage Control
Protocol (MSCP) server and driver.
* NI-INFO This document describes NIA-20 Ethernet
adapter support for TOPS-20.
PARITY This document describes some of the changes
made to the way parity errors are handled
for Release 5.
* PERFORMANCE This document discusses a number of
performance issues.
RSX-STOP-CODES This document lists the RSX-20F stop codes,
stating their meaning, and the module that
contains the stop code.
SCA-INFO This document describes SCA, the Systems
Communication Architecture protocol used
over the CI bus.
SCHEDULER This document describes Working Set
Swapping, and Release 4 and 6 Scheduler
changes (Class Scheduler, SKED%, etc.).
* TCPIP This document describes the TCP/IP ARPAnet
software implementation for TOPS-20AN.
USEFE This document outlines how to use the FE
device and program.
* indicates new or updated material for this SWSKIT version.
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 160
The SWSKIT Tools Programs
THE SWSKIT TOOLS PROGRAMS
-------------------------
Included on the SWSKIT are a number of utility programs, as summarized
below. These tools have been found to have at least some usefulness in
the past in a debugging environment. Most of these programs require the
user to have WHEEL or OPERATOR privileges to work, but most are also of
the "show and tell but don't touch" category, so they are in general
"safe" to run.
We have cleaned up some of the old ones a bit, added a few new ones, and
checked them all out to the extent that they will all run. There should
even be some documentation, at least a HELP file, with each program.
While we do not actively "support" these programs, we are quite willing
to accept complaints and suggestions and submissions from the field.
These are the "standard" tools; the Marlboro Support Group is generally
familiar with their operation and quirks, and in providing support to
the field may request that one or more of the programs be used at a
customer site to diagnose or assist in correcting a problem. This is
generally more effective than random poking about in DDT, or trying to
learn the peculiarities of whatever the customer may have available.
And now, the current collection:
PROGRAM DESCRIPTION
------- -----------
ACTDMP Converts an ACCOUNTS-TABLE.BIN file back
into the sequence of commands that created
it, for debugging purposes.
CHANS Produces system configuration, and status
information on tapes and disks.
DIRPNT Lists the contents of the blocks in a disk
directory.
DIRTST Checks the format, and lists any invalid
data in directory files.
DMPTTY Produces internal monitor information on
the state of a given terminal line.
DS Provides software diagnostic help
concerning the disk file system. It can
also perform the functions of READ,
FILADR, and UNITS.
DSKERR Provides a convenient listing of the hard
and soft disk errors that have occurred.
DX20PC Traces the microcode PC in the DX20.
ENVIRONMENT Types out CPU and memory configuration
information.
JSTRAP Produces information in a log on any JSYS,
including the PC and arguments used.
MONRD Allows you to easily examine the running
monitor.
MTEST Allows you to insert MONITOR instruction
execution tests anywhere in the monitor.
REV Allows you to easily alter, edit, delete,
obtain information, etc. on files.
SWSERR Produces a convenient listing of BUG
HLT/CHK/INF occurrences.
TYPVF7 This program is useful for typing out the
contents of a VFU file in a readable form.
UNITS Produces status information on disk
drives.
Page Index-1
TROUBLESHOOTING HANDBOOK INDEX
2020
See KS-10
Address break . . . . . . . . . . . 32
BOOT
commands . . . . . . . . . . . . . 82
getting a dump . . . . . . . . . . 85
Monitor memory pages . . . . . . . 85
BUG macro . . . . . . . . . . . . . 123
BUGCHK . . . . . . . . . . . . . . . 98, 123
BUGHLT . . . . . . . . . . . . . . . 98, 123
BUGINF . . . . . . . . . . . . . . . 98, 123
Crash analysis
AC storage . . . . . . . . . . . . 107
address translation . . . . . . . 96
APR interrupt context . . . . . . 120
ARPANET interrupt level . . . . . 121
BOOT . . . . . . . . . . . . . . . 85
BUGHLT . . . . . . . . . . . . . . 106, 109
Device interrupt context . . . . . 118
DTE Interrupt context . . . . . . 117
DUMPs . . . . . . . . . . . . . . 84
EDDT . . . . . . . . . . . . . . . 93
FILDDT . . . . . . . . . . . . . . 95
front-end dumps . . . . . . . . . 86
general . . . . . . . . . . . . . 103
JSYS context . . . . . . . . . . . 109
materials . . . . . . . . . . . . 87
MDDT . . . . . . . . . . . . . . . 90
monitor locations . . . . . . . . 104, 110, 115
Pager context . . . . . . . . . . 111
PHYSIO context . . . . . . . . . . 118
PSI context . . . . . . . . . . . 114
Scheduler context . . . . . . . . 115
stacks . . . . . . . . . . . . . . 107
DDT
EDDT . . . . . . . . . . . . . . . 93
FILDDT . . . . . . . . . . . . . . 49, 95
MDDT . . . . . . . . . . . . . . . 90
patching TOPS-20 . . . . . . . . . 16
Tricks . . . . . . . . . . . . . . 101
UDDT . . . . . . . . . . . . . . . 89
Directories
Mapping in MDDT . . . . . . . . . 20
problems . . . . . . . . . . . . . 20, 25, 36
Disk debugging . . . . . . . . . . . 49
Page Index-2
Disk parameters . . . . . . . . . . 45, 51
EXEC debugging . . . . . . . . . . . 130
BOOT . . . . . . . . . . . . . . . 136
FKSTAT . . . . . . . . . . . . . . . 53
Floppy disks
copying . . . . . . . . . . . . . 155
GALAXY debugging . . . . . . . . . . 137
CDRIVE . . . . . . . . . . . . . . 149
crash files . . . . . . . . . . . 150
GLXLIB . . . . . . . . . . . . . . 148
MOUNTR . . . . . . . . . . . . . . 151
ORION . . . . . . . . . . . . . . 149
private GALAXY . . . . . . . . . . 138
QMANGR . . . . . . . . . . . . . . 149
QUASAR . . . . . . . . . . . . . . 149
stopcodes . . . . . . . . . . . . 148
Hardware
deficiencies . . . . . . . . . . . 71
disk parameters . . . . . . . . . 45, 51
tape parameters . . . . . . . . . 45, 52
Hung Jobs . . . . . . . . . . . . . 35
Hung SETSPD . . . . . . . . . . . . 36
Hung tapes . . . . . . . . . . . . . 41
Hung terminals . . . . . . . . . . . 35
Job Zero . . . . . . . . . . . . . . 36, 56, 58, 70, 99, 108
JSB . . . . . . . . . . . . . . . . 27, 96-97, 110
KS-10
8080 information . . . . . . . . . 77
BOOT errors . . . . . . . . . . . 78
console information . . . . . . . 74
microcode . . . . . . . . . . . . 76
Legal policy . . . . . . . . . . . . 5
MDDT Operations . . . . . . . . . . 90, 101
Address break . . . . . . . . . . 32
Breakpoints . . . . . . . . . . . 30
CST access . . . . . . . . . . . . 122
directory mapping . . . . . . . . 20
JSB and PSB Mapping . . . . . . . 27
magtapes . . . . . . . . . . . . . 41
MAPDIR . . . . . . . . . . . . . . 20
MSETMP . . . . . . . . . . . . . . 27
Monitor address space
CSTs . . . . . . . . . . . . . . . 122
Job Zero forks . . . . . . . . . . 70
PSECTs . . . . . . . . . . . . . . 68
sections . . . . . . . . . . . . . 67
Page Index-3
Monitor building . . . . . . . . . . 69, 125
POSTLD . . . . . . . . . . . . . . 128
release 6 . . . . . . . . . . . . 128
Monitor locations . . . . . . . . . 97
CDB . . . . . . . . . . . . . . . 45
CHNTAB . . . . . . . . . . . . . . 45
CSTs . . . . . . . . . . . . . . . 122
DBUGSW . . . . . . . . . . . . . . 94, 99
DTE . . . . . . . . . . . . . . . 117
EDDTF . . . . . . . . . . . . . . 94
entry vector . . . . . . . . . . . 100
KDB . . . . . . . . . . . . . . . 45
Page zero . . . . . . . . . . . . 62
Pager . . . . . . . . . . . . . . 114
scheduler . . . . . . . . . . . . 115
See also JSB, PSB
UDB . . . . . . . . . . . . . . . 45
Monitor universal files . . . . . . 69
PA1050 debugging . . . . . . . . . . 154
Page zero . . . . . . . . . . . . . 62
Patch area . . . . . . . . . . . . . 18, 105
PCOs
see SIRUS
PSB . . . . . . . . . . . . . . . . 27, 96-97, 110, 115
Scheduler tests . . . . . . . . . . 53
SIRUS
CHERRY . . . . . . . . . . . . . . 9
commands . . . . . . . . . . . . . 9
system . . . . . . . . . . . . . . 9
SPR
answers . . . . . . . . . . . . . 9
see SIRUS
submission . . . . . . . . . . . . 6
SWPMLK . . . . . . . . . . . . . . . 94
SWSKIT files . . . . . . . . . . . . 5, 124, 157, 160
SWSKIT programs . . . . . . . . . . 5, 160
ACTDMP . . . . . . . . . . . . . . 160
CHANS . . . . . . . . . . . . . . 48, 160
DIRPNT . . . . . . . . . . . . . . 25, 40, 160
DIRTST . . . . . . . . . . . . . . 25, 40, 160
DMPTTY . . . . . . . . . . . . . . 35, 160
DS . . . . . . . . . . . . . . . . 25, 40, 48, 160
DSKERR . . . . . . . . . . . . . . 87, 160
DX20PC . . . . . . . . . . . . . . 44, 160
ENVIRONMENT . . . . . . . . . . . 160
JSTRAP . . . . . . . . . . . . . . 160
MONRD . . . . . . . . . . . . . . 29, 160
MTEST . . . . . . . . . . . . . . 160
REV . . . . . . . . . . . . . . . 160
SWSERR . . . . . . . . . . . . . . 87, 98, 124, 160
TYPVF7 . . . . . . . . . . . . . . 160
UNITS . . . . . . . . . . . . . . 48, 160
Page Index-4
System problems
Hung jobs . . . . . . . . . . . . 35, 53
Hung SETSPD . . . . . . . . . . . 36
Hung tapes . . . . . . . . . . . . 41
Hung terminals . . . . . . . . . . 35
Trashed disks . . . . . . . . . . 36
Tape parameters . . . . . . . . . . 52
Trashed disks . . . . . . . . . . . 36
[END OF HANDBOOK]