TOPS-20 TROUBLE-SHOOTING HANDBOOK
=================================
Release 4 Edition
RP20 LIR Update
January 1981
TOPS-20 Monitor Group
Marlboro Support Group
Software Services
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 2
INTRODUCTION
------------
This document is the TOPS-20 Trouble-Shooting Handbook. It is a
collection of materials designed to increase the effectiveness of the
Software Specialist in the field in coping with TOPS-20 problems.
Some of the common "disasters" to befall TOPS-20 sites are discussed,
along with debugging methods in general. Though the information
contained herein is probably not sufficient to make a Specialist into
a TOPS-20 "wizard", it should help ease the communication burden
between the Specialist in the field and his counterpart in Marlboro
and lead to quicker resolution of problems.
This document contains materials from many sources, and presents
some information not available anywhere else. Certain sections may be
a bit dated, but an effort has been made to remove at least some of
the old or wrong material and to include new articles.
There is a continuing need to update this document as part of the
SWSKIT materials, and Specialists are encouraged to give the Marlboro
Support Group feedback on these materials.
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 3
TABLE OF CONTENTS
1. INTRODUCTION 2
2. TABLE OF CONTENTS 3
3. POLICY STATEMENT 5
4. PRODUCING A GOOD SPR 6
5. USING SIRUS 9
6. DDT PATCHING THE TOPS-20 MONITOR 16
7. MAPPING DIRECTORIES IN MDDT 20
8. RECOVERING FROM DIRECTORY ERRORS 23
9. MORE ABOUT DIRECTORY PROBLEMS 26
10. JSB AND PSB MAPPING 28
11. BREAKPOINTING MULTI-USER CODE 32
12. USING ADDRESS BREAK TO DEBUG THE MONITOR 34
13. RECOVERING FROM SYSTEM DISASTERS 37
14. LOOKING AT HUNG TAPES 43
15. A LOOK AT SOME OF THE DISK STUFF 47
16. NEW DISK FEATURES FOR FILDDT 51
17. TOPS-20 SCHEDULER TEST ROUTINES 54
18. TOPS-20 PAGE ZERO LOCATIONS 61
19. KNOWN HARDWARE DEFICIENCIES LIST 65
20. KS10 CONSOLE INFORMATION 67
21. CRASH ANALYSIS 76
22. MORE CRASH ANALYSIS 95
23. BUG'TYP MACRO CHANGES FOR VERSION 4 OF TOPS-20 112
24. MONITOR BUILDING HINTS 114
25. EXEC DEBUGGING 118
26. RECOVERING FROM A BAD EXEC 125
27. DEBUGGING THE GALAXY SYSTEM 126
28. DEBUGGING MOUNTR 142
29. DEBUGGING PA1050 145
30. COPYING FLOPPY DISKS 146
31. THE SWSKIT TOOLS PROGRAMS 148
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 5
POLICY STATEMENT
LEGAL POLICY CONCERNING THE TOPS-20 SWSKIT
There is great confusion concerning the materials that make up
the SWSKIT tape, and their legal standing. This memo is an attempt to
clear up some of those problems.
The SWSKITs are made up of an assortment of materials intended to
increase the effectiveness of the software specialist. These
materials include program sources not normally distributed or sold for
a premium; internal and company confidential documentation, which may
be in part incomplete or actually incorrect, but supplied for the
information value on subsystems which may be insufficiently documented
through the usual channels; documentation for specialists specially
produced by the corporate support people; and utility programs
produced and maintained to some extent by corporate support. In
addition, the SWSKIT may contain special or pre-release versions of
supported software provided for the incremental value a specialist may
obtain from the software under controlled circumstances. In time,
utilities from the SWSKIT may evolve into supported products.
All of the SWSKIT materials are proprietary to DIGITAL, and were
never intended to be just given to the customer. Obviously, the
materials which are otherwise sold cannot be given away; and the
company confidential materials should not be. While it is expected
that the tools programs may wind up being used at customer sites,
neither are they gifts to the customer. An effort must be made to
protect DIGITAL's rights to these proprietary materials. For
instance, a PL90 contract retains rights to all materials provided to
the customer. Deleting a tool program after use at a customer site
demonstrates that intent. There should be an awareness that if a customer
incurs damages due to use of some program given to him by the
specialist, even though improperly used, then DIGITAL may be seen to
be at least in part responsible. This should be avoided.
In summary, the SWSKIT is a tool provided to increase the
effectiveness of the specialist, especially with regard to PL90 and
debugging activity, but the rights to all materials remain with
DIGITAL and the specialist should act accordingly.
THIS IS NOT A LEGAL DEPARTMENT DOCUMENT. CONSULT LEGAL IF YOU
HAVE ANY DEFINITE PROBLEMS REQUIRING RESOLUTION.
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 6
PRODUCING A GOOD SPR
A software specialist is often asked to assist with the
submission of SPRs for a customer. It is always discouraging to
have problems getting an answer to an SPR for entirely
non-technical reasons. For that reason, below are some hints for
producing a "good" SPR which will help in getting the problem
solved more quickly.
1.0 THE SPR FORM
Much of the data on the SPR form seems unimportant until it is
omitted. The product data line is one example. Try to isolate the
problem to the correct component, since that determines who
first receives the SPR. This saves the time it takes for, say,
the COBOL maintainer to determine that the problem is not
really in COBOL, but in PA1050 or the monitor, and the time it
takes for the next maintainer to become familiar with the problem.
Something which crashes the system is always a monitor problem,
even if it is an EXEC command which causes the problem, or a short
BASIC program.
If you really have a problem, be sure to mark the "problem"
box, and don't use words like "we suggest you correct the
following situation...". If the people who handle the incoming
paperwork think they have a suggestion, it gets routed elsewhere,
and is never seen by the maintainers. A few problems have been
greatly delayed this way.
The priority boxes are not super-critical, but if you have a
problem which is holding up production, or crashing the system
several times a day, try to make a note of that somewhere in the
description of the problem. That should let the maintainer know
that a work-around may also be appropriate in the short term.
The phone number of the submitter could be important if the
problem proves not to be reproducible, or if it is complex
enough that further clarification might be needed just to
understand it. Your number here as a
software specialist provides a more informal contact than direct
maintainer-to-customer confrontation, although the customer will
be contacted directly if that is most expedient.
The attachments--be sure to mark some of these boxes if you
send along supporting materials. Since these can get separated
from the form, this will help keep them from getting permanently
lost.
The "DO NOT PUBLISH" box is for security problems and ways to
crash the system. We double-check this on incoming handling, but
if the box is checked you can be sure that the SPR will not be
published unanswered.
Describe the problem as clearly as possible in the space
provided. Try to provide enough detail to easily reproduce the
problem. Concentrate on the description of the problem, and any
diagnosis you may have made. Attempting to declare a "cure" is
not always a good idea because the actual correction may be of an
entirely different nature for a number of reasons. However, if
you have something that works, the information could be of use.
Just don't count on that exact change being the actual fix. If
the problem is not reproducible from the description given,
chances are that something you left out is relevant to the
problem. Unless the problem directly concerns them, things like
logical names, mounted structures, and other features often
obscure the problem. For the purpose of the problem description,
a terminal listing of an occurrence is often highly desirable, and
it is sometimes a good idea to create a brand-new directory
without any fancy LOGIN.CMD setups or user groups and so on to
demonstrate the problem.
2.0 THE SUPPORTING MATERIALS
As above, the listing from a terminal session is often a very
good attachment. Try to include all the relevant information.
Again, sometimes things like logical names, file and directory
protections, user groups, and other job-state variables are
important and should be included. Inclusion of data such as
program version numbers and edit levels can be useful for products
with large numbers of edits. If you are complaining of monitor
problems, which patches you have installed could be useful
information. Terminal sessions should be as clear as possible.
It should be made obvious just what is going on or the maintainer
may just see a series of commands and think "So?". Commenting, either
during the session or after the fact, is one way to accomplish this.
Many times there is a program which exercises the bug.
Sometimes these programs are alright as they are, but often they
are giant COBOL monsters working on a multi-RP06 data base, and
very unwieldy for a maintainer to try to work with. If the
program can be reduced to a small subset, do so. Many monitor
problems often turn out to be reproducible from a set of arguments
to a single JSYS. If it is a question of incorrect output from
some program, it is helpful to send along all the files needed to
reproduce the problem, and the files of incorrect output. In the
case of programs with multiple edits to field-image, this speeds
up the maintainer, since he does not have to manually apply those
edits to attempt to recreate your versions, and he can also check
the installation of the edits, if that is appropriate. And in
case the problem proves to be not easily reproducible the bad
output can at least be examined for clues.
In the case of a monitor crash, the problem may have been
reduced to a program of less than one page. It is tempting to
type this on the front of the SPR and send it in that way. While
the maintainer can type in the program easily enough (if the copy
is both legible and correct), the submitter has been lax.
Sometimes, that short program will not cause a crash, even though
run thousands of times under varying conditions by the maintainer.
And even when it does cause the crash the first time, the
submitter has lengthened the turn-around by not sending the dump
from the crash along with the SPR. Sending the dump solves both
problems. If the problem is not reproducible with ease, the dump
is vital to further understanding. And having the dump to start
with speeds up the work of the maintainer who now does not need to
schedule stand alone to try to exercise the bug and cause a crash
so he has a dump to look at.
When sending a dump, always send the unrun monitor along with
it. If you don't, you are just causing a delay in handling the
problem while the maintainer tries it against the standard ones,
which involves finding tapes with the standard ones, and loading
them... If you are running an unpatched standard monitor, and you
refuse to send it, at least tell which one it is somewhere on the
form. The unrun monitor is also useful for checking the existence
and correct installation of patches when that becomes an issue.
The current preferred tape format is 9-track, 1600 bpi, in
standard DUMPER format rather than INTERCHANGE format, since file
information can be lost in INTERCHANGE format. Take the time to get
a directory listing of the tape and include it with the tape. This
speeds things up: if it is obvious from the directory that
something is missing, feedback comes sooner. The listing also
indicates that the tape will be readable when received, and partly
eliminates the maintainer's usual first step of getting a
directory of the tape.
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 9
USING SIRUS
-----------
Did you know that you can dial into a Marlboro development system
and type out almost any patch that the Marlboro Support Group has made
to -10 or -20 software in the last three to four years? The program
which does this is called SIRUS, and with it you can:
1. Search through all the patches to a particular product, if
you know a problem exists but don't know what the patch is or
don't know if we've heard of the problem. If you find the
patch you want, you can then type it out.
2. Type out a particular patch to a particular product, if you
know what the edit number is.
3. Obtain the status of any SPR, including the entire answer if
it has been answered.
By using SIRUS, you can get patches whenever the system is up,
even if it's two A. M. and the Hotline is closed. You can print
patches in your local office without having to wait for a specialist
in Marlboro to mail you a copy. You can be sure that the patch you
have is correct. (Dictating patches over the Hotline is very prone to
errors.) Even if the problem you are experiencing cannot be found in
SIRUS, you can help us when you call by so stating. We immediately
know that the problem you are having is a new one.
There have been several articles about SIRUS in previous Large
Buffers, but none have been oriented towards specialists in the field.
This one is!
To use SIRUS, dial into system 1026 in Marlboro, log in, and then
run it. In more detail:
1. Dial into system 1026. Any of the following numbers will
reach system 1026 in Marlboro. They are all 300 baud lines.
231-1171 (DTN)
231-1172 (DTN)
(617)481-5606
(617)481-5632
(617)481-5635
(617)481-5636
(617)481-5637
(617)481-5638
Once the machine notices you, type "SET HOST 26" to ensure
that you are connected to system 1026. If you get the
message "?Undefined Network Node", the machine is down (try
again later).
2. To log in, type "LOGIN 10,#". When the machine requests a
name, type one in. You will not need a password.
3. To run SIRUS, just type "R SIRUS". SIRUS takes several
seconds to initialize itself and then prompts you with
"PRODUCT [H]*". At this point, type either "10<CRLF>" or
"20<CRLF>" depending on whether the customer of concern is
running TOPS10 or TOPS20. SIRUS then prompts you with
"[H] *". You are now at SIRUS command level.
SIRUS has many commands, but only a few are of interest to the
field specialist. They are:
1. H -- for Help. This may be typed anytime SIRUS precedes its
prompt with "[H]".
2. EX -- for Exit. Use this to exit SIRUS. Then type K/N to
log out, and hang up.
3. PP -- for Peruse PCOs. PCO stands for Product Change Order
and essentially means a patch. This command is used to look
through patches for a particular product if you aren't sure
which patch you want.
4. GP -- for Get PCO. This is used to type out a particular
patch once you know which one you want.
5. GS -- for Get SPR. Use this to retrieve information on a
particular SPR.
6. NP -- for New Product. Use this command if you type the
wrong answer to "PRODUCT [H]*" as mentioned above, or use it
in association with the PP command as described below. SIRUS
will prompt you for a product again.
The three most useful of these commands are PP, GP, and GS.
3.0 PP Command
Use this command to peruse the patches for a particular product
-- e.g. LINK or 603 (monitor) or BATCON -- if you want to find a
particular patch you know exists, or if you want to know if the
support group has heard of and fixed some problem you are experiencing
with a product. After you type "PP<CRLF>" SIRUS will prompt for a
component. Here type the program you're interested in -- LINK, BATCON
or whatever. A response of LIST will type the programs SIRUS knows
about and then prompt you for a component again.
Once you type in the component, SIRUS prompts with "[H] PCO #:".
There are two reasonable responses to this. The first is ALL. (Type
NO to the subsequent question about a file.) This will give you a
short summary of all the patches available for this product, one line
per patch. This includes a PCO number, the SPR for which this patch
was written, the edit number corresponding to the patch (for the
TOPS10 monitor this is the MCO number), a keyword describing the bug,
the maintainer who wrote the patch, and the date it was made. The
other response you might type here is simply <CRLF>. In this case
SIRUS will type out the symptom of the newest PCO, and then prompt you
with "NEXT?". By continuing to type carriage returns, you can type
all the symptoms of all the patches for this product, from the newest
to the oldest. When you have found the patch you want (remember the
PCO number), type RETURN to get back to SIRUS command level.
If you did not find your symptom while perusing, and your product
exists on both TOPS10 and TOPS20, you should also search the PCOs for
the alternate operating system. To do this, type NP to SIRUS command
level, and then type in the other product number when SIRUS asks for
it. Then peruse PCOs for your product as you did before.
4.0 GP Command
This is used to print out a patch once you know the PCO number.
The PCO number is printed while you are perusing PCOs and is of the
form 10-product-nnn or 20-product-nnn. After typing GP to SIRUS
command level, SIRUS prompts for a PCO number. The leading "10-" or
"20-" is supplied by SIRUS, so your response should be of the form
"product-nnn".
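The PCO number forms described above can be handled mechanically. The
following is a minimal Python sketch, not part of SIRUS; the names
parse_pco and gp_response are hypothetical, and it assumes product names
contain only letters and digits. The three-digit padding mirrors the way
SIRUS displays PCO numbers; the example run later in this article
suggests SIRUS also accepts an unpadded number.

```python
import re
from typing import NamedTuple


class Pco(NamedTuple):
    system: str   # "10" or "20" (the operating system prefix)
    product: str  # component name, e.g. "D60SPL"
    number: int   # PCO sequence number


# Optional "10-" or "20-" prefix, then product, then the PCO number.
# Assumption: product names are alphanumeric only.
_PCO_RE = re.compile(r"^(?:(10|20)-)?([A-Z0-9]+)-(\d+)$")


def parse_pco(text: str, default_system: str = "20") -> Pco:
    """Parse '20-D60SPL-015' or 'D60SPL-15' into its parts."""
    match = _PCO_RE.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not a PCO number: {text!r}")
    system, product, number = match.groups()
    return Pco(system or default_system, product, int(number))


def gp_response(pco: Pco) -> str:
    """Form to type at the GP prompt: the prefix is left to SIRUS."""
    return f"{pco.product}-{pco.number:03d}"


if __name__ == "__main__":
    print(gp_response(parse_pco("20-D60SPL-015")))
```

Keeping the operating system prefix as a separate field makes it easy
to repeat a search under the other product number after an NP command.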
In response, SIRUS types out information about the patch. The
two most useful data are labeled VLD and SAE. VLD stands for validity
and is the version of the software to which the patch applies. SAE is
Source After Edit and is the edit or MCO number of the patch. To get
the actual text of the patch, respond YES to SIRUS's question "Show
Write-up File?".
5.0 GS Command
This is used to get the status of an SPR. SIRUS will prompt for
an SPR number, and then will provide you with info about the SPR you
specified. This includes the site that submitted the SPR, the
specialist responsible for the SPR, the date received, and the date
closed, if the SPR has been answered. If answered, it will also say
whether or not an auxiliary file was written for the SPR and what PCOs
(if any) were included. The aux file is an introductory paragraph
which is written for most SPR answers. For SPRs which do not require
patches, the aux file constitutes the entire answer. The aux file can
be typed by responding YES to "SHOW AUXILIARY FILE?". The PCOs can be
typed out with the GP command.
Finally, if SIRUS begins to give you error messages such as "File
not found", EX from SIRUS and mount a special disk pack with the
monitor command "MOUNT SIRS:". Then try again. This gives you access
to more PCOs and aux files than are normally available.
For more information, see the example run of SIRUS below, in
which user input is shown underlined, or the article on SIRUS
published in volume 409 of the Large Buffer. Finally, SIRUS is for
use by DIGITAL personnel only. DO NOT give out instructions for its
use or the system 1026 phone numbers to customers.
.R SIRUS
- -----
SIRUS...3(3)
[WHEN '[H]' APPEARS YOU MAY TYPE 'HELP' FOR ASSISTANCE]
PRODUCT [H]* 20
--
[H] *PP
--
[H] COMPONENT TO PERUSE: D60SPL
------
[PCO LIMIT FOR 'D60SPL' IS 15]
[H] PCO #:<CR>
----
[20-D60SPL-015]
DATE: 09-JUL-79 BY: BENCE
VLD:
[SYMPTOM]
Jobs sent to the LPT queue from D60SPL are given a random
file name and are billed to OPERATOR.
NEXT?<CR>
----
[20-D60SPL-014]
DATE: 09-JUL-79 BY: WEISBACH
VLD:
[SYMPTOM]
If the spooler is pausing, typing a GO can result in an
illegal instruction.
NEXT? ALL
---
DO YOU WANT A FILE? NO
--
PCO 015 SPR 12355 (6,022) KEY= LNAME BENCE 09-JUL-79
PCO 014 SPR 12225 OUTOUT (6,020) KEY= PAUSE WEISBACH 09-JUL-79
PCO 013 SPR 11660 LODVFU 6013(6,014) KEY= VFU WEISBACH 09-JUL-79
PCO 012 SPR 13244 D60CRE 103 (6,032) KEY= CARD L.NEFF 06-JUL-79
PCO 011 SPR D60CR4 103 (6,015) KEY= CARDS L.NEFF 03-JUL-79
PCO 010 SPR REQUEU 103 (6,030) KEY= CTQMFQ L.NEFF 14-JUN-79
PCO 009 SPR 12588 INTCTC 1 (6,026) KEY= CONTROL C TEEGARDEN 17-MAY-79
PCO 008 SPR 12881 OUTE.6 103 (6,025) KEY= REQUEUE NEFF 17-APR-79
PCO 007 SPR 12139 103 (6,019) KEY= ILLEGAL WEISBACH 27-OCT-78
PCO 006 SPR 12005 (0) KEY= SIMULTANEO BENCE 22-SEP-78
PCO 005 SPR 11672 ENDJOB 103 (6,018) KEY= QUASAR BENCE 18-SEP-78
PCO 004 SPR 11841 D60STK 103 (6,016) KEY= BAD WEISBACH 23-AUG-78
PCO 003 SPR 11476 TTYOUT 103 (6,010) KEY= OVERWRITE WEISBACH 12-MAY-78
PCO 002 SPR 11431 OUTE.6 (6,007) KEY= INTERRUPTS WEISBACH 12-APR-78
PCO 001 SPR 11456 D60SPL (6,006) KEY= BLANK WEISBACH 03-APR-78
[H] PCO #: RETURN
------
[H] *GP
--
[H] PCO #: 20-D60SPL-8
[20-D60SPL-008 RETRIEVED]
PROG: NEFF
COMPONENT: D60SPL
SER/SPR:20-12881
KEYS: REQUEUE /
ROUTNS: OUTE.6 /
VLD: 103(2304)
SBE %103 (6,024)
SAE %103 (6,025)
CRIT: N
DOC: N
F/D: F
TEST FILE: : [ ]
P-IND: 10
SHOW WRITE-UP FILE? YES
---
[WRITE-UP FILE]
008 NEFF
[SYMPTOM]
If a job is requeued because of a communications failure, with
D60SPL reporting that the station has signed off, then, when the
station signs on again, the print file will be restarted from its
beginning, not from the last checkpoint.
[DIAGNOSIS]
When the error is detected, routine OUTE.6 calls IBACK to
backspace the file five pages. IBACK zeroes the page counter,
J$RNPP(J), and rewinds the file, in the belief that the forward
spacing code will update the page count as it skips to the correct
page. However, D60SPL discovers the error is not recoverable and it
requeues the job immediately. Since the page count is never updated,
DOREQ requeues the job to start at the beginning of the file.
[CURE]
Preserve the page at which to resume printing over the call to
IBACK. if the job is to be requeued immediately, restore J$RNPP(J) so
that the job will be requeued and checkpointed five pages back from
its current position.
[FILCOM]
File 1) DSK:D60SPL.MAC[4,1022] created: 1724 09-Apr-1979
File 2) DSK:D60SPL.MAC[4,417] created: 1625 10-Apr-1979
1)1 LPTEDT==6024 ;EDIT LEVEL
1) LPTWHO==1 ;WHO LAST PATCHED
****
2)1 LPTEDT==6025 ;EDIT LEVEL
2) LPTWHO==1 ;WHO LAST PATCHED
**************
1)4 ;*****End of Revision History*****
****
2)4 ;6025 If a job printing on a remote printer is interruped by
2) ; a communications failure, requeue to start five pages back
2) ; instead of at beginning of file. LLN, SPR # 20-12881,
2) ; 10-APR-79
2) ;*****End of Revision History*****
**************
1)179 PUSHJ P,IBACK ;BACKSPACE THE FILE
1) PUSHJ P,INTON ;[6007]TURN INTERRUPTS BACK ON
1) PUSHJ P,D60NRY ;PERFORM "NOT READY" DIALOG
1) JRST OUTE.7 ;ERROR IS UNRECOVERABLE
1) TELL OPR,[ASCIZ /![LPT... continueing!]
****
2)179 ;**;[6025] ADD SEVERAL LINES AT OUTE.6 + 13L. LLN, 10-APR-79
2) MOVE T1,J$RNPP(J) ;[6025] CALCULATE THE NEW
2) SUB T1,N ;[6025] DESTINATION PAGE
2) PUSH P,T1 ;[6025] AND SAVE IT
2) PUSHJ P,IBACK ;BACKSPACE THE FILE
2) PUSHJ P,INTON ;[6007]TURN INTERRUPTS BACK ON
2) PUSHJ P,D60NRY ;PERFORM "NOT READY" DIALOG
2) JRST [POP P,J$RNPP(J) ;[6025] RESTORE PAGE NO. FOR REQUEUE
2) JRST OUTE.7] ;[6025] ERROR IS UNRECOVERABLE
2) POP P,(P) ;[6025] THROW AWAY DESTINATION
2) ;[6025] PAGE - FORWARD SPACING
2) ;[6025] CODE WILL HANDLE IT
2) TELL OPR,[ASCIZ /![LPT... continueing!]
**************
[END OF WRITE-UP FILE]
[H] *EX
--
EXIT
.
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 16
DDT PATCHING THE TOPS-20 MONITOR
This article discusses how DDT patches are made to TOPS-20.
From time to time the Marlboro Support Group has to describe and
explain the DDT patching of TOPS-20 to Specialists from the field. The
following is an explanation, if not a justification, of the way some
things are done.
A DDT patch to TOPS-20 as published is, in essence, a terminal log
of a session applying the patch by hand. This differs from the occasional
practice of supplying a control file containing only the typein to DDT.
The raw typein has a few disadvantages with respect to the log. Bare
control characters such as linefeeds and tabs are hard to display in a
publication format like the Software Dispatch, and even harder to edit
around with the only currently supported editor, EDIT. In addition, the
full typescript lets you compare the DDT typeout from your own
application of the patch against the published one: a match builds
confidence, and a mismatch is cause for concern. The published patch IS
an actual typescript, and is "proof" that the patch CAN be correctly
installed.
If you have no deeper knowledge of the patch, the basic method of
applying it is simply to type from the typescript whenever the
computer goes into input wait. Any "$" appearing in a DDT session which
is not the prompt from the enabled EXEC should be the result of typing
an ESCAPE. (ESCAPE is sometimes referred to as ALTMODE or ALT.) In
order to avoid confusion, we try never to use any dollar sign symbols,
and should make special note of any that might occur.
Starting at the top of a session, there are usually a few comments
about the patch. If we are currently patching multiple releases of
TOPS-20, the specific release for the patch should be noted here. Also
noted should be any hardware or monitor dependencies: KS- or KL-only,
or 2040, 2060, or ARPA only, etc.
The first monitor command is an ENABLE, followed by a GET of the
monitor file to be patched. Unless we are patching an existing patch,
our published patches always show us patching a "virgin" monitor file,
one without any previous patches installed. You should always be able
to duplicate the patch typescript yourself on an unpatched monitor.
At this point we do a START 140 command to get into DDT. There is
a fine distinction at this step between typing START 140 and typing DDT
to get into DDT. START 140 starts up EDDT (Exec-mode DDT) running in
user mode, which is the required action. Typing DDT to the EXEC would
merge SYS:UDDT.EXE with the monitor EXE file and start up UDDT
(User-mode DDT), which is not what we want. In fact, with Release 4 of
TOPS-20 the EXEC is clever enough to start up EDDT for us on the DDT
command also, but even so, for the sake of consistency, and to avoid
confusion, published patches should still use START 140.
After entering DDT, it is common to select the local symbol table
for the module to be patched in case there might be local symbol
conflicts, etc. This is done using the MODULE-NAME$: (ESCAPE colon)
construct.
Next follows the body of the patch. We purposely avoid the fancier
DDT commands when applying patches in order to avoid confusion. We try
to limit ourselves to a few DDT commands:
ADDRESS/ (slash) to open the location at ADDRESS
ADDRESS[ (open-square-bracket)
similar to / but typeout numeric not symbolic
RETURN to close the current location, storing any new
value specified
LINE-FEED to close the current location, storing any new
value specified, and open the next location
TAB a convenience command used to close the current
location and open the location specified by the
last reference; commonly used to get to and
open location FFF immediately after inserting a
JRST FFF instruction in the code
SYMBOL: (colon) to define a symbol at the current location;
usually to redefine FFF: further down in the
patch space
FFF$< (ESCAPE open-angle-bracket) or
FFF$$< (ESCAPE ESCAPE open-angle-bracket)
to start a patch in the patch area named FFF
$> (ESCAPE close-angle-bracket)
to terminate a patch, which installs the jumps
back to the inline code, redefines the FFF
symbol value past the used patch space, and then
inserts the initial jump to the patch into the
inline code
Those who apply patches are of course free to use the more sophisticated
DDT commands to achieve the same effect.
A few TOPS-20 peculiarities should be explained here. TOPS-20
patches are applied using the FFF patch area. The default DDT patch
area symbol, PAT.., (used if no argument is given to an $< or $$<
command) should NOT be used. You are apt to wind up with system crashes
since the PAT.. area is not locked down. FFF is defined in the module
STG.MAC (which goes to the customers), and the area is 100 octal words
long. FFF is part of the resident monitor code PSECT, and is always in
memory. Special care must be taken when installing patches not to
overrun the patch area, which could also result in system crashes. The
first symbol past the FFF area is DTSCNW. If that symbol shows up while
attempting to install a patch, you may be in trouble.
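The space budget described above can be tracked with a little
arithmetic before a patch is typed in. The following is a minimal
Python sketch, not a DIGITAL tool; the names PatchArea and
words_for_inline_patch are hypothetical. The only figure taken from the
text is the size of the FFF area, 100 octal (64 decimal) words. The
count of three extra words per $< patch (the displaced instruction plus
two JUMPA returns) is an assumption read off the annotated example at
the end of this article.

```python
FFF_AREA_WORDS = 0o100  # size of the FFF patch area: 100 octal = 64 words


class PatchArea:
    """Tracks how many words of a fixed-size patch area are in use."""

    def __init__(self, size: int = FFF_AREA_WORDS) -> None:
        self.size = size
        self.used = 0

    def remaining(self) -> int:
        return self.size - self.used

    def allocate(self, words: int) -> int:
        """Reserve words; return the offset from the start of the area."""
        if words < 0:
            raise ValueError("word count must be non-negative")
        if words > self.remaining():
            raise OverflowError(
                f"patch needs {words} words, only {self.remaining()} left"
            )
        offset = self.used
        self.used += words
        return offset


def words_for_inline_patch(new_instructions: int) -> int:
    """Words used by a $< ... $> patch (assumption from the example):
    the new instructions, the displaced original instruction, and the
    two JUMPA instructions that return to the inline code."""
    return new_instructions + 3


if __name__ == "__main__":
    area = PatchArea()
    area.allocate(1)                          # one data word
    area.allocate(words_for_inline_patch(3))  # three-instruction patch
    print(f"{area.used} words used, {area.remaining()} remaining")
```

Running the budget for every patch installed on a given monitor shows
in advance whether the next one would run past the end of the area.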
There is another patch space defined in TOPS-20, called SWPF, in
the swappable portion of the monitor. We always use FFF in preference
to SWPF, for three reasons. First, SWPF can only be used for patches to
swappable code, but FFF will work for either. Second, two patch areas in
common use might be confusing to the customers, specialists, and us.
Third, if we get a dump to examine from a customer, we can always check
the FFF area for possible (bad) patch installation. SWPF might be
swapped out, and not in the dump.
Unconventionally enough, the symbols FFF, FFF1, and FFF2 are all
defined together in STG.MAC with the same value. When DDT decides which
to type out when printing the symbolic form of an address, it finds FFF2
first, which accounts for the common appearance of FFF2 in patches. In
addition, just the symbol FFF is redefined on patch installation to
always point to the first free word of the remaining patch area. FFF1
and FFF2 are never redefined, and so should always point to the
beginning of the initial patch area built into the monitor. FFF2 should
never be explicitly referenced as typeIN to DDT; any occurrence
in a patch should be known to be from DDT typeOUT, probably from a DDT
LINE-FEED command. This is a common source of error in applying
patches: overwriting earlier patches by typing in the FFF2-based
symbols.
Normally, in a DDT patch, lines which follow one another
immediately in the published patch are the result of typing LINE-FEED at
the end of the line, and not RETURN and the next address symbol. When
the $< and $$< commands are used, all lines from that point to the
terminating $> command should have been ended with LINE-FEED, using
successive locations in the patch space. The patches should show breaks
in this form by inserting extra blank lines in the published patch to
indicate a new "sub-section" of the patch.
The patching session is ended by the ^Z (Control-Z) command to exit
DDT properly. The Control-Z command is the correct way to exit from DDT
when applying patches. It allows DDT to do any final cleanup it may
need to do. Exiting via Control-C is NOT recommended when you are
installing patches, and is NOT guaranteed to work.
Finally, the patched monitor is saved away on a disk file. The
published typescript shows creating a new generation of the system
MONITR.EXE file, but a more conservative approach is to save the patched
monitor as some other name, and try running it experimentally during
system time before installing it as the default monitor.
And now for an annotated example:
@
@! PATCH TO RELEASE 3 AND 3A MONITORS TO CORRECT ENQ FROM
@! APPENDING A REQUEST TO THE WRONG LOCK BLOCK WHEN A STRING
@! AND USER CODE HAPPEN TO HASH TO THE SAME ADDRESS.
@! THE MAGIC NUMBER AT XXX: IS POINT 3,T2,2
@
@ENABLE (CAPABILITIES) !Appropriate releases noted above.
$GET SYSTEM:MONITR !Get the monitor
$START 140 !Enter user mode EDDT
DDT
ENQ$: !Open the symbol table for the module
FFF/ 0 XXX: 410300,,T2 !Store into the patch area and define
FFF2+1/ 0 FFF: ! label XXX: to point to it; redefine
! FFF to be the new first unused word
STRCMP+5/ MOVE T3,T2 FFF$< !Begin an $< patch at FFF
FFF/ 0 LDB T3,XXX !This line and the next are ended by
FFF+1/ 0 CAIN T3,5 ! LINE-FEEDs
FFF+2/ 0 RET$> !Terminate the patch
FFF+3/ MOVE T3,T2 !These 4 lines are typed out by DDT on
FFF+4/ JUMPA T1,STRCMP+6 ! terminating the patch
FFF+5/ JUMPA T2,STRCMP+7
STRCMP+5/ JUMPA FFF2+1 !And another blank line indicating end
! of this sub-patch region
^Z !Control-Z to exit DDT properly
$SAVE SYSTEM:MONITR !Save away the patched monitor
<SYSTEM>MONITR.EXE.2 Saved
$
MAPPING DIRECTORIES IN MDDT
---------------------------
Release 3 of TOPS-20 can take advantage of the extended
addressing features of the model B processor. Some of its data has
been reorganized and moved into non-zero sections of the addressing
space. One of the things moved was directories. Directories are now
mapped into section 2, starting at the beginning of the section. Thus
the old procedure of reading a user's directory in MDDT is no longer
valid. This section describes how to map a directory correctly, for
release 2 and for releases 3, 3A, and 4.
The procedure for release 2 was the following. You first have to
find out the structure number and directory number for the directory
to be mapped. You can use the TRANSL program to get the directory
number, or use the ^EPRINT command to list the directory information.
As an example, suppose you want to find the directory and structure
information for the directory SNARK:<CURDS>. You run TRANSL and
obtain the results:
@TRANSL SNARK:<CURDS>
SNARK:<CURDS> (IS) SNARK:[4,117]
The "programmer number" obtained is the directory number, in octal.
In this example, the directory number is 117. If the directory is in
bad shape, and you can't run TRANSL or use ^EPRINT, you will have to
find out the directory number by looking at the output from a DLUSER
or ULIST run, or from BUGCHK output.
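Because TRANSL reports the directory number in octal, it is easy to
misread. As a minimal sketch (Python, not part of the SWSKIT; the name
parse_transl is invented for illustration), the result line above could
be picked apart as follows, assuming the output always has the form
shown in the example:

```python
import re

# Pattern for the translated part of a TRANSL result, e.g. "SNARK:[4,117]".
# Both numbers are octal. The fixed output shape is an assumption.
_PPN = re.compile(r"([A-Z0-9-]+):\[([0-7]+),([0-7]+)\]\s*$")


def parse_transl(line):
    """Return (structure, project, directory_number) from a TRANSL result line."""
    match = _PPN.search(line)
    if match is None:
        raise ValueError("not a TRANSL result line: %r" % line)
    structure, project, programmer = match.groups()
    # TRANSL prints both fields in octal, so convert with base 8.
    return structure, int(project, 8), int(programmer, 8)


structure, project, dirnum = parse_transl("SNARK:<CURDS> (IS) SNARK:[4,117]")
print("%s: directory number %o octal (%d decimal)" % (structure, dirnum, dirnum))
```

The value to put in AC1 later is the octal number exactly as TRANSL
printed it, since DDT also reads numbers in octal.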
To find the structure number, you have to work harder. If the
structure is mounted as PS:, its structure number is always 0. For
structures mounted other than PS:, you do the following. You get into
MDDT, and look at the table STRTAB. This table contains all of the
addresses of the structure data blocks in the system. The first word
of each structure data block is the structure name in SIXBIT. So you
search the table looking for the desired structure. The offset into
the table STRTAB is then the structure number. For our example:
@ENABLE
$SDDT
DDT
JSYS 777$X
MDDT
$$6T
STRTAB/ ,8[ / PS
STRTAB+1/ M^I / REL3
STRTAB+2/ M_% / SNARK
In the example above, you see that PS: is the first structure,
followed by the structures REL3: and SNARK:. Since the offset into
STRTAB was 2 for SNARK:, the structure number you want is 2.
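The SIXBIT name check that $$6T performs in the typescript above can
also be reproduced off-line, for example when reading a dump listing.
The following is an illustrative Python sketch, not monitor code, and
the function names are hypothetical. It relies only on the standard
SIXBIT encoding, in which each 6-bit code is the ASCII code minus 40
octal and six characters are packed into one 36-bit word.

```python
def text_to_sixbit(name):
    """Pack up to six characters into one 36-bit SIXBIT word."""
    word = 0
    for ch in name.upper().ljust(6)[:6]:
        code = ord(ch) - 0o40
        if not 0 <= code <= 0o77:
            raise ValueError("character %r has no SIXBIT code" % ch)
        word = (word << 6) | code
    return word


def sixbit_to_text(word):
    """Unpack a 36-bit SIXBIT word, dropping trailing blanks."""
    chars = []
    for shift in range(30, -1, -6):
        code = (word >> shift) & 0o77
        chars.append(chr(code + 0o40))
    return "".join(chars).rstrip()


for name in ("PS", "REL3", "SNARK"):
    word = text_to_sixbit(name)
    print("%-6s %012o %s" % (name, word, sixbit_to_text(word)))
```

Comparing the first word of each structure data block against
text_to_sixbit of the wanted name gives the same answer as reading the
$$6T typeout by eye.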
Knowing the structure number and the directory number, you can
now map the directory and look at it. When the directory is mapped,
location DIRORA points to the area of the monitor's address space where
you can find it. This is currently the address 740000. To save typing,
you can use the symbol DA, which has the value 740000 (although none of
the examples here uses it). To map the directory, you call the routine
MAPDIR, which is in the module DIRECT. It takes two arguments.
The directory number goes in AC1, and the structure number goes in
AC2. For our example, the output looks like:
DIRORA[ 740000
740000/ ?
1! 117
2! 2
CALL MAPDIR$X
$$
740000[ 400300,,100
The skip return from MAPDIR means you have successfully mapped the
directory. You can now look at the whole directory by examining the
proper locations. The number of pages that are mapped by MAPDIR is
30, which is the length of a directory, so the whole thing is
available to look at. By examining or changing location 740000+N in
core, you are examining or changing location N of the directory. When
you are finished, you can just leave MDDT by jumping to MRETN or by
typing ^C.
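The address arithmetic is simple but easy to get wrong in octal. The
sketch below (Python, for illustration only; the names are invented)
assumes that the 30 pages quoted above is an octal count, as numbers
typed to DDT are, and that a page is 1000 octal words.

```python
PAGE_WORDS = 0o1000     # words in one page
DIR_WINDOW = 0o740000   # contents of DIRORA in the example
DIR_PAGES = 0o30        # pages mapped by MAPDIR, read as octal (assumption)


def window_address(offset):
    """Return the monitor address that shows word `offset` of the directory."""
    if not 0 <= offset < DIR_PAGES * PAGE_WORDS:
        raise ValueError("offset %o is outside the mapped directory" % offset)
    return DIR_WINDOW + offset


first = window_address(0)
last = window_address(DIR_PAGES * PAGE_WORDS - 1)
print("directory window: %o through %o" % (first, last))
print("directory word 100 is at %o" % window_address(0o100))
```

Under those assumptions the window runs from 740000 through 767777, so
any address outside that range is not part of the mapped directory.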
In release 3, however, when you examine location DIRORA after
calling MAPDIR, it doesn't have to contain 740000. If it does, then
your machine cannot support extended addressing and the monitor is
running the same as release 2 did. In this case you can ignore the
rest of this document. If your machine does have extended addressing,
when you examine location DIRORA you will see the number 2,,0. This
address is now in section 2 of the monitor, and MDDT cannot read the
data there directly. If you look at the location 740000 after calling
MAPDIR, it will still be unreadable, since the directory is no longer
mapped there. Those pages are now unused.
To be able to read the directory now, you have to tell the
monitor to map in the pages where you can see them with MDDT. The
first step is to examine the location DRMAP. This location is the
section pointer for section 2, where the directories are mapped. This
is a share-type pointer, which contains the OFN for the desired
directory in the right half. This number is one of the arguments for
the MSETMP routine. MSETMP takes the following arguments. AC1
contains the OFN in the left half, and the first page number to be
mapped in the right half. AC2 contains flag bits in the left half,
and the address where you want to map the pages in the right half.
AC3 contains the number of pages to be mapped. For mapping
directories, you can use 740000 as the address, and you want to map 30
pages. You also want to set flag bits so that the directory can be
changed. For the example, you do the following:
DRMAP[ 224000,,147
1! 147,,0
2! 140000,,740000
3! 30
CALL MSETMP$X
$
After the call to MSETMP, the directory is now mapped in 740000, and
you can proceed as you used to in release 2. When you are finished
with the directory, you should call MSETMP again to unmap the
directory. This is done by supplying the same arguments as before,
except that AC1 contains zero. As an example:
1! 0
2! 140000,,740000
3! 30
CALL MSETMP$X
$
Now you can simply ^C out of MDDT or jump to MRETN.
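The argument words in the two MSETMP calls above are written in DDT's
left-half,,right-half notation. As a hedged illustration (Python, with
hypothetical helper names; it only models the 36-bit words and does not
call the monitor), the values typed into AC1 through AC3 can be derived
from the DRMAP contents as follows:

```python
HALF = 0o777777  # mask for one 18-bit halfword


def pack(left, right):
    """Build a 36-bit word from its left and right halves."""
    return ((left & HALF) << 18) | (right & HALF)


def halves(word):
    """Split a 36-bit word into (left half, right half)."""
    return (word >> 18) & HALF, word & HALF


def show(word):
    """Format a word the way DDT types it in halfword mode."""
    return "%o,,%o" % halves(word)


drmap = pack(0o224000, 0o147)     # contents of DRMAP in the example
_, ofn = halves(drmap)            # the OFN is the right half of the pointer

ac1 = pack(ofn, 0)                # OFN,,first page number to map
ac2 = pack(0o140000, 0o740000)    # flag bits,,address to map to
ac3 = 0o30                        # number of pages

print("map:   1! %s   2! %s   3! %o" % (show(ac1), show(ac2), ac3))
print("unmap: 1! %o   2! %s   3! %o" % (0, show(ac2), ac3))
```

The only difference between the mapping call and the unmapping call is
AC1, which is zero when unmapping.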
For Release 4 of TOPS-20, the various flavors of DDT have been
trained to understand extended addresses, so the mapping contortions
used for 3 and 3A are once again unnecessary. On extended machines
one can reference section two directly as below:
DIRORA[ 2,,0
2,,0[ 400300,,100
When done, you can still just ^C out or jump to MRETN.
RECOVERING FROM DIRECTORY ERRORS
--------------------------------
Sometimes after a monitor crash due to disk problems, some of the
directories on the system will contain errors. These errors cause
BUGCHKs such as DIRFDB, NAMBAD, DIRPG0, and DIRPG1. It is sometimes
possible to find the error in the directory by getting into MDDT,
mapping the directory, finding what is wrong, and fixing it. This
procedure is described in the SWSKIT. However, this is not always
easy, and may take a lot of time. It is therefore better in many
cases to simply delete the bad directory and recreate it. This is
easy to do for most directories. But special procedures are necessary
for the directories <SYSTEM> and <SUBSYS>. The rest of this memo will
describe the methods of recovering from bad directories, handling in
particular the difficult case of the <SYSTEM> directory.
You can first try to give the EXPUNGE command with the REBUILD
and PURGE subcommands. If the problem with the directory is very
simple, it may fix your problem. As an example, suppose the directory
PS:<SICK-DIRECTORY> is incorrect. You would type:
$EXPUNGE (DIRECTORY) PS:<SICK-DIRECTORY>,
$$REBUILD (SYMBOL TABLE)
$$PURGE (NOT COMPLETELY CREATED FILES)
$$
PS:<SICK-DIRECTORY> [NO PAGES FREED]
$
If this does not help the problem, you will have to delete the
directory and then recreate it. Before proceeding, you should make
sure that any files you can reference are copied to another directory,
or else are saved on tape. Now first try to delete the directory
normally, as follows:
$BUILD (USER) PS:<SICK-DIRECTORY>
[OLD]
$$KILL
[CONFIRM]
$$
$
If this is successful, then simply recreate the directory and
restore the user's files. You should recreate the directory with
the same directory number as it had before, so that DLUSER's data will
still be correct.
The procedure above will fail if the directory is either mapped by
another job or totally unusable. If it is mapped, and the directory
belongs to an ordinary user, you can wait until the directory is no
longer in use, or you can take the system stand-alone so that no user
can reference it.
If the directory is totally unusable, you will then have to try
to delete it the hard way. Before proceeding, you should try to
delete and expunge all files in the directory. This will minimize the
number of lost pages that will result. Now there are two cases to
consider. If the directory is not a sub-directory, you type the
following:
$DELETE (FILE) PS:<ROOT-DIRECTORY>SICK-DIRECTORY.DIRECTORY,
$$DIRECTORY (AND "FORGET" FILE SPACE)
$$
<ROOT-DIRECTORY>SICK-DIRECTORY.DIRECTORY.1 [OK]
$
If the directory is a subdirectory, you modify the above command
by replacing "ROOT-DIRECTORY" by the name of the next higher
directory. Thus if the directory was PS:<ANOTHER.BAD-ONE>, you type:
$DELETE (FILE) PS:<ANOTHER>BAD-ONE.DIRECTORY,
$$DIRECTORY (AND "FORGET" FILE SPACE)
$$
<ANOTHER>BAD-ONE.DIRECTORY.1 [OK]
$
The above procedure tells the monitor to treat the directory file
like a normal file, and to delete it as such. This means that any
files in the directory will become "lost". The disk pages can be
recovered later with CHECKD. If the above works, you can simply
recreate the directory and restore the files.
The only reason the above command should fail is if the directory
is still mapped. For PS:<SUBSYS>, you can bring up the system
stand-alone so that no programs are run from it, and then delete it.
For PS:<SYSTEM>, even taking the system stand-alone will not help, for
it is always mapped by job 0. But there are two procedures you can
use which do work.
The safest method can be used if the user's system has mountable
structures. If you have built another PS: structure, you can mount
the pack with the bad directory as an alias, and then the directory
will not be mapped and can be deleted. As an example:
$SMOUNT (FILE STRUCTURE) SICK:,
$$STRUCTURE-ID (IS) PS:
$$
WAITING FOR STRUCTURE SICK: TO BE PUT ON LINE...
STRUCTURE SICK: MOUNTED
$
$DELETE (FILES) SICK:<ROOT-DIRECTORY>SYSTEM.DIRECTORY,
$$DIRECTORY (AND "FORGET" FILE SPACE)
$$
SICK:<ROOT-DIRECTORY>SYSTEM.DIRECTORY.1 [OK]
$
Then you can build the new directory, restore the files to it,
and then use it again for your normal PS: pack. Be sure to build the
new directory with the same number. This is especially important for
the special system directories.
If you do not have another disk drive or another PS: disk, or if
you don't want to bother SMOUNTing the disk, you can fix the <SYSTEM>
area by using MDDT. The basic idea is to patch the monitor so that it
no longer thinks that the directory is in use. This is done as
follows:
$^EQUIT
INTERRUPT AT 17117
MX>/MDDT
CHKOFN/ JSP CX,.SAVE JRST RSKP
MRETN$G
$
Then you should have no problems deleting the directory.
Immediately after doing the delete, you should reload the system.
When the system restarts, you can read the monitor and the EXEC either
from the distribution magtape or from another directory where you had
kept copies. Then recreate the <SYSTEM> area, making sure to give it
the same directory number as it had before. Then you can restore the
files and let the users back on. Finally, you should run CHECKD
sometime to recover the lost pages.
MORE ABOUT DIRECTORY PROBLEMS
=============================
SOME HINTS FOR TRACING DIRECTORY PROBLEMS
NOTE -- Use the methods documented in the Operators
Guide before resorting to the methods below.
1. There is a file on the SWSKIT called DIRTST.EXE which will
test for inconsistencies in the directory pointers.
@ENABLE
$RU DIRTST
This will tell you just about everything.
2. Another program on the SWSKIT is DIRPNT, which prints out the
contents of the chained FDBs, the entire directory, an FDB, or the
symbol table.
To run it:
@ENABLE
$RU DIRPNT
And answer the questions. This also may not work if the
headers are bad.
3. If you get a BUGCHK:
Go into the monitor with MDDT and set a breakpoint at the
BUGCHK address, say, FDBBAD. Do the functions that cause the
BUGCHK; DIR, say. Trace down the bug. The relevant
listings are PROLOG and DIRECT. These give the directory
format and useful symbols.
4. If the pointers are destroyed or confused you can map in the
directory as follows:
@ENA
$^EQUIT ; get into MINI-EXEC
MX>/ ; get into MDDT
; Map in directory, put dir number in 1. Get dir
; number from DLUSER or TRANSL. Format is
; [4,directory#]. Put the structure number in AC2.
; To find the structure number look at the table
; STRTAB. STRTAB contains a list of pointers to the
; SDBs of structures that are mounted. The structure
; numbers are equal to the offset into the STRTAB. To
; find out which structure has structure number
; 3 look at STRTAB+3. It contains the address of an
; SDB, whose first word is the SIXBIT structure name.
STRTAB/ 54321 ; str number 0
STRTAB+1/ 56776 ; str no 1
STRTAB+2/ 12345 ; str no 2
12345$6T/ FOO ; str no 2 is FOO:
1/ DIRECTORY NUMBER
2/ STR NUMBER
CALL MAPDIR$X
; Now you can look at the header pointers etc., and
; fix things up if you're lucky. Go back to the
; MINI-EXEC.
^P
MX>START
$
5. If you can't (or don't want to) recover the existing files
you can delete the directory and restore the files using a
DUMPER tape. This works for <SYSTEM> and all other
directories.
In order to delete a directory you must remove it from
<ROOT-DIRECTORY> (or next higher-level directory). You can do this
with the following set of commands:
(first be sure nothing is mapped from this
directory)
@ENA
$DELETE<ROOT-DIRECTORY>DIRECTORYNAME.*.,
$$DIRECTORY
$$
Create new directory with the same directory number. The same number
is important for the special system directories.
$^ECREATE <DIRECTORYNAME>
[New]
$$NUMBER nn
.
.
.
Now DUMPER the files back.
JSB AND PSB MAPPING
An Easy Way to Examine the PSB and JSB of Another Job
-----------------------------------------------------
There is an occasional need to look at the state in detail of
another job on the system. A common reason for doing this is to find
the cause and cure of a "hung job" which cannot be logged out. To
find out what the job is doing you usually start by looking at the
JSYS stack in the PSB. But you cannot examine such data easily
because the fork data in the PSB and the job data in the JSB are not
in the monitor's address space until the fork is run. If you try to
look at the PSB or JSB using MDDT you will see the data for your own
fork. To look at the data for another fork you must do what the
monitor does, and that is to map it.
A procedure for doing the mapping of a PSB or JSB was given in
the release 3 and 3A SWSKITs. You first find the SPT index of the PSB
or JSB you want to map, then you call SETMPG or MSETMP to set up
pointers to the data, and then you can examine it. But there are
several problems in using that method, which are:
1. You have to find an empty set of pages in the monitor's
address space which can be used for mapping.
2. There is not enough room to map all of the PSB and JSB. So
if you want to examine many different things you have to do
the mapping many times.
3. The routines SETMPG and MSETMP do no validity checking of
their arguments. Thus if you feed them bad data the system
will probably crash. So if you need to map things many times,
the chances are that you will make a mistake once too often.
4. The addresses of the data are not correct. To look at PPC
for example, you can't just examine location PPC (which would
be for your own fork). You have to look in the page you are
using for mapping. So every reference has to be offset by
some constant.
5. When you are done looking at the fork, you can't simply leave
MDDT. You have to call SETMPG or MSETMP again to unmap the
data.
Since that documentation was written I have found a procedure
which is much easier. It eliminates almost all of the above problems.
The procedure is this:
1. Do a "GET" of the file the monitor was loaded from, usually
SYSTEM:MONITR.EXE.
2. Enter user mode DDT in the file you got, and then do a JSYS
777 to get into MDDT.
3. Find out the SPT indexes as before, and call MSETMP to map
the PSB or JSB to the USER address space, in the correct
place!!
4. Return from MDDT, and examine PSB and JSB locations directly,
and see the correct data in the right place.
5. When you are done, just ^C and do a RESET.
The rest of this document describes, step by step, how the
procedure above is done, using an example. Assume that we wish to
examine the state of fork 105, which belongs to job 21. We then
begin:
@ENABLE !Get a copy of the monitor
$GET PS:<SYSTEM>MONITR.EXE
$START 140 !Get into user DDT
DDT
JSYS 777$X !Enter MDDT
MDDT
!Following is an example of the procedure to map the JSB of a job:
FKJOB+105[ 25,,2035 !Get the SPT index of the JSB
!of fork 105
T1! 2035,,0 !Put SPT index in left half
T2! 540000,,JSBPGA !* Flags and where to map to
T3! JSLSTA'1000-JSBPGA'1000 !Number of pages to map
CALL MSETMP$X !Do the mapping
$
!Following is an example of the procedure to map the PSB of a fork:
FKPGS+105[ 2657,,2332 !Get the SPT index of the PSB
!of fork 105
T1! 2332,,PSBMAP-PSBPGA !Put SPT index in left half,
!and offset in right half
T2! 540000,,PSSPSA !* Flags and where to map to
T3! PSBMSZ !Number of pages to map
CALL MSETMP$X !Do the mapping
$
!Example of returning to user mode and looking at data from both
!the PSB and the JSB of the fork:
MRETN$G !Return to user mode
$
USRNAM[ 3 !Examine job's user name
USRNAM+1[ 422050,,546230 $T;DBELL
CTRLTT[ 777777,,777777 !Controlling terminal
FILBYT+MLJFN[ 4400,,334010 !Start of data block for JFN 1
PPC/ T1,,DISXE#+2 !Current PC of the fork
PAC+17/ -215,,UPDL+62 !Current stack pointer
UPDL/ CHKHO5# !First few stack locations
UPDL+1/ CAM CHKAE0#+12
UPDL+2/ CHKHO5#
UPDL+3/ CAM CHKAE0#+12
UPDL+4/ T1,,.COMND+1
UPDL+5/ -273,,UPDL+4
!Example of terminating the mapping we have done:
^C
$RESET !To finish, just quit and reset
$
The procedure as given above maps the JSB and PSB write-enabled.
So if you find something you want to change, you can simply deposit
the new value into the location. If you want the data to be
write-protected, then change the 540000 to 500000 in the two steps
marked with an asterisk.
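The halfword juggling in the starred steps can be checked with a few
lines of arithmetic. The sketch below is illustrative Python with
invented helper names. It assumes that the left half of the FKJOB entry
is the owning job number, which is consistent with the example (25
octal is job 21 decimal) but is not stated in the typescript.

```python
HALF = 0o777777  # mask for one 18-bit halfword


def pack(left, right):
    """Build a 36-bit word from its left and right halves."""
    return ((left & HALF) << 18) | (right & HALF)


def halves(word):
    """Split a 36-bit word into (left half, right half)."""
    return (word >> 18) & HALF, word & HALF


fkjob_entry = pack(0o25, 0o2035)   # typeout of FKJOB+105 in the example
job, jsb_spt = halves(fkjob_entry) # right half is the SPT index of the JSB

t1 = pack(jsb_spt, 0)              # SPT index goes in the left half of T1

WRITE_ENABLED = 0o540000           # left-half flags used in the example
WRITE_PROTECTED = 0o500000         # alternative flags for read-only mapping
difference = WRITE_ENABLED ^ WRITE_PROTECTED

print("job %d (decimal), T1 = %o,,%o" % (job, halves(t1)[0], halves(t1)[1]))
print("the two flag settings differ only in bit %06o" % difference)
```

The two flag values differ in a single bit, 040000 in the left half,
which is why changing 540000 to 500000 is enough to write-protect the
mapping.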
WARNING: The procedure of mapping things into your user address
space has its limitations. Mapping the JSB and PSB works because the
user core used for mapping was previously empty. In general, you can
only map things into your user core if your core pages are either
nonexistent or private. If you call MSETMP or SETMPG and map
something over a shared page, the old file page is unmapped without
the share counts being updated, which prevents your job from logging
out later. To get around this problem you can BLT your core image to
force all of the pages to be private.
BREAKPOINTING MULTI-USER CODE
HOW TO USE BREAKPOINTS IN CODE THAT MANY USERS EXECUTE
------------------------------------------------------
When inserting a breakpoint into the running monitor, you have to
be careful that no other users will execute the code containing the
breakpoint. If some other user hits the breakpoint, they will blow up
with an illegal instruction since MDDT will not be there to handle the
breakpoint. This normally limits the places you can set breakpoints,
since most of the monitor can be gotten to by any user. Even if you
run the system stand-alone, it is possible that the routine you are
debugging will be called by job 0. However, it is still possible to
do such debugging, even on a system which is not stand-alone, and this
document will describe how this is done.
The essential element of this technique is to put in the patch in
such a way that only your own fork can ever reach the breakpoint.
First you write a simple routine which will skip if it is not being
run by your particular fork. This can be done easily if you remember
that the location FORKX contains the currently running fork number.
An example of such a routine is the following:
@ENABLE
$SDDT
DDT
JSYS 777$X
MDDT
FORKX[ 23 ; check our fork number
FFF/ 0 NOTME: PUSH P,T1 ; save an AC
NOTME+1/ 0 MOVE T1,FORKX ; get currently running fork number
NOTME+2/ 0 CAIE T1,23 ; is it us=23?
NOTME+3/ 0 AOS -1(P) ; no, setup skip return
NOTME+4/ 0 POP P,T1 ; restore the saved AC
NOTME+5/ 0 POPJ P, ; and return to caller
NOTME+6/ 0 FFF: ; reset the position of FFF
The routine above simply saves AC T1, gets the currently running fork
number, compares it with your own fork number which you obtained by
looking at location FORKX, and skips if they differ.
Now assume that you want to set a breakpoint into the following
code, which is in the routine BLKSCN in the module DIRECT.
BLKSC2/ HLRZ C,BLKTAB(B)
BLKSC2+1/ CAME A,C
BLKSC2+2/ AOBJN B,BLKSC2
BLKSC2+3/ JUMPGE B,BLKSCE
BLKSC2+4/ HRRZ B,BLKTAB(B)
Assume you want the breakpoint at location BLKSC2+3. You do the
following:
BLKSC2+3/ JUMPGE B,BLKSCE FFF$< ; patch this location
FFF/ 0 PUSHJ P,NOTME ; call the NOTME routine
FFF+1/ 0 .$B JFCL$> ; me if it gets here, set breakpoint
FFF+2/ JUMPGE B,BLKSCE
FFF+3/ JUMPA A,BLKSC2+4
FFF+4/ JUMPA B,BLKSC2+5
BLKSC2+3/ JUMPA NOTME+6
Notice that the breakpoint has been set in the JFCL instruction
following the call to NOTME. Only your fork will execute it, so you
can now debug the section of code while other users are executing it
at the same time. Remember to remove the breakpoint when you are
done.
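The control flow of the patch can be modeled in a few lines. The
following Python sketch is only a model of the skip-return logic, with
invented function names; the fork number 23 is the one from the example
and the trace strings stand in for the real instructions.

```python
MY_FORK = 0o23  # our own fork number, as read from FORKX in the example


def notme(current_fork, my_fork=MY_FORK):
    """Model of NOTME: True means 'take the skip return'."""
    return current_fork != my_fork


def patched_location(current_fork):
    """Model of what a fork executes at the patched BLKSC2+3."""
    trace = []
    if not notme(current_fork):
        # Non-skip return: only our fork reaches the JFCL with the breakpoint.
        trace.append("breakpoint at JFCL")
    # The original instruction is executed by every fork.
    trace.append("JUMPGE B,BLKSCE")
    return trace


print(patched_location(0o23))  # our fork
print(patched_location(0o17))  # any other fork
```

Every fork still executes the original instruction, so other users see
no change in behavior apart from the few extra instructions of NOTME.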
To run a particular program while having breakpoints set, you
must remember that the breakpoint has to be set by the same process
which you expect to hit it. So for example, typing ^EQUIT, setting a
breakpoint, returning to the EXEC and running your program will not
work. You must enter MDDT and set the breakpoints from your program
you want to debug. As an example:
@ENABLE
$GET PROGRAM ; get the program to be used
$DDT ; enter DDT
DDT
JSYS 777$X ; and enter MDDT from there
MDDT
(PUT IN "NOTME" ROUTINE AND SET BREAKPOINTS HERE)
MRETN$G ; return to the context of the test program
$
$G ; start the test program
Using Address Break to Debug the Monitor
----------------------------------------
Sometimes when examining a set of dumps, you will notice the crashes
are caused by some location being destroyed. If you have no idea
where the destruction is done from, finding the problem could be very
difficult. One useful procedure in such cases is to use the address
break feature of the hardware to track down the problem (except for
2020s!). The only problem is that the use of address break is not
obvious. This section describes how to use address break in the
TOPS-20 monitor.
In order to use address break, four things must be done. First,
the current routines the monitor uses to set address breaks for users
must be disabled. Secondly, your own address break must be set from
MDDT or EDDT. Thirdly, instructions which you want to execute
properly have to be modified so that they will not cause an unwanted
address break. Finally, breakpoints must be placed in the monitor so
that the state of the monitor can be examined when the address break
occurs. The following is a step by step example of doing this.
1. Load the monitor for debugging, and enter EDDT. The procedure
starting from BOOT is the following:
BOOT>/L ;Load monitor but don't start it
BOOT>/G140 ;Start EDDT
EDDT
DBUGSW/ 0 2 ;Set debugging mode
EDDTF/ 0 1 ;Keep EDDT once system starts
GOTSWM$B ;Install useful breakpoint
SYSGO1$G ;Start the monitor
[PS MOUNTED]
$1B>>GOTSWM 0$1B ;Remove breakpoint now
2. Disable the monitor's normal changing of the address break.
This is currently done at two places:
KISSAV+4/ DATAO UNPFG1+26 JFCL ;Disable instruction
SETBRK+12/ DATAO A JFCL ;Here too
3. Set your own address break at the desired location. Refer to
the Hardware Reference Manual for details. The instruction to
set an address break is:
DATAO APR,ADDR ;Note: APR = 0
where ADDR contains the following fields:
Bits Description
---- -----------
9 Break at given address on instruction fetches
10 Break at given address on reads
11 Break at given address on writes
12 0=exec address space, 1=user address space
13-35 Address to break on.
So now assume you want to catch a bug which is blasting
location CURDS. You want to break only for writes, and want
to use exec virtual space. Therefore you type the following:
FFF/ 0 100000000+CURDS ;Put data in convenient place
DATAO APR,FFF$X ;Set the address break
4. Now you want to disable address break for all instructions
which you expect to change the given location. Assume in this
example that only location DIDDLE should change location
CURDS. Then you do the following for a model B CPU:
FFF! IT: ;Define location to get old flags
IT+1! ;Old PC
IT+2! ;New flags
IT+3! IT+4 ;New PC
IT+4! EXCH IT ;Save AC and get old flags
IT+5! TLO 1000 ;Set address break inhibit bit
IT+6! EXCH IT ;Restore flags and AC
IT+7! JRST 5,IT ;Return to caller
IT+10! FFF: ;Redefine FFF
DIDDLE/ MOVEM A,CURDS FFF$< ;Insert patch
FFF/ 0 JRST 7,IT$> ;Call above routine
FFF+1/ 0 MOVEM A,CURDS ;Typed by DDT when finishing patch
FFF+2/ 0 JUMPA A,DIDDLE+1
FFF+3/ 0 JUMPA B,DIDDLE+2
DIDDLE/ MOVEM A,CURDS JUMPA IT+10
The JRST 7,IT instruction is used to save the old PC at IT and
IT+1, and take a new PC from IT+2 and IT+3. There the old PC
is changed to include the address break inhibit bit. Then a
JRST 5,IT is done which returns to the caller. The next
instruction then executes without causing an address break.
You have to insert the JRST 7,IT instruction at every
instruction you want to succeed.
For model A CPUs the procedure is similar, but a little easier:
FFF! IT: ;Define location to hold PC
IT+1! EXCH IT ;Get old PC and save AC
IT+2! TLO 1000 ;Set address break inhibit flag
IT+3! EXCH IT ;Restore PC and AC
IT+4! JRSTF @IT ;Return to caller
IT+5! FFF: ;Redefine FFF
DIDDLE/ MOVEM A,CURDS FFF$< ;Insert patch
FFF/ 0 JSR IT$> ;Call above routine
FFF+1/ 0 MOVEM A,CURDS ;Typed by DDT when finishing patch
FFF+2/ 0 JUMPA A,DIDDLE+1
FFF+3/ 0 JUMPA B,DIDDLE+2
DIDDLE/ MOVEM A,CURDS JUMPA IT+5
5. Now put the breakpoints into the monitor so that when an
address break occurs, you will get into EDDT. There are two
locations to patch, one for PI level and one for non-PI level.
You also have to patch a monitor bug in release 3 and 3A so
that the page fail dispatch code works properly.
ADRCMP$B ;Set breakpoint at non-PI routine
PFCD23$B ;Set breakpoint at PI routine
PIPTRP+1/ MOVE A,TRAPSW MOVE A,TRAPS0 ;And fix a bug
$P ;Now let the monitor proceed
6. When either of the above breakpoints is hit, the flags and PC
of the instruction which caused the address break will be in
locations TRAPFL and TRAPPC. If the address break was from
JSYS level (breakpoint was to ADRCMP and location INSKED is
zero) then an $P will proceed properly. If the address break
was from the scheduler or from PI level, doing $P will be
useless since the monitor will then BUGHLT because it doesn't
want to see an address break under these conditions. However,
this is ok if all you want to do is find the instruction
causing the trashing.
If the location still gets trashed after trying to catch it this
way, either your procedure is wrong; you are trying this on a 2020
(which has no address break feature); the location is being changed
by some IO being done (RH20s, DTEs, etc); or else the machine is
having some hardware problems.
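If the procedure itself is in doubt, the address break word typed in
at step 3 can be double-checked with a short calculation. The sketch
below is illustrative Python, not part of any DEC tool; the constant
names and the value used for CURDS are hypothetical. It uses the PDP-10
convention that bit 0 is the most significant bit of the 36-bit word.

```python
def pdp10_bit(n):
    """Return the mask for bit n, where bit 0 is the leftmost of 36 bits."""
    return 1 << (35 - n)


BREAK_ON_FETCH = pdp10_bit(9)    # break on instruction fetches
BREAK_ON_READ = pdp10_bit(10)    # break on reads
BREAK_ON_WRITE = pdp10_bit(11)   # break on writes
USER_SPACE = pdp10_bit(12)       # 0 = exec address space, 1 = user
ADDRESS_MASK = (1 << 23) - 1     # bits 13-35 hold the address


def address_break_word(address, fetch=False, read=False, write=False, user=False):
    """Build the word that DATAO APR, takes its break conditions from."""
    if not 0 <= address <= ADDRESS_MASK:
        raise ValueError("address does not fit in bits 13-35")
    word = address
    if fetch:
        word |= BREAK_ON_FETCH
    if read:
        word |= BREAK_ON_READ
    if write:
        word |= BREAK_ON_WRITE
    if user:
        word |= USER_SPACE
    return word


CURDS = 0o123456  # hypothetical address; use the real value of the symbol
word = address_break_word(CURDS, write=True)
print("write break in exec space: %o" % word)
```

For a write-only break in exec virtual space the result is
100000000+CURDS, which matches the value deposited at FFF in step 3.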
RECOVERING FROM SYSTEM DISASTERS
--------------------------------
There are some common system disasters which in many cases can be
recovered from quickly and with a minimum of effort. The four we will
discuss in this article are:
1. Hung Terminals
2. Hung SETSPD
3. Trashed Disks
4. Hung Jobs
1.0 HUNG TERMINALS
Hung terminals are usually the result of one of two problems. Either
the speed has been set incorrectly for that terminal type or a problem
exists between the KL and the front end. If the problem is a result
of an improper speed setting, then simply resetting the speed will be
sufficient. On the other hand, if the problem is due to some sync
problem between the KL and the 11 then the easiest way to recover from
this is to reload the front end. This can be done by depressing the
halt switch on the operator's console of the 11 and then placing it
back in the enable state. After about fifteen seconds, the message
[DECsystem-20 continued]
should be printed on the CTY. If this fails to free the terminal, perhaps
the problem is a hung job. See the discussion under that heading.
2.0 HUNG SETSPD
This is a fairly common problem brought on by some hardware
problem. It is possible to bring the system up without running SETSPD
under JOB 0, logging in, and then trying to run SETSPD under some
other operator job. If SETSPD then hangs, it is possible to CONTROL/C
out of the program, edit 4-CONFIG.CMD to remove the commands suspected
of hanging SETSPD, and retry. In this way, while waiting for the
problem to be resolved, it is possible to continue timesharing.
To bring the system up without running SETSPD automatically, one
need only install the following patch to the MONITOR using EDDT on
system start up.
BOOT>/l
BOOT>/g141
EDDT
EDDTF[ 0 -1
DBUGSW[ 0 2
GOTSWM$B
SYSGO1$G
[PS MOUNTED]
1B>>GOTSWM
RUNDD3+7/ PUSHJ P,RUNDII JFCL
0$1B
$P
%%No SETSPD
The system will then come up as usual except that SYSJOB will not
run. After the problem with SETSPD has been resolved, SYSJOB can
be run by typing
COPY (FROM) <SYSTEM>SYSJOB.RUN (TO) <SYSTEM>SYSJOB.COMMANDS
This will cause all the commands in the SYSJOB.RUN file to be
executed by SYSJOB.
There is a project under way to allow SETSPD to time itself out
and continue with the next command in 4-CONFIG.CMD. Look for it in the
Large Buffer or the 20 Dispatch.
3.0 TRASHED DISKS
This is surely one of the biggest headaches facing the Specialist.
Trashed disks come in many forms and recovering from these requires a
good knowledge of the structure of the TOPS-20 file system.
If the structure cannot be mounted, it is because of one of the
following reasons:
1. Inconsistency in either of the HOM blocks
1. Word HOMNAM (1) of either HOM block not SIXBIT/HOM/
2. Word HOMCOD (176) of either HOM block not 707070
3. Word HOMHOM (5) of first HOM block not 1,,12
4. Word HOMHOM (5) of second HOM block not 12,,1
5. Word HOMFSN (173) of either HOM block not 20040,,47524
6. Word HOMFSN+1 (174) of either HOM block not 51520,,31055
7. Word HOMFSN+2 (175) of either HOM block not 20060,,20040
8. Right half of word HOMLUN (4) of either home block either
refers to a unit greater than the left half of word
HOMLUN or it refers to a UNIT already verified
9. Word HOMSNM (3) of either home block does not agree with
SIXBIT/STRUCTURE-NAME/
10. No disk address for index block in word HOMRXB (10) of
either HOM blocks
2. Inconsistencies in Root-Directory page 0
1. Directory number in Directory page 0 of Root-Directory
not 1
2. Directory block type (DRTYP) of Root-Directory page 0 not
400300
3. Relative Page number (DRRPN) of Root-Directory page 0 not
0
4. Top of symbol table (DRSTP) of Root-Directory page 0 out
of Directory bounds
5. Pointer to first free block (DRFFB) of Root-Directory
page 0 not in page 0 of the directory
6. Pointer to Directory Name String (DRNAM) not under start
of symbol table
7. Directory name pointer (DRNAM) not 0 and Name string
block length (NMLEN) not at least 2 words long
8. Directory name pointer (DRNAM) not 0 and directory name
block header (NMTYP) not 400001
9. Password block pointer not 0 and password string block
length (NMLEN) not at least 2 words long
10. Password block pointer not 0 and password string block
header (NMTYP) not 400001
11. Account string block pointer not 0 and Account string
block length (NMLEN) not at least 2 words long
12. Account string block pointer not 0 and Account string
block header (NMTYP) not 400001
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 40
TRASHED DISKS
3. Inconsistencies in Block types or free space in subsequent
pages of the directory.
All blocks in the directory (including free space) begin
with a block header which specifies type and length.
Immediatly following one block should be a header for a new
block. If this scheme is corrupted, the mount will fail.
1. Header of a block not
1. (NAMTYP) 400001
2. (EXTTYP) 400002
3. (ACCTYP) 400003
4. (USRTYP) 400004
5. (FDBTYP) 400100
6. (DIRTYP) 400300
7. (FRETYP) 400500
8. (FBTTYP) 400600
9. (GRPTYP) 400700
2. Header of a block is NAMTYP and Block length not at least
2 words
3. Header of a block is EXTTYP and block length not at least
2 words
4. Header of a block is ACCTYP and block length not at least
3 words
5. Header of a block is USRTYP and block length not at least
3 words
6. Header of a block is FDBTYP and
1. Block length not at least 30 (.FBLN0) words long
2. Pointer to Author String (.FBAUT) not 0 and points to
a block outside of the directory or points to a block
that does not meet the tests for a user name string
as described above.
3. Pointer to Last Writer String (.FBLWR) not 0 and
points to a block outside of the directory or points
to a block that does not meet the tests for a user
name string block as described above.
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 41
TRASHED DISKS
4. Pointer to Account String (.FBACT) is not less than
or equal to zero and it points to a block outside of
the directory or it points to a block that does not
meet the tests for an account string block as
described above.
5. Pointer to Name String (.FBNAM) is not 0 and it
points to a block outside of the directory or it
points to a block that does not meet the tests for a
Name String Block as described above.
6. Pointer to Extension String (.FBEXT) is not 0 and it
points to a block outside of the directory or it
points to a block that does not meet the tests for an
Extension String Block as described above.
7. Header of a block is DIRTYP and
1. Header is not on a page boundary
2. Relative page number (DRRPN) not the calculated page
number
3. Pointer to first free block (DRFFB) does not point to
a location within the current directory page
4. Directory number (DRNUM) not 1.
8. Header of a block is FRETYP and block is not at least two
words or Pointer to next free block (FRNFB) is not zero
and points to a location not on the same page as current
9. Last block did not end at DRFTP (address specified on
first page of directory)
4. BAT blocks inconsistent.
1. Either block does not contain SIXBIT/BAT/ in BATNAM
(offset 0 in block)
2. Either block does not contain 606060 in BATCOD (offset
176 in block)
3. Sector number of the BAT block (BATBLK) not the true
sector of block
4. The BAT blocks to not compare exactly with each other
through word 176 of the blocks
TOPS-20 TROUBLE-SHOOTING HANDBOOK Page 42
TRASHED DISKS
5. Checksum of the Root-directory Index Block does not agree
with the checksum calculated.
    Checksums are calculated as follows:

        CHKSUM = 0 ;
        For I = 0 to 777
            If XB(I) = 0 then
                CHKSUM = CHKSUM + I
            Else
                CHKSUM = CHKSUM + XB(I) ;

    where XB is the first word of the index block.
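
The checksum rule above can be sketched in Python as follows. This is
an illustration only: the function name is hypothetical, the index
block is assumed to be supplied as a list of 1000 (octal) integers,
and any 36-bit overflow handling done by the monitor is not described
here and is therefore not modelled.

```python
def index_block_checksum(xb):
    """Sum an index block the way the pseudocode above describes.

    xb is a list of 0o1000 words.  A zero word contributes its own
    offset; a non-zero word contributes its contents.  Overflow
    handling is not modelled (an assumption of this sketch).
    """
    chksum = 0
    for i, word in enumerate(xb):
        chksum += i if word == 0 else word
    return chksum


if __name__ == "__main__":
    empty_block = [0] * 0o1000
    print(oct(index_block_checksum(empty_block)))
```

Note that an all-zero index block does not produce a zero checksum,
because each empty slot adds its own offset.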
As you can see, there are many things that could be wrong with a
structure that prevent it from being mounted. The consistency of the
structure can be checked quite easily using the new FILDDT commands
STRUCTURE and DRIVE (see 'NEW DISK FEATURES FOR FILDDT', also in this
SWSKIT).
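
The block-header tests listed under reason 3 can be sketched as a walk
over one directory page. This is only a sketch under stated
assumptions: the header is assumed to hold the block type in the left
half and the length in the right half (as the word 400300,,100 in the
FILDDT example later in this SWSKIT suggests), the function name is
hypothetical, and block types with no stated minimum length are given
a minimum of 1 so that the walk can advance.

```python
# Block types and minimum lengths taken from the list above.
BLOCK_TYPES = {
    0o400001: ("NAMTYP", 2),
    0o400002: ("EXTTYP", 2),
    0o400003: ("ACCTYP", 3),
    0o400004: ("USRTYP", 3),
    0o400100: ("FDBTYP", 0o30),
    0o400300: ("DIRTYP", 1),   # no minimum stated; 1 is an assumption
    0o400500: ("FRETYP", 2),
    0o400600: ("FBTTYP", 1),   # no minimum stated; 1 is an assumption
    0o400700: ("GRPTYP", 1),   # no minimum stated; 1 is an assumption
}


def check_block_chain(words):
    """Walk block headers in a list of 36-bit words; return problems."""
    problems = []
    pos = 0
    while pos < len(words):
        btype = words[pos] >> 18          # assumed: type in left half
        length = words[pos] & 0o777777    # assumed: length in right half
        if btype not in BLOCK_TYPES:
            problems.append("offset %o: unknown block type %o" % (pos, btype))
            break
        name, minimum = BLOCK_TYPES[btype]
        if length < minimum:
            problems.append("offset %o: %s block too short" % (pos, name))
        if length == 0:
            break
        pos += length
    return problems
```

The walk stops at the first unknown header, since once the chain is
broken there is no way to know where the next block starts.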
For structures which are badly trashed, the only sane way of
recovering is to rebuild the structure using a catastrophe tape. For
simple inconsistencies such as a bad BAT block, CHECKD does the job
well. For more involved trashes which cannot be recovered from a
backup tape (because of a forgetful system manager), the above
information can be of great help.
4.0 HUNG JOBS
There are a number of circumstances which can cause a job to
become hung, usually waiting for some resource to free up, some
share count to become zero etc. Sometimes these tests will never
be satisfied, the job has its PSI system turned off, and as a
result the job becomes hung. Freeing it up can be very tricky. The
first thing to try is to log the job out from some other terminal. If
this doesn't succeed in freeing the job up, then the next best thing
is to detach the job from the terminal and allow it to sit there. It
may be using negligible amounts of CPU time and cause no adverse
effects to the system. Zapping the job may crash the system which, in
most cases, is not the desirable approach.
The next time the system is reloaded, be sure to get a dump of the
system with the hung job and submit it as an SPR (see the SWSKIT
article about getting informative Dumps).
LOOKING AT HUNG TAPES
A number of problems of the general classification "tape hang"
have been reported, and will probably always exist as long as we use
magtapes. Although there are apparently several variants of the
problem, there are some things which can be done by a suitably
cautious specialist when presented with a hung tape drive. Listed
below are some techniques which can be used in an attempt to
investigate and perhaps alleviate the problem. These things should,
in general, be harmless to the system, barring mis-typing in MDDT. For
the same reason, they may not clear the problem.
For release 4, there are several tables that are used in relation
to tape drives. Some of these tables are indexed by MT unit number,
some by MTA unit number. In general, it can be said that if a table
name begins with the characters MT, it will be indexed by MTA or
physical unit number, and if the table name begins with TL or TP, it
will be indexed by MT or logical unit number. The TL and TP tables
will usually have something to do with the tape labeling system. This
article concerns itself mainly with the more important tables relating
to MTAs (physical tape units).
When playing with the tape subsystem, certain care should be
taken. For instance, it always helps if no one else is actively using
the tape drives while you attempt something like reloading the
microcode for a DX20.
1. Finding the Tape Drive
There are several parallel tables which concern the ownership of a
tape drive. Those of interest are DEVNAM, DEVCHR, and DEVUNT. At
DEVNAM+n is the device name in SIXBIT. At DEVUNT+n is a word with the
left half set to the assigner's job number, -1 if free, or -2 if being
controlled by the allocator. The right half contains the unit number.
Note that in release 4, with tape allocation turned on, the MTA
entries will always indicate that job 0 has the drive assigned, and
the entry for the corresponding MT unit will contain the job number of
the user. At DEVCHR+n is the device characteristics word. Knowing the
device name or the owning job, one can use DDT to find the table
offset. See example below.
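
As an aid to reading DEVUNT entries, the half-word layout described
above can be sketched in Python. The function name is hypothetical.
The treatment of 400000 in the right half as the pseudo-device flag
(DV%PSD) follows the comments in the example later in this article;
masking it off to obtain the unit number is an assumption of this
sketch.

```python
HALF = 0o777777


def decode_devunt(word):
    """Return (owner, is_pseudo_device, unit) for a 36-bit DEVUNT word."""
    left, right = (word >> 18) & HALF, word & HALF
    if left == 0o777777:        # -1 in an 18-bit half word
        owner = "free"
    elif left == 0o777776:      # -2 in an 18-bit half word
        owner = "allocator"
    else:
        owner = "job %d" % left
    is_pseudo = bool(right & 0o400000)   # DV%PSD, per the example
    unit = right & 0o377777              # assumed: rest is the unit number
    return owner, is_pseudo, unit


if __name__ == "__main__":
    print(decode_devunt(0o000032400000))   # the MT0 entry from the example
```

The job number is reported in decimal, as the EXEC would show it.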
2. Grabbing the Drive
Knowing the offsets into DEVUNT, the device assignment can be
freed by putting -1 into the left half of the appropriate DEVUNT
entry. The drive can then be assigned by the normal ASSIGN command to
the EXEC. In dealing with the allocator for Release 4, your own job
number can be placed here if necessary. The drive, however, will
still be in no state to use. Note that the appropriate DEVUNT entry
would be the one referring to the MT not the MTA.
3. Clearing External Errors
Make sure that there is a tape of some sort mounted, and the
drive is placed on-line. Having a write-enable ring in the tape may
help in being sure the unit is functional if the hung condition is
cleared.
4. Checking the UDB
Next, the Unit Data Block status should be reset. This word can
be found using the MTCUTB table. This table is indexed by MTA unit
number, the left half is the address of the channel data block (CDB),
and the right half contains the address of the UDB. The status word
of the UDB should then be reset to the base state. The right half
should be left alone--it basically contains drive type. The left half
should have only bit 16 set, which indicates a tape type device
(US.TAP). The old contents should be remembered for purposes of later
analysis.
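
The "base state" value can be worked out before depositing anything in
MDDT. The following Python sketch only computes the word; the function
names are hypothetical, and the deposit itself is still done by hand.
It assumes DEC bit numbering, where bit 0 is the leftmost bit of the
36-bit word.

```python
US_TAP = 1 << (35 - 16)   # bit 16: tape type device


def base_udb_status(old_status):
    """Left half becomes US.TAP only; right half is left alone."""
    return US_TAP | (old_status & 0o777777)


def halves(word):
    """Format a 36-bit word as left,,right in octal."""
    return "%o,,%o" % (word >> 18, word & 0o777777)


if __name__ == "__main__":
    old = 0o102000157          # 102,,157 as in the example below
    print(halves(old), "->", halves(base_udb_status(old)))
```

For the UDB word shown in the example, 102,,157, this gives 2,,157:
the write-locked bit is dropped and the drive type is preserved.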
5. Checking the Status
Now, table MTASTS is examined, indexed by MTA unit number again.
Remember the old contents. Then clear the word to zero.
6. Example
@enaBLE (CAPABILITIES)
$sddt
DDT
mddt%$x
MDDT
dvxstn=21 !THIS WILL PROVIDE A HANDY INDEX
!INTO THE MTA OFFSETS IN THE
!DEVxxx TABLES.
!DEVNAM IS A SIXBIT DEVICE NAME
devnam+21/ HLRZM P2,FKBSPW+217(T1) $6t;MTA0
DEVNAM+22/ MTA1
DEVNAM+23/ MTA2
DEVNAM+24/ MTA3
DEVNAM+25/ MTA4
DEVNAM+26/ MTA5
...
...
...
DEVNAM+40/ MTA17
mtan=20 !ROOM FOR 20 (OCTAL) TAPE DRIVES HAS BEEN ALLOCATED
mtindx[ 777765,,5 !BUT ONLY 5 ACTUAL TAPE DRIVES ARE ON THIS SYSTEM
!THE MTs WILL APPEAR AFTER MTAs IN THE DEVxxx
!TABLES SO DVXSTN+MTAN WILL PROVIDE THE OFFSET
!TO THE MT ENTRIES
devnam+41/ HLRZM P1,@0 $6t;MT0
DEVNAM+42/ MT1
DEVNAM+43/ MT2
DEVNAM+44/ MT3
DEVNAM+45/ MT4
DEVNAM+46/ MT5
...
...
...
DEVNAM+60/ MT17
!DEVUNT IS PARALLEL TO DEVNAM AND PROVIDES
!THE OFFSETS INTO THE MTxxxx TABLES FOR MTAs
!AND OFFSETS INTO THE TLxxxx/TPxxxx TABLES
!FOR MTs
devunt+21[ 0 !MTA UNIT ZERO (MTA0: FROM DEVNAM ABOVE) ASSIGNED TO JOB 0
DEVUNT+22[ 1 !JOB 0,,MTA1:
DEVUNT+23[ 2 !JOB 0,,MTA2:
DEVUNT+24[ 3 !JOB 0,,MTA3:
DEVUNT+25[ 4 !JOB 0,,MTA4:
DEVUNT+26[ 5 !JOB 0,,MTA5:
DEVUNT+27[ 777777,,6 !UNASSIGNED,,MTA6:
...
...
...
DEVUNT+40[ 777777,,17 !UNASSIGNED,,MTA17:
!DV%PSD=400000 INDICATES A PSEUDO DEVICE
!THE FOLLOWING ENTRIES FOR MTs WILL INDICATE
!THE AVAILABILITY OF LOGICAL TAPE UNITS
devunt+41[ 32,,400000 !PSEUDO DEVICE MT0: IS ASSIGNED TO
!JOB 32 OCTAL (JOB 26 IN DECIMAL)
DEVUNT+42[ 777776,,400001 !CONTROLLED BY ALLOCATOR,,MT1:
DEVUNT+43[ 777776,,400002 ! " " " ,,MT2:
DEVUNT+44[ 777776,,400003 ! " " " ,,MT3:
...
...
...
DEVUNT+60[ 777776,,400017 ! " " " ,,MT17:
!TLABR0 (INDEXED BY MT NUMBER) WILL INDICATE
!WHICH PHYSICAL TAPE UNIT WILL BE USED WHEN
!REFERENCING AN MT. THIS IS INDICATED BY THE
!PHYSICAL MTA NUMBER IN BITS 2-8.
tlabr0[ 405000,,0 !BIT 0 INDICATES A VALID VOLUME IS MOUNTED ON MTA5
mtcutb+5[ 730437,,730625 !CDB,,UDB FOR MTA5 BEING USED BY JOB 26
!WHO KNOWS IT AS MT0 (SEE ABOVE)
730625[ 102,,157 !FIRST WORD OF UDB FOR MTA5
!US.WLK=1B11 >> WRITE LOCKED
!US.TAP=1B16 >> TAPE TYPE DEVICE
!.UTT70=17B35 >> TU70
mtasts+5[ 0 !THIS EXAMPLE INDICATES A TAPE DRIVE THAT PROBABLY
!HASN'T BEEN REFERENCED BY THE USER YET
mretn$g !TO RETURN TO SDDT FROM MDDT
<>
^Z !TO RETURN TO THE EXEC FROM SDDT
$
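
The TLABR0 entry in the example can be decoded mechanically. The
sketch below is illustrative only; the function name is hypothetical,
and the field positions (bit 0 for a valid volume, bits 2-8 for the
physical MTA number) are taken from the comments in the example.

```python
def tlabr0_fields(word):
    """Return (volume_valid, mta_unit) from a 36-bit TLABR0 entry.

    Bit 0 is the leftmost bit of the word (DEC numbering).
    """
    volume_valid = bool((word >> 35) & 1)     # bit 0
    mta_unit = (word >> (35 - 8)) & 0o177     # bits 2-8
    return volume_valid, mta_unit


if __name__ == "__main__":
    print(tlabr0_fields(0o405000000000))      # 405000,,0 from the example
```

For 405000,,0 this reports a valid volume on physical unit 5, which
matches the MTA5 noted in the example.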
If clearing MTASTS and UDBSTS for the drive doesn't seem to clear
the problem, you will probably have to do more digging around to find
some other, more obscure, inconsistency in the MTA/MT tables. This
can be accomplished by referring to the monitor tables (which,
hopefully, have been included with the SWSKIT) under MTA-STORAGE-AREA.
As always, extreme caution should be exercised while fooling around in
MDDT as you can accidentally trash some random location in the monitor
just by hitting a carriage return at the wrong time.
One last note should be made about the monitor tables here. The
description of the DEVUNT table would lead one to believe that the
right half will contain a -2 if the device is under control of the
allocator. In fact, when the device is under control of the allocator,
the -2 appears in the left half.
A LOOK AT SOME OF THE DISK STUFF
================================
This article is an introduction to the PHYPAR module. PHYPAR is where
the information may be reliably obtained, and it should serve as the
ultimate reference for these problems.
Much of the system debugging you will have to deal with will
involve the DEC-20 hardware. There always seems to be a large gap
between what the diagnostics can tolerate and what the monitor can
tolerate in the way of malfunctioning hardware. The monitor will not
always point you to the real disk or magtape problem, say, but will
crash a few minutes after something has gone wrong somewhere.
Most of the hardware problems that we have had to deal with that were
really difficult to track down and point the Field Service rep. to
were problems with disk hardware. The following is information which
you can use to help Field Service trace down problems which are not
reported in the diagnostics. In most cases the Field Service rep
knows what all the status bits etc. mean but has not been able to
find them in the monitor crashes or running monitor.
CHNTAB:
CHNTAB is an ordered list of Channel Data Block
addresses starting with channel 0. RH20-0 data block
address is in the first word etc.
CDB:
CDB is the Channel Data Block. There is one CDB per
channel. The CDB contains channel dependent
instructions and data, and pointers to the unit data block
(UDB) in the case of RP04, RP05, and RP06's. In the
case of TU45's the pointer is to the Kontroller Data
Block (TM02's) which point in turn to the UDBs. The
CDB also contains information about the currently
active unit. When the channel interrupts, control
passes (via a JSP) to CDBINT. The CDB address is
stored in AC1, P1 and the principal analysis routine,
PHYINT, is called.
NOTE: The CDBs are referenced in modules PHYSIO, PHYH2 (RH20
code), PHYM2 (TM02 code) and PHYP4 (RP04, 05, 06
code). The Channel Data Block is defined in the
module PHYPAR. The address that you get in CHNTAB is
really a pointer to word0 which contains the status
bits for this controller (CDBSTS). Look in PHYPAR for
the table definition. Some words of interest are:
CDBaddress + CDBSTS: status and configuration information
CDBaddress + CDBUDB: 8 word table of UDB (or KDB) addresses.
The status bits which are also defined in PHYPAR are
listed here for your convenience:
CS.OFL==1B0 ; offline
CS.AC1==1B1 ; primary command active
CS.AC2==1B2 ; secondary command active
CS.ACT==CS.AC1!CS.AC2 ; any active
CS.MAI==1B3 ; channel is in maintenance mode
CS.MRQ==1B4 ; maintenance mode requested for unit
CS.ERC==1B5 ; error recovery in progress
CS.STK==1B6 ; channel supports command stacking
CS.ACL==1B7 ; alternate command list is current
BITs 30-32 ; PIA field
BITs 33-35 ; channel type field
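
When looking at a CDBSTS word in a dump, the bits above can be picked
apart as in the following Python sketch. The function name is
hypothetical; the bit positions are the ones listed above, with bit 0
taken as the leftmost bit of the 36-bit word.

```python
CS_FLAGS = ["CS.OFL", "CS.AC1", "CS.AC2", "CS.MAI",
            "CS.MRQ", "CS.ERC", "CS.STK", "CS.ACL"]   # bits 0-7


def decode_cdbsts(word):
    """Return (flag names, PIA field, channel type field)."""
    flags = [name for bit, name in enumerate(CS_FLAGS)
             if word & (1 << (35 - bit))]
    pia = (word >> 3) & 0o7      # bits 30-32
    chan_type = word & 0o7       # bits 33-35
    return flags, pia, chan_type


if __name__ == "__main__":
    sample = (1 << 35) | (1 << 29) | (0o5 << 3) | 0o1   # made-up value
    print(decode_cdbsts(sample))
```

The sample value is made up for illustration and does not come from a
real crash.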
KDB:
Kontroller Data Block (TM02 only) defined in PHYPAR
also. Referenced in PHYM2, PHYPAR, PHYSIO. Words of
interest are:
KDBADDR+KDBSTS: ; flags unit type
KDBADDR+KDBUDB: ; UDB table first word (1 word/UDB)
UDB:
Unit Data Block. There is one UDB per unit associated
with a CDB or KDB. The UDB contains information about
the current activity on the unit in question. The UDB
is defined in PHYPAR as well. Some words of interest
are noted below. Look in the listings for other
information.
UDBADDR + UDBSTS: ; status and configuration info (see below)
UDBADDR + UDBERR: ; error recovery status word
UDBADDR + UDBERP: ; error reporting work area if non 0
UDBADDR + UDBRED: ; reads - sectors if disk, frames if tape
UDBADDR + UDBWRT: ; writes - sectors if disk, frames if MTA
UDBADDR + UDBSRE: ; soft read errors
UDBADDR + UDBSWE: ; soft write errors
UDBADDR + UDBHRE: ; hard read errors
UDBADDR + UDBHWE: ; hard write errors
UDBADDR + UDBPS1: ; current cylinder if disk, cur file if MTA
UDBADDR + UDBPS2: ; current sector within cyl if disk, record
; in file if tape
UDBADDR + UDBSPE: ; soft positioning error
UDBADDR + UDBHPE: ; hard positioning error
; NOTE - there are several other UDB words
; including a device dependent portion
STATUS BITS IN UDBSTS OR FIRST WORD OF UDB:
US.OFS==1B0 ; off line or unsafe
US.CHB==1B1 ; check HOME blocks before any normal I/O
US.POS==1B2 ; positioning in progress
US.ACT==1B3 ; active
US.BAT==1B4 ; on if bad BAT blocks on this unit
US.BLK==1B5 ; lock bit for this unit's BAT blocks
US.PGM==1B6 ; dual port switch in (A or B)
US.MAI==1B7 ; unit is in maintenance mode
US.MRQ==1B8 ; maintenance mode requested on this unit
US.BOT==1B9 ; unit is at BOT
US.REW==1B10 ; unit is rewinding
US.WLK==1B11 ; unit is write locked
US.MAL==1B12 ; maintenance mode allowed on this unit
US.OIR==1B13 ; operator intervention required, set at
; interrupt level, checked periodically.
US.OMS==1B14 ; once a minute message to operator, used in
; conjunction with US.OIR.
US.PRQ==1B15 ; positioning required on this unit
US.TAP==1B16 ; device type tape
US.PSI==1B17 ; tape - online/offline/rewind done transition
BITS 32-35 CONTAIN THE UNIT TYPE CODE; THE FIELD NAME IS USTYP
.UTRP4 = 1 ; RP04
.UTRS4 = 2 ; RS04 (drum)
.UTT16 = 3 ; TU16 (TU45)
.UTTM2 = 4 ; TM02 as a unit
.UTRP5 = 5 ; RP05
.UTRP6 = 6 ; RP06
.UTRP7 = 7 ; RP07
.UTRP8 = 10 ; RP08
.UTRM3 = 11 ; RM03
.UTTM3 = 12 ; TM03 AS A UNIT
.UTT77 = 13 ; TU77
.UTTM7 = 14 ; TM78
.UTT78 = 15 ; TU78
.UTDX2 = 16 ; DX20-A
.UTT70 = 17 ; TU70
.UTT71 = 20 ; TU71
.UTT72 = 21 ; TU72
.UTT73 = 22 ; TU7x
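
A UDBSTS word can be decoded from the two tables above as in the
following Python sketch. The function name is hypothetical. The unit
type is masked to bits 32-35 as stated above; the codes 20 through 22
do not fit in four bits, so they are left out of the table here and
the exact field width should be checked against PHYPAR.

```python
US_FLAGS = ["US.OFS", "US.CHB", "US.POS", "US.ACT", "US.BAT", "US.BLK",
            "US.PGM", "US.MAI", "US.MRQ", "US.BOT", "US.REW", "US.WLK",
            "US.MAL", "US.OIR", "US.OMS", "US.PRQ", "US.TAP", "US.PSI"]

UNIT_TYPES = {
    0o1: "RP04", 0o2: "RS04", 0o3: "TU16", 0o4: "TM02", 0o5: "RP05",
    0o6: "RP06", 0o7: "RP07", 0o10: "RP08", 0o11: "RM03", 0o12: "TM03",
    0o13: "TU77", 0o14: "TM78", 0o15: "TU78", 0o16: "DX20-A",
    0o17: "TU70",
}


def decode_udbsts(word):
    """Return (flag names for bits 0-17, unit type name)."""
    flags = [name for bit, name in enumerate(US_FLAGS)
             if word & (1 << (35 - bit))]
    unit_type = UNIT_TYPES.get(word & 0o17, "unknown")   # bits 32-35
    return flags, unit_type


if __name__ == "__main__":
    print(decode_udbsts(0o102000157))   # 102,,157 from the tape article
```

For the word 102,,157 seen in the hung-tape example, this gives
US.WLK and US.TAP on a TU70, in agreement with the comments there.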
THE PLACES WHERE THINGS ARE ON THE DISK ARE AS FOLLOWS:
BLOCK 0: ; 11 bootstrap
BLOCK 1: ; primary HOME block
BLOCK 2: ; primary BAT block
BLOCKS 3-11: ; reserved
BLOCK 12 ; secondary HOME block
BLOCK 13 ; secondary BAT block
The locations of the disk pages for the above are stored in the
table HOME. HOME is defined in STG. The BAT blocks are defined in
PROLOG and the HOME blocks are defined in DSKALC.
NEW DISK FEATURES FOR FILDDT
The FILDDT to be shipped with release 4 of TOPS-20 will have two new
commands in relation to disk file structure maintenance.
They are:
STRUCTURE (FOR PHYSICAL I/O IS) disk-structure
Examines the specified disk structure.
DRIVE (FOR PHYSICAL I/O IS ON CHANNEL) c (UNIT) u
Examines the specified disk unit.
These are privileged functions, and one must have privileges enabled to
use them.
These two commands are nearly identical. Their difference is in the
way the structure is identified. To use the STRUCTURE command the
structure must be mounted. The STRUCTURE command is useful for
examining a multi-pack structure. The DRIVE command is useful for
examining the file system of a structure which cannot be mounted.
Channel and unit numbers can be found from the programs UNITS, DS,
SYSDPY, or OPR.
Addressing is in the same format as in other forms of DDT.
It is easier to understand exactly what the disk will look like in
FILDDT if you keep in mind that all sectors will be packed in the DDT
address space, without regard for sector size, starting at DDT address
0. For instance, on an RP06 there are four sectors per memory page or
200 (octal) words per sector. Therefore, sector zero of the structure
will begin at FILDDT address 0 and end at memory address 177 (octal).
Sector 1 will begin at address 200 and end at 377. For release 4, all
DEC supported disks contain 200 (octal) words per sector, so a
consistent mapping exists between sector number and FILDDT memory
location. Soon, TOPS-20 will support RP20's. For RP20's, there are
1000 (octal) words per sector (one page per sector). Index block
addresses and most monitor disk addresses are in sectors. That is why
it is important to be able to translate between sector addresses and
FILDDT memory addresses.
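
The translation from sector number to FILDDT address described above
is a single multiplication. A minimal Python sketch, with a
hypothetical function name and the words-per-sector figures taken from
the text:

```python
WORDS_PER_SECTOR = {"RP06": 0o200, "RP20": 0o1000}   # figures from the text


def sector_range(sector, drive="RP06"):
    """Return (first, last) FILDDT addresses occupied by a sector."""
    size = WORDS_PER_SECTOR[drive]
    first = sector * size
    return first, first + size - 1


if __name__ == "__main__":
    for sector in (0, 1):
        first, last = sector_range(sector)
        print("sector %o: %o-%o" % (sector, first, last))
```

On an RP06 this places sector 0 at addresses 0-177 and sector 1 at
200-377, as stated above.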
The FILDDT option of ENABLE PATCHING is also available for use with
the DRIVE and STRUCTURE commands. With this option on, the user is
able to modify specific words on the structure. Another very
convenient FILDDT command one may use in conjunction with the disk
commands is LOAD (symbols from) input file spec. One may specify any
file here but a useful one is SYSTEM:MONITR. The symbol table to the
MONITOR has home block sector addresses, FDB offsets etc. When a
file's symbols are loaded, one may also define his own symbols.
This is useful to remember addresses of data structures on the units.
For example, after finding the index block to a file, one could define
a symbol, FILIDX at that address for easy referencing later on.
When examining a multi-pack structure using the STRUCTURE command,
addressing the first unit is exactly as if there were only one unit in
the structure. FILDDT addresses of sectors on the other units begin
immediately after the last address for the first unit of the
structure. For example, consider that we would like to examine the
BAT blocks for the second unit of a two pack STR: on RP06 drives.
An RP06 contains 304000. sectors per unit and 128. words per sector.
The first FILDDT address for the second unit of a RP06 two pack STR:
is 304000.*128.=38912000. or 224340000 (octal)
FILDDT>STRUCTURE (FOR PHYSICAL I/O IS) PS:
[Looking at file structure PS:]
; starting address of second unit in structure
; plus sector address of BAT blocks (2)
; times number of words per sector gives
; FILDDT address of start of BAT blocks for
; that unit
224340000+2*200=224,,340400
224,,340400[ 424164,,0 $6T; BAT
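
The $6T typeout above shows the word 424164,,0 as SIXBIT BAT. When
searching for such a word by value, the encoding can be reproduced
with the short Python sketch below. The function names are
hypothetical; the sketch uses the standard SIXBIT rule (ASCII code
minus 40 octal, six characters per word, padded with spaces) and does
not validate characters outside the SIXBIT range.

```python
def sixbit(text):
    """Pack up to six characters into a 36-bit SIXBIT word."""
    word = 0
    for ch in text.upper().ljust(6)[:6]:
        word = (word << 6) | (ord(ch) - 0o40)
    return word


def halves(word):
    """Format a 36-bit word as left,,right in octal."""
    return "%o,,%o" % (word >> 18, word & 0o777777)


if __name__ == "__main__":
    for name in ("HOM", "BAT"):
        print(name, halves(sixbit(name)))
```

The same helper gives 505755,,0 for HOM, the value expected in the
first word of a HOM block.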
For another example, let's say we would like to find the start of the
ROOT-DIRECTORY symbol table.
@ENABLE (CAPABILITIES)
$FILDDT
FILDDT>LOAD (SYMBOLS FROM) SYSTEM:MONITR
[22722 symbols loaded from file]
FILDDT>STRUCTURE (FOR PHYSICAL I/O IS) PS:
[Looking at file structure PS:]
NWSEC=200 ; number of words per sector
HM1BLK=1 ; sector number of HOM block
HOMRXB=10 ; offset in HOM block for index
; block to root-directory
; sector number of HOM block
; times words per sector equals
; FILDDT address of start of HOM block
HM1BLK*NWSEC[ 505755,,0 $6T;HOM
HM1BLK*NWSEC+HOMRXB[ 10,,5740 ; plus offset to address of index block
; sector number of index block times
; number of words per sector gives
5740*NWSEC[ 10,,5744 ; FILDDT adr of root-dir index block
; NOTE: Bit 14 (DSKAB) specifies this
; address as a disk sector address.
; sector addresses are bits 15-35
RTDIDX: ; define symbol for index block
; sector number of first page of
; root-directory times number of words
; per sector gives the
5744*NWSEC[ 400300,,100 ; FILDDT adr of first page of ROOT-DIR
RTDIR0: ; define start of page 0 of ROOT-DIR
RTDIR0+3[ 30610 ; plus 3 for start of symbol table
; NOTE: adr is a 'directory address'
; offset 610 of directory page 30
RTDIDX+30[ 10,,6250 ; get sector adr of page 30 of ROOT-DIR
; sector adr of page 30 times words per
; sector gives FILDDT address of page
; 30 of ROOT-DIR.
6250*NWSEC+610[ 400400,,1 ; Add offset for symbol table start
RTDSYM:
^E
FILDDT>EXIT
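
The two address forms used in the session above can be unpacked as in
the following Python sketch. The function names are hypothetical. The
field layout (bit 14 is DSKAB, bits 15-35 hold the sector address) is
taken from the comments in the session; the split of a directory
address into page and offset assumes 1000 (octal) words per page.

```python
DSKAB = 1 << (35 - 14)         # bit 14: word holds a disk sector address
SECTOR_MASK = (1 << 21) - 1    # bits 15-35


def sector_of(word):
    """Return the sector address held in an index block or HOM word."""
    if not word & DSKAB:
        raise ValueError("DSKAB not set; not a disk sector address")
    return word & SECTOR_MASK


def directory_address(addr):
    """Split a directory address into (page, offset within page)."""
    return addr >> 9, addr & 0o777


if __name__ == "__main__":
    page, offset = directory_address(0o30610)
    print("page %o, offset %o" % (page, offset))
    print("sector %o" % sector_of(0o10005740))
```

With the values from the session, 10,,5740 yields sector 5740, and
directory address 30610 yields page 30, offset 610.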
Here are some magic numbers for all DEC supported drives.
DRIVE TYPE   SECTORS/UNIT   STARTING ADR   STARTING ADR
                            OF 2nd UNIT    OF 3rd UNIT
             (in decimal)   (in octal)     (in octal)
__________   ____________   ____________   ____________
RP04-RP05    152000.        112,,160000    224,,340000
RP06         304000.        224,,340000    450,,700000
RP07         502200.        365,,156000    752,,334000
RM03         121360.        73,,204000     166,,410000
RP20         201420.        611,,314000    1422,,630000
NOTE: RP20 will not be supported in release 4. It is important to
remember that there are 1000 (octal) words per sector for a
RP20. As a result, to look at a sector of an RP20, one would
multiply the sector number by 1000 (octal) to find the FILDDT
starting address for that sector. For all other drive types
there are 200 (octal) words per sector.
The above information is calculated from the parameters available in
STG.MAC.
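
The starting addresses in the table can be recomputed from the sector
counts. The Python sketch below is illustrative; the function names
are hypothetical, and the sector counts and words per sector are the
ones given above.

```python
SECTORS_PER_UNIT = {
    "RP04-RP05": 152000,
    "RP06": 304000,
    "RP07": 502200,
    "RM03": 121360,
    "RP20": 201420,
}


def unit_start(drive, unit_index):
    """FILDDT address of the first word of a unit (0 = first unit)."""
    words = 0o1000 if drive == "RP20" else 0o200
    return unit_index * SECTORS_PER_UNIT[drive] * words


def halves(word):
    """Format an address as left,,right in octal."""
    return "%o,,%06o" % (word >> 18, word & 0o777777)


if __name__ == "__main__":
    for drive in SECTORS_PER_UNIT:
        print(drive, halves(unit_start(drive, 1)),
              halves(unit_start(drive, 2)))
```

The same function extends the table to a fourth or later unit by
passing a larger unit index.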
REF: DDT41.MEM
TOPS-20 SCHEDULER TEST ROUTINES
-------------------------------
The following is a tabulation of (hopefully) all of the scheduler
tests used by the TOPS-20 monitor, time-frame approximately Release 3A.
This includes ARPA and DECNET tests. This is the data one finds in the
monitor table FKSTAT indexed by fork number for forks which have blocked
and left the GOLST (i.e. LH(FKPT) contains WTLST). The format of the
FKSTAT table words is TEST DATA,,TEST ROUTINE ADDRESS. The scheduler
test routines are called periodically to determine if a process can be
unblocked. This is indicated by a skip return from the scheduler test.
A nonskip return is taken if the process cannot yet be unblocked.
When examining the monitor because of a hung job or fork, the
FKSTAT table can often reveal the reason the fork is hung, and this
sometimes even allows corrective action to be taken.
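
Reading an FKSTAT entry amounts to splitting the word into its two
halves and finding the routine name for the right half. A minimal
Python sketch follows; the function name is hypothetical, and the
routine addresses in the sample symbol table are made up for
illustration. Real addresses must be taken from the monitor being
examined.

```python
def split_fkstat(word, symbols):
    """Return (test data, routine name) for a 36-bit FKSTAT entry.

    symbols maps routine addresses to names.  An address with no
    symbol is returned as an octal string.
    """
    data, routine = word >> 18, word & 0o777777
    return data, symbols.get(routine, "%o" % routine)


if __name__ == "__main__":
    symbols = {0o12345: "DISET", 0o23456: "HALTT"}   # hypothetical addresses
    data, name = split_fkstat((0o701234 << 18) | 0o12345, symbols)
    print("%o,,%s" % (data, name))
```

Once the routine name is known, the table below tells what the test
data in the left half means.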
The table below gives routine name, what you should expect to see
in the FKSTAT table, and the module in which the scheduler test is
defined, followed finally by a short description of what the particular
condition is which is being tested.
SCHEDULER TESTS
TEST CONTENTS OF T1 AT TIME OF SCHEDULER CALL DEFINED
---- ---------------------------------------- -------
BALTST [CONNECTION #,,BALTST] [NETWRK]
Wait for network bit allocation.
BATTST [UNIT #,,BATTST] [DSKALC]
Wait for US.BLK, the lock bit for the BAT blocks
on the unit, in the UDB to be zero.
BLOCKM [TIME,,BLOCKM] [SCHED]
Wait for TIME in BLOCKM format which is the low
order 17 bits of the desired future time to be
compared against a suitably masked TODCLK.
BLOCKT [TIME,,BLOCKT] [SCHED]
Wait for TIME in BLOCKT format which is a
value that is shifted left 10 bits and compared
against a suitably masked TODCLK, providing a
longer delay than BLOCKM, but less precision.
BLOCKW [TIME,,BLOCKW] [SCHED]
Wait for TIME in BLOCKW format (same as BLOCKM).
CDRBLK [UNIT NUMBER,,CDRBLK] [CDRSRV]
Wait for card-reader offline, or not waiting for
a card.
CHKLOK [ADDRESS,,CHKLOK] [NSPSRV]
Wait for NSP block lock at address to free.
COFTST [TIME,,COFTST] [MEXEC]
Wait for job in FKJOBN to be attached or time
in BLOCKT form to elapse.
DBWAIT [DTE #,,DBWAIT] [DTESRV]
Wait for the TO-10 doorbell from the given DTE.
DGLTST [0,,DGLTST] [DIAG]
Wait for DIAGLK lock to be free.
DGUIDL [UDB ADDRESS,,DGUIDL] [DIAG]
Wait for the unit to show as idle in the UDB.
DGUTST [UDB ADDRESS,,DGUTST] [DIAG]
Wait for the maintenance bit to set in the UDB.
DISET [ADDRESS,,DISET] [SCHED]
Wait for contents of ADDRESS to be zero.
DISGET [ADDRESS,,DISGET] [SCHED]
Wait for contents of ADDRESS to be positive.
DISGT [ADDRESS,,DISGT] [SCHED]
Wait for contents of ADDRESS to be greater than
zero.
DISLT [ADDRESS,,DISLT] [SCHED]
Wait for contents of address to be less than
zero.
DISNT [ADDRESS,,DISNT] [SCHED]
Wait for contents of ADDRESS to be non-zero.
DMPTST [COUNT,,DMPTST] [IO]
Wait for COUNT to be less than DMPCNT to indicate
dump mode buffers freed.
DSKRT [PAGE #,,DSKRT] [PAGEM]
Wait for CSTAGE for PAGE # to not be PSRIP,
meaning disk read completed.
DWRTST [PAGE #,,DWRTST] [PAGEM]
Wait for DRWBIT to clear in CST3(PAGE #),
meaning write completed.
ENQTST [FORK #,,ENQTST] [ENQ]
Wait for the lock on ENFKTB+FORK #.
FEBWT [ADDRESS OF FE UDB,,FEBWT] [FESRV]
Wait for EOF or input bytes available from FE.
Wake also on invalid assignment.
FEDOBE [ADDRESS OF FE UDB,,FEDOBE] [FESRV]
Wait for output buffer empty and all bytes are
acknowledged by the FE. Wake also if not a
valid assignment.
FEFULL [ADDRESS OF FE UDB,,FEFULL] [FESRV]
Wait for the current count of output bytes to be
less than the count of bytes in the interrupt
buffer. Wake also on invalid assignment.
FORCTM [SUPERIOR FORK INDEX,,FORCTM] [SCHED]
Identifiable wait forever, forced termination.
FRZWT [PREVIOUS TEST,,FRZWT] [FORK]
Identifiable wait forever, frozen fork.
HALTT [SUPERIOR FORK INDEX,,HALTT] [SCHED]
Identifiable wait forever for halted fork.
HIBERT [TIME,,HIBERT] [SCHED]
Wait for TIME in BLOCKT format.
HUPTST [<0:9>TIME<10:17>HOST #,,HUPTST] [NETWRK]
Wait for IMPHRT bit set for host or time out in
BLOCKW form.
IDVTST [0,,IDVTST] [IMPDV]
Wait for the lock on IDVLCK to free, lock it.
IMPBPT [0,,IMPBPT] [IMPDV]
Wait for IMPFLG nonzero, or IBPTIM timer to run
out, or IDVLCK lock free and output scan needed
for the IMP.
JB0TST [TIME,,JB0TST] [MEXEC]
Wait for JB0FLG set nonzero for explicit request
or time in BLOCKT form to elapse.
JRET [0,,JRET] [SCHED]
Wait forever, interruptible.
JSKP [0,,JSKP] [SCHED]
Unconditional skip used to schedule immediately.
JTQWT [0,,JTQWT] [SCHED]
Wait for JSYS trap queue.
LCKTSS [ADDRESS,,LCKTSS] [IO]
Wait for lock at ADDRESS to unlock, lock it.
LKDSPT [0,,LKDSPT] [STG]
Wait for room in LDTAB table of directories
currently locked.
LKDTST [INDEX INTO LDTAB,,LKDTST] [STG]
Wait for bit in LCKDBT to clear, indicating
directory unlocked.
LODWAT [ADDRESS OF STATUS WORD,,LODWAT] [LINEPR]
Wait for flag LP%LHC to set in the addressed
word, indicating loading has completed of the
VFU or RAM file.
LPTDIS [UNIT ADDRESS,,LPTDIS] [LINEPR]
Wait for an error condition on the addressed
unit, or for all buffers cleared and no bytes
still in the front-end, before finishing close
operation on the device.
MTARWT [IORB ADDRESS,,MTARWT] [MAGTAP]
Wait for IRBFA in the IORB to indicate that this
IORB is no longer active.
MTAWAT [UNIT #,,MTAWAT] [MAGTAP]
Wait for all outstanding IORBs for unit to be
finished.
MTDWT1 [UNIT #,,MTDWT1] [MAGTAP]
Wait for the count of outstanding requests on the
unit to go to one.
NCPLKT [0,,NCPLKT] [NETWRK]
Wait for lock NCPLCK to free, lock it.
NICTST [0,,NICTST] [PAGEM]
Wait for SUMNR less than or equal to MAXNR or
only one fork in BALSET.
NOTTST [<0:8>CONNECTION #<9:17>STATE,,NOTTST] [NETWRK]
Wait for connection to leave state.
NSPTST [0,,NSPTST] [NSPSRV]
Wait for KDPFLG nonzero, indicating KMC11 wants
service, or MSGQ nonzero, indicating messages to
process.
NVTNTT [<0:8>OPTION #,<9:17>LINE #,,NVTNTT] [TTNTDV]
Wait for completed NVT negotiation.
OFNLKT [OFN,,OFNLKT] [PAGEM]
Wait for OFN unlocked--SPTLKB zero in SPTH(OFN).
PIDWAT [FORK #,,PIDWAT] [IPCF]
Wait for bit for fork in PDFKTB to set.
SEBTST [0,,SEBTST] [SYSERR]
Wait for SECHKF to go nonzero before starting
Job 0 task to write queued SYSERR entries.
SEEALL [0,,SEEALL] [TTYSRV]
Waits for SNDALL to go to zero, indicating the
send-all buffer available.
SPCTST [0,,SPCTST] [DTESRV]
Wait for a node.
SPMTST [0,,SPMTST] [PAGEM]
Wait for page in SPMTPG to be on SPMQ or the
time SPMTIM to expire.
SQLTST [0,,SQLTST] [IMPDV]
Wait for the special queues lock SQLCK and lock
it.
STRTST [SDB ADDRESS OF STRUCTURE,,STRTST] [MSTR]
Wait for the structure lock to be free.
STSWAT [ADDRESS OF STATUS WORD,,STSWAT] [CDRSRV]
Wait for flag CD%SHA to come on in the addressed
word, indicating that cardreader status has
arrived.
STSWAT [ADDRESS OF STATUS WORD,,STSWAT] [LINEPR]
Wait for flag LP%SHA to set in the addressed
word, indicating that printer status has
arrived.
SUSFKT [FORK #,,SUSFKT] [FORK]
Wait for fork to be on WTLST in either SUSWT
OR FRZWT.
SWPRT [PAGE #,,SWPRT] [PAGEM]
Wait for CSTAGE for PAGE # to not be PSRIP,
meaning swap read completed.
SWPWTT [0,,SWPWTT] [PAGEM]
Wait for NRPLQ nonzero. Increment CGFLG each
time test is unsuccessful.
TCIPIT [FORK #,,TCIPIT] [TTYSRV]
Wait for no interrupts pending for FORK #.
TCITST [LINE #,,TCITST] [TTYSRV]
Wait for line inactive, no fork in input wait,
or input buffer non-empty.
TCOTST [LINE #,,TCOTST] [TTYSRV]
Wait for line inactive, or output buffer not
too full to add a character to it.
TRMTS1 [0,,TRMTS1] [FORK]
Identifiable wait forever for inferior fork termination.
TRMTST [FORK #,,TRMTST] [FORK]
Wait for FORK # to be on WTLST for either HALTT
or FORCTM.
TRP0CT [MINIMUM NRPLQ,,TRP0CT] [PAGEM]
Wait for NRPLQ to be above stated minimum or
normal minimum. Increment CGFLG each time
test is unsuccessful.
TSACT1 [LINE #,,TSACT1] [TTYSRV]
Wait until line inactive, becoming active, or
has a full length dynamic block assigned.
TSACT2 [LINE #,,TSACT2] [TTYSRV]
Wait for line available--inactive or fully
active.
TSACT3 [LINE #,,TSACT3] [TTYSRV]
Wait for line inactive--dynamic data unlocked.
TSTSAL [0,,TSTSAL] [TTYSRV]
Wait for SALCNT to go to zero, indicating the
send-all is finished for this buffer.
TTBUFW [NUMBER,,TTBUFW] [TTYSRV]
Wait for NUMBER of buffers.
TTIBET [LINE #,,TTIBET] [TTYSRV]
Wait for line inactive or input buffer empty.
TTOAV [LINE #,,TTOAV] [TTYSRV]
Wait for line inactive and output buffer not
empty.
TTOBET [LINE #,,TTOBET] [TTYSRV]
Wait for line inactive or output buffer empty.
UDITST [0,,UDITST] [PHYSIO]
Wait for at least two free IORBs on UIOLST.
UDWDON [IORB ADDRESS,,UDWDON] [PHYSIO]
Wait for IS.DON to set in IRBSTS for this IORB.
UPBGT [CONNECTION INDEX,,UPBGT] [IMPDV]
Wait for LTDF connection done flag to set, or
output buffers to appear.
USGWAT [0,,USGWAT] [JSYSA]
Wait for lock on queued USAGE blocks to free.
VVBWAT [UNIT #,,VVBWAT] [TAPE]
Wait for the MDA to reset TPVV handling EOV.
WATTST [<0:8>CONNECTION #<9:17>STATE,,WATTST] [NETWRK]
Wait for connection to be in state.
WTFKT [FORK #,,WTFKT] [FORK]
Wait for fork to be on WTLST.
WTSPTT [PAGE #,,WTSPTT] [SCHED]
Wait for share count on PAGE # to go to 1.
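The T1 layout shown in the table can be taken apart mechanically when reading a dump. The following Python sketch is not monitor code, and its function names are illustrative only. It assumes just what the table states: the right half holds the test routine address, the left half holds the datum, and for NOTTST and WATTST the left half is split into bits 0-8 and 9-17 (PDP-10 numbering, bit 0 most significant).

```python
# Illustrative helpers (hypothetical names) for decoding a scheduler-test
# word as it appears in T1: [DATUM,,TEST-ROUTINE-ADDRESS].

HALF_MASK = 0o777777  # 18 bits


def split_halves(word):
    """Return (left half, right half) of a 36-bit word."""
    word &= (1 << 36) - 1
    return (word >> 18) & HALF_MASK, word & HALF_MASK


def split_nine_bit_fields(left_half):
    """Split a left half into bits 0-8 and bits 9-17.

    Applies to tests such as NOTTST and WATTST, whose datum is
    <0:8>CONNECTION #<9:17>STATE.
    """
    return (left_half >> 9) & 0o777, left_half & 0o777


# Made-up example: connection 5, state 12 (octal), and an invented
# routine address 123456 (octal).
t1 = (((5 << 9) | 0o12) << 18) | 0o123456
datum, routine = split_halves(t1)
connection, state = split_nine_bit_fields(datum)
print(oct(routine), connection, oct(state))
```

In practice the right half would be looked up symbolically in DDT to find the test name, and the left half interpreted according to the matching table entry.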
TOPS-20 PAGE ZERO LOCATIONS
The following text outlines the uses of memory in page zero of the
TOPS-20 monitor as of Release 4.
ADDR MNEMONIC USAGE
==== ======== =====
0-17 -- Shadow ACs, not used.
20 SCTLW Scheduler halt request word (see SWTST in SCHED). Word
of function bits, current functions include Halt
timesharing, wait for system down, manual pause, and
reset FE protocol.
21 -- Used by BOOT to build CCW lists (unused by monitor).
22 -- Same as 21; both unused for KS10 systems.
23 CRSHTM Initial time for reload; -1 => time not set yet.
Contains the date/time that the system was last
reloaded. May see -1 after forced reload on KS
processor. BUGSTO (APRSRV) copies TADIDT into it for
each BUGHLT/CHK/INF.
24 SEBQOU Pointer to queued SYSERR blocks not yet written.
25 DBUGS1 Not currently used by the monitor.
26 BUGHAD Code around SYSLD1 (STG) puts LH into BUGCHK, RH into
BUGHLT after a reload. No one else uses it, so it
should contain zero.
27 CRSTD1 Current time is saved here on each BUGHLT/CHK/INF. This
is the value that gets into the SYSERR block. Contains
the date/time for the system's most recent
BUGHLT/CHK/INF.
30 SHLTW Scheduler halt word; depositing a nonzero value
requests system shutdown.
31 RLWORD KS only; used for front-end communication, flags,
keep-alive, etc. (see PROKS). Unused on KL.
32 CTYIWD KS only; used for front-end communication, used for the
CTY input location. Unused on KL.
33 CTYOWD KS only; used for front-end communication, used for the
CTY output location. Unused by KL.
34 KLIIWD KS only; used for front-end communication, used for the
KLINIK input location. Unused by KL.
35 KLIOWD KS only; used for front-end communication, used for the
KLINIK output location. Unused by KL.
36 -- Unused/reserved. Holds KS RHBASE during boot.
37 -- Unused/reserved. Holds KS unit number during boot.
40 .JBUUO Monitor's location 40. Holds KS tape info during boot.
41 .JB41 Monitor's LUUO dispatch word.
Contains XPCW LUUBLK.
42-43 -- Unused/reserved.
44 .JBREL Job Data Area word filled in by LINK. Contains 777.
45-67 -- Unused/reserved.
70 PWRTRP Location executed by the front-end on powerfail restart.
Contains JRST PWRRST.
71 RLDADR Executed by the front-end on certain (keep-alive)
reloads. APRSRV demands this location be PWRTRP+1.
Contains XPCW RLODPC which winds up at RLDHLT for a
KPALVH BUGHLT.
72 EDDTF Retain EDDT in core if contents is one.
73 CRSTAD Is supposed to contain date/time of last crash. Code in
STG checks it to decide to restore the data from
BUGHAD. During system startup for KL-10s the word is
used to set the reload date/time if nonzero.
Apparently it gets no real use on KS-10s. Contains
zero while system is in normal operation.
74 .JBDDT JOBDDT location.
Contains DDT (EDDT entry point).
75 .JBHSO Unused/reserved.
76 DBUGSW BUGHLT action switch word (0=unattended; 1=attended;
2=debugging).
77 DCHKSW BUGCHK action switch word.
100-107 -- Reserved for use by the front-end command language.
110 STSBLK KL-Status block pointer, virtual address. Contains zero
if status reporting is not enabled.
111 -- Physical address (MAP) of above virtual address.
112 .JBEDV Pointer to Exec Data Vector
Contains MONEDV.
113-114 -- Unused/reserved.
115-117 -- Unused/reserved.
120 .JOBSA TOPS-10 style start address.
Contains EVGO.
121-132 -- Unused/reserved.
133 .JBCOR Job Data Area location set by LINK. LH contains highest
low segment address loaded with data. RH refers to a
SAVE argument for highest page.
134-136 -- Unused/reserved.
137 .JBVER Job Data Area version number word.
Contains current monitor version number.
140 EVDDT Monitor startup transfer vector; enter EDDT.
Contains JRST DDTX.
141 -- Reset and go to EDDT location.
Contains JRST SYSDDT.
142 EVDDT2 Copy of 140.
Contains JRST DDTX.
143 EVSLOD Entry to initialize file system, used for installation.
Contains JRST SYSLOD.
144 -- Unused; contains zero.
145 EVRST Restart the system location.
Contains JRST SYSRST.
146 EVLDGO Reload and start the system location.
Contains JRST SYSGO.
147 EVGO Start the monitor location.
Contains JRST SYSGO1.
150 DDTPRS DDT present flag; EDDT is present if nonzero.
Contains -1 initially, cleared later for EDDTF not set.
151 BUTRXB Defined in BOOT and STG but not used (BOOT reads the
disk address of the Root-directory from the HOM
blocks). Contains zero.
152 BUTMUN Defined in BOOT and STG but not used (BOOT reads the
values from the HOM blocks, and uses variable MAXUNI
instead). Contains zero.
153-162 BUTDRT Defined in BOOT and STG but not used (BOOT uses internal
variable DSKTAB for logical to physical structure
mapping). Contains zeros.
163-201 BUTCMD ASCIZ file name of monitor; used for booting the
swappable monitor with calls to VBOOT for segments.
202 BUTPGS Start,,End virtual addresses of VBOOT pages. Used to
reference and finally unlock/destroy VBOOT pages.
203 BUTEPT Contains in LH: Address of the VBOOT EPT page.
RH: Address of the VBOOT page table page.
204 BUTPHY Contains in LH: Minus number of pages to map.
RH: Address of first page to map (for the monitor).
Typically contains -5,,773777 for three pages of code,
a file data page and an index block page. Used with
the value in BUTVIR.
205 BUTVIR Virtual address of first page of BOOT to map. Typically
will contain 773000. Used in conjunction with BUTPHY.
206 BOOTFL BOOT flags word, 0 => normal, nonzero => special boot.
The contents is supposed to be the index into a table
(BOOTD) designating how to boot the swappable monitor.
An ILBOOT BUGHLT results if the index is too large. In
the SYSGO routine the value IRBOOT is put into BOOTFL;
the table BOOTD contains entries of JRST GSMDSK for all
entries but the IRBOOT offset, which has JRST GSMIRB.
207 DINFSW BUGINF action switch word.
210-237 PHYPZS Formerly used for page zero I/O use by PHYSIO.
Currently unused, contain zero.
240-777 -- Not used, contain zero.
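Several of the words above are halfword pairs, and BUTPHY in particular carries a negative count in its left half. As a rough sketch (in Python, with hypothetical function names, and assuming the left half is an 18-bit two's-complement number as the "-5,,773777" typeout suggests), the typical BUTPHY value can be decoded like this:

```python
# Hypothetical helpers for reading halfword pairs such as BUTPHY.

def halves(word):
    """Return (left half, right half) of a 36-bit word."""
    return (word >> 18) & 0o777777, word & 0o777777


def signed18(value):
    """Interpret an 18-bit halfword as a two's-complement integer."""
    return value - (1 << 18) if value & 0o400000 else value


def decode_butphy(word):
    """Return (number of pages to map, address of first page to map)."""
    left, right = halves(word)
    return -signed18(left), right


# The typical value quoted in the table: -5,,773777
butphy = (0o777773 << 18) | 0o773777
pages, first = decode_butphy(butphy)
print(pages, oct(first))
```

The five pages correspond to the three pages of code, the file data page and the index block page mentioned in the BUTPHY entry.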
Known Hardware Deficiencies List
This is a collected list of known hardware characteristics which show up
from time to time as part of certain reported problems. This says
nothing about whether these characteristics are bugs or features, or
whether they will ever be fixed or changed, but merely attempts to make
them known internally.
1. DZ11 - Cannot set the speed to zero in the hardware, can only
turn off the receiver.
2. TM02 - Can generate bad parity which it passes to memory to
cause the system memory parity errors when the data is
referenced. This is still seen with Rev 12 to the RH20.
3. TM03 - A chip race condition has been known to occur where a
function register has wrong value because it has not settled.
This generates a device error which appears transient; i.e.
CRLFing DUMPER tries the read again and succeeds.
4. TM03 - ANSI ASCII was not included in the hardware format
modes.
5. TM03 - When using industry-compatible mode, reads not of a
multiple of four bytes will produce strange results. The bytes
are counted, but the extra bytes are not written to memory,
leaving garbage.
6. DX20 - there is a race condition in which the DX20 raises an
interrupt request on channel 5 for some condition while the
code is already manipulating the DX20 and handles that condition.
The DX20 then lowers its request, but the KL has already latched
the interrupt and tries to process it, and no device responds.
The KL then tries the 40+2n type, which occasionally gives a PI5ERR.
7. VT100 - on a VT100 without the extended memory, one can confuse
the internal microprogram enough to have it clear sections of
the screen on Control-U, Control-R, etc.
8. RH20 - perfectly willing to store bad parity data into memory
until Rev 12. May still do so.
9. DX20 - is unwilling to allow registers to be examined after it
has started I/O. Can cause register access errors if not
programmed in correct sequence.
10. LP20 - at least one of the printers fails to go off-line when
there is anything in the print line buffer, even if the drum
gate is opened.
11. KS-10 Front End - Rev. 3. exhibits problems with the KLINIK
line. If the link is in use, it is possible to lock out the
CTY. There are problems with the password check on subsequent
tries, and problems with line hang-up.
12. KS-10 Front End - Rev. 3. exhibits some problems with
powerfail restart. If the power returns in less than 3.5
seconds or so the restart will hang. In addition if Rev. 3
and Rev. 2 boards are mixed, there is no powerfail restart or
reload capability.
13. DX20/TU71 - the DX20 microcode does not set the 556 bpi density
correctly for TU71 (7-track) drives. This can be set
successfully from the maintenance panel.
14. TM03 - if an error occurs while rewinding, the monitor may be
left in a state of waiting for the rewind to complete, the tape
being unusable. The easiest way to clear this condition is to
reset the TM03, most easily done by the customer by powering it
down and back up.
15. KS10 - during a forced reload, the halt status block is written
twice, first when halting and second when rebooting; thus the
second time wipes any valuable data from the first time. It's
once again the 8080 that's responsible.
KS10 PROCESSOR CONSOLE INFORMATION
----------------------------------
CSL-COMMANDS CURRENTLY IMPLEMENTED (CSL V0.161)
^Z ;enter USER mode
^\ ;enter CONSOLE mode
MK XX ;Marks microcode word at CRAM address XX (sets bit 95)
UM XX ;Unmarks Microcode at CRAM address XX
MB ;load only bootstrap of currently selected magtape
LA XX ;Load/set KS10 Memory Address
LI XX ;Load/set I/O address
LK XX ;Load/set 8080 address
LC XX ;Load/set CRAM address to be written/read
EM ;Examine KS10 Memory (last Memory location specified)
EM XX ;Examine KS10 Memory location XX
EN ;Examine Next (either from last EK, EM or EI)
EB ;Examine BUS and 8080 control registers
EI ;Examine I/O (last I/O address specified)
EI XX ;Examine I/O address XX
EK ;Examine 8080 location
EK XX ;Examine 8080 address XX
DM XX ;Deposit KS10 Memory last addressed, XX data (36 bits)
DN XX ;Deposit next (depending on last DK, DM or DI) XX data
DB XX ;Deposit BUS, XX data (36 bits)
DI XX ;Deposit I/O, XX data (16,18 or 36 bits)
DK XX ;Deposit XX (8 bits) into 8080 (Data can only be deposited
;in RAM addresses)
MR ;MASTER RESET
CS ;CPU clock start
CH ;CPU clock halt
CP XX ;CPU clock pulse (XX=NR of pulses -- default 1 pulse)
SI ;Single Instruction
LF XX ;Load diagnostic write function (0-7) specifying 12 bits of
;microcode (see note at end ****)
DF XX ;Deposit Field, write microcode bits according to last LF-command
EC ;Examine CRAM ..curr. Control reg, no clocks .. current loc as addr.
EC XX ;Examine CRAM at address XX
DC XX ;Deposit CRAM, XX is at least 32 octal characters. Address
;previously loaded by LC command
EX XX ;EXecute KS10 instruction XX
ST XX ;STart KS10 at address XX. Console enters user mode
SM XX ;Start microcode at XX (SM 1 causes dump of HALT-status block !!)
;Default is 0 -- Start microcode
HA ;HALT KS10 (execute HALT-instruction -- causes microcode to
; write HSB and then to enter HALT-loop)
SH ;SHUTDOWN (deposit non-zero data in memory location 30)
; causing TOPS20 to shut down
CO ;Continue (causes microcode to leave HALT-loop)
PE X ;Parity Enable (0=disable, 1=DRAM-par, 2=CRAM-par
; 4=clock-par error stop, 5=DPE/DPM, 6=CRA/CRM, 7=enable all)
CE X ;CACHE enable (0=OFF, 1=ON, <CR>=show current state)
TE X ;CPU timer (1 MSEC) enable (0= OFF, 1=ON, <CR>=show current state)
TP X ;CPU TRAPS enable (0=OFF, 1=ON (enables paging),
;<CR>=show current state)
LT ;Lamp Test, lights three lamps of front panel
RC ;Read CRAM direct, functions 0-17
; (no resets, no load diag adr, no CPU clock) (see note at end ****)
EJ ;Examine Jumps -- prints CRAM address signals (Current CRAM address,
;next CRAM address, jump address, subroutine return address)
TR XX ;TRACE - repeats CP and EJ commands until any character typed
;XX (if typed) is desired CRAM stop-address
PM ;Pulse Microcode (issue single CP and EJ)
ZM ;Zero KS10 MOS Memory (beware -- slow)
RP ;Repeat - repeats last command, or line of commands which it delimits
; Any character (except CNTRL-O) typed will stop repeat
;EXAMPLE: EM 0, EK 0, EC 0, RP will repeat execution of this line
BT ;Boot SYSTEM -- load CRAM from designated disk (see DS)
; via memory then load monitor boot from disk and start at 1000
BT 1 ;same as BT, but loads diagnostic monitor SMMON and starts at 20000
LB ;Load Bootstrap from designated disk (see DS)
LB 1 ;Load Bootstrap diagnostic monitor SMMON
DS ;Disk Select for bootstrap or microcode verification. Command prompts
;to specify UNIT NUMBER (default 0), RHBASE (default 776700),
;and UNIBUS ADAPTER (default 1) to load from when booting
MS ;Magtape Select for bootstrap or microcode verification. Command
;prompts to specify UNIT NUMBER (default 0), RH BASE (default 772440),
;UNIBUS ADAPTER (default 3), SLAVE NUMBER (default 0), and
;DENSITY (default 1600 BPI) of magtape to boot from
MT ;Magtape Boot system from selected magtape
MT 1 ;BOOT diagnostic monitor SMMAG from magtape
PW ;clears KLINIK password, or sets it (6 char's max)
BC ;BOOT Check. PROM code which tests the basic 2020 system
; load path from the UNIBUS adaptor into the CRAM via memory.
CONTROL CHARACTERS
^U ;rub out current line
^O ;switch: first one stops CTY-output, second one resumes CTY-output
^S ;stop TTY-output and hangs 8080 waiting for CONTROL-Q (see below)
^Q ;resumes TTY-output
^C ;stops whatever the 8080 is doing
RUB-OUT ;rub out previous character typed
NOTE: Several commands may be put on a single line, separated by commas.
***** CRAM Bit Formats
LF-Command CRAM Bits RC-Command CRAM Data
-------------------- ---------------------
LF CRAM bits RC Data
-- --------- -- ------------------------------
0 00-11 0 CRAM bits 00-11
1 12-23 1 Next CRAM address
2 24-35 2 CRAM subroutine return address
3 36-47 3 current CRAM address
4 48-59 4 CRAM bits 12-23
5 60-71 5 CRAM bits 24-35 (Copy A)
6 72-83 6 CRAM bits 24-35 (Copy B)
7 84-95 7 0s
10 Parity bits A-F
11 KS10 bus bits 24-35
12 CRAM bits 36-47 (Copy A)
13 CRAM bits 36-47 (Copy B)
14 CRAM bits 48-59
15 CRAM bits 60-71
16 CRAM bits 72-83
17 CRAM bits 84-95
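The LF column of the table is regular: each diagnostic write function covers twelve consecutive CRAM bits. A small Python sketch of that arithmetic follows; the function names are made up for illustration and nothing here talks to a real console.

```python
# Sketch of the LF-function arithmetic implied by the table above.
# Names are hypothetical.

CRAM_BITS = 96           # bits 00-95
BITS_PER_FUNCTION = 12   # each LF function addresses 12 bits


def lf_function_for_bit(bit):
    """Return the LF function number (0-7) that covers a CRAM bit."""
    if not 0 <= bit < CRAM_BITS:
        raise ValueError("CRAM bit must be in the range 0-95")
    return bit // BITS_PER_FUNCTION


def bits_for_lf_function(function):
    """Return (first bit, last bit) written by LF function 0-7."""
    if not 0 <= function < CRAM_BITS // BITS_PER_FUNCTION:
        raise ValueError("LF function must be in the range 0-7")
    first = function * BITS_PER_FUNCTION
    return first, first + BITS_PER_FUNCTION - 1


# Bit 95 is the mark bit set by the MK command; it falls under LF 7.
print(lf_function_for_bit(95), bits_for_lf_function(3))
```

Note that the RC column does not follow the same pattern, so it has to be read from the table directly.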
8080-CONSOLE-ERROR-CODES
------------------------
?BUS BUS polluted on power-up
?BFO Input Buffer Overflow
?IL ILLEGAL Instruction
?UI Unknown Interrupt
?A/B A and B copies of CRAM bits did not match
?DNF Did Not Finish instruction
?BT device error or timeout during BOOT operation
?DNC Did Not Complete HALT
?PAR ERR report clock-freeze due to parity error,
and type out READ IO of 100,303,103
?RE memory Refresh Error (MEM BUSY stayed set too long,
because it didn't release data on a write to memory)
?CHK PROM checksum failed
?BC BOOT Check failed
?RUNNING CPU clock running (command typed requires clock to be stopped
and may fail)
?NDA received No Data Acknowledge on memory request
?NXM referenced NoneXistent Memory location
?NBR Console was not granted BUS on a request
?RA command Requires Argument
?BN received Bad Number on input (character typed is not an
octal number)
?KA KEEP ALIVE failed
?FRC had a forced reload
?PWL Password Length error
?IA Illegal Argument (address out of range, etc.)
OTHER 8080 CONSOLE MESSAGES
---------------------------
BUS 0-35 message header for EB command
KS10> prompt message
CYC cycle type for DB command
SENT data sent to bus
RCVD data received on bus
HLTD message "HALTED/XXXXXX " where xxxxxx is data
BT SW message says BOOTING, using BOOT switch
OFF message, says current state is off
ON message, says current state is on
>>UBA? query for UNIBUS adapter
>>UNIT? query for unit to use
>>RHBASE? query for RH11 base register address to use
>>DENS? query tape density
>>SLV? query tape slave number
C CYC typed on DB-command if COM/ADR cycle blew
D CYC " " DATA cycle blew
8080-ERROR-Messages-during-BOOTING
----------------------------------
Disk:
On an error-condition, detected by the 8080, the
Fault-light will go on and a message of the form
?BT XXXYYY
will be printed on the CTY.
The following error-codes are only "rough" pointers, they can be
caused by any of the following problems:
Disk not a disk at all
Wrong unit selected (see DS-command)
Home blocks not readable or not there
Home blocks not set by SMFILE for 8080
8080 File-system garbage
XXX=001 Disk error encountered while trying to read HOME-blocks
Can mean incorrect RHBASE specified, wrong UBA selected,
bad disk drive, or neither the home block nor the alternate home
block has the home block ID ("HOM" in sixbit)
XXX=002 Disk error encountered while trying to read the page of
pointers, which make up the "8080-File-System"
Can mean pack is not in format for 8080 loading, home blocks
bombed, bad drive or pack
XXX=003 Disk error encountered while trying to read a page of
microcode - can mean pack is not in 8080 format, or bad drive or
pack
XXX=004 Microcode did not successfully start running after a BT, MT,
MB, or LB command. This error will occur when an LB is done
before the system microcode is loaded.
XXX=010 Disk error encountered while trying to read PRE-BOOT
YYY are the lower 8 bits of the 8080 address of the failing
"Channel Command List" operation. At this point it is normally
a good bet to do an "EI" to get the contents of the
RH11 register that has the error bits set.
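When a customer reads a disk boot failure over the phone, the message can be split into its two fields and matched against the list above. The Python sketch below is only an aid for doing that by hand. It assumes the message is typed as "?BT" followed by six octal digits with no separator (three for XXX, three for YYY), and its names are hypothetical.

```python
import re

# Meanings of XXX for disk booting, paraphrased from the list above.
DISK_BOOT_ERRORS = {
    "001": "disk error reading the HOME blocks",
    "002": "disk error reading the 8080 file-system pointer page",
    "003": "disk error reading a page of microcode",
    "004": "microcode did not start after BT, MT, MB, or LB",
    "010": "disk error reading PRE-BOOT",
}


def parse_bt_error(line):
    """Split a '?BT XXXYYY' message into its code, meaning and YYY value.

    Assumes six octal digits after '?BT'; raises ValueError otherwise.
    """
    match = re.fullmatch(r"\?BT\s+([0-7]{3})([0-7]{3})", line.strip())
    if match is None:
        raise ValueError("not a ?BT XXXYYY message: %r" % line)
    xxx, yyy = match.groups()
    return {
        "code": xxx,
        "meaning": DISK_BOOT_ERRORS.get(xxx, "unknown code"),
        "ccl_address_low_bits": int(yyy, 8),
    }


report = parse_bt_error("?BT 001123")   # invented example message
print(report["meaning"], oct(report["ccl_address_low_bits"]))
```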
Magtape:
The following ERROR-messages can point to the following problem areas:
Magtape is no magtape at all
Wrong unit selected (see MS-command)
Magtape is not bootable (no microcode, no PRE-BOOT)
XXX=001 Error trying to read microcode first page
Can mean wrong unit selected, wrong RHBASE address, wrong UBA
selected, wrong slave number, wrong density, bad drive, bad
controller, bad tape, tape in wrong format
XXX=003 Error trying to read additional pages of microcode
XXX=010 Error trying to read in PRE-BOOT program
May occur while doing a skip over the microcode file, or
while reading the PRE-BOOT itself
YYY see above (disk-section)
Error-messages-out-of-PRE-BOOT
PRE-BOOT is loaded from Disk or Magtape (see 8080 commands DS, MS,
BT, BT 1, MT, MT 1)
PRE-BOOT is written onto the disk using "SMFILE.EXE"; it is also written on
"standard" Diagnostic-tapes and onto the "MONITOR-INSTALLATION"-tapes.
PRE-BOOT is loaded by the 8080 into MEMORY-locations 1000 and up, and starts
at 1000. The ERROR-halts are:
1001 found "bad" core-transfer address
(page 1 is illegal - can't overload PRE-BOOT)
1002 Disk Retry error or Magtape Read error
1003 No RH11 Base Address
1004 Magtape Skip failure
At ERROR-halt time the following MEMORY-Locations contain the useful INFO :
Disk-Booting Magtape-Booting
------------ ---------------
100 "8080" disk-address Not used
101 Memory transfer address same
102 T3, selection pickup pointer same
103 RPCS1-register MTCS1-register
104 RPCS2-register MTCS2-register
105 RPDS - register MTDS - register
106 RPER1-register MTER1-register
107 RPER2-register (RP06 only) Not used
110 RPER3-register Not used
111 UBA Page RAM loc 0 same
112 UBA-status register same
113 Version Nr. of PRE-BOOT same
Note: The Version Nr. of PRE-BOOT will be the same as the Version Nr.
of SMFILE. The "8080" disk-address is in the form " CYL SEC SURF "
THEREBY IT WILL BE POSSIBLE TO ASK A CUSTOMER WITH A PRE-BOOT FAILURE,
TO DO AN :
EM 77
EN,RP
...... AND TYPE SOMETHING AFTER ADDRESS 115
...... AND THEN TELL US WHAT HE SEES
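Once the customer has read back locations 100 through 113, the values can be labeled from the table above. The following Python sketch shows one way to do that bookkeeping; the function name and the sample value are invented, and the label text is paraphrased from the table.

```python
# PRE-BOOT error halts, from the list above.
PREBOOT_HALTS = {
    0o1001: "bad core-transfer address (page 1 is illegal)",
    0o1002: "disk retry error or magtape read error",
    0o1003: "no RH11 base address",
    0o1004: "magtape skip failure",
}

# address: (meaning when disk booting, meaning when magtape booting)
PREBOOT_LOCATIONS = {
    0o100: ("8080 disk address", "not used"),
    0o101: ("memory transfer address",) * 2,
    0o102: ("T3, selection pickup pointer",) * 2,
    0o103: ("RPCS1 register", "MTCS1 register"),
    0o104: ("RPCS2 register", "MTCS2 register"),
    0o105: ("RPDS register", "MTDS register"),
    0o106: ("RPER1 register", "MTER1 register"),
    0o107: ("RPER2 register (RP06 only)", "not used"),
    0o110: ("RPER3 register", "not used"),
    0o111: ("UBA page RAM loc 0",) * 2,
    0o112: ("UBA status register",) * 2,
    0o113: ("version number of PRE-BOOT",) * 2,
}


def label_preboot_dump(values, magtape=False):
    """Return 'address/ value  meaning' lines (octal) for reported words."""
    column = 1 if magtape else 0
    lines = []
    for address in sorted(values):
        names = PREBOOT_LOCATIONS.get(address, ("unlabeled", "unlabeled"))
        lines.append(f"{address:o}/ {values[address]:o}  {names[column]}")
    return lines


# Invented sample: one word read back after a disk-boot failure.
for line in label_preboot_dump({0o103: 0o4000}):
    print(line)
```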
8080-Communication-Area (KS10 Memory)
-------------------------------------
The 8080 maintains and services an in-core communication area.
Currently used are words 31 to 40. See PROKS.MAC for more info.
Word Nr. Meaning
---- --- -------
31 Keep Alive and Status word
32 KS-10 CTY input word (from 8080)
33 KS-10 CTY output word (to 8080)
34 KS-10 KLINIK user input word (from 8080)
35 KS-10 KLINIK user output word (to 8080)
36 BOOT RH-11 Base Address
37 BOOT Drive Number
40 Magtape Boot Format and Slave Number
Word 31 Keep Alive and Status word
---- --
Bit 4 Reload Request
Bit 5 Keep Alive active
Bit 6 KLINIK active
Bit 7 PARITY Error detect enabled
Bit 8 CRAM Parity Error detect enabled
Bit 9 DRAM Parity Error detect enabled
Bit 10 CACHE enabled
Bit 11 1 msec enabled
Bit 12 TRAPS enabled
Bit 20-27 Keep Alive counter field
Bit 32 BOOT SWITCH BOOT
Bit 33 POWER FAIL
Bit 34 Forced RELOAD
Bit 35 Keep Alive failed to change
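When location 31 is examined in a dump or with EM 31, the bit assignments above can be decoded as follows. This is a Python sketch with hypothetical names; it assumes standard PDP-10 bit numbering, in which bit 0 is the most significant bit of the 36-bit word.

```python
# Flag bits of word 31, from the table above (PDP-10 bit numbers).
WORD31_FLAGS = {
    4: "reload request",
    5: "keep alive active",
    6: "KLINIK active",
    7: "parity error detect enabled",
    8: "CRAM parity error detect enabled",
    9: "DRAM parity error detect enabled",
    10: "cache enabled",
    11: "1 msec enabled",
    12: "traps enabled",
    32: "boot switch boot",
    33: "power fail",
    34: "forced reload",
    35: "keep alive failed to change",
}


def bit(word, number):
    """Return PDP-10 bit `number` (0 = most significant of 36)."""
    return (word >> (35 - number)) & 1


def field(word, first, last):
    """Return the value of PDP-10 bits first..last inclusive."""
    width = last - first + 1
    return (word >> (35 - last)) & ((1 << width) - 1)


def decode_word31(word):
    """Return (list of flag names that are set, keep-alive counter)."""
    flags = [name for number, name in sorted(WORD31_FLAGS.items())
             if bit(word, number)]
    return flags, field(word, 20, 27)


# Invented example: keep alive active, cache enabled, counter = 17 octal.
sample = (1 << (35 - 5)) | (1 << (35 - 10)) | (0o17 << (35 - 27))
print(decode_word31(sample))
```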
Word 32 KS-10 CTY input word (from 8080)
---- --
Bits 20-27 0 -- no action, 1 -- CTY character pending
Bits 28-35 CTY-character
Word 33 KS-10 CTY output word (to 8080)
---- --
Bits 20-27 0 -- no action, 1 -- CTY character pending
Bits 28-35 CTY-Character
Word 34 KS-10 KLINIK user input word (from 8080)
---- --
Bits 20-27 0 -- no action, 1 -- KLINIK character,
2 -- KLINIK active, 3 -- KLINIK carrier loss
Bits 28-35 KLINIK-Character
Word 35 KS-10 KLINIK user output word (to 8080)
---- --
Bits 20-27 0 -- no action, 1 -- KLINIK character, 2 -- Hangup request
Bits 28-35 KLINIK-Character
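Words 32 through 35 all share one layout: a flag in bits 20-27 and a character in bits 28-35. As a sketch (Python, hypothetical names, PDP-10 bit numbering assumed so that bits 28-35 are the low-order eight bits), packing and unpacking such a word looks like this:

```python
# Flag values for word 34 (KLINIK input), from the table above.
KLINIK_INPUT_FLAGS = {
    0: "no action",
    1: "KLINIK character",
    2: "KLINIK active",
    3: "KLINIK carrier loss",
}


def pack_comm_word(flag, char_code):
    """Build a communication word: flag in bits 20-27, char in bits 28-35."""
    return ((flag & 0o377) << 8) | (char_code & 0o377)


def unpack_comm_word(word):
    """Return (flag, character code) from a communication word."""
    return (word >> 8) & 0o377, word & 0o377


# Invented example: a pending character "A" (octal 101).
word = pack_comm_word(1, ord("A"))
flag, code = unpack_comm_word(word)
print(oct(word), KLINIK_INPUT_FLAGS[flag], chr(code))
```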
OUTPUT process KS10 ==> 8080
----------------------------
Load character and flag into 33, set 8080-interrupt, 8080 examines
33 and gets character, clears interrupt, sends character to hardware,
clears 33 and sets KS-10 interrupt.
INPUT process 8080 ==> KS10
---------------------------
8080 gets interrupted "TTY-char available", 8080 gets character and
delivers into input-word (32) with flag(s) and sets KS-10 interrupt.
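The two handshakes can be modeled in a few lines to make the order of operations explicit. The Python below is a toy model only: the class and method names are invented, the interrupts are plain booleans, and it uses word 32 for CTY input and word 33 for CTY output as listed in the word table. Real behavior is defined by PROKS.MAC and the 8080 console program.

```python
CTY_INPUT_WORD = 0o32    # from 8080
CTY_OUTPUT_WORD = 0o33   # to 8080
CHAR_PENDING = 1         # flag value carried in bits 20-27


class CommArea:
    """Toy model of the in-core communication area (hypothetical)."""

    def __init__(self):
        self.memory = {CTY_INPUT_WORD: 0, CTY_OUTPUT_WORD: 0}
        self.interrupt_8080 = False
        self.interrupt_ks10 = False
        self.printed = []  # characters the 8080 passed to the hardware

    def ks10_send(self, char):
        """KS10 side of output: load character and flag, interrupt 8080."""
        self.memory[CTY_OUTPUT_WORD] = (CHAR_PENDING << 8) | (ord(char) & 0o377)
        self.interrupt_8080 = True

    def i8080_service_output(self):
        """8080 side of output: take character, clear word, interrupt KS10."""
        word = self.memory[CTY_OUTPUT_WORD]
        self.interrupt_8080 = False
        if (word >> 8) & 0o377 == CHAR_PENDING:
            self.printed.append(chr(word & 0o377))
        self.memory[CTY_OUTPUT_WORD] = 0
        self.interrupt_ks10 = True

    def i8080_deliver_input(self, char):
        """8080 side of input: store character with flag, interrupt KS10."""
        self.memory[CTY_INPUT_WORD] = (CHAR_PENDING << 8) | (ord(char) & 0o377)
        self.interrupt_ks10 = True


area = CommArea()
area.ks10_send("A")
area.i8080_service_output()
area.i8080_deliver_input("B")
print(area.printed, oct(area.memory[CTY_INPUT_WORD]))
```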
***NOTE: Additional information on KS10 console commands can be found
in the KS10 MAINTENANCE GUIDE
CRASH DUMPS
===========
Each time there is a BUGHLT there is an automatic dumping of the
system core image into PS:<SYSTEM>DUMP.EXE. If there is sufficient
room on the DSK the data that was previously in DUMP.EXE will be
copied into DUMP.CPY by SETSPD after the system is reloaded. DUMP.CPY
does not get deleted and you may find several generations of DUMP.CPY.
If you have disabled auto reload, you can dump the crash by hand
by typing /D to the system BOOT> prompt. You can get into BOOT if you
are reloading the system by bringing the system up from the switch
registers rather than hitting <ENABLE> <DISK> on the console. See the
Operators Guide for a discussion of the meaning of the various
switches on the DEC-20.
CRASH ANALYSIS
--------------
First when analyzing software or software/hardware problems be sure
you have the proper tools:
1. A SWSKIT on magtape
2. A full copy of the current release microfiche MONITOR and
EXEC.
3. A MONITOR CALLS REFERENCE MANUAL.
4. A SYSERR manual.
5. A listing of the SYSERR log, especially if hardware is
suspected.
6. A CTY output for BUGHLTs and BUGINFs or other problem
indications, or an accurate reproduction of this information.
7. Any other manuals you may need for reference such as the
proper version Installation Guide, Operators Guide, System
Managers Guide, etc.
8. A TOPS-20.BWR file.
You will need the SWSKIT and perhaps listings of the latest versions
of monitor modules in case the microfiche are not up to date. FILDDT
is on the customers distribution tape.
Be sure you have analysed the SYSERR log. Be sure, also, that you
have looked up the BUGHLT and/or BUGCHKs in question in the listings
(microfiche) and have at least read the comments around them.
Probably tracing down how it got called is a good idea. If you happen
to be without a GLOB (provided on microfiche) you can find the BUGHLT
tag of interest in the monitor as follows:
$GET <SYSTEM>MONITR.EXE
$ST 140
DDT
ILPP3? ; BUGHLT of interest followed by "?"
PAGEM G ; it is defined in PAGEM and is global
Some other useful bits of information. There is a GLOB listing
provided in the microfiche which contains a list of all the global
symbols in the monitor. Most of the symbols are defined in the module
STG.MAC. If you don't know a tag name but want to look at the storage
for DTEs, say, look through STG. STG also contains some small portion
of code mostly to do with restart, start, auto reload, dispatches for
PI channels and a few scheduler tests. STG stands for storage. Note
that some stuff may be defined in PROLOG, and of course lots of stuff
is defined throughout the monitor. You may also want to get a listing
of MACSYM to be able to understand the macros you see while reading
the monitor listings; MONSYM is also useful at times. Be sure you
know how PARAMS has been changed in case it has. See BUILD.MEM on the
distribution tapes for the currently distributed information on what
to do to change various system parameters in PARAM0.MAC. Be sure that
you know about any variables that the site may have changed in STG as
well.
EXAMINING THE MONITOR
---------------------
Debugging a complex, multi-process software system is largely a matter
of absorbing sufficient knowledge, experience and folklore about the
particular system with a considerable element of personal preference,
or 'taste' also involved. This document is a cursory description of
features built into the system to aid debugging, and such folklore as
can be described in written English.
There are four different versions of DDT that may be used to examine
the monitor. Each is used for a different purpose and has special
capabilities. The versions of DDT are:
1. UDDT (user DDT) used to examine or modify the MONITR.EXE
file.
2. MDDT (monitor DDT) used to examine or modify the running
monitor under timesharing.
3. EDDT (exec DDT) used to examine or modify the running monitor
from the CTY in a stand-alone mode.
4. FILDDT used to examine dumps.
All the DDT's are versions of TOPS-20 DDT documented in the TOPS-20
DDT manual, and have all of the features described in the manual. See
also the document DDT41.MEM.
The use of all four versions of DDT is the same and will be
described later; however, each version is started differently.
UDDT:
----
To use UDDT to modify your MONITR.EXE file on the system, you must give
the following EXEC commands:
@GET <SYSTEM>MONITR.EXE
@START 140 or on Release 4 systems, @DDT
This causes EDDT to start in user mode. This is the same DDT that is
used when examining any program. You may now look at or change any
part of the monitor. If you make changes to the monitor and want to
save it, you should get back to the EXEC by typing ^Z. Then you may
save the monitor.
You will probably have to be enabled in order to save the monitor back
in <SYSTEM>. This is the safest, best, and recommended method of
putting patches into the monitor.
MDDT:
----
A version of DDT which runs in monitor space is available. It can
examine and change the running monitor, and can breakpoint code
running as a process but not at PI or scheduler level. When patching
or breakpointing the swappable monitor, the normal write protection
must be defeated, either by setting DBUGSW to 2 on startup, or calling
SWPMWE. If you insert breakpoints with MDDT, remember monitor code is
reentrant and shared so that the breakpoint could be hit by any other
process in the system. In this event, the other process will most
likely crash since it will be executing a JSR to a page full of zeros.
To use MDDT you must have WHEEL or OPERATOR capabilities. You first
issue the EXEC command:
@ENABLE
$^EQUIT
; You are now in the mini-exec and receive a prompt
; of MX>. Now you give the "/" command:
MX>/
; You are now put into MDDT. To leave MDDT you can
; type ^Z or ^C, which produces a message like
; "INTERRUPT AT 17372" and returns you to the
; mini-exec. If you type a ^P in MDDT you will get
; a message, "ABORT", and be returned to the
; mini-exec. Once you have been in the mini-exec,
; the CONTROL-P interrupt is enabled, and typing this
; character will return you to the mini-exec. This
; is a good thing to use when debugging programs
; that do CONTROL-C trapping. From the mini-exec
; you may give either:
MX>S
; or
MX>E
; The S is filled out as START and the E as EXEC.
; Both of these commands will return you to the
; EXEC. See the document EXEC-DEBUGGING.MEM for more
; about ^P and getting out of the EXEC to MX> and
; returning from MX> to either your copy of the EXEC
; or the system EXEC.
; You may also give the command:
MRETN$G
; from MDDT to return directly to the EXEC. While
; in MDDT you may examine any core location in the
; running monitor. You may also change any location
; in the resident monitor (done frequently by
; accident). If you wish to change any of the
; locations in the swappable monitor you must give
; the command:
CALL SWPMWE$X
; To write enable the monitor. After you have made
; your changes you must give the command:
CALL SWPMWP$X
; to write protect the monitor again.
MDDT may also be entered from process level via JSYS:
JSYS 777$X
or
MDDT%$X ; will enter MDDT from the context of the current process
If you wish to examine the system from the EXEC's inferior fork monitor
context:
@ENA
$SDDT
DDT
JSYS 777$X
MDDT
To return to user context:
MRETN$G
Use SETMPG to map pages into this context. Page 677 has traditionally
been used for this, but any unused page may be used. To make sure that
the page is currently unused, type:
ADDRESS/ ? ; the question mark from DDT indicates that the
; page is nonexistent.
When the destination page has been found, set up AC2 as:
AC2/ ACCESS,,677000
If the page has its own SPT slot:
AC1/SPT INDEX
If the source page does not have its own SPT slot, it will belong to
either a file or process page table. It will be represented as an
index into this page table:
AC1/ SPT INDEX OF PAGE TABLE,,INDEX INTO PAGE TABLE
Access = read and/or write access
Read/Write access = 140000 in LH
Therefore, to map a page, call with either:
AC1/SPT INDEX OF PAGE
AC2/140000,,677000
or
AC1/SPT INDEX OF PAGETABLE,,INDEX INTO PAGE TABLE
AC2/140000,,677000
and say:
CALL SETMPG$X
The page will then be mapped to page 677. In examining locations
677000-677777, you will be looking at the contents of the page.
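The AC2 argument above is simply two 18-bit half-words packed into one
36-bit word, which is what DDT's "LH,,RH" notation means. As an
illustration only (Python is used here for convenience, and the helper
name "halves" is made up; none of this is monitor code), the value
140000,,677000 can be built like this:

```python
# Sketch only: pack PDP-10 half-words the way DDT's "LH,,RH" notation does.
# The helper name "halves" is hypothetical, chosen for this illustration.
HALF_MASK = 0o777777  # 18 bits per half-word


def halves(lh, rh):
    """Return the 36-bit word that DDT would display as LH,,RH."""
    return ((lh & HALF_MASK) << 18) | (rh & HALF_MASK)


READ_WRITE = 0o140000   # read/write access in the left half (from the text)
MAP_PAGE = 0o677        # the traditionally used destination page

# A page is 1000 (octal) words, so page 677 begins at address 677000.
ac2 = halves(READ_WRITE, MAP_PAGE << 9)
print(f"AC2/ {ac2 >> 18:o},,{ac2 & HALF_MASK:o}")
```

Changing MAP_PAGE to another unused page changes only the right half.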
If you desire to map another page into this slot, merely call SETMPG
again with arguments for the new page. You need not first un-map the
old page. However, when you are finished, page 677 should be
un-mapped in the following manner:
AC1/0
AC2/ACCESS,,677000
CALL SETMPG$X
WARNING:
Calling SETMPG incorrectly can crash the system. Be CAREFUL! Do not
use SETMPG on a time sharing system if a crash will cause bad
feelings.
EDDT:
----
NOTE
Not to be confused with the ^EEDDT command
of the command processor, which gets you
into UDDT. See the separate document on
EXEC DEBUGGING for that.
To get into EDDT you must bring the system up using the
switch-register. See the DECSYSTEM-20 Operators Guide for a
discussion of switches. Go through the KLINIT dialog and when you get
the prompt BOOT>, respond with:
BOOT>/L
BOOT>/G141
The "/L" command causes the monitor to be loaded, but not started.
The "/G141" starts the monitor at location 141, which is a jump to
EDDT. You can use EDDT like UDDT under timesharing on the MONITR.EXE
file by giving the following commands:
$GET <SYSTEM>MONITR.EXE
$START 140
EDDT is linked into the monitor and is always there. You may also get
to EDDT from MDDT by issuing the following:
EDDT$G
from MDDT. This stops timesharing. To resume timesharing and/or get
back to MDDT give the command:
MDDT$G ; back to MDDT
MRETN$G ; back to normal timesharing
Breakpoints may be inserted in the resident monitor with EDDT, but not
in the swappable monitor in general, because its pages may be swapped
out and be unavailable to EDDT. You can bring them in by typing:
SKIP LOC$X ; where LOC is some address not in core
There are some locations in the monitor that are very useful when
using EDDT for debugging. They must be set before going on to start
the monitor.
They are:
EDDTF     1    keep EDDT in core when system comes up
          0    delete DDT when system comes up (default)
DBUGSW    0    do not stop on BUGHLTs, crash and reload
          1    stop on BUGHLTs (hit EDDT breakpoint)
          2    write enable swappable monitor,
               do not start up SYSJOB, and stop on
               BUGHLTs. Also, it doesn't run CHECKD
               automatically on startup.
DCHKSW    0    do not stop on BUGCHKs (default)
          1    stop on BUGCHKs (hit EDDT breakpoint)
DINFSW    0    do not stop on BUGINFs (default)
          1    stop on BUGINFs (hit EDDT breakpoint)
In addition, the symbol GOTSWM appears in the code just after the
swappable monitor is loaded. So, if you want to debug the swappable
part of the monitor, you must put a breakpoint at GOTSWM (to get the
swappable part in core) with:
GOTSWM$B
Then start the monitor with:
147$G
CALL SWPMLK$X
CALL SWPMLK is used to lock swappable monitor in core for debugging.
You must have more than 96k of core to give this command since the
resident and swappable monitor are larger than 96k. To start up the
monitor after you have gone into EDDT and set up your breakpoints
(remember the last two are used for BUGHLT and BUGCHK) give the
command:
147$G
or
SYSGO1$G
If you are in EDDT and DBUGSW is not 2, that is, the monitor is write
protected, you can use the routines SWPMWE and SWPMWP to write enable
and write protect the monitor; for example, type CALL SWPMWE$X in DDT.
FILDDT:
------
FILDDT is distributed on the customer software tape.
The following is a chewed-up FILDDT.HLP file.
GET (FILE) FILE-SPEC
Loads a file for DDT to examine. If you are looking at a monitor dump
you must load DUMP.CPY explicitly: FILDDT looks for MUMBLE.EXE, not
MUMBLE.CPY, so DUMP<ESC> will either tell you that there is no such
file or load DUMP.EXE. When looking at a dump for which you wish to
load the symbols, you must first issue the LOAD command, followed by the
GET command. Be sure that the file from which you get the symbols is the
same version as the dump. Be sure, also, that the monitor that was
dumped is the same monitor you use for symbols. That is, don't get
MONMED symbols to use with MONBCH, etc.
LOAD (SYMBOLS FROM) FILE-SPEC
Reads specified file and builds internal symbol table. This must be
the first command to FILDDT before "GET" when looking at a dump. You
will most probably use <SYSTEM>MONITR.EXE which would have been the
monitor running at the time of the dump.
EXIT (FROM FILDDT)
Returns to command level. You then may type a save command if a load
command was just done to preload symbols. You will get a version of
FILDDT that has the symbols you just loaded in it so you no longer
need to "LOAD" symbols. You now have a monitor specific FILDDT, which
was common practice for TOPS-10, but is not generally done for
TOPS-20.
HELP
Types something like this text.
ENABLE PATCHING
Allows writing on an existing file specified by a GET.
ENABLE DATA-FILE
Assumes file is raw binary (i.e. no ACs, and not an EXE file).
DDT FEATURES:
EP$U Sets monitor context for FILDDT mapping. EP is a symbol
which is equal to the page number of the EPT. (Rel 4)
<CTRL/E> Returns to FILDDT command level.
TRACKING DOWN UNMAPPED ADDRESSES:
The resident monitor may be looked at without any difficulties, but
the swappable monitor may not be in core at the time of the dump. If
the value of the symbol is in the swappable monitor you must sometimes
go through the monitor map to find where the location really is. The
location MONCOR contains the number of pages of resident monitor and
the location SWPCP0 contains the first page of real core for swapping.
So if the value of the symbol is greater than the contents of MONCOR
times 1000 (octal, the number of words in a page), then it is in the
swappable monitor.
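The test above amounts to one multiplication and one comparison. A
minimal sketch in Python, assuming a made-up MONCOR value of 300
(octal); the function name is hypothetical, and the real value must be
read from your own dump:

```python
PAGE_WORDS = 0o1000  # words per page (octal 1000 = 512 decimal)


def in_swappable_monitor(symbol_value, moncor):
    """Apply the rule from the text: value > C(MONCOR) * 1000 (octal)."""
    return symbol_value > moncor * PAGE_WORDS


moncor = 0o300  # hypothetical contents of MONCOR, for illustration only
print(in_swappable_monitor(0o324621, moncor))  # a high monitor address
print(in_swappable_monitor(0o012345, moncor))  # a low (resident) address
```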
If the page of the swappable monitor you want to look at is in core, it
will probably not be in core in the location that its address refers
to, since the dump is of core and relocation of pages does not happen.
To find where a symbol really is in the dump, first type the symbol
followed by an "=". DDT will respond with the value of this symbol.
The value of the symbol can be divided into two three-octal-digit
fields. The high order three digits are the page number and the low
order three digits are the offset into the page.
If the value of the symbol is 324621 the high order three digits, 324,
are the page number and the low order three digits, 621, are the
offset into the page. To find the location of the page in question in
the dump you must look at the monitor map indexed by the page number.
For example:
MMAP+324/
would give you the monitor map word for page 324. This word contains
some protection bits for the page and the address of the page when the
dump was taken.
The page may have been in core, on the swapping area or on the disk at
the time of the dump.
If bits 14-17 in the monitor map word are non-zero the page
was on the swapping area or disk and is no longer available.
If bits 14-17 are zero then the page was in core, and the right half
of the word contains the page number in the dump of the page you are
looking for (the dump program overwrites the last several pages of
memory, the dump therefore does not contain these last pages.)
If the page was in core the new address of the symbol you are looking
for can be found by using the page number from the monitor map word
and appending the offset into the page to it. For example if MMAP+324
contains 104000,,256; then the new address of our symbol would be
256621.
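The lookup just described can be written down as a short calculation.
The following Python sketch is an illustration only (the function name
is made up, and FILDDT does nothing of the kind for you); it uses the
numbers from the example above:

```python
HALF = 0o777777  # mask for an 18-bit half-word


def dump_address(symbol_value, mmap_word):
    """Return the symbol's address in the dump, or None if not in core."""
    # Bits 14-17 are the low four bits of the left half of the word.
    if (mmap_word >> 18) & 0o17:
        return None                   # page was on swapping area or disk
    core_page = mmap_word & HALF      # right half: page number in the dump
    offset = symbol_value & 0o777     # low three octal digits of the value
    return (core_page << 9) | offset


symbol = 0o324621
print(f"look at MMAP+{symbol >> 9:o}")      # page number indexes the map
mmap_word = (0o104000 << 18) | 0o256        # 104000,,256 from the text
print(f"{dump_address(symbol, mmap_word):o}")
```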
All addresses in the swappable monitor must be resolved in this manner.
In addition, addresses of 600000 and above are in the JSB or PSB (PSB is
page 777) and must be resolved by finding the page containing the JSB
or PSB of the process that was running when the dump occurred. There
are some locations and tables in the monitor that make this easy:
NAME      INDEX     DESCRIPTION
FORKX     none      Number of the fork that was running at the time of
                    the dump, -1 if in the scheduler.
JOBNO     In PSB    Job number to which current fork belongs.
FKJOB     Fork #    Job number,,SPT index of JSB
JOBDIR    Job #     Logged in directory number
JOBPT     Job #     Controlling TTY number,,top fork number
FKSTAT    Fork #    Test data,,address of fork wait routine
FKPGS     Fork #    SPT index of page table,,SPT index of PSB
SPT indexes are indexes into a share pointer table starting at SPT.
To find the PSB of fork 20, you first look at FKPGS+20. If this
location contains 425,,426, the word at SPT+426 is the pointer to the
PSB. This pointer can point to disk, swap area, or a page in the
dump. If bits 14-17 are zero it is a pointer to a page in the dump
and the right half of the SPT word is the page number of the PSB in
the dump.
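The chain from fork number to PSB page can be sketched the same way.
In this Python illustration the tables are modelled as small
dictionaries, the function name is made up, and the contents of
SPT+426 are invented for the example; only the 425,,426 value comes
from the text:

```python
HALF = 0o777777  # mask for an 18-bit half-word


def psb_page_in_dump(fork, fkpgs, spt):
    """Return the dump page holding the fork's PSB, or None if not in core."""
    psb_index = fkpgs[fork] & HALF    # right half of FKPGS: SPT index of PSB
    spt_word = spt[psb_index]
    if (spt_word >> 18) & 0o17:       # bits 14-17 non-zero: disk or swap area
        return None
    return spt_word & HALF            # right half: page number in the dump


fkpgs = {0o20: (0o425 << 18) | 0o426}   # FKPGS+20 contains 425,,426
spt = {0o426: 0o1234}                   # hypothetical contents of SPT+426
page = psb_page_in_dump(0o20, fkpgs, spt)
print(f"PSB of fork 20 is in dump page {page:o}")
```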
When you look at a dump, you should first try to find why the dump
occurred by looking at the location BUGHLT. If BUGHLT is zero then you
should check the CTY log to find out why the dump was taken and for
information like the PC at the time of the dump and the status of the
PI system. If BUGHLT is non-zero it is the address of where the
BUGHLT was issued. You should look up the BUGHLT in BUGSTRINGS.TXT or
BUGS.MAC to find additional information about the BUGHLT. If at this
point you are not sure as to why the BUGHLT occurred, you will have to
look at the listings for more information. A copy of BUGSTRINGS.TXT
is in Appendix A of the Operators manual. You can find the location
of the call to the BUGHLT by typing the BUGHLT tag to DDT followed by
a "?". DDT will tell which monitor module the BUGHLT is in and you
can go to your microfiche and read all about the conditions
precipitating the BUGHLT.
Next if necessary look at FORKX. If it contains a -1 the scheduler
was running; otherwise it is the number of the fork that was running
when the crash occurred. The registers are saved at BUGACS on a
BUGHLT, but if BUGACS+17 contains something,,BUGPDL+n, then the
registers are invalid and you must go to the SYSERR buffer to get the
good registers. This is done by adding to the right half of the
SYSERR buffer pointer, SEBQOU, the offset into the buffer for the
heading and ACs, SEBDAT+BG%ACS. This value points to a block of 16
words containing the user's ACs. You may have to chain down more than
one queued-up SYSERR entry to get to the BUGHLT block.
NOTE
Do not forget to get a printout of the
SYSERR log, which will give you and the
field service representative much of the
information you can get out of the dump.
The SYSERR output is much easier to
examine; clearly, however, you cannot get
as much info from it as you can from a dump.
Some other locations in the PSB of interest are:
LOCATION  DESCRIPTION
UAC       User's ACs when he did his last JSYS
PAC       Monitor's ACs
PPC       Processor's PC
UPDL      User's pushdown stack while in a JSYS
NSKED     0 = ok to run scheduler
          >0 = cannot run scheduler
INTDF     -1 = ok to receive software interrupts
          >=0 = cannot receive software interrupts
It may be useful to know the status of a fork when it is hung or you
are unsure of its status. This can be determined by looking at FKSTAT
indexed by the fork number. The right half of this location is the
address of a test routine and the left half is data to be tested. For
example if FKSTAT+12 contains 23,,FKWAT, then fork 12 is waiting for
fork 23 to complete. FKWAT is a routine that waits for another fork
to complete and its data (the left half of the word) is the number of
the fork it is waiting for. There are many different wait routines
and you will have to look at the code to see what individual ones are
waiting for. There is a memo on scheduler tests which details almost
all of the scheduler tests in the monitor.
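Reading a FKSTAT word is a matter of splitting it into its two halves.
A small Python sketch of the example above follows; the address given
for FKWAT is invented, since the real value depends on the monitor,
and the symbol table is modelled as a dictionary:

```python
HALF = 0o777777  # mask for an 18-bit half-word


def decode_fkstat(word, symbols):
    """Split a FKSTAT word into (test data, name of test routine)."""
    data = (word >> 18) & HALF        # left half: data to be tested
    routine = word & HALF             # right half: test routine address
    # Fall back to the octal address when no symbol is known.
    return data, symbols.get(routine, f"{routine:o}")


symbols = {0o54321: "FKWAT"}          # hypothetical address for FKWAT
fkstat_12 = (0o23 << 18) | 0o54321    # FKSTAT+12 contains 23,,FKWAT
data, routine = decode_fkstat(fkstat_12, symbols)
print(f"fork 12: test routine {routine}, data {data:o}")
```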
You can easily determine all of the forks associated with a job by
giving the commands:
-1,,0$M
FKJOB<FKJOB+NFKS>N,,0$W
Where N is the job you are looking for. A fork structure can usually
be determined by looking at the FKSTAT of the forks and seeing which
forks are waiting on which forks. A FKSTAT of FKSKP indicates a fork
is inactive.
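The two DDT commands above set the search mask to the left half of the
word and then search FKJOB for entries whose left half equals N. The
same masked comparison is sketched below in Python; the table contents
are invented and the function name is made up for the illustration:

```python
LEFT_HALF_MASK = 0o777777 << 18   # the mask that "-1,,0$M" sets


def masked_search(table, pattern, mask):
    """Return the indexes whose words match pattern in the masked bits."""
    return [i for i, word in enumerate(table)
            if ((word ^ pattern) & mask) == 0]


# Hypothetical FKJOB table: job number,,SPT index of JSB, indexed by fork.
fkjob = [
    (3 << 18) | 0o401,
    (5 << 18) | 0o402,
    (3 << 18) | 0o401,
    (7 << 18) | 0o403,
]
job = 3
print(masked_search(fkjob, job << 18, LEFT_HALF_MASK))  # forks of job 3
```

The indexes returned are fork numbers, which can then be used to index
FKSTAT and FKPGS.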
You should refer to STG.MAC for other fork and job tables and other
locations in the PSB and JSB of interest. All of the above locations
can be examined with MDDT or EDDT while the monitor is running. Of
course at these times you do not have to go through MMAP and the PSB
and JSB that are in core are your own.
There are two separate patch areas in the monitor (FFF and SWPF). FFF
is the resident patch area and SWPF is the swappable patch area. These
two symbols should be updated to point to the next free location in
the patch area when a patch is inserted. PAT.. is defined to be
equal to SWPF. By convention, all distributed patches are applied at
FFF. This serves the purposes of reducing confusion, always working
until the patch area is exhausted, and leaving patches always present
in a dump for the cases where that is important.
There are several general purpose routines that can be used to look at
the monitor while it is running. These routines should be used
with caution since it is certainly possible to crash the monitor by
using them incorrectly. Two of the more general routines are MAPDIR,
for mapping a directory into core, and SETMPG for mapping pages
(someone else's PSB or JSB) into core. You will have to look at the
listing for the exact use of these and other general routines. Beware
of the precautions that should be taken when using them. You can find
the module they are located in by looking in the GLOB listing which is
a cross reference listing of all the global symbols in the monitor.
You get a GLOB listing in your microfiche.
BUGHLT, BUGCHK, BUGINF
------ ------ ------
The monitor contains a considerable number of internal redundancy
checks which generally serve to prevent unexpected hardware or
software failures from cascading into severely destructive reactions.
Also, by detecting failures early, they tend to expedite the
correction of errors.
There are two failure routines, BUGCHK and BUGHLT for lesser and
greater severity of failures. Calls to them with JSR are included in
code by use of a macro which records the locations and a text string
describing the failure. The general form is:
BUG (TYPE,NAME,<STRING>)
Where type is HLT or CHK, and string describes the cause.
For example,
BUG(HLT,SKDPFL,<PAGE FAULT FROM SCHEDULER CONTEXT>)
The strings are constructed during loading and are dumped into a file.
The BUGSTRINGS.TXT file will produce an ordered listing of the bug
messages for operator or programmer use.
BUGCHK is used where the inconsistency detected is probably not fatal
to the system or to the job being run, or which can probably be
corrected automatically.
Typical is the sequence in MRETN in the SCHED module.
AOSGE INTDF
BUG(CHK,IDFOD2,<AT MRETN - INTDF OVERLY DECREMENTED>)
This BUGCHK is included strictly as a debugging aid. Detection of a
failure takes no corrective action. This situation usually results
from executing one or more excessive OKINT operations (not balanced by
a preceding NOINT). It is considered a problem because a NOINT
executed when INTDF has been overly decremented will not inhibit
interrupts and will not protect code changing sensitive data.
BUGHLT is used where the failure detected is likely to preclude
further proper operation of the system or file storage might be
jeopardized by attempted further operation. For example, the
following appears in the SCHED module:
MOVE 1,TODCLK ;CURRENT TIME
CAML 1,CHKTIM ;TIME AT WHICH JOB0 OVERDUE
BUG(HLT,J0NRUN,<JOB 0 NOT RUN FOR TOO LONG>)
This check accomplishes two things:
1. A function of JOB0 is to periodically update the disk
version of bittables, file directories and other
files. Absence of this function would make the system
vulnerable to considerable loss of information on a
crash which loses core and swapping storage. JOB 0
protects itself against various types of malfunction;
this BUGHLT detects any failure resulting in a hangup.
2. Detects if the entire system has become hung due to
failure of the swapping device or some such event, on
the basis that if JOB 0 isn't running, nobody's
running.
NOTE
For Release 4, the form the BUGxxx
calls take in the program has been modified,
and the new file BUGS.MAC contains
hopefully useful information on each of
the BUGxxx calls in one place. This
should be considered a required
debugging file.
DBUGSW:
A monitor cell, DBUGSW, controls the behavior of BUGHLT and BUGCHK
when they are called. DBUGSW is set according to whether the system
is attended by system programmers.
If C(DBUGSW)=0, the system is not attended by system programmers, so
all automatic crash handling is invoked. BUGCHK will return +1
immediately, appearing effectively as NOP. BUGHLT will, if called
from the scheduler or at PI level, invoke a total reload from the disk
and a restart of the system. The BUGCHK/INF output will appear on the
CTY and in the SYSERR log when JOB0 gets around to them.
If the system continues to run or is restarted properly, the location
of the bug (saved over a reload) and its message will be reported on
the CTY.
If C(DBUGSW).NEQ.0, the system is attended, and one of the EDDT
breakpoints will be hit. This allows the programmer to look for the
bug and/or possibly correct the difficulty and proceed. There are two
defined non-zero settings of DBUGSW, 1 and 2, which have the following
distinction.
C(DBUGSW) = 1
Operation is the same as with 0 except for breakpoint
action. In particular the swappable monitor is write
protected and SYSJOB is started at startup as
described.
C(DBUGSW) = 2
Is used for actual system debugging. The swappable
monitor is not write protected so it may conveniently
be patched or breakpointed, and the SYSJOB operation
is not started to save time.
BUGCHK and BUGHLT procedures are the same as for 1.
The following is a summary of DBUGSW settings:
                        0              1                2
MEANING                 Unattended     Attended         Debugging
BUGCHK action           NOP            Hit Breakpoint   Hit Breakpoint
BUGHLT action           Crash System   Hit Breakpoint   Hit Breakpoint
SWPMON write protect?   Yes            Yes              No
CHECKD on startup       Yes            Yes              No
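For anyone writing a small tool to annotate dumps, the summary can be
carried as a lookup table. The Python sketch below only restates the
table above; the field names and the function name are made up for the
illustration:

```python
# DBUGSW summary restated as data; field names are hypothetical.
DBUGSW_SETTINGS = {
    0: {"meaning": "Unattended", "bugchk": "NOP",
        "bughlt": "Crash System",
        "swpmon_write_protected": True, "checkd_on_startup": True},
    1: {"meaning": "Attended", "bugchk": "Hit Breakpoint",
        "bughlt": "Hit Breakpoint",
        "swpmon_write_protected": True, "checkd_on_startup": True},
    2: {"meaning": "Debugging", "bugchk": "Hit Breakpoint",
        "bughlt": "Hit Breakpoint",
        "swpmon_write_protected": False, "checkd_on_startup": False},
}


def describe_dbugsw(value):
    """Return a one-line description of a DBUGSW setting."""
    s = DBUGSW_SETTINGS[value]
    return (f"DBUGSW={value} ({s['meaning']}): "
            f"BUGCHK -> {s['bugchk']}, BUGHLT -> {s['bughlt']}")


print(describe_dbugsw(0))
print(describe_dbugsw(2))
```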
Other console functions:
In addition to EDDT, several other entry points are defined as
absolute addresses. The machine may be started at these as
appropriate.
140 JRST EDDT ; go to EDDT
141 JRST SYSDDT ; reset and go to EDDT
142 JRST EDDT ; copy of EDDT address
143 JRST SYSLOD ; initialize file system
144 0
145 JRST SYSRST ; restart
146 JRST SYSGOX ; reload and start
147 JRST SYSGO1 ; start
The soft restart (address 145, EVRST) restarts all I/O devices, but
leaves the system tables intact. If it is successful, all jobs and
all (or all but 1) processes will continue in their previous state
without interruption. This may be used if an I/O device has
malfunctioned and not recovered properly. The total restart
initializes core, swapping storage and all monitor tables.
A very limited set of control functions for debugging purposes has
been built into the scheduler. To invoke a function, the appropriate
bit or bits are set into location 20 via MDDT. The word is scanned
from left to right (JFFO). The first 1 bit found will select the
function.
BIT 0:
Causes scheduler to dismiss current process if any and stall
(execute a JRST .), with -1 in AC0. Useful to effect a clean
manual transfer to EDDT. System may be resumed at SCHED0.
BIT 1:
Causes the job specified by data switch bits 18-35 to be run
exclusively. Temporarily defeats JOB 0 not run BUGHLT.
BIT 2:
Forces running of JOB 0 backup function before halting the
system.
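The left-to-right scan means that when several bits are set, only the
lowest-numbered one takes effect. The Python sketch below imitates
that scan on a 36-bit word, with bit 0 as the most significant bit; it
illustrates the rule and is not the scheduler's code, and the helper
names are made up:

```python
WORD_MASK = (1 << 36) - 1  # a PDP-10 word is 36 bits


def bit(n):
    """Return a word with only bit n set (bit 0 is the leftmost bit)."""
    return 1 << (35 - n)


def first_one_bit(word):
    """Return the number of the leftmost 1 bit, or None if word is zero."""
    word &= WORD_MASK
    if word == 0:
        return None
    return 36 - word.bit_length()


FUNCTIONS = {
    0: "dismiss current process and stall",
    1: "run the specified job exclusively",
    2: "force JOB 0 backup function before halting",
}

word = bit(1) | bit(2)                 # two requests set at once
print(FUNCTIONS[first_one_bit(word)])  # only the leftmost is selected
```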
1.0 INTRODUCTION
The purpose of this article is to provide some basic guidelines for
those who have never analyzed a TOPS-20 crash dump. The
information contained in this article refers exclusively to Version
4 of the TOPS-20 Monitor, although most of the basic principles
will also apply to earlier versions of the Monitor. None of the
concepts included in this article can be considered highly
advanced, indeed it is doubtful that there exists an "advanced"
methodology in crash dump analysis. Such techniques are the result
of nothing more than the continual exercise of the basic skills.
In all cases, the person who is to perform the analysis must be
familiar with the internal structures of the Monitor, which
requires their attendance at one of the TOPS-20 Monitor courses
offered by Educational Services. Obviously, one must know where to
look for a potential problem before hoping to solve it. For this
reason, this article assumes that the reader has an in-depth
knowledge of the basic structures of the TOPS-20 Monitor. Any
comments or suggestions to improve the content of this material
would be most welcome.
2.0 NECESSARY PREPARATIONS
Obviously enough, dumps do not simply appear as a result of a
crash. There are certain prerequisites to obtaining a dump, which
will be discussed in this section.
2.1 Creating The Dump File
TOPS-20 will not, as a rule, create a dump of the Monitor unless
the system is properly prepared to do so. This means that there
must first exist a file called PS:<SYSTEM>DUMP.EXE that will
accommodate the dump. This file can be found on the distribution
tape for TOPS-20, or it can be created by using the MAKDMP program,
which will accept the memory size from the user, and create the
proper sized file. The file must contain a number of pages equal
to the total number of pages of physical memory in the
DECSYSTEM-20. For example, a system that has 1024K words of
memory, let's say a 2060, should have a DUMP.EXE file that is 2048
pages long. It is important to remember that the number of pages in
the dump file must be twice the size of the machine's memory
capacity in K words. In addition, unless this file already exists
before the crash that we wish to capture, we will be unable to save
the image of the system, because the BOOT program hasn't the
ability to create such a file on its own.
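The factor of two follows from the page size: a page is 512 words, so
each K (1024) words of memory occupies two pages. As a quick check of
the arithmetic in Python (the function name is made up, and MAKDMP
itself is not being described here):

```python
WORDS_PER_PAGE = 512  # one TOPS-20 page


def dump_file_pages(memory_k_words):
    """Return the number of pages DUMP.EXE needs for the given memory size."""
    return memory_k_words * 1024 // WORDS_PER_PAGE


print(dump_file_pages(1024))  # the 1024K-word example from the text
```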
2.2 The BOOT
Normally, when the system has crashed for whatever reason, it will
reload itself using the BOOT program. This Auto-reload feature can
be suppressed, by giving the "SET NOT RELOAD" or "CLEAR RELOAD"
command to the PARSER. The PARSER must first be set in PROGRAMMER
mode, via the "SET CONSOLE PROGRAMMER" command. These commands do
not apply to 2020's, of course. Instead, there is a location in the
8080 which, when it contains the right number, will prevent automatic
reloads after crashes. The location depends on the revision level
of the ROM, which is typed at system startup. The following
commands will turn off auto-reload:
ROM level 0.1
KS10>LK 20255
KS10>DK 303
ROM level 4.2
KS10>LK 20256
KS10>DK 303
Also, patching the BUGHLT code where the reload is requested will
prevent an auto-reload. Placing a JFCL in locations BUGH2+3 and
BUGH2+4 in the running monitor will prevent the monitor from
issuing its request.
BOOT has only a limited file system capability for writing the file
that is to contain the dump, and in this manner avoids complicating a
possibly
compromised file structure during the reload. It is for this
reason that the DUMP.EXE file must already exist on the public
structure, for BOOT can find it there, but it can not create it if
it does not already exist. Also, because BOOT resides in main
memory of the host (KL10 or KS10) processor, small portions of the
Monitor will be overwritten when BOOT is loaded into memory.
Currently, BOOT is written into that area of the resident Monitor
that normally contains pure code, and as such is not usually of
much consequence. When one needs to refer to this portion of the
code, either the listings or fiche should be used.
If for some reason the system fails to auto-reload, then it is
still possible to obtain a copy of the dump. To do this, the front
end must have at least loaded the BOOT program, and the console
will display the BOOT prompt:
BOOT>
BOOT has a number of commands that may be used to manipulate the
contents of the processor memory; in this case, the command we
will use will cause BOOT to copy the contents of memory into
PS:<SYSTEM>DUMP.EXE:
BOOT>/D
BOOT>
At this point the system may be brought up normally, and the
analysis of the dump may begin.
Similarly, a KL-10 system may be set to suppress the auto-reload
facility, and the CTY will prompt with the KLI> prompt. Simply
typing the word "BOOT" will load the BOOT program into memory.
There are cases where the system may be completely hung, and it is
unclear how to best initiate an orderly shutdown. Obviously, it is
always possible to type the control-backslash (^\) character at the
CTY to get into the front-end parser, but then what can be done?
The front-end parser allows the operator to force the processor to
jump to a specified location, and in the case described above, this
feature may be used to force a BUGHLT. This can be done after
typing ^\, with the following commands:
PAR>SET CONSOLE PROGRAMMER
CONSOLE MODE: PROGRAMMER
PAR>JUMP 71
PAR>
causing the console to return to USER mode, connected to the KL-10.
This will be followed immediately by a KPALVH BUGHLT (Keep Alive
Halt), and the system will perform the usual BUGHLT procedures.
The above command forces the processor to jump to location 71,
which in turn will cause the BUGHLT, sweeping the cache to ensure
that all of the dump taken will contain valid data. Simply forcing the
processor to halt, and then reBOOTing and getting a dump, will cause
the cache to be invalidated, and random locations in the dump will
not contain valid data.
On the 2020 the equivalent command is "KS10>ST 71".
2.3 Getting A Front-end Dump
The front-end will generally create a crash dump file called
PS:<SYSTEM>0DUMP11.BIN, containing the core image of the PDP-11.
If the front-end is hung, and none of the terminals are usable, it
is still possible to obtain a dump of the -11. By setting the
HALT/ENABLE switch of the -11 to the HALT position, and then back
to the ENABLE position, the KL-10 will force the -11 to reload. In
the process of reloading the -11, the KL will indicate to the -11
that it has reloaded, and send the necessary information to set up
the terminals, and unit record devices connected to the -11. The
-11 will, in the process of reloading, dump the old core image into
the 0DUMP11.BIN file mentioned earlier. In the event that the
problem will be the subject of an SPR, the front-end crash dump
should also be included on the DUMPER tape with the SPR.
3.0 GENERAL INFORMATION
It would not be practical to define a method of approaching each
BUGHLT in the system, but the state of the system at the time of
the crash may be defined in terms of the data structures that it
accesses. By looking at the Monitor's stack, the status of the
current job, and process, and the condition of the Monitor's tables
that were in use by the code that BUGHLTed, we can define a limited
number of "types" of crashes, e.g., a scheduler crash, a pager
crash, an APR or device interrupt crash. Each crash will occur
while the Monitor is using a specific subset of the internal data
structures of the system. We will attempt to limit the number of
"types" of crashes based upon the function being performed by the
Monitor at the time of the crash. In the sections following this
general information, we will suggest some of the areas to check
when looking at each type of crash. This information is not
complete, but contains some of the information that is more
significant in each particular context.
3.1 The Basic Materials
The most important materials in looking at dumps are the source
listings of the Monitor. Either in the form of fiche, or in
machine-readable format, it is absolutely essential to have access
to listings of the Monitor to be able to analyze any dump, because
without these listings you would simply be working in darkness. In
order to understand the significance of any BUGHLT, the
circumstances of the BUGHLT must be known, as well as the reason
the Monitor could not continue. To find out this information, we
must look in the listings. After the system has re-BOOTed, it is
always a good idea to take note of the console output, including
the name of the BUGHLT, and any other associated console output.
Try to be sure that no unusual messages, other than the BUGHLT
itself, appeared on the console within a reasonable period of time
before the system crashed. BUGCHK's, BUGINF's, and "Problem on
device..." type messages are always significant. Similarly, a copy
of the output from the SYSERR program will be helpful in revealing
any failing hardware that should be investigated first. Always try
to eliminate the possibility of a hardware problem FIRST, especially
if the site has had any recent problems in this area. These last
two points are significant in determining the environment at the
time of the crash, and, in the event that the dump will be made
part of an SPR, the information will become essential.
Naturally, it will be necessary to have a copy of the MONITR.EXE
file that was running when the crash occurred, and a copy of FILDDT
to look at the dump. With these materials collected, we can
hopefully make a valid analysis of the dump.
Here is a list, then, of the necessary and helpful materials needed
to look at dumps:
1. The MONITR.EXE file
2. The DUMP.CPY file from the crash
3. A copy of FILDDT.EXE from the distribution tape
4. A copy of the SYSERR output
5. A complete set of Monitor and Exec Fiche or listings
6. The CTY output from the crash
7. The Monitor Calls Reference Manual
8. A copy of the SWSKIT tape
9. Any other TOPS-20 Manuals that may be appropriate, such as the
Operator's Guide, or the Installation Guide.
10. The TOPS20.BWR file
3.2 Identifying The Type Of Crash
The Monitor performs several basic operations, each of which has
its own set of tables and data structures. These operations can be
defined as:
1. JSYS processing
2. Page faults
3. PSI Service
4. Scheduling
5. DTE interrupt Service
6. Initiating I/O transfers (queueing)
7. Device interrupt Service
8. APR interrupt Service
3.2.1 The BUGHLT Itself -
There are specific areas in any crash dump that can be examined to
determine the status and context of the system at the time of the
crash. The most obvious of these is the location called BUGHLT,
which will contain the address whence the BUGHLT code was called.
It is good practice to remember when looking at this address that
there are portions of the monitor that were overwritten by the BOOT
program, when the dump was taken, and therefore, the contents of
the address that called the BUGHLT code, that is, the location
whose address is contained in location "BUGHLT", may not point to
the same code that the fiche or the listings indicate. A good
example of such a BUGHLT is a PTNIC1, one that is a part of the
APRSRV code, which is overwritten by BOOT.
As of Release 4, all of the BUGHLT's, as well as the BUGCHK's and
BUGINF's in the Monitor are defined and documented in a new module
called BUGS.MAC. This module not only contains, for each BUGHLT,
etc., the name and a string describing the type of halt, but also a
description of the circumstances that cause the halt, or check,
etc., to occur. There is a new argument to the macro that creates
the BUGHLT's, etc., that is supposed to indicate whether the
problem is hardware or software related. You will find either the
word "HARD" or "SOFT" in this location of the Macro call. In
addition, the additional information supplied in BUGCHK's and
BUGINF's now has a string associated with it that indicates what
the additional information actually represents. Finally, one
argument to the BUGDEF (bug definition) Macro is a narrative
documentation of circumstances that can cause the problem being
seen. Needless to say, this sort of information is invaluable to
anyone looking at a crash dump. Unfortunately, not all of the
documentation of the BUG's was completed, and as a result, many are
indicated as being "HARD" problems, when actually they are not.
Those BUGDEF's that include the narrative description of the BUG
have been completed, but those that do not may indicate falsely
that the problem is hardware related.
The BUGHLT's are performed by executing an XCT instruction whose
operand is a location that contains a JSR BUGHLT instruction. The
location following the JSR BUGHLT contains the name of the BUGHLT,
in SIXBIT format, such as "PTNIC1". Finally, in the event of multiple
BUGCHK's, BUGINF's or even nested BUGHLT's, the location "BUGNUM"
contains the number of BUGHLT's, BUGCHK's, and BUGINF's since the
last system start-up. This location is most helpful in obtaining a
clearer view of the circumstances of the crash. The case of the
BUGHLT code itself causing a BUGHLT is extremely unusual, but in
certain cases of extreme degradation of the system's data bases or
"pure" code pages, this is a possibility.
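As an illustration of the SIXBIT name stored after the JSR BUGHLT,
here is a minimal sketch in Python (it is not part of the Monitor or
of FILDDT) that converts a 36-bit word to and from its six-character
name. It assumes only the standard SIXBIT convention, in which each
6-bit code is the ASCII code minus 40 octal. The function names are
hypothetical.

```python
def sixbit_encode(name):
    """Pack up to six characters into one 36-bit SIXBIT word."""
    word = 0
    for ch in name.upper().ljust(6)[:6]:
        # SIXBIT code = ASCII code - 40 (octal); space encodes as 0
        word = (word << 6) | ((ord(ch) - 0o40) & 0o77)
    return word


def sixbit_decode(word):
    """Unpack a 36-bit SIXBIT word into its character string."""
    chars = []
    for shift in range(30, -1, -6):          # leftmost character first
        code = (word >> shift) & 0o77
        chars.append(chr(code + 0o40))
    return "".join(chars).rstrip()           # drop trailing blanks


if __name__ == "__main__":
    word = sixbit_encode("PTNIC1")
    print(f"{word:012o} = {sixbit_decode(word)}")
```

The word following the JSR BUGHLT in a dump can be fed to
sixbit_decode to confirm which BUGHLT was actually executed.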
3.2.2 The Monitor's Stacks -
The next piece of valuable information is contained in the stack
pointer, P. This location will point to one of eight possible
monitor stacks, and will give a strong indication about the context
of the monitor at the time of the crash. Identifying the type of
BUGHLT will usually be a direct indication of which stack will be
in use, however under certain circumstances, the monitor may crash
while changing from one stack to another, and such a circumstance
could provide a useful insight into the state of the system just
before the crash. The following are the names of the eight
possible monitor stacks, and the context under which each of them
is used:
UPDL This is the user stack, in that it is used when
processing a user's JSYS in exec mode. Whenever any
user executes a JSYS, this area in his PSB is used for
the stack. Those processes under job 0 which run in exec
mode will also use this stack.
TRAPSK This stack is used by the paging code whenever a process
page faults. Normally a page fault will occur while in
the midst of performing some other function, such as a
JSYS, and the stack pointer at the time of the page fault
will be in location TRAPAP, which in turn will in this
case point to UPDL plus some offset.
PIPDB This is used by the software interrupt handler.
SKDPDL This stack is used by the scheduler.
DTESTK This stack is used by the DTE interrupt service routines.
PHYPDL This stack is used by PHYSIO code in the process of
queueing I/O request blocks (IORB's). These IORB's are the
means by which RH20/RH11 data transfers are initiated.
PHYIPD This stack is used by the PHYSIO interrupt service
routines, and therefore is the interrupt-level equivalent
of PHYPDL. It is important to remember that these two
stacks are independent of each other, and should not be
confused.
MEMPP This stack is used when processing APR interrupts.
The stack that is being used, and the section of code that
executed the BUGHLT, will indicate the type of BUGHLT that has
occurred. File system BUGHLT's will be observed either while
performing a JSYS, servicing an interrupt, or otherwise attempting
to access a file system that has been corrupted to the point of
being unusable.
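The correspondence between stack and context listed above can be kept
as a small table while working through a dump. The following Python
sketch classifies the address in the right half of P against a set of
stack regions. The base addresses and lengths are hypothetical
placeholders; the real values must be taken from the symbols of the
MONITR.EXE that was running at the time of the crash.

```python
STACK_CONTEXT = {
    "UPDL": "JSYS processing in process context",
    "TRAPSK": "page fault handling",
    "PIPDB": "software interrupt handling",
    "SKDPDL": "scheduler",
    "DTESTK": "DTE interrupt service",
    "PHYPDL": "PHYSIO queueing of IORB's",
    "PHYIPD": "PHYSIO interrupt service",
    "MEMPP": "APR interrupt service",
}


def identify_stack(p_word, stack_regions):
    """Return (stack name, context) for the address in the right half of P.

    stack_regions maps a stack name to (base address, length in words).
    """
    address = p_word & 0o777777
    for name, (base, length) in stack_regions.items():
        if base <= address < base + length:
            return name, STACK_CONTEXT[name]
    return None, "P does not point into a known monitor stack"


# Hypothetical regions, for illustration only.
EXAMPLE_REGIONS = {"UPDL": (0o777000, 0o200), "SKDPDL": (0o60000, 0o100)}

if __name__ == "__main__":
    print(identify_stack(0o777740_777023, EXAMPLE_REGIONS))
```

A result of None is itself a useful hint, since it may mean that the
crash occurred while the Monitor was switching stacks.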
4.0 JSYS CONTEXT (UPDL)
When a process executes a JSYS, the Monitor performs the JSYS
by dispatching through a table called JSTAB to the proper routine.
These routines are named by convention as the JSYS name, preceded
by a ".", thus the routine to perform the JSYS PMAP is called
".PMAP::". This name is always a global symbol. The last JSYS
executed in user context is saved in the PSB for the process, in
location KIMUU1, and KIMUU1+1. The second of these locations will
contain the dispatch offset in JSTAB; this number, when combined
with the JSYS opcode (104000,,0), is the last JSYS performed by the
user. This, then, will point indirectly through the JSTAB table to
the place where the user JSYS began processing. By following the
code, and examining the stack, it is often possible to reconstruct
the events leading to the crash. The stack will contain two copies
of the user's program counter (PC) and flags in the first four
locations of UPDL. The PSB location MPP contains the stack
pointer at the time of the last JSYS. Each time the Monitor
performs a JSYS internally, the old contents of MPP are pushed
onto the stack, and MPP is set to the current value of P.
Initial JSYS stack set-up:
UPDL/ PC
UPDL+1/ flags
UPDL+2/ PC
UPDL+3/ flags
JSYS in Monitor context (nested JSYS):
UPDL+n/ INTDF ;old interrupts-deferred flag
/ MPP ;previous PC, or level of nesting
/ PC of JSYS
/ PC flags
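To make the reconstruction of the last user JSYS concrete, the
following Python sketch combines a dispatch offset, as found in
KIMUU1+1, with the JSYS opcode (104000,,0) and prints the result in
the usual left,,right halfword notation. It assumes the offset is
taken from the right half of the word. The offset 56 is only an
example value, and the helper names are hypothetical.

```python
JSYS_OPCODE_LH = 0o104000        # left half of a JSYS instruction word


def jsys_word(jstab_offset):
    """Combine the JSYS opcode with a JSTAB dispatch offset."""
    return (JSYS_OPCODE_LH << 18) | (jstab_offset & 0o777777)


def halfwords(word):
    """Format a 36-bit word as left,,right in octal."""
    return f"{(word >> 18) & 0o777777:o},,{word & 0o777777:o}"


if __name__ == "__main__":
    print(halfwords(jsys_word(0o56)))
```

The right half of the resulting word is the index to look up in
JSTAB in the listings.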
Some other useful locations in JSYS context are:
JSB Locations
USRNAM This contains the name of the user, in ASCII.
PSB Locations
JOBNO Contains the number of the job for this process.
FORKN Contains the fork number for the top fork of the job in
the left half of the word, and the fork number of the
current fork in the right.
INTDF Contains -1 if process is OKINT, 0 or greater if NOINT
(defer all software interrupts for this job)
NSKED Contains 0 if process is OKSKED, 1 or greater if NOSKED.
(defer scheduling of other forks)
Monitor Fork Table - indexed by the current fork number
FKCNO Contains the SPT offset that points to the second page of
the PSB in the left half of this word.
FKINT Contains the pseudo-interrupt communications register,
with flags in the left half describing the type of
request, and the channel number of the request in the
right half.
FKINTB Contains the pseudo-interrupt channel requests pending
since the fork's last PSI interrupt.
FKJOB Job number of the fork in the left half, and SPT index
for the JSB in the right half.
FKJTQ Part of a doubly linked list of forks that are waiting
program software interrupt the Monitor. JTLST points to
the top fork on the list.
FKNR Contains in bits 0-8 the age stamp value at the last time
local garbage collection was performed.
FKPGS Contains the SPT indices for the process page table, in
the left half, and the PSB in the right half.
FKPGST Contains the address of the routine to test for balance
set wait satisfied in the right half, with test data in
the left. If the fork is not in the balance set, this
contains the time of day that the fork entered a wait
list.
FKPT Part of a linked list of forks on a particular scheduler
list, such as GOLST, WTLST, etc. The right half of the
word contains the address of the next element in the
list, and the left half contains the amount of runtime
the fork's job will have accumulated when the fork
exceeds its Balance Set Hold time.
FKQ1 Contains the fork's remaining run quantum. When the
quantum expires, the fork is moved to a lower run queue,
and given the appropriate new quantum.
FKQ2 Contains the fork's scheduler queue level number in the
left half, and the list address, i.e. GOLST, WTLST,
etc., in the right.
FKSTAT Contains the address of the scheduler test routine which
will determine when the fork is available to be placed on
the GOLST.
FKTIME Contains the time of day, in internal format, that the
fork was placed on its current run queue.
FKWSP Contains the number of physical pages assigned to the
fork in the right half, and the working set size of the
fork when the fork entered the balance set in the left.
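Several of the locations above pack two quantities into the left and
right halves of one 36-bit word. As a reading aid, this Python sketch
splits FORKN, FKJOB, and FKPGS into the fields described above. The
sample words are invented for illustration and do not come from a
real dump; the function names are hypothetical.

```python
def halves(word):
    """Split a 36-bit word into (left half, right half)."""
    return (word >> 18) & 0o777777, word & 0o777777


def decode_fork_words(forkn, fkjob, fkpgs):
    """Name the halfword fields of FORKN, FKJOB and FKPGS."""
    top_fork, current_fork = halves(forkn)
    job_number, jsb_spt_index = halves(fkjob)
    page_table_spt_index, psb_spt_index = halves(fkpgs)
    return {
        "top fork of job": top_fork,
        "current fork": current_fork,
        "job number": job_number,
        "JSB SPT index": jsb_spt_index,
        "page table SPT index": page_table_spt_index,
        "PSB SPT index": psb_spt_index,
    }


if __name__ == "__main__":
    # Invented sample values, shown in octal.
    sample = decode_fork_words(0o000012_000015,
                               0o000007_000431,
                               0o000432_000433)
    for label, value in sample.items():
        print(f"{label:22} {value:o}")
```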
5.0 PAGER CONTEXT (TRAPSK)
Page faults trap through the user's UPT, by placing the old
flags and PC for the process in locations UPTPFL and UPTPFO
respectively, and taking the new PC from location UPTPFN. UPTPFN
will usually contain the address PGRTRP, which is the beginning of
the page fault code. The location being referenced and therefore
causing the page fault is stored in UPTPFW, also called TRAPS0.
This contains the virtual address that page faulted in bits 13-35.
Bit 0 of this word indicates if the location is in user or exec
(monitor) address space. If this bit is set, the address is in
user address space. The PGRTRP code copies TRAPS0 into TRAPSW, in
case of recursion. This code will determine the nature of the page
fault, and attempt to resolve it. UPTPFL and UPTPFO are also
called TRAPFL and TRAPPC respectively. The old stack pointer is
saved in location TRAPAP (this is only relevant if the page fault
occurred in exec mode). The new stack, TRAPSK, is set up according
to the context of the page fault, i.e., user context, monitor
context, or recursive page fault. A page fault in user mode causes
the stack to be set up with the runtime, return PC, and return PC
flags in the first three locations of the stack:
TRAPSK/ runtime
TRAPSK+1/ return PC
TRAPSK+2/ return PC flags
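The layout of the page fault word (UPTPFW, or TRAPS0) described above
can be unpacked as in the following Python sketch. It relies on the
PDP-10 convention that bit 0 is the most significant bit of the
36-bit word, so that bits 13-35 are the low-order 23 bits. Only the
two fields named in the text are decoded; the remaining bits are
ignored, and the function name is hypothetical.

```python
USER_BIT = 1 << 35               # bit 0: set if the address is in user space
ADDRESS_MASK = (1 << 23) - 1     # bits 13-35: virtual address that faulted


def decode_trap_word(word):
    """Return the user/exec indication and virtual address from TRAPS0."""
    return {
        "user_space": bool(word & USER_BIT),
        "virtual_address": word & ADDRESS_MASK,
    }


if __name__ == "__main__":
    info = decode_trap_word(USER_BIT | 0o1234567)
    space = "user" if info["user_space"] else "exec"
    print(f"{space} address {info['virtual_address']:o}")
```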
Page faults from monitor context have the following initial
stack set-up:
TRAPSK/ AC1
TRAPSK+1/ AC2
TRAPSK+2/ AC3
TRAPSK+3/ AC4
TRAPSK+4/ AC7
TRAPSK+5/ AC16
TRAPSK+6/ TRAPSW
TRAPSK+7/ runtime
TRAPSK+10/ PC
TRAPSK+11/ PC flags
Recursive page faults will cause the following set up in TRAPSK, at
the time of the page fault:
/ AC1
/ AC2
/ AC3
/ AC4
/ AC7
/ AC16
/ TRAPSW
/ PC
/ PC flags
Recursive page faults will indicate the level of recursion in
TRAPC. This location is normally set to -1 and is incremented
every time the page fault code is called, and decremented when a
page fault has been satisfied.
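Because TRAPC starts at -1, its contents in a dump must be read as a
signed 36-bit quantity. The Python sketch below converts the raw word
and reports how many page faults were in progress, on the assumption
(taken from the description above) that each entry to the page fault
code adds one and each completion subtracts one. The function names
are hypothetical.

```python
WORD_BITS = 36


def signed36(word):
    """Interpret a 36-bit word as a two's complement integer."""
    word &= (1 << WORD_BITS) - 1
    if word & (1 << (WORD_BITS - 1)):
        return word - (1 << WORD_BITS)
    return word


def page_faults_in_progress(trapc_word):
    """TRAPC is -1 when idle, 0 for one fault, 1 for one level of recursion."""
    return signed36(trapc_word) + 1


if __name__ == "__main__":
    for raw in (0o777777777777, 0, 1):
        print(f"TRAPC/ {raw:012o}  faults in progress: "
              f"{page_faults_in_progress(raw)}")
```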
In examining a pager crash, it is usually a good idea to begin
by tracing down the Monitor's table entries for the location that
faulted. This location is stored in location TRAPS0. The identity
of the page causing the trap is stored in location TRPID, and will
be in either of two forms: page table number in left, and page
number in right, or simply the page table number in the right. The
page table number is an SPT index, and the page number, if any, is
an offset into the page table pointed to by that SPT slot. There
are four Core Status Tables (CST's) indexed by physical page
number, that are used to keep track of each page in the machine. A
page fault crash will usually have bad data in either the SPT slot
indicated in TRPID, or one of the CST's for the physical page
pointed to indirectly through that SPT slot. If TRPID contains
PTN,,PN, then find location SPT+PTN. This should have a physical
page number in the right half. Look at this physical page, offset
by PN in TRPID to find the pointer to the page that caused the
fault. Shared and indirect pointers in this location will point
through another SPT location, but private pointers will point
directly at the physical page that we are looking for. If TRPID
contains just PTN, then SPT+PTN will point directly at the physical
page we are looking for. Knowing the physical page number, it is
now possible to examine the CST tables for that page.
CST0 Used principally by the pager hardware, this location
will contain the Process Use Register, mentioned in the
FKCNO table above, and the age stamp.
CST1 Contains the system lock count, and the backup address
for the page. The lock count indicates the number of
system events necessary before the page will be swapped
out. The system
should never swap out a page with a non-zero lock count.
The backup address can be a disk or drum address for a
page in memory.
CST2 Contains the home map location of the page, and should
match the contents of TRPID.
CST3 Is used by the software to create lists of pages in
various states of use. Those pages available for use
will be on the Replaceable Queue, and linked together in
a doubly linked list. Those pages awaiting swapping will
be on a swapping device queue, and part of a singly
linked list. Pages in use will contain the fork number
of the owner in bits 3-14, and the local disk address for
PHYSIO for the page.
CST5 Contains the list of short I/O Request Blocks (IORB's)
associated with the page.
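The lookup from TRPID through the SPT, described before the CST
entries above, can be sketched in Python as follows. This is an
illustration under stated assumptions rather than a description of
the actual Monitor code: it treats a zero left half in TRPID as
meaning that only the page table number is present, takes the
physical page number from the right half of the SPT entry as the
text describes, and uses a page size of 1000 (octal) words. The
read_word argument stands for whatever means is used to fetch a word
from the dump, and all addresses in the example are invented.

```python
PAGE_SIZE = 0o1000               # words per page


def halves(word):
    """Split a 36-bit word into (left half, right half)."""
    return (word >> 18) & 0o777777, word & 0o777777


def follow_trpid(read_word, spt_base, trpid):
    """Locate the map pointer or physical page identified by TRPID."""
    left, right = halves(trpid)
    if left:                                     # form PTN,,PN
        ptn, pn = left, right
        _, table_page = halves(read_word(spt_base + ptn))
        pointer_address = table_page * PAGE_SIZE + pn
        return {"form": "PTN,,PN",
                "pointer_address": pointer_address,
                "pointer": read_word(pointer_address)}
    ptn = right                                  # form with PTN only
    _, physical_page = halves(read_word(spt_base + ptn))
    return {"form": "PTN", "physical_page": physical_page}


if __name__ == "__main__":
    spt = 0o100000                               # invented SPT address
    dump = {spt + 0o431: 0o000000_000245,        # SPT slot 431 -> page 245
            0o245017: 0o123456_000777}           # pointer word for page 17
    result = follow_trpid(dump.get, spt, 0o000431_000017)
    print({k: (oct(v) if isinstance(v, int) else v)
           for k, v in result.items()})
```

The pointer word returned in the first form is left undecoded, since
shared and indirect pointers require a further step through the SPT.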
A few other significant locations for page faults are:
RPLQ Points to the beginning of the Replaceable Queue in CST3.
NRPLQ Contains the number of pages on the Replaceable Queue.
SWPLST Points to the beginning of the PHYSIO swap list, in CST3.
NOF Contains the number of OFN's in use in the SPT.
6.0 PSI CONTEXT (PIPDB)
Take note of the Monitor fork tables in the JSYS section of
this document. The locations FKINT and FKINTB will be useful in
determining the type and timing of PSI interrupts pending at the
time of the crash. When a process has a PSI interrupt pending, it
is flagged in the FKINT entry for that fork, and the scheduler will
take note of this event and set the PPC location in the PSB for
that process to contain the address PIRQ. This action takes place
at location SCHED5 in the scheduler. The next time that the
process is ready to run, it will continue at location PIRQ, which
will set up the PSI stack, PIPDB. SCHED5 also moves the PSI
request word from FKINT to PIMSK in the PSB. Thus, it is possible
to check this location for the last PSI request that was scheduled.
The old contents of PPC and PFL are stored in PIPC and PIFL by the
SCHED5 routine, so these will indicate the point where the process
was interrupted.
7.0 SCHEDULER CONTEXT (SKDPDL)
Take note of the Monitor Fork tables in the JSYS section of
this document. The scheduler is usually invoked in one of two
ways: through a software interrupt initiated by the channel 3 PI
routine, indicating that a set period of time has elapsed since the
last scheduler cycle, or through the ENTSKD macro, which is used by
a running process that is about to dismiss. In this way the
scheduler is guaranteed to run at regular intervals, or whenever
the system is idle. The primary entry point to the scheduler is
SCHED0. It is through this location that control passes whenever the
running process dismisses, or whenever one of the two scheduler
clock cycles elapses. Briefly, the hardware traps on every clock
tick through location TIMVIL in the EPT. This location contains
the instruction XPCW TIMINT. Again, as in the device interrupt
code, this instruction causes the flags and PC to be placed in
locations TIMINT, and TIMINT+1, and control passes to the location
in TIMINT+3, which in this case is TIMIN0. TIMIN0 determines
whether or not it is time to run the scheduler, and dismisses the
interrupt. If the scheduler is to be run, TIMIN0 initiates a
software interrupt on channel 7, which causes a trap through the
EPT location KIEPT+56 to PISC7R. The instruction executed in
KIEPT+56 is an XPCW PISC7R, causing the old PC and flags to be
deposited at PISC7R, and control to begin at PISC7+1. The PISC7
code sets up PPC and PFL to contain the old PC and flags, from
PISC7R, and saves the process ac's at the time of the interrupt in
a block of the PSB called PAC. Having set up for scheduler
context, the PISC7 code then transfers control to the SCHED0
routine. Similarly, the ENTSKD macro does an XPCW ENSKR, causing a
jump to the ENSKED routine that does the context switch. On the
2020 the clock will interrupt through location KIEPT+46 (standard
level 3 interrupt). The level 3 routine will first determine if
this interrupt was caused by a clock tick, and if so JRST to
routine TIMIN0.
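The XPCW dispatch described above can be modelled in a few lines.
The Python sketch below treats memory as a dictionary and shows the
four-word block that XPCW uses: the old flags and PC are stored in
the first two words, and the new flags and PC are taken from the
third and fourth. The numeric addresses given to TIMINT and TIMIN0
are invented for the example.

```python
def xpcw(memory, e, old_flags, old_pc):
    """Model XPCW E: save flags and PC at E and E+1, load new ones
    from E+2 and E+3. Returns (new flags, new PC)."""
    memory[e] = old_flags
    memory[e + 1] = old_pc
    return memory[e + 2], memory[e + 3]


# Invented addresses, for illustration only.
TIMINT = 0o5000
TIMIN0 = 0o5004

if __name__ == "__main__":
    memory = {TIMINT + 2: 0, TIMINT + 3: TIMIN0}
    new_flags, new_pc = xpcw(memory, TIMINT, old_flags=0o300000,
                             old_pc=0o1234)
    print(f"interrupted at {memory[TIMINT + 1]:o}, "
          f"continuing at {new_pc:o}")
```

In a dump, this means the first two words of such a block show where
the processor was when the most recent interrupt of that kind
occurred.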
Some other useful locations in scheduler context:
1. GOLST Points to the beginning of the GOLST in the FKPT
table.
2. WTLST Points to the Wait list in the FKPT table.
3. TTILST Points to the TTY input wait list in the FKPT table.
4. FRZLST Points to the list of frozen forks.
5. WT2LST Points to the list of forks waiting to be unblocked.
(UNBLK1)
6. TRMLST Points to the list of forks waiting for another fork
to terminate.
7. SUMNR Contains the number of reserved pages. (locked in
memory)
8. BALSHC Contains the number of pages reserved due to shared
access.
9. INSKED Set to non-zero if in the scheduler.
8.0 DTE INTERRUPT CONTEXT (DTESTK)
DTE interrupts also dispatch through locations in the EPT,
depending upon which DTE is interrupting. For each DTE that could
exist on a system (4), there is an eight word block in the EPT used
to keep up-to-date information for that DTE. Not all of the DTE
blocks will necessarily be used, however they will all exist in the
EPT. These blocks begin at location DTEEBP. The format of one of
these blocks is described below. The DTE interrupt executes the
third word in this block, which contains a XPCW DTEN0. This will
cause the old PC and flags to be stored at location DTEN0, and,
since DTEN0+3 contains ".+1", the system will begin processing the
interrupt at location DTEN0+4. This part of the routine will set
up the DTE stack, DTESTK, and save the PC, flags, and AC's. The
flags and PC are stored at DTETRA, and the AC's are stored at
DTEACB. DTEN0 will then use INTDTE to process the interrupt. This
code can be found in the DTESRV module of the monitor.
The DTE control block:
DTEEBP/ To -11 byte pointer
DTETBP/ To -10 byte pointer
DTEINT/ "XPCW DTEN0" ;dispatch for DTE-0
/ reserved
DTEEPW/ Examine Protection Word
DTEERW/ Examine Relocation Word
DTEDPW/ Deposit Protection Word
DTEDRW/ Deposit Relocation Word
Note that the labels above apply only to DTE-0, and that the
remaining DTE's must be offset by DTE-number X 8.
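The offset rule for the remaining DTE's can be written out as a
short calculation. In this Python sketch the order of the words in
the block is taken from the table above; the value used for DTEEBP
in the example is an assumption and should be replaced by the value
of the symbol in the Monitor being examined. The function name is
hypothetical.

```python
DTE_BLOCK_WORDS = 8
MAX_DTES = 4
DTE_BLOCK_LAYOUT = ["DTEEBP", "DTETBP", "DTEINT", "reserved",
                    "DTEEPW", "DTEERW", "DTEDPW", "DTEDRW"]


def dte_word_address(dteebp, dte_number, label):
    """Address of a control block word for the given DTE (0-3)."""
    if not 0 <= dte_number < MAX_DTES:
        raise ValueError("DTE number must be 0 through 3")
    return (dteebp + dte_number * DTE_BLOCK_WORDS
            + DTE_BLOCK_LAYOUT.index(label))


if __name__ == "__main__":
    assumed_dteebp = 0o140       # assumption, check against the symbols
    for n in range(MAX_DTES):
        addr = dte_word_address(assumed_dteebp, n, "DTEINT")
        print(f"DTE-{n} interrupt instruction at {addr:o}")
```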
Some other useful locations in the EPT:
DTEFLG/ Operation Complete Flag
DTECFK/ Clock Interrupt Flag
DTECKI/ Clock Interrupt Instruction
DTET11/ To -11 argument
DTEF11/ From -11 argument
DTECMD/ Command Word
DTESEQ/ DTE20 Operation Sequence Number
DTEOPR/ Operation In Progress Flag
DTECHR/ Last Typed Character
DTETMD/ Monitor TTY Output Complete Flag
DTEMTI/ Monitor TTY Input Flag
DTESWR/ Console Switch Register
These locations are found at offsets 444 through 457 in the EPT.
9.0 I/O QUEUEING (PHYPDL)
All disk and tape I/O is initiated through the PHYSIO code, by
calling PHYSIO with a pointer to an I/O Request Block (IORB) in
AC1, and the addresses of the Channel Data Block (CDB) and Unit
Data Block (UDB) in AC2 (CDB,,UDB). PHYSIO validates the arguments
passed to it, and then determines whether the IORB belongs on the
Position Wait Queue (PWQ) or the Transfer Wait Queue (TWQ). These
two queues are pointed to by offsets UDBPWQ and UDBTWQ in the UDB
for the device. Note that these are offsets into the UDB, which,
like the CDB's, will be in resident free space. During
processing, PHYSIO will keep the following information in the ac's:
P1/ address of the CDB
P2/ address of the KDB (for tapes) or 0
P3/ address of the UDB
P4/ address of the IORB being processed
Since PHYSIO is called via the PUSHJ P, instruction, the previous
PC is not saved. The P and Q ac's are stored on the stack via the
SAVEPQ macro. PHYSIO does use a private stack, and so the old
stack pointer is saved in PHYSVP. Also, because PHYSIO does use a
private stack, it is necessary for the process calling PHYSIO to be
NOSKED. Also take note of the fact that IORB's are associated with
the physical pages of memory that are involved with the I/O through
pointers in the CST5 table for those pages. See the next section
for more information in this area.
10.0 DEVICE INTERRUPT CONTEXT (PHYIPD)
Device interrupts, in this context, refer to disk and tape
interrupts, those devices connected through the RH20's. Each RH20
channel has a "Channel Logout" area at the beginning of the EPT. This
logout area is four words in length for each channel, the fourth
word of which contains an instruction to execute on an interrupt.
This instruction causes the system to dispatch to code actually in
the CDB for the channel.
On the 2020, the interrupts work differently. The EPT
contains pointers to SM10 vector tables starting at address SMTEPT.
The number of the interrupting UBA (1 or 3) is used as an offset to
SMTEPT to find the proper vector table, and then the function and
device (read done, DZ11, etc...) is used as an offset into the
vector table which contains the appropriate XPCW instruction to
transfer control to the correct routine.
The previous PC and flags are saved in the area immediately
preceding the CDB; offset CDBINT (value -6) is the location where
the flags and PC are stored. When the interrupt occurs, the
hardware executes the instruction in the channel logout area, which
is "XPCW loc". "Loc" is the address of the CDB for this channel,
offset by CDBINT (-6). The XPCW instruction saves the flags at
CDBINT(CDB), the PC at the next location, and gets the new flags
and PC from the next two locations. This area of the CDB, then,
contains the following:
CDBINT(CDB)/ old flags
-5(CDB)/ old PC
-4(CDB)/ new flags (0)
-3(CDB)/ new PC ( ".+1")
-2(CDB)/ MOVEM P1,CDBSVQ(CDB) ; saved in CDB offset CDBSVQ
-1(CDB)/ JSP P1,PHYINT ; dispatch to interrupt code
CDBSTS(CDB)/ status and configuration flags
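Since the XPCW in the channel logout area addresses the word at
CDBINT(CDB), the CDB address can be recovered from that instruction,
and the saved PC and flags then found at fixed negative offsets. The
Python sketch below lays this out. It uses only the offsets given
above (CDBINT = -6); the CDB address in the example is invented, and
the function names are hypothetical.

```python
CDBINT = -6                      # offset of the saved flags, from the text

INTERRUPT_AREA = [
    (-6, "old flags"),
    (-5, "old PC"),
    (-4, "new flags (0)"),
    (-3, 'new PC (".+1")'),
    (-2, "MOVEM P1,CDBSVQ(CDB)"),
    (-1, "JSP P1,PHYINT"),
]


def cdb_from_xpcw_target(loc):
    """Given the E of the XPCW in the logout area, return the CDB address."""
    return loc - CDBINT


def interrupt_area(cdb_address):
    """Map each address in the interrupt area to its contents description."""
    return {cdb_address + offset: what for offset, what in INTERRUPT_AREA}


if __name__ == "__main__":
    cdb = cdb_from_xpcw_target(0o2000)           # invented XPCW target
    for address, what in sorted(interrupt_area(cdb).items()):
        print(f"{address:o}/ {what}")
```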
The PHYINT code, then, resolves the interrupt, and returns to the
old PC by JRSTing through offset CDBJEN in the CDB. This part of
the CDB contains the following:
CDBJEN(CDB)/ BLT 17,17
/ DATAO RH,CDBRST
/ XJEN CDBINT(P1)
The last of these locations causes the system to resume where it
was interrupted. During processing of the interrupt, the following
information may be found:
P1/ address of the CDB
P2/ address of the KDB or 0
P3/ address of the UDB
P4/ address of the IORB or argument code:
(P4) < 0 - schedule a channel cycle
(P4) = 0 - dismiss interrupt
(P4) > 0 - complete current request (IORB address)
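The three cases for P4 depend on reading the accumulator as a signed
36-bit value. As a small reading aid, the following Python sketch
classifies a raw P4 word taken from a dump according to the list
above. The function names are hypothetical.

```python
def signed36(word):
    """Interpret a 36-bit word as a two's complement integer."""
    word &= (1 << 36) - 1
    return word - (1 << 36) if word & (1 << 35) else word


def classify_p4(word):
    """Describe the meaning of P4 during PHYSIO interrupt processing."""
    value = signed36(word)
    if value < 0:
        return "schedule a channel cycle"
    if value == 0:
        return "dismiss interrupt"
    return "complete current request (IORB address)"


if __name__ == "__main__":
    for raw in (0o777777777777, 0, 0o543210):
        print(f"P4/ {raw:012o}  {classify_p4(raw)}")
```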
When the system is attempting to perform I/O to or from a
specific page of physical memory, that page is locked into core, by
incrementing the lock count in the CST1 location for that page. If
a device error occurs during the transfer of data for that page,
then the CST5 entry for that page will have either a short I/O
Request Block (IORB) or a pointer to a long (Mag Tape) IORB. The
short IORB is only one word in length and is used for disk transfer
requests, i.e., swapping. In either case, the first word of an
IORB, called IRBSTS, contains flags that describe the success or
failure of the transfer. It may be helpful to check these
locations in the event of a PHYINT crash.
The following offsets contain useful information for PHYSIO
crashes:
In the UDB:
UDBPS1/ cylinder number
UDBPS2/ surface,, sector number
UDBERC/ error retry count
UDBERR/ status function for error retry
In the CDB:
CDBCNI/ status of channel when interrupt began.
11.0 APR INTERRUPT CONTEXT (MEMPP)
APR Interrupts, like Device interrupts, are vectored through
the EPT, but in the case of the APR interrupts, the vector location
is a part of the priority interrupt scheme. These are priority
channel 3 interrupts, and dispatch through location KIEPT+45, which
contains a XPCW PIAPRX. This is the channel 3 interrupt routine.
This routine will attempt to resolve the interrupt, and in doing so
will set up its own stack, MEMPP. As in the case of the device
interrupt, the XPCW PIAPRX will cause the PC and flags to be stored
at locations PIAPRX and PIAPRX+1, and the processor will then jump
to the location stored in PIAPRX+3, which is PIAPR+1. PIAPR
actually dismisses the APR interrupt, or BUGHLT's. The old stack
pointer, at the time of the interrupt, is stored in MEMAP. Ac's
0-10 are saved starting at location MEMPA. One unusual aspect
about handling APR interrupts is that the PIAPR code changes the
page fault trap vector, mentioned earlier, from PGRTRP to MEMPTP,
in UPTPFN, to handle the special case of a page fault in APR
interrupt context.
BUG'TYP MACRO CHANGES FOR VERSION 4 OF TOPS-20
Version 4 of TOPS-20 will include some changes in the BUG code
generation. The purpose of these changes is to generate a document
describing the TOPS-20 BUGCHKs, BUGHLTs, and BUGINFs that is more
descriptive than the previous BUGSTRINGS.TXT file.
The logistics of this change include moving the BUG definitions out of
the monitor source listings and into a central source file. This
source file will serve both as the definition file for the bugs and as
documentation for the BUGS. This file is called BUGS.MAC and will be
distributed to all sites on the distribution tape. These BUGS are
still referenced in the source module where the bug is invoked but
they are defined in BUGS.MAC.
This involves a modification to the old BUG macro and a new macro
called DEFBUG. The BUG macro appears in the source modules and the
DEFBUG macro appears in BUGS.MAC.
The format of the new BUG macro is as follows:
BUG (BUGNAM,<<x1,des1>,<x2,des2>...>)
This is placed in the monitor code where the BUG called BUGNAM is to
occur. This macro executes a macro with name 'BUGNAM' which generates
a XCT BUGNAM where the contents of BUGNAM is a JSR BUG'TYP. Following
the location BUGNAM are the Accumulators to be printed (one AC per
word) followed by SIXBIT/BUGNAM/. The Accumulators to be printed are
defined with the DEFBUG macro while the locations specified in the BUG
macro are for documentation only.
Accompanying this BUG macro is a DEFBUG macro which is placed in the
file BUGS.MAC. This entry completely defines the BUG, including its
type (BUGHLT, BUGCHK, or BUGINF) and documentation.
The format of the DEFBUG macro is:
DEFBUG (TYP,TAG,MOD,WORD,STR,LOCS,HELP)
For a description of the arguments to this macro see the SWSKIT
article called BUGS.MEM.
In order to make listings (output from MACRO or CREF) more informative
than before, the BUG macro will cause the short description to be
displayed in the listing where the BUG macro is called.
Also, the flavor of bug (INF, CHK, or HLT) and whether it's hardware
or software related will be displayed in the listing. Hence the
OVRDTA bug would appear in the listing as
BUG(OVRDTA)
;BUG Type: hardware-related BUGINF
;BUG description: PHYSIO - OVERDUE TRANSFER ABORTED
When fully documented, the BUGS.MAC file will be extremely useful
for specialists. It will describe, in one convenient place, what the
additional data printed on the console is, what caused the bug, and
what the site or specialist should do if that particular bug occurs.
Here is a section of the current BUG definition/documentation for the
BUG GIVTMR from BUGS.MAC:
DEFBUG(INF,GIVTMR,JSYSA,SOFT,<GIVOK TIMEOUT>,<<T2,FUNC>>,<
Cause: The access control job has not responded with a GIVOK within
the designated time period.
Action: If this consistently happens with the same function code, you
should see if the processing of the function can be made
faster.
If there is no obvious function code pattern, you may need to
increase the timeout period or rework the way in which the
access control program operates.
Data: FUNC - the GETOK function code
>)
INF specifies the bug is a BUGINF. GIVTMR is the name of the bug.
JSYSA is the module that the bug would occur in. SOFT specifies that
it is likely the bug is caused by a software bug. <GIVOK TIMEOUT> is
the bug string. <T2,FUNC> specifies the data that will be printed on
the operator's console. The initial spec called for the descriptor
FUNC to be included in the operator's message but at this time, this
descriptor is just for source documentation.
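For anyone who wants to search BUGS.MAC by program, the argument
structure just described is simple enough to pick apart. The Python
sketch below splits one DEFBUG call into its seven arguments, in the
order given by DEFBUG (TYP,TAG,MOD,WORD,STR,LOCS,HELP), treating
angle brackets as nesting. It is a sketch only: it assumes the angle
brackets in the call are balanced and has not been checked against
the complete BUGS.MAC file. The function names are hypothetical.

```python
DEFBUG_FIELDS = ("TYP", "TAG", "MOD", "WORD", "STR", "LOCS", "HELP")


def split_macro_args(text):
    """Split the arguments of one macro call on top-level commas."""
    body = text[text.index("(") + 1:text.rindex(")")]
    args, current, depth = [], [], 0
    for ch in body:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        if ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    args.append("".join(current).strip())
    return args


def parse_defbug(text):
    """Return the DEFBUG arguments keyed by their names in the macro."""
    return dict(zip(DEFBUG_FIELDS, split_macro_args(text)))


if __name__ == "__main__":
    call = ("DEFBUG(INF,GIVTMR,JSYSA,SOFT,<GIVOK TIMEOUT>,<<T2,FUNC>>,<\n"
            "Cause: The access control job has not responded with a GIVOK\n"
            ">)")
    for name, value in parse_defbug(call).items():
        print(f"{name:5} {value!r}")
```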
The blurbs following the initial line of the BUG definition attempt to
describe to the specialist, in a more detailed manner than the
description printed on the console, what it means when this bug occurs
and what should be done first in order to resolve the situation. In
this case the ACTION is to examine the GETOK routine which is executed
for the additional data FUNC. This routine is getting hung up.
Sometimes, the ACTION will state to call the hot line or to submit an
SPR. These descriptions will help the specialist be more informed
about the bugs which may occur at one of their sites and save them the
time of calling the hot line or searching through the source module
for an idea of the problem.
MONITOR BUILDING HINTS
======================
1. GENERAL
=======
Judging from the number of requests for help on this subject, the
chances are that you will be required to rebuild a monitor sometime
during your career as a Software Specialist. The reasons are quite
simple. There are customers who simply want functionality other than
that provided by stock monitors. There are also those who are
experiencing performance problems. We cannot forget the sales folks.
It is not unusual to have to rebuild a monitor in order to run a
benchmark. A very common example is increasing the OFN area. Another
quite common requirement is to increase the patch area (FFF). Doing
either of these and simply submitting a build control file will often
produce a bad monitor.
We will talk about PSECTS in relation to the Monitor's address space
but will make no attempt to define what they do. A good detailed
discussion on the Monitor's address space is on pages 2-62 to 2-73 in
the Release 4 Update Manual. Also there is a memo on the Monitor's
address space in the SWSKIT.
2. BACKGROUND
==========
In V3A, all of the Monitor was in the same address space. Nevertheless
there was a crunch on space. As a result some PSECTS were allowed to
overlap. So critical was the space requirement that attempts to
increase the OFN area or FFF usually resulted in the overlapping of
PSECTS other than the ones permitted. Therein lies the problem. The
Monitor produced from such a process would ordinarily be useless.
With the development of V4, the space requirement became more
critical. The Symbol Table became the object of concern. It required
a large number of pages, and in general, it is only used infrequently
under normal conditions. Hence the Engineering folks were of the
opinion that it should be completely eliminated. We objected. It
would be a nightmare to try to debug the monitor without symbols. It
thus became our project to somehow keep the Symbol Table while
conforming with the space restrictions. We decided to remove the
Symbol Table and place it in an alternate address space. It should
be noted that this action does not impact adversely on system
performance. With this change, the build procedure and the monitor's
address space were reorganized.
3. BUILD PROCEDURE
===============
Outlined below are some steps to guide you when rebuilding a monitor.
Bear in mind that this is a guide and might not account for all the
unusual situations. This guide, however, coupled with your experience
and common sense, will most likely do the trick. PLEASE READ THIS
ENTIRE MEMO BEFORE ATTEMPTING TO REBUILD YOUR MONITOR. Also please
read the build BEWARE file that is on the Installation tape.
NOTE: The customer's Distribution Tape will have all the files needed
to rebuild the monitor. All TOPS-20 modules will be in
TOPS-20.REL (or T2020.REL etc.). The control file is TOPS20.CTL
(or T2020.CTL etc.). The link file will be NAME.CCL, where
"NAME" depends upon what monitor is being used (could be 2020,
ARPA etc.). For 2040/50, it is called LNKSCH.CCL. In any case
the TOPS20.CTL file will have the name. The files you will
change will be one of the PARAM files and/or STG.MAC. It
should be noted that the special LINK.EXE and MACRO.EXE needed
to build V3A are not required under V4.
If you have the time, it is not a bad idea to use all the
standard files and build yourself a "vanilla" monitor. This
will test the procedure and files and reveal any problems
peculiar to the build itself. Once these are resolved, any
problems encountered when you are rebuilding your modified
monitor will be related to the change itself. The time for the
debugging phase can thus be reduced substantially.
STEP 1 Restore all files needed from <4-SOURCES>. This will
usually contain the monitor modules (TOPS20.REL file),
all needed source files, all build control, command
and log files.
STEP 2 Carefully make the source changes as needed.
STEP 3 Examine the TOPS20.CTL file. This file will usually
have logical name definitions and TAKE commands along
with other things. Also look at all referenced command
files.
STEP 4 Examine the corresponding log file. This will show
what the result of the original build procedure was.
It should therefore be a template which should be used
to judge the validity of the new Monitor. Pay special
attention to the section which shows the PSECT layout
at the end of the BUILD procedure. This shows the
start location, the end location and the amount of
free space between each PSECT. The file used by LINK
to set up the PSECTS is called LNKSCH.CCL. You should
look at this file to get an idea of what's happening.
STEP 5 Now edit the control and command files as necessary to
reflect your environment. This will mean, among other
things, changing or eliminating logical name
definitions. Do NOT change the order of the PSECTS in
the LNKSCH.CCL file. Also do not change the starting
value for any PSECT. The starting value is the value
given to the /SET: switch.
STEP 6 Submit the control file with the /TAG:SINGLE switch.
Ensure that the control file is correct and reflects
accurately logical name definitions and the .CCL file.
Also this portion of the .CTL file has the commands
necessary to compile the changed module.
STEP 7 When the job ends, examine your log file. Correct any
compilation or missing files errors and go back to
STEP 6. Continue with STEP 8 only after all errors are
eliminated.
STEP 8 At this point you should have a MONITR.EXE. Now
examine the section in the log file which gives an
outline of the PSECTS. If any PSECTS overlap, a
message will say so. If there are no overlap
messages, go to STEP 11. NOTE: There are some
instances where PSECTs can overlap. The POSTCD and
SYVAR PSECTs are allowed to overlap any xxxVAR PSECT.
This does not gain very much storage: 4 pages, to be
exact. If you follow the build procedure, overlapping
PSECTs are not allowed and must therefore be resolved.
You are once again advised NOT to re-organize the
monitor's address space.
STEP 9 Start with the first overlap. Figure out the number
of words by which the first PSECT overlaps the PSECT
that follows it. Now add this value to the start
location of the overlapped PSECT. The result quite
possibly will be a location within a page, i.e. an
address of the form 125300, where the page number is
125 and the offset into the page is 300. The starting
address of many PSECTs is required to be on a page
boundary, i.e. an address of the form 126000. A good
rule to follow is: IF THE PSECT STARTED ON A PAGE
BOUNDARY BEFORE THE BUILD, THEN KEEP IT ON A PAGE
BOUNDARY. This means that you may be required to
add an additional value to round up to the next page.
For example, the 125300 value would be rounded up to
126000 if the PSECT is required on a page boundary.
The PSECT sequence and starting values are in the
LNKSCH.CCL file. NOTE: the values are all given in
OCTAL, so add in OCTAL.
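The arithmetic in STEP 9 can be sketched in a few lines of Python. This
is only an illustration and not part of the build procedure: the
function name, the old start address and the overlap size are
assumptions, chosen to reproduce the 125300 example, and the sketch
assumes the 512-word (1000 octal) page.

```python
PAGE_SIZE = 0o1000  # 512 words per page, written in octal

def new_psect_start(old_start, overlap_words, page_aligned):
    """Move an overlapped PSECT up by the overlap, rounding to a page if asked."""
    start = old_start + overlap_words
    if page_aligned and start % PAGE_SIZE != 0:
        # Round up to the next page boundary.
        start = (start // PAGE_SIZE + 1) * PAGE_SIZE
    return start

# Hypothetical figures: a PSECT at 124000 overlapped by 1300 (octal) words.
print(oct(new_psect_start(0o124000, 0o1300, page_aligned=False)))
print(oct(new_psect_start(0o124000, 0o1300, page_aligned=True)))
```

The first call gives the raw sum and the second the page-rounded value
that would go in the /SET: switch.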
STEP 10 EDIT the LNKSCH.CCL file to reflect this new start
value for the overlapped PSECT. Go back to STEP 6.
Repeat these steps until there are no more error
messages. Note that changing the start location of the
overlapped PSECT can cause it to overlap its following
PSECT and the same procedure must be followed to
resolve any conflicts. Of course you must be careful
to ensure that you do not outgrow the monitor's address
space. A total of the length of all PSECTs will tell
you if the Monitor is too large.
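The ripple effect described in STEP 10 can be modelled as below. In
practice LINK reports the overlaps after each build and you edit
LNKSCH.CCL by hand; this Python sketch only models the arithmetic. The
PSECT names, lengths and the address-space limit are made up for the
illustration and are not taken from any real LNKSCH.CCL.

```python
PAGE_SIZE = 0o1000  # 512 words per page

def resolve_overlaps(psects, limit):
    """Push PSECT start addresses up until nothing overlaps.

    psects is a list of dicts (name, start, length, page_aligned) in link
    order; the order is never changed. limit is the first address past the
    usable address space (an assumed parameter).
    """
    fixed = []
    prev_end = 0  # first free address after the previous PSECT
    for p in psects:
        start = p["start"]
        if start < prev_end:
            start = prev_end
            if p["page_aligned"] and start % PAGE_SIZE:
                start = (start // PAGE_SIZE + 1) * PAGE_SIZE
        prev_end = start + p["length"]
        if prev_end > limit:
            raise ValueError(f"{p['name']} ends at {prev_end:o}, past {limit:o}")
        fixed.append({**p, "start": start})
    return fixed

# Hypothetical layout: AAA has grown and now runs into BBB.
layout = [
    {"name": "AAA", "start": 0o100000, "length": 0o25300, "page_aligned": True},
    {"name": "BBB", "start": 0o124000, "length": 0o2000, "page_aligned": True},
    {"name": "CCC", "start": 0o126000, "length": 0o1000, "page_aligned": False},
]
for p in resolve_overlaps(layout, limit=0o1000000):
    print(p["name"], oct(p["start"]))
```

Moving BBB up makes it collide with CCC, which is then moved in turn,
just as the step warns.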
STEP 11 At this point you should have a good Monitor. Save it
in the proper directory. The final test is getting it
up and running.
EXEC DEBUGGING
--------------
Now that most SWS have microfiche of the released EXEC and MONITOR, I
anticipate questions on looking at the EXEC and MONITOR. Here is a
cursory tutorial on investigating the internals of the EXEC (or
command processor, if you prefer). The examples are intended to be a
guide and although the typein is correct, the response may not be
character perfect. You are advised to read the other chapters in this
document for more information on DDT and MONITOR snooping and
debugging.
LOOKING AT THE EXEC WITH DDT
============================
You can either look at the running system EXEC or your own copy of the
EXEC with DDT that is loaded with the EXEC.
I. TO LOOK AT THE RUNNING EXEC:
First you must have WHEEL privileges in order to use the ^EEDDT
command. The ^EEDDT command transfers control to the DDT now loaded
with EXEC, with symbols. Now you can do all the normal DDT functions.
To exit from DDT all you do is <ESC>G , echoed as $G. This starts
your program which is the EXEC and so now you are at EXEC command
level.
@ENABLE
$^EEDDT
DDT
.
.
.
$G
$DIS
@
II. TO LOOK AT YOUR COPY OF AN EXEC (RUNNING UNDER SYSTEM EXEC):
Get your copy of the EXEC in your address space, transfer control to
it and start DDT as above. There are 3 ways to exit from this
depending on the state you are in. If you are in DDT you can ^Z out
to get back to system EXEC. If you are running your EXEC and want to
exit to the system EXEC you can ^EQUIT (if you are enabled) or "POP"
(if you are not enabled). POP is preferable. Note: if you prefer to
get your EXEC and not start it in order to set breakpoints or put in
patches before running, see section "VII. PATCHING" below.
EXAMPLE EXITING FROM DDT:
@GET MYEXEC.EXE
@SET NO CONTROL-C-CAPABILITY
@START
@MONNAM.TXT, TOPS-20 MONITOR (VERSION#)
@ENA
$^EEDDT
DDT
.
.
.
CINITF/ -1 0 ; reset initialization flag so you can
; run this EXEC again after it is saved
.
^Z ; to exit and save, for example
@ ; now you are in the system EXEC
; with your EXEC in your
; address space. You can save it, say.
@SAV MYEXEC.EXE.2
EXAMPLE, EXITING FROM YOUR RUNNING EXEC:
@GET MYEXEC.EXE
@START
@MONNAM.TXT,,TOPS-20 MONITOR(VERSION #)
@ENA
$^EEDDT
DDT
.
.
.
.
CINITF/ -1 0 ; clear initialization flag
.
.
$G ; running your EXEC
.
.
$^EQUIT ; return to higher (system) EXEC
@ ; you are in system EXEC
@SAV NEWEXEC ; etc.
EXAMPLE, EXITING FROM YOUR RUNNING EXEC WITH POP:
@GET MYEXEC.EXE
@START
@MONNAM.TXT,,TOPS-20 MONITOR(VERSION#)
@
.
.
.
@POP ; return to higher (system) EXEC.
@ ; now you are in system EXEC.
; NOTE: you should set CINITF to 0
; if you want to save and run this
; EXEC later. You can do it by
; DDT after the POP or ^EEDDT before
; the POP.
III. GETTING OUT OF TROUBLE:
You could get into trouble with your EXEC and not be able to get
out of it: CTRL/C is trapped, or you can't POP, or whatever. There is,
however, always a way to exit to the MINI-EXEC. First you must issue
^EQUIT to get into the MINI-EXEC. Then "S" (start) to get back to the
system EXEC. Then get into your EXEC. If you now get into trouble you
can issue ^P, which will get you back into the MINI-EXEC. Now you have
the chance to get back to the system EXEC with "S" (start).
EXAMPLE:
@ENA
$^EQUIT
INTERRUPT AT 15657
MX>S
$ ; you are now back in system EXEC.
$GET MYEXEC
$
$START
@MONNAM.TXT, TOPS-20 MONITOR (VERSION)
. ; let's say you can't do anything
. ; you are in your EXEC
. ; get out, get into MINI-EXEC
^P
INTERRUPT AT 12345
MX>S ; MINI-EXEC prompt followed by start.
$ ; you are now in the system EXEC.
IV. RUNNING YOUR EXEC AS A TOP LEVEL FORK:
Suppose that you want to run your EXEC as the top level EXEC,
that is, not running under the system EXEC. Get into the MINI-EXEC
and get your copy of the EXEC and run it as the top level EXEC.
EXAMPLE:
@ENA
$^EQUIT
INTERRUPT AT 23456
MX>R ; Reset else you will MERGE rather than just GET
MX>G <MYAREA>MYEXEC.EXE.2
MX>S
@ ; Now you are in your EXEC
.
.
. ; Let's say you want to get out
@^P ; Control-P to get to MINI-EXEC
ABORT
MX>R ; "RESET" resets your address space
MX>E ; You are requesting the system EXEC
@ ; You are in system EXEC
NOTE: If you had typed "S" rather than "E" above you would
have restarted your EXEC.
V. REPLACING THE SYSTEM EXEC
Once you have made a change to your personal copy of the EXEC, you may
wish to have your edited EXEC run as the SYSTEM EXEC. It is necessary
to make the saved EXEC non-writable before using it system-wide.
EXAMPLE:
@ENABLE (CAPABILITIES)
$GET (PROGRAM) PS:<SYSTEM>EXEC.EXE
$INFORMATION (ABOUT) MEMORY-USAGE
81. pages, Entry vector loc 6000 len 3
0 FARK:<4-FIELD-IMAGE.EXEC>EXEC.EXE.1 1 R, CW, E
6-125 FARK:<4-FIELD-IMAGE.EXEC>EXEC.EXE.1 2-121 R, E
$!MAKE THE EXEC WRITABLE SO WE CAN EDIT IT
$SET PAGE-ACCESS (OF PAGES) 6:125 (ACCESS) COPY-ON-WRITE
$DDT
DDT
. ;Make the edits
.
^Z
$
$!MAKE THOSE PAGES NON-WRITABLE
$SET PAGE-ACCESS (OF PAGES) 6:125 (ACCESS) NO WRITE
$SET PAGE-ACCESS (OF PAGES) 6:125 (ACCESS) NO COPY-ON-WRITE
$!SAVE THE NEW EXEC
$SAVE EXEC.EXE.2 !New generation! (PAGES FROM) 6 (TO) 125
EXEC.EXE.2 Saved
$!RENAME THE SYSTEM EXEC SO WE CAN GET IT BACK IF WE NEED IT
$RENAME (EXISTING FILE) PS:<SYSTEM>EXEC.EXE.* (TO BE) PS:<SYSTEM>OLD-EXEC.EXE
$!AND COPY THE NEW ONE INTO PS:<SYSTEM>
$COPY (FROM) EXEC.EXE.2 (TO) PS:<SYSTEM>EXEC.EXE.197 !New generation!
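When reading the INFORMATION MEMORY-USAGE display in the example, note
that the page numbers appear to be octal while the page count ("81.",
with its trailing period) is decimal. A quick Python check of the
figures shown above, offered as an illustration only:

```python
def pages_in_range(first, last):
    """Number of pages in an inclusive page range."""
    return last - first + 1

# Page 0 plus pages 6 through 125 (octal), as shown in the display.
total = pages_in_range(0, 0) + pages_in_range(0o6, 0o125)
print(total)  # 81, matching the "81. pages" line

# The file pages backing 6-125 are shown as 2-121 (octal): the same count.
print(pages_in_range(0o2, 0o121) == pages_in_range(0o6, 0o125))
```

This is why the SET PAGE-ACCESS and SAVE commands in the example name
pages 6 through 125.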
VI. OTHER INFORMATION:
There is one error message you may see when trying to start DDT: "?"
implies that you do not have sufficient privileges enabled.
When searching for symbols you may notice that the module name
DDT gives you is different from the module names that are assembled
for the EXEC. For example to open the symbol table for EXECED you say
CANDE$: to DDT.
The following is a correspondence list:
FILENAME.MAC INTERNAL REFERENCE
==================================
EXECDE.MAC XDEF
EXECGL.MAC XGLOBS
EXECPR.MAC PRIV
EXEC0.MAC EXEC0
EXEC1.MAC EXEC1
EXEC2.MAC EXEC2
EXEC3.MAC EXEC3
EXEC4.MAC EXEC4
EXECED.MAC CANDE
EXECCS.MAC CSCAN
EXECSU.MAC SUBRS
EXECMT.MAC EXECMT
EXECQU.MAC EXECQU
EXECSE.MAC EXECSE
EXECP.MAC EXECP
EXECVR.MAC VER
EXECMI.MAC MIC
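If you keep the correspondence list on a modern machine while reading
the microfiche, a lookup such as the following Python sketch may save
some page flipping. The function name is made up for the illustration,
and "$" stands for the ESC key as elsewhere in this article.

```python
# Source file name -> module name as DDT knows it (from the list above).
MODULE_NAMES = {
    "EXECDE.MAC": "XDEF",   "EXECGL.MAC": "XGLOBS", "EXECPR.MAC": "PRIV",
    "EXEC0.MAC": "EXEC0",   "EXEC1.MAC": "EXEC1",   "EXEC2.MAC": "EXEC2",
    "EXEC3.MAC": "EXEC3",   "EXEC4.MAC": "EXEC4",   "EXECED.MAC": "CANDE",
    "EXECCS.MAC": "CSCAN",  "EXECSU.MAC": "SUBRS",  "EXECMT.MAC": "EXECMT",
    "EXECQU.MAC": "EXECQU", "EXECSE.MAC": "EXECSE", "EXECP.MAC": "EXECP",
    "EXECVR.MAC": "VER",    "EXECMI.MAC": "MIC",
}

def ddt_open_command(filename):
    """Return what to type to DDT to open the symbols for a source file."""
    return MODULE_NAMES[filename.upper()] + "$:"

print(ddt_open_command("EXECED.MAC"))
```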
The sources and .CTL file for assembling the EXEC are on the
SWSKIT.
If you try to examine a location symbolically and get "U", implying
that the symbol is undefined, you may have to reset the symbol table
pointers. Look in location 770001 for the address that contains the
symbol table pointer, then look at location 116 to find the real
symbol table pointer. Put the contents of 116 in the location pointed
to by 770001.
116/ 762600,,54463 ; real symbol table pointer
770001/ 776456 ; location of symbol table pointer
776456/ 743200,,23540 762600,,54463
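To see what the pointer values mean, the following Python sketch
decodes the example value from location 116. It assumes the usual
convention for the symbol table pointer, namely a negative word count
in the left half and the table address in the right half; the function
name is made up for the illustration.

```python
HALF = 1 << 18  # one 18-bit halfword

def decode_symbol_pointer(left, right):
    """Return (length in words, start address) for a -count,,address pointer."""
    length = (HALF - left) % HALF  # undo the 18-bit two's complement
    return length, right

length, address = decode_symbol_pointer(0o762600, 0o54463)
print(f"{length:o} (octal) words of symbols starting at {address:o}")
```

If the assumption holds, a pointer whose two halves do not describe a
plausible table is a good hint that it is the stale one.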
VII. PATCHING
There is a patch command in DDT. The form is as follows:
$< ; patch before this instruction
$$< ; patch after this instruction
$> ; end this patch following this instruction
DDT will put the patch in the EXEC patch area. The symbol is PAT..
DDT will insert JUMPA 1,LOC+1 and JUMPA 2,LOC+2 following the patch
you typed in, where LOC is the location of the instruction you're
patching. DDT then replaces the original instruction at LOC with a
JUMPA XXXXX, where XXXXX is the address in the patch area where your
patch now resides. Then the patch area symbol (PAT..) is redefined to
follow your last patch.
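The bookkeeping DDT does for a "patch before" ($<) can be modelled
roughly as follows. This Python sketch is only a model of the
description above, not DDT's actual code: memory is a dictionary of
address to instruction text, and the addresses used are hypothetical.

```python
def patch_before(memory, loc, new_instructions, patch_area):
    """Model a $< ... $> patch; return the new start of the patch area."""
    original = memory[loc]
    addr = patch_area
    for inst in new_instructions:
        memory[addr] = inst                    # the code you typed in
        addr += 1
    memory[addr] = original                    # original instruction follows
    memory[addr + 1] = f"JUMPA 1,{loc + 1:o}"  # normal return
    memory[addr + 2] = f"JUMPA 2,{loc + 2:o}"  # return if the original skipped
    memory[loc] = f"JUMPA {patch_area:o}"      # divert LOC to the patch
    return addr + 3                            # PAT.. now follows the patch

# Hypothetical addresses: the instruction at 1001, patch area at 12345.
memory = {0o1001: "PRINT Q3"}
next_pat = patch_before(memory, 0o1001, ["CALL MUMBLE", "CALL FRATZ"], 0o12345)
for addr in sorted(memory):
    print(f"{addr:o}/ {memory[addr]}")
```

For a "patch after" ($$<) the original instruction would come before
the inserted code rather than after it.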
EXAMPLE:
Get a copy of <SYSTEM>EXEC, insert calls to subroutine MUMBLE and
subroutine FRATZ before location DING+1. DING+1 contains PRINT Q3
originally and contains a JUMPA to the patch area after the patch.
The patch area will contain:
CALL MUMBLE
CALL FRATZ
PRINT Q3
JUMPA 1,DING+2
JUMPA 2,DING+3
USER TYPESCRIPT FOR THE ABOVE:
@ENABLE
$GET<SYSTEM>EXEC
$SAVE NUEXEC ; you must SAVE and GET in order to write-enable
$GET NUEXEC ; the EXEC. Use DDT, not ^EEDDT.
$DDT
DDT
EXEC0$: ; open symbols for module where DING is
DING/ PUSH P,A ; first location in routine "DING"
DING+1/ PRINT Q3 $< ; begin patching before location DING+1
PAT../ 0 CALL MUMBLE ; DDT opens up PAT.. area, you add code
PAT..+1/CALL FRATZ ; continue to insert your patch
$> ; close the patch
PAT..+2/ PRINT Q3 ; the original instruction being replaced.
PAT..+3/ JUMPA 1,DING+2 ; DDT inserts this return.
PAT..+4/ JUMPA 2,DING+3 ; in case of a SKIP inst.
DING+1/ JUMPA 12345 ; JUMPA to PAT.. replaces original LOC.
$G ; start your copy of EXEC etc.
Various methods may be used to write-enable the EXEC for
patching. You can use the GET, SAVE method above, or SET PAGE n
COPY-ON-WRITE, or the $W command in DDT to achieve the same results.
RECOVERING FROM A BAD EXEC
--------------------------
This procedure is simply a rehash of the procedure for recovering
from the case in which the EXEC refuses to log in. For more
information see the article "Looking at the EXEC with DDT".
If your system version of the EXEC blows up completely, you can
recover rather easily. You type a ^C on the CTY, and when the EXEC
blows up you will be dumped into the MINI-EXEC. Then you can use the
GET and START commands to read in a good version of the EXEC, either
from a copy on disk, or from the distribution magtapes.
If the problem with the EXEC is that it does not blow up, but it
still fails to let you log in, then you have a harder time. In this
case you have to bring up the system with the switches, and bring up
the system stand-alone. An example of what to do from the point where
the BOOT program is loaded follows:
BOOT>/L ; load in the monitor
BOOT>/G141 ; start up EDDT
EDDT
DBUGSW[ 0 2 ; set system as debugging
EDDTF[ 0 1 ; keep EDDT around
GOTSWM$B ; set a breakpoint after the swappable
; part of the monitor has been loaded
147$G ; start the system
GOTSWM$1B>> STEX+1/ HRROI T2,BOOTER+51 HRROI T2,FFF
FFF[ ""PS:<SYSTEM>OLD-EXEC.EXE"
FFF: ; change the name of the EXEC file
0$1B ; remove the GOTSWM breakpoint
$P ; proceed to bring up the system
^C ; and Control-C to get the new EXEC
If you had no old version of the EXEC around, then change the name to
some garbage, so that the monitor can't find any such program. This
will then dump you into the MINI-EXEC, and then you can read a good
EXEC in from magtape.
In release 3 of the monitor, there is a new JSYS which is very
useful for debugging new versions of the EXEC. The CRJOB JSYS can
allow you to start up a new job with any program at all as its top
level fork. You can also start the job not logged in. So you can
debug your new versions of the EXEC easily, with no possibility of
ripping yourself off. Of course the ^EQUIT, GET from MINI-EXEC is
still a valid sequence for starting a new top-level fork.
Debugging the GALAXY System
1.0 INTRODUCTION
The GALAXY system presents a unique problem to the software specialist
who is trying to debug one of its components. Usually, any user mode
program can be debugged under TOPS-20 by running a copy of it, loaded
with DDT, taking appropriate care that nothing is done which will
affect any users of the system. For GALAXY, however, it is very
difficult to not affect users of the system. For example, if you are
trying to debug BATCON, you will find that QUASAR will very happily
schedule batch jobs submitted by other users to be run by your BATCON.
If you are not careful, you can cause those batch jobs to be lost, or
at least slowed down, while you are debugging.
Debugging QUASAR or ORION would be even worse. Users would see PRINT,
SUBMIT, etc. commands hang when you hit a breakpoint in QUASAR.
Operators would be unable to control any system components if you were
breakpointed in ORION. On top of this, the monitor knows about
QUASAR, and you may lose messages which happen when users close a
spooled lineprinter file, or when a job logs out.
To solve these problems, the concept of a "private GALAXY system" has
been implemented by software engineering in version 4 of GALAXY. When
a private GALAXY system is operating, all of its components are
completely independent of the primary GALAXY system. QUASAR, the
queue maintainer, keeps queues that are separate from the system
queues and are failsofted to a different master queue file. This
QUASAR communicates only with other components in the same private
system. It is even possible to run several complete private GALAXY
systems, with the restrictions that:
1. All components in a private system must run under the same
user name.
2. Only one private system may be run by a given user.
3. Each private QUASAR must be connected to a different
directory.
4. Each private ORION must be connected to a different
directory.
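If several people share a machine for this kind of debugging, the
restrictions above can be checked mechanically. The following Python
sketch is only an illustration; the record layout, the key names and
the second user are made up. Restriction 1 is modelled by giving each
private system a single user name.

```python
def check_private_systems(systems):
    """Return a list of conflicts among several private GALAXY systems.

    Each system is a dict with hypothetical keys 'user', 'quasar_dir'
    and 'orion_dir'.
    """
    problems = []
    seen = {"user": set(), "quasar_dir": set(), "orion_dir": set()}
    for system in systems:
        for key, used in seen.items():
            value = system[key].upper()
            if value in used:
                problems.append(f"{key} {value} is used by more than one system")
            used.add(value)
    return problems

systems = [
    {"user": "HEMPHILL", "quasar_dir": "MISC:<HEMPHILL.GALAXY.DEBUG>",
     "orion_dir": "MISC:<HEMPHILL.GALAXY.DEBUG>"},
    {"user": "SMITH", "quasar_dir": "MISC:<SMITH.DEBUG>",
     "orion_dir": "MISC:<SMITH.DEBUG>"},
]
print(check_private_systems(systems))
```

Note that one system's QUASAR and ORION may share a directory; it is
two QUASARs, or two ORIONs, that must not.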
2.0 BUILDING A PRIVATE GALAXY SYSTEM
Since the changes necessary to create a private GALAXY system were
implemented in the version 4 source code, it is relatively simple to
build the system. The recommended procedure is as follows:
1. Create a directory for the private GALAXY system.
2. Restore the file EXEC-FOR-DEBUGGING-GALAXY.EXE from the
SWSKIT to this newly created directory.
3. Restore each of the following files from the "Subsys files
for TOPS20 V4" saveset on the TOPS-20 distribution tape to
this directory.
BATCON.EXE
CDRIVE.EXE
GLXLIB.EXE
LPTSPL.EXE
OPR.EXE
ORION.EXE
PLEASE.EXE
QMANGR.EXE
QUASAR.EXE
SPRINT.EXE
SPROUT.EXE
4. For each component in the above list except GLXLIB.EXE and
QMANGR.EXE, perform the following steps:
1. Give the EXEC command "GET xxxxxx.EXE"
2. Give the command "DEPOSIT 135 -1"
3. Give the command "SAVE xxxxxx"
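Since step 4 repeats the same three commands for nine components, the
command text can be generated rather than typed. The Python sketch
below simply prints the commands named above for each component; how
you feed them to the EXEC (by hand or from a command file) is up to
you, and the function name is made up for the illustration.

```python
COMPONENTS = ["BATCON", "CDRIVE", "GLXLIB", "LPTSPL", "OPR", "ORION",
              "PLEASE", "QMANGR", "QUASAR", "SPRINT", "SPROUT"]
NOT_POKED = {"GLXLIB", "QMANGR"}  # step 4 excludes these two

def build_commands(components=COMPONENTS, skip=NOT_POKED):
    """Return the EXEC commands of step 4 for every component to be poked."""
    lines = []
    for name in components:
        if name in skip:
            continue
        lines += [f"GET {name}.EXE", "DEPOSIT 135 -1", f"SAVE {name}"]
    return lines

print("\n".join(build_commands()))
```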
3.0 EXAMPLE OF A PRIVATE GALAXY BUILD
It is not strictly necessary to restore all of the GALAXY components
for a one time only debugging session. To debug a component like
BATCON, you would need at a minimum:
1. Your own copy of BATCON
2. Your own copy of QUASAR for BATCON to speak to
3. Your own copy of ORION for BATCON and QUASAR to speak to
4. A copy of OPR to speak to ORION to control BATCON
5. An EXEC which knows about your QUASAR to make queue entries
The following is a log of an example build of a private GALAXY system:
TOPS-20 Command processor 4(560)
@ENABLE (CAPABILITIES)
$!
$! First connect to a debugging directory
$!
$CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG>
$!
$! Now build and save debugging .EXE files
$!
$! QUASAR, the queue maintainer
$!
$GET (PROGRAM) SYS:QUASAR.EXE.55
$DEPOSIT (MEMORY LOCATION) 135 (CONTENTS) -1
[Shared]
$SAVE (ON FILE) QUASAR.EXE.1 !New file! (PAGES FROM)
QUASAR.EXE.1 Saved
$!
$! ORION, the message clearinghouse
$!
$GET (PROGRAM) SYS:ORION.EXE.53
$DEPOSIT (MEMORY LOCATION) 135 (CONTENTS) -1
[Shared]
$SAVE (ON FILE) ORION.EXE.1 !New file! (PAGES FROM)
ORION.EXE.1 Saved
$!
$! OPR, the operator interface
$!
$GET (PROGRAM) SYS:OPR.EXE.55
$DEPOSIT (MEMORY LOCATION) 135 (CONTENTS) -1
[Shared]
$SAVE (ON FILE) OPR.EXE.1 !New file! (PAGES FROM)
OPR.EXE.1 Saved
$!
$! BATCON, the batch controller
$!
$GET SYS:BATCON.EXE.39
$DEPOSIT (MEMORY LOCATION) 135 (CONTENTS) -1
[Shared]
$SAVE (ON FILE) BATCON.EXE.1 !New file! (PAGES FROM)
BATCON.EXE.1 Saved
$!
$! Now a directory of what we've got
$!
$VDIRECTORY (OF FILES) *.*.*
MISC:<HEMPHILL.GALAXY.DEBUG>
BATCON.EXE.1;P777700 16 8192(36) 13-Feb-80 22:00:37
EXEC-FOR-DEBUGGING-GALAXY.EXE.1;P777700
82 41984(36) 13-Feb-80 04:33:50
OPR.EXE.1;P777700 31 15872(36) 13-Feb-80 22:00:09
ORION.EXE.1;P777700 44 22528(36) 13-Feb-80 21:59:45
QUASAR.EXE.1;P777700 40 20480(36) 13-Feb-80 21:59:27
Total of 213 pages in 5 files
$
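As a sanity check on a listing like the one above, each file's length
in 36-bit words should equal its page count times 512, and the page
counts should add up to the total. A small Python sketch using the
figures from the log (the variable names are made up):

```python
WORDS_PER_PAGE = 512

# (file, pages, length in 36-bit words) copied from the VDIRECTORY output.
listing = [
    ("BATCON.EXE.1", 16, 8192),
    ("EXEC-FOR-DEBUGGING-GALAXY.EXE.1", 82, 41984),
    ("OPR.EXE.1", 31, 15872),
    ("ORION.EXE.1", 44, 22528),
    ("QUASAR.EXE.1", 40, 20480),
]

for name, pages, words in listing:
    # A freshly saved .EXE file is a whole number of pages long.
    assert words == pages * WORDS_PER_PAGE, name

total_pages = sum(pages for _, pages, _ in listing)
print(total_pages, "pages in", len(listing), "files")
```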
4.0 RUNNING THE PRIVATE GALAXY SYSTEM
Starting and running a private GALAXY system is similar to running
GALAXY in the usual manner. First QUASAR and ORION are started, then
the component you wish to debug. You will also need OPR to issue
operator commands and the modified EXEC to make queue entries. Since
you will need about five jobs, it is usually most convenient to run
each component as a separate subjob under PTYCON.
4.1 Starting QUASAR
QUASAR and ORION should be started before everything else. Nothing
evil happens if you start them last, but all the other components will
be waiting for these two to start. A suggested procedure is:
1. Define a subjob "Q"
2. Connect to it
3. LOGIN a job under the same user name
4. CONNECT that job to the directory in which you did the
private GALAXY build
5. ENABLE
6. RUN QUASAR
4.2 Starting ORION
Starting ORION is as painless as starting QUASAR:
1. Define a subjob "O"
2. Connect to it
3. LOGIN a job under the same user name
4. CONNECT that job to the directory in which you did the
private GALAXY build
5. ENABLE
6. RUN ORION
4.3 Starting OPR
OPR starts up using the same formula as QUASAR and ORION:
1. Define a subjob "OPR"
2. Connect to it
3. LOGIN a job under the same user name
4. CONNECT that job to the directory in which you did the
private GALAXY build
5. ENABLE
6. RUN OPR
7. You may now type OPR commands to see if QUASAR and ORION
appear to be healthy.
4.4 Starting The Component To Be Debugged
If the component you wish to debug is QUASAR, ORION, or OPR, then you
have already started it. Breakpoints could have been set, and when
they were hit, the component could have been debugged without any
noticeable effect on other users of the system. If you wish to debug
PLEASE, BATCON, LPTSPL, CDRIVE, SPRINT, or SPROUT, do the following:
1. Define a subjob with an appropriate ID (e.g. B for BATCON)
2. Connect to it
3. LOGIN a job under the same user name
4. CONNECT that job to the directory in which you did the
private GALAXY build
5. ENABLE
6. GET the component
7. Enter DDT
8. Set breakpoints, then start the program
4.5 Starting The Modified EXEC
The file "EXEC-FOR-DEBUGGING-GALAXY.EXE" which has been supplied on
the SWSKIT has exactly two commands added to its repertoire. These
are "^ESET DEBUGGING-GALAXY" and "^ESET NO DEBUGGING-GALAXY". The
effect of these commands is to select which one of two PIDs (Process
IDs) to communicate with: the system QUASAR or the private QUASAR.
If "NO DEBUGGING-GALAXY" is set, then PRINT, SUBMIT, CANCEL, MODIFY,
and the INFORMATION commands will all cause communication with the
system QUASAR. If "DEBUGGING-GALAXY" is set for this EXEC, then the
commands listed will communicate with the private QUASAR run by that
user.
1. Define a subjob "E"
2. Connect to it
3. LOGIN a job under the same user name
4. CONNECT that job to the directory in which you did the
private GALAXY build
5. RUN EXEC-FOR-DEBUGGING-GALAXY
6. ENABLE
7. ^ESET DEBUGGING-GALAXY
5.0 EXAMPLE DEBUGGING SESSION
The following is a log of a sample debugging session:
TOPS-20 Command processor 4(560)
@!
@! First run PTYCON, so we can control five jobs from one terminal
@!
@PTYCON.EXE.7
PTYCON> !
PTYCON> ! Now start up QUASAR as subjob Q
PTYCON> !
PTYCON> DEFINE (SUBJOB #) 0 (AS) Q
PTYCON> CONNECT (TO SUBJOB) Q
[CONNECTED TO SUBJOB Q(0)]
2102 Development System, TOPS-20 Monitor 4(3245)
@LOG HEMPHILL (PASSWORD)
Job 21 on TTY222 13-Feb-80 22:18:05
Structure PS: mounted
Structure MISC: mounted
@ENABLE (CAPABILITIES)
$!
$! Connect to directory where debugging .EXE files are
$!
$CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG>
$!
$! Finally run the component
$!
$RUN (PROGRAM) QUASAR.EXE.1
% QUASAR GLXIPC Becoming [HEMPHILL]QUASAR (PID = 66000031)
% QUASAR GLXIPC Waiting for ORION to start
^X
PTYCON> !
PTYCON> ! Now start up ORION as subjob O
PTYCON> !
PTYCON> DEFINE (SUBJOB #) 1 (AS) O
PTYCON> CONNECT (TO SUBJOB) O
[CONNECTED TO SUBJOB O(1)]
2102 Development System, TOPS-20 Monitor 4(3245)
@LOG HEMPHILL (PASSWORD)
Job 22 on TTY223 13-Feb-80 22:19:25
Structure PS: mounted
Structure MISC: mounted
@ENABLE (CAPABILITIES)
$!
$! Connect to directory where debugging .EXE files are
$!
$CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG>
$!
$! Finally run the component
$!
$RUN (PROGRAM) ORION.EXE.1
% ORION GLXIPC Alternate [HEMPHILL]QUASAR (PID = 66000031)
% ORION GLXIPC Becoming [HEMPHILL]ORION (PID = 70000032)
**** Q(0) 22:19:58 ****
% QUASAR GLXIPC Alternate [HEMPHILL]ORION (PID = 70000032)
**** O(1) 22:19:58 ****
^X
PTYCON> !
PTYCON> ! Now start up OPR as subjob OPR
PTYCON> !
PTYCON> DEFINE (SUBJOB #) 2 (AS) OPR
PTYCON> CONNECT (TO SUBJOB) OPR
[CONNECTED TO SUBJOB OPR(2)]
2102 Development System, TOPS-20 Monitor 4(3245)
@LOG HEMPHILL (PASSWORD)
Job 23 on TTY224 13-Feb-80 22:20:29
Structure PS: mounted
Structure MISC: mounted
@ENABLE (CAPABILITIES)
$!
$! Connect to directory where debugging .EXE files are
$!
$CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG>
$!
$! Finally run the component
$!
$RUN (PROGRAM) OPR.EXE.1
% OPR GLXIPC Alternate [HEMPHILL]QUASAR (PID = 66000031)
% OPR GLXIPC Alternate [HEMPHILL]ORION (PID = 70000032)
OPR>
22:19:59 -- Network Node 1031 is Online --
22:19:59 -- Network Node 2137 is Online --
22:19:59 -- Network Node 4097 is Online --
22:19:59 -- Network Node DN20A is Online --
22:19:59 -- Network Node MILL20 is Online --
22:19:59 -- Network Node SYS880 is Online --
OPR>!
OPR>! Let's take a look at our brand new queues
OPR>!
OPR>SHOW QUEUES
OPR>
22:21:21 --The Queues are Empty--
OPR>SHOW STATUS PRINTER
OPR>
22:21:27 --There are no Devices Started--
OPR>^X
PTYCON> !
PTYCON> ! Now start up BATCON as subjob B
PTYCON> !
PTYCON> DEFINE (SUBJOB #) 3 (AS) B
PTYCON> CONNECT (TO SUBJOB) B
[CONNECTED TO SUBJOB B(3)]
2102 Development System, TOPS-20 Monitor 4(3245)
@LOG HEMPHILL (PASSWORD)
Job 24 on TTY225 13-Feb-80 22:21:49
Structure PS: mounted
Structure MISC: mounted
@ENABLE (CAPABILITIES)
$!
$! Connect to directory where debugging .EXE files are
$!
$CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG>
$!
$! Finally run the component
$!
$RUN (PROGRAM) BATCON.EXE.1
% BATCON GLXIPC Alternate [HEMPHILL]QUASAR (PID = 66000031)
% BATCON GLXIPC Alternate [HEMPHILL]ORION (PID = 70000032)
^X
PTYCON> !
PTYCON> ! Now start up special EXEC as subjob E
PTYCON> !
PTYCON> DEFINE (SUBJOB #) 4 (AS) E
PTYCON> CONNECT (TO SUBJOB) E
[CONNECTED TO SUBJOB E(4)]
2102 Development System, TOPS-20 Monitor 4(3245)
@LOG HEMPHILL (PASSWORD)
Job 19 on TTY226 13-Feb-80 22:23:00
Structure PS: mounted
Structure MISC: mounted
@CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG>
@!
@! Run the special EXEC, which is provided on the SWSKIT
@!
@RUN (PROGRAM) EXEC-FOR-DEBUGGING-GALAXY.EXE.1
TOPS-20 Command processor 4(560)-1
@ENABLE (CAPABILITIES)
$!
$! Make this EXEC switch from system queues to private queues
$!
$^ESET DEBUGGING-GALAXY
$!
$! Use ordinary EXEC commands to examine private queues
$!
$INFORMATION (ABOUT) OUTPUT-REQUESTS
[The Queues are Empty]
$INFORMATION (ABOUT) BATCH-REQUESTS
[The Queues are Empty]
$!
$! Now switch back to look at system queues
$!
$^ESET NO DEBUGGING-GALAXY
$INFORMATION (ABOUT) OUTPUT-REQUESTS
Printer Queue:
Job Name Req# Limit User
-------- ---- ----- ------------------------
* KLERR 6 1197 DEUFEL On Unit:0
Started at 22:05:47, printed 314 of 1197 pages
XXX 3 18 KAMANITZ /Dest:4097
MS-OUT 18 117 BRAITHWAITE /Unit:0
There are 3 Jobs in the Queue (1 in Progress)
$INFORMATION (ABOUT) BATCH-REQUESTS
Batch Queue:
Job Name Req# Run Time User
-------- ---- -------- ------------------------
* DUMP 16 02:00:00 OPERATOR In Stream:0
Job# 17 Running DUMPER Last Label: A Runtime 0:23:55
BATCH 2 00:05:00 BLIZARD /Proc:FOO
SOURCE 8 00:05:00 BLOUNT /After:14-Feb-80 0:00
SRCCOM 12 00:05:00 MURPHY /After:14-Feb-80 0:00
QJD4R 13 00:05:00 SROBINSON /After:19-Feb-80 0:00
QAR 10 00:05:00 BLOUNT /After:19-Feb-80 0:14
SAVE 1 00:05:00 FICHE /After:19-Feb-80 9:10
There are 7 Jobs in the Queue (1 in Progress)
$!
$! Now let's submit a batch job to our own BATCON
$!
$^ESET DEBUGGING-GALAXY
$!
$! Make a trivial batch control file
$!
$COPY (FROM) TTY: (TO) A.CTL.1 !New file!
TTY: => A.CTL.1
@SY A
^Z
$!
$! And submit the job
$!
$SUBMIT (BATCH JOB) A.CTL.1
[Job A Queued, Request-ID 1, Limit 0:05:00]
$!
$! Now examine private queues
$!
$INFORMATION (ABOUT) BATCH-REQUESTS
Batch Queue:
Job Name Req# Run Time User
-------- ---- -------- ------------------------
A 1 00:05:00 HEMPHILL
There is 1 Job in the Queue (None in Progress)
$!
$! Our job is in the batch queue, but no batch-streams have been started
$!
$^X
PTYCON> CONNECT (TO SUBJOB) OPR
[CONNECTED TO SUBJOB OPR(2)]
OPR>START (Object) BATCH-STREAM (Stream Number) 0
OPR>
22:25:40 Batch-Stream 0 --Startup Scheduled--
22:25:40 Batch-Stream 0 --Started--
OPR>
22:25:40 Batch-Stream 0 --Begin--
Job A Req #1 for HEMPHILL
OPR>
22:25:51 Batch-Stream 0 --End--
Job A Req #1 for HEMPHILL
OPR>
^X
PTYCON> !
PTYCON> ! Cleaning up is easy
PTYCON> !
PTYCON> KILL (SUBJOB) ALL
PTYCON> EXIT (FROM PTYCON)
@
6.0 TECHNICAL DETAILS
This section explains what a component does differently once its
location 135 (.JBOPS) has been poked to -1, and presents a few
helpful tidbits of information about debugging some of the programs.
Incidentally, .JBOPS is the word in the job data area (defined under
TOPS-10) which is reserved for a program's object time system (OTS).
GALAXY references this location by the symbol "DEBUGW".
6.1 GLXLIB
GLXLIB is the GALAXY library. It consists of a code segment which
starts at address 400000 and a data segment at address 600000. Each
of the programs QUASAR, ORION, OPR, PLEASE, BATCON, LPTSPL, CDRIVE,
SPRINT, and SPROUT uses it. Part of the initialization code of each
of these programs maps in GLXLIB as a "high segment". This is in
effect an object time system for GALAXY, with many commonly used
routines. Most of the support for the private GALAXY system is in
this library, enough so that OPR, PLEASE, BATCON, LPTSPL, SPRINT and
SPROUT actually have no code which cares whether they are part of a
private GALAXY. The initialization code in each component looks in
three places to find GLXLIB.EXE: first on the structure and directory
that the component itself came from, second on DSK:, third on SYS:.
This search order is the same for both the system GALAXY and the
private one.
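The search order described above can be sketched in a few lines of
Python. This is only an illustration: the function names, the
example directory "PS:<HEMPHILL>", and the use of a set of strings
to stand in for the file system are assumptions, not anything taken
from the GLXLIB sources.

```python
from typing import List, Optional, Set


def glxlib_search_order(component_dir: str) -> List[str]:
    """Candidate GLXLIB.EXE filespecs, in the order the text describes:
    the component's own structure and directory, then DSK:, then SYS:."""
    return [location + "GLXLIB.EXE" for location in (component_dir, "DSK:", "SYS:")]


def find_glxlib(component_dir: str, existing_files: Set[str]) -> Optional[str]:
    """Return the first candidate that exists, or None if none does.
    `existing_files` is a hypothetical stand-in for a real file lookup."""
    for filespec in glxlib_search_order(component_dir):
        if filespec in existing_files:
            return filespec
    return None


if __name__ == "__main__":
    for spec in glxlib_search_order("PS:<HEMPHILL>"):
        print(spec)
```

Because the order is the same for the system GALAXY and the private
one, a private copy of GLXLIB.EXE placed next to the component is
always found first.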
The actual changes implemented for the private GALAXY are as
follows:
1. Ordinarily, a component which stopcodes will save a crash
file on disk. When debugging, however, the crash file is not
written. In either case, if DDT is loaded with the program,
the stopcode will invoke a jump to DDT.
2. When debugging, GALAXY components do not require that the
packets they receive be privileged.
3. Ordinarily, QUASAR and ORION get special system PIDs for IPCF
communications. When debugging, they get PIDs with names of
the form "[username]QUASAR" and "[username]ORION". All
GALAXY components will then look for these PID names. Even a
pseudo-GALAXY component, such as MOUNTR or IBMSPL, will be
able to find these PIDs if its location 135 has been poked to
-1, simply because it uses GLXLIB.
4. GALAXY components print messages like:
"% QUASAR GLXIPC Waiting for ORION to start"
only while debugging.
5. ORION and QUASAR print messages about PIDs they acquire,
like:
"% QUASAR GLXIPC Becoming [HEMPHILL]QUASAR (PID =
66000031)"
6. All components print messages about the special PIDs they
find for QUASAR and ORION, like:
"% ORION GLXIPC Alternate [HEMPHILL]QUASAR (PID =
66000031)"
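As a rough illustration of items 3, 5, and 6, the following Python
sketch shows how the private PID names are formed and what a poke of
-1 looks like in a 36-bit word. The function names are hypothetical,
and treating any non-zero debug word as "debugging" is an assumption
borrowed from the QMANGR description; GLXLIB's actual test may differ.

```python
from typing import Optional

WORD_MASK = (1 << 36) - 1  # a PDP-10 word is 36 bits wide


def to_word36(value: int) -> int:
    """Represent a signed integer as a 36-bit two's complement word."""
    return value & WORD_MASK


def private_pid_name(component: str, username: str, debug_word: int) -> Optional[str]:
    """Return the "[username]COMPONENT" PID name used when debugging,
    or None when the component would use its normal system PID.
    Assumption: any non-zero debug word means debugging is in effect."""
    if component not in ("QUASAR", "ORION"):
        raise ValueError("only QUASAR and ORION acquire named PIDs")
    if debug_word == 0:
        return None
    return "[%s]%s" % (username, component)


if __name__ == "__main__":
    debugw = to_word36(-1)              # what poking -1 into location 135 stores
    print(oct(debugw))                  # 0o777777777777
    print(private_pid_name("QUASAR", "HEMPHILL", debugw))
```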
6.2 QUASAR
1. QUASAR reads and writes private queues from its connected
directory. The full filespec is
"DSK:PRIVATE-MASTER-QUEUE-FILE.QUASAR"
2. QUASAR does absolutely no privilege checking. Anyone can
modify or kill any request in the queues (if they know how to
speak to this private QUASAR).
6.3 ORION
1. ORION will create a log file under the name of
"DSK:ORION-TEST.LOG" instead of
"PS:<SPOOL>ORION-SYSTEM-LOG.001", and does no renaming of any
old log files present.
2. ORION will not set up any NSP servers when debugging. It
therefore will not speak to remote nodes to run OPRs for
them. However, there are hooks for ORION to initialize
"SRV:128" instead of the usual "SRV:47" when debugging.
6.4 QMANGR
QMANGR has also been modified to look for a private QUASAR's PID if
the low segment has a non-zero entry in .JBOPS.
6.5 CDRIVE
CDRIVE can be awkward to debug, since it may have many inferior forks
all executing the same code. When debugging, therefore, each fork
automatically loads SDDT into its address space and jumps to it when it
starts up. After setting any breakpoints or otherwise modifying this
fork's code, the person debugging types "GO<ESC>G" to resume the fork.
While debugging, if the fork terminates (crashes), CDRIVE will not go
through its normal purging of the crashed fork, so that its status can
be examined.
7.0 EXAMINING GALAXY CRASH FILES
All GALAXY components use the stopcode facility supplied by GLXLIB.
This facility dumps the ACs, program error codes, associated error
messages, program version numbers, and the last nine locations of the
stack onto the controlling terminal of the program executing the
stopcode. In addition, a crash file is created with the name of the
form: PS:<SPOOL>program-stopcode-CRASH.EXE. This .EXE file contains
the entire core image of the program which has crashed, and is
extremely useful in determining the cause of the crash. In
particular, there is a block of data referred to as the "crash block"
which usually contains the information most pertinent to the debugger.
This information can be read with either DDT or FILDDT. Its contents
are tabulated as follows:
Location    Data
--------    ----
.SPC        PC of stopcode
.SCODE      SIXBIT name of stopcode
.SERR       Last TOPS-20 error code
.SACS       Contents of the sixteen accumulators
.SPTBL      Base address of page table used by GLXMEM
.SPRGM      Name of program in SIXBIT
.SPVER      Program version number
.SPLIB      GLXLIB version number
.LGERR      Last GALAXY error code
.LGEPC      PC of last GALAXY error return
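Since .SCODE and .SPRGM hold SIXBIT text, it can help to decode a
word copied out of DDT or FILDDT by hand. The Python sketch below
decodes one 36-bit SIXBIT word (six 6-bit characters, each being the
ASCII code minus octal 40) and builds a crash file name of the form
given above. The function names and the stopcode "ABC" are made up
for illustration.

```python
def sixbit_to_text(word: int) -> str:
    """Decode one 36-bit word holding six SIXBIT characters.
    Each 6-bit code is the ASCII code minus octal 40; code 0 is a space."""
    chars = []
    for shift in range(30, -1, -6):      # leftmost character first
        code = (word >> shift) & 0o77
        chars.append(chr(code + 0o40))
    return "".join(chars).rstrip()       # drop trailing blank padding


def crash_file_name(program: str, stopcode: str) -> str:
    """Build a name of the form PS:<SPOOL>program-stopcode-CRASH.EXE."""
    return "PS:<SPOOL>%s-%s-CRASH.EXE" % (program, stopcode)


if __name__ == "__main__":
    program = sixbit_to_text(0o616541634162)    # contents of .SPRGM, in octal
    stopcode = sixbit_to_text(0o414243000000)   # hypothetical .SCODE value
    print(crash_file_name(program, stopcode))
```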
DEBUGGING MOUNTR
1.0 INTRODUCTION
This write-up was prepared to assist developers and maintainers
in understanding and debugging the TOPS-20 tape and structure mounting
program, MOUNTR. It is assumed that the reader has a working
knowledge of TOPS-20 assembler language coding and the set of TOPS-20
monitor calls.
2.0 SOURCES OF INFORMATION
This document will serve primarily as a guide to debugging MOUNTR
crashes. Much of the information needed to understand the data bases
and the operation of MOUNTR resides within the first 20 or 30 pages of
the MOUNTR code itself. Just make a listing and start reading.
3.0 DEBUGGING A LIVE MOUNTR
MOUNTR can be debugged as a standard GALAXY component, by
depositing -1 in location 135 of MOUNTR.EXE. MOUNTR will acquire a PID
for a private copy of QUASAR and will communicate with it.
To debug a MOUNTR which is actually recognized by the system as
the "real" MOUNTR, it is usually best to run it as a separate job by
including the following commands in SYSJOB.RUN:
JOB n /LOGIN OPERATOR XX OPERATOR
ENABLE
GET SYS:MOUNTR
START
/
This job can be reached by use of the ADVISE command. MOUNTR can
then be killed and a new copy started with appropriate breakpoints
or patches installed. Before MOUNTR can be patched or breakpointed, it
is necessary to issue the DDT command $W, since MOUNTR write-protects
itself during execution. For example:
@ENABLE
$ADVISE OPERATOR
TTY2, NRT20
TTY235, OPR
TTY234, MOUNTR
TTY233, PTYCON
TTY232, EXEC
TTY: 234
[Pseudo-terminal, confirm]
Escape character is <CTRL>E, type <CTRL>^? for help
OPERATOR Job 3 MOUNTR
LINK FROM MOSER, TTY 60
[Advising]
^C !KILL OLD MOUNTR
^C
$GET SYS:MOUNTR !GET A NEW ONE
$DDT !ENTER DDT
DDT
$W !YOU MUST DO THIS
USM/ MOVEI 2,BSRRTA# .$B !SET SOME BREAKPOINTS OR WHATEVER
DDSCIH/ JSP 16,SAVEQR# .$B
^Z !EXIT DDT
$START !START MOUNTR
Depositing 1 in location CDFLG will enable CONTROL-D interrupts.
Typing CONTROL-D when enabled causes MOUNTR to enter DDT.
4.0 MOUNTR CRASHES
When MOUNTR crashes, it saves its core image in the file,
PS:<SPOOL>MOUNTR-CRASH.EXE
All crashes are initiated by a CALL STOP instruction. This may result
from a logic inconsistency, or it can happen if MOUNTR receives a
software interrupt on a panic channel. The STOP routine gathers some
important data and saves it in core. It then types a message giving
the name of the filespec wherein it is saving the core image, and
issues an SSAVE JSYS to save the image. After restoring the ACs from
the time of the crash, MOUNTR halts.
To begin debugging a MOUNTR crash, follow these steps:
1. GET PS:<SPOOL>MOUNTR-CRASH.EXE
2. Get into DDT and type STOP1$G. This will load DDT's ACs with
MOUNTR's ACs at the time of the crash and exit to the EXEC.
Give the DDT command to the EXEC again to get back into DDT.
3. Look at P (AC 17). If it contains PDL1+something, there has
been a stack trap, and the routine STOPP was called as a
result. The location BADP contains the contents of P at the
time of the trap.
4. If P contains PDL+something, type TAB to look at the top of
the stack. This will contain one plus the address of the
CALL STOP instruction. Type TAB and ^H to display the
CALL STOP instruction that invoked the crash. If MOUNTR died
as a result of a panic channel interrupt, LPC1 will contain
one plus the address of the instruction which was executing
at the time of the interrupt.
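The "one plus the address" arithmetic in step 4 can be sketched in
Python as follows. The sketch assumes that CALL is the usual
PUSHJ P, so that the stack pointer's right half addresses the top
stack word and that word's right half holds the return PC. The
memory dictionary and all addresses shown are invented for
illustration.

```python
from typing import Dict

HALF_MASK = 0o777777  # one 18-bit half of a 36-bit word


def right_half(word: int) -> int:
    """Return the right (low-order) 18 bits of a 36-bit word."""
    return word & HALF_MASK


def call_stop_address(memory: Dict[int, int], p: int) -> int:
    """Given the crash-time contents of P (AC 17), return the address of
    the CALL STOP instruction: the saved return PC on top of the stack
    is one plus that address."""
    top_of_stack = memory[right_half(p)]
    return (right_half(top_of_stack) - 1) & HALF_MASK


def interrupted_address(lpc1: int) -> int:
    """For panic channel interrupts, LPC1 is one plus the address of the
    instruction that was executing."""
    return (right_half(lpc1) - 1) & HALF_MASK


if __name__ == "__main__":
    memory = {0o1234: 0o5001}        # hypothetical top-of-stack word
    p = 0o777700001234               # hypothetical contents of P
    print(oct(call_stop_address(memory, p)))
```

In DDT the same result comes from opening P, typing TAB, and then
TAB and ^H, as described in step 4.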
The following locations and data structures are helpful in locating
the cause of difficulties in MOUNTR:
NAME      FUNCTION
----      --------
CRSHAC    Contains the ACs at the time the STOP routine was called.
LPC1      For crashes caused by panic channel interrupts, LPC1 contains
          one plus the address of the instruction that caused the crash.
LSTERR    Contains the last TOPS-20 error.
MRPDB     PDB for last IPCF message received by MOUNTR.
MSTRBK    Used as an argument block for MTOPR and MSTR monitor calls.
RBUF      Last IPCF message received by MOUNTR (particularly useful if
          SSSDAT+1 contains MRCVIH, indicating that MOUNTR crashed while
          processing an incoming IPCF message).
SSSDAT    When MOUNTR crashes, SSSDAT+1 contains the address of the
          routine that was invoked by MOUNTR's scheduler. Starting here
          and using the stack, you can trace the execution of MOUNTR's
          code that led to the crash.
TBUF      Last IPCF message sent by MOUNTR.
DEBUGGING PA1050
In order to debug the compatibility package you must have a copy of
the file called PAT.EXE. PA1050 is just the system name for PAT. If
there is no copy of PAT.EXE, take the source program called PAT.MAC
and assemble it, thereby creating a sharable save file called
PAT.EXE. To debug the compatibility package, the following steps are
required.
$RESET
$GET ISAM ;Where ISAM may be any program you choose
$MERGE PAT ;PAT is the source name for PA1050
$DDT
PAT$: MOVBF$b ;You set your breakpoints here
DEBUG$G
$G ;You must type $G twice because of the double
symbol table
NOTE
Some of the error messages you may
receive from PA1050 may not be the true
error message. To have the correct
error message printed out use an ERJMP,
or an ERCAL after the JSYS it fails on.
For more information on ERJMP and ERCAL
refer to the Monitor Calls Reference
Manual.
In order to build the compatibility package the following steps are
required.
$LOAD /CREF PAT.MAC
$START
$SAVE PAT
$GET PAT
$DDT
MAKEPF$G
Output file: PA1050.EXE
$
UDDT
40000,,0$X
^Z
$I MEM
The START after loading causes the program to be moved from its
location to its running location in high core. The symbol table is
also moved, and the pointer adjusted. A sharable save file of pages
700-777 must be made for debugging. This is created when you type
MAKEPF$G and then 40000,,0$X in UDDT. When you then type I MEM, you
should see PA1050.EXE in pages 700-730.
COPYING FLOPPY DISKS
====================
This is a description of the front end program COP (quick floppy
copy). This program should be used to create backup copies of the
distributed set of floppies.
CAUTIONARY NOTES ABOUT FLOPPY DISKS:
1) Only IBM floppies should be used. Other floppies may
destroy the DX11 drives.
2) Floppies have a finite life while mounted in the
drive. The heads do not float, and the floppies turn
continuously. This causes the magnetic surface to be
eaten away. Minimum floppy life is something like 200
hours.
3) Floppies which are dropped, badly shocked, or used as
frisbees will lose their sector headers, and will be
good for nothing.
4) Never put a floppy which you suspect is bent into the
drive -- it may damage the drive.
5) COP is discussed also in the Front End File System
Specification manual in Volume 14 of the TOPS-20
Software Notebooks, section 3.2.
COP COMMANDS:
The basic COP command string is of the form:
COP> <destination device>/<switch>=<source device>
To enter COP, type a Control-backslash to get to the
Parser, then MCR COP to start up COP. The floppies
should have already been mounted with MCR MOUNT, and
should then be dismounted with MCR DMOUNT after the
copy.
COP SWITCHES:
/HE    Help, types a list of switches
/RD    Read Device, check for errors
/CP    Copy (default action)
/VF    Verify copy (default when copy in effect)
/ZE    Zero the device
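To make the command string format concrete, here is a small Python
sketch that assembles a COP command line from its parts and rejects
switches not in the list above. The function name is hypothetical,
and only a single switch is handled, since the text does not say
whether several switches may be combined.

```python
from typing import Optional

# Switches and descriptions as listed in the COP SWITCHES table.
COP_SWITCHES = {
    "HE": "Help, types a list of switches",
    "RD": "Read Device, check for errors",
    "CP": "Copy (default action)",
    "VF": "Verify copy (default when copy in effect)",
    "ZE": "Zero the device",
}


def build_cop_command(destination: str, source: str,
                      switch: Optional[str] = None) -> str:
    """Return '<destination>/<switch>=<source>', or '<destination>=<source>'
    when no switch is given (copy and verify are the defaults)."""
    if switch is None:
        return "%s=%s" % (destination, source)
    switch = switch.upper()
    if switch not in COP_SWITCHES:
        raise ValueError("unknown COP switch: /%s" % switch)
    return "%s/%s=%s" % (destination, switch, source)


if __name__ == "__main__":
    print(build_cop_command("DX1:", "DX0:"))         # plain copy with verify
    print(build_cop_command("DX1:", "DX0:", "VF"))   # verify named explicitly
```

Note that the destination comes first, so swapping the two
arguments would overwrite the floppy that was meant to be the source.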
COP EXAMPLE:
The following sequence of commands will succeed in
copying the contents of the floppy in DX0: (the left
hand drive) onto the floppy in DX1:, and verifying the
operation.
^\
PAR>MCR MOU
MOU>DX0:
Mount completed
MOU>DX1:
Mount completed
MOU>^Z
^\
PAR>MCR COP
COP>DX1:=DX0:
COP>^Z
^\
PAR>MCR DMO
DMO>DX0:
Dismount Complete
DMO>DX1:
Dismount Complete
DMO>^Z
The copy takes about two minutes, and the verify about the same.
Take care to specify the correct source and destination
devices.
CAUTIONARY NOTE--
If you COP for many generations, you will build up
ghost bad blocks until RSX declares the floppy
useless. This is because in each generation the bad
block file of the old floppy is copied onto the new
one (which will have its bad blocks in different physical
locations). A way around this is to use PIP for any
non-boot copies once every several generations.
THE SWSKIT TOOLS PROGRAMS
=========================
Included on the SWSKIT are a number of utility programs, as summarized
below. These tools have been found to have at least some usefulness in
the past in a debugging environment. Most of these programs require the
user to have WHEEL or OPERATOR privileges to work, but most are also of
the "show and tell but don't touch" category, so they are in general
"safe" to run.
We have cleaned up some of the old ones a bit, added a few new ones, and
checked them all out to the extent that they will all run. There should
even be some documentation, at least a HELP file, with each program.
While we do not actively "support" these programs, we are quite willing
to accept complaints and suggestions and submissions from the field.
These are the "standard" tools; the Marlboro Support Group is generally
familiar with their operation and quirks, and in providing support to
the field may request that one or more of the programs be used at a
customer site to diagnose or assist in correcting a problem. This is
generally more effective than random poking about in DDT, or trying to
learn the peculiarities of whatever the customer may have available.
And now, the current collection:
PROGRAM                     DESCRIPTION
-------                     -----------
CHANS                       This program will produce system
                            configuration and status information on
                            tapes and disks.
DIRPNT                      This program will list the contents of
                            the blocks in a disk directory.
DIRTST                      This program will check the format of
                            directory files and list any invalid
                            data in them.
DS                          This program will provide software
                            diagnostic help concerning the disk file
                            system. It can perform the functions of
                            READ, FILADR, and UNITS.
DSKERR                      This program will provide a convenient
                            listing of the hard and soft disk errors
                            that have occurred.
DX20PC                      This program will trace the microcode PC
                            in the DX20.
EXEC-FOR-DEBUGGING-GALAXY   This EXEC contains commands to
                            facilitate debugging a private GALAXY
                            system.
FILADR                      This program will display the disk
                            addresses a file is using, or the
                            addresses which are marked in the BAT
                            block.
JSTRAP                      This program will produce information in
                            a log on any JSYS, including the PC and
                            arguments used.
MONRD                       This program will allow you to easily
                            examine the running monitor.
MTEST                       This program will allow you to insert
                            MONITOR instruction execution tests
                            anywhere in the monitor.
READ                        This program performs the same action as
                            the CHECK FILE command to DS; it
                            read-checks files for disk errors.
REV                         This program will allow you to easily
                            alter, edit, delete, obtain information,
                            etc. on files.
RSTRSH                      This program will detect bug-induced
                            changes in the resident monitor in a
                            dump file.
SWSERR                      This program produces a convenient
                            listing of BUG HLT/CHK/INF occurrences.
TYPVF7                      This program is useful for typing out
                            the contents of a VFU file in a readable
                            form.
UNITS                       This program will produce status
                            information on disk drives.