Trailing-Edge - PDP-10 Archives - BB-L288A-RM - swskit-documentation/handbook.mem


                          Release 4 Edition
                           RP20 LIR Update

                             January 1981

                         TOPS-20 Monitor Group
                        Marlboro Support Group
                          Software Services
TOPS-20 TROUBLE-SHOOTING HANDBOOK                               Page 2


     This document is the TOPS-20 Trouble-Shooting Handbook.  It is  a
collection  of materials designed to increase the effectiveness of the
Software Specialist in the field  in  coping  with  TOPS-20  problems.
Some  of the common "disasters" to befall TOPS-20 sites are discussed,
along with debugging  methods  in  general.   Though  the  information
contained  herein is probably not sufficient to make a Specialist into
a TOPS-20 "wizard", it  should  help  ease  the  communication  burden
between  the  Specialist  in the field and his counterpart in Marlboro
and lead to quicker resolution of problems.

     This document contains materials from many sources, and  presents
some information not available anywhere else.  Certain sections may be
a bit dated, but an effort has been made to remove at  least  some  of
the old/wrong stuff along with including new articles.

     There is a continuing need to update this document as part of the
SWSKIT  materials, and Specialists are encouraged to give the Marlboro
Support Group feedback on these materials.

                          TABLE OF CONTENTS

     1.  INTRODUCTION

     2.  TABLE OF CONTENTS

     3.  POLICY STATEMENT

     4.  PRODUCING A GOOD SPR

     5.  USING SIRUS

     6.  DDT PATCHING THE TOPS-20 MONITOR

     7.  MAPPING DIRECTORIES IN MDDT

     8.  RECOVERING FROM DIRECTORY ERRORS

     9.  MORE ABOUT DIRECTORY PROBLEMS

    10.  JSB AND PSB MAPPING

    11.  BREAKPOINTING MULTI-USER CODE


    13.  RECOVERING FROM SYSTEM DISASTERS

    14.  LOOKING AT HUNG TAPES

    15.  A LOOK AT SOME OF THE DISK STUFF

    16.  NEW DISK FEATURES FOR FILDDT

    17.  TOPS-20 SCHEDULER TEST ROUTINES

    18.  TOPS-20 PAGE ZERO LOCATIONS

    19.  KNOWN HARDWARE DEFICIENCIES LIST

    20.  KS10 CONSOLE INFORMATION

    21.  CRASH ANALYSIS

    22.  MORE CRASH ANALYSIS


    24.  MONITOR BUILDING HINTS

    25.  EXEC DEBUGGING

    26.  RECOVERING FROM A BAD EXEC

    27.  DEBUGGING THE GALAXY SYSTEM

    28.  DEBUGGING MOUNTR

    29.  DEBUGGING PA1050

    30.  COPYING FLOPPY DISKS

    31.  THE SWSKIT TOOLS PROGRAMS

                           POLICY STATEMENT


     There is a great deal of confusion about the materials  that  make
up the SWSKIT tape and their legal standing.  This memo is an attempt to
clear up some of those problems.

     The SWSKITs are made up of an assortment of materials intended to
increase   the   effectiveness  of  the  software  specialist.   These
materials include program sources not normally distributed or sold for
a premium;  internal and company confidential documentation, which may
be in part incomplete or actually  incorrect,  but  supplied  for  the
information value on subsystems which may be insufficiently documented
through the usual channels;  documentation for  specialists  specially
produced  by  the  corporate  support  people;   and  utility programs
produced and maintained to  some  extent  by  corporate  support.   In
addition,  the  SWSKIT  may contain special or pre-release versions of
supported software provided for the incremental value a specialist may
obtain  from  the  software  under controlled circumstances.  In time,
utilities from the SWSKIT may evolve into supported products.

     All of the SWSKIT materials are proprietary to DIGITAL, and  were
never  intended  to  be  just  given  to the customer.  Obviously, the
materials which are otherwise sold cannot  be  given  away;   and  the
company  confidential  materials  should not be.  While it is expected
that the tools programs may wind up  being  used  at  customer  sites,
neither  are  they  gifts  to the customer.  An effort must be made to
protect  DIGITAL's  rights  to  these  proprietary   materials.    For
instance,  a PL90 contract retains rights to all materials provided to
the customer.  Deleting a tool program after use at a  customer  site
indicates an intent to retain those rights.  Be aware that if a customer
incurs damages due to use of some program given to him by the specialist,
even if the program was improperly used, DIGITAL may be seen  to  be  at
least partly responsible.  This should be avoided.

     In summary, the  SWSKIT  is  a  tool  provided  to  increase  the
effectiveness  of  the  specialist, especially with regard to PL90 and
debugging activity, but  the  rights  to  all  materials  remain  with
DIGITAL and the specialist should act accordingly.

                          PRODUCING A GOOD SPR

           A software specialist is  often  asked  to  assist  with  the
      submission  of  SPRs for a customer.  It is always discouraging to
      have  problems  getting  an  answer  to  an   SPR   for   entirely
      non-technical  reasons.  For that reason, below are some hints for
      producing a "good" SPR which will  help  in  getting  the  problem
      solved more quickly.

      1.0  THE SPR FORM

      Much of the data on the SPR  form  is  unimportant,  until  it  is
      omitted.   The  line  of  product data is one.  Try to isolate the
      problem to the correct component, since that  will  determine  who
      first receives the SPR.  This will save the time it takes  for,
      say, the COBOL maintainer, to determine that the problem is not
      really  in  COBOL,  but  in PA1050 or the monitor, and the time it
      takes for the next maintainer to become familiar with the problem.
      Something  which  crashes  the system is always a monitor problem,
      even if it is an EXEC command which causes the problem, or a short
      BASIC program.

           If you really have a problem, be sure to mark  the  "problem"
      box,  and  don't  use  words  like  "we  suggest  you  correct the
      following situation...".  If the people who  handle  the  incoming
      paperwork  think they have a suggestion, it gets routed elsewhere,
      and is never seen by the maintainers.  A few  problems  have  been
      greatly delayed this way.

           The priority boxes are not super-critical, but if you have  a
      problem  which  is  holding  up production, or crashing the system
      several times a day, try to make a note of that somewhere  in  the
      description  of  the problem.  That should let the maintainer know
      that a work-around may also be appropriate in the short term.

           The phone number of the submitter could be important  if  the
      problem  is  of  such a nature that it proves not-reproducible, or
      the  complexity  is  such  that  further  clarification  just  to
      understand  the  problem  might  be needed.  Your number here as a
      software specialist provides a more informal contact  than  direct
      maintainer-to-customer  confrontation,  although the customer will
      be contacted directly if that is most expedient.

           The attachments--be sure to mark the appropriate boxes if you
      send along supporting materials.  Since these  can  get  separated
      from the form, marking the boxes will help keep them from  getting
      permanently lost.

           The "DO NOT PUBLISH" box is for security problems and ways to
      crash  the system.  We double-check this on incoming handling, but
      if the box is checked you can be sure that the  SPR  will  not  be
      published unanswered.

           Describe the problem as clearly  as  possible  in  the  space
      provided.   Try  to  provide enough detail to easily reproduce the
      problem.  Concentrate on the description of the problem,  and  any
      diagnosis  you  may  have made.  Attempting to declare a "cure" is
      not always a good idea because the actual correction may be of an
      entirely  different  nature  for a number of reasons.  However, if
      you have something that works, the information could  be  of  use.
      Just  don't  count  on that exact change being the actual fix.  If
      the problem  is  not  reproducible  from  the  description  given,
      chances  are  that  something  you  left  out  is  relevant to the
      problem.  Unless the problem directly concerns them,  things  like
      logical  names,  mounted  structures,  and  other  features  often
      obscure the problem.  For the purpose of the problem  description,
      a terminal listing of an occurrence is often highly desirable, and
      it is sometimes a  good  idea  to  create  a  brand-new  directory
      without  any  fancy  LOGIN.CMD  setups or user groups and so on to
      demonstrate the problem.


           As above, the listing from a terminal session is often a very
      good  attachment.   Try  to  include all the relevant information.
      Again, sometimes things like logical  names,  file  and  directory
      protections,  user  groups,  and  other  job-state  variables  are
      important and should be  included.   Inclusion  of  data  such  as
      program version numbers and edit levels can be useful for products
      with large numbers of edits.  If you are  complaining  of  monitor
      problems,  which  patches  you  have  installed  could  be  useful
      information.  Terminal sessions should be as  clear  as  possible.
      It  should be made obvious just what is going on or the maintainer
      may just see a series of commands and think "So?".  Commenting the
      session, either as you go or after the fact, is one  way  to
      accomplish this.

           Many times there  is  a  program  which  exercises  the  bug.
      Sometimes these programs are all right as they are, but often they
      are giant COBOL monsters working on a multi-RP06  data  base,  and
      very  unwieldy  for  a  maintainer  to  try  to work with.  If the
      program can be reduced to a small subset, do  so.   Many  monitor
      problems turn out to be reproducible from a set of arguments
      to a single JSYS.  If it is a question of  incorrect  output  from
      some  program, it is helpful to send along all the files needed to
      reproduce the problem, and the files of incorrect output.  In  the
      case  of  programs with multiple edits to field-image, this speeds
      up the maintainer, since he does not have to manually apply  those
      edits  to attempt to recreate your versions, and he can also check
      the installation of the edits, if that  is  appropriate.   And  in
      case the problem proves not to be easily reproducible, the bad
      output can at least be examined for clues.

           In the case of a monitor crash, the  problem  may  have  been
      reduced  to  a  program  of less than one page.  It is tempting to
      type this on the front of the SPR and send it in that way.   While
      the  maintainer can type in the program easily enough (if the copy
      is  both  legible  and  correct),  the  submitter  has  been  lax.
      Sometimes,  that short program will not cause a crash, even though
      run thousands of times under varying conditions by the maintainer.
      And  even  when  it  does  cause  the  crash  the  first time, the
      submitter has lengthened the turn-around by not sending  the  dump
      from  the  crash along with the SPR.  Sending the dump solves both
      problems.  If the problem is not reproducible with ease, the  dump
      is  vital  to further understanding.  And having the dump to start
      with speeds up the work of the maintainer who now does not need to
      schedule stand-alone time to try to exercise the bug and cause a crash
      so he has a dump to look at.

           When sending a dump, always send the unrun monitor along with
      it.   If  you  don't, you are just causing a delay in handling the
      problem while the maintainer tries it against the  standard  ones,
      which  involves  finding tapes with the standard ones, and loading
      them...  If you are running an unpatched standard monitor, and you
      refuse  to send it, at least tell which one it is somewhere on the
      form.  The unrun monitor is also useful for checking the existence
      and correct installation of patches when that becomes an issue.

           The current preferred tape format is 9-track, 1600 bpi, in
      standard DUMPER format, not in INTERCHANGE format, since file
      information can be lost in that format.  Take the time to get a
      listing of a directory of the tape and include it with the tape.
      This speeds things up: if it is obvious from the directory that
      something is missing, feedback comes sooner.  The listing also
      indicates that the tape will indeed be readable when received, and
      partly eliminates the maintainer's usual first step of getting a
      directory of the tape.

                              USING SIRUS

     Did you know that you can dial into a Marlboro development system
and type out almost any patch that the Marlboro Support Group has made
to -10 or -20 software in the last three to four years?   The  program
which does this is called SIRUS, and with it you can:

     1.  Search through all the patches to a  particular  product,  if
         you know a problem exists but don't know what the patch is or
         don't know if we've heard of the problem.  If  you  find  the
         patch you want, you can then type it out.

     2.  Type out a particular patch to a particular product,  if  you
         know what the edit number is.

     3.  Obtain the status of any SPR, including the entire answer  if
         it has been answered.

     By using SIRUS, you can get patches whenever the  system  is  up,
even if it's two A.M. and the Hotline is closed.  You  can  print
patches in your local office without having to wait for  a  specialist
in  Marlboro  to  mail you a copy.  You can be sure that the patch you
have is correct.  (Dictating patches over the Hotline is very prone to
errors.)  Even  if the problem you are experiencing cannot be found in
SIRUS, you can help us when you call by so  stating.   We  immediately
know that the problem you are having is a new one.

     There have been several articles about SIRUS  in  previous  Large
Buffers, but none have been oriented towards specialists in the field.
This one is!

     To use SIRUS, dial into system 1026 in Marlboro, log in, and then
run it.  In more detail:

     1.  Dial into system 1026.  Any of  the  following  numbers  will
         reach system 1026 in Marlboro.  They are all 300 baud lines.
                            231-1171  (DTN)
                            231-1172  (DTN)
         Once the machine notices you, type "SET HOST 26" to ensure
         that  you  are  connected  to  system  1026.   If you get the
         message "?Undefined Network Node", the machine is  down  (try
         again later).

     2.  To login, type "LOGIN 10,#".  When  the  machine  requests  a
         name, type one in.  You will not need a password.

     3.  To run SIRUS, just  type  "R  SIRUS".   SIRUS  takes  several
         seconds  to  initialize  itself  and  then  prompts  you with
         "PRODUCT [H]*".  At this point,  type  either  "10<CRLF>"  or
         "20<CRLF>"  depending  on  whether the customer of concern is
         running TOPS10  or  TOPS20.   SIRUS  then  prompts  you  with
         "[H] *".  You are now at SIRUS command level.

     SIRUS has many commands, but only a few are of  interest  to  the
field specialist.  They are:

     1.  H -- for Help.  This may be typed anytime SIRUS precedes  its
         prompt with "[H]".

     2.  EX -- for Exit.  Use this to exit SIRUS.  Then  type  K/N  to
         logout, and hang up.

     3.  PP -- for Peruse PCOs.  PCO stands for Product  Change  Order
         and  essentially means a patch.  This command is used to look
         through patches for a particular product if you  aren't  sure
         which patch you want.

     4.  GP -- for Get PCO.  This is used to  type  out  a  particular
         patch once you know which one you want.

     5.  GS -- for Get SPR.  Use this to  retrieve  information  on  a
         particular SPR.

     6.  NP -- for New Product.  Use this  command  if  you  type  the
         wrong  answer to "PRODUCT [H]*" as mentioned above, or use it
         in association with the PP command as described below.  SIRUS
         will prompt you for a product again.

     The three most useful of these commands are PP, GP, and GS.

3.0  PP Command

     Use this command to peruse the patches for a  particular  product
--  e.g.   LINK  or  603  (monitor) or BATCON -- if you want to find a
particular patch you know exists, or  if  you  want  to  know  if  the
support group has heard of and fixed some problem you are experiencing
with a product.  After you type "PP<CRLF>" SIRUS  will  prompt  for  a
component.  Here type the program you're interested in -- LINK, BATCON
or whatever.  A response of LIST will type the  programs  SIRUS  knows
about and then prompt you for a component again.

     Once you type in the component, SIRUS prompts with  "[H] PCO #:".
There  are two reasonable responses to this.  The first is ALL.  (Type
NO to the subsequent question about a file.)  This  will  give  you  a
short  summary of all the patches available for this product, one line
per patch.  This includes a PCO number, the SPR for which  this  patch
was  written,  the  edit  number  corresponding  to the patch (for the
TOPS10 monitor this is the MCO number), a keyword describing the  bug,
the  maintainer  who  wrote  the patch, and the date it was made.  The
other response you might type here is simply  <CRLF>.   In  this  case
SIRUS will type out the symptom of the newest PCO, and then prompt you
with "NEXT?".  By continuing to type carriage returns,  you  can  type
all  the symptoms of all the patches for this product, from the newest
to the oldest.  When you have found the patch you want  (remember  the
PCO number), type RETURN to get back to SIRUS command level.

     If you did not find your symptom while perusing, and your product
exists  on both TOPS10 and TOPS20, you should also search the PCOs for
the alternate operating system.  To do this, type NP to SIRUS  command
level,  and  then type in the other product number when SIRUS asks for
it.  Then peruse PCOs for your product as you did before.

4.0  GP Command

     This is used to print out a patch once you know the  PCO  number.
The  PCO  number  is printed while you are perusing PCOs and is of the
form 10-product-nnn or  20-product-nnn.   After  typing  GP  to  SIRUS
command level, SIRUS prompts for a PCO number.  The leading  "10-"  or
"20-" is supplied by SIRUS, so your response should be of the form
product-nnn.

     In response, SIRUS types out information about  the  patch.   The
two most useful data are labeled VLD and SAE.  VLD stands for validity
and is the version of the software to which the patch applies.  SAE is
Source  After Edit and is the edit or MCO number of the patch.  To get
the actual text of the patch, respond YES to  SIRUS's  question  "Show
Write-up File?".

5.0  GS Command

     This is used to get the status of an SPR.  SIRUS will prompt  for
an  SPR  number, and then will provide you with info about the SPR you
specified.  This  includes  the  site  that  submitted  the  SPR,  the
specialist responsible for the SPR, the date received,  and  the  date
closed, if the SPR has been answered.  If answered, it will  also  say
whether or not an auxiliary file was written for the SPR and what PCOs
(if any) were included.  The aux file  is  an  introductory  paragraph
which  is written for most SPR answers.  For SPRs which do not require
patches, the aux file constitutes the entire answer.  The aux file can
be typed by responding YES to "SHOW AUXILIARY FILE?".  The PCOs can be
typed out with the GP command.

     Finally, if SIRUS begins to give you error messages such as "File
not  found",  EX  from  SIRUS  and  mount a special disk pack with the
monitor command "MOUNT SIRS:".  Then try again.  This gives you access
to more PCOs and aux files than are normally available.

     For more information, see the example run of SIRUS below, in which
user input follows the SIRUS prompts, or the article on SIRUS
published in volume 409 of the Large Buffer.  Finally,  SIRUS  is  for
use  by  DIGITAL personnel only.  DO NOT give out instructions for its
use or the system 1026 phone numbers to customers.

[H] *PP
[H] PCO #:<CR>

Jobs sent to the LPT queue from D60SPL are  given  a  random
file name and are billed to OPERATOR.

If the spooler is pausing, typing a  GO  can  result  in  an
illegal instruction.

PCO 015 SPR 12355             (6,022) KEY= LNAME      BENCE      09-JUL-79
PCO 014 SPR 12225  OUTOUT     (6,020) KEY= PAUSE      WEISBACH   09-JUL-79
PCO 013 SPR 11660  LODVFU 6013(6,014) KEY= VFU        WEISBACH   09-JUL-79
PCO 012 SPR 13244  D60CRE 103 (6,032) KEY= CARD       L.NEFF     06-JUL-79
PCO 011 SPR        D60CR4 103 (6,015) KEY= CARDS      L.NEFF     03-JUL-79
PCO 010 SPR        REQUEU 103 (6,030) KEY= CTQMFQ     L.NEFF     14-JUN-79
PCO 009 SPR 12588  INTCTC 1   (6,026) KEY= CONTROL C  TEEGARDEN  17-MAY-79
PCO 008 SPR 12881  OUTE.6 103 (6,025) KEY= REQUEUE    NEFF       17-APR-79
PCO 007 SPR 12139         103 (6,019) KEY= ILLEGAL    WEISBACH   27-OCT-78
PCO 006 SPR 12005             (0) KEY= SIMULTANEO BENCE      22-SEP-78
PCO 005 SPR 11672  ENDJOB 103 (6,018) KEY= QUASAR     BENCE      18-SEP-78
PCO 004 SPR 11841  D60STK 103 (6,016) KEY= BAD        WEISBACH   23-AUG-78
PCO 003 SPR 11476  TTYOUT 103 (6,010) KEY= OVERWRITE  WEISBACH   12-MAY-78
PCO 002 SPR 11431  OUTE.6     (6,007) KEY= INTERRUPTS WEISBACH   12-APR-78
PCO 001 SPR 11456  D60SPL     (6,006) KEY= BLANK      WEISBACH   03-APR-78
[H] *GP
[H] PCO #: 20-D60SPL-8
VLD:    103(2304)
SBE     %103 (6,024)
SAE     %103 (6,025)
DOC:    N 
F/D:    F
TEST FILE:     :          [        ]
P-IND:  10

008             NEFF

     If a job is requeued because of a  communications  failure,  with
D60SPL  reporting  that  the  station  has  signed off, then, when the
station signs on again, the print file  will  be  restarted  from  its
beginning, not from the last checkpoint.


     When the  error  is  detected,  routine  OUTE.6  calls  IBACK  to
backspace  the  file  five  pages.   IBACK  zeroes  the  page counter,
J$RNPP(J), and rewinds the  file,  in  the  belief  that  the  forward
spacing  code  will  update  the page count as it skips to the correct
page.  However, D60SPL discovers the error is not recoverable  and  it
requeues  the job immediately.  Since the page count is never updated,
DOREQ requeues the job to start at the beginning of the file.


     Preserve the page at which to resume printing over  the  call  to
IBACK.  If the job is to be requeued immediately, restore J$RNPP(J) so
that the job will be requeued and checkpointed five  pages  back  from
its current position.
File 1) DSK:D60SPL.MAC[4,1022]  created: 1724 09-Apr-1979
File 2) DSK:D60SPL.MAC[4,417]   created: 1625 10-Apr-1979

1)1             LPTEDT==6024                    ;EDIT LEVEL
1)              LPTWHO==1                       ;WHO LAST PATCHED
2)1             LPTEDT==6025                    ;EDIT LEVEL
2)              LPTWHO==1                       ;WHO LAST PATCHED
1)4     ;*****End of Revision History*****
2)4     ;6025   If a job printing on a remote printer is interruped by
2)      ;       a communications failure, requeue to start five pages ba
2)      ;       instead of at beginning of file.  LLN, SPR # 20-12881,
2)      ;       10-APR-79
2)      ;*****End of Revision History*****
1)179           PUSHJ   P,IBACK                 ;BACKSPACE THE FILE
1)              PUSHJ   P,INTON                 ;[6007]TURN INTERRUPTS BACK ON
1)              PUSHJ   P,D60NRY                ;PERFORM "NOT READY" DIA
1)               JRST   OUTE.7                  ;ERROR IS UNRECOVERABLE
1)              TELL    OPR,[ASCIZ /![LPT...  continueing!]
2)179   ;**;[6025] ADD SEVERAL LINES AT OUTE.6 + 13L.  LLN, 10-APR-79
2)              MOVE    T1,J$RNPP(J)            ;[6025] CALCULATE THE NE
2)              SUB     T1,N                    ;[6025]  DESTINATION PAG
2)              PUSH    P,T1                    ;[6025]  AND SAVE IT
2)              PUSHJ   P,IBACK                 ;BACKSPACE THE FILE
2)              PUSHJ   P,INTON                 ;[6007]TURN INTERRUPTS BACK ON
2)              PUSHJ   P,D60NRY                ;PERFORM "NOT READY" DIA
2)               JRST   [POP    P,J$RNPP(J)     ;[6025] RESTORE PAGE NO. FOR REQUEUE
2)                       JRST   OUTE.7]         ;[6025] ERROR IS UNRECOV
2)              POP     P,(P)                   ;[6025] THROW AWAY DESTI
2)                                              ;[6025] PAGE - FORWARD S
2)                                              ;[6025] CODE WILL HANDLE
2)              TELL    OPR,[ASCIZ /![LPT...  continueing!]
[H] *EX


                    DDT PATCHING THE TOPS-20 MONITOR

     This article discusses how DDT patches are made to TOPS-20.

     From time to time the Marlboro Support Group has  to  describe  and
explain  the DDT patching of TOPS-20 to Specialists from the field.  The
following is an explanation, if not a justification,  of  the  way  some
things are done.

     A DDT patch to TOPS-20 as published is, in essence, a terminal  log
of a session applying the patch by hand.  This differs from the sometime
practice of a control file containing only the typein to DDT.   The  raw
typein has a few disadvantages with respect to the log:  it is  hard  to
display bare control characters such as linefeeds  and  tabs  in  a
publication format like the Software Dispatch, and even harder  to  edit
around them with EDIT, the only currently supported editor.  In addition,
the full typescript builds confidence when the DDT typeout from  applying
the patch matches the typescript, and gives cause for concern when it does
not.  The published patch IS an actual typescript, and is "proof" that the
patch CAN be correctly applied.
     In applying  the  patch,  the  basic  methodology,  lacking  innate
knowledge,  is  to  just  start  typing from the typescript whenever the
computer goes into input wait.  Any "$" appearing in a DDT session which
is  not  the prompt from the enabled EXEC should be the result of typing
an ESCAPE.  (ESCAPE is sometimes referred to  as  ALTMODE  or  ALT.)  In
order to avoid confusion, we try never to use any dollar sign  symbols,
and we try to make special note of any that do occur.

     Starting at the top of a session, there are usually a few  comments
about  the  patch.   If  we  are currently patching multiple releases of
TOPS-20, the specific release for the patch should be noted here.   Also
noted  should  be any hardware or monitor dependencies:  KS- or KL-only,
or 2040, 2060, or ARPA only, etc.

     The first monitor command is an ENABLE, followed by a  GET  of  the
monitor  file  to be patched.  Unless we are patching an existing patch,
our published patches always show us patching a "virgin"  monitor  file,
one  without  any previous patches installed.  You should always be able
to duplicate the patch typescript yourself on an unpatched monitor.

     At this point we do a START 140 command to get into DDT.  There  is
a  fine distinction at this step between typing START 140 and typing DDT
to get into DDT.  START 140 starts up EDDT (Exec-mode  DDT)  running  in
user  mode,  which is the required action.  Typing DDT to the EXEC would
merge  SYS:UDDT.EXE  with  the  monitor  EXE  file  and  start  up  UDDT
(User-mode  DDT), which is not what we want.  In fact, with Release 4 of
TOPS-20 the EXEC is clever enough to start up EDDT for  us  on  the  DDT
command  also,  but  even  so, for the sake of consistency, and to avoid
confusion, published patches should still use START 140.

     After entering DDT, it is common to select the local  symbol  table
for  the  module  to  be  patched  in  case  there might be local symbol
conflicts, etc.  This is done using the  MODULE-NAME$:   (ESCAPE  colon)
command.

     Next follows the body of the patch.  We purposely avoid the fancier
DDT  commands when applying patches in order to avoid confusion.  We try
to limit ourselves to a few DDT commands:

        ADDRESS/ (slash) to open the location at ADDRESS
        ADDRESS[ (open-square-bracket)
                        similar to /, but with numeric rather than
                        symbolic typeout
        RETURN          to close the current location, storing any new
                        value specified
        LINE-FEED       to close the current location, storing any new
                        value specified, and open the next location
        TAB             a convenience command used to close the current
                        location and open the location specified by the
                        last reference; commonly used to get to and
                        open location FFF immediately after inserting a
                        JRST FFF instruction in the code
        SYMBOL: (colon) to define a symbol at the current location;
                        usually to redefine FFF: further down in the
                        patch space
        FFF$< (ESCAPE open-angle-bracket) or
        FFF$$< (ESCAPE ESCAPE open-angle-bracket)
                        to start a patch in the patch area named FFF
        $> (ESCAPE close-angle-bracket)
                        to terminate a patch, which installs the jumps
                        back to the inline code, redefines the FFF
                        symbol value past the used patch space, and then
                        inserts the initial jump to the patch into the
                        inline code

Those who apply patches are of course free to use the more sophisticated
DDT commands to achieve the same effect.

     A few TOPS-20 peculiarities  should  be  explained  here.   TOPS-20
patches  are  applied  using  the FFF patch area.  The default DDT patch
area symbol, PAT.., (used if no argument  is  given  to  an  $<  or  $$<
command) should NOT be used.  You are apt to wind up with system crashes
since the PAT..  area is not locked down.  FFF is defined in the  module
STG.MAC  (which  goes to the customers), and the area is 100 octal words
long.  FFF is part of the resident monitor code PSECT, and is always  in
memory.   Special  care  must  be  taken  when installing patches not to
overrun the patch area, which could also result in system crashes.   The
first symbol past the FFF area is DTSCNW.  If that symbol shows up while
attempting to install a patch, you may be in trouble.

     There is another patch space defined in TOPS-20,  called  SWPF,  in
the  swappable  portion of the monitor.  We always use FFF in preference
to SWPF since first, SWPF can only be  used  for  patches  to  swappable
code,  but  FFF will work for either.  Second, two patch areas in common
use might be confusing to the customers, specialists, and us.  Third, if
we  get  a  dump to examine from a customer, we can always check the FFF
area for possible (bad) patch installation.  SWPF might be swapped  out,
and not in the dump.

     Unconventionally enough, the symbols FFF, FFF1, and  FFF2  are  all
defined together in STG.MAC with the same value.  When DDT decides which
to type out when printing the symbolic form of an address, it finds FFF2
first,  which accounts for the common appearance of FFF2 in patches.  In
addition, just the symbol FFF is  redefined  on  patch  installation  to
always  point  to the first free word of the remaining patch area.  FFF1
and FFF2 are  never  redefined,  and  so  should  always  point  to  the
beginning of the initial patch area built into the monitor.  FFF2 should
never be explicitly referenced as typeIN to DDT;  any occurrence in a
patch should be understood to come from DDT typeOUT, probably from a
DDT LINE-FEED command.  This is a common source of error in applying
patches:  typing in the FFF2-based addresses writes over the earlier
patch area.
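
     The patch-space bookkeeping described above reduces to simple
arithmetic, sketched below in Python as a pre-flight check before a
patch is typed in.  This is a minimal sketch, assuming only what this
section states:  the area is 100 (octal) words long, FFF1 keeps its
original value, and FFF points at the first free word.  The sample
addresses are hypothetical, and the word count passed in should include
the words $> adds when it installs the jumps back to the inline code.

```python
# Sketch: how much of the resident FFF patch area is still free?
# Assumptions (from the text): the area is 100 (octal) words long, starts
# at FFF1 (never redefined), and FFF always points at the first free word.
# The addresses used below are hypothetical.

PATCH_AREA_WORDS = 0o100


def words_left(fff1: int, fff: int) -> int:
    """Free words between the current FFF and the end of the patch area."""
    return fff1 + PATCH_AREA_WORDS - fff


def patch_fits(fff1: int, fff: int, patch_words: int) -> bool:
    """True if patch_words more words can be placed at FFF without overrun."""
    return 0 <= patch_words <= words_left(fff1, fff)


if __name__ == "__main__":
    fff1 = 0o20000                       # hypothetical start of the area
    fff = fff1 + 0o72                    # hypothetical: 72 words already used
    print(oct(words_left(fff1, fff)))    # 0o6
    print(patch_fits(fff1, fff, 5))      # True
    print(patch_fits(fff1, fff, 7))      # False
```

If the check fails, the patch would run past the end of FFF, which is
exactly the overrun that can crash the system.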

     Normally,  in  a  DDT  patch,  lines  which  follow   one   another
immediately in the published patch are the result of typing LINE-FEED at
the end of the line, and not RETURN and the next address  symbol.   When
the  $<  and  $$<  commands  are  used, all lines from that point to the
terminating $> command should have  been  ended  with  LINE-FEED,  using
successive locations in the patch space.  A break in this sequence is
shown by an extra blank line in the published patch, indicating a new
"sub-section" of the patch.

     The patching session is ended by the ^Z (Control-Z) command to exit
DDT properly.  The Control-Z command is the correct way to exit from DDT
when applying patches.  It allows DDT to do any  final  cleanup  it  may
need  to  do.   Exiting  via  Control-C  is NOT recommended when you are
installing patches, and is NOT guaranteed to work.

     Finally, the patched monitor is saved away on  a  disk  file.   The
published  typescript  shows  creating  a  new  generation of the system
MONITR.EXE file, but a more conservative approach is to save the patched
monitor  as  some  other  name, and try running it experimentally during
system time before installing it as the default monitor.

     And now for an annotated example:

@ENABLE (CAPABILITIES)          !Appropriate releases noted above.
$GET SYSTEM:MONITR              !Get the monitor
$START 140                      !Enter user mode EDDT

ENQ$:                           !Open the symbol table for the module

FFF/   0   XXX:   410300,,T2    !Store into the patch area and define
FFF2+1/   0   FFF:              ! label XXX: to point to it; redefine
                                ! FFF to be the new first unused word
STRCMP+5/   MOVE T3,T2   FFF$<  !Begin an $< patch at FFF
FFF/   0   LDB T3,XXX           !This line and the next are ended by
FFF+1/   0   CAIN T3,5          ! LINE-FEEDs
FFF+2/   0   RET$>              !Terminate the patch
FFF+3/   MOVE T3,T2             !These 4 lines are typed out by DDT on
FFF+4/   JUMPA T1,STRCMP+6      ! terminating the patch
STRCMP+5/   JUMPA FFF2+1        !And another blank line indicating end
                                ! of this sub-patch region
^Z                              !Control-Z to exit DDT properly
$SAVE SYSTEM:MONITR             !Save away the patched monitor

     Release  3  of  TOPS-20  can  take  advantage  of  the   extended
addressing  features  of  the model B processor.  Some of its data has
been reorganized and moved into non-zero sections  of  the  addressing
space.   One of the things moved was directories.  Directories are now
mapped into section 2, starting at the beginning of the section.  Thus
the  old  procedure of reading a user's directory in MDDT is no longer
valid.  This section describes how to map a directory correctly for
release 2 and for releases 3, 3A, and 4.

     The procedure for release 2 was the following.  You first have to
find  out  the structure number and directory number for the directory
to be mapped.  You can use the TRANSL program  to  get  the  directory
number,  or use the ^EPRINT command to list the directory information.
As an example, suppose you want to find the  directory  and  structure
information  for  the  directory  SNARK:<DBELL>.   You  run TRANSL and
obtain the results:  


The "programmer number" obtained is the directory  number,  in  octal.
In  this example, the directory number is 117.  If the directory is in
bad shape, and you can't run TRANSL or use ^EPRINT, you will  have  to
find  out  the directory number by looking at the output from a DLUSER
or ULIST run, or from BUGCHK output.

     To find the structure number, you have to work  harder.   If  the
structure  is  mounted  as PS:, its structure number is always 0.  For
structures mounted other than PS:, you do the following.  You get into
MDDT,  and  look  at the table STRTAB.  This table contains all of the
addresses of the structure data blocks in the system.  The first  word
of  each structure data block is the structure name in SIXBIT.  So you
search the table looking for the desired structure.  The offset into
the table STRTAB is then the structure number.  For our example:

JSYS 777$X
STRTAB/   ,8[   /   PS
STRTAB+1/      M^I   /   REL3
STRTAB+2/      M_%   /   SNARK

In  the  example  above,  you  see  that  PS:  is the first structure,
followed by the structures REL3:  and SNARK:.  Since the  offset  into
STRTAB was 2 for SNARK:, the structure number you want is 2.
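
     The STRTAB search above can be sketched in Python to show exactly
what is being compared.  It is a sketch under stated assumptions:
STRTAB is modeled as a list of SDB addresses, monitor memory as a
dictionary from address to 36-bit word, and an unused STRTAB slot is
assumed to hold 0.  SIXBIT packs six 6-bit characters per word, left to
right, each stored as its ASCII code minus 40 (octal).  The addresses
used are hypothetical.

```python
# Sketch: find a structure number by searching STRTAB for a SIXBIT name.
# Assumptions: memory is a dict {address: 36-bit word}; STRTAB is a list of
# SDB addresses; an unused slot holds 0.  All addresses are hypothetical.


def sixbit_to_str(word: int) -> str:
    """Decode six SIXBIT characters (left to right), dropping trailing blanks."""
    chars = [chr(((word >> shift) & 0o77) + 0o40) for shift in range(30, -1, -6)]
    return "".join(chars).rstrip()


def str_to_sixbit(name: str) -> int:
    """Encode up to six characters as a left-justified SIXBIT word."""
    word = 0
    for ch in name.upper()[:6].ljust(6):
        code = ord(ch) - 0o40
        if not 0 <= code <= 0o77:
            raise ValueError(f"{ch!r} has no SIXBIT code")
        word = (word << 6) | code
    return word


def structure_number(strtab: list[int], memory: dict[int, int], name: str) -> int:
    """Return the STRTAB offset whose SDB's first word is the SIXBIT name."""
    for offset, sdb_address in enumerate(strtab):
        if sdb_address == 0:              # assumed empty slot
            continue
        if sixbit_to_str(memory[sdb_address]) == name.upper():
            return offset
    raise LookupError(f"{name}: not in STRTAB")


if __name__ == "__main__":
    strtab = [0o54321, 0o56776, 0o12345]  # hypothetical SDB addresses
    memory = {
        0o54321: str_to_sixbit("PS"),
        0o56776: str_to_sixbit("REL3"),
        0o12345: str_to_sixbit("SNARK"),
    }
    print(structure_number(strtab, memory, "SNARK"))   # 2
```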

     Knowing the structure number and the directory  number,  you  can
now  map  the directory and look at it.  When the directory is mapped,
location DIRORA will point to the area of the monitor where it can be
found.  This is currently the address 740000.  To save typing, you can
use the symbol DA, which has the value 740000 (none  of  the  examples
here  uses  this  symbol however).  To map the directory, you call the
routine MAPDIR which is in the module DIRECT.  It takes two arguments.
The  directory  number  goes  in AC1, and the structure number goes in
AC2.  For our example, the output looks like:

DIRORA[   740000
740000/   ?

1!   117
2!   2
CALL MAPDIR$X

740000[   400300,,100

The  skip  return  from  MAPDIR means you have successfully mapped the
directory.  You can now look at the whole directory by  examining  the
proper  locations.   The  number of pages that are mapped by MAPDIR is
30, which is the  length  of  a  directory,  so  the  whole  thing  is
available  to  look at.  By examining or changing location 740000+N in
core, you are examining or changing location N of the directory.  When
you  are  finished,  you can just leave MDDT by jumping to MRETN or by
typing ^C.
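
     Since the directory is mapped as one contiguous block starting at
740000, translating between a directory offset N and the core address
to examine is a single addition.  The Python sketch below adds a bounds
check.  It assumes, as with the rest of the DDT typein in this section,
that 740000 and the page count 30 are octal, and that a page is 1000
(octal) words.

```python
# Sketch: translate between a directory offset and its MDDT address when
# MAPDIR has mapped the directory at 740000 (release 2 layout).
# Assumptions: all numbers are octal, as typed to DDT; 1 page = 1000 words.

MAP_BASE = 0o740000          # value found in DIRORA after MAPDIR
PAGE_WORDS = 0o1000          # words per page
DIR_PAGES = 0o30             # pages mapped by MAPDIR
DIR_WORDS = DIR_PAGES * PAGE_WORDS


def core_address(offset: int) -> int:
    """Address to examine in MDDT for location `offset` of the directory."""
    if not 0 <= offset < DIR_WORDS:
        raise ValueError(f"offset {offset:o} is outside the directory")
    return MAP_BASE + offset


def directory_offset(address: int) -> int:
    """Inverse of core_address: which directory word lives at `address`."""
    offset = address - MAP_BASE
    if not 0 <= offset < DIR_WORDS:
        raise ValueError(f"address {address:o} is outside the mapped directory")
    return offset


if __name__ == "__main__":
    print(oct(core_address(0o1003)))         # 0o741003
    print(oct(directory_offset(0o741003)))   # 0o1003
```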

     In release 3, however, when you  examine  location  DIRORA  after
calling  MAPDIR,  it doesn't have to contain 740000.  If it does, then
your machine cannot support extended addressing  and  the  monitor  is
running  the  same  as release 2 did.  In this case you can ignore the
rest of this section.  If your machine does have extended addressing,
when  you  examine location DIRORA you will see the number 2,,0.  This
address is now in section 2 of the monitor, and MDDT cannot  read  the
data there directly.  If you look at the location 740000 after calling
MAPDIR, it will still be unreadable, since the directory is no longer
mapped there.  Those pages are now unused.

     To be able to read the  directory  now,  you  have  to  tell  the
monitor  to  map  in  the pages where you can see them with MDDT.  The
first step is to examine the location DRMAP.   This  location  is  the
section pointer for section 2, where the directories are mapped.  This
is a share-type pointer,  which  contains  the  OFN  for  the  desired
directory  in the right half.  This number is one of the arguments for
the MSETMP  routine.   MSETMP  takes  the  following  arguments.   AC1
contains  the  OFN  in  the left half, and the first page number to be
mapped in the right half.  AC2 contains flag bits in  the  left  half,
and  the  address  where  you want to map the pages in the right half.
AC3  contains  the  number  of  pages  to  be  mapped.   For   mapping
directories, you can use 740000 as the address, and you want to map 30
pages.  You also want to set flag bits so that the  directory  can  be
changed.  For the example, you do the following:

DRMAP[   224000,,147

1!   147,,0
2!   140000,,740000
3!   30

CALL MSETMP$X

After  the  call to MSETMP, the directory is now mapped in 740000, and
you can proceed as you used to in release 2.  When  you  are  finished
with  the  directory,  you  should  call  MSETMP  again  to  unmap the
directory.  This is done by supplying the same  arguments  as  before,
except that AC1 contains zero.  As an example:

1!   0
2!   140000,,740000
3!   30

CALL MSETMP$X

Now you can simply ^C out of MDDT or jump to MRETN.
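
     The arguments above are ordinary halfword pairs:  DDT's left,,right
notation puts an 18-bit value in each half of a 36-bit word.  The Python
sketch below shows that packing for the MSETMP call, and how the OFN is
taken from the right half of the DRMAP pointer.  The helper names xwd,
left_half, right_half, and msetmp_args are illustrative only; they are
not monitor routines.

```python
# Sketch: build the MSETMP argument words used above from their halves.
# Helper names are illustrative; they are not monitor symbols.

HALF_MASK = 0o777777         # one 18-bit halfword


def xwd(left: int, right: int) -> int:
    """Pack two halfwords into one 36-bit word (DDT's left,,right)."""
    if not (0 <= left <= HALF_MASK and 0 <= right <= HALF_MASK):
        raise ValueError("halfword out of range")
    return (left << 18) | right


def left_half(word: int) -> int:
    return (word >> 18) & HALF_MASK


def right_half(word: int) -> int:
    return word & HALF_MASK


def msetmp_args(ofn: int, first_page: int, flags: int, address: int,
                npages: int) -> tuple[int, int, int]:
    """AC1 = OFN,,first page; AC2 = flags,,address; AC3 = page count."""
    return xwd(ofn, first_page), xwd(flags, address), npages


if __name__ == "__main__":
    drmap = xwd(0o224000, 0o147)             # value typed out for DRMAP
    ofn = right_half(drmap)                  # OFN is in the right half: 147
    ac1, ac2, ac3 = msetmp_args(ofn, 0, 0o140000, 0o740000, 0o30)
    print(f"{left_half(ac1):o},,{right_half(ac1):o}")   # 147,,0
    print(f"{left_half(ac2):o},,{right_half(ac2):o}")   # 140000,,740000
```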

     For Release 4 of TOPS-20,   the various flavors of  DDT  have been
trained to  understand extended addresses,   so the mapping contortions
used for 3 and 3A are once  again unnecessary.     On extended machines
one can reference section two directly as below:

DIRORA[   2,,0

2,,0[   400300,,100

When done, you can still just ^C out or jump to MRETN.

     Sometimes after a monitor crash due to disk problems, some of the
directories  on  the  system  will contain errors.  These errors cause
BUGCHKs such as DIRFDB, NAMBAD, DIRPG0, and DIRPG1.  It  is  sometimes
possible  to  find  the  error  in the directory by getting into MDDT,
mapping the directory, finding what is wrong,  and  fixing  it.   This
procedure  is  described  in  the SWSKIT.  However, this is not always
easy, and may take a lot of time.  It  is  therefore  better  in  many
cases  to  simply  delete  the bad directory and recreate it.  This is
easy to do for most directories.  But special procedures are necessary
for the directories <SYSTEM> and <SUBSYS>.  The rest of this memo will
describe the methods of recovering from bad directories,  handling  in
particular the difficult case of the <SYSTEM> directory.

     You can first try to give the EXPUNGE command  with  the  REBUILD
and  PURGE  subcommands.   If  the  problem with the directory is very
simple, it may fix your problem.  As an example, suppose the directory
PS:<SICK-DIRECTORY> is incorrect.  You would type:


     If this does not help the problem, you will have  to  delete  the
directory  and  then  recreate it.  Before proceeding, you should make
sure that any files you can reference are copied to another directory,
or  else  are  saved  on  tape.  Now first try to delete the directory
normally, as follows:

     If this is successful, then simply recreate the directory  again,
and  restore the user's files.  You should recreate the directory with
the same directory number as it had before, so that DLUSER's data will
still be correct.

     The procedure above will fail if either the directory  is  mapped
by  another  job,  or if it is totally unusable.  If it is mapped, and
the directory belongs to an ordinary user, you can wait until it is no
longer  in use, or you can take the system stand-alone so that no user
can reference it.

     If the directory is totally unusable, you will then have  to  try
to  delete  it  the  hard  way.   Before proceeding, you should try to
delete and expunge all files in the directory.  This will minimize the
number  of  lost  pages  that will result.  Now there are two cases to
consider.  If the directory is  not  a  sub-directory,  you  type  the


     If the directory is a subdirectory, you modify the above  command
by   replacing  "ROOT-DIRECTORY"  by  the  name  of  the  next  higher
directory.  Thus if the directory was PS:<ANOTHER.BAD-ONE>, you  type:


     The above procedure tells the monitor to treat the directory file
like  a  normal  file,  and to delete it as such.  This means that any
files in the directory will become "lost".   The  disk  pages  can  be
recovered  later  with  CHECKD.   If  the  above works, you simply can
recreate the directory and restore the files.

     The only reason the above command should fail is if the directory
is  still  mapped.   For  PS:<SUBSYS>,  you  can  bring  up the system
stand-alone so that no programs are run from it, and then  delete  it.
For PS:<SYSTEM>, even taking the system stand-alone will not help, for
it is always mapped by job 0.  But there are two  procedures  you  can
use which do work.

     The safest method can be used if the user's system has  mountable
structures.   If  you  have built another PS: structure, you can mount
the pack with the bad directory as an alias, and  then  the  directory
will not be mapped and can be deleted.  As an example:

        $$STRUCTURE-ID (IS) PS:

     Then you can build the new directory, restore the  files  to  it,
and  then use it again for your normal PS: pack.  Be sure to build the
new directory with the same number.  This is especially important  for
the special system directories.

     If you do not have another disk drive or another PS: disk, or  if
you  don't want to bother SMOUNTing the disk, you can fix the <SYSTEM>
area by using MDDT.  The basic idea is to patch the monitor so that it
no  longer  thinks  that  the  directory  is  in use.  This is done as


        INTERRUPT AT 17117


     Then  you  should  have  no  problems  deleting  the   directory.
Immediately  after  doing  the  delete,  you should reload the system.
When the system restarts, you can read the monitor and the EXEC either
from  the distribution magtape or from another directory where you had
kept copies.  Then recreate the <SYSTEM> area, making sure to give  it
the  same directory number as it had before.  Then you can restore the
files and let the users back  on.   Finally,  you  should  run  CHECKD
sometime to recover the lost pages.

NOTE -- Use the methods documented in the Operator's
        Guide before resorting to the methods below.

     1.  There is a file on the SWSKIT called  DIRTST.EXE  which  will
         test for inconsistencies in the directory pointers.
                $RU DIRTST
         This will tell you just about everything.

     2.  Another program on SWSKIT is  DIRPNT  which  prints  out  the
         contents of the chained FDBs, the entire directory, a single
         FDB, or the symbol table.
                To run it:
                $RU DIRPNT
         And answer the questions.  This also  may  not  work  if  the
         headers are bad.

     3.  If you get a BUGCHK:

         Go into the monitor with MDDT and set  a  breakpoint  at  the
         BUGCHK address, say, FDBBAD.  Do the functions that cause the
         BUGCHK;   DIR,  say.   Trace  down  the  bug.   The  relevant
         listings  are  PROLOG  and  DIRECT.  These give the directory
         format and useful symbols.

     4.  If the pointers are destroyed or confused you can map in  the
         directory as follows:
                $^EQUIT                 ; get into MINI-EXEC
                MX>/                    ; get into MDDT
                ; Map in  directory,  put dir  number  in 1.  Get  dir
                ; number   from   DLUSER    or   TRANSL.  Format    is
                ; [4,directory#].  Put the structure number in AC2.
                ; To find  the  structure  number look  at  the  table
                ; STRTAB.  STRTAB contains a  list of pointers to  the
                ; SDBs of structures that are mounted.  The  structure
                ; numbers are equal to the offset into the STRTAB.  To
                ; find out which structure has structure number
                ; 3, look at STRTAB+3.  Its contents are the address
                ; of the SDB, whose first word holds the SIXBIT
                ; structure name.
                STRTAB/  54321          ; str number 0
                STRTAB+1/  56776        ; str no 1
                STRTAB+2/  12345        ; str no 2
                12345$6T/       FOO     ; str no 2 is FOO:
                1/ DIRECTORY NUMBER
                2/ STR NUMBER
                CALL MAPDIR$X
                ; Now you can  look at the  header pointers etc.,  and
                ; fix things  up  if  you're lucky.  Go  back  to  the
                ; MINI-EXEC.

     5.  If you can't (or don't want to) recover  the  existing  files
         you  can  delete  the directory and restore the files using a
         DUMPER tape.  This works for <SYSTEM> and all other directories.

         In order to delete  a  directory  you  must  remove  it  from
         <ROOT-DIRECTORY> (or next higher-level directory).
                You can do  this with  the
                following set of commands:
                (first  be  sure   nothing  is   mapped  from   this
         Create new directory with the same directory number.  The same number
         is important for the special system directories.
                $^ECREATE <DIRECTORYNAME>
                $$NUMBER nn
         Now DUMPER the files back.

        An Easy Way to Examine the PSB and JSB of Another Job

     There is an occasional need to look in detail at the state of
another  job on the system.  A common reason for doing this is to find
the cause and cure of a "hung job" which cannot  be  logged  out.   To
find  out  what  the  job is doing you usually start by looking at the
JSYS stack in the PSB.   But  you  cannot  examine  such  data  easily
because  the  fork data in the PSB and the job data in the JSB are not
in the monitor's address space until the fork is run.  If you  try  to
look  at  the PSB or JSB using MDDT you will see the data for your own
fork.  To look at the data for another  fork  you  must  do  what  the
monitor does, and that is to map it.

     A procedure for doing the mapping of a PSB or JSB  was  given  in
the release 3 and 3A SWSKITs.  You first find the SPT index of the PSB
or JSB you want to map, then you call  SETMPG  or  MSETMP  to  set  up
pointers  to  the  data,  and  then you can examine it.  But there are
several problems in using that method, which are:

     1.  You have to find an empty  set  of  pages  in  the  monitor's
         address space which can be used for mapping.

     2.  There is not enough room to map all of the PSB and  JSB.   So
         if  you  want to examine many different things you have to do
         the mapping many times.

     3.  The routines SETMPG and MSETMP do  no  validity  checking  of
         their  arguments.   Thus if you feed them bad data the system
         will probably crash.  So if you need to map things many times
         chances are you will make a mistake once too often.

     4.  The addresses of the data are not correct.  To  look  at  PPC
         for example, you can't just examine location PPC (which would
         be for your own fork).  You have to look in the page you  are
         using  for  mapping.   So every reference has to be offset by
         some constant.

     5.  When you are done looking at the fork, you can't simply leave
         MDDT.  You have to call SETMPG or MSETMP again to unmap the
         pages.

     Since that documentation was written I  have  found  a  procedure
which is much easier.  It eliminates almost all of the above problems.
The procedure is this:

     1.  Do a "GET" of the file the monitor was loaded  from,  usually

     2.  Enter user mode DDT in the file you got, and then do  a  JSYS
         777 to get into MDDT.

     3.  Find out the SPT indexes as before, and call  MSETMP  to  map
         the  PSB  or  JSB  to  the USER address space, in the correct

     4.  Return from MDDT, and examine PSB and JSB locations directly,
         and see the correct data in the right place.

     5.  When you are done, just ^C and do a RESET.

     The rest of this section shows step by step how the procedure
above is done, using an example.  Assume that we wish to examine the
state of fork 105, which belongs to job 21.  We then proceed as follows:

@ENABLE                                 !Get a copy of the monitor
$START 140                              !Get into user DDT

JSYS 777$X                              !Enter MDDT

!Following is an example of the procedure to map the JSB of a job:

FKJOB+105[   25,,2035                   !Get the SPT index of the JSB
                                        !of fork 105

T1!      2035,,0                        !Put SPT index in left half
T2!      540000,,JSBPGA                 !* Flags and where to map to
T3!      JSLSTA'1000-JSBPGA'1000        !Number of pages to map

CALL MSETMP$X                           !Do the mapping

!Following is an example of the procedure to map the PSB of a fork:

FKPGS+105[   2657,,2332                 !Get the SPT index of the PSB
                                        !of fork 105

T1!      2332,,PSBMAP-PSBPGA            !Put SPT index in left half,
                                        !and offset in right half
T2!      540000,,PSSPSA                 !* Flags and where to map to
T3!      PSBMSZ                         !Number of pages to map

CALL MSETMP$X                           !Do the mapping

!Example of returning to user mode and looking at data from both
!the PSB and the JSB of the fork:

MRETN$G                                 !Return to user mode

USRNAM[   3                             !Examine job's user name
USRNAM+1[   422050,,546230   $T;DBELL   

CTRLTT[   777777,,777777                !Controlling terminal

FILBYT+MLJFN[   4400,,334010            !Start of data block for JFN 1

PPC/   T1,,DISXE#+2                     !Current PC of the fork

PAC+17/   -215,,UPDL+62                 !Current stack pointer

UPDL/   CHKHO5#                         !First few stack locations
UPDL+1/   CAM CHKAE0#+12   
UPDL+2/   CHKHO5#   
UPDL+3/   CAM CHKAE0#+12   
UPDL+4/   T1,,.COMND+1   
UPDL+5/   -273,,UPDL+4   

!Example of terminating the mapping we have done:

$RESET                                  !To finish, just quit and reset
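
     The $T typeout of USRNAM+1 above shows DDT reading the word
422050,,546230 as text.  Assuming the usual PDP-10 packing of five 7-bit
ASCII characters per word, left-justified with the low-order bit unused,
the decoding can be sketched in Python as below; treating a zero byte as
the end of the string is an assumption of this sketch.

```python
# Sketch: decode a 36-bit word holding five 7-bit ASCII characters,
# left-justified (bit 35 unused), as DDT's $T typeout does.
# Assumption: a NUL character ends the string.


def xwd(left: int, right: int) -> int:
    """Pack two 18-bit halves (masked to 18 bits) into one 36-bit word."""
    return ((left & 0o777777) << 18) | (right & 0o777777)


def ascii7_to_str(word: int) -> str:
    chars = []
    for shift in (29, 22, 15, 8, 1):      # character 1 is in the top 7 bits
        code = (word >> shift) & 0o177
        if code == 0:
            break
        chars.append(chr(code))
    return "".join(chars)


if __name__ == "__main__":
    print(ascii7_to_str(xwd(0o422050, 0o546230)))   # DBELL
```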

     The procedure as given above maps the JSB and PSB  write-enabled.
So  if  you  find something you want to change, you can simply deposit
the new value  into  the  location.   If  you  want  the  data  to  be
write-protected,  then  change  the  540000 to 500000 in the two steps
marked with an asterisk.
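
     The three argument words for the JSB mapping can be worked out the
same way by hand.  The Python sketch below assumes that the apostrophe
in JSLSTA'1000-JSBPGA'1000 is DDT's integer-division operator, so that
the page count is the difference of the two page numbers, and it encodes
the 540000/500000 flag choice just described.  The JSBPGA and JSLSTA
values are placeholders; take the real ones from the monitor's symbols.

```python
# Sketch: compute T1-T3 for mapping another job's JSB with MSETMP.
# Assumptions: the apostrophe in the DDT expression is integer division;
# the JSBPGA/JSLSTA values used below are placeholders, not real values.

PAGE_WORDS = 0o1000
HALF_MASK = 0o777777


def xwd(left: int, right: int) -> int:
    """Pack two 18-bit halves (masked to 18 bits) into one 36-bit word."""
    return ((left & HALF_MASK) << 18) | (right & HALF_MASK)


def jsb_map_args(jsb_spt_index: int, jsbpga: int, jslsta: int,
                 writable: bool = True) -> tuple[int, int, int]:
    flags = 0o540000 if writable else 0o500000         # per the text above
    t1 = xwd(jsb_spt_index, 0)                          # SPT index,,0
    t2 = xwd(flags, jsbpga)                             # flags,,where to map
    t3 = jslsta // PAGE_WORDS - jsbpga // PAGE_WORDS    # pages to map
    return t1, t2, t3


if __name__ == "__main__":
    fkjob_entry = xwd(0o25, 0o2035)         # FKJOB+105 as typed out above
    spt_index = fkjob_entry & HALF_MASK     # right half: 2035
    t1, t2, t3 = jsb_map_args(spt_index, jsbpga=0o640000, jslsta=0o652000)
    print(oct(t1), oct(t2), oct(t3))
```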

     WARNING:  The procedure of mapping things into your user  address
space  has its limitations.  Mapping the JSB and PSB works because the
user core used for mapping was previously empty.  In general, you  can
only  map  things  into  your  user core if your core pages are either
nonexistent or are private.  If you call  MSETMP  or  SETMPG  and  map
something  over  a  shared page, the old file page is unmapped without
the share counts being updated, which prevents your job  from  logging
out  later.  To get around this problem you can BLT your core image to
force all of the pages to be private.

     When inserting a breakpoint into the running monitor, you have to
be  careful  that  no other users will execute the code containing the
breakpoint.  If some other user hits the breakpoint, they will blow up
with an illegal instruction since MDDT will not be there to handle the
breakpoint.  This normally limits the places you can set  breakpoints,
since  most  of the monitor can be gotten to by any user.  Even if you
run the system stand-alone, it is possible that the  routine  you  are
debugging  will  be called by job 0.  However, it is still possible to
do such debugging, even on a system which is not stand-alone, and this
document will describe how this is done.

     The essential element of this technique is to put in the patch in
such  a  way  that  only  your own fork can ever reach the breakpoint.
First you write a simple routine which will skip if it  is  not  being
run  by your particular fork.  This can be done easily if you remember
that the location FORKX contains the currently  running  fork  number.
An example of such a routine is the following:  

JSYS 777$X

FORKX[   23                     ; check our fork number

FFF/   0   NOTME:   PUSH P,T1   ; save an AC
NOTME+1/   0   MOVE T1,FORKX    ; get currently running fork number
NOTME+2/   0   CAIE T1,23       ; is it us=23?
NOTME+3/   0   AOS -1(P)        ; no, setup skip return
NOTME+4/   0   POP P,T1         ; restore the saved AC
NOTME+5/   0   POPJ P,          ; and return to caller
NOTME+6/   0   FFF:             ; reset the position of FFF

The  routine above simply saves AC T1, gets the currently running fork
number, compares it with your own fork number which  you  obtained  by
looking at location FORKX, and skips if they differ.
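
     Why AOS -1(P) produces a skip return may not be obvious.  After
PUSHJ P,NOTME and PUSH P,T1, the word at -1(P) is the caller's return
address, so incrementing it makes POPJ come back one instruction further
on.  The Python model below traces the routine step by step.  It is a
simplified sketch:  the stack is a list and the saved PC is a bare
address, so any PC flags that PUSHJ may also save are ignored.

```python
# Simplified model of the NOTME routine shown above.
# The stack is a Python list (top = last element); saved PCs are plain
# integers.  Returns the PC at which the caller resumes and the final T1.


def call_notme(caller_pc: int, forkx: int, my_fork: int,
               t1: int) -> tuple[int, int]:
    stack = [caller_pc + 1]      # PUSHJ P,NOTME   push return address
    stack.append(t1)             # PUSH P,T1       save the AC
    t1 = forkx                   # MOVE T1,FORKX   running fork number
    if t1 != my_fork:            # CAIE T1,23      skips the AOS if equal
        stack[-2] += 1           # AOS -1(P)       bump return address
    t1 = stack.pop()             # POP P,T1        restore the AC
    return stack.pop(), t1       # POPJ P,         return to caller


if __name__ == "__main__":
    # Our fork (23) returns normally and reaches the JFCL with the breakpoint.
    pc, t1 = call_notme(0o100, forkx=0o23, my_fork=0o23, t1=7)
    print(oct(pc), t1)           # 0o101 7
    # Any other fork takes the skip return past the breakpoint.
    pc, t1 = call_notme(0o100, forkx=0o45, my_fork=0o23, t1=7)
    print(oct(pc), t1)           # 0o102 7
```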

     Now assume that you want to set a breakpoint into  the  following
code, which is in the routine BLKSCN in the module DIRECT.  


Assume you want the breakpoint at location BLKSC2+3.  You do the
following:

BLKSC2+3/   JUMPGE B,BLKSCE   FFF$<     ; patch this location
FFF/   0   PUSHJ P,NOTME                ; call the NOTME routine
FFF+1/   0   .$B   JFCL$>               ; me if it gets here, set breakpoint

Notice  that  the  breakpoint  has  been  set  in the JFCL instruction
following the call to NOTME.  Only your fork will execute it,  so  you
can  now  debug the section of code while other users are executing it
at the same time.  Remember to remove the breakpoint when you are
done.

     To run a particular program while  having  breakpoints  set,  you
must  remember  that  the breakpoint has to be set by the same process
which you expect to hit it.  So for example, typing ^EQUIT, setting  a
breakpoint,  returning  to  the EXEC and running your program will not
work.  You must enter MDDT and set the breakpoints from the program
you want to debug.  As an example:

$GET PROGRAM    ; get the program to be used
$DDT            ; enter DDT
JSYS 777$X      ; and enter MDDT from there


MRETN$G         ; return to the context of the test program
$G              ; start the test program

                Using Address Break to Debug the Monitor

Sometimes when examining a set of dumps, you will notice  the  crashes
are  caused  by  some  location  being destroyed.  If you have no idea
where the destruction is done from, finding the problem could be  very
difficult.   One  useful procedure in such cases is to use the address
break feature of the hardware to track down the  problem  (except  for
2020's!).   The  only  problem is that the use of address break is not
obvious.  This is a manual describing how to use address break in  the
TOPS-20 monitor.
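
     Step 3 below builds the DATAO APR argument word by hand, for
example 100000000+CURDS for a break on writes.  The bit arithmetic
behind that constant is sketched in Python below.  It assumes only the
field layout given in step 3 and the PDP-10 convention that bit 0 is
the most significant bit of the 36-bit word, so bit n has the value
2**(35-n).  Check the Hardware Reference Manual before relying on it.

```python
# Sketch: compose the ADDR word for DATAO APR from the fields in step 3.
# PDP-10 bit numbering: bit 0 is the high-order bit, bit 35 the low-order.


def bit(n: int) -> int:
    """Value of PDP-10 bit n in a 36-bit word."""
    return 1 << (35 - n)


BREAK_ON_FETCH = bit(9)
BREAK_ON_READ = bit(10)
BREAK_ON_WRITE = bit(11)
USER_SPACE = bit(12)               # 0 = exec space, 1 = user space
ADDRESS_MASK = (1 << 23) - 1       # bits 13-35


def break_word(address: int, *, fetch: bool = False, read: bool = False,
               write: bool = False, user: bool = False) -> int:
    if not 0 <= address <= ADDRESS_MASK:
        raise ValueError("address does not fit in bits 13-35")
    word = address
    if fetch:
        word |= BREAK_ON_FETCH
    if read:
        word |= BREAK_ON_READ
    if write:
        word |= BREAK_ON_WRITE
    if user:
        word |= USER_SPACE
    return word


if __name__ == "__main__":
    print(oct(BREAK_ON_WRITE))                   # 0o100000000
    curds = 0o12345                              # hypothetical address of CURDS
    print(oct(break_word(curds, write=True)))    # 0o100012345
```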

     In order to use address break, four things must be done.   First,
the  current routines the monitor uses to set address breaks for users
must be disabled.  Secondly, your own address break must be  set  from
MDDT  or  EDDT.   Thirdly,  instructions  which  you  want  to execute
properly have to be modified so that they will not cause  an  unwanted
address  break.  Finally, breakpoints must be placed in the monitor so
that the state of the monitor can be examined when the  address  break
occurs.  The following is a step by step example of doing this.

1.      Load the monitor for debugging, and enter EDDT.  The procedure
        starting from BOOT is the following:

        BOOT>/L                         ;Load monitor but don't start it
        BOOT>/G140                      ;Start EDDT
        DBUGSW/   0   2                 ;Set debugging mode
        EDDTF/   0   1                  ;Keep EDDT once system starts
        GOTSWM$B                        ;Install useful breakpoint
        SYSGO1$G                        ;Start the monitor

        [PS MOUNTED]
        $1B>>GOTSWM   0$1B              ;Remove breakpoint now

2.      Disable the monitor's normal changing of  the  address  break.
        This is currently done at two places:
        KISSAV+4/   DATAO UNPFG1+26   JFCL      ;Disable instruction
        SETBRK+12/   DATAO A   JFCL             ;Here too

3.      Set your own address break at the desired location.  Refer  to
        the Hardware Reference Manual for details.  The instruction to
        set an address break is:
        DATAO APR,ADDR          ;Note:  APR = 0
        where ADDR contains the following fields:
        Bits            Description
        ----            -----------
          9             Break at given address on instruction fetches
         10             Break at given address on reads
         11             Break at given address on writes
         12             0=exec address space, 1=user address space
        13-35           Address to break on.
        So now assume you want  to  catch  a  bug  which  is  blasting
        location  CURDS.   You want to break only for writes, and want
        to use exec virtual space.  Therefore you type the following:
        FFF/   0   100000000+CURDS      ;Put data in convenient place
        DATAO APR,FFF$X                 ;Set the address break
4.      Now you want to disable address  break  for  all  instructions
        which you expect to change the given location.  Assume in this
        example that  only  location  DIDDLE  should  change  location
        CURDS.  Then you do the following for a model B CPU:
        FFF!   IT:                      ;Define location to get old flags
        IT+1!                           ;Old PC
        IT+2!                           ;New flags
        IT+3!   IT+4                    ;New PC
        IT+4!   EXCH IT                 ;Save AC and get old flags
        IT+5!   TLO 1000                ;Set address break inhibit bit
        IT+6!   EXCH IT                 ;Restore flags and AC
        IT+7!   JRST 5,IT               ;Return to caller
        IT+10!   FFF:                   ;Redefine FFF
        DIDDLE/   MOVEM A,CURDS   FFF$< ;Insert patch
        FFF/   0   JRST 7,IT$>          ;Call above routine
        FFF+1/   0   MOVEM A,CURDS      ;Typed by DDT when finishing patch
        FFF+2/   0   JUMPA A,DIDDLE+1
        FFF+3/   0   JUMPA B,DIDDLE+2
        The JRST 7,IT instruction is used to save the old PC at IT and
        IT+1,  and take a new PC from IT+2 and IT+3.  There the old PC
        is changed to include the address break inhibit bit.   Then  a
        JRST 5,IT  is  done  which  returns  to  the caller.  The next
        instruction then executes without causing  an  address  break.
        You   have  to  insert  the  JRST 7,IT  instruction  at  every
        instruction you want to succeed.
        For model A CPUs the procedure is similar, but a little easier:
        FFF!   IT:                      ;Define location to hold PC
        IT+1!   EXCH IT                 ;Get old PC and save AC
        IT+2!   TLO 1000                ;Set address break inhibit flag
        IT+3!   EXCH IT                 ;Restore PC and AC
        IT+4!   JRSTF @IT               ;Return to caller
        IT+5!   FFF:                    ;Redefine FFF
        DIDDLE/   MOVEM A,CURDS   FFF$< ;Insert patch
        FFF/   0   JSR IT$>             ;Call above routine
        FFF+1/   0   MOVEM A,CURDS      ;Typed by DDT when finishing patch
        FFF+2/   0   JUMPA A,DIDDLE+1
        FFF+3/   0   JUMPA B,DIDDLE+2
5.      Now put the breakpoints into  the  monitor  so  that  when  an
        address  break  occurs, you will get into EDDT.  There are two
        locations to patch, one for PI level and one for non-PI level.
        You  also  have  to patch a monitor bug in releases 3 and 3A so
        that the page fail dispatch code works properly.

        ADRCMP$B                        ;Set breakpoint at non-PI routine
        PFCD23$B                        ;Set breakpoint at PI routine
        PIPTRP+1/   MOVE A,TRAPSW   MOVE A,TRAPS0       ;And fix a bug
        $P                              ;Now let the monitor proceed

6.      When either of the above breakpoints is hit, the flags and  PC
        of  the  instruction which caused the address break will be in
        locations TRAPFL and TRAPPC.    If the address break was  from
        JSYS  level  (breakpoint  was to ADRCMP and location INSKED is
        zero) then an $P will proceed properly.  If the address  break
        was  from  the  scheduler  or  from PI level, doing $P will be
        useless since the monitor will then BUGHLT because it  doesn't
        want to see an address break under these conditions.  However,
        this is ok if all you want  to  do  is  find  the  instruction
        causing the trashing.

      If the location still gets trashed after trying to catch it this
 way,  either  your  procedure is wrong; you are trying this on a 2020
 (which has no address break feature); the location is  being  changed
 by  some  IO  being  done  (RH20s, DTEs, etc); or else the machine is
 having some hardware problems.
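     The address-break word built by hand in step 3 can be checked
off-line.  The sketch below is a Python illustration, not part of any
DEC tool.  It assumes only the bit layout given in step 3 and the usual
PDP-10 numbering in which bit 0 is the leftmost bit of the 36-bit word,
so bit n has the value 1 shifted left by (35 - n).  The function and
constant names are hypothetical.

```python
# Sketch: assemble the word given to DATAO APR for an address break,
# using the bit layout from step 3.  Names here are illustrative only.

def bit(n: int) -> int:
    """Value of PDP-10 bit n (0 = leftmost, 35 = rightmost) in a 36-bit word."""
    if not 0 <= n <= 35:
        raise ValueError("PDP-10 bit numbers run from 0 to 35")
    return 1 << (35 - n)


BRK_FETCH = bit(9)          # break on instruction fetches
BRK_READ = bit(10)          # break on reads
BRK_WRITE = bit(11)         # break on writes
BRK_USER = bit(12)          # 0 = exec address space, 1 = user address space
ADDR_MASK = (1 << 23) - 1   # bits 13-35 hold the address to break on


def address_break_word(addr: int, fetch: bool = False, read: bool = False,
                       write: bool = False, user: bool = False) -> int:
    """Return the 36-bit address-break word for the given options."""
    if addr < 0 or addr & ~ADDR_MASK:
        raise ValueError("address does not fit in bits 13-35")
    word = addr
    if fetch:
        word |= BRK_FETCH
    if read:
        word |= BRK_READ
    if write:
        word |= BRK_WRITE
    if user:
        word |= BRK_USER
    return word


# Exec-mode write break on a made-up CURDS address of 1234 (octal).
# This reproduces the "100000000+CURDS" value typed in step 3.
print(oct(address_break_word(0o1234, write=True)))   # 0o100001234

# "TLO 1000" in the patch routines sets octal 1000 in the left half
# of the flags word, which is bit 8 in this numbering.
print((0o1000 << 18) == bit(8))                      # True
```

Only the write bit is set in this example, matching the CURDS case;
other combinations follow the same pattern.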


There are some common system disasters which  in  many  cases  can  be
recovered from quickly and with a minimum of effort.  The four we will
discuss in this article are:

     1.  Hung Terminals

     2.  Hung SETSPD

     3.  Trashed Disks

     4.  Hung Jobs


     1.0  HUNG TERMINALS

     Hung terminals usually result from one of two problems.  Either
the speed has been set incorrectly for that terminal type, or a problem
exists between the KL and the front end.  If the problem is the result
of an improper speed setting, simply resetting the speed will be
sufficient.  If, on the other hand, the problem is due to some sync
problem between the KL and the 11, the easiest way to recover is to
reload the front end.  This can be done by depressing the halt switch
on the operator's console of the 11 and then placing it back in the
enable state.  After about fifteen seconds, the message

                        [DECsystem-20 continued]

should be printed on the CTY.  If this fails to free the terminal,
perhaps the problem is a hung job.  See the discussion under that
heading.

     2.0  HUNG SETSPD

     This is a fairly common problem, brought on by a hardware
problem.  It is possible to bring the system up without running SETSPD
under job 0, log in, and then try to run SETSPD under some other
operator job.  If SETSPD then hangs, you can CONTROL/C out of the
program, edit 4-CONFIG.CMD to remove the commands suspected of hanging
SETSPD, and retry.  In this way it is possible to continue timesharing
while waiting for the problem to be resolved.

     To bring the system up without running SETSPD automatically,  one
need  only  install  the  following patch to the MONITOR using EDDT on
system start up.

          EDDTF[   0   -1
          DBUGSW[   0   2
          [PS MOUNTED]
          RUNDD3+7/   PUSHJ P,RUNDII   JFCL
          %%No SETSPD

     The system will then come up as usual except that SYSJOB will not
run.  After successfully resolving the problem with SETSPD, SYSJOB can
be run by typing


     This will cause all the commands in the  SYSJOB.RUN  file  to  be
executed by SYSJOB.

     There is a project under way to allow SETSPD to time itself out
and continue with the next command in 4-CONFIG.CMD.  Look for it in the
Large Buffer or the 20 Dispatch.


     3.0  TRASHED DISKS

     This is surely one of the biggest headaches facing the specialist.
Trashed disks come in many forms, and recovering from them requires a
good knowledge of the structure of the TOPS-20 file system.

     If the structure cannot be mounted, it is because of one  of  the
following reasons:

     1.  Inconsistency in either of the HOM blocks

         1.  Word HOMNAM (0) of either HOM block not SIXBIT/HOM/

         2.  Word HOMCOD (176) of either HOM block not 707070

         3.  Word HOMHOM (5) of first HOM block not 1,,12

         4.  Word HOMHOM (5) of second HOM block not 12,,1

         5.  Word HOMFSN (173) of either HOM block not 20040,,47524

         6.  Word HOMFSN+1 (174) of either HOM block not 51520,,31055

         7.  Word HOMFSN+2 (175) of either HOM block not 20060,,20040

         8.  Right half of word HOMLUN (4) of either HOM block either
             refers to a unit greater than the left half of word
             HOMLUN or refers to a unit already verified

         9.  Word HOMSNM (3) of either home block does not agree  with

        10.  No disk address for index block in word HOMRXB (10) of
             either HOM block

     2.  Inconsistencies in Root-Directory page 0

         1.  Directory number in Directory page  0  of  Root-Directory
             not 1

         2.  Directory block type (DRTYP) of Root-Directory page 0 not

         3.  Relative Page number (DRRPN) of Root-Directory page 0 not

         4.  Top of symbol table (DRSTP) of Root-Directory page 0  out
             of Directory bounds

         5.  Pointer to first free  block  (DRFFB)  of  Root-Directory
             page 0 not in page 0 of the directory

         6.  Pointer to Directory Name String (DRNAM) not under  start
             of symbol table

         7.  Directory name pointer (DRNAM)  not  0  and  Name  string
             block length (NMLEN) not at least 2 words long

         8.  Directory name pointer (DRNAM) not 0 and  directory  name
             block header (NMTYP) not 400001

         9.  Password block pointer not 0 and  password  string  block
             length (NMLEN) not at least 2 words long

        10.  Password block pointer not 0 and  password  string  block
             header (NMTYP) not 400001

        11.  Account string block pointer not  0  and  Account  string
             block length (NMLEN) not at least 2 words long

        12.  Account string block pointer not  0  and  Account  string
             block header (NMTYP) not 400001

     3.  Inconsistencies in Block types or free  space  in  subsequent
         pages of the directory.

              All blocks in the directory (including free space) begin
         with   a  block  header  which  specifies  type  and  length.
         Immediately following one block should be a header for a  new
         block.  If this scheme is corrupted, the mount will fail.

         1.  Header of a block not one of the following:

             1.  (NAMTYP)  400001

             2.  (EXTTYP)  400002

             3.  (ACCTYP)  400003

             4.  (USRTYP)  400004

             5.  (FDBTYP)  400100

             6.  (DIRTYP)  400300

             7.  (FRETYP)  400500

             8.  (FBTTYP)  400600

             9.  (GRPTYP)  400700

         2.  Header of a block is NAMTYP and Block length not at least
             2 words

         3.  Header of a block is EXTTYP and block length not at least
             2 words

         4.  Header of a block is ACCTYP and block length not at least
             3 words

         5.  Header of a block is USRTYP and block length not at least
             3 words

         6.  Header of a block is FDBTYP and

             1.  Block length not at least 30 (.FBLN0) words long

             2.  Pointer to Author String (.FBAUT) not 0 and points to
                 a block outside of the directory or points to a block
                 that does not meet the tests for a user  name  string
                 as described above.

             3.  Pointer to Last Writer  String  (.FBLWR)  not  0  and
                 points  to a block outside of the directory or points
                 to a block that does not meet the tests  for  a  user
                 name string block as described above.

             4.  Pointer to Account String (.FBACT) is greater than
                 zero and it points to a block outside of the directory
                 or it points to a block that does not meet the tests
                 for an account string block as described above.

             5.  Pointer to Name String  (.FBNAM)  is  not  0  and  it
                 points  to  a  block  outside  of the directory or it
                 points to a block that does not meet the tests for  a
                 Name String Block as described above.

             6.  Pointer to Extension String (.FBEXT) is not 0 and  it
                 points  to  a  block  outside  of the directory or it
                 points to a block that does not meet the tests for an
                 Extension String Block as described above.

         7.  Header of a block is DIRTYP and

             1.  Header is not on a page boundary

             2.  Relative page number (DRRPN) not the calculated  page

             3.  Pointer to first free block (DRFFB) does not point to
                 a location within the current directory page

             4.  Directory number (DRNUM) not 1.

         8.  Header of a block is FRETYP and block is not at least two
             words  or  Pointer to next free block (FRNFB) is not zero
             and points to a location not on the same page as current

         9.  Last block did not end at  DRFTP  (address  specified  on
             first page of directory)

     4.  BAT blocks inconsistent.

         1.  Either block  does  not  contain  SIXBIT/BAT/  in  BATNAM
             (offset 0 in block)

         2.  Either block does not contain 606060  in  BATCOD  (offset
             176 in block)

         3.  Sector number of the BAT  block  (BATBLK)  not  the  true
             sector of block

         4.  The BAT blocks do not compare exactly with each other
             through word 176 of the blocks

     5.  Checksum of the Root-directory Index  Block  does  not  agree
         with the checksum calculated.

              Checksums are calculated as follows:

         CHKSUM = 0 ;
         For I = 0 to 777
             If XB(I) = 0 then
                 CHKSUM = CHKSUM + I
             else
                 CHKSUM = CHKSUM + XB(I) ;

         where XB(I) is word I of the index block, so that XB(0) is its
         first word.
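     Several of the checks above are simple enough to try on words
copied out of FILDDT.  The following Python sketch is illustrative
only: it covers the SIXBIT/HOM/, HOMCOD and HOMHOM tests from item 1
and the index-block checksum from item 5, reading the two assignments
in item 5 as alternatives (add I for a zero word, otherwise add the
word).  It reads HOMNAM from word 0 of the HOM block, which is where
the FILDDT example later in this handbook finds SIXBIT/HOM/; check that
offset against DSKALC for your monitor.  It also assumes the checksum
simply wraps at 36 bits, which the handbook does not state, so treat a
mismatch as a hint rather than proof.  All names are hypothetical.

```python
# Sketch: a few HOM block checks and the index-block checksum described
# above.  Word lists hold 36-bit values as Python ints.

MASK36 = (1 << 36) - 1
HOMNAM = 0       # assumed offset of SIXBIT/HOM/ (see lead-in)
HOMHOM = 0o5     # offsets below are taken from the list above
HOMCOD = 0o176


def xwd(left: int, right: int) -> int:
    """Build a word from two 18-bit halves, like DDT's left,,right."""
    return ((left & 0o777777) << 18) | (right & 0o777777)


def sixbit(text: str) -> int:
    """Left-justified SIXBIT: up to six characters, each ASCII minus 040."""
    if len(text) > 6:
        raise ValueError("SIXBIT holds at most six characters")
    word = 0
    for ch in text.upper().ljust(6):      # blanks pad to SIXBIT 0
        code = ord(ch) - 0o40
        if not 0 <= code <= 0o77:
            raise ValueError(f"{ch!r} has no SIXBIT code")
        word = (word << 6) | code
    return word


def hom_block_problems(block: list[int], first: bool) -> list[str]:
    """Return the names of the failed checks for one HOM block."""
    problems = []
    if block[HOMNAM] != sixbit("HOM"):
        problems.append("HOMNAM")
    if block[HOMCOD] != 0o707070:
        problems.append("HOMCOD")
    expected = xwd(1, 0o12) if first else xwd(0o12, 1)
    if block[HOMHOM] != expected:
        problems.append("HOMHOM")
    return problems


def index_block_checksum(xb: list[int]) -> int:
    """Checksum per item 5: add I for a zero word, else add XB(I)."""
    if len(xb) != 0o1000:
        raise ValueError("an index block is 1000 (octal) words")
    total = 0
    for i, word in enumerate(xb):
        total = (total + (i if word == 0 else word)) & MASK36  # assumed wrap
    return total


good = [0] * 0o200
good[HOMNAM] = sixbit("HOM")
good[HOMCOD] = 0o707070
good[HOMHOM] = xwd(1, 0o12)
print(hom_block_problems(good, first=True))    # []
print(hom_block_problems(good, first=False))   # ['HOMHOM']
```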

     As you can see, there are many things that could be wrong with a
structure that prevent it from being mounted.  The consistency of the
structure can be checked quite easily using the new FILDDT commands of

For structures which are badly trashed, the only sane way of
recovering is to rebuild the structure using a catastrophe tape.  For
simple inconsistencies such as a bad BAT block, CHECKD does the job
well.  For more involved trashes which cannot be recovered from a
backup tape (because of a forgetful system manager), the above
information can be of great help.

     4.0  HUNG JOBS

     A number of circumstances can cause a job to become hung, usually
waiting for some resource to free up, some share count to become zero,
etc.  Sometimes these tests will never be satisfied; if the job also
has its PSI system turned off, the job becomes hung.  Freeing it up can
be very tricky.  The first thing to try is to log the job out from some
other terminal.  If this doesn't free the job, the next best thing is
to detach the job from the terminal and allow it to sit there.  It may
be using negligible amounts of CPU time and cause no adverse effects on
the system.  Zapping the job may crash the system, which in most cases
is not the desirable approach.

The next time the system is reloaded, be sure to get  a  dump  of  the
system  with  the  hung  job  and  submit it as an SPR (see the SWSKIT
article about getting informative Dumps).

                         LOOKING AT HUNG TAPES

     A number of problems of the general  classification  "tape  hang"
have  been  reported, and will probably always exist as long as we use
magtapes.  Although there  are  apparently  several  variants  of  the
problem,  there  are  some  things  which  can  be  done by a suitably
cautious specialist when presented with a  hung  tape  drive.   Listed
below  are  some  techniques  which  can  be  used  in  an  attempt to
investigate and perhaps alleviate the problem.  These things should,
in general, be harmless to the system, barring mis-typing in MDDT.  By
the same token, they may not clear the problem.

     For release 4, there are several tables that are used in relation
to  tape  drives.  Some of these tables are indexed by MT unit number,
some by MTA unit number.  In general, it can be said that if  a  table
name  begins  with  the  characters  MT,  it will be indexed by MTA or
physical unit number, and if the table name begins with TL or  TP,  it
will  be  indexed  by MT or logical unit number.  The TL and TP tables
will usually have something to do with the tape labeling system.  This
article concerns itself mainly with the more important tables relating
to MTAs (physical tape units).

     When playing with the tape  subsystem,  certain  care  should  be
taken.  For instance, it always helps if no one else is actively using
the tape  drives  while  you  attempt  something  like  reloading  the
microcode for a DX20.

1.  Finding the Tape Drive

     There are several tables parallel to each other which concern the
ownership  of a tape drive.  Those of interest are DEVNAM, DEVCHR, and
DEVUNT.  At DEVNAM+n is the device name in SIXBIT.  At DEVUNT+n  is  a
word  with the left half set to the assigner's job number, -1 if free,
or -2 if being controlled by the allocator.  The right  half  contains
the unit number.  Note that in release 4, with tape allocation turned
on, the DEVUNT entries for the MTAs will always show job 0 as the
owner, and the DEVUNT entry for the corresponding MT unit will contain
the job number of the user.  At DEVCHR+n is the device characteristics
word.  Knowing the device name or the owning job, one can use DDT to
find the table offset.  See the example below.
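     The DEVUNT layout described above can be decoded mechanically.
This Python sketch is an illustration only.  It uses the -1 and -2
conventions given above and the DV%PSD value 400000 shown in the
example later in this article, and it assumes, as that example
suggests, that the rest of the right half is the unit number.  It also
computes the "free the drive" edit from the next section, which puts
-1 (777777) in the left half.  Function names are hypothetical.

```python
# Sketch: interpret a DEVUNT word (owner,,unit) as described above.

FREE = 0o777777        # -1 in an 18-bit half: device not assigned
ALLOCATOR = 0o777776   # -2: device controlled by the allocator
DV_PSD = 0o400000      # pseudo-device bit in the right half (from the example)


def halves(word: int) -> tuple[int, int]:
    """Split a 36-bit word into (left half, right half)."""
    return (word >> 18) & 0o777777, word & 0o777777


def describe_devunt(word: int) -> tuple[str, int, bool]:
    """Return (owner description, unit number, pseudo-device flag)."""
    owner, right = halves(word)
    if owner == FREE:
        who = "free"
    elif owner == ALLOCATOR:
        who = "allocator"
    else:
        who = f"job {owner:o} (octal)"
    return who, right & ~DV_PSD, bool(right & DV_PSD)


def freed_devunt(word: int) -> int:
    """Same entry with -1 in the left half, as in 'Grabbing the Drive'."""
    _, right = halves(word)
    return (FREE << 18) | right


print(describe_devunt(0o32400000))      # ('job 32 (octal)', 0, True)
print(describe_devunt(0o777776400001))  # ('allocator', 1, True)
print(oct(freed_devunt(0o32400000)))    # 0o777777400000
```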

2.  Grabbing the Drive

     Knowing the offsets into DEVUNT, the  device  assignment  can  be
freed  by  putting  -1  into  the  left half of the appropriate DEVUNT
entry.  The drive can then be assigned by the normal ASSIGN command to
the  EXEC.   In dealing with the allocator for Release 4, your own job
number can be placed here if necessary.  The drive, however, will
still not be usable.  Note that the appropriate DEVUNT entry is the
one referring to the MT, not the MTA.

3.  Clearing External Errors

     Make sure that there is a tape of  some  sort  mounted,  and  the
drive  is  placed on-line.  Having a write-enable ring in the tape may
help in being sure the unit is functional if  the  hung  condition  is

4.  Checking the UDB

     Next, the Unit Data Block status should be reset.  This word  can
be  found  using  the MTCUTB table.  This table is indexed by MTA unit
number, the left half is the address of the channel data block  (CDB),
and  the  right half contains the address of the UDB.  The status word
of the UDB should then be reset to the base  state.   The  right  half
should be left alone--it basically contains drive type.  The left half
should have only bit 16  set,  which  indicates  a  tape  type  device
(US.TAP).  The old contents should be remembered for purposes of later

5.  Checking the Status

     Now, table MTASTS is examined, indexed by MTA unit number  again.
Remember the old contents.  Then clear the word to zero.
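     Steps 4 and 5 amount to two small word edits.  The Python sketch
below only computes the new values; it does not touch a running
monitor, and the function name is hypothetical.  It keeps the right
half of the UDB status word, leaves only US.TAP (bit 16) in the left
half, zeroes MTASTS, and returns the old values so they can be written
down first.

```python
# Sketch: compute the "base state" values for steps 4 and 5.

US_TAP = 1 << (35 - 16)   # US.TAP == 1B16, tape type device


def reset_tape_words(old_udbsts: int, old_mtasts: int) -> dict[str, int]:
    """Return the old and new UDB status and MTASTS words."""
    return {
        "old UDBSTS": old_udbsts,
        "new UDBSTS": US_TAP | (old_udbsts & 0o777777),  # keep drive type
        "old MTASTS": old_mtasts,
        "new MTASTS": 0,
    }


# The UDB word 102,,157 from the example below becomes 2,,157.
# The MTASTS value 1234 is made up for illustration.
new = reset_tape_words(0o102000157, 0o1234)["new UDBSTS"]
print(f"{new >> 18:o},,{new & 0o777777:o}")   # 2,,157
```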

6.  Example

                !INTO THE MTA OFFSETS IN THE
                !DEVxxx TABLES.
    devnam+21/   HLRZM P2,FKBSPW+217(T1)   $6t;MTA0     
    DEVNAM+22/   MTA1     
    DEVNAM+23/   MTA2     
    DEVNAM+24/   MTA3     
    DEVNAM+25/   MTA4     
    DEVNAM+26/   MTA5
    DEVNAM+40/   MTA17     
                !TO THE MT ENTRIES
    devnam+41/   HLRZM P1,@0   $6t;MT0      
    DEVNAM+42/   MT1      
    DEVNAM+43/   MT2      
    DEVNAM+44/   MT3      
    DEVNAM+45/   MT4      
    DEVNAM+46/   MT5      
    DEVNAM+60/   MT17
                !AND OFFSETS INTO THE TLxxxx/TPxxxx TABLES
                !FOR MTs
    DEVUNT+22[   1   !JOB 0,,MTA1:
    DEVUNT+23[   2   !JOB 0,,MTA2:
    DEVUNT+24[   3   !JOB 0,,MTA3:
    DEVUNT+25[   4   !JOB 0,,MTA4:
    DEVUNT+26[   5   !JOB 0,,MTA5:
    DEVUNT+27[   777777,,6   !UNASSIGNED,,MTA6:
    DEVUNT+40[   777777,,17   !UNASSIGNED,,MTA17:
                !DV%PSD=400000 INDICATES A PSEUDO DEVICE
    devunt+41[   32,,400000   !PSEUDO DEVICE MT0: IS ASSIGNED TO
                              !JOB 32 OCTAL (JOB 26 IN DECIMAL)
    DEVUNT+42[   777776,,400001   !CONTROLLED BY ALLOCATOR,,MT1:
    DEVUNT+43[   777776,,400002   !     "     "       "   ,,MT2:
    DEVUNT+44[   777776,,400003   !     "     "       "   ,,MT3:
    DEVUNT+60[   777776,,400017   !     "     "       "   ,,MT17:
                !PHYSICAL MTA NUMBER IN BITS 2-8.
    tlabr0[   405000,,0   !BIT 0 INDICATES A VALID VOLUME IS MOUNTED ON MTA5
    mtcutb+5[   730437,,730625   !CDB,,UDB FOR MTA5 BEING USED BY JOB 26
                                 !WHO KNOWS IT AS MT0 (SEE ABOVE)
    730625[   102,,157  !FIRST WORD OF UDB FOR MTA5
                        !US.WLK=1B11 >> WRITE LOCKED
                        !US.TAP=1B16 >> TAPE TYPE DEVICE
                        !.UTT70=17B35 >> TU70
    mretn$g             !TO RETURN TO SDDT FROM MDDT
    ^Z                  !TO RETURN TO THE EXEC FROM SDDT
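     The interpretations written as comments in the example can be
reproduced with a little bit arithmetic.  This Python sketch is
illustrative and assumes only what the example comments state: TLABR0
carries the valid-volume flag in bit 0 and the physical MTA number in
bits 2-8, and MTCUTB entries are CDB,,UDB.  The helper names are
hypothetical.

```python
# Sketch: decode the TLABR0 and MTCUTB words from the example above.

def field(word: int, first: int, last: int) -> int:
    """Extract PDP-10 bits first..last (0 = leftmost) of a 36-bit word."""
    width = last - first + 1
    return (word >> (35 - last)) & ((1 << width) - 1)


def decode_tlabr(word: int) -> tuple[bool, int]:
    """Return (valid volume mounted, physical MTA number)."""
    return bool(field(word, 0, 0)), field(word, 2, 8)


def decode_mtcutb(word: int) -> tuple[int, int]:
    """Return (CDB address, UDB address)."""
    return field(word, 0, 17), field(word, 18, 35)


print(decode_tlabr(0o405000000000))   # (True, 5), i.e. MTA5
cdb, udb = decode_mtcutb(0o730437730625)
print(f"CDB {cdb:o}, UDB {udb:o}")     # CDB 730437, UDB 730625
```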

     If clearing MTASTS and UDBSTS for the drive doesn't seem to clear
the  problem, you will probably have to do more digging around to find
some other, more obscure, inconsistency in the  MTA/MT  tables.   This
can  be  accomplished  by  referring  to  the  monitor  tables (which,
hopefully, have been included with the SWSKIT) under MTA-STORAGE-AREA.
As always, extreme caution should be exercised while fooling around in
MDDT as you can accidentally trash some random location in the monitor
just by hitting a carriage return at the wrong time.

     One last note should be made about the monitor tables here.  The
description of the DEVUNT table would lead one to believe that the
right half will contain a -2 if the device is under control of the
allocator.  In fact, the -2 appears in the left half, as the 777776
entries in the example above show.

                   A LOOK AT SOME OF THE DISK STUFF

     This article is only a guide to the PHYPAR module, which is where
the information may be reliably obtained; PHYPAR should serve as the
ultimate reference for these problems.

        Much of the system debugging you will have to deal with will
involve the DEC-20 hardware.  There always seems to be a large gap
between what the diagnostics can tolerate and what the monitor can
tolerate in the way of malfunctioning hardware.  The monitor will not
always point you to the real disk or magtape problem, say, but will
crash a few minutes after something has gone wrong somewhere.  Most of
the hardware problems we have had to deal with that were really
difficult to track down and point out to the Field Service rep were
problems with disk hardware.  The following is information which you
can use to help Field Service track down problems which are not
reported in the diagnostics.  In most cases the Field Service rep
knows what all the status bits mean but has not been able to find them
in the monitor crashes or in the running monitor.

        CHNTAB is an ordered list of Channel Data Block
        addresses starting with channel 0.  The data block
        address for RH20 0 is in the first word, and so on.

        CDB is the Channel Data Block.  There is one CDB per
        channel.  The CDB contains channel dependent
        instructions and data, and pointers to the unit data
        blocks (UDBs) in the case of RP04s, RP05s, and RP06s.
        In the case of TU45s the pointer is to the Kontroller
        Data Block (TM02s), which points in turn to the UDBs.
        The CDB also contains information about the currently
        active unit.  When the channel interrupts, control
        passes (via a JSP) to CDBINT.  The CDB address is
        stored in AC1, P1, and the principal analysis routine,
        PHYINT, is called.
NOTE:   The CDBs are referenced in modules PHYSIO, PHYH2 (RH20
        code), PHYM2 (TM02 code) and PHYP4 (RP04, 05, 06
        code).  The Channel Data Block is defined in the
        module PHYPAR.  The address that you get in CHNTAB is
        really a pointer to word 0, which contains the status
        bits for this controller (CDBSTS).  Look in PHYPAR for
        the table definition.  Some words of interest are:

        CDBADDR+CDBSTS:         ; status and configuration information
        CDBADDR+CDBUDB:         ; 8 word table of UDB (or KDB) addresses

        The status bits which are  also defined in PHYPAR  are
        listed here for your convenience:

        CS.OFL==1B0             ; offline
        CS.AC1==1B1             ; primary command active
        CS.AC2==1B2             ; secondary command active
        CS.ACT==CS.AC1!CS.AC2   ; any active
        CS.MAI==1B3             ; channel is in maintenance mode
        CS.MRQ==1B4             ; maintenance mode requested for unit
        CS.ERC==1B5             ; error recovery in progress
        CS.STK==1B6             ; channel supports command stacking
        CS.ACL==1B7             ; alternate command list is current

        Bits 30-32              ; PIA field
        Bits 33-35              ; channel type field
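        Given a CDBSTS word from a dump or a running monitor, the
        listed bits and fields can be pulled apart as in the
        following Python sketch.  It relies only on the bit
        definitions above; the sample word is made up for
        illustration, and the meaning of particular channel type
        codes is left to PHYPAR.

```python
# Sketch: decode a CDBSTS word using the bit list above.

CS_BITS = {
    "CS.OFL": 0, "CS.AC1": 1, "CS.AC2": 2, "CS.MAI": 3,
    "CS.MRQ": 4, "CS.ERC": 5, "CS.STK": 6, "CS.ACL": 7,
}


def decode_cdbsts(word: int) -> dict:
    """Return the set flags plus the PIA and channel type fields."""
    flags = [name for name, n in CS_BITS.items() if word & (1 << (35 - n))]
    return {
        "flags": flags,
        "active": "CS.AC1" in flags or "CS.AC2" in flags,   # CS.ACT
        "pia": (word >> 3) & 0o7,          # bits 30-32
        "channel_type": word & 0o7,        # bits 33-35
    }


# Made-up word: primary command active, PIA 5, channel type 2.
print(decode_cdbsts((1 << 34) | (5 << 3) | 2))
# {'flags': ['CS.AC1'], 'active': True, 'pia': 5, 'channel_type': 2}
```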

        Kontroller Data Block  (TM02 only)  defined in  PHYPAR
        also.  Referenced in PHYM2, PHYPAR, PHYSIO.  Words  of
        interest are:

        KDBADDR+KDBSTS:         ; flags unit type
        KDBADDR+KDBUDB:         ; UDB table first word (1 word/UDB)

        Unit Data Block.  There is one UDB per unit associated
        with a CDB or KDB.  The UDB contains information about
        the current activity on the unit in question.  The UDB
        is defined in PHYPAR as well.  Some words of  interest
        are noted  below.   Look  in the  listings  for  other

        UDBADDR + UDBSTS:       ; status and configuration info (see below)
        UDBADDR + UDBERR:       ; error recovery status word
        UDBADDR + UDBERP:       ; error reporting work area if non 0
        UDBADDR + UDBRED:       ; reads - sectors if disk, frames if tape
        UDBADDR + UDBWRT:       ; writes - sectors if disk, frames if MTA
        UDBADDR + UDBSRE:       ; soft read errors
        UDBADDR + UDBSWE:       ; soft write errors
        UDBADDR + UDBHRE:       ; hard read errors
        UDBADDR + UDBHWE:       ; hard write errors
        UDBADDR + UDBPS1:       ; current cylinder if disk, cur file if MTA
        UDBADDR + UDBPS2:       ; current sector within cyl if disk, record
                                ;  in file if tape
        UDBADDR + UDBSPE:       ; soft positioning error
        UDBADDR + UDBHPE:       ; hard positioning error        

                                ; NOTE - there are several other UDB words
                                ; including a device dependent portion


        US.OFS==1B0     ; off line or unsafe
        US.CHB==1B1     ; check HOME blocks before any normal I/O
        US.POS==1B2     ; positioning in progress
        US.ACT==1B3     ; active
        US.BAT==1B4     ; on if bad BAT blocks on this unit
        US.BLK==1B5     ; lock bit for this unit's BAT blocks
        US.PGM==1B6     ; dual port switch in (A or B)
        US.MAI==1B7     ; unit is in maintenance mode
        US.MRQ==1B8     ; maintenance mode requested on this unit
        US.BOT==1B9     ; unit is at BOT
        US.REW==1B10    ; unit is rewinding
        US.WLK==1B11    ; unit is write locked
        US.MAL==1B12    ; maintenance mode allowed on this unit
        US.OIR==1B13    ; operator intervention required, set at
                        ;  interrupt level, checked periodically.
        US.OMS==1B14    ; once a minute message to operator,  used in
                        ;  conjunction with US.OIR.
        US.PRQ==1B15    ; positioning required on this unit
        US.TAP==1B16    ; device type tape
        US.PSI==1B17    ; tape - online/offline/rewind done transition
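        When reading a UDB status word in a dump, it helps to turn
        the left half into flag names.  This Python sketch uses
        only the US.* definitions above; the table is transcribed
        from this handbook, so compare it with PHYPAR for the
        monitor in question.

```python
# Sketch: list the US.* flags set in a UDB status word.

US_BITS = [
    "US.OFS", "US.CHB", "US.POS", "US.ACT", "US.BAT", "US.BLK",
    "US.PGM", "US.MAI", "US.MRQ", "US.BOT", "US.REW", "US.WLK",
    "US.MAL", "US.OIR", "US.OMS", "US.PRQ", "US.TAP", "US.PSI",
]   # list index == bit number (US.OFS==1B0 ... US.PSI==1B17)


def us_flags(udbsts: int) -> list[str]:
    """Names of the US.* bits set in the word, in bit order."""
    return [name for n, name in enumerate(US_BITS) if udbsts & (1 << (35 - n))]


# The MTA5 UDB word 102,,157 from the hung-tape example:
print(us_flags(0o102000157))   # ['US.WLK', 'US.TAP']
```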


        .UTRP4 = 1      ; RP04
        .UTRS4 = 2      ; RS04 (drum)
        .UTT16 = 3      ; TU16 (TU45)
        .UTTM2 = 4      ; TM02 as a unit
        .UTRP5 = 5      ; RP05
        .UTRP6 = 6      ; RP06
        .UTRP7 = 7      ; RP07
        .UTRP8 = 10     ; RP08
        .UTRM3 = 11     ; RM03
        .UTTM3 = 12     ; TM03 AS A UNIT
        .UTT77 = 13     ; TU77
        .UTTM7 = 14     ; TM78
        .UTT78 = 15     ; TU78
        .UTDX2 = 16     ; DX20-A
        .UTT70 = 17     ; TU70
        .UTT71 = 20     ; TU71
        .UTT72 = 21     ; TU72
        .UTT73 = 22     ; TU7x


        BLOCK 0:        ; 11 bootstrap
        BLOCK 1:        ; primary HOME block
        BLOCK 2:        ; primary BAT block
        BLOCKS 3-11:    ; reserved
        BLOCK 12:       ; secondary HOME block
        BLOCK 13:       ; secondary BAT block

The disk addresses where the above blocks are stored are kept in the
table HOME, which is defined in STG.  The BAT blocks are defined in
PROLOG and the HOME blocks are defined in DSKALC.

                         NEW DISK FEATURES FOR FILDDT

    The FILDDT to be shipped with release  4 of TOPS-20 will have two  new
    commands in relation to disk file structure maintenance.

    They are:

        STRUCTURE (FOR PHYSICAL I/O IS) disk-structure
                Examines the specified disk structure.

                Examines the specified disk unit.

    These are privileged functions, and one must have privileges enabled
    to use them.

    These two commands are nearly  identical.  Their difference is in  the
    way the structure  is identified.   To use the  STRUCTURE command  the
    structure must  be  mounted.   The STRUCTURE  command  is  useful  for
    examining a multi-pack  structure.  The  DRIVE command  is useful  for
    examining the  file system  of a  structure which  cannot be  mounted.
    Channel and unit  numbers can be  found from the  programs UNITS,  DS,
    SYSDPY, or OPR.
    Addressing is in the same format as in other forms of DDT.
    It is easier  to understand exactly  what the disk  will look like  in
    FILDDT if you keep in mind that all sectors will be packed in the  DDT
    address space, without regard for sector size, starting at DDT address
    0.  For instance, on an RP06 there are four sectors per memory page or
    200 (octal) words per sector.  Therefore, sector zero of the structure
    will begin at FILDDT address 0 and end at FILDDT address 177 (octal).
    Sector 1 will begin at address 200 and end at 377.  For release 4, all
    DEC supported  disks  contain  200  (octal) words  per  sector,  so  a
    consistent mapping  exists between  sector  number and  FILDDT  memory
    location.  Soon, TOPS-20 will support  RP20's.  For RP20's, there  are
    1000 (octal)  words per  sector (one  page per  sector).  Index  block
    addresses and most monitor disk addresses are in sectors.  That is why
    it is important to be able  to translate between sector addresses  and
    FILDDT memory addresses.
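    The sector-to-address translation described above is a single
    multiplication.  The Python sketch below is illustrative; it takes
    the words-per-sector value as a parameter, 200 (octal) for the
    release 4 disks and 1000 (octal) for the RP20 as stated above.  The
    function names are hypothetical.

```python
# Sketch: translate between sector numbers and FILDDT addresses on one unit.

WORDS_PER_SECTOR = 0o200   # release 4 disks; use 0o1000 for an RP20


def sector_to_filddt(sector: int, wps: int = WORDS_PER_SECTOR) -> int:
    """First FILDDT address of a sector."""
    return sector * wps


def filddt_to_sector(address: int, wps: int = WORDS_PER_SECTOR) -> tuple[int, int]:
    """(sector, word offset within that sector) for a FILDDT address."""
    return divmod(address, wps)


print(oct(sector_to_filddt(1)))           # 0o200
print(filddt_to_sector(0o377))            # (1, 127), i.e. word 177 (octal)
print(oct(sector_to_filddt(1, 0o1000)))   # 0o1000 on an RP20
```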
    The FILDDT option of  ENABLE PATCHING is also  available for use  with
    the DRIVE and  STRUCTURE command.  With  this option on,  the user  is
    able  to  modify  specific  words  on  the  structure.   Another  very
    convenient FILDDT command  one may  use in conjunction  with the  disk
    commands is LOAD (symbols from) input file spec.  One may specify  any
    file here but a useful one is SYSTEM:MONITR.  The symbol table to  the
    MONITOR has  home block  sector addresses,  FDB offsets  etc.  When  a
    file's  symbols  are  loaded,  one may  also define  his own  symbols.
    This is useful to remember addresses of data structures on the  units.
    For example, after finding the index block to a file, one could define
    a symbol, FILIDX, at that address for easy referencing later on.

    When examining  a multi-pack  structure using  the STRUCTURE  command,
    addressing the first unit is exactly as if there were only one unit in
    the structure.  FILDDT addresses of  sectors on the other units  begin
    immediately  after  the  last  address  for  the  first  unit  of  the
    structure.  For example, suppose we want to examine the BAT blocks for
    the second unit of a two-pack structure on RP06 drives.  An RP06
    contains 304000. sectors per unit and 128. words per sector, so the
    first FILDDT address for the second unit of a two-pack RP06 structure
    is 304000.*128. = 38912000., or 224340000 (octal).
    [Looking at file structure PS:]

                        ; starting address of second unit in structure
                        ; plus sector address of BAT blocks (2)
                        ; times number of words per sector gives
                        ; FILDDT address of start of BAT blocks for
                        ; that unit


    224,,340400[   424164,,0   $6T;   BAT
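
    The address arithmetic in this session can be checked with a short
    Python sketch.  It assumes, as described above, that each unit's
    FILDDT addresses follow immediately after the previous unit's, and it
    formats results in DDT's LH,,RH octal style.  The names are
    hypothetical.

```python
# Sketch: FILDDT address of a sector on a given pack of a multi-pack
# structure, assuming the units are laid out back to back.

def halfwords(value: int) -> str:
    """Format a 36-bit quantity as DDT-style LH,,RH in octal."""
    lh, rh = (value >> 18) & 0o777777, value & 0o777777
    return f"{lh:o},,{rh:o}" if lh else f"{rh:o}"


def filddt_address(unit_index: int, sector: int, sectors_per_unit: int,
                   words_per_sector: int = 0o200) -> int:
    """unit_index is 0 for the first unit of the structure."""
    unit_base = unit_index * sectors_per_unit * words_per_sector
    return unit_base + sector * words_per_sector


RP06_SECTORS_PER_UNIT = 304000  # decimal, from the text
BAT_SECTOR = 2                  # BAT blocks start at sector 2

if __name__ == "__main__":
    addr = filddt_address(1, BAT_SECTOR, RP06_SECTORS_PER_UNIT)
    print(halfwords(addr))      # 224,,340400
```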
    For another example, let's say we would like to find the start of  the
    ROOT-DIRECTORY symbol table.

    [22722 symbols loaded from file]
    [Looking at file structure PS:]
    NWSEC=200                   ; number of words per sector
    HM1BLK=1                    ; sector number of HOM block
    HOMRXB=10                   ; offset in HOM block for index
                                ; block to root-directory
                                ; sector number of HOM block
                                ; times words per sector equals
                                ; FILDDT address of start of HOM block
    HM1BLK*NWSEC[   505755,,0   $6T;HOM  
    HM1BLK*NWSEC+HOMRXB[   10,,5740 ; plus offset to address of index block
                                ; sector number of index block times
                                ; number of words per sector gives
    5740*NWSEC[   10,,5744      ; FILDDT adr of root-dir index block
                                ; NOTE:  Bit 14 (DSKAB) specifies this
                                ; address as a disk sector address.
                                ; sector addresses are bits 15-35
    RTDIDX:                     ; define symbol for index block
                                ; sector number of first page of
                                ; root-directory times number of words
                                ; per sector gives the
    5744*NWSEC[   400300,,100   ; FILDDT adr of first page of ROOT-DIR
    RTDIR0:                     ; define start of page 0 of ROOT-DIR
    RTDIR0+3[   30610           ; plus 3 for start of symbol table
                                ; NOTE: adr is a 'directory address'
                                ;       offset 610 of directory page 30
    RTDIDX+30[   10,,6250       ; get sector adr of page 30 of ROOT-DIR
                                ; sector adr of page 30 times words per
                                ; sector gives FILDDT address of page
                                ; 30 of ROOT-DIR.
    6250*NWSEC+610[   400400,,1 ; Add offset for symbol table start
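
    The two decoding steps used above (splitting an index-block word into
    the DSKAB flag and a sector address, and splitting a directory
    address into page and offset) can be written out as a Python sketch.
    It assumes PDP-10 bit numbering (bit 0 is the most significant of 36
    bits), the bit 14 and bits 15-35 layout given in the comments above,
    and 1000 (octal) words per directory page.  The function names are
    hypothetical.

```python
# Sketch: reproduce the root-directory walk arithmetic from the session.
NWSEC = 0o200              # words per sector (non-RP20 drives)
DSKAB = 1 << (35 - 14)     # bit 14 in PDP-10 numbering (bit 0 = MSB)
SECTOR_MASK = (1 << 21) - 1  # bits 15-35


def hw(lh: int, rh: int) -> int:
    """Build a 36-bit word from its halves."""
    return (lh << 18) | rh


def decode_disk_pointer(word: int) -> tuple[bool, int]:
    """Index-block word -> (is a disk address, sector number)."""
    return bool(word & DSKAB), word & SECTOR_MASK


def split_dir_address(dir_addr: int) -> tuple[int, int]:
    """Directory address -> (directory page, word offset in that page)."""
    return divmod(dir_addr, 0o1000)


if __name__ == "__main__":
    is_disk, idx_sector = decode_disk_pointer(hw(0o10, 0o5740))
    page, offset = split_dir_address(0o30610)
    _, page30_sector = decode_disk_pointer(hw(0o10, 0o6250))
    symtab = page30_sector * NWSEC + offset
    print(is_disk, f"{idx_sector:o}", f"{page:o}", f"{offset:o}", f"{symtab:o}")
    # True 5740 30 610 1452610
```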

    Here are some magic numbers for all DEC supported drives.

        DRIVE           SECTORS PER     FILDDT ADR      FILDDT ADR
        TYPE            UNIT            OF 2nd UNIT     OF 3rd UNIT
                        (in decimal)     (in octal)      (in octal)
        __________      ____________    ____________    ____________
        RP04-RP05         152000.       112,,160000     224,,340000
        RP06              304000.       224,,340000     450,,700000
        RP07              502200.       365,,156000     752,,334000
        RM03              121360.        73,,204000     166,,410000
        RP20              201420.       611,,314000    1422,,630000

    NOTE: RP20 will not be supported in  release 4.  It is important  to
          remember that there are  1000 (octal) words  per sector for  a
          RP20.  As a result, to look at a sector of an RP20, one  would
          multiply the sector number by 1000 (octal) to find the  FILDDT
          starting address for that sector.   For all other drive  types
          there are 200 (octal) words per sector.
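
    The octal columns of the table follow from the sectors per unit and
    the words per sector.  This Python sketch regenerates them under that
    assumption (units laid out back to back in FILDDT address space); the
    names are hypothetical.

```python
# Sketch: recompute the "magic numbers" table from drive parameters.
DRIVES = {  # drive: (sectors per unit in decimal, words per sector)
    "RP04-RP05": (152000, 0o200),
    "RP06": (304000, 0o200),
    "RP07": (502200, 0o200),
    "RM03": (121360, 0o200),
    "RP20": (201420, 0o1000),  # RP20: 1000 (octal) words per sector
}


def lh_rh(value: int) -> str:
    """Format a FILDDT address as LH,,RH in octal."""
    return f"{value >> 18:o},,{value & 0o777777:o}"


def unit_start(drive: str, unit_number: int) -> int:
    """FILDDT address of the first word of unit_number (1 = first unit)."""
    sectors, words = DRIVES[drive]
    return (unit_number - 1) * sectors * words


if __name__ == "__main__":
    for name in DRIVES:
        second = lh_rh(unit_start(name, 2))
        third = lh_rh(unit_start(name, 3))
        print(f"{name:10} {second:>14} {third:>14}")
```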

    The above information is calculated  from the parameters available  in

    REF: DDT41.MEM

                    TOPS-20 SCHEDULER TEST ROUTINES

     The following is a tabulation of (hopefully) all of  the  scheduler
tests  used by the TOPS-20 monitor, time-frame approximately Release 3A.
This includes ARPA and DECNET tests.  This is the data one finds in  the
monitor table FKSTAT indexed by fork number for forks which have blocked
and left the GOLST (i.e.  LH(FKPT) contains WTLST).  The format  of  the
FKSTAT  table  words  is TEST DATA,,TEST ROUTINE ADDRESS.  The scheduler
test routines are called periodically to determine if a process  can  be
unblocked.   This is indicated by a skip return from the scheduler test.
A nonskip return is taken if the process cannot yet be unblocked.

     When examining the monitor because of  a  hung  job  or  fork,  the
FKSTAT  table  can  often  reveal  the reason the fork is hung, and this
sometimes even allows corrective action to be taken.

     The table below gives the routine name, what you should expect to
see in the FKSTAT table, and the module in which the scheduler test is
defined, followed by a short description of the condition being tested.
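
     As an illustration of the FKSTAT format (TEST DATA,,TEST ROUTINE
ADDRESS) and the skip/nonskip convention, here is a Python sketch.  The
routine addresses are hypothetical (real ones come from the monitor's
symbol table), and diset_model is only a model of a DISET-style test,
not monitor code.

```python
# Sketch: split an FKSTAT word and model the skip/nonskip convention.

def split_fkstat(word: int) -> tuple[int, int]:
    """36-bit FKSTAT word -> (test data, test routine address)."""
    return (word >> 18) & 0o777777, word & 0o777777


# Hypothetical addresses; take real ones from the monitor symbol table.
ROUTINE_NAMES = {0o123456: "DISET", 0o123500: "TCITST"}


def describe(word: int) -> str:
    data, routine = split_fkstat(word)
    name = ROUTINE_NAMES.get(routine, f"{routine:o}?")
    return f"{name} with test data {data:o}"


def diset_model(memory: dict[int, int], address: int) -> bool:
    """True models a skip return (fork may be unblocked);
    False models a nonskip return (fork keeps waiting)."""
    return memory.get(address, 0) == 0


if __name__ == "__main__":
    print(describe((0o42 << 18) | 0o123500))  # TCITST with test data 42
    memory = {0o1000: 5}
    print(diset_model(memory, 0o1000))        # False: still blocked
    memory[0o1000] = 0
    print(diset_model(memory, 0o1000))        # True: may be unblocked
```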

                            SCHEDULER TESTS

 NAME           FKSTAT CONTENTS                                 MODULE
 ----           ----------------------------------------        -------

BALTST          [CONNECTION #,,BALTST]                          [NETWRK]
                Wait for network bit allocation.

BATTST          [UNIT #,,BATTST]                                [DSKALC]
                Wait for US.BLK, the lock bit for the BAT blocks
                on the unit, in the UDB to be zero.

BLOCKM          [TIME,,BLOCKM]                                  [SCHED]
                Wait for TIME in BLOCKM format which is the low
                order 17 bits of the desired future time to be
                compared against a suitably masked TODCLK.

BLOCKT          [TIME,,BLOCKT]                                  [SCHED]
                Wait for TIME in BLOCKT format which is a
                value that is shifted left 10 bits and compared
                against a suitably masked TODCLK, providing a
                longer delay than BLOCKM, but less precision.

BLOCKW          [TIME,,BLOCKW]                                  [SCHED]
                Wait for TIME in BLOCKW format (same as BLOCKM).

CDRBLK          [UNIT NUMBER,,CDRBLK]                           [CDRSRV]
                Wait for card-reader offline, or not waiting for
                a card.

CHKLOK          [ADDRESS,,CHKLOK]                               [NSPSRV]
                Wait for NSP block lock at address to free.

COFTST          [TIME,,COFTST]                                  [MEXEC]
                Wait for job in FKJOBN to be attached or time
                in BLOCKT form to elapse.

DBWAIT          [DTE #,,DBWAIT]                                 [DTESRV]
                Wait for the TO-10 doorbell from the given DTE.

DGLTST          [0,,DGLTST]                                     [DIAG]
                Wait for DIAGLK lock to be free.

DGUIDL          [UDB ADDRESS,,DGUIDL]                           [DIAG]
                Wait for the unit to show as idle in the UDB.

DGUTST          [UDB ADDRESS,,DGUTST]                           [DIAG]
                Wait for the maintenance bit to set in the UDB.

DISET           [ADDRESS,,DISET]                                [SCHED]
                Wait for contents of ADDRESS to be zero.

DISGET          [ADDRESS ,,DISGET]                              [SCHED]
                Wait for contents of ADDRESS to be positive.

DISGT           [ADDRESS,,DISGT]                                [SCHED]
                Wait for contents of ADDRESS to be greater than

DISLT           [ADDRESS,,DISLT]                                [SCHED]
                Wait for contents of address to be less than

DISNT           [ADDRESS,,DISNT]                                [SCHED]
                Wait for contents of ADDRESS to be non-zero.

DMPTST          [COUNT,,DMPTST]                                 [IO]
                Wait for COUNT to be less than DMPCNT to indicate
                dump mode buffers freed.

DSKRT           [PAGE #,,DSKRT]                                 [PAGEM]
                Wait for CSTAGE for PAGE # to not be PSRIP,
                meaning disk read completed.

DWRTST          [PAGE #,,DWRTST]                                [PAGEM]
                Wait for DRWBIT to clear in CST3(PAGE #),
                meaning write completed.

ENQTST          [FORK #,,ENQTST]                                [ENQ]
                Wait for the lock on ENFKTB+FORK #.

FEBWT           [ADDRESS OF FE UDB,,FEBWT]                      [FESRV]
                Wait for EOF or input bytes available from FE.
                Wake also on invalid assignment.

FEDOBE          [ADDRESS OF FE UDB,,FEDOBE]                     [FESRV]
                Wait for output buffer empty and all bytes are
                acknowledged by the FE.  Wake also if not a 
                valid assignment.

FEFULL          [ADDRESS OF FE UDB,,FEFULL]                     [FESRV]
                Wait for the current count of output bytes to be
                less than the count of bytes in the interrupt
                buffer.  Wake also on invalid assignment.

FORCTM          [SUPERIOR FORK INDEX,,FORCTM]                   [SCHED]
                Identifiable wait forever, forced termination.

FRZWT           [PREVIOUS TEST,,FRZWT]                          [FORK]
                Identifiable wait forever, frozen fork.

HALTT           [SUPERIOR FORK INDEX,,HALTT]                    [SCHED]
                Identifiable wait forever for halted fork.

HIBERT          [TIME,,HIBERT]                                  [SCHED]
                Wait for TIME in BLOCKT format.

HUPTST          [<0:9>TIME<10:17>HOST #,,HUPTST]                [NETWRK]
                Wait for IMPHRT bit set for host or time out in
                BLOCKW form.

IDVTST          [0,,IDVTST]                                     [IMPDV]
                Wait for the lock on IDVLCK to free, lock it.

IMPBPT          [0,,IMPBPT]                                     [IMPDV]
                Wait for IMPFLG nonzero, or IBPTIM timer to run
                out, or IDVLCK lock free and output scan needed
                for the IMP.

JB0TST          [TIME,,JB0TST]                                  [MEXEC]
                Wait for JB0FLG set nonzero for explicit request
                or time in BLOCKT form to elapse.

JRET            [0,,JRET]                                       [SCHED]
                Wait forever, interruptible.

JSKP            [0,,JSKP]                                       [SCHED]
                Unconditional skip used to schedule immediately.

JTQWT           [0,,JTQWT]                                      [SCHED]
                Wait for JSYS trap queue.

LCKTSS          [ADDRESS,,LCKTSS]                               [IO]
                Wait for lock at ADDRESS to unlock, lock it.

LKDSPT          [0,,LKDSPT]                                     [STG]
                Wait for room in LDTAB table of directories
                currently locked.

LKDTST          [INDEX INTO LDTAB,,LKDTST]                      [STG]
                Wait for bit in LCKDBT to clear, indicating
                directory unlocked.

                Wait for flag LP%LHC to set in the addressed
                word, indicating loading has completed of the
                VFU or RAM file.

LPTDIS          [UNIT ADDRESS,,LPTDIS]                          [LINEPR]
                Wait for an error condition on the addressed
                unit, or for all buffers cleared and no bytes
                still in the front-end, before finishing close
                operation on the device.

MTARWT          [IORB ADDRESS,,MTARWT]                          [MAGTAP]
                Wait for IRBFA in the IORB to indicate that this
                IORB is no longer active.

MTAWAT          [UNIT #,,MTAWAT]                                [MAGTAP]
                Wait for all outstanding IORBs for unit to be

MTDWT1          [UNIT #,,MTDWT1]                                [MAGTAP]
                Wait for the count of outstanding requests on the
                unit to go to one.

NCPLKT          [0,,NCPLKT]                                     [NETWRK]
                Wait for lock NCPLCK to free, lock it.

NICTST          [0,,NICTST]                                     [PAGEM]
                Wait for SUMNR less than or equal to MAXNR or
                only one fork in BALSET.

NOTTST          [<0:8>CONNECTION #<9:17>STATE,,NOTTST]          [NETWRK]
                Wait for connection to leave state.

NSPTST          [0,,NSPTST]                                     [NSPSRV]
                Wait for KDPFLG nonzero, indicating KMC11 wants
                service, or MSGQ nonzero, indicating messages to

NVTNTT          [<0:8>OPTION #,<9:17>LINE #,,NVTNTT]            [TTNTDV]
                Wait for completed NVT negotiation.

OFNLKT          [OFN,,OFNLKT]                                   [PAGEM]
                Wait for OFN unlocked--SPTLKB zero in SPTH(OFN).

PIDWAT          [FORK #,,PIDWAT]                                [IPCF]
                Wait for bit for fork in PDFKTB to set.

SEBTST          [0,,SEBTST]                                     [SYSERR]
                Wait for SECHKF to go nonzero before starting
                Job 0 task to write queued SYSERR entries.

SEEALL          [0,,SEEALL]                                     [TTYSRV]
                Wait for SNDALL to go to zero, indicating the
                send-all buffer available.

SPCTST          [0,,SPCTST]                                     [DTESRV]
                Wait for a node.

SPMTST          [0,,SPMTST]                                     [PAGEM]
                Wait for page in SPMTPG to be on SPMQ or the
                time SPMTIM to expire.

SQLTST          [0,,SQLTST]                                     [IMPDV]
                Wait for the special queues lock SQLCK and lock

                Wait for the structure lock to be free.

                Wait for flag CD%SHA to come on in the addressed
                word, indicating that cardreader status has

                Wait for flag LP%SHA to set in the addressed
                word, indicating that printer status has

SUSFKT          [FORK #,,SUSFKT]                                [FORK]
                Wait for fork to be on WTLST in either SUSWT
                or FRZWT.

SWPRT           [PAGE #,,SWPRT]                                 [PAGEM]
                Wait for CSTAGE for PAGE # to not be PSRIP,
                meaning swap read completed.

SWPWTT          [0,,SWPWTT]                                     [PAGEM]
                Wait for NRPLQ nonzero.  Increment CGFLG each
                time test is unsuccessful.

TCIPIT          [FORK #,,TCIPIT]                                [TTYSRV]
                Wait for no interrupts pending for FORK #.

TCITST          [LINE #,,TCITST]                                [TTYSRV]
                Wait for line inactive, no fork in input wait,
                or input buffer non-empty.

TCOTST          [LINE #,,TCOTST]                                [TTYSRV]
                Wait for line inactive, or output buffer not
                too full to add a character to it.

TRMTS1          [0,,TRMTS1]                                     [FORK]
                Identifiable wait forever for inferior fork termination.

TRMTST          [FORK #,,TRMTST]                                [FORK]
                Wait for FORK # to be on WTLST for either HALTT
                or FORCTM.

TRP0CT          [MINIMUM NRPLQ,,TRP0CT]                         [PAGEM]
                Wait for NRPLQ to be above stated minimum or
                normal minimum.  Increment CGFLG each time
                test is unsuccessful.

TSACT1          [LINE #,,TSACT1]                                [TTYSRV]
                Wait until line inactive, becoming active, or
                has a full length dynamic block assigned.

TSACT2          [LINE #,,TSACT2]                                [TTYSRV]
                Wait for line available--inactive or fully

TSACT3          [LINE #,,TSACT3]                                [TTYSRV]
                Wait for line inactive--dynamic data unlocked.

TSTSAL          [0,,TSTSAL]                                     [TTYSRV]
                Wait for SALCNT to go to zero, indicating the
                send-all is finished for this buffer.

TTBUFW          [NUMBER,,TTBUFW]                                [TTYSRV]
                Wait for NUMBER of buffers.

TTIBET          [LINE #,,TTIBET]                                [TTYSRV]
                Wait for line inactive or input buffer empty.

TTOAV           [LINE #,,TTOAV]                                 [TTYSRV]
                Wait for line inactive and output buffer not

TTOBET          [LINE #,,TTOBET]                                [TTYSRV]
                Wait for line inactive or output buffer empty.

UDITST          [0,,UDITST]                                     [PHYSIO]
                Wait for at least two free IORBs on UIOLST.

UDWDON          [IORB ADDRESS,,UDWDON]                          [PHYSIO]
                Wait for IS.DON to set in IRBSTS for this IORB.

UPBGT           [CONNECTION INDEX,,UPBGT]                       [IMPDV]
                Wait for LTDF connection done flag to set, or
                output buffers to appear.

USGWAT          [0,,USGWAT]                                     [JSYSA]
                Wait for lock on queued USAGE blocks to free.

VVBWAT          [UNIT #,,VVBWAT]                                [TAPE]
                Wait for the MDA to reset TPVV handling EOV.

WATTST          [<0:8>CONNECTION #<9:17>STATE,,WATTST]          [NETWRK]
                Wait for connection to be in state.

WTFKT           [FORK #,,WTFKT]                                 [FORK]
                Wait for fork to be on WTLST.

WTSPTT          [PAGE #,,WTSPTT]                                [SCHED]
                Wait for share count on PAGE # to go to 1.
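
     Several entries pack more than one field into the left half, using
DEC bit numbering such as <0:8>CONNECTION #<9:17>STATE for NOTTST and
WATTST.  The Python sketch below extracts such fields, assuming bit 0 is
the most significant bit of the 36-bit word.  The sample words and the
routine address are hypothetical.

```python
# Sketch: extract PDP-10 bit fields (bit 0 = MSB of a 36-bit word).

def field(word: int, first: int, last: int) -> int:
    """Return bits first..last (inclusive) of a 36-bit word."""
    width = last - first + 1
    return (word >> (35 - last)) & ((1 << width) - 1)


def make_word(lh: int, rh: int) -> int:
    return ((lh & 0o777777) << 18) | (rh & 0o777777)


if __name__ == "__main__":
    # Hypothetical NOTTST entry: connection 5 in bits 0-8, state 3 in 9-17.
    nottst = make_word((5 << 9) | 3, 0o123456)
    print(field(nottst, 0, 8), field(nottst, 9, 17))        # 5 3

    # Hypothetical HUPTST entry: time 7 in bits 0-9, host 42 (octal)
    # in bits 10-17.
    huptst = make_word((7 << 8) | 0o42, 0o123456)
    print(field(huptst, 0, 9), oct(field(huptst, 10, 17)))  # 7 0o42
```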

                      TOPS-20 PAGE ZERO LOCATIONS

     The following text outlines the uses of memory in page zero of  the
TOPS-20 monitor as of Release 4.

ADDR   SYMBOL   USAGE
====   ======== =====

  0-17    --    Shadow ACs, not used.

 20     SCTLW   Scheduler halt request word (see SWTST in SCHED).   Word
                 of   function  bits,  current  functions  include  Halt
                 timesharing, wait for system down,  manual  pause,  and
                 reset FE protocol.

 21       --    Used by BOOT to build CCW lists (unused by monitor).

 22       --    Same as 21;  both unused for KS10 systems.

 23     CRSHTM  Initial time for  reload;   -1  =>  time  not  set  yet.
                 Contains   the  date/time  that  the  system  was  last
                 reloaded.   May  see  -1  after  forced  reload  on  KS
                 processor.   BUGSTO  (APRSRV) copies TADIDT into it for
                 each BUGHLT/CHK/INF.

 24     SEBQOU  Pointer to queued SYSERR blocks not yet written.

 25     DBUGS1  Not currently used by the monitor.

 26     BUGHAD  Code around SYSLD1 (STG) puts LH into  BUGCHK,  RH  into
                 BUGHLT  after  a  reload.   No  one else uses it, so it
                 should contain zero.

 27     CRSTD1  Current time is saved here on each BUGHLT/CHK/INF.  This
                 is the value that gets into the SYSERR block.  Contains
                 the   date/time   for   the   system's   most    recent

 30     SHLTW   Scheduler  halt  word,  depositing  a  nonzero  contents
                 requests system shutdown.

 31     RLWORD  KS  only;   used  for  front-end  communication,  flags,
                 keep-alive, etc.  (see PROKS).  Unused on KL.

 32     CTYIWD  KS only;  used for front-end communication, used for the
                 CTY input location.  Unused on KL.

 33     CTYOWD  KS only;  used for front-end communication, used for the
                 CTY output location.  Unused by KL.

 34     KLIIWD  KS only;  used for front-end communication, used for the
                 KLINIK input location.  Unused by KL.

 35     KLIOWD  KS only;  used for front-end communication, used for the
                 KLINIK output location.  Unused by KL.

 36       --    Unused/reserved.  Holds KS RHBASE during boot.

 37       --    Unused/reserved.  Holds KS unit number during boot.

 40     .JBUUO  Monitor's location 40.  Holds KS tape info during boot.

 41     .JB41   Monitor's LUUO dispatch word.
                 Contains XPCW LUUBLK.

 42-43    --    Unused/reserved.

 44     .JBREL  Job Data Area word filled in by LINK.  Contains 777.

 45-67    --    Unused/reserved.

 70     PWRTRP  Location executed by the front-end on powerfail restart.
                 Contains JRST PWRRST.

 71     RLDADR  Executed  by  the  front-end  on  certain   (keep-alive)
                 reloads.   APRSRV  demands  this  location be PWRTRP+1.
                 Contains XPCW RLODPC which winds up  at  RLDHLT  for  a
                 KPALVH BUGHLT.

 72     EDDTF   Retain EDDT in core if contents is one.

 73     CRSTAD  Is supposed to contain date/time of last crash.  Code in
                 STG  checks  it  to  decide  to  restore  the data from
                 BUGHAD.  During system startup for KL-10s the  word  is
                 used   to   set   the   reload  date/time  if  nonzero.
                 Apparently it gets no real  use  on  KS-10s.   Contains
                 zero while system is in normal operation.

 74     .JBDDT  JOBDDT location.
                 Contains DDT (EDDT entry point).

 75     .JBHSO  Unused/reserved.

 76     DBUGSW  BUGHLT action switch  word  (0=unattended;   1=attended;

 77     DCHKSW  BUGCHK action switch word.

100-107   --    Reserved for use by the front-end command language.

110     STSBLK  KL-Status block pointer, virtual address.  Contains zero
                 if status reporting is not enabled.

111       --    Physical address (MAP) of above virtual address.

112     .JBEDV  Pointer to Exec Data Vector
                 Contains MONEDV.

113-114   --    Unused/reserved.

115-117   --    Unused/reserved.

120     .JOBSA  TOPS-10 style start address.
                 Contains EVGO.

121-132   --    Unused/reserved.

133     .JBCOR  Job Data Area location set by LINK.  LH contains highest
                 low  segment  address loaded with data.  RH refers to a
                 SAVE argument for highest page.

134-136   --    Unused/reserved.

137     .JBVER  Job Data Area version number word.
                 Contains current monitor version number.

140     EVDDT   Monitor startup transfer vector;  enter EDDT.
                 Contains JRST DDTX.

141       --    Reset and go to EDDT location.
                 Contains JRST SYSDDT.

142     EVDDT2  Copy of 140.
                 Contains JRST DDTX.

143     EVSLOD  Entry to initialize file system, used for installation.
                 Contains JRST SYSLOD.

144       --    Unused;  contains zero.

145     EVRST   Restart the system location.
                 Contains JRST SYSRST.

146     EVLDGO  Reload and start the system location.
                 Contains JRST SYSGO.

147     EVGO    Start the monitor location.
                 Contains JRST SYSGO1.

150     DDTPRS  DDT present flag;  EDDT is present if nonzero.
                 Contains -1 initially; cleared later if EDDTF is not set.

151     BUTRXB  Defined in BOOT and STG but not  used  (BOOT  reads  the
                 disk   address  of  the  Root-directory  from  the  HOM
                 blocks).  Contains zero.

152     BUTMUN  Defined in BOOT and STG but not  used  (BOOT  reads  the
                 values  from  the  HOM blocks, and uses variable MAXUNI
                 instead).  Contains zero.

153-162 BUTDRT  Defined in BOOT and STG but not used (BOOT uses internal
                 variable  DSKTAB  for  logical  to  physical  structure
                 mapping).  Contains zeros.

163-201 BUTCMD  ASCIZ file  name  of  monitor;   used  for  booting  the
                 swappable monitor with calls to VBOOT for segments.

202     BUTPGS  Start,,End virtual addresses of VBOOT  pages.   Used  to
                 reference and finally unlock/destroy VBOOT pages.

203     BUTEPT  Contains in LH:  Address of the VBOOT EPT page.
                 RH:  Address of the VBOOT page table page.

204     BUTPHY  Contains in LH:  Minus number of pages to map.
                 RH:  Address of first page to map  (for  the  monitor).
                 Typically  contains -5,,773777 for three pages of code,
                 a file data page and an index block  page.   Used  with
                 the value in BUTVIR.

205     BUTVIR  Virtual address of first page of BOOT to map.  Typically
                 will contain 773000.  Used in conjunction with BUTPHY.

206     BOOTFL  BOOT flags word, 0 => normal, nonzero =>  special  boot.
                 The  contents  is supposed to be the index into a table
                 (BOOTD) designating how to boot the swappable  monitor.
                 An ILBOOT BUGHLT results if the index is too large.  In
                 the SYSGO routine the value IRBOOT is put into  BOOTFL;
                 the table BOOTD contains entries of JRST GSMDSK for all
                 entries but the IRBOOT offset, which has JRST GSMIRB.

207     DINFSW  BUGINF action switch word.

210-237 PHYPZS  Formerly  used  for  page  zero  I/O  use   by   PHYSIO.
                 Currently unused, contain zero.

240-777   --    Not used, contain zero.
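
     Words such as BUTPHY (location 204) hold a negative count in the left
half; DDT displays the typical value as -5,,773777.  A small Python
sketch of reading such a word, assuming 18-bit two's-complement halves;
the function name is hypothetical.

```python
# Sketch: decode a word whose left half is a signed 18-bit count.

def signed18(half: int) -> int:
    """Interpret an 18-bit halfword as a two's-complement integer."""
    half &= 0o777777
    return half - (1 << 18) if half & (1 << 17) else half


if __name__ == "__main__":
    butphy = (0o777773 << 18) | 0o773777   # shown by DDT as -5,,773777
    pages = signed18(butphy >> 18)
    first_page = butphy & 0o777777
    print(pages, oct(first_page))          # -5 0o773777
```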

                    Known Hardware Deficiencies List

This is a collected list of known hardware characteristics which show up
from  time  to  time  as  part  of certain reported problems.  This says
nothing about whether these characteristics are  bugs  or  features,  or
whether  they will ever be fixed or changed, but merely attempts to make
them known internally.

     1.  DZ11 - Cannot set the speed to zero in the hardware,  can  only
         turn off the receiver.

     2.  TM02 - Can generate bad parity, which it passes to memory, where
         it causes system memory parity errors when the data is
         referenced.  This is still seen with Rev 12 of the RH20.

     3.  TM03 - A chip race condition has been known to occur in which a
         function register holds the wrong value because it has not yet
         settled.  This generates a device error that appears transient;
         i.e., typing a carriage return to DUMPER makes it retry the
         read, which then succeeds.

     4.  TM03 - ANSI ASCII was  not  included  in  the  hardware  format

     5.  TM03 - When using industry-compatible  mode,  reads  not  of  a
         multiple of four bytes will produce strange results.  The bytes
         are counted, but the extra bytes are  not  written  to  memory,
         leaving garbage.

     6.  DX20 - There is a race condition in which the DX20 generates an
         interrupt request on channel 5 for some condition, but the code,
         which is already working with the DX20, handles the condition,
         so the DX20 lowers its request.  However, the KL has latched the
         interrupt and tries to process it, but no one responds.  The KL
         then tries the 40+2n type, which occasionally gives a PI5ERR.

     7.  VT100 - on a VT100 without the extended memory, one can confuse
         the  internal  microprogram enough to have it clear sections of
         the screen on Control-U, Control-R, etc.

     8.  RH20 - perfectly willing to store bad parity data  into  memory
         until Rev 12.  May still do so.

     9.  DX20 - is unwilling to allow registers to be examined after  it
         has  started  I/O.   Can  cause  register  access errors if not
         programmed in correct sequence.

    10.  LP20 - at least one of the printers fails to go  off-line  when
         there  is  anything  in the print line buffer, even if the drum
         gate is opened.

    11.  KS-10 Front End - Rev.  3.  exhibits problems with  the  KLINIK
         line.   If  the  link is in use, it is possible to lock out the
         CTY.  There are problems with the password check on  subsequent
         tries, and problems with line hang-up.

    12.  KS-10 Front  End  -  Rev.   3.   exhibits  some  problems  with
         powerfail restart.  If the power returns in less than 3.5
         seconds or so, the restart will hang.  In addition, if Rev. 3
         and Rev. 2 boards are mixed, there is no powerfail restart or
         reload capability.

    13.  DX20/TU71 - the DX20 microcode does not set the 556 bpi density
         correctly   for   TU71  (7-track)  drives.   This  can  be  set
         successfully from the maintenance panel.

    14.  TM03 - If an error occurs while rewinding, the monitor may be
         left waiting for the rewind to complete, leaving the tape
         unusable.  The easiest way to clear this condition is to reset
         the TM03, which the customer can most easily do by powering it
         down and back up.

    15.  KS10 - during a forced reload, the halt status block is written
         twice,  first when halting and second when rebooting;  thus the
         second time wipes any valuable data from the first time.   It's
         once again the 8080 that's responsible.

^Z      ;enter USER mode
^\      ;enter CONSOLE mode
MK XX   ;Marks microcode word at CRAM address XX (sets bit 95)
UM XX   ;Unmarks Microcode at CRAM address XX
MB      ;load only bootstrap of currently selected magtape
LA XX   ;Load/set KS10 Memory Address
LI XX   ;Load/set I/O address
LK XX   ;Load/set 8080 address
LC XX   ;Load/set CRAM address to be written/read
EM      ;Examine KS10 Memory (last Memory location specified)
EM XX   ;Examine KS10 Memory location XX
EN      ;Examine Next (either from last EK, EM or EI)
EB      ;Examine BUS and 8080 control registers
EI      ;Examine I/O (last I/O address specified)
EI XX   ;Examine I/O address XX
EK      ;Examine 8080 location
EK XX   ;Examine 8080 address XX
DM XX   ;Deposit KS10 Memory last addressed, XX data (36 bits)
DN XX   ;Deposit next (depending on last DK, DM or DI) XX data
DB XX   ;Deposit BUS, XX data (36 bits)
DI XX   ;Deposit I/O, XX data (16,18 or 36 bits)
DK XX   ;Deposit XX (8 bits) into 8080 (Data can only be deposited
        ;in RAM addresses)
CS      ;CPU clock start
CH      ;CPU clock halt
CP XX   ;CPU clock pulse (XX=NR of pulses -- default 1 pulse)
SI      ;Single Instruction
LF XX   ;Load diagnostic write function (0-7) specifying 12 bits of
        ;microcode (see note at end ****)
DF XX   ;Deposit Field, write microcode bits according to last LF-command
EC      ;Examine CRAM: current control register, no clocks,
        ;current location used as address
EC XX   ;Examine CRAM at address XX
DC XX   ;Deposit CRAM, XX is at least 32 octal characters. Address 
        ;previously loaded by LC command
EX XX   ;EXecute KS10 instruction XX
ST XX   ;STart KS10 at address XX. Console enters user mode
SM XX   ;Start microcode at XX (SM 1 causes dump of HALT-status block !!) 
        ;Default is 0 -- Start microcode
HA      ;HALT KS10 (execute HALT-instruction -- causes microcode to
        ; write HSB and then to enter HALT-loop)
SH      ;SHUTDOWN (deposit non-zero data in memory location 30)
        ; causing TOPS20 to shut down
CO      ;Continue (causes microcode to leave HALT-loop)
PE X    ;Parity Enable (0=disable, 1=DRAM-par, 2=CRAM-par
        ; 4=clock-par error stop, 5=DPE/DPM, 6=CRA/CRM, 7=enable all)
CE X    ;CACHE enable (0=OFF, 1=ON, <CR>=show current state)
TE X    ;CPU timer (1 MSEC) enable (0= OFF, 1=ON, <CR>=show current state)
TP X    ;CPU TRAPS enable (0=OFF, 1=ON (enables paging),
        ;<CR>=show current state)
LT      ;Lamp Test, lights three lamps of front panel
RC      ;Read CRAM direct, functions 0-17
        ; (no resets, no load diag adr, no CPU clock) (see note at end ****)
EJ      ;Examine Jumps -- prints CRAM address signals (Current CRAM address, 
        ;next CRAM address, jump address, subroutine return address)
TR XX   ;TRACE - repeats CP and EJ commands until any character typed
        ;XX (if typed) is desired CRAM stop-address
PM      ;Pulse Microcode (issue single CP and EJ)
ZM      ;Zero KS10 MOS Memory (beware -- slow)
RP      ;Repeat - repeats last command, or line of commands which it delimits
        ; Any character (except CNTRL-O) typed will stop repeat
        ;EXAMPLE: EM 0, EK 0, EC 0, RP will repeat execution of this line
BT      ;Boot SYSTEM -- load CRAM from designated disk (see DS)
        ; via memory then load monitor boot from disk and start at 1000
BT 1    ;same as BT, but loads diagnostic monitor SMMON and starts at 20000
LB      ;Load Bootstrap from designated disk (see DS)
LB 1    ;Load Bootstrap diagnostic monitor SMMON
DS      ;Disk Select for bootstrap or microcode verification. Command prompts 
        ;to specify UNIT NUMBER (default 0), RHBASE (default 776700), 
        ;and UNIBUS ADAPTER (default 1) to load from when booting
MS      ;Magtape Select for bootstrap or microcode verification. Command 
        ;prompts to specify UNIT NUMBER (default 0), RH BASE (default 772440),
        ;UNIBUS ADAPTER (default 3), SLAVE NUMBER (default 0), and 
        ;DENSITY (default 1600 BPI) of magtape to boot from
MT      ;Magtape Boot system from selected magtape
MT 1    ;BOOT diagnostic monitor SMMAG from magtape
PW      ;clears KLINIK password, or sets it (6 characters max)
BC      ;BOOT Check. PROM code which tests the basic 2020 system
        ; load path from the UNIBUS adaptor into the CRAM via memory.

^U      ;rub out current line
^O      ;switch: first one stops CTY-output, second one resumes CTY-output
^S      ;stops TTY-output and hangs the 8080 waiting for CONTROL-Q (see below)
^Q      ;resumes TTY-output
^C      ;stops whatever the 8080 is doing
RUB-OUT ;rub out previous character typed

NOTE:    Several commands may be put on a single line, separated by commas.
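
A line typed at the KS10> prompt can therefore carry several commands.
If you script or log console sessions, such a line can be split into
its individual commands as in the following Python sketch.  It is
illustrative only: split_console_line is a hypothetical helper, and it
assumes each command is followed by at most one argument, as in the RP
example above.

```python
def split_console_line(line):
    """Split one KS10> input line into (command, argument) pairs.

    Hypothetical helper: assumes commands are separated by commas and
    each command has at most one argument token.
    """
    commands = []
    for segment in line.split(","):
        tokens = segment.split()
        if not tokens:
            continue  # ignore empty segments, e.g. from a trailing comma
        name = tokens[0].upper()
        argument = tokens[1] if len(tokens) > 1 else None
        commands.append((name, argument))
    return commands


print(split_console_line("EM 0, EK 0, EC 0, RP"))
# [('EM', '0'), ('EK', '0'), ('EC', '0'), ('RP', None)]
```

Arguments are left as strings because the 8080 reads them as octal;
convert one with int(argument, 8) if a numeric value is needed.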
TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 69

*****   CRAM Bit Formats

        LF-Command CRAM Bits            RC-Command CRAM Data
        --------------------            ---------------------

        LF      CRAM bits               RC      Data
        --      ---------               --      ------------------------------

        0       00-11                   0       CRAM bits 00-11
        1       12-23                   1       Next CRAM address
        2       24-35                   2       CRAM subroutine return address
        3       36-47                   3       current CRAM address
        4       48-59                   4       CRAM bits 12-23
        5       60-71                   5       CRAM bits 24-35 (Copy A)
        6       72-83                   6       CRAM bits 24-35 (Copy B)
        7       84-95                   7       0s
                                        10      Parity bits A-F
                                        11      KS10 bus bits 24-35
                                        12      CRAM bits 36-47 (Copy A)
                                        13      CRAM bits 36-47 (Copy B)
                                        14      CRAM bits 48-59
                                        15      CRAM bits 60-71
                                        16      CRAM bits 72-83
                                        17      CRAM bits 84-95
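
Each LF function selects one 12-bit slice of the 96-bit microword, so
LF n covers CRAM bits 12n through 12n+11.  When patching a microword
by hand with LF and DF, it helps to pre-split the word into those
slices.  The Python sketch below assumes CRAM bit 0 is the most
significant bit (DEC convention); lf_bit_range and split_microword are
hypothetical helpers, not part of any DEC tool, and the microword is a
made-up test pattern.

```python
CRAM_BITS = 96       # width of one CRAM microword
FIELD_WIDTH = 12     # bits written per LF/DF pair


def lf_bit_range(function):
    """Return the (first, last) CRAM bits selected by LF <function>."""
    if not 0 <= function <= 7:
        raise ValueError("LF function must be 0-7")
    first = function * FIELD_WIDTH
    return first, first + FIELD_WIDTH - 1


def split_microword(word):
    """Split a 96-bit microword into the eight 12-bit LF fields.

    Assumes DEC bit numbering: CRAM bit 0 is the most significant bit.
    The list index is the LF function number.
    """
    fields = []
    for function in range(CRAM_BITS // FIELD_WIDTH):
        _, last = lf_bit_range(function)
        shift = CRAM_BITS - 1 - last  # distance of bit <last> from the low end
        fields.append((word >> shift) & 0o7777)
    return fields


# 32 octal digits = 96 bits, the same width the DC command expects
word = 0o12345670123456701234567012345670
for function, value in enumerate(split_microword(word)):
    first, last = lf_bit_range(function)
    print(f"LF {function}   bits {first:02d}-{last:02d}   DF {value:04o}")
```

Each printed DF value is four octal digits, i.e. exactly the 12 bits
one LF/DF pair writes after the CRAM address has been set with LC.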
TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 70


?BUS            BUS polluted on power-up
?BFO            Input Buffer Overflow
?IL             ILLEGAL Instruction
?UI             Unknown Interrupt
?A/B            A and B copies of CRAM bits did not match
?DNF            Did Not Finish instruction
?BT             device error or timeout during BOOT operation
?DNC            Did Not Complete HALT
?PAR ERR        reports clock freeze due to parity error,
                 and types out READ IO of 100, 303, 103
?RE             memory Refresh Error (MEM BUSY stayed set too long,
                 because it didn't release data on a write to memory)
?CHK            PROM checksum failed
?BC             BOOT Check failed
?RUNNING        CPU clock running (command typed requires clock to be stopped
                 and may fail)
?NDA            received No Data Acknowledge on memory request
?NXM            referenced NoneXistent Memory location
?NBR            Console was not granted BUS on a request
?RA             command Requires Argument
?BN             received Bad Number on input (character typed is not an
                 octal digit)
?KA             KEEP ALIVE failed
?FRC            had a forced reload
?PWL            Password Length error
?IA             Illegal Argument (address out of range, etc.)


BUS 0-35                message header for EB command
KS10>                   prompt message
CYC                     cycle type for DB command
SENT                    data sent to bus
RCVD                    data received on bus
HLTD                    message "HALTED/XXXXXX " where XXXXXX is data
BT SW                   message says BOOTING, using BOOT switch
OFF                     message, says current state is off
ON                      message, says current state is on
>>UBA?                  query for UNIBUS adapter
>>UNIT?                 query for unit to use
>>RHBASE?               query for RH11 base register address to use
>>DENS?                 query tape density
>>SLV?                  query tape slave number
C CYC                   typed on DB-command if COM/ADR cycle blew
D CYC                     "             "      DATA    cycle blew
TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 71


        On an error-condition, detected by the 8080, the
        Fault-light will go on and a message of the form

                ?BT XXXYYY

        will be printed on the CTY.

The following error codes are only "rough" pointers; each can be
caused by any of the following problems:

        Selected unit is not a disk at all
        Wrong unit selected (see DS-command)
        Home blocks not readable or not there
        Home blocks not set by SMFILE for 8080
        8080 File-system garbage

XXX=001 Disk error encountered while trying to read HOME-blocks.
        Can mean incorrect RHBASE specified, wrong UBA selected,
        bad disk drive, or that neither the home block nor the
        alternate home block has the home block ID ("HOM" in sixbit)

XXX=002 Disk error encountered while trying to read the page of
        pointers that makes up the "8080-File-System".
        Can mean pack is not in format for 8080 loading, damaged home
        blocks, bad drive or pack

XXX=003 Disk error encountered while trying to read a page of
        microcode - can mean pack is not in 8080 format, or bad drive or
        pack

XXX=004 Microcode did not successfully start running after a BT, MT,
        MB, or LB command.  This error will occur when an LB is done
        before the system microcode is loaded.

XXX=010 Disk error encountered while trying to read PRE-BOOT

YYY     are the lower 8 bits of the 8080 address of the failing
        "Channel Command List" operation.  At this point it is
        usually a good bet to do an "EI" to get the contents of the
        RH11 register that has the error bits set.
TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 72


The following magtape error codes can point to these problem areas:

        Selected unit is not a magtape at all
        Wrong unit selected (see MS-command)
        Magtape is not bootable (no microcode, no PRE-BOOT)

XXX=001 Error trying to read microcode first page
        Can mean wrong unit selected, wrong RHBASE address, wrong UBA
        selected, wrong slave number, wrong density, bad drive, bad
        controller, bad tape, tape in wrong format

XXX=003 Error trying to read additional pages of microcode

XXX=010 Error trying to read in PRE-BOOT program
        May occur while doing a skip over the microcode file, or
        while reading the PRE-BOOT itself

YYY     see above (disk-section)
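
Because ?BT packs two values into six digits, decoding by hand is
error prone.  The Python sketch below splits the message and looks up
XXX in the disk and magtape lists above.  decode_bt is a hypothetical
helper; it assumes the six digits are octal, and it lists XXX=004
under both disk and magtape because the text says that code applies
after BT, MT, MB or LB.

```python
# XXX codes from the disk and magtape lists above (octal).
DISK_CODES = {
    0o001: "disk error reading the HOME blocks",
    0o002: "disk error reading the 8080 file-system pointer page",
    0o003: "disk error reading a page of microcode",
    0o004: "microcode did not start after BT, MT, MB or LB",
    0o010: "disk error reading PRE-BOOT",
}
MAGTAPE_CODES = {
    0o001: "error reading the first page of microcode",
    0o003: "error reading additional pages of microcode",
    0o004: "microcode did not start after BT, MT, MB or LB",
    0o010: "error skipping the microcode file or reading PRE-BOOT",
}


def decode_bt(message, from_magtape=False):
    """Split a '?BT XXXYYY' message into (XXX, YYY, meaning).

    Assumes the six characters after ?BT are octal digits.  YYY is the
    low 8 bits of the 8080 address of the failing Channel Command List
    operation.
    """
    digits = message.split()[-1]
    if len(digits) != 6:
        raise ValueError("expected six octal digits after ?BT")
    xxx = int(digits[:3], 8)
    yyy = int(digits[3:], 8)
    table = MAGTAPE_CODES if from_magtape else DISK_CODES
    return xxx, yyy, table.get(xxx, "code not listed in this handbook")


print(decode_bt("?BT 001017"))
# (1, 15, 'disk error reading the HOME blocks')
```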


PRE-BOOT is loaded from Disk or Magtape (see 8080 commands DS, MS,
         BT, BT 1, MT, MT 1)

PRE-BOOT is written onto the disk by "SMFILE.EXE"; it is also written on
"standard" diagnostic tapes and on the "MONITOR-INSTALLATION" tapes.

PRE-BOOT is loaded by the 8080 into memory locations 1000 and up, and
starts at 1000.  The ERROR-halts are:

        1001    found "bad" core-transfer address
                 (page 1 is illegal - it would overlay PRE-BOOT)
        1002    Disk Retry error or Magtape Read error
        1003    No RH11 Base Address
        1004    Magtape Skip failure

At ERROR-halt time the following memory locations contain useful
information:

                Disk-Booting                    Magtape-Booting
                ------------                    ---------------

        100     "8080" disk-address             Not used
        101     Memory transfer address         same
        102     T3, selection pickup pointer    same
        103     RPCS1-register                  MTCS1-register
        104     RPCS2-register                  MTCS2-register
        105     RPDS - register                 MTDS - register
        106     RPER1-register                  MTER1-register
        107     RPER2-register (RP06 only)      Not used
        110     RPER3-register                  Not used
        111     UBA Page RAM loc 0              same
        112     UBA-status register             same
        113     Version Nr. of PRE-BOOT         same

        Note: The Version Nr. of PRE-BOOT will be the same as the Version Nr.
        of SMFILE. The "8080" disk-address is in the form " CYL SEC SURF "


        EM 77
TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 74

8080-Communication-Area (KS10 Memory)

The 8080 maintains and services an in-core communication area.
Currently used are words 31 to 40.  See PROKS.MAC for more info.

Word Nr.                Meaning
---- ---                -------
  31            Keep Alive and Status word
  32            KS-10 CTY input word (from 8080)
  33            KS-10 CTY output word (to 8080)
  34            KS-10 KLINIK user input word (from 8080)
  35            KS-10 KLINIK user output word (to 8080)
  36            BOOT RH-11 Base Address
  37            BOOT Drive Number
  40            Magtape Boot Format and Slave Number

Word 31         Keep Alive and Status word
---- --
Bit 4           Reload Request
Bit 5           Keep Alive active
Bit 6           KLINIK active
Bit 7           PARITY Error detect enabled
Bit 8           CRAM Parity Error detect enabled
Bit 9           DRAM Parity Error detect enabled
Bit 10          CACHE enabled
Bit 11          1 msec enabled
Bit 12          TRAPS enabled
Bits 20-27      Keep Alive counter field
Bit 32          BOOT SWITCH BOOT
Bit 33          POWER FAIL
Bit 34          Forced RELOAD
Bit 35          Keep Alive failed to change
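
The bit numbers above use PDP-10 numbering, in which bit 0 is the
most significant bit of the 36-bit word and bit 35 the least
significant.  The Python sketch below decodes a value of word 31 read
with EM 31 (or from a dump).  mask and decode_word31 are hypothetical
helper names, and the example word is made up.

```python
def mask(bit):
    """PDP-10 numbering: bit 0 is the MSB of a 36-bit word, bit 35 the LSB."""
    return 1 << (35 - bit)


# Single-bit flags of word 31, as listed above.
STATUS_BITS = {
    4: "Reload Request",
    5: "Keep Alive active",
    6: "KLINIK active",
    7: "PARITY Error detect enabled",
    8: "CRAM Parity Error detect enabled",
    9: "DRAM Parity Error detect enabled",
    10: "CACHE enabled",
    11: "1 msec enabled",
    12: "TRAPS enabled",
    32: "BOOT SWITCH BOOT",
    33: "POWER FAIL",
    34: "Forced RELOAD",
    35: "Keep Alive failed to change",
}


def decode_word31(word):
    """Return (names of set flags, keep-alive counter from bits 20-27)."""
    flags = [name for bit, name in sorted(STATUS_BITS.items()) if word & mask(bit)]
    keep_alive = (word >> (35 - 27)) & 0o377  # bit 27 is 8 places from the LSB
    return flags, keep_alive


example = mask(5) | mask(10) | (0o123 << 8) | mask(34)
print(decode_word31(example))
# (['Keep Alive active', 'CACHE enabled', 'Forced RELOAD'], 83)
```
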

Word 32         KS-10 CTY input word (from 8080)
---- --
Bits 20-27      0 -- no action, 1 -- CTY character pending
Bits 28-35      CTY-character

Word 33         KS-10 CTY output word (to 8080)
---- --
Bits 20-27      0 -- no action, 1 -- CTY character pending
Bits 28-35      CTY-Character

Word 34         KS-10 KLINIK user input word (from 8080)
---- --
Bits 20-27      0 -- no action, 1 -- KLINIK character,
                2 -- KLINIK active, 3 -- KLINIK carrier loss
Bits 28-35      KLINIK-Character

Word 35         KS-10 KLINIK user output word (to 8080)
---- --
Bits 20-27      0 -- no action, 1 -- KLINIK character, 2 -- Hangup request
Bits 28-35      KLINIK-Character

OUTPUT process KS10 ==> 8080

 The KS10 loads the character and flag into word 33 and sets the 8080
   interrupt.  The 8080 examines word 33, gets the character, clears the
   interrupt, sends the character to the hardware, clears word 33 and
   sets the KS10 interrupt.

INPUT process 8080 ==> KS10

 The 8080 gets a "TTY-char available" interrupt, gets the character,
  delivers it with its flag(s) into the input word (32), and sets the
  KS10 interrupt.
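
The CTY words 32 and 33 hold a flag in bits 20-27 and the character in
bits 28-35, so the character is the low 8 bits and the flag the 8 bits
above it.  The Python sketch below shows that packing and steps
through the output handshake described above.  pack_cty_word,
unpack_cty_word and the comm dictionary are illustrative stand-ins,
and the interrupts themselves are not modelled.

```python
CHAR_PENDING = 1  # bits 20-27 value meaning "character pending"


def pack_cty_word(char, flag=CHAR_PENDING):
    """Build a word 32/33 value: flag in bits 20-27, character in bits 28-35."""
    return ((flag & 0o377) << 8) | (ord(char) & 0o377)


def unpack_cty_word(word):
    """Return (flag, character) from a word 32/33 value."""
    return (word >> 8) & 0o377, chr(word & 0o377)


# Walk through the KS10 ==> 8080 output steps on word 33.
# Only the word contents are modelled, not the interrupts.
comm = {0o33: 0}
comm[0o33] = pack_cty_word("A")           # KS10 loads character and flag
sent = None
flag, char = unpack_cty_word(comm[0o33])  # 8080 examines word 33
if flag == CHAR_PENDING:
    sent = char                           # 8080 sends the character to the hardware
    comm[0o33] = 0                        # 8080 clears word 33
print(oct(pack_cty_word("A")), sent, comm[0o33])
# 0o501 A 0
```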

***NOTE: Additional information on KS10 console commands can be found 
         in the KS10 MAINTENANCE GUIDE
TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 76

                              CRASH DUMPS

Each BUGHLT automatically dumps the system core image into
PS:<SYSTEM>DUMP.EXE.  After the system is reloaded, SETSPD copies the
previous contents of DUMP.EXE into DUMP.CPY if there is sufficient room
on the disk.  DUMP.CPY is not deleted, so you may find several
generations of it.

If you have not set automatic reload, you can dump the crash by hand
by typing /D to the BOOT> prompt.  To get into BOOT when reloading the
system, bring the system up from the switch register rather than
pressing <ENABLE> <DISK> on the console.  See the Operators Guide for a
discussion of the meaning of the various switches on the DEC-20.
TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 77

                             CRASH ANALYSIS

When analyzing software or software/hardware problems, first be sure
you have the proper tools:

     1.  A SWSKIT on magtape

     2.  A full copy of the current  release  microfiche  MONITOR  and


     4.  A SYSERR manual.

     5.  A listing of  the  SYSERR  log,  especially  if  hardware  is

     6.  The CTY output for BUGHLTs, BUGINFs or other problem
         indications, or an accurate reproduction of this information.

     7.  Any other manuals you may need  for  reference  such  as  the
         proper  version  Installation  Guide, Operators Guide, System
         Managers Guide, etc.

     8.  A TOPS-20.BWR file.

You will need the SWSKIT and perhaps listings of the latest versions
of monitor modules in case the microfiche are not up to date.  FILDDT
is on the customer's distribution tape.

Be sure you have analyzed the SYSERR log.  Be sure, also, that you
have looked up the BUGHLT and/or BUGCHKs in question in the listings
(microfiche) and have at least read the comments around them.  Tracing
how the BUGHLT was called is usually a good idea.  If you happen to be
without a GLOB (provided on microfiche), you can find the BUGHLT tag of
interest in the monitor as follows:

        $ST 140
        ILPP3?                  ; BUGHLT of interest followed by "?"
        PAGEM G                 ; it is defined in PAGEM and is global

Some other useful bits of information.  The microfiche include a GLOB
listing, which contains a list of all the global symbols in the
monitor.  Most of the symbols are defined in the module STG.MAC.  If
you don't know a tag name but want to look at the storage for, say,
DTEs, look through STG.  STG (which stands for storage) also contains
a small portion of code, mostly to do with restart, start, auto
reload, dispatches for PI channels, and a few scheduler tests.  Note
that some things may be defined in PROLOG, and of course many are
defined throughout the monitor.  You may also want a listing of MACSYM
to understand the macros you see while reading the monitor listings;
MONSYM is also useful at times.  Be sure you know how PARAMS has been
changed, if it has.  See BUILD.MEM on the distribution tapes for the
currently distributed information on how to change various system
parameters in PARAM0.MAC.  Be sure that you know about any variables
that the site may have changed in STG as
TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 79

                         EXAMINING THE MONITOR

Debugging a complex, multi-process software system is largely a matter
of absorbing sufficient knowledge, experience and folklore about the
particular system, with a considerable element of personal preference,
or 'taste', also involved.  This document is a cursory description of
the features built into the system to aid debugging, and of such
folklore as can be described in written English.

There are four different versions of DDT that may be used to examine
the monitor.  Each is used for a different purpose and has special
capabilities.  The versions of DDT are:

     1.  UDDT (user DDT) used to examine or modify the MONITR.EXE file.

     2.  MDDT (monitor DDT) used to  examine  or  modify  the  running
         monitor under timesharing.

     3.  EDDT (exec DDT) used to examine or modify the running monitor
         from the CTY in a stand-alone mode.

     4.  FILDDT used to examine dumps.

All the DDTs are versions of TOPS-20 DDT documented in the TOPS-20
DDT manual, and have all of the features described in the manual.  See
also the document DDT41.MEM.

All four versions of DDT are used in the same way, as described later;
however, each version is started differently.
TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 80


To use UDDT to modify the MONITR.EXE file on the system, give the
following EXEC commands:

        @START 140      or on Release 4 systems, @DDT

This causes EDDT to start in user mode.  This is the same DDT that  is
used  when  examining  any program.  You may now look at or change any
part of the monitor.  If you make changes to the monitor and  want  to
save  it,  you should get back to the EXEC by typing ^Z.  Then you may
save the monitor.

You will probably have to be enabled in order to save the monitor back
in  <SYSTEM>.   This  is  the  safest, best, and recommended method of
putting patches into the monitor.
TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 81


A version of DDT which runs in monitor space  is  available.   It  can
examine  and  change  the  running  monitor,  and  can breakpoint code
running as a process but not at PI or scheduler level.  When  patching
or  breakpointing  the  swappable monitor, the normal write protection
must be defeated, either by setting DBUGSW to 2 on startup, or calling
SWPMWE.  If you insert breakpoints with MDDT, remember that monitor code
is reentrant and shared, so the breakpoint could be hit by any other
process in the system.  In that event, the other process will most
likely crash, since it will be executing a JSR to a page full of zeros.

To use MDDT you must have WHEEL or OPERATOR capabilities.   You  first
issue the EXEC command:


                ; You are now in the mini-exec and receive a  prompt
                ; of MX>.  Now you give the "/" command:
                ; You are now put into MDDT.  To return to the  EXEC
                ; you can  issue  a ^Z  or  a ^C  which  produces  a
                ; message like "INTERRUPT AT 17372" and returns  you
                ; to the mini-exec.  If  you type a  ^P in MDDT  you
                ; will get a  message, "ABORT", and  be returned  to
                ; the mini-exec.  Once you have gone into the
                ; mini-exec, the CONTROL-P interrupt is enabled and typing this
                ; character will return you to the mini-exec.   This
                ; is a  good thing  to use  when debugging  programs
                ; that do  CONTROL-C trapping.   From the  mini-exec
                ; you may give either:
                ; or
                ; The S is filled out as START and the E as EXEC.
                ; Both of these commands will return you to the
                ; EXEC. See the document EXEC-DEBUGGING.MEM for more
                ; about ^P and getting  out of the  EXEC to MX>  and
                ; returning from MX> to either your copy of the EXEC
                ; or the system EXEC.

                ; You may also give the command:


                ; From MDDT to return  directly to the EXEC.   While
                ; in MDDT you may examine  any core location in  the
                ; running monitor.  You may also change any location
                ; in   the  resident  monitor  (done  frequently  by
                ; accident).  If  you  wish  to change  any  of  the
                ; locations in the swappable  monitor you must  give
                ; the command:

                ; To write enable the monitor.  After you have  made
                ; your changes you must give the command:


                ; to write protect the monitor again.

     MDDT may also be entered from process level via JSYS:

        JSYS 777$X
        MDDT%$X ; will enter MDDT from the context of the current process

If you wish to examine the system from the EXEC's inferior fork monitor


        JSYS 777$X

To return to user context:


Use SETMPG to map pages to this context:

        Page 677 has traditionally been used for this, but any unused
        page may be used.  To make sure that the page is currently
        unused, type:

        ADDRESS/   ?    ; the question mark from DDT indicates that the
                        ; page is nonexistent.

        When the destination page has been found, set up AC2 as:

        AC2/ ACCESS,,677000

        If the source page has its own SPT slot:

        AC1/SPT INDEX

If  the  source page does not have its own SPT slot, it will belong to
either a file or process page table.  It will  be  represented  as  an
index into this page table:

        Access = read and/or write access
        Read/Write access = 140000 in LH
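
The AC2 argument is simply two 18-bit halves.  The Python sketch below
only shows how the value ACCESS,,677000 is formed for read/write
access to page 677; it does not call SETMPG, and halfwords is a
hypothetical helper that mimics DDT's LEFT,,RIGHT notation.

```python
def halfwords(left, right):
    """Form a 36-bit word from two 18-bit halves, like DDT's LEFT,,RIGHT."""
    return ((left & 0o777777) << 18) | (right & 0o777777)


READ_WRITE = 0o140000  # read/write access bits in the left half
DEST_PAGE = 0o677      # traditional destination page; any unused page will do

ac2 = halfwords(READ_WRITE, DEST_PAGE << 9)  # a page is 1000 (octal) words
print(f"{ac2 >> 18:06o},,{ac2 & 0o777777:06o}")
# 140000,,677000
```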

Therefore, to map a page, call with either:




        AND SAY:


The page will then be mapped to page 677.  When you examine locations
677000-677777, you will be looking at the contents of the page.

If you desire to map another page into this slot, merely  call  SETMPG
again  with arguments for the new page.  You need not first un-map the
old page.   However,  when  you  are  finished,  page  677  should  be
un-mapped in the following manner:



Calling SETMPG incorrectly can crash the system.  Be CAREFUL!  Do  not
use  SETMPG  on  a  time  sharing  system  if  a  crash will cause bad
TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 84



               Not to be confused with the ^EEDDT command,
               which gets into the UDDT used with the command
               processor.  See the separate document on
               EXEC DEBUGGING for that.

To  get  into  EDDT  you  must  bring  the   system   up   using   the
switch-register.    See   the   DECSYSTEM-20  Operators  Guide  for  a
discussion of switches.  Go through the KLINIT dialog and when you get
the prompt BOOT>, respond with:


The "/L" command causes the monitor to be  loaded,  but  not  started.
The  "/G141"  starts  the  monitor at location 141, which is a jump to
EDDT.  You can use EDDT like UDDT under timesharing on the  MONITR.EXE
file by giving the following commands:

        $START 140

EDDT is linked into the monitor and is always there.  You may also get
to EDDT from MDDT by issuing the following:


from MDDT.  This stops timesharing.  To resume timesharing and/or get
back to MDDT, give one of the commands:

        MDDT$G                  ; back to MDDT
        MRETN$G                 ; back to normal timesharing

Breakpoints may be inserted in the resident monitor with EDDT, but not
in  the swappable monitor in general, because its pages may be swapped
out and be unavailable to EDDT.  You can bring them in by typing:

        SKIP LOC$X              ; where LOC is some address not in core
TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 85

There are some locations in the monitor that are very useful when
using EDDT for debugging.  They must be set before you go on to start
the monitor.

They are:

        EDDTF   1        keep EDDT in core when system comes up
                0        delete DDT when system comes up (default)

        DBUGSW  0        do not stop on BUGHLTs, crash and reload
                1        stop on BUGHLTs (hit EDDT breakpoint)
                2        write enable swappable monitor,
                         do not start up SYSJOB, and stop on
                         BUGHLTs.  Also, CHECKD is not run
                         automatically on startup.

        DCHKSW  0        do not stop on BUGCHKs (default)
                1        stop on BUGCHKs (hit EDDT breakpoint)

        DINFSW  0        do not stop on BUGINFs (default)
                1        stop on BUGINFs (hit EDDT breakpoint)

In addition, the symbol GOTSWM appears in the code just after the
swappable monitor is loaded.  So, if you want to debug the swappable
part of the monitor, you must put a breakpoint at GOTSWM (so that the
swappable part is in core) by:


Then start the monitor by:



CALL SWPMLK locks the swappable monitor in core for debugging.  You
must have more than 96K of core to give this command, since the
resident and swappable monitor together are larger than 96K.  To start
up the monitor after you have gone into EDDT and set up your
breakpoints (remember that the last two are used for BUGHLT and
BUGCHK), give the

If you are in EDDT and DBUGSW is not 2 (that is, the monitor is write
protected), you can use the routines SWPMWE and SWPMWP to write enable
and write protect the monitor.  For example, type CALL SWPMWE$X in DDT.
TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 86


FILDDT is distributed on the customer software tape.

The following is a chewed-up (edited) version of the FILDDT.HLP file.


Loads a file for DDT to examine.  If you are looking at a monitor dump
in DUMP.CPY, you must give that name explicitly: FILDDT looks for
MUMBLE.EXE, not MUMBLE.CPY, so DUMP<ESC> will either tell you that
there is no such file or load DUMP.EXE.  When looking at a dump, if you
wish to load the symbols you must first issue the LOAD command followed
by the GET command.  Be sure that the monitor from which you load the
symbols is the same monitor, and the same version, as the one that was
dumped.  That is, don't use MONMED symbols with a MONBCH dump, etc.


Reads specified file and builds internal symbol table.  This  must  be
the  first command to FILDDT before "GET" when looking at a dump.  You
will most probably use <SYSTEM>MONITR.EXE which would  have  been  the
monitor running at the time of the dump.


Returns to command level.  You then may type a save command if a  load
command  was  just done to preload symbols.  You will get a version of
FILDDT that has the symbols you just loaded in it  so  you  no  longer
need to "LOAD" symbols.  You now have a monitor specific FILDDT, which
was common practice  for  TOPS-10,  but  is  not  generally  done  for


Types something like this text.


Allows writing on an existing file specified by a GET.


Assumes file is raw binary (i.e.  no ACs, and not an EXE file).

        EP$U    Sets monitor context for FILDDT mapping.  EP is a symbol
                which is equal to the page number of the EPT.  (Rel 4)

   <CTRL/E>     Returns to FILDDT command level.


The resident monitor may be looked at without any difficulty, but the
swappable monitor may not have been in core at the time of the dump.
If the value of a symbol is in the swappable monitor, you must go
through the monitor map to find where the location really is.  The
location MONCOR contains the number of pages of resident monitor, and
the location SWPCP0 contains the first page of real core for swapping.
So if the value of the symbol is greater than the contents of MONCOR
times 1000 (octal), it is in the swappable monitor.
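
The MONCOR test can be written out as below.  This Python sketch uses
a hypothetical helper name and a made-up MONCOR value; note that it
compares with >=, on the assumption that the resident monitor occupies
pages 0 through MONCOR-1, which makes address MONCOR*1000 the first
swappable location.

```python
PAGE_SIZE = 0o1000  # 512 words per page; "1000" in the text is octal


def in_swappable_monitor(symbol_value, moncor):
    """True if a monitor address lies above the resident monitor.

    moncor is the contents of MONCOR (number of resident pages).  Assumes
    the resident monitor occupies pages 0 through MONCOR-1, so the first
    swappable address is MONCOR * 1000 (octal).
    """
    return symbol_value >= moncor * PAGE_SIZE


# Hypothetical MONCOR contents of 200 (octal) resident pages:
print(in_swappable_monitor(0o324621, 0o200))  # True
print(in_swappable_monitor(0o001234, 0o200))  # False
```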

If the page of the swappable monitor you want to look at was in core,
it will probably not be at the location its address refers to, since
the dump is an image of core and no relocation of pages is done.  To
find where a symbol really is in the dump, first type the symbol
followed by an "=".  DDT will respond with the value of this symbol.
The value of the symbol can be divided into two fields of three octal
digits each.  The high-order three digits are the page number and the
low-order three digits are the offset into the page.

If the value of the symbol is 324621 the high order three digits, 324,
are  the  page  number  and  the  low order three digits, 621, are the
offset into the page.  To find the location of the page in question in
the  dump you must look at the monitor map indexed by the page number.
For example:


would  give you the monitor map word for page 324.  This word contains
some protection bits for the page and the address of the page when the
dump was taken.

The page may have been in core, on the swapping area or on the disk at
the time of the dump.

If bits 14-17 in the monitor map word are non-zero, the page was on
the swapping area or disk and is no longer available.

If bits 14-17 are zero, the page was in core, and the right half of
the word contains the page number in the dump of the page you are
looking for.  (The dump program overwrites the last several pages of
memory, so the dump does not contain those last pages.)
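
The page/offset split and the monitor-map check described above can be
sketched in Python as follows.  split_address, field and dump_address
are hypothetical helpers; the example uses the MMAP+324 value
104000,,256 from the worked example in the text.

```python
def split_address(addr):
    """Split an 18-bit address into (page, offset): high and low 3 octal digits."""
    return addr >> 9, addr & 0o777


def field(word, first, last):
    """Extract PDP-10 bits first..last of a 36-bit word (bit 0 = MSB)."""
    width = last - first + 1
    return (word >> (35 - last)) & ((1 << width) - 1)


def dump_address(symbol_value, mmap_word):
    """Relocate a swappable-monitor address to its location in the dump.

    mmap_word is the monitor map (MMAP) entry for the symbol's page.
    Returns None if bits 14-17 are non-zero: the page was on the
    swapping area or disk and is not in the dump.
    """
    _, offset = split_address(symbol_value)
    if field(mmap_word, 14, 17) != 0:
        return None
    dump_page = mmap_word & 0o777777  # right half: page number in the dump
    return (dump_page << 9) | offset


mmap_324 = (0o104000 << 18) | 0o256  # MMAP+324 containing 104000,,256
print(split_address(0o324621) == (0o324, 0o621))  # True
print(f"{dump_address(0o324621, mmap_324):o}")     # 256621
```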

If the page was in core, the new address of the symbol you are looking
for can be found by taking the page number from the monitor map word
and appending the offset into the page to it.  For example, if
MMAP+324 contains 104000,,256, then the new address of our symbol
would be 256621.

All addresses in the swappable monitor must be resolved in this
manner.  In addition, addresses of 600000 and above are in the JSB or
PSB (the PSB is page 777) and must be resolved by finding the page
containing the JSB or PSB of the process that was running when the
dump occurred.  There are some locations and tables in the monitor
that make this easy:

        Symbol  Index   Contents
        ------  -----   --------
        Symbol  Index   Contents
        FORKX   none    Number of the fork that was running at the time of
                        the dump, -1 if in the scheduler.
        JOBNO   In PSB  Job number to which current fork belongs.
        FKJOB   Fork #  Job number,,SPT index of JSB
        JOBDIR  Job #   Logged-in directory number
        JOBPT   Job #   Controlling TTY number,,top fork number
        FKSTAT  Fork #  Test data,,address of fork wait routine
        FKPGS   Fork #  SPT index of page table,,SPT index of PSB

SPT  indexes  are  indexes into a share pointer table starting at SPT.
To find the PSB of fork 20, you  first  look  at  FKPGS+20.   If  this
location  contains 425,,426, the word at SPT+426 is the pointer to the
PSB.  This pointer can point to disk, swap area,  or  a  page  in  the
dump.   If  bits  14-17 are zero it is a pointer to a page in the dump
and the right half of the SPT word is the page number of  the  PSB  in
the dump.
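
The same chain of lookups can be sketched in Python.  This is only an
illustration: the dump is modelled as a dictionary from address to
36-bit word, and the FKPGS and SPT addresses below are made-up values.
In a real dump you would get them by typing FKPGS= and SPT= to FILDDT.

```python
# Illustrative sketch: find the dump page holding a fork's PSB.
# The dump is modelled as {address: 36-bit word}; addresses are hypothetical.


def rh(word: int) -> int:
    return word & 0o777777


def psb_page(dump: dict, fkpgs: int, spt: int, fork: int):
    """Follow FKPGS+fork to the SPT entry for the PSB and decode it."""
    psb_index = rh(dump[fkpgs + fork])  # right half: SPT index of the PSB
    pointer = dump[spt + psb_index]  # share pointer for the PSB page
    if pointer & 0o17000000:  # bits 14-17 nonzero: on disk or swap area
        return None
    return rh(pointer)  # page number of the PSB in the dump


FKPGS, SPT = 0o5000, 0o10000  # hypothetical symbol values
dump = {
    FKPGS + 0o20: (0o425 << 18) | 0o426,  # FKPGS+20/ 425,,426
    SPT + 0o426: 0o1234,  # hypothetical in-core pointer to dump page 1234
}
print(oct(psb_page(dump, FKPGS, SPT, 0o20)))  # 0o1234
```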

When you look at a dump, you should first try to find why the dump
occurred by looking at the location BUGHLT.  If BUGHLT is zero, you
should check the CTY log to find out why the dump was taken and for
information such as the PC at the time of the dump and the status of
the PI system.  If BUGHLT is non-zero, it contains the address from
which the BUGHLT was issued.  You should look up the BUGHLT in
BUGSTRINGS.TXT or BUGS.MAC to find additional information about it.
If at this point you are still not sure why the BUGHLT occurred, you
will have to look at the listings for more information.  A copy of
BUGSTRINGS.TXT is in Appendix A of the Operator's Manual.  You can find
the location of the call to the BUGHLT by typing the BUGHLT tag to DDT
followed by a "?".  DDT will tell you which monitor module the BUGHLT
is in, and you can go to your microfiche and read all about the
conditions precipitating the BUGHLT.

Next, if necessary, look at FORKX.  If it contains -1, the scheduler
was running; otherwise it is the number of the fork that was running
when the crash occurred.  The registers are saved at BUGACS on a
BUGHLT, but if BUGACS+17 contains something,,BUGPDL+n, then the
registers are invalid and you must go to the SYSERR buffer to get the
good registers.  This is done by adding to the right half of the
SYSERR buffer pointer, SEBQOU, the offset into the buffer for the
heading and ACs, SEBDAT+BG%ACS.  This value points to a 16-word block
containing the user's ACs.  You may have to chain down more than one
queued-up SYSERR entry to get to the BUGHLT block.
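
The two checks in this paragraph can be sketched in Python.  This is
not monitor code: BUGACS, BUGPDL, SEBQOU, SEBDAT and BG%ACS are the
symbols named above, but every numeric value below, including the
length assumed for BUGPDL, is invented for illustration.  BG%ACS is
spelled bg_acs because % is not legal in a Python name, and chaining
through several queued SYSERR entries is not modelled.

```python
# Illustrative sketch of the BUGACS validity test and the SYSERR AC address.
# All numeric symbol values here are hypothetical.


def rh(word: int) -> int:
    return word & 0o777777


def bugacs_valid(saved_ac17: int, bugpdl: int, bugpdl_len: int) -> bool:
    """False if BUGACS+17 is something,,BUGPDL+n, i.e. points into BUGPDL."""
    return not (bugpdl <= rh(saved_ac17) < bugpdl + bugpdl_len)


def syserr_acs_address(sebqou_word: int, sebdat: int, bg_acs: int) -> int:
    """Address of the 16-word AC block: RH(SEBQOU) + SEBDAT + BG%ACS."""
    return rh(sebqou_word) + sebdat + bg_acs


BUGPDL, BUGPDL_LEN = 0o20000, 0o40  # hypothetical
print(bugacs_valid((0o777740 << 18) | (BUGPDL + 3), BUGPDL, BUGPDL_LEN))  # False
print(oct(syserr_acs_address(0o3000, 0o10, 0o4)))  # 0o3014
```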

               Do not forget to get a printout of the
               SYSERR log, which will give you and the
               field service representative much of the
               information you can get out of the dump.
               The SYSERR output is much easier to
               examine; however, clearly you cannot get
               as much information from it as from a dump.

Some other locations in the PSB of interest are:


        UAC             User's ACs when he did his last JSYS.
        PAC             Monitor's ACs
        PPC             Processor's PC
        UPDL            User's pushdown stack while in a JSYS
        NSKED           0 = OK to run scheduler
                        >0 = cannot run scheduler
        INTDF           -1 = OK to receive software interrupts
                        >= 0 = cannot receive software interrupts
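
Because INTDF and NSKED are tested as signed quantities, a raw 36-bit
value such as 777777,,777777 has to be read as -1.  The Python sketch
below shows that interpretation.  Its messages paraphrase the table
above, and the INTDF-below--1 case corresponds to the over-decremented
state (unbalanced OKINTs) described under BUGCHK.  The function names
are hypothetical.

```python
# Illustrative sketch: interpret INTDF and NSKED from 36-bit words.


def signed36(word: int) -> int:
    """Interpret a 36-bit two's complement word as a Python int."""
    word &= (1 << 36) - 1
    return word - (1 << 36) if word & (1 << 35) else word


def interrupt_state(intdf_word: int) -> str:
    intdf = signed36(intdf_word)
    if intdf == -1:
        return "OK to receive software interrupts"
    if intdf >= 0:
        return "cannot receive software interrupts (NOINT)"
    return "INTDF below -1: more OKINTs than NOINTs"


def scheduler_state(nsked_word: int) -> str:
    nsked = signed36(nsked_word)
    if nsked == 0:
        return "OK to run scheduler"
    if nsked > 0:
        return "NOSKED: cannot run scheduler"
    return "unexpected negative NSKED"


print(interrupt_state(0o777777777777))  # OK to receive software interrupts
print(scheduler_state(1))  # NOSKED: cannot run scheduler
```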

It may be useful to know the status of a fork when it is hung  or  you
are unsure of its status.  This can be determined by looking at FKSTAT
indexed by the fork number.  The right half of this  location  is  the
address of a test routine and the left half is data to be tested.  For
example if FKSTAT+12 contains 23,,FKWAT, then fork 12 is  waiting  for
fork  23  to complete.  FKWAT is a routine that waits for another fork
to complete and its data (the left half of the word) is the number  of
the  fork  it  is waiting for.  There are many different wait routines
and you will have to look at the code to see what individual ones  are
waiting for.  There is a memo on scheduler tests which details almost
all of the scheduler tests in the monitor.
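
Decoding an FKSTAT word is a matter of splitting it into halves and
looking up the routine address.  In this Python sketch, the table from
addresses to routine names is something you would build yourself from
the monitor symbols, and the address given for FKWAT is made up.

```python
# Illustrative sketch: interpret an FKSTAT entry as data,,wait-routine.


def describe_fkstat(word: int, routines: dict) -> str:
    data = (word >> 18) & 0o777777  # left half: data for the test routine
    routine = word & 0o777777  # right half: address of the wait routine
    name = routines.get(routine, f"routine at {routine:o}")
    return f"{name} with data {data:o}"


FKWAT = 0o123456  # hypothetical address of FKWAT
print(describe_fkstat((0o23 << 18) | FKWAT, {FKWAT: "FKWAT"}))
# prints: FKWAT with data 23  (fork 12 is waiting for fork 23)
```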

You can easily determine all of the forks associated  with  a  job  by
giving the commands:


Where N is the job you are looking for.  A fork structure can  usually
be  determined  by looking at the FKSTAT of the forks and seeing which
forks are waiting on which forks.  A FKSTAT of FKSKP indicates a  fork
is inactive.
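
The underlying idea, picking out every fork whose FKJOB entry carries
job N in its left half, can be sketched in Python.  The table contents
below are invented.  The sketch also assumes that unused FKJOB slots
never carry a valid job number in their left half, which you should
confirm against STG.MAC.

```python
# Illustrative sketch: list the forks whose FKJOB entry names a given job.
# fkjob[f] models FKJOB+f as a 36-bit word: job number,,SPT index of JSB.


def forks_of_job(fkjob: list, job: int) -> list:
    return [fork for fork, word in enumerate(fkjob)
            if (word >> 18) & 0o777777 == job]


fkjob = [
    (0o1 << 18) | 0o100,  # fork 0 belongs to job 1
    (0o3 << 18) | 0o101,  # fork 1 belongs to job 3
    (0o1 << 18) | 0o100,  # fork 2 belongs to job 1 (same JSB)
    0o777777777777,       # hypothetical unused slot (-1)
]
print(forks_of_job(fkjob, 1))  # [0, 2]
```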

You should refer to STG.MAC for other fork and job  tables  and  other
locations  in the PSB and JSB of interest.  All of the above locations
can be examined with MDDT or EDDT while the monitor is running.  Of
course, at these times you do not have to go through MMAP, and the PSB
and JSB that are in core are your own.

There are two separate patch areas in the monitor (FFF and SWPF).  FFF
is the resident patch area and SWPF is the swappable patch area.  These
two symbols should be updated to point to the next free location in
the patch area when a patch is inserted.  PAT.. is defined to be equal
to SWPF.  By convention, all distributed patches are applied at FFF.
This reduces confusion, always works until the patch area is
exhausted, and leaves patches always present in a dump for the cases
where that is important.

There are several general purpose routines that can be used to look at
the monitor while it is running.  These routines should be used with
caution, since it is certainly possible to crash the monitor by using
them incorrectly.  Two of the more general routines are MAPDIR, for
mapping a directory into core, and SETMPG, for mapping pages (such as
someone else's PSB or JSB) into core.  You will have to look at the
listing for the exact use of these and other general routines, and be
aware of the precautions that should be taken when using them.  You
can find the module they are located in by looking in the GLOB
listing, which is a cross reference listing of all the global symbols
in the monitor.  You get a GLOB listing in your microfiche.

------  ------  ------

The monitor contains a  considerable  number  of  internal  redundancy
checks  which  generally  serve  to  prevent  unexpected  hardware  or
software failures from cascading into severely destructive  reactions.
Also,   by  detecting  failures  early,  they  tend  to  expedite  the
correction of errors.

There are two failure routines,  BUGCHK  and  BUGHLT  for  lesser  and
greater  severity of failures.  Calls to them with JSR are included in
code by use of a macro which records the locations and a  text  string
describing the failure.  The general form is:


Where type is HLT or CHK, and string describes the cause.

For example,


The strings are constructed during loading and are dumped into a file.
The BUGSTRINGS.TXT file will produce an ordered  listing  of  the  bug
messages for operator or programmer use.

BUGCHK is used where the inconsistency detected is probably not  fatal
to  the  system  or  to  the  job  being run, or which can probably be
corrected automatically.

Typical is the sequence in MRETN in the SCHED module.


This BUGCHK is included strictly as a debugging aid.  Detection  of  a
failure  takes  no  corrective action.  This situation usually results
from executing one or more excessive OKINT operations (not balanced by
a  preceding  NOINT).   It  is  considered  a  problem because a NOINT
executed when INTDF has  been  overly  decremented  will  not  inhibit
interrupts and will not protect code changing sensitive data.

BUGHLT is used where  the  failure  detected  is  likely  to  preclude
further  proper  operation  of  the  system  or  file storage might be
jeopardized  by  attempted  further  operation.   For   example,   the
following appears in the SCHED module:


This check accomplishes two things:

        1.      A function of JOB0 is to periodically update the disk
                version of bittables, file directories and other
                files.  Absence of this function would make the system
                vulnerable to considerable loss of information on a
                crash which loses core and swapping storage.  JOB 0
                protects itself against various types of malfunction;
                this BUGHLT detects any failure resulting in a hangup.

        2.      Detects if the entire system has become hung due to
                failure of the swapping device or some such event, on
                the basis that if JOB 0 isn't running, nobody's


               For Release 4, the form of the BUGxxx
               calls has been modified, and the new
               file BUGS.MAC contains hopefully useful
               information on each of the BUGxxx calls
               in one place.  This should be considered
               a required debugging file.


A monitor cell, DBUGSW, controls the behavior  of  BUGHLT  and  BUGCHK
when  they  are called.  DBUGSW is set according to whether the system
is attended by system programmers.

If C(DBUGSW)=0, the system is not attended by system  programmers,  so
all  automatic  crash  handling  is  invoked.   BUGCHK  will return +1
immediately, appearing effectively as NOP.   BUGHLT  will,  if  called
from the scheduler or at PI level, invoke a total reload from the disk
and a restart of the system.  The BUGCHK/INF output will appear on the
CTY and in the SYSERR log when JOB0 gets around to them.

If the system continues to run or is restarted properly, the  location
of  the  bug (saved over a reload) and its message will be reported on
the CTY.

If C(DBUGSW).NEQ.0, the system  is  attended,  and  one  of  the  EDDT
breakpoints  will  be hit.  This allows the programmer to look for the
bug and/or possibly correct the difficulty and proceed.  There are two
defined non-zero settings of DBUGSW, 1 and 2, which have the following
meanings:
        C(DBUGSW) = 1 

                Operation is the same as with 0 except for breakpoint
                action.  In particular the swappable monitor is write
                protected and SYSJOB is started at startup as
        C(DBUGSW) = 2

                Is used for actual system debugging.  The swappable
                monitor is not write protected so it may conveniently
                be patched or breakpointed, and the SYSJOB operation
                is not started to save time.

                BUGCHK and BUGHLT procedures are the same as for 1.

The following is a summary of DBUGSW settings:

                        0               1               2
MEANING                 Unattended      Attended        Debugging

BUGCHK action           NOP             Hit Breakpoint  Hit Breakpoint
BUGHLT action           Crash System    Hit Breakpoint  Hit Breakpoint
SWPMON write protect?   Yes             Yes             No
CHECKD on startup       Yes             Yes             No

Other console functions:

In addition to  EDDT,  several  other  entry  points  are  defined  as
absolute   addresses.    The  machine  may  be  started  at  these  as

        140     JRST EDDT               ; go to EDDT
        141     JRST SYSDDT             ; reset and go to EDDT
        142     JRST EDDT               ; copy of EDDT address
        143     JRST SYSLOD             ; initialize file system
        144       0
        145     JRST SYSRST             ; restart
        146     JRST SYSGOX             ; reload and start
        147     JRST SYSGO1             ; start

The soft restart (address 145, EVRST) restarts all  I/O  devices,  but
leaves  the  system  tables intact.  If it is successful, all jobs and
all (or all but one) processes will continue in their previous state
without  interruption.   This  may  be  used  if  an  I/O  device  has
malfunctioned  and  not  recovered  properly.    The   total   restart
initializes core, swapping storage and all monitor tables.

A very limited set of control functions  for  debugging  purposes  has
been  built into the scheduler.  To invoke a function, the appropriate
bit or bits are set into location 20 via MDDT.  The  word  is  scanned
from left to right (JFFO).  The first 1 bit found will select the
function:
BIT 0:
        Causes scheduler to dismiss current process if any and stall
        (execute a JRST .), with -1 in AC0. Useful to effect a clean
        manual transfer to EDDT. System may be resumed at SCHED0.

BIT 1:
        Causes the job specified by data switch bits 18-35 to be run
        exclusively. Temporarily defeats JOB 0 not run BUGHLT.

BIT 2:
        Forces running of JOB 0 backup function before halting the
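
The left-to-right scan is what the PDP-10 JFFO instruction provides:
it finds the first 1 bit in a word.  The Python sketch below emulates
that scan to show which function a given value in location 20 would
select.  The names are hypothetical, and returning None for a zero word
stands in for JFFO not jumping.

```python
# Illustrative sketch: emulate JFFO's search for the leftmost 1 bit
# (bit 0 is the most significant bit of a 36-bit word).


def jffo(word: int):
    word &= (1 << 36) - 1
    if word == 0:
        return None  # JFFO would not jump
    return 36 - word.bit_length()


FUNCTIONS = {
    0: "dismiss current process and stall",
    1: "run one job exclusively",
    2: "force JOB 0 backup",
}

location_20 = (1 << (35 - 1)) | (1 << (35 - 2))  # bits 1 and 2 both set
print(FUNCTIONS[jffo(location_20)])  # run one job exclusively
```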


     The purpose of this article is to provide some basic guidelines for
     those   who   have  never  analyzed  a  TOPS-20  crash  dump.   The
     information contained in this article refers exclusively to Version
     4  of  the  TOPS-20  Monitor, although most of the basic principles
     will also apply to earlier versions of the Monitor.   None  of  the
     concepts   included  in  this  article  can  be  considered  highly
     advanced, indeed it is doubtful that  there  exists  an  "advanced"
     methodology in crash dump analysis.  Such techniques are the result
     of nothing more than the continual exercise of  the  basic  skills.
     In  all  cases,  the  person who is to perform the analysis must be
     familiar  with  the  internal  structures  of  the  Monitor,  which
     requires  their  attendance  at  one of the TOPS-20 Monitor courses
     offered by Educational Services.  Obviously, one must know where to
     look  for  a potential problem before hoping to solve it.  For this
     reason, this article  assumes  that  the  reader  has  an  in-depth
     knowledge  of  the  basic  structures  of the TOPS-20 Monitor.  Any
     comments or sugestions to improve  the  content  of  this  material
     would be most welcome.


     Obviously enough, dumps do not simply  appear  as  a  result  of  a
     crash.   There are certain prerequisites to obtaining a dump, which
     will be discussed in this section.

     2.1  Creating The Dump File

     TOPS-20 will not, as a rule, create a dump of the Monitor unless
     the system is properly prepared to do so.  This means that there
     must first exist a file called PS:<SYSTEM>DUMP.EXE that will
     accommodate the dump.  This file can be found on the distribution
     tape for TOPS-20, or it can be created by using the MAKDMP program,
     which will accept the memory size from the user and create the
     properly sized file.  The file must contain a number of pages
     equal to the total number of pages of physical memory in the
     Decsystem-20.  Because a page is 512 words, the number of pages in
     the dump file must be twice the machine's memory size in K words.
     For example, a 2060 with 1024K words of memory should have a
     DUMP.EXE file that is 2048 pages long.  In addition, unless this
     file already exists before the crash that we wish to capture, we
     will be unable to save the image of the system, because the BOOT
     program cannot create such a file on its own.
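
     The sizing rule is simple arithmetic, shown here as a Python
     sketch.  It only reproduces the rule stated above (one dump page
     per 512-word page of memory); it is not a replacement for MAKDMP,
     and the function name is hypothetical.

```python
# Illustrative sketch: pages needed in DUMP.EXE for a given memory size.
WORDS_PER_PAGE = 512


def dump_pages(memory_kwords: int) -> int:
    """Pages of DUMP.EXE needed for memory_kwords K (1024-word units)."""
    return memory_kwords * 1024 // WORDS_PER_PAGE  # equals 2 * memory_kwords


print(dump_pages(1024))  # 2048
```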

     2.2  The BOOT

     Normally, when the system has crashed for whatever reason, it  will
     reload itself using the BOOT program.  This Auto-reload feature can
     be suppressed by giving the "SET NOT RELOAD" or "CLEAR RELOAD"
     command to the PARSER.  The PARSER must first be set in PROGRAMMER
     mode, via the "SET CONSOLE PROGRAMMER" command.  These commands do
     not apply to 2020's, of course.  On a 2020, there is a location in
     the 8080 which, when it contains the right number, will prevent
     automatic reloads after crashes.  The location depends on the
     revision level of the ROM, which is typed at system startup.  The
     following commands will turn off auto-reload:

        ROM level 0.1
                KS10>LK 20255
                KS10>DK 303
        ROM level 4.2
                KS10>LK 20256
                KS10>DK 303
     Also, patching the BUGHLT code where the reload is  requested  will
     prevent  an  auto-reload.   Placing a JFCL in locations BUGH2+3 and
     BUGH2+4 in the  running  monitor  will  prevent  the  monitor  from
     issuing its request.

     BOOT has a limited file system capability when creating the file to
     contain the dump, and in this manner avoids complicating a possibly
     compromised file structure during  the  reload.   It  is  for  this
     reason  that  the  DUMP.EXE  file  must already exist on the public
     structure; BOOT can find it there, but it cannot create it if
     it  does  not  already  exist.   Also, because BOOT resides in main
     memory of the host (KL10 or KS10) processor, small portions of  the
     Monitor  will  be  overwritten  when  BOOT  is  loaded into memory.
     Currently, BOOT is written into that area of the  resident  Monitor
     that  normally  contains  pure  code, and as such is not usually of
     much consequence.  When one needs to refer to this portion  of  the
     code, either the listings or fiche should be used.

     If for some reason the system fails  to  auto-reload,  then  it  is
     still possible to obtain a copy of the dump.  To do this, the front
     end must have at least loaded the BOOT  program,  and  the  console
     will display the BOOT prompt:
     BOOT  has  a  number of commands that may be used to manipulate the
     contents of the processor memory;  in this  case,  the  command  we
     will  use  will  cause  BOOT  to  copy  the contents of memory into

     At this point the system  may  be  brought  up  normally,  and  the
     analysis of the dump may begin.

     Similarly, a KL-10 system may be set to  suppress  the  auto-reload
     facility,  and  the  CTY  will prompt with the KLI> prompt.  Simply
     typing the word "BOOT" will load  the  BOOT  program  into  memory.
     There  are cases where the system may be completely hung, and it is
     unclear how to best initiate an orderly shutdown.  Obviously, it is
     always possible to type the control-backslash (^\) character at the
     CTY to get into the front-end parser, but then what  can  be  done?
     The  front-end parser allows the operator to force the processor to
     jump to a specified location, and in the case described above, this
     feature  may  be  used  to  force a BUGHLT.  This can be done after
     typing ^\, with the following commands:
                PAR>JUMP 71
     causing the console to return to USER mode, connected to the KL-10.
     This  will  be  followed immediately by a KPALVH BUGHLT (Keep Alive
     Halt), and the system will perform  the  usual  BUGHLT  procedures.
     The  above  command  forces  the  processor to jump to location 71,
     which in turn will cause the BUGHLT, sweeping the cache  to  ensure
     all  of the dump taken will contain valid data.  Simply forcing the
     processor to halt, and then reBOOTing and getting a dump will cause
     the  cache to be invalidated, and random locations in the dump will
     not contain valid data.

     On the 2020 the equivalent command is "KS10>ST 71".

     2.3  Getting A Front-end Dump

     The front-end will  generally  create  a  crash  dump  file  called
     PS:<SYSTEM>0DUMP11.BIN,  containing  the  core image of the PDP-11.
     If the front-end is hung, and none of the terminals are usable, it
     is  still  possible  to  obtain  a dump of the -11.  By setting the
     HALT/ENABLE switch of the -11 to the HALT position, and  then  back
     to the ENABLE position, the KL-10 will force the -11 to reload.  In
     the process of reloading the -11, the KL will indicate to  the  -11
     that  it has reloaded, and send the necessary information to set up
     the terminals, and unit record devices connected to the  -11.   The
     -11 will, in the process of reloading, dump the old core image into
     the 0DUMP11.BIN file mentioned earlier.   In  the  event  that  the
     problem  will  be  the  subject of an SPR, the front-end crash dump
     should also be included on the DUMPER tape with the SPR.


     It would not be practical to define a method  of  approaching  each
     BUGHLT  in  the  system, but the state of the system at the time of
     the crash may be defined in terms of the data  structures  that  it
     accesses.   By  looking  at  the Monitor's stack, the status of the
     current job, and process, and the condition of the Monitor's tables
     that were in use by the code that BUGHLTed, we can define a limited
     number of "types" of crashes, e.g.,  a  scheduler  crash,  a  pager
     crash,  an  APR  or  device interrupt crash.  Each crash will occur
     while the Monitor is using a specific subset of the  internal  data
     structures  of  the system.  We will attempt to limit the number of
     "types" of crashes based upon the function being performed  by  the
     Monitor  at  the time of the crash.  In the sections following this
     general information, we will suggest some of  the  areas  to  check
     when  looking  at  each  type  of  crash.   This information is not
     complete, but  contains  some  of  the  information  that  is  more
     significant in each particular context.

     3.1  The Basic Materials

     The most important materials in looking at  dumps  are  the  source
     listings  of  the  Monitor.   Either  in  the  form of fiche, or in
     machine-readable format, it is absolutely essential to have  access
     to  listings of the Monitor to be able to analyze any dump, because
     without these listings you would simply be working in darkness.  In
     order   to   understand   the   significance  of  any  BUGHLT,  the
     circumstances of the BUGHLT must be known, as well  as  the  reason
     the  Monitor  could not continue.  To find out this information, we
     must look in the listings.  After the system has re-BOOTed,  it  is
     always  a  good  idea to take note of the console output, including
     the name of the BUGHLT, and any other  associated  console  output.
     Try  to  be  sure  that  no unusual messages, other than the BUGHLT
     itself, appeared on the console within a reasonable period of  time
     before  the  system  crashed.   BUGCHK's, BUGINF's, and "Problem on
     device..." type messages are always significant.  Similarly, a copy
     of  the output from the SYSERR program will be helpful in revealing
     any failing hardware that should be investigated first.  Always try
     to eliminate the possibility of a hardware problem FIRST, especially
     if the site has had any recent problems in this area.   These  last
     two  points  are  significant in determining the environment at the
     time of the crash, and, in the event that the  dump  will  be  made
     part of an SPR, the information will become essential.

     Naturally, it will be necessary to have a copy  of  the  MONITR.EXE
     file that was running when the crash occurred, and a copy of FILDDT
     to look at the  dump.   With  these  materials  collected,  we  can
     hopefully make a valid analysis of the dump.

     Here is a list, then, of the necessary and helpful materials needed
     to look at dumps:

     1.  The MONITR.EXE file

     2.  The DUMP.CPY file from the crash

     3.  A copy of FILDDT.EXE from the distribution tape

     4.  A copy of the SYSERR output

     5.  A complete set of Monitor and Exec Fiche or listings

     6.  The CTY output from the crash

     7.  The Monitor Calls Reference Manual

     8.  A copy of the SWSKIT tape

     9.  Any other TOPS-20 Manuals that may be appropriate, such as  the
         Operator's Guide, or the Installation Guide.

    10.  The TOPS20.BWR file

     3.2  Identifying The Type Of Crash

     The Monitor performs several basic operations, each  of  which  has
     its own set of tables and data structures.  These operations can be
     defined as:

     1.  JSYS processing

     2.  Page faults

     3.  PSI Service

     4.  Scheduling

     5.  DTE interrupt Service

     6.  Initiating I/O transfers (queueing)

     7.  Device interrupt Service

     8.  APR interrupt Service

     3.2.1  The BUGHLT Itself - 

     There are specific areas in any crash dump that can be examined  to
     determine  the  status and context of the system at the time of the
     crash.  The most obvious of these is the  location  called  BUGHLT,
     which will contain the address from which the BUGHLT code was called.
     It is good practice to remember when looking at this  address  that
     there are portions of the monitor that were overwritten by the BOOT
     program, when the dump was taken, and therefore,  the  contents  of
     the  address  that  called  the  BUGHLT code, that is, the location
     whose address is contained in location "BUGHLT", may not  point  to
     the  same  code  that  the  fiche or the listings indicate.  A good
     example of such a BUGHLT is a PTNIC1, one that is  a  part  of  the
     APRSRV code, which is overwritten by BOOT.

     As of Release 4, all of the BUGHLT's, as well as the  BUGCHK's  and
     BUGINF's  in the Monitor are defined and documented in a new module
     called BUGS.MAC.  This module not only contains, for  each  BUGHLT,
     etc., the name and a string describing the type of halt, but also a
     description of the circumstances that cause  the  halt,  or  check,
     etc.,  to occur.  There is a new argument to the macro that creates
     the BUGHLT's, etc.,  that  is  supposed  to  indicate  whether  the
     problem  is hardware or software related.  You will find either the
     word "HARD" or "SOFT" in this  location  of  the  Macro  call.   In
     addition,  the  additional  information  supplied  in  BUGCHK's and
     BUGINF's now has a string associated with it  that  indicates  what
     the  additional  information  actually  represents.   Finally,  one
     argument to the  BUGDEF  (bug  definition)  Macro  is  a  narrative
     documentation  of  circumstances  that  can cause the problem being
     seen.  Needless to say, this sort of information is  invaluable  to
     anyone  looking  at  a  crash  dump.  Unfortunately, not all of the
     documentation of the BUG's was completed, and as a result, many are
     indicated  as  being  "HARD"  problems, when actually they are not.
     Those BUGDEF's that include the narrative description  of  the  BUG
     have  been  completed,  but  those that do not may indicate falsely
     that the problem is hardware related.

     BUGHLT's are performed by executing (XCT) a location that
     contains a JSR BUGHLT instruction.  The location following the
     JSR BUGHLT holds the name of the BUGHLT in SIXBIT format, such as
     "PTNIC1".  Finally, in the event of multiple
     BUGCHK's, BUGINF's or even nested BUGHLT's, the location "BUGNUM"
     contains  the  number of BUGHLT's, BUGCHK's, and BUGINF's since the
     last system start-up.  This location is most helpful in obtaining a
     clearer  view  of  the circumstances of the crash.  The case of the
     BUGHLT code itself causing a BUGHLT is extremely  unusual,  but  in
     certain  cases of extreme degradation of the system's data bases or
     "pure" code pages, this is a possibility.
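
     SIXBIT packs six characters into one 36-bit word, six bits each,
     with each code equal to the ASCII code minus octal 40.  The Python
     sketch below decodes the word that follows the JSR BUGHLT and
     includes the inverse for checking.  The function names are
     hypothetical, and only characters from space through underscore
     can be represented.

```python
# Illustrative sketch: SIXBIT encoding and decoding.


def sixbit_to_str(word: int) -> str:
    chars = []
    for shift in range(30, -1, -6):  # six 6-bit bytes, leftmost first
        chars.append(chr(((word >> shift) & 0o77) + 0o40))
    return "".join(chars).rstrip()  # trailing SIXBIT spaces are padding


def str_to_sixbit(text: str) -> int:
    word = 0
    for ch in text.upper().ljust(6)[:6]:
        code = ord(ch) - 0o40
        if not 0 <= code <= 0o77:
            raise ValueError(f"{ch!r} has no SIXBIT code")
        word = (word << 6) | code
    return word


print(oct(str_to_sixbit("PTNIC1")))  # 0o606456514321
print(sixbit_to_str(0o606456514321))  # PTNIC1
```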

     3.2.2  The Monitor's Stacks - 

     The next piece of valuable information is contained  in  the  stack
     pointer,  P.   This  location  will  point to one of eight possible
     monitor stacks, and will give a strong indication about the context
     of  the  monitor at the time of the crash.  Identifying the type of
     BUGHLT will usually be a direct indication of which stack  will  be
     in  use, however under certain circumstances, the monitor may crash
     while changing from one stack to another, and such  a  circumstance
     could  provide  a  useful insight into the state of the system just
     before the crash.   The  following  are  the  names  of  the  eight
     possible  monitor  stacks, and the context under which each of them
     is used:

     UPDL      This is the user stack, in that it is used when
               processing a user's JSYS in exec mode.  Whenever any
               user executes a JSYS, this area in his PSB  is  used  for
               the stack.  Those processes under job 0 which run in exec
               mode will also use this stack.

     TRAPSK    This stack is used by the paging code whenever a  process
               page  faults.   Normally a page fault will occur while in
               the midst of performing some other function,  such  as  a
               JSYS, and the stack pointer at the time of the page fault
               will be in location TRAPAP, which in turn  will  in  this
               case point to UPDL plus some offset.

     PIPDB     This is used by the software interrupt handler.

     SKDPDL    This stack is used by the scheduler.

     DTESTK    This stack is used by the DTE interrupt service routines.

     PHYPDL    This stack is used by  PHYSIO  code  in  the  process  of
               queuing I/O request blocks (IORB's).  These IORB's are the
               means by which RH20/RH11 data transfers are initiated.

     PHYIPD    This stack  is  used  by  the  PHYSIO  interrupt  service
               routines, and therefore is the interrupt-level equivalent
               of PHYPDL.  It is important to remember  that  these  two
               stacks  are  independent of each other, and should not be

     MEMPP     This stack is used when processing APR interrupts.
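
     Deciding which stack is in use amounts to checking which stack's
     address range contains the right half of P (a PDP-10 stack
     pointer has the form -count,,address).  In this Python sketch the
     base addresses and lengths are invented; real values would come
     from the monitor symbols and STG.MAC.

```python
# Illustrative sketch: identify the monitor stack that P points into.
# Stack bases and lengths below are hypothetical.


def which_stack(p: int, stacks: dict):
    addr = p & 0o777777  # right half of P: current top of stack
    for name, (base, length) in stacks.items():
        if base <= addr < base + length:
            return name
    return None


STACKS = {
    "UPDL": (0o777100, 0o200),
    "SKDPDL": (0o3000, 0o100),
    "PHYIPD": (0o3200, 0o100),
}
p = (0o777720 << 18) | 0o3023  # -60,,3023: 60 words left, top at 3023
print(which_stack(p, STACKS))  # SKDPDL
```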

          The stack that is being used, and the section of code that
     executed the BUGHLT, will indicate the type of BUGHLT that has
     occurred.  File system BUGHLT's will be observed either while
     performing a JSYS, servicing an interrupt, or otherwise attempting
     to access a file system that has been corrupted to the point of being


          When a process executes a JSYS, the Monitor performs the  JSYS
     by  dispatching through a table called JSTAB to the proper routine.
     These routines are named by convention as the JSYS  name,  preceded
     by  a  ".",  thus  the  routine  to perform the JSYS PMAP is called
     ".PMAP::".  This name is always a global  symbol.   The  last  JSYS
     executed in user context is saved in the PSB for the process, in
     locations KIMUU1 and KIMUU1+1.  The second of these locations will
     contain  the  dispatch offset in JSTAB;  this number, when combined
     with the JSYS opcode (104000,,0), is the last JSYS performed by the
     user.  This, then, will point indirectly through the JSTAB table to
     the place where the user JSYS began processing.  By  following  the
     code,  and examining the stack, it is often possible to reconstruct
     the events leading to the crash.  The stack will contain two copies
     of  the  user's  program  counter  (PC) and flags in the first four
     locations of UPDL.  The PSB location MPP  will  contain  the  stack
     pointer  at  the  time  of  last  JSYS,  and  each time the Monitor
     performs a JSYS internally, this data is pushed onto the stack, and
     set to the current value of P.

     Initial JSYS stack set-up:
        UPDL/   PC
        UPDL+1/ flags
        UPDL+2/ PC
        UPDL+3/ flags

     JSYS in Monitor context (nested JSYS):
        UPDL+n/ INTDF           ;old interrupts-deferred flag
              / MPP             ;previous PC, or level of nesting
              / PC of JSYS
              / PC flags
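          The arithmetic for recovering the last user JSYS from
     KIMUU1+1 can be sketched in Python.  This is an illustration only,
     assuming the dispatch offset has been read from KIMUU1+1 in a dump
     and that JSTAB holds one word per JSYS number;  the function names
     are hypothetical and not part of any DEC tool.

```python
# Sketch: rebuild the last user JSYS from the JSTAB dispatch offset that
# a dump shows in KIMUU1+1. Words are modeled as 36-bit Python ints.
# Function names are hypothetical, not part of any DEC utility.

HALF_MASK = 0o777777          # one 18-bit half word
JSYS_OPCODE_LH = 0o104000     # JSYS opcode 104 in the left half (104000,,0)


def jsys_word(dispatch_offset: int) -> int:
    """Combine the JSTAB offset with the JSYS opcode to get the instruction."""
    return (JSYS_OPCODE_LH << 18) | (dispatch_offset & HALF_MASK)


def jstab_slot(jstab_base: int, dispatch_offset: int) -> int:
    """Address of the JSTAB entry (assumes one table word per JSYS number)."""
    return jstab_base + dispatch_offset


def lh_rh(word: int) -> str:
    """Format a 36-bit word as LH,,RH in octal."""
    return f"{(word >> 18) & HALF_MASK:o},,{word & HALF_MASK:o}"


if __name__ == "__main__":
    offset = 0o56             # hypothetical value read from KIMUU1+1
    print(lh_rh(jsys_word(offset)))   # 104000,,56
```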

          Some other useful locations in JSYS context are:

                             JSB Locations

     USRNAM    This contains the name of the user, in ASCII.

                             PSB Locations

     JOBNO     Contains the number of the job for this process.

     FORKN     Contains the fork number for the top fork of the  job  in
               the  left  half  of  the word, and the fork number of the
               current fork in the right.

     INTDF     Contains -1 if the process is OKINT, 0 or greater if
               NOINT (defer all software interrupts for this job).

     NSKED     Contains 0 if the process is OKSKED, 1 or greater if
               NOSKED (defer scheduling of other forks).
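          A dump displays -1 as 777777777777, so it helps to convert
     INTDF and NSKED to signed values before applying the rules above.
     A minimal Python sketch follows, with hypothetical helper names;
     values other than the documented ones are classified by sign
     alone, which is an assumption.

```python
# Sketch: interpret INTDF and NSKED as read from a dump. The raw 36-bit
# value is converted to a signed integer first. Helper names are
# hypothetical.

WORD_BITS = 36


def to_signed(word: int) -> int:
    """Treat a 36-bit word as a two's-complement signed integer."""
    word &= (1 << WORD_BITS) - 1
    return word - (1 << WORD_BITS) if word >> (WORD_BITS - 1) else word


def intdf_state(word: int) -> str:
    """INTDF: -1 means OKINT; 0 or greater means NOINT."""
    return "OKINT" if to_signed(word) < 0 else "NOINT"


def nsked_state(word: int) -> str:
    """NSKED: 0 means OKSKED; 1 or greater means NOSKED."""
    return "OKSKED" if to_signed(word) == 0 else "NOSKED"
```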

        Monitor Fork Table - indexed by the current fork number

     FKCNO     Contains the SPT offset that points to the second page of
               the PSB in the left half of this word.

     FKINT     Contains the  pseudo-interrupt  communications  register,
               with  flags  in  the  left  half  describing  the type of
               request, and the channel number of  the  request  in  the
               right half.

     FKINTB    Contains the pseudo-interrupt  channel  requests  pending
               since the fork's last PSI interrupt.

     FKJOB     Job number of the fork in the left half,  and  SPT  index
               for the JSB in the right half.

     FKJTQ     Part of a doubly linked list of forks  that  are  waiting
               program  software interrupt the Monitor.  JTLST points to
               the top fork on the list.

     FKNR      Contains in bits 0-8 the age stamp value at the last time
               local garbage collection was performed.

     FKPGS     Contains the SPT indices for the process page  table,  in
               the left half, and the PSB in the right half.

     FKPGST    Contains the address of the routine to test for balance
               set wait satisfied in the right half, with test data in
               the left.  If the fork is not in the balance set, this
               contains the time of day that the fork entered a wait
               state.

     FKPT      Part of a linked list of forks on a particular scheduler
               list, such as GOLST, WTLST, etc.  The right half of the
               word contains the address of the next element in the
               list, and the left half contains the amount of runtime
               the fork's job will have accumulated when the fork
               exceeds its Balance Set Hold time.

     FKQ1      Contains the fork's remaining run quantum.  When the
               quantum expires, the fork is moved to a lower run queue,
               and given the appropriate new quantum.

     FKQ2      Contains the fork's scheduler queue level number in the
               left half, and the list address, i.e., GOLST, WTLST,
               etc., in the right.

     FKSTAT    Contains the address of the scheduler test routine which
               will determine when the fork is available to be placed on
               the GOLST.

     FKTIME    Contains the time of day, in internal  format,  that  the
               fork was placed on its current run queue.

     FKWSP     Contains the number of physical pages assigned to the
               fork in the right half, and the working set size of the
               fork when the fork entered the balance set in the left.
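          Most of these entries are read by fetching the word at the
     table base plus the fork number and splitting it into left and
     right halves.  The sketch below assumes a dump modeled as a Python
     dict of address to 36-bit word, with table base addresses taken
     from the monitor's symbols;  all helper names are hypothetical.

```python
# Sketch: split the fork-table words described above into their halves.
# The dump is a dict of address -> 36-bit word; table bases would come
# from the monitor's symbols. All names here are hypothetical.

HALF_MASK = 0o777777


def lh(word: int) -> int:
    return (word >> 18) & HALF_MASK


def rh(word: int) -> int:
    return word & HALF_MASK


def fork_entry(dump: dict, table_base: int, fork: int) -> int:
    """These tables are indexed by fork number: fetch table_base+fork."""
    return dump[table_base + fork]


def decode_fkjob(word: int) -> dict:
    return {"job": lh(word), "jsb_spt_index": rh(word)}


def decode_fkpgs(word: int) -> dict:
    return {"page_table_spt_index": lh(word), "psb_spt_index": rh(word)}


def decode_fkint(word: int) -> dict:
    return {"request_flags": lh(word), "channel": rh(word)}


def decode_fkq2(word: int) -> dict:
    return {"queue_level": lh(word), "list_address": rh(word)}


def fknr_age_stamp(word: int) -> int:
    """FKNR bits 0-8 are the 9 high-order bits of the 36-bit word."""
    return (word >> 27) & 0o777
```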


          Page faults trap through the user's UPT, by  placing  the  old
     flags  and  PC  for  the  process  in  locations  UPTPFL and UPTPFO
     respectively, and taking the new PC from location  UPTPFN.   UPTPFN
     will  usually contain the address PGRTRP, which is the beginning of
     the page fault code.  The location being referenced  and  therefore
     causing  the  page  fault  is stored in UPTPFW, also called TRAPS0.
     This contains the virtual address that page faulted in bits  13-35.
     Bit  0  of  this  word indicates if the location is in user or exec
     (monitor) address space.  If this bit is set,  the  address  is  in
     user  address space.  The PGRTRP code copies TRAPS0 into TRAPSW, in
     case of recursion.  This code will determine the nature of the page
     fault,  and  attempt  to  resolve  it.   UPTPFL and UPTPFO are also
     called TRAPFL and TRAPPC respectively.  The old  stack  pointer  is
     saved  in  location TRAPAP (this is only relevant if the page fault
     occurred in exec mode).  The new stack, TRAPSK, is set up according
     to  the  context  of  the  page  fault, i.e., user context, monitor
     context, or recursive page fault.  A page fault in user mode causes
     the  stack  to be set up with the runtime, return PC, and return PC
     flags in the first three locations of the stack:
                TRAPSK/         runtime
                TRAPSK+1/       return PC
                TRAPSK+2/       return PC flags

          Page faults from monitor context have  the  following  initial
     stack set-up:
                TRAPSK/         AC1
                TRAPSK+1/       AC2
                TRAPSK+2/       AC3
                TRAPSK+3/       AC4
                TRAPSK+4/       AC7
                TRAPSK+5/       AC16
                TRAPSK+6/       TRAPSW
                TRAPSK+7/       runtime
                TRAPSK+10/      PC
                TRAPSK+11/      PC flags
     Recursive page faults will cause the following set up in TRAPSK, at
     the time of the page fault:
                / AC1
                / AC2
                / AC3
                / AC4
                / AC7
                / AC16
                / TRAPSW
                / PC
                / PC flags
     Recursive  page  faults  will  indicate  the  level of recursion in
     TRAPC.  This location is normally set  to  -1  and  is  incremented
     every  time  the  page fault code is called, and decremented when a
     page fault has been satisfied.
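          The TRAPS0 and TRAPC rules above can be expressed as a small
     Python sketch.  It assumes DEC bit numbering (bit 0 is the most
     significant of 36) and 512-word pages when splitting the 23-bit
     address into section, page, and word;  the helper names are
     illustrative.

```python
# Sketch: decode TRAPS0 (UPTPFW) and TRAPC from a dump. DEC bit numbering:
# bit 0 is the most significant of 36. The section/page/word split assumes
# 512-word pages. Helper names are illustrative.

def decode_traps0(word: int) -> dict:
    user = bool((word >> 35) & 1)        # bit 0 set: user address space
    address = word & ((1 << 23) - 1)     # bits 13-35: faulting address
    return {
        "space": "user" if user else "exec",
        "address": address,
        "section": address >> 18,        # bits 13-17
        "page": (address >> 9) & 0o777,  # bits 18-26
        "word": address & 0o777,         # bits 27-35
    }


def faults_in_progress(trapc_word: int) -> int:
    """TRAPC is normally -1, is incremented on each entry to the page
    fault code and decremented when a fault is satisfied, so TRAPC + 1
    is the number of page faults currently being handled."""
    word = trapc_word & ((1 << 36) - 1)
    signed = word - (1 << 36) if word >> 35 else word
    return signed + 1
```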

          In examining a pager crash, it is usually a good idea to begin
     by  tracing  down the Monitor's table entries for the location that
     faulted.  This location is stored in location TRAPS0.  The identity
     of  the page causing the trap is stored in location TRPID, and will
     be in either of two forms:  page table number  in  left,  and  page
     number in right, or simply the page table number in the right.  The
     page table number is an SPT index, and the page number, if any,  is
     an  offset  into the page table pointed to by that SPT slot.  There
     are four Core  Status  Tables  (CST's)  indexed  by  physical  page
     number, that are used to keep track of each page in the machine.  A
     page fault crash will usually have bad data in either the SPT  slot
     indicated  in  TRPID,  or  one  of  the CST's for the physical page
     pointed to indirectly through that SPT  slot.   If  TRPID  contains
     PTN,,PN,  then  find location SPT+PTN.  This should have a physical
     page number in the right half.  Look at this physical page,  offset
     by  PN  in  TRPID  to  find the pointer to the page that caused the
     fault.  Shared and indirect pointers in this  location  will  point
     through  another  SPT  location,  but  private  pointers will point
     directly at the physical page that we are looking  for.   If  TRPID
     contains just PTN, then SPT+PTN will point directly at the physical
     page we are looking for.  Knowing the physical page number,  it  is
     now possible to examine the CST tables for that page.
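          The TRPID walk just described can be sketched as follows.
     This assumes a dump modeled as a dict of physical address to word,
     the SPT base address taken from the monitor symbols, 512-word
     pages, and that a zero left half means TRPID holds only a page
     table number.  Shared and indirect pointers still have to be
     followed through the SPT by hand;  the sketch stops at the map
     word.  Names are hypothetical.

```python
# Sketch of the TRPID walk. dump: dict of physical address -> 36-bit word.
# Assumption: LH(TRPID) == 0 means the "PTN only" form, so PTN,,PN with
# PTN = 0 cannot be represented here. 512-word pages assumed.

HALF_MASK = 0o777777
PAGE_WORDS = 0o1000


def rh(word: int) -> int:
    return word & HALF_MASK


def locate_faulting_page(trpid: int, spt_base: int, dump: dict):
    ptn, rhalf = (trpid >> 18) & HALF_MASK, rh(trpid)
    if ptn == 0:
        # Form "PTN": SPT+PTN points directly at the physical page.
        return ("physical page", rh(dump[spt_base + rhalf]))
    # Form "PTN,,PN": SPT+PTN gives the page table's physical page;
    # word PN of that page is the pointer to the faulting page.
    table_page = rh(dump[spt_base + ptn])
    return ("map pointer", dump[table_page * PAGE_WORDS + rhalf])
```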

     CST0      Used principally by the  pager  hardware,  this  location
               will  contain  the Process Use Register, mentioned in the
               FKCNO table above, and the age stamp.

     CST1      Contains the system lock count and the backup address for
               the page.  The lock count indicates the number of system
               events that must occur before the page can be swapped
               out;  the system should never swap out a page with a
               non-zero lock count.  The backup address can be a disk or
               drum address for a page in memory.

     CST2      Contains the home map location of the  page,  and  should
               match the contents of TRPID.

     CST3      Is used by the software  to  create  lists  of  pages  in
               various  states  of  use.   Those pages available for use
               will be on the Replaceable Queue, and linked together  in
               a doubly linked list.  Those pages awaiting swapping will
               be on a swapping device  queue,  and  part  of  a  singly
               linked  list.   Pages in use will contain the fork number
               of the owner in bits 3-14, and the local disk address for
               PHYSIO for the page.

     CST5      Contains the list of short I/O  Request  Blocks  (IORB's)
               associated with the page.
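          Because the CST's are indexed by physical page number and
     pack several fields into each word, a field extractor using DEC
     bit numbering is handy.  Only the CST3 owner-fork field (bits
     3-14) comes from the text above;  other field positions would have
     to come from the monitor sources.  Names are hypothetical.

```python
# Sketch: pull fields out of CST words using DEC bit numbering
# (bit 0 = most significant of 36). Only the CST3 owner field (bits 3-14)
# is taken from the text. Names are hypothetical.

def field(word: int, first_bit: int, last_bit: int) -> int:
    """Return DEC bits first_bit..last_bit of a 36-bit word."""
    width = last_bit - first_bit + 1
    return (word >> (35 - last_bit)) & ((1 << width) - 1)


def cst_word(dump: dict, cst_base: int, physical_page: int) -> int:
    """The CST tables are indexed by physical page number."""
    return dump[cst_base + physical_page]


def cst3_owner_fork(word: int) -> int:
    """For a page in use, CST3 bits 3-14 hold the owning fork number."""
    return field(word, 3, 14)
```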

          A few other significant locations for page faults are:

     RPLQ      Points to the beginning of the Replaceable Queue in CST3.

     NRPLQ     Contains the number of pages on the Replaceable Queue.

     SWPLST    Points to the beginning of the PHYSIO swap list, in CST3.

     NOF       Contains the number of OFN's in use in the SPT.


          Take note of the Monitor fork tables in the  JSYS  section  of
     this  document.   The  locations FKINT and FKINTB will be useful in
     determining the type and timing of PSI interrupts  pending  at  the
     time  of the crash.  When a process has a PSI interrupt pending, it
     is flagged in the FKINT entry for that fork, and the scheduler will
     take  note  of  this  event and set the PPC location in the PSB for
     that process to contain the address PIRQ.  This action takes  place
     at  location  SCHED5  in  the  scheduler.   The  next time that the
     process is ready to run, it will continue at location  PIRQ,  which
     will  set  up  the  PSI  stack,  PIPDB.   SCHED5 also moves the PSI
     request word from FKINT to PIMSK in the PSB.  Thus, it is  possible
     to check this location for the last PSI request that was scheduled.
     The old contents of PPC and PFL are stored in PIPC and PIFL by  the
     SCHED5  routine, so these will indicate the point where the process
     was interrupted.

          Take note of the Monitor Fork tables in the  JSYS  section  of
     this  document.   The  scheduler  is  usually invoked in one of two
     ways:  through a software  interrupt  initiated  by  channel  3  PI
     routine, indicating that a set period of time has elapsed since the
     last scheduler cycle, or through the ENTSKD macro, which is used by
     a  running  process  that  is  about  to  dismiss.  In this way the
     scheduler is guaranteed to run at regular  intervals,  or  whenever
     the  system  is  idle.  The primary entry point to the scheduler is
     SCHED0.  It is through this location that control passes whenever the
     running  process  dismisses,  or  whenever one of the two scheduler
     clock cycles elapses.  Briefly, the hardware traps on  every  clock
     tick  through  location  TIMVIL in the EPT.  This location contains
     the instruction XPCW TIMINT.  Again, as  in  the  device  interrupt
     code,  this  instruction  causes  the  flags and PC to be placed in
     locations TIMINT, and TIMINT+1, and control passes to the  location
     in  TIMINT+3,  which  in  this  case  is TIMIN0.  TIMIN0 determines
     whether or not it is time to run the scheduler, and  dismisses  the
     interrupt.   If  the  scheduler  is  to  be run, TIMIN0 initiates a
     software interrupt on channel 7, which causes a  trap  through  the
     EPT  location  KIEPT+56  to  PISC7R.   The  instruction executed in
     KIEPT+56 is an XPCW PISC7R, causing the old  PC  and  flags  to  be
     deposited  at  PISC7R,  and control to begin at PISC7+1.  The PISC7
     code sets up PPC and PFL to contain the  old  PC  and  flags,  from
     PISC7R,  and saves the process ac's at the time of the interrupt in
     a block of the  PSB  called  PAC.   Having  set  up  for  scheduler
     context,  the  PISC7  code  then  transfers  control  to the SCHED0
     routine.  Similarly, the ENTSKD macro does an XPCW ENSKR, causing a
     jump  to  the  ENSKED routine that does the context switch.  On the
     2020 the clock will interrupt through location  KIEPT+46  (standard
     level  3  interrupt).   The level 3 routine will first determine if
     this interrupt was caused by a  clock  tick,  and  if  so  JRST  to
     routine TIMIN0.

          Some other useful locations in scheduler context:

     1.  GOLST     Points to the beginning of the GOLST in the FKPT table.

     2.  WTLST     Points to the Wait list in the FKPT table.

     3.  TTILST    Points to the TTY input wait list in the FKPT table.

     4.  FRZLST    Points to the list of frozen forks.

     5.  WT2LST    Points to the list of forks waiting to be  unblocked.

     6.  TRMLST    Points to the list of forks waiting for another  fork
         to terminate.

     7.  SUMNR     Contains the number of reserved  pages.   (locked  in

     8.  BALSHC    Contains the number of pages reserved due  to  shared

     9.  INSKED    Set to non-zero if in the scheduler.


          DTE interrupts also dispatch through locations in the EPT,
     depending upon which DTE is interrupting.  For each of the four
     DTE's that could exist on a system, there is an eight-word block
     in the EPT used to keep up-to-date information for that DTE.  Not
     all of the DTE blocks will necessarily be used, but they will all
     exist in the EPT.  These blocks begin at location DTEEBP.  The
     format of one of these blocks is described below.  The DTE
     interrupt executes the third word in this block, which contains an
     XPCW DTEN0.  This will
     cause the old PC and flags to be stored  at  location  DTEN0,  and,
     since  DTEN0+3 contains ".+1", the system will begin processing the
     interrupt at location DTEN0+4.  This part of the routine  will  set
     up  the  DTE  stack, DTESTK, and save the PC, flags, and AC's.  The
     flags and PC are stored at DTETRA,  and  the  AC's  are  stored  at
     DTEACB.  DTEN0 will then use INTDTE to process the interrupt.  This
     code can be found in the DTESRV module of the monitor.

     The DTE control block:
        DTEEBP/ To -11 byte pointer
        DTETBP/ To -10 byte pointer
        DTEINT/ "XPCW DTEN0"            ;dispatch for DTE-0
              / reserved
        DTEEPW/ Examine Protection Word
        DTEERW/ Examine Relocation Word
        DTEDPW/ Deposit Protection Word
        DTEDRW/ Deposit Relocation Word
     Note that the labels above  apply  only  to  DTE-0,  and  that  the
     remaining DTE's must be offset by DTE-number X 8.
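          The DTE-number x 8 rule can be sketched in Python.  The word
     order follows the control block listing above (DTEEBP is word 0,
     DTEINT is word 2);  the address of DTEEBP itself must be taken
     from the monitor symbols, and the function name is hypothetical.

```python
# Sketch: compute EPT addresses inside the DTE control blocks.
# Word order follows the block listing; dteebp_addr comes from the
# monitor symbols. The function name is hypothetical.

DTE_BLOCK_WORDS = 8            # eight words (10 octal) per DTE
MAX_DTES = 4
BLOCK_LAYOUT = ["DTEEBP", "DTETBP", "DTEINT", "reserved",
                "DTEEPW", "DTEERW", "DTEDPW", "DTEDRW"]


def dte_word_address(dteebp_addr: int, label: str, dte_number: int) -> int:
    """Address of `label` for DTE-n: the DTE-0 address plus n * 8."""
    if not 0 <= dte_number < MAX_DTES:
        raise ValueError("DTE number must be 0-3")
    return (dteebp_addr + BLOCK_LAYOUT.index(label)
            + DTE_BLOCK_WORDS * dte_number)
```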

          Some other useful locations in the EPT:
        DTEFLG/ Operation Complete Flag
        DTECFK/ Clock Interrupt Flag
        DTECKI/ Clock Interrupt Instruction
        DTET11/ To -11 argument
        DTEF11/ From -11 argument
        DTECMD/ Command Word
        DTESEQ/ DTE20 Operation Sequence Number
        DTEOPR/ Operation In Progress Flag
        DTECHR/ Last Typed Character
        DTETMD/ Monitor TTY Output Complete Flag
        DTEMTI/ Monitor TTY Input Flag
        DTESWR/ Console Switch Register
     These locations are found at offsets 444 through 457 in the EPT.

          9.0  I/O QUEUEING (PHYPDL)

          All disk and tape I/O is initiated through the PHYSIO code, by
     calling  PHYSIO  with  a  pointer to an I/O Request Block (IORB) in
     AC1, and the addresses of the Channel Data  Block  (CDB)  and  Unit
     Data Block (UDB) in AC2 (CDB,,UDB).  PHYSIO validates the arguments
     passed to it, and then determines whether the IORB belongs  on  the
     Position  Wait Queue (PWQ) or the Transfer Wait Queue (TWQ).  These
     two queues are pointed to by offsets UDBPWQ and UDBTWQ in  the  UDB
     for the device.  Note that these are offsets into the UDB, which,
     like the CDB's, will be in resident free space.  During
     processing, PHYSIO will keep the following information in the ac's:
        P1/     address of the CDB
        P2/     address of the KDB (for tapes) or 0
        P3/     address of the UDB
        P4/     address of the IORB being processed
     Since PHYSIO is called via the PUSHJ P, instruction,  the  previous
     PC  is not saved.  The P and Q ac's are stored on the stack via the
     SAVEPQ macro.  PHYSIO does use a private  stack,  and  so  the  old
     stack  pointer is saved in PHYSVP.  Also, because PHYSIO does use a
     private stack, it is necessary for the process calling PHYSIO to be
     NOSKED.  Also take note of the fact that IORB's are associated with
     the physical pages of memory that are involved with the I/O through
     pointers  in  the CST5 table for those pages.  See the next section
     for more information in this area.


          Device interrupts, in this context, refer  to  disk  and  tape
     interrupts,  those devices connected through the RH20's.  Each RH20
     channel has a "Channel Logout" area at the beginning of the EPT.  This
     logout  area  is  four words in length for each channel, the fourth
     word of which contains an instruction to execute on  an  interrupt.
     This  instruction causes the system to dispatch to code actually in
     the CDB for the channel.

          On  the  2020,  the  interrupts  work  differently.   The  EPT
     contains pointers to SM10 vector tables starting at address SMTEPT.
     The number of the interrupting UBA (1 or 3) is used as an offset to
     SMTEPT  to  find the proper vector table, and then the function and
     device (read done, DZ11, etc.) is used as an offset into the
     vector  table  which  contains  the appropriate XPCW instruction to
     transfer control to the correct routine.

          The previous PC and flags are saved in  the  area  immediately
     preceding  the CDB;  offset CDBINT (value -6) is the location where
     the flags and PC  are  stored.   When  the  interrupt  occurs,  the
     hardware executes the instruction in the channel logout area, which
     is "XPCW loc".  "Loc" is the address of the CDB for  this  channel,
     offset  by  CDBINT  (-6).   The XPCW instruction saves the flags at
     CDBINT(CDB), the PC at the next location, and gets  the  new  flags
     and  PC  from  the next two locations.  This area of the CDB, then,
     contains the following:
        CDBINT(CDB)/    old flags
            -5(CDB)/    old PC
            -4(CDB)/    new flags (0)
            -3(CDB)/    new PC ( ".+1")
            -2(CDB)/    MOVEM P1,CDBSVQ(CDB)    ; saved in CDB offset CDBSVQ
            -1(CDB)/    JSP P1,PHYINT           ; dispatch to interrupt code
        CDBSTS(CDB)/    status and configuration flags
     The  PHYINT  code, then, resolves the interrupt, and returns to the
     old PC by JRSTing through offset CDBJEN in the CDB.  This  part  of
     the CDB contains the following:
        CDBJEN(CDB)/    BLT 17,17
                   /    DATAO RH,CDBRST
                   /    XJEN CDBINT(P1)
     The  last  of  these locations causes the system to resume where it
     was interrupted.  During processing of the interrupt, the following
     information may be found:
        P1/     address of the CDB
        P2/     address of the KDB or 0
        P3/     address of the UDB
        P4/     address of the IORB or argument code:
                (P4) < 0 - schedule a channel cycle
                (P4) = 0 - dismiss interrupt
                (P4) > 0 - complete current request (IORB address)
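          The CDB prologue offsets and the P4 argument codes can be
     checked with a short Python sketch.  It uses CDBINT = -6 and the
     word order listed above, and treats P4 as a signed 36-bit value;
     the names are illustrative.

```python
# Sketch: locate the words preceding a CDB and interpret P4 during
# PHYINT processing. CDBINT = -6 per the text; names are illustrative.

CDBINT = -6
PROLOGUE = ["old flags", "old PC", "new flags", "new PC",
            "MOVEM P1,CDBSVQ(CDB)", "JSP P1,PHYINT"]


def cdb_prologue(cdb_addr: int) -> dict:
    """Map each word before the CDB to its address; the first one
    (CDBINT(CDB)) is the XPCW target in the channel logout area."""
    return {name: cdb_addr + CDBINT + i for i, name in enumerate(PROLOGUE)}


def p4_action(p4: int) -> str:
    word = p4 & ((1 << 36) - 1)
    value = word - (1 << 36) if word >> 35 else word
    if value < 0:
        return "schedule a channel cycle"
    if value == 0:
        return "dismiss interrupt"
    return f"complete current request (IORB at {value:o})"
```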

          When the system is attempting to perform  I/O  to  or  from  a
     specific page of physical memory, that page is locked into core, by
     incrementing the lock count in the CST1 location for that page.  If
     a  device  error  occurs during the transfer of data for that page,
     then the CST5 entry for that page will  have  either  a  short  I/O
     Request  Block  (IORB) or a pointer to a long (Mag Tape) IORB.  The
     short IORB is only one word in length and is used for disk transfer
     requests,  i.e.,  swapping.   In  either case, the first word of an
     IORB, called IRBSTS, contains flags that describe  the  success  or
     failure  of  the  transfer.   It  may  be  helpful  to  check these
     locations in the event of a PHYINT crash.

          The following offsets contain useful information for PHYSIO:

     In the UDB:
        UDBPS1/ cylinder number
        UDBPS2/ surface,, sector number
        UDBERC/ error retry count
        UDBERR/ status function for error retry
     In the CDB:
        CDBCNI/ status of channel when interrupt began.


          APR Interrupts, like Device interrupts, are  vectored  through
     the EPT, but in the case of the APR interrupts, the vector location
     is a part of the priority interrupt  scheme.   These  are  priority
     channel 3 interrupts, and dispatch through location KIEPT+45, which
     contains a XPCW PIAPRX.  This is the channel 3  interrupt  routine.
     This routine will attempt to resolve the interrupt, and in doing so
     will set up its own stack, MEMPP.  As in the  case  of  the  device
     interrupt, the XPCW PIAPRX will cause the PC and flags to be stored
     at locations PIAPRX and PIAPRX+1, and the processor will then  jump
     to  the  location  stored  in  PIAPRX+3,  which  is PIAPR+1.  PIAPR
     actually dismisses the APR interrupt, or BUGHLT's.  The  old  stack
     pointer,  at  the  time of the interrupt, is stored in MEMAP.  Ac's
     0-10 are saved starting at  location  MEMPA.   One  unusual  aspect
     about  handling  APR  interrupts is that the PIAPR code changes the
     page fault trap vector, mentioned earlier, from PGRTRP  to  MEMPTP,
     in  UPTPFN,  to  handle  the  special  case  of a page fault in APR
     interrupt context.

Version 4 of TOPS-20 will include some changes in the BUG code
generation.  The purpose of these changes is to generate documentation
of the TOPS-20 BUGCHKs, BUGHLTs, and BUGINFs that is more descriptive
than the previous BUGSTRINGS.TXT file.

The logistics of this change include moving the BUG definitions out of
the  monitor  source  listings  and  into a central source file.  This
source file will serve both as the definition file for the bugs and as
documentation  for the BUGS.  This file is called BUGS.MAC and will be
distributed to all sites on the distribution  tape.   These  BUGS  are
still  referenced  in  the  source module where the bug is invoked but
they are defined in BUGS.MAC.

This involves a modification to the old BUG  macro  and  a  new  macro
called  DEFBUG.   The  BUG macro appears in the source modules and the
DEFBUG macro appears in BUGS.MAC.

The format of the new BUG macro is as follows:

              BUG (BUGNAM,<<x1,des1>,<x2,des2>...>)

This is placed in the monitor code where the BUG called BUGNAM  is  to
occur.  This macro executes a macro with name 'BUGNAM' which generates
an XCT BUGNAM, where the contents of BUGNAM is a JSR BUG'TYP.  Following
the  location  BUGNAM  are  the Accumulators to be printed (one AC per
word) followed by SIXBIT/BUGNAM/.  The Accumulators to be printed  are
defined with the DEFBUG macro while the locations specified in the BUG
macro are for documentation only.
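To make the layout concrete, the sketch below builds the word sequence
the BUG macro is described as generating:  the BUGNAM word holding JSR
BUG'TYP, one word per AC to be printed, then SIXBIT/BUGNAM/.  SIXBIT
packs six characters per word as (ASCII code - 40 octal), left-justified
and space-filled.  Instructions are represented as strings for
simplicity, and the names are hypothetical.

```python
# Sketch: the words following an XCT BUGNAM, per the description above.
# Instructions are shown as strings; names are hypothetical.

def sixbit(text: str) -> int:
    """Pack up to six characters, left-justified and space-filled."""
    word = 0
    for ch in text.upper()[:6].ljust(6):
        code = ord(ch) - 0o40
        if not 0 <= code <= 0o77:
            raise ValueError(f"{ch!r} has no SIXBIT code")
        word = (word << 6) | code
    return word


def unsixbit(word: int) -> str:
    """Unpack a SIXBIT word, dropping trailing spaces."""
    chars = [chr(((word >> shift) & 0o77) + 0o40)
             for shift in range(30, -1, -6)]
    return "".join(chars).rstrip()


def bug_block(name: str, bug_type: str, acs: list) -> list:
    """bug_type is HLT, CHK or INF, giving JSR BUGHLT/BUGCHK/BUGINF."""
    return [f"JSR BUG{bug_type}", *acs, sixbit(name)]
```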

Accompanying this BUG macro is a DEFBUG macro which is placed  in  the
file  BUGS.MAC.   This entry completely defines the BUG, including its
type (BUGHLT, BUGCHK, or BUGINF) and documentation.

The format of the DEFBUG macro is:


     For a description of the arguments to this macro see  the  SWSKIT
article called BUGS.MEM.  

In order to make listings (output from MACRO or CREF) more informative
than before, the BUG macro will cause the short description to be
displayed in the listing where the BUG macro is called.  Also, the
flavor of bug (INF, CHK, or HLT) and whether it is hardware or
software related will be displayed in the listing.  Hence the OVRDTA
bug would appear in the listing as:

          ;BUG Type:              hardware-related BUGINF
          ;BUG description:       PHYSIO - OVERDUE TRANSFER ABORTED

     When fully documented, the BUGS.MAC file will be extremely useful
for  specialists.  It will describe, in one convenient place, what the
additional data printed on the console is, what caused  the  bug,  and
what the site or specialist should do if that particular bug occurs.

Here is a section of the current BUG definition/documentation for  the


Cause:  The access control job has not responded with a  GIVOK  within
        the designated time period.

Action: If this consistently happens with the same function code,  you
        should  see  if  the  processing  of  the function can be made

        If there is no obvious function code pattern, you may need  to
        increase  the  timeout  period  or rework the way in which the
        access control program operates.

Data:   FUNC - the GETOK function code 


INF specifies the bug is a BUGINF.  GIVTMR is the name of the bug.
JSYSA is the module that the bug would occur in.  SOFT specifies that
the bug is most likely caused by a software problem.  <GIVOK TIMEOUT>
is the bug string.  <T2,FUNC> specifies the data that will be printed
on the operator's console.  The initial spec called for the descriptor
FUNC to be included in the operator's message, but at this time this
descriptor is just for source documentation.

The blurbs following the initial line of the BUG definition attempt to
describe  to  the  specialist,  in  a  more  detailed  manner than the
description printed on the console, what it means when this bug occurs
and  what  should be done first in order to resolve the situation.  In
this case the ACTION is to examine the GETOK routine which is executed
for  the  additional  data  FUNC.   This  routine  is getting hung up.
Sometimes, the ACTION will state to call the hot line or to submit  an
SPR.   These  descriptions  will  help the specialist be more informed
about the bugs which may occur at one of their sites and save them the
time  of  calling  the hot line or searching through the source module
for an idea of the problem.

                         MONITOR BUILDING HINTS


Judging from the  number of  requests for  help on  this subject,  the
chances are that you  will be required to  rebuild a monitor  sometime
during your career  as a  Software Specialist. The  reasons are  quite
simple.  There are customers who simply want functionality other than
that provided  by  stock  monitors.  There  are  also  those  who  are
experiencing performance problems. We  cannot forget the sales  folks.
It is not  unusual to  have to  rebuild a monitor  in order  to run  a
benchmark. A very common example  is increasing the OFN area.  Another
quite common requirement is  to increase the  patch area (FFF).  Doing
either of these and simply submitting a build control file will  often
produce a bad monitor.

We will talk about PSECTS in  relation to the Monitor's address  space
but will  make  no attempt  to  define what  they  do. A good detailed
discussion on the Monitor's address space is on pages 2-62 to 2-73  in
the Release 4  Update Manual. Also  there is a  memo on the  Monitor's
address space in the SWSKIT.


In V3A, all of the Monitor was in the same address space.  Nevertheless
there was a crunch on space.  As a result, some PSECTS were allowed to
overlap.  So critical was the space requirement that attempts to
increase the OFN area or FFF usually resulted in the overlapping of
PSECTS other than the ones permitted.  Therein lies the problem:  the
Monitor produced from such a process would ordinarily be useless.
With the development of V4, the space requirement became more
critical.  The Symbol Table became the object of concern.  It required
a large number of pages, and in general, it is only used infrequently
under normal conditions.  Hence the Engineering folks were of the
opinion that it should be completely eliminated.  We objected:  it
would be a nightmare to try to debug the monitor without symbols.  It
thus became our project to somehow keep the Symbol Table while
conforming with the space restrictions.  We decided to remove the
Symbol Table and place it in an alternate address space.  It should
be noted that this action does not adversely affect system
performance.  With this change, the build procedure and the monitor's
address space were reorganized.
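Overlapping PSECTS are easiest to catch by checking the layout
mechanically.  The Python sketch below compares PSECT address ranges,
such as those shown in the PSECT map at the end of a build log, and
reports any overlap that is not on an explicitly allowed list.  The
names and addresses in any example are made up;  the real layout comes
from LNKSCH.CCL and the build log.

```python
# Sketch: check a PSECT layout for overlaps and free space. Names and
# addresses used with it are hypothetical; take real values from the
# build log's PSECT map.
from itertools import combinations
from typing import NamedTuple


class Psect(NamedTuple):
    name: str
    start: int   # first address
    end: int     # last address (inclusive)


def overlaps(psects, allowed=()):
    """Return name pairs of PSECTs that overlap and are not permitted."""
    permitted = {frozenset(pair) for pair in allowed}
    bad = []
    ordered = sorted(psects, key=lambda p: p.start)
    for a, b in combinations(ordered, 2):
        collide = a.start <= b.end and b.start <= a.end
        if collide and frozenset((a.name, b.name)) not in permitted:
            bad.append((a.name, b.name))
    return bad


def free_space(psects):
    """Words free between consecutive PSECTs (negative means overlap)."""
    ordered = sorted(psects, key=lambda p: p.start)
    return {(a.name, b.name): b.start - a.end - 1
            for a, b in zip(ordered, ordered[1:])}
```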


Outlined below are some steps to guide you when rebuilding a monitor.
Bear in mind that this is a guide and might not account for all the
unusual situations.  This guide, however, coupled with your experience
and common sense, will most likely do the trick.  PLEASE read the build
BEWARE file that is on the Installation tape.
NOTE:   The customer's Distribution Tape will have all the files needed
        to rebuild the monitor.  All TOPS-20 modules will be in
        TOPS20.REL (or T2020.REL, etc.).  The control file is TOPS20.CTL
        (or T2020.CTL, etc.).  The link file will be NAME.CCL, where
        "NAME" depends upon what monitor is being used (could be 2020,
        ARPA, etc.).  For 2040/50, it is called LNKSCH.CCL.  In any case
        the TOPS20.CTL file will have the name.  The files you will
        change will be one of the PARAM files and/or STG.MAC.  It
        should be noted that the special LINK.EXE and MACRO.EXE needed
        to build V3A are not required under V4.
        If you have  the time, it  is not a  bad idea to  use all  the
        standard files and  build yourself a  "vanilla" monitor.  This
        will test  the procedure  and files  and reveal  any  problems
        peculiar to the  build itself.  Once these  are resolved,  any
        problems encountered  when you  are rebuilding  your  modified
        monitor will be related to the change itself. The time for the
        debugging phase can thus be reduced substantially.
STEP 1          Restore all files needed  from <4-SOURCES>. This  will
                usually contain the monitor modules (TOPS20.REL file),
                all needed source  files, all  build control,  command
                and log files.
STEP 2          Carefully make the source changes as needed.
STEP 3          Examine the TOPS20.CTL file.  This file will usually
                have logical name definitions and TAKE commands, along
                with other things.  Also look at all referenced command
                files.
STEP 4          Examine the corresponding log file.  This will show
                the result of the original build procedure, so use it
                as a template against which to judge the validity of
                the new Monitor.  Pay special attention to the section
                at the end of the BUILD procedure that shows the PSECT
                layout: the start location, the end location and the
                amount of free space between each PSECT.  The file used
                by LINK to set up the PSECTS is called LNKSCH.CCL.  You
                should look at this file to get an idea of what's
                happening.
STEP 5          Now edit the control and command files as necessary to
                reflect your environment. This will mean, among  other
                things,   changing   or   eliminating   logical   name
                definitions.  Do NOT change the order of the PSECTS in
                the LNKSCH.CCL file. Also  do not change the  starting
                value for any PSECT.  The starting value is the  value
                given to the /SET: switch.
STEP 6          Submit the control file with the /TAG:SINGLE switch.
                Ensure that the control file is correct and accurately
                reflects the logical name definitions and the .CCL
                file.  This portion of the .CTL file also has the
                commands necessary to compile the changed module.
STEP 7          When the job ends, examine your log file.  Correct any
                compilation or missing-file errors and go back to
                STEP 6.  Continue with STEP 8 only after all errors are
                resolved.
STEP 8          At this point you should have a MONITR.EXE.  Now
                examine the section in the log file which gives an
                outline of the PSECTS.  If any PSECTS overlap, a
                message will say so.  If there are no overlap
                messages, go to STEP 11.  NOTE: There are some
                instances where PSECTs may overlap: the POSTCD and
                SYVAR PSECTs are allowed to overlap any xxxVAR PSECT.
                This does not gain very much storage (4 pages, to be
                exact).  If you follow the build procedure, overlapping
                PSECTs are not allowed and therefore must be resolved.
                You are once again advised NOT to reorganize the
                monitor's address space.
STEP 9          Start with the first overlap.  Figure out the number
                of words by which the first PSECT overlaps the PSECT
                that follows it.  Now add this value to the start
                location of the overlapped PSECT.  This value quite
                possibly will be a location within a page, i.e. an
                address of the form 125300, where the page number is
                125 and the offset into the page is 300.  The starting
                address of many PSECTs is required to be on a page
                boundary, i.e. an address of the form 126000.  A good
                rule to follow is: IF THE PSECT STARTED ON A PAGE
                BOUNDARY, KEEP IT ON A PAGE BOUNDARY.  This means that
                you may be required to add an additional value to
                round up to the next page.  For example, the 125300
                value would be rounded to 126000 if the PSECT is
                required to start on a page boundary.  The PSECT
                sequence and starting values are in the LNKSCH.CCL
                file.  NOTE: the values are all given in OCTAL, so add
                in OCTAL.

STEP 10         EDIT the  LNKSCH.CCL file  to reflect  this new  start
                value for the  overlapped PSECT.  Go back  to STEP  6.
                Repeat these  steps  until  there are  no  more  error
                messages. Note that changing the start location of the
                overlapped PSECT can cause it to overlap its following
                PSECT and  the  same  procedure must  be  followed  to
                resolve any conflicts.  Of course, you must be careful
                to ensure that you do not outgrow the monitor's address
                space.  The total of the lengths of all PSECTs will
                tell you if the Monitor is too large.
STEP 11         At this point you should have a good Monitor. Save  it
                in the proper directory. The final test is getting  it
                up and running.
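
The octal arithmetic in STEPs 9 and 10 can be sketched in Python.  This
is a minimal illustration, not part of the build procedure: it assumes
512-word (1000 octal) pages and models each PSECT as a record of name,
start value, length and a page-alignment flag, in LNKSCH.CCL order.
The PSECT names and values below are made up.  Each overlapped PSECT
is moved up by the amount of the overlap, rounded up to a page
boundary when required, and the change is allowed to cascade to the
PSECTs that follow, as STEP 10 warns.  The total length is printed for
the size check.

```python
from dataclasses import dataclass
from typing import List, Tuple

PAGE_SIZE = 0o1000  # 512 words per page, i.e. 1000 octal


@dataclass
class Psect:
    name: str           # hypothetical PSECT name
    start: int          # start value, as given to /SET: in LNKSCH.CCL
    length: int         # length in words
    page_aligned: bool  # must this PSECT start on a page boundary?


def round_up_to_page(addr: int) -> int:
    """Round an address up to the next page boundary (125300 -> 126000)."""
    return (addr + PAGE_SIZE - 1) // PAGE_SIZE * PAGE_SIZE


def resolve_overlaps(psects: List[Psect]) -> List[Tuple[str, int, int]]:
    """Shift overlapped PSECTs upward in order; return (name, old, new) edits."""
    changes = []
    for prev, cur in zip(psects, psects[1:]):
        prev_end = prev.start + prev.length      # first word after prev
        if prev_end > cur.start:                 # overlap of prev_end - start words
            new_start = cur.start + (prev_end - cur.start)
            if cur.page_aligned:
                new_start = round_up_to_page(new_start)
            changes.append((cur.name, cur.start, new_start))
            cur.start = new_start                # may now overlap the next one
    return changes


layout = [
    Psect("PSA", 0o100000, 0o25300, True),
    Psect("PSB", 0o125000, 0o1200, True),   # overlapped by 300 words
    Psect("PSC", 0o127000, 0o100, True),    # overlapped once PSB moves
]
edits = resolve_overlaps(layout)
for name, old, new in edits:
    print(f"{name}: change /SET: value from {old:o} to {new:o}")
total = sum(p.length for p in layout)
print(f"total length of all PSECTs: {total:o} words")
```

Here PSB moves from 125000 to 126000, which pushes PSC from 127000 to
130000, the same repeat-until-clean loop that STEPs 6 through 10
describe.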

                             EXEC DEBUGGING

Now that most SWS have microfiche of the released EXEC and MONITOR, I
anticipate questions about looking at the EXEC and MONITOR.  Here is a
cursory tutorial on investigating the internals of the EXEC (or
command processor, if you prefer).  The examples are intended to be a
guide, and although the typein is correct, the response may not be
character perfect.  You are advised to read the other chapters in this
document  for  more  information  on  DDT  and  MONITOR  snooping  and

                      LOOKING AT THE EXEC WITH DDT

Using the DDT that is loaded with the EXEC, you can look at either the
running system EXEC or your own copy of the EXEC.


First, you must have WHEEL privileges in order to use the ^EEDDT
command.  The ^EEDDT command transfers control to the DDT now loaded
with the EXEC, with symbols.  Now you can do all the normal DDT
functions.  To exit from DDT, type <ESC>G, echoed as $G.  This starts
your program, which is the EXEC, so you are now at EXEC command level.



Get your copy of the EXEC into your address space, transfer control to
it and start DDT as above.  There are 3 ways to exit, depending on the
state you are in.  If you are in DDT, you can ^Z out to get back to
the system EXEC.  If you are running your EXEC and want to exit to the
system EXEC, you can ^EQUIT (if you are enabled) or "POP" (if you are
not enabled).  POP is preferable.  Note: if you prefer to GET your EXEC
without starting it, in order to set breakpoints or put in patches
before running it, see section "VI -- PATCHING" below.

                @GET MYEXEC.EXE 
                @MONNAM.TXT, TOPS-20 MONITOR (VERSION#)
                CINITF/  -1   0         ; reset initialization flag so you can  
                                        ; run this EXEC again after it is saved
                ^Z                      ; to exit and save, for example
                @                       ; now you are in the monitor's EXEC
                                        ; with your EXEC in your
                                        ; address space.  You can save it, say.
                @SAV MYEXEC.EXE.2


                @GET MYEXEC.EXE
                @MONNAM.TXT,,TOPS-20 MONITOR(VERSION #)
                CINITF/  -1  0          ; clear initialization flag
                $G                      ; running your EXEC
                $^EQUIT                 ; return to higher (system) EXEC
                @                       ; you are in system EXEC
                @SAV NEWEXEC            ; etc.


                @GET MYEXEC.EXE
                @MONNAM.TXT,,TOPS-20 MONITOR(VERSION#)
                @POP                    ; return to higher (system) EXEC.
                @                       ; now you are in system EXEC.

                                        ; NOTE: you should set CINITF to 0
                                        ; if you want to save and run this
                                        ; EXEC later.  You can do it by
                                        ; DDT after the POP or ^EEDDT before
                                        ; the POP.


     Since you could get into trouble with your EXEC and not be able to
get out of it (CTRL/C is trapped, you can't POP, or whatever), there is
a way to always exit to the MINI-EXEC.  First you must issue ^EQUIT to
get into the MINI-EXEC.  Then type "S" (start) to get back to the
system EXEC.  Then get into your EXEC.  If you now get into trouble,
you can issue ^P, which will get you back into the MINI-EXEC.  From
there you can get back to the system EXEC with "S" (start).


        INTERRUPT AT 15657
        $                               ; you are now back in system EXEC.
        $GET MYEXEC
                .                       ; let's say you can't do anything
                .                       ; you are in your EXEC
                .                       ; get out, get into MINI-EXEC
        INTERRUPT AT 12345
        MX>S                            ; MINI-EXEC prompt followed by start.
        $                               ; you are now in the system EXEC.


     Suppose that you want to run your EXEC as  the  top  level  EXEC,
that  is,  not  running under the system EXEC.  Get into the MINI-EXEC
and get your copy of the EXEC and run it as the top level EXEC.

        INTERRUPT AT 23456
        MX>R                  ; Reset else you will MERGE rather than just GET
        @                     ; Now you are in your EXEC
          .                   ; Let's say you want to get out
        @^P                   ; Control-P to get to MINI-EXEC
        MX>R                  ; "RESET" resets your address space
        MX>E                  ; You are requesting the system EXEC
        @                     ; You are in system EXEC        

NOTE:   If you had typed "S"  rather than "E" above you  would
        have restarted your EXEC.


Once you have made a change to your personal copy of the EXEC, you may
wish to have your edited EXEC run as the SYSTEM EXEC.  It is necessary
to make the saved EXEC non-writable before using it system-wide.



        81. pages, Entry vector loc 6000 len 3

        0        FARK:<4-FIELD-IMAGE.EXEC>EXEC.EXE.1  1   R, CW, E
        6-125    FARK:<4-FIELD-IMAGE.EXEC>EXEC.EXE.1  2-121   R, E

        .               ;Make the edits
        $SAVE EXEC.EXE.2 !New generation! (PAGES FROM) 6 (TO) 125 
         EXEC.EXE.2 Saved
        $COPY (FROM) EXEC.EXE.2 (TO) PS:<SYSTEM>EXEC.EXE.197 !New generation!


There is one error message you may see when trying to start DDT: "?"
means that you do not have sufficient privileges enabled.

     When searching for symbols, you may notice that the module name
DDT gives you is different from the module names that are assembled
for the EXEC.  For example, to open the symbol table for EXECED, type
CANDE$: to DDT.

The following is a correspondence list:

        EXECDE.MAC      XDEF
        EXECGL.MAC      XGLOBS
        EXECPR.MAC      PRIV
        EXEC0.MAC       EXEC0
        EXEC1.MAC       EXEC1
        EXEC2.MAC       EXEC2
        EXEC3.MAC       EXEC3
        EXEC4.MAC       EXEC4
        EXECED.MAC      CANDE
        EXECCS.MAC      CSCAN
        EXECSU.MAC      SUBRS
        EXECMT.MAC      EXECMT
        EXECQU.MAC      EXECQU
        EXECSE.MAC      EXECSE
        EXECP.MAC       EXECP
        EXECVR.MAC      VER
        EXECMI.MAC      MIC

     The sources and .CTL file for assembling  the  EXEC  are  on  the

     If, upon trying to examine a location symbolically, you get "U"
(meaning the symbol is undefined), you may have to reset the symbol
table pointer.  Location 770001 holds the address of the word that
contains the symbol table pointer, and location 116 holds the real
symbol table pointer.  Put the contents of 116 into the location
pointed to by 770001.

        116/   762600,,54463  ; real symbol table pointer

        770001/  776456       ; location of symbol table pointer

        776456/  743200,,23540     762600,,54463  ; replace with contents of 116
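
The repair above amounts to copying one 36-bit word.  The Python sketch
below models a few memory words as a dictionary, using the values from
the example, and also decodes the pointer.  It assumes the pointer has
the usual -length,,address form (the left half holds the length negated
in 18-bit two's complement), which is how location 116, the job data
area word .JBSYM, is normally laid out.  The dictionary model is purely
illustrative.

```python
HALF_MASK = 0o777777       # 18-bit half-word mask
HALF_RANGE = 0o1000000     # 2**18


def make_word(left: int, right: int) -> int:
    """Build a 36-bit word from its halves, as DDT shows left,,right."""
    return (left << 18) | right


def decode_symbol_pointer(word: int):
    """Split a -length,,address pointer into (length, address)."""
    left, right = word >> 18, word & HALF_MASK
    length = (HALF_RANGE - left) % HALF_RANGE   # undo the 18-bit negation
    return length, right


def repair_symbol_pointer(memory: dict) -> int:
    """Copy the real pointer in 116 into the word addressed by 770001."""
    target = memory[0o770001]
    memory[target] = memory[0o116]
    return target


memory = {
    0o116: make_word(0o762600, 0o54463),      # real symbol table pointer
    0o770001: 0o776456,                       # where the pointer is kept
    0o776456: make_word(0o743200, 0o23540),   # stale pointer
}
target = repair_symbol_pointer(memory)
length, address = decode_symbol_pointer(memory[target])
print(f"{target:o}/ {memory[target] >> 18:o},,{memory[target] & HALF_MASK:o}")
print(f"symbol table: {length:o} words starting at {address:o}")
```

With these values the symbol table is 15200 (octal) words long and
starts at 54463.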


     There is a patch command in DDT.  The form is as follows:

        $<                    ; patch before this instruction
        $$<                   ; patch after this instruction
        $>                    ; end this patch following this instruction

DDT will put the patch in the EXEC patch area, whose symbol is PAT..
Following the patch you typed in, DDT will insert JUMPA 1,LOC+1 and
JUMPA 2,LOC+2, where LOC is the location of the instruction you're
patching.  DDT then replaces LOC, the original instruction, with a
JUMPA XXXXX, where XXXXX is the patch area where your patch now is.
Then the patch area (PAT..) is redefined to follow your last patch.


Get a copy of <SYSTEM>EXEC and insert calls to subroutines MUMBLE and
FRATZ before location DING+1.  DING+1 originally contains PRINT Q3;
after the patch it contains a JUMPA to the patch area.  The patch area
will contain:

        CALL MUMBLE
        CALL FRATZ
        PRINT Q3
        JUMPA 1,DING+2
        JUMPA 2,DING+3


        $SAVE NUEXEC          ; you must SAVE and GET in order to write-
        $GET NUEXEC           ; enable the EXEC; use DDT, not ^EEDDT.
        EXEC0$:               ; open symbols for module where DING is

        DING/ PUSH P,A        ; first location in routine "DING"
        DING+1/ PRINT Q3 $<   ; begin patching before location DING+1
        PAT../ 0  CALL MUMBLE ; DDT opens up PAT.. area, you add code
        PAT..+1/CALL FRATZ    ; continue to insert your patch
        $>                    ; close the patch
        PAT..+2/ PRINT Q3     ; the original instruction being replaced.
        PAT..+3/ JUMPA 1,DING+2       ; DDT inserts this return.
        PAT..+4/ JUMPA 2,DING+3       ; in case of a SKIP instruction.

        DING+1/  JUMPA 12345  ; JUMPA to PAT.. replaces original LOC.
        $G                    ; start your copy of EXEC etc.
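
As a rough model of the bookkeeping DDT does for this kind of patch,
the Python sketch below replays the DING example.  Memory is a
dictionary of symbolic instruction strings; the address of DING (2000)
is a made-up value, while 12345 is the PAT.. address shown in the
example.  This is not DDT's code, only an illustration of the layout
described above.  The second JUMPA exists so that, if the relocated
original instruction skips, control still returns to LOC+2.

```python
def insert_patch(memory: dict, loc: int, pat: int, new_code: list) -> int:
    """Patch new_code in before the instruction at loc, using the area at pat.

    Returns the new value of PAT.. (the first free word after the patch).
    """
    original = memory[loc]
    body = list(new_code) + [
        original,                  # relocated original instruction
        f"JUMPA 1,{loc + 1:o}",    # normal return
        f"JUMPA 2,{loc + 2:o}",    # return if the original instruction skips
    ]
    for offset, instruction in enumerate(body):
        memory[pat + offset] = instruction
    memory[loc] = f"JUMPA {pat:o}"  # divert LOC to the patch area
    return pat + len(body)


DING = 0o2000                       # hypothetical address of DING
PAT = 0o12345                       # PAT.. as shown in the example
memory = {DING: "PUSH P,A", DING + 1: "PRINT Q3"}
new_pat = insert_patch(memory, DING + 1, PAT, ["CALL MUMBLE", "CALL FRATZ"])
for addr in range(PAT, new_pat):
    print(f"{addr:o}/ {memory[addr]}")
print(f"{DING + 1:o}/ {memory[DING + 1]}")
```

The printout mirrors the DDT session: the two calls, the relocated
PRINT Q3, the two return jumps, and DING+1 replaced by JUMPA 12345.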

     Various  methods  may  be  used  to  write-enable  the  EXEC  for
patching.   You  can  use  the  GET,  SAVE method above, or SET PAGE n
COPY-ON-WRITE, or the $W command in DDT to achieve the same results.

                        RECOVERING FROM A BAD EXEC

     This procedure is simply a rehash of the procedure for recovering
from  the  case  in  which  the  EXEC  refuses  to  log  in.  For more
information see the article "Looking at the EXEC with DDT".

     If your system version of the EXEC blows up completely,  you  can
recover  rather  easily.   You type a ^C on the CTY, and when the EXEC
blows up you will be dumped into the MINI-EXEC.  Then you can use  the
GET  and  START commands to read in a good version of the EXEC, either
from a copy on disk, or from the distribution magtapes.

     If the problem with the EXEC is that it does not blow up but
still fails to let you log in, then you have a harder time.  In this
case you have to bring the system up stand-alone, using the switches.
An example of what to do from the point where the BOOT program is
loaded follows:

BOOT>/L                 ; load in the monitor
BOOT>/G141              ; start up EDDT

DBUGSW[   0   2         ; set system as debugging
EDDTF[   0   1          ; keep EDDT around

GOTSWM$B                ; set a breakpoint after the swappable
                        ; part of the monitor has been loaded
147$G                   ; start the system
FFF:                    ; change the name of the EXEC file
0$1B                    ; remove the GOTSWM breakpoint
$P                      ; proceed to bring up the system

^C                      ; and Control-C to get the new EXEC

If  you had no old version of the EXEC around, then change the name to
some garbage, so that the monitor can't find any such  program.   This
will  then  dump  you into the MINI-EXEC, and then you can read a good
EXEC in from magtape.

     In release 3 of the monitor, there is a new JSYS which is very
useful for debugging new versions of the EXEC.  The CRJOB JSYS allows
you to start up a new job with any program at all as its top-level
fork.  You can also start the job not logged in.  So you can debug
your new versions of the EXEC easily, with no possibility of ripping
yourself off.  Of course, the ^EQUIT, GET from MINI-EXEC sequence is
still a valid way of starting a new top-level fork.

                      Debugging the GALAXY System


The GALAXY system presents a unique problem to the software specialist
who  is trying to debug one of its components.  Usually, any user mode
program can be debugged under TOPS-20 by running a copy of it,  loaded
with  DDT,  taking  appropriate  care  that nothing is done which will
affect any users of the system.   For  GALAXY,  however,  it  is  very
difficult  to not affect users of the system.  For example, if you are
trying to debug BATCON, you will find that QUASAR  will  very  happily
schedule batch jobs submitted by other users to be run by your BATCON.
If you are not careful, you can cause those batch jobs to be lost,  or
at least slowed down, while you are debugging.

Debugging QUASAR or ORION would be even worse.  Users would see PRINT,
SUBMIT,  etc.   commands  hang  when  you  hit a breakpoint in QUASAR.
Operators would be unable to control any system components if you were
breakpointed  in  ORION.   On  top  of  this,  the monitor knows about
QUASAR, and you may lose messages which  happen  when  users  close  a
spooled lineprinter file, or when a job logs out.

To solve these problems, the concept of a "private GALAXY system"  has
been implemented by software engineering in version 4 of GALAXY.  When
a private GALAXY system  is  operating,  all  of  its  components  are
completely  independent  of  the  primary  GALAXY system.  QUASAR, the
queue maintainer, keeps queues  that  are  separate  from  the  system
queues  and  are  failsofted  to  a different master queue file.  This
QUASAR communicates only with other components  in  the  same  private
system.   It  is  even possible to run several complete private GALAXY
systems, with the restrictions that:

     1.  All components in a private system must run  under  the  same
         user name.

     2.  Only one private system may be run by a given user.

     3.  Each  private  QUASAR  must  be  connected  to  a   different

     4.  Each  private  ORION  must  be  connected  to   a   different

Since the changes necessary to create a private GALAXY system were
implemented in the version 4 source code, it is relatively simple to
build the system.  The recommended procedure is as follows:

     1.  Create a directory for the private GALAXY system.

     2.  Restore  the  file  EXEC-FOR-DEBUGGING-GALAXY.EXE  from   the
         SWSKIT to this newly created directory.

     3.  Restore each of the following files from  the  "Subsys  files
         for  TOPS20  V4"  saveset on the TOPS-20 distribution tape to
         this directory.


     4.  For each component in the above list  except  GLXLIB.EXE  and
         QMANGR.EXE, perform the following steps:

         1.  Give the EXEC command "GET xxxxxx.EXE"

         2.  Give the command "DEPOSIT 135 -1"

         3.  Give the command "SAVE xxxxxx"
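
Step 4 is mechanical, so the command sequence can be generated.  The
Python sketch below produces the EXEC commands for a list of
components, skipping GLXLIB and QMANGR as step 4 requires.  The
component list is an assumption drawn from programs mentioned
elsewhere in this article, since the actual file list from the
distribution tape is not reproduced above.

```python
EXCLUDED = {"GLXLIB", "QMANGR"}   # step 4 does not patch these two


def debug_build_commands(components):
    """Return the GET / DEPOSIT 135 -1 / SAVE commands for each component."""
    commands = []
    for name in components:
        if name.upper() in EXCLUDED:
            continue
        commands += [f"GET {name}.EXE", "DEPOSIT 135 -1", f"SAVE {name}"]
    return commands


# Assumed component list, for illustration only.
components = ["GLXLIB", "QUASAR", "ORION", "OPR", "BATCON", "QMANGR"]
commands = debug_build_commands(components)
print("\n".join(commands))
```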

It is not strictly necessary to restore all of the  GALAXY  components
for  a  one  time  only  debugging session.  To debug a component like
BATCON, you would need at a minimum:

     1.  Your own copy of BATCON

     2.  Your own copy of QUASAR for BATCON to speak to

     3.  Your own copy of ORION for BATCON and QUASAR to speak to

     4.  A copy of OPR to speak to ORION to control BATCON

     5.  An EXEC which knows about your QUASAR to make queue entries

The following is a log of an example build of a private GALAXY system:

 TOPS-20 Command processor 4(560)
$! First connect to a debugging directory
$! Now build and save debugging .EXE files
$! QUASAR, the queue maintainer
$! ORION, the message clearinghouse
 ORION.EXE.1 Saved
$! OPR, the operator interface
 OPR.EXE.1 Saved
$! BATCON, the batch controller
$! Now a directory of what we've got

 BATCON.EXE.1;P777700    16 8192(36)   13-Feb-80 22:00:37 
                         82 41984(36)  13-Feb-80 04:33:50 
 OPR.EXE.1;P777700       31 15872(36)  13-Feb-80 22:00:09 
 ORION.EXE.1;P777700     44 22528(36)  13-Feb-80 21:59:45 
 QUASAR.EXE.1;P777700    40 20480(36)  13-Feb-80 21:59:27 

 Total of 213 pages in 5 files

Starting and running a private GALAXY system  is  similar  to  running
GALAXY  in the usual manner.  First QUASAR and ORION are started, then
the component you wish to debug.  You will  also  need  OPR  to  issue
operator  commands and the modified EXEC to make queue entries.  Since
you will need about five jobs, it is usually most  convenient  to  run
each component as a separate subjob under PTYCON.

4.1  Starting QUASAR

QUASAR and ORION should be started before  everything  else.   Nothing
evil happens if you start them last, but all the other components will
be waiting for these two to start.  A suggested procedure is:

     1.  Define a subjob "Q"

     2.  Connect to it

     3.  LOGIN a job under the same user name

     4.  CONNECT that job to  the  directory  in  which  you  did  the
         private GALAXY build

     5.  ENABLE

     6.  RUN QUASAR

4.2  Starting ORION

Starting ORION is as painless as starting QUASAR:

     1.  Define a subjob "O"

     2.  Connect to it

     3.  LOGIN a job under the same user name

     4.  CONNECT that job to  the  directory  in  which  you  did  the
         private GALAXY build

     5.  ENABLE

     6.  RUN ORION

4.3  Starting OPR

OPR starts up using the same formula as QUASAR and ORION:

     1.  Define a subjob "OPR"

     2.  Connect to it

     3.  LOGIN a job under the same user name

     4.  CONNECT that job to  the  directory  in  which  you  did  the
         private GALAXY build

     5.  ENABLE

     6.  RUN OPR

     7.  You may now type OPR commands to  see  if  QUASAR  and  ORION
         appear to be healthy.

4.4  Starting The Component To Be Debugged

If the component you wish to debug is QUASAR, ORION, or OPR, then  you
have already started it.  Breakpoints could have been set, and when
they were hit, the component could have been debugged without any
noticeable effect on other users of the system.  If you wish to debug

     1.  Define a subjob with an appropriate ID (e.g.  B for BATCON)

     2.  Connect to it

     3.  LOGIN a job under the same user name

     4.  CONNECT that job to  the  directory  in  which  you  did  the
         private GALAXY build

     5.  ENABLE

     6.  GET the component

     7.  Enter DDT

     8.  Set breakpoints, then start the program

4.5  Starting The Modified EXEC

The file "EXEC-FOR-DEBUGGING-GALAXY.EXE" which has  been  supplied  on
the  SWSKIT  has  exactly two commands added to its repertoire.  These
effect  of  these commands is to select which one of two PIDs (Process
IDs) to communicate with:  the system QUASAR or  the  private  QUASAR.
and the INFORMATION commands will all  cause  communication  with  the
system  QUASAR.   If "DEBUGGING-GALAXY" is set for this EXEC, then the
commands listed will communicate with the private QUASAR run  by  that

     1.  Define a subjob "E"

     2.  Connect to it

     3.  LOGIN a job under the same user name

     4.  CONNECT that job to  the  directory  in  which  you  did  the
         private GALAXY build


     6.  ENABLE


The following is a log of a sample debugging session:

 TOPS-20 Command processor 4(560)
@! First run PTYCON, so we can control five jobs from one terminal
PTYCON> ! Now start up QUASAR as subjob Q

 2102 Development System, TOPS-20 Monitor 4(3245)
 Job 21 on TTY222 13-Feb-80 22:18:05
Structure PS: mounted
Structure MISC: mounted
$! Connect to directory where debugging .EXE files are
$! Finally run the component
% QUASAR GLXIPC Becoming  [HEMPHILL]QUASAR     (PID = 66000031)
% QUASAR GLXIPC Waiting for ORION to start
PTYCON> ! Now start up ORION as subjob O

 2102 Development System, TOPS-20 Monitor 4(3245)
 Job 22 on TTY223 13-Feb-80 22:19:25
Structure PS: mounted
Structure MISC: mounted
$! Connect to directory where debugging .EXE files are
$! Finally run the component
% ORION  GLXIPC Alternate [HEMPHILL]QUASAR     (PID = 66000031)
% ORION  GLXIPC Becoming  [HEMPHILL]ORION      (PID = 70000032)
**** Q(0) 22:19:58 ****
% QUASAR GLXIPC Alternate [HEMPHILL]ORION      (PID = 70000032)
**** O(1) 22:19:58 ****
PTYCON> ! Now start up OPR as subjob OPR

 2102 Development System, TOPS-20 Monitor 4(3245)
 Job 23 on TTY224 13-Feb-80 22:20:29
Structure PS: mounted
Structure MISC: mounted
$! Connect to directory where debugging .EXE files are
$! Finally run the component
% OPR    GLXIPC Alternate [HEMPHILL]QUASAR     (PID = 66000031)
% OPR    GLXIPC Alternate [HEMPHILL]ORION      (PID = 70000032)
22:19:59          -- Network Node 1031 is Online --

22:19:59          -- Network Node 2137 is Online --

22:19:59          -- Network Node 4097 is Online --

22:19:59          -- Network Node DN20A is Online --

22:19:59          -- Network Node MILL20 is Online --

22:19:59          -- Network Node SYS880 is Online --
OPR>! Let's take a look at our brand new queues
22:21:21          --The Queues are Empty--
22:21:27          --There are no Devices Started--
PTYCON> ! Now start up BATCON as subjob B

 2102 Development System, TOPS-20 Monitor 4(3245)
 Job 24 on TTY225 13-Feb-80 22:21:49
Structure PS: mounted
Structure MISC: mounted
$! Connect to directory where debugging .EXE files are
$! Finally run the component
% BATCON GLXIPC Alternate [HEMPHILL]QUASAR     (PID = 66000031)
% BATCON GLXIPC Alternate [HEMPHILL]ORION      (PID = 70000032)
PTYCON> ! Now start up special EXEC as subjob E

 2102 Development System, TOPS-20 Monitor 4(3245)
 Job 19 on TTY226 13-Feb-80 22:23:00
Structure PS: mounted
Structure MISC: mounted
@! Run the special EXEC, which is provided on the SWSKIT

 TOPS-20 Command processor 4(560)-1
$! Make this EXEC switch from system queues to private queues
$! Use ordinary EXEC commands to examine private queues
[The Queues are Empty]
[The Queues are Empty]
$! Now switch back to look at system queues

Printer Queue:
Job Name  Req#  Limit            User
--------  ----  -----  ------------------------
* KLERR      6   1197  DEUFEL                     On Unit:0
   Started at 22:05:47, printed 314 of 1197 pages
  XXX        3     18  KAMANITZ                   /Dest:4097
  MS-OUT    18    117  BRAITHWAITE                /Unit:0
There are 3 Jobs in the Queue (1 in Progress)


Batch Queue:
Job Name  Req#  Run Time            User
--------  ----  --------  ------------------------
* DUMP      16  02:00:00  OPERATOR                In Stream:0
    Job# 17 Running DUMPER Last Label: A Runtime 0:23:55
  BATCH      2  00:05:00  BLIZARD                 /Proc:FOO
  SOURCE     8  00:05:00  BLOUNT                  /After:14-Feb-80  0:00
  SRCCOM    12  00:05:00  MURPHY                  /After:14-Feb-80  0:00
  QJD4R     13  00:05:00  SROBINSON               /After:19-Feb-80  0:00
  QAR       10  00:05:00  BLOUNT                  /After:19-Feb-80  0:14
  SAVE       1  00:05:00  FICHE                   /After:19-Feb-80  9:10
There are 7 Jobs in the Queue (1 in Progress)

$! Now let's submit a batch job to our own BATCON
$! Make a trivial batch control file
$COPY (FROM) TTY: (TO) A.CTL.1 !New file! 
 TTY: => A.CTL.1

$! And submit the job
[Job A Queued, Request-ID 1, Limit 0:05:00]
$! Now examine private queues

Batch Queue:
Job Name  Req#  Run Time            User
--------  ----  --------  ------------------------
  A          1  00:05:00  HEMPHILL              
There is 1 Job in the Queue (None in Progress)

$! Our job is in the batch queue, but no batch-streams have been started

OPR>START (Object) BATCH-STREAM (Stream Number) 0
22:25:40        Batch-Stream 0  --Startup Scheduled--

22:25:40        Batch-Stream 0  --Started--
22:25:40        Batch-Stream 0  --Begin--
                Job A Req #1 for HEMPHILL
22:25:51        Batch-Stream 0  --End--
                Job A Req #1 for HEMPHILL
PTYCON> ! Cleaning up is easy

This section explains what happens differently when a component has
had location 135 (.JBOPS) poked to -1, and presents a few helpful
tidbits of information about debugging some of the programs.  .JBOPS,
incidentally, is the word in the job data area (defined under TOPS-10)
which is reserved for a program's OTS.  GALAXY references this
location by the symbol "DEBUGW".


GLXLIB is the GALAXY library.  It consists of  a  code  segment  which
starts  at  address 400000 and a data segment at address 600000.  Each
SPRINT,  and  SPROUT uses it.  Part of the initialization code of each
of these programs maps in GLXLIB as a  "high  segment".   This  is  in
effect  an  object  time  system  for  GALAXY, with many commonly used
routines.  Most of the support for the private  GALAXY  system  is  in
this  library,  enough so that OPR, PLEASE, BATCON, LPTSPL, SPRINT and
SPROUT actually have no code which cares whether they are  part  of  a
private  GALAXY.   The  initialization code in each component looks in
three places to find GLXLIB.EXE:  first on the structure and directory
that  the  component  itself came from, second on DSK:, third on SYS:.
This search order is the same for  both  the  system  GALAXY  and  the
private one.
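
The three-place search for GLXLIB.EXE can be sketched as follows.  In
this Python illustration the exists callback is a hypothetical stand-in
for whatever file lookup the real initialization code performs, and
PS:<HEMPHILL> is a made-up directory.  The same list is used whether or
not the component is part of a private GALAXY.

```python
from typing import Callable, List, Optional


def glxlib_search_list(component_dir: str) -> List[str]:
    """Places GLXLIB.EXE is looked for, in order."""
    return [
        f"{component_dir}GLXLIB.EXE",  # where the component itself came from
        "DSK:GLXLIB.EXE",
        "SYS:GLXLIB.EXE",
    ]


def find_glxlib(component_dir: str,
                exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first file spec that exists, or None if GLXLIB is missing."""
    for spec in glxlib_search_list(component_dir):
        if exists(spec):
            return spec
    return None


# Hypothetical case: only the SYS: copy exists.
available = {"SYS:GLXLIB.EXE"}
print(find_glxlib("PS:<HEMPHILL>", available.__contains__))
```

A private copy of GLXLIB.EXE placed in the debugging directory is
therefore found before the one on SYS:.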

     The actual changes implemented for the private GALAXY are as follows:

     1.  Ordinarily, a component which stopcodes  will  save  a  crash
         file on disk.  When debugging, however, the crash file is not
         written.  In either case, if DDT is loaded with the  program,
         the stopcode will invoke a jump to DDT.

     2.  When debugging, GALAXY components do not require the packets
         they receive to be privileged.

     3.  Ordinarily, QUASAR and ORION get special system PIDs for IPCF
         communications.   When debugging, they get PIDs with names of
         the  form  "[username]QUASAR"  and  "[username]ORION".    All
         GALAXY components will then look for these PID names.  Even a
         pseudo-GALAXY component, such as MOUNTR or  IBMSPL,  will  be
         able to find these PIDs if its location 135 has been poked to
         -1, simply because it uses GLXLIB.

     4.  GALAXY components print messages like:
         "% QUASAR GLXIPC Waiting for ORION to start"
         only while debugging.

     5.  ORION and QUASAR print messages about the PIDs they acquire, like:
         "% QUASAR GLXIPC Becoming  [HEMPHILL]QUASAR     (PID =

     6.  All components print messages about  the  special  PIDs  they
         find for QUASAR and ORION, like:
         "% ORION  GLXIPC Alternate [HEMPHILL]QUASAR     (PID =
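The PID naming rule in item 3 depends only on whether location 135 holds -1 and on the user's name.  A minimal Python sketch, with core memory modeled as a dictionary of octal addresses to 36-bit words; the function names and the placeholder string for the system PID are hypothetical, and the sketch does not model how GLXLIB actually acquires or looks up PIDs.

```python
WORD_MASK = (1 << 36) - 1   # a PDP-10 word holds 36 bits
JBOPS = 0o135               # .JBOPS, known to GALAXY as DEBUGW
MINUS_ONE = WORD_MASK       # -1 in 36-bit two's complement


def is_debugging(memory: dict) -> bool:
    """True if location 135 holds -1 (memory maps address -> word)."""
    return (memory.get(JBOPS, 0) & WORD_MASK) == MINUS_ONE


def pid_name(component: str, username: str, debugging: bool) -> str:
    """Describe how QUASAR or ORION is located over IPCF."""
    if component not in ("QUASAR", "ORION"):
        raise ValueError("only QUASAR and ORION get special PIDs")
    if debugging:
        return f"[{username}]{component}"      # private, named PID
    return f"<system PID for {component}>"     # special system PID


core = {JBOPS: -1}          # as if 135 had been poked to -1 in DDT
print(pid_name("QUASAR", "HEMPHILL", is_debugging(core)))
```

Masking to 36 bits lets the check accept either a Python -1 or the octal word 777777777777, which are the same value on the PDP-10.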


     1.  QUASAR reads and writes private  queues  from  its  connected
         directory.  The full filespec is 

     2.  QUASAR does absolutely no  privilege  checking.   Anyone  can
         modify or kill any request in the queues (if they know how to
         speak to this private QUASAR).

6.3  ORION

     1.  ORION creates a log file named "DSK:ORION-TEST.LOG" instead of
         "PS:<SPOOL>ORION-SYSTEM-LOG.001", and does not rename any old
         log files present.

     2.  ORION will not set up any NSP  servers  when  debugging.   It
         therefore  will  not  speak  to  remote nodes to run OPRs for
         them.  However, there  are  hooks  for  ORION  to  initialize
         "SRV:128" instead of the usual "SRV:47" when debugging.
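The two ORION differences can be restated as a set of settings keyed on the debugging flag.  This Python sketch only puts the text above into data form; the class and field names are hypothetical, and the assumption that the system ORION renames old logs is an inference from item 1, not something stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrionSettings:
    log_file: str           # where ORION writes its log
    renames_old_logs: bool  # whether old log files are renamed
    nsp_servers: bool       # whether NSP servers for remote OPRs are set up
    srv_device: str         # SRV: device the (hooked) server code would use


def orion_settings(debugging: bool) -> OrionSettings:
    if debugging:
        # Private ORION: test log on DSK:, old logs left alone, no NSP
        # servers; SRV:128 is only a hook, since no servers are created.
        return OrionSettings("DSK:ORION-TEST.LOG", False, False, "SRV:128")
    # System ORION: renaming old logs is inferred from the contrast above.
    return OrionSettings("PS:<SPOOL>ORION-SYSTEM-LOG.001", True, True, "SRV:47")


print(orion_settings(True))
```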


QMANGR has also been modified to look for a private  QUASAR's  PID  if
the low segment has a non-zero entry in .JBOPS.


CDRIVE can be difficult to debug, since it potentially has many
inferior forks all executing the same code.  To handle this, each fork
automatically loads SDDT into its address space and jumps to it when it
starts up.  After setting any breakpoints or otherwise modifying the
fork's code, type "GO<ESC>G" to resume the fork.  While debugging, if
the fork terminates (crashes), CDRIVE does not go through its normal
purging of the crashed fork, so that its status can be examined.


All GALAXY components use the stopcode facility  supplied  by  GLXLIB.
This  facility  dumps  the  ACs, program error codes, associated error
messages, program version numbers, and the last nine locations of  the
stack  onto  the  controlling  terminal  of  the program executing the
stopcode.  In addition, a crash file is created with a name of the
form PS:<SPOOL>program-stopcode-CRASH.EXE.  This .EXE file contains
the entire core image  of  the  program  which  has  crashed,  and  is
extremely   useful   in  determining  the  cause  of  the  crash.   In
particular, there is a block of data referred to as the "crash  block"
which usually contains the information most pertinent to the debugger.
This information can be read with either DDT or FILDDT.  Its  contents
are tabulated as follows:

        Location                Data

        .SPC                    PC of stopcode

        .SCODE                  SIXBIT name of stopcode

        .SERR                   Last TOPS-20 error code

        .SACS                   Contents of the sixteen accumulators

        .SPTBL                  Base address of page table used by

        .SPRGM                  Name of program in SIXBIT

        .SPVER                  Program version number

        .SPLIB                  GLXLIB version number

        .LGERR                  Last GALAXY error code

        .LGEPC                  PC of last GALAXY error return
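Two of the crash-block words, .SCODE and .SPRGM, hold SIXBIT text, and .SPVER holds a version word, so it can help to decode them after reading the raw values with FILDDT.  The Python sketch below assumes the 36-bit words are already in hand as integers.  The helper names are hypothetical; the version-word layout (who, major, minor, edit) is the usual DEC convention rather than anything this section states, so treat it as an assumption.  Likewise, that the crash filespec's program and stopcode parts match .SPRGM and .SCODE is assumed.

```python
WORD_MASK = (1 << 36) - 1


def sixbit_to_str(word: int) -> str:
    """Decode six 6-bit SIXBIT characters (ASCII = code + 040)."""
    chars = [chr(((word >> shift) & 0o77) + 0o40) for shift in range(30, -1, -6)]
    return "".join(chars).rstrip()


def str_to_sixbit(text: str) -> int:
    """Encode up to six characters as a SIXBIT word (blank padded)."""
    word = 0
    for ch in text.upper()[:6].ljust(6):
        code = ord(ch) - 0o40
        if not 0 <= code <= 0o77:
            raise ValueError(f"{ch!r} has no SIXBIT code")
        word = (word << 6) | code
    return word


def version_string(word: int) -> str:
    """Format an assumed who(0-2), major(3-11), minor(12-17), edit(18-35)
    version word; numbers are shown in octal, edit in parentheses."""
    word &= WORD_MASK
    who = (word >> 33) & 0o7
    major = (word >> 24) & 0o777
    minor = (word >> 18) & 0o77
    edit = word & 0o777777
    if minor > 26:
        raise ValueError("minor versions past Z are not handled here")
    text = f"{major:o}"
    if minor:
        text += chr(ord("A") + minor - 1)
    if edit:
        text += f"({edit:o})"
    if who:
        text += f"-{who:o}"
    return text


def crash_filespec(program_word: int, stopcode_word: int) -> str:
    """Build PS:<SPOOL>program-stopcode-CRASH.EXE from SIXBIT words."""
    return (f"PS:<SPOOL>{sixbit_to_str(program_word)}-"
            f"{sixbit_to_str(stopcode_word)}-CRASH.EXE")


# "XYZ" is a made-up stopcode name used only for illustration.
print(crash_filespec(str_to_sixbit("QUASAR"), str_to_sixbit("XYZ")))
print(version_string((4 << 24) | (1 << 18) | 0o123))
```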

                            DEBUGGING MOUNTR


     This write-up was prepared to assist developers  and  maintainers
in understanding and debugging the TOPS-20 tape and structure mounting
program, MOUNTR.   It  is  assumed  that  the  reader  has  a  working
knowledge  of TOPS-20 assembler language coding and the set of TOPS-20
monitor calls.


     This document will serve primarily as a guide to debugging MOUNTR
crashes.   Much of the information needed to understand the data bases
and the operation of MOUNTR resides within the first 20 or 30 pages of
the MOUNTR code itself.  Just make a listing and start reading.


     MOUNTR can  be  debugged  as  a  standard  GALAXY  component,  by
depositing -1 in location 135 of MOUNTR.EXE.  MOUNTR will acquire a PID
for a private copy of QUASAR and will communicate with it.

     To debug a MOUNTR which is actually recognized by the  system  as
the "real" MOUNTR, it is usually best to run it as a separate job by
including the following commands in SYSJOB.RUN:


     This job can be reached with the ADVISE command.  MOUNTR can then
be killed and a new copy started with appropriate breakpoints or
patches installed.  Before MOUNTR can be patched or breakpointed, it is
necessary to issue the DDT command $W, since MOUNTR write-protects
itself during execution.  For example:

      TTY2, NRT20
      TTY235, OPR
      TTY234, MOUNTR
      TTY233, PTYCON
      TTY232, EXEC
     TTY: 234
      [Pseudo-terminal, confirm]

      Escape character is <CTRL>E, type <CTRL>^? for help

     ^C                                   !KILL OLD MOUNTR
     $GET SYS:MOUNTR                      !GET A NEW ONE
     $DDT                                 !ENTER DDT
     $W                                   !YOU MUST DO THIS
     DDSCIH/   JSP 16,SAVEQR#   .$B   
     ^Z                                   !EXIT DDT
     $START                               !START MOUNTR

     Depositing 1 in location CDFLG will enable CONTROL-D  interrupts.
Typing CONTROL-D when enabled causes MOUNTR to enter DDT.


When MOUNTR crashes, it saves its core image in the file,


All crashes are initiated by a CALL STOP instruction.  This may result
from  a  logic  inconsistency,  or  it can happen if MOUNTR receives a
software interrupt on a panic channel.  The STOP routine gathers  some
important data and saves it in core.  It then types a message giving
the filespec in which it is saving the core image, and issues an
SSAVE JSYS to save the image.  After restoring the ACs from
the time of the crash, MOUNTR halts.

To begin debugging a MOUNTR crash, follow these steps:


     2.  Get into DDT and type STOP1$G.  This will load DDT's ACs with
         MOUNTR's  ACs  at the time of the crash and exit to the EXEC.
         Give the DDT command to the EXEC again to get back into DDT.

     3.  Look at P (AC 17).  If it contains PDL1+something, there  has
         been  a  stack  trap,  and  the routine STOPP was called as a
         result.  The location BADP contains the contents of P at  the
         time of the trap.

     4.  If P contains PDL+something, type TAB to look at the  top  of
         the  stack.   This  will  contain one plus the address of the
         CALL STOP instruction.   Type  TAB  and  ^H  to  display  the
         CALL STOP instruction that invoked the crash.  If MOUNTR died
         as a result of a panic channel interrupt, LPC1  will  contain
         one  plus  the address of the instruction which was executing
         at the time of the interrupt.
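Steps 3 and 4 amount to checking which stack area the right half of P points into.  The Python sketch below models that decision; the PDL and PDL1 addresses and sizes are hypothetical inputs you would take from the MOUNTR listing or DDT, memory is modeled as a dictionary of addresses to words, and it assumes, as the text describes, that the right half of the top stack word is one plus the address of the CALL STOP.

```python
from typing import Dict

HALF_MASK = 0o777777


def right_half(word: int) -> int:
    return word & HALF_MASK


def diagnose_stack(p: int, pdl: int, pdl_len: int, pdl1: int, pdl1_len: int,
                   memory: Dict[int, int]) -> str:
    """Apply steps 3 and 4 to the saved P (AC 17).

    pdl/pdl1 are the start addresses of the PDL and PDL1 areas and
    *_len their sizes; memory maps address -> 36-bit word.
    """
    addr = right_half(p)                    # right half of P is the stack top
    if pdl1 <= addr < pdl1 + pdl1_len:
        return "stack trap: STOPP was called; P at the trap is in BADP"
    if pdl <= addr < pdl + pdl_len:
        top = memory[addr]                  # return address pushed by CALL STOP
        call_stop = right_half(top) - 1     # one less is the CALL STOP itself
        return f"CALL STOP at {call_stop:o}"
    return "P is outside PDL and PDL1; examine the stack by hand"


# Hypothetical layout: PDL at 5000 (100 words), PDL1 at 6000 (20 words).
core = {0o5003: 0o4321}
print(diagnose_stack((0o777674 << 18) | 0o5003, 0o5000, 0o100, 0o6000, 0o20, core))
```

For a panic-channel crash the CALL STOP found this way is in the interrupt handling, so LPC1 is still the place to look for the instruction that was actually executing.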

The following locations and data structures are  helpful  in  locating
the cause of difficulties in MOUNTR:

----    --------
CRSHAC  Contains the ACs at the time the STOP routine was called.

LPC1    For crashes caused by panic channel interrupts, LPC1  contains
        one plus the address of the instruction that caused the crash.

LSTERR  Contains the last TOPS-20 error.

MRPDB   PDB for the last IPCF message received by MOUNTR.

MSTRBK  Used as an argument block for MTOPR and MSTR monitor calls.

RBUF    Last IPCF message received by MOUNTR (particularly  useful  if
        SSSDAT+1 contains MRCVIH, indicating that MOUNTR crashed while
        processing an incoming IPCF message).

SSSDAT  When MOUNTR crashes, SSSDAT+1  contains  the  address  of  the
        routine that was invoked by MOUNTR's scheduler.  Starting here
        and using the stack, you can trace the execution  of  MOUNTR's
        code that led to the crash.

TBUF    Last IPCF message sent by MOUNTR.

                           DEBUGGING PA1050

To debug the compatibility package you must have a copy of the file
PAT.EXE.  PA1050 is just the system name for PAT.  If there is no copy
of PAT.EXE, take the source program PAT.MAC and assemble it, thereby
creating a sharable save file called PAT.EXE.  The following steps are
required to debug the compatibility package:

$GET ISAM          ;Where ISAM may be any program you choose
$MERGE PAT         ;PAT is the source name for PA1050
PAT$:   MOVBF$b    ;You set your breakpoints here
$G                 ;You must type  $G  twice  because  of  the  double
                    symbol table


               Some of the error messages you receive from
               PA1050 may not be the true error message.  To
               have the correct error message printed, use an
               ERJMP or ERCAL after the failing JSYS.  For more
               information on ERJMP and ERCAL, refer to the
               Monitor Calls Reference Manual.
To build the compatibility package, the following steps are required:

Output file: PA1050.EXE

Starting the program after loading causes it to be moved from its load
location to its running location in high core.  The symbol table is
also moved, and its pointer adjusted.  A sharable save file of pages
700-777 must be made for debugging.  It is created when you type
MAKEPF$G and then load 40000,,0 in UDDT.  When you type I MEM you
should now see PA1050.EXE in pages 700-730.
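Page numbers here are octal, and each TOPS-20 page holds 512 (octal 1000) words, so the page ranges translate directly into address ranges.  A small Python sketch of that arithmetic; the helper name is hypothetical.

```python
PAGE_WORDS = 0o1000   # a TOPS-20 page is 512 (octal 1000) words


def page_span(first_page: int, last_page: int):
    """Return (first address, last address, page count) for an
    inclusive range of page numbers."""
    first_addr = first_page * PAGE_WORDS
    last_addr = (last_page + 1) * PAGE_WORDS - 1
    return first_addr, last_addr, last_page - first_page + 1


for lo, hi in ((0o700, 0o777), (0o700, 0o730)):
    start, end, count = page_span(lo, hi)
    print(f"pages {lo:o}-{hi:o}: addresses {start:o}-{end:o}, {count} pages")
```

So the 64 (decimal) pages at 700-777 cover addresses 700000-777777, and PA1050 in pages 700-730 occupies 25 of them, addresses 700000-730777.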

                          COPYING FLOPPY DISKS

This is  a description  of the  front end  program COP  (quick  floppy
copy). This program  should be  used to  create backup  copies of  the
distributed set of floppies.


1)	Only IBM floppies should be used.  Other floppies  may
	destroy the DX11 drives.

2)	Floppies have  a  finite  life while  mounted  in  the
	drive. The heads do not  float, and the floppies  turn
	continuously.  This causes the magnetic surface to  be
	eaten away. Minimum floppy life is something like  200

3)	Floppies which are dropped, badly shocked, or used  as
	frisbees will lose their  sector headers, and will  be
	good for nothing.

4)	Never put a floppy which you suspect is bent into  the
	drive -- it may damage the drive. 

5)	COP is also discussed in the Front End File System
	Specification manual in Volume 14 of the TOPS-20
	Software Notebooks, section 3.2.


	The basic COP command string is of the form:

	  COP> <destination device>/<switch>=<source device>

	To enter COP, type Control-backslash to get to the
	Parser, then type MCR COP to start COP.  The floppies
	should already have been mounted with MCR MOUNT, and
	should be dismounted with MCR DMOUNT after the copy.

	/HE	Help, types a list of switches
	/RD	Read Device, check for errors
	/CP	Copy (default action)
	/VF	Verify copy (default when copy in effect)
	/ZE	Zero the device
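The command-string form can be made concrete with a small parser.  This Python sketch is only a model for checking that a command is well formed before typing it; COP itself runs on the RSX-20F front end, and the device-name pattern, the rule that a copy needs a source, and the function names are assumptions rather than COP's documented grammar.

```python
import re
from typing import NamedTuple, Optional

SWITCHES = {"HE": "help", "RD": "read device", "CP": "copy",
            "VF": "verify", "ZE": "zero device"}

# <destination device>[/<switch>][=<source device>], e.g. DX1:=DX0:
_COP_RE = re.compile(r"([A-Z]+[0-9]*:)(?:/([A-Z]{2}))?(?:=([A-Z]+[0-9]*:))?")


class CopCommand(NamedTuple):
    destination: str
    switch: str
    source: Optional[str]


def parse_cop(line: str) -> CopCommand:
    """Split a COP command string into its parts (illustrative model)."""
    m = _COP_RE.fullmatch(line.replace(" ", "").upper())
    if m is None:
        raise ValueError(f"not of the form dest/switch=source: {line!r}")
    dest, switch, source = m.groups()
    switch = switch or "CP"                # copy is the default action
    if switch not in SWITCHES:
        raise ValueError(f"unknown switch /{switch}")
    if switch == "CP" and source is None:
        raise ValueError("a copy needs a source device")
    return CopCommand(dest, switch, source)


print(parse_cop("DX1:=DX0:"))
```

Here "DX1:=DX0:" parses as a copy (with its default verify) from DX0: onto DX1:; getting the two devices backwards would overwrite the original floppy.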


	The following sequence of commands copies the contents
	of the floppy in DX0: (the left-hand drive) onto the
	floppy in DX1: and verifies the copy:

	Mount completed
	Mount completed
	Dismount Complete
	Dismount Complete

	The copy takes about two minutes, and the verify about the same.
	Take care to specify the correct source and destination devices.


	If you use COP for many generations, you will build up
	ghost bad blocks until RSX declares the floppy
	useless.  This is because in each generation the bad
	block file of the  old floppy is  copied onto the  new
	(which will have its bad blocks in different  physical
	locations).  A way around this  is to use PIP for  any
	non-boot copies once every several generations.  
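The ghost-bad-block effect can be illustrated with sets of block numbers.  This Python sketch is a simplified model, not RSX-20F's bad-block handling: it assumes each new floppy's own defects eventually join its bad-block list, that a PIP copy leaves the destination with a list of only its own defects, and the block numbers are made up.

```python
from typing import Set


def cop_copy(source_list: Set[int], dest_defects: Set[int]) -> Set[int]:
    """Block-for-block copy: the destination inherits the source's
    bad-block list; its own defects are assumed to be added later."""
    return set(source_list) | set(dest_defects)


def pip_copy(dest_defects: Set[int]) -> Set[int]:
    """File-level copy: the destination keeps a list of only its own defects."""
    return set(dest_defects)


def ghosts(bad_list: Set[int], real_defects: Set[int]) -> Set[int]:
    """Entries in the bad-block list that are not bad on this floppy."""
    return bad_list - real_defects


# Hypothetical defect block numbers for three successive floppies.
generations = [{5}, {9}, {12}]
bad_list: Set[int] = set()
for defects in generations:
    bad_list = cop_copy(bad_list, defects)
print(sorted(bad_list), sorted(ghosts(bad_list, generations[-1])))
print(sorted(pip_copy(generations[-1])))
```

After three COP generations the last floppy lists blocks 5, 9 and 12 as bad although only block 12 really is; a PIP copy starts over with just the real defect.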

                       THE SWSKIT TOOLS PROGRAMS

Included on the SWSKIT are a number of utility programs,  as  summarized
below.  These tools have been found to have at least some usefulness in
the past in a debugging environment.  Most of these programs require the
user to have WHEEL or OPERATOR privileges to work, but most are also of
the "show and tell but don't touch" category, so  they  are  in  general
"safe" to run.

We have cleaned up some of the old ones a bit, added a few new ones, and
checked them all out to the extent that they will all run.  There should
even be some documentation, at least a HELP file, with each program.

While we do not actively "support" these programs, we are quite  willing
to accept complaints and suggestions and submissions from the field.

These are the "standard" tools;  the Marlboro Support Group is generally
familiar  with  their  operation and quirks, and in providing support to
the field may request that one or more of the  programs  be  used  at  a
customer  site  to  diagnose or assist in correcting a problem.  This is
generally more effective than random poking about in DDT, or  trying  to
learn the peculiarities of whatever the customer may have available.

And now, the current collection:

          PROGRAM                       DESCRIPTION
          -------                       -----------

          CHANS               This   program   will   produce   system
                              configuration, and status information on
                              tapes and disks.

          DIRPNT              This program will list the  contents  of
                              the blocks in a disk directory.

          DIRTST              This program will check the format,  and
                              list   any  invalid  data  in  directory

          DS                  This  program  will   provide   software
                              diagnostic help concerning the disk file
                              system.  It can perform the functions of
                              READ, FILADR, and UNITS.

          DSKERR              This program will provide  a  convenient
                              listing of the hard and soft disk errors
                              that have occurred.

          DX20PC              This program will trace the microcode PC
                              in the DX20.

          EXEC-FOR-DEBUGGING-GALAXY This  EXEC  contains  commands  to
                              facilitate  debugging  a  private GALAXY

          FILADR              This  program  will  display  the   disk
                              addresses   a  file  is  using,  or  the
                              addresses which are marked  in  the  BAT

          JSTRAP              This program will produce information in
                              a  log on any JSYS, including the PC and
                              arguments used.

          MONRD               This program will allow  you  to  easily
                              examine the running monitor.

          MTEST               This program will allow you to
                              insert MONITOR instruction execution
                              tests anywhere in the monitor.

          READ                This program performs the same action as
                              the   CHECK  FILE  command  to  DS;   it
                              read-checks files for disk errors.

          REV                 This program will allow  you  to  easily
                              alter, edit, delete, obtain information,
                              etc.  on files.

          RSTRSH              This program  will  detect  bug  induced
                              changes  in  the  resident  monitor in a
                              dump file.

          SWSERR              This  program  produces   a   convenient
                              listing of BUG HLT/CHK/INF occurrences.

          TYPVF7              This program is useful  for  typing  out
                              the contents of a VFU file in a readable

          UNITS               This   program   will   produce   status
                              information on disk drives.