





















                  TOPS-20 TROUBLE-SHOOTING HANDBOOK
                  =================================



                          Release 4 Edition


                            February 1980















                         TOPS-20 Monitor Group
                        Marlboro Support Group
                          Software Services

TOPS-20 TROUBLE-SHOOTING HANDBOOK                               Page 2
INTRODUCTION






                             INTRODUCTION
                             ------------





     This document is the TOPS-20 Trouble-Shooting Handbook.  It is  a
collection  of materials designed to increase the effectiveness of the
Software Specialist in the field  in  coping  with  TOPS-20  problems.
Some  of the common "disasters" to befall TOPS-20 sites are discussed,
along with debugging  methods  in  general.   Though  the  information
contained  herein is probably not sufficient to make a Specialist into
a TOPS-20 "wizard", it  should  help  ease  the  communication  burden
between  the  Specialist  in the field and his counterpart in Marlboro
and lead to quicker resolution of problems.

     This document contains materials from many sources, and  presents
some information not available anywhere else.  Certain sections may be
a bit dated, but an effort has been made to remove at least some of
the outdated or incorrect material and to include new articles.

     There is a continuing need to update this document as part of the
SWSKIT  materials, and Specialists are encouraged to give the Marlboro
Support Group feedback on these materials.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                               Page 3
TABLE OF CONTENTS


                          TABLE OF CONTENTS




     1.  INTRODUCTION                                             2

     2.  TABLE OF CONTENTS                                        3

     3.  POLICY STATEMENT                                         5

     4.  PRODUCING A GOOD SPR                                     6

     5.  USING SIRUS                                              9

     6.  MAPPING DIRECTORIES IN MDDT                             16

     7.  RECOVERING FROM DIRECTORY ERRORS                        19

     8.  MORE ABOUT DIRECTORY PROBLEMS                           22

     9.  JSB AND PSB MAPPING                                     24

    10.  BREAKPOINTING MULTI-USER CODE                           28

    11.  USING ADDRESS BREAK TO DEBUG THE MONITOR                30

    12.  RECOVERING FROM SYSTEM DISASTERS                        33

    13.  LOOKING AT HUNG TAPES                                   39

    14.  A LOOK AT SOME OF THE DISK STUFF                        43

    15.  NEW DISK FEATURES FOR FILDDT                            47

    16.  TOPS-20 SCHEDULER TEST ROUTINES                         50

    17.  KNOWN HARDWARE DEFICIENCIES LIST                        57

    18.  KS10 CONSOLE INFORMATION                                59

    19.  CRASH ANALYSIS                                          67

    20.  BUG'TYP MACRO CHANGES FOR VERSION 4 OF TOPS-20          86

    21.  MONITOR BUILDING HINTS                                  88

    22.  EXEC DEBUGGING                                          92

    23.  RECOVERING FROM A BAD EXEC                              98

    24.  DEBUGGING THE GALAXY SYSTEM                             99

    25.  DEBUGGING MOUNTR                                       114

    26.  DEBUGGING PA1050                                       116

    27.  COPYING FLOPPY DISKS                                   117

    28.  THE SWSKIT TOOLS PROGRAMS                              119

TOPS-20 TROUBLE-SHOOTING HANDBOOK                               Page 5
POLICY STATEMENT


              LEGAL POLICY CONCERNING THE TOPS-20 SWSKIT




     There is a great deal of confusion concerning the materials that
make up the SWSKIT tape, and their legal standing.  This memo is an
attempt to clear up some of that confusion.

     The SWSKITs are made up of an assortment of materials intended to
increase   the   effectiveness  of  the  software  specialist.   These
materials include program sources not normally distributed or sold for
a premium;  internal and company confidential documentation, which may
be in part incomplete or actually  incorrect,  but  supplied  for  the
information value on subsystems which may be insufficiently documented
through the usual channels;  documentation for  specialists  specially
produced  by  the  corporate  support  people;   and  utility programs
produced and maintained to  some  extent  by  corporate  support.   In
addition,  the  SWSKIT  may contain special or pre-release versions of
supported software provided for the incremental value a specialist may
obtain  from  the  software  under controlled circumstances.  In time,
utilities from the SWSKIT may evolve into supported products.

     All of the SWSKIT materials are proprietary to DIGITAL, and  were
never  intended  to  be  just  given  to the customer.  Obviously, the
materials which are otherwise sold cannot  be  given  away;   and  the
company  confidential  materials  should not be.  While it is expected
that the tools programs may wind up  being  used  at  customer  sites,
neither  are  they  gifts  to the customer.  An effort must be made to
protect  DIGITAL's  rights  to  these  proprietary   materials.    For
instance,  a PL90 contract retains rights to all materials provided to
the customer.  Deleting a tool program after use at  a  customer  site
indicates  intent.   There  should  be an awareness that if a customer
incurs damages due to  use  of  some  program  given  to  him  by  the
specialist,  even  though improperly used, then DIGITAL may be seen to
be at least in part responsible.  This should be avoided.

     In summary, the  SWSKIT  is  a  tool  provided  to  increase  the
effectiveness  of  the  specialist, especially with regard to PL90 and
debugging activity, but  the  rights  to  all  materials  remain  with
DIGITAL and the specialist should act accordingly.

     THIS IS NOT A LEGAL DEPARTMENT DOCUMENT.  CONSULT  LEGAL  IF  YOU
HAVE ANY DEFINITE PROBLEMS REQUIRING RESOLUTION.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                 Page 6
PRODUCING A GOOD SPR


                          PRODUCING A GOOD SPR




           A software specialist is  often  asked  to  assist  with  the
      submission  of  SPRs for a customer.  It is always discouraging to
      have  problems  getting  an  answer  to  an   SPR   for   entirely
      non-technical  reasons.  For that reason, below are some hints for
      producing a "good" SPR which will  help  in  getting  the  problem
      solved more quickly.



      1.0  THE SPR FORM

      Much of the data on the SPR form seems unimportant until it is
      omitted.  The line of product data is one example.  Try to isolate
      the problem to the correct component, since that will determine who
      first receives the SPR.  This saves the time it takes for, say,
      the COBOL maintainer to determine that the problem is not really
      in COBOL but in PA1050 or the monitor, and the time it takes for
      the next maintainer to become familiar with the problem.
      Something which crashes the system is always a monitor problem,
      even if it is an EXEC command which causes the problem, or a short
      BASIC program.

           If you really have a problem, be sure to mark  the  "problem"
      box,  and  don't  use  words  like  "we  suggest  you  correct the
      following situation...".  If the people who  handle  the  incoming
      paperwork  think they have a suggestion, it gets routed elsewhere,
      and is never seen by the maintainers.  A few  problems  have  been
      greatly delayed this way.

           The priority boxes are not super-critical, but if you have  a
      problem  which  is  holding  up production, or crashing the system
      several times a day, try to make a note of that somewhere  in  the
      description  of  the problem.  That should let the maintainer know
      that a work-around may also be appropriate in the short term.

           The phone number of the submitter could be important  if  the
      problem  is  of  such a nature that it proves not reproducible, or
      the complexity is such that further clarification just to
      understand  the  problem  might  be needed.  Your number here as a
      software specialist provides a more informal contact  than  direct
      maintainer-to-customer  confrontation,  although the customer will
      be contacted directly if that is most expedient.

           The attachments--be sure to mark some of these boxes  if  you
      send  along  supporting  materials.  Since these can get separated
      from the form, this will help keep them from  getting  permanently
      lost.

           The "DO NOT PUBLISH" box is for security problems and ways to
      crash  the system.  We double-check this on incoming handling, but
      if the box is checked you can be sure that the  SPR  will  not  be
      published unanswered.

           Describe the problem as clearly  as  possible  in  the  space
      provided.   Try  to  provide enough detail to easily reproduce the
      problem.  Concentrate on the description of the problem,  and  any
      diagnosis  you  may  have made.  Attempting to declare a "cure" is
      not always a good idea because the actual correction may be of an
      entirely  different  nature  for a number of reasons.  However, if
      you have something that works, the information could  be  of  use.
      Just  don't  count  on that exact change being the actual fix.  If
      the problem  is  not  reproducible  from  the  description  given,
      chances  are  that  something  you  left  out  is  relevant to the
      problem.  Unless the problem directly concerns them,  things  like
      logical  names,  mounted  structures,  and  other  features  often
      obscure the problem.  For the purpose of the problem  description,
      a terminal listing of an occurrence is often highly desirable, and
      it is sometimes a  good  idea  to  create  a  brand-new  directory
      without  any  fancy  LOGIN.CMD  setups or user groups and so on to
      demonstrate the problem.



      2.0  THE SUPPORTING MATERIALS

           As above, the listing from a terminal session is often a very
      good  attachment.   Try  to  include all the relevant information.
      Again, sometimes things like logical  names,  file  and  directory
      protections,  user  groups,  and  other  job-state  variables  are
      important and should be  included.   Inclusion  of  data  such  as
      program version numbers and edit levels can be useful for products
      with large numbers of edits.  If you are  complaining  of  monitor
      problems,  which  patches  you  have  installed  could  be  useful
      information.  Terminal sessions should be as  clear  as  possible.
      It should be made obvious just what is going on, or the maintainer
      may just see a series of commands and think "So?".  Commenting the
      session, either as you go or after the fact, is one way to
      accomplish this.

           Many times there  is  a  program  which  exercises  the  bug.
      Sometimes these programs are all right as they are, but often they
      are giant COBOL monsters working on a multi-RP06  data  base,  and
      very  unwieldy  for  a  maintainer  to  try  to work with.  If the
      program can be reduced to a small subset, do so.  Many monitor
      problems turn out to be reproducible from a set of arguments
      to a single JSYS.  If it is a question of  incorrect  output  from
      some  program, it is helpful to send along all the files needed to
      reproduce the problem, and the files of incorrect output.  In  the
      case  of  programs with multiple edits to field-image, this speeds
      up the maintainer, since he does not have to manually apply  those
      edits  to attempt to recreate your versions, and he can also check
      the installation of the edits, if that  is  appropriate.   And  in
      case the problem proves not to be easily reproducible, the bad
      output can at least be examined for clues.

           In the case of a monitor crash, the  problem  may  have  been
      reduced  to  a  program  of less than one page.  It is tempting to
      type this on the front of the SPR and send it in that way.   While
      the  maintainer can type in the program easily enough (if the copy
      is  both  legible  and  correct),  the  submitter  has  been  lax.
      Sometimes,  that short program will not cause a crash, even though
      run thousands of times under varying conditions by the maintainer.
      And  even  when  it  does  cause  the  crash  the  first time, the
      submitter has lengthened the turn-around by not sending  the  dump
      from  the  crash along with the SPR.  Sending the dump solves both
      problems.  If the problem is not reproducible with ease, the  dump
      is  vital  to further understanding.  And having the dump to start
      with speeds up the work of the maintainer, who now does not need to
      schedule stand-alone time to try to exercise the bug and cause a
      crash so he has a dump to look at.

           When sending a dump, always send the unrun monitor along with
      it.   If  you  don't, you are just causing a delay in handling the
      problem while the maintainer tries it against the  standard  ones,
      which  involves  finding tapes with the standard ones, and loading
      them...  If you are running an unpatched standard monitor, and you
      refuse  to send it, at least tell which one it is somewhere on the
      form.  The unrun monitor is also useful for checking the existence
      and correct installation of patches when that becomes an issue.

           The current preferred tape format is 9-track, 1600 bpi, in
      standard DUMPER format rather than INTERCHANGE format, since file
      information can be lost in INTERCHANGE format.  Take the time to
      get a listing of a directory of the tape and include it with the
      tape.  It will help to speed things up: if it is obvious from the
      directory that something is missing, faster feedback is generated.
      The listing also indicates that the tape will indeed be readable
      when received, and it partly eliminates the usual first step of
      the maintainer, which is getting a directory of the tape.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                 Page 9
USING SIRUS


                              USING SIRUS
                              -----------

     Did you know that you can dial into a Marlboro development system
and type out almost any patch that the Marlboro Support Group has made
to -10 or -20 software in the last three to four years?   The  program
which does this is called SIRUS, and with it you can:

     1.  Search through all the patches to a  particular  product,  if
         you know a problem exists but don't know what the patch is or
         don't know if we've heard of the problem.  If  you  find  the
         patch you want, you can then type it out.

     2.  Type out a particular patch to a particular product,  if  you
         know what the edit number is.

     3.  Obtain the status of any SPR, including the entire answer  if
         it has been answered.


     By using SIRUS, you can get patches whenever the  system  is  up,
even if it's two A.M. and the Hotline is closed.  You can print
patches in your local office without having to wait for  a  specialist
in  Marlboro  to  mail you a copy.  You can be sure that the patch you
have is correct.  (Dictating patches over the Hotline is very prone to
errors.)  Even  if the problem you are experiencing cannot be found in
SIRUS, you can help us when you call by so  stating.   We  immediately
know that the problem you are having is a new one.

     There have been several articles about SIRUS  in  previous  Large
Buffers, but none have been oriented towards specialists in the field.
This one is!

     To use SIRUS, dial into system 1026 in Marlboro, log in, and then
run it.  In more detail:

     1.  Dial into system 1026.  Any of  the  following  numbers  will
         reach system 1026 in Marlboro.  They are all 300 baud lines.
         
                            231-1171  (DTN)
                            231-1172  (DTN)
                       (617)481-5606
                       (617)481-5632
                       (617)481-5635
                       (617)481-5636
                       (617)481-5637
                       (617)481-5638
         
         Once the machine notices you, type "SET HOST  26"  to  ensure
         that  you  are  connected  to  system  1026.   If you get the
         message "?Undefined Network Node", the machine is  down  (try
         again later).

     2.  To login, type "LOGIN 10,#".  When  the  machine  requests  a
         name, type one in.  You will not need a password.

     3.  To run SIRUS, just  type  "R  SIRUS".   SIRUS  takes  several
         seconds  to  initialize  itself  and  then  prompts  you with
         "PRODUCT [H]*".  At this point,  type  either  "10<CRLF>"  or
         "20<CRLF>"  depending  on  whether the customer of concern is
         running TOPS10  or  TOPS20.   SIRUS  then  prompts  you  with
         "[H] *".  You are now at SIRUS command level.


     SIRUS has many commands, but only a few are of  interest  to  the
field specialist.  They are:

     1.  H -- for Help.  This may be typed anytime SIRUS precedes  its
         prompt with "[H]".

     2.  EX -- for Exit.  Use this to exit SIRUS.  Then  type  K/N  to
         logout, and hang up.

     3.  PP -- for Peruse PCOs.  PCO stands for Product  Change  Order
         and  essentially means a patch.  This command is used to look
         through patches for a particular product if you  aren't  sure
         which patch you want.

     4.  GP -- for Get PCO.  This is used to  type  out  a  particular
         patch once you know which one you want.

     5.  GS -- for Get SPR.  Use this to  retrieve  information  on  a
         particular SPR.

     6.  NP -- for New Product.  Use this  command  if  you  type  the
         wrong  answer to "PRODUCT [H]*" as mentioned above, or use it
         in association with the PP command as described below.  SIRUS
         will prompt you for a product again.


     The three most useful of these commands are PP, GP, and GS.



3.0  PP Command

     Use this command to peruse the patches for a  particular  product
-- e.g., LINK or 603 (monitor) or BATCON -- if you want to find a
particular patch you know exists, or  if  you  want  to  know  if  the
support group has heard of and fixed some problem you are experiencing
with a product.  After you type "PP<CRLF>" SIRUS  will  prompt  for  a
component.  Here type the program you're interested in -- LINK, BATCON
or whatever.  A response of LIST will type the  programs  SIRUS  knows
about and then prompt you for a component again.

     Once you type in the component, SIRUS prompts with  "[H] PCO #:".
There  are two reasonable responses to this.  The first is ALL.  (Type
NO to the subsequent question about a file.)  This  will  give  you  a
short  summary of all the patches available for this product, one line
per patch.  This includes a PCO number, the SPR for which  this  patch
was  written,  the  edit  number  corresponding  to the patch (for the
TOPS10 monitor this is the MCO number), a keyword describing the  bug,
the  maintainer  who  wrote  the patch, and the date it was made.  The
other response you might type here is simply  <CRLF>.   In  this  case
SIRUS will type out the symptom of the newest PCO, and then prompt you
with "NEXT?".  By continuing to type carriage returns,  you  can  type
all  the symptoms of all the patches for this product, from the newest
to the oldest.  When you have found the patch you want  (remember  the
PCO number), type RETURN to get back to SIRUS command level.

     If you did not find your symptom while perusing, and your product
exists  on both TOPS10 and TOPS20, you should also search the PCOs for
the alternate operating system.  To do this, type NP to SIRUS  command
level,  and  then type in the other product number when SIRUS asks for
it.  Then peruse PCOs for your product as you did before.



4.0  GP Command

     This is used to print out a patch once you know the  PCO  number.
The  PCO  number  is printed while you are perusing PCOs and is of the
form 10-product-nnn or  20-product-nnn.   After  typing  GP  to  SIRUS
command  level,  SIRUS prompts for a PCO number.  The leading "10-" or
"20-" is supplied by SIRUS, so your response should  be  of  the  form
"product-nnn".
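
     As an illustration only, the form of a PCO number can be sketched
in Python.  The function name normalize_pco is hypothetical and is not
part of SIRUS.  The zero-padding to three digits is an assumption
inferred from the example run later in this section, where a response
of "20-D60SPL-8" is answered with "[20-D60SPL-008 RETRIEVED]".

```python
import re


def normalize_pco(response: str, system: str = "20") -> str:
    """Return a full PCO identifier such as '20-D60SPL-008'.

    `system` is the product number ("10" or "20") chosen at the
    PRODUCT prompt; it is used when the response has no prefix.
    """
    text = response.strip().upper()
    # Accept an optional leading "10-" or "20-", then product-nnn.
    match = re.fullmatch(r"(?:(10|20)-)?([A-Z0-9.]+)-(\d+)", text)
    if match is None:
        raise ValueError(f"not a PCO number: {response!r}")
    prefix, product, number = match.groups()
    # Assumption: PCO numbers are shown zero-padded to three digits.
    return f"{prefix or system}-{product}-{int(number):03d}"


print(normalize_pco("D60SPL-8"))      # 20-D60SPL-008
print(normalize_pco("20-D60SPL-8"))   # 20-D60SPL-008
```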

     In response, SIRUS types out information about  the  patch.   The
two most useful data are labeled VLD and SAE.  VLD stands for validity
and is the version of the software to which the patch applies.  SAE is
Source  After Edit and is the edit or MCO number of the patch.  To get
the actual text of the patch, respond YES to  SIRUS's  question  "Show
Write-up File?".



5.0  GS Command

     This is used to get the status of an SPR.  SIRUS will prompt  for
an  SPR  number, and then will provide you with info about the SPR you
specified.  This  includes  the  site  that  submitted  the  SPR,  the
specialist responsible for the SPR, the date received, and the date
closed, if the SPR has been answered.  If answered, it will  also  say
whether or not an auxiliary file was written for the SPR and what PCOs
(if any) were included.  The aux file  is  an  introductory  paragraph
which  is written for most SPR answers.  For SPRs which do not require
patches, the aux file constitutes the entire answer.  The aux file can
be typed by responding YES to "SHOW AUXILIARY FILE?".  The PCOs can be
typed out with the GP command.

     Finally, if SIRUS begins to give you error messages such as "File
not  found",  EX  from  SIRUS  and  mount a special disk pack with the
monitor command "MOUNT SIRS:".  Then try again.  This gives you access
to more PCOs and aux files than are normally available.

     For more information, see the example  run  of  SIRUS  below,  in
which  user  input  is  shown  underlined,  or  the  article  on SIRUS
published in volume 409 of the Large Buffer.  Finally,  SIRUS  is  for
use  by  DIGITAL personnel only.  DO NOT give out instructions for its
use or the system 1026 phone numbers to customers.

.R SIRUS
 - -----


SIRUS...3(3)
 
[WHEN '[H]' APPEARS YOU MAY TYPE 'HELP' FOR ASSISTANCE]
 
 
PRODUCT [H]* 20
             --
[H] *PP
     --
 
[H] COMPONENT TO PERUSE: D60SPL
                         ------
[PCO LIMIT FOR 'D60SPL' IS 15]
[H] PCO #:<CR>
          ----
[20-D60SPL-015]
 
DATE: 09-JUL-79 BY: BENCE
VLD: 
 
[SYMPTOM]




Jobs sent to the LPT queue from D60SPL are  given  a  random
file name and are billed to OPERATOR.


 
NEXT?<CR> 
     ----
[20-D60SPL-014]
 
DATE: 09-JUL-79 BY: WEISBACH
VLD: 
 
[SYMPTOM]




If the spooler is pausing, typing a  GO  can  result  in  an
illegal instruction.


 
NEXT? ALL
      ---
DO YOU WANT A FILE? NO
                    --
PCO 015 SPR 12355             (6,022) KEY= LNAME      BENCE      09-JUL-79
PCO 014 SPR 12225  OUTOUT     (6,020) KEY= PAUSE      WEISBACH   09-JUL-79
PCO 013 SPR 11660  LODVFU 6013(6,014) KEY= VFU        WEISBACH   09-JUL-79
PCO 012 SPR 13244  D60CRE 103 (6,032) KEY= CARD       L.NEFF     06-JUL-79
PCO 011 SPR        D60CR4 103 (6,015) KEY= CARDS      L.NEFF     03-JUL-79
PCO 010 SPR        REQUEU 103 (6,030) KEY= CTQMFQ     L.NEFF     14-JUN-79
PCO 009 SPR 12588  INTCTC 1   (6,026) KEY= CONTROL C  TEEGARDEN  17-MAY-79
PCO 008 SPR 12881  OUTE.6 103 (6,025) KEY= REQUEUE    NEFF       17-APR-79
PCO 007 SPR 12139         103 (6,019) KEY= ILLEGAL    WEISBACH   27-OCT-78
PCO 006 SPR 12005             (0) KEY= SIMULTANEO BENCE      22-SEP-78
PCO 005 SPR 11672  ENDJOB 103 (6,018) KEY= QUASAR     BENCE      18-SEP-78
PCO 004 SPR 11841  D60STK 103 (6,016) KEY= BAD        WEISBACH   23-AUG-78
PCO 003 SPR 11476  TTYOUT 103 (6,010) KEY= OVERWRITE  WEISBACH   12-MAY-78
PCO 002 SPR 11431  OUTE.6     (6,007) KEY= INTERRUPTS WEISBACH   12-APR-78
PCO 001 SPR 11456  D60SPL     (6,006) KEY= BLANK      WEISBACH   03-APR-78
[H] PCO #: RETURN
           ------
 
 
[H] *GP
     --
 
 
[H] PCO #: 20-D60SPL-8
[20-D60SPL-008 RETRIEVED]
 
PROG:   NEFF
COMPONENT: D60SPL
SER/SPR:20-12881
KEYS: REQUEUE    /  
ROUTNS: OUTE.6 /  
VLD:    103(2304)
SBE     %103 (6,024)
SAE     %103 (6,025)
CRIT:   N
DOC:    N 
F/D:    F
TEST FILE:     :          [        ]
P-IND:  10
 
SHOW WRITE-UP FILE? YES
                    ---

 
 
[WRITE-UP FILE]
008             NEFF
[SYMPTOM]




     If a job is requeued because of a  communications  failure,  with
D60SPL  reporting  that  the  station  has  signed off, then, when the
station signs on again, the print file  will  be  restarted  from  its
beginning, not from the last checkpoint.


[DIAGNOSIS]

     When the  error  is  detected,  routine  OUTE.6  calls  IBACK  to
backspace  the  file  five  pages.   IBACK  zeroes  the  page counter,
J$RNPP(J), and rewinds the  file,  in  the  belief  that  the  forward
spacing  code  will  update  the page count as it skips to the correct
page.  However, D60SPL discovers the error is not recoverable  and  it
requeues  the job immediately.  Since the page count is never updated,
DOREQ requeues the job to start at the beginning of the file.


[CURE]

     Preserve the page at which to resume printing over  the  call  to
IBACK.  if the job is to be requeued immediately, restore J$RNPP(J) so
that the job will be requeued and checkpointed five  pages  back  from
its current position.
[FILCOM]
File 1) DSK:D60SPL.MAC[4,1022]  created: 1724 09-Apr-1979
File 2) DSK:D60SPL.MAC[4,417]   created: 1625 10-Apr-1979

1)1             LPTEDT==6024                    ;EDIT LEVEL
1)              LPTWHO==1                       ;WHO LAST PATCHED
****
2)1             LPTEDT==6025                    ;EDIT LEVEL
2)              LPTWHO==1                       ;WHO LAST PATCHED
**************
1)4     ;*****End of Revision History*****
****
2)4     ;6025   If a job printing on a remote printer is interruped by
2)      ;       a communications failure, requeue to start five pages ba
        ck
2)      ;       instead of at beginning of file.  LLN, SPR # 20-12881,
2)      ;       10-APR-79
2)      ;*****End of Revision History*****
**************
1)179           PUSHJ   P,IBACK                 ;BACKSPACE THE FILE
1)              PUSHJ   P,INTON                 ;[6007]TURN INTERRUPTS B
        ACK ON
1)              PUSHJ   P,D60NRY                ;PERFORM "NOT READY" DIA
        LOG
1)               JRST   OUTE.7                  ;ERROR IS UNRECOVERABLE
1)              TELL    OPR,[ASCIZ /![LPT...  continueing!]
****
2)179   ;**;[6025] ADD SEVERAL LINES AT OUTE.6 + 13L.  LLN, 10-APR-79
2)              MOVE    T1,J$RNPP(J)            ;[6025] CALCULATE THE NE
        W
2)              SUB     T1,N                    ;[6025]  DESTINATION PAG
        E
2)              PUSH    P,T1                    ;[6025]  AND SAVE IT
2)              PUSHJ   P,IBACK                 ;BACKSPACE THE FILE
2)              PUSHJ   P,INTON                 ;[6007]TURN INTERRUPTS B
        ACK ON
2)              PUSHJ   P,D60NRY                ;PERFORM "NOT READY" DIA
        LOG
2)               JRST   [POP    P,J$RNPP(J)     ;[6025] RESTORE PAGE NO.
         FOR REQUEUE
2)                       JRST   OUTE.7]         ;[6025] ERROR IS UNRECOV
        ERABLE
2)              POP     P,(P)                   ;[6025] THROW AWAY DESTI
        NATION
2)                                              ;[6025] PAGE - FORWARD S
        PACING
2)                                              ;[6025] CODE WILL HANDLE
         IT
2)              TELL    OPR,[ASCIZ /![LPT...  continueing!]
**************
[END OF WRITE-UP FILE]
 
 
[H] *EX
     --

EXIT

.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 16
MAPPING DIRECTORIES IN MDDT




                  MAPPING DIRECTORIES IN MDDT
                  ---------------------------



     Release  3  of  TOPS-20  can  take  advantage  of  the   extended
addressing  features  of  the model B processor.  Some of its data has
been reorganized and moved into non-zero sections  of  the  addressing
space.   One of the things moved was directories.  Directories are now
mapped into section 2, starting at the beginning of the section.  Thus
the  old  procedure of reading a user's directory in MDDT is no longer
valid.  This section describes how to map a directory correctly, both
for release 2 and for releases 3, 3A, and 4.

     The procedure for release 2 was the following.  You first have to
find  out  the structure number and directory number for the directory
to be mapped.  You can use the TRANSL program  to  get  the  directory
number,  or use the ^EPRINT command to list the directory information.
As an example, suppose you want to find the  directory  and  structure
information for the directory SNARK:<CURDS>.  You run TRANSL and
obtain the results:

@TRANSL SNARK:<CURDS>
SNARK:<CURDS> (IS) SNARK:[4,117]

The "programmer number" obtained is the directory  number,  in  octal.
In  this example, the directory number is 117.  If the directory is in
bad shape, and you can't run TRANSL or use ^EPRINT, you will  have  to
find  out  the directory number by looking at the output from a DLUSER
or ULIST run, or from BUGCHK output.
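
     As a sketch (not part of TRANSL), the directory number can be
pulled out of a line of TRANSL output and converted from octal in
Python.  The function name parse_transl is hypothetical, and the line
format is assumed to be exactly that of the example above.

```python
import re


def parse_transl(line: str) -> tuple[int, int]:
    """Return (project number, directory number) from TRANSL output."""
    # Both numbers inside the brackets are typed in octal.
    match = re.search(r"\[([0-7]+),([0-7]+)\]", line)
    if match is None:
        raise ValueError(f"no [project,programmer] pair in {line!r}")
    return int(match.group(1), 8), int(match.group(2), 8)


project, directory = parse_transl("SNARK:<CURDS> (IS) SNARK:[4,117]")
print(oct(directory), directory)   # 0o117 79
```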

     To find the structure number, you have to work  harder.   If  the
structure  is  mounted  as PS:, its structure number is always 0.  For
structures mounted other than PS:, you do the following.  You get into
MDDT,  and  look  at the table STRTAB.  This table contains all of the
addresses of the structure data blocks in the system.  The first  word
of  each structure data block is the structure name in SIXBIT.  So you
search the table looking for the desired structure.  The offset  into
the table STRTAB is then the structure number.  For our example:

@ENABLE
$SDDT
DDT
JSYS 777$X
MDDT
$$6T
STRTAB/   ,8[   /   PS
STRTAB+1/      M^I   /   REL3
STRTAB+2/      M_%   /   SNARK

In  the  example  above,  you  see  that  PS:  is the first structure,
followed by the structures REL3: and SNARK:.  Since the offset into
STRTAB was 2 for SNARK:, the structure number you want is 2.
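
     The search can be sketched in Python as follows.  This is a model
only: the addresses in strtab and the dictionary standing in for
monitor memory are made-up values, and the function names are
hypothetical.  The SIXBIT packing used (character code minus 40 octal,
six characters per 36-bit word, left-justified) is the standard one.

```python
def sixbit(name: str) -> int:
    """Pack up to six characters into one 36-bit SIXBIT word."""
    word = 0
    for ch in name.upper().ljust(6)[:6]:
        code = ord(ch) - 0o40          # SIXBIT code = ASCII code - 40 octal
        if not 0 <= code < 0o100:
            raise ValueError(f"{ch!r} has no SIXBIT code")
        word = (word << 6) | code
    return word


def structure_number(strtab: list[int], memory: dict[int, int], name: str) -> int:
    """Return the offset into STRTAB whose data block is named `name`."""
    target = sixbit(name)
    for offset, block_address in enumerate(strtab):
        # The first word of each structure data block is the name.
        if memory.get(block_address) == target:
            return offset
    raise LookupError(f"structure {name} not found")


# Hypothetical data block addresses, for illustration only.
strtab = [0o105000, 0o105100, 0o105200]
memory = {
    0o105000: sixbit("PS"),
    0o105100: sixbit("REL3"),
    0o105200: sixbit("SNARK"),
}
print(oct(sixbit("PS")))                           # 0o606300000000
print(structure_number(strtab, memory, "SNARK"))   # 2
```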


     Knowing the structure number and the directory  number,  you  can
now  map  the directory and look at it.  When the directory is mapped,
location DIRORA points to the area in the monitor where it can be
found.  This is currently the address 740000.  To save typing, you can
use the symbol DA, which has the value 740000 (none  of  the  examples
here  uses  this  symbol however).  To map the directory, you call the
routine MAPDIR which is in the module DIRECT.  It takes two arguments.
The  directory  number  goes  in AC1, and the structure number goes in
AC2.  For our example, the output looks like:

DIRORA[   740000
740000/   ?

1!   117
2!   2
CALL MAPDIR$X
$$

740000[   400300,,100

The  skip  return  from  MAPDIR means you have successfully mapped the
directory.  You can now look at the whole directory by  examining  the
proper  locations.   The  number of pages that are mapped by MAPDIR is
30, which is the  length  of  a  directory,  so  the  whole  thing  is
available  to  look at.  By examining or changing location 740000+N in
core, you are examining or changing location N of the directory.  When
you  are  finished,  you can just leave MDDT by jumping to MRETN or by
typing ^C.
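
     The address arithmetic is simple enough to  check  with  a  small
Python  sketch.   It  assumes  that the page count of 30 is octal (as
typed to DDT) and that a page is 1000 (octal) words; the names used are
hypothetical.

```python
DIR_BASE = 0o740000      # where release 2 maps the directory (value of DIRORA)
PAGE_WORDS = 0o1000      # words per page
DIR_PAGES = 0o30         # pages mapped by MAPDIR, assumed to be octal

def monitor_address(n):
    """Monitor address at which word N of the mapped directory appears."""
    if not 0 <= n < DIR_PAGES * PAGE_WORDS:
        raise ValueError("offset %o is outside the directory" % n)
    return DIR_BASE + n

print("%o" % monitor_address(0o100))
```

Under these assumptions the last word of the directory appears at
767777, so the whole directory fits below 770000.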


     In release 3, however, when you  examine  location  DIRORA  after
calling  MAPDIR,  it doesn't have to contain 740000.  If it does, then
your machine cannot support extended addressing  and  the  monitor  is
running  the  same  as release 2 did.  In this case you can ignore the
rest of this document.  If your machine does have extended addressing,
when  you  examine location DIRORA you will see the number 2,,0.  This
address is now in section 2 of the monitor, and MDDT cannot  read  the
data there directly.  If you look at the location 740000 after calling
MAPDIR, it will still be unreadable, since the directory is no  longer
mapped there.  Those pages are now unused.
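
     The test on DIRORA amounts to  looking  at  the  section  number,
which  is  the  left  half  of  the address.  A minimal Python sketch,
with hypothetical function names, is:

```python
def split_address(word):
    """Split a 36-bit word into (left half, right half)."""
    return (word >> 18) & 0o777777, word & 0o777777

def needs_remapping_under_release_3(dirora):
    """True when DIRORA points outside section 0, which release 3 MDDT cannot read."""
    section, _ = split_address(dirora)
    return section != 0

print(needs_remapping_under_release_3(0o740000))   # 740000: release 2 style mapping
print(needs_remapping_under_release_3(2 << 18))    # 2,,0: directory is in section 2
```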

     To be able to read the  directory  now,  you  have  to  tell  the
monitor  to  map  in  the pages where you can see them with MDDT.  The
first step is to examine the location DRMAP.   This  location  is  the
section pointer for section 2, where the directories are mapped.  This
is a share-type pointer,  which  contains  the  OFN  for  the  desired
directory  in the right half.  This number is one of the arguments for
the MSETMP  routine.   MSETMP  takes  the  following  arguments.   AC1
contains  the  OFN  in  the left half, and the first page number to be
mapped in the right half.  AC2 contains flag bits in  the  left  half,
and  the  address  where  you want to map the pages in the right half.
AC3  contains  the  number  of  pages  to  be  mapped.   For   mapping
directories, you can use 740000 as the address, and you want to map 30
pages.  You also want to set flag bits so that the  directory  can  be
changed.  For the example, you do the following:

DRMAP[   224000,,147

1!   147,,0
2!   140000,,740000
3!   30
CALL MSETMP$X
$

After  the  call to MSETMP, the directory is now mapped at 740000, and
you can proceed as you used to in release 2.  When  you  are  finished
with  the  directory,  you  should  call  MSETMP  again  to  unmap the
directory.  This is done by supplying the same  arguments  as  before,
except that AC1 contains zero.  As an example:


1!   0
2!   140000,,740000
3!   30
CALL MSETMP$X
$

Now you can simply ^C out of MDDT or jump to MRETN.
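
     Because MSETMP does no checking of its arguments, it can help  to
work  out the three words before typing them.  The following Python
sketch builds them from the contents of DRMAP as described above.   It
is  an  illustration  only:   the  function  names  are hypothetical,
and the flag value 140000 is simply the one used in the example.

```python
def halves(word):
    """Split a 36-bit word into (left half, right half)."""
    return (word >> 18) & 0o777777, word & 0o777777

def join(left, right):
    """Build a 36-bit word from two 18-bit halves."""
    return ((left & 0o777777) << 18) | (right & 0o777777)

def fmt(word):
    """Format a word the way DDT shows halfwords: left,,right in octal."""
    return "%o,,%o" % halves(word)

def msetmp_map_args(drmap, address=0o740000, pages=0o30, flags=0o140000):
    """AC1, AC2, AC3 for mapping a directory whose OFN is in the right half of DRMAP."""
    _, ofn = halves(drmap)
    return join(ofn, 0), join(flags, address), pages

def msetmp_unmap_args(address=0o740000, pages=0o30, flags=0o140000):
    """Same arguments, except that AC1 is zero."""
    return 0, join(flags, address), pages

drmap = join(0o224000, 0o147)            # DRMAP[   224000,,147
for ac in msetmp_map_args(drmap):
    print(fmt(ac))
```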


     For Release 4 of TOPS-20, the various flavors  of  DDT  have  been
trained  to  understand extended addresses, so the mapping contortions
used for releases 3 and 3A are once again  unnecessary.   On  extended
machines one can reference section two directly, as below:

DIRORA[   2,,0

2,,0[   400300,,100

When done, you can still just ^C out or jump to MRETN.



                    RECOVERING FROM DIRECTORY ERRORS




     Sometimes after a monitor crash due to disk problems, some of the
directories  on  the  system  will contain errors.  These errors cause
BUGCHKs such as DIRFDB, NAMBAD, DIRPG0, and DIRPG1.  It  is  sometimes
possible  to  find  the  error  in the directory by getting into MDDT,
mapping the directory, finding what is wrong,  and  fixing  it.   This
procedure  is  described  in  the SWSKIT.  However, this is not always
easy, and may take a lot of time.  It  is  therefore  better  in  many
cases  to  simply  delete  the bad directory and recreate it.  This is
easy to do for most directories.  But special procedures are necessary
for the directories <SYSTEM> and <SUBSYS>.  The rest of this memo will
describe the methods of recovering from bad directories,  handling  in
particular the difficult case of the <SYSTEM> directory.

     You can first try to give the EXPUNGE command  with  the  REBUILD
and  PURGE  subcommands.   If  the  problem with the directory is very
simple, it may fix your problem.  As an example, suppose the directory
PS:<SICK-DIRECTORY> is incorrect.  You would type:


        $EXPUNGE (DIRECTORY) PS:<SICK-DIRECTORY>,
        $$REBUILD (SYMBOL TABLE)
        $$PURGE (NOT COMPLETELY CREATED FILES)
        $$
         PS:<SICK-DIRECTORY> [NO PAGES FREED]
        $



     If this does not help the problem, you will have  to  delete  the
directory  and  then  recreate it.  Before proceeding, you should make
sure that any files you can reference are copied to another directory,
or  else  are  saved  on  tape.  Now first try to delete the directory
normally, as follows:


        $BUILD (USER) PS:<SICK-DIRECTORY>
        [OLD]
        $$KILL
        [CONFIRM]
        $$
        $

     If this is successful, then  simply  recreate  the  directory  and
restore the user's files.  You should recreate the directory with  the
same  directory  number  as it had before, so that DLUSER's data will
still be correct.

     The procedure above will fail if the directory is  either  mapped
by another job or totally unusable.  If it is mapped, and the directory
belongs to an ordinary user, you can wait until the  directory  is  no
longer  in use, or you can take the system stand-alone so that no user
can reference it.

     If the directory is totally unusable, you will then have  to  try
to  delete  it  the  hard  way.   Before proceeding, you should try to
delete and expunge all files in the directory.  This will minimize the
number  of  lost  pages  that will result.  Now there are two cases to
consider.  If the directory  is  not  a  subdirectory,  you  type  the
following:


        $DELETE (FILE) PS:<ROOT-DIRECTORY>SICK-DIRECTORY.DIRECTORY,
        $$DIRECTORY (AND "FORGET" FILE SPACE)
        $$
         <ROOT-DIRECTORY>SICK-DIRECTORY.DIRECTORY.1 [OK]
        $



     If the directory is a subdirectory, you modify the above  command
by   replacing  "ROOT-DIRECTORY"  by  the  name  of  the  next  higher
directory.  Thus if the directory was PS:<ANOTHER.BAD-ONE>, you  type:


        $DELETE (FILE) PS:<ANOTHER>BAD-ONE.DIRECTORY,
        $$DIRECTORY (AND "FORGET" FILE SPACE)
        $$
         <ANOTHER>BAD-ONE.DIRECTORY.1 [OK]
        $



     The above procedure tells the monitor to treat the directory file
like  a  normal  file,  and to delete it as such.  This means that any
files in the directory will become "lost".   The  disk  pages  can  be
recovered  later  with  CHECKD.   If  the  above works, you can simply
recreate the directory and restore the files.
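
     The rule for forming the name of the directory file in  the  two
cases above can be stated compactly.  The following Python sketch is
an illustration only; the function name is hypothetical, and  it  does
nothing but string manipulation on the directory specification.

```python
def directory_file(spec):
    """Name of the .DIRECTORY file that holds the directory STR:<NAME>.

    A top-level directory lives in <ROOT-DIRECTORY>; a subdirectory
    lives in the next higher directory.
    """
    structure, sep, rest = spec.partition(":")
    if not sep or not (rest.startswith("<") and rest.endswith(">")):
        raise ValueError("expected STR:<NAME>, got %r" % spec)
    parent, _, leaf = rest[1:-1].rpartition(".")
    if not parent:
        parent = "ROOT-DIRECTORY"
    return "%s:<%s>%s.DIRECTORY" % (structure, parent, leaf)

print(directory_file("PS:<SICK-DIRECTORY>"))
print(directory_file("PS:<ANOTHER.BAD-ONE>"))
```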

     The only reason the above command should fail is if the directory
is  still  mapped.   For  PS:<SUBSYS>,  you  can  bring  up the system
stand-alone so that no programs are run from it, and then  delete  it.
For PS:<SYSTEM>, even taking the system stand-alone will not help, for
it is always mapped by job 0.  But there are two  procedures  you  can
use which do work.

     The safest method can be used if the user's system has  mountable
structures.   If  you  have built another PS: structure, you can mount
the pack with the bad directory as an alias, and  then  the  directory
will not be mapped and can be deleted.  As an example:


        $SMOUNT (FILE STRUCTURE) SICK:,
        $$STRUCTURE-ID (IS) PS:
        $$
        WAITING FOR STRUCTURE SICK: TO BE PUT ON LINE...
        STRUCTURE SICK: MOUNTED
        $
        $DELETE (FILES) SICK:<ROOT-DIRECTORY>SYSTEM.DIRECTORY,
        $$DIRECTORY (AND "FORGET" FILE SPACE)
        $$
         SICK:<ROOT-DIRECTORY>SYSTEM.DIRECTORY.1 [OK]
        $



     Then you can build the new directory, restore the  files  to  it,
and  then use it again for your normal PS: pack.  Be sure to build the
new directory with the same number.  This is especially important  for
the special system directories.

     If you do not have another disk drive or another PS: disk, or  if
you  don't want to bother SMOUNTing the disk, you can fix the <SYSTEM>
area by using MDDT.  The basic idea is to patch the monitor so that it
no  longer  thinks  that  the  directory  is  in use.  This is done as
follows:


        $^EQUIT

        INTERRUPT AT 17117
        MX>/MDDT
        CHKOFN/   JSP CX,.SAVE   JRST RSKP
        MRETN$G

        $



     Then  you  should  have  no  problems  deleting  the   directory.
Immediately  after  doing  the  delete,  you should reload the system.
When the system restarts, you can read the monitor and the EXEC either
from  the distribution magtape or from another directory where you had
kept copies.  Then recreate the <SYSTEM> area, making sure to give  it
the  same directory number as it had before.  Then you can restore the
files and let the users back  on.   Finally,  you  should  run  CHECKD
sometime to recover the lost pages.



                     MORE ABOUT DIRECTORY PROBLEMS
                     =============================

SOME HINTS FOR TRACING DIRECTORY PROBLEMS


NOTE -- Use the methods documented in the Operator's
        Guide before resorting to the methods below.


     1.  There is a file on the SWSKIT called  DIRTST.EXE  which  will
         test for inconsistencies in the directory pointers.
         
                @ENABLE
                $RU DIRTST
         
         This will tell you just about everything.

     2.  Another program on SWSKIT is DIRPNT,  which  prints  out  the
         contents  of the chained FDBs, the entire directory, an FDB,
         or the symbol table.
                
                To run it:
         
                @ENABLE
                $RU DIRPNT
         
         And answer the questions.  This also  may  not  work  if  the
         headers are bad.

     3.  If you get a BUGCHK:

         Go into the monitor with MDDT and set  a  breakpoint  at  the
         BUGCHK address, say, FDBBAD.  Do the functions that cause the
         BUGCHK;   DIR,  say.   Trace  down  the  bug.   The  relevant
         listings  are  PROLOG  and  DIRECT.  These give the directory
         format and useful symbols.

     4.  If the pointers are destroyed or confused you can map in  the
         directory as follows:
         
                @ENA
                $^EQUIT                 ; get into MINI-EXEC
                MX>/                    ; get into MDDT
         
         
                ; Map in the directory.  Put the directory number in
                ; AC1.  Get the directory number from DLUSER or TRANSL;
                ; the format is [4,directory#].  Put the structure
                ; number in AC2.
         
                ; To find the structure number, look at the table
                ; STRTAB.  STRTAB contains a list of pointers to the
                ; SDBs of the structures that are mounted.  The
                ; structure number is the offset into STRTAB.  To find
                ; out which structure has structure number 3, look at
                ; STRTAB+3.  The first word at the address found there
                ; is the SIXBIT structure name.
         
                STRTAB/  54321          ; str number 0
                STRTAB+1/  56776        ; str no 1
                STRTAB+2/  12345        ; str no 2
                12345$6T/       FOO     ; str no 2 is FOO:
         
         
                1/ DIRECTORY NUMBER
                2/ STR NUMBER
                CALL MAPDIR$X
         
                ; Now you can  look at the  header pointers etc.,  and
                ; fix things  up  if  you're lucky.  Go  back  to  the
                ; MINI-EXEC.
         
                ^P
                MX>START
                $
         

     5.  If you can't (or don't want to) recover  the  existing  files
         you  can  delete  the directory and restore the files using a
         DUMPER  tape.   This  works  for  <SYSTEM>  and   all   other
         directories.

         In order to delete  a  directory  you  must  remove  it  from
         <ROOT-DIRECTORY> (or next higher-level directory).
                
                You can do  this with  the
                following set of commands:
                
                (first  be  sure   nothing  is   mapped  from   this
                directory)
         
                @ENA
                $DELETE <ROOT-DIRECTORY>DIRECTORYNAME.*.,
                $$DIRECTORY
                $$
         
         Create a new directory with the same directory number.   The
         same number is important for the special system directories.
         
                $^ECREATE <DIRECTORYNAME>
                [New]
                $$NUMBER nn
                .
                .
                .
         
         Now DUMPER the files back.



        An Easy Way to Examine the PSB and JSB of Another Job
        -----------------------------------------------------




     There is an occasional need to look in detail  at  the  state  of
another  job on the system.  A common reason for doing this is to find
the cause and cure of a "hung job" which cannot  be  logged  out.   To
find  out  what  the  job is doing you usually start by looking at the
JSYS stack in the PSB.   But  you  cannot  examine  such  data  easily
because  the  fork data in the PSB and the job data in the JSB are not
in the monitor's address space until the fork is run.  If you  try  to
look  at  the PSB or JSB using MDDT you will see the data for your own
fork.  To look at the data for another  fork  you  must  do  what  the
monitor does, and that is to map it.

     A procedure for doing the mapping of a PSB or JSB  was  given  in
the release 3 and 3A SWSKITs.  You first find the SPT index of the PSB
or JSB you want to map, then you call  SETMPG  or  MSETMP  to  set  up
pointers  to  the  data,  and  then you can examine it.  But there are
several problems in using that method, which are:

     1.  You have to find an empty  set  of  pages  in  the  monitor's
         address space which can be used for mapping.

     2.  There is not enough room to map all of the PSB and  JSB.   So
         if  you  want to examine many different things you have to do
         the mapping many times.

     3.  The routines SETMPG and MSETMP do  no  validity  checking  of
         their  arguments.   Thus if you feed them bad data the system
         will probably crash.  So if you need to map things many times,
         chances are you will make a mistake once too often.

     4.  The addresses of the data are not correct.  To  look  at  PPC
         for example, you can't just examine location PPC (which would
         be for your own fork).  You have to look in the page you  are
         using  for  mapping.   So every reference has to be offset by
         some constant.

     5.  When you are done looking at the fork, you can't simply leave
         MDDT.   You  have to call SETMPG or MSETMP again to unmap the
         data.

     Since that documentation was written I  have  found  a  procedure
which is much easier.  It eliminates almost all of the above problems.
The procedure is this:

     1.  Do a "GET" of the file the monitor was loaded  from,  usually
         SYSTEM:MONITR.EXE.

     2.  Enter user mode DDT in the file you got, and then do  a  JSYS
         777 to get into MDDT.

     3.  Find out the SPT indexes as before, and call  MSETMP  to  map
         the  PSB  or  JSB  to  the USER address space, in the correct
         place!!

     4.  Return from MDDT, and examine PSB and JSB locations directly,
         and see the correct data in the right place.

     5.  When you are done, just ^C and do a RESET.




     The rest of this document will  show,  step  by  step,  how  the
procedure  above is done, by using an example.  Assume that we wish to
examine the state of fork 105, which  belongs  to  job  21.   We  then
begin:

@ENABLE                                 !Get a copy of the monitor
$GET PS:<SYSTEM>MONITR.EXE
$START 140                              !Get into user DDT
DDT

JSYS 777$X                              !Enter MDDT
MDDT



!Following is an example of the procedure to map the JSB of a job:


FKJOB+105[   25,,2035                   !Get the SPT index of the JSB
                                        !of fork 105

T1!      2035,,0                        !Put SPT index in left half
T2!      540000,,JSBPGA                 !* Flags and where to map to
T3!      JSLSTA'1000-JSBPGA'1000        !Number of pages to map

CALL MSETMP$X                           !Do the mapping
$


!Following is an example of the procedure to map the PSB of a fork:


FKPGS+105[   2657,,2332                 !Get the SPT index of the PSB
                                        !of fork 105

T1!      2332,,PSBMAP-PSBPGA            !Put SPT index in left half,
                                        !and offset in right half
T2!      540000,,PSSPSA                 !* Flags and where to map to
T3!      PSBMSZ                         !Number of pages to map

CALL MSETMP$X                           !Do the mapping
$

!Example of returning to user mode and looking at data from both
!the PSB and the JSB of the fork:


MRETN$G                                 !Return to user mode
$

USRNAM[   3                             !Examine job's user name
USRNAM+1[   422050,,546230   $T;DBELL   

CTRLTT[   777777,,777777                !Controlling terminal

FILBYT+MLJFN[   4400,,334010            !Start of data block for JFN 1

PPC/   T1,,DISXE#+2                     !Current PC of the fork

PAC+17/   -215,,UPDL+62                 !Current stack pointer

UPDL/   CHKHO5#                         !First few stack locations
UPDL+1/   CAM CHKAE0#+12   
UPDL+2/   CHKHO5#   
UPDL+3/   CAM CHKAE0#+12   
UPDL+4/   T1,,.COMND+1   
UPDL+5/   -273,,UPDL+4   


!Example of terminating the mapping we have done:


^C
$RESET                                  !To finish, just quit and reset
$



     The procedure as given above maps the JSB and PSB  write-enabled.
So  if  you  find something you want to change, you can simply deposit
the new value  into  the  location.   If  you  want  the  data  to  be
write-protected,  then  change  the  540000 to 500000 in the two steps
marked with an asterisk.
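
     The two flag values differ in a single bit, and  the  page  count
typed  into  T3  is  a  difference  of  page numbers.  Both points are
illustrated by the Python sketch below.  That the bit  040000  is  the
write-enable  bit  is  inferred  only from the two values given above,
and the addresses used in place of JSBPGA and JSLSTA are hypothetical,
not the real symbol values.

```python
WRITE_BIT = 0o540000 ^ 0o500000    # the one bit that differs: 040000

def write_protect(flags):
    """Clear the (inferred) write-enable bit in the MSETMP flag halfword."""
    return flags & ~WRITE_BIT

def page_of(address):
    """Page number of an address; DDT's ADDR'1000 does the same division."""
    return address // 0o1000

def pages_between(first, last):
    """Page count as typed into T3: LAST'1000-FIRST'1000."""
    return page_of(last) - page_of(first)

print("%o" % write_protect(0o540000))
# Hypothetical addresses standing in for JSBPGA and JSLSTA.
print("%o" % pages_between(0o620000, 0o634000))
```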



     Warning:  The procedure of mapping things into your user  address
space  has its limitations.  Mapping the JSB and PSB works because the
user core used for mapping was previously empty.  In general, you  can
only  map  things  into  your  user core if your core pages are either
nonexistent or are private.  If you call  MSETMP  or  SETMPG  and  map
something  over  a  shared page, the old file page is unmapped without
the share counts being updated, which prevents your job  from  logging
out  later.  To get around this problem you can BLT your core image to
force all of the pages to be private.



        HOW TO USE BREAKPOINTS IN CODE THAT MANY USERS EXECUTE
        ------------------------------------------------------




     When inserting a breakpoint into the running monitor, you have to
be  careful  that  no other users will execute the code containing the
breakpoint.  If some other user hits the breakpoint, they will blow up
with an illegal instruction since MDDT will not be there to handle the
breakpoint.  This normally limits the places you can set  breakpoints,
since  most  of the monitor can be gotten to by any user.  Even if you
run the system stand-alone, it is possible that the  routine  you  are
debugging  will  be called by job 0.  However, it is still possible to
do such debugging, even on a system which is not stand-alone, and this
document will describe how this is done.

     The essential element of this technique is to put in the patch in
such  a  way  that  only  your own fork can ever reach the breakpoint.
First you write a simple routine which will skip if it  is  not  being
run  by your particular fork.  This can be done easily if you remember
that the location FORKX contains the currently  running  fork  number.
An example of such a routine is the following:  

@ENABLE
$SDDT
DDT
JSYS 777$X
MDDT

FORKX[   23                     ; check our fork number

FFF/   0   NOTME:   PUSH P,T1   ; save an AC
NOTME+1/   0   MOVE T1,FORKX    ; get currently running fork number
NOTME+2/   0   CAIE T1,23       ; is it us=23?
NOTME+3/   0   AOS -1(P)        ; no, setup skip return
NOTME+4/   0   POP P,T1         ; restore the saved AC
NOTME+5/   0   POPJ P,          ; and return to caller
NOTME+6/   0   FFF:             ; reset the position of FFF

The  routine above simply saves AC T1, gets the currently running fork
number, compares it with your own fork number which  you  obtained  by
looking at location FORKX, and skips if they differ.

     Now assume that you want to set a breakpoint into  the  following
code, which is in the routine BLKSCN in the module DIRECT.  

BLKSC2/   HLRZ C,BLKTAB(B)
BLKSC2+1/   CAME A,C
BLKSC2+2/   AOBJN B,BLKSC2
BLKSC2+3/   JUMPGE B,BLKSCE
BLKSC2+4/   HRRZ B,BLKTAB(B)

Assume  you  want  the  breakpoint  at  location BLKSC2+3.  You do the
following:  


BLKSC2+3/   JUMPGE B,BLKSCE   FFF$<     ; patch this location
FFF/   0   PUSHJ P,NOTME                ; call the NOTME routine
FFF+1/   0   .$B   JFCL$>               ; our fork if it gets here, set breakpoint
FFF+2/   JUMPGE B,BLKSCE
FFF+3/   JUMPA A,BLKSC2+4
FFF+4/   JUMPA B,BLKSC2+5
BLKSC2+3/   JUMPA NOTME+6

Notice  that  the  breakpoint  has  been  set  in the JFCL instruction
following the call to NOTME.  Only your fork will execute it,  so  you
can  now  debug the section of code while other users are executing it
at the same time.  Remember to remove  the  breakpoint  when  you  are
done.
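
     The effect of the patch can be modelled in a few lines of Python.
This  is  only  a  model  of the control flow described above, not of
the monitor itself, and the function names are hypothetical.

```python
def notme_skips(current_fork, my_fork):
    """Model of NOTME: it takes the skip return when the running fork is not ours."""
    return current_fork != my_fork

def patch_path(current_fork, my_fork):
    """Instructions executed in the patch area on one pass through BLKSC2+3."""
    path = ["PUSHJ P,NOTME"]
    if not notme_skips(current_fork, my_fork):
        path.append("JFCL (breakpoint)")    # reached only on the non-skip return
    path.append("JUMPGE B,BLKSCE")          # the original instruction, always run
    return path

print(patch_path(0o23, 0o23))   # our own fork hits the breakpoint
print(patch_path(0o41, 0o23))   # any other fork skips over it
```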

     To run a particular program while  having  breakpoints  set,  you
must  remember  that  the breakpoint has to be set by the same process
which you expect to hit it.  So for example, typing ^EQUIT, setting  a
breakpoint,  returning  to  the EXEC and running your program will not
work.  You must enter MDDT and set the breakpoints  from  the  program
you want to debug.  As an example:

@ENABLE
$GET PROGRAM    ; get the program to be used
$DDT            ; enter DDT
DDT
JSYS 777$X      ; and enter MDDT from there
MDDT

(PUT IN "NOTME" ROUTINE AND SET BREAKPOINTS HERE)

MRETN$G         ; return to the context of the test program
$
$G              ; start the test program



                Using Address Break to Debug the Monitor
                ----------------------------------------


Sometimes when examining a set of dumps, you will notice  the  crashes
are  caused  by  some  location  being destroyed.  If you have no idea
where the destruction is done from, finding the problem could be  very
difficult.   One  useful procedure in such cases is to use the address
break feature of the hardware to track down the problem (this  feature
is  not  available  on  2020s).   The  only problem is that the use of
address break is not obvious.  This memo describes how to use  address
break in the TOPS-20 monitor.

     In order to use address break, four things must be done.   First,
the  current routines the monitor uses to set address breaks for users
must be disabled.  Secondly, your own address break must be  set  from
MDDT  or  EDDT.   Thirdly,  instructions  which  you  want  to execute
properly have to be modified so that they will not cause  an  unwanted
address  break.  Finally, breakpoints must be placed in the monitor so
that the state of the monitor can be examined when the  address  break
occurs.  The following is a step by step example of doing this.


1.      Load the monitor for debugging, and enter EDDT.  The procedure
        starting from BOOT is the following:

        BOOT>/L                         ;Load monitor but don't start it
        BOOT>/G140                      ;Start EDDT
        EDDT
        DBUGSW/   0   2                 ;Set debugging mode
        EDDTF/   0   1                  ;Keep EDDT once system starts
        GOTSWM$B                        ;Install useful breakpoint
        SYSGO1$G                        ;Start the monitor


        [PS MOUNTED]
        $1B>>GOTSWM   0$1B              ;Remove breakpoint now



2.      Disable the monitor's normal changing of  the  address  break.
        This is currently done at two places:
 
        KISSAV+4/   DATAO UNPFG1+26   JFCL      ;Disable instruction
        SETBRK+12/   DATAO A   JFCL             ;Here too

3.      Set your own address break at the desired location.  Refer  to
        the Hardware Reference Manual for details.  The instruction to
        set an address break is:
 
        DATAO APR,ADDR          ;Note:  APR = 0
 
        where ADDR contains the following fields:
 
        Bits            Description
        ----            -----------
          9             Break at given address on instruction fetches
         10             Break at given address on reads
         11             Break at given address on writes
         12             0=exec address space, 1=user address space
        13-35           Address to break on.
 
 
        So now assume you want  to  catch  a  bug  which  is  blasting
        location  CURDS.   You want to break only for writes, and want
        to use exec virtual space.  Therefore you type the following:
 
        FFF/   0   100000000+CURDS      ;Put data in convenient place
        DATAO APR,FFF$X                 ;Set the address break
 
 
 
4.      Now you want to disable address  break  for  all  instructions
        which you expect to change the given location.  Assume in this
        example  that  only  the instruction at location DIDDLE should
        change location CURDS.  Then you do the following for a  model
        B CPU:
 
        FFF!   IT:                      ;Define location to get old flags
        IT+1!                           ;Old PC
        IT+2!                           ;New flags
        IT+3!   IT+4                    ;New PC
        IT+4!   EXCH IT                 ;Save AC and get old flags
        IT+5!   TLO 1000                ;Set address break inhibit bit
        IT+6!   EXCH IT                 ;Restore flags and AC
        IT+7!   JRST 5,IT               ;Return to caller
        IT+10!   FFF:                   ;Redefine FFF
 
        DIDDLE/   MOVEM A,CURDS   FFF$< ;Insert patch
        FFF/   0   JRST 7,IT$>          ;Call above routine
        FFF+1/   0   MOVEM A,CURDS      ;Typed by DDT when finishing patch
        FFF+2/   0   JUMPA A,DIDDLE+1
        FFF+3/   0   JUMPA B,DIDDLE+2
        DIDDLE/   MOVEM A,CURDS   JUMPA IT+10
 
        The JRST 7,IT instruction is used to save the old PC at IT and
        IT+1,  and take a new PC from IT+2 and IT+3.  There the old PC
        is changed to include the address break inhibit bit.   Then  a
        JRST 5,IT  is  done  which  returns  to  the caller.  The next
        instruction then executes without causing  an  address  break.
        You   have  to  insert  the  JRST 7,IT  instruction  at  every
        instruction you want to succeed.
 
        For model A CPUs the procedure is similar, but a little easier:
 
        FFF!   IT:                      ;Define location to hold PC
        IT+1!   EXCH IT                 ;Get old PC and save AC
        IT+2!   TLO 1000                ;Set address break inhibit flag
        IT+3!   EXCH IT                 ;Restore PC and AC
        IT+4!   JRSTF @IT               ;Return to caller
        IT+5!   FFF:                    ;Redefine FFF
 
        DIDDLE/   MOVEM A,CURDS   FFF$< ;Insert patch
        FFF/   0   JSR IT$>             ;Call above routine
        FFF+1/   0   MOVEM A,CURDS      ;Typed by DDT when finishing patch
        FFF+2/   0   JUMPA A,DIDDLE+1
        FFF+3/   0   JUMPA B,DIDDLE+2
        DIDDLE/   MOVEM A,CURDS   JUMPA IT+5
 
 
 
5.      Now put the breakpoints into  the  monitor  so  that  when  an
        address  break  occurs, you will get into EDDT.  There are two
        locations to patch, one for PI level and one for non-PI level.
        You  also  have  to patch a monitor bug in release 3 and 3A so
        that the page fail dispatch code works properly.

        ADRCMP$B                        ;Set breakpoint at non-PI routine
        PFCD23$B                        ;Set breakpoint at PI routine
        PIPTRP+1/   MOVE A,TRAPSW   MOVE A,TRAPS0       ;And fix a bug
        $P                              ;Now let the monitor proceed



6.      When either of the above breakpoints is hit, the flags and  PC
        of  the  instruction which caused the address break will be in
        locations TRAPFL and TRAPPC.    If the address break was  from
        JSYS  level  (breakpoint  was to ADRCMP and location INSKED is
        zero) then an $P will proceed properly.  If the address  break
        was  from  the  scheduler  or  from PI level, doing $P will be
        useless since the monitor will then BUGHLT because it  doesn't
        want to see an address break under these conditions.  However,
        this is ok if all you want  to  do  is  find  the  instruction
        causing the trashing.


     If the location still gets trashed after trying to catch it  this
way,  either  your  procedure is wrong; you are trying this on a 2020
(which has no address break feature); the location is  being  changed
by  some  IO  being  done  (RH20s, DTEs, etc); or else the machine is
having some hardware problems.
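
     The word given to DATAO APR in step 3 can be worked out from  the
bit  table  there.   The  Python  sketch below does this, counting bit
0 as the high-order bit of the 36-bit  word.   The  function  names
are  hypothetical,  and the address used for CURDS is made up for the
illustration.

```python
def bit(n):
    """Value of bit N of a 36-bit word, where bit 0 is the high-order bit."""
    return 1 << (35 - n)

def address_break_word(address, fetch=False, read=False, write=False, user=False):
    """Build the address break word from the fields listed in step 3."""
    if not 0 <= address < (1 << 23):            # bits 13-35 hold the address
        raise ValueError("address does not fit in bits 13-35")
    word = address
    if fetch:
        word |= bit(9)      # break on instruction fetches
    if read:
        word |= bit(10)     # break on reads
    if write:
        word |= bit(11)     # break on writes
    if user:
        word |= bit(12)     # 1 = user address space, 0 = exec
    return word

CURDS = 0o12345             # hypothetical address, for illustration only
print("%o" % address_break_word(CURDS, write=True))
```

A break on writes in exec space gives 100000000 plus the address,
which agrees with the value deposited in FFF in step 3.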



                    RECOVERING FROM SYSTEM DISASTERS

There are some common system disasters which  in  many  cases  can  be
recovered from quickly and with a minimum of effort.  The four we will
discuss in this article are:


     1.  Hung Terminals

     2.  Hung SETSPD

     3.  Trashed Disks

     4.  Hung Jobs




     1.0  HUNG TERMINALS

     Hung terminals are usually the result of  two  problems.   Either
the speed has been set incorrectly for that terminal type or a problem
exists between the KL and the front end.  If the problem is  a  result
of  an improper speed setting, then simply resetting the speed will be
sufficient.  On the other hand, if the problem is  due  to  some  sync
problem between the KL and the 11 then the easiest way to recover from
this is to reload the front end.  This can be done by  depressing  the
halt  switch  on  the operator's console of the 11 and then placing it
back in the enable state.  After about fifteen seconds, the message

                        [DECsystem-20 continued]

will be printed on the CTY.  If this fails to free the terminal, perhaps
the problem is a hung job.  See the discussion under that heading.



     2.0  HUNG SETSPD

     This is a fairly common problem brought on by some hardware
fault.  It is possible to bring the system up without running SETSPD
under JOB 0, log in, and then try to run SETSPD under some other
operator job.  If SETSPD then hangs, it is possible to CONTROL/C out
of the program, edit 4-CONFIG.CMD to remove the commands suspected of
hanging SETSPD, and retry.  In this way it is possible to continue
timesharing while waiting for the problem to be resolved.

     To bring the system up without running SETSPD automatically,  one
need  only  install  the  following patch to the MONITOR using EDDT on
system start up.



          BOOT>/l
          BOOT>/g141
          EDDT
          EDDTF[   0   -1
          DBUGSW[   0   2
          GOTSWM$B
          SYSGO1$G
          [PS MOUNTED]
          1B>>GOTSWM
          RUNDD3+7/   PUSHJ P,RUNDII   JFCL
          0$1B
          $P
          %%No SETSPD


     The system will then come up as usual except that SYSJOB will not
run.  Once the problem with SETSPD has been resolved, SYSJOB can be
run by typing

      COPY (FROM) <SYSTEM>SYSJOB.RUN (TO) <SYSTEM>SYSJOB.COMMANDS


     This will cause all the commands in the  SYSJOB.RUN  file  to  be
executed by SYSJOB.

     There is a project under way to allow SETSPD to time itself out
and continue with the next command in 4-CONFIG.CMD.  Look for it in the
Large Buffer or the 20 Dispatch.



     3.0  TRASHED DISKS

     This is surely one of the biggest headaches facing the Specialist.
Trashed  disks come in many forms and recovering from these requires a
good knowledge of the structure of the TOPS-20 file system.

     If the structure cannot be mounted, it is because of one  of  the
following reasons:

     1.  Inconsistency in either of the HOM blocks

         1.  Word HOMNAM (1) of either HOM block not SIXBIT/HOM/

         2.  Word HOMCOD (176) of either HOM block not 707070

         3.  Word HOMHOM (5) of first HOM block not 1,,12

         4.  Word HOMHOM (5) of second HOM block not 12,,1

         5.  Word HOMFSN (173) of either HOM block not 20040,,47524



         6.  Word HOMFSN+1 (174) of either HOM block not 51520,,31055

         7.  Word HOMFSN+2 (175) of either HOM block not 20060,,20040

         8.  Right half of word HOMLUN (4) of either home block either
             refers  to  a  unit  greater  than  the left half of word
             HOMLUN or it refers to a UNIT already verified

         9.  Word HOMSNM (3) of either home block does not agree  with
             SIXBIT/STRUCTURE-NAME/

        10.  No disk address for index block in word  HOMRXB  (10)  of
             either HOM block


     2.  Inconsistencies in Root-Directory page 0

         1.  Directory number in Directory page  0  of  Root-Directory
             not 1

         2.  Directory block type (DRTYP) of Root-Directory page 0 not
             400300

         3.  Relative Page number (DRRPN) of Root-Directory page 0 not
             0

         4.  Top of symbol table (DRSTP) of Root-Directory page 0  out
             of Directory bounds

         5.  Pointer to first free  block  (DRFFB)  of  Root-Directory
             page 0 not in page 0 of the directory

         6.  Pointer to Directory Name String (DRNAM) not under  start
             of symbol table

         7.  Directory name pointer (DRNAM)  not  0  and  Name  string
             block length (NMLEN) not at least 2 words long

         8.  Directory name pointer (DRNAM) not 0 and  directory  name
             block header (NMTYP) not 400001

         9.  Password block pointer not 0 and  password  string  block
             length (NMLEN) not at least 2 words long

        10.  Password block pointer not 0 and  password  string  block
             header (NMTYP) not 400001

        11.  Account string block pointer not  0  and  Account  string
             block length (NMLEN) not at least 2 words long

        12.  Account string block pointer not  0  and  Account  string
             block header (NMTYP) not 400001



     3.  Inconsistencies in Block types or free  space  in  subsequent
         pages of the directory.

              All blocks in the directory (including free space) begin
         with   a  block  header  which  specifies  type  and  length.
         Immediately following one block should be a header for a new
         block.  If this scheme is corrupted, the mount will fail.

         1.  Header of a block is not one of

             1.  (NAMTYP)  400001

             2.  (EXTTYP)  400002

             3.  (ACCTYP)  400003

             4.  (USRTYP)  400004

             5.  (FDBTYP)  400100

             6.  (DIRTYP)  400300

             7.  (FRETYP)  400500

             8.  (FBTTYP)  400600

             9.  (GRPTYP)  400700


         2.  Header of a block is NAMTYP and Block length not at least
             2 words

         3.  Header of a block is EXTTYP and block length not at least
             2 words

         4.  Header of a block is ACCTYP and block length not at least
             3 words

         5.  Header of a block is USRTYP and block length not at least
             3 words

         6.  Header of a block is FDBTYP and

             1.  Block length not at least 30 (.FBLN0) words long

             2.  Pointer to Author String (.FBAUT) not 0 and points to
                 a block outside of the directory or points to a block
                 that does not meet the tests for a user  name  string
                 as described above.

             3.  Pointer to Last Writer  String  (.FBLWR)  not  0  and
                 points  to a block outside of the directory or points
                 to a block that does not meet the tests  for  a  user
                 name string block as described above.



             4.  Pointer to Account String (.FBACT) is not  less  than
                 or  equal to zero and it points to a block outside of
                 the directory or it points to a block that  does  not
                 meet  the  tests  for  an  account  string  block  as
                 described above.

             5.  Pointer to Name String  (.FBNAM)  is  not  0  and  it
                 points  to  a  block  outside  of the directory or it
                 points to a block that does not meet the tests for  a
                 Name String Block as described above.

             6.  Pointer to Extension String (.FBEXT) is not 0 and  it
                 points  to  a  block  outside  of the directory or it
                 points to a block that does not meet the tests for an
                 Extension String Block as described above.


         7.  Header of a block is DIRTYP and

             1.  Header is not on a page boundary

             2.  Relative page number (DRRPN) not the calculated  page
                 number

             3.  Pointer to first free block (DRFFB) does not point to
                 a location within the current directory page

             4.  Directory number (DRNUM) not 1.


         8.  Header of a block is FRETYP and block is not at least two
             words  or  Pointer to next free block (FRNFB) is not zero
             and points to a location not on the same page as current

         9.  Last block did not end at  DRFTP  (address  specified  on
             first page of directory)


     4.  BAT blocks inconsistent.

         1.  Either block  does  not  contain  SIXBIT/BAT/  in  BATNAM
             (offset 0 in block)

         2.  Either block does not contain 606060  in  BATCOD  (offset
             176 in block)

         3.  Sector number of the BAT  block  (BATBLK)  not  the  true
             sector of block

         4.  The BAT blocks do not compare  exactly  with  each  other
             through word 176 of the blocks



     5.  Checksum of the Root-directory Index  Block  does  not  agree
         with the checksum calculated.

              Checksums are calculated as follows:

         CHKSUM = 0 ;
         For I = 0 to 777
             If XB(I) = 0 then 
                 CHKSUM = CHKSUM + I
             Else 
                 CHKSUM = CHKSUM + XB(I) ;

         where XB is the first word of the index block.
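
The checksum rule above can be sketched in Python as follows.  This
is only an illustration of the pseudocode as printed: the function
name is made up, the index block is assumed to be a list of 1000
(octal) integer words, and nothing is assumed about how the monitor
handles carries out of a 36-bit word, so the sum is left unbounded.

```python
def index_block_checksum(xb):
    """Checksum an index block given as a sequence of 0o1000 word values."""
    if len(xb) != 0o1000:
        raise ValueError("an index block has 1000 (octal) words")
    chksum = 0
    for i, word in enumerate(xb):
        # A zero word contributes its own offset, so an empty slot still
        # changes the sum; any other word contributes its contents.
        chksum += i if word == 0 else word
    return chksum
```

Because an empty slot adds its offset rather than zero, moving a
pointer from one slot to another changes the checksum even though the
set of non-zero words is the same.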



     As you can see, there are many things that could be wrong with a
structure that prevent it from being mounted.  The consistency of the
structure can be checked quite easily using the new FILDDT commands
STRUCTURE and DRIVE (see 'NEW DISK FEATURES FOR FILDDT' also in this
SWSKIT).
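
As an illustration of the HOM block tests listed at the start of this
section, here is a minimal Python sketch that applies them to a HOM
block read out as a list of 36-bit words.  The function names are
hypothetical, the 707070 in HOMCOD is assumed to be the value of the
whole word, and the HOMLUN test is left out because it needs knowledge
of the other units in the structure.

```python
HOMNAM, HOMSNM, HOMHOM, HOMRXB = 0o0, 0o3, 0o5, 0o10
HOMFSN, HOMCOD = 0o173, 0o176

def sixbit(text):
    """Pack up to six characters into one SIXBIT word, left justified."""
    word = 0
    for ch in text.upper().ljust(6)[:6]:
        word = (word << 6) | (ord(ch) - 0o40)
    return word

def halves(left, right):
    """Build a 36-bit word from its left and right halves (left,,right)."""
    return (left << 18) | right

def check_hom_block(block, primary, structure_name):
    """Return the (octal) offsets of the HOM block words that fail a test."""
    expected = {
        HOMNAM: sixbit("HOM"),
        HOMSNM: sixbit(structure_name),
        HOMHOM: halves(0o1, 0o12) if primary else halves(0o12, 0o1),
        HOMFSN: halves(0o20040, 0o47524),
        HOMFSN + 1: halves(0o51520, 0o31055),
        HOMFSN + 2: halves(0o20060, 0o20040),
        HOMCOD: 0o707070,  # assumption: the full word holds 707070
    }
    failures = [f"word {off:o}" for off, want in sorted(expected.items())
                if block[off] != want]
    if block[HOMRXB] == 0:  # no disk address for the root index block
        failures.append(f"word {HOMRXB:o}")
    return failures
```

An empty list means the block passed every test in the sketch; it does
not mean the structure will mount, since the directory and BAT block
tests are not covered.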

For  structures  which  are  badly  trashed,  the  only  sane  way  of
recovering  is to rebuild the structure using a catastrophe tape.  For
simple inconsistencies such as a bad BAT block, CHECKD  does  the  job
well.   For  more  involved  trashes which can not be recovered from a
back up tape  (because  of  a  forgetful  system  manager)  the  above
information can be of great help.



     4.0  HUNG JOBS

     A number of circumstances can cause a job to become hung,
usually waiting for some resource to free up, some share count to
become zero, etc.  Sometimes these tests will never be satisfied and
the job has its PSI system turned off; as a result the job becomes
hung.  Freeing it up can be very tricky.  The first thing to try is to
log the job out from some other terminal.  If this doesn't succeed in
freeing the job up, then the next best thing is to detach the job from
the terminal and allow it to sit there.  It may be using negligible
amounts of CPU time and causing no adverse effects to the system.
Zapping the job may crash the system which, in most cases, is not the
desirable approach.

The next time the system is reloaded, be sure to get  a  dump  of  the
system  with  the  hung  job  and  submit it as an SPR (see the SWSKIT
article about getting informative Dumps).



                         LOOKING AT HUNG TAPES




     A number of problems of the general  classification  "tape  hang"
have  been  reported, and will probably always exist as long as we use
magtapes.  Although there  are  apparently  several  variants  of  the
problem,  there  are  some  things  which  can  be  done by a suitably
cautious specialist when presented with a  hung  tape  drive.   Listed
below  are  some  techniques  which  can  be  used  in  an  attempt to
investigate and perhaps alleviate the problem.  These  things  should,
in general, be harmless to the system, barring mis-typing in MDDT.  By
the same token, they may not clear the problem.

     For release 4, there are several tables that are used in relation
to  tape  drives.  Some of these tables are indexed by MT unit number,
some by MTA unit number.  In general, it can be said that if  a  table
name  begins  with  the  characters  MT,  it will be indexed by MTA or
physical unit number, and if the table name begins with TL or  TP,  it
will  be  indexed  by MT or logical unit number.  The TL and TP tables
will usually have something to do with the tape labeling system.  This
article concerns itself mainly with the more important tables relating
to MTAs (physical tape units).

     When playing with the tape  subsystem,  certain  care  should  be
taken.  For instance, it always helps if no one else is actively using
the tape  drives  while  you  attempt  something  like  reloading  the
microcode for a DX20.

1.  Finding the Tape Drive

     There are several tables parallel to each other which concern the
ownership  of a tape drive.  Those of interest are DEVNAM, DEVCHR, and
DEVUNT.  At DEVNAM+n is the device name in SIXBIT.  At DEVUNT+n  is  a
word  with the left half set to the assigner's job number, -1 if free,
or -2 if being controlled by the allocator.  The right  half  contains
the  unit number.  Note that in release 4, with tape allocation turned
on, MTAs will always indicate that job 0 has the  drive  assigned  and
that the offset to the MT unit number will contain the job number of a
user.  At DEVCHR+n is the device characteristics  word.   Knowing  the
device name or the owning job, one can use DDT to find the table
offset.  See example below.
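
To make the DEVUNT layout concrete, the following Python sketch splits
a DEVUNT word as described above.  The function name and the returned
dictionary are illustrative only.  The DV%PSD bit (400000 in the right
half, marking a pseudo device such as an MT) is taken from the example
session later in this article.

```python
DV_PSD = 0o400000  # DV%PSD in the right half: pseudo device

def decode_devunt(word):
    """Split a DEVUNT word into owner, pseudo-device flag and unit number."""
    left = (word >> 18) & 0o777777
    right = word & 0o777777
    if left == 0o777777:        # -1: device is free
        owner = "free"
    elif left == 0o777776:      # -2: controlled by the allocator
        owner = "allocator"
    else:
        owner = left            # job number of the assigner
    return {
        "owner": owner,
        "pseudo": bool(right & DV_PSD),
        "unit": right & (DV_PSD - 1),
    }
```

For instance, the word 32,,400000 decodes to job 26 (decimal) owning
pseudo device unit 0, which is how MT0: appears in the example.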

2.  Grabbing the Drive

     Knowing the offsets into DEVUNT, the  device  assignment  can  be
freed  by  putting  -1  into  the  left half of the appropriate DEVUNT
entry.  The drive can then be assigned by the normal ASSIGN command to
the  EXEC.   In dealing with the allocator for Release 4, your own job
number can be placed here if  necessary.   The  drive,  however,  will
still  be  in no state to use.  Note that the appropriate DEVUNT entry
would be the one referring to the MT not the MTA.



3.  Clearing External Errors

     Make sure that there is a tape of  some  sort  mounted,  and  the
drive  is  placed on-line.  Having a write-enable ring in the tape may
help in being sure the unit is functional if  the  hung  condition  is
cleared.

4.  Checking the UDB

     Next, the Unit Data Block status should be reset.  This word  can
be  found  using  the MTCUTB table.  This table is indexed by MTA unit
number, the left half is the address of the channel data block  (CDB),
and  the  right half contains the address of the UDB.  The status word
of the UDB should then be reset to the base  state.   The  right  half
should be left alone--it basically contains drive type.  The left half
should have only bit 16  set,  which  indicates  a  tape  type  device
(US.TAP).  The old contents should be remembered for purposes of later
analysis.
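
The value to deposit can be worked out as in this small Python sketch,
which keeps the right half of the old status word and leaves only
US.TAP (bit 16) in the left half.  The function name is hypothetical;
the actual deposit is still done by hand in MDDT.

```python
US_TAP = 1 << (35 - 16)  # 1B16: tape type device

def reset_udb_status(old_status):
    """Return the base-state status word for a tape UDB.

    The right half (drive type) is preserved; the left half is
    reduced to US.TAP alone.
    """
    return US_TAP | (old_status & 0o777777)
```

With the status word 102,,157 from the example below, the result is
2,,157: the write-lock bit is dropped and the TU70 type code is kept.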

5.  Checking the Status

     Now, table MTASTS is examined, indexed by MTA unit number  again.
Remember the old contents.  Then clear the word to zero.

6.  Example

    @enaBLE (CAPABILITIES) 
    $sddt
    DDT
    mddt%$x
    MDDT
    
    dvxstn=21   !THIS WILL PROVIDE A HANDY INDEX
                !INTO THE MTA OFFSETS IN THE
                !DEVxxx TABLES.
    
    
                !DEVNAM IS A SIXBIT DEVICE NAME
    
    devnam+21/   HLRZM P2,FKBSPW+217(T1)   $6t;MTA0     
    DEVNAM+22/   MTA1     
    DEVNAM+23/   MTA2     
    DEVNAM+24/   MTA3     
    DEVNAM+25/   MTA4     
    DEVNAM+26/   MTA5
        ...
        ...
        ...
    
    DEVNAM+40/   MTA17     
    
    mtan=20     !ROOM FOR 20 (OCTAL) TAPE DRIVES HAS BEEN ALLOCATED
    
    mtindx[   777765,,5   !BUT ONLY 5 ACTUAL TAPE DRIVES ARE ON THIS SYSTEM



    
                !THE MTs WILL APPEAR AFTER MTAs IN THE DEVxxx
                !TABLES SO DVXSTN+MTAN WILL PROVIDE THE OFFSET
                !TO THE MT ENTRIES
    
    devnam+41/   HLRZM P1,@0   $6t;MT0      
    DEVNAM+42/   MT1      
    DEVNAM+43/   MT2      
    DEVNAM+44/   MT3      
    DEVNAM+45/   MT4      
    DEVNAM+46/   MT5      
        ...
        ...
        ...
    
    DEVNAM+60/   MT17
    
                !DEVUNT IS PARALLEL TO DEVNAM AND PROVIDES
                !THE OFFSETS INTO THE MTxxxx TABLES FOR MTAs
                !AND OFFSETS INTO THE TLxxxx/TPxxxx TABLES
                !FOR MTs
    
    devunt+21[   0   !MTA UNIT ZERO (MTA0: FROM DEVNAM ABOVE) ASSIGNED TO JOB 0
    DEVUNT+22[   1   !JOB 0,,MTA1:
    DEVUNT+23[   2   !JOB 0,,MTA2:
    DEVUNT+24[   3   !JOB 0,,MTA3:
    DEVUNT+25[   4   !JOB 0,,MTA4:
    DEVUNT+26[   5   !JOB 0,,MTA5:
    DEVUNT+27[   777777,,6   !UNASSIGNED,,MTA6:
        ...
        ...
        ...
    
    DEVUNT+40[   777777,,17   !UNASSIGNED,,MTA17:
    
                !DV%PSD=400000 INDICATES A PSEUDO DEVICE
                !THE FOLLOWING ENTRIES FOR MTs WILL INDICATE
                !THE AVAILABILITY OF LOGICAL TAPE UNITS
    
    devunt+41[   32,,400000   !PSEUDO DEVICE MT0: IS ASSIGNED TO
                              !JOB 32 OCTAL (JOB 26 IN DECIMAL)
    DEVUNT+42[   777776,,400001   !CONTROLLED BY ALLOCATOR,,MT1:
    DEVUNT+43[   777776,,400002   !     "     "       "   ,,MT2:
    DEVUNT+44[   777776,,400003   !     "     "       "   ,,MT3:
        ...
        ...
        ...
    
    DEVUNT+60[   777776,,400017   !     "     "       "   ,,MT17:
    
                !TLABR0 (INDEXED BY MT NUMBER) WILL INDICATE
                !WHICH PHYSICAL TAPE UNIT WILL BE USED WHEN
                !REFERENCING AN MT. THIS IS INDICATED  BY THE
                !PHYSICAL MTA NUMBER IN BITS 2-8.



    
    tlabr0[   405000,,0   !BIT 0 INDICATES A VALID VOLUME IS MOUNTED ON MTA5
    
    mtcutb+5[   730437,,730625   !CDB,,UDB FOR MTA5 BEING USED BY JOB 26
                                 !WHO KNOWS IT AS MT0 (SEE ABOVE)
    
    
    730625[   102,,157  !FIRST WORD OF UDB FOR MTA5
                        !US.WLK=1B11 >> WRITE LOCKED
                        !US.TAP=1B16 >> TAPE TYPE DEVICE
                        !.UTT70=17B35 >> TU70
    
    mtasts+5[   0   !THIS EXAMPLE INDICATES A TAPE DRIVE THAT PROBABLY
                    !HASN'T BEEN REFERENCED BY THE USER YET
    
    
    mretn$g             !TO RETURN TO SDDT FROM MDDT
    <>
    
    ^Z                  !TO RETURN TO THE EXEC FROM SDDT
    $
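
The TLABR0 word in the session above can be picked apart with a
generic bit-field helper.  The Python sketch below uses PDP-10 bit
numbering (bit 0 is the most significant of 36); the helper name is
made up for illustration.

```python
def bit_field(word, first, last):
    """Extract bits first..last (PDP-10 numbering, bit 0 = MSB of 36)."""
    width = last - first + 1
    return (word >> (35 - last)) & ((1 << width) - 1)

tlabr0 = 0o405000 << 18                  # 405000,,0 from the session above
volume_valid = bit_field(tlabr0, 0, 0)   # bit 0: valid volume mounted
mta_number = bit_field(tlabr0, 2, 8)     # bits 2-8: physical MTA number
```

Here mta_number comes out as 5, matching the MTA5 named in the session
comment.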
    

     If clearing MTASTS and UDBSTS for the drive doesn't seem to clear
the  problem, you will probably have to do more digging around to find
some other, more obscure, inconsistency in the  MTA/MT  tables.   This
can  be  accomplished  by  referring  to  the  monitor  tables (which,
hopefully, have been included with the SWSKIT) under MTA-STORAGE-AREA.
As always, extreme caution should be exercised while fooling around in
MDDT as you can accidentally trash some random location in the monitor
just by hitting a carriage return at the wrong time.

     One last note should be made about the monitor tables here.   The
description  of  the  DEVUNT  table would lead one to believe that the
right half will contain a -2 if the device is  under  control  of  the
allocator.   If  the  device is under control of the allocator, the -2
will appear in the left half.




                   A LOOK AT SOME OF THE DISK STUFF
                   ================================



     This article is an introduction to the PHYPAR module.  PHYPAR is
where the information may be reliably obtained, and it should serve as
the ultimate reference for these problems.

        Much of the system debugging you will have to deal with will
involve the DEC-20 hardware.  There always seems to be a large gap
between what the diagnostics can tolerate and what the monitor can
tolerate in the way of malfunctioning hardware.  The monitor will not
always point you to the real disk or magtape problem, say, but will
crash a few minutes after something has gone wrong somewhere.  Most
of the hardware problems that we have had to deal with that were
really difficult to track down and point the Field Service rep to
were problems with disk hardware.  The following is information which
you can use to help Field Service trace down problems which are not
reported by the diagnostics.  In most cases the Field Service rep
knows what all the status bits etc. mean but has not been able to
find them in the monitor crashes or running monitor.

CHNTAB:
        CHNTAB is  an  ordered  list  of  Channel  Data  Block
        addresses starting with channel 0.  RH20-0 data  block
        address is in the first word etc.

CDB:
        CDB is the Channel Data Block.  There is one CDB per
        channel.  The CDB contains channel dependent
        instructions and data, and pointers to the unit data
        blocks (UDBs) in the case of RP04s, RP05s, and RP06s.  In
        the case of TU45s the pointer is to the Kontroller Data
        Block (TM02), which points in turn to the UDBs.  The
        CDB also contains information about the currently
        active unit.  When the channel interrupts, control
        passes (via a JSP) to CDBINT.  The CDB address is
        stored in AC1, P1 and the principal analysis routine,
        PHYINT, is called.
        
NOTE:   The CDBs are referenced in modules PHYSIO, PHYH2 (RH20
        code), PHYM2 (TM02 code) and PHYP4 (RP04, 05, 06
        code).  The Channel Data Block is defined in the
        module PHYPAR.  The address that you get in CHNTAB is
        really a pointer to word 0, which contains the status
        bits for this controller (CDBSTS).  Look in PHYPAR for
        the table definition.  Some words of interest are:

        CDBaddress + CDBSTS:    ; status and configuration information
        CDBaddress + CDBUDB:    ; 8 word table of UDB (or KDB) addresses



        
        The status bits which are  also defined in PHYPAR  are
        listed here for your convenience:

        CS.OFL==1B0             ; offline
        CS.AC1==1B1             ; primary command active
        CS.AC2==1B2             ; secondary command active
        CS.ACT==CS.AC1!CS.AC2   ; any active
        CS.MAI==1B3             ; channel is in maintenance mode
        CS.MRQ==1B4             ; maintenance mode requested for unit
        CS.ERC==1B5             ; error recovery in progress
        CS.STK==1B6             ; channel supports command stacking
        CS.ACL==1B7             ; alternate command list is current

        BITs 30-32              ; PIA field
        BITs 33-35              ; channel type field
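
When reading a CDBSTS word out of a dump it can help to decode it
mechanically.  The Python sketch below does so from the bit
assignments listed above; the function name and the returned
dictionary are illustrative, and the meaning of the individual PIA and
channel type values is not covered here (see PHYPAR).

```python
CDB_FLAGS = {
    0: "CS.OFL", 1: "CS.AC1", 2: "CS.AC2", 3: "CS.MAI",
    4: "CS.MRQ", 5: "CS.ERC", 6: "CS.STK", 7: "CS.ACL",
}

def decode_cdbsts(word):
    """Name the flags set in a CDBSTS word and extract its two fields."""
    flags = [name for bit, name in CDB_FLAGS.items()
             if word & (1 << (35 - bit))]
    return {
        "flags": flags,
        "pia": (word >> 3) & 0o7,   # bits 30-32
        "type": word & 0o7,         # bits 33-35
    }
```

A channel is busy if either CS.AC1 or CS.AC2 appears in the flag list,
which is what the combined CS.ACT mask tests.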


KDB:
        Kontroller Data Block  (TM02 only)  defined in  PHYPAR
        also.  Referenced in PHYM2, PHYPAR, PHYSIO.  Words  of
        interest are:

        KDBADDR+KDBSTS:         ; flags unit type
        KDBADDR+KDBUDB:         ; UDB table first word (1 word/UDB)


UDB:
        Unit Data Block.  There is one UDB per unit associated
        with a CDB or KDB.  The UDB contains information about
        the current activity on the unit in question.  The UDB
        is defined in PHYPAR as well.  Some words of  interest
        are noted  below.   Look  in the  listings  for  other
        information.

        UDBADDR + UDBSTS:       ; status and configuration info (see below)
        UDBADDR + UDBERR:       ; error recovery status word
        UDBADDR + UDBERP:       ; error reporting work area if non 0
        UDBADDR + UDBRED:       ; reads - sectors if disk, frames if tape
        UDBADDR + UDBWRT:       ; writes - sectors if disk, frames if MTA
        UDBADDR + UDBSRE:       ; soft read errors
        UDBADDR + UDBSWE:       ; soft write errors
        UDBADDR + UDBHRE:       ; hard read errors
        UDBADDR + UDBHWE:       ; hard write errors
        UDBADDR + UDBPS1:       ; current cylinder if disk, cur file if MTA
        UDBADDR + UDBPS2:       ; current sector within cyl if disk, record
                                ;  in file if tape
        UDBADDR + UDBSPE:       ; soft positioning error
        UDBADDR + UDBHPE:       ; hard positioning error        

                                ; NOTE - there are several other UDB words
                                ; including a device dependent portion



        STATUS BITS IN UDBSTS OR FIRST WORD OF UDB:

        US.OFS==1B0     ; off line or unsafe
        US.CHB==1B1     ; check HOME blocks before any normal I/O
        US.POS==1B2     ; positioning in progress
        US.ACT==1B3     ; active
        US.BAT==1B4     ; on if bad BAT blocks on this unit
        US.BLK==1B5     ; lock bit for this unit's BAT blocks
        US.PGM==1B6     ; dual port switch in (A or B)
        US.MAI==1B7     ; unit is in maintenance mode
        US.MRQ==1B8     ; maintenance mode requested on this unit
        US.BOT==1B9     ; unit is at BOT
        US.REW==1B10    ; unit is rewinding
        US.WLK==1B11    ; unit is write locked
        US.MAL==1B12    ; maintenance mode allowed on this unit
        US.OIR==1B13    ; operator intervention required, set at
                        ;  interrupt level, checked periodically.
        US.OMS==1B14    ; once a minute message to operator,  used in
                        ;  conjunction with US.OIR.
        US.PRQ==1B15    ; positioning required on this unit
        US.TAP==1B16    ; device type tape
        US.PSI==1B17    ; tape - online/offline/rewind done transition

        BITS 32-35 CONTAIN UNIT TYPE CODE; FIELD NAME IS USTYP

        .UTRP4 = 1      ; RP04
        .UTRS4 = 2      ; RS04 (drum)
        .UTT16 = 3      ; TU16 (TU45)
        .UTTM2 = 4      ; TM02 as a unit
        .UTRP5 = 5      ; RP05
        .UTRP6 = 6      ; RP06
        .UTRP7 = 7      ; RP07
        .UTRP8 = 10     ; RP08
        .UTRM3 = 11     ; RM03
        .UTTM3 = 12     ; TM03 AS A UNIT
        .UTT77 = 13     ; TU77
        .UTTM7 = 14     ; TM78
        .UTT78 = 15     ; TU78
        .UTDX2 = 16     ; DX20-A
        .UTT70 = 17     ; TU70
        .UTT71 = 20     ; TU71
        .UTT72 = 21     ; TU72
        .UTT73 = 22     ; TU7x
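
The UDBSTS bits and unit type codes above can be decoded with a short
Python sketch such as the following.  The function name is
hypothetical.  The type field is masked to bits 32-35 exactly as
stated above, so the sketch cannot return codes 20 through 22; if
PHYPAR shows a wider field, widen the mask accordingly.

```python
US_FLAGS = [
    "US.OFS", "US.CHB", "US.POS", "US.ACT", "US.BAT", "US.BLK",
    "US.PGM", "US.MAI", "US.MRQ", "US.BOT", "US.REW", "US.WLK",
    "US.MAL", "US.OIR", "US.OMS", "US.PRQ", "US.TAP", "US.PSI",
]  # list index is the bit number, 0 through 17

UNIT_TYPES = {
    0o1: "RP04", 0o2: "RS04", 0o3: "TU16", 0o4: "TM02", 0o5: "RP05",
    0o6: "RP06", 0o7: "RP07", 0o10: "RP08", 0o11: "RM03", 0o12: "TM03",
    0o13: "TU77", 0o14: "TM78", 0o15: "TU78", 0o16: "DX20-A", 0o17: "TU70",
}

def decode_udbsts(word):
    """Return (names of flags set, unit type name) for a UDBSTS word."""
    flags = [name for bit, name in enumerate(US_FLAGS)
             if word & (1 << (35 - bit))]
    code = word & 0o17  # bits 32-35 as stated in the listing
    return flags, UNIT_TYPES.get(code, f"unknown ({code:o})")
```

Applied to the word 102,,157 seen in the hung tape example, this gives
US.WLK and US.TAP on a TU70.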




THE PLACES WHERE THINGS ARE ON THE DISK ARE AS FOLLOWS:

        BLOCK 0:        ; 11 bootstrap
        BLOCK 1:        ; primary HOME block
        BLOCK 2:        ; primary BAT block
        BLOCKS 3-11:    ; reserved
        BLOCK 12:       ; secondary HOME block
        BLOCK 13:       ; secondary BAT block

The disk pages for the above are recorded in the table HOME, which is
defined in STG.  The BAT blocks are defined in PROLOG and the HOME
blocks are defined in DSKALC.




                         NEW DISK FEATURES FOR FILDDT

    The FILDDT to be shipped with release  4 of TOPS-20 will have two  new
    commands in relation to disk file structure maintenance.

    They are:

        STRUCTURE (FOR PHYSICAL I/O IS) disk-structure
                Examines the specified disk structure.

        DRIVE (FOR PHYSICAL I/O IS ON CHANNEL) c (UNIT) u
                Examines the specified disk unit.

    These are privileged functions and one must have privileges enabled to
    use these.

    These two commands are nearly  identical.  Their difference is in  the
    way the structure  is identified.   To use the  STRUCTURE command  the
    structure must  be  mounted.   The STRUCTURE  command  is  useful  for
    examining a multi-pack  structure.  The  DRIVE command  is useful  for
    examining the  file system  of a  structure which  cannot be  mounted.
    Channel and unit  numbers can be  found from the  programs UNITS,  DS,
    SYSDPY, or OPR.
        
    Addressing is in the same format as in other forms of DDT.
        
    It is easier  to understand exactly  what the disk  will look like  in
    FILDDT if you keep in mind that all sectors will be packed in the  DDT
    address space, without regard for sector size, starting at DDT address
    0.  For instance, on an RP06 there are four sectors per memory page or
    200 (octal) words per sector.  Therefore, sector zero of the structure
    will begin at FILDDT address 0 and end at memory address 177  (octal).
    Sector 1 will begin at address 200 and end at 377.  For release 4, all
    DEC supported  disks  contain  200  (octal) words  per  sector,  so  a
    consistent mapping  exists between  sector  number and  FILDDT  memory
    location.  Soon, TOPS-20 will support  RP20's.  For RP20's, there  are
    1000 (octal)  words per  sector (one  page per  sector).  Index  block
    addresses and most monitor disk addresses are in sectors.  That is why
    it is important to be able  to translate between sector addresses  and
    FILDDT memory addresses.
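
    The translation described above is simple enough to sketch in a few
    lines of Python.  The function names are illustrative; the default
    of 200 (octal) words per sector is the release 4 figure quoted
    above, and 1000 (octal) can be passed for an RP20.

```python
def sector_to_address(sector, words_per_sector=0o200):
    """FILDDT address of the first word of a sector."""
    return sector * words_per_sector

def address_to_sector(address, words_per_sector=0o200):
    """Return (sector number, word offset within that sector)."""
    return divmod(address, words_per_sector)
```

    For example, FILDDT address 377 lies in sector 1 at offset 177, the
    last word of that sector.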
        
    The FILDDT option of  ENABLE PATCHING is also  available for use  with
    the DRIVE and STRUCTURE commands.  With this option on, the user is
    able  to  modify  specific  words  on  the  structure.   Another  very
    convenient FILDDT command  one may  use in conjunction  with the  disk
    commands is LOAD (symbols from) input file spec.  One may specify  any
    file here but a useful one is SYSTEM:MONITR.  The symbol table to  the
    MONITOR has  home block  sector addresses,  FDB offsets  etc.  When  a
    file's  symbols  are  loaded,  one may  also define  his own  symbols.
    This is useful to remember addresses of data structures on the  units.
    For example, after finding the index block to a file, one could define
    a symbol, FILIDX at that address for easy referencing later on.



    When examining  a multi-pack  structure using  the STRUCTURE  command,
    addressing the first unit is exactly as if there were only one unit in
    the structure.  FILDDT addresses of  sectors on the other units  begin
    immediately  after  the  last  address  for  the  first  unit  of  the
    structure.  For example, consider  that we would  like to examine  the
    BAT blocks for the second unit of a two pack STR: on RP06 drives.
        
    An RP06 contains 304000. sectors per unit and 128.  words per sector.
    The first FILDDT address for the second unit of an RP06 two pack STR:
    is  304000.*128.=38912000. or 224340000 (octal)
        
    FILDDT>STRUCTURE (FOR PHYSICAL I/O IS) PS:
    [Looking at file structure PS:]

                        ; starting address of second unit in structure
                        ; plus sector address of BAT blocks (2)
                        ; times number of words per sector gives
                        ; FILDDT address of start of BAT blocks for
                        ; that unit

    224340000+2*200=224,,340400

    224,,340400[   424164,,0   $6T;   BAT
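
    The arithmetic in this example can be checked with a short Python
    sketch.  It assumes every unit in the structure is an RP06 with the
    sector count quoted above; the function names are made up, and
    units are numbered from zero.

```python
RP06_SECTORS = 304000     # sectors per RP06 unit (decimal)
WORDS_PER_SECTOR = 128    # 200 octal

def unit_sector_address(unit, sector):
    """FILDDT address of a sector on the given unit of an RP06 structure."""
    return (unit * RP06_SECTORS + sector) * WORDS_PER_SECTOR

def halfwords(address):
    """Format an address the way DDT types it: left,,right in octal."""
    return f"{address >> 18:o},,{address & 0o777777:o}"

second_unit_start = unit_sector_address(1, 0)
second_unit_bat = unit_sector_address(1, 2)   # primary BAT block is sector 2
```

    second_unit_start is 224340000 (octal) and halfwords(second_unit_bat)
    gives 224,,340400, the address examined above.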
        
    For another example, let's say we would like to find the start of  the
    ROOT-DIRECTORY symbol table.

    @ENABLE (CAPABILITIES) 
    $FILDDT
    FILDDT>LOAD (SYMBOLS FROM) SYSTEM:MONITR
    [22722 symbols loaded from file]
    FILDDT>STRUCTURE (FOR PHYSICAL I/O IS) PS:
    [Looking at file structure PS:]
        
    NWSEC=200                   ; number of words per sector
    HM1BLK=1                    ; sector number of HOM block
    HOMRXB=10                   ; offset in HOM block for index
                                ; block to root-directory
    
                                ; sector number of HOM block
                                ; times words per sector equals
                                ; FILDDT address of start of HOM block
    HM1BLK*NWSEC[   505755,,0   $6T;HOM  
    HM1BLK*NWSEC+HOMRXB[   10,,5740 ; plus offset to address of index block
                                ; sector number of index block times
                                ; number of words per sector gives
    5740*NWSEC[   10,,5744      ; FILDDT adr of root-dir index block
                                ; NOTE:  Bit 14 (DSKAB) specifies this
                                ; address as a disk sector address.
                                ; sector addresses are bits 15-35
    RTDIDX:                     ; define symbol for index block
                                ; sector number of first page of
                                ; root-directory times number of words
                                ; per sector gives the
    5744*NWSEC[   400300,,100   ; FILDDT adr of first page of ROOT-DIR
    RTDIR0:                     ; define start of page 0 of ROOT-DIR
    RTDIR0+3[   30610           ; plus 3 for start of symbol table
                                ; NOTE: adr is a 'directory address'
                                ;       offset 610 of directory page 30
    RTDIDX+30[   10,,6250       ; get sector adr of page 30 of ROOT-DIR
                                ; sector adr of page 30 times words per
                                ; sector gives FILDDT address of page
                                ; 30 of ROOT-DIR.
    6250*NWSEC+610[   400400,,1 ; Add offset for symbol table start
    RTDSYM:
    ^E
    FILDDT>EXIT
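
    The address arithmetic in the session above can be  sketched  in
    Python for illustration.  The helper names are hypothetical.  The
    sketch relies only on what the session  shows:  bit  14  (DSKAB)
    marks a disk address, bits 15-35 hold the sector  address,  and  a
    directory address such as 30610 is a page number followed by  a
    word offset within a 1000 (octal) word page.

```python
NWSEC = 0o200                 # words per sector, as in the session
DSKAB = 1 << (35 - 14)        # bit 14 in PDP-10 numbering (bit 0 is the MSB)
SECTOR_MASK = (1 << 21) - 1   # bits 15-35 hold the sector address
PAGE_WORDS = 0o1000           # words per page


def word(left, right):
    """Build a 36-bit word from the LH,,RH halves that DDT types out."""
    return (left << 18) | right


def filddt_addr(disk_word, offset=0):
    """FILDDT address of the sector named by a disk address word."""
    if not disk_word & DSKAB:
        raise ValueError("DSKAB is clear: not a disk sector address")
    return (disk_word & SECTOR_MASK) * NWSEC + offset


def split_directory_address(addr):
    """Split a directory address into (page number, offset in page)."""
    return divmod(addr, PAGE_WORDS)


root_index = filddt_addr(word(0o10, 0o5740))       # contents of HOMRXB
page, offset = split_directory_address(0o30610)    # contents of RTDIR0+3
symtab = filddt_addr(word(0o10, 0o6250), offset)   # contents of RTDIDX+30
print(f"{root_index:o} {page:o} {offset:o} {symtab:o}")
```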

        
    Here are some magic numbers for all DEC supported drives.
        
        DRIVE TYPE      SECTORS/UNIT    STARTING ADR    STARTING ADR
                                        OF 2nd UNIT     OF 3rd UNIT
                        (in decimal)     (in octal)      (in octal)
        __________      ____________    ____________    ____________
        
        RP04-RP05         152000.       112,,160000     224,,340000
        RP06              304000.       224,,340000     450,,700000
        RP07              502200.       365,,156000     752,,334000
        RM03              121360.        73,,204000     166,,410000
        RP20              201420.       611,,314000    1422,,630000


    NOTE: RP20 will not be supported in  release 4.  It is important  to
          remember that there are 1000 (octal) words per sector for  an
          RP20.  As a result, to look at a sector of an RP20, one  would
          multiply the sector number by 1000 (octal) to find the  FILDDT
          starting address for that sector.   For all other drive  types
          there are 200 (octal) words per sector.


    The above information is calculated  from the parameters available  in
    STG.MAC.
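
    As a cross-check, the table can be regenerated from the sectors
    per unit and words per sector quoted above.  This is a sketch  in
    Python with hypothetical names.  It does not read STG.MAC, and  it
    assumes all units of a structure are the same drive type.

```python
# (sectors per unit in decimal, words per sector) as given in the table and NOTE
DRIVES = {
    "RP04-RP05": (152000, 0o200),
    "RP06": (304000, 0o200),
    "RP07": (502200, 0o200),
    "RM03": (121360, 0o200),
    "RP20": (201420, 0o1000),
}


def unit_start(sectors_per_unit, words_per_sector, unit):
    """FILDDT address of the first word of zero-based `unit`."""
    return unit * sectors_per_unit * words_per_sector


def halfwords(value):
    """Format a value as octal halfwords, LH,,RH."""
    return f"{value >> 18:o},,{value & 0o777777:o}"


for name, (sectors, words) in DRIVES.items():
    second = halfwords(unit_start(sectors, words, 1))
    third = halfwords(unit_start(sectors, words, 2))
    print(f"{name:<12}{sectors:>8}.{second:>16}{third:>16}")
```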

    REF: DDT41.MEM

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 50
TOPS-20 SCHEDULER TEST ROUTINES


                    TOPS-20 SCHEDULER TEST ROUTINES
                    -------------------------------

     The following is a tabulation of (hopefully) all of  the  scheduler
tests  used by the TOPS-20 monitor, time-frame approximately Release 3A.
This includes ARPA and DECNET tests.  This is the data one finds in  the
monitor table FKSTAT indexed by fork number for forks which have blocked
and left the GOLST (i.e.  LH(FKPT) contains WTLST).  The format  of  the
FKSTAT  table  words  is TEST DATA,,TEST ROUTINE ADDRESS.  The scheduler
test routines are called periodically to determine if a process  can  be
unblocked.   This is indicated by a skip return from the scheduler test.
A nonskip return is taken if the process cannot yet be unblocked.

     When examining the monitor because of  a  hung  job  or  fork,  the
FKSTAT  table  can  often  reveal  the reason the fork is hung, and this
sometimes even allows corrective action to be taken.
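
     Splitting an FKSTAT word into its two halves can be sketched  as
follows.  This is an illustration in Python, not monitor code.  The
symbol table and the addresses in it are made up for the example;  real
routine addresses come from the symbols of the monitor being examined.

```python
def decode_fkstat(word, symbols):
    """Split an FKSTAT word into (test data, test routine name).

    `symbols` maps routine addresses to names.  An address with no
    entry is returned as an octal string instead of a name.
    """
    test_data = (word >> 18) & 0o777777      # left half: TEST DATA
    routine = word & 0o777777                # right half: TEST ROUTINE ADDRESS
    return test_data, symbols.get(routine, f"{routine:o}")


# Hypothetical addresses, for illustration only.
SYMBOLS = {0o23456: "DISET", 0o23461: "BLOCKT"}

data, test = decode_fkstat((0o41234 << 18) | 0o23456, SYMBOLS)
print(f"{test}: test data {data:o}")
```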

     The table below gives the routine name, what you should expect  to
see in the FKSTAT table, and the module in which the scheduler test  is
defined, followed by a short description of the condition being tested.



                            SCHEDULER TESTS


 TEST            CONTENTS OF T1 AT TIME OF SCHEDULER CALL       DEFINED
 ----            ----------------------------------------       -------

BALTST          [CONNECTION #,,BALTST]                          [NETWRK]
                Wait for network bit allocation.

BATTST          [UNIT #,,BATTST]                                [DSKALC]
                Wait for US.BLK, the lock bit for the BAT blocks
                on the unit, in the UDB to be zero.

BLOCKM          [TIME,,BLOCKM]                                  [SCHED]
                Wait for TIME in BLOCKM format which is the low
                order 17 bits of the desired future time to be
                compared against a suitably masked TODCLK.

BLOCKT          [TIME,,BLOCKT]                                  [SCHED]
                Wait for TIME in BLOCKT format which is a
                value that is shifted left 10 bits and compared
                against a suitably masked TODCLK, providing a
                longer delay than BLOCKM, but less precision.

BLOCKW          [TIME,,BLOCKW]                                  [SCHED]
                Wait for TIME in BLOCKW format (same as BLOCKM).

CDRBLK          [UNIT NUMBER,,CDRBLK]                           [CDRSRV]
                Wait for card-reader offline, or not waiting for
                a card.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 51
TOPS-20 SCHEDULER TEST ROUTINES


CHKLOK          [ADDRESS,,CHKLOK]                               [NSPSRV]
                Wait for NSP block lock at address to free.

COFTST          [TIME,,COFTST]                                  [MEXEC]
                Wait for job in FKJOBN to be attached or time
                in BLOCKT form to elapse.

DBWAIT          [DTE #,,DBWAIT]                                 [DTESRV]
                Wait for the TO-10 doorbell from the given DTE.

DGLTST          [0,,DGLTST]                                     [DIAG]
                Wait for DIAGLK lock to be free.

DGUIDL          [UDB ADDRESS,,DGUIDL]                           [DIAG]
                Wait for the unit to show as idle in the UDB.

DGUTST          [UDB ADDRESS,,DGUTST]                           [DIAG]
                Wait for the maintenance bit to set in the UDB.

DISET           [ADDRESS,,DISET]                                [SCHED]
                Wait for contents of ADDRESS to be zero.

DISGET          [ADDRESS,,DISGET]                               [SCHED]
                Wait for contents of ADDRESS to be positive.

DISGT           [ADDRESS,,DISGT]                                [SCHED]
                Wait for contents of ADDRESS to be greater than
                zero.

DISLT           [ADDRESS,,DISLT]                                [SCHED]
                Wait for contents of ADDRESS to be less than
                zero.

DISNT           [ADDRESS,,DISNT]                                [SCHED]
                Wait for contents of ADDRESS to be non-zero.

DMPTST          [COUNT,,DMPTST]                                 [IO]
                Wait for COUNT to be less than DMPCNT to indicate
                dump mode buffers freed.

DSKRT           [PAGE #,,DSKRT]                                 [PAGEM]
                Wait for CSTAGE for PAGE # to not be PSRIP,
                meaning disk read completed.

DWRTST          [PAGE #,,DWRTST]                                [PAGEM]
                Wait for DRWBIT to clear in CST3(PAGE #),
                meaning write completed.

ENQTST          [FORK #,,ENQTST]                                [ENQ]
                Wait for the lock on ENFKTB+FORK #.

FEBWT           [ADDRESS OF FE UDB,,FEBWT]                      [FESRV]
                Wait for EOF or input bytes available from FE.
                Wake also on invalid assignment.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 52
TOPS-20 SCHEDULER TEST ROUTINES



FEDOBE          [ADDRESS OF FE UDB,,FEDOBE]                     [FESRV]
                Wait for output buffer empty and all bytes are
                acknowledged by the FE.  Wake also if not a 
                valid assignment.

FEFULL          [ADDRESS OF FE UDB,,FEFULL]                     [FESRV]
                Wait for the current count of output bytes to be
                less than the count of bytes in the interrupt
                buffer.  Wake also on invalid assignment.

FORCTM          [SUPERIOR FORK INDEX,,FORCTM]                   [SCHED]
                Identifiable wait forever, forced termination.

FRZWT           [PREVIOUS TEST,,FRZWT]                          [FORK]
                Identifiable wait forever, frozen fork.

HALTT           [SUPERIOR FORK INDEX,,HALTT]                    [SCHED]
                Identifiable wait forever for halted fork.

HIBERT          [TIME,,HIBERT]                                  [SCHED]
                Wait for TIME in BLOCKT format.

HUPTST          [<0:9>TIME<10:17>HOST #,,HUPTST]                [NETWRK]
                Wait for IMPHRT bit set for host or time out in
                BLOCKW form.

IDVTST          [0,,IDVTST]                                     [IMPDV]
                Wait for the lock on IDVLCK to free, lock it.

IMPBPT          [0,,IMPBPT]                                     [IMPDV]
                Wait for IMPFLG nonzero, or IBPTIM timer to run
                out, or IDVLCK lock free and output scan needed
                for the IMP.

JB0TST          [TIME,,JB0TST]                                  [MEXEC]
                Wait for JB0FLG set nonzero for explicit request
                or time in BLOCKT form to elapse.

JRET            [0,,JRET]                                       [SCHED]
                Wait forever, interruptible.

JSKP            [0,,JSKP]                                       [SCHED]
                Unconditional skip used to schedule immediately.

JTQWT           [0,,JTQWT]                                      [SCHED]
                Wait for JSYS trap queue.

LCKTSS          [ADDRESS,,LCKTSS]                               [IO]
                Wait for lock at ADDRESS to unlock, lock it.

LKDSPT          [0,,LKDSPT]                                     [STG]
                Wait for room in LDTAB table of directories
                currently locked.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 53
TOPS-20 SCHEDULER TEST ROUTINES


LKDTST          [INDEX INTO LDTAB,,LKDTST]                      [STG]
                Wait for bit in LCKDBT to clear, indicating
                directory unlocked.

LODWAT          [ADDRESS OF STATUS WORD,,LODWAT]                [LINEPR]
                Wait for flag LP%LHC to set in the addressed
                word, indicating loading has completed of the
                VFU or RAM file.

LPTDIS          [UNIT ADDRESS,,LPTDIS]                          [LINEPR]
                Wait for an error condition on the addressed
                unit, or for all buffers cleared and no bytes
                still in the front-end, before finishing close
                operation on the device.

MTARWT          [IORB ADDRESS,,MTARWT]                          [MAGTAP]
                Wait for IRBFA in the IORB to indicate that this
                IORB is no longer active.

MTAWAT          [UNIT #,,MTAWAT]                                [MAGTAP]
                Wait for all outstanding IORBs for unit to be
                finished.

MTDWT1          [UNIT #,,MTDWT1]                                [MAGTAP]
                Wait for the count of outstanding requests on the
                unit to go to one.

NCPLKT          [0,,NCPLKT]                                     [NETWRK]
                Wait for lock NCPLCK to free, lock it.

NICTST          [0,,NICTST]                                     [PAGEM]
                Wait for SUMNR less than or equal to MAXNR or
                only one fork in BALSET.

NOTTST          [<0:8>CONNECTION #<9:17>STATE,,NOTTST]          [NETWRK]
                Wait for connection to leave state.

NSPTST          [0,,NSPTST]                                     [NSPSRV]
                Wait for KDPFLG nonzero, indicating KMC11 wants
                service, or MSGQ nonzero, indicating messages to
                process.

NVTNTT          [<0:8>OPTION #<9:17>LINE #,,NVTNTT]             [TTNTDV]
                Wait for completed NVT negotiation.

OFNLKT          [OFN,,OFNLKT]                                   [PAGEM]
                Wait for OFN unlocked--SPTLKB zero in SPTH(OFN).

PIDWAT          [FORK #,,PIDWAT]                                [IPCF]
                Wait for bit for fork in PDFKTB to set.

SEBTST          [0,,SEBTST]                                     [SYSERR]
                Wait for SECHKF to go nonzero before starting
                Job 0 task to write queued SYSERR entries.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 54
TOPS-20 SCHEDULER TEST ROUTINES



SEEALL          [0,,SEEALL]                                     [TTYSRV]
                Wait for SNDALL to go to zero, indicating the
                send-all buffer available.

SPCTST          [0,,SPCTST]                                     [DTESRV]
                Wait for a node.

SPMTST          [0,,SPMTST]                                     [PAGEM]
                Wait for page in SPMTPG to be on SPMQ or the
                time SPMTIM to expire.

SQLTST          [0,,SQLTST]                                     [IMPDV]
                Wait for the special queues lock SQLCK and lock
                it.

STRTST          [SDB ADDRESS OF STRUCTURE,,STRTST]              [MSTR]
                Wait for the structure lock to be free.

STSWAT          [ADDRESS OF STATUS WORD,,STSWAT]                [CDRSRV]
                Wait for flag CD%SHA to come on in the addressed
                word, indicating that cardreader status has
                arrived.

STSWAT          [ADDRESS OF STATUS WORD,,STSWAT]                [LINEPR]
                Wait for flag LP%SHA to set in the addressed
                word, indicating that printer status has
                arrived.

SUSFKT          [FORK #,,SUSFKT]                                [FORK]
                Wait for fork to be on WTLST in either SUSWT
                or FRZWT.

SWPRT           [PAGE #,,SWPRT]                                 [PAGEM]
                Wait for CSTAGE for PAGE # to not be PSRIP,
                meaning swap read completed.

SWPWTT          [0,,SWPWTT]                                     [PAGEM]
                Wait for NRPLQ nonzero.  Increment CGFLG each
                time test is unsuccessful.

TCIPIT          [FORK #,,TCIPIT]                                [TTYSRV]
                Wait for no interrupts pending for FORK #.

TCITST          [LINE #,,TCITST]                                [TTYSRV]
                Wait for line inactive, no fork in input wait,
                or input buffer non-empty.

TCOTST          [LINE #,,TCOTST]                                [TTYSRV]
                Wait for line inactive, or output buffer not
                too full to add a character to it.

TRMTS1          [0,,TRMTS1]                                     [FORK]
                Identifiable wait forever for inferior fork termination.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 55
TOPS-20 SCHEDULER TEST ROUTINES


TRMTST          [FORK #,,TRMTST]                                [FORK]
                Wait for FORK # to be on WTLST for either HALTT
                or FORCTM.

TRP0CT          [MINIMUM NRPLQ,,TRP0CT]                         [PAGEM]
                Wait for NRPLQ to be above stated minimum or
                normal minimum.  Increment CGFLG each time
                test is unsuccessful.

TSACT1          [LINE #,,TSACT1]                                [TTYSRV]
                Wait until line inactive, becoming active, or
                has a full length dynamic block assigned.

TSACT2          [LINE #,,TSACT2]                                [TTYSRV]
                Wait for line available--inactive or fully
                active.

TSACT3          [LINE #,,TSACT3]                                [TTYSRV]
                Wait for line inactive--dynamic data unlocked.

TSTSAL          [0,,TSTSAL]                                     [TTYSRV]
                Wait for SALCNT to go to zero, indicating the
                send-all is finished for this buffer.

TTBUFW          [NUMBER,,TTBUFW]                                [TTYSRV]
                Wait for NUMBER of buffers.

TTIBET          [LINE #,,TTIBET]                                [TTYSRV]
                Wait for line inactive or input buffer empty.

TTOAV           [LINE #,,TTOAV]                                 [TTYSRV]
                Wait for line inactive and output buffer not
                empty.

TTOBET          [LINE #,,TTOBET]                                [TTYSRV]
                Wait for line inactive or output buffer empty.

UDITST          [0,,UDITST]                                     [PHYSIO]
                Wait for at least two free IORBs on UIOLST.

UDWDON          [IORB ADDRESS,,UDWDON]                          [PHYSIO]
                Wait for IS.DON to set in IRBSTS for this IORB.

UPBGT           [CONNECTION INDEX,,UPBGT]                       [IMPDV]
                Wait for LTDF connection done flag to set, or
                output buffers to appear.

USGWAT          [0,,USGWAT]                                     [JSYSA]
                Wait for lock on queued USAGE blocks to free.

VVBWAT          [UNIT #,,VVBWAT]                                [TAPE]
                Wait for the MDA to reset TPVV handling EOV.

WATTST          [<0:8>CONNECTION #<9:17>STATE,,WATTST]          [NETWRK]
                Wait for connection to be in state.

WTFKT           [FORK #,,WTFKT]                                 [FORK]
                Wait for fork to be on WTLST.

WTSPTT          [PAGE #,,WTSPTT]                                [SCHED]
                Wait for share count on PAGE # to go to 1.
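
     Several entries above pack two fields into the  test  data,  for
example <0:8>CONNECTION #<9:17>STATE for NOTTST and WATTST, and
<0:9>TIME<10:17>HOST # for HUPTST.  Assuming the <first:last> notation
names bit positions in the 18-bit left half, with bit 0 the most
significant, the fields can be pulled apart as sketched below.  The
function name and the sample values are hypothetical.

```python
def field(halfword, first, last):
    """Extract bits first..last of an 18-bit halfword (bit 0 is the MSB)."""
    width = last - first + 1
    return (halfword >> (17 - last)) & ((1 << width) - 1)


# NOTTST/WATTST style test data: connection 5, state 3 (made-up values)
nottst_data = (5 << 9) | 3
connection = field(nottst_data, 0, 8)
state = field(nottst_data, 9, 17)

# HUPTST style test data: time 12 (octal), host 45 (octal) (made-up values)
huptst_data = (0o12 << 8) | 0o45
time = field(huptst_data, 0, 9)
host = field(huptst_data, 10, 17)

print(connection, state, f"{time:o}", f"{host:o}")
```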

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 57
KNOWN HARDWARE DEFICIENCIES LIST


                    Known Hardware Deficiencies List



     This is a collected list of known  hardware  characteristics  which
show  up  from  time to time as part of certain reported problems.  This
says nothing about whether these characteristics are bugs  or  features,
or  whether  they  will ever be fixed or changed, but merely attempts to
make them known internally.




     1.  DZ11 - Cannot set the speed to zero in the hardware,  can  only
         turn off the receiver.

     2.  TM02 - Can generate bad parity which it  passes  to  memory  to
         cause  the  system  memory  parity  errors  when  the  data  is
         referenced.

     3.  TM03 - A chip race condition has been known to  occur  where  a
         function register has the wrong value because it has not settled.
         This generates a device error which  appears  transient;   i.e.
         typing CRLF to DUMPER makes it try the read again and succeed.

     4.  TM03 - ANSI ASCII was  not  included  in  the  hardware  format
         modes.

     5.  TM03 - When using industry-compatible  mode,  reads  not  of  a
         multiple of four bytes will produce strange results.  The bytes
         are counted, but the extra bytes are  not  written  to  memory,
         leaving garbage.

     6.  DX20 - There is a race condition where the DX20  generates  an
         interrupt request on channel 5 for some condition, but the code
         is already manipulating the DX20 and handles the condition,  so
         the DX20 lowers its request.  The KL, however, has latched  the
         interrupt and tries to process it, but no one will respond.  So
         it tries the 40+2n type, which occasionally gives a PI5ERR.

     7.  VT100 - on a VT100 without the extended memory, one can confuse
         the  internal  microprogram enough to have it clear sections of
         the screen.

     8.  RH20 - perfectly willing to store bad parity data into memory.

     9.  DX20 - is unwilling to allow registers to be examined after  it
         has  started  I/O.   Can  cause  register  access errors if not
         programmed in correct sequence.

    10.  LP20 - at least one of the printers fails to go  off-line  when
         there  is  anything  in the print line buffer, even if the drum
         gate is opened.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 58
KNOWN HARDWARE DEFICIENCIES LIST


    11.  KS-10 Front End - Rev.  3.  exhibits problems with  the  KLINIK
         line.   If  the  link is in use, it is possible to lock out the
         CTY.  There are problems with the password check on  subsequent
         tries, and problems with line hang-up.

    12.  KS-10 Front  End  -  Rev.   3.   exhibits  some  problems  with
         powerfail  restart.   If  the  power  returns  in less than 3.5
         seconds or so the restart will hang.  In addition  if  Rev.   3
         and  Rev.  2 boards are mixed, there is no powerfail restart or
         reload capability.

    13.  KS-10 Front End - there are more commands to the  KS10>  prompt
         than are often documented, and some typeins to the  front  end
         have been known to hang the system, beyond even responding to ^\.

    14.  DX20/TU71 - the DX20 microcode does not set the 556 bpi density
         correctly   for   TU71  (7-track)  drives.   This  can  be  set
         successfully from the maintenance panel.

    15.  TM03 - if an error occurs while rewinding, the  monitor may  be
         left in a state of waiting for the rewind to complete, the tape
         being unusable.  The easiest way to clear this condition is  to
         reset the TM03, most easily done by the customer by powering it
         down and back up.

    16.  KS10 - during a forced reload, the halt status block is written
         twice,  first when halting and second when rebooting;  thus the
         second time wipes any valuable data from the first time.   It's
         once again the 8080 that's responsible.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 59
KS10 PROCESSOR CONSOLE INFORMATION


                   KS10 PROCESSOR CONSOLE INFORMATION
                   ----------------------------------



CSL-COMMANDS CURRENTLY IMPLEMENTED (CSL V0.161)


^Z      ;enter USER mode
^\      ;enter CONSOLE mode
MK XX   ;Marks microcode location XX (sets bit 95)
UM XX   ;Unmarks Microcode location XX
MB      ;load only bootstrap of currently selected magtape
LA XX   ;Load/set KS10 Memory Address
LI XX   ;Load/set I/O address
LK XX   ;Load/set 8080 address
LC XX   ;Load/set CRAM address to be written/read
EM      ;Examine KS10 Memory (last Memory location specified)
EM XX   ;Examine KS10 Memory location XX
EN      ;Examine Next (either from last EK, EM or EI)
EB      ;Examine BUS and 8080 control registers
EI      ;Examine I/O (last I/O address specified)
EI XX   ;Examine I/O address XX
EK      ;Examine 8080 location
EK XX   ;Examine 8080 address XX
DM XX   ;Deposit KS10 Memory last addressed, XX data
DN XX   ;Deposit next (depending on last DK, DM or DI) XX data
DB XX   ;Deposit BUS, XX data
DI XX   ;Deposit I/O, XX data
DK XX   ;Deposit 8080 location (only RAM locations stick)
MR      ;MASTER RESET
CS      ;CPU clock start
CH      ;CPU clock halt
CP XX   ;CPU clock pulse (XX=NR of pulses -- default 1 pulse)
SI      ;Single Instruction
LF XX   ;selects a set (0-7) of 12 bits of microcode (see note at end ****)
DF XX   ;Deposit Field, write microcode bits according to last LF-command
EC      ;Examine CRAM ..curr. Control reg, no clocks .. current loc as addr.
EC XX   ;Examine CRAM at address XX
DC XX   ;Deposit CRAM, XX is at least 32 octal characters
EX XX   ;EXecute KS10 instruction XX
ST XX   ;STart KS10 at address XX
SM XX   ;Start microcode at XX (SM 1 causes dump of HALT-status block !!) 
        ;Default is 0 -- Start microcode
HA      ;HALT KS10 (execute HALT-instruction -- causes microcode to
        ; write HSB and then to enter HALT-loop)
SH      ;SHUTDOWN (deposit non-zero data in memory location 30)
        ; causing TOPS20 to shut down
CO      ;Continue (causes microcode to leave HALT-loop)
PE X    ;Parity Enable (0=disable, 7=enable all, 1=DRAM-par, 2=CRAM-par,
        ; 4=clock-par error stop)
CE X    ;CACHE enable (0=OFF, 1=ON, <CR>=show current state)
TE X    ;1 MSEC enable (0= OFF, 1=ON, <CR>=show current state)
TP X    ;TRAPS enable (0=OFF, 1=ON (enables paging), <CR>=show current state)

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 60
KS10 PROCESSOR CONSOLE INFORMATION


LT      ;Lamp Test, lights three lamps of front panel
RC      ;Read CRAM direct, functions 0-17
        ; (no resets, no load diag adr, no CPU clock) (see note at end ****)
EJ      ;Examine Jumps -- prints CRAM address signals (CURR, NXT, J, SUB)
TR XX   ;TRACE - repeats CP and EJ commands until any character is typed
        ;XX (if typed) is desired CRAM stop-address
PM      ;Pulse Microcode (issue single CP and EJ)
ZM      ;Zero KS10 MOS Memory (beware -- slow)
RP      ;Repeat - repeats last command, or line of commands which it delimits
        ; Any character (except CNTRL-O) typed will stop repeat
        ;EXAMPLE: EM 0, EK 0, EC 0, RP will repeat execution of this line
BT      ;Boot SYSTEM -- load CRAM from designated disk (see DS)
        ; via memory then load monitor boot from disk and start at 1000
BT 1    ;same as BT, but loads SMMON and starts at 20000
LB      ;Load Bootstrap from designated disk (see DS)
LB 1    ;Load Bootstrap diagnostic monitor SMMON
DS      ;Disk Select. Command prompts to specify
        ; UNIT NUMBER, RHBASE, and UNIBUS ADAPTER
        ; to load from when booting
MS      ;Magtape Select. Command prompts to specify
        ; UNIT NUMBER, RH BASE, UNIBUS ADAPTER, SLAVE NUMBER, and DENSITY
        ; of magtape to boot from
MT      ;Magtape Boot system from selected magtape
MT 1    ;BOOT diagnostic monitor SMMAG from magtape
PW      ;clears KLINIK password, or sets it (6 char's max)

NOT IMPLEMENTED YET
***BC   ;BOOT Check. PROM code which tests the basic 2020 system
        ; load path from the UNIBUS adaptor into the CRAM via memory.


CONTROL CHARACTERS
^U      ;rub out current line
^O      ;switch: first one stops CTY-output, second one resumes CTY-output
^S      ;stop TTY-output and hangs 8080 waiting for CONTROL-Q (see below)
^Q      ;resumes TTY-output
^C      ;stops whatever the 8080 is doing
RUB-OUT ;rub out previous character typed

NOTE:    Several commands may be put on a single line, separated by commas.
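
The PE argument listed above reads as a bit mask (1, 2 and 4, with 7
enabling all).  Assuming the values combine by OR, which the list implies
but does not state, an argument can be decoded as sketched below in
Python.  The names are hypothetical and this is not 8080 console code.

```python
# Bit values as given for the PE command
PE_BITS = {
    1: "DRAM parity",
    2: "CRAM parity",
    4: "clock parity error stop",
}


def describe_pe(argument):
    """List what a PE argument enables, assuming the bits combine by OR."""
    if not 0 <= argument <= 7:
        raise ValueError("PE argument must be in the range 0-7")
    return [name for bit, name in PE_BITS.items() if argument & bit]


print(describe_pe(7))   # all three checks
print(describe_pe(0))   # empty list: everything disabled
```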

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 61
KS10 PROCESSOR CONSOLE INFORMATION



*****   CRAM Bit Formats

        LF-Command CRAM Bits            RC-Command CRAM Data
        --------------------            ---------------------

        LF      CRAM bits               RC      Data
        --      ---------               --      ------------------------------

        0       00-11                   0       CRAM bits 00-11
        1       12-23                   1       Next CRAM address
        2       24-35                   2       CRAM subroutine return address
        3       36-47                   3       current CRAM address
        4       48-59                   4       CRAM bits 12-23
        5       60-71                   5       CRAM bits 24-35 (Copy A)
        6       72-83                   6       CRAM bits 24-35 (Copy B)
        7       84-95                   7       0s
                                        10      Parity bits A-F
                                        11      KS10 bus bits 24-35
                                        12      CRAM bits 36-47 (Copy A)
                                        13      CRAM bits 36-47 (Copy B)
                                        14      CRAM bits 48-59
                                        15      CRAM bits 60-71
                                        16      CRAM bits 72-83
                                        17      CRAM bits 84-95
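
The LF column of the table follows a simple rule: set n covers CRAM bits
12n through 12n+11.  A short Python sketch of that rule follows, with
hypothetical function names; it describes the table only and makes no
claim about how the 8080 console implements LF.

```python
def lf_bits(lf_set):
    """Return (first, last) CRAM bit selected by LF set 0-7."""
    if not 0 <= lf_set <= 7:
        raise ValueError("LF set must be in the range 0-7")
    first = 12 * lf_set
    return first, first + 11


def lf_for_bit(cram_bit):
    """Return the LF set that contains a given CRAM bit (0-95)."""
    if not 0 <= cram_bit <= 95:
        raise ValueError("CRAM bit must be in the range 0-95")
    return cram_bit // 12


print(lf_bits(3))        # (36, 47)
print(lf_for_bit(95))    # 7: the set holding bit 95, the bit MK sets
```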

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 62
KS10 PROCESSOR CONSOLE INFORMATION


8080-CONSOLE-ERROR-CODES
------------------------


?BUS            BUS polluted on power up
?BFO            Input Buffer Overflow
?IL             ILLEGAL Instruction
?UI             Unknown Interrupt
?A/B            A and B copies of CRAM bits did not match
?DNF            Did Not Finish instruction
?BT             device error or timeout during BOOT operation
?DNC            Did Not Complete HALT
?PAR ERR        report clock-freeze due to parity error,
                 and type out READ IO of 100,303,103
?MEM REFRSH ERR Memory Refresh Error (MEM BUSY stayed set too long,
                 because it didn't release data on a write to memory)
?CHK            PROM checksum failed
?BC             BOOT Check failed
?RUNNING        trying to do a command that may screw up
?NDA            received No Data Acknowledge on memory request
?NXM            referenced NoneXistent Memory location
?NBR            Console was not granted BUS on a request
?RA             command Requires Argument
?BN             received Bad Number on input
?KA             KEEP ALIVE failed
?FRC            had a forced reload
?PWL            Password Length error
?IA             Illegal Argument (address out of range, etc.)


OTHER 8080 CONSOLE MESSAGES
---------------------------

BUS 0-35                message header for EB command
KS10>                   prompt message
CYC                     cycle type for DB command
SENT                    data sent to bus
RCVD                    data received on bus
HLTD                    message "HALTED/XXXXXX " where XXXXXX is data
BT SW                   message says BOOTING, using BOOT switch
OFF                     message, says this signal is off
ON                      message, says this signal is on
>>UBA?                  query for UNIBUS adapter
>>UNIT?                 query for unit to use
>>RHBASE?               query for RH11 to use
>>DENS?                 query tape density
>>SLV?                  query tape slave
C CYC                   typed on DB-command if COM/ADR cycle blew
D CYC                     "             "      DATA    cycle blew

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 63
KS10 PROCESSOR CONSOLE INFORMATION


8080-ERROR-Messages-during-BOOTING
----------------------------------


Disk:
        On an error-condition, detected by the 8080, the
        Fault-light will go on and a message of the form

                ?BT XXXYYY

        will be printed on the CTY.


The following error-codes are only "rough" pointers;  they can be
caused by any of the following problems:

        Disk not a disk at all
        Wrong unit selected (see DS-command)
        Home blocks not readable or not there
        Home blocks not set by SMFILE for 8080
        8080 File-system garbage

XXX=001 Disk error encountered while trying to read HOME-blocks

XXX=002 Disk error encountered while trying to read the page of
        pointers, which make up the "8080-File-System"

XXX=003 Disk error encountered while trying to read a page of
        microcode

XXX=004 Disk error encountered while trying to read PRE-BOOT

YYY     is the lower 8 bits of the 8080 address of the failing
        "Channel Command List" operation.  In this case it is
        normally a good bet to do an "EI" to get the contents of
        the RH11 register that has the error bits set.


Magtape:

The following ERROR-messages can point to the following problem areas:

        Magtape is no magtape at all
        Wrong unit selected (see MS-command)
        Magtape is not bootable (no microcode, no PRE-BOOT)

XXX=001 Error trying to read microcode first page

XXX=003 Error trying to read additional pages of microcode

XXX=004 Error trying to read in PRE-BOOT program

YYY     see above (disk-section)
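
As an illustration only (this is not part of any DEC software), the
two tables above can be turned into a small Python lookup for use when
a customer reads a "?BT XXXYYY" message over the phone.  The function
and table names are invented for this sketch, and reading YYY as an
octal number is an assumption.

```python
# Sketch only: the meanings are copied from the tables above; the
# parsing of the message text is an assumption.
DISK_ERRORS = {
    "001": "disk error reading HOME-blocks",
    "002": "disk error reading the page of 8080 file-system pointers",
    "003": "disk error reading a page of microcode",
    "004": "disk error reading PRE-BOOT",
}
MAGTAPE_ERRORS = {
    "001": "error reading first page of microcode",
    "003": "error reading additional pages of microcode",
    "004": "error reading PRE-BOOT",
}


def decode_bt(message, device):
    """Split '?BT XXXYYY' into (meaning of XXX, YYY as an integer)."""
    prefix, _, code = message.strip().partition(" ")
    code = code.strip()
    if prefix != "?BT" or len(code) != 6:
        raise ValueError("not a ?BT XXXYYY message: %r" % message)
    table = DISK_ERRORS if device == "disk" else MAGTAPE_ERRORS
    xxx, yyy = code[:3], code[3:]
    meaning = table.get(xxx, "unknown error code")
    # Assumption: YYY is printed in octal.
    return meaning, int(yyy, 8)
```

For example, decode_bt("?BT 001234", "disk") gives the HOME-block
meaning together with the value of 234 octal.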

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 64
KS10 PROCESSOR CONSOLE INFORMATION




Error-messages-out-of-PRE-BOOT
------------------------------


PRE-BOOT is loaded from Disk or Magtape (see 8080 commands DS, MS,
         BT, BT 1, MT, MT 1)

PRE-BOOT is written onto the disk using "SMFILE.EXE"; it is also written on
"standard" Diagnostic-tapes and onto the "MONITOR-INSTALLATION"-tapes.

PRE-BOOT is loaded by the 8080 into MEMORY-locations 1000 and up, and starts
at 1000.  The ERROR-halts are:

        1001    found "bad" core-transfer address
                 (page 1 is illegal - can't overload PRE-BOOT)
        1003    No RH11 Base Address
        1004    Magtape Skip failure
        1002    all other failures

At ERROR-halt time the following MEMORY-Locations contain the useful INFO :


                Disk-Booting                    Magtape-Booting
                ------------                    ---------------

        100     "8080" disk-address             Not used
        101     Memory transfer address         same
        102     Index-pointer                   same
        103     RPCS1-register                  MTCS1-register
        104     RPCS2-register                  MTCS2-register
        105     RPDS - register                 MTDS - register
        106     RPER1-register                  MTER1-register
        107     RPER2-register (RP06 only)      Not used
        110     RPER3-register                  Not used
        111     UBA Page RAM loc 0              same
        112     UBA-status register             same
        113     Version Nr. of PRE-BOOT         same

        Note: The Version Nr. of PRE-BOOT will be the same as the Version Nr.
        of SMFILE. The "8080" disk-address is in the form " CYL SEC SURF "


THUS IT IS POSSIBLE TO ASK A CUSTOMER WITH A PRE-BOOT FAILURE
TO DO AN:

        EM 77
        EN,RP
        ...... AND TYPE SOMETHING AFTER ADDRESS 115
        ...... AND THEN TELL US WHAT HE SEES
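
Once the customer has read the values back, it can help to label them
against the table above.  The following Python sketch does only that;
the names PREBOOT_INFO and label_preboot are invented here, and the
output layout is simply one plausible way to print the result.

```python
# Location -> (meaning when disk-booting, meaning when magtape-booting),
# copied from the table above.
PREBOOT_INFO = {
    0o100: ('"8080" disk-address', "Not used"),
    0o101: ("Memory transfer address", "Memory transfer address"),
    0o102: ("Index-pointer", "Index-pointer"),
    0o103: ("RPCS1-register", "MTCS1-register"),
    0o104: ("RPCS2-register", "MTCS2-register"),
    0o105: ("RPDS-register", "MTDS-register"),
    0o106: ("RPER1-register", "MTER1-register"),
    0o107: ("RPER2-register (RP06 only)", "Not used"),
    0o110: ("RPER3-register", "Not used"),
    0o111: ("UBA Page RAM loc 0", "UBA Page RAM loc 0"),
    0o112: ("UBA-status register", "UBA-status register"),
    0o113: ("Version Nr. of PRE-BOOT", "Version Nr. of PRE-BOOT"),
}


def label_preboot(readings, magtape=False):
    """readings: iterable of (address, value) pairs, both integers."""
    column = 1 if magtape else 0
    lines = []
    for address, value in readings:
        names = PREBOOT_INFO.get(address)
        name = names[column] if names else "(not in table)"
        # Print address and 36-bit value in octal, as the console does.
        lines.append("%o/ %012o  %s" % (address, value, name))
    return lines
```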

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 65
KS10 PROCESSOR CONSOLE INFORMATION



8080-Communication-Area (KS10 Memory)
-------------------------------------


The 8080 maintains and services an in-core communication area.
Currently used are words 31 to 40.  See PROKS.MAC for more info.

Word Nr.                Meaning
---- ---                -------
  31            Keep Alive and Status word
  32            KS-10 CTY input word (from 8080)
  33            KS-10 CTY output word (to 8080)
  34            KS-10 KLINIK user input word (from 8080)
  35            KS-10 KLINIK user output word (to 8080)
  36            BOOT RH-11 Base Address
  37            BOOT Drive Number
  40            Magtape Boot Format and Slave Number


Word 31         Keep Alive and Status word
---- --
Bit 4           Reload Request
Bit 5           Keep Alive active
Bit 6           KLINIK active
Bit 7           PARITY Error detect enabled
Bit 8           CRAM Parity Error detect enabled
Bit 9           DRAM Parity Error detect enabled
Bit 10          CACHE enabled
Bit 11          1 msec enabled
Bit 12          TRAPS enabled
Bit 20-27       Keep Alive counter field
Bit 32          BOOT SWITCH BOOT
Bit 33          POWER FAIL
Bit 34          Forced RELOAD
Bit 35          Keep Alive failed to change
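
When word 31 shows up in a dump it has to be picked apart by hand.
The Python sketch below does the bit arithmetic, assuming the usual
PDP-10 convention that bit 0 is the most significant bit of the 36-bit
word and bit 35 the least.  The helper names are invented for this
illustration.

```python
# Flag bits of word 31, copied from the table above.
WORD31_FLAGS = {
    4: "Reload Request",
    5: "Keep Alive active",
    6: "KLINIK active",
    7: "PARITY Error detect enabled",
    8: "CRAM Parity Error detect enabled",
    9: "DRAM Parity Error detect enabled",
    10: "CACHE enabled",
    11: "1 msec enabled",
    12: "TRAPS enabled",
    32: "BOOT SWITCH BOOT",
    33: "POWER FAIL",
    34: "Forced RELOAD",
    35: "Keep Alive failed to change",
}


def bit(word, n):
    """Value of bit n, where bit 0 is the leftmost of 36 bits."""
    return (word >> (35 - n)) & 1


def field(word, first, last):
    """Value of the field occupying bits first..last inclusive."""
    width = last - first + 1
    return (word >> (35 - last)) & ((1 << width) - 1)


def decode_word31(word):
    """Return (names of the flags that are set, Keep Alive counter)."""
    flags = [name for n, name in sorted(WORD31_FLAGS.items()) if bit(word, n)]
    return flags, field(word, 20, 27)
```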


Word 32         KS-10 CTY input word (from 8080)
---- --
Bits 20-27      0 -- no action, 1 -- CTY character pending
Bits 28-35      CTY-character


Word 33         KS-10 CTY output word (to 8080)
---- --
Bits 20-27      0 -- no action, 1 -- CTY character pending
Bits 28-35      CTY-Character


Word 34         KS-10 KLINIK user input word (from 8080)
---- --
Bits 20-27      0 -- no action, 1 -- KLINIK character,
                2 -- KLINIK active, 3 -- KLINIK carrier loss
Bits 28-35      KLINIK-Character


Word 35         KS-10 KLINIK user output word (to 8080)
---- --
Bits 20-27      0 -- no action, 1 -- KLINIK character, 2 -- Hangup request
Bits 28-35      KLINIK-Character
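
Words 32 through 35 all share one layout: a flag in bits 20-27 and a
character in bits 28-35.  As a sketch (the function names are invented
here, and bit 0 is taken to be the leftmost bit of the word), packing
and unpacking such a word looks like this in Python:

```python
# Flag meanings for word 34, copied from the table above.
KLINIK_INPUT_FLAGS = {
    0: "no action",
    1: "KLINIK character",
    2: "KLINIK active",
    3: "KLINIK carrier loss",
}


def unpack_comm_word(word):
    """Return (flag from bits 20-27, character from bits 28-35)."""
    flag = (word >> 8) & 0o377
    char = word & 0o377
    return flag, char


def pack_comm_word(flag, char):
    """Build a communication word from a flag and a character code."""
    return ((flag & 0o377) << 8) | (char & 0o377)
```

A word of 501 octal, for instance, unpacks to flag 1 and character
101 octal (the letter "A").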




OUTPUT process KS10 ==> 8080
----------------------------

 Load character and flag into  33,   set 8080-interrupt,   8080 examines
   33 and gets character, clears interrupt, sends character to hardware,
   clears 33 and sets KS-10 interrupt.

INPUT process 8080 ==> KS10
---------------------------

 8080 gets interrupted "TTY-char available",   8080 gets character and
  delivers into input-word (32) with flag(s) and sets KS-10 interrupt.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 67
CRASH ANALYSIS


                              CRASH DUMPS
                              ===========

Each time there is a BUGHLT there  is  an  automatic  dumping  of  the
system  core  image  into PS:<SYSTEM>DUMP.EXE.  If there is sufficient
room on the DSK the data that  was  previously  in  DUMP.EXE  will  be
copied into DUMP.CPY by SETSPD after the system is reloaded.  DUMP.CPY
does not get deleted and you may find several generations of DUMP.CPY.

     If you have set no auto reload, you can dump the crash by hand
by typing /D to the system BOOT> prompt.  To get into BOOT when
reloading the system, bring the system up from the switch registers
rather than hitting <ENABLE> <DISK> on the console.  See the Operators
Guide for a discussion of the meaning of the various switches on the
DEC-20.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 68
CRASH ANALYSIS


                             CRASH ANALYSIS
                             --------------



     First, when analyzing software or software/hardware problems, be
sure you have the proper tools:

     1.  A SWSKIT on magtape

     2.  A full copy of the current  release  microfiche  MONITOR  and
         EXEC.

     3.  A MONITOR CALLS REFERENCE MANUAL.

     4.  A SYSERR manual.

     5.  A listing of  the  SYSERR  log,  especially  if  hardware  is
         suspected.

     6.  A CTY  output  for  BUGHLTs  and  BUGINFs  or  other  problem
         indications, or an accurate reproduction of this information.

     7.  Any other manuals you may need  for  reference  such  as  the
         proper  version  Installation  Guide, Operators Guide, System
         Managers Guide, etc.

     8.  A TOPS-20.BWR file.


     You will need the SWSKIT  and  perhaps  listings  of  the  latest
versions of monitor modules in case the microfiche are not up to date.
FILDDT is on the customer's distribution tape.

     Be sure you have analysed the SYSERR log.  Be  sure,  also,  that
you  have  looked  up  the  BUGHLT  and/or  BUGCHKs in question in the
listings (microfiche) and have at least read the comments around them.
Tracing down how it got called is probably a good idea.  If you happen
to be without a GLOB (provided on microfiche) you can find the  BUGHLT
tag of interest in the monitor as follows:

        $GET <SYSTEM>MONITR.EXE
        $ST 140
        DDT
        ILPP3?                  ; BUGHLT of interest followed by "?"
        PAGEM G                 ; it is defined in PAGEM and is global



     Some other useful bits of information:  there is a GLOB listing
provided in the microfiche which contains a list of all the global
symbols in the monitor.  Most of the symbols are defined in the module
STG.MAC.  If you don't know a tag name but want to look at the storage
for DTEs, say, look through STG.  STG stands for storage; it also
contains a small portion of code, mostly to do with restart, start,
auto reload, dispatches for PI channels and a few scheduler tests.
Note that some stuff may be defined in PROLOG, and of course lots of
stuff is defined throughout the monitor.  You may also want to get a
listing of MACSYM to be able to understand the macros you see while
reading the monitor listings;  MONSYM is also useful at times.  Be
sure you know whether and how PARAMS has been changed.  See BUILD.MEM
on the distribution tapes for the currently distributed information on
what to do to change various system parameters in PARAM0.MAC.  Be sure
that you know about any variables that the site may have changed in
STG as well.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 70
CRASH ANALYSIS


                         EXAMINING THE MONITOR
                         ---------------------




     Debugging a complex, multi-process software system is  largely  a
matter  of  absorbing  sufficient  knowledge,  experience and folklore
about the particular system with a considerable  element  of  personal
preference,  or  'taste'  also  involved.   This document is a cursory
description of features built into the system to  aid  debugging,  and
such folklore as can be described in written English.

     There are four different versions of DDT  that  may  be  used  to
examine  the  monitor.   Each  is used for a different purpose and has
special capabilities.  The versions of DDT are:

     1.  UDDT (user DDT) used to  examine  or  modify  the  MONITR.EXE
         file.

     2.  MDDT (monitor DDT) used to  examine  or  modify  the  running
         monitor under timesharing.

     3.  EDDT (exec DDT) used to examine or modify the running monitor
         from the CTY in a stand-alone mode.

     4.  FILDDT used to examine dumps.


     All the DDT's are versions  of  TOPS-20  DDT  documented  in  the
TOPS-20  DDT  manual,  and  have  all of the features described in the
manual.  See also the document DDT41.MEM.

     All four versions of DDT are used in the same way, which will be
described later; however, each version is started differently.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 71
CRASH ANALYSIS


UDDT:
----

     To use UDDT to modify your MONITR.EXE file on the system, you must
give the following EXEC commands:

        @GET <SYSTEM>MONITR.EXE
        @START 140      or on Release 4 systems, @DDT

This causes EDDT to start in user mode.  This is the same DDT that  is
used  when  examining  any program.  You may now look at or change any
part of the monitor.  If you make changes to the monitor and  want  to
save  it,  you should get back to the EXEC by typing ^Z.  Then you may
save the monitor.


     You will probably have to be enabled in order to save the monitor
back in <SYSTEM>.  This is the safest, best, and recommended method of
putting patches into the monitor.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 72
CRASH ANALYSIS


MDDT:
----


     A version of DDT which runs in monitor space  is  available.   It
can  examine  and  change the running monitor, and can breakpoint code
running as a process but not at PI or scheduler level.  When  patching
or  breakpointing  the  swappable monitor, the normal write protection
must be defeated, either by setting DBUGSW to 2 on startup, or calling
SWPMWE.  If you insert breakpoints with MDDT, remember monitor code is
reentrant and shared so that the breakpoint could be hit by any  other
process  in  the  system.   In this event, the other process will most
likely crash since it will be executing a JSR to a page full of zeros.

     To use MDDT you must have WHEEL or  OPERATOR  capabilities.   You
first issue the EXEC command:

        @ENABLE
        $^EQUIT

                ; You are now in the mini-exec and receive a  prompt
                ; of MX>.  Now you give the "/" command:
        MX>/
                ; You are now put into MDDT.  To return to the  EXEC
                ; you can  issue  a ^Z  or  a ^C  which  produces  a
                ; message like "INTERRUPT AT 17372" and returns  you
                ; to the mini-exec.  If  you type a  ^P in MDDT  you
                ; will get a  message, "ABORT", and  be returned  to
                ; the mini-exec.  If you once go into the  mini-exec
                ; the CONTROL-P interrupt is enabled and typing this
                ; character will return you to the mini-exec.   This
                ; is a  good thing  to use  when debugging  programs
                ; that do  CONTROL-C trapping.   From the  mini-exec
                ; you may give either:
        MX>S
                ; or
        MX>E
        
                ; The S is filled  out as START and  the E as  EXEC.
                ; Both of  these commands  will  return you  to  the
                ; EXEC. See the document EXEC-DEBUGGING.MEM for more
                ; about ^P and getting  out of the  EXEC to MX>  and
                ; returning from MX> to either your copy of the EXEC
                ; or the system EXEC.

                ; You may also give the command:

        MRETN$G

                ; From MDDT to return  directly to the EXEC.   While
                ; in MDDT you may examine  any core location in  the
                ; running monitor.  You may also change any location
                ; in   the  resident  monitor  (done  frequently  by
                ; accident).  If  you  wish  to change  any  of  the
                ; locations in the swappable  monitor you must  give
                ; the command:
                
        CALL SWPMWE$X

                ; To write enable the monitor.  After you have  made
                ; your changes you must give the command:

        CALL SWPMWP$X

                ; to write protect the monitor again.

     MDDT may also be entered from process level via JSYS:

        JSYS 777$X
            or
        MDDT%$X ; will enter MDDT from the context of the current process

     If you wish to examine the system from the monitor context of
the EXEC's inferior fork:

        @ENA
        $SDDT
        DDT

        JSYS 777$X
        MDDT

To return to user context:

        MRETN$G

Use SETMPG to map pages to this context:

        Page 677 has traditionally been used for this, but any
        unused page may be used.  To make sure that the page is
        currently unused, type:

        ADDRESS/   ?    ; the question mark from DDT indicates that the
                        ; page is nonexistent.

        When the destination page has been found, set up AC2 as:

        AC2/ ACCESS,,677000

        If the page has its own SPT slot:

        AC1/SPT INDEX

If  the  source page does not have its own SPT slot, it will belong to
either a file or process page table.  It will  be  represented  as  an
index into this page table:

        AC1/ SPT INDEX OF PAGE TABLE,,INDEX INTO PAGE TABLE

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 74
CRASH ANALYSIS



        Access = read and/or write access
        Read/Write access = 140000 in LH

Therefore, to map a page, call with either:

        AC1/SPT INDEX OF PAGE
        AC2/140000,,677000

                or

        AC1/SPT INDEX OF PAGETABLE,,INDEX INTO PAGE TABLE
        AC2/140000,,677000

        AND SAY:

        CALL SETMPG$X

The page will then be mapped to  page  677.   In  examining  locations
677000-677777, you will be looking at the contents of the page.
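
The AC1 and AC2 values are ordinary LEFT,,RIGHT half-word pairs.  As a
hedged illustration of the arithmetic only (nothing here talks to a
monitor, and the function names are invented), they can be computed in
Python like this:

```python
READ_WRITE = 0o140000    # read/write access in LH, from the text above


def halves(left, right):
    """Combine two 18-bit halves into one 36-bit word (LEFT,,RIGHT)."""
    return ((left & 0o777777) << 18) | (right & 0o777777)


def show(word):
    """Format a 36-bit word the way DDT types it in halfword mode."""
    return "%o,,%o" % (word >> 18, word & 0o777777)


def setmpg_args(spt_index, page=0o677, table_index=None):
    """Return (AC1, AC2) for mapping a source page into 'page'."""
    if table_index is None:
        ac1 = spt_index                        # page has its own SPT slot
    else:
        ac1 = halves(spt_index, table_index)   # page table,,index into it
    ac2 = halves(READ_WRITE, page << 9)        # access,,address of page
    return ac1, ac2
```

With the default page, show() of the AC2 value prints 140000,,677000,
matching the example above.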

If you desire to map another page into this slot, merely  call  SETMPG
again  with arguments for the new page.  You need not first un-map the
old page.   However,  when  you  are  finished,  page  677  should  be
un-mapped in the following manner:

        AC1/0
        AC2/ACCESS,,677000
        CALL SETMPG$X

WARNING:

Calling SETMPG incorrectly can crash the system.  Be CAREFUL!  Do  not
use  SETMPG  on  a  time  sharing  system  if  a  crash will cause bad
feelings.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 75
CRASH ANALYSIS


EDDT:
----



                                 NOTE

               Not to be confused with  ^EEDDT  command
               to  get  into UDDT used with the command
               processor.   See  separate  document  on
               EXEC DEBUGGING for that.



     To get  into  EDDT  you  must  bring  the  system  up  using  the
switch-register.    See   the   DECSYSTEM-20  Operators  Guide  for  a
discussion of switches.  Go through the KLINIT dialog and when you get
the prompt BOOT>, respond with:

        BOOT>/L
        BOOT>/G141

The "/L" command causes the monitor to be  loaded,  but  not  started.
The  "/G141"  starts  the  monitor at location 141, which is a jump to
EDDT.  You can use EDDT like UDDT under timesharing on the  MONITR.EXE
file by giving the following commands:

        $GET <SYSTEM>MONITR.EXE
        $START 140

EDDT is linked into the monitor and is always there.  You may also get
to EDDT from MDDT by issuing the following:

        EDDT$G

from MDDT.  This stops timesharing.  To resume timesharing and/or  get
back to MDDT give the command:

        MDDT$G                  ; back to MDDT
        MRETN$G                 ; back to normal timesharing



     Breakpoints may be inserted in the resident  monitor  with  EDDT,
but  not in the swappable monitor in general, because its pages may be
swapped out and be unavailable to EDDT.  You  can  bring  them  in  by
typing:

        SKIP LOC$X              ; where LOC is some address not in core

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 76
CRASH ANALYSIS


     There are some locations in the monitor that are very useful when
using  EDDT  for debugging.  They must be set before going on to start
the monitor.


     They are:

        EDDTF   1        keep EDDT in core when system comes up
                0        delete DDT when system comes up (default)

        DBUGSW  0        do not stop on BUGHLTs, crash and reload
                1        stop on BUGHLTs (hit EDDT breakpoint)
                2        write enable swappable monitor,
                         do not start up SYSJOB, and stop on
                         BUGHLTs.  Also it doesn't run CHECKD
                         automatically on startup.

        DCHKSW  0        do not stop on BUGCHKs (default)
                1        stop on BUGCHKs (hit EDDT breakpoint)

        DINFSW  0        do not stop on BUGINFs (default)
                1        stop on BUGINFs (hit EDDT breakpoint)

In  addition  the  symbol  GOTSWM  appears  in the code just after the
swappable monitor is loaded.  So, if you want to debug  the  swappable
part  of  the  monitor  you  must  put  a breakpoint at GOTSWM (to get
swappable part in core) by,

        GOTSWM$B

Then start the MONITOR by,

        147$G

        CALL SWPMLK$X

CALL  SWPMLK  is used to lock swappable monitor in core for debugging.
You must have more than 96k of core to give  this  command  since  the
resident  and  swappable monitor are larger than 96k.  To start up the
monitor after you have gone into EDDT  and  set  up  your  breakpoints
(remember  the  last  two  are  used  for  BUGHLT and BUGCHK) give the
command:

        147$G
or
        SYSGO1$G

If you are in EDDT and DBUGSW is not 2, that is, the monitor is write
protected, you can use the routines SWPMWE and SWPMWP to write enable
and write protect the monitor.  For example, type CALL SWPMWE$X in DDT.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 77
CRASH ANALYSIS


FILDDT:
------

FILDDT is distributed on the customer software tape.

The following is a chewed-up FILDDT.HLP file.

GET(FILE) FILE-SPEC

Loads a file for DDT to examine.  If you are looking at a monitor dump
you must load DUMP.CPY explicitly:  FILDDT looks for MUMBLE.EXE, not
MUMBLE.CPY, so DUMP<ESC> will either tell you that there is no such
file or load DUMP.EXE.  When looking at a dump for which you wish to
load the symbols, you must first issue the LOAD command and then the
GET command.  Be sure that the file from which you get the symbols is
the same version as the dump.  Be sure, also, that the monitor that
was dumped is the same monitor you use for symbols;  that is, don't
get MONMED symbols to use with MONBCH, etc.


LOAD (SYMBOLS FROM) FILE-SPEC

Reads specified file and builds internal symbol table.  This  must  be
the  first command to FILDDT before "GET" when looking at a dump.  You
will most probably use <SYSTEM>MONITR.EXE which would  have  been  the
monitor running at the time of the dump.


EXIT (FROM FILDDT)

Returns to command level.  You then may type a save command if a  load
command  was  just done to preload symbols.  You will get a version of
FILDDT that has the symbols you just loaded in it  so  you  no  longer
need to "LOAD" symbols.  You now have a monitor specific FILDDT, which
was common practice  for  TOPS-10,  but  is  not  generally  done  for
TOPS-20.


HELP

Types something like this text.


ENABLE PATCHING

Allows writing on an existing file specified by a GET.


ENABLE DATA-FILE

Assumes file is raw binary (i.e.  no ACs, and not an EXE file).

DDT FEATURES:

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 78
CRASH ANALYSIS


        EP$U    Sets monitor context for FILDDT mapping.  EP is a symbol
                which is equal to the page number of the EPT.  (Rel 4)

   <CTRL/E>     Returns to FILDDT command level.

TRACKING DOWN UNMAPPED ADDRESSES:

     The resident monitor may be looked at without  any  difficulties,
but  the swappable monitor may not be in core at the time of the dump.
If the value of the symbol  is  in  the  swappable  monitor  you  must
sometimes go through the monitor map to find where the location really
is.  The location MONCOR contains the  number  of  pages  of  resident
monitor  and  the location SWPCP0 contains the first page of real core
for swapping.  So if the value of the symbol is greater than the
contents of MONCOR times 1000 (octal), it is in the swappable monitor.

If the page of the swappable monitor you want to look at is in core,
it will probably not be at the location that its address refers to,
since the dump is of core and no relocation of pages takes place.  To
find where a symbol really is in the dump, first type the symbol
followed by an "=".  DDT will respond with the value of this symbol.
The value of the symbol can be divided into two three-octal-digit
fields.  The high order three digits are the page number and the low
order three digits are the offset into the page.

If the value of the symbol is 324621 the high order three digits, 324,
are  the  page  number  and  the  low order three digits, 621, are the
offset into the page.  To find the location of the page in question in
the  dump you must look at the monitor map indexed by the page number.
For example:

        MMAP+324/

would  give you the monitor map word for page 324.  This word contains
some protection bits for the page and the address of the page when the
dump was taken.

The page may have been in core, on the swapping area or on the disk at
the time of the dump.

If bits 14-17 in the monitor map word are non-zero, the page was on
the swapping area or disk and is no longer available.

If bits 14-17 are zero then the page was in core, and the  right  half
of  the  word contains the page number in the dump of the page you are
looking for (the dump program overwrites the  last  several  pages  of
memory, the dump therefore does not contain these last pages.)

If the page was in core the new address of the symbol you are  looking
for  can  be  found by using the page number from the monitor map word
and appending the offset into the page to it.  For example if MMAP+324
contains  104000,,256;   then  the  new address of our symbol would be
256621.
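
The arithmetic of this lookup can be sketched in Python as follows.
This is an illustration only:  the function names are invented, the
MMAP word is supplied by the caller (as read from MMAP+page in
FILDDT), and bits 14-17 are taken to be the low four bits of the left
half.

```python
PAGE_SIZE = 0o1000       # words per page


def in_swappable_monitor(address, moncor):
    """True if address lies above the MONCOR pages of resident monitor."""
    return address > moncor * PAGE_SIZE


def locate_in_dump(address, mmap_word):
    """Return the address of the word in the dump, or None if the
    page was on the swapping area or disk when the dump was taken."""
    offset = address & 0o777            # low three octal digits
    left = mmap_word >> 18
    right = mmap_word & 0o777777
    if left & 0o17:                     # bits 14-17 non-zero
        return None
    return (right << 9) | offset        # dump page number plus offset
```

Calling locate_in_dump with address 324621 and map word 104000,,256
yields 256621, as in the example above.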

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 79
CRASH ANALYSIS


All addresses in the swappable monitor must be resolved in this
manner.  In addition, addresses of 600000 and above are in the JSB or
PSB (PSB is page 777) and must be resolved by finding the page
containing the JSB or PSB of the process that was running when the
dump occurred.  There are some locations and tables in the monitor
that make this easy:

        NAME    INDEX   DESCRIPTION

        FORKX   none    Number of the fork that was running at the time of
                        the dump, -1 if in the scheduler.
        JOBNO   In PSB  Job number to which current fork belongs.
        FKJOB   Fork #  Job number,,SPT index of JSB
        JOBDIR  Job #   logged in directory number
        JOBPT   Job #   controlling TTY number,,top fork number
        FKSTAT  Fork #  test data,,address of fork wait routine
        FKPGS   Fork #  SPT index of page table,,SPT index of PSB


SPT  indexes  are  indexes into a share pointer table starting at SPT.
To find the PSB of fork 20, you  first  look  at  FKPGS+20.   If  this
location  contains 425,,426, the word at SPT+426 is the pointer to the
PSB.  This pointer can point to disk, swap area,  or  a  page  in  the
dump.   If  bits  14-17 are zero it is a pointer to a page in the dump
and the right half of the SPT word is the page number of  the  PSB  in
the dump.
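
The same chain of lookups, written out as a Python sketch.  Here the
dump is modelled as a dictionary from address to 36-bit word and the
symbol table as a dictionary from name to address;  both are stand-ins
invented for this illustration, as are the addresses used in the
example.

```python
def find_psb_page(memory, symbols, fork):
    """Return the dump page number holding the PSB of 'fork', or
    None if the PSB was on disk or the swap area."""
    fkpgs_word = memory[symbols["FKPGS"] + fork]
    psb_index = fkpgs_word & 0o777777          # right half: SPT index of PSB
    spt_word = memory[symbols["SPT"] + psb_index]
    if (spt_word >> 18) & 0o17:                # bits 14-17 non-zero
        return None
    return spt_word & 0o777777                 # right half: page in dump


# Hypothetical symbol values and contents, for illustration only.
symbols = {"FKPGS": 0o50000, "SPT": 0o60000}
memory = {
    0o50020: 0o425000426,        # FKPGS+20/  425,,426
    0o60426: 0o104000000317,     # SPT+426/   104000,,317
}
psb_page = find_psb_page(memory, symbols, 0o20)
```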

When you look at a dump, you should first try to  find  why  the  dump
occurred by looking at the location BUGHLT.  If BUGHLT is zero then you
should check the CTY log to find out why the dump was  taken  and  for
information  like the PC at the time of the dump and the status of the
PI system.  If BUGHLT is non-zero it  is  the  address  of  where  the
BUGHLT was issued.  You should look up the BUGHLT in BUGSTRINGS.TXT or
BUGS.MAC to find additional information about the BUGHLT.  If at  this
point  you are not sure as to why the BUGHLT occurred, you will have to
look at the listings for more information.  A copy  of  BUGSTRINGS.TXT
is  in  Appendix A of the Operators manual.  You can find the location
of the call to the BUGHLT by typing the BUGHLT tag to DDT followed  by
a  "?".   DDT  will tell which monitor module the BUGHLT is in and you
can  go  to  your  microfiche  and  read  all  about  the   conditions
precipitating the BUGHLT.

Next if necessary look at FORKX.  If it contains a  -1  the  scheduler
was  running;  otherwise it is the number of the fork that was running
when the crash occurred.  The registers  are  saved  at  BUGACS  on  a
BUGHLT,  but  if  BUGACS+17  contains  something,,BUGPDL+n,  then  the
registers are invalid and you must go to the SYSERR buffer to get  the
good  registers.   This  is  done  by  adding to the right half of the
SYSERR buffer pointer, SEBQOU, the offset  into  the  buffer  for  the
heading  and  ACs,  SEBDAT+BG%ACS.  This value points to a block of 16
words containing the user's ACs.  You may have to chain down more than
one queued-up SYSERR entry to get to the BUGHLT block.

                                 NOTE

               Do not forget to get a print out of  the
               SYSERR  log  which will give you and the
               field service representative much of the
               information you can get out of the dump.
               The SYSERR output is much easier to examine;
               however, clearly you cannot get as much
               info from it as you can from a dump.


Some other locations in the PSB of interest are:

        LOCATION        DESCRIPTION

        UAC             User's ACs when he did his last JSYS.
        PAC             monitor's ACs
        PPC             processor's PC
        UPDL            user's pushdown stack while in a JSYS
        NSKED           0 = ok to run scheduler
                        >0 = cannot run scheduler
        INTDF           -1 = ok to receive software interrupts
                        >= 0 , cannot receive software interrupts

It may be useful to know the status of a fork when it is hung  or  you
are unsure of its status.  This can be determined by looking at FKSTAT
indexed by the fork number.  The right half of this  location  is  the
address of a test routine and the left half is data to be tested.  For
example if FKSTAT+12 contains 23,,FKWAT, then fork 12 is  waiting  for
fork  23  to complete.  FKWAT is a routine that waits for another fork
to complete and its data (the left half of the word) is the number  of
the  fork  it  is waiting for.  There are many different wait routines
and you will have to look at the code to see what individual ones  are
waiting for.  There is a memo on scheduler tests which details most of
the scheduler tests in the monitor.



     You can easily determine all of the forks associated with  a  job
by giving the commands:

        -1,,0$M
        FKJOB<FKJOB+NFKS>N,,0$W

Where N is the job you are looking for.  A fork structure can  usually
be  determined  by looking at the FKSTAT of the forks and seeing which
forks are waiting on which forks.  A FKSTAT of FKSKP indicates a  fork
is inactive.
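
The masked search above simply compares the left half of every FKJOB
entry with the job number.  Expressed as a Python sketch over a copy
of the table (the function names are invented, and the table is
assumed to be a list of 36-bit words indexed by fork number):

```python
def forks_of_job(fkjob, job):
    """Fork numbers whose FKJOB entry has 'job' in the left half."""
    return [fork for fork, word in enumerate(fkjob) if (word >> 18) == job]


def split_fkstat(word):
    """Return (test data, address of wait routine) from a FKSTAT word."""
    return word >> 18, word & 0o777777
```

A FKSTAT word of 23,,12345 splits into test data 23 and a routine
address of 12345;  the address must still be looked up symbolically to
learn which wait routine it is.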

You should refer to STG.MAC for other fork and job  tables  and  other
locations  in the PSB and JSB of interest.  All of the above locations
can be examined with MDDT or EDDT while the monitor  is  running.   Of
course  at  these times you do not have to go through MMAP and the PSB
and JSB that are in core are your own.

There are two separate patch areas in the monitor (FFF and SWPF).  FFF
is the resident patch area and SWPF is the swappable patch area.  These
two symbols should be updated to point to the next  free  location  in
the  patch  area  when  a  patch is inserted.  PAT..  is defined to be
equal to SWPF.  By convention, all distributed patches are applied  at
FFF.   This  serves the purposes of reducing confusion, always working
until the patch area is exhausted, and leaving patches always  present
in a dump for the cases where that is important.

There are several general purpose routines that can be used to look at
the monitor while it is running.  These routines  should  be  used
with caution since it is certainly possible to crash  the  monitor  by
using  them incorrectly.  Two of the more general routines are MAPDIR,
for mapping a directory  into  core,  and  SETMPG  for  mapping  pages
(someone else's PSB or JSB) into core.  You will have to look at the
listing for the exact use of these and other general routines.  Beware
of the precautions that should be taken when using them.  You can find
the module they are located in by looking in the GLOB listing which is
a  cross  reference  listing of all the global symbols in the monitor.
You get a GLOB listing in your microfiche.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 82
CRASH ANALYSIS


BUGHLT, BUGCHK, BUGINF
------  ------  ------


The monitor contains a  considerable  number  of  internal  redundancy
checks  which  generally  serve  to  prevent  unexpected  hardware  or
software failures from cascading into severely destructive  reactions.
Also,   by  detecting  failures  early,  they  tend  to  expedite  the
correction of errors.

There are two failure routines,  BUGCHK  and  BUGHLT  for  lesser  and
greater  severity of failures.  Calls to them with JSR are included in
code by use of a macro which records the locations and a  text  string
describing the failure.  The general form is:


     BUG (TYPE,NAME,<STRING>)

Where type is HLT or CHK, and string describes the cause.

For example,

        BUG(HLT,SKDPFL,<PAGE FAULT FROM SCHEDULER CONTEXT>)

The strings are constructed during loading and are dumped into a file.
The BUGSTRINGS.TXT file will produce an ordered  listing  of  the  bug
messages for operator or programmer use.

BUGCHK is used where the inconsistency detected is probably not  fatal
to the system or to the job being run, or can probably  be  corrected
automatically.

Typical is the sequence in MRETN in the SCHED module.

        AOSGE INTDF
        BUG(CHK,IDFOD2,<AT MRETN - INTDF OVERLY DECREMENTED>)

This BUGCHK is included strictly as a debugging aid.  Detection  of  a
failure  takes  no  corrective action.  This situation usually results
from executing one or more excessive OKINT operations (not balanced by
a  preceding  NOINT).   It  is  considered  a  problem because a NOINT
executed when INTDF has  been  overly  decremented  will  not  inhibit
interrupts and will not protect code changing sensitive data.

BUGHLT is used where  the  failure  detected  is  likely  to  preclude
further  proper  operation  of  the  system  or  file storage might be
jeopardized  by  attempted  further  operation.   For   example,   the
following appears in the SCHED module:

        MOVE 1,TODCLK   ;CURRENT TIME
        CAML 1,CHKTIM   ;TIME AT WHICH JOB0 OVERDUE
        BUG(HLT,J0NRUN,<JOB 0 NOT RUN FOR TOO LONG>)

This check accomplishes two things:

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 83
CRASH ANALYSIS



        1.      A function of JOB0 is to periodically update the disk
                version of bittables, file directories and other
                files.  Absence of this function would make the system
                vulnerable to considerable loss of information on a
                crash which loses core and swapping storage.  JOB 0
                protects itself against various types of malfunction;
                this BUGHLT detects any failure resulting in a hangup.

        2.      Detects if the entire system has become hung due to
                failure of the swapping device or some such event, on
                the basis that if JOB 0 isn't running, nobody's
                running.
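
The J0NRUN test above depends on the skip sense of CAML, which skips
the next instruction when the accumulator is less than the memory
operand.  A minimal Python sketch of the same comparison follows; the
function name and the clock values are hypothetical and serve only to
illustrate the logic.

```python
def job0_overdue(todclk, chktim):
    """Mirror of MOVE 1,TODCLK / CAML 1,CHKTIM.

    CAML skips the following BUG call when TODCLK < CHKTIM, so the
    BUGHLT is reached only when the current time has reached or
    passed the deadline.  Values used below are illustrative only.
    """
    return todclk >= chktim


# Deadline not yet reached: CAML skips the BUG call.
print(job0_overdue(1000, 5000))
# Deadline passed: CAML does not skip, so J0NRUN would be taken.
print(job0_overdue(6000, 5000))
```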



                                 NOTE


                    For Release 4, the form the  BUGxxx
               calls take has been  modified,  and  the
               new  file  BUGS.MAC  contains (hopefully)
               useful information on each of the BUGxxx
               calls  in  one  place.   This  should be
               considered a required debugging file.



DBUGSW:

A monitor cell, DBUGSW, controls the behavior  of  BUGHLT  and  BUGCHK
when  they  are called.  DBUGSW is set according to whether the system
is attended by system programmers.

If C(DBUGSW)=0, the system is not attended by system  programmers,  so
all  automatic  crash  handling  is  invoked.   BUGCHK  will return +1
immediately, appearing effectively as NOP.   BUGHLT  will,  if  called
from the scheduler or at PI level, invoke a total reload from the disk
and a restart of the system.  The BUGCHK/INF output will appear on the
CTY and in the SYSERR log when JOB0 gets around to them.

If the system continues to run or is restarted properly, the  location
of  the  bug (saved over a reload) and its message will be reported on
the CTY.

If C(DBUGSW).NEQ.0, the system  is  attended,  and  one  of  the  EDDT
breakpoints  will  be hit.  This allows the programmer to look for the
bug and/or possibly correct the difficulty and proceed.  There are two
defined non-zero settings of DBUGSW, 1 and 2, which have the following
distinction.

        C(DBUGSW) = 1

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 84
CRASH ANALYSIS


                Operation is the same as with 0 except for breakpoint
                action.  In particular the swappable monitor is write
                protected and SYSJOB is started at startup as
                described.
                
        C(DBUGSW) = 2

                Is used for actual system debugging.  The swappable
                monitor is not write protected so it may conveniently
                be patched or breakpointed, and the SYSJOB operation
                is not started to save time.

                BUGCHK and BUGHLT procedures are the same as for 1.

The following is a summary of DBUGSW settings:

                        0               1               2
MEANING                 Unattended      Attended        Debugging

BUGCHK action           NOP             Hit Breakpoint  Hit Breakpoint
BUGHLT action           Crash System    Hit Breakpoint  Hit Breakpoint
SWPMON write protect?   Yes             Yes             No
CHECKD on startup       Yes             Yes             No
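
The summary table can also be read as a simple lookup keyed by the
contents of DBUGSW.  The following Python sketch only restates the
table; the dictionary and function names are hypothetical and do not
correspond to anything in the monitor.

```python
# The DBUGSW summary table, keyed by the contents of DBUGSW.
DBUGSW_TABLE = {
    0: {"meaning": "Unattended",
        "BUGCHK": "NOP",
        "BUGHLT": "Crash System",
        "swpmon_write_protect": True,
        "checkd_on_startup": True},
    1: {"meaning": "Attended",
        "BUGCHK": "Hit Breakpoint",
        "BUGHLT": "Hit Breakpoint",
        "swpmon_write_protect": True,
        "checkd_on_startup": True},
    2: {"meaning": "Debugging",
        "BUGCHK": "Hit Breakpoint",
        "BUGHLT": "Hit Breakpoint",
        "swpmon_write_protect": False,
        "checkd_on_startup": False},
}


def bug_action(dbugsw, routine):
    """Return what BUGCHK or BUGHLT does for a given DBUGSW value."""
    return DBUGSW_TABLE[dbugsw][routine]


for value in sorted(DBUGSW_TABLE):
    print(value, bug_action(value, "BUGCHK"), "/",
          bug_action(value, "BUGHLT"))
```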


Other console functions:

In addition to  EDDT,  several  other  entry  points  are  defined  as
absolute   addresses.    The  machine  may  be  started  at  these  as
appropriate.

        140     JRST EDDT               ; go to EDDT
        141     JRST SYSDDT             ; reset and go to EDDT
        142     JRST EDDT               ; copy of EDDT address
        143     JRST SYSLOD             ; initialize file system
        144       0
        145     JRST SYSRST             ; restart
        146     JRST SYSGOX             ; reload and start
        147     JRST SYSGO1             ; start

The soft restart (address 145, EVRST) restarts all  I/O  devices,  but
leaves  the  system  tables intact.  If it is successful, all jobs and
all (or all but 1) processes will continue in their  previous  state
without  interruption.   This  may  be  used  if  an  I/O  device  has
malfunctioned  and  not  recovered  properly.    The   total   restart
initializes core, swapping storage and all monitor tables.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 85
CRASH ANALYSIS


A very limited set of control functions  for  debugging  purposes  has
been  built into the scheduler.  To invoke a function, the appropriate
bit or bits are set into location 20 via MDDT.  The  word  is  scanned
from  left  to  right  (JFFO).   The first 1 bit found will select the
function.

BIT 0:
        Causes scheduler to dismiss current process if any and stall
        (execute a JRST .), with -1 in AC0. Useful to effect a clean
        manual transfer to EDDT. System may be resumed at SCHED0.

BIT 1:
        Causes the job specified by data switch bits 18-35 to be run
        exclusively. Temporarily defeats JOB 0 not run BUGHLT.

BIT 2:
        Forces running of JOB 0 backup function before halting the
        system.
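
The left-to-right scan of location 20 can be pictured as follows.
This is a Python sketch of a JFFO-style scan over a 36-bit word, with
bit 0 as the leftmost bit; the function names and the short function
descriptions are paraphrases for illustration, not monitor symbols.

```python
WORD_BITS = 36  # PDP-10 word size; bit 0 is the leftmost bit


def bit(n):
    """Return a word with only PDP-10 bit n set."""
    return 1 << (WORD_BITS - 1 - n)


def first_one_bit(word):
    """Return the number of the leftmost 1 bit, or None if word is 0."""
    word &= (1 << WORD_BITS) - 1
    if word == 0:
        return None
    return WORD_BITS - word.bit_length()


FUNCTIONS = {
    0: "dismiss current process and stall",
    1: "run the specified job exclusively",
    2: "force JOB 0 backup function before halting",
}

# Bits 1 and 2 both set, plus a job number in bits 18-35:
# the scan stops at bit 1, so bit 2 is never examined.
word = bit(1) | bit(2) | 0o17
selected = first_one_bit(word)
print(selected, FUNCTIONS.get(selected))
```

Because only the first 1 bit is acted upon, setting a lower-numbered
bit always takes precedence over any higher-numbered bit in the word.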

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 86
BUG'TYP MACRO CHANGES FOR VERSION 4 OF TOPS-20


             BUG'TYP MACRO CHANGES FOR VERSION 4 OF TOPS-20

Version 4 of TOPS-20  will  include  some  changes  in  the  BUG  code
generation.   The  purpose  of these changes is to generate a document
describing the TOPS-20 BUGCHKs, BUGHLTs, and  BUGINFs  that  is  more
descriptive than the previous BUGSTRINGS.TXT file.

The logistics of this change include moving the BUG definitions out of
the  monitor  source  listings  and  into a central source file.  This
source file will serve both as the definition file for the bugs and as
documentation  for the BUGS.  This file is called BUGS.MAC and will be
distributed to all sites on the distribution  tape.   These  BUGS  are
still  referenced  in  the  source module where the bug is invoked but
they are defined in BUGS.MAC.

This involves a modification to the old BUG  macro  and  a  new  macro
called  DEFBUG.   The  BUG macro appears in the source modules and the
DEFBUG macro appears in BUGS.MAC.

The format of the new BUG macro is as follows:

              BUG (BUGNAM,<<x1,des1>,<x2,des2>...>)

This is placed in the monitor code where the BUG called BUGNAM  is  to
occur.  This macro executes a macro with name 'BUGNAM' which generates
an XCT BUGNAM where the contents of BUGNAM is a JSR BUG'TYP. Following
the  location  BUGNAM  are  the Accumulators to be printed (one AC per
word) followed by SIXBIT/BUGNAM/.  The Accumulators to be printed  are
defined with the DEFBUG macro while the locations specified in the BUG
macro are for documentation only.

Accompanying this BUG macro is a DEFBUG macro which is placed  in  the
file  BUGS.MAC.   This entry completely defines the BUG, including its
type (BUGHLT, BUGCHK, or BUGINF) and documentation.

The format of the DEFBUG macro is:

                DEFBUG (TYP,TAG,MOD,WORD,STR,LOCS,HELP)


     For a description of the arguments to this macro see  the  SWSKIT
article called BUGS.MEM.  

In order to make listings (output from MACRO or CREF) more informative
than before, the BUG macro  will  cause  the  short  description to be
displayed in the listing where the BUG  macro  is  called.
Also,  the  flavor of bug (INF, CHK, or HLT) and whether it's hardware
or software related will be  displayed  in  the  listing.   Hence  the
OVRDTA bug would appear in the listing as

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 87
BUG'TYP MACRO CHANGES FOR VERSION 4 OF TOPS-20


          BUG(OVRDTA)
          ;BUG Type:              hardware-related BUGINF
          ;BUG description:       PHYSIO - OVERDUE TRANSFER ABORTED


     When fully documented, the BUGS.MAC file will be extremely useful
for  specialists.  It will describe, in one convenient place, what the
additional data printed on the console is, what caused  the  bug,  and
what the site or specialist should do if that particular bug occurs.

Here is a section of the current BUG definition/documentation for  the
BUG GIVTMR from BUGS.MAC:

DEFBUG(INF,GIVTMR,JSYSA,SOFT,<GIVOK TIMEOUT>,<<T2,FUNC>>,<

Cause:  The access control job has not responded with a  GIVOK  within
        the designated time period.

Action: If this consistently happens with the same function code,  you
        should  see  if  the  processing  of  the function can be made
        faster.

        If there is no obvious function code pattern, you may need  to
        increase  the  timeout  period  or rework the way in which the
        access control program operates.

Data:   FUNC - the GETOK function code 


        >)

INF specifies the bug is a BUGINF.  GIVTMR is the  name  of  the  bug.
JSYSA  is the module that the bug would occur in.  SOFT specifies that
it is likely the bug is caused by a software bug.  <GIVOK TIMEOUT>  is
the  bug string.  <T2,FUNC> specifies the data that will be printed on
the operator's console.  The initial spec called  for  the  descriptor
FUNC  to  be included in the operator's message but at this time, this
descriptor is just for source documentation.

The blurbs following the initial line of the BUG definition attempt to
describe  to  the  specialist,  in  a  more  detailed  manner than the
description printed on the console, what it means when this bug occurs
and  what  should be done first in order to resolve the situation.  In
this case the ACTION is to examine the GETOK routine which is executed
for  the  additional  data  FUNC.   This  routine  is getting hung up.
Sometimes, the ACTION will state to call the hot line or to submit  an
SPR.   These  descriptions  will  help the specialist be more informed
about the bugs which may occur at one of their sites and save them the
time  of  calling  the hot line or searching through the source module
for an idea of the problem.
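
The seven DEFBUG arguments can be thought of as one record per bug.
The Python sketch below holds the GIVTMR entry in such a record and
renders listing comments in the style shown for OVRDTA.  The class
and method names are hypothetical, and the "software-related" wording
and the HARD keyword are assumptions made by analogy with the
"hardware-related" line in that example.

```python
from dataclasses import dataclass


@dataclass
class BugDefinition:
    """One DEFBUG (TYP,TAG,MOD,WORD,STR,LOCS,HELP) entry."""
    typ: str      # INF, CHK or HLT
    tag: str      # name of the bug
    mod: str      # module the bug occurs in
    word: str     # SOFT here; HARD is an assumed counterpart
    string: str   # short bug string
    locs: list    # (location, descriptor) pairs for extra data
    help: str = ""

    def listing_comment(self):
        """Return listing lines modelled on the OVRDTA example."""
        relation = {"SOFT": "software-related",
                    "HARD": "hardware-related"}[self.word]
        return [
            ";BUG Type:              %s BUG%s" % (relation, self.typ),
            ";BUG description:       %s" % self.string,
        ]


givtmr = BugDefinition(
    typ="INF", tag="GIVTMR", mod="JSYSA", word="SOFT",
    string="GIVOK TIMEOUT", locs=[("T2", "FUNC")],
    help="Cause: The access control job has not responded ...",
)

for line in givtmr.listing_comment():
    print(line)
```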

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 88
MONITOR BUILDING HINTS


                         MONITOR BUILDING HINTS
                         ======================


1. GENERAL
   =======

Judging from the  number of  requests for  help on  this subject,  the
chances are that you  will be required to  rebuild a monitor  sometime
during your career  as a  Software Specialist. The  reasons are  quite
simple.  There are customers who simply want functionality other  than
that provided  by  stock  monitors.  There  are  also  those  who  are
experiencing performance problems. We  cannot forget the sales  folks.
It is not  unusual to  have to  rebuild a monitor  in order  to run  a
benchmark. A very common example  is increasing the OFN area.  Another
quite common requirement is  to increase the  patch area (FFF).  Doing
either of these and simply submitting a build control file will  often
produce a bad monitor.

We will talk about PSECTS in  relation to the Monitor's address  space
but will  make  no attempt  to  define what  they  do. A good detailed
discussion on the Monitor's address space is on pages 2-62 to 2-73  in
the Release 4  Update Manual. Also  there is a  memo on the  Monitor's
address space in the SWSKIT.


2. BACKGROUND
   ==========

In V3A, all of the Monitor was in the same address space. Nevertheless
there was a crunch on space. As  a result some PSECTS were allowed  to
overlap. So  critical  was the  space  requirement, that  attempts  to
increase the OFN area  or FFF usually resulted  in the overlapping  of
PSECTS other than the ones permitted.  Therein lies the problem.   The
Monitor produced from such a process would ordinarily be useless.
                                
With  the  development  of  V4,  the  space  requirement  became  more
critical.  The Symbol Table became the object of concern. It  required
a large number of pages, and in general, it is only used  infrequently
under normal  conditions.  Hence  the Engineering  folks were  of  the
opinion that  it should  be completely  eliminated.  We objected.   It
would be a nightmare to try  to debug the monitor without symbols.  It
thus became  our  project  to  somehow keep  the  Symbol  Table  while
conforming with  the space  restrictions.  We  decided to  remove  the
Symbol Table and place it in  an alternate  address  space. It  should
be noted  that  this  action  does  not  impact  adversely  on  system
performance. With this change, the  build procedure and the  monitor's
address space were reorganized.

3. BUILD PROCEDURE
   ===============

Outlined below are some steps to guide you when rebuilding a  monitor.
Bear in mind that this  is a guide and might  not account for all  the

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 89
MONITOR BUILDING HINTS


unusual situations.  This guide however, coupled with your  experience
and common  sense will  most likely  do the  trick. PLEASE  READ  THIS
ENTIRE MEMO BEFORE ATTEMPTING TO  REBUILD  YOUR  MONITOR. Also  please
read the build BEWARE file that is on the Installation tape.
        
NOTE:   The customer's Distribution Tape will have all the files needed
        to rebuild  the  monitor.  All  TOPS-20  modules  will  be  in
        TOPS-20.REL (or T2020.REL etc) The control file is  TOPS20.CTL
        (or T2020.CTL  etc).  The  link file  will be  NAME.CCL  where
        "NAME" depends upon what monitor is being used (could be 2020,
        ARPA etc.). For 2040/50, it is called LNKSCH.CCL. In any  case
        the TOPS20.CTL file  will have  the name. The  files you  will
        change will be one of  the  PARAM  files  and/or  STG.MAC.  It
        should be noted that the special LINK.EXE and MACRO.EXE needed
        to build V3A are not required under V4.
        
        If you have  the time, it  is not a  bad idea to  use all  the
        standard files and  build yourself a  "vanilla" monitor.  This
        will test  the procedure  and files  and reveal  any  problems
        peculiar to the  build itself.  Once these  are resolved,  any
        problems encountered  when you  are rebuilding  your  modified
        monitor will be related to the change itself. The time for the
        debugging phase can thus be reduced substantially.
                
STEP 1          Restore all files needed  from <4-SOURCES>. This  will
                usually contain the monitor modules (TOPS20.REL file),
                all needed source  files, all  build control,  command
                and log files.
                
STEP 2          Carefully make the source changes as needed.
                
STEP 3          Examine the TOPS20.CTL  file. This  file will  usually
                have logical name definitions and TAKE commands  along
                with other things. Also look at all referenced command
                files.
                
STEP 4          Examine the  corresponding log  file. This  will  show
                what the result of  the original build procedure  was.
                It should therefore be a template which should be used
                to judge the validity of the new Monitor. Pay  special
                attention to the section which shows the PSECT  layout
                at the  end of  the BUILD  procedure. This  shows  the
                start location,  the end  location and  the amount  of
                free space between each PSECT.  The file used by  LINK
                to set up the PSECTS is called LNKSCH.CCL. You  should
                look at this file to get an idea of what's happening.
                
                
STEP 5          Now edit the control and command files as necessary to
                reflect your environment. This will mean, among  other
                things,   changing   or   eliminating   logical   name
                definitions.  Do NOT change the order of the PSECTS in
                the LNKSCH.CCL file. Also  do not change the  starting
                value for any PSECT.  The starting value is the  value

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 90
MONITOR BUILDING HINTS


                given to the /SET: switch.
                
STEP 6          Submit  the  control  file  with  /TAG:SINGLE  switch.
                Ensure that the control  file is correct and  reflects
                accurately logical name definitions and the .CCL file.
                Also this portion  of the .CTL  file has the  commands
                necessary to compile the changed module.
                
STEP 7          When the job ends, examine your log file. Correct  any
                compilation or  missing files  errors and  go back  to
                STEP 6. Continue with STEP 8 only after all errors are
                eliminated.
                
STEP 8          At this  point  you  should have  a  MONITR.EXE.   Now
                examine the section  in the  log file  which gives  an
                outline of  the  PSECTS.   If any  PSECTS  overlap,  a
                message will  indicate  the  same.  If  there  are  no
                overlapping messages, go to  STEP 11. NOTE: There  are
                some   instances  where  PSECTs  can  overlap.  POSTCD
                and SYVAR  PSECTs are  allowed to  overlap any  xxxVAR
                PSECT. This will  not gain  very much in  storage -  4
                pages to be exact. If  you  follow the build procedure
                then overlapping  PSECTs are not allowed and therefore
                must  be  resolved.  You  are  once  again advised NOT
                to re-organize the monitor's address space.
                
STEP 9          Start with the first overlap.  Figure out  the  number
                of words by  which  the  first PSECT overlaps the PSECT
                following it.  Now  add  this  value  to  the  start
                location of  the overlapped  PSECT. This  value  quite
                possibly will be a location within a  page,  i.e.  an
                address of the form 125300,  where the page number  is
                125 and the offset into the page is 300. The  starting
                address of many  PSECTs is  required to be  on a  page
                boundary i.e. an  address of the  form 126000. A  good
                rule to  follow is:  IF THE  PSECT STARTED  ON A  PAGE
                BOUNDARY BEFORE  THE BUILD,  THEN KEEP  IT ON  A  PAGE
                BOUNDARY. This would mean that you may be required  to
                add an additional value to round up to the next  page.
                For example  the  125300  value would  be  rounded  to
                126000 if the  PSECT is required  on a page  boundary.
                The PSECT  sequence and  starting  values are  in  the
                LNKSCH.CCL file.  NOTE: the  values are  all given  in
                OCTAL so add in OCTAL.

                
STEP 10         EDIT the  LNKSCH.CCL file  to reflect  this new  start
                value for the  overlapped PSECT.  Go back  to STEP  6.
                Repeat these  steps  until  there are  no  more  error
                messages. Note that changing the start location of the
                overlapped PSECT can cause it to overlap its following
                PSECT and  the  same  procedure must  be  followed  to
                resolve any conflicts. Of  course you must be  careful
                to ensure that you do not outgrow the monitor's address

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 91
MONITOR BUILDING HINTS


                space. A total of the  length of all PSECTs will  tell
                you if the Monitor is too large.
                
STEP 11         At this point you should have a good Monitor. Save  it
                in the proper directory. The final test is getting  it
                up and running.
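
The octal arithmetic described in STEP 9 can be sketched in Python as
shown below.  The function names and the sample start address and
overlap are hypothetical; only the 125300 and 126000 values come from
the text.  A page is taken to be 1000 (octal) words, which is what
the page-number/offset split in STEP 9 implies.

```python
PAGE_SIZE = 0o1000  # words per page, in octal as in LNKSCH.CCL


def page_and_offset(address):
    """Split an address into (page number, offset within page)."""
    return address // PAGE_SIZE, address % PAGE_SIZE


def new_start(old_start, overlap_words, page_aligned):
    """Return the new /SET: value for an overlapped PSECT.

    old_start      current start address of the overlapped PSECT
    overlap_words  number of words by which it is overlapped
    page_aligned   True if the PSECT started on a page boundary
    """
    start = old_start + overlap_words
    if page_aligned:
        # Round up to the next page boundary (no change if aligned).
        start = (start + PAGE_SIZE - 1) // PAGE_SIZE * PAGE_SIZE
    return start


# Hypothetical PSECT at 125000 overlapped by 300 (octal) words.
raw = new_start(0o125000, 0o300, page_aligned=False)
aligned = new_start(0o125000, 0o300, page_aligned=True)
print(format(raw, "o"), page_and_offset(raw))
print(format(aligned, "o"))
```

Moving a PSECT this way may push it into the PSECT that follows, so
the same calculation is repeated for each overlap the next build
reports.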

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 92
EXEC DEBUGGING


                             EXEC DEBUGGING
                             --------------


Now that most SWS have microfiche of the released EXEC and MONITOR,  I
anticipate  questions  on  looking at the EXEC and MONITOR.  Here is a
cursory tutorial on  investigating  the  internals  of  the  EXEC  (or
command  processor, if you prefer).  The examples are intended to be a
guide and although the typein is correct,  the  response  may  not  be
character perfect.  You are advised to read the other chapters in this
document  for  more  information  on  DDT  and  MONITOR  snooping  and
debugging.




                      LOOKING AT THE EXEC WITH DDT
                      ============================


You can either look at the running system EXEC or your own copy of the
EXEC with DDT that is loaded with the EXEC.


I.      TO LOOK AT THE RUNNING EXEC:

First you must have WHEEL  privileges  in  order  to  use  the  ^EEDDT
command.   The  ^EEDDT command transfers control to the DDT now loaded
with EXEC, with symbols.  Now you can do all the normal DDT functions.
To  exit  from  DDT  all you do is <ESC>G , echoed as $G.  This starts
your program which is the EXEC and so now  you  are  at  EXEC  command
level.

                @ENABLE
                $^EEDDT
                DDT
                .
                .
                .
                $G
                $DIS
                @

II.     TO LOOK AT YOUR COPY OF AN EXEC (RUNNING UNDER SYSTEM EXEC):

Get your copy of the EXEC in your address space, transfer  control  to
it  and  start  DDT  as  above.   There  are  3 ways to exit from this
depending on the state you are in.  If you are in DDT you can  ^Z  out
to  get back to system EXEC.  If you are running your EXEC and want to
exit to the system EXEC you can ^EQUIT (if you are enabled)  or  "POP"
(if  you  are not enabled).  POP is preferable.  Note if you prefer to
get your EXEC and not start it in order to set breakpoints or  put  in
patches before running, see section "VI -- PATCHING" below.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 93
EXEC DEBUGGING


EXAMPLE EXITING FROM DDT:

                @GET MYEXEC.EXE 
                @SET NO CONTROL-C-CAPABILITY
                @START
                @MONNAM.TXT, TOPS-20 MONITOR (VERSION#)
                @ENA
                $^EEDDT
                DDT
                .
                .
                .
                CINITF/  -1   0         ; reset initialization flag so you can  
                                        ; run this EXEC again after it is saved
                .
                ^Z                      ; to exit and save, for example
                @                       ; now you are in the monitor's EXEC
                                        ; with your EXEC in your
                                        ; address space.  You can save it, say.
                @SAV MYEXEC.EXE.2



EXAMPLE, EXITING FROM YOUR RUNNING EXEC:

                @GET MYEXEC.EXE
                @START
                @MONNAM.TXT,,TOPS-20 MONITOR(VERSION #)
                @ENA
                $^EEDDT
                DDT
                .
                .
                $G                      ; running your EXEC
                .
                .
                CINITF/  -1  0          ; clear initialization flag
                $^EQUIT                 ; return to higher (system) EXEC
                @                       ; you are in system EXEC
                @SAV NEWEXEC            ; etc.



EXAMPLE, EXITING FROM YOUR RUNNING EXEC WITH POP:

                @GET MYEXEC.EXE
                @START
                @MONNAM.TXT,,TOPS-20 MONITOR(VERSION#)
                @
                .
                .
                .
                @POP                    ; return to higher (system) EXEC.
                @                       ; now you are in system EXEC.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 94
EXEC DEBUGGING



                                        ; NOTE: you should set CINITF to 0
                                        ; if you want to save and run this
                                        ; EXEC later.  You can do it by
                                        ; DDT after the POP or ^EEDDT before
                                        ; the POP.

III.    GETTING OUT OF TROUBLE:


     Since it is true that you could get into trouble with  your  EXEC
and  not  be  able  to get out of it, CTRL/C traps or you can't POP or
whatever, there is a way to exit to the MINI-EXEC always.   First  you
must  issue ^EQUIT to get into the MINI-EXEC.  Then "S" (start) to get
back to the system EXEC.  Then get into your EXEC.   If  you  now  get
into  trouble  you  can  issue  ^P  which  will  get you back into the
MINI-EXEC.  Now you have the chance to get back  to  the  system  EXEC
with "S" (start).


        EXAMPLE:

        @ENA
        $^EQUIT
        INTERRUPT AT 15657
        MX>S
        $                               ; you are now back in system EXEC.
        $GET MYEXEC
        $
        $START
        @MONNAM.TXT, TOPS-20 MONITOR (VERSION)
                .                       ; let's say you can't do anything
                .                       ; you are in your EXEC
                .                       ; get out, get into MINI-EXEC
        ^P
        INTERRUPT AT 12345
        MX>S                            ; MINI-EXEC prompt followed by start.
        $                               ; you are now in the system EXEC.


IV.     RUNNING YOUR EXEC AS A TOP LEVEL FORK:



     Suppose that you want to run your EXEC as  the  top  level  EXEC,
that  is,  not  running under the system EXEC.  Get into the MINI-EXEC
and get your copy of the EXEC and run it as the top level EXEC.


        EXAMPLE:

        @ENA
        $^EQUIT
        INTERRUPT AT 23456

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 95
EXEC DEBUGGING


        MX>R                  ; Reset else you will MERGE rather than just GET
        MX>G <MYAREA>MYEXEC.EXE.2
        MX>S
        @                     ; Now you are in your EXEC
          .
          .
          .                   ; Let's say you want to get out
        @^P                   ; Control-P to get to MINI-EXEC
        ABORT
        MX>R                  ; "RESET" resets your address space
        MX>E                  ; You are requesting the system EXEC
        @                     ; You are in system EXEC        

NOTE:   If you had typed "S"  rather than "E" above you  would
        have restarted your EXEC.


V.      OTHER INFORMATION:

There is one error message when trying to start DDT;  "?" implies that
you do not have sufficient privileges enabled.


     When searching for symbols you may notice that  the  module  name
DDT  gives  you  is different from the module names that are assembled
for the EXEC.  For example to open the symbol table for EXECED you say
CANDE$:  to DDT.

The following is a correspondence list:

        FILENAME.MAC    INTERNAL REFERENCE
        ==================================
        EXECDE.MAC      XDEF
        EXECGL.MAC      XGLOBS
        EXECPR.MAC      PRIV
        EXEC0.MAC       EXEC0
        EXEC1.MAC       EXEC1
        EXEC2.MAC       EXEC2
        EXEC3.MAC       EXEC3
        EXEC4.MAC       EXEC4
        EXECED.MAC      CANDE
        EXECCS.MAC      CSCAN
        EXECSU.MAC      SUBRS
        EXECMT.MAC      EXECMT
        EXECQU.MAC      EXECQU
        EXECSE.MAC      EXECSE
        EXECP.MAC       EXECP
        EXECVR.MAC      VER
        EXECMI.MAC      MIC

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 96
EXEC DEBUGGING


     The sources and .CTL file for assembling  the  EXEC  are  on  the
SWSKIT.

     If you try to examine a location symbolically  and  get  "U",
implying the symbol is undefined, you may have to  reset  the  symbol
table pointers.  Look in location  770001  for  the  address  that
contains the symbol table pointer, then look at location 116  to  find
the real symbol table  pointer.   Put  the  contents  of  116  in  the
location pointed to by 770001.

        116/   762600,,54463  ; real symbol table pointer

        770001/  776456       ; location of symbol table pointer
        776456/  743200,,23540     762600,,54463
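
The pointer reset shown above can be modelled as follows.  This is a
Python sketch over a small dictionary standing in for memory; the
helper names are hypothetical, and the only values used are the ones
in the typescript.  Words are written in the left,,right half-word
notation that DDT uses.

```python
RIGHT_HALF = 0o777777  # mask for the right 18 bits of a word


def word(left, right):
    """Build a 36-bit word from two 18-bit halves (left,,right)."""
    return (left << 18) | right


def show(w):
    """Format a word in DDT-style left,,right octal notation."""
    return "%o,,%o" % (w >> 18, w & RIGHT_HALF)


# Memory contents taken from the typescript above.
memory = {
    0o116: word(0o762600, 0o54463),     # real symbol table pointer
    0o770001: word(0, 0o776456),        # where DDT keeps its pointer
    0o776456: word(0o743200, 0o23540),  # stale pointer
}


def reset_symbol_pointer(mem):
    """Copy the contents of 116 to the location named by 770001."""
    target = mem[0o770001] & RIGHT_HALF
    mem[target] = mem[0o116]
    return target


target = reset_symbol_pointer(memory)
print("%o/  %s" % (target, show(memory[target])))
```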


VI.     PATCHING

     There is a patch command in DDT.  The form is as follows:

        $<                    ; patch before this instruction
        $$<                   ; patch after this instruction
        $>                    ; end this patch following this instruction

DDT  will  put  the patch in the EXEC patch area.  The symbol is PAT..
DDT will insert JUMPA 1,LOC+1 and JUMPA 2,LOC+2  following  the  patch
you  typed  in,  where  LOC is the location of the instruction you're
patching.  DDT then replaces LOC, the original instruction,  with  a
JUMPA XXXXX, where XXXXX is the patch area location where  your  patch
now is.  Then the patch area symbol  (PAT..)  is  redefined  to  follow
your last patch.
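
The layout DDT builds for a "$<" (patch before) patch can be sketched
in Python as shown below.  The function name is hypothetical, and the
instruction names are borrowed from the DING example that follows;
the sketch only models the resulting patch-area contents, not DDT
itself.

```python
def build_patch(base, offset, original, new_instructions):
    """Lay out a "$<" patch for the instruction at base+offset.

    The patch area receives the new instructions, then the original
    instruction, then two returns.  The second return is used when
    the original instruction skips.
    """
    patch = list(new_instructions)
    patch.append(original)
    patch.append("JUMPA 1,%s+%d" % (base, offset + 1))
    patch.append("JUMPA 2,%s+%d" % (base, offset + 2))
    return patch


patch = build_patch("DING", 1, "PRINT Q3",
                    ["CALL MUMBLE", "CALL FRATZ"])
for n, inst in enumerate(patch):
    label = "PAT.." if n == 0 else "PAT..+%d" % n
    print("%s/ %s" % (label, inst))
```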


     EXAMPLE:

Get a copy of <SYSTEM>EXEC, insert  calls  to  subroutine  MUMBLE  and
subroutine  FRATZ  before  location  DING+1.  DING+1 contains PRINT Q3
originally and contains a JUMPA to the patch  area  after  the  patch.
The patch area will contain:

        CALL MUMBLE
        CALL FRATZ
        PRINT Q3
        JUMPA 1,DING+2
        JUMPA 2,DING+3


USER TYPESCRIPT FOR THE ABOVE:

        @ENABLE
        $GET<SYSTEM>EXEC
        $SAVE NUEXEC          ; you must SAVE and GET in order to write
        $GET NUEXEC           ; enable the EXEC to use DDT not ^EEDDT.
        $DDT
        DDT

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 97
EXEC DEBUGGING


        EXEC0$:               ; open symbols for module where DING is

        DING/ PUSH P,A        ; first location in routine "DING"
        DING+1/ PRINT Q3 $<   ; begin patching before location DING+1
        PAT../ 0  CALL MUMBLE ; DDT opens up PAT.. area, you add code
        PAT..+1/CALL FRATZ    ; continue to insert your patch
        $>                    ; close the patch
        PAT..+2/ PRINT Q3     ; the original instruction being replaced.
        PAT..+3/ JUMPA 1,DING+2       ; DDT inserts this return.
        PAT..+4/ JUMPA 2,DING+3       ; in case of a SKIP inst.

        DING+1/  JUMPA 12345  ; JUMPA to PAT.. replaces original LOC.

        $G                    ; start your copy of EXEC etc.


     Various  methods  may  be  used  to  write-enable  the  EXEC  for
patching.   You  can  use  the  GET,  SAVE method above, or SET PAGE n
COPY-ON-WRITE, or the $W command in DDT to achieve the same results.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 98
RECOVERING FROM A BAD EXEC


                        RECOVERING FROM A BAD EXEC
                        --------------------------


     This procedure is simply a rehash of the procedure for recovering
from  the  case  in  which  the  EXEC  refuses  to  log  in.  For more
information see the article "Looking at the EXEC with DDT".

     If your system version of the EXEC blows up completely,  you  can
recover  rather  easily.   You type a ^C on the CTY, and when the EXEC
blows up you will be dumped into the MINI-EXEC.  Then you can use  the
GET  and  START commands to read in a good version of the EXEC, either
from a copy on disk, or from the distribution magtapes.

     If the problem with the EXEC is that it does not blow up, but
still fails to let you log in, then you have a harder time.  In this
case you have to bring up the system stand-alone, using the switches.
An example of what to do from the point where the BOOT program is
loaded follows:

BOOT>/L                 ; load in the monitor
BOOT>/G141              ; start up EDDT

EDDT
DBUGSW[   0   2         ; set system as debugging
EDDTF[   0   1          ; keep EDDT around

GOTSWM$B                ; set a breakpoint after the swappable
                        ; part of the monitor has been loaded
147$G                   ; start the system
GOTSWM$1B>>   STEX+1/   HRROI T2,BOOTER+51   HRROI T2,FFF
FFF[   ""PS:<SYSTEM>OLD-EXEC.EXE"
FFF:                    ; change the name of the EXEC file
0$1B                    ; remove the GOTSWM breakpoint
$P                      ; proceed to bring up the system

^C                      ; and Control-C to get the new EXEC

If  you had no old version of the EXEC around, then change the name to
some garbage, so that the monitor can't find any such  program.   This
will  then  dump  you into the MINI-EXEC, and then you can read a good
EXEC in from magtape.

     In release 3 of the monitor, there is a new JSYS which is very
useful for debugging new versions of the EXEC.  The CRJOB JSYS allows
you to start up a new job with any program at all as its top-level
fork.  You can also start the job not logged in.  So you can debug
your new versions of the EXEC easily, with no possibility of ripping
yourself off.  Of course, the ^EQUIT, GET from MINI-EXEC sequence is
still a valid way of starting a new top-level fork.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 99
Debugging the GALAXY System


                      Debugging the GALAXY System








1.0  INTRODUCTION

The GALAXY system presents a unique problem to the software specialist
who  is trying to debug one of its components.  Usually, any user mode
program can be debugged under TOPS-20 by running a copy of it,  loaded
with  DDT,  taking  appropriate  care  that nothing is done which will
affect any users of the system.   For  GALAXY,  however,  it  is  very
difficult  to not affect users of the system.  For example, if you are
trying to debug BATCON, you will find that QUASAR  will  very  happily
schedule batch jobs submitted by other users to be run by your BATCON.
If you are not careful, you can cause those batch jobs to be lost,  or
at least slowed down, while you are debugging.

Debugging QUASAR or ORION would be even worse.  Users would see PRINT,
SUBMIT,  etc.   commands  hang  when  you  hit a breakpoint in QUASAR.
Operators would be unable to control any system components if you were
breakpointed  in  ORION.   On  top  of  this,  the monitor knows about
QUASAR, and you may lose messages which  happen  when  users  close  a
spooled lineprinter file, or when a job logs out.

To solve these problems, the concept of a "private GALAXY system"  has
been implemented by software engineering in version 4 of GALAXY.  When
a private GALAXY system  is  operating,  all  of  its  components  are
completely  independent  of  the  primary  GALAXY system.  QUASAR, the
queue maintainer, keeps queues  that  are  separate  from  the  system
queues  and  are  failsofted  to  a different master queue file.  This
QUASAR communicates only with other components  in  the  same  private
system.   It  is  even possible to run several complete private GALAXY
systems, with the restrictions that:

     1.  All components in a private system must run  under  the  same
         user name.

     2.  Only one private system may be run by a given user.

     3.  Each  private  QUASAR  must  be  connected  to  a   different
         directory.

     4.  Each  private  ORION  must  be  connected  to   a   different
         directory.
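
The restrictions above lend themselves to a simple consistency check
before several private systems are started on one machine.  The
following Python sketch is illustrative only.  The function name and
the dictionary keys "user", "quasar_dir" and "orion_dir" are
assumptions made for the example, not anything GALAXY itself provides.

```python
def check_private_systems(systems):
    """Return a list of violated restrictions (empty if there are none).

    systems is a list of dicts, one per private GALAXY system, with the
    hypothetical keys "user", "quasar_dir" and "orion_dir".  Restriction 1
    (all components under one user name) is implied by each system
    having a single "user" entry.
    """
    checks = (
        ("user", "more than one private system for user"),
        ("quasar_dir", "QUASAR directory used twice:"),
        ("orion_dir", "ORION directory used twice:"),
    )
    problems = []
    for key, what in checks:
        seen = set()
        for system in systems:
            value = system[key]
            if value in seen:
                problems.append("%s %s" % (what, value))
            seen.add(value)
    return problems
```

Note that a QUASAR and an ORION belonging to the same private system
may share a directory; only two QUASARs, or two ORIONs, may not.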

TOPS-20 TROUBLE-SHOOTING HANDBOOK                               Page 100
BUILDING A PRIVATE GALAXY SYSTEM


2.0  BUILDING A PRIVATE GALAXY SYSTEM

Since the changes necessary to create a  private  GALAXY  system  were
implemented  in  the version 4 source code, it is relatively simple to
build the system.  The recommended procedure is as follows:

     1.  Create a directory for the private GALAXY system.

     2.  Restore  the  file  EXEC-FOR-DEBUGGING-GALAXY.EXE  from   the
         SWSKIT to this newly created directory.

     3.  Restore each of the following files from  the  "Subsys  files
         for  TOPS20  V4"  saveset on the TOPS-20 distribution tape to
         this directory.

                        BATCON.EXE
                        CDRIVE.EXE
                        GLXLIB.EXE
                        LPTSPL.EXE
                        OPR.EXE
                        ORION.EXE
                        PLEASE.EXE
                        QMANGR.EXE
                        QUASAR.EXE
                        SPRINT.EXE
                        SPROUT.EXE

     4.  For each component in the above list  except  GLXLIB.EXE  and
         QMANGR.EXE, perform the following steps:

         1.  Give the EXEC command "GET xxxxxx.EXE"

         2.  Give the command "DEPOSIT 135 -1"

         3.  Give the command "SAVE xxxxxx"
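
Since step 4 repeats the same three commands for nine components, it
may be convenient to generate the command list mechanically.  The
Python sketch below simply produces the text of the commands given in
the steps above; the function name and the idea of generating the list
by program are assumptions for illustration.  The output still has to
be typed to, or otherwise fed to, the EXEC.

```python
# Components listed in step 3.
COMPONENTS = ["BATCON", "CDRIVE", "GLXLIB", "LPTSPL", "OPR", "ORION",
              "PLEASE", "QMANGR", "QUASAR", "SPRINT", "SPROUT"]

# Step 4 excludes these two from the GET/DEPOSIT/SAVE sequence.
NOT_PATCHED = {"GLXLIB", "QMANGR"}


def build_commands(components=COMPONENTS):
    """Return the EXEC commands of step 4 for every component to be patched."""
    commands = []
    for name in components:
        if name in NOT_PATCHED:
            continue
        commands.append("GET %s.EXE" % name)
        commands.append("DEPOSIT 135 -1")   # set location 135 to -1
        commands.append("SAVE %s" % name)
    return commands


if __name__ == "__main__":
    for line in build_commands():
        print(line)
```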

TOPS-20 TROUBLE-SHOOTING HANDBOOK                               Page 101
EXAMPLE OF A PRIVATE GALAXY BUILD


3.0  EXAMPLE OF A PRIVATE GALAXY BUILD

It is not strictly necessary to restore all of the  GALAXY  components
for  a  one  time  only  debugging session.  To debug a component like
BATCON, you would need at a minimum:

     1.  Your own copy of BATCON

     2.  Your own copy of QUASAR for BATCON to speak to

     3.  Your own copy of ORION for BATCON and QUASAR to speak to

     4.  A copy of OPR to speak to ORION to control BATCON

     5.  An EXEC which knows about your QUASAR to make queue entries

The following is a log of an example build of a private GALAXY system:


 TOPS-20 Command processor 4(560)
@ENABLE (CAPABILITIES) 
$!
$! First connect to a debugging directory
$!
$CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG> 
$!
$! Now build and save debugging .EXE files
$!
$! QUASAR, the queue maintainer
$!
$GET (PROGRAM) SYS:QUASAR.EXE.55 
$DEPOSIT (MEMORY LOCATION) 135 (CONTENTS) -1
 [Shared] 
$SAVE (ON FILE) QUASAR.EXE.1 !New file! (PAGES FROM) 
 QUASAR.EXE.1 Saved
$!
$! ORION, the message clearinghouse
$!
$GET (PROGRAM) SYS:ORION.EXE.53 
$DEPOSIT (MEMORY LOCATION) 135 (CONTENTS) -1
 [Shared] 
$SAVE (ON FILE) ORION.EXE.1 !New file! (PAGES FROM) 
 ORION.EXE.1 Saved
$!
$! OPR, the operator interface
$!
$GET (PROGRAM) SYS:OPR.EXE.55 
$DEPOSIT (MEMORY LOCATION) 135 (CONTENTS) -1
 [Shared] 
$SAVE (ON FILE) OPR.EXE.1 !New file! (PAGES FROM) 
 OPR.EXE.1 Saved
$!
$! BATCON, the batch controller
$!
$GET SYS:BATCON.EXE.39 
$DEPOSIT (MEMORY LOCATION) 135 (CONTENTS) -1
 [Shared] 
$SAVE (ON FILE) BATCON.EXE.1 !New file! (PAGES FROM) 
 BATCON.EXE.1 Saved
$!
$! Now a directory of what we've got
$!
$VDIRECTORY (OF FILES) *.*.* 

   MISC:<HEMPHILL.GALAXY.DEBUG>
 BATCON.EXE.1;P777700    16 8192(36)   13-Feb-80 22:00:37 
 EXEC-FOR-DEBUGGING-GALAXY.EXE.1;P777700
                         82 41984(36)  13-Feb-80 04:33:50 
 OPR.EXE.1;P777700       31 15872(36)  13-Feb-80 22:00:09 
 ORION.EXE.1;P777700     44 22528(36)  13-Feb-80 21:59:45 
 QUASAR.EXE.1;P777700    40 20480(36)  13-Feb-80 21:59:27 

 Total of 213 pages in 5 files
$

TOPS-20 TROUBLE-SHOOTING HANDBOOK                               Page 103
RUNNING THE PRIVATE GALAXY SYSTEM


4.0  RUNNING THE PRIVATE GALAXY SYSTEM

Starting and running a private GALAXY system  is  similar  to  running
GALAXY  in the usual manner.  First QUASAR and ORION are started, then
the component you wish to debug.  You will  also  need  OPR  to  issue
operator  commands and the modified EXEC to make queue entries.  Since
you will need about five jobs, it is usually most  convenient  to  run
each component as a separate subjob under PTYCON.



4.1  Starting QUASAR

QUASAR and ORION should be started before  everything  else.   Nothing
evil happens if you start them last, but all the other components will
be waiting for these two to start.  A suggested procedure is:

     1.  Define a subjob "Q"

     2.  Connect to it

     3.  LOGIN a job under the same user name

     4.  CONNECT that job to  the  directory  in  which  you  did  the
         private GALAXY build

     5.  ENABLE

     6.  RUN QUASAR




4.2  Starting ORION

Starting ORION is as painless as starting QUASAR:

     1.  Define a subjob "O"

     2.  Connect to it

     3.  LOGIN a job under the same user name

     4.  CONNECT that job to  the  directory  in  which  you  did  the
         private GALAXY build

     5.  ENABLE

     6.  RUN ORION

TOPS-20 TROUBLE-SHOOTING HANDBOOK                               Page 104
Starting OPR


4.3  Starting OPR

OPR starts up using the same formula as QUASAR and ORION:

     1.  Define a subjob "OPR"

     2.  Connect to it

     3.  LOGIN a job under the same user name

     4.  CONNECT that job to  the  directory  in  which  you  did  the
         private GALAXY build

     5.  ENABLE

     6.  RUN OPR

     7.  You may now type OPR commands to  see  if  QUASAR  and  ORION
         appear to be healthy.




4.4  Starting The Component To Be Debugged

If the component you wish to debug is QUASAR, ORION, or OPR, then you
have already started it.  Breakpoints could have been set beforehand
and, when they were hit, the component could have been debugged
without any noticeable effect on other users of the system.  If you
wish to debug PLEASE, BATCON, LPTSPL, CDRIVE, SPRINT, or SPROUT, do
the following:

     1.  Define a subjob with an appropriate ID (e.g.  B for BATCON)

     2.  Connect to it

     3.  LOGIN a job under the same user name

     4.  CONNECT that job to  the  directory  in  which  you  did  the
         private GALAXY build

     5.  ENABLE

     6.  GET the component

     7.  Enter DDT

     8.  Set breakpoints, then start the program

TOPS-20 TROUBLE-SHOOTING HANDBOOK                               Page 105
Starting the Modified EXEC


4.5  Starting The Modified EXEC

The file "EXEC-FOR-DEBUGGING-GALAXY.EXE" which has  been  supplied  on
the  SWSKIT  has  exactly two commands added to its repertoire.  These
are "^ESET DEBUGGING-GALAXY" and  "^ESET  NO  DEBUGGING-GALAXY".   The
effect  of  these commands is to select which one of two PIDs (Process
IDs) to communicate with:  the system QUASAR or  the  private  QUASAR.
If  "NO  DEBUGGING-GALAXY" is set, then PRINT, SUBMIT, CANCEL, MODIFY,
and the INFORMATION commands will all  cause  communication  with  the
system  QUASAR.   If "DEBUGGING-GALAXY" is set for this EXEC, then the
commands listed will communicate with the private QUASAR run  by  that
user.

     1.  Define a subjob "E"

     2.  Connect to it

     3.  LOGIN a job under the same user name

     4.  CONNECT that job to  the  directory  in  which  you  did  the
         private GALAXY build

     5.  RUN EXEC-FOR-DEBUGGING-GALAXY

     6.  ENABLE

     7.  ^ESET DEBUGGING-GALAXY

TOPS-20 TROUBLE-SHOOTING HANDBOOK                               Page 106
EXAMPLE DEBUGGING SESSION


5.0  EXAMPLE DEBUGGING SESSION

The following is a log of a sample debugging session:


 TOPS-20 Command processor 4(560)
@!
@! First run PTYCON, so we can control five jobs from one terminal
@!
@PTYCON.EXE.7 
PTYCON> !
PTYCON> ! Now start up QUASAR as subjob Q
PTYCON> !
PTYCON> DEFINE (SUBJOB #) 0 (AS) Q
PTYCON> CONNECT (TO SUBJOB) Q
[CONNECTED TO SUBJOB Q(0)]

 2102 Development System, TOPS-20 Monitor 4(3245)
@LOG HEMPHILL (PASSWORD) 
 Job 21 on TTY222 13-Feb-80 22:18:05
Structure PS: mounted
Structure MISC: mounted
@ENABLE (CAPABILITIES) 
$!
$! Connect to directory where debugging .EXE files are
$!
$CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG> 
$!
$! Finally run the component
$!
$RUN (PROGRAM) QUASAR.EXE.1 
% QUASAR GLXIPC Becoming  [HEMPHILL]QUASAR     (PID = 66000031)
% QUASAR GLXIPC Waiting for ORION to start
^X
PTYCON> !
PTYCON> ! Now start up ORION as subjob O
PTYCON> !
PTYCON> DEFINE (SUBJOB #) 1 (AS) O
PTYCON> CONNECT (TO SUBJOB) O
[CONNECTED TO SUBJOB O(1)]

 2102 Development System, TOPS-20 Monitor 4(3245)
@LOG HEMPHILL (PASSWORD) 
 Job 22 on TTY223 13-Feb-80 22:19:25
Structure PS: mounted
Structure MISC: mounted
@ENABLE (CAPABILITIES) 
$!
$! Connect to directory where debugging .EXE files are
$!
$CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG> 
$!
$! Finally run the component
$!
$RUN (PROGRAM) ORION.EXE.1 
% ORION  GLXIPC Alternate [HEMPHILL]QUASAR     (PID = 66000031)
% ORION  GLXIPC Becoming  [HEMPHILL]ORION      (PID = 70000032)
**** Q(0) 22:19:58 ****
% QUASAR GLXIPC Alternate [HEMPHILL]ORION      (PID = 70000032)
**** O(1) 22:19:58 ****
^X
PTYCON> !
PTYCON> ! Now start up OPR as subjob OPR
PTYCON> !
PTYCON> DEFINE (SUBJOB #) 2 (AS) OPR
PTYCON> CONNECT (TO SUBJOB) OPR
[CONNECTED TO SUBJOB OPR(2)]

 2102 Development System, TOPS-20 Monitor 4(3245)
@LOG HEMPHILL (PASSWORD) 
 Job 23 on TTY224 13-Feb-80 22:20:29
Structure PS: mounted
Structure MISC: mounted
@ENABLE (CAPABILITIES) 
$!
$! Connect to directory where debugging .EXE files are
$!
$CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG> 
$!
$! Finally run the component
$!
$RUN (PROGRAM) OPR.EXE.1 
% OPR    GLXIPC Alternate [HEMPHILL]QUASAR     (PID = 66000031)
% OPR    GLXIPC Alternate [HEMPHILL]ORION      (PID = 70000032)
OPR>
22:19:59          -- Network Node 1031 is Online --

22:19:59          -- Network Node 2137 is Online --

22:19:59          -- Network Node 4097 is Online --

22:19:59          -- Network Node DN20A is Online --

22:19:59          -- Network Node MILL20 is Online --

22:19:59          -- Network Node SYS880 is Online --
OPR>!
OPR>! Let's take a look at our brand new queues
OPR>!
OPR>SHOW QUEUES 
OPR>
22:21:21          --The Queues are Empty--
OPR>SHOW STATUS PRINTER 
OPR>
22:21:27          --There are no Devices Started--
OPR>^X
PTYCON> !
PTYCON> ! Now start up BATCON as subjob B
PTYCON> !
PTYCON> DEFINE (SUBJOB #) 3 (AS) B
PTYCON> CONNECT (TO SUBJOB) B
[CONNECTED TO SUBJOB B(3)]

 2102 Development System, TOPS-20 Monitor 4(3245)
@LOG HEMPHILL (PASSWORD) 
 Job 24 on TTY225 13-Feb-80 22:21:49
Structure PS: mounted
Structure MISC: mounted
@ENABLE (CAPABILITIES) 
$!
$! Connect to directory where debugging .EXE files are
$!
$CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG> 
$!
$! Finally run the component
$!
$RUN (PROGRAM) BATCON.EXE.1 
% BATCON GLXIPC Alternate [HEMPHILL]QUASAR     (PID = 66000031)
% BATCON GLXIPC Alternate [HEMPHILL]ORION      (PID = 70000032)
^X
PTYCON> !
PTYCON> ! Now start up special EXEC as subjob E
PTYCON> !
PTYCON> DEFINE (SUBJOB #) 4 (AS) E
PTYCON> CONNECT (TO SUBJOB) E
[CONNECTED TO SUBJOB E(4)]

 2102 Development System, TOPS-20 Monitor 4(3245)
@LOG HEMPHILL (PASSWORD) 
 Job 19 on TTY226 13-Feb-80 22:23:00
Structure PS: mounted
Structure MISC: mounted
@CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG> 
@!
@! Run the special EXEC, which is provided on the SWSKIT
@!
@RUN (PROGRAM) EXEC-FOR-DEBUGGING-GALAXY.EXE.1 

 TOPS-20 Command processor 4(560)-1
@ENABLE (CAPABILITIES) 
$!
$! Make this EXEC switch from system queues to private queues
$!
$^ESET DEBUGGING-GALAXY 
$!
$! Use ordinary EXEC commands to examine private queues
$!
$INFORMATION (ABOUT) OUTPUT-REQUESTS 
[The Queues are Empty]
$INFORMATION (ABOUT) BATCH-REQUESTS 
[The Queues are Empty]
$!
$! Now switch back to look at system queues
$!
$^ESET NO DEBUGGING-GALAXY 
$INFORMATION (ABOUT) OUTPUT-REQUESTS 

Printer Queue:
Job Name  Req#  Limit            User
--------  ----  -----  ------------------------
* KLERR      6   1197  DEUFEL                     On Unit:0
   Started at 22:05:47, printed 314 of 1197 pages
  XXX        3     18  KAMANITZ                   /Dest:4097
  MS-OUT    18    117  BRAITHWAITE                /Unit:0
There are 3 Jobs in the Queue (1 in Progress)

$INFORMATION (ABOUT) BATCH-REQUESTS 

Batch Queue:
Job Name  Req#  Run Time            User
--------  ----  --------  ------------------------
* DUMP      16  02:00:00  OPERATOR                In Stream:0
    Job# 17 Running DUMPER Last Label: A Runtime 0:23:55
  BATCH      2  00:05:00  BLIZARD                 /Proc:FOO
  SOURCE     8  00:05:00  BLOUNT                  /After:14-Feb-80  0:00
  SRCCOM    12  00:05:00  MURPHY                  /After:14-Feb-80  0:00
  QJD4R     13  00:05:00  SROBINSON               /After:19-Feb-80  0:00
  QAR       10  00:05:00  BLOUNT                  /After:19-Feb-80  0:14
  SAVE       1  00:05:00  FICHE                   /After:19-Feb-80  9:10
There are 7 Jobs in the Queue (1 in Progress)

$!
$! Now let's submit a batch job to our own BATCON
$!
$^ESET DEBUGGING-GALAXY 
$!
$! Make a trivial batch control file
$!
$COPY (FROM) TTY: (TO) A.CTL.1 !New file! 
 TTY: => A.CTL.1

@SY A
^Z
$!
$! And submit the job
$!
$SUBMIT (BATCH JOB) A.CTL.1 
[Job A Queued, Request-ID 1, Limit 0:05:00]
$!
$! Now examine private queues
$!
$INFORMATION (ABOUT) BATCH-REQUESTS 

Batch Queue:
Job Name  Req#  Run Time            User
--------  ----  --------  ------------------------
  A          1  00:05:00  HEMPHILL              
There is 1 Job in the Queue (None in Progress)

$!
$! Our job is in the batch queue, but no batch-streams have been started
$!
$^X
PTYCON> CONNECT (TO SUBJOB) OPR
[CONNECTED TO SUBJOB OPR(2)]

OPR>START (Object) BATCH-STREAM (Stream Number) 0
OPR>
22:25:40        Batch-Stream 0  --Startup Scheduled--

22:25:40        Batch-Stream 0  --Started--
OPR>
22:25:40        Batch-Stream 0  --Begin--
                Job A Req #1 for HEMPHILL
OPR>
22:25:51        Batch-Stream 0  --End--
                Job A Req #1 for HEMPHILL
OPR>
^X
PTYCON> !
PTYCON> ! Cleaning up is easy
PTYCON> !
PTYCON> KILL (SUBJOB) ALL
PTYCON> EXIT (FROM PTYCON) 
@

TOPS-20 TROUBLE-SHOOTING HANDBOOK                               Page 111
TECHNICAL DETAILS


6.0  TECHNICAL DETAILS

This section explains what happens differently when a component has
had location 135 (.JBOPS) poked to -1, and presents a few helpful
tidbits of information about debugging some of the programs.
Incidentally, .JBOPS is the word in the job data area (defined under
TOPS-10) which is reserved for a program's OTS (object time system).
GALAXY references this location by the symbol "DEBUGW".



6.1  GLXLIB

GLXLIB is the GALAXY library.  It consists of  a  code  segment  which
starts  at  address 400000 and a data segment at address 600000.  Each
of the programs QUASAR, ORION, OPR, PLEASE,  BATCON,  LPTSPL,  CDRIVE,
SPRINT,  and  SPROUT uses it.  Part of the initialization code of each
of these programs maps in GLXLIB as a  "high  segment".   This  is  in
effect  an  object  time  system  for  GALAXY, with many commonly used
routines.  Most of the support for the private  GALAXY  system  is  in
this  library,  enough so that OPR, PLEASE, BATCON, LPTSPL, SPRINT and
SPROUT actually have no code which cares whether they are  part  of  a
private  GALAXY.   The  initialization code in each component looks in
three places to find GLXLIB.EXE:  first on the structure and directory
that  the  component  itself came from, second on DSK:, third on SYS:.
This search order is the same for  both  the  system  GALAXY  and  the
private one.
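
The search order just described can be sketched as follows.  This is
an illustrative Python model, not the GLXLIB initialization code
itself; the function name and the caller-supplied "exists" predicate
are assumptions made for the example.

```python
def find_glxlib(component_dir, exists):
    """Return the first filespec at which GLXLIB.EXE is found, or None.

    component_dir is the structure and directory the component itself
    came from, e.g. "MISC:<HEMPHILL.GALAXY.DEBUG>".  exists is a
    hypothetical predicate reporting whether a filespec is present.
    """
    # Order taken from the text: the component's own directory,
    # then DSK:, then SYS:.
    for place in (component_dir, "DSK:", "SYS:"):
        spec = place + "GLXLIB.EXE"
        if exists(spec):
            return spec
    return None
```

One consequence of this order is that a private copy of GLXLIB.EXE
kept alongside the private components is picked up ahead of the one on
SYS:.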

     The actual changes implemented for  the  private  GALAXY  are  as
follows:

     1.  Ordinarily, a component which stopcodes  will  save  a  crash
         file on disk.  When debugging, however, the crash file is not
         written.  In either case, if DDT is loaded with the  program,
         the stopcode will invoke a jump to DDT.

     2.  When debugging, GALAXY components do not require that the
         packets they receive be privileged.

     3.  Ordinarily, QUASAR and ORION get special system PIDs for IPCF
         communications.   When debugging, they get PIDs with names of
         the  form  "[username]QUASAR"  and  "[username]ORION".    All
         GALAXY components will then look for these PID names.  Even a
         pseudo-GALAXY component, such as MOUNTR or  IBMSPL,  will  be
         able to find these PIDs if its location 135 has been poked to
         -1, simply because it uses GLXLIB.

     4.  GALAXY components print messages like:
         "% QUASAR GLXIPC Waiting for ORION to start"
         only while debugging.

     5.  ORION and QUASAR print  messages  about  PIDs  they  acquire,
         like:
         "% QUASAR GLXIPC Becoming  [HEMPHILL]QUASAR     (PID =
         66000031)"

     6.  All components print messages about  the  special  PIDs  they
         find for QUASAR and ORION, like:
         "% ORION  GLXIPC Alternate [HEMPHILL]QUASAR     (PID =
         66000031)"
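
The PID naming rule in item 3 can be written down as a small function.
The Python sketch below is an illustration only.  The function name is
hypothetical, and treating any non-zero value of location 135 as
"debugging" is an assumption based on the QMANGR description; the
build procedure itself always deposits -1.

```python
def pid_name(component, username, debugw):
    """Return the PID name looked up for QUASAR or ORION.

    debugw models the contents of location 135 (DEBUGW).  None is
    returned for the ordinary case, standing for the special system PID.
    """
    if component not in ("QUASAR", "ORION"):
        raise ValueError("only QUASAR and ORION have such PIDs")
    if debugw == 0:
        return None
    return "[%s]%s" % (username, component)
```

For user HEMPHILL this yields "[HEMPHILL]QUASAR" and
"[HEMPHILL]ORION", the names that appear in the messages quoted above.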




6.2  QUASAR

     1.  QUASAR reads and writes private  queues  from  its  connected
         directory.  The full filespec is 
         "DSK:PRIVATE-MASTER-QUEUE-FILE.QUASAR"

     2.  QUASAR does absolutely no  privilege  checking.   Anyone  can
         modify or kill any request in the queues (if they know how to
         speak to this private QUASAR).




6.3  ORION

     1.  ORION  will  create  a   log   file   under   the   name   of
         "DSK:ORION-TEST.LOG"                instead                of
         "PS:<SPOOL>ORION-SYSTEM-LOG.001", and does no renaming of any
         old log files present.

     2.  ORION will not set up any NSP  servers  when  debugging.   It
         therefore  will  not  speak  to  remote nodes to run OPRs for
         them.  However, there  are  hooks  for  ORION  to  initialize
         "SRV:128" instead of the usual "SRV:47" when debugging.




6.4  QMANGR

QMANGR has also been modified to look for a private  QUASAR's  PID  if
the low segment has a non-zero entry in .JBOPS.



6.5  CDRIVE

CDRIVE can be a problem to debug, since it potentially has many
inferior forks all executing the same code.  When debugging,
therefore, each fork automatically loads SDDT into its address space
and jumps to it when it starts up.  After setting any breakpoints or
otherwise modifying this fork's code, the debugger types "GO<ESC>G" to
resume the fork.  While debugging, if the fork terminates (crashes),
CDRIVE will not go through its normal purging of the crashed fork, so
that its status can be examined.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                               Page 113
EXAMINING GALAXY CRASH FILES


7.0  EXAMINING GALAXY CRASH FILES

All GALAXY components use the stopcode facility  supplied  by  GLXLIB.
This  facility  dumps  the  ACs, program error codes, associated error
messages, program version numbers, and the last nine locations of  the
stack  onto  the  controlling  terminal  of  the program executing the
stopcode.  In addition, a crash file is created with the name  of  the
form:   PS:<SPOOL>program-stopcode-CRASH.EXE.  This .EXE file contains
the entire core image  of  the  program  which  has  crashed,  and  is
extremely   useful   in  determining  the  cause  of  the  crash.   In
particular, there is a block of data referred to as the "crash  block"
which usually contains the information most pertinent to the debugger.
This information can be read with either DDT or FILDDT.  Its  contents
are tabulated as follows:

        Location                Data

        .SPC                    PC of stopcode

        .SCODE                  SIXBIT name of stopcode

        .SERR                   Last TOPS-20 error code

        .SACS                   Contents of the sixteen accumulators

        .SPTBL                  Base address of page table used by
                                  GLXMEM

        .SPRGM                  Name of program in SIXBIT

        .SPVER                  Program version number

        .SPLIB                  GLXLIB version number

        .LGERR                  Last GALAXY error code

        .LGEPC                  PC of last GALAXY error return
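
Two of the crash block entries hold SIXBIT names, which DDT can type
out directly but which are easy to decode by hand or by program as
well.  The Python sketch below shows the standard SIXBIT packing (six
6-bit characters per 36-bit word, each character being its ASCII code
minus octal 40) and the crash file name format given above.  The
function names are hypothetical, and the stopcode name used in the
usage note is made up for illustration.

```python
def sixbit_decode(word):
    """Decode a 36-bit word holding six SIXBIT characters."""
    chars = []
    for shift in range(30, -1, -6):
        code = (word >> shift) & 0o77
        chars.append(chr(code + 0o40))   # SIXBIT code + 40 octal = ASCII
    return "".join(chars).rstrip()


def sixbit_encode(name):
    """Pack up to six characters into a 36-bit SIXBIT word."""
    word = 0
    for ch in name.upper().ljust(6)[:6]:
        word = (word << 6) | (ord(ch) - 0o40)
    return word


def crash_file_name(program, stopcode):
    """Form the crash file name described in the text."""
    return "PS:<SPOOL>%s-%s-CRASH.EXE" % (program, stopcode)
```

For example, a .SPRGM word of 616541634162 (octal) decodes to QUASAR,
and crash_file_name("QUASAR", "ABC") gives
PS:<SPOOL>QUASAR-ABC-CRASH.EXE for a hypothetical stopcode ABC.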

TOPS-20 TROUBLE-SHOOTING HANDBOOK                               Page 114
DEBUGGING MOUNTR


                            DEBUGGING MOUNTR



1.0  INTRODUCTION

     This write-up was prepared to assist developers  and  maintainers
in understanding and debugging the TOPS-20 tape and structure mounting
program, MOUNTR.   It  is  assumed  that  the  reader  has  a  working
knowledge  of TOPS-20 assembler language coding and the set of TOPS-20
monitor calls.



2.0  SOURCES OF INFORMATION

     This document will serve primarily as a guide to debugging MOUNTR
crashes.   Much of the information needed to understand the data bases
and the operation of MOUNTR resides within the first 20 or 30 pages of
the MOUNTR code itself.  Just make a listing and start reading.



3.0  MOUNTR CRASHES

When MOUNTR crashes, it saves its core image in the file,

     PS:<SPOOL>MOUNTR-CRASH.EXE

All crashes are initiated by a CALL STOP instruction.  This may result
from  a  logic  inconsistency,  or  it can happen if MOUNTR receives a
software interrupt on a panic channel.  The STOP routine gathers  some
important  data  and saves it in core.  It then types a message giving
the name of the filespec wherein it is  saving  the  core  image,  and
issues  an SSAVE JSYS to save the image.  After restoring the ACs from
the time of the crash, MOUNTR halts.

To begin debugging a MOUNTR crash, follow these steps:

     1.  GET PS:<SPOOL>MOUNTR-CRASH.EXE

     2.  Get into DDT and type STOP1$G.  This will load DDT's ACs with
         MOUNTR's  ACs  at the time of the crash and exit to the EXEC.
         Give the DDT command to the EXEC again to get back into DDT.

     3.  Look at P (AC 17).  If it contains PDL1+something, there  has
         been  a  stack  trap,  and  the routine STOPP was called as a
         result.  The location BADP contains the contents of P at  the
         time of the trap.

     4.  If P contains PDL+something, type TAB to look at the  top  of
         the  stack.   This  will  contain one plus the address of the
         CALL STOP instruction.   Type  TAB  and  ^H  to  display  the
         CALL STOP instruction that invoked the crash.  If MOUNTR died
         as a result of a panic channel interrupt, LPC1  will  contain
         one  plus  the  address  of  the  instruction that caused the
         interrupt.



The following locations and data structures are  helpful  in  locating
the cause of difficulties in MOUNTR:


NAME    FUNCTION
----    --------
CRSHAC  Contains the ACs at the time the STOP routine was called.

LPC1    For crashes caused by panic channel interrupts, LPC1  contains
        one plus the address of the instruction that caused the crash.

MRPDB   PDB for last IPCF message received by MOUNTR

RBUF    Last IPCF message received by MOUNTR (particularly  useful  if
        SSSDAT+1 contains MRCVIH, indicating that MOUNTR crashed while
        processing an incoming IPCF message).

SSSDAT  When MOUNTR crashes, SSSDAT+1  contains  the  address  of  the
        routine that was invoked by MOUNTR's scheduler.  Starting here
        and using the stack, you can trace the execution  of  MOUNTR's
        code that led to the crash.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                             Page 116
DEBUGGING PA1050


                           DEBUGGING PA1050


In order to debug the compatibility package you must have a copy of
the file called PAT.EXE.  PA1050 is just the system name for PAT.  If
there is no copy of PAT.EXE, then take the source program called
PAT.MAC and assemble it, thereby creating a sharable save file called
PAT.EXE.  To debug the compatibility package the following steps are
required.

$RESET
$GET ISAM          ;Where ISAM may be any program you choose
$MERGE PAT         ;PAT is the source name for PA1050
$DDT
PAT$:   MOVBF$b    ;You set your breakpoints here
DEBUG$G
$G                 ;You must type  $G  twice  because  of  the  double
                    symbol table


                                 NOTE

               Some  of  the  error  messages  you  may
               receive  from PA1050 may not be the true
               error  message.   To  have  the  correct
               error  message printed out use an ERJMP,
               or an ERCAL after the JSYS it fails  on.
               For  more information on ERJMP and ERCAL
               refer to  the  Monitor  Calls  Reference
               Manual.



In order to build the compatibility package the  following  steps  are
required.

$LOAD /CREF PAT.MAC
$START
$SAVE PAT
$GET PAT
$DDT
MAKEPF$G
Output file: PA1050.EXE
$
UDDT
40000,,0$X
^Z
$I MEM

The start after loading causes  the  program  to  be  moved  from  its
location  to  its  running location in high core.  The symbol table is
also moved, and the pointer adjusted.  A sharable save file  of  pages
700-777  must  be  made  for  debugging.   This  is  created  when you
MAKEPF$G, then load 40000,,0 in UDDT.  When you type I MEM you  should
now have PA1050.EXE in 700-730.

TOPS-20 TROUBLE-SHOOTING HANDBOOK                             Page 117
COPYING FLOPPY DISKS


                         COPYING FLOPPY DISKS
                         ====================

This is  a description  of the  front end  program COP  (quick  floppy
copy). This program  should be  used to  create backup  copies of  the
distributed set of floppies.

CAUTIONARY NOTES ABOUT FLOPPY DISKS:

1)                 Only IBM floppies should be used.  Other floppies  may
                   destroy the DX11 drives.

2)                 Floppies have  a  finite  life while  mounted  in  the
                   drive. The heads do not  float, and the floppies  turn
                   continuously.  This causes the magnetic surface to  be
                   eaten away. Minimum floppy life is something like  200
                   hours.

3)                 Floppies which are dropped, badly shocked, or used  as
                   frisbees will lose their  sector headers, and will  be
                   good for nothing.

4)                 Never put a floppy which you suspect is bent into  the
                   drive -- it may damage the drive. 

5)                 COP  is discussed  also in  the  Front End File System
                   Specification  manual  in  Volume  14 of  the  TOPS-20
                   Software Notebooks, section 3.2.


COP COMMANDS:

                   The basic COP command string is of the form:

                     COP> <destination device>/<switch>=<source device>

                   To  enter  COP, type a Control-backslash to get to the
                   Parser,  then  MCR COP  to start up COP.  The floppies
                   should already have been mounted with  MCR MOU  (MOUNT),
                   and  should be dismounted with  MCR DMO  (DMOUNT) after
                   the copy.

COP SWITCHES:

                   /HE     Help, types a list of switches
                   /RD     Read Device, check for errors
                   /CP     Copy (default action)
                   /VF     Verify copy (default when copy in effect)
                   /ZE     Zero the device
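
                   The command form and switch list above can be sketched
                   as a small helper that assembles a COP command line.
                   This is only an illustration: the function name and its
                   checks are hypothetical, and nothing here is part of COP
                   or RSX-20F.  It assumes device names end in a colon, as
                   in DX0: and DX1:.

```python
# Hypothetical helper; builds text of the form <dest>/<switch>=<source>.
VALID_SWITCHES = {"HE", "RD", "CP", "VF", "ZE"}  # switches listed above

def cop_command(dest, source, switch=None):
    """Return a COP command line for the given devices and optional switch."""
    for device in (dest, source):
        if not device.endswith(":"):
            raise ValueError(f"device name should end with a colon: {device!r}")
    if switch is None:
        # No switch given: COP's default action (copy, then verify) applies.
        return f"{dest}={source}"
    name = switch.lstrip("/").upper()
    if name not in VALID_SWITCHES:
        raise ValueError(f"unknown COP switch: {switch!r}")
    return f"{dest}/{name}={source}"

print(cop_command("DX1:", "DX0:"))
print(cop_command("DX1:", "DX0:", "/vf"))
```

                   Note that the destination comes first; the helper keeps
                   the same argument order so that the call reads like the
                   command it produces.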

COP EXAMPLE:

                   The  following  sequence  of  commands copies the contents
                   of the floppy in DX0:  (the left hand drive)  onto  the
                   floppy in DX1: and verifies the operation.

                   ^\
                   PAR>MCR MOU
                   MOU>DX0:
                   Mount completed
                   MOU>DX1:
                   Mount completed
                   MOU>^Z
                   ^\
                   PAR>MCR COP
                   COP>DX1:=DX0:
                   COP>^Z
                   ^\
                   PAR>MCR DMO
                   DMO>DX0:
                   Dismount Complete
                   DMO>DX1:
                   Dismount Complete
                   DMO>^Z

                   The copy takes about two minutes, and the verify takes
                   about the same.  Take care to specify the correct source
                   and destination devices, since the destination floppy is
                   overwritten.

CAUTIONARY NOTE--

                   If you COP for many generations, ghost bad blocks  will
                   build  up until RSX declares the floppy useless.  This
                   is because in each generation the bad  block  file  of
                   the  old floppy is copied onto the new one (which will
                   have its bad blocks in different physical  locations).
                   A  way  around  this  is  to  use PIP for any non-boot
                   copies once every several generations.
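
                   The build-up described above can be illustrated with a
                   toy simulation.  It is a sketch under stated assumptions,
                   not a model of the actual RSX bad block file format: the
                   block count, the number of real bad blocks per floppy,
                   and the function name are all invented for illustration.
                   An image copy is modeled as carrying the inherited bad
                   block list along with the new floppy's own bad blocks;
                   a PIP copy is modeled as keeping only the new floppy's
                   own bad blocks.

```python
import random

def listed_bad_blocks(generations, total_blocks=500, real_bad=2,
                      pip_every=None, seed=1):
    """Return the number of listed bad blocks after each copy generation.

    total_blocks and real_bad are arbitrary illustrative values.
    If pip_every is set, every pip_every-th generation is a file copy
    that does not inherit the old floppy's bad block list.
    """
    rng = random.Random(seed)
    listed = set()
    counts = []
    for generation in range(1, generations + 1):
        # Each new floppy has its own real bad blocks, in new locations.
        own = set(rng.sample(range(total_blocks), real_bad))
        if pip_every and generation % pip_every == 0:
            listed = own            # file copy: ghosts are dropped
        else:
            listed = listed | own   # image copy: old entries come along
        counts.append(len(listed))
    return counts

print(listed_bad_blocks(20))               # COP only: never shrinks
print(listed_bad_blocks(20, pip_every=5))  # periodic PIP copy resets it
```

                   With COP alone the listed count can only grow, even
                   though each individual floppy has few real bad blocks.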

TOPS-20 TROUBLE-SHOOTING HANDBOOK                             Page 119
THE SWSKIT TOOLS PROGRAMS


                      THE SWSKIT TOOLS PROGRAMS
                      =========================





Included on the SWSKIT are a number of utility programs, as summarized
below.   These  tools  have proven at least somewhat useful in the past
in a debugging environment.  Most of these programs require  the  user
to  have WHEEL or OPERATOR privileges to work, but most are also of the
"show and tell but don't touch" category, so they are in general "safe"
to run.

We have cleaned up some of the old ones a bit, added a few  new  ones,
and  checked them all out to the extent that they will all run.  There
should even be some documentation, at least a  HELP  file,  with  each
program.

While we do not  actively  "support"  these  programs,  we  are  quite
willing  to accept complaints and suggestions and submissions from the
field.

These are  the  "standard"  tools;   the  Marlboro  Support  Group  is
generally  familiar  with their operation and quirks, and in providing
support to the field may request that one or more of the  programs  be
used at a customer site to diagnose or assist in correcting a problem.
This is generally more effective than random poking about in  DDT,  or
trying  to  learn  the peculiarities of whatever the customer may have
available.

And now, the current collection:




            PROGRAM                     DESCRIPTION


          CHANS               This   program   will   produce   system
                              configuration and status information  on
                              tapes and disks.

          DIRPNT              This program will list the  contents  of
                              the blocks in a disk directory.

          DIRTST              This program will check the  format  and
                              list   any  invalid  data  in  directory
                              files.

          DS                  This  program  will   provide   software
                              diagnostic help concerning the disk file
                              system.

          DSKERR              This program will provide  a  convenient
                              listing of the hard and soft disk errors
                              that have occurred.

          DX20PC              This program will trace the microcode PC
                              in the DX20.

          EXEC-FOR-DEBUGGING-GALAXY
                              This  EXEC  contains  commands  to
                              facilitate  debugging  a  private GALAXY
                              system.

          FILADR              This  program  will  display  the   disk
                              addresses   a  file  is  using,  or  the
                              addresses which are marked  in  the  BAT
                              block.

          JSTRAP              This program will produce information in
                              a  log on any JSYS, including the PC and
                              arguments used.

          MONRD               This program will allow  you  to  easily
                              examine the running monitor.

          READ                This program performs the same action as
                              the   CHECK  FILE  command  to  DS;   it
                              read-checks files for disk errors.

          REV                 This program will allow  you  to  easily
                              alter,  edit,  delete, or obtain
                              information on files, among other
                              operations.

          RSTRSH              This program  will  detect  bug-induced
                              changes  in  the  resident  monitor in a
                              dump file.

          TYPVF7              This program is useful  for  typing  out
                              the contents of a VFU file in a readable
                              form.

          UNITS               This   program   will   produce   status
                              information on disk drives.