
                           TOPS-20 TROUBLE-SHOOTING HANDBOOK

                              Release 4.1 and 6.0 Edition

                                      January 1985

                             TOPS-20 Monitor Support Group
                                 Marlboro Support Group
                                   Software Services
        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                 Page 2


             This document is the TOPS-20 Trouble-Shooting Handbook.   It  is  a
        collection  of  materials  designed to increase the effectiveness of the
        Software Specialist in the field in coping with TOPS-20 problems.   Some
        of  the  common "disasters" to befall TOPS-20 sites are discussed, along
        with debugging methods in general.   Though  the  information  contained
        herein  is  probably  not sufficient to make a Specialist into a TOPS-20
        "wizard", it should help  ease  the  communication  burden  between  the
        Specialist  in  the  field  and  his counterpart in Marlboro and lead to
        quicker resolution of problems.

             This document contains materials from many  sources,  and  presents
        some information not available anywhere else.  Certain sections may be a
        bit dated, but an effort has been made to remove at least  some  of  the
        outdated or incorrect material and to include new articles.

             There is a continuing need to update this document as part  of  the
        SWSKIT  materials,  and  Specialists are encouraged to give the Marlboro
        Support Group feedback on these materials.  This  communication  can  be
        via the Hotline, or by writing to the following address:

                        TOPS-20 Monitor Support Group
                        Digital Equipment Corporation
                        200 Forest Street, MRO1-2/H22
                        Marlboro, Massachusetts  01752
        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                 Page 3

                                   TABLE OF CONTENTS

             1.  INTRODUCTION . . . . . . . . . . . . . . . . . . . . .   2

             2.  TABLE OF CONTENTS  . . . . . . . . . . . . . . . . . .   3

             3.  POLICY STATEMENT . . . . . . . . . . . . . . . . . . .   5

             4.  PRODUCING A GOOD SPR . . . . . . . . . . . . . . . . .   6

             5.  USING SIRUS  . . . . . . . . . . . . . . . . . . . . .   9

             6.  DDT PATCHING THE TOPS-20 MONITOR . . . . . . . . . . .  16

             7.  MAPPING DIRECTORIES IN MDDT  . . . . . . . . . . . . .  20

             8.  RECOVERING FROM DIRECTORY ERRORS . . . . . . . . . . .  22

             9.  MORE ABOUT DIRECTORY PROBLEMS  . . . . . . . . . . . .  25

            10.  JSB AND PSB MAPPING  . . . . . . . . . . . . . . . . .  27

            11.  BREAKPOINTING MULTI-USER CODE  . . . . . . . . . . . .  30

            12.  USING ADDRESS BREAK TO DEBUG THE MONITOR . . . . . . .  32

            13.  RECOVERING FROM SYSTEM DISASTERS . . . . . . . . . . .  35

            14.  LOOKING AT HUNG TAPES  . . . . . . . . . . . . . . . .  41

            15.  A LOOK AT SOME OF THE DISK STUFF . . . . . . . . . . .  45

            16.  DISK FEATURES OF FILDDT  . . . . . . . . . . . . . . .  49

            17.  SUPPORTED DISK DRIVE PARAMETERS  . . . . . . . . . . .  51

            18.  SUPPORTED TAPE DRIVE PARAMETERS  . . . . . . . . . . .  52

            19.  TOPS-20 SCHEDULER TEST ROUTINES  . . . . . . . . . . .  53

            20.  TOPS-20 PAGE ZERO LOCATIONS  . . . . . . . . . . . . .  60

            21.  TOPS-20 MONITOR SECTIONS . . . . . . . . . . . . . . .  64

            22.  TOPS-20 MONITOR PSECTS   . . . . . . . . . . . . . . .  65

            23.  TOPS-20 MONITOR UNIVERSALS . . . . . . . . . . . . . .  66

            24.  KNOWN HARDWARE DEFICIENCIES LIST . . . . . . . . . . .  67

            25.  KS10 CONSOLE INFORMATION . . . . . . . . . . . . . . .  70

            26.  BOOT COMMAND STRING FUNCTIONALITY  . . . . . . . . . .  78

            27.  CRASH ANALYSIS FUNDAMENTALS  . . . . . . . . . . . . .  80

            28.  MORE CRASH ANALYSIS  . . . . . . . . . . . . . . . . .  99

            29.  REFERENCING THE CST ENTRIES UNDER RELEASE 6  . . . . . 119

            30.  THE BUG MACRO  . . . . . . . . . . . . . . . . . . . . 120

            31.  MONITOR BUILDING HINTS . . . . . . . . . . . . . . . . 122

            32.  EXEC DEBUGGING . . . . . . . . . . . . . . . . . . . . 127

            33.  RECOVERING FROM A BAD EXEC . . . . . . . . . . . . . . 133

            34.  DEBUGGING THE GALAXY SYSTEM  . . . . . . . . . . . . . 134

            35.  DEBUGGING MOUNTR . . . . . . . . . . . . . . . . . . . 148

            36.  DEBUGGING PA1050 . . . . . . . . . . . . . . . . . . . 151

            37.  COPYING FLOPPY DISKS . . . . . . . . . . . . . . . . . 152

            38.  THE SWSKIT DOCUMENTATION FILES . . . . . . . . . . . . 154

            39.  THE SWSKIT TOOLS PROGRAMS  . . . . . . . . . . . . . . 157
        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                 Page 5


             There is great confusion concerning the  materials  that  make  up
        the  SWSKIT  tape, and their legal standing.  This memo is an attempt to
        clear up some of those problems.

             The SWSKITs are made up of an assortment of materials  intended  to
        increase  the effectiveness of the software specialist.  These materials
        include program sources not normally distributed or sold for a  premium;
        internal  and  company  confidential documentation, which may be in part
        incomplete or actually incorrect, but supplied for the information value
        on  subsystems  which may be insufficiently documented through the usual
        channels;  documentation  for  specialists  specially  produced  by  the
        corporate  support people;  and utility programs produced and maintained
        to some extent by  corporate  support.   In  addition,  the  SWSKIT  may
        contain  special  or pre-release versions of supported software provided
        for the incremental value a specialist  may  obtain  from  the  software
        under  controlled circumstances.  In time, utilities from the SWSKIT may
        evolve into supported products.

             All of the SWSKIT materials are proprietary to  DIGITAL,  and  were
        never  intended  to  be  just  given  to  the  customer.  Obviously, the
        materials which are otherwise  sold  cannot  be  given  away;   and  the
        company confidential materials should not be.  While it is expected that
        the tools programs may wind up being used at customer sites, neither are
        they gifts to the customer.  An effort must be made to protect DIGITAL's
        rights to these proprietary materials.  For instance,  a  PL90  contract
        retains  rights  to  all materials provided to the customer.  Deleting a
        tool program after use at  a  customer  site  indicates  intent.   There
        should  be  an awareness that if a customer incurs damages due to use of
        some program given to him by  the  specialist,  even  though  improperly
        used, then DIGITAL may be seen to be at least in part responsible.  This
        should be avoided.

             In  summary,  the  SWSKIT  is  a  tool  provided  to  increase  the
        effectiveness  of  the  specialist,  especially  with regard to PL90 and
        debugging activity, but the rights to all materials remain with  DIGITAL
        and the specialist should act accordingly.

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                 Page 6

                                  PRODUCING A GOOD SPR

                   A software specialist is  often  asked  to  assist  with  the
              submission  of  SPRs for a customer.  It is always discouraging to
              have  problems  getting  an  answer  to  an   SPR   for   entirely
              non-technical  reasons.  For that reason, below are some hints for
              producing a "good" SPR which will  help  in  getting  the  problem
              solved more quickly.

              1.0  THE SPR FORM

              Much of the data on the SPR form seems  unimportant  until  it  is
              omitted.   The  line  of  product data is one example.  Try to
              isolate the problem to the correct component, since that will
              determine who first receives the SPR.  This will save the time it
              takes for, say, the COBOL maintainer to determine that the problem
              is not really in COBOL, but in PA1050 or the monitor, and the time
              it takes for the next maintainer to become familiar with the problem.
              Something  which  crashes  the system is ALWAYS a monitor problem,
              even if it is an EXEC command which causes the problem, or a short
              BASIC program.

                   If you really have a problem, be sure to mark  the  "problem"
              box,  and  don't  use  words  like  "we  suggest  you  correct the
              following situation...".  If the people who  handle  the  incoming
              paperwork  think  they  have  a  suggestion,  it  may  get  routed
              elsewhere, and never seen by the appropriate maintainers.   A  few
              problems have been greatly delayed this way.

                   The priority boxes are not super-critical, but if you have  a
              problem  which  is  holding  up production, or crashing the system
              several times a day, try to make a note of that somewhere  in  the
              description  of  the problem and mark the high-priority box.  That
              should let the maintainer know that  a  work-around  may  also  be
              appropriate in the short term.  Customer-marked high priority SPRs
              are generally the first priority for answering.

                   The phone number of the submitter could be important  if  the
              problem  is  of  such a nature that it proves not-reproducible, or
              the  complexity  is  such  that  further clarification  just   to
              understand  the  problem  might  be needed.  Your number here as a
              software specialist provides a more informal contact  than  direct
              maintainer-to-customer  confrontation,  although the customer will
              be contacted directly if that is most expedient.

                   The attachments--be sure to mark some of these boxes  if  you
              send  along  supporting  materials.  Since these can get separated
              from the form, this will help keep them from  getting  permanently
        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                 Page 7
        THE SPR FORM


                   The "DO NOT PUBLISH" box is for security problems and ways to
              crash   the   system.    We   double-check  this  during  incoming
              processing, but if the box is checked you can be sure that the SPR
              will not be published unanswered.

                   Describe the problem as clearly  as  possible  in  the  space
              provided.   Try  to  provide enough detail to easily reproduce the
              problem.  Concentrate on the description of the problem,  and  any
              diagnosis  you  may  have made.  Attempting to declare a "cure" is
              not always a good idea because the actual correction may be of  an
              entirely  different  nature  for a number of reasons.  However, if
              you have something that works, the information could  be  of  use.
              Just  don't  count  on that exact change being the actual fix.  If
              the problem  is  not  reproducible  from  the  description  given,
              chances  are  that  something  you  left  out  is  relevant to the
              problem.  Unless the problem directly concerns them,  things  like
              logical  names,  mounted  structures,  and  other  features  often
              obscure the problem.  For the purpose of the problem  description,
              a terminal listing of an occurrence is often highly desirable, and
              it is sometimes a  good  idea  to  create  a  brand-new  directory
              without  any  fancy  LOGIN.CMD  setups or user groups and so on to
              demonstrate the problem.


                   As above, the listing from a terminal session is often a very
              good  attachment.   Try  to  include all the relevant information.
              Again, sometimes things like logical  names,  file  and  directory
              protections,  user  groups,  and  other  job-state  variables  are
              important and should be  included.   Inclusion  of  data  such  as
              program  version  numbers  and  edit  levels  can be essential for
              products with large numbers of edits.  If you are  complaining  of
              monitor problems, which patches you have installed could be useful
              information.  Terminal sessions should be as  clear  as  possible.
              It  should be made obvious just what is going on or the maintainer
              may just see a series of commands and think "So?".  Concurrent  or
              after-the-fact commenting is one way to accomplish this.

                   Many times there  is  a  program  which  exercises  the  bug.
              Sometimes  these  programs are all right as they are, but often they
              are giant COBOL monsters working on a multi-RP06  data  base,  and
              very  unwieldy  for  a  maintainer  to  try  to work with.  If the
              program can be reduced to a small subset,  do  so.   Many  monitor
              problems turn out to be reproducible from  a  set  of  arguments
              to a single JSYS.  If it is a question of  incorrect  output  from
              some  program, it is helpful to send along all the files needed to
              reproduce the problem, and the files of incorrect output.  In  the
              case  of  programs with multiple edits to field-image, this speeds
              up the maintainer, since he does not have to manually apply  those
              edits  to attempt to recreate your versions, and he can also check
              the installation of the edits, if that  is  appropriate.   And  in
              case  the  problem  proves  to  be not easily reproducible the bad
              output can at least be examined for clues.

                   In the case of a monitor crash, the  problem  may  have  been
              reduced  to  a  program  of less than one page.  It is tempting to
              type this on the front of the SPR and send it in that way.   While
              the  maintainer can type in the program easily enough (if the copy
              is  both  legible  and  correct),  the  submitter  has  been  lax.
              Sometimes,  that short program will not cause a crash, even though
              run thousands of times under varying conditions by the maintainer.
              And  even  when  it  does  cause  the  crash  the  first time, the
              submitter has lengthened the turn-around by not sending  the  dump
              from  the  crash along with the SPR.  Sending the dump solves both
              problems.  If the problem is not reproducible with ease, the  dump
              is  VITAL  to further understanding.  And having the dump to start
              with speeds up the work of the maintainer who now does not need to
              schedule stand-alone time to try to exercise the bug and cause a crash
              so he has a dump to look at.

                   When sending a dump, always send the unrun monitor along with
              it.   If  you  don't, you are just causing a delay in handling the
              problem while the maintainer tries it against the  standard  ones,
              which  involves  finding tapes with the standard ones, and loading
              them...  If you are running an unpatched standard monitor, and you
              refuse  to send it, at least tell which one it is somewhere on the
              form.  The unrun monitor is also useful for checking the existence
              and correct installation of patches when that becomes an issue.

                   The current preferred tape format is 9-track, 1600 bpi, and in
              standard  DUMPER  format,  not  in  INTERCHANGE format, since file
              information can be lost that way.  Take the time to get a  listing
              of  a directory of the tape and include it with the tape.  It will
              help to speed things up:  if it is obvious  from  the  directory
              that something is missing, faster feedback is generated.  There is
              also the indication that the tape will  indeed  be  readable  when
              received,  and  will  partly eliminate the usual first step of the
              maintainer in getting a directory of the tape.

                   As a final word, remember  that  the  SPR  is  now  the  ONLY
              official  mechanism  to  get  software  problems  resolved  in the
              development code for Autopatch  and  future  versions.   NO  other
              method  is guaranteed to work.  So be sure an SPR is generated for
              every problem, preferably by the customer;  and be  sure  the  SPR
              does not make the problem harder to solve.
        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                 Page 9

                                      USING SIRUS

             Did you know that you can dial into a Marlboro  development  system
        and  type  out almost any patch that the Marlboro Support Group has made
        to -10 or -20 software in the last several  years?   The  program  which
        does this is called SIRUS, and with it you can:

             1.  Search through all the patches to a particular product, if  you
                 know a problem exists but don't know what the patch is or don't
                 know if we've heard of the problem.  If you find the patch  you
                 want, you can then type it out.

             2.  Type out a particular patch to a  particular  product,  if  you
                 know what the edit number is.

             3.  Obtain the status of any SPR, including the entire answer if it
                 has been answered.

             By using SIRUS, you can get patches whenever the system is up, even
        if it's two A.M. and the Hotline is closed.  You can print patches in
        your local office without having to wait for a specialist in Marlboro to
        mail  you  a  copy.  You can be sure that the patch you have is correct.
        (Dictating patches over the Hotline is very prone to  errors.)  Even  if
        the  problem you are experiencing cannot be found in SIRUS, you can help
        us when you call by so stating.  We immediately know  that  the  problem
        you are having is a new one.

             There have been several articles  about  SIRUS  in  previous  Large
        Buffers,  but  none have been oriented towards specialists in the field.
        This one is!

             To use SIRUS, dial into system CHERRY in Marlboro, log in, and then
        run it.  In more detail:

             1.  Dial into system CHERRY.  The following number will connect you
                 to the machines in Marlboro at 300 or 1200 baud.

                                    231-1550  (DTN)

                 You will now be talking to  a  MICOM  data  switch  which  will
                  autobaud  your  input  if  you type carriage returns.  It will
                  then prompt for a  system  to  connect  to.   You  should  type
                  "CHERRY"  followed  by a return.  Once the machine notices you,
                  type "SET HOST CHERRY" to ensure  that  you  are  connected  to
                 system  CHERRY.   If  you  get  the message "?Undefined Network
                 Node", the machine is down (try again later).

             2.  To login, type "LOGIN 37,#".  When the machine requests a name,
                 type one in.  You WILL need a password, which you can obtain by
                 calling the Hotline operator.

             3.  To run SIRUS, just type "R SIRUS".  SIRUS takes several seconds
                 to  initialize itself and then prompts you with "PRODUCT [H]*".
                 At this point, type either "10<CRLF>" or  "20<CRLF>"  depending
                 on whether the customer of concern is running TOPS10 or TOPS20.
                 SIRUS then prompts you with "[H] *".   You  are  now  at  SIRUS
                 command level.

             SIRUS has many commands, but only a few  are  of  interest  to  the
        field specialist.  They are:

             1.  H -- for Help.  This may be typed anytime  SIRUS  precedes  its
                 prompt with "[H]".

             2.  EX -- for Exit.  Use this to exit  SIRUS.   Then  type  K/N  to
                 logout, and hang up.

             3.  PP -- for Peruse PCOs.  PCO stands for Product Change Order and
                 essentially  means  a  patch.   This  command  is  used to look
                 through patches for a particular product  if  you  aren't  sure
                 which patch you want.

             4.  GP -- for Get PCO.  This is used to type out a particular patch
                 once you know which one you want.

             5.  GS -- for Get SPR.  Use  this  to  retrieve  information  on  a
                 particular SPR.

             6.  NP -- for New Product.  Use this command if you type the  wrong
                 answer  to  "PRODUCT  [H]*"  as  mentioned  above, or use it in
                 association with the PP command as described below.  SIRUS will
                 prompt you for a product again.

             The three most useful of these commands are PP, GP, and GS.

        3.0  PP Command

             Use this command to peruse the patches for a particular product  --
        e.g.   LINK  or  603  (monitor)  or  BATCON  --  if  you  want to find a
        particular patch you know exists, or if you want to know if the  support
        group  has  heard  of and fixed some problem you are experiencing with a
        product.  After you type "PP<CRLF>" SIRUS will prompt for  a  component.
        Here  type the program you're interested in -- LINK, BATCON or whatever.
        A response of LIST will type the programs SIRUS  knows  about  and  then
        prompt you for a component again.

             Once you type in the component, SIRUS  prompts  with  "[H] PCO #:".
        There are two reasonable responses to this.  The first is ALL.  (Type NO
        to the subsequent question about a file.) This will  give  you  a  short
        summary  of  all  the  patches  available for this product, one line per
        patch.  This includes a PCO number, the SPR for  which  this  patch  was
        written,  the  edit  number  corresponding  to the patch (for the TOPS10
        monitor this is the MCO number),  a  keyword  describing  the  bug,  the
        maintainer  who  wrote  the  patch, and the date it was made.  The other
        response you might type here is simply <CRLF>.  In this case SIRUS  will
        type  out  the  symptom  of  the  newest  PCO,  and then prompt you with
        "NEXT?".  By continuing to type carriage returns, you can type  all  the
        symptoms  of  all  the  patches for this product, from the newest to the
        oldest.  When you have found  the  patch  you  want  (remember  the  PCO
        number), type RETURN to get back to SIRUS command level.
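
             As a sketch only, the one-line summaries printed  in  response  to
        ALL  could  be  picked apart by a short Python script, for instance to
        search a captured terminal log by keyword or  maintainer.   The  layout
        assumed here is inferred from the sample D60SPL listing in this article
        and is not a SIRUS specification;  the name parse_pco_summary  and  the
        dictionary keys are hypothetical.

```python
import re
from typing import Optional

# Assumed layout, inferred from sample output only (not a SIRUS spec):
#   PCO nnn SPR [spr#] [routine [version]] (edit) KEY= keyword maintainer dd-MMM-yy
_SUMMARY = re.compile(
    r"^PCO\s+(?P<pco>\d+)\s+SPR\s*(?P<spr>\d*)\s+"
    r"(?P<routine>.*?)\((?P<edit>[\d,]+)\)\s+KEY=\s*(?P<key>.*?)\s+"
    r"(?P<maintainer>\S+)\s+(?P<date>\d{2}-[A-Z]{3}-\d{2})\s*$"
)


def parse_pco_summary(line: str) -> Optional[dict]:
    """Split one summary line into its fields; return None for other lines."""
    m = _SUMMARY.match(line.strip())
    if m is None:
        return None
    return {
        "pco": int(m.group("pco")),
        # The SPR column is blank for some patches.
        "spr": int(m.group("spr")) if m.group("spr") else None,
        # Routine name and version are optional, so keep them as one string.
        "routine": m.group("routine").strip(),
        # Edit number kept exactly as printed, e.g. "6,026".
        "edit": m.group("edit"),
        "key": m.group("key"),
        "maintainer": m.group("maintainer"),
        "date": m.group("date"),
    }


if __name__ == "__main__":
    sample = "PCO 009 SPR 12588  INTCTC 1   (6,026) KEY= CONTROL C  TEEGARDEN  17-MAY-79"
    print(parse_pco_summary(sample))
```

             A routine name that begins with a digit on a line with a blank SPR
        column would be mistaken for an SPR number, so the result  should  be
        checked against the log by eye.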

             If you did not find your symptom while perusing, and  your  product
        exists  on  both  TOPS10 and TOPS20, you should also search the PCOs for
        the alternate operating system.  To do this, type NP  to  SIRUS  command
        level, and then type in the other product number when SIRUS asks for it.
        Then peruse PCOs for your product as you did before.

        4.0  GP Command

             This is used to print out a patch once you  know  the  PCO  number.
        The PCO number is printed while you are perusing PCOs and is of the form
        10-product-nnn or 20-product-nnn.  After  typing  GP  to  SIRUS  command
        level,  SIRUS  prompts  for a PCO number.  The leading "10-" or "20-" is
        supplied by SIRUS, so your response should be of the form "product-nnn".
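
             The rule above can be sketched in a few lines of Python.   This  is
        an  illustration  only:  the function name gp_response is hypothetical,
        and the check simply follows the 10-product-nnn / 20-product-nnn  form
        described in this section.

```python
def gp_response(full_pco: str) -> str:
    """Return the part of a full PCO number to type at the GP prompt.

    SIRUS supplies the leading "10-" or "20-" itself, so only
    "product-nnn" is typed by the user.
    """
    prefix, sep, rest = full_pco.strip().partition("-")
    if prefix not in ("10", "20") or not sep or "-" not in rest:
        raise ValueError(
            f"not of the form 10-product-nnn or 20-product-nnn: {full_pco!r}"
        )
    return rest


if __name__ == "__main__":
    print(gp_response("20-D60SPL-008"))
```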

             In response, SIRUS types out information about the patch.  The  two
        most  useful  data are labeled VLD and SAE.  VLD stands for validity and
        is the version of the software to  which  the  patch  applies.   SAE  is
        Source  After  Edit  and is the edit or MCO number of the patch.  To get
        the actual text of the patch, respond  YES  to  SIRUS's  question  "Show
        Write-up File?".
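
             When a GP session has been captured to a file, the  VLD  and  SAE
        lines  can  be  pulled out mechanically.  The following is a minimal
        sketch, assuming the header looks like the sample run in this article
        (a  label,  an  optional  colon,  then the value);  the function name
        vld_and_sae is hypothetical.

```python
import re


def vld_and_sae(header_text: str) -> dict:
    """Collect the VLD and SAE values from captured GP output.

    Values are returned exactly as printed; other header lines are ignored.
    """
    wanted = {}
    for line in header_text.splitlines():
        # Label, optional colon, whitespace, then the value as printed.
        m = re.match(r"\s*(VLD|SAE):?\s+(\S.*?)\s*$", line)
        if m:
            wanted[m.group(1)] = m.group(2)
    return wanted


if __name__ == "__main__":
    captured = """PROG:   NEFF
VLD:    103(2304)
SBE     %103 (6,024)
SAE     %103 (6,025)
CRIT:   N
"""
    print(vld_and_sae(captured))
```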

        5.0  GS Command

             This is used to get the status of an SPR.  SIRUS will prompt for an
        SPR  number,  and  then  will  provide  you  with info about the SPR you
        specified.   This  includes  the  site  that  submitted  the  SPR,   the
        specialist  responsible  for  the  SPR,  the date received and the  date
        closed, if the SPR has been answered.  If answered,  it  will  also  say
        whether  or  not an auxiliary file was written for the SPR and what PCOs
        (if any) were included.  The aux file is an introductory paragraph which
        is written for most SPR answers.  For SPRs which do not require patches,
        the aux file constitutes the entire answer.  The aux file can  be  typed
        by  responding YES to "SHOW AUXILIARY FILE?".  The PCOs can be typed out
        with the GP command.

             Finally, if SIRUS begins to give you error messages such  as  "File
        not found", EX from SIRUS and mount a special disk pack with the monitor
        command "MOUNT SIRS:".  Then try again.  This gives you access  to  more
        PCOs and aux files than are normally available.

             For more information, see the example run of SIRUS below, in  which
        user  input  is  shown  underlined, or the article on SIRUS published in
        volume 409 of the Large Buffer.  Finally, SIRUS is for  use  by  DIGITAL
        personnel  only.  DO NOT give out instructions for its use or the system
        CHERRY phone numbers to customers.

        .R SIRUS
         - -----

        PRODUCT [H]* 20
        [H] *PP
        [PCO LIMIT FOR 'D60SPL' IS 15]
        [H] PCO #:<CR>
        DATE: 09-JUL-79 BY: BENCE

        Jobs sent to the LPT queue from D60SPL are  given  a  random
        file name and are billed to OPERATOR.

        DATE: 09-JUL-79 BY: WEISBACH

        If the spooler is pausing, typing a  GO  can  result  in  an
        illegal instruction.

        NEXT? ALL
        PCO 015 SPR 12355             (6,022) KEY= LNAME      BENCE      09-JUL-79
        PCO 014 SPR 12225  OUTOUT     (6,020) KEY= PAUSE      WEISBACH   09-JUL-79
        PCO 013 SPR 11660  LODVFU 6013(6,014) KEY= VFU        WEISBACH   09-JUL-79
        PCO 012 SPR 13244  D60CRE 103 (6,032) KEY= CARD       L.NEFF     06-JUL-79
        PCO 011 SPR        D60CR4 103 (6,015) KEY= CARDS      L.NEFF     03-JUL-79
        PCO 010 SPR        REQUEU 103 (6,030) KEY= CTQMFQ     L.NEFF     14-JUN-79
        PCO 009 SPR 12588  INTCTC 1   (6,026) KEY= CONTROL C  TEEGARDEN  17-MAY-79
        PCO 008 SPR 12881  OUTE.6 103 (6,025) KEY= REQUEUE    NEFF       17-APR-79
        PCO 007 SPR 12139         103 (6,019) KEY= ILLEGAL    WEISBACH   27-OCT-78
        PCO 006 SPR 12005             (0) KEY= SIMULTANEO BENCE      22-SEP-78
        PCO 005 SPR 11672  ENDJOB 103 (6,018) KEY= QUASAR     BENCE      18-SEP-78
        PCO 004 SPR 11841  D60STK 103 (6,016) KEY= BAD        WEISBACH   23-AUG-78
        PCO 003 SPR 11476  TTYOUT 103 (6,010) KEY= OVERWRITE  WEISBACH   12-MAY-78
        PCO 002 SPR 11431  OUTE.6     (6,007) KEY= INTERRUPTS WEISBACH   12-APR-78
        PCO 001 SPR 11456  D60SPL     (6,006) KEY= BLANK      WEISBACH   03-APR-78
        [H] PCO #: RETURN
        [H] *GP
        [H] PCO #: 20-D60SPL-8
        [20-D60SPL-008 RETRIEVED]
        PROG:   NEFF
        KEYS: REQUEUE    /  
        ROUTNS: OUTE.6 /  
        VLD:    103(2304)
        SBE     %103 (6,024)
        SAE     %103 (6,025)
        CRIT:   N
        DOC:    N 
        F/D:    F
        TEST FILE:     :          [        ]
        P-IND:  10
        [WRITE-UP FILE]
        008             NEFF

             If a job is requeued because of a  communications  failure,  with
        D60SPL  reporting  that  the  station  has  signed off, then, when the
        station signs on again, the print file  will  be  restarted  from  its
        beginning, not from the last checkpoint.


             When the  error  is  detected,  routine  OUTE.6  calls  IBACK  to
        backspace  the  file  five  pages.   IBACK  zeroes  the  page counter,
        J$RNPP(J), and rewinds the  file,  in  the  belief  that  the  forward
        spacing  code  will  update  the page count as it skips to the correct
        page.  However, D60SPL discovers the error is not recoverable  and  it
        requeues  the job immediately.  Since the page count is never updated,
        DOREQ requeues the job to start at the beginning of the file.


             Preserve the page at which to resume printing over  the  call  to
        IBACK.  if the job is to be requeued immediately, restore J$RNPP(J) so
        that the job will be requeued and checkpointed five  pages  back  from
        its current position.
        File 1) DSK:D60SPL.MAC[4,1022]  created: 1724 09-Apr-1979
        File 2) DSK:D60SPL.MAC[4,417]   created: 1625 10-Apr-1979

        1)1             LPTEDT==6024                    ;EDIT LEVEL
        1)              LPTWHO==1                       ;WHO LAST PATCHED
        2)1             LPTEDT==6025                    ;EDIT LEVEL
        2)              LPTWHO==1                       ;WHO LAST PATCHED
        1)4     ;*****End of Revision History*****
        2)4     ;6025   If a job printing on a remote printer is interruped by
        2)      ;       a communications failure, requeue to start five pages ba
        2)      ;       instead of at beginning of file.  LLN, SPR # 20-12881,
        2)      ;       10-APR-79
        2)      ;*****End of Revision History*****
        1)179           PUSHJ   P,IBACK                 ;BACKSPACE THE FILE
        1)              PUSHJ   P,INTON                 ;[6007]TURN INTERRUPTS B
                ACK ON
        1)              PUSHJ   P,D60NRY                ;PERFORM "NOT READY" DIA
        1)               JRST   OUTE.7                  ;ERROR IS UNRECOVERABLE
        1)              TELL    OPR,[ASCIZ /![LPT...  continueing!]

        2)179   ;**;[6025] ADD SEVERAL LINES AT OUTE.6 + 13L.  LLN, 10-APR-79
        2)              MOVE    T1,J$RNPP(J)            ;[6025] CALCULATE THE NE
        2)              SUB     T1,N                    ;[6025]  DESTINATION PAG
        2)              PUSH    P,T1                    ;[6025]  AND SAVE IT
        2)              PUSHJ   P,IBACK                 ;BACKSPACE THE FILE
        2)              PUSHJ   P,INTON                 ;[6007]TURN INTERRUPTS B
                ACK ON
        2)              PUSHJ   P,D60NRY                ;PERFORM "NOT READY" DIA
        2)               JRST   [POP    P,J$RNPP(J)     ;[6025] RESTORE PAGE NO.
                 FOR REQUEUE
        2)                       JRST   OUTE.7]         ;[6025] ERROR IS UNRECOV
        2)              POP     P,(P)                   ;[6025] THROW AWAY DESTI
        2)                                              ;[6025] PAGE - FORWARD S
        2)                                              ;[6025] CODE WILL HANDLE
        2)              TELL    OPR,[ASCIZ /![LPT...  continueing!]
        [H] *EX



                            DDT PATCHING THE TOPS-20 MONITOR

             This article discusses how DDT patches are made to TOPS-20.

             From time to time the Marlboro Support Group has  to  describe  and
        explain  the DDT patching of TOPS-20 to Specialists from the field.  The
        following is an explanation, if not a justification,  of  the  way  some
        things are done.

             A DDT patch to TOPS-20 as published is, in essence, a terminal  log
        of a session applying the patch by hand.  This differs from the sometime
        practice of a control file containing only the typein to DDT.   The  raw
        typein  has  a few disadvantages with respect to the log:  It is hard to
        display in a publication format like  the  Software  Dispatch  the  bare
        control  characters like linefeeds and tabs that might be used, and even
        harder to edit around them with the  only  currently  supported  editor,
        EDIT.   In addition, the full typescript allows some confidence building
        (or cause for concern) if the DDT typeout from application of the  patch
        is  (is  not)  the  same  as  the typescript.  The published patch IS an
        actual typescript, and is  "proof"  that  the  patch  CAN  be  correctly
        applied.

             In applying  the  patch,  the  basic  methodology,  lacking  innate
        knowledge,  is  to  just  start  typing from the typescript whenever the
        computer goes into input wait.  Any "$" appearing in a DDT session which
        is  not  the prompt from the enabled EXEC should be the result of typing
        an ESCAPE.  (ESCAPE is sometimes referred to  as  ALTMODE  or  ALT.)  In
        order  to  avoid confusion, we try never to use any dollar sign symbols,
        and we make special note of any that do occur.

             Starting at the top of a session, there are usually a few  comments
        about  the  patch.   If  we  are currently patching multiple releases of
        TOPS-20, the specific release for the patch should be noted here.   Also
        noted  should  be any hardware or monitor dependencies:  KS- or KL-only,
        or 2040, 2060, or ARPA only, etc.

             The first monitor command is an ENABLE, followed by a  GET  of  the
        monitor  file  to be patched.  Unless we are patching an existing patch,
        our published patches always show us patching a "virgin"  monitor  file,
        one  without  any previous patches installed.  You should always be able
        to duplicate the patch typescript yourself on an unpatched monitor.

             At this point we do a START 140 command to get into DDT.  There  is
        a  fine distinction at this step between typing START 140 and typing DDT
        to get into DDT.  START 140 starts up EDDT (Exec-mode  DDT)  running  in
        user  mode,  which is the required action.  Typing DDT to the EXEC would
        merge  SYS:UDDT.EXE  with  the  monitor  EXE  file  and  start  up  UDDT
        (User-mode  DDT), which is not what we want.  In fact, with Release 4 of
        TOPS-20 the EXEC is clever enough to start up EDDT for  us  on  the  DDT
        command  also,  but  even  so, for the sake of consistency, and to avoid
        confusion, published patches should still use START 140.

             After entering DDT, it is common to select the local  symbol  table
        for  the  module  to  be  patched  in  case  there might be local symbol
        conflicts, etc.  This is done using the  MODULE-NAME$:   (ESCAPE  colon)
        command.

             Next follows the body of the patch.  We purposely avoid the fancier
        DDT  commands when applying patches in order to avoid confusion.  We try
        to limit ourselves to a few DDT commands:

                ADDRESS/ (slash)
                                to open the location at ADDRESS
                ADDRESS[ (open-square-bracket)
                                similar to / but types out numeric, not symbolic
                RETURN          to close the current location, storing any new
                                value specified
                LINE-FEED       to close the current location, storing any new
                                value specified, and open the next location
                TAB             a convenience command used to close the current
                                location and open the location specified by the
                                last reference; commonly used to get to and
                                open location FFF immediately after inserting a
                                JRST FFF instruction in the code
                SYMBOL: (colon) to define a symbol at the current location;
                                usually to redefine FFF: further down in the
                                patch space
                FFF$< (ESCAPE open-angle-bracket) or
                FFF$$< (ESCAPE ESCAPE open-angle-bracket)
                                to start a patch in the patch area named FFF
                $> (ESCAPE close-angle-bracket)
                                to terminate a patch, which installs the jumps
                                back to the inline code, redefines the FFF
                                symbol value past the used patch space, and then
                                inserts the initial jump to the patch into the
                                inline code

        Those who apply patches are of course free to use the more sophisticated
        DDT commands to achieve the same effect.

             A few TOPS-20 peculiarities  should  be  explained  here.   TOPS-20
        patches  are  applied  using  the FFF patch area.  The default DDT patch
        area symbol, PAT.., (used if no argument  is  given  to  an  $<  or  $$<
        command)  should  NEVER  be  used.   You  are apt to wind up with system
        crashes since the PAT..  area may not be locked down.  FFF is defined in
        the  module  STG.MAC  (which goes to the customers), and the area is 100
        octal words long for  version  4.1  and  defined  by  the  user-settable
        parameter  FFFSZE in version 6.0 (currently has value 400).  FFF is part
        of the resident monitor code PSECT RSCOD for v4.1  and  the  data  PSECT
        RSDAT  for  v6.0,  and  is always in memory.  Special care must be taken
        when installing patches not to overrun the patch area, which could  also
        result  in system crashes.  The first symbol past the FFF area is DTSCNW
        for v4.1, and SVDTRJ for v6.0.  If that symbol shows up while attempting
        to install a patch, you may be in trouble.


                       For Release 5  and  5.1  of  TOPS-20,  the
                       patch  area  was  moved  and  is no longer
                       found in STG, but in POSTLD, at the end of
                       the  RSDAT  PSECT, and so requires changes
                       to the LINK CCL file to expand  the  area.
                       Care  should  be  taken  so  that the next
                       PSECT is not overlapped with patches.  The
                       space  reserved  is  now  400  (octal) words.

             There is another patch space defined in TOPS-20,  called  SWPF,  in
        the  swappable  portion of the monitor.  We always use FFF in preference
        to SWPF since first, SWPF can only be  used  for  patches  to  swappable
        code,  but  FFF will work for either.  Second, two patch areas in common
        use might be confusing to the customers, specialists, and us.  Third, if
        we  get  a  dump to examine from a customer, we can always check the FFF
        area for possible (bad) patch installation.  SWPF might be swapped  out,
        and not in the dump.

             Unconventionally enough, the symbols FFF, FFF1, and  FFF2  are  all
        defined together in STG.MAC with the same value.  When DDT decides which
        to type out when printing the symbolic form of an address, it finds FFF2
        first,  which accounts for the common appearance of FFF2 in patches.  In
        addition, just the symbol FFF is  redefined  on  patch  installation  to
        always  point  to the first free word of the remaining patch area.  FFF1
        and FFF2 are  never  redefined,  and  so  should  always  point  to  the
        beginning of the initial patch area built into the monitor.  FFF2 should
        never have been explicitly referenced as typeIN to DDT;  any  occurrence
        in  a  patch should be known to be from DDT typeOUT, probably from a DDT
        LINE-FEED command.  This  is  a  common  source  of  error  in  applying
        patches;   writing  over  earlier patch area by typing in the FFF2-based
        addresses.

             Normally,  in  a  DDT  patch,  lines  which  follow   one   another
        immediately in the published patch are the result of typing LINE-FEED at
        the end of the line, and not RETURN and the next address  symbol.   When
        the  $<  and  $$<  commands  are  used, all lines from that point to the
        terminating $> command should have  been  ended  with  LINE-FEED,  using
        successive locations in the patch space.  The patches should show breaks
        in this form by inserting extra blank lines in the  published  patch  to
        indicate a new "sub-section" of the patch.

             The patching session is ended by the ^Z (Control-Z) command to exit
        DDT properly.  The Control-Z command is the correct way to exit from DDT
        when applying patches.  It allows DDT to do any  final  cleanup  it  may
        need  to  do.   Exiting  via  Control-C  is NOT recommended when you are
        installing patches, and is NOT guaranteed to work.

             Finally, the patched monitor is saved away on  a  disk  file.   The
        published  typescript  shows  creating  a  new  generation of the system
        MONITR.EXE file, but a more conservative approach is to save the patched
        monitor  as  some  other  name, and try running it experimentally during
        system time before installing it as the default monitor.

             And now for an annotated example:

        @ENABLE (CAPABILITIES)          !Appropriate releases noted above.
        $GET SYSTEM:MONITR              !Get the monitor
        $START 140                      !Enter user mode EDDT

        ENQ$:                           !Open the symbol table for the module

        FFF/   0   XXX:   410300,,T2    !Store into the patch area and define
        FFF2+1/   0   FFF:              ! label XXX: to point to it; redefine
                                        ! FFF to be the new first unused word
        STRCMP+5/   MOVE T3,T2   FFF$<  !Begin an $< patch at FFF
        FFF/   0   LDB T3,XXX           !This line and the next are ended by
        FFF+1/   0   CAIN T3,5          ! LINE-FEEDs
        FFF+2/   0   RET$>              !Terminate the patch
        FFF+3/   MOVE T3,T2             !These 4 lines are typed out by DDT on
        FFF+4/   JUMPA T1,STRCMP+6      ! terminating the patch
        FFF+5/   JUMPA T2,STRCMP+7
        STRCMP+5/   JUMPA FFF2+1        !And another blank line indicating end
                                        ! of this sub-patch region
        ^Z                              !Control-Z to exit DDT properly
        $SAVE SYSTEM:MONITR             !Save away the patched monitor
         <SYSTEM>MONITR.EXE.2 Saved
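
        The patch-space bookkeeping in the example above can be sketched  in
        Python.   This  is an illustration only, not a tool from the SWSKIT.
        The sizes are the ones quoted in this article (100 octal  words  for
        4.1,  and  FFFSZE = 400, assumed here to be octal, for 6.0).  The "+3"
        is read off the example typescript, where DDT  appended  the  displaced
        instruction and two return jumps after the typed-in words.  The names
        PATCH_AREA_WORDS, patch_cost, and remaining are hypothetical.

```python
# Patch-area sizes in words, as quoted in the article (octal literals).
# The 6.0 value is the default FFFSZE and is assumed to be octal.
PATCH_AREA_WORDS = {"4.1": 0o100, "6.0": 0o400}


def patch_cost(typed_words):
    """Words used by one $< ... $> patch in the example typescript:
    the typed-in words, the displaced instruction, and two return jumps."""
    return typed_words + 3


def remaining(release, used_words):
    """Words left in the FFF area; raises if the area would be overrun."""
    left = PATCH_AREA_WORDS[release] - used_words
    if left < 0:
        raise ValueError("patch area overrun by %o (octal) words" % -left)
    return left


# The annotated example: one data word at XXX, then a three-word patch.
used = 1 + patch_cost(3)
print("used %o, remaining on 4.1: %o" % (used, remaining("4.1", used)))
```

        Running the numbers before typing a long  patch  is  cheaper  than
        discovering DTSCNW or SVDTRJ in the middle of the session.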

                          MAPPING DIRECTORIES IN MDDT

             Release 3 and later of TOPS-20 can take advantage of  the  extended
        addressing features of the model B processor.  Some of the data has been
        reorganized and moved into non-zero sections of  the  addressing  space.
        One  of  the  things  moved was directories.  Directories are now mapped
        into section 2, starting at the beginning of the section.  Thus the  old
        procedure  of  reading  a  user's  directory in MDDT is no longer valid.
        This article describes how to map a directory correctly for  Release  3
        and later.

             You first have to find  out  the  structure  number  and  directory
        number  for  the directory to be mapped.  You can use the TRANSL command
        to get the directory number, or use the  ^EPRINT  command  to  list  the
        directory  information.   As  an  example,  suppose you want to find the
        directory and structure information  for  the  directory  SNARK:<CURDS>.
        You use TRANSL and obtain the results:

                SNARK:<CURDS> (IS) SNARK:[4,117]

        The "programmer number" obtained is the directory number, in octal.   In
        this  example,  the directory number is 117.  If the directory is in bad
        shape, and you can't run TRANSL or use ^EPRINT, you will  have  to  find
        out the directory number by looking at the output from a DLUSER or ULIST
        run, or from BUGCHK output.
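
        Because the number is octal, it is easy to misread when  copying  it
        by  hand.   The  following Python sketch pulls the structure name and
        directory number out of a line shaped like the TRANSL output above.
        The pattern and the function name parse_transl are assumptions made
        for illustration; only the sample line comes from this article.

```python
import re

# Matches the tail of a line such as "SNARK:<CURDS> (IS) SNARK:[4,117]".
# The allowed characters in the structure name are an assumption.
TRANSL_RE = re.compile(
    r"\(IS\)\s+(?P<structure>[A-Z0-9$._-]+):\[(?P<proj>[0-7]+),(?P<prog>[0-7]+)\]"
)


def parse_transl(line):
    """Return (structure name, directory number) from a TRANSL output line.
    The "programmer number" is read as octal, as the text describes."""
    m = TRANSL_RE.search(line)
    if m is None:
        raise ValueError("not a TRANSL result line: %r" % line)
    return m.group("structure"), int(m.group("prog"), 8)


structure, dirnum = parse_transl("SNARK:<CURDS> (IS) SNARK:[4,117]")
print(structure, "directory number %o octal = %d decimal" % (dirnum, dirnum))
```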

             To find the structure number, you have  to  work  harder.   If  the
        structure  is  mounted  as  PS:,  its structure number is always 0.  For
        structures mounted other than PS:, you do the following.  You  get  into
        MDDT,  and  look  at  the  table STRTAB.  This table contains all of the
        addresses of the structure data blocks in the system.  The first word of
        each  structure  data  block  is  the  structure name in SIXBIT.  So you
        search the tables looking for the desired structure.   The  offset  into
        the table STRTAB is then the structure number.  For our example:

                JSYS 777$X
                STRTAB/   ,8[   /   PS
                STRTAB+1/      M^I   /   REL3
                STRTAB+2/      M_%   /   SNARK
        In the example above, you see that PS:  is the first structure, followed
        by the structures REL3:  and SNARK:.  Since the offset into STRTAB was 2
        for SNARK:, the structure number you want is 2.
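
        The search just described can be modelled in a few lines of  Python.
        This  is  a sketch of the logic only:  STRTAB is modelled as a list of
        structure data block addresses and monitor memory as  a  dictionary.
        The addresses are made up, and the names sixbit, unsixbit, and
        structure_number are hypothetical.

```python
def sixbit(name):
    """Pack up to six characters into a 36-bit SIXBIT word, left-justified."""
    word = 0
    for ch in name.upper().ljust(6)[:6]:
        code = ord(ch) - 0o40          # SIXBIT code is ASCII minus 40 octal
        if not 0 <= code < 0o100:
            raise ValueError("character %r has no SIXBIT code" % ch)
        word = (word << 6) | code
    return word


def unsixbit(word):
    """Unpack a 36-bit SIXBIT word into a string, dropping trailing blanks."""
    chars = [chr(((word >> shift) & 0o77) + 0o40) for shift in range(30, -1, -6)]
    return "".join(chars).rstrip()


def structure_number(strtab, memory, name):
    """Return the STRTAB offset whose data block starts with NAME in SIXBIT."""
    want = sixbit(name)
    for offset, sdb_address in enumerate(strtab):
        if sdb_address and memory.get(sdb_address) == want:
            return offset
    return None


# Hypothetical addresses; only the three structure names come from the text.
strtab = [0o54321, 0o56776, 0o12345]
memory = {0o54321: sixbit("PS"), 0o56776: sixbit("REL3"), 0o12345: sixbit("SNARK")}
print("SNARK: is structure number", structure_number(strtab, memory, "SNARK"))
```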

             Knowing the structure number and the directory number, you can  now
        map  the  directory  and  look  at  it.   When  the directory is mapped,
        location DIRORA will point to the area in the monitor where you can find
        it.  To map the directory, you call the routine MAPDIR, which is in  the
        module DIRECT.  It takes two arguments.  The directory  number  goes  in
        AC1,  and the structure number goes in AC2.  For our example, the output
        looks like:

                DIRORA[   740000
                740000/   ?

                1!   117
                2!   2
                CALL MAPDIR$X

                740000[   400300,,100

        The skip return from MAPDIR  means  you  have  successfully  mapped  the
        directory.   You  can  now  look at the whole directory by examining the
        proper locations.  The number of pages that are mapped by MAPDIR is  the
        length  of  a directory, so the whole thing is available to look at.  By
        examining or changing location 740000+N in core, you  are  examining  or
        changing  location  N  of the directory.  When you are finished, you can
        just leave MDDT by jumping to MRETN or by typing ^C.

             In release 3 and after, however, when you examine  location  DIRORA
        after calling MAPDIR, it doesn't have to contain a section zero address.
        If it does, then your machine cannot support extended addressing and the
        monitor  is  running  the  same  as release 2 did.  In this case you can
        ignore the rest of this document.  If your machine  does  have  extended
        addressing,  when  you  examine  location DIRORA you will see the number
        2,,0.  This address is now in section 2 of the monitor.

             For Release 4 of TOPS-20, the various  flavors  of  DDT  have  been
        trained  to  understand  extended  addresses, so the mapping contortions
        used for 3 and  3A  are  unnecessary.   On  extended  machines  one  can
        reference section two directly as below:

                DIRORA[   2,,0

                2,,0[   400300,,100

        When done, you can still just ^C out or jump to MRETN.

             NOTE:  if you have the Release 5  version  of  MDDT/EDDT  that  has
        sticky  current  address  section  (see DDTxx.MEM) then be careful about
        doing an MRETN$G after examining section 2, as a crash will result  from
        transferring to MRETN in section 2.
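
        The address arithmetic in this article is simple  but  easy  to  get
        wrong  by one halfword.  The Python sketch below computes the address
        of word N of a mapped directory for both cases described above  (the
        section-zero value 740000 and the extended value 2,,0) and prints it
        in DDT's "left,,right" style.  The function names are hypothetical.

```python
def xwd_str(word):
    """Format a 36-bit value in DDT style: 'left,,right' in octal,
    or just the right half when the left half is zero."""
    left, right = (word >> 18) & 0o777777, word & 0o777777
    return "%o,,%o" % (left, right) if left else "%o" % right


def mapped_address(dirora, offset):
    """Address of word OFFSET of the directory, given the contents of DIRORA."""
    return dirora + offset


# Word 100 (octal) of the directory in each of the two cases in the text.
print(xwd_str(mapped_address(0o740000, 0o100)))   # section-zero mapping
print(xwd_str(mapped_address(2 << 18, 0o100)))    # section 2 mapping
```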

                            RECOVERING FROM DIRECTORY ERRORS

             Sometimes after a monitor crash due to disk problems, some  of  the
        directories  on  the  system  will  contain  errors.  These errors cause
        BUGCHKs such as DIRFDB, NAMBAD, DIRPG0, and  DIRPG1.   It  is  sometimes
        possible  to  find  the  error  in  the  directory by getting into MDDT,
        mapping the directory, finding what  is  wrong,  and  fixing  it.   This
        procedure is described in the SWSKIT.  However, this is not always easy,
        and may take a lot of time.  It is therefore better  in  many  cases  to
        simply delete the bad directory and recreate it.  This is easy to do for
        most  directories.   But  special  procedures  are  necessary  for   the
        directories  <SYSTEM> and <SUBSYS>.  The rest of this memo will describe
        the methods of recovering from bad directories, handling  in  particular
        the difficult case of the <SYSTEM> directory.

             You can first try to give the EXPUNGE command with the REBUILD  and
        PURGE subcommands.  If the problem with the directory is very simple, it
        may  fix  your  problem.   As  an   example,   suppose   the   directory
        PS:<SICK-DIRECTORY> is incorrect.  You would type:

                $$REBUILD (SYMBOL TABLE)

             If this does not help the problem, you  will  have  to  delete  the
        directory and then recreate it.  Before proceeding, you should make sure
        that any files you can reference are copied  to  another  directory,  or
        else are saved on tape.  Now first try to delete the directory normally,
        as follows:

                $BUILD (USER) PS:<SICK-DIRECTORY>

             If this is successful, then simply recreate  the  directory  again,
        and  restore  the  user's files.  You should recreate the directory with
        the same directory number as it had before, so that DLUSER's  data  will
        still be correct.

             The procedure above will fail if either the directory is mapped  by
        another  job,  or  if  it is totally unusable.  If it is mapped, and the
        directory belongs to an ordinary user, you can wait until the directory is no
        longer  in  use,  or you can take the system stand-alone so that no user
        can reference it.

             If the directory is totally unusable, you will then have to try  to
        delete it the hard way.  Before proceeding, you should try to delete and
        expunge all files in the directory.  This will minimize  the  number  of
        lost  pages  that will result.  Now there are two cases to consider.  If
        the directory is not a sub-directory, you type the following:


             If the directory is a subdirectory, you modify the above command by
        replacing  "ROOT-DIRECTORY"  by  the  name of the next higher directory.
        Thus if the directory was PS:<ANOTHER.BAD-ONE>, you type:

                 <ANOTHER>BAD-ONE.DIRECTORY.1 [OK]

             The above procedure tells the monitor to treat the  directory  file
        like a normal file, and to delete it as such.  This means that any files
        in the directory will become "lost".  The disk pages  can  be  recovered
        later  with  CHECKD.   If  the  above works, you can simply recreate the
        directory and restore the files.

             The only reason the above command should fail is if  the  directory
        is  still  mapped.   For  PS:<SUBSYS>,  you  can  bring  up  the  system
        stand-alone so that no programs are run from it,  and  then  delete  it.
        For  PS:<SYSTEM>,  even taking the system stand-alone will not help, for
        it is always mapped by job 0.  But there are two procedures you can  use
        which do work.

             The safest method can be used if the user's  system  has  mountable
        structures.   If you have built another PS: structure, you can mount the
        pack with the bad directory as an alias, and then the directory will not
        be mapped and can be deleted.  As an example:



             Then you can build the new directory, restore the files to it,  and
        then  use  it  again for your normal PS: pack.  Be sure to build the new
        directory with the same number.  This is especially  important  for  the
        special system directories.

             If you do not have another disk drive or another PS:  disk,  or  if
        you  don't  want  to  bother MOUNTing the disk, you can fix the <SYSTEM>
        area by using MDDT.  The basic idea is to patch the monitor so  that  it
        no longer thinks that the directory is in use.  This is done as follows:


                INTERRUPT AT 17117
                CHKOFN/   JSP CX,.SAVE   JRST RSKP

             Then  you  should  have  no  problems   deleting   the   directory.
        Immediately  after doing the delete, you should reload the system.  When
        the system restarts, you can read the monitor and the EXEC  either  from
        the  distribution  magtape  or from another directory where you had kept
        copies.  Then recreate the <SYSTEM> area, making sure  to  give  it  the
        same  directory number as it had before.  Then you can restore the files
        and let the users back on.  Finally, you should run  CHECKD  to  recover
        the lost pages.

             NOTE:  The special system directory numbers are:
                        1 - <ROOT-DIRECTORY>
                        2 - <SYSTEM>
                        3 - <SUBSYS>
                        4 - <ACCOUNTS>
                        5 - <OPERATOR>
                        6 - <SPOOL>
                        7 - <NEW-SYSTEM>
                       10 - <NEW-SUBSYS>
                       11 - <SYSTEM-ERROR>

                             MORE ABOUT DIRECTORY PROBLEMS


        NOTE -- Use the methods documented in the Operators Guide before
                resorting to the methods below.

             1.  There is a file on the SWSKIT called DIRTST.EXE which will test
                 for inconsistencies in the directory pointers.

                        $RU DIRTST

                 This will tell you just about everything.

             2.  Another program on the SWSKIT is DIRPNT, which prints  out  the
                 contents of the chained FDBs, entire directory, FDB, or symbol
                 table.  To run it:

                        $RU DIRPNT

                 And answer the questions.   This  also  may  not  work  if  the
                 headers are bad.

             3.  If you get a BUGCHK:

                 Go into the monitor with MDDT  and  set  a  breakpoint  at  the
                 BUGCHK  address,  say, FDBBAD.  Do the functions that cause the
                 BUGCHK;  DIR, say.  Trace down the bug.  The relevant  listings
                 are  PROLOG  and  DIRECT.   These give the directory format and
                 useful symbols.

             4.  If the pointers are destroyed or confused you can  map  in  the
                 directory as follows:

                        $^EQUIT                 ; get into MINI-EXEC
                        MX>/                    ; get into MDDT

                        ; To map in the directory, put the directory number
                        ; in AC1.  You can obtain the number from DLUSER or
                        ; TRANSL or BUILD.  The structure number goes in AC2.

                        ; To find  the  structure  number look  at  the  table
                        ; STRTAB.  STRTAB contains a  list of pointers to  the
                        ; SDBs of structures that are mounted.  The  structure
                        ; numbers are equal to the offset into the STRTAB.  To
                        ; find  out  which  structure   has  structure  number
                        ; 3, look at STRTAB+3.  The contents of that  location
                        ; point to the SIXBIT structure name.

                        STRTAB/  54321          ; str number 0
                        STRTAB+1/  56776        ; str no 1
                        STRTAB+2/  12345        ; str no 2
                        12345$6T/       FOO     ; str no 2 is FOO:

                        1/ DIRECTORY NUMBER
                        2/ STR NUMBER
                        CALL MAPDIR$X

                        ; Now you can  look at the  header pointers etc.,  and
                        ; fix things  up  if  you're lucky.
                        ; See the section on system disasters for a checklist
                        ; of things that could be wrong with the directory.
                        ; Go back to the MINI-EXEC.


             5.  If you can't (or don't want to) recover the existing files  you
                 can  delete  the directory and restore the files using a DUMPER
                 tape.  See the previous article for methods of deleting the bad
                 directory.

                An Easy Way to Examine the PSB and JSB of Another Job

        There is an occasional need to look in detail at the state  of  another
        job  on the system.  A common reason for doing this is to find the cause
        and cure of a "hung job" which cannot be logged out.  To find  out  what
        the  job  is doing you usually start by looking at the JSYS stack in the
        PSB.  But you cannot examine such data easily because the fork  data  in
        the  PSB  and  the  job data in the JSB are not in the monitor's address
        space until the fork is run.  If you try to look at the PSB or JSB using
        MDDT  you will see the data for your own fork.  The SWSKIT program MONRD
        can provide just this sort of information, but has  a  few  limitations,
        and one occasionally needs "direct" access to the data for another fork.
        To get it, you must do what the monitor does, and that is to map it.

        The procedure to do so is this:

             1.  Do a "GET" of the file the monitor  was  loaded  from,  usually

             2.  Enter user mode DDT in the file you got, and then do a JSYS 777
                 to get into MDDT.

             3.  Find out the SPT indexes as before, and call MSETMP to map  the
                 PSB or JSB to the USER address space, in the correct place!!

             4.  Return from MDDT, and examine PSB and JSB  locations  directly,
                 and see the correct data in the right place.

             5.  When you are done, just ^C and do a RESET.

        The rest of this document will document step by step how  the  procedure
        above  is done, by using an example.  Assume that we wish to examine the
        state of fork 105, which belongs to job 21.  We then begin:

        @ENABLE                                 !Get a copy of the monitor
        $START 140                              !Get into user DDT

        JSYS 777$X                              !Enter MDDT

        !Following is an example of the procedure to map the JSB of a job:

        FKJOB+105[   25,,2035                   !Get the SPT index of the JSB
                                                !of fork 105

        T1!      2035,,0                        !Put SPT index in left half
        T2!      540000,,JSBPGA                 !* Flags and where to map to
        T3!      JSLSTA'1000-JSBPGA'1000        !Number of pages to map

        CALL MSETMP$X                           !Do the mapping

        !Following is an example of the procedure to map the PSB of a fork:

        FKPGS+105[   2657,,2332                 !Get the SPT index of the PSB
                                                !of fork 105

        T1!      2332,,PSBMAP-PSBPGA            !Put SPT index in left half,
                                                !and offset in right half
        T2!      540000,,PSSPSA                 !* Flags and where to map to
        T3!      PSBMSZ                         !Number of pages to map

        CALL MSETMP$X                           !Do the mapping

        !Example of returning to user mode and looking at data from both
        !the PSB and the JSB of the fork:

        MRETN$G                                 !Return to user mode

        USRNAM[   3                             !Examine job's user name
        USRNAM+1[   422050,,546230   $T;DBELL   

        CTRLTT[   777777,,777777                !Controlling terminal

        FILBYT+MLJFN[   4400,,334010            !Start of data block for JFN 1

        PPC/   T1,,DISXE#+2                     !Current PC of the fork

        PAC+17/   -215,,UPDL+62                 !Current stack pointer

        UPDL/   CHKHO5#                         !First few stack locations
        UPDL+1/   CAM CHKAE0#+12   
        UPDL+2/   CHKHO5#   
        UPDL+3/   CAM CHKAE0#+12   
        UPDL+4/   T1,,.COMND+1   

        !Example of terminating the mapping we have done:

        $RESET                                  !To finish, just quit and reset

        The procedure as given above maps the JSB and PSB write-enabled, so if
        you find something you want to change, you can simply deposit the new
        value into the location.  If you want the data to be write-protected,
        change the 540000 to 500000 in the two steps marked with an asterisk
        (*).

        WARNING:  The procedure of mapping things into your user  address  space
        has  its  limitations.   Mapping  the JSB and PSB works because the user
        core used for mapping was previously empty.  In general,  you  can  only
        map things into your user core if your core pages are either nonexistent
        or are private.  If you call MSETMP or SETMPG and map something  over  a
        shared  page,  the  old  file  page is unmapped without the share counts
        being updated, which prevents your job from logging out later.   To  get
        around  this  problem  you  can  BLT your core image to force all of the
        pages to be private.

        The SWSKIT tools program MONRD is able to examine the JSB and PSB of any
        job/fork  on  the  system,  and is now the preferred method of obtaining
        this sort of information, unless the ability to modify the data  or  use
        advanced features of DDT is required.


             When inserting a breakpoint into the running monitor, you have to
        be  careful  that  no other users will execute the code containing the
        breakpoint.  If some other user hits the breakpoint, they will blow up
        with an illegal instruction since MDDT will not be there to handle the
        breakpoint.  This normally limits the places you can set  breakpoints,
        since  most  of the monitor can be gotten to by any user.  Even if you
        run the system stand-alone, it is possible that the  routine  you  are
        debugging  will  be called by job 0.  However, it is still possible to
        do such debugging, even on a system which is not stand-alone, and this
        document will describe how this is done.

             The essential element of this technique is to put in the patch in
        such  a  way  that  only  your own fork can ever reach the breakpoint.
        First you write a simple routine which will skip if it  is  not  being
        run  by your particular fork.  This can be done easily if you remember
        that the location FORKX contains the currently  running  fork  number.
        An example of such a routine is the following:  

        JSYS 777$X

        FORKX[   23                     ; check our fork number

        FFF/   0   NOTME:   PUSH P,T1   ; save an AC
        NOTME+1/   0   MOVE T1,FORKX    ; get currently running fork number
        NOTME+2/   0   CAIE T1,23       ; is it us=23?
        NOTME+3/   0   AOS -1(P)        ; no, setup skip return
        NOTME+4/   0   POP P,T1         ; restore the saved AC
        NOTME+5/   0   POPJ P,          ; and return to caller
        NOTME+6/   0   FFF:             ; reset the position of FFF

        The  routine above simply saves AC T1, gets the currently running fork
        number, compares it with your own fork number which  you  obtained  by
        looking at location FORKX, and skips if they differ.

             Now assume that you want to set a breakpoint into  the  following
        code, which is in the routine BLKSCN in the module DIRECT.  

        BLKSC2/   HLRZ C,BLKTAB(B)
        BLKSC2+1/   CAME A,C
        BLKSC2+2/   AOBJN B,BLKSC2
        BLKSC2+3/   JUMPGE B,BLKSCE
        BLKSC2+4/   HRRZ B,BLKTAB(B)

        Assume you want the breakpoint at location BLKSC2+3.  You do the
        following:

        BLKSC2+3/   JUMPGE B,BLKSCE   FFF$<     ; patch this location
        FFF/   0   PUSHJ P,NOTME                ; call the NOTME routine
        FFF+1/   0   .$B   JFCL$>               ; our fork if it gets here, so set breakpoint
        FFF+2/   JUMPGE B,BLKSCE
        FFF+3/   JUMPA A,BLKSC2+4
        FFF+4/   JUMPA B,BLKSC2+5
        BLKSC2+3/   JUMPA NOTME+6

        Notice  that  the  breakpoint  has  been  set  in the JFCL instruction
        following the call to NOTME.  Only your fork will execute it,  so  you
        can  now  debug the section of code while other users are executing it
        at the same time.  Remember to remove the breakpoint when you are
        done.

             To run a particular program while  having  breakpoints  set,  you
        must  remember  that  the breakpoint has to be set by the same process
        which you expect to hit it.  So for example, typing ^EQUIT, setting  a
        breakpoint,  returning  to  the EXEC and running your program will not
        work.  You must enter MDDT and set the breakpoints from within the
        program you want to debug.  As an example:

        $GET PROGRAM    ; get the program to be used
        $DDT            ; enter DDT
        JSYS 777$X      ; and enter MDDT from there


        MRETN$G         ; return to the context of the test program
        $G              ; start the test program

                        Using Address Break to Debug the Monitor

        Sometimes when examining a set of dumps, you will notice  the  crashes
        are  caused  by  some  location  being destroyed.  If you have no idea
        where the destruction is done from, finding the problem could be  very
        difficult.   One  useful procedure in such cases is to use the address
        break feature of the hardware to track down the  problem  (except  for
        2020's!).   The  only  problem is that the use of address break is not
        obvious.  This article describes how to use address break in the
        TOPS-20 monitor for releases 4.1 (model A)/5.1 and 6.0 (model B).

             In order to use address break, four things must be done.   First,
        the  current routines the monitor uses to set address breaks for users
        must be disabled.  Secondly, your own address break must be  set  from
        MDDT  or  EDDT.   Thirdly,  instructions  which  you  want  to execute
        properly have to be modified so that they will not cause  an  unwanted
        address  break.  Finally, breakpoints must be placed in the monitor so
        that the state of the monitor can be examined when the  address  break
        occurs.  The following is a step by step example of doing this.

        1.      Load the monitor for debugging, and enter EDDT.  The procedure
                starting from BOOT is the following:

                BOOT>/L                         ;Load monitor but don't start it
                BOOT>/G140                      ;Start EDDT
                DBUGSW/   0   2                 ;Set debugging mode
                EDDTF/   0   1                  ;Keep EDDT once system starts
                GOTSWM$B                        ;Install useful breakpoint
                SYSGO1$G                        ;Start the monitor

                [PS MOUNTED]
                $1B>>GOTSWM   0$1B              ;Remove breakpoint now

        2.      Disable the monitor's normal changing of  the  address  break.

                For Release 4.1 this is currently done at two places:
                KISSAV+4/   DATAO UNPFG1+26   JFCL      ;Disable instruction
                SETBRK+12/   DATAO A   JFCL             ;Here too

                For Release 6 do not change these locations.  Routine STEXDM
                used in the next step will take care of this.

        3.      Set your own address break at the desired location.  Refer  to
                the Hardware Reference Manual for details.  The instruction to
                set an address break is:
                DATAO APR,ADDR          ;Note:  APR = 0
                where ADDR contains the following fields:
                Bits            Description
                ----            -----------
                  9             Break at given address on instruction fetches
                 10             Break at given address on reads
                 11             Break at given address on writes
                 12             0=exec address space, 1=user address space
                13-35           Address to break on.
                So now assume you want  to  catch  a  bug  which  is  blasting
                location  CURDS.   You want to break only for writes, and want
                to use exec virtual space.  Therefore you type the following:
                For Release 4.1:

                FFF/   0   100000000+CURDS      ;Put data in convenient place
                DATAO APR,FFF$X                 ;Set the address break
                For Release 6, STEXDM will set the break and notify the monitor:

                T1/   0   100000000+CURDS       ;Put data in convenient place
                CALL STEXDM$X                   ;Set the address break
        4.      Now you want to disable address  break  for  all  instructions
                which you expect to change the given location.  Assume in this
                example that  only  location  DIDDLE  should  change  location
                CURDS.  Then you do the following for a model B CPU:
                FFF!   IT:                      ;Define location to get old flags
                IT+1!                           ;Old PC
                IT+2!                           ;New flags
                IT+3!   IT+4                    ;New PC
                IT+4!   EXCH IT                 ;Save AC and get old flags
                IT+5!   TLO 1000                ;Set address break inhibit bit
                IT+6!   EXCH IT                 ;Restore flags and AC
                IT+7!   XJRSTF IT               ;Return to caller
                IT+10!   FFF:                   ;Redefine FFF
                DIDDLE/   MOVEM A,CURDS   FFF$< ;Insert patch
                FFF/   0   XPCW IT$>            ;Call above routine
                FFF+1/   0   MOVEM A,CURDS      ;Typed by DDT when finishing patch
                FFF+2/   0   JUMPA A,DIDDLE+1
                FFF+3/   0   JUMPA B,DIDDLE+2
                DIDDLE/   MOVEM A,CURDS   JUMPA IT+10

                The  XPCW IT  instruction is used to save the old PC at IT and
                IT+1,  and take a new PC from IT+2 and IT+3.  There the old PC
                is changed to include the address break inhibit bit.   Then  a
                XJRSTF IT  is  done  which  returns  to  the caller.  The next
                instruction then executes without causing  an  address  break.
                You   have  to  insert  the  XPCW  IT   instruction  at  every
                instruction you want to succeed.
                For model A CPUs the procedure is similar, but a little easier:
                FFF!   IT:                      ;Define location to hold PC
                IT+1!   EXCH IT                 ;Get old PC and save AC
                IT+2!   TLO 1000                ;Set address break inhibit flag
                IT+3!   EXCH IT                 ;Restore PC and AC
                IT+4!   JRSTF @IT               ;Return to caller
                IT+5!   FFF:                    ;Redefine FFF
                DIDDLE/   MOVEM A,CURDS   FFF$< ;Insert patch
                FFF/   0   JSR IT$>             ;Call above routine
                FFF+1/   0   MOVEM A,CURDS      ;Typed by DDT when finishing patch
                FFF+2/   0   JUMPA A,DIDDLE+1
                FFF+3/   0   JUMPA B,DIDDLE+2
                DIDDLE/   MOVEM A,CURDS   JUMPA IT+5
        5.      Now put the breakpoints into  the  monitor  so  that  when  an
                address  break  occurs, you will get into EDDT.  There are two
                locations to patch, one for PI level and one for non-PI level.

                ADRCMP$B                        ;Set breakpoint at non-PI routine
                PFCD23$B                        ;Set breakpoint at PI routine
                $P                              ;Now let the monitor proceed

        6.      When either of the above breakpoints is hit, the flags and  PC
                of  the  instruction which caused the address break will be in
                locations TRAPFL and TRAPPC.    If the address break was  from
                JSYS  level  (breakpoint  was at ADRCMP and location INSKED is
                zero) then an $P will proceed properly.  If the address  break
                was  from  the  scheduler  or  from PI level, doing $P will be
                useless since the monitor will then BUGHLT because it  doesn't
                want to see an address break under these conditions.  However,
                this is ok if all you want  to  do  is  find  the  instruction
                causing the trashing.

             If the location still gets trashed after trying to catch it this
        way, then either your procedure is wrong (for example, you are trying
        this on a 2020, which has no address break feature), the location is
        being changed by some I/O being done (RH20s, DTEs, etc.), or the
        machine is having hardware problems.
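
        The address break word described in step 3 can be built mechanically
        from the bit table given there.  The following Python sketch is only
        an illustration and is not part of the SWSKIT: the function names are
        invented here, and the value used for CURDS is a made-up address, so
        substitute whatever DDT shows on your monitor.  It assumes the usual
        PDP-10 convention that bit 0 is the leftmost bit of the 36-bit word.

```python
# Sketch: build the word that "DATAO APR,ADDR" expects at ADDR.
# Bit n of a 36-bit word (bit 0 leftmost) has the value 1 << (35 - n).

def bit(n):
    return 1 << (35 - n)

BREAK_FETCH = bit(9)            # break on instruction fetches
BREAK_READ = bit(10)            # break on reads
BREAK_WRITE = bit(11)           # break on writes
USER_SPACE = bit(12)            # 0 = exec address space, 1 = user
ADDRESS_MASK = (1 << 23) - 1    # bits 13-35 hold the address


def address_break_word(address, fetch=False, read=False, write=False,
                       user=False):
    """Return the address break word for the given address and conditions."""
    if address & ~ADDRESS_MASK:
        raise ValueError("address does not fit in bits 13-35")
    word = address
    if fetch:
        word |= BREAK_FETCH
    if read:
        word |= BREAK_READ
    if write:
        word |= BREAK_WRITE
    if user:
        word |= USER_SPACE
    return word


def halves(word):
    """Format a word the way DDT prints it: left,,right in octal."""
    return "%o,,%o" % (word >> 18, word & 0o777777)


CURDS = 0o54321                 # hypothetical address, for illustration only
print(halves(address_break_word(CURDS, write=True)))
```

        A write-only break in exec space sets only bit 11, which is the
        100000000 (octal) added to CURDS in the example in step 3.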

                            RECOVERING FROM SYSTEM DISASTERS

        There are some common system  disasters  which  in  many  cases  can  be
        recovered  from  quickly and with a minimum of effort.  The four we will
        discuss in this article are:

             1.  Hung Terminals

             2.  Hung Jobs

             3.  Hung SETSPD

             4.  Trashed Disks

        1.0  HUNG TERMINALS

        Hung terminals are usually the result of one of two problems:  either
        the speed has been set incorrectly for that terminal type, or a problem
        exists between the KL and the front end.  If the problem is a result of an
        improper  speed  setting,  then  simply  resetting  the  speed  will  be
        sufficient.  On the other hand, if the  problem  is  due  to  some  sync
        problem  between  the KL and the 11 then the easiest way to recover from
        this is to reload the front end.  This can be  done  by  depressing  the
        halt switch on the operator's console of the 11 and then placing it back
        in the enable state.  After about fifteen seconds, the message

                                [DECsystem-20 continued]

        should be printed on the CTY.  If this fails to free the terminal, perhaps
        the problem is a hung job.  See the discussion under that heading.

        2.0  HUNG JOBS

        There are a number of circumstances which cause a job to become hung,
        usually waiting for some resource to free up, some share count to
        become zero, etc.  Sometimes these tests will never be satisfied, the
        job has its PSI system turned off, and as a result the job becomes
        hung.  Freeing it up can be very tricky.  The first thing to try is to
        log the job out from some other terminal.  If this doesn't succeed in
        freeing the job up, then the next best thing is to detach the job from
        the terminal and allow it to sit there.  It may be using negligible
        amounts of CPU time and cause no adverse effects to the system.
        Zapping the job may crash the system which, in most cases, is not the
        desirable approach.

        The next time the system is reloaded, be sure to get a dump of the
        system with the hung job and submit it as an SPR (see the SWSKIT article
        about getting informative dumps).

        3.0  HUNG SETSPD

        This is a fairly common problem brought on by some hardware problem.  It
        is possible to bring the system up without running SETSPD under job 0,
        log in, and then try to run SETSPD under some other operator job.  If
        SETSPD then hangs, it is possible to CONTROL/C out of the program, edit
        n-CONFIG.CMD to remove the commands suspected of hanging SETSPD, and
        retry.  In this way, it is possible to continue timesharing while
        waiting for the problem to be resolved.

        To bring the system up without running SETSPD  automatically,  one  need
        only  install  the  following  patch to the MONITOR using EDDT on system
        start up.

                  EDDTF[   0   -1
                  DBUGSW[   0   2
                  [PS MOUNTED]
                  RUNDD3+7/   PUSHJ P,RUNDII   JFCL     (or at RUNDD3+16 for
                  %%No SETSPD

        The system will then come up as usual except that SYSJOB will  not  run.
        After successfully diagnosing the problem with SETSPD, SYSJOB can be run
        by typing


        This will cause all the commands in the SYSJOB.RUN file to  be  executed
        by SYSJOB.

        4.0  TRASHED DISKS

        This is surely  one  of  the  biggest  headaches  facing  a  specialist.
        Trashed  disks  come  in many forms and recovering from these requires a
        good knowledge of the structure of the TOPS-20 file system.

        If the structure cannot  be  mounted,  it  is  because  of  one  of  the
        following reasons:

             1.  Inconsistency in either of the HOM blocks

                 1.  Word HOMNAM (1) of either HOM block not SIXBIT/HOM/

                 2.  Word HOMCOD (176) of either HOM block not 707070

                 3.  Word HOMHOM (5) of first HOM block not 1,,12

                 4.  Word HOMHOM (5) of second HOM block not 12,,1

                 5.  Word HOMFSN (173) of either HOM block not 20040,,47524

                 6.  Word HOMFSN+1 (174) of either HOM block not 51520,,31055

                 7.  Word HOMFSN+2 (175) of either HOM block not 20060,,20040

                 8.  Right half of word HOMLUN (4) of either HOM block either
                     refers to a unit greater than the left half of word HOMLUN
                     or refers to a unit already verified

                 9.  Word HOMSNM (3) of either home block does  not  agree  with

                10.  No disk address for index block  in  word  HOMRXB  (10)  of
                     either HOM block

             2.  Inconsistencies in Root-Directory page 0

                 1.  Directory number in Directory page 0 of Root-Directory  not

                 2.  Directory block type (DRTYP) of Root-Directory page  0  not
                     400300 (.TYDIR)

                 3.  Relative Page number (DRRPN) of Root-Directory page 0 not 0

                 4.  Top of symbol table (DRSTP) of Root-Directory page 0 out of
                     Directory bounds

                 5.  Pointer to first free block (DRFFB) of Root-Directory  page
                     0 not in page 0 of the directory

                 6.  Pointer to Directory Name String (DRNAM) not under start of
                     symbol table

                 7.  Directory name pointer (DRNAM) not 0 and Name string  block
                     length (NMLEN) not at least 2 words long

                 8.  Directory name pointer (DRNAM) not  0  and  directory  name
                     block header (NMTYP) not 400001 (.TYNAM)

                 9.  Password block pointer not  0  and  password  string  block
                     length (NMLEN) not at least 2 words long

                10.  Password block pointer not  0  and  password  string  block
                     header (NMTYP) not 400001 (.TYNAM)

                11.  Account string block pointer not 0 and Account string block
                     length (NMLEN) not at least 2 words long

                12.  Account string block pointer not 0 and Account string block
                     header (NMTYP) not 400001 (.TYNAM)

                13.  Remote alias list pointer not  0  and  Remote  alias  block
                     length (NMLEN) not at least 2 words long

                14.  Remote alias list pointer not  0  and  Remote  alias  block
                     header  (NMTYP) not 400001 (.TYNAM) and so on down the next

             3.  Inconsistencies in Block types  or  free  space  in  subsequent
                 pages of the directory.

                 All blocks in the directory (including free space) begin with a
                 block  header  which  specifies  type  and length.  Immediately
                 following one block should be a header for  a  new  block.   If
                 this scheme is corrupted, the mount will fail.

                 1.  Header of a block not

                     1.  (.TYNAM)  400001

                     2.  (.TYEXT)  400002

                     3.  (.TYACC)  400003

                     4.  (.TYUNS)  400004

                     5.  (.TYFDB)  400100

                     6.  (.TYDIR)  400300

                     7.  (.TYFRE)  400500

                     8.  (.TYFBT)  400600

                     9.  (.TYGDB)  400700

                    10.  (.TYRNA)  401000

                 2.  Header of a block is NAMTYP and Block length not at least 2

                 3.  Header of a block is EXTTYP and block length not at least 2

                 4.  Header of a block is ACCTYP and block length not at least 3

                 5.  Header of a block is USRTYP and block length not at least 3

                 6.  Header of a block is FDBTYP and

                     1.  Block length not at least 30 (.FBLN0) words long

                     2.  Pointer to Author String (.FBAUT) not 0 and points to a
                         block  outside  of  the  directory or points to a block
                         that does not meet the tests for a user name string  as
                         described above.

                     3.  Pointer to Last Writer String (.FBLWR) not 0 and points
                         to  a  block  outside  of  the directory or points to a
                         block that does not meet the  tests  for  a  user  name
                         string block as described above.

                     4.  Pointer to Account String (.FBACT) is not less than  or
                         equal  to  zero and it points to a block outside of the
                         directory or it points to a block that  does  not  meet
                         the tests for an account string block as described
                         above.

                     5.  Pointer to Name String (.FBNAM) is not 0 and it  points
                         to  a  block outside of the directory or it points to a
                         block that does not meet the tests for  a  Name  String
                         Block as described above.

                     6.  Pointer to Extension String (.FBEXT) is not  0  and  it
                         points to a block outside of the directory or it points
                         to a  block  that  does  not  meet  the  tests  for  an
                         Extension String Block as described above.

                 7.  Header of a block is DIRTYP and

                     1.  Header is not on a page boundary

                     2.  Relative page number (DRRPN) not  the  calculated  page

                     3.  Pointer to first free block (DRFFB) does not point to a
                         location within the current directory page

                     4.  Directory number (DRNUM) not 1.

                 8.  Header of a block is FRETYP and block is not at  least  two
                     words or Pointer to next free block (FRNFB) is not zero and
                     points to a location not on the same page as current

                 9.  Last block did not end at DRFTP (address specified on first
                     page of directory)

             4.  BAT blocks inconsistent.

                 1.  Either block does not contain SIXBIT/BAT/ in BATNAM (offset
                     0 in block)

                 2.  Either block does not contain 606060 in BATCOD (offset  176
                     in block)

                 3.  Sector number of the BAT block (BATBLK) not the true sector
                     of block

                 4.  The BAT blocks do  not  compare  exactly  with  each  other
                     through word 176 of the blocks

             5.  Checksum of the Root-directory Index Block does not agree  with
                 the checksum calculated.

                 Checksums are calculated as follows:

                 CHKSUM = 0 ;
                 For I = 0 to 777
                     If XB(I) = 0 then
                         CHKSUM = CHKSUM + I
                     Else
                         CHKSUM = CHKSUM + XB(I) ;

                 where XB is the first word of the index block.
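
                 The checksum rule can also be written out in Python.  This
                 is a sketch only.  It reads the second assignment as the
                 alternative branch, so that an empty slot contributes its
                 index and any other slot contributes its contents.  It also
                 assumes that the sum is simply truncated to 36 bits; the
                 text does not say how the monitor handles overflow.  The
                 function name is invented for the illustration.

```python
# Sketch of the index block checksum rule described above.
WORD_MASK = (1 << 36) - 1       # assumption: result truncated to 36 bits
INDEX_BLOCK_WORDS = 0o1000      # I runs from 0 to 777 (octal)


def index_block_checksum(xb):
    """xb is the list of words of the index block, xb[0] being the first."""
    if len(xb) != INDEX_BLOCK_WORDS:
        raise ValueError("an index block has 1000 (octal) words")
    chksum = 0
    for i, word in enumerate(xb):
        if word == 0:
            chksum += i         # empty slot: add its index
        else:
            chksum += word      # used slot: add its contents
    return chksum & WORD_MASK


empty = [0] * INDEX_BLOCK_WORDS
print("%o" % index_block_checksum(empty))
```

                 Because an empty slot adds its index rather than zero, an
                 index block of all zeros does not checksum to zero.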

        As you can see, there are many things that could be wrong with a
        structure that prevent it from being mounted.  The consistency of the
        structure can be checked quite  easily  using  the  FILDDT  commands  of
        STRUCTURE and DISK, discussed elsewhere in the SWSKIT.
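
        When only a raw copy of a HOM block is at hand, some of the fixed-value
        tests listed under reason 1 can be sketched in Python as follows.  This
        is an illustration only and is no substitute for FILDDT or CHECKD.  The
        word offsets and expected values are the ones quoted in the list above;
        the function names and the way the block is held in a list are
        assumptions made for the example.

```python
# Sketch: check the constant words of a HOM block held as a list of
# 36-bit integers indexed by word offset.

def halfwords(left, right):
    """Combine two 18-bit halves into one word (DDT's left,,right)."""
    return (left << 18) | right


HOMHOM = 0o5
HOMFSN = 0o173
HOMCOD = 0o176

EXPECTED_FSN = [
    halfwords(0o20040, 0o47524),    # HOMFSN
    halfwords(0o51520, 0o31055),    # HOMFSN+1
    halfwords(0o20060, 0o20040),    # HOMFSN+2
]


def check_hom_block(block, first):
    """Return a list of problems; first is True for the first HOM block."""
    problems = []
    if block[HOMCOD] != 0o707070:
        problems.append("HOMCOD is not 707070")
    expected = halfwords(1, 0o12) if first else halfwords(0o12, 1)
    if block[HOMHOM] != expected:
        problems.append("HOMHOM is not the expected value")
    for i, value in enumerate(EXPECTED_FSN):
        if block[HOMFSN + i] != value:
            problems.append("HOMFSN+%o is not the expected value" % i)
    return problems


# A made-up block containing only the constants checked above.
sample = [0] * 0o200
sample[HOMCOD] = 0o707070
sample[HOMHOM] = halfwords(1, 0o12)
sample[HOMFSN:HOMFSN + 3] = EXPECTED_FSN
print(check_hom_block(sample, first=True))
```

        The remaining tests (HOMNAM, HOMLUN, HOMSNM and HOMRXB) depend on the
        other units of the structure and are left out of the sketch.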

        For structures which are badly trashed, the only sane way of  recovering
        is  to  rebuild  the  structure  using  a  catastrophe tape.  For simple
        inconsistencies such as a bad BAT block, CHECKD does the job well.   For
        more involved trashes which cannot be recovered from a backup tape
        (because of a forgetful system manager) the above information can be  of
        great help.
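
        When checking a directory page by hand, the block header tests listed
        under reason 3 amount to a table lookup.  The Python sketch below is an
        illustration only.  The type codes and minimum lengths are taken from
        the list above.  Two things are assumptions: that the names NAMTYP,
        EXTTYP, ACCTYP, USRTYP, FDBTYP and FRETYP refer to the .TYNAM, .TYEXT,
        .TYACC, .TYUNS, .TYFDB and .TYFRE codes, and that the type code and
        length have already been extracted from the header, whose layout is not
        described here.

```python
# Sketch: validate one directory block header against the tables above.
BLOCK_TYPES = {
    0o400001: ".TYNAM",
    0o400002: ".TYEXT",
    0o400003: ".TYACC",
    0o400004: ".TYUNS",
    0o400100: ".TYFDB",
    0o400300: ".TYDIR",
    0o400500: ".TYFRE",
    0o400600: ".TYFBT",
    0o400700: ".TYGDB",
    0o401000: ".TYRNA",
}

# Minimum block lengths in words, where the list above gives one.
MIN_WORDS = {
    ".TYNAM": 2,
    ".TYEXT": 2,
    ".TYACC": 3,
    ".TYUNS": 3,
    ".TYFDB": 0o30,     # .FBLN0
    ".TYFRE": 2,
}


def check_block_header(type_code, length):
    """Return None if the header passes, otherwise a description."""
    name = BLOCK_TYPES.get(type_code)
    if name is None:
        return "unknown block type %o" % type_code
    minimum = MIN_WORDS.get(name, 0)
    if length < minimum:
        return "%s block of %o words is shorter than %o" % (
            name, length, minimum)
    return None


print(check_block_header(0o400100, 0o27))
```

        The pointer tests for FDB and directory blocks need the contents of the
        whole directory and are not covered by this sketch.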

                                 LOOKING AT HUNG TAPES

        A number of problems of the general classification "tape hang" have been
        reported,  and  will  probably  always exist as long as we use magtapes.
        Although there are apparently several variants of the problem, there are
        some  things  which  can  be done by a suitably cautious specialist when
        presented with a hung tape drive.   Listed  below  are  some  techniques
        which can be used in an attempt to investigate and perhaps alleviate the
        problem.  These things should, in general, be harmless to the system,
        barring mis-typing in MDDT.  Because they are conservative, they may
        not clear the problem.

        There are several tables that are used in relation to tape drives.  Some
        of  these tables are indexed by MT unit number, some by MTA unit number.
        In general, it can be  said  that  if  a  table  name  begins  with  the
        characters MT, it will be indexed by MTA or physical unit number, and if
        the table name begins with TL or TP, it will be indexed by MT or logical
        unit  number.   The  TL  and TP tables will usually have something to do
        with the tape labeling system.  This article concerns itself mainly with
        the more important tables relating to MTAs (physical tape units).

        When playing with the tape subsystem, certain care should be taken.  For
        instance,  it  always  helps  if  no one else is actively using the tape
        drives while you attempt something like reloading the  microcode  for  a

        1.  Finding the Tape Drive

        There are several tables  parallel  to  each  other  which  concern  the
        ownership  of  a  tape drive.  Those of interest are DEVNAM, DEVCHR, and
        DEVUNT.  At DEVNAM+n is the device name in SIXBIT.   At  DEVUNT+n  is  a
        word with the left half set to the assigner's job number, -1 if free, or
        -2 if being controlled by the allocator.  The right  half  contains  the
        unit number.  Note that with tape allocation turned on, MTAs will always
        indicate that job 0 has the drive assigned and that the offset to the MT
        unit  number  will contain the job number of a user.  At DEVCHR+n is the
        device characteristics word.  Knowing the device name or the owning job,
        one can use DDT to find the table offset.  See the example below.

        2.  Grabbing the Drive

        Knowing the offsets into DEVUNT, the device assignment can be  freed  by
        putting  -1  into  the  left  half of the appropriate DEVUNT entry.  The
        drive can then be assigned by the normal ASSIGN command to the EXEC.  In
        dealing  with  the  allocator, your own job number can be placed here if
        necessary.  The drive, however, will still be in no state to use.   Note
        that  the  appropriate DEVUNT entry would be the one referring to the MT
        not the MTA.

        3.  Clearing External Errors

        Make sure that there is a tape of some sort mounted, and  the  drive  is
        placed  on-line.   Having  a  write-enable  ring in the tape may help in
        being sure the unit is functional if the hung condition is cleared.

        4.  Checking the UDB

        Next, the Unit Data Block status should be  reset.   This  word  can  be
        found using the MTCUTB table.  This table is indexed by MTA unit number,
        the left half is the address of the channel data block  (CDB),  and  the
        right  half contains the address of the UDB.  The status word of the UDB
        should then be reset to the base state.  The right half should  be  left
        alone--it basically contains drive type.  The left half should have only
        bit 16 set, which indicates  a  tape  type  device  (US.TAP).   The  old
        contents should be remembered for purposes of later analysis.
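
        The arithmetic for the new status word can be checked before it is
        deposited.  The Python sketch below is an illustration only.  It uses
        the sample MTCUTB entry and UDB status word from the example later in
        this article, and it assumes, as that example suggests, that the status
        word is the first word of the UDB.  The function names are invented for
        the illustration.

```python
# Sketch: split an MTCUTB entry and work out the base UDB status word.
RIGHT_HALF = 0o777777


def bit(n):
    """Value of bit n in a 36-bit word, bit 0 being the leftmost."""
    return 1 << (35 - n)


US_TAP = bit(16)                # tape type device


def halves(word):
    """Return (left half, right half) of a 36-bit word."""
    return word >> 18, word & RIGHT_HALF


def ddt(word):
    """Format a word as DDT prints it: left,,right in octal."""
    return "%o,,%o" % halves(word)


def base_udb_status(old_status):
    """Keep the right half (drive type); leave only US.TAP on the left."""
    return US_TAP | (old_status & RIGHT_HALF)


mtcutb_entry = (0o730437 << 18) | 0o730625      # CDB,,UDB for MTA5
cdb, udb = halves(mtcutb_entry)
old_status = (0o102 << 18) | 0o157              # contents of the UDB word

print("CDB at %o, UDB at %o" % (cdb, udb))
print("old %s  new %s" % (ddt(old_status), ddt(base_udb_status(old_status))))
```

        For the sample drive the word 102,,157 becomes 2,,157: the write-locked
        bit is dropped and the drive type in the right half is untouched.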

        5.  Checking the Status

        Now, table MTASTS  is  examined,  indexed  by  MTA  unit  number  again.
        Remember the old contents.  Then clear the word to zero.

        6.  Example

            @enaBLE (CAPABILITIES) 
                        !MTA OFFSETS IN THE DEVxxx TABLES.
                        !DEVNAM HAS SIXBIT DEVICE NAMES
            devnam+21/   HLRZM P2,FKBSPW+217(T1)   $6t;MTA0     
            DEVNAM+22/   MTA1     
            DEVNAM+23/   MTA2     
            DEVNAM+24/   MTA3     
            DEVNAM+40/   MTA17     
            mtan=20             !ROOM FOR 20 (OCTAL) TAPE DRIVES HAS BEEN ALLOCATED
            mtindx[   777765,,5   !BUT ONLY 5 ACTUAL TAPE DRIVES ARE ON THIS SYSTEM
                        !THE MTs WILL APPEAR AFTER MTAs IN THE DEVxxx TABLES SO
            devnam+41/   HLRZM P1,@0   $6t;MT0      
            DEVNAM+42/   MT1      
            DEVNAM+43/   MT2      
            DEVNAM+44/   MT3      
            DEVNAM+60/   MT17
                        !THE MTxxxx TABLES FOR MTAs AND OFFSETS INTO THE TLxxxx AND
                        !TPxxxx TABLES FOR MTs
            devunt+21[   0   !MTA UNIT ZERO (MTA0: FROM DEVNAM ABOVE) ASSIGNED TO JOB 0
            DEVUNT+22[   1   !JOB 0,,MTA1:
            DEVUNT+23[   2   !JOB 0,,MTA2:
            DEVUNT+24[   3   !JOB 0,,MTA3:
            DEVUNT+25[   4   !JOB 0,,MTA4:
            DEVUNT+26[   5   !JOB 0,,MTA5:
            DEVUNT+27[   777777,,6   !UNASSIGNED,,MTA6:
            DEVUNT+40[   777777,,17   !UNASSIGNED,,MTA17:
                        !DV%PSD=400000 INDICATES A PSEUDO DEVICE
            devunt+41[   32,,400000   !PSEUDO DEVICE MT0: IS ASSIGNED TO
                                      !JOB 32 OCTAL (JOB 26 IN DECIMAL)
            DEVUNT+42[   777776,,400001   !CONTROLLED BY ALLOCATOR,,MT1:
            DEVUNT+43[   777776,,400002   !     "     "       "   ,,MT2:
            DEVUNT+44[   777776,,400003   !     "     "       "   ,,MT3:
            DEVUNT+60[   777776,,400017   !     "     "       "   ,,MT17:
            tlabr0[   405000,,0   !BIT 0 INDICATES A VALID VOLUME IS MOUNTED ON MTA5
            mtcutb+5[   730437,,730625   !CDB,,UDB FOR MTA5 BEING USED BY JOB 26
                                         !WHO KNOWS IT AS MT0 (SEE ABOVE)
            730625[   102,,157  !FIRST WORD OF UDB FOR MTA5
                                !US.WLK=1B11  ==> WRITE LOCKED
                                !US.TAP=1B16  ==> TAPE TYPE DEVICE
                                !.UTT70=17B35 ==> TU70
                            !HASN'T BEEN REFERENCED BY THE USER YET
            mretn$g             !TO RETURN TO SDDT FROM MDDT
            ^Z                  !TO RETURN TO THE EXEC FROM SDDT

        If clearing MTASTS and UDBSTS for the drive doesn't seem  to  clear  the
        problem,  you  will probably have to do more digging around to find some
        other, more obscure, inconsistency in the MTA/MT tables.   This  can  be
        accomplished  by referring to the monitor tables under MTA-STORAGE-AREA.
        As always, extreme caution should be exercised while fooling  around  in
        MDDT  as  you can accidentally trash some random location in the monitor
        just by hitting a carriage return at the wrong time.

        One last note should be made about the monitor tables here.  The
        description of the DEVUNT table would lead one to believe that the right
        half will contain a -2 if the device is under control of the allocator.
        In fact, the -2 appears in the left half, as the entries for MT1: through
        MT17: in the example above show (777776,,400001 and so on).
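
        The reading of a DEVUNT entry used in the example can be sketched in
        Python (this is an illustration only, not monitor or DDT code).  It
        assumes that the left half holds a job number, -1 for unassigned, or
        -2 for allocator control, and that the right half holds the unit
        number with DV%PSD as a flag bit, as the example shows.  The function
        name decode_devunt is hypothetical.

```python
# Sketch: interpret one DEVUNT entry given as (left half, right half).
# Values are written in octal, as DDT types them.
DV_PSD = 0o400000        # pseudo-device flag in the right half (from the example)
UNASSIGNED = 0o777777    # -1 as an 18-bit half word
ALLOCATOR = 0o777776     # -2 as an 18-bit half word


def decode_devunt(left, right):
    """Return (owner, is_pseudo, unit) for a DEVUNT word split into halves."""
    if left == UNASSIGNED:
        owner = "unassigned"
    elif left == ALLOCATOR:
        owner = "allocator"
    else:
        owner = "job %d" % left          # job number shown in decimal
    is_pseudo = bool(right & DV_PSD)
    unit = right & 0o377777              # assumption: the rest of the half is the unit
    return owner, is_pseudo, unit


if __name__ == "__main__":
    # Entries copied from the example: DEVUNT+41, DEVUNT+42, DEVUNT+27.
    for halves in [(0o32, 0o400000), (0o777776, 0o400001), (0o777777, 0o6)]:
        print("%o,,%o ->" % halves, decode_devunt(*halves))
```

        Feeding it the three sample entries gives job 26 (decimal) holding
        pseudo device unit 0, the allocator holding pseudo device unit 1, and
        an unassigned physical unit 6, which matches the comments in the
        transcript.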

                           A LOOK AT SOME OF THE DISK STUFF

        This article is a front for the PHYPAR module.  PHYPAR is where the
        information may be reliably obtained, and it should serve as the
        ultimate reference for these problems.

                Much of the system debugging you will have to deal with will
        involve the DEC-20 hardware.  There always seems to be a large gap
        between what the diagnostics can tolerate and what the monitor can
        tolerate in the way of malfunctioning hardware.  The monitor will not
        always point you to the real disk or magtape problem; it may instead
        crash a few minutes after something has gone wrong somewhere else.
        Most of the hardware problems we have had to deal with that were
        really difficult to track down and point out to the Field Service
        rep were problems with disk hardware.  The following information can
        help Field Service trace problems which are not reported by the
        diagnostics.  In most cases the Field Service rep knows what all the
        status bits etc. mean but has not been able to find them in the
        monitor crashes or the running monitor.

                CHNTAB is  an  ordered  list  of  Channel  Data  Block
                addresses starting with channel 0.  RH20-0 data  block
                address is in the first word etc.

                CDB is the Channel Data Block.  There is one CDB per
                channel.  The CDB contains channel dependent
                instructions and data, and pointers to the unit data
                blocks (UDBs) in the case of RP04s, RP05s, and RP06s.  In
                the case of tapes the pointer is to the Kontroller Data
                Block (TM02/3) which points in turn to the UDBs.   The
                CDB also  contains  information  about  the  currently
                active unit.   When  the channel  interrupts,  control
                passes (via  a JSP)  to CDBINT.   The CDB  address  is
                stored in AC1, P1 and the principal analysis  routine,
                PHYINT, is called.
        NOTE:   The CDBs are referenced in modules PHYSIO, PHYH2 (RH20
                code), PHYM2 (TM02/3 code) and PHYP4 (RP04, RP05, RP06,
                and RP07 code).  The Channel Data Block is defined in the
                module PHYPAR.  The address that you get in CHNTAB is
                really a pointer to word 0, which contains the status
                bits for this controller (CDBSTS).  Look in PHYPAR for
                the table definition.  Some words of interest are:

                CDBaddress + CDBSTS:    ; status and configuration information
                CDBaddress + CDBUDB:    ; 8 word table of UDB (or KDB) addresses

                The status bits which are  also defined in PHYPAR  are
                listed here for your convenience:

                CS.OFL==1B0             ; offline
                CS.AC1==1B1             ; primary command active
                CS.AC2==1B2             ; secondary command active
                CS.ACT==CS.AC1!CS.AC2   ; any active
                CS.MAI==1B3             ; channel is in maintenance mode
                CS.MRQ==1B4             ; maintenance mode requested for unit
                CS.ERC==1B5             ; error recovery in progress
                CS.STK==1B6             ; channel supports command stacking
                CS.ACL==1B7             ; alternate command list is current

                BITs 30-32              ; PIA field
                BITs 33-35              ; channel type field
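
                When reading a CDBSTS word out of a dump, the bit list above
                can be applied mechanically.  The following Python sketch
                (an illustration, not monitor code) uses the MACRO
                convention that 1Bn is bit n of a 36-bit word counted from
                the left.  The function names and the sample word are
                made up for the example.

```python
# Sketch: decode a CDBSTS word using the bit definitions listed above.
def bit(n):
    """MACRO-style 1Bn: bit n of a 36-bit word, bit 0 being the leftmost."""
    return 1 << (35 - n)


CS_BITS = [
    ("CS.OFL", 0), ("CS.AC1", 1), ("CS.AC2", 2), ("CS.MAI", 3),
    ("CS.MRQ", 4), ("CS.ERC", 5), ("CS.STK", 6), ("CS.ACL", 7),
]


def decode_cdbsts(word):
    """Return (flag names, PIA field, channel type field)."""
    flags = [name for name, n in CS_BITS if word & bit(n)]
    pia = (word >> 3) & 0o7       # bits 30-32
    chan_type = word & 0o7        # bits 33-35
    return flags, pia, chan_type


if __name__ == "__main__":
    # Made-up sample: primary command active, stacking supported, PIA 5, type 1.
    sample = bit(1) | bit(6) | (0o5 << 3) | 0o1
    print("%o,,%o" % (sample >> 18, sample & 0o777777), decode_cdbsts(sample))
```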

                Kontroller Data Block (KDB).  Also defined in PHYPAR.
                Referenced in PHYM2, PHYPAR, PHYSIO.  Words of
                interest are:

                KDBADDR+KDBSTS:         ; flags unit type
                KDBADDR+KDBUDB:         ; UDB table first word (1 word/UDB)

                Unit Data Block.  There is one UDB per unit associated
                with a CDB or KDB.  The UDB contains information about
                the current activity on the unit in question.  The UDB
                is defined in PHYPAR as well.  Some words of interest
                are noted below.  Look in the listings for the others.

                UDBADDR + UDBSTS:       ; status and configuration info (see below)
                UDBADDR + UDBERR:       ; error recovery status word
                UDBADDR + UDBERP:       ; error reporting work area if non 0
                UDBADDR + UDBRED:       ; reads - sectors if disk, frames if tape
                UDBADDR + UDBWRT:       ; writes - sectors if disk, frames if MTA
                UDBADDR + UDBSRE:       ; soft read errors
                UDBADDR + UDBSWE:       ; soft write errors
                UDBADDR + UDBHRE:       ; hard read errors
                UDBADDR + UDBHWE:       ; hard write errors
                UDBADDR + UDBPS1:       ; current cylinder if disk, cur file if MTA
                UDBADDR + UDBPS2:       ; current sector within cyl if disk, record
                                        ;  in file if tape
                UDBADDR + UDBSPE:       ; soft positioning error
                UDBADDR + UDBHPE:       ; hard positioning error        

                                        ; NOTE - there are several other UDB words
                                        ; including a device dependent portion


                Status bits in UDBSTS:

                US.OFS==1B0     ; off line or unsafe
                US.CHB==1B1     ; check HOME blocks before any normal I/O
                US.POS==1B2     ; positioning in progress
                US.ACT==1B3     ; active
                US.BAT==1B4     ; on if bad BAT blocks on this unit
                US.BLK==1B5     ; lock bit for this unit's BAT blocks
                US.PGM==1B6     ; dual port switch in (A or B)
                US.MAI==1B7     ; unit is in maintenance mode
                US.MRQ==1B8     ; maintenance mode requested on this unit
                US.BOT==1B9     ; unit is at BOT
                US.REW==1B10    ; unit is rewinding
                US.WLK==1B11    ; unit is write locked
                US.MAL==1B12    ; maintenance mode allowed on this unit
                US.OIR==1B13    ; operator intervention required, set at
                                ;  interrupt level, checked periodically.
                US.OMS==1B14    ; once a minute message to operator,  used in
                                ;  conjunction with US.OIR.
                US.PRQ==1B15    ; positioning required on this unit
                US.TAP==1B16    ; device type tape
                US.PSI==1B17    ; tape - online/offline/rewind done transition


                Unit type codes (octal) in the low-order bits of UDBSTS:

                .UTRP4 = 1      ; RP04
                .UTRS4 = 2      ; RS04 (drum)
                .UTT16 = 3      ; TU16 (TU45)
                .UTTM2 = 4      ; TM02 as a unit
                .UTRP5 = 5      ; RP05
                .UTRP6 = 6      ; RP06
                .UTRP7 = 7      ; RP07
                .UTRP8 = 10     ; RP08
                .UTRM3 = 11     ; RM03
                .UTTM3 = 12     ; TM03 AS A UNIT
                .UTT77 = 13     ; TU77
                .UTTM7 = 14     ; TM78
                .UTT78 = 15     ; TU78
                .UTDXA = 16     ; DX20-A FOR TAPES
                .UTT70 = 17     ; TU70
                .UTT71 = 20     ; TU71
                .UTT72 = 21     ; TU72
                .UTT73 = 22     ; TU7x
                .UTDXB = 23     ; DX20-B FOR DISKS
                .UTP20 = 24     ; RP20
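
                The two lists above are enough to interpret the first word
                of a UDB by hand.  As an illustration (not monitor code),
                the Python sketch below decodes the word 102,,157 seen for
                MTA5 in the magtape example.  The 5-bit width of the unit
                type field is an assumption inferred from that example and
                from the largest type code (24 octal); check PHYPAR for the
                real field definition.  The function name is hypothetical.

```python
# Sketch: decode UDBSTS from its two halves, using the lists above.
def bit(n):
    """MACRO-style 1Bn: bit n of a 36-bit word, bit 0 being the leftmost."""
    return 1 << (35 - n)


US_BITS = {
    0: "US.OFS", 1: "US.CHB", 2: "US.POS", 3: "US.ACT", 4: "US.BAT",
    5: "US.BLK", 6: "US.PGM", 7: "US.MAI", 8: "US.MRQ", 9: "US.BOT",
    10: "US.REW", 11: "US.WLK", 12: "US.MAL", 13: "US.OIR", 14: "US.OMS",
    15: "US.PRQ", 16: "US.TAP", 17: "US.PSI",
}

UNIT_TYPES = {
    0o1: "RP04", 0o2: "RS04", 0o3: "TU16", 0o4: "TM02", 0o5: "RP05",
    0o6: "RP06", 0o7: "RP07", 0o10: "RP08", 0o11: "RM03", 0o12: "TM03",
    0o13: "TU77", 0o14: "TM78", 0o15: "TU78", 0o16: "DX20-A", 0o17: "TU70",
    0o20: "TU71", 0o21: "TU72", 0o22: "TU7x", 0o23: "DX20-B", 0o24: "RP20",
}

TYPE_MASK = 0o37   # assumption: unit type occupies the low-order 5 bits


def decode_udbsts(left, right):
    """Return (list of US.xxx flag names, unit type name)."""
    word = (left << 18) | right
    flags = [US_BITS[n] for n in sorted(US_BITS) if word & bit(n)]
    unit_type = UNIT_TYPES.get(word & TYPE_MASK, "unknown")
    return flags, unit_type


if __name__ == "__main__":
    print(decode_udbsts(0o102, 0o157))   # first word of the UDB for MTA5
```

                For 102,,157 this reports US.WLK and US.TAP with unit type
                TU70, in agreement with the comments in the magtape example.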


                BLOCK 0:        ; 11 bootstrap
                BLOCK 1:        ; primary HOME block
                BLOCK 2:        ; primary BAT block
                BLOCKS 3-11:    ; reserved
                BLOCK 12:       ; secondary HOME block
                BLOCK 13:       ; secondary BAT block

        The disk addresses of the blocks listed above are stored in the
        table HOME.  HOME is defined in STG.  The BAT blocks are defined in
        PROLOG and the HOME blocks are defined in DSKALC.

                                   DISK FEATURES OF FILDDT

            The FILDDT  shipped after  release  4 of TOPS-20 has two new  commands
            in relation to disk file structure maintenance.  They are:

                STRUCTURE (FOR PHYSICAL I/O IS) disk-structure
                        Examines the specified disk structure.

                DRIVE channel controller unit
                        Examines the specified disk unit.

            These are privileged functions and one must ENABLE to use them.

            These two commands are nearly  identical.  Their difference is in  the
            way the structure  is identified.   To use the  STRUCTURE command  the
            structure must  be  mounted.   The STRUCTURE  command  is  useful  for
            examining a multi-pack  structure.  The  DRIVE command  is useful  for
            examining the  file system  of a  structure which  cannot be  mounted.
            Channel, controller, and unit  numbers can be  found from the programs
            UNITS, DS, SYSDPY, or OPR.
            Word addressing is in the same format as in other forms of DDT.
            It is easier  to understand exactly  what the disk  will look like  in
            FILDDT if you keep in mind that all sectors will be packed in the  DDT
            address space, without regard for sector size, starting at DDT address
            0.  For instance, on an RP06 there are four sectors per memory page or
            200 (octal) words per sector.  Therefore, sector zero of the structure
            will begin at FILDDT address 0 and end at memory address 177  (octal).
            Sector 1 will begin at address 200 and end at 377.  All supported disk
            drives except the RP20 have 200 (octal) words per sector.  On the RP20
            there are 1000 (octal) words per sector (one page).  All  index  block
            addresses and most monitor disk addresses are in sectors.  That is why
            it is important to be able  to translate between sector addresses  and
            FILDDT memory addresses.
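
            The translation from a sector number to the range of FILDDT
            addresses it occupies is simple arithmetic.  A minimal Python
            sketch follows (an illustration only; the function name
            sector_range is hypothetical).  It uses 200 (octal) words per
            sector by default, and 1000 (octal) can be passed for an RP20.

```python
# Sketch: first and last FILDDT address of a sector on a single unit.
def sector_range(sector, words_per_sector=0o200):
    """Return (first address, last address) of the given sector."""
    start = sector * words_per_sector
    return start, start + words_per_sector - 1


if __name__ == "__main__":
    for sector in (0, 1, 2):
        first, last = sector_range(sector)
        print("sector %o: %o through %o" % (sector, first, last))
    first, last = sector_range(1, 0o1000)      # RP20: one page per sector
    print("RP20 sector 1: %o through %o" % (first, last))
```
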
            The FILDDT option of  ENABLE PATCHING is also  available for use  with
            the DRIVE and STRUCTURE commands.  With this option on, the user is
            able  to  modify  specific  words  on  the  structure.   Another  very
            convenient FILDDT command  one may  use in conjunction  with the  disk
            commands is LOAD (symbols from) input file spec.  One may specify  any
            file here but a useful one is SYSTEM:MONITR.  The symbol table to  the
            MONITOR has  HOM  block  sector addresses,  FDB offsets  etc.  When  a
            file's  symbols  are  loaded,  one may  also define  his own  symbols.
            This is useful to remember addresses of data structures on the  units.
            For example, after finding the index block to a file, one could define
            a symbol, FILIDX at that address for easy referencing later on.

            When examining  a multi-pack  structure using  the STRUCTURE  command,
            addressing the first unit is exactly as if there were only one unit in
            the structure.  FILDDT addresses of  sectors on the other units  begin
            immediately  after  the  last  address  for  the  first  unit  of  the
            structure.  For example, consider  that we would  like to examine  the
            BAT blocks for the second unit of a two-pack STR: on RP06 drives.
            An RP06 contains 304000. sectors per unit and 128. words per sector.
            The first FILDDT address for the second unit of an RP06 two-pack STR:
            is 304000.*128.=38912000. or 224340000 (octal).
            [22722 symbols loaded from file]
            [Looking at file structure PS:]

                ; starting address of second unit in structure plus sector
                ; address of BAT blocks (2) times words per sector gives
                ; FILDDT address of start of BAT blocks for that unit

            224,,340400[   424164,,0   $6T;   BAT       ; Found it
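
            The address arithmetic for a multi-pack structure can be checked
            with a few lines of Python (an illustration only).  The sketch
            assumes every unit in the structure has the same number of
            sectors, and uses the RP06 figures quoted above as defaults.
            The function names are hypothetical.

```python
# Sketch: FILDDT address of a sector on the Nth unit of a structure.
def filddt_address(unit, sector, sectors_per_unit=304000, words_per_sector=128):
    """unit is 0 for the first pack, 1 for the second, and so on."""
    return (unit * sectors_per_unit + sector) * words_per_sector


def halves(address):
    """Format an address the way DDT accepts it: left,,right in octal."""
    return "%o,,%o" % (address >> 18, address & 0o777777)


if __name__ == "__main__":
    print(halves(filddt_address(1, 0)))   # start of the second unit
    print(halves(filddt_address(1, 2)))   # primary BAT block on the second unit
```

            The second call reproduces the address 224,,340400 typed in the
            example.
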
            For another example, let's say we would like to find the start of  the
            ROOT-DIRECTORY symbol table:

            NWSEC=200                   ; number of words per sector
            HM1BLK=1                    ; sector number of HOM block
            HOMRXB=10                   ; offset for index block of ROOT-DIRECTORY
                                        ; HOM block sector number times words
                                        ; per sector equals address of HOM start
            HM1BLK*NWSEC[   505755,,0   $6T;HOM  
            HM1BLK*NWSEC+HOMRXB[   10,,5740 ; plus offset to address of index block
                                        ; sector number of index block times
                                        ; words per sector gives address of
            5740*NWSEC[   10,,5744      ; ROOT-DIRECTORY index block
                                        ; NOTE:  Bit 14 (DSKAB) specifies this
                                        ; address as a disk sector address.
                                        ; sector addresses are bits 15-35
            RTDIDX:                     ; define symbol for index block here
                                        ; sector number of first page of
                                        ; ROOT-DIRECTORY times number of words
                                        ; per sector gives the address of first
            5744*NWSEC[   400300,,100   ; page of ROOT-DIRECTORY
            RTDIR0:                     ; define start of page 0 of ROOT-DIR
            RTDIR0+3[   30610           ; plus 3 for start of symbol table
                                        ; NOTE: adr is a 'directory address'
                                        ;       offset 610 of directory page 30
            RTDIDX+30[   10,,6250       ; get sector adr of page 30 of ROOT-DIR
                                        ; sector adr of page 30 times words per
                                        ; sector gives address of page 30 of
                                        ; ROOT-DIRECTORY.
            6250*NWSEC+610[   400400,,1 ; Add offset for symbol table start
            RTDSYM:                     ; Define a symbol here
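
            The conversions used in the ROOT-DIRECTORY walk above can be
            summarized in Python (an illustration only, not FILDDT code).
            The sketch assumes, as the notes in the example state, that
            bit 14 (DSKAB) marks a disk sector address held in bits 15-35,
            and that a directory address is a page number followed by a
            9-bit offset within the page.  The function names are
            hypothetical.

```python
# Sketch: the address conversions used while walking to the symbol table.
NWSEC = 0o200                   # words per sector (not valid for an RP20)
DSKAB = 1 << (35 - 14)          # bit 14: word holds a disk sector address
SECTOR_MASK = (1 << 21) - 1     # bits 15-35


def disk_address_to_filddt(word):
    """Convert an index block or HOM block disk address to a FILDDT address."""
    if not word & DSKAB:
        raise ValueError("DSKAB is not set; not a disk sector address")
    return (word & SECTOR_MASK) * NWSEC


def split_directory_address(addr):
    """Split a directory address into (directory page, offset in page)."""
    return addr >> 9, addr & 0o777


if __name__ == "__main__":
    rxb = (0o10 << 18) | 0o5740            # contents of HM1BLK*NWSEC+HOMRXB
    print("index block at %o" % disk_address_to_filddt(rxb))
    page, offset = split_directory_address(0o30610)   # contents of RTDIR0+3
    print("directory page %o, offset %o" % (page, offset))
    page30 = (0o10 << 18) | 0o6250         # contents of RTDIDX+30
    print("symbol table at %o" % (disk_address_to_filddt(page30) + offset))
```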

                            SUPPORTED DISK DRIVE PARAMETERS

           Type  Size (pages)   Media   Units/STR (1)   Interface       CPU
           ----  ------------   -----   --------------  ----------      ---

           RP04    38,000.      Pack       6            Massbus         KL(2)

           RP05    38,000.      Pack       6            Massbus         KL(2)

           RP06    76,000.      Pack       3(3)         Massbus         KL/KS

           RM03    30,340.      Pack       2            Massbus         KS

           RP20   201,420.      Fixed      3(4)         Massbus+DX20B   KL

           RP07   216,376.      Fixed      2            Massbus         KL

           RA80    53,508.      Fixed      6            CI20+HSC50      KL

           RA81   200,928.      Fixed      3            CI20+HSC50      KL

           RA60    90,516.      Pack       3            CI20+HSC50      KL

           (1) -- depends on addressing, MXPGUN, MXSTRU, and BTBSIZ; SPD is final
           (2) -- disk model no longer sold
           (3) -- 2 packs/structure on a KS or Model A machine
           (4) -- 1 spindle/structure on Model A machines

                            SUPPORTED TAPE DRIVE PARAMETERS

           Type  Speed    Density  Drives/cntlr   Controller    CPU     Notes
           ----  -----    -------  ------------   ----------    ---     -----

           TU45  75ips   800/1600   8(KL)/4(KS)   TM02/TM03     KL/KS    (1)(2)

           TU70   100    800/1600        8        DX20-A/TX02   KL       (1)

           TU71   100     556/800        8        DX20-A/TX02   KL       (1)(3)

           TU72   100   1600/6250        8        DX20-A/TX02   KL       (4)

           TU77   125    800/1600        4        TM02/TM03     KL/KS    (2)

           TU78   125   1600/6250        4        TM78          KL

           TA78   125   1600/6250        4        HSC50/TS78    KL       (5)

           (1) -- tape model no longer sold
           (2) -- TM02 controller no longer sold
           (3) -- 7 track model
           (4) -- TX05 option allows 16 drives/DX20 using 2 TX02s
           (5) -- Planned for some TOPS-20 release after 6.0

                            TOPS-20 SCHEDULER TEST ROUTINES

        The following is a tabulation of (hopefully) all of the scheduler  tests
        used  by  the TOPS-20 monitor, time-frame approximately Release 6.  This
        includes ARPA and DECNET tests.  This is  the  data  one  finds  in  the
        monitor table FKSTAT indexed by fork number for forks which have blocked
        and left the GOLST (i.e.  LH(FKPT) contains WTLST).  The format  of  the
        FKSTAT  table  words  is TEST DATA,,TEST ROUTINE ADDRESS.  The scheduler
        test routines are called periodically to determine if a process  can  be
        unblocked.   This is indicated by a skip return from the scheduler test.
        A nonskip return is taken if the process cannot yet be unblocked.
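
        Splitting an FKSTAT word into its test data and test routine address
        is the first step in reading the table.  The Python sketch below is an
        illustration only.  The symbol addresses in it are invented; real
        values come from the monitor's symbol table.  The helper lh_field
        extracts a bit range from the test data for entries whose data is
        packed, written <first:last> in the table that follows.  All function
        names are hypothetical.

```python
# Sketch: interpret FKSTAT words of the form TEST DATA,,TEST ROUTINE ADDRESS.
def split_fkstat(word):
    """Return (test data, test routine address) from a 36-bit FKSTAT word."""
    return (word >> 18) & 0o777777, word & 0o777777


def describe_wait(word, symbols):
    """Look the routine address up in a {address: name} table."""
    data, address = split_fkstat(word)
    return symbols.get(address, "unknown test at %o" % address), data


def lh_field(data, first, last):
    """Extract bits first..last (0 is leftmost) of an 18-bit test data value."""
    width = last - first + 1
    return (data >> (17 - last)) & ((1 << width) - 1)


if __name__ == "__main__":
    symbols = {0o12345: "BLOCKT", 0o23456: "HUPTST"}   # invented addresses
    print(describe_wait((0o77 << 18) | 0o12345, symbols))
    name, data = describe_wait((((5 << 8) | 0o42) << 18) | 0o23456, symbols)
    # HUPTST packs <0:9>TIME and <10:17>HOST # into the test data.
    print(name, lh_field(data, 0, 9), "%o" % lh_field(data, 10, 17))
```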

        When examining the monitor because of a hung job  or  fork,  the  FKSTAT
        table  can  often reveal the reason the fork is hung, and this sometimes
        even allows corrective action to be taken.

        The table below gives routine name, what you should expect to see in the
        FKSTAT  table,  and  the  module in which the scheduler test is defined,
        followed finally by a short description of what the particular condition
        is which is being tested.

        In monitors earlier than release 6.0, the tests listed here under
        PAGUTL are found in PAGEM.

                                    SCHEDULER TESTS

         Name           FKSTAT entry [TEST DATA,,ROUTINE]               Module
         ----           ----------------------------------------        -------

        BALTST          [CONNECTION #,,BALTST]                          [NETWRK]
                        Wait for network bit allocation.

        BATTST          [UNIT #,,BATTST]                                [DSKALC]
                        Wait for US.BLK, the lock bit for the BAT blocks
                        on the unit, in the UDB to be zero.

        BLOCKM          [TIME,,BLOCKM]                                  [SCHED]
                        Wait for TIME in BLOCKM format which is the low
                        order 17 bits of the desired future time to be
                        compared against a suitably masked TODCLK.

        BLOCKT          [TIME,,BLOCKT]                                  [SCHED]
                        Wait for TIME in BLOCKT format which is a
                        value that is shifted left 10 bits and compared
                        against a suitably masked TODCLK, providing a
                        longer delay than BLOCKM, but less precision.

        BLOCKW          [TIME,,BLOCKW]                                  [SCHED]
                        Wait for TIME in BLOCKW format (same as BLOCKM).

        CDRBLK          [UNIT NUMBER,,CDRBLK]                           [CDRSRV]
                        Wait for card-reader offline, or not waiting for
                        a card.

        CHKLOK          [ADDRESS,,CHKLOK]                               [NSPSRV]
                        Wait for NSP block lock at address to free.

        COFTST          [TIME,,COFTST]                                  [MEXEC]
                        Wait for job in FKJOBN to be attached or time
                        in BLOCKT form to elapse.

        D6BWT           [INDEX,,D6BWT]                                  [DTESRV]
                        Wait for D6STS(INDEX) to be .GE. zero, indicating
                        a free condition.

        D6DWT           [INDEX,,D6DWT]                                  [DTESRV]
                        Wait for D6%DDN to be set in D6STS(INDEX) to
                        indicate read data done.

        D6RWT           [INDEX,,D6RWT]                                  [DTESRV]
                        Wait for D6%RDN to be set in D6STS(INDEX) to
                        indicate response header.

        D6WKT           [INDEX,,D6WKT]                                  [DTESRV]
                        Wait for timer in D6CLK(INDEX) to expire.

        DBWAIT          [DTE #,,DBWAIT]                                 [DTESRV]
                        Wait for the TO-10 doorbell from the given DTE.

        DGLTST          [0,,DGLTST]                                     [DIAG]
                        Wait for DIAGLK lock to be free.

        DGUIDL          [UDB ADDRESS,,DGUIDL]                           [DIAG]
                        Wait for the unit to show as idle in the UDB.

        DGUTST          [UDB ADDRESS,,DGUTST]                           [DIAG]
                        Wait for the maintenance bit to set in the UDB.

        DISET           [ADDRESS,,DISET]                                [SCHED]
                        Wait for contents of ADDRESS to be zero.

        DISGET          [ADDRESS,,DISGET]                               [SCHED]
                        Wait for contents of ADDRESS to be positive.

        DISGT           [ADDRESS,,DISGT]                                [SCHED]
                        Wait for contents of ADDRESS to be greater than
                        zero.

        DISLT           [ADDRESS,,DISLT]                                [SCHED]
                        Wait for contents of ADDRESS to be less than
                        zero.

        DISNT           [ADDRESS,,DISNT]                                [SCHED]
                        Wait for contents of ADDRESS to be non-zero.

        DMPTST          [COUNT,,DMPTST]                                 [IO]
                        Wait for COUNT to be less than DMPCNT to indicate
                        dump mode buffers freed.

        DSKRT           [PAGE #,,DSKRT]                                 [PAGEM]
                        Wait for CSTAGE for PAGE # to not be PSRIP,
                        meaning disk read completed.

        DWRTST          [PAGE #,,DWRTST]                                [PAGUTL]
                        Wait for DRWBIT to clear in CST3(PAGE #),
                        meaning write completed.

        ENQTST          [FORK #,,ENQTST]                                [ENQ]
                        Wait for the lock on ENFKTB+FORK #.

        FEBWT           [ADDRESS OF FE UDB,,FEBWT]                      [FESRV]
                        Wait for EOF or input bytes available from FE.
                        Wake also on invalid assignment.

        FEDOBE          [ADDRESS OF FE UDB,,FEDOBE]                     [FESRV]
                        Wait for output buffer empty and all bytes are
                        acknowledged by the FE.  Wake also if not a 
                        valid assignment.

        FEFULL          [ADDRESS OF FE UDB,,FEFULL]                     [FESRV]
                        Wait for the current count of output bytes to be
                        less than the count of bytes in the interrupt
                        buffer.  Wake also on invalid assignment.

        FORCTM          [SUPERIOR FORK INDEX,,FORCTM]                   [SCHED]
                        Identifiable wait forever, forced termination.

        FRZWT           [PREVIOUS TEST,,FRZWT]                          [FORK]
                        Identifiable wait forever, frozen fork.

        HALTT           [SUPERIOR FORK INDEX,,HALTT]                    [SCHED]
                        Identifiable wait forever for halted fork.

        HIBERT          [TIME,,HIBERT]                                  [SCHED]
                        Wait for TIME in BLOCKT format.

        HUPTST          [<0:9>TIME<10:17>HOST #,,HUPTST]                [NETWRK]
                        Wait for IMPHRT bit set for host or time out in
                        BLOCKW form.

        IDVTST          [0,,IDVTST]                                     [IMPDV]
                        Wait for the lock on IDVLCK to free, lock it.

        IMPBPT          [0,,IMPBPT]                                     [IMPDV]
                        Wait for IMPFLG nonzero, or IBPTIM timer to run
                        out, or IDVLCK lock free and output scan needed
                        for the IMP.

        JB0TST          [TIME,,JB0TST]                                  [MEXEC]
                        Wait for JB0FLG set nonzero for explicit request
                        or time in BLOCKT form to elapse.

        JRET            [0,,JRET]                                       [SCHED]
                        Wait forever, interruptible.

        JSKP            [0,,JSKP]                                       [SCHED]
                        Unconditional skip used to schedule immediately.

        JTQWT           [0,,JTQWT]                                      [SCHED]
                        Wait for JSYS trap queue.

        LCKTSS          [ADDRESS,,LCKTSS]                               [IO]
                        Wait for lock at ADDRESS to unlock, lock it.

        LKDSPT          [0,,LKDSPT]                                     [STG]
                        Wait for room in LDTAB table of directories
                        currently locked.

        LKDTST          [INDEX INTO LDTAB,,LKDTST]                      [STG]
                        Wait for bit in LCKDBT to clear, indicating
                        directory unlocked.

        LODWAT          [ADDRESS OF STATUS WORD,,LODWAT]                [LINEPR]
                        Wait for flag LP%LHC to set in the addressed
                        word, indicating loading has completed of the
                        VFU or RAM file.

        LPTDIS          [UNIT ADDRESS,,LPTDIS]                          [LINEPR]
                        Wait for an error condition on the addressed
                        unit, or for all buffers cleared and no bytes
                        still in the front-end, before finishing close
                        operation on the device.

        MTARWT          [IORB ADDRESS,,MTARWT]                          [MAGTAP]
                        Wait for IRBFA in the IORB to indicate that this
                        IORB is no longer active.

        MTAWAT          [UNIT #,,MTAWAT]                                [MAGTAP]
                        Wait for all outstanding IORBs for unit to be

        MTDWT1          [UNIT #,,MTDWT1]                                [MAGTAP]
                        Wait for the count of outstanding requests on the
                        unit to go to one.

        NCPLKT          [0,,NCPLKT]                                     [NETWRK]
                        Wait for lock NCPLCK to free, lock it.

        NOTTST          [<0:8>CONNECTION #<9:17>STATE,,NOTTST]          [NETWRK]
                        Wait for connection to leave state.

        NSPTST          [0,,NSPTST]                                     [NSPSRV]
                        Wait for KDPFLG nonzero, indicating KMC11 wants
                        service, or MSGQ nonzero, indicating messages to

        NVTNTT          [<0:8>OPTION #,<9:17>LINE #,,NVTNTT]            [TTNTDV]
                        Wait for completed NVT negotiation.

        OFNLKT          [OFN,,OFNLKT]                                   [PAGUTL]
                        Wait for OFN unlocked--SPTLKB zero in SPTH(OFN).

        PIDWAT          [FORK #,,PIDWAT]                                [IPCF]
                        Wait for bit for fork in PDFKTB to set.

        RLDTST          [0,,RLDTST]                                     [DTESRV]
                        Wait for master DTE running.

        SEBTST          [0,,SEBTST]                                     [SYSERR]
                        Wait for SECHKF to go nonzero before starting
                        Job 0 task to write queued SYSERR entries.

        SEEALL          [0,,SEEALL]                                     [TTYSRV]
                        Wait for SNDALL to go to zero, indicating the
                        send-all buffer is available.

        SPCTST          [0,,SPCTST]                                     [DTESRV]
                        Wait for a node.

        SPMTST          [0,,SPMTST]                                     [PAGUTL]
                        Wait for page in SPMTPG to be on SPMQ or the
                        time SPMTIM to expire.

        SQLTST          [0,,SQLTST]                                     [IMPDV]
                        Wait for the special queues lock SQLCK and lock

        STRTST          [SDB ADDRESS OF STRUCTURE,,STRTST]              [MSTR]
                        Wait for the structure lock to be free.

        STSWAT          [ADDRESS OF STATUS WORD,,STSWAT]                [CDRSRV]
                        Wait for flag CD%SHA to come on in the addressed
                        word, indicating that cardreader status has

        STSWAT          [ADDRESS OF STATUS WORD,,STSWAT]                [LINEPR]
                        Wait for flag LP%SHA to set in the addressed
                        word, indicating that printer status has

        SUSFKT          [FORK #,,SUSFKT]                                [FORK]
                        Wait for fork to be on WTLST in either SUSWT
                        or FRZWT.

        SWPRT           [PAGE #,,SWPRT]                                 [PAGEM]
                        Wait for CSTAGE for PAGE # to not be PSRIP,
                        meaning swap read completed.

        SWPWTT          [0,,SWPWTT]                                     [PAGEM]
                        Wait for NRPLQ nonzero.  Increment CGFLG each
                        time test is unsuccessful.

        TCIPIT          [FORK #,,TCIPIT]                                [TTYSRV]
                        Wait for no interrupts pending for FORK #.

        TCITST          [LINE #,,TCITST]                                [TTYSRV]
                        Wait for line inactive, no fork in input wait,
                        or input buffer non-empty.

        TCOTST          [LINE #,,TCOTST]                                [TTYSRV]
                        Wait for line inactive, or output buffer not
                        too full to add a character to it.

        TRMTS1          [0,,TRMTS1]                                     [FORK]
                        Identifiable wait forever for inferior fork termination.

        TRMTST          [FORK #,,TRMTST]                                [FORK]
                        Wait for FORK # to be on WTLST for either HALTT
                        or FORCTM.

        TRP0CT          [MINIMUM NRPLQ,,TRP0CT]                         [PAGEM]
                        Wait for NRPLQ to be above stated minimum or
                        normal minimum.  Increment CGFLG each time
                        test is unsuccessful.

        TSACT1          [LINE #,,TSACT1]                                [TTYSRV]
                        Wait until line inactive, becoming active, or
                        has a full length dynamic block assigned.

        TSACT2          [LINE #,,TSACT2]                                [TTYSRV]
                        Wait for line available--inactive or fully

        TSACT3          [LINE #,,TSACT3]                                [TTYSRV]
                        Wait for line inactive--dynamic data unlocked.

        TSTSAL          [0,,TSTSAL]                                     [TTYSRV]
                        Wait for SALCNT to go to zero, indicating the
                        send-all is finished for this buffer.

        TTBUFW          [NUMBER,,TTBUFW]                                [TTYSRV]
                        Wait for NUMBER of buffers.

        TTIBET          [LINE #,,TTIBET]                                [TTYSRV]
                        Wait for line inactive or input buffer empty.

        TTOAV           [LINE #,,TTOAV]                                 [TTYSRV]
                        Wait for line inactive and output buffer not

        TTOBET          [LINE #,,TTOBET]                                [TTYSRV]
                        Wait for line inactive or output buffer empty.

        UDITST          [0,,UDITST]                                     [PHYSIO]
                        Wait for at least two free IORBs on UIOLST.

        UDWDON          [IORB ADDRESS,,UDWDON]                          [PHYSIO]
                        Wait for IS.DON to set in IRBSTS for this IORB.

        UPBGT           [CONNECTION INDEX,,UPBGT]                       [IMPDV]
                        Wait for LTDF connection done flag to set, or
                        output buffers to appear.

        USGWAT          [0,,USGWAT]                                     [JSYSA]
                        Wait for lock on queued USAGE blocks to free.

        VVBWAT          [UNIT #,,VVBWAT]                                [TAPE]
                        Wait for the MDA to reset TPVV handling EOV.

        WATTST          [<0:8>CONNECTION #<9:17>STATE,,WATTST]          [NETWRK]
                        Wait for connection to be in state.

        WTFKT           [FORK #,,WTFKT]                                 [FORK]
                        Wait for fork to be on WTLST.

        WTSPTT          [PAGE #,,WTSPTT]                                [SCHED]
                        Wait for share count on PAGE # to go to 1.
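
        The bracketed forms in the table above describe a 36-bit word with
        the data in the left half and the scheduler test in the right half
        (the ",," separator).  The following Python sketch is not monitor
        code; it only illustrates how such a word, read from a dump, could
        be split by hand.  It assumes standard PDP-10 bit numbering (bit 0
        is the most significant bit).  The function names and the example
        word are made up for illustration.

```python
HALF_MASK = 0o777777  # one 18-bit half word


def split_word(word):
    """Return (left half, right half) of a 36-bit word."""
    return (word >> 18) & HALF_MASK, word & HALF_MASK


def split_connection_state(left_half):
    """Split a left half laid out as <0:8>CONNECTION #<9:17>STATE."""
    return (left_half >> 9) & 0o777, left_half & 0o777


# Hypothetical wait word: connection 5, state 3, test routine at 123456.
example = 0o005003123456
data, test_address = split_word(example)
connection, state = split_connection_state(data)
print(f"data={data:o} test={test_address:o} "
      f"connection={connection:o} state={state:o}")
```

        The same split_word helper applies to the simpler entries, where
        the left half is a fork number, line number, OFN, or zero.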

                              TOPS-20 PAGE ZERO LOCATIONS

        The following text outlines the uses of  memory  in  page  zero  of  the
        TOPS-20 monitor as of Release 5.

        ====   ======== =====

          0-17    --    Shadow ACs, not used.

         20     SCTLW   Scheduler halt request word (see SWTST in SCHED).   Word
                         of   function  bits,  current  functions  include  Halt
                         timesharing, wait for system down,  manual  pause,  and
                         reset FE protocol.

         21       --    Used by BOOT to build CCW lists (unused by monitor).

         22       --    Same as 21;  both unused for KS10 systems.

         23     CRSHTM  Initial time for  reload;   -1  =>  time  not  set  yet.
                         Contains   the  date/time  that  the  system  was  last
                         reloaded.   May  see  -1  after  forced  reload  on  KS
                         processor.   BUGSTO  (APRSRV) copies TADIDT into it for
                         each BUGHLT/CHK/INF.

         24     SEBQOU  Pointer to queued SYSERR blocks not yet written.

         25     MMAPWD  Pointer to MMAP for SETSPD.  Contains MMAP.

         26     BUGHAD  Code around SYSLD1 (STG) puts LH into  BUGCHK,  RH  into
                         BUGHLT  after  a  reload.   No  one else uses it, so it
                         should contain zero.

         27     CRSTD1  Current time is saved here on each BUGHLT/CHK/INF.  This
                         is the value that gets into the SYSERR block.  Contains
                         the   date/time   for   the   system's   most    recent

         30     SHLTW   Scheduler  halt  word,  depositing  a  nonzero  contents
                         requests system shutdown.

         31     RLWORD  KS  only;   used  for  front-end  communication,  flags,
                         keep-alive, etc.  (see PROKS).  Unused on KL.

         32     CTYIWD  KS only;  used for front-end communication, used for the
                         CTY input location.  Unused on KL.

         33     CTYOWD  KS only;  used for front-end communication, used for the
                         CTY output location.  Unused by KL.

         34     KLIIWD  KS only;  used for front-end communication, used for the
                         KLINIK input location.  Unused by KL.

         35     KLIOWD  KS only;  used for front-end communication, used for the
                         KLINIK output location.  Unused by KL.

         36       --    Unused/reserved.  Holds KS RHBASE during boot.

         37       --    Unused/reserved.  Holds KS unit number during boot.

         40     .JBUUO  Monitor's location 40.  Holds KS tape info during boot.

         41     .JB41   Monitor's LUUO dispatch word.
                         Contains XPCW LUUBLK.

         42-43    --    Unused/reserved.

         44     .JBREL  Job Data Area word filled in by LINK.  Contains 777.

         45-67    --    Unused/reserved.

         70     PWRTRP  Location executed by the front-end on powerfail restart.
                         Contains JRST PWRRST.

         71     RLDADR  Executed  by  the  front-end  on  certain   (keep-alive)
                         reloads.   APRSRV  demands  this  location be PWRTRP+1.
                         Contains XPCW RLODPC which winds up  at  RLDHLT  for  a
                         KPALVH BUGHLT.

         72       --    Contains address of EDDTF word.

         73     CRSTAD  Is supposed to contain date/time of last crash.  Code in
                         STG  checks  it  to  decide  to  restore  the data from
                         BUGHAD.  During system startup for KL-10s the  word  is
                         used   to   set   the   reload  date/time  if  nonzero.
                         Apparently it gets no real  use  on  KS-10s.   Contains
                         zero while system is in normal operation.

         74     .JBDDT  JOBDDT location.
                         Contains DDT (EDDT entry point).

         75     .JBHSO  Unused/reserved.

         76       --    Contains address of DBUGSW word.

         77       --    Contains address of DCHKSW word.

        100-107   --    Reserved for use by the front-end command language.

        110     STSBLK  KL-Status block pointer, virtual address.  Contains zero
                         if status reporting is not enabled.

        111       --    Physical address (MAP) of above virtual address.

        112     .JBEDV  Pointer to Exec Data Vector
                         Contains MONEDV.

        113-114   --    Unused/reserved.

        115-117   --    Unused/reserved.

        120     .JOBSA  TOPS-10 style start address.
                         Contains NPVARZ+1,,EVGO.

        121     .JBFF   Contains first free address not loaded by LINK.
                         Contains NPVARZ+1.

        122-132   --    Unused/reserved.

        133     .JBCOR  Job Data Area location set by LINK.  LH contains highest
                         low  segment  address loaded with data.  RH refers to a
                         SAVE argument for highest page.

        134-136   --    Unused/reserved.

        137     .JBVER  Job Data Area version number word.
                         Contains current monitor version number.

        140     EVDDT   Monitor startup transfer vector;  enter EDDT.
                         Contains JRST DDTX.

        141       --    Reset and go to EDDT location.
                         Contains JRST SYSDDT.

        142     EVDDT2  Copy of 140.
                         Contains JRST DDTX.

        143     EVSLOD  Entry to initialize file system, used for installation.
                         Contains JRST SYSLOD.

        144     EVVSM   Entry to verify swappable monitor on startup.
                         Contains JRST SYSVSM.

        145     EVRST   Restart the system location.
                         Contains JRST SYSRST.

        146     EVLDGO  Reload and start the system location.
                         Contains JRST SYSGO.

        147     EVGO    Start the monitor location.
                         Contains JRST SYSGO1.

        150     DDTPRS  DDT present flag;  EDDT is present if nonzero.
                         Contains -1 initially, cleared later for EDDTF not set.

        151     BUTRXB  Defined in BOOT and STG but not  used  (BOOT  reads  the
                         disk   address  of  the  Root-directory  from  the  HOM
                         blocks).  Contains zero.

        152     BUTMUN  Defined in BOOT and STG but not  used  (BOOT  reads  the
                         values  from  the  HOM blocks, and uses variable MAXUNI
                         instead).  Contains zero.

        153-162 BUTDRT  Defined in BOOT and STG but not used (BOOT uses internal
                         variable  DSKTAB  for  logical  to  physical  structure
                         mapping).  Contains zeros.

        163-201 BUTCMD  ASCIZ file  name  of  monitor;   used  for  booting  the
                         swappable monitor with calls to VBOOT for segments.

        202     BUTPGS  Start,,End virtual addresses of VBOOT  pages.   Used  to
                         reference and finally unlock/destroy VBOOT pages.

        203     BUTEPT  Contains in LH:  Address of the VBOOT EPT page.
                         RH:  Address of the VBOOT page table page.

        204     BUTPHY  Contains in LH:  Minus number of pages to map.
                         RH:  Address of first page to map  (for  the  monitor).
                         Typically contains -6,,NPVARZ for four pages of code, a
                         file data page and an index block page.  Used with  the
                         value in BUTVIR.

        205     BUTVIR  Virtual address of first page of BOOT to map.  Typically
                         will contain 772000.  Used in conjunction with BUTPHY.

        206     BOOTFL  BOOT flags word, 0 => normal, nonzero =>  special  boot.
                         The  contents  is supposed to be the index into a table
                         (BOOTD) designating how to boot the swappable  monitor.
                         An ILBOOT BUGHLT results if the index is too large.  In
                         the SYSGO routine the value IRBOOT is put into  BOOTFL;
                         the table BOOTD contains entries of JRST GSMDSK for all
                         entries but the IRBOOT offset, which has JRST GSMIRB.

        207-236 PHYPZS  Formerly  used  for  page  zero  I/O  use   by   PHYSIO.
                         Currently unused, contain zero.

        237-777   --    Not used, contain zero.
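
        Several of the words above pack two values into one 36-bit word.
        As an illustration, the Python sketch below decodes a BUTPHY-style
        word of the form "-count,,address", such as the typical
        -6,,NPVARZ.  It is a hand-analysis aid under stated assumptions,
        not monitor code: the left half is taken to be an 18-bit
        two's-complement count, the helper names are invented here, and
        the address standing in for NPVARZ is made up.

```python
HALF_MASK = 0o777777  # one 18-bit half word

# A few entries copied from the table above (octal address -> name).
PAGE_ZERO = {
    0o20: "SCTLW",
    0o23: "CRSHTM",
    0o24: "SEBQOU",
    0o30: "SHLTW",
    0o73: "CRSTAD",
    0o204: "BUTPHY",
    0o205: "BUTVIR",
    0o206: "BOOTFL",
}


def signed_half(half):
    """Interpret an 18-bit half word as a two's-complement integer."""
    return half - (1 << 18) if half & 0o400000 else half


def decode_butphy(word):
    """Return (number of pages to map, address of first page)."""
    left, right = (word >> 18) & HALF_MASK, word & HALF_MASK
    return -signed_half(left), right


# -6 in the left half is 777772 octal; 234000 stands in for NPVARZ
# (a made-up value, since NPVARZ depends on the monitor build).
pages, first_page = decode_butphy(0o777772234000)
print(f"{PAGE_ZERO[0o204]}: map {pages} pages starting at {first_page:o}")
```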

                                TOPS-20 MONITOR SECTIONS

        The TOPS-20 monitor makes use of a number of sections of  address  space
        on  extended  addressing  machines.   For  release  6,  this  number has
        increased.  The following table lists the defined  monitor  sections  at
        approximately the timeframe of release 6.0.

              Number    Symbol  Use of the Section
              ------    ------  ------------------

                 0      MSEC0   Section zero data and code
                 1      MSEC1   Section one data and code
                 2      DRSECN  Mapped directories
                 3      IDXSEC  Mapped disk index table
                 4      BTSEC   Mapped disk bit table
                 5      SYMSEC  Monitor symbol table, DDT, CSTs,...
                (6)*    ANBSEC  ARPANET buffers
                 -      CISEC   "CI" memory driver
                (7)     TABSEC  Tables - DST,...
                 -      DNBSE1  DECnet buffers
                 -      DNBSE2  DECnet buffers
                 -      CTSSEC  CTS terminal database
               (10)     CFSSEC  CFS buffers
               (11)     INTSEC  ARPANET (Internet) buffers
               (12)     RESSEC  RSE, NRE, NRPE psects - resident free space
               (13)     SWFSEC  Swappable free space
               (14)     FFMSEC  Variable - symbol set to value of first free
                                assignable section
               (37)     HGHSEC  Highest possible section value on KL-10 processor

           * Numbers in parentheses represent the values from a "typical" 6 monitor.
             These numbers are assigned dynamically.  See STG.MAC for the definition
             of the MSECN macro and an explanation of assignable sections.   All the
             sections from SYMSEC on are new for 6.0.
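
        When reading a dump, it can help to split an extended address into
        its section number and in-section address and then look the section
        up in the table above.  The Python sketch below does this.  It
        assumes the section number sits directly above the 18-bit
        in-section address, and it uses the "typical" section numbers from
        the table, which are assigned dynamically and may differ in any
        given monitor.  The function names are invented for illustration.

```python
# "Typical" Release 6 assignments from the table above; sections from
# ANBSEC on are assigned dynamically, so treat these as examples only.
TYPICAL_SECTIONS = {
    0o0: "MSEC0", 0o1: "MSEC1", 0o2: "DRSECN", 0o3: "IDXSEC",
    0o4: "BTSEC", 0o5: "SYMSEC", 0o6: "ANBSEC", 0o7: "TABSEC",
    0o10: "CFSSEC", 0o11: "INTSEC", 0o12: "RESSEC", 0o13: "SWFSEC",
}
HGHSEC = 0o37  # highest possible section value on a KL-10


def split_address(address):
    """Return (section, in-section address) for an extended address."""
    section = address >> 18
    if not 0 <= section <= HGHSEC:
        raise ValueError(f"section {section:o} is above HGHSEC")
    return section, address & 0o777777


def describe(address):
    """Format an address as section,,offset with the typical symbol."""
    section, offset = split_address(address)
    name = TYPICAL_SECTIONS.get(section, "unassigned or dynamic")
    return f"{section:o},,{offset:06o} ({name})"


print(describe(0o5001234))
```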

                                 TOPS-20 MONITOR PSECTS

        The TOPS-20 monitor code assembles into a number of PSECTs with  varying
        purposes.   Release  6 has created even more PSECTs to worry about.  The
        following table lists the defined monitor PSECTs  at  approximately  the
        timeframe  of  release 6.0.  Most PSECT beginnings are defined in LDINIT
        with the rest in STG.  POSTLD terminates  all  the  PSECTs  and  handles
        whatever  address  space  rearrangement  is  necessary before saving the
        monitor EXE file.

                Release PSECT   Purpose
                ------- -----   -------

                        RSCOD   Resident monitor code and constant data
                        INCOD   Resident initialization code and constant data
                   6    SZCOD   Section-zero-only resident code
                   5    RSDAT   Resident non-zeroed data
                        PPVAR   Processor-private pages
                        RSVAR   Resident zero-initialized data
                   6    SYVAR   Symbol table data
                        NRVAR   Swappable zero-initialized data
                        PSVAR   PSB data
                        JSVAR   JSB data
                        NRCOD   Swappable monitor code and constant data
                        NPVAR   Swappable page variables
                        POSTCD  POSTLD code and data segment
                   6    ERVAR   Extended section resident variables
                   6    ENVAR   Extended section swappable variables
                   6    EPVAR   Extended section swappable page variables
                        BGSTR   Bugstring texts
                        BGPTR   Pointers to bugstrings

        See SWSKIT document MONITOR-ADDRESS-SPACE.MEMOS for more detail.

                            TOPS-20 MONITOR UNIVERSAL FILES

        The following universal files are used to build a TOPS-20 monitor.

                Release  Name   Function
                -------  ----   --------

                        ACTSYM  Accounting file symbol definitions
                        MONSYM  General monitor symbol definitions
                        MACSYM  General monitor definitions

                 6.0    ANAUNV  ARPANET TCP/IP symbol definitions
                 6.0    GLOBS   Global symbol satisfaction definitions
                 6.0    MSCPAR  MSCP symbol definitions
                        NSPPAR  DECNET symbol definitions
                        PHYPAR  PHYSIO-level device symbol definitions
                  *     PROKL   KL-10 specific definitions
                  *     PROKS   KS-10 specific definitions
                        PROLOG  General monitor definitions
                 6.0    SCAPAR  SCA symbol definitions
                        SERCOD  SYSERR (SPEAR) file symbol definitions

                 6.1**  CTERMD  CTERM symbol definitions
                 6.1    D36PAR  DECNET-36 symbol definitions
                 6.1    NIPAR   Definitions for NI-20 service
                 6.1    SCPAR   DECNET session control symbol definitions
                 6.1    TTYDEF  Monitor terminal definitions


                *  PROKL and PROKS have been combined into PROLOG for Release
                   6.0 and no longer exist independently.

                ** The 6.1 universals are not definite at this time.

                            Known Hardware Deficiencies List

        This is a collected list of known hardware characteristics which show up
        from  time  to  time  as  part  of certain reported problems.  This says
        nothing about whether these characteristics are  bugs  or  features,  or
        whether  they will ever be fixed or changed, but merely attempts to make
        them known internally.

             1.  DZ11 - Cannot set the speed to zero in the hardware,  can  only
                 turn off the receiver clock.

             2.  TM78 - ANSI ASCII was  not  included  in  the  hardware  format
                 modes.

             3.  TM78 - a formatter problem  (corrected  by  ECO  12/83)  causes
                 unreported data loss at end-of-tape.

             4.  TM78 - a formatter problem causes the data mode byte packing to
                 change  from core-dump to hi-density "randomly" while reading a
                 record.  TOPS-20 does not normally see  the  problem  since  it
                 usually  appears  as an overrun that gets retried successfully.
                 An ECO is planned.

             5.  TM02 - Can generate bad parity which it  passes  to  memory  to
                 cause  the  system  memory  parity  errors  when  the  data  is
                 referenced.  This is still seen with Rev 12 to the RH20.

             6.  TM03 - A chip race condition in the M8915 board has been  known
                 to  occur  where a function register has wrong value because it
                 has not settled.  This generates a device error  which  appears
                 transient;   i.e.   CRLFing  DUMPER  tries  the  read again and

             7.  TM03 - ANSI ASCII was  not  included  in  the  hardware  format
                 modes.   The  TM03  does  not set format error if ANSI ASCII is
                 selected.  It will usually get a frame error;  if the  transfer
                 is  a  multiple  of  7/8  bit bytes, the frame error is not set

             8.  TM03 - When using industry-compatible  mode,  reads  not  of  a
                 multiple of four bytes will produce strange results.  The bytes
                 are counted, but the extra bytes are  not  written  to  memory,
                 leaving garbage.

             9.  TM03 - if an error occurs while rewinding, the monitor  may  be
                 left in a state of waiting for the rewind to complete, the tape
                 being unusable.  The easiest way to clear this condition is  to
                 reset the TM03, most easily done by the customer by powering it
                 down and back up.

            10.  TM03 - if the TM03 loses synch during the  PE  preamble  bytes,
                 and  it  reaches  9,  it  will  raise  the postamble instead of
                 generating the proper error.  This can result in lost  records.
                 The  usual  symptom  is  a  frame  count  error.   This case is
                 recognizable by the residual frame count being the same as  the
                 initial frame count.

            11.  VT100 - on a VT100 without the extended memory, one can confuse
                 the  internal  microprogram enough to have it clear sections of
                 the screen on Control-U, Control-R, "clear to end of screen" in
                 132 column mode, etc.

            12.  VT125 - especially with printer port option, is known  to  hang
                 in  an  XOFF state that cannot be cleared without resetting the
                 terminal.

            13.  VT240 - especially with printer port option, is known  to  hang
                 in  an  XOFF state that cannot be cleared without resetting the
                 terminal.  Reportedly this happens much more  often  than  with
                 the VT125.

            14.  RH20 - perfectly willing to store bad parity data  into  memory
                 until Rev 12.  May still do so.

            15.  DX20 - is unwilling to allow registers to be examined after  it
                 has  started  I/O.   Can  cause  register  access errors if not
                 programmed in correct sequence.

            16.  DX20 - there is a race condition in which the DX20 raises an
                 interrupt request on channel 5 for some condition while the
                 code is already manipulating the DX20.  The code handles the
                 condition and the DX20 lowers its request, but the KL has
                 already latched the interrupt and tries to process it.  No
                 device responds, so the KL tries the 40+2n type, which
                 occasionally gives a PI5ERR.

            17.  DX20/TU71 - the DX20 microcode does not set the 556 bpi density
                 correctly   for   TU71  (7-track)  drives.   This  can  be  set
                 successfully from the maintenance panel.

            18.  DX20/TX03 - With dual-porting between systems,  if  the  system
                 issues a drive clear to the DX20 during serious error recovery,
                 or when booting the DX, the DX  resets  the  TX03.   The  reset
                 traps  the  TX03  to  zero,  leaving any operation on the other
                 channel in a hung state.

            19.  DX20/TX03- when dual-porting between  systems,  DX2FGS  BUGCHKs
                 occur.  The DX2FGS timer in TOPS-20 has been made larger.  This
                 should lessen the occurrence of  these  BUGCHKs,  but  may  not
                 eliminate them entirely.

            20.  LP20 - at least one of the printers fails to go  off-line  when
                 there  is  anything  in the print line buffer, even if the drum
                 gate is opened.

            21.  LP26 - fails to go offline when there is something left in  the
                 print  line buffer.  When it runs out of paper, it goes offline
                 several lines from the actual bottom of the page.

            22.  LP27 - will not  accept  the  alternate  start  bytes  for  6/8
                 lines-per-inch in a VFU file.  Gets a VFU load error.  Use only
                 "LEFT-ALONE"  with  the  "LINES-PER-INCH"  command  to  MAKVFU.
                 NORMAL.VFU is fine.

            23.  RP07 -  runs  several  hundred  microdiagnostics  at  power-up,
                 causing  hundreds  of  interrupts  and  keep-alives on -10s and

            24.  RP20 - when using the dual-port option, RP20s regularly lose  a
                 full  rotation  when  trying  to do read/write next operations.
                 This happens about once every ten seconds and  will  result  in
                 slightly degraded performance.

            25.  RP20 - may evidence RMR (register modification refused)  errors
                 due  to  a limp servo mechanism on one of the drives.  Multiple
                 queued  operations  complete  before  the   drive   disconnects
                 properly and sets GO correctly for the controller to handle the

            26.  KL10 Microcode - the ADJBP instruction does  not  work  on  the
                 last location of a page.  Corrected in 5.1 microcode.

            27.  KS-10 Front End - Rev.  3.  exhibits problems with  the  KLINIK
                 line.   If  the  link is in use, it is possible to lock out the
                 CTY.  There are problems with the password check on  subsequent
                 tries, and problems with line hang-up.  A software fix has been
                 implemented which clears the KLINIK output word after  queueing
                 the KLINIK request.  This appears to solve the problems.

            28.  KS-10 Front  End  -  Rev.   3.   exhibits  some  problems  with
                 powerfail  restart.   If  the  power  returns  in less than 3.5
                 seconds or so the restart will hang.  In addition  if  Rev.   3
                 and  Rev.  2 boards are mixed, there is no powerfail restart or
                 reload capability.

            29.  KS10 - during a forced reload, the halt status block is written
                 twice,  first when halting and second when rebooting;  thus the
                 second time wipes any valuable data from the first time.   It's
                 once again the 8080 that's responsible.
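
        A few of the items above give simple numeric signatures that can be
        checked when looking at an error report.  The Python sketch below
        expresses three of them (items 8, 10 and 26).  It is only an
        illustration: the function names are invented, and the 512-word
        page size used for item 26 is stated here as an assumption about
        the machine rather than taken from the list itself.

```python
PAGE_SIZE = 0o1000  # assumed: 512 words per page


def tm03_read_is_risky(byte_count):
    """Item 8: industry-compatible reads should be a multiple of 4 bytes."""
    return byte_count % 4 != 0


def looks_like_preamble_sync_loss(initial_frame_count, residual_frame_count):
    """Item 10: residual frame count equal to the initial frame count."""
    return residual_frame_count == initial_frame_count


def is_last_word_of_page(address):
    """Item 26: ADJBP trouble is tied to the last location of a page."""
    return address % PAGE_SIZE == PAGE_SIZE - 1


print(tm03_read_is_risky(4098))                    # 4098 = 4 * 1024 + 2
print(looks_like_preamble_sync_loss(2048, 2048))
print(is_last_word_of_page(0o123777))
```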

                           KS10 PROCESSOR CONSOLE INFORMATION


        ^Z      ;enter USER mode
        ^\      ;enter CONSOLE mode
        MK XX   ;Marks microcode word at CRAM address XX (sets bit 95)
        UM XX   ;Unmarks Microcode at CRAM address XX
        MB      ;load only bootstrap of currently selected magtape
        LA XX   ;Load/set KS10 Memory Address
        LI XX   ;Load/set I/O address
        LK XX   ;Load/set 8080 address
        LC XX   ;Load/set CRAM address to be written/read
        EM      ;Examine KS10 Memory (last Memory location specified)
        EM XX   ;Examine KS10 Memory location XX
        EN      ;Examine Next (either from last EK, EM or EI)
        EB      ;Examine BUS and 8080 control registers
        EI      ;Examine I/O (last I/O address specified)
        EI XX   ;Examine I/O address XX
        EK      ;Examine 8080 location
        EK XX   ;Examine 8080 address XX
        DM XX   ;Deposit KS10 Memory last addressed, XX data (36 bits)
        DN XX   ;Deposit next (depending on last DK, DM or DI) XX data
        DB XX   ;Deposit BUS, XX data (36 bits)
        DI XX   ;Deposit I/O, XX data (16,18 or 36 bits)
        DK XX   ;Deposit XX (8 bits) into 8080 (Data can only be deposited
                ;in RAM addresses)
        MR      ;MASTER RESET
        CS      ;CPU clock start
        CH      ;CPU clock halt
        CP XX   ;CPU clock pulse (XX=NR of pulses -- default 1 pulse)
        SI      ;Single Instruction
        LF XX   ;Load diagnostic write function (0-7) specifying 12 bits of
                ;microcode (see note at end ****)
        DF XX   ;Deposit Field, write microcode bits according to last LF-command
        EC      ;Examine CRAM ..curr. Control reg, no clocks .. current loc as addr.
        EC XX   ;Examine CRAM at address XX
        DC XX   ;Deposit CRAM, XX is at least 32 octal characters. Address 
                ;previously loaded by LC command
        EX XX   ;EXecute KS10 instruction XX
        ST XX   ;STart KS10 at address XX. Console enters user mode
        SM XX   ;Start microcode at XX (SM 1 causes dump of HALT-status block !!) 
                ;Default is 0 -- Start microcode
        HA      ;HALT KS10 (execute HALT-instruction -- causes microcode to
                ; write HSB and then to enter HALT-loop)
        SH      ;SHUTDOWN (deposit non-zero data in memory location 30)
                ; causing TOPS20 to shut down
        CO      ;Continue (causes microcode to leave HALT-loop)
        PE X    ;Parity Enable (0=disable, 1=DRAM-par, 2=CRAM-par,
                ; 4=clock-par error stop, 5=DPE/DPM, 6=CRA/CRM, 7=enable all)
        CE X    ;CACHE enable (0=OFF, 1=ON, <CR>=show current state)
        TE X    ;CPU timer (1 MSEC) enable (0= OFF, 1=ON, <CR>=show current state)
        TP X    ;CPU TRAPS enable (0=OFF, 1=ON (enables paging),
                ;<CR>=show current state)
        LT      ;Lamp Test, lights three lamps of front panel
        RC      ;Read CRAM direct, functions 0-17
                ; (no resets, no load diag adr, no CPU clock) (see note at end ****)
        EJ      ;Examine Jumps -- prints CRAM address signals (Current CRAM address, 
                ;next CRAM address, jump address, subroutine return address)
        TR XX   ;TRACE - repeats CP and EJ commands until any character typed
                ;XX (if typed) is desired CRAM stop-address
        PM      ;Pulse Microcode (issue single CP and EJ)
        ZM      ;Zero KS10 MOS Memory (beware -- slow)
        RP      ;Repeat - repeats last command, or line of commands which it delimits
                ; Any character (except CNTRL-O) typed will stop repeat
                ;EXAMPLE: EM 0, EK 0, EC 0, RP will repeat execution of this line
        BT      ;Boot SYSTEM -- load CRAM from designated disk (see DS)
                ; via memory then load monitor boot from disk and start at 1000
        BT 1    ;same as BT, but loads diagnostic monitor SMMON and starts at 20000
        LB      ;Load Bootstrap from designated disk (see DS)
        LB 1    ;Load Bootstrap diagnostic monitor SMMON
        DS      ;Disk Select for bootstrap or microcode verification. Command prompts 
                ;to specify UNIT NUMBER (default 0), RHBASE (default 776700), 
                ;and UNIBUS ADAPTER (default 1) to load from when booting
        MS      ;Magtape Select for bootstrap or microcode verification. Command 
                ;prompts to specify UNIT NUMBER (default 0), RH BASE (default 772440),
                ;UNIBUS ADAPTER (default 3), SLAVE NUMBER (default 0), and 
                ;DENSITY (default 1600 BPI) of magtape to boot from
        MT      ;Magtape Boot system from selected magtape
        MT 1    ;BOOT diagnostic monitor SMMAG from magtape
        PW      ;clears KLINIK password, or sets it (6 char's max)
        KL x    ;KLINIK control:  0 = off, 1 = on for remote CTY access
        BC      ;BOOT Check. PROM code which tests the basic 2020 system
                ; load path from the UNIBUS adaptor into the CRAM via memory.

        ^U      ;rub out current line
        ^O      ;switch: first one stops CTY-output, second one resumes CTY-output
        ^S      ;stop TTY-output and hangs 8080 waiting for CONTROL-Q (see below)
        ^Q      ;resumes TTY-output
        ^C      ;stops whatever the 8080 is doing
        RUB-OUT ;rub out previous character typed

        NOTE:   Several commands may be put on a single line, separated by commas.

        NOTE:   Additional information on KS10 console commands can be found 
                 in the KS10 MAINTENANCE GUIDE

        *****   CRAM Bit Formats

                LF-Command CRAM Bits            RC-Command CRAM Data
                --------------------            ---------------------

                LF      CRAM bits               RC      Data
                --      ---------               --      ------------------------------

                0       00-11                   0       CRAM bits 00-11
                1       12-23                   1       Next CRAM address
                2       24-35                   2       CRAM subroutine return address
                3       36-47                   3       current CRAM address
                4       48-59                   4       CRAM bits 12-23
                5       60-71                   5       CRAM bits 24-35 (Copy A)
                6       72-83                   6       CRAM bits 24-35 (Copy B)
                7       84-95                   7       0s
                                                10      Parity bits A-F
                                                11      KS10 bus bits 24-35
                                                12      CRAM bits 36-47 (Copy A)
                                                13      CRAM bits 36-47 (Copy B)
                                                14      CRAM bits 48-59
                                                15      CRAM bits 60-71
                                                16      CRAM bits 72-83
                                                17      CRAM bits 84-95
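
        The two tables above can be read as a mapping from function number to
        a 12-bit slice of the 96-bit CRAM word.  The following Python sketch
        shows one way to express that mapping and to rebuild a full word from
        RC data.  The names (lf_bit_range, assemble_cram_word) are
        hypothetical, and the sketch assumes that CRAM bit 00 is the leftmost
        (most significant) bit; it is an illustration, not a console utility.

```python
# Sketch only. Function numbers are octal, as typed at the console.
# Assumption: CRAM bit 00 is the leftmost (most significant) bit of the word.

# RC function number -> first CRAM bit of the 12-bit field it returns.
# Copy A is used for the two fields that have an A and a B copy.
RC_FIELD_START = {
    0o0: 0,    # CRAM bits 00-11
    0o4: 12,   # CRAM bits 12-23
    0o5: 24,   # CRAM bits 24-35 (Copy A)
    0o12: 36,  # CRAM bits 36-47 (Copy A)
    0o14: 48,  # CRAM bits 48-59
    0o15: 60,  # CRAM bits 60-71
    0o16: 72,  # CRAM bits 72-83
    0o17: 84,  # CRAM bits 84-95
}

# (Copy A function, Copy B function) pairs that should hold the same data.
RC_COPY_PAIRS = [(0o5, 0o6), (0o12, 0o13)]


def lf_bit_range(function):
    """Return (first, last) CRAM bit selected by LF function 0-7."""
    if not 0 <= function <= 7:
        raise ValueError("LF function must be 0-7")
    first = 12 * function
    return first, first + 11


def assemble_cram_word(rc_data):
    """Build a 96-bit CRAM word from a dict of RC function -> 12-bit value."""
    for copy_a, copy_b in RC_COPY_PAIRS:
        if rc_data[copy_a] != rc_data[copy_b]:
            raise ValueError("copies A and B differ (compare the ?A/B message)")
    word = 0
    for function, first_bit in RC_FIELD_START.items():
        shift = 96 - 12 - first_bit  # distance of the field from the right end
        word |= (rc_data[function] & 0o7777) << shift
    return word
```

        For example, lf_bit_range(3) gives (36, 47), matching the LF table.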


        ?A/B            A and B copies of CRAM bits did not match
        ?BC             BOOT Check failed
        ?BFO            Input Buffer Overflow
        ?BN             Received Bad Number on input (character typed is not an 
                         octal number)
        ?BT             Device error or timeout during BOOT operation
        ?BUS            BUS polluted on power-up
        ?CHK            PROM checksum failed
        ?DNC            Did Not Complete HALT
        ?DNF            Did Not Finish instruction
        ?FRC            had a forced reload
        ?IA             Illegal Argument (address out of range, etc.)
        ?IL             ILLEGAL Instruction
        ?KA             KEEP ALIVE failed
        ?MRE            Memory Refresh Error (MEM BUSY stayed set too long,
                         because it didn't release data on a write to memory)
        ?NBR            Console was not granted BUS on a request
        ?NDA            Received No Data Acknowledge on memory request
        ?NR-SCE         Non-Reversible Soft CRAM error.
        ?NXM            Referenced NoneXistent Memory location
        ?PAR ERR        Report clock-freeze due to parity error,
                         and type out READ IO of 100,303,103
        ?PWL            Password Length error
        ?RA             Command Requires Argument
        ?RUNNING        CPU clock running (command typed requires clock to be stopped
                         and may fail)
        %SCE            Soft CRAM error
        ?UI             Unknown Interrupt


        BT SW                   message says BOOTING, using BOOT switch
        BUS 0-35                message header for EB command
        CYC                     cycle type for DB command
        C CYC                   typed on DB-command if COM/ADR cycle blew
        D CYC                     "             "      DATA    cycle blew
        HLTD                    message "HALTED/XXXXXX " where xxxxxx is data
        KS10>                   prompt message
        OFF                     message, says current state is off
        ON                      message, says current state is on
        RCVD                    data received on bus
        SENT                    data sent to bus
        >>UBA?                  query for UNIBUS adapter
        >>UNIT?                 query for unit to use
        >>RHBASE?               query for RH11 base register address to use
        >>DENS?                 query tape density
        >>SLV?                  query tape slave number


                On an error-condition, detected by the 8080, the
                Fault-light will go on and a message of the form

                        ?BT XXXYYY

                will be printed on the CTY.

        The following error-codes are only "rough" pointers; they can be
        caused by any of the following problems:

                Disk not a disk at all
                Wrong unit selected (see DS-command)
                Home blocks not readable or not there
                Home blocks not set by SMFILE for 8080
                8080 File-system garbage

        XXX=001 Disk error encountered while trying to read HOME-blocks
                Can mean incorrect RHBASE specified, wrong UBA selected,
                bad disk drive, or neither the home block nor the alternate
                home block has the home block ID ("HOM" in sixbit)

        XXX=002 Disk error encountered while trying to read the page of
                pointers, which make up the "8080-File-System"
                Can mean pack is not in format for 8080 loading, home blocks
                bombed, bad drive or pack

        XXX=003 Disk error encountered while trying to read a page of
                microcode - can mean pack is not in 8080 format, or bad drive or
                pack

        XXX=004 Microcode did not successfully start running after a BT, MT,
                MB, or LB command.  This error will occur when an LB is done
                before the system microcode is loaded.

        XXX=010 Disk error encountered while trying to read PRE-BOOT

        YYY     are the lower 8 bits of the 8080 address of the failing
                "Channel Command List" operation. At this point it is usually
                a good bet to do an "EI" to get the contents of the
                RH11 register that has the error-bits set.


        The following ERROR-messages can point to the following problem areas:

                Magtape is no magtape at all
                Wrong unit selected (see MS-command)
                Magtape is not bootable (no microcode, no PRE-BOOT)

        XXX=001 Error trying to read microcode first page
                Can mean wrong unit selected, wrong RHBASE address, wrong UBA
                selected, wrong slave number, wrong density, bad drive, bad
                controller, bad tape, tape in wrong format

        XXX=003 Error trying to read additional pages of microcode

        XXX=010 Error trying to read in PRE-BOOT program
                May occur while doing a skip over the microcode file, or
                while reading the PRE-BOOT itself

        YYY     see above (disk-section)
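
        The XXX codes listed for disk and magtape can be collected into a
        small lookup.  The Python sketch below splits a "?BT XXXYYY" message
        into its two halves and looks up the rough meaning.  It assumes that
        XXXYYY is printed as six octal digits; the function name
        decode_bt_error is hypothetical.  Code 004 is listed only in the disk
        section above, but its description mentions MT as well, so the sketch
        includes it in both tables.

```python
import re

# Rough meanings taken from the disk and magtape lists above.
MICROCODE_DID_NOT_START = "microcode did not start after BT, MT, MB, or LB"

DISK_BOOT_ERRORS = {
    0o001: "disk error reading HOME blocks",
    0o002: "disk error reading the 8080 file-system pointer page",
    0o003: "disk error reading a page of microcode",
    0o004: MICROCODE_DID_NOT_START,
    0o010: "disk error reading PRE-BOOT",
}

MAGTAPE_BOOT_ERRORS = {
    0o001: "error reading first page of microcode",
    0o003: "error reading additional pages of microcode",
    0o004: MICROCODE_DID_NOT_START,
    0o010: "error reading PRE-BOOT",
}


def decode_bt_error(message, magtape=False):
    """Split '?BT XXXYYY' into (XXX, YYY, rough meaning of XXX)."""
    # Assumption: XXX and YYY are each three octal digits.
    match = re.fullmatch(r"\?BT\s+([0-7]{3})([0-7]{3})", message.strip())
    if match is None:
        raise ValueError("not a ?BT XXXYYY message: %r" % message)
    xxx = int(match.group(1), 8)
    yyy = int(match.group(2), 8)  # low 8 bits of the failing 8080 address
    table = MAGTAPE_BOOT_ERRORS if magtape else DISK_BOOT_ERRORS
    return xxx, yyy, table.get(xxx, "unknown error code")
```

        The meaning returned is only the "rough" pointer described above; the
        list of possible causes still has to be checked by hand.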


        PRE-BOOT is loaded from Disk or Magtape (see 8080 commands DS, MS,
                 BT, BT 1, MT, MT 1)

        PRE-BOOT is written onto the disk using "SMFILE.EXE"; it is also written onto
        "standard" Diagnostic-tapes and onto the "MONITOR-INSTALLATION"-tapes.

        PRE-BOOT is loaded by the 8080 into MEMORY-locations 1000 and up, and starts
        at 1000.  The ERROR-halts are:

                1001    found "bad" core-transfer address
                         (page 1 is illegal - can't overload PRE-BOOT)
                1002    Disk Retry error or Magtape Read error
                1003    No RH11 Base Address
                1004    Magtape Skip failure

        At ERROR-halt time the following MEMORY-Locations contain the useful INFO :

                        Disk-Booting                    Magtape-Booting
                        ------------                    ---------------

                100     "8080" disk-address             Not used
                101     Memory transfer address         same
                102     T3, selection pickup pointer    same
                103     RPCS1-register                  MTCS1-register
                104     RPCS2-register                  MTCS2-register
                105     RPDS - register                 MTDS - register
                106     RPER1-register                  MTER1-register
                107     RPER2-register (RP06 only)      Not used
                110     RPER3-register                  Not used
                111     UBA Page RAM loc 0              same
                112     UBA-status register             same
                113     Version Nr. of PRE-BOOT         same

                Note: The Version Nr. of PRE-BOOT will be the same as the Version Nr.
                of SMFILE. The "8080" disk-address is in the form " CYL SEC SURF "
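
        When reading these locations back over the phone, it can help to
        label each word as it is examined.  The Python sketch below pairs the
        halt addresses and locations 100 through 113 from the tables above
        with their meanings.  The function name label_preboot_dump and the
        output layout are hypothetical.

```python
# PRE-BOOT error halts (octal addresses) from the list above.
PREBOOT_HALTS = {
    0o1001: "bad core-transfer address (page 1 is illegal)",
    0o1002: "disk retry error or magtape read error",
    0o1003: "no RH11 base address",
    0o1004: "magtape skip failure",
}

# Location -> (meaning when disk booting, meaning when magtape booting).
PREBOOT_INFO = {
    0o100: ("8080 disk address (CYL SEC SURF)", "not used"),
    0o101: ("memory transfer address", "memory transfer address"),
    0o102: ("T3, selection pickup pointer", "T3, selection pickup pointer"),
    0o103: ("RPCS1 register", "MTCS1 register"),
    0o104: ("RPCS2 register", "MTCS2 register"),
    0o105: ("RPDS register", "MTDS register"),
    0o106: ("RPER1 register", "MTER1 register"),
    0o107: ("RPER2 register (RP06 only)", "not used"),
    0o110: ("RPER3 register", "not used"),
    0o111: ("UBA Page RAM loc 0", "UBA Page RAM loc 0"),
    0o112: ("UBA status register", "UBA status register"),
    0o113: ("version number of PRE-BOOT", "version number of PRE-BOOT"),
}


def label_preboot_dump(words, magtape=False):
    """Label examined words; `words` maps address -> 36-bit contents."""
    column = 1 if magtape else 0
    lines = []
    for address in sorted(PREBOOT_INFO):
        if address in words:
            meaning = PREBOOT_INFO[address][column]
            lines.append("%o/ %012o  %s" % (address, words[address], meaning))
    return lines
```

        Addresses and contents are printed in octal, as the console prints
        them.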

        TO DO AN :

                EM 77
                ...... AND TYPE SOMETHING AFTER ADDRESS 115
                ...... AND THEN TELL US WHAT HE SEES

        8080-Communication-Area (KS10 Memory)

        The 8080 maintains and services an in-core communication area.
        Currently used are words 31 to 40.  See PROKS/PROLOG for more info.

        Word  Bits      Meaning
        ----  ----      -------
          31            Keep Alive and Status word
                4           Reload Request
                5           Keep Alive active
                6           KLINIK active
                7           PARITY Error detect enabled
                8           CRAM Parity Error detect enabled
                9           DRAM Parity Error detect enabled
                10          CACHE enabled
                11          1 msec enabled
                12          TRAPS enabled
                20-27       Keep Alive counter field
                32          BOOT SWITCH BOOT
                33          POWER FAIL
                34          Forced RELOAD
                35          Keep Alive failed to change
          32            KS-10 CTY input word (from 8080)
                20-27      0 -- no action, 1 -- CTY character pending
                28-35      CTY-character
          33            KS-10 CTY output word (to 8080)
                20-27      0 -- no action, 1 -- CTY character pending
                28-35      CTY-Character
          34            KS-10 KLINIK user input word (from 8080)
                20-27      0 -- no action, 1 -- KLINIK character,
                           2 -- KLINIK active, 3 -- KLINIK carrier loss
                28-35      KLINIK-Character
          35            KS-10 KLINIK user output word (to 8080)
                20-27      0 -- no action, 1 -- KLINIK character, 2 -- Hangup request
                28-35      KLINIK-Character
          36            BOOT RH-11 Base Address
          37            BOOT Drive Number
          40            Magtape Boot Format and Slave Number
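
        The bit assignments above can be decoded mechanically from a word
        examined with EM.  The Python sketch below uses the usual PDP-10
        convention that bit 0 is the leftmost bit of the 36-bit word and bit
        35 the rightmost.  The function names are hypothetical, and the
        sketch is only an aid for reading a dump, not part of any DEC tool.

```python
# Flag bits of word 31 (Keep Alive and Status word), from the table above.
STATUS_FLAGS = {
    4: "reload request",
    5: "keep alive active",
    6: "KLINIK active",
    7: "parity error detect enabled",
    8: "CRAM parity error detect enabled",
    9: "DRAM parity error detect enabled",
    10: "cache enabled",
    11: "1 msec enabled",
    12: "traps enabled",
    32: "boot switch boot",
    33: "power fail",
    34: "forced reload",
    35: "keep alive failed to change",
}


def field(word, first, last):
    """Extract bits first..last of a 36-bit word (bit 0 is leftmost)."""
    width = last - first + 1
    return (word >> (35 - last)) & ((1 << width) - 1)


def decode_status_word(word):
    """Return (list of flag names that are set, keep-alive counter)."""
    flags = [name for bit, name in sorted(STATUS_FLAGS.items())
             if field(word, bit, bit)]
    return flags, field(word, 20, 27)


def pack_cty_word(char):
    """Build a CTY word: flag 1 in bits 20-27, character in bits 28-35."""
    return (1 << 8) | (ord(char) & 0o377)


def unpack_cty_word(word):
    """Return (flag from bits 20-27, character from bits 28-35)."""
    return field(word, 20, 27), chr(field(word, 28, 35))
```

        The same field helper applies to the KLINIK words 34 and 35, which
        use the same flag and character positions.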

        OUTPUT process KS10 ==> 8080

         Load character and flag into  33,   set 8080-interrupt,   8080 examines
           33 and gets character, clears interrupt, sends character to hardware,
           clears 33 and sets KS-10 interrupt.

        INPUT process 8080 ==> KS10

         8080 gets interrupted "TTY-char available",   8080 gets character and
          delivers into input-word (32) with flag(s) and sets KS-10 interrupt.

                           BOOT COMMAND STRING FUNCTIONALITY

        The BOOT program is usually invoked in one of two  ways:   invisibly  as
        part  of  a  system dump and reload on a crash condition, or by explicit
        invocation and response to the  BOOT>  prompt,  usually  with  a  simple
        carriage return.

        BOOT, however, possesses substantially more command line  functionality,
        at  least  some  of which can be useful to the Specialist in a debugging
        situation.  This document tries to explain some of that functionality in
        the context of the BOOT for Release 5 of TOPS-20.

        BOOT parses a command string of the form:


        with the restrictions that some switches have  precedence  and  so  some
        combinations  are  meaningless,  and  that  the  directory must NOT be a
        subdirectory, i.e.  <SYSTEM> is legal, but <SYSTEM.MONITORS> is not.

        The "default" command string (in response to a simple  carriage  return)
        is:   PS:<SYSTEM>MONITR.EXE/R  -  i.e.   load  and  start  the  resident
        monitor.

        The available switches are:

          /M    MERGE - merge specification with current memory.

          /L    LOAD  -  load  according  to  specification   (suppresses
                  default startup).

          /A    ALL - load all of specified  file,  useful  for  avoiding
                  bounds values;  loads up to page 377.

          /R    RUN - run specification - this is the default.  Load  and
                  start  at  the  EXE  file entry vector location.  If no
                  (firstpage,lastpage) specification is given, then  BOOT
                  will look in .JBSYM for the last location of the symbol
                  table and load up to that point (this assumes that this
                  is  the  last  location  in  the resident monitor).  If
                  .JBSYM is zero, or there is no page zero  in  the  .EXE
                  file  to  find it in, then the old assembled-in default
                  of (0,340) will be used.

          /D    DUMP - dump on given specification.  The default here  is
                  PS:<SYSTEM>DUMP.EXE.1 but other existing files could be
                  used, e.g.  if the normal dump file  kept  causing  the
                  dump to fail with ?IO Error because of bad pages or was
                  too small, etc.  one could do something like:


                  BOOT has  special  address  space  knowledge  which  it
                  applies  to  writing out the monitor dumps, such as not
                  writing out pages overwritten by BOOT, etc.

          /S    SAVE - similar to /D, but no special  knowledge  applied,
                  saves  according  to  specification.  Useful for things
                  like saving image of BOOT for debugging.

          /Gadr GO - transfer control to location adr;  for example /G141
                  to invoke EDDT.

          /E    EDDT - load and transfer to EDDT.  This  is  a  shorthand
                  method  for  the  old  two command sequence of /L, then
                  /G141;  e.g.


          /I    INFORMATION - displays the current version  of  BOOT  and
                  the  version  numbers  of  any DX20A or DX20B microcode
                  assembled into BOOT.

        The monitor uses the (lowerbound,upperbound) construct  in  loading  the
        swappable  monitor in multiple passes into the available physical memory
        by building the appropriate command string to  merge  the  next  set  of
        pages and invoking BOOT at the VBOOT entry point multiple times.
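
        The rule given under /R for choosing the page bounds can be
        summarized in a few lines.  The Python sketch below is an
        illustration only: the function name run_bounds is hypothetical, the
        last symbol table location is passed in rather than derived from
        .JBSYM, and a lower bound of page 0 is assumed when the upper bound
        is taken from the symbol table.

```python
DEFAULT_BOUNDS = (0o0, 0o340)  # the old assembled-in default, in octal pages
PAGE_SHIFT = 9                 # 512 words per page


def run_bounds(explicit=None, symtab_last_location=0, have_page_zero=True):
    """Choose (firstpage, lastpage) for /R as described in the text above.

    explicit              -- (firstpage, lastpage) from the command, or None
    symtab_last_location  -- last location of the symbol table as found via
                             .JBSYM; 0 stands for ".JBSYM is zero"
    have_page_zero        -- False if the .EXE file has no page zero
    """
    if explicit is not None:
        return explicit
    if not have_page_zero or symtab_last_location == 0:
        return DEFAULT_BOUNDS
    # Assumption: load from page 0 up to the page holding that location.
    return (0, symtab_last_location >> PAGE_SHIFT)
```

        With a last symbol table location of 277777 (octal), the sketch
        yields pages 0 through 277 (octal).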

                          TOPS-20 CRASH ANALYSIS FUNDAMENTALS

        TOPS-20 crash analysis is a complex subject which can never be fully
        taught since the areas of interest change constantly as new versions are
        released, with new problems;  and as problems are fixed,  leaving  areas
        of  code  "stable",  and  no longer hot analysis prospects.  Hence, like
        diagnostics, things like crash analyzers tend to  evolve  into  software
        that  can  produce enormous quantities of uninteresting data after a lot
        of computes, but come up with a bottom line  of  "I  don't  know  what's

        These articles attempt to present the fundamental tools and methods that
        usually  wind  up  getting used on most crashes, and explain some of the
        data structures that often need to be passed through on the way  to  the
        answer  to  the  problem.   Some  effort  is  made  to  give some of the
        traditional methods by which hiding bugs may be forced into the open.

        CRASH DUMPS:

        Each time there is a BUGHLT there is an automatic dumping of the  system
        core image into PS:<SYSTEM>DUMP.EXE.  If there is sufficient room on the
        disk the data that was  previously  in  DUMP.EXE  will  be  copied  into
        DUMP.CPY  by SETSPD after the system is reloaded.  DUMP.CPY does not get
        deleted and you may find several generations of DUMP.CPY.

        TOPS-20 will not create a dump of  the  Monitor  unless  the  system  is
        properly  prepared  to  do so.  This means that there must first exist a
        file called PS:<SYSTEM>DUMP.EXE that will  accommodate  the dump.   This
        file  can  be  found  on the distribution tape for TOPS-20, or it can be
        created by using the MAKDMP program, which will accept the  memory  size
        from the user, and create the proper sized file.  The file must contain
        at least as many pages as there are pages of physical memory in the
        DECSYSTEM-20, plus enough pages to hold the EXE file directory for the
        dump (generally one), minus the number of pages that BOOT overwrites,
        which will not be present in the dump.  For example, a system that has
        1024K words of memory should have a DUMP.EXE file that is about 2048
        pages long.  Since a page is 512 words, the number of pages in the dump
        file must be about twice the machine's memory capacity in K words.
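
        The sizing rule above is simple arithmetic and can be checked with a
        short Python sketch.  The function name dump_file_pages is
        hypothetical, and the number of pages BOOT overwrites is left as a
        parameter because it is not stated here; the default of 0 gives a
        slightly generous size.

```python
WORDS_PER_PAGE = 512  # one TOPS-20 page


def dump_file_pages(memory_k_words, directory_pages=1, boot_pages=0):
    """Pages needed in DUMP.EXE for a machine with the given memory size.

    memory_k_words  -- physical memory in K (1024) words
    directory_pages -- pages for the EXE file directory (generally one)
    boot_pages      -- pages overwritten by BOOT and absent from the dump
                       (value not given in the text; 0 is a safe default)
    """
    memory_pages = memory_k_words * 1024 // WORDS_PER_PAGE
    return memory_pages + directory_pages - boot_pages
```

        For a 1024K system this gives 2048 memory pages plus one directory
        page, in line with the "about 2048 pages" figure above.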

        It is possible to give a FILENAME/D to BOOT to specify where to dump the
        monitor,  so it is possible to put up another pack, or whatever to get a
        dump for those situations where there is no  existing  DUMP.EXE  on  the
        pack to dump into.  The filename given must exist however, and not be in
        a subdirectory, or too small, or not all the memory will be saved.   See
        the article on BOOT commands for more info.


        Normally, when the system has  crashed  for  whatever  reason,  it  will
        reload  itself  using the BOOT program.  This Auto-reload feature can be
        suppressed, by giving the "SET NOT RELOAD" or "CLEAR RELOAD" command  to
        the PARSER.  The PARSER must first be set in PROGRAMMER mode, via the
        "SET CONSOLE PROGRAMMER" command.  These commands do not apply to 2020's,
        of course.  On a 2020 there is instead a location in the 8080 which, when
        it contains the right number, will prevent automatic reloads after
        crashes.  The location depends on the revision level of the ROM, which is
        typed at system startup.  The following commands will turn off
        auto-reload:

                ROM level 0.1                   ROM level 4.2
                        KS10>LK 20255                   KS10>LK 20256
                        KS10>DK 303                     KS10>DK 303

        Also, patching the BUGHLT  code  where  the  reload  is  requested  will
        prevent an auto-reload.  Placing a JFCL in locations BUGH2+3 and BUGH2+4
        in the running  monitor  will  prevent  the  monitor  from  issuing  its
        reload request.

        BOOT has a limited file system capability  when  creating  the  file  to
        contain  the  dump,  and  in  this manner avoids complicating a possibly
        compromised file structure during the reload.  It  is  for  this  reason
        that  the DUMP.EXE file must already exist on the public structure;  for
        BOOT can find it there, but it cannot create it if it does not  already
        exist.   Also,  because BOOT resides in main memory of the host (KL10 or
        KS10) processor, small portions of the Monitor will be overwritten  when
        BOOT  is  loaded into memory.  Currently, BOOT is written into that area
        of the resident Monitor that normally contains pure code, and as such is
        not  usually  of  much  consequence.   When  one  needs to refer to this
        portion of the code, either the listings or fiche should be used.

        If for some reason the system fails to auto-reload,  then  it  is  still
        possible  to  obtain a copy of the dump.  To do this, the front end must
        have at least loaded the BOOT program, and the console will display  the
        BOOT prompt:

                                BOOT>


        BOOT has a number of  commands  that  may  be  used  to  manipulate  the
        contents of the processor memory;  in this case, the command we will use
        will cause BOOT to copy the contents of memory into PS:<SYSTEM>DUMP.EXE:

                                BOOT>/D    or   BOOT>filename/D
                                BOOT>           BOOT>

        At this point the system may be brought up normally, and the analysis of
        the dump may begin.

        Similarly, a KL-10  system  may  be  set  to  suppress  the  auto-reload
        facility,  and  the CTY will prompt with the KLI> prompt.  Simply typing
        the word "BOOT" will load the BOOT program into memory.  There are cases
        where  the  system may be completely hung, and it is unclear how to best
        initiate an orderly shutdown.  Obviously, it is always possible to  type
        the  control-backslash  (^\)  character  at  the  CTY  to  get  into the
        front-end parser, but then what  can  be  done?   The  front-end  parser
        allows  the  operator  to  force  the  processor  to jump to a specified
        location, and in the case described above, this feature may be  used  to
        force  a  BUGHLT.   This can be done after typing ^\, with the following
        commands:

                        PAR>SET CONSOLE PROGRAMMER
                         CONSOLE MODE: PROGRAMMER
                        PAR>JUMP 71

        causing the console to return to USER  mode,  connected  to  the  KL-10.
        This  will be followed immediately by a KPALVH BUGHLT (Keep Alive Halt),
        and the system will perform the  usual  BUGHLT  procedures.   The  above
        command  forces the processor to jump to location 71, which in turn will
        cause the BUGHLT, sweeping the cache to ensure all  of  the  dump  taken
        will contain valid data.  Simply forcing the processor to halt, and then
        reBOOTing and getting a dump will cause the cache to be invalidated, and
        random locations in the dump will not contain valid data.

        On the 2020 the equivalent command is "KS10>ST 71".


        The  front-end  will  generally  create  a  crash   dump   file   called
        PS:<SYSTEM>0DUMP11.BIN, containing the core image of the PDP-11.  If the
        front-end is hung, and none of the terminals are  usable,  it  is  still
        possible to obtain a dump of the -11.  By setting the HALT/ENABLE switch
        of the -11 to the HALT position, and then back to the  ENABLE  position,
        the KL-10 will force the -11 to reload.  In the process of reloading the
        -11, the KL will indicate to the -11 that it has reloaded, and send  the
        necessary  information  to set up the terminals, and unit record devices
        connected to the -11.  The -11 will, in the process of  reloading,  dump
        the  old core image into the 0DUMP11.BIN file mentioned earlier.  In the
        event that the problem will be the subject  of  an  SPR,  the  front-end
        crash dump should also be included on the DUMPER tape with the SPR.


        First when analyzing software or software/hardware problems be sure  you
        have the proper tools:

             1.  A SWSKIT on magtape to provide further tools and documentation.

             2.  A full copy of the current release microfiche MONITOR and  EXEC
                 or equivalent listings online or on paper.

             3.  A copy  of  the  Monitor  Tables  document  from  the  Software
                 Notebooks or the SWSKIT tape.

             4.  A MONITOR CALLS Reference Manual.

             5.  A SPEAR (formerly SYSERR) manual.

             6.  A listing of the SPEAR/SYSERR log, especially  if  hardware  is
                 suspected.

             7.  The  CTY  log  for  BUGHLTs  and  BUGINFs  or   other   problem
                 indications, or an accurate reproduction of this information.

             8.  Any other manuals you may need for reference such as the proper
                 version  Installation  Guide,  Operators Guide, System Managers
                 Guide, etc.

             9.  The BUGS.MAC file for releases 4.1, 5.0, 5.1.

            10.  The TOPS-20.BWR file for documentation of known  exceptions  to
                 the normal documents.

            11.  The current FILDDT.EXE to examine the dump.

            12.  The MONITR.EXE responsible for the crash to load symbols  from,
                 and, of course

            13.  The DUMP.EXE or DUMP.CPY resulting from the crash.

        You will need the SWSKIT and perhaps listings of the latest versions  of
        monitor modules in case the microfiche are not up to date.  FILDDT is on
        the customer's distribution tape.

        Be sure you have analysed the SPEAR log.  This is  the  easiest  way  to
        determine  the  hardware  state of the machine at the time of the crash.
        Be sure, also, that you have looked up  the  BUGHLT  and/or  BUGCHKs  in
        question  in  the  listings  (microfiche)  and  have  at  least read the
        comments around them.  Tracing down how it got called is probably a good
        idea.   If  you happen to be without a GLOB (provided on microfiche) you
        can find the BUGHLT tag of interest in the monitor as follows:

                $GET <SYSTEM>MONITR.EXE
                $ST 140
                ILMNRF?                 ; BUGHLT of interest followed by "?"
                PAGEM G                 ; it is defined in PAGEM and is global

        Some other useful bits of information:  There is a GLOB listing provided
        in the microfiche which contains a list of all the global symbols in the
        monitor.  Most of the data symbols are defined in the module STG.MAC.  If
        you don't know a tag name but want to look at the storage for DTEs, say,
        look through STG.  STG also contains some small portion of  code  mostly
        to do with restart, start, auto reload, dispatches for PI channels and a
        few scheduler tests.  STG stands for storage.  Note that some stuff  may
        be  defined in PROLOG, and of course lots of stuff is defined throughout
        the monitor.  You may also want to get a listing of MACSYM to be able to
        understand  the  macros  you  see  while  reading  the monitor listings;
        MONSYM is also useful at times.  Be sure you know how  PARAMS  has  been
        changed in case it has.  See BUILD.MEM on the distribution tapes for the
        currently distributed information on what to do to change various system
        parameters  in  PARAM0.MAC.   Be  sure that you know about any variables
        that the site may have changed in STG as well.


        Debugging a complex, multi-process software system is largely  a  matter
        of  absorbing  sufficient  knowledge,  experience and folklore about the
        particular system with a considerable element of personal preference, or
        'taste'  also  involved.   This  document  is  a  cursory description of
        features built into the system to aid debugging, and  such  folklore  as
        can be described in written English.

        There are four different versions of DDT that may be used to examine the
        monitor.   Each  is  used  for  a  different  purpose  and  has  special
        capabilities.  The versions of DDT are:

             1.  UDDT (user DDT) used to examine or modify the MONITR.EXE file.

             2.  MDDT (monitor DDT)  used  to  examine  or  modify  the  running
                 monitor under timesharing.

             3.  EDDT (exec DDT) used to examine or modify the  running  monitor
                 from the CTY in a stand-alone mode.

             4.  FILDDT used to examine dumps.

        All the DDT's are versions of TOPS-20 DDT documented in the TOPS-20  DDT
        manual,  and have all of the features described in the manual.  See also
        the document DDT41.MEM.

        The use of all four versions of DDT is the same and will be
        described later; however, each version is started differently.


        To use UDDT to modify your MONITR.EXE file on system, you must give  the
        following EXEC commands:

                @GET <SYSTEM>MONITR.EXE
                @START 140      (on systems after Release 4, @DDT works too)

        This causes EDDT to start in user mode.  This is the same  DDT  that  is
        used when examining any program.  You may now look at or change any part
        of the monitor.  If you make changes to the monitor and want to save it,
        you  should  get  back  to the EXEC by typing ^Z.  Then you may save the
        monitor.  You will probably have to be enabled  in  order  to  save  the
        monitor   back  in  <SYSTEM>.   This  is  the  safest,  best,  and  only
        recommended method of putting patches into the monitor.


        A version of DDT which runs in  monitor  space  is  available.   It  can
        examine  and change the running monitor, and can breakpoint code running
        as a process but not  at  PI  or  scheduler  level.   When  patching  or
        breakpointing the monitor, the normal write protection must be defeated,
        either by setting DBUGSW to 2 on startup, or  calling  SWPMWE.   If  you
        insert  breakpoints  with  MDDT,  remember monitor code is reentrant and
        shared so that the breakpoint could be hit by any other process  in  the
        system.   In  this  event,  the other process will most likely crash the
        system since it will be executing a JSR to a page full of zeros.

        To use MDDT you must have WHEEL or  OPERATOR  capabilities.   You  first
        issue the EXEC command:


                        ; You are now in the mini-exec and receive a  prompt
                        ; of MX>.  Now you give the "/" command:
                        ; You are now put into MDDT.  To return to the  EXEC
                        ; you can  issue  a ^Z  or  a ^C  which  produces  a
                        ; message like "INTERRUPT AT 17372" and returns  you
                        ; to the mini-exec.  If  you type a  ^P in MDDT  you
                        ; will get a  message, "ABORT", and  be returned  to
                        ; the mini-exec.  Once you have entered the mini-exec,
                        ; the CONTROL-P interrupt remains enabled, and typing
                        ; ^P returns you to the mini-exec.  This is useful
                        ; when debugging programs that trap CONTROL-C.  From
                        ; the mini-exec you may give either:
                        ; or
                        ; The S is filled  out as START and  the E as  EXEC.
                        ; Both of  these commands  will  return you  to  the
                        ; EXEC. See the section EXEC-DEBUGGING for more info
                        ; about ^P and getting  out of the  EXEC to MX>  and
                        ; returning from MX> to either your copy of the EXEC
                        ; or the system EXEC.

                        ; You may also give the command:

                        ; From MDDT to return  directly to the EXEC.   While
                        ; in MDDT you may examine  any core location in  the
                        ; running monitor.  If you wish to change any of the
                        ; locations in the protected  monitor you must  give
                        ; the command:
                CALL SWPMWE$X   or      $W

                        ; To write enable the monitor.  After you have  made
                        ; your changes you must give the command:

                CALL SWPMWP$X

                        ; to write protect the monitor again.

        MDDT may also be entered from process level via JSYS:

                JSYS 777$X
                MDDT%$X ; will enter MDDT from the context of the current process

        To return to user context:

                MRETN$G

        To use SETMPG to map pages to this context:

                Page 677 has been traditionally used for this; but any unused
                page may be used.  To make sure that the page is currently
                unused type:

                ADDRESS/   ?    ; the question mark from DDT indicates that the
                                ; page is nonexistent.

                when the destination page has been found, set up AC2 as:

                AC2/ ACCESS,,677000

                If the page has its own SPT slot:

                AC1/SPT INDEX

        If the source page does not have its own SPT slot,  it  will  belong  to
        either a file or process page table.  It will be represented as an index
        into this page table:


                Access = read and/or write access
                Read/Write access = 140000 in LH

        Therefore, to map a page, call with either:

                AC1/SPT INDEX OF PAGE



                AND SAY:

                CALL SETMPG$X

        The page will then be  mapped  to  page  677.   In  examining  locations
        677000-677777, you will be looking at the contents of the page.
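
        The AC2 argument described above is an ordinary left-half,,right-half
        word.  The following Python fragment is only a sketch of how such a
        word is put together; the helper names make_word and fmt_halves are
        assumptions made for this illustration and are not monitor symbols.

```python
HALF_MASK = 0o777777          # each half word is 18 bits

def make_word(lh, rh):
    """Assemble a 36-bit word from its left and right halves (LH,,RH)."""
    return ((lh & HALF_MASK) << 18) | (rh & HALF_MASK)

def fmt_halves(word):
    """Format a 36-bit word in the LH,,RH octal style that DDT uses."""
    return "%o,,%o" % (word >> 18, word & HALF_MASK)

READ_WRITE = 0o140000         # read/write access, as given in the text
DEST_ADDR = 0o677000          # first address of the destination page 677

ac2 = make_word(READ_WRITE, DEST_ADDR)
print(fmt_halves(ac2))
```

        The value printed is what would be deposited in AC2 before calling
        SETMPG for a read/write mapping to page 677.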

        If you desire to map another page into this  slot,  merely  call  SETMPG
        again  with  arguments  for the new page.  You need not first un-map the
        old page.  However, when you are finished, page 677 should be  un-mapped
        in the following manner:

                CALL SETMPG$X


        Calling SETMPG incorrectly can crash the system.  Be  CAREFUL!   Do  not
        use SETMPG on a time sharing system if a crash will cause bad feelings.

        NOTE:  if you have the Release 5 version of MDDT/EDDT  that  has  sticky
        current  address  section (see DDTxx.MEM) then be careful about doing an
        MRETN$G  after  examining  section  2,  as  a  crash  will  result  from
        transferring to MRETN in section 2.



                       Not to be confused with ^EEDDT command  to
                       get   into  UDDT  used  with  the  command
                       processor.  See separate document on  EXEC
                       DEBUGGING for that.

        To get into EDDT you must bring the system up using the switch-register.
        See  the  DECSYSTEM-20 Operators Guide for a discussion of switches.  Go
        through the KLINIT dialog and when you get  the  prompt  BOOT>,  respond

                BOOT>/L         or      BOOT>/E   (in version 5 or later)

        The "/L" command causes the monitor to be loaded, but not started.   The
        "/G141"  starts  the  monitor  at location 141, which is a jump to EDDT.
        You can use EDDT like UDDT under timesharing on the MONITR.EXE  file  by
        giving the following commands:

                $GET <SYSTEM>MONITR.EXE
                $START 140

        EDDT is linked into the monitor and is always there.  You may  also  get
        to  EDDT  from  MDDT (providing EDDT is locked down, see EDDTF below) by
        issuing the following:


        from MDDT.  This stops timesharing.  To resume  timesharing  and/or  get
        back to MDDT give the command:

                MDDT$G                  ; back to MDDT
                MRETN$G                 ; back to normal timesharing

        Breakpoints may be inserted in the resident monitor with EDDT,  but  not
        in  the  swappable  monitor in general, because its pages may be swapped
        out and be unavailable to EDDT.  You can bring them in by typing:

                SKIP LOC$X              ; where LOC is some address not in core

        and  then  set  the  breakpoint.   The   swappable   monitor   must   be
        write-enabled to set breakpoints.

        There are some locations in the monitor that are very useful when  using
        EDDT  for  debugging.   They  must  be  set before going on to start the
        monitor.

        They are:

                EDDTF   1        keep EDDT in core when system comes up
                        0        delete DDT when system comes up (default)

                DBUGSW  0        do not stop on BUGHLTs, crash and reload
                        1        stop on BUGHLTs (hit EDDT breakpoint)
                        2        write enable the monitor,
                                 do not start up SYSJOB, and stop on
                                 BUGHLTs.  Also it doesn't run CHECKD
                                 automatically on startup.

                DCHKSW  0        do not stop on BUGCHKs (default)
                        1        stop on BUGCHKs (hit EDDT breakpoint)

                DINFSW  0        do not stop on BUGINFs (default)
                        1        stop on BUGINFs (hit EDDT breakpoint)

        In addition the symbol  GOTSWM  appears  in  the  code  just  after  the
        swappable  monitor  is  loaded.   So, if you want to debug the swappable
        part of the monitor  you  must  put  a  breakpoint  at  GOTSWM  (to  get
        swappable part in core) by,

                GOTSWM$B

        Then start the MONITOR by,

                147$G

                CALL SWPMLK$X

        CALL SWPMLK is used to lock swappable monitor  in  core  for  debugging.
        You  must have sufficient physical memory to give this command since the
        resident plus swappable monitor  is  rather  large.   To  start  up  the
        monitor  after  you  have  gone  into  EDDT  and set up your breakpoints
        (remember the last two are used for BUGHLT and BUGCHK) give the command:


        If you are in EDDT and DBUGSW is not 2, that is, the  monitor  is  write
        protected,  you  can  use the routines SWPMWE and SWPMWP to write enable
        and write protect the monitor; for example, CALL SWPMWE$X in DDT.


        FILDDT is distributed on the customer software tape.

        The following is a chewed-up FILDDT.HLP file.


        Reads specified file and builds internal symbol table.  This must be the
        first  command  to FILDDT before "GET" when looking at a dump.  You will
        most probably use <SYSTEM>MONITR.EXE which would have been  the  monitor
        running at the time of the dump.


        Loads a file for DDT to examine.  If you are looking at a  monitor  dump
        you  must  load  DUMP.CPY  explicitly.   FILDDT looks for MUMBLE.EXE not
        MUMBLE.CPY.  That is, DUMP<ESC> will tell you that there is no such file
        or  will load DUMP.EXE.  When looking at a dump and you wish to load the
        symbols you must first issue  the  load  command  followed  by  the  get
        command.   Be  sure  that the file from which you get the symbols is the
        same version as the dump.  Be sure also  that  the  monitor  that  was
        dumped  is  the  same  monitor  you use for symbols.  That is, don't get
        MONMED symbols to use with MONBCH etc.


        Returns to command level.  You then may type a save command  if  a  load
        command  was  just  done  to preload symbols.  You will get a version of
        FILDDT that has the symbols you just loaded in it so you no longer  need
        to  "LOAD"  symbols.   You now have a monitor specific FILDDT, which was
        common practice for TOPS-10, but is not generally done for TOPS-20.


        Types something like this text.


        Allows writing on an existing file specified by a GET.


        Assumes file is raw binary (i.e.  no ACs, and not an EXE file).


                EP$U    Sets monitor context for FILDDT mapping.  EP is a symbol
                        which is equal to the page number of the EPT.  (Rel 4)

           <CTRL/E>     Returns to FILDDT command level.


        The resident monitor may be looked at without any difficulties, but  the
        swappable  monitor  may  not be in core at the time of the dump.  If the
        value of the symbol is in the swappable monitor you  must  sometimes  go
        through  the  monitor  map  to  find  where the location really is.  The
        location MONCOR contains the number of pages of resident monitor and the
        location  SWPCP0  contains the first page of real core for swapping.  So
        if the value of the symbol is greater than contents of MONCOR times 1000
        then  it  is  in  swappable  monitor.   This also applies to non-monitor
        pages:  mapped file pages, and pages from other processes and their JSBs
        and PSBs.
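
        The MONCOR comparison described above can be sketched as follows.
        This is a minimal illustration in Python, not monitor code; the value
        used for the contents of MONCOR is hypothetical, and the test uses
        "greater than" exactly as the text states it.

```python
PAGE_SIZE = 0o1000            # 1000 (octal) words per page

def in_swappable_monitor(symbol_value, moncor):
    """Apply the rule from the text: a value greater than
    C(MONCOR) * 1000 (octal) lies outside the resident monitor."""
    return symbol_value > moncor * PAGE_SIZE

MONCOR = 0o300                # hypothetical contents of MONCOR

for value in (0o024621, 0o324621):
    where = "swappable" if in_swappable_monitor(value, MONCOR) else "resident"
    print("%o is in the %s monitor" % (value, where))
```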

        If the page of the swappable monitor you want to look at is in  core  it
        will probably not be in core at the location that its address refers to
        since the dump is of core and relocation of pages does not  happen.   To
        find  where  a  symbol  really  is  in  the  dump, first type the symbol
        followed by an "=".  DDT will respond with the  value  of  this  symbol.
        The value of the symbol can be divided into two fields of three octal
        digits each.  The high order three digits are the page number and the low
        order three digits are the offset into the page.

        If the value of the symbol is 324621 the high order three  digits,  324,
        are  the page number and the low order three digits, 621, are the offset
        into the page.  To find the location of the page in question in the dump
        you  must  look  at  the  monitor  map  indexed by the page number.  For
        example:

                MMAP+324/

        would give you the monitor map word for page 324.   This  word  contains
        some  protection  bits for the page and the address of the page when the
        dump was taken.
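
        The split into page number and offset can be checked with a few lines
        of Python.  This is only a sketch of the arithmetic; the function name
        split_address is an assumption made for this illustration.

```python
def split_address(addr):
    """Split an 18-bit monitor address into (page number, offset).

    Both fields are nine bits wide, i.e. three octal digits each."""
    page = (addr >> 9) & 0o777
    offset = addr & 0o777
    return page, offset

page, offset = split_address(0o324621)
print("page %o, offset %o, map word at MMAP+%o" % (page, offset, page))
```

        For the value 324621 this gives page 324 and offset 621, so the map
        word to examine is the one at MMAP+324.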

        The page may have been in core, on the swapping area or on the  disk  at
        the time of the dump.

                If bits 14-17 in the monitor map word are non-zero the page  was
                on the swapping area or disk and is no longer available.

                If bits 14-17 are zero then the page was in core, and the  right
                half  of  the  word  contains the page number in the dump of the
                page you are looking for (the dump program overwrites some pages
                of memory, therefore it does not contain these pages.)

        If the page was in core the new address of the symbol  you  are  looking
        for  can be found by using the page number from the monitor map word and
        appending the offset into the page  to  it.   For  example  if  MMAP+324
        contains  104000,,256;   then  the  new  address  of our symbol would be
        256621.
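
        The whole resolution step can be sketched in Python as follows.  The
        function name resolve_in_dump is an assumption for this illustration;
        the rule it applies (bits 14-17 zero means in core, right half is the
        page number in the dump) is the one given above.

```python
HALF_MASK = 0o777777
PAGE_SIZE = 0o1000

def resolve_in_dump(map_word, offset):
    """Return the address in the dump for a page described by a monitor
    map word, or None if the page was not in core when the dump was taken."""
    lh = map_word >> 18
    rh = map_word & HALF_MASK
    if lh & 0o17:                 # bits 14-17 are the low four bits of the LH
        return None               # page was on the swapping area or disk
    return rh * PAGE_SIZE + offset

map_word = (0o104000 << 18) | 0o256      # 104000,,256 as in the text
address = resolve_in_dump(map_word, 0o621)
print("%o" % address)
```

        With the map word 104000,,256 and offset 621 the result is 256621
        (octal).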

        All addresses in the swappable monitor must be resolved in this  manner.
        In  addition  the pages of the JSB and PSB must sometimes be resolved in
        this manner.  There are some locations and tables in  the  monitor  that
        make this easy:

                NAME    INDEX   DESCRIPTION

                FORKX   none    Number of the fork that was running at the time of
                                the dump, -1 if in the scheduler.
                JOBNO   In PSB  Job number to which current fork belongs.
                FKJOB   Fork #  Job number,,SPT index of JSB
                JOBDIR  Job #   logged in directory number
                JOBPT   Job #   controlling TTY number,,top fork number
                FKSTAT  Fork #  test data,,address of fork wait routine
                FKPGS   Fork #  SPT index of page table,,SPT index of PSB

        SPT indexes are indexes into a share pointer table starting at SPT.   To
        find  the  PSB of fork 20, you first look at FKPGS+20.  If this location
        contains 425,,426, the word at SPT+426 is the pointer to the PSB.   This
        pointer  can  point  to disk, swap area, or a page in the dump.  If bits
        14-17 are zero it is a pointer to a page in the dump and the right  half
        of the SPT word is the page number of the PSB in the dump.
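
        The FKPGS and SPT lookup just described can be sketched in Python.
        The tables are modelled here as dictionaries indexed by fork number
        and SPT index; the FKPGS entry is the one from the text, while the
        contents of SPT+426 are hypothetical.

```python
HALF_MASK = 0o777777

def psb_page_in_dump(fkpgs, spt, fork):
    """Follow FKPGS -> SPT to find the dump page holding a fork's PSB.

    Returns None when the SPT pointer refers to disk or the swap area."""
    psb_index = fkpgs[fork] & HALF_MASK      # RH of FKPGS: SPT index of PSB
    pointer = spt[psb_index]
    if (pointer >> 18) & 0o17:               # bits 14-17 non-zero
        return None
    return pointer & HALF_MASK               # RH: page number in the dump

fkpgs = {0o20: (0o425 << 18) | 0o426}        # FKPGS+20/ 425,,426
spt = {0o426: (0o104000 << 18) | 0o1234}     # hypothetical SPT+426 contents

page = psb_page_in_dump(fkpgs, spt, 0o20)
print("PSB of fork 20 is in dump page %o" % page)
```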

        ------  ------  ------

        The monitor contains a considerable number of internal redundancy checks
        which  generally  serve  to  prevent  unexpected  hardware  or  software
        failures from cascading into severely destructive reactions.   Also,  by
        detecting  failures  early,  they  tend  to  expedite  the correction of

        There are three failure routines,  BUGINF  and  BUGCHK  and  BUGHLT  for
        lesser  and  greater  severity  of failures.  Calls to them with JSR (or
        PUSHJ P, for Release 5 or later BUGCHKs and  BUGINFs)  are  included  in
        code  by  use  of  a macro which records the locations and a text string
        describing the failure.  The general form is:

            for 4.1:    BUG (NAME,<DATA>)


        Where TYPE is HLT or CHK or INF, MODULE is  the  source  file,  DATA  is
        additional  data,  HARD  is  the  hardware/software flag, STRING is the
        short text and EXPLANATION the long text explanation of the cause.   The
        strings  are constructed during loading and are dumped into a file.  The
        BUGSTRINGS.TXT file will produce an ordered listing of the bug  messages
        for operator or programmer use.

        BUGCHK (or BUGINF) is used where the inconsistency detected is  probably
        not  fatal  to the system or to the job being run, or  can  probably
        be corrected automatically.

        BUGHLT is used where the failure detected is likely to preclude  further
        proper  operation  of the system or file storage might be jeopardized by
        attempted further operation.


                       The exact form  the  BUGHLT/CHK/INF  macro
                       takes  is  different  for releases [3A and
                       before], [4.0, 4.1, 5.0,  5.1],  and  [6.0
                       and   after],   and  different  files  and
                       assembly forms are used, though the action
                       of the code remains essentially unchanged.
                       See the separate  article  on  the  BUGxxx
                       macro for details.


        A monitor cell, DBUGSW, controls the behavior of BUGHLT and BUGCHK  when
        they  are  called.   DBUGSW  is  set  according to whether the system is
        attended by system programmers.

        If C(DBUGSW)=0, the system is not attended by system programmers, so all
        automatic crash handling is invoked.  BUGCHK will return +1 immediately,
        appearing effectively as NOP.  BUGHLT will, if called from the scheduler
        or at PI level, invoke a total reload from the disk and a restart of the
        system.  The BUGCHK/INF output will appear on the CTY and in the  SYSERR
        log when JOB ZERO gets around to them.

        If the system continues to run or is restarted properly, the location of
        the  bug  (saved  over a reload) and its message will be reported on the

        If C(DBUGSW).NEQ.0,  the  system  is  attended,  and  one  of  the  EDDT
        breakpoints will be hit.  This allows the programmer to look for the bug
        and/or possibly correct the  difficulty  and  proceed.   There  are  two
        defined  non-zero  settings of DBUGSW, 1 and 2, which have the following
        effects:

                C(DBUGSW) = 1 

                        Operation is the same as with 0 except for breakpoint
                        action.  In particular, the monitor is write protected
                        and SYSJOB is started at startup as described.
                C(DBUGSW) = 2

                        Is used for actual system debugging. The monitor is
                        not write protected so that it may conveniently
                        be patched or breakpointed, and the SYSJOB operation
                        is not started to save time.

                        BUGCHK and BUGHLT procedures are the same as for 1.

        The following is a summary of DBUGSW settings:

        SETTING                 0               1               2
        MEANING                 Unattended      Attended        Debugging

        BUGCHK action           NOP             Hit Breakpoint  Hit Breakpoint
        BUGHLT action           Crash System    Hit Breakpoint  Hit Breakpoint
        Monitor write protect?  Yes             Yes             No
        CHECKD on startup?      Yes             Yes             No
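
        For those who prefer to see the summary as data, the table above can
        be transcribed into a small Python lookup.  This is purely an
        illustration of the table; the dictionary and function names are
        assumptions and do not correspond to anything in the monitor.

```python
# Transcription of the DBUGSW summary table above.
DBUGSW_SETTINGS = {
    0: {"meaning": "Unattended", "bugchk": "NOP",
        "bughlt": "Crash System",
        "write_protect": True, "checkd_on_startup": True},
    1: {"meaning": "Attended", "bugchk": "Hit Breakpoint",
        "bughlt": "Hit Breakpoint",
        "write_protect": True, "checkd_on_startup": True},
    2: {"meaning": "Debugging", "bugchk": "Hit Breakpoint",
        "bughlt": "Hit Breakpoint",
        "write_protect": False, "checkd_on_startup": False},
}

def describe(dbugsw):
    """Return a one-line description of a DBUGSW setting."""
    s = DBUGSW_SETTINGS[dbugsw]
    return "%d (%s): BUGCHK=%s, BUGHLT=%s" % (
        dbugsw, s["meaning"], s["bugchk"], s["bughlt"])

for setting in sorted(DBUGSW_SETTINGS):
    print(describe(setting))
```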

        Other console functions:

        In addition to EDDT, several other entry points are defined as  absolute
        addresses.  The machine may be started at these as appropriate.

        EVDDT   140     JRST EDDT               ; go to EDDT
                141     JRST SYSDDT             ; reset and go to EDDT
        EVDDT2  142     JRST EDDT               ; copy of EDDT address
        EVSLOD  143     JRST SYSLOD             ; initialize file system
        EVVSM   144     JRST SYSVSM             ; verify swap mon on startup
        EVRST   145     JRST SYSRST             ; restart
        EVLDGO  146     JRST SYSGOX             ; reload and start
        EVGO    147     JRST SYSGO1             ; start

        The soft restart (address 145, EVRST)  restarts  all  I/O  devices,  but
        leaves  the system tables intact.  If it is successful, all jobs and all
        (or all but 1) processes will continue in their previous  state  without
        interruption.   This  may be used if an I/O device has malfunctioned and
        not recovered properly.  The total restart  initializes  core,  swapping
        storage and all monitor tables.

        A very limited set of control functions for debugging purposes has  been
        built  into the scheduler.  To invoke a function, the appropriate bit or
        bits are set into location 20 via MDDT.  The word is scanned  from  left
        to right (JFFO).  The first 1 bit found will select the function.  Refer
        to routine SWTST in SCHED for the current details.
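
        The left-to-right scan can be illustrated with a Python model of what
        JFFO reports.  This sketch shows only how the first 1 bit is located
        in a 36-bit word (bit 0 is the leftmost bit); which function each bit
        selects is not shown here and must be taken from SWTST.

```python
WORD_BITS = 36

def jffo(word):
    """Return the bit number (0-35, counted from the left) of the first
    1 bit in a 36-bit word, or None if the word is zero."""
    word &= (1 << WORD_BITS) - 1
    if word == 0:
        return None
    return WORD_BITS - word.bit_length()

# Two bits set: bit 1 and bit 35.  Only the leftmost one is selected.
word = 0o200000000001
print(jffo(word))
```

        Because only the first 1 bit is acted upon, setting several bits at
        once selects just the leftmost of them.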

        DDT TRICKS:

        Here are a few useful tidbits to use in DDT when tracking down problems:

             1.  Enter MDDT from a program

                                JSYS 777$X

             2.  Return to program from MDDT

                                MRETN$G                 ! Return from MDDT

             3.  Set a breakpoint in the swappable monitor in EDDT at FOO:

                                BOOT>/L                 ! Load the resident monitor
                                BOOT>/G141              ! Start EDDT
                                EDDTF[   0   1          ! Set debugging flags
                                DBUGSW[   0   2
                                GOTSWM$B   147$G        ! Breakpoint after SWPMON in
                                $1B>>GOTSWM/MOVEI T1,FKPTRS
                                SKIP FOO$X              ! Make sure FOO is in core
                                FOO$B                   ! Set breakpoint

             4.  Find all forks of job J in MDDT or EDDT or FILDDT

                                -1,,0$M                 ! Set compare flag

             5.  Map a directory in MDDT or EDDT

                                1!   directory number   ! Put DIR in AC1
                                2!   structure number   ! and STR in AC2
                                CALL MAPDIR$X           ! Call routine to map it

             6.  Write-enable monitor in MDDT or EDDT

                                CALL SWPMWE$X

             7.  Write-protect monitor in MDDT or EDDT

                                CALL SWPMWP$X

             8.  Lock swappable monitor in memory in MDDT or EDDT

                                CALL SWPMLK$X

             9.  Set monitor context for mapping in FILDDT

                                EP$U                    ! Have FILDDT do address mapping

            10.  Select unmapped physical addressing in FILDDT

                                $U                      ! Clear address mapping

            11.  Select user virtual address space mapping in FILDDT

                                FKPGS+forknumber/   x,,y
                                SPT+x/   n              ! If LH(n) .NE. 0 => swapped out
                                n$1U                    ! n is address of page table

            12.  Enter EDDT from MDDT


            13.  Return to MDDT from EDDT

                                MDDT$G                  ! Back to MDDT


            14.  See if mapped job has been in MDDT from FILDDT

                 Releases 4.1, 5.0, 5.1:
                                DDTPGA/   ?             ! Page non-existence ==> NO

                 Release 6:
                                DDTPXA/   ?             ! Page non-existence ==> NO

            15.  Find what module defines a symbol from any DDT


            16.  Reference CSTn table entries

                 Releases 4.1, 5.0, 5.1:
                                CST0/   n               ! Reference directly by symbol

                 Release 6:
                                CST0X[   x,,y   $Q<CST0: ! Define symbol from CSTnX table
                                CST0/   n               ! contents, then use that symbol
                                                        ! Note: CST5 access same as 5.1
        TOPS-20 Crash Dump Analysis

                            MORE TOPS-20 CRASH DUMP ANALYSIS

             1.0  INTRODUCTION

             The purpose of this article is to provide some basic guidelines for
             those   who   have  never  analyzed  a  TOPS-20  crash  dump.   The
             information contained in this article refers to versions 4.1,  5.0,
             5.1,  and 6.0 of the TOPS-20 Monitor, although the basic principles
             will also apply to earlier and later versions of the Monitor.  None
             of  the  concepts included in this article can be considered highly
             advanced;  indeed it is doubtful that there  exists  an  "advanced"
             methodology in crash dump analysis.  Such techniques are the result
             of nothing more than the continual exercise of  the  basic  skills.
             In  all  cases,  the  person who is to perform the analysis must be
             familiar with the internal structures of the  Monitor.   Obviously,
             one  must  know where to look for a potential problem before hoping
             to solve it.  For this reason, this article assumes that the reader
             has  an  in-depth  knowledge of the basic structures of the TOPS-20
             Monitor.

             2.0  GENERAL INFORMATION

             It would not be practical to define a method  of  approaching  each
             BUGHLT  in  the  system, but the state of the system at the time of
             the crash may be defined in terms of the data  structures  that  it
             accesses.   By  looking  at  the Monitor's stack, the status of the
             current job, and process, and the condition of the Monitor's tables
             that were in use by the code that BUGHLTed, we can define a limited
             number of "types" of crashes, e.g.,  a  scheduler  crash,  a  pager
             crash,  an  APR  or  device interrupt crash.  Each crash will occur
             while the Monitor is using a specific subset of the  internal  data
             structures  of  the system.  We will attempt to limit the number of
             "types" of crashes based upon the function being performed  by  the
             Monitor  at  the time of the crash.  In the sections following this
             general information, we will suggest some of  the  areas  to  check
             when  looking  at  each  type  of  crash.   This information is not
             complete, but  contains  some  of  the  information  that  is  more
             significant in each particular context.

             When you look at a dump, you should first try to find why the  dump
             occurred by looking at the location BUGHLT.  If BUGHLT is zero then
             you should check the CTY log to find out why the dump was taken and
             for  information like the PC at the time of the dump and the status
             of the PI system.  If BUGHLT is non-zero it is the address of where
             the   BUGHLT  was  issued.   You  should  look  up  the  BUGHLT  in
             BUGSTRINGS.TXT or BUGS.MAC or the source code  to  find  additional
             information about the BUGHLT.  If at this point you are not sure as
             to why the BUGHLT occurred, you will have to look at  the  listings
             for more information.  A copy of BUGSTRINGS.TXT is in Appendix A of
             the Operators manual.  You can find the location of the call to the
             BUGHLT by typing the BUGHLT tag to DDT followed by a "?".  DDT will
             tell which monitor module the BUGHLT is in and you can go  to  your
             microfiche  and  read  all  about  the conditions precipitating the
             BUGHLT.

             Next, if necessary, look at FORKX.  If it contains -1 the scheduler
             was  running;   otherwise  it  is  the  number of the fork that was
             running when the crash occurred.  The registers are saved at BUGACS
             on  a  BUGHLT,  but if BUGACS+17 contains something,,BUGPDL+n, then
             the registers are invalid and you must go to the SYSERR  buffer  to
             get  the  good registers.  This is done by adding to the right half
             of the SYSERR buffer pointer, SEBQOU, the offset  into  the  buffer
             for  the  heading  and  ACs, SEBDAT+BG%ACS.  This value points to a
             block of 16 words containing the user's ACs.  You may have to chain
             down  more  than  one  queued-up  SYSERR entry to get to the BUGHLT

             Some other locations of interest in the initial stages are:

                LOCATION        DESCRIPTION

                SVN             Monitor version number string
                BUTCMD          BOOT filespec for loading the monitor
                LSTERR          Code of the last error encountered by process
                USRNAM          User name string
                P               Current stack pointer
                JOBNO           Job number of currently running process
                JOBPNM+(JOBNO)  SIXBIT program name of running program
                UAC             User's ACs when he did his last JSYS
                PAC             Monitor's ACs
                PPC             Process' PC
                UPDL            User's pushdown stack while in a JSYS
                NSKED            0 => ok to run scheduler
                                >0 => cannot run scheduler
                INTDF           -1 => ok to receive software interrupts
                                >=0 => cannot receive software interrupts

             It may be useful to know the status of a fork when it  is  hung  or
             you are unsure of its status.  This can be determined by looking at
             FKSTAT indexed by the fork number.  The right half of this location
             is  the  address  of a test routine and the left half is data to be
             tested.  For example if FKSTAT+12 contains 23,,FKWAT, then fork  12
             is  waiting for fork 23 to complete.  FKWAT is a routine that waits
             for another fork to complete and its data (the  left  half  of  the
             word)  is the number of the fork it is waiting for.  There are many
             different wait routines and you will have to look at  the  code  to
             see  what  individual ones are waiting for, or refer to the section
             on scheduler tests elsewhere in this manual.
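
             Decoding a FKSTAT word amounts to separating its two halves and
             looking up the right half as a symbol.  The Python sketch below
             shows this for the FKWAT example; the numeric address given to
             FKWAT is hypothetical, and the symbol table is simply a
             dictionary invented for the illustration.

```python
HALF_MASK = 0o777777

def decode_fkstat(word, symbols):
    """Split a FKSTAT word into (test data, wait routine name).

    The routine address is returned in octal if no symbol matches."""
    data = word >> 18
    routine = word & HALF_MASK
    return data, symbols.get(routine, "%o" % routine)

symbols = {0o45670: "FKWAT"}             # hypothetical address of FKWAT
fkstat_12 = (0o23 << 18) | 0o45670       # FKSTAT+12/ 23,,FKWAT

data, routine = decode_fkstat(fkstat_12, symbols)
print("fork 12: test routine %s, data %o" % (routine, data))
```

             For FKWAT the data field is the number of the fork being waited
             for, so this entry shows fork 12 waiting on fork 23.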

             You can easily determine all of the forks associated with a job  by
             giving the commands:


             Where N is the job you are  looking  for.   A  fork  structure  can
             usually  be  determined  by  looking at the FKSTAT of the forks and
             seeing which forks are waiting on which forks.  A FKSTAT  of  FKSKP
             indicates a fork is inactive.

             You should refer to STG.MAC for other fork and job tables and other
             locations  in  the  PSB  and  JSB  of  interest.   All of the above
             locations can be examined with MDDT or EDDT while  the  monitor  is
             running.  Of course, at these times you do not have to go through
             MMAP, and the PSB and JSB that are in core are your own.

             There are two separate patch areas in the monitor (FFF  and  SWPF).
             FFF is the resident patch area and SWPF is the swappable patch area.
             These two symbols should be updated  to  point  to  the  next  free
             location   in  the  patch  area  when  a  patch  is  inserted.   By
             convention, all distributed  patches  are  applied  at  FFF.   This
             serves the purposes of reducing confusion, always working until the
             patch area is exhausted, and leaving patches always  present  in  a
             dump for the cases where that is important.

             2.1  Identifying The Type Of Crash

             The Monitor performs several basic operations, each  of  which  has
             its  own  set  of  tables  and  data  structures.   Some  of  these
             operations can be defined as:

             1.  BUGHLT processing

             2.  JSYS processing

             3.  Page faults

             4.  PSI Service

             5.  Scheduling

             6.  DTE interrupt Service

             7.  Initiating I/O transfers (queueing)

             8.  Device interrupt Service

             9.  APR interrupt Service

             2.2  The BUGHLT Itself

             There are specific areas in any crash dump that can be examined  to
             determine  the  status and context of the system at the time of the
             crash.  The most obvious of these is the  location  called  BUGHLT,
             which  will  contain the address whence the BUGHLT code was called.
             When looking at this address, remember that BOOT overwrites
             portions of the monitor when the dump is taken.  The location
             whose address is contained in location "BUGHLT" may therefore
             not hold the same code that the fiche or the listings show.  A
             good example of such a BUGHLT is PTNIC1, which is part of the
             APRSRV code that BOOT overwrites.

             See the separate discussion of the BUGxxx macro in its  many  forms
             for more information on this useful source of problem explanation.

             A BUGHLT is performed by an XCT of a location that contains a
             JSR BUGHLT instruction.  The locations following the JSR BUGHLT
             hold the list of additional data addresses, and then the name of
             the BUGHLT in SIXBIT format, such as "PTNIC1".  Finally, in the
             event of multiple BUGCHK's, BUGINF's or even nested BUGHLT's,
             the location "BUGNUM" contains the number of BUGHLT's, BUGCHK's,
             and BUGINF's since the last system start-up.
             This  location  is  most helpful in obtaining a clearer view of the
             circumstances of the crash.  The case of  the  BUGHLT  code  itself
             causing  a  BUGHLT  is  extremely  unusual, but in certain cases of
             extreme degradation of the  system's  data  bases  or  "pure"  code
             pages, this is a possibility.
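
             Since the BUGHLT name is stored as a SIXBIT word, it can help to
             decode such a word by hand or with a small tool.  The Python
             sketch below shows the standard SIXBIT packing (six 6-bit
             characters per 36-bit word, each character being its ASCII code
             minus 40 octal).  The function names are invented for this
             example and are not part of any monitor tool.

```python
def text_to_sixbit(name):
    """Pack up to six characters into one 36-bit SIXBIT word."""
    word = 0
    for ch in name.upper().ljust(6)[:6]:
        word = (word << 6) | ((ord(ch) - 0o40) & 0o77)
    return word


def sixbit_to_text(word):
    """Unpack a 36-bit SIXBIT word into its (up to six) characters."""
    chars = []
    for shift in range(30, -1, -6):
        code = (word >> shift) & 0o77
        chars.append(chr(code + 0o40))
    return "".join(chars).rstrip()


word = text_to_sixbit("PTNIC1")
print("%012o" % word)        # the word as it would appear in a dump
print(sixbit_to_text(word))  # and decoded back to text
```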

             2.3  Summary Of PC Storage

             The storage of the previous state PC is often context dependent;
             however, some of the standard cases are listed below:

             1.  Crash PC - stored in location BUGHLT.

             2.  PC of JSYS - two copies are stored on the UPDL stack.

             3.  PFL/PPC - contain the current flags and PC of  the  process  at
                 the last context switch.  This might be a user or EXEC mode PC.

             4.  PIFL/PIPC - contain the flags and PC while a software interrupt
                 (PSI) is in progress.

             5.  SKDFL/SKDPC - PC saved here while process is blocking, in  case
                 of context switch.

             6.  MONFL/MONPC - PC saved here while process  is  starting  nested
                 JSYS, in case of context switch.

             7.  ENSKR/ENSKR+1 - PC saved  here  while  entering  scheduler  via
                 ENTSKD, in case of context switch.

             2.4  Summary Of AC Storage

             There are various areas in which the ACs may be found, often
             depending on the context of the crash.  The "general" ones are:

             1.  UAC - previous context ACs are stored here  when  the  user  is
                 context  switched.  These are the ACs the last time the process
                 was dismissed.  If in a nested JSYS,  these  are  the  ACs  the
                 nested JSYS was called with; the user ACs are in the UACB
                 block.

             2.  UACB and ACBAS - the UACB block is  the  AC  stack  for  nested
                 JSYSes, and the location ACBAS (shifted left four) is the index
                 to the current set.

             3.  PAC - the EXEC mode ACs for a process are stored here when  the
                 process is dismissed.

             4.  PIAC - the EXEC mode ACs for a process are stored here  when  a
                 software interrupt (PSI) is in progress.

             5.  BUGACS - the EXEC mode ACs at the time of the crash.

             6.  BUGACU - the previous context ACs at the time of the crash.

             2.5  The Monitor's Stacks

             The next piece of valuable information is contained  in  the  stack
             pointer,  P.   This  location will point to one of several possible
             monitor stacks, and will give a strong indication about the context
             of  the  monitor at the time of the crash.  Identifying the type of
             BUGHLT will usually be a direct indication of which stack  will  be
             in  use, however under certain circumstances, the monitor may crash
             while changing from one stack to another, and such  a  circumstance
             could  provide  a  useful insight into the state of the system just
             before the crash.  The following are the names of several  possible
             monitor stacks, and the context under which each of them is used:

             BUGPDL    This stack is used while  performing  BUGHLT  processing.
                       It  will normally only be important if the system crashes
                       in a nested BUG.

             BUGSPL    This stack is used when generating KLSTAT blocks.

             UPDL      This  is  the  user  stack,  in  that  it  is  used  when
                       processing a user's JSYS in exec mode.  Whenever any user
                       executes a JSYS, this area in his PSB  is  used  for  the
                       stack.   Those  processes  under  job 0 which run in exec
                       mode will also use this stack.

             TRAPSK    This stack is used by the paging code whenever a process
                       page faults.  Normally a page fault occurs in the midst
                       of some other function, such as a JSYS.  The stack
                       pointer at the time of the page fault is saved in
                       location TRAPAP, which in that case points to UPDL plus
                       some offset.

             CFSSTK    This stack is used when processing CFS code.

             PIPDB     This is used by the software interrupt handler.

             SKDPDL    This stack is used by the scheduler.

             DTESTK    This stack is used by the DTE interrupt service routines.

             PHYPDL    This stack is used by  PHYSIO  code  in  the  process  of
                       queueing  I/O  request blocks (IORB's).  These IORB's are
                       the  means  by  which  RH20/RH11   data   transfers   are

             PHYIPD    This stack  is  used  by  the  PHYSIO  interrupt  service
                       routines, and therefore is the interrupt-level equivalent
                       of PHYPDL.  It is important to remember  that  these  two
                       stacks  are  independent of each other, and should not be

             PI5STK    This stack is used  for  unvectored  PI5  interrupts,  eg

             PI6STK    This stack is used  for  unvectored  PI6  interrupts,  eg

             PIXSTK    This stack is used while processing  spurious  unvectored

             MEMPP     This stack is used when processing APR interrupts.

             IMSTK     This stack is used when processing AN20 interrupt code.

                  The stack that is being used, and the section of code that
             executed the BUGHLT, will indicate the type of BUGHLT that has
             occurred.  File system BUGHLT's will be observed while performing
             a JSYS, servicing an interrupt, or otherwise attempting to access
             a file system that has been corrupted to the point of being
             unusable.

             3.0  BUGHLT CONTEXT (BUGPDL)

                  The previous PC will be stored in location  BUGHLT.   The  ACs
             are saved in the block at BUGACS (and loaded into the ACs by FILDDT
             by default), hence the saved stack pointer is  at  BUGACS+17.   The
             previous  context ACs are stored in the block at BUGACU.  These are
             the user mode ACs unless in a nested JSYS at the time of the crash,
             in  which case BUGACU has the ACs the current JSYS was called with,
             and the user mode ACs are in the UACB block.

                  The stack is set to BUGPDL.  In the case of a  nested  BUGHLT,
             AC17 will point to BUGPDL, and location BUGLCK will display:

                o  -1 => no BUG in progress

                o   0 => one BUG in progress (the usual case)

                o  +N => N nested bugs in progress (very unusual - bugs
                         during the BUGxxx code)
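
             The three BUGLCK cases above can be sketched in Python as
             follows.  The only assumption beyond the text is that the word
             is read from the dump as an unsigned 36-bit quantity, so it must
             first be converted to a signed two's-complement value.

```python
WORD_BITS = 36


def to_signed(word):
    """Interpret an unsigned 36-bit word as a two's-complement integer."""
    word &= (1 << WORD_BITS) - 1
    if word >> (WORD_BITS - 1):
        return word - (1 << WORD_BITS)
    return word


def describe_buglck(word):
    """Translate the contents of BUGLCK into the cases listed in the text."""
    value = to_signed(word)
    if value == -1:
        return "no BUG in progress"
    if value == 0:
        return "one BUG in progress (the usual case)"
    if value > 0:
        return "%d nested bugs in progress" % value
    return "unexpected value %d" % value  # not a case the text describes


print(describe_buglck(0o777777777777))
print(describe_buglck(0))
```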

             4.0  JSYS CONTEXT (UPDL)

                  When a process executes a JSYS, the Monitor performs the  JSYS
             by  dispatching through a table called JSTAB to the proper routine.
             These routines are named by convention as the JSYS  name,  preceded
             by  a  ".",  thus  the  routine  to perform the JSYS PMAP is called
             ".PMAP::".  This name is always a global  symbol.   The  last  JSYS
             executed  in  user  context is saved in the PSB for the process, in
             location KIMUU1, and KIMUU1+1.

                KIMUU1/   flags,,104000
                    +1/   JSYS number

                  The second of these locations will contain the dispatch offset
             in  JSTAB;   this  number,  when  combined  with  the  JSYS  opcode
             (104000,,0), is the last JSYS performed by the user.   This,  then,
             will  point  indirectly  through the JSTAB table to the place where
             the user  JSYS  began  processing.   By  following  the  code,  and
             examining the stack, it is often possible to reconstruct the events
             leading to the crash.
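
             To illustrate how the contents of KIMUU1+1 combine with the JSYS
             opcode, the Python sketch below rebuilds the instruction word
             and prints it in the usual left,,right form.  The JSYS number 56
             (octal) is used purely as a sample value; the helper names are
             invented for this example.

```python
HALF_MASK = 0o777777
JSYS_OPCODE = 0o104000 << 18  # 104000,,0


def jsys_word(jsys_number):
    """Combine the JSYS opcode with the number found in KIMUU1+1."""
    return JSYS_OPCODE | (jsys_number & HALF_MASK)


def format_halves(word):
    """Format a 36-bit word as octal left,,right halves."""
    return "%o,,%o" % ((word >> 18) & HALF_MASK, word & HALF_MASK)


# Sample value only: pretend KIMUU1+1 contained 56 (octal).
print(format_halves(jsys_word(0o56)))
```

             The right half of the result is the offset to use when looking
             up the routine in JSTAB.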

                  The stack will  contain  two  copies  of  the  user's  program
             counter  (PC)  and  flags in the first four locations of UPDL.  The
             PSB location MPP will contain the stack pointer at the time of the
             last JSYS.  Each time the Monitor performs a JSYS internally, the
             old contents of MPP are pushed onto the stack, and MPP is set to
             the current value of P.

             Initial JSYS stack set-up:

                UPDL/     PC
                UPDL+1/   flags
                UPDL+2/   PC
                UPDL+3/   flags

             JSYS in Monitor context (nested JSYS):

                UPDL+n/   INTDF         ;old interrupts-deferred flag
                      /   MPP           ;previous PC, or level of nesting
                      /   Return PC of nested JSYS
                      /   PC flags

                  So, MPP is the stack pointer for the return PC block.  If this
             is  a  nested JSYS, the ACs are saved in UACB at the proper nesting

             Some other useful locations in JSYS context are:

                                     JSB Locations

             USRNAM    This contains the name of the user, in ASCII.

                                     PSB Locations

             JOBNO     Contains the number of the job for this process.

             FORKN     Contains the fork number for the top fork of the  job  in
                       the  left  half  of  the word, and the fork number of the
                       current fork in the right.

             INTDF     Contains -1 if process is OKINT, 0 or  greater  if  NOINT
                       (defer all software interrupts for this job)

             NSKED     Contains 0 if process is OKSKED, 1 or greater if  NOSKED.
                       (defer scheduling of other forks)

                Monitor Fork Tables - indexed by the current fork number

             FKCNO     Contains the SPT offset that points to the second page of
                       the PSB in the left half of this word.

             FKINT     Contains the  pseudo-interrupt  communications  register,
                       with  flags  in  the  left  half  describing  the type of
                       request, and the channel number of  the  request  in  the
                       right half.

             FKINTB    Contains the pseudo-interrupt  channel  requests  pending
                       since the fork's last PSI interrupt.

             FKJOB     Job number of the fork in the left half,  and  SPT  index
                       for the JSB in the right half.

             FKJTQ     Part of a doubly linked list of forks  that  are  waiting
                       program  software interrupt the Monitor.  JTLST points to
                       the top fork on the list.

             FKNR      Contains in bits 0-8 the age stamp value at the last time
                       local garbage collection was performed.

             FKPGS     Contains the SPT indices for the process page  table,  in
                       the left half, and the PSB in the right half.

             FKPGST    Contains the address of the routine to test  for  balance
                       set  wait  satisfied in the right half, with test data in
                       the left.  If the fork is not in the  balance  set,  this
                       contains  the  time  of  day that the fork entered a wait

             FKPT      Part of a linked list of forks on a particular scheduler
                       list,  such  as GOLST, WTLST, etc.  The right half of the
                       word contains the address of  the  next  element  in  the
                       list,  and  the  left half contains the amount of runtime
                       the fork's  job  will  have  accumulated  when  the  fork
                       exceeds its Balance Set Hold time.

             FKQ1      Contains the fork's remaining run quantum.  When the
                       quantum  expires, the fork is moved to a lower run queue,
                       and given the appropriate new quantum.

             FKQ2      Contains the fork's scheduler queue level number in the
                       left half, and the list address, i.e., GOLST, WTLST,
                       etc., in the right.

             FKSTAT    Contains the address of the scheduler test routine which
                       will determine when the fork is available to be placed on
                       the GOLST.

             FKTIME    Contains the time of day, in internal  format,  that  the
                       fork was placed on its current run queue.

             FKWSP     Contains the number of physical  pages  assigned  by  the
                       fork  in  the right half, and the working set size of the
                       fork when the fork entered the balance set in the left.

             5.0  PAGER CONTEXT (TRAPSK)

                  Page faults trap through the user's UPT, by  placing  the  old
             flags  and  PC  for  the  process  in  locations  UPTPFL and UPTPFO
             respectively, and taking the new PC from location  UPTPFN.   UPTPFN
             will  usually contain the address PGRTRP, which is the beginning of
             the page fault code.

                  The location being referenced and therefore causing  the  page
             fault  is  stored in UPTPFW, also called TRAPS0.  This contains the
             virtual address that page faulted in bits 13-35.   Bit  0  of  this
             word indicates if the location is in user or exec (monitor) address
             space.  If this bit is set, the address is in user address space.
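
             As an aid to reading the page fail word, the following Python
             sketch extracts the two fields described above.  It relies only
             on the bit positions given in the text (bit 0 is the leftmost
             bit of the 36-bit word); the remaining bits of the word are
             ignored, and the function name is invented for this example.

```python
ADDRESS_MASK = (1 << 23) - 1  # bits 13-35 hold the virtual address


def decode_page_fail_word(word):
    """Return (is_user_address, virtual_address) from UPTPFW/TRAPS0."""
    is_user = bool((word >> 35) & 1)  # bit 0: set => user address space
    virtual_address = word & ADDRESS_MASK
    return is_user, virtual_address


# Sample word: bit 0 set, faulting address 1234567 (octal).
user, address = decode_page_fail_word((1 << 35) | 0o1234567)
space = "user" if user else "exec"
print("fault at %o in %s address space" % (address, space))
```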

                  The PGRTRP code copies TRAPS0 into TRAPSW (before release  6),
             in  case  of recursion.  This code will determine the nature of the
             page fault, and attempt to resolve it.  UPTPFL and UPTPFO are  also
             called TRAPFL and TRAPPC respectively.

                  The old stack pointer is saved in  location  TRAPAP  (this  is
             only  relevant  if  the page fault occurred in exec mode).  The new
             stack, TRAPSK, is set up according  to  the  context  of  the  page
             fault,  i.e.,  user  context,  monitor  context,  or recursive page
             fault.  The form of the stack changes for Release  6.   First,  for
             earlier releases:

                  A page fault in user mode causes the stack to be set  up  with
             the  runtime,  return  PC,  and  return PC flags in the first three
             locations of the stack:

                        TRAPSK/     runtime
                        TRAPSK+1/   return PC
                        TRAPSK+2/   return PC flags

                  Page faults from monitor context have  the  following  initial
             stack set-up:  (prior to release 6)

                        TRAPSK/     AC1
                        TRAPSK+1/   AC2
                        TRAPSK+2/   AC3
                        TRAPSK+3/   AC4
                        TRAPSK+4/   AC7
                        TRAPSK+5/   AC16
                        TRAPSK+6/   TRAPSW
                        TRAPSK+7/   runtime
                        TRAPSK+10/  PC
                        TRAPSK+11/  PC flags

             Recursive page faults will cause the following set up in TRAPSK, at
             the time of the page fault:  (prior to release 6)

                        /   AC1
                        /   AC2
                        /   AC3
                        /   AC4
                        /   AC7
                        /   AC16
                        /   TRAPSW
                        /   PC
                        /   PC flags

                  For release 6, the code becomes more uniform, and  the  format
             of  the  stack  is  the  same  for  all cases;  however, some stack
             offsets are not made use of for all types  of  page  faults  --  as

                  The code at PGRTRP sets up the TRAPSK stack, pushes CX on  it,
             and then calls PFAULT, which has the following TRVAR:


             and so the stack looks like this:

                TRAPSK/   CX
                    +1/   return address of call to PFAULT
                    +2/   AC15 saved by TRVAR
                    +3/   AC1                   =PFHACS
                    +4/   AC2
                    +5/   AC3
                    +6/   AC4
                    +7/   AC7 = FX
                   +10/   runtime               =PFHTIM
                   +11/   TRVAR temp location   =PFHTMP
                   +12/   UPTPFW page fail word =PFHPFW
                   +13/   TRAPFL flags          =PFHFL
                   +14/   TRAPPC pc             =PFHPC
                   +15/   .TRRET

             Recursive page faults will  indicate  the  level  of  recursion  in
             TRAPC.   This  location  is  normally  set to -1 and is incremented
             every time the page fault code is called, and  decremented  when  a
             page fault has been satisfied.

                  In examining a pager crash, it is usually a good idea to begin
             by  tracing  down the Monitor's table entries for the location that
             faulted.  This location is stored in location TRAPS0.  The identity
             of  the page causing the trap is stored in location TRPID, and will
             be in either of two forms:  page table number  in  left,  and  page
             number in right, or simply the page table number in the right.  The
             page table number is an SPT index, and the page number, if any,  is
             an  offset  into the page table pointed to by that SPT slot.  There
             are four Core  Status  Tables  (CST's)  indexed  by  physical  page
             number, that are used to keep track of each page in the machine.  A
             page fault crash will usually have bad data in either the SPT  slot
             indicated  in  TRPID,  or  one  of  the CST's for the physical page
             pointed to indirectly through that SPT  slot.   If  TRPID  contains
             PTN,,PN,  then  find location SPT+PTN.  This should have a physical
             page number in the right half.  Look at this physical page,  offset
             by  PN  in  TRPID  to  find the pointer to the page that caused the
             fault.  Shared and indirect pointers in this  location  will  point
             through  another  SPT  location,  but  private  pointers will point
             directly at the physical page that we are looking  for.   If  TRPID
             contains just PTN, then SPT+PTN will point directly at the physical
             page we are looking for.  Knowing the physical page number,  it  is
             now possible to examine the CST tables for that page.
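
             The first steps of this walk can be sketched in Python as shown
             below.  This is a simplification under stated assumptions: the
             SPT is modeled as a dictionary from index to word, the whole
             right half of the SPT word is taken to be the physical page
             number (as the text puts it), a non-zero left half in TRPID is
             taken to mean the PTN,,PN form, and a page is 1000 (octal)
             words.  Following shared and indirect pointers is not attempted.

```python
HALF_MASK = 0o777777
PAGE_SIZE = 0o1000  # words per page


def locate(spt, trpid):
    """Return (kind, physical_page, pointer_address) for a TRPID value.

    kind is "page table" when TRPID is PTN,,PN, in which case
    pointer_address is where the pointer for page PN lives; kind is
    "page" when TRPID is just PTN and the SPT slot names the page itself.
    """
    left = (trpid >> 18) & HALF_MASK
    right = trpid & HALF_MASK
    if left:  # assumed to mean the PTN,,PN form
        physical_page = spt[left] & HALF_MASK
        return "page table", physical_page, physical_page * PAGE_SIZE + right
    physical_page = spt[right] & HALF_MASK
    return "page", physical_page, None


# Made-up SPT contents for the example.
spt = {0o245: 0o1234, 0o300: 0o77}
print(locate(spt, (0o245 << 18) | 0o17))
print(locate(spt, 0o300))
```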

             CST0      Used principally by the  pager  hardware,  this  location
                       will  contain  the Process Use Register, mentioned in the
                       FKCNO table above, and the age stamp.

             CST1      Contains the system lock count and the backup address
                       for the page.  The lock count indicates the number of
                       system events necessary before the page can be swapped
                       out; the system should never swap out a page with a
                       non-zero lock count.  The backup address can be a disk
                       or drum address for a page in memory.

             CST2      Contains the home map location of the  page,  and  should
                       match the contents of TRPID.

             CST3      Is used by the software  to  create  lists  of  pages  in
                       various  states  of  use.   Those pages available for use
                       will be on the Replaceable Queue, and linked together  in
                       a doubly linked list.  Those pages awaiting swapping will
                       be on a swapping device  queue,  and  part  of  a  singly
                       linked  list.   Pages in use will contain the fork number
                       of the owner in bits 3-14, and the local disk address for
                       PHYSIO for the page.

             CST5      Contains the list of short I/O  Request  Blocks  (IORB's)
                       associated with the page.
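
             Several of the fields above are given as bit ranges, with bit 0
             being the leftmost bit of the 36-bit word.  A small Python
             helper such as the following (a sketch; the name is invented)
             makes it easier to pull such a field out of a word seen in a
             dump, for instance the fork number in bits 3-14 of CST3.

```python
def field(word, first_bit, last_bit):
    """Extract bits first_bit..last_bit (PDP-10 numbering, bit 0 = MSB)."""
    width = last_bit - first_bit + 1
    shift = 35 - last_bit
    return (word >> shift) & ((1 << width) - 1)


# Sample CST3 word with fork number 23 (octal) placed in bits 3-14.
cst3 = 0o23 << 21
print("owning fork: %o" % field(cst3, 3, 14))
```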

                  A few other significant locations for page faults are:

             RPLQ      Points to the beginning of the Replaceable Queue in CST3.

             NRPLQ     Contains the number of pages on the Replaceable Queue.

             SWPLST    Points to the beginning of the PHYSIO swap list, in CST3.

             NOF       Contains the number of OFN's in use in the SPT.

             6.0  PSI CONTEXT (PIPDB)

                  Tables FKINT and FKINTB will be useful in determining the type
             and  timing  of  PSI  interrupts  pending at the time of the crash.
             When a process has a PSI interrupt pending, it is  flagged  in  the
             FKINT entry for that fork, and the scheduler will take note of this
             event and set the PPC location in  the  PSB  for  that  process  to
             contain  the  address  PIRQ.   This  action takes place at location
             SCHED5 in the scheduler.

                  The next time that the  process  is  ready  to  run,  it  will
             continue  at location PIRQ, which will set up the PSI stack, PIPDB.
             SCHED5 also moves the PSI request word from FKINT to PIMSK  in  the
             PSB.   Thus, it is possible to check this location for the last PSI
             request that was scheduled.

                  The old contents of PPC and PFL are stored in PIPC and PIFL by
             the  SCHED5  routine,  so  these  will indicate the point where the
             process was interrupted.  The ACs are stored in the block at  PIAC,
             hence the previous stack pointer is at PIAC+17.

             7.0  SCHEDULER CONTEXT (SKDPDL)

                  The scheduler is usually invoked in one of two ways:   through
             a software interrupt initiated by the channel 3 PI routine, indicating
             that a set period of time has  elapsed  since  the  last  scheduler
             cycle,  or  through  the  ENTSKD  macro, which is used by a running
             process that is about to dismiss.  In this  way  the  scheduler  is
             guaranteed  to  run at regular intervals, or whenever the system is

                  The primary entry point to the scheduler  is  SCHED0.   It  is
             through this location that control passes whenever the running process
             dismisses, or whenever  one  of  the  two  scheduler  clock  cycles

                  Briefly, the  hardware  traps  on  every  clock  tick  through
             location TIMVIL in the EPT.  This location contains the instruction
             XPCW  TIMINT.   Again,  as  in  the  device  interrupt  code,  this
             instruction  causes  the  flags  and  PC  to be placed in locations
             TIMINT, and  TIMINT+1,  and  control  passes  to  the  location  in
             TIMINT+3,  which in this case is TIMIN0.  TIMIN0 determines whether
             or not  it  is  time  to  run  the  scheduler,  and  dismisses  the
             interrupt.   The  path  taken  by  the  KS-10 processor is slightly
             different, taking a 40+2*n interrupt on the CPU channel (3), but it
             winds  up  in  the  same  place  (TIMIN0)  when  it  determines the
             interrupt was for a clock tick.

                  If the scheduler is to be run,  TIMIN0  initiates  a  software
             interrupt  on  channel  7,  which  causes  a  trap  through the EPT
             location KIEPT+56 to PISC7R.  The instruction executed in  KIEPT+56
             is  an XPCW PISC7R, causing the old PC and flags to be deposited at
             PISC7R, and control to begin at PISC7+1.  The PISC7  code  sets  up
             PPC and PFL to contain the old PC and flags, from PISC7R, and saves
             the process ACs at the time of the interrupt in a block of the  PSB
             called PAC.
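
             The 40+2*n rule quoted above for the KS-10 and the KIEPT+56
             location used for channel 7 appear to follow the same
             arithmetic, assuming that 40 is octal and n is the PI channel
             number.  The short Python check below works this out; the
             function name is invented for the example.

```python
def interrupt_offset(channel):
    """Offset of the interrupt location pair for PI channel 1-7."""
    if not 1 <= channel <= 7:
        raise ValueError("PI channels are numbered 1 through 7")
    return 0o40 + 2 * channel


for channel in (3, 7):
    print("channel %d -> offset %o" % (channel, interrupt_offset(channel)))
```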

                  Having set up for  scheduler  context,  the  PISC7  code  then
             transfers  control  to  the  SCHED0 routine.  Similarly, the ENTSKD
             macro does an XPCW ENSKR, causing a jump to the ENSKED routine that
             does the context switch.

                  The stack is set to SKDPDL.  The previous PC is stored by  the
             code  in  PFL  and PPC in the PSB.  The ACs are stored in PAC (exec
             mode) and UAC (previous context ACs)  in  the  PSB.   The  previous
             stack pointer is in the saved ACs.

             Some other useful locations in scheduler context:

              o  FORKX

                 Contains -1 if no fork is chosen, or the fork number of the
                 chosen fork.

              o  INTDF

                 Contains -1 if process is OKINT, 0 or greater if  NOINT  (defer
                 all software interrupts for this job)

              o  NSKED

                 Contains 0 if process  is  OKSKED,  1  or  greater  if  NOSKED.
                 (defer scheduling of other forks)

              o  SKEDF3

                 If nonzero, will cause the scheduler to reevaluate the  balance
                 set and reschedule all forks.

              o  SKEDF1

                 If nonzero, indicates that a fork has been chosen to  run,  and
                 the scheduler should set the fork context.

              o  SKEDFC

                 If nonzero, forces a clear of the balance set and memory.

              o  RSKED

                 Contains the instruction to be executed when a  NOSKED  process
                 goes OKSKED.

              o  INSKED

                 If nonzero, indicates the scheduler  overhead  cycle  has  been

              o  SSKED

                 Holds the NOSKED fork number, if any.

              o  SCKATM

                 The software clock that generates a channel 7 interrupt when it
                 has been decremented to zero.

              o  GOLST

                 Points to the beginning of the GOLST in the FKPT table.

              o  WTLST

                 Points to the Wait list in the FKPT table.

              o  TTILST

                 Points to the TTY input wait list in the FKPT table.

              o  FRZLST

                 Points to the list of frozen forks.

              o  WT2LST

                 Points to the list of forks waiting to be unblocked.  (UNBLK1)

              o  TRMLST

                 Points to the  list  of  forks  waiting  for  another  fork  to

              o  SUMNR

                 Contains the number of reserved pages.  (locked in memory)

              o  BALSHC

                 Contains the number of pages reserved due to shared access.


                  DTE interrupts also dispatch through  locations  in  the  EPT,
             depending  upon which DTE is interrupting.  For each DTE that could
             exist on a system (4), there is an eight word block in the EPT used
             to  keep  up-to-date  information for that DTE.  Not all of the DTE
             blocks will necessarily be used;  however, they will all  exist  in
             the EPT.  These blocks begin at location DTEEBP.  The format of one
             of these blocks is described below.  The DTE interrupt executes the
             third word in this block, which contains an XPCW DTEN0.

                  The old PC and flags will be stored at  location  DTEN0,  and,
             since  DTEN0+3 contains ".+1", the system will begin processing the
             interrupt at location DTEN0+4.

                  The flags and PC will be stored at DTETRA and the  ACs  stored
             at DTEACB (previous stack at DTEACB+17).  The new stack will be set
             to DTESTK.

                  DTEN0 will then use INTDTE to  process  the  interrupt.   This
             code can be found in the DTESRV module of the monitor.

             The DTE control block:

                DTEEBP/   To -11 byte pointer
                DTETBP/   To -10 byte pointer
                DTEINT/   "XPCW DTEN0"          ;dispatch for DTE-0
                      /   reserved
                DTEEPW/   Examine Protection Word
                DTEERW/   Examine Relocation Word
                DTEDPW/   Deposit Protection Word
                DTEDRW/   Deposit Relocation Word

             Note that the labels above  apply  only  to  DTE-0,  and  that  the
             remaining DTE's must be offset by DTE-number X 8.
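
             As a rough illustration of the offset rule above, the following
             Python sketch computes where each word of a given DTE's block
             falls relative to DTEEBP.  The word order is taken from the
             control block listed above;  the function name, and the choice
             to work in offsets rather than absolute EPT addresses, are
             assumptions made for this example only.

```python
# Word order of one eight-word DTE control block, as listed above.
DTE_BLOCK_WORDS = [
    "DTEEBP", "DTETBP", "DTEINT", "reserved",
    "DTEEPW", "DTEERW", "DTEDPW", "DTEDRW",
]

DTE_BLOCK_LEN = 8   # words per DTE block
MAX_DTES = 4        # number of DTEs that could exist on a system


def dte_word_offset(dte_number, label):
    """Offset from DTEEBP (the DTE-0 label) of `label` for DTE `dte_number`."""
    if not 0 <= dte_number < MAX_DTES:
        raise ValueError("DTE number must be 0 through 3")
    return dte_number * DTE_BLOCK_LEN + DTE_BLOCK_WORDS.index(label)


# Print, in octal, where each DTE's XPCW interrupt instruction lives.
for n in range(MAX_DTES):
    print("DTE-%d interrupt instruction at DTEEBP+%o"
          % (n, dte_word_offset(n, "DTEINT")))
```

             Offsets are printed in octal so that they can be typed directly
             into DDT.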

                Some other useful locations in the EPT:

                DTEFLG/   Operation Complete Flag
                DTECFK/   Clock Interrupt Flag
                DTECKI/   Clock Interrupt Instruction
                DTET11/   To -11 argument
                DTEF11/   From -11 argument
                DTECMD/   Command Word
                DTESEQ/   DTE20 Operation Sequence Number
                DTEOPR/   Operation In Progress Flag
                DTECHR/   Last Typed Character
                DTETMD/   Monitor TTY Output Complete Flag
                DTEMTI/   Monitor TTY Input Flag
                DTESWR/   Console Switch Register

             These locations are found at offsets 444 through 457 in the EPT.


                  All disk and tape I/O is initiated through the PHYSIO code, by
             calling  PHYSIO  with  a  pointer to an I/O Request Block (IORB) in
             AC1, and the addresses of the Channel Data  Block  (CDB)  and  Unit
             Data Block (UDB) in AC2 (CDB,,UDB).  PHYSIO validates the arguments
             passed to it, and then determines whether the IORB belongs  on  the
             Position  Wait Queue (PWQ) or the Transfer Wait Queue (TWQ).  These
             two queues are pointed to by offsets UDBPWQ and UDBTWQ in  the  UDB
             for  the  device.  Note that these are offsets into the UDB, which,
             like the CDBs, will be in resident free space.  During  processing,
             PHYSIO will keep the following information in the ACs:

                P1/   address of the CDB
                P2/   address of the KDB (for tapes or RP20) or 0
                P3/   address of the UDB
                P4/   address of the IORB being processed

             Since PHYSIO is called via the PUSHJ P, instruction,  the  previous
             PC  is saved on the caller's stack.  The P and Q AC's are stored on
             the stack via the SAVEPQ macro.

                  PHYSIO does use a private stack, and the old stack pointer  is
             saved in PHYSVP.

                  Also, because PHYSIO does use a private stack, it is necessary
             for the process calling PHYSIO to be NOSKED.  Also take note of the
             fact that IORB's are associated with the physical pages  of  memory
             that  are  involved with the I/O through pointers in the CST5 table
             for those pages.  See the next section for more information in this


                  Device interrupts, in this context, refer  to  disk  and  tape
             interrupts,  those devices connected through the RH20's.  Each RH20
             channel has a "Channel Logout" area at the beginning of EPT.   This
             logout  area  is  four words in length for each channel, the fourth
             word of which contains an instruction to execute on  an  interrupt.
             This  instruction causes the system to dispatch to code actually in
             the CDB for the channel.

                  On  the  2020,  the  interrupts  work  differently.   The  EPT
             contains pointers to SM10 vector tables starting at address SMTEPT.
             The number of the interrupting UBA (1 or 3) is used as an offset to
             SMTEPT  to  find the proper vector table, and then the function and
             device (read done, DZ11, etc...) is used  as  an  offset  into  the
             vector  table  which  contains  the appropriate XPCW instruction to
             transfer control to the correct routine.

                  The previous PC and flags are saved in  the  area  immediately
             preceding  the CDB;  offset CDBINT (value -6) is the location where
             the flags and PC  are  stored.   When  the  interrupt  occurs,  the
             hardware executes the instruction in the channel logout area, which
             is "XPCW loc".  "Loc" is the address of the CDB for  this  channel,
             offset  by  CDBINT  (-6).   The XPCW instruction saves the flags at
             CDBINT(CDB), the PC at the next location, and gets  the  new  flags
             and  PC  from  the next two locations.  This area of the CDB, then,
             contains the following:

                CDBINT(CDB)/   old flags
                    -5(CDB)/   old PC
                    -4(CDB)/   new flags (0)
                    -3(CDB)/   new PC ( ".+1")
                    -2(CDB)/   MOVEM P1,CDBSVQ(CDB) ; save P1 in CDBSVQ
                    -1(CDB)/   JSP P1,PHYINT        ; dispatch to interrupt code
                CDBSTS(CDB)/   status and configuration flags
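
             The effect of the XPCW on the four words starting at CDBINT(CDB)
             can be modelled in a few lines of Python.  This is only a sketch
             of the behavior described above, not monitor code:  the CDB
             address and the saved flags and PC values are made up, and
             memory is represented as a simple dictionary.

```python
def xpcw(memory, block, old_flags, old_pc):
    """Model of XPCW: save flags and PC in the block, return new flags and PC."""
    memory[block] = old_flags       # CDBINT(CDB): old flags
    memory[block + 1] = old_pc      # next word:   old PC
    return memory[block + 2], memory[block + 3]


CDBINT = -6                 # offset of the XPCW block, as given above
cdb = 0o500000              # hypothetical CDB address
block = cdb + CDBINT

memory = {
    block + 2: 0,           # new flags (0)
    block + 3: block + 4,   # new PC, ".+1", i.e. the word at -2(CDB)
}

# Hypothetical interrupted context.
new_flags, new_pc = xpcw(memory, block, old_flags=0o300000, old_pc=0o123456)

print("old flags saved at %o, old PC at %o" % (block, block + 1))
print("execution continues at %o, which is -%o(CDB)" % (new_pc, cdb - new_pc))
```

             In a crash dump, the same arithmetic locates the interrupted PC:
             it is the word at CDBINT+1, i.e. -5(CDB).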

                  PHYINT sets up to use the stack PHYIPD, and saves the  ACs  in
             the  block  at  PHYACS,  therefore the previous stack pointer is at

                  The KLIPA code takes a 40+2*n interrupt  through  the  EPT  to
             EPT+52,  thence  to  PISC5:  (in STG) and from there to KLPSV:  (in
             PHYKLP) and finally to PHYINT:.

                  The PHYINT code, then, resolves the interrupt, and returns  to
             the  old PC by JRSTing through offset CDBJEN in the CDB.  This part
             of the CDB contains the following:

                CDBJEN(CDB)/   BLT 17,17
                         +1/   DATAO RH,CDBRST
                         +2/   XJEN CDBINT(P1)

             The last of these locations causes the system to  resume  where  it
             was interrupted.  During processing of the interrupt, the following
             information may be found:

                P1/   address of the CDB
                P2/   address of the KDB or 0
                P3/   address of the UDB
                P4/   address of the IORB or argument code:

                        (P4) < 0 - schedule a channel cycle
                        (P4) = 0 - dismiss interrupt
                        (P4) > 0 - complete current request (IORB address)
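
             When P4 is read out of a dump as a raw 36-bit quantity, the three
             cases above reduce to a test of the sign bit (bit 0).  The short
             Python sketch below shows that test;  the function names and the
             sample values are hypothetical.

```python
WORD_BITS = 36
SIGN_BIT = 1 << (WORD_BITS - 1)   # bit 0 of a PDP-10 word


def halfwords(left, right):
    """Build a 36-bit word from its left and right 18-bit halves."""
    return (left << 18) | right


def classify_p4(word):
    """Interpret a raw 36-bit P4 value according to the list above."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("expected an unsigned 36-bit value")
    if word == 0:
        return "dismiss interrupt"
    if word & SIGN_BIT:
        return "schedule a channel cycle"
    return "complete current request, IORB at %o" % word


print(classify_p4(halfwords(0o777777, 0o777777)))   # negative value
print(classify_p4(0))
print(classify_p4(halfwords(0, 0o345670)))          # made-up IORB address
```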

                  When the system is attempting to perform  I/O  to  or  from  a
             specific page of physical memory, that page is locked into core, by
             incrementing the lock count in the CST1 location for that page.  If
             a  device  error  occurs during the transfer of data for that page,
             then the CST5 entry for that page will  have  either  a  short  I/O
             Request  Block  (IORB)  or  a  pointer to a long (magtape or DSKOP)
             IORB.  The short IORB is only one word in length and  is  used  for
             disk  transfer requests, i.e., swapping.  In either case, the first
             word of an IORB, called IRBSTS, contains flags  that  describe  the
             success  or  failure  of  the transfer.  It may be helpful to check
             these locations in the event of a PHYINT crash.

                  The following offsets contain useful  information  for  PHYSIO

             In the UDB:

                UDBPS1/   cylinder number
                UDBPS2/   surface,, sector number
                UDBERC/   error retry count
                UDBERR/   status function for error retry

             In the CDB:

                CDBCNI/   status of channel when interrupt began.

             11.0  APR INTERRUPT CONTEXT (MEMPP)

                  These interrupts are the result of one  of  numerous  hardware
             errors being detected -- memory parity error, address parity error,
             NXM error, cache directory parity error, SBUS error, IO page  fail,
             etc.   APR Interrupts, like Device interrupts, are vectored through
             the EPT, but in the case of the APR interrupts, the vector location
             is  a  part  of  the priority interrupt scheme.  These are priority
             channel 3 interrupts, and dispatch through location KIEPT+46, which
             contains  an XPCW PIAPRX.  This is the channel 3 interrupt routine.
             As in the case of the device interrupt, the XPCW PIAPRX will  cause
             the PC and flags to be stored at locations PIAPRX and PIAPRX+1, and
             the processor will then jump to the location  stored  in  PIAPRX+3,
             which  is  PIAPR+1.  PIAPR actually dismisses the APR interrupt, or

                  This routine will set up its own stack, MEMPP.   The  previous
             stack pointer will be stored in MEMAP.

                  The current AC block is switched and so the ACs are not stored
             in memory.

                  One unusual aspect about handling APR interrupts is  that  the
             PIAPR  code  changes the page fault trap vector, mentioned earlier,
             from PGRTRP to MEMPTP, in UPTPFN, to handle the special case  of  a
             page fault in APR interrupt context.


                  The interrupt stack is set  to  IMSTK  via  the  MKNCTS  macro
             called  in  STG.   Interrupts  enter through the XPCW at the NTIINT
             offset in the NCT, e.g. NTIINT+(NCTVT).  The previous PC is stored in
             a   doubleword   at   NTIPCW+(NCTVT).    The   ACs  are  stored  at
             NTSVAC+(NCTVT), so the previous stack is at NTSVAC+(NCTVT)+17.

                  The location NCINPC+(NCTVT)  contains  the  initial  interrupt
             dispatch  address.   The  dispatch  addresses for message input and
             output are NTIDSP+(NCTVT) and NTODSP+(NCTVT) respectively.  See the
             definition of the NCT table in ANAUNV.MAC.


             Under Release 6 of TOPS-20, a number of data structures  have  been
        relocated  out  of section zero of the monitor's address space.  In some
        cases this necessitated changes in the way those  data  structures  were
        accessed,  and how they are accessed via FILDDT in crash dumps.  The CST
        tables (with the exception of CST5) are one such data  structure.   They
        are  accessed  in  the monitor by indirect reference through a series of
        tables with names of the form CSTnX, e.g.  CST3X to reference CST3.  The
        tables  are  16 words long, where CSTnX + m is an indirect word pointing
        to CSTn and indexed by register m.  Therefore  the  monitor  can  use  a
        construct  such  as  MOVE  T1,@CST0X+P1 where previous monitors used the
        form MOVE T1,CST0(P1) to fetch the CST0 entry for  the  page  number  in
        register P1.

             The following is an example of a method that can be used in  FILDDT
        to  access  the CST tables in a crash dump, assuming we want to find out
        the CST information for page 237:

        [38136 symbols loaded from file]
        [ACs copied from BUGACS to 0-17]
        [Looking at file GIDNEY:<SYSTEM>DUMP.EXE.1]

        EP$U                                    ! Establish virtual mapping

        0,,CST0X[   5,,203000   $Q<CST0:        ! Define symbols CST0,...,CST3
        0,,CST1X[   5,,217000   $Q<CST1:        ! from the contents of the
        0,,CST2X[   5,,233000   $Q<CST2:        ! CSTnX tables zeroth location
        0,,CST3X[   5,,247000   $Q<CST3:   

        CST5=203001                             ! CST5 was not moved for v6

        CST0+237[   556000,,400321   .=5,,203237! Now we can reference the CST
        CST1+237[   101,,0                      ! entries for page 237 in the
        CST2+237[   624,,237                    ! same old way we did for
        CST3+237[   77770,,0                    ! earlier releases.
        CST5+237[   556000,,400321   
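
             The address arithmetic behind the FILDDT session above can be
             checked with a small Python sketch.  It assumes the table base is
             taken from the zeroth word of the CSTnX table (whose index field
             is zero), written in the "left,,right" halfword notation that
             FILDDT prints;  the helper names are invented for this example.

```python
def parse_halfwords(text):
    """Convert FILDDT 'left,,right' octal notation to an integer."""
    left, right = text.split(",,")
    return (int(left, 8) << 18) | int(right, 8)


def format_halfwords(value):
    """Convert an integer back to 'left,,right' octal notation."""
    return "%o,,%o" % (value >> 18, value & 0o777777)


def cst_entry_address(table_base, page):
    """Address of the CST entry for `page`, given the contents of CSTnX+0."""
    return format_halfwords(parse_halfwords(table_base) + page)


# Bases as displayed in the session above; page 237 (octal).
for name, base in (("CST0", "5,,203000"), ("CST1", "5,,217000"),
                   ("CST2", "5,,233000"), ("CST3", "5,,247000")):
    print("%s entry for page 237 is at %s"
          % (name, cst_entry_address(base, 0o237)))
```

             For CST0 this yields 5,,203237, which agrees with the value
             FILDDT reports for "." in the session above.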


             See SWSKIT document MONITOR-ADDRESS-SPACE.MEMOS for more detail.

                                     THE BUG MACRO

             Across the various releases of TOPS-20, the  BUG  macro  mechanism
        has changed in form many times.  In essence, however,  it  has  remained
        the same.  The purpose of the BUG macro is to generate a  code  sequence
        in the  monitor  to  report  the  occurrence  of  a  software-detected
        error and, in the  case  of  a  BUGHLT,  crash  the  system.   The  code
        generates an XCT bugname which is a call to the proper routine to handle
        the BUGHLT/CHK/INF and provides the argument list of additional data for
        the  BUG.   In  3- and 4-series monitors, the call is via JSR to BUGHLT,
        BUGCHK, or BUGINF.  In 5-series and later monitors, the call is via  JSR
        to  BUGHLT  or  CALL  to BGCCHK or BGCINF.  In addition, the single line
        descriptive text is appended to the BGSTR PSECT, and  a  pointer  to  it
        placed in the BGPTR PSECT.

             With current monitors, a document called  BUGHLT  Documentation  is
        included  with  the Software Notebook set, which brings together all the
        additional data that is now part of the BUG description.  This should be
        considered an essential debugging document.

             For 3-series monitors, all of the information for the BUG was found
        in-line  in  the  source file.  There was only a single line descriptive
        text, and so all information  about  the  condition  had  to  be  gotten
        directly from the code.

             For 4-series monitors, there is a file  called  BUGS.MAC  which  is
        part  of  the  monitor build process and which contains the detailed BUG
        descriptions as part of the DEFBUG macros.  BUGS.MAC assembles  as  part
        of the build of PROLOG.UNV, and the calls to the BUG macro in the source
        look like:  BUG(bugname,<additional data>).  For example:


        That is, essentially all the descriptive text is in the  BUGS.MAC  file,
        and not in the source.  DEFBUG and BUG are defined in PROLOG.

             For 5-series monitors, the same method as for 4-series monitors  is
        used,  with  the  additional  data field descriptors taking on mnemonics
        instead of the "D".  The descriptive text is still all in BUGS.MAC.

             With Release 6, the procedure changes again.  The whole  BUG  macro
        text moves back in-line in each of the source modules, as in Release  3;
        however, the long argument list with the long descriptive text  remains.
        The BUGS.MAC file disappears.  The calling name  becomes  "BUG."  (with
        a trailing period) instead of "BUG", and some new argument  options  are
        added.

             Here is a description from PROLOG, and an example of the  new  form
        for Release 6 of TOPS-20:

        ;Macros for defining BUGs
        ;General format for in-line bug macro call is:
        ;TYP -  Flavor, HLT, CHK, or INF
        ;TAG -  Name of BUG
        ;MODULE -
        ;       Name of module in which BUG occurs.
        ;WORD - Flavor of BUG.  For instance, HARD for hardware-caused, SOFT
        ;       for software-caused.
        ;STR -  Short descriptive string describing cause of BUG, which gets
        ;       printed on CTY when BUG occurs.
        ;LOC -  List of locations whose contents should be displayed when the
        ;       BUG occurs.  Each location must be followed by a comma and
        ;       then a one-word descriptor of what the datum represents, for
        ;       instance UNIT or CHN.  Each pair of locations and descriptors
        ;       must be in angle brackets, and the angle-bracketed pairs must
        ;       be separated by commas with the entire LOC argument in angle
        ;       brackets.
        ;HELP - General documentation for the BUG
        ;CONTIN - Optional continuation address after BUGCHK or BUGINF is
        ;       logged.  Assumed to be in same section with call.

        For example, from PAGEM.MAC, the PCIN0 BUGCHK:

                    BUG.(CHK,PCIN0,PAGEM,SOFT,<PAGEM - PC has gone into section 0>,

        Cause:  A reference has been made to RSCOD or NRCOD in section 0.
                This should not happen because section 0 code cannot
                reference data in extended sections.  As an expedient,
                the page being referenced will be mapped to section 1
                with an indirect pointer.

             There is further information in the SWSKIT article BUGS.MEM.

                                 MONITOR BUILDING HINTS

        1. GENERAL

        Judging from the  number of  requests for  help on  this subject,  the
        chances are that you  will be required to  rebuild a monitor  sometime
        during your career  as a  Software Specialist. The  reasons are  quite
        simple.  There are customers who simply want functionality other  than
        that provided  by  stock  monitors.  There  are  also  those  who  are
        experiencing performance problems. We  cannot forget the sales  folks.
        It is not  unusual to  have to  rebuild a monitor  in order  to run  a
        benchmark. A very common example  is increasing the OFN area.  Another
        quite common requirement is  to increase the  patch area (FFF).  Doing
        either of these and simply submitting a build control file will  often
        produce a bad monitor.

        We will talk about PSECTS in  relation to the Monitor's address  space
        but will  make  no attempt  to  define what  they  do. A good detailed
        discussion on the Monitor's address space is on pages 2-62 to 2-73  in
        the Release 4  Update Manual. Also  there is a  memo on the  Monitor's
        address space in the SWSKIT.

        2. BACKGROUND

        In V3A, all of the Monitor was in the same address space. Nevertheless
        there was a crunch on space. As  a result some PSECTS were allowed  to
        overlap. So  critical  was the  space  requirement, that  attempts  to
        increase the OFN area  or FFF usually resulted  in the overlapping  of
        PSECTS other than the ones permitted.  Therein lies the  problem.  The
        Monitor produced from such a process would ordinarily be useless.

        With  the  development  of  V4,  the  space  requirement  became  more
        critical.  The Symbol Table became the object of concern. It  required
        a large number of pages, and in general, it is only used  infrequently
        under normal  conditions.  Hence  the Engineering  folks were  of  the
        opinion that  it should  be completely  eliminated.  We  objected.  It
        would be a nightmare to try  to debug the monitor without symbols.  It
        thus became  our  project  to  somehow keep  the  Symbol  Table  while
        conforming with  the space  restrictions.  We  decided to  remove  the
        Symbol Table and place it in  an alternate  address  space. It  should
        be noted  that  this  action  does  not  impact  adversely  on  system
        performance. With this change, the  build procedure and the  monitor's
        address space were reorganized.


        Outlined below are some steps to guide you when rebuilding a  monitor.
        Bear in mind that this  is a guide and might  not account for all  the
        unusual situations.  This guide however, coupled with your  experience
        and common  sense will  most likely  do the  trick. PLEASE  READ  THIS

        read the build BEWARE file that is on the Installation tape.
        NOTE:   The customer's Distribution Tape will have all the files needed
                to rebuild  the  monitor.  All  TOPS-20  modules  will  be  in
                TOPS-20.REL (or T2020.REL etc) The control file is  TOPS20.CTL
                (or T2020.CTL  etc).  The  link file  will be  NAME.CCL  where
                "NAME" depends upon what monitor is being used (could be 2020,
                ARPA etc.). For 2040/50, it is called LNKSCH.CCL. In any  case
                the TOPS20.CTL file  will have  the name. The  files you  will
                change will be one of the PARAM files  and/or  STG.MAC.  It
                should be noted that the special LINK.EXE and MACRO.EXE needed
                to build V3A are not required under V4.
          ====> The very first thing to do is to use all the standard files to
                build a "vanilla" monitor without any changes.  This will show
                most of the bugs in your attempt without worrying  about  what
                you are changing having an effect; and hence, should result in
                a substantially reduced debugging time.

        STEP 1          Restore all files needed  from <n-SOURCES>. This  will
                        usually contain the monitor modules (TOPS20.REL file),
                        all needed source  files, all  build control,  command
                        and log files.
        STEP 2          Carefully make the source changes as needed.
        STEP 3          Examine the TOPS20.CTL  file. This  file will  usually
                        have logical name definitions and TAKE commands  along
                        with other things. Also look at all referenced files.
        STEP 4          Examine the  corresponding log  file. This  will  show
                        what the result of  the original build procedure  was.
                        It should therefore be a template which should be used
                        to judge the validity of the new Monitor. Pay  special
                        attention to the section which shows the PSECT  layout
                        at the  end of  the BUILD  procedure. This  shows  the
                        start location,  the end  location and  the amount  of
                        free space between each PSECT.  The file used by  LINK
                        to set up the PSECTS is called LNKSCH.CCL. You  should
                        look at this file to get an idea of what's happening.
        STEP 5          Now edit the control and command files as necessary to
                        reflect your environment. This will mean, among  other
                        things,   changing   or   eliminating   logical   name
                        definitions.  Do NOT change the order of the PSECTS in
                        the LNKSCH.CCL file. Also  do not change the  starting
                        value for any PSECT.  The starting value is the  value
                        given to the /SET: switch.
        STEP 6          Submit  the  control  file  with  /TAG:SINGLE  switch.
                        Ensure that the control  file is correct and  reflects
                        accurately logical name definitions and the .CCL file.
                        Also this portion  of the .CTL  file has the  commands
                        necessary to compile the changed module.
        STEP 7          When the job ends, examine your log file. Correct  any
                        compilation or  missing files  errors and  go back  to
                        STEP 6. Continue with STEP 8 only after all errors are
                        corrected.
        STEP 8          At this  point  you  should have  a  MONITR.EXE.   Now
                        examine the section  in the  log file  which gives  an
                        outline of  the  PSECTS.   If any  PSECTS  overlap,  a
                        message will  indicate  the  same.  If  there  are  no
                        overlapping messages, go to  STEP 11. NOTE: There  are
                        some   instances  where  PSECTs  can  overlap.  POSTCD
                        and SYVAR  PSECTs are  allowed to  overlap any  xxxVAR
                        PSECT. This will  not gain  very much in  storage -  4
                        pages to be exact. If  you  follow the build procedure
                        then overlapping  PSECTs are not allowed and therefore
                        must  be  resolved.  You  are  once  again advised NOT
                        to re-organize the monitor's address space.
        STEP 9          Start with the first overlap.  Figure out  the  number
                        of words by which the  first  PSECT  overlaps  its
                        following PSECT.  Now  add  this value  to  the  start
                        location of  the overlapped  PSECT. This  value  quite
                        possibly will be a location within a page,  i.e.,  an
                        address of the form 125300,  where the page number  is
                        125 and the offset into the page is 300. The  starting
                        address of many  PSECTs is  required to be  on a  page
                        boundary i.e. an  address of the  form 126000. A  good
                        rule to  follow is:  IF THE  PSECT STARTED  ON A  PAGE
                        BOUNDARY BEFORE  THE BUILD,  THEN KEEP  IT ON  A  PAGE
                        BOUNDARY. This would mean that you may be required  to
                        add an additional value to round up to the next  page.
                        For example  the  125300  value would  be  rounded  to
                        126000 if the  PSECT is required  on a page  boundary.
                        The PSECT  sequence and  starting  values are  in  the
                        LNKSCH.CCL file.  NOTE: the  values are  all given  in
                        OCTAL so add in OCTAL.
        STEP 10         EDIT the  LNKSCH.CCL file  to reflect  this new  start
                        value for the  overlapped PSECT.  Go back  to STEP  6.
                        Repeat these  steps  until  there are  no  more  error
                        messages. Note that changing the start location of the
                        overlapped PSECT can cause it to overlap its following
                        PSECT and  the  same  procedure must  be  followed  to
                        resolve any conflicts. Of  course you must be  careful
                        to ensure that you do not outgrow the monitor's address
                        space. A total of the  length of all PSECTs will  tell
                        you if the Monitor is too large.
        STEP 11         At this point you should have a good Monitor. Save  it
                        in the proper directory. The final test is getting  it
                        up and running.
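
        The octal arithmetic of STEPs 9 and 10 can be sketched in Python as
        shown below.  This is a hedged illustration only:  the PSECT names,
        start values and lengths are invented, each PSECT is described by a
        start and a length (so its end is exclusive), and no check is made
        that the result still fits in the monitor's address space.

```python
PAGE_SIZE = 0o1000  # a TOPS-20 page is 512 (octal 1000) words


def relocate_start(old_start, overlap_words):
    """New starting value for an overlapped PSECT, as described in STEP 9."""
    start = old_start + overlap_words
    # Rule from STEP 9: a PSECT that began on a page boundary stays on one.
    if old_start % PAGE_SIZE == 0 and start % PAGE_SIZE != 0:
        start += PAGE_SIZE - start % PAGE_SIZE
    return start


def resolve_overlaps(psects):
    """Push each overlapped PSECT forward, in order, as in STEP 10.

    `psects` is a list of (name, start, length) tuples in link order.
    """
    result = []
    prev_end = 0
    for name, start, length in psects:
        if start < prev_end:
            start = relocate_start(start, prev_end - start)
        result.append((name, start, length))
        prev_end = start + length
    return result


# Hypothetical layout: AAA has grown by 300 (octal) words into BBB.
layout = [
    ("AAA", 0o120000, 0o5300),
    ("BBB", 0o125000, 0o2000),
    ("CCC", 0o127000, 0o1000),
]

for name, start, length in resolve_overlaps(layout):
    print("%s starts at %o" % (name, start))
```

        In this made-up layout BBB moves from 125000 to 125300 and is then
        rounded up to 126000, which in turn pushes CCC to 130000.  The
        values printed are the ones that would be edited into the /SET:
        switches before resubmitting the build.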

                          MONITOR BUILDING NOTES FOR RELEASE 6

             There have been even more changes  in  POSTLD  processing  and  the
        monitor's  address  space for version 6 of TOPS-20, some of which should
        be taken into account when attempting to build a new  monitor  from  the
        control  file.   This  is a list of the changes from the version 4.1/5.1
        build procedures.

             1.  A new file SYSFLG.MAC appears for version 6 builds.  This  file
                 is  not  used  explicitly in the customer build, but is used to
                 create PROLOG.UNV (and BOOT,  by  the  way).   SYSFLG  contains
                 system   configuration  flags  and  conditional  settings,  and
                 replaces the files KSPRE.MAC and KLPRE.MAC which now  disappear
                 (into  PROLOG  for  the  most  part),  along with PROKS.UNV and

             2.  The command  file  ASEMBL.CMD  has  been  split  more  or  less
                 arbitrarily  into  two  files:   ASMBL1.CMD  and  ASMBL2.CMD to
                 perform exactly the same function, but to put less of a  burden
                 on  the  EXEC.   These two files now contain comments about the
                 files to be compiled also, by the way.

             3.  There is a change in the  DDT  dialog  used  to  establish  the
                 breakpoints for BUGHLT and BUGCHK.  An additional breakpoint is
                 set at DDTIBP (which is XCT'ed by DDTINI) with  the  breakpoint
                 set  to proceed when hit.  The purpose for this is so that when
                 the monitor reaches a given state in  initializing  the  system
                 paging,  we  hit  a  DDT  breakpoint and DDT can then sense the
                 state of the world, according to the monitor, and can  set  its
                 own  internal  state  however  it  needs  to  reflect  extended
                 addressing considerations for EDDT.

             4.  POSTLD now tries to make PSECT juggling easier  by  making  one
                 try  itself.   If  the given configuration does not work due to
                 overlaps, POSTLD will try to write what should be a working set
                 of  values  (if  possible)  into two new files:  LNKNEW.CCL and
                 PARNEW.MAC.  It will then have  BATCON  transfer  to  an  error
                 label,  where  the  monitor load is tried again using these new
                 files.  There is a third new parameter file:   LNKINI.CCL  that
                 is  used  in  conjunction with LNKNEW.CCL, and does not contain
                 PSECT settings, which is also used in the try-again load.

             5.  The format of the PSECT map printed by POSTLD has changed  very
                 slightly,  but  the  content is still the same.  There are some
                 new PSECTs.

             6.  POSTLD now writes the MONITR.EXE file using extended  sections.
                 This  has some implications for BOOT, which must now know about
                 extended sections, and any other program which might have  some
                 embedded knowledge of what the monitor .EXE file looks like.

             7.  The BUGSTF conditional feature  has  been  removed,  since  the
                 bugstrings  have  been  moved  out  of the way, and there is no
                 additional benefit derived from deleting them.

             8.  The HIDSYF conditional feature has likewise been removed, as it
                 is assumed that the monitor symbol table is always hidden now.

                                     EXEC DEBUGGING

        Now that most SWS have microfiche of the released EXEC and MONITOR,  I
        anticipate questions on looking at the EXEC and MONITOR.  Here is a
        cursory tutorial on investigating the internals of the EXEC (or command
        processor, if you prefer).  The examples are intended as a guide:  the
        typein is correct, but the responses may not be character perfect.  You
        are advised to read the other chapters in this document for more
        information on DDT and MONITOR snooping and debugging.

                              LOOKING AT THE EXEC WITH DDT

        You can look at either the running system EXEC or your own copy of the
        EXEC, using the DDT that is loaded with the EXEC.


        First you must have WHEEL privileges in order to use the ^EEDDT command.
        The ^EEDDT command transfers control to the DDT now loaded with the
        EXEC, with symbols.  Now you can do all the normal DDT functions.  To
        exit from DDT, type <ESC>G, echoed as $G.  This starts your program,
        which is the EXEC, so you are now at EXEC command level.



        Get your copy of the EXEC in your address space, transfer control to it
        and start DDT as above.  There are 3 ways to exit from this, depending
        on the state you are in.  If you are in DDT, you can ^Z out to get back
        to the system EXEC.  If you are running your EXEC and want to exit to
        the system EXEC, you can ^EQUIT (if you are enabled) or "POP" (if you
        are not enabled).  POP is preferable.  Note that if you prefer to get
        your EXEC and not start it, in order to set breakpoints or put in
        patches before running, see section "VII.  PATCHING" below.


                @GET MYEXEC.EXE 
                @MONNAM.TXT, TOPS-20 MONITOR (VERSION#)
                CINITF/  -1   0         ; reset initialization flag to
                .                       ; run this EXEC again after saving
                ^Z                      ; to exit and save, for example
                @                       ; now you are in the monitor's EXEC
                                        ; with your EXEC in your address
                @SAV MYEXEC.EXE.2       ; space.  You can save it, say.


                @GET MYEXEC.EXE
                @MONNAM.TXT,,TOPS-20 MONITOR(VERSION #)
                CINITF/  -1  0          ; clear initialization flag
                $G                      ; running your EXEC
                $^EQUIT                 ; return to higher (system) EXEC
                $                       ; you are in system EXEC
                $SAV NEWEXEC            ; etc.


                @GET MYEXEC.EXE
                @MONNAM.TXT,,TOPS-20 MONITOR(VERSION#)
                @POP                    ; return to higher (system) EXEC.
                @                       ; now you are in system EXEC.

                                        ; NOTE: you should set CINITF to 0
                                        ; if you want to save and run this
                                        ; EXEC later.  You can do it by DDT
                                        ; after the POP or ^EEDDT before.


             You could get into trouble with your EXEC and not be able to get
        out of it:  CTRL/C traps, you can't POP, or whatever.  There is a way to
        always be able to exit to the MINI-EXEC.  First you must issue ^EQUIT to
        get into the MINI-EXEC.  Then type "S" (start) to get back to the system
        EXEC.  Then get into your EXEC.  If you now get into trouble, you can
        issue ^P, which will get you back into the MINI-EXEC.  From there you
        can get back to the system EXEC with "S" (start).


                INTERRUPT AT 15657
                $                               ; now back at system EXEC.
                $GET MYEXEC
                @MONNAM.TXT, TOPS-20 MONITOR (VERSION)
                        .                       ; let's say your EXEC can't
                        .                       ; do anything - you are hung
                        .                       ; get out, get into MINI-EXEC
                INTERRUPT AT 12345
                MX>S                            ; MINI-EXEC prompt then start.
                $                               ; now back at the system EXEC.


             Suppose that you want to run your EXEC as the top level EXEC,  that
        is,  not  running under the system EXEC.  Get into the MINI-EXEC and get
        your copy of the EXEC and run it as the top level EXEC.


                INTERRUPT AT 23456
                MX>R                  ; Reset so you will MERGE not GET
                MX>G <MYAREA>MYEXEC.EXE.2
                @                     ; Now you are in your EXEC
                  .                   ; Let's say you want to get out
                @^P                   ; Control-P to get to MINI-EXEC
                INTERRUPT AT 12345
                MX>R                  ; "RESET" resets your address space
                MX>E                  ; You are requesting the system EXEC
                @                     ; You are in system EXEC        

        NOTE:   If you had typed "S"  rather than "E" above you  would
                have restarted your EXEC.


             Once you have made a change to your personal copy of the EXEC,  you
        may  wish  to  have  your  edited  EXEC  run  as the SYSTEM EXEC.  It is
        necessary  to  make  the  saved  EXEC  non-writable  before   using   it


                @ENABLE (CAPABILITIES) 
                $GET (PROGRAM) PS:<SYSTEM>EXEC.EXE

                81. pages, Entry vector loc 6000 len 3

                0        PS:<SYSTEM>EXEC.EXE.1  1   R, CW, E
                6-125    PS:<SYSTEM>EXEC.EXE.1  2-121   R, E

                .               ;Make the edits
                $SET PAGE-ACCESS (OF PAGES) 6:125 (ACCESS) NO WRITE 
                $!SAVE THE NEW EXEC
                $SAVE EXEC.EXE.2 !New generation! (PAGES FROM) 6 (TO) 125 
                 EXEC.EXE.2 Saved
                $COPY (FROM) EXEC.EXE (TO) SYSTEM:EXEC.EXE.197 !New generation!


             When searching for symbols you may notice that the module name DDT
        gives you is different from the name of the source file assembled for
        the EXEC.  For example, to open the symbol table for EXECED you say
        CANDE$: to DDT.

        The following is a correspondence list for EXECs before version 5:

                FILENAME        MODULE NAME     FILENAME        MODULE NAME
                ===========================     ===========================
                EXECDE.MAC      XDEF            EXEC0.MAC       EXEC0
                EXECGL.MAC      XGLOBS          EXEC1.MAC       EXEC1
                EXECPR.MAC      PRIV            EXEC2.MAC       EXEC2
                EXECED.MAC      CANDE           EXEC3.MAC       EXEC3
                EXECCS.MAC      CSCAN           EXEC4.MAC       EXEC4
                EXECSU.MAC      SUBRS           EXECMT.MAC      EXECMT
                EXECVR.MAC      VER             EXECQU.MAC      EXECQU
                EXECMI.MAC      MIC             EXECSE.MAC      EXECSE
                                                EXECP.MAC       EXECP

        For Release 5 of the EXEC, the TITLE statements in the EXEC modules have
        been  changed  to  match the module names so that this concordance is no
        longer necessary.
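
        The correspondence list above lends itself to a simple lookup.  The
        following Python fragment is only an illustrative sketch:  the table
        contents are copied from the list above, but the names
        PRE_V5_MODULE_NAMES and ddt_open_symbols_command are hypothetical, and
        "$" stands for the ESC character as DDT echoes it.

```python
# Correspondence list for EXECs before version 5, copied from the table above.
PRE_V5_MODULE_NAMES = {
    "EXECDE.MAC": "XDEF",   "EXEC0.MAC": "EXEC0",
    "EXECGL.MAC": "XGLOBS", "EXEC1.MAC": "EXEC1",
    "EXECPR.MAC": "PRIV",   "EXEC2.MAC": "EXEC2",
    "EXECED.MAC": "CANDE",  "EXEC3.MAC": "EXEC3",
    "EXECCS.MAC": "CSCAN",  "EXEC4.MAC": "EXEC4",
    "EXECSU.MAC": "SUBRS",  "EXECMT.MAC": "EXECMT",
    "EXECVR.MAC": "VER",    "EXECQU.MAC": "EXECQU",
    "EXECMI.MAC": "MIC",    "EXECSE.MAC": "EXECSE",
    "EXECP.MAC": "EXECP",
}


def ddt_open_symbols_command(filename):
    """Return the DDT typein that opens the symbol table for a source file.

    Hypothetical helper: it only covers EXECs before version 5 and raises
    KeyError for a file name that is not in the correspondence list.
    """
    module = PRE_V5_MODULE_NAMES[filename.upper()]
    return module + "$:"  # "$" is how DDT echoes ESC
```

        For instance, ddt_open_symbols_command("EXECED.MAC") gives the typein
        CANDE$: used in the example above.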

             The sources and .CTL file for assembling the EXEC are part  of  the

             If you try to examine a location symbolically and get "U", implying
        that the symbol is undefined, you may have to reset the symbol table
        pointers.  Look in location 770001 for the address of the word that
        contains the symbol table pointer, then look at location 116 to find the
        real symbol table pointer.  Put the contents of 116 in the location
        pointed to by 770001.

                116/   762600,,54463  ; real symbol table pointer

                770001/  776456       ; location of symbol table pointer
                776456/  743200,,23540     762600,,54463
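
        To sanity-check a pointer before depositing it, it can help to decode
        the two halfwords.  The Python sketch below is not part of DDT;  the
        function names are hypothetical, and it ASSUMES the conventional
        "-count,,address" layout for a symbol table pointer, which this
        handbook does not spell out.

```python
HALFWORD = 1 << 18  # each half of a 36-bit word holds 18 bits


def parse_halfwords(text):
    """Split DDT's 'left,,right' octal notation into two integers."""
    left, right = text.split(",,")
    return int(left, 8), int(right, 8)


def describe_symbol_pointer(text):
    """Return (word count, address) for a symbol table pointer.

    Assumption: the left half is the negative word count in 18-bit two's
    complement and the right half is the table address.
    """
    left, right = parse_halfwords(text)
    count = (HALFWORD - left) % HALFWORD  # negate the 18-bit left half
    return count, right
```

        Under that assumption, the pointer 762600,,54463 in the example would
        describe a table of 15200 (octal) words starting at address 54463.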

        VII.    PATCHING

             There is a patch command in DDT.  The form is as follows:

                $<                    ; patch before this instruction
                $$<                   ; patch after this instruction
                $>                    ; end patch following this instruction

        DDT will put the patch in the EXEC patch area.  The symbol is PAT..  DDT
        will insert JUMPA 1,LOC+1 and JUMPA 2,LOC+2 following the patch you
        typed in, where LOC is the location of the instruction you're patching.
        DDT then replaces LOC, the original instruction, with a JUMPA XXXXX,
        where XXXXX is the address in the patch area where your patch now
        resides.  Then the patch area symbol (PAT..) is redefined to follow your
        last patch.


        As an example, get a copy of <SYSTEM>EXEC and insert calls to subroutine
        MUMBLE and subroutine FRATZ before location DING+1.  DING+1 originally
        contains PRINT Q3, and after the patch it contains a JUMPA to the patch
        area.  The patch area will contain:

                CALL MUMBLE
                CALL FRATZ
                PRINT Q3
                JUMPA 1,DING+2
                JUMPA 2,DING+3


                $SAVE NUEXEC          ; you must SAVE and GET in order to
                $GET NUEXEC           ; write-enable the EXEC and use DDT
                $DDT                  ; instead of ^EEDDT
                EXEC0$:               ; open symbols for module where DING is

                DING/ PUSH P,A        ; first location in routine "DING"
                DING+1/ PRINT Q3 $<   ; begin patching before location DING+1
                PAT../ 0  CALL MUMBLE ; DDT opens up PAT.. area, you add code
                PAT..+1/CALL FRATZ    ; continue to insert your patch
                $>                    ; close the patch
                PAT..+2/ PRINT Q3     ; the original instruction being replaced.
                PAT..+3/ JUMPA 1,DING+2       ; DDT inserts this return.
                PAT..+4/ JUMPA 2,DING+3       ; in case of a SKIP instruction

                DING+1/  JUMPA 12345  ; JUMPA to PAT.. replaces original LOC.

                $G                    ; start your copy of EXEC etc.
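
        The bookkeeping for a "patch before" can be modeled in a few lines of
        Python.  This is only a sketch of the resulting memory layout as
        described above, not a description of how DDT is implemented.  The
        function patch_before and the numeric addresses are hypothetical:
        memory is a dictionary from address to instruction text, and addresses
        are printed in octal rather than symbolically.

```python
def patch_before(memory, loc, patch_area, new_instructions):
    """Model a $< ... $> patch inserted before the instruction at loc.

    Returns the first free address after the patch, i.e. the value that
    PAT.. would be redefined to.
    """
    original = memory[loc]
    body = list(new_instructions) + [
        original,                # the displaced original instruction
        f"JUMPA 1,{loc + 1:o}",  # normal return to LOC+1
        f"JUMPA 2,{loc + 2:o}",  # return to LOC+2 if the original skipped
    ]
    for offset, instruction in enumerate(body):
        memory[patch_area + offset] = instruction
    memory[loc] = f"JUMPA {patch_area:o}"  # LOC now jumps to the patch
    return patch_area + len(body)


# Hypothetical addresses: DING+1 at 1001, PAT.. at 12345 (both octal).
memory = {0o1001: "PRINT Q3"}
new_pat = patch_before(memory, 0o1001, 0o12345, ["CALL MUMBLE", "CALL FRATZ"])
```

        After the call, the five words starting at 12345 hold the two CALLs,
        the original PRINT Q3, and the two JUMPA returns, matching the patch
        area listing above.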

             Various methods may be used to write-enable the EXEC for  patching.
        You  can use the GET, SAVE method above, or SET PAGE n COPY-ON-WRITE, or
        the $W command in DDT to achieve the same results.

                                RECOVERING FROM A BAD EXEC

             This procedure is simply a rehash of the procedure for recovering
        from  the  case  in  which  the  EXEC  refuses  to  log  in.  For more
        information see the article "Looking at the EXEC with DDT".

             If your system version of the EXEC blows up completely,  you  can
        recover  rather  easily.   You type a ^C on the CTY, and when the EXEC
        blows up you will be dumped into the MINI-EXEC.  Then you can use  the
        GET  and  START commands to read in a good version of the EXEC, either
        from a copy on disk, or from the distribution magtapes.

             If the problem with the EXEC is that it does not blow up, but  it
        still  fails  to let you log in, then you have a harder time.  In this
        case you have to bring up the system with the switches, and  bring  up
        the system stand-alone.  An example of what to do from the point where
        the BOOT program is loaded follows:

        BOOT>/L                 ; load in the monitor
        BOOT>/G141              ; start up EDDT

        DBUGSW[   0   2         ; set system as debugging
        EDDTF[   0   1          ; keep EDDT around

        GOTSWM$B                ; set a breakpoint after the swappable
                                ; part of the monitor has been loaded
        147$G                   ; start the system
        GOTSWM$1B>>   STEX+1/   HRROI T2,BOOTER+51   HRROI T2,FFF
        FFF:                    ; change the name of the EXEC file
        0$1B                    ; remove the GOTSWM breakpoint
        $P                      ; proceed to bring up the system

        ^C                      ; and Control-C to get the new EXEC

        If  you had no old version of the EXEC around, then change the name to
        some garbage, so that the monitor can't find any such  program.   This
        will  then  dump  you into the MINI-EXEC, and then you can read a good
        EXEC in from magtape.

             In release 3 of the monitor, there is a new JSYS  which  is  very
        useful  for  debugging  new  versions of the EXEC.  The CRJOB JSYS can
        allow you to start up a new job with any program at all  as  its  top
        level  fork.   You  can  also start the job not logged in.  So you can
        debug your new versions of the EXEC easily,  with  no  possibility  of
        ripping yourself off.  Of course, the ^EQUIT, GET from MINI-EXEC is
        still a valid sequence for starting a new top-level fork.

                              Debugging the GALAXY System

        1.0  INTRODUCTION

        The GALAXY system presents a unique problem to the  software  specialist
        who  is  trying  to debug one of its components.  Usually, any user mode
        program can be debugged under TOPS-20 by running a copy  of  it,  loaded
        with DDT, taking appropriate care that nothing is done which will affect
        any users of the system.  For GALAXY, however, it is very  difficult  to
        not affect users of the system.  For example, if you are trying to debug
        BATCON, you will find that QUASAR will very happily schedule batch  jobs
        submitted  by  other  users  to  be  run by your BATCON.  If you are not
        careful, you can cause those batch jobs to be lost, or at  least  slowed
        down, while you are debugging.

        Debugging QUASAR or ORION would be even worse.  Users would  see  PRINT,
        SUBMIT,  etc.   commands  hang  when  you  hit  a  breakpoint in QUASAR.
        Operators would be unable to control any system components if  you  were
        breakpointed  in ORION.  On top of this, the monitor knows about QUASAR,
        and you may lose messages  which  happen  when  users  close  a  spooled
        lineprinter file, or when a job logs out.

        To solve these problems, the concept of a "private  GALAXY  system"  has
        been  implemented  in GALAXY and the EXEC.  When a private GALAXY system
        is operating, all of its components are completely  independent  of  the
        primary  GALAXY system.  QUASAR, the queue maintainer, keeps queues that
        are separate from the system queues and are failsofted  to  a  different
        master  queue file.  This QUASAR communicates only with other components
        in the same private system.  It is even possible to run several complete
        private GALAXY systems, with the restrictions that:

             1.  All components in a private system must run under the same user
                 name.

             2.  Only one private system may be run by a given user.

             3.  Each private QUASAR must be connected to a different directory.

             4.  Each private ORION must be connected to a different directory.
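
        When several private GALAXY systems are planned on one machine, the
        last three restrictions can be checked mechanically.  The Python sketch
        below is a hypothetical planning aid, not a GALAXY tool:  the function
        name, the tuple layout, and any directory names passed to it are
        assumptions made for illustration.

```python
from collections import Counter


def find_conflicts(systems):
    """Check a plan of private GALAXY systems against the restrictions.

    systems is a list of (user, quasar_directory, orion_directory) tuples,
    one per private system.  Returns a list of problem descriptions.
    """
    problems = []
    fields = (("user", 0), ("QUASAR directory", 1), ("ORION directory", 2))
    for label, index in fields:
        counts = Counter(system[index] for system in systems)
        for value, count in sorted(counts.items()):
            if count > 1:
                problems.append(
                    f"{label} {value} is used by {count} private systems"
                )
    return problems
```

        QUASAR and ORION directories are checked separately, so a single
        private system may still keep both components in one directory.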


                       This text is oriented towards version  4.0
                       of   GALAXY,   and  there  may  be  slight
                       differences for version 4.2 or later.


        Since the changes necessary to  create  a  private  GALAXY  system  were
        implemented  in  the  version  4 source code, it is relatively simple to
        build the system.  The recommended procedure is as follows:

             1.  Create a directory for the private GALAXY system.

             2.  Restore the file EXEC-FOR-DEBUGGING-GALAXY.EXE from the  SWSKIT
                 to  this  newly  created directory.  For Release 5 of the EXEC,
                 the distributed EXEC removes the need for this special EXEC.

             3.  Restore each of the following files from the proper saveset  on
                 the TOPS-20 distribution tape to this directory.

                                BATCON.EXE              PLEASE.EXE
                                CDRIVE.EXE              QMANGR.EXE
                                GLXLIB.EXE              QUASAR.EXE
                                LPTSPL.EXE              SPRINT.EXE
                                OPR.EXE                 SPROUT.EXE

             4.  For each component in the  above  list  except  GLXLIB.EXE  and
                 QMANGR.EXE, perform the following steps:

                 1.  Give the EXEC command "GET xxxxxx.EXE"

                 2.  Give the command "DEPOSIT 135 -1"

                 3.  Give the command "SAVE xxxxxx"
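
        Since step 4 repeats the same three EXEC commands for eight components,
        the typein can be generated rather than entered by hand.  The Python
        sketch below only builds the command text shown in the steps above;
        the function name is hypothetical, and feeding the result to a job
        (for example through a batch control file) is left to the reader.

```python
# Components from the list in step 3.
COMPONENTS = ["BATCON", "CDRIVE", "GLXLIB", "LPTSPL", "OPR",
              "PLEASE", "QMANGR", "QUASAR", "SPRINT", "SPROUT"]

# Step 4 excludes these two.
SKIPPED = {"GLXLIB", "QMANGR"}


def debug_build_commands(components=COMPONENTS):
    """Return the EXEC commands that poke location 135 to -1 in each file."""
    commands = []
    for name in components:
        if name in SKIPPED:
            continue
        commands += [f"GET {name}.EXE", "DEPOSIT 135 -1", f"SAVE {name}"]
    return commands
```

        With the default list this yields 24 commands, three for each of the
        eight components that need location 135 poked.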


        It is not strictly necessary to restore all of the GALAXY components for
        a  one  time  only debugging session.  To debug a component like BATCON,
        you would need at a minimum:

             1.  Your own copy of BATCON

             2.  Your own copy of QUASAR for BATCON to speak to

             3.  Your own copy of ORION for BATCON and QUASAR to speak to

             4.  A copy of OPR to speak to ORION to control BATCON

             5.  An EXEC which knows about your QUASAR to make queue entries

        The following is a log of an example build of a private GALAXY system:

        $! First connect to a debugging directory
        $! Now build and save debugging .EXE files
        $! QUASAR, the queue maintainer
        $SAVE (ON FILE) QUASAR.EXE.1 !New file! (PAGES FROM) 
         QUASAR.EXE.1 Saved
        $! ORION, the message clearinghouse
        $SAVE (ON FILE) ORION.EXE.1 !New file! (PAGES FROM) 
         ORION.EXE.1 Saved
        $! OPR, the operator interface
        $GET (PROGRAM) SYS:OPR.EXE.55 
        $SAVE (ON FILE) OPR.EXE.1 !New file! (PAGES FROM) 
         OPR.EXE.1 Saved
        $! BATCON, the batch controller
        $GET SYS:BATCON.EXE.39 
        $SAVE (ON FILE) BATCON.EXE.1 !New file! (PAGES FROM) 
         BATCON.EXE.1 Saved
        $! Now a directory of what we've got
        $VDIRECTORY (OF FILES) *.*.* 

         BATCON.EXE.1;P777700    16 8192(36)   13-Feb-80 22:00:37 
                                 82 41984(36)  13-Feb-80 04:33:50 
         OPR.EXE.1;P777700       31 15872(36)  13-Feb-80 22:00:09 
         ORION.EXE.1;P777700     44 22528(36)  13-Feb-80 21:59:45 
         QUASAR.EXE.1;P777700    40 20480(36)  13-Feb-80 21:59:27 

         Total of 213 pages in 5 files


        Starting and running a private  GALAXY  system  is  similar  to  running
        GALAXY  in  the  usual manner.  First QUASAR and ORION are started, then
        the component you wish to debug.   You  will  also  need  OPR  to  issue
        operator  commands  and  the EXEC to make queue entries.  Since you will
        need about five  jobs,  it  is  usually  most  convenient  to  run  each
        component as a separate subjob under PTYCON.

        4.1  Starting QUASAR

        QUASAR and ORION should be started before everything else.  Nothing evil
        happens  if  you  start  them last, but all the other components will be
        waiting for these two to start.  A suggested procedure is:

             1.  Define a subjob "Q"

             2.  Connect to it

             3.  LOGIN a job under the same user name

             4.  CONNECT that job to the directory in which you did the  private
                 GALAXY build

             5.  ENABLE

             6.  RUN QUASAR

        4.2  Starting ORION

        Starting ORION is as painless as starting QUASAR:

             1.  Define a subjob "O"

             2.  Connect to it

             3.  LOGIN a job under the same user name

             4.  CONNECT that job to the directory in which you did the  private
                 GALAXY build

             5.  ENABLE

             6.  RUN ORION

        4.3  Starting OPR

        OPR starts up using the same formula as QUASAR and ORION:

             1.  Define a subjob "OPR"

             2.  Connect to it

             3.  LOGIN a job under the same user name

             4.  CONNECT that job to the directory in which you did the  private
                 GALAXY build

             5.  ENABLE

             6.  RUN OPR

             7.  You may now type OPR commands to see if QUASAR and ORION appear
                 to be healthy.

        4.4  Starting The Component To Be Debugged

        If the component you wish to debug is QUASAR, ORION, or  OPR,  then  you
        have already started it.  Breakpoints could have been set, and when they
        were hit, the component could have been debugged without any noticeable
        effect on other users of the system.  If you wish to debug PLEASE,
        BATCON, LPTSPL, CDRIVE, SPRINT, or SPROUT, do the following:

             1.  Define a subjob with an appropriate ID (e.g.  B for BATCON)

             2.  Connect to it

             3.  LOGIN a job under the same user name

             4.  CONNECT that job to the directory in which you did the  private
                 GALAXY build

             5.  ENABLE

             6.  GET the component

             7.  Enter DDT

             8.  Set breakpoints, then start the program

        4.5  Starting The Modified EXEC

        The file "EXEC-FOR-DEBUGGING-GALAXY.EXE" which has been supplied on  the
        SWSKIT  has  exactly  two  commands  added to its repertoire, which set
        "DEBUGGING-GALAXY" or "NO DEBUGGING-GALAXY".  The  purpose  of  these
        commands is to select which one of two PIDs (Process IDs) to
        communicate with:  the system QUASAR or  the  private  QUASAR.   If  "NO
        DEBUGGING-GALAXY"  is  set,  then PRINT, SUBMIT, CANCEL, MODIFY, and the
        INFORMATION commands  will  all  cause  communication  with  the  system
        QUASAR.   If  "DEBUGGING-GALAXY" is set for this EXEC, then the commands
        listed will communicate with the private QUASAR run by that  user.   For
        Release  5  or later of the EXEC, the distributed EXEC incorporates this
        functionality   in   the   "^ESET   PRIVATE-QUASAR"   and   "^ESET    NO
        PRIVATE-QUASAR" commands, and the special EXEC is unneeded.

             1.  Define a subjob "E"

             2.  Connect to it

             3.  LOGIN a job under the same user name

             4.  CONNECT that job to the directory in which you did the  private
                 GALAXY build

             5.  RUN EXEC-FOR-DEBUGGING-GALAXY (or the Release 5 or later EXEC)

             6.  ENABLE



        The following is a log of a sample debugging session:

         TOPS-20 Command processor 4(560)
        @! First run PTYCON, so we can control five jobs from one terminal
        PTYCON> !
        PTYCON> ! Now start up QUASAR as subjob Q
        PTYCON> !
        PTYCON> DEFINE (SUBJOB #) 0 (AS) Q

         2102 Development System, TOPS-20 Monitor 4(3245)
         Job 21 on TTY222 13-Feb-80 22:18:05
        Structure PS: mounted
        Structure MISC: mounted
        $! Connect to directory where debugging .EXE files are
        $! Finally run the component
        % QUASAR GLXIPC Becoming  [HEMPHILL]QUASAR     (PID = 66000031)
        % QUASAR GLXIPC Waiting for ORION to start
        PTYCON> !
        PTYCON> ! Now start up ORION as subjob O
        PTYCON> !
        PTYCON> DEFINE (SUBJOB #) 1 (AS) O

         2102 Development System, TOPS-20 Monitor 4(3245)
         Job 22 on TTY223 13-Feb-80 22:19:25
        Structure PS: mounted
        Structure MISC: mounted
        $! Connect to directory where debugging .EXE files are
        $! Finally run the component
        % ORION  GLXIPC Alternate [HEMPHILL]QUASAR     (PID = 66000031)
        % ORION  GLXIPC Becoming  [HEMPHILL]ORION      (PID = 70000032)
        **** Q(0) 22:19:58 ****
        % QUASAR GLXIPC Alternate [HEMPHILL]ORION      (PID = 70000032)
        **** O(1) 22:19:58 ****
        PTYCON> !
        PTYCON> ! Now start up OPR as subjob OPR
        PTYCON> !

         2102 Development System, TOPS-20 Monitor 4(3245)
         Job 23 on TTY224 13-Feb-80 22:20:29
        Structure PS: mounted
        Structure MISC: mounted
        $! Connect to directory where debugging .EXE files are
        $! Finally run the component
        $RUN (PROGRAM) OPR.EXE.1 
        % OPR    GLXIPC Alternate [HEMPHILL]QUASAR     (PID = 66000031)
        % OPR    GLXIPC Alternate [HEMPHILL]ORION      (PID = 70000032)
        22:19:59          -- Network Node 1031 is Online --

        22:19:59          -- Network Node 2137 is Online --

        22:19:59          -- Network Node 4097 is Online --

        22:19:59          -- Network Node DN20A is Online --

        22:19:59          -- Network Node MILL20 is Online --

        22:19:59          -- Network Node SYS880 is Online --
        OPR>! Let's take a look at our brand new queues
        22:21:21          --The Queues are Empty--
        22:21:27          --There are no Devices Started--
        PTYCON> !
        PTYCON> ! Now start up BATCON as subjob B
        PTYCON> !
        PTYCON> DEFINE (SUBJOB #) 3 (AS) B

         2102 Development System, TOPS-20 Monitor 4(3245)
         Job 24 on TTY225 13-Feb-80 22:21:49
        Structure PS: mounted
        Structure MISC: mounted
        $! Connect to directory where debugging .EXE files are
        $! Finally run the component
        % BATCON GLXIPC Alternate [HEMPHILL]QUASAR     (PID = 66000031)
        % BATCON GLXIPC Alternate [HEMPHILL]ORION      (PID = 70000032)
        PTYCON> !
        PTYCON> ! Now start up special EXEC as subjob E
        PTYCON> !
        PTYCON> DEFINE (SUBJOB #) 4 (AS) E

         2102 Development System, TOPS-20 Monitor 4(3245)
         Job 19 on TTY226 13-Feb-80 22:23:00
        Structure PS: mounted
        Structure MISC: mounted
        @! Run the special EXEC, which is provided on the SWSKIT

         TOPS-20 Command processor 4(560)-1
        $! Make this EXEC switch from system queues to private queues
        $! Use ordinary EXEC commands to examine private queues
        [The Queues are Empty]
        [The Queues are Empty]

        $! Now switch back to look at system queues

        Printer Queue:
        Job Name  Req#  Limit            User
        --------  ----  -----  ------------------------
        * KLERR      6   1197  DEUFEL                     On Unit:0
           Started at 22:05:47, printed 314 of 1197 pages
          XXX        3     18  KAMANITZ                   /Dest:4097
          MS-OUT    18    117  BRAITHWAITE                /Unit:0
        There are 3 Jobs in the Queue (1 in Progress)


        Batch Queue:
        Job Name  Req#  Run Time            User
        --------  ----  --------  ------------------------
        * DUMP      16  02:00:00  OPERATOR                In Stream:0
            Job# 17 Running DUMPER Last Label: A Runtime 0:23:55
          BATCH      2  00:05:00  BLIZARD                 /Proc:FOO
          SOURCE     8  00:05:00  BLOUNT                  /After:14-Feb-80  0:00
          SRCCOM    12  00:05:00  MURPHY                  /After:14-Feb-80  0:00
          QJD4R     13  00:05:00  SROBINSON               /After:19-Feb-80  0:00
          QAR       10  00:05:00  BLOUNT                  /After:19-Feb-80  0:14
          SAVE       1  00:05:00  FICHE                   /After:19-Feb-80  9:10
        There are 7 Jobs in the Queue (1 in Progress)

        $! Now let's submit a batch job to our own BATCON
        $! Make a trivial batch control file
        $COPY (FROM) TTY: (TO) A.CTL.1 !New file! 
         TTY: => A.CTL.1

        @SY A
        $! And submit the job
        $SUBMIT (BATCH JOB) A.CTL.1 
        [Job A Queued, Request-ID 1, Limit 0:05:00]
        $! Now examine private queues

        Batch Queue:
        Job Name  Req#  Run Time            User
        --------  ----  --------  ------------------------
          A          1  00:05:00  HEMPHILL              
        There is 1 Job in the Queue (None in Progress)

        $! Our job is in the batch queue, but no batch-streams have been started

        OPR>START (Object) BATCH-STREAM (Stream Number) 0
        22:25:40        Batch-Stream 0  --Startup Scheduled--

        22:25:40        Batch-Stream 0  --Started--
        22:25:40        Batch-Stream 0  --Begin--
                        Job A Req #1 for HEMPHILL
        22:25:51        Batch-Stream 0  --End--
                        Job A Req #1 for HEMPHILL
        PTYCON> !
        PTYCON> ! Cleaning up is easy
        PTYCON> !


        This section is to explain what happens differently when a component has
        had  location  135  (.JBOPS)  poked  to -1, and to present a few helpful
        tidbits of information about debugging some  of  the  programs.   .JBOPS
        incidentally  is  the  word in the job data area (defined under TOPS-10)
        which is reserved for a program's OTS.  GALAXY references this  location
        by the symbol "DEBUGW".

        6.1  GLXLIB

        GLXLIB is the GALAXY library.  It  consists  of  a  code  segment  which
        starts  at address 400000 and a data segment at address 600000.  Each of
        and  SPROUT  uses  it.  Part of the initialization code of each of these
        programs maps in GLXLIB as a "high  segment".   This  is  in  effect  an
        object  time  system for GALAXY, with many commonly used routines.  Most
        of the support for the private GALAXY system is in this library,  enough
        so  that OPR, PLEASE, BATCON, LPTSPL, SPRINT and SPROUT actually have no
        code which cares whether  they  are  part  of  a  private  GALAXY.   The
        initialization  code  in  each  component  looks in three places to find
        GLXLIB.EXE:  first on the structure and  directory  that  the  component
        itself  came  from, second on DSK:, third on SYS:.  This search order is
        the same for both the system GALAXY and the private one.
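
        The search order just described can be sketched as follows.  This is
        an illustration in Python, not the actual initialization code; the
        `exists` callback and the example directory PS:<HEMPHILL> are
        assumptions made for the example.

```python
def glxlib_search_order(component_dir):
    """Filespecs tried for GLXLIB.EXE, in the order given in the text."""
    return [
        component_dir + "GLXLIB.EXE",  # 1. where the component came from
        "DSK:GLXLIB.EXE",              # 2. DSK:
        "SYS:GLXLIB.EXE",              # 3. SYS:
    ]

def find_glxlib(component_dir, exists):
    """Return the first filespec for which exists(spec) is true, else None.

    `exists` is a hypothetical callback standing in for a file lookup.
    """
    for spec in glxlib_search_order(component_dir):
        if exists(spec):
            return spec
    return None

# Example: only the SYS: copy is present.
present = {"SYS:GLXLIB.EXE"}
print(find_glxlib("PS:<HEMPHILL>", present.__contains__))  # SYS:GLXLIB.EXE
```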

             The actual changes  implemented  for  the  private  GALAXY  are  as
             follows:

             1.  Ordinarily, a component which stopcodes will save a crash  file
                 on  disk.   When  debugging,  however,  the  crash  file is not
                 written.  In either case, if DDT is loaded  with  the  program,
                 the stopcode will invoke a jump to DDT.

             2.  When debugging, GALAXY components do not require that the
                 packets they receive be privileged.

             3.  Ordinarily, QUASAR and ORION get special system PIDs  for  IPCF
                 communications.   When  debugging,  they get PIDs with names of
                 the form "[username]QUASAR" and "[username]ORION".  All  GALAXY
                 components  will  then  look  for  these  PID  names.   Even  a
                 pseudo-GALAXY component, such as MOUNTR or IBMSPL, will be able
                 to  find  these  PIDs if its location 135 has been poked to -1,
                 simply because it uses GLXLIB.

             4.  GALAXY components print messages like:
                 "% QUASAR GLXIPC Waiting for ORION to start"
                 only while debugging.

             5.  ORION and QUASAR print messages about PIDs they acquire, like:
                 "% QUASAR GLXIPC Becoming  [HEMPHILL]QUASAR     (PID =

             6.  All components print messages about the special PIDs they  find
                 for QUASAR and ORION, like:
                 "% ORION  GLXIPC Alternate [HEMPHILL]QUASAR     (PID =

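        The PID naming rule in item 3 can be illustrated with a short Python
        sketch.  The function name and its arguments are hypothetical; the
        only thing taken from the text is the "[username]QUASAR" and
        "[username]ORION" name form.

```python
def private_pid_name(component, username):
    """PID name used by a private QUASAR or ORION while debugging."""
    if component not in ("QUASAR", "ORION"):
        raise ValueError("only QUASAR and ORION get private PID names")
    return "[%s]%s" % (username, component)

print(private_pid_name("QUASAR", "HEMPHILL"))  # [HEMPHILL]QUASAR
print(private_pid_name("ORION", "HEMPHILL"))   # [HEMPHILL]ORION
```
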
        6.2  QUASAR

             1.  QUASAR reads and  writes  private  queues  from  its  connected
                 directory.  The full filespec is

             2.  QUASAR does  absolutely  no  privilege  checking.   Anyone  can
                 modify  or  kill any request in the queues (if they know how to
                 speak to this private QUASAR).

        6.3  ORION

             1.  ORION will create a log file under the name of
                 "DSK:ORION-TEST.LOG" instead of
                 "PS:<SPOOL>ORION-SYSTEM-LOG.001", and does no renaming  of  any
                 old log files present.

             2.  ORION will not set up  any  NSP  servers  when  debugging.   It
                 therefore  will not speak to remote nodes to run OPRs for them.
                 However, there are hooks  for  ORION  to  initialize  "SRV:128"
                 instead of the usual "SRV:47" when debugging.
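
        The two differences above amount to a choice of names that depends on
        the debug word.  A minimal Python sketch, with hypothetical function
        names (the file and server names are the ones quoted in the text):

```python
def orion_log_file(debugging):
    """Log file ORION writes, per item 1."""
    if debugging:
        return "DSK:ORION-TEST.LOG"
    return "PS:<SPOOL>ORION-SYSTEM-LOG.001"

def orion_server_hook(debugging):
    """Server name from item 2.

    Note: per the text, ORION sets up no NSP servers when debugging;
    "SRV:128" is only what the hooks would initialize.
    """
    return "SRV:128" if debugging else "SRV:47"

print(orion_log_file(True))      # DSK:ORION-TEST.LOG
print(orion_server_hook(False))  # SRV:47
```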

        6.4  QMANGR

        QMANGR has also been modified to look for a private QUASAR's PID if  the
        low segment has a non-zero entry in .JBOPS.

        6.5  CDRIVE

        CDRIVE can be a problem to debug, since it potentially has many
        inferior forks all executing the same code.  For this reason each fork
        automatically loads SDDT into its address space and jumps to it when it
        starts up.  After setting any breakpoints or otherwise modifying this
        fork's code, the debugger types "GO<ESC>G" to resume the fork.  While
        debugging, if the fork terminates (crashes), CDRIVE will not go through
        its normal purging of the crashed fork, so that its status can be
        examined.


        All GALAXY components use the  stopcode  facility  supplied  by  GLXLIB.
        This  facility  dumps  the  ACs,  program  error codes, associated error
        messages, program version numbers, and the last nine  locations  of  the
        stack  onto  the  controlling  terminal  of  the  program  executing the
        stopcode.  In addition, a crash file is created with  the  name  of  the
        form:   PS:<SPOOL>program-stopcode-CRASH.EXE.   This  .EXE file contains
        the entire core image of the program which has crashed, and is extremely
        useful in determining the cause of the crash.  In particular, there is a
        block of data referred to as the "crash block"  which  usually  contains
        the information most pertinent to the debugger.  This information can be
        read with either DDT or FILDDT.  Its contents are tabulated as follows:

                Location                Data

                .SPC                    PC of stopcode

                .SCODE                  SIXBIT name of stopcode

                .SERR                   Last TOPS-20 error code

                .SACS                   Contents of the sixteen accumulators

                .SPTBL                  Base address of page table used by

                .SPRGM                  Name of program in SIXBIT

                .SPVER                  Program version number

                .SPLIB                  GLXLIB version number

                .LGERR                  Last GALAXY error code

                .LGEPC                  PC of last GALAXY error return
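
        Two of the crash block entries (.SCODE and .SPRGM) are SIXBIT words,
        and the crash file name is built from the program and stopcode names.
        The Python sketch below shows how such a word can be decoded by hand.
        It relies on the standard SIXBIT encoding (six 6-bit characters per
        36-bit word, character code = ASCII code minus 40 octal).  The
        function names are hypothetical, and "XYZ" is a made-up stopcode name
        used only for the example.

```python
def text_to_sixbit(text):
    """Pack up to six characters into one 36-bit SIXBIT word."""
    word = 0
    for ch in text.upper().ljust(6)[:6]:
        word = (word << 6) | (ord(ch) - 0o40)
    return word

def sixbit_to_text(word):
    """Unpack a 36-bit SIXBIT word, dropping trailing blanks."""
    chars = []
    for shift in range(30, -1, -6):          # leftmost character first
        code = (word >> shift) & 0o77
        chars.append(chr(code + 0o40))
    return "".join(chars).rstrip()

def crash_file_name(program, stopcode):
    """Crash file name of the form given in the text."""
    return "PS:<SPOOL>%s-%s-CRASH.EXE" % (program, stopcode)

word = text_to_sixbit("QUASAR")
print(oct(word))                        # 0o616541634162
print(sixbit_to_text(word))             # QUASAR
print(crash_file_name("QUASAR", "XYZ")) # PS:<SPOOL>QUASAR-XYZ-CRASH.EXE
```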

                                    DEBUGGING MOUNTR

        1.0  INTRODUCTION

             This write-up was prepared to assist developers and maintainers  in
        understanding  and  debugging  the  TOPS-20  tape and structure mounting
        program, MOUNTR.  It is assumed that the reader has a working  knowledge
        of  TOPS-20  assembler  language  coding  and the set of TOPS-20 monitor
        calls.


             This document will serve primarily as a guide to  debugging  MOUNTR
        crashes.   Much  of  the information needed to understand the data bases
        and the operation of MOUNTR resides within the first 20 or 30  pages  of
        the MOUNTR code itself.  Just make a listing and start reading.


             MOUNTR  can  be  debugged  as  a  standard  GALAXY  component,   by
        depositing  -1  in location 135 of MOUNTR.EXE.  MOUNTR will acquire a PID
        for a private copy of QUASAR and will communicate with it.

             To debug a MOUNTR which is actually recognized by the system as the
        "real"  MOUNTR,  it  is  usually  best  to run it as a separate job by
        including the following commands in SYSJOB.RUN:

             GET SYS:MOUNTR

             This job can be reached by use of the ADVISE command.  MOUNTR can
        then be killed and a new copy started with appropriate breakpoints or
        patches installed.  Before MOUNTR can be patched or breakpointed, it is
        necessary to issue the DDT command $W, since MOUNTR write protects
        itself during execution.  For example:

             $ADVISE OPERATOR
              TTY2, NRT20
              TTY235, OPR
              TTY234, MOUNTR
              TTY233, PTYCON
              TTY232, EXEC
             TTY: 234
              [Pseudo-terminal, confirm]
              Escape character is <CTRL>E, type <CTRL>^? for help
              OPERATOR Job 3 MOUNTR

             LINK FROM MOSER, TTY 60
             ^C                                   !KILL OLD MOUNTR
             $GET SYS:MOUNTR                      !GET A NEW ONE
             $DDT                                 !ENTER DDT
             $W                                   !YOU MUST DO THIS
             DDSCIH/   JSP 16,SAVEQR#   .$B   
             ^Z                                   !EXIT DDT
             $START                               !START MOUNTR

             Depositing 1 in location CDFLG will  enable  CONTROL-D  interrupts.
        Typing CONTROL-D when enabled causes MOUNTR to enter DDT.

        4.0  MOUNTR CRASHES

        When MOUNTR crashes, it saves its core image in the file,


        All crashes are initiated by a CALL STOP instruction.  This  may  result
        from  a  logic  inconsistency,  or  it  can  happen if MOUNTR receives a
        software interrupt on a panic channel.  The STOP  routine  gathers  some
        important data and saves it in core.  It then types a message giving the
        name of the filespec wherein it is saving the core image, and issues  an
        SSAVE  JSYS to save the image.  After restoring the ACs from the time of
        the crash, MOUNTR halts.

        To begin debugging a MOUNTR crash, follow these steps:


             2.  Get into DDT and type STOP1$G.  This will load DDT's  ACs  with
                 MOUNTR's  ACs  at  the  time of the crash and exit to the EXEC.
                 Give the DDT command to the EXEC again to get back into DDT.

             3.  Look at P (AC 17).  If it contains  PDL1+something,  there  has
                 been  a  stack  trap,  and  the  routine  STOPP was called as a
                 result.  The location BADP contains the contents of  P  at  the
                 time of the trap.

             4.  If P contains PDL+something, type TAB to look at the top of the
                 stack.  This will contain one plus the address of the CALL STOP
                 instruction.   Type  TAB  and  ^H  to  display  the   CALL STOP
                 instruction that invoked the crash.  If MOUNTR died as a result
                 of a panic channel interrupt, LPC1 will contain  one  plus  the
                 address  of  the instruction which was executing at the time of
                 the interrupt.
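
        The "one plus the address" arithmetic in step 4 can be checked with a
        small Python sketch.  It assumes that the return address sits in the
        right half (low 18 bits) of the word, as it does for a section-0
        subroutine call; the function name and the sample word are
        hypothetical.

```python
RIGHT_HALF = 0o777777   # low 18 bits of a 36-bit word

def address_before(word):
    """Given a word holding 'one plus the address' of an instruction,
    return the address of that instruction.

    Applies equally to the top-of-stack word (CALL STOP) and to LPC1.
    """
    return ((word & RIGHT_HALF) - 1) & RIGHT_HALF

stack_top = 0o000000012346          # made-up contents of the top of stack
print(oct(address_before(stack_top)))   # 0o12345
```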

        The following locations and data structures are helpful in locating  the
        cause of difficulties in MOUNTR:

        NAME    FUNCTION
        ----    --------
        CRSHAC  Contains the ACs at the time the STOP routine was called.

        LPC1    For crashes caused by panic channel  interrupts,  LPC1  contains
                one plus the address of the instruction that caused the crash.

        LSTERR  Contains the last TOPS-20 error.

        MRPDB   PDB for last IPCF message received by MOUNTR

        MSTRBK  Used as an argument block for MTOPR and MSTR monitor calls.

        RBUF    Last IPCF message received by  MOUNTR  (particularly  useful  if
                SSSDAT+1  contains  MRCVIH, indicating that MOUNTR crashed while
                processing an incoming IPCF message).

        SSSDAT  When MOUNTR  crashes,  SSSDAT+1  contains  the  address  of  the
                routine  that  was invoked by MOUNTR's scheduler.  Starting here
                and using the stack, you can trace  the  execution  of  MOUNTR's
                code that led to the crash.

        TBUF    Last IPCF message sent by MOUNTR.

                                    DEBUGGING PA1050

        In order to debug the compatibility package you must have a copy of  the
        file  called PAT.EXE.  PA1050 is just the system name for PAT.  If there
        is no copy of PAT.EXE, then take the source program called PAT.MAC,  and
        assemble  it,  thereby creating a sharable save file called PAT.EXE.  To
        debug the compatibility package the following steps are required.

        $GET PROG          ;Where PROG may be any program you choose
        $MERGE PAT         ;PAT is the source name for PA1050
        PAT$:   MOVBF$B    ;You set your breakpoints here
        $G                 ;You must type $G twice because of the double  symbol


                       Some of the error messages you may receive
                       from  PA1050  may  not  be  the true error
                       message.   To  have  the   correct   error
                       message  printed  out  use an ERJMP, or an
                       ERCAL after the JSYS  it  fails  on.   For
                       more  information on ERJMP and ERCAL refer
                       to the Monitor Calls Reference Manual.

        In order to build the compatibility  package  the  following  steps  are
        required.

        $SAVE PAT
        $GET PAT
        Output file: PA1050.EXE
        $I MEM

        The start after loading causes the program to be moved from its location
        to  its  running location in high core.  The symbol table is also moved,
        and the pointer adjusted.  A sharable save file of pages 700-777 must be
        made  for  debugging.   This  is created when you MAKEPF$G, then execute
        40000,,0 in UDDT.  When you type I MEM you should now have PA1050.EXE in

                                  COPYING FLOPPY DISKS

        This is  a description  of the  front end  program COP  (quick  floppy
        copy). This program  should be  used to  create backup  copies of  the
        distributed set of floppies.


        1)      Only IBM floppies should be used.  Other floppies  may
                destroy the DX11 drives.

        2)      Floppies have  a  finite  life while  mounted  in  the
                drive. The heads do not  float, and the floppies  turn
                continuously.  This causes the magnetic surface to  be
                eaten away. Minimum floppy life is something like  200

        3)      Floppies which are dropped, badly shocked, or used  as
                frisbees will lose their  sector headers, and will  be
                good for nothing.

        4)      Never put a floppy which you suspect is bent into  the
                drive -- it may damage the drive. 

        5)      COP  is discussed  also in  the  Front End File System
                Specification  manual  in  Volume  14 of  the  TOPS-20
                Software Notebooks, section 3.2.


                The basic COP command string is of the form:

                  COP> <destination device>/<switch>=<source device>

                To  enter  COP, type a Control-backslash to get to the
                Parser,  then  MCR COP  to start up COP.  The floppies
                should have  already been mounted with  MCR MOUNT, and
                should  then be dismounted with  MCR DMOUNT  after the


                /HE     Help, types a list of switches
                /RD     Read Device, check for errors
                /CP     Copy (default action)
                /VF     Verify copy (default when copy in effect)
                /ZE     Zero the device
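
                As an illustration of the command string form, the Python
                sketch below splits a COP command into destination, switch,
                and source.  It is not part of COP.  The function name is
                hypothetical, and treating an omitted switch as /CP is an
                assumption based on /CP being listed as the default action.

```python
# Switch names and meanings as listed in the table above.
SWITCHES = {
    "HE": "Help, types a list of switches",
    "RD": "Read Device, check for errors",
    "CP": "Copy (default action)",
    "VF": "Verify copy (default when copy in effect)",
    "ZE": "Zero the device",
}

def parse_cop_command(line):
    """Split '<destination>/<switch>=<source>' into its three parts."""
    left, equals, source = line.partition("=")
    if not equals:
        raise ValueError("expected '=' between destination and source")
    destination, slash, switch = left.partition("/")
    # Assumption: an omitted switch means the default action, /CP.
    switch = switch.strip().upper() if slash else "CP"
    if switch not in SWITCHES:
        raise ValueError("unknown switch /" + switch)
    return destination.strip(), switch, source.strip()

print(parse_cop_command("DX1:/VF=DX0:"))   # ('DX1:', 'VF', 'DX0:')
```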

        COP EXAMPLE:

                The  following  sequence  of commands will  succeed in
                copying  the contents of the floppy in DX0:  (the left
                hand drive) onto the floppy in DX1:, and verifying the

                PAR>MCR MOU
                Mount completed
                Mount completed
                PAR>MCR COP
                PAR>MCR DMO
                Dismount Complete
                Dismount Complete

                The copy takes about two minutes, the verify about the same.
                Take  care to  specify the  correct source  and  destination


                If you  COP for  many generations  you will  build  up
                ghost bad  blocks until  RSX will  declare the  floppy
                useless. This is  because in each  generation the  bad
                block file of the  old floppy is  copied onto the  new
                (which will have its bad blocks in different  physical
                locations).  A way around this  is to use PIP for  any
                non-boot copies once every several generations.  
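
                The build-up of ghost bad blocks can be pictured with a toy
                model in Python.  This is a sketch under simple assumptions:
                each copy carries the old bad block list forward and adds
                the new floppy's own bad blocks.  The block numbers are
                invented, and nothing here reflects the real RSX file
                format.

```python
def copy_generations(physical_bad_blocks):
    """Track the recorded bad block list across successive COP copies.

    `physical_bad_blocks` is a list of sets, one per floppy in the chain,
    giving the blocks that are really bad on that floppy.  Returns one
    (recorded, ghosts) pair of counts per generation.
    """
    recorded = set()
    history = []
    for physical in physical_bad_blocks:
        # The old list is copied over and the new floppy's own bad
        # blocks are added to it.
        recorded = recorded | set(physical)
        ghosts = recorded - set(physical)   # marked bad but actually fine
        history.append((len(recorded), len(ghosts)))
    return history

print(copy_generations([{10}, {200}, {35, 36}]))
# [(1, 0), (2, 1), (4, 2)]
```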

                             THE SWSKIT DOCUMENTATION FILES

             Following is a brief synopsis of each article/file that appears  in
             the  SWSKIT documentation.  Please note that many of these articles
             are preliminary functional specs and discussions, and  may  contain
             some  information  that is completely false.  However, the material
             is provided to be used with proper caution because it does  provide
             information  not  otherwise  available in useful form at this time.
             Over time, many of these documents will  be  replaced  by  SDC-type
             materials.   For other items, these articles may be the main source
             of information for quite a while.

                          TITLE                 DESCRIPTION

                  HANDBOOK            This document is the  latest  revision  of
                                      the TOPS-20 Trouble-Shooting Handbook.

                  ACCOUNTING          This article describes the changes made to
                                      allow  the  billing rates for system usage
                                      to  change  during  the  day.    It   also
                                      explains a feature called disk accounting.

                  ACCOUNTING-TABLES   This  file  documents   the   formats   of
                                      SYSTEM-DATA.BIN   and   CHECKPOINT.BIN  in
                                      tabular format.

                  ARCHIVE             This  document  describes  some   of   the
                                      functionality of archiving, and how to use

                  BUGS                This document describes the TOPS20  BUGxxx
                                      macro changes.

                  CI-INFO             This document  contains  files  describing
                                      the implementation of support for the CI-20
                                      (KLIPA) I/O port for the DECSYSTEM-20.

                  CFS-INFO            This document contains descriptions of the
                                      implementation  of  the Common File System
                                      (CFS) for TOPS-20.

                  DDT-INFO            This document describes the  changes  that
                                      have  been  made  to  DDT for versions 41,
                                      41A, and 43.

                  DDP                 This document discusses  some  aspects  of
                                      DDP   (Distributed   Data  Processing)  on
                                      TOPS-20.  (Very early paper.)

                  DEBUGGING-GALAXY    This document describes  how  to  build  a
                                      private  GALAXY  system for debugging, and
                                      gives   hints   on    debugging    various

                  DX20                This document gives a brief description of
                                      the  configuration  requirements for tapes
                                      controlled by the DX20.

                  EAPGMG              This    document    describes     Extended
                                      Addressing   on   the   DECSYSTEM-20   and
                                      programming in non-zero sections.

                  EXECUTE-ONLY        This  document  describes   the   changes,
                                      restrictions,  and  implementation  of  an
                                      execute-only file capability on TOPS-20.

                  GALAXY-TABLES       This document is a  collection  of  tables
                                      for GALAXY version 4.2.

                  GALAXY-V5           This document contains discussions of  the
                                      changes   to  GALAXY  for  version  5  and
                                      specifications of the QUEUE% JSYS.

                  GETOK               This document describes three new  JSYSes:
                                      GETOK,   RCVOK,   and   GIVOK.    It  also
                                      describes the SMON function.

                  HSC-INFO            This document  describes  the  programming
                                      and  use  of the HSC-50 storage controller
                                      by release 6.0 of TOPS-20.

                  IO                  This  document  describes  some   of   the
                                      aspects of how IO is done by TOPS-20.

                  KFS                 This  document  explains  the   functional
                                      specification of the RSX-20F KLINK link.

                  KLCOM               This document  describes  the  KL10/PDP-11
                                      DTE20  protocol.   It explains such things
                                      as the protocol messages, error  messages,
                                      and bootstrap procedures.

                  KL873               This document describes the  functionality
                                      of   all   the   revisions  of  the  BM873
                                      Bootstrap ROM for  KL10  based  on  PDP-11

                  LABELED-TAPES       This document describes  TOPS20's  support
                                      of   labeled   tapes.   It  also  gives  a
                                      description  of  the  monitor  calls   and
                                      support routines that are used for labeled

                  LP20                This  functional  specification  describes
                                      the interface to the LP-20 from the KL-10.

                  MONITOR-ADDRESS-SPACE This document describes the changes made
                                      to  BOOT  and  DDT  to enable more address
                                      space.  It also covers PSECTS, overlapping
                                      BGSTR to build monitors, the
                                      write-protecting of the resident monitor,
                                      and the parts of the Release 6.0 address
                                      space project that move things out of
                                      section 0/1.

                  MONITOR-TABLES      This document displays most of the  tables
                                      in  the  monitor.   This  is a best effort
                                      based on the  ED  SERVICES  materials  and
                                      will doubtless not be as complete  as  the
                                      eventual ED SERVICES document.

                  MOS                 This document describes MOS memory and the
                                      TOPS20 monitor support of TGHA.

                  MSCP-INFO           This  document  contains  the  design  and
                                      functional  specifications for the TOPS-20
                                      implementation of the Mass Storage Control
                                      Protocol (MSCP) server and driver.

                  PARITY              This  document  describes  some   of   the
                                      changes  made to the way parity errors are
                                      handled for Release 5.

                  PERFORMANCE         This    document    talks    about     the
                                      interpretation   of   some  of  the  WATCH

                  RSX-STOP-CODES      This document lists the RSX-20F stop
                                      codes, stating their meaning and the
                                      module that contains each stop code.

                  SCA-INFO            This document describes SCA,  the  Systems
                                      Communication  Architecture  protocol used
                                      over the CI bus.

                  SCHEDULER           This  document   describes   Working   Set
                                      Swapping,  and  Release  4 and 6 Scheduler
                                      changes (Class Scheduler, SKED%, etc.).

                  SPEAR               This document discusses  how  to  run  the
                                      SPEAR program.

                  USEFE               This document outlines how to use  the  FE
                                      device and program.

                               THE SWSKIT TOOLS PROGRAMS

        Included on the SWSKIT are a number of utility programs,  as  summarized
        below.   These tools have been found to have at least some usefulness in
        the past in a debugging environment.  Most of these programs require the
        user  to have WHEEL or OPERATOR privileges to work, but also most are of
        the "show and tell but don't touch" category, so  they  are  in  general
        "safe" to run.

        We have cleaned up some of the old ones a bit, added a few new ones, and
        checked them all out to the extent that they will all run.  There should
        even be some documentation, at least a HELP file, with each program.

        While we do not actively "support" these programs, we are quite  willing
        to accept complaints and suggestions and submissions from the field.

        These are the "standard" tools;  the Marlboro Support Group is generally
        familiar  with  their  operation and quirks, and in providing support to
        the field may request that one or more of the  programs  be  used  at  a
        customer  site  to  diagnose or assist in correcting a problem.  This is
        generally more effective than random poking about in DDT, or  trying  to
        learn the peculiarities of whatever the customer may have available.

        And now, the current collection:

                  PROGRAM                       DESCRIPTION
                  -------                       -----------

                  ACTDMP              Converts an ACCOUNTS-TABLE.BIN  file  back
                                      into the sequence of commands that created
                                      it, for debugging purposes.

                  CHANS               Produces system configuration  and  status
                                      information on tapes and disks.

                  DIRPNT              Lists the contents of the blocks in a disk

                  DIRTST              Checks the format, and lists  any  invalid
                                      data in directory files.

                  DS                  Provides    software    diagnostic    help
                                      concerning  the  disk file system.  It can
                                      also  perform  the  functions   of   READ,
                                      FILADR, and UNITS.

                  DSKERR              Provides a convenient listing of the  hard
                                      and soft disk errors that have occurred.

                  DX20PC              Traces the microcode PC in the DX20.

                  ENVIRONMENT         Types out  CPU  and  memory  configuration

                  JSTRAP              Produces information in a log on any JSYS,
                                      including the PC and arguments used.

                  MONRD               Allows you to easily examine  the  running

                  MTEST               Allows you to insert  MONITOR  instruction
                                      execution tests anywhere in the monitor.

                  REV                 Allows you to easily alter, edit,  delete,
                                      obtain information, etc.  on files.

                  SWSERR              Produces  a  convenient  listing  of   BUG
                                      HLT/CHK/INF occurrences.

                  TYPVF7              This program is useful for typing out  the
                                      contents of a VFU file in a readable form.

                  UNITS               Produces  status   information   on   disk

        [End of Handbook]