PDP-10 Archive: swskit-documentation/handbook.mem from QT020_T20_4.1_6.1_SWSKIT
























                           TOPS-20 TROUBLE-SHOOTING HANDBOOK
                           =================================



                              Release 4.1 and 6.1 Edition


                                      October 1985















                             TOPS-20 Monitor Support Group
                                 Marlboro Support Group
                                   Software Services

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                 Page 2
        Introduction






                                      INTRODUCTION
                                      ------------





             This document is the TOPS-20 Trouble-Shooting Handbook.   It  is  a
        collection  of  materials  designed to increase the effectiveness of the
        Software Specialist in the field in coping with TOPS-20 problems.   Some
        of  the  common "disasters" to befall TOPS-20 sites are discussed, along
        with debugging methods in general.   Though  the  information  contained
        herein  is  probably  not sufficient to make a Specialist into a TOPS-20
        "wizard", it should help  ease  the  communication  burden  between  the
        Specialist  in  the  field  and  his counterpart in Marlboro and lead to
        quicker resolution of problems.

             This document contains materials from many  sources,  and  presents
        some information not available anywhere else.  Certain sections may be a
        bit dated, but an effort has been made to remove at least  some  of  the
        old or wrong material and to include new articles.

             There is a continuing need to update this document as part  of  the
        SWSKIT  materials,  and  Specialists are encouraged to give the Marlboro
        Support Group feedback on these materials.  This  communication  can  be
        via the Hotline, or by writing to the following address:

                        TOPS-20 Monitor Support Group
                        Digital Equipment Corporation
                        200 Forest Street, MRO1-2/H22
                        Marlboro, Massachusetts  01752

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                 Page 3
        Table of Contents


                                   Table of Contents
                                   -----------------




             1.  Introduction . . . . . . . . . . . . . . . . . . . . .   2

             2.  Table of Contents  . . . . . . . . . . . . . . . . . .   3

             3.  Policy Statement . . . . . . . . . . . . . . . . . . .   5

             4.  Producing a Good SPR . . . . . . . . . . . . . . . . .   6

             5.  Using SIRUS  . . . . . . . . . . . . . . . . . . . . .   9

             6.  DDT Patching the TOPS-20 Monitor . . . . . . . . . . .  16

             7.  Mapping Directories in MDDT  . . . . . . . . . . . . .  20

             8.  Recovering from Directory Errors . . . . . . . . . . .  22

             9.  More About Directory Problems  . . . . . . . . . . . .  25

            10.  JSB and PSB Mapping  . . . . . . . . . . . . . . . . .  27

            11.  Breakpointing Multi-User Code  . . . . . . . . . . . .  30

            12.  Using Address Break to Debug the Monitor . . . . . . .  32

            13.  Recovering from System Disasters . . . . . . . . . . .  35

            14.  Looking at Hung Tapes  . . . . . . . . . . . . . . . .  41

            15.  A Look at Some of the Disk Stuff . . . . . . . . . . .  45

            16.  Disk Features of FILDDT  . . . . . . . . . . . . . . .  49

            17.  Supported Disk Drive Parameters  . . . . . . . . . . .  51

            18.  Supported Tape Drive Parameters  . . . . . . . . . . .  52

            19.  TOPS-20 Scheduler Test Routines  . . . . . . . . . . .  53

            20.  TOPS-20 Page Zero Locations  . . . . . . . . . . . . .  62

            21.  TOPS-20 Monitor Sections . . . . . . . . . . . . . . .  67

            22.  TOPS-20 Monitor PSECTs   . . . . . . . . . . . . . . .  68

            23.  TOPS-20 Monitor Universals . . . . . . . . . . . . . .  69

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                 Page 4
        Table of Contents


            24.  TOPS-20 Job Zero Forks   . . . . . . . . . . . . . . .  70

            25.  Known Hardware Deficiencies List . . . . . . . . . . .  71

            26.  KS10 Console Information . . . . . . . . . . . . . . .  74

            27.  BOOT Command String Functionality  . . . . . . . . . .  82

            28.  Crash Analysis Fundamentals  . . . . . . . . . . . . .  84

            29.  More Crash Analysis  . . . . . . . . . . . . . . . . . 103

            30.  Referencing the CST Entries under Release 6  . . . . . 122

            31.  The BUG Macro  . . . . . . . . . . . . . . . . . . . . 123

            32.  Monitor Building Hints . . . . . . . . . . . . . . . . 125

            33.  EXEC Debugging . . . . . . . . . . . . . . . . . . . . 130

            34.  Recovering from a Bad EXEC . . . . . . . . . . . . . . 136

            35.  Debugging the GALAXY System  . . . . . . . . . . . . . 137

            36.  Debugging MOUNTR . . . . . . . . . . . . . . . . . . . 151

            37.  Debugging PA1050 . . . . . . . . . . . . . . . . . . . 154

            38.  Copying Floppy Disks . . . . . . . . . . . . . . . . . 155

            39.  The SWSKIT Documentation Files . . . . . . . . . . . . 157

            40.  The SWSKIT Tools Programs  . . . . . . . . . . . . . . 160

            41.  Index  . . . . . . . . . . . . . . . . . . . . . . . . 162

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                 Page 5
        Policy Statement


                       LEGAL POLICY CONCERNING THE TOPS-20 SWSKIT
                       ------------------------------------------




             There is  great  confusion  concerning the materials that  make  up
        the  SWSKIT  tape, and their legal standing.  This memo is an attempt to
        clear up some of those problems.

             The SWSKITs are made up of an assortment of materials  intended  to
        increase  the effectiveness of the software specialist.  These materials
        include program sources not normally distributed or sold for a  premium;
        internal  and  company  confidential documentation, which may be in part
        incomplete or actually incorrect, but supplied for the information value
        on  subsystems  which may be insufficiently documented through the usual
        channels;  documentation  for  specialists  specially  produced  by  the
        corporate  support people;  and utility programs produced and maintained
        to some extent by  corporate  support.   In  addition,  the  SWSKIT  may
        contain  special  or pre-release versions of supported software provided
        for the incremental value a specialist  may  obtain  from  the  software
        under  controlled circumstances.  In time, utilities from the SWSKIT may
        evolve into supported or generally  distributed  products  (for  example
        FILDDT, SYSDPY, REV, CHANS, MONRD, UNITS, etc.).

             All of the SWSKIT materials are proprietary to  DIGITAL,  and  were
        never  intended  to  be  just  given  to  the  customer.  Obviously, the
        materials which are otherwise  sold  cannot  be  given  away;   and  the
        company confidential materials should not be.  While it is expected that
        the tools programs may wind up being used at customer sites, neither are
        they gifts to the customer.  An effort must be made to protect DIGITAL's
        rights to these proprietary materials.  For instance,  a  PL90  contract
        retains  rights  to  all materials provided to the customer.  Deleting a
        tool program after use at a customer site indicates that  intent.  There
        should  be  an awareness that if a customer incurs damages due to use of
        some program given to him by  the  specialist,  even  though  improperly
        used, then DIGITAL may be seen to be at least in part responsible.  This
        should be avoided.

             In  summary,  the  SWSKIT  is  a  tool  provided  to  increase  the
        effectiveness  of  the  specialist,  especially  with regard to PL90 and
        debugging activity, but the rights to all materials remain with  DIGITAL
        and the specialist should act accordingly.

             THIS IS NOT A LEGAL DEPARTMENT DOCUMENT.  CONSULT LEGAL IF YOU HAVE
        ANY DEFINITE PROBLEMS REQUIRING RESOLUTION.

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                 Page 6
        Producing a Good SPR


                                  PRODUCING A GOOD SPR
                                  --------------------




                   A software specialist is  often  asked  to  assist  with  the
              submission  of  SPRs for a customer.  It is always discouraging to
              have  problems  getting  an  answer  to  an   SPR   for   entirely
              non-technical  reasons.  For that reason, below are some hints for
              producing a "good" SPR which will  help  in  getting  the  problem
              solved more quickly.



              1.0  THE SPR FORM

              Much of the data on the SPR  form  is  unimportant,  until  it  is
              omitted.  The product data line is one example.  Try to isolate the
              problem to the correct component, since that  will  determine  who
              first  receives  the  SPR.  This will save the time it takes for,
              say, the COBOL maintainer to determine that  the  problem  is  not
              really  in  COBOL,  but  in PA1050 or the monitor, and the time it
              takes for the next maintainer to become familiar with the problem.
              Something  which  crashes  the system is ALWAYS a monitor problem,
              even if it is an EXEC command which causes the problem, or a short
              BASIC program.

                   If you really have a problem, be sure to mark  the  "problem"
              box,  and  don't  use  words  like  "we  suggest  you  correct the
              following situation...".  If the people who  handle  the  incoming
              paperwork  think  they  have  a  suggestion,  it  may  get  routed
              elsewhere, and never seen by the appropriate maintainers.   A  few
              problems have been greatly delayed this way.

                   The priority boxes are not super-critical, but if you have  a
              problem  which  is  holding  up production, or crashing the system
              several times a day, try to make a note of that somewhere  in  the
              description  of  the problem and mark the high-priority box.  That
              should let the maintainer know that  a  work-around  may  also  be
              appropriate in the short term.  Customer-marked high priority SPRs
              are generally the first priority for answering.

                   The phone number of the submitter could be important  if  the
              problem  is  of  such a nature that it proves not-reproducible, or
              the  complexity  is  such  that  further clarification   just   to
              understand  the  problem  might  be needed.  Your number here as a
              software specialist provides a more informal contact  than  direct
              maintainer-to-customer  confrontation,  although the customer will
              be contacted directly if that is most expedient.

                   The attachments--be sure to mark some of these boxes  if  you
              send  along  supporting  materials.  Since these can get separated
              from the form, this will help keep them from  getting  permanently

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                 Page 7
        Producing a Good SPR


              lost.

                   The "DO NOT PUBLISH" box is for security problems and ways to
              crash   the   system.    We   double-check  this  during  incoming
              processing, but if the box is checked you can be sure that the SPR
              will not be published unanswered.

                   Describe the problem as clearly  as  possible  in  the  space
              provided.   Try  to  provide enough detail to easily reproduce the
              problem.  Concentrate on the description of the problem,  and  any
              diagnosis  you  may  have made.  Attempting to declare a "cure" is
              not always a good idea because the actual correction may be of  an
              entirely  different  nature  for a number of reasons.  However, if
              you have something that works, the information could  be  of  use.
              Just  don't  count  on that exact change being the actual fix.  If
              the problem  is  not  reproducible  from  the  description  given,
              chances  are  that  something  you  left  out  is  relevant to the
              problem.  Unless the problem directly concerns them,  things  like
              logical  names,  mounted  structures,  and  other  features  often
              obscure the problem.  For the purpose of the problem  description,
              a terminal listing of an occurrence is often highly desirable, and
              it is sometimes a  good  idea  to  create  a  brand-new  directory
              without  any  fancy  LOGIN.CMD  setups or user groups and so on to
              demonstrate the problem.



              2.0  THE SUPPORTING MATERIALS

                   As above, the listing from a terminal session is often a very
              good  attachment.   Try  to  include all the relevant information.
              Again, sometimes things like logical  names,  file  and  directory
              protections,  user  groups,  and  other  job-state  variables  are
              important and should be  included.   Inclusion  of  data  such  as
              program  version  numbers  and  edit  levels  can be essential for
              products with large numbers of edits.  If you are  complaining  of
              monitor problems, which patches you have installed could be useful
              information.  Terminal sessions should be as  clear  as  possible.
              It  should be made obvious just what is going on or the maintainer
              may just see a series of commands and think "So?".  Concurrent  or
              after-the-fact commenting is one way to accomplish this.

                   Many times there  is  a  program  which  exercises  the  bug.
              Sometimes these programs are all right as they are, but often they
              are giant COBOL monsters working on a multi-RP06  data  base,  and
              very  unwieldy  for  a  maintainer  to  try  to work with.  If the
              program can be reduced to a small subset,  do  so.   Many  monitor
              problems  turn  out  to  be reproducible from a set of arguments
              to a single JSYS.  If it is a question of  incorrect  output  from
              some  program, it is helpful to send along all the files needed to
              reproduce the problem, and the files of incorrect output.  In  the
              case  of  programs with multiple edits to field-image, this speeds
              up the maintainer, since he does not have to manually apply  those
              edits  to attempt to recreate your versions, and he can also check

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                 Page 8
        Producing a Good SPR


              the installation of the edits, if that  is  appropriate.   And  in
              case  the  problem  proves  to  be not easily reproducible the bad
              output can at least be examined for clues.

                   In the case of a monitor crash, the  problem  may  have  been
              reduced  to  a  program  of less than one page.  It is tempting to
              type this on the front of the SPR and send it in that way.   While
              the  maintainer can type in the program easily enough (if the copy
              is  both  legible  and  correct),  the  submitter  has  been  lax.
              Sometimes,  that short program will not cause a crash, even though
              run thousands of times under varying conditions by the maintainer.
              And  even  when  it  does  cause  the  crash  the  first time, the
              submitter has lengthened the turn-around by not sending  the  dump
              from  the  crash along with the SPR.  Sending the dump solves both
              problems.  If the problem is not reproducible with ease, the  dump
              is  VITAL  to further understanding.  And having the dump to start
              with speeds up the work of the maintainer who now does not need to
              schedule stand-alone time to try to exercise the bug and cause a crash
              so he has a dump to look at.

                   When sending a dump, always send the unrun monitor along with
              it.   If  you  don't, you are just causing a delay in handling the
              problem while the maintainer tries it against the  standard  ones,
              which  involves  finding tapes with the standard ones, and loading
              them...  If you are running an unpatched standard monitor, and you
              refuse  to send it, at least tell which one it is somewhere on the
              form.  The unrun monitor is also useful for checking the existence
              and correct installation of patches when that becomes an issue.

                   The current preferred tape format is 9-track, 1600 bpi, and in
              standard  DUMPER  format,  not  in  INTERCHANGE format, since file
              information can be lost that way.  Take the time to get a  listing
              of  a directory of the tape and include it with the tape.  It will
              help to speed things up, since if it is obvious from the directory
              that something is missing, faster feedback is generated.  There is
              also the indication that the tape will  indeed  be  readable  when
              received,  and  will  partly eliminate the usual first step of the
              maintainer in getting a directory of the tape.

                   As a final word, remember  that  the  SPR  is  now  the  ONLY
              official  mechanism  to  get  software  problems  resolved  in the
              development code for Autopatch  and  future  versions.   NO  other
              method  is guaranteed to work.  So be sure an SPR is generated for
              every problem, preferably by the customer;  and be  sure  the  SPR
              does not make the problem harder to solve.

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                 Page 9
        Using SIRUS


                                      USING SIRUS
                                      -----------

             Did you know that you can dial into a Marlboro  development  system
        and  type  out almost any patch that the Marlboro Support Group has made
        to -10 or -20 software in the last several  years?   The  program  which
        does this is called SIRUS, and with it you can:

             1.  Search through all the patches to a particular product, if  you
                 know a problem exists but don't know what the patch is or don't
                 know if we've heard of the problem.  If you find the patch  you
                 want, you can then type it out.

             2.  Type out a particular patch to a  particular  product,  if  you
                 know what the edit number is.

             3.  Obtain the status of any SPR, including the entire answer if it
                 has been answered.


             By using SIRUS, you can get patches whenever the system is up, even
        if it's two A.M. and the Hotline is closed.  You can print  patches  in
        your local office without having to wait for a specialist in Marlboro to
        mail  you  a  copy.  You can be sure that the patch you have is correct.
        (Dictating patches over the Hotline is very prone to  errors.)  Even  if
        the  problem you are experiencing cannot be found in SIRUS, you can help
        us when you call by so stating.  We immediately know  that  the  problem
        you are having is a new one.

             There have been several articles  about  SIRUS  in  previous  Large
        Buffers,  but  none have been oriented towards specialists in the field.
        This one is!

             To use SIRUS, dial into system CHERRY in Marlboro, log in, and then
        run it.  In more detail:

             1.  Dial into system CHERRY.  The following number will connect you
                 to the machines in Marlboro at 300 or 1200 baud.

                                    297-1550  (DTN)
                               (617)467-1550

                 You will now be talking to  a  MICOM  data  switch  which  will
                 autobaud  your  input  if  you  type carriage returns.  It will
                 then prompt for a  system  to  connect  to.   You  should  type
                 "CHERRY"  followed  by a return.  Once the machine notices you,
                 type "SET HOST CHERRY" to ensure  that  you  are  connected  to
                 system  CHERRY.   If  you  get  the message "?Undefined Network
                 Node", the machine is down (try again later).

             2.  To log in, type "LOGIN 37,#".  When the machine requests a name,
                 type one in.  You WILL need a password, which you can obtain by
                 calling the Hotline operator.

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 10
        Using SIRUS


             3.  To run SIRUS, just type "R SIRUS".  SIRUS takes several seconds
                 to  initialize itself and then prompts you with "PRODUCT [H]*".
                 At this point, type either "10<CRLF>" or  "20<CRLF>"  depending
                 on whether the customer of concern is running TOPS10 or TOPS20.
                 SIRUS then prompts you with "[H] *".   You  are  now  at  SIRUS
                 command level.


             SIRUS has many commands, but only a few  are  of  interest  to  the
        field specialist.  They are:

             1.  H -- for Help.  This may be typed anytime  SIRUS  precedes  its
                 prompt with "[H]".

             2.  EX -- for Exit.  Use this to exit  SIRUS.   Then  type  K/N  to
                 log out, and hang up.

             3.  PP -- for Peruse PCOs.  PCO stands for Product Change Order and
                 essentially  means  a  patch.   This  command  is  used to look
                 through patches for a particular product  if  you  aren't  sure
                 which patch you want.

             4.  GP -- for Get PCO.  This is used to type out a particular patch
                 once you know which one you want.

             5.  GS -- for Get SPR.  Use  this  to  retrieve  information  on  a
                 particular SPR.

             6.  NP -- for New Product.  Use this command if you type the  wrong
                 answer  to  "PRODUCT  [H]*"  as  mentioned  above, or use it in
                 association with the PP command as described below.  SIRUS will
                 prompt you for a product again.


             The three most useful of these commands are PP, GP, and GS.



        3.0  PP Command

             Use this command to peruse the patches for a particular product  --
        e.g., LINK or 603 (monitor)  or  BATCON  --  if  you  want  to  find  a
        particular patch you know exists, or if you want to know if the  support
        group  has  heard  of and fixed some problem you are experiencing with a
        product.  After you type "PP<CRLF>" SIRUS will prompt for  a  component.
        Here  type the program you're interested in -- LINK, BATCON or whatever.
        A response of LIST will type the programs SIRUS  knows  about  and  then
        prompt you for a component again.

             Once you type in the component, SIRUS  prompts  with  "[H] PCO #:".
        There are two reasonable responses to this.  The first is ALL.  (Type NO
        to the subsequent question about a file.) This will  give  you  a  short
        summary  of  all  the  patches  available for this product, one line per
        patch.  This includes a PCO number, the SPR for  which  this  patch  was

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 11
        Using SIRUS


        written,  the  edit  number  corresponding  to the patch (for the TOPS10
        monitor this is the MCO number),  a  keyword  describing  the  bug,  the
        maintainer  who  wrote  the  patch, and the date it was made.  The other
        response you might type here is simply <CRLF>.  In this case SIRUS  will
        type  out  the  symptom  of  the  newest  PCO,  and then prompt you with
        "NEXT?".  By continuing to type carriage returns, you can type  all  the
        symptoms  of  all  the  patches for this product, from the newest to the
        oldest.  When you have found  the  patch  you  want  (remember  the  PCO
        number), type RETURN to get back to SIRUS command level.

             If you did not find your symptom while perusing, and  your  product
        exists  on  both  TOPS10 and TOPS20, you should also search the PCOs for
        the alternate operating system.  To do this, type NP  to  SIRUS  command
        level, and then type in the other product number when SIRUS asks for it.
        Then peruse PCOs for your product as you did before.
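
             The one-line summaries printed in response to ALL  are  regular
        enough to be picked apart by a short program once a session has been
        captured to a file.  The following is only  a  sketch  in  Python:
        the column layout, the regular expression, and the names SUMMARY and
        parse_summary are assumptions inferred from a sample listing, not
        taken from any SIRUS specification.  In particular, reading the
        parenthesized value such as "(6,025)" as edit 6025 is a guess.

```python
import re

# Assumed layout, inferred from a sample listing rather than from any
# SIRUS specification:
#   PCO nnn SPR [number] [routine/version] (edit) KEY= keyword maintainer date
# The keyword is taken to occupy a fixed 10-column field.
SUMMARY = re.compile(
    r"PCO (?P<pco>\d{3}) SPR\s+(?P<spr>\d+)?\s*(?P<routine>.*?)\s*"
    r"\((?P<edit>[\d,]+)\) KEY= (?P<key>.{10}) (?P<who>\S+)\s+"
    r"(?P<date>\d{2}-[A-Z]{3}-\d{2})"
)


def parse_summary(line):
    """Return the fields of one summary line as a dict, or None if no match."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["key"] = fields["key"].strip()
    # "(6,025)" is read as edit 6025; that interpretation is an assumption.
    fields["edit"] = int(fields["edit"].replace(",", ""))
    return fields


sample = "PCO 008 SPR 12881  OUTE.6 103 (6,025) KEY= REQUEUE    NEFF       17-APR-79"
print(parse_summary(sample))
```

        A line with no SPR number yields None for the "spr" field, and any
        line that does not fit the assumed layout yields None overall, so
        unrecognized output is skipped rather than misread.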



        4.0  GP Command

             This is used to print out a patch once you  know  the  PCO  number.
        The PCO number is printed while you are perusing PCOs and is of the form
        10-product-nnn or 20-product-nnn.  After  typing  GP  to  SIRUS  command
        level,  SIRUS  prompts  for a PCO number.  The leading "10-" or "20-" is
        supplied by SIRUS, so your response should be of the form "product-nnn".
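
             For anyone keeping notes on PCO numbers outside of  SIRUS,  the
        relationship between the short response and the full PCO number can
        be sketched as below.  This is an illustration  in  Python,  not  a
        SIRUS feature:  the function name full_pco_number is hypothetical,
        and padding the sequence number to three digits is an  assumption
        based on how PCO numbers appear in a sample session.

```python
import re


def full_pco_number(response, system):
    """Expand a GP response such as "D60SPL-8" into "20-D60SPL-008".

    `system` is "10" or "20".  A leading "10-" or "20-" typed by the user
    is tolerated and dropped in favor of `system`.  Zero-padding to three
    digits is an assumption based on a sample session.
    """
    if system not in ("10", "20"):
        raise ValueError("system must be '10' or '20'")
    m = re.fullmatch(
        r"(?:(?:10|20)-)?([A-Z0-9.]+)-(\d{1,3})", response.strip().upper()
    )
    if m is None:
        raise ValueError("expected product-nnn, got %r" % response)
    product, number = m.groups()
    return "%s-%s-%03d" % (system, product, int(number))


print(full_pco_number("D60SPL-8", "20"))  # prints 20-D60SPL-008
```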

             In response, SIRUS types out information about the patch.  The  two
        most  useful  data are labeled VLD and SAE.  VLD stands for validity and
        is the version of the software to  which  the  patch  applies.   SAE  is
        Source  After  Edit  and is the edit or MCO number of the patch.  To get
        the actual text of the patch, respond  YES  to  SIRUS's  question  "Show
        Write-up File?".



        5.0  GS Command

             This is used to get the status of an SPR.  SIRUS will prompt for an
        SPR  number,  and  then  will  provide  you  with info about the SPR you
        specified.   This  includes  the  site  that  submitted  the  SPR,   the
        specialist  responsible  for  the  SPR,  the date received, and the date
        closed, if the SPR has been answered.  If answered,  it  will  also  say
        whether  or  not an auxiliary file was written for the SPR and what PCOs
        (if any) were included.  The aux file is an introductory paragraph which
        is written for most SPR answers.  For SPRs which do not require patches,
        the aux file constitutes the entire answer.  The aux file can  be  typed
        by  responding YES to "SHOW AUXILIARY FILE?".  The PCOs can be typed out
        with the GP command.

             Finally, if SIRUS begins to give you error messages such  as  "File
        not found", EX from SIRUS and mount a special disk pack with the monitor
        command "MOUNT SIRS:".  Then try again.  This gives you access  to  more
        PCOs and aux files than are normally available.

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 12
        Using SIRUS


             For more information, see the example run of SIRUS below, in  which
        user  input  is  shown  underlined, or the article on SIRUS published in
        volume 409 of the Large Buffer.  Finally, SIRUS is for  use  by  DIGITAL
        personnel  only.  DO NOT give out instructions for its use or the system
        CHERRY phone numbers to customers.

        .R SIRUS
         - -----


        SIRUS...3(3)
         
        [WHEN '[H]' APPEARS YOU MAY TYPE 'HELP' FOR ASSISTANCE]
         
         
        PRODUCT [H]* 20
                     --
        [H] *PP
             --
         
        [H] COMPONENT TO PERUSE: D60SPL
                                 ------
        [PCO LIMIT FOR 'D60SPL' IS 15]
        [H] PCO #:<CR>
                  ----
        [20-D60SPL-015]
         
        DATE: 09-JUL-79 BY: BENCE
        VLD: 
         
        [SYMPTOM]




        Jobs sent to the LPT queue from D60SPL are  given  a  random
        file name and are billed to OPERATOR.


         
        NEXT?<CR> 
             ----
        [20-D60SPL-014]
         
        DATE: 09-JUL-79 BY: WEISBACH
        VLD: 
         
        [SYMPTOM]




        If the spooler is pausing, typing a  GO  can  result  in  an
        illegal instruction.

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 13
        Using SIRUS




         
        NEXT? ALL
              ---
        DO YOU WANT A FILE? NO
                            --
        PCO 015 SPR 12355             (6,022) KEY= LNAME      BENCE      09-JUL-79
        PCO 014 SPR 12225  OUTOUT     (6,020) KEY= PAUSE      WEISBACH   09-JUL-79
        PCO 013 SPR 11660  LODVFU 6013(6,014) KEY= VFU        WEISBACH   09-JUL-79
        PCO 012 SPR 13244  D60CRE 103 (6,032) KEY= CARD       L.NEFF     06-JUL-79
        PCO 011 SPR        D60CR4 103 (6,015) KEY= CARDS      L.NEFF     03-JUL-79
        PCO 010 SPR        REQUEU 103 (6,030) KEY= CTQMFQ     L.NEFF     14-JUN-79
        PCO 009 SPR 12588  INTCTC 1   (6,026) KEY= CONTROL C  TEEGARDEN  17-MAY-79
        PCO 008 SPR 12881  OUTE.6 103 (6,025) KEY= REQUEUE    NEFF       17-APR-79
        PCO 007 SPR 12139         103 (6,019) KEY= ILLEGAL    WEISBACH   27-OCT-78
        PCO 006 SPR 12005             (0) KEY= SIMULTANEO BENCE      22-SEP-78
        PCO 005 SPR 11672  ENDJOB 103 (6,018) KEY= QUASAR     BENCE      18-SEP-78
        PCO 004 SPR 11841  D60STK 103 (6,016) KEY= BAD        WEISBACH   23-AUG-78
        PCO 003 SPR 11476  TTYOUT 103 (6,010) KEY= OVERWRITE  WEISBACH   12-MAY-78
        PCO 002 SPR 11431  OUTE.6     (6,007) KEY= INTERRUPTS WEISBACH   12-APR-78
        PCO 001 SPR 11456  D60SPL     (6,006) KEY= BLANK      WEISBACH   03-APR-78
        [H] PCO #: RETURN
                   ------
         
         
        [H] *GP
             --
         
         
        [H] PCO #: 20-D60SPL-8
        [20-D60SPL-008 RETRIEVED]
         
        PROG:   NEFF
        COMPONENT: D60SPL
        SER/SPR:20-12881
        KEYS: REQUEUE    /  
        ROUTNS: OUTE.6 /  
        VLD:    103(2304)
        SBE     %103 (6,024)
        SAE     %103 (6,025)
        CRIT:   N
        DOC:    N 
        F/D:    F
        TEST FILE:     :          [        ]
        P-IND:  10
         
        SHOW WRITE-UP FILE? YES
                            ---
         
         
        [WRITE-UP FILE]
        008             NEFF
        [SYMPTOM]

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 14
        Using SIRUS






             If a job is requeued because of a  communications  failure,  with
        D60SPL  reporting  that  the  station  has  signed off, then, when the
        station signs on again, the print file  will  be  restarted  from  its
        beginning, not from the last checkpoint.


        [DIAGNOSIS]

             When the  error  is  detected,  routine  OUTE.6  calls  IBACK  to
        backspace  the  file  five  pages.   IBACK  zeroes  the  page counter,
        J$RNPP(J), and rewinds the  file,  in  the  belief  that  the  forward
        spacing  code  will  update  the page count as it skips to the correct
        page.  However, D60SPL discovers the error is not recoverable  and  it
        requeues  the job immediately.  Since the page count is never updated,
        DOREQ requeues the job to start at the beginning of the file.


        [CURE]

             Preserve the page at which to resume printing over  the  call  to
        IBACK.  If the job is to be requeued immediately, restore J$RNPP(J) so
        that the job will be requeued and checkpointed five  pages  back  from
        its current position.
        [FILCOM]
        File 1) DSK:D60SPL.MAC[4,1022]  created: 1724 09-Apr-1979
        File 2) DSK:D60SPL.MAC[4,417]   created: 1625 10-Apr-1979

        1)1             LPTEDT==6024                    ;EDIT LEVEL
        1)              LPTWHO==1                       ;WHO LAST PATCHED
        ****
        2)1             LPTEDT==6025                    ;EDIT LEVEL
        2)              LPTWHO==1                       ;WHO LAST PATCHED
        **************
        1)4     ;*****End of Revision History*****
        ****
        2)4     ;6025   If a job printing on a remote printer is interruped by
        2)      ;       a communications failure, requeue to start five pages back
        2)      ;       instead of at beginning of file.  LLN, SPR # 20-12881,
        2)      ;       10-APR-79
        2)      ;*****End of Revision History*****
        **************
        1)179           PUSHJ   P,IBACK                 ;BACKSPACE THE FILE
        1)              PUSHJ   P,INTON                 ;[6007]TURN INTERRUPTS BACK ON
        1)              PUSHJ   P,D60NRY                ;PERFORM "NOT READY" DIALOG
        1)               JRST   OUTE.7                  ;ERROR IS UNRECOVERABLE
        1)              TELL    OPR,[ASCIZ /![LPT...  continueing!]
        ****
        2)179   ;**;[6025] ADD SEVERAL LINES AT OUTE.6 + 13L.  LLN, 10-APR-79
        2)              MOVE    T1,J$RNPP(J)            ;[6025] CALCULATE THE NEW
        2)              SUB     T1,N                    ;[6025]  DESTINATION PAGE
        2)              PUSH    P,T1                    ;[6025]  AND SAVE IT
        2)              PUSHJ   P,IBACK                 ;BACKSPACE THE FILE
        2)              PUSHJ   P,INTON                 ;[6007]TURN INTERRUPTS BACK ON
        2)              PUSHJ   P,D60NRY                ;PERFORM "NOT READY" DIALOG
        2)               JRST   [POP    P,J$RNPP(J)     ;[6025] RESTORE PAGE NO. FOR REQUEUE
        2)                       JRST   OUTE.7]         ;[6025] ERROR IS UNRECOVERABLE
        2)              POP     P,(P)                   ;[6025] THROW AWAY DESTINATION
        2)                                              ;[6025] PAGE - FORWARD SPACING
        2)                                              ;[6025] CODE WILL HANDLE IT
        2)              TELL    OPR,[ASCIZ /![LPT...  continueing!]
        **************
        [END OF WRITE-UP FILE]
         
         
        [H] *EX
             --

        EXIT

        .
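
        The diagnosis and cure in the write-up retrieved above can be restated
        as a small model.  The Python sketch below is not D60SPL code:  the
        names Job, iback, and the two requeue functions are invented for
        illustration, the attribute rnpp stands in for J$RNPP(J), and the
        backspace count of five pages is taken from the write-up text.

```python
BACKSPACE_PAGES = 5  # the write-up says the file is backspaced five pages


class Job:
    """Hypothetical stand-in for the per-job data; rnpp models J$RNPP(J)."""

    def __init__(self, current_page):
        self.rnpp = current_page


def iback(job):
    """Model of IBACK as described: rewind and zero the page counter."""
    job.rnpp = 0


def requeue_page_before_6025(job, recoverable):
    """Page a requeue would start from before edit 6025."""
    iback(job)
    if not recoverable:
        return job.rnpp  # never updated, so the job restarts at page 0
    return None  # recoverable: the forward spacing code takes over


def requeue_page_after_6025(job, recoverable):
    """Page a requeue would start from with edit 6025 applied."""
    saved = job.rnpp - BACKSPACE_PAGES  # destination page, kept across IBACK
    iback(job)
    if not recoverable:
        job.rnpp = saved  # restore the page number before requeueing
        return job.rnpp
    return None  # saved value is discarded; forward spacing handles it


print(requeue_page_before_6025(Job(42), recoverable=False))  # 0
print(requeue_page_after_6025(Job(42), recoverable=False))   # 37
```

        With a job interrupted on page 42, the unpatched path requeues from
        page 0 and the patched path from page 37.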

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 16
        DDT Patching the TOPS-20 Monitor





                            DDT PATCHING THE TOPS-20 MONITOR
                            --------------------------------




             This article discusses how DDT patches are made to TOPS-20.

             From time to time the Marlboro Support Group has  to  describe  and
        explain  the DDT patching of TOPS-20 to Specialists from the field.  The
        following is an explanation, if not a justification,  of  the  way  some
        things are done.

             A DDT patch to TOPS-20 as published is, in essence, a terminal  log
        of a session applying the patch by hand.  This differs from the sometime
        practice of a control file containing only the typein to DDT.   The  raw
        typein  has  a few disadvantages with respect to the log:  It is hard to
        display in a publication format like  the  Software  Dispatch  the  bare
        control  characters like linefeeds and tabs that might be used, and even
        harder to edit around them with the  only  currently  supported  editor,
        EDIT.   In addition, the full typescript allows some confidence building
        (or cause for concern) if the DDT typeout from application of the  patch
        is  (is  not)  the  same  as  the typescript.  The published patch IS an
        actual typescript, and is  "proof"  that  the  patch  CAN  be  correctly
        installed.

             In applying  the  patch,  the  basic  methodology,  lacking  innate
        knowledge,  is  to  just  start  typing from the typescript whenever the
        computer goes into input wait.  Any "$" appearing in a DDT session which
        is  not  the prompt from the enabled EXEC should be the result of typing
        an ESCAPE.  (ESCAPE is sometimes referred to  as  ALTMODE  or  ALT.)  In
        order to avoid confusion, we try never to use any dollar sign symbols,
        and we make special note of any that do occur.

             Starting at the top of a session, there are usually a few  comments
        about  the  patch.   If  we  are currently patching multiple releases of
        TOPS-20, the specific release for the patch should be noted here.   Also
        noted  should  be any hardware or monitor dependencies:  KS- or KL-only,
        or 2040, 2060, or ARPA only, etc.

             The first monitor command is an ENABLE, followed by a  GET  of  the
        monitor  file  to be patched.  Unless we are patching an existing patch,
        our published patches always show us patching a "virgin"  monitor  file,
        one  without  any previous patches installed.  You should always be able
        to duplicate the patch typescript yourself on an unpatched monitor.

             At this point we do a START 140 command to get into DDT.  There  is
        a  fine distinction at this step between typing START 140 and typing DDT
        to get into DDT.  START 140 starts up EDDT (Exec-mode  DDT)  running  in
        user  mode,  which is the required action.  Typing DDT to the EXEC would
        merge  SYS:UDDT.EXE  with  the  monitor  EXE  file  and  start  up  UDDT
        (User-mode  DDT), which is not what we want.  In fact, with Release 4 of
        TOPS-20 the EXEC is clever enough to start up EDDT for  us  on  the  DDT
        command  also,  but  even  so, for the sake of consistency, and to avoid
        confusion, published patches should still use START 140.

             After entering DDT, it is common to select the local  symbol  table
        for  the  module  to  be  patched  in  case  there might be local symbol
        conflicts, etc.  This is done using the  MODULE-NAME$:   (ESCAPE  colon)
        construct.

             Next follows the body of the patch.  We purposely avoid the fancier
        DDT  commands when applying patches in order to avoid confusion.  We try
        to limit ourselves to a few DDT commands:

                ADDRESS/ (slash)
                                to open the location at ADDRESS
                ADDRESS[ (open-square-bracket)
                                similar to /, but with numeric rather than
                                symbolic typeout
                RETURN          to close the current location, storing any new
                                value specified
                LINE-FEED       to close the current location, storing any new
                                value specified, and open the next location
                TAB             a convenience command used to close the current
                                location and open the location specified by the
                                last reference; commonly used to get to and
                                open location FFF immediately after inserting a
                                JRST FFF instruction in the code
                SYMBOL: (colon) to define a symbol at the current location;
                                usually to redefine FFF: further down in the
                                patch space
                FFF$< (ESCAPE open-angle-bracket) or
                FFF$$< (ESCAPE ESCAPE open-angle-bracket)
                                to start a patch in the patch area named FFF
                $> (ESCAPE close-angle-bracket)
                                to terminate a patch, which installs the jumps
                                back to the inline code, redefines the FFF
                                symbol value past the used patch space, and then
                                inserts the initial jump to the patch into the
                                inline code

        Those who apply patches are of course free to use the more sophisticated
        DDT commands to achieve the same effect.

             A few TOPS-20 peculiarities  should  be  explained  here.   TOPS-20
        patches  are  applied  using  the FFF patch area.  The default DDT patch
        area symbol, PAT.., (used if no argument  is  given  to  an  $<  or  $$<
        command)  should  NEVER  be  used.   You  are apt to wind up with system
        crashes since the PAT..  area may not be locked down.  FFF is defined in
        the  module  STG.MAC  (which goes to the customers), and the area is 100
        octal words long for  version  4.1  and  defined  by  the  user-settable
        parameter  FFFSZE in version 6.0 (currently has value 400).  FFF is part
        of the resident monitor code PSECT RSCOD for v4.1  and  the  data  PSECT
        RSDAT  for  v6.0,  and  is always in memory.  Special care must be taken
        when installing patches not to overrun the patch area, which could  also
        result  in system crashes.  The first symbol past the FFF area is DTSCNW
        for v4.1, and SVDTRJ for v6.0.  If that symbol shows up while attempting
        to install a patch, you may be in trouble.
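
        One way to stay clear of the end of the patch area is to count words
        as patches go in.  The Python sketch below is only bookkeeping
        arithmetic, not a DDT feature.  The function names and the sample word
        counts are hypothetical; the sizes are the ones quoted above, with the
        FFFSZE value of 400 assumed to be octal like the 4.1 figure.

```python
PATCH_AREA_WORDS = {
    "4.1": 0o100,  # fixed size quoted in the text
    "6.0": 0o400,  # current FFFSZE value quoted in the text, assumed octal
}


def words_left(release, words_used):
    """Words still free in the FFF area after words_used have been consumed."""
    return PATCH_AREA_WORDS[release] - words_used


def patch_fits(release, words_used, patch_words):
    """True if a patch of patch_words more words stays inside the FFF area."""
    return patch_words <= words_left(release, words_used)


print(oct(words_left("4.1", 0o6)))     # 0o72
print(patch_fits("4.1", 0o6, 0o70))    # True
print(patch_fits("4.1", 0o6, 0o73))    # False
```

        Remember that an $< patch uses more words than are typed in:  DDT adds
        the displaced instruction and the return jumps when $> is given.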



                                          NOTE

                       For Release 5  and  5.1  of  TOPS-20,  the
                       patch  area  was  moved  and  is no longer
                       found in STG, but in POSTLD, at the end of
                       the  RSDAT  PSECT, and so requires changes
                       to the LINK CCL file to expand  the  area.
                       Care  should  be  taken  so  that the next
                       PSECT is not overlapped with patches.  The
                       space   reserved   is   now   400  (octal)
                       locations.



             There is another patch space defined in TOPS-20,  called  SWPF,  in
        the  swappable  portion of the monitor.  We always use FFF in preference
        to SWPF since first, SWPF can only be  used  for  patches  to  swappable
        code,  but  FFF will work for either.  Second, two patch areas in common
        use might be confusing to the customers, specialists, and us.  Third, if
        we  get  a  dump to examine from a customer, we can always check the FFF
        area for possible (bad) patch installation.  SWPF might be swapped  out,
        and not in the dump.

             Unconventionally enough, the symbols FFF, FFF1, and  FFF2  are  all
        defined together in STG.MAC with the same value.  When DDT decides which
        to type out when printing the symbolic form of an address, it finds FFF2
        first,  which accounts for the common appearance of FFF2 in patches.  In
        addition, just the symbol FFF is  redefined  on  patch  installation  to
        always  point  to the first free word of the remaining patch area.  FFF1
        and FFF2 are  never  redefined,  and  so  should  always  point  to  the
        beginning of the initial patch area built into the monitor.  FFF2 should
        never be explicitly referenced as typeIN to DDT; any occurrence in a
        patch should be understood to be DDT typeOUT, probably from a DDT
        LINE-FEED command.  A common source of error in applying patches is
        writing over an earlier patch area by typing in the FFF2-based symbols.
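
        The difference between FFF and FFF2 can be seen in a toy model.  The
        Python sketch below is an illustration only:  the class PatchArea, its
        methods, and the base address are made up, and the model captures just
        the rule stated above, namely that only FFF is redefined when a patch
        is installed.

```python
class PatchArea:
    """Toy model of the FFF, FFF1 and FFF2 symbols; base is a made-up address."""

    def __init__(self, base):
        self.symbols = {"FFF": base, "FFF1": base, "FFF2": base}

    def install(self, words):
        """Consume words of patch space; only FFF is redefined."""
        start = self.symbols["FFF"]
        self.symbols["FFF"] = start + words
        return start

    def overwrites_earlier_patch(self, symbol, offset=0):
        """True if typing symbol+offset as an address lands in used space."""
        address = self.symbols[symbol] + offset
        return self.symbols["FFF2"] <= address < self.symbols["FFF"]


area = PatchArea(base=0o34000)  # hypothetical address of the patch area
area.install(7)                 # an earlier patch used seven words
print(area.overwrites_earlier_patch("FFF"))      # False
print(area.overwrites_earlier_patch("FFF2", 1))  # True
```

        Typing FFF opens the first free word, while typing FFF2+1 copied from
        an earlier typescript opens a word that already holds patch code.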

             Normally,  in  a  DDT  patch,  lines  which  follow   one   another
        immediately in the published patch are the result of typing LINE-FEED at
        the end of the line, and not RETURN and the next address  symbol.   When
        the  $<  and  $$<  commands  are  used, all lines from that point to the
        terminating $> command should have  been  ended  with  LINE-FEED,  using
        successive locations in the patch space.  The patches should show breaks
        in this form by inserting extra blank lines in the  published  patch  to
        indicate a new "sub-section" of the patch.

             The patching session is ended by the ^Z (Control-Z) command to exit
        DDT properly.  The Control-Z command is the correct way to exit from DDT
        when applying patches.  It allows DDT to do any  final  cleanup  it  may
        need  to  do.   Exiting  via  Control-C  is NOT recommended when you are
        installing patches, and is NOT guaranteed to work.

             Finally, the patched monitor is saved away on  a  disk  file.   The
        published  typescript  shows  creating  a  new  generation of the system
        MONITR.EXE file, but a more conservative approach is to save the patched
        monitor  as  some  other  name, and try running it experimentally during
        system time before installing it as the default monitor.



             And now for an annotated example:

        @
        @! PATCH TO RELEASE 3 AND 3A MONITORS TO CORRECT ENQ FROM
        @! APPENDING A REQUEST TO THE WRONG LOCK BLOCK WHEN A STRING
        @! AND USER CODE HAPPEN TO HASH TO THE SAME ADDRESS.
        @! THE MAGIC NUMBER AT XXX: IS POINT 3,T2,2
        @
        @ENABLE (CAPABILITIES)          !Appropriate releases noted above.
        $GET SYSTEM:MONITR              !Get the monitor
        $START 140                      !Enter user mode EDDT
        DDT

        ENQ$:                           !Open the symbol table for the module

        FFF/   0   XXX:   410300,,T2    !Store into the patch area and define
        FFF2+1/   0   FFF:              ! label XXX: to point to it; redefine
                                        ! FFF to be the new first unused word
        STRCMP+5/   MOVE T3,T2   FFF$<  !Begin an $< patch at FFF
        FFF/   0   LDB T3,XXX           !This line and the next are ended by
        FFF+1/   0   CAIN T3,5          ! LINE-FEEDs
        FFF+2/   0   RET$>              !Terminate the patch
        FFF+3/   MOVE T3,T2             !These 4 lines are typed out by DDT on
        FFF+4/   JUMPA T1,STRCMP+6      ! terminating the patch
        FFF+5/   JUMPA T2,STRCMP+7
        STRCMP+5/   JUMPA FFF2+1        !And another blank line indicating end
                                        ! of this sub-patch region
        ^Z                              !Control-Z to exit DDT properly
        $SAVE SYSTEM:MONITR             !Save away the patched monitor
         <SYSTEM>MONITR.EXE.2 Saved
        $
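
        The "magic number" stored at XXX in the example is the one-word byte
        pointer that POINT 3,T2,2 assembles to.  As a check on the arithmetic,
        the Python sketch below packs the fields of a PDP-10 one-word byte
        pointer:  P in bits 0-5, S in bits 6-11, and the address Y in bits
        18-35, with the indirect and index fields left zero.  The function name
        is hypothetical, and T2 is assumed to be accumulator 2.

```python
def point(size, address, rightmost_bit):
    """Return (left half, right half) of a one-word byte pointer.

    Mirrors POINT size,address,rightmost_bit with no indexing or indirection.
    """
    p = 35 - rightmost_bit            # bits remaining to the right of the byte
    left = (p << 12) | (size << 6)    # P in bits 0-5, S in bits 6-11
    right = address & 0o777777        # Y, the 18-bit address field
    return left, right


T2 = 2  # assumed accumulator number for T2

left, right = point(3, T2, 2)
print(f"{left:o},,{right:o}")  # 410300,,2
```

        The left half comes out as 410300, matching the value typed into the
        patch area.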

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 20
        Mapping Directories in MDDT




                          MAPPING DIRECTORIES IN MDDT
                          ---------------------------




             Release 3 and later of TOPS-20 can take advantage of  the  extended
        addressing features of the model B processor.  Some of the data has been
        reorganized and moved into non-zero sections of  the  addressing  space.
        One  of  the  things  moved was directories.  Directories are now mapped
        into section 2, starting at the beginning of the section.  Thus the  old
        procedure  of  reading  a  user's  directory in MDDT is no longer valid.
        This article describes how to map a directory correctly for Release 3
        and later.

             You first have to find  out  the  structure  number  and  directory
        number  for  the directory to be mapped.  You can use the TRANSL command
        to get the directory number, or use the  ^EPRINT  command  to  list  the
        directory  information.   As  an  example,  suppose you want to find the
        directory and structure information  for  the  directory  SNARK:<CURDS>.
        You use TRANSL and obtain the results:

                @TRANSLATE (DIRECTORY) SNARK:<CURDS>
                SNARK:<CURDS> (IS) SNARK:[4,117]

        The "programmer number" obtained is the directory number, in octal.   In
        this  example,  the directory number is 117.  If the directory is in bad
        shape, and you can't run TRANSL or use ^EPRINT, you will  have  to  find
        out the directory number by looking at the output from a DLUSER or ULIST
        run, or from BUGCHK output.

             To find the structure number, you have  to  work  harder.   If  the
        structure  is  mounted  as  PS:,  its structure number is always 0.  For
        structures mounted other than PS:, you do the following.  You  get  into
        MDDT,  and  look  at  the  table STRTAB.  This table contains all of the
        addresses of the structure data blocks in the system.  The first word of
        each  structure  data  block  is  the  structure name in SIXBIT.  So you
        search the table looking for the desired structure.  The offset into
        the table STRTAB is then the structure number.  For our example:

                @ENABLE
                $SDDT
                DDT
                JSYS 777$X
                MDDT
                $$6T
                STRTAB/   ,8[   /   PS
                STRTAB+1/      M^I   /   REL3
                STRTAB+2/      M_%   /   SNARK
                
        In the example above, you see that PS:  is the first structure, followed
        by the structures REL3:  and SNARK:.  Since the offset into STRTAB was 2
        for SNARK:, the structure number you want is 2.
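
        Each structure data block begins with the structure name in SIXBIT,
        which is why the example switches DDT to SIXBIT typeout with $$6T.  For
        comparing a raw octal word against a name by hand, the conversion can
        be sketched in Python as below.  The function names are hypothetical;
        the encoding itself (six 6-bit characters per 36-bit word, each code
        being the ASCII code minus 40 octal) is standard SIXBIT.

```python
def to_sixbit(name):
    """Pack up to six characters into a 36-bit SIXBIT word, left justified."""
    word = 0
    for ch in name.upper().ljust(6)[:6]:
        word = (word << 6) | (ord(ch) - 0o40)
    return word


def from_sixbit(word):
    """Unpack a 36-bit SIXBIT word into text, dropping trailing blanks."""
    chars = [chr(((word >> shift) & 0o77) + 0o40) for shift in range(30, -1, -6)]
    return "".join(chars).rstrip()


word = to_sixbit("SNARK")
print(f"{word >> 18:06o},,{word & 0o777777:06o}")  # 635641,,625300
print(from_sixbit(word))                           # SNARK
```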


             Knowing the structure number and the directory number, you can  now
        map  the  directory  and  look  at  it.   When  the directory is mapped,
        location DIRORA points to the area in the monitor where it can be found.
        To map the directory, you call the routine MAPDIR, which is in the
        module DIRECT.  It takes two arguments.  The directory  number  goes  in
        AC1,  and the structure number goes in AC2.  For our example, the output
        looks like:

                DIRORA[   740000
                740000/   ?

                1!   117
                2!   2
                CALL MAPDIR$X
                <SKIP>

                740000[   400300,,100

        The skip return from MAPDIR  means  you  have  successfully  mapped  the
        directory.   You  can  now  look at the whole directory by examining the
        proper locations.  The number of pages that are mapped by MAPDIR is  the
        length  of  a directory, so the whole thing is available to look at.  By
        examining or changing location 740000+N in core, you  are  examining  or
        changing  location  N  of the directory.  When you are finished, you can
        just leave MDDT by jumping to MRETN or by typing ^C.


             In release 3 and after, however, when you examine  location  DIRORA
        after calling MAPDIR, it doesn't have to contain a section zero address.
        If it does, then your machine cannot support extended addressing and the
        monitor  is  running  the  same  as release 2 did.  In this case you can
        ignore the rest of this document.  If your machine  does  have  extended
        addressing,  when  you  examine  location DIRORA you will see the number
        2,,0.  This address is now in section 2 of the monitor.


             For Release 4 of TOPS-20, the various  flavors  of  DDT  have  been
        trained  to  understand  extended  addresses, so the mapping contortions
        used for 3 and  3A  are  unnecessary.   On  extended  machines  one  can
        reference section two directly as below:

                DIRORA[   2,,0

                2,,0[   400300,,100

        When done, you can still just ^C out or jump to MRETN.

             NOTE:  If you have the Release 5 version of MDDT/EDDT that has a
        sticky current address section (see DDTxx.MEM), be careful about doing
        an MRETN$G after examining section 2, as a crash will result from
        transferring to MRETN in section 2.

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 22
        Recovering from Directory Errors


                            RECOVERING FROM DIRECTORY ERRORS
                            --------------------------------




             Sometimes after a monitor crash due to disk problems, some  of  the
        directories  on  the  system  will  contain  errors.  These errors cause
        BUGCHKs such as DIRFDB, NAMBAD, DIRPG0, and  DIRPG1.   It  is  sometimes
        possible  to  find  the  error  in  the  directory by getting into MDDT,
        mapping the directory, finding what  is  wrong,  and  fixing  it.   This
        procedure is described in the SWSKIT.  However, this is not always easy,
        and may take a lot of time.  It is therefore better  in  many  cases  to
        simply delete the bad directory and recreate it.  This is easy to do for
        most  directories.   But  special  procedures  are  necessary  for   the
        directories  <SYSTEM> and <SUBSYS>.  The rest of this memo will describe
        the methods of recovering from bad directories, handling  in  particular
        the difficult case of the <SYSTEM> directory.

             You can first try to give the EXPUNGE command with the REBUILD  and
        PURGE subcommands.  If the problem with the directory is very simple, it
        may  fix  your  problem.   As  an   example,   suppose   the   directory
        PS:<SICK-DIRECTORY> is incorrect.  You would type:


                $EXPUNGE (DIRECTORY) PS:<SICK-DIRECTORY>,
                $$REBUILD (SYMBOL TABLE)
                $$PURGE (NOT COMPLETELY CREATED FILES)
                $$
                 PS:<SICK-DIRECTORY> [NO PAGES FREED]
                $



             If this does not help the problem, you  will  have  to  delete  the
        directory and then recreate it.  Before proceeding, you should make sure
        that any files you can reference are copied  to  another  directory,  or
        else are saved on tape.  Now first try to delete the directory normally,
        as follows:


                $BUILD (USER) PS:<SICK-DIRECTORY>
                [OLD]
                $$KILL
                [CONFIRM]
                $$
                $



             If this is successful, then simply recreate the directory
        and  restore  the  user's files.  You should recreate the directory with
        the same directory number as it had before, so that DLUSER's  data  will
        still be correct.

             The procedure above will fail if either the directory is mapped  by
        another job, or if it is totally unusable.  If it is mapped, and the
        directory belongs to an ordinary user, you can wait until it is no
        longer  in  use,  or you can take the system stand-alone so that no user
        can reference it.

             If the directory is totally unusable, you will then have to try  to
        delete it the hard way.  Before proceeding, you should try to delete and
        expunge all files in the directory.  This will minimize the number of
        lost pages that result.  Now there are two cases to consider.  If
        the directory is not a subdirectory, you type the following:


                $DELETE (FILE) PS:<ROOT-DIRECTORY>SICK-DIRECTORY.DIRECTORY,
                $$DIRECTORY (AND "FORGET" FILE SPACE)
                $$
                 <ROOT-DIRECTORY>SICK-DIRECTORY.DIRECTORY.1 [OK]
                $



             If the directory is a subdirectory, you modify the above command by
        replacing  "ROOT-DIRECTORY"  by  the  name of the next higher directory.
        Thus if the directory was PS:<ANOTHER.BAD-ONE>, you type:


                $DELETE (FILE) PS:<ANOTHER>BAD-ONE.DIRECTORY,
                $$DIRECTORY (AND "FORGET" FILE SPACE)
                $$
                 <ANOTHER>BAD-ONE.DIRECTORY.1 [OK]
                $



             The above procedure tells the monitor to treat the  directory  file
        like a normal file, and to delete it as such.  This means that any files
        in the directory will become "lost".  The disk pages  can  be  recovered
        later with CHECKD.  If the above works, you can simply recreate the
        directory and restore the files.

             The only reason the above command should fail is if  the  directory
        is  still  mapped.   For  PS:<SUBSYS>,  you  can  bring  up  the  system
        stand-alone so that no programs are run from it,  and  then  delete  it.
        For  PS:<SYSTEM>,  even taking the system stand-alone will not help, for
        it is always mapped by job 0.  But there are two procedures you can  use
        which do work.

             The safest method can be used if the user's  system  has  mountable
        structures.   If you have built another PS: structure, you can mount the
        pack with the bad directory as an alias, and then the directory will not
        be mapped and can be deleted.  As an example:


                $MOUNT STRUCTURE SICK:/STRUCTURE-ID:PS:
                STRUCTURE SICK: MOUNTED
                $
                $DELETE (FILES) SICK:<ROOT-DIRECTORY>SYSTEM.DIRECTORY,
                $$DIRECTORY (AND "FORGET" FILE SPACE)
                $$
                 SICK:<ROOT-DIRECTORY>SYSTEM.DIRECTORY.1 [OK]
                $



             Then you can build the new directory, restore the files to it,  and
        then  use  it  again for your normal PS: pack.  Be sure to build the new
        directory with the same number.  This is especially  important  for  the
        special system directories.

             If you do not have another disk drive or another PS:  disk,  or  if
        you  don't  want  to  bother MOUNTing the disk, you can fix the <SYSTEM>
        area by using MDDT.  The basic idea is to patch the monitor so  that  it
        no longer thinks that the directory is in use.  This is done as follows:


                $^EQUIT

                INTERRUPT AT 17117
                MX>/MDDT
                CHKOFN/   JSP CX,.SAVE   JRST RSKP
                MRETN$G
                $



             Then  you  should  have  no  problems   deleting   the   directory.
        Immediately  after doing the delete, you should reload the system.  When
        the system restarts, you can read the monitor and the EXEC  either  from
        the  distribution  magtape  or from another directory where you had kept
        copies.  Then recreate the <SYSTEM> area, making sure  to  give  it  the
        same  directory number as it had before.  Then you can restore the files
        and let the users back on.  Finally, you should run  CHECKD  to  recover
        the lost pages.

             NOTE:  The special system directory numbers are:
                        1 - <ROOT-DIRECTORY>
                        2 - <SYSTEM>
                        3 - <SUBSYS>
                        4 - <ACCOUNTS>
                        5 - <OPERATOR>
                        6 - <SPOOL>
                        7 - <NEW-SYSTEM>
                       10 - <NEW-SUBSYS>
                       11 - <SYSTEM-ERROR>

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 25
        More about Directory Problems


                             MORE ABOUT DIRECTORY PROBLEMS
                             -----------------------------

        SOME HINTS FOR TRACING DIRECTORY PROBLEMS


        NOTE -- Use the methods documented in the Operator's Guide before
                resorting to the methods below.


             1.  There is a file on the SWSKIT called DIRTST.EXE which will test
                 for  inconsistencies in the directory pointers.  This will tell
                 you just about everything.

             2.  Another program on the SWSKIT is DIRPNT, which prints out the
                 contents of the chained FDBs, the entire directory, an FDB, or
                 the symbol table.  This might not work completely if the
                 headers are bad.

             3.  Still  another  useful  SWSKIT  program  is  DS  for  debugging
                 directory  problems.   It  can access the disk exclusive of the
                 directory structure and find directory pages,  search  for  bit
                 patterns,  dump  disk  pages,  extract pages to a separate file
                 based on address, index block or super index block, and a  host
                 of other functions.  See DS.MEM or DS.HLP for more information.

             4.  If you get a BUGCHK:

                 Go into the monitor with MDDT  and  set  a  breakpoint  at  the
                 BUGCHK  address,  say, FDBBAD.  Do the functions that cause the
                 BUGCHK;  DIR, say.  Trace down the bug.  The relevant  listings
                 are  PROLOG  and  DIRECT.   These give the directory format and
                 useful symbols.

             5.  If the pointers are destroyed or confused you can  map  in  the
                 directory as follows:

                        @ENA
                        $^EQUIT                 ; get into MINI-EXEC
                        MX>/                    ; get into MDDT


                        ; To map in the directory, put the directory number
                        ; in AC1.  You can obtain the number from DLUSER or
                        ; TRANSL or BUILD.  The structure number goes in AC2.

                        ; To find  the  structure  number look  at  the  table
                        ; STRTAB.  STRTAB contains a  list of pointers to  the
                        ; SDBs of structures that are mounted.  The  structure
                        ; numbers are equal to the offset into the STRTAB.  To
                        ; find  out  which  structure   has  structure  number
                        ; 3, look at STRTAB+3.  The contents of that location
                        ; point to the SIXBIT structure name.

                        STRTAB/  54321          ; str number 0
                        STRTAB+1/  56776        ; str no 1
                        STRTAB+2/  12345        ; str no 2
                        12345$6T/       FOO     ; str no 2 is FOO:


                        1/ DIRECTORY NUMBER
                        2/ STR NUMBER
                        CALL MAPDIR$X

                        ; Now you can  look at the  header pointers etc.,  and
                        ; fix things  up  if  you're lucky.
                        ; See the section on system disasters for a checklist
                        ; of things that could be wrong with the directory.
                        ; Go back to the MINI-EXEC.

                        ^P
                        MX>START
                        $


             6.  If you can't (or don't want to) recover the existing files  you
                 can  delete  the directory and restore the files using a DUMPER
                 tape.  See the previous article for methods of deleting the bad
                 directory.
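
        The STRTAB lookup in step 5 relies on reading a SIXBIT name, which
        is what the $6T typeout does in MDDT.  The following is a minimal
        sketch in Python of that decoding, not anything taken from the
        monitor.  The function names, the STRTAB list and the MEMORY
        dictionary are hypothetical stand-ins for the values typed out in
        the example; only the SIXBIT rule itself (six 6-bit codes per word,
        each equal to the ASCII code minus 40 octal) is standard.

```python
# SIXBIT packs six characters into one 36-bit word, six bits each.
# A SIXBIT code n (0..63) corresponds to the ASCII character n + 0o40.

def sixbit_to_text(word):
    """Decode a 36-bit word into its SIXBIT characters, trailing blanks removed."""
    chars = []
    for shift in range(30, -1, -6):        # leftmost character first
        chars.append(chr(((word >> shift) & 0o77) + 0o40))
    return "".join(chars).rstrip()

def text_to_sixbit(name):
    """Encode up to six characters as one left-justified SIXBIT word."""
    word = 0
    for ch in name.upper().ljust(6)[:6]:
        code = ord(ch) - 0o40
        if not 0 <= code <= 0o77:
            raise ValueError("character not representable in SIXBIT: %r" % ch)
        word = (word << 6) | code
    return word

def structure_number(strtab, memory, name):
    """Return the STRTAB offset whose pointer leads to the given name, or None."""
    for offset, pointer in enumerate(strtab):
        if pointer and sixbit_to_text(memory.get(pointer, 0)) == name:
            return offset
    return None

# Hypothetical data shaped like the MDDT example: three STRTAB slots, and
# only the word at address 12345 is modelled.
STRTAB = [0o54321, 0o56776, 0o12345]
MEMORY = {0o12345: text_to_sixbit("FOO")}

print(oct(text_to_sixbit("FOO")))
print(structure_number(STRTAB, MEMORY, "FOO"))
```

        The offset returned is the structure number that goes in AC2 before
        calling MAPDIR.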

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 27
        JSB and PSB Mapping


                An Easy Way to Examine the PSB and JSB of Another Job
                -----------------------------------------------------




        There is an occasional need to look in detail at the state  of  another
        job  on the system.  A common reason for doing this is to find the cause
        and cure of a "hung job" which cannot be logged out.  To find  out  what
        the  job  is doing you usually start by looking at the JSYS stack in the
        PSB.  But you cannot examine such data easily because the fork  data  in
        the  PSB  and  the  job data in the JSB are not in the monitor's address
        space until the fork is run.  If you try to look at the PSB or JSB using
        MDDT  you will see the data for your own fork.  The SWSKIT program MONRD
        can provide just this sort of information, but has  a  few  limitations,
        and one occasionally needs "direct" access to the data for another fork.
        To get it, you must do what the monitor does, and that is to map it.

        The procedure to do so is this:

             1.  Do a "GET" of the file the monitor  was  loaded  from,  usually
                 SYSTEM:MONITR.EXE.

             2.  Enter user mode DDT in the file you got, and then do a JSYS 777
                 to get into MDDT.

             3.  Find out the SPT indexes as before, and call MSETMP to map  the
                 PSB or JSB to the USER address space, in the correct place!!

             4.  Return from MDDT, and examine PSB and JSB  locations  directly,
                 and see the correct data in the right place.

             5.  When you are done, just ^C and do a RESET.




        The rest of this article shows, step by step, how the  procedure  above
        is  done,  using an example.  Assume that we wish to examine the state
        of fork 105, which belongs to job 21.  We then begin:

        @ENABLE                                 !Get a copy of the monitor
        $GET PS:<SYSTEM>MONITR.EXE
        $START 140                              !Get into user DDT
        DDT

        JSYS 777$X                              !Enter MDDT
        MDDT



        !Following is an example of the procedure to map the JSB of a job:


        FKJOB+105[   25,,2035                   !Get the SPT index of the JSB
                                                !of fork 105

        T1!      2035,,0                        !Put SPT index in left half
        T2!      540000,,JSBPGA                 !* Flags and where to map to
        T3!      JSLSTA'1000-JSBPGA'1000        !Number of pages to map

        CALL MSETMP$X                           !Do the mapping
        $


        !Following is an example of the procedure to map the PSB of a fork:


        FKPGS+105[   2657,,2332                 !Get the SPT index of the PSB
                                                !of fork 105

        T1!      2332,,PSBMAP-PSBPGA            !Put SPT index in left half,
                                                !and offset in right half
        T2!      540000,,PSSPSA                 !* Flags and where to map to
        T3!      PSBMSZ                         !Number of pages to map

        CALL MSETMP$X                           !Do the mapping
        $

        !Example of returning to user mode and looking at data from both
        !the PSB and the JSB of the fork:


        MRETN$G                                 !Return to user mode
        $

        USRNAM[   3                             !Examine job's user name
        USRNAM+1[   422050,,546230   $T;DBELL   

        CTRLTT[   777777,,777777                !Controlling terminal

        FILBYT+MLJFN[   4400,,334010            !Start of data block for JFN 1

        PPC/   T1,,DISXE#+2                     !Current PC of the fork

        PAC+17/   -215,,UPDL+62                 !Current stack pointer

        UPDL/   CHKHO5#                         !First few stack locations
        UPDL+1/   CAM CHKAE0#+12   
        UPDL+2/   CHKHO5#   
        UPDL+3/   CAM CHKAE0#+12   
        UPDL+4/   T1,,.COMND+1   


        !Example of terminating the mapping we have done:


        ^C
        $RESET                                  !To finish, just quit and reset
        $


        The procedure as given above maps the JSB and PSB write-enabled.  So  if
        you  find  something  you want to change, you can simply deposit the new
        value into the location.  If you want the data  to  be  write-protected,
        then  change  the  540000  to  500000  in  the  two steps marked with an
        asterisk.
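
        The "left,,right" notation used for T1 and T2 above denotes the two
        18-bit halves of a 36-bit word.  The sketch below, in Python, shows
        the arithmetic behind the example values.  The helper names are
        hypothetical, and reading the left half of the FKJOB entry as the
        job number is an inference from the example (25 octal is 21
        decimal, the job assumed above), not something stated in this
        article.

```python
HALF_MASK = 0o777777            # one 18-bit halfword

def xwd(left, right):
    """Build a 36-bit word from a left and a right halfword (left,,right)."""
    return ((left & HALF_MASK) << 18) | (right & HALF_MASK)

def halves(word):
    """Split a 36-bit word into (left, right) halfwords."""
    return (word >> 18) & HALF_MASK, word & HALF_MASK

def show(word):
    """Format a word the way DDT types it: octal left,,right."""
    left, right = halves(word)
    return "%o,,%o" % (left, right)

# FKJOB+105 in the example contains 25,,2035.
fkjob_entry = xwd(0o25, 0o2035)
left, jsb_spt_index = halves(fkjob_entry)

# T1 for MSETMP: SPT index moved to the left half, right half zero.
t1 = xwd(jsb_spt_index, 0)

# The two flag values from the text differ in exactly one bit.
WRITE_ENABLED = 0o540000
WRITE_PROTECTED = 0o500000
flag_difference = WRITE_ENABLED ^ WRITE_PROTECTED

print(show(fkjob_entry), "->", show(t1))
print("left half in decimal:", left)
print("flag difference: %o" % flag_difference)
```

        The same halves() helper applies to the FKPGS entry, where the
        example takes the right half (2332) as the SPT index of the PSB.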


        WARNING:  The procedure of mapping things into your user  address  space
        has  its  limitations.   Mapping  the JSB and PSB works because the user
        core used for mapping was previously empty.  In general,  you  can  only
        map things into your user core if your core pages are either nonexistent
        or are private.  If you call MSETMP or SETMPG and map something  over  a
        shared  page,  the  old  file  page is unmapped without the share counts
        being updated, which prevents your job from logging out later.   To  get
        around  this  problem  you  can  BLT your core image to force all of the
        pages to be private.

        The SWSKIT tools program MONRD is able to examine the JSB and PSB of any
        job/fork  on  the  system,  and is now the preferred method of obtaining
        this sort of information, unless the ability to modify the data  or  use
        advanced features of DDT is required.

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 30
        Breakpointing Multi-User Code


                HOW TO USE BREAKPOINTS IN CODE THAT MANY USERS EXECUTE
                ------------------------------------------------------




        When inserting a breakpoint into the running monitor,  you  have  to  be
        careful  that  no  other  users  will  execute  the  code containing the
        breakpoint.  If some other user hits the breakpoint, they will  blow  up
        with  an  illegal instruction since MDDT will not be there to handle the
        breakpoint.  This normally limits the places you  can  set  breakpoints,
        since most of the monitor can be gotten to by any user.  Even if you run
        the system  stand-alone,  it  is  possible  that  the  routine  you  are
        debugging  will be called by job 0.  However, it is still possible to do
        such debugging, even on a system which  is  not  stand-alone,  and  this
        document will describe how this is done.

        The essential element of this technique is to put in the patch in such a
        way  that  only  your own fork can ever reach the breakpoint.  First you
        write a simple routine which will skip if it is not being  run  by  your
        particular  fork.   This  can  be  done  easily if you remember that the
        location FORKX contains the currently running fork number.   An  example
        of such a routine is the following:

        @ENABLE
        $SDDT
        DDT
        JSYS 777$X
        MDDT

        FORKX[   23                     ; check our fork number

        FFF/   0   NOTME:   PUSH P,T1   ; save an AC
        NOTME+1/   0   MOVE T1,FORKX    ; get currently running fork number
        NOTME+2/   0   CAIE T1,23       ; is it us=23?
        NOTME+3/   0   AOS -1(P)        ; no, setup skip return
        NOTME+4/   0   POP P,T1         ; restore the saved AC
        NOTME+5/   0   POPJ P,          ; and return to caller
        NOTME+6/   0   FFF:             ; reset the position of FFF

        The routine above simply saves AC T1, gets the  currently  running  fork
        number,  compares  it  with  your  own fork number which you obtained by
        looking at location FORKX, and skips if they differ.

        Now assume that you want to set a breakpoint into  the  following  code,
        which is in the routine BLKSCN in the module DIRECT.

        BLKSC2/   HLRZ C,BLKTAB(B)
        BLKSC2+1/   CAME A,C
        BLKSC2+2/   AOBJN B,BLKSC2
        BLKSC2+3/   JUMPGE B,BLKSCE
        BLKSC2+4/   HRRZ B,BLKTAB(B)

        Assume you want  the  breakpoint  at  location  BLKSC2+3.   You  do  the
        following:


        BLKSC2+3/   JUMPGE B,BLKSCE   FFF$<     ; patch this location
        FFF/   0   PUSHJ P,NOTME                ; call the NOTME routine
        FFF+1/   0   .$B   JFCL$>               ; me if it gets here,
        FFF+2/   JUMPGE B,BLKSCE                ; set breakpoint
        FFF+3/   JUMPA A,BLKSC2+4
        FFF+4/   JUMPA B,BLKSC2+5
        BLKSC2+3/   JUMPA NOTME+6

        Notice that  the  breakpoint  has  been  set  in  the  JFCL  instruction
        following the call to NOTME.  Only your fork will execute it, so you can
        now debug the section of code while other users are executing it at  the
        same time.  Remember to remove the breakpoint when you are done.
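
        The control flow of the patch can be modelled outside the monitor.
        The following Python sketch is only an illustration of the skip
        logic described above; the function names are hypothetical, and the
        fork number 23 (octal) is the one from the FORKX example.

```python
MY_FORK = 0o23                  # value read from FORKX in the example

def notme_skips(current_fork, my_fork=MY_FORK):
    """Model NOTME: take the skip return unless the caller is our own fork."""
    return current_fork != my_fork

def run_patched_location(current_fork):
    """Return the instructions a fork executes in the patch at BLKSC2+3."""
    trace = ["PUSHJ P,NOTME"]
    if not notme_skips(current_fork):
        # Non-skip return: only our fork reaches the JFCL holding the breakpoint.
        trace.append("JFCL (breakpoint)")
    # Every fork then executes the instruction that was displaced by the patch.
    trace.append("JUMPGE B,BLKSCE")
    return trace

for fork in (0o23, 0o24):
    print("%o" % fork, run_patched_location(fork))
```

        Every other fork skips over the JFCL and so never reaches the
        breakpoint, which is why the patch is safe on a system that is not
        stand-alone.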

        To run a particular program  while  having  breakpoints  set,  you  must
        remember that the breakpoint has to be set by the same process which you
        expect to hit it.  So for example, typing ^EQUIT, setting a  breakpoint,
        returning  to the EXEC and running your program will not work.  You must
        enter MDDT and set the breakpoints from the program you want  to  debug.
        As an example:

        @ENABLE
        $GET PROGRAM    ; get the program to be used
        $DDT            ; enter DDT
        DDT
        JSYS 777$X      ; and enter MDDT from there
        MDDT

        (PUT IN "NOTME" ROUTINE AND SET BREAKPOINTS HERE)

        MRETN$G         ; return to the context of the test program
        $
        $G              ; start the test program

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 32
        Using Address Break to Debug the Monitor


                        Using Address Break to Debug the Monitor
                        ----------------------------------------


        Sometimes when examining a set of dumps, you will notice  the  crashes
        are  caused  by  some  location  being destroyed.  If you have no idea
        where the destruction is done from, finding the problem could be  very
        difficult.   One  useful procedure in such cases is to use the address
        break feature of the hardware to track down the  problem  (except  for
        2020's!).   The  only  problem is that the use of address break is not
        obvious.  This article describes how to use address break in the
        TOPS-20 monitor for releases 4.1 (model A)/5.1 and 6.0 (model B).

             In order to use address break, four things must be done.   First,
        the  current routines the monitor uses to set address breaks for users
        must be disabled.  Secondly, your own address break must be  set  from
        MDDT  or  EDDT.   Thirdly,  instructions  which  you  want  to execute
        properly have to be modified so that they will not cause  an  unwanted
        address  break.  Finally, breakpoints must be placed in the monitor so
        that the state of the monitor can be examined when the  address  break
        occurs.  The following is a step by step example of doing this.


        1.      Load the monitor for debugging, and enter EDDT.  The procedure
                starting from BOOT is the following:

                BOOT>/L                         ;Load monitor but don't start it
                BOOT>/G140                      ;Start EDDT
                EDDT
                DBUGSW/   0   2                 ;Set debugging mode
                EDDTF/   0   1                  ;Keep EDDT once system starts
                GOTSWM$B                        ;Install useful breakpoint
                SYSGO1$G                        ;Start the monitor


                [PS MOUNTED]
                $1B>>GOTSWM   0$1B              ;Remove breakpoint now


        2.      Disable the monitor's normal changing of  the  address  break.

                For Release 4.1 this is currently done at two places:
         
                KISSAV+4/   DATAO UNPFG1+26   JFCL      ;Disable instruction
                SETBRK+12/   DATAO A   JFCL             ;Here too

                For Release 6 do not change these locations.  Routine STEXDM
                used in the next step will take care of this.

        3.      Set your own address break at the desired location.  Refer  to
                the Hardware Reference Manual for details.  The instruction to
                set an address break is:
         
                DATAO APR,ADDR          ;Note:  APR = 0
         
                where ADDR contains the following fields:
         
                Bits            Description
                ----            -----------
                  9             Break at given address on instruction fetches
                 10             Break at given address on reads
                 11             Break at given address on writes
                 12             0=exec address space, 1=user address space
                13-35           Address to break on.
         
         
                So now assume you want  to  catch  a  bug  which  is  blasting
                location  CURDS.   You want to break only for writes, and want
                to use exec virtual space.  Therefore you type the following:
         
                For Release 4.1:

                FFF/   0   100000000+CURDS      ;Put data in convenient place
                DATAO APR,FFF$X                 ;Set the address break
         
                For Release 6, STEXDM will set the break and notify the monitor:

                T1/   0   100000000+CURDS       ;Put data in convenient place
                CALL STEXDM$X                   ;Set the address break
         
         
        4.      Now you want to disable address  break  for  all  instructions
                which you expect to change the given location.  Assume in this
                example that  only  location  DIDDLE  should  change  location
                CURDS.  Then you do the following for a model B CPU:
         
                FFF!   IT:                      ;Define location to get old flags
                IT+1!                           ;Old PC
                IT+2!                           ;New flags
                IT+3!   IT+4                    ;New PC
                IT+4!   EXCH IT                 ;Save AC and get old flags
                IT+5!   TLO 1000                ;Set address break inhibit bit
                IT+6!   EXCH IT                 ;Restore flags and AC
                IT+7!   XJRSTF IT               ;Return to caller
                IT+10!   FFF:                   ;Redefine FFF
         
                DIDDLE/   MOVEM A,CURDS   FFF$< ;Insert patch
                FFF/   0   XPCW IT$>            ;Call above routine
                FFF+1/   0   MOVEM A,CURDS      ;Typed by DDT when finishing patch
                FFF+2/   0   JUMPA A,DIDDLE+1
                FFF+3/   0   JUMPA B,DIDDLE+2
                DIDDLE/   MOVEM A,CURDS   JUMPA IT+10

                The  XPCW IT  instruction saves the old flags and PC at IT and
                IT+1, and takes new flags and a new PC from  IT+2  and  IT+3.
                There the saved flags are changed to include the address break
                inhibit bit.  Then an XJRSTF IT is done which returns  to  the
                caller.   The  next  instruction then executes without causing
                an address break.  You have to insert the XPCW IT  instruction
                at every instruction you want to succeed.
         
                For model A CPUs the procedure is similar, but a little easier:
         
                FFF!   IT:                      ;Define location to hold PC
                IT+1!   EXCH IT                 ;Get old PC and save AC
                IT+2!   TLO 1000                ;Set address break inhibit flag
                IT+3!   EXCH IT                 ;Restore PC and AC
                IT+4!   JRSTF @IT               ;Return to caller
                IT+5!   FFF:                    ;Redefine FFF
         
                DIDDLE/   MOVEM A,CURDS   FFF$< ;Insert patch
                FFF/   0   JSR IT$>             ;Call above routine
                FFF+1/   0   MOVEM A,CURDS      ;Typed by DDT when finishing patch
                FFF+2/   0   JUMPA A,DIDDLE+1
                FFF+3/   0   JUMPA B,DIDDLE+2
                DIDDLE/   MOVEM A,CURDS   JUMPA IT+5
         
         
        5.      Now put the breakpoints into  the  monitor  so  that  when  an
                address  break  occurs, you will get into EDDT.  There are two
                locations to patch, one for PI level and one for non-PI level.

                ADRCMP$B                        ;Set breakpoint at non-PI routine
                PFCD23$B                        ;Set breakpoint at PI routine
                $P                              ;Now let the monitor proceed


        6.      When either of the above breakpoints is hit, the flags and  PC
                of  the  instruction which caused the address break will be in
                locations TRAPFL and TRAPPC.    If the address break was  from
                JSYS  level  (breakpoint  was to ADRCMP and location INSKED is
                zero) then an $P will proceed properly.  If the address  break
                was  from  the  scheduler  or  from PI level, doing $P will be
                useless since the monitor will then BUGHLT because it  doesn't
                want to see an address break under these conditions.  However,
                this is ok if all you want  to  do  is  find  the  instruction
                causing the trashing.


              If the location still gets trashed after trying to catch it this
         way,  there  are  three  possibilities:  your procedure is wrong (for
         example, you are trying this on a 2020, which has  no  address  break
         feature);  the location is being changed by some IO being done (RH20s,
         DTEs, etc); or the machine is having hardware problems.
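
        The word handed to DATAO APR (or to STEXDM) in step 3 can be
        checked with a little arithmetic.  The Python sketch below builds
        it from the bit table given in that step, using the PDP-10
        convention that bit 0 is the leftmost bit of the 36-bit word.  The
        function names and the address used for CURDS are hypothetical; the
        real address comes from the monitor's symbol table.

```python
def bit(n):
    """Value of PDP-10 bit n in a 36-bit word (bit 0 is the leftmost bit)."""
    return 1 << (35 - n)

BREAK_ON_FETCH = bit(9)
BREAK_ON_READ = bit(10)
BREAK_ON_WRITE = bit(11)
USER_ADDRESS_SPACE = bit(12)            # 0 here means exec address space
ADDRESS_MASK = (1 << 23) - 1            # bits 13-35

def address_break_word(address, fetch=False, read=False, write=False, user=False):
    """Assemble the address break word from the fields listed in step 3."""
    if not 0 <= address <= ADDRESS_MASK:
        raise ValueError("address does not fit in bits 13-35")
    word = address
    if fetch:
        word |= BREAK_ON_FETCH
    if read:
        word |= BREAK_ON_READ
    if write:
        word |= BREAK_ON_WRITE
    if user:
        word |= USER_ADDRESS_SPACE
    return word

# The TLO 1000 in step 4 sets 1000 (octal) in the left half of the flag word.
ADDRESS_BREAK_INHIBIT = 0o1000 << 18

CURDS = 0o123456                        # hypothetical address, for illustration
print("%o" % address_break_word(CURDS, write=True))
```

        With write=True and exec space selected, the result is the same as
        the 100000000+CURDS typed in the example.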

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 35
        Recovering from System Disasters


                            RECOVERING FROM SYSTEM DISASTERS
                            --------------------------------


        There are some common system  disasters  which  in  many  cases  can  be
        recovered  from  quickly and with a minimum of effort.  The four we will
        discuss in this article are:


             1.  Hung Terminals

             2.  Hung Jobs

             3.  Hung SETSPD

             4.  Trashed Disks




        1.0  HUNG TERMINALS

        Hung terminals are usually the result of one of two problems.  Either
        the speed has been set incorrectly for that terminal type, or a problem
        exists between the KL and the front end.  If the problem is the result
        of an improper speed setting, then simply resetting the speed will be
        sufficient.  On the other hand, if the problem is due to some sync
        problem between the KL and the 11, then the easiest way to recover is to
        reload the front end.  This can be done by depressing the halt switch on
        the operator's console of the 11 and then placing it back in the enable
        state.  After about fifteen seconds, the message

                                [DECsystem-20 continued]

        should be printed on the CTY.  If  this  fails  to  free  the  terminal,
        perhaps  the  problem  is  a  hung  job.   See the discussion under that
        heading also.  If the problem is recurrent or otherwise needs debugging,
        there  are a couple of ways of gathering the necessary information.  One
        is to take a dump of the system (see the Crash Analysis section) and the
        front  end.  Another way is to use the DMPTTY program from the SWSKIT to
        display the internal terminal state for the line.  In  many  cases  this
        can  substitute  for  the  CPU dump, though any front-end state is still
        unknown.



        2.0  HUNG JOBS

        There are a number of circumstances which cause a job to become hung,
        usually while waiting for some resource to free up, some share count to
        become zero, etc.  Sometimes these tests will never be satisfied and the
        job has its PSI system turned off, and as a result the job becomes hung.
        Freeing it up can be very tricky.  The first thing to try is to log the
        job out from some other terminal.  If this doesn't succeed in freeing
        the job up, then the next best thing is to detach the job from the
        terminal and allow it to sit there.  It may be using negligible amounts
        of CPU time and cause no adverse effects to the system.  Zapping the job
        may crash the system, which in most cases is not desirable.  Use SYSDPY
        and note the scheduler tests that the processes of the job are in for
        later reference (see the Scheduler Tests section).

        The next time the system is reloaded, be sure  to  get  a  dump  of  the
        system with the hung job and submit it as an SPR (see the SWSKIT article
        about getting informative Dumps).



        3.0  HUNG SETSPD

        This is a fairly common problem brought on by some hardware problem.  It
        is possible to bring the system up without running SETSPD under job 0,
        log in, and then try to run SETSPD under some other operator job.  If
        SETSPD then hangs, it is possible to CONTROL/C out of the program, edit
        n-CONFIG.CMD to remove the commands suspected of hanging SETSPD, and
        retry.  In this way it is possible to continue timesharing while waiting
        for the problem to be resolved.

        To bring the system up without running SETSPD  automatically,  one  need
        only  install  the  following  patch to the MONITOR using EDDT on system
        start up.


                  BOOT>/l
                  BOOT>/g141
                  EDDT
                  EDDTF[   0   -1
                  DBUGSW[   0   2
                  GOTSWM$B
                  SYSGO1$G
                  [PS MOUNTED]
                  1B>>GOTSWM
                  RUNDD3+7/   PUSHJ P,RUNDII   JFCL   (at RUNDD3+16 for v6.0)
                  0$1B
                  $P
                  %%No SETSPD


        The system will then come up as usual except that SYSJOB will  not  run.
        After  the  problem  with SETSPD has been resolved, SYSJOB can be run by
        typing

              COPY (FROM) <SYSTEM>SYSJOB.RUN (TO) <SYSTEM>SYSJOB.COMMANDS


        This will cause all the commands in the SYSJOB.RUN file to  be  executed
        by SYSJOB.



        4.0  TRASHED DISKS

        This is surely  one  of  the  biggest  headaches  facing  a  specialist.
        Trashed  disks  come  in many forms and recovering from these requires a
        good knowledge of the structure of the TOPS-20 file system.

        If the structure cannot  be  mounted,  it  is  because  of  one  of  the
        following reasons:

             1.  Inconsistency in either of the HOM blocks

                 1.  Word HOMNAM (1) of either HOM block not SIXBIT/HOM/

                 2.  Word HOMCOD (176) of either HOM block not 707070

                 3.  Word HOMHOM (5) of first HOM block not 1,,12

                 4.  Word HOMHOM (5) of second HOM block not 12,,1

                 5.  Word HOMFSN (173) of either HOM block not 20040,,47524

                 6.  Word HOMFSN+1 (174) of either HOM block not 51520,,31055

                 7.  Word HOMFSN+2 (175) of either HOM block not 20060,,20040

                 8.  Right half of word HOMLUN (4) of either home  block  either
                     refers  to a unit greater than the left half of word HOMLUN
                     or it refers to a UNIT already verified

                 9.  Word HOMSNM (3) of either home block does  not  agree  with
                     SIXBIT/STRUCTURE-NAME/

                10.  No disk address for index block  in  word  HOMRXB  (10)  of
                     either HOM block


             2.  Inconsistencies in Root-Directory page 0

                 1.  Directory number in page 0 of Root-Directory not 1

                 2.  Directory block type (DRTYP) of Root-Directory page  0  not
                     400300 (.TYDIR)

                 3.  Relative Page number (DRRPN) of Root-Directory page 0 not 0

                 4.  Top of symbol table (DRSTP) of Root-Directory page 0 out of
                     Directory bounds

                 5.  Pointer to first free block (DRFFB) of Root-Directory  page
                     0 not in page 0 of the directory

                 6.  Pointer to Directory Name String (DRNAM) not under start of
                     symbol table

                 7.  Directory name pointer (DRNAM) not 0 and Name string  block
                     length (NMLEN) not at least 2 words long

                 8.  Directory name pointer (DRNAM) not  0  and  directory  name
                     block header (NMTYP) not 400001 (.TYNAM)

                 9.  Password block pointer not  0  and  password  string  block
                     length (NMLEN) not at least 2 words long

                10.  Password block pointer not  0  and  password  string  block
                     header (NMTYP) not 400001 (.TYNAM)

                11.  Account string block pointer not 0 and Account string block
                     length (NMLEN) not at least 2 words long

                12.  Account string block pointer not 0 and Account string block
                     header (NMTYP) not 400001 (.TYNAM)

                13.  Remote alias list pointer not  0  and  Remote  alias  block
                     length (NMLEN) not at least 2 words long

                14.  Remote alias list pointer not  0  and  Remote  alias  block
                     header (NMTYP) not 400001 (.TYNAM) and so on down the chain


             3.  Inconsistencies in Block types  or  free  space  in  subsequent
                 pages of the directory.

                 All blocks in the directory (including free space) begin with a
                 block  header  which  specifies  type  and length.  Immediately
                 following one block should be a header for  a  new  block.   If
                 this scheme is corrupted, the mount will fail.

                 1.  Header of a block not one of:

                     1.  (.TYNAM)  400001       6.  (.TYDIR)  400300

                     2.  (.TYEXT)  400002       7.  (.TYFRE)  400500

                     3.  (.TYACC)  400003       8.  (.TYFBT)  400600

                     4.  (.TYUNS)  400004       9.  (.TYGDB)  400700

                     5.  (.TYFDB)  400100      10.  (.TYRNA)  401000


                 2.  Header of block is NAMTYP and length not at least 2 words

                 3.  Header of block is EXTTYP and length not at least 2 words

                 4.  Header of block is ACCTYP and length not at least 3 words

                 5.  Header of block is USRTYP and length not at least 3 words

                 6.  Header of block is FDBTYP and

                     1.  Block length not at least 30 (.FBLN0) words long

                     2.  Pointer to Author String (.FBAUT) not 0 and points to a
                         block  outside  of  the  directory or points to a block
                         that does not meet the tests for a user name string  as
                         described above.

                     3.  Pointer to Last Writer String (.FBLWR) not 0 and points
                         to  a  block  outside  of  the directory or points to a
                         block that does not meet the  tests  for  a  user  name
                         string block as described above.

                     4.  Pointer to Account String (.FBACT) is not less than  or
                         equal  to  zero and it points to a block outside of the
                         directory or it points to a block that  does  not  meet
                         the  tests  for  an  account  string block as described
                         above.

                     5.  Pointer to Name String (.FBNAM) is not 0 and it  points
                         to  a  block outside of the directory or it points to a
                         block that does not meet the tests for  a  Name  String
                         Block as described above.

                     6.  Pointer to Extension String (.FBEXT) is not  0  and  it
                         points to a block outside of the directory or it points
                         to a  block  that  does  not  meet  the  tests  for  an
                         Extension String Block as described above.


                 7.  Header of a block is DIRTYP and

                     1.  Header is not on a page boundary

                     2.  Relative page number (DRRPN) not  the  calculated  page
                         number

                     3.  Pointer to first free block (DRFFB) does not point to a
                         location within the current directory page

                     4.  Directory number (DRNUM) not 1.


                 8.  Header of a block is FRETYP and either the block is not at
                     least two words long, or the pointer to the next free block
                     (FRNFB) is not zero and points to a location not on the
                     same page as the current block

                 9.  Last block did not end at DRFTP (address specified on first
                     page of directory)


             4.  BAT blocks inconsistent.

                 1.  Either block does not contain SIXBIT/BAT/ in BATNAM (offset
                     0 in block)

                 2.  Either block does not contain 606060 in BATCOD (offset 176)

                 3.  Sector number of the BAT block (BATBLK) not the true sector

                 4.  The BAT blocks do  not  compare  exactly  with  each  other
                     through word 176 of the blocks


             5.  Checksum of the Root-directory Index Block does not agree  with
                 the checksum calculated.

                 Checksums are calculated as follows:

                 CHKSUM = 0 ;
                 For I = 0 to 777
                     If XB(I) = 0 then 
                         CHKSUM = CHKSUM + I
                     Else 
                         CHKSUM = CHKSUM + XB(I) ;

                 where XB is the first word of the index block.
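
        The calculation above can be sketched in Python as follows.  This is
        only an illustration, not the monitor's code: the function name is
        invented, the loop bounds 0 to 777 are taken as octal, and the text
        does not say how overflow is handled, so the sketch simply assumes
        that the sum wraps at 36 bits.

```python
WORD_MASK = (1 << 36) - 1  # PDP-10 words are 36 bits wide


def index_block_checksum(xb):
    """Checksum an index block given as a list of 1000 (octal) integers."""
    if len(xb) != 0o1000:
        raise ValueError("an index block is 1000 (octal) words long")
    chksum = 0
    for i, word in enumerate(xb):
        # A zero entry contributes its own index instead of its contents.
        chksum += i if word == 0 else word
        chksum &= WORD_MASK  # assumption: plain 36-bit wraparound
    return chksum


if __name__ == "__main__":
    empty = [0] * 0o1000
    print(oct(index_block_checksum(empty)))
```

        Because a zero entry adds its index, an index block of all zeros
        does not checksum to zero.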



        As you can see, there are many things that could be wrong with a
        structure that would prevent it from being mounted.  The consistency of
        the structure can be checked quite easily using the FILDDT commands
        STRUCTURE and DRIVE, discussed elsewhere in the SWSKIT.

        For structures which are badly trashed, the only sane way of  recovering
        is  to  rebuild  the  structure  using  a  catastrophe tape.  For simple
        inconsistencies such as a bad BAT block, CHECKD does the job well.  For
        more involved trashes which cannot be recovered from a backup tape
        (because of a forgetful system manager) the above information can be of
        great help, along with SWSKIT programs DS, DIRPNT, and DIRTST.


                                 LOOKING AT HUNG TAPES
                                 ---------------------




        A number of problems of the general classification "tape hang" have been
        reported,  and  will  probably  always exist as long as we use magtapes.
        Although there are apparently several variants of the problem, there are
        some  things  which  can  be done by a suitably cautious specialist when
        presented with a hung tape drive.   Listed  below  are  some  techniques
        which can be used in an attempt to investigate and perhaps alleviate the
        problem.  These things should, in general, be harmless to the system,
        barring mis-typing in MDDT.  Because they are this conservative, they
        may not always clear the problem.

        There are several tables that are used in relation to tape drives.  Some
        of  these tables are indexed by MT unit number, some by MTA unit number.
        In general, it can be  said  that  if  a  table  name  begins  with  the
        characters MT, it will be indexed by MTA or physical unit number, and if
        the table name begins with TL or TP, it will be indexed by MT or logical
        unit  number.   The  TL  and TP tables will usually have something to do
        with the tape labeling system.  This article concerns itself mainly with
        the more important tables relating to MTAs (physical tape units).

        When playing with the tape subsystem, certain care should be taken.  For
        instance,  it  always  helps  if  no one else is actively using the tape
        drives while you attempt something like reloading the  microcode  for  a
        DX20.

        1.  Finding the Tape Drive

        There are several tables  parallel  to  each  other  which  concern  the
        ownership  of  a  tape drive.  Those of interest are DEVNAM, DEVCHR, and
        DEVUNT.  At DEVNAM+n is the device name in SIXBIT.   At  DEVUNT+n  is  a
        word with the left half set to the assigner's job number, -1 if free, or
        -2 if being controlled by the allocator.  The right  half  contains  the
        unit number.  Note that with tape allocation turned on, MTAs will always
        indicate that job 0 has the drive assigned, and the DEVUNT entry for the
        MT unit will contain the job number of the user.  At DEVCHR+n is the
        device characteristics word.  Knowing the device name or the owning job,
        one can use DDT to find the table offset.  See the example below.
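
        As an aid to reading DEVUNT entries, the layout just described can be
        sketched in Python.  The function name and the returned dictionary are
        invented for illustration; the pseudo-device flag (DV%PSD=400000 in the
        right half) is taken from the comments in the DDT example later in
        this article.

```python
HALF = 0o777777          # mask for one 18-bit half word
DV_PSD = 0o400000        # pseudo-device flag seen in the right half


def decode_devunt(word):
    """Split a 36-bit DEVUNT entry into owner, pseudo-device flag, and unit."""
    lh = (word >> 18) & HALF
    rh = word & HALF
    if lh == 0o777777:       # -1: device is free
        owner = "free"
    elif lh == 0o777776:     # -2: controlled by the allocator
        owner = "allocator"
    else:
        owner = lh           # job number of the assigner
    return {"owner": owner, "pseudo": bool(rh & DV_PSD), "unit": rh & ~DV_PSD & HALF}


if __name__ == "__main__":
    # 32,,400000 is the DEVUNT+41 entry from the example transcript.
    print(decode_devunt((0o32 << 18) | 0o400000))
```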

        2.  Grabbing the Drive

        Knowing the offsets into DEVUNT, the device assignment can be  freed  by
        putting  -1  into  the  left  half of the appropriate DEVUNT entry.  The
        drive can then be assigned by the normal ASSIGN command to the EXEC.  In
        dealing  with  the  allocator, your own job number can be placed here if
        necessary.  The drive, however, will still be in no state to use.   Note
        that the appropriate DEVUNT entry is the one referring to the MT, not
        the MTA.

        3.  Clearing External Errors

        Make sure that there is a tape of some sort mounted, and  the  drive  is
        placed  on-line.   Having  a  write-enable  ring in the tape may help in
        being sure the unit is functional if the hung condition is cleared.

        4.  Checking the UDB

        Next, the Unit Data Block status should be reset.  This word can be
        found using the MTCUTB table, which is indexed by MTA unit number.  The
        left half of each entry is the address of the Channel Data Block (CDB),
        and the right half is the address of the UDB.  The status word of the
        UDB (its first word, UDBSTS) should then be reset to the base state.
        The right half should be left alone; it basically contains the drive
        type.  The left half should have only bit 16 set, which indicates a
        tape type device (US.TAP).  The old
        contents should be remembered for purposes of later analysis.
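
        The arithmetic of this step can be checked with a short Python sketch.
        The helper names are invented; the sample values are the MTCUTB and UDB
        words that appear in the example transcript below.  It only computes
        the value to deposit; the deposit itself is still done by hand in MDDT.

```python
HALF = 0o777777
US_TAP = 1 << (35 - 16)   # bit 16, counting from bit 0 at the left


def split_halves(word):
    """Return (left half, right half) of a 36-bit word."""
    return (word >> 18) & HALF, word & HALF


def fmt(word):
    """Format a word the way DDT prints it: lh,,rh in octal."""
    lh, rh = split_halves(word)
    return f"{lh:o},,{rh:o}"


def udbsts_base_state(old_status):
    """Keep the right half, leave only US.TAP set in the left half."""
    return US_TAP | (old_status & HALF)


if __name__ == "__main__":
    cdb, udb = split_halves((0o730437 << 18) | 0o730625)   # MTCUTB+5
    old = (0o102 << 18) | 0o157                            # first word of UDB
    print(f"UDB at {udb:o}: {fmt(old)} -> {fmt(udbsts_base_state(old))}")
```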

        5.  Checking the Status

        Now, table MTASTS  is  examined,  indexed  by  MTA  unit  number  again.
        Remember the old contents.  Then clear the word to zero.

        6.  Example

            @enaBLE (CAPABILITIES) 
            $sddt
            DDT
            mddt%$x
            MDDT
            
            dvxstn=21   !THIS SYMBOL PROVIDES A HANDY INDEX TO THE
                        !MTA OFFSETS IN THE DEVxxx TABLES.
            
                        !DEVNAM HAS SIXBIT DEVICE NAMES
            
            devnam+21/   HLRZM P2,FKBSPW+217(T1)   $6t;MTA0     
            DEVNAM+22/   MTA1     
            DEVNAM+23/   MTA2     
            DEVNAM+24/   MTA3     
                ...
                ...
                ...
            DEVNAM+40/   MTA17     
            
            mtan=20             !ROOM FOR 20 (OCTAL) TAPE DRIVES ALLOCATED
            
            mtindx[   777765,,5   !BUT ONLY 5 ACTUAL DRIVES ARE HERE
            
                        !MTs WILL APPEAR AFTER MTAs IN THE DEVxxx TABLES SO
                        !DVXSTN+MTAN WILL BE THE OFFSET TO THE MT ENTRIES
            
            devnam+41/   HLRZM P1,@0   $6t;MT0      
            DEVNAM+42/   MT1      
            DEVNAM+43/   MT2
            DEVNAM+44/   MT3      
                ...
                ...
                ...
            DEVNAM+60/   MT17
            
                        !DEVUNT IS PARALLEL TO DEVNAM AND PROVIDES OFFSETS INTO
                        !THE MTxxxx TABLES FOR MTAs AND OFFSETS INTO THE TLxxxx
                        !AND TPxxxx TABLES FOR MTs
            
            devunt+21[   0   !UNIT ZERO (OFFSET FROM DEVNAM) ASSIGNED TO JOB 0
            DEVUNT+22[   1   !JOB 0,,MTA1:
            DEVUNT+23[   2   !JOB 0,,MTA2:
            DEVUNT+24[   3   !JOB 0,,MTA3:
            DEVUNT+25[   4   !JOB 0,,MTA4:
            DEVUNT+26[   5   !JOB 0,,MTA5:
            DEVUNT+27[   777777,,6   !UNASSIGNED,,MTA6:
                ...
                ...
                ...
            DEVUNT+40[   777777,,17   !UNASSIGNED,,MTA17:
            
                        !DV%PSD=400000 INDICATES A PSEUDO DEVICE
                        !THE FOLLOWING ENTRIES FOR MTs WILL INDICATE
                        !THE AVAILABILITY OF LOGICAL TAPE UNITS
            
            devunt+41[   32,,400000   !PSEUDO DEVICE MT0: IS ASSIGNED TO
                                      !JOB 32 OCTAL (JOB 26 IN DECIMAL)
            DEVUNT+42[   777776,,400001   !CONTROLLED BY ALLOCATOR,,MT1:
            DEVUNT+43[   777776,,400002   !     "     "       "   ,,MT2:
            DEVUNT+44[   777776,,400003   !     "     "       "   ,,MT3:
                ...
                ...
                ...
            DEVUNT+60[   777776,,400017   !     "     "       "   ,,MT17:
            
                        !TLABR0 (INDEXED BY MT NUMBER) WILL INDICATE WHICH
                        !PHYSICAL TAPE UNIT WILL BE USED WHEN REFERENCING MT.
                        !THIS IS INDICATED BY PHYSICAL MTA NUMBER IN BITS 2-8.
            
            tlabr0[   405000,,0   !BIT 0 INDICATES A VALID VOLUME IS ON MTA5
            
            mtcutb+5[   730437,,730625   !CDB,,UDB FOR MTA5 IN USE BY JOB 26
                                         !WHO KNOWS IT AS MT0 (SEE ABOVE)
            
            
            730625[   102,,157  !FIRST WORD OF UDB FOR MTA5
                                !US.WLK=1B11  ==> WRITE LOCKED
                                !US.TAP=1B16  ==> TAPE TYPE DEVICE
                                !.UTT70=17B35 ==> TU70
            
            mtasts+5[   0   !THIS EXAMPLE INDICATES A TAPE DRIVE THAT PROBABLY
                            !HASN'T BEEN REFERENCED BY THE USER YET

            
            mretn$g             !TO RETURN TO SDDT FROM MDDT
            <>
            ^Z                  !TO RETURN TO THE EXEC FROM SDDT
            $
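
        The TLABR0 word in the transcript above can be decoded mechanically.
        The following Python sketch relies only on what the transcript comments
        state (bit 0 marks a valid volume, bits 2-8 hold the physical MTA
        number); the function name is invented and the other bits of the word
        are ignored.

```python
def decode_tlabr0(word):
    """Return (valid_volume, mta_number) from a 36-bit TLABR0 entry."""
    valid = bool((word >> 35) & 1)        # bit 0 is the leftmost bit
    mta = (word >> (35 - 8)) & 0o177      # bits 2-8 form a 7-bit field
    return valid, mta


if __name__ == "__main__":
    # 405000,,0 is the value shown for MT0 in the transcript.
    print(decode_tlabr0(0o405000 << 18))
```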
            

        If clearing MTASTS and UDBSTS for the drive doesn't seem  to  clear  the
        problem,  you  will probably have to do more digging around to find some
        other, more obscure, inconsistency in the MTA/MT tables.   This  can  be
        accomplished  by referring to the monitor tables under MTA-STORAGE-AREA.
        As always, extreme caution should be exercised while fooling  around  in
        MDDT  as  you can accidentally trash some random location in the monitor
        just by hitting a carriage return at the wrong time.

        If a DX20 controller is involved,  there  may  be  a  situation  of  the
        controller  microcode  hanging  or  malfunctioning.   The SWSKIT program
        DX20PC can be useful in these situations.

        One last note should be made about the monitor tables here.  The
        description of the DEVUNT table given there would lead one to believe
        that the right half will contain a -2 if the device is under control of
        the allocator.  In fact, the -2 appears in the left half.



                           A LOOK AT SOME OF THE DISK STUFF
                           --------------------------------



        This article is an introduction to the PHYPAR module.  PHYPAR itself is
        where the information may be reliably obtained, and it should serve as
        the ultimate reference for these problems.


        Much of the system debugging you will have to deal with will involve the
        DEC-20  hardware.  There always seems to be a large gap between what the
        diagnostics can tolerate and what the monitor can tolerate in the way of
        malfunctioning hardware.  The monitor will not always point you to the
        real disk or magtape problem; it may instead crash a few minutes after
        something has gone wrong somewhere else.  Most of the hardware problems
        that we have had to deal with that were really difficult to track down
        and point the Field Service rep to were problems with disk hardware.
        The following is information which you can use to help Field Service
        trace down problems which are not reported in the diagnostics.  In most
        cases the Field Service rep knows what all the status bits, etc., mean
        but has not been able to find them in the monitor crashes or running
        monitor.

        CHNTAB:
                CHNTAB is an  ordered  list  of  Channel  Data  Block  addresses
                starting  with  channel  0.  RH20-0 data block address is in the
                first word etc.

        CDB:
                CDB is the Channel Data Block.  There is one CDB per channel.
                The CDB contains channel dependent instructions and data, and
                pointers to the Unit Data Blocks (UDBs) in the case of RP04s,
                RP05s, and RP06s.  In the case of tapes the pointer is to the
                Kontroller Data Block (TM02/3) which points in turn to the UDBs.
                The  CDB  also  contains  information about the currently active
                unit.  When the channel interrupts, control passes (via  a  JSP)
                to  CDBINT.   The  CDB  address  is  stored  in  AC1, P1 and the
                principal analysis routine, PHYINT, is called.


                                          NOTE

                The CDBs are referenced in  modules  PHYSIO,  PHYH2  (RH20
                code),  PHYM2  (TM02/3  code)  and  PHYP4  (RP04,05,06,07s
                code).  The Channel Data Block is defined  in  the  module
                PHYPAR.   The  address  that you get in CHNTAB is really a
                pointer to word0 which contains the status bits  for  this
                controller   (CDBSTS).   Look  in  PHYPAR  for  the  table
                definition.  Some words of interest are:
                CDBaddress + CDBSTS:  status and configuration info
                CDBaddress + CDBUDB:  table of UDB (or KDB) addresses

                The status bits which are also defined in PHYPAR are listed here
                for your convenience:

                        CS.OFL==1B0             ; offline
                        CS.AC1==1B1             ; primary command active
                        CS.AC2==1B2             ; secondary command active
                        CS.ACT==CS.AC1!CS.AC2   ; any active
                        CS.MAI==1B3             ; channel is in maintenance mode
                        CS.MRQ==1B4             ; maintenance mode requested for unit
                        CS.ERC==1B5             ; error recovery in progress
                        CS.STK==1B6             ; channel supports command stacking
                        CS.ACL==1B7             ; alternate command list is current
                        CS.CWP==1B8             ; Channel write parity error detected
                        CS.CIP==1B9             ; CI-port channel
                        CS.DEN==1B10            ; CI port DIAG to take channel enabled
                        CS.NIP==1B12            ; NI-port channel

                        BITs 30-32              ; PIA field
                        BITs 33-35              ; channel type field


                KDB:
                Kontroller Data Block.  Defined in PHYPAR also.   Referenced  in
                PHYM2, PHYPAR, PHYSIO.  Words of interest are:

                        KDBADDR+KDBSTS:         ; flags unit type
                        KDBADDR+KDBUDB:         ; UDB table first word (1 word/UDB)


                UDB:
                Unit Data Block.  There is one UDB per unit  associated  with  a
                CDB  or  KDB.   The  UDB  contains information about the current
                activity on the unit in question.  The UDB is defined in  PHYPAR
                as  well.   Some words of interest are noted below.  Look in the
                listings for other information.

                UDBADDR + UDBSTS:       ; status and configuration (see below)
                UDBADDR + UDBERR:       ; error recovery status word
                UDBADDR + UDBERP:       ; error reporting work area if non 0
                UDBADDR + UDBRED:       ; reads - sectors/frames if disk/tape
                UDBADDR + UDBWRT:       ; writes - sectors/frames if disk/tape
                UDBADDR + UDBSRE:       ; soft read errors
                UDBADDR + UDBSWE:       ; soft write errors
                UDBADDR + UDBHRE:       ; hard read errors
                UDBADDR + UDBHWE:       ; hard write errors
                UDBADDR + UDBPS1:       ; current cylinder/file if disk/tape
                UDBADDR + UDBPS2:       ; current sector/record if disk/tape
                UDBADDR + UDBSPE:       ; soft positioning error
                UDBADDR + UDBHPE:       ; hard positioning error        

                                        ; NOTE - there are several other UDB words
                                        ; including a device dependent portion

                Status bits in UDBSTS (First word of UDB):

                US.OFS==1B0     ; off line or unsafe
                US.CHB==1B1     ; check HOME blocks before any normal I/O
                US.POS==1B2     ; positioning in progress
                US.ACT==1B3     ; active
                US.BAT==1B4     ; on if bad BAT blocks on this unit
                US.BLK==1B5     ; lock bit for this unit's BAT blocks
                US.PGM==1B6     ; dual port switch in (A or B)
                US.MAI==1B7     ; unit is in maintenance mode
                US.MRQ==1B8     ; maintenance mode requested on this unit
                US.BOT==1B9     ; unit is at BOT
                US.REW==1B10    ; unit is rewinding
                US.WLK==1B11    ; unit is write locked
                US.CIP==1B12    ; unit is on a CI port
                US.OIR==1B13    ; operator intervention required, set at
                                ;  interrupt level, checked periodically.
                US.OMS==1B14    ; once a minute message to operator,  used in
                                ;  conjunction with US.OIR.
                US.PRQ==1B15    ; positioning required on this unit
                US.TAP==1B16    ; device type tape
                US.PSI==1B17    ; tape - online/offline/rewind done transition
                US.DSK==1B18    ; Disk type device
                US.OR1==1B19    ; 1st overdue rewind timer bit
                US.OR2==1B20    ; 2nd overdue rewind timer bit
                US.2PT==1B21    ; Drive is potentially dual-ported
                US.ORC==US.OR1!US.OR2; overdue rewind field
                US.TPD==1B22    ; Disk is offline to prevent three ports
                US.BDK==1B23    ; CI broadcast needed
                US.RTY==7B26    ; Retry count field (bits 24,25,26)
                US.CIA==1B27    ; CI available
                US.UNA==1B28    ; Device unavailable (like 16 bit disk)

                BITS 32-35 contain unit type code (.USTYP):

                .UTRP4 = 1      ; RP04
                .UTRS4 = 2      ; RS04 (drum)
                .UTT16 = 3      ; TU16 (TU45)
                .UTTM2 = 4      ; TM02 as a unit
                .UTRP5 = 5      ; RP05
                .UTRP6 = 6      ; RP06
                .UTRP7 = 7      ; RP07
                .UTRP8 = 10     ; RP08
                .UTRM3 = 11     ; RM03
                .UTTM3 = 12     ; TM03 AS A UNIT
                .UTT77 = 13     ; TU77
                .UTTM7 = 14     ; TM78
                .UTT78 = 15     ; TU78
                .UTDXA = 16     ; DX20-A FOR TAPES
                .UTT70 = 17     ; TU70
                .UTT71 = 20     ; TU71
                .UTT72 = 21     ; TU72
                .UTT73 = 22     ; TU7x
                .UTDXB = 23     ; DX20-B FOR DISKS
                .UTP20 = 24     ; RP20
                .UTNOD = 25     ; CI NODE WITH NO MSCP SERVER
                .UTHSC = 26     ; HSC-50
                .UTR80 = 27     ; RA80
                .UTR81 = 30     ; RA81
                .UTR60 = 31     ; RA60
                .UTR82 = 32     ; RA82 (FUTURE)
                .UTR62 = 33     ; RA62 (FUTURE)
                .UTTA7 = 34     ; TA78
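
        When reading a UDBSTS word out of a dump, a small decoder saves some
        bit counting.  The Python sketch below is built only from the lists
        above; the function name is invented.  Note that the text places the
        type code in bits 32-35, yet the listed codes run up to 34 (octal),
        which does not fit in four bits.  The sketch therefore takes the field
        width as a parameter (an assumption) instead of settling the question;
        PHYPAR remains the authority.

```python
# Single-bit flags of UDBSTS, keyed by PDP-10 bit number (bit 0 is leftmost).
US_FLAGS = {
    0: "US.OFS", 1: "US.CHB", 2: "US.POS", 3: "US.ACT", 4: "US.BAT",
    5: "US.BLK", 6: "US.PGM", 7: "US.MAI", 8: "US.MRQ", 9: "US.BOT",
    10: "US.REW", 11: "US.WLK", 12: "US.CIP", 13: "US.OIR", 14: "US.OMS",
    15: "US.PRQ", 16: "US.TAP", 17: "US.PSI", 18: "US.DSK", 19: "US.OR1",
    20: "US.OR2", 21: "US.2PT", 22: "US.TPD", 23: "US.BDK", 27: "US.CIA",
    28: "US.UNA",
}

# Unit type codes (octal), copied from the list above.
UNIT_TYPES = {
    0o1: "RP04", 0o2: "RS04", 0o3: "TU16", 0o4: "TM02", 0o5: "RP05",
    0o6: "RP06", 0o7: "RP07", 0o10: "RP08", 0o11: "RM03", 0o12: "TM03",
    0o13: "TU77", 0o14: "TM78", 0o15: "TU78", 0o16: "DX20-A", 0o17: "TU70",
    0o20: "TU71", 0o21: "TU72", 0o22: "TU7x", 0o23: "DX20-B", 0o24: "RP20",
    0o25: "CI node", 0o26: "HSC-50", 0o27: "RA80", 0o30: "RA81",
    0o31: "RA60", 0o32: "RA82", 0o33: "RA62", 0o34: "TA78",
}


def decode_udbsts(word, type_bits=4):
    """Return (flag names, retry count, unit type name) for a UDBSTS word."""
    flags = [name for bit, name in US_FLAGS.items() if word & (1 << (35 - bit))]
    retry = (word >> (35 - 26)) & 0o7           # US.RTY, bits 24-26
    code = word & ((1 << type_bits) - 1)        # low-order type field
    return flags, retry, UNIT_TYPES.get(code, f"unknown ({code:o})")


if __name__ == "__main__":
    # 102,,157 is the UDB status word from the hung-tape example.
    print(decode_udbsts((0o102 << 18) | 0o157))
```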


        Things are located on the disk as follows (block numbers in octal):

                BLOCK 0:        ; 11 bootstrap
                BLOCK 1:        ; primary HOME block
                BLOCK 2:        ; primary BAT block
                BLOCKS 3-11:    ; reserved
                BLOCK 12:       ; secondary HOME block
                BLOCK 13:       ; secondary BAT block

        The disk addresses of the above are stored in the table HOME, which is
        defined in STG.  The BAT blocks are defined in PROLOG and the HOME
        blocks are defined in DSKALC and PROLOG.

        The SWSKIT programs DS, CHANS, and UNITS can help in displaying some  of
        this information on the running system.



                                   DISK FEATURES OF FILDDT
                                   -----------------------


        The FILDDT shipped after release 4 of TOPS-20 has two  new  commands  in
        relation to disk file structure maintenance.  They are:

              STRUCTURE (FOR PHYSICAL I/O IS) disk-structure
                        Examines the specified disk structure.

              DRIVE (FOR PHYSICAL I/O IS ON CHANNEL) c (CONTROLLER) k (UNIT) u
                      Examines the specified disk unit.

        These are privileged functions and one must ENABLE to use them.

        These two commands are nearly identical.  Their difference is in the way
        the structure is identified.  To use the STRUCTURE command the structure
        must be mounted.  The  STRUCTURE  command  is  useful  for  examining  a
        multi-pack  structure.   The  DRIVE  command is useful for examining the
        file  system  of  a  structure  which  cannot  be   mounted.    Channel,
        controller,  and  unit numbers can be found from the programs UNITS, DS,
        SYSDPY, or OPR.

        Word addressing is in the same format as in other forms of DDT.

        It is easier to understand exactly what  the  disk  will  look  like  in
        FILDDT  if  you  keep in mind that all sectors will be packed in the DDT
        address space, without regard for sector size, starting at  DDT  address
        0.   For  instance, on an RP06 there are four sectors per memory page or
        200 (octal) words per sector.  Therefore, sector zero of  the  structure
        will  begin  at  FILDDT address 0 and end at memory address 177 (octal).
        Sector 1 will begin at address 200 and end at 377.  All  supported  disk
        drives  except  the RP20 have 200 (octal) words per sector.  On the RP20
        there are 1000 (octal) words per sector (one  page).   All  index  block
        addresses  and  most monitor disk addresses are in sectors.  That is why
        it is important to be able to translate  between  sector  addresses  and
        FILDDT memory addresses.
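
        The translation is simple multiplication and division.  A minimal
        Python sketch, with invented function names and the sector sizes
        quoted above, is:

```python
RP06_WORDS_PER_SECTOR = 0o200    # also applies to the other non-RP20 drives
RP20_WORDS_PER_SECTOR = 0o1000   # one page per sector


def sector_to_filddt(sector, words_per_sector=RP06_WORDS_PER_SECTOR):
    """FILDDT address of the first word of a sector."""
    return sector * words_per_sector


def filddt_to_sector(address, words_per_sector=RP06_WORDS_PER_SECTOR):
    """Return (sector number, word offset within the sector)."""
    return divmod(address, words_per_sector)


if __name__ == "__main__":
    for sector in (0, 1, 2):
        first = sector_to_filddt(sector)
        last = first + RP06_WORDS_PER_SECTOR - 1
        print(f"sector {sector:o}: {first:o}-{last:o}")
```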

        The FILDDT option of ENABLE PATCHING is also available for use with the
        DRIVE and STRUCTURE commands.  With this option on, the user is able to
        modify specific words on the structure.  Another very convenient  FILDDT
        command  one  may  use  in  conjunction  with  the disk commands is LOAD
        (symbols from) input file spec.  One may specify any  file  here  but  a
        useful one is SYSTEM:MONITR.EXE.  The monitor's symbol table has HOM
        block sector addresses, FDB offsets, etc.  When a file's symbols are
        loaded, one may also define his own symbols.  This is useful to remember
        addresses of data structures on the units.  For example,  after  finding
        the  index  block  to  a file, one could define a symbol, FILIDX at that
        address for easy referencing later on.

        When examining a  multi-pack  structure  using  the  STRUCTURE  command,
        addressing  the  first unit is exactly as if there were only one unit in
        the structure.  FILDDT addresses of sectors on the other units begin
        immediately after the last address for the first unit of the structure.
        For example, consider that we would like to examine the BAT  blocks  for
        the second unit of a two pack STR:  on RP06 drives.

        An RP06 contains 304000. sectors per unit and 128. words per sector (a
        trailing period marks a decimal number).  The first FILDDT address for
        the second unit of a two-pack RP06 STR: is therefore
        304000.*128.=38912000. or 224340000 (octal).
                
        FILDDT>LOAD (SYMBOLS FROM) SYSTEM:MONITR.EXE
        [22722 symbols loaded from file]
        FILDDT>STRUCTURE (FOR PHYSICAL I/O IS) PS:
        [Looking at file structure PS:]

                ; starting address of second unit in structure plus sector
                ; address of BAT blocks (2) times words per sector gives
                ; FILDDT address of start of BAT blocks for that unit

        224340000+2*200=224,,340400
        224,,340400[   424164,,0   $6T;   BAT   ; Found it
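
        The same arithmetic can be reproduced in Python as a cross-check.  The
        function names are invented, and the sketch assumes that every unit in
        the structure is the same size, as in the two-pack RP06 case above.

```python
def unit_base(unit_index, sectors_per_unit, words_per_sector=128):
    """First FILDDT address of a unit (unit 0 is the first pack)."""
    return unit_index * sectors_per_unit * words_per_sector


def ddt_halves(address):
    """Format an address as DDT prints a word: lh,,rh in octal."""
    return f"{address >> 18:o},,{address & 0o777777:o}"


if __name__ == "__main__":
    base = unit_base(1, 304000)          # second RP06 of the structure
    bat = base + 2 * 0o200               # primary BAT block is sector 2
    print(f"{base:o} {ddt_halves(bat)}")
```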
                
        For another example, let's say we would like to find the start of  the
        ROOT-DIRECTORY symbol table:

        NWSEC=200               ; number of words per sector
        HM1BLK=1                ; sector number of HOM block
        HOMRXB=10               ; offset for index block of ROOT-DIRECTORY
            
                                ; HOM block sector number times words
                                ; per sector equals address of HOM start
        HM1BLK*NWSEC[   505755,,0   $6T;HOM  
        HM1BLK*NWSEC+HOMRXB[   10,,5740 ; plus offset to address of index block
                                ; sector number of index block times
                                ; words per sector gives address of
        5740*NWSEC[   10,,5744  ; ROOT-DIRECTORY index block
                                ; NOTE:  Bit 14 (DSKAB) specifies this
                                ; address as a disk sector address.
                                ; Sector addresses are in bits 15-35.
        RTDIDX:                 ; define symbol for index block here
                                ; sector number of first page of
                                ; ROOT-DIRECTORY times number of words
                                ; per sector gives the address of first
        5744*NWSEC[   400300,,100 ; page of ROOT-DIRECTORY
        RTDIR0:                 ; define start of page 0 of ROOT-DIR
        RTDIR0+3[   30610       ; plus 3 for start of symbol table
                                ; NOTE: adr is a 'directory address'
                                ;       offset 610 of directory page 30
        RTDIDX+30[   10,,6250   ; get sector adr of page 30 of ROOT-DIR
                                ; sector adr of page 30 times words per
                                ; sector gives address of page 30 of
                                ; ROOT-DIRECTORY.
        6250*NWSEC+610[   400400,,1 ; Add offset for symbol table start
        RTDSYM:                 ; Define a symbol here
        ^E
        FILDDT>EXIT
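
        The chain of lookups in this session (directory address to directory
        page, index block entry to sector, sector to FILDDT address) can be
        summarized in a Python sketch.  The function names are invented, the
        index block is modeled as a plain mapping from directory page number
        to index block word, and only the facts noted in the session comments
        are used: bit 14 (DSKAB) flags a disk address, bits 15-35 hold the
        sector, and a directory address is page*1000+offset (octal).

```python
NWSEC = 0o200                    # words per sector (not valid for RP20)
DSKAB = 1 << (35 - 14)           # bit 14: word holds a disk sector address
SECTOR_MASK = (1 << 21) - 1      # bits 15-35


def is_disk_address(word):
    return bool(word & DSKAB)


def sector_of(word):
    """Sector number held in an index block word."""
    return word & SECTOR_MASK


def split_directory_address(adr):
    """Return (directory page, offset within the page)."""
    return divmod(adr, 0o1000)


def directory_address_to_filddt(adr, index_block, nwsec=NWSEC):
    """FILDDT address of a 'directory address' via the index block."""
    page, offset = split_directory_address(adr)
    return sector_of(index_block[page]) * nwsec + offset


if __name__ == "__main__":
    root_xb = {0o30: (0o10 << 18) | 0o6250}      # RTDIDX+30 from the session
    print(f"{directory_address_to_filddt(0o30610, root_xb):o}")
```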



                            SUPPORTED DISK DRIVE PARAMETERS
                            -------------------------------



           TYPE  SIZE (PAGES)   MEDIA   #/STRUCTURE(1)  CONTROLLER      CPU
           ----  ------------   -----   --------------  ----------      ---

           RP04    38,000.      Pack       6            Massbus         KL(2)

           RP05    38,000.      Pack       6            Massbus         KL(2)

           RP06    76,000.      Pack       3(3)         Massbus         KL/KS

           RM03    30,340.      Pack       2            Massbus         KS

           RP20   201,420.      Fixed      3(4)         Massbus+DX20B   KL

           RP07   216,376.      Fixed      2            Massbus         KL

           RA80    53,508.      Fixed      6            CI20+HSC50      KL

           RA81   200,928.      Fixed      3            CI20+HSC50      KL

           RA60    90,516.      Pack       3            CI20+HSC50      KL


           (1) -- depends on addressing, MXPGUN, MXSTRU, and BTBSIZ; SPD is final
           (2) -- disk model no longer sold
           (3) -- 2 packs/structure on a KS or Model A machine
           (4) -- 1 spindle/structure on Model A machines



                            SUPPORTED TAPE DRIVE PARAMETERS
                            -------------------------------



           TYPE  SPEED    DENSITY  #/CONTROLLER   CONTROLLER    CPU     NOTES
           ----  -----    -------  ------------   ----------    ---     -----

           TU45  75ips   800/1600   8(KL)/4(KS)   TM02/TM03     KL/KS    (1)(2)

           TU70   100    800/1600        8        DX20-A/TX02   KL       (1)

           TU71   100     556/800        8        DX20-A/TX02   KL       (1)(3)

           TU72   100   1600/6250        8        DX20-A/TX02   KL       (4)

           TU77   125    800/1600        4        TM02/TM03     KL/KS    (2)

           TU78   125   1600/6250        4        TM78          KL

           TA78   125   1600/6250        4        HSC50/TS78    KL       (5)


           (1) -- tape model no longer sold
           (2) -- TM02 controller no longer sold
           (3) -- 7 track model
           (4) -- TX05 option allows 16 drives/DX20 using 2 TX02s
           (5) -- Planned for some TOPS-20 release after 6.0


                            TOPS-20 SCHEDULER TEST ROUTINES
                            -------------------------------

        The following is a tabulation of (hopefully) all of the scheduler  tests
        used by the TOPS-20 monitor, time-frame approximately Release 6.1.  This
        includes ARPA and DECNET tests.  This is  the  data  one  finds  in  the
        monitor table FKSTAT indexed by fork number for forks which have blocked
        and left the GOLST (i.e.  LH(FKPT) contains WTLST).  The format  of  the
        FKSTAT table words is TEST DATA,,TEST ROUTINE ADDRESS.  In Version 6.0
        and later, table FKSTA2 sometimes contains additional data.  The scheduler
        test  routines  are called periodically to determine if a process can be
        unblocked.  This is indicated by a skip return from the scheduler  test.
        A nonskip return is taken if the process cannot yet be unblocked.
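
        The skip/nonskip convention can be modelled outside the monitor.  The
        following Python sketch is illustrative only.  The real tests are
        monitor routines called with the test word in T1;  here a skip return
        is modelled as True and a nonskip return as False.  The names
        poll_waiters, memory and waiters are hypothetical, and the dictionary
        merely stands in for monitor memory.

```python
# Model of scheduler tests: each test takes its data and returns True
# for a "skip return" (unblock) or False for a "nonskip return".

def diset(memory, address):
    """Skip when the word at ADDRESS is zero (DISET entry in the table)."""
    return memory[address] == 0

def disnt(memory, address):
    """Skip when the word at ADDRESS is non-zero (DISNT entry)."""
    return memory[address] != 0

def poll_waiters(memory, waiters):
    """Run each blocked fork's test once; return forks that may be unblocked.

    `waiters` maps a fork number to a (test, data) pair, standing in for
    the routine address and test data held in that fork's FKSTAT word.
    """
    runnable = []
    for fork, (test, data) in sorted(waiters.items()):
        if test(memory, data):      # skip return: fork can be unblocked
            runnable.append(fork)
    return runnable

# Hypothetical monitor words and blocked forks, for illustration only.
memory = {0o1000: 0, 0o1001: 5}
waiters = {3: (diset, 0o1001), 7: (disnt, 0o1001), 9: (diset, 0o1000)}
print(poll_waiters(memory, waiters))
```

        A fork whose test keeps taking the nonskip return stays blocked, which
        is why the test named in FKSTAT identifies what a hung fork is waiting
        for.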

        When examining the monitor because of a hung job  or  fork,  the  FKSTAT
        table  can  often reveal the reason the fork is hung, and this sometimes
        even allows corrective action to be taken.
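
        Once the test routine named in a FKSTAT word is known, the reason for
        the wait can be looked up mechanically.  The sketch below is a minimal
        Python illustration of that lookup.  The four entries are copied from
        the table in this section;  the names SCHEDULER_TESTS and
        describe_wait are hypothetical.

```python
# A few entries copied from the scheduler test table in this section:
# test name -> (defining module, condition being waited for).
SCHEDULER_TESTS = {
    "DISNT":  ("SCHED",  "contents of ADDRESS to be non-zero"),
    "ENQTST": ("ENQ",    "the lock on ENFKTB+FORK #"),
    "TCITST": ("TTYSRV", "line inactive, no fork in input wait, "
                         "or input buffer non-empty"),
    "UDWDON": ("PHYSIO", "IS.DON to set in IRBSTS for this IORB"),
}

def describe_wait(test_name):
    """Return a one-line explanation of why a fork is blocked."""
    name = test_name.upper()
    entry = SCHEDULER_TESTS.get(name)
    if entry is None:
        return f"{name}: not in this excerpt of the table"
    module, condition = entry
    return f"{name} [{module}]: waiting for {condition}"

print(describe_wait("enqtst"))
```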

        Each entry in the table below gives the routine name, the word you
        should expect to see in the FKSTAT table, and the module in which the
        scheduler test is defined, followed by a short description of the
        condition being tested.  Use SYSDPY to view the running system.
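
        The LH,,RH notation used in the table denotes the two 18-bit halves of
        a 36-bit word.  The Python sketch below shows one way to split a
        FKSTAT word taken from a dump into its test data and routine address.
        The function names are hypothetical and the sample address is made up
        for illustration.

```python
HALF_MASK = 0o777777  # 18 bits

def split_halves(word):
    """Split a 36-bit word into (left half, right half)."""
    if not 0 <= word < 1 << 36:
        raise ValueError("not a 36-bit word")
    return (word >> 18) & HALF_MASK, word & HALF_MASK

def format_fkstat(word):
    """Render a FKSTAT word in the LH,,RH octal notation used in the table."""
    data, routine = split_halves(word)
    return f"{data:o},,{routine:o}"

# Hypothetical example: test data 5 in the LH and a made-up routine
# address of 123456 (octal) in the RH.
word = (5 << 18) | 0o123456
print(format_fkstat(word))   # prints 5,,123456
```

        The right half would then be matched against the monitor's symbol
        table to find the routine name.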

        Tests listed as defined in PAGUTL are found in PAGEM in monitors
        earlier than Release 6.0.



                                    SCHEDULER TESTS


         TEST           CONTENTS OF T1 AT TIME OF SCHEDULER CALL        DEFINED
         ----           ----------------------------------------        -------

        BATTST          [UNIT #,,BATTST]                                [DSKALC]
                        Wait for US.BLK, the lock bit for the BAT blocks
                        on the unit, in the UDB to be zero.

        BLOCKM          [TIME,,BLOCKM]                                  [SCHED]
                        Wait for TIME in BLOCKM format which is the low
                        order 17 bits of the desired future time to be
                        compared against a suitably masked TODCLK.

        BLOCKT          [TIME,,BLOCKT]                                  [SCHED]
                        Wait for TIME in BLOCKT format which is a
                        value that is shifted left 10 bits and compared
                        against a suitably masked TODCLK, providing a
                        longer delay than BLOCKM, but less precision.

        BLOCKW          [TIME,,BLOCKW]                                  [SCHED]
                        Wait for TIME in BLOCKW format (same as BLOCKM).

        CDRBLK          [UNIT NUMBER,,CDRBLK]                           [CDRSRV]
                        Wait for card-reader offline, or not waiting for
                        a card.

        CFGVOT          [0,,CFGVOT]                                     [CFSSRV]
                        Wait for HSHTFW, HSHWVT, or HSHUGD to be set in
                        block.  FKSTA2 has pointer to block.

        CFRCNW          [TIME,,CFRCNW]                                  [CFSSRV]
                        Wait for DLYLOK to be non-positive or BLOCKT form
                        time to have expired.

        CFSRWT          [0,,CFSRWT]                                     [CFSSRV]
                        Wait for block released, wakeup timer, or no more
                        users of block.  FKSTA2 has pointer to block.

        CHKLOK          [ADDRESS,,CHKLOK]                               [NSPSRV]
                        Wait for NSP block lock at address to free.

        CLOTST          [0,,CLOTST]                                     [NIUSR]
                        Wait for PRCCP (port is closed) to be set in the
                        port block.  FKSTA2 has pointer to port block.

        COFTST          [TIME,,COFTST]                                  [MEXEC]
                        Wait for job in FKJOBN to be attached or time
                        in BLOCKT form to elapse.

        CTMTST          [0,,CTMTST]                                     [CTHSRV]
                        Wait for listen (MSGCWL), linked block on output
                        (MSGBLW), queued CTERM lines (CTMATN), or queued
                        DCN links (MSGATN) set.

        D6BWT           [INDEX,,D6BWT]                                  [DTESRV]
                        Wait for D6STS(INDEX) to be .GE. zero, indicating
                        a free condition.

        D6DWT           [INDEX,,D6DWT]                                  [DTESRV]
                        Wait for D6%DDN to be set in D6STS(INDEX) to
                        indicate read data done.

        D6RWT           [INDEX,,D6RWT]                                  [DTESRV]
                        Wait for D6%RDN to be set in D6STS(INDEX) to
                        indicate response header.

        D6WKT           [INDEX,,D6WKT]                                  [DTESRV]
                        Wait for timer in D6CLK(INDEX) to expire.

        DBWAIT          [DTE #,,DBWAIT]                                 [DTESRV]
                        Wait for the TO-10 doorbell from the given DTE.

        DGLTST          [0,,DGLTST]                                     [DIAG]
                        Wait for DIAGLK lock to be free.

        DGUIDL          [UDB ADDRESS,,DGUIDL]                           [DIAG]
                        Wait for the unit to show as idle in the UDB.

        DGUTST          [UDB ADDRESS,,DGUTST]                           [DIAG]
                        Wait for the maintenance bit to set in the UDB.

        DISET           [ADDRESS,,DISET]                                [SCHED]
                        Wait for contents of ADDRESS to be zero.

        DISGET          [ADDRESS,,DISGET]                               [SCHED]
                        Wait for contents of ADDRESS to be non-negative
                        (greater than or equal to zero).

        DISGT           [ADDRESS,,DISGT]                                [SCHED]
                        Wait for contents of ADDRESS to be greater than
                        zero.

        DISLT           [ADDRESS,,DISLT]                                [SCHED]
                        Wait for contents of ADDRESS to be less than
                        zero.

        DISNT           [ADDRESS,,DISNT]                                [SCHED]
                        Wait for contents of ADDRESS to be non-zero.

        DSKRT           [PAGE #,,DSKRT]                                 [PAGEM]
                        Wait for CSTAGE for PAGE # to not be PSRIP,
                        meaning disk read completed.

        DWRTST          [PAGE #,,DWRTST]                                [PAGUTL]
                        Wait for DRWBIT to clear in CST3(PAGE #),
                        meaning write completed.

        ENQTST          [FORK #,,ENQTST]                                [ENQ]
                        Wait for the lock on ENFKTB+FORK #.

        FEBWT           [ADDRESS OF FE UDB,,FEBWT]                      [FESRV]
                        Wait for EOF or input bytes available from FE.
                        Wake also on invalid assignment.

        FEDOBE          [ADDRESS OF FE UDB,,FEDOBE]                     [FESRV]
                        Wait for output buffer empty and all bytes are
                        acknowledged by the FE.  Wake also on invalid
                        assignment.

        FEFULL          [ADDRESS OF FE UDB,,FEFULL]                     [FESRV]
                        Wait for the current count of output bytes to be
                        less than the count of bytes in the interrupt
                        buffer.  Wake also on invalid assignment.

        FORCTM          [SUPERIOR FORK INDEX,,FORCTM]                   [SCHED]
                        Identifiable wait forever, forced termination.

        FRZWT           [PREVIOUS TEST,,FRZWT]                          [FORK]
                        Identifiable wait forever, frozen fork.

        HALTT           [SUPERIOR FORK INDEX,,HALTT]                    [SCHED]
                        Identifiable wait forever for halted fork.

        HIBERT          [TIME,,HIBERT]                                  [SCHED]
                        Wait for TIME in BLOCKT format.

        INTBOT          [BIT #,,INTBOT]                                 [IPIPIP]
                        Wait for bit in INTWTB table to be zero.

        INTBPT          [0,,INTBPT]                                     [IPIPIP]
                        Wait for internet fork to be runnable (INTFLG
                        nonzero or INTTIM has passed).

        INTBZT          [BIT #,,INTBZT]                                 [IPIPIP]
                        Wait for bit in INTWTB table to be one.

        INTOOT          [<BIT1>B8+<BIT2>B17,,INTOOT]                    [IPIPIP]
                        Wait for either or both of two bits in INTWTB to
                        be set.

        INTZOT          [<BIT1>B8+<BIT2>B17,,INTZOT]                    [IPIPIP]
                        Wait for either bit1 zero or bit2 one or both
                        conditions in INTWTB.

        J0TCOT          [LINE #,,J0TCOT]                                [TTYSRV]
                        Wait for Job 0 output to CTY to complete, with
                        timeout checks.

        JB0TST          [TIME,,JB0TST]                                  [MEXEC]
                        Wait for JB0FLG set nonzero for explicit request
                        or time in BLOCKT form to elapse.

        JRET            [0,,JRET]                                       [SCHED]
                        Wait forever, interruptible.

        JSKP            [0,,JSKP]                                       [SCHED]
                        Unconditional skip used to schedule immediately.

        JTQWT           [0,,JTQWT]                                      [SCHED]
                        Wait for JSYS trap queue.

        LCKTSS          [ADDRESS,,LCKTSS]                               [IO]
                        Wait for lock at ADDRESS to unlock, lock it.

        LKDSPT          [0,,LKDSPT]                                     [STG]
                        Wait for room in LDTAB table of directories
                        currently locked.

        LKDTST          [INDEX INTO LDTAB,,LKDTST]                      [STG]
                        Wait for bit in LCKDBT to clear, indicating
                        directory unlocked.

        LODWAT          [ADDRESS OF STATUS WORD,,LODWAT]                [LINEPR]
                        Wait for flag LP%LHC to set in the addressed
                        word, indicating loading has completed of the
                        VFU or RAM file.

        LOKWAI          [LINE #,,LOKWAI]                                [CTHSRV]
                        Wait for link error or CTERM link can send now.

        LPTDIS          [UNIT ADDRESS,,LPTDIS]                          [LINEPR]
                        Wait for an error condition on the addressed
                        unit, or for all buffers cleared and no bytes
                        still in the front-end, before finishing close
                        operation on the device.

        MTARWT          [IORB ADDRESS,,MTARWT]                          [MAGTAP]
                        Wait for IRBFA in the IORB to indicate that this
                        IORB is no longer active.

        MTAWAT          [UNIT #,,MTAWAT]                                [MAGTAP]
                        Wait for all outstanding IORBs for unit to be
                        finished.

        MTDWT1          [UNIT #,,MTDWT1]                                [MAGTAP]
                        Wait for the count of outstanding requests on the
                        unit to go to one.

        NIDLST          [0,,NIDLST]                                     [DNADLL]
                        Wait for NIDLOK read counter lock to be free (-1).

        NISCHK          [0,,NISCHK]                                     [DNADLL]
                        Wait for RCCFLG data returned flag to be negative.

        NSPLWB          [0,,NSPLWB]                                     [LLINKS]
                        Wait for NSP lock NSPLOK to be free.

        NSPTST          [0,,NSPTST]                                     [NSPSRV]
                        Wait for KDPFLG nonzero, indicating KMC11 wants
                        service, or MSGQ nonzero, indicating messages to
                        process.

        NVTNTT          [<0:8>OPTION #,<9:17>LINE #,,NVTNTT]            [TVTSRV]
                        Wait for completed NVT negotiation.

        OFNLKT          [OFN,,OFNLKT]                                   [PAGUTL]
                        Wait for OFN unlocked--SPTLKB zero in SPTH(OFN).

        PIDWAT          [FORK #,,PIDWAT]                                [IPCF]
                        Wait for bit for fork in PDFKTB to set.

        RCCTST          [0,,RCCTST]                                     [NIUSR]
                        Wait for PSI pending for fork or callback received.
                        FKSTA2 has pointer to port block.

        RCRWAI          [REQUEST #,,RCRWAI]                             [LLMOP]
                        Wait for request complete in request block (RB).
                        RB address in FKSTA2.

        RCVTST          [0,,RCVTST]                                     [NIUSR]
                        Wait for PSI pending for fork or buffers available
                        in the port block.  FKSTA2 has pointer to port block.

        RLDTST          [0,,RLDTST]                                     [DTESRV]
                        Wait for master DTE running.

        SALTST          [TIME,,SALTST]                                  [TTYSRV]
                        Wait for SALLCK unlocked or time passed.

        SALWAT          [LINE #,,SALWAT]                                [TTYSRV]
                        Wait for line to finish using sendall pointer.

        SCJBLK          [0,,SCJBLK]                                     [SCJSYS]
                        Wait for PTBLK (port blocked) to be zero in the
                        port block.  FKSTA2 has pointer to port block.

        SCTLWB          [0,,SCTLWB]                                     [SCLINK]
                        Wait for session control lock SCTLOK to be free.

        SEBTST          [0,,SEBTST]                                     [SYSERR]
                        Wait for SECHKF to go nonzero before starting
                        Job 0 task to write queued SYSERR entries.

        SEEALL          [0,,SEEALL]                                     [TTYSRV4]
                        Wait for SNDALL to go to zero, indicating the
                        send-all buffer available.

        SJBGON          [0,,SJBGON]                                     [SCJSYS]
                        Wait for SLB associated with SJB to be disposed of.
                        FKSTA2 has pointer to SJB.

        SPCTST          [0,,SPCTST]                                     [DTESRV]
                        Wait for a node.

        SPMTST          [0,,SPMTST]                                     [PAGUTL]
                        Wait for page in SPMTPG to be on SPMQ or the
                        time SPMTIM to expire.

        SQLTST          [0,,SQLTST]                                     [IMPDV]
                        Wait for the special queues lock SQLCK and lock
                        it.

        STRTST          [SDB ADDRESS OF STRUCTURE,,STRTST]              [MSTR]
                        Wait for the structure lock to be free.

        STSWAT          [ADDRESS OF STATUS WORD,,STSWAT]                [CDRSRV]
                        Wait for flag CD%SHA to come on in the addressed
                        word, indicating that card-reader status has
                        arrived.

        STSWAT          [ADDRESS OF STATUS WORD,,STSWAT]                [LINEPR]
                        Wait for flag LP%SHA to set in the addressed
                        word, indicating that printer status has
                        arrived.

        SUSFKT          [FORK #,,SUSFKT]                                [FORK]
                        Wait for fork to be on WTLST in either SUSWT
                        or FRZWT.

        SWPRT           [PAGE #,,SWPRT]                                 [PAGEM]
                        Wait for CSTAGE for PAGE # to not be PSRIP,
                        meaning swap read completed.

        SWPWTT          [0,,SWPWTT]                                     [PAGEM]
                        Wait for NRPLQ nonzero.  Increment CGFLG each
                        time test is unsuccessful.

        TCIPIT          [FORK #,,TCIPIT]                                [TTYSRV]
                        Wait for no interrupts pending for FORK #.

        TCITST          [LINE #,,TCITST]                                [TTYSRV]
                        Wait for line inactive, no fork in input wait,
                        or input buffer non-empty.

        TCOTST          [LINE #,,TCOTST]                                [TTYSRV]
                        Wait for line inactive, or output buffer not
                        too full to add a character to it.

        TCPABT          [FORK #,,TCPABT]                                [TCPBBN]
                        Wait for all TCP connection aborts completed.

        TCPOTS          [<TOPNF>B8+<TERRF>B17,,TCPOTS]                  [TCPJFN]
                        Wait for TCP connection open, or error state.
                        FKSTA2 has host number index.

        TCPTST          [ADDR,,TCPTST]                                  [TCPJFN]
                        Wait for TCP%DN (buffer done) on in word .TCPBF
                        of block at ADDR.

        TRMTS1          [0,,TRMTS1]                                     [FORK]
                        Identifiable wait forever for inferior fork
                        termination.

        TRMTST          [FORK #,,TRMTST]                                [FORK]
                        Wait for FORK # to be on WTLST for either HALTT
                        or FORCTM.

        TRP0CT          [MINIMUM NRPLQ,,TRP0CT]                         [PAGEM]
                        Wait for NRPLQ to be above stated minimum or
                        normal minimum.  Increment CGFLG each time
                        test is unsuccessful.

        TSACT1          [LINE #,,TSACT1]                                [TTYSRV]
                        Wait until line inactive, becoming active, or
                        has a full length dynamic block assigned.

        TSACT2          [LINE #,,TSACT2]                                [TTYSRV]
                        Wait for line available--inactive or fully
                        active.

        TSACT3          [LINE #,,TSACT3]                                [TTYSRV4]
                        Wait for line inactive--dynamic data unlocked.

        TSTSAL          [0,,TSTSAL]                                     [TTYSRV4]
                        Wait for SALCNT to go to zero, indicating the
                        send-all is finished for this buffer.

        TTBUFW          [NUMBER,,TTBUFW]                                [TTYSRV]
                        Wait for NUMBER of buffers.

        TTIBET          [LINE #,,TTIBET]                                [TTYSRV]
                        Wait for line inactive or input buffer empty.

        TTOAV           [LINE #,,TTOAV]                                 [TTYSRV]
                        Wait for line inactive and output buffer not
                        empty.

        TTOBET          [LINE #,,TTOBET]                                [TTYSRV]
                        Wait for line inactive or output buffer empty.

        UDITST          [0,,UDITST]                                     [PHYSIO]
                        Wait for at least two free IORBs on UIOLST.

        UDWDON          [IORB ADDRESS,,UDWDON]                          [PHYSIO]
                        Wait for IS.DON to set in IRBSTS for this IORB.

        USGWAT          [0,,USGWAT]                                     [JSYSA]
                        Wait for lock on queued USAGE blocks to free.

        VBWAIT          [0,,VBWAIT]                                     [CFSSRV]
                        Wait for VOTQ to be nonzero (vote buffers are
                        available).

        VOTDWT          [0,,VOTDWT]                                     [CFSSRV]
                        Wait for HSHVRS set (restart) or HSHDLY zero or
                        all votes in.  FKSTA2 has pointer to block.

        VOTSWT          [0,,VOTSWT]                                     [CFSSRV]
                        Wait for HSHVRS set (restart) or all votes in.
                        FKSTA2 has pointer to block.

        VVBWAT          [UNIT #,,VVBWAT]                                [TAPE]
                        Wait for the MDA to reset TPVV handling EOV.

        WTFKT           [FORK #,,WTFKT]                                 [FORK]
                        Wait for fork to be on WTLST.

        WTSPTT          [PAGE #,,WTSPTT]                                [SCHED]
                        Wait for share count on PAGE # to go to 1.

        XMTTST          [0,,XMTTST]                                     [NIUSR]
                        Wait for PSI pending for fork or PRTIP (transmit
                        still in progress) zero in port block.  FKSTA2 has
                        pointer to port block.
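
        Entries such as INTOOT, INTZOT and TCPOTS pack two values into the
        left half using the form <X>B8+<Y>B17.  In MACRO notation the suffix
        Bn positions a value so that its low-order bit falls at bit n of the
        word, bit 0 being the high-order bit, so the first value occupies
        bits 0-8 and the second bits 9-17.  The Python sketch below packs and
        unpacks such a word.  It assumes each value fits in 9 bits;  the
        function names and sample values are hypothetical.

```python
HALF = 0o777777   # 18-bit half word mask
FIELD = 0o777     # 9-bit field mask

def pack_test_word(first, second, routine):
    """Build <first>B8+<second>B17,,routine as a 36-bit integer."""
    if not (0 <= first <= FIELD and 0 <= second <= FIELD):
        raise ValueError("each field must fit in 9 bits")
    if not 0 <= routine <= HALF:
        raise ValueError("routine address must fit in 18 bits")
    # Bit 8 (counting from the high-order bit 0) is 27 places from the
    # low-order end of the word; bit 17 is 18 places from it.
    return (first << 27) | (second << 18) | routine

def unpack_test_word(word):
    """Return (first field, second field, routine address)."""
    return (word >> 27) & FIELD, (word >> 18) & FIELD, word & HALF

# Hypothetical values: fields 3 and 12 (octal) and a made-up address.
word = pack_test_word(0o3, 0o12, 0o54321)
print(f"{word:012o}")            # prints 003012054321
print(unpack_test_word(word))
```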

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 62
        TOPS-20 Page Zero Locations




                              TOPS-20 PAGE ZERO LOCATIONS
                              ---------------------------




        The following text outlines the uses of  memory  in  page  zero  of  the
        TOPS-20 monitor as of Release 6.1.
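
        When a dump is examined, a few of the locations listed below can be
        checked against the values this table says they contain.  The Python
        sketch below is a minimal illustration that tests for the V6 BOOT
        communications region values given for BUTCOD and BUTLEN, and
        interprets a half word as a signed count, as in the -6,,NPVARZ value
        shown for BUTPHY.  Addresses are octal.  The page_zero dictionary and
        the function names are hypothetical stand-ins for dump contents.

```python
BUTCOD_ADDR = 0o200
BUTLEN_ADDR = 0o201
BTCOD = (0o707707 << 18) | 0o707707   # 707707,,707707
BTLEN = 5

def looks_like_v6_monitor(page_zero):
    """True if the words match the V6 values listed for BUTCOD and BUTLEN."""
    return (page_zero.get(BUTCOD_ADDR) == BTCOD
            and page_zero.get(BUTLEN_ADDR) == BTLEN)

def signed_half(half):
    """Interpret an 18-bit half word as a two's-complement number."""
    if not 0 <= half <= 0o777777:
        raise ValueError("not an 18-bit half word")
    return half - (1 << 18) if half & 0o400000 else half

# Hypothetical dump contents, for illustration only.
page_zero = {0o200: 0o707707707707, 0o201: 5}
print(looks_like_v6_monitor(page_zero))
print(signed_half(0o777772))   # LH of a -6,,NPVARZ word; prints -6
```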


        ADDR   MNEMONIC USAGE
        ====   ======== =====

          0-17    --    Shadow ACs, not used.

         20     SCTLW   Scheduler halt request word (see SWTST in SCHED).   Word
                         of   function  bits,  current  functions  include  Halt
                         timesharing, wait for system down,  manual  pause,  and
                         reset FE protocol.

         21       --    Used by BOOT to build CCW lists (unused by monitor).

         22       --    Same as 21;  both unused for KS10 systems.

         23     CRSHTM  Initial time for  reload;   -1  =>  time  not  set  yet.
                         Contains   the  date/time  that  the  system  was  last
                         reloaded.   May  see  -1  after  forced  reload  on  KS
                         processor.   BUGSTO  (APRSRV) copies TADIDT into it for
                         each BUGHLT/CHK/INF.

         24     SEBQOU  Pointer to queued SYSERR blocks not yet written.

         25     MMAPWD  Pointer to MMAP for SETSPD.  Contains MMAP.

         26     BUGHAD  Code around SYSLD1 (STG) puts LH into  BUGCHK,  RH  into
                         BUGHLT  after  a  reload.   No  one else uses it, so it
                         should contain zero.

         27     CRSTD1  Current time is saved here on each BUGHLT/CHK/INF.  This
                         is the value that gets into the SYSERR block.  Contains
                         the   date/time   for   the   system's   most    recent
                         BUGHLT/CHK/INF.

         30     SHLTW   Scheduler halt word.  Depositing a nonzero value here
                         requests system shutdown.

         31     RLWORD  KS  only;   used  for  front-end  communication,  flags,
                         keep-alive, etc.  (see PROKS).  Unused on KL.

         32     CTYIWD  KS only;  front-end communication word used as the CTY
                         input location.  Unused on KL.

         33     CTYOWD  KS only;  front-end communication word used as the CTY
                         output location.  Unused on KL.

         34     KLIIWD  KS only;  front-end communication word used as the
                         KLINIK input location.  Unused on KL.

         35     KLIOWD  KS only;  front-end communication word used as the
                         KLINIK output location.  Unused on KL.

         36       --    Unused/reserved.  Holds KS RHBASE during boot.

         37       --    Unused/reserved.  Holds KS unit number during boot.

         40     .JBUUO  Monitor's location 40.  Holds KS tape info during boot.

         41     .JB41   Monitor's LUUO dispatch word.
                         Contains XPCW LUUBLK.

         42-43    --    Unused/reserved.

         44     .JBREL  Job Data Area word filled in by LINK.  Contains 777.

         45-67    --    Unused/reserved.

         70     PWRTRP  Location executed by the front-end on powerfail restart.
                         Contains JRST PWRRST.

         71     RLDADR  Executed  by  the  front-end  on  certain   (keep-alive)
                         reloads.   APRSRV  demands  this  location be PWRTRP+1.
                         Contains XPCW RLODPC which winds up  at  RLDHLT  for  a
                         KPALVH BUGHLT.

         72       --    Contains address of EDDTF word.

         73     CRSTAD  Is supposed to contain date/time of last crash.  Code in
                         STG  checks  it  to  decide  to  restore  the data from
                         BUGHAD.  During system startup for KL-10s the  word  is
                         used   to   set   the   reload  date/time  if  nonzero.
                         Apparently it gets no real  use  on  KS-10s.   Contains
                         zero while system is in normal operation.

         74     .JBDDT  JOBDDT location.
                         Contains DDTZ (EDDT entry point).

         75     .JBHSO  Unused/reserved.

         76       --    Contains address of DBUGSW word.

         77       --    Contains address of DCHKSW word.

        100-107   --    Reserved for use by the front-end command language.

        110     STSBLK  KL-Status block pointer, virtual address.  Contains zero
                         if status reporting is not enabled.

        111       --    Physical address (MAP) of above virtual address.

        112     .JBEDV  Pointer to Exec Data Vector.
                         Contains MONEDV.

        113     SPRCNT  Running count of SPEAR blocks queued.
                         Used by SETSPD.  Initialized to -1.

        114-117   --    Unused/reserved.

        120     .JOBSA  TOPS-10 style start address.
                         Contains NPVARZ+1,,EVGO.

        121     .JBFF   Contains first free address not loaded by LINK.
                         Contains NPVARZ+1.

        122-132   --    Unused/reserved.

        133     .JBCOR  Job Data Area location set by LINK.  LH contains highest
                         low  segment  address loaded with data.  RH refers to a
                         SAVE argument for highest page.

        134-136   --    Unused/reserved.

        137     .JBVER  Job Data Area version number word.
                         Contains current monitor version number.

        140     EVDDT   Monitor startup transfer vector;  enter EDDT.
                         Contains JRST DDTZ.

        141       --    Reset and go to EDDT location.
                         Contains JRST SYSDDT.

        142     EVDDT2  Copy of 140.
                         Contains JRST DDTZ.

        143     EVSLOD  Entry to initialize file system, used for installation.
                         Contains JRST SYSLOD.

        144     EVVSM   Entry to verify swappable monitor on startup.
                         Contains JRST SYSVSM.

        145     EVRST   Restart the system location.
                         Contains JRST SYSRST.

        146     EVLDGO  Reload and start the system location.
                         Contains JRST SYSGO.

        147     EVGO    Start the monitor location.
                         Contains JRST SYSGO1.

        150     DDTPRS  DDT present flag;  EDDT is present if nonzero.
                         Contains -1 initially;  cleared later if EDDTF is not
                         set.

        151     BUTRXB  Defined in BOOT and STG but not  used  (BOOT  reads  the
                         disk   address  of  the  Root-directory  from  the  HOM
                         blocks).  Contains zero.  Pre-version 6.

        152     BUTMUN  Defined in BOOT and STG but not  used  (BOOT  reads  the
                         values  from  the  HOM blocks, and uses variable MAXUNI
                         instead).  Contains zero.  Pre-version 6.

        153-162 BUTDRT  Defined in BOOT and STG but not used (BOOT uses internal
                         variable  DSKTAB  for  logical  to  physical  structure
                         mapping).  Contains zeros.  Pre-version 6.

        163-201 BUTCMD  ASCIZ file  name  of  monitor;   used  for  booting  the
                         swappable  monitor  with  calls  to VBOOT for segments.
                         Pre-version 6.

        200     BUTCOD  Unique code to identify TOPS-20 monitor EXE.  (V6)
                         Contains BTCOD = 707707,,707707 to do V6-type load.

        201     BUTLEN  Length of BOOT communications region.  (V6)
                         Contains BTLEN = 5.

        202     BUTPGS  Start,,End virtual addresses of VBOOT  pages.   Used  to
                         reference   and  finally  unlock/destroy  VBOOT  pages.
                         Pre-version 6.

                BUTFLG  BOOT flags.  (V6)

        203     BUTEPT  Contains in LH:  Address of the VBOOT EPT page.
                         RH:  Address of the VBOOT page table page.
                         Pre-version 6.

                BUTLLM  Lower load limit for BOOT.  (V6)

        204     BUTPHY  Contains in LH:  Minus number of pages to map.
                         RH:  Address of first page to map  (for  the  monitor).
                         Typically contains -6,,NPVARZ for four pages of code, a
                         file data page and an index block page.  Used with  the
                         value in BUTVIR.  Pre-version 6.

                BUTULM  BOOT upper load limit.  (V6)
                         Highest monitor page BOOT will load EXE file data into.
                         Contains 1,,NRCODL.

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 66
        TOPS-20 Page Zero Locations


        205     BUTVIR  Virtual address of first page of BOOT to map.  Typically
                         will  contain 772000.  Used in conjunction with BUTPHY.
                         Pre-version 6.

                BUTERR  Last error from BOOT.  (V6)

        206     BOOTFL  BOOT flags word, 0 => normal, nonzero =>  special  boot.
                         The contents are supposed to be the index into a  table
                         (BOOTD) designating how to boot the swappable  monitor.
                         An ILBOOT BUGHLT results if the index is too large.  In
                         the SYSGO routine the value IRBOOT is put into  BOOTFL;
                         the table BOOTD contains entries of JRST GSMDSK for all
                         entries but the IRBOOT offset, which has  JRST  GSMIRB.
                         Pre-version 6.

        207     BUTSTA  Start address of BOOT (VBOOT) for SWPMON load.
                         Pre-version 6.

        207-236 PHYPZS  Formerly  used  for  page  zero  I/O  use   by   PHYSIO.
                         Currently unused, contain zero.  Not defined after V6.

        237     SPTWD   Physical address of start of  SPT,  used  by  SETSPD  in
                         processing the dump file for SPEAR entries.

        240     MSCWD   Physical address of start of MSECTB, used by  SETSPD  in
                         processing the dump file for SPEAR entries.

        241-477   --    Not used, contain zero.

        500-777 TMPSMM  Temporary swappable monitor map saved here during EVLDGO
                         startup.
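
        Several of the entries above (BUTPGS, BUTEPT, BUTPHY, BTCOD) are
        written in LH,,RH form, that is, as two 18-bit halves of one 36-bit
        word.  The following is a minimal Python sketch of how such a word
        can be taken apart when reading a dump by hand.  The function names
        are illustrative only, and the address 123000 is a made-up stand-in
        for NPVARZ, whose real value depends on the monitor build.

```python
# Sketch: split a 36-bit PDP-10 word into its 18-bit halves (LH,,RH).
HALF_MASK = 0o777777          # 18 bits
WORD_MASK = 0o777777777777    # 36 bits


def split_word(word):
    """Return (lh, rh) as unsigned 18-bit values."""
    word &= WORD_MASK
    return (word >> 18) & HALF_MASK, word & HALF_MASK


def signed_half(half):
    """Interpret an 18-bit half as a two's-complement number."""
    return half - (1 << 18) if half & 0o400000 else half


def make_word(lh, rh):
    """Build a 36-bit word from LH,,RH; negative halves wrap to 18 bits."""
    return ((lh & HALF_MASK) << 18) | (rh & HALF_MASK)


# BUTPHY-style word: -6,,<address>.  0o123000 is a hypothetical value
# standing in for NPVARZ.
butphy = make_word(-6, 0o123000)
lh, rh = split_word(butphy)
page_count = -signed_half(lh)   # LH holds MINUS the number of pages
print(f"{butphy:012o}  pages={page_count}  first={rh:06o}")
```

        The same split applies to BTCOD (707707,,707707), where both halves
        are simply compared against the expected code.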

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 67
        TOPS-20 Monitor Sections


                                TOPS-20 MONITOR SECTIONS
                                ------------------------




        The TOPS-20 monitor makes use of a number of sections of  address  space
        on  extended  addressing  machines.   For  release  6.1, this number has
        increased.  The following table lists the defined  monitor  sections  at
        approximately the timeframe of release 6.1.



              Number    Symbol  Use of the Section
              ------    ------  ------------------

                 0      MSEC0   Section zero data and code
                 1      MSEC1   Section one data and code
                 2      DRSECN  Mapped directories
                 3      IDXSEC  Mapped disk index table
                 4      BTSEC   Mapped disk bit table
                 5      SYMSEC  Monitor symbol table, DDT, CSTs,...
                 6      XCDSEC  Extended code section
                 7*     MFSEC0  Variable - symbol set to value of first
                                assignable section
                (7)     TABSEC  Tables - DST,...
               (10)     DNBSE1  DECnet buffers
                 -      CTSSEC  CTS terminal database
               (11)     CFSSEC  CFS buffers
               (12)     INTSEC  ARPANET (Internet) buffers
               (13)     RESSEC  RSE, NRE, NRPE psects - resident free space
               (14)     SWFSEC  Swappable free space
               (15)     FFMSEC  Variable - symbol set to value of first free
                                assignable section
               (37)     PCDSEC  POSTCD section (user mode only)
               (37)     HGHSEC  Highest possible section value on KL-10 processor



            * Numbers in parentheses represent the values from a  "typical"  6.1
              monitor.  These numbers are assigned dynamically.  See STG.MAC for
              the definition of the MSECN macro and an explanation of assignable
              sections.   All  the  sections  from SYMSEC on are new for 6.0 and
              6.1.
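
        When reading addresses in a dump it helps to keep the section
        number and the in-section address apart.  The following Python
        sketch combines and splits the two, assuming the usual layout of
        an 18-bit in-section address with the section number in the bits
        above it, and the KL-10 limit of section 37 given in the table.
        The function names are illustrative only, and only the fixed
        sections (SYMSEC = 5, XCDSEC = 6) are used as examples, since the
        parenthesized values are assigned dynamically.

```python
# Sketch: combine and split section number and in-section address.
HGHSEC = 0o37            # highest section on the KL-10, per the table above
OFFSET_BITS = 18
OFFSET_MASK = 0o777777


def global_address(section, offset):
    """Return the address formed from a section number and an offset."""
    if not 0 <= section <= HGHSEC:
        raise ValueError("section out of range for a KL-10")
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("in-section address must fit in 18 bits")
    return (section << OFFSET_BITS) | offset


def split_address(address):
    """Return (section, offset) for an address."""
    return address >> OFFSET_BITS, address & OFFSET_MASK


def format_address(address):
    """Format an address in section,,offset style (octal)."""
    section, offset = split_address(address)
    return f"{section:o},,{offset:06o}"


SYMSEC = 5
addr = global_address(SYMSEC, 0o1000)
print(format_address(addr))
```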

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 68
        TOPS-20 Monitor PSECTs


                                 TOPS-20 MONITOR PSECTS
                                 ----------------------




        The TOPS-20 monitor code assembles into a number of PSECTs with  varying
        purposes.   Release  6  and  6.1  have created even more PSECTs to worry
        about.   The  following  table  lists  the  defined  monitor  PSECTs  at
        approximately  the  timeframe of release 6.1.  Most PSECT beginnings are
        defined in LDINIT with the rest  in  STG.   POSTLD  terminates  all  the
        PSECTs  and  handles  whatever  address space rearrangement is necessary
        before saving the monitor EXE file.



                Release PSECT   Purpose
                ------- -----   -------

                        RSCOD   Resident monitor code and constant data
                        INCOD   Resident initialization code and constant data
                   6    SZCOD   Section-zero-only resident code
                   5    RSDAT   Resident non-zeroed data
                        PPVAR   Processor-private pages
                        RSVAR   Resident zero-initialized data
                   6    SYVAR   Symbol table data
                        NRVAR   Swappable zero-initialized data
                        PSVAR   PSB data
                        JSVAR   JSB data
                        NRCOD   Swappable monitor code and constant data
                        NPVAR   Swappable page variables
                        POSTCD  POSTLD code and data segment
                   6    ERVAR   Extended section resident variables
                   6    ENVAR   Extended section swappable variables
                   6    EPVAR   Extended section swappable page variables
                   6.1  ERCOD   Extended section resident code
                   6.1  ENCOD   Extended section swappable code
                   6.1  XRCOD   KDDT in extended section
                   6.1  XNCOD   MDDT in extended section
                        BGSTR   Bugstring texts
                        BGPTR   Pointers to bugstrings


        See SWSKIT document MONITOR-ADDRESS-SPACE.MEMOS for more detail.

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 69
        TOPS-20 Monitor Universal Files


                            TOPS-20 MONITOR UNIVERSAL FILES
                            -------------------------------




        The following universal files are used to build a TOPS-20 monitor.



                Release  Name   Function
                -------  ----   --------

                        ACTSYM  Accounting file symbol definitions
                        MONSYM  General monitor symbol definitions
                        MACSYM  General monitor definitions


                 6.0    ANAUNV  ARPANET TCP/IP symbol definitions
                 6.0    GLOBS   Global symbol satisfaction definitions
                 6.0    MSCPAR  MSCP symbol definitions
                <6.1    NSPPAR  DECNET phase 2 and 3 symbol definitions
                        PHYPAR  PHYSIO-level device symbol definitions
                  *     PROKL   KL-10 specific definitions
                  *     PROKS   KS-10 specific definitions
                        PROLOG  General monitor definitions
                 6.0    SCAPAR  SCA symbol definitions
                        SERCOD  SYSERR (SPEAR) file symbol definitions


                 6.1    CTERMD  CTERM symbol definitions
                 6.1    D36PAR  DECNET-36 symbol definitions
                 6.1    NIPAR   Definitions for NI-20 service
                 6.1    SCPAR   DECNET session control symbol definitions
                 6.1    TTYDEF  Monitor terminal definitions



            NOTES:

                *  PROKL and PROKS have been combined into PROLOG for Release
                   6.0 and no longer exist independently.

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 70
        TOPS-20 Job Zero Forks


                                 TOPS-20 JOB ZERO FORKS
                                 ----------------------




        The following is a table of the variables containing the process handles
        for forks which run under Job Zero as inferiors of the monitor.  Some of
        these variables contain local fork handles and some contain  system-wide
        fork numbers.


                 Location       Process handle for
                 --------       ------------------

                  CIDFRK        IPADMP CI-20 microcode dumping fork

                  CIFORK        Temporary CFS startup process

                  CILFRK        IPALOD CI-20 microcode loading fork

                  CTMFRK        CTERM host server system-wide fork number

                  DDMFRK        DDMP periodic disk integrity system fork number

                  DNTFRK        DECnet NSP fork (unused?)

                  EXPFRK        System structure expunge task fork

                  INTFRK        INTERNET TCP/IP task system-wide fork number

                  JB0FRK        CHKR periodic task checking system fork number

                  LODFRK        DTE Front-end reload fork

                  MOSFRK        TGHA MF-20/MG-20 memory error analysis fork

                  RLDFRK        DTE Front-end reload system-wide fork number

                  SEBFRK        SYSERR error-logging system-wide fork number

                  SJBFRK        SYSJOB program fork

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 71
        Known Hardware Deficiencies List


                            Known Hardware Deficiencies List
                            --------------------------------



        This is a collected list of known hardware characteristics which show up
        from  time  to  time  as  part  of certain reported problems.  This says
        nothing about whether these characteristics are  bugs  or  features,  or
        whether  they will ever be fixed or changed, but merely attempts to make
        them known internally.



             1.  DZ11 - Cannot set the speed to zero in the hardware,  can  only
                 turn off the receiver clock.

             2.  TM78 - ANSI ASCII was  not  included  in  the  hardware  format
                 modes.

             3.  TM78 - a formatter problem  (corrected  by  ECO  12/83)  causes
                 unreported data loss at end-of-tape.

             4.  TM78 - a formatter problem causes the data mode byte packing to
                 change  from core-dump to hi-density "randomly" while reading a
                 record.  TOPS-20 does not normally see  the  problem  since  it
                 usually  appears  as an overrun that gets retried successfully.
                 An ECO is planned.

             5.  TM02 - Can generate bad parity which it  passes  to  memory  to
                 cause  the  system  memory  parity  errors  when  the  data  is
                 referenced.  This is still seen with Rev 12 to the RH20.

             6.  TM03 - A chip race condition in the M8915 board has been  known
                 to  occur  where a function register has wrong value because it
                 has not settled.  This generates a device error  which  appears
                 transient;   i.e.   CRLFing  DUMPER  tries  the  read again and
                 succeeds.

             7.  TM03 - ANSI ASCII was  not  included  in  the  hardware  format
                 modes.   The  TM03  does  not set format error if ANSI ASCII is
                 selected.  It will usually get a frame error;  if the  transfer
                 is  a  multiple  of  7/8  bit bytes, the frame error is not set
                 either.

             8.  TM03 - When using industry-compatible  mode,  reads  not  of  a
                 multiple of four bytes will produce strange results.  The bytes
                 are counted, but the extra bytes are  not  written  to  memory,
                 leaving garbage.

             9.  TM03 - if an error occurs while rewinding, the monitor  may  be
                 left in a state of waiting for the rewind to complete, the tape
                 being unusable.  The easiest way to clear this condition is  to
                 reset the TM03, most easily done by the customer by powering it
                 down and back up.

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 72
        Known Hardware Deficiencies List


            10.  TM03 - if the TM03 loses synch during the  PE  preamble  bytes,
                 and  it  reaches  9,  it  will  raise  the postamble instead of
                 generating the proper error.  This can result in lost  records.
                 The  usual  symptom  is  a  frame  count  error.   This case is
                 recognizable by the residual frame count being the same as  the
                 initial frame count.

            11.  VT100 - on a VT100 without the extended memory, one can confuse
                 the  internal  microprogram enough to have it clear sections of
                 the screen on Control-U, Control-R, "clear to end of screen" in
                 132 column mode, etc.

            12.  VT125 - especially with printer port option, is known  to  hang
                 in  an  XOFF state that cannot be cleared without resetting the
                 terminal.

            13.  VT240 - especially with printer port option, is known  to  hang
                 in  an  XOFF state that cannot be cleared without resetting the
                 terminal.  Reportedly this happens much more  often  than  with
                 the VT125.

            14.  RH20 - perfectly willing to store bad parity data  into  memory
                 until Rev 12.  May still do so.

            15.  DX20 - is unwilling to allow registers to be examined after  it
                 has  started  I/O.   Can  cause  register  access errors if not
                 programmed in correct sequence.

            16.  DX20 - there is a race condition in which the  DX20  generates
                 an  interrupt  request on channel 5 for some condition while
                 the code is already manipulating the DX20 and  handles  that
                 condition.   The  DX20 then lowers its request, but the KL has
                 already latched the interrupt and tries to process it, and no
                 device  responds.   The  KL  then tries the 40+2n type, which
                 occasionally gives a PI5ERR.

            17.  DX20/TU71 - the DX20 microcode does not set the 556 bpi density
                 correctly   for   TU71  (7-track)  drives.   This  can  be  set
                 successfully from the maintenance panel.

            18.  DX20/TX03 - With dual-porting between systems,  if  the  system
                 issues a drive clear to the DX20 during serious error recovery,
                 or when booting the DX, the DX  resets  the  TX03.   The  reset
                 traps  the  TX03  to  zero,  leaving any operation on the other
                 channel in a hung state.

            19.  DX20/TX03 - when dual-porting between systems, DX2FGS  BUGCHKs
                 occur.  The DX2FGS timer in TOPS-20 has been made larger.  This
                 should lessen the occurrence of  these  BUGCHKs,  but  may  not
                 eliminate them entirely.

            20.  LP20 - at least one of the printers fails to go  off-line  when
                 there  is  anything  in the print line buffer, even if the drum
                 gate is opened.

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 73
        Known Hardware Deficiencies List


            21.  LP26 - fails to go offline when there is something left in  the
                 print  line buffer.  When it runs out of paper, it goes offline
                 several lines from the actual bottom of the page.

            22.  LP27 - will not  accept  the  alternate  start  bytes  for  6/8
                 lines-per-inch in a VFU file.  Gets a VFU load error.  Use only
                 "LEFT-ALONE"  with  the  "LINES-PER-INCH"  command  to  MAKVFU.
                 NORMAL.VFU is fine.

            23.  RP07 -  runs  several  hundred  microdiagnostics  at  power-up,
                 causing  hundreds  of  interrupts  and  keep-alives on -10s and
                 -20s.

            24.  RP20 - when using the dual-port option, RP20s regularly lose  a
                 full  rotation  when  trying  to do read/write next operations.
                 This happens about once every ten seconds and  will  result  in
                 slightly degraded performance.

            25.  RP20 - may evidence RMR (register modification refused)  errors
                 due  to  a limp servo mechanism on one of the drives.  Multiple
                 queued  operations  complete  before  the   drive   disconnects
                 properly and sets GO correctly for the controller to handle the
                 write.

            26.  KL10 Microcode - the ADJBP instruction does  not  work  on  the
                 last location of a page.  Corrected in 5.1 microcode.

            27.  KS-10 Front End - Rev.  3.  exhibits problems with  the  KLINIK
                 line.   If  the  link is in use, it is possible to lock out the
                 CTY.  There are problems with the password check on  subsequent
                 tries, and problems with line hang-up.  A software fix has been
                 implemented which clears the KLINIK output word after  queueing
                 the KLINIK request.  This appears to solve the problems.

            28.  KS-10 Front  End  -  Rev.   3.   exhibits  some  problems  with
                 powerfail  restart.   If  the  power  returns  in less than 3.5
                 seconds or so the restart will hang.  In addition  if  Rev.   3
                 and  Rev.  2 boards are mixed, there is no powerfail restart or
                 reload capability.

            29.  KS10 - during a forced reload, the halt status block is written
                 twice,  first when halting and second when rebooting;  thus the
                 second time wipes any valuable data from the first time.   It's
                 once again the 8080 that's responsible.

            30.  HSC50 - apparently there may be some problems under 6.1 and HSC
                 v(200) with recognizing all disk units if the HSC is configured
                 as node zero.  This is not yet well understood.

            31.  HSC50 - has been reported to hang if the HSC  console  terminal
                 runs out of paper and there is output pending.

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 74
        KS10 Processor Console Information


                           KS10 PROCESSOR CONSOLE INFORMATION
                           ----------------------------------



        CSL-COMMANDS CURRENTLY IMPLEMENTED (CSL V0.161)


        ^Z      ;enter USER mode
        ^\      ;enter CONSOLE mode
        MK XX   ;Marks microcode word at CRAM address XX (sets bit 95)
        UM XX   ;Unmarks Microcode at CRAM address XX
        MB      ;load only bootstrap of currently selected magtape
        LA XX   ;Load/set KS10 Memory Address
        LI XX   ;Load/set I/O address
        LK XX   ;Load/set 8080 address
        LC XX   ;Load/set CRAM address to be written/read
        EM      ;Examine KS10 Memory (last Memory location specified)
        EM XX   ;Examine KS10 Memory location XX
        EN      ;Examine Next (either from last EK, EM or EI)
        EB      ;Examine BUS and 8080 control registers
        EI      ;Examine I/O (last I/O address specified)
        EI XX   ;Examine I/O address XX
        EK      ;Examine 8080 location
        EK XX   ;Examine 8080 address XX
        DM XX   ;Deposit KS10 Memory last addressed, XX data (36 bits)
        DN XX   ;Deposit next (depending on last DK, DM or DI) XX data
        DB XX   ;Deposit BUS, XX data (36 bits)
        DI XX   ;Deposit I/O, XX data (16,18 or 36 bits)
        DK XX   ;Deposit XX (8 bits) into 8080 (Data can only be deposited
                ;in RAM addresses)
        MR      ;MASTER RESET
        CS      ;CPU clock start
        CH      ;CPU clock halt
        CP XX   ;CPU clock pulse (XX=NR of pulses -- default 1 pulse)
        SI      ;Single Instruction
        LF XX   ;Load diagnostic write function (0-7) specifying 12 bits of
                ;microcode (see note at end ****)
        DF XX   ;Deposit Field, write microcode bits according to last LF-command
        EC      ;Examine CRAM ..curr. Control reg, no clocks .. current loc as addr.
        EC XX   ;Examine CRAM at address XX
        DC XX   ;Deposit CRAM, XX is at least 32 octal characters. Address 
                ;previously loaded by LC command
        EX XX   ;EXecute KS10 instruction XX
        ST XX   ;STart KS10 at address XX. Console enters user mode
        SM XX   ;Start microcode at XX (SM 1 causes dump of HALT-status block !!) 
                ;Default is 0 -- Start microcode
        HA      ;HALT KS10 (execute HALT-instruction -- causes microcode to
                ; write HSB and then to enter HALT-loop)
        SH      ;SHUTDOWN (deposit non-zero data in memory location 30)
                ; causing TOPS20 to shut down
        CO      ;Continue (causes microcode to leave HALT-loop)
        PE X    ;Parity Enable (0=disable, 1=DRAM-par, 2=CRAM-par,
                ; 4=clock-par error stop, 5=DPE/DPM, 6=CRA/CRM, 7=enable all)

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 75
        KS10 Processor Console Information


        CE X    ;CACHE enable (0=OFF, 1=ON, <CR>=show current state)
        TE X    ;CPU timer (1 MSEC) enable (0= OFF, 1=ON, <CR>=show current state)
        TP X    ;CPU TRAPS enable (0=OFF, 1=ON (enables paging),
                ;<CR>=show current state)
        LT      ;Lamp Test, lights three lamps of front panel
        RC      ;Read CRAM direct, functions 0-17
                ; (no resets, no load diag adr, no CPU clock) (see note at end ****)
        EJ      ;Examine Jumps -- prints CRAM address signals (Current CRAM address, 
                ;next CRAM address, jump address, subroutine return address)
        TR XX   ;TRACE - repeats CP and EJ commands until any character typed
                ;XX (if typed) is desired CRAM stop-address
        PM      ;Pulse Microcode (issue single CP and EJ)
        ZM      ;Zero KS10 MOS Memory (beware -- slow)
        RP      ;Repeat - repeats last command, or line of commands which it delimits
                ; Any character (except CNTRL-O) typed will stop repeat
                ;EXAMPLE: EM 0, EK 0, EC 0, RP will repeat execution of this line
        BT      ;Boot SYSTEM -- load CRAM from designated disk (see DS)
                ; via memory then load monitor boot from disk and start at 1000
        BT 1    ;same as BT, but loads diagnostic monitor SMMON and starts at 20000
        LB      ;Load Bootstrap from designated disk (see DS)
        LB 1    ;Load Bootstrap diagnostic monitor SMMON
        DS      ;Disk Select for bootstrap or microcode verification. Command prompts 
                ;to specify UNIT NUMBER (default 0), RHBASE (default 776700), 
                ;and UNIBUS ADAPTER (default 1) to load from when booting
        MS      ;Magtape Select for bootstrap or microcode verification. Command 
                ;prompts to specify UNIT NUMBER (default 0), RH BASE (default 772440),
                ;UNIBUS ADAPTER (default 3), SLAVE NUMBER (default 0), and 
                ;DENSITY (default 1600 BPI) of magtape to boot from
        MT      ;Magtape Boot system from selected magtape
        MT 1    ;BOOT diagnostic monitor SMMAG from magtape
        PW      ;clears KLINIK password, or sets it (6 char's max)
        KL x    ;KLINIK control:  0 = off, 1 = on for remote CTY access
        BC      ;BOOT Check. PROM code which tests the basic 2020 system
                ; load path from the UNIBUS adaptor into the CRAM via memory.


        CONTROL CHARACTERS
        ^U      ;rub out current line
        ^O      ;switch: first one stops CTY-output, second one resumes CTY-output
        ^S      ;stop TTY-output and hangs 8080 waiting for CONTROL-Q (see below)
        ^Q      ;resumes TTY-output
        ^C      ;stops whatever the 8080 is doing
        RUB-OUT ;rub out previous character typed

        NOTE:   Several commands may be put on a single line, separated by commas.


        NOTE:   Additional information on KS10 console commands can be found 
                 in the KS10 MAINTENANCE GUIDE

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 76
        KS10 Processor Console Information



        *****   CRAM Bit Formats

                LF-Command CRAM Bits            RC-Command CRAM Data
                --------------------            ---------------------

                LF      CRAM bits               RC      Data
                --      ---------               --      ------------------------------

                0       00-11                   0       CRAM bits 00-11
                1       12-23                   1       Next CRAM address
                2       24-35                   2       CRAM subroutine return address
                3       36-47                   3       current CRAM address
                4       48-59                   4       CRAM bits 12-23
                5       60-71                   5       CRAM bits 24-35 (Copy A)
                6       72-83                   6       CRAM bits 24-35 (Copy B)
                7       84-95                   7       0s
                                                10      Parity bits A-F
                                                11      KS10 bus bits 24-35
                                                12      CRAM bits 36-47 (Copy A)
                                                13      CRAM bits 36-47 (Copy B)
                                                14      CRAM bits 48-59
                                                15      CRAM bits 60-71
                                                16      CRAM bits 72-83
                                                17      CRAM bits 84-95
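
        The LF column of the table is regular:  function n selects the
        twelve CRAM bits 12n through 12n+11, and the 32 octal characters
        taken by the DC command make up the full 96 bits.  The following
        Python sketch splits a 32-digit octal microword into the eight
        fields addressed by LF 0-7.  It assumes that bit 00 is the
        leftmost (most significant) bit of the typed word, which is the
        usual PDP-10 numbering but is not stated in this handbook.  The
        function names and the sample microword are illustrative only.

```python
# Sketch: map LF functions to CRAM bit ranges and split a microword.
CRAM_BITS = 96
FIELD_BITS = 12
FIELD_MASK = 0o7777


def lf_bit_range(function):
    """Return (first_bit, last_bit) selected by LF function 0-7."""
    if not 0 <= function <= 7:
        raise ValueError("LF function must be 0-7")
    first = function * FIELD_BITS
    return first, first + FIELD_BITS - 1


def split_microword(octal_text):
    """Split 32 octal digits into eight 12-bit fields, indexed by LF function.

    Assumption: bit 00 is the most significant bit of the word.
    """
    if len(octal_text) != CRAM_BITS // 3:
        raise ValueError("a full CRAM word is 32 octal digits")
    value = int(octal_text, 8)
    fields = []
    for function in range(8):
        shift = CRAM_BITS - FIELD_BITS * (function + 1)
        fields.append((value >> shift) & FIELD_MASK)
    return fields


sample = "00001111222233334444555566667777"   # made-up microword
for function, field in enumerate(split_microword(sample)):
    first, last = lf_bit_range(function)
    print(f"LF {function}: bits {first:02d}-{last:02d} = {field:04o}")
```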

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 77
        KS10 Processor Console Information


        8080-CONSOLE-ERROR-CODES
        ------------------------


        ?A/B            A and B copies of CRAM bits did not match
        ?BC             BOOT Check failed
        ?BFO            Input Buffer Overflow
        ?BN             Received Bad Number on input (character typed is not an 
                         octal number)
        ?BT             Device error or timeout during BOOT operation
        ?BUS            BUS polluted on power-up
        ?CHK            PROM checksum failed
        ?DNC            Did Not Complete HALT
        ?DNF            Did Not Finish instruction
        ?FRC            had a forced reload
        ?IA             Illegal Argument (address out of range, etc.)
        ?IL             ILLEGAL Instruction
        ?KA             KEEP ALIVE failed
        ?MRE            Memory Refresh Error (MEM BUSY stayed set too long,
                         because it didn't release data on a write to memory)
        ?NBR            Console was not granted BUS on a request
        ?NDA            Received No Data Acknowledge on memory request
        ?NR-SCE         Non-Reversible Soft CRAM error.
        ?NXM            Referenced Non-eXistent Memory location
        ?PAR ERR        Report clock-freeze due to parity error,
                         and type out READ IO of 100,303,103
        ?PWL            Password Length error
        ?RA             Command Requires Argument
        ?RUNNING        CPU clock running (command typed requires clock to be stopped
                         and may fail)
        %SCE            Soft CRAM error
        ?UI             Unknown Interrupt


        OTHER 8080 CONSOLE MESSAGES
        ---------------------------

        BT SW                   message says BOOTING, using BOOT switch
        BUS 0-35                message header for EB command
        CYC                     cycle type for DB command
        C CYC                   typed on DB-command if COM/ADR cycle blew
        D CYC                     "             "      DATA    cycle blew
        HLTD                    message "HALTED/XXXXXX " where xxxxxx is data
        KS10>                   prompt message
        OFF                     message, says current state is off
        ON                      message, says current state is on
        RCVD                    data received on bus
        SENT                    data sent to bus
        >>UBA?                  query for UNIBUS adapter
        >>UNIT?                 query for unit to use
        >>RHBASE?               query for RH11 base register address to use
        >>DENS?                 query tape density
        >>SLV?                  query tape slave number

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                                Page 78
        KS10 Processor Console Information


        8080-ERROR-Messages-during-BOOTING
        ----------------------------------


        Disk:
                On an error-condition, detected by the 8080, the
                Fault-light will go on and a message of the form

                        ?BT XXXYYY

                will be printed on the CTY.


        The following error-codes are only "rough" pointers; they can be
        caused by any of the following problems:

                Disk not a disk at all
                Wrong unit selected (see DS-command)
                Home blocks not readable or not there
                Home blocks not set by SMFILE for 8080
                8080 File-system garbage

        XXX=001 Disk error encountered while trying to read HOME-blocks
                Can mean incorrect RHBASE specified, wrong UBA selected,
                bad disk drive, or neither home block nor alternate home
                block has home block ID ("HOM" in sixbit)

        XXX=002 Disk error encountered while trying to read the page of
                pointers, which make up the "8080-File-System"
                Can mean pack is not in format for 8080 loading, home blocks
                bombed, bad drive or pack

        XXX=003 Disk error encountered while trying to read a page of
                microcode - can mean pack is not in 8080 format, or bad drive or 
                pack

        XXX=004 Microcode did not successfully start running after a BT, MT,
                MB, or LB command.  This error will occur when an LB is done
                before the system microcode is loaded.
                
        XXX=010 Disk error encountered while trying to read PRE-BOOT

        YYY     are the lower 8 bits of the 8080 address of the failing
                "Channel Command List" operation.  At this point it is
                normally a good bet to do an "EI" to get the contents of
                the RH11 register that has the error-bits set.


        Magtape:

        The following ERROR-messages can point to the following problem areas:

                Magtape is no magtape at all
                Wrong unit selected (see MS-command)
                Magtape is not bootable (no microcode, no PRE-BOOT)

        XXX=001 Error trying to read microcode first page
                Can mean wrong unit selected, wrong RHBASE address, wrong UBA
                selected, wrong slave number, wrong density, bad drive, bad
                controller, bad tape, tape in wrong format

        XXX=003 Error trying to read additional pages of microcode

        XXX=010 Error trying to read in PRE-BOOT program
                May occur while doing a skip over the microcode file, or
                while reading the PRE-BOOT itself

        YYY     see above (disk-section)
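
        The disk and magtape code lists above can be kept in a small lookup
        table so that a message read over the telephone can be split up
        quickly.  The following is only a sketch:  it assumes that XXX and YYY
        are each printed as three octal digits, and the name decode_bt is
        made up for this illustration.

```python
import re

# Meanings copied from the disk and magtape lists above.
DISK_CODES = {
    0o001: "disk error reading HOME blocks",
    0o002: "disk error reading the 8080 file-system pointer page",
    0o003: "disk error reading a page of microcode",
    0o004: "microcode did not start after BT, MT, MB or LB",
    0o010: "disk error reading PRE-BOOT",
}
MAGTAPE_CODES = {
    0o001: "error reading first page of microcode",
    0o003: "error reading additional pages of microcode",
    0o010: "error reading PRE-BOOT",
}


def decode_bt(message, device="disk"):
    """Split '?BT XXXYYY' into (xxx, yyy, meaning).

    Assumption: XXX and YYY are three octal digits each.
    """
    m = re.fullmatch(r"\?BT\s+([0-7]{3})([0-7]{3})", message.strip())
    if m is None:
        raise ValueError("not a ?BT message: %r" % message)
    xxx, yyy = int(m.group(1), 8), int(m.group(2), 8)
    table = DISK_CODES if device == "disk" else MAGTAPE_CODES
    return xxx, yyy, table.get(xxx, "code not listed in this handbook")
```

        The YYY value is returned untouched, since it only identifies the
        failing "Channel Command List" operation inside the 8080.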



        Error-messages-out-of-PRE-BOOT


        PRE-BOOT is loaded from Disk or Magtape (see 8080 commands DS, MS,
                 BT, BT 1, MT, MT 1)

        PRE-BOOT is written onto the disk using "SMFILE.EXE";  it is also written
        onto "standard" Diagnostic-tapes and onto the "MONITOR-INSTALLATION"-tapes.

        PRE-BOOT is loaded by the 8080 into MEMORY-locations 1000 and up, and starts
        at 1000.  The ERROR-halts are:

                1001    found "bad" core-transfer address
                         (page 1 is illegal - can't overload PRE-BOOT)
                1002    Disk Retry error or Magtape Read error
                1003    No RH11 Base Address
                1004    Magtape Skip failure

        At ERROR-halt time the following MEMORY-Locations contain useful information:


                        Disk-Booting                    Magtape-Booting
                        ------------                    ---------------

                100     "8080" disk-address             Not used
                101     Memory transfer address         same
                102     T3, selection pickup pointer    same
                103     RPCS1-register                  MTCS1-register
                104     RPCS2-register                  MTCS2-register

                105     RPDS - register                 MTDS - register
                106     RPER1-register                  MTER1-register
                107     RPER2-register (RP06 only)      Not used
                110     RPER3-register                  Not used
                111     UBA Page RAM loc 0              same
                112     UBA-status register             same
                113     Version Nr. of PRE-BOOT         same

                Note: The Version Nr. of PRE-BOOT will be the same as the Version Nr.
                of SMFILE. The "8080" disk-address is in the form " CYL SEC SURF "
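
        When a customer reads back the contents of these locations, it can
        help to label each word before trying to interpret it.  The sketch
        below simply restates the table above as data;  the function name
        annotate and the sample values in the usage note are hypothetical,
        and no attempt is made to decode the register bits themselves.

```python
# Rows copied from the table above: (disk meaning, magtape meaning).
PREBOOT_INFO = {
    0o100: ('"8080" disk-address', "Not used"),
    0o101: ("Memory transfer address", "Memory transfer address"),
    0o102: ("T3, selection pickup pointer", "T3, selection pickup pointer"),
    0o103: ("RPCS1-register", "MTCS1-register"),
    0o104: ("RPCS2-register", "MTCS2-register"),
    0o105: ("RPDS-register", "MTDS-register"),
    0o106: ("RPER1-register", "MTER1-register"),
    0o107: ("RPER2-register (RP06 only)", "Not used"),
    0o110: ("RPER3-register", "Not used"),
    0o111: ("UBA Page RAM loc 0", "UBA Page RAM loc 0"),
    0o112: ("UBA-status register", "UBA-status register"),
    0o113: ("Version Nr. of PRE-BOOT", "Version Nr. of PRE-BOOT"),
}


def annotate(values, device="disk"):
    """Pair each examined word with its meaning.

    values maps a memory address (100-113 octal) to the word that was
    read back; the words are printed in octal, not interpreted.
    """
    column = 0 if device == "disk" else 1
    return [
        "%o/ %o  %s" % (addr, values[addr], PREBOOT_INFO[addr][column])
        for addr in sorted(values)
        if addr in PREBOOT_INFO
    ]
```

        For example, annotate({0o103: 0o4070}, "magtape") labels the word
        at location 103 as the MTCS1-register (4070 is an invented value).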


        THEREBY IT WILL BE POSSIBLE TO ASK A CUSTOMER WITH A PRE-BOOT FAILURE,
        TO DO AN :

                EM 77
                EN,RP
                ...... AND TYPE SOMETHING AFTER ADDRESS 115
                ...... AND THEN TELL US WHAT HE SEES


        8080-Communication-Area (KS10 Memory)
        -------------------------------------

        The 8080 maintains and services an in-core communication area.
        Currently used are words 31 to 40.  See PROKS/PROLOG for more info.

        Word  Bits      Meaning
        ----  ----      -------
          31            Keep Alive and Status word
                4           Reload Request
                5           Keep Alive active
                6           KLINIK active
                7           PARITY Error detect enabled
                8           CRAM Parity Error detect enabled
                9           DRAM Parity Error detect enabled
                10          CACHE enabled
                11          1 msec enabled
                12          TRAPS enabled
                20-27       Keep Alive counter field
                32          BOOT SWITCH BOOT
                33          POWER FAIL
                34          Forced RELOAD
                35          Keep Alive failed to change
          32            KS-10 CTY input word (from 8080)
                20-27      0 -- no action, 1 -- CTY character pending
                28-35      CTY-character
          33            KS-10 CTY output word (to 8080)
                20-27      0 -- no action, 1 -- CTY character pending
                28-35      CTY-Character
          34            KS-10 KLINIK user input word (from 8080)
                20-27      0 -- no action, 1 -- KLINIK character,
                           2 -- KLINIK active, 3 -- KLINIK carrier loss
                28-35      KLINIK-Character
          35            KS-10 KLINIK user output word (to 8080)
                20-27      0 -- no action, 1 -- KLINIK character, 2 -- Hangup request
                28-35      KLINIK-Character
          36            BOOT RH-11 Base Address
          37            BOOT Drive Number
          40            Magtape Boot Format and Slave Number
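
        The bit numbers in the table follow the usual PDP-10 convention:  bit
        0 is the most significant bit of the 36-bit word and bit 35 the least
        significant.  As an illustration only, the following sketch decodes
        the Keep Alive and Status word (word 31) from a value typed in by
        hand;  the helper names are made up for this example.

```python
def bit(word, n):
    """Return bit n of a 36-bit word (bit 0 is the most significant)."""
    return (word >> (35 - n)) & 1


def field(word, first, last):
    """Return bits first..last (inclusive) as a right-justified number."""
    width = last - first + 1
    return (word >> (35 - last)) & ((1 << width) - 1)


# Flag names copied from the description of word 31 above.
STATUS_BITS = {
    4: "Reload Request",
    5: "Keep Alive active",
    6: "KLINIK active",
    7: "PARITY Error detect enabled",
    8: "CRAM Parity Error detect enabled",
    9: "DRAM Parity Error detect enabled",
    10: "CACHE enabled",
    11: "1 msec enabled",
    12: "TRAPS enabled",
    32: "BOOT SWITCH BOOT",
    33: "POWER FAIL",
    34: "Forced RELOAD",
    35: "Keep Alive failed to change",
}


def decode_word31(word):
    """Return (list of flags that are set, Keep Alive counter)."""
    flags = [name for n, name in sorted(STATUS_BITS.items()) if bit(word, n)]
    return flags, field(word, 20, 27)
```

        The same bit and field helpers apply to words 32 through 35, where
        bits 20-27 hold the flag and bits 28-35 the character.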



        OUTPUT process KS10 ==> 8080
        ----------------------------

         Load character and flag into  33,   set 8080-interrupt,   8080 examines
           33 and gets character, clears interrupt, sends character to hardware,
           clears 33 and sets KS-10 interrupt.

        INPUT process 8080 ==> KS10
        ---------------------------

         8080 gets interrupted "TTY-char available",   8080 gets character and
          delivers into input-word (32) with flag(s) and sets KS-10 interrupt.
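
        The two exchanges can be modelled in a few lines.  This is a toy
        simulation, not 8080 or monitor code:  memory is an ordinary
        dictionary, the interrupts are only mentioned in comments, and the
        word numbers 32 (input) and 33 (output) are taken from the table of
        the communication area above.

```python
CTYIN, CTYOUT = 0o32, 0o33      # word numbers from the table above


def pack(flag, char):
    """Flag in bits 20-27, character in bits 28-35 of a 36-bit word."""
    return ((flag & 0o377) << 8) | (ord(char) & 0o377)


def unpack(word):
    """Return (flag, character) from a CTY word."""
    return (word >> 8) & 0o377, chr(word & 0o377)


def ks10_output(memory, char):
    """KS10 side: load character and flag, then interrupt the 8080."""
    memory[CTYOUT] = pack(1, char)


def console_take_output(memory):
    """8080 side: fetch the pending character, if any, and clear word 33."""
    flag, char = unpack(memory.get(CTYOUT, 0))
    if flag != 1:
        return None             # 0 -- no action
    memory[CTYOUT] = 0          # then the 8080 interrupts the KS10
    return char


def console_input(memory, char):
    """8080 side: deliver a typed character, then interrupt the KS10."""
    memory[CTYIN] = pack(1, char)
```

        Clearing word 33 is what tells the KS10 that the 8080 has taken the
        character and that the next one may be loaded.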


                           BOOT COMMAND STRING FUNCTIONALITY
                           ---------------------------------



        The BOOT program is usually invoked in one of two  ways:   invisibly  as
        part  of  a  system dump and reload on a crash condition, or by explicit
        invocation and response to the  BOOT>  prompt,  usually  with  a  simple
        carriage return.

        BOOT, however, possesses substantially more command line  functionality,
        at  least  some  of which can be useful to the Specialist in a debugging
        situation.  This document tries to explain some of that functionality in
        the context of the BOOT for Release 6.1 of TOPS-20.

        BOOT parses a command string of the form:

        device:<directory>file.extension.generation/switchaddress(lower,upper)

        with the restrictions that some switches have  precedence  and  so  some
        combinations  are  meaningless,  and  that  the  directory must NOT be a
        subdirectory, i.e.  <SYSTEM> is legal, but <SYSTEM.MONITORS> is not.

        The "default" command string (in response to a simple  carriage  return)
        is:   PS:<SYSTEM>MONITR.EXE/R  -  i.e.   load  and  start  the  resident
        monitor.

        The available switches are:

          /M    MERGE - merge specification with current memory.

          /L    LOAD  -  load  according  to  specification   (suppresses
                  default startup).

          /A    ALL - load all of specified  file,  useful  for  avoiding
                  bounds values;  loads up to page 377.

          /R    RUN - run specification - this is the default.  Load  and
                  start  at  the  EXE  file entry vector location.  If no
                  (firstpage,lastpage) specification is given, then  BOOT
                  will look in .JBSYM for the last location of the symbol
                  table and load up to that point (this assumes that this
                  is  the  last  location  in  the resident monitor).  If
                  .JBSYM is zero, or there is no page zero  in  the  .EXE
                  file  to  find it in, then the old assembled-in default
                  of (0,340) will be used.

          /D    DUMP - dump on given specification.  The default here  is
                  PS:<SYSTEM>DUMP.EXE.1 but other existing files could be
                  used;  e.g.  if the normal dump file kept  causing  the
                  dump to fail with ?IO Error because of bad pages, or was
                  too small, one could do something like:

                    BOOT>JUNK:<DUMPS>DEBUG.EXE.1/D
                    BOOT>

                  BOOT has  special  address  space  knowledge  which  it
                  applies  to  writing out the monitor dumps, such as not
                  writing out pages overwritten by BOOT, etc.

          /S    SAVE - similar to /D, but no special  knowledge  applied,
                  saves  according  to  specification.  Useful for things
                  like saving image of BOOT for debugging.

          /Gadr GO - transfer control to location adr;  for example /G141
                  to invoke EDDT.

          /E    EDDT - load and transfer to EDDT.  This  is  a  shorthand
                  method  for  the  old two-command sequence of /L,  then
                  /G141;  e.g.

                    BOOT>NEWMON/E
                    EDDT

          /I    INFORMATION - displays the current version  of  BOOT  and
                  the  version  numbers  of  any DX20A or DX20B microcode
                  assembled into BOOT.


        /E and /I are not available in the BOOT that goes with TOPS-20 4.1.
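
        To make the shape of the command string concrete, the sketch below
        takes one apart with a regular expression.  It is not how BOOT itself
        parses:  the function name parse_boot is invented, and it is an
        assumption of this example that the address and the (lower,upper)
        bounds are octal numbers.

```python
import re

# Default taken from the text: a bare carriage return means this string.
DEFAULT = "PS:<SYSTEM>MONITR.EXE/R"

BOOT_RE = re.compile(r"""
    (?:(?P<device>[A-Z0-9]+):)?                      # device:
    (?:<(?P<directory>[^>]*)>)?                      # <directory>
    (?P<file>[^/()\s]*)                              # file.extension.generation
    (?:/(?P<switch>[A-Z])(?P<address>[0-7]+)?)?      # /switch and address
    (?:\((?P<lower>[0-7]+),(?P<upper>[0-7]+)\))?     # (lower,upper)
    """, re.VERBOSE)


def parse_boot(command):
    """Return the fields of a BOOT command string as a dictionary."""
    text = command.strip().upper() or DEFAULT
    m = BOOT_RE.fullmatch(text)
    if m is None:
        raise ValueError("cannot parse %r" % command)
    parts = m.groupdict()
    if parts["directory"] and "." in parts["directory"]:
        raise ValueError("the directory must not be a subdirectory")
    for key in ("address", "lower", "upper"):
        if parts[key] is not None:
            parts[key] = int(parts[key], 8)   # assumed to be octal
    parts["file"] = parts["file"] or None
    return parts
```

        Fields that were not typed come back as None;  the sketch does not
        try to reproduce the way BOOT fills in defaults for a partial
        specification.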

        The monitor uses the (lowerbound,upperbound) construct  in  loading  the
        swappable  monitor in multiple passes into the available physical memory
        by building the appropriate command string to  merge  the  next  set  of
        pages and invoking BOOT at the VBOOT entry point multiple times.

        For version 6 of TOPS-20 there is another mechanism BOOT uses to try  to
        load  the  monitor  in  a  single  pass.   See  the section on Page Zero
        locations to learn about the communication region.


                          TOPS-20 CRASH ANALYSIS FUNDAMENTALS
                          -----------------------------------




        TOPS-20 crash analysis is a complex subject which can  never  be  fully
        taught, since the areas of interest change constantly:  new versions are
        released with new problems, and as problems are fixed, areas  of  code
        become "stable" and are no longer hot analysis prospects.  Hence,  like
        diagnostics, crash analyzers tend to evolve into software that produces
        enormous quantities of uninteresting data after a lot of computing, but
        comes up with a bottom line of "I don't know what's wrong".

        These articles attempt to present the fundamental tools and methods that
        usually  wind  up  getting used on most crashes, and explain some of the
        data structures that often need to be passed through on the way  to  the
        answer  to  the  problem.   Some  effort  is  made  to  give some of the
        traditional methods by which hiding bugs may be forced into the open.



        CRASH DUMPS:
        -----------


        Each time there is a BUGHLT there is an automatic dumping of the  system
        core image into PS:<SYSTEM>DUMP.EXE.  If there is sufficient room on the
        disk the data that was  previously  in  DUMP.EXE  will  be  copied  into
        DUMP.CPY  by SETSPD after the system is reloaded.  DUMP.CPY does not get
        deleted and you may find several generations of DUMP.CPY.

        TOPS-20 will not create a dump of  the  Monitor  unless  the  system  is
        properly  prepared  to  do so.  This means that there must first exist a
        file called PS:<SYSTEM>DUMP.EXE that will accommodate  the  dump.   This
        file  can  be  found  on the distribution tape for TOPS-20, or it can be
        created by using the MAKDMP program, which will accept the  memory  size
        from  the user, and create the proper sized file.  The file must contain
        a number of pages equal to the total number of pages of physical  memory
        in the DECSYSTEM-20, plus enough pages to hold the  EXE  file  directory
        for  the  dump  (generally  one),  minus  the number of pages that BOOT
        overwrites, which will not be  present  in  the  dump.   For
        example, a system that has 1024K words of memory should have a  DUMP.EXE
        file  that  is  about 2048 pages long.  As a rule of thumb, since a page
        holds 512 words, the number of pages in the dump file must be about twice
        the size of the machine's memory capacity in K words.
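
        The arithmetic is simple enough to check by hand, but as a worked
        example it can be written out as follows.  The function name and its
        parameters are made up for this sketch;  the page size of 512 words
        is the standard TOPS-20 page, and the number of pages BOOT
        overwrites must be supplied by the caller.

```python
WORDS_PER_PAGE = 512


def dump_pages(memory_k_words, directory_pages=1, boot_pages=0):
    """Pages needed in DUMP.EXE for a machine with the given memory size.

    memory_k_words  -- physical memory in K (1024) words
    directory_pages -- pages for the EXE file directory (generally one)
    boot_pages      -- pages overwritten by BOOT, which are not dumped
    """
    memory_pages = memory_k_words * 1024 // WORDS_PER_PAGE
    return memory_pages + directory_pages - boot_pages
```

        With the default arguments a 1024K-word system needs 2049 pages,
        which is the "about 2048" quoted above.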

        It is possible to give a FILENAME/D to BOOT to specify where to dump the
        monitor, so a dump can be written to another pack  in  those  situations
        where there is no existing DUMP.EXE on the pack to dump into.  The  file
        named  must already exist, however, and must not be in a subdirectory.
        If it is too small, not all of memory will be saved.   See  the  article
        on BOOT commands for more info.


        GETTING A DUMP FROM BOOT:
        ------------------------


        Normally, when the system has  crashed  for  whatever  reason,  it  will
        reload  itself  using the BOOT program.  This Auto-reload feature can be
        suppressed by giving the "SET NOT RELOAD" or "CLEAR RELOAD"  command  to
        the  PARSER.   The  PARSER must first be set in PROGRAMMER mode, via the
        "SET CONSOLE PROGRAMMER" command.  These commands do not apply to 2020's,
        of course.  On the 2020, there is a location in the 8080 which, when  it
        contains the right number, will prevent automatic reloads after crashes.
        The  location  depends  on  the  revision level of the ROM, which is typed
        at system startup.  The following commands will turn off auto-reload:

                ROM level 0.1                   ROM level 4.2
                        KS10>LK 20255                   KS10>LK 20256
                        KS10>DK 303                     KS10>DK 303


        Also, patching the BUGHLT  code  where  the  reload  is  requested  will
        prevent an auto-reload.  Placing a JFCL in locations BUGH2+3 and BUGH2+4
        in the running  monitor  will  prevent  the  monitor  from  issuing  its
        request.

        BOOT has a limited file system capability  when  creating  the  file  to
        contain  the  dump,  and  in  this manner avoids complicating a possibly
        compromised file structure during the reload.  It  is  for  this  reason
        that  the DUMP.EXE file must already exist on the public structure;  for
        BOOT can find it there, but it can not create it if it does not  already
        exist.   Also,  because BOOT resides in main memory of the host (KL10 or
        KS10) processor, small portions of the Monitor will be overwritten  when
        BOOT  is  loaded into memory.  Currently, BOOT is written into that area
        of the resident Monitor that normally contains pure code, and as such is
        not  usually  of  much  consequence.   When  one  needs to refer to this
        portion of the code, either the listings or fiche can be  used,  or  the
        MONITR.EXE  file itself.  As of about Release 6.1, BOOT loads into pages
        11-54 of memory, or 11-62 if BOOT contains both DX20A and DX20B microcode.
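
        When looking at a dump it is worth knowing whether a given address
        lies in the region BOOT has overwritten.  The following sketch makes
        that check;  it assumes that the page numbers quoted above are
        octal, uses the standard 512-word page, and the function names are
        invented for the example.

```python
def page_of(address):
    """Page number of a memory address (512-word pages)."""
    return address >> 9


def overwritten_by_boot(address, both_dx20=False):
    """True if the address lies in the pages BOOT loads into.

    Page range taken from the text: 11-54, or 11-62 when BOOT
    contains both DX20A and DX20B microcode (assumed to be octal).
    """
    last_page = 0o62 if both_dx20 else 0o54
    return 0o11 <= page_of(address) <= last_page
```

        Contents of such an address in DUMP.EXE belong to BOOT, not to the
        monitor, and should be read from MONITR.EXE or the listings instead.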

        If for some reason the system fails to auto-reload,  then  it  is  still
        possible  to  obtain a copy of the dump.  To do this, the front end must
        have at least loaded the BOOT program, and the console will display  the
        BOOT prompt:

                                BOOT>

        BOOT has a number of  commands  that  may  be  used  to  manipulate  the
        contents of the processor memory;  in this case, the command we will use
        will cause BOOT to copy the contents of memory into PS:<SYSTEM>DUMP.EXE:

                                BOOT>/D    or   BOOT>filename/D
                                BOOT>           BOOT>


        At this point the system may be brought up normally, and the analysis of
        the dump may begin.

        Similarly, a KL-10  system  may  be  set  to  suppress  the  auto-reload
        facility,  and  the CTY will prompt with the KLI> prompt.  Simply typing
        the word "BOOT" will load the BOOT program into memory.  There are cases
        where  the  system may be completely hung, and it is unclear how to best
        initiate an orderly shutdown.  Obviously, it is always possible to  type
        the  control-backslash  (^\)  character  at  the  CTY  to  get  into the
        front-end parser, but then what  can  be  done?   The  front-end  parser
        allows  the  operator  to  force  the  processor  to jump to a specified
        location, and in the case described above, this feature may be  used  to
        force  a  BUGHLT.   This can be done after typing ^\, with the following
        commands:

                        PAR>SET CONSOLE PROGRAMMER
                         CONSOLE MODE: PROGRAMMER
                        PAR>JUMP 71
                        PAR>

        causing the console to return to USER  mode,  connected  to  the  KL-10.
        This  will be followed immediately by a KPALVH BUGHLT (Keep Alive Halt),
        and the system will perform the  usual  BUGHLT  procedures.   The  above
        command  forces the processor to jump to location 71, which in turn will
        cause the BUGHLT, sweeping the cache to ensure that all of the dump taken
        will contain valid data.  Simply forcing the processor to halt, and then
        reBOOTing and getting a dump,  will  not  cause  the  cache to be swept,
        and random locations in the dump will not contain valid data.

        On the 2020 the equivalent command is "KS10>ST 71".


        GETTING A FRONT-END DUMP:
        ------------------------


        The  front-end  will  generally  create  a  crash   dump   file   called
        PS:<SYSTEM>0DUMP11.BIN, containing the core image of the PDP-11.  If the
        front-end is hung, and none of the terminals are  usable,  it  is  still
        possible to obtain a dump of the -11.  By setting the HALT/ENABLE switch
        of the -11 to the HALT position, and then back to the  ENABLE  position,
        the KL-10 will force the -11 to reload.  In the process of reloading the
        -11, the KL will indicate to the -11 that it has reloaded, and send  the
        necessary  information  to set up the terminals, and unit record devices
        connected to the -11.  The -11 will, in the process of  reloading,  dump
        the  old core image into the 0DUMP11.BIN file mentioned earlier.  In the
        event that the problem will be the subject  of  an  SPR,  the  front-end
        crash dump should also be included on the DUMPER tape with the SPR.


        CRASH ANALYSIS MATERIALS:
        ------------------------


        First, when analyzing software or software/hardware problems,  be  sure
        you have the proper tools:

             1.  A SWSKIT on magtape to provide further tools and documentation.

             2.  A full copy of the current release microfiche MONITOR and  EXEC
                 or equivalent listings online or on paper.

             3.  A copy  of  the  Monitor  Tables  document  from  the  Software
                 Notebooks or the SWSKIT tape.

             4.  A MONITOR CALLS Reference Manual.

             5.  A SPEAR (formerly SYSERR) manual.

             6.  A listing of the SPEAR/SYSERR log, especially  if  hardware  is
                 suspected.   The  SWSKIT  programs  DSKERR  and  SWSERR produce
                 useful, compact extracts of the SPEAR error file, for disk  and
                 BUGxxx entries, respectively.

             7.  The  CTY  log  for  BUGHLTs  and  BUGINFs  or   other   problem
                 indications, or an accurate reproduction of this information.

             8.  Any other manuals you may need for reference such as the proper
                 version  Installation  Guide,  Operators Guide, System Managers
                 Guide, etc.

             9.  The BUGS.MAC file for releases 4.1, 5.0, 5.1.

            10.  The TOPS-20.BWR file for documentation of known  exceptions  to
                 the normal documents.

            11.  The current FILDDT.EXE to examine the dump.

            12.  The MONITR.EXE responsible for the crash to load symbols  from,
                 and, of course

            13.  The DUMP.EXE or DUMP.CPY resulting from the crash.


        You will need the SWSKIT and perhaps listings of the latest versions  of
        monitor modules in case the microfiche are not up to date.  FILDDT is on
        the customer's distribution tape.


        Be sure you have analysed the SPEAR log.  This is  the  easiest  way  to
        determine  the  hardware  state of the machine at the time of the crash.
        Be sure, also, that you have looked up  the  BUGHLT  and/or  BUGCHKs  in
        question  in  the  listings  (microfiche)  and  have  at  least read the
        comments around them.  Tracing how the BUGHLT got called is  probably  a
        good idea.  If you happen to be without a GLOB (provided on microfiche)
        you can find the BUGHLT tag of interest in the monitor as follows:

                $GET <SYSTEM>MONITR.EXE
                $ST 140
                DDT
                ILMNRF?                 ; BUGHLT of interest followed by "?"
                PAGEM G                 ; it is defined in PAGEM and is global



        Some other useful bits of information:  There is a GLOB listing provided
        in the microfiche which contains a list of all the global symbols in the
        monitor.  Most of the data symbols are defined in the module STG.MAC. If
        you don't know a tag name but want to look at the storage for DTEs, say,
        look through STG.  STG also contains some small portion of  code  mostly
        to do with restart, start, auto reload, dispatches for PI channels and a
        few scheduler tests.  STG stands for storage.  Note that some stuff  may
        be  defined in PROLOG, and of course lots of stuff is defined throughout
        the monitor.  You may also want to get a listing of MACSYM to be able to
        understand  the  macros  you  see  while  reading  the monitor listings;
        MONSYM is also useful at times.  Be sure you know how  PARAMS  has  been
        changed in case it has.  See BUILD.MEM on the distribution tapes for the
        currently distributed information on what to do to change various system
        parameters  in  PARAM0.MAC.   Be  sure that you know about any variables
        that the site may have changed in STG as well.


        EXAMINING THE MONITOR:
        ---------------------


        Debugging a complex, multi-process software system is largely  a  matter
        of  absorbing  sufficient  knowledge,  experience and folklore about the
        particular system with a considerable element of personal preference, or
        'taste'  also  involved.   This  document  is  a  cursory description of
        features built into the system to aid debugging, and  such  folklore  as
        can be described in written English.

        There are four different versions of DDT that may be used to examine the
        monitor.   Each  is  used  for  a  different  purpose  and  has  special
        capabilities.  The versions of DDT are:

             1.  UDDT (user DDT) used to examine or modify the MONITR.EXE file.

             2.  MDDT (monitor DDT)  used  to  examine  or  modify  the  running
                 monitor under timesharing.

             3.  EDDT (exec DDT) used to examine or modify the  running  monitor
                 from the CTY in a stand-alone mode.

             4.  FILDDT used to examine dumps.


        All the DDT's are versions of TOPS-20 DDT documented in the TOPS-20  DDT
        manual,  and have all of the features described in the manual.  See also
        the document DDT41.MEM.

        The use of all four versions of DDT is the same and  will  be  described
        later;  however, each version is started differently.



        UDDT:
        ----

        To use UDDT to modify the MONITR.EXE file on your system, you must  give
        the following EXEC commands:

                @GET <SYSTEM>MONITR.EXE
                @START 140      (on systems after Release 4, @DDT works too)

        This causes EDDT to start in user mode.  This is the same  DDT  that  is
        used when examining any program.  You may now look at or change any part
        of the monitor.  If you make changes to the monitor and want to save it,
        you  should  get  back  to the EXEC by typing ^Z.  Then you may save the
        monitor.  You will probably have to be enabled  in  order  to  save  the
        monitor   back  in  <SYSTEM>.   This  is  the  safest,  best,  and  only
        recommended method of putting patches into the monitor.


        MDDT:
        ----


        A version of DDT which runs in  monitor  space  is  available.   It  can
        examine  and change the running monitor, and can breakpoint code running
        as a process but not  at  PI  or  scheduler  level.   When  patching  or
        breakpointing the monitor, the normal write protection must be defeated,
        either by setting DBUGSW to 2 on startup, or  calling  SWPMWE.   If  you
        insert  breakpoints  with  MDDT,  remember monitor code is reentrant and
        shared so that the breakpoint could be hit by any other process  in  the
        system.   In  this  event,  the other process will most likely crash the
        system since it will be executing a JSR to a page full of zeros.

        To use MDDT you must have WHEEL or  OPERATOR  capabilities.   You  first
        issue the EXEC commands:

                @ENABLE
                $^EQUIT

                        ; You are now in the mini-exec and receive a  prompt
                        ; of MX>.  Now you give the "/" command:
                MX>/
                        ; You are now put into MDDT.  To return to the  EXEC
                        ; you can  issue  a ^Z  or  a ^C  which  produces  a
                        ; message like "INTERRUPT AT 17372" and returns  you
                        ; to the mini-exec.  If  you type a  ^P in MDDT  you
                        ; will get a  message, "ABORT", and  be returned  to
                        ; the mini-exec.  Once you have been in the mini-exec,
                        ; the CONTROL-P interrupt is enabled and typing this
                        ; character will return you to the mini-exec.   This
                        ; is a  good thing  to use  when debugging  programs
                        ; that do  CONTROL-C trapping.   From the  mini-exec
                        ; you may give either:
                MX>S
                        ; or
                MX>E
                        ; The S is filled  out as START and  the E as  EXEC.
                        ; Both of  these commands  will  return you  to  the
                        ; EXEC. See the section EXEC-DEBUGGING for more info
                        ; about ^P and getting  out of the  EXEC to MX>  and
                        ; returning from MX> to either your copy of the EXEC
                        ; or the system EXEC.

                        ; You may also give the command:

                MRETN$G
                        ; From MDDT to return  directly to the EXEC.   While
                        ; in MDDT you may examine  any core location in  the
                        ; running monitor.  If you wish to change any of the
                        ; locations in the protected  monitor you must  give
                        ; the command:
                        
                CALL SWPMWE$X   or      $W


                        ; To write enable the monitor.  After you have  made
                        ; your changes you must give the command:

                CALL SWPMWP$X

                        ; to write protect the monitor again.

        MDDT may also be entered from process level via JSYS:

                JSYS 777$X
                    or
                MDDT%$X ; will enter MDDT from the context of the current process


        To return to user context:

                MRETN$G

        To use SETMPG to map pages to this context:

                Page 677 has been traditionally used for this; but any unused
                page may be used.  To make sure that the page is currently
                unused type:

                ADDRESS/   ?    ; the question mark from DDT indicates that the
                                ; page is nonexistent.

                when the destination page has been found, set up AC2 as:

                AC2/ ACCESS,,677000

                If the page has its own SPT slot:

                AC1/SPT INDEX

        If the source page does not have its own SPT slot,  it  will  belong  to
        either a file or process page table.  It will be represented as an index
        into this page table:

                AC1/ SPT INDEX OF PAGE TABLE,,INDEX INTO PAGE TABLE

                Access = read and/or write access
                Read/Write access = 140000 in LH

        Therefore, to map a page, call with either:

                AC1/SPT INDEX OF PAGE
                AC2/140000,,677000

                        or

                AC1/SPT INDEX OF PAGETABLE,,INDEX INTO PAGE TABLE
                AC2/140000,,677000


                AND SAY:

                CALL SETMPG$X

        The page will then be  mapped  to  page  677.   In  examining  locations
        677000-677777, you will be looking at the contents of the page.
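
        The "LH,,RH" notation used for AC1 and AC2 above packs two 18-bit
        halves into one 36-bit word.  As a worked example, the sketch below
        builds the two argument words for the read/write case.  It only
        computes numbers to be typed into DDT and does not call SETMPG;
        the function names and the SPT index used in the usage note are
        hypothetical.

```python
READ_WRITE = 0o140000           # read/write access, from the text above


def halves(left, right):
    """Build a 36-bit word from 18-bit left and right halves (LH,,RH)."""
    return ((left & 0o777777) << 18) | (right & 0o777777)


def show(word):
    """Format a 36-bit word in octal LH,,RH notation."""
    return "%o,,%o" % (word >> 18, word & 0o777777)


def setmpg_args(spt_index, page=0o677, page_table_index=None):
    """Return (AC1, AC2) for mapping a page into the given monitor page.

    With page_table_index omitted, the page has its own SPT slot;
    otherwise spt_index is the SPT index of the page table and
    page_table_index is the index into that page table.
    """
    if page_table_index is None:
        ac1 = spt_index
    else:
        ac1 = halves(spt_index, page_table_index)
    ac2 = halves(READ_WRITE, page << 9)     # page number to address
    return ac1, ac2
```

        For an invented SPT index of 1234, show(setmpg_args(0o1234)[1])
        gives 140000,,677000, the AC2 value shown above.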

        If you desire to map another page into this  slot,  merely  call  SETMPG
        again  with  arguments  for the new page.  You need not first un-map the
        old page.  However, when you are finished, page 677 should be  un-mapped
        in the following manner:

                AC1/0
                AC2/ACCESS,,677000
                CALL SETMPG$X

        WARNING:

        Calling SETMPG incorrectly can crash the system.  Be  CAREFUL!   Do  not
        use SETMPG on a time sharing system if a crash will cause bad feelings.


        NOTE:  if you have the Release 5 version of MDDT/EDDT  that  has  sticky
        current  address  section (see DDTxx.MEM) then be careful about doing an
        MRETN$G  after  examining  section  2,  as  a  crash  will  result  from
        transferring to MRETN in section 2.


        EDDT:
        ----


                                          NOTE

                       Not to be confused with ^EEDDT command  to
                       get   into  UDDT  used  with  the  command
                       processor.  See separate document on  EXEC
                       DEBUGGING for that.



        To get into EDDT you must bring the system up using the switch-register.
        See  the  DECSYSTEM-20 Operators Guide for a discussion of switches.  Go
        through the KLINIT dialog and when you get  the  prompt  BOOT>,  respond
        with:

                BOOT>/L         or      BOOT>/E   (in version 5 or later)
                BOOT>/G141

        The "/L" command causes the monitor to be loaded, but not started.   The
        "/G141"  starts  the  monitor  at location 141, which is a jump to EDDT.
        You can use EDDT like UDDT under timesharing on the MONITR.EXE  file  by
        giving the following commands:

                $GET <SYSTEM>MONITR.EXE
                $START 140

        EDDT is linked into the monitor and is always there.  You may  also  get
        to  EDDT  from  MDDT (providing EDDT is locked down, see EDDTF below) by
        issuing the following:

                EDDT$G

        from MDDT.  This stops timesharing.  To resume  timesharing  and/or  get
        back to MDDT give the command:

                MDDT$G                  ; back to MDDT
                MRETN$G                 ; back to normal timesharing



        Breakpoints may be inserted in the resident monitor with EDDT,  but  not
        in  the  swappable  monitor in general, because its pages may be swapped
        out and be unavailable to EDDT.  You can bring them in by typing:

                SKIP LOC$X              ; where LOC is some address not in core

        and  then  set  the  breakpoint.   The   swappable   monitor   must   be
        write-enabled to set breakpoints.



        There are some locations in the monitor that are very useful when  using
        EDDT  for  debugging.   They  must  be  set before going on to start the
        monitor.


        They are:

                EDDTF   1        keep EDDT in core when system comes up
                        0        delete DDT when system comes up (default)

                DBUGSW  0        do not stop on BUGHLTs, crash and reload
                        1        stop on BUGHLTs (hit EDDT breakpoint)
                        2        write enable the monitor,
                                 do not start up SYSJOB, and stop on
                                 BUGHLTs.  Also, it doesn't run CHECKD
                                 automatically on startup.

                DCHKSW  0        do not stop on BUGCHKs (default)
                        1        stop on BUGCHKs (hit EDDT breakpoint)

                DINFSW  0        do not stop on BUGINFs (default)
                        1        stop on BUGINFs (hit EDDT breakpoint)

        In addition, the symbol GOTSWM appears in the code just after the
        swappable monitor is loaded.  So, if you want to debug the swappable
        part of the monitor, you must first put a breakpoint at GOTSWM (so
        that the swappable part is in core) by typing:

                GOTSWM$B

        Then start the monitor by typing:

                147$G

        When the breakpoint at GOTSWM is hit, lock the swappable monitor in
        core with:

                CALL SWPMLK$X

        CALL SWPMLK is used to lock swappable monitor  in  core  for  debugging.
        You  must have sufficient physical memory to give this command since the
        resident plus swappable monitor  is  rather  large.   To  start  up  the
        monitor  after  you  have  gone  into  EDDT  and set up your breakpoints
        (remember the last two are used for BUGHLT and BUGCHK) give the command:

                147$G
        or
                SYSGO1$G

        If you are in EDDT and DBUGSW is not 2, that is, the monitor is write
        protected, you can use the routines SWPMWE and SWPMWP to write enable
        and write protect the monitor, e.g., CALL SWPMWE$X in DDT.



        FILDDT:
        ------

        FILDDT is distributed on the customer software tape.

        The following is a chewed-up FILDDT.HLP file.


        LOAD (SYMBOLS FROM) FILE SPEC

        Reads specified file and builds internal symbol table.  This must be the
        first  command  to FILDDT before "GET" when looking at a dump.  You will
        most probably use <SYSTEM>MONITR.EXE which would have been  the  monitor
        running at the time of the dump.


        GET (FILE) FILE-SPEC

        Loads a file for DDT to examine.  If you are looking at a monitor dump
        you must specify DUMP.CPY explicitly, because FILDDT looks for
        MUMBLE.EXE, not MUMBLE.CPY.  That is, DUMP<ESC> will either load
        DUMP.EXE or tell you that there is no such file.  If you wish to use
        symbols when looking at a dump, you must issue the LOAD command first,
        followed by the GET command.  Be sure that the file from which you get
        the symbols is the same version, and the same monitor, as the one that
        was dumped.  That is, don't use MONMED symbols with a MONBCH dump, etc.


        EXIT (FROM FILDDT)

        Returns to command level.  You then may type a save command  if  a  load
        command  was  just  done  to preload symbols.  You will get a version of
        FILDDT that has the symbols you just loaded in it so you no longer  need
        to  "LOAD"  symbols.   You now have a monitor specific FILDDT, which was
        common practice for TOPS-10, but is not generally done for TOPS-20.


        HELP

        Types something like this text.


        ENABLE PATCHING

        Allows writing on an existing file specified by a GET.


        ENABLE DATA-FILE

        Assumes file is raw binary (i.e.  no ACs, and not an EXE file).




        DDT FEATURES:

                EP$U    Sets monitor context for FILDDT mapping.  EP is a symbol
                        which is equal to the page number of the EPT.  (Rel 4)

           <CTRL/E>     Returns to FILDDT command level.



        TRACKING DOWN UNMAPPED ADDRESSES:
        --------------------------------


        The resident monitor may be looked at without any difficulties, but  the
        swappable  monitor  may  not be in core at the time of the dump.  If the
        value of the symbol is in the swappable monitor you  must  sometimes  go
        through  the  monitor  map  to  find  where the location really is.  The
        location MONCOR contains the number of pages of resident monitor and the
        location SWPCP0 contains the first page of real core for swapping.  So
        if the value of the symbol is greater than the contents of MONCOR times
        1000 (octal), then it is in the swappable monitor.  The need to go
        through a map also applies to non-monitor pages:  mapped file pages,
        and pages from other processes and their JSBs and PSBs.

        If the page of the swappable monitor you want to look at is in core, it
        will probably not be at the location that its address refers to, since
        the dump is of physical core and pages are not relocated.  To find where
        a symbol really is in the dump, first type the symbol followed by an
        "=".  DDT will respond with the value of this symbol.  The value of the
        symbol can be divided into two three-digit octal fields.  The high order
        three digits are the page number and the low order three digits are the
        offset into the page.

        If the value of the symbol is 324621 the high order three  digits,  324,
        are  the page number and the low order three digits, 621, are the offset
        into the page.  To find the location of the page in question in the dump
        you  must  look  at  the  monitor  map  indexed by the page number.  For
        example:

                MMAP+324/

        would give you the monitor map word for page 324.   This  word  contains
        some  protection  bits for the page and the address of the page when the
        dump was taken.

        The page may have been in core, on the swapping area or on the  disk  at
        the time of the dump.

                If bits 14-17 in the monitor map word are non-zero, the page was
                on the swapping area or disk and is not available in the dump.

                If bits 14-17 are zero, the page was in core, and the right half
                of the word contains the page number in the dump of the page you
                are looking for.  (The dump program overwrites some pages of
                memory, so the dump does not contain those pages.)

        If the page was in core the new address of the symbol  you  are  looking
        for  can be found by using the page number from the monitor map word and
        appending the offset into the page  to  it.   For  example  if  MMAP+324
        contains  104000,,256;   then  the  new  address  of our symbol would be
        256621.
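        The lookup just described can be sketched as follows.  This is a
        minimal illustration in Python, not a FILDDT feature: the dictionary
        standing in for the dump, the function names, and the address used for
        MMAP are all assumptions made for the example.  Only the bit positions
        and the sample values 324621 and 104000,,256 come from the text.

```python
# Sketch of the MMAP lookup.  `dump` is a hypothetical stand-in for the
# crash dump: a dict from address to 36-bit word.

PAGE_SIZE = 0o1000            # 512 words per page
HALF_MASK = 0o777777          # 18 bits

def in_swappable_monitor(addr, moncor):
    """True if addr is greater than MONCOR (resident pages) times 1000."""
    return addr > moncor * PAGE_SIZE

def storage_bits(word):
    """Bits 14-17, where bit 0 is the leftmost bit of the 36-bit word."""
    return (word >> 18) & 0o17

def resolve(addr, dump, mmap_base):
    """Translate a monitor virtual address to its address in the dump.

    Returns None if the page was on the swapping area or disk.
    """
    page, offset = divmod(addr, PAGE_SIZE)
    map_word = dump[mmap_base + page]
    if storage_bits(map_word) != 0:
        return None
    dump_page = map_word & HALF_MASK      # right half: page number in dump
    return dump_page * PAGE_SIZE + offset

MMAP = 0o100000               # made-up address for the symbol MMAP
dump = {MMAP + 0o324: (0o104000 << 18) | 0o256}
print("%o" % resolve(0o324621, dump, MMAP))   # prints 256621
```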

        All addresses in the swappable monitor must be resolved in this  manner.
        In  addition  the pages of the JSB and PSB must sometimes be resolved in
        this manner.  There are some locations and tables in  the  monitor  that
        make this easy:

                NAME    INDEX   DESCRIPTION

                FORKX   none    Number of the fork that was running at the time of
                                the dump, -1 if in the scheduler.
                JOBNO   In PSB  Job number to which current fork belongs.
                FKJOB   Fork #  Job number,,SPT index of JSB
                JOBDIR  Job #   logged in directory number
                JOBPT   Job #   controlling TTY number,,top fork number
                FKSTAT  Fork #  test data,,address of fork wait routine
                FKPGS   Fork #  SPT index of page table,,SPT index of PSB


        SPT indexes are indexes into a share pointer table starting at SPT.   To
        find  the  PSB of fork 20, you first look at FKPGS+20.  If this location
        contains 425,,426, the word at SPT+426 is the pointer to the PSB.   This
        pointer  can  point  to disk, swap area, or a page in the dump.  If bits
        14-17 are zero it is a pointer to a page in the dump and the right  half
        of the SPT word is the page number of the PSB in the dump.
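        As a hedged sketch of the FKPGS and SPT steps above (Python, for
        illustration only): the table base addresses and the contents of
        SPT+426 below are invented for the example, and the function name is
        hypothetical.  The value 425,,426 for FKPGS+20 is the one used in the
        text.

```python
# Sketch: find the dump page that holds a fork's PSB via FKPGS and the SPT.
# `dump` is a hypothetical dict from address to 36-bit word.

HALF_MASK = 0o777777

def find_psb_page(fork, dump, fkpgs, spt):
    """Return the page number of the fork's PSB in the dump, or None."""
    fkpgs_word = dump[fkpgs + fork]
    psb_index = fkpgs_word & HALF_MASK    # right half: SPT index of PSB
    pointer = dump[spt + psb_index]
    if (pointer >> 18) & 0o17:            # bits 14-17 non-zero: not in dump
        return None
    return pointer & HALF_MASK            # right half: page number in dump

FKPGS = 0o30000               # made-up table addresses
SPT = 0o50000
dump = {
    FKPGS + 0o20: (0o425 << 18) | 0o426,  # value from the text
    SPT + 0o426: 0o1234,                  # invented: PSB is in dump page 1234
}
print("%o" % find_psb_page(0o20, dump, FKPGS, SPT))   # prints 1234
```

        The same pattern applies to the left half of FKPGS (the fork's page
        table) and to the JSB index held in the right half of FKJOB.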



        BUGHLT, BUGCHK, BUGINF
        ------  ------  ------


        The monitor contains a considerable number of internal redundancy checks
        which  generally  serve  to  prevent  unexpected  hardware  or  software
        failures from cascading into severely destructive reactions.   Also,  by
        detecting  failures  early,  they  tend  to  expedite  the correction of
        errors.

        There are three failure routines, BUGINF, BUGCHK, and BUGHLT, in order
        of increasing severity of failure.  Calls to them with JSR (or PUSHJ P,
        for Release 5 or later BUGCHKs and BUGINFs) are included in the code by
        use of a macro which records the location and a text string describing
        the failure.  The general form is:

            for 4.1:    BUG (NAME,<DATA>)

            for 6.0:    BUG. (TYPE,NAME,MODULE,HARD,<STRING>,<DATA>,<EXPLANATION>)

        Where TYPE is HLT or CHK or INF, MODULE is the source file, DATA is
        additional data, HARD is the hardware/software flag, STRING is the
        short text and EXPLANATION the long text explanation of the cause.  The
        strings are constructed during loading and are dumped into a file.  The
        BUGSTRINGS.TXT file provides an ordered listing of the bug messages
        for operator or programmer use.

        BUGCHK (or BUGINF) is used where the inconsistency detected is  probably
        not  fatal  to the system or to the job being run, or which can probably
        be corrected automatically.

        BUGHLT is used where the failure detected is likely to preclude  further
        proper  operation  of the system or file storage might be jeopardized by
        attempted further operation.


                                          NOTE


                       The exact form  the  BUGHLT/CHK/INF  macro
                       takes  is  different  for releases [3A and
                       before], [4.0, 4.1, 5.0,  5.1],  and  [6.0
                       and   after],   and  different  files  and
                       assembly forms are used, though the action
                       of the code remains essentially unchanged.
                       See the separate  article  on  the  BUGxxx
                       macro for details.



        The SWSKIT program SWSERR can be used to produce a  compact  listing  of
        the  BUGxxx entries in the system error file in a less cumbersome manner
        than SPEAR.



        DBUGSW:
        ------


        A monitor cell, DBUGSW, controls the behavior of BUGHLT and BUGCHK  when
        they  are  called.   DBUGSW  is  set  according to whether the system is
        attended by system programmers.

        If C(DBUGSW)=0, the system is not attended by system programmers, so all
        automatic crash handling is invoked.  BUGCHK will return +1 immediately,
        appearing effectively as NOP.  BUGHLT will, if called from the scheduler
        or at PI level, invoke a total reload from the disk and a restart of the
        system.  The BUGCHK/INF output will appear on the CTY and in the  SYSERR
        log when JOB ZERO gets around to them.

        If the system continues to run or is restarted properly, the location of
        the  bug  (saved  over a reload) and its message will be reported on the
        CTY.

        If C(DBUGSW).NEQ.0,  the  system  is  attended,  and  one  of  the  EDDT
        breakpoints will be hit.  This allows the programmer to look for the bug
        and/or possibly correct the  difficulty  and  proceed.   There  are  two
        defined  non-zero  settings of DBUGSW, 1 and 2, which have the following
        distinction.

                C(DBUGSW) = 1 

                        Operation is the same as with 0 except for breakpoint
                        action.  In particular, the monitor is write protected
                        and SYSJOB is started at startup as described.
                        
                C(DBUGSW) = 2

                        Is used for actual system debugging. The monitor is
                        not write protected so that it may conveniently
                        be patched or breakpointed, and the SYSJOB operation
                        is not started to save time.

                        BUGCHK and BUGHLT procedures are the same as for 1.

        The following is a summary of DBUGSW settings:

        SETTING                 0               1               2
        MEANING                 Unattended      Attended        Debugging

        BUGCHK action           NOP             Hit Breakpoint  Hit Breakpoint
        BUGHLT action           Crash System    Hit Breakpoint  Hit Breakpoint
        Monitor write protect?  Yes             Yes             No
        CHECKD on startup?      Yes             Yes             No
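
        For quick reference in a script, the summary table can be held as a
        simple lookup.  This is only a sketch in Python; the dictionary layout
        and the function name are illustrative choices, and the values are
        copied from the table above.

```python
# Sketch: the DBUGSW summary table as a lookup structure.

DBUGSW_SETTINGS = {
    0: {"meaning": "Unattended", "bugchk": "NOP",
        "bughlt": "Crash System", "write_protected": True, "checkd": True},
    1: {"meaning": "Attended", "bugchk": "Hit Breakpoint",
        "bughlt": "Hit Breakpoint", "write_protected": True, "checkd": True},
    2: {"meaning": "Debugging", "bugchk": "Hit Breakpoint",
        "bughlt": "Hit Breakpoint", "write_protected": False, "checkd": False},
}

def bug_action(dbugsw, kind):
    """Return the table entry for a BUGCHK or BUGHLT at the given setting."""
    if kind not in ("bugchk", "bughlt"):
        raise ValueError("kind must be 'bugchk' or 'bughlt'")
    return DBUGSW_SETTINGS[dbugsw][kind]

for setting in sorted(DBUGSW_SETTINGS):
    row = DBUGSW_SETTINGS[setting]
    print(setting, row["meaning"], "BUGHLT:", bug_action(setting, "bughlt"))
```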



        Other console functions:
        -----------------------


        In addition to EDDT, several other entry points are defined as  absolute
        addresses.  The machine may be started at these as appropriate.

        EVDDT   140     JRST DDTZ               ; go to EDDT
                141     JRST SYSDDT             ; reset and go to EDDT
        EVDDT2  142     JRST DDTZ               ; copy of EDDT address
        EVSLOD  143     JRST SYSLOD             ; initialize file system
        EVVSM   144     JRST SYSVSM             ; verify swap mon on startup
        EVRST   145     JRST SYSRST             ; restart
        EVLDGO  146     JRST SYSGOX             ; reload and start
        EVGO    147     JRST SYSGO1             ; start


        The soft restart (address 145, EVRST)  restarts  all  I/O  devices,  but
        leaves  the system tables intact.  If it is successful, all jobs and all
        (or all but 1) processes will continue in their previous state without
        interruption.   This  may be used if an I/O device has malfunctioned and
        not recovered properly.  The total restart  initializes  core,  swapping
        storage and all monitor tables.




        A very limited set of control functions for debugging purposes has  been
        built  into the scheduler.  To invoke a function, the appropriate bit or
        bits are set into location 20 via MDDT.  The word is scanned  from  left
        to right (JFFO).  The first 1 bit found will select the function.  Refer
        to routine SWTST in SCHED for the current details.
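
        The left-to-right scan can be illustrated with a small emulation of
        what JFFO computes: the number of the first 1 bit, counting bit 0 as
        the leftmost bit of the 36-bit word.  This Python sketch shows only
        the bit selection; which function each bit invokes is defined in
        SWTST and is not reproduced here.  The function name is hypothetical.

```python
# Sketch: emulate the bit-number result of JFFO on a 36-bit word.

WORD_BITS = 36

def first_one_bit(word):
    """Return the number (0 = leftmost) of the first 1 bit, or None if zero."""
    word &= (1 << WORD_BITS) - 1          # keep only 36 bits
    if word == 0:
        return None                       # no function requested
    return WORD_BITS - word.bit_length()

# If several bits are set, only the leftmost one selects the function.
print(first_one_bit((1 << 35) | 1))       # prints 0
print(first_one_bit(0o000000000004))      # prints 33
```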



        DDT TRICKS:
        ----------


        Here are a few useful tidbits to use in DDT when tracking down problems:


             1.  Enter MDDT from a program

                            @ENABLE
                            $SDDT
                            DDT
                            JSYS 777$X
                            MDDT

             2.  Return to program from MDDT

                            MRETN$G                     ! Return from MDDT

             3.  Set a breakpoint in the swappable monitor in EDDT at FOO:

                            BOOT>/L                     ! Load resident part
                            BOOT>/G141                  ! Start EDDT
                            EDDT
                            EDDTF[   0   1              ! Set debugging flags
                            DBUGSW[   0   2
                            GOTSWM$B   147$G            ! Breakpoint at GOTSWM
                            $1B>>GOTSWM/MOVEI T1,FKPTRS ! SWPMON is now loaded
                            SKIP FOO$X                  ! Get FOO into memory
                            <>
                            FOO$B                       ! Set breakpoint

             4.  Find all forks of job J in MDDT or EDDT or FILDDT

                            -1,,0$M                     ! Set compare flag
                            FKJOB<FKJOB+NFKS-1>J,,0$W

             5.  Map a directory in MDDT or EDDT

                            1!   directory number       ! Put DIR in AC1
                            2!   structure number       ! and STR in AC2
                            CALL MAPDIR$X               ! Do routine to map it

             6.  Write-enable monitor in MDDT or EDDT

                            CALL SWPMWE$X

             7.  Write-protect monitor in MDDT or EDDT

                            CALL SWPMWP$X



             8.  Lock swappable monitor in memory in MDDT or EDDT

                            CALL SWPMLK$X

             9.  Set monitor context for mapping in FILDDT

                            EP$U                ! Have FILDDT do address mapping

            10.  Select unmapped physical addressing in FILDDT

                            $U                  ! Clear address mapping

            11.  Select user virtual address space mapping in FILDDT

                            FKPGS+forknumber/   x,,y
                            SPT+x/   n          ! If LH(n) .NE. 0 => swapped out
                            n$1U                ! n is address of page table (UPT)

            12.  Enter EDDT from MDDT

                            EDDT$G

            13.  Return to MDDT from EDDT

                            MDDT$G

            14.  See if mapped job has been in MDDT from FILDDT

                 Releases 4.1, 5.0, 5.1:
                            DDTPGA/   ?         ! Page non-existence ==> NO

                 Release 6:
                            DDTPXA/   ?         ! Page non-existence ==> NO

            15.  Find what module defines a symbol from any DDT

                            symbol?

            16.  Reference CSTn table entries

                 Releases 4.1, 5.0, 5.1:
                            CST0/   n           ! Reference directly by symbol

                 Release 6:
                            CST0X[   x,,y   $Q<CST0: ! Define symbol from CSTnX
                            CST0/   n           ! table contents, then use symbol
                                                ! Note: CST5 access is same as 5.1



                            MORE TOPS-20 CRASH DUMP ANALYSIS
                            --------------------------------





        1.0  INTRODUCTION

        The purpose of this article is to  provide  some  basic  guidelines  for
        those  who  have  never  analyzed a TOPS-20 crash dump.  The information
        contained in this article refers to versions 4.1, 5.0, 5.1, and  6.0  of
        the  TOPS-20  Monitor,  although the basic principles will also apply to
        earlier and later  versions  of  the  Monitor.   None  of  the  concepts
        included  in  this article can be considered highly advanced;  indeed it
        is doubtful that there exists an "advanced" methodology  in  crash  dump
        analysis.   Such  techniques  are  the  result  of nothing more than the
        continual exercise of the basic skills.  In all cases, the person who is
        to perform the analysis must be familiar with the internal structures of
        the Monitor.  Obviously, one must know where to  look  for  a  potential
        problem  before  hoping  to  solve  it.   For  this reason, this article
        assumes  that  the  reader  has  an  in-depth  knowledge  of  the  basic
        structures of the TOPS-20 Monitor.



        2.0  GENERAL INFORMATION

        It would not be practical to define a method of approaching each  BUGHLT
        in  the system, but the state of the system at the time of the crash may
        be defined in terms of the data structures that it accesses.  By looking
        at  the Monitor's stack, the status of the current job, and process, and
        the condition of the Monitor's tables that were in use by the code  that
        BUGHLTed,  we can define a limited number of "types" of crashes, e.g., a
        scheduler crash, a pager crash, an APR or device interrupt crash.   Each
        crash  will  occur  while  the Monitor is using a specific subset of the
        internal data structures of the system.  We will attempt  to  limit  the
        number  of "types" of crashes based upon the function being performed by
        the Monitor at the time of the crash.  In the  sections  following  this
        general  information,  we  will  suggest some of the areas to check when
        looking at each type of crash.  This information is  not  complete,  but
        contains  some  of  the  information  that  is  more significant in each
        particular context.

        When you look at a dump, you should first try to find out why the dump
        occurred by looking at the location BUGHLT.  If BUGHLT is zero, you
        should check the CTY log to find out why the dump was taken and for
        information such as the PC at the time of the dump and the status of
        the PI system.  If BUGHLT is non-zero, it is the address from which the
        BUGHLT was issued.  You should look up the BUGHLT in BUGSTRINGS.TXT,
        BUGS.MAC, or the source code to find additional information about it.
        If at this point you are still not sure why the BUGHLT occurred, you
        will have to look at the listings for more information.  A copy of
        BUGSTRINGS.TXT is in Appendix A of the Operators manual.  You can find
        the location of the call to the BUGHLT by typing the BUGHLT tag to DDT
        followed by a "?".  DDT will tell you which monitor module the BUGHLT
        is in, and you can go to your microfiche and read all about the
        conditions precipitating the BUGHLT.

        Next, if necessary, look at FORKX.  If it contains -1, the scheduler was
        running;  otherwise it is the number of the fork that was running when
        the crash occurred.  The registers are saved at BUGACS on a BUGHLT, but
        if BUGACS+17 contains something,,BUGPDL+n, then the registers are
        invalid and you must go to the SYSERR buffer to get the good registers.
        This is done by adding to the right half of the SYSERR buffer pointer,
        SEBQOU, the offset into the buffer for the heading and ACs,
        SEBDAT+BG%ACS.  This value points to a block of 16 words containing the
        user's ACs.  You may have to chain down more than one queued-up SYSERR
        entry to get to the BUGHLT block.
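
        The two checks in the paragraph above amount to simple halfword
        arithmetic, sketched here in Python.  Every numeric value below is
        made up for illustration, the function names are hypothetical, and
        the size of the BUGPDL stack is an assumed parameter (the text gives
        only BUGPDL+n).  Take the real values of BUGPDL, SEBDAT and BG%ACS
        from the monitor symbols for the dump being examined.

```python
# Sketch: decide whether BUGACS is usable and, if not, compute where the
# AC block sits in the first queued SYSERR entry.

HALF_MASK = 0o777777

def bugacs_usable(bugacs_17, bugpdl, bugpdl_size):
    """False if the right half of saved AC17 points into the BUGPDL stack."""
    stack_address = bugacs_17 & HALF_MASK
    return not (bugpdl <= stack_address < bugpdl + bugpdl_size)

def syserr_ac_block(sebqou, sebdat, bg_acs):
    """Right half of SEBQOU plus the offsets SEBDAT and BG%ACS."""
    return (sebqou & HALF_MASK) + sebdat + bg_acs

BUGPDL, BUGPDL_SIZE = 0o40000, 0o20       # invented values
saved_p = (0o777770 << 18) | 0o40005      # something,,BUGPDL+5
if not bugacs_usable(saved_p, BUGPDL, BUGPDL_SIZE):
    # Invented SEBQOU contents and offsets, for illustration only.
    print("%o" % syserr_ac_block(0o61000, 0o4, 0o2))
```

        If that entry turns out not to be the BUGHLT block, the same
        computation would be repeated on the next queued entry.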

        Some other locations of interest in the initial stages are:

                LOCATION        DESCRIPTION

                SVN             Monitor version number string
                BUTCMD          BOOT filespec for loading the monitor
                LSTERR          Code of the last error encountered by process
                USRNAM          User name string
                P               Current stack pointer
                JOBNO           Job number of currently running process
                JOBPNM+(JOBNO)  SIXBIT program name of running program
                UAC             User's ACs when he did his last JSYS
                PAC             Monitor's ACs
                PPC             Process' PC
                UPDL            User's pushdown stack while in a JSYS
                NSKED            0 => ok to run scheduler
                                >0 => cannot run scheduler
                INTDF           -1 => ok to receive software interrupts
                                >=0 => cannot receive software interrupts

        It may be useful to know the status of a fork when it is hung or you are
        unsure  of  its  status.   This  can  be determined by looking at FKSTAT
        indexed by the fork number.  The right half  of  this  location  is  the
        address  of  a test routine and the left half is data to be tested.  For
        example if FKSTAT+12 contains 23,,FKWAT, then fork  12  is  waiting  for
        fork  23 to complete.  FKWAT is a routine that waits for another fork to
        complete and its data (the left half of the word) is the number  of  the
        fork  it is waiting for.  There are many different wait routines and you
        will have to look at the code to see what individual  ones  are  waiting
        for,  or  refer  to  the  section  on  scheduler tests elsewhere in this
        manual.
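
        Decoding a FKSTAT word is a matter of splitting it into halves and
        looking up the right half as a routine address.  A minimal Python
        sketch follows; the symbol table, the address assigned to FKWAT, and
        the function name are assumptions for the example.  In practice DDT
        does the symbol lookup for you.

```python
# Sketch: split a FKSTAT word into test data (LH) and wait routine (RH).

HALF_MASK = 0o777777

def decode_fkstat(word, symbols):
    """Return (test data, routine name or octal address) for a FKSTAT word."""
    data = word >> 18                     # left half: data to be tested
    routine = word & HALF_MASK            # right half: test routine address
    return data, symbols.get(routine, "%o" % routine)

symbols = {0o54321: "FKWAT"}              # invented address for FKWAT
fkstat_12 = (0o23 << 18) | 0o54321        # 23,,FKWAT as in the example above
data, routine = decode_fkstat(fkstat_12, symbols)
print("waiting in %s, data %o" % (routine, data))
```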



        You can easily determine all of the  forks  associated  with  a  job  by
        giving the commands:

                -1,,0$M
                FKJOB<FKJOB+NFKS-1>N,,0$W

        Where N is the job you are looking for.  A fork structure can usually be
        determined  by looking at the FKSTAT of the forks and seeing which forks
        are waiting on which forks.  A FKSTAT  of  FKSKP  indicates  a  fork  is
        inactive.
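
        The masked word search works by comparing only the bits selected by
        the mask, here the left half (-1,,0).  The following Python sketch
        performs the same comparison over a copy of the FKJOB table; the
        table contents and the function name are invented for illustration.

```python
# Sketch: find all forks whose FKJOB entry has job number N in the left half.

LEFT_HALF_MASK = 0o777777 << 18           # the -1,,0 search mask

def forks_of_job(job, fkjob_table):
    """Return the fork numbers whose FKJOB left half equals the job number."""
    target = job << 18                    # N,,0
    return [fork for fork, word in enumerate(fkjob_table)
            if (word & LEFT_HALF_MASK) == target]

# Invented FKJOB contents: job number,,SPT index of JSB
fkjob = [
    (0o3 << 18) | 0o500,                  # fork 0 belongs to job 3
    (0o5 << 18) | 0o501,                  # fork 1 belongs to job 5
    (0o3 << 18) | 0o500,                  # fork 2 belongs to job 3
]
print(forks_of_job(0o3, fkjob))           # prints [0, 2]
```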

        You should refer to STG.MAC for other fork  and  job  tables  and  other
        locations  in  the  PSB and JSB of interest.  All of the above locations
        can be examined with MDDT or EDDT while  the  monitor  is  running.   Of
        course at these times you do not have to go through MMAP and the PSB and
        JSB that are in core are your own.

        There are two separate patch areas in the monitor (FFF and SWPF).  FFF
        is the resident patch area and SWPF is the swappable patch area.  These
        two symbols should be updated to point to the next free location in the
        patch area when a patch is inserted.  By convention, all distributed
        patches are applied at FFF.  This reduces confusion, always works until
        the patch area is exhausted, and leaves patches always present in a dump
        for the cases where that is important.



        2.1  Identifying The Type Of Crash

        The Monitor performs several basic operations, each of which has its own
        set  of  tables  and  data  structures.  Some of these operations can be
        defined as:

             1.  BUGHLT processing

             2.  JSYS processing

             3.  Page faults

             4.  PSI Service

             5.  Scheduling

             6.  DTE interrupt Service

             7.  Initiating I/O transfers (queueing)

             8.  Device interrupt Service

             9.  APR interrupt Service



        2.2  The BUGHLT Itself

        There are specific areas in any crash dump that can be examined to
        determine the status and context of the system at the time of the crash.
        The most obvious of these is the location called BUGHLT, which will
        contain the address from which the BUGHLT code was called.  When looking
        at this address, remember that portions of the monitor were overwritten
        by the BOOT program when the dump was taken.  Therefore the contents of
        the location whose address is contained in BUGHLT may not match the code
        that the fiche or the listings show.  A good example of such a BUGHLT is
        PTNIC1, which is part of the APRSRV code that is overwritten by BOOT.

        See the separate discussion of the BUGxxx macro in its  many  forms  for
        more information on this useful source of problem explanation.

        A BUGHLT is performed by using the XCT instruction on a location that
        contains a JSR BUGHLT instruction.  The locations following the JSR
        BUGHLT contain the list of additional data addresses, and then the name
        of the BUGHLT in SIXBIT format, such as "PTNIC1".  Finally, in the event
        of multiple BUGCHKs, BUGINFs or even nested BUGHLTs, the location
        BUGNUM contains the number of BUGHLTs, BUGCHKs, and BUGINFs since
        the last system start-up.  This location is most helpful in obtaining a
        clearer view of the circumstances of the crash.  The case of the BUGHLT
        code itself causing a BUGHLT is extremely unusual, but it is possible
        in certain cases of extreme degradation of the system's data bases or
        "pure" code pages.
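
        When the name word is displayed as a number rather than as text, it
        can be decoded by hand: SIXBIT packs six characters into a 36-bit
        word, six bits each, with each character code equal to its ASCII code
        minus 40 (octal).  A small Python sketch of the conversion follows;
        the function names are hypothetical.

```python
# Sketch: convert between a six-character name and its SIXBIT word.

def sixbit_encode(name):
    """Pack up to six characters into a 36-bit SIXBIT word, space padded."""
    word = 0
    for ch in name.upper().ljust(6)[:6]:
        code = ord(ch) - 0o40             # SIXBIT code = ASCII code - 40
        if not 0 <= code <= 0o77:
            raise ValueError("character not representable in SIXBIT: %r" % ch)
        word = (word << 6) | code
    return word

def sixbit_decode(word):
    """Unpack a 36-bit SIXBIT word into text, dropping trailing spaces."""
    chars = [chr(((word >> shift) & 0o77) + 0o40)
             for shift in range(30, -1, -6)]
    return "".join(chars).rstrip()

word = sixbit_encode("PTNIC1")
print("%012o" % word)                     # prints 606456514321
print(sixbit_decode(word))                # prints PTNIC1
```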



        2.3  Summary Of PC Storage

        The storage of the previous state PC is often context dependent;
        however, some of the standard cases are listed below:

             1.  Crash PC - stored in location BUGHLT.

             2.  PC of JSYS - two copies are stored on the UPDL stack.

             3.  PFL/PPC - contain the current flags and PC of  the  process  at
                 the last context switch.  This might be a user or EXEC mode PC.

             4.  PIFL/PIPC - contain the flags and PC while a software interrupt
                 (PSI) is in progress.

             5.  SKDFL/SKDPC - PC saved here while process is blocking, in  case
                 of context switch.

             6.  MONFL/MONPC - PC saved here while process  is  starting  nested
                 JSYS, in case of context switch.

             7.  ENSKR/ENSKR+1 - PC saved  here  while  entering  scheduler  via
                 ENTSKD, in case of context switch.




        2.4  Summary Of AC Storage

        There are various areas where the ACs may be found, often depending on
        the context of the crash.  The "general" ones are:

             1.  UAC - previous context ACs are stored here  when  the  user  is
                 context  switched.  These are the ACs the last time the process
                 was dismissed.  If in a nested JSYS,  these  are  the  ACs  the
                 nested  JSYS  was  called  with;   the user ACs are in the UACB
                 stack.

             2.  UACB and ACBAS - the UACB block is  the  AC  stack  for  nested
                 JSYSes, and the location ACBAS (shifted left four) is the index
                 to the current set.

             3.  PAC - the EXEC mode ACs for a process are stored here when  the
                 process is dismissed.

             4.  PIAC - the EXEC mode ACs for a process are stored here  when  a
                 software interrupt (PSI) is in progress.

             5.  BUGACS - the EXEC mode ACs at the time of the crash.

             6.  BUGACU - the previous context ACs at the time of the crash.




        2.5  The Monitor's Stacks

        The next piece of valuable information is contained in the stack
        pointer, P.  It will point to one of several possible monitor stacks,
        and gives a strong indication of the context of the monitor at the time
        of the crash.  Identifying the type of BUGHLT will usually indicate
        directly which stack is in use.  Under certain circumstances, however,
        the monitor may crash while changing from one stack to another, and such
        a crash can provide useful insight into the state of the system just
        before it occurred.  The following are the names of the possible monitor
        stacks, and the context in which each of them is used:


             BUGPDL    This stack is used while  performing  BUGHLT  processing.
                       It  will normally only be important if the system crashes
                       in a nested BUG.

             BUGSPL    This stack is used when generating KLSTAT blocks.

             UPDL      This  is  the  user  stack,  in  that  it  is  used  when
                       processing a user's JSYS in exec mode.  Whenever any user
                       executes a JSYS, this area in his PSB  is  used  for  the
                       stack.   Those  processes  under  job 0 which run in exec
                       mode will also use this stack.

             TRAPSK    This stack is used by the paging code whenever a  process
                       page  faults.   Normally a page fault will occur while in
                       the midst of performing some other function,  such  as  a
                       JSYS, and the stack pointer at the time of the page fault
                       will be in location TRAPAP, which in turn  will  in  this
                       case point to UPDL plus some offset.

             CFSSTK    This stack is used when processing CFS code.

             PIPDB     This is used by the software interrupt handler.

             SKDPDL    This stack is used by the scheduler.

             DTESTK    This stack is used by the DTE interrupt service routines.

             PHYPDL    This stack is used by  PHYSIO  code  in  the  process  of
                       queueing  I/O  request blocks (IORB's).  These IORB's are
                       the  means  by  which  RH20/RH11   data   transfers   are
                       initiated.

             PHYIPD    This stack  is  used  by  the  PHYSIO  interrupt  service
                       routines, and therefore is the interrupt-level equivalent
                       of PHYPDL.  It is important to remember  that  these  two
                       stacks  are  independent of each other, and should not be
                       confused.

             PI5STK    This stack is used for unvectored PI5 interrupts, e.g.,
                       KLIPA.

             PI6STK    This stack is used for unvectored PI6 interrupts, e.g.,
                       KLNI.

             PIXSTK    This stack is used while processing  spurious  unvectored
                       interrupts.

             MEMPP     This stack is used when processing APR interrupts.

             IMSTK     This stack is used when processing AN20 interrupt code.


             The stack that is being used, and the section of code that executed
        the BUGHLT, will indicate the type of BUGHLT that has occurred.  File
        system BUGHLT's, for example, will be observed either while performing a
        JSYS, servicing an interrupt, or otherwise attempting to access a file
        system that has been corrupted to the point of being unusable.



        3.0  BUGHLT CONTEXT (BUGPDL)

             The previous PC will be stored in location  BUGHLT.   The  ACs  are
        saved  in  the  block  at  BUGACS  (and loaded into the ACs by FILDDT by
        default), hence the saved stack pointer is at BUGACS+17.   The  previous
        context  ACs are stored in the block at BUGACU.  These are the user mode
        ACs unless in a nested JSYS at the time of  the  crash,  in  which  case
        BUGACU  has  the ACs the current JSYS was called with, and the user mode
        ACs are in the UACB block.

             The stack is set to BUGPDL.  In the case of a nested BUGHLT, AC17
        will point into BUGPDL.  Location BUGLCK indicates the nesting level:

                o  -1 => no BUG in progress

                o   0 => one BUG in progress (the usual case)

                o  +N => N nested bugs in progress (very unusual - bugs
                         during the BUGxxx code)
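
        Because -1 appears in a dump as the 36-bit word 777777,,777777, it is
        easy to misread BUGLCK.  A minimal Python sketch of the interpretation
        above follows; the function names are invented for illustration, and
        the only assumption is that the word is a 36-bit two's complement
        value.

```python
WORD_MASK = (1 << 36) - 1


def signed36(word):
    """Interpret a 36-bit word as a two's complement integer."""
    word &= WORD_MASK
    return word - (1 << 36) if word & (1 << 35) else word


def describe_buglck(word):
    """Translate the contents of BUGLCK into the cases listed above."""
    count = signed36(word)
    if count == -1:
        return "no BUG in progress"
    if count == 0:
        return "one BUG in progress"
    if count > 0:
        return f"{count} nested BUGs in progress"
    return "unexpected value"          # anything below -1 is not described
```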



        4.0  JSYS CONTEXT (UPDL)

             When a process executes a JSYS, the Monitor performs the JSYS by
        dispatching through a table called JSTAB to the proper routine.  By
        convention, each routine is named with the JSYS name preceded by a ".";
        thus the routine to perform the JSYS PMAP is called ".PMAP::".  This
        name is always a global symbol.  The last JSYS executed in user context
        is saved in the PSB for the process, in locations KIMUU1 and KIMUU1+1.

                KIMUU1/   flags,,104000
                    +1/   JSYS number


             The second of these locations will contain the dispatch  offset  in
        JSTAB;   this number, when combined with the JSYS opcode (104000,,0), is
        the last JSYS performed by the user.  This, then, will point  indirectly
        through  the  JSTAB  table  to  the  place  where  the  user  JSYS began
        processing.  By following the code, and examining the stack, it is often
        possible to reconstruct the events leading to the crash.
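
        The arithmetic described above can be sketched in Python as follows.
        This is only an illustration: the function names are invented, the
        JSTAB address passed in is whatever the monitor symbol table gives,
        and it is assumed that the JSYS number occupies the right half of
        KIMUU1+1 and that JSTAB has one word per JSYS number.

```python
HALF = 0o777777
JSYS_OPCODE = 0o104000 << 18           # 104000,,0


def halfwords(word):
    """Format a 36-bit word in the usual left,,right octal notation."""
    return f"{(word >> 18) & HALF:o},,{word & HALF:o}"


def last_jsys(kimuu1_plus_1):
    """Rebuild the JSYS instruction from the number saved in KIMUU1+1."""
    return JSYS_OPCODE | (kimuu1_plus_1 & HALF)


def jstab_entry(jstab, kimuu1_plus_1):
    """Address of the JSTAB dispatch word (assumes one word per JSYS)."""
    return jstab + (kimuu1_plus_1 & HALF)
```

        If KIMUU1+1 contains 56, for example, halfwords(last_jsys(0o56))
        gives "104000,,56", and the dispatch word is at JSTAB+56.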

             The stack will contain two copies of the user's program counter
        (PC) and flags in the first four locations of UPDL.  The PSB location
        MPP contains the stack pointer at the time of the last JSYS.  Each time
        the Monitor performs a JSYS internally, the old value of MPP is pushed
        onto the stack, and MPP is set to the current value of P.

        Initial JSYS stack set-up:

                UPDL/     PC
                UPDL+1/   flags
                UPDL+2/   PC
                UPDL+3/   flags


        JSYS in Monitor context (nested JSYS):

                UPDL+n/   INTDF         ;old interrupts-deferred flag
                      /   MPP           ;previous PC, or level of nesting
                      /   Return PC of nested JSYS
                      /   PC flags


             So, MPP is the stack pointer for the return PC block.  If this is a
        nested JSYS, the ACs are saved in UACB at the proper nesting level.

        Some other useful locations in JSYS context are:



                                     JSB Locations


             USRNAM    This contains the name of the user, in ASCII.


                                     PSB Locations

             JOBNO     Contains the number of the job for this process.

             FORKN     Contains the fork number for the top fork of the  job  in
                       the  left  half  of  the word, and the fork number of the
                       current fork in the right.

             INTDF     Contains -1 if process is OKINT, 0 or  greater  if  NOINT
                       (defer all software interrupts for this job)

             NSKED     Contains 0 if process is OKSKED, 1 or greater if  NOSKED.
                       (defer scheduling of other forks)


                Monitor Fork Tables - indexed by the current fork number

             FKCNO     Contains the SPT offset that points to the second page of
                       the PSB in the left half of this word.

             FKINT     Contains the  pseudo-interrupt  communications  register,
                       with  flags  in  the  left  half  describing  the type of
                       request, and the channel number of  the  request  in  the
                       right half.

             FKINTB    Contains the pseudo-interrupt  channel  requests  pending
                       since the fork's last PSI interrupt.

             FKJOB     Job number of the fork in the left half,  and  SPT  index
                       for the JSB in the right half.

             FKJTQ     Part of a doubly linked list of forks  that  are  waiting
                       program  software interrupt the Monitor.  JTLST points to
                       the top fork on the list.

             FKNR      Contains in bits 0-8 the age stamp value at the last time
                       local garbage collection was performed.

             FKPGS     Contains the SPT indices for the process page  table,  in
                       the left half, and the PSB in the right half.

             FKPGST    Contains the address of the routine to test  for  balance
                       set  wait  satisfied in the right half, with test data in
                       the left.  If the fork is not in the  balance  set,  this
                       contains  the  time  of  day that the fork entered a wait
                       list.

             FKPT      Part of a linked list of forks on a particular scheduler
                       list,  such  as GOLST, WTLST, etc.  The right half of the
                       word contains the address of  the  next  element  in  the
                       list,  and  the  left half contains the amount of runtime
                       the fork's  job  will  have  accumulated  when  the  fork
                       exceeds its Balance Set Hold time.

             FKQ1      Contains the fork's remaining run quantum.  When the
                       quantum  expires, the fork is moved to a lower run queue,
                       and given the appropriate new quantum.

             FKQ2      Contains the fork's scheduler queue level number in the
                       left  half,  and  the  list  address, i.e.  GOLST, WTLST,
                       etc., in the right.

             FKSTAT    Contains the address of the scheduler test routine which
                       will determine when the fork is available to be placed on
                       the GOLST.

             FKTIME    Contains the time of day, in internal  format,  that  the
                       fork was placed on its current run queue.

             FKWSP     Contains the number of physical  pages  assigned  by  the
                       fork  in  the right half, and the working set size of the
                       fork when the fork entered the balance set in the left.
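
        Several of the words above (FORKN, FKJOB, FKPGS) pack two 18-bit
        fields into one word.  The following Python sketch splits them
        according to the descriptions given here; the function names and
        dictionary keys are invented for illustration only.

```python
HALF = 0o777777


def halves(word):
    """Return (left half, right half) of a 36-bit word."""
    return (word >> 18) & HALF, word & HALF


def decode_forkn(word):
    top, current = halves(word)
    return {"top_fork": top, "current_fork": current}


def decode_fkjob(word):
    job, jsb = halves(word)
    return {"job_number": job, "jsb_spt_index": jsb}


def decode_fkpgs(word):
    page_table, psb = halves(word)
    return {"page_table_spt_index": page_table, "psb_spt_index": psb}
```

        Given the fork number from the right half of FORKN, the same index
        selects that fork's entry in each of the fork tables.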



        5.0  PAGER CONTEXT (TRAPSK)

             Page faults trap through the user's UPT, by placing the  old  flags
        and  PC for the process in locations UPTPFL and UPTPFO respectively, and
        taking the new PC from location UPTPFN.  UPTPFN will usually contain the
        address PGRTRP, which is the beginning of the page fault code.

             The location being referenced and therefore causing the page  fault
        is  stored  in  UPTPFW,  also  called TRAPS0.  This contains the virtual
        address that page faulted in bits 13-35.  Bit 0 of this  word  indicates
        if the location is in user or exec (monitor) address space.  If this bit
        is set, the address is in user address space.
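
        As an illustration, the two fields just described can be extracted
        from TRAPS0 with the Python sketch below.  The function name is
        invented; the split of the address into section, page and word
        offset assumes the usual 9-bit page number and 9-bit offset within
        an 18-bit section, and no other bits of the word are interpreted.

```python
def decode_page_fail_word(word):
    """Split UPTPFW/TRAPS0 into the fields described in the text."""
    user = bool((word >> 35) & 1)          # bit 0 is the leftmost bit
    address = word & ((1 << 23) - 1)       # bits 13-35
    return {
        "user_space": user,
        "address": address,
        "section": address >> 18,
        "page": (address >> 9) & 0o777,    # page within the section
        "offset": address & 0o777,         # word within the page
    }
```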

             The PGRTRP code copies TRAPS0 into TRAPSW (before  release  6),  in
        case  of  recursion.   This  code  will determine the nature of the page
        fault, and attempt to resolve it.  UPTPFL and  UPTPFO  are  also  called
        TRAPFL and TRAPPC respectively.

             The old stack pointer is saved in location  TRAPAP  (this  is  only
        relevant  if  the  page  fault  occurred  in exec mode).  The new stack,
        TRAPSK, is set up according to the context of the page fault, i.e., user
        context,  monitor  context,  or  recursive  page fault.  The form of the
        stack changes for Release 6.  First, for earlier releases:

             A page fault in user mode causes the stack to be set  up  with  the
        runtime,  return PC, and return PC flags in the first three locations of
        the stack:

                        TRAPSK/     runtime
                        TRAPSK+1/   return PC
                        TRAPSK+2/   return PC flags


             Page faults from monitor context have the following  initial  stack
        set-up:  (prior to release 6)

                        TRAPSK/     AC1
                        TRAPSK+1/   AC2
                        TRAPSK+2/   AC3
                        TRAPSK+3/   AC4
                        TRAPSK+4/   AC7
                        TRAPSK+5/   AC16
                        TRAPSK+6/   TRAPSW
                        TRAPSK+7/   runtime
                        TRAPSK+10/  PC
                        TRAPSK+11/  PC flags

        Recursive page faults will cause the following set up in TRAPSK, at  the
        time of the page fault:  (prior to release 6)

                        /   AC1
                        /   AC2
                        /   AC3
                        /   AC4
                        /   AC7
                        /   AC16
                        /   TRAPSW
                        /   PC
                        /   PC flags

             For release 6, the code becomes more uniform, and the format of the
        stack  is  the  same for all cases;  however, some stack offsets are not
        made use of for all types of page faults -- as above.

             The code at PGRTRP sets up the TRAPSK stack, pushes CX on it, and
        then calls PFAULT, which has the following TRVAR:

                TRVAR <<PFHACS,5>,PFHTIM,PFHTMP,PFHPFW,PFHFL,PFHPC>

        and so the stack looks like this:

                TRAPSK/   CX
                    +1/   return address of call to PFAULT
                    +2/   AC15 saved by TRVAR
                    +3/   AC1                   =PFHACS
                    +4/   AC2
                    +5/   AC3
                    +6/   AC4
                    +7/   AC7 = FX
                   +10/   runtime               =PFHTIM
                   +11/   TRVAR temp location   =PFHTMP
                   +12/   UPTPFW page fail word =PFHPFW
                   +13/   TRAPFL flags          =PFHFL
                   +14/   TRAPPC pc             =PFHPC
                   +15/   .TRRET
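
        When a dump of TRAPSK is available as a list of words, the release 6
        layout above can be applied mechanically.  The Python sketch below
        simply pairs each word with its label from the table; the function
        name is invented, and the list of words is assumed to start at
        TRAPSK itself.

```python
RELEASE6_TRAPSK_LAYOUT = [
    "CX",
    "return address of call to PFAULT",
    "AC15 saved by TRVAR",
    "AC1 (PFHACS)",
    "AC2",
    "AC3",
    "AC4",
    "AC7 (FX)",
    "runtime (PFHTIM)",
    "TRVAR temp location (PFHTMP)",
    "page fail word (PFHPFW)",
    "flags (PFHFL)",
    "PC (PFHPC)",
    ".TRRET",
]


def label_trapsk(words):
    """Pair each dumped word with its octal offset and its meaning."""
    return [(f"TRAPSK+{i:o}", name, word)
            for i, (name, word) in enumerate(zip(RELEASE6_TRAPSK_LAYOUT, words))]
```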

        Recursive page faults will indicate the level  of  recursion  in  TRAPC.
        This  location  is  normally set to -1 and is incremented every time the
        page fault code is called, and decremented when a page  fault  has  been
        satisfied.

             In examining a pager crash, it is usually a good idea to begin by
        tracing down the Monitor's table entries for the location that faulted.
        This location is stored in location TRAPS0.  The identity of the page
        causing the trap is stored in location TRPID, and will be in either of
        two forms:  page table number in left, and page number in right, or
        simply the page table number in the right.  The page table number is an
        SPT index, and the page number, if any, is an offset into the page table
        pointed to by that SPT slot.

             There are four Core Status Tables (CST's) indexed by physical page
        number, that are used to keep track of each page in the machine.  A page
        fault crash will usually have bad data in either the SPT slot indicated
        in TRPID, or one of the CST's for the physical page pointed to
        indirectly through that SPT slot.

             If TRPID contains PTN,,PN, then find location SPT+PTN.  This should
        have a physical page number in the right half.  Look at this physical
        page, offset by PN in TRPID to find the pointer to the page that caused
        the fault.  Shared and indirect pointers in this location will point
        through another SPT location, but private pointers will point directly
        at the physical page that we are looking for.  If TRPID contains just
        PTN, then SPT+PTN will point directly at the physical page we are
        looking for.

             Knowing the physical page number, it is now possible to examine the
        CST tables for that page.  Refer to the section on referencing the
        non-zero section CSTs for more info.
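
        The first steps of this trace can be sketched in Python as shown
        below.  The sketch is illustrative only: the function names are
        invented, the SPT is modelled as a mapping from index to word, a
        zero left half is assumed to mean that TRPID holds only a page table
        number, and the pointer found in the map is not decoded (shared and
        indirect pointers need a further SPT lookup, as noted above).

```python
HALF = 0o777777
PAGE_SIZE = 0o1000                     # 512 words per page


def decode_trpid(trpid):
    """Return (PTN, PN); PN is None when TRPID holds only a PTN."""
    left, right = (trpid >> 18) & HALF, trpid & HALF
    if left == 0:                      # assumption: form "0,,PTN"
        return right, None
    return left, right


def locate_map_entry(trpid, spt):
    """Return (physical page from SPT+PTN, address of map word or None)."""
    ptn, pn = decode_trpid(trpid)
    page = spt[ptn] & HALF             # page number in the right half
    if pn is None:
        return page, None              # SPT slot points at the page itself
    return page, page * PAGE_SIZE + pn
```

        With TRPID containing 123,,17 and SPT+123 containing page 45 in its
        right half, the map word to examine is physical location 45017.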

             CST0      Used principally by the  pager  hardware,  this  location
                       will  contain  the Process Use Register, mentioned in the
                       FKCNO table above, and the age stamp.

             CST1      Contains the system lock count, and the backup address
                       for the page.  The lock count indicates the number of
                       system events necessary before the page will be swapped
                       out.  The system should never swap out a page with a
                       non-zero lock count.  The backup address can be a disk
                       or drum address for a page in memory.

             CST2      Contains the home map location of the  page,  and  should
                       match the contents of TRPID.

             CST3      Is used by the software  to  create  lists  of  pages  in
                       various  states  of  use.   Those pages available for use
                       will be on the Replaceable Queue, and linked together  in
                       a doubly linked list.  Those pages awaiting swapping will
                       be on a swapping device  queue,  and  part  of  a  singly
                       linked  list.   Pages in use will contain the fork number
                       of the owner in bits 3-14, and the local disk address for
                       PHYSIO for the page.

             CST5      Contains the list of short I/O  Request  Blocks  (IORB's)
                       associated with the page.
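
        Fields such as the fork number in bits 3-14 of CST3 use PDP-10 bit
        numbering, in which bit 0 is the leftmost bit of the 36-bit word.
        A small Python helper, given here only as an illustrative sketch
        with invented names, extracts such a field:

```python
def field(word, first_bit, last_bit):
    """Extract bits first_bit..last_bit (bit 0 = leftmost of 36)."""
    width = last_bit - first_bit + 1
    shift = 35 - last_bit              # distance of the field from bit 35
    return (word >> shift) & ((1 << width) - 1)


def cst3_fork_number(cst3_word):
    """Owning fork number of an in-use page, per the CST3 description."""
    return field(cst3_word, 3, 14)
```

        The same helper applies to other bit ranges quoted in this section,
        for example field(word, 0, 8) for the age stamp in FKNR.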


             A few other significant locations for page faults are:


             RPLQ      Points to the beginning of the Replaceable Queue in CST3.

             NRPLQ     Contains the number of pages on the Replaceable Queue.

             SWPLST    Points to the beginning of the PHYSIO swap list, in CST3.

             NOF       Contains the number of OFN's in use in the SPT.



        6.0  PSI CONTEXT (PIPDB)

             Tables FKINT and FKINTB will be useful in determining the type  and
        timing  of  PSI  interrupts  pending  at  the time of the crash.  When a
        process has a PSI interrupt pending, it is flagged in  the  FKINT  entry
        for  that  fork,  and the scheduler will take note of this event and set
        the PPC location in the PSB for that  process  to  contain  the  address
        PIRQ.  This action takes place at location SCHED5 in the scheduler.

             The next time that the process is ready to run, it will continue at
        location  PIRQ,  which  will  set  up the PSI stack, PIPDB.  SCHED5 also
        moves the PSI request word from FKINT to PIMSK in the PSB.  Thus, it  is
        possible to check this location for the last PSI request that was
        scheduled.

             The old contents of PPC and PFL are stored in PIPC and PIFL by  the
        SCHED5  routine,  so these will indicate the point where the process was
        interrupted.  The ACs are  stored  in  the  block  at  PIAC,  hence  the
        previous stack pointer is at PIAC+17.



        7.0  SCHEDULER CONTEXT (SKDPDL)

             The scheduler is usually invoked in one of  two  ways:   through  a
        software  interrupt initiated by channel 3 PI routine, indicating that a
        set period of time has  elapsed  since  the  last  scheduler  cycle,  or
        through  the  ENTSKD  macro,  which is used by a running process that is
        about to dismiss.  In this way the scheduler is  guaranteed  to  run  at
        regular intervals, or whenever the system is idle.

             The primary entry point to the scheduler is SCHED0.  It is through
        this location that control passes whenever the running process
        dismisses, or whenever one of the two scheduler clock cycles elapses.

             Briefly, the hardware traps on every clock  tick  through  location
        TIMVIL  in the EPT.  This location contains the instruction XPCW TIMINT.
        Again, as in the device interrupt  code,  this  instruction  causes  the
        flags and PC to be placed in locations TIMINT, and TIMINT+1, and control
        passes to the location in  TIMINT+3,  which  in  this  case  is  TIMIN0.
        TIMIN0  determines  whether  or not it is time to run the scheduler, and
        dismisses the interrupt.  The path  taken  by  the  KS-10  processor  is
        slightly  different,  taking  a 40+2*n interrupt on the CPU channel (3),
        but it winds up in the  same  place  (TIMIN0)  when  it  determines  the
        interrupt was for a clock tick.
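
        The four-word block used by XPCW appears in several of the contexts
        in this chapter, so it may help to see its effect spelled out.  The
        Python sketch below models memory as a dictionary and is only an
        illustration of the store and fetch order described above; the
        function name and the addresses used with it are invented.

```python
def xpcw(memory, e, flags, pc):
    """Model XPCW E: save old flags and PC, fetch the new flags and PC."""
    memory[e] = flags                  # E   receives the old flags
    memory[e + 1] = pc                 # E+1 receives the old PC
    new_flags = memory[e + 2]          # E+2 supplies the new flags
    new_pc = memory[e + 3]             # E+3 supplies the new PC
    return new_flags, new_pc
```

        In a dump, therefore, the interrupted PC is found in the second word
        of the block, and the fourth word shows where control went.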

             If the  scheduler  is  to  be  run,  TIMIN0  initiates  a  software
        interrupt  on  channel  7,  which causes a trap through the EPT location
        KIEPT+56 to PISC7R.  The instruction executed in  KIEPT+56  is  an  XPCW
        PISC7R,  causing  the  old  PC  and flags to be deposited at PISC7R, and
        control to begin at PISC7+1.  The PISC7 code sets  up  PPC  and  PFL  to
        contain  the old PC and flags, from PISC7R, and saves the process ACs at
        the time of the interrupt in a block of the PSB called PAC.

             Having set up for scheduler context, the PISC7 code then  transfers
        control to the SCHED0 routine.  Similarly, the ENTSKD macro does an XPCW
        ENSKR, causing a jump to  the  ENSKED  routine  that  does  the  context
        switch.

             The stack is set to SKDPDL.  The previous PC is stored by the  code
        in  PFL  and  PPC in the PSB.  The ACs are stored in PAC (exec mode) and
        UAC (previous context ACs) in the PSB.  The previous stack pointer is in
        the saved ACs.

        Some other useful locations in scheduler context:


             FORKX     Contains -1 if no fork is chosen, or the fork number of
                       the chosen fork.

             INTDF     Contains -1 if process is OKINT, 0 or  greater  if  NOINT
                       (defer all software interrupts for this job)

             NSKED     Contains 0 if process is OKSKED, 1 or greater if  NOSKED.
                       (defer scheduling of other forks)

             SKEDF3    If nonzero, will cause the scheduler  to  reevaluate  the
                       balance set and reschedule all forks.

             SKEDF1    If nonzero, indicates that a fork has been chosen to run,
                       and the scheduler should set the fork context.

             SKEDFC    If nonzero, forces a clear of the balance set and memory.

             RSKED     Contains the instruction to be  executed  when  a  NOSKED
                       process goes OKSKED.

             INSKED    If nonzero, indicates the scheduler  overhead  cycle  has
                       been entered.

             SSKED     Holds the NOSKED fork number, if any.

             SCKATM    The software clock that generates a channel  7  interrupt
                       when it has been decremented to zero.

             GOLST     Points to the beginning of the GOLST in the FKPT table.

             WTLST     Points to the Wait list in the FKPT table.

             TTILST    Points to the TTY input wait list in the FKPT table.

             FRZLST    Points to the list of frozen forks.

             WT2LST    Points to the list of  forks  waiting  to  be  unblocked.
                       (UNBLK1)

             TRMLST    Points to the list of forks waiting for another  fork  to
                       terminate.

             SUMNR     Contains the number of reserved pages.  (locked in
                       memory)

             BALSHC    Contains the number  of  pages  reserved  due  to  shared
                       access.



        8.0  DTE INTERRUPT CONTEXT (DTESTK)

             DTE  interrupts  also  dispatch  through  locations  in  the   EPT,
        depending upon which DTE is interrupting.  For each DTE that could exist
        on a system (4), there is an eight word block in the EPT  used  to  keep
        up-to-date  information  for  that  DTE.  Not all of the DTE blocks will
        necessarily be used, however they will all  exist  in  the  EPT.   These
        blocks  begin  at location DTEEBP.  The format of one of these blocks is
        described below.  The DTE interrupt executes the third word in this
        block, which contains an XPCW DTEN0.

             The old PC and flags will be stored at location DTEN0,  and,  since
        DTEN0+3  contains  ".+1", the system will begin processing the interrupt
        at location DTEN0+4.

             The flags and PC will be stored at DTETRA and  the  ACs  stored  at
        DTEACB  (previous  stack  at  DTEACB+17).   The new stack will be set to
        DTESTK.

             DTEN0 will then use INTDTE to process the interrupt.  This code can
        be found in the DTESRV module of the monitor.

        The DTE control block:

                DTEEBP/   To -11 byte pointer
                DTETBP/   To -10 byte pointer
                DTEINT/   "XPCW DTEN0"          ;dispatch for DTE-0
                      /   reserved
                DTEEPW/   Examine Protection Word
                DTEERW/   Examine Relocation Word
                DTEDPW/   Deposit Protection Word
                DTEDRW/   Deposit Relocation Word

        Note that the labels above apply only to DTE-0, and that  the  remaining
        DTE's must be offset by 8 times the DTE number.
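
        The offset rule can be written out as the Python sketch below.  It is
        illustrative only: the function name is invented, and the address of
        DTEEBP is passed in as a parameter, to be taken from the monitor
        symbol table rather than from this example.

```python
DTE_BLOCK_WORDS = 8
DTE_BLOCK_LABELS = ["DTEEBP", "DTETBP", "DTEINT", "reserved",
                    "DTEEPW", "DTEERW", "DTEDPW", "DTEDRW"]


def dte_word_address(dteebp, dte_number, label):
    """EPT address of one word in the control block of the given DTE."""
    if not 0 <= dte_number <= 3:       # up to four DTEs can exist
        raise ValueError("DTE number must be 0 through 3")
    return dteebp + DTE_BLOCK_WORDS * dte_number + DTE_BLOCK_LABELS.index(label)
```

        For example, the interrupt instruction for DTE-1 is at DTEEBP+12
        (octal), that is, DTEINT+10.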

                Some other useful locations in the EPT:

                DTEFLG/   Operation Complete Flag
                DTECFK/   Clock Interrupt Flag
                DTECKI/   Clock Interrupt Instruction
                DTET11/   To -11 argument
                DTEF11/   From -11 argument
                DTECMD/   Command Word
                DTESEQ/   DTE20 Operation Sequence Number
                DTEOPR/   Operation In Progress Flag
                DTECHR/   Last Typed Character
                DTETMD/   Monitor TTY Output Complete Flag
                DTEMTI/   Monitor TTY Input Flag
                DTESWR/   Console Switch Register

        These locations are found at offsets 444 through 457 in the EPT.
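
        The twelve labels and the twelve offsets 444 through 457 can be
        paired as in the Python sketch below.  This assumes that the labels
        occupy consecutive words in the order listed above, which should be
        checked against the monitor symbols before relying on it; the names
        EPT_LABELS and ept_offsets are invented for illustration.

```python
EPT_LABELS = ["DTEFLG", "DTECFK", "DTECKI", "DTET11", "DTEF11", "DTECMD",
              "DTESEQ", "DTEOPR", "DTECHR", "DTETMD", "DTEMTI", "DTESWR"]


def ept_offsets(first=0o444):
    """Map each label to its assumed EPT offset, in listed order."""
    return {label: first + i for i, label in enumerate(EPT_LABELS)}


def show(offsets):
    """Format the map as DDT-style 'offset/  label' lines."""
    return [f"{offset:o}/   {label}" for label, offset in offsets.items()]
```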



        9.0  PHYSIO I/O QUEUEING LEVEL (PHYPDL)

             All disk and tape I/O is initiated  through  the  PHYSIO  code,  by
        calling PHYSIO with a pointer to an I/O Request Block (IORB) in AC1, and
        the addresses of the Channel Data Block (CDB) and Unit Data Block  (UDB)
        in  AC2  (CDB,,UDB).   PHYSIO  validates the arguments passed to it, and
        then determines whether the IORB belongs  on  the  Position  Wait  Queue
        (PWQ) or the Transfer Wait Queue (TWQ).  These two queues are pointed to
        by offsets UDBPWQ and UDBTWQ in the UDB for the device.  Note that these
        are offsets into the UDB, which, like the CDB's, will be in resident
        free space.  During processing, PHYSIO will keep the following
        information in the ACs:

                P1/   address of the CDB
                P2/   address of the KDB (for tapes or RP20) or 0
                P3/   address of the UDB
                P4/   address of the IORB being processed

        Since PHYSIO is called via the PUSHJ P, instruction, the previous PC  is
        saved  on  the caller's stack.  The P and Q AC's are stored on the stack
        via the SAVEPQ macro.

             PHYSIO does use a private stack, and the old stack pointer is saved
        in PHYSVP.

             Also, because PHYSIO does use a private stack, it is necessary  for
        the  process  calling  PHYSIO  to be NOSKED.  Also take note of the fact
        that IORB's are associated with the physical pages of  memory  that  are
        involved  with  the  I/O  through  pointers  in the CST5 table for those
        pages.  See the next section for more information in this area.



        10.0  DEVICE INTERRUPT CONTEXT (PHYIPD)

             Device  interrupts,  in  this  context,  refer  to  disk  and  tape
        interrupts,  those  devices  connected  through  the  RH20's.  Each RH20
        channel has a "Channel Logout" area  at  the  beginning  of  EPT.   This
        logout area is four words in length for each channel, the fourth word of
        which  contains  an  instruction  to  execute  on  an  interrupt.   This
        instruction  causes  the  system to dispatch to code actually in the CDB
        for the channel.

             On the 2020, the interrupts work  differently.   The  EPT  contains
        pointers  to  SM10 vector tables starting at address SMTEPT.  The number
        of the interrupting UBA (1 or 3) is used as an offset to SMTEPT to  find
        the  proper  vector  table, and then the function and device (read done,
        DZ11, etc...) is used as an offset into the vector table which  contains
        the  appropriate  XPCW  instruction  to  transfer control to the correct
        routine.

             The previous PC  and  flags  are  saved  in  the  area  immediately
        preceding  the  CDB;  offset CDBINT (value -6) is the location where the
        flags and PC are  stored.   When  the  interrupt  occurs,  the  hardware
        executes  the  instruction  in  the  channel logout area, which is "XPCW
        loc".  "Loc" is the address of the  CDB  for  this  channel,  offset  by
        CDBINT  (-6).   The XPCW instruction saves the flags at CDBINT(CDB), the
        PC at the next location, and gets the new flags and PC from the next two
        locations.  This area of the CDB, then, contains the following:

                CDBINT(CDB)/   old flags
                    -5(CDB)/   old PC
                    -4(CDB)/   new flags (0)
                    -3(CDB)/   new PC ( ".+1")
                    -2(CDB)/   MOVEM P1,CDBSVQ(CDB) ; save P1 in CDBSVQ
                    -1(CDB)/   JSP P1,PHYINT        ; dispatch to interrupt code
                CDBSTS(CDB)/   status and configuration flags


             PHYINT sets up to use the stack PHYIPD, and saves the  ACs  in  the
        block at PHYACS; the previous stack pointer (AC17) is therefore found at
        PHYACS+17.

             The KLIPA code takes a 40+2*n interrupt through the EPT to  EPT+52,
        thence  to  PISC5:   (in  STG) and from there to KLPSV:  (in PHYKLP) and
        finally to PHYINT:.
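
             The "40+2*n" rule can be checked with a short  calculation.   The
        sketch below assumes n is the priority interrupt channel number and
        that the KLIPA case above is channel 5 (suggested by the name PISC5);
        the function name is hypothetical.  All values are octal.

```python
def pi_vector_offset(channel):
    """EPT offset of the interrupt instruction for PI channel n: 40 + 2*n."""
    if not 1 <= channel <= 7:
        raise ValueError("PI channels are numbered 1 through 7")
    return 0o40 + 2 * channel


# Channel 5 is an assumption based on the PISC5 label.
print(oct(pi_vector_offset(5)))
```

        The same rule applied to channel 3 gives offset 46, which agrees with
        the KIEPT+46 location quoted for APR interrupts in this handbook.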

             The PHYINT code, then, resolves the interrupt, and returns  to  the
        old  PC  by  JRSTing through offset CDBJEN in the CDB.  This part of the
        CDB contains the following:

                CDBJEN(CDB)/   BLT 17,17
                         +1/   DATAO RH,CDBRST
                         +2/   XJEN CDBINT(P1)

        The last of these locations causes the system to  resume  where  it  was
        interrupted.    During   processing  of  the  interrupt,  the  following
        information may be found:

                P1/   address of the CDB
                P2/   address of the KDB or 0
                P3/   address of the UDB
                P4/   address of the IORB or argument code:

                        (P4) < 0 - schedule a channel cycle
                        (P4) = 0 - dismiss interrupt
                        (P4) > 0 - complete current request (IORB address)


             When the system is attempting to perform I/O to or from a  specific
        page  of physical memory, that page is locked into core, by incrementing
        the lock count in the CST1 location for that page.  If  a  device  error
        occurs  during  the  transfer of data for that page, then the CST5 entry
        for that page will have either a short I/O Request  Block  (IORB)  or  a
        pointer  to  a long (magtape or DSKOP) IORB.  The short IORB is only one
        word in length and is used for disk transfer requests,  i.e.,  swapping.
        In either case, the first word of an IORB, called IRBSTS, contains flags
        that describe the success or failure of the transfer.  It may be helpful
        to check these locations in the event of a PHYINT crash.

             The  following  offsets  contain  useful  information  for   PHYSIO
        crashes:

        In the UDB:

                UDBPS1/   cylinder number
                UDBPS2/   surface,, sector number
                UDBERC/   error retry count
                UDBERR/   status function for error retry

        In the CDB:

                CDBCNI/   status of channel when interrupt began.




        11.0  APR INTERRUPT CONTEXT (MEMPP)

             These interrupts are the result of one of numerous hardware  errors
        being  detected -- memory parity error, address parity error, NXM error,
        cache directory parity error,  SBUS  error,  IO  page  fail,  etc.   APR
        Interrupts, like Device interrupts, are vectored through the EPT, but in
        the case of the APR interrupts, the vector location is  a  part  of  the
        priority interrupt scheme.  These are priority channel 3 interrupts, and
        dispatch through location KIEPT+46, which contains an XPCW PIAPRX.  This
        is  the  channel  3  interrupt  routine.   As  in the case of the device
        interrupt, the XPCW PIAPRX will cause the PC and flags to be  stored  at
        locations  PIAPRX  and PIAPRX+1, and the processor will then jump to the
        location stored in PIAPRX+3, which is PIAPR+1.  PIAPR actually dismisses
        the APR interrupt, or BUGHLT's.

             This routine will set up its own stack, MEMPP.  The previous  stack
        pointer will be stored in MEMAP.

             The current AC block is switched to AC block 2 and so the  ACs  are
        not stored in memory.

             One unusual aspect about handling APR interrupts is that the  PIAPR
        code  changes the page fault trap vector, mentioned earlier, from PGRTRP
        to MEMPTP, in UPTPFN, to handle the special case of a page fault in  APR
        interrupt context.


        12.0  ARPANET INTERRUPT LEVEL (IMSTK)

             The interrupt stack is set to IMSTK via the MKNCTS macro called  in
        STG.  Interrupts enter through the XPCW at the NTIINT offset in the NCT,
        e.g., NTIINT+(NCTVT).  The previous PC is stored  in  a  doubleword  at
        NTIPCW+(NCTVT).   The  ACs are stored at NTSVAC+(NCTVT), so the previous
        stack is at NTSVAC+(NCTVT)+17.

             The location NCINPC+(NCTVT) contains the initial interrupt dispatch
        address.   The  dispatch  addresses  for  message  input  and output are
        NTIDSP+(NCTVT) and NTODSP+(NCTVT) respectively.  See the  definition  of
        the NCT table in ANAUNV.MAC.

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                               Page 122
        Referencing CST Entries under TOPS-20 Version 6


                    REFERENCING CST ENTRIES UNDER TOPS-20 VERSION 6
                    -----------------------------------------------




             Under Release 6 of TOPS-20, a number of data structures  have  been
        relocated  out  of section zero of the monitor's address space.  In some
        cases this necessitated changes in the way those  data  structures  were
        accessed,  and how they are accessed via FILDDT in crash dumps.  The CST
        tables (with the exception of CST5) are one such data  structure.   They
        are  accessed  in  the monitor by indirect reference through a series of
        tables with names of the form CSTnX, e.g.  CST3X to reference CST3.  The
        tables  are  16 words long, where CSTnX + m is an indirect word pointing
        to CSTn and indexed by register m.  Therefore  the  monitor  can  use  a
        construct  such  as  MOVE  T1,@CST0X+P1 where previous monitors used the
        form MOVE T1,CST0(P1) to fetch the CST0 entry for  the  page  number  in
        register P1.
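
             The CSTnX indirection can be modelled as follows.   This  is  a
        sketch of the addressing idea only, not monitor code.  It assumes that
        P1 is register 10 (octal), its customary assignment, and it borrows a
        CST0 base of 5,,203000 purely as sample data; the function names are
        hypothetical.

```python
def build_cstnx(cst_base):
    """Model a 16-word CSTnX table: entry m is an indirect word that
    points at the CST base and is indexed by register m."""
    return [(cst_base, m) for m in range(16)]


def resolve(cstnx, register_number, registers):
    """Effective address reached by @CSTnX+register_number."""
    base, index_register = cstnx[register_number]
    return base + registers[index_register]


P1 = 0o10                          # assumed register number for P1
cst0_base = (5 << 18) | 0o203000   # sample base, written 5,,203000

registers = [0] * 16
registers[P1] = 0o237              # page number held in P1

address = resolve(build_cstnx(cst0_base), P1, registers)
print("%o,,%o" % (address >> 18, address & 0o777777))
```

        The point of the table is that the same instruction form works for
        any index register: the register number selects the table entry, and
        the entry in turn indexes by that register.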

             The following is an example of a method that can be used in  FILDDT
        to  access  the CST tables in a crash dump, assuming we want to find out
        the CST information for page 237:

        @ENABLE (CAPABILITIES) 
        $FILDDT
        FILDDT>LOAD (SYMBOLS FROM) SYSTEM:MONITR.EXE
        [38136 symbols loaded from file]
        FILDDT>GET (FILE) SYSTEM:DUMP.EXE
        [ACs copied from BUGACS to 0-17]
        [Looking at file GIDNEY:<SYSTEM>DUMP.EXE.1]

        EP$U                                    ! Establish virtual mapping

        0,,CST0X[   5,,203000   $Q<CST0:        ! Define symbols CST0,...,CST3
        0,,CST1X[   5,,217000   $Q<CST1:        ! from the contents of the
        0,,CST2X[   5,,233000   $Q<CST2:        ! CSTnX tables zeroth location
        0,,CST3X[   5,,247000   $Q<CST3:   

        CST5=203001                             ! CST5 was not moved for v6

        CST0+237[   556000,,400321   .=5,,203237! Now we can reference the CST
        CST1+237[   101,,0                      ! entries for page 237 in the
        CST2+237[   624,,237                    ! same old way we did for
        CST3+237[   77770,,0                    ! earlier releases.
        CST5+237[   556000,,400321   

        ^Z
        $
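
             The address arithmetic done by hand in the session above can be
        sketched in Python.  The table bases below are copied from this one
        example dump and will differ in other monitors; the helper names are
        hypothetical.

```python
def parse_halves(text):
    """Convert DDT "left,,right" octal notation to an integer."""
    left, right = text.split(",,")
    return (int(left, 8) << 18) | int(right, 8)


def format_halves(word):
    """Convert an integer back to "left,,right" octal notation."""
    return "%o,,%o" % (word >> 18, word & 0o777777)


# Bases read from CST0X..CST3X in the example session above.
CST_BASES = {
    "CST0": "5,,203000",
    "CST1": "5,,217000",
    "CST2": "5,,233000",
    "CST3": "5,,247000",
}


def cst_entry_addresses(page):
    """Address of each CSTn entry for a given physical page number."""
    return {name: format_halves(parse_halves(base) + page)
            for name, base in CST_BASES.items()}


for name, address in cst_entry_addresses(0o237).items():
    print(name, address)
```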


             See SWSKIT document MONITOR-ADDRESS-SPACE.MEMOS for more detail.

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                               Page 123
        The BUG Macro


                                     THE BUG MACRO
                                     -------------




             Across the various releases of TOPS-20, the  BUG  macro  mechanism
        has changed in form many times, but its essence has remained the  same.
        The BUG macro generates a code sequence in the monitor  to  report  the
        occurrence of a software-detected error and, in the case of  a  BUGHLT,
        to crash the system.  The code
        generates an XCT bugname which is a call to the proper routine to handle
        the BUGHLT/CHK/INF and provides the argument list of additional data for
        the  BUG.   In  3- and 4-series monitors, the call is via JSR to BUGHLT,
        BUGCHK, or BUGINF.  In 5-series and later monitors, the call is via  JSR
        to  BUGHLT  or  CALL  to BGCCHK or BGCINF.  In addition, the single line
        descriptive text is appended to the BGSTR PSECT, and  a  pointer  to  it
        placed in the BGPTR PSECT.

             With current monitors, a document called  BUGHLT  Documentation  is
        included  with  the Software Notebook set, which brings together all the
        additional data that is now part of the BUG description.  This should be
        considered an essential debugging document.

             For 3-series monitors, all of the information for the BUG was found
        in-line  in  the  source file.  There was only a single line descriptive
        text, and so all information  about  the  condition  had  to  be  gotten
        directly from the code.

             For 4-series monitors, there is a file  called  BUGS.MAC  which  is
        part  of  the  monitor build process and which contains the detailed BUG
        descriptions as part of the DEFBUG macros.  BUGS.MAC assembles  as  part
        of the build of PROLOG.UNV, and the calls to the BUG macro in the source
        look like:  BUG(bugname,<additional data>).  For example:

                BUG(XBWERR)
                BUG(WSPNEG,<<FX,D>,<T2,D>>)

        That is, essentially all the descriptive text is in the  BUGS.MAC  file,
        and not in the source.  DEFBUG and BUG are defined in PROLOG.

             For 5-series monitors, the same method as for 4-series monitors  is
        used,  with  the  additional  data field descriptors taking on mnemonics
        instead of the "D".  The descriptive text is still all in BUGS.MAC.

             With Release 6, the procedure changes again.  The whole  BUG  macro
        text moves back in-line in each of the source modules, as in Release  3;
        however, the long argument list with the long descriptive text  remains.
        The BUGS.MAC file disappears.  The calling name becomes "BUG."  (with  a
        trailing period) instead of BUG, and some new argument options are added.

             Here is a description from PROLOG, and an example of the  new  form
        for Release 6 of TOPS-20:

        ;Macros for defining BUGs
        ;General format for in-line bug macro call is:
        ;       BUG. (TYP,TAG,MODULE,WORD,STR,LOCS,HELP,CONTIN)
        ;
        ;TYP -  Flavor, HLT, CHK, or INF
        ;
        ;TAG -  Name of BUG
        ;
        ;MODULE -
        ;       Name of module in which BUG occurs.
        ;
        ;WORD - Flavor of BUG.  For instance, HARD for hardware-caused, SOFT
        ;       for software-caused.
        ;
        ;STR -  Short descriptive string describing cause of BUG, which gets
        ;       printed on CTY when BUG occurs.
        ;
        ;LOC -  List of locations whose contents should be displayed when the
        ;       BUG occurs.  Each location must be followed by a comma and
        ;       then a one-word descriptor of what the datum represents, for
        ;       instance UNIT or CHN.  Each pair of locations and descriptors
        ;       must be in angle brackets, and the angle-bracketed pairs must
        ;       be separated by commas with the entire LOC argument in angle
        ;       brackets.
        ;
        ;HELP - General documentation for the BUG
        ;
        ;CONTIN - Optional continuation address after BUGCHK or BUGINF is
        ;       logged.  Assumed to be in same section with call.


        For example, from PAGEM.MAC, the PCIN0 BUGCHK:


                    BUG.(CHK,PCIN0,PAGEM,SOFT,<PAGEM - PC has gone into section 0>,
        <<T2,PC>,<T1,PFW>>,<

        Cause:  A reference has been made to RSCOD or NRCOD in section 0.
                This should not happen because section 0 code cannot
                reference data in extended sections.  As an expedient,
                the page being referenced will be mapped to section 1
                with an indirect pointer.
        >)
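
             The nesting of angle brackets in the location list is easy  to
        get wrong.  As an illustration only, the sketch below builds that
        argument from (location, descriptor) pairs following the rules quoted
        from PROLOG above.  The helper is hypothetical and is not part of any
        TOPS-20 tool.

```python
def format_locs(pairs):
    """Build the location-list argument of BUG. from (location, descriptor)
    pairs: each pair in angle brackets, pairs separated by commas, and the
    whole list in angle brackets."""
    for location, descriptor in pairs:
        if len(descriptor.split()) != 1:
            raise ValueError("descriptor must be a single word: %r" % descriptor)
    inner = ",".join("<%s,%s>" % (loc, desc) for loc, desc in pairs)
    return "<%s>" % inner


# The pairs used by the PCIN0 example above.
print(format_locs([("T2", "PC"), ("T1", "PFW")]))
```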


             There is further information in the BUGHLT Documentation section of
        the TOPS-20 Notebook Set, and the SWSERR program is useful in extracting
        BUGxxx entries from ERROR.SYS.

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                               Page 125
        Monitor Building Hints


                                 MONITOR BUILDING HINTS
                                 ----------------------


        1. GENERAL

        Judging from the  number of  requests for  help on  this subject,  the
        chances are that you  will be required to  rebuild a monitor  sometime
        during your career  as a  Software Specialist. The  reasons are  quite
        simple.  There are customers, who simply want functionality other than
        that provided  by  stock  monitors.  There  are  also  those  who  are
        experiencing performance problems. We  cannot forget the sales  folks.
        It is not  unusual to  have to  rebuild a monitor  in order  to run  a
        benchmark. A very common example  is increasing the OFN area.  Another
        quite common requirement is  to increase the  patch area (FFF).  Doing
        either of these and simply submitting a build control file will  often
        produce a bad monitor.

        We will talk about PSECTS in  relation to the Monitor's address  space
        but will  make  no attempt  to  define what  they  do. A good detailed
        discussion on the Monitor's address space is on pages 2-62 to 2-73  in
        the Release 4  Update Manual. Also  there is a  memo on the  Monitor's
        address space in the SWSKIT.


        2. BACKGROUND

        In V3A, all of the Monitor was in the same address space. Nevertheless
        there was a crunch on space. As  a result some PSECTS were allowed  to
        overlap. So  critical  was the  space  requirement, that  attempts  to
        increase the OFN area  or FFF usually resulted  in the overlapping  of
        PSECTS other than the ones permitted. Therein lies  the problem.  The
        Monitor produced from such a process would ordinarily be useless.
                                        
        With  the  development  of  V4,  the  space  requirement  became  more
        critical.  The Symbol Table became the object of concern. It  required
        a large number of pages, and in general, it is only used  infrequently
        under normal  conditions.  Hence  the Engineering  folks were  of  the
        opinion that  it should  be completely  eliminated.  We  objected.  It
        would be a nightmare to try  to debug the monitor without symbols.  It
        thus became  our  project  to  somehow keep  the  Symbol  Table  while
        conforming with  the space  restrictions.  We  decided to  remove  the
        Symbol Table and place it in  an alternate  address  space. It  should
        be noted  that  this  action  does  not  impact  adversely  on  system
        performance. With this change, the  build procedure and the  monitor's
        address space were reorganized.

        3. BUILD PROCEDURE

        Outlined below are some steps to guide you when rebuilding a  monitor.
        Bear in mind that this  is a guide and might  not account for all  the
        unusual situations.  This guide however, coupled with your  experience
        and common  sense will  most likely  do the  trick. PLEASE  READ  THIS
        ENTIRE MEMO BEFORE ATTEMPTING TO  REBUILD  YOUR  MONITOR. Also  please
        read the build BEWARE file that is on the Installation tape.
                
        NOTE:   The customer's Distribution Tape will have all the files needed
                to rebuild  the  monitor.  All  TOPS-20  modules  will  be  in
                TOPS-20.REL (or T2020.REL etc.). The control file is TOPS20.CTL
                (or T2020.CTL  etc).  The  link file  will be  NAME.CCL  where
                "NAME" depends upon what monitor is being used (could be 2020,
                ARPA etc.). For 2040/50, it is called LNKSCH.CCL. In any  case
                the TOPS20.CTL file  will have  the name. The  files you  will
                change will be one  of  the  PARAM's  files and/or STG.MAC. It
                should be noted that the special LINK.EXE and MACRO.EXE needed
                to build V3A are not required under V4.
                
          ====> The very first thing to do is to use all the standard files to
                build a "vanilla" monitor without any changes.  This will expose
                most of the problems in your build procedure before  your  own
                changes can complicate matters, and hence should substantially
                reduce debugging time.

        STEP 1          Restore all files needed  from <n-SOURCES>. This  will
                        usually contain the monitor modules (TOPS20.REL file),
                        all needed source  files, all  build control,  command
                        and log files.
                        
        STEP 2          Carefully make the source changes as needed.
                        
        STEP 3          Examine the TOPS20.CTL  file. This  file will  usually
                        have logical name definitions and TAKE commands  along
                        with other things. Also look at all referenced files.
                        
        STEP 4          Examine the  corresponding log  file. This  will  show
                        what the result of  the original build procedure  was.
                        It should therefore be a template which should be used
                        to judge the validity of the new Monitor. Pay  special
                        attention to the section which shows the PSECT  layout
                        at the  end of  the BUILD  procedure. This  shows  the
                        start location,  the end  location and  the amount  of
                        free space between each PSECT.  The file used by  LINK
                        to set up the PSECTS is called LNKSCH.CCL. You  should
                        look at this file to get an idea of what's happening.
                        
        STEP 5          Now edit the control and command files as necessary to
                        reflect your environment. This will mean, among  other
                        things,   changing   or   eliminating   logical   name
                        definitions.  Do NOT change the order of the PSECTS in
                        the LNKSCH.CCL file. Also  do not change the  starting
                        value for any PSECT.  The starting value is the  value
                        given to the /SET: switch.
                        
        STEP 6          Submit  the  control  file  with  /TAG:SINGLE  switch.
                        Ensure that the control  file is correct and  reflects
                        accurately logical name definitions and the .CCL file.
                        Also this portion  of the .CTL  file has the  commands
                        necessary to compile the changed module.

                        
        STEP 7          When the job ends, examine your log file. Correct  any
                        compilation or  missing files  errors and  go back  to
                        STEP 6. Continue with STEP 8 only after all errors are
                        eliminated.
                        
        STEP 8          At this  point  you  should have  a  MONITR.EXE.   Now
                        examine the section  in the  log file  which gives  an
                        outline of  the  PSECTS.   If any  PSECTS  overlap,  a
                        message will  indicate  the  same.  If  there  are  no
                        overlapping messages, go to  STEP 11. NOTE: There  are
                        some   instances  where  PSECTs  can  overlap.  POSTCD
                        and SYVAR  PSECTs are  allowed to  overlap any  xxxVAR
                        PSECT. This will  not gain  very much in  storage -  4
                        pages to be exact. If  you  follow the build procedure
                        then overlapping  PSECTs are not allowed and therefore
                        must  be  resolved.  You  are  once  again advised NOT
                        to re-organize the monitor's address space.
                        
        STEP 9          Start with  the  first  overlap.   Figure   out   the
                        number of words by which  the first PSECT overlaps its
                        following PSECT.  Now  add  this value  to  the  start
                        location of  the overlapped  PSECT. This  value  quite
                        possibly will be a location  within  a  page  i.e.  an
                        address of the form 125300,  where the page number  is
                        125 and the offset into the page is 300. The  starting
                        address of many  PSECTs is  required to be  on a  page
                        boundary i.e. an  address of the  form 126000. A  good
                        rule to  follow is:  IF THE  PSECT STARTED  ON A  PAGE
                        BOUNDARY BEFORE  THE BUILD,  THEN KEEP  IT ON  A  PAGE
                        BOUNDARY. This would mean that you may be required  to
                        add an additional value to round up to the next  page.
                        For example  the  125300  value would  be  rounded  to
                        126000 if the  PSECT is required  on a page  boundary.
                        The PSECT  sequence and  starting  values are  in  the
                        LNKSCH.CCL file.  NOTE: the  values are  all given  in
                        OCTAL so add in OCTAL.
                        
        STEP 10         EDIT the  LNKSCH.CCL file  to reflect  this new  start
                        value for the  overlapped PSECT.  Go back  to STEP  6.
                        Repeat these  steps  until  there are  no  more  error
                        messages. Note that changing the start location of the
                        overlapped PSECT can cause it to overlap its following
                        PSECT and  the  same  procedure must  be  followed  to
                        resolve any conflicts. Of  course you must be  careful
                        to ensure that you do not outgrow the monitor's address
                        space. A total of the  length of all PSECTs will  tell
                        you if the Monitor is too large.
                        
        STEP 11         At this point you should have a good Monitor. Save  it
                        in the proper directory. The final test is getting  it
                        up and running.
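
        The arithmetic of STEPs 9 and 10 can be sketched as follows. This is an
        illustration only, not a replacement for re-running the build: it
        assumes PSECT lengths do not change between links, it does not  check
        the total against the size of the monitor's address  space,  and  the
        PSECT names and values in it are hypothetical.  A page is 1000 (octal)
        words, as in the 125300 example above.

```python
PAGE_SIZE = 0o1000  # words per page


def round_up_to_page(address):
    """Round an address up to the next page boundary (no change if on one)."""
    return (address + PAGE_SIZE - 1) // PAGE_SIZE * PAGE_SIZE


def resolve_overlaps(psects):
    """psects: list of (name, start, length) in LNKSCH.CCL order.
    Returns the list with start values moved up just enough to clear
    every overlap, without changing the PSECT order."""
    resolved = []
    next_free = 0
    for name, start, length in psects:
        new_start = start
        if resolved and start < next_free:
            overlap = next_free - start          # words of overlap
            new_start = start + overlap
            if start % PAGE_SIZE == 0:           # keep page-aligned PSECTs aligned
                new_start = round_up_to_page(new_start)
        resolved.append((name, new_start, length))
        next_free = new_start + length
    return resolved


# Hypothetical layout: AAA has grown and now runs into BBB.
layout = [("AAA", 0o120000, 0o5300),
          ("BBB", 0o125000, 0o1000),
          ("CCC", 0o126400, 0o100)]

resolved = resolve_overlaps(layout)
for name, start, length in resolved:
    print(name, format(start, "o"))
```

        In this made-up layout, moving BBB causes it to run into  CCC,  which
        must then be moved as well.  This is the cascading effect STEP 10 warns
        about.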

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                               Page 128
        MONITOR BUILDING NOTES FOR RELEASE 6




                          MONITOR BUILDING NOTES FOR RELEASE 6
                          ------------------------------------



             There have been even more changes  in  POSTLD  processing  and  the
        monitor's  address  space for version 6 of TOPS-20, some of which should
        be taken into account when attempting to build a new  monitor  from  the
        control  file.   This  is a list of the changes from the version 4.1/5.1
        build procedures.



             1.  A new file SYSFLG.MAC appears for version 6 builds.  This  file
                 is  not  used  explicitly in the customer build, but is used to
                 create PROLOG.UNV (and BOOT,  by  the  way).   SYSFLG  contains
                 system   configuration  flags  and  conditional  settings,  and
                 replaces the files KSPRE.MAC and KLPRE.MAC which now  disappear
                 (into  PROLOG  for  the  most  part),  along with PROKS.UNV and
                 PROKL.UNV.

             2.  The command  file  ASEMBL.CMD  has  been  split  more  or  less
                 arbitrarily  into  two  files:   ASMBL1.CMD  and  ASMBL2.CMD to
                 perform exactly the same function, but to put less of a  burden
                 on  the  EXEC.   These two files now contain comments about the
                 files to be compiled also, by the way.

             3.  There is a change in the  DDT  dialog  used  to  establish  the
                 breakpoints for BUGHLT and BUGCHK.  An additional breakpoint is
                 set at DDTIBP (which is XCT'ed by DDTINI) with  the  breakpoint
                 set  to proceed when hit.  The purpose for this is so that when
                 the monitor reaches a given state in  initializing  the  system
                 paging,  we  hit  a  DDT  breakpoint and DDT can then sense the
                 state of the world, according to the monitor, and can  set  its
                 own  internal  state  however  it  needs  to  reflect  extended
                 addressing considerations for EDDT.

             4.  POSTLD now tries to make PSECT juggling easier  by  making  one
                 try  itself.   If  the given configuration does not work due to
                 overlaps, POSTLD will try to write what should be a working set
                 of  values  (if  possible)  into two new files:  LNKNEW.CCL and
                 PARNEW.MAC.  It will then have  BATCON  transfer  to  an  error
                 label,  where  the  monitor load is tried again using these new
                 files.  There is a third new parameter file:   LNKINI.CCL  that
                 is  used  in  conjunction with LNKNEW.CCL, and does not contain
                 PSECT settings, which is also used in the try-again load.

             5.  The format of the PSECT map printed by POSTLD has changed  very
                 slightly,  but  the  content is still the same.  There are some
                 new PSECTs.

             6.  POSTLD now writes the MONITR.EXE file using extended  sections.
                 This  has some implications for BOOT, which must now know about
                 extended sections, and any other program which might have  some
                 embedded knowledge of what the monitor .EXE file looks like.

             7.  The BUGSTF conditional feature  has  been  removed,  since  the
                 bugstrings  have  been  moved  out  of the way, and there is no
                 additional benefit derived from deleting them.

             8.  The HIDSYF conditional/feature has likewise been removed, as it
                 is assumed that the monitor symbol table is always hidden now.

        TOPS-20 TROUBLE-SHOOTING HANDBOOK                               Page 130
        EXEC Debugging


                                     EXEC DEBUGGING
                                     --------------


        Now that most SWS have microfiche of the released EXEC and MONITOR,  I
        anticipate  questions  on  looking  at  the EXEC and MONITOR.  Here is a
        cursory tutorial on investigating the internals of the EXEC (or  command
        processor,  if you prefer).  The examples are intended to be a guide and
        although the typein is  correct,  the  response  may  not  be  character
        perfect.   You  are  advised to read the other chapters in this document
        for more information on DDT and MONITOR snooping and debugging.




                              LOOKING AT THE EXEC WITH DDT
                              ----------------------------


        You can either look at the running system EXEC or your own copy  of  the
        EXEC with DDT that is loaded with the EXEC.


        I.      TO LOOK AT THE RUNNING EXEC:

        First you must have WHEEL privileges in order to use the ^EEDDT command.
        The ^EEDDT command transfers control to the DDT that is loaded with the
        EXEC, with symbols.  Now you can use all the normal DDT functions.   To
        exit from DDT, type <ESC>G, echoed as $G.  This  starts  your  program,
        which is the EXEC, so you are now back at EXEC command level.

                        @ENABLE
                        $^EEDDT
                        DDT
                        .
                        .
                        .
                        $G
                        $DIS
                        @

        II.     TO LOOK AT YOUR COPY OF AN EXEC (RUNNING UNDER SYSTEM EXEC):

        Get your copy of the EXEC in your address space, transfer control to  it
        and start DDT as above.  There are 3 ways to exit from this depending on
        the state you are in.  If you are in DDT you can ^Z out to get  back  to
        system  EXEC.   If  you  are  running  your EXEC and want to exit to the
        system EXEC you can ^EQUIT (if you are enabled) or "POP" (if you are not
        enabled).   POP  is preferable.  Note if you prefer to get your EXEC and
        not start it in order to  set  breakpoints  or  put  in  patches  before
        running, see section "VI -- PATCHING" below.

        EXAMPLE, EXITING FROM DDT:

                @GET MYEXEC.EXE 
                @SET NO CONTROL-C-CAPABILITY
                @START
                @MONNAM.TXT, TOPS-20 MONITOR (VERSION#)
                @ENA
                $^EEDDT
                DDT
                .
                .
                .
                CINITF/  -1   0         ; reset initialization flag to
                .                       ; run this EXEC again after saving
                .
                ^Z                      ; to exit and save, for example
                @                       ; now you are in the system EXEC
                                        ; with your EXEC in your address
                @SAV MYEXEC.EXE.2       ; space.  You can save it, say.




        EXAMPLE, EXITING FROM YOUR RUNNING EXEC:

                @GET MYEXEC.EXE
                @START
                @MONNAM.TXT,,TOPS-20 MONITOR(VERSION #)
                @ENA
                $^EEDDT
                DDT
                .
                .
                .
                CINITF/  -1  0          ; clear initialization flag
                $G                      ; running your EXEC
                .
                .
                $^EQUIT                 ; return to higher (system) EXEC
                $                       ; you are in system EXEC
                $SAV NEWEXEC            ; etc.



        EXAMPLE, EXITING FROM YOUR RUNNING EXEC WITH POP:

                @GET MYEXEC.EXE
                @START
                @MONNAM.TXT,,TOPS-20 MONITOR(VERSION#)
                @
                .
                .
                .
                @POP                    ; return to higher (system) EXEC.
                @                       ; now you are in system EXEC.

                                        ; NOTE: you should set CINITF to 0
                                        ; if you want to save and run this
                                        ; EXEC later.  You can do it by DDT
                                        ; after the POP or ^EEDDT before.

        III.    GETTING OUT OF TROUBLE:


             You could get into trouble with your EXEC and be unable to get out
        of it (CTRL/C is trapped, you can't POP, or whatever).  There is a way
        to make sure you can always exit to the MINI-EXEC.  Before starting your
        EXEC, issue ^EQUIT to get into the MINI-EXEC, then type "S" (start)  to
        get back to the system EXEC.  Then get into your EXEC.  If you now  get
        into trouble, you can type ^P, which will get you  back  into  the
        MINI-EXEC.  From there you can get back to the system EXEC with "S"
        (start).


                EXAMPLE:

                @ENA
                $^EQUIT
                INTERRUPT AT 15657
                MX>S
                $                               ; now back at system EXEC.
                $GET MYEXEC
                $
                $START
                @MONNAM.TXT, TOPS-20 MONITOR (VERSION)
                        .                       ; let's say your EXEC can't
                        .                       ; do anything - you are hung
                        .                       ; get out, get into MINI-EXEC
                ^P
                INTERRUPT AT 12345
                MX>S                            ; MINI-EXEC prompt then start.
                $                               ; now back at the system EXEC.


        IV.     RUNNING YOUR EXEC AS A TOP LEVEL FORK:


             Suppose that you want to run your EXEC as the top level EXEC,  that
        is, not running under the system EXEC.  Get into the MINI-EXEC, GET your
        copy of the EXEC, and start it as the top level EXEC.


                EXAMPLE:

                @ENA
                $^EQUIT
                INTERRUPT AT 23456
                MX>R                  ; Reset so you will MERGE not GET
                MX>G <MYAREA>MYEXEC.EXE.2
                MX>S
                @                     ; Now you are in your EXEC
                  .
                  .
                  .                   ; Let's say you want to get out
                @^P                   ; Control-P to get to MINI-EXEC
                INTERRUPT AT 12345
                MX>R                  ; "RESET" resets your address space
                MX>E                  ; You are requesting the system EXEC
                @                     ; You are in system EXEC        

        NOTE:   If you had typed "S"  rather than "E" above you  would
                have restarted your EXEC.

        V.      REPLACING THE SYSTEM EXEC


             Once you have made a change to your personal copy of the EXEC,  you
        may  wish  to  have  your  edited  EXEC  run  as the SYSTEM EXEC.  It is
        necessary  to  make  the  saved  EXEC  non-writable  before   using   it
        system-wide.


                EXAMPLE:

                @ENABLE (CAPABILITIES) 
                $GET (PROGRAM) PS:<SYSTEM>EXEC.EXE
                $INFORMATION (ABOUT) MEMORY-USAGE 

                81. pages, Entry vector loc 6000 len 3

                0        PS:<SYSTEM>EXEC.EXE.1  1   R, CW, E
                6-125    PS:<SYSTEM>EXEC.EXE.1  2-121   R, E

                $!MAKE THE EXEC WRITABLE SO WE CAN EDIT IT
                $SET PAGE-ACCESS (OF PAGES) 6:125 (ACCESS) COPY-ON-WRITE 
                $DDT
                DDT
                .               ;Make the edits
                .               
                ^Z
                $
                $!MAKE THOSE PAGES NON-WRITABLE
                $SET PAGE-ACCESS (OF PAGES) 6:125 (ACCESS) NO WRITE 
                $SET PAGE-ACCESS (OF PAGES) 6:125 (ACCESS) NO COPY-ON-WRITE 
                $!SAVE THE NEW EXEC
                $SAVE EXEC.EXE.2 !New generation! (PAGES FROM) 6 (TO) 125 
                 EXEC.EXE.2 Saved
                $!RENAME THE SYSTEM EXEC SO WE CAN GET IT BACK IF WE NEED IT
                $RENAME (EXISTING FILE) SYSTEM:EXEC.EXE  SYSTEM:OLD-EXEC.EXE
                $!AND COPY THE NEW ONE INTO PS:<SYSTEM>
                $COPY (FROM) EXEC.EXE (TO) SYSTEM:EXEC.EXE.197 !New generation!
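
        The page numbers in the memory-usage display and in the SET PAGE-ACCESS
        and SAVE commands above are octal, while the "81." total (note the
        trailing period) is decimal.  The following Python sketch is only an
        illustration of that arithmetic and is not part of the EXEC; the helper
        name pages_in_range is hypothetical.

```python
def pages_in_range(first_octal: str, last_octal: str) -> int:
    """Count the pages in an inclusive range whose ends are octal strings."""
    first = int(first_octal, 8)
    last = int(last_octal, 8)
    if last < first:
        raise ValueError("range end precedes range start")
    return last - first + 1


# Process pages from the example display: page 0, then pages 6-125 (octal).
process_pages = pages_in_range("0", "0") + pages_in_range("6", "125")

# File pages that back them: page 1, then pages 2-121 (octal).
file_pages = pages_in_range("1", "1") + pages_in_range("2", "121")

print(process_pages, file_pages)
```

        Both totals come to 81 decimal, which agrees with the "81. pages" line
        in the display.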


        VI.     OTHER INFORMATION:


             When searching for symbols you may notice that the module name  DDT
        gives you differs from the name of the source file that was assembled
        for the EXEC.  For example, to open the symbol table for EXECED, you
        type CANDE$: to DDT.

        The following is a correspondence list for EXECs before version 5:

                FILENAME        MODULE NAME     FILENAME        MODULE NAME
                ===========================     ===========================
                EXECDE.MAC      XDEF            EXEC0.MAC       EXEC0
                EXECGL.MAC      XGLOBS          EXEC1.MAC       EXEC1
                EXECPR.MAC      PRIV            EXEC2.MAC       EXEC2
                EXECED.MAC      CANDE           EXEC3.MAC       EXEC3
                EXECCS.MAC      CSCAN           EXEC4.MAC       EXEC4
                EXECSU.MAC      SUBRS           EXECMT.MAC      EXECMT
                EXECVR.MAC      VER             EXECQU.MAC      EXECQU
                EXECMI.MAC      MIC             EXECSE.MAC      EXECSE
                                                EXECP.MAC       EXECP


        For Release 5 of the EXEC, the TITLE statements in the EXEC modules have
        been  changed  to  match the module names so that this concordance is no
        longer necessary.
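
        As an illustration only, the correspondence list can be expressed as  a
        lookup table.  The Python sketch below copies the pairs from the table
        above (pre-version-5 EXECs only); the function name ddt_open_command is
        hypothetical, and "$" stands for the ESC character as echoed by DDT.

```python
# Source file -> DDT module name, copied from the correspondence list.
MODULE_NAMES = {
    "EXECDE.MAC": "XDEF",   "EXEC0.MAC": "EXEC0",
    "EXECGL.MAC": "XGLOBS", "EXEC1.MAC": "EXEC1",
    "EXECPR.MAC": "PRIV",   "EXEC2.MAC": "EXEC2",
    "EXECED.MAC": "CANDE",  "EXEC3.MAC": "EXEC3",
    "EXECCS.MAC": "CSCAN",  "EXEC4.MAC": "EXEC4",
    "EXECSU.MAC": "SUBRS",  "EXECMT.MAC": "EXECMT",
    "EXECVR.MAC": "VER",    "EXECQU.MAC": "EXECQU",
    "EXECMI.MAC": "MIC",    "EXECSE.MAC": "EXECSE",
    "EXECP.MAC": "EXECP",
}


def ddt_open_command(source_file: str) -> str:
    """Return the DDT typein that opens the symbols for a source file."""
    name = source_file.upper()
    if not name.endswith(".MAC"):
        name += ".MAC"
    return MODULE_NAMES[name] + "$:"   # raises KeyError for unknown files


print(ddt_open_command("EXECED"))
```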

             The sources and .CTL file for assembling the EXEC are part  of  the
        SWSKIT.

             If you try to examine a location symbolically and DDT responds with
        "U", implying that the symbol is undefined, you may have to  reset  the
        symbol table pointers.  Look in location 770001 for the address of  the
        word that contains DDT's symbol table pointer, then look at location 116
        to find the real symbol table pointer.  Put the contents of 116 in  the
        location pointed to by 770001.

                116/   762600,,54463  ; real symbol table pointer

                770001/  776456       ; location of symbol table pointer
                776456/  743200,,23540     762600,,54463
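
        To read such a pointer, it helps to split it into its two halves.   The
        Python sketch below assumes the usual PDP-10 convention that a symbol
        table pointer holds a negative word count in the left half and the
        table's address in the right half; the handbook does not state this, so
        treat it as an assumption.  The function names are hypothetical.

```python
from typing import Tuple

HALF_MASK = 0o777777  # one 18-bit halfword


def parse_halves(text: str) -> Tuple[int, int]:
    """Split DDT-style 'LH,,RH' octal typeout into two 18-bit halves."""
    left, sep, right = text.partition(",,")
    if not sep:
        raise ValueError("expected LH,,RH with a double comma")
    lh, rh = int(left, 8), int(right, 8)
    if lh > HALF_MASK or rh > HALF_MASK:
        raise ValueError("halfword exceeds 18 bits")
    return lh, rh


def symbol_table_extent(text: str) -> Tuple[int, int]:
    """Return (length_in_words, start_address) for a -count,,address word."""
    lh, rh = parse_halves(text)
    length = (-lh) & HALF_MASK  # negate the 18-bit two's-complement count
    return length, rh


length, start = symbol_table_extent("762600,,54463")
print(f"{length:o} words starting at {start:o}")
```

        Under that assumption the pointer in the example describes a table of
        15200 (octal) words starting at location 54463.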


        VII.    PATCHING

             There is a patch command in DDT.  The form is as follows:

                $<                    ; patch before this instruction
                $$<                   ; patch after this instruction
                $>                    ; end patch following this instruction

        DDT will put the patch in the EXEC patch area (symbol PAT..).  Following
        the patch you typed in, DDT will insert JUMPA 1,LOC+1 and JUMPA 2,LOC+2,
        where LOC is the location of the instruction you are patching.  DDT then
        replaces the original instruction at LOC with a JUMPA XXXXX, where XXXXX
        is the address in the patch area where your patch now resides.  Finally,
        the patch area symbol (PAT..) is redefined to follow your last patch.


             EXAMPLE:

        Get a copy of  <SYSTEM>EXEC,  insert  calls  to  subroutine  MUMBLE  and
        subroutine  FRATZ  before  location  DING+1.   DING+1  contains PRINT Q3
        originally and contains a JUMPA to the patch area after the patch.   The
        patch area will contain:

                CALL MUMBLE
                CALL FRATZ
                PRINT Q3
                JUMPA 1,DING+2
                JUMPA 2,DING+3


        USER TYPESCRIPT FOR THE ABOVE:

                @ENABLE
                $GET <SYSTEM>EXEC
                $SAVE NUEXEC          ; you must SAVE and GET in order to
                $GET NUEXEC           ; write-enable the EXEC and use DDT
                $DDT                  ; instead of ^EEDDT
                DDT
                EXEC0$:               ; open symbols for module where DING is

                DING/ PUSH P,A        ; first location in routine "DING"
                DING+1/ PRINT Q3 $<   ; begin patching before location DING+1
                PAT../ 0  CALL MUMBLE ; DDT opens up PAT.. area, you add code
                PAT..+1/CALL FRATZ    ; continue to insert your patch
                $>                    ; close the patch
                PAT..+2/ PRINT Q3     ; the original instruction being replaced.
                PAT..+3/ JUMPA 1,DING+2       ; DDT inserts this return,
                PAT..+4/ JUMPA 2,DING+3       ; and this one in case of a skip.

                DING+1/  JUMPA 12345  ; JUMPA to PAT.. replaces original LOC.

                $G                    ; start your copy of EXEC etc.
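
        The bookkeeping DDT performs for a "patch before" can be modelled in  a
        few lines.  The Python sketch below is a toy model, not DDT itself:
        memory is a dictionary of instruction strings, the address chosen for
        DING is hypothetical, and the patch area address 12345 is taken from
        the typescript above.

```python
from typing import Dict, List


def patch_before(mem: Dict[int, str], loc: int,
                 new_instrs: List[str], pat: int) -> int:
    """Model $< ... $> at LOC; return the redefined value of PAT.."""
    body = list(new_instrs) + [
        mem[loc],                 # original instruction, moved into the patch
        f"JUMPA 1,{loc + 1:o}",   # normal return to LOC+1
        f"JUMPA 2,{loc + 2:o}",   # return used if the last instruction skips
    ]
    for offset, instr in enumerate(body):
        mem[pat + offset] = instr
    mem[loc] = f"JUMPA {pat:o}"   # original location now jumps to the patch
    return pat + len(body)        # PAT.. moves past this patch


DING = 0o4000                     # hypothetical address of routine DING
mem = {DING: "PUSH P,A", DING + 1: "PRINT Q3"}
new_pat = patch_before(mem, DING + 1, ["CALL MUMBLE", "CALL FRATZ"], 0o12345)

for addr in sorted(mem):
    print(f"{addr:o}/ {mem[addr]}")
```

        The printed memory image has the same shape as the typescript: two
        calls, the displaced PRINT Q3, the two JUMPA returns, and a JUMPA to
        the patch area at DING+1.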


             Various methods may be used to write-enable the EXEC for  patching.
        You can use the GET, SAVE method above, or SET PAGE-ACCESS n
        COPY-ON-WRITE, or the $W command in DDT to achieve the same results.


                                RECOVERING FROM A BAD EXEC
                                --------------------------


             This procedure is simply a rehash of the procedure for recovering
        from the case in which the EXEC refuses to let you log in.   For  more
        information see the article "Looking at the EXEC with DDT".

             If your system version of the EXEC blows up completely,  you  can
        recover  rather  easily.   You type a ^C on the CTY, and when the EXEC
        blows up you will be dumped into the MINI-EXEC.  Then you can use  the
        GET  and  START commands to read in a good version of the EXEC, either
        from a copy on disk, or from the distribution magtapes.

             If the problem with the EXEC is that it does not blow up, but  it
        still  fails  to let you log in, then you have a harder time.  In this
        case you have to bring up the system with the switches, and  bring  up
        the system stand-alone.  An example of what to do from the point where
        the BOOT program is loaded follows:

        BOOT>/L                 ; load in the monitor
        BOOT>/G141              ; start up EDDT

        EDDT
        DBUGSW[   0   2         ; set system as debugging
        EDDTF[   0   1          ; keep EDDT around

        GOTSWM$B                ; set a breakpoint after the swappable
                                ; part of the monitor has been loaded
        147$G                   ; start the system
        GOTSWM$1B>>   STEX+1/   HRROI T2,BOOTER+51   HRROI T2,FFF
        FFF[   ""PS:<SYSTEM>OLD-EXEC.EXE"
        FFF:                    ; change the name of the EXEC file
        0$1B                    ; remove the GOTSWM breakpoint
        $P                      ; proceed to bring up the system

        ^C                      ; and Control-C to get the new EXEC

        If  you had no old version of the EXEC around, then change the name to
        some garbage, so that the monitor can't find any such  program.   This
        will  then  dump  you into the MINI-EXEC, and then you can read a good
        EXEC in from magtape.

             In release 3 of the monitor, there is a new JSYS  which  is  very
        useful  for  debugging  new  versions of the EXEC.  The CRJOB JSYS
        allows you to start up a new job with any program at all as its  top
        level  fork.   You  can  also start the job not logged in.  So you can
        debug your new versions of the EXEC easily,  with  no  possibility  of
        locking yourself out.  Of course, the ^EQUIT, GET from MINI-EXEC
        sequence is still a valid way to start a new top-level fork.



                              Debugging the GALAXY System
                              ---------------------------





        1.0  INTRODUCTION

        The GALAXY system presents a unique problem to the  software  specialist
        who  is  trying  to debug one of its components.  Usually, any user mode
        program can be debugged under TOPS-20 by running a copy  of  it,  loaded
        with DDT, taking appropriate care that nothing is done which will affect
        any users of the system.  For GALAXY, however, it is very  difficult  to
        not affect users of the system.  For example, if you are trying to debug
        BATCON, you will find that QUASAR will very happily schedule batch  jobs
        submitted  by  other  users  to  be  run by your BATCON.  If you are not
        careful, you can cause those batch jobs to be lost, or at  least  slowed
        down, while you are debugging.

        Debugging QUASAR or ORION would be even worse.  Users would  see  PRINT,
        SUBMIT,  etc.   commands  hang  when  you  hit  a  breakpoint in QUASAR.
        Operators would be unable to control any system components if  you  were
        breakpointed  in ORION.  On top of this, the monitor knows about QUASAR,
        and you may lose messages  which  happen  when  users  close  a  spooled
        lineprinter file, or when a job logs out.

        To solve these problems, the concept of a "private  GALAXY  system"  has
        been  implemented  in GALAXY and the EXEC.  When a private GALAXY system
        is operating, all of its components are completely  independent  of  the
        primary  GALAXY system.  QUASAR, the queue maintainer, keeps queues that
        are separate from the system queues and are failsofted  to  a  different
        master  queue file.  This QUASAR communicates only with other components
        in the same private system.  It is even possible to run several complete
        private GALAXY systems, with the restrictions that:

             1.  All components in a private system must run under the same user
                 name.

             2.  Only one private system may be run by a given user.

             3.  Each private QUASAR must be connected to a different directory.

             4.  Each private ORION must be connected to a different directory.
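
        The four restrictions can be checked mechanically.  The Python sketch
        below is an illustration only: the Component record, the "system"
        labels, and the user name SMITH are all made up for the example, and
        nothing like this checker exists in GALAXY itself.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Component:
    system: str     # hypothetical label for one private GALAXY system
    program: str    # "QUASAR", "ORION", "BATCON", ...
    user: str       # user name the component runs under
    directory: str  # directory the component's job is connected to


def check_private_systems(components: List[Component]) -> List[str]:
    """Return a description of each violated restriction (empty if none)."""
    problems: List[str] = []
    users: Dict[str, Set[str]] = {}
    systems: Dict[str, Set[str]] = {}
    for c in components:
        users.setdefault(c.system, set()).add(c.user)
        systems.setdefault(c.user, set()).add(c.system)
    # Restriction 1: one user name per private system.
    for system, names in sorted(users.items()):
        if len(names) > 1:
            problems.append(f"system {system} runs under more than one user")
    # Restriction 2: one private system per user.
    for user, owned in sorted(systems.items()):
        if len(owned) > 1:
            problems.append(f"user {user} runs more than one private system")
    # Restrictions 3 and 4: distinct directories for each QUASAR and ORION.
    for program in ("QUASAR", "ORION"):
        counts = Counter(c.directory for c in components
                         if c.program == program)
        for directory, n in sorted(counts.items()):
            if n > 1:
                problems.append(f"{n} private {program}s share {directory}")
    return problems


DEBUG_DIR = "MISC:<HEMPHILL.GALAXY.DEBUG>"
good = [Component("A", p, "HEMPHILL", DEBUG_DIR)
        for p in ("QUASAR", "ORION", "BATCON")]
bad = good + [Component("B", "QUASAR", "SMITH", DEBUG_DIR)]

print(check_private_systems(good))
print(check_private_systems(bad))
```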



                                          NOTE

                       This text is oriented towards version  4.0
                       of   GALAXY,   and  there  may  be  slight
                       differences for version 4.2 or later.


        2.0  BUILDING A PRIVATE GALAXY SYSTEM

        Since the changes necessary to  create  a  private  GALAXY  system  were
        implemented  in  the  version  4 source code, it is relatively simple to
        build the system.  The recommended procedure is as follows:

             1.  Create a directory for the private GALAXY system.

             2.  Restore the file EXEC-FOR-DEBUGGING-GALAXY.EXE from the  SWSKIT
                 to  this  newly  created directory.  For Release 5 of the EXEC,
                 the  distributed  EXEC  replaces  the  need  for  this  special
                 program.

             3.  Restore each of the following files from the proper saveset  on
                 the TOPS-20 distribution tape to this directory.

                                BATCON.EXE              PLEASE.EXE
                                CDRIVE.EXE              QMANGR.EXE
                                GLXLIB.EXE              QUASAR.EXE
                                LPTSPL.EXE              SPRINT.EXE
                                OPR.EXE                 SPROUT.EXE
                                ORION.EXE

             4.  For each component in the  above  list  except  GLXLIB.EXE  and
                 QMANGR.EXE, perform the following steps:

                 1.  Give the EXEC command "GET xxxxxx.EXE"

                 2.  Give the command "DEPOSIT 135 -1"

                 3.  Give the command "SAVE xxxxxx"
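
        The per-component steps in item 4 are repetitive, so it can help  to
        write the typein out in full before starting.  The Python sketch below
        only prints the EXEC commands named in the steps above for each
        component in the list; it does not talk to TOPS-20, and the function
        name build_commands is hypothetical.

```python
from typing import Iterable, List

# Component files from step 3 of the procedure.
COMPONENTS = [
    "BATCON", "CDRIVE", "GLXLIB", "LPTSPL", "OPR", "ORION",
    "PLEASE", "QMANGR", "QUASAR", "SPRINT", "SPROUT",
]

# Step 4 excludes these two files.
SKIPPED = {"GLXLIB", "QMANGR"}


def build_commands(components: Iterable[str] = COMPONENTS) -> List[str]:
    """Return the GET / DEPOSIT / SAVE typein for each component."""
    lines: List[str] = []
    for name in components:
        if name in SKIPPED:
            continue
        lines.append(f"GET {name}.EXE")
        lines.append("DEPOSIT 135 -1")
        lines.append(f"SAVE {name}")
    return lines


for line in build_commands():
    print(line)
```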





        3.0  EXAMPLE OF A PRIVATE GALAXY BUILD

        It is not strictly necessary to restore all of the GALAXY components for
        a  one  time  only debugging session.  To debug a component like BATCON,
        you would need at a minimum:

             1.  Your own copy of BATCON

             2.  Your own copy of QUASAR for BATCON to speak to

             3.  Your own copy of ORION for BATCON and QUASAR to speak to

             4.  A copy of OPR to speak to ORION to control BATCON

             5.  An EXEC which knows about your QUASAR to make queue entries

        The following is a log of an example build of a private GALAXY system:


        @ENABLE (CAPABILITIES) 
        $!
        $! First connect to a debugging directory
        $!
        $CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG> 
        $!
        $! Now build and save debugging .EXE files
        $!
        $! QUASAR, the queue maintainer
        $!
        $GET (PROGRAM) SYS:QUASAR.EXE.55 
        $DEPOSIT (MEMORY LOCATION) 135 (CONTENTS) -1
         [Shared] 
        $SAVE (ON FILE) QUASAR.EXE.1 !New file! (PAGES FROM) 
         QUASAR.EXE.1 Saved
        $!
        $! ORION, the message clearinghouse
        $!
        $GET (PROGRAM) SYS:ORION.EXE.53 
        $DEPOSIT (MEMORY LOCATION) 135 (CONTENTS) -1
         [Shared] 
        $SAVE (ON FILE) ORION.EXE.1 !New file! (PAGES FROM) 
         ORION.EXE.1 Saved
        $!
        $! OPR, the operator interface
        $!
        $GET (PROGRAM) SYS:OPR.EXE.55 
        $DEPOSIT (MEMORY LOCATION) 135 (CONTENTS) -1
         [Shared] 
        $SAVE (ON FILE) OPR.EXE.1 !New file! (PAGES FROM) 
         OPR.EXE.1 Saved
        $!
        $! BATCON, the batch controller
        $!
        $GET SYS:BATCON.EXE.39 
        $DEPOSIT (MEMORY LOCATION) 135 (CONTENTS) -1
         [Shared] 
        $SAVE (ON FILE) BATCON.EXE.1 !New file! (PAGES FROM) 
         BATCON.EXE.1 Saved
        $!
        $! Now a directory of what we've got
        $!
        $VDIRECTORY (OF FILES) *.*.* 

           MISC:<HEMPHILL.GALAXY.DEBUG>
         BATCON.EXE.1;P777700    16 8192(36)   13-Feb-80 22:00:37 
         EXEC-FOR-DEBUGGING-GALAXY.EXE.1;P777700
                                 82 41984(36)  13-Feb-80 04:33:50 
         OPR.EXE.1;P777700       31 15872(36)  13-Feb-80 22:00:09 
         ORION.EXE.1;P777700     44 22528(36)  13-Feb-80 21:59:45 
         QUASAR.EXE.1;P777700    40 20480(36)  13-Feb-80 21:59:27 

         Total of 213 pages in 5 files
        $


        4.0  RUNNING THE PRIVATE GALAXY SYSTEM

        Starting and running a private  GALAXY  system  is  similar  to  running
        GALAXY  in  the  usual manner.  First QUASAR and ORION are started, then
        the component you wish to debug.   You  will  also  need  OPR  to  issue
        operator  commands  and  the EXEC to make queue entries.  Since you will
        need about five  jobs,  it  is  usually  most  convenient  to  run  each
        component as a separate subjob under PTYCON.



        4.1  Starting QUASAR

        QUASAR and ORION should be started before everything else.  Nothing evil
        happens  if  you  start  them last, but all the other components will be
        waiting for these two to start.  A suggested procedure is:

             1.  Define a subjob "Q"

             2.  Connect to it

             3.  LOGIN a job under the same user name

             4.  CONNECT that job to the directory in which you did the  private
                 GALAXY build

             5.  ENABLE

             6.  RUN QUASAR




        4.2  Starting ORION

        Starting ORION is as painless as starting QUASAR:

             1.  Define a subjob "O"

             2.  Connect to it

             3.  LOGIN a job under the same user name

             4.  CONNECT that job to the directory in which you did the  private
                 GALAXY build

             5.  ENABLE

             6.  RUN ORION


        4.3  Starting OPR

        OPR starts up using the same formula as QUASAR and ORION:

             1.  Define a subjob "OPR"

             2.  Connect to it

             3.  LOGIN a job under the same user name

             4.  CONNECT that job to the directory in which you did the  private
                 GALAXY build

             5.  ENABLE

             6.  RUN OPR

             7.  You may now type OPR commands to see if QUASAR and ORION appear
                 to be healthy.




        4.4  Starting The Component To Be Debugged

        If the component you wish to debug is QUASAR, ORION, or  OPR,  then  you
        have already started it.  Breakpoints could have been set, and when they
        were hit, the component could have been debugged without any  noticeable
        effect  on  other  users  of  the  system.  If you wish to debug PLEASE,
        BATCON, LPTSPL, CDRIVE, SPRINT, or SPROUT, do the following:

             1.  Define a subjob with an appropriate ID (e.g.  B for BATCON)

             2.  Connect to it

             3.  LOGIN a job under the same user name

             4.  CONNECT that job to the directory in which you did the  private
                 GALAXY build

             5.  ENABLE

             6.  GET the component

             7.  Enter DDT

             8.  Set breakpoints, then start the program


        4.5  Starting The Modified EXEC

        The file "EXEC-FOR-DEBUGGING-GALAXY.EXE" which has been supplied on  the
        SWSKIT  has  exactly  two  commands  added to its repertoire.  These are
        "^ESET DEBUGGING-GALAXY" and "^ESET NO DEBUGGING-GALAXY".  The effect of
        these  commands  is  to  select  which  one of two PIDs (Process IDs) to
        communicate with:  the system QUASAR or  the  private  QUASAR.   If  "NO
        DEBUGGING-GALAXY"  is  set,  then PRINT, SUBMIT, CANCEL, MODIFY, and the
        INFORMATION commands  will  all  cause  communication  with  the  system
        QUASAR.   If  "DEBUGGING-GALAXY" is set for this EXEC, then the commands
        listed will communicate with the private QUASAR run by that  user.   For
        Release  5  or later of the EXEC, the distributed EXEC incorporates this
        functionality   in   the   "^ESET   PRIVATE-QUASAR"   and   "^ESET    NO
        PRIVATE-QUASAR" commands, and the special EXEC is unneeded.

             1.  Define a subjob "E"

             2.  Connect to it

             3.  LOGIN a job under the same user name

             4.  CONNECT that job to the directory in which you did the  private
                 GALAXY build

             5.  RUN EXEC-FOR-DEBUGGING-GALAXY (or the Release 5 or later EXEC)

             6.  ENABLE

             7.  ^ESET DEBUGGING-GALAXY (or PRIVATE-QUASAR)


        5.0  EXAMPLE DEBUGGING SESSION

        The following is a log of a sample debugging session:


         TOPS-20 Command processor 4(560)
        @!
        @! First run PTYCON, so we can control five jobs from one terminal
        @!
        @PTYCON.EXE.7 
        PTYCON> !
        PTYCON> ! Now start up QUASAR as subjob Q
        PTYCON> !
        PTYCON> DEFINE (SUBJOB #) 0 (AS) Q
        PTYCON> CONNECT (TO SUBJOB) Q
        [CONNECTED TO SUBJOB Q(0)]

         2102 Development System, TOPS-20 Monitor 4(3245)
        @LOG HEMPHILL (PASSWORD) 
         Job 21 on TTY222 13-Feb-80 22:18:05
        Structure PS: mounted
        Structure MISC: mounted
        @ENABLE (CAPABILITIES) 
        $!
        $! Connect to directory where debugging .EXE files are
        $!
        $CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG> 
        $!
        $! Finally run the component
        $!
        $RUN (PROGRAM) QUASAR.EXE.1 
        % QUASAR GLXIPC Becoming  [HEMPHILL]QUASAR     (PID = 66000031)
        % QUASAR GLXIPC Waiting for ORION to start
        ^X
        PTYCON> !
        PTYCON> ! Now start up ORION as subjob O
        PTYCON> !
        PTYCON> DEFINE (SUBJOB #) 1 (AS) O
        PTYCON> CONNECT (TO SUBJOB) O
        [CONNECTED TO SUBJOB O(1)]

         2102 Development System, TOPS-20 Monitor 4(3245)
        @LOG HEMPHILL (PASSWORD) 
         Job 22 on TTY223 13-Feb-80 22:19:25
        Structure PS: mounted
        Structure MISC: mounted
        @ENABLE (CAPABILITIES) 
        $!
        $! Connect to directory where debugging .EXE files are
        $!
        $CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG> 
        $!
        $! Finally run the component
        $!
        $RUN (PROGRAM) ORION.EXE.1 
        % ORION  GLXIPC Alternate [HEMPHILL]QUASAR     (PID = 66000031)
        % ORION  GLXIPC Becoming  [HEMPHILL]ORION      (PID = 70000032)
        **** Q(0) 22:19:58 ****
        % QUASAR GLXIPC Alternate [HEMPHILL]ORION      (PID = 70000032)
        **** O(1) 22:19:58 ****
        ^X
        PTYCON> !
        PTYCON> ! Now start up OPR as subjob OPR
        PTYCON> !
        PTYCON> DEFINE (SUBJOB #) 2 (AS) OPR
        PTYCON> CONNECT (TO SUBJOB) OPR
        [CONNECTED TO SUBJOB OPR(2)]

         2102 Development System, TOPS-20 Monitor 4(3245)
        @LOG HEMPHILL (PASSWORD) 
         Job 23 on TTY224 13-Feb-80 22:20:29
        Structure PS: mounted
        Structure MISC: mounted
        @ENABLE (CAPABILITIES) 
        $!
        $! Connect to directory where debugging .EXE files are
        $!
        $CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG> 
        $!
        $! Finally run the component
        $!
        $RUN (PROGRAM) OPR.EXE.1 
        % OPR    GLXIPC Alternate [HEMPHILL]QUASAR     (PID = 66000031)
        % OPR    GLXIPC Alternate [HEMPHILL]ORION      (PID = 70000032)
        OPR>
        22:19:59          -- Network Node 1031 is Online --

        22:19:59          -- Network Node 2137 is Online --

        22:19:59          -- Network Node 4097 is Online --

        22:19:59          -- Network Node DN20A is Online --

        22:19:59          -- Network Node MILL20 is Online --

        22:19:59          -- Network Node SYS880 is Online --
        OPR>!
        OPR>! Let's take a look at our brand new queues
        OPR>!
        OPR>SHOW QUEUES 
        OPR>
        22:21:21          --The Queues are Empty--
        OPR>SHOW STATUS PRINTER 
        OPR>
        22:21:27          --There are no Devices Started--
        OPR>^X
        PTYCON> !
        PTYCON> ! Now start up BATCON as subjob B
        PTYCON> !
        PTYCON> DEFINE (SUBJOB #) 3 (AS) B
        PTYCON> CONNECT (TO SUBJOB) B
        [CONNECTED TO SUBJOB B(3)]

         2102 Development System, TOPS-20 Monitor 4(3245)
        @LOG HEMPHILL (PASSWORD) 
         Job 24 on TTY225 13-Feb-80 22:21:49
        Structure PS: mounted
        Structure MISC: mounted
        @ENABLE (CAPABILITIES) 
        $!
        $! Connect to directory where debugging .EXE files are
        $!
        $CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG> 
        $!
        $! Finally run the component
        $!
        $RUN (PROGRAM) BATCON.EXE.1 
        % BATCON GLXIPC Alternate [HEMPHILL]QUASAR     (PID = 66000031)
        % BATCON GLXIPC Alternate [HEMPHILL]ORION      (PID = 70000032)
        ^X
        PTYCON> !
        PTYCON> ! Now start up special EXEC as subjob E
        PTYCON> !
        PTYCON> DEFINE (SUBJOB #) 4 (AS) E
        PTYCON> CONNECT (TO SUBJOB) E
        [CONNECTED TO SUBJOB E(4)]

         2102 Development System, TOPS-20 Monitor 4(3245)
        @LOG HEMPHILL (PASSWORD) 
         Job 19 on TTY226 13-Feb-80 22:23:00
        Structure PS: mounted
        Structure MISC: mounted
        @CONNECT (TO DIRECTORY) MISC:<HEMPHILL.GALAXY.DEBUG> 
        @!
        @! Run the special EXEC, which is provided on the SWSKIT
        @!
        @RUN (PROGRAM) EXEC-FOR-DEBUGGING-GALAXY.EXE.1 

         TOPS-20 Command processor 4(560)-1
        @ENABLE (CAPABILITIES) 
        $!
        $! Make this EXEC switch from system queues to private queues
        $!
        $^ESET DEBUGGING-GALAXY 
        $!
        $! Use ordinary EXEC commands to examine private queues
        $!
        $INFORMATION (ABOUT) OUTPUT-REQUESTS 
        [The Queues are Empty]
        $INFORMATION (ABOUT) BATCH-REQUESTS 
        [The Queues are Empty]
        $!
        $! Now switch back to look at system queues
        $!
        $^ESET NO DEBUGGING-GALAXY 
        $INFORMATION (ABOUT) OUTPUT-REQUESTS 

        Printer Queue:
        Job Name  Req#  Limit            User
        --------  ----  -----  ------------------------
        * KLERR      6   1197  DEUFEL                     On Unit:0
           Started at 22:05:47, printed 314 of 1197 pages
          XXX        3     18  KAMANITZ                   /Dest:4097
          MS-OUT    18    117  BRAITHWAITE                /Unit:0
        There are 3 Jobs in the Queue (1 in Progress)

        $INFORMATION (ABOUT) BATCH-REQUESTS 

        Batch Queue:
        Job Name  Req#  Run Time            User
        --------  ----  --------  ------------------------
        * DUMP      16  02:00:00  OPERATOR                In Stream:0
            Job# 17 Running DUMPER Last Label: A Runtime 0:23:55
          BATCH      2  00:05:00  BLIZARD                 /Proc:FOO
          SOURCE     8  00:05:00  BLOUNT                  /After:14-Feb-80  0:00
          SRCCOM    12  00:05:00  MURPHY                  /After:14-Feb-80  0:00
          QJD4R     13  00:05:00  SROBINSON               /After:19-Feb-80  0:00
          QAR       10  00:05:00  BLOUNT                  /After:19-Feb-80  0:14
          SAVE       1  00:05:00  FICHE                   /After:19-Feb-80  9:10
        There are 7 Jobs in the Queue (1 in Progress)

        $!
        $! Now let's submit a batch job to our own BATCON
        $!
        $^ESET DEBUGGING-GALAXY 
        $!
        $! Make a trivial batch control file
        $!
        $COPY (FROM) TTY: (TO) A.CTL.1 !New file! 
         TTY: => A.CTL.1

        @SY A
        ^Z
        $!
        $! And submit the job
        $!
        $SUBMIT (BATCH JOB) A.CTL.1 
        [Job A Queued, Request-ID 1, Limit 0:05:00]
        $!
        $! Now examine private queues
        $!
        $INFORMATION (ABOUT) BATCH-REQUESTS 

        Batch Queue:
        Job Name  Req#  Run Time            User
        --------  ----  --------  ------------------------

          A          1  00:05:00  HEMPHILL              
        There is 1 Job in the Queue (None in Progress)

        $!
        $! Our job is in the batch queue, but no batch-streams have been started
        $!
        $^X
        PTYCON> CONNECT (TO SUBJOB) OPR
        [CONNECTED TO SUBJOB OPR(2)]

        OPR>START (Object) BATCH-STREAM (Stream Number) 0
        OPR>
        22:25:40        Batch-Stream 0  --Startup Scheduled--

        22:25:40        Batch-Stream 0  --Started--
        OPR>
        22:25:40        Batch-Stream 0  --Begin--
                        Job A Req #1 for HEMPHILL
        OPR>
        22:25:51        Batch-Stream 0  --End--
                        Job A Req #1 for HEMPHILL
        OPR>
        ^X
        PTYCON> !
        PTYCON> ! Cleaning up is easy
        PTYCON> !
        PTYCON> KILL (SUBJOB) ALL
        PTYCON> EXIT (FROM PTYCON) 
        @



        6.0  TECHNICAL DETAILS

        This section explains what a component does differently when its
        location 135 (.JBOPS) has been poked to -1, and presents a few helpful
        tidbits of information about debugging some of the programs.  .JBOPS,
        incidentally, is the word in the job data area (defined under TOPS-10)
        which is reserved for a program's OTS.  GALAXY references this location
        by the symbol "DEBUGW".
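
        As a small illustration of what "poked to -1" means on a 36-bit
        machine, the following Python sketch shows the word that ends up in
        location 135.  The PDP-10 uses two's-complement arithmetic, so -1 is
        a word of all ones.  The function names are illustrative only and
        are not part of GALAXY.

```python
WORD_BITS = 36
WORD_MASK = (1 << WORD_BITS) - 1   # 777777777777 octal
HALF_MASK = 0o777777

JBOPS = 0o135  # location of .JBOPS (DEBUGW), per the text above


def to_word(value):
    """Return VALUE as a 36-bit two's-complement word."""
    return value & WORD_MASK


def format_halfwords(word):
    """Format a 36-bit word as left,,right octal halfwords."""
    left = (word >> 18) & HALF_MASK
    right = word & HALF_MASK
    return "%o,,%o" % (left, right)


# Model of the "poke": deposit -1 in location 135.
memory = {JBOPS: to_word(-1)}
print(format_halfwords(memory[JBOPS]))
```

        The halfword form is the same "left,,right" notation used elsewhere
        in this handbook for 36-bit quantities.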



        6.1  GLXLIB

        GLXLIB is the GALAXY library.  It  consists  of  a  code  segment  which
        starts  at address 400000 and a data segment at address 600000.  Each of
        the programs QUASAR, ORION, OPR, PLEASE, BATCON, LPTSPL, CDRIVE, SPRINT,
        and  SPROUT  uses  it.  Part of the initialization code of each of these
        programs maps in GLXLIB as a "high  segment".   This  is  in  effect  an
        object  time  system for GALAXY, with many commonly used routines.  Most
        of the support for the private GALAXY system is in this library,  enough
        so  that OPR, PLEASE, BATCON, LPTSPL, SPRINT and SPROUT actually have no
        code which cares whether  they  are  part  of  a  private  GALAXY.   The
        initialization  code  in  each  component  looks in three places to find
        GLXLIB.EXE:  first on the structure and  directory  that  the  component
        itself  came  from, second on DSK:, third on SYS:.  This search order is
        the same for both the system GALAXY and the private one.
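
        The search order just described can be sketched as follows.  This is
        a Python model for illustration only, not the GLXLIB code: the
        function names are hypothetical, and the file-existence test is
        passed in as a parameter so that the sketch makes no claims about
        how the real lookup is performed.

```python
def glxlib_candidates(component_dir):
    """List the three filespecs in the order the text describes."""
    return [
        component_dir + "GLXLIB.EXE",  # where the component itself came from
        "DSK:GLXLIB.EXE",              # DSK: (normally the connected directory)
        "SYS:GLXLIB.EXE",              # the system directory
    ]


def find_glxlib(component_dir, exists):
    """Return the first candidate for which exists(spec) is true, else None."""
    for spec in glxlib_candidates(component_dir):
        if exists(spec):
            return spec
    return None


# Example: GLXLIB.EXE is absent from the component's own directory.
present = {"DSK:GLXLIB.EXE", "SYS:GLXLIB.EXE"}
print(find_glxlib("MISC:<HEMPHILL.GALAXY.DEBUG>", present.__contains__))
```

        Because the order is the same for the system and private GALAXY, a
        private GLXLIB.EXE kept alongside the private components is found
        before the copy on SYS:.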

             The actual changes  implemented  for  the  private  GALAXY  are  as
        follows:

             1.  Ordinarily, a component which stopcodes will save a crash  file
                 on  disk.   When  debugging,  however,  the  crash  file is not
                 written.  In either case, if DDT is loaded  with  the  program,
                 the stopcode will invoke a jump to DDT.

             2.  When debugging, GALAXY components do not require  that  the
                 packets they receive be privileged.

             3.  Ordinarily, QUASAR and ORION get special system PIDs  for  IPCF
                 communications.   When  debugging,  they get PIDs with names of
                 the form "[username]QUASAR" and "[username]ORION".  All  GALAXY
                 components  will  then  look  for  these  PID  names.   Even  a
                 pseudo-GALAXY component, such as MOUNTR or IBMSPL, will be able
                 to  find  these  PIDs if its location 135 has been poked to -1,
                 simply because it uses GLXLIB.

             4.  GALAXY components print messages like:
                 "% QUASAR GLXIPC Waiting for ORION to start"
                 only while debugging.

             5.  ORION and QUASAR print messages about PIDs they acquire, like:
                 "% QUASAR GLXIPC Becoming  [HEMPHILL]QUASAR     (PID =
                 66000031)"

             6.  All components print messages about the special PIDs they  find
                 for QUASAR and ORION, like:
                 "% ORION  GLXIPC Alternate [HEMPHILL]QUASAR     (PID =
                 66000031)"
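
        When a PTYCON log of a private GALAXY session is being examined, it
        can help to pick the "Becoming" and "Alternate" messages apart
        mechanically.  The Python sketch below does so for lines of the
        form shown above.  The pattern is inferred from the sample messages
        in this article, not from the GLXIPC source, and treating the PID
        as a string of octal digits is an assumption.

```python
import re

# Pattern inferred from the sample messages in this article (an assumption).
GLXIPC_LINE = re.compile(
    r"^%\s+(?P<reporter>\S+)\s+GLXIPC\s+(?P<verb>Becoming|Alternate)\s+"
    r"\[(?P<user>[^\]]+)\](?P<component>\S+)\s+"
    r"\(PID\s*=\s*(?P<pid>[0-7]+)\)\s*$"
)


def private_pid_name(username, component):
    """Build a PID name of the form "[username]QUASAR"."""
    return "[%s]%s" % (username, component)


def parse_glxipc(line):
    """Return a dict describing a Becoming/Alternate message, or None."""
    match = GLXIPC_LINE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["pid_name"] = private_pid_name(info["user"], info["component"])
    return info


sample = "% BATCON GLXIPC Alternate [HEMPHILL]QUASAR     (PID = 66000031)"
print(parse_glxipc(sample))
```

        Other GLXIPC messages, such as the "Waiting for ORION to start"
        message, do not match the pattern and yield None.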




        6.2  QUASAR


             1.  QUASAR reads and  writes  private  queues  from  its  connected
                 directory.  The full filespec is
                 "DSK:PRIVATE-MASTER-QUEUE-FILE.QUASAR"

             2.  QUASAR does  absolutely  no  privilege  checking.   Anyone  can
                 modify  or  kill any request in the queues (if they know how to
                 speak to this private QUASAR).




        6.3  ORION


             1.  ORION  will   create   a   log   file   under   the   name   of
                 "DSK:ORION-TEST.LOG"                 instead                 of
                 "PS:<SPOOL>ORION-SYSTEM-LOG.001", and does no renaming  of  any
                 old log files present.

             2.  ORION will not set up  any  NSP  servers  when  debugging.   It
                 therefore  will not speak to remote nodes to run OPRs for them.
                 However, there are hooks  for  ORION  to  initialize  "SRV:128"
                 instead of the usual "SRV:47" when debugging.




        6.4  QMANGR

        QMANGR has also been modified to look for a private QUASAR's PID if  the
        low segment has a non-zero entry in .JBOPS.



        6.5  CDRIVE

        CDRIVE can be difficult to debug, since it potentially has many inferior
        forks all executing the same code.  For this reason, each fork
        automatically loads SDDT into its address space and jumps to it when it
        starts up.  After setting any breakpoints or otherwise modifying this
        fork's code, the debugger types "GO<ESC>G" to resume the fork.  While
        debugging, if the fork terminates (crashes), CDRIVE will not go through
        its normal purging of the crashed fork, so that its status can be
        examined.



        7.0  EXAMINING GALAXY CRASH FILES

        All GALAXY components use the  stopcode  facility  supplied  by  GLXLIB.
        This  facility  dumps  the  ACs,  program  error codes, associated error
        messages, program version numbers, and the last nine  locations  of  the
        stack  onto  the  controlling  terminal  of  the  program  executing the
        stopcode.  In addition, a crash file is created with a name  of  the
        form:   PS:<SPOOL>program-stopcode-CRASH.EXE.   This  .EXE file contains
        the entire core image of the program which has crashed, and is extremely
        useful in determining the cause of the crash.  In particular, there is a
        block of data referred to as the "crash block"  which  usually  contains
        the information most pertinent to the debugger.  This information can be
        read with either DDT or FILDDT.  Its contents are tabulated as follows:

                Location                Data

                .SPC                    PC of stopcode

                .SCODE                  SIXBIT name of stopcode

                .SERR                   Last TOPS-20 error code

                .SACS                   Contents of the sixteen accumulators

                .SPTBL                  Base address of page table used by
                                          GLXMEM

                .SPRGM                  Name of program in SIXBIT

                .SPVER                  Program version number

                .SPLIB                  GLXLIB version number

                .LGERR                  Last GALAXY error code

                .LGEPC                  PC of last GALAXY error return
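
        Two of the crash block entries (.SCODE and .SPRGM) hold SIXBIT
        names, which appear as octal words when examined numerically.  The
        Python sketch below converts between a 36-bit SIXBIT word and text,
        using the standard SIXBIT rule that each 6-bit code is the ASCII
        code minus 40 octal, six characters to a word.  The function names
        are illustrative.

```python
def sixbit_to_text(word):
    """Decode a 36-bit SIXBIT word into up to six characters."""
    chars = []
    for shift in range(30, -1, -6):        # leftmost character first
        code = (word >> shift) & 0o77
        chars.append(chr(code + 0o40))     # SIXBIT code + 40 octal = ASCII
    return "".join(chars).rstrip()         # trailing blanks are padding


def text_to_sixbit(text):
    """Encode up to six characters as a 36-bit SIXBIT word."""
    word = 0
    for ch in text.upper().ljust(6)[:6]:
        code = ord(ch) - 0o40
        if not 0 <= code <= 0o77:
            raise ValueError("character not representable in SIXBIT: %r" % ch)
        word = (word << 6) | code
    return word


word = text_to_sixbit("QUASAR")
print("%012o" % word, sixbit_to_text(word))
```

        In DDT the same conversion can usually be had by changing the
        typeout mode, so a sketch like this is mainly of use when working
        from a printed dump.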



                                    DEBUGGING MOUNTR
                                    ----------------



        1.0  INTRODUCTION

             This write-up was prepared to assist developers and maintainers  in
        understanding  and  debugging  the  TOPS-20  tape and structure mounting
        program, MOUNTR.  It is assumed that the reader has a working  knowledge
        of  TOPS-20  assembler  language  coding  and the set of TOPS-20 monitor
        calls.



        2.0  SOURCES OF INFORMATION

             This document will serve primarily as a guide to  debugging  MOUNTR
        crashes.   Much  of  the information needed to understand the data bases
        and the operation of MOUNTR resides within the first 20 or 30  pages  of
        the MOUNTR code itself.  Just make a listing and start reading.



        3.0  DEBUGGING A LIVE MOUNTR

             MOUNTR  can  be  debugged  as  a  standard  GALAXY  component,   by
        depositing -1 in location 135 of MOUNTR.EXE.  MOUNTR will acquire a  PID
        for a private copy of QUASAR and will communicate with it.

             To debug a MOUNTR which is actually recognized by the system as the
        "real" MOUNTR, it is usually  best  to  run  it  as  a  separate  job by
        including the following commands in SYSJOB.RUN:

             JOB n /LOGIN OPERATOR XX OPERATOR
             ENABLE
             GET SYS:MOUNTR
             START
             /

             This job can be reached by use of the ADVISE command.   MOUNTR  can
        then be killed and a new copy started with  appropriate  breakpoints  or
        patches installed.  Before MOUNTR can be patched or breakpointed, it  is
        necessary to issue the DDT command $W, since MOUNTR write-protects itself
        during execution.  For example:

             @ENABLE
             $ADVISE OPERATOR
              TTY2, NRT20
              TTY235, OPR
              TTY234, MOUNTR
              TTY233, PTYCON
              TTY232, EXEC
             TTY: 234

              [Pseudo-terminal, confirm]
              Escape character is <CTRL>E, type <CTRL>^? for help
              OPERATOR Job 3 MOUNTR

             LINK FROM MOSER, TTY 60
              [Advising]
             ^C                                   !KILL OLD MOUNTR
             ^C
             $GET SYS:MOUNTR                      !GET A NEW ONE
             $DDT                                 !ENTER DDT
             DDT
             $W                                   !YOU MUST DO THIS
             USM/   MOVEI 2,BSRRTA#   .$B         !SET SOME BREAKPOINTS OR WHATEVER
             DDSCIH/   JSP 16,SAVEQR#   .$B   
             ^Z                                   !EXIT DDT
             $START                               !START MOUNTR

             Depositing 1 in location CDFLG will  enable  CONTROL-D  interrupts.
        Typing CONTROL-D when enabled causes MOUNTR to enter DDT.



        4.0  MOUNTR CRASHES

        When MOUNTR crashes, it saves its core image in the file,

             PS:<SPOOL>MOUNTR-CRASH.EXE

        All crashes are initiated by a CALL STOP instruction.  This  may  result
        from  a  logic  inconsistency,  or  it  can  happen if MOUNTR receives a
        software interrupt on a panic channel.  The STOP  routine  gathers  some
        important data and saves it in core.  It then types a message giving the
        name of the filespec wherein it is saving the core image, and issues  an
        SSAVE  JSYS to save the image.  After restoring the ACs from the time of
        the crash, MOUNTR halts.

        To begin debugging a MOUNTR crash, follow these steps:

             1.  GET PS:<SPOOL>MOUNTR-CRASH.EXE

             2.  Get into DDT and type STOP1$G.  This will load DDT's  ACs  with
                 MOUNTR's  ACs  at  the  time of the crash and exit to the EXEC.
                 Give the DDT command to the EXEC again to get back into DDT.

             3.  Look at P (AC 17).  If it contains  PDL1+something,  there  has
                 been  a  stack  trap,  and  the  routine  STOPP was called as a
                 result.  The location BADP contains the contents of  P  at  the
                 time of the trap.

             4.  If P contains PDL+something, type TAB to look at the top of the
                 stack.  This will contain one plus the address of the CALL STOP
                 instruction.   Type  TAB  and  ^H  to  display  the   CALL STOP
                 instruction that invoked the crash.  If MOUNTR died as a result
                 of a panic channel interrupt, LPC1 will contain  one  plus  the
                 address  of  the instruction which was executing at the time of
                 the interrupt.
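
        The "one plus the address" arithmetic in step 4 can be sketched in
        Python as follows.  The sketch assumes the section-0 stack format,
        in which the word pushed by the call holds the return PC in its
        right half (18 bits) and flags in its left half.  The example value
        is made up for illustration and does not come from a real MOUNTR
        crash.

```python
HALF_MASK = 0o777777


def return_address(stack_word):
    """Right half of the stack word: the return PC pushed by the call."""
    return stack_word & HALF_MASK


def call_site(stack_word):
    """Address of the calling instruction: the return PC minus one."""
    return (return_address(stack_word) - 1) & HALF_MASK


# Hypothetical top-of-stack word: flags in the left half, PC in the right.
top_of_stack = 0o310000012346
print("%o" % call_site(top_of_stack))
```

        The same subtraction applies to LPC1 for crashes caused by panic
        channel interrupts, since it also holds one plus the address of
        interest.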



        The following locations and data structures are helpful in locating  the
        cause of difficulties in MOUNTR:


        NAME    FUNCTION
        ----    --------
        CRSHAC  Contains the ACs at the time the STOP routine was called.

        LPC1    For crashes caused by panic channel  interrupts,  LPC1  contains
                one plus the address of the instruction that caused the crash.

        LSTERR  Contains the last TOPS-20 error.

        MRPDB   PDB for last IPCF message received by MOUNTR

        MSTRBK  Used as an argument block for MTOPR and MSTR monitor calls.

        RBUF    Last IPCF message received by  MOUNTR  (particularly  useful  if
                SSSDAT+1  contains  MRCVIH, indicating that MOUNTR crashed while
                processing an incoming IPCF message).

        SSSDAT  When MOUNTR  crashes,  SSSDAT+1  contains  the  address  of  the
                routine  that  was invoked by MOUNTR's scheduler.  Starting here
                and using the stack, you can trace  the  execution  of  MOUNTR's
                code that led to the crash.

        TBUF    Last IPCF message sent by MOUNTR.



                                    DEBUGGING PA1050
                                    ----------------


        In order to debug the compatibility package you must have a copy of  the
        file  called PAT.EXE.  PA1050 is just the system name for PAT.  If there
        is no copy of PAT.EXE, then take the source program called PAT.MAC,  and
        assemble  it,  thereby creating a sharable save file called PAT.EXE.  To
        debug the compatibility package the following steps are required.

        $RESET
        $GET PROG          ;Where PROG may be any program you choose
        $MERGE PAT         ;PAT is the source name for PA1050
        $DDT
        PAT$:   MOVBF$B    ;You set your breakpoints here
        DEBUG$G
        $G                 ;You must type $G twice because of the double  symbol
                            table


                                          NOTE

                       Some of the error messages you may receive
                       from  PA1050  may  not  be  the true error
                       message.   To  have  the   correct   error
                       message  printed  out  use an ERJMP, or an
                       ERCAL after the JSYS  it  fails  on.   For
                       more  information on ERJMP and ERCAL refer
                       to the Monitor Calls Reference Manual.


        In order to build the compatibility  package  the  following  steps  are
        required:

        $LOAD /CREF PAT.MAC
        $START
        $SAVE PAT
        $GET PAT
        $DDT
        MAKEPF$G
        Output file: PA1050.EXE
        $DDT
        UDDT
        40000,,0$X
        ^Z
        $I MEM

        The start after loading causes the program to be moved from its location
        to  its  running location in high core.  The symbol table is also moved,
        and the pointer adjusted.  A sharable save file of pages 700-777 must be
        made for debugging.  This is created when you type MAKEPF$G and then
        execute 40000,,0 in UDDT.  When you type I MEM you should now see
        PA1050.EXE in pages 700-730.
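
        For readers who think in addresses rather than pages, the page
        ranges above can be converted with the following Python sketch.  It
        relies only on the TOPS-20 page size of 512 (1000 octal) words; the
        function name is illustrative.

```python
PAGE_SIZE = 0o1000  # a TOPS-20 page is 512 (1000 octal) words


def page_range_to_addresses(first_page, last_page):
    """Return the (first, last) word addresses covered by a page range."""
    first = first_page * PAGE_SIZE
    last = (last_page + 1) * PAGE_SIZE - 1
    return first, last


for pages in ((0o700, 0o777), (0o700, 0o730)):
    first, last = page_range_to_addresses(*pages)
    print("pages %o-%o = addresses %o-%o" % (pages + (first, last)))
```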



                                  COPYING FLOPPY DISKS
                                  --------------------


        This is  a description  of the  front end  program COP  (quick  floppy
        copy). This program  should be  used to  create backup  copies of  the
        distributed set of floppies.

        CAUTIONARY NOTES ABOUT FLOPPY DISKS:

        1)      Only IBM floppies should be used.  Other floppies  may
                destroy the DX11 drives.

        2)      Floppies have  a  finite  life while  mounted  in  the
                drive. The heads do not  float, and the floppies  turn
                continuously.  This causes the magnetic surface to  be
                eaten away. Minimum floppy life is something like  200
                hours.

        3)      Floppies which are dropped, badly shocked, or used  as
                frisbees will lose their  sector headers, and will  be
                good for nothing.

        4)      Never put a floppy which you suspect is bent into  the
                drive -- it may damage the drive. 

        5)      COP  is discussed  also in  the  Front End File System
                Specification  manual  in  Volume  14 of  the  TOPS-20
                Software Notebooks, section 3.2.


        COP COMMANDS:

                The basic COP command string is of the form:

                  COP> <destination device>/<switch>=<source device>

                To  enter  COP, type a Control-backslash to get to the
                Parser,  then  MCR COP  to start up COP.  The floppies
                should have  already been mounted with  MCR MOUNT, and
                should  then be dismounted with  MCR DMOUNT  after the
                copy.

        COP SWITCHES:

                /HE     Help, types a list of switches
                /RD     Read Device, check for errors
                /CP     Copy (default action)
                /VF     Verify copy (default when copy in effect)
                /ZE     Zero the device
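
                The command string form given above can be  checked
                mechanically before it is typed at the console.   The
                Python sketch below splits a string of that form into
                destination, switch, and source, and validates the
                switch against the list above.  It models only the
                syntax as described here; it is not COP's own parser,
                and the function name is hypothetical.

```python
COP_SWITCHES = {
    "HE": "Help, types a list of switches",
    "RD": "Read Device, check for errors",
    "CP": "Copy (default action)",
    "VF": "Verify copy (default when copy in effect)",
    "ZE": "Zero the device",
}


def parse_cop_command(line):
    """Split '<destination>/<switch>=<source>' into its three parts.

    The switch is optional, as in the command DX1:=DX0:, and is returned
    as None when absent.
    """
    destination, equals, source = line.strip().partition("=")
    if not equals:
        raise ValueError("expected <destination>/<switch>=<source>")
    destination, slash, switch = destination.partition("/")
    switch = switch.upper() if slash else None
    if switch is not None and switch not in COP_SWITCHES:
        raise ValueError("unknown switch: /" + switch)
    return destination, switch, source


print(parse_cop_command("DX1:=DX0:"))
print(parse_cop_command("DX1:/VF=DX0:"))
```

                Note that the destination comes first.  Reversing the
                two devices would overwrite the floppy that was meant
                to be the source.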

        COP EXAMPLE:

                The  following  sequence  of commands will  succeed in
                copying  the contents of the floppy in DX0:  (the left
                hand drive) onto the floppy in DX1:, and verifying the
                operation.

                ^\
                PAR>MCR MOU
                MOU>DX0:
                Mount completed
                MOU>DX1:
                Mount completed
                MOU>^Z
                ^\
                PAR>MCR COP
                COP>DX1:=DX0:
                COP>^Z
                ^\
                PAR>MCR DMO
                DMO>DX0:
                Dismount Complete
                DMO>DX1:
                Dismount Complete
                DMO>^Z

                The copy takes about two minutes, the verify about the same.
                Take  care to  specify the  correct source  and  destination
                devices.

        CAUTIONARY NOTE--

                If you  COP for  many generations  you will  build  up
                ghost bad blocks until RSX  declares  the  floppy
                useless. This is  because in each  generation the  bad
                block file of the  old floppy is  copied onto the  new
                (which will have its bad blocks in different  physical
                locations).  A way around this  is to use PIP for  any
                non-boot copies once every several generations.



                             THE SWSKIT DOCUMENTATION FILES
                             ------------------------------



        Following is a brief synopsis of each article/file that appears  in  the
        SWSKIT  documentation.   Please  note  that  many  of these articles are
        preliminary functional specs  and  discussions,  and  may  contain  some
        information that is completely false.  However, the material is provided
        to be used with proper caution because it does provide  information  not
        otherwise  available  in  useful  form at this time.  Over time, many of
        these documents will be replaced by  SDC-type  materials.   For  others,
        these articles may be the main source of information indefinitely.


                     TITLE                      DESCRIPTION


            * HANDBOOK              This document is the latest revision of  the
                                    TOPS-20 Trouble-Shooting Handbook.

              ACCOUNTING            This article describes the changes  made  to
                                    allow  the billing rates for system usage to
                                    change during the day.  It also  explains  a
                                    feature called disk accounting.

              ACCOUNTING-TABLES     This   file   documents   the   formats   of
                                    SYSTEM-DATA.BIN    and   CHECKPOINT.BIN   in
                                    tabular format.

              ARCHIVE               This document describes the functionality of
                                    archiving, and how to use archiving.

            * CFS-INFO              This document describes  the  implementation
                                    of the Common File System (CFS) for TOPS-20.

              CI-INFO               This document contains files describing  the
                                    implementation  of  support  for  the  CI-20
                                    (KLIPA) I/O port for the DECSYSTEM-20.

            * CTERM-INFO            This document describes  the  implementation
                                    of  the  CTERM  protocol  terminal links for
                                    TOPS-20.

              DDT-INFO              This document describes changes made to  DDT
                                    for versions 41, 41A, and 43.

              DDP                   This document discusses some aspects of  DDP
                                    (Distributed  Data  Processing)  on TOPS-20.
                                    (Very early paper.)

              DEBUGGING-GALAXY      This  document  describes  how  to  build  a
                                    private  GALAXY  system  for  debugging, and
                                    gives hints on debugging various components.

              DX20                  This document gives a brief  description  of
                                    the  configuration  requirements  for  tapes
                                    controlled by the DX20.

              EAPGMG                This document describes Extended  Addressing
                                    and programming in non-zero sections.

              EXECUTE-ONLY          This   document   describes   the   changes,
                                    restrictions,   and   implementation  of  an
                                    execute-only file capability on TOPS-20.

              GALAXY-TABLES         This document contains the tables for GALAXY
                                    v4.2.

              GALAXY-V5             This document contains  discussions  of  the
                                    changes   to   GALAXY   for  version  5  and
                                    specifications of the QUEUE% JSYS.

              GETOK                 This  document   describes   three   JSYSes:
                                    GETOK, RCVOK, and GIVOK;  and also describes
                                    the SMON function.

              HSC-INFO              This document describes the programming  and
                                    use  of  the  HSC-50  storage  controller by
                                    release 6.0 of TOPS-20.

              IO                    This document describes some of the  aspects
                                    of how IO is done by TOPS-20.

              KFS                   This  document   explains   the   functional
                                    specification of the RSX-20F KLINK link.

              KLCOM                 This  document  describes  the   KL10/PDP-11
                                    DTE20  protocol.  It explains such things as
                                    the protocol messages, error  messages,  and
                                    bootstrap procedures.

              KL873                 This document describes the functionality of
                                    all the revisions of the BM873 Bootstrap ROM
                                    for KL10 based on PDP-11 Front-Ends.

              LABELED-TAPES         This document describes TOPS-20's support of
                                    labeled  tapes.  It also gives a description
                                    of the monitor calls  and  support  routines
                                    that are used for labeled tapes.

            * LAT-INFO              This     document     collects      software
                                    specifications for the 10/20 host support of
                                    LAT-based   terminals,   architecture    and
                                    implementation.

              LP20                  This functional specification describes  the
                                    interface to the LP-20 from the KL-10.

            * MONITOR-ADDRESS-SPACE This document describes the changes to  BOOT
                                    and  DDT  to  enable more address space.  It
                                    also explains PSECTs, overlapping  BGSTR  to
                                    build  monitors,  the  write-protecting   of
                                    the resident monitor, and the parts  of  the
                                    Release  6.0  address  space  project   that
                                    moved things out of section 0/1.

            * MONITOR-TABLES        This document displays most of the tables in
                                    the monitor.  This is a best effort based on
                                    the ED SERVICES materials and will doubtless
                                    not  be  as  complete  as  the  eventual  ED
                                    SERVICES document.

              MOS                   This document describes MOS memory  and  the
                                    TOPS-20 monitor support of TGHA.

              MSCP-INFO             This  document  contains  the   design   and
                                    functional  specifications  for  the TOPS-20
                                    implementation of the Mass  Storage  Control
                                    Protocol (MSCP) server and driver.

            * NI-INFO               This  document  describes  NIA-20   Ethernet
                                    adapter support for TOPS-20.

              PARITY                This document describes some of the  changes
                                    made  to  the  way parity errors are handled
                                    for Release 5.

            * PERFORMANCE           This  document   discusses   a   number   of
                                    performance issues.

              RSX-STOP-CODES        This document lists the RSX-20F stop  codes,
                                    stating the meaning of each and  the  module
                                    that contains it.

              SCA-INFO              This document  describes  SCA,  the  Systems
                                    Communication   Architecture  protocol  used
                                    over the CI bus.

              SCHEDULER             This   document   describes   Working    Set
                                    Swapping,  and  Release  4  and  6 Scheduler
                                    changes (Class Scheduler, SKED%, etc.).

            * TCPIP                 This document describes the  TCP/IP  ARPAnet
                                    software implementation for TOPS-20AN.

              USEFE                 This document outlines how  to  use  the  FE
                                    device and program.


                * indicates new or updated material for this SWSKIT version.



                               THE SWSKIT TOOLS PROGRAMS
                               -------------------------





        Included on the SWSKIT are a number of utility programs,  as  summarized
        below.   These tools have been found to have at least some usefulness in
        the past in a debugging environment.  Most of these programs require the
        user  to have WHEEL or OPERATOR privileges to work, but most are also of
        the "show and tell but don't touch" category, so  they  are  in  general
        "safe" to run.

        We have cleaned up some of the old ones a bit, added a few new ones, and
        checked them all out to the extent that they will all run.  There should
        even be some documentation, at least a HELP file, with each program.

        While we do not actively "support" these programs, we are quite  willing
        to accept complaints and suggestions and submissions from the field.

        These are the "standard" tools;  the Marlboro Support Group is generally
        familiar  with  their  operation and quirks, and in providing support to
        the field may request that one or more of the  programs  be  used  at  a
        customer  site  to  diagnose or assist in correcting a problem.  This is
        generally more effective than random poking about in DDT, or  trying  to
        learn the peculiarities of whatever the customer may have available.

        And now, the current collection:




                  PROGRAM                       DESCRIPTION
                  -------                       -----------


                  ACTDMP              Converts an ACCOUNTS-TABLE.BIN file back
                                      into the sequence of commands that created
                                      it, for debugging purposes.

                  CHANS               Produces system configuration and status
                                      information on tapes and disks.

                  DIRPNT              Lists the contents of the blocks in a disk
                                      directory.

                  DIRTST              Checks the format of directory files and
                                      lists any invalid data found.

                  DMPTTY              Produces internal monitor  information  on
                                      the state of a given terminal line.

                  DS                  Provides    software    diagnostic    help
                                      concerning  the  disk file system.  It can
                                      also  perform  the  functions   of   READ,
                                      FILADR, and UNITS.

                  DSKERR              Provides a convenient listing of the  hard
                                      and soft disk errors that have occurred.

                  DX20PC              Traces the microcode PC in the DX20.

                  ENVIRONMENT         Types out  CPU  and  memory  configuration
                                      information.

                  JSTRAP              Produces information in a log on any JSYS,
                                      including the PC and arguments used.

                  MONRD               Allows you to easily examine  the  running
                                      monitor.

                  MTEST               Allows you to insert  MONITOR  instruction
                                      execution tests anywhere in the monitor.

                  REV                 Allows you to easily alter, edit, delete,
                                      and obtain information on files, among
                                      other operations.

                  SWSERR              Produces a convenient listing of BUGHLT,
                                      BUGCHK, and BUGINF occurrences.

                  TYPVF7              Types out the contents of a VFU file in a
                                      readable form.

                  UNITS               Produces  status   information   on   disk
                                      drives.

                                                                    Page Index-1



                             TROUBLESHOOTING HANDBOOK INDEX




        2020
          See KS-10

        Address break  . . . . . . . . . . . 32

        BOOT
          commands . . . . . . . . . . . . . 82
          getting a dump . . . . . . . . . . 85
          Monitor memory pages . . . . . . . 85
        BUG macro  . . . . . . . . . . . . . 123
        BUGCHK . . . . . . . . . . . . . . . 98, 123
        BUGHLT . . . . . . . . . . . . . . . 98, 123
        BUGINF . . . . . . . . . . . . . . . 98, 123

        Crash analysis
          AC storage . . . . . . . . . . . . 107
          address translation  . . . . . . . 96
          APR interrupt context  . . . . . . 120
          ARPANET interrupt level  . . . . . 121
          BOOT . . . . . . . . . . . . . . . 85
          BUGHLT . . . . . . . . . . . . . . 106, 109
          Device interrupt context . . . . . 118
          DTE Interrupt context  . . . . . . 117
          DUMPs  . . . . . . . . . . . . . . 84
          EDDT . . . . . . . . . . . . . . . 93
          FILDDT . . . . . . . . . . . . . . 95
          front-end dumps  . . . . . . . . . 86
          general  . . . . . . . . . . . . . 103
          JSYS context . . . . . . . . . . . 109
          materials  . . . . . . . . . . . . 87
          MDDT . . . . . . . . . . . . . . . 90
          monitor locations  . . . . . . . . 104, 110, 115
          Pager context  . . . . . . . . . . 111
          PHYSIO context . . . . . . . . . . 118
          PSI context  . . . . . . . . . . . 114
          Scheduler context  . . . . . . . . 115
          stacks . . . . . . . . . . . . . . 107

        DDT
          EDDT . . . . . . . . . . . . . . . 93
          FILDDT . . . . . . . . . . . . . . 49, 95
          MDDT . . . . . . . . . . . . . . . 90
          patching TOPS-20 . . . . . . . . . 16
          Tricks . . . . . . . . . . . . . . 101
          UDDT . . . . . . . . . . . . . . . 89
        Directories
          Mapping in MDDT  . . . . . . . . . 20
          problems . . . . . . . . . . . . . 20, 25, 36
        Disk debugging . . . . . . . . . . . 49
        Disk parameters  . . . . . . . . . . 45, 51

        EXEC debugging . . . . . . . . . . . 130
          BOOT . . . . . . . . . . . . . . . 136

        FKSTAT . . . . . . . . . . . . . . . 53
        Floppy disks
          copying  . . . . . . . . . . . . . 155

        GALAXY debugging . . . . . . . . . . 137
          CDRIVE . . . . . . . . . . . . . . 149
          crash files  . . . . . . . . . . . 150
          GLXLIB . . . . . . . . . . . . . . 148
          MOUNTR . . . . . . . . . . . . . . 151
          ORION  . . . . . . . . . . . . . . 149
          private GALAXY . . . . . . . . . . 138
          QMANGR . . . . . . . . . . . . . . 149
          QUASAR . . . . . . . . . . . . . . 149
          stopcodes  . . . . . . . . . . . . 148

        Hardware
          deficiencies . . . . . . . . . . . 71
          disk parameters  . . . . . . . . . 45, 51
          tape parameters  . . . . . . . . . 45, 52
        Hung Jobs  . . . . . . . . . . . . . 35
        Hung SETSPD  . . . . . . . . . . . . 36
        Hung tapes . . . . . . . . . . . . . 41
        Hung terminals . . . . . . . . . . . 35

        Job Zero . . . . . . . . . . . . . . 36, 56, 58, 70, 99, 108
        JSB  . . . . . . . . . . . . . . . . 27, 96-97, 110

        KS-10
          8080 information . . . . . . . . . 77
          BOOT errors  . . . . . . . . . . . 78
          console information  . . . . . . . 74
          microcode  . . . . . . . . . . . . 76

        Legal policy . . . . . . . . . . . . 5

        MDDT Operations  . . . . . . . . . . 90, 101
          Address break  . . . . . . . . . . 32
          Breakpoints  . . . . . . . . . . . 30
          CST access . . . . . . . . . . . . 122
          directory mapping  . . . . . . . . 20
          JSB and PSB Mapping  . . . . . . . 27
          magtapes . . . . . . . . . . . . . 41
          MAPDIR . . . . . . . . . . . . . . 20
          MSETMP . . . . . . . . . . . . . . 27
        Monitor address space
          CSTs . . . . . . . . . . . . . . . 122
          Job Zero forks . . . . . . . . . . 70
          PSECTs . . . . . . . . . . . . . . 68
          sections . . . . . . . . . . . . . 67
        Monitor building . . . . . . . . . . 69, 125
          POSTLD . . . . . . . . . . . . . . 128
          release 6  . . . . . . . . . . . . 128
        Monitor locations  . . . . . . . . . 97
          CDB  . . . . . . . . . . . . . . . 45
          CHNTAB . . . . . . . . . . . . . . 45
          CSTs . . . . . . . . . . . . . . . 122
          DBUGSW . . . . . . . . . . . . . . 94, 99
          DTE  . . . . . . . . . . . . . . . 117
          EDDTF  . . . . . . . . . . . . . . 94
          entry vector . . . . . . . . . . . 100
          KDB  . . . . . . . . . . . . . . . 45
          Page zero  . . . . . . . . . . . . 62
          Pager  . . . . . . . . . . . . . . 114
          scheduler  . . . . . . . . . . . . 115
          See also JSB, PSB
          UDB  . . . . . . . . . . . . . . . 45
        Monitor universal files  . . . . . . 69

        PA1050 debugging . . . . . . . . . . 154
        Page zero  . . . . . . . . . . . . . 62
        Patch area . . . . . . . . . . . . . 18, 105
        PCOs
          see SIRUS
        PSB  . . . . . . . . . . . . . . . . 27, 96-97, 110, 115

        Scheduler tests  . . . . . . . . . . 53
        SIRUS
          CHERRY . . . . . . . . . . . . . . 9
          commands . . . . . . . . . . . . . 9
          system . . . . . . . . . . . . . . 9
        SPR
          answers  . . . . . . . . . . . . . 9
          see SIRUS
          submission . . . . . . . . . . . . 6
        SWPMLK . . . . . . . . . . . . . . . 94
        SWSKIT files . . . . . . . . . . . . 5, 124, 157, 160
        SWSKIT programs  . . . . . . . . . . 5, 160
          ACTDMP . . . . . . . . . . . . . . 160
          CHANS  . . . . . . . . . . . . . . 48, 160
          DIRPNT . . . . . . . . . . . . . . 25, 40, 160
          DIRTST . . . . . . . . . . . . . . 25, 40, 160
          DMPTTY . . . . . . . . . . . . . . 35, 160
          DS . . . . . . . . . . . . . . . . 25, 40, 48, 160
          DSKERR . . . . . . . . . . . . . . 87, 160
          DX20PC . . . . . . . . . . . . . . 44, 160
          ENVIRONMENT  . . . . . . . . . . . 160
          JSTRAP . . . . . . . . . . . . . . 160
          MONRD  . . . . . . . . . . . . . . 29, 160
          MTEST  . . . . . . . . . . . . . . 160
          REV  . . . . . . . . . . . . . . . 160
          SWSERR . . . . . . . . . . . . . . 87, 98, 124, 160
          TYPVF7 . . . . . . . . . . . . . . 160
          UNITS  . . . . . . . . . . . . . . 48, 160
        System problems
          Hung jobs  . . . . . . . . . . . . 35, 53
          Hung SETSPD  . . . . . . . . . . . 36
          Hung tapes . . . . . . . . . . . . 41
          Hung terminals . . . . . . . . . . 35
          Trashed disks  . . . . . . . . . . 36

        Tape parameters  . . . . . . . . . . 52
        Trashed disks  . . . . . . . . . . . 36




        [END OF HANDBOOK]
