PDP-10 Archive: swskit-v21/documentation/decnet-monitor-hints.mem from BB-H348C-RM



      ---------------
      |d|i|g|i|t|a|l|                   INTEROFFICE MEMORANDUM
      ---------------


      TO: DECnet Users                  DATE: 24-Jun-82
                                        FROM: TSG/NCSS
                                        LOC/MAIL STOP: MR1-2/H22



      SUBJ:  Monitor Considerations for DECnet




      The following is a list of problems that have been seen in-house
      (most of them only rarely), plus some hints for using SYSDPY (from
      the tools tape) and MONRD (from the monitor SWSkit) to look at
      DECnet problems.



      1.0  NFT PROBLEMS


      1.  NFT  sometimes  hangs  while  transferring  big  files.   This
          problem  is  not reproducible and, in fact, patches to the MCB
          code made during field test may have fixed this problem  as  a
          side  effect.  A way around the problem is to simply control-C
          out of NFT and try the transfer again.

          When the problem has been seen, the job was usually hung at an
          I/O  JSYS  (SINR  or  SOUTR).  In the monitor, location FKSTAT
          plus the fork number shows that the scheduler  test  on  which
          the  fork  is  waiting  is  CHKRAW.   The  fork  number can be
          obtained from SYSDPY by using its Jn  command  while  enabled,
          where n is the number of the job running NFT.

          Please SPR this problem if you see it.



      2.0  NETCON PROBLEMS

      On a busy system, restarting NETCON can fail.  You may get an
      error along the lines of "?Unable to create server topology
      link".  (This is not the exact error message.)  There are two
      possible causes of this problem.

      The first cause is that there is not enough swappable free space
      in the monitor for network use.  To check this, use the SYSDPY RES
      command.  If this is the cause, the swappable DECnet core will
      show as 100% used.  (Typical in-house figures range from 60 to 80
      percent.)  To double-check this figure, examine the running
      monitor with FILDDT.  The value of MAXBLK is the total amount of
      memory available to DECnet, and location BLKASG shows the amount
      currently in use.  The only immediate solution is to cut down on
      the number of logical links in use.  If the problem happens often,
      the monitor can be reassembled with a larger value specified for
      MAXBLK, which is defined in STG.
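
      The MAXBLK and BLKASG comparison above comes down to a small
      calculation.  The following Python sketch is illustrative only.
      It assumes the two values have been copied by hand from FILDDT as
      octal strings; the function name decnet_core_percent is
      hypothetical and is not part of any DEC tool.

```python
def decnet_core_percent(blkasg_octal: str, maxblk_octal: str) -> float:
    """Return the percentage of DECnet swappable free space in use.

    Both arguments are octal strings as read from FILDDT (an
    assumption about how the values were captured).
    """
    in_use = int(blkasg_octal, 8)
    total = int(maxblk_octal, 8)
    if total <= 0:
        raise ValueError("MAXBLK must be positive")
    return 100.0 * in_use / total


if __name__ == "__main__":
    # Made-up sample values, not taken from a real monitor.
    print(f"{decnet_core_percent('600', '1000'):.0f}% in use")
```

      A result at or near 100 would match the first cause described
      above; figures of 60 to 80 would match the typical in-house
      range.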

      The second cause is that the JSB space for the job running NETCON
      is full.  To see whether this is your problem, use the SYSDPY Jn
      command to find the fork NETCON is running in.  Then run MONRD
      from the monitor SWSkit and give the commands

           MONRD>FORK n
           MONRD>TYP JBCOR JBCOR+1

      JBCOR and the left half of JBCOR+1 together form a bit map of
      free pages in the JSB.  If few or no bits are set, this is your
      problem.  NETCON needs 10 pages just to start up.
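
      Counting the free-page bits by eye in two octal words is error
      prone.  The Python sketch below shows one way to do the count,
      assuming the two words typed out by MONRD have been entered as
      36-bit integers.  The function names are hypothetical, and the
      default of 10 pages is read here as decimal, which is an
      assumption about the figure quoted above.

```python
HALF_MASK = 0o777777        # 18 bits, one PDP-10 half word
WORD_MASK = 0o777777777777  # 36 bits, one PDP-10 word


def free_jsb_pages(jbcor: int, jbcor_plus_1: int) -> int:
    """Count the bits set in JBCOR and in the left half of JBCOR+1."""
    left_half = (jbcor_plus_1 >> 18) & HALF_MASK
    return bin(jbcor & WORD_MASK).count("1") + bin(left_half).count("1")


def netcon_can_start(jbcor: int, jbcor_plus_1: int,
                     pages_needed: int = 10) -> bool:
    """True if at least pages_needed free JSB pages are shown.

    The default of 10 is taken as decimal (an assumption).
    """
    return free_jsb_pages(jbcor, jbcor_plus_1) >= pages_needed


if __name__ == "__main__":
    # Made-up sample words, not taken from a real monitor.
    print(free_jsb_pages(0o000000001777, 0o000001000000))
```

      The right half of JBCOR+1 is masked off because only the left
      half of that word belongs to the bit map.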

      One bug in the PRARG JSYS which was causing excess JSB space to be
      used  has  been fixed in release 4, so this problem should be less
      likely to occur than in the past.

      Because NETCON uses a lot of JSB space, it should always be run in
      its own job.  Run it under PTYCON, not under SYSJOB.  If you still
      run out of JSB space, please send an SPR.

      Note that NETCON is good at exercising monitor problems because it
      is big and does a lot of things.  Therefore, it is quite possible
      that some problems which show up in NETCON will turn out to be
      monitor bugs, not NETCON bugs.



      3.0  INFORMATION DECNET COMMAND

      Sometimes this command will  not  show  all  the  nodes  that  are
      available  in  the  network.   The problem arises when the network
      topology changes and for some reason the DN20 is not able to  tell
      the  KL about it.  The way around is to wait a few minutes.  Every
      ten  minutes  NETCON  asks  the  DN20  for  the  node  information
      explicitly and corrects the node data base.



      4.0  SYSDPY HINTS

      SYSDPY's DN command shows all the logical links which exist in the
      system.   The DNA command shows only those links which are active.
      It leaves out all servers waiting for a remote program to talk  to
      them (CI wait).

      The HC command shows the columns you can  type  out  with  the  DN
      command.   The  LINK-ID  column is useful if a program is having a
      problem establishing a local link  (or  any  other  problems  with
      task-to-task)  and  you would like to look at the data base in the
      monitor.

      First use the C command to add the LINK-ID column to the DN or DNA
      display.  Find the link id of the program that is having problems.
      Write this down somewhere, exit from SYSDPY, and run MDDT as shown
      below.



      5.0  FINDING MONITOR LINK INFORMATION

      The link id from SYSDPY must be converted  into  a  pointer  to  a
      logical   link  block,  the  data  structure  in  which  the  link
      information is kept.   To  do  this,  execute  monitor  subroutine
      LLLKUP as shown here.

      $MDDT
      MDDT
      1/   DSKOPB#+6   26014     ;In AC1, put link id from SYSDPY
      T2/   BLKI 370,400000   -1 ;Put -1 in AC2
      CALL LLLKUP$X              ;Call the monitor routine
      <SKIP>

      1/   733275          ;AC1 now holds the logical link block address
      733275/   0          ;Look at the first few words of the block
      733276/   100000,,733205   
      733277/   GTCUB3#+2,,MLKCP+1   =40640,,26014   ^Z
      $

      The fields in the logical link block are defined in monitor module
      NSPPAR.MAC.   One of the most interesting words is word two, which
      contains the state of  the  link.   If  a  program  is  hung,  the
      contents  of this word can give you a pretty good idea of what the
      program is waiting for.  Other words point to queues  of  messages
      received  or  of  messages  sent  and  waiting to be acknowledged.
      Check the listing  or  your  monitor  fiche  of  NSPPAR  for  more
      details.
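
      When reading words of the logical link block it helps to split
      each one into its two halves.  The Python sketch below parses the
      "left,,right" octal notation that MDDT types out.  It is an
      illustration only: the function name parse_halfwords is
      hypothetical, and the meaning of each half is defined by
      NSPPAR.MAC, not by this code.

```python
def parse_halfwords(text: str) -> tuple[int, int]:
    """Parse DDT half-word notation 'left,,right' (octal) into two ints.

    A value typed without ',,' is treated as a full 36-bit word and
    split into its left and right 18-bit halves.
    """
    left, sep, right = text.strip().partition(",,")
    if sep:
        return int(left, 8), int(right, 8)
    word = int(left, 8)
    return (word >> 18) & 0o777777, word & 0o777777


if __name__ == "__main__":
    # Word two from the transcript above, in its numeric form.
    left, right = parse_halfwords("40640,,26014")
    print(f"left={left:o} right={right:o}")
```

      In the transcript, the right half of word two (26014) is the same
      value as the link id that was deposited in AC1.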



      6.0  OTHER MONITOR NETWORK DATA

      Location NODTBL in the monitor points to the table of network
      nodes.  After a header word, this table consists of entries of the
      form addr1,,addr2, where addr1 is the address of the node name
      string and addr2 is the address of a "nearer" node.  The nearer
      node is the DN20 for all nodes except the TOPS-20 and DN20 nodes
      themselves, for which addr2 is zero.  The nearer node acts as a
      sort of routing node (although the tables will be completely
      redone when routing is really implemented).
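
      The table layout just described can be decoded mechanically once
      its words have been dumped.  The Python sketch below assumes the
      table, header word included, has already been copied out of the
      monitor as a list of 36-bit integers.  The function name
      decode_node_table is hypothetical, and the contents of the header
      word are not interpreted because the memo does not describe them.

```python
def decode_node_table(words: list[int]) -> list[tuple[int, int]]:
    """Return (name_addr, nearer_addr) for each entry after the header.

    name_addr is the left half (addr1, address of the node name
    string); nearer_addr is the right half (addr2, zero for the
    TOPS-20 and DN20 nodes).
    """
    entries = []
    for word in words[1:]:  # words[0] is the header word, skipped
        name_addr = (word >> 18) & 0o777777
        nearer_addr = word & 0o777777
        entries.append((name_addr, nearer_addr))
    return entries


if __name__ == "__main__":
    # Made-up sample table, not taken from a real monitor.
    table = [0o3, 0o001000002000, 0o001010000000]
    for name_addr, nearer_addr in decode_node_table(table):
        print(f"{name_addr:o},,{nearer_addr:o}")
```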