---------------
|d|i|g|i|t|a|l| INTEROFFICE MEMORANDUM
---------------
TO: DECnet Users DATE: 24-Jun-82
FROM: TSG/NCSS
LOC/MAIL STOP: MR1-2/H22
SUBJ: Monitor Considerations for DECnet
The following is a list of problems which have been seen in-house
(rarely, in most cases), plus some hints for using SYSDPY (from
the tools tape) and MONRD (from the monitor SWSkit) to look at
DECnet problems.
1.0 NFT PROBLEMS
1. NFT sometimes hangs while transferring big files. This
problem is not reproducible and, in fact, patches to the MCB
code made during field test may have fixed this problem as a
side effect. A way around the problem is to simply control-C
out of NFT and try the transfer again.
When the problem has been seen, the job was usually hung at an
I/O JSYS (SINR or SOUTR). In the monitor, location FKSTAT
plus the fork number shows that the scheduler test on which
the fork is waiting is CHKRAW. The fork number can be
obtained from SYSDPY by using its Jn command while enabled,
where n is the number of the job running NFT.
Please SPR this problem if you see it.
2.0 NETCON PROBLEMS
On a busy system, restarting NETCON can fail. You may get an
error along the lines of "?Unable to create server topology
link". (This is not the exact error message.) There are two
possible causes of this problem.
The first cause is that there is not enough swappable free space
in the monitor for network use. To check this, use the SYSDPY RES
command. If this is the cause, the swappable DECnet core will be
100% used. (Typical in-house figures range from 60 to 80 percent.)
To double-check this figure, get into the running monitor with
FILDDT. The value of MAXBLK is the total amount of memory
available to DECnet. The location BLKASG shows the amount
currently in use. The only immediate solution is to cut down on
the number of logical links in use. If the problem happens often,
the monitor can be reassembled with a larger value specified for
MAXBLK. This variable is defined in STG.
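The double-check above is simple arithmetic on the two values read
with FILDDT. A minimal sketch in Python follows. It assumes that
both values are typed out in octal and are in the same units; the
function name and the sample readings are hypothetical and are not
taken from a real monitor.

```python
def decnet_percent_used(blkasg_octal: str, maxblk_octal: str) -> float:
    """Return the percentage of DECnet free space in use.

    Both arguments are octal strings as copied from a DDT typeout
    (an assumption; adjust the radix if your typeout differs).
    """
    in_use = int(blkasg_octal, 8)   # contents of BLKASG
    total = int(maxblk_octal, 8)    # value of MAXBLK
    if total <= 0:
        raise ValueError("MAXBLK must be positive")
    return 100.0 * in_use / total


# Hypothetical readings, for illustration only.
print(decnet_percent_used("14000", "20000"))
```

A result at or near 100 would agree with the SYSDPY RES display and
point to the first cause.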
The second cause is that the JSB space for the job running NETCON
is full. To check to see if this is your problem, use the SYSDPY
Jn command to see what fork NETCON is running in. Then use MONRD
from the monitor SWSkit and use the commands
MONRD>FORK n
MONRD>TYP JBCOR JBCOR+1
JBCOR and the left half of JBCOR+1 together form a bit map of free
pages in the JSB. If few or no bits are set, this is your problem.
NETCON needs 10 pages just to start up.
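Counting the bits by eye in an octal typeout is error-prone, so the
check can be sketched in Python as below. This assumes that a set
bit means a free page (as the text above implies), that words are
36 bits wide, and that the figure of 10 pages is decimal. The
function name and the sample words are hypothetical.

```python
WORD_MASK = (1 << 36) - 1   # a PDP-10 word is 36 bits
NETCON_STARTUP_PAGES = 10   # from the memo; decimal radix is assumed


def free_jsb_pages(jbcor: int, jbcor_plus_1: int) -> int:
    """Count set bits in JBCOR and in the left half of JBCOR+1."""
    first = jbcor & WORD_MASK
    left_half = (jbcor_plus_1 & WORD_MASK) >> 18  # right half is not part of the map
    return bin(first).count("1") + bin(left_half).count("1")


# Hypothetical contents as typed out by MONRD, entered as octal literals.
pages = free_jsb_pages(0o000000000007, 0o600000777777)
print(pages, "free pages;",
      "enough" if pages >= NETCON_STARTUP_PAGES else "too few", "for NETCON")
```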
One bug in the PRARG JSYS which was causing excess JSB space to be
used has been fixed in release 4, so this problem should be less
likely to occur than in the past.
Because NETCON uses a lot of JSB space, it should always be run in
its own job. Run it under PTYCON, not under SYSJOB. If you still
run out of JSB space, please send an SPR.
Note that NETCON is good at exercising monitor problems because it
is big and does a lot of things. Therefore, it is quite possible
that some NETCON bugs which pop up will be due to monitor bugs,
not NETCON bugs.
3.0 INFORMATION DECNET COMMAND
Sometimes this command will not show all the nodes that are
available in the network. The problem arises when the network
topology changes and for some reason the DN20 is not able to tell
the KL about it. The way around is to wait a few minutes. Every
ten minutes NETCON asks the DN20 for the node information
explicitly and corrects the node data base.
4.0 SYSDPY HINTS
SYSDPY's DN command shows all the logical links which exist in the
system. The DNA command shows only those links which are active.
It leaves out all servers waiting for a remote program to talk to
them (CI wait).
The HC command shows the columns you can type out with the DN
command. The LINK-ID column is useful if a program is having a
problem establishing a local link (or any other problems with
task-to-task) and you would like to look at the data base in the
monitor.
First use the C command to add the LINK-ID column to the DN or DNA
display. Find the link id of the program that is having problems.
Write this down somewhere, exit from SYSDPY, and run MDDT as shown
below.
5.0 FINDING MONITOR LINK INFORMATION
The link id from SYSDPY must be converted into a pointer to a
logical link block, the data structure in which the link
information is kept. To do this, execute monitor subroutine
LLLKUP as shown here.
$MDDT
MDDT
1/ DSKOPB#+6 26014 ;In AC1, put link id from SYSDPY
T2/ BLKI 370,400000 -1 ;Put -1 in AC2
CALL LLLKUP$X ;Call the monitor routine
<SKIP>
1/ 733275 ;AC1 now holds the logical link block address
733275/ 0 ;Look at the first few words of the block
733276/ 100000,,733205
733277/ GTCUB3#+2,,MLKCP+1 =40640,,26014 ^Z
$
The fields in the logical link block are defined in monitor module
NSPPAR.MAC. One of the most interesting words is word two, which
contains the state of the link. If a program is hung, the
contents of this word can give you a pretty good idea of what the
program is waiting for. Other words point to queues of messages
received or of messages sent and waiting to be acknowledged.
Check the listing or your monitor fiche of NSPPAR for more
details.
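When reading the block, it helps to split each 36-bit word into
its 18-bit halves, as DDT does in the left,,right typeout above.
A minimal Python sketch follows; the helper names are hypothetical.
In the sample session the right half of the word at 733277 happens
to equal the link id 26014 placed in AC1, but which field that is
must be confirmed in NSPPAR.MAC and is not assumed here.

```python
from typing import Tuple

HALF_MASK = 0o777777  # 18 bits


def halves(word: int) -> Tuple[int, int]:
    """Split a 36-bit word into (left half, right half)."""
    word &= (1 << 36) - 1
    return word >> 18, word & HALF_MASK


def ddt_format(word: int) -> str:
    """Format a word in the left,,right octal style used by DDT."""
    left, right = halves(word)
    return f"{left:o},,{right:o}"


# The word shown at 733277 in the session above, rebuilt from its halves.
word = (0o40640 << 18) | 0o26014
print(ddt_format(word))
print("right half matches link id:", halves(word)[1] == 0o26014)
```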
6.0 OTHER MONITOR NETWORK DATA
Location NODTBL in the monitor points to the table of network
nodes. After a header word, this table consists of entries of the
form addr1,,addr2. Addr1 is the address of the node name string.
Addr2 is the address of a "nearer" node. The nearer node is the
DN20 for all nodes except the TOPS-20 and DN20 nodes themselves,
for which addr2 is zero. The nearer node is in effect a routing
node (although the tables will be completely redone when routing
is really implemented).
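The entry format can be illustrated with a short Python sketch that
decodes one addr1,,addr2 word copied from a DDT typeout. The
function name and the sample entries are hypothetical, and nothing
is assumed about the header word or about what addr2 points to
beyond the description above.

```python
HALF_MASK = 0o777777  # 18 bits


def describe_entry(entry: int) -> str:
    """Describe one NODTBL entry of the form addr1,,addr2."""
    name_addr = (entry >> 18) & HALF_MASK  # addr1: address of the node name string
    nearer = entry & HALF_MASK             # addr2: address of the nearer node, or 0
    if nearer == 0:
        return f"name at {name_addr:o}: no nearer node (TOPS-20 or DN20)"
    return f"name at {name_addr:o}: reached through node at {nearer:o}"


# Hypothetical entries, for illustration only.
for entry in ((0o123456 << 18) | 0, (0o123470 << 18) | 0o123400):
    print(describe_entry(entry))
```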