PDP-10 Archive: swskit-v21/documentation/mcb-dump.doc from BB-H348C-RM

Trailing-Edge - PDP-10 Archives - BB-H348C-RM_1982 - swskit-v21/documentation/mcb-dump.doc

		How to Read an Unformatted DN20 Dump
		------------------------------------

  The subject of this document is the dump file produced by the DUMP NODE
command of OPR.  By far the best way to read this dump is to format it with
the MCBDA program.  However, if that is not possible, or if the information
MCBDA provides is inadequate, it may be worthwhile to look at the unformatted
dump.  In addition, much of the information given here on the data base formats
and the structure of the system is available nowhere else, and is therefore of
general use in understanding the DN20 software.

	MCBDA can be used to print any part of a dump in octal, RAD-50
and ASCII, in addition to all of the formatted reports it normally
generates. The /DUMP switch is used to do this. MCBDA can examine an
unrun system image, or a system dump file.

	DNDDT can be used to print any part of a running system, or
a system dump, in any one of a number of formats. DNDDT cannot
examine an unrun system image; it does not understand how to
bypass the 2000 bytes of task header. DNDDT, unlike MCBDA, can
write into a running system. To use DNDDT to examine or write into
a running system, give the DN20's node name when DNDDT is run. To
examine a dump, give a carriage return when DNDDT asks for a node
name, and then use DNDDT as you would FEDDT. See the DNDDT document.
Note that DNDDT cannot read the STB or MAP files available with
the MCB.


    The rest of this document will discuss how to read the dump.

Structure of the MCB
--------------------

	The MCB consists of:

1. RSX11S Operating System.

2. Communications Executive (Comm/Exec)
	RSX11S high-priority fork process which acts as a
	scheduler for the Comm/Exec processes.

3. Comm/Exec Processes.

	The analysis of an MCB dump requires a basic knowledge of
RSX11S architecture, Comm/Exec architecture, and DECnet. This document
assumes the reader is already familiar with RSX11S.
	The following sections describe data structures used by
NSP and the Comm/Exec.

Buffer Management
-----------------

	The Communications Control Block (CCB) is a data structure
containing buffer pointers, status information, and function codes.
CCB's, pointing to buffers, are passed from process to process to
effect the movement of data through the DNA defined layers of
the DECnet architecture.

	The CCB contains buffer pointers, counts, and mapping
information. A number of different types of buffers can be used.
Receive Data Buffers (RDB's) are used by device drivers to hold
received messages. Large Data Buffers (LDB's) are used for the
construction of messages to be transmitted. Small Data Buffers (SDB's)
are used in the same way as LDB's, but for messages short enough that
an LDB would waste buffer space. If a message requires more than
one data buffer, the CCB's that own the individual buffers are chained
together. Note that LDB's and RDB's are always the same size, and that
an LDB can be considered an RDB used for transmission.

Data Structure Definitions
--------------------------

	Almost all definitions of symbols and offsets are in macros
in NETLIB or CETLIB. All other modules invoke these macros in order
to use the offsets and symbols. Please refer to NETLIB whenever
you have a question about data structures which don't seem to be
defined in the listings or fiche.

Format of the Panic Dump Area
-----------------------------

    Since the DN20 dump does not include the registers or any of the rest of
the IO page, a panic dump program has been written to capture some of this
information in a core area.  This dump program, NETPAN, is invoked whenever
an error interrupt occurs.  It saves the information and HALTs, thus com-
pleting the DN20 crash.
    The NETPAN crash information begins at address $NPCOD (16774) in the dump,
and has the following format:

Address (octal)	Description
---------------	-----------

16774		Error code; describes the type of interrupt
			0 - no crash detected (this can happen if a live system
			    is dumped).  All further crash dump locations will
			    be meaningless if this code is present.
			1 - CPU error (including odd address trap and nonexist-
			    ent memory) (vector 4)
			2 - Illegal instruction (undefined opcode, unimplemented
			    instruction, or privileged instruction in user mode)
			    (vector 10)
			3 - BPT instruction executed (vector 14)
			4 - IOT instruction executed (vector 20)
			5 - Powerfail (this can happen if a live system is
			    dumped) (vector 24)
			8 - Memory protection error (unmapped page or address
			    past page size accessed) (vector 250)
			9 - Memory parity error (vector 114)

16776		Register 0
17000		Register 1
17002		Register 2
17004		Register 3
17006		Register 4
17010		Register 5
17012		PC before interrupt (usually points past offending instruction)
17014		PS before interrupt (whether the system was in user or system
			state can be determined from this)
17016		Kernel SP before interrupt
17020		User SP before interrupt
		Kernel PAR's
		User PAR's
		SR0,SR1,SR2,SR3
		Parity Registers
		Kernel PDR's
		User PDR's
		Alternate register set
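
    If you would rather decode the panic area with a script than by eye, the
following Python sketch shows one way.  It assumes that the PDP-11 memory has
already been extracted from the dump file into a bytes object in which offset
N holds physical address N, with each word stored low byte first; that
extraction step is not shown and is not something the dump tools are known to
provide.  The function names are hypothetical.  Only the fixed words through
the user SP are decoded, since the remaining areas are listed above without
addresses.

```python
import struct

# Error codes stored at $NPCOD, as listed in the table above.
PANIC_CODES = {
    0: "no crash detected",
    1: "CPU error (vector 4)",
    2: "illegal instruction (vector 10)",
    3: "BPT instruction (vector 14)",
    4: "IOT instruction (vector 20)",
    5: "powerfail (vector 24)",
    8: "memory protection error (vector 250)",
    9: "memory parity error (vector 114)",
}

NPCOD = 0o16774  # start of the NETPAN crash area ($NPCOD)

def word(image: bytes, addr: int) -> int:
    """Return the little-endian PDP-11 word at byte address addr."""
    return struct.unpack_from("<H", image, addr)[0]

def read_panic_area(image: bytes) -> dict:
    """Decode the fixed part of the NETPAN area (through the user SP)."""
    code = word(image, NPCOD)
    info = {"code": code, "meaning": PANIC_CODES.get(code, "unknown code")}
    # R0..R5 occupy the six words that follow the error code.
    info["regs"] = [word(image, NPCOD + 2 + 2 * i) for i in range(6)]
    info["pc"] = word(image, 0o17012)         # PC before interrupt
    info["ps"] = word(image, 0o17014)         # PS before interrupt
    info["kernel_sp"] = word(image, 0o17016)
    info["user_sp"] = word(image, 0o17020)
    return info
```

Check the code first: if it is 0, the remaining words are meaningless.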

Finding Your Way Around
-----------------------

    The most important address in the dump, which will allow you to locate
just about everything else, is the beginning of module CETAB.  This is the
symbol $PDVTA, located in the Comm/Exec (CEX).  The region at the top of
CETAB contains pointers to most of the rest of the system tables.  It is
documented in the CETAB assembly listing, and is also described below.
You can find the address of $PDVTA by reading CEX.MAP, or by using the
DUMPS utility with the CEX.STB file.

Symbol	Offset in CETAB		Description
------	---------------		-----------

$PDVTA		0		Address of PDV vector table $PDVTB
$SLTTA		2		Address of first System Line Table entry
$LLCTA		4		Address of LLC reverse mapping table
$PDVNM		6		Number of entries in $PDVTB
$SLTNM		8.		Number of SLT entries
$CCBNM	       10.		Number of CCBs in system
$CCBSZ	       12.		  and the size of each in bytes (28.)
$RDBNM	       14.		Number of RDBs in system
$RDBSZ	       16.		  and the size of each in bytes
$SDBNM	       18.		Number of SDBs in system
$SDBSZ	       20.		  and the size of each in bytes
$CCBCT	       22.		Number of free CCBs
$RDBCT	       24.		Number of free RDBs
$SDBCT	       26.		Number of free SDBs
$CCBAF	       28.		Number of CCB allocation failures
$RDBAF	       30.		Number of RDB allocation failures
$LDBAF	       32.		Number of LDB allocation failures (an LDB is
				  an RDB which can't be one of the last ones
				  left)
$SDBAF	       34.		Number of SDB allocation failures
$RDBTH	       36.		Number of RDBs which can't be LDBs
$XMTBF	       38.		XMITS/link to stack
$CMPDV	       40.		PDV index of current process (elements in
				  $PDVTB are indexed 0,2,...)
$CMFRK	       42.
	       44.		Address of fork process block (for MCB)
	       46.		Two word fork queue (CCBs waiting to be dis-
				  patched to their processes)
$CCBLH	       50.		One word pointer to first free CCB
$SDBLH	       52.		Two word pointer (page, virtual address)
				  to first free SDB (the first two words of the
				  SDB will point to the next one)
$RDBLH	       56.		One word pointer to CCB for first free RDB
				  (the CCBs chain together, and each points to
				  its RDB)
$NSPNM	       72.		Node number (established at NETGEN)
$NTNAM	       74.		Node name 
$HOST	       80.		Host name
$NODID	       86.		Length of local system ID 
	       88.		System ID 
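
    As an illustration of how the CETAB header can be used, the Python sketch
below reads the buffer counters and computes how many of each buffer type were
in use at the time of the dump.  It assumes a bytes image holding PDP-11
memory by physical address (low byte first), and that the caller has already
translated the address of $PDVTA into a physical address; the function name
is hypothetical.  Nonzero allocation failure counts, or an in-use figure close
to the total, may be worth a second look when chasing a buffer shortage.

```python
import struct

def word(image: bytes, addr: int) -> int:
    """Little-endian PDP-11 word at byte address addr."""
    return struct.unpack_from("<H", image, addr)[0]

# CETAB header fields used here, with their decimal byte offsets.
CETAB_FIELDS = {
    "$CCBNM": 10, "$RDBNM": 14, "$SDBNM": 18,                 # totals
    "$CCBCT": 22, "$RDBCT": 24, "$SDBCT": 26,                 # free counts
    "$CCBAF": 28, "$RDBAF": 30, "$LDBAF": 32, "$SDBAF": 34,   # failures
}

def buffer_summary(image: bytes, pdvta: int) -> dict:
    """Read the CETAB buffer counters; pdvta is a physical address."""
    info = {name: word(image, pdvta + off) for name, off in CETAB_FIELDS.items()}
    for kind in ("CCB", "RDB", "SDB"):
        # In use = total in system minus number currently free.
        info[kind + " in use"] = info["$" + kind + "NM"] - info["$" + kind + "CT"]
    return info
```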

Process Descriptor Vector (PDV)
-------------------------------

    Location $PDVTA points to the PDV Address Table, a list of addresses of
PDVs for MCB processes.  The offset of a process's address in the table (which
is of course an even number) is the process's PDV INDEX.
    The PDV for a process contains or points to all the data base information
for the process, including: the process's mapping bias, its three-character
ID, and its process data base address.  The format of a PDV can be found
in an assembler listing of CETAB. PDV's are located in the Comm/Exec (CEX).

Name	Offset		Description
----	------		-----------

Z.DSP	    0.		Dispatch table's mapping bias 
	    2.		Virtual address of process's dispatch table
Z.SCH	    4.		Process's priority
Z.NAM	    6.		Process's ID (3 characters, RAD-50)
Z.LLN	    8.		Number of logical lines for this process (LLC only)
			(byte)
Z.FLG	    9.		Flag byte (the flags are described in CETAB.LST)
Z.PCB	   10.		Pointer to process's PCB (an RSX-11S pointer)
Z.DAT	   12.		Address of process's data base

Of these values, Z.DSP and Z.DAT are the most useful.  Z.DAT allows you to
access the process's data base, which for NSP will contain the addresses of the
Logical Link Table, the Physical Link Table, the node data base, and the
statistics data base (see below).
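
    The PDV lookup and the RAD-50 process ID can be decoded as in the Python
sketch below.  It assumes a bytes image holding PDP-11 memory by physical
address (low byte first), and that the addresses stored in CETAB and the PDV
table can be used directly as physical addresses; on a real dump you may first
need to translate them through the Comm/Exec mapping.  The function names are
hypothetical.  RAD-50 packs three characters into one word as
c1*40*40 + c2*40 + c3, using the character set shown in the code.

```python
import struct

# RAD-50 character set; code 29 is unused and is often shown as "%".
RAD50 = " ABCDEFGHIJKLMNOPQRSTUVWXYZ$.%0123456789"

def word(image: bytes, addr: int) -> int:
    """Little-endian PDP-11 word at byte address addr."""
    return struct.unpack_from("<H", image, addr)[0]

def rad50_to_str(value: int) -> str:
    """Decode one RAD-50 word into three characters."""
    if value >= 40 ** 3:
        raise ValueError("not a valid RAD-50 word: %o" % value)
    c1, rest = divmod(value, 40 * 40)
    c2, c3 = divmod(rest, 40)
    return RAD50[c1] + RAD50[c2] + RAD50[c3]

def pdv_for_index(image: bytes, pdvta: int, index: int):
    """Return (PDV address, process ID, Z.DAT) for an even PDV index."""
    pdvtb = word(image, pdvta)        # $PDVTA+0 holds the address of $PDVTB
    pdv = word(image, pdvtb + index)  # the PDV index is already a byte offset
    name = rad50_to_str(word(image, pdv + 6)).rstrip()  # Z.NAM
    return pdv, name, word(image, pdv + 12)             # Z.DAT
```

For instance, the word 23176 (decimal) decodes to "NSP", so a PDV whose Z.NAM
holds that value belongs to the NSP process, and its Z.DAT is the NSP data
base address.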

System Line Table (SLT)
-----------------------

    Location $SLTTA points to the first entry in the System Line Table.  This
table allows you to access the line data bases for each line in the system.  It
also contains flags describing the state of the line.
    The format of an SLT entry is as follows:

Name	Offset		Description
----	------		-----------

L.FLG	    0.		Line state flags (see below)
L.DDM	    2.		PDV index for DDM process (byte)
L.DLC	    3.		PDV index for DLC process (byte)
			NOTE: for combined DLC/DDM lines (DMC,DTE) L.DDM and
			L.DLC will be the same
L.DDS	    4.		DDM (driver) line table address
L.DLM	    6.		DLC line table mapping bias
L.DLS	    8.		DLC line table virtual address
L.CTL	   10.		Controller number (as in xxx_0_x) (byte)
L.UNT	   11.		Line number (as in xxx_x_1) (byte)

The state bits in L.FLG are:

Name	Value (octal)	Description
----	-------------	-----------

LF.BWT	    000007	Buffer wait queue count
LF.TIM	    000010	Line needs timer service
LF.MTP	    000020	Multipoint line (not supported in 3A)
LF.DLO	    000040	Dial-out line (not supported in 3A)
LF.MDC	    000100	Line needs modem control
LF.ENA	    002000	Line is to be enabled at init time
LF.MFL	    004000	Line is marked for load at init time
LF.REA	    010000	Line may be reassigned to another LLC (not supported
			  for 3A)
LF.UNL	    020000	Line is marked for unload
LF.RDY	    040000	Line is ready (enabled and loaded)
LF.ACT	    100000	Line is active (ready and assigned to an LLC)
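
    When scanning many SLT entries it can help to decode L.FLG mechanically.
The Python sketch below (function name hypothetical) lists the flag bits from
the table above and returns the LF.BWT field separately, since that field is a
3-bit buffer wait count rather than a flag.  Bits not listed in the table are
ignored.

```python
# L.FLG bits from the SLT state bit table (values are octal).
LF_BITS = [
    ("LF.TIM", 0o000010), ("LF.MTP", 0o000020), ("LF.DLO", 0o000040),
    ("LF.MDC", 0o000100), ("LF.ENA", 0o002000), ("LF.MFL", 0o004000),
    ("LF.REA", 0o010000), ("LF.UNL", 0o020000), ("LF.RDY", 0o040000),
    ("LF.ACT", 0o100000),
]
LF_BWT = 0o000007  # mask for the buffer wait queue count

def decode_slt_flags(flags: int):
    """Return (list of flag names set, buffer wait queue count)."""
    names = [name for name, bit in LF_BITS if flags & bit]
    return names, flags & LF_BWT
```

For example, 142003 decodes to LF.ENA, LF.RDY and LF.ACT with a buffer wait
count of 3: an enabled, loaded line that is assigned to an LLC.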

NSP Data Base
-------------

    The data base for the NSP process is addressed from the PDV for NSP.  It
contains most of the run-time data used to control logical links.  The format
of the NSP data base is as follows:

Name	Offset		Description
----	------		-----------

N$ACQ	    0.		Two-word queue header of CCBs to be passed to NETACP
N$TCB	    4.		Address of TCB for NETACP
N$ICF	    6.		Address of physical link table entry (see below) for
			the line to the node which is acting as intercept node
			for this node (i.e., the other DN20 in a 2050-2050
			configuration), or 0 for none
N$LNI	    8.		Address of physical link table entry for the line which
			is serving as loopback line for this node (i.e., the
			line over which all intra-node logical link traffic is
			sent), or 0 for none
N$TMP	   10.		Temporary parameter area (4 words)
N$LVC	   18.		Number of LLT entries
	   20.		Address of first LLT entry
N$PLD	   22.		Number of PLT entries
	   24.		Address of first PLT entry
N$NOD	   26.		Number of node table entries
	   28.		Address of first node table entry
N$VER	   30.		  (1)
	   32.		Address of password table
N$STS	   34.		Number of statistics table entries
	   36.		Address of the first statistic table entry
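
    Most of the NSP tables are described by a count word followed by an
address word.  The Python sketch below collects those pairs from the NSP data
base, whose address is Z.DAT in the NSP process's PDV.  It assumes a bytes
image holding PDP-11 memory by physical address (low byte first), with
addresses already translated to physical ones; the function name is
hypothetical.  N$VER is left out because its first word is not described as a
count.

```python
import struct

def word(image: bytes, addr: int) -> int:
    """Little-endian PDP-11 word at byte address addr."""
    return struct.unpack_from("<H", image, addr)[0]

# Decimal offset of each count word; the address word follows it.
# For the LLT, the address word points to the LLT address list.
NSP_TABLES = {"LLT": 18, "PLT": 22, "node": 26, "statistics": 34}

def nsp_tables(image: bytes, nsp_db: int) -> dict:
    """Map table name -> (entry count, address word that follows it)."""
    return {name: (word(image, nsp_db + off), word(image, nsp_db + off + 2))
            for name, off in NSP_TABLES.items()}
```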

Logical Link Table
------------------

    Location N$LVC+2 in the NSP data base points to the logical link table (LLT)
address list.  This list is indexed by the LLA of the logical link (low-order
byte only).  Each list entry points to an LLT element.
    In the table description below, an "intercept link" is one for which the
task or process on each end of the link is located in another node, and the
DN20 acts only as a route-through mechanism.  A "non-intercept link" has one
or more ends located in the DN20.  An intercept link uses two LLT entries, one
for each physical line the link uses.  The whole intercept link therefore is
modelled in the DN20 by the two LLTs, each of which is attached to another node
by a physical line and to the other LLT by an imaginary line known as the
"network" (which for a real routing implementation could contain multiple
connected routing nodes).  The node on the other end of an LLT's physical line
is called the "adjacent node" for the LLT, while the other LLT's adjacent node
is the "remote node".
    The format of an LLT for an intercept NSP such as the DN20's is given below.
There are two sets of symbol names: I.xxx names are used only for intercept
links, and L.xxx names are used for all links.

Name	Offset		Description
----	------		-----------

L.STA	    0.		State of the link (byte):
					0 - invalid
				ST$CIS	1 - Connect Initiate Sent
				ST$CC	2 - Connect Confirm sent, waiting to
					    complete
				ST$CIR	3 - Connect Initiate Received
					4 - invalid
				ST$DAT	5 - Data state: link up and running
				ST$DIP	6 - Disconnect In Progress
L.LVL	    1.		Link level and type (byte):
				LF.RSU	100000	Resource recovery required
				LF.LCL	40000	Local Logical Link
				LF.INT	400	Intercept Link
				LF.DSP	40	Stop data flow
				LF.DST	20	Start Data Flow
				LF.FPN	10	Flow notification to local
						  process pending
				LF.NPN	1	NAK complete to local process
						  pending
L.LLA	    2.		Logical link address for this end of the link
L.TIPI	    4.		Transmits in progress on I/LS channel (byte)
L.TIPD	    5.		Transmits in progress on data channel (byte)
L.REM	    6.		Intercept link:  pointer to node data base entry for
			  remote node
			Non-intercept link: pointer to entry for adjacent node
L.RLA	    8.		Intercept link: LLA for other intercept LLT
			Non-intercept link: remote LLA
L.FLG	   10.		More link flags:
			  LF.INR 000100 - NSP has received an I/LS NAK
			  LF.NKR 000200 - Data NAK received by NSP
			  LF.NKS 000400 - Data NAK sent
			  LF.HF0 001000 - Flow closed on this link (transmit);
					  link is backpressured
			  LF.NTS 002000 - Data or I/LS received while in CIS
					  state; must send NAKs
			  LF.HSF 010000 - Segment flow controlled, other end
					  of link (transmit)
			  LF.HMF 020000 - Message flow controlled, other end
					  of link (transmit)
			  LF.MSF 040000 - Segment flow controlled, this end
					  of link (receive)
			  LF.MMF 100000 - Message flow controlled, this end
					  of link (receive)
L.NXN	   12.		Next data segment number to be sent
L.NIN	   14.		Next I/LS segment number to be sent
L.RNO	   16.		Next data segment number to be received
L.LNO	   18.		Next I/LS segment number to be received
L.LDA	   20.		Last data segment acknowledged by remote end
		OR
L.USTA	   20.		Disconnect substate for the user (byte):
			  ST$UNR 11 - User notification of disconnect required
			  ST$DIR 12 - Disconnect received from network
			  ST$UDI 13 - Disconnect received from user
			  ST$DIS 14 - Disconnect sent (DC response required)
			  ST$DID 15 - Disconnect done (No response required)
L.NSTA	   21.		Disconnect substate for the network (byte):
L.LIA	   22.		Last I/LS segment acknowledged by remote end
L.CIQ	   22.		Connect initiate and connect confirm pending queue
L.USA	   24.		Non-intercept link only: last data segment number
			acknowledged by the user
L.LSA	   26.		Non-intercept link only: last I/LS segment number
			acknowledged by local NSP
L.NDA	   28.		Number of data segments to acknowledge
L.UDQ	   28.		Disconnect initiate pending queue
L.NLA	   30.		Number of I/LS segments to acknowledge
L.DCR	   30.		Disconnect reason code
		For intercept links:
I.ILA	   32.		LLA for adjacent end of link
I.IREM	   34.		Pointer to node data base entry for adjacent node
		For non-intercept links:
L.ULA	   32.		ULA (User Link Address) for user task or 0 for MCB
			processes (byte)
L.PDV	   33.		PDV index for user MCB process or 0 for user task
L.TC	   34.		Current flow control count for outbound data messages
			(byte)
L.TIC	   35.		Current flow control count for outbound I/LS messages
			(byte)
L.LSF	   36.		Flow control request count status
L.XQ1	   38.		Queue for storing buffers waiting for transmit
			  (because of stopped flow control or a NAK)
L.XQ2	   40.		Queue for storing data buffers waiting to be ACKed by
			  the other end
L.XQ3	   42.		Queue for storing I/LS buffers waiting to be ACKed by
			  the other end
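
    A quick summary of one LLT entry can be produced as in the Python sketch
below (function name hypothetical).  It decodes L.STA and the fields that
identify the two ends of the link, assuming a bytes image holding PDP-11
memory by physical address (low byte first) and an LLT address that is already
physical.  It does not try to decode L.LVL or the fields that share offsets,
since their interpretation depends on the link type and state.

```python
import struct

# L.STA values; 0 and 4 are invalid.
LINK_STATES = {1: "ST$CIS", 2: "ST$CC", 3: "ST$CIR", 5: "ST$DAT", 6: "ST$DIP"}

def word(image: bytes, addr: int) -> int:
    """Little-endian PDP-11 word at byte address addr."""
    return struct.unpack_from("<H", image, addr)[0]

def describe_llt(image: bytes, llt: int) -> dict:
    state = image[llt]                     # L.STA (byte at offset 0)
    return {
        "state": LINK_STATES.get(state, "invalid (%o)" % state),
        "lla": word(image, llt + 2),       # L.LLA: this end's LLA
        "rem": word(image, llt + 6),       # L.REM: node data base entry
        "rla": word(image, llt + 8),       # L.RLA: remote LLA, or other intercept LLT's LLA
    }
```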

Physical Link Table
-------------------

    Location N$PLD+2 in the NSP data base points to the first element of the
physical link table.  There is one entry in this table for each active line
which is controlled by NSP.  It is indexed by the NSP logical line number for
the line.  The contents of a PLT entry are as follows:

Name	Offset		Description
----	------		-----------

P$LST	    0.		Line state:
			  PS$OFF 0 - idle
			  PS$STR 1 - DDCMP being started
			  PS$WT  2 - waiting for a Node Init
			  PS$NTI 3 - Node Init received
			  PS$VER 4 - waiting for Node Verify
			  PS$UP  5 - line up and Node Initialized
	    1.		Line flags:
			  PF$OFF 0 Off state desired
			  PF$ON  1 On state desired
			  PF$STA 3 State flag mask
			  PF$ENB 200 Link is enabled
			  PF$EIP 100 control enable function in progress
			  PF$RVR  40 verification requested when node init sent
P$LCD	    2.		Recovery flags
			  RF.CTL 3 COUNT OF OUTSTANDING CONTROL REQUESTS
			  RF.CLN 0 LOGICAL LINK CLEAN UP REQUIRED
			  RF.WTM 30 MASK FOR ALLOCATION FAILURE INTO WAIT STATE
			  RF.WTS 10 STOP FUNCTION FLAG
			  RF.WTD 20 DISABLE FUNCTION FLAG
			  RF.TIM 377*400 MASK FOR TIMEOUT VALUE
			  RF.TM0 1*400 INITIAL TIMEOUT VALUE
P$TIM       3.		Recovery timer
P$CHN	    4.		NSP logical line number for line
P$CNT       5.		Number of messages queued
P$PFQ	    6.		Address of a CCB for a pending control function on line
P$NOD	    8.		Address of the node table entry for the node on the
			  other end of the line
P$FRQ	   10.		Functions requested
P$FSP	   12.		Functions requested
P$LEN	   14.		Length in bytes of a physical link table entry
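
    Because the PLT is indexed by NSP logical line number, the entry for a
given line can be located by arithmetic.  The Python sketch below assumes that
the entries are contiguous and that each is P$LEN (14.) bytes long, which
matches the last field (P$FSP) ending at offset 13.  It also assumes a bytes
image holding PDP-11 memory by physical address (low byte first) and an
already physical PLT address; the function name is hypothetical.

```python
import struct

P_LEN = 14  # P$LEN: assumed size in bytes of one PLT entry

PLT_STATES = {0: "PS$OFF", 1: "PS$STR", 2: "PS$WT",
              3: "PS$NTI", 4: "PS$VER", 5: "PS$UP"}

def word(image: bytes, addr: int) -> int:
    """Little-endian PDP-11 word at byte address addr."""
    return struct.unpack_from("<H", image, addr)[0]

def plt_entry(image: bytes, first_plt: int, line: int) -> dict:
    """Decode the PLT entry for NSP logical line number `line`."""
    entry = first_plt + line * P_LEN
    state = image[entry]                              # P$LST
    return {
        "addr": entry,
        "state": PLT_STATES.get(state, "unknown (%o)" % state),
        "enabled": bool(image[entry + 1] & 0o200),    # PF$ENB
        "channel": image[entry + 4],                  # P$CHN
        "node": word(image, entry + 8),               # P$NOD
    }
```

A useful cross-check is that P$CHN in the entry found this way should equal the
line number used to find it.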
 
 

Node Table
----------

    Location N$NOD+2 in the NSP data base points to the first entry in the node
table.  Each entry describes a node known to NSP.  The format of a node table
entry is as follows:

Name	Offset		Description
----	------		-----------

D$FLG	    0.		Node flags:
			  004000 - node is being removed and isn't really in
				   the table
			  010000 - node is a remote node, i.e., is not local
				   or adjacent
			  020000 - node is an adjacent node, i.e., is connected
				   to the local node by a direct physical link
			  040000 - node is the local node
			  100000 - temporary node entry; used to send a DC to
				   a remote node newly connecting, when there is
				   no room in the node data base for the node's
				   entry
D$CHN	    2.		Pointer to PLT entry for line over which this node may
			be accessed
D$USE	    4.		Number of logical links to this node (byte)
D$LNG	    5.		Length in bytes of the node name (byte)
D$NAM	    6.		Node name in ASCII (1 to 6 bytes; right-padded with
			garbage)
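
    The Python sketch below (function name hypothetical) decodes one node table
entry, using D$LNG to trim the garbage padding from D$NAM.  It assumes a bytes
image holding PDP-11 memory by physical address (low byte first) and a
physical entry address.  The size of a node table entry is not given above, so
the sketch decodes a single entry rather than walking the table.

```python
import struct

# D$FLG bits (octal) and their meanings.
NODE_FLAGS = [(0o004000, "being removed"), (0o010000, "remote"),
              (0o020000, "adjacent"), (0o040000, "local"),
              (0o100000, "temporary")]

def word(image: bytes, addr: int) -> int:
    """Little-endian PDP-11 word at byte address addr."""
    return struct.unpack_from("<H", image, addr)[0]

def describe_node(image: bytes, entry: int) -> dict:
    flags = word(image, entry)                         # D$FLG
    length = min(image[entry + 5], 6)                  # D$LNG, at most 6
    raw = image[entry + 6: entry + 6 + length]         # D$NAM
    return {
        "name": raw.decode("ascii", "replace"),
        "flags": [text for bit, text in NODE_FLAGS if flags & bit],
        "plt": word(image, entry + 2),                 # D$CHN
        "links": image[entry + 4],                     # D$USE
    }
```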

Verification Data Base
----------------------

    Location N$VER+2 in the NSP data base points to the verification data base,
which contains the transmit and receive passwords.  The transmit password is
used to build a Node Verification message to any node which requests one.  The
receive password is checked against the Node Verification message the DN20
requests from another node.  (The released 3A DN20 system has no receive pass-
word, and consequently does not request verification.)  The format of the ver-
ification data base is as follows:

Name	Offset		Description
----	------		-----------

V$FLG	    0.		Flags:
			  040000 - a transmit password exists
			  100000 - a receive password exists
V$RCV	    2.		8-byte receive password or binary zeroes if none
V$XMT	   10.		8-byte transmit password or binary zeroes if none

Statistics Data Base
--------------------

    Location N$STS+2 in the NSP data base points to the statistics data base,
which contains node counters for the local node.  It is this information which
may be requested by the NCP SHOW COUNTS LOCAL command (not implemented in OPR
for 4, but available from the NCU program on the SWS kit).  The format of the
statistics data base is as follows:

Name	Offset		Description
----	------		-----------

S$SEC	    0.		Seconds since the data base was zeroed
S$UMS	    2.		Number of user messages sent
S$UMR	    4.		Number of user messages received
S$EMR	    6.		Number of extraneous messages received
S$NKS	    8.		Number of NAKs sent
S$FMT	   10.		Number of format errors
S$RES	   12.		Number of resource allocation failures
S$SNI	   14.		Number of successful node initializations
S$UNI	   16.		Number of unsuccessful node initializations
S$LNK	   18.		Current number of active logical links
S$MLK	   20.		Maximum number of active logical links
	   22.		10 bytes of unused statistics for expansion
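
    The counters are consecutive words, so they can be read in one pass.  The
Python sketch below (function name hypothetical) assumes a bytes image holding
PDP-11 memory by physical address (low byte first) and the physical address
found at N$STS+2.  The ten bytes reserved for expansion are skipped.

```python
import struct

# Counter names in offset order: S$SEC at 0., S$UMS at 2., ... S$MLK at 20.
STAT_NAMES = ["S$SEC", "S$UMS", "S$UMR", "S$EMR", "S$NKS", "S$FMT",
              "S$RES", "S$SNI", "S$UNI", "S$LNK", "S$MLK"]

def read_statistics(image: bytes, sts: int) -> dict:
    """Unpack the eleven 16-bit counters starting at address sts."""
    values = struct.unpack_from("<%dH" % len(STAT_NAMES), image, sts)
    return dict(zip(STAT_NAMES, values))
```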

Debugging with an Unformatted Dump
----------------------------------

    This is not meant to be the "only way" to use an unformatted dump, but
rather "a" way which has been used with varying degrees of success.
    The first thing to find out is where the crash occurred.  The relevant
data may be found in the crash dump area.  If the PS has the two high-order
bits off, the crash occurred in the executive (which for this purpose includes
all of MCB).  In this case, if the PC is less than 120000, it is equal to the
physical address of the crash; the module this address is in will be found in
either EXEC.MAP or CEX.MAP.  If the PC is in the 120000-137777 range, it is
in an MCB process; PAR 5 may be examined to determine which process, and the
errant module will be found in that process's map.
    If the crash PS has the two high-order bits on, the crash occurred in a
user program (NICE or the 72-hour Test).  Location $TKTCB in the
executive points to the TCB for the active task.
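
    The rules above for locating the crash can be written down as a small
decision function.  In the Python sketch below (function name and return
strings hypothetical), the PS and PC are the values saved in the panic dump
area.  PS mode values other than "both bits off" and "both bits on" are not
covered by these rules, and neither is a kernel PC of 140000 or higher, so the
sketch reports those cases separately instead of guessing.

```python
def classify_crash(ps: int, pc: int) -> str:
    """Apply the crash-location rules to the saved PS and PC."""
    mode = (ps >> 14) & 0o3    # the two high-order PS bits
    if mode == 0o3:
        return "user"          # both on: user task; see $TKTCB
    if mode != 0:
        return "unexpected mode"
    if pc < 0o120000:
        return "exec"          # PC is the physical address; see EXEC.MAP/CEX.MAP
    if pc <= 0o137777:
        return "mcb process"   # mapped through PAR 5; see that process's map
    return "unclassified"
```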
    If you are in either NETACP or the NSP process, register 5 should point
to the current CCB being acted on.  Register 0 may be pointing to the current
LLT entry; register 4 will probably be pointing to the place in the NSP message
which is being examined.  The first buffer descriptor in the CCB will be point-
ing to MSGFLGS in the message; the second descriptor will point to the beginning
of the message (either MSGFLGS or the routing header if there is one).
    If you are in any other MCB process, register 4 should point to the current
CCB.

			[END]