How to Read an Unformatted DN20 Dump
------------------------------------
The subject of this document is the dump file produced by the DUMP NODE
command of OPR. By far the best way to read this dump is to format it with the
MCBDA program. (MCBDA can also be used to examine an unrun system image.)
However, if formatting is not possible, or if the information MCBDA provides
is inadequate, it may be worthwhile to look at the unformatted dump.
In addition, much of the information given here on the data base formats and the
structure of the system is available nowhere else, and is therefore of general
use in understanding the DN20 software.
MCBDA can be used to print any part of a dump in octal, RAD-50,
and ASCII, in addition to all of the formatted reports it normally
generates. The /DUMP switch is used to do this. MCBDA can examine an
unrun system image, or a system dump file.
DNDDT can be used to print any part of a running system, or
a system dump, in any one of a number of formats. DNDDT cannot
examine an unrun system image; it does not understand how to
bypass the 2000 bytes of task header. DNDDT, unlike MCBDA, can
write into a running system. To use DNDDT to examine or write into
a running system, give the DN20's node name when DNDDT is run. To
examine a dump, give a carriage return when DNDDT asks for node
name and then use DNDDT like FEDDT. See the DNDDT document.
Note that DNDDT cannot read the STB or MAP files available with
the MCB.
The rest of this document will discuss how to read the dump.
Structure of the MCB
--------------------
The MCB consists of:
1. RSX11S Operating System.
2. Communications Executive (Comm/Exec)
RSX11S high priority fork process which acts as a
scheduler for the Comm/Exec processes.
3. Comm/Exec Processes.
The analysis of an MCB dump requires a basic knowledge of
RSX11S architecture, Comm/Exec architecture, and DECnet. This document
assumes the reader is already familiar with RSX11S.
The following sections describe data structures used by
NSP and the Comm/Exec.
Buffer Management
-----------------
The Communications Control Block (CCB) is a data structure
containing buffer pointers, status information, and function codes.
CCB's, pointing to buffers, are passed from process to process to
effect the movement of data through the DNA defined layers of
the DECnet architecture.
The CCB contains buffer pointers, counts, and mapping
information. A number of different types of buffers can be used.
Receive Data Buffers (RDB's) are used by device drivers to accommodate
received messages. Large Data Buffers (LDB's) are used for the
construction of messages to be transmitted. Small Data Buffers (SDB's)
are used in much the same way as LDB's, in cases where the use of an
LDB would waste buffer space. If a message requires more than
one data buffer, the CCB's owning the data buffers are chained
together. Note that LDB's and RDB's are always the same size, and that LDB's
can be considered RDB's used for transmission.
Data Structure Definitions
--------------------------
Almost all definitions of symbols and offsets are in macros
in NETLIB or CETLIB. All other modules invoke these macros in order
to use the offsets and symbols. Please refer to NETLIB whenever
you have a question about data structures which don't seem to be
defined in the listings or fiche.
Format of the Panic Dump Area
-----------------------------
Since the DN20 dump does not include the registers or any of the rest of
the IO page, a panic dump program has been written to capture some of this
information in a core area. This dump program, NETPAN, is invoked whenever
an error interrupt occurs. It saves the information and HALTs, thus com-
pleting the DN20 crash.
The NETPAN crash information begins at address $NPCOD (16774) in the dump,
and has the following format:
Address   Description
-------   -----------
16774     Error code; describes the type of interrupt
             0 - no crash detected (this can happen if a live system
                 is dumped). All further crash dump locations will
                 be meaningless if this code is present.
             1 - CPU error (including odd address trap and nonexistent
                 memory) (vector 4)
             2 - Illegal instruction (undefined opcode, unimplemented
                 instruction, or privileged instruction in user mode)
                 (vector 10)
             3 - BPT instruction executed (vector 14)
             4 - IOT instruction executed (vector 20)
             5 - Powerfail (this can happen if a live system is
                 dumped) (vector 24)
             8 - Memory protection error (unmapped page or address
                 past page size accessed) (vector 250)
             9 - Memory parity error (vector 114)
16776     Register 0
17000     Register 1
17002     Register 2
17004     Register 3
17006     Register 4
17010     Register 5
17012     PC before interrupt (usually points past offending instruction)
17014     PS before interrupt (whether the system was in user or system
          state can be determined from this)
17016     Kernel SP before interrupt
17020     User SP before interrupt
          Kernel PAR's
          User PAR's
          SR0,SR1,SR2,SR3
          Parity Registers
          Kernel PDR's
          User PDR's
          Alternate register set
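
The layout above can be read mechanically. The following Python sketch is an
illustration only and is not part of the MCB tools. It assumes the dump has
already been converted to a flat image of DN20 memory, in which the byte at
index n is the byte at address n, with words stored low byte first as on any
PDP-11. The on-disk format of the dump file is not covered here, so that
conversion is left out. The names word and read_panic_area are hypothetical.

```python
import struct

NPCOD = 0o16774  # $NPCOD: address of the NETPAN error code

ERROR_CODES = {
    0: "no crash detected",
    1: "CPU error (vector 4)",
    2: "illegal instruction (vector 10)",
    3: "BPT instruction executed (vector 14)",
    4: "IOT instruction executed (vector 20)",
    5: "powerfail (vector 24)",
    8: "memory protection error (vector 250)",
    9: "memory parity error (vector 114)",
}

# One word each, in the order listed after the error code.
FIELDS = ["R0", "R1", "R2", "R3", "R4", "R5", "PC", "PS", "Kernel SP", "User SP"]


def word(mem, addr):
    """Fetch one 16-bit word at a byte address.

    ASSUMPTION: mem is a flat, little-endian byte image of DN20 memory.
    """
    return struct.unpack_from("<H", mem, addr)[0]


def read_panic_area(mem):
    """Return the NETPAN error code and, if a crash was recorded, the registers."""
    code = word(mem, NPCOD)
    info = {"code": code, "reason": ERROR_CODES.get(code, "unknown")}
    if code != 0:  # with code 0 the remaining locations are meaningless
        for i, name in enumerate(FIELDS):
            info[name] = word(mem, NPCOD + 2 + 2 * i)
    return info
```

Printing the values with oct() keeps them comparable with the listings and
maps, which are in octal.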
Finding Your Way Around
-----------------------
The most important address in the dump, which will allow you to locate
just about everything else, is the beginning of module CETAB. This is the
symbol $PDVTA, and it is located in the Comm/Exec (CEX). You can find the
address of $PDVTA by reading CEX.MAP or by using the utility DUMPS and the
CEX.STB file. The region at the top of CETAB contains pointers to most of
the rest of the system tables. It is documented in the CETAB assembly
listing, and is also described below:
Symbol Offset in CETAB Description
------ --------------- -----------
$PDVTA 0 Address of PDV vector table $PDVTB
$SLTTA 2 Address of first System Line Table entry
$LLCTA 4 Address of LLC reverse mapping table
$PDVNM 6 Number of entries in $PDVTB
$SLTNM 8. Number of SLT entries
$CCBNM 10. Number of CCBs in system
$CCBSZ 12. and the size of each in bytes (28.)
$RDBNM 14. Number of RDBs in system
$RDBSZ 16. and the size of each in bytes
$SDBNM 18. Number of SDBs in system
$SDBSZ 20. and the size of each in bytes
$CCBCT 22. Number of free CCBs
$RDBCT 24. Number of free RDBs
$SDBCT 26. Number of free SDBs
$CCBAF 28. Number of CCB allocation failures
$RDBAF 30. Number of RDB allocation failures
$LDBAF 32. Number of LDB allocation failures (an LDB is
an RDB which can't be one of the last ones
left)
$SDBAF 34. Number of SDB allocation failures
$RDBTH 36. Number of RDBs which can't be LDBs
$XMTBF 38. XMITS/link to stack
$CMPDV 40. PDV index of current process (elements in
$PDVTB are indexed 0,2,...)
$CMFRK 42.
44. Address of fork process block (for MCB)
46. Two word fork queue (CCBs waiting to be dis-
patched to their processes)
$CCBLH 50. One word pointer to first free CCB
$SDBLH 52. Two word pointer (page, virtual address)
to first free SDB (the first two words of the
SDB will point to the next one)
$RDBLH 56. One word pointer to CCB for first free RDB
(the CCBs chain together, and each points to
its RDB)
$NSPNM 72. Node number (established at NETGEN)
$NTNAM 74. Node name
$HOST 80. Host name
$NODID 86. Length of local system ID
88. System ID
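
As an illustration of how the CETAB header leads to the other tables, here is
a minimal Python sketch. It assumes a flat, little-endian byte image of DN20
memory (index n holds the byte at address n) and that the addresses stored in
CETAB can be used directly as indexes into that image. The address of $PDVTA
must be supplied by the reader from CEX.MAP. The function names are
hypothetical, and only the one-word counters at the top of the table are read.

```python
import struct

# Decimal offsets of selected one-word fields, relative to $PDVTA.
CETAB_WORDS = {
    "$PDVTA": 0, "$SLTTA": 2, "$LLCTA": 4, "$PDVNM": 6, "$SLTNM": 8,
    "$CCBNM": 10, "$CCBSZ": 12, "$RDBNM": 14, "$RDBSZ": 16,
    "$SDBNM": 18, "$SDBSZ": 20, "$CCBCT": 22, "$RDBCT": 24, "$SDBCT": 26,
    "$CCBAF": 28, "$RDBAF": 30, "$LDBAF": 32, "$SDBAF": 34,
}


def word(mem, addr):
    """Fetch one 16-bit word (ASSUMPTION: flat little-endian memory image)."""
    return struct.unpack_from("<H", mem, addr)[0]


def read_cetab(mem, pdvta):
    """Return the selected CETAB header words keyed by symbol name."""
    return {name: word(mem, pdvta + off) for name, off in CETAB_WORDS.items()}


def pdv_addresses(mem, pdvta):
    """Return {PDV index: PDV address} for every entry in $PDVTB."""
    table = word(mem, pdvta)        # $PDVTA holds the address of $PDVTB
    count = word(mem, pdvta + 6)    # $PDVNM
    # PDV indexes are the byte offsets 0, 2, 4, ... into $PDVTB.
    return {2 * i: word(mem, table + 2 * i) for i in range(count)}
```

Comparing $CCBCT, $RDBCT and $SDBCT with the corresponding totals, and looking
at the allocation failure counts, gives a quick picture of buffer exhaustion.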
Process Descriptor Vector (PDV)
-------------------------------
Location $PDVTA points to the PDV Address Table, a list of addresses of
PDVs for MCB processes. The offset of a process's address in the table (which
is of course an even number) is the process's PDV INDEX.
The PDV for a process contains or points to all the data base information
for the process, including: the process's mapping bias, its three-character
ID, and its process data base address. The format of a PDV can be found
in an assembler listing of CETAB. PDV's are located in the Comm/Exec (CEX).
Name Offset Description
---- ------ -----------
Z.DSP 0. Dispatch table's mapping bias
2. Virtual address of process's dispatch table
Z.SCH 4. Process's priority
Z.NAM 6. Process's ID (3 characters, RAD-50)
Z.LLN 8. Number of logical lines for this process (LLC only)
Z.FLG 9. Flag word (the flags are described in CETAB.LST)
Z.PCB 10. Pointer to process's PCB (an RSX-11S pointer)
Z.DAT 12. Address of process's data base
Of these values, Z.DSP and Z.DAT are the most valuable. Z.DAT allows you to
access the process's data base, which for NSP will contain the addresses of the
Logical Link Table, the Physical Line Table, the node data base, and the sta-
tistics data base (see below).
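
Since Z.NAM is stored in RAD-50, it has to be unpacked before the process can
be recognized by name. The sketch below shows the standard PDP-11 RAD-50
unpacking (three characters per word, base 40 decimal) applied to a PDV. It
assumes a flat, little-endian byte image of DN20 memory; read_pdv and
rad50_decode are hypothetical names, and only some of the PDV fields are read.

```python
import struct

# PDP-11 RAD-50 character set. Code 29 (octal 35) is unassigned;
# "?" is used here only as a placeholder for it.
RAD50_CHARS = " ABCDEFGHIJKLMNOPQRSTUVWXYZ$.?0123456789"


def rad50_decode(value):
    """Decode one 16-bit RAD-50 word into its three characters."""
    if not 0 <= value < 40 ** 3:
        raise ValueError("not a valid RAD-50 word")
    c1, rest = divmod(value, 40 * 40)
    c2, c3 = divmod(rest, 40)
    return RAD50_CHARS[c1] + RAD50_CHARS[c2] + RAD50_CHARS[c3]


def read_pdv(mem, pdv):
    """Read selected PDV fields (ASSUMPTION: flat little-endian memory image)."""
    def w(off):
        return struct.unpack_from("<H", mem, pdv + off)[0]

    return {
        "Z.DSP": w(0),                  # dispatch table's mapping bias
        "dispatch": w(2),               # virtual address of dispatch table
        "Z.NAM": rad50_decode(w(6)),    # three-character process ID
        "Z.DAT": w(12),                 # address of process's data base
    }
```

Running read_pdv over every address in the PDV address table and looking for
the name NSP is one way to locate the NSP data base described below.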
System Line Table (SLT)
-----------------------
Location $SLTTA points to the first entry in the System Line Table. This
table allows you to access the line data bases for each line in the system. It
also contains flags describing the state of the line.
The format of an SLT entry is as follows:
Name      Offset   Description
----      ------   -----------
L.FLG      0.      Line state flags (see below)
L.DDM      2.      PDV index for DDM process (byte)
L.DLC      3.      PDV index for DLC process (byte)
                   NOTE: for combined DLC/DDM lines (DMC,DTE) L.DDM and
                   L.DLC will be the same
L.DDS      4.      DDM (driver) line table address
L.DLM      6.      DLC line table mapping bias
L.DLS      8.      DLC line table virtual address
L.CTL     10.      Controller number (as in xxx_0_x) (byte)
L.UNT     11.      Line number (as in xxx_x_1) (byte)
The state bits in L.FLG are:
Name Value (octal) Description
---- ------------- -----------
LF.BWT 000007 Buffer wait queue count
LF.TIM 000010 Line needs timer service
LF.MTP 000020 Multipoint line (not supported in 3A)
LF.DLO 000040 Dial-out line (not supported in 3A)
LF.MDC 000100 Line needs modem control
LF.ENA 002000 Line is to be enabled at init time
LF.MFL 004000 Line is marked for load at init time
LF.REA 010000 Line may be reassigned to another LLC (not supported
for 3A)
LF.UNL 020000 Line is marked for unload
LF.RDY 040000 Line is ready (enabled and loaded)
LF.ACT 100000 Line is active (ready and assigned to an LLC)
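
Decoding L.FLG by hand is error-prone, so a small helper can be useful. The
following Python sketch simply applies the masks from the table above to a
flag word that the reader has already extracted from the dump. The function
name decode_slt_flags is hypothetical.

```python
LF_BWT = 0o000007  # buffer wait queue count (a 3-bit field, not a flag)

# Single-bit flags from the table, in ascending bit order.
SLT_FLAGS = {
    "LF.TIM": 0o000010,
    "LF.MTP": 0o000020,
    "LF.DLO": 0o000040,
    "LF.MDC": 0o000100,
    "LF.ENA": 0o002000,
    "LF.MFL": 0o004000,
    "LF.REA": 0o010000,
    "LF.UNL": 0o020000,
    "LF.RDY": 0o040000,
    "LF.ACT": 0o100000,
}


def decode_slt_flags(flg):
    """Split an L.FLG word into its buffer wait count and the flags that are set."""
    return {
        "buffer_wait_count": flg & LF_BWT,
        "flags": [name for name, bit in SLT_FLAGS.items() if flg & bit],
    }
```

For example, decode_slt_flags(0o140002) reports a buffer wait count of 2 with
LF.RDY and LF.ACT set.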
NSP Data Base
-------------
The data base for the NSP process is addressed from the PDV for NSP. It
contains most of the run-time data used to control logical links. The format
of the NSP data base is as follows:
Name Offset Description
---- ------ -----------
N$ACQ 0. Two-word queue header of CCBs to be passed to NETACP
N$TCB 4. Address of TCB for NETACP
N$ICF 6. Address of physical link table entry (see below) for
the line to the node which is acting as intercept node
for this node (i.e., the other DN20 in a 2050-2050
configuration), or 0 for none
N$LNI 8. Address of physical link table entry for the line which
is serving as loopback line for this node (i.e., the
line over which all intra-node logical link traffic is
sent), or 0 for none
N$TMP 10. Temporary parameter area (4 words)
N$LVC 18. Number of LLT entries
20. Address of first LLT entry
N$PLD 22. Number of PLT entries
24. Address of first PLT entry
N$NOD 26. Number of node table entries
28. Address of first node table entry
N$VER 30. (1)
32. Address of password table
N$STS 34. Number of statistics table entries
36. Address of the first statistic table entry
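
Several entries in this data base come as a count followed by an address. A
minimal Python sketch of pulling those pairs out is shown here. It assumes a
flat, little-endian byte image of DN20 memory and that nsp_db is the address
of the NSP data base within that image (obtained from Z.DAT in the NSP PDV).
Whether the data base needs to be located through a mapping bias first is not
addressed. The name nsp_tables is hypothetical.

```python
import struct

# (offset of count, offset of address), decimal, from the table above.
NSP_TABLES = {
    "LLT": (18, 20),          # N$LVC
    "PLT": (22, 24),          # N$PLD
    "node table": (26, 28),   # N$NOD
    "statistics": (34, 36),   # N$STS
}
N_VER_PASSWORDS = 32          # N$VER+2: address of password table


def word(mem, addr):
    """Fetch one 16-bit word (ASSUMPTION: flat little-endian memory image)."""
    return struct.unpack_from("<H", mem, addr)[0]


def nsp_tables(mem, nsp_db):
    """Return {table name: (entry count, address)} plus the password table address."""
    found = {
        name: (word(mem, nsp_db + cnt), word(mem, nsp_db + adr))
        for name, (cnt, adr) in NSP_TABLES.items()
    }
    found["password table"] = (None, word(mem, nsp_db + N_VER_PASSWORDS))
    return found
```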
Logical Link Table
------------------
Location N$LVC+2 in the NSP data base points to the logical link table (LLT)
address list. This list is indexed by the LLA of the logical link (low-order
byte only). Each list entry points to an LLT element.
In the table description below, an "intercept link" is one for which the
task or process on each end of the link is located in another node, and the
DN20 acts only as a route-through mechanism. A "non-intercept link" has one
or more ends located in the DN20. An intercept link uses two LLT entries, one
for each physical line the link uses. The whole intercept link therefore is
modelled in the DN20 by the two LLTs, each of which is attached to another node
by a physical line and to the other LLT by an imaginary line known as the
"network" (which for a real routing implementation could contain multiple
connected routing nodes). The node on the other end of a LLT's physical line
is called the "adjacent node" for the LLT, while the other LLT's adjacent node
is the "remote node".
The format of an LLT for an intercept NSP such as the DN20's is given below.
Note that there are two sets of symbol names: I.xxx for intercept links only,
and L.xxx for all links.
Name Offset Description
---- ------ -----------
L.STA 0. State of the link (byte):
0 - invalid
ST$CIS 1 - Connect Initiate Sent
ST$CC 2 - Connect Confirm sent, waiting to
complete
ST$CIR 3 - Connect Initiate Received
4 - invalid
ST$DAT 5 - Data state: link up and running
ST$DIP 6 - Disconnect In Progress
L.LVL 1. Link level and type (byte):
LF.RSU 100000 Resource recovery required
LF.LCL 40000 Local Logical Link
LF.INT 400 Intercept Link
LF.DSP 40 Stop data flow
LF.DST 20 Start Data Flow
LF.FPN 10 Flow notification to local
process pending
LF.NPN 1 NAK complete to local process
pending
L.LLA 2. Logical link address for this end of the link
L.TIPI 4. Transmits in progress on I/LS channel (byte)
L.TIPD 5. Transmits in progress on data channel (byte)
L.REM 6. Intercept link: pointer to node data base entry for
remote node
Non-intercept link: pointer to entry for adjacent node
L.RLA 8. Intercept link: LLA for other intercept LLT
Non-intercept link: remote LLA
L.FLG 10. More link flags:
LF.INR 000100 - NSP has received an I/LS NAK
LF.NKR 000200 - Data NAK received by NSP
LF.NKS 000400 - Data NAK sent
LF.HF0 001000 - flow closed on this link (transmit)
Link is backpressured
LF.NTS 002000 - data or I/LS received while in CIS
state - must send NAKS
LF.HSF 010000 - segment flow controlled other end
of link (transmit)
LF.HMF 020000 - message flow controlled other end
of link (transmit)
LF.MSF 040000 - segment flow controlled this end
of link (receive)
LF.MMF 100000 - message flow controlled this end
of link (receive)
L.NXN 12. Next data segment number to be sent
L.NIN 14. Next I/LS segment number to be sent
L.RNO 16. Next data segment number to be received
L.LNO 18. Next I/LS segment number to be received
L.LDA 20. Last data segment acknowledged by remote end
OR
L.USTA 20. Disconnect substate for the user (byte):
ST$UNR 11 - User notification of disconnect required
ST$DIR 12 - Disconnect received from network
ST$UDI 13 - Disconnect received from user
ST$DIS 14 - Disconnect sent (DC response required)
ST$DID 15 - Disconnect done (No response required)
L.NSTA 21. Disconnect substate for the network (byte):
L.LIA 22. Last I/LS segment acknowledged by remote end
L.CIQ 22. Connect initiate and connect confirm pending queue
L.USA 24. Non-intercept node only: last data segment number ac-
knowledged by the user
L.LSA 26. Non-intercept node only: last I/LS segment number ac-
knowledged by local NSP
L.NDA 28. Number of data segments to acknowledge
L.UDQ 28. Disconnect initiate pending queue
L.NLA 30. Number of I/LS segments to acknowledge
L.DCR 30. Disconnect reason code
For intercept links:
I.ILA 32. LLA for adjacent end of link
I.IREM 34. Pointer to node data base entry for adjacent node
For non-intercept links:
L.ULA 32. ULA (User Link Address) for user task or 0 for MCB
processes (byte)
L.PDV 33. PDV index for user MCB process or 0 for user task
L.TC 34. Current flow control count for outbound data messages
(byte)
L.TIC 35. Current flow control count for outbound I/LS messages
(byte)
L.LSF 36. Flow control request count status
L.XQ1 38. Queue for storing buffers waiting for transmit
(because of stopped flow control or a NAK)
L.XQ2 40. Queue for storing data buffers waiting to be ACKed by
the other end
L.XQ3 42. Queue for storing I/LS buffers waiting to be ACKed by
the other end
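
To show how the first few LLT fields fit together, here is a short Python
sketch that summarizes one LLT element. It assumes a flat, little-endian byte
image of DN20 memory and that llt is the address of the element in that image.
Only L.STA, L.LLA, L.RLA and L.FLG are decoded, and summarize_llt is a
hypothetical name.

```python
import struct

LINK_STATES = {
    1: "ST$CIS", 2: "ST$CC", 3: "ST$CIR", 5: "ST$DAT", 6: "ST$DIP",
}

# Bits in L.FLG (offset 10.), in ascending order.
LINK_FLAGS = {
    "LF.INR": 0o000100, "LF.NKR": 0o000200, "LF.NKS": 0o000400,
    "LF.HF0": 0o001000, "LF.NTS": 0o002000, "LF.HSF": 0o010000,
    "LF.HMF": 0o020000, "LF.MSF": 0o040000, "LF.MMF": 0o100000,
}


def summarize_llt(mem, llt):
    """Decode state, link addresses and flags of one LLT element.

    ASSUMPTION: mem is a flat little-endian memory image.
    """
    state = mem[llt]                                   # L.STA (byte)
    lla = struct.unpack_from("<H", mem, llt + 2)[0]    # L.LLA
    rla = struct.unpack_from("<H", mem, llt + 8)[0]    # L.RLA
    flg = struct.unpack_from("<H", mem, llt + 10)[0]   # L.FLG
    return {
        "state": LINK_STATES.get(state, "invalid"),
        "L.LLA": lla,
        "L.RLA": rla,
        "flags": [name for name, bit in LINK_FLAGS.items() if flg & bit],
    }
```

States 0 and 4, and any value not listed in the table, are reported as
"invalid".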
Physical Link Table
-------------------
Location N$PLD+2 in the NSP data base points to the first element of the
physical link table. There is one entry in this table for each active line
which is controlled by NSP. It is indexed by the NSP logical line number for
the line. The contents of a PLT entry are as follows:
Name      Offset   Description
----      ------   -----------
P$LST      0.      Line state:
                      PS$OFF  0 - idle
                      PS$STR  1 - DDCMP being started
                      PS$WT   2 - waiting for a Node Init
                      PS$NTI  3 - Node Init received
                      PS$VER  4 - waiting for Node Verify
                      PS$UP   5 - line up and Node Initialized
           1.      Line flags:
                      PF$OFF  0    Off state desired
                      PF$ON   1    On state desired
                      PF$STA  3    State flag mask
                      PF$ENB  200  Link is enabled
                      PF$EIP  100  Control enable function in progress
                      PF$RVR  40   Verification requested when node init sent
P$LCD      2.      Recovery flags:
                      RF.CTL  3        Count of outstanding control requests
                      RF.CLN  0        Logical link clean up required
                      RF.WTM  30       Mask for allocation failure into wait state
                      RF.WTS  10       Stop function flag
                      RF.WTD  20       Disable function flag
                      RF.TIM  377*400  Mask for timeout value
                      RF.TM0  1*400    Initial timeout value
P$TIM      3.      Recovery timer
P$CHN      4.      NSP logical line number for line
P$CNT      5.      Number of messages queued
P$PFQ      6.      Address of a CCB for a pending control function on line
P$NOD      8.      Address of the node table entry for the node on the
                   other end of the line
P$FRQ     10.      Functions requested
P$FSP     12.      Functions requested
P$LEN     14.      Length of physical link table
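
The RF.TIM mask is written as 377*400, which is octal 177400, the high byte
of the word at P$LCD. On a PDP-11 that high byte is the byte at offset 3,
that is, P$TIM. The Python sketch below works this out and decodes a few other
PLT fields. It assumes a flat, little-endian byte image of DN20 memory, and
read_plt is a hypothetical name.

```python
import struct

RF_TIM = 0o377 * 0o400   # mask for timeout value, as written in the table
PF_ENB = 0o200           # link is enabled
PF_STA = 0o3             # state flag mask (0 = off desired, 1 = on desired)

LINE_STATES = {
    0: "PS$OFF", 1: "PS$STR", 2: "PS$WT", 3: "PS$NTI", 4: "PS$VER", 5: "PS$UP",
}


def read_plt(mem, plt):
    """Decode selected fields of one PLT entry.

    ASSUMPTION: mem is a flat little-endian memory image.
    """
    lcd_word = struct.unpack_from("<H", mem, plt + 2)[0]
    return {
        "state": LINE_STATES.get(mem[plt], "unknown"),        # P$LST
        "desired": mem[plt + 1] & PF_STA,
        "enabled": bool(mem[plt + 1] & PF_ENB),
        "timeout": (lcd_word & RF_TIM) >> 8,                  # same byte as P$TIM
        "P$TIM": mem[plt + 3],
        "P$CHN": mem[plt + 4],
        "P$CNT": mem[plt + 5],
        "P$NOD": struct.unpack_from("<H", mem, plt + 8)[0],
    }
```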
Node Table
----------
Location N$NOD+2 in the NSP data base points to the first entry in the node
table. Each entry describes a node known to NSP. The format of a node table
entry is as follows:
Name      Offset   Description
----      ------   -----------
D$FLG      0.      Node flags:
                      004000 - node is being removed and isn't really in
                               the table
                      010000 - node is a remote node, i.e., is not local
                               or adjacent
                      020000 - node is an adjacent node, i.e., is connected
                               to the local node by a direct physical link
                      040000 - node is the local node
                      100000 - temporary node entry; used to send a DC to
                               a remote node newly connecting, when there is
                               no room in the node data base for the node's
                               entry
D$CHN      2.      Pointer to PLT entry for line over which this node may
                   be accessed
D$USE      4.      Number of logical links to this node (byte)
D$LNG      5.      Length in bytes of the node name (byte)
D$NAM      6.      Node name in ASCII (1 to 6 bytes; right-padded with
                   garbage)
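
Because the name field is padded with garbage, D$LNG must be used to trim it.
The following Python sketch decodes one node table entry on that basis. It
assumes a flat, little-endian byte image of DN20 memory. The name read_node
and the short flag descriptions are hypothetical labels for illustration.

```python
import struct

NODE_FLAGS = {
    0o004000: "being removed",
    0o010000: "remote",
    0o020000: "adjacent",
    0o040000: "local",
    0o100000: "temporary",
}


def read_node(mem, entry):
    """Decode one node table entry.

    ASSUMPTION: mem is a flat little-endian memory image.
    """
    flg, chn = struct.unpack_from("<HH", mem, entry)   # D$FLG, D$CHN
    use = mem[entry + 4]                               # D$USE
    lng = min(mem[entry + 5], 6)                       # D$LNG, at most 6 bytes
    raw = mem[entry + 6:entry + 6 + lng]               # ignore the padding
    return {
        "flags": [text for bit, text in NODE_FLAGS.items() if flg & bit],
        "D$CHN": chn,
        "links": use,
        "name": bytes(raw).decode("ascii", errors="replace"),
    }
```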
Verification Data Base
----------------------
Location N$VER+2 in the NSP data base points to the verification data base,
which contains the transmit and receive passwords. The transmit password is
used to build a Node Verification message to any node which requests one. The
receive password is checked against the Node Verification message the DN20
requests from another node. (The released 3A DN20 system has no receive pass-
word, and consequently does not request verification.) The format of the ver-
ification data base is as follows:
Name Offset Description
---- ------ -----------
V$FLG 0. Flags:
040000 - a transmit password exists
100000 - a receive password exists
V$RCV 2. 8-byte receive password or binary zeroes if none
V$XMT 10. 8-byte transmit password or binary zeroes if none
Statistics Data Base
--------------------
Location N$STS+2 in the NSP data base points to the statistics data base,
which contains node counters for the local node. It is this information which
may be requested by the NCP SHOW COUNTS LOCAL command (not implemented in OPR
for 4, but available from the NCU program on the SWS kit). The format of the
statistics data base is as follows:
Name Offset Description
---- ------ -----------
S$SEC 0. Seconds since the data base was zeroed
S$UMS 2. Number of user messages sent
S$UMR 4. Number of user messages received
S$EMR 6. Number of extraneous messages received
S$NKS 8. Number of NAKs sent
S$FMT 10. Number of format errors
S$RES 12. Number of resource allocation failures
S$SNI 14. Number of successful node initializations
S$UNI 16. Number of unsuccessful node initializations
S$LNK 18. Current number of active logical links
S$MLK 20. Maximum number of active logical links
22. 10 bytes of unused statistics for expansion
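
The counters are eleven consecutive words, so they can be read in one step.
A minimal Python sketch, assuming a flat, little-endian byte image of DN20
memory and treating every counter as an unsigned 16-bit value (the document
does not say whether any of them are signed). The name read_stats is
hypothetical.

```python
import struct

# Counter names in table order; offsets are 0., 2., ..., 20.
STAT_NAMES = [
    "S$SEC", "S$UMS", "S$UMR", "S$EMR", "S$NKS", "S$FMT",
    "S$RES", "S$SNI", "S$UNI", "S$LNK", "S$MLK",
]


def read_stats(mem, stats):
    """Return the statistics counters keyed by symbol name.

    ASSUMPTION: mem is a flat little-endian memory image.
    """
    values = struct.unpack_from("<%dH" % len(STAT_NAMES), mem, stats)
    return dict(zip(STAT_NAMES, values))
```

A large S$RES or S$NKS value relative to S$UMS and S$UMR is a hint to look
at the buffer counts in CETAB.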
Debugging with an Unformatted Dump
----------------------------------
This is not meant to be the "only way" to use an unformatted dump, but
rather "a" way which has been used with varying degrees of success.
The first thing to find out is where the crash occurred. The relevant
data may be found in the crash dump area. If the PS has the two high-order
bits off, the crash occurred in the executive (which for this purpose includes
all of MCB). In this case, if the PC is less than 120000, it is equal to the
physical address of the crash; the module this address is in will be found in
either EXEC.MAP or CEX.MAP. If the PC is in the 120000-137777 range, it is
in an MCB process; PAR 5 may be examined to determine which process, and the
errant module will be found in that process's map.
If the crash PS has the two high-order bits on, the crash occurred in a
user program (NICE or the 72-hour Test). Location $TKTCB in the
executive points to the TCB for the active task.
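
The first classification step described above can be sketched in Python as
follows. The function classify_crash follows the rules in the text; cases
the text does not cover (other mode bit combinations, or a kernel PC above
137777) are reported as such rather than guessed. The helper par_to_physical
is not from this document: it is the general PDP-11 memory management rule
(the PAR holds the page base in units of 100 octal bytes), included on the
assumption that it is useful when following PAR 5. Both names are
hypothetical.

```python
def classify_crash(ps, pc):
    """Classify a crash from the saved PS and PC, per the rules in the text."""
    mode = (ps >> 14) & 0o3              # two high-order bits of the PS
    if mode == 0o3:
        return "user task"
    if mode == 0:
        if pc < 0o120000:
            return "executive or Comm/Exec"   # see EXEC.MAP or CEX.MAP
        if pc <= 0o137777:
            return "MCB process"              # examine PAR 5
    return "not covered"


def par_to_physical(par, va):
    """Translate a virtual address through a PAR value (general PDP-11 rule)."""
    return par * 0o100 + (va & 0o17777)
```

For instance, classify_crash(0o000340, 0o054321) gives "executive or
Comm/Exec", while a PC of 0o121000 with the same PS gives "MCB process".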
If you are in either NETACP or the NSP process, register 5 should point
to the current CCB being acted on. Register 0 may be pointing to the current
LLT entry; register 4 will probably be pointing to the place in the NSP message
which is being examined. The first buffer descriptor in the CCB will be point-
ing to MSGFLGS in the message; the second descriptor will point to the beginning
of the message (either MSGFLGS or the routing header if there is one).
If you are in any other MCB process, register 4 should point to the current
CCB.
[END]